[GitHub] [kafka] vcrfxia commented on a change in pull request #11368: KAFKA-13261: Add support for custom partitioners in foreign key joins

GitBox Fri, 22 Oct 2021 09:10:34 -0700


vcrfxia commented on a change in pull request #11368:
URL: https://github.com/apache/kafka/pull/11368#discussion_r734673460




##########
File path: 
streams/src/main/java/org/apache/kafka/streams/kstream/TableJoined.java
##########
@@ -0,0 +1,119 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.kafka.streams.kstream;
+
+import org.apache.kafka.streams.processor.StreamPartitioner;
+
+import java.util.function.Function;
+
+/**
+ * The {@code TableJoined} class represents optional params that can be passed to
+ * {@link KTable#join(KTable, Function, ValueJoiner, TableJoined) KTable#join(KTable,Function,...)} and
+ * {@link KTable#leftJoin(KTable, Function, ValueJoiner, TableJoined) KTable#leftJoin(KTable,Function,...)}
+ * operations, for foreign key joins.
+ * @param <K>   this key type ; key type for the left (primary) table
+ * @param <KO>  other key type ; key type for the right (foreign key) table
+ */
+public class TableJoined<K, KO> implements NamedOperation<TableJoined<K, KO>> {
+
+    protected final StreamPartitioner<K, Void> partitioner;
+    protected final StreamPartitioner<KO, Void> otherPartitioner;
+    protected final String name;
+
+    private TableJoined(final StreamPartitioner<K, Void> partitioner,
+                        final StreamPartitioner<KO, Void> otherPartitioner,
+                        final String name) {
+        this.partitioner = partitioner;
+        this.otherPartitioner = otherPartitioner;
+        this.name = name;
+    }
+
+    protected TableJoined(final TableJoined<K, KO> tableJoined) {
+        this(tableJoined.partitioner, tableJoined.otherPartitioner, tableJoined.name);
+    }
+
+    /**
+     * Create an instance of {@code TableJoined} with partitioner and otherPartitioner {@link StreamPartitioner} instances.

Review comment:
       Good call. I've updated the text -- LMK what you think.

##########
File path: 
streams/src/test/java/org/apache/kafka/streams/integration/KTableKTableForeignKeyInnerJoinCustomPartitionerIntegrationTest.java
##########
@@ -0,0 +1,255 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.kafka.streams.integration;
+
+import static java.time.Duration.ofSeconds;
+import static java.util.Arrays.asList;
+import static org.apache.kafka.streams.integration.utils.IntegrationTestUtils.startApplicationAndWaitUntilRunning;
+import static org.apache.kafka.streams.integration.utils.IntegrationTestUtils.waitUntilMinKeyValueRecordsReceived;
+import static org.junit.Assert.assertEquals;
+
+import java.io.IOException;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Properties;
+import java.util.Set;
+
+import org.apache.kafka.clients.consumer.ConsumerConfig;
+import org.apache.kafka.clients.producer.ProducerConfig;
+import org.apache.kafka.common.serialization.Serdes;
+import org.apache.kafka.common.serialization.StringDeserializer;
+import org.apache.kafka.common.serialization.StringSerializer;
+import org.apache.kafka.common.utils.Bytes;
+import org.apache.kafka.streams.KafkaStreams;
+import org.apache.kafka.streams.KeyValue;
+import org.apache.kafka.streams.StreamsBuilder;
+import org.apache.kafka.streams.StreamsConfig;
+import org.apache.kafka.streams.integration.utils.EmbeddedKafkaCluster;
+import org.apache.kafka.streams.integration.utils.IntegrationTestUtils;
+import org.apache.kafka.streams.kstream.Consumed;
+import org.apache.kafka.streams.kstream.KTable;
+import org.apache.kafka.streams.kstream.Materialized;
+import org.apache.kafka.streams.kstream.Named;
+import org.apache.kafka.streams.kstream.Produced;
+import org.apache.kafka.streams.kstream.Repartitioned;
+import org.apache.kafka.streams.kstream.TableJoined;
+import org.apache.kafka.streams.kstream.ValueJoiner;
+import org.apache.kafka.streams.state.KeyValueStore;
+import org.apache.kafka.streams.utils.UniqueTopicSerdeScope;
+import org.apache.kafka.test.IntegrationTest;
+import org.apache.kafka.test.TestUtils;
+import org.junit.After;
+import org.junit.AfterClass;
+import org.junit.Before;
+import org.junit.BeforeClass;
+import org.junit.Test;
+import org.junit.experimental.categories.Category;
+
+import kafka.utils.MockTime;
+
+@Category({IntegrationTest.class})
+public class KTableKTableForeignKeyInnerJoinCustomPartitionerIntegrationTest {
+    private final static int NUM_BROKERS = 1;
+
+    public final static EmbeddedKafkaCluster CLUSTER = new EmbeddedKafkaCluster(NUM_BROKERS);
+    private final static MockTime MOCK_TIME = CLUSTER.time;
+    private final static String TABLE_1 = "table1";
+    private final static String TABLE_2 = "table2";
+    private final static String OUTPUT = "output-";
+    private final Properties streamsConfig = getStreamsConfig();
+    private final Properties streamsConfigTwo = getStreamsConfig();
+    private final Properties streamsConfigThree = getStreamsConfig();
+    private KafkaStreams streams;
+    private KafkaStreams streamsTwo;
+    private KafkaStreams streamsThree;
+    private final static Properties CONSUMER_CONFIG = new Properties();
+
+    private final static Properties PRODUCER_CONFIG_1 = new Properties();
+    private final static Properties PRODUCER_CONFIG_2 = new Properties();
+
+    @BeforeClass
+    public static void startCluster() throws IOException, InterruptedException {
+        CLUSTER.start();
+        //Use multiple partitions to ensure distribution of keys.
+
+        CLUSTER.createTopic(TABLE_1, 4, 1);
+        CLUSTER.createTopic(TABLE_2, 4, 1);
+        CLUSTER.createTopic(OUTPUT, 4, 1);
+
+        PRODUCER_CONFIG_1.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, CLUSTER.bootstrapServers());
+        PRODUCER_CONFIG_1.put(ProducerConfig.ACKS_CONFIG, "all");
+        PRODUCER_CONFIG_1.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
+        PRODUCER_CONFIG_1.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
+
+        PRODUCER_CONFIG_2.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, CLUSTER.bootstrapServers());
+        PRODUCER_CONFIG_2.put(ProducerConfig.ACKS_CONFIG, "all");
+        PRODUCER_CONFIG_2.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
+        PRODUCER_CONFIG_2.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
+
+        final List<KeyValue<String, String>> table1 = asList(
+            new KeyValue<>("ID123-1", "ID123-A1"),
+            new KeyValue<>("ID123-2", "ID123-A2"),
+            new KeyValue<>("ID123-3", "ID123-A3"),
+            new KeyValue<>("ID123-4", "ID123-A4")
+        );
+
+        final List<KeyValue<String, String>> table2 = asList(
+            new KeyValue<>("ID123", "BBB")
+        );
+
+        IntegrationTestUtils.produceKeyValuesSynchronously(TABLE_1, table1, PRODUCER_CONFIG_1, MOCK_TIME);
+        IntegrationTestUtils.produceKeyValuesSynchronously(TABLE_2, table2, PRODUCER_CONFIG_2, MOCK_TIME);
+
+        CONSUMER_CONFIG.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, CLUSTER.bootstrapServers());
+        CONSUMER_CONFIG.put(ConsumerConfig.GROUP_ID_CONFIG, "ktable-ktable-consumer");
+        CONSUMER_CONFIG.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
+        CONSUMER_CONFIG.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
+    }
+
+    @AfterClass
+    public static void closeCluster() {
+        CLUSTER.stop();
+    }
+
+    @Before
+    public void before() throws IOException {
+        final String stateDirBasePath = TestUtils.tempDirectory().getPath();
+        streamsConfig.put(StreamsConfig.STATE_DIR_CONFIG, stateDirBasePath + "-1");
+        streamsConfigTwo.put(StreamsConfig.STATE_DIR_CONFIG, stateDirBasePath + "-2");
+        streamsConfigThree.put(StreamsConfig.STATE_DIR_CONFIG, stateDirBasePath + "-3");
+    }
+
+    @After
+    public void after() throws IOException {
+        if (streams != null) {
+            streams.close();
+            streams = null;
+        }
+        if (streamsTwo != null) {
+            streamsTwo.close();
+            streamsTwo = null;
+        }
+        if (streamsThree != null) {
+            streamsThree.close();
+            streamsThree = null;
+        }
+        IntegrationTestUtils.purgeLocalStreamsState(asList(streamsConfig, streamsConfigTwo, streamsConfigThree));
+    }
+
+    @Test
+    public void shouldInnerJoinMultiPartitionQueryable() throws Exception {
+        final Set<KeyValue<String, String>> expectedOne = new HashSet<>();
+        expectedOne.add(new KeyValue<>("ID123-1", "value1=ID123-A1,value2=BBB"));
+        expectedOne.add(new KeyValue<>("ID123-2", "value1=ID123-A2,value2=BBB"));
+        expectedOne.add(new KeyValue<>("ID123-3", "value1=ID123-A3,value2=BBB"));
+        expectedOne.add(new KeyValue<>("ID123-4", "value1=ID123-A4,value2=BBB"));
+
+        verifyKTableKTableJoin(expectedOne);
+    }
+
+    private void verifyKTableKTableJoin(final Set<KeyValue<String, String>> expectedResult) throws Exception {
+        final String innerJoinType = "INNER";
+        final String queryableName = innerJoinType + "-store1";
+
+        streams = prepareTopology(queryableName, streamsConfig);

Review comment:
       This is what the existing 
`KTableKTableForeignKeyInnerJoinMultiIntegrationTest` does, I believe in order 
to test running with multiple Streams instances:
   
https://github.com/apache/kafka/blob/86e83de742720e9800ef0f55d1a65c8a2f79e4cc/streams/src/test/java/org/apache/kafka/streams/integration/KTableKTableForeignKeyInnerJoinMultiIntegrationTest.java#L191-L193
   You're right that it's not strictly necessary for this test, though. Do you 
think it's better to simplify the test and just have one instance?

##########
File path: 
streams/src/main/java/org/apache/kafka/streams/kstream/internals/KTableImpl.java
##########
@@ -995,7 +1075,8 @@ boolean sendingOldValueEnabled() {
         //This occurs whenever the extracted foreignKey changes values.
         enableSendingOldValues(true);
 
-        final NamedInternal renamed = new NamedInternal(joinName);
+        final TableJoinedInternal<K, KO> tableJoinedInternal = new TableJoinedInternal<>(tableJoined);
+        final NamedInternal renamed = new NamedInternal(tableJoinedInternal.name());

Review comment:
       The method `suffixWithOrElseGet(String, InternalNameProvider, String)` 
only exists on `NamedInternal`. Do you think it would be cleaner to introduce 
the method on `TableJoinedInternal` and have 
`TableJoinedInternal#suffixWithOrElseGet()` call 
`NamedInternal#suffixWithOrElseGet()`? I could get behind that. I don't think 
it makes sense to duplicate the code for `suffixWithOrElseGet()` into 
`TableJoinedInternal`, though.
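
   The delegation option described above could look roughly like the sketch below. It is only an illustration: `NamedSketch`, `TableJoinedSketch` and `NameProvider` are hypothetical stand-ins for `NamedInternal`, `TableJoinedInternal` and `InternalNameProvider`, the fallback logic is simplified, and the generated-name format is made up.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified stand-ins; the real Kafka Streams classes differ in detail.
public class DelegationSketch {

    /** Stand-in for InternalNameProvider: generates a name when none was supplied. */
    interface NameProvider {
        String newProcessorName(String prefix);
    }

    /** Stand-in for NamedInternal, which owns the suffixWithOrElseGet logic. */
    static class NamedSketch {
        private final String name;

        NamedSketch(final String name) {
            this.name = name;
        }

        String suffixWithOrElseGet(final String suffix, final NameProvider provider, final String prefix) {
            // Use the user-supplied name when present, otherwise fall back to a generated one.
            return name != null ? name + suffix : provider.newProcessorName(prefix);
        }
    }

    /** Stand-in for TableJoinedInternal: delegates instead of duplicating the logic. */
    static class TableJoinedSketch {
        private final String name;

        TableJoinedSketch(final String name) {
            this.name = name;
        }

        String suffixWithOrElseGet(final String suffix, final NameProvider provider, final String prefix) {
            return new NamedSketch(name).suffixWithOrElseGet(suffix, provider, prefix);
        }
    }

    public static void main(final String[] args) {
        final AtomicInteger index = new AtomicInteger();
        final NameProvider provider = prefix -> prefix + index.getAndIncrement();

        // A named join keeps its name; an unnamed join gets a generated one.
        System.out.println(new TableJoinedSketch("my-join").suffixWithOrElseGet("-suffix", provider, "GENERATED-"));
        System.out.println(new TableJoinedSketch(null).suffixWithOrElseGet("-suffix", provider, "GENERATED-"));
    }
}
```

   With this shape the naming rule lives in one place, and the call site would no longer need to wrap the name in a separate `NamedInternal` first.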

##########
File path: 
streams/src/main/java/org/apache/kafka/streams/kstream/TableJoined.java
##########
@@ -0,0 +1,119 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.kafka.streams.kstream;
+
+import org.apache.kafka.streams.processor.StreamPartitioner;
+
+import java.util.function.Function;
+
+/**
+ * The {@code TableJoined} class represents optional params that can be passed to
+ * {@link KTable#join(KTable, Function, ValueJoiner, TableJoined) KTable#join(KTable,Function,...)} and
+ * {@link KTable#leftJoin(KTable, Function, ValueJoiner, TableJoined) KTable#leftJoin(KTable,Function,...)}
+ * operations, for foreign key joins.
+ * @param <K>   this key type ; key type for the left (primary) table
+ * @param <KO>  other key type ; key type for the right (foreign key) table
+ */
+public class TableJoined<K, KO> implements NamedOperation<TableJoined<K, KO>> {
+
+    protected final StreamPartitioner<K, Void> partitioner;
+    protected final StreamPartitioner<KO, Void> otherPartitioner;
+    protected final String name;
+
+    private TableJoined(final StreamPartitioner<K, Void> partitioner,
+                        final StreamPartitioner<KO, Void> otherPartitioner,
+                        final String name) {
+        this.partitioner = partitioner;
+        this.otherPartitioner = otherPartitioner;
+        this.name = name;

Review comment:
       For both StreamJoined and Joined, yes:
   
https://github.com/apache/kafka/blob/86e83de742720e9800ef0f55d1a65c8a2f79e4cc/streams/src/main/java/org/apache/kafka/streams/kstream/StreamJoined.java#L71
   
https://github.com/apache/kafka/blob/86e83de742720e9800ef0f55d1a65c8a2f79e4cc/streams/src/main/java/org/apache/kafka/streams/kstream/StreamJoined.java#L97
   
https://github.com/apache/kafka/blob/86e83de742720e9800ef0f55d1a65c8a2f79e4cc/streams/src/main/java/org/apache/kafka/streams/kstream/Joined.java#L40
   
https://github.com/apache/kafka/blob/86e83de742720e9800ef0f55d1a65c8a2f79e4cc/streams/src/main/java/org/apache/kafka/streams/kstream/Joined.java#L62
   

##########
File path: 
streams/src/main/java/org/apache/kafka/streams/kstream/TableJoined.java
##########
@@ -0,0 +1,119 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.kafka.streams.kstream;
+
+import org.apache.kafka.streams.processor.StreamPartitioner;
+
+import java.util.function.Function;
+
+/**
+ * The {@code TableJoined} class represents optional params that can be passed to
+ * {@link KTable#join(KTable, Function, ValueJoiner, TableJoined) KTable#join(KTable,Function,...)} and
+ * {@link KTable#leftJoin(KTable, Function, ValueJoiner, TableJoined) KTable#leftJoin(KTable,Function,...)}
+ * operations, for foreign key joins.
+ * @param <K>   this key type ; key type for the left (primary) table
+ * @param <KO>  other key type ; key type for the right (foreign key) table
+ */
+public class TableJoined<K, KO> implements NamedOperation<TableJoined<K, KO>> {
+
+    protected final StreamPartitioner<K, Void> partitioner;
+    protected final StreamPartitioner<KO, Void> otherPartitioner;
+    protected final String name;
+
+    private TableJoined(final StreamPartitioner<K, Void> partitioner,
+                        final StreamPartitioner<KO, Void> otherPartitioner,
+                        final String name) {
+        this.partitioner = partitioner;
+        this.otherPartitioner = otherPartitioner;
+        this.name = name;
+    }
+
+    protected TableJoined(final TableJoined<K, KO> tableJoined) {
+        this(tableJoined.partitioner, tableJoined.otherPartitioner, tableJoined.name);
+    }
+
+    /**
+     * Create an instance of {@code TableJoined} with partitioner and otherPartitioner {@link StreamPartitioner} instances.
+     * {@code null} values are accepted and will result in the default partitioner being used.
+     *
+     * @param partitioner      a {@link StreamPartitioner} that specifies the partitioning strategy for the left (primary)
+     *                         table of the foreign key join. The partitioning strategy must depend only on the message key
+     *                         and not the message value. If {@code null} the default partitioner will be used.
+     * @param otherPartitioner a {@link StreamPartitioner} that specifies the partitioning strategy for the right (foreign
+     *                         key) table of the foreign key join. The partitioning strategy must depend only on the message
+     *                         key and not the message value. If {@code null} the default partitioner will be used.
+     * @param <K>              this key type ; key type for the left (primary) table
+     * @param <KO>             other key type ; key type for the right (foreign key) table
+     * @return new {@code TableJoined} instance with the provided partitioners
+     */
+    public static <K, KO> TableJoined<K, KO> with(final StreamPartitioner<K, Void> partitioner,
+                                                  final StreamPartitioner<KO, Void> otherPartitioner) {
+        return new TableJoined<>(partitioner, otherPartitioner, null);
+    }
+
+    /**
+     * Create an instance of {@code TableJoined} with base name for all components of the join, including internal topics
+     * created to complete the join.
+     *
+     * @param name the name used as the base for naming components of the join including internal topics
+     * @param <K>  this key type ; key type for the left (primary) table
+     * @param <KO> other key type ; key type for the right (foreign key) table
+     * @return new {@code TableJoined} instance configured with the name
+     *
+     */
+    public static <K, KO> TableJoined<K, KO> as(final String name) {
+        return new TableJoined<>(null, null, name);
+    }
+
+    /**
+     * Set the custom {@link StreamPartitioner} to be used. Null values are accepted and will result in the

Review comment:
       Done, though I slightly reworded to `to be used as part of computing the 
join`. Can easily change back if you prefer the original, though.
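
   For readers of the Javadoc quoted above, here is a small sketch of a partitioner that depends only on the message key, as the `partitioner` and `otherPartitioner` parameters require. `PartitionerSketch` is a local stand-in declared only so the example compiles on its own, it is not the Kafka `StreamPartitioner` interface, and the prefix-based strategy is an assumption chosen for illustration using key shapes from the integration test.

```java
import java.util.Arrays;
import java.util.List;

public class KeyOnlyPartitionerSketch {

    /** Local stand-in for a stream partitioner; not the Kafka interface itself. */
    interface PartitionerSketch<K, V> {
        Integer partition(String topic, K key, V value, int numPartitions);
    }

    /** Partitions on the part of the key before the first '-', never reading the value. */
    static class PrefixPartitioner implements PartitionerSketch<String, Void> {
        @Override
        public Integer partition(final String topic, final String key, final Void value, final int numPartitions) {
            final int dash = key.indexOf('-');
            final String prefix = dash < 0 ? key : key.substring(0, dash);
            // floorMod keeps the result in [0, numPartitions) even for negative hash codes.
            return Math.floorMod(prefix.hashCode(), numPartitions);
        }
    }

    public static void main(final String[] args) {
        final PrefixPartitioner partitioner = new PrefixPartitioner();
        final List<String> keys = Arrays.asList("ID123-1", "ID123-2", "ID123");
        // All three keys share the prefix "ID123", so they map to the same partition.
        for (final String key : keys) {
            System.out.println(key + " -> " + partitioner.partition("table1", key, null, 4));
        }
    }
}
```

   Because the value type is `Void` and the value is never read, the result is a function of the key alone, which is the property the Javadoc asks for.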




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
