[ https://issues.apache.org/jira/browse/KAFKA-7293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16722943#comment-16722943 ]
Matthias J. Sax commented on KAFKA-7293: ---------------------------------------- >From my understanding, nothing would happen (ie, no exception) but the >computation will we executed, but compute an incorrect result. > Merge followed by groupByKey/join might violate co-partioning > ------------------------------------------------------------- > > Key: KAFKA-7293 > URL: https://issues.apache.org/jira/browse/KAFKA-7293 > Project: Kafka > Issue Type: Improvement > Components: streams > Reporter: Matthias J. Sax > Priority: Major > > The merge() operations can be applied to input KStreams that have a different > number of tasks (ie, input topic partitions). For this case, the input topics > are not co-partitioned and thus the result KStream is not partitioned even if > each input KStream is partitioned by its own. > Because, no "repartitionRequired" flag is set on the input KStreams, the flag > is also not set on the output KStream. Hence, if a groupByKey() or join() > operation is applied the output KStream, we don't insert a repartition topic. > However, repartitioning would be required because the KStream is not > partitioned. > We cannot detect this during compile time, because the number or partitions > is unknown, and thus, we cannot decide if repartitioning is required or not. > However, we can add a runtime check similar to joins() that checks if data is > correctly (co-)partitioned and if not, we can raise a runtime exception. > Note, for merge() in contrast to join(), we should only check for > co-partitioning, if the merge() is followed by a groupByKey() or join() > operations. -- This message was sent by Atlassian JIRA (v7.6.3#76005)