Hello Kay, What you describe is "by design" -- unfortunately.
The problem is, that when we build the `Topology` we don't know the partition count of the input topics, and thus, StreamsBuilder cannot insert a repartition topic for this case (we always assume that the partition count is the same for all input topic).
To work around this, you would need to rewrite the program to use either `groupBy((k,v) -> k)` instead of `groupByKey()`, or do a `.repartition().groupByKey()`.
Does this make sense? -Matthias On 5/16/24 2:11 AM, Kay Hannay wrote:
Hi, we have a Kafka streams application which merges (merge, groupByKey, aggretgate) a few topics into one topic. The application is stateful, of course. There are currently six instances of the application running in parallel. We had an issue where one new Topic for aggregation did have another partition count than all other topics. This caused data corruption in our application. We expected that a re-partitioning topic would be created automatically by Kafka streams or that we would get an error. But this did not happen. Instead, some of the keys (all merged topics share the same key schema) found their way into at least two different instances of the application. One key is available in more than one local state store. Can you explain why this happened? As already said, we would have expected to get an error or a re-partitioning topic in this case. Cheers Kay