Hello Kay,

What you describe is "by design" -- unfortunately.

The problem is, that when we build the `Topology` we don't know the partition count of the input topics, and thus, StreamsBuilder cannot insert a repartition topic for this case (we always assume that the partition count is the same for all input topic).

To work around this, you would need to rewrite the program to use either `groupBy((k,v) -> k)` instead of `groupByKey()`, or do a `.repartition().groupByKey()`.

Does this make sense?


-Matthias


On 5/16/24 2:11 AM, Kay Hannay wrote:
Hi,

we have a Kafka streams application which merges (merge, groupByKey, 
aggretgate) a few topics into one topic. The application is stateful, of 
course. There are currently six instances of the application running in 
parallel.

We had an issue where one new Topic for aggregation did have another partition 
count than all other topics. This caused data corruption in our application. We 
expected that a re-partitioning topic would be created automatically by Kafka 
streams or that we would get an error. But this did not happen. Instead, some 
of the keys (all merged topics share the same key schema) found their way into 
at least two different instances of the application. One key is available in 
more than one local state store. Can you explain why this happened? As already 
said, we would have expected to get an error or a re-partitioning topic in this 
case.

Cheers
Kay

Reply via email to