bbejeck commented on issue #8504: URL: https://github.com/apache/kafka/pull/8504#issuecomment-618447861
>Why that? Because such a topology would hit the bug, it could never be deployed, and thus nobody can actually run such a topology? In fact, shouldn't we "burn" an index even if a name is provided (IIRC, we do this for some cases)?

Yes, in some cases we increment the index when users provide names. But right now we don't increment the counter at all when creating repartition topics, as we reuse the name as is. My main point is that if we started to generate new names for repartition topics, we'd create topology compatibility issues, as the newly generated name would bump the count for all downstream nodes. Right now I'm leaning towards going with the solution you presented in point one in https://github.com/apache/kafka/pull/8504#discussion_r413380852

>I agree though, that merging repartition topics (as proposed in (1)) should be done if possible (it's a historic artifact that we did not merge them in the past and IMHO we should not make the same mistake again?).

But by doing so we are "leaking" optimization logic, as you pointed out above. I'm leaning towards building the topology "as is", meaning create two repartition topics if that's what is required. But I don't have a strong opinion and I would be fine with keeping the current solution in this PR.

>For (2), it's a tricky question because the different names are used for different stores and changelog topics (ie, main purpose?) -- it seems to be a "nasty side effect" if we would end up with two repartition topics for this case? Of course, given the new repartition() operator, a user can work around it by using it after map() and before calling join(). Just brainstorming here what the impact could be and what tradeoff we want to pick.

I'm not sure I follow the "nasty side effect" comment here. If a user does `streamA.join(streamB, ..., StreamJoined.name("foo"))` and `streamA.join(streamC, ..., StreamJoined.name("bar"))` then we should create two repartition topics, as that's what the user is expecting.
If they elect to use optimization, then removing redundant repartition topics is expected behavior. I think this also goes back to your original comment about the leaking of optimization details.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
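
To illustrate the compatibility concern about bumping the count for all downstream nodes, here is a minimal toy sketch. It is not the Kafka Streams implementation: `NameGenerator`, `buildNames`, the prefixes, and the `foo-repartition` name are all hypothetical stand-ins. It only assumes that generated names are a prefix plus a zero-padded, monotonically increasing index.

```java
import java.util.ArrayList;
import java.util.List;

public class NodeNameIndexSketch {

    // Hypothetical stand-in for a topology builder's name counter.
    static final class NameGenerator {
        private int index = 0;

        // Generated name: prefix plus a zero-padded, increasing index.
        String next(String prefix) {
            return prefix + String.format("%010d", index++);
        }
    }

    // Names for a toy pipeline: source -> map -> repartition -> join.
    // When burnIndexForRepartition is true, the repartition step consumes
    // an index even though the user-supplied name is what gets used.
    static List<String> buildNames(boolean burnIndexForRepartition) {
        NameGenerator names = new NameGenerator();
        List<String> result = new ArrayList<>();
        result.add(names.next("KSTREAM-SOURCE-"));
        result.add(names.next("KSTREAM-MAP-"));
        if (burnIndexForRepartition) {
            names.next("KSTREAM-REPARTITION-"); // index consumed, name discarded
        }
        result.add("foo-repartition"); // user-provided name reused as is
        result.add(names.next("KSTREAM-JOIN-"));
        return result;
    }

    public static void main(String[] args) {
        // Current behavior in this toy model: no index consumed for the repartition topic.
        System.out.println(buildNames(false));
        // Alternative: an index is consumed, so the downstream join name shifts.
        System.out.println(buildNames(true));
    }
}
```

In this toy model the join node ends in `0000000002` in the first case and `0000000003` in the second, which is the kind of shift that would make an already deployed topology incompatible after an upgrade.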