ableegoldman commented on a change in pull request #9582: URL: https://github.com/apache/kafka/pull/9582#discussion_r522549685
########## File path: streams/src/main/java/org/apache/kafka/streams/kstream/internals/InternalStreamsBuilder.java ########## @@ -314,6 +317,50 @@ public void buildAndOptimizeTopology(final Properties props) { internalTopologyBuilder.validateCopartition(); } + private void mergeDuplicateSourceNodes() { + final Map<String, StreamSourceNode<?, ?>> topicsToSourceNodes = new HashMap<>(); + + // We don't really care about the order here, but since Pattern does not implement equals() we can't rely on + // a regular HashMap and containsKey(Pattern). But for our purposes it's sufficient to compare the compiled + // string and flags to determine if two pattern subscriptions can be merged into a single source node + final Map<Pattern, StreamSourceNode<?, ?>> patternsToSourceNodes = + new TreeMap<>(Comparator.comparing(Pattern::pattern).thenComparing(Pattern::flags)); Review comment: Yes to all of that: this PR improves some situations, but not all. Specifically you would still get a TopologyException if (a) subscribing to overlapping but not equal collection of topics, (b) subscribing to a topic and to a pattern that matches said topic, and (c) subscribing to two (or more) patterns that match the same topic(s). Case (c) is what you described, I just wanted to list them all here for completion. Here's my take on what we can/should reasonably try to tackle: (a) this case is easily detected, easily worked around, and easy for us to fix. It results in a "compile time" exception (meaning when the topology is compiled, not the program) which users can quickly detect and work around if need be by rewriting the topology themselves. Fix is relatively straightforward but very low priority, so I plan to just file a followup ticket for this for now (b) is easily detected (you get a compile time exception) and possible to work around, but difficult to solve. I think in all cases a user could find a way around this issue by some combination of topology rewriting and Pattern manipulation or topic renaming, depending on what exactly they're trying to achieve. Of course there's no way for us to detect what an arbitrary user is trying to do in this case, so I don't see any path forwarding to making this case possible. No plans to file a followup ticket (c) is difficult to detect, might be possible to work around, and probably very complicated to actually fix. Unfortunately, in this case you only get a run-time exception, since there's no way of knowing which topics will or will not be created ahead of time. And I'm thinking that determining whether two regexes will both match any possible string may be unsolvable...so, no followup ticket planned for this. WDYT? ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org