ableegoldman commented on a change in pull request #9582:
URL: https://github.com/apache/kafka/pull/9582#discussion_r522549685



##########
File path: 
streams/src/main/java/org/apache/kafka/streams/kstream/internals/InternalStreamsBuilder.java
##########
@@ -314,6 +317,50 @@ public void buildAndOptimizeTopology(final Properties 
props) {
         internalTopologyBuilder.validateCopartition();
     }
 
+    private void mergeDuplicateSourceNodes() {
+        final Map<String, StreamSourceNode<?, ?>> topicsToSourceNodes = new 
HashMap<>();
+
+        // We don't really care about the order here, but since Pattern does 
not implement equals() we can't rely on
+        // a regular HashMap and containsKey(Pattern). But for our purposes 
it's sufficient to compare the compiled
+        // string and flags to determine if two pattern subscriptions can be 
merged into a single source node
+        final Map<Pattern, StreamSourceNode<?, ?>> patternsToSourceNodes =
+            new 
TreeMap<>(Comparator.comparing(Pattern::pattern).thenComparing(Pattern::flags));

Review comment:
       Yes to all of that: this PR improves some situations, but not all. 
Specifically you would still get a TopologyException if  (a) subscribing to 
overlapping but not equal collection of topics, (b) subscribing to a topic and 
to a pattern that matches said topic, and (c) subscribing to two (or more) 
patterns that match the same topic(s).
   
   Case (c) is what you described, I just wanted to list them all here for 
completion. Here's my take on what we can/should reasonably try to tackle:
   
   (a) this case is easily detected, easily worked around, and easy for us to 
fix. It results in a "compile time" exception (meaning when the topology is 
compiled, not the program) which users can quickly detect and work around if 
need be by rewriting the topology themselves. Fix is relatively straightforward 
but very low priority, so I plan to just file a followup ticket for this for now
   (b) is easily detected (you get a compile time exception) and possible to 
work around, but difficult to solve. I think in all cases a user could find a 
way around this issue by some combination of topology rewriting and Pattern 
manipulation or topic renaming, depending on what exactly they're trying to 
achieve. Of course there's no way for us to detect what an arbitrary user is 
trying to do in this case, so I don't see any path forwarding to making this 
case possible. No plans to file a followup ticket
   (c) is difficult to detect, might be possible to work around, and probably 
very complicated to actually fix. Unfortunately, in this case you only get a 
run-time exception, since there's no way of knowing which topics will or will 
not be created ahead of time. And I'm thinking that determining whether two 
regexes will both match any possible string may be unsolvable...so, no followup 
ticket planned for this. 
   WDYT?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to