[GitHub] [kafka] vvcephei commented on a change in pull request #8504: KAFKA-9298: reuse mapped stream error in joins

GitBox Fri, 24 Apr 2020 10:10:43 -0700


vvcephei commented on a change in pull request #8504:
URL: https://github.com/apache/kafka/pull/8504#discussion_r414731001




##########
File path: 
streams/src/main/java/org/apache/kafka/streams/kstream/internals/KStreamImpl.java
##########
@@ -989,16 +994,18 @@ private void to(final TopicNameExtractor<K, V> 
topicExtractor,
             null,
             optimizableRepartitionNodeBuilder);
 
-        final OptimizableRepartitionNode<K, V> optimizableRepartitionNode = 
optimizableRepartitionNodeBuilder.build();
-        builder.addGraphNode(streamsGraphNode, optimizableRepartitionNode);
+        if (repartitionNode == null || !name.equals(repartitionName)) {

Review comment:
       Since you made the mistake of asking my opinion, here it is :) :
   
   > bumping the index
   It's true that users can't currently reuse the KStream, so there's no 
compatibility issue there, but we can't bump the index for the _first_ 
repartition topic, or we would break every topology that uses generated 
repartition topic names already. So, either way, we have to cache something to 
tell us to do something different on the "first reuse" (i.e., the second use of 
the KStream).
   
   Since we have to do that anyway, maybe it's fine to just cache the 
repartition node itself instead of a flag that says "bump the index next time". 
   
   > leaking optimizations into the DSL
   
   I'm on the fence about whether this is an "optimization" or "reasonable 
behavior". It sort of feels like the latter, and the only reason we needed to 
introduce the "repartition-collapsing" optimization is that we failed to 
introduce reasonable behavior from the beginning. Also, my read is that the DSL 
builder and the optimizer are not cleanly separated right now anyway, and if we 
ever want to build more optimizations, we'll most likely need to make another 
pass on both anyway. We're also starting to think about topology evolution (cc 
@cadonna ), which makes this a less scary prospect, as we can then implement a 
mechanism to _compatibly_ introduce new optimizations. In other words, I'm not 
taking a hard stance, but leaning in the direction of doing the more efficient 
thing than the more pure thing, since we're not currently super pure anyway.
   
   > Other repartition topics
   
   I think we'd better leave it alone for now, implement topology evolution, 
then migrate to a completely pure and consistent approach.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [kafka] vvcephei commented on a change in pull request #8504: KAFKA-9298: reuse mapped stream error in joins

Reply via email to