A. Sophie Blee-Goldman created KAFKA-12710:
----------------------------------------------

             Summary: Consider enabling (at least some) optimizations by default
                 Key: KAFKA-12710
                 URL: https://issues.apache.org/jira/browse/KAFKA-12710
             Project: Kafka
          Issue Type: Improvement
          Components: streams
            Reporter: A. Sophie Blee-Goldman


Topology optimizations such as repartition topic consolidation and reusing a 
source topic as a KTable's changelog are extremely useful for reducing the 
footprint of a Kafka Streams application on the broker. The additional storage 
and resource usage caused by changelog and repartition topics is a very real 
pain point, and has even been cited in the past as the reason for turning to 
other stream processing frameworks (though of course I question that judgement).

The repartition topic optimization, at the very least, should be enabled by 
default. The problem is that we can't just flip the switch without breaking 
existing applications during upgrade, since the names and positions of the 
repartition topics in the topology may change. One possibility is to detect 
this situation and disable the optimization if we find that it would produce 
an incompatible topology for an existing application. We can determine this 
simply by looking for pre-existing repartition topics. If any such topics are 
present and match the set of repartition topics in the unoptimized topology, 
then we know we need to switch the optimization off. If we don't find any 
repartition topics, or they match the optimized topology, then it's safe to 
enable it by default.
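A minimal sketch of the detection rule described above, assuming we can list the application's existing repartition topics and compute the repartition topic names that the unoptimized and optimized variants of the topology would create. The class `RepartitionOptimizationCheck`, the `Decision` enum, and the topic names in `main` are hypothetical and not part of the Kafka Streams API. The proposal doesn't say what to do when the existing topics match neither layout, so this sketch reports that case as `UNDETERMINED` rather than guessing.

```java
import java.util.Set;

// Hypothetical helper, not part of Kafka Streams: decides whether the
// repartition-topic optimization can safely be turned on for an application.
public class RepartitionOptimizationCheck {

    enum Decision { ENABLE, DISABLE, UNDETERMINED }

    static Decision decide(Set<String> existingRepartitionTopics,
                           Set<String> unoptimizedTopics,
                           Set<String> optimizedTopics) {
        // No repartition topics on the broker yet: nothing to break.
        if (existingRepartitionTopics.isEmpty()) {
            return Decision.ENABLE;
        }
        // The broker already holds the optimized layout.
        if (existingRepartitionTopics.equals(optimizedTopics)) {
            return Decision.ENABLE;
        }
        // The broker holds the unoptimized layout: optimizing would rename or
        // move repartition topics, so keep the optimization off.
        if (existingRepartitionTopics.equals(unoptimizedTopics)) {
            return Decision.DISABLE;
        }
        // Any other combination is not covered by the proposal.
        return Decision.UNDETERMINED;
    }

    public static void main(String[] args) {
        // Hypothetical topic names for an application with application.id "app".
        Set<String> unoptimized = Set.of("app-count-repartition", "app-sum-repartition");
        Set<String> optimized = Set.of("app-count-repartition");

        System.out.println(decide(Set.of(), unoptimized, optimized));     // ENABLE
        System.out.println(decide(unoptimized, unoptimized, optimized));  // DISABLE
        System.out.println(decide(optimized, unoptimized, optimized));    // ENABLE
    }
}
```

Comparing whole sets (rather than checking for a single topic) keeps the rule exactly as stated: only an exact match with one layout or the other gives a definite answer.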

Alternatively, we could do a KIP announcing that we intend to change the 
default in the next major (breaking) release, and that existing applications 
should override this config if necessary. Using the detection method above, we 
should also be able to implement a fail-safe that shuts the application down if 
a user misses or forgets to do so.
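A rough sketch of such a fail-safe, assuming the optimization is on by default and that the same three topic sets (existing, unoptimized, optimized) are available at startup. Instead of silently falling back, it throws and tells the user to set `topology.optimization=none` (the existing Kafka Streams config key). The class and method names are hypothetical, not an existing Kafka Streams check.

```java
import java.util.Set;

// Hypothetical fail-safe, not an existing Kafka Streams check: refuse to start
// when the optimization is on but the broker still has the unoptimized layout.
public class OptimizationFailSafe {

    static void verifyOrFail(boolean optimizationEnabled,
                             Set<String> existingRepartitionTopics,
                             Set<String> unoptimizedTopics,
                             Set<String> optimizedTopics) {
        // Nothing to check: optimization off, or a brand-new application.
        if (!optimizationEnabled || existingRepartitionTopics.isEmpty()) {
            return;
        }
        boolean matchesOptimized = existingRepartitionTopics.equals(optimizedTopics);
        boolean matchesUnoptimized = existingRepartitionTopics.equals(unoptimizedTopics);
        // The user upgraded without overriding the new default.
        if (matchesUnoptimized && !matchesOptimized) {
            throw new IllegalStateException(
                "Existing repartition topics " + existingRepartitionTopics
                + " match the unoptimized topology; set topology.optimization=none"
                + " to keep the previous behavior.");
        }
    }

    public static void main(String[] args) {
        // Hypothetical topic names for an application with application.id "app".
        Set<String> unoptimized = Set.of("app-count-repartition", "app-sum-repartition");
        Set<String> optimized = Set.of("app-count-repartition");
        try {
            verifyOrFail(true, unoptimized, unoptimized, optimized);
        } catch (IllegalStateException e) {
            // In a real application, startup would abort here.
            System.out.println("shutting down: " + e.getMessage());
        }
    }
}
```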

The source topic optimization is perhaps more controversial, as a few issues 
have been raised with regard to things like [restoring bad data and 
asymmetric serdes|https://issues.apache.org/jira/browse/KAFKA-8037], or, more 
recently, the bug discovered in the 
[emit-on-change semantics for KTables|https://issues.apache.org/jira/browse/KAFKA-12508?focusedCommentId=17306323&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17306323]. 
However, in this case at least there are no compatibility concerns. It's safe 
to upgrade from using a separate changelog for a source KTable to using that 
source topic directly, although the reverse is not true. We could even 
automatically delete the no-longer-necessary changelog for upgrading 
applications.
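A small sketch of the one-way upgrade rule and of finding changelog topics that become deletable, assuming changelogs follow the Kafka Streams naming convention `<application.id>-<storeName>-changelog` and that we know which stores back source KTables. The class `SourceTopicReuse`, its methods, and the store and topic names in `main` are hypothetical.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helpers, not part of Kafka Streams.
public class SourceTopicReuse {

    // Moving a source KTable from a dedicated changelog to its source topic is
    // safe; moving back (reuse on -> reuse off) is not.
    static boolean isSafeTransition(boolean wasReusingSourceTopic,
                                    boolean willReuseSourceTopic) {
        return !(wasReusingSourceTopic && !willReuseSourceTopic);
    }

    // Changelog topics that are no longer needed once the given source-KTable
    // stores read from their source topics instead.
    static Set<String> obsoleteChangelogs(String applicationId,
                                          Set<String> sourceTableStores,
                                          Set<String> existingTopics) {
        Set<String> obsolete = new TreeSet<>(); // sorted for stable output
        for (String store : sourceTableStores) {
            String changelog = applicationId + "-" + store + "-changelog";
            if (existingTopics.contains(changelog)) {
                obsolete.add(changelog);
            }
        }
        return obsolete;
    }

    public static void main(String[] args) {
        Set<String> existing = Set.of(
            "orders", "app-orders-store-changelog", "app-count-store-changelog");
        // Only "orders-store" backs a source KTable; "count-store" is an aggregation.
        System.out.println(obsoleteChangelogs("app", Set.of("orders-store"), existing));
        // prints [app-orders-store-changelog]
        System.out.println(isSafeTransition(false, true)); // true
        System.out.println(isSafeTransition(true, false)); // false
    }
}
```

The deletion itself could then go through the Java `Admin` client's `deleteTopics` method.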



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
