[ https://issues.apache.org/jira/browse/KAFKA-12710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330979#comment-17330979 ]
Guozhang Wang commented on KAFKA-12710:
---------------------------------------

Thanks [~ableegoldman], this seems relevant to [~agavra]'s KIP: https://cwiki.apache.org/confluence/display/KAFKA/KIP-705%3A+Selectively+Disable+Topology+Optimizations

> Consider enabling (at least some) optimizations by default
> ----------------------------------------------------------
>
>                 Key: KAFKA-12710
>                 URL: https://issues.apache.org/jira/browse/KAFKA-12710
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: A. Sophie Blee-Goldman
>            Priority: Major
>
> Topology optimizations such as repartition topic consolidation and reusing
> the source topic as a changelog are extremely useful at reducing the
> footprint of a Kafka Streams application on the broker. The additional
> storage and resource utilization due to changelogs and repartitions is a very
> real pain point, and has even been cited as the reason for turning to other
> stream processing frameworks in the past (though of course I question that
> judgement).
>
> The repartition topic optimization, at the very least, should be enabled by
> default. The problem is that we can't just flip the switch without breaking
> existing applications during upgrade, since the location and name of such
> topics in the topology may change. One possibility is to detect this
> situation and disable the optimization if we find that it would produce an
> incompatible topology for an existing application. We can determine whether
> this is the case simply by looking for pre-existing repartition topics. If
> any such topics are present and match the set of repartition topics in the
> un-optimized topology, then we know we need to switch the optimization off.
> If we don't find any repartition topics, or they match the optimized
> topology, then we're safe to enable it by default.
>
> Alternatively, we could just do a KIP to announce that we intend to change
> the default in the next breaking release, and that existing applications
> should override the topology.optimization config if necessary. We should be
> able to implement a fail-safe that shuts the application down if a user
> misses or forgets to do so, using the detection method described above.
>
> The source topic optimization is perhaps more controversial, as there have
> been a few issues raised with regard to things like [restoring bad data and
> asymmetric serdes|https://issues.apache.org/jira/browse/KAFKA-8037], or, more
> recently, the bug discovered in the [emit-on-change semantics for
> KTables|https://issues.apache.org/jira/browse/KAFKA-12508?focusedCommentId=17306323&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17306323].
> However, for this case at least, there are no compatibility concerns. It's
> safe to upgrade from using a separate changelog for a source KTable to using
> that source topic directly, although the reverse is not true. We could even
> automatically delete the no-longer-necessary changelog for upgrading
> applications.
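The detection heuristic from the issue description, and the fail-safe built on top of it, might look roughly like the Java sketch below. It is only an illustration under stated assumptions: the class, enum, and method names are hypothetical and not part of the Kafka Streams API; the broker topic list is passed in as plain strings rather than fetched from the cluster; repartition topics are assumed to follow the `<application.id>-<name>-repartition` naming pattern; and the topic names in `main` are made up for the example.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of the upgrade check proposed in KAFKA-12710; none of
// these names exist in Kafka Streams. The broker topic list is supplied by the
// caller (a real check might obtain it via Admin#listTopics()).
final class OptimizationCompatibilityCheck {

    enum Decision { ENABLE_OPTIMIZATION, DISABLE_OPTIMIZATION, INCOMPATIBLE }

    // Assumption: internal repartition topics are named
    // "<application.id>-<name>-repartition".
    static Set<String> existingRepartitionTopics(String applicationId, Set<String> brokerTopics) {
        Set<String> result = new TreeSet<>();
        for (String topic : brokerTopics) {
            if (topic.startsWith(applicationId + "-") && topic.endsWith("-repartition")) {
                result.add(topic);
            }
        }
        return result;
    }

    static Decision decide(String applicationId,
                           Set<String> brokerTopics,
                           Set<String> unoptimizedRepartitionTopics,
                           Set<String> optimizedRepartitionTopics) {
        Set<String> existing = existingRepartitionTopics(applicationId, brokerTopics);
        // Fresh application, or one already running the optimized topology.
        if (existing.isEmpty() || existing.equals(optimizedRepartitionTopics)) {
            return Decision.ENABLE_OPTIMIZATION;
        }
        // Existing application deployed without the optimization: keep it off.
        if (existing.equals(unoptimizedRepartitionTopics)) {
            return Decision.DISABLE_OPTIMIZATION;
        }
        // Matches neither topology, so neither choice is known to be safe.
        return Decision.INCOMPATIBLE;
    }

    // Fail-safe variant: with the optimization on by default, refuse to start
    // instead of silently switching to a different set of repartition topics.
    static void failIfIncompatible(Decision decision) {
        if (decision != Decision.ENABLE_OPTIMIZATION) {
            throw new IllegalStateException(
                "Existing repartition topics do not match the optimized topology; "
                + "set topology.optimization=none to keep the old layout");
        }
    }

    public static void main(String[] args) {
        // Illustrative names only; generated repartition topic names differ.
        Set<String> unoptimized = Set.of("my-app-count-repartition", "my-app-join-repartition");
        Set<String> optimized = Set.of("my-app-merged-repartition");

        Set<String> upgrading = Set.of("input", "my-app-count-repartition", "my-app-join-repartition");
        System.out.println(decide("my-app", upgrading, unoptimized, optimized));
        // prints DISABLE_OPTIMIZATION

        System.out.println(decide("my-app", Set.of("input"), unoptimized, optimized));
        // prints ENABLE_OPTIMIZATION
    }
}
```

The prefix/suffix match is deliberately naive: an application ID that is a prefix of another (e.g. `my-app` and `my-app-2`) would pick up the other application's topics, so a real implementation would need a stricter way to identify its own internal topics.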