[ 
https://issues.apache.org/jira/browse/KAFKA-12710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330979#comment-17330979
 ] 

Guozhang Wang commented on KAFKA-12710:
---------------------------------------

Thanks [~ableegoldman], this seems relevant to [~agavra]'s KIP: 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-705%3A+Selectively+Disable+Topology+Optimizations

> Consider enabling (at least some) optimizations by default
> ----------------------------------------------------------
>
>                 Key: KAFKA-12710
>                 URL: https://issues.apache.org/jira/browse/KAFKA-12710
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: A. Sophie Blee-Goldman
>            Priority: Major
>
> Topology optimizations such as the repartition consolidation and source topic 
> changelog are extremely useful for reducing the footprint of a Kafka Streams 
> application on the broker. The additional storage and resource utilization 
> due to changelogs and repartitions is a very real pain point, and has even 
> been cited as the reason for turning to other stream processing frameworks in 
> the past (though of course I question that judgement).
>
> The repartition topic optimization, at the very least, should be enabled by 
> default. The problem is that we can't just flip the switch without breaking 
> existing applications during upgrade, since the location and name of such 
> topics in the topology may change. One possibility is to just detect this 
> situation and disable the optimization if we find that it would produce an 
> incompatible topology for an existing application. We can determine that this 
> is the case simply by looking for pre-existing repartition topics. If any 
> such topics are present, and match the set of repartition topics in the 
> un-optimized topology, then we know we need to switch the optimization off. 
> If we don't find any repartition topics, or they match the optimized 
> topology, then we're safe to enable it by default.
>
> Alternatively, we could just do a KIP to indicate that we intend to change 
> the default in the next breaking release and that existing applications 
> should override this config if necessary. We should be able to implement a 
> fail-safe and shut down if a user misses or forgets to do so, using the 
> method mentioned above.
>
> The source topic optimization is perhaps more controversial, as there have 
> been a few issues raised with regard to things like [restoring bad data and 
> asymmetric serdes|https://issues.apache.org/jira/browse/KAFKA-8037], or more 
> recently the bug discovered in the [emit-on-change semantics for 
> KTables|https://issues.apache.org/jira/browse/KAFKA-12508?focusedCommentId=17306323&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17306323].
> However, for this case at least, there are no compatibility concerns. It's 
> safe to upgrade from using a separate changelog for a source KTable to just 
> using that source topic directly, although the reverse is not true. We could 
> even automatically delete the no-longer-necessary changelog for upgrading 
> applications.
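
For illustration only, the detection rule proposed in the description could be sketched roughly as below. This is not code from Kafka Streams: the class name, the `decide` method, the `Decision` enum, and the example topic names are all hypothetical, and the three topic sets are assumed to have been gathered elsewhere (for instance by listing broker topics and by building the topology both ways). The description does not say what should happen when the existing topics match neither topology, so this sketch assumes the conservative choice of leaving the optimization off.

```java
import java.util.Set;

// Hypothetical sketch of the "detect and fall back" rule from the issue
// description. None of these names exist in Kafka Streams.
class RepartitionOptimizationCheck {

    enum Decision { ENABLE, DISABLE }

    /**
     * @param existing    repartition topics already present on the broker
     * @param unoptimized repartition topics the un-optimized topology would use
     * @param optimized   repartition topics the optimized topology would use
     */
    static Decision decide(Set<String> existing,
                           Set<String> unoptimized,
                           Set<String> optimized) {
        // No pre-existing repartition topics: nothing to be incompatible with.
        if (existing.isEmpty()) {
            return Decision.ENABLE;
        }
        // Existing topics already match the optimized topology: safe to enable.
        if (existing.equals(optimized)) {
            return Decision.ENABLE;
        }
        // Existing topics match the un-optimized topology: switch it off.
        if (existing.equals(unoptimized)) {
            return Decision.DISABLE;
        }
        // ASSUMPTION: a match with neither topology is not covered by the
        // description; this sketch conservatively keeps the optimization off.
        return Decision.DISABLE;
    }

    public static void main(String[] args) {
        // Made-up topic names for the example.
        Set<String> unoptimized = Set.of("app-left-repartition", "app-right-repartition");
        Set<String> optimized = Set.of("app-merged-repartition");

        System.out.println("fresh app:     " + decide(Set.of(), unoptimized, optimized));
        System.out.println("upgrading app: " + decide(unoptimized, unoptimized, optimized));
    }
}
```

The same check could back the fail-safe mentioned for the KIP route: instead of returning `DISABLE`, the application would shut down with an error asking the user to override the config.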



