[
https://issues.apache.org/jira/browse/KAFKA-12710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330979#comment-17330979
]
Guozhang Wang commented on KAFKA-12710:
---------------------------------------
Thanks [~ableegoldman], this seems relevant to [~agavra]'s KIP:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-705%3A+Selectively+Disable+Topology+Optimizations
> Consider enabling (at least some) optimizations by default
> ----------------------------------------------------------
>
> Key: KAFKA-12710
> URL: https://issues.apache.org/jira/browse/KAFKA-12710
> Project: Kafka
> Issue Type: Improvement
> Components: streams
> Reporter: A. Sophie Blee-Goldman
> Priority: Major
>
> Topology optimizations such as the repartition consolidation and source topic
> changelog are extremely useful for reducing the footprint of a Kafka Streams
> application on the broker. The additional storage and resource utilization
> due to changelogs and repartitions is a very real pain point, and has even
> been cited as the reason for turning to other stream processing frameworks in
> the past (though of course I question that judgement).
> The repartition topic optimization, at the very least, should be enabled by
> default. The problem is that we can't just flip the switch without breaking
> existing applications during upgrade, since the location and name of such
> topics in the topology may change. One possibility is to just detect this
> situation and disable the optimization if we find that it would produce an
> incompatible topology for an existing application. We can determine that this
> is the case simply by looking for pre-existing repartition topics. If any
> such topics are present, and match the set of repartition topics in the
> un-optimized topology, then we know we need to switch the optimization off.
> If we don't find any repartition topics, or they match the optimized
> topology, then we're safe to enable it by default.
> Alternatively, we could just do a KIP to indicate that we intend to change
> the default in the next breaking release and that existing applications
> should override this config if necessary. We should be able to implement a
> fail-safe and shut down if a user misses or forgets to do so, using the
> method mentioned above.
> The source topic optimization is perhaps more controversial, as there have
> been a few issues raised with regard to things like [restoring bad data and
> asymmetric serdes|https://issues.apache.org/jira/browse/KAFKA-8037], or more
> recently the bug discovered in the [emit-on-change semantics for
> KTables|https://issues.apache.org/jira/browse/KAFKA-12508?focusedCommentId=17306323&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17306323].
> However, for this case at least there are no compatibility concerns. It's
> safe to upgrade from using a separate changelog for a source KTable to just
> using that source topic directly, although the reverse is not true. We could
> even automatically delete the no-longer-necessary changelog for upgrading
> applications.
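
The detection approach described in the issue (compare pre-existing
repartition topics against those of the un-optimized and optimized topologies)
could be sketched roughly as below. This is only an illustration under stated
assumptions, not the Kafka Streams implementation: the class, enum, and topic
names are hypothetical, "match" is assumed to mean exact set equality, and the
UNKNOWN outcome for topics that match neither topology is an assumption the
issue does not cover. The set of existing topics is assumed to be supplied by
the caller.

```java
import java.util.Set;

// Hypothetical helper; none of these names exist in Kafka Streams.
final class RepartitionOptimizationCheck {

    enum Decision { ENABLE, DISABLE, UNKNOWN }

    /**
     * Decide whether the repartition optimization can be turned on by default.
     *
     * @param existing    repartition topics already present for this application
     * @param unoptimized repartition topics the un-optimized topology would use
     * @param optimized   repartition topics the optimized topology would use
     */
    static Decision decide(Set<String> existing,
                           Set<String> unoptimized,
                           Set<String> optimized) {
        // No repartition topics yet, or they already match the optimized
        // topology: safe to enable.
        if (existing.isEmpty() || existing.equals(optimized)) {
            return Decision.ENABLE;
        }
        // Topics match the un-optimized topology: enabling would produce an
        // incompatible topology, so keep the optimization off.
        if (existing.equals(unoptimized)) {
            return Decision.DISABLE;
        }
        // Assumption: anything else is ambiguous and left to the caller,
        // e.g. to trigger a fail-safe shutdown.
        return Decision.UNKNOWN;
    }

    public static void main(String[] args) {
        // Hypothetical topic names, for illustration only.
        Set<String> unoptimized = Set.of("app-map-1-repartition", "app-map-2-repartition");
        Set<String> optimized = Set.of("app-map-1-repartition");

        System.out.println(decide(Set.of(), unoptimized, optimized));   // ENABLE
        System.out.println(decide(unoptimized, unoptimized, optimized)); // DISABLE
    }
}
```

The optimized match is checked first so that an application whose optimized
and un-optimized topologies use the same repartition topics is still treated
as safe to enable.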
--
This message was sent by Atlassian Jira
(v8.3.4#803005)