Hello Nick,

Thanks for bringing this up and for the proposed options. I read through your writeup and here are some of my thoughts:

1) When changing the topology of a Kafka Streams application, the developer first needs to decide whether the whole topology's persisted state (the state stores and their changelogs, the repartition topics, and the external source/sink topics), or only part of it, can be reused. This involves two types of changes:

a) structural changes to the topology, such as a processor node being added/removed, an intermediate topic being added/removed, etc.
b) semantic changes to a processor, such as a numerical filter node changing its filter threshold, etc.

Today both are more or less determined manually by developers. However, while automatically detecting changes of type b) is hard if not impossible, automatically detecting changes of type a) is doable, since it depends only on information such as:

* the number of sub-topologies, and their order (i.e. sequence of ids)
* the state stores and changelog topics used within each sub-topology
* the repartition topics used
* etc.

So let's assume that in the long run we can indeed automatically determine whether a topology, or part of it (a sub-topology), is structurally the same. What we can then do is "translate" the old persisted state names to the names used by the new, isomorphic topology. Following this thought, I'm leaning towards option B in your proposal. But since automatically detecting structural changes is out of scope for this KIP, I feel we can consider adding some sort of "migration tool" that takes an old topology to a new topology by renaming all the persisted state (store dirs and names, topic names).

Guozhang

On Tue, Jan 25, 2022 at 9:10 AM Nick Telford <nick.telf...@gmail.com> wrote:

> Hi everyone,
>
> I'd like to start a discussion on Kafka Streams KIP-816 (
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-816%3A+Topology+changes+without+local+state+reset
> )
>
> This KIP outlines 3 possible solutions to the problem, and I plan to
> whittle this down to a definitive solution based on this discussion.
>
> Of the 3 proposed solutions:
> * 'A' is probably the "correct" solution, but is also quite a significant
> change.
> * 'B' is the least invasive, but most "hacky" solution.
> * 'C' requires a change to the wire protocol and will likely have
> unintended consequences. C is also the least complete solution, and will
> need significant additional work to make it work.
>
> Please let me know if the Motivation and Background sections need more
> clarity.
>
> Regards,
>
> Nick Telford
>

--
-- Guozhang
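
Illustrative sketch (not part of the KIP or of Kafka Streams): the structural comparison and name "translation" described in point 1) of the reply could look roughly like the Java below. Everything here is an assumption made for illustration: the `SubTopology` record, the `structurallySame` and `renamePlan` helpers, and the rule of pairing stores and repartition topics by position are all hypothetical. The only thing taken from Kafka Streams itself is the internal topic naming scheme, `<application.id>-<name>-changelog` and `<application.id>-<name>-repartition`. Renaming local store directories is left out.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TopologyRenameSketch {

    /** Hypothetical structural summary of one sub-topology (not a Kafka Streams type). */
    public record SubTopology(int id, List<String> stores, List<String> repartitionNames) {}

    /**
     * True if both topologies list the same sub-topology ids in the same order and each
     * pair of sub-topologies uses the same number of stores and repartition topics.
     */
    public static boolean structurallySame(List<SubTopology> oldT, List<SubTopology> newT) {
        if (oldT.size() != newT.size()) {
            return false;
        }
        for (int i = 0; i < oldT.size(); i++) {
            SubTopology o = oldT.get(i);
            SubTopology n = newT.get(i);
            if (o.id() != n.id()
                    || o.stores().size() != n.stores().size()
                    || o.repartitionNames().size() != n.repartitionNames().size()) {
                return false;
            }
        }
        return true;
    }

    /** Maps old internal topic names to new ones, pairing names by position (an assumption). */
    public static Map<String, String> renamePlan(
            String appId, List<SubTopology> oldT, List<SubTopology> newT) {
        if (!structurallySame(oldT, newT)) {
            throw new IllegalArgumentException("topologies differ structurally");
        }
        Map<String, String> plan = new LinkedHashMap<>();
        for (int i = 0; i < oldT.size(); i++) {
            addRenames(plan, appId, oldT.get(i).stores(), newT.get(i).stores(), "-changelog");
            addRenames(plan, appId, oldT.get(i).repartitionNames(),
                    newT.get(i).repartitionNames(), "-repartition");
        }
        return plan;
    }

    // Records a rename only where the old and new names actually differ.
    private static void addRenames(Map<String, String> plan, String appId,
                                   List<String> from, List<String> to, String suffix) {
        for (int j = 0; j < from.size(); j++) {
            if (!from.get(j).equals(to.get(j))) {
                plan.put(appId + "-" + from.get(j) + suffix, appId + "-" + to.get(j) + suffix);
            }
        }
    }

    public static void main(String[] args) {
        // An auto-generated store name whose numeric suffix shifted after a topology change.
        List<SubTopology> oldT = List.of(new SubTopology(
                0, List.of("KSTREAM-AGGREGATE-STATE-STORE-0000000003"), List.of()));
        List<SubTopology> newT = List.of(new SubTopology(
                0, List.of("KSTREAM-AGGREGATE-STATE-STORE-0000000004"), List.of()));
        renamePlan("my-app", oldT, newT)
                .forEach((from, to) -> System.out.println(from + " -> " + to));
    }
}
```

A real migration tool would also have to stop the application, copy or rename the topics and local store directories according to such a plan, and handle stores whose position changed; this sketch only computes the name mapping.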