Daan,

thanks for the KIP. I personally find the motivation section a little bit confusing. If I understand the KIP correctly, you want to read a topic into a state store (ie, materialize it). This is already possible today.

Of course, today a "second" changelog topic would be created. It seems the KIP aims to avoid this additional changelog topic and to allow re-using the original input topic as the changelog (this optimization is already available for the DSL, but not for the PAPI).
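
Just for reference, something like the following already does this in the DSL today (topic/store names are made up):

    final StreamsBuilder builder = new StreamsBuilder();
    // materializes the input topic into a local state store
    builder.table("input-topic", Materialized.as("shared-store"));

    final Properties props = new Properties();
    // with optimization enabled, the input topic is re-used as the changelog,
    // ie, no additional changelog topic is created for "shared-store"
    props.put(StreamsConfig.TOPOLOGY_OPTIMIZATION_CONFIG, StreamsConfig.OPTIMIZE);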

If my observation is correct, we can simplify the motivation accordingly (the fact that you want to use this feature to share state across different applications more efficiently seems to be secondary and we could omit it IMHO to keep the motivation focused).

As a result, we also don't need the concept of "leader" and "follower". In the end, Kafka Streams cannot reason about or enforce any usage patterns across different applications; we can only guarantee things within a single application (ie, don't create a changelog but re-use an input topic as the changelog). It would simplify the KIP if we removed these parts.



For the API, I am wondering why you propose to pass in `processorNames`? To me, it seems more reasonable to pass a `ProcessorSupplier` instead (similar to what we do for `addGlobalStore`). The provided `Processor` must implement a certain pattern, ie, take each input record and apply it unmodified to the state store (ie, the `Processor` is solely responsible for maintaining the state store). We might also need to pass other arguments into this method, similar to `addGlobalStore`. (More below.)
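
Just to illustrate the contract I have in mind for such a `Processor` (a sketch using the new PAPI; the store name is made up):

    import org.apache.kafka.streams.processor.api.Processor;
    import org.apache.kafka.streams.processor.api.ProcessorContext;
    import org.apache.kafka.streams.processor.api.Record;
    import org.apache.kafka.streams.state.KeyValueStore;

    public class StoreUpdateProcessor implements Processor<String, String, Void, Void> {
        private KeyValueStore<String, String> store;

        @Override
        public void init(final ProcessorContext<Void, Void> context) {
            // the store that is populated from the input topic
            store = context.getStateStore("shared-store");
        }

        @Override
        public void process(final Record<String, String> record) {
            // apply each input record unmodified to the state store;
            // nothing is forwarded downstream
            store.put(record.key(), record.value());
        }
    }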

If other processors need to read the state store, they can be connected to it explicitly via `connectProcessorAndStateStores()`. I guess a hybrid approach that keeps `processorNames` would also be possible, but IMHO all those processors should only _read_ the state store (and never modify it), to keep a clear conceptual separation.
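
Ie, something like (processor and store names are just placeholders):

    // "reader-processor" only reads "shared-store"; it never writes to it
    topology.connectProcessorAndStateStores("reader-processor", "shared-store");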

About the method name: I am wondering if we should use a different name to be more explicit about what the method does? Maybe `addReadOnlyStateStore`?
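
Something along the lines of the below would mirror `addGlobalStore` (only a sketch to illustrate the idea, not a proposal for the exact signature):

    public synchronized <KIn, VIn> Topology addReadOnlyStateStore(
        final StoreBuilder<?> storeBuilder,
        final String sourceName,
        final Deserializer<KIn> keyDeserializer,
        final Deserializer<VIn> valueDeserializer,
        final String topic,
        final String processorName,
        final ProcessorSupplier<KIn, VIn, Void, Void> stateUpdateSupplier);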



Btw: please omit any code snippets and only put the newly added method signature in the KIP.

What I don't yet understand is the section "Allow state stores to continue listening for changes from their changelog". Can you elaborate?

About:

Since a changelog topic is created with the application id in it’s name, it 
would allow us to check in the follower if the changelog topic starts with our 
application id. If it doesn’t, we are not allowed to send a log.

The DSL implements this differently, and just disables the changelog for the state store (ie, for the "follower"). We could do the same thing (either enforcing that the provided `StoreBuilder` has changelogging disabled, or by just ignoring the setting and hard-coding it to disabled).
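
Ie, the store for the "follower" application could just be built with logging disabled (a sketch; store name and serdes are placeholders):

    final StoreBuilder<KeyValueStore<String, String>> storeBuilder =
        Stores.keyValueStoreBuilder(
            Stores.persistentKeyValueStore("shared-store"),
            Serdes.String(),
            Serdes.String())
        .withLoggingDisabled(); // no changelog topic is created for this store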


Ie, overall I would prefer the "source-processor approach" that you put into rejected alternatives. Note that the problem you call out, namely

Problem with this approach is the lack of having restoring support within the 
state store

does not apply. A restore is absolutely possible and the DSL already supports it.


Or is your concern with regard to performance? The "source-processor approach" has the disadvantage that input data is first deserialized, fed into the Processor, and then serialized again when put into the state store. Re-using the state restore code is a good idea from a performance point of view, but it might require quite some internal changes (your proposal to "not stop restoring" might not work, as it could trigger quite some undesired side effects given the current architecture of Kafka Streams).


-Matthias




On 1/16/22 11:52 PM, Daan Gertis wrote:
Hey everyone,

Just created a KIP on sharing statestore state across multiple applications 
without duplicating the data on multiple changelog topics. Have a look and tell 
me what you think or what to improve. This is my first one, so please be gentle 
😉

https://cwiki.apache.org/confluence/x/q53kCw

Cheers!
D.
