Re: [DISCUSS] KIP-1024: Make the restore behavior of GlobalKTables with custom processors configureable

Bruno Cadonna Wed, 06 Mar 2024 00:54:55 -0800

Hi Walker,

Thanks for the KIP!


Great that you are going to fix this long-standing issue!

1.

I was wondering if we need the timestamp extractor as well as the key and value deserializers in the Topology#addGlobalStore() overloads that do not take a ProcessorSupplier? What about Consumed in StreamsBuilder#addGlobalStore()? Since those methods set up a global state store that does not process any records, do they still need to deserialize records and extract timestamps? Name might still be needed, right?

2.

From an API point of view, it might make sense to put all processor-related arguments into a parameter object. Something like:

GlobalStoreParameters.globalStore().withKeySerde(keySerde).disableReprocessOnRestore()
Just an idea, open for discussion.
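
To make the shape of such a parameter object concrete, here is a minimal sketch. Everything in it is hypothetical: GlobalStoreParameters, withValueSerde(), the accessors, and the default of reprocessing on restore are assumptions for illustration, and the nested Serde interface is only a stand-in for org.apache.kafka.common.serialization.Serde so that the sketch compiles on its own.

```java
import java.util.Optional;

/** Hypothetical parameter object; nothing here exists in Kafka Streams today. */
final class GlobalStoreParameters<K, V> {

    /** Stand-in for Kafka's Serde interface so this sketch is self-contained. */
    interface Serde<T> { }

    private final Serde<K> keySerde;
    private final Serde<V> valueSerde;
    private final boolean reprocessOnRestore;

    private GlobalStoreParameters(Serde<K> keySerde, Serde<V> valueSerde, boolean reprocessOnRestore) {
        this.keySerde = keySerde;
        this.valueSerde = valueSerde;
        this.reprocessOnRestore = reprocessOnRestore;
    }

    /** Entry point; assumes reprocessing on restore is enabled unless disabled. */
    static <K, V> GlobalStoreParameters<K, V> globalStore() {
        return new GlobalStoreParameters<>(null, null, true);
    }

    /** Returns a copy with the key serde set; instances are immutable. */
    GlobalStoreParameters<K, V> withKeySerde(Serde<K> keySerde) {
        return new GlobalStoreParameters<>(keySerde, valueSerde, reprocessOnRestore);
    }

    /** Returns a copy with the value serde set. */
    GlobalStoreParameters<K, V> withValueSerde(Serde<V> valueSerde) {
        return new GlobalStoreParameters<>(keySerde, valueSerde, reprocessOnRestore);
    }

    /** Returns a copy that asks for records to be restored without reprocessing. */
    GlobalStoreParameters<K, V> disableReprocessOnRestore() {
        return new GlobalStoreParameters<>(keySerde, valueSerde, false);
    }

    Optional<Serde<K>> keySerde() {
        return Optional.ofNullable(keySerde);
    }

    Optional<Serde<V>> valueSerde() {
        return Optional.ofNullable(valueSerde);
    }

    boolean reprocessOnRestore() {
        return reprocessOnRestore;
    }
}
```

One consequence of a no-argument static factory is that Java cannot infer K and V through the call chain, so a caller would need a type witness, for example GlobalStoreParameters.<String, Long>globalStore().withKeySerde(keySerde).disableReprocessOnRestore(). A factory that takes the serdes as arguments would avoid that.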

3.

Could you please go over the KIP and correct typos and other mistakes in the KIP?



Best,
Bruno



On 3/2/24 1:43 AM, Matthias J. Sax wrote:

Thanks for the KIP Walker.
Fixing this issue, and providing users some flexibility to opt in or out of "restore reprocessing", is certainly a good improvement.
From an API design POV, I like the idea to not require passing in a ProcessorSupplier to begin with. Given the current implementation of the restore process, this might have been the better API from the beginning on... Well, better late than never :)
For this new method w/o a supplier, I am wondering if we want to keep `addGlobalStore` or name it `addGlobalReadOnlyStore` -- we do a similar thing via KIP-813. Just an idea.
However, I am not convinced that adding a new boolean parameter is the best way to design the API. Unfortunately, I don't have any elegant proposal myself. Just a somewhat crazy idea to do a larger API change:
Taking a step back, a global store is by definition a terminal node -- we don't support adding child nodes. Hence, while we expose a full `ProcessorContext` interface, we actually limit what functionality it supports. Thus, I am wondering if we should stop using the generic `Processor` interface to begin with, but design a new one which is tailored to the needs of global stores? -- This would of course be of much larger scope than originally intended by this KIP, but it might be a great opportunity to kill two birds with one stone?
The only other question to consider is: do we believe that global stores will never have child nodes, or could we actually allow for child nodes in the future? If yes, it might not be smart to move off using the `Processor` interface... In general, I could imagine, especially as we now want to support "process on restore", to allow simple stateless operators like `map()` or `filter()` on a `GlobalTable` (or allow to add custom global processors) at some point in the future?
Just wanted to put this out to see what people think...


-Matthias


On 2/29/24 1:26 PM, Walker Carlson wrote:
Hello everybody,

I wanted to propose a change to our addGlobalStore methods so that the
restore behavior can be controlled on a per-processor level. This should
help Kafka Streams users tune global stores added with the
Processor API to better fit their needs.

Details are in the KIP here: https://cwiki.apache.org/confluence/x/E4t3EQ

Thanks,
Walker
