Thanks for the KIP Victoria. Very well written!

Couple of questions (many might just require to add some more details to the KIP):

(1) Why does the new store not extend KeyValueStore, but StateStore? In the end, it's a KeyValueStore?

(2) Should we have a ReadOnlyVersionedKeyValueStore? Even if we don't want to support IQ in this KIP, it might be good to add this interface right away to avoid complications for follow up KIPs? Or won't there by any complications anyway?

(3) Why do we not have a `delete(key)` method? I am ok with not supporting all methods from existing KV-store, but a `delete(key)` seems to be fundamentally to have?

(4a) Do we need `get(key)`? It seems to be the same as `get(key, MAX_VALUE)`? Maybe is good to have as syntactic sugar though? Just for my own clarification (should we add something to the JavaDocs?).

(4b) Should we throw an exception if a user queries out-of-bound instead of returning `null` (in `get(key,ts)`)? -> You put it into "rejected alternatives", and I understand your argument. Would love to get input from others about this question though. -- It seems we also return `null` for windowed stores, so maybe the strongest argument is to align to existing behavior? Or do we have case for which the current behavior is problematic?

(4c) JavaDoc on `get(key,ts)` says: "(up to store implementation discretion when this is the case)" -> Should we make it a stricter contract such that the user can reason about it better (there is WIP to make retention time a strict bound for windowed stores atm) -> JavaDocs on `persistentVersionedKeyValueStore` seems to suggest a strict bound, too.

(5a) Do we need to expose `segmentInterval`? For windowed-stores, we also use segments but hard-code it to two (it was exposed in earlier versions but it seems not useful, even if we would be open to expose it again if there is user demand).

(5b) JavaDocs says: "Performance degrades as more record versions for the same key are collected in a single segment. On the other hand, out-of-order writes and reads which access older segments may slow down if there are too many segments." -- Wondering if JavaDocs should make any statements about expected performance? Seems to be an implementation detail?

(6) validTo timestamp is "exclusive", right? Ie, if I query `get(key,ts[=validToV1])` I would get `null` or the "next" record v2 with validFromV2=ts?

(7) The KIP says, that segments are stores in the same RocksDB -- for this case, how are efficient deletes handled? For windowed-store, we can just delete a full RocksDB.

(8) Rejected alternatives: you propose to not return the validTo timestamp -- if we find it useful in the future to return it, would there be a clean path to change it accordingly?


-Matthias


On 11/16/22 9:57 PM, Victoria Xia wrote:
Hi everyone,

I have a proposal for introducing versioned state stores in Kafka Streams.
Versioned state stores are similar to key-value stores except they can
store multiple record versions for a single key. This KIP focuses on
interfaces only in order to limit the scope of the KIP.

https://cwiki.apache.org/confluence/display/KAFKA/KIP-889%3A+Versioned+State+Stores

Thanks,
Victoria

Reply via email to