Thanks for all the input. My intention was not to block the KIP, but just to take a step back and try get a holistic picture and discussion, to explore if there are good/viable alternative designs. As said originally, I really like to close this gap, and was always aware that the current config is not flexible enough.

I guess, my "concern" is that the KIP does increase the API surface area significantly, as we need all kind of `StoreTypeSpec` implementations, and it might also imply that we need follow up KIPs for new feature (like in-memory versioned store) that might not need a KIP otherwise.

The second question is if it might make the already "patchy" situation with regard to customization worse.

We did de-scope the original KIP-591 for this reason, and given the new situation of the DSL, it seems that it actually got worse compared to back in the days.

Lastly, I hope to make the new versioned stores the default in the DSL and we did not do it in the previous KIP due to backward compatibility issues. Thus, from a DSL point of view, I believe there should be only "RocksDB", "InMemory", and "Custom" in an ideal world. Introducing (I am exaggerating to highlight my point) "KvRocksDbSpec", "TimestampeKvRocksDbSpec", "VersionedRocksDbSpec", plus the corresponding in-memory specs seems to head into the opposite direction. -- My goal is to give users a handle of the _physical_ store (RocksDB vs InMemory vs Custom) but not the _logical_ stores (plain kv, ts-kv, versioned) which is "dictated" by the DSL itself and should not be customizable (we are just in a weird intermediate situation that we need to clean up, but not "lean into" IMHO).

Thus, I am also not sure if adding "VersionedRocksDbSpec" would be ideal (also, given that it only changes a single store, but not the two windowed stores)?

Furthermore, I actually hope that we could use the new versioned store to replace the window- and sessions- stores, and thus to decrease the number of required store types.


Admittedly, I am talking a lot about a potential future, but the goal is only to explore opportunities to not get into "worse" intermediate state, that will require a huge deprecation surface area later on. Of course, if there is no better way, and my concerns are not shared, I am ok to move forward with the KIP.


Bottom line: I would personally prefer to keep the current config and add a `Custom` option to it, plus adding one new config that allows people to set their custom `StoreTypeSpec` class. -- I would not add a built-in spec for versioned stores at this point (or any other built-in `StorytypeSpec` implementations). I guess, users could write a custom spec if they want to enable versioned store across the board for now (until we make them the default anyway)?


Hope my train of thoughts is halfway reasonable and not totally off track?


-Matthias

On 7/21/23 15:27, Sophie Blee-Goldman wrote:
I agree with everything Almog said above, and will just add on to two
points:

1. Regarding whether to block this KIP on the completion of any or all
future implementations of in-memory version stores (or persist suppression
buffers), I feel that would be unfair to this feature which is completely
unrelated to the semantic improvements offered by versioned state stores.
It seems like the responsibility of those driving the versioned state
stores feature, not Almog/this KIP, to make sure that those bases are
covered. Further, if anything, this KIP will help with the massive
proliferation of StoreSuppliers on the Stores factory class, and provide
users with a much easier way to leverage the versioned stores without
having to muck around directly with the StoreSuppliers.

I also thought about it a bit, and really like Almog's suggestion to
introduce an additional StoreSpec for the Versioned state stores. Obviously
we can add the RocksDB one to this KIP now, and then as he mentioned,
there's an easy way to get users onto the IMVersionedStateStore types once
they are completed.

Lastly, on this note, I want to point out how smoothly this solved the
issue of timestamped stores, which are intended to be the DSL default but
are only a special case for RocksDB. Right now it can be confusing for a
user scrolling through the endless Stores class and seeing a timestamped
version of the persistent but not in-memory stores. One could easily assume
there was no timestamped option for IM stores and that this feature was
incomplete, if they weren't acutely aware of the internal implementation
details (namely that it's only required for RocksDB for compatibility
reasons). However, with this KIP, all that is handled completely
transparently to the user, and we the devs, who *are* aware of the internal
implementation details, are now appropriately the ones responsible for
handing the correct store type to a particular operator. While versioned
state stores may not be completely comparable, depending on whether we want
users to remain able to easily choose between using them or not (vs
timestamped which should be used by all), I still feel this KIP is a great
step in the right direction that not only should not be blocked on the
completion of the IM implementations, but in fact should specifically be
done first as it enables an easier way to utilize those IM versioned
stores. Just my 2 cents :)

2. The idea to expand the existing the config with a CUSTOM enum without
introducing another config to specify the CUSTOM store spec does not seem
appropriate, or  even possible (for the reasons Almog mentioned above about
config types, though perhaps there is a way I'm not seeing). I do not buy
the argument that we should optimize the API to make it easy for users who
just want to switch to all in-memory stores, as I truly believe this is a
very small fraction of the potential userbase of this feature (anyone who's
actually using this should please chime in!). It seems very likely that the
majority of users of this feature are actually those with custom state
stores, as to my knowledge, this has been the case any/every time this
feature was requested by users.

That said, while I don't see any way to get around introducing a new
config, I don't personally have a preference w.r.t whether to keep around
the old config or deprecate it. I simply don't get the impression it is
very heavily used as it stands today, while it only works for those who
want all in-memory stores. Again, input from actual users would be very
valuable here. In the absence of that data, I will just point to the fact
that this KIP was proposed by a Streams dev (you :P), abandoned, picked up
by another Streams dev, and finally implemented without ever hearing from a
user that they would find this useful. This is not to disparage the
original KIP, but just to say again, as I stated back then, it seemed like
a major opportunity loss to leave out custom state stores

Cheers,
Sophie

On Fri, Jul 21, 2023 at 1:52 PM Almog Gavra <almog.ga...@gmail.com> wrote:

Thanks for all the feedback folk! Responses inline.

Basically, I'm suggesting two things: first, call out in some way
(perhaps the StoreTypeSpec javadocs) that each StoreTypeSpec is considered
a public contract in itself and should outline any semantic guarantees it
does, or does not, make. Second, we should add a note on ordering
guarantees in the two OOTB specs: for RocksDB we assert that range queries
will honor serialized byte ordering, whereas the InMemory flavor gives no
ordering guarantee whatsoever at this time.

That makes sense to me Sophie! I'll make the changes to the KIP. And @Colt,
yes I believe that would be the new javadoc for the generic
ReadOnlyKeyValueStore.

However, I am wondering if we should close others gaps first?

@Matthias, thanks for the review and thoughts! I think we should separate
closing other gaps in the product from providing this as useful
functionality to avoid feature creep so long as the API proposed here will
be suitable for when we want to close those implementation gaps! My general
proposal is that for things that are not customizable today by
default.dsl.store they remain not customizable after this KIP. The good
news is, however, that there's no reason why this cannot be extended to
cover those in the future if we want to - see specifics below.

Comments on the specifics below

In particular, this holds for the new versioned-store ... Should
versioned stores also be covered by the KIP

Is there a reason why we can't introduce a VersionedRocksDBStoreTypeSpec
and if we ever support an in-memory an equivalent
VersionedInMemoryRocksDBStoreTypeSpec? If so, then there would not need to
be any additional changes to the API proposed in this KIP.

For `suppress()` it's actually other way around we only have an in-memory
implementation. Do you aim to allow custom stores for `suppress()`, too?

We have three options here:
1) we can decide to maintain existing behavior and use the in-memory
implementation for all stores (not even going through the API at all)
2a) we can introduce a new parameter to the KeyValueParams class (boolean
isTimeOrderedBuffer or something like that) and return an in-memory store
in the implementation of RocksDBStoreTypeSpec (this maintains the existing
behavior, and would allow us in the future to make the change to return a
RocksDB store if we ever provide one)
2b) same as 2a but we throw an exception if the requested store type does
not support that (this is backwards incompatible, and since ROCKS_DB is the
default we probably shouldn't do this)

My proposal for now is 1) because as of KIP-825
EmitStrategy#ON_WINDOW_CLOSE is the preferred way of suppressing and that
is accounted for in this API already.

Last, I am not sure if the new parameter replacing the existing one is
the
best way to go?

I'm happy either way, just let me know which you prefer - the discussion
around CUSTOM is in the rejected alternatives but I'm happy to differ to
whatever the project conventions are :)

If it's matches existing `ROCKS_DB` or `IN_MEMORY` we just process it as
we
do know, and if know we assume it's a fully qualified class name and try to
instantiate it?

Note that there is no functionality for this kind of thing in
AbstractConfig (it's either a String validated enum or a class) so this
would be a departure from convention. Again, I'm happy to implement that if
it's preferred.

Also wondering how it would related to the existing `Stores` factory?

StoreTypeSpec will depend on Stores factory - they're one layer removed.
You can imagine that StoreTypeSpec is just a grouping of methods from the
Stores factory into a convenient package for default configuration
purposes.

Thanks again for all the detailed thoughts Matthias!

On Fri, Jul 21, 2023 at 11:50 AM Matthias J. Sax <mj...@apache.org> wrote:

Thanks for the KIP. Overall I like the idea to close this gap.

However, I am wondering if we should close others gaps first? In
particular, IIRC, we have a few cases for which we only have a RocksDB
implementation for a store, and thus, adding an in-memory version for
these stores first, to make the current `IN_MEMORY` parameter work,
might be the first step?

In particular, this holds for the new versioned-store (but I actually
believe the is some other internal store with no in-memory
implementation). -- For `suppress()` it's actually other way around we
we only have an in-memory implementation. Do you aim to allow custom
stores for `suppress()`, too?

Btw: Should versioned stores also be covered by the KIP (ie,
`StoreTypeSpec`)? We did consider to add a new option `VERSIONED` to the
existing `default.dsl.store` config, but opted out for various reasons.

Last, I am not sure if the new parameter replacing the existing one is
the best way to go? Did you put the idea to add `CUSTOM` to the existing
config into rejected alternative. Personally, I would prefer to add
`CUSTOM` as I would like to optimize to easy of use for the majority of
users (which don't implement a custom store), but only switch to
in-memory sometimes. -- As an alternative, you would also just extend
`dsl.default.store` (it's just a String) and allow to pass in anything.
If it's matches existing `ROCKS_DB` or `IN_MEMORY` we just process it as
we do know, and if know we assume it's a fully qualified class name and
try to instantiate it? -- Given that we plan to keep the store-enum, is
seems cleaner to keep the existing config and keep both the config and
enum aligned to each other?


It's just preliminary thought. I will need to go back an take a more
detailed look into the code to grok how the propose `StoreTypeSpec`
falls into place. Also wondering how it would related to the existing
`Stores` factory?

-Matthias


On 7/21/23 6:45 AM, Colt McNealy wrote:
Sophie—

Thanks for chiming in here. +1 to the idea of specifying the ordering
guarantees that we make in the StorageTypeSpec javadocs.

Quick question then. Is the javadoc that says:

Order is not guaranteed as bytes lexicographical ordering might not
represent key order.

no longer correct, and should say:

Order guarantees depend on the underlying implementation of the
ReadOnlyKeyValueStore. For more information, please consult the
[StorageTypeSpec javadocs](....)

Thanks,
Colt McNealy

*Founder, LittleHorse.dev*


On Thu, Jul 20, 2023 at 9:28 PM Sophie Blee-Goldman <
ableegold...@gmail.com>
wrote:

Hey Almog, first off, thanks for the KIP! I (and others) raised
concerns
over how restrictive the default.dsl.store config would be if not
extendable to custom store types, especially given that this seems to
be
the primary userbase of such a feature. At the time we didn't really
have
any better ideas for a clean way to achieve that, but what you
proposed
makes a lot of sense to me. Happy to see a good solution to this, and
hopefully others will share my satisfaction :P

I did have one quick piece of feedback which arose from an unrelated
question posed to the dev mailing list w/ subject line
"ReadOnlyKeyValueStore#range()
Semantics"
<https://lists.apache.org/thread/jbckmth8d3mcgg0rd670cpvsgwzqlwrm>. I
recommend checking out the full thread for context, but it made me
think
about how we can leverage the new StoreTypeSpec concept as an answer
to
the
long-standing question in Streams: where can we put guarantees of the
public contract for RocksDB (or other store implementations) when all
the
RocksDB stuff is technically internal.

Basically, I'm suggesting two things: first, call out in some way
(perhaps
the StoreTypeSpec javadocs) that each StoreTypeSpec is considered a
public
contract in itself and should outline any semantic guarantees it does,
or
does not, make. Second, we should add a note on ordering guarantees in
the
two OOTB specs: for RocksDB we assert that range queries will honor
serialized byte ordering, whereas the InMemory flavor gives no
ordering
guarantee whatsoever at this time.

Thoughts?

-Sophie

On Thu, Jul 20, 2023 at 4:28 PM Almog Gavra <almog.ga...@gmail.com>
wrote:

Hi All,

I would like to propose a KIP to expand support for default store
types
(KIP-591) to encompass custom store implementations:




https://cwiki.apache.org/confluence/display/KAFKA/KIP-954%3A+expand+default+DSL+store+configuration+to+custom+types

Looking forward to your feedback!

Cheers,
Almog






Reply via email to