Thanks. Will take a look into the PR.

I don't have any objection to the goal; contrary! It's very annoying what we have right now, and if we can improve it, I am totally in favor of it.


-Matthias

On 11/3/23 8:47 AM, Almog Gavra wrote:
Good question :) I have a PR for it already here:
https://github.com/apache/kafka/pull/14659. The concept is to wrap the
suppliers with an interface that allows for delayed creation of the
StoreBuilder instead of creating the StoreBuilder from the suppliers right
away. Happy to get on a quick call to outline the implementation strategy
if you'd like, but hopefully you have no objections to the goal!

On Thu, Nov 2, 2023 at 8:44 PM Matthias J. Sax <mj...@apache.org> wrote:

Almog,

can you explain how you intent to implement this change? It's not clear
to me, how we could do this?

When we call `StreasmBuilder.build()` it will give us a already fully
wired `Topology`, including all store suppliers needed. I don't see a
clean way how we could change the store supplier after the fact?


-Matthias

On 11/2/23 5:11 PM, Almog Gavra wrote:
Hello everyone - I updated the KIP to also include the following
modification:

Both the new dsl.store.suppliers.class  and the old default.dsl.store
will
now respect the configurations when passed in via
KafkaStreams#new(Topology,
StreamsConfig)  (and other related constructors) instead of only being
respected when passed in to the initial StoreBuilder#new(TopologyConfig)
(though it will be respected if passed in via the original path as well).

I was honestly a bit shocked this wasn't the case with the original KIP
that introduced default.dsl.store!

On Fri, Jul 28, 2023 at 4:55 PM Almog Gavra <almog.ga...@gmail.com>
wrote:

OK! I think I got everything, but I'll give the KIP another read with
fresh eyes later. Just a reminder that the voting is open, so go out and
exercise your civic duty! ;)

- Almog

On Fri, Jul 28, 2023 at 10:38 AM Almog Gavra <almog.ga...@gmail.com>
wrote:

Thanks Guozhang & Sophie:

A2. Will clarify in the KIP
A3. Will change back to the deprecated version!
A5. Seems like I'm outnumbered... DslStoreSuppliers it is.

Will update the KIP today.

- Almog

On Thu, Jul 27, 2023 at 12:42 PM Guozhang Wang <
guozhang.wang...@gmail.com> wrote:

Yes, that sounds right to me. Thanks Sophie.

On Thu, Jul 27, 2023 at 12:35 PM Sophie Blee-Goldman
<ableegold...@gmail.com> wrote:

A2: Guozhang, just to close the book on the ListValue store thing, I
fully
agree it seems like overreach
to expose/force this on users, especially if it's fully internal
today. But
just to make sure we're on the same
page here, you're still ok with this KIP fixing the API gap that
exists
today, in which these stores cannot be
customized by the user at all?

In other words, after this KIP, the new behavior for the ListValue
store in
a stream join will be:

S1: First, check if the user passed in a `DSLStoreSuppliers` (or
whatever
the name will be) to the
         StreamJoined config object, and use that to obtain the
KVStoreSupplier for this ListValue store

S2: If none was provided, check if the user has set a default
DSLStoreSuppliers via the new config,
         and use that to get the KVStoreSupplier if so

S3: If neither is set, fall back to the original logic as it is
today,
which is to pass in a KVStoreSupplier
         that is hard-coded to be either RocksDB or InMemory, based on
what
is returned for the #persistent
         API by the StreamJoined's WindowStoreSupplier

Does that sound right? We can clarify this further in the KIP if need
be

On Thu, Jul 27, 2023 at 10:48 AM Guozhang Wang <
guozhang.wang...@gmail.com>
wrote:

Hi all,

Like Almog's secretary as well! Just following up on that index:

A2: I'm also happy without introducing versioned KV in this KIP as I
would envision it to be introduced as new params into the
KeyValuePluginParams in the future. And just to clarify on Sophie's
previous comment, I think ListStore should not be exposed in this
API
until we see it as a common usage and hence would want to (again, we
need to think very carefully since it would potentially ask all
implementers to adopt) move it from the internal category to the
public interface category. As for now, I think only having kv /
window
/ session as public store types is fine.

A3: Seems I was not making myself very clear at the beginning :P The
major thing that I'd actually like to avoid having two configs
co-exist for the same function since it will be a confusing learning
curve for users, and hence what I was proposing is to just have the
newly introduced interface but not introducing a new config, and I
realized now that it is actually more aligned with the CUSTOM idea
where the ordering would be looking at config first, and then the
interface. I blushed as I read Almog likes it.. After thinking about
it twice, I'm now a bit leaning towards just deprecating the old
config with the new API+config as well.

A5: Among the names we have been discussed so far:

DslStorePlugin
StoreTypeSpec
StoreImplSpec
DslStoreSuppliers

I am in favor of DslStoreSuppliers as well as a restrictiveness on
its
scope, just to echo Bruno's comments above.



Guozhang

On Thu, Jul 27, 2023 at 4:15 AM Bruno Cadonna <cado...@apache.org>
wrote:

Hi,

A5. I have to admit that
"If we envision extending this beyond just StoreSupplier types, it
could
be a good option."
is scaring me a bit.
I am wondering what would be an example for such an extension?
In general, I would propose to limit the scope of a config. In
this case
the config should provide suppliers for state stores for the DSL.

BTW, maybe it is a good idea to let DslStorePlugin extend
Configurable.

Best,
Bruno

On 7/27/23 2:15 AM, Sophie Blee-Goldman wrote:
Thanks for the feedback Bruno -- sounds like we're getting close
to a
final
consensus here.
It sounds like the two main (only?) semi-unresolved questions
that
still
have differing
opinions floating around are whether to deprecate the old
config, and
what
to name the new config
+ interface.

Although I won't personally push back on any of the options
listed
above,
here's my final two cents:

A3. I'm still a firm believer in deprecating the old config, and
agree
wholeheartedly with what Bruno said.

A5. I also wasn't crazy about "Plugin" at first, but I will
admit it's
grown on me. I think it rubbed me the wrong
way at  first because it's just not part of the standard
vocabulary in
Streams so far. If we envision extending
this beyond just StoreSupplier types, it could be a good option.
DSLStoreSuppliers does make a lot of sense,
though.

To throw out a few more ideas in case any of them stick, what
about
something like DSLStoreFormat or
DSLStorageType, or even DSLStorageEngine? Or even
DSLStoreFactory --
the
Stores class is described as
a "factory" (though not named so) and, to me, is actually quite
comparable
-- both are providers not of the
stores themselves, but of the basic building blocks of Stores (eg
StoreSuppliers)

Ultimately fine with anything though. We should try not to drag
out
the KIP
discussion too long once it's down
to just nits :P

Cheers,
Sophie



On Wed, Jul 26, 2023 at 8:04 AM Almog Gavra <
almog.ga...@gmail.com>
wrote:

Thanks for the comments Bruno!

A3. Oops... I think I didn't do a great job updating the KIP to
reflect
Guozhang's suggestion. This seems like the last point of
contention,
where
we have two options:

1. Deprecate the config entirely and replace IN_MEMORY/ROCKSDB
with
implementations of the DslStorePlugin
2. (What's currently in the KIP) Introduce a new config which
defaults to
DefaultDslStorePlugin and only the DefaultDslStorePlugin will
respect
the
old default.store.type config

I'm happy with either, I'll keep the KIP with (2) for now as
that
seemed
like the result of the previous discussion but I have no problem
changing
it back to (1) which was the original proposal.

A5. I like "DslStorePlugin" because it leaves room for
configuring
implementations beyond just supplying stores (e.g. we could
introduce
a
`configure()` method etc...). I'll keep it as is for now (and
change
Materialized/Stores API sections - thanks for catching that)! I
don't
feel
too strongly and wouldn't dig my heels in if most people
preferred
"DslStoreSuppliers" (I don't love DslStores as it resembles the
Stores
class to closely in name and they're a little different).

A6. Yup, that's the suggestion.

- Almog

On Wed, Jul 26, 2023 at 6:38 AM Bruno Cadonna <
cado...@apache.org>
wrote:

Hi,

Sorry for being late to the party!

A1: I agree with Sophie, Guozhang, and Almog not to block the
KIP on
gaps in the implementation.

A2: I am happy with not considering anything special w.r.t.
versioned
state stores in this KIP.

A3: Here I agree with Sophie to deprecate the old config. I
would
also
not use config value CUSTOM. Having two configs that sometimes
depend on
each other to configure one single concept seems confusing to
me. I
see
future me looking at default.dsl.store = IN_MEMORY and
wondering why
something is written to disk because I did not check config
dsl.store.plugin.class?
BTW, the KIP in its current version is not clear about whether
default.dsl.store will be deprecated or not. In "Compatibility,
Deprecation, and Migration Plan" it says default.dsl.store
will be
deprecated but in "Configuration" default.dsl.store seems to
be an
essential part of the configuration.

A4: I agree

A5: I do not completely like the name "DslStorePlugin". What
about
naming it simply "DslStores" or "DslStoreSuppliers"? If we
decide to
rename we should also rename dsl.store.plugin.class to
dsl.store.suppliers.class or similar.
BTW, I think you missed to rename some occurrences in section
"Materialized API" especially in the code section
"Stores.java".

A6: Actually I am not sure if I completely follow here. Is
this about
the static methods in class Stores? If yes, I agree with Almog
to
keep
this out of the KIP.

Best,
Bruno

On 7/26/23 5:20 AM, Almog Gavra wrote:
I have updated the KIP with the points as discussed above.
@Guozhang,
the
suggested configuration makes it a little more awkward around
the
Materialized.as and Materialized.withStoreType APIs than it
was
when we
were totally deprecating the old configuration. Let me know
what you
think.

I will open the voting tomorrow! Thanks again everyone for the
discussion.

Cheers,
Almog

On Tue, Jul 25, 2023 at 9:20 AM Almog Gavra <
almog.ga...@gmail.com>
wrote:

Glad you like my KIP-secretary skills ;)

A2. I'm definitely happy to take your suggestion here and
not do
anything
special w.r.t. Versioned stores, I think it makes sense
especially
if
we
consider them implementation details of a specific store
type.

At EOD I'll update the KIP with all of these changes and if
the
discussion is silent I'll open a vote tomorrow morning.

Cheers,
Almog

On Mon, Jul 24, 2023 at 2:02 PM Sophie Blee-Goldman <
ableegold...@gmail.com> wrote:

Awesome summary (seriously) -- would you kindly offer your
organizational
skills to every ongoing KIP from henceforth? We need you :P

A few answers/comments:

A2: I think there is a 3rd sub-option here, which is to
leave
versioned-ness out of this KIP entirely, return only the
non-versioned
stores for now, and then switch over to the versioned stores
(only)
when
the time comes to flip the switch on making them the default
across
the
DSL. This has the advantage of retaining the current
behavior/semantics
and
provides a clear way to transition smoothly in the future,
since
it
seems
we will want to cut to all versioned state stores rather
than
offer
users
a
choice between versioned or non-versioned stores going
forward
(similar to
how we only offer timestamped stores presently, and have
completely
replaced non-timestamped stores in the DSL.) . In both the
timestamped
and
versioned cases, the old stores are/will still be available
or
accessible
to users via the bare StoreSuppliers, should they somehow
desire
or
require
the old store type. Ultimately, I think either this or
option (1)
would be
preferable, though I think it should be up to Matthias or
anyone
else
involved in the versioned stores feature to decide which
approach
makes
sense in the context of that feature's future plans.

A3: sounds reasonable to me

A5: Also sounds fine to me, though I'll let others chime in
with/if
they
have an alternative suggestion/preference. I guess the other
contender
would be something like DSLStoreImpl or something like that?



On Mon, Jul 24, 2023 at 9:36 AM Almog Gavra <
almog.ga...@gmail.com>
wrote:

Lots of thoughts! Happy to see the thriving discussion on
this
post
-
lots
going on so I'm trying to enumerate them to keep things
organized
(prefix
"A" for "Almog" so we can use numbers in responses for
other
things
;P).

A1. Question around closing implementation gaps (e.g. no
rocks
based
suppression store)
A2. Specifically how to handle Versioned stores
A3. Configuration (new config/reuse old one + new one and
ordering
of
config resolution)
A4. Drawing a line between what is implementation detail
(not
exposed
in
API) and what is customizable (exposed in API)
A5. Naming of StoreTypeSpec
A6. Param classes in StoreBuilders

------------------------------

Here are summaries for where it seems each of these stands
(trying
not
to
add any additional opinion yet):

A1. Sophie/Guozhang/Me (if I count hah!) seem to agree
that it is
worth
pushing this KIP through independently of the
implementation
gaps as
it
doesn't seem to move the intermediate state further from
the end
state.
Matthias originally had some concerns.

A2. There's questions around whether versioned stores
should be
their
own
configurable option or whether they are an implementation
detail
that
the
StoreSpec should decide. It seems like the discussion is
converging
here,
this should be an implementation detail.

A3. Matthias/Guozhang prefer adding CUSTOM and then having
an
additional
config to determine the implementation. Sophie prefers
deprecating
the
old
config. Guozhang additional suggested flipping the
resolution
order
such
that the old config is only respected in a DefaultStoreSpec
implementation.

A4. This KIP (or rather, the discussion on the KIP) blurs
the
lines
between
top level store types (KV, windowed, session) and the
underlying
implementation of them (timestamped, versioned, kv-list).
It
seems
everyone
is in alignment to ensure that we keep these two things
separate
and
that
the line is clearly delineated in the text of the KIP.

A5. Guozhang and Sophie agree that the current name
StoreTypeSpec is
misleading, as it's really an implementation spec, not a
type
specification.

A6. Agreement that this is an improvement, Sophie believes
this
can
be
done
in a follow up but we should ensure our naming is good
here so
there's
no
conflicts down the line.

---------------------------------

Ok, phew! Hopefully that covers it all! Now for my
thoughts,
hopefully
wrapping up some of these discussions:

A1.  @Matthias - are you still hesitant here? What would
you
need to
be
convinced here?

A2. Since we are all in agreement that versioned stores
should
be an
implementation detail, we have two options:

(1) we can extend the KVParams to include a parameter that
indicates
whether or not the store should be versioned
(2) we can introduce a configuration for whether or not to
use a
versioned
store, and each implementation can choose to read/ignore
that
config

Any preferences? (1) would align more closely with what we
are
doing
today
(they are a top-level concept in the Stores API).

A3. I like Guozhang's suggestion of making the "primary"
configuration
to
be the new one, and then having a "DefaultStoreTypeSpec"
(using
the
old
naming) which respects the old configuration. That seems
to solve
nearly
all the concerns (e.g. it'd be easy to see where the enum
is used
because
it would only be used within that one class instead of
littered
throughout
the code base).

@Sophie, unless you have objections here I will update the
KIP
to do
that.

A4. I will make these changes to the KIP to make it clear.

A5. I will rename it `DslStorePlugin` - any objections to
this
name?

A6. Let's punt this ;) I agree with everyone that this
would be a
welcome
improvement and that this KIP is aligned with moving in
that
direction.
Given how much discussion there was on this KIP, which is
minor
relative to
making the changes to StoreBuilder API, I'd rather not tie
the
two
together.

Cheers & Thanks everyone for the thoughts!
- Almog

On Sun, Jul 23, 2023 at 5:15 PM Sophie Blee-Goldman <
ableegold...@gmail.com>
wrote:

Guozhang:

On your 2nd point:

"impl types" (in hindsight it may not be a good name) for
rocksdb
/
memory / custom, and we used "store types" for kv /
windowed /
sessioned
etc,
First off, thanks so much for this clarification -- using
"store
type"
here
was definitely making me uncomfortable as this usually
refers
to KV
vs
window, etc -- but I just couldn't for the life of me
think of
the
right
term for rocks vs IM. We should 100% change to something
like
StoreImplSpec
for this kind of interface.

As for list-value store (for stream-stream Join)
Again, glad you mentioned this -- I forgot how the extra
stream-stream
join
store is not a "regular" KV Store but rather this special
list-value
store.
If we proceed with something like the current approach,
perhaps
that
should
be a boolean (or enum) parameter in the KVConfig, similar
to the
EmitStrategy? After all, the high-level goal of this KIP
is to
be
able to
fully customize all DSL state stores, and this is
currently not
possible
due to KAFKA-14976 <
https://issues.apache.org/jira/browse/KAFKA-14976
.

If we expect there to be further customizations like this
going
forward,
perhaps we could instead have each of the three
StoreConfig
classes
accept
a single enum parameter for the "sub-type" (or whatever
you
want to
call
it), which would encompass (and replace) things like the
EmitStrategy
as
well as the list-value type (we could define one enum for
each
Config
class
so there is no accidentally requesting a LIST_VALUE
subtype on a
WindowStore). Thoughts?

Lastly, regarding 3.b:

I love that you brought this up because that is actually
what I
first
proposed to Almog, ie introducing a param class to clean
up the
StoreBuilder API, during our chat that led to this KIP. He
pushed
back,
claiming (rightly so) that this change would be huge in
scope
for a
purely
aesthetic/API change that doesn't add any functionality,
and
that
it
makes
more sense to start with the DSL config since there is a
clear
gap
in
functionality there, particularly when it comes to custom
state
stores
(reasons 1 & 3 in the Motivation section). He did agree
that the
param
classes were a better API, which is why you see them in
this
KIP.
In
other
words: I fully agree that we should apply this
improvement to
the
PAPI/StoreBuilder interface as well, but I think that's a
distinct
concept
for the time-being and should not block the DSL
improvement.
Rather,
I
consider this KIP as setting the stage for a followup KIP
down
the
line
to
clean up the StoreBuilders and bring them in line with
the new
param/config
class approach.

That said,  A) we should definitely make sure whatever we
introduce
here
can be extended to the PAPI StoreBuilders in a natural
way, and
B)
I
should
clarify that I personally would be happy to see this
included in
the
current KIP, but as Almog's KIP it's up to him to decide
whether
he's
comfortable expanding the scope like this. If you can
convince
him
where
I
could not, more power to you! :P

On Sun, Jul 23, 2023 at 4:48 PM Sophie Blee-Goldman <
ableegold...@gmail.com>
wrote:

Matthias:

I'm not sure I agree with (or maybe don't follow) this
take:

we need all kind of `StoreTypeSpec` implementations,
and it might also imply that we need follow up KIPs for
new
feature
(like in-memory versioned store) that might not need a
KIP
otherwise.

I see this feature as being a nice add-on/convenience
API for
any
store
types which have a full DSL implementation. I don't
think it's
unreasonable
to just say that this feature is only going to be
available for
store
types
that have KV, Window, and Session implementations. I
can't
think
of
any
case besides versioned stores where this would force a
KIP for
a
new
feature that would not otherwise have to go through a
KIP, and
even
for
versioned state stores, the only issue is that the KIP
for that
was
already
accepted.

However, I think I agree on your main point -- that
things like
"regular"
vs timestamped vs versioned are/should be an
implementation
detail
that's
hidden from the user. As I noted previously, the current
KIP
actually
greatly improves the situation for timestamped stores,
as this
would be
handled completely transparently by the OOTB
RocksDBStoreSpec.
To
me,
this
provides a very natural way to let the DSL operators
using the
default
store type/spec to specify which kind of store (eg
versioned/timestamped/etc) it wants, and choose the
correct
default. If
the
eventual intention is to have versioned state stores
replace
timestamped
stores as the default in the DSL, then we can simply
swap out
the
versioned
stores for the timestamped stores in the
RocksDBStoreTypeSpec,
when
that
time comes. Until then, users who want to use the
versioned
store
will
have
to do what they do today, which is individually override
operators
via
Materialized/StoreSuppliers.

All in all, it sounds like we should not offer a
versioned
store
type
spec, as "versioned" is more akin to "timestamped" than
to a
true
difference in underlying store implementation type (eg
rocks vs
in-memory).
W.r.t whether to deprecate the old config or introduce a
new
CUSTOM
enum
type, either seems fine to me, and we can go with that
alternative
instead.
The only other con to this approach that I can think of,
and
I'm
honestly
not sure if this is something users would care about or
only
devs,
is
that
the advantage to moving rocks and IM to the store type
spec
interface
is
that it helps to keep the relevant logic encapsulated in
one
easy
place
you
can quickly check to tell what kind of state store is
used
where.
In
the
current code, I found it extremely annoying and
difficult to
track
down
all
usages of the StoreType enum to see which actual rocksdb
store
was
being
used where (for example some stores using the
TimeOrderedBuffer
variants
in
some special cases, or to understand whether the DSL was
defaulting
to
plain, timestamped, or versioned stores for RocksDB vs
InMemory --
both
of
which seem like they could be of interest to a user).
This
would
be
much
easier if everything was handled in one place, and you
can
just go
to
the
(eg) RocksDBStoreTypeSpec and see what it's doing, or
find
usages
of
the
methods to understand what stores are being handed to
which DSL
operators.

I suppose we could still clean up the API and solve this
problem
by
having
the old (and new) config delegate to a StoreTypeSpec no
matter
what,
but
make RocksDBStoreTypeSpec and InMemoryStoreTypeSpec
internal
classes
that
are simply implementation details of the ROCKSDB vs
IN_MEMORY
enums.
WDYT?


On Sun, Jul 23, 2023 at 11:14 AM Guozhang Wang <
guozhang.wang...@gmail.com>
wrote:

Thanks everyone for the great discussions so far! I
first saw
the
JIRA
and left some quick thoughts without being aware of the
already-written KIP (kudos to Almog, very great one)
and the
DISCUSS
thread here. And I happily find some of my initial
thoughts
align
with
the KIP already :)

Would like to add a bit more of my 2c after reading
through
the
KIP
and the thread here:

1. On the high level, I'm in favor of pushing this KIP
through
without
waiting on the other gaps to be closed. In my back
pocket's
"dependency graph" of Kafka Streams roadmap of large
changes
or
feature gaps, the edges of dependencies are defined
based on
my
understanding of whether doing one first would largely
complicate /
negate the effort of the other but not vice versa, in
which
case
we
should consider getting the other done first. In this
case, I
feel
such a dependency is not strong enough, so encouraging
the KIP
contributor to finish what he/she would love to do to
close
some
gaps
early would be higher priorities. I did not see by just
doing
this
we
could end up in a worse intermediate stage yet, but I
could be
corrected.

2. Regarding the store types --- gain here I'd like to
just
clarify
the terms a bit since in the past it has some
confusions: we
used
"impl types" (in hindsight it may not be a good name)
for
rocksdb /
memory / custom, and we used "store types" for kv /
windowed /
sessioned etc, as I said in the JIRA I think the current
proposal
also
have a good side effect as quality bar to really
enforce us
think
twice when trying to add more store types in the future
as it
will
impact API instantiations. In the ideal world, I would
consider:

* We have (timestamped) kv store, versioned kv store,
window
store,
session store as first-class DSL store types. Some DSL
operators
could
accept multiple store types (e.g. versioned and non
versioned
kv-store) for semantics / efficiency trade-offs. But I
think
we
would
remove un-timestamped kv stores eventually since that
efficiency
trade-off is so minimal compared to its usage
limitations.
* As for list-value store (for stream-stream Join),
memory-lru-cache
(for PAPI use only), memory-time-ordered-buffer (for
suppression),
they would not be exposed as DSL first-class store
types in
the
future. Instead, they would be treated as internal used
stores
(e.g.
list-value store is built on key-value store with
specialized
serde
and putInternal), or continue to be just convenient
OOTB PAPI
used
stores only.
* As we move on, we will continue to be very, very
strict on
what
would be added as DSL store types (and hence requires
changes
to
the
proposed APIs), what to be added as convenient OOTB
PAPI store
impls
only, what to be added as internal used store types that
should
not be
exposed to users nor customizable at all.

3. Some more detailed thoughts below:

3.a) I originally also think that we can extend the
existing
config,
rather than replacing it. The difference was that I was
thinking
that
order-wise, the runtime would look at the API first,
and then
the
config, whereas in your rejected alternative it was
looking at
the
config first, and then the API --- that I think is a
minor
thing
and
either is fine. I'm in agreement that having two configs
would be
more
confusing to users to learn about their precedence
rather than
helpful, but if we keep both a config and a public API,
then
the
precedence ordering would not be so confusing as long
as we
state
them
clearly. For example:

* We have DefaultStoreTypeSpec OOTB, in that impl we
look at
the
config only, and would only expect either ROCKS or
MEMORY, and
return
corresponding OOTB store impls; if any other values
configured,
we
error out.
* Users extend that by having MyStoreTypeSpec, in which
they
could
do
arbituray things without respecting the config at all,
but our
recommended pattern in docs would still say "look into
the
config,
if
it is ROCKS or MEMORY just return fall back to
DefaultStoreTypeSepc;
otherwise if it's some String you recognize, then
return your
customized store based on the string value, otherwise
error
out".

3.b) About the struct-like Params classes, I like the
idea a
lot
and
wished we would pursue this in the first place, but if
we
only do
this
in Spec it would leave some inconsistencies with the
StoreBuilders
though arguably the latter is only for PAPI. I'm
wondering if
we
should consider including the changes in StoreBuilders
(e.g.
WindowStoreBuilder(WindowSupplierParams)), and if yes,
maybe
we
should
also consider renaming that e.g. `WindowSupplierParams`
to
`WindowStoreSpecParams` too? For this one I only have a
"weak
feeling"
so I can be convinced otherwise :P

Thanks,
Guozhang



On Sun, Jul 23, 2023 at 9:52 AM Matthias J. Sax <
mj...@apache.org>
wrote:

Thanks for all the input. My intention was not to
block the
KIP,
but
just to take a step back and try get a holistic
picture and
discussion,
to explore if there are good/viable alternative
designs. As
said
originally, I really like to close this gap, and was
always
aware
that
the current config is not flexible enough.


I guess, my "concern" is that the KIP does increase
the API
surface
area
significantly, as we need all kind of `StoreTypeSpec`
implementations,
and it might also imply that we need follow up KIPs
for new
feature
(like in-memory versioned store) that might not need a
KIP
otherwise.

The second question is if it might make the already
"patchy"
situation
with regard to customization worse.

We did de-scope the original KIP-591 for this reason,
and
given
the
new
situation of the DSL, it seems that it actually got
worse
compared
to
back in the days.

Lastly, I hope to make the new versioned stores the
default
in
the
DSL
and we did not do it in the previous KIP due to
backward
compatibility
issues. Thus, from a DSL point of view, I believe there
should
be
only
"RocksDB", "InMemory", and "Custom" in an ideal world.
Introducing
(I
am
exaggerating to highlight my point) "KvRocksDbSpec",
"TimestampeKvRocksDbSpec", "VersionedRocksDbSpec",
plus the
corresponding in-memory specs seems to head into the
opposite
direction.
-- My goal is to give users a handle of the _physical_
store
(RocksDB
vs
InMemory vs Custom) but not the _logical_ stores
(plain kv,
ts-kv,
versioned) which is "dictated" by the DSL itself and
should
not
be
customizable (we are just in a weird intermediate
situation
that
we
need
to clean up, but not "lean into" IMHO).

Thus, I am also not sure if adding
"VersionedRocksDbSpec"
would
be
ideal
(also, given that it only changes a single store, but
not the
two
windowed stores)?

Furthermore, I actually hope that we could use the new
versioned
store
to replace the window- and sessions- stores, and thus
to
decrease
the
number of required store types.


Admittedly, I am talking a lot about a potential
future, but
the
goal
is
only to explore opportunities to not get into "worse"
intermediate
state, that will require a huge deprecation surface
area
later
on.
Of
course, if there is no better way, and my concerns are
not
shared, I
am
ok to move forward with the KIP.


Bottom line: I would personally prefer to keep the
current
config
and
add a `Custom` option to it, plus adding one new
config that
allows
people to set their custom `StoreTypeSpec` class. -- I
would
not
add a
built-in spec for versioned stores at this point (or
any
other
built-in
`StorytypeSpec` implementations). I guess, users could
write
a
custom
spec if they want to enable versioned store across the
board
for
now
(until we make them the default anyway)?


Hope my train of thoughts is halfway reasonable and not
totally
off
track?


-Matthias

On 7/21/23 15:27, Sophie Blee-Goldman wrote:
I agree with everything Almog said above, and will
just add
on
to
two
points:

1. Regarding whether to block this KIP on the
completion of
any or
all
future implementations of in-memory version stores (or
persist
suppression
buffers), I feel that would be unfair to this feature
which
is
completely
unrelated to the semantic improvements offered by
versioned
state
stores.
It seems like the responsibility of those driving the
versioned
state
stores feature, not Almog/this KIP, to make sure that
those
bases
are
covered. Further, if anything, this KIP will help
with the
massive
proliferation of StoreSuppliers on the Stores factory
class,
and
provide
users with a much easier way to leverage the versioned
stores
without
having to muck around directly with the
StoreSuppliers.

I also thought about it a bit, and really like Almog's
suggestion
to
introduce an additional StoreSpec for the Versioned
state
stores.
Obviously
we can add the RocksDB one to this KIP now, and then
as he
mentioned,
there's an easy way to get users onto the
IMVersionedStateStore
types
once
they are completed.

Lastly, on this note, I want to point out how
smoothly this
solved
the
issue of timestamped stores, which are intended to be
the
DSL
default
but
are only a special case for RocksDB. Right now it can
be
confusing
for a
user scrolling through the endless Stores class and
seeing a
timestamped
version of the persistent but not in-memory stores.
One
could
easily
assume
there was no timestamped option for IM stores and
that this
feature
was
incomplete, if they weren't acutely aware of the
internal
implementation
details (namely that it's only required for RocksDB
for
compatibility
reasons). However, with this KIP, all that is handled
completely
transparently to the user, and we the devs, who *are*
aware
of
the
internal
implementation details, are now appropriately the ones
responsible
for
handing the correct store type to a particular
operator.
While
versioned
state stores may not be completely comparable,
depending on
whether
we want
users to remain able to easily choose between using
them or
not
(vs
timestamped which should be used by all), I still
feel this
KIP
is a
great
step in the right direction that not only should not
be
blocked on
the
completion of the IM implementations, but in fact
should
specifically
be
done first as it enables an easier way to utilize
those IM
versioned
stores. Just my 2 cents :)

2. The idea to expand the existing the config with a
CUSTOM
enum
without
introducing another config to specify the CUSTOM
store spec
does
not
seem
appropriate, or  even possible (for the reasons Almog
mentioned
above
about
config types, though perhaps there is a way I'm not
seeing). I
do
not
buy
the argument that we should optimize the API to make
it easy
for
users who
just want to switch to all in-memory stores, as I
truly
believe
this
is a
very small fraction of the potential userbase of this
feature
(anyone
who's
actually using this should please chime in!). It
seems very
likely
that the
majority of users of this feature are actually those
with
custom
state
stores, as to my knowledge, this has been the case
any/every
time
this
feature was requested by users.

That said, while I don't see any way to get around
introducing
a
new
config, I don't personally have a preference w.r.t
whether
to
keep
around
the old config or deprecate it. I simply don't get the
impression
it
is
very heavily used as it stands today, while it only
works
for
those
who
want all in-memory stores. Again, input from actual
users
would be
very
valuable here. In the absence of that data, I will
just
point
to
the
fact
that this KIP was proposed by a Streams dev (you :P),
abandoned,
picked up
by another Streams dev, and finally implemented
without ever
hearing
from a
user that they would find this useful. This is not to
disparage
the
original KIP, but just to say again, as I stated back
then,
it
seemed
like
a major opportunity loss to leave out custom state
stores

Cheers,
Sophie

On Fri, Jul 21, 2023 at 1:52 PM Almog Gavra <
almog.ga...@gmail.com>
wrote:

Thanks for all the feedback folk! Responses inline.

Basically, I'm suggesting two things: first, call
out in
some
way
(perhaps the StoreTypeSpec javadocs) that each
StoreTypeSpec
is
considered
a public contract in itself and should outline any
semantic
guarantees it
does, or does not, make. Second, we should add a
note on
ordering
guarantees in the two OOTB specs: for RocksDB we
assert
that
range
queries
will honor serialized byte ordering, whereas the
InMemory
flavor
gives no
ordering guarantee whatsoever at this time.

That makes sense to me Sophie! I'll make the changes
to the
KIP.
And
@Colt,
yes I believe that would be the new javadoc for the
generic
ReadOnlyKeyValueStore.

However, I am wondering if we should close others
gaps
first?

@Matthias, thanks for the review and thoughts! I
think we
should
separate
closing other gaps in the product from providing
this as
useful
functionality to avoid feature creep so long as the
API
proposed
here will
be suitable for when we want to close those
implementation
gaps!
My
general
proposal is that for things that are not customizable
today by
default.dsl.store they remain not customizable after
this
KIP.
The
good
news is, however, that there's no reason why this
cannot be
extended
to
cover those in the future if we want to - see
specifics
below.

Comments on the specifics below

In particular, this holds for the new
versioned-store ...
Should
versioned stores also be covered by the KIP

Is there a reason why we can't introduce a
VersionedRocksDBStoreTypeSpec
and if we ever support an in-memory an equivalent
VersionedInMemoryRocksDBStoreTypeSpec? If so, then
there
would
not
need to
be any additional changes to the API proposed in
this KIP.

For `suppress()` it's actually other way around we
only
have
an
in-memory
implementation. Do you aim to allow custom stores for
`suppress()`,
too?

We have three options here:
1) we can decide to maintain existing behavior and
use the
in-memory
implementation for all stores (not even going
through the
API
at
all)
2a) we can introduce a new parameter to the
KeyValueParams
class
(boolean
isTimeOrderedBuffer or something like that) and
return an
in-memory
store
in the implementation of RocksDBStoreTypeSpec (this
maintains
the
existing
behavior, and would allow us in the future to make
the
change
to
return a
RocksDB store if we ever provide one)
2b) same as 2a but we throw an exception if the
requested
store
type
does
not support that (this is backwards incompatible,
and since
ROCKS_DB
is the
default we probably shouldn't do this)

My proposal for now is 1) because as of KIP-825
EmitStrategy#ON_WINDOW_CLOSE is the preferred way of
suppressing
and
that
is accounted for in this API already.

Last, I am not sure if the new parameter replacing
the
existing
one
is
the
best way to go?

I'm happy either way, just let me know which you
prefer -
the
discussion
around CUSTOM is in the rejected alternatives but I'm
happy to
differ to
whatever the project conventions are :)

If it's matches existing `ROCKS_DB` or `IN_MEMORY`
we just
process
it as
we
do know, and if know we assume it's a fully
qualified class
name
and
try to
instantiate it?

Note that there is no functionality for this kind of
thing
in
AbstractConfig (it's either a String validated enum
or a
class)
so
this
would be a departure from convention. Again, I'm
happy to
implement
that if
it's preferred.

Also wondering how it would related to the existing
`Stores`
factory?

StoreTypeSpec will depend on Stores factory -
they're one
layer
removed.
You can imagine that StoreTypeSpec is just a
grouping of
methods
from the
Stores factory into a convenient package for default
configuration
purposes.

Thanks again for all the detailed thoughts Matthias!

On Fri, Jul 21, 2023 at 11:50 AM Matthias J. Sax <
mj...@apache.org

wrote:

Thanks for the KIP. Overall I like the idea to
close this
gap.

However, I am wondering if we should close others
gaps
first? In
particular, IIRC, we have a few cases for which we
only
have
a
RocksDB
implementation for a store, and thus, adding an
in-memory
version
for
these stores first, to make the current `IN_MEMORY`
parameter
work,
might be the first step?

In particular, this holds for the new
versioned-store
(but I
actually
believe the is some other internal store with no
in-memory
implementation). -- For `suppress()` it's actually
other
way
around
we
we only have an in-memory implementation. Do you
aim to
allow
custom
stores for `suppress()`, too?

Btw: Should versioned stores also be covered by the
KIP
(ie,
`StoreTypeSpec`)? We did consider to add a new
option
`VERSIONED`
to the
existing `default.dsl.store` config, but opted out
for
various
reasons.

Last, I am not sure if the new parameter replacing
the
existing
one
is
the best way to go? Did you put the idea to add
`CUSTOM`
to
the
existing
config into rejected alternative. Personally, I
would
prefer
to
add
`CUSTOM` as I would like to optimize to easy of use
for
the
majority of
users (which don't implement a custom store), but
only
switch to
in-memory sometimes. -- As an alternative, you
would also
just
extend
`dsl.default.store` (it's just a String) and allow
to
pass in
anything.
If it's matches existing `ROCKS_DB` or `IN_MEMORY`
we just
process
it as
we do know, and if know we assume it's a fully
qualified
class
name
and
try to instantiate it? -- Given that we plan to
keep the
store-enum, is
seems cleaner to keep the existing config and keep
both
the
config
and
enum aligned to each other?


It's just preliminary thought. I will need to go
back an
take a
more
detailed look into the code to grok how the propose
`StoreTypeSpec`
falls into place. Also wondering how it would
related to
the
existing
`Stores` factory?

-Matthias


On 7/21/23 6:45 AM, Colt McNealy wrote:
Sophie—

Thanks for chiming in here. +1 to the idea of
specifying
the
ordering
guarantees that we make in the StorageTypeSpec
javadocs.

Quick question then. Is the javadoc that says:

Order is not guaranteed as bytes lexicographical
ordering
might
not
represent key order.

no longer correct, and should say:

Order guarantees depend on the underlying
implementation of
the
ReadOnlyKeyValueStore. For more information, please
consult
the
[StorageTypeSpec javadocs](....)

Thanks,
Colt McNealy

*Founder, LittleHorse.dev*


On Thu, Jul 20, 2023 at 9:28 PM Sophie
Blee-Goldman <
ableegold...@gmail.com>
wrote:

Hey Almog, first off, thanks for the KIP! I (and
others)
raised
concerns
over how restrictive the default.dsl.store config
would
be
if
not
extendable to custom store types, especially
given that
this
seems to
be
the primary userbase of such a feature. At the
time we
didn't
really
have
any better ideas for a clean way to achieve that,
but
what
you
proposed
makes a lot of sense to me. Happy to see a good
solution to
this,
and
hopefully others will share my satisfaction :P

I did have one quick piece of feedback which
arose from
an
unrelated
question posed to the dev mailing list w/ subject
line
"ReadOnlyKeyValueStore#range()
Semantics"
<

https://lists.apache.org/thread/jbckmth8d3mcgg0rd670cpvsgwzqlwrm>.
I
recommend checking out the full thread for
context, but
it
made
me
think
about how we can leverage the new StoreTypeSpec
concept
as
an
answer
to
the
long-standing question in Streams: where can we
put
guarantees
of
the
public contract for RocksDB (or other store
implementations)
when
all
the
RocksDB stuff is technically internal.

Basically, I'm suggesting two things: first, call
out in
some
way
(perhaps
the StoreTypeSpec javadocs) that each
StoreTypeSpec is
considered
a
public
contract in itself and should outline any semantic
guarantees
it
does,
or
does not, make. Second, we should add a note on
ordering
guarantees in
the
two OOTB specs: for RocksDB we assert that range
queries
will
honor
serialized byte ordering, whereas the InMemory
flavor
gives no
ordering
guarantee whatsoever at this time.

Thoughts?

-Sophie

On Thu, Jul 20, 2023 at 4:28 PM Almog Gavra <
almog.ga...@gmail.com>
wrote:

Hi All,

I would like to propose a KIP to expand support
for
default
store
types
(KIP-591) to encompass custom store
implementations:













https://cwiki.apache.org/confluence/display/KAFKA/KIP-954%3A+expand+default+DSL+store+configuration+to+custom+types

Looking forward to your feedback!

Cheers,
Almog






















Reply via email to