Re: [VOTE] KIP-806: Add session and window query over KV-store in IQv2

Bruno Cadonna Thu, 16 Dec 2021 05:09:54 -0800

Hi Partick,

Thank you for the KIP!


+1 (binding)

Best,
Bruno

On 16.12.21 07:22, Luke Chen wrote:

Hi Patrick, John,

Thanks for your explanation and update.
It looks better now.

+1 (non-binding) from me.

just one minor comment:
10. In the example section, we are still using the variable names `lower`
and `upper` as before. We should update them, too. (follow the decision for
point 7 above)

Thank you.
Luke

On Thu, Dec 16, 2021 at 3:34 AM Guozhang Wang <wangg...@gmail.com> wrote:

Thanks Patrick, John.

I read through the updated KIP and it's much cleared on the scope now,
appreciate that! I'm +1 overall, modulo a few minor comments:

7. The parameter names of `WindowKeyQuery#withKeyAndWindowStartRange`, i.e.
`startTime` and `endTime` seem not updated; also for the parameter names of
`WindowRangeQuery#withWindowStartRange`, the `earliest / latest` seems
inconsistent with the existing API parameter names, how about naming both
of them `fromWindowStartTime` and `toWindowStartTime`?

8.  `getStartTime / getEndTime` are not updated as well: should they be
`getEarliestWindowStartTime` / `getLatestWindowStartTime` or `getFrom...
getTo...` if you agree with 7) above?

9. "One reason we cannot use WindowKeyQuery to support
SessionStore.fetch(key)": not sure I understand this part, for the new APIs
we do not need to be following the old API's return types since they are
separated, right? So if we think it's actually better for the new API
mimicing `sessionStore.fetch(key)` to return a `WindowStoreIterator<V>`,
why can't we just do that?


Guozhang

On Wed, Dec 15, 2021 at 8:49 AM John Roesler <vvcep...@apache.org> wrote:

FYI, I filed this follow-on task to explore a more general
pattern for these queries:
https://issues.apache.org/jira/browse/KAFKA-13548

We can unblock the current scope for these queries but still
plan to revisit the API before the first release of IQv2.

Thanks!
-John

On Wed, 2021-12-15 at 10:34 -0600, John Roesler wrote:

Thanks for the update, Patrick!

Tl;dr: I'm +1 (binding)

I just reviewed the KIP again (I hope you don't mind, I
fixed a couple of missed renames in the text and examples).

One of the design of IQv2 is to make proposing and evolving
these queries much less onerous. Unlike adding new methods
to all StateStore interfaces and implementations, if we want
to make (eg) the WindowRangeQuery more flexible in the
future, we can easily do so by just adding some builder
methods to set the bounds independently, or by adding new
static methods to provide different parameterizations.

Therefore, even though this is not the ultimate expression
of what we think the range queries should be, it's a
perfectly fine starting point. The most pressing concerns to
me were the cases where Luke and Guozhang pointed out that
some parts of the proposal or interfaces were ambiguous,
which looks like it's fixed now.

My preference would be to go ahead and accept this proposed
scope so that we have at least some basic key-value, window,
and session store queries in the code base. Until we have
those MVP queries in place, we can't really start to address
higher-level follow-up items like:
https://issues.apache.org/jira/browse/KAFKA-13541 (to refine
how the serdes interplay with the queries) or
https://issues.apache.org/jira/browse/KAFKA-13541 (to
consider alternative ways to pin down the generic type
parameters better).

Anyway, for the current version of the KIP, my vote is +1
(binding).

Thanks again!
-John

On Wed, 2021-12-15 at 12:53 +0100, Patrick Stuedi wrote:

Thanks everyone for the sharing comments and suggestions, this is
very helpful!

I have updated the KIP based on the suggestions, please let me know

what

you think. Here are the main changes:
- Removed constructor in WindowRangeQuery (Guozhang)
- Clearly specified how each of the instantiation of the different

query

types maps to specific operations in WindowStore and SessionStore

(Guozhang)

- Renamed the window start end time parameters and getter methods

(Luke)

- Added some comments on the use of WindowRangeQuery.fromKey

(Guozhang,

John)

The KIP currently takes a minimalist approach to move things forward.

There

are two follow ups I can think of from this KIP:
* Add additional parameter combinations to WindowRangeQuery, e.g.,

support

for key range with and without window ranges.
* Remove WindowRangeQuery.fromKey and either use
WindowKeyQuery.withKeyAndWindowBounds or add a new call like
WindowKeyQuery.withKeyAndNoWindowBounds(): either way that would then

raise

the question which operation in WindowStore this query should map to,

given

that WindowKeyQuery is templated against WindowStoreIterator and the
current use of WindowRangeQuery.fromKey is to call SessionStore.fetch

which

returns a KeyValueIterator.
* Combine WindowKeyQuery and WindowRangeQuery, e.g., by exposing the
template type
* Create a "Builder" interface for the query types to avoid blowing

up

the

static methods.

Since I think any of those tasks will require some more discussion,

and in

the interest of being pragmatic here, I suggest we defer those
optimizations to future KIPs.

Best,
   Patrick

On Tue, Dec 14, 2021 at 1:02 AM Guozhang Wang <wangg...@gmail.com>

wrote:

Hi John,

Please see my follow-up comments inlined below.

On Mon, Dec 13, 2021 at 2:26 PM John Roesler <vvcep...@apache.org>

wrote:

Hi Patrick, thanks for the KIP!

I hope you, Guozhang, and Luke don't mind if I share some
thoughts:

1. I think you just meant to remove that private constructor
from the KIP, right?


Yes.


2. I think WindowRangeQuery#withWindowRage(windowLower,
windowUpper) is the version that yields an iterator over
both keys and windows, so it's not totally redundant.

However, it does seem like WindowRangeQuery#withKey is
equivalent to WindowKeyQuery#withKey . I overlooked it
before, but we probably don't need both.

Stepping back, it seems like we could either just de-
duplicate that specific withKey query or we could even just
merge both use cases into the WindowRangeQuery. I've
personally never been convinced that the extra complexity of
having the WindowStoreIterator is worth the savings of not
duplicating the key on each record. Maybe it would be if we
separately deserialized the same key over and over again,
but that seems optimizable.

On the other hand, that's more work, and the focus for IQv2
right now is to just get some MVP of queries implemented to
cover basic use cases, so it might be worthwhile to keep it
simple (by just dropping `WindowRangeQuery#withKey`). I'm
happy with whatever call Patrick wants to make here, since
it's his work to do.


My original comments was actually referring to that

`WindowRangeQuery` does

not allow users to specify a range of [keyFrom: windowFrom, keyTo:
windowTo], but only [key: windowFrom, key: windowTo], or [-INF:

windowFrom,

INF: windowTo] if we do not specify the key but only do

`withWindowRage`. I

was not sure if this is intentional by design, i.e. that the only
difference between the two classes is that the latter allows [-INF:
windowFrom, INF: windowTo], and and I was assuming it's now, but

otherwise

these two are then very much alike.

So I guess my question is that, is it by-design to not allow users

do

something like `query.withWindowRange(..).withKeyRange(..)`?


3. I got the impression that WindowRangeQuery was intended
for use with both Window and Session stores, but that might
be my assumption.

Today the session store's query API is at least different on the

names,

e.g. "findSessions", so I'm not sure if the proposal intend to

replace

these functions with the WindowRangeQuery such that:

sessionStore.findSessions(K, Ins, Ins) ->

query.withKey().withWindowRange()

sessionStore.findSessions(K, K, Ins, Ins) ->
query.withKeyRange().withWindowRange()   // again, assuming we

would

want

`withKeyRange` as in 2) above.
sessionStore.fetch(K) -> query.withKey()
sessionStore.fetch(K, K) -> query.withKeyRange()

Also, sessionStore.fetchSession(K, Ins, Ins) seems would not be

supported

with the new API. I'm not enforcing that we have the complete

coverage in

this KIP, and I'm with you that we would focus on getting some MVP

out

sooner, but I'd like our KIP to clarify on what's really covered

comparing

against the old API and what's not.

4. That's good feedback. I think those names were inspired
by the WindowStore methods that just say stuff like
`timeFrom` and `timeTo`. The SessionStore does a better job
of being unambiguous, since it says stuff like
`earliestSessionEndTime` and `latestSessionStartTime`.

And, actually, that brings up a point that I think we
overlooked before. While the method signatures of the
WindowStore and SessionStore methods look the same, the
meanings of their arguments are not the same. In particular,
the WindowStore ranges are always in reference to the window
start time, while the SessionStore ranges may be in
reference to the window start time or end time (or different
combinations of both).

I suspect that your instinct to cover both stores with the
same Query is correct, and we just need to be specific about
the kinds of ranges we're talking about. For example,
instead of withWindowRange, maybe we would have:

withWindowStartRange(
   Instant earliestWindowStartTime,
   Instant latestWindowStartTime
);

Then in the future, we could add stuff like:

withWindowStartAndEndRange(
   Instant earliestWindowStartTime,
   Instant latestWindowEndTime
);

Etc.

5. I think Luke's question reveals how weirdly similar but
not the same these two queries are. It's not your fault,
this is inherited from the existing set of weirdly-similar-
but-not-the-same store methods. The thing that distinguishes
the WindowKeyQuery from the WindowRangeQuery is:
* WindowKeyQuery: all results share the same K key (cf point
2: I don't think we need `WindowRangeQuery#withKey(k)`).
Therefore, the iterator is just (windowStartTime, value)
pairs.
* WindowRangeQuery: the results may have different keys, so
the iterator is (Windowed<K>, value) pairs, where
Windowed<K> is basically a (windowStartTime, K key) pair.

Therefore, we could support a WindowRangeQuery with a fixed
key, but it would just be another way to express the same
thing as the WindowKeyQuery with the same parameters.

As I suggested, we do not necessarily need to cover all existing

cases

compared with the old API, but it's better to be very clear on

what's

actually covered v.s. what's not. For example in WindowRangeQuery,

I'd like

to clarify if all the following are intended in this KIP's scope:

* query.withKey().withWindowRange()
* query.withKey() // range all window
* query.withWindowRange() // range all keys.

As in point 2, I do think it might be worth converging all
use cases into the WindowRangeQuery, but we can also just
shoot for parity now and defer elegance for the future.

6. I'll just add one question of my own here: If we do keep
both query classes, then I think we would drop `withKey` and
`getKey` from the WindowRangeQuery. But if we do wind up
keeping the WindowRangeQuery class, I think we should
reconsider the name of the getter, since there are other
range queries that support a range of keys. Is there a
getter name we can use that would still work if we come back
later and add lower and upper key bounds?

Thanks again for the KIP!
John

On Mon, 2021-12-13 at 16:35 +0800, Luke Chen wrote:

Hi Patrick,

Thanks for the KIP!

I have some comments, in addition to Guozhang's comments:
4. The parameter names `windowLower` and `windowUpper` are kind

of

ambiguous to me. Could we come up a better name for it, like
`windowStartTime`, `windowEndTime`, or even we don't need the

"window"

name, just `startTime` and `endTime`?
5. Why can't we support window range query with a key within a

time

range?

You might need to explain in the KIP.

Thank you.
Luke


On Sat, Dec 11, 2021 at 7:54 AM Guozhang Wang <

wangg...@gmail.com>

wrote:

Hi Patrick,

I made a pass on the KIP and have a few comments below:

1. The `WindowRangeQuery` has a private constructor while the
`WindowKeyQuery` has not, is that intentional?

2. The `WindowRangeQuery` seems not allowing to range over

both

window

and

key, but only window with a fixed key, in that case it seems

pretty

much

the same as the other (ignoring the constructor), since we

know we

would

only have a single `key` value in the returned iterator, and

hence it

seems

returning in the form of `WindowStoreIterator<V>` is also

fine

as the

key

is fixed and hence no need to maintain it in the returned

iterator.

I'm

wondering should we actually support ranging over keys as

well

in

`WindowRangeQuery`?

3. The KIP title mentioned both session and window, but the

APIs only

involves window stores; However the return type

`WindowStoreIterator`

is

only for window stores not session stores, so I feel we would

still

have

some differences for session window query interface?


Guozhang

On Fri, Dec 10, 2021 at 1:32 PM Patrick Stuedi
<pstu...@confluent.io.invalid>
wrote:

Hi everyone,

I would like to start the vote for KIP-806 that adds window

and

session

query support to query KV-stores using IQv2.

The KIP can be found here:
https://cwiki.apache.org/confluence/x/LJaqCw

Skipping the discussion phase as this KIP is following the

same

pattern

as

the previously submitted KIP-805 (KIP:
https://cwiki.apache.org/confluence/x/85OqCw, Discussion:
https://tinyurl.com/msp5mcb2). Of course concerns/comments

can

still be

brought up in this thread.

-Patrick



--
-- Guozhang


--
-- Guozhang


--
-- Guozhang

Re: [VOTE] KIP-806: Add session and window query over KV-store in IQv2

Reply via email to