Re: [DISCUSS] KIP-968: Support single-key_multi-timestamp interactive queries (IQv2) for versioned state stores

Matthias J. Sax Tue, 21 Nov 2023 08:21:34 -0800

Thanks! SGTM.

Seems all open questions are resolved. Thanks for pushing this through!


-Matthias

On 11/21/23 2:29 AM, Alieh Saeedi wrote:

Yes Matthias,
Based on the discussion we had, it has now been changed to Optional and the
default is empty (for the latest). Also, the `validTo()` method returns an
Optional.

Cheers,
Alieh

On Mon, Nov 20, 2023 at 7:38 PM Matthias J. Sax <mj...@apache.org> wrote:

I think we should also discuss a little more about `validTo()` method?

Given that "latest" version does not have a valid-to TS, should we
change the return type to `Optional` and return `empty()` for "latest"?

ATM the KIP uses `MAX_VALUE` for "latest" what seems to be less clean?
We could also use `-1` (unknown), but both might be less expressive than
`Optional`?


-Matthias

On 11/20/23 1:59 AM, Bruno Cadonna wrote:

Hi Alieh,

Although, I've already voted, I found a minor miss. You should also add
a method isDescending() since the results could also be unordered now
that we agreed that the results are unordered by default. If both --
isDescending() and isAscending -- are false neither
withDescendingTimestamps() nor withAscendingTimestamps() was called.

Best,
Bruno

On 11/17/23 11:25 AM, Alieh Saeedi wrote:

Hi all,
Thank you for the feedback.

So we agreed on no default ordering for keys and TSs. So I must provide
both withAscendingXx() and with DescendingXx() for the class.
Apart from that, I think we can either remove the existing constructor
for
the `VersionedRecord` class or follow the `Optional` thing.

Since many hidden aspects of the KIP are quite clear now and we have

come

to a consensus about them, I think it 's time to vote ;-)
I look forward to your votes. Thanks a lot.

Cheers,
Alieh

On Fri, Nov 17, 2023 at 2:27 AM Matthias J. Sax <mj...@apache.org>

wrote:

Thanks, Alieh.

Overall SGTM. About `validTo` -- wondering if we should make it an
`Optional` and set to `empty()` by default?

I am totally ok with going with the 3-way option about ordering using
default "undefined". For this KIP (as it's all net new) nothing really
changes. -- However, we should amend `RangeQuery`/KIP-985 to align it.

Btw: so far we focused on key-ordering, but I believe the same

"ordering

undefined by default" would apply to time-ordering, too? This might
affect KIP-997, too.


-Matthias

On 11/16/23 12:51 AM, Bruno Cadonna wrote:

Hi,

80)
We do not keep backwards compatibility with IQv1, right? I would even
say that currently we do not need to keep backwards compatibility

among

IQv2 versions since we marked the API "Evolving" (do we only mean code
compatibility here or also behavioral compatibility?). I propose to

try

to not limit ourselves for backwards compatibility that we explicitly
marked as evolving.
I re-read the discussion on KIP-985. In that discussion, we were quite
focused on what the state store provides. I see that for range

queries,

we have methods on the state store interface that specify the order,
but
that should be kind of orthogonal to the IQv2 query type. Let's assume
somebody in the future adds a state store implementation that is not
order based. To account for use cases where the order does not matter,
this person might also add a method to the state store interface that
does not guarantee any order. However, our range query type is
specified
to guarantee order by default. So we need to add something like
withNoOrder() to the query type to allow the use cases that does not
need order and has the better performance in IQ. That does not look
very
nice to me. Having the no-order-guaranteed option does not cost us
anything and it keeps the IQv2 interface flexible. I assume we want to
drop the Evolving annotation at some point.
Sorry for not having brought this up in the discussion about KIP-985.

Best,
Bruno





On 11/15/23 6:56 AM, Matthias J. Sax wrote:

Just catching up on this one.


50) I am also in favor of setting `validTo` in VersionedRecord for
single-key single-ts lookup; it seems better to return the proper
timestamp. The timestamp is already in the store and it's cheap to
extract it and add to the result, and it might be valuable

information

for the user. Not sure though if we should deprecate the existing
constructor though, because for "latest" it's convenient to have?


60) Yes, I meant `VersionedRecord`. Sorry for the mixup.


80) We did discuss this question on KIP-985 (maybe you missed it
Bruno). It's kinda tricky.

Historically, it seems that IQv1, ie, the `ReadOnlyXxx` interfaces
provide a clear contract that `range()` is ascending and
`reverseRange()` is descending.

For `RangeQuery`, the question is, if we did implicitly inherit this
contract? Our conclusion on KIP-985 discussion was, that we did
inherit it. If this holds true, changing the contract would be a
breaking change (what might still be acceptable, given that the
interface is annotated as unstable, and that IQv2 is not widely
adopted yet). I am happy to go with the 3-option contract, but just
want to ensure we all agree it's the right way to go, and we are
potentially willing to pay the price of backward incompatibility.

Do we need a snapshot semantic or can we specify a weaker but still
useful semantic?


I don't think we necessarily need it, but as pointed out by Lucas,

all

existing queries provide it. Overall, my main point is really about
not implementing something "random", but defining a proper binding
contract that allows users to reason about it.

I general, I agree that weaker semantics might be sufficient, but I

am

not sure if we can implement anything weaker in a reasonable way?
Happy to be convinced otherwise. (I have some example, that I will
omit for now, as I hope we can actually go with snapshot semantics.)

The RocksDB Snaptshot idea from Lucas sounds very promising. I was

not

aware that we could do this with RocksDB (otherwise I might have
suggested it on the PR right away). I guess the only open question
remaining would be, if we can provide the same guarantees for a

future

in-memory implementation for VersionedStores? It sounds possible to
do, but we should have some level of confidence about it?


-Matthias

On 11/14/23 6:33 AM, Lucas Brutschy wrote:

Hi Alieh,

I agree with Bruno that a weaker guarantee could be useful already,
and it's anyway possible to strengthen the guarantees later on. On
the
other hand, it would be nice to provide a consistent view across all
segments if it doesn't come with major downsides, because until
now IQ
does provide a consistent view (via iterators), and this would be

the

first feature that diverges from that guarantee.

I think a consistent could be achieved relatively easily by
creating a
snapshot (https://github.com/facebook/rocksdb/wiki/Snapshot) that
will
ensure a consistent view on a single DB and using
`ReadOptions.setSnapshot` for the gets. Since versioned state stores
segments seem to be backed by a single RocksDB instance (that was
unclear to me during our earlier discussion), a single snapshot
should
be enough to guarantee a consistent view - we just need to make sure
to clean up the snapshots after use, similar to iterators. If we
instead need a consistent view across multiple RocksDB instances, we
may have to acquire a write lock on all of those (could use the
current object monitors of our `RocksDB` interface) for the duration
of creating snapshots across all instances - which I think would

also

be permissible performance-wise. Snapshots are just a sequence

number

and should be pretty lightweight to create (they have, however,
downside when it comes to compaction just like iterators).

With that in mind, I would be in favor of at least exploring the
option of using snapshots for a consistent view here, before

dropping

this useful guarantee.

Cheers,
Lucas

On Tue, Nov 14, 2023 at 2:20 PM Bruno Cadonna <cado...@apache.org>
wrote:


Hi Alieh,

Regarding the semantics/guarantees of the query type:

Do we need a snapshot semantic or can we specify a weaker but still
useful semantic?

An option could be to guarantee that:
1. the query returns the latest version when the query arrived at
the
state store
2. the query returns a valid history, i.e., versions with
adjacent and
non-overlapping validity intervals.

I think guarantee 1 is quite obvious. Guarantee 2 maybe needs some
explanation. If we do not avoid writes to the state store during

the

processing of interactive queries, it might for example happen that

the

latest version in the state store moves to data structures that are
responsible for older versions. In our RocksDB implementation
that are
the segments. Thus, it could be that during query processing Kafka
Streams reads the latest value x and encounters again x in a

segment

because a processor put a newer version of x in the versioned state
store. A similar scenario might also happen to earlier versions. If
Streams does not account for such cases it could return invalid
histories.

Maybe such weaker guarantees are enough for most use cases.

You could consider implementing weaker guarantees as I described

and

if

there is demand propose stricter guarantees in a follow-up KIP.

Maybe there are also other simpler guarantees that make sense.

Best,
Bruno


On 11/9/23 12:30 PM, Bruno Cadonna wrote:

Hi,

Thanks for the updates!

First my take on previous comments:


50)
I am in favor of deprecating the constructor that does not take

the

validTo parameter. That implies that I am in favor of modifying
get(key,
asOf) to set the correct validTo.


60)
I am in favor of renaming ValueIterator to VersionedRecordIterator

and

define it as:

public interface VersionedRecordIterator<V> extends
Iterator<VersionedRecord<V>>

(Matthias, you mixed up ValueAndTimestamp with VersionedRecord in

your

last e-mail, didn't you? Just double-checking if I understood what

you

are proposing.)


70)
I agree with Matthias that adding a new method on the
VersionedKeyValueStore interface defeats the purpose of one of the
goals
of IQv2, i.e., not to need to extend the state store interface for
IQ. I
would say if we do not need the method in normal processing, we

should

not extend the public state store interface. BTW, nobody forces

you

to

StoreQueryUtils. I think that internal utils class was
introduced for
convenience to leverage existing methods on the state store

interface.



80)
Why do we limit ourselves by specifying a default order on the

result?

Different state store implementation might have different
strategies to
store the records which affects the order in which the records are
returned if they are not sorted before they are returned to the
user.
Some users might not be interested in an order of the result and

so

there is no reason those users pay the cost for sorting. I
propose to
not specify a default order but sort the results (if needed) when
withDescendingX() and withAscendingX() is specified on the query
object.


Regarding the snapshot guarantees for the iterators, I need to
think

bit more about it. I will come back to this thread in the next
days.


Best,
Bruno


On 11/8/23 5:30 PM, Alieh Saeedi wrote:

Thank you, Bruno and Matthias, for keeping the discussion going
and for
reviewing the PR.

Here are the KIP updates:

       - I removed the `peek()` from the `ValueIterator` interface
since
we do
       not need it.
       - Yes, Bruno, the `validTo` field in the `VersionedRecod`
class is
       exclusive. I updated the javadocs for that.


Very important critical open questions. I list them here based on
priority
(descendingly).

       - I implemented the `get(key, fromtime, totime)` method

here

https://github.com/aliehsaeedii/kafka/blob/9578b7cb7cdade22cc63f671693f5aeb993937ca/streams/src/main/java/org/apache/kafka/streams/state/internals/RocksDBVersionedStore.java#L262

       the problem is that this implementation does not guarantee
consistency
       because processing might continue interleaved (no snapshot
semantics is
       implemented). More over, it materializes all results in
memory.
          - Solution 1: Use a lock and release it after
retrieving all
desired
          records from all segments.
             - positive point: snapshot semantics is implemented
             - negative points: 1) It is expensive since iterating
over all
             segments may take a long time. 2) It still requires
materializing results
             on memory
          - Solution 2: use `RocksDbIterator`.
             - positive points: 1) It guarantees snapshot

segments.

2) It
does
             not require materializing results in memory.
             - negative points: it is expensive because, anyway,

we

need to
             iterate over all (many) segments.

               Do you have any thoughts on this issue? (ref:

Matthias's

comment
<

https://github.com/apache/kafka/pull/14626#pullrequestreview-1709280589>)


       - I added the field `validTo` in `VersionedRecord`. Its
default
value is
       MAX. But as Matthias mentioned, for the single-key

single-ts

       (`VersionedKeyQuery` in KIP-960), it may not always be

true.

If the
       returned record belongs to an old segment, maybe it is not

valid

any more.
       So MAX is not the correct value for `ValidTo`. Two

solutions

come
to mind:
          - Solution 1: make the `ValidTo` as an `Optional` and

set

it to
          `empty` for the retuned result of `get(key,
asOfTimestamp)`.
          - Solution 2: change the implementation of `get(key,
asOfTimestamp)`
          so that it finds the correct `validTo` for the returned
versionedRecord.

          - In this KIP and the next one, even though the default
ordering is
       with ascending ts, I added the method
`withAscendingTimestamps()`
to have
       more user readibility (as Bruno suggested), while Hanyu did
only add
       `withDescending...` methods (he did not need ascending
because
that's the
       default anyway). Matthias believes that we should not have
       inconsistencies (he actually hates them:D). Shall I
change my
KIP
or Hanyu?
       Thoughts?


That would be maybe helpful to look into the PR
<https://github.com/apache/kafka/pull/14626> for more clarity

and

even
review that ;-)

Cheers,
Alieh

On Thu, Nov 2, 2023 at 7:13 PM Bruno Cadonna <cado...@apache.org

wrote:

Hi Alieh,

First of all, I like the examples.

Is validTo in VersionedRecord exclusive or inclusive?
In the javadocs you write:

"@param validTo    the latest timestamp that value is valid"

I think that is not true because the validity is defined by the
start
time of the new version. The new and the old version cannot
both be
valid at that same time.

Theoretically, you could set validTo to the start time of the

new

version - 1. However, what is the unit of the 1? Is it
nanoseconds?
Milliseconds? Seconds? Sure we could agree on one, but I think

it

is

more elegant to just make the validTo exclusive. Actually, you
used it
as exclusive in your examples.


Thanks for the KIP!

Best,
Bruno

On 11/1/23 9:01 PM, Alieh Saeedi wrote:

Hi all,
@Matthias: I think Victoria was right. I must add the method
`get(key,
fromTime, toTime)` to the interface `VersionedKeyValueStore`.

Right

now,
having the method only in `RocksDBVersionedStore`, made me to
have an
instance of `RocksDBVersionedStore` (instead of
`VersionedKeyValueStore`)
in `StoreQueryUtils.runMultiVersionedKeyQuery()` method. In
future, we

are

going to use the same method for In-memory/SPDB/blaBla

versioned

stores.
Then either this method won't work any more, or we have to add
code (if
clauses) for each type of versioned stores. What do you think

about

that?

Bests,
Alieh

On Tue, Oct 24, 2023 at 10:01 PM Alieh Saeedi
<asae...@confluent.io>

wrote:

Thank you, Matthias, Bruno, and Guozhang for keeping the
discussion

going.


Here is the list of changes I made:

        1. I enriched the "Example" section as Bruno suggested.
Do you

please

        have a look at that section? I think I devised an
interesting one

;-)

        2. As Matthias and Guozhang suggested, I renamed
variables and

methods

        as follows:
           - "fromTimestamp" -> "fromTime"
           - "asOfTimestamp" -> "toTime"
           - "from(Instant fromTime)" -> "fromTime(Instant
fromTime)"
           - "asOf(Instant toTime)" -> "toTime(Instant toTime)"
        3.

        About throwing an NPE when time bounds are `null`: Ok,
Matthias, I

am

        convinced by your idea. I consider a null timestamp
as "no
bound".

But I

        had to change KIP-960 and the corresponding PR to be
consistent as

well.


Answering questions and some more discussion points:

        1.

        For now, I keep the class names as they are.
        2.

        About the new field "validTo" in VersionedRecord. Yes
Matthias I

keep

        the old constructor, which does not have `validTo` as

an

input

parameter.

        But in the body of the old constructor, I consider the
default

value MAX

        for the validTo field. I think MAX must be the default
value for

`validTo`

        since before inserting a tombstone or a new value for
the
same
key,

the

        value must be valid till iternity.
        3.

        Regarding changing the ValueIterator interface from
`public

interface

        ValueIterator<V> extends Iterator<V>` to `public
interface
        ValueIterator<V> extends Iterator<VersionedRecord<V>>`:
Matthias, I

do not

        know how it improves type safety because the
MultiVersionedKeyQuery

class

        returns a ValueIterator of VersionedRecord any way. But
if we
want

to be

        consistent with KeyValueIterator, we must apply the
changes you

suggested.

        4.

        Regarding adding the new `get(key, fromTime, toTime)`

method

to the
        public interface `VersionedKeyValueStore` or only

adding

it to
the
        class `RocksDBVersionedStore`: In the KIP, I changed

the

interface

as

        Victoria suggested. But still, I am not convinced why

we

do that.

@Victoria:

        Do you please clarify why we have to define it in the
interface?

More

        specifically, why do we need to use generic
VersionedKeyValueStore
        instead of simply using the implementing classes?

Cheers,
Alieh

On Sat, Oct 14, 2023 at 3:35 AM Guozhang Wang <

guozhang.wang...@gmail.com>

wrote:

Thanks Alieh for the KIP, as well as a nice summary of all

the

discussions! Just my 2c regarding your open questions:

1. I just checked KIP-889
(

https://cwiki.apache.org/confluence/display/KAFKA/KIP-889%3A+Versioned+State+Stores

)
and we used "VersionedRecord<V> get(K key, long

asOfTimestamp);",

so I
feel to be consistent with this API is better compared with

being

consistent with "WindowKeyQuery"?

3. I agree with Matthias that naming is always tricky, and I

also

tend
to be consistent with what we already have (even if in retro
it may
not be the best idea :P and if that was really becoming a
complaint,
we would change it across the board in one shot as well
later).

Guozhang

On Wed, Oct 11, 2023 at 9:12 PM Matthias J. Sax
<mj...@apache.org>

wrote:


Thanks for the update!



Some thoughts about changes you made and open questions you
raised:


10) About asOf vs until: I was just looking into
`WindowKeyQuery`,
`WindowRangeQuery` and also `ReadOnlyWindowStore`

interfaces.

For

those,

we use "timeFrom" and "timeTo", so it seems best to actually

use

`to(Instant toTime)` to keep the naming consistent across

the

board?

If yes, we should also do `from (Instant fromTime)` and use
getters
`fromTime()` and `toTime()` -- given that it's range bounds
it seems
acceptable to me, to diverge a little bit from KIP-960

`asOfTimestamp()`

-- but we could also rename it to `asOfTime()`? -- Given

that

we

strongly type with `Instant` I am not worried about semantic

ambiguity.




20) About throwing a NPE when time bounds are `null` -- why?
(For
the
key it makes sense as is mandatory to have a key.) Could
we not
interpret `null` as "no bound". We did KIP-941 to add `null`

for

open-ended `RangeQueries`, so I am wondering if we should
just stick

to

the same semantics?

Cf

https://cwiki.apache.org/confluence/display/KAFKA/KIP-941%3A+Range+queries+to+accept+null+lower+and+upper+bounds




30) About the class naming. That's always tricky, and I am
not
married
to my proposal. I agree with Bruno that the other suggested
names
are
not really better. -- The underlying idea was, to get some

"consistent"

naming across the board.

Existing `KeyQuery`
New `VersionedKeyQuery` (KIP-960; we add a prefix)
New `MultiVersionKeyQuery` (this KIP; extend the prefix
with a

pre-prefix)


Existing `RangeQuery`
New `MultiVersionRangeQuery` (KIP-969; add same prefix as

above)




40) I am fine with not adding `range(from, to)` -- it was
just an

idea.






Some more follow up question:

50) You propose to add a new constructor and getter to

`VersionedRecord`

-- I am wondering if this implies that `validTo` is optional
because

the

existing constructor is not deprecated? -- Also, what

happens

if

`validTo` is not set and `valueTo()` is called? Or do we
intent to

make

`validTo` mandatory?

Maybe this question can only be answered when working on the
code,

but I

am wondering if we should make `validTo` mandatory or not...

And

what
the "blast radius" of changing `VersionedRecord` will be in
general.

Do

you have already some POC PR that we could look at to get
some
signals
about this?



60) The new query class is defined to return
`ValueIterator<VersionedRecord<V>>` -- while I like the idea
to add
`ValueIterator<V>` in a generic way on the one hand, I am
wondering if
it might be better to change it, and enforce its usage (ie,
return

type)

of `VersionedRecord` to improve type safety (type erasure is
often a
pain, and we could mitigate it this way).

Btw: We actually do a similar thing for `KeyValueIterator`.

Ie,

public interface ValueIterator<V> extends

Iterator<ValueAndTimestamp<V>>


and

ValueAndTimestamp<V> peek();

This would imply that the return type of the new query is
`ValueIterator<V>` on the interface what seems simpler and
more

elegant?


If we go with the change, I am also wondering if we need to
find a
better name for the new iterator class? Maybe
`VersionIterator` or
something like this?

Of course it might limit the use of `ValueIterator` for

other

value
types -- not sure if this a limitation that is prohibitive?
My gut
feeling is, that is should not be too limiting.




70) Do we really need the change in `VersionedKeyValueStore`

and

add a
new method? In the end, the idea of IQv2 is to avoid exactly
this...

It

was the main issue for IQv1, that the base interface of the
store

needed

an update and thus all classed implementing the base
interface,
making
it very cumbersome to add new query types. -- Of course, we

need

this
new method on the actually implementation (as private

method)

that can
be called from `query()` method, but adding it to the
interface
seems

to

defeat the purpose of IQv2.

Note, for existing IQv2 queries types that go against others
stores,

the

public methods already existed when IQv2 was introduces, and
thus
the
implementation of these query types just pragmatically
re-used

existing

methods -- but it does not imply that new public method
should be

added.





-Matthias


On 10/11/23 5:11 AM, Bruno Cadonna wrote:

Thanks for the updates, Alieh!

The example in the KIP uses the allVersions() method
which we
agreed

to

remove.

Regarding your questions:
1. asOf vs. until: I am fine with both but slightly prefer
until.
2. What about KeyMultiVersionsQuery, KeyVersionsQuery
(KIP-960
would
then be KeyVersionQuery). However, I am also fine with
MultiVersionedKeyQuery since none of the names sounds

better

or

worse

to

me.
3. I agree with you not to introduce the method with the

two

bounds

to

keep things simple.
4. Forget about fromTime() an asOfTime(), from() and
asOf() is
fine.
5. The main purpose is to show how to use the API. Maybe
make an

example

with just the key to distinguish this query from the single
value

query

of KIP-960 and then one with a key and a time range. When
you
iterate
over the results you could also call validTo(). Maybe add
some
actual
records in the comments to show what the result might look
like.

Regarding the test plan, I hope you also plan to add unit
tests in

all

of your KIPs. Maybe you could also explain why system tests
are not
needed here.

Best,
Bruno

On 10/10/23 5:36 PM, Alieh Saeedi wrote:

Thank you all for the very exact and constructive
comments. I
really
enjoyed reading your ideas and all the important points

you

made me

aware

of. I updated KIP-968 as follows:

         1. If the key or time bounds are null, the method
returns
NPE.
         2. The "valid" word: I removed the sentence "all
the
records

that are

         valid..." and replaced it with an exact
explanation.
More

over, I

explained
         it with an example in the KIP but not in the
javadocs. Do I

need

to add the
         example to the javadocs as well?
         3. Since I followed Bruno's suggestion and
removed the

allVersions()

         method, the problem of meaningless combinations is
solved, and

do not
         need any IllegalArgumentException or something

like

that.
Therefore, the
         change is that if no time bound is specified, the

query

returns
the records
         with the specified key for all timestamps (all
versions).
         4. As Victoria suggested, adding a method to the
*VersionedKeyValueStore
         *interface is essential. So I did that. I had this
method
only

in the

         RocksDBVersionedStore class, which was not enough.
         5. I added the *validTo* field to the
VersionedRecord
class to

be

able
         to represent the tombstones. As you suggested, we
postpone

solving

the
         problem of retrieving consecutive tombstones for

later.

         6. I added the "Test Plan" section to all KIPs.
I hope
what I
wrote is
         convincing.
         7. I added the *withAscendingTimestamp()* method

to

provide

more

code readability
         for the user.
         8. I removed the evil word "get" from all getter
methods.

There have also been some more suggestions which I am

still

not

convinced

or clear about them:

         1. Regarding asOf vs until: reading all
comments, my
conclusion
was that
         I keep it as "asOf" (following Walker's idea as

the

native

speaker

as well
         as Bruno's suggestion to be consistent with

single-key_single_ts

queries).
         But I do not have a personal preference. If you
insist on

"until",

I change
         it.
         2. Bruno suggested renaming the class
"MultiVersionedKeyQuery"

to sth

         else. We already had a long discussion about the
name with
Matthias. I am
         open to renaming it to something else, but do you
have any

ideas?

         3. Matthias suggested having a method with two
input
parameters

that

         enables the user to specify both time bounds in

the

same

method.

Isn't it
         introducing redundancy? It is somehow
disrespectful to
the idea

of

having
         composable methods.
         4. Bruno suggested renaming the methods "asOf" and
"from" to
"asOfTime"
         and "fromTime". If I do that, then it is not
consistent with

KIP-960.

         Moreover, the input parameter is clearly a
timestamp, which

explains

         enough. What do you think about that?
         5. I was asked to add more examples to the example
section. My
question
         is, what is the main purpose of that? If I know it
clearly,

then

can add
         what you mean.



Cheers,
Alieh

On Tue, Oct 10, 2023 at 1:13 AM Matthias J. Sax
<mj...@apache.org>

wrote:

Bruno and I had some background conversation about the
`get`
prefix
question including a few other committers.

The official policy was never changed, and we should not
add the
`get`-prefix. It's a slip on our side in previous KIPs to
add the
`get`-prefix and we should actually clean it up doing a
follow up

KIP.



-Matthias


On 10/5/23 5:26 AM, Bruno Cadonna wrote:

Hi Matthias,

Given all the IQv2 KIPs that use getX and given recent
PRs

(internal

interfaces mainly) that got merged, I was under the
impression

that we

moved away from the strict no-getX policy.

I do not think it was an accident using getX in the IQv2
KIPs

since

somebody would have brought it up, otherwise.

I am fine with both types of getters.

If we think, we need to discuss this in a broader
context, let's
start a
separate thread.


Best,
Bruno





On 10/5/23 7:44 AM, Matthias J. Sax wrote:

I agree to (almost) everything what Bruno said.

In general, we tend to move away from using getters
without

"get",

recently. So I would keep the "get".


This is new to me? Can you elaborate on this point? Why
do you

think

that's the case?

I actually did realize (after Walker mentioned it) that
existing

query

types use `get` prefix, but to me it seems that it
was by

accident and

we should consider correcting it? Thus, I would

actually

prefer

to not

add the `get` prefix for new methods query types.

IMHO, we should do a follow up KIP to deprecate all

methods

with

`get`

prefix and replace them with new ones without `get` --
it's of

course

always kinda "unnecessary" noise, but if we don't do

it,

we

might

get

into more and more inconsistent naming what would

result

in a

"bad"

API.

If we indeed want to change the convention and use the
`get`

prefix, I

would strongly advocate to bit the bullet and do KIP to

pro-actively

add the `get` "everywhere" it's missing... But
overall, it
seems

to be

a much broader decision, and we should get buy in from

many

committers

about it -- as long as there is no broad consensus to
add `get`
everywhere, I would strongly prefer not to diverge from

the

current

agreement to omit `get`.



-Matthias




On 10/4/23 2:36 AM, Bruno Cadonna wrote:

Hi,

Regarding tombstones:
As far as I understand, we need to add either a

validTo

field to
VersionedRecord or we need to return tombstones,
otherwise the

result

is not complete, because users could never know a
record was

deleted

at some point before the second non-null value was

put.

I like more adding the validTo field since it makes

the

result

more

concise and easier interpretable.

Extending on Victoria's example, with the following
puts

put(k, v1, time=0)
put(k, null, time=5)
put(k, null, time=10)
put(k, null, time=15)
put(k, v2, time=20)

the result with tombstones would be

value, timestamp
(v1, 0)
(null, 5)
(null, 10)
(null, 15)
(v2, 20)

instead of

value, timestamp, validTo
(v1, 0, 5)
(v2, 20, null)

The benefit of conciseness would already apply to one
single
tombstone.

On the other hand, why would somebody write

consecutive

tombstones

into a versioned state store? I guess if somebody does
that on
purpose, then there should be a way to retrieve each

of

those
tombstones, right?
So maybe we need both -- validTo field and the
option to
return
tombstones. The latter might be moved to a future
KIP in
case we

see

the need.


Regarding .within(fromTs, toTs):
I would keep it simple with .from() and
.asOfTimestamp() (or
.until()). If we go with .within(), I would opt for
.withinTimeRange(fromTs, toTs), because the query
becomes more

readable:


MultiVersionedKeyQuery
         .withKey(1)

.withinTimeRange(Instant.parse(2023-08-03T10:37:30.00Z),

Instant.parse(2023-08-04T10:37:30.00Z))

If we stay with .from() and .until(), we should
consider

.fromTime()

and .untilTime() (or .toTime()):

MultiVersionedKeyQuery
        .withKey(1)

.fromTime(Instant.parse(2023-08-03T10:37:30.00Z))

.untilTime(Instant.parse(2023-08-04T10:37:30.00Z))



Regarding asOf vs. until:
I think asOf() is more used in point in time queries

as

Walker
mentioned where this KIP specifies a time range. IMO
asOf()
fits

very

well with KIP-960 where one version is queried, but
here

think
.until() fits better. That might just be a matter of
taste and

in the

end I am fine with both as long as it is well
documented.


Regarding getters without "get":
In the other IQv2 classes we used getters with
"get". In

general, we

tend to move away from using getters without "get",
recently. So

would keep the "get".


Best,
Bruno

On 10/3/23 7:49 PM, Walker Carlson wrote:

Hey Alieh thanks for the KIP,

Weighing in on the AsOf vs Until debate I think

either

is

fine
from a
natural language perspective. Personally AsOf makes
more
sense

to me

where
until gives me the idea that the query is making a
change.
It's
totally a
connotative difference and not that important. I

think

as

of is
pretty
frequently used in point of time queries.

Also for these methods it makes sense to drop the
"get" We

don't

normally use that in getters

          * The key that was specified for this query.
          */
         public K getKey();

         /**
          * The starting time point of the query, if
specified
          */
         public Optional<Instant> getFromTimestamp();

         /**
          * The ending time point of the query, if
specified
          */
         public Optional<Instant> getAsOfTimestamp();

Other than that I didn't have too much to add.

Overall

I like

the

direction
of the KIP and think the funcatinlyt is all there!
best,
Walker



On Mon, Oct 2, 2023 at 10:46 PM Matthias J. Sax <

mj...@apache.org>

wrote:

Thanks for the updated KIP. Overall I like it.

Victoria raises a very good point, and I personally
tend to
prefer (I
believe so does Victoria, but it's not totally clear
from
her
email) if
a range query would not return any tombstones, ie,
only two

records

in

Victoria's example. Thus, it seems best to include a
`validTo`
ts-field
to `VersionedRecord` -- otherwise, the retrieved
result
cannot

be

interpreted correctly.

Not sure what others think about it.

I would also be open to actually add a
`includeDeletes()`
(or
`includeTombstones()`) method/flag (disabled by
default) to

allow

users
to get all tombstone: this would only be helpful if
there
are

two

consecutive tombstone though (if I got it right), so

not

sure

if we

want
to add it or not -- it seems also possible to add it
later if

there

is

user demand for it, so it might be a premature
addition as

this

point?



Nit:

the public interface ValueIterator is used


"is used" -> "is added" (otherwise it sounds like
as if
`ValueIterator`
exist already)



Should we also add a `.within(fromTs, toTs)` (or
maybe some

better

name?) to allow specifying both bounds at once? The
existing
`RangeQuery` does the same for specifying the
key-range, so
might be
good to add for time-range too?



-Matthias


On 9/6/23 5:01 AM, Bruno Cadonna wrote:

In my last e-mail I missed to finish a sentence.

"I think from a KIP"

should be

"I think the KIP looks good!"


On 9/6/23 1:59 PM, Bruno Cadonna wrote:

Hi Alieh,

Thanks for the KIP!

I think from a KIP

1.
I propose to throw an IllegalArgumentException
or an
IllegalStateException for meaningless

combinations.

In any

case,

the

KIP should specify what exception is thrown.

2.
Why does not specifying a range return the latest
version? I
would
expect that it returns all versions since an empty
lower or

upper

limit is interpreted as no limit.

3.
I second Matthias comment about replacing "asOf"
with

"until" or

"to".

4.
Do we need "allVersions()"? As I said above I

would

return

all

versions if no limits are specified. I think if we
get rid

of

allVersions() there might not be any meaningless

combinations

anymore.
If a user applies twice the same limit like for
example
MultiVersionedKeyQuery.with(key).from(t1).from(t2)

the

last

one

wins.

5.
Could you add some more examples with time ranges
to the

example

section?


6.
The KIP misses the test plan section.

7.
I propose to rename the class to
"MultiVersionKeyQuery"

since we

are

querying multiple versions of the same key.

8.
Could you also add withAscendingTimestamps()? IMO
it gives

users

the

possibility to make their code more readable
instead of
only

relying

on the default.

Best,
Bruno


On 8/17/23 4:13 AM, Matthias J. Sax wrote:

Thanks for splitting this part into a separate
KIP!

For `withKey()` we should be explicit that `null`
is not
allowed.

(Looking into existing `KeyQuery` it seems the
JavaDocs

don't

cover

this either -- would you like to do a tiny

cleanup

PR for
this, or
fix on-the-side in one of your PRs?)

The key query returns all the records that are
valid in

the

time
range starting from the timestamp {@code
fromTimestamp}.


In the JavaDocs you use the phrase `are valid` --
I think

we

need to
explain what "valid" means? It might even be

worth

to add

some

examples. It's annoying, but being precise if
kinda

important.


With regard to KIP-962, should we allow `null`

for

time

bounds ?

The
JavaDocs should also be explicit if `null` is
allowed or

not and

what
the semantics are if allowed.



You are using `asOf()` however, because we are
doing

time-range

queries, to me using `until()` to describe the
upper
bound

would

sound better (I am not a native speaker though,

so

maybe I

am

off?)

The key query returns all the records that have
timestamp

<=

{@code
asOfTimestamp}.


This is only correct if not lower-bound is set,
right?


In your reply to KIP-960 you mentioned:

the meaningless combinations are prevented by
throwing

exceptions.


We should add corresponding JavaDocs like:

           @throws IllegalArgumentException if
{@code
fromTimestamp} is
equal or
                                            larger

than

{@code
untilTimestamp}

Or something similar.


With regard to KIP-960: if we need to introduce a
`VersionedKeyQuery`
class for single-key-single-ts lookup, would we
need to

find

a new
name for the query class of this KIP, given
that the
return

type

is

different?


-Matthias



On 8/16/23 10:57 AM, Alieh Saeedi wrote:

Hi all,

I splitted KIP-960
<

https://cwiki.apache.org/confluence/display/KAFKA/KIP-960%3A+Support+single-key_single-timestamp+interactive+queries+%28IQv2%29+for+versioned+state+stores

into three separate KIPs. Therefore, please

continue

discussions
about single-key, multi-timestamp interactive
queries

here. You

can
see all
the addressed reviews on the following page.
Thanks in

advance.


KIP-968: Support single-key_multi-timestamp
interactive

queries

(IQv2) for
versioned state stores
<

https://cwiki.apache.org/confluence/display/KAFKA/KIP-968%3A+Support+single-key_multi-timestamp+interactive+queries+%28IQv2%29+for+versioned+state+stores


I look forward to your feedback!

Cheers,
Alieh

Re: [DISCUSS] KIP-968: Support single-key_multi-timestamp interactive queries (IQv2) for versioned state stores

Reply via email to