Hi all,
I updated the KIP based on the changes made in the former KIP (KIP-968). So
with the `ResultOrder` enum, the class `MultiVersionedRangeQuery` had some
changes both in the defined fields and defined methods.

Based on the PoC PR, what we currently promise in the KIP about ordering
seems like a dream. I intended to enable the user to have a global ordering
based on the key or timestamp (using `orderByKey()` or
`orderByTimestamp()`) and then even have a partial ordering based on the
other parameter.  For example, if the user specifies
`orderByKey().withDescendingKey().withAscendingTimestamps()`, then the
global ordering is based on keys in a descending order, and then all the
records with the same key are ordered ascendingly based on ts. The result
will be something like (k3,v1,t1), (k3,v2,t2), (k2,v2,t3), (k1.v1.t1)
(assuming that k1<k2<k3 and t1<t2<t3).

But in reality, we have limitations for having a global ordering based on
keys since we are iterating over the segments in a lazy manner. Therefore,
when we are processing the current segment, we have no knowledge of the
keys in the next segment.

Now I have two suggestions:
1. Changing the `MultiVersionedRangeQuery` class as follows:

private final ResultOrder *segmentOrder*;
private final contentOrder *segmentContentOrder*; // can be KEY_WISE or
TIMESTAMP_WISE
private final ResultOrder  *keyOrder*;
private final ResultOrder *timestampOrder*;

This way, the global ordering is specified by the `segmentOrder`. It means
we either show the results from the oldest to the latest segment
(ASCENDING) or from the latest to the oldest segment (DESCENDING).
Then, inside each segment, we guarantee a `segmentContentOrder` which can
be `KEY_WISE` or `TIMESTAMP_WISE`. The key order and timestamp order are
specified by the `keyOrder` and `timestampOrder` properties, respectively.
If the content of a segment must be ordered key-wise and then we have two
records with the same key (it happens in older segments), then the
`timestampOrder` determines the order between them.

2. We define that global ordering can only be based on timestamps (the
`timestampOrder` property), and if two records have the same timestamp, the
`keyOrder` determines the order between them.

I think the first suggestion gives more flexibility to the user, but it is
more complicated. I mean, it needs good Javadocs.

I look forward to your ideas.

Cheers,
Alieh


On Mon, Nov 6, 2023 at 3:08 PM Alieh Saeedi <asae...@confluent.io> wrote:

> Thank you, Bruno and Matthias.
> I updated the KIP as follows:
> 1. The one remaining `asOf` word in the KIP is removed.
> 2. Example 2 is updated. Thanks, Bruno for the correction.
>
> Discussions and open questions
> 1. Yes, Bruno. We need `orderByKey()` and `orderByTimestamp()` as well.
> Because the results must have a global ordering. Either based on key or
> based on ts. For example, we can have
> `orderByKey().withDescendingKey().withAscendingTimestamps()`. Then the
> global ordering is based on keys in a descending order, and then all the
> records with the same key are ordered ascendingly based on ts. The result
> will be something like (k3,v1,t1), (k3,v2,t2), (k3,v1,t1), (k2,v2,t2),
> (k1.v1.t1) (assuming that k1<k2<k3 and t1<t2<t3)
> 2. About having the `latest()` method: it seems like we are undecided yet.
> Adding a new class or ignoring `latest()` for VersionedRangeQuery and
> instead using the `TimestampedRangeQuery` as Matthias suggested.
>
> Cheers,
> Alieh
>
> On Sat, Nov 4, 2023 at 1:38 AM Matthias J. Sax <mj...@apache.org> wrote:
>
>> Great discussion. Seems we are making good progress.
>>
>> I see advantages and disadvantages in splitting out a "single-ts
>> key-range" query type. I guess, the main question might be, what
>> use-cases do we anticipate and how common we believe they are?
>>
>> We should also take KIP-992 (adding `TimestampedRangeQuery`) into account.
>>
>> (1) The most common use case seems to be, a key-range over latest. For
>> this one, we could use `TimestampedRangeQuery` -- it would return a
>> `ValueAndTimestamp<V>` instead of a `VersionedRecord<V>` but the won't
>> be any information loss, because "toTimestamp" would be `null` anyway.
>>
>>
>> (2) The open question is, how common is a key-range in a point in the
>> past? For this case, using
>> `MultiVersionedRangeQuery.withKeyRange().from(myTs).to(myTs)` seems
>> actually not to be a bad UX, and also does not really need to be
>> explained how to do this (compared to "latest" that required to pass in
>> MAX_INT).
>>
>>
>> If we add a new query type, we avoid both issues (are they actually
>> issues?) and add some nice syntactic sugar to the API. The question is,
>> if it's worth the effort and expanded API surface area?
>>
>> To summarize:
>>
>> Add new query type:
>>
>> > // queries latest; returns VersionedRecords
>> > VersionedRangeQuery.withKeyRange(...);
>> >
>> > VersionedRangeQuery.withKeyRange(...).asOf(ts);
>>
>> vs
>>
>> No new query type:
>>
>> > // queries latest; returns ValueAndTimestamps
>> > TimestampedRangeQuery.withRange(...);
>> >
>> > MultiVersionedRangeQuery.withKeyRange(...).from(myTs).to(myTs)
>>
>>
>>
>> I guess, bottom line, I would be ok with either one and I am actually
>> not even sure which one I prefer personally. Just wanted to lay out the
>> tradeoffs I see. Not sure if three are other considerations that would
>> tip the scale into either direction?
>>
>>
>>
>> -Matthias
>>
>> On 11/3/23 3:43 AM, Bruno Cadonna wrote:
>> > Hi Alieh,
>> >
>> > I like the examples!
>> >
>> >
>> > 1.
>> > Some terms like `asOf` in the descriptions still need to be replaced in
>> > the KIP.
>> >
>> >
>> > 2.
>> > In your last e-mail you state:
>> >
>> > "How can a user retrieve the latest value? We have the same issue with
>> > kip-968 as well."
>> >
>> > Why do we have the same issue in KIP-968?
>> > If users need to retrieve the latest value for a specific key, they
>> > should use KIP-960.
>> >
>> >
>> > 3.
>> > Regarding querying the latest version (or an asOf version) of records
>> in
>> > a given key range, that is exactly why I proposed to split the query
>> > class. One class would return the latest and the asOf versions (i.e. a
>> > single version) of records in a key range and the other class would
>> > return all versions in a given time range (i.e. multiple versions) of
>> > the records in a given key range. The splitting in two classes avoids
>> to
>> > specify a time range and latest or a time range and asOf on a given key
>> > range.
>> >
>> > Alternatively, you could keep one class and you could specify that the
>> > last call wins as you specified for fromTime() and toTime(). For
>> > example, for
>> >
>> >
>> MultiVersionedRangeQuery.withLowerKeyBound(k1).fromTime(t1).toTime(t2).latest()
>> >
>> > latest() wins. However, how would you interpret
>> >
>> >
>> MultiVersionedRangeQuery.withLowerKeyBound(k1).fromTime(t1).latest().toTime(t2)
>> >
>> > Is it [t1, t2] or [-INF, t2]?
>> > (I would say the latter, but somebody else would say differently)
>> >
>> > The two class solution seems cleaner to me since we do not need to
>> > consider those edge cases.
>> > You could propose both classes in this KIP.
>> >
>> >
>> > 4.
>> > Why do we need orderByKey() and orderByTimestamp()?
>> > Aren't withAscendingKeys(), withDescendingKeys(),
>> > withAscendingTimestamps(), and withDescendingTimestamps() sufficient?
>> >
>> >
>> > 5.
>> > In example 2, why is
>> >
>> > key,value: 2,20, timestamp: 2023-01-10T10:00:00.00Z, valid till:
>> > 2023-01-25T10:00:00.00Z
>> >
>> > not part of the result?
>> > It is valid from 2023-01-10T10:00:00.00Z to 2023-01-25T10:00:00.00Z
>> > which overlaps with the time range [2023-01-17T10:00:00.00Z,
>> > 2023-01-30T10:00:00.00Z] of the query.
>> >
>> > (BTW, in the second example, you forgot to add "key" to the output.)
>> >
>> >
>> > Best,
>> > Bruno
>> >
>> >
>> > On 10/25/23 4:01 PM, Alieh Saeedi wrote:
>> >> Thanks, Matthias and Bruno.
>> >> Here is a list of updates:
>> >>
>> >>     1. I changed the variable and method names as I did for KIP-968, as
>> >>     follows:
>> >>        - "fromTimestamp" -> fromTime
>> >>        - "asOfTimestamp"-> toTime
>> >>        - "from(instant)" -> fromTime(instant)"
>> >>        - asOf(instant)"->toTime(instant)
>> >>     2. As Bruno suggested for KIP-968, I added `orderByKey()`,
>> >>     `withAscendingKeys()`, and `withAscendingTimestamps()` methods for
>> >> user
>> >>     readability.
>> >>     3. I updated the "Example" section as well.
>> >>
>> >> Some points:
>> >>
>> >>     1. Even though the kip suggests adding the `get(k lowerkeybound, k
>> >>     upperkeybound, long fromtime, long totime)` method to the
>> >> interface, I
>> >>     added this method to the `rocksdbversionedstore` class for now.
>> >>     2. Matthias, you mentioned a very important point. How can a user
>> >>     retrieve the latest value? We have the same issue with kip-968 as
>> >> well.
>> >>     Asking a user to call `toTime(max)` violates the API design rules,
>> >> as you
>> >>     mentioned. So I think we must have a `latest()` method for both
>> >> KIP-968 and
>> >>     KIP-969. What do you think about that?
>> >>
>> >>
>> >> Cheers,
>> >> Alieh
>> >>
>> >> On Thu, Oct 12, 2023 at 6:33 AM Matthias J. Sax <mj...@apache.org>
>> wrote:
>> >>
>> >>> Thanks for the update.
>> >>>
>> >>>
>> >>>> To retrieve
>> >>>>>      the latest value(s), the user must call just the asOf method
>> with
>> >>> the MAX
>> >>>>>      value (asOf(MAX)). The same applies to KIP-968. Do you think
>> >>>>> it is
>> >>> clumsy,
>> >>>>>      Matthias?
>> >>>
>> >>>
>> >>> Well, in KIP-968 calling `asOf` and passing in a timestamp is
>> optional,
>> >>> and default is "latest", right? So while `asOf(MAX)` does the same
>> >>> thing, practically users would never call `asOf` for a "latest" query?
>> >>>
>> >>> In this KIP, we enforce that users give us a key range (we have the 4
>> >>> static entry point methods to define a query for this), and we say we
>> >>> default to "no bounds" for time range by default.
>> >>>
>> >>> The existing `RangeQuery` allows to query a range of keys for existing
>> >>> stores. It seems to be a common pattern to query a key-range on
>> latest.
>> >>> -- in the current proposal, users would need to do:
>> >>>
>> >>> MultiVersionedRangeQuery.withKeyRange(startKey, endKey).asOf(MAX);
>> >>>
>> >>> Would like to hear from others if we think that's good user
>> experience?
>> >>> If we agree to accept this, I think we should explain how to do this
>> in
>> >>> the JavaDocs (and also regular docs... --- otherwise, I can already
>> >>> anticipate user question on all question-asking-channels how to do a
>> >>> "normal key range query". IMHO, the problem is not that the code
>> itself
>> >>> it too clumsy, but that it's totally not obvious to uses how to
>> express
>> >>> it without actually explaining it to them. It basically violated the
>> API
>> >>> design rule "make it easy to use / simple things should be easy".
>> >>>
>> >>> Btw: We could also re-use `RangeQuery` and add am implementation to
>> >>> `VersionedStateStore` to just accept this query type, with "key range
>> >>> over latest" semantics. -- The issue is of course, that uses need to
>> >>> know that the query would return `ValueAndTimestamp` and not plain `V`
>> >>> (or we add a translation step to unwrap the value, but we would lose
>> the
>> >>> "validFrom" timestamp -- validTo would be `null`). Because type safety
>> >>> is a general issue in IQv2 it would not make it worse (in the strict
>> >>> sense), but I am also not sure if we want to dig an even deeper
>> hole...
>> >>>
>> >>>
>> >>> -Matthias
>> >>>
>> >>>
>> >>> On 10/10/23 11:55 AM, Alieh Saeedi wrote:
>> >>>> Thanks, Matthias and Bruno, for the feedback on KIP-969. Here is a
>> >>> summary
>> >>>> of the updates I made to the KIP:
>> >>>>
>> >>>>      1.  I liked the idea of renaming methods as Matthias suggested.
>> >>>>      2. I removed the allversions() method as I did in KIP-968. To
>> >>> retrieve
>> >>>>      the latest value(s), the user must call just the asOf method
>> with
>> >>> the MAX
>> >>>>      value (asOf(MAX)). The same applies to KIP-968. Do you think it
>> is
>> >>> clumsy,
>> >>>>      Matthias?
>> >>>>      3. I added a method to the *VersionedKeyValueStore *interface,
>> >>>> as I
>> >>> did
>> >>>>      for KIP-968.
>> >>>>      4. Matthias: I do not get what you mean by your second comment.
>> >>>> Isn't
>> >>>>      the KIP already explicit about that?
>> >>>>
>> >>>>      > I assume, results are returned by timestamp for each key. The
>> >>>> KIP
>> >>>>      should be explicit about it.
>> >>>>
>> >>>>
>> >>>> Cheers,
>> >>>> Alieh
>> >>>>
>> >>>>
>> >>>>
>> >>>> On Tue, Oct 3, 2023 at 6:07 AM Matthias J. Sax <mj...@apache.org>
>> >>>> wrote:
>> >>>>
>> >>>>> Thanks for updating the KIP.
>> >>>>>
>> >>>>> Not sure if I agree or not with Bruno's idea to split the query
>> types
>> >>>>> further? In the end, we split them only because there is three
>> >>>>> different
>> >>>>> return types: single value, value-iterator, key-value-iterator.
>> >>>>>
>> >>>>> What do we gain by splitting out single-ts-range-key? In the end,
>> for
>> >>>>> range-ts-range-key the proposed class is necessary and is a superset
>> >>>>> (one can set both timestamps to the same value, for single-ts
>> lookup).
>> >>>>>
>> >>>>> The mentioned simplification might apply to "single-ts-range-key"
>> >>>>> but I
>> >>>>> don't see a simplification for the proposed (and necessary) query
>> >>>>> type?
>> >>>>>
>> >>>>> On the other hand, I see an advantage of a single-ts-range-key for
>> >>>>> querying over the "latest version" with a range of keys. For a
>> >>>>> single-ts-range-key query, this it would be the default (similar to
>> >>>>> VersionedKeyQuery with not asOf-timestamped defined).
>> >>>>>
>> >>>>> In the current version of the KIP, (if we agree that default should
>> >>>>> actually return "all versions" not "latest" -- this default was
>> >>>>> suggested by Bruno on KIP-968 and makes sense to me, so we would
>> >>>>> need to
>> >>>>> have the same default here to stay consistent), users would need to
>> >>>>> pass
>> >>>>> in `from(Long.MAX).to(Long.MAX)` (if I got this right) to query the
>> >>>>> latest point in time only, what seems to be clumsy? Or we could add
>> a
>> >>>>> `lastestKeyOnly` option to `MultiVersionedRangeQuery`, but it does
>> >>>>> seems
>> >>>>> a little clumsy, too.
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>>> The overall order of the returned records is by Key
>> >>>>>
>> >>>>> I assume, results are returned by timestamp for each key. The KIP
>> >>>>> should
>> >>>>> be explicit about it.
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> To be very explicit, should we rename the methods to specify the key
>> >>> bound?
>> >>>>>
>> >>>>>     - withRange -> withKeyRange
>> >>>>>     - withLowerBound -> withLowerKeyBound
>> >>>>>     - withUpperBound -> withUpperKeyBound
>> >>>>>     - withNoBounds -> allKeys (or withNoKeyBounds, but we use
>> >>>>> `allVersions` and not `noTimeBound` and should align the naming?)
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> -Matthias
>> >>>>>
>> >>>>>
>> >>>>> On 9/6/23 5:25 AM, Bruno Cadonna wrote:
>> >>>>>> Hi Alieh,
>> >>>>>>
>> >>>>>> Thanks for the KIP!
>> >>>>>>
>> >>>>>> One high level comment/question:
>> >>>>>>
>> >>>>>> I assume you separated single key queries into two classes because
>> >>>>>> versioned key queries return a single value and multi version key
>> >>>>>> queries return iterators. Although, range queries always return
>> >>>>>> iterators, it would make sense to also separate range queries for
>> >>>>>> versioned state stores into range queries that return one single
>> >>> version
>> >>>>>> of the keys within a range and range queries that return multiple
>> >>>>>> version of the keys within a range, IMO. That would reduce the
>> >>>>>> meaningless combinations.
>> >>>>>> WDYT?
>> >>>>>>
>> >>>>>> Best,
>> >>>>>> Bruno
>> >>>>>>
>> >>>>>> On 8/16/23 8:01 PM, Alieh Saeedi wrote:
>> >>>>>>> Hi all,
>> >>>>>>>
>> >>>>>>> I splitted KIP-960
>> >>>>>>> <
>> >>>>>
>> >>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-960%3A+Support+single-key_single-timestamp+interactive+queries+%28IQv2%29+for+versioned+state+stores
>> >>>>>>
>> >>>>>>> into
>> >>>>>>> three separate KIPs. Therefore, please continue discussions about
>> >>> range
>> >>>>>>> interactive queries here. You can see all the addressed reviews
>> >>>>>>> on the
>> >>>>>>> following page. Thanks in advance.
>> >>>>>>>
>> >>>>>>> KIP-969: Support range interactive queries (IQv2) for versioned
>> >>>>>>> state
>> >>>>>>> stores
>> >>>>>>> <
>> >>>>>
>> >>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-969%3A+Support+range+interactive+queries+%28IQv2%29+for+versioned+state+stores
>> >>>>>>
>> >>>>>>>
>> >>>>>>> I look forward to your feedback!
>> >>>>>>>
>> >>>>>>> Cheers,
>> >>>>>>> Alieh
>> >>>>>>>
>> >>>>>
>> >>>>
>> >>>
>> >>
>>
>

Reply via email to