I agree to (almost) everything what Bruno said.

In general, we tend to move away from using getters without "get", recently. So I would 
keep the "get".

This is new to me? Can you elaborate on this point? Why do you think that's the case?

I actually did realize (after Walker mentioned it) that existing query types use `get` prefix, but to me it seems that it was by accident and we should consider correcting it? Thus, I would actually prefer to not add the `get` prefix for new methods query types.

IMHO, we should do a follow up KIP to deprecate all methods with `get` prefix and replace them with new ones without `get` -- it's of course always kinda "unnecessary" noise, but if we don't do it, we might get into more and more inconsistent naming what would result in a "bad" API.

If we indeed want to change the convention and use the `get` prefix, I would strongly advocate to bit the bullet and do KIP to pro-actively add the `get` "everywhere" it's missing... But overall, it seems to be a much broader decision, and we should get buy in from many committers about it -- as long as there is no broad consensus to add `get` everywhere, I would strongly prefer not to diverge from the current agreement to omit `get`.



-Matthias




On 10/4/23 2:36 AM, Bruno Cadonna wrote:
Hi,

Regarding tombstones:
As far as I understand, we need to add either a validTo field to VersionedRecord or we need to return tombstones, otherwise the result is not complete, because users could never know a record was deleted at some point before the second non-null value was put. I like more adding the validTo field since it makes the result more concise and easier interpretable.

Extending on Victoria's example, with the following puts

put(k, v1, time=0)
put(k, null, time=5)
put(k, null, time=10)
put(k, null, time=15)
put(k, v2, time=20)

the result with tombstones would be

value, timestamp
(v1, 0)
(null, 5)
(null, 10)
(null, 15)
(v2, 20)

instead of

value, timestamp, validTo
(v1, 0, 5)
(v2, 20, null)

The benefit of conciseness would already apply to one single tombstone.

On the other hand, why would somebody write consecutive tombstones into a versioned state store? I guess if somebody does that on purpose, then there should be a way to retrieve each of those tombstones, right? So maybe we need both -- validTo field and the option to return tombstones. The latter might be moved to a future KIP in case we see the need.


Regarding .within(fromTs, toTs):
I would keep it simple with .from() and .asOfTimestamp() (or .until()). If we go with .within(), I would opt for .withinTimeRange(fromTs, toTs), because the query becomes more readable:

MultiVersionedKeyQuery
   .withKey(1)
  .withinTimeRange(Instant.parse(2023-08-03T10:37:30.00Z), Instant.parse(2023-08-04T10:37:30.00Z))

If we stay with .from() and .until(), we should consider .fromTime() and .untilTime() (or .toTime()):

MultiVersionedKeyQuery
  .withKey(1)
  .fromTime(Instant.parse(2023-08-03T10:37:30.00Z))
  .untilTime(Instant.parse(2023-08-04T10:37:30.00Z))



Regarding asOf vs. until:
I think asOf() is more used in point in time queries as Walker mentioned where this KIP specifies a time range. IMO asOf() fits very well with KIP-960 where one version is queried, but here I think .until() fits better. That might just be a matter of taste and in the end I am fine with both as long as it is well documented.


Regarding getters without "get":
In the other IQv2 classes we used getters with "get". In general, we tend to move away from using getters without "get", recently. So I would keep the "get".


Best,
Bruno

On 10/3/23 7:49 PM, Walker Carlson wrote:
Hey Alieh thanks for the KIP,

Weighing in on the AsOf vs Until debate I think either is fine from a
natural language perspective. Personally AsOf makes more sense to me where
until gives me the idea that the query is making a change. It's totally a
connotative difference and not that important. I think as of is pretty
frequently used in point of time queries.

Also for these methods it makes sense to drop the "get" We don't
normally use that in getters

    * The key that was specified for this query.
    */
   public K getKey();

   /**
    * The starting time point of the query, if specified
    */
   public Optional<Instant> getFromTimestamp();

   /**
    * The ending time point of the query, if specified
    */
   public Optional<Instant> getAsOfTimestamp();

Other than that I didn't have too much to add. Overall I like the direction
of the KIP and think the funcatinlyt is all there!
best,
Walker



On Mon, Oct 2, 2023 at 10:46 PM Matthias J. Sax <mj...@apache.org> wrote:

Thanks for the updated KIP. Overall I like it.

Victoria raises a very good point, and I personally tend to prefer (I
believe so does Victoria, but it's not totally clear from her email) if
a range query would not return any tombstones, ie, only two records in
Victoria's example. Thus, it seems best to include a `validTo` ts-field
to `VersionedRecord` -- otherwise, the retrieved result cannot be
interpreted correctly.

Not sure what others think about it.

I would also be open to actually add a `includeDeletes()` (or
`includeTombstones()`) method/flag (disabled by default) to allow users
to get all tombstone: this would only be helpful if there are two
consecutive tombstone though (if I got it right), so not sure if we want
to add it or not -- it seems also possible to add it later if there is
user demand for it, so it might be a premature addition as this point?


Nit:

the public interface ValueIterator is used

"is used" -> "is added" (otherwise it sounds like as if `ValueIterator`
exist already)



Should we also add a `.within(fromTs, toTs)` (or maybe some better
name?) to allow specifying both bounds at once? The existing
`RangeQuery` does the same for specifying the key-range, so might be
good to add for time-range too?



-Matthias


On 9/6/23 5:01 AM, Bruno Cadonna wrote:
In my last e-mail I missed to finish a sentence.

"I think from a KIP"

should be

"I think the KIP looks good!"


On 9/6/23 1:59 PM, Bruno Cadonna wrote:
Hi Alieh,

Thanks for the KIP!

I think from a KIP

1.
I propose to throw an IllegalArgumentException or an
IllegalStateException for meaningless combinations. In any case, the
KIP should specify what exception is thrown.

2.
Why does not specifying a range return the latest version? I would
expect that it returns all versions since an empty lower or upper
limit is interpreted as no limit.

3.
I second Matthias comment about replacing "asOf" with "until" or "to".

4.
Do we need "allVersions()"? As I said above I would return all
versions if no limits are specified. I think if we get rid of
allVersions() there might not be any meaningless combinations anymore.
If a user applies twice the same limit like for example
MultiVersionedKeyQuery.with(key).from(t1).from(t2) the last one wins.

5.
Could you add some more examples with time ranges to the example
section?

6.
The KIP misses the test plan section.

7.
I propose to rename the class to "MultiVersionKeyQuery" since we are
querying multiple versions of the same key.

8.
Could you also add withAscendingTimestamps()? IMO it gives users the
possibility to make their code more readable instead of only relying
on the default.

Best,
Bruno


On 8/17/23 4:13 AM, Matthias J. Sax wrote:
Thanks for splitting this part into a separate KIP!

For `withKey()` we should be explicit that `null` is not allowed.

(Looking into existing `KeyQuery` it seems the JavaDocs don't cover
this either -- would you like to do a tiny cleanup PR for this, or
fix on-the-side in one of your PRs?)



The key query returns all the records that are valid in the time
range starting from the timestamp {@code fromTimestamp}.

In the JavaDocs you use the phrase `are valid` -- I think we need to
explain what "valid" means? It might even be worth to add some
examples. It's annoying, but being precise if kinda important.

With regard to KIP-962, should we allow `null` for time bounds ? The
JavaDocs should also be explicit if `null` is allowed or not and what
the semantics are if allowed.



You are using `asOf()` however, because we are doing time-range
queries, to me using `until()` to describe the upper bound would
sound better (I am not a native speaker though, so maybe I am off?)


The key query returns all the records that have timestamp <= {@code
asOfTimestamp}.

This is only correct if not lower-bound is set, right?


In your reply to KIP-960 you mentioned:

the meaningless combinations are prevented by throwing exceptions.

We should add corresponding JavaDocs like:

     @throws IllegalArgumentException if {@code fromTimestamp} is
equal or
                                      larger than {@code untilTimestamp}

Or something similar.


With regard to KIP-960: if we need to introduce a `VersionedKeyQuery`
class for single-key-single-ts lookup, would we need to find a new
name for the query class of this KIP, given that the return type is
different?


-Matthias



On 8/16/23 10:57 AM, Alieh Saeedi wrote:
Hi all,

I splitted KIP-960
<
https://cwiki.apache.org/confluence/display/KAFKA/KIP-960%3A+Support+single-key_single-timestamp+interactive+queries+%28IQv2%29+for+versioned+state+stores

into three separate KIPs. Therefore, please continue discussions
about single-key, multi-timestamp interactive queries here. You can
see all
the addressed reviews on the following page. Thanks in advance.

KIP-968: Support single-key_multi-timestamp interactive queries
(IQv2) for
versioned state stores
<
https://cwiki.apache.org/confluence/display/KAFKA/KIP-968%3A+Support+single-key_multi-timestamp+interactive+queries+%28IQv2%29+for+versioned+state+stores


I look forward to your feedback!

Cheers,
Alieh



Reply via email to