Just catching up on this one.
50) I am also in favor of setting `validTo` in VersionedRecord for
single-key single-ts lookup; it seems better to return the proper
timestamp. The timestamp is already in the store and it's cheap to
extract it and add to the result, and it might be valuable information
for the user. Not sure though if we should deprecate the existing
constructor though, because for "latest" it's convenient to have?
60) Yes, I meant `VersionedRecord`. Sorry for the mixup.
80) We did discuss this question on KIP-985 (maybe you missed it Bruno).
It's kinda tricky.
Historically, it seems that IQv1, ie, the `ReadOnlyXxx` interfaces
provide a clear contract that `range()` is ascending and
`reverseRange()` is descending.
For `RangeQuery`, the question is, if we did implicitly inherit this
contract? Our conclusion on KIP-985 discussion was, that we did inherit
it. If this holds true, changing the contract would be a breaking change
(what might still be acceptable, given that the interface is annotated
as unstable, and that IQv2 is not widely adopted yet). I am happy to go
with the 3-option contract, but just want to ensure we all agree it's
the right way to go, and we are potentially willing to pay the price of
backward incompatibility.
Do we need a snapshot semantic or can we specify a weaker but still useful semantic?
I don't think we necessarily need it, but as pointed out by Lucas, all
existing queries provide it. Overall, my main point is really about not
implementing something "random", but defining a proper binding contract
that allows users to reason about it.
I general, I agree that weaker semantics might be sufficient, but I am
not sure if we can implement anything weaker in a reasonable way? Happy
to be convinced otherwise. (I have some example, that I will omit for
now, as I hope we can actually go with snapshot semantics.)
The RocksDB Snaptshot idea from Lucas sounds very promising. I was not
aware that we could do this with RocksDB (otherwise I might have
suggested it on the PR right away). I guess the only open question
remaining would be, if we can provide the same guarantees for a future
in-memory implementation for VersionedStores? It sounds possible to do,
but we should have some level of confidence about it?
-Matthias
On 11/14/23 6:33 AM, Lucas Brutschy wrote:
Hi Alieh,
I agree with Bruno that a weaker guarantee could be useful already,
and it's anyway possible to strengthen the guarantees later on. On the
other hand, it would be nice to provide a consistent view across all
segments if it doesn't come with major downsides, because until now IQ
does provide a consistent view (via iterators), and this would be the
first feature that diverges from that guarantee.
I think a consistent could be achieved relatively easily by creating a
snapshot (https://github.com/facebook/rocksdb/wiki/Snapshot) that will
ensure a consistent view on a single DB and using
`ReadOptions.setSnapshot` for the gets. Since versioned state stores
segments seem to be backed by a single RocksDB instance (that was
unclear to me during our earlier discussion), a single snapshot should
be enough to guarantee a consistent view - we just need to make sure
to clean up the snapshots after use, similar to iterators. If we
instead need a consistent view across multiple RocksDB instances, we
may have to acquire a write lock on all of those (could use the
current object monitors of our `RocksDB` interface) for the duration
of creating snapshots across all instances - which I think would also
be permissible performance-wise. Snapshots are just a sequence number
and should be pretty lightweight to create (they have, however,
downside when it comes to compaction just like iterators).
With that in mind, I would be in favor of at least exploring the
option of using snapshots for a consistent view here, before dropping
this useful guarantee.
Cheers,
Lucas
On Tue, Nov 14, 2023 at 2:20 PM Bruno Cadonna <cado...@apache.org> wrote:
Hi Alieh,
Regarding the semantics/guarantees of the query type:
Do we need a snapshot semantic or can we specify a weaker but still
useful semantic?
An option could be to guarantee that:
1. the query returns the latest version when the query arrived at the
state store
2. the query returns a valid history, i.e., versions with adjacent and
non-overlapping validity intervals.
I think guarantee 1 is quite obvious. Guarantee 2 maybe needs some
explanation. If we do not avoid writes to the state store during the
processing of interactive queries, it might for example happen that the
latest version in the state store moves to data structures that are
responsible for older versions. In our RocksDB implementation that are
the segments. Thus, it could be that during query processing Kafka
Streams reads the latest value x and encounters again x in a segment
because a processor put a newer version of x in the versioned state
store. A similar scenario might also happen to earlier versions. If
Streams does not account for such cases it could return invalid histories.
Maybe such weaker guarantees are enough for most use cases.
You could consider implementing weaker guarantees as I described and if
there is demand propose stricter guarantees in a follow-up KIP.
Maybe there are also other simpler guarantees that make sense.
Best,
Bruno
On 11/9/23 12:30 PM, Bruno Cadonna wrote:
Hi,
Thanks for the updates!
First my take on previous comments:
50)
I am in favor of deprecating the constructor that does not take the
validTo parameter. That implies that I am in favor of modifying get(key,
asOf) to set the correct validTo.
60)
I am in favor of renaming ValueIterator to VersionedRecordIterator and
define it as:
public interface VersionedRecordIterator<V> extends
Iterator<VersionedRecord<V>>
(Matthias, you mixed up ValueAndTimestamp with VersionedRecord in your
last e-mail, didn't you? Just double-checking if I understood what you
are proposing.)
70)
I agree with Matthias that adding a new method on the
VersionedKeyValueStore interface defeats the purpose of one of the goals
of IQv2, i.e., not to need to extend the state store interface for IQ. I
would say if we do not need the method in normal processing, we should
not extend the public state store interface. BTW, nobody forces you to
StoreQueryUtils. I think that internal utils class was introduced for
convenience to leverage existing methods on the state store interface.
80)
Why do we limit ourselves by specifying a default order on the result?
Different state store implementation might have different strategies to
store the records which affects the order in which the records are
returned if they are not sorted before they are returned to the user.
Some users might not be interested in an order of the result and so
there is no reason those users pay the cost for sorting. I propose to
not specify a default order but sort the results (if needed) when
withDescendingX() and withAscendingX() is specified on the query object.
Regarding the snapshot guarantees for the iterators, I need to think a
bit more about it. I will come back to this thread in the next days.
Best,
Bruno
On 11/8/23 5:30 PM, Alieh Saeedi wrote:
Thank you, Bruno and Matthias, for keeping the discussion going and for
reviewing the PR.
Here are the KIP updates:
- I removed the `peek()` from the `ValueIterator` interface since
we do
not need it.
- Yes, Bruno, the `validTo` field in the `VersionedRecod` class is
exclusive. I updated the javadocs for that.
Very important critical open questions. I list them here based on
priority
(descendingly).
- I implemented the `get(key, fromtime, totime)` method here
<https://github.com/aliehsaeedii/kafka/blob/9578b7cb7cdade22cc63f671693f5aeb993937ca/streams/src/main/java/org/apache/kafka/streams/state/internals/RocksDBVersionedStore.java#L262>:
the problem is that this implementation does not guarantee
consistency
because processing might continue interleaved (no snapshot
semantics is
implemented). More over, it materializes all results in memory.
- Solution 1: Use a lock and release it after retrieving all
desired
records from all segments.
- positive point: snapshot semantics is implemented
- negative points: 1) It is expensive since iterating over all
segments may take a long time. 2) It still requires
materializing results
on memory
- Solution 2: use `RocksDbIterator`.
- positive points: 1) It guarantees snapshot segments. 2) It
does
not require materializing results in memory.
- negative points: it is expensive because, anyway, we need to
iterate over all (many) segments.
Do you have any thoughts on this issue? (ref: Matthias's
comment
<https://github.com/apache/kafka/pull/14626#pullrequestreview-1709280589>)
- I added the field `validTo` in `VersionedRecord`. Its default
value is
MAX. But as Matthias mentioned, for the single-key single-ts
(`VersionedKeyQuery` in KIP-960), it may not always be true. If the
returned record belongs to an old segment, maybe it is not valid
any more.
So MAX is not the correct value for `ValidTo`. Two solutions come
to mind:
- Solution 1: make the `ValidTo` as an `Optional` and set it to
`empty` for the retuned result of `get(key, asOfTimestamp)`.
- Solution 2: change the implementation of `get(key,
asOfTimestamp)`
so that it finds the correct `validTo` for the returned
versionedRecord.
- In this KIP and the next one, even though the default
ordering is
with ascending ts, I added the method `withAscendingTimestamps()`
to have
more user readibility (as Bruno suggested), while Hanyu did only add
`withDescending...` methods (he did not need ascending because
that's the
default anyway). Matthias believes that we should not have
inconsistencies (he actually hates them:D). Shall I change my KIP
or Hanyu?
Thoughts?
That would be maybe helpful to look into the PR
<https://github.com/apache/kafka/pull/14626> for more clarity and even
review that ;-)
Cheers,
Alieh
On Thu, Nov 2, 2023 at 7:13 PM Bruno Cadonna <cado...@apache.org> wrote:
Hi Alieh,
First of all, I like the examples.
Is validTo in VersionedRecord exclusive or inclusive?
In the javadocs you write:
"@param validTo the latest timestamp that value is valid"
I think that is not true because the validity is defined by the start
time of the new version. The new and the old version cannot both be
valid at that same time.
Theoretically, you could set validTo to the start time of the new
version - 1. However, what is the unit of the 1? Is it nanoseconds?
Milliseconds? Seconds? Sure we could agree on one, but I think it is
more elegant to just make the validTo exclusive. Actually, you used it
as exclusive in your examples.
Thanks for the KIP!
Best,
Bruno
On 11/1/23 9:01 PM, Alieh Saeedi wrote:
Hi all,
@Matthias: I think Victoria was right. I must add the method `get(key,
fromTime, toTime)` to the interface `VersionedKeyValueStore`. Right
now,
having the method only in `RocksDBVersionedStore`, made me to have an
instance of `RocksDBVersionedStore` (instead of
`VersionedKeyValueStore`)
in `StoreQueryUtils.runMultiVersionedKeyQuery()` method. In future, we
are
going to use the same method for In-memory/SPDB/blaBla versioned
stores.
Then either this method won't work any more, or we have to add code (if
clauses) for each type of versioned stores. What do you think about
that?
Bests,
Alieh
On Tue, Oct 24, 2023 at 10:01 PM Alieh Saeedi <asae...@confluent.io>
wrote:
Thank you, Matthias, Bruno, and Guozhang for keeping the discussion
going.
Here is the list of changes I made:
1. I enriched the "Example" section as Bruno suggested. Do you
please
have a look at that section? I think I devised an interesting one
;-)
2. As Matthias and Guozhang suggested, I renamed variables and
methods
as follows:
- "fromTimestamp" -> "fromTime"
- "asOfTimestamp" -> "toTime"
- "from(Instant fromTime)" -> "fromTime(Instant fromTime)"
- "asOf(Instant toTime)" -> "toTime(Instant toTime)"
3.
About throwing an NPE when time bounds are `null`: Ok,
Matthias, I
am
convinced by your idea. I consider a null timestamp as "no
bound".
But I
had to change KIP-960 and the corresponding PR to be
consistent as
well.
Answering questions and some more discussion points:
1.
For now, I keep the class names as they are.
2.
About the new field "validTo" in VersionedRecord. Yes Matthias I
keep
the old constructor, which does not have `validTo` as an input
parameter.
But in the body of the old constructor, I consider the default
value MAX
for the validTo field. I think MAX must be the default value for
`validTo`
since before inserting a tombstone or a new value for the same
key,
the
value must be valid till iternity.
3.
Regarding changing the ValueIterator interface from `public
interface
ValueIterator<V> extends Iterator<V>` to `public interface
ValueIterator<V> extends Iterator<VersionedRecord<V>>`:
Matthias, I
do not
know how it improves type safety because the
MultiVersionedKeyQuery
class
returns a ValueIterator of VersionedRecord any way. But if we
want
to be
consistent with KeyValueIterator, we must apply the changes you
suggested.
4.
Regarding adding the new `get(key, fromTime, toTime)` method
to the
public interface `VersionedKeyValueStore` or only adding it to
the
class `RocksDBVersionedStore`: In the KIP, I changed the
interface
as
Victoria suggested. But still, I am not convinced why we do that.
@Victoria:
Do you please clarify why we have to define it in the interface?
More
specifically, why do we need to use generic
VersionedKeyValueStore
instead of simply using the implementing classes?
Cheers,
Alieh
On Sat, Oct 14, 2023 at 3:35 AM Guozhang Wang <
guozhang.wang...@gmail.com>
wrote:
Thanks Alieh for the KIP, as well as a nice summary of all the
discussions! Just my 2c regarding your open questions:
1. I just checked KIP-889
(
https://cwiki.apache.org/confluence/display/KAFKA/KIP-889%3A+Versioned+State+Stores
)
and we used "VersionedRecord<V> get(K key, long asOfTimestamp);",
so I
feel to be consistent with this API is better compared with being
consistent with "WindowKeyQuery"?
3. I agree with Matthias that naming is always tricky, and I also
tend
to be consistent with what we already have (even if in retro it may
not be the best idea :P and if that was really becoming a complaint,
we would change it across the board in one shot as well later).
Guozhang
On Wed, Oct 11, 2023 at 9:12 PM Matthias J. Sax <mj...@apache.org>
wrote:
Thanks for the update!
Some thoughts about changes you made and open questions you raised:
10) About asOf vs until: I was just looking into `WindowKeyQuery`,
`WindowRangeQuery` and also `ReadOnlyWindowStore` interfaces. For
those,
we use "timeFrom" and "timeTo", so it seems best to actually use
`to(Instant toTime)` to keep the naming consistent across the board?
If yes, we should also do `from (Instant fromTime)` and use getters
`fromTime()` and `toTime()` -- given that it's range bounds it seems
acceptable to me, to diverge a little bit from KIP-960
`asOfTimestamp()`
-- but we could also rename it to `asOfTime()`? -- Given that we
strongly type with `Instant` I am not worried about semantic
ambiguity.
20) About throwing a NPE when time bounds are `null` -- why? (For
the
key it makes sense as is mandatory to have a key.) Could we not
interpret `null` as "no bound". We did KIP-941 to add `null` for
open-ended `RangeQueries`, so I am wondering if we should just stick
to
the same semantics?
Cf
https://cwiki.apache.org/confluence/display/KAFKA/KIP-941%3A+Range+queries+to+accept+null+lower+and+upper+bounds
30) About the class naming. That's always tricky, and I am not
married
to my proposal. I agree with Bruno that the other suggested names
are
not really better. -- The underlying idea was, to get some
"consistent"
naming across the board.
Existing `KeyQuery`
New `VersionedKeyQuery` (KIP-960; we add a prefix)
New `MultiVersionKeyQuery` (this KIP; extend the prefix with a
pre-prefix)
Existing `RangeQuery`
New `MultiVersionRangeQuery` (KIP-969; add same prefix as above)
40) I am fine with not adding `range(from, to)` -- it was just an
idea.
Some more follow up question:
50) You propose to add a new constructor and getter to
`VersionedRecord`
-- I am wondering if this implies that `validTo` is optional because
the
existing constructor is not deprecated? -- Also, what happens if
`validTo` is not set and `valueTo()` is called? Or do we intent to
make
`validTo` mandatory?
Maybe this question can only be answered when working on the code,
but I
am wondering if we should make `validTo` mandatory or not... And
what
the "blast radius" of changing `VersionedRecord` will be in general.
Do
you have already some POC PR that we could look at to get some
signals
about this?
60) The new query class is defined to return
`ValueIterator<VersionedRecord<V>>` -- while I like the idea to add
`ValueIterator<V>` in a generic way on the one hand, I am
wondering if
it might be better to change it, and enforce its usage (ie, return
type)
of `VersionedRecord` to improve type safety (type erasure is often a
pain, and we could mitigate it this way).
Btw: We actually do a similar thing for `KeyValueIterator`.
Ie,
public interface ValueIterator<V> extends
Iterator<ValueAndTimestamp<V>>
and
ValueAndTimestamp<V> peek();
This would imply that the return type of the new query is
`ValueIterator<V>` on the interface what seems simpler and more
elegant?
If we go with the change, I am also wondering if we need to find a
better name for the new iterator class? Maybe `VersionIterator` or
something like this?
Of course it might limit the use of `ValueIterator` for other value
types -- not sure if this a limitation that is prohibitive? My gut
feeling is, that is should not be too limiting.
70) Do we really need the change in `VersionedKeyValueStore` and
add a
new method? In the end, the idea of IQv2 is to avoid exactly this...
It
was the main issue for IQv1, that the base interface of the store
needed
an update and thus all classed implementing the base interface,
making
it very cumbersome to add new query types. -- Of course, we need
this
new method on the actually implementation (as private method)
that can
be called from `query()` method, but adding it to the interface
seems
to
defeat the purpose of IQv2.
Note, for existing IQv2 queries types that go against others stores,
the
public methods already existed when IQv2 was introduces, and thus
the
implementation of these query types just pragmatically re-used
existing
methods -- but it does not imply that new public method should be
added.
-Matthias
On 10/11/23 5:11 AM, Bruno Cadonna wrote:
Thanks for the updates, Alieh!
The example in the KIP uses the allVersions() method which we
agreed
to
remove.
Regarding your questions:
1. asOf vs. until: I am fine with both but slightly prefer until.
2. What about KeyMultiVersionsQuery, KeyVersionsQuery (KIP-960
would
then be KeyVersionQuery). However, I am also fine with
MultiVersionedKeyQuery since none of the names sounds better or
worse
to
me.
3. I agree with you not to introduce the method with the two bounds
to
keep things simple.
4. Forget about fromTime() an asOfTime(), from() and asOf() is
fine.
5. The main purpose is to show how to use the API. Maybe make an
example
with just the key to distinguish this query from the single value
query
of KIP-960 and then one with a key and a time range. When you
iterate
over the results you could also call validTo(). Maybe add some
actual
records in the comments to show what the result might look like.
Regarding the test plan, I hope you also plan to add unit tests in
all
of your KIPs. Maybe you could also explain why system tests are not
needed here.
Best,
Bruno
On 10/10/23 5:36 PM, Alieh Saeedi wrote:
Thank you all for the very exact and constructive comments. I
really
enjoyed reading your ideas and all the important points you
made me
aware
of. I updated KIP-968 as follows:
1. If the key or time bounds are null, the method returns
NPE.
2. The "valid" word: I removed the sentence "all the records
that are
valid..." and replaced it with an exact explanation. More
over, I
explained
it with an example in the KIP but not in the javadocs. Do I
need
to add the
example to the javadocs as well?
3. Since I followed Bruno's suggestion and removed the
allVersions()
method, the problem of meaningless combinations is
solved, and
I
do not
need any IllegalArgumentException or something like that.
Therefore, the
change is that if no time bound is specified, the query
returns
the records
with the specified key for all timestamps (all versions).
4. As Victoria suggested, adding a method to the
*VersionedKeyValueStore
*interface is essential. So I did that. I had this method
only
in the
RocksDBVersionedStore class, which was not enough.
5. I added the *validTo* field to the VersionedRecord
class to
be
able
to represent the tombstones. As you suggested, we postpone
solving
the
problem of retrieving consecutive tombstones for later.
6. I added the "Test Plan" section to all KIPs. I hope
what I
wrote is
convincing.
7. I added the *withAscendingTimestamp()* method to provide
more
code readability
for the user.
8. I removed the evil word "get" from all getter methods.
There have also been some more suggestions which I am still not
convinced
or clear about them:
1. Regarding asOf vs until: reading all comments, my
conclusion
was that
I keep it as "asOf" (following Walker's idea as the native
speaker
as well
as Bruno's suggestion to be consistent with
single-key_single_ts
queries).
But I do not have a personal preference. If you insist on
"until",
I change
it.
2. Bruno suggested renaming the class
"MultiVersionedKeyQuery"
to sth
else. We already had a long discussion about the name with
Matthias. I am
open to renaming it to something else, but do you have any
ideas?
3. Matthias suggested having a method with two input
parameters
that
enables the user to specify both time bounds in the same
method.
Isn't it
introducing redundancy? It is somehow disrespectful to
the idea
of
having
composable methods.
4. Bruno suggested renaming the methods "asOf" and "from" to
"asOfTime"
and "fromTime". If I do that, then it is not consistent with
KIP-960.
Moreover, the input parameter is clearly a timestamp, which
explains
enough. What do you think about that?
5. I was asked to add more examples to the example
section. My
question
is, what is the main purpose of that? If I know it clearly,
then
I
can add
what you mean.
Cheers,
Alieh
On Tue, Oct 10, 2023 at 1:13 AM Matthias J. Sax <mj...@apache.org>
wrote:
Bruno and I had some background conversation about the `get`
prefix
question including a few other committers.
The official policy was never changed, and we should not add the
`get`-prefix. It's a slip on our side in previous KIPs to add the
`get`-prefix and we should actually clean it up doing a follow up
KIP.
-Matthias
On 10/5/23 5:26 AM, Bruno Cadonna wrote:
Hi Matthias,
Given all the IQv2 KIPs that use getX and given recent PRs
(internal
interfaces mainly) that got merged, I was under the impression
that we
moved away from the strict no-getX policy.
I do not think it was an accident using getX in the IQv2 KIPs
since
somebody would have brought it up, otherwise.
I am fine with both types of getters.
If we think, we need to discuss this in a broader context, let's
start a
separate thread.
Best,
Bruno
On 10/5/23 7:44 AM, Matthias J. Sax wrote:
I agree to (almost) everything what Bruno said.
In general, we tend to move away from using getters without
"get",
recently. So I would keep the "get".
This is new to me? Can you elaborate on this point? Why do you
think
that's the case?
I actually did realize (after Walker mentioned it) that
existing
query
types use `get` prefix, but to me it seems that it was by
accident and
we should consider correcting it? Thus, I would actually prefer
to not
add the `get` prefix for new methods query types.
IMHO, we should do a follow up KIP to deprecate all methods
with
`get`
prefix and replace them with new ones without `get` -- it's of
course
always kinda "unnecessary" noise, but if we don't do it, we
might
get
into more and more inconsistent naming what would result in a
"bad"
API.
If we indeed want to change the convention and use the `get`
prefix, I
would strongly advocate to bit the bullet and do KIP to
pro-actively
add the `get` "everywhere" it's missing... But overall, it
seems
to be
a much broader decision, and we should get buy in from many
committers
about it -- as long as there is no broad consensus to add `get`
everywhere, I would strongly prefer not to diverge from the
current
agreement to omit `get`.
-Matthias
On 10/4/23 2:36 AM, Bruno Cadonna wrote:
Hi,
Regarding tombstones:
As far as I understand, we need to add either a validTo
field to
VersionedRecord or we need to return tombstones, otherwise the
result
is not complete, because users could never know a record was
deleted
at some point before the second non-null value was put.
I like more adding the validTo field since it makes the result
more
concise and easier interpretable.
Extending on Victoria's example, with the following puts
put(k, v1, time=0)
put(k, null, time=5)
put(k, null, time=10)
put(k, null, time=15)
put(k, v2, time=20)
the result with tombstones would be
value, timestamp
(v1, 0)
(null, 5)
(null, 10)
(null, 15)
(v2, 20)
instead of
value, timestamp, validTo
(v1, 0, 5)
(v2, 20, null)
The benefit of conciseness would already apply to one single
tombstone.
On the other hand, why would somebody write consecutive
tombstones
into a versioned state store? I guess if somebody does that on
purpose, then there should be a way to retrieve each of those
tombstones, right?
So maybe we need both -- validTo field and the option to
return
tombstones. The latter might be moved to a future KIP in
case we
see
the need.
Regarding .within(fromTs, toTs):
I would keep it simple with .from() and .asOfTimestamp() (or
.until()). If we go with .within(), I would opt for
.withinTimeRange(fromTs, toTs), because the query becomes more
readable:
MultiVersionedKeyQuery
.withKey(1)
.withinTimeRange(Instant.parse(2023-08-03T10:37:30.00Z),
Instant.parse(2023-08-04T10:37:30.00Z))
If we stay with .from() and .until(), we should consider
.fromTime()
and .untilTime() (or .toTime()):
MultiVersionedKeyQuery
.withKey(1)
.fromTime(Instant.parse(2023-08-03T10:37:30.00Z))
.untilTime(Instant.parse(2023-08-04T10:37:30.00Z))
Regarding asOf vs. until:
I think asOf() is more used in point in time queries as Walker
mentioned where this KIP specifies a time range. IMO asOf()
fits
very
well with KIP-960 where one version is queried, but here I
think
.until() fits better. That might just be a matter of taste and
in the
end I am fine with both as long as it is well documented.
Regarding getters without "get":
In the other IQv2 classes we used getters with "get". In
general, we
tend to move away from using getters without "get",
recently. So
I
would keep the "get".
Best,
Bruno
On 10/3/23 7:49 PM, Walker Carlson wrote:
Hey Alieh thanks for the KIP,
Weighing in on the AsOf vs Until debate I think either is
fine
from a
natural language perspective. Personally AsOf makes more
sense
to me
where
until gives me the idea that the query is making a change.
It's
totally a
connotative difference and not that important. I think as
of is
pretty
frequently used in point of time queries.
Also for these methods it makes sense to drop the "get" We
don't
normally use that in getters
* The key that was specified for this query.
*/
public K getKey();
/**
* The starting time point of the query, if specified
*/
public Optional<Instant> getFromTimestamp();
/**
* The ending time point of the query, if specified
*/
public Optional<Instant> getAsOfTimestamp();
Other than that I didn't have too much to add. Overall I like
the
direction
of the KIP and think the funcatinlyt is all there!
best,
Walker
On Mon, Oct 2, 2023 at 10:46 PM Matthias J. Sax <
mj...@apache.org>
wrote:
Thanks for the updated KIP. Overall I like it.
Victoria raises a very good point, and I personally tend to
prefer (I
believe so does Victoria, but it's not totally clear from
her
email) if
a range query would not return any tombstones, ie, only two
records
in
Victoria's example. Thus, it seems best to include a
`validTo`
ts-field
to `VersionedRecord` -- otherwise, the retrieved result
cannot
be
interpreted correctly.
Not sure what others think about it.
I would also be open to actually add a `includeDeletes()`
(or
`includeTombstones()`) method/flag (disabled by default) to
allow
users
to get all tombstone: this would only be helpful if there
are
two
consecutive tombstone though (if I got it right), so not
sure
if we
want
to add it or not -- it seems also possible to add it
later if
there
is
user demand for it, so it might be a premature addition as
this
point?
Nit:
the public interface ValueIterator is used
"is used" -> "is added" (otherwise it sounds like as if
`ValueIterator`
exist already)
Should we also add a `.within(fromTs, toTs)` (or maybe some
better
name?) to allow specifying both bounds at once? The existing
`RangeQuery` does the same for specifying the key-range, so
might be
good to add for time-range too?
-Matthias
On 9/6/23 5:01 AM, Bruno Cadonna wrote:
In my last e-mail I missed to finish a sentence.
"I think from a KIP"
should be
"I think the KIP looks good!"
On 9/6/23 1:59 PM, Bruno Cadonna wrote:
Hi Alieh,
Thanks for the KIP!
I think from a KIP
1.
I propose to throw an IllegalArgumentException or an
IllegalStateException for meaningless combinations. In any
case,
the
KIP should specify what exception is thrown.
2.
Why does not specifying a range return the latest
version? I
would
expect that it returns all versions since an empty
lower or
upper
limit is interpreted as no limit.
3.
I second Matthias comment about replacing "asOf" with
"until" or
"to".
4.
Do we need "allVersions()"? As I said above I would return
all
versions if no limits are specified. I think if we get rid
of
allVersions() there might not be any meaningless
combinations
anymore.
If a user applies twice the same limit like for example
MultiVersionedKeyQuery.with(key).from(t1).from(t2) the
last
one
wins.
5.
Could you add some more examples with time ranges to the
example
section?
6.
The KIP misses the test plan section.
7.
I propose to rename the class to "MultiVersionKeyQuery"
since we
are
querying multiple versions of the same key.
8.
Could you also add withAscendingTimestamps()? IMO it gives
users
the
possibility to make their code more readable instead of
only
relying
on the default.
Best,
Bruno
On 8/17/23 4:13 AM, Matthias J. Sax wrote:
Thanks for splitting this part into a separate KIP!
For `withKey()` we should be explicit that `null` is not
allowed.
(Looking into existing `KeyQuery` it seems the JavaDocs
don't
cover
this either -- would you like to do a tiny cleanup PR for
this, or
fix on-the-side in one of your PRs?)
The key query returns all the records that are valid in
the
time
range starting from the timestamp {@code fromTimestamp}.
In the JavaDocs you use the phrase `are valid` -- I think
we
need to
explain what "valid" means? It might even be worth to add
some
examples. It's annoying, but being precise if kinda
important.
With regard to KIP-962, should we allow `null` for time
bounds ?
The
JavaDocs should also be explicit if `null` is allowed or
not and
what
the semantics are if allowed.
You are using `asOf()` however, because we are doing
time-range
queries, to me using `until()` to describe the upper
bound
would
sound better (I am not a native speaker though, so
maybe I
am
off?)
The key query returns all the records that have
timestamp
<=
{@code
asOfTimestamp}.
This is only correct if not lower-bound is set, right?
In your reply to KIP-960 you mentioned:
the meaningless combinations are prevented by throwing
exceptions.
We should add corresponding JavaDocs like:
@throws IllegalArgumentException if {@code
fromTimestamp} is
equal or
larger than
{@code
untilTimestamp}
Or something similar.
With regard to KIP-960: if we need to introduce a
`VersionedKeyQuery`
class for single-key-single-ts lookup, would we need to
find
a new
name for the query class of this KIP, given that the
return
type
is
different?
-Matthias
On 8/16/23 10:57 AM, Alieh Saeedi wrote:
Hi all,
I splitted KIP-960
<
https://cwiki.apache.org/confluence/display/KAFKA/KIP-960%3A+Support+single-key_single-timestamp+interactive+queries+%28IQv2%29+for+versioned+state+stores
into three separate KIPs. Therefore, please continue
discussions
about single-key, multi-timestamp interactive queries
here. You
can
see all
the addressed reviews on the following page. Thanks in
advance.
KIP-968: Support single-key_multi-timestamp interactive
queries
(IQv2) for
versioned state stores
<
https://cwiki.apache.org/confluence/display/KAFKA/KIP-968%3A+Support+single-key_multi-timestamp+interactive+queries+%28IQv2%29+for+versioned+state+stores
I look forward to your feedback!
Cheers,
Alieh