Great discussion!

It seems the only open question might be about ordering guarantees? IIRC, we had a discussion about this in the past.


Technically (at least from my POV), existing `RangeQuery` does not have a guarantee that data is return in any specific order (not even on a per partitions bases). It just happens that RocksDB (and as pointed out by Hanyu already, also the built-in in-memory store that is base on a tree-map) allows us to return data ordered by key; as mentioned already, this guarantee is limited on a per partition basis.

If there would be custom store base on a hashed key-value store, this store could implement RangeQuery and return data (even for a single partition) with no ordering, without violating the contract.



Thus, it could actually make sense, to extend `RangeQuery` and allow three options: no-order, ascending, descending. For our existing Rocks/InMemory implementations, no-order could be equal to ascending and nothing changes effectively, but it might be a better API contract? -- If we assume that there might be a custom hash-based store, such a store could reject a query if "ascending" is required, or might need to do more work to implement it (up to the store maintainer). This is actually the beauty of IQv2 that different stores can pick what queries they want to support.

From an API contract point of view, it seems confusing to say: specifying nothing means no guarantee (or ascending if the store can offer it), but descending can we explicitly request. Thus, a hash-based store, might be able to accept "order not specified query", but would reject "descending". This seems to be somewhat unbalanced?

Thus, I am wondering if we should actually add `withAscendingKeys()`, too, even if it won't impact our current RocksDB/In-Memory implementations?


The second question is about per-partition or across-partition ordering: it's not possible right now to actually offer across-partition ordering the way IQv2 is setup. The reason is, that the store that implements a query type, is always a single shard. Thus, the implementation does not have access to other shards. It's hard-coded inside Kafka Streams, to query each shared, and to "accumulate" partial results, and return the back to the user. Note that the API is:


StateQueryResult<R> result = KafkaStreams.query(...);
Map<Integer, QueryResult<R>> resultPerPartitions = result.getPartitionResults();


Thus, if we would want to offer across-partition ordering, we cannot do it right now, because Kafka Streams does not know anything about the semantics of the query it distributes... -- the result is an unknown type <R>. We would need to extend IQv2 with an additional mechanism, that allows users to plug in more custom code to "merge" multiple partitions result into a "global result". This is clearly out-of-scope for this KIP and would require a new KIP by itself.

I seems that this contract, which is independent of the query type is not well understood, and thus a big +1 to fix the documentation. I don't think that this KIP must "define" anything, but it might of course be worth to add the explanation why the KIP cannot even offer global-ordering, as it's defined/limited by the IQv2 "framework" itself, not the individual queries.



-Matthias




On 10/4/23 4:38 PM, Hao Li wrote:
Hi Hanyu,

Thanks for the KIP! Seems there are already a lot of good discussions. I
only have two comments:

1. Please make it clear in
```
     /**
      * Interactive range query using a lower and upper bound to filter the
keys returned.
      * @param lower The key that specifies the lower bound of the range
      * @param upper The key that specifies the upper bound of the range
      * @param <K> The key type
      * @param <V> The value type
      */
     public static <K, V> RangeQuery<K, V> withRange(final K lower, final K
upper) {
         return new RangeQuery<>(Optional.ofNullable(lower),
Optional.ofNullable(upper), true);
     }
```
that a `null` in lower or upper parameter means it's unbounded.
2. What's the behavior if lower is 3 and upper is 1? Is it IllegalArgument
or will this return an empty result? Maybe also clarify this in the
document.

Thanks,
Hao


On Wed, Oct 4, 2023 at 9:27 AM Hanyu (Peter) Zheng
<pzh...@confluent.io.invalid> wrote:

For testing purposes, we previously used a Set to record the results in
IQv2StoreIntegrationTest. Let's take an example where we now have two
partitions and four key-value pairs: <0,0> in p0, <1,1> in p1, <2,2> in p0,
and <3,3> in p1.

If we execute withRange(1,3), it will return a Set of <1, 2, 3>. However,
if we run withRange(1,3).withDescendingKeys(), and still use a Set, the
result will again be a Set of <1,2,3>. This means we won't be able to
determine whether the results have been reversed.

To resolve this ambiguity, I've switched to using a List to record the
results, ensuring the order of retrieval from partitions p0 and p1. So,
withRange(1,3) would yield a List of [2, 1, 3], whereas
withRange(1,3).withDescendingKeys() would produce a List of [2,3,1].

This ordering makes sense since RocksDB sorts its keys, and InMemoryStore
uses a TreeMap structure, which means the keys are already sorted.

Sincerely,
Hanyu

On Wed, Oct 4, 2023 at 9:25 AM Hanyu (Peter) Zheng <pzh...@confluent.io>
wrote:

Hi,  Bruno

Thank you for your suggestions, I will update them soon.
Sincerely,

Hanyu

On Wed, Oct 4, 2023 at 9:25 AM Hanyu (Peter) Zheng <pzh...@confluent.io>
wrote:

Hi, Lucas,

Thank you for your suggestions.
I will update the KIP and code together.

Sincerely,
Hanyu

On Tue, Oct 3, 2023 at 8:16 PM Hanyu (Peter) Zheng <pzh...@confluent.io

wrote:

If we use  WithDescendingKeys() to generate a RangeQuery to do the
reveseQuery, how do we achieve the methods like withRange,
withUpperBound,
and withLowerBound only in this method?

On Tue, Oct 3, 2023 at 8:01 PM Hanyu (Peter) Zheng <
pzh...@confluent.io>
wrote:

I believe there's no need to introduce a method like
WithDescendingKeys(). Instead, we can simply add a reverse flag to
RangeQuery. Each method within RangeQuery would then accept an
additional
parameter. If the reverse is set to true, it would indicate the
results
should be reversed.

Initially, I introduced a reverse variable. When set to false, the
RangeQuery class behaves normally. However, when reverse is set to
true,
the RangeQuery essentially takes on the functionality of
ReverseRangeQuery.
Further details can be found in the "Rejected Alternatives" section.

In my perspective, RangeQuery is a class responsible for creating a
series of RangeQuery objects. It offers methods such as withRange,
withUpperBound, and withLowerBound, allowing us to generate objects
representing different queries. I'm unsure how adding a
withDescendingOrder() method would be compatible with the other
methods,
especially considering that, based on KIP 969, WithDescendingKeys()
doesn't
appear to take any input variables. And if withDescendingOrder()
doesn't
accept any input, how does it return a RangeQuery?

On Tue, Oct 3, 2023 at 4:37 PM Hanyu (Peter) Zheng <
pzh...@confluent.io>
wrote:

Hi, Colt,
The underlying structure of inMemoryKeyValueStore is treeMap.
Sincerely,
Hanyu

On Tue, Oct 3, 2023 at 4:34 PM Hanyu (Peter) Zheng <
pzh...@confluent.io> wrote:

Hi Bill,
1. I will update the KIP in accordance with the PR and synchronize
their future updates.
2. I will use that name.
3. you mean add something about ordering at the motivation section?

Sincerely,
Hanyu


On Tue, Oct 3, 2023 at 4:29 PM Hanyu (Peter) Zheng <
pzh...@confluent.io> wrote:

Hi, Walker,

1. I will update the KIP in accordance with the PR and synchronize
their future updates.
2. I will use that name.
3. I'll provide additional details in that section.
4. I intend to utilize rangeQuery to achieve what we're referring
to
as reverseQuery. In essence, reverseQuery is merely a term. To
clear up any
ambiguity, I'll make necessary adjustments to the KIP.

Sincerely,
Hanyu



On Tue, Oct 3, 2023 at 4:09 PM Hanyu (Peter) Zheng <
pzh...@confluent.io> wrote:

Ok, I will change it back to following the code, and update them
together.

On Tue, Oct 3, 2023 at 2:27 PM Walker Carlson
<wcarl...@confluent.io.invalid> wrote:

Hello Hanyu,

Looking over your kip things mostly make sense but I have a
couple
of
comments.


    1. You have "withDescandingOrder()". I think you mean
"descending" :)
    Also there are still a few places in the do where its called
"setReverse"
    2. Also I like "WithDescendingKeys()" better
    3. I'm not sure of what ordering guarantees we are offering.
Perhaps we
    can add a section to the motivation clearly spelling out the
current
    ordering and the new offering?
    4. When you say "use unbounded reverseQuery to achieve
reverseAll" do
    you mean "use unbounded RangeQuery to achieve reverseAll"? as
far as I can
    tell we don't have a reverseQuery as a named object?


Looking good so far

best,
Walker

On Tue, Oct 3, 2023 at 2:13 PM Colt McNealy <c...@littlehorse.io

wrote:

Hello Hanyu,

Thank you for the KIP. I agree with Matthias' proposal to keep
the naming
convention consistent with KIP-969. I favor the
`.withDescendingKeys()`
name.

I am curious about one thing. RocksDB guarantees that records
returned
during a range scan are lexicographically ordered by the bytes
of the keys
(either ascending or descending order, as specified in the
query). This
means that results within a single partition are indeed
ordered.** My
reading of KIP-805 suggests to me that you don't need to
specify
the
partition number you are querying in IQv2, which means that you
can have a
valid reversed RangeQuery over a store with "multiple
partitions" in it.

Currently, IQv1 does not guarantee order of keys in this
scenario. Does
IQv2 support ordering across partitions? Such an implementation
would
require opening a rocksdb range scan** on multiple rocksdb
instances (one
per partition), and polling the first key of each. Whether or
not this is
ordered, could we please add that to the documentation?

**(How is this implemented/guaranteed in an
`inMemoryKeyValueStore`? I
don't know about that implementation).

Colt McNealy

*Founder, LittleHorse.dev*


On Tue, Oct 3, 2023 at 1:35 PM Hanyu (Peter) Zheng
<pzh...@confluent.io.invalid> wrote:

ok, I will update it. Thank you  Matthias

Sincerely,
Hanyu

On Tue, Oct 3, 2023 at 11:23 AM Matthias J. Sax <
mj...@apache.org>
wrote:

Thanks for the KIP Hanyu!


I took a quick look and it think the proposal makes sense
overall.

A few comments about how to structure the KIP.

As you propose to not add `ReverseRangQuery` class, the
code
example
should go into "Rejected Alternatives" section, not in the
"Proposed
Changes" section.

For the `RangeQuery` code example, please omit all existing
methods
etc,
and only include what will be added/changed. This make it
simpler to
read the KIP.


nit: typo

  the fault value is false

Should be "the default value is false".


Not sure if `setReverse()` is the best name. Maybe
`withDescandingOrder`
(or similar, I guess `withReverseOrder` would also work)
might be
better? Would be good to align to KIP-969 proposal that
suggest do use
`withDescendingKeys` methods for "reverse key-range"; if we
go with
`withReverseOrder` we should change KIP-969 accordingly.

Curious to hear what others think about naming this
consistently across
both KIPs.


-Matthias


On 10/3/23 9:17 AM, Hanyu (Peter) Zheng wrote:





https://cwiki.apache.org/confluence/display/KAFKA/KIP-985%3A+Add+reverseRange+and+reverseAll+query+over+kv-store+in+IQv2




--

[image: Confluent] <https://www.confluent.io>
Hanyu (Peter) Zheng he/him/his
Software Engineer Intern
+1 (213) 431-7193 <+1+(213)+431-7193>
Follow us: [image: Blog]
<



https://www.confluent.io/blog?utm_source=footer&utm_medium=email&utm_campaign=ch.email-signature_type.community_content.blog
[image:
Twitter] <https://twitter.com/ConfluentInc>[image: LinkedIn]
<https://www.linkedin.com/in/hanyu-peter-zheng/>[image:
Slack]
<https://slackpass.io/confluentcommunity>[image: YouTube]
<https://youtube.com/confluent>

[image: Try Confluent Cloud for Free]
<



https://www.confluent.io/get-started?utm_campaign=tm.fm-apac_cd.inbound&utm_source=gmail&utm_medium=organic






--

[image: Confluent] <https://www.confluent.io>
Hanyu (Peter) Zheng he/him/his
Software Engineer Intern
+1 (213) 431-7193 <+1+(213)+431-7193>
Follow us: [image: Blog]
<
https://www.confluent.io/blog?utm_source=footer&utm_medium=email&utm_campaign=ch.email-signature_type.community_content.blog
[image:
Twitter] <https://twitter.com/ConfluentInc>[image: LinkedIn]
<https://www.linkedin.com/in/hanyu-peter-zheng/>[image: Slack]
<https://slackpass.io/confluentcommunity>[image: YouTube]
<https://youtube.com/confluent>

[image: Try Confluent Cloud for Free]
<
https://www.confluent.io/get-started?utm_campaign=tm.fm-apac_cd.inbound&utm_source=gmail&utm_medium=organic




--

[image: Confluent] <https://www.confluent.io>
Hanyu (Peter) Zheng he/him/his
Software Engineer Intern
+1 (213) 431-7193 <+1+(213)+431-7193>
Follow us: [image: Blog]
<
https://www.confluent.io/blog?utm_source=footer&utm_medium=email&utm_campaign=ch.email-signature_type.community_content.blog
[image:
Twitter] <https://twitter.com/ConfluentInc>[image: LinkedIn]
<https://www.linkedin.com/in/hanyu-peter-zheng/>[image: Slack]
<https://slackpass.io/confluentcommunity>[image: YouTube]
<https://youtube.com/confluent>

[image: Try Confluent Cloud for Free]
<
https://www.confluent.io/get-started?utm_campaign=tm.fm-apac_cd.inbound&utm_source=gmail&utm_medium=organic




--

[image: Confluent] <https://www.confluent.io>
Hanyu (Peter) Zheng he/him/his
Software Engineer Intern
+1 (213) 431-7193 <+1+(213)+431-7193>
Follow us: [image: Blog]
<
https://www.confluent.io/blog?utm_source=footer&utm_medium=email&utm_campaign=ch.email-signature_type.community_content.blog
[image:
Twitter] <https://twitter.com/ConfluentInc>[image: LinkedIn]
<https://www.linkedin.com/in/hanyu-peter-zheng/>[image: Slack]
<https://slackpass.io/confluentcommunity>[image: YouTube]
<https://youtube.com/confluent>

[image: Try Confluent Cloud for Free]
<
https://www.confluent.io/get-started?utm_campaign=tm.fm-apac_cd.inbound&utm_source=gmail&utm_medium=organic




--

[image: Confluent] <https://www.confluent.io>
Hanyu (Peter) Zheng he/him/his
Software Engineer Intern
+1 (213) 431-7193 <+1+(213)+431-7193>
Follow us: [image: Blog]
<
https://www.confluent.io/blog?utm_source=footer&utm_medium=email&utm_campaign=ch.email-signature_type.community_content.blog
[image:
Twitter] <https://twitter.com/ConfluentInc>[image: LinkedIn]
<https://www.linkedin.com/in/hanyu-peter-zheng/>[image: Slack]
<https://slackpass.io/confluentcommunity>[image: YouTube]
<https://youtube.com/confluent>

[image: Try Confluent Cloud for Free]
<
https://www.confluent.io/get-started?utm_campaign=tm.fm-apac_cd.inbound&utm_source=gmail&utm_medium=organic




--

[image: Confluent] <https://www.confluent.io>
Hanyu (Peter) Zheng he/him/his
Software Engineer Intern
+1 (213) 431-7193 <+1+(213)+431-7193>
Follow us: [image: Blog]
<
https://www.confluent.io/blog?utm_source=footer&utm_medium=email&utm_campaign=ch.email-signature_type.community_content.blog
[image:
Twitter] <https://twitter.com/ConfluentInc>[image: LinkedIn]
<https://www.linkedin.com/in/hanyu-peter-zheng/>[image: Slack]
<https://slackpass.io/confluentcommunity>[image: YouTube]
<https://youtube.com/confluent>

[image: Try Confluent Cloud for Free]
<
https://www.confluent.io/get-started?utm_campaign=tm.fm-apac_cd.inbound&utm_source=gmail&utm_medium=organic




--

[image: Confluent] <https://www.confluent.io>
Hanyu (Peter) Zheng he/him/his
Software Engineer Intern
+1 (213) 431-7193 <+1+(213)+431-7193>
Follow us: [image: Blog]
<
https://www.confluent.io/blog?utm_source=footer&utm_medium=email&utm_campaign=ch.email-signature_type.community_content.blog
[image:
Twitter] <https://twitter.com/ConfluentInc>[image: LinkedIn]
<https://www.linkedin.com/in/hanyu-peter-zheng/>[image: Slack]
<https://slackpass.io/confluentcommunity>[image: YouTube]
<https://youtube.com/confluent>

[image: Try Confluent Cloud for Free]
<
https://www.confluent.io/get-started?utm_campaign=tm.fm-apac_cd.inbound&utm_source=gmail&utm_medium=organic




--

[image: Confluent] <https://www.confluent.io>
Hanyu (Peter) Zheng he/him/his
Software Engineer Intern
+1 (213) 431-7193 <+1+(213)+431-7193>
Follow us: [image: Blog]
<
https://www.confluent.io/blog?utm_source=footer&utm_medium=email&utm_campaign=ch.email-signature_type.community_content.blog
[image:
Twitter] <https://twitter.com/ConfluentInc>[image: LinkedIn]
<https://www.linkedin.com/in/hanyu-peter-zheng/>[image: Slack]
<https://slackpass.io/confluentcommunity>[image: YouTube]
<https://youtube.com/confluent>

[image: Try Confluent Cloud for Free]
<
https://www.confluent.io/get-started?utm_campaign=tm.fm-apac_cd.inbound&utm_source=gmail&utm_medium=organic




--

[image: Confluent] <https://www.confluent.io>
Hanyu (Peter) Zheng he/him/his
Software Engineer Intern
+1 (213) 431-7193 <+1+(213)+431-7193>
Follow us: [image: Blog]
<
https://www.confluent.io/blog?utm_source=footer&utm_medium=email&utm_campaign=ch.email-signature_type.community_content.blog
[image:
Twitter] <https://twitter.com/ConfluentInc>[image: LinkedIn]
<https://www.linkedin.com/in/hanyu-peter-zheng/>[image: Slack]
<https://slackpass.io/confluentcommunity>[image: YouTube]
<https://youtube.com/confluent>

[image: Try Confluent Cloud for Free]
<
https://www.confluent.io/get-started?utm_campaign=tm.fm-apac_cd.inbound&utm_source=gmail&utm_medium=organic




--

[image: Confluent] <https://www.confluent.io>
Hanyu (Peter) Zheng he/him/his
Software Engineer Intern
+1 (213) 431-7193 <+1+(213)+431-7193>
Follow us: [image: Blog]
<
https://www.confluent.io/blog?utm_source=footer&utm_medium=email&utm_campaign=ch.email-signature_type.community_content.blog
[image:
Twitter] <https://twitter.com/ConfluentInc>[image: LinkedIn]
<https://www.linkedin.com/in/hanyu-peter-zheng/>[image: Slack]
<https://slackpass.io/confluentcommunity>[image: YouTube]
<https://youtube.com/confluent>

[image: Try Confluent Cloud for Free]
<
https://www.confluent.io/get-started?utm_campaign=tm.fm-apac_cd.inbound&utm_source=gmail&utm_medium=organic



Reply via email to