cadonna commented on a change in pull request #10052:
URL: https://github.com/apache/kafka/pull/10052#discussion_r576789945
##########
File path:
streams/src/test/java/org/apache/kafka/streams/state/internals/InMemoryKeyValueStoreTest.java
##########
@@ -60,4 +67,22 @@ public void shouldRemoveKeysWithNullValues() {
assertThat(store.get(0), nullValue());
}
+
+
+ @Test
+ public void shouldReturnKeysWithGivenPrefix(){
+ store = createKeyValueStore(driver.context());
+ final String value = "value";
+ final List<KeyValue<Integer, String>> entries = new ArrayList<>();
+ entries.add(new KeyValue<>(1, value));
+ entries.add(new KeyValue<>(2, value));
+ entries.add(new KeyValue<>(11, value));
+ entries.add(new KeyValue<>(13, value));
+
+ store.putAll(entries);
+ final KeyValueIterator<Integer, String> keysWithPrefix =
store.prefixScan(1, new IntegerSerializer());
Review comment:
The reason we get only `1` when we scan for prefix `1` is that the
integer serializer serializes `11` and `13` entirely in the least significant
byte instead of serializing `1` in the byte before the least significant byte
and `1` and `3`, respectively, in the least significant byte, as a
digit-by-digit (string-like) encoding would. With the former, the **byte**
lexicographical order of `1 2 11 13` is `1 2 11 13`, which corresponds to
the natural order of integers. With the latter, the **byte** lexicographical
order of `1 2 11 13` is `1 11 13 2`, which corresponds to the string
lexicographical order. So the serializer determines the order of the entries,
and the store always returns the entries in byte lexicographical order.
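To make the two orderings concrete, here is a minimal standalone sketch. It does not use the Kafka classes: `intBytes` is assumed to mirror the fixed-width, big-endian four-byte encoding of `IntegerSerializer`, `digitBytes` is a hypothetical digit-by-digit encoding, and `sortByEncoding` and `prefixScan` are illustrative helpers, not store methods.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

class ByteOrderSketch {

    // Assumption: same layout as IntegerSerializer (four bytes, big-endian).
    static byte[] intBytes(final int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    // Hypothetical digit-by-digit encoding, similar to serializing the decimal string.
    static byte[] digitBytes(final int value) {
        return Integer.toString(value).getBytes(StandardCharsets.UTF_8);
    }

    // Unsigned lexicographic comparison; on a shared prefix the shorter array sorts first.
    static int compareBytes(final byte[] a, final byte[] b) {
        final int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            final int cmp = Integer.compare(a[i] & 0xff, b[i] & 0xff);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    }

    // Returns the keys ordered by their serialized bytes.
    static List<Integer> sortByEncoding(final List<Integer> keys,
                                        final Function<Integer, byte[]> encoder) {
        final List<Integer> sorted = new ArrayList<>(keys);
        sorted.sort((x, y) -> compareBytes(encoder.apply(x), encoder.apply(y)));
        return sorted;
    }

    // Returns, in byte order, the keys whose serialized form starts with the serialized prefix.
    static List<Integer> prefixScan(final List<Integer> keys, final int prefix,
                                    final Function<Integer, byte[]> encoder) {
        final byte[] p = encoder.apply(prefix);
        final List<Integer> result = new ArrayList<>();
        for (final Integer key : sortByEncoding(keys, encoder)) {
            final byte[] k = encoder.apply(key);
            if (k.length >= p.length && Arrays.equals(Arrays.copyOf(k, p.length), p)) {
                result.add(key);
            }
        }
        return result;
    }

    public static void main(final String[] args) {
        final List<Integer> keys = Arrays.asList(1, 2, 11, 13);
        System.out.println(sortByEncoding(keys, ByteOrderSketch::intBytes));   // [1, 2, 11, 13]
        System.out.println(sortByEncoding(keys, ByteOrderSketch::digitBytes)); // [1, 11, 13, 2]
        System.out.println(prefixScan(keys, 1, ByteOrderSketch::intBytes));    // [1]
        System.out.println(prefixScan(keys, 1, ByteOrderSketch::digitBytes));  // [1, 11, 13]
    }
}
```

With the fixed-width encoding the prefix is the full four bytes of `1`, so only the key `1` can match.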
You will experience a similar effect when you call `range(-1, 2)` on the in-memory
state store in the unit test. You will get back an empty result since `-1` is
larger than `2` in byte lexicographical order
when the `IntegerSerializer` is used. Also note the warning that is output,
especially this part: `... or serdes that don't preserve ordering when
lexicographically comparing the serialized bytes ...`
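The `range(-1, 2)` case can be reproduced without the store. The sketch below is only an approximation: `intBytes` is assumed to match the four-byte big-endian output of `IntegerSerializer`, a `TreeMap` with an unsigned byte comparator stands in for the in-memory store, and the `from > to` guard is my assumption about how the store ends up returning nothing.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

class RangeOrderSketch {

    // Assumption: same layout as IntegerSerializer (four bytes, big-endian).
    static byte[] intBytes(final int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    // Unsigned lexicographic comparison of serialized keys.
    static int compareBytes(final byte[] a, final byte[] b) {
        final int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            final int cmp = Integer.compare(a[i] & 0xff, b[i] & 0xff);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    }

    // Hypothetical stand-in for a range query on a byte-ordered store (both bounds inclusive).
    static List<Integer> range(final List<Integer> keys, final int from, final int to) {
        final byte[] fromBytes = intBytes(from);
        final byte[] toBytes = intBytes(to);
        if (compareBytes(fromBytes, toBytes) > 0) {
            // The bounds are inverted in byte order, so the range is empty.
            return new ArrayList<>();
        }
        final Comparator<byte[]> byteOrder = RangeOrderSketch::compareBytes;
        final TreeMap<byte[], Integer> store = new TreeMap<>(byteOrder);
        for (final Integer key : keys) {
            store.put(intBytes(key), key);
        }
        return new ArrayList<>(store.subMap(fromBytes, true, toBytes, true).values());
    }

    public static void main(final String[] args) {
        final List<Integer> keys = Arrays.asList(-1, 0, 1, 2);
        // -1 serializes to FF FF FF FF, which sorts after 00 00 00 02.
        System.out.println(compareBytes(intBytes(-1), intBytes(2)) > 0); // true
        System.out.println(range(keys, -1, 2)); // []
        System.out.println(range(keys, 0, 2));  // [0, 1, 2]
    }
}
```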
I think we should clearly state this limitation in the javadocs of
`prefixScan()`, as we have done for `range()`, maybe with an example.
Currently, to get `prefixScan()` working for all types, we would need to do
a complete scan (i.e. `all()`) followed by a filter, right?
Double checking: Is my understanding correct? @ableegoldman
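For illustration, the scan-then-filter fallback could look roughly like the sketch below. It is not written against the store API: the `Map` stands in for whatever `all()` would return, `keysWithDecimalPrefix` is a hypothetical helper, and "prefix" is assumed to mean a prefix of the key's decimal representation, which is what the test above seems to expect.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ScanAndFilterSketch {

    // Visits every entry (the equivalent of all()) and keeps the keys whose
    // decimal representation starts with the decimal representation of the prefix.
    static List<Integer> keysWithDecimalPrefix(final Map<Integer, String> allEntries,
                                               final int prefix) {
        final String prefixDigits = Integer.toString(prefix);
        final List<Integer> matches = new ArrayList<>();
        for (final Integer key : allEntries.keySet()) {
            if (Integer.toString(key).startsWith(prefixDigits)) {
                matches.add(key);
            }
        }
        return matches;
    }

    public static void main(final String[] args) {
        // A TreeMap stands in for the store contents, iterated in natural key order.
        final Map<Integer, String> entries = new TreeMap<>();
        entries.put(1, "value");
        entries.put(2, "value");
        entries.put(11, "value");
        entries.put(13, "value");
        System.out.println(keysWithDecimalPrefix(entries, 1)); // [1, 11, 13]
    }
}
```

The cost is a full pass over the store on every call, which is the trade-off the question above is about.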