cadonna commented on a change in pull request #10052: URL: https://github.com/apache/kafka/pull/10052#discussion_r576789945
##########
File path: streams/src/test/java/org/apache/kafka/streams/state/internals/InMemoryKeyValueStoreTest.java
##########
@@ -60,4 +67,22 @@ public void shouldRemoveKeysWithNullValues() {
 
         assertThat(store.get(0), nullValue());
     }
+
+
+    @Test
+    public void shouldReturnKeysWithGivenPrefix(){
+        store = createKeyValueStore(driver.context());
+        final String value = "value";
+        final List<KeyValue<Integer, String>> entries = new ArrayList<>();
+        entries.add(new KeyValue<>(1, value));
+        entries.add(new KeyValue<>(2, value));
+        entries.add(new KeyValue<>(11, value));
+        entries.add(new KeyValue<>(13, value));
+
+        store.putAll(entries);
+        final KeyValueIterator<Integer, String> keysWithPrefix = store.prefixScan(1, new IntegerSerializer());

Review comment:

The reason we get only `1` when we scan for prefix `1` is that the integer serializer serializes `11` and `13` entirely in the least significant byte, instead of serializing `1` in the byte before the least significant byte and `1` and `3`, respectively, in the least significant byte. With the former, the **byte** lexicographical order of `1 2 11 13` is `1 2 11 13`, which corresponds to the natural order of integers. With the latter, the **byte** lexicographical order of `1 2 11 13` would be `1 11 13 2`, which corresponds to the string lexicographical order. So the serializer determines the order of the entries, and the store always returns the entries in byte lexicographical order.

You will see a similar effect when you call `range(-1, 2)` on the in-memory state store in the unit test. You will get back an empty result, since `-1` is larger than `2` in byte lexicographical order when the `IntegerSerializer` is used. Also note the warning that is output, especially this part: `... or serdes that don't preserve ordering when lexicographically comparing the serialized bytes ...`

I think we should clearly state this limitation in the javadocs of `prefixScan()`, as we have done for `range()`, maybe with an example.
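To make the ordering argument concrete, here is a minimal sketch in plain Java, without any Kafka dependency. It assumes that `IntegerSerializer` writes four bytes with the most significant byte first, and it uses a hypothetical string-based serializer for comparison. `ByteOrderDemo` and all of its methods are illustrative names, not part of Kafka.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.IntFunction;

class ByteOrderDemo {

    // Stand-in for IntegerSerializer (assumption: 4 bytes, most significant byte first).
    static byte[] asInt(final int value) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(value).array();
    }

    // Hypothetical string-based serializer: decimal digits encoded as UTF-8.
    static byte[] asString(final int value) {
        return Integer.toString(value).getBytes(StandardCharsets.UTF_8);
    }

    // Lexicographic comparison that treats each byte as unsigned (0..255).
    static int compareBytes(final byte[] a, final byte[] b) {
        final int common = Math.min(a.length, b.length);
        for (int i = 0; i < common; i++) {
            final int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // True if the first prefix.length bytes of key equal prefix.
    static boolean hasPrefix(final byte[] key, final byte[] prefix) {
        if (key.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (key[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    // Sorts the keys by their serialized bytes, as a byte-ordered store would.
    static List<Integer> sortedByBytes(final List<Integer> keys, final IntFunction<byte[]> serializer) {
        final List<Integer> sorted = new ArrayList<>(keys);
        sorted.sort((x, y) -> compareBytes(serializer.apply(x), serializer.apply(y)));
        return sorted;
    }

    // Keeps the keys whose serialized bytes start with the serialized prefix.
    static List<Integer> withPrefix(final List<Integer> keys, final int prefix, final IntFunction<byte[]> serializer) {
        final byte[] prefixBytes = serializer.apply(prefix);
        final List<Integer> matches = new ArrayList<>();
        for (final int key : keys) {
            if (hasPrefix(serializer.apply(key), prefixBytes)) {
                matches.add(key);
            }
        }
        return matches;
    }

    public static void main(final String[] args) {
        final List<Integer> keys = Arrays.asList(1, 2, 11, 13);
        System.out.println(sortedByBytes(keys, ByteOrderDemo::asInt));    // [1, 2, 11, 13]
        System.out.println(sortedByBytes(keys, ByteOrderDemo::asString)); // [1, 11, 13, 2]
        System.out.println(withPrefix(keys, 1, ByteOrderDemo::asInt));    // [1]
        System.out.println(withPrefix(keys, 1, ByteOrderDemo::asString)); // [1, 11, 13]
        System.out.println(compareBytes(asInt(-1), asInt(2)) > 0);        // true
    }
}
```

The last line shows why `range(-1, 2)` comes back empty: the serialized `-1` is four `0xFF` bytes, which sorts after the serialized `2` when bytes are compared as unsigned values.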
Currently, to get `prefixScan()` working for all types, we would need to do a complete scan (i.e. `all()`) followed by a filter, right? Double checking: Is my understanding correct? @ableegoldman
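For illustration only, the scan-then-filter fallback could look roughly like the sketch below. It uses a plain `TreeMap` as a stand-in for the store, and `keysMatching` is a hypothetical helper, not an existing Kafka Streams method. The point is that the prefix test runs on the deserialized key, so it does not depend on the byte order produced by the serializer, at the cost of visiting every entry.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;

class ScanThenFilter {

    // Hypothetical helper: walks every entry, as all() would, and keeps the keys that match.
    static <K, V> List<K> keysMatching(final Map<K, V> store, final Predicate<K> keyMatches) {
        final List<K> result = new ArrayList<>();
        for (final K key : store.keySet()) {
            if (keyMatches.test(key)) {
                result.add(key);
            }
        }
        return result;
    }

    public static void main(final String[] args) {
        // Stand-in for the in-memory store from the unit test.
        final Map<Integer, String> store = new TreeMap<>();
        for (final int key : new int[] {1, 2, 11, 13}) {
            store.put(key, "value");
        }

        // "Prefix" here is defined on the decimal representation of the key.
        final List<Integer> keys = keysMatching(store, key -> Integer.toString(key).startsWith("1"));
        System.out.println(keys); // [1, 11, 13]
    }
}
```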