[GitHub] [kafka] cadonna commented on a change in pull request #10052: KAFKA-12289: Adding test cases for prefix scan in InMemoryKeyValueStore

GitBox Tue, 16 Feb 2021 04:42:31 -0800


cadonna commented on a change in pull request #10052:
URL: https://github.com/apache/kafka/pull/10052#discussion_r576789945




##########
File path: 
streams/src/test/java/org/apache/kafka/streams/state/internals/InMemoryKeyValueStoreTest.java
##########
@@ -60,4 +67,22 @@ public void shouldRemoveKeysWithNullValues() {
 
         assertThat(store.get(0), nullValue());
     }
+
+
+    @Test
+    public void shouldReturnKeysWithGivenPrefix(){
+        store = createKeyValueStore(driver.context());
+        final String value = "value";
+        final List<KeyValue<Integer, String>> entries = new ArrayList<>();
+        entries.add(new KeyValue<>(1, value));
+        entries.add(new KeyValue<>(2, value));
+        entries.add(new KeyValue<>(11, value));
+        entries.add(new KeyValue<>(13, value));
+
+        store.putAll(entries);
+        final KeyValueIterator<Integer, String> keysWithPrefix = 
store.prefixScan(1, new IntegerSerializer());

Review comment:
       The reason we get only `1` when we scan for prefix `1` is that the 
integer serializer encodes `11` and `13` as a single number in the least 
significant byte, rather than digit by digit as a string serializer would, 
with `1` in one byte followed by `1` or `3` in the next byte. With the former, 
the **byte** lexicographical order of `1 2 11 13` is `1 2 11 13`, which 
corresponds to the natural order of integers. With the latter, the **byte** 
lexicographical order of `1 2 11 13` would be `1 11 13 2`, which corresponds to 
the string lexicographical order. So the serializer determines the order of the 
entries, and the store always returns the entries in byte lexicographical order.
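
   To make the difference concrete, here is a minimal sketch that does not use the Kafka classes. It assumes a 4-byte big-endian integer encoding (which is what `IntegerSerializer` produces) and a UTF-8 string encoding for comparison. The class and method names (`PrefixOrderSketch`, `encodeInt`, `encodeString`, `matching`) are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.IntFunction;

public class PrefixOrderSketch {

    // Assumption: 4-byte big-endian encoding, mirroring IntegerSerializer.
    static byte[] encodeInt(final int value) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(value).array();
    }

    // Digit-by-digit encoding, as a string serializer would produce.
    static byte[] encodeString(final int value) {
        return String.valueOf(value).getBytes(StandardCharsets.UTF_8);
    }

    // True if the serialized key starts with all bytes of the serialized prefix.
    static boolean hasPrefix(final byte[] key, final byte[] prefix) {
        return key.length >= prefix.length
            && Arrays.equals(Arrays.copyOf(key, prefix.length), prefix);
    }

    // Returns the keys whose serialized form starts with the serialized prefix.
    static List<Integer> matching(final int prefix,
                                  final IntFunction<byte[]> serializer,
                                  final int... keys) {
        final List<Integer> result = new ArrayList<>();
        for (final int key : keys) {
            if (hasPrefix(serializer.apply(key), serializer.apply(prefix))) {
                result.add(key);
            }
        }
        return result;
    }

    public static void main(final String[] args) {
        // 11 is encoded as [0, 0, 0, 11], so it does not start with [0, 0, 0, 1].
        System.out.println(Arrays.toString(encodeInt(11)));
        System.out.println(matching(1, PrefixOrderSketch::encodeInt, 1, 2, 11, 13));    // [1]
        System.out.println(matching(1, PrefixOrderSketch::encodeString, 1, 2, 11, 13)); // [1, 11, 13]
    }
}
```

   With the integer encoding the prefix is the full 4-byte key, so only the key `1` itself can match.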
   
   You will experience a similar effect when you call `range(-1, 2)` on the 
in-memory state store in the unit test. You will get back an empty result since 
`-1` is larger than `2` in byte lexicographical order when the 
`IntegerSerializer` is used. Also note the warning that is output, especially 
this part: `... or serdes that don't preserve ordering when 
lexicographically comparing the serialized bytes ...`
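
   The `-1` versus `2` case can be checked in isolation. This is only a sketch: it assumes the store compares serialized keys byte by byte as unsigned values, and it uses a hypothetical `RangeOrderSketch` class with the same 4-byte big-endian encoding as `IntegerSerializer`. `Arrays.compareUnsigned` requires Java 9 or later.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class RangeOrderSketch {

    // Assumption: 4-byte big-endian encoding, mirroring IntegerSerializer.
    static byte[] encode(final int value) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(value).array();
    }

    // Unsigned, byte-wise lexicographical comparison of the encoded values.
    static int compareBytes(final int a, final int b) {
        return Arrays.compareUnsigned(encode(a), encode(b));
    }

    public static void main(final String[] args) {
        // -1 is 0xFF 0xFF 0xFF 0xFF, which Java prints as signed bytes.
        System.out.println(Arrays.toString(encode(-1)));   // [-1, -1, -1, -1]
        System.out.println(Integer.compare(-1, 2) < 0);    // true: -1 < 2 as integers
        System.out.println(compareBytes(-1, 2) > 0);       // true: -1 > 2 as bytes
    }
}
```

   Because the lower bound sorts after the upper bound in byte order, the range is empty from the store's point of view.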
    
   I think we should clearly state this limitation in the javadocs of 
`prefixScan()`, as we have done for `range()`, maybe with an example. 
   
   Currently, to get `prefixScan()` working for all types, we would need to do 
a complete scan (i.e. `all()`) followed by a filter, right? 
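
   The scan-then-filter idea could look roughly like the sketch below. It is not the Kafka Streams API: a `TreeMap` stands in for the store, iterating its keys stands in for `all()`, and `scanAndFilter` and the decimal-digit predicate are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;

public class ScanAndFilterSketch {

    // Hypothetical helper: visits every key (the analogue of all()) and keeps
    // those accepted by a type-specific prefix predicate.
    static <K, V> List<K> scanAndFilter(final Map<K, V> store,
                                        final Predicate<K> matchesPrefix) {
        final List<K> result = new ArrayList<>();
        for (final K key : store.keySet()) {
            if (matchesPrefix.test(key)) {
                result.add(key);
            }
        }
        return result;
    }

    public static void main(final String[] args) {
        // A TreeMap stands in for the state store in this sketch.
        final Map<Integer, String> store = new TreeMap<>();
        for (final int key : new int[] {1, 2, 11, 13}) {
            store.put(key, "value");
        }
        // Prefix defined on the decimal digits of the key, not on its bytes.
        System.out.println(
            scanAndFilter(store, key -> String.valueOf(key).startsWith("1"))); // [1, 11, 13]
    }
}
```

   The cost is a full pass over the store on every call, which is the trade-off raised in the question above.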
   
   Double checking: Is my understanding correct? @ableegoldman 




