[GitHub] [kafka] vamossagar12 commented on a change in pull request #10052: KAFKA-12289: Adding test cases for prefix scan in InMemoryKeyValueStore

GitBox Tue, 16 Feb 2021 22:48:56 -0800


vamossagar12 commented on a change in pull request #10052:
URL: https://github.com/apache/kafka/pull/10052#discussion_r577363103




##########
File path: 
streams/src/test/java/org/apache/kafka/streams/state/internals/InMemoryKeyValueStoreTest.java
##########
@@ -60,4 +67,22 @@ public void shouldRemoveKeysWithNullValues() {
 
         assertThat(store.get(0), nullValue());
     }
+
+
+    @Test
+    public void shouldReturnKeysWithGivenPrefix(){
+        store = createKeyValueStore(driver.context());
+        final String value = "value";
+        final List<KeyValue<Integer, String>> entries = new ArrayList<>();
+        entries.add(new KeyValue<>(1, value));
+        entries.add(new KeyValue<>(2, value));
+        entries.add(new KeyValue<>(11, value));
+        entries.add(new KeyValue<>(13, value));
+
+        store.putAll(entries);
+        final KeyValueIterator<Integer, String> keysWithPrefix = store.prefixScan(1, new IntegerSerializer());

Review comment:
   > The reason we get only `1` when we scan for prefix `1` is that the 
integer serializer serializes `11` and `13` in the least significant byte 
instead of serializing `1` in the byte before the least significant byte and 
`1` and `3` in the least significant byte. With the former the **byte** 
lexicographical order of `1 2 11 13` would be `1 2 11 13` which corresponds to 
the natural order of integers. With the latter the **byte** lexicographical 
order of `1 2 11 13` would be `1 11 13 2` which corresponds to the string 
lexicographical order. So the serializer determines the order of the entries 
and the store always returns the entries in byte lexicographical order.
   > 
   > You will experience a similar effect when you call `range(-1, 2)` on the 
in-memory state store in the unit test. You will get back an empty result since 
`-1` is larger than `2` in byte lexicographical order when the 
`IntegerSerializer` is used. Also note the warning that is output, especially 
this part: `... or serdes that don't preserve ordering when lexicographically 
comparing the serialized bytes ...`
   > 
   > I think we should clearly state this limitation in the javadocs of the 
`prefixScan()` as we have done for `range()`, maybe with an example.
   > 
   > Currently, to get `prefixScan()` working for all types, we would need to 
do a complete scan (i.e. `all()`) followed by a filter, right?
   
   That is correct. That is the only way currently. 
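
As a rough illustration of the scan-then-filter workaround (not code from this PR): the sketch below uses a plain `TreeMap` as a stand-in for the store, and `scanAndFilter` is a hypothetical helper name. Against a real store, the loop would instead walk the `KeyValueIterator` returned by `all()` and close it afterwards.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;

public class ScanAndFilterSketch {

    // Hypothetical helper: visits every entry (the analogue of store.all())
    // and keeps only those whose key is accepted by the supplied predicate.
    static <K, V> List<Map.Entry<K, V>> scanAndFilter(final Map<K, V> store,
                                                      final Predicate<K> keyMatches) {
        final List<Map.Entry<K, V>> result = new ArrayList<>();
        for (final Map.Entry<K, V> entry : store.entrySet()) {
            if (keyMatches.test(entry.getKey())) {
                result.add(entry);
            }
        }
        return result;
    }

    public static void main(final String[] args) {
        // Assumption: a TreeMap stands in for the key-value store.
        final Map<Integer, String> store = new TreeMap<>();
        store.put(1, "value");
        store.put(2, "value");
        store.put(11, "value");
        store.put(13, "value");

        // "Prefix" here means the decimal string form of the key starts with "1".
        final List<Map.Entry<Integer, String>> matches =
            scanAndFilter(store, key -> String.valueOf(key).startsWith("1"));

        for (final Map.Entry<Integer, String> entry : matches) {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
    }
}
```

The cost is a pass over the whole store on every lookup, which is why this is only a fallback for key types whose serialized bytes do not preserve the intended prefix relation.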
   
   > 
   > Double checking: Is my understanding correct? @ableegoldman
   
   I think adding a warning similar to the one for `range()` would be good. I 
will do that as part of the PR. However, adding test cases for the integer 
serializer won't make sense in this test class. I will probably create another 
KVStore and add the tests there. Is that ok, @cadonna?
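
For the javadoc example, something along these lines might help show the byte-order effect discussed above. It is a minimal sketch that assumes a 4-byte big-endian encoding of integers (the layout I understand `IntegerSerializer` to produce) and imitates it with `ByteBuffer` so that it runs without Kafka on the classpath. `serialize` and `hasPrefix` are hypothetical helper names, and `Arrays.compareUnsigned` requires Java 9 or later.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class ByteOrderSketch {

    // Assumption: 4-byte big-endian encoding, imitated here with ByteBuffer
    // (ByteBuffer is big-endian by default).
    static byte[] serialize(final int value) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(value).array();
    }

    // True if the serialized key starts with the serialized prefix bytes.
    static boolean hasPrefix(final byte[] key, final byte[] prefix) {
        if (key.length < prefix.length) {
            return false;
        }
        return Arrays.equals(key, 0, prefix.length, prefix, 0, prefix.length);
    }

    public static void main(final String[] args) {
        final byte[] prefix = serialize(1);
        for (final int key : new int[] {1, 2, 11, 13}) {
            // 11 and 13 differ from 1 in the last byte, so they do not match.
            System.out.println(key + " matches prefix 1: " + hasPrefix(serialize(key), prefix));
        }

        // -1 serializes to FF FF FF FF, which sorts after 00 00 00 02 when
        // bytes are compared as unsigned values, so range(-1, 2) is empty.
        final int cmp = Arrays.compareUnsigned(serialize(-1), serialize(2));
        System.out.println("-1 sorts after 2 in unsigned byte order: " + (cmp > 0));
    }
}
```

Under that encoding only key `1` matches the prefix, and `-1` compares greater than `2`, which is consistent with both observations in the quoted comment.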
   
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
