[jira] [Commented] (KAFKA-8159) Built-in serdes for signed numbers do not obey lexicographical ordering

John Roesler (Jira) Tue, 19 May 2020 19:18:09 -0700


    [ 
https://issues.apache.org/jira/browse/KAFKA-8159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17111694#comment-17111694
 ]


John Roesler commented on KAFKA-8159:
-------------------------------------

Hi [~ableegoldman] ,

Thanks for pointing this out in the discussion for KIP-614.

This may sound crazy, but I'd actually say that we do _not_ assume that there 
is any relationship like key1 > key2 implies serialize(key1) > serialize(key2) .

Rather, for store ordering, the ordering between key1 and key2 is _defined_ by 
the ordering between serialize(key1) and serialize(key2). In other words, if 
you select the current IntegerSerializer (as written), then -1 _actually is_ 
greater than 3. This may be bizarre, and it may not be intentional, but that's 
how IntegerSerializer defines it. If you wanted some other ordering (for 
example, if you were going to do range scans in a store), then you need to 
choose a serializer that gives you the ordering you want.
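
A minimal sketch of the point above, without any Kafka dependency. It assumes 
the built-in IntegerSerializer writes a 4-byte big-endian two's-complement 
value and that stores compare keys as unsigned bytes, lexicographically. The 
class and method names are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

class SignedOrderDemo {
    // Assumption: mirrors the byte layout of the built-in IntegerSerializer
    // (4 bytes, big-endian, two's complement).
    static byte[] serialize(int value) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(value).array();
    }

    // Unsigned lexicographic comparison of the serialized forms, which is
    // the ordering a byte-ordered store would see.
    static int compareSerialized(int a, int b) {
        return Arrays.compareUnsigned(serialize(a), serialize(b));
    }

    public static void main(String[] args) {
        // -1 serializes to FF FF FF FF, 3 serializes to 00 00 00 03.
        System.out.println(compareSerialized(-1, 3) > 0); // true
        System.out.println(compareSerialized(1, 3) < 0);  // true
    }
}
```

Under these assumptions, every negative int sorts after every non-negative 
int, because the sign bit is the most significant bit of the first byte.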

As a less insane example, consider a complex record: Person(FirstName, 
LastName). Which is greater, Person(Alex, Baldwin) or Person(Barry, Allen)? It 
seems like there are three possible answers:
 # Person(Alex, Baldwin) < Person(Barry, Allen) : We define the serde to put 
the *first* name first _because_ we want the data sorted this way for the 
purpose of a range scan.
 # Person(Alex, Baldwin) > Person(Barry, Allen) : We define the serde to put 
the *last* name first _because_ we want the data sorted this way for the 
purpose of a range scan.
 # We don't care because we'll never do a range scan. In this case, we can 
use any serde that can round-trip the data, and we simply don't care how the 
data is ordered.
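
To make the Person example concrete, here is a rough sketch of two 
hypothetical encodings. Both join the fields with a NUL separator and encode 
them as UTF-8; the separator, the class, and the method names are assumptions 
for illustration only. With the first name leading, the bytes for Alex sort 
before those for Barry; with the last name leading, Baldwin sorts after Allen.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class PersonOrderDemo {
    // Hypothetical encoding: first name, NUL separator, last name.
    static byte[] firstNameFirst(String first, String last) {
        return (first + '\0' + last).getBytes(StandardCharsets.UTF_8);
    }

    // Hypothetical encoding: last name, NUL separator, first name.
    static byte[] lastNameFirst(String first, String last) {
        return (last + '\0' + first).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        int byFirst = Arrays.compareUnsigned(
            firstNameFirst("Alex", "Baldwin"), firstNameFirst("Barry", "Allen"));
        int byLast = Arrays.compareUnsigned(
            lastNameFirst("Alex", "Baldwin"), lastNameFirst("Barry", "Allen"));

        System.out.println(byFirst < 0); // true: "Alex" sorts before "Barry"
        System.out.println(byLast > 0);  // true: "Baldwin" sorts after "Allen"
    }
}
```

Both encodings round-trip the same data; only the field order in the bytes 
decides which Person is "greater" from the store's point of view.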

 

However, I would not say that everything was hunky-dory because, as you pointed 
out, the cache and our two provided store types (TreeMap and RocksDB) did not 
display consistent behavior when serialize(to) < serialize(from). You've 
corrected this in PR 6521, and now the contract is that in that case, any store 
implementation should return an empty iterator.
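
The contract described above can be sketched with a plain TreeMap ordered by 
unsigned byte comparison. This is not the Streams store code; the class, the 
range helper, and the big-endian encoding are assumptions for illustration. 
Without the guard, TreeMap.subMap would throw IllegalArgumentException when 
the lower bound sorts after the upper bound.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.Iterator;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

class RangeContractDemo {
    // Keys are ordered by their bytes, compared as unsigned values.
    static final Comparator<byte[]> LEXICO = Arrays::compareUnsigned;

    // Assumed encoding: 4 bytes, big-endian, two's complement.
    static byte[] serialize(int value) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(value).array();
    }

    // Returns an empty iterator when serialize(to) < serialize(from),
    // instead of letting subMap throw.
    static Iterator<Map.Entry<byte[], String>> range(
            NavigableMap<byte[], String> store, byte[] from, byte[] to) {
        if (LEXICO.compare(from, to) > 0) {
            return Collections.emptyIterator();
        }
        return store.subMap(from, true, to, true).entrySet().iterator();
    }

    static int count(Iterator<?> it) {
        int n = 0;
        while (it.hasNext()) {
            it.next();
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        NavigableMap<byte[], String> store = new TreeMap<>(LEXICO);
        for (int k : new int[] {-1, 0, 1}) {
            store.put(serialize(k), "value-" + k);
        }
        // serialize(-1) sorts after serialize(1), so the range is empty.
        System.out.println(count(range(store, serialize(-1), serialize(1)))); // 0
        System.out.println(count(range(store, serialize(0), serialize(1))));  // 2
    }
}
```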

 

Also I would not say that this ticket is "not a bug" because we actually didn't 
intend for these serdes to produce this ordering. It seems like the "fix" would 
simply be to offer new serdes that produce the ordering we desire. If someone 
wants to be able to scan over negative and positive numerical keys in their 
natural ordering, they would just provide our new serdes in the Materialized 
argument or StoreBuilder for whatever store they need to query.
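
One possible shape for such an order-preserving encoding, as a sketch only: 
flip the sign bit before writing the int big-endian, so that unsigned byte 
order matches numeric order. This is a well-known trick and not necessarily 
what Kafka would ship; the class and method names are hypothetical, and a 
real serde would still need to implement the Serializer and Deserializer 
interfaces.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

class OrderPreservingIntDemo {
    // Flip the sign bit so Integer.MIN_VALUE maps to 00 00 00 00 and
    // Integer.MAX_VALUE maps to FF FF FF FF.
    static byte[] serialize(int value) {
        return ByteBuffer.allocate(Integer.BYTES)
            .putInt(value ^ Integer.MIN_VALUE)
            .array();
    }

    // Inverse of serialize: read the int and flip the sign bit back.
    static int deserialize(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getInt() ^ Integer.MIN_VALUE;
    }

    public static void main(String[] args) {
        // -1 now sorts before 3 under unsigned lexicographic comparison.
        System.out.println(
            Arrays.compareUnsigned(serialize(-1), serialize(3)) < 0); // true
        System.out.println(deserialize(serialize(-1))); // -1
    }
}
```

With an encoding like this, a range scan from -1 to 1 would see serialize(-1) 
sort before serialize(1), so the bounds are no longer inverted.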

> Built-in serdes for signed numbers do not obey lexicographical ordering
> -----------------------------------------------------------------------
>
>                 Key: KAFKA-8159
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8159
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>            Reporter: Sophie Blee-Goldman
>            Priority: Major
>
> Currently we assume consistent ordering between serialized and deserialized 
> keys, e.g. if the objects obey objA < objB < objC then the serialized Bytes 
> will also obey bytesA < bytesB < bytesC. This is not true in general of the 
> built-in serdes for signed numerical types (eg Integer, Long). Specifically, 
> it is broken by the negative number representations which are 
> lexicographically greater than (all) positive number representations. 
>  
> One consequence of this is that an interactive query of a key range with a 
> negative lower bound and positive upper bound (eg keyValueStore.range(-1, 1)) 
> will result in "unexpected behavior" depending on the specific store type.
>  
> For RocksDB stores with caching disabled, an empty iterator will be returned 
> regardless of whether any records do exist in that range. 
> For in-memory stores and ANY store with caching enabled, Streams will throw 
> an unchecked exception and crash.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
