[ https://issues.apache.org/jira/browse/KAFKA-8159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17111694#comment-17111694 ]
John Roesler commented on KAFKA-8159:
-------------------------------------

Hi [~ableegoldman],

Thanks for pointing this out in the discussion for KIP-614.

This may sound crazy, but I'd actually say that we do _not_ assume that there is any relationship like key1 > key2 implies serialize(key1) > serialize(key2). Rather, for store ordering, the ordering between key1 and key2 is _defined_ by the ordering between serialize(key1) and serialize(key2). In other words, if you select the current IntegerSerializer (as written), then -1 _actually is_ greater than 3. This may be bizarre, and it may not be intentional, but that's how IntegerSerializer defines it. If you wanted some other ordering (for example, if you were going to do range scans in a store), then you need to choose a serializer that gives you the ordering you want.

As a less insane example, consider a complex record: Person(FirstName, LastName). Which is greater, Person(Alex, Baldwin) or Person(Barry, Allen)? It seems like there are three possible answers:

# Person(Alex, Baldwin) < Person(Barry, Allen): We define the serde to put the *first* name first _because_ we want the data sorted this way for the purpose of a range scan.
# Person(Alex, Baldwin) > Person(Barry, Allen): We define the serde to put the *last* name first _because_ we want the data sorted this way for the purpose of a range scan.
# We don't care because we'll never do a range scan. In this case, we can use any serde that can round-trip the data, and we simply don't care how the data is ordered.

However, I would not say that everything was hunky-dory because, as you pointed out, the cache and our two provided store types (TreeMap and RocksDB) did not display consistent behavior when serialize(to) < serialize(from). You've corrected this in PR 6521, and now the contract is that in that case, any store implementation should return an empty iterator.

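To make the IntegerSerializer point concrete, here is a minimal sketch in plain Java with no Kafka dependency. It assumes the serializer writes an int as four big-endian two's-complement bytes and that stores compare keys as unsigned bytes in lexicographic order. The helpers plainEncode, orderedEncode, and compareBytes are hypothetical names for illustration, not Kafka APIs, and the sign-bit flip in orderedEncode is just one common way to get numerically ordered bytes, not something this ticket prescribes.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

class SerializedOrderSketch {

    // Assumption: same layout as the built-in integer serializer,
    // i.e. 4 bytes, big-endian, two's complement.
    static byte[] plainEncode(int value) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(value).array();
    }

    // Hypothetical order-preserving encoding: flipping the sign bit maps
    // Integer.MIN_VALUE to 0x00000000 and Integer.MAX_VALUE to 0xFFFFFFFF,
    // so unsigned byte order matches numeric order.
    static byte[] orderedEncode(int value) {
        return ByteBuffer.allocate(Integer.BYTES)
                .putInt(value ^ Integer.MIN_VALUE)
                .array();
    }

    // Unsigned lexicographic comparison (assumed to match how a store
    // orders serialized keys). Requires Java 9+.
    static int compareBytes(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }

    public static void main(String[] args) {
        // -1 encodes as FF FF FF FF, 3 as 00 00 00 03: -1 sorts after 3.
        System.out.println(compareBytes(plainEncode(-1), plainEncode(3)) > 0);
        // With the sign bit flipped, -1 sorts before 3.
        System.out.println(compareBytes(orderedEncode(-1), orderedEncode(3)) < 0);
    }
}
```

Under these assumptions, a range scan from -1 to 1 over keys written with plainEncode has serialize(to) < serialize(from), which is exactly the case described above; with orderedEncode the bounds come out in the expected order.
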
Also, I would not say that this ticket is "not a bug" because we actually didn't intend for these serdes to produce this ordering. It seems like the "fix" would simply be to offer new serdes that produce the ordering we desire. If someone wants to be able to scan over negative and positive numerical keys in their natural ordering, they would just provide our new serdes in the Materialized argument or StoreBuilder for whatever store they need to query.

> Built-in serdes for signed numbers do not obey lexicographical ordering
> -----------------------------------------------------------------------
>
>                 Key: KAFKA-8159
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8159
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>            Reporter: Sophie Blee-Goldman
>            Priority: Major
>
> Currently we assume consistent ordering between serialized and deserialized
> keys, e.g. if the objects obey objA < objB < objC then the serialized Bytes
> will also obey bytesA < bytesB < bytesC. This is not true in general of the
> built-in serdes for signed numerical types (eg Integer, Long). Specifically,
> it is broken by the negative number representations which are
> lexicographically greater than (all) positive number representations.
>
> One consequence of this is that an interactive query of a key range with a
> negative lower bound and positive upper bound (eg keyValueStore.range(-1, 1))
> will result in "unexpected behavior" depending on the specific store type.
>
> For RocksDB stores with caching disabled, an empty iterator will be returned
> regardless of whether any records do exist in that range.
> For in-memory stores and ANY store with caching enabled, Streams will throw
> an unchecked exception and crash.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)