[ 
https://issues.apache.org/jira/browse/KAFKA-8159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17112511#comment-17112511
 ] 

Guozhang Wang commented on KAFKA-8159:
--------------------------------------

Hey John,

What I was asking about actually goes beyond signed numerical types -- for 
which I agree that offering new serdes as opt-in should be sufficient -- and 
applies to any type like the one you described in the example above. Today our 
javadoc for a range query looks like this:

{code}
    /**
     * Get an iterator over a given range of keys. This iterator must be closed after use.
     * The returned iterator must be safe from {@link java.util.ConcurrentModificationException}s
     * and must not return null values. No ordering guarantees are provided.
     * @param from The first key that could be in the range
     * @param to The last key that could be in the range
     * @return The iterator for this range.
     * @throws NullPointerException If null is used for from or to.
     * @throws InvalidStateStoreException if the store is not initialized
     */
    KeyValueIterator<K, V> range(K from, K to);
{code}

For most users the {{from}} < {{to}} relationship is implicitly defined as 
{{from.compareTo(to) <= 0}}. However, what they mostly also assume, but what is 
actually not guaranteed, is that 
{{serialize(from).compareTo(serialize(to)) <= 0}}. We should make it clear in 
the javadoc that the "first / last" key in the range is actually defined by the 
serialized bytes, not by the objects, and that it is the user's responsibility 
either to make sure the serializers correctly carry the object ordering over to 
the byte ordering, or to pass in {{from}} / {{to}} parameters that already obey 
the byte ordering.
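
To make the gap concrete, here is a minimal sketch in plain Java with no Kafka dependency. It assumes that the built-in Integer serde writes 4 big-endian bytes and that stores compare keys as unsigned bytes in lexicographic order. The class and method names are hypothetical, and the sign-bit flip is just one possible order-preserving encoding, not an existing serde:

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

class SerializedOrderDemo {

    // Assumption: same layout as the built-in Integer serde (4 bytes, big-endian).
    static byte[] plainEncode(int value) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(value).array();
    }

    // Hypothetical order-preserving encoding: flipping the sign bit maps
    // Integer.MIN_VALUE to 0x00000000 and Integer.MAX_VALUE to 0xFFFFFFFF,
    // so the unsigned byte order matches the signed integer order.
    static byte[] orderPreservingEncode(int value) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(value ^ Integer.MIN_VALUE).array();
    }

    // Unsigned lexicographic comparison of two byte arrays (requires Java 9+).
    static int compareBytes(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }

    public static void main(String[] args) {
        int from = -1;
        int to = 1;

        // Object ordering: from <= to holds.
        System.out.println(Integer.compare(from, to) <= 0);

        // Byte ordering with the plain encoding: FF FF FF FF sorts after 00 00 00 01.
        System.out.println(compareBytes(plainEncode(from), plainEncode(to)) <= 0);

        // Byte ordering with the sign bit flipped: 7F FF FF FF sorts before 80 00 00 01.
        System.out.println(
            compareBytes(orderPreservingEncode(from), orderPreservingEncode(to)) <= 0);
    }
}
```

With the plain encoding the "first" key in bytes is the positive one, which is why {{range(-1, 1)}} sees an inverted range. With the flipped sign bit the byte order agrees with {{compareTo}}.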

> Built-in serdes for signed numbers do not obey lexicographical ordering
> -----------------------------------------------------------------------
>
>                 Key: KAFKA-8159
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8159
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>            Reporter: Sophie Blee-Goldman
>            Priority: Major
>
> Currently we assume consistent ordering between serialized and deserialized 
> keys, e.g. if the objects obey objA < objB < objC then the serialized Bytes 
> will also obey bytesA < bytesB < bytesC. This is not true in general of the 
> built-in serdes for signed numerical types (eg Integer, Long). Specifically, 
> it is broken by the negative number representations which are 
> lexicographically greater than (all) positive number representations. 
>  
> One consequence of this is that an interactive query of a key range with a 
> negative lower bound and positive upper bound (e.g. keyValueStore.range(-1, 1)) 
> will result in "unexpected behavior" depending on the specific store type.
>  
> For RocksDB stores with caching disabled, an empty iterator will be returned 
> regardless of whether any records do exist in that range. 
> For in-memory stores and ANY store with caching enabled, Streams will throw 
> an unchecked exception and crash.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
