[ 
https://issues.apache.org/jira/browse/KAFKA-12314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17282215#comment-17282215
 ] 

Guozhang Wang commented on KAFKA-12314:
---------------------------------------

cc [~vcrfxia] Here is a PR from 3 years ago on using a customized comparator 
through JNI: https://github.com/apache/kafka/pull/4576
We ran some benchmarks, but the results were not very promising, mainly because 
the JNI comparator on the critical path is too slow.

> Leverage custom comparator for optimized range scans on RocksDB
> ---------------------------------------------------------------
>
>                 Key: KAFKA-12314
>                 URL: https://issues.apache.org/jira/browse/KAFKA-12314
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: A. Sophie Blee-Goldman
>            Priority: Major
>
> Currently our SessionStore has poor performance on range scans due to its 
> byte layout and the possibility of variable-length keys. A session window 
> consists of the key and two timestamps, the windowEnd and windowStart. This 
> data is formatted as
> [key, windowEnd, windowStart]
> The default comparator in RocksDB is lexicographical, and so it compares 
> bytes starting with the key. This means with the above format, the records 
> are effectively sorted first by key and then by windowEnd. But if two keys 
> are of different lengths, the comparator will start on the left and end up 
> comparing the tail bytes of the longer key against the windowEnd timestamp of 
> the shorter key. Due to this, we have to set the bounds on SessionStore range 
> scans very conservatively, which means we end up reading way more data than 
> we need.
> One way out of this would be to use a custom comparator which understands the 
> window bytes format we use. So far we haven't done this because of the 
> overhead of crossing the JNI boundary with a Java Comparator; we would need a 
> native comparator to avoid a further performance hit.
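
To make the ordering problem in the description concrete, here is a minimal Python sketch, not Kafka or RocksDB code. It assumes each timestamp is an 8-byte big-endian long, models the default comparator with Python's unsigned bytewise ordering of `bytes`, and uses hypothetical helper names (`encode`, `decode`, `format_aware_compare`) that do not come from the issue.

```python
import struct
from functools import cmp_to_key

# Assumption: each timestamp is an 8-byte big-endian signed long.
TS_LEN = 8
SUFFIX_LEN = 2 * TS_LEN


def encode(key: bytes, window_end: int, window_start: int) -> bytes:
    """Lay out a session record as [key, windowEnd, windowStart]."""
    return key + struct.pack(">qq", window_end, window_start)


def decode(record: bytes):
    """Split a record back into (key, windowEnd, windowStart)."""
    window_end, window_start = struct.unpack(">qq", record[-SUFFIX_LEN:])
    return record[:-SUFFIX_LEN], window_end, window_start


def format_aware_compare(a: bytes, b: bytes) -> int:
    """Hypothetical comparator: order by key, then windowEnd, then windowStart."""
    da, db = decode(a), decode(b)
    return (da > db) - (da < db)


# Two sessions for key b"a" and one for the longer key b"a\x00".
records = [
    encode(b"a", 100, 50),
    encode(b"a\x00", 32768, 0),
    encode(b"a", 200, 150),
]

# Plain bytewise order: the tail byte of the longer key is compared against
# the windowEnd bytes of the shorter key, so the b"a\x00" record lands
# between the two b"a" records.
default_order = [decode(r) for r in sorted(records)]

# Format-aware order: all b"a" sessions stay contiguous.
aware_order = [decode(r) for r in sorted(records, key=cmp_to_key(format_aware_compare))]

print(default_order)
print(aware_order)
```

With the bytewise order, a scan over the sessions of `b"a"` has to use bounds wide enough to skip over records of other keys that sort in between, which is the conservative bounding the description refers to. The format-aware order avoids that, but in RocksDB it would have to be supplied as a comparator, which is where the JNI overhead discussed above comes in.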



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
