[ 
https://issues.apache.org/jira/browse/KAFKA-12314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17282165#comment-17282165
 ] 

A. Sophie Blee-Goldman commented on KAFKA-12314:
------------------------------------------------

Hey Sagar, one thing I forgot to add to the initial ticket is that this is 
blocked by KAFKA-8897, which is itself waiting on the 3.0 release. 
As of now we expect the next release after 2.8 to be 3.0, but that hasn't been 
officially agreed on yet. So at the very least you'll need to wait for 
KAFKA-8897.

After that, feel free! I think this would be a huge improvement, but we need to 
be careful not to accidentally introduce a performance regression here. So I 
would definitely recommend running some benchmarks to establish a baseline and 
then seeing where we can go from there. It's still unclear (at least to me) 
whether we can do this in a way that's performant enough. But I'll be 
interested to find out!

> Leverage custom comparator for optimized range scans on RocksDB
> ---------------------------------------------------------------
>
>                 Key: KAFKA-12314
>                 URL: https://issues.apache.org/jira/browse/KAFKA-12314
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: A. Sophie Blee-Goldman
>            Priority: Major
>
> Currently our SessionStore has poor performance on any range scans due to the 
> byte layout and the possibility of variable-length keys. A session window 
> consists of the key and two timestamps, the windowEnd and windowStart. This 
> data is formatted as
> [key, windowEnd, windowStart]
> The default comparator in RocksDB is lexicographical, and so it compares 
> bytes starting with the key. This means with the above format, the records 
> are effectively sorted first by key and then by windowEnd. But if two keys 
> are of different lengths, the comparator will start on the left and end up 
> comparing the tail bytes of the longer key against the windowEnd timestamp of 
> the shorter key. Due to this, we have to set the bounds on SessionStore range 
> scans very conservatively, which means we end up reading way more data than 
> we need.
> One way out of this would be to use a custom comparator which understands the 
> window bytes format we use. So far we haven't done this because of the 
> overhead of crossing the JNI boundary with a Java Comparator; we would need a 
> native comparator to avoid a further performance hit.
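
To make the ordering problem in the description concrete, here is a rough 
sketch in plain Java. It is not the Kafka Streams or RocksDB code: the class 
name, encode() helper and both comparators are hypothetical, and the layout is 
simply assumed to be the key bytes followed by two 8-byte big-endian 
timestamps, as in [key, windowEnd, windowStart]. It only illustrates how a 
bytewise comparison interleaves keys of different lengths, and how a 
format-aware comparison keeps each key's sessions contiguous.

```java
import java.nio.ByteBuffer;
import java.util.Comparator;

// Hypothetical illustration only; names and layout are assumptions.
class SessionKeyOrderingSketch {
    static final int SUFFIX_SIZE = 2 * Long.BYTES; // windowEnd + windowStart

    // Serialize as [key, windowEnd, windowStart] with big-endian timestamps.
    static byte[] encode(byte[] key, long windowEnd, long windowStart) {
        return ByteBuffer.allocate(key.length + SUFFIX_SIZE)
                .put(key).putLong(windowEnd).putLong(windowStart).array();
    }

    // Unsigned bytewise comparison of the first aLen / bLen bytes.
    static int compareBytes(byte[] a, int aLen, byte[] b, int bLen) {
        int n = Math.min(aLen, bLen);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xff, b[i] & 0xff);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(aLen, bLen);
    }

    // Mimics a default bytewise ordering over the whole serialized record.
    static final Comparator<byte[]> LEXICOGRAPHIC =
            (a, b) -> compareBytes(a, a.length, b, b.length);

    // Understands the layout: key bytes first, then windowEnd, then windowStart.
    static final Comparator<byte[]> FORMAT_AWARE = (a, b) -> {
        int aKeyLen = a.length - SUFFIX_SIZE;
        int bKeyLen = b.length - SUFFIX_SIZE;
        int cmp = compareBytes(a, aKeyLen, b, bKeyLen);
        if (cmp != 0) {
            return cmp;
        }
        ByteBuffer ta = ByteBuffer.wrap(a, aKeyLen, SUFFIX_SIZE);
        ByteBuffer tb = ByteBuffer.wrap(b, bKeyLen, SUFFIX_SIZE);
        cmp = Long.compare(ta.getLong(), tb.getLong()); // windowEnd
        if (cmp != 0) {
            return cmp;
        }
        return Long.compare(ta.getLong(), tb.getLong()); // windowStart
    };

    public static void main(String[] args) {
        byte[] shortKey = {0x61};       // "a"
        byte[] longKey = {0x61, 0x00};  // "a" followed by a zero byte

        byte[] a1 = encode(shortKey, 1L, 0L);
        byte[] a3 = encode(shortKey, 3L, 0L);
        byte[] b = encode(longKey, 512L, 0L);

        // Bytewise: the long key's record lands between the short key's sessions.
        System.out.println("lexicographic a1 < b:  " + (LEXICOGRAPHIC.compare(a1, b) < 0));
        System.out.println("lexicographic b < a3:  " + (LEXICOGRAPHIC.compare(b, a3) < 0));

        // Format-aware: both sessions of the short key sort before the long key.
        System.out.println("format-aware a1 < a3:  " + (FORMAT_AWARE.compare(a1, a3) < 0));
        System.out.println("format-aware a3 < b:   " + (FORMAT_AWARE.compare(a3, b) < 0));
    }
}
```

In this sketch the extra zero byte of the longer key is compared against the 
leading byte of the shorter key's windowEnd, so the two sessions for the 
shorter key end up on either side of the longer key's record. A comparator 
written in Java like this one is exactly what the description says would be 
too slow across JNI; it is meant only to show the intended ordering, which a 
native comparator would have to reproduce.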



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
