[ 
https://issues.apache.org/jira/browse/KAFKA-8295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17317410#comment-17317410
 ] 

A. Sophie Blee-Goldman commented on KAFKA-8295:
-----------------------------------------------

Thanks Sagar. I think the custom MergeOperator did exist back then, but I was 
hesitant to assume that it would always perform better, given our experience 
with the performance of a custom ByteComparator. We would want to run some 
benchmarks to determine whether the same holds for the merge operator. It's 
possible that MergeOperator isn't as severely affected by the overhead of 
crossing the JNI boundary, or that it has been improved over the years. I 
recall recently reading a blog post about RocksDB tuning that mentioned the 
merge operator giving a significant improvement for certain data types. I 
think this was for Flink, maybe?

This is definitely something we could add to the StateStore interface. It would 
require a KIP, and we would need to be careful since not all store backends 
will necessarily support this. You could check out [KIP-617: Allow Kafka Streams 
State Stores to be iterated 
backwards|https://cwiki.apache.org/confluence/display/KAFKA/KIP-617%3A+Allow+Kafka+Streams+State+Stores+to+be+iterated+backwards]
 as a reference: reverse iteration is another RocksDB feature we wanted to 
expose but which isn't necessarily supported by all store implementations.

> Optimize count() using RocksDB merge operator
> ---------------------------------------------
>
>                 Key: KAFKA-8295
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8295
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: A. Sophie Blee-Goldman
>            Priority: Major
>
> In addition to regular put/get/delete RocksDB provides a fourth operation, 
> merge. This essentially provides an optimized read/update/write path in a 
> single operation. One of the built-in (C++) merge operators exposed over the 
> Java API is a counter. We should be able to leverage this for a more 
> efficient implementation of count().
>  
> (Note: Unfortunately it seems unlikely we can use this to optimize general 
> aggregations, even if RocksJava allowed for a custom merge operator, unless 
> we provide a way for the user to specify and connect a C++ implemented 
> aggregator; otherwise the cost of crossing the JNI boundary would outweigh 
> any performance benefit)
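
The difference between the get/put path and the merge path described in the 
issue can be sketched without RocksDB at all. The class below is an in-memory 
stand-in, not RocksJava code: MergeCounterSketch and its methods are 
hypothetical names. It assumes counter values are encoded as fixed 8-byte 
little-endian integers, which is my understanding of what the built-in counter 
operator expects; that encoding should be verified against the RocksDB version 
in use. In RocksJava the corresponding pieces would be a merge operator such as 
UInt64AddOperator set on the options, plus a call to RocksDB.merge in place of 
the get and put pair.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.HashMap;
import java.util.Map;

// In-memory model of merge semantics; this does not use RocksDB.
final class MergeCounterSketch {
    private final Map<String, byte[]> store = new HashMap<>();

    // Assumed encoding: fixed 8-byte little-endian integer.
    static byte[] encode(long value) {
        return ByteBuffer.allocate(Long.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putLong(value)
                .array();
    }

    static long decode(byte[] bytes) {
        return ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).getLong();
    }

    // Read-modify-write: the caller reads, adds, and writes back.
    void incrementWithGetPut(String key) {
        byte[] old = store.get(key);
        long next = (old == null ? 0L : decode(old)) + 1L;
        store.put(key, encode(next));
    }

    // Merge: the caller supplies only the delta and the store combines it
    // with whatever value is already present.
    void merge(String key, byte[] delta) {
        store.merge(key, delta,
                (existing, operand) -> encode(decode(existing) + decode(operand)));
    }

    long count(String key) {
        byte[] value = store.get(key);
        return value == null ? 0L : decode(value);
    }
}
```

With this model, count() would call merge(key, encode(1L)) for each record and 
never read the old value on the write path. Whether that translates into a real 
speedup in RocksJava is the open question that benchmarks would have to answer.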



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
