[ https://issues.apache.org/jira/browse/KAFKA-8295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17317410#comment-17317410 ]
A. Sophie Blee-Goldman commented on KAFKA-8295:
-----------------------------------------------

Thanks Sagar -- I think the custom MergeOperator did exist back then, but I was hesitant to assume that it would always perform better given our experience with the performance of a custom ByteComparator. We would want to run some benchmarks to determine whether or not this is the case for the merge operator as well. It's possible that MergeOperator isn't as severely affected by the overhead of crossing the JNI, or that it's been improved over the years. I actually recall reading a blog post about RocksDB tuning recently where they mentioned that using the merge operator gave them a significant improvement for certain data types. I think this was in Flink, maybe?

This is definitely something we could add to the StateStore interface. It would require a KIP, and we would need to be careful since not all store backends will necessarily support this. You could check out [KIP-617: Allow Kafka Streams State Stores to be iterated backwards|https://cwiki.apache.org/confluence/display/KAFKA/KIP-617%3A+Allow+Kafka+Streams+State+Stores+to+be+iterated+backwards] as a reference: reverse iteration is another RocksDB feature we wanted to expose but which isn't necessarily supported by all store backends.

> Optimize count() using RocksDB merge operator
> ---------------------------------------------
>
>                 Key: KAFKA-8295
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8295
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: A. Sophie Blee-Goldman
>            Priority: Major
>
> In addition to regular put/get/delete, RocksDB provides a fourth operation,
> merge. This essentially provides an optimized read/update/write path in a
> single operation. One of the built-in (C++) merge operators exposed over the
> Java API is a counter. We should be able to leverage this for a more
> efficient implementation of count().
>
> (Note: Unfortunately it seems unlikely we can use this to optimize general
> aggregations, even if RocksJava allowed for a custom merge operator, unless
> we provide a way for the user to specify and connect a C++ implemented
> aggregator; otherwise we would incur too much cost crossing the JNI to see
> a net performance benefit.)

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
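
For readers unfamiliar with the merge idea in the issue description, here is a rough sketch of why a counter-style merge could make count() cheaper. It is a toy in-memory model, not the RocksDB or Kafka Streams API: `ToyStore`, `countWithGetPut`, and `countWithMerge` are hypothetical names invented for illustration. It assumes merge operands are simply queued on write and folded into the stored value on the next read (RocksDB may also fold them during compaction), and it says nothing about JNI overhead, which would still need the benchmarks mentioned in the comment.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy in-memory model (hypothetical, NOT the RocksDB API) of counting with and without merge. */
public class MergeCountSketch {

    /** Stand-in for a store configured with a counter-style merge operator. */
    static final class ToyStore {
        private final Map<String, Long> base = new HashMap<>();
        private final Map<String, List<Long>> pendingOperands = new HashMap<>();
        int reads = 0; // how many get() calls callers have issued

        /** Returns the current value, folding any queued merge operands into it first. */
        Long get(String key) {
            reads++;
            Long value = base.get(key);
            List<Long> operands = pendingOperands.remove(key);
            if (operands != null) {
                long sum = (value == null) ? 0L : value;
                for (long delta : operands) {
                    sum += delta;
                }
                value = sum;
                base.put(key, value);
            }
            return value;
        }

        /** Overwrites the value and discards any operands queued before this write. */
        void put(String key, long value) {
            pendingOperands.remove(key);
            base.put(key, value);
        }

        /** Write-only path: queues a delta without reading the existing value. */
        void merge(String key, long delta) {
            pendingOperands.computeIfAbsent(key, k -> new ArrayList<>()).add(delta);
        }
    }

    /** count() as a read-modify-write: one get plus one put per record. */
    static void countWithGetPut(ToyStore store, String key) {
        Long current = store.get(key);
        store.put(key, current == null ? 1L : current + 1L);
    }

    /** count() via merge: a single write per record, no read on the hot path. */
    static void countWithMerge(ToyStore store, String key) {
        store.merge(key, 1L);
    }

    public static void main(String[] args) {
        ToyStore getPutStore = new ToyStore();
        ToyStore mergeStore = new ToyStore();
        for (int i = 0; i < 1000; i++) {
            countWithGetPut(getPutStore, "page-view");
            countWithMerge(mergeStore, "page-view");
        }
        // Capture read counts before the final lookups below add one read each.
        int getPutReads = getPutStore.reads;
        int mergeReads = mergeStore.reads;
        System.out.println("get/put: count=" + getPutStore.get("page-view") + ", reads while counting=" + getPutReads);
        System.out.println("merge:   count=" + mergeStore.get("page-view") + ", reads while counting=" + mergeReads);
    }
}
```

Both paths end with the same count; the difference in this model is that the merge path never reads the store while counting. Whether that translates into a real win through RocksJava is exactly the open question above.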