[ https://issues.apache.org/jira/browse/KAFKA-8295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17363356#comment-17363356 ]
A. Sophie Blee-Goldman commented on KAFKA-8295:
-----------------------------------------------

Yes, any of the count DSL operators. It may be a bit trickier than it appears on the surface, because count is actually converted into a generic aggregation under the covers, so you'd have to tease it out into its own independent, optimized implementation. To be honest, I don't have a good sense of whether it's even worth the additional code complexity, because I don't know how much additional code and/or how many code paths this would introduce :) I recommend looking into that before jumping straight in.

Of course, we could consider introducing some kind of top-level merge-based operator to the DSL as a feature in its own right. Then count could just be converted to use that instead of the aggregation implementation. Not sure what that would look like, or if it would even be useful at all – just throwing out thoughts here.

Anyway, I just thought it would be interesting to explore what we might be able to do with this merge operator in Kafka Streams, whether that's an optimization of existing operators or some kind of first-class operator of its own. That's really the point of this ticket: to explore the merge operator.

> Optimize count() using RocksDB merge operator
> ---------------------------------------------
>
>                 Key: KAFKA-8295
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8295
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: A. Sophie Blee-Goldman
>            Assignee: Sagar Rao
>            Priority: Major
>
> In addition to regular put/get/delete, RocksDB provides a fourth operation,
> merge. This essentially provides an optimized read/update/write path in a
> single operation. One of the built-in (C++) merge operators exposed over the
> Java API is a counter. We should be able to leverage this for a more
> efficient implementation of count()
>
> (Note: Unfortunately it seems unlikely we can use this to optimize general
> aggregations, even if RocksJava allowed for a custom merge operator, unless
> we provide a way for the user to specify and connect a C++ implemented
> aggregator – otherwise we incur too much cost crossing the JNI for a net
> performance benefit)
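
To make the difference between the two write paths concrete, here is a rough sketch in Python. It is only a toy in-memory model, not the RocksDB or RocksJava API and not how Kafka Streams implements count(): `ToyCounterStore`, `count_with_get_put`, and `count_with_merge` are hypothetical names invented for this illustration. The one behavior it borrows from RocksDB is that merge operands are written without reading the current value and are folded together later, here on the next read.

```python
from collections import defaultdict


class ToyCounterStore:
    """Hypothetical in-memory model of a store with a counter merge operator."""

    def __init__(self):
        self._base = {}                     # fully materialized counts
        self._pending = defaultdict(list)   # merge operands not yet folded in
        self.reads = 0                      # how many times get() was called

    def get(self, key):
        # Fold any pending operands into the stored value, then return it.
        self.reads += 1
        operands = self._pending.pop(key, [])
        if key not in self._base and not operands:
            return None
        self._base[key] = self._base.get(key, 0) + sum(operands)
        return self._base[key]

    def put(self, key, value):
        # A plain write replaces the value and discards pending operands.
        self._pending.pop(key, None)
        self._base[key] = value

    def merge(self, key, delta):
        # Write-only: record the operand without looking at the current value.
        self._pending[key].append(delta)


def count_with_get_put(store, keys):
    """Count via read/update/write: one get and one put per record."""
    for key in keys:
        current = store.get(key) or 0
        store.put(key, current + 1)


def count_with_merge(store, keys):
    """Count via merge: one blind write per record, no read."""
    for key in keys:
        store.merge(key, 1)


if __name__ == "__main__":
    records = ["a", "b", "a", "a"]

    rmw = ToyCounterStore()
    count_with_get_put(rmw, records)

    merged = ToyCounterStore()
    count_with_merge(merged, records)

    print("get/put reads during counting:", rmw.reads)
    print("merge reads during counting:", merged.reads)
    print("final count for 'a':", rmw.get("a"), merged.get("a"))
```

Both paths end up with the same counts. The difference is that the merge path issues no reads while records are being processed, which is the saving the ticket hopes to get for count(). Whether that saving survives in practice, for example with the caching layers Kafka Streams puts in front of the store, is exactly what would need measuring.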