[jira] [Commented] (KAFKA-8295) Optimize count() using RocksDB merge operator

A. Sophie Blee-Goldman (Jira) Mon, 14 Jun 2021 20:46:07 -0700


    [ 
https://issues.apache.org/jira/browse/KAFKA-8295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17363356#comment-17363356
 ]


A. Sophie Blee-Goldman commented on KAFKA-8295:
-----------------------------------------------

Yes, any of the count DSL operators. It may be a bit more tricky than it 
appears on the surface because count is actually converted into a generic 
aggregation under the covers, so you'd have to tease it out into its own 
independent optimized implementation. To be honest, I don't have a good sense 
of whether it's even worth the additional code complexity, because I don't know 
how much additional code and/or code paths this will introduce :) I recommend 
looking into that before jumping straight in.
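
To make the "count is a generic aggregation under the covers" point concrete, here is a minimal sketch in plain Java. It does not use the Kafka Streams classes: `Init`, `Agg`, and `CountSketch` are hypothetical stand-ins that only mirror the general shape of an initializer/aggregator pair, and the `HashMap` stands in for the state store. The point is that every record costs one read and one write.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for an initializer/aggregator pair; defined here only
// to keep the sketch self-contained. They are not the Kafka Streams interfaces.
interface Init<VA> { VA apply(); }
interface Agg<K, V, VA> { VA apply(K key, V value, VA aggregate); }

class CountSketch {
    // Generic aggregation: each record triggers a store read followed by a store write.
    static <K, V, VA> Map<K, VA> aggregate(List<Map.Entry<K, V>> records,
                                           Init<VA> init, Agg<K, V, VA> agg) {
        Map<K, VA> store = new HashMap<>(); // stand-in for the state store
        for (Map.Entry<K, V> r : records) {
            VA old = store.get(r.getKey());                 // read
            if (old == null) {
                old = init.apply();                         // first record for this key
            }
            store.put(r.getKey(), agg.apply(r.getKey(), r.getValue(), old)); // update + write
        }
        return store;
    }

    // count() expressed as that generic aggregation: start at 0, add 1 per record.
    static <K, V> Map<K, Long> count(List<Map.Entry<K, V>> records) {
        return aggregate(records, () -> 0L, (k, v, total) -> total + 1);
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> records = List.of(
            Map.entry("a", "x"), Map.entry("b", "y"), Map.entry("a", "z"));
        System.out.println(count(records));
    }
}
```

Teasing count out into its own implementation would mean replacing the read/update/write inside `aggregate` with something that does not need the old value, which is where the extra code paths would come from.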

Of course, we could consider introducing some kind of top-level merge-based 
operator to the DSL as a feature in its own right. Then count could just be 
converted to use that instead of the aggregation implementation. 

Not sure what that would look like, or if it would even be useful at all; I'm 
just throwing out thoughts here. Anyway, I thought it would be interesting to 
explore what we might be able to do with this merge operator in Kafka Streams, 
whether that's an optimization of existing operators or some kind of 
first-class operator of its own. That's really the point of this ticket: to 
explore the merge operator.

> Optimize count() using RocksDB merge operator
> ---------------------------------------------
>
>                 Key: KAFKA-8295
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8295
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: A. Sophie Blee-Goldman
>            Assignee: Sagar Rao
>            Priority: Major
>
> In addition to the regular put/get/delete, RocksDB provides a fourth 
> operation, merge. This essentially provides an optimized read/update/write 
> path in a single operation. One of the built-in (C++) merge operators exposed 
> over the Java API is a counter. We should be able to leverage this for a more 
> efficient implementation of count().
>  
> (Note: Unfortunately it seems unlikely we can use this to optimize general 
> aggregations, even if RocksJava allowed for a custom merge operator, unless 
> we provide a way for the user to specify and connect a C++ implemented 
> aggregator. Otherwise the cost of crossing the JNI boundary would outweigh 
> any net performance benefit.)
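
As a rough illustration of the difference between read/update/write and the merge operation described above, here is a toy in-memory model in plain Java. It is not the RocksDB or RocksJava API: `MergeSketch`, its methods, and the `reads` counter are all hypothetical and exist only to make the cost difference visible. The model assumes a counter-style merge, where each operand is a delta that is folded into the stored value lazily when the key is read (RocksDB also combines operands during compaction, which this sketch does not model).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy in-memory model of a store with a counter-style merge; NOT the RocksDB API.
class MergeSketch {
    private final Map<String, Long> base = new HashMap<>();          // fully written values
    private final Map<String, List<Long>> pending = new HashMap<>(); // unapplied merge operands
    int reads = 0; // number of base-value lookups, to make the cost visible

    // Read/update/write: the caller must look up the old value before writing.
    void incrementWithGetPut(String key) {
        reads++;
        long old = base.getOrDefault(key, 0L);
        base.put(key, old + 1);
    }

    // Merge: record the operand "+delta" without reading the current value.
    void merge(String key, long delta) {
        pending.computeIfAbsent(key, k -> new ArrayList<>()).add(delta);
    }

    // Pending operands are folded into the base value only when the key is read.
    long get(String key) {
        reads++;
        long value = base.getOrDefault(key, 0L);
        List<Long> operands = pending.remove(key);
        if (operands != null) {
            for (long d : operands) {
                value += d;
            }
            base.put(key, value);
        }
        return value;
    }

    public static void main(String[] args) {
        MergeSketch store = new MergeSketch();
        for (int i = 0; i < 3; i++) {
            store.merge("a", 1); // three increments, zero reads so far
        }
        System.out.println("count=" + store.get("a") + " reads=" + store.reads);
    }
}
```

In this model, a count maintained through `merge` pays for a read only when someone actually asks for the value, whereas `incrementWithGetPut` pays for one on every record.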



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
