Vinoth Rajasekar created KAFKA-7820:
---------------------------------------

             Summary: distinct count kafka streams api
                 Key: KAFKA-7820
                 URL: https://issues.apache.org/jira/browse/KAFKA-7820
             Project: Kafka
          Issue Type: New Feature
          Components: core
            Reporter: Vinoth Rajasekar


We are using Kafka Streams for our real-time analytics use cases. Most of 
them involve doing a distinct count on certain fields.

Currently we do the distinct count by storing the hash map values of the data 
in a set and counting as events flow in. Doing this in application memory is 
challenging, because the stored values and their count are limited by the 
allotted memory size. When we get high volume or a spike in traffic, the hash 
map of the distinct-count fields grows beyond the allotted memory size, 
leading to issues.
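
To make the memory problem concrete, here is a minimal sketch of the set-based 
approach described above. It assumes a plain java.util.HashSet held in 
application memory; the class name InMemoryDistinctCounter and its methods are 
hypothetical and are not part of the Kafka Streams API.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only: not part of the Kafka Streams API.
final class InMemoryDistinctCounter {
    // Every distinct value seen so far is retained, so memory use grows
    // linearly with the cardinality of the field being counted.
    private final Set<String> seen = new HashSet<>();

    /** Records one event's field value and returns the running distinct count. */
    long observe(String fieldValue) {
        seen.add(fieldValue); // no effect if the value is already present
        return seen.size();
    }

    long distinctCount() {
        return seen.size();
    }

    public static void main(String[] args) {
        InMemoryDistinctCounter counter = new InMemoryDistinctCounter();
        for (String userId : List.of("u1", "u2", "u1", "u3", "u2")) {
            counter.observe(userId);
        }
        System.out.println(counter.distinctCount()); // prints 3
    }
}
```

Because the set never shrinks, a traffic spike with many new values directly 
increases heap usage, which is the limit described above.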

The other issue is that when we scale the app out, we need to use global 
KTables so that every instance gets all the values for the distinct count, 
which adds back pressure on the cluster. Otherwise we have to re-partition the 
topic and do the count on the key.
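
The re-partitioning alternative works because keying on the counted field 
sends every occurrence of a value to the same partition, so the per-partition 
sets are disjoint and their sizes can simply be summed. The sketch below 
simulates this in plain Java. It is an assumption-laden illustration: the 
class PartitionedDistinctCount is hypothetical, and String.hashCode() stands 
in for Kafka's actual partitioner.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical simulation only: no Kafka classes are used here.
final class PartitionedDistinctCount {
    /** Picks a partition from the value's hash (a stand-in for Kafka's partitioner). */
    static int partitionFor(String value, int numPartitions) {
        return Math.floorMod(value.hashCode(), numPartitions);
    }

    /** Sums per-partition distinct counts; valid because equal values share a partition. */
    static long distinctCount(List<String> values, int numPartitions) {
        List<Set<String>> perPartition = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            perPartition.add(new HashSet<>());
        }
        for (String v : values) {
            perPartition.get(partitionFor(v, numPartitions)).add(v);
        }
        long total = 0;
        for (Set<String> partitionSet : perPartition) {
            total += partitionSet.size();
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> events = List.of("a", "b", "a", "c", "b", "d");
        System.out.println(distinctCount(events, 3)); // prints 4
    }
}
```

Each instance then holds only its own share of the values, but the partial 
counts still have to be combined somewhere, and the extra topic adds its own 
cost.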

Can we have a feature where distinct count is supported through the Streams 
API at the framework level, rather than having to deal with it at the 
application level?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
