[ https://issues.apache.org/jira/browse/KAFKA-7820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matthias J. Sax updated KAFKA-7820:
-----------------------------------
    Component/s:     (was: core)
                 streams

> distinct count kafka streams api
> --------------------------------
>
>                 Key: KAFKA-7820
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7820
>             Project: Kafka
>          Issue Type: New Feature
>          Components: streams
>            Reporter: Vinoth Rajasekar
>            Priority: Minor
>
> We are using Kafka Streams for our real-time analytics use cases. Most of our
> use cases involve doing a distinct count on certain fields.
> Currently we do the distinct count by storing the hash map value of the data in a
> set and counting as events flow in. There are a lot of challenges doing this
> in application memory, because storing the hash map values and counting them
> is limited by the allotted memory size. When we get high volume or a spike in
> traffic, the hash map of the distinct count fields grows beyond the allotted
> memory size, leading to issues.
> The other issue is that when we scale the app, we need to use global KTables so
> we get all the values for doing the distinct count, and this adds back pressure
> in the cluster, or we have to re-partition the topic and do the count on the key.
> Can we have a feature where distinct count is supported through the Streams
> API at the framework level, rather than dealing with it at the application level?
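
For readers unfamiliar with the pattern, the in-memory approach the report describes might look roughly like the sketch below. It is plain Java with no Kafka dependency; the class name `DistinctCountSketch`, its methods, and the sample keys are hypothetical and not taken from the reporter's application. It assumes the raw field values are kept in a per-key set, which is one reading of the description.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not the reporter's code and not a Kafka Streams API.
class DistinctCountSketch {
    // One set of seen values per grouping key. Every new distinct value stays
    // on the heap, so memory use grows with the cardinality of the field.
    private final Map<String, Set<String>> seen = new HashMap<>();

    // Record one event and return the current distinct count for its key.
    int observe(String key, String value) {
        Set<String> values = seen.computeIfAbsent(key, k -> new HashSet<>());
        values.add(value); // adding a duplicate leaves the set unchanged
        return values.size();
    }

    // Distinct count seen so far for a key; 0 if the key was never observed.
    int distinctCount(String key) {
        Set<String> values = seen.get(key);
        return values == null ? 0 : values.size();
    }

    public static void main(String[] args) {
        DistinctCountSketch counter = new DistinctCountSketch();
        counter.observe("page-a", "alice");
        counter.observe("page-a", "bob");
        counter.observe("page-a", "alice"); // repeated value, not counted again
        System.out.println(counter.distinctCount("page-a")); // prints 2
    }
}
```

Because each set only ever grows, a traffic spike with many new distinct values increases heap usage directly, which matches the memory limit described in the report. The sketch also sees only the events delivered to one application instance, which is why the report mentions global KTables or re-partitioning when scaling out.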