[ https://issues.apache.org/jira/browse/KAFKA-3511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Eno Thereska resolved KAFKA-3511. --------------------------------- Resolution: Won't Fix > Add common aggregation functions like Sum and Avg as build-ins in Kafka > Streams DSL > ----------------------------------------------------------------------------------- > > Key: KAFKA-3511 > URL: https://issues.apache.org/jira/browse/KAFKA-3511 > Project: Kafka > Issue Type: Bug > Components: streams > Reporter: Guozhang Wang > Assignee: Eno Thereska > Labels: api > Fix For: 0.10.1.0 > > > Currently we have the following aggregation APIs in the Streams DSL: > {code} > KStream.aggregateByKey(..) > KStream.reduceByKey(..) > KStream.countByKey(..) > KTable.groupBy(...).aggregate(..) > KTable.groupBy(...).reduce(..) > KTable.groupBy(...).count(..) > {code} > And it is better to add common aggregation functions like Sum and Avg as > built-in into the Streams DSL. A few questions to ask though: > 1. Should we add those built-in functions as, for example > {{KTable.groupBy(...).sum(...)} or {{KTable.groupBy(...).aggregate(SUM, > ...)}}. Please see the comments below for detailed pros and cons. > 2. If we go with the second option above, should we replace the countByKey / > count operators with aggregate(COUNT) as well? Personally I (Guozhang) feel > it is not necessary, as COUNT is a special aggregate function since we do not > need to map on any value fields; this is the same approach as in Spark as > well, where Count is built-in as first-citizen in the DSL, and others are > built-in as {{aggregate(SUM)}}, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)