[ 
https://issues.apache.org/jira/browse/KAFKA-6049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16199642#comment-16199642
 ] 

Guozhang Wang edited comment on KAFKA-6049 at 10/11/17 12:59 AM:
-----------------------------------------------------------------

WIP PR ready at https://github.com/apache/kafka/pull/2975

Needs someone to pick it up, address the remaining review comments, rebase on 
trunk, and push a new PR to continue.


was (Author: guozhang):
WIP PR ready at 
https://github.com/apache/kafka/pull/2975#issuecomment-331275009.

Needs someone to pick it up, address the remaining review comments, rebase on 
trunk, and push a new PR to continue.

> Kafka Streams: Add Cogroup in the DSL
> -------------------------------------
>
>                 Key: KAFKA-6049
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6049
>             Project: Kafka
>          Issue Type: New Feature
>          Components: streams
>            Reporter: Guozhang Wang
>              Labels: api, needs-kip, user-experience
>
> When multiple streams aggregate together to form a single larger object 
> (e.g., a shopping website may have a cart stream, a wish list stream, and a 
> purchases stream, which together make up a Customer), it is very difficult 
> to accommodate this in the Kafka Streams DSL. It generally requires you to 
> group and aggregate all of the streams into KTables and then make multiple 
> outer join calls to end up with a KTable holding your desired object. This 
> creates a state store for each stream and a long chain of ValueJoiners that 
> each new record must go through to reach the final object.
> Creating a cogroup method that uses a single state store will:
> * Reduce the number of gets from state stores. With multiple joins, when a 
>   new value comes into any of the streams, a chain reaction happens in which 
>   ValueGetters keep calling ValueGetters until all state stores have been 
>   accessed.
> * Slightly increase performance. As described above, all ValueGetters are 
>   called, which also causes all ValueJoiners to be called, forcing a 
>   recalculation of the current joined value of all other streams.
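
The difference described in the issue can be illustrated with a minimal sketch in plain Java. It does not use the Kafka Streams API: in-memory maps stand in for state stores, and every name here (CogroupSketch, JoinChain, Cogroup, lookups) is hypothetical. The lookup counter is a simplified model of store reads, not a measurement of real Kafka Streams behavior.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CogroupSketch {
    // Indexes of the three example input streams from the issue description.
    static final int CART = 0, WISH_LIST = 1, PURCHASES = 2;
    static final int STREAMS = 3;

    /** Join-chain model: one store per stream; every update re-reads all stores. */
    static final class JoinChain {
        final List<Map<String, List<String>>> stores = new ArrayList<>();
        int lookups = 0; // store reads made while assembling joined values

        JoinChain() {
            for (int i = 0; i < STREAMS; i++) {
                stores.add(new HashMap<>());
            }
        }

        List<List<String>> update(int stream, String key, String item) {
            // Aggregate the new record into its own stream's store.
            stores.get(stream).computeIfAbsent(key, k -> new ArrayList<>()).add(item);
            // Rebuild the joined value by consulting every store for this key.
            List<List<String>> joined = new ArrayList<>();
            for (Map<String, List<String>> store : stores) {
                lookups++;
                List<String> values = store.getOrDefault(key, Collections.emptyList());
                joined.add(new ArrayList<>(values));
            }
            return joined;
        }
    }

    /** Cogroup model: a single store holds the combined object per key. */
    static final class Cogroup {
        final Map<String, List<List<String>>> store = new HashMap<>();
        int lookups = 0; // store reads made while assembling combined values

        List<List<String>> update(int stream, String key, String item) {
            lookups++; // one read of the single store per incoming record
            List<List<String>> customer = store.computeIfAbsent(key, k -> {
                List<List<String>> empty = new ArrayList<>();
                for (int i = 0; i < STREAMS; i++) {
                    empty.add(new ArrayList<>());
                }
                return empty;
            });
            // Update only the part of the combined object that this stream feeds.
            customer.get(stream).add(item);
            return customer;
        }
    }

    public static void main(String[] args) {
        JoinChain chain = new JoinChain();
        Cogroup cogroup = new Cogroup();
        chain.update(CART, "alice", "book");
        cogroup.update(CART, "alice", "book");
        chain.update(PURCHASES, "alice", "pen");
        cogroup.update(PURCHASES, "alice", "pen");
        System.out.println("join-chain lookups: " + chain.lookups);
        System.out.println("cogroup lookups:    " + cogroup.lookups);
    }
}
```

In this model both approaches produce the same combined value for a key, but the join-chain variant touches every store on each record, while the cogroup variant touches one.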



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
