[ https://issues.apache.org/jira/browse/STORM-2258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16064194#comment-16064194 ]
Arun Mahadevan commented on STORM-2258: --------------------------------------- We already have a groupByKey implementation. What we are looking for is a coGroupByKey which is more like a join but instead of returning the cross product of the matching keys, groups together values for the same key from the joined streams. For example, Say stream1 has values - (k1, v1), (k2, v2), (k2, v3) and stream2 has values - (k1, x1), (k1, x2), (k3, x3) The the co-grouped stream would contain - {noformat} (k1, ([v1], [x1, x2]), (k2, ([v2, v3], [])), (k3, ([], [x3])) {noformat} Since you are co-grouping two streams containing key-value pairs, you would define the operation on the PairStream class, with the signature of the method to be something like, {noformat} public <V1> PairStream<K, Pair<Iterable<V>, Iterable<V1>>> coGroupByKey(PairStream<K, V1> otherStream) { // ... } {noformat} To get some hints on how to go about implementing this, you can take a look at the implementation of the join operation. (see JoinProcessor.java) > Streams api - support CoGroupByKey > ---------------------------------- > > Key: STORM-2258 > URL: https://issues.apache.org/jira/browse/STORM-2258 > Project: Apache Storm > Issue Type: Sub-task > Reporter: Arun Mahadevan > > Group together values with same key from both streams. Similar constructs are > supported in beam, spark and flink. > When called on a Stream of (K, V) and (K, W) pairs, return a new Stream of > (K, Seq[V], Seq[W]) tuples > See also - https://cloud.google.com/dataflow/model/group-by-key -- This message was sent by Atlassian JIRA (v6.4.14#64029)