Sebastian Kruse created FLINK-1043:
--------------------------------------
Summary: Alternative combine interface
Key: FLINK-1043
URL: https://issues.apache.org/jira/browse/FLINK-1043
Project: Flink
Issue Type: Wish
Reporter: Sebastian Kruse
The GroupReduce allows for the following combination reduce step: {{InputType}}
-> combine -> {{InputType}} -> reduce -> {{OutputType}}. However, in the use
cases I have stumbled upon so far, it would make more sense to have the
following steps: {{InputType}} -> {{OutputType}} -> {{OutputType}}. It seems
more intuitive to me to create a set of partial results with the combiners that
will finally merged within the reduce phase into an overall result. This
sometimes bars me from using a combiner.
I provide some examples for this intuition.
* WordCount
** If you want to implement WordCount with as a combinable GroupReduce, then
you have to preprocess all words as {{Tuple2<String, 1>}}. This could be
avoided if the combination result was not necessarily equal to the input type.
* create a Bloom filter
** Bloom filters can be created locally on each node and later on be merged
into a final, global Bloom filter, thus lend themselves for a combine-reduce
proceeding. Doing this with a combinable GroupReduce would currently require to
turn each input element into a singleton Bloom filter before the combination
phase.
Therefore, it would be nice to have the ability to use {{OutputType}} as the
combiner result.
--
This message was sent by Atlassian JIRA
(v6.2#6252)