[
https://issues.apache.org/jira/browse/FLINK-1043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14090793#comment-14090793
]
Fabian Hueske commented on FLINK-1043:
--------------------------------------
Right now a Combiner is considered to be optional and the Reduce must be able
to consume to consume its input data even if the Combiner was not executed.
This can happen, if the data is already partitioned in a suitable way. In this
case, a Combiner would cause additional overhead and not help to reduce the
data to be shuffled (as there is nothing to be shuffled).
Introducing a new combiner interface would make the execution of the combiner
mandatory which needs to be considered during optimization.
However, I think this would go well with our recent changes to extract combine
into an interface (FLINK-848).
> Alternative combine interface
> -----------------------------
>
> Key: FLINK-1043
> URL: https://issues.apache.org/jira/browse/FLINK-1043
> Project: Flink
> Issue Type: Wish
> Reporter: Sebastian Kruse
> Assignee: Fabian Hueske
>
> The GroupReduce allows for the following combination reduce step:
> {{InputType}} -> combine -> {{InputType}} -> reduce -> {{OutputType}}.
> However, in the use cases I have stumbled upon so far, it would make more
> sense to have the following steps: {{InputType}} -> {{OutputType}} ->
> {{OutputType}}. It seems more intuitive to me to create a set of partial
> results with the combiners that will finally merged within the reduce phase
> into an overall result. This sometimes bars me from using a combiner.
> I provide some examples for this intuition.
> * WordCount
> ** If you want to implement WordCount with as a combinable GroupReduce, then
> you have to preprocess all words as {{Tuple2<String, 1>}}. This could be
> avoided if the combination result was not necessarily equal to the input type.
> * create a Bloom filter
> ** Bloom filters can be created locally on each node and later on be merged
> into a final, global Bloom filter, thus lend themselves for a combine-reduce
> proceeding. Doing this with a combinable GroupReduce would currently require
> to turn each input element into a singleton Bloom filter before the
> combination phase.
> Therefore, it would be nice to have the ability to use {{OutputType}} as the
> combiner result.
--
This message was sent by Atlassian JIRA
(v6.2#6252)