Great discussion! As Aljoscha says, Fold, Reduce, and Combine are all
intertwined and not quite identical as we use them.

Another simple but perhaps coy answer is that if you read the MapReduce
paper by Dean and Ghemawat that started this all, they used "Map",
"Reduce", and "Combine" (see section 4.3:
https://research.google.com/archive/mapreduce.html)

So then it's likely just the lineage of Beam as "evolving from MapReduce"
:). [Looking around the source tree: we have MapElements, ReduceFn, and
Combine. And the DataflowRunner has Shuffle inside of GroupByKey. ;)]

Dan

On Tue, Apr 18, 2017 at 3:16 AM, Aljoscha Krettek <aljos...@apache.org>
wrote:

> The definition of foldl in Haskell is the same as the description I gave
> earlier:
>
> foldl :: (a -> b -> a) -> a -> [b] -> a
>
> The function (a -> b -> a) is what I described as (T, A) -> A and it’s
> used to fold a list of b’s into an a (the accumulator type).
>
> You’re right that the mapping AccumT->OutputT is not important and could
> be delegated to a separate method. The important part of the interface is
> mergeAccumulators() since this makes the operation distributive: we can
> “fold” a bunch of Ts into As in parallel (even on different machines) and
> then merge them together. This is what is missing from a functional fold.
>
> Best,
> Aljoscha
>
>
> > On 18. Apr 2017, at 12:03, Wesley Tanaka <wtan...@yahoo.com.INVALID>
> wrote:
> >
> > I believe that foldl in Haskell https://www.haskell.org/
> hoogle/?hoogle=foldl admits a separate accumulator type from the type of
> the data structure being "folded"
> > And, well, python lets you have your way with mixing types, but this
> certainly works as another example:python -c "print(reduce(lambda ac, elem:
> '%s%d' % (ac,elem), [1,2,3,4,5], ''))"
> > Is there anything special about the AccumT->OutputT conversion that
> extractOutput() needs to be in the same interface as createAccumulator(),
> addInput() and mergeAccumulators()?  If the interface were segregated such
> that one interface managed the InputT->AccumT conversion, and the second
> managed the AccumT->InputT conversion, it seems like maybe the
> AccumT->OutputT conversion could even get replaced with MapElements?  And
> then the full current "Combine" functionality could be implemented as a
> composition of the lower-level primitives?
> > I haven't dug that deeply into Combine yet, so I may be missing
> something obvious.
> > ---
> > Wesley Tanaka
> > https://wtanaka.com/
> >
> > On Monday, April 17, 2017, 11:32:29 PM HST, Aljoscha Krettek <
> aljos...@apache.org> wrote:Hi,
> > I think both fold and reduce fail to capture all the power or (what we
> call) combine. Reduce requires a function of type (T, T) -> T. It requires
> that the output type be the same as the input type. Fold takes a function
> (T, A) -> A where T is the input type and A is the accumulation type. Here,
> the output type can be different from the input type. However, there is no
> way of combining these aggregators so the operation is not distributive,
> i.e. we cannot hierarchically apply the operation.
> >
> > Combine is the generalisation of this: We have three types, T (input), A
> (accumulator), O (output) and we require a function that can merge
> accumulators. The operation is distributive, meaning we can efficiently
> execute it and we can also have an output type that is different from the
> input type.
> >
> > Quick FYI: in Flink the CombineFn is called AggregatingFunction and
> CombiningState is AggregatingState.
> >
> > Best,
> > Aljoscha
> >> On 18. Apr 2017, at 04:29, Wesley Tanaka <wtan...@yahoo.com.INVALID>
> wrote:
> >>
> >> As I start to understand Combine.Globally, it seems that it is, in
> spirit, Beam's implementation of the "fold" higher-order function
> >> https://en.wikipedia.org/wiki/Fold_(higher-order_function)#
> Folds_in_various_languages
> >>
> >> Was there a reason the word "combine" was picked instead of either
> "fold" or "reduce"?  From the wikipedia list above, it seems as though
> "fold" and "reduce" are in much more common usage, so either of those might
> be easier for newcomers to understand.
> >> ---
> >> Wesley Tanaka
> >> http://wtanaka.com/
>
>

Reply via email to