...Curiously enough, ReduceFn is by far the closest of all these to a
sequential fold. It is also internal (runner-facing rather than
user-facing).

On Tue, Apr 18, 2017 at 8:27 AM Dan Halperin <dhalp...@google.com.invalid>
wrote:

> Great discussion! As Aljoscha says, Fold, Reduce, and Combine are all
> intertwined and not quite identical as we use them.
>
> Another simple but perhaps coy answer is that if you read the MapReduce
> paper by Dean and Ghemawat that started this all, they used "Map",
> "Reduce", and "Combine" (see section 4.3:
> https://research.google.com/archive/mapreduce.html)
>
> So then it's likely just the lineage of Beam as "evolving from MapReduce"
> :). [Looking around the source tree: we have MapElements, ReduceFn, and
> Combine. And the DataflowRunner has Shuffle inside of GroupByKey. ;)]
>
> Dan
>
> On Tue, Apr 18, 2017 at 3:16 AM, Aljoscha Krettek <aljos...@apache.org>
> wrote:
>
> > The definition of foldl in Haskell is the same as the description I gave
> > earlier:
> >
> > foldl :: (a -> b -> a) -> a -> [b] -> a
> >
> > The function (a -> b -> a) is what I described as (T, A) -> A and it’s
> > used to fold a list of b’s into an a (the accumulator type).
> >
> > You’re right that the mapping AccumT->OutputT is not important and could
> > be delegated to a separate method. The important part of the interface is
> > mergeAccumulators() since this makes the operation distributive: we can
> > “fold” a bunch of Ts into As in parallel (even on different machines) and
> > then merge them together. This is what is missing from a functional fold.
> >
> > Best,
> > Aljoscha
> >
> >
> > > On 18. Apr 2017, at 12:03, Wesley Tanaka <wtan...@yahoo.com.INVALID>
> > wrote:
> > >
> > > I believe that foldl in Haskell https://www.haskell.org/
> > hoogle/?hoogle=foldl admits a separate accumulator type from the type of
> > the data structure being "folded"
> > > And, well, python lets you have your way with mixing types, but this
> > certainly works as another example:python -c "print(reduce(lambda ac,
> elem:
> > '%s%d' % (ac,elem), [1,2,3,4,5], ''))"
> > > Is there anything special about the AccumT->OutputT conversion that
> > extractOutput() needs to be in the same interface as createAccumulator(),
> > addInput() and mergeAccumulators()?  If the interface were segregated
> such
> > that one interface managed the InputT->AccumT conversion, and the second
> > managed the AccumT->InputT conversion, it seems like maybe the
> > AccumT->OutputT conversion could even get replaced with MapElements?  And
> > then the full current "Combine" functionality could be implemented as a
> > composition of the lower-level primitives?
> > > I haven't dug that deeply into Combine yet, so I may be missing
> > something obvious.
> > > ---
> > > Wesley Tanaka
> > > https://wtanaka.com/
> > >
> > > On Monday, April 17, 2017, 11:32:29 PM HST, Aljoscha Krettek <
> > aljos...@apache.org> wrote:Hi,
> > > I think both fold and reduce fail to capture all the power or (what we
> > call) combine. Reduce requires a function of type (T, T) -> T. It
> requires
> > that the output type be the same as the input type. Fold takes a function
> > (T, A) -> A where T is the input type and A is the accumulation type.
> Here,
> > the output type can be different from the input type. However, there is
> no
> > way of combining these aggregators so the operation is not distributive,
> > i.e. we cannot hierarchically apply the operation.
> > >
> > > Combine is the generalisation of this: We have three types, T (input),
> A
> > (accumulator), O (output) and we require a function that can merge
> > accumulators. The operation is distributive, meaning we can efficiently
> > execute it and we can also have an output type that is different from the
> > input type.
> > >
> > > Quick FYI: in Flink the CombineFn is called AggregatingFunction and
> > CombiningState is AggregatingState.
> > >
> > > Best,
> > > Aljoscha
> > >> On 18. Apr 2017, at 04:29, Wesley Tanaka <wtan...@yahoo.com.INVALID>
> > wrote:
> > >>
> > >> As I start to understand Combine.Globally, it seems that it is, in
> > spirit, Beam's implementation of the "fold" higher-order function
> > >> https://en.wikipedia.org/wiki/Fold_(higher-order_function)#
> > Folds_in_various_languages
> > >>
> > >> Was there a reason the word "combine" was picked instead of either
> > "fold" or "reduce"?  From the wikipedia list above, it seems as though
> > "fold" and "reduce" are in much more common usage, so either of those
> might
> > be easier for newcomers to understand.
> > >> ---
> > >> Wesley Tanaka
> > >> http://wtanaka.com/
> >
> >
>

Reply via email to