...Curiously enough, ReduceFn is by far the closest of all these to a sequential fold. It is also internal (runner-facing rather than user-facing).
On Tue, Apr 18, 2017 at 8:27 AM Dan Halperin <dhalp...@google.com.invalid> wrote: > Great discussion! As Aljoscha says, Fold, Reduce, and Combine are all > intertwined and not quite identical as we use them. > > Another simple but perhaps coy answer is that if you read the MapReduce > paper by Dean and Ghemawat that started this all, they used "Map", > "Reduce", and "Combine" (see section 4.3: > https://research.google.com/archive/mapreduce.html) > > So then it's likely just the lineage of Beam as "evolving from MapReduce" > :). [Looking around the source tree: we have MapElements, ReduceFn, and > Combine. And the DataflowRunner has Shuffle inside of GroupByKey. ;)] > > Dan > > On Tue, Apr 18, 2017 at 3:16 AM, Aljoscha Krettek <aljos...@apache.org> > wrote: > > > The definition of foldl in Haskell is the same as the description I gave > > earlier: > > > > foldl :: (a -> b -> a) -> a -> [b] -> a > > > > The function (a -> b -> a) is what I described as (T, A) -> A and it’s > > used to fold a list of b’s into an a (the accumulator type). > > > > You’re right that the mapping AccumT->OutputT is not important and could > > be delegated to a separate method. The important part of the interface is > > mergeAccumulators() since this makes the operation distributive: we can > > “fold” a bunch of Ts into As in parallel (even on different machines) and > > then merge them together. This is what is missing from a functional fold. > > > > Best, > > Aljoscha > > > > > > > On 18. Apr 2017, at 12:03, Wesley Tanaka <wtan...@yahoo.com.INVALID> > > wrote: > > > > > > I believe that foldl in Haskell https://www.haskell.org/ > > hoogle/?hoogle=foldl admits a separate accumulator type from the type of > > the data structure being "folded" > > > And, well, python lets you have your way with mixing types, but this > > certainly works as another example:python -c "print(reduce(lambda ac, > elem: > > '%s%d' % (ac,elem), [1,2,3,4,5], ''))" > > > Is there anything special about the AccumT->OutputT conversion that > > extractOutput() needs to be in the same interface as createAccumulator(), > > addInput() and mergeAccumulators()? If the interface were segregated > such > > that one interface managed the InputT->AccumT conversion, and the second > > managed the AccumT->InputT conversion, it seems like maybe the > > AccumT->OutputT conversion could even get replaced with MapElements? And > > then the full current "Combine" functionality could be implemented as a > > composition of the lower-level primitives? > > > I haven't dug that deeply into Combine yet, so I may be missing > > something obvious. > > > --- > > > Wesley Tanaka > > > https://wtanaka.com/ > > > > > > On Monday, April 17, 2017, 11:32:29 PM HST, Aljoscha Krettek < > > aljos...@apache.org> wrote:Hi, > > > I think both fold and reduce fail to capture all the power or (what we > > call) combine. Reduce requires a function of type (T, T) -> T. It > requires > > that the output type be the same as the input type. Fold takes a function > > (T, A) -> A where T is the input type and A is the accumulation type. > Here, > > the output type can be different from the input type. However, there is > no > > way of combining these aggregators so the operation is not distributive, > > i.e. we cannot hierarchically apply the operation. > > > > > > Combine is the generalisation of this: We have three types, T (input), > A > > (accumulator), O (output) and we require a function that can merge > > accumulators. The operation is distributive, meaning we can efficiently > > execute it and we can also have an output type that is different from the > > input type. > > > > > > Quick FYI: in Flink the CombineFn is called AggregatingFunction and > > CombiningState is AggregatingState. > > > > > > Best, > > > Aljoscha > > >> On 18. Apr 2017, at 04:29, Wesley Tanaka <wtan...@yahoo.com.INVALID> > > wrote: > > >> > > >> As I start to understand Combine.Globally, it seems that it is, in > > spirit, Beam's implementation of the "fold" higher-order function > > >> https://en.wikipedia.org/wiki/Fold_(higher-order_function)# > > Folds_in_various_languages > > >> > > >> Was there a reason the word "combine" was picked instead of either > > "fold" or "reduce"? From the wikipedia list above, it seems as though > > "fold" and "reduce" are in much more common usage, so either of those > might > > be easier for newcomers to understand. > > >> --- > > >> Wesley Tanaka > > >> http://wtanaka.com/ > > > > >