Great, this is https://issues.apache.org/jira/browse/BEAM-2049 via https://github.com/apache/beam/pull/2636.
On Fri, Apr 21, 2017 at 11:40 PM, Pei HE <p...@apache.org> wrote: > +1 > > On Sat, Apr 22, 2017 at 12:16 PM, Jean-Baptiste Onofré <j...@nanthrax.net> > wrote: > > > +1 > > > > Regards > > JB > > > > > > On 04/21/2017 07:24 PM, Kenneth Knowles wrote: > > > >> Hi all, > >> > >> I propose that we remove KeyedCombineFn before the first stable release. > >> > >> I don't think it adds enough value for the complexity it adds to e.g. > >> CombineWithContext [1] and state [2, 3], and it doesn't seem to me that > >> users really use it when we might expect. I am happy to be demonstrated > >> wrong. > >> > >> It is very likely that you have never written [4, 5] or thought about > >> KeyedCombineFn. So for context, here are excepts from signatures just to > >> show the difference from CombineFn: > >> > >> CombineFn<InputT, AccumT, OutputT> { > >> AccumT createAccumulator(); > >> AccumT addInput(AccumT accum, InputT input); > >> AccumT mergeAccumulators(Iterable<AccumT> accums); > >> OutputT extractOutput(AccumT accum); > >> } > >> > >> KeyedCombineFn<K, InputT, AccumT, OutputT> { > >> AccumT createAccumulator(K key); > >> AccumT addInput(K key, AccumT accum, InputT input); > >> AccumT mergeAccumulators(K key, Iterable<AccumT> accums); > >> OutputT extractOutput(K key, AccumT accum); > >> } > >> > >> So what are the particular reasons for this, versus a CombineFn that has > >> KVs as its input and accumulator types? > >> > >> - There are some performance improvements potentially from not passing > >> keys around, based on the assumption they are always available. > >> > >> - There is also a spec difference because it only has to be associative > >> and commutative per key, cannot be applied in a global combine, and > >> addInput is automatically key preserving. > >> > >> But in fact, in all of my code crawling the class is almost never used > >> (even over the course of its history at Google) and even the few uses I > >> found were often mistakes where the key is totally ignored, probably > >> because a user thinks "I am doing a keyed combine so I need a keyed > >> combine > >> function". So the number of users actually affected is about zero. > >> > >> I would be curious if anyone has a compelling case for keeping > >> KeyedCombineFn. > >> > >> Kenn > >> > >> [1] > >> https://github.com/yafengguo/Apache-beam/blob/master/sdks/ja > >> va/core/src/main/java/org/apache/beam/sdk/transforms/Combine > >> WithContext.java > >> [2] https://issues.apache.org/jira/browse/BEAM-1336 > >> [3] https://github.com/apache/beam/pull/2627 > >> [4] > >> https://github.com/search?l=Java&q=KeyedCombineFn&ref=advsea > >> rch&type=Code&utf8=%E2%9C%93 > >> [5] https://www.google.com/search?q=KeyedCombineFn > >> > >> > > -- > > Jean-Baptiste Onofré > > jbono...@apache.org > > http://blog.nanthrax.net > > Talend - http://www.talend.com > > >