I don't remember if there was a particular reason I didn't implement this as an AlgebraicEvalFunc. It seems like it could be one. I believe the Java MapReduce version leverages the combiner. If you want to try making this Algebraic, we would be happy to accept a patch :)
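For what it's worth, here is a rough sketch of why the Algebraic approach should work: HyperLogLog registers merge by element-wise max, so partial sketches built in each combiner combine into exactly the registers you would get from a single pass. This is plain HyperLogLog in Python, not the HyperLogLog++ code in DataFu, and it is only an illustration. The function names `initial`, `intermed` and `final` are hypothetical; they just mirror the three stages of Pig's Algebraic interface. The precision `P = 12` and the SHA-1 based hash are assumptions picked for the example.

```python
import hashlib
import math

P = 12          # precision bits (assumed for illustration)
M = 1 << P      # number of registers


def _hash64(value):
    # 64-bit hash taken from the first 8 bytes of SHA-1 (illustrative choice).
    digest = hashlib.sha1(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def initial(values):
    """Initial stage: fill registers from the raw values of one split."""
    regs = [0] * M
    for v in values:
        h = _hash64(v)
        idx = h >> (64 - P)                      # top P bits pick the register
        rest = h & ((1 << (64 - P)) - 1)         # remaining 64 - P bits
        rank = (64 - P) - rest.bit_length() + 1  # leading zeros in rest, plus 1
        if rank > regs[idx]:
            regs[idx] = rank
    return regs


def intermed(partials):
    """Intermediate stage: merge register arrays by element-wise max."""
    merged = [0] * M
    for regs in partials:
        merged = [max(a, b) for a, b in zip(merged, regs)]
    return merged


def final(partials):
    """Final stage: merge whatever is left and turn registers into an estimate."""
    regs = intermed(partials)
    alpha = 0.7213 / (1 + 1.079 / M)
    raw = alpha * M * M / sum(2.0 ** -r for r in regs)
    zeros = regs.count(0)
    if raw <= 2.5 * M and zeros:
        # Small-range correction (linear counting).
        return M * math.log(M / zeros)
    return raw
```

Usage would look something like `final([initial(split) for split in splits])`. Because max is associative and commutative, the intermediate stage can run any number of times, in any grouping, without changing the result, which is the property a combiner needs.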
-Matt

> On Mar 7, 2015, at 12:11 PM, Ido Hadanny <ido.hada...@gmail.com> wrote:
>
> data.fu has a nice implementation of HyperLogLog for estimating cardinality
> here
> <https://github.com/apache/incubator-datafu/blob/master/datafu-pig/src/main/java/datafu/pig/stats/HyperLogLogPlusPlus.java>
>
> However, it's implemented as Accumulator which means it will run only at
> the reducer and not in the combiner (but it will never load the entire set
> into memory as in normal EvalFunc). Why couldn't data.fu implement it as
> Algebraic - and fill the registers at every combiner, then merge and reduce
> the result? Am I missing something here?
> also available here:
> http://stackoverflow.com/questions/28908217/why-is-data-fu-implementing-hyperloglog-as-an-accumulator-and-not-as-algebraic
>
> thanks!
>
>
> --
> Sent from my androido