I don't remember if there was a particular reason I didn't implement this as
an AlgebraicEvalFunc. It seems like it could be one. I believe the Java
MapReduce version leverages the combiner. If you want to try making this
Algebraic, we would be happy to accept a patch :)
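
For anyone picking this up, here is a rough sketch of why the Algebraic
route should work: HyperLogLog registers merge by element-wise max, so
partial sketches built on the map side can be combined in any order. This is
plain Java with no Pig dependency. The class name, the precision P, and the
hash function are assumptions made for illustration, and the estimator is the
classic HyperLogLog formula, not the HyperLogLog++ logic in DataFu. The three
methods only mirror the Initial / Intermediate / Final stages that Pig's
Algebraic interface expects.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical, illustration-only class; not part of DataFu or Pig.
class HllAlgebraicSketch {
    static final int P = 10;      // assumed precision: number of index bits
    static final int M = 1 << P;  // number of registers

    // 64-bit hash: FNV-1a plus a mixing finalizer (an assumption, not DataFu's hash).
    public static long hash(String s) {
        long h = 0xcbf29ce484222325L;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        h ^= h >>> 33; h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33; h *= 0xc4ceb9fe1a85ec53L;
        h ^= h >>> 33;
        return h;
    }

    // "Initial" stage (map side): fold raw values into a register array.
    public static byte[] initial(List<String> values) {
        byte[] regs = new byte[M];
        for (String v : values) {
            long h = hash(v);
            int idx = (int) (h >>> (64 - P));  // top P bits pick the register
            long rest = h << P;                // remaining bits give the rank
            int rank = Math.min(Long.numberOfLeadingZeros(rest) + 1, 64 - P + 1);
            if (rank > regs[idx]) regs[idx] = (byte) rank;
        }
        return regs;
    }

    // "Intermediate" stage (combiner): merge partial sketches by element-wise max.
    public static byte[] intermed(List<byte[]> partials) {
        byte[] out = new byte[M];
        for (byte[] p : partials)
            for (int i = 0; i < M; i++)
                if (p[i] > out[i]) out[i] = p[i];
        return out;
    }

    // "Final" stage (reducer): merge what is left, then estimate cardinality.
    public static long finalEstimate(List<byte[]> partials) {
        byte[] regs = intermed(partials);
        double sum = 0;
        int zeros = 0;
        for (byte r : regs) {
            sum += Math.pow(2, -r);
            if (r == 0) zeros++;
        }
        double alpha = 0.7213 / (1 + 1.079 / M);  // bias constant for M >= 128
        double raw = alpha * M * M / sum;
        // Small-range correction: fall back to linear counting.
        if (raw <= 2.5 * M && zeros > 0) raw = M * Math.log((double) M / zeros);
        return Math.round(raw);
    }
}
```

The property that matters is that intermed() over the sketches of two input
splits yields the same registers as initial() over the combined input, which
is what would let the combiner do most of the work before the reducer.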

-Matt

> On Mar 7, 2015, at 12:11 PM, Ido Hadanny <ido.hada...@gmail.com> wrote:
> 
> DataFu has a nice implementation of HyperLogLog for estimating cardinality
> here:
> <https://github.com/apache/incubator-datafu/blob/master/datafu-pig/src/main/java/datafu/pig/stats/HyperLogLogPlusPlus.java>
> 
> However, it's implemented as an Accumulator, which means it will run only
> in the reducer and not in the combiner (although, unlike a normal EvalFunc,
> it never loads the entire set into memory). Why couldn't DataFu implement
> it as Algebraic, filling the registers in every combiner and then merging
> and reducing the result? Am I missing something here?
> The question is also posted here:
> http://stackoverflow.com/questions/28908217/why-is-data-fu-implementing-hyperloglog-as-an-accumulator-and-not-as-algebraic
> 
> thanks!
> 
> 
> -- 
> Sent from my androido
