DataFu has a nice implementation of HyperLogLog for estimating cardinality
here:
<https://github.com/apache/incubator-datafu/blob/master/datafu-pig/src/main/java/datafu/pig/stats/HyperLogLogPlusPlus.java>

However, it's implemented as an Accumulator, which means it runs only in
the reducer and not in the combiner (although, unlike a plain EvalFunc, it
never loads the entire set into memory). Why couldn't DataFu implement it as
Algebraic, filling the registers in every combiner and then merging them in
the reducer to produce the result? Am I missing something here?
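
To make the idea concrete, here is a rough sketch of what I have in mind. It
is NOT DataFu's code and does not use the Pig API: the class name
HllMergeSketch, the stand-in hash function, and P = 12 are all my own
assumptions, and it is plain HyperLogLog rather than HyperLogLog++. It only
shows that registers filled on separate partitions can be merged with an
element-wise max, which is roughly what the getInitial/getIntermed/getFinal
stages of Pig's Algebraic interface would have to do.

```java
import java.util.Arrays;

public class HllMergeSketch {
    static final int P = 12;      // index bits (arbitrary choice for this sketch)
    static final int M = 1 << P;  // number of registers

    final byte[] registers = new byte[M];

    // SplitMix64 finalizer, used here only as a stand-in 64-bit hash.
    static long hash(long x) {
        x += 0x9E3779B97F4A7C15L;
        x = (x ^ (x >>> 30)) * 0xBF58476D1CE4E5B9L;
        x = (x ^ (x >>> 27)) * 0x94D049BB133111EBL;
        return x ^ (x >>> 31);
    }

    // "Initial" step: fold one item into the local registers.
    void add(long item) {
        long h = hash(item);
        int idx = (int) (h >>> (64 - P));  // top P bits select the register
        long rest = h << P;                // remaining bits, left-aligned
        // Rank = position of the first 1-bit in the remaining bits (1-based).
        int rank = Math.min(Long.numberOfLeadingZeros(rest), 64 - P) + 1;
        if (rank > registers[idx]) {
            registers[idx] = (byte) rank;
        }
    }

    // "Intermediate"/"final" step: merge partial registers by element-wise max.
    void merge(HllMergeSketch other) {
        for (int i = 0; i < M; i++) {
            if (other.registers[i] > registers[i]) {
                registers[i] = other.registers[i];
            }
        }
    }

    // Raw HyperLogLog estimate with the small-range (linear counting) correction.
    double estimate() {
        double sum = 0.0;
        int zeros = 0;
        for (byte r : registers) {
            sum += Math.scalb(1.0, -r);  // 2^(-r)
            if (r == 0) {
                zeros++;
            }
        }
        double alpha = 0.7213 / (1.0 + 1.079 / M);
        double raw = alpha * M * (double) M / sum;
        if (raw <= 2.5 * M && zeros > 0) {
            return M * Math.log((double) M / zeros);
        }
        return raw;
    }

    public static void main(String[] args) {
        int n = 100_000;
        HllMergeSketch whole = new HllMergeSketch();
        HllMergeSketch partA = new HllMergeSketch();  // e.g. one combiner
        HllMergeSketch partB = new HllMergeSketch();  // e.g. another combiner
        for (long i = 0; i < n; i++) {
            whole.add(i);
            (i % 2 == 0 ? partA : partB).add(i);
        }
        partA.merge(partB);  // what the reducer would do
        System.out.println("registers identical: "
                + Arrays.equals(whole.registers, partA.registers));
        System.out.println("estimate: " + Math.round(partA.estimate()));
    }
}
```

Since max is associative and commutative, the merged registers should come
out the same no matter how the input is split across combiners, which is why
I would have expected an Algebraic version to be possible.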

Also asked here:
http://stackoverflow.com/questions/28908217/why-is-data-fu-implementing-hyperloglog-as-an-accumulator-and-not-as-algebraic

Thanks!


