DataFu has a nice implementation of HyperLogLog for estimating cardinality: <https://github.com/apache/incubator-datafu/blob/master/datafu-pig/src/main/java/datafu/pig/stats/HyperLogLogPlusPlus.java>
However, it's implemented as an Accumulator, which means it runs only in the reducer and not in the combiner (although, unlike a plain EvalFunc, it never loads the entire set into memory). Why couldn't DataFu implement it as Algebraic, filling the registers in every combiner and then merging them to produce the final result? Am I missing something here?

Also posted here: http://stackoverflow.com/questions/28908217/why-is-data-fu-implementing-hyperloglog-as-an-accumulator-and-not-as-algebraic

Thanks!
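
P.S. To make the question concrete, here is a minimal sketch of the merge I have in mind. It is plain HyperLogLog in standalone Java, not DataFu's HyperLogLogPlusPlus code, and it is not wired to Pig's Algebraic interface. The class name, the precision (P = 12), and the SplitMix64-style hash are my own assumptions for illustration only.

```java
import java.util.Arrays;

// Illustrative only: a simplified HyperLogLog, NOT DataFu's implementation.
public class HllMergeSketch {
    static final int P = 12;        // index bits (assumed precision)
    static final int M = 1 << P;    // number of registers

    final byte[] registers = new byte[M];

    // SplitMix64 finalizer, used as a stand-in 64-bit hash (assumption).
    static long hash(long x) {
        x += 0x9E3779B97F4A7C15L;
        x = (x ^ (x >>> 30)) * 0xBF58476D1CE4E5B9L;
        x = (x ^ (x >>> 27)) * 0x94D049BB133111EBL;
        return x ^ (x >>> 31);
    }

    // What each combiner would do: fold one value into its local registers.
    void add(long value) {
        long h = hash(value);
        int idx = (int) (h >>> (64 - P));   // top P bits select the register
        long rest = h << P;                 // remaining bits, left-aligned
        int rank = Math.min(Long.numberOfLeadingZeros(rest), 64 - P) + 1;
        if (rank > registers[idx]) {
            registers[idx] = (byte) rank;
        }
    }

    // What the merge step would do: element-wise max of two register arrays.
    void merge(HllMergeSketch other) {
        for (int i = 0; i < M; i++) {
            registers[i] = (byte) Math.max(registers[i], other.registers[i]);
        }
    }

    // What the final step would do: turn registers into a cardinality estimate.
    double estimate() {
        double sum = 0.0;
        int zeros = 0;
        for (byte r : registers) {
            sum += Math.pow(2, -r);
            if (r == 0) zeros++;
        }
        double alpha = 0.7213 / (1 + 1.079 / M);
        double raw = alpha * M * M / sum;
        if (raw <= 2.5 * M && zeros > 0) {
            return M * Math.log((double) M / zeros);  // small-range correction
        }
        return raw;
    }

    // Helper: build a sketch from the values in [from, toExclusive).
    static HllMergeSketch ofRange(long from, long toExclusive) {
        HllMergeSketch s = new HllMergeSketch();
        for (long v = from; v < toExclusive; v++) s.add(v);
        return s;
    }

    public static void main(String[] args) {
        // Two overlapping "combiner" partitions and one single-pass sketch.
        HllMergeSketch partA = ofRange(0, 60_000);
        HllMergeSketch partB = ofRange(40_000, 100_000);
        HllMergeSketch whole = ofRange(0, 100_000);

        partA.merge(partB);
        System.out.println("registers identical: "
                + Arrays.equals(partA.registers, whole.registers));
        System.out.println("estimate: " + partA.estimate());
    }
}
```

Because the merge is just an element-wise max, it is associative and commutative, so partial register arrays can be combined in any order and end up identical to a single pass over all the data. That is why I would expect it to map onto Algebraic's getInitial / getIntermed / getFinal stages, unless there is a constraint in DataFu or Pig that I am not seeing.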