https://issues.apache.org/jira/browse/DATAFU-91
On 27 April 2015 at 18:02, Matthew Hayes <matthew.terence.ha...@gmail.com> wrote:

> Great thanks :) Please file a JIRA and attach the patch there.
>
> -Matt
>
> On Apr 27, 2015, at 6:26 AM, Ido Hadanny <ido.hada...@gmail.com> wrote:
>
> Hey guys,
> patch is attached + tested on unit-tests + We're testing it on a
> 1000-nodes real hadoop cluster as we speak.
> Do you want us to create a jira issue for this, or is this good enough?
> Thanks, Ilia and Ido
>
> On 7 March 2015 at 23:09, Matthew Hayes <matthew.terence.ha...@gmail.com>
> wrote:
>
>> I don't remember if there was a particular reason I didn't implement this
>> as AlgebraicEvalFunc. It seems like it could be. I believe the Java
>> MapReduce version leverages the combiner. If you want to try making this
>> Algebraic we would be happy to accept a patch :)
>>
>> -Matt
>>
>> > On Mar 7, 2015, at 12:11 PM, Ido Hadanny <ido.hada...@gmail.com> wrote:
>> >
>> > data.fu has a nice implementation of HyperLogLog for estimating cardinality
>> > here
>> > <https://github.com/apache/incubator-datafu/blob/master/datafu-pig/src/main/java/datafu/pig/stats/HyperLogLogPlusPlus.java>
>> >
>> > However, it's implemented as Accumulator which means it will run only at
>> > the reducer and not in the combiner (but it will never load the entire set
>> > into memory as in normal EvalFunc). Why couldn't data.fu implement it as
>> > Algebraic - and fill the registers at every combiner, then merge and reduce
>> > the result? Am I missing something here?
>> > also available here:
>> > http://stackoverflow.com/questions/28908217/why-is-data-fu-implementing-hyperloglog-as-an-accumulator-and-not-as-algebraic
>> >
>> > thanks!
>> >
>> >
>> > --
>> > Sent from my androido
>
> --
> Sent from my androido
>
> <hyper-log-log-algebraic.diff>

--
Sent from my androido
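
For readers of the archive, here is a minimal sketch of why the "fill the registers at every combiner, then merge" idea from the original question can work. It is not the DataFu code or the attached patch, and it uses plain HyperLogLog rather than HyperLogLog++. The names `initial`, `merge`, and `estimate`, the SHA-1 based hash, and the choice of 12 index bits are all assumptions made for illustration; they only loosely mirror the initial / intermediate / final stages of an Algebraic UDF.

```python
import hashlib
import math

P = 12          # number of index bits (assumed for illustration)
M = 1 << P      # number of registers


def _hash64(item):
    """Deterministic 64-bit hash of an item (SHA-1 is an arbitrary choice here)."""
    digest = hashlib.sha1(str(item).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def initial(items):
    """Combiner-side step: fill a register array from one partition of the input."""
    registers = [0] * M
    for item in items:
        h = _hash64(item)
        idx = h >> (64 - P)                       # top P bits select the register
        rest = h & ((1 << (64 - P)) - 1)          # remaining 64 - P bits
        rank = (64 - P) - rest.bit_length() + 1   # position of the leftmost 1-bit
        registers[idx] = max(registers[idx], rank)
    return registers


def merge(a, b):
    """Intermediate step: combine two partial sketches by element-wise max."""
    return [max(x, y) for x, y in zip(a, b)]


def estimate(registers):
    """Final step: standard HyperLogLog estimate with small-range correction."""
    alpha = 0.7213 / (1 + 1.079 / M)
    raw = alpha * M * M / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if raw <= 2.5 * M and zeros:
        return M * math.log(M / zeros)            # linear counting for small sets
    return raw
```

Because each register only ever keeps a maximum, merging the sketches of two partitions gives exactly the registers that a single pass over all the data would have produced. That associativity is the property a combiner needs, so the merge order does not matter.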