[ https://issues.apache.org/jira/browse/DATAFU-91?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14581699#comment-14581699 ]
Jan Willem commented on DATAFU-91: ---------------------------------- I am under the impression that the type output by both initial and intermediate should be the same. Looking at how AVG is implemented (http://grepcode.com/file/repo1.maven.org/maven2/org.apache.pig/pig/0.8.0/org/apache/pig/builtin/AVG.java), or the udf manual on the wiki (https://wiki.apache.org/pig/UDFManual search for algebraic), they return a tuple containing the combined information. I'm probably just obsessing about the type cast, but you could get around it by using a tagged union: a tuple of two, with the first element indicating whether it's a Long in the second field, or rather a serialized HyperLogLogPlus. So: (1, <longvalue>) or (2, <HyperLogLogPlus>). You could also go for the first value containing the number of items, and as a special case have the 1 case contain just a long value: (1, <longvalue>) or (<value larger than 1>, <HyperLogLogValue>). It would only add a little to the size, and get rid of the instanceof. It's just a matter of opinion, I guess. > pig version of HyperLogLog estimator should be Algebraic and use combiners > -------------------------------------------------------------------------- > > Key: DATAFU-91 > URL: https://issues.apache.org/jira/browse/DATAFU-91 > Project: DataFu > Issue Type: Bug > Affects Versions: 1.3.0 > Reporter: Ido Hadanny > Assignee: Ido Hadanny > Priority: Minor > Fix For: 1.3.0 > > Attachments: hyper-log-log-algebraic-3.diff, > hyper-log-log-algebraic.diff, hyper-log-log-algebraic.diff > > > Matt: I don't remember if there was a particular reason I didn't implement > this as AlgebraicEvalFunc. It seems like it could be. I believe the Java > MapReduce version leverages the combiner. If you want to try making this > Algebraic we would be happy to accept a patch :) -- This message was sent by Atlassian JIRA (v6.3.4#6332)