[ 
https://issues.apache.org/jira/browse/DATAFU-91?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14581699#comment-14581699
 ] 

Jan Willem commented on DATAFU-91:
----------------------------------

I am under the impression that the type output by both initial and intermediate 
should be the same. Looking at how AVG is implemented 
(http://grepcode.com/file/repo1.maven.org/maven2/org.apache.pig/pig/0.8.0/org/apache/pig/builtin/AVG.java),
 or the udf manual on the wiki (https://wiki.apache.org/pig/UDFManual search 
for algebraic), they return a tuple containing the combined information.

I'm probably just obsessing about the type cast, but you could get around it by 
using a tagged union: a tuple of two, with the first element indicating whether 
it's a Long in the second field, or rather a serialized HyperLogLogPlus. So: 
(1, <longvalue>) or (2, <HyperLogLogPlus>). You could also go for the first 
value containing the number of items, and as a special case have the 1 case 
contain just a long value: (1, <longvalue>) or (<value larger than 1>, 
<HyperLogLogValue>).

It would only add a little to the size, and get rid of the instanceof.

It's just a matter of opinion, I guess.

> pig version of HyperLogLog estimator should be Algebraic and use combiners
> --------------------------------------------------------------------------
>
>                 Key: DATAFU-91
>                 URL: https://issues.apache.org/jira/browse/DATAFU-91
>             Project: DataFu
>          Issue Type: Bug
>    Affects Versions: 1.3.0
>            Reporter: Ido Hadanny
>            Assignee: Ido Hadanny
>            Priority: Minor
>             Fix For: 1.3.0
>
>         Attachments: hyper-log-log-algebraic-3.diff, 
> hyper-log-log-algebraic.diff, hyper-log-log-algebraic.diff
>
>
> Matt: I don't remember if there was a particular reason I didn't implement 
> this as AlgebraicEvalFunc. It seems like it could be. I believe the Java 
> MapReduce version leverages the combiner. If you want to try making this 
> Algebraic we would be happy to accept a patch :)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to