[ https://issues.apache.org/jira/browse/DATAFU-91?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14580301#comment-14580301 ]

Jan Willem commented on DATAFU-91:
----------------------------------
Hi [~matterhayes], I agree that a long is much smaller than a HyperLogLogPlus.
But I'm wondering whether deviating from the specification is worthwhile, or a
premature optimisation. Did you by any chance measure the difference in
performance?

If it were my own code, I would throw out the accumulator implementation, just
like you did, but the documentation says you shouldn't :-(

Having two implementations means extra maintenance and more chance of bugs, and
it is unclear to users which implementation is actually used. So you can be the
judge...

> pig version of HyperLogLog estimator should be Algebraic and use combiners
> --------------------------------------------------------------------------
>
> Key: DATAFU-91
> URL: https://issues.apache.org/jira/browse/DATAFU-91
> Project: DataFu
> Issue Type: Bug
> Affects Versions: 1.3.0
> Reporter: Ido Hadanny
> Assignee: Ido Hadanny
> Priority: Minor
> Fix For: 1.3.0
>
> Attachments: hyper-log-log-algebraic-3.diff,
> hyper-log-log-algebraic.diff, hyper-log-log-algebraic.diff
>
>
> Matt: I don't remember if there was a particular reason I didn't implement
> this as AlgebraicEvalFunc. It seems like it could be. I believe the Java
> MapReduce version leverages the combiner. If you want to try making this
> Algebraic we would be happy to accept a patch :)