[
https://issues.apache.org/jira/browse/DATAFU-91?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14547249#comment-14547249
]
Matthew Hayes commented on DATAFU-91:
-------------------------------------
Can you open a review board to make it easier to comment? See:
https://reviews.apache.org/r/
Some comments:
* This still implements AccumulatorEvalFunc. Don't you want to use
AlgebraicEvalFunc? I don't think your new methods will be used otherwise.
* The indention isn't consistent. Take a look at the diff. Are you using tabs
or spaces?
* Once you implement AlgebraicEvalFunc I don't believe the accumulate method is
necessary anymore because we get a free implementation based off the algebraic
methods.
* Seems Initial's exec should just fail if it doesn't get any input instead of
using the final return statement.
* Shouldn't Initial just be a pass through that does nothing?
* Take a look at
datafu-hourglass/src/test/java/datafu/hourglass/demo/EstimateCardinality.java.
This is a Hourglass demo of the same thing you are trying to do and you can
probably base your implementation off of this. The estimator can be serialized
to a byte array and then merged with other estimators. I think your
Intermediate exec method should basically do this. It offers each element to
the estimator and then calls getBytes at the end. The bytes are passed to the
Final exec and merge into a final estimator, from which the cardinality can be
retrieved.
> pig version of HyperLogLog estimator should be Algebraic and use combiners
> --------------------------------------------------------------------------
>
> Key: DATAFU-91
> URL: https://issues.apache.org/jira/browse/DATAFU-91
> Project: DataFu
> Issue Type: Bug
> Affects Versions: 1.3.0
> Reporter: Ido Hadanny
> Assignee: Ido Hadanny
> Priority: Minor
> Fix For: 1.3.0
>
> Attachments: hyper-log-log-algebraic.diff
>
>
> Matt: I don't remember if there was a particular reason I didn't implement
> this as AlgebraicEvalFunc. It seems like it could be. I believe the Java
> MapReduce version leverages the combiner. If you want to try making this
> Algebraic we would be happy to accept a patch :)
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)