[ 
https://issues.apache.org/jira/browse/DATAFU-91?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14547249#comment-14547249
 ] 

Matthew Hayes commented on DATAFU-91:
-------------------------------------

Can you open a review board to make it easier to comment?  See: 
https://reviews.apache.org/r/

Some comments:

* This still implements AccumulatorEvalFunc.  Don't you want to use 
AlgebraicEvalFunc?  I don't think your new methods will be used otherwise.
* The indention isn't consistent.  Take a look at the diff.  Are you using tabs 
or spaces?
* Once you implement AlgebraicEvalFunc I don't believe the accumulate method is 
necessary anymore because we get a free implementation based off the algebraic 
methods.
* Seems Initial's exec should just fail if it doesn't get any input instead of 
using the final return statement.
* Shouldn't Initial just be a pass through that does nothing?
* Take a look at 
datafu-hourglass/src/test/java/datafu/hourglass/demo/EstimateCardinality.java.  
This is a Hourglass demo of the same thing you are trying to do and you can 
probably base your implementation off of this.  The estimator can be serialized 
to a byte array and then merged with other estimators.  I think your 
Intermediate exec method should basically do this.  It offers each element to 
the estimator and then calls getBytes at the end.  The bytes are passed to the 
Final exec and merge into a final estimator, from which the cardinality can be 
retrieved.

> pig version of HyperLogLog estimator should be Algebraic and use combiners
> --------------------------------------------------------------------------
>
>                 Key: DATAFU-91
>                 URL: https://issues.apache.org/jira/browse/DATAFU-91
>             Project: DataFu
>          Issue Type: Bug
>    Affects Versions: 1.3.0
>            Reporter: Ido Hadanny
>            Assignee: Ido Hadanny
>            Priority: Minor
>             Fix For: 1.3.0
>
>         Attachments: hyper-log-log-algebraic.diff
>
>
> Matt: I don't remember if there was a particular reason I didn't implement 
> this as AlgebraicEvalFunc. It seems like it could be. I believe the Java 
> MapReduce version leverages the combiner. If you want to try making this 
> Algebraic we would be happy to accept a patch :)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to