[ https://issues.apache.org/jira/browse/DATAFU-91?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14547249#comment-14547249 ]
Matthew Hayes commented on DATAFU-91: ------------------------------------- Can you open a review board to make it easier to comment? See: https://reviews.apache.org/r/ Some comments: * This still implements AccumulatorEvalFunc. Don't you want to use AlgebraicEvalFunc? I don't think your new methods will be used otherwise. * The indention isn't consistent. Take a look at the diff. Are you using tabs or spaces? * Once you implement AlgebraicEvalFunc I don't believe the accumulate method is necessary anymore because we get a free implementation based off the algebraic methods. * Seems Initial's exec should just fail if it doesn't get any input instead of using the final return statement. * Shouldn't Initial just be a pass through that does nothing? * Take a look at datafu-hourglass/src/test/java/datafu/hourglass/demo/EstimateCardinality.java. This is a Hourglass demo of the same thing you are trying to do and you can probably base your implementation off of this. The estimator can be serialized to a byte array and then merged with other estimators. I think your Intermediate exec method should basically do this. It offers each element to the estimator and then calls getBytes at the end. The bytes are passed to the Final exec and merge into a final estimator, from which the cardinality can be retrieved. > pig version of HyperLogLog estimator should be Algebraic and use combiners > -------------------------------------------------------------------------- > > Key: DATAFU-91 > URL: https://issues.apache.org/jira/browse/DATAFU-91 > Project: DataFu > Issue Type: Bug > Affects Versions: 1.3.0 > Reporter: Ido Hadanny > Assignee: Ido Hadanny > Priority: Minor > Fix For: 1.3.0 > > Attachments: hyper-log-log-algebraic.diff > > > Matt: I don't remember if there was a particular reason I didn't implement > this as AlgebraicEvalFunc. It seems like it could be. I believe the Java > MapReduce version leverages the combiner. If you want to try making this > Algebraic we would be happy to accept a patch :) -- This message was sent by Atlassian JIRA (v6.3.4#6332)