[ 
https://issues.apache.org/jira/browse/SPARK-18728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15723866#comment-15723866
 ] 

Alex Levenson commented on SPARK-18728:
---------------------------------------

I think the main selling points of Algebird aggregators are:

1) They are composable: you can combine a Min aggregator with a Max aggregator 
to get one that computes both Min and Max in 1 pass. As [~mashraf] points out, 
you can compose repeatedly to get many aggregations in 1 pass.

2) They can opt in to efficient addition. They use algebird's Semigroup, which 
has both plus(a, b) for adding 2 items and sumOption(iter: TraversableOnce[T]) 
for adding N items. This allows efficient bulk addition without exposing a 
mutable API (sumOption can use mutation internally, but it has to be 
referentially transparent).

3) There are many ready-made implementations of Aggregator in algebird, both 
for common types and for probabilistic data structures.
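
To make points 1 and 2 concrete, here is a rough, self-contained sketch. The 
Semigroup and Aggregator traits below are simplified stand-ins written for 
this example, not the real algebird types (the real sumOption takes a 
TraversableOnce, and the real API has more to it). The names minInt, maxInt, 
concat and Demo are made up for illustration.

```scala
// Simplified stand-in for algebird's Semigroup (an assumption, not the library type).
trait Semigroup[T] {
  def plus(a: T, b: T): T
  // Default bulk addition folds with plus; implementations may override it
  // with something faster as long as the result is the same.
  def sumOption(items: Iterator[T]): Option[T] = items.reduceOption(plus)
}

// Simplified stand-in for algebird's Aggregator: prepare, combine, present.
trait Aggregator[A, B, C] { self =>
  def prepare(a: A): B
  def semigroup: Semigroup[B]
  def present(b: B): C

  // Runs the aggregation over the input in a single traversal.
  def apply(items: Iterable[A]): C =
    present(
      semigroup
        .sumOption(items.iterator.map(prepare))
        .getOrElse(throw new UnsupportedOperationException("empty input"))
    )

  // Composition: both aggregators see each element once, results are paired.
  def join[B2, C2](that: Aggregator[A, B2, C2]): Aggregator[A, (B, B2), (C, C2)] =
    new Aggregator[A, (B, B2), (C, C2)] {
      def prepare(a: A): (B, B2) = (self.prepare(a), that.prepare(a))
      val semigroup: Semigroup[(B, B2)] = new Semigroup[(B, B2)] {
        def plus(x: (B, B2), y: (B, B2)): (B, B2) =
          (self.semigroup.plus(x._1, y._1), that.semigroup.plus(x._2, y._2))
      }
      def present(b: (B, B2)): (C, C2) = (self.present(b._1), that.present(b._2))
    }
}

object Demo {
  // Hypothetical helper: builds an Int aggregator from a binary operation.
  private def intAgg(op: (Int, Int) => Int): Aggregator[Int, Int, Int] =
    new Aggregator[Int, Int, Int] {
      def prepare(a: Int): Int = a
      val semigroup: Semigroup[Int] = new Semigroup[Int] {
        def plus(a: Int, b: Int): Int = op(a, b)
      }
      def present(b: Int): Int = b
    }

  val minInt: Aggregator[Int, Int, Int] = intAgg((a, b) => math.min(a, b))
  val maxInt: Aggregator[Int, Int, Int] = intAgg((a, b) => math.max(a, b))

  // Point 1: Min and Max computed together in one pass.
  val minMax: Aggregator[Int, (Int, Int), (Int, Int)] = minInt.join(maxInt)

  // Point 2: sumOption uses a mutable StringBuilder internally, but callers
  // only ever see an immutable result, so it stays referentially transparent.
  val concat: Semigroup[String] = new Semigroup[String] {
    def plus(a: String, b: String): String = a + b
    override def sumOption(items: Iterator[String]): Option[String] =
      if (!items.hasNext) None
      else {
        val sb = new StringBuilder
        items.foreach(s => sb.append(s))
        Some(sb.toString)
      }
  }
}
```

With these definitions, Demo.minMax(List(3, 1, 4, 1, 5)) returns (1, 5), and 
Demo.concat.sumOption(Iterator("a", "b", "c")) returns Some("abc") without 
building the intermediate strings that repeated plus calls would.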

> Consider using Algebird's Aggregator instead of 
> org.apache.spark.sql.expressions.Aggregator
> -------------------------------------------------------------------------------------------
>
>                 Key: SPARK-18728
>                 URL: https://issues.apache.org/jira/browse/SPARK-18728
>             Project: Spark
>          Issue Type: Bug
>            Reporter: Alex Levenson
>            Priority: Minor
>
> Mansur (https://twitter.com/mansur_ashraf) pointed out this comment in 
> spark's Aggregator here:
> "Based loosely on Aggregator from Algebird: 
> https://github.com/twitter/algebird"
> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/expressions/Aggregator.scala#L46
> Which got a few of us wondering, given that this API is still experimental, 
> would you consider using algebird's Aggregator API directly instead?
> The algebird API is not coupled with any implementation details, and 
> shouldn't have any extra dependencies.
> Are there any blockers to doing that?
> Thanks!
> Alex


