[ https://issues.apache.org/jira/browse/SPARK-15598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15303418#comment-15303418 ]

koert kuipers commented on SPARK-15598:
---------------------------------------

The reason I ask is that if you plan to do:
{noformat}
inputs.foldLeft(aggregator.init(inputs.head))(aggregator.reduce _)
{noformat}
then I think your implementation of reduce using an Aggregator will not work:
the fold visits every element of inputs, including the head that init already
consumed, so the first element gets added twice.
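
For reference, a minimal sketch of a fold that consumes the head exactly once
(Agg is an illustrative stand-in for the proposed contract, not Spark's actual
types, and inputs is assumed non-empty):
{code}
// Illustrative stand-in for the proposed init-based contract.
trait Agg[-IN, BUF] {
  def init(a: IN): BUF
  def reduce(b: BUF, a: IN): BUF
}

// Seed the buffer with init on the head, then fold only over the tail,
// so the first element is consumed exactly once. Assumes inputs.nonEmpty.
def foldAll[IN, BUF](inputs: Seq[IN], aggregator: Agg[IN, BUF]): BUF =
  inputs.tail.foldLeft(aggregator.init(inputs.head))(aggregator.reduce _)
{code}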

But looking at the code for TypedAggregateExpression and DeclarativeAggregate,
the alternative involves serious changes...


> Change Aggregator.zero to Aggregator.init
> -----------------------------------------
>
>                 Key: SPARK-15598
>                 URL: https://issues.apache.org/jira/browse/SPARK-15598
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Reynold Xin
>
> org.apache.spark.sql.expressions.Aggregator currently requires defining the
> zero value for an aggregator. This is a limitation that makes it difficult
> to implement APIs such as reduce. In reduce (or reduceByKey), the user
> specifies a single associative and commutative reduce function, and there
> is no definition of a zero value.
> A small tweak to the API is to change zero to init, taking an input, similar 
> to the following:
> {code}
> abstract class Aggregator[-IN, BUF, OUT] extends Serializable {
>   def init(a: IN): BUF
>   def reduce(b: BUF, a: IN): BUF
>   def merge(b1: BUF, b2: BUF): BUF
>   def finish(reduction: BUF): OUT
> }
> {code}
> Then reduce can be implemented using:
> {code}
> // given the user-supplied reduce function f: (T, T) => T
> new Aggregator[T, T, T] {
>   override def init(a: T): T = a
>   override def reduce(b: T, a: T): T = f(b, a)
>   override def merge(b1: T, b2: T): T = f(b1, b2)
>   override def finish(reduction: T): T = reduction
> }
> {code}
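
To make the semantics concrete, here is a minimal, self-contained sketch
(plain Scala over local collections; the run driver is illustrative, not
Spark's actual TypedAggregateExpression/DeclarativeAggregate evaluation path):
{code}
abstract class Aggregator[-IN, BUF, OUT] extends Serializable {
  def init(a: IN): BUF
  def reduce(b: BUF, a: IN): BUF
  def merge(b1: BUF, b2: BUF): BUF
  def finish(reduction: BUF): OUT
}

// reduce expressed against the proposed contract: the buffer is just T.
def reduceAggregator[T](f: (T, T) => T): Aggregator[T, T, T] =
  new Aggregator[T, T, T] {
    override def init(a: T): T = a
    override def reduce(b: T, a: T): T = f(b, a)
    override def merge(b1: T, b2: T): T = f(b1, b2)
    override def finish(reduction: T): T = reduction
  }

// Each non-empty partition seeds its buffer via init on its first element
// and folds the rest with reduce; partial buffers are then combined with merge.
def run[IN, BUF, OUT](partitions: Seq[Seq[IN]], agg: Aggregator[IN, BUF, OUT]): OUT = {
  val partials = partitions.filter(_.nonEmpty).map { p =>
    p.tail.foldLeft(agg.init(p.head))(agg.reduce _)
  }
  agg.finish(partials.reduce(agg.merge _))
}

// Example: run(Seq(Seq(1, 2, 3), Seq(4, 5, 6)), reduceAggregator[Int](_ + _)) == 21
{code}
Note that an init-based driver still has to decide what a fully empty input
should produce (partials.reduce above would throw), which is the case the
current zero-based API answers for free with finish(zero).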


