[ https://issues.apache.org/jira/browse/SPARK-15598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15303374#comment-15303374 ]
Apache Spark commented on SPARK-15598:
--------------------------------------

User 'techaddict' has created a pull request for this issue:
https://github.com/apache/spark/pull/13347

> Change Aggregator.zero to Aggregator.init
> -----------------------------------------
>
>                 Key: SPARK-15598
>                 URL: https://issues.apache.org/jira/browse/SPARK-15598
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Reynold Xin
>
> org.apache.spark.sql.expressions.Aggregator currently requires defining a
> zero value for the aggregator. This is a limitation that makes it difficult
> to implement APIs such as reduce: in reduce (or reduceByKey), the user
> supplies a single associative and commutative reduce function, and there is
> no notion of a zero value.
> A small tweak to the API is to change zero to init, which takes an input,
> along the lines of the following:
> {code}
> abstract class Aggregator[-IN, BUF, OUT] extends Serializable {
>   def init(a: IN): BUF
>   def reduce(b: BUF, a: IN): BUF
>   def merge(b1: BUF, b2: BUF): BUF
>   def finish(reduction: BUF): OUT
> }
> {code}
> Then, given a reduce function f: (T, T) => T, reduce can be implemented as:
> {code}
> new Aggregator[T, T, T] {
>   override def init(a: T): T = a
>   override def reduce(b: T, a: T): T = f(b, a)
>   override def merge(b1: T, b2: T): T = f(b1, b2)
>   override def finish(reduction: T): T = reduction
> }
> {code}
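
For illustration, here is a minimal, self-contained Scala sketch of the proposal. The Aggregator class is copied from the description above; SumAggregator and the hand-driven partition loop are hypothetical, and assume the engine seeds each partition's buffer from its first input element via init.

{code}
// Proposed API, copied from the description above (not in Spark yet).
abstract class Aggregator[-IN, BUF, OUT] extends Serializable {
  def init(a: IN): BUF
  def reduce(b: BUF, a: IN): BUF
  def merge(b1: BUF, b2: BUF): BUF
  def finish(reduction: BUF): OUT
}

// Hypothetical sum aggregator: no zero value is needed, because the
// buffer is seeded from the first input element via init.
object SumAggregator extends Aggregator[Int, Int, Int] {
  override def init(a: Int): Int = a
  override def reduce(b: Int, a: Int): Int = b + a
  override def merge(b1: Int, b2: Int): Int = b1 + b2
  override def finish(reduction: Int): Int = reduction
}

object Demo extends App {
  // Drive the aggregator by hand over local "partitions", the way an
  // execution engine might. Partitions are assumed non-empty: without a
  // zero value there is nothing to return for an empty partition.
  val partitions = Seq(Seq(1, 2, 3), Seq(4, 5))
  val perPartition = partitions.map { p =>
    p.tail.foldLeft(SumAggregator.init(p.head))(SumAggregator.reduce)
  }
  val merged = perPartition.reduce(SumAggregator.merge)
  println(SumAggregator.finish(merged)) // prints 15
}
{code}

One consequence worth noting: dropping zero means an empty partition can no longer produce a buffer, so an engine adopting this API would have to skip empty partitions (or handle them specially) before merge.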