GitHub user clockfly opened a pull request: https://github.com/apache/spark/pull/14753
[SPARK-17187][SQL] Supports using arbitrary Java object as internal aggregation buffer object ## What changes were proposed in this pull request? This PR introduces an abstract class `TypedImperativeAggregate` so that an aggregation function of TypedImperativeAggregate can use **arbitrary** user-defined Java object as intermediate aggregation buffer object. **This has advantages like:** 1. It now can support larger category of aggregation functions. For example, it will be much easier to implement aggregation function `percentile_approx`, which has a complex aggregation buffer definition. 2. It can be used to avoid doing serialization/de-serialization for every call of `update` or `merge` when converting domain specific aggregation object to internal Spark-Sql storage format. 3. It is easier to integrate with other existing monoid libraries like algebird, and supports more aggregation functions with high performance. Please see Java doc of `TypedImperativeAggregate` and Jira ticket SPARK-17187 for more information. ## How was this patch tested? Unit tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/clockfly/spark object_aggregation_buffer_try_2 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14753.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14753 ---- commit 6efddadcb8e6d48e9898a8980f4dcceee4894ebc Author: Sean Zhong <seanzh...@databricks.com> Date: 2016-08-19T16:34:56Z object aggregation buffer ---- --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org