[GitHub] spark pull request #14753: [SPARK-17187][SQL] Supports using arbitrary Java ...

clockfly Mon, 22 Aug 2016 08:49:29 -0700

GitHub user clockfly opened a pull request:

    https://github.com/apache/spark/pull/14753


    [SPARK-17187][SQL] Supports using arbitrary Java object as internal 
aggregation buffer object

    ## What changes were proposed in this pull request?
    
    This PR introduces an abstract class `TypedImperativeAggregate` so that an 
aggregation function extending `TypedImperativeAggregate` can use an **arbitrary** 
user-defined Java object as its intermediate aggregation buffer.
    
    **This has the following advantages:**
    1. It supports a larger category of aggregation functions. For 
example, it becomes much easier to implement the aggregation function 
`percentile_approx`, which has a complex aggregation buffer definition.
    2. It avoids serialization/deserialization on every call to `update` 
or `merge` when converting a domain-specific aggregation object to the 
internal Spark SQL storage format.
    3. It makes it easier to integrate with existing monoid libraries such as 
Algebird, and to support more aggregation functions with high performance.
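
The idea behind these points can be sketched in plain Java. This is only an illustration under stated assumptions, not the actual `TypedImperativeAggregate` API: the names `ObjectAggregate`, `MedianAggregate`, and `ObjectBufferDemo` are hypothetical, and the method set is a guess at the general shape of such an interface. The buffer is an ordinary Java object (`List<Double>`), `update` and `merge` operate on it directly, and conversion to bytes happens only when a buffer has to be shipped to another partition.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical driver: two partitions aggregate locally, then one buffer is shipped and merged.
class ObjectBufferDemo {
    public static void main(String[] args) {
        MedianAggregate agg = new MedianAggregate();

        List<Double> left = agg.createBuffer();
        agg.update(left, 1.0);
        agg.update(left, 5.0);

        List<Double> right = agg.createBuffer();
        agg.update(right, 3.0);

        // Serialization happens once per shipped buffer, not once per input row.
        byte[] shipped = agg.serialize(right);
        agg.merge(left, agg.deserialize(shipped));

        System.out.println(agg.eval(left)); // prints 3.0
    }
}

// Hypothetical interface (an assumption, not Spark's API): BUF may be any Java object.
interface ObjectAggregate<IN, BUF, OUT> {
    BUF createBuffer();
    void update(BUF buffer, IN input);   // mutate the buffer in place
    void merge(BUF buffer, BUF other);   // fold another partial buffer into this one
    OUT eval(BUF buffer);                // produce the final result
    byte[] serialize(BUF buffer);        // used only when a buffer crosses a boundary
    BUF deserialize(byte[] bytes);
}

// Exact lower median; its buffer is a growable list, which a fixed-width row cannot hold directly.
final class MedianAggregate implements ObjectAggregate<Double, List<Double>, Double> {
    public List<Double> createBuffer() {
        return new ArrayList<>();
    }

    public void update(List<Double> buffer, Double input) {
        buffer.add(input);
    }

    public void merge(List<Double> buffer, List<Double> other) {
        buffer.addAll(other);
    }

    public Double eval(List<Double> buffer) {
        if (buffer.isEmpty()) {
            return Double.NaN;
        }
        List<Double> sorted = new ArrayList<>(buffer);
        Collections.sort(sorted);
        // Lower median: the middle element for odd sizes, the lower of the two middles for even sizes.
        return sorted.get((sorted.size() - 1) / 2);
    }

    public byte[] serialize(List<Double> buffer) {
        ByteBuffer out = ByteBuffer.allocate(Double.BYTES * buffer.size());
        for (double value : buffer) {
            out.putDouble(value);
        }
        return out.array();
    }

    public List<Double> deserialize(byte[] bytes) {
        ByteBuffer in = ByteBuffer.wrap(bytes);
        List<Double> buffer = new ArrayList<>();
        while (in.remaining() >= Double.BYTES) {
            buffer.add(in.getDouble());
        }
        return buffer;
    }
}
```

In this sketch `merge` is an associative combine and `createBuffer` supplies its identity, which is the same shape a monoid library expects, so a library-provided monoid could in principle stand in for the hand-written buffer logic.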
    
    Please see the Javadoc of `TypedImperativeAggregate` and JIRA ticket 
SPARK-17187 for more information.
    
    ## How was this patch tested?
    
    Unit tests.
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/clockfly/spark object_aggregation_buffer_try_2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14753.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #14753
    
----
commit 6efddadcb8e6d48e9898a8980f4dcceee4894ebc
Author: Sean Zhong <seanzh...@databricks.com>
Date:   2016-08-19T16:34:56Z

    object aggregation buffer

----

