[GitHub] spark pull request #19517: [SPARK-20396][SQL][PySpark][FOLLOW-UP] groupby()....

ueshin Tue, 17 Oct 2017 07:29:36 -0700

GitHub user ueshin opened a pull request:

    https://github.com/apache/spark/pull/19517


    [SPARK-20396][SQL][PySpark][FOLLOW-UP] groupby().apply() with pandas udf

    ## What changes were proposed in this pull request?
    
    This is a follow-up of #18732.
    This PR modifies the `GroupedData.apply()` method to convert a pandas UDF
    to a grouped UDF implicitly.
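    The implicit conversion can be pictured with a plain-Python sketch. It
    is not PySpark's implementation: `UdfType`, `Udf`, and `apply_grouped`
    are hypothetical names (loosely echoing the `PythonUdfType` mentioned in
    the commit log), and lists of dicts stand in for the pandas DataFrames
    that a real grouped pandas UDF receives and returns. The point is only
    that `apply()` re-tags a plain pandas UDF as a grouped one instead of
    requiring a separate decorator, and rejects non-vectorized UDFs.

```python
from enum import Enum
from itertools import groupby
from operator import itemgetter
from typing import Callable, List


class UdfType(Enum):
    """Hypothetical eval-type enum (illustrative names and values only)."""
    BATCHED = 0         # ordinary row-at-a-time Python UDF
    PANDAS = 1          # vectorized (pandas) UDF
    PANDAS_GROUPED = 2  # vectorized UDF applied to one whole group


class Udf:
    """Hypothetical UDF wrapper: a function plus its eval type."""
    def __init__(self, func: Callable, udf_type: UdfType):
        self.func = func
        self.udf_type = udf_type


def apply_grouped(rows: List[dict], key: str, udf: Udf) -> List[dict]:
    """Sketch of groupby(key).apply(udf) with implicit conversion."""
    if udf.udf_type == UdfType.PANDAS:
        # Implicitly treat a plain pandas UDF as a grouped one.
        udf = Udf(udf.func, UdfType.PANDAS_GROUPED)
    elif udf.udf_type != UdfType.PANDAS_GROUPED:
        raise ValueError("apply() requires a pandas (vectorized) UDF")
    result = []
    # sorted() is stable, so row order inside each group is preserved.
    ordered = sorted(rows, key=itemgetter(key))
    for _, group in groupby(ordered, key=itemgetter(key)):
        # The grouped function receives a single argument: the whole group.
        result.extend(udf.func(list(group)))
    return result


def subtract_mean(group: List[dict]) -> List[dict]:
    """Example grouped function: center 'v' within each group."""
    mean = sum(r["v"] for r in group) / len(group)
    return [{"id": r["id"], "v": r["v"] - mean} for r in group]


rows = [{"id": 1, "v": 1.0}, {"id": 1, "v": 3.0}, {"id": 2, "v": 5.0}]
out = apply_grouped(rows, "id", Udf(subtract_mean, UdfType.PANDAS))
# out == [{"id": 1, "v": -1.0}, {"id": 1, "v": 1.0}, {"id": 2, "v": 0.0}]
```

    Converting inside `apply()` keeps the user-facing surface to a single
    pandas UDF kind; the grouped semantics come from the call site.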
    
    ## How was this patch tested?
    
    Existing tests.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ueshin/apache-spark issues/SPARK-20396/fup2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19517.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19517
    
----
commit 4d2bd959e1eeabb4f72cfbb52a374ce721030507
Author: Takuya UESHIN <ues...@databricks.com>
Date:   2017-10-16T06:45:55Z

    Introduce `@pandas_grouped_udf` decorator for grouped vectorized UDF.

commit f0968702038e11c9c9a8f305c61f72d3f9e00f9a
Author: Takuya UESHIN <ues...@databricks.com>
Date:   2017-10-16T08:03:30Z

    Use PythonUdfType instead of vectorized and grouped.

commit 639af2cee77456271d5f2f536d4712ab8e01a89d
Author: Takuya UESHIN <ues...@databricks.com>
Date:   2017-10-16T13:42:58Z

    Update an error message.

commit 10512a64a9560eee6d3f65802abd042dedf0cafb
Author: Takuya UESHIN <ues...@databricks.com>
Date:   2017-10-16T13:43:51Z

    Add a test to use data type string.

commit 789e642763ab4f59e14137fcc75b514223bc7aae
Author: Takuya UESHIN <ues...@databricks.com>
Date:   2017-10-16T14:13:43Z

    Restrict the number of arguments for grouped udf to only 1.

commit 122a7bccaff11def2c12cfccdd00244394ed3478
Author: Takuya UESHIN <ues...@databricks.com>
Date:   2017-10-16T16:24:03Z

    Restrict checking the number of arguments.

commit fdafb3561d44ca2583380b7aeaf7843ce5285b1e
Author: Takuya UESHIN <ues...@databricks.com>
Date:   2017-10-16T16:54:23Z

    Revert "Restrict checking the number of arguments."
    
    This reverts commit 122a7bccaff11def2c12cfccdd00244394ed3478.

commit 94d05f4f8d5c663319ec12668dbd1206ffa2e83a
Author: Takuya UESHIN <ues...@databricks.com>
Date:   2017-10-16T18:10:50Z

    Address comments.

commit 733296951b45d760aa0a8465eb0189077ea67372
Author: Takuya UESHIN <ues...@databricks.com>
Date:   2017-10-16T18:33:08Z

    Add tests for unsupported type.

commit 85f250d0eda56606a599c5fb15046ef0fd63a3c4
Author: Takuya UESHIN <ues...@databricks.com>
Date:   2017-10-17T04:59:34Z

    Address a comment.

commit 7b386c4be48c0a2e8de6f04cf341de13e8e98444
Author: Takuya UESHIN <ues...@databricks.com>
Date:   2017-10-17T14:12:37Z

    Remove `@pandas_grouped_udf` and convert implicitly.

----


