GitHub user BryanCutler opened a pull request:

    https://github.com/apache/spark/pull/19325

    [SPARK-22106][PYSPARK][SQL] Disable 0-parameter pandas_udf and add doctests

    ## What changes were proposed in this pull request?
    
    This change disables 0-parameter pandas_udfs because the API for them is
    overly complex and awkward, and the same result is easy to achieve by
    passing an index column as an input argument. It also adds doctests for
    pandas_udfs, which revealed bugs in the handling of empty partitions and
    in the use of the pandas_udf decorator.
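
The index-column workaround mentioned above might look roughly like the sketch below. This is an illustration under stated assumptions, not code from this patch: the function name `constant_per_row` and the constant 42 are hypothetical, and the Spark wiring is left commented out because it needs a running session with pyspark and pyarrow installed. The point is only that a batch-wise function needs some input column to learn how many rows to return.

```python
import pandas as pd


def constant_per_row(idx: pd.Series) -> pd.Series:
    """Return one constant value per input row (hypothetical example).

    The input column is used only for its length and index; a batch-wise
    function with no inputs has no way to know how many rows to produce.
    """
    return pd.Series(42, index=idx.index, dtype="int64")


# Plain-pandas check of the batch function on a three-row batch.
batch = pd.Series([0, 1, 2])
result = constant_per_row(batch)

# An empty batch, as from an empty partition, yields an empty result.
empty_result = constant_per_row(pd.Series([], dtype="int64"))

# Assumed wiring into Spark (not run here; requires pyspark and pyarrow):
# from pyspark.sql.functions import pandas_udf, col
# from pyspark.sql.types import LongType
# f = pandas_udf(constant_per_row, LongType())
# spark.range(3).select(f(col("id"))).show()
```

Any existing column works as the input here; an id or index column is simply a convenient choice because it is always present after `spark.range`.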
    
    ## How was this patch tested?
    
    Reworked the existing 0-parameter test to verify that an error is raised,
    added a doctest for pandas_udf, and added new tests for empty partitions
    and decorator usage.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/BryanCutler/spark arrow-pandas_udf-0-param-remove-SPARK-22106

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19325.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19325
    
----
commit c0eec8d2484a3aa2b9a4c5f6d7fb32125f33f623
Author: Bryan Cutler <cutl...@gmail.com>
Date:   2017-09-22T18:08:58Z

    disabled support for 0-parameter pandas_udfs

commit 7b0da106fb64a16b77c62953bb12548fda3f7ef3
Author: Bryan Cutler <cutl...@gmail.com>
Date:   2017-09-22T20:11:02Z

    added doctests, fix for decorator and empty partition

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
