GitHub user icexelloss opened a pull request:

    https://github.com/apache/spark/pull/21650

    [SPARK-24624] Support mixture of Python UDF and Scalar Pandas UDF

    ## What changes were proposed in this pull request?
    
    This PR adds support for mixing Python UDFs and Scalar Pandas UDFs in the following two cases:
    
    (1)
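    ```
    from pyspark.sql.functions import udf, pandas_udf

    f1 = udf(lambda x: x + 1, 'int')          # row-at-a-time Python UDF
    f2 = pandas_udf(lambda x: x + 2, 'int')   # scalar Pandas UDF
    
    df = ...
    # Both UDF types applied to the same input column, as separate output columns
    df = df.withColumn('foo', f1(df['v']))
    df = df.withColumn('bar', f2(df['v']))
    ```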
    
    (2)
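    ```
    from pyspark.sql.functions import udf, pandas_udf

    f1 = udf(lambda x: x + 1, 'int')          # row-at-a-time Python UDF
    f2 = pandas_udf(lambda x: x + 2, 'int')   # scalar Pandas UDF
    
    df = ...
    # The Pandas UDF consumes the output of the Python UDF
    df = df.withColumn('foo', f2(f1(df['v'])))
    ```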
    
    ## How was this patch tested?
    
    New tests are added to BatchEvalPythonExecSuite and ScalarPandasUDFTests.
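
As a rough illustration of the results the two cases above should produce, here is a toy model that needs no Spark installation. It is not how Spark evaluates UDFs and is not taken from the patch: the helpers `eval_row_udf` and `eval_batch_udf` are hypothetical names, a plain Python list stands in for the `pandas.Series` that a scalar Pandas UDF receives, and the input values and batch size are arbitrary assumptions.

```python
from typing import Callable, List


def eval_row_udf(func: Callable[[int], int], column: List[int]) -> List[int]:
    """Hypothetical helper: call func once per value, like a plain Python UDF."""
    return [func(value) for value in column]


def eval_batch_udf(
    func: Callable[[List[int]], List[int]],
    column: List[int],
    batch_size: int = 2,
) -> List[int]:
    """Hypothetical helper: call func once per batch of values.

    A list slice stands in for the pandas.Series a scalar Pandas UDF would see.
    """
    out: List[int] = []
    for start in range(0, len(column), batch_size):
        out.extend(func(column[start:start + batch_size]))
    return out


# Bodies mirroring f1 and f2 from the PR description.
f1 = lambda x: x + 1                        # row-at-a-time body
f2 = lambda batch: [x + 2 for x in batch]   # list-based stand-in for the vectorized body

v = [1, 2, 3]  # assumed sample values for column 'v'

# Case (1): both UDF kinds applied to the same input column.
foo = eval_row_udf(f1, v)
bar = eval_batch_udf(f2, v)

# Case (2): the batch UDF consumes the output of the row UDF.
chained = eval_batch_udf(f2, eval_row_udf(f1, v))

print(foo, bar, chained)
```

In this model the row-at-a-time stage has to finish producing its column before the batch stage can slice it, which is the ordering that the chained case `f2(f1(df['v']))` depends on.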


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/icexelloss/spark SPARK-24624-mix-udf

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21650.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21650
    
----
commit 48ae822bcdf6df40b181f86379d275d602c580c9
Author: Li Jin <ice.xelloss@...>
Date:   2018-06-22T18:35:34Z

    wip

commit 68e665ec981c1a7cae46398bc2ea8a4880e95331
Author: Li Jin <ice.xelloss@...>
Date:   2018-06-27T22:31:25Z

    Test passes

commit 6b47b69305257e9ee9f5135968913a4f92731ef5
Author: Li Jin <ice.xelloss@...>
Date:   2018-06-27T22:34:28Z

    Remove white spaces

----

