[jira] [Commented] (SPARK-27463) Support Dataframe Cogroup via Pandas UDFs

Hyukjin Kwon (JIRA) Wed, 12 Jun 2019 04:09:59 -0700


    [ https://issues.apache.org/jira/browse/SPARK-27463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861999#comment-16861999 ]


Hyukjin Kwon commented on SPARK-27463:
--------------------------------------

It is easier and safer to justify a new API by pointing to an existing 
reference than to design and implement one from scratch.
I think our Pandas UDF APIs usually mimic Pandas' APIs or borrow an idea from 
them (e.g., groupby().apply(...)), and then make it distinct within PySpark.
There are also some examples that work just like other PySpark (or Scala-side 
Spark) APIs (e.g., Window Pandas UDFs).
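
To make the groupby().apply(...) analogy concrete, here is a minimal sketch in plain pandas of what a cogroup-style apply could look like. It is only an illustration of the semantics under discussion, not the proposed PySpark API: the helper name cogroup_apply, the summarize function, and the column names id, v1, and v2 are all hypothetical.

```python
import pandas as pd


def cogroup_apply(left, right, key, func):
    """Call func(left_group, right_group) once per key found in either frame.

    Hypothetical helper for illustration only; it is not a pandas or PySpark API.
    """
    left_groups = {k: g for k, g in left.groupby(key)}
    right_groups = {k: g for k, g in right.groupby(key)}
    results = []
    for k in sorted(set(left_groups) | set(right_groups)):
        # A key missing on one side gets an empty frame with that side's columns.
        left_group = left_groups.get(k, left.iloc[0:0])
        right_group = right_groups.get(k, right.iloc[0:0])
        results.append(func(left_group, right_group))
    return pd.concat(results, ignore_index=True)


def summarize(left_group, right_group):
    # Take the key value from whichever side is non-empty.
    source = left_group if len(left_group) else right_group
    return pd.DataFrame({
        "id": [source["id"].iloc[0]],
        "left_rows": [len(left_group)],
        "right_sum": [right_group["v2"].sum()],
    })


left = pd.DataFrame({"id": [1, 1, 2], "v1": [1.0, 2.0, 3.0]})
right = pd.DataFrame({"id": [1, 3], "v2": [10.0, 20.0]})

result = cogroup_apply(left, right, "id", summarize)
print(result)
```

The function passed in sees both groups for a key at once, which is the part a single groupby().apply(...) cannot express. Passing an empty frame for a key that is missing on one side is an assumption of this sketch.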


> Support Dataframe Cogroup via Pandas UDFs 
> ------------------------------------------
>
>                 Key: SPARK-27463
>                 URL: https://issues.apache.org/jira/browse/SPARK-27463
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark, SQL
>    Affects Versions: 3.0.0
>            Reporter: Chris Martin
>            Priority: Major
>
> Recent work on Pandas UDFs in Spark has improved interoperability between 
> Pandas and Spark.  This proposal aims to extend this by introducing a new 
> Pandas UDF type that would allow a cogroup operation to be applied to two 
> PySpark DataFrames.
> Full details are in the Google document linked below.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

