[ https://issues.apache.org/jira/browse/SPARK-22216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16199516#comment-16199516 ]

Li Jin commented on SPARK-22216:
--------------------------------

[~hyukjin.kwon],

My intention is to keep this open to track all the features we want to build 
for PySpark/Pandas interoperability, and to keep adding sub-tasks as they 
come up.

Among the existing JIRAs, these include the speed-up to "createDataFrame" and 
support for complex types.

For new features, I think there are more functions we can add pandas UDF 
support to, for instance window functions. I haven't created JIRAs for them 
yet.

> Improving PySpark/Pandas interoperability
> -----------------------------------------
>
>                 Key: SPARK-22216
>                 URL: https://issues.apache.org/jira/browse/SPARK-22216
>             Project: Spark
>          Issue Type: Epic
>          Components: PySpark
>    Affects Versions: 2.2.0
>            Reporter: Li Jin
>
> This is an umbrella ticket tracking the general effort of improving 
> performance and interoperability between PySpark and Pandas. The core idea is 
> to use Apache Arrow as the serialization format to reduce the overhead between 
> PySpark and Pandas.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)