[ https://issues.apache.org/jira/browse/SPARK-21187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16821849#comment-16821849 ]
Florian Wilhelm commented on SPARK-21187:
-----------------------------------------

I know that this does not actually help with resolving this issue, but for the time being I wrote up a little workaround for how to still use Spark's `pandas_udf` and Arrow with Spark dataframes containing complex types. I hope it's of some use for PySpark users until this issue is fixed. [https://florianwilhelm.info/2019/04/more_efficient_udfs_with_pyspark/] (A rough sketch of the general idea follows at the end of this message.)

> Complete support for remaining Spark data types in Arrow Converters
> -------------------------------------------------------------------
>
>                 Key: SPARK-21187
>                 URL: https://issues.apache.org/jira/browse/SPARK-21187
>             Project: Spark
>          Issue Type: Umbrella
>          Components: PySpark, SQL
>    Affects Versions: 2.3.0
>            Reporter: Bryan Cutler
>            Assignee: Bryan Cutler
>            Priority: Major
>
> This is to track adding the remaining type support in Arrow Converters. Currently, only primitive data types are supported.
> Remaining types:
> * -*Date*-
> * -*Timestamp*-
> * *Complex*: Struct, -Array-, Arrays of Date/Timestamps, Map
> * -*Decimal*-
> * -*Binary*-
> * Categorical when converting from Pandas
> Some things to do before closing this out:
> * -Look to upgrading to Arrow 0.7 for better Decimal support (can now write values as BigDecimal)-
> * -Need to add some user docs-
> * -Make sure Python tests are thorough-
> * Check into complex type support mentioned in comments by [~leif]; should we support multi-indexing?
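As a concrete illustration of the kind of workaround the comment above refers to, here is a minimal sketch, assuming the common pattern of round-tripping the complex column through JSON: `to_json` turns the array column into plain strings (which the Arrow converters already handle), the `pandas_udf` operates on those strings, and `from_json` restores the complex type afterwards. The toy dataframe, the `add_tag` UDF, and the `array<string>` schema below are hypothetical examples for illustration, not taken from the linked post.

{code:python}
import json

import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.functions import pandas_udf, PandasUDFType

spark = SparkSession.builder.getOrCreate()

# Toy dataframe with a complex (array) column that the Arrow
# converters cannot yet handle directly.
df = spark.createDataFrame([(1, ["a", "b"]), (2, ["c"])], ["id", "tags"])

# Hypothetical UDF: it operates on JSON strings, so Arrow only ever
# sees a primitive (string) column on the way in and out.
@pandas_udf("string", PandasUDFType.SCALAR)
def add_tag(tags_json):
    def transform(s):
        tags = json.loads(s)
        tags.append("extra")
        return json.dumps(tags)
    return tags_json.map(transform)

result = (
    df.withColumn("tags_json", F.to_json("tags"))                     # complex -> string
      .withColumn("tags_json", add_tag("tags_json"))                  # vectorized UDF on strings
      .withColumn("tags", F.from_json("tags_json", "array<string>"))  # string -> complex
      .drop("tags_json")
)
result.show()
{code}

The trade-off is an extra JSON serialization round trip per complex column, in exchange for staying on the vectorized Arrow path instead of falling back to row-at-a-time Python UDFs.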