[ https://issues.apache.org/jira/browse/SPARK-21187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16821849#comment-16821849 ]
Florian Wilhelm commented on SPARK-21187:
-----------------------------------------

I know that this does not actually help with resolving this issue, but for the time being I wrote up a little workaround for how to still use Spark's `pandas_udf` and Arrow with Spark dataframes containing complex types. I hope it's of some use for PySpark users until this issue is fixed. [https://florianwilhelm.info/2019/04/more_efficient_udfs_with_pyspark/] (A rough sketch of the general idea follows at the end of this message.)

> Complete support for remaining Spark data types in Arrow Converters
> -------------------------------------------------------------------
>
>                 Key: SPARK-21187
>                 URL: https://issues.apache.org/jira/browse/SPARK-21187
>             Project: Spark
>          Issue Type: Umbrella
>          Components: PySpark, SQL
>    Affects Versions: 2.3.0
>            Reporter: Bryan Cutler
>            Assignee: Bryan Cutler
>            Priority: Major
>
> This is to track adding the remaining type support in Arrow Converters. Currently, only primitive data types are supported.
> Remaining types:
> * -*Date*-
> * -*Timestamp*-
> * *Complex*: Struct, -Array-, Arrays of Date/Timestamps, Map
> * -*Decimal*-
> * -*Binary*-
> * Categorical when converting from Pandas
> Some things to do before closing this out:
> * -Look to upgrading to Arrow 0.7 for better Decimal support (can now write values as BigDecimal)-
> * -Need to add some user docs-
> * -Make sure Python tests are thorough-
> * Check into complex type support mentioned in comments by [~leif]; should we support multi-indexing?
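As a concrete illustration of the kind of workaround the comment above refers to, here is a minimal sketch, assuming the common pattern of round-tripping the complex column through JSON: `to_json` turns the array column into plain strings (which the Arrow converters already handle), the `pandas_udf` operates on those strings, and `from_json` restores the complex type afterwards. The toy dataframe, the `add_tag` UDF, and the `array<string>` schema below are hypothetical examples for illustration, not taken from the linked post.

{code:python}
import json

import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.functions import pandas_udf, PandasUDFType

spark = SparkSession.builder.getOrCreate()

# Toy dataframe with a complex (array) column that the Arrow
# converters cannot yet handle directly.
df = spark.createDataFrame([(1, ["a", "b"]), (2, ["c"])], ["id", "tags"])

# Hypothetical UDF: it operates on JSON strings, so Arrow only ever
# sees a primitive (string) column on the way in and out.
@pandas_udf("string", PandasUDFType.SCALAR)
def add_tag(tags_json):
    def transform(s):
        tags = json.loads(s)
        tags.append("extra")
        return json.dumps(tags)
    return tags_json.map(transform)

result = (
    df.withColumn("tags_json", F.to_json("tags"))                     # complex -> string
      .withColumn("tags_json", add_tag("tags_json"))                  # vectorized UDF on strings
      .withColumn("tags", F.from_json("tags_json", "array<string>"))  # string -> complex
      .drop("tags_json")
)
result.show()
{code}

The trade-off is an extra JSON serialization round trip per complex column, in exchange for staying on the vectorized Arrow path instead of falling back to row-at-a-time Python UDFs.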