[ https://issues.apache.org/jira/browse/SPARK-21375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16589288#comment-16589288 ]
Wes McKinney commented on SPARK-21375:
--------------------------------------

Seems there might be some requirements that need to be propagated upstream to Arrow. If so, please create a follow on JIRA, thanks!

> Add date and timestamp support to ArrowConverters for toPandas() collection
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-21375
>                 URL: https://issues.apache.org/jira/browse/SPARK-21375
>             Project: Spark
>          Issue Type: Sub-task
>          Components: PySpark, SQL
>    Affects Versions: 2.3.0
>            Reporter: Bryan Cutler
>            Assignee: Bryan Cutler
>            Priority: Major
>             Fix For: 2.3.0
>
>
> Date and timestamp are not yet supported in DataFrame.toPandas() using
> ArrowConverters. These are common types for data analysis used in both Spark
> and Pandas and should be supported.
> There is a discrepancy with the way that PySpark and Arrow store timestamps,
> without timezone specified, internally. PySpark takes a UTC timestamp that
> is adjusted to local time and Arrow is in UTC time. Hopefully there is a
> clean way to resolve this.
> Spark internal storage spec:
> * *DateType* stored as days
> * *Timestamp* stored as microseconds
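
For readers unfamiliar with the storage spec quoted in the issue description, here is a minimal sketch of what "days" and "microseconds" mean in practice, and of why the same stored timestamp can read differently on each side. It uses only the Python standard library. It assumes both counts are measured from the Unix epoch (1970-01-01), which the issue text does not state explicitly. The helper names and the fixed UTC-7 "session" offset are hypothetical, and this is not the conversion code used by Spark or Arrow.

```python
from datetime import date, datetime, timedelta, timezone

# Assumption: both counts are relative to the Unix epoch.
EPOCH_DATE = date(1970, 1, 1)
EPOCH_UTC = datetime(1970, 1, 1, tzinfo=timezone.utc)


def days_to_date(days: int) -> date:
    """Interpret a DateType-style integer as days since the epoch."""
    return EPOCH_DATE + timedelta(days=days)


def micros_to_utc(micros: int) -> datetime:
    """Interpret a Timestamp-style integer as microseconds since the epoch, in UTC."""
    return EPOCH_UTC + timedelta(microseconds=micros)


def micros_to_naive_local(micros: int, local_tz: timezone) -> datetime:
    """Shift the UTC instant into local_tz, then drop the timezone info.

    This mimics the "UTC timestamp adjusted to local time" behaviour that
    the issue attributes to PySpark.
    """
    return micros_to_utc(micros).astimezone(local_tz).replace(tzinfo=None)


# Hypothetical fixed offset standing in for the machine's local timezone.
LOCAL_TZ = timezone(timedelta(hours=-7))

stored_micros = 1_500_000_000_000_000  # one stored value, two readings

as_utc = micros_to_utc(stored_micros)
as_local = micros_to_naive_local(stored_micros, LOCAL_TZ)

print(days_to_date(17400))   # 2017-08-22
print(as_utc.isoformat())    # 2017-07-14T02:40:00+00:00
print(as_local.isoformat())  # 2017-07-13T19:40:00
```

The two printed timestamps come from the same stored integer. The wall-clock values differ by the local offset, which is the discrepancy the issue asks to resolve when handing timestamps from one system to the other.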