Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/18664

Ok, sounds good. Could I get some opinions on the best way to convert internal Spark timestamps, since they are stored as UTC time? I think we have the following options:

1. Write Arrow data with a SESSION_LOCAL timestamp (as currently done in this PR), then convert to timezone-naive local time in Python after the data is loaded into Pandas. This would happen at the end of `toPandas()`, or just before the user function is called in a `pandas_udf` (converting back to UTC just after).

2. Convert Spark internal data to timezone-naive local time in Scala and write it to Arrow as timezone-naive.

With (1) it's easy to do the conversion with Pandas, but we have to make sure it gets done in multiple places. With (2), it's just in one spot, but I'm not sure if it's possible to end up doing the conversion more than once.
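As a minimal sketch of what option (1) might look like on the Python side (the helper names and sample timestamps here are hypothetical, not from the PR), the UTC-to-local and inverse conversions can be done with Pandas' tz accessors:

```python
import pandas as pd

# Hypothetical example: Spark internal timestamps arrive as UTC-based,
# timezone-naive values in the Pandas DataFrame.
utc_naive = pd.Series(pd.to_datetime(["2017-07-17 12:00:00",
                                      "2017-07-17 18:30:00"]))

def to_local_naive(s, tz):
    # Interpret the naive values as UTC, convert to the session timezone,
    # then drop the tz info so the result is timezone-naive local time.
    return s.dt.tz_localize("UTC").dt.tz_convert(tz).dt.tz_localize(None)

def to_utc_naive(s, tz):
    # Inverse conversion, e.g. just after a pandas_udf returns.
    return s.dt.tz_localize(tz).dt.tz_convert("UTC").dt.tz_localize(None)

local = to_local_naive(utc_naive, "America/Los_Angeles")
roundtrip = to_utc_naive(local, "America/Los_Angeles")
```

The round trip is lossless for unambiguous local times, which is one reason doing the conversion in Pandas (rather than on the Scala side) is straightforward; the cost is that every entry point handing data to user code has to remember to apply it.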