Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/18664

Ok, sounds good. Could I get some opinions on the best way to convert internal Spark timestamps, since they are stored as UTC time? I think we have the following options:

1. Write Arrow data with a SESSION_LOCAL timestamp (as currently done in this PR), then convert to timezone-naive local time in Python after the data is loaded into Pandas. This would happen at the end of `toPandas()`, or just before the user function is called in a `pandas_udf` (converting back to UTC just after).

2. Convert Spark internal data to timezone-naive local time in Scala and write it to Arrow as timezone-naive.

With (1) it's easy to do the conversion with Pandas, but we have to make sure it gets done in multiple places. With (2), it's just in one spot, but I'm not sure if it's possible to end up doing the conversion more than once.
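As a minimal sketch of what option (1) might look like on the Python side (the helper names and sample timestamps here are hypothetical, not from the PR), the UTC-to-local and inverse conversions can be done with Pandas' tz accessors:

```python
import pandas as pd

# Hypothetical example: Spark internal timestamps arrive as UTC-based,
# timezone-naive values in the Pandas DataFrame.
utc_naive = pd.Series(pd.to_datetime(["2017-07-17 12:00:00",
                                      "2017-07-17 18:30:00"]))

def to_local_naive(s, tz):
    # Interpret the naive values as UTC, convert to the session timezone,
    # then drop the tz info so the result is timezone-naive local time.
    return s.dt.tz_localize("UTC").dt.tz_convert(tz).dt.tz_localize(None)

def to_utc_naive(s, tz):
    # Inverse conversion, e.g. just after a pandas_udf returns.
    return s.dt.tz_localize(tz).dt.tz_convert("UTC").dt.tz_localize(None)

local = to_local_naive(utc_naive, "America/Los_Angeles")
roundtrip = to_utc_naive(local, "America/Los_Angeles")
```

The round trip is lossless for unambiguous local times, which is one reason doing the conversion in Pandas (rather than on the Scala side) is straightforward; the cost is that every entry point handing data to user code has to remember to apply it.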