Takuya Ueshin created SPARK-22395:
-------------------------------------

             Summary: Fix the behavior of timestamp values for Pandas to respect session timezone
                 Key: SPARK-22395
                 URL: https://issues.apache.org/jira/browse/SPARK-22395
             Project: Spark
          Issue Type: Bug
          Components: PySpark, SQL
    Affects Versions: 2.3.0
            Reporter: Takuya Ueshin
When converting a Pandas DataFrame/Series from/to a Spark DataFrame using {{toPandas()}} or pandas UDFs, timestamp values respect the Python system timezone instead of the session timezone.

For example, suppose we use {{"America/Los_Angeles"}} as the session timezone and have a timestamp value {{"1970-01-01 00:00:01"}} in that timezone. I'm in Japan, so my Python timezone is {{"Asia/Tokyo"}}. The current {{toPandas()}} returns the following:

{noformat}
>>> spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")
>>> df = spark.createDataFrame([28801], "long").selectExpr("timestamp(value) as ts")
>>> df.show()
+-------------------+
|                 ts|
+-------------------+
|1970-01-01 00:00:01|
+-------------------+

>>> df.toPandas()
                   ts
0 1970-01-01 17:00:01
{noformat}

As you can see, the value becomes {{"1970-01-01 17:00:01"}} because it respects the Python timezone. As discussed in https://github.com/apache/spark/pull/18664, we consider this behavior a bug; the value should be {{"1970-01-01 00:00:01"}}.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
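The timezone arithmetic behind the example can be sketched with plain Python datetimes, independent of Spark (a minimal illustration: the epoch value 28801 is taken from the example above, and the instant is simply rendered in the two timezones involved):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+

# Epoch seconds underlying the example timestamp value.
epoch_seconds = 28801

# The instant itself is fixed; only its string rendering
# depends on which timezone it is displayed in.
instant = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)

# Rendered in the session timezone (what df.show() displays).
la = instant.astimezone(ZoneInfo("America/Los_Angeles"))
print(la.strftime("%Y-%m-%d %H:%M:%S"))  # 1970-01-01 00:00:01

# Rendered in the reporter's system timezone (what toPandas() returned).
tokyo = instant.astimezone(ZoneInfo("Asia/Tokyo"))
print(tokyo.strftime("%Y-%m-%d %H:%M:%S"))  # 1970-01-01 17:00:01
```

Both strings name the same instant (1970-01-01 08:00:01 UTC); the bug is only about which timezone is used to render it.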