GitHub user ueshin opened a pull request:

    https://github.com/apache/spark/pull/19607

    [SPARK-22395][SQL][PYTHON] Fix the behavior of timestamp values for Pandas to respect session timezone

    ## What changes were proposed in this pull request?
    
    When converting a Pandas DataFrame/Series from/to a Spark DataFrame using `toPandas()` or pandas UDFs, timestamp values respect the Python system timezone instead of the session timezone.
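A minimal sketch of the intended conversion, written in plain Python rather than taken from Spark's code. It assumes a timestamp is held as microseconds since the Unix epoch in UTC, and that Pandas should receive a naive wall-clock value in the session timezone. The helpers `to_session_wall_clock` and `from_session_wall_clock` are hypothetical, and a fixed UTC offset stands in for real timezone rules.

```python
from datetime import datetime, timedelta, timezone

UTC_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def to_session_wall_clock(epoch_micros, session_tz):
    """Spark -> Pandas (hypothetical): UTC instant -> naive wall-clock time in session_tz."""
    instant = UTC_EPOCH + timedelta(microseconds=epoch_micros)
    return instant.astimezone(session_tz).replace(tzinfo=None)

def from_session_wall_clock(naive, session_tz):
    """Pandas -> Spark (hypothetical): naive wall-clock time read as session_tz -> UTC micros."""
    instant = naive.replace(tzinfo=session_tz)
    return (instant - UTC_EPOCH) // timedelta(microseconds=1)

# Assumed session timezone: a fixed UTC-8 offset standing in for "America/Los_Angeles".
session_tz = timezone(timedelta(hours=-8))
micros = 28801 * 1_000_000
wall = to_session_wall_clock(micros, session_tz)
print(wall)  # 1970-01-01 00:00:01
assert from_session_wall_clock(wall, session_tz) == micros
```

Because both directions are keyed on the session timezone, the round trip gives the same result no matter which timezone the Python process itself runs in.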
    
    For example, let's say we use `"America/Los_Angeles"` as the session timezone and have a timestamp value `"1970-01-01 00:00:01"` in that timezone. I'm in Japan, so my Python system timezone is `"Asia/Tokyo"`.
    
    With the current `toPandas()`, the timestamp value is the following:
    
    ```
    >>> spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")
    >>> df = spark.createDataFrame([28801], "long").selectExpr("timestamp(value) as ts")
    >>> df.show()
    +-------------------+
    |                 ts|
    +-------------------+
    |1970-01-01 00:00:01|
    +-------------------+
    
    >>> df.toPandas()
                       ts
    0 1970-01-01 17:00:01
    ```
    
    As you can see, the value becomes `"1970-01-01 17:00:01"` because it respects the Python system timezone.
    As we discussed in #18664, we consider this behavior a bug; the value should be `"1970-01-01 00:00:01"`.
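To show where `17:00:01` comes from, the sketch below renders the same instant (28801 seconds after the epoch) in both timezones. It assumes fixed offsets, UTC-8 for `"America/Los_Angeles"` and UTC+9 for `"Asia/Tokyo"`, which hold on 1970-01-01; `render` is a hypothetical helper, not a Spark API.

```python
from datetime import datetime, timedelta, timezone

# Assumed fixed offsets, correct for 1970-01-01 (no daylight saving in effect).
SESSION_TZ = timezone(timedelta(hours=-8))  # "America/Los_Angeles"
SYSTEM_TZ = timezone(timedelta(hours=9))    # "Asia/Tokyo"

def render(epoch_seconds, tz):
    """Format an epoch instant as wall-clock text in the given timezone."""
    return datetime.fromtimestamp(epoch_seconds, tz).strftime("%Y-%m-%d %H:%M:%S")

instant = 28801  # 8 hours and 1 second after 1970-01-01 00:00:00 UTC

expected = render(instant, SESSION_TZ)  # what df.show() prints
buggy = render(instant, SYSTEM_TZ)      # what toPandas() returns before the fix
print(expected)  # 1970-01-01 00:00:01
print(buggy)     # 1970-01-01 17:00:01
```

The two results differ by 17 hours, exactly the gap between UTC-8 and UTC+9.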
    
    ## How was this patch tested?
    
    Added new tests; also ran existing tests.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ueshin/apache-spark issues/SPARK-22395

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19607.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19607
    
----
commit 4735e5981ecf3a4bce50ce86f706e25830f4a801
Author: Takuya UESHIN <ues...@databricks.com>
Date:   2017-10-23T06:27:22Z

    Add a conf to make Pandas DataFrame respect session local timezone.

commit 1f85150dc5b26df21dca6bad2ef4eaec342c4400
Author: Takuya UESHIN <ues...@databricks.com>
Date:   2017-10-23T08:09:16Z

    Fix toPandas() behavior.

commit 5c08ecf247bfe7e14afcdef8eba1c25cb3b68634
Author: Takuya UESHIN <ues...@databricks.com>
Date:   2017-10-23T09:15:47Z

    Modify pandas UDFs to respect session timezone.

----

