GitHub user ueshin opened a pull request: https://github.com/apache/spark/pull/19607

    [SPARK-22395][SQL][PYTHON] Fix the behavior of timestamp values for Pandas to respect session timezone

    ## What changes were proposed in this pull request?

    When converting between a Pandas DataFrame/Series and a Spark DataFrame using `toPandas()` or pandas udfs, timestamp values respect the Python system timezone instead of the session timezone.

    For example, let's say we use `"America/Los_Angeles"` as the session timezone and have a timestamp value `"1970-01-01 00:00:01"` in that timezone. Btw, I'm in Japan, so the Python timezone would be `"Asia/Tokyo"`.

    The timestamp value from the current `toPandas()` will be the following:

    ```
    >>> spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")
    >>> df = spark.createDataFrame([28801], "long").selectExpr("timestamp(value) as ts")
    >>> df.show()
    +-------------------+
    |                 ts|
    +-------------------+
    |1970-01-01 00:00:01|
    +-------------------+

    >>> df.toPandas()
                       ts
    0 1970-01-01 17:00:01
    ```

    As you can see, the value becomes `"1970-01-01 17:00:01"` because it respects the Python timezone.
    As we discussed in #18664, we consider this behavior a bug; the value should be `"1970-01-01 00:00:01"`.

    ## How was this patch tested?

    Added new tests and ran the existing tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ueshin/apache-spark issues/SPARK-22395

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19607.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19607

----
commit 4735e5981ecf3a4bce50ce86f706e25830f4a801
Author: Takuya UESHIN <ues...@databricks.com>
Date:   2017-10-23T06:27:22Z

    Add a conf to make Pandas DataFrame respect session local timezone.

commit 1f85150dc5b26df21dca6bad2ef4eaec342c4400
Author: Takuya UESHIN <ues...@databricks.com>
Date:   2017-10-23T08:09:16Z

    Fix toPandas() behavior.

commit 5c08ecf247bfe7e14afcdef8eba1c25cb3b68634
Author: Takuya UESHIN <ues...@databricks.com>
Date:   2017-10-23T09:15:47Z

    Modify pandas UDFs to respect session timezone.

----
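
The two wall-clock values quoted in the description can be checked without Spark. The sketch below is not part of the patch; it only reproduces the arithmetic for the epoch value `28801` from the example. The helper `wall_clock` and the fixed offsets are assumptions made for illustration: UTC-8 stands in for `"America/Los_Angeles"` and UTC+9 for `"Asia/Tokyo"`, which holds for the date 1970-01-01 but ignores daylight saving in general.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed offsets stand in for the named zones on 1970-01-01.
# A real implementation would use a tz database instead of fixed offsets.
LOS_ANGELES = timezone(timedelta(hours=-8), "PST")  # session timezone
TOKYO = timezone(timedelta(hours=9), "JST")         # Python system timezone


def wall_clock(epoch_seconds, tz):
    """Render an epoch value as a wall-clock string in the given zone.

    Hypothetical helper for this sketch; not a Spark or pandas API.
    """
    return datetime.fromtimestamp(epoch_seconds, tz).strftime("%Y-%m-%d %H:%M:%S")


epoch = 28801  # same value as in the createDataFrame call in the description

session_view = wall_clock(epoch, LOS_ANGELES)  # what df.show() displays
python_view = wall_clock(epoch, TOKYO)         # what toPandas() returned before the fix

print("session timezone:", session_view)
print("python timezone: ", python_view)
```

The first line printed matches the `df.show()` output and the second matches the `toPandas()` output that the description reports as the bug, so the 17-hour gap is simply the difference between the two offsets.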