[jira] [Assigned] (SPARK-22395) Fix the behavior of timestamp values for Pandas to respect session timezone
[ https://issues.apache.org/jira/browse/SPARK-22395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan reassigned SPARK-22395:
-----------------------------------

    Assignee: Takuya Ueshin

> Fix the behavior of timestamp values for Pandas to respect session timezone
> ---
>
>                 Key: SPARK-22395
>                 URL: https://issues.apache.org/jira/browse/SPARK-22395
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, SQL
>    Affects Versions: 2.3.0
>            Reporter: Takuya Ueshin
>            Assignee: Takuya Ueshin
>              Labels: release-notes
>             Fix For: 2.3.0
>
> When converting a Pandas DataFrame/Series from/to a Spark DataFrame using
> {{toPandas()}} or pandas UDFs, timestamp values respect the Python system
> timezone instead of the session timezone.
> For example, let's say we use {{"America/Los_Angeles"}} as the session
> timezone and have a timestamp value {{"1970-01-01 00:00:01"}} in that
> timezone. By the way, I'm in Japan, so the Python system timezone would be
> {{"Asia/Tokyo"}}.
> The timestamp value from the current {{toPandas()}} will be the following:
> {noformat}
> >>> spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")
> >>> df = spark.createDataFrame([28801], "long").selectExpr("timestamp(value) as ts")
> >>> df.show()
> +-------------------+
> |                 ts|
> +-------------------+
> |1970-01-01 00:00:01|
> +-------------------+
> >>> df.toPandas()
>                    ts
> 0 1970-01-01 17:00:01
> {noformat}
> As you can see, the value becomes {{"1970-01-01 17:00:01"}} because it
> respects the Python timezone.
> As we discussed in https://github.com/apache/spark/pull/18664, we consider
> this behavior a bug, and the value should be {{"1970-01-01 00:00:01"}}.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
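The discrepancy in the report can be reproduced without Spark at all, since it is pure timezone arithmetic: the stored value is an offset from the epoch (effectively UTC), and the observed string depends on which timezone it is rendered in. A minimal sketch using only the Python standard library (assuming Python 3.9+ for `zoneinfo` and an installed tz database; no Spark or pandas APIs are used here):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

epoch_seconds = 28801  # the long value from the JIRA example

# Spark stores the timestamp internally as an offset from the epoch (UTC).
utc = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)

# Expected: render in the session timezone ("America/Los_Angeles", UTC-8 in
# January), then drop the tz info to get a naive wall-clock value.
session = utc.astimezone(ZoneInfo("America/Los_Angeles")).replace(tzinfo=None)

# The buggy toPandas() instead rendered in the Python system timezone
# ("Asia/Tokyo", UTC+9, for the reporter).
local = utc.astimezone(ZoneInfo("Asia/Tokyo")).replace(tzinfo=None)

print(session)  # 1970-01-01 00:00:01  -- what df.show() displays
print(local)    # 1970-01-01 17:00:01  -- what the buggy toPandas() returned
```

The 17-hour gap between the two strings is exactly the offset between Asia/Tokyo (UTC+9) and America/Los_Angeles (UTC-8) at that instant, which is why the fix is to convert using the session timezone rather than whatever timezone the Python driver process happens to run in.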
[jira] [Assigned] (SPARK-22395) Fix the behavior of timestamp values for Pandas to respect session timezone
[ https://issues.apache.org/jira/browse/SPARK-22395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-22395:
------------------------------------

    Assignee:     (was: Apache Spark)
[jira] [Assigned] (SPARK-22395) Fix the behavior of timestamp values for Pandas to respect session timezone
[ https://issues.apache.org/jira/browse/SPARK-22395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-22395:
------------------------------------

    Assignee: Apache Spark