[jira] [Commented] (SPARK-22632) Fix the behavior of timestamp values for R's DataFrame to respect session timezone
[ https://issues.apache.org/jira/browse/SPARK-22632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16610083#comment-16610083 ]

Felix Cheung commented on SPARK-22632:
--------------------------------------

A mismatch between the R and JVM time zones could be an issue, but it is not a blocker for the release. Let's move this to 3.0.

> Fix the behavior of timestamp values for R's DataFrame to respect session timezone
> ----------------------------------------------------------------------------------
>
>                 Key: SPARK-22632
>                 URL: https://issues.apache.org/jira/browse/SPARK-22632
>             Project: Spark
>          Issue Type: Bug
>          Components: SparkR, SQL
>    Affects Versions: 2.3.0
>            Reporter: Hyukjin Kwon
>            Priority: Major
>
> Note: the wording is borrowed from SPARK-22395. The symptom is similar, and I think that JIRA describes it well.
> When converting an R data.frame from/to a Spark DataFrame using {{createDataFrame}} or {{collect}}, timestamp values respect the R system timezone instead of the session timezone.
> For example, suppose we use "America/Los_Angeles" as the session timezone and have the timestamp value "1970-01-01 00:00:01" in that timezone. By the way, I am in South Korea, so my R timezone is "KST".
> The timestamp value from the current collect() is the following:
> {code}
> > sparkR.session(master = "local[*]", sparkConfig = list(spark.sql.session.timeZone = "America/Los_Angeles"))
> > collect(sql("SELECT cast(cast(28801 as timestamp) as string) as ts"))
>                    ts
> 1 1970-01-01 00:00:01
> > collect(sql("SELECT cast(28801 as timestamp) as ts"))
>                    ts
> 1 1970-01-01 17:00:01
> {code}
> As you can see, the value becomes "1970-01-01 17:00:01" because it respects the R system timezone.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
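The arithmetic behind the two results in the description can be checked outside Spark. The sketch below (plain Python with the standard-library zoneinfo module, not from the JIRA itself; "Asia/Seoul" is assumed as the concrete IANA name for the reporter's KST) shows why epoch second 28801 renders as 00:00:01 in the session timezone but 17:00:01 in the R system timezone:

```python
# Epoch second 28801 is 1970-01-01 08:00:01 UTC. The session timezone
# ("America/Los_Angeles", UTC-8 in January 1970) renders it as 00:00:01,
# while the R system timezone in the report (assumed "Asia/Seoul", UTC+9)
# renders it as 17:00:01 -- exactly the mismatch collect() shows.
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

epoch_seconds = 28801
utc = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)

session_tz = utc.astimezone(ZoneInfo("America/Los_Angeles"))
r_system_tz = utc.astimezone(ZoneInfo("Asia/Seoul"))

print(session_tz.strftime("%Y-%m-%d %H:%M:%S"))   # 1970-01-01 00:00:01
print(r_system_tz.strftime("%Y-%m-%d %H:%M:%S"))  # 1970-01-01 17:00:01
```

The 17-hour gap is simply the offset difference between UTC-8 and UTC+9 applied to the same absolute instant.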
[ https://issues.apache.org/jira/browse/SPARK-22632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609292#comment-16609292 ]

Wenchen Fan commented on SPARK-22632:
-------------------------------------

Is this still a problem now?
[ https://issues.apache.org/jira/browse/SPARK-22632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16316915#comment-16316915 ]

Sameer Agarwal commented on SPARK-22632:
----------------------------------------

Thanks guys, I'll move this to 2.4.0.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[ https://issues.apache.org/jira/browse/SPARK-22632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16315436#comment-16315436 ]

Felix Cheung commented on SPARK-22632:
--------------------------------------

Yes. First, I'd agree we should generalize this to R and Python. Second, I think the different treatment of timezones between the host language and Spark has in general been a source of confusion (it has been reported at least a few times). Lastly, this isn't a regression AFAIK, so it is not necessarily a blocker for 2.3, although it would be very good to have.
[ https://issues.apache.org/jira/browse/SPARK-22632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16312356#comment-16312356 ]

Hyukjin Kwon commented on SPARK-22632:
--------------------------------------

To me, no, I don't think so, although it might be important to have. The related PySpark <> Pandas case was fixed with a configuration to control the behaviour. I was trying to take a look at it at the time, but I am not sure it is safe to change at this stage, or that I can make it within the 2.3.0 timeline. Note that PySpark itself still has this issue too, FYI.
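For context, the PySpark <> Pandas fix referenced here (SPARK-22395, released in Spark 2.3) was gated behind a configuration flag. A minimal sketch of how it would be enabled, assuming a Spark 2.3 deployment and an already-constructed SparkSession named {{spark}}:

```python
# Sketch only: assumes Spark 2.3+ and an existing SparkSession `spark`.
# Set the session timezone that SQL timestamp rendering should follow.
spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")
# Flag introduced by SPARK-22395: make the pandas conversion path
# (toPandas / createDataFrame from pandas) respect the session timezone
# instead of the local system timezone.
spark.conf.set("spark.sql.execution.pandas.respectSessionTimeZone", "true")
```

The SparkR discussion in this JIRA is about adding analogous behavior (and likely a similar flag) for {{collect}} and {{createDataFrame}} on the R side.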
[ https://issues.apache.org/jira/browse/SPARK-22632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16312287#comment-16312287 ]

Sameer Agarwal commented on SPARK-22632:
----------------------------------------

[~hyukjin.kwon] [~felixcheung] should this be a blocker for 2.3? cc [~ueshin]
[ https://issues.apache.org/jira/browse/SPARK-22632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16301006#comment-16301006 ]

Felix Cheung commented on SPARK-22632:
--------------------------------------

How are we doing on this for 2.3?
[ https://issues.apache.org/jira/browse/SPARK-22632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16274029#comment-16274029 ]

Felix Cheung commented on SPARK-22632:
--------------------------------------

Interesting, re: timezone handling on macOS: https://cran.r-project.org/src/base/NEWS