[ https://issues.apache.org/jira/browse/SPARK-10519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14737375#comment-14737375 ]
Davies Liu commented on SPARK-10519:
------------------------------------

+1 for 3: users have the ability to control the timezone, and it's also compatible.

> Investigate if we should encode timezone information to a timestamp value stored in JSON
> ----------------------------------------------------------------------------------------
>
>                 Key: SPARK-10519
>                 URL: https://issues.apache.org/jira/browse/SPARK-10519
>             Project: Spark
>          Issue Type: Task
>          Components: SQL
>            Reporter: Yin Huai
>            Priority: Minor
>
> Since Spark 1.3, we store a timestamp in JSON without encoding the timezone
> information, and the string representation of a timestamp stored in JSON
> implicitly uses the local timezone (see
> [1|https://github.com/apache/spark/blob/branch-1.3/sql/core/src/main/scala/org/apache/spark/sql/json/JsonRDD.scala#L454],
> [2|https://github.com/apache/spark/blob/branch-1.4/sql/core/src/main/scala/org/apache/spark/sql/json/JacksonGenerator.scala#L38],
> [3|https://github.com/apache/spark/blob/branch-1.5/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JacksonGenerator.scala#L41],
> [4|https://github.com/apache/spark/blob/branch-1.5/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JacksonGenerator.scala#L93]).
> This behavior may cause data consumers to get different values when they
> are in a different timezone from the data producers.
> Since JSON is string based, if we encode timezone information in the timestamp
> value, downstream applications may need to change their code (for example,
> java.sql.Timestamp.valueOf only supports the format {{yyyy-\[m]m-\[d]d
> hh:mm:ss\[.f...]}}).
> We should investigate what we should do about this issue. Right now, I can
> think of three options:
> 1. Encode timezone info in the timestamp value, which can break user code
> and may change the semantics of timestamps (our timestamp value is
> timezone-less).
> 2. When saving a timestamp value to JSON, treat it as a value in
> the local timezone and convert it to UTC. Then, when saving the data,
> do not encode timezone info in the value.
> 3. Do not change the current behavior, but explicitly state in the docs
> that users need to use a single timezone for their datasets (e.g. always use
> UTC).
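
To make the trade-off between the three options concrete, here is a minimal sketch in plain Python. It is not Spark code: the helper names, the two fixed-offset zones, and the sample instant are all assumptions made up for illustration, and the format string only approximates the timezone-less layout mentioned in the issue.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical fixed-offset zones standing in for a producer and a consumer.
PRODUCER_TZ = timezone(timedelta(hours=-8))  # writer at UTC-08:00 (assumption)
CONSUMER_TZ = timezone(timedelta(hours=1))   # reader at UTC+01:00 (assumption)

# Timezone-less layout, similar to the one described in the issue.
FMT = "%Y-%m-%d %H:%M:%S"


def write_local(instant: datetime, tz: timezone) -> str:
    """Option 3 (current behavior): writer's wall-clock time, no zone info."""
    return instant.astimezone(tz).strftime(FMT)


def read_local(text: str, tz: timezone) -> datetime:
    """Interpret a timezone-less string as wall-clock time in the reader's zone."""
    return datetime.strptime(text, FMT).replace(tzinfo=tz)


def write_utc(instant: datetime) -> str:
    """Option 2: convert to UTC before writing, still no zone info in the string."""
    return instant.astimezone(timezone.utc).strftime(FMT)


def read_utc(text: str) -> datetime:
    """Counterpart of write_utc: the reader must know the value is in UTC."""
    return datetime.strptime(text, FMT).replace(tzinfo=timezone.utc)


def write_with_offset(instant: datetime, tz: timezone) -> str:
    """Option 1: encode the offset in the string itself."""
    return instant.astimezone(tz).isoformat(sep=" ")


instant = datetime(2015, 9, 9, 12, 0, 0, tzinfo=timezone.utc)

# Option 3 with mismatched zones: the reader recovers a different instant.
stored = write_local(instant, PRODUCER_TZ)   # "2015-09-09 04:00:00"
shifted = read_local(stored, CONSUMER_TZ)
drift = instant - shifted                    # 9 hours apart

# Option 2: round-trips regardless of where writer and reader run.
utc_round_trip = read_utc(write_utc(instant))

# Option 1: unambiguous, but a parser expecting FMT rejects the new string.
with_offset = write_with_offset(instant, PRODUCER_TZ)
try:
    datetime.strptime(with_offset, FMT)
    old_parser_accepts = True
except ValueError:
    old_parser_accepts = False

print(stored, drift, with_offset, old_parser_accepts)
```

Under these assumptions, option 3 is only safe when writer and reader agree on one zone, which is why the proposal pairs it with a documentation note. Option 1 removes the ambiguity at the cost of breaking strict timezone-less parsers, analogous to the java.sql.Timestamp.valueOf concern raised in the issue.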