[ 
https://issues.apache.org/jira/browse/SPARK-10519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14737375#comment-14737375
 ] 

Davies Liu commented on SPARK-10519:
------------------------------------

+1 for option 3: users have the ability to control the timezone, and it's also compatible.
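To illustrate the compatibility point, here is a small Java sketch that checks which string layouts `java.sql.Timestamp.valueOf` accepts. The current zone-less layout parses. Strings that carry an offset, like those option 1 in the quoted description could produce, are rejected. The class name `TimestampCompat`, the helper `parsesWithValueOf`, and the two offset-bearing layouts are hypothetical, chosen only for illustration. They are not a format Spark writes.

```java
import java.sql.Timestamp;
import java.time.LocalDateTime;

public class TimestampCompat {
    // Hypothetical helper: true if java.sql.Timestamp.valueOf can parse the string.
    static boolean parsesWithValueOf(String s) {
        try {
            Timestamp.valueOf(s);
            return true;
        } catch (IllegalArgumentException e) {
            // valueOf throws IllegalArgumentException (or its subclass
            // NumberFormatException) for layouts it does not support.
            return false;
        }
    }

    public static void main(String[] args) {
        // Zone-less layout, as written today (kept by option 3).
        String zoneLess = "2015-09-09 12:00:00";
        // Hypothetical layouts that encode timezone information (option 1).
        String withOffset = "2015-09-09 12:00:00+00:00";
        String isoUtc = "2015-09-09T12:00:00Z";

        System.out.println(parsesWithValueOf(zoneLess));   // true
        System.out.println(parsesWithValueOf(withOffset)); // false
        System.out.println(parsesWithValueOf(isoUtc));     // false

        // The parsed value keeps the wall-clock fields of the string.
        LocalDateTime ldt = Timestamp.valueOf(zoneLess).toLocalDateTime();
        System.out.println(ldt); // 2015-09-09T12:00
    }
}
```

Downstream code that already calls `Timestamp.valueOf` on these strings would keep working under option 3 but would need changes under option 1.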

> Investigate if we should encode timezone information to a timestamp value 
> stored in JSON
> ----------------------------------------------------------------------------------------
>
>                 Key: SPARK-10519
>                 URL: https://issues.apache.org/jira/browse/SPARK-10519
>             Project: Spark
>          Issue Type: Task
>          Components: SQL
>            Reporter: Yin Huai
>            Priority: Minor
>
> Since Spark 1.3, we store a timestamp in JSON without encoding the timezone 
> information, and the string representation of a timestamp stored in JSON 
> implicitly uses the local timezone (see 
> [1|https://github.com/apache/spark/blob/branch-1.3/sql/core/src/main/scala/org/apache/spark/sql/json/JsonRDD.scala#L454],
> [2|https://github.com/apache/spark/blob/branch-1.4/sql/core/src/main/scala/org/apache/spark/sql/json/JacksonGenerator.scala#L38],
> [3|https://github.com/apache/spark/blob/branch-1.5/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JacksonGenerator.scala#L41],
> [4|https://github.com/apache/spark/blob/branch-1.5/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JacksonGenerator.scala#L93]).
> This behavior may cause data consumers to get different values when they 
> are in a different timezone from the data producers.
> Since JSON is string-based, if we encode timezone information into the 
> timestamp value, downstream applications may need to change their code (for 
> example, java.sql.Timestamp.valueOf only supports the format 
> {{yyyy-\[m]m-\[d]d hh:mm:ss\[.f...]}}).
> We should investigate how to handle this issue. Right now, I can 
> think of three options:
> 1. Encode timezone info in the timestamp value, which can break user code 
> and may change the semantics of timestamps (our timestamp value is 
> timezone-less).
> 2. When saving a timestamp value to JSON, we treat this value as a value in 
> the local timezone and convert it to UTC. Then, when saving the data, we 
> do not encode timezone info in the value.
> 3. We do not change our current behavior, but in our docs we explicitly state 
> that users need to use a single timezone for their datasets (e.g. always use 
> UTC). 
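A minimal Java sketch of the issue and of the single-zone approach behind options 2 and 3, written with `java.time` rather than Spark's own code. The class name `JsonTimestampZones`, the `write`/`read` helpers, and the two example zones are hypothetical. `write` stands in for a JSON writer that emits wall-clock time without zone information, and `read` stands in for a consumer that interprets that string in a zone of its choosing.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class JsonTimestampZones {
    // Zone-less layout, also accepted by java.sql.Timestamp.valueOf.
    static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Hypothetical writer: render the instant as wall-clock time in writerZone, no zone info.
    static String write(Instant t, ZoneId writerZone) {
        return LocalDateTime.ofInstant(t, writerZone).format(FMT);
    }

    // Hypothetical reader: interpret the zone-less string as wall-clock time in readerZone.
    static Instant read(String s, ZoneId readerZone) {
        return LocalDateTime.parse(s, FMT).atZone(readerZone).toInstant();
    }

    public static void main(String[] args) {
        Instant t = Instant.parse("2015-09-09T12:00:00Z");
        ZoneId producer = ZoneId.of("America/Los_Angeles"); // UTC-7 (PDT) on this date
        ZoneId consumer = ZoneId.of("Asia/Tokyo");          // UTC+9

        // Current behavior: each side implicitly uses its own local zone.
        String s = write(t, producer);
        System.out.println(s);                // 2015-09-09 05:00:00
        System.out.println(read(s, consumer)); // 2015-09-08T20:00:00Z (shifted by 16 hours)

        // Options 2/3: both sides agree on UTC, so the instant survives the round trip.
        String u = write(t, ZoneOffset.UTC);
        System.out.println(u);                       // 2015-09-09 12:00:00
        System.out.println(read(u, ZoneOffset.UTC)); // 2015-09-09T12:00:00Z
    }
}
```

Reading `s` in the producer's zone would recover the original instant. The underlying problem is that nothing in the zone-less string tells the consumer which zone that is, so producer and consumer must agree on it out of band.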



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
