[ 
https://issues.apache.org/jira/browse/SPARK-17914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15572957#comment-15572957
 ] 

Oksana Romankova edited comment on SPARK-17914 at 10/13/16 7:42 PM:
--------------------------------------------------------------------

Sean, I can't find any evidence that ISO 8601 does not support nanoseconds. It only 
says that a fraction of a second may be supplied following a comma or a dot. 
Different parsing libraries that support ISO 8601 have different precision limits. 
For instance, Python's datetime.strptime() only supports precision down to 
microseconds and will throw an exception if nanoseconds are supplied in the input 
string. While that may not be ideal for those who need to retain nanosecond 
precision after parsing, it is acceptable behavior: those who do not need nanosecond 
precision can catch the exception or preemptively truncate the input string. Spark 
SQL's DateTimeUtils.stringToTimestamp() neither throws nor truncates properly, 
which results in an incorrect timestamp. In the example above, the acceptable 
truncation would be:

```
"2016-05-14T15:12:14.0034567Z" -> "2016-05-14 15:12:14.003456"
"2016-05-14T15:12:14.000345678Z" -> "2016-05-14 15:12:14.000345"
```
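As a sketch of the behavior described above: Python's strptime() rejects fractions longer than six digits, and a caller who doesn't need nanoseconds can truncate first. The truncate_to_micros() helper below is hypothetical, not part of Spark or the Python standard library.

```python
import re
from datetime import datetime

def truncate_to_micros(ts: str) -> str:
    # Drop any fractional-second digits beyond the first six (microseconds).
    # Hypothetical helper for illustration only.
    return re.sub(r"(\.\d{6})\d+", r"\1", ts)

raw = "2016-05-14T15:12:14.000345678Z"

# strptime's %f accepts at most 6 digits, so a 9-digit fraction raises:
try:
    datetime.strptime(raw, "%Y-%m-%dT%H:%M:%S.%fZ")
except ValueError:
    pass  # nanosecond input rejected rather than silently mangled

# After truncating to microsecond precision, parsing succeeds:
parsed = datetime.strptime(truncate_to_micros(raw), "%Y-%m-%dT%H:%M:%S.%fZ")
print(parsed)  # 2016-05-14 15:12:14.000345
```

This is the "catch, or preemptively truncate" behavior the comment argues for, as opposed to stringToTimestamp() silently producing a wrong value.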



> Spark SQL casting to TimestampType with nanosecond results in incorrect 
> timestamp
> ---------------------------------------------------------------------------------
>
>                 Key: SPARK-17914
>                 URL: https://issues.apache.org/jira/browse/SPARK-17914
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.1
>            Reporter: Oksana Romankova
>
> In some cases when timestamps contain nanoseconds they will be parsed 
> incorrectly. 
> Examples: 
> "2016-05-14T15:12:14.0034567Z" -> "2016-05-14 15:12:14.034567"
> "2016-05-14T15:12:14.000345678Z" -> "2016-05-14 15:12:14.345678"
> The issue seems to be happening in DateTimeUtils.stringToTimestamp(). It 
> assumes that only 6 digit fraction of a second will be passed.
> With this being the case, I would suggest either discarding nanoseconds 
> automatically, or throwing an exception prompting the user to pre-format 
> timestamps to microsecond precision before casting to TimestampType.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
