[ https://issues.apache.org/jira/browse/SPARK-17914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16750292#comment-16750292 ]
Chaitanya P Chandurkar edited comment on SPARK-17914 at 1/23/19 6:25 PM:
-------------------------------------------------------------------------

I'm still seeing this issue in Spark 2.4.0 when using the from_json() function. For ISO-8601 Zulu-format datetimes, the fractional seconds are not interpreted correctly beyond a certain number of digits: every digit after the third adds extra seconds to the parsed datetime. For example, the datetime "2019-01-23T17:50:29.9991Z", when parsed with Spark's built-in from_json() function, results in "2019-01-23T17:50:38.991+0000" (note the extra seconds added). If I'm not wrong, from_json() internally uses the Jackson JSON library; I'm not sure whether the bug is in Jackson or in Spark.

{code:java}
// Create schema to parse the JSON
val sc = StructType( StructField( "date", TimestampType ) :: Nil )
{code}

{code:java}
// Sample JSON parsing using the schema created above
Seq( """{"date": "2019-01-22T18:33:39.134232733Z"}""" )
  .toDF( "data" )
  .withColumn( "parsed", from_json( $"data", sc ) )
{code}

This results in the date being "2019-01-24T07:50:51.733+0000" (note the difference of 2 days).
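A possible workaround, until the parsing is fixed, is to normalize the string to microsecond precision before Spark casts it. This is a minimal sketch only (it uses plain java.time rather than Spark, and the variable names are illustrative, not from Spark's code):

{code:java}
import java.time.Instant
import java.time.temporal.ChronoUnit

// Truncate a nanosecond-precision ISO-8601 instant to microsecond
// precision before letting Spark cast it to TimestampType.
val raw = "2019-01-22T18:33:39.134232733Z"
val normalized = Instant.parse(raw).truncatedTo(ChronoUnit.MICROS).toString
println(normalized)  // 2019-01-22T18:33:39.134232Z
{code}

Applied as a UDF (or a regexp_replace on the raw string) before the cast, this keeps the timestamp value correct at the cost of sub-microsecond precision, which TimestampType cannot represent anyway.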
> Spark SQL casting to TimestampType with nanosecond results in incorrect timestamp
> ---------------------------------------------------------------------------------
>
>                 Key: SPARK-17914
>                 URL: https://issues.apache.org/jira/browse/SPARK-17914
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.1
>            Reporter: Oksana Romankova
>            Assignee: Anton Okolnychyi
>            Priority: Major
>             Fix For: 2.2.0, 2.3.0
>
> In some cases when timestamps contain nanoseconds they will be parsed incorrectly.
> Examples:
> "2016-05-14T15:12:14.0034567Z" -> "2016-05-14 15:12:14.034567"
> "2016-05-14T15:12:14.000345678Z" -> "2016-05-14 15:12:14.345678"
> The issue seems to be happening in DateTimeUtils.stringToTimestamp(). It assumes that only a 6-digit fraction of a second will be passed.
> With this being the case, I would suggest either discarding nanoseconds automatically, or throwing an exception prompting to pre-format timestamps to microsecond precision before casting to Timestamp.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
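The misparse pattern in the examples above is consistent with a parser that reads the fractional field as a plain integer count of microseconds regardless of how many digits it has. This is an illustrative sketch only, not Spark's actual DateTimeUtils.stringToTimestamp() code, and the function names are invented for the example:

{code:java}
// Buggy interpretation: treat the fraction digits as microseconds as-is.
def naiveMicros(frac: String): Long = frac.toLong

// Intended interpretation: scale the fraction to exactly 6 digits,
// truncating any sub-microsecond precision.
def scaledMicros(frac: String): Long = (frac + "000000").take(6).toLong

println(naiveMicros("0034567"))   // 34567  -> .034567, the buggy output
println(scaledMicros("0034567"))  // 3456   -> .003456, the intended value
{code}

This reproduces both reported examples: "0034567" becomes .034567 and "000345678" becomes .345678 under the naive reading, exactly as in the issue description.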