[ https://issues.apache.org/jira/browse/SPARK-17914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16950932#comment-16950932 ]
Alexandre Gattiker edited comment on SPARK-17914 at 10/14/19 11:36 AM:
-----------------------------------------------------------------------

As reported by other commenters, the issue is still outstanding with from_json in Spark 2.4.3 (Azure Databricks 5.5 LTS):

    import org.apache.spark.sql.functions.from_json
    import org.apache.spark.sql.types._

    // Plain cast, outside of from_json
    sc.parallelize(List("2019-10-14T09:39:07.3220000Z")).toDF
      .select('value.cast("timestamp"))
    // 2019-10-14T09:39:07.322+0000
    // correct time parsing outside of from_json

    val schema = StructType(StructField("a", TimestampType, false) :: Nil)

    // Seven fractional digits inside from_json
    sc.parallelize(List("""{"a":"2019-10-14T09:39:07.3220000Z"}""")).toDF
      .select(from_json('value, schema))
    // {"a":"2019-10-14T10:32:47.000+0000"}
    // wrong time, corresponds to 09:39:07 + 3220 seconds
    // (the fraction 3220000 is read as milliseconds)

    // Six fractional digits inside from_json
    sc.parallelize(List("""{"a":"2019-10-14T09:39:07.322000Z"}""")).toDF
      .select(from_json('value, schema))
    // {"a":"2019-10-14T09:44:29.000+0000"}
    // wrong time, corresponds to 09:39:07 + 322 seconds

    // Three fractional digits inside from_json
    sc.parallelize(List("""{"a":"2019-10-14T09:39:07.322Z"}""")).toDF
      .select(from_json('value, schema))
    // {"a":"2019-10-14T09:39:07.322+0000"}
    // correct time

> Spark SQL casting to TimestampType with nanosecond results in incorrect timestamp
> ---------------------------------------------------------------------------------
>
>                 Key: SPARK-17914
>                 URL: https://issues.apache.org/jira/browse/SPARK-17914
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.1
>            Reporter: Oksana Romankova
>            Assignee: Anton Okolnychyi
>            Priority: Major
>             Fix For: 2.2.0, 2.3.0
>
>
> In some cases, timestamps that contain nanoseconds are parsed incorrectly.
> Examples:
> "2016-05-14T15:12:14.0034567Z" -> "2016-05-14 15:12:14.034567"
> "2016-05-14T15:12:14.000345678Z" -> "2016-05-14 15:12:14.345678"
> The issue seems to be in DateTimeUtils.stringToTimestamp(), which assumes
> that the fraction of a second has at most 6 digits.
> Given this, I would suggest either discarding nanoseconds automatically or
> throwing an exception that prompts the user to pre-format timestamps to
> microsecond precision before casting to Timestamp.
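The pre-formatting suggested in the description could look roughly like the sketch below. It is only an illustration: `TimestampPrecision` and `truncateFraction` are hypothetical names, not part of Spark, and the helper assumes ISO-8601-style strings with an `HH:mm:ss` time followed by an optional fraction and zone suffix. Extra fractional digits are dropped, not rounded.

```scala
object TimestampPrecision {
  // Hypothetical helper, not a Spark API.
  // Captures: everything up to HH:mm:ss, the fractional digits, and the rest
  // (zone designator such as "Z" or "+01:00", possibly empty).
  private val WithFraction = """^(.*\d{2}:\d{2}:\d{2})\.(\d+)(.*)$""".r

  /** Keeps at most `maxDigits` fractional-second digits (6 = microseconds). */
  def truncateFraction(ts: String, maxDigits: Int = 6): String = {
    require(maxDigits > 0, "maxDigits must be positive")
    ts match {
      case WithFraction(head, digits, rest) =>
        s"$head.${digits.take(maxDigits)}$rest"
      case _ =>
        ts // no fractional part found; return the input unchanged
    }
  }
}
```

Applied to the strings from this thread, `TimestampPrecision.truncateFraction("2016-05-14T15:12:14.0034567Z")` yields `"2016-05-14T15:12:14.003456Z"`, and `truncateFraction("2019-10-14T09:39:07.3220000Z", maxDigits = 3)` yields `"2019-10-14T09:39:07.322Z"`, the millisecond form that from_json parsed correctly above. In a Spark job the same truncation would have to be applied to the column before the cast or the from_json call, for example through a UDF; that wiring is left out here.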