[ https://issues.apache.org/jira/browse/SPARK-30767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17033293#comment-17033293 ]
Maxim Gekk edited comment on SPARK-30767 at 2/9/20 9:08 PM:
------------------------------------------------------------

The default timestamp pattern in the JSON datasource specifies only milliseconds, but your input strings have timestamps with microsecond precision. You can change the pattern via:
{code:scala}
from_json(col("json"), struct, Map("timestampFormat" -> "uuuu-MM-dd'T'HH:mm:ss.SSSSSSXXX"))
{code}
Just in case, it should work in Spark 3.0 preview and in Spark 2.4.5.


> from_json changes times of timestmaps by several minutes without error
> -----------------------------------------------------------------------
>
>                 Key: SPARK-30767
>                 URL: https://issues.apache.org/jira/browse/SPARK-30767
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.4
>         Environment: We ran the example code with Spark 2.4.4 via Azure
> Databricks with Databricks Runtime version 6.3 within an interactive cluster.
> We encountered the issue first on a Job Cluster running a streaming
> application on Databricks Runtime Version 5.4.
>            Reporter: Benedikt Maria Beckermann
>            Priority: Major
>              Labels: corruption
>
> When a JSON text column includes a timestamp in a format like
> {{2020-01-25T06:39:45.887429Z}}, the function
> {{from_json(Column,StructType)}} parses it as a timestamp, but the
> resulting timestamp is off by several minutes.
> Spark does not throw any kind of error but continues to run with the
> incorrect timestamp.
> The following Scala snippet reproduces the issue.
>
> {code:scala}
> import org.apache.spark.sql._
> import org.apache.spark.sql.functions._
> import org.apache.spark.sql.types._
>
> // toDF relies on spark.implicits._, which Databricks notebooks import automatically
> val df = Seq("""{"time":"2020-01-25T06:39:45.887429Z"}""").toDF("json")
> val struct = new StructType().add("time", TimestampType, nullable = true)
> val timeDF = df
>   // Extract the raw timestamp string from the JSON text
>   .withColumn("time (string)", get_json_object(col("json"), "$.time"))
>   // Casting the string directly keeps the correct time
>   .withColumn("time casted directly (CORRECT)", col("time (string)").cast(TimestampType))
>   // Parsing through from_json with a schema yields the shifted time
>   .withColumn("time casted via struct (INVALID)", from_json(col("json"), struct))
> display(timeDF)
> {code}
> Output:
> ||json||time (string)||time casted directly (CORRECT)||time casted via struct (INVALID)||
> |{"time":"2020-01-25T06:39:45.887429Z"}|2020-01-25T06:39:45.887429Z|2020-01-25T06:39:45.887+0000|{"time":"2020-01-25T06:54:32.429+0000"}|
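
The shift in the reported output is consistent with the six fractional digits being read as a millisecond count instead of as microseconds: 887429 ms is 14 min 47.429 s, and 06:39:45 plus that is 06:54:32.429, the value shown in the last column. This is an inference from the numbers in the report, not something stated in the comment. The sketch below only redoes the arithmetic with java.time and does not call Spark's parser. The names base, fractionDigits, readAsMillis and readAsMicros are made up for illustration.
{code:scala}
import java.time.{Duration, Instant}

// Whole-second part of the timestamp from the report
val base = Instant.parse("2020-01-25T06:39:45Z")
// Fractional-second digits from the input string
val fractionDigits = "887429"

// Assumed misreading: all six digits taken as a number of milliseconds
val readAsMillis = base.plus(Duration.ofMillis(fractionDigits.toLong))

// Intended reading: the digits are microseconds (1 microsecond = 1000 ns)
val readAsMicros = base.plusNanos(fractionDigits.toLong * 1000L)

println(readAsMillis)  // 2020-01-25T06:54:32.429Z
println(readAsMicros)  // 2020-01-25T06:39:45.887429Z
{code}
A pattern with six fraction digits ({{SSSSSS}}), as suggested in the comment, matches the second reading.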