[ https://issues.apache.org/jira/browse/SPARK-26325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17276711#comment-17276711 ]
Daniel Himmelstein edited comment on SPARK-26325 at 2/1/21, 10:53 PM:
----------------------------------------------------------------------

Here's the code from the original post, but using an RDD rather than a JSON file and applying [~maxgekk]'s suggestion to "try Z instead of 'Z'":

{code:python}
line = '{"time_field" : "2017-09-30 04:53:39.412496Z"}'
rdd = spark.sparkContext.parallelize([line])
(
    spark.read
    .option("timestampFormat", "yyyy-MM-dd HH:mm:ss.SSSSSSZ")
    .json(path=rdd)
)
{code}

The output I get with pyspark 3.0.1 is `DataFrame[time_field: string]`, so it looks like the issue remains. I'd be interested to know whether there are any examples where Spark infers a date or timestamp from a JSON string, or whether dateFormat and timestampFormat do not work at all.

> Interpret timestamp fields in Spark while reading json (timestampFormat)
> ------------------------------------------------------------------------
>
>                 Key: SPARK-26325
>                 URL: https://issues.apache.org/jira/browse/SPARK-26325
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Veenit Shah
>            Priority: Major
>
> I am trying to read a pretty-printed JSON file which has time fields in it. I want to interpret the timestamp columns as timestamp fields while reading the JSON itself. However, they are still read as strings when I {{printSchema}}.
> E.g.
> Input json file -
> {code:java}
> [{
>     "time_field" : "2017-09-30 04:53:39.412496Z"
> }]
> {code}
> Code -
> {code:java}
> df = spark.read.option("multiLine", "true").option("timestampFormat", "yyyy-MM-dd HH:mm:ss.SSSSSS'Z'").json('path_to_json_file')
> {code}
> Output of df.printSchema() -
> {code:java}
> root
>  |-- time_field: string (nullable = true)
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
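For reference on the value being parsed: the string from the issue is a microsecond-precision timestamp with a literal trailing "Z" (UTC designator), so any pattern must account for all six fractional digits and for the suffix. A minimal sketch with Python's standard library (independent of Spark, so not a test of timestampFormat itself) that parses the same string:

{code:python}
from datetime import datetime

# The value from the issue: six fractional-second digits plus a trailing
# "Z". Python's %f consumes up to six digits, matching the SSSSSS portion
# of the Spark pattern; since Python 3.7, %z accepts "Z" as a zero UTC
# offset, analogous to matching Z as a zone rather than quoting it as a
# literal ('Z') in a Spark pattern.
raw = "2017-09-30 04:53:39.412496Z"

parsed = datetime.strptime(raw, "%Y-%m-%d %H:%M:%S.%f%z")

# The result is a timezone-aware datetime at UTC with full microsecond
# precision preserved.
print(parsed.microsecond)              # 412496
print(parsed.utcoffset().total_seconds())  # 0.0
{code}

This only shows that the string itself is a well-formed microsecond UTC timestamp; whether Spark's JSON reader applies timestampFormat during schema inference is the open question above.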