Amruth Ashok created SPARK-54908:
------------------------------------

             Summary: dateFormat option is ignored during schema inference for JSON files
                 Key: SPARK-54908
                 URL: https://issues.apache.org/jira/browse/SPARK-54908
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 4.0.0, 3.5.2, 3.5.0, 3.4.1, 3.3.2
         Environment: Tested in Databricks on DBR 16.4 LTS; similar behavior is observed in other DBR versions.
This appears to be a core Spark issue in [JsonInferSchema.scala|https://github.com/apache/spark/blob/8fe006b20877671c75e4650a27d268b496294299/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala#L42]
            Reporter: Amruth Ashok


When using COPY INTO ... FILEFORMAT = JSON with schema inference, the dateFormat option is ignored during schema inference, while timestampFormat is honored. As a result, date-only strings are inferred as StringType instead of DateType.


Example:

test.json:
{
  "created_at": "02JUL14",
  "updated_at": "02JUL14 12:17:43.39 UTC"
}


Reproduction using COPY INTO with FILEFORMAT = JSON:

COPY INTO my_table
FROM '/path/to/test.json'
FILEFORMAT = JSON
OPTIONS (
  inferSchema = true,
  inferTimestamp = true,
  timestampFormat = "ddMMMyy HH:mm:ss.SSS 'UTC'",
  dateFormat = "ddMMMyy"
)

*Observed behavior:*
 * created_at: string (should be date)
 * updated_at: timestamp (correct)

*Expected behavior:*
 * created_at: date
 * updated_at: timestamp



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
