Hyukjin Kwon created SPARK-22165:
------------------------------------

             Summary: Type conflicts between dates, timestamps and date in partition column
                 Key: SPARK-22165
                 URL: https://issues.apache.org/jira/browse/SPARK-22165
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.2.0, 2.1.1, 2.3.0
            Reporter: Hyukjin Kwon
            Priority: Minor

It looks like we have some bugs when resolving type conflicts in partition columns. I found a few corner cases, shown below.

Case 1: timestamp should be inferred, but date is inferred.

{code}
// Partition values mix a date-only string and a timestamp string.
val df = Seq((1, "2015-01-01"), (2, "2016-01-01 00:00:00")).toDF("i", "ts")
df.write.format("parquet").partitionBy("ts").save("/tmp/foo")
spark.read.load("/tmp/foo").printSchema()
{code}

{code}
root
 |-- i: integer (nullable = true)
 |-- ts: date (nullable = true)
{code}

Case 2: decimal should be inferred, but integer is inferred.

{code}
// The second partition value is a 30-digit number, too large for an integer or a long.
val df = Seq((1, "1"), (2, "1" * 30)).toDF("i", "decimal")
df.write.format("parquet").partitionBy("decimal").save("/tmp/bar")
spark.read.load("/tmp/bar").printSchema()
{code}

{code}
root
 |-- i: integer (nullable = true)
 |-- decimal: integer (nullable = true)
{code}

It looks like we should de-duplicate the type resolution logic where possible, rather than keeping a separate numeric-precedence-style comparison on its own.
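
As a rough illustration of the behaviour the two cases expect, here is a minimal, self-contained sketch in plain Scala. It is not Spark's implementation: `PartitionTypeSketch`, `PType`, `inferOne`, `widen` and `resolve` are hypothetical names invented for this example, and the inference rules are deliberately simplified. The idea is to infer a type for each raw partition value and then combine the results with a single widening function, so that date/timestamp and integer/decimal conflicts go through the same code path.

```scala
import java.time.{LocalDate, LocalDateTime}
import java.time.format.DateTimeFormatter
import scala.util.Try

// Hypothetical, simplified model of partition-value type resolution.
// None of these names are Spark APIs.
object PartitionTypeSketch {
  sealed trait PType
  case object IntT extends PType
  case object LongT extends PType
  case object DecimalT extends PType
  case object DateT extends PType
  case object TimestampT extends PType
  case object StringT extends PType

  // Assumed timestamp layout, matching the value used in Case 1.
  private val tsFormat = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")

  // Infer the narrowest type that can hold one raw partition value.
  def inferOne(raw: String): PType =
    if (Try(raw.toInt).isSuccess) IntT
    else if (Try(raw.toLong).isSuccess) LongT
    else if (Try(BigDecimal(raw)).isSuccess) DecimalT
    else if (Try(LocalDate.parse(raw)).isSuccess) DateT
    else if (Try(LocalDateTime.parse(raw, tsFormat)).isSuccess) TimestampT
    else StringT

  // Widen two inferred types into one type that can represent both.
  // Anything that cannot be reconciled falls back to string.
  def widen(a: PType, b: PType): PType = (a, b) match {
    case (x, y) if x == y => x
    case (IntT, LongT) | (LongT, IntT) => LongT
    case (IntT | LongT, DecimalT) | (DecimalT, IntT | LongT) => DecimalT
    case (DateT, TimestampT) | (TimestampT, DateT) => TimestampT
    case _ => StringT
  }

  // Resolve one type for a whole partition column.
  def resolve(values: Seq[String]): PType =
    values.map(inferOne).reduceOption(widen).getOrElse(StringT)

  def main(args: Array[String]): Unit = {
    println(resolve(Seq("2015-01-01", "2016-01-01 00:00:00"))) // TimestampT
    println(resolve(Seq("1", "1" * 30)))                       // DecimalT
  }
}
```

With a single `widen` function, Case 1 resolves to a timestamp and Case 2 to a decimal, which is the outcome described above. How this would map onto Spark's existing type coercion rules is left open here.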