Hyukjin Kwon created SPARK-22165:
------------------------------------

             Summary: Type conflicts between dates, timestamps and date in 
partition column
                 Key: SPARK-22165
                 URL: https://issues.apache.org/jira/browse/SPARK-22165
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.2.0, 2.1.1, 2.3.0
            Reporter: Hyukjin Kwon
            Priority: Minor


It looks like we have some bugs when resolving type conflicts in partition 
columns. I found a few corner cases, as below:

Case 1: timestamp should be inferred but date type is inferred.

{code}
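// Partition values mix a date-like string and a timestamp-like string.
val df = Seq((1, "2015-01-01"), (2, "2016-01-01 00:00:00")).toDF("i", "ts")
df.write.format("parquet").partitionBy("ts").save("/tmp/foo")
// Partition column types are inferred from the directory names on read.
spark.read.load("/tmp/foo").printSchema()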
{code}

{code}
root
 |-- i: integer (nullable = true)
 |-- ts: date (nullable = true)
{code}

Case 2: decimal should be inferred but integer is inferred.

{code}
// "1" * 30 is a 30-digit string of ones, too large for an int or a long.
val df = Seq((1, "1"), (2, "1" * 30)).toDF("i", "decimal")
df.write.format("parquet").partitionBy("decimal").save("/tmp/bar")
spark.read.load("/tmp/bar").printSchema()
{code}

{code}
root
 |-- i: integer (nullable = true)
 |-- decimal: integer (nullable = true)
{code}

It looks like we should de-duplicate the type resolution logic where possible, 
rather than maintaining a separate, numeric-precedence-like comparison on its own.
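
As a rough illustration of what a single shared resolution step could look 
like, here is a minimal sketch in plain Scala. It is not Spark's actual code: 
the object, the type names and the widening rules are all hypothetical and 
only model the two cases above (date vs. timestamp, integer vs. decimal).

```scala
import scala.util.Try

// Hypothetical, simplified model of partition-value type resolution.
// None of these names come from Spark; they exist only for illustration.
object PartitionTypeSketch {
  sealed trait PType
  case object IntT extends PType
  case object DecimalT extends PType
  case object DateT extends PType
  case object TimestampT extends PType
  case object StringT extends PType

  // Infer the narrowest type (in this toy model) for one raw partition value.
  def inferOne(raw: String): PType =
    if (Try(raw.toInt).isSuccess) IntT
    else if (Try(BigInt(raw)).isSuccess) DecimalT // integer too large for Int
    else if (Try(java.time.LocalDate.parse(raw)).isSuccess) DateT
    else if (Try(java.sql.Timestamp.valueOf(raw)).isSuccess) TimestampT
    else StringT

  // One place that decides how two inferred types are reconciled.
  def wider(a: PType, b: PType): PType = (a, b) match {
    case (x, y) if x == y => x
    case (IntT, DecimalT) | (DecimalT, IntT) => DecimalT
    case (DateT, TimestampT) | (TimestampT, DateT) => TimestampT
    case _ => StringT // assumed fallback for unrelated types
  }

  // Resolve the column type from all values of one partition column.
  def resolve(values: Seq[String]): PType =
    values.map(inferOne).reduceOption(wider).getOrElse(StringT)
}
```

Under these assumed rules, 
`PartitionTypeSketch.resolve(Seq("2015-01-01", "2016-01-01 00:00:00"))` gives 
`TimestampT` and `PartitionTypeSketch.resolve(Seq("1", "1" * 30))` gives 
`DecimalT`, which matches the expectations in Case 1 and Case 2. The point is 
only that every pair of types goes through the same `wider` function, instead 
of numeric types being compared by a separate precedence rule.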


