Apoorva Sareen created SPARK-23436:
--------------------------------------

             Summary: Incorrect Date column Inference in partition discovery
                 Key: SPARK-23436
                 URL: https://issues.apache.org/jira/browse/SPARK-23436
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.2.1
            Reporter: Apoorva Sareen


If a partition column value looks like a partial date/timestamp, for example

    2018-01-01-23

where the timestamp is truncated to the hour, the data type of the 
partitioning column is automatically inferred as date. However, the values 
are loaded as null.
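
One possible contributor, which this report does not confirm, is lenient date 
parsing: java.text.SimpleDateFormat.parse(String) accepts a string whose 
prefix matches the pattern and ignores any trailing characters. The sketch 
below only demonstrates that JDK behaviour. That Spark 2.2.1 partition 
inference relies on such a parser is an assumption, and the object name 
LenientDateParseDemo is hypothetical.

{code:scala}
import java.text.SimpleDateFormat
import java.time.LocalDate
import scala.util.Try

// Hypothetical demo object; it is not part of Spark.
object LenientDateParseDemo {
  def main(args: Array[String]): Unit = {
    val raw = "2018-01-01-04"

    // SimpleDateFormat.parse(String) may stop before the end of the input,
    // so the "2018-01-01" prefix is enough for the parse to succeed.
    val lenient = Try(new SimpleDateFormat("yyyy-MM-dd").parse(raw))

    // LocalDate.parse requires the whole string to be a valid ISO date,
    // so the trailing "-04" makes it fail.
    val strict = Try(LocalDate.parse(raw))

    println(s"lenient prefix parse succeeded: ${lenient.isSuccess}") // true
    println(s"strict full-string parse succeeded: ${strict.isSuccess}") // false
  }
}
{code}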

Here is example code to reproduce this behaviour:

{code:java}
// Run in spark-shell, where `spark` and its implicits (for toDF) are in scope.
val data = Seq(("1", "2018-01", "2018-01-01-04", "test"))
  .toDF("id", "date_month", "data_hour", "data")

// Write the data partitioned by all three columns.
data.write.partitionBy("id", "date_month", "data_hour").parquet("output/test")

// Read it back so that partition discovery infers the column types.
val input = spark.read.parquet("output/test")
input.printSchema()
input.show()

// ### Result ###
// root
//  |-- data: string (nullable = true)
//  |-- id: integer (nullable = true)
//  |-- date_month: string (nullable = true)
//  |-- data_hour: date (nullable = true)
//
// +----+---+----------+---------+
// |data| id|date_month|data_hour|
// +----+---+----------+---------+
// |test|  1|   2018-01|     null|
// +----+---+----------+---------+
{code}
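
A possible workaround sketch, not verified against 2.2.1 as part of this 
report: disable partition column type inference through the documented 
setting spark.sql.sources.partitionColumnTypeInference.enabled, so that 
partition columns are read as strings. It assumes the output/test directory 
written by the reproduction above and a local master. The object name 
PartitionInferenceWorkaround is hypothetical.

{code:scala}
import org.apache.spark.sql.SparkSession

// Hypothetical wrapper object, shown only to keep the sketch self-contained.
object PartitionInferenceWorkaround {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("partition-inference-workaround")
      .master("local[*]") // assumption: local run for illustration
      // With inference disabled, partition columns are read as strings.
      .config("spark.sql.sources.partitionColumnTypeInference.enabled", "false")
      .getOrCreate()

    // Assumes "output/test" was written by the reproduction code.
    val input = spark.read.parquet("output/test")
    input.printSchema()
    input.show()

    spark.stop()
  }
}
{code}

Note that this setting applies to every partition column, so id would also 
be read as a string rather than an integer.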
 


