Apoorva Sareen created SPARK-23436:
--------------------------------------

             Summary: Incorrect Date column Inference in partition discovery
                 Key: SPARK-23436
                 URL: https://issues.apache.org/jira/browse/SPARK-23436
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.2.1
            Reporter: Apoorva Sareen

If a partition column holds a partial date/timestamp value, for example 2018-01-01-23 (a timestamp truncated to the hour), partition discovery infers the column's data type as date, but the values are loaded as null.

The following code reproduces this behaviour:

{code:java}
val data = Seq(("1", "2018-01", "2018-01-01-04", "test")).toDF("id", "date_month", "data_hour", "data")
data.write.partitionBy("id", "date_month", "data_hour").parquet("output/test")

val input = spark.read.parquet("output/test")
input.printSchema()
input.show()

// Result:
// root
//  |-- data: string (nullable = true)
//  |-- id: integer (nullable = true)
//  |-- date_month: string (nullable = true)
//  |-- data_hour: date (nullable = true)
//
// +----+---+----------+---------+
// |data| id|date_month|data_hour|
// +----+---+----------+---------+
// |test|  1|   2018-01|     null|
// +----+---+----------+---------+
{code}
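
A possible workaround, which is not part of the original report: the Spark SQL documentation describes the setting spark.sql.sources.partitionColumnTypeInference.enabled, which turns off type inference for partition columns so that they are read as strings. The sketch below assumes the same spark-shell session (the predefined spark value) and the output/test path used in the reproduction. It sidesteps the inference rather than fixing it, and it should also change id from integer to string.

{code:java}
// Assumption: run in spark-shell, where `spark` is the active SparkSession.
// Disable automatic type inference for partition columns; per the Spark SQL
// documentation, partition columns are then treated as strings.
spark.conf.set("spark.sql.sources.partitionColumnTypeInference.enabled", "false")

// Re-read the data written by the reproduction.
val inputAsStrings = spark.read.parquet("output/test")
inputAsStrings.printSchema()
inputAsStrings.show()
{code}

If typed values are needed downstream, the string column can then be parsed explicitly with a pattern that matches the actual layout of data_hour.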