[ https://issues.apache.org/jira/browse/SPARK-36861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17423402#comment-17423402 ]
Senthil Kumar edited comment on SPARK-36861 at 10/1/21, 7:24 PM:
-----------------------------------------------------------------
Yes, in Spark 3.3 the "hour" column is inferred as DateType, but I can still see the hour part in the subdirectories that were created:

===============
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.3.0-SNAPSHOT
      /_/

Using Scala version 2.12.15 (OpenJDK 64-Bit Server VM, Java 1.8.0_292)
Type in expressions to have them evaluated.
Type :help for more information.

scala> val df = Seq(("2021-01-01T00", 0), ("2021-01-01T01", 1), ("2021-01-01T02", 2)).toDF("hour", "i")
df: org.apache.spark.sql.DataFrame = [hour: string, i: int]

scala> df.write.partitionBy("hour").parquet("/tmp/t1")

scala> spark.read.parquet("/tmp/t1").schema
res1: org.apache.spark.sql.types.StructType = StructType(StructField(i,IntegerType,true), StructField(hour,DateType,true))
===============

and the subdirectories created are:

===============
ls -l
total 0
-rw-r--r--  1 senthilkumar  wheel    0 Oct  2 00:44 _SUCCESS
drwxr-xr-x  4 senthilkumar  wheel  128 Oct  2 00:44 hour=2021-01-01T00
drwxr-xr-x  4 senthilkumar  wheel  128 Oct  2 00:44 hour=2021-01-01T01
drwxr-xr-x  4 senthilkumar  wheel  128 Oct  2 00:44 hour=2021-01-01T02
===============

It would be helpful if you could share the list of subdirectories created in your case.
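As a possible workaround (a sketch only, not verified against the 3.3.0-SNAPSHOT build above), partition column type inference can be disabled so that partition values such as "2021-01-01T00" stay strings instead of being parsed as dates; alternatively an explicit read schema sidesteps inference entirely. Both use the existing `spark.sql.sources.partitionColumnTypeInference.enabled` setting and `DataFrameReader.schema`:

```scala
// Sketch: disable partition column type inference before reading,
// so the "hour" partition column is kept as StringType.
spark.conf.set("spark.sql.sources.partitionColumnTypeInference.enabled", "false")
val dfNoInference = spark.read.parquet("/tmp/t1")
dfNoInference.printSchema()

// Alternative sketch: pin the schema explicitly on read,
// which avoids partition value inference for these columns.
val dfExplicit = spark.read.schema("i INT, hour STRING").parquet("/tmp/t1")
dfExplicit.printSchema()
```

With either approach the "hour" values should round-trip unchanged from the `hour=2021-01-01T00` style directory names.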
> Partition columns are overly eagerly parsed as dates
> ----------------------------------------------------
>
>                 Key: SPARK-36861
>                 URL: https://issues.apache.org/jira/browse/SPARK-36861
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: Tanel Kiis
>            Priority: Blocker
>
> I have an input directory with subdirs:
>  * hour=2021-01-01T00
>  * hour=2021-01-01T01
>  * hour=2021-01-01T02
>  * ...
>
> In Spark 3.1 the 'hour' column is parsed as a string type, but in the 3.2 RC it is parsed as a date type and the hour part is lost.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)