The relevant earlier discussion is here: https://github.com/apache/spark/pull/25678#issuecomment-531585556.
(FWIW, a recent PR tried adding this again: https://github.com/apache/spark/pull/28858.)

On Wed, Jun 24, 2020 at 10:01 PM Rylan Dmello <rdme...@mathworks.com> wrote:

> Hello,
>
> Tahsin and I are trying to use the Apache Parquet file format with Spark
> SQL, but we are running into errors when reading Parquet files that
> contain TimeType columns. We're wondering whether this is unsupported in
> Spark SQL due to an architectural limitation or due to a lack of
> resources.
>
> Context: when reading some Parquet files with Spark, we get an error
> message like the following:
>
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0
> in stage 186.0 failed 4 times, most recent failure: Lost task 0.3 in
> stage 186.0 (TID 1970, 10.155.249.249, executor 1): java.io.IOException:
> Could not read or convert schema for file:
> dbfs:/test/randomdata/sample001.parquet
> ...
> Caused by: org.apache.spark.sql.AnalysisException: Illegal Parquet type:
> INT64 (TIME_MICROS);
> at org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.illegalType$1(ParquetSchemaConverter.scala:106)
>
> This only seems to occur with Parquet files that have a column of the
> "TimeType" (or the deprecated "TIME_MILLIS"/"TIME_MICROS") type. After
> digging into this a bit, we think that the error message is coming from
> "ParquetSchemaConverter.scala" here:
> https://github.com/apache/spark/blob/11d3a744e20fe403dd76e18d57963b6090a7c581/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala#L151
> https://github.com/apache/spark/blob/11d3a744e20fe403dd76e18d57963b6090a7c581/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala#L140
>
> This seems to imply that the Spark SQL engine does not support reading
> Parquet files with TimeType columns.
>
> We are wondering if anyone on the mailing list could shed some more light
> on this: are there architectural/datatype limitations in Spark that
> result in this error, or is TimeType support for Parquet files something
> that hasn't been implemented yet due to a lack of resources/interest?
>
> Thanks,
> Rylan

--
Bart Samwel
bart.sam...@databricks.com
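
For reference, a minimal sketch of a read that hits this error path,
assuming a Parquet file whose schema contains an INT64 column annotated
with TIME_MICROS (the path is copied from the stack trace above and is
otherwise hypothetical; any such file should trigger the same failure):

    // Hypothetical minimal repro: any Parquet file with an INT64 column
    // annotated as TIME_MICROS (or TIME_MILLIS) hits the same
    // schema-conversion error.
    import org.apache.spark.sql.SparkSession

    object TimeTypeRepro {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("parquet-time-type-repro")
          .master("local[*]")
          .getOrCreate()

        // Fails while converting the Parquet schema to a Spark SQL schema:
        //   org.apache.spark.sql.AnalysisException:
        //     Illegal Parquet type: INT64 (TIME_MICROS);
        val df = spark.read.parquet("dbfs:/test/randomdata/sample001.parquet")
        df.printSchema()

        spark.stop()
      }
    }

Note that the exception is raised while converting the file's Parquet
schema to a Spark SQL schema (matching the "Could not read or convert
schema for file" wrapper in the trace above), not while decoding row data,
which is consistent with the illegalType call in ParquetSchemaConverter.scala.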