Marius Soutier created SPARK-6648:
-------------------------------------

             Summary: Reading Parquet files with different sub-files doesn't work
                 Key: SPARK-6648
                 URL: https://issues.apache.org/jira/browse/SPARK-6648
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 1.2.1
            Reporter: Marius Soutier

When reading multiple Parquet files at once (via sqlContext.parquetFile(/path/1.parquet,/path/2.parquet)), the read fails if the files were written with different coalesce values:

ERROR c.w.r.websocket.ParquetReader efault-dispatcher-63 : Failed reading parquet file
java.lang.IllegalArgumentException: Could not find Parquet metadata at path <path>
        at org.apache.spark.sql.parquet.ParquetTypesConverter$$anonfun$readMetaData$4.apply(ParquetTypes.scala:459) ~[org.apache.spark.spark-sql_2.10-1.2.1.jar:1.2.1]
        at org.apache.spark.sql.parquet.ParquetTypesConverter$$anonfun$readMetaData$4.apply(ParquetTypes.scala:459) ~[org.apache.spark.spark-sql_2.10-1.2.1.jar:1.2.1]
        at scala.Option.getOrElse(Option.scala:120) ~[org.scala-lang.scala-library-2.10.4.jar:na]
        at org.apache.spark.sql.parquet.ParquetTypesConverter$.readMetaData(ParquetTypes.scala:458) ~[org.apache.spark.spark-sql_2.10-1.2.1.jar:1.2.1]
        at org.apache.spark.sql.parquet.ParquetTypesConverter$.readSchemaFromFile(ParquetTypes.scala:477) ~[org.apache.spark.spark-sql_2.10-1.2.1.jar:1.2.1]
        at org.apache.spark.sql.parquet.ParquetRelation.<init>(ParquetRelation.scala:65) ~[org.apache.spark.spark-sql_2.10-1.2.1.jar:1.2.1]
        at org.apache.spark.sql.SQLContext.parquetFile(SQLContext.scala:165) ~[org.apache.spark.spark-sql_2.10-1.2.1.jar:1.2.1]

I haven't tested with Spark 1.3 yet, but I will report back after upgrading to 1.3.1 (as soon as it's released).