[ https://issues.apache.org/jira/browse/SPARK-6648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Marius Soutier updated SPARK-6648:
----------------------------------
    Description:
When reading from multiple Parquet files (via sqlContext.parquetFile(/path/1.parquet,/path/2.parquet)) while one of the Parquet files is being overwritten using a different coalesce (e.g. one contains only part-r-1.parquet, while the other also contains part-r-2.parquet and part-r-3.parquet), the read fails with:

ERROR c.w.r.websocket.ParquetReader efault-dispatcher-63 : Failed reading parquet file
java.lang.IllegalArgumentException: Could not find Parquet metadata at path <path>
    at org.apache.spark.sql.parquet.ParquetTypesConverter$$anonfun$readMetaData$4.apply(ParquetTypes.scala:459) ~[org.apache.spark.spark-sql_2.10-1.2.1.jar:1.2.1]
    at org.apache.spark.sql.parquet.ParquetTypesConverter$$anonfun$readMetaData$4.apply(ParquetTypes.scala:459) ~[org.apache.spark.spark-sql_2.10-1.2.1.jar:1.2.1]
    at scala.Option.getOrElse(Option.scala:120) ~[org.scala-lang.scala-library-2.10.4.jar:na]
    at org.apache.spark.sql.parquet.ParquetTypesConverter$.readMetaData(ParquetTypes.scala:458) ~[org.apache.spark.spark-sql_2.10-1.2.1.jar:1.2.1]
    at org.apache.spark.sql.parquet.ParquetTypesConverter$.readSchemaFromFile(ParquetTypes.scala:477) ~[org.apache.spark.spark-sql_2.10-1.2.1.jar:1.2.1]
    at org.apache.spark.sql.parquet.ParquetRelation.<init>(ParquetRelation.scala:65) ~[org.apache.spark.spark-sql_2.10-1.2.1.jar:1.2.1]
    at org.apache.spark.sql.SQLContext.parquetFile(SQLContext.scala:165) ~[org.apache.spark.spark-sql_2.10-1.2.1.jar:1.2.1]

I haven't tested with Spark 1.3 yet but will report back after upgrading to 1.3.1 (as soon as it's released).

  was:
When reading from multiple Parquet files (via sqlContext.parquetFile(/path/1.parquet,/path/2.parquet)), if the Parquet files were created using a different coalesce (e.g. one contains only part-r-1.parquet, while the other also contains part-r-2.parquet and part-r-3.parquet), the read fails with:

ERROR c.w.r.websocket.ParquetReader efault-dispatcher-63 : Failed reading parquet file
java.lang.IllegalArgumentException: Could not find Parquet metadata at path <path>
    at org.apache.spark.sql.parquet.ParquetTypesConverter$$anonfun$readMetaData$4.apply(ParquetTypes.scala:459) ~[org.apache.spark.spark-sql_2.10-1.2.1.jar:1.2.1]
    at org.apache.spark.sql.parquet.ParquetTypesConverter$$anonfun$readMetaData$4.apply(ParquetTypes.scala:459) ~[org.apache.spark.spark-sql_2.10-1.2.1.jar:1.2.1]
    at scala.Option.getOrElse(Option.scala:120) ~[org.scala-lang.scala-library-2.10.4.jar:na]
    at org.apache.spark.sql.parquet.ParquetTypesConverter$.readMetaData(ParquetTypes.scala:458) ~[org.apache.spark.spark-sql_2.10-1.2.1.jar:1.2.1]
    at org.apache.spark.sql.parquet.ParquetTypesConverter$.readSchemaFromFile(ParquetTypes.scala:477) ~[org.apache.spark.spark-sql_2.10-1.2.1.jar:1.2.1]
    at org.apache.spark.sql.parquet.ParquetRelation.<init>(ParquetRelation.scala:65) ~[org.apache.spark.spark-sql_2.10-1.2.1.jar:1.2.1]
    at org.apache.spark.sql.SQLContext.parquetFile(SQLContext.scala:165) ~[org.apache.spark.spark-sql_2.10-1.2.1.jar:1.2.1]

I haven't tested with Spark 1.3 yet but will report back after upgrading to 1.3.1 (as soon as it's released).

> Reading Parquet files with different sub-files doesn't work
> -----------------------------------------------------------
>
>                 Key: SPARK-6648
>                 URL: https://issues.apache.org/jira/browse/SPARK-6648
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.2.1
>            Reporter: Marius Soutier

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
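
One way to picture the failure described above is as a race between the reader and a non-atomic overwrite. The sketch below is a filesystem-only simulation in Python, not Spark code. It assumes (this is not confirmed by the report) that an overwrite first deletes the old part files and then writes the new ones, and that the reader needs at least a _metadata file or a part file in the directory. The helper names clear, populate, find_metadata and demo are hypothetical and exist only for illustration.

```python
import os
import shutil
import tempfile

METADATA_FILE = "_metadata"  # assumed name of the summary file in a Parquet directory


def clear(path):
    """First half of the assumed non-atomic overwrite: drop all old files."""
    shutil.rmtree(path, ignore_errors=True)
    os.makedirs(path)


def populate(path, num_parts):
    """Second half of the overwrite: write the new part files and the summary file."""
    for i in range(1, num_parts + 1):
        open(os.path.join(path, f"part-r-{i}.parquet"), "w").close()
    open(os.path.join(path, METADATA_FILE), "w").close()


def find_metadata(path):
    """Hypothetical reader-side lookup: needs a summary file or a visible part file."""
    names = os.listdir(path) if os.path.isdir(path) else []
    candidates = [
        n for n in names
        if n == METADATA_FILE or not n.startswith(("_", "."))
    ]
    if not candidates:
        raise ValueError(f"Could not find Parquet metadata at path {path}")
    return sorted(candidates)


def demo():
    """Read before, in the middle of, and after an overwrite with a different coalesce."""
    root = tempfile.mkdtemp()
    target = os.path.join(root, "2.parquet")
    try:
        clear(target)
        populate(target, 1)            # original output: one part file
        before = find_metadata(target)

        clear(target)                  # overwrite started, new files not written yet
        try:
            find_metadata(target)
            during = "ok"
        except ValueError:
            during = "failed"

        populate(target, 3)            # overwrite finished: three part files
        after = find_metadata(target)
        return before, during, after
    finally:
        shutil.rmtree(root, ignore_errors=True)


if __name__ == "__main__":
    print(demo())
```

In this simulation the read succeeds before and after the overwrite and fails only inside the window between the delete and the rewrite. Whether Spark 1.2.1 fails for exactly this reason is an assumption that would need to be checked against the actual writer behaviour.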