Marius Soutier created SPARK-6648:
-------------------------------------

             Summary: Reading Parquet files with different sub-files doesn't work
                 Key: SPARK-6648
                 URL: https://issues.apache.org/jira/browse/SPARK-6648
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 1.2.1
            Reporter: Marius Soutier


When reading from multiple Parquet files via 
sqlContext.parquetFile("/path/1.parquet,/path/2.parquet"), if the Parquet files 
were created using a different coalesce (i.e. a different number of partitions, 
and hence a different sub-file layout), reading fails with:

ERROR c.w.r.websocket.ParquetReader default-dispatcher-63 : Failed reading parquet file
java.lang.IllegalArgumentException: Could not find Parquet metadata at path <path>
        at org.apache.spark.sql.parquet.ParquetTypesConverter$$anonfun$readMetaData$4.apply(ParquetTypes.scala:459) ~[org.apache.spark.spark-sql_2.10-1.2.1.jar:1.2.1]
        at org.apache.spark.sql.parquet.ParquetTypesConverter$$anonfun$readMetaData$4.apply(ParquetTypes.scala:459) ~[org.apache.spark.spark-sql_2.10-1.2.1.jar:1.2.1]
        at scala.Option.getOrElse(Option.scala:120) ~[org.scala-lang.scala-library-2.10.4.jar:na]
        at org.apache.spark.sql.parquet.ParquetTypesConverter$.readMetaData(ParquetTypes.scala:458) ~[org.apache.spark.spark-sql_2.10-1.2.1.jar:1.2.1]
        at org.apache.spark.sql.parquet.ParquetTypesConverter$.readSchemaFromFile(ParquetTypes.scala:477) ~[org.apache.spark.spark-sql_2.10-1.2.1.jar:1.2.1]
        at org.apache.spark.sql.parquet.ParquetRelation.<init>(ParquetRelation.scala:65) ~[org.apache.spark.spark-sql_2.10-1.2.1.jar:1.2.1]
        at org.apache.spark.sql.SQLContext.parquetFile(SQLContext.scala:165) ~[org.apache.spark.spark-sql_2.10-1.2.1.jar:1.2.1]
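A minimal reproduction sketch (hypothetical paths and names, assuming the Spark 1.2.1 API where SQLContext.parquetFile takes a single path string and SchemaRDD provides saveAsParquetFile — not verified against this exact failure):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Top-level case class so Spark's reflection-based schema inference works.
case class Record(id: Int, value: String)

object Spark6648Repro {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("SPARK-6648-repro").setMaster("local[2]"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.createSchemaRDD // implicit RDD[Record] -> SchemaRDD

    val rdd = sc.parallelize((1 to 100).map(i => Record(i, s"v$i")))

    // Write the same data with different coalesce values, producing
    // Parquet directories with a different sub-file layout.
    rdd.coalesce(1).saveAsParquetFile("/tmp/repro/1.parquet")
    rdd.coalesce(4).saveAsParquetFile("/tmp/repro/2.parquet")

    // Reading both together fails with
    // IllegalArgumentException: Could not find Parquet metadata at path ...
    sqlContext.parquetFile("/tmp/repro/1.parquet,/tmp/repro/2.parquet").count()
  }
}
```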

I haven't tested with Spark 1.3 yet, but will report back after upgrading to 
1.3.1 (as soon as it's released).




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
