Hi, I am using spark-sql 1.0.1 to load parquet files generated from method described in:
https://gist.github.com/massie/7224868 When I try to submit a select query with columns of type fixed length byte array, the following error pops up: 14/07/14 11:09:14 INFO scheduler.DAGScheduler: Failed to run take at basicOperators.scala:100 org.apache.spark.SparkDriverExecutionException: Execution error at org.apache.spark.scheduler.DAGScheduler.runLocallyWithinThread(DAGScheduler.scala:581) at org.apache.spark.scheduler.DAGScheduler$$anon$1.run(DAGScheduler.scala:559) Caused by: parquet.io.ParquetDecodingException: Can not read value at 0 in block -1 in file s3n://foo/bar/part-r-00000.snappy.parquet at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:177) at parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:130) at org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:122) at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327) at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:388) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327) at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:308) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48) at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103) at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47) at scala.collection.TraversableOnce$class.to (TraversableOnce.scala:273) at scala.collection.AbstractIterator.to(Iterator.scala:1157) at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265) at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157) at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252) at scala.collection.AbstractIterator.toArray(Iterator.scala:1157) at org.apache.spark.rdd.RDD$$anonfun$27.apply(RDD.scala:989) at org.apache.spark.rdd.RDD$$anonfun$27.apply(RDD.scala:989) at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1083) at org.apache.spark.scheduler.DAGScheduler.runLocallyWithinThread(DAGScheduler.scala:574) ... 1 more Caused by: java.lang.ClassCastException: Expected instance of primitive converter but got "org.apache.spark.sql.parquet.CatalystNativeArrayConverter" at parquet.io.api.Converter.asPrimitiveConverter(Converter.java:30) at parquet.io.RecordReaderImplementation.<init>(RecordReaderImplementation.java:264) at parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:60) at parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:74) at parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:110) at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:172) ... 24 more Is fixed length byte array supposed to work in this version? I noticed that other array types like int or string already work. Thanks, -- Pei-Lun