Hi,

I am using spark-sql 1.0.1 to load parquet files generated using the method
described in:

https://gist.github.com/massie/7224868
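
For context, the writer side looks roughly like this. This is only a minimal
sketch assuming the gist's parquet-avro approach (AvroParquetWriter); the
schema, the "checksum" field name, and the output path are placeholders I've
substituted:

import org.apache.avro.Schema
import org.apache.avro.generic.{GenericData, GenericRecord}
import org.apache.hadoop.fs.Path
import parquet.avro.AvroParquetWriter

// Placeholder Avro schema with a single 16-byte fixed-length field.
val schemaJson = """
  {"type": "record", "name": "Foo", "fields": [
    {"name": "checksum",
     "type": {"type": "fixed", "name": "MD5", "size": 16}}
  ]}
"""
val schema = new Schema.Parser().parse(schemaJson)

// Write a single record; the real job writes many records (snappy-compressed).
val writer = new AvroParquetWriter[GenericRecord](new Path("/tmp/foo.parquet"), schema)
val record = new GenericData.Record(schema)
record.put("checksum",
  new GenericData.Fixed(schema.getField("checksum").schema(), new Array[Byte](16)))
writer.write(record)
writer.close()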


When I try to submit a select query on a column of fixed-length byte array
type, the error shown below pops up.
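
What I'm running is roughly the following (a minimal sketch from spark-shell
on 1.0.1; the table name "foo" and the column name "checksum" are placeholders
I've substituted):

import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)  // sc is the SparkContext from spark-shell
val parquetFile = sqlContext.parquetFile("s3n://foo/bar")
parquetFile.registerAsTable("foo")

// Selecting the fixed-length byte array column triggers the trace below.
sqlContext.sql("SELECT checksum FROM foo").take(10)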


14/07/14 11:09:14 INFO scheduler.DAGScheduler: Failed to run take at basicOperators.scala:100
org.apache.spark.SparkDriverExecutionException: Execution error
        at org.apache.spark.scheduler.DAGScheduler.runLocallyWithinThread(DAGScheduler.scala:581)
        at org.apache.spark.scheduler.DAGScheduler$$anon$1.run(DAGScheduler.scala:559)
Caused by: parquet.io.ParquetDecodingException: Can not read value at 0 in block -1 in file s3n://foo/bar/part-r-00000.snappy.parquet
        at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:177)
        at parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:130)
        at org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:122)
        at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
        at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:388)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
        at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:308)
        at scala.collection.Iterator$class.foreach(Iterator.scala:727)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
        at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
        at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
        at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
        at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
        at scala.collection.AbstractIterator.to(Iterator.scala:1157)
        at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
        at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
        at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
        at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
        at org.apache.spark.rdd.RDD$$anonfun$27.apply(RDD.scala:989)
        at org.apache.spark.rdd.RDD$$anonfun$27.apply(RDD.scala:989)
        at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1083)
        at org.apache.spark.scheduler.DAGScheduler.runLocallyWithinThread(DAGScheduler.scala:574)
        ... 1 more
Caused by: java.lang.ClassCastException: Expected instance of primitive converter but got "org.apache.spark.sql.parquet.CatalystNativeArrayConverter"
        at parquet.io.api.Converter.asPrimitiveConverter(Converter.java:30)
        at parquet.io.RecordReaderImplementation.<init>(RecordReaderImplementation.java:264)
        at parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:60)
        at parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:74)
        at parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:110)
        at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:172)
        ... 24 more


Are fixed-length byte arrays supposed to work in this version? I noticed that
arrays of other types, such as int or string, already work.

Thanks,
--
Pei-Lun
