Sorry, that should be SPARK-2489
<https://issues.apache.org/jira/browse/SPARK-2489>
2014-07-15 19:22 GMT+08:00 Pei-Lun Lee <pl...@appier.com>:

> Filed SPARK-2446
>
> 2014-07-15 16:17 GMT+08:00 Michael Armbrust <mich...@databricks.com>:
>
>> Oh, maybe not. Please file another JIRA.
>>
>> On Tue, Jul 15, 2014 at 12:34 AM, Pei-Lun Lee <pl...@appier.com> wrote:
>>
>>> Hi Michael,
>>>
>>> Good to know it is being handled. I tried the master branch (9fe693b5)
>>> and got another error:
>>>
>>> scala> sqlContext.parquetFile("/tmp/foo")
>>> java.lang.RuntimeException: Unsupported parquet datatype optional fixed_len_byte_array(4) b
>>>   at scala.sys.package$.error(package.scala:27)
>>>   at org.apache.spark.sql.parquet.ParquetTypesConverter$.toPrimitiveDataType(ParquetTypes.scala:58)
>>>   at org.apache.spark.sql.parquet.ParquetTypesConverter$.toDataType(ParquetTypes.scala:109)
>>>   at org.apache.spark.sql.parquet.ParquetTypesConverter$$anonfun$convertToAttributes$1.apply(ParquetTypes.scala:282)
>>>   at org.apache.spark.sql.parquet.ParquetTypesConverter$$anonfun$convertToAttributes$1.apply(ParquetTypes.scala:279)
>>>   ...
>>>
>>> The Avro schema I used is something like:
>>>
>>> protocol Test {
>>>   fixed Bytes4(4);
>>>
>>>   record User {
>>>     string name;
>>>     int age;
>>>     union {null, int} i;
>>>     union {null, int} j;
>>>     union {null, Bytes4} b;
>>>     union {null, bytes} c;
>>>     union {null, int} d;
>>>   }
>>> }
>>>
>>> Is this case included in SPARK-2446
>>> <https://issues.apache.org/jira/browse/SPARK-2446>?
>>>
>>> 2014-07-15 3:54 GMT+08:00 Michael Armbrust <mich...@databricks.com>:
>>>
>>>> This is not supported yet, but there is a PR open to fix it:
>>>> https://issues.apache.org/jira/browse/SPARK-2446
>>>>
>>>> On Mon, Jul 14, 2014 at 4:17 AM, Pei-Lun Lee <pl...@appier.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I am using spark-sql 1.0.1 to load Parquet files generated with the
>>>>> method described in:
>>>>>
>>>>> https://gist.github.com/massie/7224868
>>>>>
>>>>> When I submit a select query on columns of type fixed-length byte
>>>>> array, the following error pops up:
>>>>>
>>>>> 14/07/14 11:09:14 INFO scheduler.DAGScheduler: Failed to run take at basicOperators.scala:100
>>>>> org.apache.spark.SparkDriverExecutionException: Execution error
>>>>>   at org.apache.spark.scheduler.DAGScheduler.runLocallyWithinThread(DAGScheduler.scala:581)
>>>>>   at org.apache.spark.scheduler.DAGScheduler$$anon$1.run(DAGScheduler.scala:559)
>>>>> Caused by: parquet.io.ParquetDecodingException: Can not read value at 0 in block -1 in file s3n://foo/bar/part-r-00000.snappy.parquet
>>>>>   at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:177)
>>>>>   at parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:130)
>>>>>   at org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:122)
>>>>>   at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
>>>>>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>>>>>   at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:388)
>>>>>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>>>>>   at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:308)
>>>>>   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>>>>>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>>>>>   at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
>>>>>   at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
>>>>>   at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
>>>>>   at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
>>>>>   at scala.collection.AbstractIterator.to(Iterator.scala:1157)
>>>>>   at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
>>>>>   at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
>>>>>   at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
>>>>>   at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
>>>>>   at org.apache.spark.rdd.RDD$$anonfun$27.apply(RDD.scala:989)
>>>>>   at org.apache.spark.rdd.RDD$$anonfun$27.apply(RDD.scala:989)
>>>>>   at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1083)
>>>>>   at org.apache.spark.scheduler.DAGScheduler.runLocallyWithinThread(DAGScheduler.scala:574)
>>>>>   ... 1 more
>>>>> Caused by: java.lang.ClassCastException: Expected instance of primitive converter but got "org.apache.spark.sql.parquet.CatalystNativeArrayConverter"
>>>>>   at parquet.io.api.Converter.asPrimitiveConverter(Converter.java:30)
>>>>>   at parquet.io.RecordReaderImplementation.<init>(RecordReaderImplementation.java:264)
>>>>>   at parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:60)
>>>>>   at parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:74)
>>>>>   at parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:110)
>>>>>   at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:172)
>>>>>   ... 24 more
>>>>>
>>>>> Is fixed-length byte array supposed to work in this version? I noticed
>>>>> that other array types like int or string already work.
>>>>>
>>>>> Thanks,
>>>>> --
>>>>> Pei-Lun
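
A possible workaround until the fix lands: avoid the Avro fixed type and
declare the column as variable-length bytes, which parquet-avro writes as
optional binary rather than optional fixed_len_byte_array(4). Below is a
minimal sketch, assuming parquet-avro 1.x (the old parquet.avro package seen
in the stack traces above) is on the classpath; the schema, the /tmp/foo
path, and the field values are invented for illustration, and this is not
the actual fix tracked in SPARK-2446 / SPARK-2489.

import java.nio.ByteBuffer

import org.apache.avro.Schema
import org.apache.avro.generic.{GenericData, GenericRecord}
import org.apache.hadoop.fs.Path
import parquet.avro.AvroParquetWriter

// Same shape as the "User" record in the thread, but the 4-byte field "b"
// is declared as nullable variable-length bytes instead of fixed(4).
val schema = new Schema.Parser().parse("""
  {"type": "record", "name": "User", "fields": [
    {"name": "name", "type": "string"},
    {"name": "age",  "type": "int"},
    {"name": "b",    "type": ["null", "bytes"], "default": null}
  ]}
""")

// Write one record; parquet-avro maps ["null", "bytes"] to an
// "optional binary" Parquet column.
val writer = new AvroParquetWriter[GenericRecord](new Path("/tmp/foo"), schema)
val user = new GenericData.Record(schema)
user.put("name", "test")
user.put("age", 42)
user.put("b", ByteBuffer.wrap(Array[Byte](1, 2, 3, 4))) // was fixed(4)
writer.write(user)
writer.close()

// Reading this back in the spark-shell should no longer hit the
// fixed_len_byte_array converter:
//   scala> sqlContext.parquetFile("/tmp/foo").collect()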