Actually, the stack trace looks different. In my case there seems to be a bad entry in the Parquet file (although I can write it successfully): at some row group, in some page, entry 19072 out of 36318 in that currentPage cannot be read.
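To narrow down which row is bad, something like the following can read the file sequentially until decoding fails. This is an untested sketch against the old parquet-mr 1.6.x API (parquet.* package names, the deprecated-later ParquetReader constructor); FindBadRecord is just a hypothetical name:

    import org.apache.hadoop.fs.Path;
    import parquet.example.data.Group;
    import parquet.hadoop.ParquetReader;
    import parquet.hadoop.example.GroupReadSupport;

    public class FindBadRecord {
        public static void main(String[] args) throws Exception {
            // Read the suspect file row by row with the generic Group model
            ParquetReader<Group> reader =
                new ParquetReader<Group>(new Path(args[0]), new GroupReadSupport());
            long row = 0;
            try {
                while (reader.read() != null) {
                    row++;  // rows before the corrupt entry read fine
                }
                System.out.println("read " + row + " rows, no error");
            } catch (Exception e) {
                // ParquetDecodingException (a RuntimeException) fires here
                System.err.println("decoding failed at row " + row + ": " + e);
            } finally {
                reader.close();
            }
        }
    }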
On Sat, Nov 22, 2014 at 5:48 PM, Cheng Lian <[email protected]> wrote:

> The problem mentioned in [this thread] [1] looks similar to yours.
>
> [1]: http://apache-spark-user-list.1001560.n3.nabble.com/SparkSQL-exception-on-cached-parquet-table-tt18978.html#a19020
>
>
> On 11/23/14 4:22 AM, Tongjie Chen wrote:
>
>> Hi,
>>
>> Does anyone find the following message familiar? It seems like a data
>> corruption issue, but when we wrote that Parquet file, it did not have
>> any error. We are using Parquet version 1.6.0rc3.
>>
>> Thanks,
>>
>> Tongjie
>>
>>
>> 2014-11-22 18:55:28,970 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.io.IOException: java.io.IOException: parquet.io.ParquetDecodingException: Can not read value at 511538 in block 0 in file s3n://..../dateint=20141122/hour=16/batchid=merged_20141122T171928_1/542f393b-57f8-441b-8591-2c0169f15d14_000072
>>     at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
>>     at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
>>     at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:302)
>>     at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:218)
>>     at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:199)
>>     at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:185)
>>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52)
>>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:432)
>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
>>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at javax.security.auth.Subject.doAs(Subject.java:415)
>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
>> Caused by: java.io.IOException: parquet.io.ParquetDecodingException: Can not read value at 511538 in block 0 in file s3n://..../dateint=20141122/hour=16/batchid=merged_20141122T171928_1/542f393b-57f8-441b-8591-2c0169f15d14_000072
>>     at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
>>     at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
>>     at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:276)
>>     at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:101)
>>     at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:41)
>>     at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:108)
>>     at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:300)
>>     ... 11 more
>> Caused by: parquet.io.ParquetDecodingException: Can not read value at 511538 in block 0 in file s3n://..../dateint=20141122/hour=16/batchid=merged_20141122T171928_1/542f393b-57f8-441b-8591-2c0169f15d14_000072
>>     at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:213)
>>     at parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:204)
>>     at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.next(ParquetRecordReaderWrapper.java:157)
>>     at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.next(ParquetRecordReaderWrapper.java:45)
>>     at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:274)
>>     ... 15 more
>> Caused by: parquet.io.ParquetDecodingException: Can't read value in column [other_properties, map, value] BINARY at value 20433392 out of 27896945, 19072 out of 36318 in currentPage. repetition level: 1, definition level: 3
>>     at parquet.column.impl.ColumnReaderImpl.readValue(ColumnReaderImpl.java:450)
>>     at parquet.column.impl.ColumnReaderImpl.writeCurrentValueToConverter(ColumnReaderImpl.java:352)
>>     at parquet.io.RecordReaderImplementation.read(RecordReaderImplementation.java:402)
>>     at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:194)
>>     ... 19 more
>> Caused by: parquet.io.ParquetDecodingException: could not read bytes at offset 1599090621
>>     at parquet.column.values.plain.BinaryPlainValuesReader.readBytes(BinaryPlainValuesReader.java:43)
>>     at parquet.column.impl.ColumnReaderImpl$2$6.read(ColumnReaderImpl.java:295)
>>     at parquet.column.impl.ColumnReaderImpl.readValue(ColumnReaderImpl.java:446)
>>     ... 22 more
>> Caused by: java.lang.ArrayIndexOutOfBoundsException: 1599090621
>>     at parquet.bytes.BytesUtils.readIntLittleEndian(BytesUtils.java:54)
>>     at parquet.column.values.plain.BinaryPlainValuesReader.readBytes(BinaryPlainValuesReader.java:36)
>>     ... 24 more
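For context on that last frame: in Parquet's PLAIN encoding, each BINARY value is stored as a 4-byte little-endian length prefix followed by that many bytes. If the length bytes are corrupt (or the decoder is misaligned within the page), the decoded "length" is garbage, here 1599090621, and indexing that far past the page buffer is exactly what surfaces as the ArrayIndexOutOfBoundsException above. Roughly what that length read does (a sketch of the idea, not the actual parquet.bytes.BytesUtils source):

    // PLAIN-encoded BINARY layout: [4-byte little-endian length][length bytes].
    // Four arbitrary bytes decode to an arbitrary int, so a bad length can
    // point gigabytes past the end of the in-memory page.
    static int readIntLittleEndian(byte[] in, int offset) {
        return (in[offset]     & 0xFF)
             | (in[offset + 1] & 0xFF) << 8
             | (in[offset + 2] & 0xFF) << 16
             | (in[offset + 3] & 0xFF) << 24;
    }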
