The problem described in [this thread][1] looks similar to yours.
[1]: http://apache-spark-user-list.1001560.n3.nabble.com/SparkSQL-exception-on-cached-parquet-table-tt18978.html#a19020
On 11/23/14 4:22 AM, Tongjie Chen wrote:
Hi,
Does anyone find the following message familiar? It looks like a data
corruption issue, but when we wrote that Parquet file there were no
errors. We are using Parquet version 1.6.0rc3.
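Before digging into reader bugs, it can help to rule out gross file corruption. Below is a minimal sketch in stdlib Python (not part of any Parquet tooling; `quick_parquet_check` is a hypothetical helper name) that only checks the file's structural envelope: the leading and trailing `PAR1` magic bytes and the 4-byte little-endian footer length stored just before the trailing magic. It cannot prove page data is intact, but it catches truncated or blatantly mangled files cheaply.

```python
import struct

MAGIC = b"PAR1"

def quick_parquet_check(path):
    """Cheap structural sanity check of a Parquet file.

    Verifies the leading/trailing PAR1 magic and that the
    little-endian footer length fits inside the file. This does NOT
    validate page contents -- it only rules out truncation and
    gross corruption.
    """
    with open(path, "rb") as f:
        f.seek(0, 2)
        size = f.tell()
        if size < 12:  # magic + footer length + magic
            return False
        f.seek(0)
        if f.read(4) != MAGIC:
            return False
        f.seek(size - 8)
        tail = f.read(8)  # 4-byte footer length, then magic
        if tail[4:] != MAGIC:
            return False
        (footer_len,) = struct.unpack("<i", tail[:4])
        # The footer must fit between the two magic markers.
        return 0 < footer_len <= size - 12
```

A file that passes this check can still have a corrupted page, as in the trace below, but a failure here points at truncation during the S3 upload rather than a decoder problem.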
Thanks,
Tongjie
2014-11-22 18:55:28,970 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.io.IOException: java.io.IOException: parquet.io.ParquetDecodingException: Can not read value at 511538 in block 0 in file s3n://..../dateint=20141122/hour=16/batchid=merged_20141122T171928_1/542f393b-57f8-441b-8591-2c0169f15d14_000072
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:302)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:218)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:199)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:185)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:432)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.io.IOException: parquet.io.ParquetDecodingException: Can not read value at 511538 in block 0 in file s3n://..../dateint=20141122/hour=16/batchid=merged_20141122T171928_1/542f393b-57f8-441b-8591-2c0169f15d14_000072
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
    at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:276)
    at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:101)
    at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:41)
    at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:108)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:300)
    ... 11 more
Caused by: parquet.io.ParquetDecodingException: Can not read value at 511538 in block 0 in file s3n://..../dateint=20141122/hour=16/batchid=merged_20141122T171928_1/542f393b-57f8-441b-8591-2c0169f15d14_000072
    at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:213)
    at parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:204)
    at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.next(ParquetRecordReaderWrapper.java:157)
    at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.next(ParquetRecordReaderWrapper.java:45)
    at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:274)
    ... 15 more
Caused by: parquet.io.ParquetDecodingException: Can't read value in column [other_properties, map, value] BINARY at value 20433392 out of 27896945, 19072 out of 36318 in currentPage. repetition level: 1, definition level: 3
    at parquet.column.impl.ColumnReaderImpl.readValue(ColumnReaderImpl.java:450)
    at parquet.column.impl.ColumnReaderImpl.writeCurrentValueToConverter(ColumnReaderImpl.java:352)
    at parquet.io.RecordReaderImplementation.read(RecordReaderImplementation.java:402)
    at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:194)
    ... 19 more
Caused by: parquet.io.ParquetDecodingException: could not read bytes at offset 1599090621
    at parquet.column.values.plain.BinaryPlainValuesReader.readBytes(BinaryPlainValuesReader.java:43)
    at parquet.column.impl.ColumnReaderImpl$2$6.read(ColumnReaderImpl.java:295)
    at parquet.column.impl.ColumnReaderImpl.readValue(ColumnReaderImpl.java:446)
    ... 22 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1599090621
    at parquet.bytes.BytesUtils.readIntLittleEndian(BytesUtils.java:54)
    at parquet.column.values.plain.BinaryPlainValuesReader.readBytes(BinaryPlainValuesReader.java:36)
    ... 24 more
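For context on what the innermost frames mean: in Parquet's PLAIN encoding, each BINARY value is stored as a 4-byte little-endian length prefix followed by that many raw bytes, and `BytesUtils.readIntLittleEndian` is the read of that prefix. So a single corrupted length can send the reader to a nonsensical offset like 1599090621, which then surfaces as the `ArrayIndexOutOfBoundsException` above. A stdlib-Python sketch of that decode path (`decode_plain_binary` is an illustrative name, not Parquet API):

```python
import struct

def decode_plain_binary(page, count):
    """Decode `count` PLAIN-encoded BINARY values from a page buffer.

    Each value is a 4-byte little-endian length prefix followed by
    that many bytes -- the same layout BinaryPlainValuesReader walks.
    A corrupted prefix produces an out-of-range offset, which we
    report instead of letting the slice fail silently.
    """
    values, offset = [], 0
    for _ in range(count):
        # The length prefix: what BytesUtils.readIntLittleEndian reads.
        (length,) = struct.unpack_from("<i", page, offset)
        offset += 4
        if length < 0 or offset + length > len(page):
            raise ValueError("could not read bytes at offset %d" % offset)
        values.append(page[offset:offset + length])
        offset += length
    return values
```

This is only a model of the failure mode, not a diagnosis: it shows how one flipped length prefix is enough to explain the huge offset, whether the flip happened at write time, in transit to S3, or in the reader.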