[ https://issues.apache.org/jira/browse/PARQUET-112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156190#comment-14156190 ]

Kristoffer Sjögren commented on PARQUET-112:
--------------------------------------------

I should add that the data is written using AvroParquetFileTarget with SNAPPY
compression, and read back using AvroParquetFileSource with an
UnboundRecordFilter and includeField.
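
For reference, the setup looks roughly like the sketch below. It is not the
actual job: Event stands in for our Avro-generated record class, the paths and
the "click" literal are placeholders, and it assumes the Crunch builder API for
AvroParquetFileSource plus Target.outputConf for selecting the codec.

import org.apache.crunch.PCollection;
import org.apache.crunch.Pipeline;
import org.apache.crunch.io.parquet.AvroParquetFileSource;
import org.apache.crunch.io.parquet.AvroParquetFileTarget;
import org.apache.hadoop.fs.Path;
import parquet.column.ColumnReader;
import parquet.filter.ColumnPredicates;
import parquet.filter.ColumnRecordFilter;
import parquet.filter.RecordFilter;
import parquet.filter.UnboundRecordFilter;

// Event below stands in for our Avro-generated record class (not shown).
public class PipelineSketch {

  // Pushes a predicate on the [action] column into the reader; this is the
  // ColumnRecordFilter/ColumnPredicates path visible in the stack trace below.
  public static class ActionFilter implements UnboundRecordFilter {
    @Override
    public RecordFilter bind(Iterable<ColumnReader> readers) {
      return ColumnRecordFilter.column("action",
          ColumnPredicates.equalTo("click")).bind(readers);
    }
  }

  // Write side: Parquet output with SNAPPY-compressed pages.
  static void write(Pipeline pipeline, PCollection<Event> events, String out) {
    pipeline.write(events, new AvroParquetFileTarget(new Path(out))
        .outputConf("parquet.compression", "SNAPPY"));
  }

  // Read side: project only the "action" field and push the filter down.
  static PCollection<Event> read(Pipeline pipeline, String in) {
    return pipeline.read(AvroParquetFileSource.builder(Event.class)
        .includeField("action")
        .filterClass(ActionFilter.class)
        .build(new Path(in)));
  }
}

The read side is where the failure occurs: skipToMatch drives the filtered
reader through the [action] column until the decoder runs out of page data.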

> RunLengthBitPackingHybridDecoder: Reading past RLE/BitPacking stream.
> ---------------------------------------------------------------------
>
>                 Key: PARQUET-112
>                 URL: https://issues.apache.org/jira/browse/PARQUET-112
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-mr
>         Environment: Java 1.7 Linux Debian
>            Reporter: Kristoffer Sjögren
>
> I am using Avro and Crunch 0.11 to write data into Hadoop CDH 4.6 in Parquet
> format. This works fine for a few gigabytes but blows up in the
> RunLengthBitPackingHybridDecoder when reading a few thousand gigabytes.
> parquet.io.ParquetDecodingException: Can not read value at 19453 in block 0 in file hdfs://nn-ix01.se-ix.delta.prod:8020/user/stoffe/parquet/dogfight/2014/09/29/part-m-00153.snappy.parquet
>       at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:177)
>       at parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:130)
>       at org.apache.crunch.impl.mr.run.CrunchRecordReader.nextKeyValue(CrunchRecordReader.java:157)
>       at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:483)
>       at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:76)
>       at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:85)
>       at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:139)
>       at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
>       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
>       at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:396)
>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>       at org.apache.hadoop.mapred.Child.main(Child.java:262)
> Caused by: parquet.io.ParquetDecodingException: Can't read value in column [action] BINARY at value 697332 out of 872236, 96921 out of 96921 in currentPage. repetition level: 0, definition level: 1
>       at parquet.column.impl.ColumnReaderImpl.readValue(ColumnReaderImpl.java:466)
>       at parquet.column.impl.ColumnReaderImpl.getBinary(ColumnReaderImpl.java:414)
>       at parquet.filter.ColumnPredicates$1.apply(ColumnPredicates.java:64)
>       at parquet.filter.ColumnRecordFilter.isMatch(ColumnRecordFilter.java:69)
>       at parquet.io.FilteredRecordReader.skipToMatch(FilteredRecordReader.java:71)
>       at parquet.io.FilteredRecordReader.read(FilteredRecordReader.java:57)
>       at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:173)
>       ... 13 more
> Caused by: java.lang.IllegalArgumentException: Reading past RLE/BitPacking stream.
>       at parquet.Preconditions.checkArgument(Preconditions.java:47)
>       at parquet.column.values.rle.RunLengthBitPackingHybridDecoder.readNext(RunLengthBitPackingHybridDecoder.java:80)
>       at parquet.column.values.rle.RunLengthBitPackingHybridDecoder.readInt(RunLengthBitPackingHybridDecoder.java:62)
>       at parquet.column.values.dictionary.DictionaryValuesReader.readBytes(DictionaryValuesReader.java:73)
>       at parquet.column.impl.ColumnReaderImpl$2$7.read(ColumnReaderImpl.java:311)
>       at parquet.column.impl.ColumnReaderImpl.readValue(ColumnReaderImpl.java:462)
>       ... 19 more
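
For context on the root cause: readNext throws precisely when it is asked for
another run after the page bytes are exhausted, i.e. the dictionary-id decoder
believes more values remain than the RLE/bit-packed stream actually holds.
Below is a minimal illustrative sketch of the hybrid framing per the Parquet
encoding spec; it is not the parquet-mr class, just the framing it decodes, and
all names are made up.

import java.io.IOException;
import java.io.InputStream;

// Sketch of the RLE/bit-packing hybrid framing consumed by
// RunLengthBitPackingHybridDecoder. Illustrative names, not parquet-mr API.
public class HybridFramingSketch {

  static void readRuns(InputStream in, int bitWidth) throws IOException {
    // The real decoder's readNext checks this before each run and raises
    // "Reading past RLE/BitPacking stream." if a caller still expects values.
    while (in.available() > 0) {
      int header = readUnsignedVarInt(in);   // LSB selects the run type
      if ((header & 1) == 0) {
        int count = header >>> 1;            // RLE run: count repeats of one value
        int value = readPaddedInt(in, bitWidth);
        System.out.printf("RLE run: %d x %d%n", count, value);
      } else {
        int groups = header >>> 1;           // bit-packed run: groups of 8 values,
        in.skip((long) groups * bitWidth);   // each group occupying bitWidth bytes
        System.out.printf("bit-packed run: %d values%n", groups * 8);
      }
    }
  }

  static int readUnsignedVarInt(InputStream in) throws IOException {
    int value = 0, shift = 0, b;
    while (((b = in.read()) & 0x80) != 0) {  // varint continuation bit
      value |= (b & 0x7F) << shift;
      shift += 7;
    }
    return value | (b << shift);
  }

  static int readPaddedInt(InputStream in, int bitWidth) throws IOException {
    int value = 0;
    for (int i = 0; i < (bitWidth + 7) / 8; i++) {
      value |= in.read() << (8 * i);         // little-endian, padded to whole bytes
    }
    return value;
  }
}

So presumably a mismatch between a page's value count and its encoded run
lengths would surface exactly as the precondition failure above.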



