Ryan Blue created PARQUET-18:
--------------------------------

             Summary: Cannot read dictionary-encoded pages with all null values
                 Key: PARQUET-18
                 URL: https://issues.apache.org/jira/browse/PARQUET-18
             Project: Parquet
          Issue Type: Bug
          Components: parquet-mr
            Reporter: Ryan Blue
            Assignee: Ryan Blue
             Fix For: 1.6.0


This is [issue #283|https://github.com/Parquet/parquet-mr/issues/283]. 
parquet-mr tries to read the bit-width byte in 
{{DictionaryValuesReader#initFromPage}} even when the incoming offset is already at 
the end of the byte array because the page contains no values (every value is null).

Here's the stack trace:

{code}
Caused by: parquet.io.ParquetDecodingException: could not read page Page [id: 1, bytes.size=7, valueCount=100, uncompressedSize=7] in col [id] INT32
        at parquet.column.impl.ColumnReaderImpl.readPage(ColumnReaderImpl.java:532)
        at parquet.column.impl.ColumnReaderImpl.checkRead(ColumnReaderImpl.java:493)
        at parquet.column.impl.ColumnReaderImpl.consume(ColumnReaderImpl.java:546)
        at parquet.column.impl.ColumnReaderImpl.<init>(ColumnReaderImpl.java:339)
        at parquet.column.impl.ColumnReadStoreImpl.newMemColumnReader(ColumnReadStoreImpl.java:63)
        at parquet.column.impl.ColumnReadStoreImpl.getColumnReader(ColumnReadStoreImpl.java:58)
        at parquet.io.RecordReaderImplementation.<init>(RecordReaderImplementation.java:265)
        at parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:60)
        at parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:74)
        at parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:112)
        at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:174)
        ... 29 more
Caused by: java.io.EOFException
        at parquet.bytes.BytesUtils.readIntLittleEndianOnOneByte(BytesUtils.java:76)
        at parquet.column.values.dictionary.DictionaryValuesReader.initFromPage(DictionaryValuesReader.java:55)
        at parquet.column.impl.ColumnReaderImpl.readPage(ColumnReaderImpl.java:530)
        ... 39 more
{code}
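
To make the failure mode concrete, here is a minimal, self-contained sketch of the pattern in the trace above. It is not the parquet-mr source and not necessarily how the fix is implemented: the class {{BitWidthGuardSketch}}, its methods, and the {{NO_VALUES}} sentinel are hypothetical names invented for illustration. It assumes a 7-byte page whose offset already sits at the end of the array, matching {{bytes.size=7}} in the exception message.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical stand-in for illustration only; not the parquet-mr source.
final class BitWidthGuardSketch {

    // Sentinel (an assumption of this sketch) meaning "no encoded values follow".
    static final int NO_VALUES = -1;

    // Reads one unsigned byte and throws EOFException at end of input,
    // which is the failure shown at the bottom of the stack trace.
    static int readOneByte(InputStream in) throws IOException {
        int b = in.read();
        if (b < 0) {
            throw new EOFException();
        }
        return b;
    }

    // Unguarded variant: always tries to read the bit-width byte at `offset`.
    static int readBitWidthUnguarded(byte[] page, int offset) throws IOException {
        InputStream in = new ByteArrayInputStream(page, offset, page.length - offset);
        return readOneByte(in);
    }

    // Guarded variant: skips the read when nothing follows the offset.
    static int readBitWidthGuarded(byte[] page, int offset) throws IOException {
        if (offset >= page.length) {
            return NO_VALUES;
        }
        return readBitWidthUnguarded(page, offset);
    }

    public static void main(String[] args) throws IOException {
        byte[] page = new byte[7]; // all 7 bytes assumed to be consumed before the values section
        int offset = page.length;  // offset is at the end: no bit-width byte to read

        System.out.println("guarded result: " + readBitWidthGuarded(page, offset));
        try {
            readBitWidthUnguarded(page, offset);
        } catch (EOFException e) {
            System.out.println("unguarded read failed with EOFException");
        }
    }
}
```

The guard is deliberately simple: it only checks whether any bytes remain before reading. A real reader would also need to decide what its value-reading methods do when called on such a page.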



--
This message was sent by Atlassian JIRA
(v6.2#6252)
