Ryan Blue created PARQUET-18:
--------------------------------
Summary: Cannot read dictionary-encoded pages with all null values
Key: PARQUET-18
URL: https://issues.apache.org/jira/browse/PARQUET-18
Project: Parquet
Issue Type: Bug
Components: parquet-mr
Reporter: Ryan Blue
Assignee: Ryan Blue
Fix For: 1.6.0
This is [issue #283|https://github.com/Parquet/parquet-mr/issues/283].
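Parquet-mr tries to read the bit-width byte in
{{DictionaryValuesReader#initFromPage}} even when the incoming offset is already
at the end of the byte array. That happens when every value in the page is null:
there are no non-null values to encode, so the values section is empty.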
Here's the stack trace:
{code}
Caused by: parquet.io.ParquetDecodingException: could not read page Page [id: 1, bytes.size=7, valueCount=100, uncompressedSize=7] in col [id] INT32
	at parquet.column.impl.ColumnReaderImpl.readPage(ColumnReaderImpl.java:532)
	at parquet.column.impl.ColumnReaderImpl.checkRead(ColumnReaderImpl.java:493)
	at parquet.column.impl.ColumnReaderImpl.consume(ColumnReaderImpl.java:546)
	at parquet.column.impl.ColumnReaderImpl.<init>(ColumnReaderImpl.java:339)
	at parquet.column.impl.ColumnReadStoreImpl.newMemColumnReader(ColumnReadStoreImpl.java:63)
	at parquet.column.impl.ColumnReadStoreImpl.getColumnReader(ColumnReadStoreImpl.java:58)
	at parquet.io.RecordReaderImplementation.<init>(RecordReaderImplementation.java:265)
	at parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:60)
	at parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:74)
	at parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:112)
	at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:174)
	... 29 more
Caused by: java.io.EOFException
	at parquet.bytes.BytesUtils.readIntLittleEndianOnOneByte(BytesUtils.java:76)
	at parquet.column.values.dictionary.DictionaryValuesReader.initFromPage(DictionaryValuesReader.java:55)
	at parquet.column.impl.ColumnReaderImpl.readPage(ColumnReaderImpl.java:530)
	... 39 more
{code}
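A minimal, self-contained Java sketch of the failure mode and one possible guard follows. It is hypothetical and does not use parquet-mr's classes: {{DictionaryPageSketch}}, {{valuesOffset}}, {{initUnguarded}} and {{initGuarded}} are illustrative names. The 7-byte page is also an assumption: a flat optional INT32 column in a v1 data page, with no repetition levels and RLE-encoded definition levels behind a 4-byte length prefix. That layout is consistent with {{bytes.size=7, valueCount=100}} in the trace above. With every value null, the definition levels consume the whole page, so the values section (assumed to start with the one-byte bit width) begins at the end of the array and an unconditional read hits EOF.

{code:java}
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.util.OptionalInt;

/**
 * Hypothetical, simplified model of the PARQUET-18 failure. This is NOT
 * parquet-mr code; the class, methods and page layout are illustrative.
 */
public class DictionaryPageSketch {

    /**
     * Assumed 7-byte v1 data page for a flat OPTIONAL INT32 column with 100 nulls:
     * no repetition levels, RLE-encoded definition levels with a 4-byte
     * little-endian length prefix, and an empty values section.
     */
    static final byte[] ALL_NULL_PAGE = {
        3, 0, 0, 0,          // length of the definition-level data: 3 bytes
        (byte) 0xC8, 0x01,   // ULEB128 varint 200 = 100 << 1: an RLE run of 100 levels
        0x00                 // repeated level 0 (null), stored in one byte (bit width 1)
    };

    /** Offset where the values section starts: after the prefixed definition levels. */
    static int valuesOffset(byte[] page) {
        int length = (page[0] & 0xFF)
                | ((page[1] & 0xFF) << 8)
                | ((page[2] & 0xFF) << 16)
                | ((page[3] & 0xFF) << 24);
        return 4 + length;
    }

    /** Reads one unsigned byte, throwing EOFException if the input is exhausted. */
    static int readOneByte(ByteArrayInputStream in) throws IOException {
        int b = in.read();
        if (b < 0) {
            throw new EOFException();
        }
        return b;
    }

    /** Buggy pattern: unconditionally reads the bit-width byte at the offset. */
    static int initUnguarded(byte[] page, int offset) throws IOException {
        ByteArrayInputStream in =
                new ByteArrayInputStream(page, offset, page.length - offset);
        return readOneByte(in);
    }

    /** Guarded pattern: an empty values section has no bit-width byte to read. */
    static OptionalInt initGuarded(byte[] page, int offset) throws IOException {
        if (offset >= page.length) {
            return OptionalInt.empty(); // all values null: nothing to decode
        }
        return OptionalInt.of(initUnguarded(page, offset));
    }

    public static void main(String[] args) throws IOException {
        int offset = valuesOffset(ALL_NULL_PAGE);
        System.out.println("values start at " + offset + " of " + ALL_NULL_PAGE.length + " bytes");
        System.out.println("guarded: " + initGuarded(ALL_NULL_PAGE, offset));
        try {
            initUnguarded(ALL_NULL_PAGE, offset);
        } catch (EOFException e) {
            System.out.println("unguarded: java.io.EOFException");
        }
    }
}
{code}

The guard returns an empty {{OptionalInt}} instead of reading past the end of the page; how the real reader should represent "no dictionary ids" is a design choice this sketch does not settle.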
--
This message was sent by Atlassian JIRA
(v6.2#6252)