Ryan Blue created PARQUET-9:
-------------------------------

             Summary: InternalParquetRecordReader will not read multiple blocks when filtering
                 Key: PARQUET-9
                 URL: https://issues.apache.org/jira/browse/PARQUET-9
             Project: Parquet
          Issue Type: Bug
          Components: parquet-mr
            Reporter: Ryan Blue


The InternalParquetRecordReader tracks the number of records it has processed 
and uses that count to decide when it is finished and when to load the next 
row group. But when it wraps a FilteredRecordReader, the count is not updated 
for records that are filtered out. As a result, once the reader exhausts the 
block it is reading, it keeps calling read() on the filtered reader and passes 
the resulting null values to the caller.
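
To make the accounting problem concrete, here is a minimal toy sketch in Java. 
It is not the parquet-mr code: ToyFilteredReader, readAll, loadedSoFar and the 
use of integer lists as "blocks" are hypothetical stand-ins, and the sketch 
assumes the reader counts one record per read() call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

// Toy model only: none of these names come from parquet-mr.
class FilteredCountBug {

    // Hypothetical filtering reader: read() skips records that fail the
    // predicate and returns null once its block is exhausted.
    static class ToyFilteredReader {
        private final Iterator<Integer> records;
        private final Predicate<Integer> keep;

        ToyFilteredReader(List<Integer> block, Predicate<Integer> keep) {
            this.records = block.iterator();
            this.keep = keep;
        }

        Integer read() {
            while (records.hasNext()) {
                Integer r = records.next();
                if (keep.test(r)) {
                    return r;
                }
            }
            return null;
        }
    }

    // Counts one record per read() call and ignores the records that the
    // filter skipped, so the count lags behind the reader's real position.
    static List<Integer> readAll(List<List<Integer>> blocks, Predicate<Integer> keep) {
        long total = 0;
        for (List<Integer> block : blocks) {
            total += block.size();
        }
        long current = 0;      // records the outer reader believes it has processed
        long loadedSoFar = 0;  // records contained in all blocks loaded so far
        int nextBlock = 0;
        ToyFilteredReader reader = null;
        List<Integer> handedToCaller = new ArrayList<>();
        while (current < total) {
            if (current == loadedSoFar) {
                // Load the next block only when the count says this one is done.
                List<Integer> block = blocks.get(nextBlock++);
                reader = new ToyFilteredReader(block, keep);
                loadedSoFar += block.size();
            }
            handedToCaller.add(reader.read()); // may be null: block already exhausted
            current++;
        }
        return handedToCaller;
    }

    public static void main(String[] args) {
        List<List<Integer>> blocks = Arrays.asList(
                Arrays.asList(1, 2, 3, 4),
                Arrays.asList(5, 6, 7, 8));
        System.out.println(readAll(blocks, r -> r % 2 == 0));
    }
}
```

With two blocks of four records and a filter that keeps even values, this toy 
prints [2, 4, null, null, 6, 8, null, null]: the nulls are the extra read() 
calls made after each block was already exhausted.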

The quick fix is to detect null values returned by the record reader and 
advance the count so that the next row group is loaded. The longer-term 
solution is to account correctly for the filtered records.
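
As a rough illustration of the quick fix, the toy sketch below (Java) treats a 
null from the reader as "this block is exhausted" and jumps the count to the 
end of the loaded data. It is a sketch under assumptions, not the change in 
the pull request: ToyFilteredReader, readAll and loadedSoFar are hypothetical 
names, and blocks are modelled as plain integer lists.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

// Toy model only: none of these names come from parquet-mr.
class QuickFixSketch {

    // Hypothetical filtering reader: read() skips records that fail the
    // predicate and returns null once its block is exhausted.
    static class ToyFilteredReader {
        private final Iterator<Integer> records;
        private final Predicate<Integer> keep;

        ToyFilteredReader(List<Integer> block, Predicate<Integer> keep) {
            this.records = block.iterator();
            this.keep = keep;
        }

        Integer read() {
            while (records.hasNext()) {
                Integer r = records.next();
                if (keep.test(r)) {
                    return r;
                }
            }
            return null;
        }
    }

    static List<Integer> readAll(List<List<Integer>> blocks, Predicate<Integer> keep) {
        long total = 0;
        for (List<Integer> block : blocks) {
            total += block.size();
        }
        long current = 0;      // records the outer reader believes it has processed
        long loadedSoFar = 0;  // records contained in all blocks loaded so far
        int nextBlock = 0;
        ToyFilteredReader reader = null;
        List<Integer> handedToCaller = new ArrayList<>();
        while (current < total) {
            if (current == loadedSoFar) {
                List<Integer> block = blocks.get(nextBlock++);
                reader = new ToyFilteredReader(block, keep);
                loadedSoFar += block.size();
            }
            Integer record = reader.read();
            if (record == null) {
                // Quick fix: the block is exhausted, so mark all of its records
                // as processed; the next iteration loads the next block.
                current = loadedSoFar;
                continue;
            }
            handedToCaller.add(record);
            current++;
        }
        return handedToCaller;
    }

    public static void main(String[] args) {
        List<List<Integer>> blocks = Arrays.asList(
                Arrays.asList(1, 2, 3, 4),
                Arrays.asList(5, 6, 7, 8));
        System.out.println(readAll(blocks, r -> r % 2 == 0));
    }
}
```

For two blocks of four records and a filter that keeps even values, this 
prints [2, 4, 6, 8]. The count is still wrong while a block is being read 
(filtered records are never counted individually), which is why this is only 
a stopgap.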

The pull request for the quick fix is 
[#9|https://github.com/apache/incubator-parquet-mr/pull/9].



--
This message was sent by Atlassian JIRA
(v6.2#6252)
