[ https://issues.apache.org/jira/browse/PARQUET-9?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tom White resolved PARQUET-9.
-----------------------------
Resolution: Fixed
Fix Version/s: 1.6.0
Assignee: Tom White
Fixed in
https://git-wip-us.apache.org/repos/asf?p=incubator-parquet-mr.git;a=commit;h=2d8ebdbe00786823658bcdd2817e6b5afee15b25
> InternalParquetRecordReader will not read multiple blocks when filtering
> ------------------------------------------------------------------------
>
> Key: PARQUET-9
> URL: https://issues.apache.org/jira/browse/PARQUET-9
> Project: Parquet
> Issue Type: Bug
> Components: parquet-mr
> Reporter: Ryan Blue
> Assignee: Tom White
> Fix For: 1.6.0
>
>
> The InternalParquetRecordReader counts the records it has processed and uses
> that count to decide when it is finished and when to load the next row group.
> But when it wraps a FilteredRecordReader, records that are filtered out are
> not counted. As a result, once the filtered reader has exhausted the current
> block (row group), InternalParquetRecordReader keeps calling read() on it and
> passes the resulting null values to the caller.
> The quick fix is to detect null values returned by the record reader and
> advance the count so that the next row group is loaded. The longer-term
> solution is to count the filtered records correctly.
> The pull request for the quick fix is
> [#9|https://github.com/apache/incubator-parquet-mr/pull/9].
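The counting problem described in the issue can be illustrated with a small, self-contained Java model. This is a sketch, not the parquet-mr code: every name in it (FilteredReadDemo, FilteringReader, readAll, Mode) is hypothetical. It assumes only what the description states. An outer loop counts records against the total row count to decide when to stop and when to load the next row group. That loop wraps a reader that silently skips filtered records and returns null once its row group is exhausted. UNFIXED reproduces the null values, QUICK_FIX applies the null check from the pull request's approach, and COUNT_FILTERED sketches one possible form of the longer-term accounting.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

public class FilteredReadDemo {

    enum Mode { UNFIXED, QUICK_FIX, COUNT_FILTERED }

    /** Hypothetical filtered reader over one row group (not the parquet-mr class). */
    static final class FilteringReader {
        private final Iterator<Integer> records;
        private final Predicate<Integer> filter;
        private long consumed = 0; // records pulled from the row group, matching or not

        FilteringReader(List<Integer> rowGroup, Predicate<Integer> filter) {
            this.records = rowGroup.iterator();
            this.filter = filter;
        }

        /** Returns the next matching record, or null once the row group is exhausted. */
        Integer read() {
            while (records.hasNext()) {
                Integer r = records.next();
                consumed++;
                if (filter.test(r)) {
                    return r;
                }
            }
            return null;
        }

        long consumed() {
            return consumed;
        }
    }

    /** Hypothetical outer reader: counts records against the total row count
     *  to decide when to stop and when to load the next row group. */
    static List<Integer> readAll(List<List<Integer>> rowGroups, Predicate<Integer> filter, Mode mode) {
        long total = 0;
        for (List<Integer> g : rowGroups) {
            total += g.size();
        }
        List<Integer> out = new ArrayList<>();
        long current = 0;    // records accounted for so far
        long loaded = 0;     // records in all row groups loaded so far
        long groupStart = 0; // value of 'loaded' before the current row group
        int nextGroup = 0;
        FilteringReader reader = null;

        while (current < total) {
            if (current == loaded) { // current row group used up: load the next one
                List<Integer> g = rowGroups.get(nextGroup++);
                reader = new FilteringReader(g, filter);
                groupStart = loaded;
                loaded += g.size();
            }
            Integer r = reader.read();
            switch (mode) {
                case UNFIXED:
                    current++;  // one per read() call; filtered records are never counted
                    out.add(r); // may hand null to the caller
                    break;
                case QUICK_FIX:
                    if (r == null) {
                        current = loaded; // null means the row group is done: jump ahead
                    } else {
                        current++;
                        out.add(r);
                    }
                    break;
                case COUNT_FILTERED:
                    current = groupStart + reader.consumed(); // includes filtered records
                    if (r != null) {
                        out.add(r);
                    }
                    break;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Integer>> groups = List.of(List.of(1, 2, 3, 4), List.of(5, 6, 7, 8));
        Predicate<Integer> even = x -> x % 2 == 0;
        for (Mode m : Mode.values()) {
            System.out.println(m + ": " + readAll(groups, even, m));
        }
    }
}
```

With two row groups of four records and an even-number filter, main prints:

UNFIXED: [2, 4, null, null, 6, 8, null, null]
QUICK_FIX: [2, 4, 6, 8]
COUNT_FILTERED: [2, 4, 6, 8]

In this model, the COUNT_FILTERED variant does not rely on null as an end-of-row-group signal. The count, which includes filtered records, alone decides when to move on.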
--
This message was sent by Atlassian JIRA
(v6.2#6252)