[
https://issues.apache.org/jira/browse/PARQUET-1765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17016923#comment-17016923
]
ASF GitHub Bot commented on PARQUET-1765:
-----------------------------------------
gszadovszky commented on pull request #747: PARQUET-1765: Invalid
filteredRowCount in InternalParquetRecordReader
URL: https://github.com/apache/parquet-mr/pull/747
----------------------------------------------------------------
> Invalid filteredRowCount in InternalParquetRecordReader
> -------------------------------------------------------
>
> Key: PARQUET-1765
> URL: https://issues.apache.org/jira/browse/PARQUET-1765
> Project: Parquet
> Issue Type: Bug
> Components: parquet-mr
> Affects Versions: 1.11.0
> Reporter: Gabor Szadovszky
> Assignee: Gabor Szadovszky
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.11.1
>
>
> The [record count|https://github.com/apache/parquet-mr/blob/apache-parquet-1.11.0/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/InternalParquetRecordReader.java#L185]
> is retrieved before setting the [projection schema|https://github.com/apache/parquet-mr/blob/apache-parquet-1.11.0/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/InternalParquetRecordReader.java#L188]
> so the value might be invalid if the projection impacts the filter.
> In normal cases this does not cause any issue, because the record filter still
> filters the records correctly; the only cost is that the records are filtered
> one by one instead of the related pages being dropped.
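
The ordering problem described above can be illustrated with a small, self-contained sketch. It does not use parquet-mr: ToyReader, its page layout, and the pruning rule are hypothetical, and the method names are only chosen to resemble the two calls the description links to. The assumption made here purely for illustration is that page-level pruning is possible only while the filter column "x" is part of the projection.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class ProjectionOrderSketch {

    /** Hypothetical stand-in for a file reader; NOT the parquet-mr API. */
    static final class ToyReader {
        private static final int ROWS_PER_PAGE = 10;
        // pageMax[i] is the largest value of column "x" stored in page i.
        private final int[] pageMax = {3, 8, 12, 20};
        // Until a projection is set, every column of the toy file is visible.
        private Set<String> projected = new HashSet<>(Arrays.asList("x", "y"));

        void setRequestedSchema(Set<String> columns) {
            this.projected = columns;
        }

        /** Rows left after page-level pruning for the filter "x > threshold". */
        long getFilteredRecordCount(int threshold) {
            // Toy rule (an assumption): without column "x" no page can be dropped.
            if (!projected.contains("x")) {
                return (long) pageMax.length * ROWS_PER_PAGE;
            }
            long count = 0;
            for (int max : pageMax) {
                if (max > threshold) {
                    count += ROWS_PER_PAGE; // this page may contain matching rows
                }
            }
            return count;
        }
    }

    /** Order described in the report: count first, projection second. */
    static long countBeforeProjection() {
        ToyReader reader = new ToyReader();
        long total = reader.getFilteredRecordCount(10);
        reader.setRequestedSchema(Collections.singleton("y"));
        return total; // computed against the full schema, now stale
    }

    /** Swapped order: projection first, count second. */
    static long countAfterProjection() {
        ToyReader reader = new ToyReader();
        reader.setRequestedSchema(Collections.singleton("y"));
        return reader.getFilteredRecordCount(10);
    }

    public static void main(String[] args) {
        System.out.println("count taken before projection: " + countBeforeProjection());
        System.out.println("count taken after projection:  " + countAfterProjection());
    }
}
```

In this toy model the two orders disagree (20 rows versus 40 rows), which is the kind of mismatch the report describes: a count taken before the projection is set no longer matches what the reader will deliver afterwards. How the real reader treats a filter column that is missing from the projection is not modelled here.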