[ https://issues.apache.org/jira/browse/HIVE-15131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16548380#comment-16548380 ]
Adesh Kumar Rao commented on HIVE-15131:
----------------------------------------

Uploaded a new patch to fix the parquet_analyze test failure. The test was failing because, in the noscan stats collection path, the Parquet reader receives a dummy split with start/length set to 0/0. In that case the filter API returned 0 blocks for the dummy split, and hence wrong stats were updated.

> Change Parquet reader to read metadata on the task side
> -------------------------------------------------------
>
>                 Key: HIVE-15131
>                 URL: https://issues.apache.org/jira/browse/HIVE-15131
>             Project: Hive
>          Issue Type: Bug
>          Components: Reader
>            Reporter: Chao Sun
>            Assignee: Adesh Kumar Rao
>            Priority: Major
>         Attachments: HIVE-15131.1.patch, HIVE-15131.2.patch, HIVE-15131.3.patch, HIVE-15131.4.patch
>
>
> Currently the {{ParquetRecordReaderWrapper}} still uses the {{readFooter}}
> API without filtering, which means it needs to read metadata about all row
> groups every time. This can cause issues when the input dataset is
> particularly big and has many columns.
> [Parquet-84|https://issues.apache.org/jira/browse/PARQUET-84] introduced
> another API that allows row group filtering on the task side. Hive
> should adopt this API.
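The dummy-split edge case described in the comment can be sketched as follows. This is a simplified, self-contained model of range-based row group filtering, not the actual Parquet API; `RowGroup` and `filterByRange` are illustrative names introduced here for the sketch.

```java
import java.util.ArrayList;
import java.util.List;

public class RowGroupFilterSketch {
    // A row group identified by the file offset where it starts.
    static class RowGroup {
        final long startOffset;
        RowGroup(long startOffset) { this.startOffset = startOffset; }
    }

    // Keep only the row groups whose start offset falls inside the split's
    // half-open byte range [start, start + length) -- a simplified model of
    // the task-side range filtering introduced by PARQUET-84.
    static List<RowGroup> filterByRange(List<RowGroup> groups, long start, long length) {
        List<RowGroup> selected = new ArrayList<>();
        for (RowGroup g : groups) {
            if (g.startOffset >= start && g.startOffset < start + length) {
                selected.add(g);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<RowGroup> groups = new ArrayList<>();
        groups.add(new RowGroup(0));
        groups.add(new RowGroup(1024));

        // A real split covering the whole file selects both row groups.
        System.out.println(filterByRange(groups, 0, 2048).size()); // prints 2

        // A dummy split with start/length = 0/0 (the noscan stats path):
        // the range [0, 0) is empty, so the filter selects 0 row groups,
        // which is what led to the wrong stats.
        System.out.println(filterByRange(groups, 0, 0).size()); // prints 0
    }
}
```

This illustrates why the noscan path needed special handling: an empty byte range can never match any row group, so stats computed from the filtered footer come out empty.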