[ https://issues.apache.org/jira/browse/IMPALA-6383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Armstrong resolved IMPALA-6383.
-----------------------------------
       Resolution: Fixed
    Fix Version/s: Impala 2.12.0

IMPALA-6383: free memory after skipping parquet row groups

Before this patch, resources were only flushed after breaking out of
NextRowGroup(). This is a problem because resources can be allocated for
skipped row groups (e.g. for reading dictionaries).

Testing:
Tested in conjunction with a prototype buffer pool patch that was DCHECKing
before the change.

Added DCHECKs to the current version to ensure the streams are cleared up as
expected.

Change-Id: Ibc2f8f27c9b238be60261539f8d4be2facb57a2b
Reviewed-on: http://gerrit.cloudera.org:8080/9002
Reviewed-by: Tim Armstrong <tarmstr...@cloudera.com>
Tested-by: Impala Public Jenkins

> Memory from previous row groups can accumulate in Parquet scanner
> -----------------------------------------------------------------
>
>                 Key: IMPALA-6383
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6383
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 2.10.0, Impala 2.11.0, Impala 2.12.0
>            Reporter: Tim Armstrong
>            Assignee: Tim Armstrong
>            Priority: Major
>              Labels: parquet, resource-management
>             Fix For: Impala 2.12.0
>
>
> I ran across this bug when working on porting scanners to the new buffer
> pool. Before that, the only symptom of the failures was excessive memory
> consumption, but with the reservations they become easy-to-detect hard
> failures.
> The problem is in HdfsParquetScanner::NextRowGroup(), which calls
> InitColumns() on column readers, which starts scans, which allocate memory.
> If the row group is then skipped because of dictionary predicates or some
> other error, the scans aren't cancelled and the I/O buffers aren't
> released.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
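
To illustrate the accumulation described in the issue above, here is a minimal, self-contained C++ sketch. It is not Impala's code: ToyResourcePool, ToyRowGroup and ToyNextRowGroup are hypothetical names invented for this illustration, and the real HdfsParquetScanner::NextRowGroup() is considerably more involved. The sketch only assumes that initialising a row group's columns allocates buffers and that a skipped group should give them back before the next group is tried.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the I/O buffers that column readers acquire
// when they are initialised for a row group.
struct ToyResourcePool {
  std::size_t live_buffers = 0;
  std::size_t peak_buffers = 0;

  void Allocate(std::size_t n) {
    live_buffers += n;
    if (live_buffers > peak_buffers) peak_buffers = live_buffers;
  }
  void ReleaseAll() { live_buffers = 0; }
};

// Hypothetical row group description for the sketch.
struct ToyRowGroup {
  std::size_t buffers_needed;  // buffers allocated while initialising columns
  bool skip;                   // e.g. eliminated by a dictionary predicate
};

// Advances to the next row group that is not skipped and returns its index,
// or groups.size() if none remain. With release_on_skip == false this mimics
// the behaviour in the report: buffers from skipped groups stay allocated.
std::size_t ToyNextRowGroup(const std::vector<ToyRowGroup>& groups,
                            std::size_t start, bool release_on_skip,
                            ToyResourcePool* pool) {
  assert(pool != nullptr);
  for (std::size_t i = start; i < groups.size(); ++i) {
    pool->Allocate(groups[i].buffers_needed);  // analogous to InitColumns()
    if (!groups[i].skip) return i;             // caller will scan this group
    if (release_on_skip) pool->ReleaseAll();   // free before trying the next
  }
  return groups.size();
}
```

For example, with three row groups needing 2, 3 and 1 buffers, where the first two are skipped, calling ToyNextRowGroup with release_on_skip set to false leaves 6 buffers live when the third group is reached. With release_on_skip set to true, only 1 buffer is live and the peak is 3.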