[ https://issues.apache.org/jira/browse/IMPALA-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16543490#comment-16543490 ]
Tim Armstrong edited comment on IMPALA-7096 at 7/13/18 5:33 PM:
----------------------------------------------------------------

I looked at a couple of queries and concluded that there are two main problems with non-reserved memory:
# Memory accumulating in the row batch queue - we don't have any real bound on the amount of memory that can accumulate here.
# Scanning some files can be quite memory hungry, e.g. compressed text with snappy compression, compressed parquet with large pages, etc.

I think we could have a principled solution for problem #1 based on limiting the amount of memory in the row batch queue (rather than limiting the number of batches) but problem #2 requires more heuristics because of the difficulty in knowing the amount of memory required to scan the file before actually doing it.

The EnoughMemoryForScannerThread() heuristic that IMPALA-4835 removed was a decent line of defense against problem #2 (and problem #1 to a lesser extent), although it was flawed in that it compared the expected memory consumption from a single scan to the amount of memory left in the query-global memtracker.


was (Author: tarmstrong):

I looked at a couple of queries and concluded that there are two main problems with non-reserved memory:
# Memory accumulating in the row batch queue - we don't have any real bound on the amount of memory that can accumulate here.
# Scanning some files can be quite memory hungry, e.g. compressed text with snappy compression, compressed parquet with large pages, etc.

I think we could have a principled solution for problem #1 based on limiting the amount of memory in the row batch queue (rather than limiting the number of batches) but problem #2 requires more heuristics because of the difficulty in knowing the amount of memory required to scan the file before actually doing it.
The EnoughMemoryForScannerThread() heuristic that IMPALA-4835 removed was a decent line of defense against problem #2, although it was flawed in that it compared the expected memory consumption from a single scan to the amount of memory left in the query-global memtracker.


> Ensure no memory limit exceeded regressions from IMPALA-4835 because of
> non-reserved memory
> -------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-7096
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7096
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 2.13.0, Impala 3.1.0
>            Reporter: Tim Armstrong
>            Assignee: Tim Armstrong
>            Priority: Blocker
>              Labels: resource-management
>         Attachments: ScanConsumingMostMemory.txt
>
>
> IMPALA-7078 showed some cases where non-buffer memory could accumulate in the
> row batch queue and cause memory consumption problems.
> The decision for whether to spin up a scanner thread in IMPALA-4835
> implicitly assumes that buffer memory is the bulk of memory consumed by a
> scan, but there may be cases where that is not true and the previous
> heuristic would be more conservative about starting a scanner thread.
> We should investigate this further and figure out how to avoid it if there's
> an issue.
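
To illustrate the idea in the comment of bounding the row batch queue by memory rather than by number of batches, here is a minimal sketch in Python. It is not Impala's implementation (the backend is C++). The class name ByteBoundedQueue, its methods, and the rule that an empty queue always admits one batch are all assumptions made for illustration.

```python
import threading
from collections import deque


class ByteBoundedQueue:
    """Queue whose capacity is a byte budget rather than an item count.

    Hypothetical sketch: the names here are illustrative, not Impala's.
    """

    def __init__(self, max_bytes):
        if max_bytes <= 0:
            raise ValueError("max_bytes must be positive")
        self._max_bytes = max_bytes
        self._queued_bytes = 0
        self._items = deque()
        self._cond = threading.Condition()

    def _has_room(self, nbytes):
        # Assumption: always admit a batch into an empty queue, so a single
        # batch larger than the budget cannot block the producer forever.
        return not self._items or self._queued_bytes + nbytes <= self._max_bytes

    def put(self, batch, nbytes, timeout=None):
        """Block until the batch fits; return False if the timeout expires."""
        with self._cond:
            if not self._cond.wait_for(lambda: self._has_room(nbytes), timeout):
                return False
            self._items.append((batch, nbytes))
            self._queued_bytes += nbytes
            self._cond.notify_all()
            return True

    def get(self, timeout=None):
        """Block until a batch is available; return None if the timeout expires."""
        with self._cond:
            if not self._cond.wait_for(lambda: bool(self._items), timeout):
                return None
            batch, nbytes = self._items.popleft()
            self._queued_bytes -= nbytes
            self._cond.notify_all()
            return batch

    @property
    def queued_bytes(self):
        with self._cond:
            return self._queued_bytes
```

With a count-based limit, a few very large batches can hold far more memory than intended. With a byte budget, producers (scanner threads, in the scenario above) block once the queued bytes reach the limit, regardless of how many batches that represents.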