[ 
https://issues.apache.org/jira/browse/IMPALA-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16543490#comment-16543490
 ] 

Tim Armstrong edited comment on IMPALA-7096 at 7/13/18 5:33 PM:
----------------------------------------------------------------

I looked at a couple of queries and concluded that there are two main problems 
with non-reserved memory:
# Memory accumulating in the row batch queue - we don't have any real bound on 
the amount of memory that can accumulate here.
# Scanning some files can be quite memory hungry, e.g. Snappy-compressed text 
or compressed Parquet with large pages.

I think we could have a principled solution for problem #1 based on limiting 
the amount of memory in the row batch queue (rather than limiting the number of 
batches). Problem #2 requires more heuristics, because it is difficult to know 
how much memory a file scan needs before actually doing it. The 
EnoughMemoryForScannerThread() heuristic that IMPALA-4835 removed was a decent 
line of defense against problem #2 (and problem #1 to a lesser extent), 
although it was flawed in that it compared the expected memory consumption from 
a single scan to the amount of memory left in the query-global memtracker.
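
To make the idea for problem #1 concrete, here is a minimal sketch of a queue 
whose capacity is a byte budget rather than a batch count. It is illustrative 
only and is not Impala code: the class name ByteBoundedQueue, its methods, and 
the max_bytes parameter are all hypothetical, and it assumes the producer can 
report each batch's size in bytes when enqueueing it.

```python
import threading
from collections import deque


class ByteBoundedQueue:
    """Blocking FIFO whose capacity is a byte budget, not an item count.

    Hypothetical illustration; not taken from the Impala code base.
    """

    def __init__(self, max_bytes):
        self._max_bytes = max_bytes
        self._bytes = 0
        self._items = deque()  # (batch, size_in_bytes) pairs
        self._cond = threading.Condition()

    def put(self, batch, size, timeout=None):
        """Enqueue a batch; return False if no room appeared within timeout."""
        with self._cond:
            # Always admit a batch into an empty queue, so that a single
            # batch larger than the budget cannot block the producer forever.
            has_room = self._cond.wait_for(
                lambda: not self._items
                or self._bytes + size <= self._max_bytes,
                timeout,
            )
            if not has_room:
                return False
            self._items.append((batch, size))
            self._bytes += size
            self._cond.notify_all()
            return True

    def get(self, timeout=None):
        """Dequeue the oldest batch; return None if none arrived in time."""
        with self._cond:
            if not self._cond.wait_for(lambda: bool(self._items), timeout):
                return None
            batch, size = self._items.popleft()
            self._bytes -= size
            # Wake producers that may now fit under the byte budget.
            self._cond.notify_all()
            return batch

    @property
    def bytes_queued(self):
        with self._cond:
            return self._bytes
```

With a count-based limit, a few very large batches can hold far more memory 
than intended. With the byte budget above, producers block as soon as the 
queued bytes would exceed max_bytes, regardless of how many batches that is.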


> Ensure no memory limit exceeded regressions from IMPALA-4835 because of 
> non-reserved memory
> -------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-7096
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7096
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 2.13.0, Impala 3.1.0
>            Reporter: Tim Armstrong
>            Assignee: Tim Armstrong
>            Priority: Blocker
>              Labels: resource-management
>         Attachments: ScanConsumingMostMemory.txt
>
>
> IMPALA-7078 showed some cases where non-buffer memory could accumulate in the 
> row batch queue and cause memory consumption problems.
> The decision for whether to spin up a scanner thread in IMPALA-4835 
> implicitly assumes that buffer memory is the bulk of memory consumed by a 
> scan, but there may be cases where that is not true and the previous 
> heuristic would be more conservative about starting a scanner thread.
> We should investigate this further and figure out how to avoid it if there's 
> an issue.


