[ 
https://issues.apache.org/jira/browse/HIVE-16546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15991885#comment-15991885
 ] 

Prasanth Jayachandran commented on HIVE-16546:
----------------------------------------------

Fixed review comments in the patch.

Regarding the inflation factor: it is already accounted for during statistics 
annotation. Most file formats provide rawDataSize, the deserialized and 
decompressed data size, which is stored in the metastore. When rawDataSize is 
available, the map join decision is already based on it, so the inflation 
factor is already accounted for in most cases. For example, an ORC file could 
be 10MB on disk, which corresponds to totalSize in the metastore, while its 
rawDataSize, also stored in the metastore, could be several hundred MB. 
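
To make the rawDataSize point concrete, here is a minimal sketch of how a size 
estimate could feed the map join decision. The class, the method names, and the 
fallback of scaling totalSize by a factor when rawDataSize is missing are 
assumptions for illustration only; they are not taken from the patch or from 
the Hive code base.

```java
// Illustrative sketch only: names and fallback behaviour are assumptions.
final class MapJoinSizeEstimate {
    static final long MB = 1024L * 1024L;

    /**
     * Prefer rawDataSize (deserialized, decompressed size) when it is known.
     * Assumption: a value <= 0 means the statistic is not available, in which
     * case the on-disk totalSize is scaled by a caller-supplied factor.
     */
    static long estimatedDataSize(long rawDataSize, long totalSize, double fallbackFactor) {
        if (rawDataSize > 0) {
            return rawDataSize;
        }
        return (long) (totalSize * fallbackFactor);
    }

    /** A table qualifies for map join if its estimated size fits the configured limit. */
    static boolean fitsMapJoin(long estimatedDataSize, long noConditionalTaskSize) {
        return estimatedDataSize <= noConditionalTaskSize;
    }
}
```

With the ORC example above (totalSize of 10MB, rawDataSize of a few hundred 
MB), the decision is driven by the few hundred MB, not by the 10MB on disk.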

In the context of this patch, the inflation factor is the multiple by which we 
allow the hash table to expand in memory before killing the task. Say the 
container size is 4GB and the noconditional task size is configured to 1GB. 
With an inflation factor of 2, we will wait until the estimated memory size 
reaches 2GB before killing the task. 
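
The arithmetic in that example can be sketched as follows. This is a 
hypothetical illustration, not the code in the patch: the class and method 
names are made up, and the factor of 2 is inferred from the 1GB to 2GB example. 
Whether the check is strict or inclusive is also an assumption (strict here, 
following the "exceeds threshold" wording of the issue title).

```java
// Hypothetical sketch; names are illustrative and not taken from the patch.
final class HashTableMemoryCheck {
    static final long GB = 1024L * 1024L * 1024L;

    /** Size the hash table is allowed to reach before the task is failed. */
    static long killThreshold(long noConditionalTaskSize, double inflationFactor) {
        return (long) (noConditionalTaskSize * inflationFactor);
    }

    /** Assumption: the task is failed once the estimate strictly exceeds the threshold. */
    static boolean shouldKill(long estimatedMemorySize, long threshold) {
        return estimatedMemorySize > threshold;
    }
}
```

For a noconditional task size of 1GB and a factor of 2, killThreshold returns 
2GB; an estimate of 1.5GB is tolerated and an estimate of 3GB fails the task.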

Addressed the grace size review comment with Math.max(threshold, 
(2/3)*maxMemoryAvailable). maxMemoryAvailable will be the container size in 
the case of Tez and the memory per executor in the case of LLAP. This guards 
against a poorly configured noconditional task size. 
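
A minimal sketch of that guard is below. The class and method names are 
hypothetical, and what exactly `threshold` holds at this point is not spelled 
out in the comment, so it is treated as an opaque input. One detail worth 
noting if the expression is written literally in Java: `(2/3)` is integer 
division and evaluates to 0, so the sketch multiplies before dividing.

```java
// Hypothetical sketch of the guard; not the code from the patch.
final class GraceThreshold {
    static final long MB = 1024L * 1024L;

    /**
     * Returns the larger of the configured threshold and two thirds of the
     * memory available (container size on Tez, memory per executor on LLAP).
     */
    static long effectiveThreshold(long threshold, long maxMemoryAvailable) {
        // Multiply first: the literal (2/3) would be integer division and yield 0.
        long twoThirds = 2 * maxMemoryAvailable / 3;
        return Math.max(threshold, twoThirds);
    }
}
```

With 4GB available, a badly undersized threshold of 100MB is raised to two 
thirds of 4GB, while a threshold of 3GB is left unchanged.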

> LLAP: Fail map join tasks if hash table memory exceeds threshold
> ----------------------------------------------------------------
>
>                 Key: HIVE-16546
>                 URL: https://issues.apache.org/jira/browse/HIVE-16546
>             Project: Hive
>          Issue Type: Bug
>          Components: llap
>    Affects Versions: 3.0.0
>            Reporter: Prasanth Jayachandran
>            Assignee: Prasanth Jayachandran
>         Attachments: HIVE-16546.1.patch, HIVE-16546.2.patch, 
> HIVE-16546.3.patch, HIVE-16546.WIP.patch
>
>
> When a map join task is running in LLAP, it can potentially use a lot more 
> memory than its limit, which could be the memory per executor or the 
> noconditional task size. If it uses more memory, it can adversely affect the 
> performance of other queries or even bring down the daemon. In such cases, 
> it is better to fail the query than to bring down the daemon. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
