[ https://issues.apache.org/jira/browse/HIVE-18269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16302114#comment-16302114 ]

Sergey Shelukhin commented on HIVE-18269:
-----------------------------------------


Update: we've seen a large queue also produce GC problems, without getting close 
to OOM, when there are many decimal columns. The temp patch to see if the limit 
works performed well with a queue size of 3-10, which I suspect will be 
insufficient for a cloud FS like S3 when the IO thread is blocked - if the 
pipeline can process 10 VRBs rapidly, it will have to wait for a while until the 
unblocked S3 reader produces more data and blocks again, then process that 
quickly and block again, etc. This might require some testing. There are 3 
possible approaches that I see:
1) Don't block physical reads from the FS; only block the decoding/etc. that 
produces Java objects. That may be a complex threading change, and/or would 
require a separate throttle for the buffers (which may be more forgiving), lest 
they cause OOM.
2) Determine queue size dynamically based on processing speed - e.g. start 
high, then observe how fast next() calls are coming and how fast IO is putting 
stuff in the queue, and adjust down if IO is much faster; or start low (~10) and 
expand aggressively every time next() waits (meaning IO is not keeping up). This 
is rather complex, although it may be the best long-term solution (see the first 
sketch after this list).
3) Determine queue size per fragment (per vertex, really) based on the schema. 
Configure a high default limit (e.g. 10k, to prevent OOMs) and a lower bound for 
the limit (e.g. 10). Then, at init time, start with the limit at the high 
boundary and reduce it based on the number and types of the VRB vectors (reduce 
proportionally, assuming the maximum limit is for a single INT vector, and never 
go below the minimum). This is hand-wavy but easy to implement and reason about, 
and as a fail-safe one can always set min=max to fix the queue size (see the 
second sketch after this list).
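
To make 2 concrete, here is a hand-wavy sketch of the adaptive policy. All the 
names here (AdaptiveQueueCap, mayEnqueue, onNextWaited) and the doubling factor 
are made up for illustration - this is not the actual LlapRecordReader code:

{code:java}
// Hypothetical sketch of option 2: the cap only grows when the processing
// thread had to wait on an empty queue, i.e. when IO is the bottleneck.
public class AdaptiveQueueCap {
  private static final int MAX_CAP = 10000; // hard ceiling to prevent OOM
  private int cap = 10;                     // start low (~10)

  // IO thread asks before enqueueing a VRB; false means it should block.
  synchronized boolean mayEnqueue(int currentQueueSize) {
    return currentQueueSize < cap;
  }

  // Processing thread calls this when next() found the queue empty and had
  // to wait; IO is not keeping up, so let it run further ahead.
  synchronized void onNextWaited() {
    cap = Math.min(MAX_CAP, cap * 2);
  }
}
{code}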
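And a sketch of 3. The per-type weights, class and method names are assumptions 
picked for illustration (with a single long/INT vector as the unit of weight), 
not measured values or real config names:

{code:java}
// Hypothetical sketch of option 3: shrink the queue limit proportionally
// to the schema weight, between a configured min and max.
import java.util.List;

public class SchemaBasedQueueLimit {
  private static final int MAX_LIMIT = 10000; // high default, prevents OOM
  private static final int MIN_LIMIT = 10;    // floor; min == max fixes the size

  // Rough relative cost of one column vector; a fixed-width vector is 1.
  static int weightOf(String type) {
    switch (type) {
      case "int": case "bigint": case "smallint": case "tinyint":
      case "boolean": case "date": case "float": case "double":
        return 1;  // fixed-width primitive vectors
      case "string": case "varchar": case "char": case "binary":
        return 4;  // byte-array-backed vectors
      case "decimal":
        return 8;  // object-heavy, per the GC problems seen above
      default:
        return 4;  // conservative guess for anything else
    }
  }

  // Start at the max and reduce proportionally to the total schema weight.
  static int limitFor(List<String> columnTypes) {
    long weight = 0;
    for (String t : columnTypes) {
      weight += weightOf(t);
    }
    long limit = MAX_LIMIT / Math.max(1, weight);
    return (int) Math.max(MIN_LIMIT, Math.min(MAX_LIMIT, limit));
  }
}
{code}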

I think we can start with 3 and consider 2 later. 1 is only good if we decide 
to separate the FS and decoding threads, which was the plan a long time ago but 
was never implemented.

[~gopalv] [~prasanth_j] [~hagleitn] any input?

> LLAP: Fast llap io with slow processing pipeline can lead to OOM
> ----------------------------------------------------------------
>
>                 Key: HIVE-18269
>                 URL: https://issues.apache.org/jira/browse/HIVE-18269
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 3.0.0
>            Reporter: Prasanth Jayachandran
>            Assignee: Prasanth Jayachandran
>         Attachments: HIVE-18269.1.patch, Screen Shot 2017-12-13 at 1.15.16 AM.png
>
>
> pendingData linked list in the Llap IO elevator (LlapRecordReader.java) may 
> grow indefinitely when Llap IO is faster than the processing pipeline. Since 
> we don't have backpressure to slow down the IO, this can lead to indefinite 
> growth of pending data, causing severe GC pressure and eventually OOM.
> This specific instance of LLAP was running on HDFS on top of an EBS volume 
> backed by SSD. The query that triggered this issue was ANALYZE STATISTICS 
> .. FOR COLUMNS, which also gathers bitvectors. A fast-IO, slow-processing case.


