[ https://issues.apache.org/jira/browse/HIVE-18269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16302114#comment-16302114 ]
Sergey Shelukhin commented on HIVE-18269:
-----------------------------------------

Update: we've seen a large queue also produce GC problems, without getting close to OOM, when there are many decimal columns. The temp patch to check whether a limit works performed well with a queue size of 3-10, which I suspect will be insufficient for a cloud FS like S3 when the IO thread is blocked - if the pipeline can process 10 VRBs rapidly, it then has to wait a while until the unblocked S3 reader produces more data and blocks again, then process that quickly and block, etc. This might require some testing.

There are 3 possible approaches that I see:
1) Don't block physical reads from the FS; only block the decoding/etc. that produces Java objects. That may be a complex threading change and/or would require a separate throttle for the buffers (which may be more forgiving) lest they cause OOM.
2) Determine the queue size dynamically based on processing speed - e.g. start high, then observe how fast next() calls are coming and how fast IO is putting data into the queue, and adjust down if IO is much faster; or start low (~10) and expand aggressively every time next() has to wait (meaning IO is not keeping up). This is rather complex, although it may be the best long-term solution.
3) Determine the queue size per fragment (vertex, really) based on the schema. Configure a high default limit (e.g. 10k to prevent OOMs) and a lower bound for the limit (e.g. 10). Then, at init time, start with the limit at the high boundary and reduce it based on the number and type of VRB vectors (reduce proportionally, assuming the maximum limit corresponds to a single INT vector, and never go below the minimum). This is hand-wavy but easy to implement and reason about, and as a fail-safe one can always set min=max to fix the queue size. A rough sketch of this sizing idea follows the quoted issue description below.

I think we can start with 3 and consider 2 later. 1 is only good if we decide to separate the FS and decoding threads, which was the plan a long time ago but was never implemented. [~gopalv] [~prasanth_j] [~hagleitn] any input?

> LLAP: Fast llap io with slow processing pipeline can lead to OOM
> ----------------------------------------------------------------
>
>                 Key: HIVE-18269
>                 URL: https://issues.apache.org/jira/browse/HIVE-18269
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 3.0.0
>            Reporter: Prasanth Jayachandran
>            Assignee: Prasanth Jayachandran
>        Attachments: HIVE-18269.1.patch, Screen Shot 2017-12-13 at 1.15.16 AM.png
>
>
> The pendingData linked list in the LLAP IO elevator (LlapRecordReader.java) may grow indefinitely when LLAP IO is faster than the processing pipeline. Since we don't have backpressure to slow down the IO, this can lead to indefinite growth of pending data, causing severe GC pressure and eventually OOM.
> This specific instance of LLAP was running on HDFS on top of an EBS volume backed by SSD. The query that triggered this issue was ANALYZE STATISTICS .. FOR COLUMNS, which also gathers bitvectors. Fast IO and slow processing case.
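For illustration only, a minimal sketch of the schema-based sizing in approach 3. This is not the actual LlapRecordReader code: the class name, the weightOf table, and the min/max parameters are hypothetical stand-ins for whatever config keys and column type information the real patch would use.

{code:java}
import java.util.List;

// Sketch of approach 3: derive a per-fragment pending-queue limit from the projected schema.
// maxQueueLimit is the limit assuming a single cheap (INT-like) column; minQueueLimit is the floor.
public class QueueLimitEstimator {
  private final int maxQueueLimit;   // e.g. 10000
  private final int minQueueLimit;   // e.g. 10

  public QueueLimitEstimator(int maxQueueLimit, int minQueueLimit) {
    this.maxQueueLimit = maxQueueLimit;
    this.minQueueLimit = minQueueLimit;
  }

  // Relative memory weight of one column vector; the values are illustrative guesses,
  // chosen so that decimal and string columns count for more than fixed-width numerics.
  private static int weightOf(String typeName) {
    switch (typeName.toLowerCase()) {
      case "tinyint": case "smallint": case "int": case "bigint":
      case "boolean": case "float": case "double": case "date":
        return 1;
      case "timestamp":
        return 2;
      case "decimal":
        return 4;   // decimal vectors carry per-row objects, much heavier than longs
      case "string": case "char": case "varchar": case "binary":
        return 8;   // variable-length bytes per row
      default:
        return 4;
    }
  }

  // Scale the single-column limit down by the total weight of the projected columns,
  // never going below the configured minimum. Setting min == max pins the queue size.
  public int estimateLimit(List<String> projectedColumnTypes) {
    int totalWeight = 0;
    for (String type : projectedColumnTypes) {
      totalWeight += weightOf(type);
    }
    if (totalWeight == 0) {
      return maxQueueLimit;
    }
    return Math.max(minQueueLimit, maxQueueLimit / totalWeight);
  }
}
{code}

For example, with max=10000 and min=10, a projection of 20 decimal columns (total weight 80) would allow about 125 queued VRBs, while a single INT column keeps the full 10000; setting min=max gives the fixed-size fail-safe mentioned above.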