[ https://issues.apache.org/jira/browse/HIVE-22753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17021041#comment-17021041 ]

Rajesh Balamohan edited comment on HIVE-22753 at 1/22/20 1:16 PM:
------------------------------------------------------------------

There is a race between BatchEventProcessor and the cleanup operation performed
by the HS2 thread. So even though stop() is invoked and the file is closed,
BatchEventProcessor recreates the same file name within about 1 ms because of
the race. This instance is never cleaned up, which causes the leak. It also
leaves stale directories/files in the operation log folder. Another option,
taken in the .2 patch, is to track the files that are genuinely being closed
and prevent them from being recreated within a few seconds. Verified that this
fixes the leak in the cluster.
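The tracking approach described above can be sketched roughly as follows. This is a minimal Java illustration under stated assumptions, not the code from HIVE-22753.2.patch: the class ClosedLogTracker, its methods, the 5-second window, the example path, and the injectable clock are all hypothetical. The cleanup path records a close timestamp per file, and any writer about to create an appender checks that record first, so a file closed moments ago is not recreated by a racing writer.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/**
 * Hypothetical sketch, not Hive's actual code: remembers operation log files
 * that the cleanup thread deliberately closed, so that a late writer racing
 * with cleanup does not recreate them shortly afterwards.
 */
public class ClosedLogTracker {
    private final Map<String, Long> closedAtMillis = new ConcurrentHashMap<>();
    private final long windowMillis;
    private final LongSupplier clock; // injectable so the window can be tested

    public ClosedLogTracker(long windowMillis, LongSupplier clock) {
        this.windowMillis = windowMillis;
        this.clock = clock;
    }

    /** Called by the cleanup path when it genuinely closes a log file. */
    public void markClosed(String path) {
        long now = clock.getAsLong();
        // Purge expired entries so the tracker itself cannot grow without bound.
        closedAtMillis.values().removeIf(t -> now - t > windowMillis);
        closedAtMillis.put(path, now);
    }

    /** Called before creating an appender; false means "closed too recently". */
    public boolean mayCreate(String path) {
        Long closedAt = closedAtMillis.get(path);
        if (closedAt == null) {
            return true; // never closed (or already forgotten)
        }
        if (clock.getAsLong() - closedAt > windowMillis) {
            closedAtMillis.remove(path, closedAt); // window expired; forget it
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        AtomicLong now = new AtomicLong(0);
        ClosedLogTracker tracker = new ClosedLogTracker(5_000, now::get);
        String path = "/tmp/operation_logs/session-1/query-1"; // hypothetical path

        System.out.println(tracker.mayCreate(path)); // true: never closed
        tracker.markClosed(path);
        now.set(1); // 1 ms later, a racing writer tries to recreate the file
        System.out.println(tracker.mayCreate(path)); // false: closed within window
        now.set(6_000);
        System.out.println(tracker.mayCreate(path)); // true: window has passed
    }
}
```

Keeping entries time-bounded stops the tracker from becoming a small leak of its own. The sketch leaves open what the rejected writer should do with its event; here the caller is simply assumed to skip creating the file.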



> Fix gradual mem leak: Operationlog related appenders should be cleared up on errors
> ------------------------------------------------------------------------------------
>
>                 Key: HIVE-22753
>                 URL: https://issues.apache.org/jira/browse/HIVE-22753
>             Project: Hive
>          Issue Type: Improvement
>          Components: HiveServer2
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>            Priority: Minor
>         Attachments: HIVE-22753.1.patch, HIVE-22753.2.patch, 
> image-2020-01-21-11-14-37-911.png, image-2020-01-21-11-17-59-279.png, 
> image-2020-01-21-11-18-37-294.png
>
>
> In case of an exception in SQLOperation, the operation log does not get
> cleared up. This causes a gradual build-up of HushableRandomAccessFileAppender
> instances, causing HS2 to OOM after some time.
> !image-2020-01-21-11-14-37-911.png|width=431,height=267!
>  
> Allocation tree
> !image-2020-01-21-11-18-37-294.png|width=425,height=178!
>  
> Production instance memory
> !image-2020-01-21-11-17-59-279.png|width=698,height=209!
>  
> Each HushableRandomAccessFileAppender holds an internal reference to a
> RandomAccessFileAppender, which holds a 256 KB ByteBuffer, causing the memory
> leak.
> Related ticket: HIVE-18820
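The fix named in the issue title, clearing up operation log appenders on errors, comes down to releasing the appender on every exit path. What follows is a minimal Java sketch under assumptions, not the code from the attached patches: OperationLogCleanup, BufferedAppender, the LIVE registry, and runOperation are hypothetical names standing in for SQLOperation and HushableRandomAccessFileAppender/RandomAccessFileAppender.

```java
import java.nio.ByteBuffer;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch, not Hive's SQLOperation code: the per-operation log
 * appender is released in a finally block, so it is cleaned up on errors too.
 */
public class OperationLogCleanup {
    /** Stand-in for HushableRandomAccessFileAppender / RandomAccessFileAppender. */
    static final class BufferedAppender implements AutoCloseable {
        // 256 KB buffer, matching the per-appender size described in the issue.
        private ByteBuffer buffer = ByteBuffer.allocate(256 * 1024);

        @Override
        public void close() {
            buffer = null; // drop the buffer reference so it can be collected
        }
    }

    /** Hypothetical registry of live appenders, keyed by operation id. */
    static final Map<String, BufferedAppender> LIVE = new ConcurrentHashMap<>();

    static void runOperation(String opId, Runnable body) {
        BufferedAppender appender = new BufferedAppender();
        LIVE.put(opId, appender);
        try {
            body.run();
        } finally {
            // Runs on success and on exception, so a failed query cannot leave
            // its appender (and its buffer) reachable from LIVE.
            LIVE.remove(opId, appender);
            appender.close();
        }
    }

    public static void main(String[] args) {
        try {
            runOperation("op-1", () -> {
                throw new IllegalStateException("query failed");
            });
        } catch (IllegalStateException e) {
            System.out.println("operation failed: " + e.getMessage());
        }
        System.out.println("live appenders: " + LIVE.size()); // live appenders: 0
    }
}
```

For scale: at 256 KB per appender, every 4,096 appenders leaked this way (a hypothetical count) would pin about 1 GB of heap, since 4,096 × 256 KB = 1,048,576 KB.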



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
