[jira] [Commented] (HBASE-26088) conn.getBufferedMutator(tableName) leaks thread executors and other problems

Viraj Jasani (Jira) Thu, 22 Jul 2021 07:10:05 -0700


    [ 
https://issues.apache.org/jira/browse/HBASE-26088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385527#comment-17385527
 ]


Viraj Jasani commented on HBASE-26088:
--------------------------------------

Nice find and fix (y)

> conn.getBufferedMutator(tableName) leaks thread executors and other problems
> ----------------------------------------------------------------------------
>
>                 Key: HBASE-26088
>                 URL: https://issues.apache.org/jira/browse/HBASE-26088
>             Project: HBase
>          Issue Type: Bug
>          Components: Client
>    Affects Versions: 2.0.0
>            Reporter: Whitney Jackson
>            Assignee: Rushabh Shah
>            Priority: Critical
>             Fix For: 2.5.0, 2.3.6, 2.4.5
>
>
> TL;DR: {{conn.getBufferedMutator(tableName)}} is dangerous in hbase client 
> 2.4.4 and doesn't match documented behavior in 1.4.13.
> To work around the problems until fixed do this:
> {code:java}
> var mySingletonPool = HTable.getDefaultExecutor(hbaseConf);
> var params = new BufferedMutatorParams(tableName);
> params.pool(mySingletonPool);
> var myMutator = conn.getBufferedMutator(params);
> {code}
> And avoid code like this:
> {code:java}
> var myMutator = conn.getBufferedMutator(tableName);
> {code}
> The full story:
> My application started leaking threads after upgrading from hbase client 
> 1.4.13 to 2.4.4. So much so that after less than a minute of runtime more 
> that 30k threads are leaked and all available virtual memory on the box (> 50 
> GB) is consumed. Other processes on the box start crashing with memory 
> allocation errors. Even running {{ls}} at the shell fails with OS resource 
> allocation failures.
> A thread dump after just a few seconds of runtime shows thousands of threads 
> like this:
> {code:java}
> "htable-pool-0" #8841 prio=5 os_prio=0 cpu=0.15ms elapsed=7.49s 
> tid=0x00007efb6d2a1000 nid=0x57d2 waiting on condition [0x00007ef8a6c38000]
>  java.lang.Thread.State: TIMED_WAITING (parking)
>  at jdk.internal.misc.Unsafe.park(java.base@11.0.6/Native Method)
>  - parking to wait for <0x00000007e7cd6188> (a 
> java.util.concurrent.SynchronousQueue$TransferStack)
>  at 
> java.util.concurrent.locks.LockSupport.parkNanos(java.base@11.0.6/LockSupport.java:234)
>  at 
> java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(java.base@11.0.6/SynchronousQueue.java:462)
>  at 
> java.util.concurrent.SynchronousQueue$TransferStack.transfer(java.base@11.0.6/SynchronousQueue.java:361)
>  at 
> java.util.concurrent.SynchronousQueue.poll(java.base@11.0.6/SynchronousQueue.java:937)
>  at 
> java.util.concurrent.ThreadPoolExecutor.getTask(java.base@11.0.6/ThreadPoolExecutor.java:1053)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.6/ThreadPoolExecutor.java:1114)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.6/ThreadPoolExecutor.java:628)
>  at java.lang.Thread.run(java.base@11.0.6/Thread.java:834)
> {code}
>  
> Note: All the threads are labeled {{htable-pool-0}}. That suggests we're 
> leaking thread executors not just threads. The {{htable-pool}} part indicates 
> the problem is to do with {{HTable.getDefaultExecutor(conf)}} and the only 
> part of my code that interacts with that is a call to 
> {{conn.getBufferedMutator(tableName)}}.
>  
> Looking at the hbase client code shows a few problems:
> 1) Neither 1.4.13 nor 2.4.4's behavior matches the documentation for 
> {{conn.getBufferedMutator(tableName)}} which says:
> {quote}This BufferedMutator will use the Connection's ExecutorService.
> {quote}
> That suggests some singleton thread executor is being used which is not the 
> case.
>  
> 2) Under 1.4.13 you get a new {{ThreadPoolExecutor}} for every 
> {{BufferedMutator}}. That's probably not what you want but you likely won't 
> notice. I didn't. It's a code path I hadn't profiled much.
>  
> 3) Under 2.4.4 you get a new {{ThreadPoolExecutor}} for every 
> {{BufferedMutator}} *and* that {{ThreadPoolExecutor}} *is not* cleaned up 
> after the {{Mutator}} is closed. Each completed {{ThreadPoolExecutor}} 
> carries with it one thread which hangs around until a timeout value which 
> defaults to 60 seconds.
> My application creates one {{BufferedMutator}} for every incoming stream and 
> there are lots of streams, some of them are short lived so my code leaks 
> threads fast under 2.4.4.
> Here's the part where a new executor is created for every {{BufferedMutator}} 
> (it's similar for 1.4.13):
> [https://github.com/apache/hbase/blob/branch-2.4/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java#L420]
>  
> The reason for the leak in 2.4.4 is the should-we/shouldn't-we cleanup logic 
> added here:
> [https://github.com/apache/hbase/blob/branch-2.4/hbase-client/src/main/java/org/apache/hadoop/hbase/client/BufferedMutatorImpl.java#L104]
> That might be ok if {{pool}} was being initialized there but in the 
> {{conn.getBufferedMutator(tableName)}} code path it's not. {{pool}} is 
> initialized in {{conn.getBufferedMutator}} itself so the executor cleanup 
> code never runs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Commented] (HBASE-26088) conn.getBufferedMutator(tableName) leaks thread executors and other problems

Reply via email to