[ 
https://issues.apache.org/jira/browse/HBASE-26088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John updated HBASE-26088:
-----------------------------------
    Release Note: 
The API doc for Connection#getBufferedMutator(TableName) and 
Connection#getBufferedMutator(BufferedMutatorParams) stated that when the user 
does not pass a ThreadPool, the ThreadPool in the Connection is used.  
In reality, a new ThreadPool was created in such cases.

We are keeping the behaviour of the code as is, but have corrected the Javadoc 
and fixed a bug where this new pool was not closed when the BufferedMutator 
was closed.
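
The ownership rule in this release note can be sketched with plain 
java.util.concurrent classes. This is a minimal illustration, not the HBase 
implementation: SketchMutator, ownsPool and OwnedPoolSketch are hypothetical 
names, and a cached thread pool stands in for whatever pool the client 
actually creates.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for a buffered mutator; this is NOT the HBase class.
final class SketchMutator implements AutoCloseable {
    private final ExecutorService pool;
    private final boolean ownsPool; // true only when this object created the pool

    // Caller supplies a pool: the caller keeps ownership and must shut it down.
    SketchMutator(ExecutorService callerPool) {
        this.pool = callerPool;
        this.ownsPool = false;
    }

    // No pool supplied: create a private one and remember to clean it up.
    SketchMutator() {
        this.pool = Executors.newCachedThreadPool();
        this.ownsPool = true;
    }

    ExecutorService pool() {
        return pool;
    }

    @Override
    public void close() {
        // Only shut down a pool this object created itself.
        if (ownsPool) {
            pool.shutdown();
        }
    }
}

class OwnedPoolSketch {
    public static void main(String[] args) {
        SketchMutator own = new SketchMutator();
        own.close();
        System.out.println("private pool shut down: " + own.pool().isShutdown());

        ExecutorService shared = Executors.newCachedThreadPool();
        SketchMutator borrowed = new SketchMutator(shared);
        borrowed.close();
        System.out.println("shared pool shut down: " + shared.isShutdown());
        shared.shutdown(); // the caller cleans up its own pool
    }
}
```

The point of the flag is that closing a mutator never shuts down a pool the 
caller passed in, while a pool created on the caller's behalf does not 
outlive the mutator.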

> conn.getBufferedMutator(tableName) leaks thread executors and other problems
> ----------------------------------------------------------------------------
>
>                 Key: HBASE-26088
>                 URL: https://issues.apache.org/jira/browse/HBASE-26088
>             Project: HBase
>          Issue Type: Bug
>          Components: Client
>    Affects Versions: 1.4.13, 2.4.4
>            Reporter: Whitney Jackson
>            Assignee: Rushabh Shah
>            Priority: Critical
>             Fix For: 2.5.0, 2.3.6, 2.4.5
>
>
> TL;DR: {{conn.getBufferedMutator(tableName)}} is dangerous in hbase client 
> 2.4.4 and doesn't match documented behavior in 1.4.13.
> To work around the problems until fixed do this:
> {code:java}
> var mySingletonPool = HTable.getDefaultExecutor(hbaseConf);
> var params = new BufferedMutatorParams(tableName);
> params.pool(mySingletonPool);
> var myMutator = conn.getBufferedMutator(params);
> {code}
> And avoid code like this:
> {code:java}
> var myMutator = conn.getBufferedMutator(tableName);
> {code}
> The full story:
> My application started leaking threads after upgrading from hbase client 
> 1.4.13 to 2.4.4. So much so that after less than a minute of runtime more 
> than 30k threads are leaked and all available virtual memory on the box (> 50 
> GB) is consumed. Other processes on the box start crashing with memory 
> allocation errors. Even running {{ls}} at the shell fails with OS resource 
> allocation failures.
> A thread dump after just a few seconds of runtime shows thousands of threads 
> like this:
> {code:java}
> "htable-pool-0" #8841 prio=5 os_prio=0 cpu=0.15ms elapsed=7.49s 
> tid=0x00007efb6d2a1000 nid=0x57d2 waiting on condition [0x00007ef8a6c38000]
>  java.lang.Thread.State: TIMED_WAITING (parking)
>  at jdk.internal.misc.Unsafe.park(java.base@11.0.6/Native Method)
>  - parking to wait for <0x00000007e7cd6188> (a 
> java.util.concurrent.SynchronousQueue$TransferStack)
>  at 
> java.util.concurrent.locks.LockSupport.parkNanos(java.base@11.0.6/LockSupport.java:234)
>  at 
> java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(java.base@11.0.6/SynchronousQueue.java:462)
>  at 
> java.util.concurrent.SynchronousQueue$TransferStack.transfer(java.base@11.0.6/SynchronousQueue.java:361)
>  at 
> java.util.concurrent.SynchronousQueue.poll(java.base@11.0.6/SynchronousQueue.java:937)
>  at 
> java.util.concurrent.ThreadPoolExecutor.getTask(java.base@11.0.6/ThreadPoolExecutor.java:1053)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.6/ThreadPoolExecutor.java:1114)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.6/ThreadPoolExecutor.java:628)
>  at java.lang.Thread.run(java.base@11.0.6/Thread.java:834)
> {code}
>  
> Note: All the threads are labeled {{htable-pool-0}}. That suggests we're 
> leaking thread executors not just threads. The {{htable-pool}} part indicates 
> the problem is to do with {{HTable.getDefaultExecutor(conf)}} and the only 
> part of my code that interacts with that is a call to 
> {{conn.getBufferedMutator(tableName)}}.
>  
> Looking at the hbase client code shows a few problems:
> 1) Neither 1.4.13 nor 2.4.4's behavior matches the documentation for 
> {{conn.getBufferedMutator(tableName)}} which says:
> {quote}This BufferedMutator will use the Connection's ExecutorService.
> {quote}
> That suggests some singleton thread executor is being used which is not the 
> case.
>  
> 2) Under 1.4.13 you get a new {{ThreadPoolExecutor}} for every 
> {{BufferedMutator}}. That's probably not what you want but you likely won't 
> notice. I didn't. It's a code path I hadn't profiled much.
>  
> 3) Under 2.4.4 you get a new {{ThreadPoolExecutor}} for every 
> {{BufferedMutator}} *and* that {{ThreadPoolExecutor}} *is not* cleaned up 
> after the {{Mutator}} is closed. Each completed {{ThreadPoolExecutor}} 
> carries with it one thread which hangs around until a timeout value which 
> defaults to 60 seconds.
> My application creates one {{BufferedMutator}} for every incoming stream and 
> there are lots of streams, some of them short-lived, so my code leaks 
> threads fast under 2.4.4.
> Here's the part where a new executor is created for every {{BufferedMutator}} 
> (it's similar for 1.4.13):
> [https://github.com/apache/hbase/blob/branch-2.4/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java#L420]
>  
> The reason for the leak in 2.4.4 is the should-we/shouldn't-we cleanup logic 
> added here:
> [https://github.com/apache/hbase/blob/branch-2.4/hbase-client/src/main/java/org/apache/hadoop/hbase/client/BufferedMutatorImpl.java#L104]
> That might be ok if {{pool}} was being initialized there but in the 
> {{conn.getBufferedMutator(tableName)}} code path it's not. {{pool}} is 
> initialized in {{conn.getBufferedMutator}} itself so the executor cleanup 
> code never runs.
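
The leak described in the quoted report can be reproduced in miniature 
without HBase. The sketch below is an approximation under stated 
assumptions: ExecutorLeakSketch, newPool and idleThreadsAfter are 
hypothetical names, and the pool settings (one worker, 60-second idle 
timeout, SynchronousQueue) are only loosely modeled on the thread dump and 
the timeout mentioned in the report, not copied from the client code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class ExecutorLeakSketch {
    // Assumed configuration: a single worker that idles for up to 60 seconds.
    static ThreadPoolExecutor newPool() {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            1, 1, 60, TimeUnit.SECONDS, new SynchronousQueue<>());
        pool.allowCoreThreadTimeOut(true);
        return pool;
    }

    // Simulates `streams` short-lived mutators, each with a private pool, and
    // returns how many worker threads are still alive once all are "closed".
    static int idleThreadsAfter(int streams, boolean shutdownOnClose) {
        List<ThreadPoolExecutor> pools = new ArrayList<>();
        try {
            for (int i = 0; i < streams; i++) {
                ThreadPoolExecutor pool = newPool();
                pools.add(pool);
                pool.submit(() -> { }).get(); // one unit of work per stream
                if (shutdownOnClose) {
                    pool.shutdown();
                    pool.awaitTermination(5, TimeUnit.SECONDS);
                }
            }
            int idle = 0;
            for (ThreadPoolExecutor pool : pools) {
                idle += pool.getPoolSize(); // workers currently held by the pool
            }
            return idle;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            // Clean up so the sketch itself does not keep the JVM alive.
            for (ThreadPoolExecutor pool : pools) {
                pool.shutdown();
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("without shutdown: " + idleThreadsAfter(50, false));
        System.out.println("with shutdown:    " + idleThreadsAfter(50, true));
    }
}
```

With 50 simulated streams, the run without shutdown should report 50 
lingering worker threads and the run with shutdown should report 0. Each 
lingering thread would otherwise wait out the idle timeout, which is why 
many short-lived mutators add up so quickly.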



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
