[ https://issues.apache.org/jira/browse/HBASE-26088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385527#comment-17385527 ]
Viraj Jasani commented on HBASE-26088:
--------------------------------------

Nice find and fix (y)

> conn.getBufferedMutator(tableName) leaks thread executors and other problems
> ----------------------------------------------------------------------------
>
>                 Key: HBASE-26088
>                 URL: https://issues.apache.org/jira/browse/HBASE-26088
>             Project: HBase
>          Issue Type: Bug
>          Components: Client
>    Affects Versions: 2.0.0
>            Reporter: Whitney Jackson
>            Assignee: Rushabh Shah
>            Priority: Critical
>             Fix For: 2.5.0, 2.3.6, 2.4.5
>
>
> TL;DR: {{conn.getBufferedMutator(tableName)}} is dangerous in hbase client
> 2.4.4 and doesn't match documented behavior in 1.4.13.
> To work around the problems until fixed do this:
> {code:java}
> var mySingletonPool = HTable.getDefaultExecutor(hbaseConf);
> var params = new BufferedMutatorParams(tableName);
> params.pool(mySingletonPool);
> var myMutator = conn.getBufferedMutator(params);
> {code}
> And avoid code like this:
> {code:java}
> var myMutator = conn.getBufferedMutator(tableName);
> {code}
> The full story:
> My application started leaking threads after upgrading from hbase client
> 1.4.13 to 2.4.4. So much so that after less than a minute of runtime more
> than 30k threads are leaked and all available virtual memory on the box (> 50
> GB) is consumed. Other processes on the box start crashing with memory
> allocation errors. Even running {{ls}} at the shell fails with OS resource
> allocation failures.
> A thread dump after just a few seconds of runtime shows thousands of threads
> like this:
> {code:java}
> "htable-pool-0" #8841 prio=5 os_prio=0 cpu=0.15ms elapsed=7.49s tid=0x00007efb6d2a1000 nid=0x57d2 waiting on condition [0x00007ef8a6c38000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
>     at jdk.internal.misc.Unsafe.park(java.base@11.0.6/Native Method)
>     - parking to wait for <0x00000007e7cd6188> (a java.util.concurrent.SynchronousQueue$TransferStack)
>     at java.util.concurrent.locks.LockSupport.parkNanos(java.base@11.0.6/LockSupport.java:234)
>     at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(java.base@11.0.6/SynchronousQueue.java:462)
>     at java.util.concurrent.SynchronousQueue$TransferStack.transfer(java.base@11.0.6/SynchronousQueue.java:361)
>     at java.util.concurrent.SynchronousQueue.poll(java.base@11.0.6/SynchronousQueue.java:937)
>     at java.util.concurrent.ThreadPoolExecutor.getTask(java.base@11.0.6/ThreadPoolExecutor.java:1053)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.6/ThreadPoolExecutor.java:1114)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.6/ThreadPoolExecutor.java:628)
>     at java.lang.Thread.run(java.base@11.0.6/Thread.java:834)
> {code}
>
> Note: All the threads are labeled {{htable-pool-0}}. That suggests we're
> leaking thread executors not just threads. The {{htable-pool}} part indicates
> the problem is to do with {{HTable.getDefaultExecutor(conf)}} and the only
> part of my code that interacts with that is a call to
> {{conn.getBufferedMutator(tableName)}}.
>
> Looking at the hbase client code shows a few problems:
> 1) Neither 1.4.13 nor 2.4.4's behavior matches the documentation for
> {{conn.getBufferedMutator(tableName)}} which says:
> {quote}This BufferedMutator will use the Connection's ExecutorService.
> {quote}
> That suggests some singleton thread executor is being used which is not the
> case.
>
> 2) Under 1.4.13 you get a new {{ThreadPoolExecutor}} for every
> {{BufferedMutator}}. That's probably not what you want but you likely won't
> notice. I didn't. It's a code path I hadn't profiled much.
>
> 3) Under 2.4.4 you get a new {{ThreadPoolExecutor}} for every
> {{BufferedMutator}} *and* that {{ThreadPoolExecutor}} *is not* cleaned up
> after the {{Mutator}} is closed. Each completed {{ThreadPoolExecutor}}
> carries with it one thread, which hangs around until a timeout that
> defaults to 60 seconds expires.
> My application creates one {{BufferedMutator}} for every incoming stream and
> there are lots of streams, some of them short-lived, so my code leaks
> threads fast under 2.4.4.
> Here's the part where a new executor is created for every {{BufferedMutator}}
> (it's similar for 1.4.13):
> [https://github.com/apache/hbase/blob/branch-2.4/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java#L420]
>
> The reason for the leak in 2.4.4 is the should-we/shouldn't-we cleanup logic
> added here:
> [https://github.com/apache/hbase/blob/branch-2.4/hbase-client/src/main/java/org/apache/hadoop/hbase/client/BufferedMutatorImpl.java#L104]
> That might be ok if {{pool}} was being initialized there but in the
> {{conn.getBufferedMutator(tableName)}} code path it's not. {{pool}} is
> initialized in {{conn.getBufferedMutator}} itself so the executor cleanup
> code never runs.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
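
The leak mechanism described in point 3 of the report can be reproduced without HBase. What follows is a minimal sketch using only java.util.concurrent, not HBase code. The names ExecutorLeakSketch, newPool and idleWorkersAfter are hypothetical, and the pool settings (one worker, 60-second keep-alive, a thread counter that restarts at 0 for every pool) are assumptions chosen to mirror the behavior the report describes, not values copied from HTable.getDefaultExecutor. It creates one pool per simulated stream, runs a single task in each, and then counts how many worker threads are still parked.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ExecutorLeakSketch {

    // Hypothetical stand-in for a per-mutator pool (an assumption, not HBase code):
    // one worker, 60 s keep-alive, and a thread counter that restarts at 0 for
    // every pool, so every worker ends up named "demo-pool-0".
    static ThreadPoolExecutor newPool() {
        AtomicInteger counter = new AtomicInteger();
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            1, 1, 60, TimeUnit.SECONDS, new SynchronousQueue<>(),
            r -> {
                Thread t = new Thread(r, "demo-pool-" + counter.getAndIncrement());
                t.setDaemon(true); // let the JVM exit while workers are still parked
                return t;
            });
        pool.allowCoreThreadTimeOut(true); // an idle worker exits only after the keep-alive
        return pool;
    }

    // Simulates `streams` short-lived writers, each with its own pool and one task.
    // Returns how many worker threads are still held by those pools afterwards.
    static int idleWorkersAfter(int streams, boolean shutDownPools) throws InterruptedException {
        List<ThreadPoolExecutor> pools = new ArrayList<>();
        for (int i = 0; i < streams; i++) {
            ThreadPoolExecutor pool = newPool();
            CountDownLatch done = new CountDownLatch(1);
            pool.execute(done::countDown);
            done.await(); // the task has run; the worker now parks waiting for more work
            if (shutDownPools) {
                pool.shutdown();
                pool.awaitTermination(5, TimeUnit.SECONDS);
            }
            pools.add(pool);
        }
        int idle = 0;
        for (ThreadPoolExecutor pool : pools) {
            idle += pool.getPoolSize(); // 1 while a worker is parked, 0 once terminated
        }
        return idle;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("without shutdown: " + idleWorkersAfter(50, false));
        System.out.println("with shutdown:    " + idleWorkersAfter(50, true));
    }
}
```

Run as written, the first count should equal the number of pools created and the second should be zero. That matches the report: a pool that is never shut down keeps one parked thread until its keep-alive expires, and because each pool numbers its own threads, every leaked thread carries the same name.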