[ https://issues.apache.org/jira/browse/HBASE-22867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16909059#comment-16909059 ]
Duo Zhang commented on HBASE-22867:
-----------------------------------

OK, both fork and submit will try to create new workers if too few workers are active. The code is very 'Doug Lea', so it is not easy to fully understand, but at least the comments say this...

So I do not think we should use ForkJoinPool here. This is not an in-memory computation: some tasks may be pending for a long time, and that introduces lots of threads...

> The ForkJoinPool in CleanerChore will spawn thousands of threads in our cluster with thousands table
> ----------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-22867
>                 URL: https://issues.apache.org/jira/browse/HBASE-22867
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Zheng Hu
>            Priority: Critical
>         Attachments: 31162.stack.1
>
>
> The thousands of spawned threads make the safepoint cost 80+s in our Master JVM process.
> {code}
> 2019-08-15,19:35:35,861 INFO [main-SendThread(zjy-hadoop-prc-zk02.bj:11000)] org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 82260ms for sessionid 0x1691332e2d3aae5, closing socket connection and attempting reconnect
> {code}
> The stdout from the JVM (it shows 9126 threads and a sync cost of 80+s):
> {code}
> vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
> 32358.859: ForceAsyncSafepoint [ 9126 67 474 ] [ 1 28 86596 87 101 ] 0
> {code}
> Also we got the jstack:
> {code}
> $ cat 31162.stack.1 | grep 'ForkJoinPool-1-worker' | wc -l
> 8648
> {code}
> It's a dangerous bug, so I am marking it as a blocker.

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
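To illustrate the point in the comment above about blocking tasks, here is a minimal sketch of the alternative it hints at: running long-pending work on a pool with a hard thread cap. This is not the HBase fix and not CleanerChore code. The class name `BoundedCleanerSketch`, the method `peakConcurrency`, and the `Thread.sleep` stand-in for a slow filesystem call are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; none of these names come from HBase.
class BoundedCleanerSketch {

    /**
     * Runs taskCount simulated blocking tasks on a fixed pool of poolSize
     * threads and returns the largest number of tasks seen running at once.
     */
    static int peakConcurrency(int poolSize, int taskCount) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        List<Future<?>> futures = new ArrayList<>();
        try {
            for (int i = 0; i < taskCount; i++) {
                futures.add(pool.submit(() -> {
                    int now = running.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    try {
                        Thread.sleep(5); // stand-in for a slow, blocking call
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        running.decrementAndGet();
                    }
                }));
            }
            // Wait for every task; extra tasks queue instead of adding threads.
            for (Future<?> f : futures) {
                f.get();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("peak concurrent tasks: " + peakConcurrency(4, 100));
    }
}
```

With a fixed pool, pending tasks wait in the queue, so the number of live worker threads cannot exceed `poolSize` no matter how long each task blocks. The trade-off is that tasks must not wait on subtasks submitted to the same pool, since that can deadlock once all workers are occupied.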