[ https://issues.apache.org/jira/browse/HBASE-22867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16911127#comment-16911127 ]
Reid Chan commented on HBASE-22867:
-----------------------------------

On a first skim: as I said, the root cause is not the choice of FJP or TP (but I do agree the `cap` of FJP is a concern).
The current PR will just end up piling up the BlockingQueue of TP with SnapshotHFileCleaner#getDeletableFiles tasks.

> The ForkJoinPool in CleanerChore will spawn thousands of threads in our cluster with thousands table
> ----------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-22867
>                 URL: https://issues.apache.org/jira/browse/HBASE-22867
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Zheng Hu
>            Assignee: Zheng Hu
>            Priority: Critical
>         Attachments: 31162.stack.1
>
>
> The thousands of spawned threads make the safepoint cost 80+s in our Master JVM process.
> {code}
> 2019-08-15,19:35:35,861 INFO [main-SendThread(zjy-hadoop-prc-zk02.bj:11000)] org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 82260ms for sessionid 0x1691332e2d3aae5, closing socket connection and attempting reconnect
> {code}
> The stdout from the JVM (it shows 9126 threads and a sync cost of 80+s):
> {code}
>          vmop                  [threads: total initially_running wait_to_block]  [time: spin block  sync cleanup vmop] page_trap_count
> 32358.859: ForceAsyncSafepoint [          9126                67           474]  [         1    28 86596      87  101] 0
> {code}
> Also we got the jstack:
> {code}
> $ cat 31162.stack.1 | grep 'ForkJoinPool-1-worker' | wc -l
> 8648
> {code}
> It's a dangerous bug, so make it a blocker.

--
This message was sent by Atlassian Jira
(v8.3.2#803003)
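
To illustrate the point in the comment about tasks piling up in the TP's BlockingQueue, here is a minimal, self-contained sketch. It is not HBase code: the class `QueuePileUpDemo` and the method `run` are hypothetical names, and it assumes a fixed-size `ThreadPoolExecutor` backed by an unbounded `LinkedBlockingQueue`, which may not match how the PR configures its pool. The blocking task only stands in for a slow SnapshotHFileCleaner#getDeletableFiles call.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; not part of HBase.
class QueuePileUpDemo {

  /**
   * Submits `tasks` blocking tasks to a fixed pool of `threads` workers and
   * returns {live worker threads, tasks still waiting in the queue}.
   */
  static int[] run(int threads, int tasks) {
    // core == max with an unbounded queue: the pool never grows past `threads`.
    ThreadPoolExecutor pool = new ThreadPoolExecutor(
        threads, threads, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
    CountDownLatch started = new CountDownLatch(Math.min(threads, tasks));
    CountDownLatch release = new CountDownLatch(1);
    try {
      for (int i = 0; i < tasks; i++) {
        pool.execute(() -> {
          started.countDown();
          try {
            release.await(); // stand-in for a slow cleaner call (assumption)
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          }
        });
      }
      started.await(); // wait until every worker is occupied
      return new int[] {pool.getPoolSize(), pool.getQueue().size()};
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } finally {
      release.countDown(); // let all tasks finish
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    int[] r = run(2, 100);
    System.out.println("threads=" + r[0] + " queued=" + r[1]);
  }
}
```

With 2 workers and 100 submitted tasks, the thread count stays at 2 while the other 98 tasks wait in the queue. The thread count is capped, but the backlog simply moves into the queue if the tasks themselves stay slow.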