[ https://issues.apache.org/jira/browse/HBASE-25899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xiaolin Ha updated HBASE-25899:
-------------------------------
    Attachment: 78631.jstack

> Improve efficiency of SnapshotHFileCleaner
> ------------------------------------------
>
>                 Key: HBASE-25899
>                 URL: https://issues.apache.org/jira/browse/HBASE-25899
>             Project: HBase
>          Issue Type: Improvement
>          Components: master
>    Affects Versions: 3.0.0-alpha-1, 2.0.0
>            Reporter: Xiaolin Ha
>            Assignee: Xiaolin Ha
>            Priority: Major
>         Attachments: 78631.jstack
>
>
> We hit the same problem of thousands of threads described in HBASE-22867,
> but since that issue the cleaner has become even less efficient.
> From the jstack we can see that most dir-scan threads are blocked at
> SnapshotHFileCleaner#getDeletableFiles:
> {code:java}
> "dir-scan-pool-19" #694 daemon prio=5 os_prio=0 tid=0x0000000002ab1800 nid=0x26a7e waiting for monitor entry [0x00007fb0a9913000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at org.apache.hadoop.hbase.master.snapshot.SnapshotHFileCleaner.getDeletableFiles(SnapshotHFileCleaner.java:74)
>         - waiting to lock <0x00007fb148737048> (a org.apache.hadoop.hbase.master.snapshot.SnapshotHFileCleaner)
>         at org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteFiles(CleanerChore.java:498)
>         at org.apache.hadoop.hbase.master.cleaner.CleanerChore.lambda$traverseAndDelete$1(CleanerChore.java:246)
>         at org.apache.hadoop.hbase.master.cleaner.CleanerChore$$Lambda$41/1187372779.act(Unknown Source)
>         at org.apache.hadoop.hbase.master.cleaner.CleanerChore.deleteAction(CleanerChore.java:358)
>         at org.apache.hadoop.hbase.master.cleaner.CleanerChore.traverseAndDelete(CleanerChore.java:246)
>         at org.apache.hadoop.hbase.master.cleaner.CleanerChore.lambda$null$2(CleanerChore.java:255)
>         at org.apache.hadoop.hbase.master.cleaner.CleanerChore$$Lambda$38/2003131501.run(Unknown Source)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745){code}
> and all the HFileCleaner threads are waiting on the delete task queue:
> {code:java}
> "gha-data-hbase0002:16000.activeMasterManager-HFileCleaner.large.2-1621210982419" #358 daemon prio=5 os_prio=0 tid=0x00007fb967fc0000 nid=0x266f2 waiting on condition [0x00007fb0c57d6000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for <0x00007fb1486db9f0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
>         at org.apache.hadoop.hbase.util.StealJobQueue.take(StealJobQueue.java:106)
>         at org.apache.hadoop.hbase.master.cleaner.HFileCleaner.consumerLoop(HFileCleaner.java:264)
>         at org.apache.hadoop.hbase.master.cleaner.HFileCleaner$1.run(HFileCleaner.java:233)
> {code}
> So file scanning needs to be sped up. But since getDeletableFiles is a
> synchronized method, increasing the number of dir-scan threads cannot
> solve this problem.
> After looking through the code in SnapshotHFileCleaner and
> SnapshotFileCache, I think the lock granularity in them should be optimized.
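
To illustrate what finer lock granularity could look like, here is a minimal sketch. It is not the HBase code and not the patch for this issue. The class FineGrainedCleanerSketch, its refresh method, and the use of plain strings as file names are all hypothetical stand-ins. It assumes the common case is a read-only membership check against a cached set of referenced files, so scan threads can share a read lock and only an occasional cache refresh needs exclusive access.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical stand-in for a snapshot-aware file cleaner; not HBase code. */
public class FineGrainedCleanerSketch {
    // One read-write lock instead of a synchronized method: readers do not
    // block each other, only a refresh (writer) excludes everyone else.
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    // Names of files still referenced by some snapshot (assumed representation).
    private Set<String> referenced = new HashSet<>();

    /** Rare path: swap in a new set of referenced files under the write lock. */
    public void refresh(Set<String> latest) {
        lock.writeLock().lock();
        try {
            referenced = new HashSet<>(latest);
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Hot path: many scan threads may hold the read lock at the same time. */
    public List<String> getDeletableFiles(List<String> candidates) {
        lock.readLock().lock();
        try {
            List<String> deletable = new ArrayList<>();
            for (String name : candidates) {
                if (!referenced.contains(name)) {
                    deletable.add(name);
                }
            }
            return deletable;
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        FineGrainedCleanerSketch cleaner = new FineGrainedCleanerSketch();
        cleaner.refresh(new HashSet<>(Arrays.asList("hfile-a")));
        // hfile-a is still referenced, so only hfile-b is reported as deletable.
        System.out.println(cleaner.getDeletableFiles(Arrays.asList("hfile-a", "hfile-b")));
    }
}
```

With a synchronized method, every dir-scan thread in the jstack above queues on the same monitor. With a shared read lock, they would only wait while a refresh is in progress. Whether this fits the real SnapshotFileCache depends on how often the cache has to be refreshed, which this sketch does not model.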