[
https://issues.apache.org/jira/browse/HDFS-16438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18039026#comment-18039026
]
ASF GitHub Bot commented on HDFS-16438:
---------------------------------------
github-actions[bot] commented on PR #3928:
URL: https://github.com/apache/hadoop/pull/3928#issuecomment-3544466719
We're closing this stale PR because it has been open for 100 days with no
activity. This isn't a judgement on the merit of the PR in any way. It's just a
way of keeping the PR queue manageable.
If you feel like this was a mistake, or you would like to continue working
on it, please feel free to re-open it and ask for a committer to remove the
stale tag and review again.
Thanks all for your contribution.
> Avoid holding read locks for a long time when scanDatanodeStorage
> -----------------------------------------------------------------
>
> Key: HDFS-16438
> URL: https://issues.apache.org/jira/browse/HDFS-16438
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Tao Li
> Assignee: Tao Li
> Priority: Major
> Labels: pull-request-available
> Attachments: image-2022-01-25-23-18-30-275.png
>
> Time Spent: 3h 40m
> Remaining Estimate: 0h
>
> During decommission, when the {*}DatanodeAdminBackoffMonitor{*} is used, it
> performs a heavy operation: {*}scanDatanodeStorage{*}. If a storage holds a
> large number of blocks (more than 500,000) and GC performance is also poor,
> the scan can hold the *read lock* for a long time, so we should optimize it.
> !image-2022-01-25-23-18-30-275.png|width=764,height=193!
> {code:java}
> 2021-12-22 07:49:01,279 INFO namenode.FSNamesystem
> (FSNamesystemLock.java:readUnlock(220)) - FSNamesystem scanDatanodeStorage
> read lock held for 5491 ms via
> java.lang.Thread.getStackTrace(Thread.java:1552)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.readUnlock(FSNamesystemLock.java:222)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.readUnlock(FSNamesystem.java:1641)
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminBackoffMonitor.scanDatanodeStorage(DatanodeAdminBackoffMonitor.java:646)
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminBackoffMonitor.checkForCompletedNodes(DatanodeAdminBackoffMonitor.java:417)
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminBackoffMonitor.check(DatanodeAdminBackoffMonitor.java:300)
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminBackoffMonitor.run(DatanodeAdminBackoffMonitor.java:201)
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> java.lang.Thread.run(Thread.java:745)
> Number of suppressed read-lock reports: 0
> Longest read-lock held interval: 5491 {code}
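The log above shows a single scanDatanodeStorage pass holding the FSNamesystem read lock for 5491 ms. Below is a minimal sketch of one way to avoid such long holds: break the scan into chunks and yield the read lock between chunks. The Namesystem interface and the BLOCKS_PER_LOCK_CHUNK constant are simplified stand-ins introduced for illustration; this is not the actual HDFS patch.
{code:java}
import java.util.Iterator;
import java.util.function.LongConsumer;

/**
 * Sketch of the chunked-locking idea behind HDFS-16438: release and
 * re-acquire the namesystem read lock every few thousand blocks so a
 * single storage scan cannot pin the lock for seconds. "Namesystem" is a
 * simplified stand-in for the real FSNamesystem lock API, and
 * BLOCKS_PER_LOCK_CHUNK is a hypothetical tuning knob.
 */
public class ChunkedStorageScanSketch {

  /** Stand-in for the FSNamesystem read-lock API. */
  interface Namesystem {
    void readLock();
    void readUnlock(String opName);
  }

  private static final int BLOCKS_PER_LOCK_CHUNK = 1000;

  /** Scan all blocks of one storage, yielding the read lock between chunks. */
  static void scanStorage(Namesystem ns, Iterator<Long> blockIds,
                          LongConsumer checkBlock) {
    int scannedInChunk = 0;
    ns.readLock();
    try {
      while (blockIds.hasNext()) {
        checkBlock.accept(blockIds.next());
        if (++scannedInChunk >= BLOCKS_PER_LOCK_CHUNK && blockIds.hasNext()) {
          // Drop the lock briefly so other namenode operations can make
          // progress; the scan must tolerate the block map changing
          // between chunks.
          ns.readUnlock("scanDatanodeStorage");
          scannedInChunk = 0;
          ns.readLock();
        }
      }
    } finally {
      ns.readUnlock("scanDatanodeStorage");
    }
  }
}
{code}
How the real patch iterates a DatanodeStorageInfo's blocks, and whether the chunk size is fixed or configurable, is not shown here; see the linked PR for the actual change.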