[
https://issues.apache.org/jira/browse/HDFS-16657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18036654#comment-18036654
]
ASF GitHub Bot commented on HDFS-16657:
---------------------------------------
github-actions[bot] closed pull request #4558: HDFS-16657 Changing pool-level
lock to volume-level lock for invalida…
URL: https://github.com/apache/hadoop/pull/4558
> Changing pool-level lock to volume-level lock for invalidation of blocks
> ------------------------------------------------------------------------
>
> Key: HDFS-16657
> URL: https://issues.apache.org/jira/browse/HDFS-16657
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Yuanbo Liu
> Priority: Major
> Labels: pull-request-available
> Attachments: image-2022-07-13-10-25-37-383.png,
> image-2022-07-13-10-27-01-386.png, image-2022-07-13-10-27-44-258.png
>
> Time Spent: 1h
> Remaining Estimate: 0h
>
> Recently we have seen DataNode (DN) heartbeats become slow in a very busy
> cluster. Here is the chart:
> !image-2022-07-13-10-25-37-383.png|width=665,height=245!
>
> After taking a jstack dump of the DN, we found that the DN heartbeat was stuck
> in block invalidation:
> !image-2022-07-13-10-27-01-386.png|width=658,height=308!
> !image-2022-07-13-10-27-44-258.png|width=502,height=325!
> The key code is:
> {code:java}
> try {
>   File blockFile = new File(info.getBlockURI());
>   if (blockFile != null && blockFile.getParentFile() == null) {
>     errors.add("Failed to delete replica " + invalidBlks[i]
>         + ". Parent not found for block file: " + blockFile);
>     continue;
>   }
> } catch (IllegalArgumentException e) {
>   LOG.warn("Parent directory check failed; replica " + info
>       + " is not backed by a local file");
> }
> {code}
> The DN is trying to locate the parent path of the block file, so there is disk
> I/O inside the pool-level lock. When the disk becomes very busy with high I/O
> wait, all pending threads are blocked by the pool-level lock and heartbeat
> latency rises. We propose changing the pool-level lock to a volume-level lock
> for block invalidation.
> cc: [~hexiaoqiao] [~Aiphag0]
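
The locking change proposed in the description above can be illustrated with a small, self-contained sketch. This is not the Hadoop implementation or the code from the pull request. The class `VolumeLockSketch`, its methods, and the identifiers `BP-1`, `vol-1`, and `vol-2` are hypothetical names chosen for illustration. The sketch assumes one `ReentrantLock` per (block pool, volume) pair, so a thread stalled on one busy disk does not block work on another volume in the same pool.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class VolumeLockSketch {
    // Hypothetical registry: one lock per (block pool, volume) pair,
    // created lazily on first use.
    private final ConcurrentHashMap<String, ReentrantLock> volumeLocks =
        new ConcurrentHashMap<>();

    public ReentrantLock lockFor(String bpid, String volumeId) {
        return volumeLocks.computeIfAbsent(
            bpid + "/" + volumeId, k -> new ReentrantLock());
    }

    // Runs the action while holding only the lock of the given volume.
    public void withVolumeLock(String bpid, String volumeId, Runnable action) {
        ReentrantLock lock = lockFor(bpid, volumeId);
        lock.lock();
        try {
            action.run();
        } finally {
            lock.unlock();
        }
    }

    // Returns true if work on a second volume completes while the lock of
    // the first volume is still held by a simulated slow disk operation.
    public static boolean otherVolumeProceeds() {
        VolumeLockSketch locks = new VolumeLockSketch();
        CountDownLatch slowHoldsLock = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        CountDownLatch fastDone = new CountDownLatch(1);

        // Simulates an invalidation stuck on a busy disk (vol-1).
        Thread slow = new Thread(() ->
            locks.withVolumeLock("BP-1", "vol-1", () -> {
                slowHoldsLock.countDown();
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }));
        // Simulates an invalidation on a different volume (vol-2).
        Thread fast = new Thread(() ->
            locks.withVolumeLock("BP-1", "vol-2", fastDone::countDown));

        try {
            slow.start();
            slowHoldsLock.await();   // vol-1 lock is now held
            fast.start();
            boolean proceeded = fastDone.await(5, TimeUnit.SECONDS);
            release.countDown();     // let the slow thread finish
            slow.join();
            fast.join();
            return proceeded;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            release.countDown();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("other volume proceeded: " + otherVolumeProceeds());
    }
}
```

With a single pool-level lock, the second thread would have to wait until the first one released it. Keying the lock by volume narrows the critical section to the disk that is actually slow, which is the effect the issue aims for.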
--
This message was sent by Atlassian Jira
(v8.20.10#820010)