yuanboliu opened a new pull request, #4558:
URL: https://github.com/apache/hadoop/pull/4558
The key code is:

```java
try {
  // Executed while the block-pool-level lock is held.
  // Per this PR, resolving the block file and its parent involves disk I/O.
  File blockFile = new File(info.getBlockURI());
  if (blockFile != null && blockFile.getParentFile() == null) {
    errors.add("Failed to delete replica " + invalidBlks[i]
        + ". Parent not found for block file: " + blockFile);
    continue;
  }
} catch (IllegalArgumentException e) {
  LOG.warn("Parent directory check failed; replica " + info
      + " is not backed by a local file");
}
```

Here the DataNode resolves the parent path of each block file, so disk I/O happens while the block-pool-level lock is held. When the disk is very busy (high I/O wait), every other thread waiting on that lock is blocked, and heartbeat processing time rises accordingly.

We propose replacing the block-pool-level lock with a volume-level lock for block invalidation.
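The proposal can be illustrated with a minimal, self-contained sketch. It assumes a two-level scheme: invalidation takes the block-pool lock in shared (read) mode and takes only the affected volume's lock exclusively. With that arrangement, a slow disk check on one volume no longer blocks invalidation on other volumes in the same pool. `LockManager`, `invalidatePoolLevel`, `invalidateVolumeLevel`, the pool id `BP-1` and the volume names are all hypothetical. They are not Hadoop's actual dataset-lock API. The slow disk is simulated with a latch.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical sketch: pool-level vs. volume-level locking for block invalidation. */
public class VolumeLockSketch {
  /** Hypothetical lock registry: one RW lock per block pool, one per (pool, volume). */
  static final class LockManager {
    private final Map<String, ReentrantReadWriteLock> poolLocks = new ConcurrentHashMap<>();
    private final Map<String, ReentrantReadWriteLock> volumeLocks = new ConcurrentHashMap<>();

    ReentrantReadWriteLock poolLock(String bpid) {
      return poolLocks.computeIfAbsent(bpid, k -> new ReentrantReadWriteLock());
    }

    ReentrantReadWriteLock volumeLock(String bpid, String volume) {
      return volumeLocks.computeIfAbsent(bpid + "/" + volume, k -> new ReentrantReadWriteLock());
    }
  }

  /** Current behaviour: the whole block pool is locked exclusively around the disk check. */
  static void invalidatePoolLevel(LockManager m, String bpid, Runnable diskCheck) {
    ReentrantReadWriteLock.WriteLock w = m.poolLock(bpid).writeLock();
    w.lock();
    try {
      diskCheck.run();
    } finally {
      w.unlock();
    }
  }

  /** Proposed behaviour: pool lock shared, only the replica's volume locked exclusively. */
  static void invalidateVolumeLevel(LockManager m, String bpid, String volume, Runnable diskCheck) {
    ReentrantReadWriteLock.ReadLock r = m.poolLock(bpid).readLock();
    r.lock();
    try {
      ReentrantReadWriteLock.WriteLock w = m.volumeLock(bpid, volume).writeLock();
      w.lock();
      try {
        diskCheck.run();
      } finally {
        w.unlock();
      }
    } finally {
      r.unlock();
    }
  }

  /** Returns true if invalidation on "disk2" completes while "disk1" is stuck on slow I/O. */
  static boolean otherVolumeProceeds(boolean volumeLevel) throws InterruptedException {
    LockManager m = new LockManager();
    String bpid = "BP-1"; // hypothetical block pool id
    CountDownLatch slowDiskEntered = new CountDownLatch(1);
    CountDownLatch releaseSlowDisk = new CountDownLatch(1);
    // Simulates a disk check that blocks on high I/O wait until released.
    Runnable slowCheck = () -> {
      slowDiskEntered.countDown();
      try {
        releaseSlowDisk.await();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    };
    Thread busy = new Thread(() -> {
      if (volumeLevel) invalidateVolumeLevel(m, bpid, "disk1", slowCheck);
      else invalidatePoolLevel(m, bpid, slowCheck);
    });
    busy.start();
    slowDiskEntered.await(); // the slow check now holds its lock(s)

    CountDownLatch otherDone = new CountDownLatch(1);
    Runnable fastCheck = otherDone::countDown;
    Thread other = new Thread(() -> {
      if (volumeLevel) invalidateVolumeLevel(m, bpid, "disk2", fastCheck);
      else invalidatePoolLevel(m, bpid, fastCheck);
    });
    other.start();
    boolean proceeded = otherDone.await(500, TimeUnit.MILLISECONDS);

    releaseSlowDisk.countDown(); // let the slow disk finish so both threads exit
    busy.join();
    other.join();
    return proceeded;
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("pool-level lock, other volume proceeds: " + otherVolumeProceeds(false));
    System.out.println("volume-level lock, other volume proceeds: " + otherVolumeProceeds(true));
  }
}
```

Under the pool-level scheme, the second invalidation cannot start until the slow disk finishes. Under the volume-level scheme, it is expected to complete immediately, because the shared pool lock only excludes operations that need the whole pool exclusively. Which operations would still need the exclusive pool lock is a design question this sketch does not settle.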