[ https://issues.apache.org/jira/browse/HDFS-2379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13116304#comment-13116304 ]
Todd Lipcon commented on HDFS-2379:
-----------------------------------

As discussed in the above-referenced JIRA, I think we can do something like the following pseudocode:

{code}
// ignore any file-not-founds we get due to concurrent FS modifications
Set<Block> blocksFoundByScan = inconsistentScanVolume();

synchronized (volume) {
  Set<Block> missingFromScan = Sets.difference(volumeMap.keySet(), blocksFoundByScan);
  Set<Block> missingFromMem = Sets.difference(blocksFoundByScan, volumeMap.keySet());

  for (Block b : missingFromScan) {
    // block is in memory but not in scan
    if (b exists on disk) {
      // it got added after we scanned that part of the tree! add it to block report
    }
  }

  for (Block b : missingFromMem) {
    // block was on disk but not in memory
    if (b no longer exists on disk) {
      // remove from block report - it was deleted after we scanned that part
    }
  }
}
{code}

Anyone see a reason why this wouldn't work? Basically the idea is to do a "rough sketch" scan first, then anywhere we detect inconsistency, we touch it up, while holding the lock.

> 0.20: Allow block reports to proceed without holding FSDataset lock
> -------------------------------------------------------------------
>
>                 Key: HDFS-2379
>                 URL: https://issues.apache.org/jira/browse/HDFS-2379
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: data-node
>    Affects Versions: 0.20.206.0
>            Reporter: Todd Lipcon
>            Priority: Critical
>
> As disks are getting larger and more plentiful, we're seeing DNs with
> multiple millions of blocks on a single machine. When page cache space is
> tight, block reports can take multiple minutes to generate. Currently, during
> the scanning of the data directories to generate a report, the FSVolumeSet
> lock is held. This causes writes and reads to block, timeout, etc, causing
> big problems especially for clients like HBase.
> This JIRA is to explore some of the ideas originally discussed in HADOOP-4584
> for the 0.20.20x series.
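
For readers who want to try the idea from the comment outside of Hadoop, here is a minimal, self-contained Java sketch of the "rough scan first, touch up under the lock" reconciliation. Everything in it is an assumption made for illustration: the class `BlockReportSketch`, its methods, the use of plain `long` block IDs, and the `existsOnDisk` predicate are hypothetical stand-ins, not the DataNode's actual types or API. It also uses only `java.util` rather than Guava's `Sets.difference`.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.LongPredicate;

/** Hypothetical sketch; none of these names come from the Hadoop code base. */
class BlockReportSketch {
    private final Object volumeLock = new Object();
    // In-memory view of the volume's block IDs, guarded by volumeLock.
    private final Set<Long> volumeMap = new HashSet<>();

    void addBlock(long id) {
        synchronized (volumeLock) { volumeMap.add(id); }
    }

    /**
     * blocksFoundByScan is the result of a directory scan taken WITHOUT the
     * lock, so it may be stale. existsOnDisk stands in for a per-block file
     * check, which is cheap compared with rescanning the whole volume.
     */
    Set<Long> buildReport(Set<Long> blocksFoundByScan, LongPredicate existsOnDisk) {
        Set<Long> report = new HashSet<>(blocksFoundByScan);
        synchronized (volumeLock) {
            // In memory but not in the scan: if the file is there now, the
            // block was added after that part of the tree was scanned.
            for (long id : volumeMap) {
                if (!blocksFoundByScan.contains(id) && existsOnDisk.test(id)) {
                    report.add(id);
                }
            }
            // In the scan but not in memory: if the file is gone now, the
            // block was deleted after that part of the tree was scanned.
            for (long id : blocksFoundByScan) {
                if (!volumeMap.contains(id) && !existsOnDisk.test(id)) {
                    report.remove(id);
                }
            }
        }
        return report;
    }

    public static void main(String[] args) {
        BlockReportSketch volume = new BlockReportSketch();
        for (long id : new long[] {1, 2, 4, 5}) {
            volume.addBlock(id);
        }
        Set<Long> disk = Set.of(1L, 2L, 4L, 6L);      // current on-disk state
        Set<Long> staleScan = Set.of(1L, 2L, 3L, 6L); // earlier lock-free scan
        System.out.println(volume.buildReport(staleScan, id -> disk.contains(id)));
    }
}
```

In this sketch the lock is held only while the two differences are checked, so the time spent under it should scale with the number of blocks in memory and in the scan plus a file check for each mismatch, not with a full directory walk. Whether that holds for a real volume depends on how the existence check is implemented.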