[ https://issues.apache.org/jira/browse/HDFS-6833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14240384#comment-14240384 ]
Yongjun Zhang commented on HDFS-6833:
-------------------------------------

Hi [~sinchii],

Thanks for working on the new rev. It doesn't apply now due to the latest trunk changes. I looked at rev10 and have the following comments:

* {{deletingBlock = new ReplicaMap(this);}} currently uses the FsDataset object ("this") as the mutex. We should replace "this" with a dedicated lock object for synchronizing access to {{deletingBlock}}, so that we avoid holding the FsDataset object's lock.
* There is no need for {{private boolean scanning}} in {{DirectoryScanner}}.
* There is no need for the local {{Set<Long> deletingBlockIds}} in the {{scan()}} method, and we don't want to do the following, because we already do it from the other place after deleting the block files:
{code}
if (dataset.getNumDeletingBlocks(bpid) > 0) {
  dataset.removeDeletedBlocks(bpid, deletingBlockIds, true);
}
{code}
* The log
{code}
statsRecord.deletingBlocks++;
LOG.info("Block file " + blockpoolReport[d].getBlockFile()
    + " is to be deleted");
{code}
can be moved to the following place:
{code}
1570   ReplicaInfo removing = volumeMap.remove(bpid, invalidBlks[i]);
1571   if (datanode.isDirectoryScannerInited()) {
1572     deletingBlock.add(bpid, removing);
1573   }
{code}
And we should not check whether the directory scanner is running in the above code, because the directory scanner can start at any time. E.g., change the above 1570-1572 code to something like:
{code}
ReplicaInfo removing = volumeMap.remove(bpid, invalidBlks[i]);
deletingBlock.add(bpid, removing);
statsRecord.deletingBlocks++;
LOG.info("Block file " + blockpoolReport[d].getBlockFile()
    + " is to be deleted");
==> need to revise to only print at debug level?
{code}
* The following code needs to be removed:
{code}
} else {
  LOG.info("Block file " + blockpoolReport[d].getBlockFile()
      + " is to be deleted");
  statsRecord.deletingBlocks++;
  deletingBlockIds.add(info.getBlockId());
}
{code}

> DirectoryScanner should not register a deleting block with memory of DataNode
> -----------------------------------------------------------------------------
>
>                 Key: HDFS-6833
>                 URL: https://issues.apache.org/jira/browse/HDFS-6833
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 3.0.0, 2.5.0, 2.5.1
>            Reporter: Shinichi Yamashita
>            Assignee: Shinichi Yamashita
>            Priority: Critical
>         Attachments: HDFS-6833-10.patch, HDFS-6833-6-2.patch, HDFS-6833-6-3.patch, HDFS-6833-6.patch, HDFS-6833-7-2.patch, HDFS-6833-7.patch, HDFS-6833.8.patch, HDFS-6833.9.patch, HDFS-6833.patch, HDFS-6833.patch, HDFS-6833.patch, HDFS-6833.patch, HDFS-6833.patch
>
>
> When a block is deleted on a DataNode, the following messages are usually output:
> {code}
> 2014-08-07 17:53:11,606 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Scheduling blk_1073741825_1001 file /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825 for deletion
> 2014-08-07 17:53:11,617 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Deleted BP-1887080305-172.28.0.101-1407398838872 blk_1073741825_1001 file /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
> {code}
> However, in the current implementation the DirectoryScanner may run while the DataNode is deleting the block, and the following messages are output:
> {code}
> 2014-08-07 17:53:30,519 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Scheduling blk_1073741825_1001 file /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825 for deletion
> 2014-08-07 17:53:31,426 INFO org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool BP-1887080305-172.28.0.101-1407398838872 Total blocks: 1, missing metadata files:0, missing block files:0, missing blocks in memory:1, mismatched blocks:0
> 2014-08-07 17:53:31,426 WARN org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added missing block to memory FinalizedReplica, blk_1073741825_1001, FINALIZED
>   getNumBytes()     = 21230663
>   getBytesOnDisk()  = 21230663
>   getVisibleLength()= 21230663
>   getVolume()       = /hadoop/data1/dfs/data/current
>   getBlockFile()    = /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
>   unlinked          = false
> 2014-08-07 17:53:31,531 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Deleted BP-1887080305-172.28.0.101-1407398838872 blk_1073741825_1001 file /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
> {code}
> The deleting block's information is registered in the DataNode's memory, so when the DataNode sends a block report, the NameNode receives wrong block information.
> For example, when we recommission a node or change the replication factor, the NameNode may delete the correct block as "ExcessReplicate" because of this problem, and "Under-Replicated Blocks" and "Missing Blocks" occur.
> When the DataNode runs DirectoryScanner, it should not register a deleting block.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
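The first review comment above asks for a dedicated lock object guarding {{deletingBlock}} instead of synchronizing on the FsDataset object itself. The pattern can be sketched as follows; note this is a minimal illustration, not the actual FsDatasetImpl code — the class name {{DeletingBlockTracker}} and its methods are hypothetical, standing in for the map of block-pool id to deleting block ids.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the dedicated-lock pattern suggested in the
// review comment: all access to the deleting-block bookkeeping goes
// through its own mutex, so callers never need to hold the
// dataset-wide lock (the original code synchronized on "this", i.e.
// the FsDataset object).
class DeletingBlockTracker {
  // Dedicated mutex for the deleting-block map only.
  private final Object deletingBlockLock = new Object();
  // Block-pool id -> ids of blocks scheduled for deletion.
  private final Map<String, Set<Long>> deletingBlocks = new HashMap<>();

  void add(String bpid, long blockId) {
    synchronized (deletingBlockLock) {
      deletingBlocks.computeIfAbsent(bpid, k -> new HashSet<>()).add(blockId);
    }
  }

  boolean contains(String bpid, long blockId) {
    synchronized (deletingBlockLock) {
      Set<Long> ids = deletingBlocks.get(bpid);
      return ids != null && ids.contains(blockId);
    }
  }

  // Called after the async disk service has actually removed the files.
  void removeAll(String bpid, Set<Long> blockIds) {
    synchronized (deletingBlockLock) {
      Set<Long> ids = deletingBlocks.get(bpid);
      if (ids != null) {
        ids.removeAll(blockIds);
      }
    }
  }

  int size(String bpid) {
    synchronized (deletingBlockLock) {
      Set<Long> ids = deletingBlocks.get(bpid);
      return ids == null ? 0 : ids.size();
    }
  }
}
```

With a private lock object, a DirectoryScanner thread checking {{contains()}} only contends with other deleting-block bookkeeping, never with threads holding the dataset-wide lock, which is the point of the review comment.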