[ https://issues.apache.org/jira/browse/HDFS-6833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14240384#comment-14240384 ]
Yongjun Zhang commented on HDFS-6833:
-------------------------------------

Hi [~sinchii],

Thanks for working on the new rev. It doesn't apply now due to the latest trunk changes. I looked at rev10 and have the following comments:

* {{deletingBlock = new ReplicaMap(this);}} currently uses the FsDataset object ("this") as the mutex. We should replace "this" with a dedicated lock object for synchronizing access to {{deletingBlock}}, so that we avoid holding the FsDataset object's lock.
* There is no need for {{private boolean scanning}} in {{DirectoryScanner}}.
* There is no need for the local {{Set<Long> deletingBlockIds}} in the {{scan()}} method, and we don't want to do the following, because we already do it from the other place after deleting the block files:
{code}
if (dataset.getNumDeletingBlocks(bpid) > 0) {
  dataset.removeDeletedBlocks(bpid, deletingBlockIds, true);
}
{code}
* The log
{code}
statsRecord.deletingBlocks++;
LOG.info("Block file " + blockpoolReport[d].getBlockFile()
    + " is to be deleted");
{code}
can be moved to the following place:
{code}
1570   ReplicaInfo removing = volumeMap.remove(bpid, invalidBlks[i]);
1571   if (datanode.isDirectoryScannerInited()) {
1572     deletingBlock.add(bpid, removing);
1573   }
{code}
And we should not check whether the directory scanner is running in the above code, because the directory scanner can start at any time. E.g., change the above 1570-1572 code to something like:
{code}
ReplicaInfo removing = volumeMap.remove(bpid, invalidBlks[i]);
deletingBlock.add(bpid, removing);
statsRecord.deletingBlocks++;
LOG.info("Block file " + blockpoolReport[d].getBlockFile()
    + " is to be deleted");
==> need to revise to only print at debug level?
{code}
* The following code needs to be removed:
{code}
} else {
  LOG.info("Block file " + blockpoolReport[d].getBlockFile()
      + " is to be deleted");
  statsRecord.deletingBlocks++;
  deletingBlockIds.add(info.getBlockId());
}
{code}

> DirectoryScanner should not register a deleting block with memory of DataNode
> -----------------------------------------------------------------------------
>
>                 Key: HDFS-6833
>                 URL: https://issues.apache.org/jira/browse/HDFS-6833
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 3.0.0, 2.5.0, 2.5.1
>            Reporter: Shinichi Yamashita
>            Assignee: Shinichi Yamashita
>            Priority: Critical
>         Attachments: HDFS-6833-10.patch, HDFS-6833-6-2.patch, HDFS-6833-6-3.patch, HDFS-6833-6.patch, HDFS-6833-7-2.patch, HDFS-6833-7.patch, HDFS-6833.8.patch, HDFS-6833.9.patch, HDFS-6833.patch, HDFS-6833.patch, HDFS-6833.patch, HDFS-6833.patch, HDFS-6833.patch
>
>
> When a block is deleted on a DataNode, the following messages are usually output:
> {code}
> 2014-08-07 17:53:11,606 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Scheduling blk_1073741825_1001 file /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825 for deletion
> 2014-08-07 17:53:11,617 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Deleted BP-1887080305-172.28.0.101-1407398838872 blk_1073741825_1001 file /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
> {code}
> However, in the current implementation the DirectoryScanner may run while the DataNode is deleting the block, and the following messages are output:
> {code}
> 2014-08-07 17:53:30,519 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Scheduling blk_1073741825_1001 file /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825 for deletion
> 2014-08-07 17:53:31,426 INFO org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool BP-1887080305-172.28.0.101-1407398838872 Total blocks: 1, missing metadata files:0, missing block files:0, missing blocks in memory:1, mismatched blocks:0
> 2014-08-07 17:53:31,426 WARN org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added missing block to memory FinalizedReplica, blk_1073741825_1001, FINALIZED
>   getNumBytes()     = 21230663
>   getBytesOnDisk()  = 21230663
>   getVisibleLength()= 21230663
>   getVolume()       = /hadoop/data1/dfs/data/current
>   getBlockFile()    = /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
>   unlinked          = false
> 2014-08-07 17:53:31,531 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Deleted BP-1887080305-172.28.0.101-1407398838872 blk_1073741825_1001 file /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
> {code}
> The deleting block's information is registered in the DataNode's memory, so when the DataNode sends a block report, the NameNode receives wrong block information.
> For example, when we recommission a node or change the replication factor, the NameNode may delete the correct block as "ExcessReplicate" because of this problem, and "Under-Replicated Blocks" and "Missing Blocks" occur.
> When the DataNode runs DirectoryScanner, it should not register a deleting block.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
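The first review comment above asks for a dedicated lock object guarding {{deletingBlock}} instead of synchronizing on the FsDataset object itself. The pattern can be sketched as follows; note this is a minimal illustration, not the actual FsDatasetImpl code — the class name {{DeletingBlockTracker}} and its methods are hypothetical, standing in for the map of block-pool id to deleting block ids.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the dedicated-lock pattern suggested in the
// review comment: all access to the deleting-block bookkeeping goes
// through its own mutex, so callers never need to hold the
// dataset-wide lock (the original code synchronized on "this", i.e.
// the FsDataset object).
class DeletingBlockTracker {
  // Dedicated mutex for the deleting-block map only.
  private final Object deletingBlockLock = new Object();
  // Block-pool id -> ids of blocks scheduled for deletion.
  private final Map<String, Set<Long>> deletingBlocks = new HashMap<>();

  void add(String bpid, long blockId) {
    synchronized (deletingBlockLock) {
      deletingBlocks.computeIfAbsent(bpid, k -> new HashSet<>()).add(blockId);
    }
  }

  boolean contains(String bpid, long blockId) {
    synchronized (deletingBlockLock) {
      Set<Long> ids = deletingBlocks.get(bpid);
      return ids != null && ids.contains(blockId);
    }
  }

  // Called after the async disk service has actually removed the files.
  void removeAll(String bpid, Set<Long> blockIds) {
    synchronized (deletingBlockLock) {
      Set<Long> ids = deletingBlocks.get(bpid);
      if (ids != null) {
        ids.removeAll(blockIds);
      }
    }
  }

  int size(String bpid) {
    synchronized (deletingBlockLock) {
      Set<Long> ids = deletingBlocks.get(bpid);
      return ids == null ? 0 : ids.size();
    }
  }
}
```

With a private lock object, a DirectoryScanner thread checking {{contains()}} only contends with other deleting-block bookkeeping, never with threads holding the dataset-wide lock, which is the point of the review comment.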