Gordon Wang created HDFS-6636:
---------------------------------

             Summary: NameNode should remove block replica out from corrupted 
replica map when adding block under construction
                 Key: HDFS-6636
                 URL: https://issues.apache.org/jira/browse/HDFS-6636
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: namenode
    Affects Versions: 2.2.0
            Reporter: Gordon Wang


In our test environment, we found that the NameNode does not handle an 
incremental block report correctly when the block replica is under 
construction and the replica is marked as corrupt.
Here is our scenario.
* The block had 3 replicas by default. Because one datanode was down, only 2 
replicas were available. Call the live datanodes DN1 and DN2.
* A client tried to append data to the block. During the append, something 
went wrong with the pipeline. The client then performed pipeline recovery, 
after which only DN1 remained in the pipeline.
* For some unknown reason (possibly an IO error), DN2 got a checksum error when 
receiving block data from DN1, and DN2 reported the replica on DN1 to the 
NameNode as a bad block. In fact, the client was appending data to the replica 
on DN1, and that replica was good.
* The NameNode marked the replica on DN1 as corrupt.
* When the client finished appending, DN1 checked the data in the replica and 
found it OK. DN1 then finalized the replica and reported the block to the 
NameNode as a received block.
* The NameNode handled the incremental block report from DN1. Because the block 
was under construction, the NameNode called addStoredBlockUnderConstruction in 
the block manager. But the replica on DN1 was never removed from the corrupted 
replica map, so the number of live replicas for the block was 0 and the number 
of corrupt replicas was 1.
* The client could not complete the file because the number of live replicas 
for the last block was smaller than the minimum replication.
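
The bookkeeping in the scenario above can be illustrated with a small, 
self-contained Java model. This is only a sketch and not Hadoop's BlockManager 
code: the class ReplicaModel, its two maps, the canComplete helper, and the 
clearCorruptMark flag are hypothetical names introduced for illustration. Only 
the method name addStoredBlockUnderConstruction is taken from the report. The 
sketch shows that the live replica count stays at 0 if the corrupt mark is left 
in place, and becomes 1 if the mark is removed when the reported replica is 
added.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model of the replica bookkeeping in this report; NOT Hadoop's BlockManager. */
class ReplicaModel {
    // blockId -> datanodes that have reported a stored replica (hypothetical structure)
    private final Map<Long, Set<String>> storedReplicas = new HashMap<>();
    // blockId -> datanodes whose replica is marked corrupt (hypothetical structure)
    private final Map<Long, Set<String>> corruptReplicas = new HashMap<>();

    /** Records a bad-block report against the replica on the given datanode. */
    void markCorrupt(long blockId, String datanode) {
        corruptReplicas.computeIfAbsent(blockId, k -> new HashSet<>()).add(datanode);
    }

    /**
     * Handles a "received block" report for an under-construction block.
     * clearCorruptMark is an illustrative switch: false models the reported
     * behavior, true models the behavior proposed in the issue summary.
     */
    void addStoredBlockUnderConstruction(long blockId, String datanode,
                                         boolean clearCorruptMark) {
        storedReplicas.computeIfAbsent(blockId, k -> new HashSet<>()).add(datanode);
        if (clearCorruptMark) {
            Set<String> corrupt = corruptReplicas.get(blockId);
            if (corrupt != null) {
                corrupt.remove(datanode);
                if (corrupt.isEmpty()) {
                    corruptReplicas.remove(blockId);
                }
            }
        }
    }

    /** Counts stored replicas that are not marked corrupt. */
    int liveReplicas(long blockId) {
        Set<String> stored = storedReplicas.getOrDefault(blockId, Collections.emptySet());
        Set<String> corrupt = corruptReplicas.getOrDefault(blockId, Collections.emptySet());
        int live = 0;
        for (String datanode : stored) {
            if (!corrupt.contains(datanode)) {
                live++;
            }
        }
        return live;
    }

    /** Counts replicas currently marked corrupt. */
    int corruptReplicaCount(long blockId) {
        return corruptReplicas.getOrDefault(blockId, Collections.emptySet()).size();
    }

    /** The file can be completed only if the last block has enough live replicas. */
    boolean canComplete(long blockId, int minReplication) {
        return liveReplicas(blockId) >= minReplication;
    }

    public static void main(String[] args) {
        long block = 1L;

        // Reported behavior: the corrupt mark on DN1 survives the block report.
        ReplicaModel reported = new ReplicaModel();
        reported.markCorrupt(block, "DN1");
        reported.addStoredBlockUnderConstruction(block, "DN1", false);
        System.out.println("reported: live=" + reported.liveReplicas(block)
                + " corrupt=" + reported.corruptReplicaCount(block)
                + " canComplete=" + reported.canComplete(block, 1));

        // Proposed behavior: the corrupt mark is removed when the replica is added.
        ReplicaModel proposed = new ReplicaModel();
        proposed.markCorrupt(block, "DN1");
        proposed.addStoredBlockUnderConstruction(block, "DN1", true);
        System.out.println("proposed: live=" + proposed.liveReplicas(block)
                + " corrupt=" + proposed.corruptReplicaCount(block)
                + " canComplete=" + proposed.canComplete(block, 1));
    }
}
```

In this model a minimum replication of 1 is assumed. With the mark left in 
place the block has no live replica, so the completion check fails, which 
matches the last step of the scenario.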
 




--
This message was sent by Atlassian JIRA
(v6.2#6252)
