Kihwal Lee created HDFS-5557:
--------------------------------

             Summary: Write pipeline recovery for the last packet in the block may cause rejection of valid replicas
                 Key: HDFS-5557
                 URL: https://issues.apache.org/jira/browse/HDFS-5557
             Project: Hadoop HDFS
          Issue Type: Bug
    Affects Versions: 0.23.9, 2.3.0
            Reporter: Kihwal Lee
            Priority: Critical


When a data node reports a block that is still under construction 
(i.e. not committed or completed), BlockManager calls 
BlockInfoUnderConstruction.addReplicaIfNotPresent() to update the reported 
replica state. However, BlockManager passes the stored block rather than the 
reported block. As a result, the gen stamp recorded for each replica is that of 
the BlockInfoUnderConstruction itself, not the one from the reported replica.
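
To illustrate the mix-up described above, here is a minimal sketch. It does not 
use the real HDFS classes: SketchBlock, SketchUnderConstruction and 
GenStampSketch are hypothetical stand-ins, the one-argument 
addReplicaIfNotPresent() is a simplification of the real method, and the gen 
stamp values 1000 and 1001 are made up.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for a block: an id plus a gen stamp.
final class SketchBlock {
    final long blockId;
    final long genStamp;

    SketchBlock(long blockId, long genStamp) {
        this.blockId = blockId;
        this.genStamp = genStamp;
    }
}

// Hypothetical stand-in for the under-construction block kept by the name node.
final class SketchUnderConstruction {
    final SketchBlock stored;
    final List<SketchBlock> replicas = new ArrayList<>();

    SketchUnderConstruction(SketchBlock stored) {
        this.stored = stored;
    }

    // Simplified: records a replica using whatever block the caller passes in.
    void addReplicaIfNotPresent(SketchBlock block) {
        replicas.add(new SketchBlock(block.blockId, block.genStamp));
    }
}

final class GenStampSketch {
    // Returns the gen stamp that ends up recorded for the replica.
    static long recordedGenStamp(boolean passReportedBlock) {
        SketchBlock stored = new SketchBlock(1L, 1000L);   // old gen stamp
        SketchBlock reported = new SketchBlock(1L, 1001L); // new gen stamp
        SketchUnderConstruction uc = new SketchUnderConstruction(stored);
        // Passing the stored block (the behaviour described in this report)
        // loses the gen stamp carried by the reported replica.
        uc.addReplicaIfNotPresent(passReportedBlock ? reported : stored);
        return uc.replicas.get(0).genStamp;
    }

    public static void main(String[] args) {
        System.out.println("stored block passed:   " + recordedGenStamp(false));
        System.out.println("reported block passed: " + recordedGenStamp(true));
    }
}
```

In this sketch, passing the stored block records the stored gen stamp, while 
passing the reported block preserves the gen stamp the data node actually sent.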

When pipeline recovery is done for the last packet of a block, the 
incremental block reports carrying the new gen stamp may arrive before the 
client calls updatePipeline(). If this happens, these replicas are incorrectly 
recorded with the old gen stamp and are removed later. The result is a close or 
addAdditionalBlock failure.
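
The ordering problem above can be sketched as a small simulation. This is only 
an illustration under stated assumptions: PipelineRaceSketch is a hypothetical 
class, three reporting data nodes and the gen stamps 1000/1001 are made-up 
values, and the removal step is modeled simply as "drop every recorded replica 
whose gen stamp differs from the block's current one".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of the event ordering; not HDFS code.
final class PipelineRaceSketch {
    // Returns how many recorded replicas survive the later gen stamp check.
    static int survivingReplicas(boolean recordReportedGenStamp) {
        long storedGenStamp = 1000L;    // gen stamp before pipeline recovery
        final long newGenStamp = 1001L; // gen stamp assigned by pipeline recovery
        List<Long> recordedGenStamps = new ArrayList<>();

        // Step 1: incremental block reports with the new gen stamp arrive
        // before updatePipeline(). With the bug, the stored gen stamp is
        // recorded instead of the reported one.
        for (int dataNode = 0; dataNode < 3; dataNode++) {
            recordedGenStamps.add(recordReportedGenStamp ? newGenStamp : storedGenStamp);
        }

        // Step 2: the client calls updatePipeline(); the stored block now
        // carries the new gen stamp.
        storedGenStamp = newGenStamp;

        // Step 3 (assumed model): replicas whose recorded gen stamp does not
        // match the block's current gen stamp are removed.
        final long current = storedGenStamp;
        recordedGenStamps.removeIf(gs -> gs != current);
        return recordedGenStamps.size();
    }

    public static void main(String[] args) {
        System.out.println("recording stored gen stamp:   " + survivingReplicas(false));
        System.out.println("recording reported gen stamp: " + survivingReplicas(true));
    }
}
```

Under these assumptions, recording the stored gen stamp leaves no replicas 
after the check, which corresponds to the close or addAdditionalBlock failure 
described above; recording the reported gen stamp keeps all three.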

If the last block is completed but the penultimate block is not because of 
this issue, the file won't be closed. If this file is not cleared and the 
client goes away, the lease manager will try to recover the lease/block, at 
which point it will crash. I will file a separate jira for this shortly.

The worst case is rejecting all good replicas and accepting a bad one. In this 
case, the block will get completed, but the data cannot be read until the next 
full block report containing one of the valid replicas is received.



--
This message was sent by Atlassian JIRA
(v6.1#6144)