Shuyan Zhang created HDFS-16146:
-----------------------------------

             Summary: All three replicas are lost due to not adding a new DataNode in time
                 Key: HDFS-16146
                 URL: https://issues.apache.org/jira/browse/HDFS-16146
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: datanode, hdfs
            Reporter: Shuyan Zhang
            Assignee: Shuyan Zhang
We have a file with three replicas, and all replicas of one of its blocks were lost while the default DataNode replacement policy was in use. It happened like this:

1. addBlock() allocates a new block and successfully connects three DataNodes (dn1, dn2 and dn3) into a pipeline.
2. Data is written to the pipeline.
3. dn1 hits an error and is kicked out. At this point more than one DataNode remains in the pipeline (dn2 and dn3), so according to the default replacement policy there is no need to add a new DataNode (see the policy sketch below).
4. Writing completes and the pipeline enters PIPELINE_CLOSE.
5. dn2 hits an error and is kicked out. But because the pipeline is already in the close stage, addDatanode2ExistingPipeline() decides to hand the task of transferring the replica over to the NameNode, leaving only one DataNode (dn3) in the pipeline (see the second sketch below).
6. dn3 hits an error, and all replicas are lost.

If we had added a new DataNode in step 5, we could have avoided losing all replicas in this case. An error during PIPELINE_CLOSE carries the same risk of losing replicas as an error during DATA_STREAMING, so I think we should not skip adding a new DataNode during PIPELINE_CLOSE.
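For context on step 3: this is a minimal standalone sketch of the DEFAULT condition in org.apache.hadoop.hdfs.protocol.datatransfer.ReplaceDatanodeOnFailure, paraphrased from the Hadoop source; the wrapper class and method name here are illustrative, not the real API.

{code:java}
public class ReplacementPolicySketch {
  /**
   * Paraphrase of the DEFAULT replace-datanode-on-failure condition:
   * a replacement is requested only when replication >= 3 and either
   * half or fewer of the replicas survive, or the stream was
   * appended/hflushed.
   */
  static boolean shouldAddReplacement(short replication, int nSurviving,
      boolean isAppend, boolean isHflushed) {
    return replication >= 3
        && (nSurviving <= replication / 2 || isAppend || isHflushed);
  }

  public static void main(String[] args) {
    // Step 3 above: replication = 3 and two DataNodes survive (dn2, dn3),
    // so 2 <= 3/2 is false and no replacement DataNode is added.
    System.out.println(shouldAddReplacement((short) 3, 2, false, false)); // false
    // Only once a single DataNode remains does the policy fire.
    System.out.println(shouldAddReplacement((short) 3, 1, false, false)); // true
  }
}
{code}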
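And for step 5: the hand-off happens because DataStreamer#addDatanode2ExistingPipeline() returns early when the pipeline is closing. Below is a condensed model of that decision; the enum and method are illustrative, and only the stage names and the early-return behavior are taken from the Hadoop source.

{code:java}
public class CloseStageSketch {
  enum BlockConstructionStage {
    DATA_STREAMING, PIPELINE_CLOSE, PIPELINE_CLOSE_RECOVERY
  }

  /**
   * Models the short-circuit in addDatanode2ExistingPipeline(): when the
   * block is already closing, the client skips adding a replacement
   * DataNode and leaves re-replication to the NameNode.
   */
  static boolean clientAddsReplacement(BlockConstructionStage stage) {
    if (stage == BlockConstructionStage.PIPELINE_CLOSE
        || stage == BlockConstructionStage.PIPELINE_CLOSE_RECOVERY) {
      // This issue argues this branch is unsafe: if the remaining
      // DataNodes fail before the NameNode re-replicates (steps 5-6),
      // every replica of the block is lost.
      return false;
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println(clientAddsReplacement(
        BlockConstructionStage.PIPELINE_CLOSE)); // false -> hand off to NN
  }
}
{code}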