[ https://issues.apache.org/jira/browse/HDFS-16146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HDFS-16146:
----------------------------------
    Labels: pull-request-available  (was: )

> All three replicas are lost due to not adding a new DataNode in time
> --------------------------------------------------------------------
>
>                 Key: HDFS-16146
>                 URL: https://issues.apache.org/jira/browse/HDFS-16146
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode, hdfs
>            Reporter: Shuyan Zhang
>            Assignee: Shuyan Zhang
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> We have a file with replication factor 3, and all replicas of one of its 
> blocks were lost while the default datanode replacement policy was in use. 
> The failure unfolded as follows:
> 1. addBlock() allocates a new block and successfully connects to three 
> datanodes (dn1, dn2 and dn3) to build a pipeline;
> 2. Data is written to the pipeline;
> 3. dn1 hits an error and is kicked out of the pipeline. Two datanodes 
> remain, so according to the default replacement policy there is no need to 
> add a new datanode (see the sketch after this list);
> 4. After writing completes, the stream enters PIPELINE_CLOSE;
> 5. dn2 hits an error and is kicked out. Because the stream is already in 
> the close phase, addDatanode2ExistingPipeline() hands the task of 
> transferring the replica over to the NameNode, leaving only one datanode 
> in the pipeline;
> 6. dn3 hits an error as well, and all replicas are lost.
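> The decisions in steps 3 and 5 come from two places in the HDFS client. 
> Below is a simplified paraphrase (method and parameter names in the first 
> snippet are illustrative, not the exact source) of the DEFAULT condition 
> in ReplaceDatanodeOnFailure and the close-phase branch of 
> DataStreamer#addDatanode2ExistingPipeline():
> {code:java}
> // Simplified paraphrase of the DEFAULT replacement policy: with
> // replication = 3 and 2 surviving datanodes, nSurvivors <= replication/2
> // is false, so no new datanode is added (the decision in step 3).
> boolean shouldReplace(short replication, int nSurvivors,
>     boolean isAppend, boolean isHflushed) {
>   if (nSurvivors == 0 || nSurvivors >= replication || replication < 3) {
>     return false;
>   }
>   return nSurvivors <= replication / 2 || isAppend || isHflushed;
> }
>
> // Simplified paraphrase of addDatanode2ExistingPipeline(): once the
> // stream is closing, the client skips the replica transfer and leaves
> // re-replication to the NameNode (the skip in step 5).
> if (stage == BlockConstructionStage.PIPELINE_CLOSE
>     || stage == BlockConstructionStage.PIPELINE_CLOSE_RECOVERY) {
>   // pipeline is closing; let the NameNode replicate the block later
>   return;
> }
> {code}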
> If we add a new datanode in step 5, we can avoid losing all replicas in 
> this case. An error during PIPELINE_CLOSE carries the same risk of replica 
> loss as an error during DATA_STREAMING, so we should not skip adding a new 
> datanode during PIPELINE_CLOSE.
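> A minimal sketch of the direction this suggests (illustrative only; see 
> the attached pull request for the actual change): drop the close-phase 
> early return in addDatanode2ExistingPipeline() so that a close-phase 
> failure triggers datanode replacement just like a streaming failure:
> {code:java}
> // Illustrative sketch: keep the "no data written yet" shortcut, but
> // remove the PIPELINE_CLOSE / PIPELINE_CLOSE_RECOVERY early return so
> // the client still adds a new datanode and transfers the replica.
> if (!isAppend && lastAckedSeqno < 0
>     && stage == BlockConstructionStage.PIPELINE_SETUP_CREATE) {
>   // nothing has been written, so there is nothing to transfer
>   return;
> }
> // (no early return for PIPELINE_CLOSE here; fall through and add a
> //  replacement datanode as in the streaming case)
> {code}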



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
