[ https://issues.apache.org/jira/browse/HDFS-16146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated HDFS-16146:
----------------------------------
    Labels: pull-request-available  (was: )

> All three replicas are lost due to not adding a new DataNode in time
> --------------------------------------------------------------------
>
>                 Key: HDFS-16146
>                 URL: https://issues.apache.org/jira/browse/HDFS-16146
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode, hdfs
>            Reporter: Shuyan Zhang
>            Assignee: Shuyan Zhang
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> We have a three-replica file, and all replicas of a block are lost when the
> default datanode replacement strategy is used. It happens like this:
> 1. addBlock() applies for a new block and successfully connects three
> datanodes (dn1, dn2 and dn3) to build a pipeline;
> 2. Data is written;
> 3. dn1 hits an error and is kicked out. At this point more than one datanode
> remains in the pipeline, so according to the replacement strategy there is
> no need to add a new datanode;
> 4. After the write completes, the stream enters PIPELINE_CLOSE;
> 5. dn2 hits an error and is kicked out. But because the stream is already in
> the close phase, addDatanode2ExistingPipeline() decides to hand the task of
> transferring the replica over to the NameNode. At this point only one
> datanode is left in the pipeline;
> 6. dn3 hits an error, and all replicas are lost.
> If we added a new datanode in step 5, we could avoid losing all replicas in
> this case. An error during PIPELINE_CLOSE carries the same risk of losing
> replicas as an error during DATA_STREAMING, so we should not skip adding a
> new datanode during PIPELINE_CLOSE.
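
For context on the policy decision in steps 3 and 5, the following is a minimal,
self-contained Java sketch that approximates the DEFAULT replace-datanode-on-failure
condition; the class and method names are hypothetical simplifications for
illustration, not the actual DFSOutputStream or ReplaceDatanodeOnFailure source.

    // Sketch (assumed names, not the actual Hadoop source) of the DEFAULT
    // replace-datanode-on-failure condition described in the issue above.
    public class ReplaceDatanodeSketch {

        // With replication >= 3, the DEFAULT policy asks for a replacement
        // datanode only when the surviving pipeline has shrunk to
        // replication/2 or fewer nodes, or the stream is an append/hflushed.
        static boolean shouldAddNewDatanode(int replication, int remaining,
                                            boolean isAppend, boolean isHflushed) {
            if (remaining == 0 || remaining >= replication) {
                return false;
            }
            return replication >= 3
                    && (remaining <= replication / 2 || isAppend || isHflushed);
        }

        public static void main(String[] args) {
            // Step 3: dn1 fails, 2 of 3 datanodes remain; 2 > 3/2, so no new datanode.
            System.out.println(shouldAddNewDatanode(3, 2, false, false)); // false

            // Step 5: dn2 fails, 1 of 3 datanodes remains; 1 <= 3/2, so the condition
            // alone would request a replacement. The issue is that during
            // PIPELINE_CLOSE the client skips addDatanode2ExistingPipeline() and
            // leaves re-replication to the NameNode instead.
            System.out.println(shouldAddNewDatanode(3, 1, false, false)); // true
        }
    }

Under these assumptions, a dn3 failure before the NameNode re-replicates leaves no
live replica, which matches the scenario described in the issue.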