Shuyan Zhang created HDFS-16146:
-----------------------------------

             Summary: All three replicas are lost due to not adding a new DataNode in time
                 Key: HDFS-16146
                 URL: https://issues.apache.org/jira/browse/HDFS-16146
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: datanode, hdfs
            Reporter: Shuyan Zhang
            Assignee: Shuyan Zhang
We have a file with three replicas, and all replicas of one of its blocks were lost while the default DataNode replacement policy was in use. It happened like this:

1. addBlock() allocates a new block and successfully connects three DataNodes (dn1, dn2 and dn3) into a pipeline.
2. Data is written to the pipeline.
3. dn1 hits an error and is kicked out. At this point more than one DataNode remains in the pipeline (dn2 and dn3), so according to the default replacement policy there is no need to add a new DataNode (see the policy sketch below).
4. Writing completes and the pipeline enters PIPELINE_CLOSE.
5. dn2 hits an error and is kicked out. But because the pipeline is already in the close stage, addDatanode2ExistingPipeline() decides to hand the task of transferring the replica over to the NameNode, leaving only one DataNode (dn3) in the pipeline (see the second sketch below).
6. dn3 hits an error, and all replicas are lost.

If we had added a new DataNode in step 5, we could have avoided losing all replicas in this case. An error during PIPELINE_CLOSE carries the same risk of losing replicas as an error during DATA_STREAMING, so I think we should not skip adding a new DataNode during PIPELINE_CLOSE.
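For context on step 3: this is a minimal standalone sketch of the DEFAULT condition in org.apache.hadoop.hdfs.protocol.datatransfer.ReplaceDatanodeOnFailure, paraphrased from the Hadoop source; the wrapper class and method name here are illustrative, not the real API.

{code:java}
public class ReplacementPolicySketch {
  /**
   * Paraphrase of the DEFAULT replace-datanode-on-failure condition:
   * a replacement is requested only when replication >= 3 and either
   * half or fewer of the replicas survive, or the stream was
   * appended/hflushed.
   */
  static boolean shouldAddReplacement(short replication, int nSurviving,
      boolean isAppend, boolean isHflushed) {
    return replication >= 3
        && (nSurviving <= replication / 2 || isAppend || isHflushed);
  }

  public static void main(String[] args) {
    // Step 3 above: replication = 3 and two DataNodes survive (dn2, dn3),
    // so 2 <= 3/2 is false and no replacement DataNode is added.
    System.out.println(shouldAddReplacement((short) 3, 2, false, false)); // false
    // Only once a single DataNode remains does the policy fire.
    System.out.println(shouldAddReplacement((short) 3, 1, false, false)); // true
  }
}
{code}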
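And for step 5: the hand-off happens because DataStreamer#addDatanode2ExistingPipeline() returns early when the pipeline is closing. Below is a condensed model of that decision; the enum and method are illustrative, and only the stage names and the early-return behavior are taken from the Hadoop source.

{code:java}
public class CloseStageSketch {
  enum BlockConstructionStage {
    DATA_STREAMING, PIPELINE_CLOSE, PIPELINE_CLOSE_RECOVERY
  }

  /**
   * Models the short-circuit in addDatanode2ExistingPipeline(): when the
   * block is already closing, the client skips adding a replacement
   * DataNode and leaves re-replication to the NameNode.
   */
  static boolean clientAddsReplacement(BlockConstructionStage stage) {
    if (stage == BlockConstructionStage.PIPELINE_CLOSE
        || stage == BlockConstructionStage.PIPELINE_CLOSE_RECOVERY) {
      // This issue argues this branch is unsafe: if the remaining
      // DataNodes fail before the NameNode re-replicates (steps 5-6),
      // every replica of the block is lost.
      return false;
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println(clientAddsReplacement(
        BlockConstructionStage.PIPELINE_CLOSE)); // false -> hand off to NN
  }
}
{code}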