Shuyan Zhang created HDFS-16146:
-----------------------------------
Summary: All three replicas are lost due to not adding a new
DataNode in time
Key: HDFS-16146
URL: https://issues.apache.org/jira/browse/HDFS-16146
Project: Hadoop HDFS
Issue Type: Bug
Components: datanode, hdfs
Reporter: Shuyan Zhang
Assignee: Shuyan Zhang
We have a file with replication factor 3, and all replicas of one of its blocks
were lost while the default datanode replacement policy was in use. It happened like this:
1. addBlock() allocates a new block, and the client successfully connects to three
datanodes (dn1, dn2 and dn3) to build a pipeline;
2. The client writes data;
3. dn1 hits an error and is kicked out of the pipeline. Since more than one datanode
remains in the pipeline, the default replacement policy decides that no new datanode
needs to be added;
4. After the data is written, the pipeline enters PIPELINE_CLOSE;
5. dn2 hits an error and is kicked out. But because the pipeline is already in the close
phase, addDatanode2ExistingPipeline() decides to hand the task of transferring the
replica over to the NameNode (see the sketch after this list). At this point only one
datanode is left in the pipeline;
6. dn3 hits an error, and all replicas are lost.
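For reference, the decisions taken in steps 3 and 5 roughly follow the logic sketched
below. This is a simplified, self-contained illustration of how the DEFAULT
replace-datanode-on-failure policy and the close-phase shortcut interact, not the
actual DataStreamer code; the class and method names are made up for the example.

{code:java}
// Simplified sketch (illustrative only, not the real DataStreamer code).
// Models how the DEFAULT replace-datanode-on-failure policy combined with
// the PIPELINE_CLOSE shortcut can leave a shrinking pipeline unrepaired.
public class PipelineReplacementSketch {

  enum Stage { DATA_STREAMING, PIPELINE_CLOSE }

  /** DEFAULT policy: only replace when the pipeline has shrunk "enough". */
  static boolean defaultPolicyWantsReplacement(int replication, int remaining,
                                               boolean appendOrHflushed) {
    if (replication < 3) {
      return false;                      // small replication: never replace
    }
    if (remaining <= replication / 2) {
      return true;                       // lost half or more of the pipeline
    }
    return appendOrHflushed;             // otherwise only for append/hflush
  }

  /** Models the shortcut taken in step 5: in the close phase the client
   *  skips replacement and leaves re-replication to the NameNode. */
  static boolean willAddNewDatanode(Stage stage, int replication, int remaining,
                                    boolean appendOrHflushed) {
    if (stage == Stage.PIPELINE_CLOSE) {
      return false;                      // step 5: no replacement during close
    }
    return defaultPolicyWantsReplacement(replication, remaining, appendOrHflushed);
  }

  public static void main(String[] args) {
    // Step 3: dn1 lost while streaming, 2 of 3 datanodes remain -> no replacement.
    System.out.println(willAddNewDatanode(Stage.DATA_STREAMING, 3, 2, false)); // false
    // Step 5: dn2 lost during PIPELINE_CLOSE, 1 datanode remains -> still no
    // replacement, so a single dn3 failure (step 6) loses every replica.
    System.out.println(willAddNewDatanode(Stage.PIPELINE_CLOSE, 3, 1, false)); // false
  }
}
{code}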
If a new datanode had been added in step 5, losing all replicas could have been avoided
in this case. An error during PIPELINE_CLOSE carries the same risk of losing replicas as
an error during DATA_STREAMING, so we should not skip adding a new datanode during
PIPELINE_CLOSE.
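Continuing the sketch above, the direction suggested here would amount to something like
the following (hypothetical illustration, not a patch): drop the close-phase special case
and let the replacement policy decide in PIPELINE_CLOSE as well.

{code:java}
// Hypothetical sketch of the proposed behavior, reusing
// defaultPolicyWantsReplacement() from the sketch above.
static boolean willAddNewDatanodeProposed(int replication, int remaining,
                                          boolean appendOrHflushed) {
  // No special case for PIPELINE_CLOSE: with 3 replicas and only 1 datanode
  // left (step 5), this returns true, so a new datanode joins the pipeline
  // before dn3 can become a single point of failure.
  return defaultPolicyWantsReplacement(replication, remaining, appendOrHflushed);
}
{code}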