xuzq created HDFS-12687:
---------------------------

             Summary: A recovered DN will not be removed from the client's 
"failed" list
                 Key: HDFS-12687
                 URL: https://issues.apache.org/jira/browse/HDFS-12687
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: hdfs
    Affects Versions: 2.8.1
            Reporter: xuzq


When the client is writing through a pipeline such as Client=>DN1=>DN2=>DN3 
and DN2 crashes at some point, the client executes the pipeline recovery 
process. The failed DN2 is added to "failed". The client then requests a new 
DN from the NN, passing "failed" as the exclusion list, and replaces DN2 in 
the pipeline, e.g. Client=>DN1=>DN4=>DN3. 
The client keeps running....
After a long time, the client is still writing data to the same file, so many 
pipelines have been built, e.g. Client => DN-1 => DN-2 => DN-3.
When DN-2 crashes, it is added to "failed" and the client executes the 
recovery process as before. It requests a new DN from the NN, passing 
"failed", and {color:red}the NN selects a DN from all DNs excluding everything 
in "failed", even if DN-2 has already restarted{color}.
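The accumulation described above can be sketched as follows. This is a simplified illustration of the behavior, not the actual DataStreamer/NameNode code; pickReplacement and the DN names are hypothetical:

```java
import java.util.*;

public class FailedListDemo {
    // Simplified NN-side choice of a replacement DN: pick any live DN
    // that is not in the exclusion list sent by the client.
    public static String pickReplacement(Collection<String> liveDNs,
                                         Collection<String> excluded) {
        for (String dn : liveDNs) {
            if (!excluded.contains(dn)) {
                return dn;
            }
        }
        return null; // no candidate left
    }

    public static void main(String[] args) {
        // DNs currently alive, in a fixed order for the demo.
        List<String> liveDNs = Arrays.asList("DN1", "DN2", "DN3", "DN4");
        List<String> failed = new ArrayList<>();   // only ever grows today

        // Pipeline Client=>DN1=>DN2=>DN3: DN2 crashes during the write.
        failed.add("DN2");
        List<String> exclude = new ArrayList<>(failed);
        exclude.addAll(Arrays.asList("DN1", "DN3")); // nodes already in the pipeline
        System.out.println(pickReplacement(liveDNs, exclude)); // DN4

        // Much later DN2 has restarted and is live again, but it is still
        // in "failed", so it stays excluded for the lifetime of the stream.
        System.out.println(failed.contains("DN2")); // true
    }
}
```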

Questions:
Why is DN2 not removed from "failed" once it has restarted?
Why isn't the collection of error nodes used in the recovery process shared 
with the one used when getting the next block? Currently they are two separate 
fields:
private final List<DatanodeInfo> failed = new ArrayList<>();
private final LoadingCache<DatanodeInfo, DatanodeInfo> excludedNodes;
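For context, the two structures behave very differently: "failed" is a plain list that only ever grows, while "excludedNodes" is a Guava cache whose entries expire after a while. A minimal stand-in for that expiring behavior (using only java.util rather than the real LoadingCache, and taking the clock as a parameter to make the expiry explicit) might look like:

```java
import java.util.*;

// Minimal stand-in for the time-expiring behavior of "excludedNodes";
// the real client builds the cache with a Guava expire-after-write policy.
public class ExpiringExcludeSet {
    private final Map<String, Long> addedAt = new HashMap<>();
    private final long ttlMillis;

    public ExpiringExcludeSet(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    public void exclude(String dn, long nowMillis) {
        addedAt.put(dn, nowMillis);
    }

    // An entry silently drops out once its TTL has passed, so a DN that
    // crashed a while ago (and may have restarted) becomes usable again.
    public boolean isExcluded(String dn, long nowMillis) {
        Long added = addedAt.get(dn);
        if (added == null) {
            return false;
        }
        if (nowMillis - added >= ttlMillis) {
            addedAt.remove(dn);
            return false;
        }
        return true;
    }
}
```

With a structure like this, DN2 would be excluded right after its failure but become eligible again once the TTL passes; with the plain List, it stays excluded for as long as the stream is open.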

As described above, when DN2 crashes, the client recovers the pipeline after a 
timeout (in the worst case about 490s by default). When the client finishes 
writing the current block and requests the next one, the NN may return a block 
whose pipeline contains the failed DN2 again. When the client builds a new 
pipeline for that block, {color:red}it has to go through another connection 
timeout{color} (60s by default).

If "failed" and "excludedNodes" were one collection, this connection timeout 
would be avoided. And because entries in "excludedNodes" are deleted 
dynamically (they expire), it would also solve the first problem: a restarted 
DN would become usable again.
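A sketch of that proposal, with one shared, time-expiring exclusion structure consulted by both the pipeline-recovery path and the next-block path (class and method names here are illustrative, not actual DFSOutputStream APIs):

```java
import java.util.*;

public class SharedExclusionDemo {
    static final long TTL_MILLIS = 600_000L; // e.g. a 10-minute expiry
    // One exclusion structure used by BOTH paths instead of two.
    static final Map<String, Long> excluded = new HashMap<>();

    static void markFailed(String dn, long nowMillis) {
        excluded.put(dn, nowMillis);
    }

    // Both pipeline recovery and next-block allocation would build their
    // exclude list from here, after dropping expired entries.
    static List<String> excludeListFor(long nowMillis) {
        excluded.values().removeIf(added -> nowMillis - added >= TTL_MILLIS);
        return new ArrayList<>(excluded.keySet());
    }

    public static void main(String[] args) {
        markFailed("DN2", 0L);
        System.out.println(excludeListFor(1_000L));   // [DN2]
        // After the TTL a restarted DN2 is eligible again for both paths,
        // so a later pipeline will not run into the dead-node connection timeout.
        System.out.println(excludeListFor(600_000L)); // []
    }
}
```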






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
