[ https://issues.apache.org/jira/browse/HDFS-11856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16033830#comment-16033830 ]

John Zhuge commented on HDFS-11856:
-----------------------------------

[~vinayrpet] and [~kihwal] Will this patch help clusters with more than 3 DNs? 
We have seen HBase RegionServers occasionally crash with DamagedWALException 
after the following pipeline recovery failure:
{noformat}
java.io.IOException: All datanodes DatanodeInfoWithStorage[x.x.x.x:20002,DS-uuid,DISK] are bad. Aborting...
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1465)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:1236)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:721)
{noformat}

> Ability to re-add Upgrading Nodes (remote) to pipeline for future pipeline 
> updates
> ----------------------------------------------------------------------------------
>
>                 Key: HDFS-11856
>                 URL: https://issues.apache.org/jira/browse/HDFS-11856
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs-client, rolling upgrades
>    Affects Versions: 2.7.3
>            Reporter: Vinayakumar B
>            Assignee: Vinayakumar B
>             Fix For: 2.9.0, 2.7.4, 3.0.0-alpha4, 2.8.2
>
>         Attachments: HDFS-11856-01.patch, HDFS-11856-02.branch-2.patch, 
> HDFS-11856-02.patch, HDFS-11856-branch-2-02.patch, 
> HDFS-11856-branch-2.7-02.patch, HDFS-11856-branch-2.8-02.patch
>
>
> During a rolling upgrade, if a DN gets restarted, it sends a special 
> OOB_RESTART status to all streams opened for write.
> 1. Local clients will wait up to 30 seconds for the datanode to come back.
> 2. Remote clients will consider these nodes bad and continue with pipeline 
> recovery and writing. The restarted nodes are marked bad and excluded for 
> the lifetime of the stream.
> In a small cluster, where the total node count is only 3, each remote node 
> is excluded as it restarts for the upgrade.
> So a stream initially writing to 3 nodes ends up writing to only one node, 
> since there are no other nodes left to replace them.
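The shrinking-pipeline behavior described in the quoted report can be sketched as a toy model. This is a hypothetical illustration with made-up node names, not the actual DFSOutputStream/DataStreamer code: it only shows the arithmetic of excluding each restarted remote DN for the lifetime of the stream with no spare nodes to substitute.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a remote client's pipeline during a rolling upgrade in a
// 3-node cluster (hypothetical; not the real HDFS client implementation).
public class PipelineShrinkSketch {
    // On pipeline recovery, the restarted node is treated as bad and
    // excluded for the lifetime of the stream; with only 3 DNs in the
    // cluster there is no replacement node to add.
    static List<String> recoverPipeline(List<String> pipeline, String restartedNode) {
        List<String> recovered = new ArrayList<>(pipeline);
        recovered.remove(restartedNode);
        return recovered;
    }

    public static void main(String[] args) {
        List<String> pipeline = new ArrayList<>(List.of("dn1", "dn2", "dn3"));
        // The rolling upgrade restarts each remote DN in turn.
        for (String dn : List.of("dn2", "dn3")) {
            pipeline = recoverPipeline(pipeline, dn);
        }
        System.out.println(pipeline); // prints [dn1]: the stream is down to one DN
    }
}
```

After both remote DNs have been restarted and excluded, the stream writes to a single node; one more failure produces the "All datanodes ... are bad. Aborting..." error quoted above.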



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
