[ 
https://issues.apache.org/jira/browse/HDFS-5535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13912525#comment-13912525
 ] 

Kihwal Lee commented on HDFS-5535:
----------------------------------

[~stack]: If the local DN was added to {{deadNodes}} in a {{DFSInputStream}} 
because it was restarted, we may be able to (asynchronously?) probe it and 
remove it from {{deadNodes}}. Then, when a block boundary is crossed, the 
local node will get used again.  If explicit restart notifications are complex 
or unreliable (due to timing, network issues, etc.), we could do the recovery 
based solely on the failure mode, e.g. abrupt connection breakage by the 
remote end, connection refusal/reset, etc. A background ping thread could 
reintroduce the node independently of ongoing reads.  I haven't checked how 
easy it is to track the cause of failures, so I can't yet say whether this is 
feasible.
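
To make the idea concrete, here is a rough sketch of how a failure-mode-based 
probe might look. It is not taken from the HDFS client: {{DeadNodeProber}}, 
{{markDead}}, {{looksLikeRestart}} and {{probeAll}} are hypothetical names, 
and treating any {{SocketException}} or {{EOFException}} as "restart-like" is 
an assumption that would need checking against the real failure paths.

```java
import java.io.EOFException;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketException;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper for illustration; not part of the HDFS client. */
class DeadNodeProber implements AutoCloseable {
  // Nodes the reader should currently avoid.
  private final Set<InetSocketAddress> deadNodes = ConcurrentHashMap.newKeySet();
  // Subset of deadNodes whose failure looked like a restart.
  private final Set<InetSocketAddress> probeTargets = ConcurrentHashMap.newKeySet();
  private final ScheduledExecutorService pinger;
  private final int connectTimeoutMs;

  DeadNodeProber(long periodMs, int connectTimeoutMs) {
    this.connectTimeoutMs = connectTimeoutMs;
    this.pinger = Executors.newSingleThreadScheduledExecutor(r -> {
      Thread t = new Thread(r, "dead-node-prober");
      t.setDaemon(true); // never keep the JVM alive just for probing
      return t;
    });
    // Probing runs on its own thread, independent of ongoing reads.
    pinger.scheduleWithFixedDelay(this::probeAll, periodMs, periodMs,
        TimeUnit.MILLISECONDS);
  }

  /**
   * Assumption: connection refusal/reset (SocketException and subclasses)
   * and an abrupt close by the remote end (EOFException) suggest a restart.
   */
  static boolean looksLikeRestart(IOException cause) {
    return cause instanceof SocketException || cause instanceof EOFException;
  }

  /** Record a failed node; only restart-like failures are probed later. */
  void markDead(InetSocketAddress node, IOException cause) {
    deadNodes.add(node);
    if (looksLikeRestart(cause)) {
      probeTargets.add(node);
    }
  }

  boolean isDead(InetSocketAddress node) {
    return deadNodes.contains(node);
  }

  /** Try a plain TCP connect to each probe target; revive those that answer. */
  void probeAll() {
    for (InetSocketAddress node : probeTargets) {
      try (Socket s = new Socket()) {
        s.connect(node, connectTimeoutMs);
        probeTargets.remove(node);
        deadNodes.remove(node);
      } catch (IOException stillDown) {
        // Leave the node in both sets and retry on the next round.
      }
    }
  }

  @Override
  public void close() {
    pinger.shutdownNow();
  }
}
```

In this sketch a reader would consult {{isDead}} only when picking a node for 
the next block, so a revived local node is used again at the next block 
boundary. A bare TCP connect only shows that the port is open; a real probe 
would probably need a protocol-level check.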

> Umbrella jira for improved HDFS rolling upgrades
> ------------------------------------------------
>
>                 Key: HDFS-5535
>                 URL: https://issues.apache.org/jira/browse/HDFS-5535
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: datanode, ha, hdfs-client, namenode
>    Affects Versions: 3.0.0, 2.2.0
>            Reporter: Nathan Roberts
>         Attachments: HDFSRollingUpgradesHighLevelDesign.pdf, 
> h5535_20140219.patch, h5535_20140220-1554.patch, h5535_20140220b.patch, 
> h5535_20140221-2031.patch, h5535_20140224-1931.patch, 
> h5535_20140225-1225.patch
>
>
> In order to roll a new HDFS release through a large cluster quickly and 
> safely, a few enhancements are needed in HDFS. An initial high-level design 
> document will be attached to this jira, and sub-jiras will itemize the 
> individual tasks.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
