[ 
https://issues.apache.org/jira/browse/HDFS-1623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039891#comment-13039891
 ] 

Eli Collins commented on HDFS-1623:
-----------------------------------

Thanks for incorporating the feedback. New doc looks good. Some comments:
* Section 8.1 - I think the BN approach is to run *multiple* BNs. That way the 
3f use case is not a problem as long as at least one BN is alive, and you 
don't need shared storage to address 3f. This is similar to GFS's multiple 
shadow masters.
* Section 8.3 - Fail-over time doesn't need to be longer if the client is 
notified when there's a new primary. One idea: clients could watch an ephemeral 
ZK node, though it's an open question whether ZK can support as many 
observers as we have clients.
* Section 8.5 - We need to figure out where the FailoverController (FC) runs. 
If it lives in the same failure domain as the primary then you've still got a 
single point of failure. If it lives in a different failure domain then it may 
not be able to tell whether the primary has failed, or be able to take the 
appropriate action if it has (e.g. due to lack of connectivity). Obviously the 
FC needs to be HA itself too (e.g. leader elected, with a new FC spawned if the 
primary FC fails).
* Section 9.9.1 - Todd and I have looked into fencing in NFS a bit. In NFSv3, 
locking (NLM) doesn't work for fencing because dead clients keep holding their 
locks. We'll need a pluggable shell command (e.g. some vendors provide a Perl 
module that can ssh in and fence a particular IP) if we don't want to require 
IPMI, iLO, etc. for STONITH.
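
To make the pluggable shell command idea from the Section 9.9.1 comment 
concrete, here is a rough sketch of what such a hook might look like. Everything 
in it is an assumption for illustration, not something from the design doc: the 
function names, the FENCE_TARGET_HOST environment variable, and the convention 
that exit status 0 means "target is fenced". Anything other than a clean exit 
within the timeout is treated as "fencing not confirmed".

```python
import os
import subprocess


def fence(target_host, command, timeout_s=30.0):
    """Run a site-supplied fencing command against target_host.

    Returns True only if the command exits with status 0 within the
    timeout. Any other outcome (non-zero exit, timeout, command that
    cannot be started) counts as "fencing not confirmed".
    """
    # FENCE_TARGET_HOST is a made-up variable name for this sketch; the
    # command reads it to learn which node it should fence.
    env = dict(os.environ, FENCE_TARGET_HOST=target_host)
    try:
        result = subprocess.run(
            command,
            shell=True,  # the command is an admin-configured shell string
            env=env,
            timeout=timeout_s,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def fence_with_fallbacks(target_host, commands, timeout_s=30.0):
    """Try each configured command in order; stop at the first success."""
    return any(fence(target_host, cmd, timeout_s) for cmd in commands)
```

The intent is that the standby would only take over if one of the configured 
commands reports success, e.g. a vendor script first and a power-control script 
as a fallback. An empty command list never reports success, so a 
misconfigured cluster fails safe rather than risking two active primaries.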

> High Availability Framework for HDFS NN
> ---------------------------------------
>
>                 Key: HDFS-1623
>                 URL: https://issues.apache.org/jira/browse/HDFS-1623
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Sanjay Radia
>            Assignee: Sanjay Radia
>         Attachments: HDFS-High-Availability.pdf, NameNode HA_v2.pdf, NameNode 
> HA_v2_1.pdf, Namenode HA Framework.pdf
>
>


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
