[ https://issues.apache.org/jira/browse/HDFS-1623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039891#comment-13039891 ]
Eli Collins commented on HDFS-1623:
-----------------------------------

Thanks for incorporating the feedback. The new doc looks good. Some comments:

* Section 8.1 - I think the BN approach is to run *multiple* BNs. That way the 3f use case is not a problem as long as at least one BN is alive, and you don't need shared storage to address 3f. This is similar to GFS's multiple shadow masters.

* Section 8.3 - Fail-over time doesn't need to be longer if the client is notified when there's a new primary. One idea: clients could watch an ephemeral ZK node, though it's an open question whether ZK can support as many observers as we have clients.

* Section 8.5 - We need to figure out where the FailoverController (FC) runs. If it lives in the same failure domain as the primary, then you've still got a single point of failure. If it lives in a different failure domain, then it may not be able to tell whether the primary has failed, or be able to take the appropriate action if it has (e.g. due to lack of connectivity). The FC also needs to be HA itself (e.g. leader elected, with a new FC spawned if the primary FC fails).

* Section 9.9.1 - Todd and I have investigated fencing in NFS a bit. In v3, locking (NLM) doesn't work because dead clients keep holding their locks. If we don't want to require IPMI, iLO, etc. for STONITH, we'll need a pluggable shell command (e.g. some vendors provide a Perl module that can ssh in and fence a particular IP).

> High Availability Framework for HDFS NN
> ---------------------------------------
>
>                 Key: HDFS-1623
>                 URL: https://issues.apache.org/jira/browse/HDFS-1623
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Sanjay Radia
>            Assignee: Sanjay Radia
>         Attachments: HDFS-High-Availability.pdf, NameNode HA_v2.pdf, NameNode HA_v2_1.pdf, Namenode HA Framework.pdf
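The pluggable fencing shell command suggested in the Section 9.9.1 comment can be illustrated with a minimal sketch, assuming the failover controller simply runs an operator-configured command and treats exit status 0 as "the old primary is fenced". The function names, the `TARGET_IP` environment variable, and the exit-status convention are all hypothetical; this is not an HDFS interface.

```python
import os
import subprocess


def run_fence_command(command: str, target_ip: str, timeout_s: float = 30.0) -> bool:
    """Run one operator-configured fencing command against the old primary.

    Hypothetical convention: the target address is passed in the TARGET_IP
    environment variable, and exit status 0 means fencing succeeded.
    """
    env = dict(os.environ, TARGET_IP=target_ip)
    try:
        result = subprocess.run(
            command,
            shell=True,
            env=env,
            timeout=timeout_s,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        # A hung fencing script cannot prove the old primary is down,
        # so it counts as a failed fence.
        return False
    return result.returncode == 0


def fence_then_failover(fence_commands, target_ip, promote) -> bool:
    """Try each configured fencing command in order and promote the
    standby only after one of them reports success."""
    for cmd in fence_commands:
        if run_fence_command(cmd, target_ip):
            promote()
            return True
    # No fencing method succeeded: refuse to fail over rather than
    # risk two active primaries (split brain).
    return False
```

Trying several commands in order lets a deployment list a cheap method first (for example a vendor ssh script) and a hardware STONITH method as a fallback, while the refusal to promote on total failure keeps the controller on the safe side.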