[ https://issues.apache.org/jira/browse/HDFS-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13070631#comment-13070631 ]
Suresh Srinivas commented on HDFS-2179:
---------------------------------------

Case 1), where the active and standby are in communication and co-operating, does not require fencing at all. Fencing is required only when the active and standby cannot communicate, so we should drop that case from the cases to consider.

When using solutions such as LinuxHA, a local process (LRM) kills the process to be fenced. This does not require ssh to the node. HDFS-2185 should consider this requirement.

I might start with LinuxHA to play around with this in the first phase, since I think getting a rock-solid and correct fail-over controller is non-trivial.

> HA: namenode fencing mechanism
> ------------------------------
>
>                 Key: HDFS-2179
>                 URL: https://issues.apache.org/jira/browse/HDFS-2179
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: name-node
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>
> In an HA cluster, when there are two NNs, the invariant that only one NN is
> active at a time has to be preserved in order to prevent "split brain
> syndrome." Thus, when a standby NN is transitioned to "active" state during a
> failover, it needs to somehow _fence_ the formerly active NN to ensure that
> it can no longer perform edits. This JIRA is to discuss and implement NN
> fencing.
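
For illustration only, here is a rough sketch of the invariant described in the issue: the standby is promoted only after the formerly active NN is confirmed fenced. This is not HDFS code. The names NameNode, local_kill_fencer, failover and FencingFailed are all hypothetical, and the fencer merely models a node-local agent killing the process (as with the LRM mentioned in the comment) rather than calling any real LinuxHA API.

```python
class FencingFailed(Exception):
    """Raised when the old active cannot be confirmed fenced."""


class NameNode:
    """Toy stand-in for a NameNode (hypothetical, not the real HDFS class)."""

    def __init__(self, name, state="standby"):
        self.name = name
        self.state = state  # "active", "standby", or "fenced"
        self.edits = []

    def write_edit(self, op):
        # Only the active node may mutate the namespace.
        if self.state != "active":
            raise PermissionError(f"{self.name} is {self.state}; edit rejected")
        self.edits.append(op)


def local_kill_fencer(node):
    """Hypothetical fencer: models a node-local agent killing the process."""
    node.state = "fenced"
    return True


def failover(old_active, standby, fencer):
    """Promote the standby only after the fencer reports success."""
    if not fencer(old_active):
        # Without confirmed fencing, promoting the standby risks split brain.
        raise FencingFailed(f"could not fence {old_active.name}")
    standby.state = "active"
    return standby
```

Passing the fencer in as a function keeps the failover step independent of how fencing is carried out, so a local-kill approach and an ssh-based one could be swapped without touching the promotion logic.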