[jira] [Commented] (HDFS-2179) HA: namenode fencing mechanism
[ https://issues.apache.org/jira/browse/HDFS-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181562#comment-13181562 ] Eli Collins commented on HDFS-2179: --- Agree. Spoke to Todd, am going to move this to common for HADOOP-7938 (there's minimal HDFS dependency). > HA: namenode fencing mechanism > -- > > Key: HDFS-2179 > URL: https://issues.apache.org/jira/browse/HDFS-2179 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: name-node >Reporter: Todd Lipcon >Assignee: Todd Lipcon > Fix For: HA branch (HDFS-1623) > > Attachments: hdfs-2179.txt, hdfs-2179.txt, hdfs-2179.txt > > > In an HA cluster, when there are two NNs, the invariant that only one NN is > active at a time has to be preserved in order to prevent "split brain > syndrome." Thus, when a standby NN is transition to "active" state during a > failover, it needs to somehow _fence_ the formerly active NN to ensure that > it can no longer perform edits. This JIRA is to discuss and implement NN > fencing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2179) HA: namenode fencing mechanism
[ https://issues.apache.org/jira/browse/HDFS-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079539#comment-13079539 ] Suresh Srinivas commented on HDFS-2179: --- My feeling is that this is not NN specific. Whether MR uses it or not, doing it in common gets the abstractions right. > HA: namenode fencing mechanism > -- > > Key: HDFS-2179 > URL: https://issues.apache.org/jira/browse/HDFS-2179 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: name-node >Reporter: Todd Lipcon >Assignee: Todd Lipcon > Fix For: HA branch (HDFS-1623) > > Attachments: hdfs-2179.txt, hdfs-2179.txt, hdfs-2179.txt > > > In an HA cluster, when there are two NNs, the invariant that only one NN is > active at a time has to be preserved in order to prevent "split brain > syndrome." Thus, when a standby NN is transition to "active" state during a > failover, it needs to somehow _fence_ the formerly active NN to ensure that > it can no longer perform edits. This JIRA is to discuss and implement NN > fencing. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2179) HA: namenode fencing mechanism
[ https://issues.apache.org/jira/browse/HDFS-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079530#comment-13079530 ] Todd Lipcon commented on HDFS-2179: --- Yes, sorry for not running test-patch before. I just ran it and it pointed out that I was missing the license on one of the new test files as well. Aside from that, it passes. As for why it goes in the NN package instead of common, I think it's better to start with building things specific to our current use case. Then, if we have need of this code from another spot (eg MapReduce HA) we can consider moving it to common. But let's not overly generalize until we have to -- eg from what I've heard about MR HA, it stores all of its critical state in ZooKeeper so this sort of fencing is not necessary. > HA: namenode fencing mechanism > -- > > Key: HDFS-2179 > URL: https://issues.apache.org/jira/browse/HDFS-2179 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: name-node >Reporter: Todd Lipcon >Assignee: Todd Lipcon > Fix For: HA branch (HDFS-1623) > > Attachments: hdfs-2179.txt, hdfs-2179.txt, hdfs-2179.txt > > > In an HA cluster, when there are two NNs, the invariant that only one NN is > active at a time has to be preserved in order to prevent "split brain > syndrome." Thus, when a standby NN is transition to "active" state during a > failover, it needs to somehow _fence_ the formerly active NN to ensure that > it can no longer perform edits. This JIRA is to discuss and implement NN > fencing. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2179) HA: namenode fencing mechanism
[ https://issues.apache.org/jira/browse/HDFS-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079519#comment-13079519 ] Suresh Srinivas commented on HDFS-2179: --- I have not reviewed the patch yet. Just looking from the high level, why should this be in namenode package? Is this not generic enough that it should be in common? > HA: namenode fencing mechanism > -- > > Key: HDFS-2179 > URL: https://issues.apache.org/jira/browse/HDFS-2179 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: name-node >Reporter: Todd Lipcon >Assignee: Todd Lipcon > Fix For: HA branch (HDFS-1623) > > Attachments: hdfs-2179.txt, hdfs-2179.txt, hdfs-2179.txt > > > In an HA cluster, when there are two NNs, the invariant that only one NN is > active at a time has to be preserved in order to prevent "split brain > syndrome." Thus, when a standby NN is transition to "active" state during a > failover, it needs to somehow _fence_ the formerly active NN to ensure that > it can no longer perform edits. This JIRA is to discuss and implement NN > fencing. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2179) HA: namenode fencing mechanism
[ https://issues.apache.org/jira/browse/HDFS-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079516#comment-13079516 ] Suresh Srinivas commented on HDFS-2179: --- Todd, should we run test-patch before committing changes? My preference is to do that. > HA: namenode fencing mechanism > -- > > Key: HDFS-2179 > URL: https://issues.apache.org/jira/browse/HDFS-2179 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: name-node >Reporter: Todd Lipcon >Assignee: Todd Lipcon > Fix For: HA branch (HDFS-1623) > > Attachments: hdfs-2179.txt, hdfs-2179.txt, hdfs-2179.txt > > > In an HA cluster, when there are two NNs, the invariant that only one NN is > active at a time has to be preserved in order to prevent "split brain > syndrome." Thus, when a standby NN is transition to "active" state during a > failover, it needs to somehow _fence_ the formerly active NN to ensure that > it can no longer perform edits. This JIRA is to discuss and implement NN > fencing. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2179) HA: namenode fencing mechanism
[ https://issues.apache.org/jira/browse/HDFS-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079463#comment-13079463 ] Todd Lipcon commented on HDFS-2179: --- The only difference the lack of {{continue}} statement makes should be logging output. But I see your point - I'll add a {{continue}} there to be consistent, and then commit this to the HA branch. > HA: namenode fencing mechanism > -- > > Key: HDFS-2179 > URL: https://issues.apache.org/jira/browse/HDFS-2179 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: name-node >Reporter: Todd Lipcon >Assignee: Todd Lipcon > Attachments: hdfs-2179.txt, hdfs-2179.txt > > > In an HA cluster, when there are two NNs, the invariant that only one NN is > active at a time has to be preserved in order to prevent "split brain > syndrome." Thus, when a standby NN is transition to "active" state during a > failover, it needs to somehow _fence_ the formerly active NN to ensure that > it can no longer perform edits. This JIRA is to discuss and implement NN > fencing. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2179) HA: namenode fencing mechanism
[ https://issues.apache.org/jira/browse/HDFS-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079459#comment-13079459 ] Aaron T. Myers commented on HDFS-2179: -- Latest patch looks great. One tiny comment In the loop in {{NodeFencer.fence}}, why do you continue in the event of {{BadFencingConfigurationException}}, but not in the case of an unknown {{Throwable}}? I can imagine a justification for continuing in both or neither cases, but not in only one. +1 once this is addressed. > HA: namenode fencing mechanism > -- > > Key: HDFS-2179 > URL: https://issues.apache.org/jira/browse/HDFS-2179 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: name-node >Reporter: Todd Lipcon >Assignee: Todd Lipcon > Attachments: hdfs-2179.txt, hdfs-2179.txt > > > In an HA cluster, when there are two NNs, the invariant that only one NN is > active at a time has to be preserved in order to prevent "split brain > syndrome." Thus, when a standby NN is transition to "active" state during a > failover, it needs to somehow _fence_ the formerly active NN to ensure that > it can no longer perform edits. This JIRA is to discuss and implement NN > fencing. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2179) HA: namenode fencing mechanism
[ https://issues.apache.org/jira/browse/HDFS-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079155#comment-13079155 ] Aaron T. Myers commented on HDFS-2179: -- Patch looks pretty good, Todd. A few comments: # Please add some comments to the {{FenceMethod}} interface # I think {{FenceMethod}} should be public. Entirely possible (if not likely) end users will want to implement their own {{FenceMethods}}, and they shouldn't need to put them in {{o.a.h.hdfs.server.namenode.ha}}. # Please add some class comments to {{NodeFencer}}. # Seems to me like {{NodeFencer.fence}} should be catching {{Exception}} thrown by the individual methods. No reason not to try the other ones if some exception other than {{BadFencingConfigurationException}} is thrown. # In {{SshFenceByTcpPort.getNNPort}}, won't this be getting the port of the NN from where the SSH is occurring, not necessarily of the NN which is being SSHed into? This sort of points to what may be a larger problem, which is that I believe it's presently impossible to configure the addresses of multiple NNs in a single configuration. > HA: namenode fencing mechanism > -- > > Key: HDFS-2179 > URL: https://issues.apache.org/jira/browse/HDFS-2179 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: name-node >Reporter: Todd Lipcon >Assignee: Todd Lipcon > Attachments: hdfs-2179.txt > > > In an HA cluster, when there are two NNs, the invariant that only one NN is > active at a time has to be preserved in order to prevent "split brain > syndrome." Thus, when a standby NN is transition to "active" state during a > failover, it needs to somehow _fence_ the formerly active NN to ensure that > it can no longer perform edits. This JIRA is to discuss and implement NN > fencing. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2179) HA: namenode fencing mechanism
[ https://issues.apache.org/jira/browse/HDFS-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13070658#comment-13070658 ] dhruba borthakur commented on HDFS-2179: Awesome, +1 to the proposal listed above. > HA: namenode fencing mechanism > -- > > Key: HDFS-2179 > URL: https://issues.apache.org/jira/browse/HDFS-2179 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: name-node >Reporter: Todd Lipcon >Assignee: Todd Lipcon > > In an HA cluster, when there are two NNs, the invariant that only one NN is > active at a time has to be preserved in order to prevent "split brain > syndrome." Thus, when a standby NN is transition to "active" state during a > failover, it needs to somehow _fence_ the formerly active NN to ensure that > it can no longer perform edits. This JIRA is to discuss and implement NN > fencing. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2179) HA: namenode fencing mechanism
[ https://issues.apache.org/jira/browse/HDFS-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13070631#comment-13070631 ] Suresh Srinivas commented on HDFS-2179: --- Case 1), where active standby are in communication and co-operating does not require fencing at all. Fencing is required only when active/standby cannot communicate. So we should drop that out of cases to consider. When using solutions such as LinuxHA, a local process (LRM) kills the process to be fenced. This does not require ssh to the node. HDFS-2185 should consider this requirement. I might start with LinuxHA to play around with this, in the first phase, since I think getting a rock solid and correct fail-over controller is non-trivial. > HA: namenode fencing mechanism > -- > > Key: HDFS-2179 > URL: https://issues.apache.org/jira/browse/HDFS-2179 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: name-node >Reporter: Todd Lipcon >Assignee: Todd Lipcon > > In an HA cluster, when there are two NNs, the invariant that only one NN is > active at a time has to be preserved in order to prevent "split brain > syndrome." Thus, when a standby NN is transition to "active" state during a > failover, it needs to somehow _fence_ the formerly active NN to ensure that > it can no longer perform edits. This JIRA is to discuss and implement NN > fencing. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2179) HA: namenode fencing mechanism
[ https://issues.apache.org/jira/browse/HDFS-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069846#comment-13069846 ] Eli Collins commented on HDFS-2179: --- Agree with your proposal, though we shouldn't need to fence the old NN in the cooperative case (because the old primary has confirmed that it's gone into standby, closed its storage dirs, stopped service threads, etc). Since we have to make the uncooperative case work anyway, and exercising it frequently/by default will help find the relevant bugs (eg place where we're not syncing the log but should be) we should start with it. > HA: namenode fencing mechanism > -- > > Key: HDFS-2179 > URL: https://issues.apache.org/jira/browse/HDFS-2179 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: name-node >Reporter: Todd Lipcon >Assignee: Todd Lipcon > > In an HA cluster, when there are two NNs, the invariant that only one NN is > active at a time has to be preserved in order to prevent "split brain > syndrome." Thus, when a standby NN is transition to "active" state during a > failover, it needs to somehow _fence_ the formerly active NN to ensure that > it can no longer perform edits. This JIRA is to discuss and implement NN > fencing. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2179) HA: namenode fencing mechanism
[ https://issues.apache.org/jira/browse/HDFS-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069033#comment-13069033 ] Kihwal Lee commented on HDFS-2179: -- I think it is safe to serve reads as long as the new node is not serving writes. So there can be a period of service overlap if we can make sure the old node stops serving reads before the new node starts serving writes. I am assuming the both are serving the same content, but if the fs state has diverged between the two (e.g. the in-memory state of the old node is not in sync with the persistent one), even serving reads may not be safe. Although it is safe in terms of the data integrity at the file system level in this case, it may cause clients to make wrong decisions and lose data. Probably we should not trust the old node at all since it can have unexpected failure modes. Then serving reads is not safe. > HA: namenode fencing mechanism > -- > > Key: HDFS-2179 > URL: https://issues.apache.org/jira/browse/HDFS-2179 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: name-node >Reporter: Todd Lipcon >Assignee: Todd Lipcon > > In an HA cluster, when there are two NNs, the invariant that only one NN is > active at a time has to be preserved in order to prevent "split brain > syndrome." Thus, when a standby NN is transition to "active" state during a > failover, it needs to somehow _fence_ the formerly active NN to ensure that > it can no longer perform edits. This JIRA is to discuss and implement NN > fencing. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2179) HA: namenode fencing mechanism
[ https://issues.apache.org/jira/browse/HDFS-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13068723#comment-13068723 ] Todd Lipcon commented on HDFS-2179: --- I found this reference useful to see what kind of fencing methods are implemented by Red Hat Cluster Suite: https://access.redhat.com/kb/docs/DOC-30004 > HA: namenode fencing mechanism > -- > > Key: HDFS-2179 > URL: https://issues.apache.org/jira/browse/HDFS-2179 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: name-node >Reporter: Todd Lipcon >Assignee: Todd Lipcon > > In an HA cluster, when there are two NNs, the invariant that only one NN is > active at a time has to be preserved in order to prevent "split brain > syndrome." Thus, when a standby NN is transition to "active" state during a > failover, it needs to somehow _fence_ the formerly active NN to ensure that > it can no longer perform edits. This JIRA is to discuss and implement NN > fencing. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2179) HA: namenode fencing mechanism
[ https://issues.apache.org/jira/browse/HDFS-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13068702#comment-13068702 ] Todd Lipcon commented on HDFS-2179: --- and one more open question I forgot: do we care about _read_ fencing? ie, is it OK if the old NN can for some number of seconds service reads which are no longer up-to-date? If so, storage fencing is necessary but not sufficient. > HA: namenode fencing mechanism > -- > > Key: HDFS-2179 > URL: https://issues.apache.org/jira/browse/HDFS-2179 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: name-node >Reporter: Todd Lipcon >Assignee: Todd Lipcon > > In an HA cluster, when there are two NNs, the invariant that only one NN is > active at a time has to be preserved in order to prevent "split brain > syndrome." Thus, when a standby NN is transition to "active" state during a > failover, it needs to somehow _fence_ the formerly active NN to ensure that > it can no longer perform edits. This JIRA is to discuss and implement NN > fencing. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2179) HA: namenode fencing mechanism
[ https://issues.apache.org/jira/browse/HDFS-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13068700#comment-13068700 ] Todd Lipcon commented on HDFS-2179: --- h3. Fencing overview In order to fence a NN, there are several different methods, at varying levels of nastiness: 1) Cooperative active->standby transition or shutdown In the case of a manual failover, the old primary can gracefully either transition to a standby mode, or gracefully shut down. In this case, since we assume the software to be cooperative, no real "fencing" is necessary -- the new NN just needs to unambiguously confirm that the old NN has dropped out of active mode. This method succeeds only if the old NN remains in full operation. 2) Process killing or death verification (eg via ssh or a second daemon) In the case that the old primary has either hung (eg deadlock) or crashed (eg JVM segfault), but the host is OK, the new primary may contact that host and send SIGKILL to the NameNode JVM. This may be done either via ssh or via contacting some process which is still running on the node. It is also sufficient to _verify_ that the NN process is no longer running in the case that its JVM crashed. This method succeeds only if the _host_ of the old NN remains in full operation, despite the NN itself being deadlocked or crashed. 3) Storage fencing Depending on the type of storage in which the old NN stores its edits directories, the new NN may explicitly fence the storage. This is typically accomplished using a vendor-specific extension. For example, NetApp filers support the command "exportfs -b enable save /vol/vol0" which can be remotely issued in order to disallow any further access to a particular mount by a particular host. In the case of edits stored on BookKeeper in the future, we may be able to implement some kind of lease revocation or fencing within that storage system. 4) Network port fencing Many switches support remote management. One way to prevent a NameNode from responding to any further requests is to forcibly disable its network port. An alternative similar mechanism is to use something like a LOM card to remotely disable the NIC. 5) Power port fencing (aka STONITH) Many power distribution units (PDUs) support remote management. The last ditch effort to fence a node is to literally "pull the power" h3. Proposal Since methods 3-5 above are usually vendor-specific implementations, it does not make sense to try to implement a catch-all fencing mechanism within Hadoop. Instead, operators are likely to want to use commonly available shell scripts that work against their preferred hardware. Given this, I would propose that Hadoop's fencing behavior be: - Configure a list of "fence methods", each with an associated priority. - Each fence method returns an exit code indicating whether it has successfully fenced the target node. - If any method succeeds, no further method is attempted. - If a method fails, continue down the list to try the next method. - If all fence methods fail, then both nodes remain in "standby" state, and an administrator must manually force the transition after verifying that the other node is no longer active. The first fence method will always be the "cooperative" method. We can also ship with Hadoop an implementation of method #2 (shoot-the-other-process-in-the-head via ssh). Methods 3-5 would probably be fulfilled by custom site-specific shell scripts, example snippets on a wiki, or existing tools like the fence_* programs that are available from Red Hat. h3. Open questions - do we need to have any kind of framework for unfencing built in to Hadoop? Or is it up to an administrator to "unfence"? - is it actually a good idea to include "Cooperative shutdown" in this same framework? or should we only call fence when we know it's uncooperative? > HA: namenode fencing mechanism > -- > > Key: HDFS-2179 > URL: https://issues.apache.org/jira/browse/HDFS-2179 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: name-node >Reporter: Todd Lipcon >Assignee: Todd Lipcon > > In an HA cluster, when there are two NNs, the invariant that only one NN is > active at a time has to be preserved in order to prevent "split brain > syndrome." Thus, when a standby NN is transition to "active" state during a > failover, it needs to somehow _fence_ the formerly active NN to ensure that > it can no longer perform edits. This JIRA is to discuss and implement NN > fencing. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira