[jira] [Commented] (HDFS-2179) HA: namenode fencing mechanism

2012-01-06 Thread Eli Collins (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181562#comment-13181562
 ] 

Eli Collins commented on HDFS-2179:
---

Agree. Spoke to Todd, am going to move this to common for HADOOP-7938 (there's 
minimal HDFS dependency).

> HA: namenode fencing mechanism
> --
>
> Key: HDFS-2179
> URL: https://issues.apache.org/jira/browse/HDFS-2179
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Fix For: HA branch (HDFS-1623)
>
> Attachments: hdfs-2179.txt, hdfs-2179.txt, hdfs-2179.txt
>
>
> In an HA cluster, when there are two NNs, the invariant that only one NN is 
> active at a time has to be preserved in order to prevent "split brain 
> syndrome." Thus, when a standby NN is transition to "active" state during a 
> failover, it needs to somehow _fence_ the formerly active NN to ensure that 
> it can no longer perform edits. This JIRA is to discuss and implement NN 
> fencing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2179) HA: namenode fencing mechanism

2011-08-04 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079539#comment-13079539
 ] 

Suresh Srinivas commented on HDFS-2179:
---

My feeling is that this is not NN specific. Whether MR uses it or not, doing it 
in common gets the abstractions right. 

> HA: namenode fencing mechanism
> --
>
> Key: HDFS-2179
> URL: https://issues.apache.org/jira/browse/HDFS-2179
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Fix For: HA branch (HDFS-1623)
>
> Attachments: hdfs-2179.txt, hdfs-2179.txt, hdfs-2179.txt
>
>
> In an HA cluster, when there are two NNs, the invariant that only one NN is 
> active at a time has to be preserved in order to prevent "split brain 
> syndrome." Thus, when a standby NN is transition to "active" state during a 
> failover, it needs to somehow _fence_ the formerly active NN to ensure that 
> it can no longer perform edits. This JIRA is to discuss and implement NN 
> fencing.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2179) HA: namenode fencing mechanism

2011-08-04 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079530#comment-13079530
 ] 

Todd Lipcon commented on HDFS-2179:
---

Yes, sorry for not running test-patch before. I just ran it and it pointed out 
that I was missing the license on one of the new test files as well. Aside from 
that, it passes.

As for why it goes in the NN package instead of common, I think it's better to 
start with building things specific to our current use case. Then, if we have 
need of this code from another spot (eg MapReduce HA) we can consider moving it 
to common. But let's not overly generalize until we have to -- eg from what 
I've heard about MR HA, it stores all of its critical state in ZooKeeper so 
this sort of fencing is not necessary.

> HA: namenode fencing mechanism
> --
>
> Key: HDFS-2179
> URL: https://issues.apache.org/jira/browse/HDFS-2179
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Fix For: HA branch (HDFS-1623)
>
> Attachments: hdfs-2179.txt, hdfs-2179.txt, hdfs-2179.txt
>
>
> In an HA cluster, when there are two NNs, the invariant that only one NN is 
> active at a time has to be preserved in order to prevent "split brain 
> syndrome." Thus, when a standby NN is transition to "active" state during a 
> failover, it needs to somehow _fence_ the formerly active NN to ensure that 
> it can no longer perform edits. This JIRA is to discuss and implement NN 
> fencing.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2179) HA: namenode fencing mechanism

2011-08-04 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079519#comment-13079519
 ] 

Suresh Srinivas commented on HDFS-2179:
---

I have not reviewed the patch yet. Just looking from the high level, why should 
this be in namenode package? Is this not generic enough that it should be in 
common?

> HA: namenode fencing mechanism
> --
>
> Key: HDFS-2179
> URL: https://issues.apache.org/jira/browse/HDFS-2179
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Fix For: HA branch (HDFS-1623)
>
> Attachments: hdfs-2179.txt, hdfs-2179.txt, hdfs-2179.txt
>
>
> In an HA cluster, when there are two NNs, the invariant that only one NN is 
> active at a time has to be preserved in order to prevent "split brain 
> syndrome." Thus, when a standby NN is transition to "active" state during a 
> failover, it needs to somehow _fence_ the formerly active NN to ensure that 
> it can no longer perform edits. This JIRA is to discuss and implement NN 
> fencing.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2179) HA: namenode fencing mechanism

2011-08-04 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079516#comment-13079516
 ] 

Suresh Srinivas commented on HDFS-2179:
---

Todd, should we run test-patch before committing changes? My preference is to 
do that.

> HA: namenode fencing mechanism
> --
>
> Key: HDFS-2179
> URL: https://issues.apache.org/jira/browse/HDFS-2179
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Fix For: HA branch (HDFS-1623)
>
> Attachments: hdfs-2179.txt, hdfs-2179.txt, hdfs-2179.txt
>
>
> In an HA cluster, when there are two NNs, the invariant that only one NN is 
> active at a time has to be preserved in order to prevent "split brain 
> syndrome." Thus, when a standby NN is transition to "active" state during a 
> failover, it needs to somehow _fence_ the formerly active NN to ensure that 
> it can no longer perform edits. This JIRA is to discuss and implement NN 
> fencing.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2179) HA: namenode fencing mechanism

2011-08-04 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079463#comment-13079463
 ] 

Todd Lipcon commented on HDFS-2179:
---

The only difference the lack of {{continue}} statement makes should be logging 
output. But I see your point - I'll add a {{continue}} there to be consistent, 
and then commit this to the HA branch.

> HA: namenode fencing mechanism
> --
>
> Key: HDFS-2179
> URL: https://issues.apache.org/jira/browse/HDFS-2179
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Attachments: hdfs-2179.txt, hdfs-2179.txt
>
>
> In an HA cluster, when there are two NNs, the invariant that only one NN is 
> active at a time has to be preserved in order to prevent "split brain 
> syndrome." Thus, when a standby NN is transition to "active" state during a 
> failover, it needs to somehow _fence_ the formerly active NN to ensure that 
> it can no longer perform edits. This JIRA is to discuss and implement NN 
> fencing.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2179) HA: namenode fencing mechanism

2011-08-04 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079459#comment-13079459
 ] 

Aaron T. Myers commented on HDFS-2179:
--

Latest patch looks great. One tiny comment

In the loop in {{NodeFencer.fence}}, why do you continue in the event of 
{{BadFencingConfigurationException}}, but not in the case of an unknown 
{{Throwable}}? I can imagine a justification for continuing in both or neither 
cases, but not in only one.

+1 once this is addressed.

> HA: namenode fencing mechanism
> --
>
> Key: HDFS-2179
> URL: https://issues.apache.org/jira/browse/HDFS-2179
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Attachments: hdfs-2179.txt, hdfs-2179.txt
>
>
> In an HA cluster, when there are two NNs, the invariant that only one NN is 
> active at a time has to be preserved in order to prevent "split brain 
> syndrome." Thus, when a standby NN is transition to "active" state during a 
> failover, it needs to somehow _fence_ the formerly active NN to ensure that 
> it can no longer perform edits. This JIRA is to discuss and implement NN 
> fencing.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2179) HA: namenode fencing mechanism

2011-08-03 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079155#comment-13079155
 ] 

Aaron T. Myers commented on HDFS-2179:
--

Patch looks pretty good, Todd. A few comments:

# Please add some comments to the {{FenceMethod}} interface
# I think {{FenceMethod}} should be public. Entirely possible (if not likely) 
end users will want to implement their own {{FenceMethods}}, and they shouldn't 
need to put them in {{o.a.h.hdfs.server.namenode.ha}}.
# Please add some class comments to {{NodeFencer}}.
# Seems to me like {{NodeFencer.fence}} should be catching {{Exception}} thrown 
by the individual methods. No reason not to try the other ones if some 
exception other than {{BadFencingConfigurationException}}  is thrown.
# In {{SshFenceByTcpPort.getNNPort}}, won't this be getting the port of the NN 
from where the SSH is occurring, not necessarily of the NN which is being SSHed 
into? This sort of points to what may be a larger problem, which is that I 
believe it's presently impossible to configure the addresses of multiple NNs in 
a single configuration.

> HA: namenode fencing mechanism
> --
>
> Key: HDFS-2179
> URL: https://issues.apache.org/jira/browse/HDFS-2179
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Attachments: hdfs-2179.txt
>
>
> In an HA cluster, when there are two NNs, the invariant that only one NN is 
> active at a time has to be preserved in order to prevent "split brain 
> syndrome." Thus, when a standby NN is transition to "active" state during a 
> failover, it needs to somehow _fence_ the formerly active NN to ensure that 
> it can no longer perform edits. This JIRA is to discuss and implement NN 
> fencing.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2179) HA: namenode fencing mechanism

2011-07-25 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13070658#comment-13070658
 ] 

dhruba borthakur commented on HDFS-2179:


Awesome, +1 to the proposal listed above.

> HA: namenode fencing mechanism
> --
>
> Key: HDFS-2179
> URL: https://issues.apache.org/jira/browse/HDFS-2179
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>
> In an HA cluster, when there are two NNs, the invariant that only one NN is 
> active at a time has to be preserved in order to prevent "split brain 
> syndrome." Thus, when a standby NN is transition to "active" state during a 
> failover, it needs to somehow _fence_ the formerly active NN to ensure that 
> it can no longer perform edits. This JIRA is to discuss and implement NN 
> fencing.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2179) HA: namenode fencing mechanism

2011-07-25 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13070631#comment-13070631
 ] 

Suresh Srinivas commented on HDFS-2179:
---

Case 1), where active standby are in communication and co-operating does not 
require fencing at all. Fencing is required only when active/standby cannot 
communicate. So we should drop that out of cases to consider.

When using solutions such as LinuxHA, a local process (LRM) kills the process 
to be fenced. This does not require ssh to the node. HDFS-2185 should consider 
this requirement. I might start with LinuxHA to play around with this, in the 
first phase, since I think getting a rock solid and correct fail-over 
controller is non-trivial.

> HA: namenode fencing mechanism
> --
>
> Key: HDFS-2179
> URL: https://issues.apache.org/jira/browse/HDFS-2179
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>
> In an HA cluster, when there are two NNs, the invariant that only one NN is 
> active at a time has to be preserved in order to prevent "split brain 
> syndrome." Thus, when a standby NN is transition to "active" state during a 
> failover, it needs to somehow _fence_ the formerly active NN to ensure that 
> it can no longer perform edits. This JIRA is to discuss and implement NN 
> fencing.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2179) HA: namenode fencing mechanism

2011-07-22 Thread Eli Collins (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069846#comment-13069846
 ] 

Eli Collins commented on HDFS-2179:
---

Agree with your proposal, though we shouldn't need to fence the old NN in the 
cooperative case (because the old primary has confirmed that it's gone into 
standby, closed its storage dirs, stopped service threads, etc). Since we have 
to make the uncooperative case work anyway, and exercising it frequently/by 
default will help find the relevant bugs (eg place where we're not syncing the 
log but should be) we should start with it.

> HA: namenode fencing mechanism
> --
>
> Key: HDFS-2179
> URL: https://issues.apache.org/jira/browse/HDFS-2179
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>
> In an HA cluster, when there are two NNs, the invariant that only one NN is 
> active at a time has to be preserved in order to prevent "split brain 
> syndrome." Thus, when a standby NN is transition to "active" state during a 
> failover, it needs to somehow _fence_ the formerly active NN to ensure that 
> it can no longer perform edits. This JIRA is to discuss and implement NN 
> fencing.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2179) HA: namenode fencing mechanism

2011-07-21 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069033#comment-13069033
 ] 

Kihwal Lee commented on HDFS-2179:
--

I think it is safe to serve reads as long as the new node is not serving 
writes. So there can be a period of service overlap if we can make sure the old 
node stops serving reads before the new node starts serving writes.  I am 
assuming the both are serving the same content, but if the fs state has 
diverged between the two (e.g. the in-memory state of the old node is not in 
sync with the persistent one), even serving reads may not be safe. Although it 
is safe in terms of the data integrity at the file system level in this case, 
it may cause clients to make wrong decisions and lose data. Probably we should 
not trust the old node at all since it can have unexpected failure modes. Then 
serving reads is not safe.

> HA: namenode fencing mechanism
> --
>
> Key: HDFS-2179
> URL: https://issues.apache.org/jira/browse/HDFS-2179
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>
> In an HA cluster, when there are two NNs, the invariant that only one NN is 
> active at a time has to be preserved in order to prevent "split brain 
> syndrome." Thus, when a standby NN is transition to "active" state during a 
> failover, it needs to somehow _fence_ the formerly active NN to ensure that 
> it can no longer perform edits. This JIRA is to discuss and implement NN 
> fencing.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2179) HA: namenode fencing mechanism

2011-07-20 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13068723#comment-13068723
 ] 

Todd Lipcon commented on HDFS-2179:
---

I found this reference useful to see what kind of fencing methods are 
implemented by Red Hat Cluster Suite: 
https://access.redhat.com/kb/docs/DOC-30004

> HA: namenode fencing mechanism
> --
>
> Key: HDFS-2179
> URL: https://issues.apache.org/jira/browse/HDFS-2179
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>
> In an HA cluster, when there are two NNs, the invariant that only one NN is 
> active at a time has to be preserved in order to prevent "split brain 
> syndrome." Thus, when a standby NN is transition to "active" state during a 
> failover, it needs to somehow _fence_ the formerly active NN to ensure that 
> it can no longer perform edits. This JIRA is to discuss and implement NN 
> fencing.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2179) HA: namenode fencing mechanism

2011-07-20 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13068702#comment-13068702
 ] 

Todd Lipcon commented on HDFS-2179:
---

and one more open question I forgot: do we care about _read_ fencing? ie, is it 
OK if the old NN can for some number of seconds service reads which are no 
longer up-to-date? If so, storage fencing is necessary but not sufficient.

> HA: namenode fencing mechanism
> --
>
> Key: HDFS-2179
> URL: https://issues.apache.org/jira/browse/HDFS-2179
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>
> In an HA cluster, when there are two NNs, the invariant that only one NN is 
> active at a time has to be preserved in order to prevent "split brain 
> syndrome." Thus, when a standby NN is transition to "active" state during a 
> failover, it needs to somehow _fence_ the formerly active NN to ensure that 
> it can no longer perform edits. This JIRA is to discuss and implement NN 
> fencing.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2179) HA: namenode fencing mechanism

2011-07-20 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13068700#comment-13068700
 ] 

Todd Lipcon commented on HDFS-2179:
---


h3. Fencing overview

In order to fence a NN, there are several different methods, at varying levels 
of nastiness:

1) Cooperative active->standby transition or shutdown
In the case of a manual failover, the old primary can gracefully either 
transition to a standby mode, or gracefully shut down. In this case, since we 
assume the software to be cooperative, no real "fencing" is necessary -- the 
new NN just needs to unambiguously confirm that the old NN has dropped out of 
active mode.

This method succeeds only if the old NN remains in full operation.

2) Process killing or death verification (eg via ssh or a second daemon)
In the case that the old primary has either hung (eg deadlock) or crashed (eg 
JVM segfault), but the host is OK, the new primary may contact that host and 
send SIGKILL to the NameNode JVM. This may be done either via ssh or via 
contacting some process which is still running on the node. It is also 
sufficient to _verify_ that the NN process is no longer running in the case 
that its JVM crashed.

This method succeeds only if the _host_ of the old NN remains in full 
operation, despite the NN itself being deadlocked or crashed.

3) Storage fencing
Depending on the type of storage in which the old NN stores its edits 
directories, the new NN may explicitly fence the storage. This is typically 
accomplished using a vendor-specific extension. For example, NetApp filers 
support the command "exportfs -b enable save  /vol/vol0" which can 
be remotely issued in order to disallow any further access to a particular 
mount by a particular host.

In the case of edits stored on BookKeeper in the future, we may be able to 
implement some kind of lease revocation or fencing within that storage system.

4) Network port fencing
Many switches support remote management. One way to prevent a NameNode from 
responding to any further requests is to forcibly disable its network port. An 
alternative similar mechanism is to use something like a LOM card to remotely 
disable the NIC.

5) Power port fencing (aka STONITH)
Many power distribution units (PDUs) support remote management. The last ditch 
effort to fence a node is to literally "pull the power"


h3. Proposal

Since methods 3-5 above are usually vendor-specific implementations, it does 
not make sense to try to implement a catch-all fencing mechanism within Hadoop. 
Instead, operators are likely to want to use commonly available shell scripts 
that work against their preferred hardware. Given this, I would propose that 
Hadoop's fencing behavior be:

- Configure a list of "fence methods", each with an associated priority.
- Each fence method returns an exit code indicating whether it has successfully 
fenced the target node.
- If any method succeeds, no further method is attempted.
- If a method fails, continue down the list to try the next method.
- If all fence methods fail, then both nodes remain in "standby" state, and an 
administrator must manually force the transition after verifying that the other 
node is no longer active.

The first fence method will always be the "cooperative" method. We can also 
ship with Hadoop an implementation of method #2 
(shoot-the-other-process-in-the-head via ssh). Methods 3-5 would probably be 
fulfilled by custom site-specific shell scripts, example snippets on a wiki, or 
existing tools like the fence_* programs that are available from Red Hat.

h3. Open questions
- do we need to have any kind of framework for unfencing built in to Hadoop? Or 
is it up to an administrator to "unfence"?
- is it actually a good idea to include "Cooperative shutdown" in this same 
framework? or should we only call fence when we know it's uncooperative?


> HA: namenode fencing mechanism
> --
>
> Key: HDFS-2179
> URL: https://issues.apache.org/jira/browse/HDFS-2179
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>
> In an HA cluster, when there are two NNs, the invariant that only one NN is 
> active at a time has to be preserved in order to prevent "split brain 
> syndrome." Thus, when a standby NN is transition to "active" state during a 
> failover, it needs to somehow _fence_ the formerly active NN to ensure that 
> it can no longer perform edits. This JIRA is to discuss and implement NN 
> fencing.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira