[jira] [Commented] (HDFS-2185) HA: ZK-based FailoverController

2012-03-23 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237367#comment-13237367
 ] 

Todd Lipcon commented on HDFS-2185:
---

Hi Bikas. The important bits of the code are only ~200 lines. Is there really 
much value in a detailed design doc? In my opinion, if the code itself isn't 
clear and self-documenting enough to make the design obvious, then the code 
needs to be better. If there's anything unclear in the code, please let me know 
and I'll improve the javadocs and inline comments. A general overview of the 
design is posted above, though the code has less of a formal state machine 
approach.

> HA: ZK-based FailoverController
> ---
>
> Key: HDFS-2185
> URL: https://issues.apache.org/jira/browse/HDFS-2185
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ha
>Affects Versions: HA branch (HDFS-1623)
>Reporter: Eli Collins
>Assignee: Todd Lipcon
> Attachments: Failover_Controller.jpg, hdfs-2185.txt
>
>
> This jira is for a ZK-based FailoverController daemon. The FailoverController 
> is a separate daemon from the NN that does the following:
> * Initiates leader election (via ZK) when necessary
> * Performs health monitoring (aka failure detection)
> * Performs fail-over (standby to active and active to standby transitions)
> * Heartbeats to ensure the liveness
> It should have the same/similar interface as the Linux HA RM to aid 
> pluggability.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2185) HA: ZK-based FailoverController

2012-03-23 Thread Bikas Saha (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237351#comment-13237351
 ] 

Bikas Saha commented on HDFS-2185:
--

It would be really great if a design document were posted that explains the 
details. That's usually a lot easier to understand (aside from actual 
white-boarding :)) than real code. It helps in reading the code if the mental 
model of the design is built from a document first, especially since this is a 
new component altogether.







[jira] [Commented] (HDFS-2185) HA: ZK-based FailoverController

2012-03-02 Thread Bikas Saha (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221370#comment-13221370
 ] 

Bikas Saha commented on HDFS-2185:
--

I have attached a state diagram for some ideas I had on how this could work. 
Think of the rectangles as the primary states of the controller. The ovals are 
actions that need to be taken before changing states. The black arrows are 
results of those actions and the blue arrows are external events. The blue 
arrows are notifications that can be received from the ZK leader election 
library added in HADOOP-7992 and the health notifications from the 
HAServiceProtocol.
This expects one change in the HAServiceProtocol: splitting becomeActive() 
into prepareToBecomeActive() and becomeActive(). prepareToBecomeActive() does 
the time-consuming heavy lifting, and the world might change by the time it 
completes. At that point, if the node is still the leader, it can quickly 
becomeActive(). Otherwise it can becomeStandby().
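
The proposed split could look roughly like the sketch below. It is a minimal 
Python model, not Hadoop code: FakeService, still_leader, and activate are 
hypothetical names introduced for illustration, and only the three call names 
recorded in the log come from the comment above.

```python
from typing import Callable, List


class FakeService:
    """Hypothetical stand-in for an HAServiceProtocol target; records calls."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def prepare_to_become_active(self) -> None:
        # The slow, heavy work would happen here; leadership may change
        # while it runs.
        self.calls.append("prepareToBecomeActive")

    def become_active(self) -> None:
        self.calls.append("becomeActive")

    def become_standby(self) -> None:
        self.calls.append("becomeStandby")


def activate(service: FakeService, still_leader: Callable[[], bool]) -> str:
    """Run the slow phase first, then re-check leadership before committing."""
    service.prepare_to_become_active()
    if still_leader():
        service.become_active()
        return "active"
    service.become_standby()
    return "standby"
```

The point of the second leadership check is that the quick becomeActive() step 
only runs if the election result still holds after the slow phase.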

> HA: ZK-based FailoverController
> ---
>
> Key: HDFS-2185
> URL: https://issues.apache.org/jira/browse/HDFS-2185
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ha
>Affects Versions: HA branch (HDFS-1623)
>Reporter: Eli Collins
>Assignee: Todd Lipcon
> Attachments: Failover_Controller.jpg
>
>
> This jira is for a ZK-based FailoverController daemon. The FailoverController 
> is a separate daemon from the NN that does the following:
> * Initiates leader election (via ZK) when necessary
> * Performs health monitoring (aka failure detection)
> * Performs fail-over (standby to active and active to standby transitions)
> * Heartbeats to ensure the liveness
> It should have the same/similar interface as the Linux HA RM to aid 
> pluggability.





[jira] [Commented] (HDFS-2185) HA: ZK-based FailoverController

2012-01-04 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13180049#comment-13180049
 ] 

Todd Lipcon commented on HDFS-2185:
---

Sure, that makes sense. I'm a little skeptical that the ZK library can be done 
well entirely in isolation from having something to plug it into... but if it 
can be, that would certainly work.

> HA: ZK-based FailoverController
> ---
>
> Key: HDFS-2185
> URL: https://issues.apache.org/jira/browse/HDFS-2185
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ha
>Affects Versions: HA branch (HDFS-1623)
>Reporter: Eli Collins
>Assignee: Todd Lipcon
>
> This jira is for a ZK-based FailoverController daemon. The FailoverController 
> is a separate daemon from the NN that does the following:
> * Initiates leader election (via ZK) when necessary
> * Performs health monitoring (aka failure detection)
> * Performs fail-over (standby to active and active to standby transitions)
> * Heartbeats to ensure the liveness
> It should have the same/similar interface as the Linux HA RM to aid 
> pluggability.





[jira] [Commented] (HDFS-2185) HA: ZK-based FailoverController

2012-01-04 Thread Suresh Srinivas (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179344#comment-13179344
 ] 

Suresh Srinivas commented on HDFS-2185:
---

Todd, instead of incorporating HDFS-2681 into this, can we finish the ZK 
library as part of that jira and focus this jira on the FailoverController?





[jira] [Commented] (HDFS-2185) HA: ZK-based FailoverController

2011-12-21 Thread Uma Maheswara Rao G (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174376#comment-13174376
 ] 

Uma Maheswara Rao G commented on HDFS-2185:
---

That's great!
I completely agree with you on completing manual failover first. :-)
OK, let's continue the design discussions in parallel whenever we find the time.





[jira] [Commented] (HDFS-2185) HA: ZK-based FailoverController

2011-12-21 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174338#comment-13174338
 ] 

Todd Lipcon commented on HDFS-2185:
---

Great, thanks for the link, Uma. I will be sure to take a look.

My plan is to finish off the checkpointing work next (HDFS-2291) and then go 
into a testing cycle for manual failover to make sure everything's robust. 
Unless we have a robust functional manual failover, automatic failover is just 
going to add some complication. After we're reasonably confident in the manual 
operation, we can start in earnest on the ZK-based automatic work. Do you agree?

(of course it's good to start discussing design for the automatic one in 
parallel)





[jira] [Commented] (HDFS-2185) HA: ZK-based FailoverController

2011-12-21 Thread Uma Maheswara Rao G (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174310#comment-13174310
 ] 

Uma Maheswara Rao G commented on HDFS-2185:
---

OK, Todd, thanks for the clarification.
ZOOKEEPER-1080 is the one we used for our internal HA implementation. It 
handles many cases based on our experience, testing, and running it in 
production for the last 6 months.
It also has a state machine implementation, as you proposed.
If you have some free time, please go through it once; if you find it 
reasonable, we can take some code from there as well.
I can also help prepare some of the patches.

Thanks
Uma





[jira] [Commented] (HDFS-2185) HA: ZK-based FailoverController

2011-12-21 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174263#comment-13174263
 ] 

Todd Lipcon commented on HDFS-2185:
---

Twitter's also got a nice library of ZK stuff. But I think copy-paste is 
probably easier, so we can customize it to our needs and not have to pull in 
lots of transitive dependencies.





[jira] [Commented] (HDFS-2185) HA: ZK-based FailoverController

2011-12-21 Thread Aaron T. Myers (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174257#comment-13174257
 ] 

Aaron T. Myers commented on HDFS-2185:
--

Per a recommendation from Patrick Hunt, we might also consider taking a look at 
the [Netflix Curator|https://github.com/Netflix/curator], which includes a 
leader election recipe as well. It's Apache-licensed.





[jira] [Commented] (HDFS-2185) HA: ZK-based FailoverController

2011-12-21 Thread Aaron T. Myers (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174253#comment-13174253
 ] 

Aaron T. Myers commented on HDFS-2185:
--

Note also that the recipes included in ZK aren't actually built/packaged, so 
we'll need to copy/paste the code somewhere into Hadoop and build it ourselves 
anyway, even if we used the recipe as-is.





[jira] [Commented] (HDFS-2185) HA: ZK-based FailoverController

2011-12-21 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174246#comment-13174246
 ] 

Todd Lipcon commented on HDFS-2185:
---

Yea, this is very similar to the leader election recipe - I planned to base the 
code somewhat on that code for best practices. But the major difference is that 
we need to do fencing as well, which requires that we leave a non-ephemeral 
node behind when our ephemeral node expires, so the new NN can fence the old.
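
To illustrate why the extra non-ephemeral node matters, here is a small 
in-memory model. It is a sketch under stated assumptions, not the ZooKeeper 
client API: FakeZk, the paths /lock and /breadcrumb, and the address nn1:8020 
are all hypothetical.

```python
from typing import Dict, Optional, Tuple


class FakeZk:
    """Tiny in-memory model of znodes; not the real ZooKeeper client API."""

    def __init__(self) -> None:
        # path -> (data, owning session id, or None for a persistent node)
        self.nodes: Dict[str, Tuple[str, Optional[int]]] = {}

    def create(self, path: str, data: str,
               session: Optional[int] = None) -> bool:
        if path in self.nodes:
            return False  # somebody else already holds this path
        self.nodes[path] = (data, session)
        return True

    def expire_session(self, session: int) -> None:
        # Ephemeral nodes vanish with their session; persistent ones remain.
        self.nodes = {p: v for p, v in self.nodes.items() if v[1] != session}


zk = FakeZk()
zk.create("/lock", "nn1", session=1)   # ephemeral: the election lock
zk.create("/breadcrumb", "nn1:8020")   # persistent: whom to fence later
zk.expire_session(1)                    # nn1's controller loses its session
```

After the session expires, the lock is free for a new leader, but the 
persistent node still names the old NN, so the new leader knows whom to fence.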





[jira] [Commented] (HDFS-2185) HA: ZK-based FailoverController

2011-12-21 Thread Uma Maheswara Rao G (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173966#comment-13173966
 ] 

Uma Maheswara Rao G commented on HDFS-2185:
---

Hi Todd,

 A small question before I go through the proposal in detail.
 
   I think ZooKeeper already has built-in "leader election recipe" 
implementations ready, right? Are we going to reuse those implementations? 
It seems to me that we are trying to implement leader election again here. 

 A couple of JIRAs from ZooKeeper: ZOOKEEPER-1209, ZOOKEEPER-1095, ZOOKEEPER-1080

Thanks
Uma





[jira] [Commented] (HDFS-2185) HA: ZK-based FailoverController

2011-12-14 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13169713#comment-13169713
 ] 

Todd Lipcon commented on HDFS-2185:
---

BTW, should add that another goal is to implement a client failover solution 
which uses the {{activeNodeInfo}} information to locate the active NN. We can 
probably borrow some code from Dhruba's AvatarNode patch for this.

> HA: ZK-based FailoverController
> ---
>
> Key: HDFS-2185
> URL: https://issues.apache.org/jira/browse/HDFS-2185
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Eli Collins
>Assignee: Todd Lipcon
>
> This jira is for a ZK-based FailoverController daemon. The FailoverController 
> is a separate daemon from the NN that does the following:
> * Initiates leader election (via ZK) when necessary
> * Performs health monitoring (aka failure detection)
> * Performs fail-over (standby to active and active to standby transitions)
> * Heartbeats to ensure the liveness
> It should have the same/similar interface as the Linux HA RM to aid 
> pluggability.





[jira] [Commented] (HDFS-2185) HA: ZK-based FailoverController

2011-12-14 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13169702#comment-13169702
 ] 

Todd Lipcon commented on HDFS-2185:
---

Here's a design sketch -- I have only done a little bit of implementation but 
nothing really fleshed out yet. So, it might change a bit during the course of 
implementation. But feedback on the general approach would be appreciated!

h3. Goals
- Ensure that only a single NN can be active at a time.
-- Use ZK as a lock manager to satisfy this requirement.
- Perform health monitoring of the active NN to trigger a fail-over should it 
become unhealthy.
- Automatically fail-over in the case that one of the hosts fails (eg 
power/network outage)
- Allow manual (administratively initiated) graceful failover
- Initiate fencing of the previously active NN in the case of non-graceful 
failovers.

h3. Overall design

The ZooKeeper FailoverController (ZKFC) is a separate process/program which 
runs next to each of the HA NameNodes in the cluster. It does not directly 
spawn/supervise the NN JVM process, but rather runs on the same machine and 
communicates with it via localhost RPC.

The ZKFC is designed to be as simple as possible to decrease likelihood of bugs 
which might trigger a false fail-over. It is also designed to use only a very 
small amount of memory, so that it will never have lengthy GC pauses. This 
allows us to set a fairly low time-out on the ZK session in order to detect 
machine failures quickly.

h3. Configuration

The ZKFC needs the following pieces of configuration:
- list of zookeeper servers making up the ZK quorum (fail to start if this is 
not provided)
- host/port for the HAServiceProtocol of the local NN (defaults to 
localhost:)
- "base znode" at which to root all of the znodes used by the process

h3. Nodes in ZK:

Everything should be rooted at the configured base znode. Within that, there 
should be a znode per nameservice ID. Within this {{/base/nameserviceId/}} 
directory, there are the following znodes:

- {{activeLock}} - an ephemeral node taken by the ZKFC before it asks its local 
NN to become active. This acts as a mutex on the active state and also as a 
failure detector.
- {{activeNodeInfo}} - a non-ephemeral node written by the ZKFC after it 
succeeds in taking {{activeLock}}. This should have data like the IPC address, 
HTTP address, etc of the NN.

The {{activeNodeInfo}} is non-ephemeral so that, when a new NN takes over from 
a failed one, it has enough information to fence the previous active in case 
it's still actually running.
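
The layout above can be written down as a small path helper. This is only a 
sketch: the function name znode_paths and the example values /base and ns1 are 
hypothetical, while the two znode names come from the list above.

```python
from typing import Dict


def znode_paths(base: str, nameservice_id: str) -> Dict[str, str]:
    """Return the two znode paths used for one nameservice."""
    parent = f"{base.rstrip('/')}/{nameservice_id}"
    return {
        "activeLock": f"{parent}/activeLock",          # ephemeral mutex
        "activeNodeInfo": f"{parent}/activeNodeInfo",  # persistent fencing info
    }
```

With base /base and nameservice ns1, the lock lives at /base/ns1/activeLock and 
the fencing information at /base/ns1/activeNodeInfo.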

h3. Runtime operation states

For simplicity of testing, we can model the ZKFC as a state machine.

h4. LOCAL_NOT_READY

The NN on the local host is down or not responding to RPCs. We start in this 
state.

h4. LOCAL_STANDBY

The NN on the local host is in standby mode and ready to automatically 
transition to active if the former active dies.

h4. LOCAL_ACTIVE

The NN on the local host is running and performing active duty.

h3. Inputs into state machine

Three other classes interact with the state machine:

h4. ZK Controller

A ZK thread connects to ZK and watches for the following events:
- The previously active master has lost its ephemeral node
- The ZK session is lost

h4. User-initiated failover controller

By some means (RPC/signal/HTTP/etc) the user can request that the active NN's 
FC gracefully turn over the active state to a different NN.

h4. Health monitor

A HealthMonitor thread heartbeats continuously to the local NN. It provides an 
event whenever the health state of the NN changes. For example:
- NN has become unhealthy
- Lost contact with NN
- NN is now healthy
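
One way to read "provides an event whenever the health state changes" is that 
repeated identical probe results produce no new events. A minimal sketch of 
that reading follows; the function name and the event names NN_HEALTHY, 
NN_UNHEALTHY, and NN_UNREACHABLE are hypothetical.

```python
from typing import Iterable, List, Optional


def health_events(probe_results: Iterable[Optional[bool]]) -> List[str]:
    """Turn a stream of probe results into change events.

    True = healthy, False = unhealthy, None = no reply from the NN.
    """
    names = {True: "NN_HEALTHY", False: "NN_UNHEALTHY", None: "NN_UNREACHABLE"}
    events: List[str] = []
    last: object = object()  # sentinel so the first probe always emits
    for result in probe_results:
        if result != last:
            events.append(names[result])
            last = result
    return events
```

Emitting only on change keeps the state machine from being flooded with one 
event per heartbeat.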

h3. Behavior of state machine

h4. LOCAL_NOT_READY state
- System starts here
- When HealthMonitor indicates the local NN is healthy:
-- Transition to LOCAL_STANDBY mode

h4. LOCAL_STANDBY
- On health state change:
-- Transition to LOCAL_NOT_READY state if local NN goes down
- On ZK state change:
-- If the old ZK "active" node was deleted, try to initiate automatic failover
-- If our own ZK session died, reconnect to ZK
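
The two states described so far can be modeled as a pure transition function, 
which is convenient for unit testing. This is a sketch, not the actual 
implementation: the event names and the action strings try_failover and 
reconnect_zk are hypothetical labels for the bullets above.

```python
from enum import Enum
from typing import Optional, Tuple


class State(Enum):
    LOCAL_NOT_READY = "LOCAL_NOT_READY"
    LOCAL_STANDBY = "LOCAL_STANDBY"
    LOCAL_ACTIVE = "LOCAL_ACTIVE"


def next_step(state: State, event: str) -> Tuple[State, Optional[str]]:
    """Return (new_state, action) for LOCAL_NOT_READY and LOCAL_STANDBY."""
    if state is State.LOCAL_NOT_READY and event == "NN_HEALTHY":
        return State.LOCAL_STANDBY, None
    if state is State.LOCAL_STANDBY:
        if event == "NN_DOWN":
            return State.LOCAL_NOT_READY, None
        if event == "ACTIVE_LOCK_DELETED":
            return State.LOCAL_STANDBY, "try_failover"
        if event == "ZK_SESSION_LOST":
            return State.LOCAL_STANDBY, "reconnect_zk"
    return state, None  # anything else is ignored in this sketch
```

Keeping the function free of side effects means each bullet can be checked 
with a one-line assertion, without a running NN or ZK quorum.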

h4. Failover process:
- Try to create the "activeLock" ephemeral node in ZK
- If we are unsuccessful, return to LOCAL_STANDBY
- See if there is an "activeNodeInfo" node in ZK. If so:
-- The old NN may still be running (it didn't gracefully shut down).
-- Initiate fencing process.
-- If successful, delete the "activeNodeInfo" node in ZK.
- Create an "activeNodeInfo" with our own information (ie NN IPC address, etc)
- Send IPC to local NN to transitionToActive. If successful, go to LOCAL_ACTIVE
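
The steps above can be sketched as a single function, with a plain dict 
standing in for ZK. Several details are assumptions not stated in the list: 
what happens when fencing fails, what happens when transitionToActive fails, 
and the names try_failover, fence, and transition_to_active. A real ZK create 
is atomic, unlike the check-then-set used here.

```python
from typing import Callable, Dict


def try_failover(zk: Dict[str, str], me: str,
                 fence: Callable[[str], bool],
                 transition_to_active: Callable[[], bool]) -> str:
    """Walk the failover steps against a dict standing in for ZK."""
    if "activeLock" in zk:
        return "LOCAL_STANDBY"        # somebody else holds the lock
    zk["activeLock"] = me
    old = zk.get("activeNodeInfo")
    if old is not None:
        # The previous active did not shut down cleanly; fence it first.
        if not fence(old):
            del zk["activeLock"]      # assumption: back off if fencing fails
            return "LOCAL_STANDBY"
        del zk["activeNodeInfo"]
    zk["activeNodeInfo"] = me
    if transition_to_active():
        return "LOCAL_ACTIVE"
    # Assumption: on a failed transition, undo our own writes and stand by.
    del zk["activeNodeInfo"]
    del zk["activeLock"]
    return "LOCAL_STANDBY"
```

Note the ordering: the old activeNodeInfo is removed only after fencing 
succeeds, so a crash in the middle still leaves enough information for the 
next candidate to fence the old NN.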

h4. LOCAL_ACTIVE
- On health state change to unhealthy:
-- delete our active lock znode, go to LOCAL_NOT_READY. Another node will fence 
us.
- On ZK connection loss or notice our znode got deleted:
-- another process is probably about to fence us... unless all nodes lost their 
connection, in which case we should "stay the co

[jira] [Commented] (HDFS-2185) HA: ZK-based FailoverController

2011-08-22 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088746#comment-13088746
 ] 

Uma Maheswara Rao G commented on HDFS-2185:
---

Hi Eli,

 Are you planning to post the design document for this?

Small question.
 {quote}
 •Performs health monitoring (aka failure detection)
{quote}

   Here, do we need separate monitoring logic? As far as I know, ZK client 
watchers can deliver callbacks if we register the watchers, right?
   Or are you planning an alternative here?

If you post the design doc, it would be great. 

-thanks
Uma
  

> HA: ZK-based FailoverController
> ---
>
> Key: HDFS-2185
> URL: https://issues.apache.org/jira/browse/HDFS-2185
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Eli Collins
>Assignee: Eli Collins
>
> This jira is for a ZK-based FailoverController daemon. The FailoverController 
> is a separate daemon from the NN that does the following:
> * Initiates leader election (via ZK) when necessary
> * Performs health monitoring (aka failure detection)
> * Performs fail-over (standby to active and active to standby transitions)
> * Heartbeats to ensure the liveness
> It should have the same/similar interface as the Linux HA RM to aid 
> pluggability.





[jira] [Commented] (HDFS-2185) HA: ZK-based FailoverController

2011-08-04 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079700#comment-13079700
 ] 

Aaron T. Myers commented on HDFS-2185:
--

Hey Florian, I don't think anyone would disagree with you that Pacemaker 
already provides much of this functionality. The design document in HDFS-1623 
discusses using either Pacemaker or a Hadoop-specific failover controller. The 
intention of HDFS-1623 is absolutely to provide the necessary hooks to be able 
to support either one.





[jira] [Commented] (HDFS-2185) HA: ZK-based FailoverController

2011-08-04 Thread Florian Haas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079329#comment-13079329
 ] 

Florian Haas commented on HDFS-2185:


Allow me to comment that the [Pacemaker|http://www.clusterlabs.org] stack 
already fulfills all of the above.

* Initiates leader election (via ZK) when necessary -- Pacemaker calls this a 
_Designated Coordinator_, which is elected automatically.

* Performs health monitoring (aka failure detection) -- Pacemaker does this via 
the _monitor_ action of _resource agents_, which follow the Open Cluster 
Framework (OCF) standard

* Performs fail-over (standby to active and active to standby transitions) -- 
Pacemaker does this automatically, including fencing, quorum and other vital 
concepts

* Heartbeats to ensure the liveness -- Pacemaker does this over one of two 
cluster communication layers it supports, those being Heartbeat and Corosync.

Why reinvent the wheel?
