[jira] [Commented] (HDFS-2185) HA: ZK-based FailoverController
[ https://issues.apache.org/jira/browse/HDFS-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237367#comment-13237367 ]

Todd Lipcon commented on HDFS-2185:
---

Hi Bikas. The important bits of the code are only ~200 lines. Is there really much value in a detailed design doc? In my opinion, if the code itself isn't clear and self-documenting enough to make the design obvious, then the code needs to be better. If there's anything unclear in the code, please let me know and I'll improve the javadocs and inline comments.

A general overview of the design is posted above, though the code has less of a formal state machine approach.

> HA: ZK-based FailoverController
> ---
>
> Key: HDFS-2185
> URL: https://issues.apache.org/jira/browse/HDFS-2185
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: ha
> Affects Versions: HA branch (HDFS-1623)
> Reporter: Eli Collins
> Assignee: Todd Lipcon
> Attachments: Failover_Controller.jpg, hdfs-2185.txt
>
>
> This jira is for a ZK-based FailoverController daemon. The FailoverController
> is a separate daemon from the NN that does the following:
> * Initiates leader election (via ZK) when necessary
> * Performs health monitoring (aka failure detection)
> * Performs fail-over (standby to active and active to standby transitions)
> * Heartbeats to ensure the liveness
> It should have the same/similar interface as the Linux HA RM to aid
> pluggability.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2185) HA: ZK-based FailoverController
[ https://issues.apache.org/jira/browse/HDFS-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237351#comment-13237351 ]

Bikas Saha commented on HDFS-2185:
---

It would be really great if a design document were posted that explains the details. That's usually a lot easier to understand (aside from actual white-boarding :)) than real code. It helps in reading the code if a mental model of the design is first built from a document, especially since this is an altogether new component.
[jira] [Commented] (HDFS-2185) HA: ZK-based FailoverController
[ https://issues.apache.org/jira/browse/HDFS-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221370#comment-13221370 ]

Bikas Saha commented on HDFS-2185:
---

I have attached a state diagram for some ideas I had on how this could work. Think of the rectangles as the primary states of the controller. The ovals are actions that need to be taken before changing states. The black arrows are results of those actions and the blue arrows are external events. The blue arrows are notifications that can be received from the ZK leader election library added in HADOOP-7992 and the health notifications from the HAServiceProtocol.

This expects one change in the HAServiceProtocol. That is to split becomeActive() into prepareToBecomeActive() and becomeActive(). prepareToBecomeActive() does the time consuming heavy lifting and the world might change by the time it completes. At that point, if the node is still the leader, it can quickly becomeActive(). Else it can becomeStandby().
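The split proposed in this comment can be sketched roughly as follows. This is only an illustration under assumptions: the `Service` interface and the `stillLeader` check are hypothetical stand-ins, not the actual HAServiceProtocol or the HADOOP-7992 election library. Only the method names `prepareToBecomeActive()`, `becomeActive()` and `becomeStandby()` come from the comment.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical sketch of the proposed two-phase activation. */
class TwoPhaseActivation {
    /** Stand-in for the split protocol; not the real HAServiceProtocol. */
    interface Service {
        void prepareToBecomeActive(); // slow, heavy lifting
        void becomeActive();          // quick final switch
        void becomeStandby();
    }

    /** Returns true if the service ended up active. */
    static boolean tryActivate(Service service, BooleanSupplier stillLeader) {
        service.prepareToBecomeActive();
        // The world may have changed while preparing, so re-check leadership.
        if (stillLeader.getAsBoolean()) {
            service.becomeActive();
            return true;
        }
        service.becomeStandby();
        return false;
    }
}
```

The point of the split is that the leadership check happens after the expensive step, so a node that lost the election while preparing never reaches `becomeActive()`.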
[jira] [Commented] (HDFS-2185) HA: ZK-based FailoverController
[ https://issues.apache.org/jira/browse/HDFS-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13180049#comment-13180049 ]

Todd Lipcon commented on HDFS-2185:
---

Sure, that makes sense. I'm a little skeptical that the ZK library can be done well entirely in isolation from having something to plug it into... but if it can be, that certainly would work.
[jira] [Commented] (HDFS-2185) HA: ZK-based FailoverController
[ https://issues.apache.org/jira/browse/HDFS-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179344#comment-13179344 ]

Suresh Srinivas commented on HDFS-2185:
---

Todd, instead of incorporating HDFS-2681 into this, can we finish the ZK library as a part of that jira and focus this jira on the FailoverController?
[jira] [Commented] (HDFS-2185) HA: ZK-based FailoverController
[ https://issues.apache.org/jira/browse/HDFS-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174376#comment-13174376 ]

Uma Maheswara Rao G commented on HDFS-2185:
---

That's great! I completely agree with you on completing manual failover first. :-)
OK, let's continue the design discussions in parallel whenever we find the time.
[jira] [Commented] (HDFS-2185) HA: ZK-based FailoverController
[ https://issues.apache.org/jira/browse/HDFS-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174338#comment-13174338 ]

Todd Lipcon commented on HDFS-2185:
---

Great, thanks for the link, Uma. I will be sure to take a look.

My plan is to finish off the checkpointing work next (HDFS-2291) and then go into a testing cycle for manual failover to make sure everything's robust. Unless we have a robust functional manual failover, automatic failover is just going to add some complication. After we're reasonably confident in the manual operation, we can start in earnest on the ZK-based automatic work. Do you agree? (of course it's good to start discussing design for the automatic one in parallel)
[jira] [Commented] (HDFS-2185) HA: ZK-based FailoverController
[ https://issues.apache.org/jira/browse/HDFS-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174310#comment-13174310 ]

Uma Maheswara Rao G commented on HDFS-2185:
---

OK, Todd, thanks for the clarification.
ZOOKEEPER-1080 is the one we used for our internal HA implementation. Many cases have been handled based on our experience, testing, and running in production for the last 6 months. It also has a state machine implementation, as you proposed. If you have some free time, please go through it once, and if you find it reasonable, we can take some code from there as well. I can also help prepare some of the patches.

Thanks
Uma
[jira] [Commented] (HDFS-2185) HA: ZK-based FailoverController
[ https://issues.apache.org/jira/browse/HDFS-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174263#comment-13174263 ]

Todd Lipcon commented on HDFS-2185:
---

Twitter's also got a nice library of ZK stuff. But I think copy-paste is probably easier so we can customize it to our needs and not have to pull in lots of transitive dependencies
[jira] [Commented] (HDFS-2185) HA: ZK-based FailoverController
[ https://issues.apache.org/jira/browse/HDFS-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174257#comment-13174257 ]

Aaron T. Myers commented on HDFS-2185:
---

Per a recommendation from Patrick Hunt, we might also consider taking a look at the [Netflix Curator|https://github.com/Netflix/curator], which includes a leader election recipe as well. It's Apache-licensed.
[jira] [Commented] (HDFS-2185) HA: ZK-based FailoverController
[ https://issues.apache.org/jira/browse/HDFS-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174253#comment-13174253 ]

Aaron T. Myers commented on HDFS-2185:
---

Note also that the recipes included in ZK aren't actually built/packaged, so we'll need to copy/paste the code somewhere into Hadoop and build it ourselves anyway, even if we used the recipe as-is.
[jira] [Commented] (HDFS-2185) HA: ZK-based FailoverController
[ https://issues.apache.org/jira/browse/HDFS-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174246#comment-13174246 ]

Todd Lipcon commented on HDFS-2185:
---

Yea, this is very similar to the leader election recipe - I planned to base the code somewhat on that code for best practices. But the major difference is that we need to do fencing as well, which requires that we leave a non-ephemeral node behind when our ephemeral node expires, so the new NN can fence the old.
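The difference from the stock recipe described in this comment can be illustrated with a toy in-memory model. This is a sketch under assumptions, not ZooKeeper client code: the class, its methods and the example addresses are hypothetical, and the node names `activeLock` and `activeNodeInfo` are borrowed from the design sketch posted on this issue.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy stand-in for a znode store; it only models ephemeral vs. persistent nodes. */
class FencingBreadcrumbSketch {
    static final String LOCK = "activeLock";     // ephemeral: vanishes with its session
    static final String INFO = "activeNodeInfo"; // persistent: survives session expiry

    private static final class Node {
        final String data;
        final String ownerSession; // null means persistent
        Node(String data, String ownerSession) {
            this.data = data;
            this.ownerSession = ownerSession;
        }
    }

    private final Map<String, Node> nodes = new HashMap<>();

    /** Creates a node if absent; returns false when the path is already taken. */
    boolean create(String path, String data, String ownerSession) {
        return nodes.putIfAbsent(path, new Node(data, ownerSession)) == null;
    }

    String read(String path) {
        Node node = nodes.get(path);
        return node == null ? null : node.data;
    }

    /** Session expiry removes only the nodes owned by that session. */
    void expireSession(String session) {
        nodes.values().removeIf(node -> session.equals(node.ownerSession));
    }

    /** Returns true if the lock was won; addresses that need fencing are appended to fenced. */
    boolean tryBecomeActive(String session, String myAddress, List<String> fenced) {
        if (!create(LOCK, myAddress, session)) {
            return false; // someone else holds the lock
        }
        String previous = read(INFO);
        if (previous != null && !previous.equals(myAddress)) {
            fenced.add(previous); // a real controller would run its fencing method here
            nodes.remove(INFO);
        }
        nodes.put(INFO, new Node(myAddress, null));
        return true;
    }
}
```

In this model, when the first holder's session expires the lock disappears but the persistent node remains, so the next winner still learns which address it has to fence.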
[jira] [Commented] (HDFS-2185) HA: ZK-based FailoverController
[ https://issues.apache.org/jira/browse/HDFS-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173966#comment-13173966 ]

Uma Maheswara Rao G commented on HDFS-2185:
---

Hi Todd,

A small question before going through the proposal in detail. I think ZooKeeper already has built-in "leader election recipe" implementations ready, right? Are we going to reuse those implementations? It seems to me that we are trying to implement leader election again here.

A couple of JIRAs from ZooKeeper: ZOOKEEPER-1209, ZOOKEEPER-1095, ZOOKEEPER-1080

Thanks
Uma
[jira] [Commented] (HDFS-2185) HA: ZK-based FailoverController
[ https://issues.apache.org/jira/browse/HDFS-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13169713#comment-13169713 ]

Todd Lipcon commented on HDFS-2185:
---

BTW, should add that another goal is to implement a client failover solution which uses the {{activeNodeInfo}} information to locate the active NN. We can probably borrow some code from Dhruba's AvatarNode patch for this.
[jira] [Commented] (HDFS-2185) HA: ZK-based FailoverController
[ https://issues.apache.org/jira/browse/HDFS-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13169702#comment-13169702 ]

Todd Lipcon commented on HDFS-2185:
---

Here's a design sketch -- I have only done a little bit of implementation but nothing really fleshed out yet. So, it might change a bit during the course of implementation. But feedback on the general approach would be appreciated!

h3. Goals
- Ensure that only a single NN can be active at a time.
-- Use ZK as a lock manager to satisfy this requirement.
- Perform health monitoring of the active NN to trigger a fail-over should it become unhealthy.
- Automatically fail-over in the case that one of the hosts fails (eg power/network outage)
- Allow manual (administratively initiated) graceful failover
- Initiate fencing of the previously active NN in the case of non-graceful failovers.

h3. Overall design
The ZooKeeper FailoverController (ZKFC) is a separate process/program which runs next to each of the HA NameNodes in the cluster. It does not directly spawn/supervise the NN JVM process, but rather runs on the same machine and communicates with it via localhost RPC.

The ZKFC is designed to be as simple as possible to decrease likelihood of bugs which might trigger a false fail-over. It is also designed to use only a very small amount of memory, so that it will never have lengthy GC pauses. This allows us to set a fairly low time-out on the ZK session in order to detect machine failures quickly.

h3. Configuration
The ZKFC needs the following pieces of configuration:
- list of zookeeper servers making up the ZK quorum (fail to start if this is not provided)
- host/port for the HAServiceProtocol of the local NN (defaults to localhost:)
- "base znode" at which to root all of the znodes used by the process

h3. Nodes in ZK:
Everything should be rooted at the configured base znode. Within that, there should be a znode per nameservice ID. Within this {{/base/nameserviceId/}} directory, there are the following znodes:
- {{activeLock}} - an ephemeral node taken by the ZKFC before it asks its local NN to become active. This acts as a mutex on the active state and also as a failure detector.
- {{activeNodeInfo}} - a non-ephemeral node written by the ZKFC after it succeeds in taking {{activeLock}}. This should have data like the IPC address, HTTP address, etc of the NN.

The {{activeNodeInfo}} is non-ephemeral so that, when a new NN takes over from a failed one, it has enough information to fence the previous active in case it's still actually running.

h3. Runtime operation states
For simplicity of testing, we can model the ZKFC as a state machine.

h4. LOCAL_NOT_READY
The NN on the local host is down or not responding to RPCs. We start in this state.

h4. LOCAL_STANDBY
The NN on the local host is in standby mode and ready to automatically transition to active if the former active dies.

h4. LOCAL_ACTIVE
The NN on the local host is running and performing active duty.

h3. Inputs into state machine
Three other classes interact with the state machine:

h4. ZK Controller
A ZK thread connects to ZK and watches for the following events:
- The previously active master has lost its ephemeral node
- The ZK session is lost

h4. User-initiated failover controller
By some means (RPC/signal/HTTP/etc) the user can request that the active NN's FC gracefully turn over the active state to a different NN.

h4. Health monitor
A HealthMonitor thread heartbeats continuously to the local NN. It provides an event whenever the health state of the NN changes. For example:
- NN has become unhealthy
- Lost contact with NN
- NN is now healthy

h3. Behavior of state machine

h4. LOCAL_NOT_READY state
- System starts here
- When HealthMonitor indicates the local NN is healthy:
-- Transition to LOCAL_STANDBY mode

h4. LOCAL_STANDBY
- On health state change:
-- Transition to NOT_READY state if local NN goes down
- On ZK state change:
-- If the old ZK "active" node was deleted, try to initiate automatic failover
-- If our own ZK session died, reconnect to ZK

h4. Failover process:
- Try to create the "activeLock" ephemeral node in ZK
- If we are unsuccessful, return to LOCAL_STANDBY
- See if there is a "activeNodeInfo" node in ZK. If so:
-- The old NN may still be running (it didn't gracefully shut down).
-- Initiate fencing process.
-- If successful, delete the "activeNodeInfo" node in ZK.
- Create an "activeNodeInfo" with our own information (ie NN IPC address, etc)
- Send IPC to local NN to transitionToActive. If successful, go to LOCAL_ACTIVE

h4. LOCAL_ACTIVE
- On health state change to unhealthy:
-- delete our active lock znode, go to LOCAL_NOT_READY. Another node will fence us.
- On ZK connection loss or notice our znode got deleted:
-- another process is probably about to fence us... unless all nodes lost their connection, in which case we should "stay the co
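The three states and the transitions listed above can be modelled compactly. This is a minimal sketch, assuming a hypothetical `Event` enum and two hypothetical callbacks (`failover` and `releaseLock`) in place of the real ZK and RPC interactions; only the state names come from the design sketch. The archived text is cut off in the LOCAL_ACTIVE section, so ZK connection loss is deliberately left unhandled here.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical model of the ZKFC states; the callbacks stand in for ZK and NN calls. */
class ZkfcStateSketch {
    enum State { LOCAL_NOT_READY, LOCAL_STANDBY, LOCAL_ACTIVE }

    /** Illustrative event names; they are not taken from the actual patch. */
    enum Event { NN_HEALTHY, NN_UNHEALTHY, ACTIVE_NODE_DELETED }

    private State state = State.LOCAL_NOT_READY; // the system starts here
    private final BooleanSupplier failover;      // true if the failover process succeeded
    private final Runnable releaseLock;          // deletes our active lock znode

    ZkfcStateSketch(BooleanSupplier failover, Runnable releaseLock) {
        this.failover = failover;
        this.releaseLock = releaseLock;
    }

    State state() {
        return state;
    }

    /** Applies one event and returns the resulting state. */
    State handle(Event event) {
        switch (state) {
            case LOCAL_NOT_READY:
                if (event == Event.NN_HEALTHY) {
                    state = State.LOCAL_STANDBY;
                }
                break;
            case LOCAL_STANDBY:
                if (event == Event.NN_UNHEALTHY) {
                    state = State.LOCAL_NOT_READY;
                } else if (event == Event.ACTIVE_NODE_DELETED && failover.getAsBoolean()) {
                    state = State.LOCAL_ACTIVE;
                }
                break;
            case LOCAL_ACTIVE:
                if (event == Event.NN_UNHEALTHY) {
                    releaseLock.run(); // another node is expected to fence us
                    state = State.LOCAL_NOT_READY;
                }
                break;
        }
        return state;
    }
}
```

Keeping the transitions in one method with injected callbacks is one way to get the "simplicity of testing" the sketch mentions, since each transition can be exercised without a ZK quorum or a running NN.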
[jira] [Commented] (HDFS-2185) HA: ZK-based FailoverController
[ https://issues.apache.org/jira/browse/HDFS-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088746#comment-13088746 ]

Uma Maheswara Rao G commented on HDFS-2185:
---

Hi Eli,

Are you planning to post the design document for this? A small question:
{quote}
* Performs health monitoring (aka failure detection)
{quote}
Do we need separate monitoring logic here? As far as I know, ZK client watchers can deliver callbacks if we register them, right? Or are you planning something different here? It would be great if you could post the design doc.

-thanks
Uma
[jira] [Commented] (HDFS-2185) HA: ZK-based FailoverController
[ https://issues.apache.org/jira/browse/HDFS-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079700#comment-13079700 ]

Aaron T. Myers commented on HDFS-2185:
---

Hey Florian, I don't think anyone would disagree with you that Pacemaker already provides much of this functionality. The design document in HDFS-1623 discusses both using Pacemaker or a Hadoop-specific failover controller. The intention of HDFS-1623 is absolutely to provide the necessary hooks to be able to support either one.
[jira] [Commented] (HDFS-2185) HA: ZK-based FailoverController
[ https://issues.apache.org/jira/browse/HDFS-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079329#comment-13079329 ]

Florian Haas commented on HDFS-2185:
---

Allow me to comment that the [Pacemaker|http://www.clusterlabs.org] stack already fulfills all of the above.

* Initiates leader election (via ZK) when necessary
-- Pacemaker calls this a _Designated Coordinator_, which is elected automatically.
* Performs health monitoring (aka failure detection)
-- Pacemaker does this via the _monitor_ action of _resource agents_, which follow the Open Cluster Framework (OCF) standard
* Performs fail-over (standby to active and active to standby transitions)
-- Pacemaker does this automatically, including fencing, quorum and other vital concepts
* Heartbeats to ensure the liveness
-- Pacemaker does this over one of two cluster communication layers it supports, those being Heartbeat and Corosync.

Why reinvent the wheel?

> HA: ZK-based FailoverController
> ---
>
> Key: HDFS-2185
> URL: https://issues.apache.org/jira/browse/HDFS-2185
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Eli Collins
> Assignee: Eli Collins
>
> This jira is for a ZK-based FailoverController daemon. The FailoverController
> is a separate daemon from the NN that does the following:
> * Initiates leader election (via ZK) when necessary
> * Performs health monitoring (aka failure detection)
> * Performs fail-over (standby to active and active to standby transitions)
> * Heartbeats to ensure the liveness
> It should have the same/similar interface as the Linux HA RM to aid
> pluggability.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira