[ https://issues.apache.org/jira/browse/HADOOP-8163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13236213#comment-13236213 ]
Todd Lipcon commented on HADOOP-8163: ------------------------------------- ilePath. zkMostRecentFilePath is open to being misunderstood. Same for MOST_RECENT_FILENAME. Done bq. actually MostRecent seems to be a misnomer to me. I think it actually is LockOwnerInfo/LeaderInfo. zkLockOwnerInfoPath/tryDeleteLeaderInfo etc. It's not always the lock owner, though. Basically, we go through the following states: ||Time step||Lock node||MostRecentActive||Description|| |1|-|-|Startup| |2|Node A|-|Node A acquires active lock |3|Node A|Node A|..and writes its own info| |4|-|Node A|A loses its ZK lease| |5|Node B|Node A|Node B acquires active lock |6|Node B|-|Node B fences node A| |7|Node B|Node B|Node B writes its info| So, in steps 3 and 7, calling it "LeaderInfo" or "LockOwnerInfo" makes sense. But in steps 4 and 5, it's the "PreviousLeaderInfo". Perhaps just renaming to "LeaderBreadcrumb" or something makes more sense, since it's basically a bread crumb left around by the previous leader so that future leaders know its info. bq. why is ensureBaseNode() needed? In it we are creating a new set of znodes with the given zkAcls which may or may not be the correct thing. eg. if the admin simply forgot to create the appropriate znode path before starting the service it might be ok to fail. Instead of trying to create the path ourselves with permissions that may or may not be appropriate for the entire path. I would be wary of doing this. What is the use case? The use case is a "ZKFailoverController -formatZK" command line tool that I'm building into the ZKFC code. The thinking is that most administrators won't want to go into the ZK CLI to manually create the parent znode while installing HDFS. Instead, they'd rather just issue this simple command. In the case that they want to have varying permissions across the path, or some more complicated ACL, then they'll have to use the ZK CLI, but for the common case I think this will make deployment much simpler. bq. consider renaming baseNodeExists() to parentNodeExists() or renaming the parentZnodeName parameter in the constructor to baseNode for consistency. Perhaps this could be called in the constructor to check that the znode exists and be done with config issues. No need for ensureBaseNode() above. Renamed to parentZNodeExists and ensureParentZNode bq. this must be my newbie java skills but I find something like - prefixPath.append("/").append(pathParts[index]) or znodeWorkingDir.subString(0, znodeWorkingDir.nextIndexOf('/')) - more readable than prefixPath = Joiner.on("/").join(Arrays.asList(pathParts).subList(0, i)). It might also be more efficient but thats not relevant for this situation. Agreed, fixed. bq. public synchronized void quitElection(boolean needFence) - Dont we want to delete the permanent znode for standby's too? Why check if state is active. It anyways calls a tryDelete* method that should be harmless. If the node is standby, then the permanent znode refers to the current lockholder. So deleting it would incorrectly signify that whoever is active doesn't need to be fenced if it crashes. bq. tryDeleteMostRecentNode() - From my understanding of "tryFunction" - this function should be not really be asserting that some state holds. If it should assert then we should remove "try" from the name. The difference here is this: the assert() guards against programmer error. It is a mistake to call this function when you aren't active (see above comment). But if there is a ZK error (like the session got lost) it's OK to fail to delete it, since it just means that the node will get fenced. bq. in zkDoWithRetries there is a NUM_RETRIES field that could be used instead of 3. Fixed bq. why are we exposing "public synchronized ZooKeeper getZKClient()"? Removed bq. the following code seems to have issues... <snip>... While that is happening, the state of the world changes and this elector is not longer the lock owner. When appClient.fenceOldActive(data) will complete then the code will go ahead and delete the lockOwnerZnode at zkMostRecentFilePath. This node could be from the new leader who had successfully fenced and become active. The version number parameter might accidentally save us but would likely be 0 all the time. This scenario is impossible for the following reason: If the state of the world changed and this node was no longer active, the only possible reason for that is that the node lost its ZK session lease. If that's the case, then it won't be able to issue any further commands from that client (see my conversation with Hari above) bq. what happens if the leader lost the lock, tried to delete its znode, failed to do so, exited anyways, then became the next owner and found the existing mostrecent znode. I think it will try to fence itself I'm currently handling this situation at the ZKFailoverController level. Are you suggesting we determine "self" by comparing the actual bytewise data of the znode, and skip the fence call if it's the same as this elector instance's appData? Seems reasonable, just want to clarify. bq. It would be great to see the functional test with RealZK enhanced to cover some of the permanent znode use cases. I remember fixing some issues in my own code that were exposed to the reality of the real ZK Added a small change to the test to make sure the fencing code got exercised. I also plan to add some more integration-level tests using real ZK and the ZKFailoverController with that patch. > Improve ActiveStandbyElector to provide hooks for fencing old active > -------------------------------------------------------------------- > > Key: HADOOP-8163 > URL: https://issues.apache.org/jira/browse/HADOOP-8163 > Project: Hadoop Common > Issue Type: Improvement > Components: ha > Affects Versions: 0.23.3, 0.24.0 > Reporter: Todd Lipcon > Assignee: Todd Lipcon > Attachments: hadoop-8163.txt, hadoop-8163.txt, hadoop-8163.txt > > > When a new node becomes active in an HA setup, it may sometimes have to take > fencing actions against the node that was formerly active. This JIRA extends > the ActiveStandbyElector which adds an extra non-ephemeral node into the ZK > directory, which acts as a second copy of the active node's information. > Then, if the active loses its ZK session, the next active to be elected may > easily locate the unfenced node to take the appropriate actions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira