[jira] [Commented] (HADOOP-8163) Improve ActiveStandbyElector to provide hooks for fencing old active

Aaron T. Myers (Commented) (JIRA) Tue, 20 Mar 2012 14:02:00 -0700

    [ 
https://issues.apache.org/jira/browse/HADOOP-8163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233741#comment-13233741
 ]


Aaron T. Myers commented on HADOOP-8163:
----------------------------------------

Patch is looking pretty good, Todd. Here are some initial comments. I've so far 
only reviewed the non-test code.

# I find it odd that we use CamelCase for the lock file name, but 
dash-separated-words for the most recent file name.
# Not sure why we run the words "LOCKFILENAME" and "MOSTRECENT_FILE" together 
for constants. Underscores are cheap.
# Seems odd to have "Preconditions.checkState(zkClient != null);" inside the 
try/catch for KeeperException.
# Recommend adding something like "this method will create the znode and all 
parents if it does not exist" to the method comment of "ensureBaseZNode". 
# Seems like we should make operationSuccess, operationNodeExists, 
operationNodeDoesNotExist, and operationRetry static functions. I also question 
the usefulness of the names of these methods. Feel free to punt this to another 
JIRA.
# Seems like ensureBaseZNode can potentially return without error, but not have 
actually created the desired znode, if we retry 3 times but continue to get the 
retry error code returned from ZK.
# You left in a "TODO: should handle retries" - do you intend to address this 
in a separate JIRA?
# Now that we have a new method which creates a different ZNode, seems like we 
should rename the createNode() method to make it clear that it only creates the 
lock file ZNode. Likewise monitorNode() should probably either be parameterized 
or renamed to be more specific.
# Seems a little odd to throw an AssertionError in the case of unexpected app 
data in the ZNode we're trying to delete, though I don't have a good suggestion 
for a better exception. Feel free to ignore this comment.
# I think it's reasonable to change the "Checking for any old active..." 
message to be moved from debug level to info, particularly since the "Old node 
exists..." message is info level.
# You left in a "// TODO: think carefully about this error handling." Will this 
be addressed in a subsequent JIRA?
                
> Improve ActiveStandbyElector to provide hooks for fencing old active
> --------------------------------------------------------------------
>
>                 Key: HADOOP-8163
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8163
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: ha
>    Affects Versions: 0.24.0, 0.23.3
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>         Attachments: hadoop-8163.txt
>
>
> When a new node becomes active in an HA setup, it may sometimes have to take 
> fencing actions against the node that was formerly active. This JIRA extends 
> the ActiveStandbyElector which adds an extra non-ephemeral node into the ZK 
> directory, which acts as a second copy of the active node's information. 
> Then, if the active loses its ZK session, the next active to be elected may 
> easily locate the unfenced node to take the appropriate actions.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-8163) Improve ActiveStandbyElector to provide hooks for fencing old active

Reply via email to