[ https://issues.apache.org/jira/browse/HADOOP-8279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265958#comment-13265958 ]
Aaron T. Myers commented on HADOOP-8279:
----------------------------------------

Patch looks pretty good to me, Todd. A few little comments:

# "-forceFence doesn't seem to have any real use cases with auto-HA so it isn't implemented." - I don't follow the reasoning. Seems like it should be just as applicable to auto-HA as manual, no?
# "If the attempt to transition to standby succeeds, then the ZKFC will delete the breadcrumb node in ZooKeeper" - might want to specify which ZKFC will do the deletion.
# "If the node is healthy and not active, it sends an RPC to the current active, asking it to yield from the election." - it actually sends an RPC to the ZKFC associated with the current active.
# "if the current active does not respond to the graceful request, throws an exception indicating the reason for failure." - I recommend you make it explicit which graceful request this is referring to. In fact, if the active NN fails to respond to the graceful request to transition to standby, it will be fenced. It's the failure of the active ZKFC to respond to the cedeActive calls that results in a failure of gracefulFailover.
# I think you need interface annotations on ZKFCRpcServer, or perhaps it can be made package-private?
# In ZKFCProtocol#cedeActive you declare the parameter to be in millis, but in the ZKFCRpcServer#cedeActive implementation, you say the period is in seconds.
# I don't see much point in having both ZKFCRpcServer#stop and ZKFCRpcServer#join. Why not just call this.server.join in ZKFCRpcServer#stop?
# "periodically check health state since, because entering an" - doesn't quite parse.
# I think the log message about the timeout elapsing in ZKFailoverController#waitForActiveAttempt should probably be at least at WARN level instead of INFO.
# "It's possible that it's in standby but just about to go into active, no? Is there some race here?" - should this comment now be removed?
# I recommend you change the value of DFS_HA_ZKFC_PORT_DEFAULT to something other than 8021. I've seen a lot of JTs in the wild with their default port set to 8021.
# The design in the document posted to HDFS-2185 mentions introducing "-to" and "-from" parameters to the `haadmin -failover' command, but this implementation doesn't do that. That seems fine by me, but I'm curious why you chose to do it this way.

> Auto-HA: Allow manual failover to be invoked from zkfc.
> -------------------------------------------------------
>
>                 Key: HADOOP-8279
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8279
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: auto-failover, ha
>    Affects Versions: Auto Failover (HDFS-3042)
>            Reporter: Mingjie Lai
>            Assignee: Todd Lipcon
>             Fix For: Auto Failover (HDFS-3042)
>
>         Attachments: hadoop-8279.txt, hadoop-8279.txt, hadoop-8279.txt, hadoop-8279.txt
>
>
> HADOOP-8247 introduces a configure flag to prevent potential status inconsistency between zkfc and namenode, by making auto and manual failover mutually exclusive.
> However, as described in 2.7.2 section of design doc at HDFS-2185, we should allow manual and auto failover co-exist, by:
> - adding some rpc interfaces at zkfc
> - manual failover shall be triggered by haadmin, and handled by zkfc if auto failover is enabled.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira