[ 
https://issues.apache.org/jira/browse/HADOOP-8279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265958#comment-13265958
 ] 

Aaron T. Myers commented on HADOOP-8279:
----------------------------------------

Patch looks pretty good to me, Todd. A few little comments:

# "-forceFence doesn't seem to have any real use cases with auto-HA so it isn't 
implemented." - I don't follow the reasoning. Seems like it should be just as 
applicable to auto-HA as manual, no?
# "If the attempt to transition to standby succeeds, then the ZKFC will delete 
the breadcrumb node in ZooKeeper" - might want to specify which ZKFC will do 
the deletion.
# "If the node is healthy and not active, it sends an RPC to the current 
active, asking it to yield from the election." - it actually sends an RPC to 
the ZKFC associated with the current active.
# "if the current active does not respond to the graceful request, throws an 
exception indicating the reason for failure." - I recommend you make it 
explicit which graceful request this is referring to. In fact, if the active NN 
fails to respond to the graceful request to transition to standby, it will be 
fenced. It's the failure of the active ZKFC to respond to the cedeActive calls 
that results in a failure of gracefulFailover.
# I think you need interface annotations on ZKFCRpcServer, or perhaps it can be 
made package-private?
# In ZKFCProtocol#cedeActive you declare the parameter to be in millis, but in 
the ZKFCRpcServer#cedeActive implementation, you say the period is in seconds.
# I don't see much point in having both ZKFCRpcServer#stop and 
ZKFCRpcServer#join. Why not just call this.server.join in ZKFCRpcServer#stop?
# "periodically check health state since, because entering an" - doesn't quite 
parse.
# I think the log message about the timeout elapsing in 
ZKFailoverController#waitForActiveAttempt should probably be at least at WARN 
level instead of INFO.
# "It's possible that it's in standby but just about to go into active, no? Is 
there some race here?" - should this comment now be removed?
# I recommend you change the value of DFS_HA_ZKFC_PORT_DEFAULT to something 
other than 8021. I've seen a lot of JTs in the wild with their default port set 
to 8021.
# The design in the document posted to HDFS-2185 mentions introducing "-to" and 
"-from" parameters to the "haadmin -failover" command, but this implementation 
doesn't do that. That seems fine by me, but I'm curious why you chose to do it 
this way.
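
To illustrate the units and stop/join comments above, here is a minimal sketch. It is not code from the patch: SketchRpcServer, its worker thread, and isCeding are hypothetical stand-ins for ZKFCRpcServer and its internals. The idea is only that the unit lives in the parameter name, so the interface and implementation cannot disagree, and that stop() also waits for shutdown, so no separate join() is needed.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for illustration; not the actual ZKFCRpcServer. */
public class SketchRpcServer {
    private final Thread worker;
    private volatile boolean running = true;
    // Starts at "now", so the server is not ceding until cedeActive is called.
    private volatile long cededUntilNanos = System.nanoTime();

    public SketchRpcServer() {
        // Placeholder for the real RPC handler loop (assumption).
        worker = new Thread(() -> {
            while (running) {
                try {
                    Thread.sleep(10);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }, "sketch-rpc-server");
    }

    public void start() {
        worker.start();
    }

    /** The unit is part of the parameter name and is converted in one place. */
    public void cedeActive(int millisToCede) {
        cededUntilNanos = System.nanoTime()
            + TimeUnit.MILLISECONDS.toNanos(millisToCede);
    }

    public boolean isCeding() {
        return System.nanoTime() - cededUntilNanos < 0;
    }

    /** Single shutdown entry point: signal the worker, then wait for it. */
    public void stop() {
        running = false;
        worker.interrupt();
        try {
            worker.join();
        } catch (InterruptedException e) {
            // Preserve the caller's interrupt status instead of swallowing it.
            Thread.currentThread().interrupt();
        }
    }

    public boolean isAlive() {
        return worker.isAlive();
    }
}
```

With this shape a caller only needs start() and stop(); once stop() returns, the worker thread has exited.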
                
> Auto-HA: Allow manual failover to be invoked from zkfc.
> -------------------------------------------------------
>
>                 Key: HADOOP-8279
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8279
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: auto-failover, ha
>    Affects Versions: Auto Failover (HDFS-3042)
>            Reporter: Mingjie Lai
>            Assignee: Todd Lipcon
>             Fix For: Auto Failover (HDFS-3042)
>
>         Attachments: hadoop-8279.txt, hadoop-8279.txt, hadoop-8279.txt, 
> hadoop-8279.txt
>
>
> HADOOP-8247 introduces a configuration flag to prevent potential status 
> inconsistency between zkfc and namenode, by making auto and manual failover 
> mutually exclusive.
> However, as described in section 2.7.2 of the design doc at HDFS-2185, we 
> should allow manual and auto failover to co-exist, by:
> - adding some rpc interfaces at zkfc
> - manual failover shall be triggered by haadmin, and handled by zkfc if auto 
> failover is enabled. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
