[ https://issues.apache.org/jira/browse/HADOOP-8247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13250294#comment-13250294 ]
Todd Lipcon commented on HADOOP-8247:
-------------------------------------

I also ran the manual tests again. Here's the usage output of HAAdmin:

{code}
Usage: DFSHAAdmin [-ns <nameserviceId>]
    [-transitionToActive [--forcemanual] <serviceId>]
    [-transitionToStandby [--forcemanual] <serviceId>]
    [-failover [--forcefence] [--forceactive] [--forcemanual] <serviceId> <serviceId>]
    [-getServiceState <serviceId>]
    [-checkHealth <serviceId>]
    [-help <command>]

--forceManual allows the manual failover commands to be used even when
automatic failover is enabled. This flag is DANGEROUS and should only be
used with expert guidance.
{code}

Here's what happens if I try to use a state-change command with auto-HA enabled:

{code}
$ ./bin/hdfs haadmin -transitionToActive nn1
Automatic failover is enabled for NameNode at todd-w510/127.0.0.1:8021
Refusing to manually manage HA state, since it may cause a split-brain
scenario or other incorrect state. If you are very sure you know what
you are doing, please specify the forcemanual flag.
$ echo $?
255
{code}

I also checked the other two state-changing ops ({{-transitionToStandby}} and {{-failover}}); both yielded the same error message.

- I verified that {{-getServiceState}} and {{-checkHealth}} continue to work.
- I verified that the {{-forcemanual}} flag works:
{code}
$ ./bin/hdfs haadmin -transitionToStandby -forcemanual nn1
12/04/09 16:12:38 WARN ha.HAAdmin: Proceeding with manual HA state management
even though automatic failover is enabled for NameNode at todd-w510/127.0.0.1:8021
{code}
(also for {{-transitionToActive}} and {{-failover}})
- Verified that {{start-dfs.sh}} starts the ZKFCs on both of my configured NNs when auto-HA is enabled, and that {{stop-dfs.sh}} stops them. Discovered trivial bug HDFS-3234 here.

----

Next, I modified my config to set the auto-failover flag to false.

- Verified that {{start-dfs.sh}} doesn't try to start ZKFCs.
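For reference, the auto-failover flag being toggled in these tests is {{dfs.ha.automatic-failover.enabled}}. A minimal {{hdfs-site.xml}} sketch; the nameservice id is taken from the ZKFC log in these tests, and the suffixed per-nameservice form is how I read the "scoped by nameservice" requirement, so treat the exact spelling as an assumption:

{code}
<!-- hdfs-site.xml (sketch): enable automatic failover globally ... -->
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>

<!-- ... or scoped to a single nameservice (assumed suffix form) -->
<property>
  <name>dfs.ha.automatic-failover.enabled.nameserviceId1</name>
  <value>true</value>
</property>
{code}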
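Since the refusal path exits with status 255 while success exits 0, wrapper scripts can branch on the return code. A minimal POSIX sh sketch; {{haadmin_cmd}} is a hypothetical stand-in for {{./bin/hdfs haadmin}} so the control flow can be shown without a running cluster:

```shell
#!/bin/sh
# Sketch only: haadmin_cmd mimics the exit codes observed above
# (255 when auto-HA blocks a manual state change, 0 on success).
# It is NOT the real tool; it just lets the branching logic run.
haadmin_cmd() {
  case "$*" in
    *forcemanual*) return 0 ;;   # operator explicitly overrode the guard
    *) echo "Refusing to manually manage HA state" >&2; return 255 ;;
  esac
}

# Wrap a manual transition and surface the auto-HA refusal distinctly.
manual_transition_to_active() {
  haadmin_cmd -transitionToActive "$1"
  rc=$?
  if [ "$rc" -eq 255 ]; then
    echo "auto-HA enabled for $1; rerun with -forcemanual if certain" >&2
  fi
  return "$rc"
}

manual_transition_to_active nn1 || echo "exit=$?"   # -> exit=255
```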
- Verified that if I try to start a ZKFC, it bails:
{code}
12/04/09 16:19:12 INFO tools.DFSZKFailoverController: Failover controller configured for NameNode nameserviceId1.nn2
12/04/09 16:19:12 FATAL ha.ZKFailoverController: Automatic failover is not enabled for NameNode at todd-w510/127.0.0.1:8022.
Please ensure that automatic failover is enabled in the configuration before running the ZK failover controller.
{code}
- Verified that the haadmin commands all function without the {{-forcemanual}} flag specified.

> Auto-HA: add a config to enable auto-HA, which disables manual FC
> -----------------------------------------------------------------
>
>                  Key: HADOOP-8247
>                  URL: https://issues.apache.org/jira/browse/HADOOP-8247
>              Project: Hadoop Common
>           Issue Type: Improvement
>           Components: auto-failover, ha
>     Affects Versions: Auto Failover (HDFS-3042)
>             Reporter: Todd Lipcon
>             Assignee: Todd Lipcon
>          Attachments: hadoop-8247.txt, hadoop-8247.txt, hadoop-8247.txt, hadoop-8247.txt
>
> Currently, if automatic failover is set up and running, and the user uses the "haadmin -failover" command, he or she can end up putting the system in an inconsistent state, where the state in ZK disagrees with the actual state of the world. To fix this, we should add a config flag which is used to enable auto-HA. When this flag is set, we should disallow use of the haadmin command to initiate failovers. We should refuse to run ZKFCs when the flag is not set. Of course, this flag should be scoped by nameservice.