[ https://issues.apache.org/jira/browse/HDFS-2949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208168#comment-13208168 ]
Todd Lipcon commented on HDFS-2949:
-----------------------------------

I think we should probably un-document the transitionTo* commands, but leave them as a safety valve. It's nice to have direct access to these RPCs just in case there's some problem with one of the safer methods and you need a workaround without recompiling the client.

That said, having the safety check described in this JIRA is still valuable, even when using haadmin -failover, in case the admin has a messed-up configuration in some way (e.g., the fencing script returns true but did not in fact fence the standby correctly).

> HA: Add check to active state transition to prevent operator-induced split brain
> --------------------------------------------------------------------------------
>
>                 Key: HDFS-2949
>                 URL: https://issues.apache.org/jira/browse/HDFS-2949
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ha, name-node
>    Affects Versions: HA branch (HDFS-1623)
>            Reporter: Todd Lipcon
>
> Currently, if the administrator mistakenly calls "-transitionToActive" on one
> NN while the other one is still active, all hell will break loose. We can add
> a simple check by having the NN make a getServiceState() RPC to its peer with
> a short (~1 second?) timeout. If the RPC succeeds and indicates the other
> node is active, it should refuse to enter active mode. If the RPC fails or
> indicates standby, it can proceed.
> This is just meant as a preventative safety check - we still expect users to
> use the "-failover" command which has other checks plus fencing built in.
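
For illustration, the check proposed in the issue description could be sketched roughly as follows. This is not the actual patch or any real Hadoop class: ActiveTransitionCheck, ServiceState, and maybeSafeToBecomeActive are hypothetical names, and the peer's getServiceState() RPC is stood in for by a plain Callable so that the sketch stays self-contained.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch; none of these types are the real Hadoop HA classes.
final class ActiveTransitionCheck {

    /** Simplified stand-in for the state a peer NN reports. */
    enum ServiceState { ACTIVE, STANDBY }

    /**
     * Probes the peer with a bounded wait before a transition to active.
     * Returns false only when the peer answers in time and reports ACTIVE.
     * Returns true when the peer reports STANDBY, or when the probe fails
     * or times out, as the issue description proposes.
     */
    static boolean maybeSafeToBecomeActive(Callable<ServiceState> peerProbe,
                                           long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "peer-state-probe");
            t.setDaemon(true); // a hung probe must not keep the JVM alive
            return t;
        });
        try {
            Future<ServiceState> reply = executor.submit(peerProbe);
            return reply.get(timeoutMillis, TimeUnit.MILLISECONDS) != ServiceState.ACTIVE;
        } catch (TimeoutException | ExecutionException e) {
            // The RPC failed or did not answer in time: proceed.
            return true;
        } catch (InterruptedException e) {
            // Assumption: an interrupted wait is treated like a failed probe.
            Thread.currentThread().interrupt();
            return true;
        } finally {
            executor.shutdownNow(); // cancel a probe that is still running
        }
    }
}
```

Because a failed or timed-out probe lets the transition proceed, a check of this shape cannot detect an active peer that is merely unreachable. That matches the description's framing of it as a preventative check rather than a replacement for "-failover" and its fencing.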