[ https://issues.apache.org/jira/browse/HADOOP-9608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13670966#comment-13670966 ]
Bikas Saha commented on HADOOP-9608:
------------------------------------

Do you mean because NNA's ZKFC does not have C's address or there is a fixed mapping of the ith NN to a server and that mapping is stale?
The proposed solution is to make every standby ZKFC restart when it discovers an active leader that cannot be connected to? This would mean all standby NN would be rebooting in the above scenario when C becomes master, right?

> ZKFC should abort if it sees an unrecognized NN become active
> -------------------------------------------------------------
>
>                 Key: HADOOP-9608
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9608
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ha
>    Affects Versions: 3.0.0
>            Reporter: Todd Lipcon
>
> We recently had an issue where one NameNode and ZKFC was updated to a new
> configuration/IP address but the ZKFC on the other node was not rebooted.
> Then, next time a failover occurred, the second ZKFC was not able to become
> active because the data in the ActiveBreadCrumb didn't match the data in its
> own configuration:
> {code}
> org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of
> election
> java.lang.IllegalArgumentException: Unable to determine service address for
> namenode 'XXXX'
> {code}
> To prevent this from happening, whenever the ZKFC sees a new NN become
> active, it should check that it's properly able to instantiate a
> ServiceTarget for it, and if not, abort (since this ZKFC wouldn't be able to
> handle a failover successfully)
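
For illustration only, the check proposed in the issue description might look roughly like the sketch below. It is not the actual ZKFC code: the class ActiveNodeCheck, its methods, and the plain map standing in for the controller's local configuration are all hypothetical. Only the error message text is taken from the log in the description. The sketch assumes that "able to instantiate a ServiceTarget" can be approximated by "able to resolve an address for the active NN's id from local configuration".

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not Hadoop code: models a failover controller that
// verifies it can resolve whichever NameNode has just become active.
class ActiveNodeCheck {
    // Assumption: local configuration reduced to a map of NN id -> address.
    private final Map<String, String> configuredAddresses;

    ActiveNodeCheck(Map<String, String> configuredAddresses) {
        this.configuredAddresses = new HashMap<>(configuredAddresses);
    }

    // Stand-in for building a service target; fails the same way as the
    // log excerpt in the issue description when the id is unknown.
    String resolveServiceAddress(String namenodeId) {
        String address = configuredAddresses.get(namenodeId);
        if (address == null) {
            throw new IllegalArgumentException(
                "Unable to determine service address for namenode '" + namenodeId + "'");
        }
        return address;
    }

    // True if this controller could handle a failover away from the given node.
    boolean canHandleActive(String activeNamenodeId) {
        try {
            resolveServiceAddress(activeNamenodeId);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    // Called when a new active node is observed. "Abort" is modeled here as
    // an unchecked exception; a real daemon would more likely exit.
    void onActiveNodeSeen(String activeNamenodeId) {
        if (!canHandleActive(activeNamenodeId)) {
            throw new IllegalStateException(
                "Aborting: active namenode '" + activeNamenodeId
                    + "' is not present in local configuration");
        }
    }
}
```

With a controller configured only for "nn1", onActiveNodeSeen("nn1") returns normally, while onActiveNodeSeen("nn2") throws. Failing early in this way surfaces the stale configuration when the unrecognized node becomes active, rather than at the next failover. Whether every standby should then restart, as asked in the comment above, is a separate question that this sketch does not settle.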