[ https://issues.apache.org/jira/browse/HBASE-6122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287235#comment-13287235 ]
Hudson commented on HBASE-6122:
-------------------------------

Integrated in HBase-0.92-security #109 (See [https://builds.apache.org/job/HBase-0.92-security/109/])
    HBASE-6122 Backup master does not become Active master after ZK exception (Ram) (Revision 1344799)
    HBASE-6122 Backup master does not become Active master after ZK exception: REVERT (Revision 1344466)
    HBASE-6122 Backup master does not become Active master after ZK exception (Ram) (Revision 1344350)

     Result = SUCCESS
ramkrishna :
Files :
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/master/TestMasterZKSessionRecovery.java

stack :
Files :
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java

ramkrishna :
Files :
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java


> Backup master does not become Active master after ZK exception
> --------------------------------------------------------------
>
>                 Key: HBASE-6122
>                 URL: https://issues.apache.org/jira/browse/HBASE-6122
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.94.0
>            Reporter: ramkrishna.s.vasudevan
>            Assignee: ramkrishna.s.vasudevan
>             Fix For: 0.92.2, 0.94.1
>
>         Attachments: HBASE-6122.patch, HBASE-6122_0.92.patch, HBASE-6122_0.94.patch, HBASE-6122_0.94.patch
>
>
> -> Active master gets ZK expiry exception.
> -> Backup master becomes active.
> -> The previous active master retries and becomes the backup master.
> Now when the new active master goes down and the current backup master comes up, it goes down again with the ZK expiry exception it got in the first step.
> {code}
>     if (abortNow(msg, t)) {
>       if (t != null) LOG.fatal(msg, t);
>       else LOG.fatal(msg);
>       this.abort = true;
>       stop("Aborting");
>     }
> {code}
> In ActiveMasterManager.blockUntilBecomingActiveMaster we wait until the backup master becomes active.
> {code}
>     synchronized (this.clusterHasActiveMaster) {
>       while (this.clusterHasActiveMaster.get() && !this.master.isStopped()) {
>         try {
>           this.clusterHasActiveMaster.wait();
>         } catch (InterruptedException e) {
>           // We expect to be interrupted when a master dies, will fall out if so
>           LOG.debug("Interrupted waiting for master to die", e);
>         }
>       }
>       if (!clusterStatusTracker.isClusterUp()) {
>         this.master.stop("Cluster went down before this master became active");
>       }
>       if (this.master.isStopped()) {
>         return cleanSetOfActiveMaster;
>       }
>       // Try to become active master again now that there is no active master
>       blockUntilBecomingActiveMaster(startupStatus,clusterStatusTracker);
>     }
>     return cleanSetOfActiveMaster;
> {code}
> When the backup master (which is in backup mode because it got the ZK exception) once again tries to become active, we do not use the return value of the recursive call:
> {code}
>       // Try to become active master again now that there is no active master
>       blockUntilBecomingActiveMaster(startupStatus,clusterStatusTracker);
> {code}
> Instead we return 'cleanSetOfActiveMaster', which was previously set to false.
> Because of this, instead of becoming active again, the backup master goes down in the abort() code. Thanks to my colleague Gopi for reporting this issue.
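
The effect of dropping the recursive call's return value can be shown with a small standalone sketch. This is a simplified, hypothetical model and not HBase code: the class ActiveMasterRetrySketch, its methods, and the queue of attempt outcomes are invented for illustration, and the waiting on ZooKeeper is left out. Returning the recursive result is one plausible way to address the problem; the committed patch may differ in detail.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical, simplified model of the retry logic described above.
 * It is NOT the HBase ActiveMasterManager; all names here are assumptions.
 */
class ActiveMasterRetrySketch {

  /** Outcome of each successive attempt to become the active master. */
  private final Deque<Boolean> attempts = new ArrayDeque<>();

  ActiveMasterRetrySketch(boolean... outcomes) {
    for (boolean outcome : outcomes) {
      attempts.add(outcome);
    }
  }

  /** Buggy shape: the result of the recursive retry is discarded. */
  boolean becomeActiveBuggy() {
    boolean cleanSetOfActiveMaster = attempts.remove();
    if (cleanSetOfActiveMaster) {
      return true;
    }
    // The real code waits here until the active master goes away, then retries.
    becomeActiveBuggy();            // return value dropped
    return cleanSetOfActiveMaster;  // still the stale 'false' from this attempt
  }

  /** Corrected shape: the outcome of the retry is propagated to the caller. */
  boolean becomeActiveFixed() {
    boolean cleanSetOfActiveMaster = attempts.remove();
    if (cleanSetOfActiveMaster) {
      return true;
    }
    return becomeActiveFixed();
  }
}
```

With outcomes (false, true), meaning the first attempt loses and the retry wins, becomeActiveBuggy() still reports false to its caller, which matches the path that ends in abort() in the report, while becomeActiveFixed() reports true.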