[ https://issues.apache.org/jira/browse/HBASE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13278839#comment-13278839 ]
ramkrishna.s.vasudevan commented on HBASE-6046:
-----------------------------------------------

The problem is that when the master retries to recover from the ZK session expiry exception and succeeds, the entire master is essentially recreated:

{code}
try {
  if (!becomeActiveMaster(status)) {
    return Boolean.FALSE;
  }
  initializeZKBasedSystemTrackers();
  // Update in-memory structures to reflect our earlier Root/Meta assignment.
  assignRootAndMeta(status);
  // process RIT if any
  // TODO: Why does this not call AssignmentManager.joinCluster? Otherwise
  // we are not processing dead servers if any.
  assignmentManager.processDeadServersAndRegionsInTransition();
{code}

Here initializeZKBasedSystemTrackers() even creates a new AssignmentManager, so whatever processDeadServersAndRegionsInTransition() does is effectively a fresh start. Inside processDeadServersAndRegionsInTransition():

{code}
for (Map.Entry<HRegionInfo, ServerName> e: this.regions.entrySet()) {
  if (!e.getKey().isMetaTable() && e.getValue() != null) {
    LOG.debug("Found " + e + " out on cluster");
    this.failover = true;
    break;
  }
}
{code}

Even though all the RSs are online, 'this.regions' is empty, so this.failover is never set and we go with a completely new assignment.

> Master retry on ZK session expiry causes inconsistent region assignments.
> -------------------------------------------------------------------------
>
>                 Key: HBASE-6046
>                 URL: https://issues.apache.org/jira/browse/HBASE-6046
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.92.1, 0.94.0
>            Reporter: Gopinathan A
>            Assignee: ramkrishna.s.vasudevan
>             Fix For: 0.92.2, 0.94.1
>
>
> 1> A ZK session timeout in the HMaster leads to bulk assignment even though
> all the RSs are online.
> 2> If, while the bulk assignment is in progress, the master goes down again
> and restarts (or a backup master comes up), all the nodes created in ZK are
> reassigned to the new RSs. This leads to double assignment.
> We had 2800 regions; 1900 of them were double assigned, taking the
> region count to 4700.
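To make the failover check concrete, here is a toy Java model under stated assumptions: ToyAssignmentManager, RegionKey and isFailover() are hypothetical names, not HBase classes, and the model ignores META scanning, ZK nodes and regions in transition. It only mirrors the quoted loop to show why a freshly constructed manager with an empty region map takes the fresh-assignment path even while the region servers still host their regions.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical, heavily simplified model of the failover check quoted in the
 * comment. None of these types are HBase classes; regions and servers are
 * plain strings.
 */
public class FailoverCheckSketch {

    /** Hypothetical stand-in for HRegionInfo: a region name plus a meta flag. */
    record RegionKey(String name, boolean isMetaTable) {}

    /** Hypothetical stand-in for the AssignmentManager's region -> server map. */
    static final class ToyAssignmentManager {
        final Map<RegionKey, String> regions = new HashMap<>();

        /** Mirrors the quoted loop: any non-meta region with a known server means failover. */
        boolean isFailover() {
            for (Map.Entry<RegionKey, String> e : regions.entrySet()) {
                if (!e.getKey().isMetaTable() && e.getValue() != null) {
                    return true;
                }
            }
            return false;
        }
    }

    public static void main(String[] args) {
        // Before the session expiry: the master knows where a user region lives.
        ToyAssignmentManager before = new ToyAssignmentManager();
        before.regions.put(new RegionKey(".META.,,1", true), "rs1");
        before.regions.put(new RegionKey("usertable,,1", false), "rs2");
        System.out.println("failover before expiry: " + before.isFailover()); // true

        // After recovery a brand-new manager starts with an empty map,
        // even though rs1 and rs2 are (in this model) still serving regions.
        ToyAssignmentManager after = new ToyAssignmentManager();
        System.out.println("failover after expiry: " + after.isFailover()); // false
    }
}
```

In the real master, the false result on the recreated AssignmentManager is what routes startup into a completely new assignment instead of adopting the existing one.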