[ https://issues.apache.org/jira/browse/HBASE-5829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13263372#comment-13263372 ]
Hudson commented on HBASE-5829:
-------------------------------

Integrated in HBase-TRUNK-security #186 (See [https://builds.apache.org/job/HBase-TRUNK-security/186/])
    HBASE-5829 Inconsistency between the "regions" map and the "servers" map in AssignmentManager (Revision 1330993)

     Result = SUCCESS
stack :
Files :
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java

> Inconsistency between the "regions" map and the "servers" map in AssignmentManager
> ----------------------------------------------------------------------------------
>
>                 Key: HBASE-5829
>                 URL: https://issues.apache.org/jira/browse/HBASE-5829
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.6, 0.92.1
>            Reporter: Maryann Xue
>            Assignee: Maryann Xue
>             Fix For: 0.96.0
>
>         Attachments: HBASE-5829-0.90.patch, HBASE-5829-trunk.patch
>
>
> There are occurrences in AM where this.servers is not kept consistent with
> this.regions. This might cause balancer to offline a region from the RS that
> already returned NotServingRegionException at a previous offline attempt.
>
> In AssignmentManager.unassign(HRegionInfo, boolean)
>     try {
>       // TODO: We should consider making this look more like it does for the
>       // region open where we catch all throwables and never abort
>       if (serverManager.sendRegionClose(server, state.getRegion(),
>           versionOfClosingNode)) {
>         LOG.debug("Sent CLOSE to " + server + " for region " +
>           region.getRegionNameAsString());
>         return;
>       }
>       // This never happens. Currently regionserver close always return true.
>       LOG.warn("Server " + server + " region CLOSE RPC returned false for " +
>         region.getRegionNameAsString());
>     } catch (NotServingRegionException nsre) {
>       LOG.info("Server " + server + " returned " + nsre + " for " +
>         region.getRegionNameAsString());
>       // Presume that master has stale data. Presume remote side just split.
>       // Presume that the split message when it comes in will fix up the master's
>       // in memory cluster state.
>     } catch (Throwable t) {
>       if (t instanceof RemoteException) {
>         t = ((RemoteException)t).unwrapRemoteException();
>         if (t instanceof NotServingRegionException) {
>           if (checkIfRegionBelongsToDisabling(region)) {
>             // Remove from the regionsinTransition map
>             LOG.info("While trying to recover the table "
>                 + region.getTableNameAsString()
>                 + " to DISABLED state the region " + region
>                 + " was offlined but the table was in DISABLING state");
>             synchronized (this.regionsInTransition) {
>               this.regionsInTransition.remove(region.getEncodedName());
>             }
>             // Remove from the regionsMap
>             synchronized (this.regions) {
>               this.regions.remove(region);
>             }
>             deleteClosingOrClosedNode(region);
>           }
>         }
>       }
>       // RS is already processing this region, only need to update the timestamp
>       if (t instanceof RegionAlreadyInTransitionException) {
>         LOG.debug("update " + state + " the timestamp.");
>         state.update(state.getState());
>       }
>     }
>
> In AssignmentManager.assign(HRegionInfo, RegionState, boolean, boolean, boolean)
>     synchronized (this.regions) {
>       this.regions.put(plan.getRegionInfo(), plan.getDestination());
>     }
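
The snippets quoted above update this.regions without touching this.servers. As an illustration only, here is a minimal sketch of one way to keep a region-to-server map and its server-to-regions inverse in step: every change goes through a single synchronized method that updates both maps. The class RegionServerIndex and its methods are hypothetical and use plain strings in place of HRegionInfo and ServerName. It is not the code from the attached patches, which may solve the problem differently.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for the two AssignmentManager maps; all names are illustrative. */
final class RegionServerIndex {
    // region name -> hosting server (plays the role of "regions")
    private final Map<String, String> regions = new HashMap<>();
    // server name -> regions it hosts (plays the role of "servers")
    private final Map<String, Set<String>> servers = new HashMap<>();

    /** Record an assignment, moving the region off any previous server. */
    synchronized void assign(String region, String server) {
        String previous = regions.put(region, server);
        if (previous != null && !previous.equals(server)) {
            removeFromServer(previous, region);
        }
        servers.computeIfAbsent(server, s -> new HashSet<>()).add(region);
    }

    /** Drop a region from both maps, e.g. after a server reports it is not serving it. */
    synchronized void offline(String region) {
        String server = regions.remove(region);
        if (server != null) {
            removeFromServer(server, region);
        }
    }

    synchronized String serverOf(String region) {
        return regions.get(region);
    }

    /** Returns a copy so callers cannot mutate the internal set. */
    synchronized Set<String> regionsOn(String server) {
        Set<String> hosted = servers.get(server);
        return hosted == null ? new HashSet<>() : new HashSet<>(hosted);
    }

    /** True when every entry in one map has a matching entry in the other. */
    synchronized boolean isConsistent() {
        for (Map.Entry<String, String> e : regions.entrySet()) {
            Set<String> hosted = servers.get(e.getValue());
            if (hosted == null || !hosted.contains(e.getKey())) {
                return false;
            }
        }
        for (Map.Entry<String, Set<String>> e : servers.entrySet()) {
            for (String region : e.getValue()) {
                if (!e.getKey().equals(regions.get(region))) {
                    return false;
                }
            }
        }
        return true;
    }

    // Caller must hold the lock on this object.
    private void removeFromServer(String server, String region) {
        Set<String> hosted = servers.get(server);
        if (hosted != null) {
            hosted.remove(region);
        }
    }
}
```

With this shape, a caller that handles a NotServingRegionException would call offline(region) once, and a later balancer pass reading regionsOn(server) would no longer see the region on that server.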