[ https://issues.apache.org/jira/browse/HBASE-4899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160437#comment-13160437 ]
stack commented on HBASE-4899: ------------------------------ Good find. Here are some comments on the patch. {code} + if(rit!=null&&!rit.isClosing() && !rit.isPendingClose()){ {code} Our style is to put spaces between operators as in 'if (rit != null &&...' etc. + LOG.debug("Skip assgining region " The above should be assigning. + + rit.getRegion().getRegionNameAsString() + + " because in RIT region state: " + rit.getState()); I think just to rit.toString()? + continue; This is confusing. Is this code inside a loop and this continue takes us to the head of the loop again? Its buried pretty deeply here so I'd not be surprised if readers missed it. Shouldn't you rather write the code as if/else? + if (addressFromAM != null && !addressFromAM.equals(this.serverName)) { + LOG.debug("Skip assgining region " Again, a misspelling. I can fix the above small things on commit or maybe you want to have a go at it? This is a pretty nice find. Thanks for the patch. > Region would be assigned twice easily with continually killing server and > moving region in testing environment > --------------------------------------------------------------------------------------------------------------- > > Key: HBASE-4899 > URL: https://issues.apache.org/jira/browse/HBASE-4899 > Project: HBase > Issue Type: Bug > Affects Versions: 0.92.1 > Reporter: chunhui shen > Assignee: chunhui shen > Attachments: hbase-4899.patch, hbase-4899v2.patch > > > Before assigning region in ServerShutdownHandler#process, it will check > whether region is in RIT, > however, this checking doesn't work as the excepted in the following case: > 1.move region A from server B to server C > 2.kill server B > 3.start server B immediately > Let's see what happen in the code for the above case > {code} > for step1: > 1.1 server B close the region A, > 1.2 master setOffline for region > A,(AssignmentManager#setOffline:this.regions.remove(regionInfo)) > 1.3 server C start to open region A.(Not completed) > for step3: > master ServerShutdownHandler#process() for server B > { > .. > splitlog() > ... > List<RegionState> regionsInTransition = > this.services.getAssignmentManager() > .processServerShutdown(this.serverName); > ... > Skip regions that were in transition unless CLOSING or PENDING_CLOSE > ... > assign region > } > {code} > In fact, when running > ServerShutdownHandler#process()#this.services.getAssignmentManager().processServerShutdown(this.serverName), > region A is in RIT (step1.3 not completed), but the return List<RegionState> > regionsInTransition doesn't contain it, because region A has removed from > AssignmentManager.regions by AssignmentManager#setOffline in step 1.2 > Therefore, region A will be assigned twice. > Actually, one server killed and started twice will also easily cause region > assigned twice. > Exclude the above reason, another probability : > when execute ServerShutdownHandler#process()#MetaReader.getServerUserRegions > ,region is included which is in RIT now. > But after completing MetaReader.getServerUserRegions, the region has been > opened in other server and is not in RIT now. > In our testing environment where balancing,moving and killing are executed > periodly, assigning region twice often happens, and it is hateful because it > will affect other test cases. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira