[ https://issues.apache.org/jira/browse/HBASE-6438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13456494#comment-13456494 ]
Ted Yu commented on HBASE-6438:
-------------------------------

I ran the test suite:
{code}
[INFO] HBase - Server .................................... FAILURE [45:18.213s]
[INFO] HBase - Hadoop Two Compatibility .................. SKIPPED
[INFO] HBase - Integration Tests ......................... SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 45:27.603s
{code}
and got one test failure:
{code}
queueFailover(org.apache.hadoop.hbase.replication.TestReplication): test timed out after 300000 milliseconds
{code}
I don't think the above is related to Rajesh's patch.

+1 from me.

> RegionAlreadyInTransitionException needs to give more info to avoid assignment inconsistencies
> ----------------------------------------------------------------------------------------------
>
>                 Key: HBASE-6438
>                 URL: https://issues.apache.org/jira/browse/HBASE-6438
>             Project: HBase
>          Issue Type: Bug
>            Reporter: ramkrishna.s.vasudevan
>            Assignee: rajeshbabu
>             Fix For: 0.96.0, 0.92.3, 0.94.3
>
>         Attachments: 6438-trunk_2.patch, HBASE-6438_2.patch, HBASE-6438_94_3.patch, HBASE-6438_94_4.patch, HBASE-6438_94.patch, HBASE-6438-trunk_2.patch, HBASE-6438_trunk.patch
>
>
> In some of the recent region assignment issues, RegionAlreadyInTransitionException is one cause after which the region assignment may or may not happen (in the sense that we need to wait for the TM to assign).
> In HBASE-6317 we hit one problem due to RegionAlreadyInTransitionException on master restart.
> Consider the following case: for some reason, such as a master restart or an external assign call, we try to assign a region that is already being opened on an RS.
> The second assign call has already changed the state of the znode, so the assign that is in progress on the RS is affected and fails.
> The second assignment that was started also fails with an RAITE. In the end, neither assignment proceeds. The idea is to find out whether such an RAITE can be retried or not.
> Here again there are several cases, such as:
> -> The znode is yet to be transitioned from OFFLINE to OPENING on the RS.
> -> The RS may be in the openRegion step.
> -> The RS may be trying to transition OPENING to OPENED.
> -> The RS is yet to add the region to its online regions.
> In openRegion() and updateMeta(), any failure moves the znode to FAILED_OPEN, so getting an RAITE in these cases should be OK. But in the other cases the assignment is stopped.
> The idea is to just add the current state of the region assignment to the RIT map on the RS side; using that info, we can determine whether the assignment can be retried on getting an RAITE.
> Considering the current work going on in AM, please do share whether this is needed at least in the 0.92/0.94 versions.
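
The proposal in the description (record the current step of the open in the RS-side RIT map, and let the caller decide on retry from that) could look roughly like the sketch below. This is not taken from any of the attached patches. Every name in it (RitStateSketch, OpenStep, RaiteSketch, requestOpen, shouldRetryAssign) is hypothetical, and the mapping from step to retry decision is only one reading of the description: steps whose failures already move the znode to FAILED_OPEN are treated as safe to leave alone, and the remaining steps as needing a retry.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch only: none of these names come from HBase or from the attached patches.
class RitStateSketch {

    // Hypothetical steps of a region open on the RS, following the cases in the description.
    // The flag records whether a failure in that step moves the znode to FAILED_OPEN.
    enum OpenStep {
        OFFLINE_TO_OPENING(false),
        OPEN_REGION(true),
        UPDATE_META(true),
        OPENING_TO_OPENED(false),
        ADD_TO_ONLINE_REGIONS(false);

        final boolean failureMovesZnodeToFailedOpen;

        OpenStep(boolean failureMovesZnodeToFailedOpen) {
            this.failureMovesZnodeToFailedOpen = failureMovesZnodeToFailedOpen;
        }
    }

    // Stand-in for RegionAlreadyInTransitionException that also carries the step.
    // Unchecked here only to keep the sketch short.
    static class RaiteSketch extends RuntimeException {
        final OpenStep step;

        RaiteSketch(String regionName, OpenStep step) {
            super("Region " + regionName + " is already in transition at step " + step);
            this.step = step;
        }
    }

    // RS-side RIT map: region name -> step that the in-progress open has reached.
    private final Map<String, OpenStep> regionsInTransition = new ConcurrentHashMap<>();

    // Called when an open request arrives; rejects a second request and reports the step.
    void requestOpen(String regionName) {
        OpenStep existing = regionsInTransition.putIfAbsent(regionName, OpenStep.OFFLINE_TO_OPENING);
        if (existing != null) {
            throw new RaiteSketch(regionName, existing);
        }
    }

    // Records progress of an open that is already tracked; does nothing for unknown regions.
    void advance(String regionName, OpenStep step) {
        regionsInTransition.replace(regionName, step);
    }

    // Removes the region once the open has finished or failed.
    void finish(String regionName) {
        regionsInTransition.remove(regionName);
    }

    // Caller-side decision (assumed mapping): if a failure in the reported step already
    // ends in FAILED_OPEN, existing handling covers it; otherwise the assign is retried.
    static boolean shouldRetryAssign(RaiteSketch e) {
        return !e.step.failureMovesZnodeToFailedOpen;
    }

    public static void main(String[] args) {
        RitStateSketch rs = new RitStateSketch();
        rs.requestOpen("region-a");
        rs.advance("region-a", OpenStep.OPEN_REGION);
        try {
            rs.requestOpen("region-a"); // second assign while the first is in progress
        } catch (RaiteSketch e) {
            System.out.println(e.getMessage() + ", retry=" + shouldRetryAssign(e));
        }
    }
}
```

Keeping the step in the same map that already tracks regions in transition means the duplicate-request check and the state lookup happen in one atomic putIfAbsent call, so the exception always reports the step of the open that actually won the race.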