[ 
https://issues.apache.org/jira/browse/HBASE-6438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13456494#comment-13456494
 ] 

Ted Yu commented on HBASE-6438:
-------------------------------

I ran the test suite:
{code}
[INFO] HBase - Server .................................... FAILURE [45:18.213s]
[INFO] HBase - Hadoop Two Compatibility .................. SKIPPED
[INFO] HBase - Integration Tests ......................... SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 45:27.603s
{code}
 and got one test failure:
{code}
  queueFailover(org.apache.hadoop.hbase.replication.TestReplication): test timed out after 300000 milliseconds
{code}
I don't think the TestReplication timeout above is related to Rajesh's patch.

+1 from me.
                
> RegionAlreadyInTransitionException needs to give more info to avoid 
> assignment inconsistencies
> ----------------------------------------------------------------------------------------------
>
>                 Key: HBASE-6438
>                 URL: https://issues.apache.org/jira/browse/HBASE-6438
>             Project: HBase
>          Issue Type: Bug
>            Reporter: ramkrishna.s.vasudevan
>            Assignee: rajeshbabu
>             Fix For: 0.96.0, 0.92.3, 0.94.3
>
>         Attachments: 6438-trunk_2.patch, HBASE-6438_2.patch, 
> HBASE-6438_94_3.patch, HBASE-6438_94_4.patch, HBASE-6438_94.patch, 
> HBASE-6438-trunk_2.patch, HBASE-6438_trunk.patch
>
>
> Looking at some of the recent issues in region assignment, 
> RegionAlreadyInTransitionException (RAITE) is one cause after which the 
> region assignment may or may not happen (in the sense that we need to wait 
> for the TM to assign).
> In HBASE-6317 we hit one such problem, caused by 
> RegionAlreadyInTransitionException on master restart.
> Consider the following case: for some reason, such as a master restart or 
> an external assign call, we try to assign a region that is already being 
> opened on an RS.
> The second call to assign has already changed the state of the znode, so 
> the assignment in progress on the RS is affected and fails.  The second 
> assignment also fails, with an RAITE.  In the end, neither assignment 
> proceeds.  The idea is to determine whether such an RAITE can be retried 
> or not.
> Here again we have the following cases:
> -> The znode is yet to be transitioned from OFFLINE to OPENING on the RS.
> -> The RS may be in the openRegion step.
> -> The RS may be trying to transition the znode from OPENING to OPENED.
> -> The RS is yet to add the region to its online regions.
> On any failure in openRegion() or updateMeta() we move the znode to 
> FAILED_OPEN, so getting an RAITE in these cases should be OK.  But in the 
> other cases the assignment is stopped.
> The idea is simply to record the current state of the region assignment in 
> the RIT map on the RS side; using that info we can determine whether the 
> assignment can be retried on getting an RAITE.
> Considering the current work going on in AM, please share whether this is 
> needed at least in the 0.92/0.94 versions.
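
The idea in the quoted description (track the current open stage per region in the RS-side RIT map, then use it to decide whether an RAITE is retriable) could be sketched roughly as below. This is only an illustration under assumptions, not the code in the attached patches: the class, the enum, its stage names, and the methods are all hypothetical, and the retry rule is one reading of the description (a failure in openRegion()/updateMeta() is surfaced through FAILED_OPEN, so only the other stages would need a retry).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; none of these names come from HBase or from the patch.
class RitStateSketch {

    /** Assumed stages of a region open on the RS, mirroring the cases in the description. */
    enum OpenStage {
        OFFLINE_TO_OPENING,     // znode not yet moved from OFFLINE to OPENING
        OPEN_REGION,            // inside openRegion()
        UPDATE_META,            // inside updateMeta()
        OPENING_TO_OPENED,      // moving the znode from OPENING to OPENED
        ADD_TO_ONLINE_REGIONS   // region not yet added to the online regions
    }

    // Assumed shape of the RS-side regions-in-transition map: region name -> current stage.
    private final Map<String, OpenStage> regionsInTransition = new ConcurrentHashMap<>();

    /** Records the stage the open has reached for the given region. */
    void recordStage(String region, OpenStage stage) {
        regionsInTransition.put(region, stage);
    }

    /** Removes the region once the open has completed or been abandoned. */
    void finished(String region) {
        regionsInTransition.remove(region);
    }

    /** Returns the recorded stage, or null if the region is not in transition. */
    OpenStage currentStage(String region) {
        return regionsInTransition.get(region);
    }

    /** Stages whose failure moves the znode to FAILED_OPEN, per the description. */
    static boolean failureReportedViaZnode(OpenStage stage) {
        return stage == OpenStage.OPEN_REGION || stage == OpenStage.UPDATE_META;
    }

    /**
     * Assumed rule: retry the assign on an RAITE only when the stage is one
     * where a failure would otherwise leave the assignment stopped.
     */
    static boolean shouldRetryAssign(OpenStage stage) {
        return stage != null && !failureReportedViaZnode(stage);
    }

    public static void main(String[] args) {
        RitStateSketch rit = new RitStateSketch();
        rit.recordStage("region-a", OpenStage.OPENING_TO_OPENED);
        rit.recordStage("region-b", OpenStage.UPDATE_META);
        System.out.println("region-a retry: " + shouldRetryAssign(rit.currentStage("region-a")));
        System.out.println("region-b retry: " + shouldRetryAssign(rit.currentStage("region-b")));
    }
}
```

In a real change, the stage would presumably travel with the exception back to the master, which would apply a rule like shouldRetryAssign() instead of waiting for the TM; how the patch actually carries that information is not shown here.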

