[jira] [Commented] (HBASE-6587) Region would be assigned twice in the case of all RS offline

ramkrishna.s.vasudevan (JIRA) Wed, 15 Aug 2012 22:39:42 -0700

    [ https://issues.apache.org/jira/browse/HBASE-6587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13435772#comment-13435772 ]


ramkrishna.s.vasudevan commented on HBASE-6587:
-----------------------------------------------

@Chunhui
The intention of your solution is valid, but I have a few questions just to clarify the scenario.
When the assignment first started:
{code}
2012-08-14 20:44:31,640 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan was found (or we are ignoring an existing plan) for writetest,VHXYHJN0BL48HMR4DI1L,1344925649429.277b9b6df6de2b9be1353b4fa25f4222. so generated a random one; hri=writetest,VHXYHJN0BL48HMR4DI1L,1344925649429.277b9b6df6de2b9be1353b4fa25f4222., src=, dest=dw92.kgb.sqa.cm4,60020,1344948267642; 1 (online=1, available=1) available servers
{code}
There was at least one server, right?
Then the timeout monitor thread saw that no region server was online, so the flag allRegionServersOffline would have been set to true.
In that case the previous assignment has already failed, since there is no RS. I am sure I am missing something here. Can you tell me how the double assignment happened?
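A quick way to check the timing in the master log quoted below. This is a sketch under two assumptions: the log prints local time at UTC+8, and the startcode in dw92.kgb.sqa.cm4,60020,1344948267642 is the RS start time in epoch milliseconds. With that offset, the .META. line at 20:42:54,367 maps exactly to its own ts=1344948174367, which supports the first assumption. The class name `RitTimeline` and the helper `logMillis` are hypothetical.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;

public class RitTimeline {
    // Assumption: the master log prints local time at UTC+8.
    static final ZoneOffset LOG_OFFSET = ZoneOffset.ofHours(8);

    // Converts a log time such as "2012-08-14 20:44:32,518" to epoch milliseconds.
    static long logMillis(String logTime) {
        String iso = logTime.replace(' ', 'T').replace(',', '.');
        return LocalDateTime.parse(iso).toInstant(LOG_OFFSET).toEpochMilli();
    }

    public static void main(String[] args) {
        // Sanity check: the .META. line logs ts=1344948174367 at 20:42:54,367.
        System.out.println("offset check: "
                + (logMillis("2012-08-14 20:42:54,367") == 1344948174367L));

        long rsStart = 1344948267642L;      // startcode of dw92 (assumed to be start time in ms)
        long assigned = logMillis("2012-08-14 20:44:31,643"); // "Assigning region ... to dw92"
        long openingStamp = 1344948272279L; // ts of the OPENING state
        long timedOut = logMillis("2012-08-14 20:44:32,518"); // "Regions in transition timed out"

        System.out.println("RS uptime at assignment: " + (assigned - rsStart) + " ms");
        System.out.println("OPENING age at timeout: " + (timedOut - openingStamp) + " ms");
    }
}
```

Under these assumptions the RS had been up for only about 4 s (4001 ms) when the region was assigned, and the region had been OPENING for only 239 ms when the TimeoutMonitor declared it timed out. For any realistic timeout, that cannot come from the `getStamp() + timeout <= now` branch, so it points at the `allRegionServersOffline && !noRSAvailable` branch.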
                
> Region would be assigned twice in the case of all RS offline
> ------------------------------------------------------------
>
>                 Key: HBASE-6587
>                 URL: https://issues.apache.org/jira/browse/HBASE-6587
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.94.1
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>             Fix For: 0.96.0
>
>         Attachments: 6587.patch, HBASE-6587.patch
>
>
> In the TimeoutMonitor, we act on timeout for regions in transition if
> (this.allRegionServersOffline && !noRSAvailable), even when their timeout has not elapsed.
> The code is as follows:
> {code}
>  if (regionState.getStamp() + timeout <= now ||
>           (this.allRegionServersOffline && !noRSAvailable)) {
>           // decide on action upon timeout or, if some RSs just came back
>           // online, we can start the assignment
>           actOnTimeOut(regionState);
>         }
> {code}
> But we found a bug: it would act on timeout for a region that had just been assigned,
> causing the region to be assigned twice.
> Master log for the region 277b9b6df6de2b9be1353b4fa25f4222:
> {code}
> 2012-08-14 20:42:54,367 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Unable to determine a plan to assign .META.,,1.1028785192 state=OFFLINE, ts=1344948174367, server=null
> 2012-08-14 20:44:31,640 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan was found (or we are ignoring an existing plan) for writetest,VHXYHJN0BL48HMR4DI1L,1344925649429.277b9b6df6de2b9be1353b4fa25f4222. so generated a random one; hri=writetest,VHXYHJN0BL48HMR4DI1L,1344925649429.277b9b6df6de2b9be1353b4fa25f4222., src=, dest=dw92.kgb.sqa.cm4,60020,1344948267642; 1 (online=1, available=1) available servers
> 2012-08-14 20:44:31,640 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x438f53bbf9b0acd Creating (or updating) unassigned node for 277b9b6df6de2b9be1353b4fa25f4222 with OFFLINE state
> 2012-08-14 20:44:31,643 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region writetest,VHXYHJN0BL48HMR4DI1L,1344925649429.277b9b6df6de2b9be1353b4fa25f4222. to dw92.kgb.sqa.cm4,60020,1344948267642
> 2012-08-14 20:44:32,291 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=dw92.kgb.sqa.cm4,60020,1344948267642, region=277b9b6df6de2b9be1353b4fa25f4222
> // abnormal timeout
> 2012-08-14 20:44:32,518 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: writetest,VHXYHJN0BL48HMR4DI1L,1344925649429.277b9b6df6de2b9be1353b4fa25f4222. state=OPENING, ts=1344948272279, server=dw92.kgb.sqa.cm4,60020,1344948267642
> 2012-08-14 20:44:32,518 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPENING for too long, reassigning region=writetest,VHXYHJN0BL48HMR4DI1L,1344925649429.277b9b6df6de2b9be1353b4fa25f4222.
> {code}
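To make the race concrete, here is a minimal model of the quoted condition, fed with the numbers from this log. It rests on several assumptions. First, allRegionServersOffline is taken to be still true from the earlier period when no RS was available (the .META. "Unable to determine a plan" line). Second, the timeout is assumed to be 30 minutes. Third, "now" is the 20:44:32,518 log time read as UTC+8. Apart from the quoted condition, every name here is hypothetical. In particular, `actsOnTimeoutGuarded` is only one possible guard (skip regions whose planned destination is already an online server), not necessarily what the attached patches do.

```java
import java.util.Map;
import java.util.Set;

public class TimeoutCheckSketch {
    // The condition quoted above: a timed-out RIT, OR "all RSs were offline and
    // some RS is available again". The second clause ignores the region's age.
    static boolean actsOnTimeout(long stamp, long now, long timeout,
                                 boolean allRegionServersOffline, boolean noRSAvailable) {
        return stamp + timeout <= now || (allRegionServersOffline && !noRSAvailable);
    }

    // Hypothetical guard: in the "servers came back" case, only act on regions
    // with no plan, or whose planned destination is not an online server.
    static boolean actsOnTimeoutGuarded(String encodedName, long stamp, long now, long timeout,
                                        boolean allRegionServersOffline, boolean noRSAvailable,
                                        Map<String, String> plans, Set<String> onlineServers) {
        if (stamp + timeout <= now) {
            return true;
        }
        if (allRegionServersOffline && !noRSAvailable) {
            String dest = plans.get(encodedName);
            return dest == null || !onlineServers.contains(dest);
        }
        return false;
    }

    public static void main(String[] args) {
        String region = "277b9b6df6de2b9be1353b4fa25f4222";
        String server = "dw92.kgb.sqa.cm4,60020,1344948267642";
        long stamp = 1344948272279L;    // OPENING ts from the log
        long now = 1344948272518L;      // 20:44:32,518 read as UTC+8 (assumed)
        long timeout = 30 * 60 * 1000L; // assumed timeout value

        Map<String, String> plans = Map.of(region, server);
        Set<String> online = Set.of(server);

        // Stale flag = true and one RS available: acts on a region only 239 ms old.
        System.out.println("original: " + actsOnTimeout(stamp, now, timeout, true, false));
        System.out.println("guarded:  " + actsOnTimeoutGuarded(region, stamp, now, timeout,
                true, false, plans, online));
    }
}
```

The model shows that the second clause does not depend on how long the region has been in transition. As long as the flag stays set, a region that entered OPENING on a freshly started RS is treated as timed out and reassigned.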
