[ 
https://issues.apache.org/jira/browse/HBASE-9387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13754871#comment-13754871
 ] 

Jimmy Xiang commented on HBASE-9387:
------------------------------------

Agree. In this case, can we check again if the node still exists?  If it is 
still there, that means we failed to transition the node, we need to close the 
region.  If it is not there, rs can abort, just to be safe?
                
> Region could get lost during assignment
> ---------------------------------------
>
>                 Key: HBASE-9387
>                 URL: https://issues.apache.org/jira/browse/HBASE-9387
>             Project: HBase
>          Issue Type: Bug
>          Components: Region Assignment
>    Affects Versions: 0.95.2
>            Reporter: Ted Yu
>            Assignee: Ted Yu
>         Attachments: 9387-v1.txt, hbase-9387.patch, 
> org.apache.hadoop.hbase.TestFullLogReconstruction-output.txt
>
>
> I observed test timeout running against hadoop 2.1.0 with distributed log 
> replay turned on.
> Looks like region state for 1588230740 became inconsistent between master and 
> the surviving region server:
> {code}
> 2013-08-29 22:15:34,180 INFO  [AM.ZK.Worker-pool2-t4] 
> master.RegionStates(299): Onlined 1588230740 on 
> kiyo.gq1.ygridcore.net,57016,1377814510039
> ...
> 2013-08-29 22:15:34,587 DEBUG [Thread-221] 
> client.HConnectionManager$HConnectionImplementation(1269): locateRegionInMeta 
> parentTable=hbase:meta, metaLocation={region=hbase:meta,,1.1588230740, 
> hostname=kiyo.gq1.ygridcore.net,57016,1377814510039, seqNum=0}, attempt=2 of 
> 35 failed; retrying after sleep of 302 because: 
> org.apache.hadoop.hbase.exceptions.RegionOpeningException: Region is being 
> opened: 1588230740
>         at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2574)
>         at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3949)
>         at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:2733)
>         at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:26965)
>         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2063)
>         at 
> org.apache.hadoop.hbase.ipc.RpcServer$CallRunner.run(RpcServer.java:1800)
>         at 
> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:165)
>         at 
> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:41)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to