[ https://issues.apache.org/jira/browse/HBASE-9387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13754837#comment-13754837 ]

Jimmy Xiang commented on HBASE-9387:
------------------------------------

Most likely your ZK/network was in a bad state at that moment. The RS setData
was retried after the master had already received the update and deleted it,
so the RS setData retry failed. The RS ZKUtil.setData retry took so long that
there must also be some thread scheduling issue. This issue should apply to
every scenario in which setData is retried. The root cause is that the first
setData succeeds, the other party sees the data and acts on it, and the setData
retry then fails because the data has since been changed by the other party.
The setData caller is therefore misled into treating a successful write as a
failure.
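The race described above can be modeled in a few lines. The following is a minimal Java sketch, not HBase or ZooKeeper code. The class SetDataRetryDemo, its LostReplyException (a stand-in for a connection loss that happens after the server has already applied the write), and the znode path are all hypothetical. The only behavior borrowed from ZooKeeper is that setData can be conditioned on an expected version.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of a versioned znode store. It is NOT the ZooKeeper
// client or HBase's ZKUtil; it only mimics a version-checked setData.
public class SetDataRetryDemo {

    static class VersionMismatchException extends Exception {
        VersionMismatchException(String msg) { super(msg); }
    }

    // Hypothetical stand-in for a connection loss after the server applied the write.
    static class LostReplyException extends Exception {
        LostReplyException(String msg) { super(msg); }
    }

    private final Map<String, String> data = new HashMap<>();
    private final Map<String, Integer> versions = new HashMap<>();
    private boolean dropNextReply = false;

    void create(String path, String value) {
        data.put(path, value);
        versions.put(path, 0);
    }

    // Applies the write only if the node exists and is at expectedVersion.
    void setData(String path, String value, int expectedVersion)
            throws VersionMismatchException, LostReplyException {
        Integer current = versions.get(path);
        if (current == null || current != expectedVersion) {
            throw new VersionMismatchException(path + ": not at version " + expectedVersion);
        }
        data.put(path, value);
        versions.put(path, current + 1);
        if (dropNextReply) {             // write is applied, but the caller never hears back
            dropNextReply = false;
            throw new LostReplyException(path + ": write applied, reply lost");
        }
    }

    String getData(String path) { return data.get(path); }

    void delete(String path) {
        data.remove(path);
        versions.remove(path);
    }

    // What the writer concluded vs. what the other party actually observed.
    public static final class Outcome {
        public final boolean writerThinksItSucceeded;
        public final String otherPartySaw;
        Outcome(boolean ok, String seen) {
            writerThinksItSucceeded = ok;
            otherPartySaw = seen;
        }
    }

    public static Outcome runScenario() {
        SetDataRetryDemo zk = new SetDataRetryDemo();
        String path = "/hypothetical/region-transition";
        zk.create(path, "OPENING");
        zk.dropNextReply = true;         // simulate the bad ZK/network moment

        String seen = null;
        boolean ok;
        try {
            zk.setData(path, "OPENED", 0);
            ok = true;
        } catch (LostReplyException lost) {
            // The retry is delayed (e.g. thread scheduling); meanwhile the other
            // party reads the new data, acts on it, and deletes the node.
            seen = zk.getData(path);
            zk.delete(path);
            try {
                zk.setData(path, "OPENED", 0);   // naive retry, same expected version
                ok = true;
            } catch (VersionMismatchException | LostReplyException retryFailed) {
                ok = false;              // reported as a failure, yet the first write landed
            }
        } catch (VersionMismatchException e) {
            ok = false;
        }
        return new Outcome(ok, seen);
    }

    public static void main(String[] args) {
        Outcome o = runScenario();
        System.out.println("writer thinks it succeeded: " + o.writerThinksItSucceeded);
        System.out.println("other party saw: " + o.otherPartySaw);
    }
}
```

Running main prints `writer thinks it succeeded: false` and `other party saw: OPENED`. In this model the writer's view and the actual state disagree. This suggests that a failed retry after a lost reply should be treated as ambiguous (for example, by re-reading the node) rather than as a definite failure. That conclusion is an inference from the comment and does not describe the patch attached to this issue.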
                
> Region could get lost during assignment
> ---------------------------------------
>
>                 Key: HBASE-9387
>                 URL: https://issues.apache.org/jira/browse/HBASE-9387
>             Project: HBase
>          Issue Type: Bug
>          Components: Region Assignment
>    Affects Versions: 0.95.2
>            Reporter: Ted Yu
>            Assignee: Ted Yu
>         Attachments: 9387-v1.txt, hbase-9387.patch, org.apache.hadoop.hbase.TestFullLogReconstruction-output.txt
>
>
> I observed a test timeout when running against Hadoop 2.1.0 with distributed
> log replay turned on.
> It looks like the region state for 1588230740 became inconsistent between the
> master and the surviving region server:
> {code}
> 2013-08-29 22:15:34,180 INFO  [AM.ZK.Worker-pool2-t4] master.RegionStates(299): Onlined 1588230740 on kiyo.gq1.ygridcore.net,57016,1377814510039
> ...
> 2013-08-29 22:15:34,587 DEBUG [Thread-221] client.HConnectionManager$HConnectionImplementation(1269): locateRegionInMeta parentTable=hbase:meta, metaLocation={region=hbase:meta,,1.1588230740, hostname=kiyo.gq1.ygridcore.net,57016,1377814510039, seqNum=0}, attempt=2 of 35 failed; retrying after sleep of 302 because: org.apache.hadoop.hbase.exceptions.RegionOpeningException: Region is being opened: 1588230740
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2574)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3949)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:2733)
>         at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:26965)
>         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2063)
>         at org.apache.hadoop.hbase.ipc.RpcServer$CallRunner.run(RpcServer.java:1800)
>         at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:165)
>         at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:41)
> {code}

--
This message is automatically generated by JIRA.
