[jira] [Commented] (HBASE-9387) Region could get lost during assignment

stack (JIRA) Fri, 30 Aug 2013 14:37:48 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-9387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13755153#comment-13755153
 ]


stack commented on HBASE-9387:
------------------------------

You should probably set the stopped state too if you call abort in your 
MockRegionServer, since abort always calls stop.
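
A minimal sketch of that suggestion, under stated assumptions: the class name 
MockServerSketch and its fields are hypothetical and are not taken from the 
patch's MockRegionServer. The method shapes loosely follow HBase's Abortable 
and Stoppable interfaces. The point is only that abort delegates to stop, so 
both flags end up set.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for a mock region server; not the class from the patch.
class MockServerSketch {
    private final AtomicBoolean aborted = new AtomicBoolean(false);
    private final AtomicBoolean stopped = new AtomicBoolean(false);

    // Same shape as Abortable.abort(String, Throwable).
    public void abort(String why, Throwable cause) {
        aborted.set(true);
        // Abort implies stop, so keep the stopped flag consistent with it.
        stop("Aborting: " + why);
    }

    // Same shape as Stoppable.stop(String).
    public void stop(String why) {
        stopped.set(true);
    }

    public boolean isAborted() {
        return aborted.get();
    }

    public boolean isStopped() {
        return stopped.get();
    }
}
```

With this shape, a test that checks isStopped() after an abort sees the same 
state it would see on a real server that stops as part of aborting.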

This change is now gratuitous, right?

@@ -438,9 +439,10 @@
           EventType.M_ZK_REGION_OFFLINE,
           EventType.RS_ZK_REGION_FAILED_OPEN,
           versionOfOfflineNode) == -1) {
-        LOG.warn("Unable to mark region " + hri + " as FAILED_OPEN. " +
+        String warnMsg = "Unable to mark region " + hri + " as FAILED_OPEN. " +
             "It's likely that the master already timed out this open " +
-            "attempt, and thus another RS already has the region.");
+            "attempt, and thus another RS already has the region.";
+        LOG.warn(warnMsg);
       } else {
         result = true;
       }

On the test change, how do I know it replicates what we saw here?  I started to 
dig but it was taking too long.  I would expect a comment explaining why we 
expect the RS to abort, and another explaining why yanking the znode is not the 
same as the master removing it on a successful open.

                
> Region could get lost during assignment
> ---------------------------------------
>
>                 Key: HBASE-9387
>                 URL: https://issues.apache.org/jira/browse/HBASE-9387
>             Project: HBase
>          Issue Type: Bug
>          Components: Region Assignment
>    Affects Versions: 0.95.2
>            Reporter: Ted Yu
>            Assignee: Ted Yu
>            Priority: Critical
>         Attachments: 9387-v1.txt, 9387-v3.txt, 9387-v4.2.txt, 9387-v4.3.txt, 
> 9387-v4.txt, 9387-v5.txt, hbase-9387.patch, 
> org.apache.hadoop.hbase.TestFullLogReconstruction-output.txt
>
>
> I observed test timeout running against hadoop 2.1.0 with distributed log 
> replay turned on.
> Looks like region state for 1588230740 became inconsistent between master and 
> the surviving region server:
> {code}
> 2013-08-29 22:15:34,180 INFO  [AM.ZK.Worker-pool2-t4] 
> master.RegionStates(299): Onlined 1588230740 on 
> kiyo.gq1.ygridcore.net,57016,1377814510039
> ...
> 2013-08-29 22:15:34,587 DEBUG [Thread-221] 
> client.HConnectionManager$HConnectionImplementation(1269): locateRegionInMeta 
> parentTable=hbase:meta, metaLocation={region=hbase:meta,,1.1588230740, 
> hostname=kiyo.gq1.ygridcore.net,57016,1377814510039, seqNum=0}, attempt=2 of 
> 35 failed; retrying after sleep of 302 because: 
> org.apache.hadoop.hbase.exceptions.RegionOpeningException: Region is being 
> opened: 1588230740
>         at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2574)
>         at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3949)
>         at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:2733)
>         at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:26965)
>         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2063)
>         at 
> org.apache.hadoop.hbase.ipc.RpcServer$CallRunner.run(RpcServer.java:1800)
>         at 
> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:165)
>         at 
> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:41)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
