[jira] [Comment Edited] (HBASE-22017) Master Fails to become active due to the data race bug in region server

Allan Yang (JIRA) Tue, 12 Mar 2019 22:45:06 -0700


    [ https://issues.apache.org/jira/browse/HBASE-22017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16791330#comment-16791330 ]


Allan Yang edited comment on HBASE-22017 at 3/13/19 5:44 AM:
-------------------------------------------------------------

I think this would be better:
{code:java}
       // where processing of request takes > lease expiration time.
       lease = regionServer.leases.removeLease(scannerName);
     } catch (LeaseException e) {
-      throw new ServiceException(e);
+      IOException ioE = e;
+      // The lease may have been closed because the RS is shutting down
+      try {
+        checkOpen();
+      } catch (IOException ioexception) {
+        ioE = ioexception;
+      }
+      throw new ServiceException(ioE);
{code}
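With this patch, the RPC handler reports why the lease is gone. If checkOpen() fails because the region server is shutting down, the handler wraps that exception instead of the bare LeaseException. The self-contained Java sketch below reproduces only that substitution logic. LeaseErrorMapping, ServerStoppingException and the simplified LeaseException/ServiceException are hypothetical stand-ins, not HBase's real types. The `stopping` flag is an assumed simplification of the region server's actual state checks.

```java
import java.io.IOException;

// Hypothetical stand-ins for the types used in the patch; not the HBase API.
class LeaseErrorMapping {

  /** Stand-in for the region server's LeaseException (an IOException subtype). */
  static class LeaseException extends IOException {
    LeaseException(String msg) { super(msg); }
  }

  /** Stand-in for whatever checkOpen() throws while the server is stopping. */
  static class ServerStoppingException extends IOException {
    ServerStoppingException(String msg) { super(msg); }
  }

  /** Stand-in for the RPC-layer ServiceException that wraps the cause. */
  static class ServiceException extends Exception {
    ServiceException(Throwable cause) { super(cause); }
  }

  private final boolean stopping; // assumed: one flag models the server's state

  LeaseErrorMapping(boolean stopping) {
    this.stopping = stopping;
  }

  /** Mirrors checkOpen(): fails if this simulated region server is shutting down. */
  void checkOpen() throws IOException {
    if (stopping) {
      throw new ServerStoppingException("Server is stopping");
    }
  }

  /**
   * Mirrors the patched catch block: prefer the shutdown error over the LeaseException.
   * The ServiceException is returned rather than thrown so callers can inspect it.
   */
  ServiceException toServiceException(LeaseException e) {
    IOException ioE = e;
    try {
      checkOpen();
    } catch (IOException ioexception) {
      // The lease was likely removed because the server is going down;
      // surface that cause instead of "lease does not exist".
      ioE = ioexception;
    }
    return new ServiceException(ioE);
  }

  public static void main(String[] args) {
    LeaseException leaseEx = new LeaseException("lease '3449673378019934209' does not exist");
    // prints ServerStoppingException, then LeaseException
    System.out.println(new LeaseErrorMapping(true).toServiceException(leaseEx).getCause().getClass().getSimpleName());
    System.out.println(new LeaseErrorMapping(false).toServiceException(leaseEx).getCause().getClass().getSimpleName());
  }
}
```

The caller sees the shutdown error only when the simulated server is actually stopping. Otherwise it receives the original LeaseException unchanged.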

And [~Apache9], should we handle LeaseException the same way as
OutOfOrderScannerNextException or UnknownScannerException, and retry?
If we retry on LeaseException, there is no need to fix this issue.
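The alternative raised in the question would be handled on the client side. Below is a minimal sketch. It assumes the client recovers by reopening the scanner just after the last row it has already returned, which is roughly how scanner errors such as UnknownScannerException are treated, and it simply adds LeaseException to the set of reopenable exceptions. ReopenOnLeaseLoss, Scanner, ScannerFactory, FlakyTable and the stand-in exception classes are all hypothetical. None of them is the HBase client API.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of client-side scanner recovery; not the HBase client API.
class ReopenOnLeaseLoss {

  // Simplified stand-ins for the server-side scanner exceptions.
  static class LeaseException extends IOException {
    LeaseException(String msg) { super(msg); }
  }
  static class UnknownScannerException extends IOException {
    UnknownScannerException(String msg) { super(msg); }
  }
  static class OutOfOrderScannerNextException extends IOException {
    OutOfOrderScannerNextException(String msg) { super(msg); }
  }

  /** Returns the next row key, or null when the scan is exhausted. */
  interface Scanner { String next() throws IOException; }

  /** Opens a scanner positioned just after lastRow (null means from the start). */
  interface ScannerFactory { Scanner openAfter(String lastRow) throws IOException; }

  /** Errors after which the scan can resume from the last returned row. */
  static boolean isReopenable(IOException e) {
    return e instanceof UnknownScannerException
        || e instanceof OutOfOrderScannerNextException
        || e instanceof LeaseException; // the addition under discussion
  }

  /** Reads every row, reopening the scanner at most maxReopens times. */
  static List<String> scanAll(ScannerFactory factory, int maxReopens) throws IOException {
    List<String> rows = new ArrayList<>();
    String lastRow = null;
    int reopens = 0;
    Scanner scanner = null;
    while (true) {
      try {
        if (scanner == null) {
          scanner = factory.openAfter(lastRow);
        }
        String row = scanner.next();
        if (row == null) {
          return rows;
        }
        rows.add(row);
        lastRow = row;
      } catch (IOException e) {
        if (!isReopenable(e) || reopens >= maxReopens) {
          throw e;
        }
        reopens++;
        scanner = null; // discard the broken scanner; reopen after lastRow
      }
    }
  }

  /** In-memory table: scanners throw LeaseException after serving failAfter rows, `failures` times in total. */
  static class FlakyTable implements ScannerFactory {
    private final List<String> data;
    private final int failAfter;
    private int failuresLeft;

    FlakyTable(List<String> data, int failAfter, int failures) {
      this.data = data;
      this.failAfter = failAfter;
      this.failuresLeft = failures;
    }

    @Override
    public Scanner openAfter(String lastRow) {
      final int start = lastRow == null ? 0 : data.indexOf(lastRow) + 1;
      return new Scanner() {
        private int pos = start;
        private int served = 0;

        @Override
        public String next() throws IOException {
          if (failuresLeft > 0 && served == failAfter) {
            failuresLeft--;
            throw new LeaseException("lease does not exist");
          }
          if (pos >= data.size()) {
            return null;
          }
          served++;
          return data.get(pos++);
        }
      };
    }
  }

  public static void main(String[] args) throws IOException {
    List<String> data = Arrays.asList("r1", "r2", "r3", "r4");
    System.out.println(scanAll(new FlakyTable(data, 2, 1), 1)); // prints [r1, r2, r3, r4]
  }
}
```

With one injected LeaseException and maxReopens = 1, scanAll returns every row exactly once. With maxReopens = 0, the LeaseException propagates to the caller. Bounding the number of reopens stops a server that keeps losing leases from causing an endless retry loop.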


was (Author: allan163):
I think this would be better:
{code:java}
       // where processing of request takes > lease expiration time.
       lease = regionServer.leases.removeLease(scannerName);
     } catch (LeaseException e) {
-      throw new ServiceException(e);
+      IOException ioE = e;
+      // There is a case that the lease is closed because of RS shutting down
+      try {
+        checkOpen();
+      } catch (IOException ioexception) {
+        ioE = ioexception;
+      }
+      throw new ServiceException(ioE);
{code}

And [~Apache9], should we handle LeaseException the same way as
OutOfOrderScannerNextException or UnknownScannerException, and retry?

> Master Fails to become active due to the data race bug in region server
> -----------------------------------------------------------------------
>
>                 Key: HBASE-22017
>                 URL: https://issues.apache.org/jira/browse/HBASE-22017
>             Project: HBase
>          Issue Type: Bug
>            Reporter: lujie
>            Assignee: lujie
>            Priority: Critical
>         Attachments: 0001-fix-HBASE-22017.patch, 0002-fix-HBASE-22017.patch, fixedlogs.zip, logs.zip
>
>
> Test cluster: hadoop11 (master), hadoop14 (slave), hadoop15 (slave).
> If hadoop15 shuts down before the code at
> org.apache.hadoop.hbase.regionserver.HStore#getScanner (line 2027) executes,
> master startup fails:
> {code:java}
> 2019-03-06 01:36:17,040 ERROR [master/hadoop11:16000:becomeActiveMaster] master.HMaster: ***** ABORTING master hadoop11,16000,1551807353275: Unhandled exception. Starting shutdown. *****
> org.apache.hadoop.hbase.regionserver.LeaseException: org.apache.hadoop.hbase.regionserver.LeaseException: lease '3449673378019934209' does not exist
> at org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:224)
> at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3434)
> at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:42002)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
> at org.apache.hadoop.hbase.ipc.RemoteWithExtrasException.instantiateException(RemoteWithExtrasException.java:100)
> at org.apache.hadoop.hbase.ipc.RemoteWithExtrasException.unwrapRemoteException(RemoteWithExtrasException.java:90)
> at org.apache.hadoop.hbase.shaded.protobuf.ProtobufUtil.makeIOExceptionOfException(ProtobufUtil.java:361)
> at org.apache.hadoop.hbase.shaded.protobuf.ProtobufUtil.handleRemoteException(ProtobufUtil.java:349)
> at org.apache.hadoop.hbase.client.ScannerCallable.openScanner(ScannerCallable.java:344)
> at org.apache.hadoop.hbase.client.ScannerCallable.rpcCall(ScannerCallable.java:242)
> at org.apache.hadoop.hbase.client.ScannerCallable.rpcCall(ScannerCallable.java:58)
> at org.apache.hadoop.hbase.client.RegionServerCallable.call(RegionServerCallable.java:127)
> at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithoutRetries(RpcRetryingCallerImpl.java:192)
> at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:387)
> at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:361)
> at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:107)
> at org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:80)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {code}


