[ https://issues.apache.org/jira/browse/HBASE-22017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16791330#comment-16791330 ]
Allan Yang edited comment on HBASE-22017 at 3/13/19 5:44 AM:
-------------------------------------------------------------

I think this would be better:
{code:java}
       // where processing of request takes > lease expiration time.
       lease = regionServer.leases.removeLease(scannerName);
     } catch (LeaseException e) {
-      throw new ServiceException(e);
+      IOException ioE = e;
+      // There is a case that the lease is closed because of RS shutting down
+      try {
+        checkOpen();
+      } catch (IOException ioexception) {
+        ioE = ioexception;
+      }
+      throw new ServiceException(ioE);
{code}
And [~Apache9], should we handle LeaseException just like OutOfOrderScannerNextException or UnknownScannerException and retry? If we do retry for LeaseException, then there is no need to fix this issue.

was (Author: allan163):
I think this would be better:
{code:java}
       // where processing of request takes > lease expiration time.
       lease = regionServer.leases.removeLease(scannerName);
     } catch (LeaseException e) {
-      throw new ServiceException(e);
+      IOException ioE = e;
+      // There is a case that the lease is closed because of RS shutting down
+      try {
+        checkOpen();
+      } catch (IOException ioexception) {
+        ioE = ioexception;
+      }
+      throw new ServiceException(ioE);
{code}
And [~Apache9], should we handle LeaseException just like OutOfOrderScannerNextException or UnknownScannerException and retry?

> Master Fails to become active due to the data race bug in region server
> -----------------------------------------------------------------------
>
>                 Key: HBASE-22017
>                 URL: https://issues.apache.org/jira/browse/HBASE-22017
>             Project: HBase
>          Issue Type: Bug
>            Reporter: lujie
>            Assignee: lujie
>            Priority: Critical
>         Attachments: 0001-fix-HBASE-22017.patch, 0002-fix-HBASE-22017.patch, fixedlogs.zip, logs.zip
>
>
> Test cluster: hadoop11(master), hadoop14(slave), hadoop15(slave).
> Before the code executes at org.apache.hadoop.hbase.regionserver.HStore#getScanner (line 2027), hadoop15 shuts down, and master startup then fails:
> {code:java}
> 2019-03-06 01:36:17,040 ERROR [master/hadoop11:16000:becomeActiveMaster] master.HMaster: ***** ABORTING master hadoop11,16000,1551807353275: Unhandled exception. Starting shutdown. *****
> org.apache.hadoop.hbase.regionserver.LeaseException: org.apache.hadoop.hbase.regionserver.LeaseException: lease '3449673378019934209' does not exist
>   at org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:224)
>   at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3434)
>   at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:42002)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
>   at org.apache.hadoop.hbase.ipc.RemoteWithExtrasException.instantiateException(RemoteWithExtrasException.java:100)
>   at org.apache.hadoop.hbase.ipc.RemoteWithExtrasException.unwrapRemoteException(RemoteWithExtrasException.java:90)
>   at org.apache.hadoop.hbase.shaded.protobuf.ProtobufUtil.makeIOExceptionOfException(ProtobufUtil.java:361)
>   at org.apache.hadoop.hbase.shaded.protobuf.ProtobufUtil.handleRemoteException(ProtobufUtil.java:349)
>   at org.apache.hadoop.hbase.client.ScannerCallable.openScanner(ScannerCallable.java:344)
>   at org.apache.hadoop.hbase.client.ScannerCallable.rpcCall(ScannerCallable.java:242)
>   at org.apache.hadoop.hbase.client.ScannerCallable.rpcCall(ScannerCallable.java:58)
>   at org.apache.hadoop.hbase.client.RegionServerCallable.call(RegionServerCallable.java:127)
>   at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithoutRetries(RpcRetryingCallerImpl.java:192)
>   at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:387)
>   at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:361)
>   at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:107)
>   at org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:80)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
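
For readers who want to try the idea from the comment at the top outside of HBase, here is a minimal standalone sketch of the proposed substitution: when the lease is missing, re-check whether the server is still open and, if it is not, report that failure instead of the LeaseException. Everything in it is a hypothetical stand-in written for illustration (the `LeaseErrorSketch` class, its nested exception types, the `OpenCheck` interface, and `chooseCause`); none of it is HBase code, and the real `checkOpen()` in `RSRpcServices` may behave differently.

```java
import java.io.IOException;

public class LeaseErrorSketch {

    /** Hypothetical stand-in for HBase's LeaseException (assumed to be an IOException). */
    static class LeaseException extends IOException {
        LeaseException(String message) {
            super(message);
        }
    }

    /** Hypothetical stand-in for whatever the server-open check throws during shutdown. */
    static class ServerStoppingException extends IOException {
        ServerStoppingException(String message) {
            super(message);
        }
    }

    /** Hypothetical stand-in for the region server's checkOpen() call. */
    interface OpenCheck {
        void checkOpen() throws IOException;
    }

    /**
     * Picks the exception to report to the client: the original lease error,
     * unless the open check fails, in which case the shutdown error wins.
     */
    static IOException chooseCause(LeaseException leaseError, OpenCheck check) {
        IOException cause = leaseError;
        try {
            check.checkOpen();
        } catch (IOException stopping) {
            // The lease most likely vanished because the server is going down.
            cause = stopping;
        }
        return cause;
    }

    public static void main(String[] args) {
        LeaseException missing = new LeaseException("lease does not exist");

        // Server still open: the lease error is reported unchanged.
        IOException whileOpen = chooseCause(missing, () -> { });

        // Server stopping: the shutdown error replaces the lease error.
        IOException whileStopping = chooseCause(missing, () -> {
            throw new ServerStoppingException("server is stopping");
        });

        System.out.println(whileOpen.getClass().getSimpleName());
        System.out.println(whileStopping.getClass().getSimpleName());
    }
}
```

The point of the substitution is that the caller sees an error describing the server state rather than a bare missing-lease error; whether the client should also retry on LeaseException is the open question raised in the comment and is not addressed by this sketch.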