[ https://issues.apache.org/jira/browse/HBASE-15219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15145573#comment-15145573 ]
Ted Yu commented on HBASE-15219:
--------------------------------

Verified that patch v8 works:
{code}
Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.NotServingRegionException): org.apache.hadoop.hbase.NotServingRegionException: Region tscantbl,,1453941714280.0f2f1a2fdfa3dad009807fb1b95d3c9a. is not online on ted-hbase-insec-4.novalocal,16020,1450214717066
    at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2898)
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:947)
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2235)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32205)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
    at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
    at java.lang.Thread.run(Thread.java:745)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1226)
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:213)
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:287)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:32651)
    at org.apache.hadoop.hbase.client.ScannerCallable.openScanner(ScannerCallable.java:372)
    ... 10 more
2016-02-12 23:44:56,422 INFO [main] tool.Canary: err 0 read: 1
2016-02-12 23:44:56,423 INFO [main] client.ConnectionManager$HConnectionImplementation: Closing master protocol: MasterService
2016-02-12 23:44:56,425 INFO [main] client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x251a7631c3e00fa
2016-02-12 23:44:56,429 INFO [main] zookeeper.ZooKeeper: Session: 0x251a7631c3e00fa closed
2016-02-12 23:44:56,429 INFO [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
2016-02-12 23:44:56,429 DEBUG [main] ipc.AbstractRpcClient: Stopping rpc client
2016-02-12 23:44:56,440 INFO [main] hbase.ChoreService: Chore service for: CANARY_TOOL had [] on shutdown
{code}
{code}
x:~> echo $?
5
{code}

> Canary tool does not return non-zero exit code when one of regions is in
> stuck state
> -------------------------------------------------------------------------------------
>
>                 Key: HBASE-15219
>                 URL: https://issues.apache.org/jira/browse/HBASE-15219
>             Project: HBase
>          Issue Type: Bug
>          Components: canary
>    Affects Versions: 0.98.16
>            Reporter: Vishal Khandelwal
>            Assignee: Ted Yu
>            Priority: Critical
>             Fix For: 2.0.0, 1.3.0, 1.2.1, 0.98.18
>
>         Attachments: HBASE-15219.v1.patch, HBASE-15219.v3.patch,
> HBASE-15219.v4.patch, HBASE-15219.v5.patch, HBASE-15219.v7.patch,
> HBASE-15219.v8.patch
>
>
> {code}
> 2016-02-05 12:24:18,571 ERROR [pool-2-thread-7] tool.Canary - read from
> region
> CAN_1,\x08\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00,1454667477865.00e77d07b8defe10704417fb99aa0418.
> column family 0 failed
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after
> attempts=2, exceptions:
> Fri Feb 05 12:24:15 GMT 2016,
> org.apache.hadoop.hbase.client.RpcRetryingCaller@54c9fea0,
> org.apache.hadoop.hbase.NotServingRegionException:
> org.apache.hadoop.hbase.NotServingRegionException: Region
> CAN_1,\x08\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00,1454667477865.00e77d07b8defe10704417fb99aa0418.
> is not online on isthbase02-dnds1-3-crd.eng.sfdc.net,60020,1454669984738
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2852)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:4468)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:2984)
>     at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:31186)
>     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2149)
>     at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:104)
>     at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
>     at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
>     at java.lang.Thread.run(Thread.java:745)
> --------
> -bash-4.1$ echo $?
> 0
> {code}
> The code below prints the error but does not set or return a non-zero exit
> code. Because of this, the tool can't be integrated with Nagios or other
> alerting systems.
> Ideally it should return an error code on failures, as per the documentation:
> <snip>
> This tool will return non zero error codes to user for collaborating with
> other monitoring tools, such as Nagios. The error code definitions are:
> private static final int USAGE_EXIT_CODE = 1;
> private static final int INIT_ERROR_EXIT_CODE = 2;
> private static final int TIMEOUT_ERROR_EXIT_CODE = 3;
> private static final int ERROR_EXIT_CODE = 4;
> </snip>
> {code}
> org.apache.hadoop.hbase.tool.Canary.RegionTask
> public Void read() {
>   ....
>   try {
>         table = connection.getTable(region.getTable());
>         tableDesc = table.getTableDescriptor();
>       } catch (IOException e) {
>         LOG.debug("sniffRegion failed", e);
>         sink.publishReadFailure(region, e);
>        ...
>         return null;
>       }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
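
For readers following the thread, here is a minimal sketch of the general idea behind the report: if the sink counts the failures it publishes, the caller can turn that count into a non-zero exit status. This is not the code from the attached patches. `CountingSink`, `exitCodeFor`, and `FAILURE_EXIT_CODE` are hypothetical names, the region is passed as a plain string for simplicity, and the value 5 is only assumed from the `echo $?` output shown in the comment above.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicLong;

public class CanaryExitCodeSketch {
    // Hypothetical constant; the value 5 is assumed from the `echo $?` output in the comment.
    static final int FAILURE_EXIT_CODE = 5;

    /** Hypothetical sink that counts read failures in addition to logging them. */
    static class CountingSink {
        private final AtomicLong readFailures = new AtomicLong();

        // Region tasks run on a thread pool, so the counter must be thread-safe.
        void publishReadFailure(String regionName, Exception e) {
            readFailures.incrementAndGet();
            System.err.println("read from region " + regionName + " failed: " + e.getMessage());
        }

        long getReadFailureCount() {
            return readFailures.get();
        }
    }

    /** Maps the sink state to an exit status: 0 if every read succeeded, non-zero otherwise. */
    static int exitCodeFor(CountingSink sink) {
        return sink.getReadFailureCount() > 0 ? FAILURE_EXIT_CODE : 0;
    }

    public static void main(String[] args) {
        CountingSink sink = new CountingSink();
        sink.publishReadFailure("example-region", new IOException("Region is not online"));
        // A real tool would pass this value to System.exit(...) so that
        // monitoring systems such as Nagios can act on it.
        System.out.println("exit code: " + exitCodeFor(sink));
    }
}
```

The point of the sketch is only that logging a failure and signalling it to the calling process are separate steps, and the second one was missing in the behaviour described in the report.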