[ https://issues.apache.org/jira/browse/HBASE-24480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17120691#comment-17120691 ]
Bharath Vissapragada commented on HBASE-24480:
----------------------------------------------

I think the root cause is this. The problem happens when an empty list is passed to the postClearDeadServers hook...

{noformat}
  public void postClearDeadServers(ObserverContext<MasterCoprocessorEnvironment> ctx,
      List<ServerName> servers, List<ServerName> notClearedServers)
      throws IOException {
    Set<Address> clearedServer = Sets.newHashSet();
    for (ServerName server: servers) {
      if (!notClearedServers.contains(server)) {
        clearedServer.add(server.getAddress());
      }
    }
    groupAdminServer.removeServers(clearedServer);  <== clearedServer list is empty
  }
{noformat}

The cause of this is in the clearDeadServers() RPC..

{noformat}
      if (master.getServerManager().areDeadServersInProgress()) {
        LOG.debug("Some dead server is still under processing, won't clear the dead server list");  <=======
        response.addAllServerName(request.getServerNameList());
      } else {
        for (HBaseProtos.ServerName pbServer : request.getServerNameList()) {
          if (!master.getServerManager().getDeadServers()
                  .removeDeadServer(ProtobufUtil.toServerName(pbServer))) {
            response.addServerName(pbServer);
          }
        }
      }
{noformat}

I could see the LOG.debug() message in the logs. It's the same region server that was stopped. Essentially, there is a dead server that is still being processed, and hence the current request was rejected: every requested server is returned as not cleared, so the hook computes an empty set.

The fix is essentially the following:

- Don't execute the post hook if no server is cleared
- Make the test more robust to handle this RPC failure..

> Deflake TestRSGroupsBasics#testClearDeadServers
> -----------------------------------------------
>
>                 Key: HBASE-24480
>                 URL: https://issues.apache.org/jira/browse/HBASE-24480
>             Project: HBase
>          Issue Type: Bug
>          Components: rsgroup
>    Affects Versions: 2.3.0, 1.7.0
>            Reporter: Bharath Vissapragada
>            Assignee: Bharath Vissapragada
>            Priority: Major
>
> Ran into this on our internal forks based on branch-1.
> It also applies to branch-2 but not master because the code has been
> re-implemented without co-proc due to HBASE-22514
> Running into this exception in the test run..
> {noformat}
> org.apache.hadoop.hbase.constraint.ConstraintException: The set of servers to remove cannot be null or empty.
>   at org.apache.hadoop.hbase.rsgroup.RSGroupAdminServer.removeServers(RSGroupAdminServer.java:391)
>   at org.apache.hadoop.hbase.rsgroup.RSGroupAdminEndpoint.postClearDeadServers(RSGroupAdminEndpoint.java:1175)
>   at org.apache.hadoop.hbase.master.MasterCoprocessorHost$104.call(MasterCoprocessorHost.java:1251)
>   at org.apache.hadoop.hbase.master.MasterCoprocessorHost.execOperation(MasterCoprocessorHost.java:1507)
>   at org.apache.hadoop.hbase.master.MasterCoprocessorHost.postClearDeadServers(MasterCoprocessorHost.java:1247)
>   at org.apache.hadoop.hbase.master.MasterRpcServices.clearDeadServers(MasterRpcServices.java:1167)
>   at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2421)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:311)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:291) "
> type="org.apache.hadoop.hbase.constraint.ConstraintException">org.apache.hadoop.hbase.constraint.ConstraintException:
> org.apache.hadoop.hbase.constraint.ConstraintException: The set of servers to remove cannot be null or empty.
>   at org.apache.hadoop.hbase.rsgroup.RSGroupAdminServer.removeServers(RSGroupAdminServer.java:391)
>   at org.apache.hadoop.hbase.rsgroup.RSGroupAdminEndpoint.postClearDeadServers(RSGroupAdminEndpoint.java:1175)
>   at org.apache.hadoop.hbase.master.MasterCoprocessorHost$104.call(MasterCoprocessorHost.java:1251)
>   at org.apache.hadoop.hbase.master.MasterCoprocessorHost.execOperation(MasterCoprocessorHost.java:1507)
>   at org.apache.hadoop.hbase.master.MasterCoprocessorHost.postClearDeadServers(MasterCoprocessorHost.java:1247)
>   at org.apache.hadoop.hbase.master.MasterRpcServices.clearDeadServers(MasterRpcServices.java:1167)
>   at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2421)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:311)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:291)
>   at org.apache.hadoop.hbase.rsgroup.TestRSGroupsBasics.testClearDeadServers(TestRSGroupsBasics.java:215)
> Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException: org.apache.hadoop.hbase.constraint.ConstraintException: The set of servers to remove cannot be null or empty.
>   at org.apache.hadoop.hbase.rsgroup.RSGroupAdminServer.removeServers(RSGroupAdminServer.java:391)
>   at org.apache.hadoop.hbase.rsgroup.RSGroupAdminEndpoint.postClearDeadServers(RSGroupAdminEndpoint.java:1175)
>   at org.apache.hadoop.hbase.master.MasterCoprocessorHost$104.call(MasterCoprocessorHost.java:1251)
>   at org.apache.hadoop.hbase.master.MasterCoprocessorHost.execOperation(MasterCoprocessorHost.java:1507)
>   at org.apache.hadoop.hbase.master.MasterCoprocessorHost.postClearDeadServers(MasterCoprocessorHost.java:1247)
>   at org.apache.hadoop.hbase.master.MasterRpcServices.clearDeadServers(MasterRpcServices.java:1167)
>   at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2421)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:311)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:291)
> {noformat}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
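
The first part of the fix proposed in the comment (skip the post hook when no server was cleared) could look roughly like the sketch below. This is a simplified, self-contained model and not the actual HBase patch: `FakeGroupAdmin`, `maybeRunPostHook`, and the plain strings standing in for `ServerName`/`Address` are assumptions made for illustration only. It reproduces the rejected-request case, where every requested server comes back as "not cleared".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical, simplified model of the hook invocation; not the HBase classes. */
public class ClearDeadServersGuard {

  /** Stand-in for RSGroupAdminServer: rejects empty input, as in the stack trace. */
  static class FakeGroupAdmin {
    final List<Set<String>> calls = new ArrayList<>();

    void removeServers(Set<String> servers) {
      if (servers == null || servers.isEmpty()) {
        throw new IllegalArgumentException(
            "The set of servers to remove cannot be null or empty.");
      }
      calls.add(servers);
    }
  }

  /** Requested servers that do not appear in the not-cleared list. */
  static Set<String> clearedServers(List<String> servers, List<String> notCleared) {
    Set<String> cleared = new LinkedHashSet<>();
    for (String server : servers) {
      if (!notCleared.contains(server)) {
        cleared.add(server);
      }
    }
    return cleared;
  }

  /** Mirrors the original hook body: calls removeServers() unconditionally. */
  static void postClearDeadServers(FakeGroupAdmin admin, Set<String> cleared) {
    admin.removeServers(cleared);
  }

  /** The guard: only run the post hook when at least one server was cleared. */
  static void maybeRunPostHook(FakeGroupAdmin admin, List<String> servers,
      List<String> notCleared) {
    Set<String> cleared = clearedServers(servers, notCleared);
    if (cleared.isEmpty()) {
      return; // request rejected or nothing cleared: skip the hook entirely
    }
    postClearDeadServers(admin, cleared);
  }

  public static void main(String[] args) {
    FakeGroupAdmin admin = new FakeGroupAdmin();
    List<String> requested = Arrays.asList("rs1:16020", "rs2:16020");

    // Rejected request: all requested servers are echoed back as not cleared.
    maybeRunPostHook(admin, requested, requested);
    System.out.println(admin.calls.size()); // prints 0

    // Partial success: only rs2 could not be cleared.
    maybeRunPostHook(admin, requested, Collections.singletonList("rs2:16020"));
    System.out.println(admin.calls); // prints [[rs1:16020]]
  }
}
```

Without the `isEmpty()` check, the first call would reach `removeServers()` with an empty set and fail with the same message seen in the stack trace. Where the real guard belongs (in the RPC handler or inside the hook) is a design choice this sketch does not settle.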