[ https://issues.apache.org/jira/browse/HBASE-24595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yu Wang updated HBASE-24595:
----------------------------
    Description:
environment:
 jdk: 1.8.0_181
 hadoop: 3.1.1
 hbase: 2.1.6

hbase shell create namespace is blocked when all datanodes have restarted in a Kerberos environment,
but the namespace is created successfully without Kerberos

hmaster log:
2020-06-19 23:47:48,241 WARN [PEWorker-15] procedure.CreateNamespaceProcedure: Retriable error trying to create namespace=abcd2 (in state=CREATE_NAMESPACE_INSERT_INTO_NS_TABLE)
java.net.SocketTimeoutException: callTimeout=1200000, callDuration=1220061: Call to hadoop-hbnn0005.com/172.20.101.36:16020 failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=116, waitTime=10763, rpcTimeout=10759 row 'abcd2' on table 'hbase:namespace' at region=hbase:namespace,,1592548148073.f5c7e71fb5e5cab3b27e52600996f7fd., hostname=hadoop-hbnn0005.com,16020,1592580274989, seqNum=162
    at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:159)
    at org.apache.hadoop.hbase.client.HTable.put(HTable.java:542)
    at org.apache.hadoop.hbase.master.TableNamespaceManager.insertIntoNSTable(TableNamespaceManager.java:167)
    at org.apache.hadoop.hbase.master.procedure.CreateNamespaceProcedure.insertIntoNSTable(CreateNamespaceProcedure.java:240)
    at org.apache.hadoop.hbase.master.procedure.CreateNamespaceProcedure.executeFromState(CreateNamespaceProcedure.java:85)
    at org.apache.hadoop.hbase.master.procedure.CreateNamespaceProcedure.executeFromState(CreateNamespaceProcedure.java:39)
    at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:189)
    at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:965)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1723)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1462)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1200(ProcedureExecutor.java:78)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:2039)
Caused by: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call to hadoop-hbnn0005.com/172.20.101.36:16020 failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=116, waitTime=10763, rpcTimeout=10759
    at org.apache.hadoop.hbase.ipc.IPCUtil.wrapException(IPCUtil.java:205)
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient.onCallFinished(AbstractRpcClient.java:390)
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$100(AbstractRpcClient.java:95)
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:410)
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:406)
    at org.apache.hadoop.hbase.ipc.Call.setTimeout(Call.java:96)
    at org.apache.hadoop.hbase.ipc.RpcConnection$1.run(RpcConnection.java:199)
    at org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:682)
    at org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:757)
    at org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:485)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=116, waitTime=10763, rpcTimeout=10759
    at org.apache.hadoop.hbase.ipc.RpcConnection$1.run(RpcConnection.java:200)
    ... 4 more
2020-06-19 23:47:49,218 WARN [ProcExecTimeout] procedure2.ProcedureExecutor: Worker stuck PEWorker-15(pid=171), run time 20mins, 1.262sec
2020-06-19 23:47:54,220 WARN [ProcExecTimeout] procedure2.ProcedureExecutor: Worker stuck PEWorker-15(pid=171), run time 20mins, 6.263sec
2020-06-19 23:47:59,220 WARN [ProcExecTimeout] procedure2.ProcedureExecutor: Worker stuck PEWorker-15(pid=171), run time 20mins, 11.264sec
2020-06-19 23:48:04,220 WARN [ProcExecTimeout] procedure2.ProcedureExecutor: Worker stuck PEWorker-15(pid=171), run time 20mins, 16.264sec
2020-06-19 23:48:09,221 WARN [ProcExecTimeout] procedure2.ProcedureExecutor: Worker stuck PEWorker-15(pid=171), run time 20mins, 21.265sec
2020-06-19 23:48:14,221 WARN [ProcExecTimeout] procedure2.ProcedureExecutor: Worker stuck PEWorker-15(pid=171), run time 20mins, 26.265sec
2020-06-19 23:48:19,221 WARN [ProcExecTimeout] procedure2.ProcedureExecutor: Worker stuck PEWorker-15(pid=171), run time 20mins, 31.265sec
2020-06-19 23:48:24,222 WARN [ProcExecTimeout] procedure2.ProcedureExecutor: Worker stuck PEWorker-15(pid=171), run time 20mins, 36.266sec
2020-06-19 23:48:29,222 WARN [ProcExecTimeout] procedure2.ProcedureExecutor: Worker stuck PEWorker-15(pid=171), run time 20mins, 41.266sec
2020-06-19 23:48:34,223 WARN [ProcExecTimeout] procedure2.ProcedureExecutor: Worker stuck PEWorker-15(pid=171), run time 20mins, 46.267sec
2020-06-19 23:48:39,223 WARN [ProcExecTimeout] procedure2.ProcedureExecutor: Worker stuck PEWorker-15(pid=171), run time 20mins, 51.267sec

  was:
environment:
 jdk: 1.8.0_181
 hadoop: 3.1.1
 hbase: 2.1.6

hbase shell create namespace is blocked when all datanodes have restarted in a Kerberos environment,
but the namespace is created successfully without Kerberos

hmaster log shows:
2020-06-19 23:47:48,241 WARN [PEWorker-15] procedure.CreateNamespaceProcedure: Retriable error trying to create namespace=abcd2 (in state=CREATE_NAMESPACE_INSERT_INTO_NS_TABLE)
java.net.SocketTimeoutException: callTimeout=1200000, callDuration=1220061:
Call to hadoop-hbnn0005.com/172.20.101.36:16020 failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=116, waitTime=10763, rpcTimeout=10759 row 'abcd2' on table 'hbase:namespace' at region=hbase:namespace,,1592548148073.f5c7e71fb5e5cab3b27e52600996f7fd., hostname=hadoop-hbnn0005.com,16020,1592580274989, seqNum=162
    at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:159)
    at org.apache.hadoop.hbase.client.HTable.put(HTable.java:542)
    at org.apache.hadoop.hbase.master.TableNamespaceManager.insertIntoNSTable(TableNamespaceManager.java:167)
    at org.apache.hadoop.hbase.master.procedure.CreateNamespaceProcedure.insertIntoNSTable(CreateNamespaceProcedure.java:240)
    at org.apache.hadoop.hbase.master.procedure.CreateNamespaceProcedure.executeFromState(CreateNamespaceProcedure.java:85)
    at org.apache.hadoop.hbase.master.procedure.CreateNamespaceProcedure.executeFromState(CreateNamespaceProcedure.java:39)
    at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:189)
    at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:965)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1723)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1462)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1200(ProcedureExecutor.java:78)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:2039)
Caused by: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call to hadoop-hbnn0005.com/172.20.101.36:16020 failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=116, waitTime=10763, rpcTimeout=10759
    at org.apache.hadoop.hbase.ipc.IPCUtil.wrapException(IPCUtil.java:205)
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient.onCallFinished(AbstractRpcClient.java:390)
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$100(AbstractRpcClient.java:95)
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:410)
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:406)
    at org.apache.hadoop.hbase.ipc.Call.setTimeout(Call.java:96)
    at org.apache.hadoop.hbase.ipc.RpcConnection$1.run(RpcConnection.java:199)
    at org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:682)
    at org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:757)
    at org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:485)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=116, waitTime=10763, rpcTimeout=10759
    at org.apache.hadoop.hbase.ipc.RpcConnection$1.run(RpcConnection.java:200)
    ... 4 more
2020-06-19 23:47:49,218 WARN [ProcExecTimeout] procedure2.ProcedureExecutor: Worker stuck PEWorker-15(pid=171), run time 20mins, 1.262sec
2020-06-19 23:47:54,220 WARN [ProcExecTimeout] procedure2.ProcedureExecutor: Worker stuck PEWorker-15(pid=171), run time 20mins, 6.263sec
2020-06-19 23:47:59,220 WARN [ProcExecTimeout] procedure2.ProcedureExecutor: Worker stuck PEWorker-15(pid=171), run time 20mins, 11.264sec
2020-06-19 23:48:04,220 WARN [ProcExecTimeout] procedure2.ProcedureExecutor: Worker stuck PEWorker-15(pid=171), run time 20mins, 16.264sec
2020-06-19 23:48:09,221 WARN [ProcExecTimeout] procedure2.ProcedureExecutor: Worker stuck PEWorker-15(pid=171), run time 20mins, 21.265sec
2020-06-19 23:48:14,221 WARN [ProcExecTimeout] procedure2.ProcedureExecutor: Worker stuck PEWorker-15(pid=171), run time 20mins, 26.265sec
2020-06-19 23:48:19,221 WARN [ProcExecTimeout] procedure2.ProcedureExecutor: Worker stuck PEWorker-15(pid=171), run time 20mins, 31.265sec
2020-06-19 23:48:24,222 WARN [ProcExecTimeout] procedure2.ProcedureExecutor: Worker stuck PEWorker-15(pid=171), run time 20mins, 36.266sec
2020-06-19 23:48:29,222 WARN [ProcExecTimeout] procedure2.ProcedureExecutor: Worker stuck PEWorker-15(pid=171), run time 20mins, 41.266sec
2020-06-19 23:48:34,223 WARN [ProcExecTimeout] procedure2.ProcedureExecutor: Worker stuck PEWorker-15(pid=171), run time 20mins, 46.267sec
2020-06-19 23:48:39,223 WARN [ProcExecTimeout] procedure2.ProcedureExecutor: Worker stuck PEWorker-15(pid=171), run time 20mins, 51.267sec

> hbase create namespace blocked when all datanodes has restarted
> ---------------------------------------------------------------
>
>                 Key: HBASE-24595
>                 URL: https://issues.apache.org/jira/browse/HBASE-24595
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 2.1.6
>            Reporter: Yu Wang
>            Priority: Critical
>         Attachments: create_namespace_1.png, create_namespace_2.png,
> hmaster.log, hmaster.png, hmaster_4569.jstack, hregionserver.log,
> hregionserver_25649.jstack, procedure.png
>
>
> environment:
>  jdk: 1.8.0_181
>  hadoop: 3.1.1
>  hbase: 2.1.6
>
> hbase shell create namespace is blocked when all datanodes have restarted in a Kerberos environment,
> but the namespace is created successfully without Kerberos
>
> hmaster log:
> 2020-06-19 23:47:48,241 WARN [PEWorker-15] procedure.CreateNamespaceProcedure: Retriable error trying to create namespace=abcd2 (in state=CREATE_NAMESPACE_INSERT_INTO_NS_TABLE)
> java.net.SocketTimeoutException: callTimeout=1200000, callDuration=1220061: Call to hadoop-hbnn0005.com/172.20.101.36:16020 failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=116, waitTime=10763, rpcTimeout=10759 row 'abcd2' on table 'hbase:namespace' at region=hbase:namespace,,1592548148073.f5c7e71fb5e5cab3b27e52600996f7fd., hostname=hadoop-hbnn0005.com,16020,1592580274989, seqNum=162
>     at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:159)
>     at org.apache.hadoop.hbase.client.HTable.put(HTable.java:542)
>     at org.apache.hadoop.hbase.master.TableNamespaceManager.insertIntoNSTable(TableNamespaceManager.java:167)
>     at org.apache.hadoop.hbase.master.procedure.CreateNamespaceProcedure.insertIntoNSTable(CreateNamespaceProcedure.java:240)
>     at org.apache.hadoop.hbase.master.procedure.CreateNamespaceProcedure.executeFromState(CreateNamespaceProcedure.java:85)
>     at org.apache.hadoop.hbase.master.procedure.CreateNamespaceProcedure.executeFromState(CreateNamespaceProcedure.java:39)
>     at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:189)
>     at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:965)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1723)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1462)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1200(ProcedureExecutor.java:78)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:2039)
> Caused by: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call to hadoop-hbnn0005.com/172.20.101.36:16020 failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=116, waitTime=10763, rpcTimeout=10759
>     at org.apache.hadoop.hbase.ipc.IPCUtil.wrapException(IPCUtil.java:205)
>     at org.apache.hadoop.hbase.ipc.AbstractRpcClient.onCallFinished(AbstractRpcClient.java:390)
>     at org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$100(AbstractRpcClient.java:95)
>     at org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:410)
>     at org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:406)
>     at org.apache.hadoop.hbase.ipc.Call.setTimeout(Call.java:96)
>     at org.apache.hadoop.hbase.ipc.RpcConnection$1.run(RpcConnection.java:199)
>     at org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:682)
>     at org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:757)
>     at org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:485)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=116, waitTime=10763, rpcTimeout=10759
>     at org.apache.hadoop.hbase.ipc.RpcConnection$1.run(RpcConnection.java:200)
>     ... 4 more
> 2020-06-19 23:47:49,218 WARN [ProcExecTimeout] procedure2.ProcedureExecutor: Worker stuck PEWorker-15(pid=171), run time 20mins, 1.262sec
> 2020-06-19 23:47:54,220 WARN [ProcExecTimeout] procedure2.ProcedureExecutor: Worker stuck PEWorker-15(pid=171), run time 20mins, 6.263sec
> 2020-06-19 23:47:59,220 WARN [ProcExecTimeout] procedure2.ProcedureExecutor: Worker stuck PEWorker-15(pid=171), run time 20mins, 11.264sec
> 2020-06-19 23:48:04,220 WARN [ProcExecTimeout] procedure2.ProcedureExecutor: Worker stuck PEWorker-15(pid=171), run time 20mins, 16.264sec
> 2020-06-19 23:48:09,221 WARN [ProcExecTimeout] procedure2.ProcedureExecutor: Worker stuck PEWorker-15(pid=171), run time 20mins, 21.265sec
> 2020-06-19 23:48:14,221 WARN [ProcExecTimeout] procedure2.ProcedureExecutor: Worker stuck PEWorker-15(pid=171), run time 20mins, 26.265sec
> 2020-06-19 23:48:19,221 WARN [ProcExecTimeout] procedure2.ProcedureExecutor: Worker stuck PEWorker-15(pid=171), run time 20mins, 31.265sec
> 2020-06-19 23:48:24,222 WARN [ProcExecTimeout] procedure2.ProcedureExecutor: Worker stuck PEWorker-15(pid=171), run time 20mins, 36.266sec
> 2020-06-19 23:48:29,222 WARN [ProcExecTimeout] procedure2.ProcedureExecutor: Worker stuck PEWorker-15(pid=171), run time 20mins, 41.266sec
> 2020-06-19 23:48:34,223 WARN [ProcExecTimeout] procedure2.ProcedureExecutor: Worker stuck PEWorker-15(pid=171), run time 20mins, 46.267sec
> 2020-06-19 23:48:39,223 WARN [ProcExecTimeout] procedure2.ProcedureExecutor: Worker stuck PEWorker-15(pid=171), run time 20mins, 51.267sec

--
This message was sent by Atlassian Jira
(v8.3.4#803005)