[jira] [Comment Edited] (HBASE-24595) hbase create namespace blocked when all datanodes has restarted

2020-06-28 Thread Yu Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17147347#comment-17147347
 ] 

Yu Wang edited comment on HBASE-24595 at 6/29/20, 1:40 AM:
---

Has anyone else encountered this error in this environment, and if so, how did you solve it?

Thank you very much, and sorry if I'm asking silly questions.


was (Author: yuwang0...@gmail.com):
Has anyone else encountered this error in this environment, and if so, how did you solve it?

> hbase create namespace blocked when all datanodes has restarted
> ---
>
> Key: HBASE-24595
> URL: https://issues.apache.org/jira/browse/HBASE-24595
> Project: HBase
> Issue Type: Bug
> Affects Versions: 2.1.6
> Reporter: Yu Wang
> Priority: Critical
> Attachments: create_namespace_1.png, create_namespace_2.png, 
> hmaster.log, hmaster.png, hmaster_4569.jstack, hregionserver.log, 
> hregionserver_25649.jstack, procedure.png
>
>
> environment:
> jdk:1.8.0_181
> hadoop:   3.1.1
> hbase:   2.1.6
> In the hbase shell, creating a namespace blocks after all datanodes have 
> been restarted in a Kerberos environment,
> but it succeeds without Kerberos.
>   
> hmaster log:
> 2020-06-19 23:47:48,241 WARN  [PEWorker-15] 
> procedure.CreateNamespaceProcedure: Retriable error trying to create 
> namespace=abcd2 (in state=CREATE_NAMESPACE_INSERT_INTO_NS_TABLE)
> java.net.SocketTimeoutException: callTimeout=120, callDuration=1220061: 
> Call to hadoop-hbnn0005.com/172.20.101.36:16020 failed on local exception: 
> org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=116, 
> waitTime=10763, rpcTimeout=10759 row 'abcd2' on table 'hbase:namespace' at 
> region=hbase:namespace,,1592548148073.f5c7e71fb5e5cab3b27e52600996f7fd., 
> hostname=hadoop-hbnn0005.com,16020,1592580274989, seqNum=162
>   at 
> org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:159)
>   at org.apache.hadoop.hbase.client.HTable.put(HTable.java:542)
>   at 
> org.apache.hadoop.hbase.master.TableNamespaceManager.insertIntoNSTable(TableNamespaceManager.java:167)
>   at 
> org.apache.hadoop.hbase.master.procedure.CreateNamespaceProcedure.insertIntoNSTable(CreateNamespaceProcedure.java:240)
>   at 
> org.apache.hadoop.hbase.master.procedure.CreateNamespaceProcedure.executeFromState(CreateNamespaceProcedure.java:85)
>   at 
> org.apache.hadoop.hbase.master.procedure.CreateNamespaceProcedure.executeFromState(CreateNamespaceProcedure.java:39)
>   at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:189)
>   at 
> org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:965)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1723)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1462)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1200(ProcedureExecutor.java:78)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:2039)
> Caused by: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call to 
> hadoop-hbnn0005.com/172.20.101.36:16020 failed on local exception: 
> org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=116, 
> waitTime=10763, rpcTimeout=10759
>   at org.apache.hadoop.hbase.ipc.IPCUtil.wrapException(IPCUtil.java:205)
>   at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient.onCallFinished(AbstractRpcClient.java:390)
>   at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$100(AbstractRpcClient.java:95)
>   at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:410)
>   at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:406)
>   at org.apache.hadoop.hbase.ipc.Call.setTimeout(Call.java:96)
>   at 
> org.apache.hadoop.hbase.ipc.RpcConnection$1.run(RpcConnection.java:199)
>   at 
> org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:682)
>   at 
> org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:757)
>   at 
> org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:485)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=116, 
> waitTime=10763, rpcTimeout=10759
>   at 
> org.apache.hadoop.hbase.ipc.RpcConnection$1.run(RpcConnection.java:200)
>   ... 4 more
> 2020-06-19 23:47:49,218 WARN  [ProcExecTimeout] procedure2.ProcedureExecutor: 
> Worker stuck PEWorker-15(pid=171), run time 20mins, 1.262sec
> 2020-06-19 23:47:54,220 

[jira] [Comment Edited] (HBASE-24595) hbase create namespace blocked when all datanodes has restarted

2020-06-23 Thread Yu Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17142683#comment-17142683
 ] 

Yu Wang edited comment on HBASE-24595 at 6/23/20, 7:15 AM:
---

The symptom is similar to HBASE-22665 and the regionserver log contains the 
same error, but 'AbstractFSWAL.shutdown' does not appear in the regionserver jstack.

The regionserver log contains this error:

{code:java}
2020-06-23 14:34:11,943 ERROR [MemStoreFlusher.0] regionserver.MemStoreFlusher: 
Cache flush failed for region hbase:meta,,1
org.apache.hadoop.hbase.exceptions.TimeoutIOException: Failed to get sync 
result after 30 ms for txid=22, WAL system stuck?
at 
org.apache.hadoop.hbase.regionserver.wal.SyncFuture.get(SyncFuture.java:145)
at 
org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.blockOnSync(AbstractFSWAL.java:718)
at 
org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.sync(AsyncFSWAL.java:586)
at 
org.apache.hadoop.hbase.regionserver.HRegion.doSyncOfUnflushedWALChanges(HRegion.java:2674)
at 
org.apache.hadoop.hbase.regionserver.HRegion.internalPrepareFlushCache(HRegion.java:2612)
at 
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2470)
at 
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2444)
at 
org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2334)
at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:612)
at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:581)
at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$1000(MemStoreFlusher.java:68)
at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:361)
at java.lang.Thread.run(Thread.java:748)
2020-06-23 14:34:35,011 WARN  
[RpcServer.priority.FPBQ.Fifo.handler=17,queue=1,port=16020] ipc.RpcServer: Can 
not complete this request in time, drop it: callId: 28 service: ClientService 
methodName: Mutate size: 142 connection: 172.20.100.7:50198 deadline: 
1592893834970 param: region= 
hbase:namespace,,1592548148073.f5c7e71fb5e5cab3b27e52600996f7fd., row=aa1 
connection: 172.20.100.7:50198
{code}
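
The check for 'AbstractFSWAL.shutdown' in the jstack can be repeated with a small script. This is only a minimal sketch: the function name `threads_with_frame` and the `SAMPLE` dump are hypothetical and made up for illustration (the sample is not taken from the attached hregionserver_25649.jstack), and it assumes the usual jstack layout of a quoted thread name followed by `at ...` frame lines.

```python
import re
from typing import List


def threads_with_frame(jstack_text: str, needle: str) -> List[str]:
    """Return the names of threads whose stack frames contain `needle`."""
    matches: List[str] = []
    current = None  # name of the thread whose frames are being read
    for line in jstack_text.splitlines():
        header = re.match(r'^"([^"]+)"', line)
        if header:
            # A quoted name at the start of a line opens a new thread section.
            current = header.group(1)
        elif current and line.lstrip().startswith("at ") and needle in line:
            if current not in matches:
                matches.append(current)
    return matches


# Invented sample in jstack layout, for illustration only.
SAMPLE = '''"MemStoreFlusher.0" #101 daemon prio=5 os_prio=0 waiting on condition
   java.lang.Thread.State: TIMED_WAITING (parking)
    at org.apache.hadoop.hbase.regionserver.wal.SyncFuture.get(SyncFuture.java:145)
    at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.blockOnSync(AbstractFSWAL.java:718)

"main" #1 prio=5 os_prio=0 waiting on condition
    at java.lang.Thread.sleep(Native Method)
'''

if __name__ == "__main__":
    print(threads_with_frame(SAMPLE, "AbstractFSWAL.blockOnSync"))
    print(threads_with_frame(SAMPLE, "AbstractFSWAL.shutdown"))
```

An empty result for 'AbstractFSWAL.shutdown' would match what is described above: the sync timeout is present, but no thread is sitting in the WAL shutdown path.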







was (Author: yuwang0...@gmail.com):
The symptom is similar to HBASE-22665 and the regionserver log contains the 
same error, but 'AbstractFSWAL.shutdown' does not appear in the regionserver jstack.

The regionserver log contains this error:

2020-06-23 14:34:11,943 ERROR [MemStoreFlusher.0] regionserver.MemStoreFlusher: 
Cache flush failed for region hbase:meta,,1
org.apache.hadoop.hbase.exceptions.TimeoutIOException: Failed to get sync 
result after 30 ms for txid=22, WAL system stuck?
at 
org.apache.hadoop.hbase.regionserver.wal.SyncFuture.get(SyncFuture.java:145)
at 
org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.blockOnSync(AbstractFSWAL.java:718)
at 
org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.sync(AsyncFSWAL.java:586)
at 
org.apache.hadoop.hbase.regionserver.HRegion.doSyncOfUnflushedWALChanges(HRegion.java:2674)
at 
org.apache.hadoop.hbase.regionserver.HRegion.internalPrepareFlushCache(HRegion.java:2612)
at 
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2470)
at 
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2444)
at 
org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2334)
at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:612)
at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:581)
at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$1000(MemStoreFlusher.java:68)
at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:361)
at java.lang.Thread.run(Thread.java:748)
2020-06-23 14:34:35,011 WARN  
[RpcServer.priority.FPBQ.Fifo.handler=17,queue=1,port=16020] ipc.RpcServer: Can 
not complete this request in time, drop it: callId: 28 service: ClientService 
methodName: Mutate size: 142 connection: 172.20.100.7:50198 deadline: 
1592893834970 param: region= 
hbase:namespace,,1592548148073.f5c7e71fb5e5cab3b27e52600996f7fd., row=aa1 
connection: 172.20.100.7:50198




> hbase create namespace blocked when all datanodes has restarted
> ---
>
> Key: HBASE-24595
> URL: https://issues.apache.org/jira/browse/HBASE-24595
> Project: HBase
> Issue Type: Bug
> Affects Versions: 2.1.6
> Reporter: Yu Wang
> Priority: Critical
> Attachments: 

[jira] [Comment Edited] (HBASE-24595) hbase create namespace blocked when all datanodes has restarted

2020-06-23 Thread Yu Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17142683#comment-17142683
 ] 

Yu Wang edited comment on HBASE-24595 at 6/23/20, 7:15 AM:
---

The symptom is similar to HBASE-22665 and the regionserver log contains the 
same error, but 'AbstractFSWAL.shutdown' does not appear in the regionserver jstack.

The regionserver log contains this error:

2020-06-23 14:34:11,943 ERROR [MemStoreFlusher.0] regionserver.MemStoreFlusher: 
Cache flush failed for region hbase:meta,,1
org.apache.hadoop.hbase.exceptions.TimeoutIOException: Failed to get sync 
result after 30 ms for txid=22, WAL system stuck?
at 
org.apache.hadoop.hbase.regionserver.wal.SyncFuture.get(SyncFuture.java:145)
at 
org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.blockOnSync(AbstractFSWAL.java:718)
at 
org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.sync(AsyncFSWAL.java:586)
at 
org.apache.hadoop.hbase.regionserver.HRegion.doSyncOfUnflushedWALChanges(HRegion.java:2674)
at 
org.apache.hadoop.hbase.regionserver.HRegion.internalPrepareFlushCache(HRegion.java:2612)
at 
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2470)
at 
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2444)
at 
org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2334)
at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:612)
at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:581)
at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$1000(MemStoreFlusher.java:68)
at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:361)
at java.lang.Thread.run(Thread.java:748)
2020-06-23 14:34:35,011 WARN  
[RpcServer.priority.FPBQ.Fifo.handler=17,queue=1,port=16020] ipc.RpcServer: Can 
not complete this request in time, drop it: callId: 28 service: ClientService 
methodName: Mutate size: 142 connection: 172.20.100.7:50198 deadline: 
1592893834970 param: region= 
hbase:namespace,,1592548148073.f5c7e71fb5e5cab3b27e52600996f7fd., row=aa1 
connection: 172.20.100.7:50198





was (Author: yuwang0...@gmail.com):
The symptom is similar to HBASE-22665 and the regionserver log contains the 
same error, but 'AbstractFSWAL.shutdown' does not appear in the regionserver jstack.

The regionserver log contains this error:
2020-06-23 14:34:11,943 ERROR [MemStoreFlusher.0] regionserver.MemStoreFlusher: 
Cache flush failed for region hbase:meta,,1
org.apache.hadoop.hbase.exceptions.TimeoutIOException: Failed to get sync 
result after 30 ms for txid=22, WAL system stuck?
at 
org.apache.hadoop.hbase.regionserver.wal.SyncFuture.get(SyncFuture.java:145)
at 
org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.blockOnSync(AbstractFSWAL.java:718)
at 
org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.sync(AsyncFSWAL.java:586)
at 
org.apache.hadoop.hbase.regionserver.HRegion.doSyncOfUnflushedWALChanges(HRegion.java:2674)
at 
org.apache.hadoop.hbase.regionserver.HRegion.internalPrepareFlushCache(HRegion.java:2612)
at 
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2470)
at 
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2444)
at 
org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2334)
at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:612)
at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:581)
at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$1000(MemStoreFlusher.java:68)
at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:361)
at java.lang.Thread.run(Thread.java:748)
2020-06-23 14:34:35,011 WARN  
[RpcServer.priority.FPBQ.Fifo.handler=17,queue=1,port=16020] ipc.RpcServer: Can 
not complete this request in time, drop it: callId: 28 service: ClientService 
methodName: Mutate size: 142 connection: 172.20.100.7:50198 deadline: 
1592893834970 param: region= 
hbase:namespace,,1592548148073.f5c7e71fb5e5cab3b27e52600996f7fd., row=aa1 
connection: 172.20.100.7:50198




> hbase create namespace blocked when all datanodes has restarted
> ---
>
> Key: HBASE-24595
> URL: https://issues.apache.org/jira/browse/HBASE-24595
> Project: HBase
> Issue Type: Bug
> Affects Versions: 2.1.6
> Reporter: Yu Wang
> Priority: Critical
> Attachments: create_namespace_1.png, 

[jira] [Comment Edited] (HBASE-24595) hbase create namespace blocked when all datanodes has restarted

2020-06-22 Thread Yu Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17141939#comment-17141939
 ] 

Yu Wang edited comment on HBASE-24595 at 6/22/20, 11:32 AM:


[~zhangduo] could you paste the Jira issue number so that I can look into it?

Thanks.


was (Author: yuwang0...@gmail.com):
[~zhangduo] could you paste the Jira issue number so that I can look into it?

> hbase create namespace blocked when all datanodes has restarted
> ---
>
> Key: HBASE-24595
> URL: https://issues.apache.org/jira/browse/HBASE-24595
> Project: HBase
> Issue Type: Bug
> Affects Versions: 2.1.6
> Reporter: Yu Wang
> Priority: Critical
> Attachments: create_namespace_1.png, create_namespace_2.png, 
> hmaster.log, hmaster.png, hmaster_4569.jstack, hregionserver.log, 
> hregionserver_25649.jstack, procedure.png
>
>
> environment:
> jdk:1.8.0_181
> hadoop:   3.1.1
> hbase:   2.1.6
> In the hbase shell, creating a namespace blocks after all datanodes have 
> been restarted in a Kerberos environment,
> but it succeeds without Kerberos.
>   
> hmaster log:
> 2020-06-19 23:47:48,241 WARN  [PEWorker-15] 
> procedure.CreateNamespaceProcedure: Retriable error trying to create 
> namespace=abcd2 (in state=CREATE_NAMESPACE_INSERT_INTO_NS_TABLE)
> java.net.SocketTimeoutException: callTimeout=120, callDuration=1220061: 
> Call to hadoop-hbnn0005.com/172.20.101.36:16020 failed on local exception: 
> org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=116, 
> waitTime=10763, rpcTimeout=10759 row 'abcd2' on table 'hbase:namespace' at 
> region=hbase:namespace,,1592548148073.f5c7e71fb5e5cab3b27e52600996f7fd., 
> hostname=hadoop-hbnn0005.com,16020,1592580274989, seqNum=162
>   at 
> org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:159)
>   at org.apache.hadoop.hbase.client.HTable.put(HTable.java:542)
>   at 
> org.apache.hadoop.hbase.master.TableNamespaceManager.insertIntoNSTable(TableNamespaceManager.java:167)
>   at 
> org.apache.hadoop.hbase.master.procedure.CreateNamespaceProcedure.insertIntoNSTable(CreateNamespaceProcedure.java:240)
>   at 
> org.apache.hadoop.hbase.master.procedure.CreateNamespaceProcedure.executeFromState(CreateNamespaceProcedure.java:85)
>   at 
> org.apache.hadoop.hbase.master.procedure.CreateNamespaceProcedure.executeFromState(CreateNamespaceProcedure.java:39)
>   at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:189)
>   at 
> org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:965)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1723)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1462)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1200(ProcedureExecutor.java:78)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:2039)
> Caused by: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call to 
> hadoop-hbnn0005.com/172.20.101.36:16020 failed on local exception: 
> org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=116, 
> waitTime=10763, rpcTimeout=10759
>   at org.apache.hadoop.hbase.ipc.IPCUtil.wrapException(IPCUtil.java:205)
>   at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient.onCallFinished(AbstractRpcClient.java:390)
>   at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$100(AbstractRpcClient.java:95)
>   at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:410)
>   at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:406)
>   at org.apache.hadoop.hbase.ipc.Call.setTimeout(Call.java:96)
>   at 
> org.apache.hadoop.hbase.ipc.RpcConnection$1.run(RpcConnection.java:199)
>   at 
> org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:682)
>   at 
> org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:757)
>   at 
> org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:485)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=116, 
> waitTime=10763, rpcTimeout=10759
>   at 
> org.apache.hadoop.hbase.ipc.RpcConnection$1.run(RpcConnection.java:200)
>   ... 4 more
> 2020-06-19 23:47:49,218 WARN  [ProcExecTimeout] procedure2.ProcedureExecutor: 
> Worker stuck PEWorker-15(pid=171), run time 20mins, 1.262sec
> 2020-06-19 23:47:54,220 WARN  [ProcExecTimeout] procedure2.ProcedureExecutor: 
> Worker stuck