Re: RegionTooBusyException: StoreTooBusy

2022-03-23 Thread Hamado Dene
Hi Bryan,
Thanks for the info. In our case, we are experiencing the problem on a 
secondary cluster that is used only as a replica (i.e. it does not receive 
any writes or reads from other applications); it only receives 
replication data from the primary cluster. For now we have disabled the feature.
Thanks,
Hamado Dene


Re: RegionTooBusyException: StoreTooBusy

2022-03-23 Thread Bryan Beaudreault
Hello,

Unfortunately I don’t have good guidance on what to tune this to. What I
can say though is that this feature will be disabled by default starting
with version 2.5.0. Part of the reason for that is we determined it is too
aggressive but didn’t yet have good guidance on a better default.

So I would recommend disabling this feature by setting
hbase.region.store.parallel.put.limit to 0 (zero) in your hbase-site.xml.
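For reference, disabling the limiter as suggested would look roughly like the
following in hbase-site.xml (the property name is taken from this thread; as
with other server-side settings, a rolling restart of the region servers is
typically needed for the change to take effect):

```xml
<!-- Disable the per-store parallel put limiter (0 = no limit). -->
<property>
  <name>hbase.region.store.parallel.put.limit</name>
  <value>0</value>
</property>
```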

The idea behind the feature is good, so if you’d prefer to leave it enabled
I’d recommend doing some load testing based on your use case and hardware
to determine a value that works for you. The general idea is that it tries
to avoid painful write contention by limiting the number of parallel write
operations to a single region at a time, but how many parallel writers you
can withstand will be hardware dependent.


RegionTooBusyException: StoreTooBusy

2022-03-23 Thread Hamado Dene
Hello Hbase Community,
On our production environment we are experiencing several Exception such as:
2022-03-23 10:52:38,843 INFO  [AsyncFSWAL-0-hdfs://hadoopcluster/hbase] wal.AbstractFSWAL: Slow sync cost: 120 ms, current pipeline: [DatanodeInfoWithStorage[10.211.3.11:50010,DS-b8181e87-2f63-47d5-a9f2-4d9ca8216d93,DISK], DatanodeInfoWithStorage[10.211.3.12:50010,DS-f32aa630-e63c-4aee-a77b-a04128edee31,DISK]]
2022-03-23 10:54:15,631 WARN  [hconnection-0x63191a3a-shared-pool6-t322] client.AsyncRequestFutureImpl: b7b2a5f50bdaa3794d185ce.:n Above parallelPutToStoreThreadLimit(10)
        at org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:1083)
        at org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicBatchOp(RSRpcServices.java:986)
        at org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:951)
        at org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2783)
        at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:42290)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:418)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318)
on acv-db16-hd.diennea.lan,16020,1648028001827, tracking started Wed Mar 23 10:54:12 CET 2022; NOT retrying, failed=6 -- final attempt!
2022-03-23 10:54:15,632 ERROR [RpcServer.replication.FPBQ.Fifo.handler=2,queue=0,port=16020] regionserver.ReplicationSink: Unable to accept edit because: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 6 actions: org.apache.hadoop.hbase.RegionTooBusyException: StoreTooBusy,mn1_5276_huserlog,,1647637376109.27fd761a2b7b2a5f50bdaa3794d185ce.:n Above parallelPutToStoreThreadLimit(10)
        at org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:1083)
        at org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicBatchOp(RSRpcServices.java:986)
        at org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:951)
        at org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2783)
        at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:42290)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:418)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318)
: 6 times, servers with issues: acv-db16-hd,16020,1648028001827
        at org.apache.hadoop.hbase.client.BatchErrors.makeException(BatchErrors.java:54)
        at org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.getErrors(AsyncRequestFutureImpl.java:1204)
        at org.apache.hadoop.hbase.client.HTable.batch(HTable.java:453)
        at org.apache.hadoop.hbase.client.HTable.batch(HTable.java:436)
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSink.batch(ReplicationSink.java:421)
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSink.replicateEntries(ReplicationSink.java:251)
        at org.apache.hadoop.hbase.replication.regionserver.Replication.replicateLogEntries(Replication.java:178)
        at org.apache.hadoop.hbase.regionserver.RSRpcServices.replicateWALEntry(RSRpcServices.java:2311)
        at org.apache.hadoop.hbase.shaded.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:29752)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:418)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318)
What is the best way to manage this issue? I have seen suggestions online to 
increase the property hbase.region.store.parallel.put.limit, but I can't find 
any reference to it in the HBase documentation. Is this property still valid? 
Can it be set in hbase-site.xml?
Thanks, 
Hamado Dene