Re: RegionTooBusyException: StoreTooBusy
Hi Bryan,

Thanks for your info. In our case we are experiencing the problem in a secondary cluster that is used only as a replica (i.e., it does not receive writes or reads from any other application); it only receives replication data from the primary cluster. For now we have disabled the feature.

Thanks,
Hamado Dene

On Wednesday, March 23, 2022, 12:01:28 CET, Bryan Beaudreault wrote:

Hello,

Unfortunately I don't have good guidance on what to tune this to. What I can say, though, is that this feature will be disabled by default starting with version 2.5.0. Part of the reason for that is that we determined it is too aggressive, but didn't yet have good guidance on a better default. So I would recommend disabling this feature by setting hbase.region.store.parallel.put.limit to 0 (zero) in your hbase-site.xml.

The idea behind the feature is good, so if you'd prefer to leave it enabled, I'd recommend doing some load testing based on your use case and hardware to determine a value that works for you. The general idea is that it tries to avoid painful write contention by limiting the number of parallel write operations to a single region at a time, but how many parallel writers you can withstand will be hardware dependent.
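[Editor's note: for reference, the disable setting described above would look like this in hbase-site.xml. This is a sketch based on the property name given in the thread; a rolling restart of the region servers is typically needed for such a change to take effect.]

```xml
<property>
  <name>hbase.region.store.parallel.put.limit</name>
  <!-- 0 disables the parallel-put limit check entirely -->
  <value>0</value>
</property>
```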
On Wed, Mar 23, 2022 at 6:02 AM Hamado Dene wrote:

> Hello HBase Community,
>
> On our production environment we are experiencing several exceptions such as:
>
> 2022-03-23 10:52:38,843 INFO [AsyncFSWAL-0-hdfs://hadoopcluster/hbase] wal.AbstractFSWAL: Slow sync cost: 120 ms, current pipeline: [DatanodeInfoWithStorage[10.211.3.11:50010,DS-b8181e87-2f63-47d5-a9f2-4d9ca8216d93,DISK], DatanodeInfoWithStorage[10.211.3.12:50010,DS-f32aa630-e63c-4aee-a77b-a04128edee31,DISK]]
>
> 2022-03-23 10:54:15,631 WARN [hconnection-0x63191a3a-shared-pool6-t322] client.AsyncRequestFutureImpl: b7b2a5f50bdaa3794d185ce.:n Above parallelPutToStoreThreadLimit(10)
>   at org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:1083)
>   at org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicBatchOp(RSRpcServices.java:986)
>   at org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:951)
>   at org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2783)
>   at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:42290)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:418)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318)
> on acv-db16-hd.diennea.lan,16020,1648028001827, tracking started Wed Mar 23 10:54:12 CET 2022; NOT retrying, failed=6 -- final attempt!
>
> 2022-03-23 10:54:15,632 ERROR [RpcServer.replication.FPBQ.Fifo.handler=2,queue=0,port=16020] regionserver.ReplicationSink: Unable to accept edit because: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 6 actions: org.apache.hadoop.hbase.RegionTooBusyException: StoreTooBusy,mn1_5276_huserlog,,1647637376109.27fd761a2b7b2a5f50bdaa3794d185ce.:n Above parallelPutToStoreThreadLimit(10)
>   at org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:1083)
>   at org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicBatchOp(RSRpcServices.java:986)
>   at org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:951)
>   at org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2783)
>   at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:42290)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:418)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318)
> : 6 times, servers with issues: acv-db16-hd,16020,1648028001827
>   at org.apache.hadoop.hbase.client.BatchErrors.makeException(BatchErrors.java:54)
>   at org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.getErrors(AsyncRequestFutureImpl.java:1204)
>   at org.apache.hadoop.hbase.client.HTable.batch(HTable.java:453)
>   at org.apache.hadoop.hbase.client.HTable.batch(HTable.java:436)
>   at org.apache.hadoop.hbase.replication.regionserver.ReplicationSink.batch(ReplicationSink.java:421)
>   at org.apache.hadoop.hbase.replication.regionserver.ReplicationSink.replicateEntries(ReplicationSink.java:251)
>   at org.apache.hadoop.hbase.replication.regionserver.Replication.replicateLogEntries(Replication.java:178)
>   at org.apache.hadoop.hbase.regionserver.RSRpcServices.replicateWALEntry(RSRpcServices.java:2311)
>   at org.apache.hadoop.hbase.shaded.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:29752)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:418)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(Rpc [message truncated]
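[Editor's note: the limiting behaviour Bryan describes can be sketched roughly as below. This is a hypothetical illustration of the general idea, not the actual HBase implementation; the class and method names are invented. The real check lives in the region server's put path and throws RegionTooBusyException ("StoreTooBusy ... Above parallelPutToStoreThreadLimit") instead of returning false.]

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of a per-store parallel-put limiter: count in-flight puts and
// reject (rather than queue) any put that would exceed the limit.
class StorePutLimiter {
    private final int limit;                              // 0 means "feature disabled"
    private final AtomicInteger inFlight = new AtomicInteger();

    StorePutLimiter(int limit) {
        this.limit = limit;
    }

    /** Returns true if the put may proceed; caller must call release() afterwards. */
    boolean tryAcquire() {
        if (limit <= 0) {
            return true;                                  // limit of 0 disables the check
        }
        if (inFlight.incrementAndGet() > limit) {
            inFlight.decrementAndGet();                   // roll back the reservation
            return false;                                 // real code would throw RegionTooBusyException here
        }
        return true;
    }

    /** Marks one in-flight put as finished. */
    void release() {
        if (limit > 0) {
            inFlight.decrementAndGet();
        }
    }
}
```

With a limit of 10 (the default seen in the stack traces above), the eleventh concurrent writer to the same store is rejected outright, which is why a replication sink pushing large batches can trip it even on a cluster with no client traffic.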