[ https://issues.apache.org/jira/browse/HBASE-28221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Viraj Jasani updated HBASE-28221:
---------------------------------
    Description: 
If compaction is temporarily disabled to let the HDFS load stabilize, it is easy to forget to re-enable it. This can delay flushes for up to "hbase.hstore.blockingWaitTime" (90s by default). While flushes do happen eventually once the max blocking time elapses, no cluster can function well with compaction disabled for a significant amount of time. Write requests are also blocked until the region is flushed (90+ sec, by default):
{code:java}
2023-11-27 20:40:52,124 WARN  [,queue=18,port=60020] regionserver.HRegion - Region is too busy due to exceeding memstore size limit.
org.apache.hadoop.hbase.RegionTooBusyException: Above memstore limit, regionName=table1,1699923733811.4fd5e52e2133df1e347f32c646f23ab4., server=server-1,60020,1699421714454, memstoreSize=1073820928, blockingMemStoreSize=1073741824
	at org.apache.hadoop.hbase.regionserver.HRegion.checkResources(HRegion.java:4200)
	at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3264)
	at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3215)
	at org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:967)
	at org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:895)
	at org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2524)
	at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:36812)
	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2432)
	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:311)
	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:291)
{code}
Delayed flush log:
{code:java}
LOG.warn("{} has too many store files({}); delaying flush up to {} ms",
    region.getRegionInfo().getEncodedName(), getStoreFileCount(region), this.blockingWaitTime);
{code}
Suggestion: introduce a regionserver metric (MetricsRegionServerSource) for the number of flushes delayed due to too many store files.

  was:
If compaction is temporarily disabled to let the HDFS load stabilize, it is easy to forget to re-enable it. This can delay flushes for up to "hbase.hstore.blockingWaitTime" (90s by default). While flushes do happen eventually once the max blocking time elapses, no cluster can function well with compaction disabled for a significant amount of time, since write requests are blocked while the region memstore stays at full capacity.

Delayed flush log:
{code:java}
LOG.warn("{} has too many store files({}); delaying flush up to {} ms",
    region.getRegionInfo().getEncodedName(), getStoreFileCount(region), this.blockingWaitTime);
{code}
Suggestion: introduce a regionserver metric (MetricsRegionServerSource) for the number of flushes delayed due to too many store files.


> Introduce regionserver metric for delayed flushes
> -------------------------------------------------
>
>                 Key: HBASE-28221
>                 URL: https://issues.apache.org/jira/browse/HBASE-28221
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Viraj Jasani
>            Priority: Major
>             Fix For: 2.6.0, 3.0.0-beta-1
>
>


-- 
This message was sent by Atlassian Jira
(v8.20.10#820010)
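The suggested metric could be sketched as follows. This is a minimal, illustrative sketch only: the class and method names (DelayedFlushMetrics, incrDelayedFlushCount) are hypothetical and are not the actual MetricsRegionServerSource API; a real patch would add a counter to the existing regionserver metrics source and increment it at the point where the "has too many store files; delaying flush" warning above is logged.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of the proposed metric: a monotonically increasing
// counter bumped each time a flush is delayed because the region has too
// many store files. All names here are illustrative, not HBase API.
public class DelayedFlushMetrics {
    private final AtomicLong delayedFlushCount = new AtomicLong();

    // Would be called from the flush handler right before logging the
    // "has too many store files({}); delaying flush up to {} ms" warning.
    public void incrDelayedFlushCount() {
        delayedFlushCount.incrementAndGet();
    }

    // Exposed to the metrics system so operators can alert on a
    // steadily climbing value (e.g. compaction left disabled).
    public long getDelayedFlushCount() {
        return delayedFlushCount.get();
    }

    public static void main(String[] args) {
        DelayedFlushMetrics metrics = new DelayedFlushMetrics();
        // Simulate three flushes hitting the blocking store-file limit.
        for (int i = 0; i < 3; i++) {
            metrics.incrDelayedFlushCount();
        }
        System.out.println(metrics.getDelayedFlushCount()); // prints 3
    }
}
```

An AtomicLong is used here so the counter is safe to bump from concurrent flush handler threads without locking, mirroring how regionserver counters are typically monotonic and aggregated by the metrics layer.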