[ 
https://issues.apache.org/jira/browse/HBASE-28545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Klay updated HBASE-28545:
-------------------------
    Description: 
Hi, I am testing the reliability of HBase to see whether it works normally or 
properly handles exceptions under different configurations.

When +hbase.table.max.rowsize+ is set to a relatively small value, creating an 
empty table and then truncating it can get a PEWorker thread stuck in an 
infinite retry loop that generates massive logs.
h1. Reproduce

This bug can be reproduced deterministically with the following steps:

1. Set hbase.table.max.rowsize to a relatively small value: 745

hbase-site.xml
{code:java}
    <property>
        <name>hbase.table.max.rowsize</name>
        <value>745</value>
    </property>
{code}
2. Start up an HBase 2.5.8 cluster (1 HMaster, 2 RegionServers) and execute the 
following commands from the hbase shell
{code:java}
create 'uuid0f06811e5d11431fa9d4a543705b8747', {NAME => 
'uuidad7c09870a6346598bbca56cb97c4c67', VERSIONS => 4, COMPRESSION => 'GZ', 
BLOOMFILTER => 'NONE', IN_MEMORY => 'false'}, {NAME => 
'uuidd2c5ad4df7b844afb4c2ac92b224dedd', VERSIONS => 4, COMPRESSION => 'NONE', 
BLOOMFILTER => 'NONE', IN_MEMORY => 'true'}, {NAME => 
'uuid6e21f5f492eb4b9ba0e431b9a0666dae', VERSIONS => 4, COMPRESSION => 'NONE', 
BLOOMFILTER => 'NONE', IN_MEMORY => 'true'}, {NAME => 
'uuidfb110629fba44915956e41609b68ebe9', VERSIONS => 2, COMPRESSION => 'NONE', 
BLOOMFILTER => 'ROW', IN_MEMORY => 'false'}, {NAME => 
'uuid4943640bf1f6425582ac9b3572d95964', VERSIONS => 4, COMPRESSION => 'GZ', 
BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'false'}

truncate 'uuid0f06811e5d11431fa9d4a543705b8747' {code}
The *truncate* command times out after 10 minutes. However, the PEWorker thread 
remains stuck in an *infinite retry* loop and keeps generating massive WARN 
messages in the HMaster logs
{code:java}
➜  log git:(main) ✗ ls
hbase--master-09e0b9e25e7c.log     hbase--master-09e0b9e25e7c.log.12  
hbase--master-09e0b9e25e7c.log.16  hbase--master-09e0b9e25e7c.log.2   
hbase--master-09e0b9e25e7c.log.5  hbase--master-09e0b9e25e7c.log.9   
SecurityAuth.audit
hbase--master-09e0b9e25e7c.log.1   hbase--master-09e0b9e25e7c.log.13  
hbase--master-09e0b9e25e7c.log.17  hbase--master-09e0b9e25e7c.log.20  
hbase--master-09e0b9e25e7c.log.6  hbase--master-09e0b9e25e7c.out
hbase--master-09e0b9e25e7c.log.10  hbase--master-09e0b9e25e7c.log.14  
hbase--master-09e0b9e25e7c.log.18  hbase--master-09e0b9e25e7c.log.3   
hbase--master-09e0b9e25e7c.log.7  hbase--zookeeper-09e0b9e25e7c.log
hbase--master-09e0b9e25e7c.log.11  hbase--master-09e0b9e25e7c.log.15  
hbase--master-09e0b9e25e7c.log.19  hbase--master-09e0b9e25e7c.log.4   
hbase--master-09e0b9e25e7c.log.8  hbase--zookeeper-09e0b9e25e7c.out {code}
The logs contain the following repeated exception
{code:java}
2024-04-23T02:52:16,144 WARN  [PEWorker-14] procedure.TruncateTableProcedure: 
Retriable error trying to truncate table=uuid0f06811e5d11431fa9d4a543705b8747 
state=TRUNCATE_TABLE_CLEAR_FS_LAYOUT
org.apache.hadoop.hbase.regionserver.RowTooBigException: 
org.apache.hadoop.hbase.regionserver.RowTooBigException: Max row size allowed: 
745, but the row is bigger than that, the row info: 
uuid0f06811e5d11431fa9d4a543705b8747,,1713840709844.c03450373b32cba95a2dd3139f0a2201./info:state/1713840716052/Put/vlen=6/seqid=16,
 already have process row cells = 6, it belong to region = 
hbase:meta,,1.1588230740
        at 
org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:653)
        at 
org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:145)
        at 
org.apache.hadoop.hbase.regionserver.RegionScannerImpl.populateResult(RegionScannerImpl.java:342)
        at 
org.apache.hadoop.hbase.regionserver.RegionScannerImpl.nextInternal(RegionScannerImpl.java:513)
        at 
org.apache.hadoop.hbase.regionserver.RegionScannerImpl.nextRaw(RegionScannerImpl.java:278)
        at 
org.apache.hadoop.hbase.regionserver.RegionScannerImpl.next(RegionScannerImpl.java:256)
        at 
org.apache.hadoop.hbase.regionserver.RegionScannerImpl.next(RegionScannerImpl.java:243)
        at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2654)
        at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2577)
        at 
org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:45002)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:415)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
        at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:102)
        at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:82)
        at sun.reflect.GeneratedConstructorAccessor30.newInstance(Unknown Source) ~[?:?]
        at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
 ~[?:1.8.0_362]
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423) 
~[?:1.8.0_362]
        at 
org.apache.hadoop.hbase.ipc.RemoteWithExtrasException.instantiateException(RemoteWithExtrasException.java:110)
 ~[hbase-client-2.5.8.jar:2.5.8]
        at 
org.apache.hadoop.hbase.ipc.RemoteWithExtrasException.unwrapRemoteException(RemoteWithExtrasException.java:100)
 ~[hbase-client-2.5.8.jar:2.5.8]
        at 
org.apache.hadoop.hbase.protobuf.ProtobufUtil.makeIOExceptionOfException(ProtobufUtil.java:280)
 ~[hbase-client-2.5.8.jar:2.5.8]
        at 
org.apache.hadoop.hbase.protobuf.ProtobufUtil.handleRemoteException(ProtobufUtil.java:265)
 ~[hbase-client-2.5.8.jar:2.5.8]
        at 
org.apache.hadoop.hbase.client.RegionServerCallable.call(RegionServerCallable.java:126)
 ~[hbase-client-2.5.8.jar:2.5.8]
        at 
org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:103)
 ~[hbase-client-2.5.8.jar:2.5.8]
        at org.apache.hadoop.hbase.client.HTable.get(HTable.java:390) 
~[hbase-client-2.5.8.jar:2.5.8]
        at org.apache.hadoop.hbase.client.HTable.lambda$get$0(HTable.java:363) 
~[hbase-client-2.5.8.jar:2.5.8]
        at org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:216) 
~[hbase-common-2.5.8.jar:2.5.8]
        at org.apache.hadoop.hbase.client.HTable.get(HTable.java:363) 
~[hbase-client-2.5.8.jar:2.5.8]
        at 
org.apache.hadoop.hbase.MetaTableAccessor.get(MetaTableAccessor.java:248) 
~[hbase-client-2.5.8.jar:2.5.8]
        at 
org.apache.hadoop.hbase.MetaTableAccessor.getRegionResult(MetaTableAccessor.java:344)
 ~[hbase-client-2.5.8.jar:2.5.8]
        at 
org.apache.hadoop.hbase.MetaTableAccessor.getMergeRegions(MetaTableAccessor.java:374)
 ~[hbase-client-2.5.8.jar:2.5.8]
        at 
org.apache.hadoop.hbase.master.procedure.DeleteTableProcedure.deleteFromFs(DeleteTableProcedure.java:283)
 ~[hbase-server-2.5.8.jar:2.5.8]
        at 
org.apache.hadoop.hbase.master.procedure.TruncateTableProcedure.executeFromState(TruncateTableProcedure.java:103)
 ~[hbase-server-2.5.8.jar:2.5.8]
        at 
org.apache.hadoop.hbase.master.procedure.TruncateTableProcedure.executeFromState(TruncateTableProcedure.java:44)
 ~[hbase-server-2.5.8.jar:2.5.8]
        at 
org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:188)
 ~[hbase-procedure-2.5.8.jar:2.5.8]
        at 
org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:942) 
~[hbase-procedure-2.5.8.jar:2.5.8]
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1765)
 ~[hbase-procedure-2.5.8.jar:2.5.8]
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1443)
 ~[hbase-procedure-2.5.8.jar:2.5.8]
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1000(ProcedureExecutor.java:77)
 ~[hbase-procedure-2.5.8.jar:2.5.8]
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.runProcedure(ProcedureExecutor.java:2091)
 ~[hbase-procedure-2.5.8.jar:2.5.8]
        at org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:216) 
~[hbase-common-2.5.8.jar:2.5.8]
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:2118)
 ~[hbase-procedure-2.5.8.jar:2.5.8]
Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException: 
org.apache.hadoop.hbase.regionserver.RowTooBigException: Max row size allowed: 
745, but the row is bigger than that, the row info: 
uuid0f06811e5d11431fa9d4a543705b8747,,1713840709844.c03450373b32cba95a2dd3139f0a2201./info:state/1713840716052/Put/vlen=6/seqid=16,
 already have process row cells = 6, it belong to region = 
hbase:meta,,1.1588230740
        at 
org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:653)
        at 
org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:145)
        at 
org.apache.hadoop.hbase.regionserver.RegionScannerImpl.populateResult(RegionScannerImpl.java:342)
        at 
org.apache.hadoop.hbase.regionserver.RegionScannerImpl.nextInternal(RegionScannerImpl.java:513)
        at 
org.apache.hadoop.hbase.regionserver.RegionScannerImpl.nextRaw(RegionScannerImpl.java:278)
        at 
org.apache.hadoop.hbase.regionserver.RegionScannerImpl.next(RegionScannerImpl.java:256)
        at 
org.apache.hadoop.hbase.regionserver.RegionScannerImpl.next(RegionScannerImpl.java:243)
        at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2654)
        at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2577)
        at 
org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:45002)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:415)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
        at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:102)
        at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:82)
        at org.apache.hadoop.hbase.ipc.AbstractRpcClient.onCallFinished(AbstractRpcClient.java:388) ~[hbase-client-2.5.8.jar:2.5.8]
        at 
org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$100(AbstractRpcClient.java:92)
 ~[hbase-client-2.5.8.jar:2.5.8]
        at 
org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:425) 
~[hbase-client-2.5.8.jar:2.5.8]
        at 
org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:420) 
~[hbase-client-2.5.8.jar:2.5.8]
        at org.apache.hadoop.hbase.ipc.Call.callComplete(Call.java:114) 
~[hbase-client-2.5.8.jar:2.5.8]
        at org.apache.hadoop.hbase.ipc.Call.setException(Call.java:129) 
~[hbase-client-2.5.8.jar:2.5.8]
        at 
org.apache.hadoop.hbase.ipc.NettyRpcDuplexHandler.readResponse(NettyRpcDuplexHandler.java:199)
 ~[hbase-client-2.5.8.jar:2.5.8]
        at 
org.apache.hadoop.hbase.ipc.NettyRpcDuplexHandler.channelRead(NettyRpcDuplexHandler.java:220)
 ~[hbase-client-2.5.8.jar:2.5.8]
        at 
org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442)
 ~[hbase-shaded-netty-4.1.5.jar:?]
        at 
org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
 ~[hbase-shaded-netty-4.1.5.jar:?]
        at 
org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
 ~[hbase-shaded-netty-4.1.5.jar:?]
        at 
org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:346)
 ~[hbase-shaded-netty-4.1.5.jar:?]
        at 
org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:318)
 ~[hbase-shaded-netty-4.1.5.jar:?]
        at 
org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
 ~[hbase-shaded-netty-4.1.5.jar:?]
        at 
org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
 ~[hbase-shaded-netty-4.1.5.jar:?]
        at 
org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
 ~[hbase-shaded-netty-4.1.5.jar:?]
        at 
org.apache.hbase.thirdparty.io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286)
 ~[hbase-shaded-netty-4.1.5.jar:?]
        at 
org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442)
 ~[hbase-shaded-netty-4.1.5.jar:?]
        at 
org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
 ~[hbase-shaded-netty-4.1.5.jar:?]
        at 
org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
 ~[hbase-shaded-netty-4.1.5.jar:?]
        at 
org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
 ~[hbase-shaded-netty-4.1.5.jar:?]
        at 
org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
 ~[hbase-shaded-netty-4.1.5.jar:?]
        at 
org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
 ~[hbase-shaded-netty-4.1.5.jar:?]
        at 
org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
 ~[hbase-shaded-netty-4.1.5.jar:?]
        at 
org.apache.hbase.thirdparty.io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:800)
 ~[hbase-shaded-netty-4.1.5.jar:?]
        at 
org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:509)
 ~[hbase-shaded-netty-4.1.5.jar:?]
        at 
org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:407)
 ~[hbase-shaded-netty-4.1.5.jar:?]
        at 
org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
 ~[hbase-shaded-netty-4.1.5.jar:?]
        at 
org.apache.hbase.thirdparty.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
 ~[hbase-shaded-netty-4.1.5.jar:?]
        at 
org.apache.hbase.thirdparty.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
 ~[hbase-shaded-netty-4.1.5.jar:?]
        at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_362]
{code}
I have attached one example HMaster log.
h1. Root Cause

If the table name and column family names in the create table command are 
shortened, the truncate works normally. This indicates that the symptom can be 
triggered as long as there is a table whose metadata row in hbase:meta is 
larger than the max row size.

E.g., even with the default max row size (1 GB), creating a table with very 
large metadata (many column families with long names) might still cause a 
similar infinite loop.
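
As a quick way to check this hypothesis, below is a minimal client-side sketch (hypothetical diagnostic code I wrote for this report, not part of HBase) that approximates the size of a table's region row in hbase:meta, so it can be compared against hbase.table.max.rowsize. Note that while the small limit is still configured, this read may itself fail with RowTooBigException.
{code:java}
// Hypothetical diagnostic: approximate the size of a table's region row in
// hbase:meta. Assumes a reachable cluster and the table name from the repro.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class MetaRowSizeCheck {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table meta = conn.getTable(TableName.valueOf("hbase:meta"))) {
      // Region rows in hbase:meta start with "<tablename>,".
      Scan scan = new Scan().setRowPrefixFilter(
        Bytes.toBytes("uuid0f06811e5d11431fa9d4a543705b8747,"));
      long totalBytes = 0;
      try (ResultScanner scanner = meta.getScanner(scan)) {
        for (Result result : scanner) {
          for (Cell cell : result.rawCells()) {
            // Rough per-cell size: key components plus value length.
            totalBytes += cell.getRowLength() + cell.getFamilyLength()
              + cell.getQualifierLength() + cell.getValueLength();
          }
        }
      }
      System.out.println("approximate meta row size in bytes: " + totalBytes);
    }
  }
}
{code}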

This usually will not happen with the default configuration. However, users are 
more likely to run into this exception if they set the max row size to a 
smaller value and create a table with large metadata.
h1. Fix

A better error handler might help:
 * This infinite loop could be avoided by adding a retry limit so that the 
truncate procedure gives up if it keeps running into RowTooBigException (see 
the sketch below).
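
For illustration, here is a minimal sketch of such a guard against TruncateTableProcedure.executeFromState. The retry counter, limit, and failure handling below are illustrative additions, not existing HBase code; and if I read the class hierarchy correctly, RowTooBigException is a DoNotRetryIOException, so it could arguably be treated as fatal right away rather than retried.
{code:java}
// Hypothetical sketch against TruncateTableProcedure (hbase-server); not a real patch.
// The current TRUNCATE_TABLE_CLEAR_FS_LAYOUT handling catches IOException, logs
// "Retriable error trying to truncate table=..." and retries without bound.
private int clearFsLayoutRetries = 0;                       // hypothetical new field
private static final int MAX_CLEAR_FS_LAYOUT_RETRIES = 10;  // hypothetical limit

// ...inside executeFromState(MasterProcedureEnv env, TruncateTableState state),
// in the TRUNCATE_TABLE_CLEAR_FS_LAYOUT case:
try {
  // existing filesystem-layout cleanup (the call that fails in the attached log)
  DeleteTableProcedure.deleteFromFs(env, getTableName(), regions, true);
} catch (RowTooBigException e) {
  // A too-big meta row will not shrink on retry: fail the procedure so the
  // client sees the error instead of an endless WARN loop.
  setFailure("truncate-table", e);
  return Flow.NO_MORE_STATE;
} catch (IOException e) {
  if (++clearFsLayoutRetries > MAX_CLEAR_FS_LAYOUT_RETRIES) {
    setFailure("truncate-table", e);
    return Flow.NO_MORE_STATE;
  }
  LOG.warn("Retriable error trying to truncate table={} state={}", getTableName(), state, e);
}
{code}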

I am still examining the exact root cause. I could try to provide a patch. Any 
suggestions are welcome. Thank you.

> Worker thread gets stuck in an infinite retry when TRUNCATE command runs into 
> RowTooBigException
> ------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-28545
>                 URL: https://issues.apache.org/jira/browse/HBASE-28545
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 2.4.17, 2.5.8
>            Reporter: Klay
>            Priority: Major
>         Attachments: hbase-master.log
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
