[ 
https://issues.apache.org/jira/browse/HBASE-29797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18048236#comment-18048236
 ] 

Duo Zhang commented on HBASE-29797:
-----------------------------------

OK, we suspended data03 for about one minute, from 11:02:23 to 11:03:28.

{noformat}
2025-12-29T11:02:23,549 INFO  [regionserver/data03:16020-longCompactions-0] compress.CodecPool: Got brand-new compressor [.gz]
2025-12-29T11:03:28,337 INFO  [RS_CLOSE_REGION-regionserver/data03:16020-0] handler.UnassignRegionHandler: Close 4aff39d07b0e7ed64e262b8fc9f14a63
2025-12-29T11:03:28,337 INFO  [RS_CLOSE_REGION-regionserver/data03:16020-0] regionserver.HRegion: Closing region IntegrationTestBigLinkedList,j\x1D\x18LZ\xC6\xF0\x1A\xF1B\xEE\xE4\x84\xCB\xB4\x9E,1766973008381.4aff39d07b0e7ed64e262b8fc9f14a63.
2025-12-29T11:03:28,335 WARN  [regionserver/data03:16020] util.Sleeper: We slept 67234ms instead of 3000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2025-12-29T11:03:28,338 INFO  [RS_CLOSE_REGION-regionserver/data03:16020-1] handler.UnassignRegionHandler: Close aea23dbdd1f955e04dab89e5327ae553
2025-12-29T11:03:28,338 INFO  [RS_CLOSE_REGION-regionserver/data03:16020-1] regionserver.HRegion: Closing region IntegrationTestBigLinkedList,\xFC\xF1Tc;,1766970301976.aea23dbdd1f955e04dab89e5327ae553.
2025-12-29T11:03:28,338 INFO  [RS_CLOSE_REGION-regionserver/data03:16020-2] handler.UnassignRegionHandler: Close 53198b829a06da97ac9c27db533be397
2025-12-29T11:03:28,338 INFO  [RS_CLOSE_REGION-regionserver/data03:16020-2] regionserver.HRegion: Closing region IntegrationTestBigLinkedList,,1766969822377.53198b829a06da97ac9c27db533be397.
{noformat}

But after resuming, data03 started to process some close region requests.
I think this is the root cause, as it seems there is no fencing when
writing the max sequence id file...
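
To make the fencing idea concrete, here is a minimal sketch in Java. It is
not the HBase implementation: FencedSeqIdFile, assign(), writeUnfenced() and
writeFenced() are hypothetical names, and the "file" is just an in-memory
field. It also assumes one possible interleaving (a resumed server that no
longer owns the region records 1212631 before the current owner tries to
record 1212630), which has not been confirmed from the logs yet.

```java
import java.io.IOException;

/** Hypothetical in-memory stand-in for a region's max sequence id file. */
class FencedSeqIdFile {
    // Epoch of the server currently allowed to write (assumed concept, bumped on each assignment).
    private long ownerEpoch = 0;
    // Last recorded max sequence id, -1 when nothing has been written.
    private long maxSeqId = -1;

    /** Simulates assigning the region to a new server; older epochs become stale. */
    synchronized long assign() {
        return ++ownerEpoch;
    }

    /** Monotonicity check only, mirroring the exception message reported in this issue. */
    synchronized void writeUnfenced(long newMaxSeqId) throws IOException {
        if (newMaxSeqId < maxSeqId) {
            throw new IOException("The new max sequence id " + newMaxSeqId
                + " is less than the old max sequence id " + maxSeqId);
        }
        maxSeqId = newMaxSeqId;
    }

    /** Rejects writers that no longer own the region before touching the stored value. */
    synchronized void writeFenced(long writerEpoch, long newMaxSeqId) throws IOException {
        if (writerEpoch != ownerEpoch) {
            throw new IOException("Stale writer epoch " + writerEpoch
                + ", current owner epoch is " + ownerEpoch);
        }
        writeUnfenced(newMaxSeqId);
    }

    synchronized long getMaxSeqId() {
        return maxSeqId;
    }
}
```

With writeUnfenced(), a stale write of 1212631 followed by the owner's write
of 1212630 fails on the owner's side, which matches the abort seen on data02.
With writeFenced(), the stale writer is the one that gets the IOException and
the owner's write goes through.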

Let me check the related logic...

> RegionServer aborted because of invalid max sequence id
> -------------------------------------------------------
>
>                 Key: HBASE-29797
>                 URL: https://issues.apache.org/jira/browse/HBASE-29797
>             Project: HBase
>          Issue Type: Bug
>          Components: Region Assignment
>            Reporter: Duo Zhang
>            Priority: Critical
>
> {noformat}
> 2025-12-29T11:03:32,429 WARN  [RS_CLOSE_REGION-regionserver/data02:16020-0] handler.UnassignRegionHandler: Fatal error occurred while closing region 8d60369be1061570a2f6e47a1af7a797, aborting...
> java.io.IOException: The new max sequence id 1212630 is less than the old max sequence id 1212631
>         at org.apache.hadoop.hbase.wal.WALSplitUtil.writeRegionSequenceIdFile(WALSplitUtil.java:402)
>         at org.apache.hadoop.hbase.regionserver.HRegion.writeRegionCloseMarker(HRegion.java:1290)
>         at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1950)
>         at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1675)
>         at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1630)
>         at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1613)
>         at org.apache.hadoop.hbase.regionserver.handler.UnassignRegionHandler.process(UnassignRegionHandler.java:139)
>         at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
>         at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
>         at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
>         at java.base/java.lang.Thread.run(Thread.java:840)
> 2025-12-29T11:03:32,433 ERROR [RS_CLOSE_REGION-regionserver/data02:16020-0] regionserver.HRegionServer: ***** ABORTING region server data02,16020,1766977119966: Failed to close region 8d60369be1061570a2f6e47a1af7a797 and can not recover *****
> java.io.IOException: The new max sequence id 1212630 is less than the old max sequence id 1212631
>         at org.apache.hadoop.hbase.wal.WALSplitUtil.writeRegionSequenceIdFile(WALSplitUtil.java:402)
>         at org.apache.hadoop.hbase.regionserver.HRegion.writeRegionCloseMarker(HRegion.java:1290)
>         at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1950)
>         at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1675)
>         at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1630)
>         at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1613)
>         at org.apache.hadoop.hbase.regionserver.handler.UnassignRegionHandler.process(UnassignRegionHandler.java:139)
>         at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
>         at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
>         at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
>         at java.base/java.lang.Thread.run(Thread.java:840)
> {noformat}



