[
https://issues.apache.org/jira/browse/HBASE-29797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18048236#comment-18048236
]
Duo Zhang commented on HBASE-29797:
-----------------------------------
OK, we suspended data03 for about 1 minute, from 11:02:23 to 11:03:28:
{noformat}
2025-12-29T11:02:23,549 INFO [regionserver/data03:16020-longCompactions-0] compress.CodecPool: Got brand-new compressor [.gz]
2025-12-29T11:03:28,337 INFO [RS_CLOSE_REGION-regionserver/data03:16020-0] handler.UnassignRegionHandler: Close 4aff39d07b0e7ed64e262b8fc9f14a63
2025-12-29T11:03:28,337 INFO [RS_CLOSE_REGION-regionserver/data03:16020-0] regionserver.HRegion: Closing region IntegrationTestBigLinkedList,j\x1D\x18LZ\xC6\xF0\x1A\xF1B\xEE\xE4\x84\xCB\xB4\x9E,1766973008381.4aff39d07b0e7ed64e262b8fc9f14a63.
2025-12-29T11:03:28,335 WARN [regionserver/data03:16020] util.Sleeper: We slept 67234ms instead of 3000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2025-12-29T11:03:28,338 INFO [RS_CLOSE_REGION-regionserver/data03:16020-1] handler.UnassignRegionHandler: Close aea23dbdd1f955e04dab89e5327ae553
2025-12-29T11:03:28,338 INFO [RS_CLOSE_REGION-regionserver/data03:16020-1] regionserver.HRegion: Closing region IntegrationTestBigLinkedList,\xFC\xF1Tc;,1766970301976.aea23dbdd1f955e04dab89e5327ae553.
2025-12-29T11:03:28,338 INFO [RS_CLOSE_REGION-regionserver/data03:16020-2] handler.UnassignRegionHandler: Close 53198b829a06da97ac9c27db533be397
2025-12-29T11:03:28,338 INFO [RS_CLOSE_REGION-regionserver/data03:16020-2] regionserver.HRegion: Closing region IntegrationTestBigLinkedList,,1766969822377.53198b829a06da97ac9c27db533be397.
{noformat}
But after resuming, data03 started processing some close region requests.
I think this is the root cause, as there seems to be no fencing when
writing the max sequence id file...
Let me check the related logic...
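
To make the suspected race concrete, here is a minimal sketch of one
interleaving that would produce the exception quoted below, and of how a
fencing check could reject the stale writer. Everything in it is an
assumption for illustration: SeqIdStore, ownerEpoch, writeUnfenced and
writeFenced are hypothetical names, not HBase APIs, and the actual order of
events in this issue has not been confirmed yet. Only the exception message
and the two sequence ids are taken from the log.

```java
import java.io.IOException;

public class SeqIdFencingSketch {

  /** Hypothetical in-memory stand-in for a region's max sequence id file. */
  static final class SeqIdStore {
    private long ownerEpoch = 0; // hypothetical fencing token, bumped on each reassignment
    private long maxSeqId = -1;

    /** Reassign the region; the returned epoch is the new owner's fencing token. */
    synchronized long assignToNewOwner() {
      return ++ownerEpoch;
    }

    /** Unfenced write: only the monotonic check, any caller may write. */
    synchronized void writeUnfenced(long newMaxSeqId) throws IOException {
      if (newMaxSeqId < maxSeqId) {
        throw new IOException("The new max sequence id " + newMaxSeqId
          + " is less than the old max sequence id " + maxSeqId);
      }
      maxSeqId = newMaxSeqId;
    }

    /** Fenced write: reject callers whose epoch is no longer current. */
    synchronized void writeFenced(long epoch, long newMaxSeqId) throws IOException {
      if (epoch != ownerEpoch) {
        throw new IOException("Stale owner epoch " + epoch
          + ", current epoch is " + ownerEpoch);
      }
      writeUnfenced(newMaxSeqId);
    }
  }

  /** Without fencing, the stale write is accepted and the current owner's close fails. */
  public static boolean currentOwnerFailsWithoutFencing() {
    SeqIdStore store = new SeqIdStore();
    try {
      store.writeUnfenced(1212631L); // assumed: resumed server writes its own value
    } catch (IOException e) {
      return false; // not expected, the store was empty
    }
    try {
      store.writeUnfenced(1212630L); // current owner closes the region
      return false;
    } catch (IOException e) {
      return true; // same message as in the abort log
    }
  }

  /** With fencing, the stale write is rejected and the current owner's close succeeds. */
  public static boolean staleOwnerRejectedWithFencing() {
    SeqIdStore store = new SeqIdStore();
    long staleEpoch = store.assignToNewOwner();   // owner before the pause
    long currentEpoch = store.assignToNewOwner(); // owner after reassignment
    boolean staleRejected = false;
    try {
      store.writeFenced(staleEpoch, 1212631L);
    } catch (IOException e) {
      staleRejected = true;
    }
    try {
      store.writeFenced(currentEpoch, 1212630L);
    } catch (IOException e) {
      return false; // current owner must not be blocked
    }
    return staleRejected;
  }

  public static void main(String[] args) {
    System.out.println("unfenced, current owner fails: " + currentOwnerFailsWithoutFencing());
    System.out.println("fenced, stale owner rejected: " + staleOwnerRejectedWithFencing());
  }
}
```

Both helper methods return true in this sketch. A real fix would have to
enforce the check at the file system level rather than in memory, which is
the part that still needs investigation.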
> RegionServer aborted because of invalid max sequence id
> -------------------------------------------------------
>
> Key: HBASE-29797
> URL: https://issues.apache.org/jira/browse/HBASE-29797
> Project: HBase
> Issue Type: Bug
> Components: Region Assignment
> Reporter: Duo Zhang
> Priority: Critical
>
> {noformat}
> 2025-12-29T11:03:32,429 WARN [RS_CLOSE_REGION-regionserver/data02:16020-0] handler.UnassignRegionHandler: Fatal error occurred while closing region 8d60369be1061570a2f6e47a1af7a797, aborting...
> java.io.IOException: The new max sequence id 1212630 is less than the old max sequence id 1212631
> at org.apache.hadoop.hbase.wal.WALSplitUtil.writeRegionSequenceIdFile(WALSplitUtil.java:402)
> at org.apache.hadoop.hbase.regionserver.HRegion.writeRegionCloseMarker(HRegion.java:1290)
> at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1950)
> at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1675)
> at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1630)
> at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1613)
> at org.apache.hadoop.hbase.regionserver.handler.UnassignRegionHandler.process(UnassignRegionHandler.java:139)
> at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
> at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
> at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
> at java.base/java.lang.Thread.run(Thread.java:840)
> 2025-12-29T11:03:32,433 ERROR [RS_CLOSE_REGION-regionserver/data02:16020-0] regionserver.HRegionServer: ***** ABORTING region server data02,16020,1766977119966: Failed to close region 8d60369be1061570a2f6e47a1af7a797 and can not recover *****
> java.io.IOException: The new max sequence id 1212630 is less than the old max sequence id 1212631
> at org.apache.hadoop.hbase.wal.WALSplitUtil.writeRegionSequenceIdFile(WALSplitUtil.java:402)
> at org.apache.hadoop.hbase.regionserver.HRegion.writeRegionCloseMarker(HRegion.java:1290)
> at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1950)
> at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1675)
> at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1630)
> at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1613)
> at org.apache.hadoop.hbase.regionserver.handler.UnassignRegionHandler.process(UnassignRegionHandler.java:139)
> at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
> at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
> at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
> at java.base/java.lang.Thread.run(Thread.java:840)
> {noformat}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)