[
https://issues.apache.org/jira/browse/HBASE-29802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated HBASE-29802:
-----------------------------------
Labels: pull-request-available (was: )
> NPE when shutting down mini cluster cause tests hang
> ----------------------------------------------------
>
> Key: HBASE-29802
> URL: https://issues.apache.org/jira/browse/HBASE-29802
> Project: HBase
> Issue Type: Bug
> Components: mapreduce, test
> Reporter: Duo Zhang
> Priority: Major
> Labels: pull-request-available
>
> Found this in a timeout run for TestSecureExportSnapshot
> {noformat}
> 2026-01-02T22:53:58,988 INFO [RegionServerTracker-0] master.HMaster(3360):
> ***** STOPPING master '59fad5da8b71,38027,1767393931926' *****
> 2026-01-02T22:53:58,988 INFO [RegionServerTracker-0] master.HMaster(3362):
> STOPPED: Cluster shutdown set; onlineServer=0
> 2026-01-02T22:53:58,988 INFO [M:0;59fad5da8b71:38027]
> hbase.HBaseServerBase(457): Close async cluster connection
> 2026-01-02T22:53:58,988 INFO [M:0;59fad5da8b71:38027]
> hbase.HBaseServerBase(440): Shutdown chores and chore service
> 2026-01-02T22:53:58,988 DEBUG [M:0;59fad5da8b71:38027]
> cleaner.LogCleaner(212): Cancelling LogCleaner
> 2026-01-02T22:53:58,989 DEBUG [M:0;59fad5da8b71:38027]
> cleaner.HFileCleaner(335): Stopping file delete threads
> 2026-01-02T22:53:58,989 WARN [OldWALsCleaner-0] cleaner.LogCleaner(200):
> Interrupted while cleaning old WALs, will try to clean it next round. Exiting.
> 2026-01-02T22:53:58,989 DEBUG
> [master/59fad5da8b71:0:becomeActiveMaster-HFileCleaner.large.0-1767393935168]
> cleaner.HFileCleaner(306): Exit
> Thread[master/59fad5da8b71:0:becomeActiveMaster-HFileCleaner.large.0-1767393935168,5,FailOnTimeoutGroup]
> 2026-01-02T22:53:58,989 DEBUG
> [master/59fad5da8b71:0:becomeActiveMaster-HFileCleaner.small.0-1767393935174]
> cleaner.HFileCleaner(306): Exit
> Thread[master/59fad5da8b71:0:becomeActiveMaster-HFileCleaner.small.0-1767393935174,5,FailOnTimeoutGroup]
> 2026-01-02T22:53:58,989 INFO [M:0;59fad5da8b71:38027]
> hbase.ChoreService(370): Chore service for: master/59fad5da8b71:0 had
> [ScheduledChore name=FlushedSequenceIdFlusher, period=10800000,
> unit=MILLISECONDS] on shutdown
> 2026-01-02T22:53:58,989 INFO [M:0;59fad5da8b71:38027]
> hbase.HBaseServerBase(450): Shutdown executor service
> 2026-01-02T22:53:58,989 DEBUG [M:0;59fad5da8b71:38027] master.HMaster(1813):
> Stopping service threads
> 2026-01-02T22:53:58,989 INFO [M:0;59fad5da8b71:38027]
> procedure2.RemoteProcedureDispatcher(119): Stopping procedure remote
> dispatcher
> 2026-01-02T22:53:58,989 INFO [M:0;59fad5da8b71:38027]
> procedure2.ProcedureExecutor(723): Stopping
> 2026-01-02T22:53:58,990 INFO [M:0;59fad5da8b71:38027]
> region.RegionProcedureStore(110): Stopping the Region Procedure Store,
> isAbort=false
> 2026-01-02T22:53:58,991 DEBUG [Time-limited test-EventThread]
> zookeeper.ZKWatcher(609): master:38027-0x100734ba5eb0000,
> quorum=127.0.0.1:59100, baseZNode=/hbase Received ZooKeeper Event,
> type=NodeDeleted, state=SyncConnected, path=/hbase/master
> 2026-01-02T22:53:58,992 DEBUG [Time-limited test-EventThread]
> zookeeper.ZKWatcher(609): master:38027-0x100734ba5eb0000,
> quorum=127.0.0.1:59100, baseZNode=/hbase Received ZooKeeper Event,
> type=NodeChildrenChanged, state=SyncConnected, path=/hbase
> 2026-01-02T22:53:58,992 DEBUG [M:0;59fad5da8b71:38027] zookeeper.ZKUtil(347):
> master:38027-0x100734ba5eb0000, quorum=127.0.0.1:59100, baseZNode=/hbase
> Unable to get data of znode /hbase/master because node does not exist (not an
> error)
> 2026-01-02T22:53:58,992 WARN [M:0;59fad5da8b71:38027]
> master.ActiveMasterManager(344): Failed get of master address:
> java.io.IOException: Can't get master address from ZooKeeper; znode data ==
> null
> 2026-01-02T22:53:58,993 INFO [M:0;59fad5da8b71:38027]
> master.ServerManager(1148): Writing .lastflushedseqids file at:
> hdfs://localhost:44733/user/jenkins/test-data/eb9fcf01-d40e-546d-7f81-2198b3d62cff/.lastflushedseqids
> 2026-01-02T22:53:58,994 DEBUG [normalizer-worker-0]
> normalizer.RegionNormalizerWorker(193): interrupt detected. terminating.
> 2026-01-02T22:53:59,019 INFO [Block report processor]
> blockmanagement.BlockManager(3777): BLOCK* addStoredBlock: 127.0.0.1:37189 is
> added to blk_1073742422_1598 (size=307)
> 2026-01-02T22:53:59,019 INFO [Block report processor]
> blockmanagement.BlockManager(3777): BLOCK* addStoredBlock: 127.0.0.1:38631 is
> added to blk_1073742422_1598 (size=307)
> 2026-01-02T22:53:59,020 INFO [Block report processor]
> blockmanagement.BlockManager(3777): BLOCK* addStoredBlock: 127.0.0.1:40115 is
> added to blk_1073742422_1598 (size=307)
> 2026-01-02T22:53:59,020 INFO [M:0;59fad5da8b71:38027]
> assignment.AssignmentManager(409): Stopping assignment manager
> 2026-01-02T22:53:59,021 INFO [M:0;59fad5da8b71:38027]
> region.MasterRegion(264): Closing local region {ENCODED =>
> 1595e783b53d99cd5eef43b6debb2682, NAME =>
> 'master:store,,1.1595e783b53d99cd5eef43b6debb2682.', STARTKEY => '', ENDKEY
> => ''}, isAbort=false
> 2026-01-02T22:53:59,021 DEBUG [M:0;59fad5da8b71:38027]
> regionserver.HRegion(1725): Closing 1595e783b53d99cd5eef43b6debb2682,
> disabling compactions & flushes
> 2026-01-02T22:53:59,042 INFO [M:0;59fad5da8b71:38027]
> regionserver.HRegion(1758): Closing region
> master:store,,1.1595e783b53d99cd5eef43b6debb2682.
> 2026-01-02T22:53:59,042 DEBUG [M:0;59fad5da8b71:38027]
> regionserver.HRegion(1779): Time limited wait for close lock on
> master:store,,1.1595e783b53d99cd5eef43b6debb2682.
> 2026-01-02T22:53:59,042 DEBUG [M:0;59fad5da8b71:38027]
> regionserver.HRegion(1846): Acquired close lock on
> master:store,,1.1595e783b53d99cd5eef43b6debb2682. after waiting 0 ms
> 2026-01-02T22:53:59,043 DEBUG [M:0;59fad5da8b71:38027]
> regionserver.HRegion(1856): Updates disabled for region
> master:store,,1.1595e783b53d99cd5eef43b6debb2682.
> 2026-01-02T22:53:59,043 INFO [M:0;59fad5da8b71:38027]
> regionserver.HRegion(2969): Flushing 1595e783b53d99cd5eef43b6debb2682 4/4
> column families, dataSize=1013.53 KB heapSize=1.19 MB
> 2026-01-02T22:53:59,043 ERROR
> [AsyncFSWAL-0-hdfs://localhost:44733/user/jenkins/test-data/eb9fcf01-d40e-546d-7f81-2198b3d62cff/MasterData-prefix:59fad5da8b71,38027,1767393931926]
> server.NIOServerCnxnFactory(85): Thread
> Thread[AsyncFSWAL-0-hdfs://localhost:44733/user/jenkins/test-data/eb9fcf01-d40e-546d-7f81-2198b3d62cff/MasterData-prefix:59fad5da8b71,38027,1767393931926,5,FailOnTimeoutGroup]
> died
> java.lang.NullPointerException: Cannot invoke
> "org.apache.hbase.thirdparty.io.netty.buffer.ByteBuf.readableBytes()" because
> "this.buf" is null
> at
> org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutput.buffered(FanOutOneBlockAsyncDFSOutput.java:414)
> at
> org.apache.hadoop.hbase.regionserver.wal.AsyncProtobufLogWriter.append(AsyncProtobufLogWriter.java:134)
> at
> org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.doAppend(AsyncFSWAL.java:181)
> at
> org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.doAppend(AsyncFSWAL.java:100)
> at
> org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.appendEntry(AbstractFSWAL.java:1333)
> at
> org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.appendAndSync(AbstractFSWAL.java:1724)
> at
> org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.consume(AbstractFSWAL.java:1832)
> at
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
> at
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
> at java.base/java.lang.Thread.run(Thread.java:840)
> {noformat}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)