[ 
https://issues.apache.org/jira/browse/HBASE-29802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-29802:
------------------------------
    Status: Patch Available  (was: Open)

> NPE when shutting down mini cluster cause tests hang
> ----------------------------------------------------
>
>                 Key: HBASE-29802
>                 URL: https://issues.apache.org/jira/browse/HBASE-29802
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce, test
>            Reporter: Duo Zhang
>            Assignee: Duo Zhang
>            Priority: Major
>              Labels: pull-request-available
>
> Found this in a timeout run for TestSecureExportSnapshot
> {noformat}
> 2026-01-02T22:53:58,988 INFO  [RegionServerTracker-0] master.HMaster(3360): 
> ***** STOPPING master '59fad5da8b71,38027,1767393931926' *****
> 2026-01-02T22:53:58,988 INFO  [RegionServerTracker-0] master.HMaster(3362): 
> STOPPED: Cluster shutdown set; onlineServer=0
> 2026-01-02T22:53:58,988 INFO  [M:0;59fad5da8b71:38027] 
> hbase.HBaseServerBase(457): Close async cluster connection
> 2026-01-02T22:53:58,988 INFO  [M:0;59fad5da8b71:38027] 
> hbase.HBaseServerBase(440): Shutdown chores and chore service
> 2026-01-02T22:53:58,988 DEBUG [M:0;59fad5da8b71:38027] 
> cleaner.LogCleaner(212): Cancelling LogCleaner
> 2026-01-02T22:53:58,989 DEBUG [M:0;59fad5da8b71:38027] 
> cleaner.HFileCleaner(335): Stopping file delete threads
> 2026-01-02T22:53:58,989 WARN  [OldWALsCleaner-0] cleaner.LogCleaner(200): 
> Interrupted while cleaning old WALs, will try to clean it next round. Exiting.
> 2026-01-02T22:53:58,989 DEBUG 
> [master/59fad5da8b71:0:becomeActiveMaster-HFileCleaner.large.0-1767393935168] 
> cleaner.HFileCleaner(306): Exit 
> Thread[master/59fad5da8b71:0:becomeActiveMaster-HFileCleaner.large.0-1767393935168,5,FailOnTimeoutGroup]
> 2026-01-02T22:53:58,989 DEBUG 
> [master/59fad5da8b71:0:becomeActiveMaster-HFileCleaner.small.0-1767393935174] 
> cleaner.HFileCleaner(306): Exit 
> Thread[master/59fad5da8b71:0:becomeActiveMaster-HFileCleaner.small.0-1767393935174,5,FailOnTimeoutGroup]
> 2026-01-02T22:53:58,989 INFO  [M:0;59fad5da8b71:38027] 
> hbase.ChoreService(370): Chore service for: master/59fad5da8b71:0 had 
> [ScheduledChore name=FlushedSequenceIdFlusher, period=10800000, 
> unit=MILLISECONDS] on shutdown
> 2026-01-02T22:53:58,989 INFO  [M:0;59fad5da8b71:38027] 
> hbase.HBaseServerBase(450): Shutdown executor service
> 2026-01-02T22:53:58,989 DEBUG [M:0;59fad5da8b71:38027] master.HMaster(1813): 
> Stopping service threads
> 2026-01-02T22:53:58,989 INFO  [M:0;59fad5da8b71:38027] 
> procedure2.RemoteProcedureDispatcher(119): Stopping procedure remote 
> dispatcher
> 2026-01-02T22:53:58,989 INFO  [M:0;59fad5da8b71:38027] 
> procedure2.ProcedureExecutor(723): Stopping
> 2026-01-02T22:53:58,990 INFO  [M:0;59fad5da8b71:38027] 
> region.RegionProcedureStore(110): Stopping the Region Procedure Store, 
> isAbort=false
> 2026-01-02T22:53:58,991 DEBUG [Time-limited test-EventThread] 
> zookeeper.ZKWatcher(609): master:38027-0x100734ba5eb0000, 
> quorum=127.0.0.1:59100, baseZNode=/hbase Received ZooKeeper Event, 
> type=NodeDeleted, state=SyncConnected, path=/hbase/master
> 2026-01-02T22:53:58,992 DEBUG [Time-limited test-EventThread] 
> zookeeper.ZKWatcher(609): master:38027-0x100734ba5eb0000, 
> quorum=127.0.0.1:59100, baseZNode=/hbase Received ZooKeeper Event, 
> type=NodeChildrenChanged, state=SyncConnected, path=/hbase
> 2026-01-02T22:53:58,992 DEBUG [M:0;59fad5da8b71:38027] zookeeper.ZKUtil(347): 
> master:38027-0x100734ba5eb0000, quorum=127.0.0.1:59100, baseZNode=/hbase 
> Unable to get data of znode /hbase/master because node does not exist (not an 
> error)
> 2026-01-02T22:53:58,992 WARN  [M:0;59fad5da8b71:38027] 
> master.ActiveMasterManager(344): Failed get of master address: 
> java.io.IOException: Can't get master address from ZooKeeper; znode data == 
> null
> 2026-01-02T22:53:58,993 INFO  [M:0;59fad5da8b71:38027] 
> master.ServerManager(1148): Writing .lastflushedseqids file at: 
> hdfs://localhost:44733/user/jenkins/test-data/eb9fcf01-d40e-546d-7f81-2198b3d62cff/.lastflushedseqids
> 2026-01-02T22:53:58,994 DEBUG [normalizer-worker-0] 
> normalizer.RegionNormalizerWorker(193): interrupt detected. terminating.
> 2026-01-02T22:53:59,019 INFO  [Block report processor] 
> blockmanagement.BlockManager(3777): BLOCK* addStoredBlock: 127.0.0.1:37189 is 
> added to blk_1073742422_1598 (size=307)
> 2026-01-02T22:53:59,019 INFO  [Block report processor] 
> blockmanagement.BlockManager(3777): BLOCK* addStoredBlock: 127.0.0.1:38631 is 
> added to blk_1073742422_1598 (size=307)
> 2026-01-02T22:53:59,020 INFO  [Block report processor] 
> blockmanagement.BlockManager(3777): BLOCK* addStoredBlock: 127.0.0.1:40115 is 
> added to blk_1073742422_1598 (size=307)
> 2026-01-02T22:53:59,020 INFO  [M:0;59fad5da8b71:38027] 
> assignment.AssignmentManager(409): Stopping assignment manager
> 2026-01-02T22:53:59,021 INFO  [M:0;59fad5da8b71:38027] 
> region.MasterRegion(264): Closing local region {ENCODED => 
> 1595e783b53d99cd5eef43b6debb2682, NAME => 
> 'master:store,,1.1595e783b53d99cd5eef43b6debb2682.', STARTKEY => '', ENDKEY 
> => ''}, isAbort=false
> 2026-01-02T22:53:59,021 DEBUG [M:0;59fad5da8b71:38027] 
> regionserver.HRegion(1725): Closing 1595e783b53d99cd5eef43b6debb2682, 
> disabling compactions & flushes
> 2026-01-02T22:53:59,042 INFO  [M:0;59fad5da8b71:38027] 
> regionserver.HRegion(1758): Closing region 
> master:store,,1.1595e783b53d99cd5eef43b6debb2682.
> 2026-01-02T22:53:59,042 DEBUG [M:0;59fad5da8b71:38027] 
> regionserver.HRegion(1779): Time limited wait for close lock on 
> master:store,,1.1595e783b53d99cd5eef43b6debb2682.
> 2026-01-02T22:53:59,042 DEBUG [M:0;59fad5da8b71:38027] 
> regionserver.HRegion(1846): Acquired close lock on 
> master:store,,1.1595e783b53d99cd5eef43b6debb2682. after waiting 0 ms
> 2026-01-02T22:53:59,043 DEBUG [M:0;59fad5da8b71:38027] 
> regionserver.HRegion(1856): Updates disabled for region 
> master:store,,1.1595e783b53d99cd5eef43b6debb2682.
> 2026-01-02T22:53:59,043 INFO  [M:0;59fad5da8b71:38027] 
> regionserver.HRegion(2969): Flushing 1595e783b53d99cd5eef43b6debb2682 4/4 
> column families, dataSize=1013.53 KB heapSize=1.19 MB
> 2026-01-02T22:53:59,043 ERROR 
> [AsyncFSWAL-0-hdfs://localhost:44733/user/jenkins/test-data/eb9fcf01-d40e-546d-7f81-2198b3d62cff/MasterData-prefix:59fad5da8b71,38027,1767393931926]
>  server.NIOServerCnxnFactory(85): Thread 
> Thread[AsyncFSWAL-0-hdfs://localhost:44733/user/jenkins/test-data/eb9fcf01-d40e-546d-7f81-2198b3d62cff/MasterData-prefix:59fad5da8b71,38027,1767393931926,5,FailOnTimeoutGroup]
>  died
> java.lang.NullPointerException: Cannot invoke 
> "org.apache.hbase.thirdparty.io.netty.buffer.ByteBuf.readableBytes()" because 
> "this.buf" is null
>       at 
> org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutput.buffered(FanOutOneBlockAsyncDFSOutput.java:414)
>       at 
> org.apache.hadoop.hbase.regionserver.wal.AsyncProtobufLogWriter.append(AsyncProtobufLogWriter.java:134)
>       at 
> org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.doAppend(AsyncFSWAL.java:181)
>       at 
> org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.doAppend(AsyncFSWAL.java:100)
>       at 
> org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.appendEntry(AbstractFSWAL.java:1333)
>       at 
> org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.appendAndSync(AbstractFSWAL.java:1724)
>       at 
> org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.consume(AbstractFSWAL.java:1832)
>       at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
>       at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
>       at java.base/java.lang.Thread.run(Thread.java:840)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to