[ https://issues.apache.org/jira/browse/HBASE-28192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17784659#comment-17784659 ]
Viraj Jasani commented on HBASE-28192: -------------------------------------- Let me add some logs: regionserver where meta is online: {code:java} 2023-11-08 18:10:31,079 INFO [MemStoreFlusher.1] regionserver.HStore - Added hdfs://{cluster}/hbase/data/hbase/meta/1588230740/rep_barrier/3e5faf652f1e4c6db1c4ba1ae676c3ee, entries=1630, sequenceid=94325525, filesize=362.1 K {code} master server 4 which thought it was active: {code:java} 2023-11-08 18:14:34,563 DEBUG [0:becomeActiveMaster] assignment.AssignmentManager - Loaded hbase:meta state=OPEN, location=server3-1,61020,1699456864765, table=hbase:meta, region=1588230740 2023-11-08 18:14:34,609 INFO [0:becomeActiveMaster] master.ServerManager - Registering regionserver=server3-1,61020,1699456864765 {code} master server 1 which thought it was active: {code:java} 2023-11-08 18:15:50,350 DEBUG [aster/server1:61000:becomeActiveMaster] assignment.AssignmentManager - Loaded hbase:meta state=OPEN, location=server3-1,61020,1698687384632, table=hbase:meta, region=1588230740 2023-11-08 18:15:50,399 INFO [aster/server1:61000:becomeActiveMaster] master.ServerManager - Registering regionserver=server3-1,61020,1699456864765 {code} master server 4 gave up: {code:java} 2023-11-08 18:16:22,776 INFO [aster/server4:61000:becomeActiveMaster] master.ActiveMasterManager - Another master is the active master, server1,61000,1699467212235; waiting to become the next active master {code} When server 4 was trying to be active master and loaded meta, it retrieved the correct location of meta i.e. server3-1,61020,1699456864765 However, when server 1 (eventual active master) loaded meta, it retrieved incorrect location i.e. server3-1,61020,1698687384632 For hbase 2.5, i see that HBASE-26193 no longer relies on zookeeper and rather relies on scanning master region: {code:java} // Start the Assignment Thread startAssignmentThread(); // load meta region states. // here we are still in the early steps of active master startup. There is only one thread(us) // can access AssignmentManager and create region node, so here we do not need to lock the // region node. try (ResultScanner scanner = masterRegion.getScanner(new Scan().addFamily(HConstants.CATALOG_FAMILY))) { for (;;) { Result result = scanner.next(); if (result == null) { break; } RegionStateStore .visitMetaEntry((r, regionInfo, state, regionLocation, lastHost, openSeqNum) -> { RegionStateNode regionNode = regionStates.getOrCreateRegionStateNode(regionInfo); regionNode.setState(state); regionNode.setLastHost(lastHost); regionNode.setRegionLocation(regionLocation); regionNode.setOpenSeqNum(openSeqNum); if (regionNode.getProcedure() != null) { regionNode.getProcedure().stateLoaded(this, regionNode); } if (regionLocation != null) { regionStates.addRegionToServer(regionNode); } if (RegionReplicaUtil.isDefaultReplica(regionInfo.getReplicaId())) { setMetaAssigned(regionInfo, state == State.OPEN); } LOG.debug("Loaded hbase:meta {}", regionNode); }, result); } } mirrorMetaLocations(); } {code} > Master should recover if meta region state is inconsistent > ---------------------------------------------------------- > > Key: HBASE-28192 > URL: https://issues.apache.org/jira/browse/HBASE-28192 > Project: HBase > Issue Type: Improvement > Affects Versions: 2.4.17, 2.5.6 > Reporter: Viraj Jasani > Assignee: Viraj Jasani > Priority: Major > Fix For: 2.6.0, 2.4.18, 3.0.0-beta-1, 2.5.7 > > > During active master initialization, before we set master as active (i.e. > {_}setInitialized(true){_}), we need both meta and namespace regions online. > If the region state of meta or namespace is inconsistent, active master can > get stuck in the initialization step: > {code:java} > private boolean isRegionOnline(RegionInfo ri) { > RetryCounter rc = null; > while (!isStopped()) { > ... > ... > ... > // Check once-a-minute. > if (rc == null) { > rc = new RetryCounterFactory(Integer.MAX_VALUE, 1000, 60_000).create(); > } > Threads.sleep(rc.getBackoffTimeAndIncrementAttempts()); > } > return false; > } > {code} > In one of the recent outage, we observed that meta was online on a server, > which was correctly reflected in meta znode, but the server starttime was > different. This means that as per the latest transition record, meta was > marked online on old server (same server with old start time). This kept > active master initialization waiting forever and some SCPs got stuck in > initial stage where they need to access meta table before getting candidate > for region moves. > The only way out of this outage is for operator to schedule recoveries using > hbck for old server, which triggers SCP for old server address of meta. Since > many SCPs were stuck, the processing of new SCP too was taking some time and > manual restart of active master triggered failover, and new master was able > to complete SCP for old meta server, correcting the meta assignment details, > which eventually marked master as active and only after this, we were able to > see real large num of RITs that were hidden so far. > We need to let master recover from this state to avoid manual intervention. -- This message was sent by Atlassian Jira (v8.20.10#820010)