[ https://issues.apache.org/jira/browse/HBASE-24286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17414646#comment-17414646 ]
Josh Elser commented on HBASE-24286:
------------------------------------

So, I've been back in this part of the codebase again. I have something which, I think, generally works in happy paths against branch-2.4 right now. I know that Zach is also looking at this right now. The idea is pretty similar to what Stephen was trying to do. I want to make sure that we're all in agreement that this approach makes sense before I start throwing up yet another pull request. (The below assumes the Master Region, HBase 2.4+.)

h3. Get hbase:meta assigned

Right now, hbase:meta will sit unassigned if we lose the WALs, because we have nothing to assign hbase:meta. SCPs get submitted based on the WALs on the FS, so there is no entity in HBase that looks at ZK and says "this meta region says it's OPEN on this RS, which is definitely not alive". This situation is also subject to change with HBASE-26193.

The first piece is that, when IMP has already run once and we can reasonably determine that meta is on a non-alive RS, we can trigger it to be reassigned. When we don't have an InitMetaProcedure, it's more complex. IMP is two-fold: create hbase:meta and then assign it. IMP will be destructive to any hbase:meta that happens to be on disk right now, so it's important that we don't try to run it multiple times. My change modifies IMP such that, if it notices a hbase:meta directory on the filesystem which _looks_ reasonable (e.g. the region directory exists, the table descriptor exists), it will not blindly create a brand-new meta. Then, it assigns meta as before.

h3. Get other regions assigned

At this point, we should be capable of getting meta back, and we can play the trick where we look at RegionServers which are marked as hosting regions but are neither LIVE (holding the lock in ZK and heartbeating with the Master) nor DEAD (in master memory, not holding the ZK lock, not in the process of being recovered).
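The meta-reuse guard described under "Get hbase:meta assigned" could be sketched roughly as below. This is a minimal illustration, not the actual IMP change; the class, field, and method names are all assumptions for the sake of the example:

```java
// Hedged sketch only -- NOT the real InitMetaProcedure code. Models the
// proposed guard: before creating a brand-new hbase:meta (a destructive
// step), check whether a meta directory already on disk "looks reasonable".
public class MetaBootstrapCheck {

    /** Minimal stand-in for what IMP could observe on the filesystem. */
    public static final class MetaOnDisk {
        final boolean regionDirExists;       // e.g. the meta region directory under the root dir
        final boolean tableDescriptorExists; // e.g. the table descriptor file is present

        public MetaOnDisk(boolean regionDirExists, boolean tableDescriptorExists) {
            this.regionDirExists = regionDirExists;
            this.tableDescriptorExists = tableDescriptorExists;
        }
    }

    /**
     * True when the existing meta should be reused: IMP skips the
     * destructive create step and proceeds straight to assignment.
     */
    public static boolean shouldReuseExistingMeta(MetaOnDisk meta) {
        return meta.regionDirExists && meta.tableDescriptorExists;
    }
}
```

The point of the guard is that running IMP twice must never clobber a plausible-looking existing meta; only when neither artifact is present does the create step run.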
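The LIVE / DEAD / UNKNOWN triage above could look something like the following. Again a hedged sketch with illustrative names, not HBase's actual ServerManager logic:

```java
// Hedged sketch -- illustrative only. Classifies a RegionServer found in
// hbase:meta at Master startup using the LIVE / DEAD / UNKNOWN definitions
// from the comment: an UNKNOWN server is the one that warrants an SCP.
public class ServerTriage {

    public enum State { LIVE, DEAD, UNKNOWN }

    /**
     * @param holdsZkLock       the server still holds its lock (ephemeral znode) in ZK
     * @param heartbeating      the server is heartbeating with the Master
     * @param knownDeadInMaster the server is in the Master's dead-server state,
     *                          i.e. recovery is already in progress or done
     */
    public static State classify(boolean holdsZkLock, boolean heartbeating,
                                 boolean knownDeadInMaster) {
        if (holdsZkLock && heartbeating) {
            return State.LIVE;    // nothing to do
        }
        if (knownDeadInMaster) {
            return State.DEAD;    // an SCP is (or was) already handling it
        }
        return State.UNKNOWN;     // meta says it hosts regions, but nobody
                                  // is recovering it: submit an SCP
    }
}
```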
The idea is that we try to identify which RegionServers are UNKNOWN only when the Master first starts up (rather than continuously), hopefully reducing some of the risks that Stack/Duo called out in PR#2113. For each RS we find in hbase:meta which we call UNKNOWN, we submit an SCP and let HBase do its thing.

The change I have largely doesn't address the concerns about UNKNOWN servers and manual verification (https://github.com/apache/hbase/pull/2113#issuecomment-701656158). The only improvement is that we only perform this operation during Master startup (after grabbing the lock, prior to becoming active). Because of this, I believe we're reducing the risk of some RS being inadvertently marked as UNKNOWN (due to some bug) and (at worst) causing a double assignment.

Once I realized that Zach was still chasing this, I told him I would bring up this discussion once more to see if folks have any appetite for trying to make this work. I understand that not everyone is operating in a world where this scenario is even remotely plausible, but it's a burden my team and I have to deal with :). I am always in favor of getting these changes upstream for all to benefit from, but I don't want to rehash a difficult topic again unless people think there is merit. Let me know!

> HMaster won't become healthy after cloning or creating a new cluster
> pointing at the same file system
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-24286
>                 URL: https://issues.apache.org/jira/browse/HBASE-24286
>             Project: HBase
>          Issue Type: Bug
>          Components: master, Region Assignment
>    Affects Versions: 3.0.0-alpha-1, 2.2.3, 2.2.4, 2.2.5
>            Reporter: Jack Ye
>            Assignee: Tak-Lon (Stephen) Wu
>            Priority: Major
>
> h1.
> How to reproduce:
> # user starts an HBase cluster on top of a file system
> # user performs some operations and shuts down the cluster; all the data are still persisted in the file system
> # user creates a new HBase cluster, using a different set of servers, on top of the same file system with the same root directory
> # HMaster cannot initialize
>
> h1. Root cause:
> During the HMaster initialization phase, the following happens:
> # HMaster waits for the namespace table to come online
> # AssignmentManager gets all namespace table region info
> # the region servers hosting the namespace table are already dead, so the online check fails
> # HMaster waits for the namespace regions to come online, retrying 1000 times, which effectively means forever
>
> Code waiting for the namespace table to be online:
> https://github.com/apache/hbase/blob/rel/2.2.3/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java#L1102
>
> h1. Stack trace (running on S3):
> 2020-04-23 08:15:57,185 WARN [master/ip-10-12-13-14:16000:becomeActiveMaster]
> master.HMaster: hbase:namespace,,1587628169070.d34b65b91a52644ed3e77c5fbb065c2b. is NOT
> online; state={d34b65b91a52644ed3e77c5fbb065c2b state=OPEN,
> ts=1587629742129, server=ip-10-12-13-14.ec2.internal,16020,1587628031614};
> ServerCrashProcedures=false. Master startup cannot progress, in
> holding-pattern until region onlined.
>
> where ip-10-12-13-14.ec2.internal is the old region server hosting the region of hbase:namespace.
>
> h1. Discussion for the fix
> We see there is a fix for this in branch-3:
> https://issues.apache.org/jira/browse/HBASE-21154. Before we provide a patch,
> we would like to know from the community whether we should backport this change to
> branch-2, or whether we should just perform a fix with minimum code change.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
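The "holding pattern" in the quoted root cause can be illustrated with a minimal sketch: the Master polls whether the namespace region is online, but since the hosting RS is gone and no SCP is ever submitted, the check can never succeed. Names and the retry bound are illustrative, not the real HMaster code:

```java
// Hedged sketch of the holding pattern, not the actual HMaster loop.
public class NamespaceOnlineWait {

    /** Stand-in for the region-online check against assignment state. */
    interface RegionOnlineCheck { boolean isOnline(); }

    /** Returns true only if the region comes online within maxRetries polls. */
    public static boolean waitForRegionOnline(RegionOnlineCheck check, int maxRetries) {
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            if (check.isOnline()) {
                return true;  // Master startup can progress
            }
            // the real code sleeps between retries and logs the "NOT online" warning
        }
        return false;  // with a dead RS and no SCP, this is the only outcome
    }
}
```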