[ https://issues.apache.org/jira/browse/HBASE-24286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17414646#comment-17414646 ]
Josh Elser commented on HBASE-24286:
------------------------------------

So, I've been back in this part of the codebase again. I have something which, I think, generally works in happy paths against branch-2.4 right now. I know that Zach is also looking at this right now. The idea is pretty similar to what Stephen was trying to do. I want to make sure that we're all in agreement that this approach makes sense before I start throwing up yet another pull request. (The below assumes the Master Region, HBase 2.4+.)

h3. Get hbase:meta assigned

Right now, hbase:meta will sit unassigned if we lose the WALs, because we have nothing to assign hbase:meta. SCPs get submitted based on the WALs on the FS, so there is no entity in HBase that looks at ZK and says "this meta region says it's OPEN on this RS, which is definitely not alive". This situation is also subject to change with HBASE-26193.

The first piece is that, when IMP has already run once and we can reasonably determine that meta is on a non-alive RS, we can trigger it to be reassigned. When we don't have an InitMetaProcedure, it's more complex. IMP is two-fold: create hbase:meta and then assign it. IMP will be destructive to any hbase:meta that happens to be on disk right now, so it's important that we don't try to run it multiple times. My change modifies IMP such that, if it notices a hbase:meta directory on the filesystem which _looks_ reasonable (e.g. the region directory exists, the table descriptor exists), it will not blindly create a brand-new meta. Then, it assigns meta as before.

h3. Get other regions assigned

At this point, we should be capable of getting meta back, and we can play the trick where we look at RegionServers which are marked as hosting regions but are neither LIVE (holding the lock in ZK and heartbeating with the Master) nor DEAD (in master memory, not holding the ZK lock, not in the process of being recovered).
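The meta-reuse guard described under "Get hbase:meta assigned" could be sketched roughly as below. This is a minimal illustration, not the actual IMP change; the class, field, and method names are all assumptions for the sake of the example:

```java
// Hedged sketch only -- NOT the real InitMetaProcedure code. Models the
// proposed guard: before creating a brand-new hbase:meta (a destructive
// step), check whether a meta directory already on disk "looks reasonable".
public class MetaBootstrapCheck {

    /** Minimal stand-in for what IMP could observe on the filesystem. */
    public static final class MetaOnDisk {
        final boolean regionDirExists;       // e.g. the meta region directory under the root dir
        final boolean tableDescriptorExists; // e.g. the table descriptor file is present

        public MetaOnDisk(boolean regionDirExists, boolean tableDescriptorExists) {
            this.regionDirExists = regionDirExists;
            this.tableDescriptorExists = tableDescriptorExists;
        }
    }

    /**
     * True when the existing meta should be reused: IMP skips the
     * destructive create step and proceeds straight to assignment.
     */
    public static boolean shouldReuseExistingMeta(MetaOnDisk meta) {
        return meta.regionDirExists && meta.tableDescriptorExists;
    }
}
```

The point of the guard is that running IMP twice must never clobber a plausible-looking existing meta; only when neither artifact is present does the create step run.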
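The LIVE / DEAD / UNKNOWN triage above could look something like the following. Again a hedged sketch with illustrative names, not HBase's actual ServerManager logic:

```java
// Hedged sketch -- illustrative only. Classifies a RegionServer found in
// hbase:meta at Master startup using the LIVE / DEAD / UNKNOWN definitions
// from the comment: an UNKNOWN server is the one that warrants an SCP.
public class ServerTriage {

    public enum State { LIVE, DEAD, UNKNOWN }

    /**
     * @param holdsZkLock       the server still holds its lock (ephemeral znode) in ZK
     * @param heartbeating      the server is heartbeating with the Master
     * @param knownDeadInMaster the server is in the Master's dead-server state,
     *                          i.e. recovery is already in progress or done
     */
    public static State classify(boolean holdsZkLock, boolean heartbeating,
                                 boolean knownDeadInMaster) {
        if (holdsZkLock && heartbeating) {
            return State.LIVE;    // nothing to do
        }
        if (knownDeadInMaster) {
            return State.DEAD;    // an SCP is (or was) already handling it
        }
        return State.UNKNOWN;     // meta says it hosts regions, but nobody
                                  // is recovering it: submit an SCP
    }
}
```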
The idea is that we try to identify which RegionServers are UNKNOWN only when the Master first starts up (rather than continuously), hopefully reducing some of the risks that Stack/Duo called out in PR#2113. For each RS we find in hbase:meta which we call UNKNOWN, we submit an SCP and let HBase do its thing.

The change I have largely doesn't address the concerns about UNKNOWN servers and manual verification (https://github.com/apache/hbase/pull/2113#issuecomment-701656158). The only improvement is that we only perform this operation during Master startup (after grabbing the lock, prior to becoming active). Because of this, I believe we're reducing the risk of some RS being inadvertently marked as UNKNOWN (due to some bug) and (at worst) causing a double assignment.

Once I realized that Zach was still chasing this, I told him I would bring up this discussion once more to see if folks have any appetite for trying to make this work. I understand that not everyone is operating in a world where this scenario is even remotely plausible, but it's a burden my team and I have to deal with :). I am always in favor of getting these changes upstream for all to benefit from, but I don't want to rehash a difficult topic again unless people think there is merit. Let me know!

> HMaster won't become healthy after cloning or creating a new cluster
> pointing at the same file system
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-24286
>                 URL: https://issues.apache.org/jira/browse/HBASE-24286
>             Project: HBase
>          Issue Type: Bug
>          Components: master, Region Assignment
>    Affects Versions: 3.0.0-alpha-1, 2.2.3, 2.2.4, 2.2.5
>            Reporter: Jack Ye
>            Assignee: Tak-Lon (Stephen) Wu
>            Priority: Major
>
> h1.
> How to reproduce:
> # user starts an HBase cluster on top of a file system
> # user performs some operations and shuts down the cluster; all the data are still persisted in the file system
> # user creates a new HBase cluster, using a different set of servers, on top of the same file system with the same root directory
> # HMaster cannot initialize
>
> h1. Root cause:
> During the HMaster initialization phase, the following happens:
> # HMaster waits for the namespace table to come online
> # AssignmentManager gets all namespace table region info
> # the region servers hosting the namespace table are already dead, so the online check fails
> # HMaster waits for the namespace regions to come online, retrying 1000 times, which effectively means forever
>
> Code waiting for the namespace table to be online:
> https://github.com/apache/hbase/blob/rel/2.2.3/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java#L1102
>
> h1. Stack trace (running on S3):
> 2020-04-23 08:15:57,185 WARN [master/ip-10-12-13-14:16000:becomeActiveMaster]
> master.HMaster: hbase:namespace,,1587628169070.d34b65b91a52644ed3e77c5fbb065c2b. is NOT
> online; state={d34b65b91a52644ed3e77c5fbb065c2b state=OPEN,
> ts=1587629742129, server=ip-10-12-13-14.ec2.internal,16020,1587628031614};
> ServerCrashProcedures=false. Master startup cannot progress, in
> holding-pattern until region onlined.
>
> where ip-10-12-13-14.ec2.internal is the old region server hosting the region of hbase:namespace.
>
> h1. Discussion for the fix
> We see there is a fix for this in branch-3:
> https://issues.apache.org/jira/browse/HBASE-21154. Before we provide a patch,
> we would like to know from the community whether we should backport this change to
> branch-2, or whether we should just perform a fix with minimum code change.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
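The "holding pattern" in the quoted root cause can be illustrated with a minimal sketch: the Master polls whether the namespace region is online, but since the hosting RS is gone and no SCP is ever submitted, the check can never succeed. Names and the retry bound are illustrative, not the real HMaster code:

```java
// Hedged sketch of the holding pattern, not the actual HMaster loop.
public class NamespaceOnlineWait {

    /** Stand-in for the region-online check against assignment state. */
    interface RegionOnlineCheck { boolean isOnline(); }

    /** Returns true only if the region comes online within maxRetries polls. */
    public static boolean waitForRegionOnline(RegionOnlineCheck check, int maxRetries) {
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            if (check.isOnline()) {
                return true;  // Master startup can progress
            }
            // the real code sleeps between retries and logs the "NOT online" warning
        }
        return false;  // with a dead RS and no SCP, this is the only outcome
    }
}
```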