[
https://issues.apache.org/jira/browse/HBASE-26245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17412271#comment-17412271
]
Duo Zhang commented on HBASE-26245:
-----------------------------------
{quote}
At a minimum, if we track RegionServers which are "dead but not yet processed"
in the Master Region, I think that handles our biggest "we don't know if it's
safe to schedule SCP" concern.
{quote}
In practice, we do not keep a list like this. When we want to process a dead
region server, we schedule an SCP, and it is recorded in the procedure store.
The interval in between is very short, so I do not think it is worth adding
one more step to store the region server somewhere else...
The root problem here is that we need to get the old live region server list
and compare it with the current live region server list, to find the dead
region servers that need to be processed.
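As a rough sketch, the comparison is just a set difference. The function name
and the server names below are made up for illustration, this is not HBase
code. The names follow the "host,port,startcode" form, so a restarted server
counts as a different server from its previous instance.

```python
def find_dead_servers(previous_live, current_live):
    """Return servers that were live before but are missing from the current list."""
    return set(previous_live) - set(current_live)


# Hypothetical server names in "host,port,startcode" form.
previous = {"rs1,16020,1000", "rs2,16020,1001", "rs3,16020,1002"}
# rs2 is gone; rs3 restarted, so it shows up with a new start code.
current = {"rs1,16020,1000", "rs3,16020,2000"}

dead = find_dead_servers(previous, current)
```

The hard part is not the diff itself but where the "previous" list comes from
after a master restart.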
In general, the region server list could also be constructed by scanning meta
(this should be the typical Bigtable way), but there is a cyclic dependency:
we need an SCP to bring meta online first. I think there could be a tricky way
to break the cycle. We can load the location of the meta region first and
compare it with the current live region servers. If that server is dead, we
schedule an SCP for it first to bring meta online, and then we scan meta to
find the other region servers.
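A minimal sketch of that ordering, with hypothetical callbacks standing in for
the real master internals. It assumes that schedule_scp blocks until the SCP
has finished (so meta is online once it returns for the meta server) and that
scan_meta yields the server name of every region location found in meta.

```python
def bootstrap_dead_servers(meta_location, live_servers, schedule_scp, scan_meta):
    """Schedule SCPs in an order that brings meta online before it is scanned."""
    scheduled = []

    # Step 1: if the server hosting meta is dead, process it first.
    if meta_location not in live_servers:
        schedule_scp(meta_location)
        scheduled.append(meta_location)

    # Step 2: meta is assumed to be online now, so it can be scanned
    # to discover every other server that carries regions.
    for server in scan_meta():
        if server not in live_servers and server not in scheduled:
            schedule_scp(server)
            scheduled.append(server)

    return scheduled


# Usage with made-up server names: "rs-meta" and "rs2" are dead.
calls = []
result = bootstrap_dead_servers(
    meta_location="rs-meta",
    live_servers={"rs1"},
    schedule_scp=calls.append,
    scan_meta=lambda: ["rs1", "rs2", "rs-meta"],
)
```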
But there is still a problem for replication. In the current implementation,
we rely on SCP to assign the replication queues of a dead region server to
other region servers. So even if the region server does not carry any regions,
we still need to schedule an SCP for it, because it may have held some regions
in the past that were all moved elsewhere, while it still has some
unreplicated WALs...
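To illustrate the gap, here is a sketch with made-up names. It assumes we had
some way to list the servers that still own replication queues, which is an
assumption for the example and not something we have settled on. A server that
appears only in that second set would never be found by scanning meta alone.

```python
def servers_needing_scp(servers_in_meta, replication_queue_owners, live_servers):
    """Dead servers found via meta, plus dead servers that still own replication queues."""
    candidates = set(servers_in_meta) | set(replication_queue_owners)
    return candidates - set(live_servers)


# Hypothetical cluster state: rs3 carries no regions but still owns a queue.
in_meta = {"rs1", "rs2"}
queue_owners = {"rs3"}
live = {"rs1"}

from_meta_only = servers_needing_scp(in_meta, set(), live)
with_replication = servers_needing_scp(in_meta, queue_owners, live)
```

Here from_meta_only misses rs3, while with_replication includes it.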
So, this is a complicated problem... We need to discuss more here.
Thanks [~elserj] and [~zyork] for chiming in.
> Store region server list in master local region
> -----------------------------------------------
>
> Key: HBASE-26245
> URL: https://issues.apache.org/jira/browse/HBASE-26245
> Project: HBase
> Issue Type: Brainstorming
> Components: master, Zookeeper
> Reporter: Duo Zhang
> Priority: Major
>
> Just a simple idea that still needs to be polished.
> For large clusters, ZooKeeper could be a bottleneck. There are some related
> issues that try to avoid tracking the region server list as much as
> possible, but what if we want to go further and simply not register region
> servers on ZK?
> I think ZooKeeper acts here as something like a service registry: we need
> the list to get all the region servers, and we also need to know about
> changes to the list.
> But in fact, we could also kill a region server from the master side, so the
> latter could be done by a periodic heartbeat check daemon on the master. For
> the former, we could store the list in the master local region, so that when
> the master restarts, it can rebuild the region server list by loading it
> from the master local region.
> There are also two main side effects, both of which are good:
> 1. We do not need to list the WAL directory on HDFS to find the previous
> region servers in order to schedule SCPs. This could make it possible to
> restart a new HBase cluster based on only the root directory.
> 2. For now, a region server needs to register with HMaster first and then
> create its node on ZooKeeper. If it fails between these two actions, there
> is no way for HMaster to clean up this dead server, as it never expires on
> ZooKeeper. There should be a related issue. If we simply do not store a node
> on ZK, this problem goes away too.
> Of course, there will still be lots of other problems, like whether we need
> another heartbeat call as reportForDuty is a bit heavy as we also report the
> region list, etc.