[ https://issues.apache.org/jira/browse/HBASE-26245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Duo Zhang resolved HBASE-26245.
-------------------------------
Hadoop Flags: Reviewed
Resolution: Fixed
Pushed to branch-2.5+.
Thanks all for helping and reviewing!
> Store region server list in master local region
> -----------------------------------------------
>
> Key: HBASE-26245
> URL: https://issues.apache.org/jira/browse/HBASE-26245
> Project: HBase
> Issue Type: Brainstorming
> Components: master, Zookeeper
> Reporter: Duo Zhang
> Assignee: Duo Zhang
> Priority: Major
>
> This is just a rough idea that still needs to be polished.
> For large clusters, ZooKeeper can become a bottleneck. There are some related
> issues that aim to avoid tracking the region server list wherever possible,
> but what if we go further and simply do not register region servers on
> ZooKeeper?
> I think ZooKeeper acts here as something like a service registry: we need
> the list to get all the region servers, and we also need to know about
> changes to the list.
> But in fact, the master can also kill a region server from its side, so the
> latter could be handled by a periodic heartbeat check daemon on the master.
> For the former, we could store the list in the master local region, so that
> when the master restarts, it can rebuild the region server list by loading it
> from the master local region.
> There are also two main side effects, both of which are good:
> 1. We do not need to list the WAL directory on HDFS to find the previous
> region servers for scheduling SCPs. This could make it possible to restart a
> new HBase cluster based only on the root directory.
> 2. For now, a region server needs to register with HMaster first and then
> create its node on ZooKeeper. If it fails between these two actions, there is
> no way for HMaster to clean up this dead server, as it never expires on
> ZooKeeper. There should be a related issue. If we simply do not store a node
> on ZooKeeper, then this problem also goes away.
> Of course, there will still be lots of other problems, such as whether we
> need a separate heartbeat call, since reportForDuty is a bit heavy because it
> also reports the region list.
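The heartbeat-based expiry described above could look roughly like the following sketch. It is not taken from the HBase code base: the class name HeartbeatTrackerSketch and its methods are hypothetical, and an in-memory map stands in for the list that the proposal would persist in the master local region. Timestamps are passed in explicitly so the behavior is easy to follow.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical illustration only; not an HBase class. */
public class HeartbeatTrackerSketch {
    // Assumption: stand-in for the region server list stored in the master local region.
    private final Map<String, Long> lastHeartbeatMillis = new ConcurrentHashMap<>();
    private final long timeoutMillis;

    public HeartbeatTrackerSketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Records a registration or heartbeat from a region server. */
    public void heartbeat(String serverName, long nowMillis) {
        lastHeartbeatMillis.put(serverName, nowMillis);
    }

    /**
     * Removes and returns the servers whose last heartbeat is older than the
     * timeout. A real master would schedule crash handling for each of them.
     */
    public List<String> expire(long nowMillis) {
        List<String> dead = new ArrayList<>();
        for (Map.Entry<String, Long> entry : lastHeartbeatMillis.entrySet()) {
            if (nowMillis - entry.getValue() > timeoutMillis) {
                dead.add(entry.getKey());
            }
        }
        for (String serverName : dead) {
            lastHeartbeatMillis.remove(serverName);
        }
        return dead;
    }

    /** Returns a sorted snapshot of the servers currently considered live. */
    public Set<String> liveServers() {
        return new TreeSet<>(lastHeartbeatMillis.keySet());
    }
}
```

A periodic task on the master would call expire with the current time. Because liveness is decided purely by the master in this sketch, a server that registers and then dies before doing anything else is still expired once its timeout elapses.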