[ 
https://issues.apache.org/jira/browse/HBASE-26245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17514992#comment-17514992
 ] 

Duo Zhang commented on HBASE-26245:
-----------------------------------

OK, so let me add this to the release note. Thank you for testing!

> Store region server list in master local region
> -----------------------------------------------
>
>                 Key: HBASE-26245
>                 URL: https://issues.apache.org/jira/browse/HBASE-26245
>             Project: HBase
>          Issue Type: Brainstorming
>          Components: master, Zookeeper
>            Reporter: Duo Zhang
>            Assignee: Duo Zhang
>            Priority: Major
>
> Just a simple idea that still needs to be polished.
> For large clusters, ZooKeeper could be a bottleneck. There are some related
> issues that try to avoid tracking the region server list as much as
> possible, but what if we want to go further and simply not register region
> servers on ZooKeeper?
> I think ZooKeeper here acts as something like a service registry: we need
> the list to get all the region servers, and we also need to know about
> changes to the list.
> But in fact, we could also kill a region server from the master side, so the
> latter could be done by a periodic heartbeat check daemon on the master. For
> the former, we could store the list in the master local region, so that when
> the master restarts, it can rebuild the region server list by loading it
> from the master local region.
> There are also two main side effects, both of which are good:
> 1. We do not need to list the WAL directory on HDFS to find the previous
> region servers for scheduling SCP. This could make it possible to restart a
> new HBase cluster based on only the root directory.
> 2. For now, a region server needs to register with HMaster first and then
> put its node on ZooKeeper. If it fails between these two actions, there is
> no way for HMaster to clean up this dead server, as it never expires on
> ZooKeeper. There should be a related issue. If we just do not store a node
> on ZooKeeper, then this problem is also gone.
> Of course, there will still be lots of other problems, such as whether we
> need another heartbeat call, since reportForDuty is a bit heavy because we
> also report the region list, etc.
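
For illustration only, the heartbeat-based liveness check described in the issue could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not HBase code: the class name HeartbeatTracker, its methods, and the timeout handling are all hypothetical, and persisting the list to the master local region is only indicated in comments.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of master-side liveness tracking; not part of HBase. */
class HeartbeatTracker {
  // Server name -> timestamp (ms) of the last heartbeat seen by the master.
  private final Map<String, Long> lastSeen = new ConcurrentHashMap<>();
  private final long timeoutMs;

  HeartbeatTracker(long timeoutMs) {
    this.timeoutMs = timeoutMs;
  }

  /**
   * Record a heartbeat from a region server. In the proposal, a newly seen
   * server would also be written to the master local region (assumption:
   * that persistence step is omitted here).
   */
  void heartbeat(String serverName, long nowMs) {
    lastSeen.put(serverName, nowMs);
  }

  /**
   * Remove and return every server whose last heartbeat is older than the
   * timeout. A periodic daemon on the master would call this and then
   * schedule recovery for each returned server.
   */
  List<String> expire(long nowMs) {
    List<String> dead = new ArrayList<>();
    for (Map.Entry<String, Long> e : lastSeen.entrySet()) {
      boolean stale = nowMs - e.getValue() > timeoutMs;
      // Conditional remove: skip the server if a fresh heartbeat raced in.
      if (stale && lastSeen.remove(e.getKey(), e.getValue())) {
        dead.add(e.getKey());
      }
    }
    return dead;
  }

  /** Number of servers currently considered live. */
  int liveCount() {
    return lastSeen.size();
  }
}
```

Passing the current time in as a parameter keeps the sketch deterministic; a real daemon would presumably read a clock and run expire() on a fixed schedule.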



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
