[ https://issues.apache.org/jira/browse/HBASE-26245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17514493#comment-17514493 ]
LiangJun He commented on HBASE-26245:
-------------------------------------

I have a related problem, [HBASE-26898|https://issues.apache.org/jira/browse/HBASE-26898]. I will test with this patch to verify whether it solves my problem.

> Store region server list in master local region
> -----------------------------------------------
>
>                 Key: HBASE-26245
>                 URL: https://issues.apache.org/jira/browse/HBASE-26245
>             Project: HBase
>          Issue Type: Brainstorming
>          Components: master, Zookeeper
>            Reporter: Duo Zhang
>            Assignee: Duo Zhang
>            Priority: Major
>
> Just a simple idea that still needs to be polished.
> For large clusters, ZooKeeper could be a bottleneck. There are some related
> issues that aim to avoid tracking the region server list as much as possible,
> but what if we want to do more and just not register region servers on zk?
> I think ZooKeeper here acts as something like a service registry: we need
> the list to get all the region servers, and we also need to know about
> changes to the list.
> But in fact, we could also kill a region server from the master side, so the
> latter could be done by a periodic heartbeat check daemon on the master. And
> for the former, we could store the list in the master local region, so when
> the master restarts, it could set up the region server list by loading it
> from the master local region.
> There are also two main side effects, both of which are good:
> 1. We do not need to list the WAL directory on HDFS to find the previous
> region servers for scheduling SCP. This could make it possible to restart a
> new HBase cluster based on only the root directory.
> 2. For now, a region server needs to register with HMaster first and then put
> its node on ZooKeeper. If it fails between these two actions, there is no way
> for HMaster to clean up this dead server, as it never expires on ZooKeeper.
> There should be a related issue. If we just do not store a node on zk, then
> this problem is also gone.
> Of course, there will still be lots of other problems, like whether we need
> another heartbeat call, since reportForDuty is a bit heavy because we also
> report the region list, etc.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)