[ https://issues.apache.org/jira/browse/HBASE-26149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17396887#comment-17396887 ]
Michael Stack commented on HBASE-26149:
---------------------------------------

The one-pager helped. Thanks. I put it here as the Jira description, copying into the sub-task descriptions what was in this document but missing from the sub-task JIRAs' descriptions. Hopefully this makes it easier on others trying to follow along with what is going on here (I put some questions on the document for my own clarification). Thanks.

> Further improvements on ConnectionRegistry implementations
> ----------------------------------------------------------
>
>                 Key: HBASE-26149
>                 URL: https://issues.apache.org/jira/browse/HBASE-26149
>             Project: HBase
>          Issue Type: Umbrella
>          Components: Client
>            Reporter: Duo Zhang
>            Priority: Major
>
> (Copied in-line from the attached 'Documentation', with some filler as connecting script.)
>
> HBASE-23324 Deprecate clients that connect to Zookeeper
>
> ^^^ This has always been our goal: to remove the zookeeper dependency from the client side.
>
> See the sub-task HBASE-25051 DIGEST based auth broken for MasterRegistry.
>
> When constructing an RpcClient, we pass in the cluster id, and it is used to select the authentication method. More specifically, it is used to select the tokens for digest based authentication; see the code in BuiltInProviderSelector. For ZKConnectionRegistry we do not need an RpcClient to connect to zookeeper, so we can get the cluster id first and then create the RpcClient. But for MasterRegistry/RpcConnectionRegistry we need an RpcClient to connect to the ClientMetaService endpoints before we can call the getClusterId method to get the cluster id. Because of this, when creating the RpcClient for MasterRegistry/RpcConnectionRegistry we can only pass null or the default cluster id, which means digest based authentication is broken.
>
> This is a cyclic dependency problem. A possible way forward is to make the getClusterId method available to all users, i.e. it does not require any authentication, so we can always call getClusterId with simple authentication; then, at the client side, once we have the cluster id, we create a new RpcClient that selects the correct authentication method.
>
> The work in the sub-task HBASE-26150 Let region server also carry ClientMetaService makes it so the RegionServers can carry a ConnectionRegistry (rather than having only the Masters carry it, as is the case now). It adds a new method, getBootstrapNodes, to ClientMetaService (the ConnectionRegistry proto Service) for refreshing the bootstrap nodes periodically or on error. The new *RpcConnectionRegistry* [created here but defined in the next sub-task] will use this method to refresh the bootstrap nodes, while the old MasterRegistry will use the getMasters method to refresh its 'bootstrap' nodes.
>
> The getBootstrapNodes method will return all the region servers, so after the first refresh the client will go to the region servers for later rpc calls. But since masters and region servers both implement the ClientMetaService interface, the client is free to configure masters as the initial bootstrap nodes.
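A refresh loop of that shape might look roughly like the minimal sketch below. This is only an illustration of the idea; MetaEndpoint and BootstrapNodeRefresher are made-up names for the sketch, not the actual HBase classes or generated proto stubs.

    import java.util.List;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.atomic.AtomicReference;

    // Illustrative stand-in for a ClientMetaService endpoint (not the real proto stub).
    interface MetaEndpoint {
      // host:port of servers that carry ClientMetaService
      List<String> getBootstrapNodes() throws Exception;
    }

    // Keeps a cached list of bootstrap nodes and refreshes it on a fixed delay;
    // refreshNow() could also be called directly when an rpc against a node fails.
    class BootstrapNodeRefresher implements AutoCloseable {
      private final MetaEndpoint endpoint;
      private final AtomicReference<List<String>> nodes;
      private final ScheduledExecutorService scheduler =
          Executors.newSingleThreadScheduledExecutor();

      BootstrapNodeRefresher(MetaEndpoint endpoint, List<String> initialNodes, long intervalSecs) {
        this.endpoint = endpoint;
        this.nodes = new AtomicReference<>(initialNodes);
        // A non-positive interval simply disables periodic refresh.
        if (intervalSecs > 0) {
          scheduler.scheduleWithFixedDelay(this::refreshNow, intervalSecs, intervalSecs,
              TimeUnit.SECONDS);
        }
      }

      // Ask the endpoint for an up-to-date node list; keep the old list on failure.
      void refreshNow() {
        try {
          List<String> latest = endpoint.getBootstrapNodes();
          if (latest != null && !latest.isEmpty()) {
            nodes.set(latest);
          }
        } catch (Exception e) {
          // Keep the previous nodes; a real implementation would log and retry with backoff.
        }
      }

      List<String> currentNodes() {
        return nodes.get();
      }

      @Override
      public void close() {
        scheduler.shutdownNow();
      }
    }

The initial node list is whatever the client was configured with (masters or region servers), and the interval knob is the same kind of setting the later sub-tasks on the initial refresh interval (HBASE-26180) and on disabling refresh (HBASE-26182) are about.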
> The next sub-task, HBASE-26172 Deprecated MasterRegistry, then deprecates MasterRegistry. The implementation of MasterRegistry is almost the same as RpcConnectionRegistry, except that it uses getMasters instead of getBootstrapNodes to refresh the 'bootstrap' nodes it connects to. So we could add server-side configs to control which nodes we return to clients in getBootstrapNodes, i.e. masters or region servers, and then RpcConnectionRegistry can fully replace the old MasterRegistry. This sub-task deprecates MasterRegistry.
>
> Sub-task HBASE-26173 Return only a sub set of region servers as bootstrap nodes
>
> For a large cluster which may have thousands of region servers, it is not a good idea to return all the region servers as bootstrap nodes to clients. So we should add a config on the server side to control the maximum number of bootstrap nodes we want to return to clients. I think a default value of 5 or 10 would be enough.
>
> Sub-task HBASE-26174 Make rpc connection registry the default registry on 3.0.0
>
> Just a follow-up of HBASE-26172. Since MasterRegistry has been deprecated, we should not keep it as the default for 3.0.0 any more.
>
> Sub-task HBASE-26180 Introduce an initial refresh interval for RpcConnectionRegistry
>
> As end users could configure any nodes in a cluster as the initial bootstrap nodes, it is possible that different end users will configure the same machine, which would overload that machine. So we should have a shorter delay for the initial refresh, to let clients quickly switch to the bootstrap nodes we want them to connect to.
>
> Sub-task HBASE-26181 Region server and master could use itself as ConnectionRegistry
>
> This is an optimization to reduce the pressure on zookeeper. For MasterRegistry, we do not want to use it as the ConnectionRegistry for our cluster connection because:
>
>   // We use ZKConnectionRegistry for all the internal communication, primarily for these reasons:
>   // - Decouples RS and master life cycles. RegionServers can continue to be up independent of
>   //   masters' availability.
>   // - Configuration management for region servers (cluster internal) is much simpler when adding
>   //   new masters or removing existing masters, since only clients' config needs to be updated.
>   // - We need to retain ZKConnectionRegistry for replication use anyway, so we just extend it for
>   //   other internal connections too.
>
> The above comments are in our code, in the HRegionServer.cleanupConfiguration method. But now that masters and regionservers both implement the ClientMetaService interface, we are free to let the ConnectionRegistry use this in-memory information directly, instead of going to zookeeper again.
>
> Sub-task HBASE-26182 Allow disabling refresh of connection registry endpoint
>
> One possible deployment in production is to put something like LVS in front of all the region servers to act as a load balancer, so clients only need to connect to the LVS IP instead of going to the region servers directly to get registry information. For this scenario we do not need to refresh the endpoints any more. The simplest way is to set the refresh interval to -1.
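As a rough illustration of the HBASE-26173 idea of returning only a bounded subset of region servers, the server side could do something along these lines; the class and method names are made up for the sketch, and the real config key and default are whatever the sub-task settles on.

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    // Sketch only: hand out at most maxNodes servers, shuffled so different clients
    // get different subsets and no single server becomes a hot spot.
    class BootstrapNodeSelector {
      static List<String> selectBootstrapNodes(List<String> liveServers, int maxNodes) {
        List<String> shuffled = new ArrayList<>(liveServers);
        Collections.shuffle(shuffled);
        return shuffled.size() <= maxNodes
            ? shuffled
            : new ArrayList<>(shuffled.subList(0, maxNodes));
      }
    }

In the load-balancer deployment described just above, the client's configured bootstrap node is simply the LVS address and refresh is switched off with the -1 interval, so no node list ever needs to be handed back at all.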