[ 
https://issues.apache.org/jira/browse/HBASE-26149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack updated HBASE-26149:
----------------------------------
    Description: 
(Copied in-line from the attached 'Documentation' with some filler as 
connecting script)

HBASE-23324 Deprecate clients that connect to Zookeeper

^^^ This has always been our goal: to remove the ZooKeeper dependency from the 
client side.

 

See the sub-task HBASE-25051 DIGEST based auth broken for MasterRegistry

When constructing an RpcClient, we pass in the cluster id, which is used to 
select the authentication method. More specifically, it is used to select the 
tokens for digest-based authentication; see the code in 
BuiltInProviderSelector. ZKConnectionRegistry does not need an RpcClient to 
connect to ZooKeeper, so it can get the cluster id first and then create the 
RpcClient. MasterRegistry/RpcConnectionRegistry, however, must use an RpcClient 
to connect to the ClientMetaService endpoints before it can call the 
getClusterId method to get the cluster id. Because of this, when creating the 
RpcClient for MasterRegistry/RpcConnectionRegistry we can only pass null or the 
default cluster id, which means digest-based authentication is broken.

This is a cyclic dependency problem. One possible way forward is to make the 
getClusterId method available to all users, meaning it requires no 
authentication. We can then always call getClusterId with simple 
authentication and, once the client has the cluster id, create a new RpcClient 
that selects the correct authentication method.
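
To make the two-step idea concrete, here is a minimal sketch in plain Java. It 
is not HBase code: ClusterIdBootstrap, AuthMethod, selectAuth and bootstrap are 
hypothetical names, the Supplier stands in for an unauthenticated getClusterId 
call, and the map stands in for the tokens that would be looked up by cluster 
id.

```java
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical sketch; none of these types are HBase classes. */
final class ClusterIdBootstrap {
  /** Authentication methods the client could end up using. */
  enum AuthMethod { SIMPLE, DIGEST }

  /**
   * Picks DIGEST only when a token is stored for the given cluster id,
   * mirroring the idea that token selection is keyed by cluster id.
   */
  static AuthMethod selectAuth(String clusterId, Map<String, String> tokensByClusterId) {
    if (clusterId != null && tokensByClusterId.containsKey(clusterId)) {
      return AuthMethod.DIGEST;
    }
    // With a null or unknown cluster id no token can be matched.
    return AuthMethod.SIMPLE;
  }

  /**
   * Step 1: fetch the cluster id through a call that needs no authentication.
   * Step 2: select the authentication method for the real client with that id.
   */
  static AuthMethod bootstrap(Supplier<String> unauthenticatedGetClusterId,
      Map<String, String> tokensByClusterId) {
    String clusterId = unauthenticatedGetClusterId.get();
    return selectAuth(clusterId, tokensByClusterId);
  }
}
```

Passing null as the cluster id falls back to SIMPLE, which corresponds to the 
broken case described above; going through bootstrap first avoids it.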

The work in the sub-task HBASE-26150 Let region server also carry 
ClientMetaService makes it so that the RegionServers can also carry a 
ConnectionRegistry (rather than only the Masters carrying it, as is the case 
now). It adds a new method, getBootstrapNodes, to ClientMetaService, the 
ConnectionRegistry proto Service, for refreshing the bootstrap nodes 
periodically or on error. The new *RpcConnectionRegistry* [created here but 
defined in the next sub-task] will use this method to refresh the bootstrap 
nodes, while the old MasterRegistry will use the getMasters method to refresh 
the ‘bootstrap’ nodes.

The getBootstrapNodes method returns all the region servers, so after the 
first refresh the client goes to region servers for later RPC calls. But since 
masters and region servers both implement the ClientMetaService interface, the 
client is free to configure masters as the initial bootstrap nodes.
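
The refresh behaviour can be sketched as follows. This is an illustration 
only, not the RpcConnectionRegistry implementation: BootstrapNodeRefresher is a 
hypothetical class, the Function stands in for a getBootstrapNodes RPC sent to 
one node, and the host:port strings are made up.

```java
import java.util.List;
import java.util.function.Function;

/** Hypothetical sketch of a client-side bootstrap node set that refreshes itself. */
final class BootstrapNodeRefresher {
  // Current set of nodes the client would contact for registry information.
  private volatile List<String> nodes;
  // Stand-in for the getBootstrapNodes RPC: takes the node to ask, returns its answer.
  private final Function<String, List<String>> fetchBootstrapNodes;

  BootstrapNodeRefresher(List<String> initialNodes,
      Function<String, List<String>> fetchBootstrapNodes) {
    this.nodes = List.copyOf(initialNodes);
    this.fetchBootstrapNodes = fetchBootstrapNodes;
  }

  List<String> currentNodes() {
    return nodes;
  }

  /**
   * Asks the known nodes in order; the first non-empty answer replaces the
   * current set. Returns false and keeps the old set if every node fails.
   */
  boolean refresh() {
    for (String node : nodes) {
      try {
        List<String> fetched = fetchBootstrapNodes.apply(node);
        if (fetched != null && !fetched.isEmpty()) {
          nodes = List.copyOf(fetched);
          return true;
        }
      } catch (RuntimeException e) {
        // This node is unreachable; try the next known node.
      }
    }
    return false;
  }
}
```

A client started with only master addresses would, after one successful 
refresh, hold whatever node list the servers returned.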

The following sub-task then deprecates MasterRegistry, HBASE-26172 Deprecated 
MasterRegistry

The implementation of MasterRegistry is almost the same as 
RpcConnectionRegistry, except that it uses getMasters instead of 
getBootstrapNodes to refresh the ‘bootstrap’ nodes it connects to. So we could 
add server-side configs to control which nodes getBootstrapNodes returns to the 
client, i.e., masters or region servers; RpcConnectionRegistry can then fully 
replace the old MasterRegistry. This sub-task deprecates MasterRegistry.

Sub-task HBASE-26173 Return only a sub set of region servers as bootstrap nodes

For a large cluster, which may have thousands of region servers, it is not a 
good idea to return all of them as bootstrap nodes to clients. So we should add 
a server-side config to control the maximum number of bootstrap nodes returned 
to clients. I think a default value of 5 or 10 would be enough.
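
One way such a cap could work is to return a random sample of the region 
servers. The sketch below assumes random sampling, which the description does 
not specify; BootstrapNodeLimiter and pick are hypothetical names, not HBase 
APIs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch: cap the number of bootstrap nodes handed to a client. */
final class BootstrapNodeLimiter {
  /**
   * Returns at most maxNodes distinct entries chosen at random from
   * regionServers. The input list is not modified.
   */
  static List<String> pick(List<String> regionServers, int maxNodes, Random random) {
    if (maxNodes <= 0) {
      throw new IllegalArgumentException("maxNodes must be positive: " + maxNodes);
    }
    List<String> shuffled = new ArrayList<>(regionServers);
    Collections.shuffle(shuffled, random);
    int count = Math.min(maxNodes, shuffled.size());
    return new ArrayList<>(shuffled.subList(0, count));
  }
}
```

Sampling at random, rather than always returning the first few servers, 
spreads clients across the cluster instead of concentrating them on the same 
nodes.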

Sub-task HBASE-26174 Make rpc connection registry the default registry on 3.0.0

Just a follow-up to HBASE-26172. MasterRegistry has been deprecated, so it 
should no longer be the default for 3.0.0.

Sub-task HBASE-26180 Introduce a initial refresh interval for 
RpcConnectionRegistry

As end users can configure any nodes in a cluster as the initial bootstrap 
nodes, different end users may configure the same machine, which would 
overload it. So we should use a shorter delay for the initial refresh, to let 
users quickly switch to the bootstrap nodes we want them to connect to.
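
A minimal sketch of the timing, assuming the refresher keeps two separate 
delays. RefreshSchedule and delayBeforeRefresh are hypothetical names, and the 
values used to exercise it (for example 15 and 300 seconds) are illustrative, 
not HBase defaults.

```java
/** Hypothetical sketch: a short delay before the first refresh, a longer one afterwards. */
final class RefreshSchedule {
  private final long initialDelaySecs;
  private final long periodicIntervalSecs;

  RefreshSchedule(long initialDelaySecs, long periodicIntervalSecs) {
    if (initialDelaySecs < 0 || periodicIntervalSecs < 0) {
      throw new IllegalArgumentException("delays must not be negative");
    }
    this.initialDelaySecs = initialDelaySecs;
    this.periodicIntervalSecs = periodicIntervalSecs;
  }

  /** Seconds to wait before the refresh with the given zero-based index. */
  long delayBeforeRefresh(int refreshIndex) {
    // Only the very first refresh uses the short initial delay.
    return refreshIndex == 0 ? initialDelaySecs : periodicIntervalSecs;
  }
}
```

With new RefreshSchedule(15, 300), a client would leave its configured nodes 
after about 15 seconds and then refresh every 300 seconds.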

Sub-task HBASE-26181 Region server and master could use itself as 
ConnectionRegistry

This is an optimization to reduce the pressure on ZooKeeper. We do not want to 
use MasterRegistry as the ConnectionRegistry for our cluster connection 
because:

    // We use ZKConnectionRegistry for all the internal communication, primarily for these reasons:
    // - Decouples RS and master life cycles. RegionServers can continue be up independent of
    //   masters' availability.
    // - Configuration management for region servers (cluster internal) is much simpler when adding
    //   new masters or removing existing masters, since only clients' config needs to be updated.
    // - We need to retain ZKConnectionRegistry for replication use anyway, so we just extend it for
    //   other internal connections too.

The above comments are in our code, in the HRegionServer.cleanupConfiguration 
method.

But now that masters and region servers both implement the ClientMetaService 
interface, we are free to let the ConnectionRegistry use this in-memory 
information directly, instead of going to ZooKeeper again.

Sub-task HBASE-26182 Allow disabling refresh of connection registry endpoint

One possible production deployment is to put something like LVS in front of 
all the region servers as a load balancer (LB), so that clients only need to 
connect to the LVS IP instead of going to the region servers directly to get 
registry information.

For this scenario we do not need to refresh the endpoints any more.

The simplest way to disable the refresh is to set the refresh interval to -1.
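
A sketch of how a refresher might interpret that setting. RefreshPolicy and 
effectiveIntervalSecs are hypothetical names, and treating every negative 
value (not just -1) as "disabled" is an assumption made here for illustration.

```java
import java.util.OptionalLong;

/** Hypothetical sketch: interpret a configured refresh interval. */
final class RefreshPolicy {
  /**
   * Returns an empty value when refresh is disabled (negative interval,
   * e.g. -1), otherwise the interval in seconds to schedule refreshes with.
   */
  static OptionalLong effectiveIntervalSecs(long configuredSecs) {
    if (configuredSecs < 0) {
      // Disabled: the client keeps using its configured endpoint, such as a load balancer IP.
      return OptionalLong.empty();
    }
    return OptionalLong.of(configuredSecs);
  }
}
```

A caller would start the periodic refresh task only when the returned value is 
present.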

  was:
(Copied in-line from the attached 'Documentation' with some filler as 
connecting script)

HBASE-23324 Deprecate clients that connect to Zookeeper

^^^ This is always our goal, to remove the zookeeper dependency from the client 
side.

 

See the sub-task HBASE-25051 DIGEST based auth broken for MasterRegistry

When constructing RpcClient, we will pass the clusterid in, and it will be used 
to select the authentication method. More specifically, it will be used to 
select the tokens for digest based authentication, please see the code in 
BuiltInProviderSelector. For ZKConnectionRegistry, we do not need to use 
RpcClient to connect to zookeeper, so we could get the cluster id first, and 
then create the RpcClient. But for MasterRegistry/RpcConnectionRegistry, we 
need to use RpcClient to connect to the ClientMetaService endpoints and then we 
can call the getClusterId method to get the cluster id. Because of this, when 
creating RpcClient for MasterRegistry/RpcConnectionRegistry, we can only pass 
null or the default cluster id, which means the digest based authentication is 
broken.

This is a cyclic dependency problem. Maybe a possible way forward, is to make 
getClusterId method available to all users, which means it does not require any 
authentication, so we can always call getClusterId with simple authentication, 
and then at client side, once we get the cluster id, we create a new RpcClient 
to select the correct authentication way.

The work in the sub-task, HBASE-26150 Let region server also carry 
ClientMetaService, is work to make it so the RegionServers can carry a 
ConnectionRegistry (rather than have the Masters-only carry it as is the case 
now). Adds a new method getBootstrapNodes to ClientMetaService, the 
ConnectionRegistry proto Service, for refreshing the bootstrap nodes 
periodically or on error. The new *RpcConnectionRegistry*  [Created here but 
defined in the next sub-task]will use this method to refresh the bootstrap 
nodes, while the old MasterRegistry will use the getMasters method to refresh 
the ‘bootstrap’ nodes.

The getBootstrapNodes method will return all the region servers, so after the 
first refreshing, the client will go to region servers for later rpc calls. But 
since masters and region servers both implement the ClientMetaService 
interface, it is free for the client to configure master as the initial 
bootstrap nodes.

HBASE-26172 Deprecated MasterRegistry and allow getBootstrapNodes to return 
master address instead of region server

The implementation of MasterRegistry is almost the same with 
RpcConnectionRegistry except that it uses getMasters instead of 
getBootstrapNodes to refresh the ‘bootstrap’ nodes connected to. So we could 
add configs in server side to control what nodes we want to return to client in 
getBootstrapNodes, i.e, master or region server, then the RpcConnectionRegistry 
can fully replace the old MasterRegistry. So after this change, we could 
deprecate the MasterRegistry.
h1. HBASE-26173 Return only a sub set of region servers as bootstrap nodes

For a large cluster which may have thousands of region servers, it is not a 
good idea to return all the region servers as bootstrap nodes to clients. So we 
should add a config at server side to control the max number of bootstrap nodes 
we want to return to clients. I think the default value could be 5 or 10, which 
is enough.
h1. HBASE-26174 Make rpc connection registry the default registry on 3.0.0

Just a follow up of HBASE-26172. MasterRegistry has been deprecated, we should 
not make it default for 3.0.0 any more.
h1. HBASE-26180 Introduce a initial refresh interval for RpcConnectionRegistry

As end users could configure any nodes in a cluster as the initial bootstrap 
nodes, it is possible that different end users will configure the same machine 
which makes the machine over load. So we should have a shorter delay for the 
initial refresh, to let users quickly switch to the bootstrap nodes we want 
them to connect to.
h1. HBASE-26181 Region server and master could use itself as ConnectionRegistry

This is an optimization to reduce the pressure on zookeeper. For 
MasterRegistry, we do not want to use it as the ConnectionRegistry for our 
cluster connection because:

    // We use ZKConnectionRegistry for all the internal communication, 
primarily for these reasons:

    // - Decouples RS and master life cycles. RegionServers can continue be up 
independent of

    //   masters' availability.

    // - Configuration management for region servers (cluster internal) is much 
simpler when adding

    //   new masters or removing existing masters, since only clients' config 
needs to be updated.

    // - We need to retain ZKConnectionRegistry for replication use anyway, so 
we just extend it for

    //   other internal connections too.

The above comments are in our code, in the HRegionServer.cleanupConfiguration 
method.

But since now, masters and regionservers both implement the ClientMetaService 
interface, we are free to just let the ConnectionRegistry to make use of these 
in memory information directly, instead of going to zookeeper again.
h1. HBASE-26182 Allow disabling refresh of connection registry endpoint

One possible deployment in production is to use something like a lvs in front 
of all the region servers to act as a LB, so clients just need to connect to 
the lvs IP instead of going to the region server directly to get registry 
information.

For this scenario we do not need to refresh the endpoints any more.

The simplest way is to set the refresh interval to -1.


> Further improvements on ConnectionRegistry implementations
> ----------------------------------------------------------
>
>                 Key: HBASE-26149
>                 URL: https://issues.apache.org/jira/browse/HBASE-26149
>             Project: HBase
>          Issue Type: Umbrella
>          Components: Client
>            Reporter: Duo Zhang
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
