[ https://issues.apache.org/jira/browse/HBASE-18095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16962712#comment-16962712 ]
Bharath Vissapragada commented on HBASE-18095:
----------------------------------------------

I have a prototype patch that seems to work for me locally (basic sanity testing only), so I'm assigning this jira to myself. I propose splitting the work into 3 sub-tasks so that it is easy to review and (hopefully) easy to backport.

1. Server side changes: Implement an in-memory cache in the HMasters (both active and standby) that caches the HRLs (HRegionLocations) of all the meta replicas. We also need to install listeners that track changes to the meta znodes and invalidate the cache when meta state changes. Masters (both active and standby) then expose new RPC calls that let clients fetch this information.

2. Client side changes: Implement a new AsyncRegistry, replacing ZkAsyncRegistry, that parses the list of static master addresses from the client's local hbase-site.xml and uses the RPCs exposed in step (1) to determine which of the listed masters is active and to fetch the meta locations (tracked by all the masters). We can pick a random master to fetch the meta locations from, to avoid hot-spotting the active master.

3. Figure out a way to plumb an auth-less ClusterID for clients using delegation tokens.

I have a working patch for (1), and (2) is almost done (it needs cleanup and some test coverage). I still need to look into (3) and understand the problem better.

Coming to the backports, I think this should be doable on 2.x but could turn out to be tricky on 1.x (it probably needs to be redone completely) because the code has diverged quite a bit (no AsyncClient stuff in 1.x AFAICT).

[~apurtell] / others: thoughts? If the approach seems reasonable, I can put up my patches for review to get some initial feedback.
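To make the client-side idea in (2) concrete, here is a minimal sketch of just the "parse the static master list and pick one at random" part. It is not taken from the prototype patch: the class name `MasterAddressPicker`, its method names, and the comma-separated `host:port` format are assumptions made for illustration, and the RPC calls themselves are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical helper, not part of HBase: illustrates choosing a random
// master from a statically configured list to spread meta-location lookups.
class MasterAddressPicker {

    // Splits a comma-separated "host:port" list (assumed format) into
    // trimmed, non-empty entries.
    static List<String> parseMasters(String csv) {
        List<String> masters = new ArrayList<>();
        if (csv == null) {
            return masters;
        }
        for (String part : csv.split(",")) {
            String address = part.trim();
            if (!address.isEmpty()) {
                masters.add(address);
            }
        }
        return masters;
    }

    // Picks one configured master uniformly at random, so that lookups
    // are not all directed at the active master.
    static String pickRandom(List<String> masters, Random rng) {
        if (masters.isEmpty()) {
            throw new IllegalArgumentException("no master addresses configured");
        }
        return masters.get(rng.nextInt(masters.size()));
    }

    public static void main(String[] args) {
        // Example addresses are made up for illustration.
        List<String> masters = parseMasters(
            "master1.example.com:16000, master2.example.com:16000,master3.example.com:16000");
        System.out.println("Would ask " + pickRandom(masters, new Random())
            + " for the meta locations");
    }
}
```

Since every master (active or standby) would serve the cached meta locations, a uniform random choice is enough here; a real registry would presumably also need to retry against another master when the chosen one is unreachable.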

> Provide an option for clients to find the server hosting META that does not
> involve the ZooKeeper client
> --------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-18095
>                 URL: https://issues.apache.org/jira/browse/HBASE-18095
>             Project: HBase
>          Issue Type: New Feature
>          Components: Client
>            Reporter: Andrew Kyle Purtell
>            Priority: Major
>
> Clients are required to connect to ZooKeeper to find the location of the
> regionserver hosting the meta table region. Site configuration provides the
> client a list of ZK quorum peers and the client uses an embedded ZK client to
> query meta location. Timeouts and retry behavior of this embedded ZK client
> are managed orthogonally to HBase layer settings and in some cases the ZK
> cannot manage what in theory the HBase client can, i.e. fail fast upon outage
> or network partition.
> We should consider new configuration settings that provide a list of
> well-known master and backup master locations, and with this information the
> client can contact any of the master processes directly. Any master in either
> active or passive state will track meta location and respond to requests for
> it with its cached last known location. If this location is stale, the client
> can ask again with a flag set that requests the master refresh its location
> cache and return the up-to-date location. Every client interaction with the
> cluster thus uses only HBase RPC as transport, with appropriate settings
> applied to the connection. The configuration toggle that enables this
> alternative meta location lookup should be false by default.
> This removes the requirement that HBase clients embed the ZK client and
> contact the ZK service directly at the beginning of the connection lifecycle.
> This has several benefits. ZK service need not be exposed to clients, and
> their potential abuse, yet no benefit ZK provides the HBase server cluster is
> compromised. Normalizing HBase client and ZK client timeout settings and
> retry behavior - in some cases, impossible, i.e. for fail-fast - is no longer
> necessary.
> And, from [~ghelmling]: There is an additional complication here for
> token-based authentication. When a delegation token is used for SASL
> authentication, the client uses the cluster ID obtained from Zookeeper to
> select the token identifier to use. So there would also need to be some
> Zookeeper-less, unauthenticated way to obtain the cluster ID as well.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)