Cut the link between the client and the zookeeper ensemble
----------------------------------------------------------

                 Key: HBASE-5399
                 URL: https://issues.apache.org/jira/browse/HBASE-5399
             Project: HBase
          Issue Type: Improvement
          Components: client
    Affects Versions: 0.94.0
         Environment: all
            Reporter: nkeywal
            Assignee: nkeywal
            Priority: Minor


The link is often considered an issue, for various reasons. One of them 
is that there is a limit on the number of connections that ZK can manage. 
Stack also suggested removing the link to the master from HConnection.

There are choices to be made considering the existing API (that we don't want 
to break).

The first patches I will submit on hadoop-qa should not be committed: they are 
here to show the progress on the direction taken.


ZooKeeper is used for:
- public getter, to let the client do whatever it wants, and close ZooKeeper 
when closing the connection => we have to deprecate this but keep it.
- read the master address to create a master => now done with a temporary 
zookeeper connection
- read root location => now done with a temporary zookeeper connection, but 
questionable. Used in public function "locateRegion". To be reworked.
- read cluster id => now done once with a temporary zookeeper connection.
- check if the base znode is available => now done once with a zookeeper 
connection given as a parameter
- isTableDisabled/isTableAvailable => public functions, now done with a 
temporary zookeeper connection.
     - Called internally from HBaseAdmin and HTable
- getCurrentNrHRS(): public function to get the number of region servers and 
create a pool of threads => now done with a temporary zookeeper connection
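
To illustrate what "temporary zookeeper connection" means in the list above, 
here is a minimal sketch of the open/read/close pattern. It is not the code of 
the patch: CoordinationSession, SessionFactory and withTemporarySession are 
hypothetical names, the session is a stand-in that does not talk to ZooKeeper, 
and the path "/hbase/master" is only an example value.

```java
import java.util.function.Function;

public class TemporaryConnectionSketch {

    /** Hypothetical stand-in for a ZooKeeper session; not an HBase or ZooKeeper class. */
    interface CoordinationSession extends AutoCloseable {
        String read(String path);

        @Override
        void close(); // narrowed: no checked exception in this sketch
    }

    /** Hypothetical factory that opens a new session each time it is called. */
    interface SessionFactory {
        CoordinationSession open();
    }

    /** Opens a session, runs a single action, and always closes the session afterwards. */
    static <T> T withTemporarySession(SessionFactory factory,
                                      Function<CoordinationSession, T> action) {
        try (CoordinationSession session = factory.open()) {
            return action.apply(session);
        }
    }

    /** Runs one read through a fake session and reports how many sessions remain open. */
    static String demo() {
        int[] openSessions = {0};
        SessionFactory factory = () -> {
            openSessions[0]++;
            return new CoordinationSession() {
                @Override
                public String read(String path) {
                    return "value-of:" + path; // fake data, no network involved
                }

                @Override
                public void close() {
                    openSessions[0]--;
                }
            };
        };
        String master = withTemporarySession(factory, s -> s.read("/hbase/master"));
        return master + " open=" + openSessions[0];
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The point of the pattern is that no session outlives the call, so the client 
holds no permanent connection. The price is one connection setup per call.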

Master is used for:
- getMaster public getter, as for ZooKeeper => we have to deprecate this but 
keep it.
- isMasterRunning(): public function, used internally by HMerge & HBaseAdmin
- getHTableDescriptor*: public functions offering access to the master.  => we 
could make them use a temporary master connection as well.

Main points are:
- the hbase class for ZooKeeper, ZooKeeperWatcher, is really designed for a 
strongly coupled architecture ;-). This can be changed, but requires a lot of 
modifications in these classes (likely adding a class in the middle of the 
hierarchy, something like that). Anyway, a non-connected client will always be 
much slower, because each call needs a new tcp connection, and establishing a 
tcp connection is slow.
- having a link between ZK and all the clients seems to make sense for some use 
cases. However, it won't scale if a TCP connection is required for every client.
- if we move the table descriptor part away from the client, we need to find a 
new place for it.
- we will have the same issue with HBaseAdmin (for both ZK & Master); maybe we 
can put a timeout on the connection. That would make the whole system less 
deterministic however.
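
The timeout idea in the last point could look roughly like the sketch below: a 
connection that is opened lazily and closed once it has been idle for too long. 
This is only an illustration under assumptions, not a proposed implementation: 
ExpiringConnection and its members are hypothetical names, and the clock is 
injected so that the example runs without waiting.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

public class IdleTimeoutSketch {

    /** Hypothetical holder: opens lazily, closes when unused for at least timeoutMs. */
    static final class ExpiringConnection<C extends AutoCloseable> {
        private final Supplier<C> opener;
        private final LongSupplier clockMs;
        private final long timeoutMs;
        private C current;
        private long lastUsedMs;
        int opens; // number of times a real connection had to be opened

        ExpiringConnection(Supplier<C> opener, LongSupplier clockMs, long timeoutMs) {
            this.opener = opener;
            this.clockMs = clockMs;
            this.timeoutMs = timeoutMs;
        }

        /** Returns a live connection, reopening it if the previous one expired. */
        synchronized C get() throws Exception {
            closeIfIdle();
            if (current == null) {
                current = opener.get();
                opens++;
            }
            lastUsedMs = clockMs.getAsLong();
            return current;
        }

        /** Closes the connection if it has not been used for timeoutMs or more. */
        synchronized void closeIfIdle() throws Exception {
            if (current != null && clockMs.getAsLong() - lastUsedMs >= timeoutMs) {
                current.close();
                current = null;
            }
        }
    }

    /** Simulates three calls at t=0, t=500 and t=2000 with a 1000 ms timeout. */
    static int demo() throws Exception {
        long[] now = {0};
        Supplier<AutoCloseable> opener = () -> () -> { }; // fake connection, close() does nothing
        ExpiringConnection<AutoCloseable> conn =
                new ExpiringConnection<>(opener, () -> now[0], 1000);
        conn.get();    // first use: opens the connection
        now[0] = 500;
        conn.get();    // still fresh: reused
        now[0] = 2000;
        conn.get();    // idle for 1500 ms: closed and reopened
        return conn.opens;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("connections opened: " + demo());
    }
}
```

Whether a given call pays the reconnection cost depends on when the previous 
call happened, which is the loss of determinism mentioned above.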





        
