[ 
https://issues.apache.org/jira/browse/HBASE-3777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027281#comment-13027281
 ] 

M. C. Srivas commented on HBASE-3777:
-------------------------------------

bq. The thing is that a HConnection's behavior is determined not just by the 
server-side cluster it goes against, but also its client-side properties, such 
as "hbase.client.retries.number", "hbase.client.prefetch.limit", and so on. 
Ergo, we really need a different connection for every unique set of 
connection-specific config properties, whether it be client- or server-specific.

I am beginning to understand the reasons behind taking this approach. Thanks 
for explaining.

bq. As per the ZK/HBase use cases wiki, in theory we can have multiple masters 
registered with the ZK (to eliminate any SPOFs perhaps?). So, I'm not sure we 
can presuppose what hmaster we'll be going to at any given point in time.

Even in the presence of multiple hmasters, does it really matter if we connect 
back to the same hmaster? Which hmaster they connect to is probably important 
for the hmasters themselves (and perhaps for region-servers as well), but it 
should not matter for clients. Agree? (Of course, I am stating all this 
without knowing any details about HBase, so don't kill me for it.)

bq. The whole purpose of this patch was to reduce the number of connections by 
reusing them to the extent possible. At one point, the config's equals method 
was treated as the key to the connection, which promoted reuse to some extent, 
but started breaking down if the config was changed after the fact. Currently, 
the config's identity (object reference) is treated as the key, but that 
suffers from connection overload. Hopefully, the HConnectionKey defined in the 
HCM will serve as a happy medium between the two ends of the spectrum.
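
A minimal sketch of the "happy medium" described in the quote above, under 
stated assumptions: connections are cached under a key that snapshots only 
the connection-relevant properties, so copies of a config share a connection 
while configs that differ in those properties do not. The class and method 
names here are hypothetical, the config is modeled as a plain Map, and the 
property list is illustrative; this is not the actual HConnectionKey from the 
patch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class ConnectionKeySketch {
    // Assumed, illustrative list of connection-relevant properties.
    // The real HConnectionKey may use a different set.
    static final String[] KEY_PROPERTIES = {
        "hbase.zookeeper.quorum",
        "hbase.client.retries.number",
        "hbase.client.prefetch.limit"
    };

    /** Hypothetical key: an immutable snapshot of the selected properties. */
    static final class Key {
        private final Map<String, String> snapshot = new TreeMap<>();

        Key(Map<String, String> conf) {
            // Copy the values now, so later changes to conf cannot alter
            // a key that is already stored in the cache.
            for (String name : KEY_PROPERTIES) {
                String value = conf.get(name);
                if (value != null) {
                    snapshot.put(name, value);
                }
            }
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Key && snapshot.equals(((Key) o).snapshot);
        }

        @Override
        public int hashCode() {
            return snapshot.hashCode();
        }
    }

    // Stand-in for the connection cache; Object stands in for HConnection.
    static final Map<Key, Object> CONNECTIONS = new HashMap<>();

    static synchronized Object getConnection(Map<String, String> conf) {
        return CONNECTIONS.computeIfAbsent(new Key(conf), k -> new Object());
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("hbase.zookeeper.quorum", "zk1,zk2,zk3");
        conf.put("hbase.client.retries.number", "10");

        // A copy with an unrelated extra property maps to the same connection.
        Map<String, String> copy = new HashMap<>(conf);
        copy.put("some.unrelated.property", "x");

        // A config with a different retry count gets its own connection.
        Map<String, String> tuned = new HashMap<>(conf);
        tuned.put("hbase.client.retries.number", "3");

        System.out.println(getConnection(conf) == getConnection(copy));  // true
        System.out.println(getConnection(conf) == getConnection(tuned)); // false
    }
}
```

Compared with keying on the config's equals method, the snapshot keeps the 
key stable if the config is changed after the fact; compared with keying on 
object identity, clones of the same config reuse one connection.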


Ted Yu pointed out the work being done here, so I started reading the JIRA. I 
am not familiar with where/how the HConnection instance gets used, and this 
JIRA is long and hard to follow, with the code changes and all.

I started to comment on this JIRA because of the problems we faced trying to 
scale up the YCSB benchmark. We tried to run about 500 threads in the YCSB 
HBase client, and ran out of connections to ZK. It was a complete and 
unexpected surprise that the HBase client needed to maintain multiple 
connections to ZK, and it seemed to be using one per thread (i.e., per HTable).

We share the same goal: with this patch, we hope to be able to scale YCSB to 50 
client machines, with 500 threads per client, and see how HBase holds up.

Would you agree that, in the long run, the HBase client should use ZK only to 
find the hmaster and region-servers, but not keep the connection to ZK open? 
Otherwise ZK may go under as we try to scale the number of HBase clients.


> Redefine Identity Of HBase Configuration
> ----------------------------------------
>
>                 Key: HBASE-3777
>                 URL: https://issues.apache.org/jira/browse/HBASE-3777
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, ipc
>    Affects Versions: 0.90.2
>            Reporter: Karthick Sankarachary
>            Assignee: Karthick Sankarachary
>            Priority: Minor
>             Fix For: 0.92.0
>
>         Attachments: 3777-TOF.patch, HBASE-3777-V2.patch, 
> HBASE-3777-V3.patch, HBASE-3777-V4.patch, HBASE-3777-V6.patch, 
> HBASE-3777.patch
>
>
> Judging from the javadoc in {{HConnectionManager}}, sharing connections 
> across multiple clients going to the same cluster is supposedly a good thing. 
> However, the fact that there is a one-to-one mapping between a configuration 
> and connection instance, kind of works against that goal. Specifically, when 
> you create {{HTable}} instances using a given {{Configuration}} instance and 
> a copy thereof, we end up with two distinct {{HConnection}} instances under 
> the covers. Is this really expected behavior, especially given that the 
> configuration instance gets cloned a lot?
> Here, I'd like to play devil's advocate and propose that we "deep-compare" 
> {{HBaseConfiguration}} instances, so that multiple {{HBaseConfiguration}} 
> instances that have the same properties map to the same {{HConnection}} 
> instance. In case one is "concerned that a single {{HConnection}} is 
> insufficient for sharing amongst clients",  to quote the javadoc, then one 
> should be able to mark a given {{HBaseConfiguration}} instance as being 
> "uniquely identifiable".
> Note that "sharing connections makes clean up of {{HConnection}} instances a 
> little awkward", unless of course, you apply the change described in 
> HBASE-3766.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
