[ https://issues.apache.org/jira/browse/HBASE-3777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027281#comment-13027281 ]
M. C. Srivas commented on HBASE-3777:
-------------------------------------

bq. The thing is that a HConnection's behavior is determined not just by the server-side cluster it goes against, but also its client-side properties, such as "hbase.client.retries.number", "hbase.client.prefetch.limit", and so on. Ergo, we really need a different connection for every unique set of connection-specific config properties, whether it be client- or server-specific.

I am beginning to understand the reasons behind taking this approach. Thanks for explaining.

bq. As per the ZK/HBase use cases wiki, in theory we can have multiple masters registered with the ZK (to eliminate any SPOFs perhaps?). So, I'm not sure we can presuppose what hmaster we'll be going to at any given point in time.

Even in the presence of multiple hmasters, does it really matter if we connect back to the same hmaster? It probably is important for the hmasters themselves which hmaster they connect to (and perhaps for region-servers as well). But it should not matter for clients. Agree? (Of course, I am stating all this without knowing any details about HBase, so don't kill me for it.)

bq. The whole purpose of this patch was to reduce the number of connections by reusing them to the extent possible. At one point, the config's equals method was treated as the key to the connection, which promoted reuse to some extent, but started breaking down if the config was changed after the fact. Currently, the config's identity (object reference) is treated as the key, but that suffers from connection overload. Hopefully, the HConnectionKey defined in the HCM will serve as a happy medium between the two ends of the spectrum.

Ted Yu pointed out the work being done here, so I started reading the JIRA. I am not familiar with where/how the HConnection instance gets used, and this JIRA was pretty long to understand with the code changes and all.
I started to comment on this JIRA because of the problems we faced trying to scale up the YCSB benchmark. We tried to run about 500 threads in the YCSB HBase client and ran out of connections to ZK. It was a complete and unexpected surprise that the HBase client needed to maintain multiple connections to ZK, and it seemed to be using one per thread (i.e., per HTable).

We share the same goal: with this patch, we hope to be able to scale YCSB to 50 client machines, with 500 threads per client, and see how HBase holds up.

Would you agree that, in the long run, the HBase client should use ZK only to find the hmaster and region-servers, but not keep the connection to ZK open? Otherwise ZK may go under as we try to scale the number of HBase clients.

> Redefine Identity Of HBase Configuration
> ----------------------------------------
>
>                 Key: HBASE-3777
>                 URL: https://issues.apache.org/jira/browse/HBASE-3777
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, ipc
>    Affects Versions: 0.90.2
>            Reporter: Karthick Sankarachary
>            Assignee: Karthick Sankarachary
>            Priority: Minor
>             Fix For: 0.92.0
>
>         Attachments: 3777-TOF.patch, HBASE-3777-V2.patch,
> HBASE-3777-V3.patch, HBASE-3777-V4.patch, HBASE-3777-V6.patch,
> HBASE-3777.patch
>
>
> Judging from the javadoc in {{HConnectionManager}}, sharing connections
> across multiple clients going to the same cluster is supposedly a good thing.
> However, the fact that there is a one-to-one mapping between a configuration
> and connection instance, kind of works against that goal. Specifically, when
> you create {{HTable}} instances using a given {{Configuration}} instance and
> a copy thereof, we end up with two distinct {{HConnection}} instances under
> the covers. Is this really expected behavior, especially given that the
> configuration instance gets cloned a lot?
> Here, I'd like to play devil's advocate and propose that we "deep-compare"
> {{HBaseConfiguration}} instances, so that multiple {{HBaseConfiguration}}
> instances that have the same properties map to the same {{HConnection}}
> instance. In case one is "concerned that a single {{HConnection}} is
> insufficient for sharing amongst clients", to quote the javadoc, then one
> should be able to mark a given {{HBaseConfiguration}} instance as being
> "uniquely identifiable".
> Note that "sharing connections makes clean up of {{HConnection}} instances a
> little awkward", unless of course, you apply the change described in
> HBASE-3766.
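
The keying trade-off discussed in this thread (config equality vs. object identity vs. a key built only from connection-relevant properties) can be illustrated with a minimal sketch. It is not the code from the attached patches and uses no HBase classes: a plain `Map` stands in for `Configuration`, an `Object` stands in for `HConnection`, and `ConnectionKeySketch`, `ConnectionKey`, `CONNECTION_PROPERTIES`, and `getConnection` are hypothetical names. The two client-side property names come from the comment above; using "hbase.zookeeper.quorum" to represent the cluster identity is an assumption.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: plain maps stand in for Configuration objects,
// and a bare Object stands in for an HConnection.
class ConnectionKeySketch {

    // Properties assumed to affect connection behavior. The two client-side
    // names are quoted in the thread; "hbase.zookeeper.quorum" is an assumed
    // stand-in for the server-side (cluster) identity.
    static final List<String> CONNECTION_PROPERTIES = Arrays.asList(
            "hbase.zookeeper.quorum",
            "hbase.client.retries.number",
            "hbase.client.prefetch.limit");

    // Hypothetical key: a snapshot of only the connection-relevant properties,
    // taken when the key is built, so later config changes do not alter it.
    static final class ConnectionKey {
        private final Map<String, String> props = new TreeMap<>();

        ConnectionKey(Map<String, String> conf) {
            for (String name : CONNECTION_PROPERTIES) {
                String value = conf.get(name);
                if (value != null) {
                    props.put(name, value);
                }
            }
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof ConnectionKey
                    && props.equals(((ConnectionKey) o).props);
        }

        @Override
        public int hashCode() {
            return props.hashCode();
        }
    }

    // One shared "connection" per distinct key.
    static final Map<ConnectionKey, Object> CONNECTIONS = new HashMap<>();

    static synchronized Object getConnection(Map<String, String> conf) {
        return CONNECTIONS.computeIfAbsent(new ConnectionKey(conf), k -> new Object());
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("hbase.zookeeper.quorum", "zk1,zk2,zk3");
        conf.put("hbase.client.retries.number", "10");

        // A clone with an unrelated extra property, as in the issue description.
        Map<String, String> copy = new HashMap<>(conf);
        copy.put("some.unrelated.property", "x");

        // A config with different client-side behavior.
        Map<String, String> tuned = new HashMap<>(conf);
        tuned.put("hbase.client.retries.number", "3");

        System.out.println(getConnection(conf) == getConnection(copy));   // true
        System.out.println(getConnection(conf) == getConnection(tuned));  // false
    }
}
```

In this sketch a cloned configuration shares the connection of its original, while a configuration that differs in a connection-relevant property gets its own. Keying on object identity would instead give the clone a second connection, which is the overload described above.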