HTable or HConnectionManager, how a client connect to HBase?
Hi,

I am using HBase 0.98.6.

I learned from this mailing list that the recommended way to 'connect' to HBase from a client is to use HConnectionManager, like this:

    HConnection con = HConnectionManager.createConnection(configuration);
    HTableInterface table = con.getTable("hbase_table1");

instead of:

    HTableInterface table = new HTable(configuration, "hbase_table1");

I don't quite understand the reason. I was thinking that each time I initialize an HTable instance, it needs to create a new HConnection, and that is expensive. With the first method, multiple HTable instances can share the same HConnection. That is quite reasonable to me.

However, I read in some articles on the internet that even if I use the 'new HTable(conf, tbl)' method, all the HTable instances will still share the same HConnection if the 'conf' object is the same one. I recently read yet another article which said that when using 'new HTable(conf, tbl)', one doesn't need to use exactly the same 'conf' object (the same one in memory): if two different 'conf' objects are equal, I mean all attributes of the two are the same (for example, created from the same hbase-site.xml and never changed), then the HTable objects can still share the same HConnection. I also tried to read the HTable source code. It is very hard, but it seems to me the last statement is correct: 'HTable will share HConnection if the configuration is all the same'.

Sorry for being so verbose. My question:
If two 'configuration' objects are the same, can two HTable objects instantiated with them respectively, directly using the 'new HTable()' method, still share the same HConnection or not?
If the answer is 'yes', then why do I still need HConnectionManager to create a shared connection?
I am talking about 0.98.6.

I googled for days and even tried to read the HBase source code, but I am still really confused. I also tried to do some tests, but since I am a newbie, I don't know how to verify the difference; I really don't know what an HConnection does under the hood. I counted the ZooKeeper client requests and found some difference. If the difference in ZooKeeper requests is a valid metric, it means to me that two HTable instances do not share an HConnection even when using the same 'configuration' in the constructor. So it confused me more and more.

Could someone kindly help me with this newbie question? Thanks in advance.

Thanks,
Ming
RE: HTable or HConnectionManager, how a client connect to HBase?
Hi,

I had to spend a lot of time looking into the source code of HTable and HConnectionManager. IMHO, the documentation on the HBase website seems misleading.

The HBase online documentation (http://hbase.apache.org/book.html#architecture.client) says:

===
For example, this is preferred:

    HBaseConfiguration conf = HBaseConfiguration.create();
    HTable table1 = new HTable(conf, "myTable");
    HTable table2 = new HTable(conf, "myTable");

as opposed to this:

    HBaseConfiguration conf1 = HBaseConfiguration.create();
    HTable table1 = new HTable(conf1, "myTable");
    HBaseConfiguration conf2 = HBaseConfiguration.create();
    HTable table2 = new HTable(conf2, "myTable");
===

After checking the source code, it seems that only in the 0.20 code must HTable use the same Configuration instance in order to share the HConnection. 0.20 uses the Configuration instance as the key of a HashMap that stores the HConnections. I checked the 0.90.0 code, and it already uses HConnectionKey as the key of the HashMap that stores the shared HConnections. So as far as I understand, the documentation is NOT true for HBase 0.90 and later. Both of these examples can share an HConnection instance. If I am wrong, please correct me.

As for my previous question: if two HTables already share the HConnection, why do I need to create an HConnection first with HConnectionManager.createConnection()? From reading the source code, it seems that HTable.close() will also close the HConnection, so once one table is closed, the following HTable has to reconnect, and nothing is shared. But if the HTable is created by HConnection.getTable(), it uses a special HTable constructor that makes sure HTable.close() does NOT close the connection, so the HConnection can be shared.

I will use the recommended method. As discussed in another thread here, to share an HConnection one still has to ensure that the shared connection is not closed. So HConnectionManager is a good abstraction for controlling the life cycle of a connection.

I seem to understand now :-)

Thanks,
Ming

-----Original Message-----
From: Liu, Ming (HPIT-GADSC)
Sent: Saturday, February 14, 2015 10:45 PM
To: user@hbase.apache.org
Subject: HTable or HConnectionManager, how a client connect to HBase?

[...]
If two 'configuration' objects are the same, can two HTable objects instantiated with them respectively, directly using the 'new HTable()' method, still share the same HConnection or not?
If the answer is 'yes', then why do I still need HConnectionManager to create a shared connection?
I am talking about 0.98.6.
[...]
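The close() behaviour described in this message can be modelled without HBase at all. The sketch below is a plain-Java toy, not HBase code: SketchConnection, SketchTable and the closesConnection flag are hypothetical names chosen for illustration, and the real HTable bookkeeping in 0.98 is likely more involved than a single boolean. It only shows why a table that closes its connection on close() is awkward to share, while tables handed out by the connection leave the connection's lifecycle to the caller.

```java
// Toy model of the two table-creation styles discussed above (not HBase code).
final class SketchConnection implements AutoCloseable {
    private boolean closed = false;

    // Tables handed out by the connection never close it.
    SketchTable getTable(String name) {
        return new SketchTable(this, name, false);
    }

    boolean isClosed() {
        return closed;
    }

    @Override
    public void close() {
        closed = true;
    }
}

final class SketchTable implements AutoCloseable {
    private final SketchConnection connection;
    private final String name;
    // Hypothetical flag: true models a table that tears down its connection.
    private final boolean closesConnection;

    SketchTable(SketchConnection connection, String name, boolean closesConnection) {
        this.connection = connection;
        this.name = name;
        this.closesConnection = closesConnection;
    }

    // Stand-in for a read; fails once the underlying connection is gone.
    String get(String row) {
        if (connection.isClosed()) {
            throw new IllegalStateException("connection is closed");
        }
        return name + ":" + row;
    }

    @Override
    public void close() {
        if (closesConnection) {
            connection.close();
        }
    }
}

class TableCloseDemo {
    public static void main(String[] args) {
        SketchConnection shared = new SketchConnection();
        SketchTable t1 = shared.getTable("hbase_table1");
        SketchTable t2 = shared.getTable("hbase_table1");
        t1.close();                         // does not touch the connection
        System.out.println(t2.get("row1")); // t2 is still usable
        shared.close();                     // the caller decides when to close
    }
}
```

In this model, closing one table obtained through getTable() leaves the other usable, whereas two tables built with closesConnection set to true would break each other as soon as either one is closed.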
Re: HTable or HConnectionManager, how a client connect to HBase?
Hi,

You are right that the constructor new HTable(Configuration, ..) will share the underlying connection if the same configuration object is used. A Connection is a heavyweight object that holds the ZooKeeper connection, the RPC client, socket connections to multiple region servers and the master, the thread pool, etc. You definitely do not want to create multiple connections per process unless you know what you are doing.

The model has changed, and the old way of HTable(Configuration, ..) is deprecated because we want to make Connection lifecycle management explicit. In the new model, a Connection opened by the user is also closed by the user, and lightweight Table instances are obtained from the Connection. Having HTables share their connections implicitly makes reasoning about it too hard. The new model should be pretty easy to follow.

Enis

On Sat, Feb 14, 2015 at 6:45 AM, Liu, Ming (HPIT-GADSC) wrote:
> [...]
> If two 'configuration' objects are the same, can two HTable objects
> instantiated with them respectively, directly using the 'new HTable()'
> method, still share the same HConnection or not?
> If the answer is 'yes', then why do I still need HConnectionManager to
> create a shared connection?
> I am talking about 0.98.6.
> [...]
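The explicit lifecycle described in this reply maps naturally onto Java's try-with-resources. The following is a minimal sketch with stand-in classes: LoggedConnection and LoggedTable are hypothetical and are not HBase types. In the HBase 1.0 client the corresponding calls are ConnectionFactory.createConnection(conf) and Connection.getTable(TableName), which this sketch deliberately does not use so that it runs with the JDK alone.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for a heavyweight connection; it records what happens to it.
final class LoggedConnection implements AutoCloseable {
    private final List<String> events;

    LoggedConnection(List<String> events) {
        this.events = events;
        events.add("open connection"); // the expensive step, done once
    }

    // Tables are cheap handles that borrow the connection.
    LoggedTable getTable(String name) {
        events.add("get table " + name);
        return new LoggedTable(name, events);
    }

    @Override
    public void close() {
        events.add("close connection");
    }
}

final class LoggedTable implements AutoCloseable {
    private final String name;
    private final List<String> events;

    LoggedTable(String name, List<String> events) {
        this.name = name;
        this.events = events;
    }

    @Override
    public void close() {
        events.add("close table " + name); // never closes the connection
    }
}

class ExplicitLifecycleDemo {
    static List<String> run() {
        List<String> events = new ArrayList<>();
        // One connection for the whole unit of work, closed by whoever opened it.
        try (LoggedConnection connection = new LoggedConnection(events)) {
            // Short-lived tables are created and closed as needed.
            try (LoggedTable table = connection.getTable("hbase_table1")) {
                events.add("use table");
            }
            try (LoggedTable table = connection.getTable("hbase_table1")) {
                events.add("use table");
            }
        }
        return events;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

In a long-running service the connection would more likely be created at startup and closed at shutdown rather than inside a single method; the nesting above just makes the ownership visible.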
Re: HTable or HConnectionManager, how a client connect to HBase?
Hi Enis Söztutar,

You wrote:
>> You are right that the constructor new HTable(Configuration, ..) will share the underlying connection if same configuration object is used.

What does "the same" mean? Is equality checked by reference (Java ==) or with the equals(Object other) method?

2015-02-18 7:34 GMT+03:00 Enis Söztutar:
> You are right that the constructor new HTable(Configuration, ..) will share
> the underlying connection if same configuration object is used.
> [...]
Re: HTable or HConnectionManager, how a client connect to HBase?
It is a bit more complex than that. The key is actually a hash of a subset of the configuration properties. See the HConnectionKey class if you want to learn more.

But the important thing is that with the new style, you do not need to worry about any of this, since there is no implicit connection sharing. Everything is explicit now.

Enis

On Tue, Feb 17, 2015 at 11:50 PM, Serega Sheypak wrote:
> What does "the same" mean? Is equality checked by reference (Java ==)
> or with the equals(Object other) method?
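To make the "hash of a subset of the configuration properties" idea concrete, here is a small sketch of a connection cache keyed that way. Everything in it is an assumption made for illustration: SketchConnectionKey and SketchConnectionCache are hypothetical classes, a plain Map stands in for Configuration, and the three property names are only an assumed subset. The authoritative list is in the HConnectionKey class and may differ between versions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical cache key built from a subset of configuration properties.
final class SketchConnectionKey {
    // Assumed subset for illustration; see HConnectionKey for the real list.
    static final String[] KEY_PROPERTIES = {
        "hbase.zookeeper.quorum",
        "zookeeper.znode.parent",
        "hbase.zookeeper.property.clientPort"
    };

    private final Map<String, String> values = new TreeMap<>();

    SketchConnectionKey(Map<String, String> conf) {
        for (String name : KEY_PROPERTIES) {
            String value = conf.get(name);
            if (value != null) {
                values.put(name, value);
            }
        }
    }

    // Equality is by property values, not by the identity of the conf object.
    @Override
    public boolean equals(Object other) {
        return other instanceof SketchConnectionKey
            && values.equals(((SketchConnectionKey) other).values);
    }

    @Override
    public int hashCode() {
        return values.hashCode();
    }
}

// Hypothetical process-wide cache of shared connections.
final class SketchConnectionCache {
    private final Map<SketchConnectionKey, Object> connections = new HashMap<>();

    // Returns the cached "connection" for an equal key, creating it on first use.
    synchronized Object getConnection(Map<String, String> conf) {
        return connections.computeIfAbsent(new SketchConnectionKey(conf), key -> new Object());
    }
}

class ConnectionKeyDemo {
    public static void main(String[] args) {
        Map<String, String> conf1 = new HashMap<>();
        conf1.put("hbase.zookeeper.quorum", "zk1,zk2,zk3");
        Map<String, String> conf2 = new HashMap<>(conf1); // different object, same values

        SketchConnectionCache cache = new SketchConnectionCache();
        boolean shared = cache.getConnection(conf1) == cache.getConnection(conf2);
        System.out.println("shared across distinct conf objects: " + shared);
    }
}
```

Under this model the answer to the "== or equals()" question is neither applied to the configuration object itself: two distinct configuration objects map to the same connection whenever the selected properties match, and properties outside the subset do not affect sharing.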
RE: HTable or HConnectionManager, how a client connect to HBase?
Thanks, Enis,

Your reply is very clear. I finally understand it now.

Best Regards,
Ming

-----Original Message-----
From: Enis Söztutar
Sent: Thursday, February 19, 2015 10:41 AM
To: hbase-user
Subject: Re: HTable or HConnectionManager, how a client connect to HBase?

It is a bit more complex than that. It is actually a hash of some subset of the configuration properties. See HConnectionKey class if you want to learn more.
[...]