HTable or HConnectionManager, how a client connect to HBase?

2015-02-14 Thread Liu, Ming (HPIT-GADSC)
Hi,

I am using HBase 0.98.6.

I learned from this mailing list that the recommended way to 'connect' to
HBase from a client is to use HConnectionManager, like this:
HConnection con = HConnectionManager.createConnection(configuration);
HTableInterface table = con.getTable("hbase_table1");
instead of:
HTableInterface table = new HTable(configuration, "hbase_table1");

I don't quite understand the reason. I was thinking that each time I initialize
an HTable instance, it needs to create a new HConnection, and that is expensive.
With the first method, multiple HTable instances can share the same
HConnection. That seems quite reasonable to me.
However, I read in some articles on the internet that even if I use the
'new HTable(conf, tbl)' method, all the HTable instances will still share the
same HConnection as long as the 'conf' object is the same one. I recently read
yet another article which said that, when using 'new HTable(conf, tbl)', one
doesn't even need to use exactly the same 'conf' object (the same one in
memory). If two different 'conf' objects have the same content, I mean all
their attributes are the same (for example, both were created from the same
hbase-site.xml and never changed), then the HTable objects can still share the
same HConnection. I also tried to read the HTable source code. It is very hard,
but it seems to me that the last statement is correct: 'HTable will share
HConnection, if configuration is all the same'.

Sorry for being so verbose. My questions:
If two 'configuration' objects are the same, can two HTable objects
instantiated with them respectively, directly using the 'new HTable()' method,
still share the same HConnection or not?
If the answer is 'yes', then why do I still need HConnectionManager to create
a shared connection?
I am talking about 0.98.6.
I googled for days, and even tried to read the HBase source code, but I am
still really confused. I also tried to do some tests, but since I am a newbie,
I don't know how to verify the difference; I really don't know what an
HConnection does under the hood. I counted the ZooKeeper client requests, and
I found some difference. If this difference in ZooKeeper requests is a valid
metric, it means to me that two HTables do not share an HConnection even when
using the same 'configuration' in the constructor. So I am more and more
confused.

Could someone kindly help me with this newbie question? Thanks in advance.

Thanks,
Ming




RE: HTable or HConnectionManager, how a client connect to HBase?

2015-02-16 Thread Liu, Ming (HPIT-GADSC)
Hi,

I spent a lot of time looking into the source code of HTable and
HConnectionManager.
IMHO, the documentation on the HBase website seems misleading. The online
HBase book, at http://hbase.apache.org/book.html#architecture.client , says:
==
For example, this is preferred:

HBaseConfiguration conf = HBaseConfiguration.create();
HTable table1 = new HTable(conf, "myTable");
HTable table2 = new HTable(conf, "myTable");

as opposed to this:

HBaseConfiguration conf1 = HBaseConfiguration.create();
HTable table1 = new HTable(conf1, "myTable");
HBaseConfiguration conf2 = HBaseConfiguration.create();
HTable table2 = new HTable(conf2, "myTable");
==
After checking the source code, it seems that only in the 0.20 code must
HTable use the same Configuration instance in order to share the HConnection.
0.20 uses the configuration instance as the key of a HashMap that stores
HConnections. I checked the 0.90.0 code, and it already uses HConnectionKey as
the key of the HashMap that stores the shared HConnections.

So as far as I understand, the documentation is NOT true for HBase versions
later than 0.90. Both of these examples can share an HConnection instance. If
I am wrong, please correct me.

As for my previous question: if two HTables already share the HConnection, why
do I need to create an HConnection first with
HConnectionManager.createConnection()?
From reading the source code, it seems that HTable.close() will also close the
HConnection, so once one table does a close, the following HTables have to
reconnect: no sharing. But if the HTable is obtained from
HConnection.getTable(), it uses a special HTable constructor which makes sure
that HTable.close() will NOT close the connection when invoked. So the
HConnection can be shared.
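
The difference in close() behavior described above can be sketched with a small
stand-alone model. This is only an illustration under stated assumptions:
ToyConnection, ToyTable and the closeConnectionOnClose flag are hypothetical
names invented for the sketch, not HBase classes, and the real client code is
more involved.

```java
// Simplified model only. ToyConnection and ToyTable are hypothetical stand-ins,
// not HBase classes; the boolean flag is an assumption made for illustration.
final class CloseOwnershipSketch {

    /** Stands in for the heavyweight, shareable connection. */
    static final class ToyConnection {
        private boolean closed = false;

        boolean isClosed() {
            return closed;
        }

        void close() {
            closed = true;
        }

        /** Tables handed out by the connection never close it themselves. */
        ToyTable getTable(String name) {
            return new ToyTable(this, name, false);
        }
    }

    /** Stands in for a table handle bound to one connection. */
    static final class ToyTable {
        private final ToyConnection connection;
        private final String name;
        private final boolean closeConnectionOnClose;

        ToyTable(ToyConnection connection, String name, boolean closeConnectionOnClose) {
            this.connection = connection;
            this.name = name;
            this.closeConnectionOnClose = closeConnectionOnClose;
        }

        String getName() {
            return name;
        }

        /** Closes the connection too, but only if this table was told to own it. */
        void close() {
            if (closeConnectionOnClose) {
                connection.close();
            }
        }
    }

    /** Closes one table and reports whether the connection is still open. */
    static boolean connectionSurvivesTableClose(boolean closeConnectionOnClose) {
        ToyConnection connection = new ToyConnection();
        ToyTable table = new ToyTable(connection, "hbase_table1", closeConnectionOnClose);
        table.close();
        return !connection.isClosed();
    }

    public static void main(String[] args) {
        System.out.println("table that owns the connection: still open = "
                + connectionSurvivesTableClose(true));
        System.out.println("table from getTable():          still open = "
                + connectionSurvivesTableClose(false));
    }
}
```

In this model, a second table that wanted to reuse the connection after the
first case would find it closed, which is the "no sharing" situation; in the
second case the connection stays open until its owner closes it.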

I will use the recommended method. As discussed in another thread here, to
share an HConnection one still has to ensure that the shared connection is not
closed. So HConnectionManager is a good abstraction for controlling the life
cycle of a connection. I seem to understand now :-)

Thanks,
Ming






Re: HTable or HConnectionManager, how a client connect to HBase?

2015-02-17 Thread Enis Söztutar
Hi,

You are right that the constructor new HTable(Configuration, ..) will share
the underlying connection if the same configuration object is used. A
Connection is a heavyweight object that holds the ZooKeeper connection, the
RPC client, socket connections to multiple region servers and the master, the
thread pool, etc. You definitely do not want to create multiple connections
per process unless you know what you are doing.

The model has changed, and the old way of HTable(Configuration, ..) is
deprecated because we want to make Connection lifecycle management explicit.
In the new model, a Connection opened by the user is closed by the user
again, and lightweight Table instances are obtained from the Connection.
Having HTables share their connections implicitly makes reasoning about it
too hard. The new model should be pretty easy to follow.
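
As a rough illustration of what "explicit lifecycle" means in practice, the
following stand-alone sketch records the order of events when one connection
and two table handles are managed with Java's try-with-resources. ToyConnection
and ToyTable are hypothetical stand-ins invented for this sketch, not the HBase
Connection and Table types.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model only: the classes below are hypothetical, not HBase classes.
final class ExplicitLifecycleSketch {

    /** Heavyweight resource, opened and closed by the caller. */
    static final class ToyConnection implements AutoCloseable {
        private final List<String> events;

        ToyConnection(List<String> events) {
            this.events = events;
            events.add("open connection");
        }

        /** Hands out a lightweight handle that borrows this connection. */
        ToyTable getTable(String name) {
            events.add("get table " + name);
            return new ToyTable(name, events);
        }

        @Override
        public void close() {
            events.add("close connection");
        }
    }

    /** Lightweight handle; closing it leaves the connection alone. */
    static final class ToyTable implements AutoCloseable {
        private final String name;
        private final List<String> events;

        ToyTable(String name, List<String> events) {
            this.name = name;
            this.events = events;
        }

        @Override
        public void close() {
            events.add("close table " + name);
        }
    }

    /** Runs one open/use/close cycle and returns the recorded events in order. */
    static List<String> run() {
        List<String> events = new ArrayList<>();
        // Resources are closed in reverse order of declaration,
        // so the connection is always closed last.
        try (ToyConnection connection = new ToyConnection(events);
             ToyTable first = connection.getTable("table1");
             ToyTable second = connection.getTable("table2")) {
            events.add("use tables");
        }
        return events;
    }

    public static void main(String[] args) {
        for (String event : run()) {
            System.out.println(event);
        }
    }
}
```

The point of the pattern is that whoever opens the connection is also the one
who closes it, once, after every table handle obtained from it is done.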

Enis



Re: HTable or HConnectionManager, how a client connect to HBase?

2015-02-17 Thread Serega Sheypak
Hi, Enis Söztutar
You wrote:
>> You are right that the constructor new HTable(Configuration, ..) will
>> share the underlying connection if same configuration object is used.

What does "the same" mean? Is equality checked by reference (Java ==)
or using the equals(Object other) method?




Re: HTable or HConnectionManager, how a client connect to HBase?

2015-02-18 Thread Enis Söztutar
It is a bit more complex than that. It is actually a hash of some subset of
the configuration properties. See the HConnectionKey class if you want to
learn more. But the important thing is that with the new style, you do not
need to worry about any of this, since there is no implicit connection
sharing. Everything is explicit now.
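
To make "a hash of some subset of the configuration properties" concrete, here
is a minimal stand-alone sketch of a cache keyed that way. It is not the
HConnectionKey implementation: the class names are hypothetical, plain maps
stand in for Configuration objects, and the two properties chosen as the key
are an assumption for illustration (see HConnectionKey for the real list).

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simplified model only: not the HBase implementation.
final class ConnectionKeySketch {

    // Assumed subset of properties that identify a cluster in this sketch.
    static final List<String> KEY_PROPERTIES =
            Arrays.asList("hbase.zookeeper.quorum", "zookeeper.znode.parent");

    /** Builds a value-comparable key from the selected properties only. */
    static Map<String, String> keyOf(Map<String, String> conf) {
        Map<String, String> key = new TreeMap<>();
        for (String property : KEY_PROPERTIES) {
            if (conf.containsKey(property)) {
                key.put(property, conf.get(property));
            }
        }
        return Collections.unmodifiableMap(key);
    }

    /** Hands out one shared "connection" object per distinct key. */
    static final class ConnectionCache {
        private final Map<Map<String, String>, Object> connections = new HashMap<>();

        Object getConnection(Map<String, String> conf) {
            return connections.computeIfAbsent(keyOf(conf), k -> new Object());
        }

        int size() {
            return connections.size();
        }
    }

    public static void main(String[] args) {
        Map<String, String> confA = new HashMap<>();
        confA.put("hbase.zookeeper.quorum", "zk1");
        confA.put("example.unrelated.setting", "1"); // hypothetical, not part of the key

        Map<String, String> confB = new HashMap<>();
        confB.put("hbase.zookeeper.quorum", "zk1");
        confB.put("example.unrelated.setting", "2");

        ConnectionCache cache = new ConnectionCache();
        // Different objects, same key properties: the cached entry is reused.
        System.out.println(cache.getConnection(confA) == cache.getConnection(confB));
    }
}
```

In this model neither reference equality (==) nor full equals() on the
configuration decides sharing: two distinct configuration objects map to the
same entry whenever the selected properties match, and to different entries as
soon as one of them differs.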

Enis



RE: HTable or HConnectionManager, how a client connect to HBase?

2015-02-23 Thread Liu, Ming (HPIT-GADSC)
Thanks, Enis,

Your reply is very clear. I finally understand it now.

Best Regards,
Ming