RE: Some HBase FAQ

Puri, Aseem Tue, 14 Apr 2009 00:16:26 -0700

So it is possible, without loading a whole region, to have in memory only
the part of the data that is required.


Can you also suggest what I should do in the following situation:

-- My application will use HBase and will update a table frequently.
Which technique should I follow for write operations?

a. When there is an update, store the data temporarily in MySQL and then
do a bulk update on HBase after some time.

Or

b. When there is an update, write it directly to HBase instead of writing
it to MySQL.

What do you say: which approach is better optimized?

Thanks & Regards
Aseem Puri

-----Original Message-----
From: Ryan Rawson [mailto:ryano...@gmail.com] 
Sent: Tuesday, April 14, 2009 12:33 PM
To: hbase-user@hadoop.apache.org
Subject: Re: Some HBase FAQ

Only the part of the file on HDFS that is needed to serve the request is
read into memory. There is no need to hold the entire file in RAM.


-ryan
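A toy model of what Ryan describes: a sorted, immutable file is split into blocks, only a sparse index (the first key of each block) is kept in memory, and a lookup binary-searches that index and reads just the one block that could contain the key. This is a minimal sketch under stated assumptions, not HBase code: the Python list standing in for a file on HDFS, the block size, and the names `StoreFile` and `get` are all hypothetical.

```python
import bisect


class StoreFile:
    """Hypothetical toy model of a sorted store file split into blocks.

    The blocks stand in for data on HDFS; only the sparse index
    (first key of each block) is meant to live permanently in RAM.
    """

    def __init__(self, sorted_items, block_size=4):
        # Split the sorted (key, value) pairs into fixed-size blocks ("on disk").
        self._blocks = [
            sorted_items[i:i + block_size]
            for i in range(0, len(sorted_items), block_size)
        ]
        # Sparse index held in memory: the first key of each block.
        self.index = [block[0][0] for block in self._blocks]
        self.blocks_read = 0  # counts simulated reads from HDFS

    def _read_block(self, n):
        # Stand-in for a small, specific read from HDFS.
        self.blocks_read += 1
        return self._blocks[n]

    def get(self, key):
        # Find the last block whose first key is <= key.
        n = bisect.bisect_right(self.index, key) - 1
        if n < 0:
            return None  # key sorts before everything in the file
        for k, v in self._read_block(n):
            if k == key:
                return v
        return None


items = [(f"row{i:03d}", f"value{i}") for i in range(100)]
sf = StoreFile(items, block_size=10)
print(sf.get("row042"))                # value42
print(len(sf.index), sf.blocks_read)   # 10 1
```

Each lookup touches one block, so memory use in this model scales with the size of the index rather than with the size of the file.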

On Mon, Apr 13, 2009 at 11:56 PM, Puri, Aseem
<aseem.p...@honeywell.com> wrote:

>
> Ryan,
>
> Thanks for the update. Also, please tell me what happens for a read
> operation: is the required region brought into RAM or not?
>
> Thanks & Regards
> Aseem Puri
>
>
> -----Original Message-----
> From: Ryan Rawson [mailto:ryano...@gmail.com]
> Sent: Tuesday, April 14, 2009 12:23 PM
> To: hbase-user@hadoop.apache.org
> Subject: Re: Some HBase FAQ
>
> Yes, exactly. The regionserver loads the index on startup in one go and
> holds it in RAM; it can then use this index to do small, specific reads
> from HDFS.
>
> I found that in HBase 0.20 I was using about 700 kB of RAM per 5M rows
> with 40-byte values.
>
> -ryan
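The arithmetic below only restates Ryan's reported numbers (about 700 kB of RAM per 5M rows with 40-byte values) to show how small the per-row index cost is. It assumes 1 kB = 1000 bytes and counts only value bytes, ignoring row keys and per-cell overhead, so the real data size is larger and the real ratio smaller than shown.

```python
# Ryan's reported numbers (HBase 0.20, 40-byte values).
index_ram_bytes = 700 * 1000   # ~700 kB of RAM (assuming 1 kB = 1000 bytes)
rows = 5_000_000               # 5M rows
value_bytes = 40               # bytes per value

# Index RAM spent per row.
bytes_per_row = index_ram_bytes / rows

# Lower bound on data size: values only (keys and cell overhead ignored).
data_bytes = rows * value_bytes

# Index size as a fraction of that lower-bound data size.
ratio = index_ram_bytes / data_bytes

print(f"{bytes_per_row:.2f} bytes of index per row")  # 0.14
print(f"{data_bytes / 1e6:.0f} MB of values")           # 200
print(f"index is {ratio:.2%} of value bytes")          # 0.35%
```

Well under one byte of index per row is consistent with a sparse index (one entry per block rather than per row) and with the point that the index is a small fraction of the total data size.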
>
> On Mon, Apr 13, 2009 at 11:50 PM, Puri, Aseem
> <aseem.p...@honeywell.com> wrote:
>
> > Hi Ryan,
> >
> > It means the regionserver holds only the index of its regions, not the
> > actual data, which stays on HDFS.
> >
> > Thanks & Regards
> > Aseem Puri
> >
> > -----Original Message-----
> > From: Ryan Rawson [mailto:ryano...@gmail.com]
> > Sent: Tuesday, April 14, 2009 12:16 PM
> > To: hbase-user@hadoop.apache.org
> > Subject: Re: Some HBase FAQ
> >
> > HBase loads the index of the files on startup. If you ran out of memory
> > for those indexes (which are a fraction of the data size), you'd crash
> > with an OOME (OutOfMemoryError).
> >
> > The index is supposed to be a smallish fraction of the total data size.
> >
> > I wouldn't run with less than -Xmx2000m (a 2000 MB maximum Java heap).
> >
> > On Mon, Apr 13, 2009 at 10:48 PM, Puri, Aseem
> > <aseem.p...@honeywell.com> wrote:
> >
> > >
> > > -----Original Message-----
> > > From: Erik Holstad [mailto:erikhols...@gmail.com]
> > > Sent: Monday, April 13, 2009 9:47 PM
> > > To: hbase-user@hadoop.apache.org
> > > Subject: Re: Some HBase FAQ
> > >
> > > On Mon, Apr 13, 2009 at 7:12 AM, Puri, Aseem
> > > <aseem.p...@honeywell.com> wrote:
> > >
> > > > Hi
> > > >
> > > > I am a new HBase user. I have some doubts regarding the
> > > > functionality of HBase. I am working with HBase and things are going
> > > > fine, but I am not clear on how things work. Please help me by
> > > > answering these questions.
> > > >
> > > >
> > > >
> > > > 1.      I am inserting data into an HBase table and all regions get
> > > > balanced across the various regionservers. But what happens when the
> > > > data grows and there is not enough space in the regionservers to
> > > > accommodate all regions? Will some regions be on a regionserver and
> > > > some only on HDFS but not on any regionserver, or will the HBase
> > > > regionservers stop taking new data?
> > > >
> > > Not really sure what you mean here, but if you are asking what to do
> > > when you are running out of disk space on the regionservers, the answer
> > > is to add another machine or two.
> > >
> > > --- What I want to ask is this: the HBase regionserver stores region
> > > data on HDFS, so when the HBase master starts, it loads all region data
> > > from HDFS into the regionservers. What happens if there is not enough
> > > space in the regionservers to accommodate new data? Are some regions
> > > swapped out of a regionserver to make space for new regions, and
> > > swapped back in from HDFS when needed? Or does something else happen?
> > >
> > > >
> > > >
> > > >
> > > > 2.      When I insert data into an HBase table, 3 to 4 mapfiles are
> > > > generated for one category, but after some time all the mapfiles are
> > > > combined into one file. Is this what we call a minor compaction?
> > > >
> > > When all current mapfiles and the memcache are combined into one file,
> > > this is called a major compaction; see the BigTable paper for more
> > > details.
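As a rough illustration of "combined into one file", the sketch below merges several key-sorted mapfiles plus the memcache (modeled as one more sorted list) into a single sorted file, keeping the newest value when a key appears in more than one source. It is a toy model under simplifying assumptions (sources ordered oldest first, at most one value per key per source, no deletes, timestamps, or multiple versions) and is not HBase's compaction code; the function name `compact` is hypothetical.

```python
import heapq


def compact(sources):
    """Merge key-sorted (key, value) sources into one sorted list.

    `sources` is ordered oldest first; when a key appears in several
    sources, the value from the newest source wins. Toy model only:
    no deletes, timestamps, or multiple versions per cell.
    """
    # Tag each entry with its negated source age so that, for equal keys,
    # the entry from the newest source sorts first in the merge.
    tagged = [
        [(key, -age, value) for key, value in src]
        for age, src in enumerate(sources)
    ]
    merged = []
    last_key = object()  # sentinel that equals no real key
    for key, _neg_age, value in heapq.merge(*tagged):
        if key != last_key:  # first occurrence is the newest version
            merged.append((key, value))
            last_key = key
    return merged


mapfile1 = [("a", "1"), ("c", "1")]
mapfile2 = [("b", "2"), ("c", "2")]
memcache = [("a", "3"), ("d", "3")]
print(compact([mapfile1, mapfile2, memcache]))
# [('a', '3'), ('b', '2'), ('c', '2'), ('d', '3')]
```

Because every input is already sorted, the merge is a single streaming pass over all sources, which is why combining sorted files into one stays cheap relative to re-sorting the data.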
> > >
> > > >
> > > >
> > > >
> > > > 3.      My application will use HBase and will update a table
> > > > frequently. Should I use some other database, like MySQL, as an
> > > > intermediate store to hold data temporarily and then do a bulk update
> > > > on HBase, or should I do the updates directly on HBase? Please tell me
> > > > which technique will be better optimized for HBase.
> > > >
> > > HBase is fast for writes, which has so far been the main focus of
> > > development; with 0.20 we can hopefully add fast random reads as well,
> > > to make it a more well-rounded system. Is HBase too slow for you today
> > > when writing to it, and what are your requirements?
> > >
> > > ---- Basically, I asked this question about write operations; I have no
> > > complex requirements. I would like your suggestion on which technique
> > > I should follow for write operations:
> > >
> > > a. When there is an update, store the data temporarily in MySQL and
> > > then do a bulk update on HBase.
> > >
> > > b. When there is an update, write it directly to HBase instead of
> > > writing it to MySQL and doing a bulk update on HBase after some time.
> > >
> > > What do you say: which approach is better optimized?
> > >
> >
>
