Thanks Ryan for sharing your knowledge.

Thanks & Regards
Aseem Puri
-----Original Message-----
From: Ryan Rawson [mailto:ryano...@gmail.com]
Sent: Tuesday, April 14, 2009 1:20 PM
To: hbase-user@hadoop.apache.org
Subject: Re: Some HBase FAQ

Every time the memcache fills up, you get a flush - that is ~64 MB. But lots
of files reduce performance, so a minor compaction just merges them into one
file. Since all the files are sorted, a file-merge sort is fast and
efficient. This is a minor compaction. For major compactions one would need
to do more work to prune old values.

-ryan

On Tue, Apr 14, 2009 at 12:46 AM, Puri, Aseem <aseem.p...@honeywell.com> wrote:

> One more thing I want to ask: the minor compaction definition in the HBase
> documentation is "when the number of MapFiles exceeds a configurable
> threshold, a minor compaction is performed which consolidates the most
> recently written MapFiles".
>
> So when we insert data into an HBase table, 3 to 4 MapFiles are generated
> for one category, but after some time all the MapFiles are combined into
> one file. Is this what we actually call a minor compaction?
>
> Thanks & Regards
> Aseem Puri
>
> -----Original Message-----
> From: Ryan Rawson [mailto:ryano...@gmail.com]
> Sent: Tuesday, April 14, 2009 12:59 PM
> To: hbase-user@hadoop.apache.org
> Subject: Re: Some HBase FAQ
>
> The write-ahead log is there to recover in crash scenarios. Even if the
> regionserver crashes, recovery from the log will save you.
>
> But even so, during a controlled shutdown the regionserver flushes
> memcache -> disk. If the master dies, this flush should get your data to
> persistent disk.
>
> On Tue, Apr 14, 2009 at 12:24 AM, Puri, Aseem <aseem.p...@honeywell.com> wrote:
>
> > Actually I read that if the HBase master fails the cluster will shut
> > down, so I think in that case the data currently in memcache would also
> > be lost. So maybe it minimizes data loss. It's just what I am thinking;
> > if I am wrong about this please correct me.
> >
> > Thanks & Regards
> > Aseem Puri
> >
> > -----Original Message-----
> > From: Ryan Rawson [mailto:ryano...@gmail.com]
> > Sent: Tuesday, April 14, 2009 12:47 PM
> > To: hbase-user@hadoop.apache.org
> > Subject: Re: Some HBase FAQ
> >
> > I don't understand the rationale for MySQL buffering... HBase handles
> > writes well, it is not a weak point, so just write directly into HBase.
> >
> > -ryan
> >
> > On Tue, Apr 14, 2009 at 12:15 AM, Puri, Aseem <aseem.p...@honeywell.com> wrote:
> >
> > > So it is possible that, without loading the whole region, we can have
> > > just the part of the data in memory that is required.
> > >
> > > Can you also suggest what I should do in the following situation:
> > >
> > > -- My application, which will use HBase, will update a table
> > > frequently. I want your suggestion on which technique I should follow
> > > for write operations:
> > >
> > > a. If there is an update, store the data temporarily in MySQL and
> > > then do a bulk update on HBase after some time.
> > >
> > > Or
> > >
> > > b. If there is an update, update HBase directly instead of writing it
> > > to MySQL first.
> > >
> > > Which approach would you say is more optimized?
> > >
> > > Thanks & Regards
> > > Aseem Puri
> > >
> > > -----Original Message-----
> > > From: Ryan Rawson [mailto:ryano...@gmail.com]
> > > Sent: Tuesday, April 14, 2009 12:33 PM
> > > To: hbase-user@hadoop.apache.org
> > > Subject: Re: Some HBase FAQ
> > >
> > > Only a part of the file on HDFS is read into memory to serve the
> > > request. It is not required to hold the entire file in RAM.
> > >
> > > -ryan
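(For illustration: the kind of request Ryan describes here - reading only a
small part of a file to serve it - is, from the client side, just a
single-row get. The sketch below is written against the 0.19-era Java client
API; the "events" table and "data:payload" column are made up for the
example, exact method signatures vary between releases, and this is a sketch
rather than code from the thread.)

    import java.io.IOException;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.io.Cell;
    import org.apache.hadoop.hbase.util.Bytes;

    public class DirectRead {
        public static void main(String[] args) throws IOException {
            // Open the (hypothetical) "events" table using the cluster
            // configuration found on the classpath (hbase-site.xml).
            HTable table = new HTable(new HBaseConfiguration(), "events");

            // A single-row, single-column read. The regionserver uses the
            // index it holds in RAM to seek to the small part of the file
            // on HDFS that contains this row; the whole file is never
            // pulled into memory.
            Cell cell = table.get("row-0001", "data:payload");
            if (cell != null) {
                System.out.println(Bytes.toString(cell.getValue()));
            }
        }
    }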
> > >
> > > On Mon, Apr 13, 2009 at 11:56 PM, Puri, Aseem <aseem.p...@honeywell.com> wrote:
> > >
> > > > Ryan,
> > > >
> > > > Thanks for the update. Also, please tell me what happens on a read
> > > > operation - is the required region brought into RAM or not?
> > > >
> > > > Thanks & Regards
> > > > Aseem Puri
> > > >
> > > > -----Original Message-----
> > > > From: Ryan Rawson [mailto:ryano...@gmail.com]
> > > > Sent: Tuesday, April 14, 2009 12:23 PM
> > > > To: hbase-user@hadoop.apache.org
> > > > Subject: Re: Some HBase FAQ
> > > >
> > > > Yes, exactly. The regionserver loads the index on start-up in one
> > > > go and holds it in RAM - then it can use this index to do small,
> > > > specific reads from HDFS.
> > > >
> > > > I found that in hbase 0.20 I was using about 700 kB of RAM per 5
> > > > million rows of 40-byte values.
> > > >
> > > > -ryan
> > > >
> > > > On Mon, Apr 13, 2009 at 11:50 PM, Puri, Aseem <aseem.p...@honeywell.com> wrote:
> > > >
> > > > > Hi Ryan,
> > > > >
> > > > > So it means the regionserver holds only the index files of its
> > > > > regions, but not the actual data, which stays on HDFS.
> > > > >
> > > > > Thanks & Regards
> > > > > Aseem Puri
> > > > >
> > > > > -----Original Message-----
> > > > > From: Ryan Rawson [mailto:ryano...@gmail.com]
> > > > > Sent: Tuesday, April 14, 2009 12:16 PM
> > > > > To: hbase-user@hadoop.apache.org
> > > > > Subject: Re: Some HBase FAQ
> > > > >
> > > > > HBase loads the index of the files on start-up; if you ran out of
> > > > > memory for those indexes (which are a fraction of the data size),
> > > > > you'd crash with an OOME.
> > > > >
> > > > > The index is supposed to be a smallish fraction of the total data
> > > > > size.
> > > > >
> > > > > I wouldn't run with less than -Xmx2000m.
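(To put the index-size numbers in rough perspective - this is only an
extrapolation from Ryan's figure of ~700 kB per 5 million rows above, not a
new measurement - a billion 40-byte values would need on the order of

    (1,000,000,000 / 5,000,000) x 700 kB = 200 x 700 kB ~= 140 MB

of index, versus roughly 40 GB of raw data. That is a fraction of a percent,
which is why the indexes fit comfortably inside a 2 GB heap while the data
itself stays on HDFS.)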
> > > > >
> > > > > On Mon, Apr 13, 2009 at 10:48 PM, Puri, Aseem <aseem.p...@honeywell.com> wrote:
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Erik Holstad [mailto:erikhols...@gmail.com]
> > > > > > Sent: Monday, April 13, 2009 9:47 PM
> > > > > > To: hbase-user@hadoop.apache.org
> > > > > > Subject: Re: Some HBase FAQ
> > > > > >
> > > > > > On Mon, Apr 13, 2009 at 7:12 AM, Puri, Aseem <aseem.p...@honeywell.com> wrote:
> > > > > >
> > > > > > > Hi
> > > > > > >
> > > > > > > I am a new HBase user and I have some doubts about how HBase
> > > > > > > works. Things are going fine, but I am not clear on how they
> > > > > > > are happening. Please help me by answering these questions.
> > > > > > >
> > > > > > > 1. I am inserting data into an HBase table and all regions
> > > > > > > get balanced across the various regionservers. But what
> > > > > > > happens when the data grows and there is not enough space on
> > > > > > > the regionservers to accommodate all the regions? Would some
> > > > > > > regions then live only on HDFS and not on any regionserver,
> > > > > > > or would the regionservers stop taking new data?
> > > > > > >
> > > > > > Not really sure what you mean here, but if you are asking what
> > > > > > to do when you are running out of disk space on the
> > > > > > regionservers, the answer is to add another machine or two.
> > > > > >
> > > > > > --- What I want to ask is this: HBase regionservers store
> > > > > > region data on HDFS, so when the HBase master starts, it loads
> > > > > > all the region data from HDFS onto the regionservers. So what
> > > > > > is the scenario if there is not enough space on the
> > > > > > regionservers to accommodate new data? Are some regions swapped
> > > > > > out of a regionserver to make room for new regions and swapped
> > > > > > back in from HDFS when needed, or does something else happen?
> > > > > >
> > > > > > > 2. When I insert data into an HBase table, 3 to 4 MapFiles
> > > > > > > are generated for one category, but after some time all the
> > > > > > > MapFiles are combined into one file. Is this what we actually
> > > > > > > call a minor compaction?
> > > > > > >
> > > > > > When all the current MapFiles and the memcache are combined
> > > > > > into one file, that is called a major compaction; see the
> > > > > > BigTable paper for more details.
> > > > > >
> > > > > > > 3. My application, which will use HBase, will update a table
> > > > > > > frequently. Should I use some other database, such as MySQL,
> > > > > > > as an intermediate store and then do bulk updates on HBase,
> > > > > > > or should I do the updates directly on HBase? Please tell me
> > > > > > > which technique is more optimized for HBase.
> > > > > > >
> > > > > > HBase is fast for reads, which has so far been the main focus
> > > > > > of development; with 0.20 we can hopefully add even faster
> > > > > > random reading to make it a more well-rounded system. Is HBase
> > > > > > too slow for you today when writing to it, and what are your
> > > > > > requirements?
> > > > > >
> > > > > > ---- Basically I asked this question about write operations.
> > > > > > There is no complex requirement. I want your suggestion on
> > > > > > which technique I should follow for writes:
> > > > > >
> > > > > > a. If there is an update, store the data temporarily in MySQL
> > > > > > and then do a bulk update on HBase.
> > > > > >
> > > > > > b. If there is an update, update HBase directly instead of
> > > > > > writing it to MySQL and doing a bulk update on HBase later.
> > > > > >
> > > > > > Which approach would you say is more optimized?
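(Following Ryan's advice earlier in the thread to write directly into HBase:
a direct write with the 0.19-era Java client is only a few lines. The sketch
below is illustrative only - the "events" table and "data:payload" column
are made up, and exact method names vary between releases.)

    import java.io.IOException;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.io.BatchUpdate;
    import org.apache.hadoop.hbase.util.Bytes;

    public class DirectWrite {
        public static void main(String[] args) throws IOException {
            // Connect to the (hypothetical) "events" table via the cluster
            // configuration picked up from hbase-site.xml on the classpath.
            HTable table = new HTable(new HBaseConfiguration(), "events");

            // Each update is keyed by row; columns are "family:qualifier".
            BatchUpdate update = new BatchUpdate("row-0001");
            update.put("data:payload", Bytes.toBytes("some value"));

            // commit() sends the edit to the regionserver, which records it
            // in the write-ahead log and memcache; flushes and compactions
            // happen later in the background.
            table.commit(update);
        }
    }

Since each commit already goes through the write-ahead log and memcache on
the regionserver, there is no need for an external buffer such as MySQL in
front of HBase for this kind of workload.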