During compaction the region is not out of service.
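
For reference, the 1500-second figure quoted below can be reproduced with a
back-of-the-envelope calculation. This is only a sketch: the function name is
made up for illustration, it assumes 1 GB = 1024 MB, and it assumes the read
and the write of the region happen one after the other at the quoted 30 MB/s.
It estimates I/O time only, not time out of service.

```python
def transfer_seconds(region_gb: float, throughput_mb_s: float, passes: int = 2) -> float:
    """Seconds needed to push a region through the FS `passes` times.

    passes=2 models one full read plus one full write (an assumption).
    """
    region_mb = region_gb * 1024  # assumption: 1 GB = 1024 MB
    return passes * region_mb / throughput_mb_s


secs = transfer_seconds(20, 30)
print(f"{secs:.0f} s, about {secs / 60:.0f} min")
```

With these assumptions the result is roughly 1365 seconds (about 23 minutes),
which is in line with the rough 1500 seconds in the quoted message.
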
According to the documentation, the maximum region size for the V2 format is 20G.
And now the question: assuming that 20G is the limit and that the number of
regions on a single RS should stay low (< 500), does it mean there is no point
in having an RS with more than 10TB of storage for HBase to use (otherwise
locality will not be achieved for some servers; I also assume that
compression is used and that it therefore compensates for the additional
space needed for replication)?
If the maximum number of regions per RS is smaller, then the usable storage is
even smaller. Is that correct?
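
To make the arithmetic behind the question explicit, here is a minimal sketch.
The 20G region limit and the 500-region ceiling are the figures from this
message; the function name and the 1 TB = 1000 GB convention are my own
assumptions, and the lower region counts are just examples.

```python
def max_hbase_storage_tb(max_region_gb: float, max_regions_per_rs: int) -> float:
    """Upper bound on HBase data one region server can serve, in TB.

    Assumes every region is at the size limit and 1 TB = 1000 GB.
    """
    return max_region_gb * max_regions_per_rs / 1000.0


# 500 is the ceiling mentioned above; 250 and 100 are illustrative values.
for regions in (500, 250, 100):
    print(regions, "regions ->", max_hbase_storage_tb(20, regions), "TB")
```

With 20G regions this gives 10 TB at 500 regions, 5 TB at 250 and 2 TB at 100,
so the bound shrinks linearly with the region count.
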

Mikael.S

On Sun, Feb 19, 2012 at 6:38 PM, M. C. Srivas <[email protected]> wrote:

> What is the impact when a compaction happens on a large 20G region?   Given
> that the FS will do writes at 30 MB/s (over a single 1 GigE link), it will
> take about 1500 seconds to read/write the region. Is the region out of
> service for 25 mins (= 1500 seconds)?
>
>
> On Fri, Feb 17, 2012 at 11:25 PM, Pan, Thomas <[email protected]> wrote:
>
> >
> > Jacques, thanks for the details on region size. We've observed that
> > regions per region server could skew big time at the table level. We do
> > have a tool to balance regions. Still, it is sort of annoying to maintain
> > the balance. $0.02, -Thomas
> >
> > On 2/17/12 2:46 PM, "Jacques" <[email protected]> wrote:
> >
> > >You should be fine having multiple tables with high region counts.  I
> > >would avoid making thousands of tables.  However, if you have three
> > >separate business needs, make three different tables.
> > >
> > >You seem to be starting from the perspective that there would be some
> > >kind of issue with multiple tables.  Why do you think this exists?  You
> > >said "Otherwise, runtime tuning seems to add quite an amount of
> > >operational cost."  I'm not sure what you are thinking here and where
> > >your thoughts are coming from.  Additionally, if you have separate
> > >tables, then you can modify them differently (e.g. setting them to
> > >different region sizes if it makes sense; for example, some of our tables
> > >have smaller region sizes so we'll have more maps rather than fewer when
> > >we run map reduce jobs).
> > >
> > >Regarding region size: the HFile v1 format in 0.90 and below suffered
> > >from taking a long time to transition as individual regions got too big.
> > >With 0.92 and HFile v2 that isn't as much of a problem, as I understand
> > >it.  If I recall correctly, there are numerous organizations using 10 GB
> > >regions with success (among others, I believe this is what Yahoo reported
> > >they were using for their web crawl tables on their thousand-node
> > >cluster).  While I haven't run any stats, I believe that there is
> > >negligible scan performance impact as region size grows.  There is
> > >definitely no exponential negative performance impact.
> > >
> > >
> > >
> > >On Fri, Feb 17, 2012 at 10:55 AM, Pan, Thomas <[email protected]> wrote:
> > >
> > >>
> > >> Vladimir and Jacques, thanks for the information! Unless HBase handles
> > >> multiple big tables (relatively high region counts) well in one
> > >> cluster, it seems to me that one big table is the way to go. Otherwise,
> > >> runtime tuning seems to add quite an amount of operational cost. That
> > >> leads to another question: do we see big region size as an issue? If
> > >> so, what's the pivot point beyond which, as region size grows further,
> > >> scan performance starts to degrade exponentially?
> > >>
> > >> On 2/15/12 4:11 PM, "Vladimir Rodionov" <[email protected]> wrote:
> > >>
> > >> >10 tables are fine. 1000 are not, especially when one does table
> > >> >pre-splitting to increase write perf.
> > >> >
> > >> >Too many regions kill HBase.
> > >> >
> > >> >Best regards,
> > >> >Vladimir Rodionov
> > >> >Principal Platform Engineer
> > >> >Carrier IQ, www.carrieriq.com
> > >> >e-mail: [email protected]
> > >> >
> > >> >________________________________________
> > >> >From: Jacques [[email protected]]
> > >> >Sent: Wednesday, February 15, 2012 3:45 PM
> > >> >To: [email protected]
> > >> >Subject: Re: Scan performance on a big table as combination of multiple logic tables
> > >> >
> > >> >Out of curiosity, what do you perceive as the benefit of having only
> > >> >one table?  Are there reasons that you think one table would perform
> > >> >better than a few?
> > >> >
> > >> >If you're splitting data within a table because you'd otherwise have
> > >> >millions of tables, I understand that and would concur with Vladimir's
> > >> >approach below.  However, if you're really looking at 10 tables versus
> > >> >one table, it seems like HBase is built exactly to make that work well
> > >> >(rather than having to write all sorts of application-level code to do
> > >> >what HBase already does).
> > >> >
> > >> >thanks,
> > >> >Jacques
> > >> >
> > >> >On Wed, Feb 15, 2012 at 1:57 PM, Pan, Thomas <[email protected]> wrote:
> > >> >
> > >> >>
> > >> >> Since HBase is tailored to handle one table very well, we are
> > >> >> thinking of putting multiple tables into one big table but in
> > >> >> different column family sets. Our use case is full table scans with
> > >> >> single column value filters. As records from different "logical
> > >> >> tables" are in different column families, could we speed up the scan
> > >> >> by simply checking the column family referenced by these single
> > >> >> column value filters first, before really going through all the
> > >> >> underlying K-V pairs? It would be great if the HBase code already
> > >> >> works that way.
> > >> >>
> > >> >>
> > >> >> $0.02,
> > >> >> Thomas
> > >> >>
> > >> >>
> > >> >
> > >>
> > >>
> >
> >
>



-- 
Mikael.S
