The clocks are all running in sync, though shamefully I am not using NTP. I
should.

And no, I listed the errors backwards; that's not the order they showed up in
the log, sorry, heh. I don't think the clocks run backwards.
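
For anyone who wants to check log ordering rather than eyeball it, here is a
minimal sketch. It assumes the regionserver log lines start with a
"YYYY-MM-DD HH:MM:SS,mmm" timestamp, as the two lines quoted further down this
thread do. The helper names (parse_timestamp, out_of_order) are made up for
illustration and are not part of HBase.

```python
from datetime import datetime

# The two regionserver log lines quoted in this thread, cut down to the
# timestamp, level, and logger name, in the order they were pasted.
log_lines = [
    "2009-12-22 15:21:50,415 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache:",
    "2009-12-22 15:20:35,073 DEBUG org.apache.hadoop.hbase.regionserver.Store:",
]


def parse_timestamp(line):
    """Parse the leading 23-character 'YYYY-MM-DD HH:MM:SS,mmm' timestamp."""
    return datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S,%f")


def out_of_order(lines):
    """Return (previous, current) timestamp pairs where time goes backwards."""
    stamps = [parse_timestamp(line) for line in lines]
    return [(a, b) for a, b in zip(stamps, stamps[1:]) if b < a]


# Sorting by the parsed timestamp recovers the chronological order.
chronological = sorted(log_lines, key=parse_timestamp)

if __name__ == "__main__":
    print("backwards steps as pasted:", len(out_of_order(log_lines)))
    print("backwards steps after sorting:", len(out_of_order(chronological)))
```

A backwards step in a pasted excerpt only says the lines were listed out of
order. Running the same check over the raw log file on each regionserver is
what would actually hint at a clock problem.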

-----Original Message-----
From: Andrew Purtell [mailto:apurt...@apache.org]
Sent: Wednesday, December 23, 2009 12:47 PM
To: hbase-user@hadoop.apache.org
Subject: Re: Smaller Region Size?

How do you have clocks set up on your systems Mark? Are you using NTP to keep
them sane? Am I correct that they are sometimes running backward?


   - Andy



----- Original Message ----
> From: Mark Vigeant <mark.vige...@riskmetrics.com>
> To: "hbase-user@hadoop.apache.org" <hbase-user@hadoop.apache.org>
> Sent: Wed, December 23, 2009 9:09:04 AM
> Subject: RE: Smaller Region Size?
>
> > The biggest legitimate reason to run smaller region size is if your
> > data set is small (let's say 400MB) but highly accessed, so you want a
> > good spread of regions across your cluster.
>
> That's exactly it, my input dataset was 500MB total (~1,000,000 rows) and it
> was getting stored as just one region on one regionserver.
>
> In response to St. Ack, I don't think my regions are performing too many
> splits: the regionserver logs mainly consist of the occasional ZooKeeper
> Connection error and these two repeatedly:
>
> 2009-12-22 15:21:50,415 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache:
> Cache Stats: Sizes: Total=6.556961MB (6875472), Free=792.61804MB (831120240),
> Max=799.175MB (837995712), Counts: Blocks=0, Access=25755, Hit=0, Miss=25755,
> Evictions=0, Evicted=0, Ratios: Hit Ratio=0.0%, Miss Ratio=100.0%,
> Evicted/Run=NaN
>
> 2009-12-22 15:20:35,073 DEBUG org.apache.hadoop.hbase.regionserver.Store:
> Skipping major compaction of Message because one (major) compacted file only
> and elapsedTime 339624149ms is < ttl=9223372036854775807
>
> You're suggesting the performance would be improved if the dataset was larger?
> What are other parameters that can be fine-tuned to optimize based off data
> size?
>
> Thanks
> -Mark
> -----Original Message-----
> From: Ryan Rawson [mailto:ryano...@gmail.com]
> Sent: Tuesday, December 22, 2009 11:28 PM
> To: hbase-user@hadoop.apache.org
> Subject: Re: Smaller Region Size?
>
> The biggest legitimate reason to run smaller region size is if your
> data set is small (let's say 400MB) but highly accessed, so you want a
> good spread of regions across your cluster.
>
> Another is to run a larger region size if you have a huge table and
> you want to keep absolute region count low. I am not 100% sold on this
> yet.
>
> I have a patch that can keep performance high during a highly split
> table, by using parallel puts. This has been proven to keep aggregate
> performance really high, and I hope it will make 0.20.3.
>
> On Tue, Dec 22, 2009 at 2:31 PM, stack wrote:
> > On Tue, Dec 22, 2009 at 8:57 AM, Mark Vigeant
> > wrote:
> >
> >> J-D,
> >>
> >> I noticed that performance for uploading data into tables got a lot better
> >> as I lowered the max file size -- but up until a certain point, where the
> >> performance began slowing down again.
> >>
> >>
> > Tell us more.  What kinda size changes did you make?  How many regions were
> > created?  Is the slow down because the table is splitting all the time?  If
> > you study regionserver logs, can you make out what the regionservers are
> > spending their time doing?
> >
> >
> >
> >> Is there a rule of thumb/formula/notion to rely on when setting this
> >> parameter for optimal performance? Thanks!
> >>
> >>
> > We have most experience running defaults.  Generally folks go up from the
> > default size because they want to host more data in about the same number of
> > regions.  Going down from the default I've not seen much of.
> >
> > St.Ack
> >
>
> This email message and any attachments are for the sole use of the intended
> recipients and may contain proprietary and/or confidential information which
> may be privileged or otherwise protected from disclosure. Any unauthorized
> review, use, disclosure or distribution is prohibited. If you are not an
> intended recipient, please contact the sender by reply email and destroy the
> original message and any copies of the message as well as any attachments to
> the original message.
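
On the region-size question quoted above (a ~500MB dataset ending up as one
region on one regionserver), a back-of-the-envelope sketch may help when
picking a max file size. It assumes the setting being tuned is
hbase.hregion.max.filesize and that region count is roughly data size divided
by that limit. Real splitting depends on store file sizes, compression, and
write pattern, so treat the numbers as a rough guide only. The function names
are hypothetical.

```python
import math

MB = 1024 * 1024


def estimate_region_count(data_bytes, max_region_bytes):
    """Rough estimate: regions needed if none may exceed max_region_bytes.

    This ignores how HBase actually decides to split, so it is only a
    ballpark figure. A table always has at least one region.
    """
    if max_region_bytes <= 0:
        raise ValueError("max_region_bytes must be positive")
    return max(1, math.ceil(data_bytes / max_region_bytes))


def max_size_for_spread(data_bytes, target_regions):
    """Largest max region size (bytes) that yields about target_regions regions."""
    if target_regions <= 0:
        raise ValueError("target_regions must be positive")
    return data_bytes // target_regions


if __name__ == "__main__":
    data = 500 * MB  # dataset size mentioned in the thread
    for limit_mb in (1024, 256, 64):  # example limits, not recommendations
        regions = estimate_region_count(data, limit_mb * MB)
        print(f"max size {limit_mb}MB -> about {regions} region(s)")
    # Size limit needed to spread the data over roughly 10 regions.
    print(max_size_for_spread(data, 10) // MB, "MB")
```

Going much lower than needed for the spread you want means more splits during
upload, which lines up with the slowdown reported earlier in the thread once
the max file size dropped past a certain point.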






