Keeping the clocks synchronized will save you a lot of trouble, since by default the region server sets the version of a cell to System.currentTimeMillis(). Let's say you delete a value, the region gets reassigned minutes later to another region server whose clock is running 60 minutes in the past, and then you do a Get on that cell with the default timestamp. The Get will translate to a time before the previous delete, so you will get a deleted cell back (unless a major compaction has run).
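To make the scenario concrete, here is a rough sketch in Python. It is a toy model of the timestamp rule described above, not the HBase API: `ToyCell`, `server_clock`, and the specific times are all made up for illustration, and a read is simply modelled as "as of" the timestamp the serving machine believes is now.

```python
MINUTE_MS = 60 * 1000


class ToyCell:
    """Toy versioned cell; a simplified model, not the HBase API."""

    def __init__(self):
        self.versions = {}  # timestamp in ms -> value
        self.deletes = []   # timestamps of delete markers

    def put(self, value, ts):
        self.versions[ts] = value

    def delete(self, ts):
        # A delete marker hides every version with timestamp <= ts.
        self.deletes.append(ts)

    def get(self, as_of):
        """Read as of `as_of`: newer puts and newer delete markers are ignored."""
        visible_deletes = [d for d in self.deletes if d <= as_of]
        cutoff = max(visible_deletes, default=-1)
        live = [ts for ts in self.versions if cutoff < ts <= as_of]
        return self.versions[max(live)] if live else None


def server_clock(true_time_ms, skew_ms=0):
    """Hypothetical stand-in for the timestamp a region server would assign."""
    return true_time_ms + skew_ms


cell = ToyCell()
t0 = 10_000 * MINUTE_MS  # arbitrary "true" time of the delete

cell.put("v1", server_clock(t0 - 120 * MINUTE_MS))  # written two hours earlier
cell.delete(server_clock(t0))                       # deleted on a well-synced server

# The region moves; the read happens five (true) minutes after the delete.
t_read = t0 + 5 * MINUTE_MS
in_sync = cell.get(server_clock(t_read))
skewed = cell.get(server_clock(t_read, skew_ms=-60 * MINUTE_MS))

print(in_sync)  # -> None (the delete marker is visible)
print(skewed)   # -> v1   (the read lands before the delete marker)
```

With a well-synced clock the delete marker masks the old version; with the clock 60 minutes behind, the read resolves to a time before the marker and the old value comes back. A major compaction is not modelled here.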
So a minor clock skew is OK, but more than 20 minutes is asking for trouble. This requirement is documented in the Getting Started documentation.

J-D

On Thu, Dec 24, 2009 at 8:17 AM, Dhruba Borthakur <dhr...@gmail.com> wrote:

> Hi folks,
>
> Is it necessary to keep the clocks synchronized on all HBase region
> servers/master? I would appreciate it a lot if somebody can please explain
> if the HBase architecture depends on this fact.
>
> thanks,
> dhruba
>
>
> On Wed, Dec 23, 2009 at 9:57 AM, Mark Vigeant
> <mark.vige...@riskmetrics.com> wrote:
>
>> The clocks are all running in sync, though I am not using NTP, shamefully. I
>> should.
>>
>> And no, I listed the errors backwards, that's not how they showed up in the
>> log, sorry, heh. I don't think they run backwards.
>>
>> -----Original Message-----
>> From: Andrew Purtell [mailto:apurt...@apache.org]
>> Sent: Wednesday, December 23, 2009 12:47 PM
>> To: hbase-user@hadoop.apache.org
>> Subject: Re: Smaller Region Size?
>>
>> How do you have clocks set up on your systems, Mark? Are you using NTP to
>> keep them sane? Am I correct that they are sometimes running backward?
>>
>>
>> - Andy
>>
>>
>>
>> ----- Original Message ----
>> > From: Mark Vigeant <mark.vige...@riskmetrics.com>
>> > To: "hbase-user@hadoop.apache.org" <hbase-user@hadoop.apache.org>
>> > Sent: Wed, December 23, 2009 9:09:04 AM
>> > Subject: RE: Smaller Region Size?
>> >
>> > > The biggest legitimate reason to run smaller region size is if your
>> > > data set is small (let's say 400mb) but highly accessed, so you want a
>> > > good spread of regions across your cluster.
>> >
>> > That's exactly it, my input dataset was 500MB total (~1,000,000 rows) and
>> > it was getting stored as just one region on one regionserver.
>> >
>> > In response to St.Ack, I don't think my regions are performing too many
>> > splits: the regionserver logs mainly consist of the occasional ZooKeeper
>> > Connection error and these two repeatedly:
>> >
>> > 2009-12-22 15:21:50,415 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache:
>> > Cache Stats: Sizes: Total=6.556961MB (6875472), Free=792.61804MB (831120240),
>> > Max=799.175MB (837995712), Counts: Blocks=0, Access=25755, Hit=0, Miss=25755,
>> > Evictions=0, Evicted=0, Ratios: Hit Ratio=0.0%, Miss Ratio=100.0%,
>> > Evicted/Run=NaN
>> >
>> > 2009-12-22 15:20:35,073 DEBUG org.apache.hadoop.hbase.regionserver.Store:
>> > Skipping major compaction of Message because one (major) compacted file only and
>> > elapsedTime 339624149ms is < ttl=9223372036854775807
>> >
>> > You're suggesting the performance would be improved if the dataset was
>> > larger? What are other parameters that can be fine-tuned to optimize based
>> > off data size?
>> >
>> > Thanks
>> > -Mark
>> > -----Original Message-----
>> > From: Ryan Rawson [mailto:ryano...@gmail.com]
>> > Sent: Tuesday, December 22, 2009 11:28 PM
>> > To: hbase-user@hadoop.apache.org
>> > Subject: Re: Smaller Region Size?
>> >
>> > The biggest legitimate reason to run smaller region size is if your
>> > data set is small (let's say 400mb) but highly accessed, so you want a
>> > good spread of regions across your cluster.
>> >
>> > Another is to run a larger region if you are having a huge table and
>> > you want to keep absolute region count low. I am not 100% sold on this
>> > yet.
>> >
>> > I have a patch that can keep performance high during a highly split
>> > table, by using parallel puts. This has been proven to keep aggregate
>> > performance really high, and I hope it will make 0.20.3.
>> >
>> > On Tue, Dec 22, 2009 at 2:31 PM, stack wrote:
>> > > On Tue, Dec 22, 2009 at 8:57 AM, Mark Vigeant
>> > > wrote:
>> > >
>> > >> J-D,
>> > >>
>> > >> I noticed that performance for uploading data into tables got a lot
>> > >> better as I lowered the max file size -- but up until a certain point,
>> > >> where the performance began slowing down again.
>> > >>
>> > >>
>> > > Tell us more. What kinda size changes did you make? How many regions
>> > > were created? Is the slow down because table is splitting all the time?
>> > > If you study regionserver logs, can you make out what the regionservers
>> > > are spending their time doing?
>> > >
>> > >
>> > >
>> > >> Is there a rule of thumb/formula/notion to rely on when setting this
>> > >> parameter for optimal performance? Thanks!
>> > >>
>> > >>
>> > > We have most experience running defaults. Generally folks go up from
>> > > the default size because they want to host more data in about same
>> > > number of regions. Going down from the default I've not seen much of.
>> > >
>> > > St.Ack
>> > >
>> >
>> > This email message and any attachments are for the sole use of the intended
>> > recipients and may contain proprietary and/or confidential information which
>> > may be privileged or otherwise protected from disclosure. Any unauthorized
>> > review, use, disclosure or distribution is prohibited. If you are not an
>> > intended recipient, please contact the sender by reply email and destroy the
>> > original message and any copies of the message as well as any attachments to
>> > the original message.
>
> --
> Connect to me at http://www.facebook.com/dhruba