Thoughts on a client-facing call to explicitly request a WAL sync? I could then turn on DEFERRED_LOG_FLUSH (possibly leave it on always), run a batch of inserts, and then issue an explicit flush/sync. The return of that call would guarantee to the client that all data written up to that point is safe.
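To make the proposal concrete, here is a toy sketch in Java of the semantics I have in mind. It is an in-memory model, not HBase code: ToyWal and its append, sync, and crash methods are hypothetical names invented for illustration, and "durable" here just means "moved to a second list".

```java
import java.util.ArrayList;
import java.util.List;

// Toy in-memory model of a write-ahead log. NOT HBase code:
// every name in this sketch is hypothetical.
class ToyWal {
    private final List<String> durable = new ArrayList<>(); // edits covered by a sync
    private final List<String> pending = new ArrayList<>(); // appended, not yet synced
    private final boolean deferredLogFlush;

    ToyWal(boolean deferredLogFlush) {
        this.deferredLogFlush = deferredLogFlush;
    }

    // Append one edit. With deferred flush off, every append syncs immediately.
    void append(String edit) {
        pending.add(edit);
        if (!deferredLogFlush) {
            sync();
        }
    }

    // Explicit client-facing sync: once this returns, every edit appended
    // so far is durable. Returns how many edits this call made durable.
    int sync() {
        int count = pending.size();
        durable.addAll(pending);
        pending.clear();
        return count;
    }

    // Simulate a regionserver crash: unsynced edits are lost.
    // Returns how many edits were lost.
    int crash() {
        int lost = pending.size();
        pending.clear();
        return lost;
    }

    // Copy of the edits that would survive a crash.
    List<String> durableEdits() {
        return new ArrayList<>(durable);
    }
}

class ToyWalDemo {
    public static void main(String[] args) {
        ToyWal wal = new ToyWal(true);      // deferred flush left on
        for (int i = 0; i < 1000; i++) {
            wal.append("row-" + i);         // fast path, no sync per edit
        }
        int synced = wal.sync();            // the explicit call proposed above
        wal.append("row-late");             // written after the sync returned
        int lost = wal.crash();
        System.out.println(synced + " synced, " + lost + " lost, "
                + wal.durableEdits().size() + " durable");
    }
}
```

The point of the sketch is the contract, not the mechanism: the client pays for one sync per batch instead of one per edit, and only edits appended after the last sync returned are at risk.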
JG

On Mon, November 16, 2009 11:00 am, Jean-Daniel Cryans wrote:
> I added a new feature for tables called "deferred flush", see
> https://issues.apache.org/jira/browse/HBASE-1944
>
> My opinion is that the default should be paranoid enough to not lose
> any user data. If we can change a table's attribute without taking it down
> (there's a jira on that), wouldn't that solve the import problem?
>
> For example: have some table that needs to have fast insertion via MR.
> During the creation of the job, you change the table's
> DEFERRED_LOG_FLUSH to "true", then run the job and finally set the
> value to false when the job is done.
>
> This way you still pass the responsibility to the user but for
> performance reasons.
>
> J-D
>
> On Mon, Nov 16, 2009 at 2:05 AM, Cosmin Lehene <[email protected]> wrote:
>
>> We could have a speedy default and an extra parameter for puts that
>> would specify a flush is needed. This way you pass the responsibility to
>> the user and he can decide if he needs to be paranoid or not. This could
>> be part of Put and even specify granularity of the flush if needed.
>>
>> Cosmin
>>
>> On 11/15/09 6:59 PM, "Andrew Purtell" <[email protected]> wrote:
>>
>>> I agree with this.
>>>
>>> I also think we should leave the default as is with the caveat that
>>> we call out the durability versus write performance tradeoff in the
>>> flushlogentries description and up on the wiki somewhere, maybe on
>>> http://wiki.apache.org/hadoop/PerformanceTuning . We could also
>>> provide two example configurations, one for performance (reasonable
>>> tradeoffs), one for paranoia. I put up an issue:
>>> https://issues.apache.org/jira/browse/HBASE-1984
>>>
>>> - Andy
>>>
>>> ________________________________
>>> From: Ryan Rawson <[email protected]>
>>> To: [email protected]
>>> Sent: Sat, November 14, 2009 11:22:13 PM
>>> Subject: Re: Should we change the default value of
>>> hbase.regionserver.flushlogentries for 0.21?
>>>
>>> That sync at the end of a RPC is my doing. You dont want to sync
>>> every _EDIT_, after all, the previous definition of the word "edit"
>>> was each KeyValue. So we could be calling sync for every single
>>> column in a row. Bad stuff.
>>>
>>> In the end, if the regionserver crashes during a batch put, we will
>>> never know how much of the batch was flushed to the WAL. Thus it makes
>>> sense to only do it once and get a massive, massive, speedup.
>>>
>>> On Sat, Nov 14, 2009 at 9:45 PM, stack <[email protected]> wrote:
>>>
>>>> I'm for leaving it as it is, at every 100 edits -- maybe every 10
>>>> edits? Speed stays as it was. We used to lose MBs. By default,
>>>> we'll now lose 99 or 9 edits max.
>>>>
>>>> We need to do some work bringing folks along regardless of what we
>>>> decide. Flush happens at the end of the put up in the regionserver.
>>>> If you are doing a batch of commits -- e.g. using a big write buffer
>>>> over on your client -- the puts will only be flushed on the way out
>>>> after the batch put completes EVEN if you have configured hbase to
>>>> sync every edit (I ran into this this evening. J-D sorted me out).
>>>> We need to make sure folks are up on this.
>>>>
>>>> St.Ack
>>>>
>>>> On Sat, Nov 14, 2009 at 4:37 PM, Jean-Daniel Cryans
>>>> <[email protected]> wrote:
>>>>
>>>>> Hi dev!
>>>>>
>>>>> Hadoop 0.21 now has a reliable append and flush feature and this
>>>>> gives us the opportunity to review some assumptions. The current
>>>>> situation:
>>>>>
>>>>> - Every edit going to a catalog table is flushed so there's no
>>>>>   data loss.
>>>>> - The user tables edits are flushed every
>>>>>   hbase.regionserver.flushlogentries which by default is 100.
>>>>>
>>>>> Should we now set this value to 1 in order to have more durable
>>>>> but slower inserts by default? Please speak up.
>>>>>
>>>>> Thx,
>>>>>
>>>>> J-D
