All correct. No changes in HBase are needed (none were actually requested; changing the default retry behavior was just a suggestion by Stack). Thank you all for participating!
Alex Baranau
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase

On Wed, Jun 29, 2011 at 4:44 PM, Doug Meil <[email protected]> wrote:

> Hi there-
>
> 1) Buffer/Batch
>
> Addressing the comment in the Cloudera ticket (FLUME-390), "currently
> non-written events are lost.", I agree that two paths (write-buffer vs.
> batch-it-yourself) are available for Flume to recover from a failure and
> know what hasn't been sent (or what was at least attempted to be sent).
>
> Thus, I don't see this as an "HBase issue". There are existing APIs for
> Flume to utilize that will get the job done.
>
> 2) Retry-forever.
>
> I've seen several folks vote -1 on retry-forever as default behavior.
> Based on the conversation, I'm assuming this won't happen.
>
> Are there other aspects to this issue? It doesn't seem like any HBase
> changes are needed to address these issues.
>
> On 6/29/11 2:17 AM, "Alex Baranau" <[email protected]> wrote:
>
>> I think you are talking here about losing some data from the client-side
>> buffer. I don't think using batch will help. If we use batch from client
>> code and want to use client-side buffering, we would need to implement
>> the same buffering code already implemented in HTable. The behavior and
>> ack sending will be the same: the ack is sent after the Flume sink
>> receives the event, which might be buffered and not (yet) persisted to
>> HBase. I haven't looked into Flume's ability to skip sending the ack on
>> receiving an event in the sink and to send it in batches later (after
>> the actual persisting happens). Will investigate that as a separate
>> effort.
>>
>> In general, please correct me if I'm wrong, but there won't be much
>> difference between using HTable's batch and put:
>> * with put() I can also tell what was persisted and which records
>>   failed, as they will be available in the client-side buffer after
>>   failures
>> * internally, put uses batch anyway (i.e. connection.processBatch)
>>
>> Alex Baranau
>> ----
>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
>>
>> On Tue, Jun 28, 2011 at 10:41 PM, Doug Meil <[email protected]> wrote:
>>
>>> But if Flume used the HTable 'batch' method instead of 'put'...
>>>
>>> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#batch%28java.util.List%29
>>>
>>> ... doesn't it sidestep this issue? Because instead of being unsure
>>> what was in the write-buffer and what wasn't, the caller knows exactly
>>> what was sent and whether it was sent without error.
>>>
>>> On 6/28/11 1:07 PM, "Alex Baranau" <[email protected]> wrote:
>>>
>>>>> if the sink "dies" for some reason, then it should
>>>>> push that back to the upstream parts of the flume dataflow, and have
>>>>> them buffer data on local disk.
>>>>
>>>> True. But this seems to be a separate issue:
>>>> https://issues.cloudera.org/browse/FLUME-390
>>>>
>>>> Alex Baranau
>>>> ----
>>>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
>>>>
>>>> On Tue, Jun 28, 2011 at 7:40 PM, Doug Meil <[email protected]> wrote:
>>>>
>>>>> I agree with what Todd & Gary said. I don't like retry-forever,
>>>>> especially as a default option in HBase.
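[For reference, a minimal sketch of the batch-it-yourself path Doug and Alex weigh above; the write-buffer path would instead use setAutoFlush(false) and inspect the buffer after a failed flushCommits(). This assumes the 0.90-era HTable client API; the table name, column family, and row layout are illustrative only:]

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException;
    import org.apache.hadoop.hbase.client.Row;
    import org.apache.hadoop.hbase.util.Bytes;

    public class BatchSinkSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "flume_events"); // illustrative table name

        // Build the batch ourselves instead of relying on HTable's write buffer.
        List<Row> actions = new ArrayList<Row>();
        for (int i = 0; i < 3; i++) {
          Put p = new Put(Bytes.toBytes("row-" + i));
          p.add(Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes("event-" + i));
          actions.add(p);
        }

        Object[] results = new Object[actions.size()];
        try {
          table.batch(actions, results);
        } catch (RetriesExhaustedWithDetailsException e) {
          // The exception reports exactly which rows failed and why, so the
          // caller (e.g. a Flume sink) knows what to re-queue before acking.
          for (int i = 0; i < e.getNumExceptions(); i++) {
            System.err.println("failed row: " + Bytes.toString(e.getRow(i).getRow())
                + ", cause: " + e.getCause(i));
          }
        } finally {
          table.close();
        }
      }
    }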
>>>>>
>>>>> -----Original Message-----
>>>>> From: Gary Helmling [mailto:[email protected]]
>>>>> Sent: Tuesday, June 28, 2011 12:18 PM
>>>>> To: [email protected]
>>>>> Cc: Jonathan Hsieh
>>>>> Subject: Re: Retry HTable.put() on client-side to handle temp connectivity problem
>>>>>
>>>>> I'd also be wary of changing the default to retry forever. This might
>>>>> be hard to differentiate from a hang or deadlock for new users, and it
>>>>> seems to violate "least surprise".
>>>>>
>>>>> In many cases it's preferable to have some kind of predictable failure
>>>>> as well, so I think this would appear to be a regression in behavior.
>>>>> If you're serving, say, web site data from HBase, you may prefer an
>>>>> occasional error or timeout rather than having page loading hang
>>>>> forever.
>>>>>
>>>>> I'm all for making "retry forever" a configurable option, but do we
>>>>> need any new knobs here?
>>>>>
>>>>> --gh
>>>>>
>>>>> On Mon, Jun 27, 2011 at 3:23 PM, Joey Echeverria <[email protected]> wrote:
>>>>>
>>>>>> If I could override the default, I'd be a hesitant +1. I'd rather see
>>>>>> the default be something like retry 10 times, then throw an error,
>>>>>> with one option being infinite retries.
>>>>>>
>>>>>> -Joey
>>>>>>
>>>>>> On Mon, Jun 27, 2011 at 2:21 PM, Stack <[email protected]> wrote:
>>>>>>
>>>>>>> I'd be fine with changing the default in hbase so clients just keep
>>>>>>> trying. What do others think?
>>>>>>> St.Ack
>>>>>>>
>>>>>>> On Mon, Jun 27, 2011 at 1:56 PM, Alex Baranau <[email protected]> wrote:
>>>>>>>
>>>>>>>> The code I pasted works for me: it reconnects successfully. I just
>>>>>>>> thought it might not be the best way to do it. I realized that by
>>>>>>>> using HBase configuration properties we could simply say that it's
>>>>>>>> up to the user to configure the HBase client (created by Flume)
>>>>>>>> properly (e.g. by adding an hbase-site.xml with the settings to the
>>>>>>>> classpath). On the other hand, it looks to me like users of HBase
>>>>>>>> sinks will *always* want the sink to retry writing to HBase until
>>>>>>>> it works out. But the default configuration does not work this way:
>>>>>>>> the sink stops when HBase is temporarily down or inaccessible.
>>>>>>>> Hence it makes using the sink more complicated (because the default
>>>>>>>> configuration sucks), which I'd like to avoid here by adding the
>>>>>>>> code above. Ideally, the default configuration should work the best
>>>>>>>> way for the general-purpose case.
>>>>>>>>
>>>>>>>> I understand now what the ways to implement/configure such behavior
>>>>>>>> are. I think we should discuss what the best default behavior is,
>>>>>>>> and whether we need to allow the user to override it, on the Flume
>>>>>>>> ML (or directly at https://issues.cloudera.org/browse/FLUME-685).
>>>>>>>>
>>>>>>>> Thank you guys,
>>>>>>>>
>>>>>>>> Alex Baranau
>>>>>>>> ----
>>>>>>>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
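[A minimal sketch of the configuration-based alternative Alex describes above. The property names are the standard HBase client settings; the concrete values and table name are illustrative assumptions, and the same values could equally live in an hbase-site.xml on the classpath:]

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;

    public class ConfiguredClientSketch {
      public static void main(String[] args) throws IOException {
        // Picks up hbase-site.xml from the classpath, if present.
        Configuration conf = HBaseConfiguration.create();

        // Ride out long outages inside the client itself instead of wrapping
        // HTable.put() in a hand-rolled retry loop. A very large retry count
        // approximates "retry forever" without changing HBase's defaults
        // (at the time, 10 retries with increasing backoff pauses).
        conf.setInt("hbase.client.retries.number", 100);
        conf.setLong("hbase.client.pause", 2000); // ms between retries

        HTable table = new HTable(conf, "flume_events"); // illustrative table name
        // ... use table.put(...) as usual; retries now happen under the hood.
        table.close();
      }
    }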
>>>>>>>>
>>>>>>>> On Mon, Jun 27, 2011 at 11:40 PM, Stack <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Either should work, Alex. Your version will go "forever". Have you
>>>>>>>>> tried yanking HBase out from under the client to see if it
>>>>>>>>> reconnects?
>>>>>>>>>
>>>>>>>>> Good on you,
>>>>>>>>> St.Ack
>>>>>>>>>
>>>>>>>>> On Mon, Jun 27, 2011 at 1:33 PM, Alex Baranau <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Yes, that is what's intended, I think. To make the whole picture
>>>>>>>>>> clear, here's the context:
>>>>>>>>>>
>>>>>>>>>> * there's Flume's HBase sink (read: an HBase client) which writes
>>>>>>>>>>   data from the Flume "pipe" (read: some event-based message
>>>>>>>>>>   source) to an HTable;
>>>>>>>>>> * when HBase is down for some time (with the default HBase
>>>>>>>>>>   configuration on Flume's sink side), HTable.put throws an
>>>>>>>>>>   exception and the client exits (it usually takes ~10 min to
>>>>>>>>>>   fail);
>>>>>>>>>> * Flume is smart enough to accumulate data to be written reliably
>>>>>>>>>>   if the sink behaves badly (not writing for some time, pauses,
>>>>>>>>>>   etc.), so it would be great if the sink kept trying to write
>>>>>>>>>>   data until HBase is up again, BUT:
>>>>>>>>>> * since here we have a complete "failure" of the sink process
>>>>>>>>>>   (the thread needs to be restarted), the data never reaches the
>>>>>>>>>>   HTable, even after the HBase cluster is brought up again.
>>>>>>>>>>
>>>>>>>>>> So you suggest, instead of this extra construction around
>>>>>>>>>> HTable.put, using the configuration properties
>>>>>>>>>> "hbase.client.pause" and "hbase.client.retries.number"? I.e.,
>>>>>>>>>> making the number of retry attempts (reasonably) close to "retry
>>>>>>>>>> forever". Is that what you meant?
>>>>>>>>>>
>>>>>>>>>> Thank you,
>>>>>>>>>> Alex Baranau
>>>>>>>>>> ----
>>>>>>>>>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
>>>>>>>>>>
>>>>>>>>>> On Mon, Jun 27, 2011 at 11:16 PM, Ted Yu <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> This would retry indefinitely, right?
>>>>>>>>>>> Normally a maximum retry duration would govern how long the
>>>>>>>>>>> retry is attempted.
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Jun 27, 2011 at 1:08 PM, Alex Baranau <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hello,
>>>>>>>>>>>>
>>>>>>>>>>>> Just wanted to confirm that I'm doing things in a proper way
>>>>>>>>>>>> here. How about this code to handle temporary cluster
>>>>>>>>>>>> connectivity problems (or cluster downtime) on the client side?
> >> >> > >>> >> > + boolean dataWritten = false; > >> >> > >>> >> > + do { > >> >> > >>> >> > + try { > >> >> > >>> >> > + table.put(p); > >> >> > >>> >> > + dataWritten = true; > >> >> > >>> >> > + } catch (IOException ioe) { // indicates cluster > >> >> > connectivity > >> >> > >>> >> > problem > >> >> > >>> >> > (also thrown when cluster is down) > >> >> > >>> >> > + LOG.error("Writing data to HBase failed, will try > >> >> > >>> >> > + again > >> >> > in " > >> >> > >>> + > >> >> > >>> >> > RETRY_INTERVAL_ON_WRITE_FAIL + " sec", ioe); > >> >> > >>> >> > + > >> >> > >>> >> > + Thread.currentThread().wait(RETRY_INTERVAL_ON_WRITE_FAIL > >> >> > * > >> >> > >>> >> 1000); > >> >> > >>> >> > + } > >> >> > >>> >> > + } while (!dataWritten); > >> >> > >>> >> > > >> >> > >>> >> > Thank you in advance, > >> >> > >>> >> > Alex Baranau > >> >> > >>> >> > ---- > >> >> > >>> >> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch > >>- > >> >> > Hadoop - > >> >> > >>> >> HBase > >> >> > >>> >> > > >> >> > >>> >> > >> >> > >>> > > >> >> > >>> > >> >> > >> > >> >> > > > >> >> > > >> >> > > >> >> > > >> >> > -- > >> >> > Joseph Echeverria > >> >> > Cloudera, Inc. > >> >> > 443.305.9434 > >> >> > > >> >> > >> > >> > >
