> if the sink "dies" for some reason, then it should
> push that back to the upstream parts of the flume dataflow, and have them
> buffer data on local disk.
True. But this seems to be a separate issue:
https://issues.cloudera.org/browse/FLUME-390.

Alex Baranau
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase

On Tue, Jun 28, 2011 at 7:40 PM, Doug Meil <[email protected]> wrote:

> I agree with what Todd & Gary said. I don't like retry-forever,
> especially as a default option in HBase.
>
>
> -----Original Message-----
> From: Gary Helmling [mailto:[email protected]]
> Sent: Tuesday, June 28, 2011 12:18 PM
> To: [email protected]
> Cc: Jonathan Hsieh
> Subject: Re: Retry HTable.put() on client-side to handle temp
> connectivity problem
>
> I'd also be wary of changing the default to retry forever. This might
> be hard to differentiate from a hang or deadlock for new users, and it
> seems to violate "least surprise".
>
> In many cases it's preferable to have some kind of predictable failure
> as well, so I think this would appear to be a regression in behavior.
> If you're serving, say, web site data from HBase, you may prefer an
> occasional error or timeout rather than having page loading hang
> forever.
>
> I'm all for making "retry forever" a configurable option, but do we
> need any new knobs here?
>
> --gh
>
>
> On Mon, Jun 27, 2011 at 3:23 PM, Joey Echeverria <[email protected]>
> wrote:
>
> > If I could override the default, I'd be a hesitant +1. I'd rather see
> > the default be something like retry 10 times, then throw an error,
> > with one option being infinite retries.
> >
> > -Joey
> >
> > On Mon, Jun 27, 2011 at 2:21 PM, Stack <[email protected]> wrote:
> > > I'd be fine with changing the default in HBase so clients just keep
> > > trying. What do others think?
> > > St.Ack
> > >
> > > On Mon, Jun 27, 2011 at 1:56 PM, Alex Baranau <[email protected]>
> > > wrote:
> > >> The code I pasted works for me: it reconnects successfully. I just
> > >> thought it might not be the best way to do it. I realized that by
> > >> using HBase configuration properties we could simply say that it's
> > >> up to the user to configure the HBase client (created by Flume)
> > >> properly (e.g. by adding an hbase-site.xml with the settings to
> > >> the classpath). On the other hand, it looks to me like users of
> > >> HBase sinks will *always* want the sink to retry writing to HBase
> > >> until it works out. But the default configuration doesn't work
> > >> this way: the sink stops when HBase is temporarily down or
> > >> inaccessible. Hence it makes using the sink more complicated
> > >> (because the default configuration sucks), which I'd like to avoid
> > >> here by adding the code above. Ideally the default configuration
> > >> should work the best way for the general-purpose case.
> > >>
> > >> I now understand the ways to implement/configure such behavior. I
> > >> think we should discuss what the best default behavior is, and
> > >> whether we need to allow the user to override it, on the Flume ML
> > >> (or directly at https://issues.cloudera.org/browse/FLUME-685).
> > >>
> > >> Thank you guys,
> > >>
> > >> Alex Baranau
> > >> ----
> > >> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch -
> > >> Hadoop - HBase
> > >>
> > >>
> > >> On Mon, Jun 27, 2011 at 11:40 PM, Stack <[email protected]> wrote:
> > >>
> > >>> Either should work, Alex. Your version will go "forever". Have
> > >>> you tried yanking HBase out from under the client to see if it
> > >>> reconnects?
> > >>>
> > >>> Good on you,
> > >>> St.Ack
> > >>>
> > >>> On Mon, Jun 27, 2011 at 1:33 PM, Alex Baranau
> > >>> <[email protected]> wrote:
> > >>> > Yes, that is what's intended, I think. To make the whole
> > >>> > picture clear, here's the context:
> > >>> >
> > >>> > * there's Flume's HBase sink (read: an HBase client) which
> > >>> > writes data from the Flume "pipe" (read: some event-based
> > >>> > message source) to an HTable;
> > >>> > * when HBase is down for some time (with the default HBase
> > >>> > configuration on Flume's sink side), HTable.put() throws an
> > >>> > exception and the client exits (it usually takes ~10 min to
> > >>> > fail);
> > >>> > * Flume is smart enough to accumulate the data to be written
> > >>> > reliably if the sink behaves badly (not writing for some time,
> > >>> > pausing, etc.), so it would be great if the sink retried
> > >>> > writing the data until HBase is up again, BUT:
> > >>> > * here, since we have a complete "failure" of the sink process
> > >>> > (the thread needs to be restarted), the data never reaches the
> > >>> > HTable even after the HBase cluster is brought up again.
> > >>> >
> > >>> > So you suggest, instead of this extra construction around
> > >>> > HTable.put(), using the configuration properties
> > >>> > "hbase.client.pause" and "hbase.client.retries.number"? I.e.
> > >>> > make the number of retry attempts (reasonably) close to
> > >>> > "perform forever". Is that what you meant?
> > >>> >
> > >>> > Thank you,
> > >>> > Alex Baranau
> > >>> > ----
> > >>> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch -
> > >>> > Hadoop - HBase
> > >>> >
> > >>> > On Mon, Jun 27, 2011 at 11:16 PM, Ted Yu <[email protected]>
> > >>> > wrote:
> > >>> >
> > >>> >> This would retry indefinitely, right?
> > >>> >> Normally a maximum retry duration would govern how long the
> > >>> >> retry is attempted.
> > >>> >>
> > >>> >> On Mon, Jun 27, 2011 at 1:08 PM, Alex Baranau
> > >>> >> <[email protected]> wrote:
> > >>> >>
> > >>> >> > Hello,
> > >>> >> >
> > >>> >> > Just wanted to confirm that I'm doing things the proper way
> > >>> >> > here. How about this code to handle temporary cluster
> > >>> >> > connectivity problems (or cluster downtime) on the client
> > >>> >> > side?
> > >>> >> >
> > >>> >> > +    // HTable.put() will fail with an exception if the connection to the
> > >>> >> > +    // cluster is temporarily broken or the cluster is temporarily down.
> > >>> >> > +    // To be sure the data is written, we retry writing.
> > >>> >> > +    boolean dataWritten = false;
> > >>> >> > +    do {
> > >>> >> > +      try {
> > >>> >> > +        table.put(p);
> > >>> >> > +        dataWritten = true;
> > >>> >> > +      } catch (IOException ioe) { // indicates a cluster connectivity
> > >>> >> > +                                  // problem (also thrown when the
> > >>> >> > +                                  // cluster is down)
> > >>> >> > +        LOG.error("Writing data to HBase failed, will try again in "
> > >>> >> > +            + RETRY_INTERVAL_ON_WRITE_FAIL + " sec", ioe);
> > >>> >> > +        Thread.sleep(RETRY_INTERVAL_ON_WRITE_FAIL * 1000);
> > >>> >> > +      }
> > >>> >> > +    } while (!dataWritten);
> > >>> >> >
> > >>> >> > Thank you in advance,
> > >>> >> > Alex Baranau
> > >>> >> > ----
> > >>> >> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch -
> > >>> >> > Hadoop - HBase
> > >>> >> >
> > >>> >>
> > >>>
> > >>>
> > >>
> >
> >
> >
> > --
> > Joseph Echeverria
> > Cloudera, Inc.
> > 443.305.9434
>
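
For reference, a minimal, self-contained sketch of the bounded-retry
default the thread converges on: retry N times, then throw, with a
negative value meaning "retry forever". The class name RetryingPutter,
its constructor parameters, and the chosen values are illustrative, not
part of any proposed patch. Note the use of Thread.sleep() rather than
Object.wait(), which would throw IllegalMonitorStateException when no
monitor is held (as in the originally posted snippet).

    import java.io.IOException;

    import org.apache.commons.logging.Log;
    import org.apache.commons.logging.LogFactory;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;

    public class RetryingPutter {
      private static final Log LOG = LogFactory.getLog(RetryingPutter.class);

      private final HTable table;
      private final int maxRetries;        // negative => retry forever
      private final long retryIntervalMs;  // pause between attempts

      public RetryingPutter(HTable table, int maxRetries, long retryIntervalMs) {
        this.table = table;
        this.maxRetries = maxRetries;
        this.retryIntervalMs = retryIntervalMs;
      }

      public void put(Put p) throws IOException, InterruptedException {
        int attempt = 0;
        while (true) {
          try {
            table.put(p);
            return;                        // success
          } catch (IOException ioe) {      // connectivity problem or cluster down
            attempt++;
            if (maxRetries >= 0 && attempt > maxRetries) {
              throw ioe;                   // predictable failure after N attempts
            }
            LOG.error("Put failed (attempt " + attempt + "), retrying in "
                + retryIntervalMs + " ms", ioe);
            Thread.sleep(retryIntervalMs); // sleep, not Object.wait()
          }
        }
      }
    }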

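And a sketch of the configuration-based alternative Stack suggests:
raising the client's built-in retry settings instead of wrapping
HTable.put(). Only the two property names come from the thread; the
class name, the table name, and the specific values are made up for
illustration.

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;

    public class PatientClient {
      public static HTable openTable(String tableName) throws IOException {
        // Raise the built-in retry knobs so the client keeps trying for much
        // longer before HTable.put() finally throws (values are illustrative).
        Configuration conf = HBaseConfiguration.create();
        conf.setInt("hbase.client.retries.number", 100); // attempts before giving up
        conf.setLong("hbase.client.pause", 10000);       // ms base pause between attempts
        return new HTable(conf, tableName);
      }
    }

As Alex notes, the same two properties could instead be set in an
hbase-site.xml placed on the sink's classpath, leaving the sink code
untouched.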