But if Flume used the HTable 'batch' method instead of 'put'...
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#batch%28java.util.List%29
... doesn't it sidestep this issue? Because instead of being unsure what was
in the write buffer and what wasn't, the caller knows exactly what was sent
and whether it was sent without error.
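A minimal sketch of what that batch-based approach might look like on the
sink side (illustrative code, not from the thread; the class, method, and
variable names are made up). With the two-argument batch(), the results
array is populated even when the call throws, so the caller can tell per
operation what reached the cluster:

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Row;

    public class BatchWriteSketch {

      // Writes all puts with batch() and returns the ones that did NOT
      // reach the cluster, so the caller (e.g. a Flume sink) can re-buffer
      // them upstream instead of retrying blindly.
      public static List<Row> writeBatch(HTable table, List<Put> puts)
          throws InterruptedException {
        List<Row> actions = new ArrayList<Row>(puts);
        Object[] results = new Object[actions.size()];
        List<Row> failed = new ArrayList<Row>();
        try {
          // Unlike a buffered put(), batch() sends the operations right away.
          table.batch(actions, results);
        } catch (IOException ioe) {
          // batch() fills 'results' even when it throws: a null slot marks
          // an operation that did not go through, so only those need retry.
          for (int i = 0; i < results.length; i++) {
            if (results[i] == null) {
              failed.add(actions.get(i));
            }
          }
        }
        return failed;
      }
    }

Anything returned by writeBatch() could then be handed back to Flume's
upstream buffering rather than being retried in a loop inside the sink.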
On 6/28/11 1:07 PM, "Alex Baranau" <[email protected]> wrote:

>> if the sink "dies" for some reason, then it should push that back to the
>> upstream parts of the flume dataflow, and have them buffer data on local
>> disk.
>
> True. But this seems to be a separate issue:
> https://issues.cloudera.org/browse/FLUME-390.
>
> Alex Baranau
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
>
> On Tue, Jun 28, 2011 at 7:40 PM, Doug Meil <[email protected]> wrote:
>
>> I agree with what Todd & Gary said. I don't like retry-forever, especially
>> as a default option in HBase.
>>
>> -----Original Message-----
>> From: Gary Helmling [mailto:[email protected]]
>> Sent: Tuesday, June 28, 2011 12:18 PM
>> To: [email protected]
>> Cc: Jonathan Hsieh
>> Subject: Re: Retry HTable.put() on client-side to handle temp connectivity
>> problem
>>
>> I'd also be wary of changing the default to retry forever. This might be
>> hard to differentiate from a hang or deadlock for new users, and it seems
>> to violate "least surprise".
>>
>> In many cases it's preferable to have some kind of predictable failure as
>> well, so this would appear to be a regression in behavior. If you're
>> serving, say, web site data from HBase, you may prefer an occasional error
>> or timeout rather than having page loading hang forever.
>>
>> I'm all for making "retry forever" a configurable option, but do we need
>> any new knobs here?
>>
>> --gh
>>
>> On Mon, Jun 27, 2011 at 3:23 PM, Joey Echeverria <[email protected]> wrote:
>>
>> > If I could override the default, I'd be a hesitant +1. I'd rather see
>> > the default be something like retry 10 times, then throw an error, with
>> > one option being infinite retries.
>> >
>> > -Joey
>> >
>> > On Mon, Jun 27, 2011 at 2:21 PM, Stack <[email protected]> wrote:
>> > > I'd be fine with changing the default in HBase so clients just keep
>> > > trying. What do others think?
>> > > St.Ack
>> > >
>> > > On Mon, Jun 27, 2011 at 1:56 PM, Alex Baranau <[email protected]>
>> > > wrote:
>> > >> The code I pasted works for me: it reconnects successfully. I just
>> > >> thought it might not be the best way to do it. I realized that by
>> > >> using HBase configuration properties we could just say that it's up
>> > >> to the user to configure the HBase client (created by Flume) properly
>> > >> (e.g. by adding an hbase-site.xml with settings to the classpath). On
>> > >> the other hand, it looks to me like users of HBase sinks will
>> > >> *always* want the sink to retry writing to HBase until it works out.
>> > >> But the default configuration doesn't work this way: the sink stops
>> > >> when HBase is temporarily down or inaccessible. Hence the default
>> > >> configuration makes using the sink more complicated, which I'd like
>> > >> to avoid here by adding the code above. Ideally the default
>> > >> configuration should work the best way for the general-purpose case.
>> > >>
>> > >> I understand what the ways are to implement/configure such behavior.
>> > >> I think we should discuss what the best default behavior is, and
>> > >> whether we need to allow users to override it, on the Flume ML (or
>> > >> directly at https://issues.cloudera.org/browse/FLUME-685).
>> > >>
>> > >> Thank you guys,
>> > >>
>> > >> Alex Baranau
>> > >> ----
>> > >> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop -
>> > >> HBase
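(Aside: a minimal sketch of the configuration route Alex describes above,
raising the HBase client's own retry knobs instead of wrapping put(). The
property names are the real client settings discussed further down in this
thread; the values and the table name are invented for illustration.)

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;

    public class RetryConfigSketch {

      // Opens a table with the client retry settings raised so the built-in
      // retry logic rides out longer outages. Values are illustrative only.
      public static HTable openTable() throws IOException {
        Configuration conf = HBaseConfiguration.create(); // also reads hbase-site.xml from the classpath
        conf.setInt("hbase.client.retries.number", 100);  // default is far lower
        conf.setLong("hbase.client.pause", 5000);         // base pause between retries, in ms
        return new HTable(conf, "flume_events");          // "flume_events" is a made-up table name
      }
    }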
>> > >>
>> > >> On Mon, Jun 27, 2011 at 11:40 PM, Stack <[email protected]> wrote:
>> > >>
>> > >>> Either should work, Alex. Your version will go "forever". Have you
>> > >>> tried yanking HBase out from under the client to see if it
>> > >>> reconnects?
>> > >>>
>> > >>> Good on you,
>> > >>> St.Ack
>> > >>>
>> > >>> On Mon, Jun 27, 2011 at 1:33 PM, Alex Baranau
>> > >>> <[email protected]> wrote:
>> > >>> > Yes, that is what's intended, I think. To make the whole picture
>> > >>> > clear, here's the context:
>> > >>> >
>> > >>> > * There's Flume's HBase sink (read: an HBase client) which writes
>> > >>> > data from the Flume "pipe" (read: some event-based message source)
>> > >>> > to an HTable.
>> > >>> > * When HBase is down for some time (with the default HBase
>> > >>> > configuration on Flume's sink side), HTable.put throws an
>> > >>> > exception and the client exits (it usually takes ~10 min to fail).
>> > >>> > * Flume is smart enough to accumulate the data to be written
>> > >>> > reliably if the sink behaves badly (not writing for some time,
>> > >>> > pauses, etc.), so it would be great if the sink kept trying to
>> > >>> > write data until HBase is up again. BUT:
>> > >>> > * here, as we have a complete "failure" of the sink process (the
>> > >>> > thread needs to be restarted), the data never reaches the HTable
>> > >>> > even after the HBase cluster is brought up again.
>> > >>> >
>> > >>> > So you suggest, instead of this extra construction around
>> > >>> > HTable.put, using the configuration properties "hbase.client.pause"
>> > >>> > and "hbase.client.retries.number", i.e. making the retry attempts
>> > >>> > (reasonably) close to "perform forever"? Is that what you meant?
>> > >>> >
>> > >>> > Thank you,
>> > >>> > Alex Baranau
>> > >>> > ----
>> > >>> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch -
>> > >>> > Hadoop - HBase
>> > >>> >
>> > >>> > On Mon, Jun 27, 2011 at 11:16 PM, Ted Yu <[email protected]>
>> > >>> > wrote:
>> > >>> >
>> > >>> >> This would retry indefinitely, right?
>> > >>> >> Normally a maximum retry duration would govern how long the
>> > >>> >> retry is attempted.
>> > >>> >>
>> > >>> >> On Mon, Jun 27, 2011 at 1:08 PM, Alex Baranau
>> > >>> >> <[email protected]> wrote:
>> > >>> >>
>> > >>> >> > Hello,
>> > >>> >> >
>> > >>> >> > Just wanted to confirm that I'm doing things in a proper way
>> > >>> >> > here. How about this code to handle temporary cluster
>> > >>> >> > connectivity problems (or cluster downtime) on the client side?
>> > >>> >> >
>> > >>> >> > +    // HTable.put() will fail with an exception if the
>> > >>> >> > +    // connection to the cluster is temporarily broken or the
>> > >>> >> > +    // cluster is temporarily down. To be sure data is
>> > >>> >> > +    // written, we retry writing.
>> > >>> >> > +    boolean dataWritten = false;
>> > >>> >> > +    do {
>> > >>> >> > +      try {
>> > >>> >> > +        table.put(p);
>> > >>> >> > +        dataWritten = true;
>> > >>> >> > +      } catch (IOException ioe) { // indicates a cluster
>> > >>> >> > +        // connectivity problem (also thrown when the cluster
>> > >>> >> > +        // is down)
>> > >>> >> > +        LOG.error("Writing data to HBase failed, will try again in "
>> > >>> >> > +            + RETRY_INTERVAL_ON_WRITE_FAIL + " sec", ioe);
>> > >>> >> > +        Thread.currentThread().wait(RETRY_INTERVAL_ON_WRITE_FAIL * 1000);
>> > >>> >> > +      }
>> > >>> >> > +    } while (!dataWritten);
>> > >>> >> >
>> > >>> >> > Thank you in advance,
>> > >>> >> > Alex Baranau
>> > >>> >> > ----
>> > >>> >> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch -
>> > >>> >> > Hadoop - HBase
>> >
>> > --
>> > Joseph Echeverria
>> > Cloudera, Inc.
>> > 443.305.9434
>>
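A note on the snippet quoted above: Thread.currentThread().wait(...) is
called without holding that object's monitor, so at runtime it would throw
IllegalMonitorStateException; Thread.sleep(...) is presumably what was
meant. A corrected sketch with a bounded retry count, along the lines Joey
suggested (retry N times, then fail), could look like the following; the
class name and constants are made up, and logging is omitted for brevity:

    import java.io.IOException;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;

    public class RetryingPutSketch {

      private static final int RETRY_INTERVAL_ON_WRITE_FAIL = 10; // seconds, as in the snippet
      private static final int MAX_RETRIES = 10;                  // bounded, per Joey's suggestion

      public static void putWithRetries(HTable table, Put p)
          throws IOException, InterruptedException {
        IOException lastFailure = null;
        for (int attempt = 0; attempt < MAX_RETRIES; attempt++) {
          try {
            table.put(p);
            return; // written successfully
          } catch (IOException ioe) { // connectivity problem or cluster down
            lastFailure = ioe;
            // Thread.sleep(), not Thread.currentThread().wait(): wait()
            // without holding the monitor throws IllegalMonitorStateException.
            Thread.sleep(RETRY_INTERVAL_ON_WRITE_FAIL * 1000L);
          }
        }
        throw lastFailure; // fail predictably instead of retrying forever
      }
    }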
