I also think if it takes 10 minutes to fail, that is probably too long. Best regards,
- Andy

Problems worthy of attack prove their worth by hitting back.
- Piet Hein (via Tom White)

----- Original Message -----
> From: Doug Meil <[email protected]>
> To: "[email protected]" <[email protected]>
> Cc: Jonathan Hsieh <[email protected]>
> Sent: Tuesday, June 28, 2011 9:40 AM
> Subject: RE: Retry HTable.put() on client-side to handle temp connectivity problem
>
> I agree with what Todd & Gary said. I don't like retry-forever,
> especially as a default option in HBase.
>
>
> -----Original Message-----
> From: Gary Helmling [mailto:[email protected]]
> Sent: Tuesday, June 28, 2011 12:18 PM
> To: [email protected]
> Cc: Jonathan Hsieh
> Subject: Re: Retry HTable.put() on client-side to handle temp connectivity problem
>
> I'd also be wary of changing the default to retry forever. This might be
> hard to differentiate from a hang or deadlock for new users, and seems to
> violate "least surprise".
>
> In many cases it's preferable to have some kind of predictable failure as
> well, so I think this would appear to be a regression in behavior. If
> you're serving, say, web site data from HBase, you may prefer an occasional
> error or timeout rather than having page loading hang forever.
>
> I'm all for making "retry forever" a configurable option, but do
> we need any new knobs here?
>
> --gh
>
>
> On Mon, Jun 27, 2011 at 3:23 PM, Joey Echeverria <[email protected]> wrote:
>
>> If I could override the default, I'd be a hesitant +1. I'd rather see
>> the default be something like retry 10 times, then throw an error,
>> with one option being infinite retries.
>>
>> -Joey
>>
>> On Mon, Jun 27, 2011 at 2:21 PM, Stack <[email protected]> wrote:
>>> I'd be fine with changing the default in hbase so clients just keep
>>> trying. What do others think?
>>> St.Ack
>>>
>>> On Mon, Jun 27, 2011 at 1:56 PM, Alex Baranau <[email protected]> wrote:
>>>> The code I pasted works for me: it reconnects successfully. Just
>>>> thought it might not be the best way to do it. I realized that by
>>>> using HBase configuration properties we could just say that it's up
>>>> to the user to configure the HBase client (created by Flume) properly
>>>> (e.g. by adding an hbase-site.xml with the settings to the classpath).
>>>> On the other hand, it looks to me that users of HBase sinks will
>>>> *always* want the sink to retry writing to HBase until it works out.
>>>> But the default configuration doesn't work this way: the sink stops
>>>> when HBase is temporarily down or inaccessible. Hence it makes using
>>>> the sink more complicated (because the default configuration is poor),
>>>> which I'd like to avoid here by adding the code above. Ideally the
>>>> default configuration should work best for the general-purpose case.
>>>>
>>>> I understand the ways to implement/configure such behavior. I think
>>>> we should discuss what the best default behavior is, and whether we
>>>> need to allow the user to override it, on the Flume ML (or directly at
>>>> https://issues.cloudera.org/browse/FLUME-685).
>>>>
>>>> Thank you guys,
>>>>
>>>> Alex Baranau
>>>> ----
>>>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
>>>>
>>>> On Mon, Jun 27, 2011 at 11:40 PM, Stack <[email protected]> wrote:
>>>>> Either should work, Alex. Your version will go "for ever". Have you
>>>>> tried yanking hbase out from under the client to see if it reconnects?
>>>>>
>>>>> Good on you,
>>>>> St.Ack
>>>>>
>>>>> On Mon, Jun 27, 2011 at 1:33 PM, Alex Baranau <[email protected]> wrote:
>>>>>> Yes, that is what's intended, I think. To make the whole picture
>>>>>> clear, here's the context:
>>>>>>
>>>>>> * There's Flume's HBase sink (read: an HBase client) which writes
>>>>>>   data from the Flume "pipe" (read: some event-based message source)
>>>>>>   to an HTable.
>>>>>> * When HBase is down for some time (with the default HBase
>>>>>>   configuration on Flume's sink side), HTable.put throws an exception
>>>>>>   and the client exits (it usually takes ~10 min to fail).
>>>>>> * Flume is smart enough to accumulate the data to be written reliably
>>>>>>   if the sink behaves badly (not writing for some time, pauses, etc.),
>>>>>>   so it would be great if the sink tried to write data until HBase is
>>>>>>   up again, BUT:
>>>>>> * here, since we have a complete "failure" of the sink process (the
>>>>>>   thread needs to be restarted), the data never reaches the HTable
>>>>>>   even after the HBase cluster is brought up again.
>>>>>>
>>>>>> So you suggest, instead of this extra construction around HTable.put,
>>>>>> using the configuration properties "hbase.client.pause" and
>>>>>> "hbase.client.retries.number"? I.e. make the number of retry attempts
>>>>>> (reasonably) close to "perform forever". Is that what you meant?
>>>>>>
>>>>>> Thank you,
>>>>>> Alex Baranau
>>>>>> ----
>>>>>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
>>>>>>
>>>>>> On Mon, Jun 27, 2011 at 11:16 PM, Ted Yu <[email protected]> wrote:
>>>>>>> This would retry indefinitely, right?
>>>>>>> Normally a maximum retry duration would govern how long the retry
>>>>>>> is attempted.
>>>>>>>
>>>>>>> On Mon, Jun 27, 2011 at 1:08 PM, Alex Baranau <[email protected]> wrote:
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> Just wanted to confirm that I'm doing things in a proper way here.
>>>>>>>> How about this code to handle temporary cluster connectivity
>>>>>>>> problems (or cluster down time) on the client side?
>>>>>>>>
>>>>>>>> +    // HTable.put() will fail with an exception if the connection
>>>>>>>> +    // to the cluster is temporarily broken or the cluster is
>>>>>>>> +    // temporarily down. To be sure the data is written, we retry.
>>>>>>>> +    boolean dataWritten = false;
>>>>>>>> +    do {
>>>>>>>> +      try {
>>>>>>>> +        table.put(p);
>>>>>>>> +        dataWritten = true;
>>>>>>>> +      } catch (IOException ioe) { // indicates a cluster connectivity
>>>>>>>> +        // problem (also thrown when the cluster is down)
>>>>>>>> +        LOG.error("Writing data to HBase failed, will try again in "
>>>>>>>> +            + RETRY_INTERVAL_ON_WRITE_FAIL + " sec", ioe);
>>>>>>>> +        Thread.sleep(RETRY_INTERVAL_ON_WRITE_FAIL * 1000);
>>>>>>>> +      }
>>>>>>>> +    } while (!dataWritten);
>>>>>>>>
>>>>>>>> Thank you in advance,
>>>>>>>> Alex Baranau
>>>>>>>> ----
>>>>>>>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
>>
>> --
>> Joseph Echeverria
>> Cloudera, Inc.
>> 443.305.9434
