If I could override the default, I'd be a hesitant +1. I'd rather see the default be something like retry 10 times, then throw an error, with one option being infinite retries.
-Joey

On Mon, Jun 27, 2011 at 2:21 PM, Stack <st...@duboce.net> wrote:
> I'd be fine with changing the default in HBase so clients just keep
> trying. What do others think?
>
> St.Ack
>
> On Mon, Jun 27, 2011 at 1:56 PM, Alex Baranau <alex.barano...@gmail.com> wrote:
>> The code I pasted works for me: it reconnects successfully. I just thought
>> it might not be the best way to do it. I realized that by using HBase
>> configuration properties we could say it's up to the user to configure the
>> HBase client (created by Flume) properly (e.g. by adding an hbase-site.xml
>> with the settings to the classpath). On the other hand, it looks to me like
>> users of HBase sinks will *always* want the sink to retry writing to HBase
>> until it succeeds. But the default configuration doesn't work this way: the
>> sink stops when HBase is temporarily down or inaccessible. That makes using
>> the sink more complicated (because the default configuration is poor),
>> which I'd like to avoid here by adding the code above. Ideally the default
>> configuration should work best for the general-purpose case.
>>
>> I now understand the ways to implement/configure such behavior. I think we
>> should discuss what the best default behavior is, and whether we need to
>> let the user override it, on the Flume ML (or directly at
>> https://issues.cloudera.org/browse/FLUME-685).
>>
>> Thank you guys,
>>
>> Alex Baranau
>> ----
>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
>>
>> On Mon, Jun 27, 2011 at 11:40 PM, Stack <st...@duboce.net> wrote:
>>> Either should work, Alex. Your version will go on forever. Have you
>>> tried yanking HBase out from under the client to see if it reconnects?
>>>
>>> Good on you,
>>> St.Ack
>>>
>>> On Mon, Jun 27, 2011 at 1:33 PM, Alex Baranau <alex.barano...@gmail.com> wrote:
>>>> Yes, that is what's intended, I think. To make the whole picture clear,
>>>> here's the context:
>>>>
>>>> * There's Flume's HBase sink (read: an HBase client), which writes data
>>>>   from a Flume "pipe" (read: some event-based message source) to an HTable.
>>>> * When HBase is down for some time (with the default HBase configuration
>>>>   on Flume's sink side), HTable.put throws an exception and the client
>>>>   exits (it usually takes ~10 min to fail).
>>>> * Flume is smart enough to reliably accumulate data to be written if the
>>>>   sink behaves badly (not writing for some time, pauses, etc.), so it
>>>>   would be great if the sink kept trying to write data until HBase is up
>>>>   again, BUT:
>>>> * here, since we have a complete "failure" of the sink process (the
>>>>   thread needs to be restarted), the data never reaches the HTable even
>>>>   after the HBase cluster is brought up again.
>>>>
>>>> So you suggest, instead of this extra construction around HTable.put,
>>>> using the configuration properties "hbase.client.pause" and
>>>> "hbase.client.retries.number"? I.e. making the number of retry attempts
>>>> (reasonably) close to "retry forever". Is that what you meant?
>>>>
>>>> Thank you,
>>>> Alex Baranau
>>>> ----
>>>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
>>>>
>>>> On Mon, Jun 27, 2011 at 11:16 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>> This would retry indefinitely, right?
>>>>> Normally a maximum retry duration would govern how long the retry is
>>>>> attempted.
>>>>>
>>>>> On Mon, Jun 27, 2011 at 1:08 PM, Alex Baranau <alex.barano...@gmail.com> wrote:
>>>>>> Hello,
>>>>>>
>>>>>> Just wanted to confirm that I'm doing things in a proper way here. How
>>>>>> about this code to handle temporary cluster connectivity problems (or
>>>>>> cluster downtime) on the client side?
>>>>>>
>>>>>> + // HTable.put() will fail with an exception if the connection to the
>>>>>> + // cluster is temporarily broken or the cluster is temporarily down.
>>>>>> + // To be sure the data is written, we retry the write.
>>>>>> + boolean dataWritten = false;
>>>>>> + do {
>>>>>> +   try {
>>>>>> +     table.put(p);
>>>>>> +     dataWritten = true;
>>>>>> +   } catch (IOException ioe) {
>>>>>> +     // indicates a cluster connectivity problem (also thrown when the
>>>>>> +     // cluster is down)
>>>>>> +     LOG.error("Writing data to HBase failed, will try again in "
>>>>>> +         + RETRY_INTERVAL_ON_WRITE_FAIL + " sec", ioe);
>>>>>> +     try {
>>>>>> +       // Thread.sleep(), not Object.wait(): calling wait() outside a
>>>>>> +       // synchronized block throws IllegalMonitorStateException
>>>>>> +       Thread.sleep(RETRY_INTERVAL_ON_WRITE_FAIL * 1000L);
>>>>>> +     } catch (InterruptedException ie) {
>>>>>> +       Thread.currentThread().interrupt(); // restore interrupt status
>>>>>> +       break;
>>>>>> +     }
>>>>>> +   }
>>>>>> + } while (!dataWritten);
>>>>>>
>>>>>> Thank you in advance,
>>>>>> Alex Baranau
>>>>>> ----
>>>>>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase

--
Joseph Echeverria
Cloudera, Inc.
443.305.9434
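For reference, the bounded-retry default suggested at the top of the thread ("retry N times, then throw an error") could be sketched roughly like this. This is a minimal standalone illustration, not actual Flume/HBase sink code; `Op`, `retry`, and the values used are hypothetical names chosen for the sketch:

```java
import java.io.IOException;

public class BoundedRetry {
    // Functional interface for the operation to retry (stands in for HTable.put()).
    interface Op {
        void run() throws IOException;
    }

    // Runs op until it succeeds or maxRetries attempts are used up,
    // pausing pauseMillis between attempts; rethrows the last failure.
    static void retry(Op op, int maxRetries, long pauseMillis)
            throws IOException, InterruptedException {
        IOException last = null;
        for (int attempt = 1; attempt <= maxRetries; attempt++) {
            try {
                op.run();
                return; // success
            } catch (IOException ioe) {
                last = ioe;
                if (attempt < maxRetries) {
                    Thread.sleep(pauseMillis);
                }
            }
        }
        throw last; // all attempts failed: surface the error instead of looping forever
    }

    public static void main(String[] args) throws Exception {
        final int[] calls = {0};
        // Simulated flaky write: fails twice, succeeds on the third attempt.
        retry(() -> {
            if (++calls[0] < 3) {
                throw new IOException("cluster temporarily down");
            }
        }, 10, 1);
        System.out.println("write succeeded after " + calls[0] + " attempts");
    }
}
```

An infinite-retry option would simply pass `Integer.MAX_VALUE` (or skip the attempt cap), which is effectively what the do/while loop in the original post does.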
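For the configuration-based alternative discussed above, the two client properties named in the thread could be raised in an hbase-site.xml on the sink's classpath. A sketch; the values here are illustrative, not recommendations:

```xml
<configuration>
  <!-- Milliseconds to pause between client retries (illustrative value). -->
  <property>
    <name>hbase.client.pause</name>
    <value>10000</value>
  </property>
  <!-- Number of retries before the client gives up (illustrative value). -->
  <property>
    <name>hbase.client.retries.number</name>
    <value>100</value>
  </property>
</configuration>
```

As Ted notes, this bounds the total retry time rather than retrying indefinitely, so the sink still fails eventually if the cluster stays down longer than pause × retries.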