I think you are talking here about losing some data from the client-side buffer.
I don't think using batch will help. If we used batch from the client code and
still wanted client-side buffering, we would need to re-implement the same
buffering code that is already in HTable. The behavior and ack sending would be
the same: the ack is sent after the Flume sink receives the event, which might
be buffered and not (yet) persisted to HBase. I haven't looked into Flume's
ability to skip sending the ack when the sink receives an event and to send
acks in batches later (after the actual persisting happens). I will investigate
that as a separate effort.
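To illustrate the buffering in question, here is a minimal sketch against the
0.90-era HTable API (the class name, table name, column names and values are
made-up placeholders, not the actual sink code):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class WriteBufferSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "events");

        table.setAutoFlush(false); // puts go to the client-side buffer first

        Put p = new Put(Bytes.toBytes("row1"));
        p.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v"));
        table.put(p); // may only sit in the local buffer at this point; an
                      // ack sent now covers data that is not yet in HBase

        table.flushCommits(); // only here is the buffer pushed to the cluster
        table.close();
      }
    }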
In general, please correct me if I'm wrong, but there won't be much difference
between using HTable's batch and put:

* with put() I can also tell what was persisted and which records failed, as
  they will be available in the client-side buffer after failures
* internally, put uses batch anyway (i.e. connection.processBatch)

Alex Baranau
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase

On Tue, Jun 28, 2011 at 10:41 PM, Doug Meil <[email protected]> wrote:

> But if Flume used the htable 'batch' method instead of 'put'...
>
> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#batch%28java.util.List%29
>
> ... doesn't it sidestep this issue? Because instead of being unsure what
> was in the write-buffer and what wasn't, the caller knows exactly what
> was sent and whether it was sent without error.
>
>
> On 6/28/11 1:07 PM, "Alex Baranau" <[email protected]> wrote:
>
> >> if the sink "dies" for some reason, then it should push that back to
> >> the upstream parts of the flume dataflow, and have them buffer data
> >> on local disk.
> >
> > True. But this seems to be a separate issue:
> > https://issues.cloudera.org/browse/FLUME-390
> >
> > Alex Baranau
> > ----
> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
> >
> > On Tue, Jun 28, 2011 at 7:40 PM, Doug Meil <[email protected]> wrote:
> >
> >> I agree with what Todd & Gary said. I don't like retry-forever,
> >> especially as a default option in HBase.
> >>
> >>
> >> -----Original Message-----
> >> From: Gary Helmling [mailto:[email protected]]
> >> Sent: Tuesday, June 28, 2011 12:18 PM
> >> To: [email protected]
> >> Cc: Jonathan Hsieh
> >> Subject: Re: Retry HTable.put() on client-side to handle temp
> >> connectivity problem
> >>
> >> I'd also be wary of changing the default to retry forever. This might
> >> be hard to differentiate from a hang or deadlock for new users, and it
> >> seems to violate "least surprise".
> >>
> >> In many cases it's preferable to have some kind of predictable failure
> >> as well, so I think this would appear to be a regression in behavior.
> >> If you're serving, say, web site data from HBase, you may prefer an
> >> occasional error or timeout rather than having page loading hang
> >> forever.
> >>
> >> I'm all for making "retry forever" a configurable option, but do we
> >> need any new knobs here?
> >>
> >> --gh
> >>
> >>
> >> On Mon, Jun 27, 2011 at 3:23 PM, Joey Echeverria <[email protected]> wrote:
> >>
> >> > If I could override the default, I'd be a hesitant +1. I'd rather
> >> > see the default be something like retry 10 times, then throw an
> >> > error, with one option being infinite retries.
> >> >
> >> > -Joey
> >> >
> >> > On Mon, Jun 27, 2011 at 2:21 PM, Stack <[email protected]> wrote:
> >> > > I'd be fine with changing the default in hbase so clients just
> >> > > keep trying. What do others think?
> >> > > St.Ack
> >> > >
> >> > > On Mon, Jun 27, 2011 at 1:56 PM, Alex Baranau <[email protected]> wrote:
> >> > >> The code I pasted works for me: it reconnects successfully. Just
> >> > >> thought it might not be the best way to do it. I realized that by
> >> > >> using HBase configuration properties we could just say that it's
> >> > >> up to the user to configure the HBase client (created by Flume)
> >> > >> properly (e.g. by adding hbase-site.xml with the settings to the
> >> > >> classpath).
> >> > >>
> >> > >> On the other hand, it looks to me that users of HBase sinks will
> >> > >> *always* want it to retry writing to HBase until it works out.
> >> > >> But the default configuration does not work this way: the sink
> >> > >> stops when HBase is temporarily down or inaccessible. Hence it
> >> > >> makes using the sink more complicated (because the default
> >> > >> configuration sucks), which I'd like to avoid here by adding the
> >> > >> code above. Ideally the default configuration should work the
> >> > >> best way for the general-purpose case.
> >> > >>
> >> > >> I understood what the ways to implement/configure such behavior
> >> > >> are. I think we should discuss what the best default behavior is,
> >> > >> and whether we need to let the user override it, on the Flume ML
> >> > >> (or directly at https://issues.cloudera.org/browse/FLUME-685).
> >> > >>
> >> > >> Thank you guys,
> >> > >>
> >> > >> Alex Baranau
> >> > >> ----
> >> > >> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
> >> > >>
> >> > >> On Mon, Jun 27, 2011 at 11:40 PM, Stack <[email protected]> wrote:
> >> > >>
> >> > >>> Either should work, Alex. Your version will go "for ever". Have
> >> > >>> you tried yanking hbase out from under the client to see if it
> >> > >>> reconnects?
> >> > >>>
> >> > >>> Good on you,
> >> > >>> St.Ack
> >> > >>>
> >> > >>> On Mon, Jun 27, 2011 at 1:33 PM, Alex Baranau <[email protected]> wrote:
> >> > >>> > Yes, that is what I intended, I think. To make the whole
> >> > >>> > picture clear, here's the context:
> >> > >>> >
> >> > >>> > * there's Flume's HBase sink (read: an HBase client) which
> >> > >>> > writes data from the Flume "pipe" (read: some event-based
> >> > >>> > message source) to HTable;
> >> > >>> > * when HBase is down for some time (with the default HBase
> >> > >>> > configuration on the Flume sink side), HTable.put throws an
> >> > >>> > exception and the client exits (it usually takes ~10 min to
> >> > >>> > fail);
> >> > >>> > * Flume is smart enough to accumulate data to be written
> >> > >>> > reliably if the sink behaves badly (not writing for some time,
> >> > >>> > pauses, etc.), so it would be great if the sink kept trying to
> >> > >>> > write the data until HBase is up again, BUT:
> >> > >>> > * since here we have a complete "failure" of the sink process
> >> > >>> > (the thread needs to be restarted), the data never reaches
> >> > >>> > HTable even after the HBase cluster is brought up again.
> >> > >>> >
> >> > >>> > So you suggest using the configuration properties
> >> > >>> > "hbase.client.pause" and "hbase.client.retries.number" instead
> >> > >>> > of this extra construction around HTable.put? I.e. make the
> >> > >>> > retry attempts (reasonably) close to "perform forever". Is
> >> > >>> > that what you meant?
> >> > >>> >
> >> > >>> > Thank you,
> >> > >>> > Alex Baranau
> >> > >>> > ----
> >> > >>> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
> >> > >>> >
> >> > >>> > On Mon, Jun 27, 2011 at 11:16 PM, Ted Yu <[email protected]> wrote:
> >> > >>> >
> >> > >>> >> This would retry indefinitely, right?
> >> > >>> >> Normally a maximum retry duration would govern how long the
> >> > >>> >> retry is attempted.
> >> > >>> >> > >> > >>> >> On Mon, Jun 27, 2011 at 1:08 PM, Alex Baranau < > >> > [email protected] > >> > >>> >> >wrote: > >> > >>> >> > >> > >>> >> > Hello, > >> > >>> >> > > >> > >>> >> > Just wanted to confirm that I'm doing things in a proper way > >> here. > >> > How > >> > >>> >> > about > >> > >>> >> > this code to handle the temp cluster connectivity problems > >> > >>> >> > (or > >> > cluster > >> > >>> >> down > >> > >>> >> > time) on client-side? > >> > >>> >> > > >> > >>> >> > + // HTable.put() will fail with exception if connection > >> > >>> >> > + to > >> > cluster > >> > >>> is > >> > >>> >> > temporarily broken or > >> > >>> >> > + // cluster is temporarily down. To be sure data is > >> > >>> >> > + written we > >> > >>> retry > >> > >>> >> > writing. > >> > >>> >> > + boolean dataWritten = false; > >> > >>> >> > + do { > >> > >>> >> > + try { > >> > >>> >> > + table.put(p); > >> > >>> >> > + dataWritten = true; > >> > >>> >> > + } catch (IOException ioe) { // indicates cluster > >> > connectivity > >> > >>> >> > problem > >> > >>> >> > (also thrown when cluster is down) > >> > >>> >> > + LOG.error("Writing data to HBase failed, will try > >> > >>> >> > + again > >> > in " > >> > >>> + > >> > >>> >> > RETRY_INTERVAL_ON_WRITE_FAIL + " sec", ioe); > >> > >>> >> > + > >> > >>> >> > + Thread.currentThread().wait(RETRY_INTERVAL_ON_WRITE_FAIL > >> > * > >> > >>> >> 1000); > >> > >>> >> > + } > >> > >>> >> > + } while (!dataWritten); > >> > >>> >> > > >> > >>> >> > Thank you in advance, > >> > >>> >> > Alex Baranau > >> > >>> >> > ---- > >> > >>> >> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - > >> > Hadoop - > >> > >>> >> HBase > >> > >>> >> > > >> > >>> >> > >> > >>> > > >> > >>> > >> > >> > >> > > > >> > > >> > > >> > > >> > -- > >> > Joseph Echeverria > >> > Cloudera, Inc. > >> > 443.305.9434 > >> > > >> > >

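P.P.S. And a sketch of what the batch() variant Doug suggested could look
like with the same 0.90-era API, if I read it right: the two-argument form
fills a results array even on partial failure, and a null entry marks a
failed operation that could be re-queued (table and column names are again
made-up placeholders):

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Row;
    import org.apache.hadoop.hbase.util.Bytes;

    public class BatchSketch {
      public static void main(String[] args)
          throws IOException, InterruptedException {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "events");

        List<Row> actions = new ArrayList<Row>();
        Put p = new Put(Bytes.toBytes("row1"));
        p.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v"));
        actions.add(p);

        Object[] results = new Object[actions.size()];
        try {
          table.batch(actions, results); // fills results even on failure
        } catch (IOException ioe) {
          // some operations failed; results[i] == null marks the failed
          // ones, so the caller knows exactly what was and wasn't written
          for (int i = 0; i < results.length; i++) {
            if (results[i] == null) {
              System.err.println("operation " + i + " failed, can re-queue");
            }
          }
        }
        table.close();
      }
    }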