I think you are talking here about losing some data from the client-side buffer.
I don't think using batch will help. If we used batch from the client code and
still wanted client-side buffering, we would need to re-implement the same
buffering code that is already in HTable. The behavior and ack sending would be
the same: the ack is sent after the Flume sink receives the event, which might
be buffered and not (yet) persisted to HBase. I haven't looked into Flume's
ability to skip sending the ack when the sink receives an event and to send
acks in batches later (after the actual persisting happens). I will investigate
that as a separate effort.
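To illustrate the buffering in question, here is a minimal sketch against the
0.90-era HTable API (the class name, table name, column names and values are
made-up placeholders, not the actual sink code):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class WriteBufferSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "events");

        table.setAutoFlush(false); // puts go to the client-side buffer first

        Put p = new Put(Bytes.toBytes("row1"));
        p.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v"));
        table.put(p); // may only sit in the local buffer at this point; an
                      // ack sent now covers data that is not yet in HBase

        table.flushCommits(); // only here is the buffer pushed to the cluster
        table.close();
      }
    }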
In general, please correct me if I'm wrong, but there won't be much difference
between using HTable's batch and put:

* with put() I can also tell what was persisted and which records failed, as
  they will be available in the client-side buffer after failures
* internally, put uses batch anyway (i.e. connection.processBatch)

Alex Baranau
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase

On Tue, Jun 28, 2011 at 10:41 PM, Doug Meil <[email protected]> wrote:

> But if Flume used the htable 'batch' method instead of 'put'...
>
> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#batch%28java.util.List%29
>
> ... doesn't it sidestep this issue? Because instead of being unsure what
> was in the write-buffer and what wasn't, the caller knows exactly what
> was sent and whether it was sent without error.
>
>
> On 6/28/11 1:07 PM, "Alex Baranau" <[email protected]> wrote:
>
> >> if the sink "dies" for some reason, then it should push that back to
> >> the upstream parts of the flume dataflow, and have them buffer data
> >> on local disk.
> >
> > True. But this seems to be a separate issue:
> > https://issues.cloudera.org/browse/FLUME-390
> >
> > Alex Baranau
> > ----
> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
> >
> > On Tue, Jun 28, 2011 at 7:40 PM, Doug Meil <[email protected]> wrote:
> >
> >> I agree with what Todd & Gary said. I don't like retry-forever,
> >> especially as a default option in HBase.
> >>
> >>
> >> -----Original Message-----
> >> From: Gary Helmling [mailto:[email protected]]
> >> Sent: Tuesday, June 28, 2011 12:18 PM
> >> To: [email protected]
> >> Cc: Jonathan Hsieh
> >> Subject: Re: Retry HTable.put() on client-side to handle temp
> >> connectivity problem
> >>
> >> I'd also be wary of changing the default to retry forever. This might
> >> be hard to differentiate from a hang or deadlock for new users, and it
> >> seems to violate "least surprise".
> >>
> >> In many cases it's preferable to have some kind of predictable failure
> >> as well, so I think this would appear to be a regression in behavior.
> >> If you're serving, say, web site data from HBase, you may prefer an
> >> occasional error or timeout rather than having page loading hang
> >> forever.
> >>
> >> I'm all for making "retry forever" a configurable option, but do we
> >> need any new knobs here?
> >>
> >> --gh
> >>
> >>
> >> On Mon, Jun 27, 2011 at 3:23 PM, Joey Echeverria <[email protected]> wrote:
> >>
> >> > If I could override the default, I'd be a hesitant +1. I'd rather
> >> > see the default be something like retry 10 times, then throw an
> >> > error, with one option being infinite retries.
> >> >
> >> > -Joey
> >> >
> >> > On Mon, Jun 27, 2011 at 2:21 PM, Stack <[email protected]> wrote:
> >> > > I'd be fine with changing the default in hbase so clients just
> >> > > keep trying. What do others think?
> >> > > St.Ack
> >> > >
> >> > > On Mon, Jun 27, 2011 at 1:56 PM, Alex Baranau <[email protected]> wrote:
> >> > >> The code I pasted works for me: it reconnects successfully. Just
> >> > >> thought it might not be the best way to do it. I realized that by
> >> > >> using HBase configuration properties we could just say that it's
> >> > >> up to the user to configure the HBase client (created by Flume)
> >> > >> properly (e.g. by adding hbase-site.xml with the settings to the
> >> > >> classpath).
> >> > >>
> >> > >> On the other hand, it looks to me that users of HBase sinks will
> >> > >> *always* want it to retry writing to HBase until it works out.
> >> > >> But the default configuration does not work this way: the sink
> >> > >> stops when HBase is temporarily down or inaccessible. Hence it
> >> > >> makes using the sink more complicated (because the default
> >> > >> configuration sucks), which I'd like to avoid here by adding the
> >> > >> code above. Ideally the default configuration should work the
> >> > >> best way for the general-purpose case.
> >> > >>
> >> > >> I understood what the ways to implement/configure such behavior
> >> > >> are. I think we should discuss what the best default behavior is,
> >> > >> and whether we need to let the user override it, on the Flume ML
> >> > >> (or directly at https://issues.cloudera.org/browse/FLUME-685).
> >> > >>
> >> > >> Thank you guys,
> >> > >>
> >> > >> Alex Baranau
> >> > >> ----
> >> > >> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
> >> > >>
> >> > >> On Mon, Jun 27, 2011 at 11:40 PM, Stack <[email protected]> wrote:
> >> > >>
> >> > >>> Either should work, Alex. Your version will go "for ever". Have
> >> > >>> you tried yanking hbase out from under the client to see if it
> >> > >>> reconnects?
> >> > >>>
> >> > >>> Good on you,
> >> > >>> St.Ack
> >> > >>>
> >> > >>> On Mon, Jun 27, 2011 at 1:33 PM, Alex Baranau <[email protected]> wrote:
> >> > >>> > Yes, that is what I intended, I think. To make the whole
> >> > >>> > picture clear, here's the context:
> >> > >>> >
> >> > >>> > * there's Flume's HBase sink (read: an HBase client) which
> >> > >>> > writes data from the Flume "pipe" (read: some event-based
> >> > >>> > message source) to HTable;
> >> > >>> > * when HBase is down for some time (with the default HBase
> >> > >>> > configuration on the Flume sink side), HTable.put throws an
> >> > >>> > exception and the client exits (it usually takes ~10 min to
> >> > >>> > fail);
> >> > >>> > * Flume is smart enough to accumulate data to be written
> >> > >>> > reliably if the sink behaves badly (not writing for some time,
> >> > >>> > pauses, etc.), so it would be great if the sink kept trying to
> >> > >>> > write the data until HBase is up again, BUT:
> >> > >>> > * since here we have a complete "failure" of the sink process
> >> > >>> > (the thread needs to be restarted), the data never reaches
> >> > >>> > HTable even after the HBase cluster is brought up again.
> >> > >>> >
> >> > >>> > So you suggest using the configuration properties
> >> > >>> > "hbase.client.pause" and "hbase.client.retries.number" instead
> >> > >>> > of this extra construction around HTable.put? I.e. make the
> >> > >>> > retry attempts (reasonably) close to "perform forever". Is
> >> > >>> > that what you meant?
> >> > >>> >
> >> > >>> > Thank you,
> >> > >>> > Alex Baranau
> >> > >>> > ----
> >> > >>> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
> >> > >>> >
> >> > >>> > On Mon, Jun 27, 2011 at 11:16 PM, Ted Yu <[email protected]> wrote:
> >> > >>> >
> >> > >>> >> This would retry indefinitely, right?
> >> > >>> >> Normally a maximum retry duration would govern how long the
> >> > >>> >> retry is attempted.
> >> > >>> >> > >> > >>> >> On Mon, Jun 27, 2011 at 1:08 PM, Alex Baranau < > >> > [email protected] > >> > >>> >> >wrote: > >> > >>> >> > >> > >>> >> > Hello, > >> > >>> >> > > >> > >>> >> > Just wanted to confirm that I'm doing things in a proper way > >> here. > >> > How > >> > >>> >> > about > >> > >>> >> > this code to handle the temp cluster connectivity problems > >> > >>> >> > (or > >> > cluster > >> > >>> >> down > >> > >>> >> > time) on client-side? > >> > >>> >> > > >> > >>> >> > + // HTable.put() will fail with exception if connection > >> > >>> >> > + to > >> > cluster > >> > >>> is > >> > >>> >> > temporarily broken or > >> > >>> >> > + // cluster is temporarily down. To be sure data is > >> > >>> >> > + written we > >> > >>> retry > >> > >>> >> > writing. > >> > >>> >> > + boolean dataWritten = false; > >> > >>> >> > + do { > >> > >>> >> > + try { > >> > >>> >> > + table.put(p); > >> > >>> >> > + dataWritten = true; > >> > >>> >> > + } catch (IOException ioe) { // indicates cluster > >> > connectivity > >> > >>> >> > problem > >> > >>> >> > (also thrown when cluster is down) > >> > >>> >> > + LOG.error("Writing data to HBase failed, will try > >> > >>> >> > + again > >> > in " > >> > >>> + > >> > >>> >> > RETRY_INTERVAL_ON_WRITE_FAIL + " sec", ioe); > >> > >>> >> > + > >> > >>> >> > + Thread.currentThread().wait(RETRY_INTERVAL_ON_WRITE_FAIL > >> > * > >> > >>> >> 1000); > >> > >>> >> > + } > >> > >>> >> > + } while (!dataWritten); > >> > >>> >> > > >> > >>> >> > Thank you in advance, > >> > >>> >> > Alex Baranau > >> > >>> >> > ---- > >> > >>> >> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - > >> > Hadoop - > >> > >>> >> HBase > >> > >>> >> > > >> > >>> >> > >> > >>> > > >> > >>> > >> > >> > >> > > > >> > > >> > > >> > > >> > -- > >> > Joseph Echeverria > >> > Cloudera, Inc. > >> > 443.305.9434 > >> > > >> > >

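P.P.S. And a sketch of what the batch() variant Doug suggested could look
like with the same 0.90-era API, if I read it right: the two-argument form
fills a results array even on partial failure, and a null entry marks a
failed operation that could be re-queued (table and column names are again
made-up placeholders):

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Row;
    import org.apache.hadoop.hbase.util.Bytes;

    public class BatchSketch {
      public static void main(String[] args)
          throws IOException, InterruptedException {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "events");

        List<Row> actions = new ArrayList<Row>();
        Put p = new Put(Bytes.toBytes("row1"));
        p.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v"));
        actions.add(p);

        Object[] results = new Object[actions.size()];
        try {
          table.batch(actions, results); // fills results even on failure
        } catch (IOException ioe) {
          // some operations failed; results[i] == null marks the failed
          // ones, so the caller knows exactly what was and wasn't written
          for (int i = 0; i < results.length; i++) {
            if (results[i] == null) {
              System.err.println("operation " + i + " failed, can re-queue");
            }
          }
        }
        table.close();
      }
    }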