All correct. No changes in HBase are needed (none were actually requested; changing the default retry behavior was just a suggestion by Stack). Thank you all for participating!
Alex Baranau
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase

On Wed, Jun 29, 2011 at 4:44 PM, Doug Meil <[email protected]> wrote:

> Hi there-
>
> 1) Buffer/Batch
>
> Addressing the comment in the Cloudera ticket (FLUME-390), "currently
> non-written events are lost.", I agree that two paths (write-buffer vs.
> batch-it-yourself) are available for Flume to recover from a failure and
> know what hasn't been sent (or what was at least attempted to be sent).
>
> Thus, I don't see this as an "HBase issue". There are existing APIs for
> Flume to utilize that will get the job done.
>
> 2) Retry-forever.
>
> I've seen several folks vote -1 on retry-forever as default behavior.
> Based on the conversation, I'm assuming this won't happen.
>
> Are there other aspects to this issue? It doesn't seem like any HBase
> changes are needed to address these issues.
>
> On 6/29/11 2:17 AM, "Alex Baranau" <[email protected]> wrote:
>
>> I think you are talking here about losing some data from the client-side
>> buffer. I don't think using batch will help. If we use batch from client
>> code and want to use client-side buffering, we would need to implement
>> the same buffering code already implemented in HTable. The behavior and
>> ack sending will be the same: the ack is sent after the Flume sink
>> receives the event, which might be buffered and not (yet) persisted to
>> HBase. I haven't looked into Flume's ability to skip sending the ack on
>> receiving an event in the sink and to send it in batches later (after
>> the actual persisting happens). Will investigate that as a separate
>> effort.
>>
>> In general, please correct me if I'm wrong, but there won't be much
>> difference between using HTable's batch and put:
>> * with put() I can also tell what was persisted and which records
>>   failed, as they will be available in the client-side buffer after
>>   failures
>> * internally, put uses batch anyway (i.e. connection.processBatch)
>>
>> Alex Baranau
>> ----
>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
>>
>> On Tue, Jun 28, 2011 at 10:41 PM, Doug Meil <[email protected]> wrote:
>>
>>> But if Flume used the HTable 'batch' method instead of 'put'...
>>>
>>> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#batch%28java.util.List%29
>>>
>>> ... doesn't it sidestep this issue? Because instead of being unsure
>>> what was in the write-buffer and what wasn't, the caller knows exactly
>>> what was sent and whether it was sent without error.
>>>
>>> On 6/28/11 1:07 PM, "Alex Baranau" <[email protected]> wrote:
>>>
>>>>> if the sink "dies" for some reason, then it should
>>>>> push that back to the upstream parts of the flume dataflow, and have
>>>>> them buffer data on local disk.
>>>>
>>>> True. But this seems to be a separate issue:
>>>> https://issues.cloudera.org/browse/FLUME-390
>>>>
>>>> Alex Baranau
>>>> ----
>>>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
>>>>
>>>> On Tue, Jun 28, 2011 at 7:40 PM, Doug Meil <[email protected]> wrote:
>>>>
>>>>> I agree with what Todd & Gary said. I don't like retry-forever,
>>>>> especially as a default option in HBase.
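[For reference, a minimal sketch of the batch-it-yourself path Doug and Alex weigh above; the write-buffer path would instead use setAutoFlush(false) and inspect the buffer after a failed flushCommits(). This assumes the 0.90-era HTable client API; the table name, column family, and row layout are illustrative only:]

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException;
    import org.apache.hadoop.hbase.client.Row;
    import org.apache.hadoop.hbase.util.Bytes;

    public class BatchSinkSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "flume_events"); // illustrative table name

        // Build the batch ourselves instead of relying on HTable's write buffer.
        List<Row> actions = new ArrayList<Row>();
        for (int i = 0; i < 3; i++) {
          Put p = new Put(Bytes.toBytes("row-" + i));
          p.add(Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes("event-" + i));
          actions.add(p);
        }

        Object[] results = new Object[actions.size()];
        try {
          table.batch(actions, results);
        } catch (RetriesExhaustedWithDetailsException e) {
          // The exception reports exactly which rows failed and why, so the
          // caller (e.g. a Flume sink) knows what to re-queue before acking.
          for (int i = 0; i < e.getNumExceptions(); i++) {
            System.err.println("failed row: " + Bytes.toString(e.getRow(i).getRow())
                + ", cause: " + e.getCause(i));
          }
        } finally {
          table.close();
        }
      }
    }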
>>>>>
>>>>> -----Original Message-----
>>>>> From: Gary Helmling [mailto:[email protected]]
>>>>> Sent: Tuesday, June 28, 2011 12:18 PM
>>>>> To: [email protected]
>>>>> Cc: Jonathan Hsieh
>>>>> Subject: Re: Retry HTable.put() on client-side to handle temp connectivity problem
>>>>>
>>>>> I'd also be wary of changing the default to retry forever. This might
>>>>> be hard to differentiate from a hang or deadlock for new users, and it
>>>>> seems to violate "least surprise".
>>>>>
>>>>> In many cases it's preferable to have some kind of predictable failure
>>>>> as well, so I think this would appear to be a regression in behavior.
>>>>> If you're serving, say, web site data from HBase, you may prefer an
>>>>> occasional error or timeout rather than having page loading hang
>>>>> forever.
>>>>>
>>>>> I'm all for making "retry forever" a configurable option, but do we
>>>>> need any new knobs here?
>>>>>
>>>>> --gh
>>>>>
>>>>> On Mon, Jun 27, 2011 at 3:23 PM, Joey Echeverria <[email protected]> wrote:
>>>>>
>>>>>> If I could override the default, I'd be a hesitant +1. I'd rather see
>>>>>> the default be something like retry 10 times, then throw an error,
>>>>>> with one option being infinite retries.
>>>>>>
>>>>>> -Joey
>>>>>>
>>>>>> On Mon, Jun 27, 2011 at 2:21 PM, Stack <[email protected]> wrote:
>>>>>>
>>>>>>> I'd be fine with changing the default in hbase so clients just keep
>>>>>>> trying. What do others think?
>>>>>>> St.Ack
>>>>>>>
>>>>>>> On Mon, Jun 27, 2011 at 1:56 PM, Alex Baranau <[email protected]> wrote:
>>>>>>>
>>>>>>>> The code I pasted works for me: it reconnects successfully. I just
>>>>>>>> thought it might not be the best way to do it. I realized that by
>>>>>>>> using HBase configuration properties we could simply say that it's
>>>>>>>> up to the user to configure the HBase client (created by Flume)
>>>>>>>> properly (e.g. by adding an hbase-site.xml with the settings to the
>>>>>>>> classpath). On the other hand, it looks to me like users of HBase
>>>>>>>> sinks will *always* want the sink to retry writing to HBase until
>>>>>>>> it works out. But the default configuration does not work this way:
>>>>>>>> the sink stops when HBase is temporarily down or inaccessible.
>>>>>>>> Hence it makes using the sink more complicated (because the default
>>>>>>>> configuration sucks), which I'd like to avoid here by adding the
>>>>>>>> code above. Ideally, the default configuration should work the best
>>>>>>>> way for the general-purpose case.
>>>>>>>>
>>>>>>>> I understand now what the ways to implement/configure such behavior
>>>>>>>> are. I think we should discuss what the best default behavior is,
>>>>>>>> and whether we need to allow the user to override it, on the Flume
>>>>>>>> ML (or directly at https://issues.cloudera.org/browse/FLUME-685).
>>>>>>>>
>>>>>>>> Thank you guys,
>>>>>>>>
>>>>>>>> Alex Baranau
>>>>>>>> ----
>>>>>>>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
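[A minimal sketch of the configuration-based alternative Alex describes above. The property names are the standard HBase client settings; the concrete values and table name are illustrative assumptions, and the same values could equally live in an hbase-site.xml on the classpath:]

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;

    public class ConfiguredClientSketch {
      public static void main(String[] args) throws IOException {
        // Picks up hbase-site.xml from the classpath, if present.
        Configuration conf = HBaseConfiguration.create();

        // Ride out long outages inside the client itself instead of wrapping
        // HTable.put() in a hand-rolled retry loop. A very large retry count
        // approximates "retry forever" without changing HBase's defaults
        // (at the time, 10 retries with increasing backoff pauses).
        conf.setInt("hbase.client.retries.number", 100);
        conf.setLong("hbase.client.pause", 2000); // ms between retries

        HTable table = new HTable(conf, "flume_events"); // illustrative table name
        // ... use table.put(...) as usual; retries now happen under the hood.
        table.close();
      }
    }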
>>>>>>>>
>>>>>>>> On Mon, Jun 27, 2011 at 11:40 PM, Stack <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Either should work, Alex. Your version will go "forever". Have you
>>>>>>>>> tried yanking HBase out from under the client to see if it
>>>>>>>>> reconnects?
>>>>>>>>>
>>>>>>>>> Good on you,
>>>>>>>>> St.Ack
>>>>>>>>>
>>>>>>>>> On Mon, Jun 27, 2011 at 1:33 PM, Alex Baranau <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Yes, that is what's intended, I think. To make the whole picture
>>>>>>>>>> clear, here's the context:
>>>>>>>>>>
>>>>>>>>>> * there's Flume's HBase sink (read: an HBase client) which writes
>>>>>>>>>>   data from the Flume "pipe" (read: some event-based message
>>>>>>>>>>   source) to an HTable;
>>>>>>>>>> * when HBase is down for some time (with the default HBase
>>>>>>>>>>   configuration on Flume's sink side), HTable.put throws an
>>>>>>>>>>   exception and the client exits (it usually takes ~10 min to
>>>>>>>>>>   fail);
>>>>>>>>>> * Flume is smart enough to accumulate data to be written reliably
>>>>>>>>>>   if the sink behaves badly (not writing for some time, pauses,
>>>>>>>>>>   etc.), so it would be great if the sink kept trying to write
>>>>>>>>>>   data until HBase is up again, BUT:
>>>>>>>>>> * since here we have a complete "failure" of the sink process
>>>>>>>>>>   (the thread needs to be restarted), the data never reaches the
>>>>>>>>>>   HTable, even after the HBase cluster is brought up again.
>>>>>>>>>>
>>>>>>>>>> So you suggest, instead of this extra construction around
>>>>>>>>>> HTable.put, using the configuration properties
>>>>>>>>>> "hbase.client.pause" and "hbase.client.retries.number"? I.e.,
>>>>>>>>>> making the number of retry attempts (reasonably) close to "retry
>>>>>>>>>> forever". Is that what you meant?
>>>>>>>>>>
>>>>>>>>>> Thank you,
>>>>>>>>>> Alex Baranau
>>>>>>>>>> ----
>>>>>>>>>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
>>>>>>>>>>
>>>>>>>>>> On Mon, Jun 27, 2011 at 11:16 PM, Ted Yu <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> This would retry indefinitely, right?
>>>>>>>>>>> Normally a maximum retry duration would govern how long the
>>>>>>>>>>> retry is attempted.
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Jun 27, 2011 at 1:08 PM, Alex Baranau <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hello,
>>>>>>>>>>>>
>>>>>>>>>>>> Just wanted to confirm that I'm doing things in a proper way
>>>>>>>>>>>> here. How about this code to handle temporary cluster
>>>>>>>>>>>> connectivity problems (or cluster downtime) on the client side?
> >> >> > >>> >> > + boolean dataWritten = false; > >> >> > >>> >> > + do { > >> >> > >>> >> > + try { > >> >> > >>> >> > + table.put(p); > >> >> > >>> >> > + dataWritten = true; > >> >> > >>> >> > + } catch (IOException ioe) { // indicates cluster > >> >> > connectivity > >> >> > >>> >> > problem > >> >> > >>> >> > (also thrown when cluster is down) > >> >> > >>> >> > + LOG.error("Writing data to HBase failed, will try > >> >> > >>> >> > + again > >> >> > in " > >> >> > >>> + > >> >> > >>> >> > RETRY_INTERVAL_ON_WRITE_FAIL + " sec", ioe); > >> >> > >>> >> > + > >> >> > >>> >> > + Thread.currentThread().wait(RETRY_INTERVAL_ON_WRITE_FAIL > >> >> > * > >> >> > >>> >> 1000); > >> >> > >>> >> > + } > >> >> > >>> >> > + } while (!dataWritten); > >> >> > >>> >> > > >> >> > >>> >> > Thank you in advance, > >> >> > >>> >> > Alex Baranau > >> >> > >>> >> > ---- > >> >> > >>> >> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch > >>- > >> >> > Hadoop - > >> >> > >>> >> HBase > >> >> > >>> >> > > >> >> > >>> >> > >> >> > >>> > > >> >> > >>> > >> >> > >> > >> >> > > > >> >> > > >> >> > > >> >> > > >> >> > -- > >> >> > Joseph Echeverria > >> >> > Cloudera, Inc. > >> >> > 443.305.9434 > >> >> > > >> >> > >> > >> > >
