Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

Bryan Beaudreault Mon, 08 Feb 2016 11:04:02 -0800

This is great news! Thanks for all of the hard work here, we're excited to
put this issue behind us and are happy to see the lesson around improving
perf testing.


Cheers,

Bryan

On Mon, Feb 8, 2016 at 1:55 PM Stack <st...@duboce.net> wrote:

> Let me close out this thread.
>
> Below is the release note from the HBASE-14460 umbrella increments
> regression issue and then some.
>
> Increments, appends, checkAnd* have been slow since hbase-.1.0.0. The
> unification of mvcc and sequence id done by HBASE-8763 was responsible.
>
> A ‘fast-path’ workaround was added by HBASE-15031 “Fix merge of MVCC and
> SequenceID performance regression in branch-1.0 for Increments”. It became
> available in 1.0.3 and 1.1.3. To enable the fast path, set
> "hbase.increment.fast.but.narrow.consistency" and then rolling restart. The
> workaround was for increments only (appends, checkAndPut, etc., were not
> addressed. See HBASE-15031 release note for more detail).
>
> Subsequently, the regression was properly identified and fixed in
> HBASE-15213 and the fix applied to branch-1.0 and branch-1.1. As it
> happens, hbase-1.2.0 does not suffer from the performance regression
> (though the thought was that it did -- and so it got the fast-path patch
> too via HBASE-15092). Nor does the master branch. HBASE-15213 identified
> that HBASE-12751 (as a side effect) had cured the regression.
>
> hbase-1.0.4 (if it is ever released -- 1.0 has been end-of-lifed) and
> hbase-1.1.4 will have the HBASE-15213 fix.  If you are suffering from the
> increment regression and you are on 1.0.3 or 1.1.3, you can enable the work
> around to get back your increment performance but you should upgrade to get
> the proper fix
>
>
> There are a couple of lessons here:
>
>    - Open source rules. See Junegunn Choi ‘scratch an itch’ in HBASE-15213
>    - Never make assumptions, even between versions. Presumption was that
>    the slowdown came from added friction brought on by unification of mvcc
> and
>    sequenceid and cursory testing seemed to bare out this hunch but should
>    have dug in more (see bullet point above). A load of work was then done
> to
>    make work-around patch fit different versions of the write-path (it has
>    evolved across 1.0=>2.0) but at least for 1.2.0, this work was not
> needed.
>    - We need to perf test all paths. The increment/append/checkAnd* code
>    paths had been neglected and were missing perf tooling (since
> addressed).
>
> Hopefully we live and learn. Thanks to all who helped along the fix. It
> took an army.
>
> St.Ack
>
>
> On Mon, Dec 21, 2015 at 12:35 PM, Stack <st...@duboce.net> wrote:
>
> > On Mon, Dec 21, 2015 at 2:31 AM, 鈴木俊裕 <brfrn...@gmail.com> wrote:
> >
> >> St.Ack
> >>
> >> I am sorry for the late reply.
> >>
> >>
> > Thank you for the reply.
> >
> >
> >
> >> This is the test code:
> >> https://github.com/brfrn169/hbase-test
> >
> >
> > This helps. The test has a different character to others we currently
> have
> > (a thread keeps writing its row rather than a thread writing all rows in
> > region or all threads writing a single row). Let me try it.
> >
> >
> >
> >>
> >>
> >> We applied the patch you can find below to HBase-1.0.0 to resolve the
> >> performance degradation:
> >> https://gist.github.com/brfrn169/15a874594be2fb9d6ea0
> >>
> >> It showed a good performance.
> >>
> >>
> > The patch is attractive because it keeps the change in MVCC class only.
> >
> > It is a pretty radical change though with some changes that look like
> they
> > could go upstream. There are some questions below but no problem if you
> are
> > busy, I can go do my own study.
> >
> > It looks like advanceMemstore had a "blind spot": i.e. we could complete
> a
> > WriteEntry but it might not yet have had its write numbers assigned... Is
> > that why you added the new advancedNoWriteNumberWriteEntries Set?
> >
> > + if (queueFirst.getWriteNumber() == NO_WRITE_NUMBER) {
> > + advancedNoWriteNumberWriteEntries.add(queueFirst);
> > Are we sure that we'll move up through the write mvcc sequence numbers in
> > order?  We seem to be able to take from the
> > advancedNoWriteNumberWriteEntries Set w/o concern for order.
> >
> > You remove the writeQueue notify and seem to flip instead to wait/notify
> > on readWaiters. Less synchronization points seems good. We should do this
> > upstream too (and your change of > to >= looking at nextReadValue?)
> >
> > Do you have a notion of how much more throughput you got with this
> change?
> >
> > Thank you 鈴木俊裕,
> > St.Ack
> >
> >
> >
> >
> >> I think the direct cause of the performance degradation is not a region
> >> level lock because HBase-0.98.6 also has a region level lock.
> >> I think lock contention is caused by HBASE-8763.
> >>
> >> The patch mitigates that lock contention.
> >>
> >> Thanks,
> >> Toshihiro Suzuki
> >>
> >>
> >> 2015-12-14 15:12 GMT+09:00 Stack <st...@duboce.net>:
> >>
> >> > On Tue, Sep 8, 2015 at 2:01 AM, 鈴木俊裕 <brfrn...@gmail.com> wrote:
> >> >
> >> > > ...
> >> > >
> >> > > Also we wrote performance test code for increment operation that
> >> included
> >> > > 100 threads and ran it in local mode.
> >> > >
> >> > > The result is shown below:
> >> > >
> >> > > CDH5.3.1(HBase0.98.6)
> >> > > Throughput(op/s): 12757, Latency(ms): 7.975072509210629
> >> > >
> >> > > CDH5.4.5(HBase1.0.0)
> >> > > Throughput(op/s): 2027, Latency(ms): 49.11840157868772
> >> > >
> >> > >
> >> > Do you have this program Toshihiro Suzuki still? May I see it? I am
> >> > interested in seeing what the 100 threads are doing, if they are all
> >> > updating the same Cell or if they are spread over many rows (I see
> >> master
> >> > branch is more than 2x slower than 0.94 if all threads are contending
> >> on a
> >> > single Cell but if not contending, master seems faster -- I must be
> >> doing
> >> > something wrong over in my experiments on HBASE-14460). I would also
> be
> >> > interested in what you loading is like in production if you can
> >> describe it
> >> > at all (send me offlist if you'd rather talk about it in public). In
> >> > production, can you tell how much slowdown you are seeing? Is it 2x or
> >> 7x
> >> > as per your test program?
> >> >
> >> > Thank you,
> >> > St.Ack
> >> >
> >> >
> >> >
> >> > >
> >> > > Thanks,
> >> > >
> >> > > Toshihiro Suzuki
> >> > >
> >> >
> >>
> >
> >
>

Re: Performance degradation between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)

Reply via email to