Thanks @Nick for the feedback, and sorry for the late response. Regarding the backport, I've opened HBASE-17138 for further discussion, FYI.
Regarding the fullGC part, Anoop is right: we still don't have write-path
offheap, so full GCs will still happen. And since the block cache normally
occupies 30% of the overall heap (hbase.block.cache.size is 0.3 by default),
we reduced the heap size by 30% after moving the block cache offheap, and the
full GC time went down by roughly the same ratio. I've put a rough sketch of
the related configuration at the end of this mail for reference.

Best Regards,
Yu

On 21 November 2016 at 00:29, Anoop John <anoop.hb...@gmail.com> wrote:

> Regarding your question on max GC pause Nick, the use case is an R+W
> workload. This is what I know: the write path still generates heavy on-heap
> usage and garbage, which might be a reason I guess. Once we have the write
> path off heap as well, the bulk of our memory need can be handled off heap,
> and I believe that can bring this down further. The point is that we can
> then work with a much reduced Xmx. The off-heap memory we use consists of
> fixed buffers which live for the life of the RS, i.e. DBBs in the bucket
> cache / DBBs in the MemstoreChunkPool.
>
> -Anoop-
>
> On Sunday, November 20, 2016, Nick Dimiduk <ndimi...@apache.org> wrote:
>
> > Very encouraging indeed! Thank you so much for sharing your results with
> > the community!!! This is excellent to see and we really appreciate your
> > openness. I have a couple comments/questions.
> >
> > (1) From the DISCUSS thread re: EOL of 1.1, it seems we'll continue to
> > support 1.x releases for some time, including with overlap of the 2.0
> > line. Based on the energy of the community, I would guess these will be
> > maintained until 2.2 at least. Therefore, offheap patches that have seen
> > production exposure seem like a reasonable candidate for backport,
> > perhaps in a 1.4 or 1.5 release timeframe.
> >
> > (2) I'm surprised to see your max GC pause go from 11s -> 7s. Do I
> > understand you correctly? This is an improvement to be sure, but I would
> > have expected more gain. Can you elaborate on your GC and heap size
> > settings? Have you profiled the heaps at all to see where the pressure
> > lies once the bulk of the data pathway is moved to direct memory?
> >
> > Thanks a lot!
> > Nick
> >
> > On Fri, Nov 18, 2016 at 12:11 AM Yu Li <car...@gmail.com> wrote:
> >
> >> Dear all,
> >>
> >> We have backported the read-path offheap work (HBASE-11425) to our
> >> customized hbase-1.1.2 (thanks @Anoop for the help/support) and have run
> >> it online for more than a month, and would like to share our experience,
> >> for what it's worth (smile).
> >>
> >> Generally speaking, we gained better and more stable
> >> throughput/performance with offheap. Some details below:
> >>
> >> 1. QPS became more stable with offheap
> >>
> >> Performance w/o offheap:
> >>
> >> Performance w/ offheap:
> >>
> >> These data come from our online A/B test cluster (450 physical machines,
> >> each with 256G memory + 64 cores) with real-world workloads. They show
> >> that with offheap we get more stable throughput as well as better
> >> performance.
> >>
> >> We are not showing full online data here because online we published the
> >> version with both offheap and NettyRpcServer together, so there is no
> >> standalone comparison for offheap alone.
> >>
> >> 2. Full GC frequency and cost
> >>
> >> Average full GC STW time reduced from 11s to 7s with offheap.
> >>
> >> 3. Young GC frequency and cost
> >>
> >> No performance degradation observed with offheap.
> >>
> >> 4. Peak throughput of a single RS
> >>
> >> On Singles' Day (11/11), the peak throughput of a single RS reached 100K
> >> QPS, of which 90K came from Get. Combined with the network in/out data,
> >> we can tell the average result size of a Get request is ~1KB.
> >>
> >> Offheap is used on all online machines (more than 1600 nodes) instead of
> >> LruCache, so the above QPS is gained from the offheap bucket cache,
> >> along with NettyRpcServer (HBASE-15756).
> >>
> >> Just let us know if you have any comments. Thanks.
> >>
> >> Best Regards,
> >> Yu
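P.S. As mentioned above, here is a rough sketch of the kind of settings
involved in moving the block cache offheap with the bucket cache. The sizes
below are only illustrative (not the values we run online), and the exact
semantics of each property should be double-checked against the documentation
of your HBase version:

In hbase-site.xml:

  <!-- use an offheap (L2) bucket cache for data blocks -->
  <property>
    <name>hbase.bucketcache.ioengine</name>
    <value>offheap</value>
  </property>
  <!-- bucket cache capacity; in 1.x a value >= 1.0 is read as MB -->
  <property>
    <name>hbase.bucketcache.size</name>
    <value>32768</value>
  </property>
  <!-- the on-heap LRU share can be dialed down once data blocks move offheap -->
  <property>
    <name>hbase.block.cache.size</name>
    <value>0.1</value>
  </property>

In hbase-env.sh:

  # direct memory must cover the bucket cache plus some headroom
  export HBASE_OFFHEAPSIZE=36G
  # and the heap (-Xmx) can be reduced roughly by the old on-heap block cache share
  export HBASE_HEAPSIZE=20G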