Regarding your question on the max GC pause, Nick: the use case is a
read + write workload. Here is what I know: the write path still has heavy
on-heap usage and generates a lot of garbage, which I guess is one reason
for the remaining pause. Once we have the write path off heap as well, the
huge share of our memory need can be handled off heap, and I believe that
will bring the pause down further. The point is that we can then work with
a much reduced Xmx. The off-heap memory we use consists of fixed buffers
that live for the life of the RS, i.e. DirectByteBuffers (DBBs) in the
BucketCache and in the MemstoreChunkPool.
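
To make that concrete, a minimal sketch of the kind of setup this implies
(the property names are the standard offheap BucketCache ones; the sizes
are purely illustrative, not a recommendation for any particular box):

  # hbase-env.sh: shift the bulk of the memory budget to direct memory
  export HBASE_HEAPSIZE=16G       # the much reduced Xmx
  export HBASE_OFFHEAPSIZE=48G    # becomes -XX:MaxDirectMemorySize

  <!-- hbase-site.xml: serve the read path from the offheap BucketCache -->
  <property>
    <name>hbase.bucketcache.ioengine</name>
    <value>offheap</value>
  </property>
  <property>
    <name>hbase.bucketcache.size</name>
    <value>40960</value>  <!-- capacity in MB, carved from direct memory -->
  </property>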

-Anoop-

On Sunday, November 20, 2016, Nick Dimiduk <ndimi...@apache.org> wrote:
> Very encouraging indeed! Thank you so much for sharing your results with
> the community!!! This is excellent to see and we really appreciate your
> openness. I have a couple comments/questions.
>
> (1) from the DISCUSS thread re: EOL of 1.1, it seems we'll continue to
> support 1.x releases for some time, including with overlap of the 2.0 line.
> Based on the energy of the community, I would guess these will be
> maintained until 2.2 at least. Therefore, offheap patches that have seen
> production exposure seem like reasonable candidates for backport, perhaps
> in a 1.4 or 1.5 release timeframe.
>
> (2) I'm surprised to see your max GC pause go from 11s -> 7s. Do I
> understand you correctly? This is an improvement to be sure, but I would
> have expected more gain. Can you elaborate on your GC and heapsize
> settings? Have you profiled the heaps at all to see where the pressure lies
> once the bulk of the data pathway is moved to direct memory?
>
> Thanks a lot!
> Nick
>
> On Fri, Nov 18, 2016 at 12:11 AM Yu Li <car...@gmail.com> wrote:
>
>> Dear all,
>>
>> We have backported read path offheap (HBASE-11425) to our customized
>> hbase-1.1.2 (thanks @Anoop for the help/support) and run it online for
>> more than a month, and would like to share our experience, for what it's worth
>> (smile).
>>
>> Generally speaking, we gained a better and more stable
>> throughput/performance with offheap, and below are some details:
>>
>> 1. QPS becomes more stable with offheap
>>
>> Performance w/o offheap: [chart attachment, not shown in the archive]
>>
>> Performance w/ offheap: [chart attachment, not shown in the archive]
>>
>> These data come from our online A/B test cluster (450 physical machines,
>> each with 256G memory and 64 cores) running real-world workloads. They show
>> that with offheap we get a more stable throughput as well as better
>> performance.
>>
>> We are not showing full production data here because the version we rolled
>> out online includes both offheap and NettyRpcServer, so there is no
>> standalone comparison for offheap alone.
>>
>> 2. Full GC frequency and cost
>>
>> Average Full GC STW time reduced from 11s to 7s with offheap.
>>
>> 3. Young GC frequency and cost
>>
>> No performance degradation observed with offheap.
>>
>> 4. Peak throughput of a single RS
>>
>> On Singles' Day (11/11), peak throughput of a single RS reached 100K QPS,
>> of which 90K were Gets. Combined with the network in/out data, we can infer
>> that the average result size of a Get request is ~1KB.
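>>
>> (To spell out that arithmetic with an illustrative number: if network-out
>> per RS were on the order of 90MB/s at peak, then 90MB/s divided by 90K
>> Gets/s gives ~1KB per result; the actual figure comes from the measured
>> in/out data.)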
>>
>> Offheap is used on all online machines (more than 1,600 nodes) in place of
>> the LruBlockCache, so the above QPS was gained from the offheap BucketCache,
>> along with NettyRpcServer (HBASE-15756).
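>>
>> For reference, switching the RPC server is a one-property change; a
>> minimal sketch for hbase-site.xml (this key is the one I believe
>> HBASE-15756 introduced, so please verify it against your build):
>>
>>   <property>
>>     <name>hbase.rpc.server.impl</name>
>>     <value>org.apache.hadoop.hbase.ipc.NettyRpcServer</value>
>>   </property>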
>> Just let us know if you have any comments. Thanks.
>>
>> Best Regards,
>> Yu