Re: Use experience and performance data of offheap from Alibaba online cluster

Andrew Purtell Tue, 22 Nov 2016 06:14:07 -0800

> I hope we could strengthen our faith in HBase capability

Us too. Would you be interested in turning the metrics, and the discussion of 
them that came out of this thread, into a post for the HBase project blog 
(https://blogs.apache.org/hbase)? As you can see from the other blog entries, 
details about the use case do not need to reveal proprietary information. 
Readers would be most interested in the metrics you observed/achieved on 11/11, 
followed by a technical discussion of how (roughly) to replicate them. You have 
a good command of the English language, so that won't be a problem, and anyway I 
offer my services as editor should you like to try. Think about it. This would 
be a great post and, I am sure, a very popular one.



> On Nov 22, 2016, at 12:51 AM, Yu Li <car...@gmail.com> wrote:
> 
> bq. If it were not "confidential" might you mention why there is such a
> large (several orders of magnitude) explosion of end user queries to
> backend ones?
> For the index building and online machine learning systems, more
> information is recorded after each visit/trade, such as user query/click
> history, item stock updates, etc., and multiple pieces of user-specific
> feature data are read/updated for better recommendations. The flow is
> pretty much like:
> user visits some items
> -> puts them into the shopping cart
> -> checks out/removes items from the shopping cart
> -> item stock update/recommend new items to the user
> -> user visits new items
> Not many details can be supplied, but I believe we can imagine how many
> queries/updates there will be at the backend for such loops, right? (smile)
> 
> Thanks again for the interest and questions, although they derail the
> thread a little bit, and I hope we could strengthen our faith in HBase
> capability after these discussions. :-)
> 
> Best Regards,
> Yu
> 
>> On 21 November 2016 at 01:26, Stephen Boesch <java...@gmail.com> wrote:
>> 
>> Thanks Yu - given your apparent direct knowledge of the data, that is
>> helpful (my earlier response had been to 张铎). It is important in order
>> to ensure that colleagues are informed of numbers that are "real".
>> 
>> If it were not "confidential" might you mention why there is such a large
>> (several orders of magnitude) explosion of end user queries to backend
>> ones?
>> 
>> 
>> 
>> 2016-11-20 7:51 GMT-08:00 Yu Li <car...@gmail.com>:
>> 
>>> Thanks everyone for the feedback/comments, glad this data means something
>>> and has drawn your interest. Let me answer the questions (and sorry for
>>> the lag).
>>> 
>>> For the backport patches, ours are based on a customized 1.1.2 version and
>>> cannot be applied directly to any 1.x branch. It would be easy for us to
>>> upload the existing patches somewhere but obviously not that useful... so
>>> maybe we should still get them into branch-1 and officially support
>>> read-path offheap in a future 1.x release? Let me create a JIRA about this
>>> and let's discuss it in the JIRA system. And to be very clear, it's a big
>>> YES to sharing our patches with all rather than only numbers; the only
>>> question is which way is better (smile).
>>> 
>>> And answers for @Stephen Boesch:
>>> 
>>> bq. In any case the data is marked as 9/25/16 not 11/11/16
>>> It was specifically noted that the data from 9/25 are from our online A/B
>>> test cluster. Fully online data is not shown because we published offheap
>>> together with NettyRpcServer online, so there is no standalone comparison
>>> data for offheap. Please check my original email more carefully (smile).
>>> 
>>> bq. Repeating my earlier question:  20*Meg* queries per second??  Just
>>> checked and *google* does 40*K* queries per second.
>>> As you already noticed, the 20M QPS is the number from the A/B testing
>>> cluster (450 nodes), and there is much more on the 11/11 online cluster
>>> (1600+ nodes). Please note that this is NOT a cluster that directly serves
>>> queries from end users; it serves the index building and online machine
>>> learning systems. Refer to our talk at hbasecon2016 (slides
>>> <http://www.slideshare.net/HBaseCon/improvements-to-apache-hbase-and-its-applications-in-alibaba-search>
>>> /recording
>>> <https://www.youtube.com/watch?v=UVGDd2JeIMg&list=PLe-h9HrA9qfDVOeNh1l_T5HvwvkO9raWy&index=10>)
>>> for more details, if you're interested. And unlike Google, there's an
>>> obvious "hot spot" for us, so I don't think the QPS of these two
>>> different systems are comparable.
>>> 
>>> bq. So maybe please check your numbers again.
>>> The numbers come from our online monitoring system and are all real, not
>>> fake, so there is no need to check. Maybe you just need some more time to
>>> take them in and understand? (smile)
>>> 
>>> Best Regards,
>>> Yu
>>> 
>>>> On 20 November 2016 at 23:03, Stephen Boesch <java...@gmail.com> wrote:
>>>> 
>>>> Your arguments do not reflect direct knowledge of the numbers.  (a) There
>>>> is no super-spikiness in the graphs in the data (b) In any case the data
>>>> is marked as 9/25/16 not 11/11/16.  (c) The number of internet users says
>>>> little about the number of *concurrent* users.
>>>> 
>>>> Overall it would be helpful for those who actually collected the data to
>>>> comment - not just speculation from someone who did not. As I had
>>>> mentioned already there *may* be a huge fanout from the number of
>>>> user/application queries to the backend: but *huge* it would seemingly
>>>> need to be to generate the numbers shown.
>>>> 
>>>> 2016-11-19 22:39 GMT-08:00 张铎 <palomino...@gmail.com>:
>>>> 
>>>>> 11.11 is something like Black Friday. Almost every item on Alibaba is
>>>>> heavily discounted on 11.11. Alibaba earned 1 billion in revenue within 1
>>>>> minute (52 seconds) and 10 billion in revenue within 7 minutes (6 minutes
>>>>> 58 seconds) on 11.11. The Chinese people paid more than 120 billion
>>>>> Chinese yuan to Alibaba on 11.11. And I remember that Jeff Dean once gave
>>>>> a slide deck showing that for Google the amplification from user queries
>>>>> to storage system queries is also very large (I cannot remember the exact
>>>>> number; the slides were used to explain that hedged reads are very useful
>>>>> for reducing latency). So I think the peak throughput is true.
>>>>> 
>>>>> There are more than 600 million people in China who use the internet. So
>>>>> if they decide to do something to your system at the same time, it looks
>>>>> like a DDoS to your system...
>>>>> 
>>>>> Thanks.
>>>>> 
>>>>> 2016-11-20 12:56 GMT+08:00 Stephen Boesch <java...@gmail.com>:
>>>>> 
>>>>>> Repeating my earlier question:  20*Meg* queries per second??  Just
>>>>>> checked and *google* does 40*K* queries per second. Now maybe the
>>>>>> "queries" are a decomposition of far fewer end-user queries that cause a
>>>>>> fanout of backend queries. *But still .. *
>>>>>> 
>>>>>> So maybe please check your numbers again.
>>>>>> 
>>>>>> 2016-11-19 17:05 GMT-08:00 Heng Chen <heng.chen.1...@gmail.com>:
>>>>>> 
>>>>>>> The performance looks great!
>>>>>>> 
>>>>>>> 2016-11-19 18:03 GMT+08:00 Ted Yu <yuzhih...@gmail.com>:
>>>>>>>> Opening a JIRA would be fine.
>>>>>>>> This makes it easier for people to obtain the patch(es).
>>>>>>>> 
>>>>>>>> Cheers
>>>>>>>> 
>>>>>>>>> On Nov 18, 2016, at 11:35 PM, Anoop John <anoop.hb...@gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>> Because of some compatibility issues, we decided that this will be
>>>>>>>>> done in 2.0 only. Yes, as Andy said, it would be great to share the
>>>>>>>>> 1.x backported patches. Is it a mega patch at your end, or
>>>>>>>>> issue-by-issue patches? The latter would be best. Please share the
>>>>>>>>> patches in some place, along with a list of the issues backported. I
>>>>>>>>> can help with verifying the issues so as to make sure we don't miss
>>>>>>>>> any...
>>>>>>>>> 
>>>>>>>>> -Anoop-
>>>>>>>>> 
>>>>>>>>>> On Sat, Nov 19, 2016 at 12:32 AM, Enis Söztutar <enis....@gmail.com> wrote:
>>>>>>>>>> Thanks for sharing this. Great work.
>>>>>>>>>> 
>>>>>>>>>> I don't see any reason why we cannot backport to branch-1.
>>>>>>>>>> 
>>>>>>>>>> Enis
>>>>>>>>>> 
>>>>>>>>>> On Fri, Nov 18, 2016 at 9:37 AM, Andrew Purtell <andrew.purt...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Yes, please, the patches will be useful to the community even if we
>>>>>>>>>>> decide not to backport into an official 1.x release.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>>> On Nov 18, 2016, at 12:25 PM, Bryan Beaudreault <bbeaudrea...@hubspot.com> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> Is the backported patch available anywhere? Not seeing it on the
>>>>>>>>>>>> referenced JIRA. If it ends up not getting officially backported to
>>>>>>>>>>>> branch-1 due to 2.0 being around the corner, some of us who build our
>>>>>>>>>>>> own deploys may want to integrate it into our builds. Thanks! These
>>>>>>>>>>>> numbers look great.
>>>>>>>>>>>> 
>>>>>>>>>>>>> On Fri, Nov 18, 2016 at 12:20 PM Anoop John <anoop.hb...@gmail.com> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Hi Yu Li
>>>>>>>>>>>>> Good to see that the off heap work helps you. The perf numbers look
>>>>>>>>>>>>> great. So this is a comparison of on heap L1 cache vs off heap L2
>>>>>>>>>>>>> cache (HBASE-11425 enabled). So for 2.0 we should make the L2 off
>>>>>>>>>>>>> heap cache ON by default, I believe. I will raise a JIRA for that so
>>>>>>>>>>>>> we can discuss it there. L2 off heap cache for data blocks and L1
>>>>>>>>>>>>> cache for index blocks seems the right choice.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thanks for the backport and the help in testing the feature. You
>>>>>>>>>>>>> were able to find some corner case bugs and helped the community to
>>>>>>>>>>>>> fix them. Thanks go to your whole team.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> -Anoop-
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Fri, Nov 18, 2016 at 10:14 PM, Yu Li <car...@gmail.com> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Sorry guys, let me retry the inline images:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Performance w/o offheap:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Performance w/ offheap:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Peak Get QPS of one single RS during Singles' Day (11/11):
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> And attaching the files in case inline is still not working:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Performance_without_offheap.png
>>>>>>>>>>>>>> <https://drive.google.com/file/d/0B017Q40_F5uwbWEzUGktYVIya3JkcXVjRkFvVGNtM0VxWC1n/view?usp=drive_web>
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Performance_with_offheap.png
>>>>>>>>>>>>>> <https://drive.google.com/file/d/0B017Q40_F5uweGR2cnJEU0M1MWwtRFJ5YkxUeFVrcUdPc2ww/view?usp=drive_web>
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Peak_Get_QPS_of_Single_RS.png
>>>>>>>>>>>>>> <https://drive.google.com/file/d/0B017Q40_F5uwQ2FkR2k0ZmEtRVNGSFp5RUxHM3F6bHpNYnJz/view?usp=drive_web>
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>> Yu
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On 18 November 2016 at 19:29, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Yu:
>>>>>>>>>>>>>>> With positive results, more hbase users would be asking for the
>>>>>>>>>>>>>>> backport of offheap read path patches.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Do you think you or your coworkers have the bandwidth to publish a
>>>>>>>>>>>>>>> backport for branch-1?
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Nov 18, 2016, at 12:11 AM, Yu Li <car...@gmail.com> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Dear all,
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> We have backported read path offheap (HBASE-11425) to our customized
>>>>>>>>>>>>>>>> hbase-1.1.2 (thanks @Anoop for the help/support) and have run it
>>>>>>>>>>>>>>>> online for more than a month, and would like to share our
>>>>>>>>>>>>>>>> experience, for what it's worth (smile).
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Generally speaking, we gained a better and more stable
>>>>>>>>>>>>>>>> throughput/performance with offheap, and below are some details:
>>>>>>>>>>>>>>>> 1. QPS becomes more stable with offheap
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Performance w/o offheap:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Performance w/ offheap:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> These data come from our online A/B test cluster (450 physical
>>>>>>>>>>>>>>>> machines, each with 256G memory + 64 cores) with real world
>>>>>>>>>>>>>>>> workloads. They show that with offheap we gain a more stable
>>>>>>>>>>>>>>>> throughput as well as better performance.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Fully online data is not shown here because for online we published
>>>>>>>>>>>>>>>> the version with both offheap and NettyRpcServer together, so there
>>>>>>>>>>>>>>>> is no standalone comparison data for offheap.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 2. Full GC frequency and cost
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Average Full GC STW time reduced from 11s to 7s with offheap.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 3. Young GC frequency and cost
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> No performance degradation observed with offheap.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 4. Peak throughput of one single RS
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Singles' Day (11/11), peak throughput of one single RS reached
>>>>>>>>>>>>>>>> 100K QPS, of which 90K came from Get. Together with the network
>>>>>>>>>>>>>>>> in/out data, we know the average result size of a Get request is
>>>>>>>>>>>>>>>> ~1KB.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Offheap is used on all online machines (more than 1600 nodes)
>>>>>>>>>>>>>>>> instead of LruCache, so the above QPS is gained from the offheap
>>>>>>>>>>>>>>>> bucketcache, along with NettyRpcServer (HBASE-15756).
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Just let us know if you have any comments. Thanks.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>>>> Yu
>>>>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>>
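
For anyone wanting to sanity-check the figures quoted in this thread, here is a rough back-of-envelope sketch in Python. It uses only the round numbers from the emails above (20M QPS on the 450-node A/B cluster, 90K Get QPS at ~1KB on one RS, Full GC STW going from 11s to 7s, and the 40K figure cited for Google). The function names are illustrative, the even spread across nodes is an assumption the thread itself contradicts (there is a "hot spot"), and ~1KB is taken as 1024 bytes purely for the arithmetic.

```python
# Back-of-envelope checks on the round numbers quoted in this thread.
# Function and variable names are illustrative, not from any HBase tooling.


def per_node_qps(cluster_qps: float, nodes: int) -> float:
    """Average QPS per node, assuming (unrealistically) an even spread."""
    return cluster_qps / nodes


def payload_mb_per_sec(qps: float, avg_result_bytes: float) -> float:
    """Result payload in decimal MB/s for a given request rate."""
    return qps * avg_result_bytes / 1_000_000


def relative_reduction(before: float, after: float) -> float:
    """Fractional drop when going from `before` to `after`."""
    return (before - after) / before


# 20M QPS over the 450-node A/B test cluster.
ab_cluster_per_node = per_node_qps(20_000_000, 450)

# 90K Get/s on one RS; "~1KB" is assumed to mean 1024 bytes here.
peak_get_payload = payload_mb_per_sec(90_000, 1024)

# Average Full GC stop-the-world time: 11s without offheap, 7s with.
full_gc_drop = relative_reduction(11, 7)

# Ratio of the A/B cluster QPS to the 40K web-query figure cited in the thread.
ratio_to_quoted_web_qps = 20_000_000 / 40_000

print(f"A/B cluster average per node: {ab_cluster_per_node:,.0f} QPS")
print(f"Peak Get payload on one RS:   {peak_get_payload:.2f} MB/s")
print(f"Full GC STW reduction:        {full_gc_drop:.0%}")
print(f"Backend vs quoted web QPS:    {ratio_to_quoted_web_qps:.0f}x")
```

Under these assumptions the A/B cluster averages roughly 44K QPS per node, which is the same order of magnitude as the 100K single-RS peak reported for 11/11, and the Get traffic on that RS works out to roughly 92 MB/s of result payload.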
