> I hope we could strengthen our faith in HBase capability

Us too. Would you be interested in taking the metrics, and the discussion of them that came out in this thread, and turning them into a post for the HBase project blog (https://blogs.apache.org/hbase)? As you can see from the other blog entries, details about the use case need not reveal proprietary information; readers would be most interested in the metrics you observed/achieved on 11/11, followed by a technical discussion of how (roughly) to replicate them. You have a good command of the English language so that won't be a problem, and in any case I offer my services as editor should you like to try. Think about it. This would be a great post; very popular, I am sure.
> On Nov 22, 2016, at 12:51 AM, Yu Li <car...@gmail.com> wrote:
>
> bq. If it were not "confidential" might you mention why there is such a large (several orders of magnitude) explosion of end user queries to backend ones?
>
> For the index building and online machine learning systems, more information is recorded after each visit/trade, such as user query/click history, item stock updates, etc., and multiple pieces of user-specific feature data are read/updated for better recommendations. The flow is pretty much like:
>
> user visits some items
> -> puts them into the shopping cart
> -> checks out / removes items from the shopping cart
> -> item stock updates / new items are recommended to the user
> -> user visits the new items
>
> Not much more detail can be supplied, but I believe we can imagine how many queries/updates such loops generate at the backend, right? (smile)
>
> Thanks again for the interest and the questions, even if a bit of a derail of the thread, and I hope we could strengthen our faith in HBase capability after these discussions. :-)
>
> Best Regards,
> Yu
>
>> On 21 November 2016 at 01:26, Stephen Boesch <java...@gmail.com> wrote:
>>
>> Thanks Yu - given your apparent direct knowledge of the data, that is helpful (my response earlier had been to 张铎). It is important so as to ensure we inform colleagues of numbers that are "real".
>>
>> If it were not "confidential" might you mention why there is such a large (several orders of magnitude) explosion of end user queries to backend ones?
>>
>>> 2016-11-20 7:51 GMT-08:00 Yu Li <car...@gmail.com>:
>>>
>>> Thanks everyone for the feedback/comments; glad this data means something and has drawn your interest. Let me answer the questions (and sorry for the lag).
>>>
>>> For the backport patches, ours are based on a customized 1.1.2 version and cannot be applied directly to any 1.x branch. It would be easy for us to upload the existing patches somewhere, but obviously not that useful... so maybe we should still get them into branch-1 and officially support read-path offheap in a future 1.x release? Let me create a JIRA about this and let's discuss it in the JIRA system. And to be very clear, it's a big YES to sharing our patches with all rather than only the numbers; the question is just which way is better (smile).
>>>
>>> And answers for @Stephen Boesch:
>>>
>>> bq. In any case the data is marked as 9/25/16 not 11/11/16
>>> It is specifically noted that the 9/25 data are from our online A/B test cluster. We are not showing fully online data because we published offheap together with NettyRpcServer online, thus there is no standalone comparison data for offheap. Please check my original email more carefully (smile).
>>>
>>> bq. Repeating my earlier question: 20*Meg* queries per second?? Just checked and *google* does 40*K* queries per second.
>>> As you already noticed, the 20M QPS is the number from the A/B testing cluster (450 nodes), and there is much more on the 11/11 online cluster (1600+ nodes). Please note that this is NOT a cluster that directly serves queries from end users; it serves the index building and online machine learning systems. Refer to our talk at HBaseCon 2016 (slides <http://www.slideshare.net/HBaseCon/improvements-to-apache-hbase-and-its-applications-in-alibaba-search> / recording <https://www.youtube.com/watch?v=UVGDd2JeIMg&list=PLe-h9HrA9qfDVOeNh1l_T5HvwvkO9raWy&index=10>) for more details, if you're interested. And different from Google, there is an obvious "hot spot" for us, so I don't think the QPS of these two different systems are comparable.
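To give a rough sense of how the end-user-to-backend amplification described above can reach a number like 20M QPS, here is a back-of-the-envelope sketch; every figure in it is an illustrative assumption, not a number from Alibaba's systems:

    assumed end-user actions at peak:     ~400K actions/sec
    steps per action (the loop above):    ~5   (visit, cart, checkout, stock update, recommendation)
    feature/index reads+writes per step:  ~10
    backend load ~= 400K x 5 x 10 = 20M queries/sec

The point is not the particular numbers but that two modest per-action factors are enough to multiply end-user traffic by several orders of magnitude.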
>>>
>>> bq. So maybe please check your numbers again.
>>> The numbers come from our online monitoring system and are all real, not fake, so there is no need to check. Maybe they just need some more time to take in and understand? (smile)
>>>
>>> Best Regards,
>>> Yu
>>>
>>>> On 20 November 2016 at 23:03, Stephen Boesch <java...@gmail.com> wrote:
>>>>
>>>> Your arguments do not reflect direct knowledge of the numbers. (a) There is no super-spikiness in the graphs in the data. (b) In any case the data is marked as 9/25/16, not 11/11/16. (c) The number of internet users says little about the number of *concurrent* users.
>>>>
>>>> Overall it would be helpful for those who actually collected the data to comment - not just speculation from someone who did not. As I had mentioned already, there *may* be a huge fanout from the number of user/application queries to the backend: but *huge* it would seemingly need to be to generate the numbers shown.
>>>>
>>>>> 2016-11-19 22:39 GMT-08:00 张铎 <palomino...@gmail.com>:
>>>>>
>>>>> 11.11 is something like Black Friday. Almost every item on Alibaba is discounted heavily on 11.11. Alibaba earned 1 billion in revenue within 1 minute (52 seconds) and 10 billion within 7 minutes (6 minutes 58 seconds) on 11.11. Chinese consumers paid more than 120 billion Chinese yuan to Alibaba on 11.11. And I remember Jeff Dean once gave a slide deck showing that at Google the amplification from user queries to storage-system queries is also very large (I cannot remember the exact number; the slides were used to explain why hedged reads are very useful for reducing latency). So I think the peak throughput is true.
>>>>>
>>>>> There are more than 600 million people in China who use the internet. So if they decide to do something to your system at the same time, it looks like a DDoS to your system...
>>>>>
>>>>> Thanks.
>>>>>
>>>>>> 2016-11-20 12:56 GMT+08:00 Stephen Boesch <java...@gmail.com>:
>>>>>>
>>>>>> Repeating my earlier question: 20*Meg* queries per second?? Just checked and *google* does 40*K* queries per second. Now maybe the "queries" are a decomposition of far fewer end-user queries that cause a fanout of backend queries. *But still...*
>>>>>>
>>>>>> So maybe please check your numbers again.
>>>>>>
>>>>>>> 2016-11-19 17:05 GMT-08:00 Heng Chen <heng.chen.1...@gmail.com>:
>>>>>>>
>>>>>>> The performance looks great!
>>>>>>>
>>>>>>>> 2016-11-19 18:03 GMT+08:00 Ted Yu <yuzhih...@gmail.com>:
>>>>>>>>
>>>>>>>> Opening a JIRA would be fine. This makes it easier for people to obtain the patch(es).
>>>>>>>>
>>>>>>>> Cheers
>>>>>>>>
>>>>>>>>> On Nov 18, 2016, at 11:35 PM, Anoop John <anoop.hb...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> Because of some compatibility issues, we decided that this will be done in 2.0 only. Yeah, as Andy said, it would be great to share the 1.x backported patches. Is it a mega patch at your end, or issue-by-issue patches? The latter would be best. Please share the patches somewhere along with a list of the issues backported. I can help with verifying the issues so as to make sure we don't miss any...
>>>>>>>>>
>>>>>>>>> -Anoop-
>>>>>>>>>
>>>>>>>>>> On Sat, Nov 19, 2016 at 12:32 AM, Enis Söztutar <enis....@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>> Thanks for sharing this. Great work.
>>>>>>>>>>
>>>>>>>>>> I don't see any reason why we cannot backport to branch-1.
>>>>>>>>>>
>>>>>>>>>> Enis
>>>>>>>>>>
>>>>>>>>>>> On Fri, Nov 18, 2016 at 9:37 AM, Andrew Purtell <andrew.purt...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Yes, please, the patches will be useful to the community even if we decide not to backport into an official 1.x release.
>>>>>>>>>>>
>>>>>>>>>>>> On Nov 18, 2016, at 12:25 PM, Bryan Beaudreault <bbeaudrea...@hubspot.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Is the backported patch available anywhere? I am not seeing it on the referenced JIRA. If it ends up not getting officially backported to branch-1 because 2.0 is around the corner, some of us who build our own deploys may want to integrate it into our builds. Thanks! These numbers look great.
>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Nov 18, 2016 at 12:20 PM Anoop John <anoop.hb...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Yu Li,
>>>>>>>>>>>>> Good to see that the off-heap work helps you. The perf numbers look great. So this is a comparison of the on-heap L1 cache vs the off-heap L2 cache (HBASE-11425 enabled). For 2.0, then, I believe we should make the off-heap L2 cache ON by default; I will raise a JIRA for that and we can discuss under it. An off-heap L2 cache for the data blocks and the L1 cache for the index blocks seems the right choice.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks for the backport and the help in testing the feature. You were able to find some corner-case bugs and helped the community fix them. Thanks go to your whole team.
>>>>>>>>>>>>>
>>>>>>>>>>>>> -Anoop-
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Nov 18, 2016 at 10:14 PM, Yu Li <car...@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Sorry guys, let me retry the inline images:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Performance w/o offheap:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Performance w/ offheap:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Peak Get QPS of one single RS during Singles' Day (11/11):
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> And attaching the files in case inline is still not working:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Performance_without_offheap.png
>>>>>>>>>>>>>> <https://drive.google.com/file/d/0B017Q40_F5uwbWEzUGktYVIya3JkcXVjRkFvVGNtM0VxWC1n/view?usp=drive_web>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Performance_with_offheap.png
>>>>>>>>>>>>>> <https://drive.google.com/file/d/0B017Q40_F5uweGR2cnJEU0M1MWwtRFJ5YkxUeFVrcUdPc2ww/view?usp=drive_web>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Peak_Get_QPS_of_Single_RS.png
>>>>>>>>>>>>>> <https://drive.google.com/file/d/0B017Q40_F5uwQ2FkR2k0ZmEtRVNGSFp5RUxHM3F6bHpNYnJz/view?usp=drive_web>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>> Yu
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 18 November 2016 at 19:29, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Yu:
>>>>>>>>>>>>>>> With positive results, more HBase users will be asking for the backport of the offheap read path patches.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Do you think you or a coworker has the bandwidth to publish the backport for branch-1?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Nov 18, 2016, at 12:11 AM, Yu Li <car...@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Dear all,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> We have backported the read-path offheap work (HBASE-11425) to our customized hbase-1.1.2 (thanks @Anoop for the help/support) and have run it online for more than a month, and we would like to share our experience, for what it's worth (smile).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Generally speaking, we gained better and more stable throughput/performance with offheap; below are some details:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 1. QPS becomes more stable with offheap
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Performance w/o offheap:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Performance w/ offheap:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> These data come from our online A/B test cluster (450 physical machines, each with 256G memory + 64 cores) under real-world workloads; they show that with offheap we get a more stable throughput as well as better performance.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> We are not showing fully online data here because online we published the version with both offheap and NettyRpcServer together, so there is no standalone comparison data for offheap.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2. Full GC frequency and cost
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Average Full GC STW time reduced from 11s to 7s with offheap.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 3. Young GC frequency and cost
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> No performance degradation observed with offheap.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 4. Peak throughput of one single RS
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Singles' Day (11/11), the peak throughput of one single RS reached 100K QPS, among which 90K came from Get. Combined with the network in/out data, we can tell that the average result size of a Get request is ~1KB.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Offheap is used on all online machines (more than 1600 nodes) instead of LruCache, so the above QPS comes from the offheap bucket cache, along with NettyRpcServer (HBASE-15756).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Just let us know if you have any comments. Thanks.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>>>> Yu
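For readers following the arithmetic in item 4 above: the ~1KB figure falls out of dividing read egress bandwidth by Get throughput. A sketch using only the numbers quoted in the thread (the bandwidth value is the implied one, not a reported measurement):

    avg result size ~= (network out, bytes/sec) / (Get QPS)
    90K Gets/sec x ~1KB/result ~= 90 MB/sec, i.e. roughly 0.72 Gbit/sec of read egress per RS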
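And for readers who want to try a similar setup, here is a minimal sketch of the relevant hbase-site.xml settings for the configuration the thread describes: an off-heap L2 BucketCache holding data blocks, with index/bloom blocks left in the on-heap L1 cache. This assumes a build that carries HBASE-11425 (read-path offheap); the sizes are illustrative, not the values used on Alibaba's clusters:

    <!-- Keep the L2 BucketCache off-heap; data blocks are cached here -->
    <property>
      <name>hbase.bucketcache.ioengine</name>
      <value>offheap</value>
    </property>

    <!-- BucketCache capacity in MB (illustrative: 16 GB) -->
    <property>
      <name>hbase.bucketcache.size</name>
      <value>16384</value>
    </property>

    <!-- Combined mode (the default): index/bloom blocks stay in the on-heap L1 LRU cache -->
    <property>
      <name>hbase.bucketcache.combinedcache.enabled</name>
      <value>true</value>
    </property>

The direct-memory ceiling also has to be raised to cover the cache, e.g. HBASE_OFFHEAPSIZE=20G in hbase-env.sh. NettyRpcServer (HBASE-15756) can be selected via hbase.rpc.server.impl on branches where it is not already the default; check the documentation for your version, since the defaults changed around 2.0.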