Hi Andy,

Based on the POCs we have done, we expect around a 20% improvement in latency. For scans it will be a little less than 20%.
Regards
Ram

On Sun, Jul 19, 2015 at 10:20 AM, Andrew Purtell <andrew.purt...@gmail.com> wrote:

> Hi Ram,
>
> Do you have any targets for what you are measuring? What are the goals you guys are working toward with the off heaping changes?
>
> > On Jul 18, 2015, at 9:16 PM, ramkrishna vasudevan <ramkrishna.s.vasude...@gmail.com> wrote:
> >
> > Thanks Vladimir.
> > Yeah, the reports that were attached specifically captured the 95/99th percentile.
> > The reason for checking the server side perf was to specifically see the improvement in the server side and also the client was sending large results in multiple threads. So wanted to avoid the n/w interference. I think it was a general practice that we were following.
> > We will do some more tests and get some latest readings with bigger data sets.
> > Sent from mobile.
> >> On Jul 19, 2015 1:05 AM, "Andrew Purtell" <andrew.purt...@gmail.com> wrote:
> >>
> >> +1
> >>
> >> Yeah, something like that, with aspirational targets for improvement from current releases. Then what to measure, the tests to run, and criteria for evaluation are clear and organized and we're able to better assess how the work in progress is meeting its goals (or not)
> >>
> >> On Jul 18, 2015, at 12:05 PM, Vladimir Rodionov <vladrodio...@gmail.com> wrote:
> >>
> >>>>> Umbrella jira to make sure we can have blocks cached in offheap backed cache. In the entire read path, we can refer to this offheap buffer and avoid onheap copying.
> >>>
> >>> I think, on a read path, the most important improvement we could imagine is elimination or reducing of object creations (KVs, iterators etc).
> >>> object reuse, byte buffers reuse or offheap buffers reuse, API change etc.
> >>> If this is a part of this JIRA, then I would easily define a goal: improving 95/99% latency of a read operations.
> >>> Not performance, but latency matters
> >>>
> >>> -Vlad
> >>>
> >>> On Sat, Jul 18, 2015 at 11:24 AM, Andrew Purtell <andrew.purt...@gmail.com> wrote:
> >>>
> >>>> That's not a realistic or useful test scenario, unless the goal is to accelerate queries where all cells are filtered at the server.
> >>>>
> >>>>> On Jul 18, 2015, at 11:02 AM, Anoop John <anoop.hb...@gmail.com> wrote:
> >>>>>
> >>>>> No Andy. 11425 having doc attached to it. At the end of it, we have added perf numbers in a cluster testing. This was done using PE get and scan tests with filtering all cells at server (to not consider n/w bandwidth constraints)
> >>>>>
> >>>>> -Anoop-
> >>>>>
> >>>>> On Sat, Jul 18, 2015 at 9:30 PM, Andrew Purtell <andrew.purt...@gmail.com> wrote:
> >>>>>
> >>>>>> We have some microbenchmarks, not evidence of differences seen from a client application. I'm not saying that microbenchmarks are not totally necessary and a great start - they are - but that they don't measure an end goal. Furthermore unless I've missed one somewhere we don't have a JIRA or design doc that states a clear end goal metric like the strawman I threw together in my previous mail. A measurable system level goal and some data from full cluster testing would go a lot further toward letting all of us evaluate the potential and payoff of the work. In the meantime we should probably be assembling these changes on a branch instead of in trunk, for as long as the goal is not clearly defined and the payoff and potential for perf regressions is untested and unknown.
> >>>>>>
> >>>>>>> On Jul 18, 2015, at 8:05 AM, Anoop John <anoop.hb...@gmail.com> wrote:
> >>>>>>>
> >>>>>>> Thanks Andy and Lars.
> >>>>>>> The parent jira has doc attached which contains some perf gain numbers. We will be doing more tests in next 2 weeks (before end of this month) and will publish them. Yes it will be great if it is more IST friendly time :-)
> >>>>>>>
> >>>>>>> -Anoop-
> >>>>>>>
> >>>>>>> On Fri, Jul 17, 2015 at 9:44 PM, Andrew Purtell <andrew.purt...@gmail.com> wrote:
> >>>>>>>
> >>>>>>>>> I can represent your side Ram (and Anoop). I've been known always argue both side of a discussion and to never take sides easily (drives some folks crazy).
> >>>>>>>>
> >>>>>>>> I can vouch for this (smile)
> >>>>>>>>
> >>>>>>>> I also can offer support for off heaping there. At the same time we do have a gap where we can't point to a timeline of improvements (yet, anyway) with benchmarks showing gains where your goals need them. For example, stock HBase in one JVM can address max N GB for response time distribution D; dev version of HBase in off heap branch can address max N' GB for distribution D', where N' > N and D > D' (distribution D' statistically shows better/lower response times).
> >>>>>>>>
> >>>>>>>>> On Jul 17, 2015, at 6:56 AM, lars hofhansl <la...@apache.org> wrote:
> >>>>>>>>>
> >>>>>>>>> I'm in favor of anything that improves performance (and preferably doesn't set us back into a world that's worse than C due to the lack of pointers in Java). Never said "I don't like it", it's just that I'm perhaps asking for more numbers and justification in weighing the pros and cons.
> >>>>>>>>> I can represent your side Ram (and Anoop).
> >>>>>>>>> I've been known always argue both side of a discussion and to never take sides easily (drives some folks crazy). And Stack's there too, he yell at me where needed :)
> >>>>>>>>>
> >>>>>>>>> Perhaps we can do it a bit later in the evening so there is a fighting chance that folks on IST can participate. I know that some of our folks on IST would love to participate in the backup discussion).
> >>>>>>>>>
> >>>>>>>>> Like Enis, I'm also happy to host. We're in Downtown SF. I'd just need an approx. number of folks.
> >>>>>>>>>
> >>>>>>>>> -- Lars
> >>>>>>>>>
> >>>>>>>>> From: ramkrishna vasudevan <ramkrishna.s.vasude...@gmail.com>
> >>>>>>>>> To: "dev@hbase.apache.org" <dev@hbase.apache.org>; lars hofhansl <la...@apache.org>
> >>>>>>>>> Sent: Wednesday, July 15, 2015 10:10 AM
> >>>>>>>>> Subject: Re: DISCUSSION: lets do a developer workshop on near-term work
> >>>>>>>>>
> >>>>>>>>> Hi
> >>>>>>>>> What time will it be on August 26th?
> >>>>>>>>> @Lars Ya. I know that you are not generally in favour of this offheaping stuff. May be if we (from India) can attend this meeting remotely your thoughts can be discussed and also the current state of this work.
> >>>>>>>>> Regards
> >>>>>>>>> Ram
> >>>>>>>>>
> >>>>>>>>> On Wed, Jul 15, 2015 at 9:28 PM, lars hofhansl <la...@apache.org> wrote:
> >>>>>>>>>
> >>>>>>>>> Works for me. I'll be back in the Bay Area the week of August 9th.
> >>>>>>>>> We have done a _lot_ of work on backups as well - ours are more complicated as we wanted fast per-tenant restores, so data is "grouped" by tenant. Would like to sync up on that (hopefully some of the folks who wrote most of the code will be in town, I'll check).
> >>>>>>>>>
> >>>>>>>>> Also interested in the "Time" and "offheap" parts (although you folks usually do not like what I think about the offheap efforts :) ).
> >>>>>>>>> Would like to add the following topics:
> >>>>>>>>>
> >>>>>>>>> - "Timestamp Resolution". Or making space for more bits in the timestamps (happy to cover that, unless it's part of the "Time" topic)
> >>>>>>>>>
> >>>>>>>>> - "Replication". We found that replication cannot keep up with high write loads, due to the fact that replicated is strictly single threaded per regionserver (even though we have multiple region servers on the sink side)
> >>>>>>>>>
> >>>>>>>>> - "Spark integration" (Ted Malaska?)
> >>>>>>>>>
> >>>>>>>>> OK... Out now to make a "bullshit hat".
> >>>>>>>>>
> >>>>>>>>> -- Lars
> >>>>>>>>>
> >>>>>>>>> ________________________________
> >>>>>>>>> From: Sean Busbey <bus...@cloudera.com>
> >>>>>>>>> To: dev <dev@hbase.apache.org>
> >>>>>>>>> Sent: Tuesday, July 14, 2015 7:11 PM
> >>>>>>>>> Subject: Re: DISCUSSION: lets do a developer workshop on near-term work
> >>>>>>>>>
> >>>>>>>>> I'm planning to be in the Bay area the week of the 24th of August.
> >>>>>>>>>
> >>>>>>>>> --
> >>>>>>>>> Sean
> >>>>>>>>>
> >>>>>>>>>> On Jul 14, 2015 7:53 PM, "Andrew Purtell" <apurt...@apache.org> wrote:
> >>>>>>>>>>
> >>>>>>>>>> I can be up in your area in August.
> >>>>>>>>>>
> >>>>>>>>>>>> On Tue, Jul 14, 2015 at 5:31 PM, Stack <st...@duboce.net> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Tue, Jul 14, 2015 at 3:39 PM, Enis Söztutar <enis....@gmail.com> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Sounds good. It has been a while we did the talk-aton.
> >>>>>>>>>>>>
> >>>>>>>>>>>> I'll be off starting 25 of July, so I prefer something next week if possible.
> >>>>>>>>>>>>
> >>>>>>>>>>>> You ever coming back? If so, when? I'm back on 10th of August (Mikhail on the 20th).
> >>>>>>>>>>> St.Ack
> >>>>>>>>>>>
> >>>>>>>>>>>> Enis
> >>>>>>>>>>>>
> >>>>>>>>>>>>> On Tue, Jul 14, 2015 at 3:18 PM, Stack <st...@duboce.net> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Matteo and I were thinking it time devs got together for a pow-wow. There is a bunch of stuff in flight at the moment (see below list) and it would be good to meet and whiteboard, surface good ideas that have gone dormant in JIRA, or revisit designs/proposals out in JIRA-attached google doc that need socializing.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> You can only come if you are wearing your bullshit hat.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Topics we'd go over could include:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> + Our filesystem layout will not work if 1M regions (Matteo/Stack)
> >>>>>>>>>>>>> + Current state of the offheaping of read path and alternate KeyValue implementation (Anoop/Ram)
> >>>>>>>>>>>>> + Append rejigger (Elliott)
> >>>>>>>>>>>>> + A Pv2-based Assign (Matteo/Steven)
> >>>>>>>>>>>>> + Splitting meta/1M regions
> >>>>>>>>>>>>> + The revived Backup (Vladimir)
> >>>>>>>>>>>>> + Time (Enis)
> >>>>>>>>>>>>> + The overloaded SequenceId (Stack)
> >>>>>>>>>>>>> + Upstreaming IT testing (Dima/Sean)
> >>>>>>>>>>>>> + hbase-2.0.0
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I put names by folks I know could talk to the topic. If you want to take over a topic or put your name by one, just say. Suggest that discussion lead off with a 5-10minute on current state of thought/design/implementation.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> What do others think?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> What date would suit folks?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Anyone want to host?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>> Matteo and St.Ack
> >>>>>>>>>>
> >>>>>>>>>> --
> >>>>>>>>>> Best regards,
> >>>>>>>>>>
> >>>>>>>>>> - Andy
> >>>>>>>>>>
> >>>>>>>>>> Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)