We will be doing some more large-data tests in the coming week, Andy, and will report back. We will also do a write-up on the ways the work might help us. As Sean said, we will continue in another thread if there is anything further. Will write back soon with the test results. Thanks.
-Anoop-

On Mon, Jul 20, 2015 at 9:59 PM, Andrew Purtell <[email protected]> wrote:

> Cool, thanks.
>
> Is a 20% latency reduction the most we can expect or do you think there is room for more improvement? Just curious.
>
> Is latency reduction the only goal? Anything here about supporting larger heaps? Is there something we can measure in that regard?
>
> Hope you see my point and there's enough here to prime a goals and metrics discussion at the pow wow or on the relevant JIRAs.
>
> On Jul 20, 2015, at 4:43 AM, ramkrishna vasudevan <[email protected]> wrote:
>
> > Hi Andy
> >
> > Based on our POCs done, we expect around 20% improvement in latency. For scans it will be little lesser than 20%.
> >
> > Regards
> > Ram
> >
> > On Sun, Jul 19, 2015 at 10:20 AM, Andrew Purtell <[email protected]> wrote:
> >
> >> Hi Ram,
> >>
> >> Do you have any targets for what you are measuring? What are the goals you guys are working toward with the off heaping changes?
> >>
> >>>> On Jul 18, 2015, at 9:16 PM, ramkrishna vasudevan <[email protected]> wrote:
> >>>
> >>> Thanks Vladimir.
> >>> Yeah, the reports that were attached specifically captured the 95/99th percentile.
> >>> The reason for checking the server side perf was to specifically see the improvement in the server side and also the client was sending large results in multiple threads. So wanted to avoid the n/w interference. I think it was a general practice that we were following.
> >>> We will do some more tests and get some latest readings with bigger data sets.
> >>> Sent from mobile.
> >>>> On Jul 19, 2015 1:05 AM, "Andrew Purtell" <[email protected]> wrote:
> >>>>
> >>>> +1
> >>>>
> >>>> Yeah, something like that, with aspirational targets for improvement from current releases.
> >>>> Then what to measure, the tests to run, and criteria for evaluation are clear and organized and we're able to better assess how the work in progress is meeting its goals (or not)
> >>>>
> >>>> On Jul 18, 2015, at 12:05 PM, Vladimir Rodionov <[email protected]> wrote:
> >>>>
> >>>>>>> Umbrella jira to make sure we can have blocks cached in offheap backed cache. In the entire read path, we can refer to this offheap buffer and avoid onheap copying.
> >>>>>
> >>>>> I think, on a read path, the most important improvement we could imagine is elimination or reducing of object creations (KVs, iterators etc). object reuse, byte buffers reuse or offheap buffers reuse, API change etc.
> >>>>> If this is a part of this JIRA, then I would easily define a goal: improving 95/99% latency of a read operations. Not performance, but latency matters
> >>>>>
> >>>>> -Vlad
> >>>>>
> >>>>> On Sat, Jul 18, 2015 at 11:24 AM, Andrew Purtell <[email protected]> wrote:
> >>>>>
> >>>>>> That's not a realistic or useful test scenario, unless the goal is to accelerate queries where all cells are filtered at the server.
> >>>>>>
> >>>>>>> On Jul 18, 2015, at 11:02 AM, Anoop John <[email protected]> wrote:
> >>>>>>>
> >>>>>>> No Andy. 11425 having doc attached to it. At the end of it, we have added perf numbers in a cluster testing. This was done using PE get and scan tests with filtering all cells at server (to not consider n/w bandwidth constraints)
> >>>>>>>
> >>>>>>> -Anoop-
> >>>>>>>
> >>>>>>> On Sat, Jul 18, 2015 at 9:30 PM, Andrew Purtell <[email protected]> wrote:
> >>>>>>>
> >>>>>>>> We have some microbenchmarks, not evidence of differences seen from a client application.
> >>>>>>>> I'm not saying that microbenchmarks are not totally necessary and a great start - they are - but that they don't measure an end goal. Furthermore unless I've missed one somewhere we don't have a JIRA or design doc that states a clear end goal metric like the strawman I threw together in my previous mail. A measurable system level goal and some data from full cluster testing would go a lot further toward letting all of us evaluate the potential and payoff of the work. In the meantime we should probably be assembling these changes on a branch instead of in trunk, for as long as the goal is not clearly defined and the payoff and potential for perf regressions is untested and unknown.
> >>>>>>>>
> >>>>>>>>> On Jul 18, 2015, at 8:05 AM, Anoop John <[email protected]> wrote:
> >>>>>>>>>
> >>>>>>>>> Thanks Andy and Lars. The parent jira has doc attached which contains some perf gain numbers.. We will be doing more tests in next 2 weeks (before end of this month) and will publish them. Yes it will be great if it is more IST friendly time :-)
> >>>>>>>>>
> >>>>>>>>> -Anoop-
> >>>>>>>>>
> >>>>>>>>> On Fri, Jul 17, 2015 at 9:44 PM, Andrew Purtell <[email protected]> wrote:
> >>>>>>>>>
> >>>>>>>>>>> I can represent your side Ram (and Anoop). I've been known always argue both side of a discussion and to never take sides easily (drives some folks crazy).
> >>>>>>>>>>
> >>>>>>>>>> I can vouch for this (smile)
> >>>>>>>>>>
> >>>>>>>>>> I also can offer support for off heaping there.
> >>>>>>>>>> At the same time we do have a gap where we can't point to a timeline of improvements (yet, anyway) with benchmarks showing gains where your goals need them. For example, stock HBase in one JVM can address max N GB for response time distribution D; dev version of HBase in off heap branch can address max N' GB for distribution D', where N' > N and D > D' (distribution D' statistically shows better/lower response times).
> >>>>>>>>>>
> >>>>>>>>>>> On Jul 17, 2015, at 6:56 AM, lars hofhansl <[email protected]> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> I'm in favor of anything that improves performance (and preferably doesn't set us back into a world that's worse than C due to the lack of pointers in Java). Never said "I don't like it", it's just that I'm perhaps asking for more numbers and justification in weighing the pros and cons.
> >>>>>>>>>>> I can represent your side Ram (and Anoop). I've been known always argue both side of a discussion and to never take sides easily (drives some folks crazy). And Stack's there too, he yell at me where needed :)
> >>>>>>>>>>>
> >>>>>>>>>>> Perhaps we can do it a bit later in the evening so there is a fighting chance that folks on IST can participate. I know that some of our folks on IST would love to participate in the backup discussion).
> >>>>>>>>>>>
> >>>>>>>>>>> Like Enis, I'm also happy to host. We're in Downtown SF. I'd just need an approx. number of folks.
> >>>>>>>>>>>
> >>>>>>>>>>> -- Lars
> >>>>>>>>>>>
> >>>>>>>>>>> From: ramkrishna vasudevan <[email protected]>
> >>>>>>>>>>> To: "[email protected]" <[email protected]>; lars hofhansl <[email protected]>
> >>>>>>>>>>> Sent: Wednesday, July 15, 2015 10:10 AM
> >>>>>>>>>>> Subject: Re: DISCUSSION: lets do a developer workshop on near-term work
> >>>>>>>>>>>
> >>>>>>>>>>> Hi
> >>>>>>>>>>> What time will it be on August 26th?
> >>>>>>>>>>> @Lars Ya. I know that you are not generally in favour of this offheaping stuff. May be if we (from India) can attend this meeting remotely your thoughts can be discussed and also the current state of this work.
> >>>>>>>>>>> Regards
> >>>>>>>>>>> Ram
> >>>>>>>>>>>
> >>>>>>>>>>> On Wed, Jul 15, 2015 at 9:28 PM, lars hofhansl <[email protected]> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> Works for me. I'll be back in the Bay Area the week of August 9th.
> >>>>>>>>>>> We have done a _lot_ of work on backups as well - ours are more complicated as we wanted fast per-tenant restores, so data is "grouped" by tenant. Would like to sync up on that (hopefully some of the folks who wrote most of the code will be in town, I'll check).
> >>>>>>>>>>>
> >>>>>>>>>>> Also interested in the "Time" and "offheap" parts (although you folks usually do not like what I think about the offheap efforts :) ).
> >>>>>>>>>>> Would like to add the following topics:
> >>>>>>>>>>>
> >>>>>>>>>>> - "Timestamp Resolution". Or making space for more bits in the timestamps (happy to cover that, unless it's part of the "Time" topic)
> >>>>>>>>>>>
> >>>>>>>>>>> - "Replication".
> >>>>>>>>>>> We found that replication cannot keep up with high write loads, due to the fact that replication is strictly single threaded per regionserver (even though we have multiple region servers on the sink side)
> >>>>>>>>>>>
> >>>>>>>>>>> - "Spark integration" (Ted Malaska?)
> >>>>>>>>>>>
> >>>>>>>>>>> OK... Out now to make a "bullshit hat".
> >>>>>>>>>>>
> >>>>>>>>>>> -- Lars
> >>>>>>>>>>>
> >>>>>>>>>>> ________________________________
> >>>>>>>>>>> From: Sean Busbey <[email protected]>
> >>>>>>>>>>> To: dev <[email protected]>
> >>>>>>>>>>> Sent: Tuesday, July 14, 2015 7:11 PM
> >>>>>>>>>>> Subject: Re: DISCUSSION: lets do a developer workshop on near-term work
> >>>>>>>>>>>
> >>>>>>>>>>> I'm planning to be in the Bay area the week of the 24th of August.
> >>>>>>>>>>>
> >>>>>>>>>>> --
> >>>>>>>>>>> Sean
> >>>>>>>>>>>
> >>>>>>>>>>>> On Jul 14, 2015 7:53 PM, "Andrew Purtell" <[email protected]> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> I can be up in your area in August.
> >>>>>>>>>>>>
> >>>>>>>>>>>>>> On Tue, Jul 14, 2015 at 5:31 PM, Stack <[email protected]> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Tue, Jul 14, 2015 at 3:39 PM, Enis Söztutar <[email protected]> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Sounds good. It has been a while we did the talk-aton.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I'll be off starting 25 of July, so I prefer something next week if possible.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> You ever coming back? If so, when? I'm back on 10th of August (Mikhail on the 20th).
> >>>>>>>>>>>>> St.Ack
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Enis
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Tue, Jul 14, 2015 at 3:18 PM, Stack <[email protected]> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Matteo and I were thinking it time devs got together for a pow-wow. There is a bunch of stuff in flight at the moment (see below list) and it would be good to meet and whiteboard, surface good ideas that have gone dormant in JIRA, or revisit designs/proposals out in JIRA-attached google doc that need socializing.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> You can only come if you are wearing your bullshit hat.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Topics we'd go over could include:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> + Our filesystem layout will not work if 1M regions (Matteo/Stack)
> >>>>>>>>>>>>>>> + Current state of the offheaping of read path and alternate KeyValue implementation (Anoop/Ram)
> >>>>>>>>>>>>>>> + Append rejigger (Elliott)
> >>>>>>>>>>>>>>> + A Pv2-based Assign (Matteo/Steven)
> >>>>>>>>>>>>>>> + Splitting meta/1M regions
> >>>>>>>>>>>>>>> + The revived Backup (Vladimir)
> >>>>>>>>>>>>>>> + Time (Enis)
> >>>>>>>>>>>>>>> + The overloaded SequenceId (Stack)
> >>>>>>>>>>>>>>> + Upstreaming IT testing (Dima/Sean)
> >>>>>>>>>>>>>>> + hbase-2.0.0
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I put names by folks I know could talk to the topic. If you want to take over a topic or put your name by one, just say. Suggest that discussion lead off with a 5-10minute on current state of thought/design/implementation.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> What do others think?
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> What date would suit folks?
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Anyone want to host?
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>> Matteo and St.Ack
> >>>>>>>>>>>>
> >>>>>>>>>>>> --
> >>>>>>>>>>>> Best regards,
> >>>>>>>>>>>>
> >>>>>>>>>>>> - Andy
> >>>>>>>>>>>>
> >>>>>>>>>>>> Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
