Hi Ram,

Do you have any targets for what you are measuring? What are the goals you guys are working toward with the off heaping changes?

> On Jul 18, 2015, at 9:16 PM, ramkrishna vasudevan <[email protected]> wrote:
>
> Thanks Vladimir.
> Yeah, the reports that were attached specifically captured the 95/99th
> percentile.
> The reason for checking the server side perf was to specifically see the
> improvement in the server side and also the client was sending large
> results in multiple threads. So wanted to avoid the n/w interference. I
> think it was a general practice that we were following.
> We will do some more tests and get some latest readings with bigger data
> sets.
> Sent from mobile.
>> On Jul 19, 2015 1:05 AM, "Andrew Purtell" <[email protected]> wrote:
>>
>> +1
>>
>> Yeah, something like that, with aspirational targets for improvement from
>> current releases. Then what to measure, the tests to run, and criteria for
>> evaluation are clear and organized and we're able to better assess how the
>> work in progress is meeting its goals (or not)
>>
>> On Jul 18, 2015, at 12:05 PM, Vladimir Rodionov <[email protected]> wrote:
>>
>>>>> Umbrella jira to make sure we can have blocks cached in offheap backed
>>>>> cache. In the entire read path, we can refer to this offheap buffer and
>>>>> avoid onheap copying.
>>>
>>> I think, on a read path, the most important improvement we could imagine is
>>> elimination or reducing of object creations (KVs, iterators etc).
>>> object reuse, byte buffers reuse or offheap buffers reuse, API change etc.
>>> If this is a part of this JIRA, then I would easily define a goal:
>>> improving 95/99% latency of a read operations. Not performance, but latency
>>> matters
>>>
>>> -Vlad
>>>
>>> On Sat, Jul 18, 2015 at 11:24 AM, Andrew Purtell <[email protected]> wrote:
>>>
>>>> That's not a realistic or useful test scenario, unless the goal is to
>>>> accelerate queries where all cells are filtered at the server.
>>>>
>>>>> On Jul 18, 2015, at 11:02 AM, Anoop John <[email protected]> wrote:
>>>>>
>>>>> No Andy.
>>>>> 11425 having doc attached to it. At the end of it, we have added
>>>>> perf numbers in a cluster testing. This was done using PE get and scan
>>>>> tests with filtering all cells at server (to not consider n/w bandwidth
>>>>> constraints)
>>>>>
>>>>> -Anoop-
>>>>>
>>>>> On Sat, Jul 18, 2015 at 9:30 PM, Andrew Purtell <[email protected]> wrote:
>>>>>
>>>>>> We have some microbenchmarks, not evidence of differences seen from a
>>>>>> client application. I'm not saying that microbenchmarks are not totally
>>>>>> necessary and a great start - they are - but that they don't measure an end
>>>>>> goal. Furthermore unless I've missed one somewhere we don't have a JIRA or
>>>>>> design doc that states a clear end goal metric like the strawman I threw
>>>>>> together in my previous mail. A measurable system level goal and some data
>>>>>> from full cluster testing would go a lot further toward letting all of us
>>>>>> evaluate the potential and payoff of the work. In the meantime we should
>>>>>> probably be assembling these changes on a branch instead of in trunk, for
>>>>>> as long as the goal is not clearly defined and the payoff and potential for
>>>>>> perf regressions is untested and unknown.
>>>>>>
>>>>>>> On Jul 18, 2015, at 8:05 AM, Anoop John <[email protected]> wrote:
>>>>>>>
>>>>>>> Thanks Andy and Lars. The parent jira has doc attached which contains some
>>>>>>> perf gain numbers.. We will be doing more tests in next 2 weeks (before
>>>>>>> end of this month) and will publish them. Yes it will be great if it is
>>>>>>> more IST friendly time :-)
>>>>>>>
>>>>>>> -Anoop-
>>>>>>>
>>>>>>> On Fri, Jul 17, 2015 at 9:44 PM, Andrew Purtell <[email protected]> wrote:
>>>>>>>
>>>>>>>>> I can represent your side Ram (and Anoop).
>>>>>>>>> I've been known always argue both side of a discussion and to never
>>>>>>>>> take sides easily (drives some folks crazy).
>>>>>>>>
>>>>>>>> I can vouch for this (smile)
>>>>>>>>
>>>>>>>> I also can offer support for off heaping there. At the same time we do
>>>>>>>> have a gap where we can't point to a timeline of improvements (yet, anyway)
>>>>>>>> with benchmarks showing gains where your goals need them. For example,
>>>>>>>> stock HBase in one JVM can address max N GB for response time distribution
>>>>>>>> D; dev version of HBase in off heap branch can address max N' GB for
>>>>>>>> distribution D', where N' > N and D > D' (distribution D' statistically
>>>>>>>> shows better/lower response times).
>>>>>>>>
>>>>>>>>> On Jul 17, 2015, at 6:56 AM, lars hofhansl <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> I'm in favor of anything that improves performance (and preferably
>>>>>>>>> doesn't set us back into a world that's worse than C due to the lack of
>>>>>>>>> pointers in Java). Never said "I don't like it", it's just that I'm perhaps
>>>>>>>>> asking for more numbers and justification in weighing the pros and cons.
>>>>>>>>> I can represent your side Ram (and Anoop). I've been known always argue
>>>>>>>>> both side of a discussion and to never take sides easily (drives some folks
>>>>>>>>> crazy). And Stack's there too, he yell at me where needed :)
>>>>>>>>>
>>>>>>>>> Perhaps we can do it a bit later in the evening so there is a fighting
>>>>>>>>> chance that folks on IST can participate. I know that some of our folks on
>>>>>>>>> IST would love to participate in the backup discussion).
>>>>>>>>>
>>>>>>>>> Like Enis, I'm also happy to host. We're in Downtown SF. I'd just need
>>>>>>>>> an approx. number of folks.
>>>>>>>>>
>>>>>>>>> -- Lars
>>>>>>>>>
>>>>>>>>> From: ramkrishna vasudevan <[email protected]>
>>>>>>>>> To: "[email protected]" <[email protected]>; lars hofhansl <[email protected]>
>>>>>>>>> Sent: Wednesday, July 15, 2015 10:10 AM
>>>>>>>>> Subject: Re: DISCUSSION: lets do a developer workshop on near-term work
>>>>>>>>>
>>>>>>>>> Hi
>>>>>>>>> What time will it be on August 26th?
>>>>>>>>> @Lars
>>>>>>>>> Ya. I know that you are not generally in favour of this offheaping
>>>>>>>>> stuff. May be if we (from India) can attend this meeting remotely your
>>>>>>>>> thoughts can be discussed and also the current state of this work.
>>>>>>>>> Regards
>>>>>>>>> Ram
>>>>>>>>>
>>>>>>>>> On Wed, Jul 15, 2015 at 9:28 PM, lars hofhansl <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> Works for me. I'll be back in the Bay Area the week of August 9th.
>>>>>>>>> We have done a _lot_ of work on backups as well - ours are more
>>>>>>>>> complicated as we wanted fast per-tenant restores, so data is "grouped" by
>>>>>>>>> tenant. Would like to sync up on that (hopefully some of the folks who
>>>>>>>>> wrote most of the code will be in town, I'll check).
>>>>>>>>>
>>>>>>>>> Also interested in the "Time" and "offheap" parts (although you folks
>>>>>>>>> usually do not like what I think about the offheap efforts :) ).
>>>>>>>>> Would like to add the following topics:
>>>>>>>>>
>>>>>>>>> - "Timestamp Resolution". Or making space for more bits in the
>>>>>>>>> timestamps (happy to cover that, unless it's part of the "Time" topic)
>>>>>>>>>
>>>>>>>>> - "Replication". We found that replication cannot keep up with high
>>>>>>>>> write loads, due to the fact that replicated is strictly single threaded
>>>>>>>>> per regionserver (even though we have multiple region servers on the sink
>>>>>>>>> side)
>>>>>>>>>
>>>>>>>>> - "Spark integration" (Ted Malaska?)
>>>>>>>>>
>>>>>>>>> OK...
>>>>>>>>> Out now to make a "bullshit hat".
>>>>>>>>>
>>>>>>>>> -- Lars
>>>>>>>>>
>>>>>>>>> ________________________________
>>>>>>>>> From: Sean Busbey <[email protected]>
>>>>>>>>> To: dev <[email protected]>
>>>>>>>>> Sent: Tuesday, July 14, 2015 7:11 PM
>>>>>>>>> Subject: Re: DISCUSSION: lets do a developer workshop on near-term work
>>>>>>>>>
>>>>>>>>> I'm planning to be in the Bay area the week of the 24th of August.
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Sean
>>>>>>>>>
>>>>>>>>>> On Jul 14, 2015 7:53 PM, "Andrew Purtell" <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>> I can be up in your area in August.
>>>>>>>>>>
>>>>>>>>>>>> On Tue, Jul 14, 2015 at 5:31 PM, Stack <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Jul 14, 2015 at 3:39 PM, Enis Söztutar <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Sounds good. It has been a while we did the talk-aton.
>>>>>>>>>>>>
>>>>>>>>>>>> I'll be off starting 25 of July, so I prefer something next week if
>>>>>>>>>>>> possible.
>>>>>>>>>>>>
>>>>>>>>>>>> You ever coming back? If so, when? I'm back on 10th of August (Mikhail on
>>>>>>>>>>>> the 20th).
>>>>>>>>>>> St.Ack
>>>>>>>>>>>
>>>>>>>>>>>> Enis
>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Jul 14, 2015 at 3:18 PM, Stack <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Matteo and I were thinking it time devs got together for a pow-wow. There
>>>>>>>>>>>>> is a bunch of stuff in flight at the moment (see below list) and it would
>>>>>>>>>>>>> be good to meet and whiteboard, surface goodo ideas that have gone dormant
>>>>>>>>>>>>> in JIRA, or revisit designs/proposals out in JIRA-attached google doc that
>>>>>>>>>>>>> need socializing.
>>>>>>>>>>>>>
>>>>>>>>>>>>> You can only come if you are wearing your bullshit hat.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Topics we'd go over could include:
>>>>>>>>>>>>>
>>>>>>>>>>>>> + Our filesystem layout will not work if 1M regions (Matteo/Stack)
>>>>>>>>>>>>> + Current state of the offheaping of read path and alternate KeyValue
>>>>>>>>>>>>> implementation (Anoop/Ram)
>>>>>>>>>>>>> + Append rejigger (Elliott)
>>>>>>>>>>>>> + A Pv2-based Assign (Matteo/Steven)
>>>>>>>>>>>>> + Splitting meta/1M regions
>>>>>>>>>>>>> + The revived Backup (Vladimir)
>>>>>>>>>>>>> + Time (Enis)
>>>>>>>>>>>>> + The overloaded SequenceId (Stack)
>>>>>>>>>>>>> + Upstreaming IT testing (Dima/Sean)
>>>>>>>>>>>>> + hbase-2.0.0
>>>>>>>>>>>>>
>>>>>>>>>>>>> I put names by folks I know could talk to the topic. If you want to take
>>>>>>>>>>>>> over a topic or put your name by one, just say. Suggest that discussion
>>>>>>>>>>>>> lead off with a 5-10minute on current state of
>>>>>>>>>>>>> thought/design/implementation.
>>>>>>>>>>>>>
>>>>>>>>>>>>> What do others think?
>>>>>>>>>>>>>
>>>>>>>>>>>>> What date would suit folks?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Anyone want to host?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Matteo and St.Ack
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Best regards,
>>>>>>>>>>
>>>>>>>>>> - Andy
>>>>>>>>>>
>>>>>>>>>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
>>>>>>>>>> (via Tom White)
