Can y'all move discussion of the off heaping work (or perf feature dev generally) to a new thread?
--
Sean

On Jul 20, 2015 6:44 AM, "ramkrishna vasudevan" <ramkrishna.s.vasude...@gmail.com> wrote:

> Hi Andy
>
> Based on our POCs done, we expect around 20% improvement in latency. For
> scans it will be little lesser than 20%.
>
> Regards
> Ram
>
> On Sun, Jul 19, 2015 at 10:20 AM, Andrew Purtell <andrew.purt...@gmail.com>
> wrote:
>
> > Hi Ram,
> >
> > Do you have any targets for what you are measuring? What are the goals you
> > guys are working toward with the off heaping changes?
> >
> > > On Jul 18, 2015, at 9:16 PM, ramkrishna vasudevan <ramkrishna.s.vasude...@gmail.com> wrote:
> > >
> > > Thanks Vladimir.
> > > Yeah, the reports that were attached specifically captured the 95/99th
> > > percentile.
> > > The reason for checking the server side perf was to specifically see the
> > > improvement in the server side and also the client was sending large
> > > results in multiple threads. So wanted to avoid the n/w interference. I
> > > think it was a general practice that we were following.
> > > We will do some more tests and get some latest readings with bigger data
> > > sets.
> > > Sent from mobile.
> > >> On Jul 19, 2015 1:05 AM, "Andrew Purtell" <andrew.purt...@gmail.com> wrote:
> > >>
> > >> +1
> > >>
> > >> Yeah, something like that, with aspirational targets for improvement from
> > >> current releases. Then what to measure, the tests to run, and criteria for
> > >> evaluation are clear and organized and we're able to better assess how the
> > >> work in progress is meeting its goals (or not)
> > >>
> > >> On Jul 18, 2015, at 12:05 PM, Vladimir Rodionov <vladrodio...@gmail.com>
> > >> wrote:
> > >>
> > >>>>> Umbrella jira to make sure we can have blocks cached in offheap backed
> > >>> cache. In the entire read path, we can refer to this offheap buffer and
> > >>> avoid onheap copying.
> > >>>
> > >>> I think, on a read path, the most important improvement we could imagine is
> > >>> elimination or reducing of object creations (KVs, iterators etc).
> > >>> object reuse, byte buffers reuse or offheap buffers reuse, API change etc.
> > >>> If this is a part of this JIRA, then I would easily define a goal:
> > >>> improving 95/99% latency of a read operations. Not performance, but latency
> > >>> matters
> > >>>
> > >>> -Vlad
> > >>>
> > >>> On Sat, Jul 18, 2015 at 11:24 AM, Andrew Purtell <andrew.purt...@gmail.com>
> > >>> wrote:
> > >>>
> > >>>> That's not a realistic or useful test scenario, unless the goal is to
> > >>>> accelerate queries where all cells are filtered at the server.
> > >>>>
> > >>>>> On Jul 18, 2015, at 11:02 AM, Anoop John <anoop.hb...@gmail.com> wrote:
> > >>>>>
> > >>>>> No Andy. 11425 having doc attached to it. At the end of it, we have added
> > >>>>> perf numbers in a cluster testing. This was done using PE get and scan
> > >>>>> tests with filtering all cells at server (to not consider n/w bandwidth
> > >>>>> constraints)
> > >>>>>
> > >>>>> -Anoop-
> > >>>>>
> > >>>>> On Sat, Jul 18, 2015 at 9:30 PM, Andrew Purtell <andrew.purt...@gmail.com>
> > >>>>> wrote:
> > >>>>>
> > >>>>>> We have some microbenchmarks, not evidence of differences seen from a
> > >>>>>> client application. I'm not saying that microbenchmarks are not totally
> > >>>>>> necessary and a great start - they are - but that they don't measure an end
> > >>>>>> goal. Furthermore unless I've missed one somewhere we don't have a JIRA or
> > >>>>>> design doc that states a clear end goal metric like the strawman I threw
> > >>>>>> together in my previous mail.
> > >>>>>> A measurable system level goal and some data
> > >>>>>> from full cluster testing would go a lot further toward letting all of us
> > >>>>>> evaluate the potential and payoff of the work. In the meantime we should
> > >>>>>> probably be assembling these changes on a branch instead of in trunk, for
> > >>>>>> as long as the goal is not clearly defined and the payoff and potential for
> > >>>>>> perf regressions is untested and unknown.
> > >>>>>>
> > >>>>>>> On Jul 18, 2015, at 8:05 AM, Anoop John <anoop.hb...@gmail.com> wrote:
> > >>>>>>>
> > >>>>>>> Thanks Andy and Lars. The parent jira has doc attached which contains some
> > >>>>>>> perf gain numbers.. We will be doing more tests in next 2 weeks (before
> > >>>>>>> end of this month) and will publish them. Yes it will be great if it is
> > >>>>>>> more IST friendly time :-)
> > >>>>>>>
> > >>>>>>> -Anoop-
> > >>>>>>>
> > >>>>>>> On Fri, Jul 17, 2015 at 9:44 PM, Andrew Purtell <andrew.purt...@gmail.com>
> > >>>>>>> wrote:
> > >>>>>>>
> > >>>>>>>>> I can represent your side Ram (and Anoop). I've been known always argue
> > >>>>>>>> both side of a discussion and to never take sides easily (drives some folks
> > >>>>>>>> crazy).
> > >>>>>>>>
> > >>>>>>>> I can vouch for this (smile)
> > >>>>>>>>
> > >>>>>>>> I also can offer support for off heaping there. At the same time we do
> > >>>>>>>> have a gap where we can't point to a timeline of improvements (yet, anyway)
> > >>>>>>>> with benchmarks showing gains where your goals need them.
> > >>>>>>>> For example,
> > >>>>>>>> stock HBase in one JVM can address max N GB for response time distribution
> > >>>>>>>> D; dev version of HBase in off heap branch can address max N' GB for
> > >>>>>>>> distribution D', where N' > N and D > D' (distribution D' statistically
> > >>>>>>>> shows better/lower response times).
> > >>>>>>>>
> > >>>>>>>>> On Jul 17, 2015, at 6:56 AM, lars hofhansl <la...@apache.org> wrote:
> > >>>>>>>>>
> > >>>>>>>>> I'm in favor of anything that improves performance (and preferably
> > >>>>>>>> doesn't set us back into a world that's worse than C due to the lack of
> > >>>>>>>> pointers in Java). Never said "I don't like it", it's just that I'm perhaps
> > >>>>>>>> asking for more numbers and justification in weighing the pros and cons.
> > >>>>>>>>> I can represent your side Ram (and Anoop). I've been known always argue
> > >>>>>>>> both side of a discussion and to never take sides easily (drives some folks
> > >>>>>>>> crazy). And Stack's there too, he yell at me where needed :)
> > >>>>>>>>>
> > >>>>>>>>> Perhaps we can do it a bit later in the evening so there is a fighting
> > >>>>>>>> chance that folks on IST can participate. I know that some of our folks on
> > >>>>>>>> IST would love to participate in the backup discussion).
> > >>>>>>>>>
> > >>>>>>>>> Like Enis, I'm also happy to host. We're in Downtown SF. I'd just need
> > >>>>>>>> an approx. number of folks.
> > >>>>>>>>>
> > >>>>>>>>> -- Lars
> > >>>>>>>>>
> > >>>>>>>>> From: ramkrishna vasudevan <ramkrishna.s.vasude...@gmail.com>
> > >>>>>>>>> To: "dev@hbase.apache.org" <dev@hbase.apache.org>; lars hofhansl <la...@apache.org>
> > >>>>>>>>> Sent: Wednesday, July 15, 2015 10:10 AM
> > >>>>>>>>> Subject: Re: DISCUSSION: lets do a developer workshop on near-term work
> > >>>>>>>>>
> > >>>>>>>>> Hi
> > >>>>>>>>> What time will it be on August 26th?
> > >>>>>>>>> @Lars
> > >>>>>>>>> Ya. I know that you are not generally in favour of this offheaping
> > >>>>>>>> stuff. May be if we (from India) can attend this meeting remotely your
> > >>>>>>>> thoughts can be discussed and also the current state of this work.
> > >>>>>>>>> Regards
> > >>>>>>>>> Ram
> > >>>>>>>>>
> > >>>>>>>>> On Wed, Jul 15, 2015 at 9:28 PM, lars hofhansl <la...@apache.org> wrote:
> > >>>>>>>>>
> > >>>>>>>>> Works for me. I'll be back in the Bay Area the week of August 9th.
> > >>>>>>>>> We have done a _lot_ of work on backups as well - ours are more
> > >>>>>>>> complicated as we wanted fast per-tenant restores, so data is "grouped" by
> > >>>>>>>> tenant. Would like to sync up on that (hopefully some of the folks who
> > >>>>>>>> wrote most of the code will be in town, I'll check).
> > >>>>>>>>>
> > >>>>>>>>> Also interested in the "Time" and "offheap" parts (although you folks
> > >>>>>>>> usually do not like what I think about the offheap efforts :) ).
> > >>>>>>>>> Would like to add the following topics:
> > >>>>>>>>>
> > >>>>>>>>> - "Timestamp Resolution". Or making space for more bits in the
> > >>>>>>>> timestamps (happy to cover that, unless it's part of the "Time" topic)
> > >>>>>>>>>
> > >>>>>>>>> - "Replication".
> > >>>>>>>>> We found that replication cannot keep up with high
> > >>>>>>>> write loads, due to the fact that replicated is strictly single threaded
> > >>>>>>>> per regionserver (even though we have multiple region servers on the sink
> > >>>>>>>> side)
> > >>>>>>>>>
> > >>>>>>>>> - "Spark integration" (Ted Malaska?)
> > >>>>>>>>>
> > >>>>>>>>> OK... Out now to make a "bullshit hat".
> > >>>>>>>>>
> > >>>>>>>>> -- Lars
> > >>>>>>>>>
> > >>>>>>>>> ________________________________
> > >>>>>>>>> From: Sean Busbey <bus...@cloudera.com>
> > >>>>>>>>> To: dev <dev@hbase.apache.org>
> > >>>>>>>>> Sent: Tuesday, July 14, 2015 7:11 PM
> > >>>>>>>>> Subject: Re: DISCUSSION: lets do a developer workshop on near-term work
> > >>>>>>>>>
> > >>>>>>>>> I'm planning to be in the Bay area the week of the 24th of August.
> > >>>>>>>>>
> > >>>>>>>>> --
> > >>>>>>>>> Sean
> > >>>>>>>>>
> > >>>>>>>>>> On Jul 14, 2015 7:53 PM, "Andrew Purtell" <apurt...@apache.org> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>> I can be up in your area in August.
> > >>>>>>>>>>
> > >>>>>>>>>>>> On Tue, Jul 14, 2015 at 5:31 PM, Stack <st...@duboce.net> wrote:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> On Tue, Jul 14, 2015 at 3:39 PM, Enis Söztutar <enis....@gmail.com>
> > >>>>>>>>>>> wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>>> Sounds good. It has been a while we did the talk-aton.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> I'll be off starting 25 of July, so I prefer something next week if
> > >>>>>>>>>>>> possible.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> You ever coming back? If so, when? I'm back on 10th of August (Mikhail on
> > >>>>>>>>>>> the 20th).
> > >>>>>>>>>>> St.Ack
> > >>>>>>>>>>>
> > >>>>>>>>>>>> Enis
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>> On Tue, Jul 14, 2015 at 3:18 PM, Stack <st...@duboce.net> wrote:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Matteo and I were thinking it time devs got together for a pow-wow. There
> > >>>>>>>>>>>>> is a bunch of stuff in flight at the moment (see below list) and it would
> > >>>>>>>>>>>>> be good to meet and whiteboard, surface goodo ideas that have gone dormant
> > >>>>>>>>>>>>> in JIRA, or revisit designs/proposals out in JIRA-attached google doc that
> > >>>>>>>>>>>>> need socializing.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> You can only come if you are wearing your bullshit hat.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Topics we'd go over could include:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> + Our filesystem layout will not work if 1M regions (Matteo/Stack)
> > >>>>>>>>>>>>> + Current state of the offheaping of read path and alternate KeyValue
> > >>>>>>>>>>>>> implementation (Anoop/Ram)
> > >>>>>>>>>>>>> + Append rejigger (Elliott)
> > >>>>>>>>>>>>> + A Pv2-based Assign (Matteo/Steven)
> > >>>>>>>>>>>>> + Splitting meta/1M regions
> > >>>>>>>>>>>>> + The revived Backup (Vladimir)
> > >>>>>>>>>>>>> + Time (Enis)
> > >>>>>>>>>>>>> + The overloaded SequenceId (Stack)
> > >>>>>>>>>>>>> + Upstreaming IT testing (Dima/Sean)
> > >>>>>>>>>>>>> + hbase-2.0.0
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> I put names by folks I know could talk to the topic. If you want to take
> > >>>>>>>>>>>>> over a topic or put your name by one, just say. Suggest that discussion
> > >>>>>>>>>>>>> lead off with a 5-10minute on current state of
> > >>>>>>>>>>>>> thought/design/implementation.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> What do others think?
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> What date would suit folks?
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Anyone want to host?
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Thanks,
> > >>>>>>>>>>>>> Matteo and St.Ack
> > >>>>>>>>>>
> > >>>>>>>>>> --
> > >>>>>>>>>> Best regards,
> > >>>>>>>>>>
> > >>>>>>>>>> - Andy
> > >>>>>>>>>>
> > >>>>>>>>>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> > >>>>>>>>>> (via Tom White)
> > >>
> > >