We have some microbenchmarks, but no evidence of differences seen from a client application. I'm not saying microbenchmarks aren't necessary and a great start (they are), only that they don't measure an end goal. Furthermore, unless I've missed one somewhere, we don't have a JIRA or design doc that states a clear end goal metric like the strawman I threw together in my previous mail. A measurable system-level goal and some data from full cluster testing would go a lot further toward letting all of us evaluate the potential and payoff of the work. In the meantime, we should probably be assembling these changes on a branch instead of in trunk, for as long as the goal is not clearly defined and the payoff and the potential for perf regressions are untested and unknown.

> On Jul 18, 2015, at 8:05 AM, Anoop John <[email protected]> wrote:
>
> Thanks Andy and Lars. The parent jira has doc attached which contains some
> perf gain numbers.. We will be doing more tests in next 2 weeks (before
> end of this month) and will publish them. Yes it will be great if it is
> more IST friendly time :-)
>
> -Anoop-
>
> On Fri, Jul 17, 2015 at 9:44 PM, Andrew Purtell <[email protected]>
> wrote:
>
>>> I can represent your side Ram (and Anoop). I've been known always argue
>>> both side of a discussion and to never take sides easily (drives some folks
>>> crazy).
>>
>> I can vouch for this (smile)
>>
>> I also can offer support for off heaping there. At the same time we do
>> have a gap where we can't point to a timeline of improvements (yet, anyway)
>> with benchmarks showing gains where your goals need them. For example,
>> stock HBase in one JVM can address max N GB for response time distribution
>> D; dev version of HBase in off heap branch can address max N' GB for
>> distribution D', where N' > N and D > D' (distribution D' statistically
>> shows better/lower response times).
>>
>>
>>> On Jul 17, 2015, at 6:56 AM, lars hofhansl <[email protected]> wrote:
>>>
>>> I'm in favor of anything that improves performance (and preferably
>>> doesn't set us back into a world that's worse than C due to the lack of
>>> pointers in Java). Never said "I don't like it", it's just that I'm perhaps
>>> asking for more numbers and justification in weighing the pros and cons.
>>> I can represent your side Ram (and Anoop). I've been known always argue
>>> both side of a discussion and to never take sides easily (drives some folks
>>> crazy). And Stack's there too, he yell at me where needed :)
>>>
>>> Perhaps we can do it a bit later in the evening so there is a fighting
>>> chance that folks on IST can participate. I know that some of our folks on
>>> IST would love to participate in the backup discussion).
>>>
>>> Like Enis, I'm also happy to host. We're in Downtown SF. I'd just need
>>> an approx. number of folks.
>>>
>>> -- Lars
>>>
>>> From: ramkrishna vasudevan <[email protected]>
>>> To: "[email protected]" <[email protected]>; lars hofhansl <[email protected]>
>>> Sent: Wednesday, July 15, 2015 10:10 AM
>>> Subject: Re: DISCUSSION: lets do a developer workshop on near-term work
>>>
>>> Hi
>>> What time will it be on August 26th?
>>> @Lars
>>> Ya. I know that you are not generally in favour of this offheaping
>>> stuff. May be if we (from India) can attend this meeting remotely your
>>> thoughts can be discussed and also the current state of this work.
>>> Regards
>>> Ram
>>>
>>>
>>> On Wed, Jul 15, 2015 at 9:28 PM, lars hofhansl <[email protected]> wrote:
>>>
>>> Works for me. I'll be back in the Bay Area the week of August 9th.
>>> We have done a _lot_ of work on backups as well - ours are more
>>> complicated as we wanted fast per-tenant restores, so data is "grouped" by
>>> tenant. Would like to sync up on that (hopefully some of the folks who
>>> wrote most of the code will be in town, I'll check).
>>>
>>> Also interested in the "Time" and "offheap" parts (although you folks
>>> usually do not like what I think about the offheap efforts :) ).
>>> Would like to add the following topics:
>>>
>>> - "Timestamp Resolution". Or making space for more bits in the
>>>   timestamps (happy to cover that, unless it's part of the "Time" topic)
>>>
>>> - "Replication". We found that replication cannot keep up with high
>>>   write loads, due to the fact that replicated is strictly single threaded
>>>   per regionserver (even though we have multiple region servers on the sink
>>>   side)
>>>
>>> - "Spark integration" (Ted Malaska?)
>>>
>>> OK... Out now to make a "bullshit hat".
>>>
>>> -- Lars
>>>
>>> ________________________________
>>> From: Sean Busbey <[email protected]>
>>> To: dev <[email protected]>
>>> Sent: Tuesday, July 14, 2015 7:11 PM
>>> Subject: Re: DISCUSSION: lets do a developer workshop on near-term work
>>>
>>>
>>> I'm planning to be in the Bay area the week of the 24th of August.
>>>
>>> --
>>> Sean
>>>
>>>
>>>> On Jul 14, 2015 7:53 PM, "Andrew Purtell" <[email protected]> wrote:
>>>>
>>>> I can be up in your area in August.
>>>>
>>>>>> On Tue, Jul 14, 2015 at 5:31 PM, Stack <[email protected]> wrote:
>>>>>>
>>>>>> On Tue, Jul 14, 2015 at 3:39 PM, Enis Söztutar <[email protected]> wrote:
>>>>>
>>>>>> Sounds good. It has been a while we did the talk-aton.
>>>>>>
>>>>>> I'll be off starting 25 of July, so I prefer something next week if
>>>>>> possible.
>>>>>
>>>>> You ever coming back? If so, when? I'm back on 10th of August (Mikhail on
>>>>> the 20th).
>>>>> St.Ack
>>>>>
>>>>>> Enis
>>>>>>
>>>>>>> On Tue, Jul 14, 2015 at 3:18 PM, Stack <[email protected]> wrote:
>>>>>>>
>>>>>>> Matteo and I were thinking it time devs got together for a pow-wow. There
>>>>>>> is a bunch of stuff in flight at the moment (see below list) and it would
>>>>>>> be good to meet and whiteboard, surface goodo ideas that have gone dormant
>>>>>>> in JIRA, or revisit designs/proposals out in JIRA-attached google doc that
>>>>>>> need socializing.
>>>>>>>
>>>>>>> You can only come if you are wearing your bullshit hat.
>>>>>>>
>>>>>>> Topics we'd go over could include:
>>>>>>>
>>>>>>> + Our filesystem layout will not work if 1M regions (Matteo/Stack)
>>>>>>> + Current state of the offheaping of read path and alternate KeyValue
>>>>>>> implementation (Anoop/Ram)
>>>>>>> + Append rejigger (Elliott)
>>>>>>> + A Pv2-based Assign (Matteo/Steven)
>>>>>>> + Splitting meta/1M regions
>>>>>>> + The revived Backup (Vladimir)
>>>>>>> + Time (Enis)
>>>>>>> + The overloaded SequenceId (Stack)
>>>>>>> + Upstreaming IT testing (Dima/Sean)
>>>>>>> + hbase-2.0.0
>>>>>>>
>>>>>>> I put names by folks I know could talk to the topic. If you want to take
>>>>>>> over a topic or put your name by one, just say. Suggest that discussion
>>>>>>> lead off with a 5-10minute on current state of
>>>>>>> thought/design/implementation.
>>>>>>>
>>>>>>> What do others think?
>>>>>>>
>>>>>>> What date would suit folks?
>>>>>>>
>>>>>>> Anyone want to host?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Matteo and St.Ack
>>>>
>>>>
>>>>
>>>> --
>>>> Best regards,
>>>>
>>>> - Andy
>>>>
>>>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
>>>> (via Tom White)
>>
