Re: DISCUSSION: lets do a developer workshop on near-term work

ramkrishna vasudevan Mon, 20 Jul 2015 04:47:20 -0700

Hi Andy

Based on the POCs we have done, we expect around a 20% improvement in latency.
For scans it will be slightly less than 20%.


Regards
Ram


On Sun, Jul 19, 2015 at 10:20 AM, Andrew Purtell <[email protected]>
wrote:

> Hi Ram,
>
> Do you have any targets for what you are measuring? What are the goals you
> guys are working toward with the off heaping changes?
>
>
> > On Jul 18, 2015, at 9:16 PM, ramkrishna vasudevan <[email protected]> wrote:
> >
> > Thanks Vladimir.
> > Yes, the attached reports specifically captured the 95th/99th percentile
> > latencies.
> > We checked server-side performance specifically to see the improvement on
> > the server side; also, the client was handling large results in multiple
> > threads, so we wanted to avoid network interference. I think this was a
> > general practice that we were following.
> > We will do some more tests and get fresh readings with bigger data sets.
> > Sent from mobile.
> >> On Jul 19, 2015 1:05 AM, "Andrew Purtell" <[email protected]> wrote:
> >>
> >> +1
> >>
> >> Yeah, something like that, with aspirational targets for improvement
> >> over current releases. Then what to measure, the tests to run, and the
> >> criteria for evaluation are clear and organized, and we're better able to
> >> assess how the work in progress is meeting its goals (or not).
> >>
> >>
> >>
> >> On Jul 18, 2015, at 12:05 PM, Vladimir Rodionov <[email protected]>
> >> wrote:
> >>
> >>>>> Umbrella jira to make sure we can have blocks cached in an offheap-backed
> >>>>> cache. In the entire read path, we can refer to this offheap buffer and
> >>>>> avoid onheap copying.
> >>>
> >>> I think that on the read path the most important improvement we could
> >>> imagine is eliminating or reducing object creation (KVs, iterators, etc.)
> >>> through object reuse, byte buffer reuse or offheap buffer reuse, API
> >>> changes, etc. If this is part of this JIRA, then I would easily define a
> >>> goal: improving the 95th/99th percentile latency of read operations. Not
> >>> performance, but latency, matters.
> >>>
> >>> -Vlad
> >>>
> >>>
> >>>
> >>> On Sat, Jul 18, 2015 at 11:24 AM, Andrew Purtell <[email protected]>
> >>> wrote:
> >>>
> >>>> That's not a realistic or useful test scenario, unless the goal is to
> >>>> accelerate queries where all cells are filtered at the server.
> >>>>
> >>>>
> >>>>
> >>>>> On Jul 18, 2015, at 11:02 AM, Anoop John <[email protected]>
> >>>>> wrote:
> >>>>>
> >>>>> No, Andy. 11425 has a doc attached to it. At the end of it, we have
> >>>>> added perf numbers from cluster testing. This was done using PE get and
> >>>>> scan tests that filter all cells at the server (to take network
> >>>>> bandwidth constraints out of the picture).
> >>>>>
> >>>>> -Anoop-
> >>>>>
> >>>>> On Sat, Jul 18, 2015 at 9:30 PM, Andrew Purtell <[email protected]>
> >>>>> wrote:
> >>>>>
> >>>>>> We have some microbenchmarks, not evidence of differences seen from a
> >>>>>> client application. I'm not saying that microbenchmarks aren't totally
> >>>>>> necessary and a great start - they are - but they don't measure an end
> >>>>>> goal. Furthermore, unless I've missed one somewhere, we don't have a
> >>>>>> JIRA or design doc that states a clear end-goal metric like the strawman
> >>>>>> I threw together in my previous mail. A measurable system-level goal and
> >>>>>> some data from full cluster testing would go a lot further toward
> >>>>>> letting all of us evaluate the potential and payoff of the work. In the
> >>>>>> meantime we should probably be assembling these changes on a branch
> >>>>>> instead of in trunk, for as long as the goal is not clearly defined and
> >>>>>> the payoff and potential for perf regressions are untested and unknown.
> >>>>>>
> >>>>>>
> >>>>>>> On Jul 18, 2015, at 8:05 AM, Anoop John <[email protected]>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>> Thanks Andy and Lars. The parent JIRA has a doc attached which contains
> >>>>>>> some perf gain numbers. We will be doing more tests in the next 2 weeks
> >>>>>>> (before the end of this month) and will publish them. Yes, it will be
> >>>>>>> great if it is at a more IST-friendly time :-)
> >>>>>>>
> >>>>>>> -Anoop-
> >>>>>>>
> >>>>>>> On Fri, Jul 17, 2015 at 9:44 PM, Andrew Purtell <[email protected]>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>>> I can represent your side Ram (and Anoop). I've been known to always
> >>>>>>>>> argue both sides of a discussion and to never take sides easily
> >>>>>>>>> (drives some folks crazy).
> >>>>>>>>
> >>>>>>>> I can vouch for this (smile)
> >>>>>>>>
> >>>>>>>> I can also offer support for off heaping there. At the same time we do
> >>>>>>>> have a gap where we can't point to a timeline of improvements (yet,
> >>>>>>>> anyway) with benchmarks showing gains where your goals need them. For
> >>>>>>>> example: stock HBase in one JVM can address at most N GB for response
> >>>>>>>> time distribution D; the dev version of HBase in the off heap branch can
> >>>>>>>> address at most N' GB for distribution D', where N' > N and D > D'
> >>>>>>>> (distribution D' statistically shows better/lower response times).
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> On Jul 17, 2015, at 6:56 AM, lars hofhansl <[email protected]>
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>> I'm in favor of anything that improves performance (and preferably
> >>>>>>>>> doesn't set us back into a world that's worse than C due to the lack
> >>>>>>>>> of pointers in Java). I never said "I don't like it"; it's just that
> >>>>>>>>> I'm perhaps asking for more numbers and justification in weighing the
> >>>>>>>>> pros and cons.
> >>>>>>>>> I can represent your side Ram (and Anoop). I've been known to always
> >>>>>>>>> argue both sides of a discussion and to never take sides easily
> >>>>>>>>> (drives some folks crazy). And Stack's there too; he'll yell at me
> >>>>>>>>> where needed :)
> >>>>>>>>>
> >>>>>>>>> Perhaps we can do it a bit later in the evening so there is a
> >>>>>>>>> fighting chance that folks on IST can participate. I know that some
> >>>>>>>>> of our folks on IST would love to participate in the backup
> >>>>>>>>> discussion.
> >>>>>>>>>
> >>>>>>>>> Like Enis, I'm also happy to host. We're in Downtown SF. I'd just
> >>>>>>>>> need an approx. number of folks.
> >>>>>>>>>
> >>>>>>>>> -- Lars
> >>>>>>>>>
> >>>>>>>>> From: ramkrishna vasudevan <[email protected]>
> >>>>>>>>> To: "[email protected]" <[email protected]>; lars hofhansl <[email protected]>
> >>>>>>>>> Sent: Wednesday, July 15, 2015 10:10 AM
> >>>>>>>>> Subject: Re: DISCUSSION: lets do a developer workshop on near-term work
> >>>>>>>>>
> >>>>>>>>> Hi
> >>>>>>>>> What time will it be on August 26th?
> >>>>>>>>> @Lars Yes, I know that you are not generally in favour of this
> >>>>>>>>> offheaping stuff. Maybe if we (from India) can attend this meeting
> >>>>>>>>> remotely, your thoughts can be discussed along with the current state
> >>>>>>>>> of this work.
> >>>>>>>>> Regards
> >>>>>>>>> Ram
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Wed, Jul 15, 2015 at 9:28 PM, lars hofhansl <[email protected]>
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>> Works for me. I'll be back in the Bay Area the week of August 9th.
> >>>>>>>>> We have done a _lot_ of work on backups as well - ours are more
> >>>>>>>>> complicated because we wanted fast per-tenant restores, so data is
> >>>>>>>>> "grouped" by tenant. Would like to sync up on that (hopefully some of
> >>>>>>>>> the folks who wrote most of the code will be in town; I'll check).
> >>>>>>>>>
> >>>>>>>>> Also interested in the "Time" and "offheap" parts (although you folks
> >>>>>>>>> usually do not like what I think about the offheap efforts :) ).
> >>>>>>>>> Would like to add the following topics:
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> - "Timestamp Resolution", or making space for more bits in the
> >>>>>>>>> timestamps (happy to cover that, unless it's part of the "Time" topic)
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> - "Replication". We found that replication cannot keep up with high
> >>>>>>>>> write loads, because replication is strictly single-threaded per
> >>>>>>>>> regionserver (even though we have multiple region servers on the sink
> >>>>>>>>> side).
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> - "Spark integration" (Ted Malaska?)
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> OK... Out now to make a "bullshit hat".
> >>>>>>>>>
> >>>>>>>>> -- Lars
> >>>>>>>>>
> >>>>>>>>> ________________________________
> >>>>>>>>> From: Sean Busbey <[email protected]>
> >>>>>>>>> To: dev <[email protected]>
> >>>>>>>>> Sent: Tuesday, July 14, 2015 7:11 PM
> >>>>>>>>> Subject: Re: DISCUSSION: lets do a developer workshop on near-term work
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> I'm planning to be in the Bay Area the week of the 24th of August.
> >>>>>>>>>
> >>>>>>>>> --
> >>>>>>>>> Sean
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>> On Jul 14, 2015 7:53 PM, "Andrew Purtell" <[email protected]>
> >>>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>> I can be up in your area in August.
> >>>>>>>>>>
> >>>>>>>>>>>> On Tue, Jul 14, 2015 at 5:31 PM, Stack <[email protected]>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Tue, Jul 14, 2015 at 3:39 PM, Enis Söztutar <[email protected]>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Sounds good. It has been a while since we did the talk-a-thon.
> >>>>>>>>>>>>
> >>>>>>>>>>>> I'll be off starting the 25th of July, so I'd prefer something next
> >>>>>>>>>>>> week if possible.
> >>>>>>>>>>>>
> >>>>>>>>>>> You ever coming back? If so, when? I'm back on the 10th of August
> >>>>>>>>>>> (Mikhail on the 20th).
> >>>>>>>>>>> St.Ack
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>> Enis
> >>>>>>>>>>>>
> >>>>>>>>>>>>> On Tue, Jul 14, 2015 at 3:18 PM, Stack <[email protected]>
> >>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Matteo and I were thinking it's time devs got together for a
> >>>>>>>>>>>>> pow-wow. There is a bunch of stuff in flight at the moment (see
> >>>>>>>>>>>>> the list below) and it would be good to meet and whiteboard,
> >>>>>>>>>>>>> surface good ideas that have gone dormant in JIRA, or revisit
> >>>>>>>>>>>>> designs/proposals in JIRA-attached Google docs that need
> >>>>>>>>>>>>> socializing.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> You can only come if you are wearing your bullshit hat.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Topics we'd go over could include:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> + Our filesystem layout will not work with 1M regions (Matteo/Stack)
> >>>>>>>>>>>>> + Current state of the offheaping of the read path and alternate
> >>>>>>>>>>>>>   KeyValue implementation (Anoop/Ram)
> >>>>>>>>>>>>> + Append rejigger (Elliott)
> >>>>>>>>>>>>> + A Pv2-based Assign (Matteo/Steven)
> >>>>>>>>>>>>> + Splitting meta/1M regions
> >>>>>>>>>>>>> + The revived Backup (Vladimir)
> >>>>>>>>>>>>> + Time (Enis)
> >>>>>>>>>>>>> + The overloaded SequenceId (Stack)
> >>>>>>>>>>>>> + Upstreaming IT testing (Dima/Sean)
> >>>>>>>>>>>>> + hbase-2.0.0
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I put names by folks I know could talk to the topic. If you want to
> >>>>>>>>>>>>> take over a topic or put your name by one, just say. I suggest that
> >>>>>>>>>>>>> each discussion lead off with a 5-10 minute summary of the current
> >>>>>>>>>>>>> state of thought/design/implementation.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> What do others think?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> What date would suit folks?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Anyone want to host?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>> Matteo and St.Ack
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> --
> >>>>>>>>>> Best regards,
> >>>>>>>>>>
> >>>>>>>>>> - Andy
> >>>>>>>>>>
> >>>>>>>>>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> >>>>>>>>>> (via Tom White)
> >>
>
