Re: DISCUSSION: lets do a developer workshop on near-term work

Stephen Jiang Mon, 20 Jul 2015 13:05:27 -0700

[Let us move back to the main topic: a meeting to talk about the next
direction of HBase development]


Are we firm on the *August 26th* meeting date?

Given the long list of topics from St.Ack, even a one-day meeting might not
cover all of them in depth. We need to either trim the topic list or limit
the time spent on each topic (is 30 minutes per topic enough?).

Thanks
Stephen


On Mon, Jul 20, 2015 at 9:50 AM, Anoop John <[email protected]> wrote:

> We will be doing some more large-data tests in the coming week, Andy, and
> will report back. We will also do a write-up on all the ways this work
> might help us. As Sean said, we will continue in another thread if there
> is anything further. Will write back soon with the test results. Thanks.
>
> -Anoop-
>
> On Mon, Jul 20, 2015 at 9:59 PM, Andrew Purtell <[email protected]>
> wrote:
>
> > Cool, thanks.
> >
> > Is a 20% latency reduction the most we can expect or do you think there
> > is room for more improvement? Just curious.
> >
> > Is latency reduction the only goal? Anything here about supporting larger
> > heaps? Is there something we can measure in that regard?
> >
> > Hope you see my point and there's enough here to prime a goals and metrics
> > discussion at the pow-wow or on the relevant JIRAs.
> >
> > > On Jul 20, 2015, at 4:43 AM, ramkrishna vasudevan <
> > [email protected]> wrote:
> > >
> > > Hi Andy
> > >
> > > Based on the POCs we have done, we expect around a 20% improvement in
> > > latency. For scans it will be a little less than 20%.
> > >
> > > Regards
> > > Ram
> > >
> > >
> > > On Sun, Jul 19, 2015 at 10:20 AM, Andrew Purtell <
> > [email protected]>
> > > wrote:
> > >
> > >> Hi Ram,
> > >>
> > >> Do you have any targets for what you are measuring? What are the goals
> > >> you guys are working toward with the off-heaping changes?
> > >>
> > >>
> > >>>> On Jul 18, 2015, at 9:16 PM, ramkrishna vasudevan <
> > >>> [email protected]> wrote:
> > >>>
> > >>> Thanks Vladimir.
> > >>> Yeah, the reports that were attached specifically captured the
> > >>> 95th/99th percentiles.
> > >>> The reason for checking server-side perf was to see the improvement on
> > >>> the server side specifically; also, the client was sending large results
> > >>> in multiple threads, so we wanted to avoid network interference. I think
> > >>> it was a general practice that we were following.
> > >>> We will do some more tests and get the latest readings with bigger data
> > >>> sets.
> > >>> Sent from mobile.
> > >>>> On Jul 19, 2015 1:05 AM, "Andrew Purtell" <[email protected]>
> > >>>> wrote:
> > >>>>
> > >>>> +1
> > >>>>
> > >>>> Yeah, something like that, with aspirational targets for improvement
> > >>>> from current releases. Then what to measure, the tests to run, and
> > >>>> criteria for evaluation are clear and organized, and we're able to
> > >>>> better assess how the work in progress is meeting its goals (or not).
> > >>>>
> > >>>>
> > >>>>
> > >>>> On Jul 18, 2015, at 12:05 PM, Vladimir Rodionov <[email protected]>
> > >>>> wrote:
> > >>>>
> > >>>>>>> Umbrella jira to make sure we can have blocks cached in offheap
> > >>>>>>> backed cache. In the entire read path, we can refer to this offheap
> > >>>>>>> buffer and avoid onheap copying.
> > >>>>>
> > >>>>> I think, on the read path, the most important improvement we could
> > >>>>> imagine is eliminating or reducing object creation (KVs, iterators,
> > >>>>> etc.): object reuse, byte buffer reuse or offheap buffer reuse, API
> > >>>>> changes, etc. If this is part of this JIRA, then I would easily define
> > >>>>> a goal: improving the 95th/99th percentile latency of read operations.
> > >>>>> Not performance, but latency matters.
> > >>>>>
> > >>>>> -Vlad
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> On Sat, Jul 18, 2015 at 11:24 AM, Andrew Purtell <
> > >>>> [email protected]>
> > >>>>> wrote:
> > >>>>>
> > >>>>>> That's not a realistic or useful test scenario, unless the goal is to
> > >>>>>> accelerate queries where all cells are filtered at the server.
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>>> On Jul 18, 2015, at 11:02 AM, Anoop John <[email protected]>
> > >>>> wrote:
> > >>>>>>>
> > >>>>>>> No Andy. 11425 has a doc attached to it. At the end of it, we have
> > >>>>>>> added perf numbers from cluster testing. This was done using PE get
> > >>>>>>> and scan tests with all cells filtered at the server (to take network
> > >>>>>>> bandwidth constraints out of the picture).
> > >>>>>>>
> > >>>>>>> -Anoop-
> > >>>>>>>
> > >>>>>>> On Sat, Jul 18, 2015 at 9:30 PM, Andrew Purtell <
> > >>>>>> [email protected]>
> > >>>>>>> wrote:
> > >>>>>>>
> > >>>>>>>> We have some microbenchmarks, not evidence of differences seen from
> > >>>>>>>> a client application. I'm not saying that microbenchmarks are not
> > >>>>>>>> totally necessary and a great start - they are - but that they don't
> > >>>>>>>> measure an end goal. Furthermore, unless I've missed one somewhere,
> > >>>>>>>> we don't have a JIRA or design doc that states a clear end-goal
> > >>>>>>>> metric like the strawman I threw together in my previous mail. A
> > >>>>>>>> measurable system-level goal and some data from full cluster testing
> > >>>>>>>> would go a lot further toward letting all of us evaluate the
> > >>>>>>>> potential and payoff of the work. In the meantime we should probably
> > >>>>>>>> be assembling these changes on a branch instead of in trunk, for as
> > >>>>>>>> long as the goal is not clearly defined and the payoff and potential
> > >>>>>>>> for perf regressions are untested and unknown.
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>> On Jul 18, 2015, at 8:05 AM, Anoop John <[email protected]>
> > >>>>>>>>> wrote:
> > >>>>>>>>>
> > >>>>>>>>> Thanks Andy and Lars. The parent jira has a doc attached which
> > >>>>>>>>> contains some perf gain numbers. We will be doing more tests in the
> > >>>>>>>>> next 2 weeks (before the end of this month) and will publish them.
> > >>>>>>>>> Yes, it will be great if it is at a more IST-friendly time :-)
> > >>>>>>>>>
> > >>>>>>>>> -Anoop-
> > >>>>>>>>>
> > >>>>>>>>> On Fri, Jul 17, 2015 at 9:44 PM, Andrew Purtell <
> > >>>>>>>> [email protected]>
> > >>>>>>>>> wrote:
> > >>>>>>>>>
> > >>>>>>>>>>> I can represent your side Ram (and Anoop). I've been known to always
> > >>>>>>>>>>> argue both sides of a discussion and to never take sides easily
> > >>>>>>>>>>> (drives some folks crazy).
> > >>>>>>>>>>
> > >>>>>>>>>> I can vouch for this (smile)
> > >>>>>>>>>>
> > >>>>>>>>>> I can also offer support for off-heaping there. At the same time we
> > >>>>>>>>>> do have a gap where we can't point to a timeline of improvements
> > >>>>>>>>>> (yet, anyway) with benchmarks showing gains where your goals need
> > >>>>>>>>>> them. For example: stock HBase in one JVM can address max N GB for
> > >>>>>>>>>> response time distribution D; the dev version of HBase in the
> > >>>>>>>>>> off-heap branch can address max N' GB for distribution D', where
> > >>>>>>>>>> N' > N and D > D' (distribution D' statistically shows
> > >>>>>>>>>> better/lower response times).
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>>> On Jul 17, 2015, at 6:56 AM, lars hofhansl <[email protected]>
> > >>>>>>>>>>> wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>> I'm in favor of anything that improves performance (and preferably
> > >>>>>>>>>>> doesn't set us back into a world that's worse than C due to the
> > >>>>>>>>>>> lack of pointers in Java). I never said "I don't like it"; it's
> > >>>>>>>>>>> just that I'm perhaps asking for more numbers and justification in
> > >>>>>>>>>>> weighing the pros and cons.
> > >>>>>>>>>>> I can represent your side Ram (and Anoop). I've been known to always
> > >>>>>>>>>>> argue both sides of a discussion and to never take sides easily
> > >>>>>>>>>>> (drives some folks crazy). And Stack's there too; he'll yell at me
> > >>>>>>>>>>> where needed :)
> > >>>>>>>>>>>
> > >>>>>>>>>>> Perhaps we can do it a bit later in the evening so there is a
> > >>>>>>>>>>> fighting chance that folks on IST can participate. I know that some
> > >>>>>>>>>>> of our folks on IST would love to participate in the backup
> > >>>>>>>>>>> discussion.
> > >>>>>>>>>>>
> > >>>>>>>>>>> Like Enis, I'm also happy to host. We're in Downtown SF. I'd just
> > >>>>>>>>>>> need an approx. number of folks.
> > >>>>>>>>>>>
> > >>>>>>>>>>> -- Lars
> > >>>>>>>>>>>
> > >>>>>>>>>>> From: ramkrishna vasudevan <[email protected]>
> > >>>>>>>>>>> To: "[email protected]" <[email protected]>; lars hofhansl <[email protected]>
> > >>>>>>>>>>> Sent: Wednesday, July 15, 2015 10:10 AM
> > >>>>>>>>>>> Subject: Re: DISCUSSION: lets do a developer workshop on near-term work
> > >>>>>>>>>>>
> > >>>>>>>>>>> Hi
> > >>>>>>>>>>> What time will it be on August 26th?
> > >>>>>>>>>>> @Lars Ya. I know that you are not generally in favour of this
> > >>>>>>>>>>> offheaping stuff. Maybe if we (from India) can attend this meeting
> > >>>>>>>>>>> remotely, your thoughts can be discussed along with the current
> > >>>>>>>>>>> state of this work.
> > >>>>>>>>>>> Regards
> > >>>>>>>>>>> Ram
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>> On Wed, Jul 15, 2015 at 9:28 PM, lars hofhansl <[email protected]>
> > >>>>>>>>>>> wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>> Works for me. I'll be back in the Bay Area the week of August 9th.
> > >>>>>>>>>>> We have done a _lot_ of work on backups as well - ours are more
> > >>>>>>>>>>> complicated, as we wanted fast per-tenant restores, so data is
> > >>>>>>>>>>> "grouped" by tenant. Would like to sync up on that (hopefully some
> > >>>>>>>>>>> of the folks who wrote most of the code will be in town; I'll
> > >>>>>>>>>>> check).
> > >>>>>>>>>>>
> > >>>>>>>>>>> Also interested in the "Time" and "offheap" parts (although you
> > >>>>>>>>>>> folks usually do not like what I think about the offheap efforts :)).
> > >>>>>>>>>>> Would like to add the following topics:
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>> - "Timestamp Resolution". Or making space for more bits in the
> > >>>>>>>>>>> timestamps (happy to cover that, unless it's part of the "Time" topic)
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>> - "Replication". We found that replication cannot keep up with high
> > >>>>>>>>>>> write loads, due to the fact that replication is strictly
> > >>>>>>>>>>> single-threaded per regionserver (even though we have multiple
> > >>>>>>>>>>> region servers on the sink side).
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>> - "Spark integration" (Ted Malaska?)
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>> OK... Out now to make a "bullshit hat".
> > >>>>>>>>>>>
> > >>>>>>>>>>> -- Lars
> > >>>>>>>>>>>
> > >>>>>>>>>>> ________________________________
> > >>>>>>>>>>> From: Sean Busbey <[email protected]>
> > >>>>>>>>>>> To: dev <[email protected]>
> > >>>>>>>>>>> Sent: Tuesday, July 14, 2015 7:11 PM
> > >>>>>>>>>>> Subject: Re: DISCUSSION: lets do a developer workshop on near-term work
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>> I'm planning to be in the Bay Area the week of the 24th of August.
> > >>>>>>>>>>>
> > >>>>>>>>>>> --
> > >>>>>>>>>>> Sean
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>>> On Jul 14, 2015 7:53 PM, "Andrew Purtell" <[email protected]>
> > >>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> I can be up in your area in August.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>>> On Tue, Jul 14, 2015 at 5:31 PM, Stack <[email protected]>
> > >>>> wrote:
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> On Tue, Jul 14, 2015 at 3:39 PM, Enis Söztutar <
> > >>>>>> [email protected]>
> > >>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Sounds good. It has been a while since we did the talk-a-thon.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> I'll be off starting the 25th of July, so I'd prefer something
> > >>>>>>>>>>>>>> next week if possible.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>> You ever coming back? If so, when? I'm back on the 10th of August
> > >>>>>>>>>>>>> (Mikhail on the 20th).
> > >>>>>>>>>>>>> St.Ack
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Enis
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> On Tue, Jul 14, 2015 at 3:18 PM, Stack <[email protected]> wrote:
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Matteo and I were thinking it was time devs got together for a
> > >>>>>>>>>>>>>>> pow-wow. There is a bunch of stuff in flight at the moment (see
> > >>>>>>>>>>>>>>> list below) and it would be good to meet and whiteboard, surface
> > >>>>>>>>>>>>>>> good ideas that have gone dormant in JIRA, or revisit
> > >>>>>>>>>>>>>>> designs/proposals out in JIRA-attached Google docs that need
> > >>>>>>>>>>>>>>> socializing.
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> You can only come if you are wearing your bullshit hat.
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Topics we'd go over could include:
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> + Our filesystem layout will not work with 1M regions (Matteo/Stack)
> > >>>>>>>>>>>>>>> + Current state of the offheaping of the read path and alternate
> > >>>>>>>>>>>>>>> KeyValue implementation (Anoop/Ram)
> > >>>>>>>>>>>>>>> + Append rejigger (Elliott)
> > >>>>>>>>>>>>>>> + A Pv2-based Assign (Matteo/Steven)
> > >>>>>>>>>>>>>>> + Splitting meta/1M regions
> > >>>>>>>>>>>>>>> + The revived Backup (Vladimir)
> > >>>>>>>>>>>>>>> + Time (Enis)
> > >>>>>>>>>>>>>>> + The overloaded SequenceId (Stack)
> > >>>>>>>>>>>>>>> + Upstreaming IT testing (Dima/Sean)
> > >>>>>>>>>>>>>>> + hbase-2.0.0
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> I put names by folks I know could talk to the topic. If you want
> > >>>>>>>>>>>>>>> to take over a topic or put your name by one, just say. Suggest
> > >>>>>>>>>>>>>>> that discussion lead off with a 5-10 minute summary of the current
> > >>>>>>>>>>>>>>> state of thought/design/implementation.
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> What do others think?
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> What date would suit folks?
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Anyone want to host?
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Thanks,
> > >>>>>>>>>>>>>>> Matteo and St.Ack
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> --
> > >>>>>>>>>>>> Best regards,
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> - Andy
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Problems worthy of attack prove their worth by hitting back.
> > >>>>>>>>>>>> - Piet Hein (via Tom White)
> > >>
> >
>
