[Let us move back to the main topic - a meeting to talk about the next direction on HBASE development]
Are we firm on the *August 26th* meeting date?

Given the long list of topics from St.Ack, even a one-day meeting might not
cover all of them in depth. We need to either trim the topic list or limit
the time spent on each topic (is 30 minutes per topic enough?).

Thanks
Stephen

On Mon, Jul 20, 2015 at 9:50 AM, Anoop John <anoop.hb...@gmail.com> wrote:

> We will be doing some more large data tests in coming week Andy.. Will
> report back more. Also will do a write up , in what all ways the work
> might help us. As Sean said, we will continue in another thread if any
> thing further.. Will soon write back on the test result. Thanks.
>
> -Anoop-
>
> On Mon, Jul 20, 2015 at 9:59 PM, Andrew Purtell <andrew.purt...@gmail.com>
> wrote:
>
> > Cool, thanks.
> >
> > Is a 20% latency reduction the most we can expect or do you think there is
> > room for more improvement? Just curious.
> >
> > Is latency reduction the only goal? Anything here about supporting larger
> > heaps? Is there something we can measure in that regard?
> >
> > Hope you see my point and there's enough here to prime a goals and metrics
> > discussion at the pow wow or on the relevant JIRAs.
> >
> > > On Jul 20, 2015, at 4:43 AM, ramkrishna vasudevan <
> > > ramkrishna.s.vasude...@gmail.com> wrote:
> > >
> > > Hi Andy
> > >
> > > Based on our POCs done, we expect around 20% improvement in latency. For
> > > scans it will be little lesser than 20%.
> > >
> > > Regards
> > > Ram
> > >
> > > On Sun, Jul 19, 2015 at 10:20 AM, Andrew Purtell <
> > > andrew.purt...@gmail.com> wrote:
> > >
> > >> Hi Ram,
> > >>
> > >> Do you have any targets for what you are measuring? What are the goals you
> > >> guys are working toward with the off heaping changes?
> > >>
> > >>>> On Jul 18, 2015, at 9:16 PM, ramkrishna vasudevan <
> > >>>> ramkrishna.s.vasude...@gmail.com> wrote:
> > >>>
> > >>> Thanks Vladimir.
> > >>> Yeah, the reports that were attached specifically captured the 95/99th
> > >>> percentile.
> > >>> The reason for checking the server side perf was to specifically see the
> > >>> improvement in the server side and also the client was sending large
> > >>> results in multiple threads. So wanted to avoid the n/w interference. I
> > >>> think it was a general practice that we were following.
> > >>> We will do some more tests and get some latest readings with bigger data
> > >>> sets.
> > >>> Sent from mobile.
> > >>>> On Jul 19, 2015 1:05 AM, "Andrew Purtell" <andrew.purt...@gmail.com>
> > >>>> wrote:
> > >>>>
> > >>>> +1
> > >>>>
> > >>>> Yeah, something like that, with aspirational targets for improvement from
> > >>>> current releases. Then what to measure, the tests to run, and criteria for
> > >>>> evaluation are clear and organized and we're able to better assess how the
> > >>>> work in progress is meeting its goals (or not)
> > >>>>
> > >>>> On Jul 18, 2015, at 12:05 PM, Vladimir Rodionov <vladrodio...@gmail.com>
> > >>>> wrote:
> > >>>>
> > >>>>>>> Umbrella jira to make sure we can have blocks cached in offheap backed
> > >>>>>>> cache. In the entire read path, we can refer to this offheap buffer and
> > >>>>>>> avoid onheap copying.
> > >>>>>
> > >>>>> I think, on a read path, the most important improvement we could imagine is
> > >>>>> elimination or reducing of object creations (KVs, iterators etc).
> > >>>>> object reuse, byte buffers reuse or offheap buffers reuse, API change etc.
> > >>>>> If this is a part of this JIRA, then I would easily define a goal:
> > >>>>> improving 95/99% latency of a read operations.
> > >>>>> Not performance, but latency matters
> > >>>>>
> > >>>>> -Vlad
> > >>>>>
> > >>>>> On Sat, Jul 18, 2015 at 11:24 AM, Andrew Purtell <andrew.purt...@gmail.com>
> > >>>>> wrote:
> > >>>>>
> > >>>>>> That's not a realistic or useful test scenario, unless the goal is to
> > >>>>>> accelerate queries where all cells are filtered at the server.
> > >>>>>>
> > >>>>>>> On Jul 18, 2015, at 11:02 AM, Anoop John <anoop.hb...@gmail.com> wrote:
> > >>>>>>>
> > >>>>>>> No Andy. 11425 having doc attached to it. At the end of it, we have added
> > >>>>>>> perf numbers in a cluster testing. This was done using PE get and scan
> > >>>>>>> tests with filtering all cells at server (to not consider n/w bandwidth
> > >>>>>>> constraints)
> > >>>>>>>
> > >>>>>>> -Anoop-
> > >>>>>>>
> > >>>>>>> On Sat, Jul 18, 2015 at 9:30 PM, Andrew Purtell <andrew.purt...@gmail.com>
> > >>>>>>> wrote:
> > >>>>>>>
> > >>>>>>>> We have some microbenchmarks, not evidence of differences seen from a
> > >>>>>>>> client application. I'm not saying that microbenchmarks are not totally
> > >>>>>>>> necessary and a great start - they are - but that they don't measure an end
> > >>>>>>>> goal. Furthermore unless I've missed one somewhere we don't have a JIRA or
> > >>>>>>>> design doc that states a clear end goal metric like the strawman I threw
> > >>>>>>>> together in my previous mail. A measurable system level goal and some data
> > >>>>>>>> from full cluster testing would go a lot further toward letting all of us
> > >>>>>>>> evaluate the potential and payoff of the work.
> > >>>>>>>> In the meantime we should
> > >>>>>>>> probably be assembling these changes on a branch instead of in trunk, for
> > >>>>>>>> as long as the goal is not clearly defined and the payoff and potential for
> > >>>>>>>> perf regressions is untested and unknown.
> > >>>>>>>>
> > >>>>>>>>> On Jul 18, 2015, at 8:05 AM, Anoop John <anoop.hb...@gmail.com> wrote:
> > >>>>>>>>>
> > >>>>>>>>> Thanks Andy and Lars. The parent jira has doc attached which contains some
> > >>>>>>>>> perf gain numbers.. We will be doing more tests in next 2 weeks (before
> > >>>>>>>>> end of this month) and will publish them. Yes it will be great if it is
> > >>>>>>>>> more IST friendly time :-)
> > >>>>>>>>>
> > >>>>>>>>> -Anoop-
> > >>>>>>>>>
> > >>>>>>>>> On Fri, Jul 17, 2015 at 9:44 PM, Andrew Purtell <andrew.purt...@gmail.com>
> > >>>>>>>>> wrote:
> > >>>>>>>>>
> > >>>>>>>>>>> I can represent your side Ram (and Anoop). I've been known always argue
> > >>>>>>>>>>> both side of a discussion and to never take sides easily (drives some folks
> > >>>>>>>>>>> crazy).
> > >>>>>>>>>>
> > >>>>>>>>>> I can vouch for this (smile)
> > >>>>>>>>>>
> > >>>>>>>>>> I also can offer support for off heaping there. At the same time we do
> > >>>>>>>>>> have a gap where we can't point to a timeline of improvements (yet, anyway)
> > >>>>>>>>>> with benchmarks showing gains where your goals need them. For example,
> > >>>>>>>>>> stock HBase in one JVM can address max N GB for response time distribution
> > >>>>>>>>>> D; dev version of HBase in off heap branch can address max N' GB for
> > >>>>>>>>>> distribution D', where N' > N and D > D' (distribution D' statistically
> > >>>>>>>>>> shows better/lower response times).
> > >>>>>>>>>>
> > >>>>>>>>>>> On Jul 17, 2015, at 6:56 AM, lars hofhansl <la...@apache.org> wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>> I'm in favor of anything that improves performance (and preferably
> > >>>>>>>>>>> doesn't set us back into a world that's worse than C due to the lack of
> > >>>>>>>>>>> pointers in Java). Never said "I don't like it", it's just that I'm perhaps
> > >>>>>>>>>>> asking for more numbers and justification in weighing the pros and cons.
> > >>>>>>>>>>> I can represent your side Ram (and Anoop). I've been known always argue
> > >>>>>>>>>>> both side of a discussion and to never take sides easily (drives some folks
> > >>>>>>>>>>> crazy). And Stack's there too, he yell at me where needed :)
> > >>>>>>>>>>>
> > >>>>>>>>>>> Perhaps we can do it a bit later in the evening so there is a fighting
> > >>>>>>>>>>> chance that folks on IST can participate. I know that some of our folks on
> > >>>>>>>>>>> IST would love to participate in the backup discussion).
> > >>>>>>>>>>>
> > >>>>>>>>>>> Like Enis, I'm also happy to host. We're in Downtown SF. I'd just need
> > >>>>>>>>>>> an approx. number of folks.
> > >>>>>>>>>>>
> > >>>>>>>>>>> -- Lars
> > >>>>>>>>>>>
> > >>>>>>>>>>> From: ramkrishna vasudevan <ramkrishna.s.vasude...@gmail.com>
> > >>>>>>>>>>> To: "dev@hbase.apache.org" <dev@hbase.apache.org>; lars hofhansl
> > >>>>>>>>>>> <la...@apache.org>
> > >>>>>>>>>>> Sent: Wednesday, July 15, 2015 10:10 AM
> > >>>>>>>>>>> Subject: Re: DISCUSSION: lets do a developer workshop on near-term work
> > >>>>>>>>>>>
> > >>>>>>>>>>> Hi
> > >>>>>>>>>>> What time will it be on August 26th?
> > >>>>>>>>>>> @Lars
> > >>>>>>>>>>> Ya. I know that you are not generally in favour of this offheaping
> > >>>>>>>>>>> stuff.
> > >>>>>>>>>>> May be if we (from India) can attend this meeting remotely your
> > >>>>>>>>>>> thoughts can be discussed and also the current state of this work.
> > >>>>>>>>>>> Regards
> > >>>>>>>>>>> Ram
> > >>>>>>>>>>>
> > >>>>>>>>>>> On Wed, Jul 15, 2015 at 9:28 PM, lars hofhansl <la...@apache.org> wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>> Works for me. I'll be back in the Bay Area the week of August 9th.
> > >>>>>>>>>>> We have done a _lot_ of work on backups as well - ours are more
> > >>>>>>>>>>> complicated as we wanted fast per-tenant restores, so data is "grouped" by
> > >>>>>>>>>>> tenant. Would like to sync up on that (hopefully some of the folks who
> > >>>>>>>>>>> wrote most of the code will be in town, I'll check).
> > >>>>>>>>>>>
> > >>>>>>>>>>> Also interested in the "Time" and "offheap" parts (although you folks
> > >>>>>>>>>>> usually do not like what I think about the offheap efforts :) ).
> > >>>>>>>>>>> Would like to add the following topics:
> > >>>>>>>>>>>
> > >>>>>>>>>>> - "Timestamp Resolution". Or making space for more bits in the
> > >>>>>>>>>>> timestamps (happy to cover that, unless it's part of the "Time" topic)
> > >>>>>>>>>>>
> > >>>>>>>>>>> - "Replication". We found that replication cannot keep up with high
> > >>>>>>>>>>> write loads, due to the fact that replication is strictly single threaded
> > >>>>>>>>>>> per regionserver (even though we have multiple region servers on the sink
> > >>>>>>>>>>> side)
> > >>>>>>>>>>>
> > >>>>>>>>>>> - "Spark integration" (Ted Malaska?)
> > >>>>>>>>>>>
> > >>>>>>>>>>> OK... Out now to make a "bullshit hat".
> > >>>>>>>>>>>
> > >>>>>>>>>>> -- Lars
> > >>>>>>>>>>>
> > >>>>>>>>>>> ________________________________
> > >>>>>>>>>>> From: Sean Busbey <bus...@cloudera.com>
> > >>>>>>>>>>> To: dev <dev@hbase.apache.org>
> > >>>>>>>>>>> Sent: Tuesday, July 14, 2015 7:11 PM
> > >>>>>>>>>>> Subject: Re: DISCUSSION: lets do a developer workshop on near-term work
> > >>>>>>>>>>>
> > >>>>>>>>>>> I'm planning to be in the Bay area the week of the 24th of August.
> > >>>>>>>>>>>
> > >>>>>>>>>>> --
> > >>>>>>>>>>> Sean
> > >>>>>>>>>>>
> > >>>>>>>>>>>> On Jul 14, 2015 7:53 PM, "Andrew Purtell" <apurt...@apache.org> wrote:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> I can be up in your area in August.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>> On Tue, Jul 14, 2015 at 5:31 PM, Stack <st...@duboce.net> wrote:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> On Tue, Jul 14, 2015 at 3:39 PM, Enis Söztutar <enis....@gmail.com>
> > >>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Sounds good. It has been a while we did the talk-aton.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> I'll be off starting 25 of July, so I prefer something next week if
> > >>>>>>>>>>>>>> possible.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> You ever coming back? If so, when? I'm back on 10th of August (Mikhail on
> > >>>>>>>>>>>>> the 20th).
> > >>>>>>>>>>>>> St.Ack
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Enis
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> On Tue, Jul 14, 2015 at 3:18 PM, Stack <st...@duboce.net> wrote:
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Matteo and I were thinking it time devs got together for a pow-wow.
> > >>>>>>>>>>>>>>> There is a bunch of stuff in flight at the moment (see below list) and it
> > >>>>>>>>>>>>>>> would be good to meet and whiteboard, surface good ideas that have gone
> > >>>>>>>>>>>>>>> dormant in JIRA, or revisit designs/proposals out in JIRA-attached google
> > >>>>>>>>>>>>>>> doc that need socializing.
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> You can only come if you are wearing your bullshit hat.
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Topics we'd go over could include:
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> + Our filesystem layout will not work if 1M regions (Matteo/Stack)
> > >>>>>>>>>>>>>>> + Current state of the offheaping of read path and alternate KeyValue
> > >>>>>>>>>>>>>>>   implementation (Anoop/Ram)
> > >>>>>>>>>>>>>>> + Append rejigger (Elliott)
> > >>>>>>>>>>>>>>> + A Pv2-based Assign (Matteo/Steven)
> > >>>>>>>>>>>>>>> + Splitting meta/1M regions
> > >>>>>>>>>>>>>>> + The revived Backup (Vladimir)
> > >>>>>>>>>>>>>>> + Time (Enis)
> > >>>>>>>>>>>>>>> + The overloaded SequenceId (Stack)
> > >>>>>>>>>>>>>>> + Upstreaming IT testing (Dima/Sean)
> > >>>>>>>>>>>>>>> + hbase-2.0.0
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> I put names by folks I know could talk to the topic. If you want to take
> > >>>>>>>>>>>>>>> over a topic or put your name by one, just say. Suggest that discussion
> > >>>>>>>>>>>>>>> lead off with a 5-10 minute on current state of
> > >>>>>>>>>>>>>>> thought/design/implementation.
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> What do others think?
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> What date would suit folks?
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Anyone want to host?
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Thanks,
> > >>>>>>>>>>>>>>> Matteo and St.Ack
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> --
> > >>>>>>>>>>>> Best regards,
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> - Andy
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> > >>>>>>>>>>>> (via Tom White)