I posted this meetup notice: http://www.meetup.com/hackathon/events/224589819/

St.Ack
On Wed, Aug 12, 2015 at 1:34 AM, Enis Söztutar <[email protected]> wrote:

> Agreed, too many fat topics, but all important. I guess we can spend first
> 10-20 mins on the agenda based on who is in the room and come up with a
> shorter list and go from there.
>
> Enis
>
> On Tue, Aug 11, 2015 at 9:23 PM, Stack <[email protected]> wrote:
>
> > On Mon, Jul 20, 2015 at 1:04 PM, Stephen Jiang <[email protected]>
> > wrote:
> >
> > > [Let us move back to the main topic - a meeting to talk about the next
> > > direction on HBASE development]
> > >
> > > Are we firm on the *August 26th* meeting date?
> > >
> > > Given the long list of topics from St.Ack, even a one day meeting might
> > > not cover all of them (in depth). We need to either trim the topic list or
> > > limit the time to discuss a single topic (30 min for one topic enough?).
> > >
> >
> > Thanks for bringing us back to topic Stephen.
> >
> > Yes, lets do 26th. Speak up if this does not suit. I will file a meetup
> > page in an hour or so. Where should we do it? Enis offered his nice place.
> > Could try and get space at ours too... in Palo Alto (less 'deep south', a
> > little easier for the SFers).
> >
> > As to too many topics, in my experience, a bunch of smelly engineers all in
> > a room starts to fall apart after a couple of hours especially when ranging
> > discussion. Suggest we cut the time-per-topic and list of topics so can do
> > in an afternoon. If some topics are too fat, can do break out or put-off to
> > another day and smaller, interested group.
> >
> > St.Ack
> >
> > > Thanks
> > > Stephen
> > >
> > > On Mon, Jul 20, 2015 at 9:50 AM, Anoop John <[email protected]> wrote:
> > >
> > >> We will be doing some more large data tests in coming week Andy.. Will
> > >> report back more. Also will do a write up , in what all ways the work
> > >> might help us. As Sean said, we will continue in another thread if any
> > >> thing further..
Will soon write back on the test result. Thanks. > > >> > > >> -Anoop- > > >> > > >> On Mon, Jul 20, 2015 at 9:59 PM, Andrew Purtell < > > [email protected] > > >> > > > >> wrote: > > >> > > >> > Cool, thanks. > > >> > > > >> > Is a 20% latency reduction the most we can expect or do you think > > there > > >> is > > >> > room for more improvement? Just curious. > > >> > > > >> > Is latency reduction the only goal? Anything here about supporting > > >> larger > > >> > heaps? Is there something we can measure in that regard? > > >> > > > >> > Hope you see my point and there's enough here to prime a goals and > > >> metrics > > >> > discussion at the pow wow or on the relevant JIRAs. > > >> > > > >> > > On Jul 20, 2015, at 4:43 AM, ramkrishna vasudevan < > > >> > [email protected]> wrote: > > >> > > > > >> > > Hi Andy > > >> > > > > >> > > Based on our POCs done, we expect around 20% improvement in > latency. > > >> For > > >> > > scans it will be little lesser than 20%. > > >> > > > > >> > > Regards > > >> > > Ram > > >> > > > > >> > > > > >> > > On Sun, Jul 19, 2015 at 10:20 AM, Andrew Purtell < > > >> > [email protected]> > > >> > > wrote: > > >> > > > > >> > >> Hi Ram, > > >> > >> > > >> > >> Do you have any targets for what you are measuring? What are the > > >> goals > > >> > you > > >> > >> guys are working toward with the off heaping changes? > > >> > >> > > >> > >> > > >> > >>>> On Jul 18, 2015, at 9:16 PM, ramkrishna vasudevan < > > >> > >>> [email protected]> wrote: > > >> > >>> > > >> > >>> Thanks Vladimir. > > >> > >>> Yeah, the reports that were attached specifically captured the > > >> 95/99th > > >> > >>> percentile. > > >> > >>> The reason for checking the server side perf was to specifically > > see > > >> > the > > >> > >>> improvement in the server side and also the client was sending > > large > > >> > >>> results in multiple threads. So wanted to avoid the n/w > > >> interference. 
I > > >> > >>> think it was a general practice that we were following. > > >> > >>> We Wil do some more tests and get some latest readings with > bigger > > >> data > > >> > >>> sets. > > >> > >>> Sent from mobile. > > >> > >>>> On Jul 19, 2015 1:05 AM, "Andrew Purtell" < > > >> [email protected]> > > >> > >> wrote: > > >> > >>>> > > >> > >>>> +1 > > >> > >>>> > > >> > >>>> Yeah, something like that, with aspirational targets for > > >> improvement > > >> > >> from > > >> > >>>> current releases. Then what to measure, the tests to run, and > > >> criteria > > >> > >> for > > >> > >>>> evaluation are clear and organized and we're able to better > > assess > > >> how > > >> > >> the > > >> > >>>> work in progress is meeting its goals (or not) > > >> > >>>> > > >> > >>>> > > >> > >>>> > > >> > >>>> On Jul 18, 2015, at 12:05 PM, Vladimir Rodionov < > > >> > [email protected] > > >> > >>> > > >> > >>>> wrote: > > >> > >>>> > > >> > >>>>>>> Umbrella jira to make sure we can have blocks cached in > > offheap > > >> > >> backed > > >> > >>>>> cache. In the entire read path, we can refer to this offheap > > >> buffer > > >> > and > > >> > >>>>> avoid onheap copying. > > >> > >>>>> > > >> > >>>>> I think, on a read path, the most important improvement we > could > > >> > >> imagine > > >> > >>>> is > > >> > >>>>> elimination or reducing of object creations (KVs, iterators > > etc). > > >> > >>>>> object reuse, byte buffers reuse or offheap buffers reuse, API > > >> change > > >> > >>>> etc. > > >> > >>>>> If this is a part of this JIRA, then I would easily define a > > goal: > > >> > >>>>> improving 95/99% latency of a read operations. 
Not > performance, > > >> but > > >> > >>>> latency > > >> > >>>>> matters > > >> > >>>>> > > >> > >>>>> -Vlad > > >> > >>>>> > > >> > >>>>> > > >> > >>>>> > > >> > >>>>> On Sat, Jul 18, 2015 at 11:24 AM, Andrew Purtell < > > >> > >>>> [email protected]> > > >> > >>>>> wrote: > > >> > >>>>> > > >> > >>>>>> That's not a realistic or useful test scenario, unless the > goal > > >> is > > >> > to > > >> > >>>>>> accelerate queries where all cells are filtered at the > server. > > >> > >>>>>> > > >> > >>>>>> > > >> > >>>>>> > > >> > >>>>>>> On Jul 18, 2015, at 11:02 AM, Anoop John < > > [email protected] > > >> > > > >> > >>>> wrote: > > >> > >>>>>>> > > >> > >>>>>>> No Andy. 11425 having doc attached to it. At the end of it, > we > > >> have > > >> > >>>> added > > >> > >>>>>>> perf numbers in a cluster testing. This was done using PE > get > > >> and > > >> > >> scan > > >> > >>>>>>> tests with filtering all cells at server (to not consider > n/w > > >> > >> bandwidth > > >> > >>>>>>> constraints) > > >> > >>>>>>> > > >> > >>>>>>> -Anoop- > > >> > >>>>>>> > > >> > >>>>>>> On Sat, Jul 18, 2015 at 9:30 PM, Andrew Purtell < > > >> > >>>>>> [email protected]> > > >> > >>>>>>> wrote: > > >> > >>>>>>> > > >> > >>>>>>>> We have some microbenchmarks, not evidence of differences > > seen > > >> > from > > >> > >> a > > >> > >>>>>>>> client application. I'm not saying that microbenchmarks are > > not > > >> > >>>> totally > > >> > >>>>>>>> necessary and a great start - they are - but that they > don't > > >> > measure > > >> > >>>> an > > >> > >>>>>> end > > >> > >>>>>>>> goal. Furthermore unless I've missed one somewhere we don't > > >> have a > > >> > >>>> JIRA > > >> > >>>>>> or > > >> > >>>>>>>> design doc that states a clear end goal metric like the > > >> strawman I > > >> > >>>> threw > > >> > >>>>>>>> together in my previous mail. 
A measurable system level > goal > > >> and > > >> > >> some > > >> > >>>>>> data > > >> > >>>>>>>> from full cluster testing would go a lot further toward > > letting > > >> > all > > >> > >> of > > >> > >>>>>> us > > >> > >>>>>>>> evaluate the potential and payoff of the work. In the > > meantime > > >> we > > >> > >>>> should > > >> > >>>>>>>> probably be assembling these changes on a branch instead of > > in > > >> > >> trunk, > > >> > >>>>>> for > > >> > >>>>>>>> as long as the goal is not clearly defined and the payoff > and > > >> > >>>> potential > > >> > >>>>>> for > > >> > >>>>>>>> perf regressions is untested and unknown. > > >> > >>>>>>>> > > >> > >>>>>>>> > > >> > >>>>>>>>> On Jul 18, 2015, at 8:05 AM, Anoop John < > > >> [email protected]> > > >> > >>>> wrote: > > >> > >>>>>>>>> > > >> > >>>>>>>>> Thanks Andy and Lars. The parent jira has doc attached > > which > > >> > >>>> contains > > >> > >>>>>>>> some > > >> > >>>>>>>>> perf gain numbers.. We will be doing more tests in next 2 > > >> weeks > > >> > >>>>>> (before > > >> > >>>>>>>>> end of this month) and will publish them. Yes it will be > > >> great > > >> > if > > >> > >>>> it > > >> > >>>>>> is > > >> > >>>>>>>>> more IST friendly time :-) > > >> > >>>>>>>>> > > >> > >>>>>>>>> -Anoop- > > >> > >>>>>>>>> > > >> > >>>>>>>>> On Fri, Jul 17, 2015 at 9:44 PM, Andrew Purtell < > > >> > >>>>>>>> [email protected]> > > >> > >>>>>>>>> wrote: > > >> > >>>>>>>>> > > >> > >>>>>>>>>>> I can represent your side Ram (and Anoop). I've been > known > > >> > always > > >> > >>>>>> argue > > >> > >>>>>>>>>> both side of a discussion and to never take sides easily > > >> (drives > > >> > >>>> some > > >> > >>>>>>>> folks > > >> > >>>>>>>>>> crazy). > > >> > >>>>>>>>>> > > >> > >>>>>>>>>> I can vouch for this (smile) > > >> > >>>>>>>>>> > > >> > >>>>>>>>>> I also can offer support for off heaping there. 
At the > same > > >> time > > >> > >> we > > >> > >>>> do > > >> > >>>>>>>>>> have a gap where we can't point to a timeline of > > improvements > > >> > >> (yet, > > >> > >>>>>>>> anyway) > > >> > >>>>>>>>>> with benchmarks showing gains where your goals need them. > > For > > >> > >>>> example, > > >> > >>>>>>>>>> stock HBase in one JVM can address max N GB for response > > time > > >> > >>>>>>>> distribution > > >> > >>>>>>>>>> D; dev version of HBase in off heap branch can address > max > > >> N' GB > > >> > >> for > > >> > >>>>>>>>>> distribution D', where N' > N and D > D' (distribution D' > > >> > >>>>>> statistically > > >> > >>>>>>>>>> shows better/lower response times). > > >> > >>>>>>>>>> > > >> > >>>>>>>>>> > > >> > >>>>>>>>>> > > >> > >>>>>>>>>>> On Jul 17, 2015, at 6:56 AM, lars hofhansl < > > >> [email protected]> > > >> > >>>> wrote: > > >> > >>>>>>>>>>> > > >> > >>>>>>>>>>> I'm in favor of anything that improves performance (and > > >> > >> preferably > > >> > >>>>>>>>>> doesn't set us back into a world that's worse than C due > to > > >> the > > >> > >> lack > > >> > >>>>>> of > > >> > >>>>>>>>>> pointers in Java).Never said "I don't like it", it's just > > >> that > > >> > I'm > > >> > >>>>>>>> perhaps > > >> > >>>>>>>>>> asking for more numbers and justification in weighing the > > >> pros > > >> > and > > >> > >>>>>> cons. > > >> > >>>>>>>>>>> I can represent your side Ram (and Anoop). I've been > known > > >> > always > > >> > >>>>>> argue > > >> > >>>>>>>>>> both side of a discussion and to never take sides easily > > >> (drives > > >> > >>>> some > > >> > >>>>>>>> folks > > >> > >>>>>>>>>> crazy). And Stack's there too, he yell at me where needed > > :) > > >> > >>>>>>>>>>> > > >> > >>>>>>>>>>> Perhaps we can do it a bit later in the evening so there > > is > > >> a > > >> > >>>>>> fighting > > >> > >>>>>>>>>> chance that folks on IST can participate. 
I know that > some > > of > > >> > our > > >> > >>>>>> folks > > >> > >>>>>>>> on > > >> > >>>>>>>>>> IST would love to participate in the backup discussion). > > >> > >>>>>>>>>>> > > >> > >>>>>>>>>>> Like Enis, I'm also happy to host. We're in Downtown SF. > > I'd > > >> > just > > >> > >>>>>> need > > >> > >>>>>>>>>> an approx. number of folks. > > >> > >>>>>>>>>>> > > >> > >>>>>>>>>>> -- Lars > > >> > >>>>>>>>>>> > > >> > >>>>>>>>>>> From: ramkrishna vasudevan < > > >> [email protected]> > > >> > >>>>>>>>>>> To: "[email protected]" <[email protected]>; lars > > >> > >> hofhansl < > > >> > >>>>>>>>>> [email protected]> > > >> > >>>>>>>>>>> Sent: Wednesday, July 15, 2015 10:10 AM > > >> > >>>>>>>>>>> Subject: Re: DISCUSSION: lets do a developer workshop on > > >> > >> near-term > > >> > >>>>>> work > > >> > >>>>>>>>>>> > > >> > >>>>>>>>>>> Hi > > >> > >>>>>>>>>>> What time will it be on August 26th? > > >> > >>>>>>>>>>> @LarsYa. I know that you are not generally in favour of > > this > > >> > >>>>>> offheaping > > >> > >>>>>>>>>> stuff. May be if we (from India) can attend this meeting > > >> > remotely > > >> > >>>>>> your > > >> > >>>>>>>>>> thoughts can be discussed and also the current state of > > this > > >> > work. > > >> > >>>>>>>>>>> RegardsRam > > >> > >>>>>>>>>>> > > >> > >>>>>>>>>>> > > >> > >>>>>>>>>>> On Wed, Jul 15, 2015 at 9:28 PM, lars hofhansl < > > >> > [email protected] > > >> > >>> > > >> > >>>>>>>> wrote: > > >> > >>>>>>>>>>> > > >> > >>>>>>>>>>> Works for me. I'll be back in the Bay Area the week of > > >> August > > >> > >> 9th. > > >> > >>>>>>>>>>> We have done a _lot_ of work on backups as well - ours > are > > >> more > > >> > >>>>>>>>>> complicated as we wanted fast per-tenant restores, so > data > > is > > >> > >>>>>> "grouped" > > >> > >>>>>>>> by > > >> > >>>>>>>>>> tenant. Would like to sync up on that (hopefully some of > > the > > >> > folks > > >> > >>>> who > > >> > >>>>>>>>>> wrote most of the code will be in town, I'll check). 
> > >> > >>>>>>>>>>> > > >> > >>>>>>>>>>> Also interested in the "Time" and "offheap" parts > > (although > > >> you > > >> > >>>> folks > > >> > >>>>>>>>>> usually do not like what I think about the offheap > efforts > > >> :) ). > > >> > >>>>>>>>>>> Would like to add the following topics: > > >> > >>>>>>>>>>> > > >> > >>>>>>>>>>> > > >> > >>>>>>>>>>> - "Timestamp Resolution". Or making space for more bits > in > > >> the > > >> > >>>>>>>>>> timestamps (happy to cover that, unless it's part of the > > >> "Time" > > >> > >>>> topic) > > >> > >>>>>>>>>>> > > >> > >>>>>>>>>>> > > >> > >>>>>>>>>>> - "Replication". We found that replication cannot keep > up > > >> with > > >> > >> high > > >> > >>>>>>>>>> write loads, due to the fact that replicated is strictly > > >> single > > >> > >>>>>> threaded > > >> > >>>>>>>>>> per regionserver (even though we have multiple region > > >> servers on > > >> > >> the > > >> > >>>>>>>> sink > > >> > >>>>>>>>>> side) > > >> > >>>>>>>>>>> > > >> > >>>>>>>>>>> > > >> > >>>>>>>>>>> - "Spark integration" (Ted Malaska?) > > >> > >>>>>>>>>>> > > >> > >>>>>>>>>>> > > >> > >>>>>>>>>>> OK... Out now to make a "bullshit hat". > > >> > >>>>>>>>>>> > > >> > >>>>>>>>>>> -- Lars > > >> > >>>>>>>>>>> > > >> > >>>>>>>>>>> ________________________________ > > >> > >>>>>>>>>>> From: Sean Busbey <[email protected]> > > >> > >>>>>>>>>>> To: dev <[email protected]> > > >> > >>>>>>>>>>> Sent: Tuesday, July 14, 2015 7:11 PM > > >> > >>>>>>>>>>> Subject: Re: DISCUSSION: lets do a developer workshop on > > >> > >> near-term > > >> > >>>>>> work > > >> > >>>>>>>>>>> > > >> > >>>>>>>>>>> > > >> > >>>>>>>>>>> I'm planning to be in the Bay area the week of the 24th > of > > >> > >> August. 
> > >> > >>>>>>>>>>> > > >> > >>>>>>>>>>> -- > > >> > >>>>>>>>>>> Sean > > >> > >>>>>>>>>>> > > >> > >>>>>>>>>>> > > >> > >>>>>>>>>>> > > >> > >>>>>>>>>>>> On Jul 14, 2015 7:53 PM, "Andrew Purtell" < > > >> > [email protected]> > > >> > >>>>>>>> wrote: > > >> > >>>>>>>>>>>> > > >> > >>>>>>>>>>>> I can be up in your area in August. > > >> > >>>>>>>>>>>> > > >> > >>>>>>>>>>>>>> On Tue, Jul 14, 2015 at 5:31 PM, Stack < > > [email protected] > > >> > > > >> > >>>> wrote: > > >> > >>>>>>>>>>>>>> > > >> > >>>>>>>>>>>>>> On Tue, Jul 14, 2015 at 3:39 PM, Enis Söztutar < > > >> > >>>>>> [email protected]> > > >> > >>>>>>>>>>>>> wrote: > > >> > >>>>>>>>>>>>> > > >> > >>>>>>>>>>>>>> Sounds good. It has been a while we did the > talk-aton. > > >> > >>>>>>>>>>>>>> > > >> > >>>>>>>>>>>>>> I'll be off starting 25 of July, so I prefer > something > > >> next > > >> > >> week > > >> > >>>>>> if > > >> > >>>>>>>>>>>>>> possible. > > >> > >>>>>>>>>>>>>> > > >> > >>>>>>>>>>>>>> You ever coming back? If so, when? I'm back on 10th > of > > >> > August > > >> > >>>>>>>> (Mikhail > > >> > >>>>>>>>>>>> on > > >> > >>>>>>>>>>>>> the 20th). > > >> > >>>>>>>>>>>>> St.Ack > > >> > >>>>>>>>>>>>> > > >> > >>>>>>>>>>>>> > > >> > >>>>>>>>>>>>> > > >> > >>>>>>>>>>>>> > > >> > >>>>>>>>>>>>>> Enis > > >> > >>>>>>>>>>>>>> > > >> > >>>>>>>>>>>>>>> On Tue, Jul 14, 2015 at 3:18 PM, Stack < > > >> [email protected]> > > >> > >>>> wrote: > > >> > >>>>>>>>>>>>>>> > > >> > >>>>>>>>>>>>>>> Matteo and I were thinking it time devs got together > > >> for a > > >> > >>>>>> pow-wow. 
> > >> > >>>>>>>>>>>>> There > > >> > >>>>>>>>>>>>>>> is a bunch of stuff in flight at the moment (see > below > > >> > list) > > >> > >>>> and > > >> > >>>>>> it > > >> > >>>>>>>>>>>>> would > > >> > >>>>>>>>>>>>>>> be good to meet and whiteboard, surface goodo ideas > > that > > >> > have > > >> > >>>>>> gone > > >> > >>>>>>>>>>>>>> dormant > > >> > >>>>>>>>>>>>>>> in JIRA, or revisit designs/proposals out in > > >> JIRA-attached > > >> > >>>> google > > >> > >>>>>>>> doc > > >> > >>>>>>>>>>>>>> that > > >> > >>>>>>>>>>>>>>> need socializing. > > >> > >>>>>>>>>>>>>>> > > >> > >>>>>>>>>>>>>>> You can only come if you are wearing your bullshit > > hat. > > >> > >>>>>>>>>>>>>>> > > >> > >>>>>>>>>>>>>>> Topics we'd go over could include: > > >> > >>>>>>>>>>>>>>> > > >> > >>>>>>>>>>>>>>> + Our filesystem layout will not work if 1M regions > > >> > >>>>>> (Matteo/Stack) > > >> > >>>>>>>>>>>>>>> + Current state of the offheaping of read path and > > >> > alternate > > >> > >>>>>>>> KeyValue > > >> > >>>>>>>>>>>>>>> implementation (Anoop/Ram) > > >> > >>>>>>>>>>>>>>> + Append rejigger (Elliott) > > >> > >>>>>>>>>>>>>>> + A Pv2-based Assign (Matteo/Steven) > > >> > >>>>>>>>>>>>>>> + Splitting meta/1M regions > > >> > >>>>>>>>>>>>>>> + The revived Backup (Vladimir) > > >> > >>>>>>>>>>>>>>> + Time (Enis) > > >> > >>>>>>>>>>>>>>> + The overloaded SequenceId (Stack) > > >> > >>>>>>>>>>>>>>> + Upstreaming IT testing (Dima/Sean) > > >> > >>>>>>>>>>>>>>> + hbase-2.0.0 > > >> > >>>>>>>>>>>>>>> > > >> > >>>>>>>>>>>>>>> I put names by folks I know could talk to the topic. > > If > > >> you > > >> > >>>> want > > >> > >>>>>> to > > >> > >>>>>>>>>>>>> take > > >> > >>>>>>>>>>>>>>> over a topic or put your name by one, just say. > > Suggest > > >> > that > > >> > >>>>>>>>>>>>> discussion > > >> > >>>>>>>>>>>>>>> lead off with a 5-10minute on current state of > > >> > >>>>>>>>>>>>>>> thought/design/implementation. > > >> > >>>>>>>>>>>>>>> > > >> > >>>>>>>>>>>>>>> What do others think? 
> > >> > >>>>>>>>>>>>>>> > > >> > >>>>>>>>>>>>>>> What date would suit folks? > > >> > >>>>>>>>>>>>>>> > > >> > >>>>>>>>>>>>>>> Anyone want to host? > > >> > >>>>>>>>>>>>>>> > > >> > >>>>>>>>>>>>>>> Thanks, > > >> > >>>>>>>>>>>>>>> Matteo and St.Ack > > >> > >>>>>>>>>>>> > > >> > >>>>>>>>>>>> > > >> > >>>>>>>>>>>> > > >> > >>>>>>>>>>>> -- > > >> > >>>>>>>>>>>> Best regards, > > >> > >>>>>>>>>>>> > > >> > >>>>>>>>>>>> - Andy > > >> > >>>>>>>>>>>> > > >> > >>>>>>>>>>>> Problems worthy of attack prove their worth by hitting > > >> back. - > > >> > >>>> Piet > > >> > >>>>>>>> Hein > > >> > >>>>>>>>>>>> (via Tom White) > > >> > >> > > >> > > > >> > > > > > > > > >
