Re: DISCUSSION: lets do a developer workshop on near-term work

Andrew Purtell Mon, 20 Jul 2015 09:23:33 -0700

Returning all cells to a client is the other extreme and I don't think that 
would be a great test either.


Personally I think for testing big change sets well we need a range of 
workloads. The extreme cases (filter all, filter none) are useful data points 
but not great if measured in isolation. I think YCSB is a reasonable option for 
that these days now that it is maintained. It comes with 6 or so canned 
workloads. Not a bad start.


> On Jul 20, 2015, at 6:01 AM, lars hofhansl <[email protected]> wrote:
> 
> Personally, I think that is a reasonable way to test the internal friction of 
> the server. I've been doing a lot of tests like that and found a lot of 
> inefficiencies in HBase that way. For cases where we return all Cells back to 
> a (remote) client, improving the server by 10 or 20% would mostly go unnoticed.
> 
> Analytics (aggregates via Phoenix or direct coprocessors) will be more 
> important going forward, so improving that part is important.
> I completely agree that end-to-end (by which I mean data shipped to the 
> client) testing is important, it's just I'd expect us to work on different 
> areas (put Protobufs on a diet, have a streaming protocol, etc).
> -- Lars
> 
>     From: Andrew Purtell <[email protected]>
> To: "[email protected]" <[email protected]> 
> Sent: Saturday, July 18, 2015 11:24 AM
> Subject: Re: DISCUSSION: lets do a developer workshop on near-term work
> 
> That's not a realistic or useful test scenario, unless the goal is to 
> accelerate queries where all cells are filtered at the server. 
> 
> 
> 
> 
> 
>> On Jul 18, 2015, at 11:02 AM, Anoop John <[email protected]> wrote:
>> 
>> No Andy. 11425 has a doc attached to it. At the end of it, we have added
>> perf numbers from cluster testing.  This was done using PE get and scan
>> tests with all cells filtered at the server (so that n/w bandwidth
>> constraints do not come into play).
>> 
>> -Anoop-
>> 
>> On Sat, Jul 18, 2015 at 9:30 PM, Andrew Purtell <[email protected]>
>> wrote:
>> 
>>> We have some microbenchmarks, not evidence of differences seen from a
>>> client application. I'm not saying that microbenchmarks are not totally
>>> necessary and a great start - they are - but that they don't measure an end
>>> goal. Furthermore unless I've missed one somewhere we don't have a JIRA or
>>> design doc that states a clear end goal metric like the strawman I threw
>>> together in my previous mail. A measurable system level goal and some data
>>> from full cluster testing would go a lot further toward letting all of us
>>> evaluate the potential and payoff of the work. In the meantime we should
>>> probably be assembling these changes on a branch instead of in trunk, for
>>> as long as the goal is not clearly defined and the payoff and potential for
>>> perf regressions is untested and unknown.
>>> 
>>> 
>>>> On Jul 18, 2015, at 8:05 AM, Anoop John <[email protected]> wrote:
>>>> 
>>>> Thanks Andy and Lars.  The parent jira has doc attached which contains
>>> some
>>>> perf gain numbers..  We will be doing more tests in next 2 weeks (before
>>>> end of this month) and will publish them.  Yes it will be great if it is
>>>> more IST friendly time :-)
>>>> 
>>>> -Anoop-
>>>> 
>>>> On Fri, Jul 17, 2015 at 9:44 PM, Andrew Purtell <
>>> [email protected]>
>>>> wrote:
>>>> 
>>>>>> I can represent your side Ram (and Anoop). I've been known to always argue
>>>>> both sides of a discussion and to never take sides easily (drives some
>>> folks
>>>>> crazy).
>>>>> 
>>>>> I can vouch for this (smile)
>>>>> 
>>>>> I also can offer support for off heaping there. At the same time we do
>>>>> have a gap where we can't point to a timeline of improvements (yet,
>>> anyway)
>>>>> with benchmarks showing gains where your goals need them. For example,
>>>>> stock HBase in one JVM can address max N GB for response time
>>> distribution
>>>>> D; dev version of HBase in off heap branch can address max N' GB for
>>>>> distribution D', where N' > N and D > D' (distribution D' statistically
>>>>> shows better/lower response times).
>>>>> 
>>>>> 
>>>>> 
>>>>>> On Jul 17, 2015, at 6:56 AM, lars hofhansl <[email protected]> wrote:
>>>>>> 
>>>>>> I'm in favor of anything that improves performance (and preferably
>>>>> doesn't set us back into a world that's worse than C due to the lack of
>>>>> pointers in Java). Never said "I don't like it", it's just that I'm
>>> perhaps
>>>>> asking for more numbers and justification in weighing the pros and cons.
>>>>>> I can represent your side Ram (and Anoop). I've been known to always argue
>>>>> both sides of a discussion and to never take sides easily (drives some
>>> folks
>>>>> crazy). And Stack's there too, he'll yell at me where needed :)
>>>>>> 
>>>>>> Perhaps we can do it a bit later in the evening so there is a fighting
>>>>> chance that folks on IST can participate. (I know that some of our folks
>>> on
>>>>> IST would love to participate in the backup discussion).
>>>>>> 
>>>>>> Like Enis, I'm also happy to host. We're in Downtown SF. I'd just need
>>>>> an approx. number of folks.
>>>>>> 
>>>>>> -- Lars
>>>>>> 
>>>>>>     From: ramkrishna vasudevan <[email protected]>
>>>>>> To: "[email protected]" <[email protected]>; lars hofhansl <
>>>>> [email protected]>
>>>>>> Sent: Wednesday, July 15, 2015 10:10 AM
>>>>>> Subject: Re: DISCUSSION: lets do a developer workshop on near-term work
>>>>>> 
>>>>>> Hi
>>>>>> What time will it be on August 26th?
>>>>>> @Lars
>>>>>> Ya. I know that you are not generally in favour of this offheaping
>>>>> stuff.  Maybe if we (from India) can attend this meeting remotely your
>>>>> thoughts can be discussed and also the current state of this work.
>>>>>> Regards
>>>>>> Ram
>>>>>> 
>>>>>> 
>>>>>> On Wed, Jul 15, 2015 at 9:28 PM, lars hofhansl <[email protected]>
>>> wrote:
>>>>>> 
>>>>>> Works for me. I'll be back in the Bay Area the week of August 9th.
>>>>>> We have done a _lot_ of work on backups as well - ours are more
>>>>> complicated as we wanted fast per-tenant restores, so data is "grouped"
>>> by
>>>>> tenant. Would like to sync up on that (hopefully some of the folks who
>>>>> wrote most of the code will be in town, I'll check).
>>>>>> 
>>>>>> Also interested in the "Time" and "offheap" parts (although you folks
>>>>> usually do not like what I think about the offheap efforts :) ).
>>>>>> Would like to add the following topics:
>>>>>> 
>>>>>> 
>>>>>> - "Timestamp Resolution". Or making space for more bits in the
>>>>> timestamps (happy to cover that, unless it's part of the "Time" topic)
>>>>>> 
>>>>>> 
>>>>>> - "Replication". We found that replication cannot keep up with high
>>>>> write loads, due to the fact that replication is strictly single threaded
>>>>> per regionserver (even though we have multiple region servers on the
>>> sink
>>>>> side)
>>>>>> 
>>>>>> 
>>>>>> - "Spark integration" (Ted Malaska?)
>>>>>> 
>>>>>> 
>>>>>> OK... Out now to make a "bullshit hat".
>>>>>> 
>>>>>> -- Lars
>>>>>> 
>>>>>> ________________________________
>>>>>> From: Sean Busbey <[email protected]>
>>>>>> To: dev <[email protected]>
>>>>>> Sent: Tuesday, July 14, 2015 7:11 PM
>>>>>> Subject: Re: DISCUSSION: lets do a developer workshop on near-term work
>>>>>> 
>>>>>> 
>>>>>> I'm planning to be in the Bay area the week of the 24th of August.
>>>>>> 
>>>>>> --
>>>>>> Sean
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> On Jul 14, 2015 7:53 PM, "Andrew Purtell" <[email protected]>
>>> wrote:
>>>>>>> 
>>>>>>> I can be up in your area in August.
>>>>>>> 
>>>>>>>>> On Tue, Jul 14, 2015 at 5:31 PM, Stack <[email protected]> wrote:
>>>>>>>>> 
>>>>>>>>> On Tue, Jul 14, 2015 at 3:39 PM, Enis Söztutar <[email protected]>
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> Sounds good. It has been a while since we did the talk-a-thon.
>>>>>>>>> 
>>>>>>>>> I'll be off starting 25 of July, so I prefer something next week if
>>>>>>>>> possible.
>>>>>>>>> 
>>>>>>>>> You ever coming back? If so, when? I'm back on 10th of August
>>> (Mikhail
>>>>>>> on
>>>>>>>> the 20th).
>>>>>>>> St.Ack
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> Enis
>>>>>>>>> 
>>>>>>>>>> On Tue, Jul 14, 2015 at 3:18 PM, Stack <[email protected]> wrote:
>>>>>>>>>> 
>>>>>>>>>> Matteo and I were thinking it time devs got together for a pow-wow.
>>>>>>>> There
>>>>>>>>>> is a bunch of stuff in flight at the moment (see below list) and it
>>>>>>>> would
>>>>>>>>>> be good to meet and whiteboard, surface good ideas that have gone
>>>>>>>>> dormant
>>>>>>>>>> in JIRA, or revisit designs/proposals out in JIRA-attached google
>>> doc
>>>>>>>>> that
>>>>>>>>>> need socializing.
>>>>>>>>>> 
>>>>>>>>>> You can only come if you are wearing your bullshit hat.
>>>>>>>>>> 
>>>>>>>>>> Topics we'd go over could include:
>>>>>>>>>> 
>>>>>>>>>> + Our filesystem layout will not work if 1M regions (Matteo/Stack)
>>>>>>>>>> + Current state of the offheaping of read path and alternate
>>> KeyValue
>>>>>>>>>> implementation (Anoop/Ram)
>>>>>>>>>> + Append rejigger (Elliott)
>>>>>>>>>> + A Pv2-based Assign (Matteo/Steven)
>>>>>>>>>> + Splitting meta/1M regions
>>>>>>>>>> + The revived Backup (Vladimir)
>>>>>>>>>> + Time (Enis)
>>>>>>>>>> + The overloaded SequenceId (Stack)
>>>>>>>>>> + Upstreaming IT testing (Dima/Sean)
>>>>>>>>>> + hbase-2.0.0
>>>>>>>>>> 
>>>>>>>>>> I put names by folks I know could talk to the topic. If you want to
>>>>>>>> take
>>>>>>>>>> over a topic or put your name by one, just say.  Suggest that
>>>>>>>> discussion
>>>>>>>>>> lead off with a 5-10 minute overview of the current state of
>>>>>>>>>> thought/design/implementation.
>>>>>>>>>> 
>>>>>>>>>> What do others think?
>>>>>>>>>> 
>>>>>>>>>> What date would suit folks?
>>>>>>>>>> 
>>>>>>>>>> Anyone want to host?
>>>>>>>>>> 
>>>>>>>>>> Thanks,
>>>>>>>>>> Matteo and St.Ack
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>>> Best regards,
>>>>>>> 
>>>>>>>   - Andy
>>>>>>> 
>>>>>>> Problems worthy of attack prove their worth by hitting back. - Piet
>>> Hein
>>>>>>> (via Tom White)
>
