I was thinking in terms of a more general test where read-modify-write operations are used. It is also helpful to have some tests of simple overwrite. If a percentage of the ops are reads, and if data can be determined to be prima facie valid or not, then that validation can be done during the map phase of your program.
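The "prima facie valid" idea above can be modeled in miniature: if each value carries a digest of its own key and payload, any reader can judge validity in place, with no reference to other rows. This is an illustrative Python sketch (a plain dict stands in for the table; all names here are invented, not HBase or YCSB APIs):

```python
import hashlib
import random

def make_value(key: str, payload: str) -> str:
    """Value carries a digest of (key, payload), so a reader can judge it
    prima facie valid without consulting any other row."""
    digest = hashlib.sha1(f"{key}:{payload}".encode()).hexdigest()
    return f"{payload}|{digest}"

def is_valid(key: str, value: str) -> bool:
    payload, _, digest = value.rpartition("|")
    return hashlib.sha1(f"{key}:{payload}".encode()).hexdigest() == digest

def run_workload(table: dict, keys, read_fraction=0.3, ops=1000, seed=42):
    """Mixed workload: a fraction of ops are reads that validate in place
    (as a map phase could); the rest are simple overwrites."""
    rng = random.Random(seed)
    invalid = 0
    for i in range(ops):
        key = rng.choice(keys)
        if rng.random() < read_fraction:
            value = table.get(key)
            if value is not None and not is_valid(key, value):
                invalid += 1
        else:
            table[key] = make_value(key, f"payload-{i}")
    return invalid
```

A full scan applying `is_valid` to every row is then exactly the map-phase check described above.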
On Sat, Mar 26, 2011 at 3:55 PM, Todd Lipcon <[email protected]> wrote:

> On Sat, Mar 26, 2011 at 3:53 PM, Ted Dunning <[email protected]> wrote:
>
>> Hmm...
>>
>> Yeah. I hear that "scrapping YCSB" meme a lot.
>>
>> Do you not worry about verifying intermediate results when over-writing
>> data?
>
> Not sure what you mean by this?
>
> The design of this system test is basically to create virtual "linked
> lists" through the key space of an HBase table. The first job is map-only
> and writes these lists, and then the verify step checks to make sure there
> are no backreferences to rows that don't exist.
>
> So, if any row gets lost along the way, and the row that pointed to it
> doesn't get lost, it will flag it during the verification step.
>
> -Todd
>
>> On Sat, Mar 26, 2011 at 8:51 AM, Todd Lipcon <[email protected]> wrote:
>>
>>> Hi Ted,
>>>
>>> I actually ended up scrapping the YCSB approach and built a
>>> system/durability test instead. It's an MR job that writes a particular
>>> pattern of edits, and a second one that verifies them. I'm in the
>>> process of hooking this into our continuous integration system, and
>>> will attempt to open source it somehow or other in the next couple of
>>> weeks.
>>>
>>> -Todd
>>>
>>> On Sat, Mar 26, 2011 at 12:58 AM, Ted Dunning <[email protected]> wrote:
>>>
>>>> Todd,
>>>>
>>>> I see YCSB on your list.
>>>>
>>>> Where did that go? We have been beating on it as well and have pretty
>>>> much decided that it is worthless as it stands.
>>>>
>>>> My thought is that we need a multi-node version that takes directions
>>>> about what load to generate via ZK. That is better than a map-reduce
>>>> based load generator because you can ramp load up and down at any time.
>>>>
>>>> Where are you headed with this?
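Todd's linked-list durability test can be modeled in a few lines: each row stores a backreference to the previous row in its chain, and the verify step flags any backreference pointing at a row that no longer exists. This is only an illustrative Python sketch of the idea (a dict stands in for the HBase table; function names are invented, not the actual MR job):

```python
def write_chain(table, chain_id, length):
    """Map-only "write" phase: each row stores a backreference to the
    previous row in its chain, forming a linked list through the keyspace."""
    prev = None
    for i in range(length):
        key = f"{chain_id}-{i}"
        table[key] = {"prev": prev}   # backreference to the prior row
        prev = key

def find_lost_rows(table):
    """Verify phase: flag any backreference pointing at a row that is
    missing -- i.e., an edit lost somewhere between write and verify."""
    return sorted(
        row["prev"]
        for row in table.values()
        if row["prev"] is not None and row["prev"] not in table
    )
```

Note the caveat Todd mentions: detection relies on the *pointing* row surviving. Deleting a row removes both that row and its own backreference, so a lost row is flagged only by its successor.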
>>>> On Fri, Mar 25, 2011 at 10:49 PM, Todd Lipcon <[email protected]> wrote:
>>>>
>>>>> Dear HBase developers,
>>>>>
>>>>> Last Monday, several HBase contributors met up at the StumbleUpon
>>>>> offices for a bit of a hackathon. We spent the beginning of the day
>>>>> discussing a few general topics, and then from about 11am through 7pm
>>>>> or so most of us hunkered down to hacking on various projects. I was
>>>>> the secretary for the morning, so here are the notes. Please excuse
>>>>> any typos or if I got your name wrong - I was never cut out for
>>>>> stenography.
>>>>>
>>>>> Thanks to those who came, and special thanks to the folks at
>>>>> StumbleUpon for space, food, and beer!
>>>>>
>>>>> Agenda:
>>>>> - Upcoming releases:
>>>>>   - 0.90.2 - when to release? a few bugs
>>>>>   - 0.91.x - should we do one?
>>>>>   - 0.92.0 - when and what?
>>>>> - Next user group meetup?
>>>>> - Upcoming features:
>>>>>   - Rolling restart improvements?
>>>>>   - Online config change
>>>>>   - Security and build issues
>>>>>   - Distributed splitting
>>>>> - Maybe produce some code today!
>>>>>   (power through the above, then work on respective priorities)
>>>>>
>>>>> ---
>>>>>
>>>>> People:
>>>>>
>>>>> - Stack @ StumbleUpon
>>>>> - Todd @ Cloudera
>>>>> - Elliot @ NGMoco - using 0.89 in prod, 0.90.1 about to be rolled out
>>>>> - Ted Yu from CarrierIQ
>>>>> - Liyin and Nicolas from Facebook, using 0.89 for the messaging product
>>>>> - Benoit from SU - TSDB
>>>>> - Mingjie, Eugene, Gary from TrendMicro - using some internal build
>>>>>   which is like trunk (security + coprocessors frankenbuild)
>>>>> - JD from SU
>>>>> - Prakash Khemani from FB - his group is on 0.90 - increment-heavy
>>>>>   workload
>>>>>   - has a patch for distributed splitting
>>>>>   - if a server goes down, it takes 10-15 minutes to catch up, so he
>>>>>     wants to reduce that time window
>>>>> - Marc, independent consultant with MetaMarkets right now - 0.90.1
>>>>>   "pseudo-production" work
>>>>> - Ryan from StumbleUpon
>>>>>
>>>>> -----
>>>>>
>>>>> 0.90.2:
>>>>> - next week? (week of 3/28?)
>>>>> - there are some bugs that still need to be fixed
>>>>> - candidate end of this week, then some time for testing
>>>>> - Stack has volunteered to be release manager
>>>>>
>>>>> 0.91.x:
>>>>> - should we do it?
>>>>> - people seem to think yes
>>>>> - but we shouldn't put much effort into testing these pre-releases
>>>>> - there are a lot of interesting things in trunk that people might
>>>>>   want to play with
>>>>>
>>>>> 0.92.x:
>>>>> - JD would like to have something more than alpha quality in time
>>>>>   for Hadoop Summit (3rd or 4th week of June)
>>>>> - What are the pending items?
>>>>>   - Coprocessors
>>>>>   - Online schema changes? Makes coprocessors more useful
>>>>>   - HBASE-1502 - removing heartbeats
>>>>>   - HBASE-2856 - ACID fixes
>>>>>   - Distributed splitting
>>>>> - Time-based or feature-based? we want to try doing really time-based
>>>>> - May 1st for first release candidate
>>>>>
>>>>> Next meetup:
>>>>> - some time in April? in the south bay?
>>>>> Features:
>>>>> - Rolling restart: Stack working on it
>>>>> - Online schema edit? FB finds it a pain point, but Nicolas is not
>>>>>   sure where it ranks on their priority list
>>>>> - Online config changes?
>>>>> - Online schema change is probably more important than online config
>>>>>   change, since config change can be done with a rolling restart
>>>>> - For coprocessors, we need to attack some classloading issues
>>>>>   before online schema change can really reload coprocessor
>>>>>   implementations
>>>>>
>>>>> Security and build:
>>>>> - Security code has been isolated as much as possible:
>>>>>   - two separate layers:
>>>>>     - RPC layer does secure RPC - pluggable RPC implementation and
>>>>>       subclassing for the HBaseServer and Client classes
>>>>>     - Loadable coprocessors for auth
>>>>> - But building is difficult - need to build against a secure Hadoop
>>>>>   in order to do this
>>>>>   - conditional build step? maven module?
>>>>> - Stack and Gary will look into how to build and release this:
>>>>>   - maybe Maven profiles? modules?
>>>>>   - separate jar to be added to the classpath with stuff that depends
>>>>>     on security
>>>>>
>>>>> Distributed splitting:
>>>>> - HLogSplitter code is pretty different on FB's 0.90 branch
>>>>>   - But most stuff plugs easily into trunk
>>>>> - Same interface:
>>>>>   - call splitLog with server name
>>>>> - master uses SplitLogManager - puts log-splitting tasks in ZK
>>>>> - each RS has SplitLogWorkers - watch for tasks, race to grab them
>>>>>   in ZK
>>>>>   - each RS splits logs one at a time
>>>>> - RS pings the master on the tasks as it splits them
>>>>> - master can preempt a task away from a worker
>>>>> - when the master comes up, it needs to grab orphaned tasks
>>>>> - some unit tests done, but it hasn't been substantially tested on a
>>>>>   real cluster
>>>>> - Current splitting does batching - multiple input logs go to one
>>>>>   output file per region
>>>>>   - new splitting creates 3-4x as many files for recovered.edits
>>>>>   - this is OK - we already handle this with seqids
>>>>> - If the whole cluster goes down, something like MapReduce makes more
>>>>>   sense
>>>>>   - this feature is targeted towards single-RS failure
>>>>>   - currently seeing downtime of 10 minutes when an RS goes down
>>>>>   - FB has various internal scripts/tools ("HyperShell") that let
>>>>>     them do the full-cluster-failure case efficiently, but they don't
>>>>>     have a clean way of open sourcing them
>>>>> - Maybe we can build something like this with hbase-regionservers.sh
>>>>>
>>>>> What are we working on:
>>>>> - Todd - maybe making YCSB runnable as an integration test
>>>>> - Stack - rolling restart? with Nicolas's help perhaps
>>>>> - Marc - add some new cases to hbck
>>>>> - Ryan - maybe porting RPC to Thrift?
>>>>>   - wants to resolve the meta-in-ZK ticket as "wontfix"
>>>>> - Prakash - distributed splitting
>>>>> - JD - fix bugs he saw over the weekend
>>>>> - Gary - work on splitting out the security build (maven pom file fun)
>>>>> - Eugene - ZK-938 - kerberos stuff for ZooKeeper (necessary for HBase
>>>>>   security)
>>>>>   - or maybe just fix some open bugs in HBase
>>>>> - Mingjie - open bugs for secure HBase (Access Control related)
>>>>> - Benoit - busy working on StumbleUpon stuff - mostly just observing
>>>>> - Nicolas - multithreaded compactions - needs to be refactored and
>>>>>   cleaned up
>>>>>   - they have very big storefiles (10GB+), so their compactions take
>>>>>     1hr+
>>>>>   - or just talking to people about stuff - easier than IRC
>>>>> - Liyin - add the ability to do ZK miniclusters with multiple ZKs
>>>>> - Ted - working on pending patches / testing
>>>>> - Elliot - HBASE-3541 - HBase REST multigets
>>>>>
>>>>> --
>>>>> Todd Lipcon
>>>>> Software Engineer, Cloudera
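The split-task handoff sketched in the "Distributed splitting" notes - the master (via SplitLogManager) queues one task per log in ZK, and the SplitLogWorkers on each RS race to grab them - boils down to an atomic check-and-claim per task. A toy Python model of that race, under the assumption that a lock-guarded dict stands in for ZK's conditional znode updates (class and function names here are invented):

```python
import threading

class SplitLogQueue:
    """Toy stand-in for the ZK task directory: the master enqueues one
    task per HLog; workers race, and each task is granted exactly once."""
    def __init__(self, tasks):
        self.lock = threading.Lock()
        self.owner = {task: None for task in tasks}

    def try_claim(self, task, worker_name):
        # Atomic check-and-claim, playing the role of a conditional
        # (version-checked) znode update in ZooKeeper.
        with self.lock:
            if self.owner[task] is None:
                self.owner[task] = worker_name
                return True
            return False

def split_log_worker(queue, name, claimed):
    # Each worker scans the task list and claims what it can;
    # a claimed task is the log this RS would then split.
    for task in list(queue.owner):
        if queue.try_claim(task, name):
            claimed.append(task)

tasks = [f"hlog-{i}" for i in range(8)]
queue = SplitLogQueue(tasks)
claims = {name: [] for name in ("rs1", "rs2", "rs3")}
threads = [threading.Thread(target=split_log_worker,
                            args=(queue, name, claims[name]))
           for name in claims]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

However the claims are distributed across the three workers, every task ends up owned by exactly one of them - which is the invariant the ZK race has to guarantee (the notes' preemption and orphaned-task recovery add liveness on top of this).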
