I was thinking in terms of a more general test where read-modify-write operations are used. It is also helpful to have some tests of simple overwrite. If a percentage of the ops are reads, and if data can be determined to be prima facie valid or not, then that validation can be done during the map phase of your program.
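The "prima facie valid" idea above can be modeled in miniature: if each value carries a digest of its own key and payload, any reader can judge validity in place, with no reference to other rows. This is an illustrative Python sketch (a plain dict stands in for the table; all names here are invented, not HBase or YCSB APIs):

```python
import hashlib
import random

def make_value(key: str, payload: str) -> str:
    """Value carries a digest of (key, payload), so a reader can judge it
    prima facie valid without consulting any other row."""
    digest = hashlib.sha1(f"{key}:{payload}".encode()).hexdigest()
    return f"{payload}|{digest}"

def is_valid(key: str, value: str) -> bool:
    payload, _, digest = value.rpartition("|")
    return hashlib.sha1(f"{key}:{payload}".encode()).hexdigest() == digest

def run_workload(table: dict, keys, read_fraction=0.3, ops=1000, seed=42):
    """Mixed workload: a fraction of ops are reads that validate in place
    (as a map phase could); the rest are simple overwrites."""
    rng = random.Random(seed)
    invalid = 0
    for i in range(ops):
        key = rng.choice(keys)
        if rng.random() < read_fraction:
            value = table.get(key)
            if value is not None and not is_valid(key, value):
                invalid += 1
        else:
            table[key] = make_value(key, f"payload-{i}")
    return invalid
```

A full scan applying `is_valid` to every row is then exactly the map-phase check described above.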
On Sat, Mar 26, 2011 at 3:55 PM, Todd Lipcon <[email protected]> wrote:

> On Sat, Mar 26, 2011 at 3:53 PM, Ted Dunning <[email protected]> wrote:
>
>> Hmm...
>>
>> Yeah. I hear that "scrapping YCSB" meme a lot.
>>
>> Do you not worry about verifying intermediate results when over-writing
>> data?
>
> Not sure what you mean by this?
>
> The design of this system test is basically to create virtual "linked
> lists" through the key space of an HBase table. The first job is map-only
> and writes these lists, and then the verify step checks to make sure there
> are no backreferences to rows that don't exist.
>
> So, if any row gets lost along the way, and the row that pointed to it
> doesn't get lost, it will flag it during the verification step.
>
> -Todd
>
>> On Sat, Mar 26, 2011 at 8:51 AM, Todd Lipcon <[email protected]> wrote:
>>
>>> Hi Ted,
>>>
>>> I actually ended up scrapping the YCSB approach and built a
>>> system/durability test instead. It's an MR job that writes a particular
>>> pattern of edits, and a second one that verifies them. I'm in the
>>> process of hooking this into our continuous integration system, and
>>> will attempt to open source it somehow or other in the next couple of
>>> weeks.
>>>
>>> -Todd
>>>
>>> On Sat, Mar 26, 2011 at 12:58 AM, Ted Dunning <[email protected]> wrote:
>>>
>>>> Todd,
>>>>
>>>> I see YCSB on your list.
>>>>
>>>> Where did that go? We have been beating on it as well and have pretty
>>>> much decided that it is worthless as it stands.
>>>>
>>>> My thought is that we need a multi-node version that takes directions
>>>> about what load to generate via ZK. That is better than a map-reduce
>>>> based load generator because you can ramp load up and down at any time.
>>>>
>>>> Where are you headed with this?
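Todd's linked-list durability test can be modeled in a few lines: each row stores a backreference to the previous row in its chain, and the verify step flags any backreference pointing at a row that no longer exists. This is only an illustrative Python sketch of the idea (a dict stands in for the HBase table; function names are invented, not the actual MR job):

```python
def write_chain(table, chain_id, length):
    """Map-only "write" phase: each row stores a backreference to the
    previous row in its chain, forming a linked list through the keyspace."""
    prev = None
    for i in range(length):
        key = f"{chain_id}-{i}"
        table[key] = {"prev": prev}   # backreference to the prior row
        prev = key

def find_lost_rows(table):
    """Verify phase: flag any backreference pointing at a row that is
    missing -- i.e., an edit lost somewhere between write and verify."""
    return sorted(
        row["prev"]
        for row in table.values()
        if row["prev"] is not None and row["prev"] not in table
    )
```

Note the caveat Todd mentions: detection relies on the *pointing* row surviving. Deleting a row removes both that row and its own backreference, so a lost row is flagged only by its successor.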
>>>> On Fri, Mar 25, 2011 at 10:49 PM, Todd Lipcon <[email protected]> wrote:
>>>>
>>>>> Dear HBase developers,
>>>>>
>>>>> Last Monday, several HBase contributors met up at the StumbleUpon
>>>>> offices for a bit of a hackathon. We spent the beginning of the day
>>>>> discussing a few general topics, and then from about 11am through 7pm
>>>>> or so most of us hunkered down to hacking on various projects. I was
>>>>> the secretary for the morning, so here are the notes. Please excuse
>>>>> any typos or if I got your name wrong - I was never cut out for
>>>>> stenography.
>>>>>
>>>>> Thanks to those who came, and special thanks to the folks at
>>>>> StumbleUpon for space, food, and beer!
>>>>>
>>>>> Agenda:
>>>>> - Upcoming releases:
>>>>>   - 0.90.2 - when to release? a few bugs
>>>>>   - 0.91.x - should we do one?
>>>>>   - 0.92.0 - when and what?
>>>>> - Next user group meetup?
>>>>> - Upcoming features:
>>>>>   - Rolling restart improvements?
>>>>>   - Online config change
>>>>>   - Security and build issues
>>>>>   - Distributed splitting
>>>>> - Maybe produce some code today!
>>>>>   (power through the above, then work on respective priorities)
>>>>>
>>>>> ---
>>>>>
>>>>> People:
>>>>>
>>>>> - Stack @ StumbleUpon
>>>>> - Todd @ Cloudera
>>>>> - Elliot @ NGMoco - using 0.89 in prod, 0.90.1 about to be rolled out
>>>>> - Ted Yu from CarrierIQ
>>>>> - Liyin and Nicolas from Facebook, using 0.89 for the messaging product
>>>>> - Benoit from SU - TSDB
>>>>> - Mingjie, Eugene, Gary from TrendMicro - using some internal build
>>>>>   which is like trunk (security + coprocessors frankenbuild)
>>>>> - JD from SU
>>>>> - Prakash Khemani from FB - his group is on 0.90 - increment-heavy
>>>>>   workload
>>>>>   - has a patch for distributed splitting
>>>>>   - if a server goes down, it takes 10-15 minutes to catch up, so he
>>>>>     wants to reduce that time window
>>>>> - Marc, independent consultant with MetaMarkets right now - 0.90.1
>>>>>   "pseudo-production" work
>>>>> - Ryan from StumbleUpon
>>>>>
>>>>> -----
>>>>>
>>>>> 0.90.2:
>>>>> - next week? (week of 3/28?)
>>>>> - there are some bugs that still need to be fixed
>>>>> - candidate end of this week, then some time for testing
>>>>> - Stack has volunteered to be release manager
>>>>>
>>>>> 0.91.x:
>>>>> - should we do it?
>>>>> - people seem to think yes
>>>>> - but we shouldn't put much effort into testing these pre-releases
>>>>> - there are a lot of interesting things in trunk that people might
>>>>>   want to play with
>>>>>
>>>>> 0.92.x:
>>>>> - JD would like to have something more than alpha quality in time
>>>>>   for Hadoop Summit (3rd or 4th week of June)
>>>>> - What are the pending items?
>>>>>   - Coprocessors
>>>>>   - Online schema changes? Makes coprocessors more useful
>>>>>   - HBASE-1502 - removing heartbeats
>>>>>   - HBASE-2856 - ACID fixes
>>>>>   - Distributed splitting
>>>>> - Time-based or feature-based? we want to try doing really time-based
>>>>> - May 1st for first release candidate
>>>>>
>>>>> Next meetup:
>>>>> - some time in April? in the south bay?
>>>>> Features:
>>>>> - Rolling restart: Stack working on it
>>>>> - Online schema edit? FB finds it a pain point, but Nicolas is not
>>>>>   sure where it ranks on their priority list
>>>>> - Online config changes?
>>>>> - Online schema change is probably more important than online config
>>>>>   change, since config change can be done with a rolling restart
>>>>> - For coprocessors, we need to attack some classloading issues
>>>>>   before online schema change can really reload coprocessor
>>>>>   implementations
>>>>>
>>>>> Security and build:
>>>>> - Security code has been isolated as much as possible:
>>>>>   - two separate layers:
>>>>>     - RPC layer does secure RPC - pluggable RPC implementation and
>>>>>       subclassing for the HBaseServer and Client classes
>>>>>     - Loadable coprocessors for auth
>>>>> - But building is difficult - need to build against a secure Hadoop
>>>>>   in order to do this
>>>>>   - conditional build step? maven module?
>>>>> - Stack and Gary will look into how to build and release this:
>>>>>   - maybe Maven profiles? modules?
>>>>>   - separate jar to be added to the classpath with stuff that depends
>>>>>     on security
>>>>>
>>>>> Distributed splitting:
>>>>> - HLogSplitter code is pretty different on FB's 0.90 branch
>>>>>   - But most stuff plugs easily into trunk
>>>>> - Same interface:
>>>>>   - call splitLog with server name
>>>>> - master uses SplitLogManager - puts log-splitting tasks in ZK
>>>>> - each RS has SplitLogWorkers - watch for tasks, race to grab them
>>>>>   in ZK
>>>>>   - each RS splits logs one at a time
>>>>> - RS pings the master on the tasks as it splits them
>>>>> - master can preempt a task away from a worker
>>>>> - when the master comes up, it needs to grab orphaned tasks
>>>>> - some unit tests done, but it hasn't been substantially tested on a
>>>>>   real cluster
>>>>> - Current splitting does batching - multiple input logs go to one
>>>>>   output file per region
>>>>>   - new splitting creates 3-4x as many files for recovered.edits
>>>>>   - this is OK - we already handle this with seqids
>>>>> - If the whole cluster goes down, something like MapReduce makes more
>>>>>   sense
>>>>>   - this feature is targeted towards single-RS failure
>>>>>   - currently seeing downtime of 10 minutes when an RS goes down
>>>>>   - FB has various internal scripts/tools ("HyperShell") that let
>>>>>     them do the full-cluster-failure case efficiently, but they don't
>>>>>     have a clean way of open sourcing them
>>>>> - Maybe we can build something like this with hbase-regionservers.sh
>>>>>
>>>>> What are we working on:
>>>>> - Todd - maybe making YCSB runnable as an integration test
>>>>> - Stack - rolling restart? with Nicolas's help perhaps
>>>>> - Marc - add some new cases to hbck
>>>>> - Ryan - maybe porting RPC to Thrift?
>>>>>   - wants to resolve the meta-in-ZK ticket as "wontfix"
>>>>> - Prakash - distributed splitting
>>>>> - JD - fix bugs he saw over the weekend
>>>>> - Gary - work on splitting out the security build (maven pom file fun)
>>>>> - Eugene - ZK-938 - kerberos stuff for ZooKeeper (necessary for HBase
>>>>>   security)
>>>>>   - or maybe just fix some open bugs in HBase
>>>>> - Mingjie - open bugs for secure HBase (Access Control related)
>>>>> - Benoit - busy working on StumbleUpon stuff - mostly just observing
>>>>> - Nicolas - multithreaded compactions - needs to be refactored and
>>>>>   cleaned up
>>>>>   - they have very big storefiles (10GB+), so their compactions take
>>>>>     1hr+
>>>>>   - or just talking to people about stuff - easier than IRC
>>>>> - Liyin - add the ability to do ZK miniclusters with multiple ZKs
>>>>> - Ted - working on pending patches / testing
>>>>> - Elliot - HBASE-3541 - HBase REST multigets
>>>>>
>>>>> --
>>>>> Todd Lipcon
>>>>> Software Engineer, Cloudera
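The split-task handoff sketched in the "Distributed splitting" notes - the master (via SplitLogManager) queues one task per log in ZK, and the SplitLogWorkers on each RS race to grab them - boils down to an atomic check-and-claim per task. A toy Python model of that race, under the assumption that a lock-guarded dict stands in for ZK's conditional znode updates (class and function names here are invented):

```python
import threading

class SplitLogQueue:
    """Toy stand-in for the ZK task directory: the master enqueues one
    task per HLog; workers race, and each task is granted exactly once."""
    def __init__(self, tasks):
        self.lock = threading.Lock()
        self.owner = {task: None for task in tasks}

    def try_claim(self, task, worker_name):
        # Atomic check-and-claim, playing the role of a conditional
        # (version-checked) znode update in ZooKeeper.
        with self.lock:
            if self.owner[task] is None:
                self.owner[task] = worker_name
                return True
            return False

def split_log_worker(queue, name, claimed):
    # Each worker scans the task list and claims what it can;
    # a claimed task is the log this RS would then split.
    for task in list(queue.owner):
        if queue.try_claim(task, name):
            claimed.append(task)

tasks = [f"hlog-{i}" for i in range(8)]
queue = SplitLogQueue(tasks)
claims = {name: [] for name in ("rs1", "rs2", "rs3")}
threads = [threading.Thread(target=split_log_worker,
                            args=(queue, name, claims[name]))
           for name in claims]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

However the claims are distributed across the three workers, every task ends up owned by exactly one of them - which is the invariant the ZK race has to guarantee (the notes' preemption and orphaned-task recovery add liveness on top of this).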
