Dear devs,

tl;dr: We now have Jenkins jobs <https://builds.apache.org/job/HBase-master-IntegrationTestBigLinkedList/> that can run IntegrationTestBigLinkedList with fault injection on 5-node Apache HBase clusters built from source.
Long version: I just wanted to provide an update on some recent work we've gotten done since committing an Apache HBase topology for clusterdock <https://github.com/apache/hbase/commit/ccf5d27d7aa238c8398d2818928a71f39bd749a0> (a Python-based framework for building and starting Docker container-based clusters). Despite the existence of an awesome system test framework with fault-injection capabilities in the form of the hbase-it module, we've never had an easy way to run these tests on distributed clusters upstream. This has long been a big hole in our Jenkins test coverage, but since the clusterdock topology got committed, we've been making progress on doing something about it. I'm happy to report that, starting today, we are running IntegrationTestBigLinkedList with fault injection on Apache Infrastructure <https://builds.apache.org/job/HBase-master-IntegrationTestBigLinkedList/>.

Even longer version (stop reading here if you don't care how we do it): So how do we do it? Well, clusterdock is designed to start up multiple Docker containers on one host, where each container acts like a lightweight VM (so 4 containers = a 4-node cluster). What's in these containers (and what to do when starting them) is controlled by clusterdock's "topology" abstraction. Our apache_hbase topology builds a Docker image from a Java tarball, a Hadoop tarball, and an HBase version. This last part can be either a binary tarball (for RC testing or playing around with a release) or a Git commit, in which case our clusterdock topology builds HBase from source. Once we build a cluster, we can push the cluster images (actually, just one Docker image) to a shared Docker registry for repeated use.
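To illustrate the "binary tarball or Git commit" choice above, here is a minimal Python sketch of how an HBase version argument might be classified before building an image. The names `classify_hbase_source` and `HBaseSource`, and the rule that a `.tar.gz`/`.tgz` suffix means "prebuilt release", are hypothetical assumptions for illustration only; this is not the apache_hbase topology's actual code or clusterdock's API.

```python
from dataclasses import dataclass

# Hypothetical sketch: NOT part of clusterdock or the apache_hbase topology.
# Assumed rule: a tarball suffix means a prebuilt release/RC; anything else
# is treated as a Git ref (commit SHA or branch) to build from source.
TARBALL_SUFFIXES = (".tar.gz", ".tgz")


@dataclass(frozen=True)
class HBaseSource:
    kind: str      # "tarball" (unpack prebuilt binaries) or "git" (build from source)
    location: str  # tarball path/URL, or Git commit/branch name


def classify_hbase_source(version: str) -> HBaseSource:
    """Decide whether the given HBase version should be unpacked or built."""
    if version.endswith(TARBALL_SUFFIXES):
        return HBaseSource("tarball", version)
    return HBaseSource("git", version)


if __name__ == "__main__":
    # Placeholder inputs: a release tarball, a branch, and a commit prefix.
    for v in ("hbase-x.y.z-bin.tar.gz", "branch-1.3", "ccf5d27d7aa2"):
        src = classify_hbase_source(v)
        action = "unpack prebuilt binaries" if src.kind == "tarball" else "build from source"
        print(f"{v}: {src.kind} -> {action}")
```

Keeping the decision in one small function makes the image-build step agnostic to where HBase came from: only the "git" path needs a source checkout and a Maven build, while both paths end in the same single Docker image that gets pushed to the registry.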
We now have a matrix job that can build images for any branches we care about (I set it up against branch-1.2 <https://builds.apache.org/view/H-L/view/HBase/job/HBase-Build-clusterdock-Clusters/HBASE_VERSION=branch-1.2,label=docker/>, branch-1.3 <https://builds.apache.org/view/H-L/view/HBase/job/HBase-Build-clusterdock-Clusters/HBASE_VERSION=branch-1.3,label=docker/>, and master <https://builds.apache.org/view/H-L/view/HBase/job/HBase-Build-clusterdock-Clusters/HBASE_VERSION=master,label=docker/> to start) and push them to the registry. Once these images are built and pushed, we can use them to start up an n-node cluster on one host and run tests against it. To begin, I've set up a super simple Jenkins job that starts up a 5-node cluster, runs ITBLL (with an optional Chaos Monkey), and then exits. This work is being tracked in HBASE-15964, and there's much more that I want to do (more tests, more Chaos Monkeys, more branches, more diagnostic information collection when a test fails), but I figured I'd let you guys know what we have going so far. :)

PS: Special thanks to Jon Hsieh for helping me get the Jenkins jobs running.

--
-Dima