I've opened HBASE-16481 as an umbrella JIRA for improvements to this and added running on more branches and collecting logs/HFiles/WALs as subtasks. Please keep the suggestions coming!
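For the log/HFile/WAL collection subtask, here is a minimal sketch of one way the collection step could look. It is an illustration only, not what the Jenkins jobs do today. It assumes the files arrive as a tar stream split into byte chunks (the kind of stream Docker's copy-from-container endpoint produces), unpacks the regular files into a directory, and lists them oldest to newest as suggested in the thread below. The function names `extract_archive` and `oldest_to_newest` are hypothetical.

```python
import io
import os
import tarfile
from pathlib import Path


def extract_archive(chunks, dest):
    """Unpack a tar stream, given as an iterable of byte chunks, into dest.

    Only regular files are written, and each file's modification time is
    restored from the archive so that later sorting by time is meaningful.
    """
    dest = Path(dest).resolve()
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(fileobj=io.BytesIO(b"".join(chunks)), mode="r:*") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue  # skip directories, links, and devices
            target = (dest / member.name).resolve()
            if dest not in target.parents:
                continue  # refuse entries that would land outside dest
            target.parent.mkdir(parents=True, exist_ok=True)
            with tar.extractfile(member) as src, open(target, "wb") as out:
                out.write(src.read())
            os.utime(target, (member.mtime, member.mtime))
    return dest


def oldest_to_newest(root):
    """Return every file under root, sorted by modification time (oldest first)."""
    files = [p for p in Path(root).rglob("*") if p.is_file()]
    return sorted(files, key=lambda p: (p.stat().st_mtime, str(p)))
```

With the Docker SDK for Python, the chunk iterator would presumably come from the first element of the tuple returned by `Container.get_archive(path)`. The resulting directory could then be archived as Jenkins job artifacts or uploaded elsewhere.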
On Tue, Aug 23, 2016 at 9:44 AM, Andrew Purtell <apurt...@apache.org> wrote:

> This is great.
>
> To completely retrace a rare botch we may need persisted post run:
> - The console log of the run
> - All daemon logs
> - All WALs
> - All HFiles
> WALs and HFiles should be organized by time from oldest to newest.
>
> All could reside in an S3 bucket.
>
> On Tue, Aug 23, 2016 at 12:26 AM, Dima Spivak <dspi...@cloudera.com> wrote:
>
> > Yep, that's the next improvement I plan on making. Docker has API
> > endpoints for copying files from a container to the host, so I can
> > definitely use that to move logs from the cluster to the Jenkins
> > workspace if a test fails.
> >
> > On Monday, August 22, 2016, Nick Dimiduk <ndimi...@gmail.com> wrote:
> >
> > > This sounds great! Is there a way to gather logs and/or data files
> > > from the containers before termination? Can they be stored on
> > > Jenkins as part of the job artifacts?
> > >
> > > On Monday, August 22, 2016, Ted Yu <yuzhih...@gmail.com> wrote:
> > >
> > > > Nice job, Dima.
> > > >
> > > > Is there a Jenkins job for running ITBLL for 1.2 / 1.3 branches?
> > > >
> > > > Cheers
> > > >
> > > > On Mon, Aug 22, 2016 at 5:33 PM, Dima Spivak <dspi...@cloudera.com> wrote:
> > > >
> > > > > Dear devs,
> > > > >
> > > > > tl;dr: We now have Jenkins jobs
> > > > > <https://builds.apache.org/job/HBase-master-IntegrationTestBigLinkedList/>
> > > > > that can run IntegrationTestBigLinkedList with fault injection on
> > > > > 5-node Apache HBase clusters built from source.
> > > > >
> > > > > Long version:
> > > > >
> > > > > I just wanted to provide an update on some recent work we've gotten
> > > > > done since committing an Apache HBase topology for clusterdock
> > > > > <https://github.com/apache/hbase/commit/ccf5d27d7aa238c8398d2818928a71f39bd749a0>
> > > > > (a Python-based framework for building and starting Docker
> > > > > container-based clusters).
> > > > >
> > > > > Despite the existence of an awesome system test framework with
> > > > > fault-injection capabilities in the form of the hbase-it module,
> > > > > we've never had an easy way to run these tests on distributed
> > > > > clusters upstream. This has long been a big hole in our Jenkins
> > > > > test coverage, but since the clusterdock topology got committed,
> > > > > we've been making progress on doing something about it. I'm happy
> > > > > to report that, starting today, we are now running
> > > > > IntegrationTestBigLinkedList with fault-injection on Apache
> > > > > Infrastructure
> > > > > <https://builds.apache.org/job/HBase-master-IntegrationTestBigLinkedList/>.
> > > > >
> > > > > Even longer version (stop reading here if you don't care how we do
> > > > > it):
> > > > >
> > > > > So how do we do it? Well, clusterdock is designed to start up
> > > > > multiple Docker containers on one host where each container acts
> > > > > like a lightweight VM (so 4 containers = 4-node cluster). What's in
> > > > > these containers (and what to do when starting them) is controlled
> > > > > by clusterdock's "topology" abstraction. Our apache_hbase topology
> > > > > builds a Docker image from a Java tarball, Hadoop tarball, and an
> > > > > HBase version. This last part can be either a binary tarball (for
> > > > > RC testing or playing around with a release) or a Git commit, in
> > > > > which case our clusterdock topology builds HBase from source.
> > > > >
> > > > > Once we build a cluster, we can then push the cluster images
> > > > > (actually, just one Docker image) to a shared Docker registry for
> > > > > repeated use. We now have a matrix job that can build any branches
> > > > > we care about (I set it up against branch-1.2
> > > > > <https://builds.apache.org/view/H-L/view/HBase/job/HBase-Build-clusterdock-Clusters/HBASE_VERSION=branch-1.2,label=docker/>,
> > > > > branch-1.3
> > > > > <https://builds.apache.org/view/H-L/view/HBase/job/HBase-Build-clusterdock-Clusters/HBASE_VERSION=branch-1.3,label=docker/>,
> > > > > and master
> > > > > <https://builds.apache.org/view/H-L/view/HBase/job/HBase-Build-clusterdock-Clusters/HBASE_VERSION=master,label=docker/>
> > > > > to start) and do this.
> > > > >
> > > > > Once these images are built (and pushed), we can use them to start
> > > > > up an n-node cluster on one host and run tests against it. To
> > > > > begin, I've set up a super simple Jenkins job that starts up a
> > > > > 5-node cluster, runs ITBLL (with an optional Chaos Monkey), and
> > > > > then exits.
> > > > >
> > > > > This work is being tracked in HBASE-15964 and there's much more
> > > > > that I want to do (more tests, more Chaos Monkeys, more branches,
> > > > > more diagnostic information collection when a test fails), but I
> > > > > figured I'd let you guys know about what we have going so far. :)
> > > > >
> > > > > PS: Special thanks to Jon Hsieh for helping me get the Jenkins
> > > > > jobs running.
> > > > >
> > > > > --
> > > > > -Dima
> >
> > --
> > -Dima
>
> --
> Best regards,
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)

--
-Dima