This is great.

To completely retrace a rare botch, we may need the following persisted post-run:

- The console log of the run
- All daemon logs
- All WALs
- All HFiles

WALs and HFiles should be organized by time from oldest to newest.
All could reside in an S3 bucket.

On Tue, Aug 23, 2016 at 12:26 AM, Dima Spivak <dspi...@cloudera.com> wrote:

> Yep, that's the next improvement I plan on making. Docker has API endpoints
> for copying files from a container to the host, so I can definitely use
> that to move logs from the cluster to the Jenkins workspace if a test
> fails.
>
> On Monday, August 22, 2016, Nick Dimiduk <ndimi...@gmail.com> wrote:
>
> > This sounds great! Is there a way to gather logs and/or data files from the
> > containers before termination? Can they be stored on Jenkins as part of the
> > job artifacts?
> >
> > On Monday, August 22, 2016, Ted Yu <yuzhih...@gmail.com> wrote:
> >
> > > Nice job, Dima.
> > >
> > > Is there a Jenkins job for running ITBLL for the 1.2 / 1.3 branches?
> > >
> > > Cheers
> > >
> > > On Mon, Aug 22, 2016 at 5:33 PM, Dima Spivak <dspi...@cloudera.com> wrote:
> > >
> > > > Dear devs,
> > > >
> > > > tl;dr: We now have Jenkins jobs
> > > > <https://builds.apache.org/job/HBase-master-IntegrationTestBigLinkedList/>
> > > > that can run IntegrationTestBigLinkedList with fault injection on 5-node
> > > > Apache HBase clusters built from source.
> > > >
> > > > Long version:
> > > >
> > > > I just wanted to provide an update on some recent work we've gotten done
> > > > since committing an Apache HBase topology for clusterdock
> > > > <https://github.com/apache/hbase/commit/ccf5d27d7aa238c8398d2818928a71f39bd749a0>
> > > > (a Python-based framework for building and starting Docker container-based
> > > > clusters).
> > > >
> > > > Despite the existence of an awesome system test framework with
> > > > fault-injection capabilities in the form of the hbase-it module, we've
> > > > never had an easy way to run these tests on distributed clusters upstream.
> > > > This has long been a big hole in our Jenkins test coverage, but since the
> > > > clusterdock topology got committed, we've been making progress on doing
> > > > something about it. I'm happy to report that, starting today, we are now
> > > > running IntegrationTestBigLinkedList with fault injection on Apache
> > > > Infrastructure
> > > > <https://builds.apache.org/job/HBase-master-IntegrationTestBigLinkedList/>.
> > > >
> > > > Even longer version (stop reading here if you don't care how we do it):
> > > >
> > > > So how do we do it? Well, clusterdock is designed to start up multiple
> > > > Docker containers on one host, where each container acts like a lightweight
> > > > VM (so 4 containers = 4-node cluster). What's in these containers (and what
> > > > to do when starting them) is controlled by clusterdock's "topology"
> > > > abstraction. Our apache_hbase topology builds a Docker image from a Java
> > > > tarball, a Hadoop tarball, and an HBase version. This last part can be either
> > > > a binary tarball (for RC testing or playing around with a release) or a Git
> > > > commit, in which case our clusterdock topology builds HBase from source.
> > > > Once we build a cluster, we can then push the cluster images (actually,
> > > > just one Docker image) to a shared Docker registry for repeated use.
> > > > We now have a matrix job that can build any branches we care about and do
> > > > this. To start, I set it up against branch-1.2
> > > > <https://builds.apache.org/view/H-L/view/HBase/job/HBase-Build-clusterdock-Clusters/HBASE_VERSION=branch-1.2,label=docker/>,
> > > > branch-1.3
> > > > <https://builds.apache.org/view/H-L/view/HBase/job/HBase-Build-clusterdock-Clusters/HBASE_VERSION=branch-1.3,label=docker/>,
> > > > and master
> > > > <https://builds.apache.org/view/H-L/view/HBase/job/HBase-Build-clusterdock-Clusters/HBASE_VERSION=master,label=docker/>.
> > > >
> > > > Once these images are built (and pushed), we can use them to start up an
> > > > n-node cluster on one host and run tests against it. To begin, I've
> > > > set up a super simple Jenkins job that starts up a 5-node cluster, runs
> > > > ITBLL (with an optional Chaos Monkey), and then exits.
> > > >
> > > > This work is being tracked in HBASE-15964 and there's much more that I want
> > > > to do (more tests, more Chaos Monkeys, more branches, more diagnostic
> > > > information collection when a test fails), but I figured I'd let you guys
> > > > know about what we have going so far. :)
> > > >
> > > > PS: Special thanks to Jon Hsieh for helping me get the Jenkins jobs
> > > > running.
> > > >
> > > > --
> > > > -Dima
>
> --
> -Dima

--
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
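
PS: A rough sketch of the "organized by time from oldest to newest" idea from the top of this thread, in Python since clusterdock is Python-based. It assumes the WALs and HFiles have already been copied out of the containers into a local directory with their modification times preserved. The function names and the sequence-number naming scheme are hypothetical, not part of clusterdock or HBase, and the upload to a bucket is left out.

```python
from pathlib import Path


def order_oldest_first(root):
    """Return all files under root sorted by modification time, oldest first.

    Ties are broken by path so the result is deterministic.
    """
    files = [p for p in Path(root).rglob("*") if p.is_file()]
    return sorted(files, key=lambda p: (p.stat().st_mtime, str(p)))


def archive_names(files):
    """Prefix each file name with a zero-padded sequence number.

    With this (hypothetical) naming scheme, a plain lexical listing of the
    archived names preserves the oldest-to-newest order.
    """
    width = max(4, len(str(len(files))))
    return [f"{i:0{width}d}-{p.name}" for i, p in enumerate(files)]
```

Calling `archive_names(order_oldest_first(some_dir))` would give the names to use when persisting the files, so whoever retraces a failure later can read them in order without consulting timestamps.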