I've opened HBASE-16481 as an umbrella JIRA for improvements to this and
added running on more branches and collecting logs/HFiles/WALs as subtasks.
Please keep the suggestions coming!
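
For the "oldest to newest" ordering of WALs and HFiles suggested in the quoted message below, here is a minimal sketch of one way it could be done before upload. It assumes the files have already been copied out of the containers into a local directory; the helper names and the numeric-prefix naming scheme are hypothetical and are not part of clusterdock or the Jenkins jobs.

```python
from pathlib import Path


def oldest_first(root):
    """Return every regular file under root, oldest modification time first."""
    files = [p for p in Path(root).rglob("*") if p.is_file()]
    # Break mtime ties by path so the ordering is deterministic.
    return sorted(files, key=lambda p: (p.stat().st_mtime, str(p)))


def archive_names(files, root):
    """Build hypothetical object names whose lexical order matches file age."""
    root = Path(root)
    names = []
    for index, path in enumerate(files):
        # Flatten the relative path and prefix it with a zero-padded counter.
        flat = path.relative_to(root).as_posix().replace("/", "_")
        names.append(f"{index:06d}_{flat}")
    return names
```

Listing a bucket populated with these names alphabetically would then return the files in chronological order, at the cost of losing the original directory layout.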

On Tue, Aug 23, 2016 at 9:44 AM, Andrew Purtell <apurt...@apache.org> wrote:

> This is great.
>
> To completely retrace a rare botch we may need persisted post run:
> - The console log of the run
> - All daemon logs
> - All WALs
> - All HFiles
> WALs and HFiles should be organized by time from oldest to newest.
>
> All could reside in a S3 bucket.
>
>
>
> On Tue, Aug 23, 2016 at 12:26 AM, Dima Spivak <dspi...@cloudera.com>
> wrote:
>
> > Yep, that's the next improvement I plan on making. Docker has API
> > endpoints for copying files from a container to the host, so I can
> > definitely use that to move logs from the cluster to the Jenkins
> > workspace if a test fails.
> >
> > On Monday, August 22, 2016, Nick Dimiduk <ndimi...@gmail.com> wrote:
> >
> > > > This sounds great! Is there a way to gather logs and/or data files
> > > > from the containers before termination? Can they be stored on Jenkins
> > > > as part of the job artifacts?
> > >
> > > > On Monday, August 22, 2016, Ted Yu <yuzhih...@gmail.com> wrote:
> > >
> > > > Nice job, Dima.
> > > >
> > > > > Is there a Jenkins job for running ITBLL on the 1.2 / 1.3 branches?
> > > >
> > > > Cheers
> > > >
> > > > > On Mon, Aug 22, 2016 at 5:33 PM, Dima Spivak <dspi...@cloudera.com>
> > > > > wrote:
> > > >
> > > > > Dear devs,
> > > > >
> > > > > > tl;dr: We now have Jenkins jobs
> > > > > > <https://builds.apache.org/job/HBase-master-IntegrationTestBigLinkedList/>
> > > > > > that can run IntegrationTestBigLinkedList with fault injection on
> > > > > > 5-node Apache HBase clusters built from source.
> > > > >
> > > > > Long version:
> > > > >
> > > > > > I just wanted to provide an update on some recent work we've gotten
> > > > > > done since committing an Apache HBase topology for clusterdock
> > > > > > <https://github.com/apache/hbase/commit/ccf5d27d7aa238c8398d2818928a71f39bd749a0>
> > > > > > (a Python-based framework for building and starting Docker
> > > > > > container-based clusters).
> > > > >
> > > > > > Despite the existence of an awesome system test framework with
> > > > > > fault-injection capabilities in the form of the hbase-it module,
> > > > > > we've never had an easy way to run these tests on distributed
> > > > > > clusters upstream. This has long been a big hole in our Jenkins test
> > > > > > coverage, but since the clusterdock topology got committed, we've
> > > > > > been making progress on doing something about it. I'm happy to
> > > > > > report that, starting today, we are now running
> > > > > > IntegrationTestBigLinkedList with fault injection on Apache
> > > > > > Infrastructure
> > > > > > <https://builds.apache.org/job/HBase-master-IntegrationTestBigLinkedList/>.
> > > > >
> > > > > > Even longer version (stop reading here if you don't care how we do
> > > > > > it):
> > > > >
> > > > > > So how do we do it? Well, clusterdock is designed to start up
> > > > > > multiple Docker containers on one host, where each container acts
> > > > > > like a lightweight VM (so 4 containers = 4-node cluster). What's in
> > > > > > these containers (and what to do when starting them) is controlled
> > > > > > by clusterdock's "topology" abstraction. Our apache_hbase topology
> > > > > > builds a Docker image from a Java tarball, a Hadoop tarball, and an
> > > > > > HBase version. This last part can be either a binary tarball (for RC
> > > > > > testing or playing around with a release) or a Git commit, in which
> > > > > > case our clusterdock topology builds HBase from source. Once we
> > > > > > build a cluster, we can then push the cluster images (actually, just
> > > > > > one Docker image) to a shared Docker registry for repeated use. We
> > > > > > now have a matrix job that can build any branches we care about and
> > > > > > do this. To start, I set it up against branch-1.2
> > > > > > <https://builds.apache.org/view/H-L/view/HBase/job/HBase-Build-clusterdock-Clusters/HBASE_VERSION=branch-1.2,label=docker/>,
> > > > > > branch-1.3
> > > > > > <https://builds.apache.org/view/H-L/view/HBase/job/HBase-Build-clusterdock-Clusters/HBASE_VERSION=branch-1.3,label=docker/>,
> > > > > > and master
> > > > > > <https://builds.apache.org/view/H-L/view/HBase/job/HBase-Build-clusterdock-Clusters/HBASE_VERSION=master,label=docker/>.
> > > > >
> > > > > > Once these images are built (and pushed), we can use them to start
> > > > > > up an n-node cluster on one host and run tests against it. To begin,
> > > > > > I've set up a super simple Jenkins job that starts up a 5-node
> > > > > > cluster, runs ITBLL (with an optional Chaos Monkey), and then exits.
> > > > >
> > > > > > This work is being tracked in HBASE-15964 and there's much more that
> > > > > > I want to do (more tests, more Chaos Monkeys, more branches, more
> > > > > > diagnostic information collection when a test fails), but I figured
> > > > > > I'd let you guys know about what we have going so far. :)
> > > > >
> > > > > PS: Special thanks to Jon Hsieh for helping me get the Jenkins jobs
> > > > > running.
> > > > >
> > > > > --
> > > > > -Dima
> > > > >
> > > >
> > >
> >
> >
> > --
> > -Dima
> >
>
>
>
> --
> Best regards,
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
>



-- 
-Dima
