Re: IntegrationTestBigLinkedList now running on builds.apache.org

2016-08-22 Thread Ted Yu
Nice job, Dima.

Is there a Jenkins job for running ITBLL on the 1.2 / 1.3 branches?

Cheers

On Mon, Aug 22, 2016 at 5:33 PM, Dima Spivak  wrote:

> Dear devs,
>
> tl;dr: We now have Jenkins jobs that can run IntegrationTestBigLinkedList
> with fault injection on 5-node Apache HBase clusters built from source.
>
> Long version:
>
> I just wanted to provide an update on some recent work we've gotten done
> since committing an Apache HBase topology for clusterdock, a Python-based
> framework for building and starting Docker container-based clusters
> (commit ccf5d27d7aa238c8398d2818928a71f39bd749a0).
>
> Despite the existence of an awesome system test framework with
> fault-injection capabilities in the form of the hbase-it module, we've
> never had an easy way to run these tests on distributed clusters upstream.
> This has long been a big hole in our Jenkins test coverage, but since the
> clusterdock topology got committed, we've been making progress on doing
> something about it. I'm happy to report that, starting today, we are now
> running IntegrationTestBigLinkedList with fault-injection on Apache
> Infrastructure.
>
> Even longer version (stop reading here if you don't care how we do it):
>
> So how do we do it? Well, clusterdock is designed to start up multiple
> Docker containers on one host where each container acts like a lightweight
> VM (so 4 containers = 4-node cluster). What's in these containers (and what
> to do when starting them) is controlled by clusterdock's "topology"
> abstraction. Our apache_hbase topology builds a Docker image from a Java
> tarball, Hadoop tarball, and an HBase version. This last part can be either
> a binary tarball (for RC testing or playing around with a release) or a Git
> commit, in which case our clusterdock topology builds HBase from source.
> Once we build a cluster, we can then push the cluster images (actually,
> just one Docker image) to a shared Docker registry for repeated use. We now
> have a matrix job that can build any branches we care about (I set it up
> against branch-1.2, branch-1.3, and master to start; the matrix cells are
> Build-clusterdock-Clusters/HBASE_VERSION=<branch>,label=docker) and do this.
>
> Once these images are built (and pushed), we can use them to start up an
> n-node cluster on one host and run tests against it. To begin, I've
> set up a super simple Jenkins job that starts up a 5-node cluster, runs
> ITBLL (with an optional Chaos Monkey), and then exits.
>
> This work is being tracked in HBASE-15964 and there's much more that I want
> to do (more tests, more Chaos Monkeys, more branches, more diagnostic
> information collection when a test fails), but I figured I'd let you guys
> know about what we have going so far. :)
>
> PS: Special thanks to Jon Hsieh for helping me get the Jenkins jobs
> running.
>
> --
> -Dima
>
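
As a rough illustration of the "n containers = n-node cluster" idea described
above, here is a minimal Python sketch. It is not clusterdock's actual code.
It only builds the command lines one might run: one `docker run` per node,
plus an ITBLL invocation with an optional Chaos Monkey. The node naming
scheme, the image name, the network name, and the Loop arguments are all
hypothetical, and the `-m` monkey option and the Loop argument order are
assumptions about the hbase-it command line that should be checked against
the tool's usage output.

```python
from typing import List, Optional

ITBLL_CLASS = "org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList"


def node_run_commands(n: int, image: str, network: str) -> List[List[str]]:
    """Return one `docker run` argv per node; n containers make an n-node cluster."""
    if n < 1:
        raise ValueError("a cluster needs at least one node")
    commands = []
    for i in range(1, n + 1):
        host = f"node-{i}.{network}"  # hypothetical naming scheme
        commands.append([
            "docker", "run", "-d",   # detached, so every node keeps running
            "--name", host,
            "--hostname", host,
            "--network", network,    # all nodes share one Docker network
            image,                   # the single image built for the cluster
        ])
    return commands


def itbll_command(monkey: Optional[str] = None,
                  iterations: int = 1,
                  mappers: int = 1,
                  nodes_per_mapper: int = 1000000,
                  output_dir: str = "/tmp/itbll",
                  reducers: int = 1) -> List[str]:
    """Return an argv for an ITBLL Loop run (option names are assumptions)."""
    argv = ["hbase", ITBLL_CLASS]
    if monkey:
        argv += ["-m", monkey]  # assumed Chaos Monkey option; omitted if None
    argv += ["Loop", str(iterations), str(mappers),
             str(nodes_per_mapper), output_dir, str(reducers)]
    return argv
```

For example, node_run_commands(5, "example/hbase:master", "cluster") yields
five argv lists, and itbll_command(monkey="serverKilling") adds the monkey
option ahead of the Loop command.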


Re: IntegrationTestBigLinkedList now running on builds.apache.org

2016-08-22 Thread Nick Dimiduk
This sounds great! Is there a way to gather logs and/or data files from the
containers before termination? Can they be stored on Jenkins as part of the
job artifacts?

On Monday, August 22, 2016, Ted Yu  wrote:

> Nice job, Dima.
>
> Is there a Jenkins job for running ITBLL on the 1.2 / 1.3 branches?
>
> Cheers


Re: IntegrationTestBigLinkedList now running on builds.apache.org

2016-08-22 Thread Stack
On Mon, Aug 22, 2016 at 5:33 PM, Dima Spivak  wrote:

> Despite the existence of an awesome system test framework with
> fault-injection capabilities in the form of the hbase-it module, we've
> never had an easy way to run these tests on distributed clusters upstream.
> This has long been a big hole in our Jenkins test coverage, but since the
> clusterdock topology got committed, we've been making progress on doing
> something about it. I'm happy to report that, starting today, we are now
> running IntegrationTestBigLinkedList with fault-injection on Apache
> Infrastructure.
>
Hotdog!
St.Ack




Re: IntegrationTestBigLinkedList now running on builds.apache.org

2016-08-23 Thread Dima Spivak
Not yet, but I plan on adding them once I get master passing. Stay tuned!

On Monday, August 22, 2016, Ted Yu  wrote:

> Nice job, Dima.
>
> Is there a Jenkins job for running ITBLL on the 1.2 / 1.3 branches?
>
> Cheers


-- 
-Dima


Re: IntegrationTestBigLinkedList now running on builds.apache.org

2016-08-23 Thread Dima Spivak
Yep, that's the next improvement I plan on making. Docker has API endpoints
for copying files from a container to the host, so I can definitely use
that to move logs from the cluster to the Jenkins workspace if a test fails.
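
A minimal sketch of what that copy step could look like, assuming the
container's files arrive as a tar stream (the Docker Engine API's container
archive endpoint returns a tar archive, and the Docker SDK for Python exposes
it as `Container.get_archive(path)`, which returns a stream of tar bytes plus
stat info). The helper name `save_archive` and the destination layout are
hypothetical; the function only unpacks regular files and refuses paths that
would escape the destination directory.

```python
import io
import os
import tarfile
from typing import Iterable, List


def save_archive(chunks: Iterable[bytes], dest_dir: str) -> List[str]:
    """Unpack a tar stream (given as byte chunks) under dest_dir.

    Returns the paths of the regular files that were written.
    """
    data = io.BytesIO(b"".join(chunks))
    dest_root = os.path.realpath(dest_dir)
    written = []
    with tarfile.open(fileobj=data, mode="r:*") as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directories, links, and device nodes
            target = os.path.realpath(os.path.join(dest_root, member.name))
            # Reject absolute paths and ".." tricks that leave dest_dir.
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise ValueError(f"unsafe path in archive: {member.name}")
            os.makedirs(os.path.dirname(target), exist_ok=True)
            source = tar.extractfile(member)
            with open(target, "wb") as out:
                out.write(source.read())
            written.append(target)
    return written
```

With the Docker SDK, the chunks would come from something like
`stream, stat = container.get_archive("/path/to/logs")`, with dest_dir
pointing into the Jenkins workspace so the files can be archived as job
artifacts. The log path inside the container is left as a placeholder here.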

On Monday, August 22, 2016, Nick Dimiduk  wrote:

> This sounds great! Is there a way to gather logs and/or data files from the
> containers before termination? Can they be stored on Jenkins as part of the
> job artifacts?


-- 
-Dima


Re: IntegrationTestBigLinkedList now running on builds.apache.org

2016-08-23 Thread Ted Yu
ITBLL on the master branch has not been run manually for quite some time.

I wouldn't be surprised if it doesn't pass with the serverKilling monkey.

Starting from 1.2 / 1.3 may get you to a green build faster.

Cheers

On Tue, Aug 23, 2016 at 12:24 AM, Dima Spivak  wrote:

> Not yet, but I plan on adding them once I get master passing. Stay tuned!


Re: IntegrationTestBigLinkedList now running on builds.apache.org

2016-08-23 Thread Andrew Purtell
This is great.

To completely retrace a rare botch, we may need the following persisted
after the run:
- The console log of the run
- All daemon logs
- All WALs
- All HFiles
WALs and HFiles should be organized by time from oldest to newest.

All could reside in an S3 bucket.
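
One way to get the oldest-to-newest ordering, sketched in Python under stated
assumptions: each file is described by its path and modification time, and
the object key layout (run id, then kind, then a zero-padded sequence number)
is purely hypothetical. The sketch only computes keys; uploading to a bucket
is left out.

```python
from typing import Iterable, List, Tuple


def archive_keys(run_id: str, kind: str,
                 files: Iterable[Tuple[str, float]]) -> List[Tuple[str, str]]:
    """Map (path, mtime) pairs to (path, object key), oldest file first."""
    # Sort by modification time; break ties by path so the order is stable.
    ordered = sorted(files, key=lambda item: (item[1], item[0]))
    # Zero-pad the sequence number so keys also sort correctly as strings.
    width = max(4, len(str(len(ordered))))
    result = []
    for index, (path, _mtime) in enumerate(ordered):
        name = path.rsplit("/", 1)[-1]
        result.append((path, f"{run_id}/{kind}/{index:0{width}d}-{name}"))
    return result
```

Because the sequence number leads the file name, a plain lexicographic
listing of the bucket prefix replays WALs or HFiles in time order.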



On Tue, Aug 23, 2016 at 12:26 AM, Dima Spivak  wrote:

> Yep, that's the next improvement I plan on making. Docker has API endpoints
> for copying files from a container to the host, so I can definitely use
> that to move logs from the cluster to the Jenkins workspace if a test
> fails.



-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)


Re: IntegrationTestBigLinkedList now running on builds.apache.org

2016-08-23 Thread Dima Spivak
I've opened HBASE-16481 as an umbrella JIRA for improvements to this and
added running on more branches and collecting logs/HFiles/WALs as subtasks.
Please keep the suggestions coming!

On Tue, Aug 23, 2016 at 9:44 AM, Andrew Purtell  wrote:

> This is great.
>
> To completely retrace a rare botch, we may need the following persisted
> after the run:
> - The console log of the run
> - All daemon logs
> - All WALs
> - All HFiles
> WALs and HFiles should be organized by time from oldest to newest.
>
> All could reside in an S3 bucket.