Thanks so much for banging on our tests and the builds.a.o setup so that some 
sanity has now been restored there!

> Hanging tests have been fixed and/or disabled to be put back after scrubbing.

What do you think about an interim step that adds a flaky-test category and a 
profile that disables those tests only on builds.a.o, i.e. the Jenkins job 
configuration turns them off? Is that possible? I'd like to keep running them 
on my build rigs, since those have more resources than builds.a.o. Or, failing 
that, at least a profile that can turn them back on?
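
To make the idea concrete, here is a minimal sketch of the selection logic I 
have in mind. It is only an illustration using the plain JDK: the @Flaky 
marker, the example test classes, and the skipFlakyTests property name are all 
made up for this mail and are not existing HBase code. In practice I'd expect 
this to map onto a JUnit category plus a Maven profile that excludes that 
group, but I have not tried that against our build.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.List;

// Hypothetical marker for tests known to be flaky (not an existing HBase annotation).
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface Flaky {}

// Hypothetical stand-ins for real test classes.
class TestStableThing {}

@Flaky
class TestFlakyThing {}

final class FlakyTestSelector {
  // Hypothetical switch; a Jenkins job could pass -DskipFlakyTests=true.
  static final String SKIP_PROPERTY = "skipFlakyTests";

  private FlakyTestSelector() {}

  // A test runs unless flaky tests are being skipped and it carries the marker.
  static boolean shouldRun(Class<?> testClass, boolean skipFlaky) {
    return !(skipFlaky && testClass.isAnnotationPresent(Flaky.class));
  }

  // Filters a list of candidate test classes, preserving their order.
  static List<Class<?>> select(List<Class<?>> candidates, boolean skipFlaky) {
    List<Class<?>> selected = new ArrayList<>();
    for (Class<?> candidate : candidates) {
      if (shouldRun(candidate, skipFlaky)) {
        selected.add(candidate);
      }
    }
    return selected;
  }

  // Reads the switch from a system property; unset means "run everything".
  static boolean skipFlakyFromSystemProperty() {
    return Boolean.getBoolean(SKIP_PROPERTY);
  }
}
```

With the default being "run everything", local rigs would keep exercising the 
flaky tests, and only the builds.a.o job configuration would need to set the 
property to skip them.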

> This is a petition that we go out of our way going forward to keep OUR test 
> suite blue.

Big +1 here

BTW, having seen the results of your effort, it turns out that most of my 
issues with builds.a.o were probably due to the broken zombie-killing logic. 
That's why locally run builds (sometimes also under Jenkins, btw) were so much 
more stable. Can we somehow get review and source control (SCM) for our build 
configurations going forward? 




> On Oct 23, 2015, at 2:54 PM, Stack <st...@duboce.net> wrote:
> 
> A few of us have been doing cleanup over the last month or so (see
> HBASE-14420). As a project, we had let our unit test suite go to seed. It
> was an anthology of mysterious crashes, zombies and flakes.
> 
> We are not done yet but tests are mostly stable again with patch builds
> passing close to 100% of the time as long as the patch is good and trunk
> and branch-1/branch-1.2 are tending back toward being blue always. Hanging
> tests have been fixed and/or disabled to be put back after scrubbing.
> Mysterious surefire crashes/timeouts have been addressed by purging a
> problematic test set that we intend to re-add after tuneup and fix. There
> are still a few flakies in the mix.
> 
> This is a petition that we go out of our way going forward to keep OUR test
> suite blue. We'll all be more productive if we can keep it this way.
> Patches will land faster because there'll be less friction getting them in
> (Landing big patches was taking me a week before starting in on this
> effort). We'll catch a slew of problems before commit. New devs won't be
> confounded by mysterious unrelated test fails. There'll be no need to keep
> up an arcane knowledge of 'known flakies' or hanging tests or the need for
> expending extra effort and resources doing 'look-it-works-locally-for-me'
> test runs locally.
> 
> St.Ack
> 
> Below are some further notes for those interested in build and work done to
> our test rig recently; ugly detail is over in HBASE-14420.
> 
> Until an alternative shows up, our Apache Jenkins needs to run blue always
> if we want to do community development. True, Apache Jenkins is a trying
> environment in which to run tests, but it is shared, public, and I have yet
> to come across a hang or failure that was Apache-Jenkins-only; the only
> difference I've seen is that the incidence of hangs and flakies is higher
> on Apache.
> 
> The test-patch.sh script had some hacking done to it mostly removing code
> that was finding and killing zombies. We were reporting ANY concurrent
> build as a zombie, even those that were not hbase tests, and killing them
> in the belief that they were leftovers from previous runs (the script had a
> few different techniques for finding and executing adjacent processes).
> This made some sense when we were supposed to be the only test running on
> the box but this has not been true for a long time. Killing was
> papering-over the fact that we were leaving zombies after us.
> 
> The Jenkins build configuration also had zombie code from test-patch.sh in
> it (still does -- a TODO). Builds now dump out test machine load and
> listing of what else is running on the box at test start to give a sense of
> how loaded the test box is.
> 
> I feel particularly bad for the new contributors. They have it hard enough
> already checking out a fat project with a slow build system with hours of
> tests to run to verify changes. Let's spare them the added barrier of a
> confounding experience when their nice patch throws up a mysterious Jenkins
> fail on submit.
