On Thu, Nov 5, 2015 at 8:07 AM, Andrew Purtell <andrew.purt...@gmail.com>
wrote:

> > Hanging tests have been fixed and/or disabled, to be put back after
> > scrubbing.
>
> What do you think about an interim step that adds a flakey test category
> and a profile that disables them only on builds.a.o, i.e. the Jenkins job
> configuration turns them off? Is that possible? I'd like to continue
> running these on my build rigs since they are better endowed than
> builds.a.o resources. Or at least a profile that can turn them on?
>
>
We could do such a thing. Probably better than the current hackery, where
a test is just disabled with a JIRA to fix it ... sometime.



> > This is a petition that we go out of our way going forward to keep OUR
> > test suite blue.
>
> Big +1 here
>
>
Yeah. It's got to be a group thing.



> BTW it turns out after seeing the results of your effort that most of my
> issues with builds.a.o were probably due to the broken zombie killing
> thing. That's why locally run stuff (also under Jenkins sometimes btw) was
> just so much more stable. Can we have review and SCM of our build
> configurations somehow going forward?
>
>
Makes sense (and still work to do on zombie detector). Let me work on it.
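
As a rough sketch only of where the zombie detector could go: rather than
treating every concurrent build on the box as a leftover, match on a marker
that only our own test JVMs carry. The marker flag `-Dhbase.test.marker`,
the function name `find_own_zombies`, and the `ps -eo pid,args`-style input
are all assumptions made up for illustration; none of this is what
test-patch.sh does today.

```python
def find_own_zombies(ps_lines, marker="-Dhbase.test.marker"):
    """Return PIDs of processes whose command line carries our marker.

    ps_lines: iterable of strings shaped like "<pid> <command line>",
    for example the output of `ps -eo pid,args`.
    marker: hypothetical JVM flag that our test runs would set, so that
    unrelated builds sharing the box are never reported as zombies.
    """
    pids = []
    for line in ps_lines:
        parts = line.strip().split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # skip the header row and malformed lines
        pid, cmd = parts
        if marker in cmd:
            pids.append(int(pid))
    return pids


# Made-up process listing: one of ours, one unrelated build, one bystander.
sample = [
    "  PID COMMAND",
    " 4101 java -Dhbase.test.marker -cp test.jar org.example.TestRunner",
    " 4102 java -cp other.jar some.other.project.Build",
    " 4103 sleep 600",
]
print(find_own_zombies(sample))
```

Only the process carrying the marker is reported; the other project's build
and the bystander process are left alone.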
St.Ack




>
>
>
> > On Oct 23, 2015, at 2:54 PM, Stack <st...@duboce.net> wrote:
> >
> > A few of us have been doing cleanup over the last month or so (see
> > HBASE-14420). As a project, we had let our unit test suite go to seed. It
> > was an anthology of mysterious crashes, zombies and flakes.
> >
> > We are not done yet, but tests are mostly stable again: patch builds
> > pass close to 100% of the time as long as the patch is good, and trunk
> > and branch-1/branch-1.2 are tending back toward being blue always.
> > Hanging tests have been fixed and/or disabled, to be put back after
> > scrubbing. Mysterious surefire crashes/timeouts have been addressed by
> > purging a problematic test set that we intend to re-add after tuneup
> > and fix. There are still a few flakies in the mix.
> >
> > This is a petition that we go out of our way going forward to keep OUR
> > test suite blue. We'll all be more productive if we can keep it this
> > way. Patches will land faster because there'll be less friction getting
> > them in (landing big patches was taking me a week before starting in on
> > this effort). We'll catch a slew of problems before commit. New devs
> > won't be confounded by mysterious unrelated test fails. There'll be no
> > need to keep up an arcane knowledge of 'known flakies' or hanging tests,
> > or to expend extra effort and resources on
> > 'look-it-works-locally-for-me' test runs.
> >
> > St.Ack
> >
> > Below are some further notes for those interested in the build and the
> > work done on our test rig recently; ugly detail is over in HBASE-14420.
> >
> > Until an alternative shows up, our Apache Jenkins needs to run blue
> > always if we want to do community development. True, Apache Jenkins is
> > a trying environment in which to run tests, but it is shared and
> > public, and I have yet to come across a hang or failure that was
> > Apache-Jenkins-only; the only difference I've seen is that the
> > incidence of hangs and flakies is higher on Apache.
> >
> > The test-patch.sh script had some hacking done to it, mostly removing
> > code that was finding and killing zombies. We were reporting ANY
> > concurrent build as a zombie, even those that were not hbase tests, and
> > killing them in the belief that they were leftovers from previous runs
> > (the script had a few different techniques for finding and executing
> > adjacent processes). This made some sense when we were supposed to be
> > the only test running on the box, but this has not been true for a long
> > time. Killing was papering over the fact that we were leaving zombies
> > after us.
> >
> > The Jenkins build configuration also had zombie code from
> > test-patch.sh in it (still does -- a TODO). Builds now dump out the test
> > machine load and a listing of what else is running on the box at test
> > start to give a sense of how loaded the test box is.
> >
> > I feel particularly bad for the new contributors. They have it hard
> > enough already, checking out a fat project with a slow build system and
> > hours of tests to run to verify changes. Let's spare them the added
> > barrier of a confounding experience when their nice patch throws up a
> > mysterious Jenkins fail on submit.
>
