I find these options useful when running on contended or underpowered test
hosts:
    -Dsurefire.firstPartForkCount=1 \
    -Dsurefire.secondPartForkCount=1 \
    -Dsurefire.rerunFailingTestsCount=3
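Running with a single fork balloons the test suite execution time but
produces more stable results, and the rerun setting lets Surefire help
detect flaky tests: a test that fails and then passes on a rerun is
reported as flaky rather than as a failure.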
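
For illustration, here is a rough sketch of how the options could be wrapped
in a small shell helper. Treat it as a sketch under assumptions, not as part
of our build: the `test` goal, the DRY_RUN variable, and the function names
are all made up for the example. As far as I know, the two fork-count
properties are defined by the HBase pom rather than by Surefire itself, so
they will not mean anything in other projects.

```shell
#!/bin/sh
# Sketch only: wraps the three options in a reusable helper.
# Assumptions (not from the thread): the `test` goal, the DRY_RUN
# variable, and both function names are hypothetical.

surefire_stable_opts() {
  # One fork per test phase, and up to three reruns of a failing test.
  printf '%s\n' \
    "-Dsurefire.firstPartForkCount=1" \
    "-Dsurefire.secondPartForkCount=1" \
    "-Dsurefire.rerunFailingTestsCount=3"
}

run_stable_tests() {
  # Word splitting of the command substitution is intentional: none of
  # the options contains whitespace, so each becomes one argument.
  set -- mvn test $(surefire_stable_opts) "$@"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    # Default: print the command instead of starting a long test run.
    echo "$*"
  else
    "$@"
  fi
}

run_stable_tests
```

By default this only prints the command. Setting DRY_RUN=0 would actually
invoke mvn, and any extra arguments passed to run_stable_tests are appended
to the command line.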
On Mon, Oct 9, 2017 at 7:48 AM, Mike Drob <[email protected]> wrote:
> Addressing your individual suggestions inline.
>
> Another one that you missed (more long term) is splitting up the server
> module into smaller modules. We have some work on this already (backup,
> mapreduce) but it's a long way to go...
>
>
> On Mon, Oct 9, 2017 at 9:38 AM, Sean Busbey <[email protected]> wrote:
>
> > Hi folks!
> >
> > Lately our precommit runs have had a large amount of noise around unit
> > test failures due to timeout, especially for the hbase-server module.
> >
> > I'd really like to get us back to a place where a precommit -1 doesn't
> > just result in a reflexive "precommit is unreliable."
> >
> > When the hbase-server module is going to be run (which would include
> > changes to that module and changes to the top-level of the project), I
> > can think of a few ways to bring the noise down:
> >
> > * Do fewer parallel executions. We do 5 tests at once now and the
> > hbase-server module takes ~1.5 hours. We could tune down just the
> > hbase-server module to do fewer.
> >
>
> 1.5 hours is already past the threshold where I have to go do something
> else while I wait for the tests to finish. Putting this up to 3 hours
> wouldn't affect my productivity, I don't think.
>
>
> > * Do more test re-runs. We could have tests that fail retry more. I
> > think maybe we allow a single retry currently via surefire. We'd have
> > to do it outside of surefire to account for the large number of
> > time-out failures.
> >
>
> I like the idea of more retries, but I don't like going outside of
> surefire. I don't want us maintaining more custom hacks and shims in place
> for something that should be temporary - once we get the tests stabilized
> we shouldn't need it, right?
>
>
> > * Don't run the hbase-server module tests (or just run those tests
> > that expressly changed in the patch). Instead, we'd include a warning
> > to the committer that they need to test this particular module
> > independently. We could also add a committer-initiated jenkins job
> > that runs the tests for just hbase-server.
> >
>
> I'm optimistic about human nature, but I think this means that the tests
> just wouldn't get run.
>
> >
> > What do folks think?
> >
>
--
Best regards,
Andrew
Words like orphans lost among the crosstalk, meaning torn from truth's
decrepit hands
- A23, Crosstalk