[DISCUSS] options for precommit test reliability?

Sean Busbey Mon, 09 Oct 2017 07:39:59 -0700

Hi folks!

Lately our precommit runs have had a lot of noise from unit test
failures due to timeouts, especially in the hbase-server module.


I'd really like to get us back to a place where a precommit -1 doesn't
just result in a reflexive "precommit is unreliable."

When the hbase-server module is going to be run (which would include
changes to that module and changes to the top-level of the project), I
can think of a few ways to bring the noise down:

* Do fewer parallel executions. We do 5 tests at once now and the
hbase-server module takes ~1.5 hours. We could tune down just the
hbase-server module to do fewer.
* Do more test re-runs. We could retry failing tests more times. I
think we currently allow a single retry via surefire. We'd have to do
the extra retries outside of surefire to account for the large number
of timeout failures.
* Don't run the hbase-server module tests (or run only those tests
that were expressly changed in the patch). Instead, we'd include a warning
to the committer that they need to test this particular module
independently. We could also add a committer-initiated jenkins job
that runs the tests for just hbase-server.
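
To make the second option a bit more concrete, here is a rough sketch
of what retrying outside of surefire might look like. It is only an
illustration, not how our precommit works today: the function name
run_with_retries and its parameters are made up for this example. It
counts a timeout as a failed attempt so that it gets retried too.

```python
import subprocess
import sys


def run_with_retries(cmd, max_attempts=3, timeout_s=None):
    """Run cmd until it exits 0 or max_attempts is used up.

    Hypothetical helper for illustration only. Returns a tuple
    (passed, attempts_used).
    """
    for attempt in range(1, max_attempts + 1):
        try:
            result = subprocess.run(cmd, timeout=timeout_s)
            if result.returncode == 0:
                return True, attempt
        except subprocess.TimeoutExpired:
            # The child is killed on timeout; treat it like any other failure.
            pass
        print(f"attempt {attempt} of {max_attempts} failed", file=sys.stderr)
    return False, max_attempts
```

A caller could wrap something like ["mvn", "test", "-pl", "hbase-server"]
with this, though given the ~1.5 hour runtime it would probably only
make sense to re-run the individual tests that failed rather than the
whole module.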

What do folks think?
