Hi folks! Lately our precommit runs have had a lot of noise from unit test failures due to timeouts, especially for the hbase-server module.
I'd really like to get us back to a place where a precommit -1 doesn't just result in a reflexive "precommit is unreliable."

When the hbase-server module is going to be run (which would include changes to that module and changes to the top level of the project), I can think of a few ways to bring the noise down:

* Do fewer parallel executions. We do 5 tests at once now and the hbase-server module takes ~1.5 hours. We could tune down just the hbase-server module to do fewer.

* Do more test re-runs. We could retry failing tests more times. I think maybe we allow a single retry currently via surefire. We'd have to do it outside of surefire to account for the large number of time-out failures.

* Don't run the hbase-server module tests (or just run those tests that expressly changed in the patch). Instead, we'd include a warning to the committer that they need to test this particular module independently. We could also add a committer-initiated Jenkins job that runs the tests for just hbase-server.

What do folks think?
