On Mon, Oct 9, 2017 at 7:38 AM, Sean Busbey <[email protected]> wrote:
> Hi folks!
>
> Lately our precommit runs have had a large amount of noise around unit
> test failures due to timeout, especially for the hbase-server module.
>

I've not looked at why the timeouts happen. Anyone? Usually there is a
cause.

...

> I'd really like to get us back to a place where a precommit -1 doesn't
> just result in a reflexive "precommit is unreliable."

This is the default. The exception is when one of us works on
stabilizing the test suite. It takes a while and a bunch of effort, but
stabilization has been doable in the past. Once stable, it stays that
way a while before the rot sets in.

> * Do fewer parallel executions. We do 5 tests at once now and the
> hbase-server module takes ~1.5 hours. We could tune down just the
> hbase-server module to do fewer.
>

Is it the loading that is the issue, or tests stamping on each other?
If the latter, I'd think we'd want to fix it. If the former, we would
want to look at it too; I'd think our tests shouldn't be such that they
fall over if the context is other than 'perfect'.

I've not looked at a machine when five concurrent hbase tests are
running. Is it even putting up a load? Over the extent of the full test
suite? Or is it just a few tests that cause issues when run together?
Could we stagger these, or give them their own category, or have them
burn less brightly?

If tests are failing because of contention for resources, we should fix
the test. If given a machine, we should burn it up rather than
pussy-foot it, I'd say (can we size the concurrency off a query of the
underlying OS so we step by CPUs, say?).

Tests could do with an edit. Generally, tests are written once and then
never touched again. Meantime the system evolves. An edit could look for
redundancy. An edit could look for cases where we start clusters (which
is time-consuming) and we don't have to (use Mocks or start standalone
instances instead).
We also have some crazy tests that spin up lots of clusters all inside a
single JVM, though the context is the same as that of a simple method
evaluation.

St.Ack
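
P.S. On sizing concurrency off the OS: a rough sketch of what I mean,
in plain Java. The class name ForkSizing, the forksPerCpu factor, and
the maxForks cap are all made up for illustration; nothing here is what
our build does today. It just asks the JVM for the CPU count and scales
the number of concurrent test JVMs from that, never dropping below one.

```java
// Hypothetical helper: derive test concurrency from the CPU count
// instead of hard-coding "5". Names and defaults are illustrative only.
final class ForkSizing {
    private ForkSizing() {}

    /**
     * Returns floor(cpus * forksPerCpu), clamped to the range [1, maxForks].
     */
    public static int forkCount(int cpus, double forksPerCpu, int maxForks) {
        if (cpus < 1 || forksPerCpu <= 0 || maxForks < 1) {
            throw new IllegalArgumentException("arguments must be positive");
        }
        int scaled = (int) Math.floor(cpus * forksPerCpu);
        return Math.max(1, Math.min(maxForks, scaled));
    }

    public static void main(String[] args) {
        // Ask the JVM how many processors the OS reports as available.
        int cpus = Runtime.getRuntime().availableProcessors();
        // Assumed policy: one fork per two CPUs, at most 16 forks.
        int forks = forkCount(cpus, 0.5, 16);
        System.out.println("cpus=" + cpus + " forks=" + forks);
    }
}
```

If I remember right, Surefire's forkCount already takes a value with a
'C' suffix (0.5C, say) that gets multiplied by the core count, so we
might get the same effect from configuration alone.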
