Currently our precommit build has a history of ~233 builds. Looking across[1] the ones that have unit test logs, and treating the string "timeout" as an indicator that a test failed because of a timeout rather than a known bad answer, we have 80 builds with one or more test timeouts.
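(For the curious, here's a rough sketch of that scan, assuming the python-jenkins library and anonymous read access to builds.apache.org. The job name is a guess, and this greps the whole console output rather than just the unit test logs; the gist in [1] is the script actually used.)

```python
# Minimal sketch of the per-host timeout scan, under the assumptions above.
import collections

import jenkins

JOB = 'PreCommit-HBASE-Build'  # assumed job name

server = jenkins.Jenkins('https://builds.apache.org')

builds_per_host = collections.Counter()
timeouts_per_host = collections.Counter()

# fetch_all_builds pulls the full build history, not just the most recent page
for build in server.get_job_info(JOB, fetch_all_builds=True)['builds']:
    number = build['number']
    info = server.get_build_info(JOB, number)
    host = info.get('builtOn') or 'unknown'
    console = server.get_build_console_output(JOB, number)

    builds_per_host[host] += 1
    # Treat any occurrence of "timeout" as "failed because of a timeout
    # rather than a known bad answer", as in the analysis above.
    if 'timeout' in console.lower():
        timeouts_per_host[host] += 1

for host, total in builds_per_host.most_common():
    hit = timeouts_per_host[host]
    print('%-5s %3d builds, %3d with a test timeout (%d%%)'
          % (host, total, hit, 100 * hit // total))
```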
Breaking this down by host ("% timeout" is the share of that host's builds
that hit a test timeout):

| Host | % timeout | Success | Timeout Failure | General Failure |
| ---- | ---------:| -------:| ---------------:| ---------------:|
| H0   |       42% |      10 |              15 |              11 |
| H1   |       54% |       6 |              14 |               6 |
| H2   |       45% |      18 |              35 |              24 |
| H3   |      100% |       0 |               1 |               0 |
| H4   |        0% |       1 |               0 |               2 |
| H5   |       20% |       1 |               1 |               3 |
| H6   |       44% |       4 |               4 |               1 |
| H9   |       35% |       2 |               7 |              11 |
| H10  |       26% |       4 |               8 |              19 |
| H11  |        0% |       0 |               0 |               2 |
| H12  |       43% |       1 |               3 |               3 |
| H13  |       22% |       1 |               2 |               6 |
| H26  |        0% |       0 |               0 |               1 |

It's odd that we so strongly favor H2, but I don't see evidence that we have
a bad host we could simply exclude.

Scaling our concurrency by the number of CPU cores is something surefire can
do (see the sketch after the quoted thread below). Let me see what the H*
hosts look like to figure out some example mappings. Do we have a rough bound
on how many cores a single test using MiniCluster should need? 3?

-busbey

[1]: By "looking across" I mean using the python-jenkins library:
https://gist.github.com/busbey/ff5f7ae3a292164cc110fdb934935c8c

On Mon, Oct 9, 2017 at 4:40 PM, Stack <[email protected]> wrote:
> On Mon, Oct 9, 2017 at 7:38 AM, Sean Busbey <[email protected]> wrote:
>
>> Hi folks!
>>
>> Lately our precommit runs have had a large amount of noise around unit
>> test failures due to timeout, especially for the hbase-server module.
>>
>
> I've not looked at why the timeouts. Anyone? Usually there is a cause.
>
> ...
>
>> I'd really like to get us back to a place where a precommit -1 doesn't
>> just result in a reflexive "precommit is unreliable."
>
> This is the default. The exception is when one of us works on stabilizing
> the test suite. It takes a while and a bunch of effort, but stabilization
> has been doable in the past. Once stable, it stays that way for a while
> before the rot sets in.
>
>> * Do fewer parallel executions. We do 5 tests at once now and the
>> hbase-server module takes ~1.5 hours. We could tune down just the
>> hbase-server module to do fewer.
>
> Is it the loading that is the issue, or tests stamping on each other? If
> the latter, I'd think we'd want to fix it. If the former, we'd want to look
> at that too; I'd think our tests shouldn't be such that they fall over if
> the context is other than 'perfect'.
>
> I've not looked at a machine while five concurrent hbase tests are running.
> Is it even putting up a load? Over the extent of the full test suite? Or is
> it just a few tests that, when run together, cause issues? Could we stagger
> those, give them their own category, or have them burn less brightly?
>
> If tests are failing because of contention for resources, we should fix the
> tests. If we're given a machine, we should burn it up rather than
> pussy-foot around, I'd say (can we size the concurrency off a query of the
> underlying OS so we step by CPUs, say?).
>
> Tests could do with an edit. Generally, tests are written once and then
> never touched again; meantime the system evolves. An edit could look for
> redundancy. It could look for cases where we start clusters
> --time-consuming-- when we don't have to (use mocks or start standalone
> instances instead). We also have some crazy tests that spin up lots of
> clusters inside a single JVM, even though the context is the same as that
> of a simple method evaluation.
>
> St.Ack
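For reference on the surefire knob mentioned above: a minimal pom.xml sketch
using the stock maven-surefire-plugin forkCount "C" multiplier. The 0.5C value
is only an illustrative guess, not what our modules currently set.

```xml
<!-- Sketch only: surefire multiplies a forkCount ending in "C" by the number
     of CPU cores on the build host, so the fork count scales per machine. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <!-- e.g. a 16-core host would run 8 forked test JVMs in parallel -->
    <forkCount>0.5C</forkCount>
    <reuseForks>false</reuseForks>
  </configuration>
</plugin>
```

If the rough bound of ~3 cores per MiniCluster-based test holds, something
closer to 0.33C would match that ceiling instead.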
