Currently our precommit build has a history of ~233 builds. Looking across[1] the ones that have unit test logs, and treating the string "timeout" as an indicator that a test failed because of a timeout rather than a known bad answer, we have 80 builds with one or more test timeouts.
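(For the curious, here's a rough sketch of that scan, assuming the python-jenkins library and anonymous read access to builds.apache.org. The job name is a guess, and this greps the whole console output rather than just the unit test logs; the gist in [1] is the script actually used.)

```python
# Minimal sketch of the per-host timeout scan, under the assumptions above.
import collections

import jenkins

JOB = 'PreCommit-HBASE-Build'  # assumed job name

server = jenkins.Jenkins('https://builds.apache.org')

builds_per_host = collections.Counter()
timeouts_per_host = collections.Counter()

# fetch_all_builds pulls the full build history, not just the most recent page
for build in server.get_job_info(JOB, fetch_all_builds=True)['builds']:
    number = build['number']
    info = server.get_build_info(JOB, number)
    host = info.get('builtOn') or 'unknown'
    console = server.get_build_console_output(JOB, number)

    builds_per_host[host] += 1
    # Treat any occurrence of "timeout" as "failed because of a timeout
    # rather than a known bad answer", as in the analysis above.
    if 'timeout' in console.lower():
        timeouts_per_host[host] += 1

for host, total in builds_per_host.most_common():
    hit = timeouts_per_host[host]
    print('%-5s %3d builds, %3d with a test timeout (%d%%)'
          % (host, total, hit, 100 * hit // total))
```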
Breaking this down by host ("% timeout" is the share of that host's builds
that hit a test timeout):

| Host | % timeout | Success | Timeout Failure | General Failure |
| ---- | ---------:| -------:| ---------------:| ---------------:|
| H0   |       42% |      10 |              15 |              11 |
| H1   |       54% |       6 |              14 |               6 |
| H2   |       45% |      18 |              35 |              24 |
| H3   |      100% |       0 |               1 |               0 |
| H4   |        0% |       1 |               0 |               2 |
| H5   |       20% |       1 |               1 |               3 |
| H6   |       44% |       4 |               4 |               1 |
| H9   |       35% |       2 |               7 |              11 |
| H10  |       26% |       4 |               8 |              19 |
| H11  |        0% |       0 |               0 |               2 |
| H12  |       43% |       1 |               3 |               3 |
| H13  |       22% |       1 |               2 |               6 |
| H26  |        0% |       0 |               0 |               1 |

It's odd that we so strongly favor H2, but I don't see evidence that we have
a bad host we could simply exclude.

Scaling our concurrency by the number of CPU cores is something surefire can
do (see the sketch after the quoted thread below). Let me see what the H*
hosts look like to figure out some example mappings. Do we have a rough bound
on how many cores a single test using MiniCluster should need? 3?

-busbey

[1]: By "looking across" I mean using the python-jenkins library:
https://gist.github.com/busbey/ff5f7ae3a292164cc110fdb934935c8c

On Mon, Oct 9, 2017 at 4:40 PM, Stack <[email protected]> wrote:
> On Mon, Oct 9, 2017 at 7:38 AM, Sean Busbey <[email protected]> wrote:
>
>> Hi folks!
>>
>> Lately our precommit runs have had a large amount of noise around unit
>> test failures due to timeout, especially for the hbase-server module.
>>
>
> I've not looked at why the timeouts. Anyone? Usually there is a cause.
>
> ...
>
>> I'd really like to get us back to a place where a precommit -1 doesn't
>> just result in a reflexive "precommit is unreliable."
>
> This is the default. The exception is when one of us works on stabilizing
> the test suite. It takes a while and a bunch of effort, but stabilization
> has been doable in the past. Once stable, it stays that way for a while
> before the rot sets in.
>
>> * Do fewer parallel executions. We do 5 tests at once now and the
>> hbase-server module takes ~1.5 hours. We could tune down just the
>> hbase-server module to do fewer.
>
> Is it the loading that is the issue, or tests stamping on each other? If
> the latter, I'd think we'd want to fix it. If the former, we'd want to look
> at that too; I'd think our tests shouldn't be such that they fall over if
> the context is other than 'perfect'.
>
> I've not looked at a machine while five concurrent hbase tests are running.
> Is it even putting up a load? Over the extent of the full test suite? Or is
> it just a few tests that, when run together, cause issues? Could we stagger
> those, give them their own category, or have them burn less brightly?
>
> If tests are failing because of contention for resources, we should fix the
> tests. If we're given a machine, we should burn it up rather than
> pussy-foot around, I'd say (can we size the concurrency off a query of the
> underlying OS so we step by CPUs, say?).
>
> Tests could do with an edit. Generally, tests are written once and then
> never touched again; meantime the system evolves. An edit could look for
> redundancy. It could look for cases where we start clusters
> --time-consuming-- when we don't have to (use mocks or start standalone
> instances instead). We also have some crazy tests that spin up lots of
> clusters inside a single JVM, even though the context is the same as that
> of a simple method evaluation.
>
> St.Ack
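For reference on the surefire knob mentioned above: a minimal pom.xml sketch
using the stock maven-surefire-plugin forkCount "C" multiplier. The 0.5C value
is only an illustrative guess, not what our modules currently set.

```xml
<!-- Sketch only: surefire multiplies a forkCount ending in "C" by the number
     of CPU cores on the build host, so the fork count scales per machine. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <!-- e.g. a 16-core host would run 8 forked test JVMs in parallel -->
    <forkCount>0.5C</forkCount>
    <reuseForks>false</reuseForks>
  </configuration>
</plugin>
```

If the rough bound of ~3 cores per MiniCluster-based test holds, something
closer to 0.33C would match that ceiling instead.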
