The clock sync errors do seem to have increased over the past few months. If we could just fix those, I think we'd be left with almost entirely "known" flakies. Any ideas as to what's going on? Think it's something that could be addressed with your NTP-client-in-Kudu patch series?
On Fri, Mar 23, 2018 at 9:58 AM, Todd Lipcon <[email protected]> wrote:
> It seems that over recent weeks our precommits have gotten somewhat flaky.
> Some of this is due to actual flaky tests (most of which are tracked by
> JIRAs) but a lot has been due to issues like clock synchronization problems
> on the dist-test slaves.
>
> I'd like to consider changing precommit to retry _all_ tests up to 3 times,
> instead of just known-flakies. It's a bit of a heavy hammer -- the risk is
> that if you introduce flakiness in a test you aren't likely to see it
> precommit, but I think the upside of avoiding wasted effort triaging failed
> precommits is probably worth it.
>
> Longer term hopefully we can improve the dist-test software to support
> something like a "retry if results match a certain regex" to check for
> clock sync errors or somesuch, but I think it's non-trivial.
>
> Thoughts?
>
> -Todd
> --
> Todd Lipcon
> Software Engineer, Cloudera
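
For concreteness, the "retry if results match a certain regex" idea quoted above might look roughly like the sketch below. This is not dist-test's actual API: `run_with_retries`, the `run_test` callable, and the `RETRYABLE` pattern are all hypothetical names made up for illustration, and the real clock sync error text may differ. It also assumes "up to 3 times" means 3 attempts in total.

```python
import re
from typing import Callable, Pattern, Tuple

# Hypothetical pattern: the actual clock sync error message may be different.
RETRYABLE = re.compile(r"clock.*unsynchronized", re.IGNORECASE)

# Assumption: "up to 3 times" is read as 3 attempts in total.
MAX_ATTEMPTS = 3


def run_with_retries(
    run_test: Callable[[], Tuple[bool, str]],
    retryable: Pattern[str] = RETRYABLE,
    max_attempts: int = MAX_ATTEMPTS,
) -> Tuple[bool, int]:
    """Run a test, retrying only when the failure output matches `retryable`.

    `run_test` is a hypothetical callable returning (passed, output).
    Returns (passed, number_of_attempts_used).
    """
    for attempt in range(1, max_attempts + 1):
        passed, output = run_test()
        if passed:
            return True, attempt
        if not retryable.search(output):
            # Not an infrastructure-looking failure: report it immediately
            # so newly introduced flakiness is still visible precommit.
            return False, attempt
    # Every attempt failed with a retryable error.
    return False, max_attempts
```

Compared with retrying every test unconditionally, this only masks failures whose output looks like an infrastructure problem, so a genuinely flaky test would still fail on its first attempt.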
