On Fri, Mar 23, 2018 at 11:03 AM, Adar Lieber-Dembo <[email protected]> wrote:
> The clock sync errors do seem to have increased over the past few
> months. If we could just fix those, I think we'd be left with almost
> entirely "known" flakies. Any ideas as to what's going on? Think it's
> something that could be addressed with your NTP-client-in-Kudu patch
> series?
>

Nothing has changed in the node configurations as far as I'm aware, so not
sure why it appears to have become more common lately. I do think the 'NTP
in kudu' could help a bit here, especially if it were only used as a
"backup" in case the kernel is unsynchronized. I'm a little nervous about
the impact on NTP servers, though, in our minicluster based tests where we
might start and stop tens of thousands of times in the course of a
15-minute dist-test run. Wouldn't be surprised if that caused us to get
blacklisted unless we took some effort to ensure that miniclusters "reuse"
some NTP state instead of resynchronizing at startup.

>
> On Fri, Mar 23, 2018 at 9:58 AM, Todd Lipcon <[email protected]> wrote:
> > It seems that over recent weeks our precommits have gotten somewhat
> > flaky. Some of this is due to actual flaky tests (most of which are
> > tracked by JIRAs) but a lot has been due to issues like clock
> > synchronization problems on the dist-test slaves.
> >
> > I'd like to consider changing precommit to retry _all_ tests up to 3
> > times, instead of just known-flakies. It's a bit of a heavy hammer --
> > the risk is that if you introduce flakiness in a test you aren't likely
> > to see it precommit, but I think the upside of avoiding wasted effort
> > triaging failed precommits is probably worth it.
> >
> > Longer term hopefully we can improve the dist-test software to support
> > something like a "retry if results match a certain regex" to check for
> > clock sync errors or somesuch, but I think it's non-trivial.
> >
> > Thoughts?
> >
> > -Todd
> > --
> > Todd Lipcon
> > Software Engineer, Cloudera
>

--
Todd Lipcon
Software Engineer, Cloudera
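
To make the "retry if results match a certain regex" idea from the quoted message a bit more concrete, here is a rough Python sketch. It is not how dist-test works today. The function name, the `run_test` callback, and the error pattern are all hypothetical, since the exact clock sync error text isn't given in this thread. The intent is only to show that failures matching the pattern get retried (up to 3 attempts) while any other failure is reported right away, so newly introduced flakiness would still show up in precommit.

```python
import re
from typing import Callable, Pattern, Tuple

# Hypothetical pattern: stands in for whatever the real clock sync
# error message looks like in the test logs.
TRANSIENT_PATTERN = re.compile(
    r"clock.*(unsynchronized|not synchronized)", re.IGNORECASE
)


def run_with_retries(
    run_test: Callable[[], Tuple[bool, str]],
    max_attempts: int = 3,
    pattern: Pattern[str] = TRANSIENT_PATTERN,
) -> Tuple[bool, int]:
    """Run a test, retrying only failures whose output matches `pattern`.

    `run_test` is a hypothetical callback returning (passed, output).
    Returns (passed, attempts_used).
    """
    for attempt in range(1, max_attempts + 1):
        passed, output = run_test()
        if passed:
            return True, attempt
        if not pattern.search(output):
            # Not an environmental failure: report it immediately.
            return False, attempt
    # Every attempt hit the transient error.
    return False, max_attempts
```

Compared with retrying _all_ tests, this keeps genuine test failures visible on the first attempt, at the cost of having to maintain the list of patterns that count as environmental.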
