On Fri, Mar 23, 2018 at 11:03 AM, Adar Lieber-Dembo <[email protected]> wrote:

> The clock sync errors do seem to have increased over the past few
> months. If we could just fix those, I think we'd be left with almost
> entirely "known" flakies. Any ideas as to what's going on? Think it's
> something that could be addressed with your NTP-client-in-Kudu patch
> series?
>

Nothing has changed in the node configurations as far as I'm aware, so I'm
not sure why it appears to have become more common lately.

I do think the 'NTP in Kudu' work could help a bit here, especially if it
were only used as a "backup" in case the kernel clock is unsynchronized. I'm
a little nervous about the impact on NTP servers, though, in our
minicluster-based tests, where we might start and stop miniclusters tens of
thousands of times over the course of a 15-minute dist-test run. I wouldn't
be surprised if that got us blacklisted unless we made some effort to ensure
that miniclusters "reuse" some NTP state instead of resynchronizing at
startup.


>
> On Fri, Mar 23, 2018 at 9:58 AM, Todd Lipcon <[email protected]> wrote:
> > It seems that over recent weeks our precommits have gotten somewhat flaky.
> > Some of this is due to actual flaky tests (most of which are tracked by
> > JIRAs) but a lot has been due to issues like clock synchronization problems
> > on the dist-test slaves.
> >
> > I'd like to consider changing precommit to retry _all_ tests up to 3 times,
> > instead of just known-flakies. It's a bit of a heavy hammer -- the risk is
> > that if you introduce flakiness in a test you aren't likely to see it
> > precommit, but I think the upside of avoiding wasted effort triaging failed
> > precommits is probably worth it.
> >
> > Longer term hopefully we can improve the dist-test software to support
> > something like a "retry if results match a certain regex" to check for
> > clock sync errors or somesuch, but I think it's non-trivial.
> >
> > Thoughts?
> >
> > -Todd
> > --
> > Todd Lipcon
> > Software Engineer, Cloudera
>



-- 
Todd Lipcon
Software Engineer, Cloudera
