> I do think the 'NTP in kudu' could help a bit here, especially if it were
> only used as a "backup" in case the kernel is unsynchronized. I'm a little
> nervous about the impact on NTP servers, though, in our minicluster based
> tests where we might start and stop tens of thousands of times in the
> course of a 15-minute dist-test run. Wouldn't be surprised if that caused
> us to get blacklisted unless we took some effort to ensure that
> miniclusters "reuse" some NTP state instead of resynchronizing at startup.

If, as you suggested, we limited the user-space NTP client to operate
only when the system ntpd isn't working properly, would that mitigate
the impact on remote NTP servers?

I didn't say this in my first reply, but I really do value precommit's
ability to surface new flaky tests. When that happens, it serves as a
good reminder that my new tests should be run in dist-test. It's more
expensive (in terms of troubleshooting time) to figure that out after
the fact; we don't monitor the flaky test dashboard as much as we
should. That's why I'm interested in finding solutions to the general
"precommit tests failed due to flaky infra" problem.
