> I do think the 'NTP in kudu' could help a bit here, especially if it were
> only used as a "backup" in case the kernel is unsynchronized. I'm a little
> nervous about the impact on NTP servers, though, in our minicluster based
> tests where we might start and stop tens of thousands of times in the
> course of a 15-minute dist-test run. Wouldn't be surprised if that caused
> us to get blacklisted unless we took some effort to ensure that
> miniclusters "reuse" some NTP state instead of resynchronizing at startup.

If, as you suggested, we limited the user-space NTP client to cases where the system ntpd isn't working properly, would that mitigate the impact on remote NTP servers?

I didn't say this in my first reply, but I really do value precommit's ability to surface new flaky tests. When that happens, it's a good reminder that my new tests should be run in dist-test. It's more expensive (in terms of troubleshooting time) to figure that out after the fact; we don't monitor the flaky test dashboard as much as we should. That's why I'm interested in finding solutions to the general "precommit tests failed due to flaky infra" problem.
