Thanks for the detailed summary. I agree with the goal of using the same time source across all tests if possible. Originally, I was hoping this would be 'builtin', but that'd require deploying MiniChronyd too, both to avoid a dependency on Internet connectivity and to ensure that clock synchronization at test startup is sufficiently fast. And, like you said, not only is that a good chunk of work, it's also somewhat antithetical to those pure "in-proc" test setups.
So I reluctantly agree that 'system_unsync' may be the next best option, with targeted uses of 'system' and 'builtin' for testing specific functionality. That said, is it safe that 'system_unsync' uses CLOCK_REALTIME under the hood, which is affected by NTP and admin time changes? On Sun, Oct 6, 2019 at 11:27 PM Alexey Serbin <aser...@cloudera.com.invalid> wrote: > > Hi, > > Recently, built-in NTP client was been introduced in Kudu. Todd Lipcon > put together the original WIP patch a couple of years ago [1], and about a > week > ago the built-in NTP client was merged into the master branch of the Kudu > repo. > With that, a new time source is now available for Kudu masters and tablet > servers: the built-in NTP client. > > With the introduction of the new time source for Kudu components, we have > had a few offline discussions about using different time sources in Kudu > tests. That's not only about providing test coverage for the newly > introduced > built-in NTP client, but for all other tests as well. Last week Adar and I > talked about that offline one more time. I'll try to summarize few key > points in this e-mail message. > > With the introduction of the built-in NTP client, a significant part of > tests > was switched to run with it as the time source. Particularly, all tests > based on ExternalMiniCluster are now run with built-in NTP client with > commit > 03e2ada69 [2]. The idea is to have more coverage for the newly added > functionality, especially given the fact that at some point we might switch > to use the built-in NTP client by default instead of relying on the local > machine clock synchronized by the system NTP service. > > There are many other Kudu tests (e.g., tests based on InternalMiniCluster) > which still require the machine clock to be synchronized with NTP to run. > Ideally, it would be great to run all Kudu tests with using same time source > unless they are specifically targeted to verify functionality of a > particular > time source. An example of the latter are tests which cover the > functionality > of the built-in NTP client itself. > > Prior to the introduction of the built-in NTP client, almost all Kudu tests > were running against the local system clock synchronized with NTP, at least > on Linux. The only exception were tests for read-your-writes and other > consistency guarantees: those use mock time source to control the time > in a very specific way. > > In retrospective, it was great to have a single time source because we > always > knew that the same time source was used for almost all of our tests. Also, > the time source used in Kudu tests was the same as in Kudu masters and > tablet > servers running in real distributed clusters. From the other side, that > required local machine's clock to be synchronized with NTP to run those > tests, > and the tests would fail if the time was not synchronized with NTP. > In addition, as far as I know, there are no Kudu tests which are targeted > to detect inconsistencies due to irregularities in the time source or > non-synchronized clocks between different Kudu nodes. Of course, we have > Jepsen-based tests, but that's another story: we are not talking about > Jepsen > tests here, only C++ and Java tests based on gtest and JUnit frameworks. > > Now, here comes one observation: all the components of above mentioned tests > are running at the same machine during execution of corresponding scenarios. > If all of them are using the local machine's clock, then there is no need > to use NTP or other synchronisation techniques designed to synchronize > multiple clocks. > > So, here comes the question: why to require the local clock to be > synchronized > with NTP for tests? Yes, we need to verify that the 'system' time source > works for HybridTime timestamps, but as for the functionality of a generic > Kudu test which is simply based on the HybridTime timestamps without any > knowledge of the underlying time source, why is it important? > > It seems we can safely change the time source in those generic tests to be > 'system_unsync': local machine clock which is not necessarily synchronized > by > the kernel NTP discipline (see SystemUnsyncTime in > $KUDU_ROOT/src/kudu/clock/system_unsync_time.{h,cc} for details). > > Once switched to 'system_unsync' time source for tests, in addition it will > be necessary to add new dedicated tests scenarios such as: > * Smoke scenarios: using the system NTP time source for Kudu clusters. > * Smoke scenarios: using the built-in NTP time source for Kudu clusters. > * Advanced scenarios to detect issues due to irregularities in time source > (like time jumping back and forth, etc.). > > What do you think? Your feedback is appreciated. > > Thank you! > > > > Kind regards, > > Alexey > > [1] http://gerrit.cloudera.org:8080/7477 > [2] > https://github.com/apache/kudu/commit/03e2ada694290cafce0bea6ebbf092709aa64a2a