Hi,

I'd like to get feedback on whether to make the built-in NTP client the
default time source in Kudu 1.12.

The built-in NTP client for Kudu masters and tablet servers was introduced
in Kudu 1.11.0.  Back then, there were thoughts of switching to the
built-in client by default starting Kudu 1.12.

Since the 1.12 release branch is going to be cut pretty soon, I think it's
a good opportunity to clarify whether we want to make that change or keep
the time source as is in the 1.12 release.

For more context, the built-in NTP client has been used to run external
mini-cluster-based test scenarios in gerrit pre-commit builds since the
1.11.0 release.  In addition, I ran a 6-node cluster for a few weeks in a
public cloud with a basic write/read workload ('kudu perf loadgen' with the
--run_scan option).  So far I've seen no issues there.  As for the use in a
production environment, at this point I'm not aware of any Kudu clusters
running in production using the built-in NTP client.

The benefit of the built-in NTP client is that it allows running
Kudu without the requirement of having the local machines' clocks
synchronized by the kernel NTP discipline.  That might benefit newer Kudu
installations where machines' clocks are not synchronized out-of-the-box
and users struggle to deploy NTP servers (and configure them appropriately
if the default configuration is not good enough -- e.g., in case of
firewalled internal clusters).

If we switch to the 'builtin' time source by default (i.e. use the built-in
NTP client), existing installations running with the 'system' time source
will need to add an extra flag if they want to stay with the 'system'
time source after the upgrade to 1.12.  In that regard, the update would
not be backwards-compatible, but Kudu users should not care much about the
clock source assuming the built-in NTP client is reliable enough.  Also, in
case of Kudu clusters running without access to the internet, it will be
necessary to point the built-in NTP client to some internal NTP servers
since pool.ntp.org servers (the default servers for the built-in NTP
client) might not be accessible.
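
For illustration, here is a rough sketch of what that extra configuration
might look like in a flag file for masters and tablet servers, assuming the
--time_source and --builtin_ntp_servers flags keep their current names and
semantics in 1.12.  The NTP server host names below are made up, and only
one of the two alternatives should be active at a time:

```
# Alternative 1: keep using the kernel NTP discipline, as before the upgrade.
--time_source=system

# Alternative 2 (commented out): use the built-in NTP client, but point it
# at internal NTP servers instead of the default pool.ntp.org ones.
# The host names are hypothetical placeholders.
#--time_source=builtin
#--builtin_ntp_servers=ntp1.example.internal,ntp2.example.internal
```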

So, it seems enabling the built-in NTP client by default could benefit
newer installations, but might require extra configuration steps for
existing Kudu deployments where pool.ntp.org NTP servers are not
accessible.  The latter step should be described in the release notes for
the 1.12 release, of course.  Also, there is some risk of hitting a
not-yet-detected bug in the built-in NTP client.

Do you think the benefits of removing the requirement to have the local
clock synchronized outweigh the drawbacks of adding an extra configuration
step during the 1.12 upgrade for Kudu clusters isolated from the Internet?

Your feedback is highly appreciated!


Thanks,

Alexey
