Re: Switching time source to the built-in NTP client

2020-03-16 Thread Adar Lieber-Dembo
I share the two concerns you highlighted, Alexey.
1. This would be a backwards incompatible change.
2. The default NTP server list could be unreachable, or could be a
poorer choice than whatever the cluster is currently using. Grant's
suggestion could mitigate that somewhat, but it's sort of weird for
Kudu to go rooting around in system configuration files, not to
mention the possibility that we could get it wrong.

To that I'll add a third concern:
3. The built-in NTP client has an awful lot of TODOs. Does it work
correctly when NTP servers misbehave? I presume chronyd and ntpd are
battle-tested in this regard.

Taken together, I'd be hesitant to change the default time source, at
least not without more concrete feedback suggesting that it'd be an
improvement for the vast majority of our user base. Ad hoc switching
is always an option when the system time source doesn't work.
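
To make concern 3 a bit more concrete, here is a minimal Python sketch of
the kind of sanity checks an NTP client commonly applies to a server
response before trusting it. It is not Kudu's code and says nothing about
what the built-in client actually does; the function names and the set of
checks are assumptions chosen for illustration. The only protocol facts
relied on are the standard 48-byte NTP header layout (leap indicator and
mode in the first byte, stratum in the second, originate timestamp at
bytes 24-31, transmit timestamp at bytes 40-47).

```python
import struct

NTP_PACKET_LEN = 48        # size of the NTP header without extensions
MODE_SERVER = 4            # mode value used in unicast server replies
LEAP_UNSYNCHRONIZED = 3    # leap indicator value meaning "clock not synchronized"


def check_ntp_response(packet, request_transmit_ts):
    """Return a list of reasons to reject a response (empty list if none).

    request_transmit_ts is the 8-byte transmit timestamp the client put
    into its own request; a well-behaved server echoes it back as the
    originate timestamp.
    """
    if len(packet) < NTP_PACKET_LEN:
        return ["packet shorter than 48 bytes"]
    problems = []
    li_vn_mode, stratum = struct.unpack("!BB", packet[:2])
    leap = li_vn_mode >> 6      # top two bits
    mode = li_vn_mode & 0x07    # bottom three bits
    if mode != MODE_SERVER:
        problems.append("mode is not 4 (server)")
    if leap == LEAP_UNSYNCHRONIZED:
        problems.append("server reports an unsynchronized clock")
    if stratum == 0:
        problems.append("stratum 0 (kiss-o'-death)")
    elif stratum > 15:
        problems.append("stratum out of range")
    if packet[24:32] != request_transmit_ts:
        problems.append("originate timestamp does not echo the request")
    if packet[40:48] == b"\x00" * 8:
        problems.append("transmit timestamp is zero")
    return problems


def make_response(leap=0, mode=4, stratum=2,
                  originate=b"\x01" * 8, transmit=b"\x02" * 8):
    """Build a synthetic 48-byte response for exercising the checks."""
    first = (leap << 6) | (4 << 3) | mode                        # LI | VN=4 | mode
    header = struct.pack("!BB", first, stratum) + b"\x00" * 22   # bytes 0-23
    receive = b"\x02" * 8                                        # bytes 32-39
    return header + originate + receive + transmit
```

A client that skips checks like these would accept, for example, a reply
with stratum 0 or a stale reply to an earlier request, which is the sort
of misbehavior chronyd and ntpd have had years to harden against.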

Re: Switching time source to the built-in NTP client

2020-03-16 Thread Grant Henke
>
> Also, in
> case of Kudu clusters running without access to the internet, it will be
> necessary to point the built-in NTP client to some internal NTP servers
> since pool.ntp.org servers (the default servers for the built-in NTP
> client) might not be accessible.
>

I think this was discussed at some point, but I don't remember the
outcome.
Would it be possible to load the NTP servers from /etc/ntp.conf
or /etc/chrony.conf if --builtin_ntp_servers isn't specified? We could
still fall back to the default pool.ntp.org if no NTP configuration is
found. Looking in two files isn't great, but we could give preference to
chrony.conf.
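
The lookup order proposed above could be sketched roughly as follows.
This is Python for illustration only, not a patch against Kudu. It
assumes the flag value is a comma-separated list, that only `server` and
`pool` directives are of interest, and that `#` starts a comment; the
function names and the DEFAULT_SERVERS placeholder are hypothetical.

```python
from pathlib import Path

# Stand-in for the built-in client's default server list (placeholder value).
DEFAULT_SERVERS = ["pool.ntp.org"]

# chrony.conf is listed first so that it takes preference over ntp.conf.
CONF_PATHS = ("/etc/chrony.conf", "/etc/ntp.conf")


def parse_ntp_servers(conf_text):
    """Extract host names from 'server' and 'pool' lines of a config file."""
    servers = []
    for line in conf_text.splitlines():
        fields = line.split("#", 1)[0].split()   # drop trailing comments
        if len(fields) < 2 or fields[0] not in ("server", "pool"):
            continue
        host = fields[1]
        # ntpd uses 127.127.t.u pseudo-addresses for local reference clocks;
        # those are not network servers, so skip them.
        if host.startswith("127.127."):
            continue
        servers.append(host)
    return servers


def resolve_ntp_servers(flag_value, conf_paths=CONF_PATHS):
    """Pick servers from the flag, else the first usable config, else defaults."""
    if flag_value:
        return [s.strip() for s in flag_value.split(",") if s.strip()]
    for path in conf_paths:
        try:
            text = Path(path).read_text()
        except OSError:
            continue          # missing or unreadable file: try the next one
        servers = parse_ntp_servers(text)
        if servers:
            return servers
    return list(DEFAULT_SERVERS)
```

Even this small sketch needs a special case for ntpd's reference-clock
addresses, and it ignores include directives and per-server options, so
any real implementation would have more ways to guess wrong.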

Thank you,
Grant


-- 
Grant Henke
Software Engineer | Cloudera
gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke


Switching time source to the built-in NTP client

2020-03-16 Thread Alexey Serbin
Hi,

I'd like to get feedback on the subject, please.

The built-in NTP client for Kudu masters and tablet servers was introduced
in Kudu 1.11.0.  Back then, there were thoughts of switching to the
built-in client by default starting Kudu 1.12.

Since the 1.12 release branch will be cut pretty soon, I think it's a
good opportunity to clarify whether we want to make that change or
keep the time source as is (i.e. 'system') in the 1.12 release.

For more context, the built-in NTP client has been used to run external
mini-cluster-based test scenarios for every Gerrit pre-commit build since
the 1.11.0 release.  In addition, I ran two 6-node clusters in a public
cloud for a few weeks with a basic write/read workload ('kudu perf
loadgen' with the --run_scan option).  So far I've seen no issues
there.  As for use in a production environment, at this point I'm not
aware of any Kudu clusters running in production using the built-in NTP
client.

The benefit of the built-in NTP client is that it allows running
Kudu without the requirement of having the local machines' clocks
synchronized by the kernel NTP discipline.  That might benefit newer Kudu
installations where machines' clocks are not synchronized out of the box
and users are not keen on performing the extra step of deploying NTP
servers (and configuring them appropriately if the default configuration
is not good enough -- e.g., in case of firewalled internal clusters).

If we switch to the 'builtin' time source by default (i.e. use the built-in
NTP client), existing installations running with the 'system' time source
will need to add an extra flag if they want to stay with the 'system'
time source after the upgrade to 1.12.  In that regard, the update would
not be backwards-compatible, but Kudu users should not care much about the
clock source assuming the built-in NTP client is reliable enough.  Also, in
case of Kudu clusters running without access to the Internet, it will be
necessary to point the built-in NTP client to some internal NTP servers
since pool.ntp.org servers (the default servers for the built-in NTP
client) might not be accessible.
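
As a rough illustration of the two upgrade cases just described, the
Python sketch below composes the extra command-line flags a deployment
would pass to the masters and tablet servers. It assumes the time source
is selected with a --time_source flag and that --builtin_ntp_servers
takes a comma-separated list; the helper name and the internal host names
are hypothetical.

```python
def time_source_flags(source, ntp_servers=None):
    """Return the extra Kudu flags for the chosen time source.

    source: 'system' to keep relying on the kernel NTP discipline,
            'builtin' to use the built-in NTP client.
    ntp_servers: optional list of NTP server host names; only used with
            'builtin', e.g. for clusters that cannot reach pool.ntp.org.
    """
    if source not in ("system", "builtin"):
        raise ValueError("unsupported time source: %r" % (source,))
    flags = ["--time_source=" + source]
    if source == "builtin" and ntp_servers:
        flags.append("--builtin_ntp_servers=" + ",".join(ntp_servers))
    return flags


# Existing installation that wants to keep its current behavior:
keep_system = time_source_flags("system")

# Isolated cluster pointing the built-in client at internal servers
# (host names below are made up for the example):
isolated = time_source_flags(
    "builtin", ["ntp1.internal.example", "ntp2.internal.example"])
```

In both cases the operator has to touch the configuration during the
upgrade, which is the backwards-compatibility cost in question.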

So, it seems enabling the built-in NTP client by default could benefit
newer installations, but might require extra configuration steps for
existing Kudu deployments where pool.ntp.org NTP servers are not
accessible.  The latter step should be described in the release notes for
the 1.12 release, of course.  Also, there is some risk of hitting a
not-yet-detected bug in the built-in NTP client.

Do you think the benefits of removing the requirement to have the local
clock synchronized by a local NTP server outweigh the drawbacks of adding
an extra configuration step during the 1.12 upgrade for Kudu clusters
isolated from the Internet?

Your feedback is highly appreciated!


Thanks,

Alexey


P.S. I sent the original message one week ago, but it seems it went into
a spam folder or the like, so I'm re-sending it.