Bug#924192: ntpsec: at boot ntpsec starts before DNS is available
Thanks, yes, that did the trick. I've installed it on one of my test computers and it seems to work fine. I've tried a couple of things that should trigger the bug, if it's still there. In particular, rebooting the computer and then examining the journal shows that it does get to poll all of the configured pools.

Another experiment I tried:

1) Boot the computer and let everything settle.
2) Unplug the ethernet (RJ45).
3) Restart ntpsec and watch the journal for evidence of attempts to contact the configured pools (which fail, since the ethernet is disconnected).
4) Replug the ethernet and continue to watch the journal for attempts to contact the pools (which succeed this time).

In this experiment, I saw no evidence of the bug, i.e. no evidence of having forgotten all but one of the configured pools.

Thanks for all your help!
Rick

> On Mar 25, 2019, at 4:30 PM, Richard Laager wrote:
>
>> sudo apt update
>> sudo apt install build-essential devscripts
>> sudo apt build-dep ntpsec
>> apt source ntpsec
>> cd ntpsec-1.1.3+dfsg1
>> patch -p1 < ~/ntpsec_1.1.3+dfsg1-3~1.gbpdde9c0.debdiff
>
> I missed a step here, as `apt source` applies the existing patches, so
> we apparently need to apply this patch too:
>
> patch -p1 < debian/patches/0001-Fix-for-577-DNS-retry-sloth.patch
>
>> debuild -uc -us
>
> If you can try this and confirm this patch as packaged fixes your
> problem, that would help me move forward with this.
>
> --
> Richard
Bug#924192: ntpsec: at boot ntpsec starts before DNS is available
> sudo apt update
> sudo apt install build-essential devscripts
> sudo apt build-dep ntpsec
> apt source ntpsec
> cd ntpsec-1.1.3+dfsg1
> patch -p1 < ~/ntpsec_1.1.3+dfsg1-3~1.gbpdde9c0.debdiff

I missed a step here, as `apt source` applies the existing patches, so we apparently need to apply this patch too:

patch -p1 < debian/patches/0001-Fix-for-577-DNS-retry-sloth.patch

> debuild -uc -us

If you can try this and confirm this patch as packaged fixes your problem, that would help me move forward with this.

--
Richard
Bug#924192: ntpsec: at boot ntpsec starts before DNS is available
Hi Richard,

I'm sorry that I was not able to try that patch for you. However, I was able to download and (with a lot of help from Hal Murray) build the latest git version. It worked a treat and properly handled the time lag between ntpsec starting and dhcp finishing.

Let me know if there's anything more I can do to help get this fix into production.

Thanks very much for all your help!
Enjoy!
Rick

> On Mar 19, 2019, at 10:18 AM, Richard Laager wrote:
>
> Attached is an untested debdiff. This is the upstream change refreshed
> to apply to the package. You should be able to apply it and build a
> package locally like this:
>
> sudo apt update
> sudo apt install build-essential devscripts
> sudo apt build-dep ntpsec
> apt source ntpsec
> cd ntpsec-1.1.3+dfsg1
> patch -p1 < ~/ntpsec_1.1.3+dfsg1-3~1.gbpdde9c0.debdiff
> debuild -uc -us
>
> Can you try that and see if it fixes the issue for you?
>
> I'm sorry I'm short on time today and couldn't do further testing
> myself. I hope this helps.
>
> --
> Richard
Bug#924192: ntpsec: at boot ntpsec starts before DNS is available
> On Mar 19, 2019, at 10:18 AM, Richard Laager wrote:
>
> Attached is an untested debdiff. This is the upstream change refreshed
> to apply to the package. You should be able to apply it and build a
> package locally like this:
>
> sudo apt update
> sudo apt install build-essential devscripts
> sudo apt build-dep ntpsec
> apt source ntpsec
> cd ntpsec-1.1.3+dfsg1
> patch -p1 < ~/ntpsec_1.1.3+dfsg1-3~1.gbpdde9c0.debdiff
> debuild -uc -us
>
> Can you try that and see if it fixes the issue for you?
>
> I'm sorry I'm short on time today and couldn't do further testing
> myself. I hope this helps.
>
> --
> Richard

Thanks for the simple and very clear instructions! Unfortunately, here's what I get when I do the above steps… Did I miss something?

Rick

> rbthomas@cube:~/make-ntpsec/ntpsec-1.1.3+dfsg1$ debuild -uc -us
>  dpkg-buildpackage -us -uc -ui
> dpkg-buildpackage: info: source package ntpsec
> dpkg-buildpackage: info: source version 1.1.3+dfsg1-3~1.gbpdde9c0
> dpkg-buildpackage: info: source distribution UNRELEASED
> dpkg-buildpackage: info: source changed by Richard Laager
>  dpkg-source --before-build .
> dpkg-buildpackage: info: host architecture armhf
>  debian/rules clean
> dh clean --with apache2,python3
> debian/rules override_dh_auto_clean
> make[1]: Entering directory '/home/rbthomas/make-ntpsec/ntpsec-1.1.3+dfsg1'
> python3 waf clean || true
> --- cleaning host ---
> The project was not configured: run "waf configure" first!
> rm -rf \
>     build \
>     .lock-waf_*_build \
>     ntpclients/ntp \
>     ntpd/version.h \
>     pylib/control.py \
>     pylib/magic.py \
>     pylib/version.py \
>     .waf* \
>     wafhelpers/autorevision.sh
> find -name "*.pyc" -delete
> make[1]: Leaving directory '/home/rbthomas/make-ntpsec/ntpsec-1.1.3+dfsg1'
> dh_clean
>  dpkg-source -b .
> dpkg-source: info: using source format '3.0 (quilt)'
> dpkg-source: info: building ntpsec using existing ./ntpsec_1.1.3+dfsg1.orig.tar.gz
> dpkg-source: info: using patch list from debian/patches/series
> dpkg-source: warning: ignoring deletion of directory build
> dpkg-source: warning: ignoring deletion of directory build/main
> dpkg-source: warning: ignoring deletion of directory build/main/ntpd
> dpkg-source: warning: ignoring deletion of file build/main/ntpd/ntpd.8, use --include-removal to override
> dpkg-source: warning: ignoring deletion of file build/main/ntpd/ntp.keys.5, use --include-removal to override
> dpkg-source: warning: ignoring deletion of file build/main/ntpd/ntp.conf.5, use --include-removal to override
> dpkg-source: warning: ignoring deletion of directory build/main/ntptime
> dpkg-source: warning: ignoring deletion of file build/main/ntptime/ntptime.8, use --include-removal to override
> dpkg-source: warning: ignoring deletion of directory build/main/ntpfrob
> dpkg-source: warning: ignoring deletion of file build/main/ntpfrob/ntpfrob.8, use --include-removal to override
> dpkg-source: warning: ignoring deletion of directory build/main/ntpclients
> dpkg-source: warning: ignoring deletion of file build/main/ntpclients/ntpviz.1, use --include-removal to override
> dpkg-source: warning: ignoring deletion of file build/main/ntpclients/ntpdig.1, use --include-removal to override
> dpkg-source: warning: ignoring deletion of file build/main/ntpclients/ntpleapfetch.8, use --include-removal to override
> dpkg-source: warning: ignoring deletion of file build/main/ntpclients/ntpmon.1, use --include-removal to override
> dpkg-source: warning: ignoring deletion of file build/main/ntpclients/ntpq.1, use --include-removal to override
> dpkg-source: warning: ignoring deletion of file build/main/ntpclients/ntptrace.1, use --include-removal to override
> dpkg-source: warning: ignoring deletion of file build/main/ntpclients/ntpwait.8, use --include-removal to override
> dpkg-source: warning: ignoring deletion of file build/main/ntpclients/ntpkeygen.8, use --include-removal to override
> dpkg-source: warning: ignoring deletion of file build/main/ntpclients/ntploggps.1, use --include-removal to override
> dpkg-source: warning: ignoring deletion of file build/main/ntpclients/ntpsnmpd.8, use --include-removal to override
> dpkg-source: warning: ignoring deletion of file build/main/ntpclients/ntpsweep.1, use --include-removal to override
> dpkg-source: warning: ignoring deletion of file build/main/ntpclients/ntplogtemp.1, use --include-removal to override
> dpkg-source: info: local changes detected, the modified files are:
>  ntpsec-1.1.3+dfsg1/ntpd/ntp_proto.c
> dpkg-source: error: aborting due to unexpected upstream changes, see /tmp/ntpsec_1.1.3+dfsg1-3~1.gbpdde9c0.diff.6WnYJh
> dpkg-source: info: you can integrate the local changes with dpkg-source --commit
> dpkg-buildpackage: error: dpkg-source -b . subprocess returned exit status 2
> debuild: fatal error at line 1182:
> dpkg-buildpackage -us -uc -ui failed
Bug#924192: ntpsec: at boot ntpsec starts before DNS is available
Thanks! I'll give it a try tonight…
Rick

> On Mar 19, 2019, at 10:18 AM, Richard Laager wrote:
>
> Attached is an untested debdiff. This is the upstream change refreshed
> to apply to the package. You should be able to apply it and build a
> package locally like this:
>
> sudo apt update
> sudo apt install build-essential devscripts
> sudo apt build-dep ntpsec
> apt source ntpsec
> cd ntpsec-1.1.3+dfsg1
> patch -p1 < ~/ntpsec_1.1.3+dfsg1-3~1.gbpdde9c0.debdiff
> debuild -uc -us
>
> Can you try that and see if it fixes the issue for you?
>
> I'm sorry I'm short on time today and couldn't do further testing
> myself. I hope this helps.
>
> --
> Richard
Bug#924192: ntpsec: at boot ntpsec starts before DNS is available
Attached is an untested debdiff. This is the upstream change refreshed to apply to the package. You should be able to apply it and build a package locally like this:

sudo apt update
sudo apt install build-essential devscripts
sudo apt build-dep ntpsec
apt source ntpsec
cd ntpsec-1.1.3+dfsg1
patch -p1 < ~/ntpsec_1.1.3+dfsg1-3~1.gbpdde9c0.debdiff
debuild -uc -us

Can you try that and see if it fixes the issue for you?

I'm sorry I'm short on time today and couldn't do further testing myself. I hope this helps.

--
Richard

diff -Nru ntpsec-1.1.3+dfsg1/debian/changelog ntpsec-1.1.3+dfsg1/debian/changelog
--- ntpsec-1.1.3+dfsg1/debian/changelog	2019-02-04 01:38:48.0 -0600
+++ ntpsec-1.1.3+dfsg1/debian/changelog	2019-03-19 12:07:34.0 -0500
@@ -1,3 +1,11 @@
+ntpsec (1.1.3+dfsg1-3~1.gbpdde9c0) UNRELEASED; urgency=medium
+
+  ** SNAPSHOT build @dde9c0d5f1cc978b7542c64b660465b6ecd0d0fa **
+
+  * Backport fix for slow DNS retries (Closes: 924192)
+
+ -- Richard Laager  Tue, 19 Mar 2019 12:07:34 -0500
+
 ntpsec (1.1.3+dfsg1-2) unstable; urgency=medium
 
   * Suppress lintian warning
diff -Nru ntpsec-1.1.3+dfsg1/debian/patches/0001-Fix-for-577-DNS-retry-sloth.patch ntpsec-1.1.3+dfsg1/debian/patches/0001-Fix-for-577-DNS-retry-sloth.patch
--- ntpsec-1.1.3+dfsg1/debian/patches/0001-Fix-for-577-DNS-retry-sloth.patch	1969-12-31 18:00:00.0 -0600
+++ ntpsec-1.1.3+dfsg1/debian/patches/0001-Fix-for-577-DNS-retry-sloth.patch	2019-03-19 11:59:46.0 -0500
@@ -0,0 +1,56 @@
+From bf3dfbe30ad16b4d345dfe9d6c6d842d9321355f Mon Sep 17 00:00:00 2001
+From: Hal Murray
+Date: Sat, 16 Mar 2019 11:07:41 -0700
+Subject: [PATCH] Fix for #577, DNS retry sloth
+
+There is only one thread for DNS (and NTS-KE) work. If an attempt
+was made while the thread was busy, it waited for the retry timer
+rather than trying again as soon as the previous DNS work finished.
+---
+ ntpd/ntp_proto.c | 20 +++++++++++++++++---
+ 1 file changed, 17 insertions(+), 3 deletions(-)
+
+--- a/ntpd/ntp_proto.c
++++ b/ntpd/ntp_proto.c
+@@ -811,7 +811,11 @@
+ 		if ((peer_associations <= 2 * sys_maxclock) &&
+ 		    (peer_associations < sys_maxclock ||
+ 		     sys_survivors < sys_minclock))
+-			if (!dns_probe(peer)) return;
++			if (!dns_probe(peer)) {
++				/* DNS thread busy, try again soon */
++				peer->nextdate = current_time;
++				return;
++			}
+ 		poll_update(peer, hpoll);
+ 		return;
+ 	}
+@@ -819,7 +823,10 @@
+ 	/* Does server need DNS lookup? */
+ 	if (peer->cfg.flags & FLAG_DNS) {
+ 		peer->outdate = current_time;
+-		if (!dns_probe(peer)) return;
++		if (!dns_probe(peer)) {
++			peer->nextdate = current_time;
++			return;
++		}
+ 		poll_update(peer, hpoll);
+ 		return;
+ 	}
+@@ -2419,8 +2426,15 @@
+ 		hpoll = 8;
+ 		break;
+ 	case DNS_temp:
++		/* DNS not working yet. ??
++		 * Want to retry soon,
++		 * but also want to avoid log clutter.
++		 * Beware, Fedora 29 lies:
++		 *  What I expect to be temp (no Wifi)
++		 *  gets EAI_NONAME, Name or service not known
++		 */
+ 		txt = "temp";
+-		hpoll += 1;
++		hpoll = 3;
+ 		break;
+ 	case DNS_error:
+ 		txt = "error";
diff -Nru ntpsec-1.1.3+dfsg1/debian/patches/series ntpsec-1.1.3+dfsg1/debian/patches/series
--- ntpsec-1.1.3+dfsg1/debian/patches/series	2019-02-04 01:37:36.0 -0600
+++ ntpsec-1.1.3+dfsg1/debian/patches/series	2019-03-19 11:47:37.0 -0500
@@ -42,6 +42,8 @@
 0001-Use-.egg-info-for-the-Python-info-file.patch
 ## Fix a spelling error
 0001-Fix-a-typo.2.patch
+## Fix DNS timeouts
+0001-Fix-for-577-DNS-retry-sloth.patch
 # Forwarded
 0001-Add-Documentation-to-ntp-wait.service.patch
Bug#924192: ntpsec: at boot ntpsec starts before DNS is available
> On Mar 14, 2019, at 1:04 AM, Richard Laager wrote:
>
> I forwarded your bug upstream:
> https://gitlab.com/NTPsec/ntpsec/issues/577

Hi!

I'm sorry to take so long getting back. I wanted to re-do my experiments in a standard environment that you would be able to reproduce easily. Here are the results.

The system is a Cubox-I4Pro (2 GB RAM, quad-core ArmHF processor). The OS is fresh-out-of-the-box Debian Buster with a default "multi-user" text console configuration (i.e. no GUI).

Things I have done to show the problem:

1) I modified /etc/ntpsec/ntp.conf to use only two of the four available "debian.pool.ntp.org" pools. In my original use-case, one of these pools would be on the local intranet, while the other would be an internet pool to act as ballast in case the local intranet pool has problems.

2) I further modified the ntp.conf to have "tos minclock 7 minsane 5". The reason for this is to force it to use some servers from the "backup" pool, to allow a smooth transition in the above-mentioned failure case.

3) To monitor the behavior under this setup, I run an "@reboot" cron job that consists of the following shell script:

---- cut here ----
#!/bin/bash
for i in `seq 1 130`
do
    echo -n "$i"; date; /usr/bin/ntpq -pn ; /usr/bin/ntpstat
    echo
    sleep 30
done > /tmp/monitor.$$.out 2>&1
journalctl -b | egrep 'ntp|eth' > /tmp/journal.$$.out
---- cut here ----

I have attached copies of the monitor and journal output files from this script.

Highlights of the results from the monitor file:

- At 02:34:27 the script starts as part of the reboot process. ntpsec has not started yet.
- By 02:34:58 (31 seconds later) ntpsec has started and we have contacted four servers from one of the pools, but due to minsane=5 it is unable to synchronize. Note that the time taken from the system's hardware clock at reboot is about a half second off from network time from these servers.
- At 02:38:32 (about the 4 minute mark) this situation has continued unabated: the same four servers, without any progress synchronizing. System time is still a half second off from network time.
- At 02:39:01 (29 seconds later) something has happened to cause us to contact another group of 4 servers. Also, the system clock has been stepped to pick up the half second.
- At 02:39:32 (about the 5 minute mark) we're starting to get results from the (now) eight servers on our list, and we have finally achieved useful synchronization.

Highlights of the results from the journal file:

- At 02:34:27 ntpd starts, but DHCP hasn't yet got an IP address for the ethernet port.
- At 02:34:30 the ethernet link comes up. In the next few seconds, ntpd unsuccessfully tries to contact first 0.debian.pool.ntp.org, then 1.debian.pool.ntp.org (twice).
- At 02:34:35 DHCP finally gets an answer and an IP address is assigned to the ethernet port.
- At 02:34:36 ntpd tries 1.debian.pool.ntp.org for a third time (ignoring 0.debian.pool.ntp.org for some reason). This time it succeeds and gets four server addresses, as we see in the monitor log at 02:34:58. Then nothing happens for about 4 and a half minutes.
- At 02:38:52 ntpd tries 1.debian.pool.ntp.org again (still no mention of 0.debian.pool.ntp.org) and gets four different server addresses, because the timer at the DNS server expired and caused it to remix the server addresses.
- At 02:38:58 ntpd steps the clock to pick up the missing half second.
- Nothing happens for the next roughly 25 minutes, until the test period expires.

Observations:

- The system spent its first 4.5 minutes of life with an unsynchronized clock.
- ntpd never used the 0.debian.pool.ntp.org pool at all. It seems to have been completely forgotten after the first failed attempt.

Conclusions (things I'd like to see in future versions):

- As long as minsane or minclock are unsatisfied, I'd like to see it attempting to use all the servers and pools at its disposal, not just the single most recently seen one.
- I'd also like to see it trying to contact DNS more frequently than once every 4.5 minutes.

Enjoy!
Rick
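[Editorial aside: tracing these journal events by eye is tedious. A small helper like the following (my own sketch, not part of the report; it assumes the "DNS: dns_take_status: <name>=><result>" log format shown later in this thread) condenses a saved journal dump into per-pool DNS outcome counts, which makes a forgotten pool stand out at a glance:]

```shell
# Count DNS lookup outcomes per configured pool from a saved journal file,
# e.g. the /tmp/journal.$$.out produced by the monitor script above.
# Matches lines like:  ... DNS: dns_take_status: 1.debian.pool.ntp.org=>good, 8
summarize_dns() {
    grep -o 'dns_take_status: [^=]*=>[a-z]*' "$1" |
        sed 's/dns_take_status: //' |
        sort | uniq -c
}
```

Against the journal described above, 1.debian.pool.ntp.org would show several entries while 0.debian.pool.ntp.org would show only the initial failure.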
Bug#924192: ntpsec: at boot ntpsec starts before DNS is available
I forwarded your bug upstream:
https://gitlab.com/NTPsec/ntpsec/issues/577

Getting back to the network-online thing for a bit... You tried adding network-online.target as a dependency of ntpsec.service, for example, by something like this (as root)?

mkdir -p /etc/systemd/system/ntpsec.service.d
cat > /etc/systemd/system/ntpsec.service.d/network.conf << EOF
[Unit]
After=network-online.target
Wants=network-online.target
EOF
systemctl daemon-reload

How are you configuring your network (NetworkManager, systemd-networkd, or ifupdown), and what do these return?

systemctl is-enabled NetworkManager-wait-online.service
systemctl is-enabled systemd-networkd-wait-online.service
systemctl is-enabled ifupdown-wait-online.service

If the right one is not enabled, what happens if you try enabling it? Does that (plus the drop-in above to add the network-online dependency) cause ntpd to wait until the network is actually up?

--
Richard
Bug#924192: ntpsec: at boot ntpsec starts before DNS is available
On 3/11/19 9:50 PM, Rick Thomas wrote:
> Would unplugging/replugging the ethernet cable work for steps 2 and 4…

Sure, that'd be another option.

--
Richard
Bug#924192: ntpsec: at boot ntpsec starts before DNS is available
Thanks Richard,

I'll see if I can set up that experiment(s) and report back ASAP.

Would unplugging/replugging the ethernet cable work for steps 2 and 4…

Enjoy!
Rick

> On Mar 11, 2019, at 2:53 PM, Richard Laager wrote:
>
> Can we narrow this down to a reproducer? It sounds like something like
> this would work:
>
> 0) Configure two different pools where you can tell the servers apart.
> 1) Stop ntpd.
> 2) Break your DNS.
> 3) Start ntpd. Wait a few seconds for it to try to resolve and fail.
> 4) Fix your DNS.
>
> Expected results:
> ntpd recovers (successfully resolves the pools' DNS records) relatively
> quickly. Most importantly, when ntpd recovers, it recovers for both
> pools equally.
>
> Actual results:
> Only the last configured pool recovers right away. The other one takes 5
> or 10 minutes.
>
> Can you try a few more things:
>
> 1) Does deleting the nameservers in /etc/resolv.conf work for step 2, or
> more to the point, does putting them back in /etc/resolv.conf work for
> step 4?
>
> 2) I doubt it matters, but can you set the poll intervals the same on
> the two pools, to eliminate that variable?
>
> 3) If you reduce minsane to 4, what happens? Does it do the same thing,
> never spin up the second pool, or (far less likely) spin them both up
> normally? I wonder if ntpd is only "recovering" the second pool because
> it is below minsane.
>
> --
> Richard
Bug#924192: ntpsec: at boot ntpsec starts before DNS is available
Can we narrow this down to a reproducer? It sounds like something like this would work:

0) Configure two different pools where you can tell the servers apart.
1) Stop ntpd.
2) Break your DNS.
3) Start ntpd. Wait a few seconds for it to try to resolve and fail.
4) Fix your DNS.

Expected results:
ntpd recovers (successfully resolves the pools' DNS records) relatively quickly. Most importantly, when ntpd recovers, it recovers for both pools equally.

Actual results:
Only the last configured pool recovers right away. The other one takes 5 or 10 minutes.

Can you try a few more things:

1) Does deleting the nameservers in /etc/resolv.conf work for step 2, or more to the point, does putting them back in /etc/resolv.conf work for step 4?

2) I doubt it matters, but can you set the poll intervals the same on the two pools, to eliminate that variable?

3) If you reduce minsane to 4, what happens? Does it do the same thing, never spin up the second pool, or (far less likely) spin them both up normally? I wonder if ntpd is only "recovering" the second pool because it is below minsane.

--
Richard
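[Editorial aside: the reproducer steps above can be sketched in shell. This is my own illustration, not something from the thread; it assumes name resolution goes through a plain /etc/resolv.conf (not a local caching resolver), and the privileged systemctl steps are left as comments. RESOLV is parameterized only so the DNS-breaking helpers can be exercised safely.]

```shell
# Helpers for steps 2 and 4 of the reproducer: break DNS by emptying
# resolv.conf (saving the original), then restore it afterwards.
RESOLV="${RESOLV:-/etc/resolv.conf}"

break_dns() {    # step 2: remove the nameservers so lookups fail
    mv "$RESOLV" "$RESOLV.saved" && : > "$RESOLV"
}

restore_dns() {  # step 4: put the nameservers back
    mv "$RESOLV.saved" "$RESOLV"
}

# The experiment itself (run as root) would then be roughly:
#   systemctl stop ntpsec                          # step 1
#   break_dns                                      # step 2
#   systemctl start ntpsec; sleep 10               # step 3
#   journalctl -u ntpsec | grep dns_take_status    # expect =>temp for both pools
#   restore_dns                                    # step 4
#   ntpq -pn                                       # see whether both pools repopulate
```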
Bug#924192: ntpsec: at boot ntpsec starts before DNS is available
Hi Richard,

My observations follow your quote…

> On Mar 11, 2019, at 12:30 PM, Richard Laager wrote:
>
> On 3/10/19 4:11 AM, Rick Thomas wrote:
>> If ntpsec implemented the "preempt" option on "peer" directives, the
>> first "peer" would be revisited after a few minutes and all would be
>> well. But it doesn't. So the first "peer" is as if it never existed.
>
> What leads you to this conclusion, specifically about preempt? The way
> pool directives work is that they are re-evaluated until it reaches
> maxclock:
> https://docs.ntpsec.org/latest/discover.html
>
> I'm not 100% sure how multiple independent pools (like you have here)
> will intersect. It's possible that ntpd is spinning up all the
> associations from the public pool. If only for testing, what happens if
> you comment out the public pool? Does it behave correctly then?
>
> On 3/11/19 2:46 AM, Rick Thomas wrote:
>> Edited /lib/systemd/system/ntpsec-systemd-netif.path
>
> As you figured out, this is the wrong place. You want ntpsec.service.
>
>>> Mar 11 00:23:51 cube ntpd[354]: DNS: dns_probe: us.pool.ntp.org, cast_flags:8, flags:101
>>> Mar 11 00:23:51 cube ntpd[354]: DNS: dns_check: processing us.pool.ntp.org, 8, 101
>>> Mar 11 00:23:51 cube ntpd[354]: DNS: dns_check: DNS error: -3, Temporary failure in name resolution
>>> Mar 11 00:23:51 cube ntpd[354]: DNS: dns_take_status: us.pool.ntp.org=>temp, 9
>>> Mar 11 00:23:52 cube ntpd[354]: DNS: dns_probe: pool.rcthomas.org, cast_flags:8, flags:121
>>> Mar 11 00:23:52 cube ntpd[354]: DNS: dns_check: processing pool.rcthomas.org, 8, 121
>>> Mar 11 00:23:52 cube ntpd[354]: DNS: dns_check: DNS error: -3, Temporary failure in name resolution
>>> Mar 11 00:23:52 cube ntpd[354]: DNS: dns_take_status: pool.rcthomas.org=>temp, 9
>
> I agree with your conclusion that DNS is not ready when ntpd started.
>
>>> Mar 11 00:23:56 cube ntpd[354]: IO: Listen normally on 4 eth0 192.168.3.129:123
>>> Mar 11 00:23:56 cube ntpd[354]: IO: Listen normally on 5 eth0 [fe80::d263:b4ff:fe00:912f%2]:123
>>> Mar 11 00:23:56 cube ntpd[354]: DNS: dns_probe: pool.rcthomas.org, cast_flags:8, flags:121
>>> Mar 11 00:23:56 cube ntpd[354]: DNS: dns_check: processing pool.rcthomas.org, 8, 121
>>> Mar 11 00:23:56 cube ntpd[354]: DNS: Pool taking: 192.168.3.207
>>> Mar 11 00:23:56 cube ntpd[354]: DNS: Pool taking: 192.168.3.111
>>> Mar 11 00:23:56 cube ntpd[354]: DNS: Pool taking: 192.168.3.4
>>> Mar 11 00:23:56 cube ntpd[354]: DNS: Pool taking: 192.168.1.14
>
> What about this, though? Those are private IP addresses, so I assume
> those are coming from pool.rcthomas.org.
>
>>> Mar 11 00:23:56 cube ntpd[354]: DNS: dns_take_status: pool.rcthomas.org=>good, 8
>
> This sure looks like it resolved correctly.
>
> What is the output from:
> ntpq -p
>
> Also note that the definition of network-online.target can vary and may
> or may not represent what you actually want.
> https://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/
>
> If there's a bug here, I'd like to get it reported upstream so it can be
> fixed.
>
> However, I am very skeptical of the idea that ntpd should, by default,
> wait until the network is online before starting. That prevents useful
> configurations like local refclocks, with no advantage if ntpd simply
> retries the network connections periodically, which I believe it is
> supposed to do.
>
> --
> Richard

You are correct that the private IP addresses are coming from "pool.rcthomas.org" and the public addresses are from "us.pool.ntp.org".

It may be worth noting that each of the two pools resolves (at any given time) to just four IP addresses, and the conf file specifies minsane 5 (so as to always have at least one sane server from each of the pools) and minclock 7 (to allow two spares if one or more of the servers goes insane). If that's not a good idea, please correct me.

I've tried various combinations and orders of the "pool" statements. What always happens is that they both are initially rejected; then the last one (and only the last one, whether it's the private one or the public one) resolves OK on the next attempt, and ntpsec connects to those servers. It almost looks as if (for some reason) the first one, having been rejected, is forgotten about, but (again, for some reason) the last one is remembered and retried.

The first pool statement is eventually retried, but not for a period of several minutes (5 or 10?). Something (I don't know what) causes it to be re-considered, and this time (of course) it resolves. I just last night discovered this re-consideration after a long delay. Previously I would always lose patience and restart ntpsec before that timeout, so I didn't see it.

I thought (though I may be wrong) that "preempt" was supposed to shorten that timeout to something more reasonable. Please enlighten me if I'm mistaken!

In any case, the net result is that ntpsec does not synchronize until several minutes after boot.
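[Editorial aside: for reference, a sketch of the ntp.conf being described, reconstructed from the messages in this thread. The pool names and tos values are as reported; any per-pool options (e.g. differing poll intervals, mentioned elsewhere in the thread) are omitted since only their existence is implied.]

```
# /etc/ntpsec/ntp.conf (sketch, reconstructed from the thread)
pool us.pool.ntp.org        # public pool; resolves to 4 addresses at a time
pool pool.rcthomas.org      # local intranet pool; also 4 addresses

# minsane 5 forces servers from both 4-address pools before syncing;
# minclock 7 allows two spares in case servers go insane.
tos minclock 7 minsane 5
```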
Bug#924192: ntpsec: at boot ntpsec starts before DNS is available
On 3/10/19 4:11 AM, Rick Thomas wrote:
> If ntpsec implemented the "preempt" option on "peer" directives, the
> first "peer" would be revisited after a few minutes and all would be
> well. But it doesn't. So the first "peer" is as if it never existed.

What leads you to this conclusion, specifically about preempt? The way pool directives work is that they are re-evaluated until it reaches maxclock:
https://docs.ntpsec.org/latest/discover.html

I'm not 100% sure how multiple independent pools (like you have here) will intersect. It's possible that ntpd is spinning up all the associations from the public pool. If only for testing, what happens if you comment out the public pool? Does it behave correctly then?

On 3/11/19 2:46 AM, Rick Thomas wrote:
> Edited /lib/systemd/system/ntpsec-systemd-netif.path

As you figured out, this is the wrong place. You want ntpsec.service.

>> Mar 11 00:23:51 cube ntpd[354]: DNS: dns_probe: us.pool.ntp.org, cast_flags:8, flags:101
>> Mar 11 00:23:51 cube ntpd[354]: DNS: dns_check: processing us.pool.ntp.org, 8, 101
>> Mar 11 00:23:51 cube ntpd[354]: DNS: dns_check: DNS error: -3, Temporary failure in name resolution
>> Mar 11 00:23:51 cube ntpd[354]: DNS: dns_take_status: us.pool.ntp.org=>temp, 9
>> Mar 11 00:23:52 cube ntpd[354]: DNS: dns_probe: pool.rcthomas.org, cast_flags:8, flags:121
>> Mar 11 00:23:52 cube ntpd[354]: DNS: dns_check: processing pool.rcthomas.org, 8, 121
>> Mar 11 00:23:52 cube ntpd[354]: DNS: dns_check: DNS error: -3, Temporary failure in name resolution
>> Mar 11 00:23:52 cube ntpd[354]: DNS: dns_take_status: pool.rcthomas.org=>temp, 9

I agree with your conclusion that DNS is not ready when ntpd started.

>> Mar 11 00:23:56 cube ntpd[354]: IO: Listen normally on 4 eth0 192.168.3.129:123
>> Mar 11 00:23:56 cube ntpd[354]: IO: Listen normally on 5 eth0 [fe80::d263:b4ff:fe00:912f%2]:123
>> Mar 11 00:23:56 cube ntpd[354]: DNS: dns_probe: pool.rcthomas.org, cast_flags:8, flags:121
>> Mar 11 00:23:56 cube ntpd[354]: DNS: dns_check: processing pool.rcthomas.org, 8, 121
>> Mar 11 00:23:56 cube ntpd[354]: DNS: Pool taking: 192.168.3.207
>> Mar 11 00:23:56 cube ntpd[354]: DNS: Pool taking: 192.168.3.111
>> Mar 11 00:23:56 cube ntpd[354]: DNS: Pool taking: 192.168.3.4
>> Mar 11 00:23:56 cube ntpd[354]: DNS: Pool taking: 192.168.1.14

What about this, though? Those are private IP addresses, so I assume those are coming from pool.rcthomas.org.

>> Mar 11 00:23:56 cube ntpd[354]: DNS: dns_take_status: pool.rcthomas.org=>good, 8

This sure looks like it resolved correctly.

What is the output from:
ntpq -p

Also note that the definition of network-online.target can vary and may or may not represent what you actually want.
https://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/

If there's a bug here, I'd like to get it reported upstream so it can be fixed.

However, I am very skeptical of the idea that ntpd should, by default, wait until the network is online before starting. That prevents useful configurations like local refclocks, with no advantage if ntpd simply retries the network connections periodically, which I believe it is supposed to do.

--
Richard
Bug#924192: ntpsec: at boot ntpsec starts before DNS is available
> On Mar 11, 2019, at 12:46 AM, Rick Thomas wrote:
>
> Examining the journal carefully (specifically the lines labeled "Mar 11
> 00:23:44"), it looks like the [Install] stanza isn't the right place to
> put the "After" and "Wants" lines.
>
> I'll keep looking for a more appropriate place. If you have any
> suggestions, please share them.

Hmmm… Something like those two lines is already present in the [Unit] section of /lib/systemd/system/ntpsec.service, but it refers to "network.target" rather than "network-online.target".

I tried changing network => network-online (and doing "update-initramfs -u"), but to no avail. Same results, but without the complaint about the lines being in an inappropriate place.

We'll keep trying.
Rick
Bug#924192: ntpsec: at boot ntpsec starts before DNS is available
Hi Tony!

I’m not sure if I’m doing it right, but here’s what I did and what happened…

What I did: Edited /lib/systemd/system/ntpsec-systemd-netif.path. Here’s a diff:

> rbthomas@cube:~$ diff -c2 /SAVE//lib/systemd/system/ntpsec-systemd-netif.path /lib/systemd/system/ntpsec-systemd-netif.path
> *** /SAVE//lib/systemd/system/ntpsec-systemd-netif.path  Mon Mar 11 00:20:38 2019
> --- /lib/systemd/system/ntpsec-systemd-netif.path  Mon Mar 11 00:21:33 2019
> ***************
> *** 7,9 ****
>
>   [Install]
> ! WantedBy=network-pre.target
> --- 7,10 ----
>
>   [Install]
> ! After=network-online.target
> ! Wants=network-online.target
> rbthomas@cube:~$

Then I did:

> root@cube:~# update-initramfs -u

and rebooted.

What happened: I still get only the servers from the last “pool” statement. Here’s what the journal says:

> rbthomas@cube:~$ journalctl -b | grep ntp
> Mar 11 00:23:43 cube kernel: Mountpoint-cache hash table entries: 2048 (order: 1, 8192 bytes)
> Mar 11 00:23:44 cube systemd[1]: /lib/systemd/system/ntpsec-systemd-netif.path:7: Unknown lvalue 'After' in section 'Install', ignoring
> Mar 11 00:23:44 cube systemd[1]: /lib/systemd/system/ntpsec-systemd-netif.path:8: Unknown lvalue 'Wants' in section 'Install', ignoring
> Mar 11 00:23:48 cube kernel: audit: type=1400 audit(1552289028.492:6): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/sbin/ntpd" pid=297 comm="apparmor_parser"
> Mar 11 00:23:48 cube audit[297]: AVC apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/sbin/ntpd" pid=297 comm="apparmor_parser"
> Mar 11 00:23:49 cube ntpd[350]: INIT: ntpd ntpsec-1.1.3 2019-02-04T07:38:48Z: Starting
> Mar 11 00:23:50 cube ntp-systemd-wrapper[345]: 2019-03-11T00:23:49 ntpd[350]: INIT: ntpd ntpsec-1.1.3 2019-02-04T07:38:48Z: Starting
> Mar 11 00:23:50 cube ntp-systemd-wrapper[345]: 2019-03-11T00:23:49 ntpd[350]: INIT: Command line: /usr/sbin/ntpd -p /run/ntpd.pid -c /etc/ntpsec/ntp.conf -g -N -u ntpsec:ntpsec
> Mar 11 00:23:49 cube systemd[1]: ntpsec.service: Can't open PID file /run/ntpd.pid (yet?) after start: No such file or directory
> Mar 11 00:23:49 cube ntpd[350]: INIT: Command line: /usr/sbin/ntpd -p /run/ntpd.pid -c /etc/ntpsec/ntp.conf -g -N -u ntpsec:ntpsec
> Mar 11 00:23:49 cube ntpd[354]: INIT: precision = 1.667 usec (-19)
> Mar 11 00:23:49 cube systemd[1]: Starting Wait for ntpd to synchronize system clock...
> Mar 11 00:23:50 cube ntpd[354]: INIT: successfully locked into RAM
> Mar 11 00:23:50 cube ntpd[354]: CONFIG: readconfig: parsing file: /etc/ntpsec/ntp.conf
> Mar 11 00:23:50 cube ntpd[354]: CLOCK: leapsecond file ('/usr/share/zoneinfo/leap-seconds.list'): good hash signature
> Mar 11 00:23:50 cube ntpd[354]: CLOCK: leapsecond file ('/usr/share/zoneinfo/leap-seconds.list'): loaded, expire=2019-06-28T00:00Z last=2017-01-01T00:00Z ofs=37
> Mar 11 00:23:50 cube ntpd[354]: INIT: Using SO_TIMESTAMPNS
> Mar 11 00:23:50 cube ntpd[354]: IO: Listen and drop on 0 v6wildcard [::]:123
> Mar 11 00:23:50 cube ntpd[354]: IO: Listen and drop on 1 v4wildcard 0.0.0.0:123
> Mar 11 00:23:50 cube ntpd[354]: IO: Listen normally on 2 lo 127.0.0.1:123
> Mar 11 00:23:50 cube ntpd[354]: IO: Listen normally on 3 lo [::1]:123
> Mar 11 00:23:50 cube ntpd[354]: IO: Listening on routing socket on fd #20 for interface updates
> Mar 11 00:23:50 cube ntpd[354]: INIT: This system has a 32-bit time_t.
> Mar 11 00:23:50 cube ntpd[354]: INIT: This ntpd will fail on 2038-01-19T03:14:07Z.
> Mar 11 00:23:51 cube ntpd[354]: DNS: dns_probe: us.pool.ntp.org, cast_flags:8, flags:101
> Mar 11 00:23:51 cube ntpd[354]: DNS: dns_check: processing us.pool.ntp.org, 8, 101
> Mar 11 00:23:51 cube ntpd[354]: DNS: dns_check: DNS error: -3, Temporary failure in name resolution
> Mar 11 00:23:51 cube ntpd[354]: DNS: dns_take_status: us.pool.ntp.org=>temp, 9
> Mar 11 00:23:52 cube ntpd[354]: DNS: dns_probe: pool.rcthomas.org, cast_flags:8, flags:121
> Mar 11 00:23:52 cube ntpd[354]: DNS: dns_check: processing pool.rcthomas.org, 8, 121
> Mar 11 00:23:52 cube ntpd[354]: DNS: dns_check: DNS error: -3, Temporary failure in name resolution
> Mar 11 00:23:52 cube ntpd[354]: DNS: dns_take_status: pool.rcthomas.org=>temp, 9
> Mar 11 00:23:56 cube ntpd[354]: IO: Listen normally on 4 eth0 192.168.3.129:123
> Mar 11 00:23:56 cube ntpd[354]: IO: Listen normally on 5 eth0 [fe80::d263:b4ff:fe00:912f%2]:123
> Mar 11 00:23:56 cube ntpd[354]: DNS: dns_probe: pool.rcthomas.org, cast_flags:8, flags:121
> Mar 11 00:23:56 cube ntpd[354]: DNS: dns_check: processing pool.rcthomas.org, 8, 121
> Mar 11 00:23:56 cube ntpd[354]: DNS: Pool taking: 192.168.3.207
> Mar 11 00:23:56 cube ntpd[354]: DNS: Pool taking: 192.168.3.111
> Mar 11 00:23:56 cube ntpd[354]: DNS: Pool taking: 192.168.3.4
> Mar 11 00:23:56 cube ntpd[354]: DNS: Pool taking: 192.168.1.14
> Mar 11 00:23:56 cube ntpd[354]: DNS: dns_take_statu
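[Editor's note] The journal's "Unknown lvalue 'After' in section 'Install', ignoring" messages explain why the edit above had no effect: After= and Wants= are [Unit]-section directives, and systemd discards them when they appear under [Install]. A sketch of how the suggested directives would presumably need to be placed (only the relevant section is shown; the rest of the packaged unit file is omitted, and this is not the patch the maintainer ultimately shipped):

```ini
# ntpsec-systemd-netif.path (fragment, illustrative)
[Unit]
# Ordering/dependency directives belong here, not under [Install]
After=network-online.target
Wants=network-online.target
```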
Bug#924192: ntpsec: at boot ntpsec starts before DNS is available
On Sun, Mar 10, 2019 at 01:11:46AM -0800, Rick Thomas wrote:
> Package: ntpsec
> Version: 1.1.3+dfsg1-2
> Severity: important
>
> Dear Maintainer,
>
> *** Reporter, please consider answering these questions, where appropriate ***
>
> * What led up to the situation?
> The /etc/ntpsec/ntp.conf file (attached) has two "pool" directives and no "server" directives, so it requires DNS (hence network services) to find the IP addresses it needs for servers. DNS fails on the first "peer" and (for some reason) succeeds on the second "peer". If ntpsec implemented the "preempt" option on "peer" directives, the first "peer" would be revisited after a few minutes and all would be well. But it doesn't, so the first "peer" is as if it never existed.
>
> * What exactly did you do (or not do) that was effective (or ineffective)?
> Wait several seconds after boot completes, then issue "sudo service ntpsec restart".
>
> * What was the outcome of this action?
> This time around, it sees both sets of "peers".
>
> * What outcome did you expect instead?
> I would have hoped that the systemd stuff would wait until DNS was available before starting up ntpsec. Unfortunately, I don't know systemd configuration well enough to suggest a patch.

Hi Rick,

Reading through the systemd network.target documentation here [1] (but also being somewhat new to it myself), I wonder if the behavior would work for your case if the Debian package changed:

  WantedBy=network-pre.target

to

  After=network-online.target
  Wants=network-online.target

in the unit file [2]. Would you be willing to test with that configuration?

Thank you,
tony

[1] https://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/
[2] https://github.com/rlaager/ntpsec-pkg/blob/sid/debian/ntpsec-systemd-netif.path#L9
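[Editor's note] One way to try a change like this without editing the packaged unit file (which a package upgrade would overwrite) is a systemd drop-in override. The sketch below writes the override to a scratch directory so it is safe to run anywhere; on a real system the file would go under /etc/systemd/system/ntpsec-systemd-netif.path.d/, followed by a daemon-reload. The directory and file names here are illustrative, not part of the ntpsec packaging.

```shell
#!/bin/sh
# Sketch: carry the suggested directives in a drop-in override.
# DROPIN_DIR would normally be
# /etc/systemd/system/ntpsec-systemd-netif.path.d; a temporary
# directory is used here so the sketch has no side effects.
DROPIN_DIR="${DROPIN_DIR:-$(mktemp -d)/ntpsec-systemd-netif.path.d}"
mkdir -p "$DROPIN_DIR"

# After=/Wants= must go in [Unit]; systemd ignores them in [Install].
cat > "$DROPIN_DIR/wait-online.conf" <<'EOF'
[Unit]
After=network-online.target
Wants=network-online.target
EOF

# Show the result; on a real system, follow with:
#   systemctl daemon-reload
#   systemctl restart ntpsec-systemd-netif.path
cat "$DROPIN_DIR/wait-online.conf"
```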
Bug#924192: ntpsec: at boot ntpsec starts before DNS is available
Package: ntpsec
Version: 1.1.3+dfsg1-2
Severity: important

Dear Maintainer,

*** Reporter, please consider answering these questions, where appropriate ***

* What led up to the situation?
The /etc/ntpsec/ntp.conf file (attached) has two "pool" directives and no "server" directives, so it requires DNS (hence network services) to find the IP addresses it needs for servers. DNS fails on the first "peer" and (for some reason) succeeds on the second "peer". If ntpsec implemented the "preempt" option on "peer" directives, the first "peer" would be revisited after a few minutes and all would be well. But it doesn't, so the first "peer" is as if it never existed.

* What exactly did you do (or not do) that was effective (or ineffective)?
Wait several seconds after boot completes, then issue "sudo service ntpsec restart".

* What was the outcome of this action?
This time around, it sees both sets of "peers".

* What outcome did you expect instead?
I would have hoped that the systemd stuff would wait until DNS was available before starting up ntpsec. Unfortunately, I don't know systemd configuration well enough to suggest a patch.

*** End of the template - remove these template lines ***

Systemd log files of a representative boot can be provided if necessary.
-- System Information:
Debian Release: buster/sid
  APT prefers testing
  APT policy: (500, 'testing')
Architecture: armhf (armv7l)

Kernel: Linux 4.19.0-2-armmp (SMP w/4 CPU cores)
Locale: LANG=C, LC_CTYPE=C.UTF-8 (charmap=UTF-8), LANGUAGE=C (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages ntpsec depends on:
ii  adduser      3.118
ii  libc6        2.28-7
ii  libcap2      1:2.25-2
ii  libssl1.1    1.1.1a-1
ii  lsb-base     10.2018112800
ii  netbase      5.6
ii  python3      3.7.2-1
ii  python3-ntp  1.1.3+dfsg1-2
ii  tzdata       2018i-1

Versions of packages ntpsec recommends:
ii  cron [cron-daemon]  3.0pl1-132
ii  systemd             241-1

Versions of packages ntpsec suggests:
ii  apparmor    2.13.2-9
ii  ntpsec-doc  1.1.3+dfsg1-2
pn  ntpsec-ntpviz

-- Configuration Files:
/etc/default/ntpsec changed:
NTPD_OPTS="-g -N"
IGNORE_DHCP="yes"

/etc/ntpsec/ntp.conf changed:
driftfile /var/lib/ntpsec/ntp.drift
leapfile /usr/share/zoneinfo/leap-seconds.list
statsdir /var/log/ntpsec/
statistics loopstats peerstats clockstats
filegen loopstats file loopstats type day enable
filegen peerstats file peerstats type day enable
filegen clockstats file clockstats type day enable
tos minclock 7 minsane 5
pool pool.rcthomas.org minpoll 3 maxpoll 6 prefer iburst
pool us.pool.ntp.org minpoll 6 maxpoll 10 iburst
restrict default kod nomodify nopeer noquery limited
restrict 127.0.0.1
restrict ::1
restrict 192.168.0.0 mask 255.255.240.0 nomodify
restrict 2001:4978:02d2:0001::0 mask ::::::: nomodify

-- no debconf information