Re: OpenNTPD - no constraint reply - no time sync
> On Jul 18, 2015, at 1:52 PM, Mikolaj Kucharski wrote: > > On Sat, Jul 18, 2015 at 11:44:17AM -0600, Brent Cook wrote: >> On Thu, May 28, 2015 at 5:24 PM, Mikolaj Kucharski >> wrote: >>> minimalisic patch as below fixed the issue for me: >>> >>> Index: constraint.c >>> === >>> RCS file: /cvs/src/usr.sbin/ntpd/constraint.c,v >>> retrieving revision 1.12 >>> diff -u -p -u -r1.12 constraint.c >>> --- constraint.c28 May 2015 21:34:36 - 1.12 >>> +++ constraint.c28 May 2015 23:14:47 - >>> @@ -279,7 +279,7 @@ constraint_check_child(void) >>>&cstr->addr->ss), >>> CONSTRAINT_SCAN_INTERVAL); >>>} >>> >>> - if (fail || cstr->state < STATE_REPLY_RECEIVED) { >>> + if (fail || cstr->state < STATE_QUERY_SENT) { >>>cstr->senderrors++; >>>constraint_close(cstr->fd); >>>} >> >> Thank you for the in-depth analysis. I have been running this patch >> the last couple >> of days and have not experienced any issues as well. What say you reyk? > > I can confirm that I run this patch from the end of May (from the time > when I've posted this to the misc mailing list) on amd64 and i386 and I > don't have any issues with it. I did not test portable. > Thanks, I have committed your simpler patch.
Re: OpenNTPD - no constraint reply - no time sync
On Sat, Jul 18, 2015 at 11:44:17AM -0600, Brent Cook wrote: > On Thu, May 28, 2015 at 5:24 PM, Mikolaj Kucharski > wrote: > > minimalisic patch as below fixed the issue for me: > > > > Index: constraint.c > > === > > RCS file: /cvs/src/usr.sbin/ntpd/constraint.c,v > > retrieving revision 1.12 > > diff -u -p -u -r1.12 constraint.c > > --- constraint.c28 May 2015 21:34:36 - 1.12 > > +++ constraint.c28 May 2015 23:14:47 - > > @@ -279,7 +279,7 @@ constraint_check_child(void) > > &cstr->addr->ss), > > CONSTRAINT_SCAN_INTERVAL); > > } > > > > - if (fail || cstr->state < STATE_REPLY_RECEIVED) { > > + if (fail || cstr->state < STATE_QUERY_SENT) { > > cstr->senderrors++; > > constraint_close(cstr->fd); > > } > > Thank you for the in-depth analysis. I have been running this patch > the last couple > of days and have not experienced any issues as well. What say you reyk? I can confirm that I run this patch from the end of May (from the time when I've posted this to the misc mailing list) on amd64 and i386 and I don't have any issues with it. I did not test portable. -- best regards q#
Re: OpenNTPD - no constraint reply - no time sync
On Thu, May 28, 2015 at 5:24 PM, Mikolaj Kucharski wrote: > Hi, > > I've initially reported this problem a while ago and I thought problem > was related to IPv6. Now I belive it is not. I did some research and > this is what I've found. I contacted Reyk couple of days ago, but didn't > get any reply from him yet, so decided to post here. > > I'm using OpenNTPD's constraint from the time it was introduced. However > I found that ntpd(8) often is getting stuck at the very beginning when > it starts and never recovers from that scenario. Usually it looks as > follows when you hit the problem: > > # ntpctl -sa > 0/4 peers valid, clock unsynced > > peer >wt tl st next poll offset delay jitter > 87.232.1.41 0.pool.ntp.org > 1 2 -0s0s peer not valid > 54.171.104.100 1.pool.ntp.org > 1 2 -0s0s peer not valid > 193.1.193.157 2.pool.ntp.org > 1 2 -0s0s peer not valid > 85.91.1.180 3.pool.ntp.org > 1 2 -0s0s peer not valid > > All counters are zero and all peers are not valid. For me it usually > stays like that and never recovers until rcctl restart ntpd. Then often > it fails like above again. Multiple restarts in a row solve the problem. > > I did some debug modifications to ntpd and this is what I've found. > I'm running my tests on: > > # sysctl -n kern.version > OpenBSD 5.7-current (GENERIC) #955: Thu May 28 13:09:53 MDT 2015 > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC > > and to keep IPv6 out of the picture I have following settings in resolv.conf: > > # grep -w family /etc/resolv.conf > family inet4 > > The ntpd server itself is configured as follows: > > # grep -ve '^#' /etc/ntpd.conf > server 0.pool.ntp.org > server 1.pool.ntp.org > server 2.pool.ntp.org > server 3.pool.ntp.org > constraints from www.google.com > > I'm running debug version of ntpd as follows: > > ./ntpd.debug -vd 2>&1 | ts > > Here is example debug session when I hit the issue. Lines which contain > XXX MK are added by me via log_debug(): > > May 28 23:25:54 ntp engine ready > May 28 23:25:54 XXX MK STATE_NONE=0 > May 28 23:25:54 XXX MK STATE_DNS_INPROGRESS=1 > May 28 23:25:54 XXX MK STATE_DNS_TEMPFAIL=2 > May 28 23:25:54 XXX MK STATE_DNS_DONE=3 > May 28 23:25:54 XXX MK STATE_QUERY_SENT=4 > May 28 23:25:54 XXX MK STATE_REPLY_RECEIVED=5 > May 28 23:25:54 XXX MK STATE_INVALID=6 > May 28 23:25:54 XXX MK ntp_main() constraint_cnt=0, constraint_median=0 > May 28 23:25:54 XXX MK client_query() start > May 28 23:25:54 XXX MK client_query() start > May 28 23:25:54 XXX MK client_query() start > May 28 23:25:54 XXX MK client_query() start > May 28 23:25:54 XXX MK constraint_query() starts state=1 > May 28 23:25:54 XXX MK constraint_query() fail with default? > May 28 23:25:54 XXX MK ntp_main() function constraint_query() returned -1 > May 28 23:25:54 XXX MK ntp_main() about to loop with control_dispatch_msg() > and j=3 i=3 nfds=1 > May 28 23:25:54 XXX MK ntp_main() about to loop with > constraint_dispatch_msg() and j=3 i=3 nfds=1 > May 28 23:25:54 XXX MK ntp_main() constraint_cnt=0, constraint_median=0 > May 28 23:25:54 XXX MK constraint_query() starts state=1 > May 28 23:25:54 XXX MK constraint_query() fail with default? > May 28 23:25:54 XXX MK ntp_main() function constraint_query() returned -1 > May 28 23:25:55 XXX MK ntp_main() about to loop with control_dispatch_msg() > and j=3 i=3 nfds=0 > May 28 23:25:55 XXX MK ntp_main() about to loop with > constraint_dispatch_msg() and j=3 i=3 nfds=0 > May 28 23:25:55 XXX MK ntp_main() constraint_cnt=1, constraint_median=0 > May 28 23:25:55 XXX MK constraint_query() starts with 216.58.208.68 state=3 > May 28 23:25:55 constraint request to 216.58.208.68 > May 28 23:25:55 XXX MK constraint_query() succeeded? > May 28 23:25:55 XXX MK httpsdate_query() with 216.58.208.68 > May 28 23:25:55 XXX MK httpsdate_init() with 216.58.208.68 > May 28 23:25:55 XXX MK httpsdate_request() with 216.58.208.68 port 443 > May 28 23:25:55 XXX MK httpsdate_request() date from 216.58.208.68 is Date: > Thu, 28 May 2015 22:25:56 GMT > May 28 23:25:55 XXX MK httpsdate_request() success from 216.58.208.68 > May 28 23:25:55 XXX MK httpsdate_query() with 216.58.208.68 done > May 28 23:25:55 XXX MK ntp_main() function poll() returned -1, errno=4 i=4 > May 28 23:25:55 XXX MK ntp_main() about to loop with control_dispatch_msg() > and j=3 i=4 nfds=-1 > May 28 23:25:55 XXX MK ntp_main() about to loop with > constraint_dispatch_msg() and j=3 i=4 nfds=-1 > May 28 23:25:55 XXX MK constraint_check_child() bummer, we bump senderrors > now senderrors=1 fail=0 state=4 > May 28 23:25:55 XXX MK constraint_close() are we going to set state to > STATE_INVALID? state=4 > May 28 23:25:55 XXX MK constraint_close() somehow we set the state to > STATE_INVALID state=6 senderrors=1 > May 28 23:25:55 XXX MK ntp_main() constraint_cnt=1, constraint_median=0
Re: OpenNTPD - no constraint reply - no time sync
On 29.05.2015 02:24, Mikolaj Kucharski wrote: Hi, I've initially reported this problem a while ago and I thought problem was related to IPv6. Now I belive it is not. I did some research and this is what I've found. I contacted Reyk couple of days ago, but didn't get any reply from him yet, so decided to post here. I'm using OpenNTPD's constraint from the time it was introduced. However I found that ntpd(8) often is getting stuck at the very beginning when it starts and never recovers from that scenario. Usually it looks as follows when you hit the problem: # ntpctl -sa 0/4 peers valid, clock unsynced peer wt tl st next poll offset delay jitter 87.232.1.41 0.pool.ntp.org 1 2 -0s0s peer not valid 54.171.104.100 1.pool.ntp.org 1 2 -0s0s peer not valid 193.1.193.157 2.pool.ntp.org 1 2 -0s0s peer not valid 85.91.1.180 3.pool.ntp.org 1 2 -0s0s peer not valid Hi, I observed this problem too. I'm following -current and after every reboot OpenNTPd stuck as you describe. Only my NMEA sensor is working. [ns]~$ ntpctl -sa 0/13 peers valid, 1/1 sensors valid, clock synced, stratum 1 peer wt tl st next poll offset delay jitter not resolved from pool 1.bg.pool.ntp.org 1 2 -0s 15s peer not valid 2001:470:1f0a:98e::2 from pool 2.bg.pool.ntp.org 1 2 -0s0s peer not valid 2a01:8740::fffd::123 from pool 2.bg.pool.ntp.org 1 2 -0s0s peer not valid 2a02:80e0:0:2::123 from pool 2.bg.pool.ntp.org 1 2 -0s0s peer not valid 2a02:6800:0:99::2 from pool 2.bg.pool.ntp.org 1 2 -0s0s peer not valid 77.70.121.24 from pool 2.bg.pool.ntp.org 1 2 -0s0s peer not valid 92.247.168.103 from pool 2.bg.pool.ntp.org 1 2 -0s0s peer not valid 87.97.157.120 from pool 2.bg.pool.ntp.org 1 2 -0s0s peer not valid 176.12.47.129 from pool 2.bg.pool.ntp.org 1 2 -0s0s peer not valid 213.154.229.24 from pool 3.europe.pool.ntp.org 1 2 -0s0s peer not valid 79.175.127.142 from pool 3.europe.pool.ntp.org 1 2 -0s0s peer not valid 91.235.212.22 from pool 3.europe.pool.ntp.org 1 2 -0s0s peer not valid 82.197.164.46 from pool 3.europe.pool.ntp.org 1 2 -0s0s peer not valid sensor wt gd st next poll offset correction nmea0 GPS * 10 1 0 12s 15s-0.003ms 0.600ms [ns]~$ uptime 12:08PM up 12:18, 1 user, load averages: 0.47, 0.71, 0.55 All counters are zero and all peers are not valid. For me it usually stays like that and never recovers until rcctl restart ntpd. Then often it fails like above again. Multiple restarts in a row solve the problem. Only rcctl restart ntpd helps. [ns]~$ cat /etc/ntpd.conf # $OpenBSD: ntpd.conf,v 1.13 2015/05/18 13:48:38 deraadt Exp $ # Addresses to listen on (ntpd does not listen by default) listen on * servers 1.bg.pool.ntp.org servers 2.bg.pool.ntp.org servers 3.europe.pool.ntp.org server ntp.rootshells.eu weight 7 # use a specific local timedelta sensor (radio clock, etc) sensor nmea0 correction 600 weight 10 refid GPS # get the time constraint from a well-known HTTPS site constraints from "https://www.google.com"; OpenBSD 5.8-beta (GENERIC) #1059: Fri Jul 3 12:47:22 MDT 2015 dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC real mem = 2130575360 (2031MB) avail mem = 2062237696 (1966MB) mpath0 at root scsibus0 at mpath0: 256 targets mainbus0 at root bios0 at mainbus0: SMBIOS rev. 2.4 @ 0xfb7f0 (50 entries) bios0: vendor American Megatrends Inc. version "080011" date 02/20/2009 bios0: Supermicro H8SSL acpi0 at bios0: rev 2 acpi0: sleep states S0 S1 S4 S5 acpi0: tables DSDT FACP APIC OEMB acpi0: wakeup devices P1P2(S4) USB0(S1) USB1(S1) USB2(S1) PS2K(S4) PS2M(S4) SLPB(S1) acpitimer0 at acpi0: 3579545 Hz, 32 bits acpimadt0 at acpi0 addr 0xfee0: PC-AT compat cpu0 at mainbus0: apid 0 (boot processor) cpu0: AMD Opteron(tm) Processor 146, 1995.33 MHz cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SSE3,NXE,M MXX,FFXSR,LONG,3DNOW2,3DNOW,LAHF cpu0: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 1MB 64b/line 16-way L2 cache cpu0: ITLB 32 4KB entries fully associative, 8 4MB entries fully associative cpu0: DTLB 32 4KB entries fully associative, 8 4MB entries fully associative cpu0: AMD erratum 89 present, BIOS upgrade may be required mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges cpu0: apic clock running at 199MHz ioapic0 at m
Re: OpenNTPD - no constraint reply - no time sync
Hi, I've initially reported this problem a while ago and I thought problem was related to IPv6. Now I belive it is not. I did some research and this is what I've found. I contacted Reyk couple of days ago, but didn't get any reply from him yet, so decided to post here. I'm using OpenNTPD's constraint from the time it was introduced. However I found that ntpd(8) often is getting stuck at the very beginning when it starts and never recovers from that scenario. Usually it looks as follows when you hit the problem: # ntpctl -sa 0/4 peers valid, clock unsynced peer wt tl st next poll offset delay jitter 87.232.1.41 0.pool.ntp.org 1 2 -0s0s peer not valid 54.171.104.100 1.pool.ntp.org 1 2 -0s0s peer not valid 193.1.193.157 2.pool.ntp.org 1 2 -0s0s peer not valid 85.91.1.180 3.pool.ntp.org 1 2 -0s0s peer not valid All counters are zero and all peers are not valid. For me it usually stays like that and never recovers until rcctl restart ntpd. Then often it fails like above again. Multiple restarts in a row solve the problem. I did some debug modifications to ntpd and this is what I've found. I'm running my tests on: # sysctl -n kern.version OpenBSD 5.7-current (GENERIC) #955: Thu May 28 13:09:53 MDT 2015 dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC and to keep IPv6 out of the picture I have following settings in resolv.conf: # grep -w family /etc/resolv.conf family inet4 The ntpd server itself is configured as follows: # grep -ve '^#' /etc/ntpd.conf server 0.pool.ntp.org server 1.pool.ntp.org server 2.pool.ntp.org server 3.pool.ntp.org constraints from www.google.com I'm running debug version of ntpd as follows: ./ntpd.debug -vd 2>&1 | ts Here is example debug session when I hit the issue. Lines which contain XXX MK are added by me via log_debug(): May 28 23:25:54 ntp engine ready May 28 23:25:54 XXX MK STATE_NONE=0 May 28 23:25:54 XXX MK STATE_DNS_INPROGRESS=1 May 28 23:25:54 XXX MK STATE_DNS_TEMPFAIL=2 May 28 23:25:54 XXX MK STATE_DNS_DONE=3 May 28 23:25:54 XXX MK STATE_QUERY_SENT=4 May 28 23:25:54 XXX MK STATE_REPLY_RECEIVED=5 May 28 23:25:54 XXX MK STATE_INVALID=6 May 28 23:25:54 XXX MK ntp_main() constraint_cnt=0, constraint_median=0 May 28 23:25:54 XXX MK client_query() start May 28 23:25:54 XXX MK client_query() start May 28 23:25:54 XXX MK client_query() start May 28 23:25:54 XXX MK client_query() start May 28 23:25:54 XXX MK constraint_query() starts state=1 May 28 23:25:54 XXX MK constraint_query() fail with default? May 28 23:25:54 XXX MK ntp_main() function constraint_query() returned -1 May 28 23:25:54 XXX MK ntp_main() about to loop with control_dispatch_msg() and j=3 i=3 nfds=1 May 28 23:25:54 XXX MK ntp_main() about to loop with constraint_dispatch_msg() and j=3 i=3 nfds=1 May 28 23:25:54 XXX MK ntp_main() constraint_cnt=0, constraint_median=0 May 28 23:25:54 XXX MK constraint_query() starts state=1 May 28 23:25:54 XXX MK constraint_query() fail with default? May 28 23:25:54 XXX MK ntp_main() function constraint_query() returned -1 May 28 23:25:55 XXX MK ntp_main() about to loop with control_dispatch_msg() and j=3 i=3 nfds=0 May 28 23:25:55 XXX MK ntp_main() about to loop with constraint_dispatch_msg() and j=3 i=3 nfds=0 May 28 23:25:55 XXX MK ntp_main() constraint_cnt=1, constraint_median=0 May 28 23:25:55 XXX MK constraint_query() starts with 216.58.208.68 state=3 May 28 23:25:55 constraint request to 216.58.208.68 May 28 23:25:55 XXX MK constraint_query() succeeded? May 28 23:25:55 XXX MK httpsdate_query() with 216.58.208.68 May 28 23:25:55 XXX MK httpsdate_init() with 216.58.208.68 May 28 23:25:55 XXX MK httpsdate_request() with 216.58.208.68 port 443 May 28 23:25:55 XXX MK httpsdate_request() date from 216.58.208.68 is Date: Thu, 28 May 2015 22:25:56 GMT May 28 23:25:55 XXX MK httpsdate_request() success from 216.58.208.68 May 28 23:25:55 XXX MK httpsdate_query() with 216.58.208.68 done May 28 23:25:55 XXX MK ntp_main() function poll() returned -1, errno=4 i=4 May 28 23:25:55 XXX MK ntp_main() about to loop with control_dispatch_msg() and j=3 i=4 nfds=-1 May 28 23:25:55 XXX MK ntp_main() about to loop with constraint_dispatch_msg() and j=3 i=4 nfds=-1 May 28 23:25:55 XXX MK constraint_check_child() bummer, we bump senderrors now senderrors=1 fail=0 state=4 May 28 23:25:55 XXX MK constraint_close() are we going to set state to STATE_INVALID? state=4 May 28 23:25:55 XXX MK constraint_close() somehow we set the state to STATE_INVALID state=6 senderrors=1 May 28 23:25:55 XXX MK ntp_main() constraint_cnt=1, constraint_median=0 May 28 23:25:55 XXX MK constraint_query() starts with 216.58.208.68 state=6 May 28 23:25:55 XXX MK constraint_query() fail with STATE_INVALID last=850 now=850 diff=0 May 28 23:25:55 XXX MK ntp_main() function constraint_query() returned -1 for 216.58.208
OpenNTPD - no constraint reply - no time sync
Hi, I have following ntpd.conf file on an OpenBSD machine: OpenBSD 5.7-current (GENERIC) #839: Mon Mar 30 14:21:47 MDT 2015 dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC # /etc/ntpd.conf server 0.pool.ntp.org server 1.pool.ntp.org server 2.pool.ntp.org server 3.pool.ntp.org constraints from www.google.com I've noticed that ntpd(8) often ends up with all peers in 'peer not valid' state, never syncing the time. I've enabled debug logging and this is what I see in the logs: Mar 31 00:29:38 openbsd57 ntpd[12097]: ntp engine ready Mar 31 00:29:39 openbsd57 ntpd[12097]: constraint request to 2a00:1450:400b:800::2004 Mar 31 00:29:39 openbsd57 ntpd[12097]: constraint request to 216.58.211.132 Mar 31 00:29:39 openbsd57 ntpd[30212]: tls failed: 2a00:1450:400b:800::2004: connect Mar 31 00:29:39 openbsd57 ntpd[12097]: no constraint reply from 2a00:1450:400b:800::2004 received in time, next query 900s Mar 31 00:34:37 openbsd57 ntpd[12097]: 0 out of 4 peers valid Mar 31 00:34:37 openbsd57 ntpd[12097]: bad peer 0.pool.ntp.org (54.194.18.100) Mar 31 00:34:37 openbsd57 ntpd[12097]: bad peer 1.pool.ntp.org (149.157.192.5) Mar 31 00:34:37 openbsd57 ntpd[12097]: bad peer 2.pool.ntp.org (193.1.219.116) Mar 31 00:34:37 openbsd57 ntpd[12097]: bad peer 3.pool.ntp.org (85.91.1.164) Mar 31 00:44:39 openbsd57 ntpd[12097]: constraint request to 2a00:1450:400b:800::2004 Mar 31 00:44:39 openbsd57 ntpd[12097]: constraint request to 216.58.211.132 Mar 31 00:44:39 openbsd57 ntpd[1720]: tls failed: 2a00:1450:400b:800::2004: connect Mar 31 00:44:39 openbsd57 ntpd[12097]: no constraint reply from 2a00:1450:400b:800::2004 received in time, next query 900s Mar 31 00:59:39 openbsd57 ntpd[12097]: constraint request to 2a00:1450:400b:800::2004 Mar 31 00:59:39 openbsd57 ntpd[12097]: constraint request to 216.58.211.132 Mar 31 00:59:39 openbsd57 ntpd[13994]: tls failed: 2a00:1450:400b:800::2004: connect Mar 31 00:59:39 openbsd57 ntpd[12097]: no constraint reply from 2a00:1450:400b:800::2004 received in time, next query 900s That box doesn't have IPv6 configured: # ifconfig lo0: flags=8049 mtu 32768 priority: 0 groups: lo inet6 fe80::1%lo0 prefixlen 64 scopeid 0x3 inet6 ::1 prefixlen 128 inet 127.0.0.1 netmask 0xff00 vio0: flags=8843 mtu 1500 lladdr 52:54:00:12:34:72 priority: 0 groups: egress media: Ethernet autoselect status: active inet 172.16.0.52 netmask 0xff00 broadcast 172.16.0.255 enc0: flags=0<> priority: 0 groups: enc status: active # nc -v 2a00:1450:400b:800::2004 443 nc: connect to 2a00:1450:400b:800::2004 port 443 (tcp) failed: No route to host -- best regards q#