Re: [chrony-users] chrony and ntpd xleave interoperability
On Wed, Jan 24, 2018 at 05:49:01PM +0100, Rob Janssen wrote:
> Miroslav Lichvar wrote:
> >
> > The bug in the interleaved mode is a bit more subtle. The state is
> > updated from the received packet, but only when one of the timestamps is
> > zero (i.e. it's the first packet of the association). This means two
> > ntpd 4.2.8p10 can interoperate, but I suspect the association will not
> > recover if there is a mismatch between the receive timestamps.
> >
>
> I have seen problems like that, and stopped using symmetric peering.
> As far as I know, just declaring "server" in each direction works OK (there
> is loop-detection code)
> and appears a lot more stable. Probably debugged and tested better.

Yes, the complexity of the symmetric mode is ridiculous when compared
to the client/server mode.

As far as I know, the only good use case for the symmetric mode is that
it can be used to push time to a server if it supports ephemeral
associations (chrony does not). I have some stratum-1 servers which
are behind NAT and whose address is dynamic, and also some public
servers that are synchronized to them. If the public servers accepted
ephemeral associations, they could be specified as peers on the
stratum-1 servers, and it would work without forwarding ports on the
router and updating a DNS record with the dynamic IP.

--
Miroslav Lichvar

--
To unsubscribe email chrony-users-requ...@chrony.tuxfamily.org with "unsubscribe" in the subject.
For help email chrony-users-requ...@chrony.tuxfamily.org with "help" in the subject.
Trouble? Email listmas...@chrony.tuxfamily.org.
Re: [chrony-users] chrony and ntpd xleave interoperability
Miroslav Lichvar wrote:
> The bug in the interleaved mode is a bit more subtle. The state is
> updated from the received packet, but only when one of the timestamps is
> zero (i.e. it's the first packet of the association). This means two
> ntpd 4.2.8p10 can interoperate, but I suspect the association will not
> recover if there is a mismatch between the receive timestamps.

I have seen problems like that, and stopped using symmetric peering.
As far as I know, just declaring "server" in each direction works OK (there
is loop-detection code) and appears a lot more stable. Probably debugged
and tested better.

Rob
Re: [chrony-users] chrony and ntpd xleave interoperability
On 24/01/2018 at 13:45, Miroslav Lichvar wrote:
> On Tue, Jan 23, 2018 at 05:42:22PM +0100, FUSTE Emmanuel wrote:
>> On 23/01/2018 at 16:58, Miroslav Lichvar wrote:
>>> A similar thing seems to happen when trying to use the interleaved mode
>>> between two 4.2.8p10 ntpds. You said it worked for you before, so I
>>> assume one of the ntpds was an older version which didn't have this
>>> bug?
>> I have a platform with three ntpds in interleaved mode.
>> It was on 4.2.8p8.
>> They were upgraded today to 4.2.8p10 and are still working properly.
> You are right. My test was bad (it hit the bug with unsynchronized
> source).
>
> The bug in the interleaved mode is a bit more subtle. The state is
> updated from the received packet, but only when one of the timestamps is
> zero (i.e. it's the first packet of the association). This means two
> ntpd 4.2.8p10 can interoperate, but I suspect the association will not
> recover if there is a mismatch between the receive timestamps.
>
> I'll send a bug report to the ntp maintainers.
>
> In the meantime, if you are willing to patch ntp, this should fix it:
>
> diff -up ntp-4.2.8p10/ntpd/ntp_proto.c.orig ntp-4.2.8p10/ntpd/ntp_proto.c
> --- ntp-4.2.8p10/ntpd/ntp_proto.c.orig 2018-01-24 13:35:16.611488502 +0100
> +++ ntp-4.2.8p10/ntpd/ntp_proto.c 2018-01-24 13:35:24.113505866 +0100
> @@ -1774,7 +1774,6 @@ receive(
>              peer->bogusorg++;
>              peer->flags |= FLAG_XBOGUS;
>              peer->flash |= TEST2; /* bogus */
> -            return; /* Bogus packet, we are done */
>          }
>
Yes, it works! Thank you.

Emmanuel.
Re: [chrony-users] chrony and ntpd xleave interoperability
On Tue, Jan 23, 2018 at 05:42:22PM +0100, FUSTE Emmanuel wrote:
> On 23/01/2018 at 16:58, Miroslav Lichvar wrote:
> > A similar thing seems to happen when trying to use the interleaved mode
> > between two 4.2.8p10 ntpds. You said it worked for you before, so I
> > assume one of the ntpds was an older version which didn't have this
> > bug?
> I have a platform with three ntpds in interleaved mode.
> It was on 4.2.8p8.
> They were upgraded today to 4.2.8p10 and are still working properly.

You are right. My test was bad (it hit the bug with unsynchronized
source).

The bug in the interleaved mode is a bit more subtle. The state is
updated from the received packet, but only when one of the timestamps is
zero (i.e. it's the first packet of the association). This means two
ntpd 4.2.8p10 can interoperate, but I suspect the association will not
recover if there is a mismatch between the receive timestamps.

I'll send a bug report to the ntp maintainers.

In the meantime, if you are willing to patch ntp, this should fix it:

diff -up ntp-4.2.8p10/ntpd/ntp_proto.c.orig ntp-4.2.8p10/ntpd/ntp_proto.c
--- ntp-4.2.8p10/ntpd/ntp_proto.c.orig 2018-01-24 13:35:16.611488502 +0100
+++ ntp-4.2.8p10/ntpd/ntp_proto.c 2018-01-24 13:35:24.113505866 +0100
@@ -1774,7 +1774,6 @@ receive(
             peer->bogusorg++;
             peer->flags |= FLAG_XBOGUS;
             peer->flash |= TEST2; /* bogus */
-            return; /* Bogus packet, we are done */
         }

--
Miroslav Lichvar
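Editorial sketch: the effect of dropping that early return can be illustrated with a deliberately simplified toy model. It is not ntpd's receive() logic and it does not model interleaved timestamps or the special handling of zero timestamps. It assumes only a basic-mode style check in which a packet is accepted when its origin timestamp equals the receiver's last transmit timestamp. `ToyPeer`, `Packet` and `run` are hypothetical names, and the lost packet is just a convenient way to create one mismatch.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    org: int  # sender's copy of the receiver's last transmit timestamp
    xmt: int  # sender's own transmit timestamp


class ToyPeer:
    def __init__(self, update_on_bogus: bool):
        self.update_on_bogus = update_on_bogus
        self.last_tx = 0   # our most recent transmit timestamp
        self.peer_xmt = 0  # peer's transmit timestamp, echoed back as org

    def send(self, now: int) -> Packet:
        self.last_tx = now
        return Packet(org=self.peer_xmt, xmt=now)

    def receive(self, pkt: Packet) -> bool:
        bogus = pkt.org != self.last_tx
        if bogus and not self.update_on_bogus:
            return False  # models the early return removed by the patch
        self.peer_xmt = pkt.xmt  # refresh the state used for the next reply
        return not bogus


def run(update_on_bogus: bool, rounds: int = 6, lost_round: int = 2) -> list:
    """Exchange packets A->B, B->A each round; drop one A->B packet."""
    a, b = ToyPeer(update_on_bogus), ToyPeer(update_on_bogus)
    clock = 0
    accepted_by_a = []
    for i in range(rounds):
        clock += 1
        pkt = a.send(clock)
        if i != lost_round:
            b.receive(pkt)
        clock += 1
        accepted_by_a.append(a.receive(b.send(clock)))
    return accepted_by_a


print("early return kept:   ", run(update_on_bogus=False))
print("early return removed:", run(update_on_bogus=True))
```

In this toy, a single mismatch leaves both sides rejecting every later packet when the state is not refreshed, while refreshing it on a bogus packet lets the exchange recover after one rejected packet.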
Re: [chrony-users] chrony and ntpd xleave interoperability
On 23/01/2018 at 16:58, Miroslav Lichvar wrote:
> On Tue, Jan 23, 2018 at 02:44:56PM +0100, FUSTE Emmanuel wrote:
>> On 23/01/2018 at 13:00, Miroslav Lichvar wrote:
>>> With the current versions, if you can avoid the issue with
>>> unsynchronized sources, they should interoperate, at least when their
>>> polling intervals match. If it doesn't work for you, I'd like to see a
>>> tcpdump output.
>> Ok. I fixed the min/max polling interval to 5 for testing purposes.
>> Then I first restarted chrony and waited for it to sync to an online source.
>> Then I restarted ntp and took a capture.
>> I will send you all the data.
>>
>> NTP is stuck in unreachable state.
>> Chrony is stuck with only one valid RX.
> Ok. I can reproduce this problem. It seems ntpd doesn't update its
> state in the interleaved mode when it receives a packet with an
> unexpected origin timestamp. There was a similar issue fixed for the
> basic mode a few ntp releases ago:
> https://bugs.ntp.org/show_bug.cgi?id=2952
>
> As chronyd doesn't switch to the interleaved mode until it's receiving
> valid responses and ntpd doesn't accept responses in the basic mode,
> they are stuck waiting forever on each other.
>
> A similar thing seems to happen when trying to use the interleaved mode
> between two 4.2.8p10 ntpds. You said it worked for you before, so I
> assume one of the ntpds was an older version which didn't have this
> bug?
>
Here are data from the working 4.2.8p10 platform, which is composed of
w.w.w.w, y.y.y.y and z.z.z.z:

ind assid status  conf reach auth condition  last_event cnt
===========================================================
  1 29450  f414   yes   yes   ok  candidate  reachable   1
  2 29451  f414   yes   yes   ok  candidate  reachable   1
  3 29452  f31f   yes   yes   ok  outlier                1
  4 29453  961a   yes   yes  none sys.peer   sys_peer    1
  5 29454  931d   yes   yes  none outlier                1

ntpq> lpe
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
+x.x.x.x         .MRS.            1 u    5    8  377    0.363    0.038   0.030
+y.y.y.y         .PTP0.           1 s   25   64  377    0.071    0.017   0.035
-z.z.z.z         .PTP0.           1 s   45   64  376    0.058    0.041   0.044
*SHM(0)          .PTP0.           0 l    2    8  377    0.000   -0.017   0.005
-ntp-gps-1.thale .GPS.            1 u    4    8  377    5.031   -0.435   0.020

ntpq> rv 29451
associd=29451 status=f414 conf, authenb, auth, reach, sel_candidate, 1 event, reachable,
srcadr=y.y.y.y, srcport=123, dstadr=w.w.w.w, dstport=123, leap=00,
stratum=1, precision=-23, rootdelay=0.000, rootdisp=1.099, refid=PTP0,
reftime=de11e3d4.1850d73b Tue, Jan 23 2018 17:39:48.094,
rec=de11e3db.18563cd1 Tue, Jan 23 2018 17:39:55.095,
reach=376, unreach=0, hmode=1, pmode=1, hpoll=6, ppoll=6, headway=51,
flash=00 ok, keyid=112, offset=0.017, delay=0.071, dispersion=1.719,
jitter=0.035, xleave=0.024,
filtdelay=   0.09  0.10  0.07  0.12  0.13  0.11  0.11  0.16,
filtoffset= -0.01 -0.02  0.02  0.06  0.05 -0.01 -0.04  0.00,
filtdisp=    0.00  0.96  1.95  2.94  3.90  4.89  5.88  6.86

ntpq> rv 29452
associd=29452 status=f31f conf, authenb, auth, reach, sel_outlier, 1 event, interleave_error,
srcadr=z.z.z.z, srcport=123, dstadr=w.w.w.w, dstport=123, leap=00,
stratum=1, precision=-23, rootdelay=0.000, rootdisp=1.099, refid=PTP0,
reftime=de11e4c0.a5c3751c Tue, Jan 23 2018 17:43:44.647,
rec=de11e4c7.a5ca043a Tue, Jan 23 2018 17:43:51.647,
reach=377, unreach=0, hmode=1, pmode=1, hpoll=6, ppoll=6, headway=13,
flash=00 ok, keyid=113, offset=0.041, delay=0.058, dispersion=5.542,
jitter=0.062, xleave=0.014,
filtdelay=   0.11  0.14  0.11  0.11  0.10  0.08  0.06  0.08,
filtoffset=  0.03 -0.05 -0.02 -0.02 -0.03 -0.02  0.04  0.09,
filtdisp=    0.00  0.98  1.92  2.87  3.84  4.83  5.78  6.75

Emmanuel.
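Editorial sketch: the `reach` values in the output above (377 and 376) are an 8-bit shift register printed in octal, one bit per recent poll. The helper below is a minimal illustration of how to read them. The function names are hypothetical, and `update_reach` collapses the shift and the bit-setting into one step, which is a simplification of how a real daemon updates the register.

```python
def decode_reach(octal_text: str) -> list:
    """Return the last eight poll results, oldest first (True = reply seen)."""
    value = int(octal_text, 8)
    if not 0 <= value <= 0o377:
        raise ValueError("reach is an 8-bit register")
    return [bool((value >> bit) & 1) for bit in range(7, -1, -1)]


def update_reach(reach: int, got_reply: bool) -> int:
    """Shift the register left by one poll and record the newest result."""
    return ((reach << 1) | int(got_reply)) & 0o377


print(decode_reach("377"))  # all eight recent polls answered
print(decode_reach("376"))  # newest poll has no reply recorded
```

Read this way, 376 on a peer that otherwise shows 377 only means the newest poll has no reply recorded at the moment of the query, whereas a register stuck at 000 (as in the failing ntpd/chrony pairing) means no poll was ever answered with an accepted packet.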
Re: [chrony-users] chrony and ntpd xleave interoperability
On 23/01/2018 at 16:58, Miroslav Lichvar wrote:
> On Tue, Jan 23, 2018 at 02:44:56PM +0100, FUSTE Emmanuel wrote:
>> On 23/01/2018 at 13:00, Miroslav Lichvar wrote:
>>> With the current versions, if you can avoid the issue with
>>> unsynchronized sources, they should interoperate, at least when their
>>> polling intervals match. If it doesn't work for you, I'd like to see a
>>> tcpdump output.
>> Ok. I fixed the min/max polling interval to 5 for testing purposes.
>> Then I first restarted chrony and waited for it to sync to an online source.
>> Then I restarted ntp and took a capture.
>> I will send you all the data.
>>
>> NTP is stuck in unreachable state.
>> Chrony is stuck with only one valid RX.
> Ok. I can reproduce this problem. It seems ntpd doesn't update its
> state in the interleaved mode when it receives a packet with an
> unexpected origin timestamp. There was a similar issue fixed for the
> basic mode a few ntp releases ago:
> https://bugs.ntp.org/show_bug.cgi?id=2952
>
> As chronyd doesn't switch to the interleaved mode until it's receiving
> valid responses and ntpd doesn't accept responses in the basic mode,
> they are stuck waiting forever on each other.
OK!
> A similar thing seems to happen when trying to use the interleaved mode
> between two 4.2.8p10 ntpds. You said it worked for you before, so I
> assume one of the ntpds was an older version which didn't have this
> bug?
I have a platform with three ntpds in interleaved mode.
It was on 4.2.8p8.
They were upgraded today to 4.2.8p10 and are still working properly.
As I use authentication in that case, I added authentication to the test platform.
Mutual authentication validates, but the two get stuck as before.

Leap status     : Not synchronised
Version         : 4
Mode            : Symmetric active
Stratum         : 0
Poll interval   : 5 (32 seconds)
Precision       : -24 (0.00060 seconds)
Root delay      : 0.00 seconds
Root dispersion : 0.000656 seconds
Reference ID    : 494E4954 (INIT)
Reference time  : Thu Jan 01 00:00:00 1970
Offset          : +0.0 seconds
Peer delay      : 0.0 seconds
Peer dispersion : 0.0 seconds
Response time   : 0.0 seconds
Jitter asymmetry: +0.00
NTP tests       : 111 101
Interleaved     : Yes
Authenticated   : Yes
TX timestamping : Hardware
RX timestamping : Hardware
Total TX        : 17
Total RX        : 18
Total valid RX  : 2

associd=3540 status=e011 conf, authenb, auth, sel_reject, 1 event, mobilize,
srcadr=y.y.y.y, srcport=123, dstadr=x.x.x.x, dstport=123, leap=11,
stratum=16, precision=-24, rootdelay=0.000, rootdisp=0.000, refid=INIT,
reftime=. Thu, Feb 7 2036 7:28:16.000,
rec=de11e02d.60d2f07f Tue, Jan 23 2018 17:24:13.378,
reach=000, unreach=10, hmode=1, pmode=0, hpoll=5, ppoll=5, headway=17,
flash=1606 pkt_bogus, pkt_unsync, peer_stratum, peer_dist, peer_unreach,
keyid=1, offset=0.000, delay=0.000, dispersion=15937.500, jitter=0.000,
xleave=0.028,
filtdelay=   0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00,
filtoffset=  0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00,
filtdisp=    16000.0 16000.0 16000.0 16000.0 16000.0 16000.0 16000.0 16000.0

Emmanuel.
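Editorial sketch: the `flash=1606` word in the ntpq output above is a hexadecimal bit mask, and ntpq prints one name per set bit. The decoder below is a minimal illustration. The bit values in `FLASH_BITS` are an assumption taken from the flash codes as documented for ntpd, and the table is deliberately partial: it lists only the five names that appear in this thread. As a cross-check, 0x1606 decodes to exactly those five names.

```python
# Partial table (assumed bit values): only the names seen in this thread.
FLASH_BITS = {
    0x0002: "pkt_bogus",     # the TEST2 bit set in the patched code path
    0x0004: "pkt_unsync",
    0x0200: "peer_stratum",
    0x0400: "peer_dist",
    0x1000: "peer_unreach",
}


def decode_flash(hex_text: str):
    """Return (names of known set bits, mask of bits not in the table)."""
    value = int(hex_text, 16)
    names = [name for bit, name in FLASH_BITS.items() if value & bit]
    unknown = value & ~sum(FLASH_BITS)
    return names, unknown


names, unknown = decode_flash("1606")
print(names, hex(unknown))
```

A non-zero second element would mean the word contains a bit this partial table does not cover, so the ntpq documentation should be consulted rather than guessing.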
Re: [chrony-users] chrony and ntpd xleave interoperability
On Tue, Jan 23, 2018 at 02:44:56PM +0100, FUSTE Emmanuel wrote:
> On 23/01/2018 at 13:00, Miroslav Lichvar wrote:
> > With the current versions, if you can avoid the issue with
> > unsynchronized sources, they should interoperate, at least when their
> > polling intervals match. If it doesn't work for you, I'd like to see a
> > tcpdump output.
> Ok. I fixed the min/max polling interval to 5 for testing purposes.
> Then I first restarted chrony and waited for it to sync to an online source.
> Then I restarted ntp and took a capture.
> I will send you all the data.
>
> NTP is stuck in unreachable state.
> Chrony is stuck with only one valid RX.

Ok. I can reproduce this problem. It seems ntpd doesn't update its
state in the interleaved mode when it receives a packet with an
unexpected origin timestamp. There was a similar issue fixed for the
basic mode a few ntp releases ago:
https://bugs.ntp.org/show_bug.cgi?id=2952

As chronyd doesn't switch to the interleaved mode until it's receiving
valid responses and ntpd doesn't accept responses in the basic mode,
they are stuck waiting forever on each other.

A similar thing seems to happen when trying to use the interleaved mode
between two 4.2.8p10 ntpds. You said it worked for you before, so I
assume one of the ntpds was an older version which didn't have this
bug?

--
Miroslav Lichvar
Re: [chrony-users] chrony and ntpd xleave interoperability
On 23/01/2018 at 13:00, Miroslav Lichvar wrote:
> On Tue, Jan 23, 2018 at 11:31:38AM +0100, FUSTE Emmanuel wrote:
>> When I try to do the same with ntpd on one side and chrony on the other,
>> things go bad.
>> At best, chrony gets a working association with interleave status and
>> a very long response time.
> A long response time up to the polling interval of the peer is normal
> in symmetric associations.
>
>> On the ntpd side, the association never works. The chrony server never
>> gets the "reach" state and the reach counter is stuck at zero.
> Have you tried the same configuration and the timing of restarts,
> between two ntpd servers? I suspect you would see some of the issues
> in this case too.
>
> There are probably multiple issues involved, which make it difficult
> to see what's going on. I'm aware of the following:
>
> - ntpd doesn't accept packets from peers that are not synchronized
>   (yet), so peers have to be configured with other sources in order
>   for the symmetric association (in both basic and interleaved modes)
>   to start. See https://bugs.ntp.org/show_bug.cgi?id=3445.
> - interleaved mode in ntpd works only when the peers use the same
>   polling interval. If they have the same minpoll and maxpoll, but
>   minpoll != maxpoll, they should in theory both get to the maxpoll
>   if the association doesn't work, but there may be a bug that
>   prevents that.
> - chrony switches to the basic mode when the polling intervals don't
>   match, but ntpd doesn't accept responses in the basic mode if the
>   interleaved mode is enabled
>
>> chrony 3.2
>> ntp-4.2.8p8, ntp-4.2.8p10
>>
>> Could I normally expect xleave interoperability between chrony and ntpd,
>> or is it something too "implementation specific"?
> With the current versions, if you can avoid the issue with
> unsynchronized sources, they should interoperate, at least when their
> polling intervals match. If it doesn't work for you, I'd like to see a
> tcpdump output.
Ok. I fixed the min/max polling interval to 5 for testing purposes.
Then I first restarted chrony and waited for it to sync to an online source.
Then I restarted ntp and took a capture.
I will send you all the data.

NTP is stuck in unreachable state.
Chrony is stuck with only one valid RX.
>
> Please note that the symmetric mode has some security issues and it's
> generally recommended to use the client/server mode instead. Even if
> authentication is enabled, it is possible to break a symmetric
> association by replaying old packets. (chrony has a partial protection
> against this attack, but it works only in the basic mode when the
> polling intervals match and there are no packets with timestamps from
> the future that could be replayed. It's too fragile, don't rely on it!)
Yes, I know. It is only used on "trusted" LAN segments and/or to try to
interoperate with ntpd xleave.
>
> It is possible that support for symmetric associations will be dropped
> from chrony in the future.
>
I am only using it to transition from ntpd to chrony, so it will not be missed.
I hope my clock vendor will at some point transition from ntpd to something
else (chrony) to get good xleave support (and much more).
In any case, I mainly use these clocks with PTP, so the NTP part only affects
fail-over scenarios.

Emmanuel.
Re: [chrony-users] chrony and ntpd xleave interoperability
On Tue, Jan 23, 2018 at 11:31:38AM +0100, FUSTE Emmanuel wrote:
> When I try to do the same with ntpd on one side and chrony on the other,
> things go bad.
> At best, chrony gets a working association with interleave status and
> a very long response time.

A long response time up to the polling interval of the peer is normal
in symmetric associations.

> On the ntpd side, the association never works. The chrony server never
> gets the "reach" state and the reach counter is stuck at zero.

Have you tried the same configuration and the timing of restarts,
between two ntpd servers? I suspect you would see some of the issues
in this case too.

There are probably multiple issues involved, which make it difficult
to see what's going on. I'm aware of the following:

- ntpd doesn't accept packets from peers that are not synchronized
  (yet), so peers have to be configured with other sources in order
  for the symmetric association (in both basic and interleaved modes)
  to start. See https://bugs.ntp.org/show_bug.cgi?id=3445.
- interleaved mode in ntpd works only when the peers use the same
  polling interval. If they have the same minpoll and maxpoll, but
  minpoll != maxpoll, they should in theory both get to the maxpoll
  if the association doesn't work, but there may be a bug that
  prevents that.
- chrony switches to the basic mode when the polling intervals don't
  match, but ntpd doesn't accept responses in the basic mode if the
  interleaved mode is enabled

> chrony 3.2
> ntp-4.2.8p8, ntp-4.2.8p10
>
> Could I normally expect xleave interoperability between chrony and ntpd,
> or is it something too "implementation specific"?

With the current versions, if you can avoid the issue with
unsynchronized sources, they should interoperate, at least when their
polling intervals match. If it doesn't work for you, I'd like to see a
tcpdump output.

Please note that the symmetric mode has some security issues and it's
generally recommended to use the client/server mode instead. Even if
authentication is enabled, it is possible to break a symmetric
association by replaying old packets. (chrony has a partial protection
against this attack, but it works only in the basic mode when the
polling intervals match and there are no packets with timestamps from
the future that could be replayed. It's too fragile, don't rely on it!)

It is possible that support for symmetric associations will be dropped
from chrony in the future.

--
Miroslav Lichvar
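Editorial sketch: the polling values discussed in this thread (minpoll, maxpoll, hpoll, ppoll) are base-2 exponents of the interval in seconds, so "5" means 32 seconds and "10" means 1024 seconds. The helpers below are a minimal illustration with hypothetical names. `intervals_match` only restates the condition described above (both peers polling at the same interval); it is not code from either daemon.

```python
def poll_seconds(exponent: int) -> int:
    """Convert an NTP poll exponent to an interval in seconds."""
    return 2 ** exponent


def poll_ladder(minpoll: int, maxpoll: int) -> list:
    """All intervals a peer may step through between minpoll and maxpoll."""
    return [poll_seconds(e) for e in range(minpoll, maxpoll + 1)]


def intervals_match(host_poll: int, peer_poll: int) -> bool:
    """The condition stated above for the interleaved mode to work."""
    return host_poll == peer_poll


print(poll_ladder(5, 10))
print(intervals_match(5, 5), intervals_match(5, 6))
```

With `minpoll 5 maxpoll 10` each side can sit on any of six different intervals, which is why pinning minpoll and maxpoll to the same value is a simple way to rule out a mismatch while testing.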
[chrony-users] chrony and ntpd xleave interoperability
Hello,

First, my apologies for the mix-up on chrony-dev when I tried to
subscribe to chrony-users...

I'm doing some tests to replace ntpd with chrony on some server groups.
These servers use a peer association with the interleave option.
When I try to do the same with ntpd on one side and chrony on the other,
things go bad.
At best, chrony gets a working association with interleave status and
a very long response time.
On the ntpd side, the association never works. The chrony server never
gets the "reach" state and the reach counter is stuck at zero.
As soon as I remove the xleave option on the ntpd side, everything
immediately starts to work as expected.

ntpd:
peer y.y.y.y minpoll 5 maxpoll 10 xleave
restrict y.y.y.y notrap nomodify noquery

chrony:
peer x.x.x.x xleave minpoll 5 maxpoll 10
allow x.x.x.0/24

Since yesterday, I had removed the xleave option on the ntpd side. All
was good on both sides.
So I tried to reactivate the xleave option -> Boom, it works!!!
I restarted chrony
-> ntpd logged "receive: KoD packet from 192.54.145.235 has a zero org or
rec timestamp. Ignoring." and four minutes later "y.y.y.y 8613 83 unreachable".
The previously working association is now dead. No working association from chrony.
So I restarted ntpd
-> chrony starts to see the other server (ntpdata) but never reaches a good state.
-> ntpd does not reach the "reach" state.
Remove xleave from ntpd and restart
-> all is still stuck.
Restart chrony
-> ntpd starts to see the chrony server, the reach state increments, and it
reaches a "backup" condition. All is good on the chrony side.
Re-add the xleave option on the ntpd side
-> the unreach counter increments, flash=1606 so pkt_bogus...
On the chrony side, "Total valid RX" no longer increments...

I'm lost.

chrony 3.2
ntp-4.2.8p8, ntp-4.2.8p10

Could I normally expect xleave interoperability between chrony and ntpd,
or is it something too "implementation specific"?

Emmanuel.