Re: [ntp:questions] NTP autokey: self-signed certificate expiration problem
Stephane lasagni wrote: Hello, I tried the NTP autokey protocol (the TC scheme at first, then with IFF parameters - the Schnorr algorithm - since that is the scheme that is best documented). I managed to get both schemes to work OK; however, I have noticed one problem: my product is an NTP client and self-generates its self-signed non-trusted certificate as described in the protocol (using the ntp-keygen -H command). However, when my product starts, it always starts with a default date, which is in 2015! Because the self-signed certificate is only valid for 1 year, it is expired immediately after its generation! I need to be synchronized before I generate the certificate... but then I need the certificate before I am able to synchronize! I found a workaround, but I don't think it is a very "clean" solution: I use the "-l" option of ntp-keygen to specify the certificate lifetime and I put in a big duration value (like 40 years) just to make sure the generated certificate is valid at power-up. I can then make sure that I renew the certificate every month or so (but every time with a 40-year duration => I've set up a cron job to launch a script to generate the certificate at power-up and then every month, but this script is "fixed", so each time it is launched the newly generated certificate has a 40-year duration)... I am thinking there must be a better way to deal with that! I'm probably not the only one to have this type of problem! :) How can this type of problem be dealt with? Is there a better solution? Thank you very much for your help! Best regards Stéphane PS: I am planning to also test the "private certificate" scheme to try to understand how it works (I have sent a question about this scheme recently) ___ questions mailing list questions@lists.ntp.org http://lists.ntp.org/listinfo/questions Stephane, As an alternative, you can use the symmetric key scheme. This does not require Autokey.
The original intent of the keygen program with no argument was to generate a certificate using the current time of the operating system. Therefore, once you generate a proper certificate, the old certificate lifetime is updated. Dave
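A sketch of a less blunt version of Stéphane's cron workaround: regenerate only when the certificate is actually outside its validity window, rather than unconditionally issuing 40-year certificates. The dates below and the idea of feeding the check from the certificate file are illustrative assumptions, not part of ntp-keygen:

```python
from datetime import datetime

def cert_is_valid(not_before, not_after, now):
    """True if `now` falls inside the certificate's validity window."""
    return not_before <= now <= not_after

# The device boots with its default date in 2015 and generates a
# one-year certificate at that (wrong) time...
not_before = datetime(2015, 1, 1)
not_after = datetime(2016, 1, 1)

# ...so once the clock is actually synchronized, the window no longer
# covers the present and the certificate must be regenerated.
synced_now = datetime(2017, 6, 1)
needs_regen = not cert_is_valid(not_before, not_after, synced_now)
```

A startup or monthly script could parse the validity dates out of the certificate (e.g. with openssl x509 -noout -dates) and re-run ntp-keygen -H only after the first successful synchronization and only when this check fails.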
Re: [ntp:questions] NTP autokey and the "private certificate" scheme
Stephane lasagni wrote: Hello, I apologize in advance if my questions further below seem basic to some of you: I am very new to NTP and cybersecurity (a whole new world for me!). I am trying to work out how NTP autokey works when using the "private certificate" scheme, and I thought you might be able to help me to understand it better. I know this scheme is not recommended by RFC 5906 (only for testing purposes). However, in my application this scheme could be appropriate. I think I understood how the other schemes (TC, IFF, ...) work, but for some reason I'm struggling to understand the "private certificate" scheme. I have the following questions (which I numbered to make the reading easier): 1. I understand the "private certificate" scheme is not recommended for general use (only for testing and development) only because, with this scheme, it is difficult to renew the certificate for all hosts in a secure way; is that correct? I understand that the TA (Trusted Authority) generates this private certificate off-line (signed by the TA) and provides it in a secure way to all hosts of the NTP group, but what I am struggling to understand is what this private certificate contains exactly and how it is used: 2. Does the private certificate replace the self-signed certificate which is generated by each host at the beginning of the protocol? I.e., each host knows it can use the public key in that certificate (and the associated private key: see question 3) for the cookie encryption/decryption, etc.? 3. If the answer to question 2 is yes, does it mean that, in addition to the certificate, the TA has to provide each host with the private key which goes with the public key of the certificate? 4. If the answer to question 3 is no, does it mean each host has 2 certificates: the self-signed non-trusted certificate generated at the beginning of the protocol + the private certificate? How is the private certificate then used, exactly? 5.
From RFC 5906, I understand that in case the private certificate scheme is used, the certificate trail and the identification steps are not necessary. What about the SIGN exchange? The SIGN exchange only makes sense with a non-trusted self-signed certificate, so this brings me back to the previous questions. 6. Last question (beginner level, I think... sorry!) and I am sure I probably forgot some: what does this private certificate contain in terms of subject name (the issuer is clearly the TA, but is the subject name exactly the same for all hosts, i.e. is the certificate identical for all hosts? Maybe it does not matter?) and how long is it valid for (1 year by default, I guess, which makes this scheme difficult to use in practice for the reasons given above)? Thank you very much in advance for your help! Best regards Stéphane Stephane, Golly. You are the first person in 20 years to have asked about the private certificate scheme. Frankly, I don't remember all the tiny details you mentioned. However, the Autokey scheme is about to be replaced by new security proposals, so it is probably better to wait until the dust clears. Dave
Re: [ntp:questions] Understanding MJD in NTPv4 test cases
antony, The NTP timescale described on the NTP Era and Era Numbering page at the NTP Project site reckons JDN days and fraction since noon on the first day of year 4713 BC. The correspondence between JDN day and civil year prior to Pope Gregory's bull is in Julian years of 365.25 days, as is the habit of historians. Years since the papal bull correspond to the Gregorian calendar. The tables you see in the documentation were determined by an awesome Excel spreadsheet that apparently is unreadable with the current Excel version. Whatever bugs may remain are mine, but they do not affect current NTP timekeeping, only possible historic misadventures. David antony.arciu...@oooii.com wrote: In http://www.ietf.org/rfc/rfc5905.txt and http://www.eecis.udel.edu/~mills/database/reports/ntp4/ntp4.pdf Figure 4: Interesting Historic NTP Dates/Table 2: Interesting Historic NTP Dates It seems to me that all MJD values in the table are calculated using Gregorian calculations all the way back. If I put in an if before Gregorian was used to do a different leap-day calculation, then none of the pre-Gregorian values come out correctly. This is all true EXCEPT for the first test, the Julian Day Number Epoch. The document describes the date whose JD should be zero, i.e. whose MJD is exactly minus the constant offset from regular JD. Using the Gregorian calculation for Julian Days, the value for Jan 1, -4712 is 38, not 0. Thus the MJD I get is -2399963, not -2400001. That's way more than any precision or noon-vs-midnight difference would produce. Why is that first date different algorithmically, if the test values are to be believed? It's not a Julian-vs-Gregorian thing, because I can generate the other BCE dates consistently. Am I missing something? I feel like either the first MJD in the table is incorrect, or all BCE values are incorrectly calculated using Gregorian math in a non-Gregorian era.
I am using code from http://www.tondering.dk/claus/calendar.html AND http://bowie.gsfc.nasa.gov/time/julian.txt (ported to C) for the JDN calc, and they are always self-consistent (I do an assert(JDN1 == JDN2) to be sure I've got the calcs right). I also have a 3rd I did right from http://en.wikipedia.org/wiki/Julian_day, but that too produces 38. I can only get 0 JDN, thus -2400001 MJD, with a non-Gregorian calculation, but then all BCE calcs + Last day Julian fail (the CE tests after Last day Julian all pass). Please, can someone shed some light on what I am missing? What are the rules NTP uses for calculating JDN?
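The discrepancy antony describes can be reproduced directly. A small sketch using standard Fliegel/Van Flandern-style day-number arithmetic (with astronomical year numbering, so 4713 BC is year -4712): the proleptic Julian calendar puts 1 Jan -4712 at JDN 0, while applying the Gregorian leap rule all the way back yields 38, and 38 - 2,400,001 is exactly the -2,399,963 reported above:

```python
def jdn_julian(year, month, day):
    """Day number for a proleptic Julian calendar date (valid at noon)."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083

def jdn_gregorian(year, month, day):
    """Same date interpreted with the Gregorian leap-year rule."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - y // 100 + y // 400 - 32045

# The JDN epoch: 1 January 4713 BC (astronomical year -4712)
epoch_julian = jdn_julian(-4712, 1, 1)        # 0, matching MJD -2,400,001 in the table
epoch_gregorian = jdn_gregorian(-4712, 1, 1)  # 38, giving MJD -2,399,963
```

This supports the poster's reading: the epoch row only comes out right with Julian-calendar arithmetic, so the table cannot be purely proleptic Gregorian.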
Re: [ntp:questions] NTP vs RADclock?
Julien, Thanks for the paper reference. Your ideas on feed-forward are similar to the ideas in Greg Troxel's MIT dissertation. These ideas were partially implemented in NTPv3 a very long time ago. There are some minor misinterpretations in the paper. The NTP discipline loop is not critically damped; it is purposely underdamped with a minor overshoot of a few percent in order to improve the transient response. The impulse response was slavishly copied in the microkernel and nanokernel code that left here over a decade ago. The microkernel survives in Solaris and the nanokernel in most BSD systems with varying degrees of fidelity; however, many systems have elected to modify the original BSD tickadj semantics, which results in an extra pole. The result is a moderate instability at the longer poll intervals, especially if the step threshold is increased or eliminated. In any case, the response has no serious misbehavior as the paper described. Note that in no case are the daemon and kernel algorithms cascaded as the paper implies. Either one or the other is used, but not both. The system behavior with multiple servers is indeed as the paper suggests, but there is considerable algorithm horsepower to diminish the effects, including the cluster and combine algorithms, plus the anti-clockhop and prefer peer mechanisms. These provisions were first implemented in the Internet of twenty years ago, when the congestion on overseas links frequently exceeded one second. Perhaps today these algorithms could be more carefully tuned for LANs and even WiFi networks. As the paper describes, NTP algorithms are designed for traditional mathematical analysis, but with both linear and nonlinear components. However, the FLL algorithm is based on a model described by Levine as predictive. The model in the documentation describes both the PLL and FLL in predictive terms, but that doesn't change the conclusions in the paper.
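The damping behavior described above can be caricatured with a toy discrete-time PI (phase plus frequency) loop. This is not ntpd's discipline; the gains below are invented purely to make a slightly underdamped response visible. But it shows the intended shape: a dip past zero followed by rapid settling rather than a critically damped approach:

```python
def discipline(initial_offset, steps, tau=64.0, kp=0.5, ki=0.1):
    """Toy PI clock discipline: each interval tau, slew a fraction kp of
    the measured offset and fold a fraction ki into the frequency term.
    Returns the offset history."""
    offset, freq = initial_offset, 0.0
    history = []
    for _ in range(steps):
        freq += ki * offset / tau                # integral (frequency) branch
        offset = (1 - kp) * offset - freq * tau  # proportional slew + frequency effect
        history.append(offset)
    return history

history = discipline(1.0, 80)
overshoot = min(history)  # dips below zero: the loop is underdamped
```

With ki larger than kp**2 / 4 the toy loop's poles are complex, so the offset oscillates through zero while decaying, which is the underdamped transient response the reply describes.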
The paper suggests possible improvements in data filtering and analysis. The clock filter and popcorn spike suppressor algorithms in NTP represent one approach. A persistent observation is that NTP does not effectively use offset/delay samples other than at the apex of the scattergram. While it does indeed do that for the huff-n'-puff filter, the possible improvement in other cases is problematic. The paper does not mention the implications of roundtrip delay in the maximum error statistic, such as in Cristian's model, as used by NTP. It is a natural error bound for asymmetric paths such as mentioned in the paper. In summary, the NTP algorithms have evolved over thirty years in response to major changes in Internet service models and surely could use some further evolution. I am glad there is continuing interest in improvements. Dave Julien Ridoux wrote: On Thursday, June 7, 2012 4:19:31 PM UTC+10, unruh wrote: On 2012-06-07, skillz...@gmail.com wrote: On Jun 5, 6:46 pm, Julien Ridoux jul...@synclab.org wrote: On Tuesday, June 5, 2012 9:12:42 AM UTC+10, E-Mail Sent to this address will be added to the BlackLists wrote: Thanks for the response. It would be great to have a simplified description of the algorithm. I've read most of the docs on the synclab site. That is a very impressive effort :) I'm trying to synchronize several devices to a reference clock of a similar device, rather than to the true wall-clock time of a real NTP server. I can't use the RADclock code directly because it's GPL, so I'd like to distill the algorithm down to something I can implement from scratch. I'd like to adapt my current NTP client and server code to use RADclock so I can compare performance. I'm aiming Why not just use the reference implementation of radclock for the tests? The next version of RADclock is likely to be released under a BSD licence. This should save you the trouble of reimplementing the algorithm.
for 10 microseconds or less of sync on a wireless LAN with a fast Not going to work. initial sync time (less than 5 seconds, but I can send many packets In 5 sec you could set the time, but you cannot get a good estimate of the rate. close together if needed). I haven't been able to get even close to this with my current NTP code (Ethernet is close, but WiFi introduces a lot of variability). Some of the key points seem to be: Precisely. And RADclock will not help with that. It uses the same (or perhaps much slower) synchronization algorithm. I agree with unruh; you are going to bump into quite a few issues, and the much noisier characteristics of the WiFi channel make it much harder for any synchronisation algorithm. RADclock, however, does a pretty good job at filtering noise, but there is no magic involved. If the overall noise characteristics of all packets are bad, then you cannot do much about it. I have run RADclock over WiFi, but I would
[ntp:questions] Computer Network Time Synchronization: Russian translation
Folks, I thought you might get a kick out of this. See www.eecis.udel.edu/~mills for the ISBN number. Translated from the second edition, published by CRC Press. Dave
Re: [ntp:questions] Falseticker determination
A C, Before you take a hacksaw to the code, you should see the How NTP Works collection in the online documentation, in particular the clock select algorithm page. It includes advice on how to avoid falsetickers in cases like yours, including the use of the tinker and/or tos commands. There should be no need of additional trace lines in the code, as there already are some that demonstrate the results of the clock select and cluster algorithms. Dave A C wrote: On 4/4/2012 18:52, E-Mail Sent to this address will be added to the BlackLists wrote: Dave Hart wrote: A C acarver+...@acarver.net wrote: Where in the code of 4.2.7p270 is the determination that a peer is a falseticker? I'm looking through ntp_proto.c but I don't think I'm fully grasping how the determination is made and the peer marked. I want to put some debug lines in the area of the code where the falseticker is determined so I can figure out what conditions are causing the PPS to be marked as a falseticker. Line 2519 of ntp_proto.c (in clock_select): peer->new_status = CTL_PST_SEL_SANE; All survivors to that point in the code get the x, fleetingly. Those that keep it fail to survive to line 2688: peers[i].peer->new_status = CTL_PST_SEL_SELCAND; I would have thought 2835-2855 might be where he would want to take a closer look. Well, that was a fun exercise. The end result is that the PPS is no longer a falseticker and I don't need a prefer peer either. :) Believe it or not, it's actually working better this way. For starters, the offset is staying within +/- 10 us of the PPS pulse. Additionally, allowing the system to select the rest of the clocks using the normal clock selection instead of a prefer peer has actually quieted the sys_fuzz messages. Usually I was seeing a sys_fuzz message perhaps once every couple of minutes. Now I see one maybe once in a few hours or more. Plus, if any of the peers explodes for whatever reason it doesn't wipe out my clock, because the averaged time doesn't move much.
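The decision this thread is probing comes out of the selection (intersection) step of clock select. A simplified sketch of the idea only (ntpd's clock_select also folds in stratum, root distance, and the cluster pruning, so this is an illustration, not the real code): each source contributes a correctness interval [offset - error, offset + error], and sources whose interval misses the most-covered point are the falsetickers:

```python
def select_truechimers(intervals):
    """Given correctness intervals (lo, hi) per source, return the indices
    of sources whose interval contains the point covered by the largest
    number of intervals; the remaining sources are falsetickers."""
    events = []
    for i, (lo, hi) in enumerate(intervals):
        events.append((lo, 0, i))  # interval opens (0 sorts before 1 on ties)
        events.append((hi, 1, i))  # interval closes
    events.sort()
    best_point, best_count, count = None, 0, 0
    for value, kind, _ in events:
        if kind == 0:
            count += 1
            if count > best_count:
                best_count, best_point = count, value
        else:
            count -= 1
    return [i for i, (lo, hi) in enumerate(intervals) if lo <= best_point <= hi]

# three sources agree near 0.2 s; the fourth is a falseticker
sources = [(0.10, 0.30), (0.15, 0.35), (0.20, 0.40), (5.00, 5.20)]
truechimers = select_truechimers(sources)
```

In this picture, a PPS source gets flagged when its (very narrow) interval fails to overlap the majority's intersection, which is why the tinker/tos knobs that widen or re-weight the intervals can rescue it.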
Re: [ntp:questions] Off topic: using delay in routing protocols
Juliusz, The fuzzballs indeed used a delay metric. They made little nests at the earth stations in the SATnet program, as well as in the routers used in the early NSFnet. In its original form, the ARPAnet also used a node-state metric like the fuzzballs, but switched to a link-based metric like OSPF. So far as I know, the fuzzballs used split horizon and hold-down before anybody else did. This was exemplified by the mantra "good news travels fast, but bad news travels forever." See below for additional references. Mills, D.L. The Fuzzball. Proc. ACM SIGCOMM 88 Symposium (Palo Alto CA, August 1988), 115-122. Mills, D.L., and H.-W. Braun. The NSFNET Backbone Network. Proc. ACM SIGCOMM 87 Symposium (Stoweflake VT, August 1987), 191-196. Dave Juliusz Chroboczek wrote: Hi, Sorry for the off-topic post, but I really don't see another place to ask this question. I hear that the Fuzzball routing protocol used packet delay as a routing metric. Does anyone recall if that's right? Was it the RTT, or was it attempting to estimate one-way delay? More generally, I'll be grateful for any pointers to papers on the subject of using delay in routing protocols. Thanks for your help, -- Juliusz Chroboczek
Re: [ntp:questions] Choice of local reference clock seems to affect synchronization on a leaf node
unruh, 1. You have a broken interpretation of how the NTP discipline algorithm works. See the online document How NTP Works, and in particular the discipline and clock state machine pages. 2. Your comparison between NTP and chrony is badly conceived. Talk to Miroslav; he knows the issues. 3. The PID issues have already been carefully considered. See the startup algorithms described on the How NTP Works pages. 4. The orphan mode and local clock discipline require special provisions to delay clock adjustments until the configured sources have had a chance to activate. The paint isn't quite dry on some intricacies. 5. Starting NTP with an initial ten-year offset is not a frequent adventure. Under these conditions, if the clock takes a little longer to stabilize, I'm not going to worry a lot about it. Dave unruh wrote: On 2011-11-07, Nathan Kitchen nkitc...@aristanetworks.com wrote: On Sun, Nov 6, 2011 at 2:13 PM, Danny Mayer ma...@ntp.org wrote: On 11/4/2011 7:27 PM, Nathan Kitchen wrote: I'm curious about some behavior that I'm observing on a host running ntpd as a client. As I understand it, configuring a local reference clock (either an undisciplined local clock or orphan mode) shouldn't help me, but I see different behavior when I do have one. In particular, when I'm synchronizing after correcting a very large offset, I synchronize about 2x faster in orphan mode than with no local clock, and with an undisciplined local clock I don't even fix the offset. I'm curious about whether this difference should be expected. I'm using the following configuration in all cases: driftfile /persist/local/ntp.drift server 172.22.22.50 iburst My three different configurations for local clocks are the following: 1. No additional commands 2. tos orphan 10 3. server 127.127.1.0 fudge 127.127.1.0 stratum 10 In all three cases, my test has these steps: 1. Stop ntpd. 2. Set the clock to 2000-1-1 00:00:00 (that is, more than 10 years ago). 3. Run ntpd -g. 4.
Check that the 11-year offset is corrected. 5. Wait for synchronization to the time server. With either configuration #1 (no local clock) or #2 (orphan mode), the offset is corrected quickly: 4 and 13 seconds, respectively. With configuration #3 (undisciplined local clock), it fails to be corrected within 60 seconds. In case #3 that's expected if there are no servers to get the correct time from. What else would you expect? Where would it get its time from? In case #3, as in the other cases, the configuration includes the server 172.22.22.50. After the offset is corrected, configuration #1 takes 921 seconds to synchronize to the server. Configuration #2 takes 472. First, correcting the offset is the major concern. After that, the frequency error has to be worked out from additional packets as they are received, and that takes time. It needs to have enough of them to do the calculation. Actually, that is not the way that ntpd works. It has no concept of frequency error. All it knows is the offset. It then changes the frequency in order to correct the offset. It does not correct the offset directly. It never figures out what the frequency error is. All it does is: if the offset is positive, speed up the clock; if negative, slow it down (where I am defining the offset as 'true' time minus system clock time). (There is a lot that goes into ntpd's best estimate of the 'true' time, which is irrelevant to this discussion.) chrony has a different philosophy, in which it has a concept of both the frequency error and the offset, and it tries to correct both independently. It keeps a large number of measurements and estimates both the frequency error and the offset from those measurements. This results in far faster convergence and better system clock offset behaviour (by factors of 2-20).
Another approach might be to use PID concepts (in which one uses the present offset, the derivative of the offset, and the integral of the offset to drive the correction) to control the clock, to get faster convergence without overshoot and with high long-term accuracy. These kinds of feedback systems are used, for example, to control the temperature of scientific heat baths to high precision with fast, non-ringing convergence (and have gained popular use in, for example, sous vide cooking). It might be interesting to get a Masters or PhD student somewhere to compare the various techniques for clock control to see what their advantages and disadvantages are, especially under real-life conditions. Why would it take fewer packets with orphan mode enabled (and no peers) than with no local clock? -- Nathan
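The "estimate frequency and offset independently from many measurements" approach attributed to chrony above can be sketched as an ordinary least-squares line fit over (time, offset) samples: the slope estimates the frequency error, the intercept the offset. This is only an illustration of the idea, not chrony's actual code (which also weights and ages samples):

```python
def fit_freq_offset(times, offsets):
    """Least-squares fit offset = intercept + slope * t; the slope
    estimates the frequency error (s/s), the intercept the offset at t=0."""
    n = len(times)
    mean_t = sum(times) / n
    mean_x = sum(offsets) / n
    var_t = sum((t - mean_t) ** 2 for t in times)
    cov = sum((t - mean_t) * (x - mean_x) for t, x in zip(times, offsets))
    slope = cov / var_t
    intercept = mean_x - slope * mean_t
    return slope, intercept

# synthetic samples: 50 ppm frequency error, 3 ms offset, one poll per 64 s
times = [64.0 * i for i in range(8)]
samples = [0.003 + 50e-6 * t for t in times]
freq, offset0 = fit_freq_offset(times, samples)
```

With clean samples, eight polls are enough to recover both terms at once, which is why this feed-forward style can converge faster than a pure feedback loop that only ever reacts to the current offset.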
Re: [ntp:questions] Arcron (type 27) driver users needed
Guys, Joe sent me a receiver and I have finally verified it works, sorta. It took a couple of weeks before it found WWVB, but it did. I connected it to deacon for a test. When it first came up, ntpq reported it as working, but after a couple of minutes the driver apparently went into a loop and stayed there; at least ntpq timed out for repeated pe queries. I left it commented out to avoid possible battery exhaustion. I need a little help here, as I cannot clearly see what I am doing. Dave Joe Landers wrote: Dave, I have a couple of Arcrons, although I don't actively use them anymore. If you let us know exactly what you'd like us to do, we'd be happy to help. Alternatively, I can ship one to you or Harlan and you can verify the fix yourself. Joe On 7/20/2011 8:33 PM, Dave Hart wrote: If you use the refclock_arc.c driver (127.127.27.*), please reply to me and mention which receiver and radio station you use. The driver supports MSF, DCF, and WWVB. I'm asking because five years ago a bug report was filed indicating the driver was dependent on the system's time zone, when it should be independent of it. The driver maintainer provided a patch, and the reporter of the bug said it cured his problem. Then, unfortunately, the bug was dropped on the floor and the fix never integrated into the distributed ntpd sources. Details at: http://bugs.ntp.org/402 Harlan and I are embarrassed by the oversight and would like to resolve it, but we are also aware the proposed fix is not backwards-compatible and risks breaking currently-functioning setups. Additionally, we are both leery of making changes to a reference clock driver without adequate testing. I would like to see the change tested against several radio stations and with the system timezone set to both UTC and local time.
Cheers, Dave Hart
Re: [ntp:questions] Magic Server Numbers: 4, 5, 7, 9
Danny, We would not be having this discussion if folks read the How NTP Works collection in the online documentation, in particular the page on the select algorithm. The number of candidates is not limited to ten. By default, ten is the high-water mark for survivors mobilized as preemptible in manycast and pool modes. Dave Danny Mayer wrote: On 9/16/2011 2:24 AM, unruh wrote: Danny, 6. Seven clocks allow for the failure of three. Etc, etc. . . . The only answer is to have at least 11 clocks, although that is also not foolproof :-) Actually, no. If you get too many reference clocks they will start to gang up against each other and it becomes impossibly complex to try to decide which set to use. As the number of clocks increases, the number of gangs will likely increase. That's why the reference implementation limits the number of reference clocks to 10. Danny
Re: [ntp:questions] how long does it take ntpd to sync up
Brian, See the release notes for the latest distribution in the online documentation. There has been a bit of facebook engineering in this discussion. For the real story, see the How NTP Works pages in the latest online documentation. Dave Brian Utterback wrote: On 08/28/11 06:53, David Woolley wrote: Harlan Stenn wrote: ntp *does* a fit/figure of the expected needed adjustment - your sentence implies that it does not (at least that's how it reads to me). No, it doesn't. It is basically a feedback controller. If you take a modern central heating controller, which varies the output by varying how much of a ten-minute cycle the pump is running, you have something similar to version 3 ntpd, except that it used 4 seconds. The central heating controller will use the measured difference from the target temperature to adjust the on-to-off proportion, and also include some of the integral of that to, eventually, remove any remaining offset. A fitting process would be more like the controller measuring the rate of temperature change when the heat was off and when it was on, and then calculating exactly how much on-to-off time to apply in one go. (In practice, it isn't as simple as that for the central heating system, as there is a lag involved for the heat to get from the boiler flame to the thermostat.) I think that there might be something to the process that Bill supports, at least in some situations. As you know, I have had problems with NTP and larger initial frequencies. (I know I owe you testing on the latest NTP. Sorry, I have been busy. I'll get to it soon.) But if you think about it, we already do something like this for offset. Of course, NTP uses a PLL in the general case. (It used to use an FLL when it settled in, but I seem to remember Dr. Mills saying that was removed.) But with iburst, we futz with the algorithm to get the offset right quickly. Couldn't we do something similar for frequency?
Particularly in the case where there is no drift file, the initial frequency could be very far off. Save the first 16 (say) poll samples, correct for the offset adjustment, do a best-fit analysis and some sanity checking, and use the result for an initial frequency. You could keep this up, doing best-fit analysis until it got to within a certain error interval, and then switch to the normal regime. If you only did the first analysis, you could trivially prove that it does not break the PLL, because you are left with an initial condition that could potentially have occurred anyway. But in most cases, you are going to zero in on the right frequency pretty quickly.
Re: [ntp:questions] Setting back-up network servers to minpoll 10 automatically when synchronized to a reference clock
David, Something like this was done in NTPv3 (xntpd) and it turned out to be a bad idea. The poll interval is determined by the time constant, which for PPS and other low-stratum sources is relatively small. If a backup is switched in at a poll interval much larger than this, it takes a while for the time constant and backup poll interval to stabilize, and meanwhile the Nyquist limit is exceeded. In NTPv3 this sometimes resulted in an evil twitch until things calmed down. It gets worse in the general case where sources of unequal stratum are configured. The algorithms can result in survivors of more than one stratum being present, and the combine algorithm uses them all. They must all run the same poll interval or evil twitches can result. All in all, to do what you suggest requires an intricate evaluation of what already is a complicated and fragile algorithm, so this might not happen soon. Dave David J Taylor wrote: Edward T. Mischanko etm1962@ wrote in message news:ivk80m$4uh$1...@speranza.aioe.org... I am using GPS with PPS as my primary time source. I don't want to set my back-up network servers to minpoll 10 in the configuration, because if the GPS ever fails the servers would be fixed at minpoll 10. I propose an enhancement to the current NTPD functions: have NTPD automatically set network clocks to minpoll 10 when using a stratum 0 clock as a time reference. This would cut network traffic and server load while still allowing a minpoll 6 setting or lower in the configuration to be used if the reference clock ever fails. I like and support that suggestion. David
Re: [ntp:questions] Controlling the combine algorithm
Steve, There is something wrong with the nomenclature in your message. By definition, the number of survivors is the number remaining after the cluster algorithm has completed. This number is three by default, but can be changed using the minclock option. To reduce the number of survivors below three is generally a bad idea, but as I said, the option is available. I don't understand what you mean by "it doesn't work". The maxclock option does not do what you want. It is intended for automatic configuration, where it specifies the maximum number of servers remaining after the preempt phase. For configured associations, this number is irrelevant. Dave Steve Kostecke wrote: On 2011-07-02, David L. Mills mi...@udel.edu wrote: The combine algorithm operates on the survivors of the cluster algorithm, as described on the How NTP Works page. The number of survivors can be set using the minclock option. I'm not sure why you want to do this, but the option is there. Dave, I've been having a discussion with someone who wishes to configure a large number of unicast time servers and restrict the number of survivors to a number below the default. He was attempting to accomplish this using the maxclock option, but it did not work. Thanks,
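For concreteness, the two knobs contrasted in this thread are set with the tos command in ntp.conf; a minimal illustrative fragment (the values shown are just the defaults, not a recommendation):

```
# ntp.conf fragment
tos minclock 3     # survivors retained by the cluster algorithm (default 3)
tos maxclock 10    # cap on associations mobilized in manycast/pool modes (default 10)
```

Per the reply above, lowering minclock is the lever that actually reduces the survivor count for configured associations; maxclock only bounds automatically mobilized ones.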
Re: [ntp:questions] Failure of NIST Time Servers
Eugen, The remote NIST servers do not use the ACTS driver in the distribution. They use an algorithm called lockclock that functions as a modem device driver. I assume the unhealthy indication provided in the ACTS timecode is translated to the NTP LI indicator via the local clock driver, but Judah is using a relatively old and much modified version of the NTP reference implementation, so it is not clear how this is done. Dave Eugen COCA wrote: In my opinion, transmitting the time with an offset of about 680 seconds while "... some of the systems transmitted the wrong time without this indication" (the unhealthy indicator) is a bit unprofessional. Of course, it is the users' sole responsibility to configure their time servers in order to avoid these things. I think you should post a message stating that such behavior will never happen in the future, now that the bug has been discovered and corrected. Eugen On May 27, 11:16 pm, jlevine jlev...@boulder.nist.gov wrote: The primary and backup ACTS servers that are used to synchronize the NIST Internet time servers to the atomic clock ensemble in Boulder failed on Wednesday 24 May at about 1900 UTC (1300 MDT). The failure affected 11 of the 35 NIST Internet time servers, and the time transmitted by the affected servers was wrong by up to 680 seconds. In most cases, the incorrect time was accompanied by the unhealthy indicator, but some of the systems transmitted the wrong time without this indication. The other 24 servers were not affected. In some cases, one of the physical servers at a site was affected while the others were not, so that repeated requests to the public site address resulted in time messages that differed by up to 680 seconds. The ACTS servers have been fully repaired as of 27 May, 1800 UTC (1200 MDT), and all of the servers should be resynchronized within a few hours. There may be a transient error of up to 10 milliseconds during this period.
I apologize for this failure, and I regret the problems and inconvenience that have resulted. Judah Levine jlev...@boulder.nist.gov
Re: [ntp:questions] Loop Filter Gains vs. Polling Interval
Edward, The loop time constant varies directly with the poll interval. See How NTP Works in the current documentation. Note that the default value and range are purposely optimized for public time servers in order to manage network overhead and are not appropriate for the most accurate LAN servers. For that, the poll interval and thus the loop time constant should be clamped below the default, but not below 4 (16 s) for typical 100-Mb/s networks. Dave Mischanko, Edward T wrote: Dave, How do I adjust the loop time constant so that it is shorter? -Original Message- On Behalf Of David Woolley Sent: Monday, May 16, 2011 2:01 AM To: questions@lists.ntp.org Subject: Re: [ntp:questions] Loop Filter Gains vs. Polling Interval Mischanko, Edward T wrote: Can anyone tell me, does the sensitivity for frequency adjustment lessen as the polling interval increases? I ask because I'm observing that my offset increases and the frequency adjustment decreases to the point I fall out of sync at polling intervals above 256. What am I doing wrong? They are linked, but I have a feeling that the loop time constant is not clamped, even though the poll interval is. The aim is to always achieve a certain level of oversampling. At the very least, the intent is to vary the time constant and then choose a poll interval that is appropriate for that. The integral component should be retained. The effect of having too long a loop time constant should be that the system fails to track clock wander, so it would be more vulnerable to temperature changes or to forgetting to disable power management of the clock frequency.
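Dave's advice about clamping the poll interval on an accurate LAN translates to the minpoll/maxpoll options on the server line; the hostname below is a placeholder:

```conf
# ntp.conf fragment for a LAN client (hostname is a placeholder)
# Clamp the poll interval low so the loop time constant stays short,
# but not below 4 (16 s), per the advice above.
server ntp-lan.example.com iburst minpoll 4 maxpoll 6
```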
Re: [ntp:questions] ntp-keygen -H and update options
Joe, The documentation is rather specific. If you generate a new host or sign key, the certificates are invalid and should be regenerated. Running ntp-keygen with no arguments generates a new certificate of the same type and signature as the existing one. Dave Joe Smithian wrote: Hi All, I am trying to configure a trusted NTP server and some clients using Autokey. ntp-keygen document: -H Generate a new encrypted RSA public/private host key file and link. Note that if the sign key is the same as the host key, generating a new host key invalidates all certificates signed with the old host key. My questions: 1. When should we use the -H option? When generating new keys? When updating certificates? Or both? 2. Does the -H flag only generate RSA keys, not DSA, even when we use the -S DSA option, as in the example below? Say we generate new keys using non-default options, e.g.: ntp-keygen --password mypassword -c RSA-SHA -S RSA --modulus 1024 3. Should we use the same arguments when running ntp-keygen later to update the certificates/keys? Is ntp-keygen smart enough to generate new certificates of the same type as the existing ones without specifying the arguments? If not, the problem is that if the user runs ntp-keygen with no or different arguments, it may generate new certificates of a different type. I would appreciate your comments. Regards Joe
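A sketch of the two cases discussed above (command fragments assume a stock ntp-keygen; the password is a placeholder, and -p is needed only if the keys are encrypted):

```conf
# Initial generation: new RSA host key, sign key, and certificate.
# If the sign key equals the host key, this invalidates certificates
# signed with the old host key, so redistribute as needed.
ntp-keygen -H -p mypassword

# Later refresh: run with no other arguments; a new certificate of the
# same type and signature as the existing one is generated, reusing
# the existing keys.
ntp-keygen -p mypassword
```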
Re: [ntp:questions] POSIX leap seconds versus the current NTP behaviour
that if an adjustment rate of once every 10 seconds is all that is necessary to achieve this precision with this system's clock and this fine a time source, then when you have the same system clock but a much sloppier time reference source (e.g. time samples from the network) the adjustment rate justifiable by the achievable timekeeping accuracy is going to be significantly lower (say once every few hundred seconds, like the Allan intercept with a good NTP source). This is a good result, if it can be implemented this way, since being able to keep the clock as accurate as it can be with a rate of adjustment which is typically quite small has some side benefits with respect to the implementation of kernel timestamping of packets or other events, or of system-call-free user space time stamping. Dennis Ferguson On 5 May 2011, at 04:10 , David L. Mills wrote: Dennis, Holy timewarp! Are you the same Dennis Ferguson that wrote much of the original xntpd code three decades ago? If so, your original leapseconds code has changed considerably, as documented in the white paper at www.eecis.udel.edu/~mills/leap.html. It does not speak POSIX, only UTC. This applies to both the daemon and the kernel. Dave Dennis Ferguson wrote: Hello, A strict reading of the part of the POSIX standard defining seconds since the epoch would seem to require that when a leap second is added the clock should be stepped back at 00:00:01. That is, the second which should be replayed is the second whose seconds since the epoch representation is an even multiple of 86400. Right now the NTP implementation doesn't do that, it instead steps the clock back at 00:00:00 and replays the second which is one before the even multiple of 86400 in the seconds since the epoch representation, to match what seems to be required for the NTP timescale. For a new implementation of this is there any reason not to do the kernel timekeeping the way POSIX seems to want it? 
I thought I preferred the NTP handling since it seemed to keep the leap problem on the correct day (for an "all days have 86400 seconds" timescale, which describes both the NTP and the POSIX timescales), but I've since decided that might not be all that important, and I appreciate the symmetry of the POSIX approach (leaps forward occur at 23:59:59, leaps back at 00:00:01, and both leaps end up at 00:00:00) as well as the fact that the POSIX approach yields a simple equation to determine the conversion from time-of-day to seconds-since-the-epoch which is always valid, even across a leap (and even if the inverse conversion is ambiguous), while I'm having difficulty finding a similar description of NTP's behaviour. Dennis Ferguson
Re: [ntp:questions] Bug 1700 - Clock drifts excessively at pollinglevels above 256.
Edward, I don't know enough about the mechanism Windows uses to adjust the system clock. If it is some variant of the Unix adjtime(), the solution may be straightforward. The phase-lock loop parameters determine the risetime and overshoot of the discipline loop, in particular the loop gain and corner frequency. The Unix discipline loop is carefully designed for minimum risetime consistent with a controlled overshoot of about six percent. The loop is designed to preserve this characteristic over a wide time constant and poll interval range from 8 s to 36 hr, but with the FLL in use at the longer time constants. The most critical parameter is the loop gain, which depends primarily on the timer frequency. In most Unix systems this is 100 Hz, but in some systems it can be as high as 1000 Hz with a change in parameters. It would not be feasible here to summarize in detail how to establish these parameters; however, Chapter 4 of both the first and second editions of the book Network Time Synchronization: the NTP Protocol on Earth and in Space, CRC Press, has the mathematical basis. Here is a quick litmus test. With the client in normal operation with the poll interval at 6 (64 s) and the time and frequency settled down, introduce a 100-ms step in time. The discipline loop should converge to zero in about 3000 s and overshoot about six percent. If the response is far different from that, major surgery is required. Dave Mischanko, Edward T wrote: Your question is a very good one that I don't know the answer to. I have observed this behavior while actually watching NTP Time Server Monitor by Meinberg, live. I didn't anticipate any power saving features as being active. I have seen a correction in the behavior by changing the PLL and FLL gains, as noted. I haven't specifically looked for this problem in other ports, only Windows, and only on systems without a reference clock.
-Original Message- From: questions-bounces+edward.mischanko=arcelormittal@lists.ntp.org [mailto:questions-bounces+edward.mischanko=arcelormittal@lists.ntp.org] On Behalf Of unruh Sent: Saturday, April 23, 2011 1:20 PM To: questions@lists.ntp.org Subject: Re: [ntp:questions] Bug 1700 - Clock drifts excessively at polling levels above 256. On 2011-04-22, Mischanko, Edward T edward.mischa...@arcelormittal.com wrote: My system clock drifts excessively when polling above 256 in a Windows environment, as much as 5 ms or more. I have made changes to .../ntpd/ntp_loopfilter.c CLOCK_PLL, CLOCK_FLL, CLOCK_LIMIT, and CLOCK_PGATE to address this problem. I realize the changes I have made are global in nature and really only need changes in the Windows port. I would welcome a patch to the Windows port to accomplish these changes, or any other changes that accomplish the 1 ms stability I have now achieved. I hope Dr. Mills will have constructive comments on this problem and proposed solutions. Does your system CPU use power-saving CPU frequency changes? Is it a virtual Windows machine?
Re: [ntp:questions] NTPD can take 10 hours to achieve stability
C., It doesn't take ten hours; it takes five to ten minutes. See the online documentation release notes for recent NTP development versions at www.ntp.org. Dave C BlacK wrote: Why would it take ntpd ten hours to achieve its accuracy? Can this be explained in layman's terms and mathematically? "Absolutely normal! NTPD can sometimes need up to ten hours to achieve the accuracy it is capable of."
Re: [ntp:questions] Help getting IRIG working
Chris Co., The usual problem is overdriving the computer input. Most IRIG devices produce a modulated signal in the range of 10 V p-p, which is far larger than the line-in level. You might need an attenuator to produce on the order of 1 V p-p. As Chris says, the best way is to monitor the line-out signal using the computer speaker. With a little practice, it is possible to slowly increase the input level until the speaker changes tone or becomes raspy. The bottom line is to monitor the AGC signal with that trace and bracket the input signal so the AGC reads in the middle of the range, about 127. Dave Chris Albertson wrote: On Fri, Apr 1, 2011 at 10:57 AM, Jim Kusznir jkusz...@gmail.com wrote: Hello all: I'm trying to set up a linux ntp server using IRIG as a time source, from a SEL 2407 (http://www.selinc.com/sel-2407/). Unfortunately, I've not managed to get this running yet. Have you tried flag3 to enable audio monitoring? This should allow you to hear the IRIG signal on the computer's speakers. Hearing the signal would 100% verify that the signal is being input
Re: [ntp:questions] Venting steam: Autokey in 4.2.6/4.2.7
Steve, Whatever does or does not work with IFF applies also to GQ and MV. These have not been changed. However, from a purely practical view, IFF is probably best for typical Internet configurations. Dave Steve Kostecke wrote: On 2011-03-29, Dave Hart h...@ntp.org wrote: On Tue, Mar 29, 2011 at 12:53 AM, David L. Mills mi...@udel.edu wrote: I sent you a message requesting to test this before deployment. I was referring to docs galore as I thrashed about earlier. I don't doubt each of your changes was an improvement, but each one also made Steve's 4.2.4 step-by-step guide less useful. I was looking at: I've moved the legacy Autokey configuration to http://support.ntp.org/bin/view/Support/ConfiguringAutokeyFourTwoFour http://support.ntp.org/bin/view/Support/ConfiguringAutokey is being updated for the current Autokey configuration scheme. It currently only covers IFF and it does not address any of the ident/group name features. At the moment I have ntp-dev-4.2.7p142 Autokey+IFF running between psp-fb1 (trust group server) and psp-os1.
Here's the view from the client: ntpq rv 6 assID=29118 status=f63a reach, conf, auth, sel_sys.peer, 3 events, event_10, srcadr=psp-fb1.ntp.org, srcport=123, dstadr=2001:4f8:fff7:1::26, dstport=123, leap=00, stratum=2, precision=-20, rootdelay=0.626, rootdisp=16.495, refid=209.81.9.7, reftime=d13c56aa.cc4f74b3 Tue, Mar 29 2011 13:01:30.798, rec=d13c588e.76244c5b Tue, Mar 29 2011 13:09:34.461, reach=377, unreach=0, hmode=3, pmode=4, hpoll=6, ppoll=6, headway=176, flash=00 ok, keyid=2472358740, offset=-1.346, delay=0.194, dispersion=5.554, jitter=0.605, xleave=0.028, filtdelay= 0.28 0.25 0.34 0.29 0.25 0.26 0.19 0.22, filtoffset= -0.96 -0.85 -0.72 -0.69 -0.80 -0.97 -1.35 -0.39, filtdisp= 0.00 1.02 2.04 3.03 4.05 5.06 6.06 7.05, host=psp-fb1.ntp.org, flags=0x87f21, signature=md5WithRSAEncryption The flags decode as: #define CRYPTO_FLAG_ENAB 0x0001 /* crypto enable */ #define CRYPTO_FLAG_IFF 0x0020 /* IFF identity scheme */ #define CRYPTO_FLAG_VALID 0x0100 /* public key verified */ #define CRYPTO_FLAG_VRFY 0x0200 /* identity verified */ #define CRYPTO_FLAG_PROV 0x0400 /* signature verified */ #define CRYPTO_FLAG_AGREE 0x0800 /* cookie verified */ #define CRYPTO_FLAG_AUTO 0x1000 /* autokey verified */ #define CRYPTO_FLAG_SIGN 0x2000 /* certificate signed */ #define CRYPTO_FLAG_LEAP 0x4000 /* leapseconds table verified */ I also have Autokey+IFF running between a 4.2.7p142 (amd64) client and a 4.2.6p2 (i686) server on my home LAN. I appreciate Dave Hart's patience with me on IRC while getting this up and running.
Re: [ntp:questions] new driver development
Bruce, You have completely missed the point. The war to minimize the number of drivers has not always been successful, but does represent many hours of work on my part to update all the drivers when some minor detail of the common interface has changed over the last thirty years. Your reference to April Fool does not help when assessing your credibility. Dave Bruce Lilly wrote: On Mon, 28 Mar 2011 15:01:13 +0000, David L. Mills wrote: You may not be aware that all Spectracom devices are supported with one driver, all TrueTime devices are supported with one driver, all telephone modem services are supported with one driver, all Austron devices are supported with one driver, all Heath devices are supported with one driver and most GPS receivers are supported with one driver. I'm going to give the benefit of the doubt and presume that that's an early April Fool's joke: Spectracom devices involve not one, but multiple drivers: refclock_acts.c refclock_irig.c refclock_wwvb.c GPS receivers involve *many* drivers, including at least the following: refclock_acts.c refclock_arbiter.c refclock_as2201.c refclock_bancomm.c refclock_fg.c refclock_gpsvme.c refclock_hopfpci.c refclock_hopfser.c refclock_hpgps.c refclock_jupiter.c refclock_mx4200.c refclock_nmea.c refclock_oncore.c refclock_palisade.c refclock_parse.c refclock_ripencc.c refclock_trak.c refclock_true.c refclock_wwvb.c refclock_zyfer.c ...and your reference to all one Austron devices [sic] is contradictory as the supported Austron device *is* a GPS receiver. Note also in the above list of GPS drivers that there are two separate ones for Hopf devices as well as for Trimble GPS devices. The refclock_heath.c driver in fact supports only one of the two (long since discontinued) Heathkit receivers, as is well-documented in the source code, e.g.: * The GC-1001 II was apparently never tested and, based on a Coverity * scan, apparently never worked [Bug 689]. Related code has been disabled.
As for Truetime, the driver opens a serial port and parses the received text; it does not need to access different types of objects, different object namespaces, or different APIs -- it's really only one sort of device with relatively minor data stream inconsistencies. With somewhat greater accuracy, you might have said that there is one driver that supports all supported SVID IPC interfaces. This happened with many hours of dedicated effort on the part of refclock developers. You can appreciate the serious pushback in creating a new driver if a similar one is already available. One might then ask, as many of the above all merely grab data from a serial port, why they were not all required to be rolled into a single driver, such as refclock_parse.c. But then, a quick glance at that shows how convoluted things can get... One might also wonder why there are separate drivers for WWVB receivers, all using the same type of serial port communications, and all apparently minor variations derived from the first: refclock_wwvb.c refclock_chronolog.c refclock_dumbclock.c refclock_ulink.c ... or the ones for various IRIG time code receivers. In the case of what I have to date proposed, there are no similar drivers (I looked. Several times). There aren't any that address the issues outlined in the article which started this thread. There aren't any that use any form of POSIX IPC. There seems to be some confusion, probably on the part of those unfamiliar with the differences between SVID and POSIX shared memory: SVID shared memory and POSIX shared memory have as much in common as a Yamaha motorcycle and a Yamaha piano. Less in common than other types of IPC (e.g. semaphores), which are also quite different and in most cases also incompatible. An appropriate plan is [common interface code] #ifdef POSIX ... #else ... #endif That implies compiled-in support for one (exclusive-)or another type of device, i.e. no possibility to use both types.
Worse, it means one build will try to access one type of device given a particular refclock server configuration, and a build with a different set of (build) configure options will access a different type of device given the same (runtime) ntpd configuration file refclock server pseudo-IP address specification. In effect, it means that there are two distinct drivers (only one of which can be incorporated at build time) using a single device driver number. Are you really suggesting some sort of build-time configure option and conditional compilation macro that would replace (e.g.) the Heathkit driver with something completely different? If so, in some sense that might not be such a terrible idea; some products (and in some cases the companies that produced them) have long since vanished. Likewise for some technologies (e.g. LORAN). However, it invites confusion when bug reports etc. refer to a device number that might be different
Re: [ntp:questions] Venting steam: Autokey in 4.2.6/4.2.7
Dave, I didn't mean to cause Steve problems, but something did need to be changed, particularly the binding between the trusted host name and the group name. Besides fixing the vulnerability, it makes use of non-keygen certificates less of a bother. Also, this allows more than one secure group to share the same broadcast network. This is the third more-or-less trivial change in syntax in fifteen years (from Autokey Version 1). The -l option was added in order to change the certificate expiration time for test and to allow users to make long-lived certificates. Dave Dave Hart wrote: On Tue, Mar 29, 2011 at 12:53 AM, David L. Mills mi...@udel.edu wrote: I sent you a message requesting to test this before deployment. I was referring to docs galore as I thrashed about earlier. I don't doubt each of your changes was an improvement, but each one also made Steve's 4.2.4 step-by-step guide less useful. I was looking at: http://www.eecis.udel.edu/~mills/ntp/html/autokey.html http://www.eecis.udel.edu/~mills/ntp/html/keygen.html http://support.ntp.org/bin/view/Support/ConfiguringAutokey http://bugs.ntp.org/1864 BTW keygen.html mentions a -l days option which ntp-keygen doesn't understand; do you want me to fix the options processing so it does? Or get rid of that item from the docs? I'm not the dimmest bulb on the block, but when I was interested in reproducing the crash reported in bugs 1864 and 1840, I didn't manage to. And I spent several hours trying. The crash may be a bug I introduced in ntp_config generic FIFO code that replaced the degenerate use of priority queues as FIFOs in Sachim's original ntp.conf parser rewrite. I was focused on getting past the configuration issues to debug the configuration code, not on setting up a working Autokey.
That said, Steve has kindly dove in head first and is extracting me from my confusion one step at a time. I never forgot that you wanted me to test pool + autokey operation, I just feared and loathed the idea of setting up autokey again from scratch and have had other things to keep me busy. I'm optimistic Steve will be able to help me get a working setup to test pool + autokey and also to see if ntp_crypto.c:2984 really is unneeded. Cheers, Dave Hart
Re: [ntp:questions] Venting steam: Autokey in 4.2.6/4.2.7
Miroslav, Unfortunately, while things were in flux, snapshots continued to be produced, which was counterproductive. I have no direct say in that. The best advice is:

1. Produce a working version of the configuration without Autokey.

2. Roll keys for all group members using ntp-keygen with no options other than the -T option for the trusted hosts. Add the crypto command with no options to all configuration files. Add the autokey option to the server command for all clients of the trusted hosts. Verify the TC scheme works.

3. Make the group keys with the -I option on a trusted host or trusted agent.

4. Make the client keys from the group keys and distribute as in the original directions. Use an arbitrary file name, preferably the name of the group.

5. Add the ident option to the client server command with name the same as the client keys installed.

6. For broadcast clients, use the same files, but use the ident option in the crypto command instead.

All this is in the autokey.html page along with a detailed description of the operations. Note also the relevant white papers at the NTP project page www.eecis.udel.edu/~ntp.html, especially the security analysis and the simulation and analysis of the on-wire protocol. In contrast with the previous version, no options are required on the crypto command other than those cited above. Note that the -s option is not required on the ntp-keygen program. These options can be added for special circumstances. Dave Miroslav Lichvar wrote: On Mon, Mar 28, 2011 at 11:11:28PM +, Dave Hart wrote: Autokey is very clever in dealing with some unique challenges other PKI OpenSSL client code doesn't have to. Anyone attempting to configure it should be on payroll, if not time and a half. (insert series of profanities here) I had a similar feeling when I was expanding my NTP test suite to test basic Autokey functionality and compatibility between the 4.2.2, 4.2.4 and 4.2.6 versions.
I eventually got most of it working, but I'm not sure if it's working as intended or accidentally by misplacing a private key, etc. I wasn't able to get the MV scheme working, though. I have read the official ntp-keygen page and the wiki document.
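Dave's steps 1-2 and 5 above might look like the following; the host and group names are placeholders, and the exact option spellings should be checked against the autokey.html page for the version in use:

```conf
# On each trusted host (step 2):
ntp-keygen -T

# On every other group member (step 2):
ntp-keygen

# ntp.conf on clients of a trusted host (steps 2 and 5):
crypto
server th1.example.org autokey ident mygroup

# On the trusted host or trusted agent, make the group keys (step 3):
ntp-keygen -T -I
```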
Re: [ntp:questions] new driver development
Bruce Co., You may not be aware that all Spectracom devices are supported with one driver, all TrueTime devices are supported with one driver, all telephone modem services are supported with one driver, all Austron devices are supported with one driver, all Heath devices are supported with one driver and most GPS receivers are supported with one driver. This happened with many hours of dedicated effort on the part of refclock developers. You can appreciate the serious pushback in creating a new driver if a similar one is already available. An appropriate plan is [common interface code] #ifdef POSIX ... #else ... #endif Dave Bruce Lilly wrote: On Fri, 18 Mar 2011 04:51:40 +, Harlan Stenn wrote: I don't see this one. If flag1 0 (the current default) means SVID, and we decide that flag1 1 means POSIX, what is the issue? How is that significantly different from changing 127.127.28.x to 127.127.y.x ? Among others, 1. The following is workable: server 127.127.28.1 ... server 127.127.y.1 ... Your proposal in this respect, viz.: server 127.127.28.1 ... fudge 127.127.28.1 flag1 0 ... server 127.127.28.1 ... fudge 127.127.28.1 flag1 1 ... simply won't work. IOW, one can have 4 units ea. using different drivers, but one cannot have multiple devices sharing the same driver and unit numbers but differing flags (or ttl, etc.) 2. With separate drivers, each can perform appropriate initialization via the clock_init function pointer in its struct refclock structure. One cannot alter the way that works based on flag or ttl values as neither are accessible; the prototype is: void (*clock_init) (void); I.e. no pointer to a peer structure. They are separate issues. We support timespec where it exists. We want to support timespec under SHM regardless. If I thought that was feasible, I would have done it and submitted patches a year ago. 
Re: [ntp:questions] Venting steam: Autokey in 4.2.6/4.2.7
Dave, When all else fails, read the documentation. There were good reasons to change the configuration in minor ways. 1. There was a huge vulnerability if the identity file was specified by the server, but the correct file was not specified by the client. The scheme devolved to TC with no warning to the user. 2. Multiple secure groups (including anycast and pool) sharing the same broadcast network are supported. The primary intent is to provide an engineered selection of pool servers from the same DNS collection. 3. Configuration is much simpler and for the TC identity scheme requires no arguments on the ntp-keygen program or crypto configuration command. 4. Configuration for prior versions is possible; see the documentation. I sent you a message requesting to test this before deployment. Dave Dave Hart wrote: http://support.ntp.org/bin/view/Support/ConfiguringAutokey For ntpd 4.2.4 and earlier, Steve Kostecke patiently worked out step-by-step instructions, and refined them over time helping people to use them, as seen on the page referenced above. For 4.2.6, ntp-keygen and autokey got an overhaul which makes those instructions useless. To investigate http://bugs.ntp.org/1840 and http://bugs.ntp.org/1864 filed by Rich Schmidt about ntpd 4.2.7 crashing when attempting to use Autokey, and to test a change to remove a presumed unneeded line of code (ntp_crypto.c:2984) identified through static analysis, I once again have tried to get a basic Autokey setup working. So far I have spent hours and achieved nothing but failure and humiliation. This is with Rich holding my hand telling me what to do. I'm so pissed off I want a baseball bat and an effigy. Now, granted, I'm not scratching an itch to secure my NTP, I'm scratching an itch to reproduce a fault and fix it, so I'm not typical, but if I were trying to secure my NTP, I'd use symmetric key. Autokey is very clever in dealing with some unique challenges other PKI OpenSSL client code doesn't have to.
Anyone attempting to configure it should be on payroll, if not time and a half. (insert series of profanities here) Dave Hart ___ questions mailing list questions@lists.ntp.org http://lists.ntp.org/listinfo/questions
Re: [ntp:questions] Secure NTP
Yessica, In principle, NTP Autokey can use certificates generated by OpenSSL or by other certificate authorities (CAs); however, there are some very minor details with these certificates, including the sequence number and the use of the X.509 extension fields. Ideally, the CA would run the Autokey protocol and serve as the TH itself, which would be consistent with the TC model. Absent that, the choice is to use the certificates generated by the ntp-keygen program. Yessica wrote: Hello! I am installing an NTP server, but it requires authentication so that clients can synchronize with the server, and that authentication should use public and private keys. Let me know if I can work with certificates issued by any authority or can only use the certificates generated by ntp-keygen. Thank you very much! I hope you can answer. PS: I'm working with ntp v4 ___ questions mailing list questions@lists.ntp.org http://lists.ntp.org/listinfo/questions
Re: [ntp:questions] new driver development
Bruce, I take it your driver will replace or modify an existing driver, right? Adding a new driver to the current population of over 40 drivers is not a practical course. Dave

Bruce Lilly wrote: I'm preparing a POSIX shared memory driver (PSHM) for ntp to address a few issues that exist with the present SHM driver. In no particular order, these are:

o POSIX (not SVID) shared memory
  -- POSIX shared memory namespace rather than hexadecimal constant -- avoids 0x4e545030 [...] big unit numbers will give non-ASCII
  -- provides ample namespace size for ridiculously huge numbers of units w/o obfuscation
o nanoseconds, not microseconds
  -- resolution compatible with bulk of the ntp reference implementation -- using POSIX struct timespec
  -- client compatibility with POSIX clock_gettime()
o per-unit configurable source type (a.k.a. class)
  -- unlike present SHM driver, which treats all units as identical and unconfigurable -- currently UHF; wrong for everything else -- was TELEPHONE; wrong for everything else
o per-unit PPS flag
  -- permits shared memory PPS drivers
o POSIX-conforming code
  -- no attempt to work around buggy (non-POSIX-conforming) systems!
o separate header file to simplify client code
  -- shared memory structure clearly defined and well-documented
o source code/header version strings for use by what(1) or ident(1)
  -- for facilitation of bug reporting, version verification, daemon-client compatibility checks
o client/driver run-time implementation/compatibility tests
  -- integer and pointer sizes -- endianness
o no dead code
  -- e.g. SHM driver nsamples-related cruft
o provision in shared memory to specify (variable part of) clockstats output
  -- client can control clockstats format and content (and frequency of logging) -- can be different for each unit
o POSIX mutex for synchronized access to shared memory for updates
  -- obviates mode 0 / mode 1 / OLDWAY

Comments and suggestions are welcome. 
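To make the producer/consumer idea concrete, here is a minimal sketch of a timestamp record in POSIX-style shared memory. This is purely illustrative: PSHM above is a proposal, and this record layout (magic word, leap/count fields, clock and receive timestamps as struct timespec seconds/nanoseconds) is a hypothetical approximation, not Bruce Lilly's actual structure; the mutex he proposes for update synchronization is also omitted for brevity.

```python
import struct
from multiprocessing import shared_memory

# Hypothetical record: magic, leap, count, then clock and receive
# timestamps as POSIX struct timespec (seconds, nanoseconds), native-endian.
LAYOUT = struct.Struct("=I i i q q q q")
MAGIC = 0x4E545030  # the hexadecimal constant the present SHM driver is keyed on

def write_sample(seg, clock_ts, recv_ts, leap=0, count=0):
    """Producer (refclock) side: publish one (clock, receive) timestamp pair."""
    LAYOUT.pack_into(seg.buf, 0, MAGIC, leap, count,
                     clock_ts[0], clock_ts[1], recv_ts[0], recv_ts[1])

def read_sample(seg):
    """Consumer (ntpd) side: unpack the most recent sample."""
    magic, leap, count, cs, cns, rs, rns = LAYOUT.unpack_from(seg.buf, 0)
    assert magic == MAGIC, "not a PSHM-style segment"
    return leap, count, (cs, cns), (rs, rns)

seg = shared_memory.SharedMemory(create=True, size=LAYOUT.size)
write_sample(seg, (1700000000, 123456789), (1700000000, 123460000))
print(read_sample(seg))
seg.close()
seg.unlink()
```

Note how struct timespec preserves nanosecond resolution, one of the points in the list above; the microsecond-based SHM layout cannot.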
___ questions mailing list questions@lists.ntp.org http://lists.ntp.org/listinfo/questions
Re: [ntp:questions] Is a Spectracom Netclock/2 worth saving?
Rick, All the Spectracom WWVB and GPS receivers use the same serial protocol; I have had one or more of these things running for almost 30 years. The Netclock/2 is useless without its ferrite-stick loop antenna, and even then the rising noise pollution due to the noisy electrical grid and machine-room UPS systems has rendered any WWVB receiver essentially useless. An ironic fact is that my WWV receivers now outperform the WWVB receivers, and both are much inferior to a GPS receiver. Considering the sometimes difficult problem of finding rooftop real estate for a GPS antenna, a CDMA receiver, such as the EndRun CNTP, might now be the easiest to deploy. Dave Rick Jones wrote: So, in a local pile on its way to its final reward I have come across a Spectracom Netclock/2 and am wondering whether it might be worth saving from the scrap heap. It does power up and shows a time. Unsurprisingly, I suppose, since it has no antenna connected and I'm in an office building, the Antenna, Signal and Time Sync LEDs light red :) rick jones ___ questions mailing list questions@lists.ntp.org http://lists.ntp.org/listinfo/questions
Re: [ntp:questions] AutoKey again
Jacek, An index to the cryptic error comment is in ./include/ntp_crypto.h. It says bad or missing group key. This message is from the client; you should see a similar message at the server. Check to be sure you are using the correct client parameters file. Recent changes to the configuration process make it much simpler to deploy a secure subnet. This doesn't change the protocol, just the commands to set it up. See the development documentation on the web and the Autokey Public Key Cryptography page. Dave Jacek Igalson wrote: Hello, Some time ago I reported a bug in the implementation of Autokey+IFF, in ntp ver 4.2.4p8. The error is intermittent and has been observed in long runs of ntpd, that is, within 2-10 days. When the error happens, ntpd keeps on running but the authenticated server is rejected:

ntpq -p
     remote      refid  st t when poll reach  delay offset jitter
 neptune         .CRYP. 16 u   6d   16     0  0.000  0.000  0.000
*ntp2.tp.pl      .ATOM.  1 u   15   64   377  2.522  0.008  0.088

ntpq -c associations
ind assID status conf reach auth condition last_event cnt
  1 60684   e0fe  yes   yes   ok    reject             15
  2 60685   9614  yes   yes none  sys.peer  reachable   1

The client synchronizes successfully to the other server which is in the configuration file. The server with authentication is not used any more; the reject status seems to be permanent (unless ntpd is restarted). The only hint is in the cryptostats logfile: ...ntpkey_IFFkey_xxx.tpnet.pl.3479706582 mod 384 ...error 10e opcode 8207 ts 3505303563 fs 3479706582 What is the meaning of error 10e opcode? Has someone encountered such a problem in longer runs? I appreciate your help. Jacek ___ questions mailing list questions@lists.ntp.org http://lists.ntp.org/listinfo/questions
Re: [ntp:questions] Number of servers needed to detect one falseticker?
Terje, That's why Autokey uses digital signatures and zero-knowledge identity proofs. Dave Terje Mathisen wrote: David L. Mills wrote: Miroslav, Nowhere in the documentation produced by me is the statement that the minimum number of servers to reliably find the truechimers is four. There might have been some confusion in the past, in particular with reference to Lamport's paper, which describes an algorithm much more complicated and unsuitable for practical use. In that paper, four Byzantine generals are necessary to detect a traitor, but only three if digital signatures are available. The NTP algorithm, derived in part from Keith Marzullo's dissertation, is not that algorithm. I.e. Byzantine generals not only lie, they also lie about _who_ they are, spoofing messages from other generals. In NTP this would mean a falseticker which also sends out packets pretending to be responses from other servers, something which is effectively impossible unless they are based on the same (broadcast) network and can sniff incoming requests and/or poison the ARP tables to commandeer the other server's IP address. Your digital signatures make such lies impossible. The NTP algorithm is described on the page you cite. A constructive proof, elaborated in my book, is simple and based on the intersection properties of correctness intervals, which are loosely defined as the interval equal to the roundtrip delay with the center point as the maximum likelihood estimate of the server offset. If there are two servers and their correctness intervals overlap, both are truechimers. If the intervals do not overlap, no decision is possible. If there are three servers and the intersection of two intervals is nonempty, both are truechimers and the third is a falseticker. If no two intervals intersect, no decision is possible. So, it is incomplete to specify a minimum number of servers. 
The only valid statement is on the page: The intersection interval is the smallest interval containing points from the largest number of correctness intervals. If the intersection interval contains more than half the total number of servers, those servers are truechimers and the others are falsetickers. I think Miroslav showed an ASCII art example for when three servers might not be enough: two servers which don't overlap, and a third which overlaps (partly) both of them: server A and B --- server C In this particular situation C must be a survivor, but since it overlaps both A and B by an identical amount, there is no way to determine if (A^C) or (B^C) is the best interval to pick. I guess the key here is that this situation is impossible unless at least one of the servers is lying (falseticker). You could even extend this to four servers, where server D is identical to server C, and it would be equally hard to determine if A or B was the falseticker, right? Fortunately, NTP timestamps have enough resolution to make the likelihood of multiple perfectly positioned confidence intervals extremely unlikely, and if it does happen in a particular poll cycle, then ntpd will happily coast on until the next poll. :-) Terje ___ questions mailing list questions@lists.ntp.org http://lists.ntp.org/listinfo/questions
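The intersection rule Mills states above is easy to sketch in code. The following is a simplified, brute-force illustration of the interval-intersection step only, not the actual ntpd select algorithm; the function and variable names are invented for the example.

```python
def truechimers(intervals):
    """Find the intersection interval (smallest interval containing points
    from the largest number of correctness intervals) and return it with the
    intervals declared truechimers.  Brute force over interval endpoints;
    fine for a handful of servers."""
    points = sorted({p for iv in intervals for p in iv})

    def cover(p):
        # number of correctness intervals containing point p
        return sum(1 for lo, hi in intervals if lo <= p <= hi)

    best = max(cover(p) for p in points)
    covered = [p for p in points if cover(p) == best]
    lo, hi = min(covered), max(covered)   # intersection interval (hull)
    if best <= len(intervals) // 2:       # no majority: no decision
        return (lo, hi), []
    # truechimers: intervals containing at least one point of [lo, hi]
    return (lo, hi), [iv for iv in intervals if iv[0] <= hi and iv[1] >= lo]

# Terje's example: A and B disjoint, C overlapping both equally.
A, B, C = (0, 2), (4, 6), (1, 5)
print(truechimers([A, B, C]))   # intersection interval is C; all three pass
```

On Terje's scenario this sketch keeps all three servers, which matches the behavior described elsewhere in the thread for recent ntp-dev versions (intersection interval equal to C, all sources passing).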
Re: [ntp:questions] Number of servers needed to detect one falseticker?
Miroslav, The select algorithm was changed in a very minor way to conform precisely to the formal assertion quoted in my previous message. It probably has very little practical significance. After all, the old algorithm has been going strong for nineteen years. Dave Miroslav Lichvar wrote: On Wed, Jan 05, 2011 at 09:23:59AM +0100, Terje Mathisen wrote: Two servers which don't overlap, and a third which overlaps (partly) both of them: server A and B --- server C In this particular situation C must be a survivor, but since it overlaps both A and B by an identical amount, there is no way to determine if (A^C) or (B^C) is the best interval to pick. The select algorithm doesn't care how much they overlap. Recent ntp-dev versions work as described on the select.html web page, so the intersection interval will be equal to C and all three sources will pass. Older versions worked also with the centers of the intervals, and as the centers of A and B lie outside the intersection interval, C would be the only truechimer. I'd be curious to hear why that approach was dropped. ___ questions mailing list questions@lists.ntp.org http://lists.ntp.org/listinfo/questions
Re: [ntp:questions] Number of servers needed to detect one falseticker?
Terje, Read the formal assertion carefully and examine the algorithm on the Select Algorithm page. The algorithm would return interval C as the smallest intersection with the largest number of contributors. Dave Terje Mathisen wrote: David L. Mills wrote: Miroslav, Nowhere in the documentation produced by me is the statement that the minimum number of servers to reliably find the truechimers is four. There might have been some confusion in the past, in particular with reference to Lamport's paper, which describes an algorithm much more complicated and unsuitable for practical use. In that paper, four Byzantine generals are necessary to detect a traitor, but only three if digital signatures are available. The NTP algorithm, derived in part from Keith Marzullo's dissertation, is not that algorithm. I.e. Byzantine generals not only lie, they also lie about _who_ they are, spoofing messages from other generals. In NTP this would mean a falseticker which also sends out packets pretending to be responses from other servers, something which is effectively impossible unless they are based on the same (broadcast) network and can sniff incoming requests and/or poison the ARP tables to commandeer the other server's IP address. Your digital signatures make such lies impossible. The NTP algorithm is described on the page you cite. A constructive proof, elaborated in my book, is simple and based on the intersection properties of correctness intervals, which are loosely defined as the interval equal to the roundtrip delay with the center point as the maximum likelihood estimate of the server offset. If there are two servers and their correctness intervals overlap, both are truechimers. If the intervals do not overlap, no decision is possible. If there are three servers and the intersection of two intervals is nonempty, both are truechimers and the third is a falseticker. If no two intervals intersect, no decision is possible. 
So, it is incomplete to specify a minimum number of servers. The only valid statement is on the page: The intersection interval is the smallest interval containing points from the largest number of correctness intervals. If the intersection interval contains more than half the total number of servers, those servers are truechimers and the others are falsetickers. I think Miroslav showed an ASCII art example for when three servers might not be enough: two servers which don't overlap, and a third which overlaps (partly) both of them: server A and B --- server C In this particular situation C must be a survivor, but since it overlaps both A and B by an identical amount, there is no way to determine if (A^C) or (B^C) is the best interval to pick. I guess the key here is that this situation is impossible unless at least one of the servers is lying (falseticker). You could even extend this to four servers, where server D is identical to server C, and it would be equally hard to determine if A or B was the falseticker, right? Fortunately, NTP timestamps have enough resolution to make the likelihood of multiple perfectly positioned confidence intervals extremely unlikely, and if it does happen in a particular poll cycle, then ntpd will happily coast on until the next poll. :-) Terje ___ questions mailing list questions@lists.ntp.org http://lists.ntp.org/listinfo/questions
Re: [ntp:questions] Number of servers needed to detect one falseticker?
Miroslav, According to your diagram, the algorithm would determine the intersection interval as interval a. The midpoints of all three intervals would be considered truechimers, since each of the intervals a, b and c contains points in the intersection interval. Dave Miroslav Lichvar wrote: On Wed, Jan 05, 2011 at 10:31:15AM -0500, Brian Utterback wrote: Let's equalize a bit to make it a bit more fair: c b- a-- So, now, if you were NTP, which would you choose? You are correct in your assessment that NTP would accept them all as truechimers. You are correct also that adding a fourth still does not guarantee that you will throw out the falseticker, but NTP uses intervals at this stage, not actual servers, so adding another truechimer will guarantee that the interval used will contain the real time. Not necessarily. | - A | - B C --- D | == X Here, B is the only server off, but the result X doesn't contain the actual time. I think clockhopping can happen with any number of servers; there just need to be two or more similar sources on top of the list sorted by synchronization distance. With more servers on the list, the clustering and combining algorithms will merge them into a single offset and they will not hop. With two servers, these algorithms cannot function. Combining doesn't affect clockhopping; it happens after the system peer is selected. By the way, over time Dr. Mills has added features to try to suppress clock hopping as much as possible without compromising the correctness proofs. With the latest versions, clock hopping may not be so much of a problem. But it is still an issue. Even if you prefer one clock, it might be inaccessible for a while and you will hop anyway. Yes, the maximum anti-clockhopping threshold is a fixed value (1 ms by default), so it can't work well in all situations. But it can be tuned with the tos mindist command. ___ questions mailing list questions@lists.ntp.org http://lists.ntp.org/listinfo/questions
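For reference, the mindist tuning Miroslav mentions is an ntp.conf command; the value below is only an illustration, not a recommendation:

```
# /etc/ntp.conf -- example value only
# Widen the anti-clockhop threshold (1 ms by default, per the discussion
# above) so two closely matched servers do not trade places as system peer.
tos mindist 0.005
```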
Re: [ntp:questions] Number of servers needed to detect one falseticker?
Miroslav, Nowhere in the documentation produced by me is the statement that the minimum number of servers to reliably find the truechimers is four. There might have been some confusion in the past, in particular with reference to Lamport's paper, which describes an algorithm much more complicated and unsuitable for practical use. In that paper, four Byzantine generals are necessary to detect a traitor, but only three if digital signatures are available. The NTP algorithm, derived in part from Keith Marzullo's dissertation, is not that algorithm. The NTP algorithm is described on the page you cite. A constructive proof, elaborated in my book, is simple and based on the intersection properties of correctness intervals, which are loosely defined as the interval equal to the roundtrip delay with the center point as the maximum likelihood estimate of the server offset. If there are two servers and their correctness intervals overlap, both are truechimers. If the intervals do not overlap, no decision is possible. If there are three servers and the intersection of two intervals is nonempty, both are truechimers and the third is a falseticker. If no two intervals intersect, no decision is possible. So, it is incomplete to specify a minimum number of servers. The only valid statement is on the page: The intersection interval is the smallest interval containing points from the largest number of correctness intervals. If the intersection interval contains more than half the total number of servers, those servers are truechimers and the others are falsetickers. Dave Miroslav Lichvar wrote: Hi, I'm wondering about section 5.3.3 on the ntp support web http://support.ntp.org/bin/view/Support/SelectingOffsiteNTPServers#Section_5.3.3. It says and explains that the minimum number of servers to detect one falseticker is four; is that really correct? 
I understand that four is better for reliability, but from the algorithm description (http://www.eecis.udel.edu/~mills/ntp/html/select.html) and my tests with a simulated falseticker it seems that three is enough. Also, while running with two servers might be the worst configuration for ntpd, it still could be preferred over the configuration with only one server by users who would rather have two sources marked as falsetickers and know a problem needs to be fixed than unknowingly follow a bad truechimer. Is it possible to reword that section? Thanks, ___ questions mailing list questions@lists.ntp.org http://lists.ntp.org/listinfo/questions
Re: [ntp:questions] Number of servers needed to detect one falseticker?
David, As you might see from the online documentation, much of the tutorial material has been largely rewritten. A while back, some kind soul pointed out a logical discrepancy in the select algorithm. That was repaired, the code updated and the documentation refreshed. The pages linked from How NTP Works are offered as a definitive tutorial that might clear the air on these issues. Dave David Woolley wrote: Miroslav Lichvar wrote: On Tue, Jan 04, 2011 at 02:35:13PM -0500, Richard B. Gilbert wrote: The problem with using only two servers is that NTPD has no means of determining which is more nearly correct when the two differ, as they inevitably will! ntpd will pick the one with smaller distance if their intervals overlap. Otherwise they both will be falsetickers. In this case, ntpd will use an average of both of them, when the confidence intervals overlap; it will not pick just one except for the purposes of providing downstream error statistics. ___ questions mailing list questions@lists.ntp.org http://lists.ntp.org/listinfo/questions
Re: [ntp:questions] synchronization distance
David, I'm not making myself absolutely crystal clear and you are obscuring the point. Windows has an awesome protocol that sets the time. It happens to use the NTP packet header format, but is not otherwise compliant with the NTPv4 specification, especially the 36-h poll interval limitation, which is an engineering parameter based on the expected wander of a commodity crystal oscillator. All that doesn't matter at all, other than Windows servers are compatible with Windows clients. What does matter is that Windows servers are NOT compatible with NTPv4 clients, which SHOULD NOT BE USED. Use one of the SNTP variants instead. As a diehard workaround, use the tos maxdist command to set the distance threshold to something really high, like ten seconds. There is nothing whatsoever to be gained by this, as the expected error with update intervals of a week will be as bad or worse than with SNTP. Dave David Woolley wrote: David L. Mills wrote: BlackList, I say again with considerable emphasis: this is a Microsoft product, not the NTPv4 distribution that leaves here. What you see is what you get, But it is often the NTPv4 reference version that is used as the client and fails to synchronize because the root dispersion is too high. Corporate politics are such that it is difficult to get a Unix system, or even Windows running the reference version, near the root of the time distribution tree. People deeper in the tree then see the effects, even if they are using the reference implementation. warts and all. I doubt it has anything to do with root distance, or any other public specification, but that doesn't make any difference if the customer is satisfied with the performance. Just don't compare it with anything in the NTP distribution, documentation or specification. Dave E-Mail Sent to this address will be added to the BlackLists wrote: David L. Mills wrote: I had no idea somebody would try to configure current NTPv4 with a poll interval of a week. 
The current maximum allowed is 36 h. http://technet.microsoft.com/en-us/library/cc773263%28WS.10%29.aspx "SpecialPollInterval This entry specifies the special poll interval in seconds for manual peers. ... The default value on stand-alone clients and servers is 604,800." {7 days} ___ questions mailing list questions@lists.ntp.org http://lists.ntp.org/listinfo/questions
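Mills's "diehard workaround" above amounts to a single ntp.conf command; ten seconds is his example figure, with the accuracy caveat he states:

```
# /etc/ntp.conf -- example value only
# Raise the distance threshold so a source polled once a week is not
# rejected for excessive root distance; accuracy suffers accordingly.
tos maxdist 10
```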
Re: [ntp:questions] synchronization distance
David, I'm not learning anything at all in our exchange, and that is a real disappointment. Apparently, there are complaints that NTPv4 does not play nicely with Microsoft. Microsoft is not about to change. NTPv4 is not about to change; however, there is a minor configuration option that makes NTPv4 work almost as well or maybe worse than SNTP. Whether this is good or bad corporate practice is not based on sound engineering principles, but on corporate convenience. I see absolutely no need to care about that or especially to prolong this discussion. Dave David Woolley wrote: David L. Mills wrote: David, I'm not making myself absolutely crystal clear and you are obscuring the point. Windows has an awesome protocol that sets the time. It happens to use the NTP packet header format, but is not otherwise compliant with the NTPv4 specification, especially the 36-h poll interval limitation, which is an engineering parameter based on the expected wander of a commodity crystal oscillator. All that doesn't matter at all, other than Windows servers are compatible with Windows clients. What does matter is that Windows servers are NOT compatible with NTPv4 clients, which SHOULD NOT BE USED. Use one of the SNTP variants instead. To a large extent I would agree with you, but the net effect of this is to say if you work for a marketing-led company (probably true of most of the Fortune 500), do not use NTP, as it is almost certain that your IT department has a strict Microsoft policy for their core systems, and they are not time synchronisation experts. As a diehard workaround, use the tos maxdist command to set the distance threshold to something really high, like ten seconds. There is nothing whatsoever to be gained by this, as the expected error with update intervals of a week will be as bad or worse than with SNTP. Dave David Woolley wrote: David L. Mills wrote: BlackList, I say again with considerable emphasis: this is a Microsoft product, not the NTPv4 distribution that leaves here. 
What you see is what you get, But it is often the NTPv4 reference version that is used as the client and fails to synchronize because the root dispersion is too high. Corporate politics are such that it is difficult to get a Unix system, or even Windows running the reference version, near the root of the time distribution tree. People deeper in the tree then see the effects, even if they are using the reference implementation. warts and all. I doubt it has anything to do with root distance, or any other public specification, but that doesn't make any difference if the customer is satisfied with the performance. Just don't compare it with anything in the NTP distribution, documentation or specification. Dave E-Mail Sent to this address will be added to the BlackLists wrote: David L. Mills wrote: I had no idea somebody would try to configure current NTPv4 with a poll interval of a week. The current maximum allowed is 36 h. http://technet.microsoft.com/en-us/library/cc773263%28WS.10%29.aspx "SpecialPollInterval This entry specifies the special poll interval in seconds for manual peers. ... The default value on stand-alone clients and servers is 604,800." {7 days} ___ questions mailing list questions@lists.ntp.org http://lists.ntp.org/listinfo/questions
Re: [ntp:questions] synchronization distance
David, I'm confused about your explanation, especially the default configuration. The definitive explanation of synchronization distance, more precisely root distance, is on the page How NTP Works in the online documentation at ntp.org. Dave David Woolley wrote: Atul Gupta wrote: Can anyone explain to me what synchronization distance is in the case of ntp? And what is the distance exceeded problem? It's an estimate of the maximum difference between the time on the stratum zero source and the time measured by the client, consisting of components for round trip time, system precision and clock drift since the last actual read of the stratum source. It most commonly happens because people try to use default configurations of w32time as their source. The default configuration has exceptionally long poll times, and doesn't adjust its stratum to indicate that it hasn't had an update in days. (The default configuration is also far from full NTP compliance.) ___ questions mailing list questions@lists.ntp.org http://lists.ntp.org/listinfo/questions
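The components David Woolley lists combine into the root distance statistic. The following is a rough sketch of the computation along the lines of the RFC 5905 definition; the constants and sample figures are illustrative, not exact ntpd internals. It shows why a w32time-style source polled every 7 days blows past the default 1.5 s distance threshold.

```python
PHI = 15e-6       # assumed frequency tolerance, s/s (RFC 5905 figure)
MINDISP = 0.005   # assumed minimum dispersion increment, s

def root_distance(rootdelay, delay, rootdisp, disp, jitter, age):
    """Estimate of the maximum error relative to the stratum-0 source:
    half the accumulated round-trip delay, plus all dispersion terms,
    plus oscillator drift accrued over 'age' seconds, plus jitter."""
    return (max(MINDISP, rootdelay + delay) / 2
            + rootdisp + disp + PHI * age + jitter)

# Freshly updated source: well under the default 1.5 s threshold.
print(root_distance(0.0, 0.030, 0.001, 0.001, 0.002, age=60))
# Source not updated for a week: drift alone contributes about 9 s.
print(root_distance(0.0, 0.030, 0.001, 0.001, 0.002, age=7 * 86400))
```

The second case is the "distance exceeded" situation Atul asks about: the dispersion grows with time since the last update, so a week-old sample is rejected no matter how accurate the server's clock is.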
Re: [ntp:questions] synchronization distance
BlackList, I say again with considerable emphasis: this is a Microsoft product, not the NTPv4 distribution that leaves here. What you see is what you get, warts and all. I doubt it has anything to do with root distance, or any other public specification, but that doesn't make any difference if the customer is satisfied with the performance. Just don't compare it with anything in the NTP distribution, documentation or specification. Dave E-Mail Sent to this address will be added to the BlackLists wrote: David L. Mills wrote: I had no idea somebody would try to configure current NTPv4 with a poll interval of a week. The current maximum allowed is 36 h. http://technet.microsoft.com/en-us/library/cc773263%28WS.10%29.aspx "SpecialPollInterval This entry specifies the special poll interval in seconds for manual peers. ... The default value on stand-alone clients and servers is 604,800." {7 days} ___ questions mailing list questions@lists.ntp.org http://lists.ntp.org/listinfo/questions
Re: [ntp:questions] help needed for ntpd ipv6 setup
horhe, I can't speak for the versions used by other repackagers, but the current ntp-dev version interprets a nonzero broadcastdelay option as defeating the calibration volley for all broadcast and multicast clients. This is why it replaced the novolley option of the broadcastclient command. If this turns out not to be the case, a bug report is suggested. Dave horhe wrote: On 2010-11-29 21:37, Marc-Andre Alpers wrote: Hello! Has nobody a solution or idea what is wrong with my server? Hello, I think this is the same bug: http://bugs.gentoo.org/326209 . I have got a very similar problem. Regards ___ questions mailing list questions@lists.ntp.org http://lists.ntp.org/listinfo/questions
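As a concrete illustration of the behavior described above, a broadcast client configured with a fixed delay; the delay value is only an example:

```
# /etc/ntp.conf on the broadcast client -- example value only
broadcastclient
# A nonzero fixed delay replaces the initial calibration volley.
broadcastdelay 0.004
```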
Re: [ntp:questions] Newbie question on the MD5 key of a public/remote NTP server.
Harry, As I said, NTP Autokey is designed to operate outside the NAT perimeter. In principle, although I don't recommend it, it is possible to use symmetric key cryptography transparently with a NAT box. The policies on assignment and distribution of keys depend on the agency. NIST has an experimental MD5 server with the expectation that you pay a service fee for the key. I am told NRC (Canada) either plans or has in operation a similar service. Dave Harry wrote: Hello, I'm quite new to the NTP world. I haven't had a chance to study and understand the NTP trust model fully. But what I /have/ understood so far is... 1. that MD5 symmetric keys can be used to authenticate a public/remote NTP server 2. that this public/remote, MD5-talking NTP server can reach out to NTP clients behind a NAT/firewall (which the Autokey protocol cannot) 3. that the MD5 symmetric keys must be distributed securely somehow to the NTP client. What I haven't been able to figure out is... 1. How/where to locate a public/remote NTP server that supports MD5 authentication? 2. How would the administrator of this NTP server (a human) distribute the keys to me: via email? Via phone/fax? 3. Having received the keys even by secure means such as email/phone/fax, what is stopping me from going rogue later... say, by using the key values of the authentic server and distributing wrong time? (I won't of course actually go rogue, just trying to understand.) Can somebody please explain this in plain English? Regards, /HS ___ questions mailing list questions@lists.ntp.org http://lists.ntp.org/listinfo/questions
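For orientation, a symmetric-key setup of the kind Harry asks about looks roughly like this; the key ID, key value, and server name are placeholders, and the keys file must itself be distributed over a secure out-of-band channel (the email/phone/fax problem Harry raises):

```
# /etc/ntp.keys -- keep secret, mode 600
# keyid  type  key
1        MD5   2late4Me

# /etc/ntp.conf
keys /etc/ntp.keys
trustedkey 1
server ntp.example.com key 1
```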
Re: [ntp:questions] Will AutoKey setup work on a NAT host behind a firewall?
Harry, Symmetric key cryptography works fine behind a NAT box. See the Authentication Support page in the official NTP documentation on ntp.org. As I said, the intended Autokey model is for the server and client to live on the Internet side of the NAT box and have it serve time to the internal network via a separate interface. Dave Harry wrote: On Nov 10, 2:59 am, David L. Mills mi...@udel.edu wrote: Harry, Autokey is not designed to work behind NAT boxes. The Autokey server and client must have the same (reversed) IP addresses. The intended model is using two interfaces, one for the Internet side running Autokey, the other for the inside net on the other side of the NAT box. Dave Harry wrote: Hello, I want to employ the AutoKey method of securing NTP. Basically, I want one host that would act as an NTP client of an external NTP server, talking AutoKey. This NTP client is to become the NTP server for other hosts on the intranet. All these hosts are behind a corporate firewall and are very likely using NAT / IP masquerading as well. (I can tell NAT / IP masquerading is in use in our environment because all hosts report the same IP address at http://www.whatismyipaddress.com.) I ask this question because I ran into a circa 2004 link (http://www.ecsirt.net/tools/crypto-ntp.html) that says, Be Aware! Before we start building ntpd, one important notice: NTP with Autokey does not work from a host that is behind a masquerading or NAT host! Is this a conceptual / fundamental limitation, or something related to NTP version? If the latter, I'm hoping that it would probably have been fixed by now. If AutoKey and NAT don't go together conceptually, what would be my next best option for securing NTP? Though the MD5 method is there, it is symmetric cryptography and prone to man-in-the-middle attacks... which is why btw I was hoping to be able to employ AutoKey. 
Many thanks, /HS ___ questions mailing list questi...@lists.ntp.org http://lists.ntp.org/listinfo/questions Dave, I really appreciate your response to my newbie question. May I ask (you or other users of this forum)... 1. What, then, would be the next best way (MD5-based symmetric key mode?) of syncing up a behind-NAT NTP client from an external NTP server in a tamper-proof manner? I'm not competent/powerful enough to advise the powers that be in my organization to have an Autokey NTP client outside our NAT/Firewall; most likely, I'll be told to continue to operate from behind the NAT/Firewall. 2. What physical/network setup should Autokey-desiring NTP clients follow? Is it OK, e.g., to have an Autokey client host (AkH) outside one's NAT network and have all the hosts inside the NAT network use AkH as an NTP server? I also skimmed through your (excellent) book on NTP. I was hoping to find a mention of NAT in Chapter 9, but didn't. Not complaining, just humbly/respectfully bringing it up. So, please do elaborate here if you can on this issue. Many thanks in advance, /HS ___ questions mailing list questions@lists.ntp.org http://lists.ntp.org/listinfo/questions
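As a concrete sketch of the symmetric-key alternative being discussed, an ntp.conf/keys fragment along these lines would authenticate a behind-NAT client to a server. The server name, key ID, and key value here are placeholder assumptions for illustration, not a real public service:

```
# /etc/ntp.keys (mode 600, readable by ntpd only)
# format: <keyid> <type> <key>
# key 1 is a placeholder secret, distributed out of band
1 MD5 2EzvQN0q

# /etc/ntp.conf fragment on the client
keys /etc/ntp.keys             # where the shared secrets live
trustedkey 1                   # key IDs ntpd will accept
server ntp.example.com key 1   # authenticate this association with key 1
```

The same key file must be installed on both ends, which is exactly the out-of-band distribution problem Harry raises; the symmetric scheme authenticates the packets but does nothing to stop a key holder from misusing the shared secret.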
Re: [ntp:questions] Will AutoKey setup work on a NAT host behind a firewall?
Harry, Autokey is not designed to work behind NAT boxes. The Autokey server and client must have the same (reversed) IP addresses. The intended model is using two interfaces, one for the Internet side running Autokey, the other for the inside net on the other side of the NAT box. Dave Harry wrote: Hello, I want to employ the AutoKey method of securing NTP. Basically, I want one host that would act as an NTP client of an external NTP server, talking AutoKey. This NTP client is to become the NTP server for other hosts on the intranet. All these hosts are behind a corporate firewall and are very likely using NAT / IP masquerading as well. (I can tell NAT / IP masquerading is in use in our environment because all hosts report the same IP address at http://www.whatismyipaddress.com.) I ask this question because I ran into a circa 2004 link (http://www.ecsirt.net/tools/crypto-ntp.html) that says, Be Aware! Before we start building ntpd, one important notice: NTP with Autokey does not work from a host that is behind a masquerading or NAT host! Is this a conceptual / fundamental limitation, or something related to NTP version? If the latter, I'm hoping that it would probably have been fixed by now. If AutoKey and NAT don't go together conceptually, what would be my next best option of securing NTP? The MD5 method is there, but it is symmetric cryptography and prone to man-in-the-middle attacks... which is why btw I was hoping to be able to employ AutoKey. Many thanks, /HS ___ questions mailing list questions@lists.ntp.org http://lists.ntp.org/listinfo/questions
Re: [ntp:questions] What level of timesynch error is typical on WinXP?
55680.806 0.000152206 0.000 0.000635788 0.00 6 55507 55748.805 0.17572 0.000 0.001913772 0.00 6 55507 55749.805 0.17023 0.000 0.002925621 0.00 6 55507 55816.805 0.02029 0.000 0.004627915 0.00 6 55507 55820.805 0.01787 0.000 0.006756535 0.00 6 55507 55887.805 0.00213 0.000 0.009488173 0.00 6 55507 55953.805 0.025053023 75.234 0.012286357 0.00 6 55507 56014.806 0.020799313 75.234 0.011590816 0.00 6 55507 56080.806 0.003262439 75.234 0.012489851 0.00 6 55507 56085.805 -0.002976502 75.234 0.011889591 0.00 6 55507 56284.806 -0.005790262 75.166 0.011166097 0.024282 6 55507 56486.805 -0.004832251 75.107 0.010450418 0.030644 6 55507 56619.806 -0.001653216 75.094 0.009839874 0.029037 6 55507 57107.805 0.001371581 75.134 0.009266278 0.030606 6 55507 57294.806 0.002566599 75.163 0.008678100 0.030363 6 55507 57565.806 0.003550332 75.220 0.008125067 0.034897 6 55507 57771.806 0.004306454 75.273 0.007605004 0.037617 6 55507 58259.806 0.004533934 75.405 0.007114285 0.058414 6 55507 58793.805 0.005079226 75.567 0.006657596 0.079074 6 55507 58851.806 0.005470119 75.585 0.006229144 0.074268 6 55507 59187.806 0.006022272 75.706 0.005830100 0.081514 6 55507 59256.805 0.006003279 75.731 0.005453563 0.076748 6 55507 59391.805 0.005970118 75.779 0.005101355 0.073773 6 55507 59919.805 0.006141090 75.972 0.004772263 0.097114 6 55507 60309.805 0.006432901 76.122 0.004465236 0.105107 6 55507 60441.805 0.006557244 76.173 0.004177077 0.06 6 55507 60635.805 0.006560367 76.249 0.003907298 0.097307 6 55507 60714.806 0.006276305 76.279 0.003656322 0.091620 6 55507 61240.806 0.004699083 76.426 0.003465336 0.100290 6 55507 61424.806 0.004688040 76.477 0.003241528 0.095558 6 55507 61566.805 0.004881127 76.519 0.003032940 0.090572 6 55507 62028.806 0.005194546 76.662 0.002839219 0.098669 6 55507 62146.805 0.005274662 76.699 0.002655997 0.093223 6 55507 62608.806 0.005172885 76.841 0.002484718 0.100701 6 55507 62937.806 0.004712989 76.934 0.002329922 0.099704 6 55507 63331.806 0.004662792 77.043 
0.002179514 0.100980 6 55507 63397.805 0.004648114 77.061 0.002038756 0.094680 6 55507 63722.806 0.004832430 77.155 0.001908194 0.094547 6 55507 64246.805 0.004666484 77.301 0.001785916 0.102357 6 55507 64701.806 0.004691766 77.428 0.001670596 0.105788 6 55507 64768.805 0.004609847 77.446 0.001562968 0.099170 6 55507 65165.806 0.004017867 77.542 0.001476927 0.098667 6 55507 65548.806 0.003799573 77.628 0.001383693 0.097256 6 55507 65808.805 0.003839284 77.688 0.001294402 0.093375 6 55507 65893.805 0.003827564 77.707 0.001210810 0.087613 6 55507 66411.805 0.003819743 77.825 0.001132612 0.091952 6 55507 66611.806 0.003747307 77.870 0.001059771 0.087451 6 55507 66942.806 0.003664775 77.942 0.000991754 0.085704 6 55507 67268.805 0.003395993 78.008 0.000932556 0.083495 6 55507 67796.805 0.003322722 78.113 0.000872711 0.086411 6 55507 68061.806 0.003407083 78.166 0.000816891 0.083039 6 55507 68593.805 0.003187342 78.268 0.000768071 0.085501 6 55507 68989.805 0.003164909 78.342 0.000718508 0.084227 6 55507 69255.805 0.003198491 78.393 0.000672208 0.080801 6 55507 69428.805 0.003139896 78.425 0.000629134 0.076445 6 55507 69779.806 0.002965214 78.487 0.000591733 0.074796 6 55507 69976.806 0.002849590 78.521 0.000555023 0.070958 6 55507 70397.805 0.002597799 78.586 0.000526753 0.070263 6 55507 70501.806 0.002577416 78.602 0.000492785 0.065967 6 55507 70632.806 0.002619503 78.622 0.000461198 0.062129 6 55507 70941.809 0.002597823 78.670 0.000431480 0.060528 6 55507 71223.806 0.002607564 78.714 0.000403627 0.058701 6 55507 71471.806 0.002375362 78.749 0.000386381 0.056296 6 55507 71929.805 0.002268382 78.811 0.000363400 0.057030 6 55507 71941.805 0.002310620 78.813 0.000340257 0.053349 6 55507 72472.805 0.002392774 78.889 0.000319604 0.056633 6 55507 72668.805 0.002407426 78.917 0.000299007 0.053901 6 55507 72994.806 0.002265679 78.961 0.000284150 0.052767 6 With the clock off by 1ms at start, the frequency estimate is about 5 PPM low. 
Happy hunting, Dave Hart On Sun, Nov 7, 2010 at 00:58 UTC, David L. Mills mi...@udel.edu wrote: Dave, I think I have hunted down what is going on. It takes some serious investigation. Turns out the modern adjtime(), at least in some systems, is far from what I knew some years back. I have already described that era from what I knew of SunOS, Ultrix, OSF/1 and Tru64, since I or my graduate assistant implemented the precision time kernel used in those production systems. Today, at least Solaris and Linux have put up stuff that turns out to be absolute poison when attempting things like measuring frequency. The original model I started with was (a) measure the initial offset and tell adjtime() to slew the kernel to that offset; (b) after five minutes, assume the kernel has largely completed the slew, measure the current offset and compute the frequency. This is a little tricky, since the amount the kernel has slewed the time must be added before computing the time
Re: [ntp:questions] What level of timesynch error is typical on WinXP?
Dave, Notice toward the end of the calibration period adjtime() is called with only a small offset, so issues like the slew rate and residual offset are moot. Those calls should minimize the residual, and the actual clock time should be within the measurement offset. Apparently, at least the FreeBSD mechanism does not forget prior requests as the Solaris mechanism does. I tried it with the kernel disabled with the same result. In the original precision time modifications for SunOS, Ultrix, OSF/1, HP-UX and Solaris some fifteen years ago, the kernels conformed to the original BSD semantics and all this claptrap about measuring clock frequency worked just fine. From recent experiments with initial clock offsets in the 30-50 ms range and hardware clock errors to 40 PPM, all this worked fine. I strongly suspect at least the Solaris and Tru64 precision time kernels have not been modified, but the adjtime() semantics has, at least for relatively large initial offsets. It is not an issue of limiting the slew rate to less than 500 PPM, as that is in fact the result shown in the loopstats trace. It seems at least some kernels do not forget past programmed offsets when presented with new ones. For that reason if no other, the mission to measure the intrinsic clock frequency with large initial offsets is dead in the water. The source will be modified to entirely avoid all such initial training. Dave Dave Hart wrote: On Mon, Nov 8, 2010 at 05:14 UTC, David L. Mills mi...@udel.edu wrote: Thanks for the test. I verified the same thing. Note that the measured offset at the end of the frequency measurement phase was very small, so the net frequency measurement should be the same as Solaris. Obviously, FreeBSD is doing something very different than Solaris. I suspect Linux is doing something completely different as well. At this point I am prepared to abandon the mission entirely, as I don't want to get bogged down with the specifics of each idiosyncratic operating system. 
Accordingly, I will back out all the changes and revert to the bad old ugly algorithms. That is disappointing but I understand your frustration. I was hoping the remainder returned by adjtime() would allow ntpd to know exactly how much the OS had in fact slewed the clock, adapting to differing adjtime() implementations. Cheers, Dave Hart
Re: [ntp:questions] What level of timesynch error is typical on WinXP?
Dave, I think I have hunted down what is going on. It takes some serious investigation. Turns out the modern adjtime(), at least in some systems, is far from what I knew some years back. I have already described that era from what I knew of SunOS, Ultrix, OSF/1 and Tru64, since I or my graduate assistant implemented the precision time kernel used in those production systems. Today, at least Solaris and Linux have put up stuff that turns out to be absolute poison when attempting things like measuring frequency. The original model I started with was (a) measure the initial offset and tell adjtime() to slew the kernel to that offset; (b) after five minutes, assume the kernel has largely completed the slew, measure the current offset and compute the frequency. This is a little tricky, since the amount the kernel has slewed the time must be added before computing the time difference. Well, this didn't work, apparently because the kernel didn't do what it was supposed to do. However, a hint is available in the form of the second phase of the training interval, where the residual offset is amortized while holding the frequency constant. This involves periodically measuring the offset and updating the adjtime() programmed offset. This works, as evident in my previous message. This could be due to a nonlinearity in the adjtime() calculation of the slew rate, which is apparently higher the larger the programmed offset. Apparently, the repeated calls to adjtime() eventually lower the offset and thus the slew rate to something reasonable. Whatever the cause, the behavior when the frequency file is present is within expectations. So, I did the same thing during the frequency measurement phase, with the result being the following loopstats from a Solaris system with initial offset 122 ms, no frequency file and the kernel enabled. 
55506 78196.297 0.122624404 0.000 0.043354274 0.00 4 55506 78200.301 0.10804 0.000 0.040745455 0.00 4 55506 78208.300 0.083775590 0.000 0.040190338 0.00 4 55506 78226.299 0.047307323 0.000 0.045698174 0.00 4 55506 78244.298 0.026714020 0.000 0.054270636 0.00 4 55506 78262.298 0.015085167 0.000 0.063186252 0.00 4 55506 78280.298 0.008518459 0.000 0.071337915 0.00 4 55506 78298.298 0.004810297 0.000 0.078439342 0.00 4 55506 78316.298 0.002716332 0.000 0.084507033 0.00 4 55506 78334.297 0.001533888 0.000 0.089652370 0.00 4 55506 78352.298 0.000866173 0.000 0.094006017 0.00 4 55506 78370.298 0.000489120 0.000 0.097751274 0.00 4 55506 78388.297 0.000276202 0.000 0.100850407 0.00 4 55506 78406.297 0.000155969 0.000 0.103480985 0.00 4 55506 78424.297 0.88074 0.000 0.105713875 0.00 4 55506 78442.297 0.49735 0.000 0.107604976 0.00 4 55506 78460.297 0.28085 0.000 0.109209608 0.00 4 55506 78478.298 0.15859 0.000 0.110570569 0.00 4 55506 78496.298 0.003133702 10.416 0.111724541 0.00 4 55506 78512.297 0.001806182 10.416 0.104509792 0.00 4 55506 78528.297 0.000873748 10.416 0.097760515 0.00 4 55506 78544.297 0.000362151 10.416 0.091446767 0.00 4 55506 78656.297 0.000353000 12.931 0.085540618 0.889337 4 55506 78688.298 0.000354000 13.104 0.080015921 0.834140 4 55506 78720.297 0.000226000 13.214 0.074848054 0.781241 4 55506 78736.297 -0.000163000 13.175 0.070014079 0.730920 4 55506 78864.299 -0.000234000 12.718 0.065492179 0.702547 4 55506 78912.299 -0.000289000 12.506 0.061262326 0.661420 4 55506 78944.298 -0.000286000 12.366 0.057305659 0.620669 4 55506 78992.297 -0.000286000 12.157 0.053604536 0.585287 5 55506 79008.297 -0.000233000 12.143 0.050142455 0.547509 5 55506 79269.297 -0.000469000 11.676 0.046904046 0.538099 5 55506 79430.297 -0.000319000 11.480 0.043874749 0.508089 5 55506 79495.297 -0.000265000 11.414 0.041041075 0.475841 5 55506 79755.304 -0.000206000 11.210 0.038390415 0.450932 5 55506 79852.297 -0.000257000 11.115 0.035910950 0.423146 5 55506 79949.297 
-0.000249000 11.023 0.033591618 0.397155 6 It starts at second 78196 with offset 122 ms and frequency zero. At 300 s later, second 78496, the frequency is set at 10.416, which happens to be within 1 PPM of the nominal value. At this point the residual offset is about 3.1 ms. At second 78544 the residual offset drops below 0.5 ms and the frequency clamp is removed. However, there is about a 0.3 ms residual offset, which at this low poll interval of 16 s and low time constant makes the frequency loop rather sensitive, so the frequency jumps to 12.9 PPM. While this slowly subsides to the nominal value, note the residual offset stays below 0.35 ms. Victory is declared. I haven't tried this on other machines, but the sheer blunderbuss approach here should tame even them. Your exploits to the contrary are invited. The new code is in the backroom, but not yet a snapshot. Dave Dave Hart wrote: On Sat, Nov 6, 2010 at 03:34 UTC, David L. Mills mi...@udel.edu wrote: Now to the apparent initial frequency
Re: [ntp:questions] calldelay syntax error ntp4.2.7p77 ?
Steve, Clarification received. My understanding was that the current ntp-dev documentation was published on the web site, but that assumption apparently is defective. The release version is so far behind the development version that the two are essentially completely different protocols. They don't belong on the same web site. In any case, a prospective development user must first obtain the distribution, then read the release notes to see if it would be useful. At least, the release notes should be on the web site. Dave Steve Kostecke wrote: On 2010-11-05, David L. Mills mi...@udel.edu wrote: The calldelay option is not mentioned in the master copy of the documentation that resides here. Sometimes there can be considerable delay for it to be published at ntp.org. doc.ntp.org is an archive which houses copies of the original Official Distribution Documentation for production (i.e. stable) releases of NTP. doc.ntp.org links to the NTP-dev documentation you maintain. We do not mirror your documentation tree.
Re: [ntp:questions] What level of timesynch error is typical on WinXP?
Dave, Further investigation continues; however, you have a fundamental misunderstanding of the slew limit in Solaris or any other BSD-semantics system. The slew is not a limit, it is an intrinsic constant equal to 500 PPM. The slew is implemented in this way. When adjtime() is called, it computes the number of timer ticks necessary to amortize the given offset at a fixed amount at each tick. Those kernels I have seen, including SunOS, Ultrix, OSF/1, Tru64 and Solaris, use an increment of 5 microseconds at each interrupt of a 100-Hz clock, which results in a slew rate of 500 PPM. There is no issue of exceeding the slew rate; it is an intrinsic constant compiled in the kernel. In days past, this value could be tinkered with using the tickadj program, but today that is not always possible. The test I did was to run two tests using initial offsets of +90 ms and -90 ms. The tests use ntptime -f 500 or -f -500 for three minutes in order to change the offset to about +-90 ms, then start ntpd normally. The frequency file was present and very near the expected hardware frequency. My results were far different than yours; the offset converges to within 0.5 ms and the frequency surge is less than 1 PPM. We need to explain why the results are so far different. Dave Dave Hart wrote: I have reproduced the problems Miroslav reports with the new startup behavior on FreeBSD, indicating the problem is not isolated to Linux. My guess is Solaris is the outlier here, because its adjtime() doesn't enforce a 500 PPM cap. The system I'm using is psp-fb2, a FreeBSD 6.4 x86 machine with the latest ntp-dev snapshot which is known to normally keep time well. I ran several tests, but I think the last is the most interesting. I stopped ntpd and removed its drift file (containing 80.530), then slewed the time backward 100 msec using adjtime() and patience. After restarting it takes a few minutes for its three manycast sources to be found and sys.peer to be declared. 
Five minutes later, the frequency is mis-estimated at 236 PPM, about 155 PPM too much. As a result, the system sets up for quite a long settling period with many steps. Below are the logs and some billboard snapshots. Please ignore the refclock, it is marked noselect. If your FreeBSD test machine has a negative ntp.drift, you might see worse behavior with a positive initial offset. I am repeating the test with the correct drift file in place now. Cheers, Dave Hart loopstats 55505 59851.663 0.102341957 0.000 0.036183346 0.00 6 55505 60190.664 0.080163684 236.465 0.032321622 0.00 6 55505 60257.664 0.053082763 236.465 0.031713930 0.00 6 55505 60325.664 0.009274475 236.465 0.033465616 0.00 6 55505 60393.664 -0.029456239 236.465 0.034168151 0.00 6 55505 60458.664 -0.041054653 236.465 0.032223363 0.00 6 55505 60459.664 -0.041380233 236.465 0.030142416 0.00 6 55505 60662.664 -0.006418134 236.076 0.030786168 0.137559 6 55505 60943.120 -0.066341200 234.964 0.035751383 0.413381 6 55505 61215.145 -0.076412713 233.726 0.033631393 0.584265 6 55505 61334.664 -0.090128710 233.086 0.031830847 0.591422 6 55505 61669.516 0.0 233.086 0.01907 0.553225 6 55505 61748.516 -0.027231614 232.958 0.009627829 0.519476 6 55505 61883.516 -0.038290456 232.650 0.009818119 0.497985 6 55505 62351.516 -0.066567566 230.793 0.013575544 0.804986 6 55505 62555.516 -0.099115749 229.588 0.017137138 0.865194 6 55505 63161.378 0.0 229.588 0.01907 0.809315 6 55505 63244.058 -0.033000775 229.425 0.011667536 0.759242 6 55505 63358.379 -0.044957700 229.119 0.011704101 0.718371 6 ntp.log 5 Nov 16:27:36 ntpd[57402]: ntpd exiting on signal 15 5 Nov 16:34:05 ntpd[58309]: Listen normally on 10 multicast ff05::101 UDP 123 5 Nov 16:34:05 ntpd[58309]: Added Multicast Listener ff05::101 on interface #10 multicast 5 Nov 16:34:05 ntpd[58309]: NTP PARSE support: Copyright (c) 1989-2009, Frank Kardel 5 Nov 16:34:05 ntpd[58309]: PARSE receiver #0: initializing PPS to ASSERT 5 Nov 16:34:05 ntpd[58309]: PARSE receiver #0: 
reference clock Meinberg GPS16x receiver (I/O device /dev/refclock-0, PPS device /dev/refclock-0) added 5 Nov 16:34:05 ntpd[58309]: PARSE receiver #0: Stratum 0, trust time 4d+00:00:00, precision -19 5 Nov 16:34:05 ntpd[58309]: PARSE receiver #0: rootdelay 0.00 s, phase adjustment 0.001968 s, PPS phase adjustment 0.00 s, normal IO handling 5 Nov 16:34:05 ntpd[58309]: PARSE receiver #0: Format recognition: Meinberg GPS Extended 5 Nov 16:34:05 ntpd[58309]: PARSE receiver #0: PPS support (implementation PPS API) 5 Nov 16:34:05 ntpd[58309]: GENERIC(0) 8011 81 mobilize assoc 62401 5 Nov 16:34:05 ntpd[58309]: refclock_newpeer: clock type 43 invalid 5 Nov 16:34:05 ntpd[58309]: 127.127.43.0 interface 127.0.0.1 - null 5 Nov 16:34:05 ntpd[58309]: 192.168.4.255 c811 81 mobilize assoc 62403 5 Nov 16:34:05 ntpd[58309]: ff05::101 c811 81 mobilize assoc 62404 5 Nov 16:34:05 ntpd[58309]: 0.0.0.0 c016 06 restart 5 Nov 16:34:05 ntpd[58309]: 0.0.0.0
Re: [ntp:questions] What level of timesynch error is typical on WinXP?
Dave, We need some order here, as what you report with a pre-existing frequency file is quite dubious. Consider the two runs here, both with an existing frequency file and starting at both plus and minus 90 ms offsets: howland initial offset -91.7 ms 55505 63959.804 -0.01000 -7.250 0.11310 0.003221 6 55505 64177.095 -0.091755534 -7.247 0.032440480 0.00 4 55505 64181.093 -0.083409937 -7.247 0.030488404 0.00 4 55505 64189.093 -0.064119966 -7.247 0.029323417 0.00 4 55505 64207.094 -0.035152179 -7.247 0.029279200 0.00 4 55505 64225.094 -0.019192498 -7.247 0.027963396 0.00 4 55505 64245.095 -0.010407119 -7.247 0.026341136 0.00 4 55505 64263.095 -0.005596762 -7.247 0.024698501 0.00 4 55505 64281.095 -0.002971348 -7.247 0.023121971 0.00 4 55505 64301.095 -0.001484549 -7.247 0.021635011 0.00 4 55505 64319.095 -0.000680614 -7.247 0.020239695 0.00 4 55505 64393.095 -0.00042 -8.676 0.018932726 0.505275 4 55505 64535.096 0.00013 -8.394 0.017711010 0.483019 4 55505 64567.095 0.000178000 -8.308 0.016567141 0.452867 4 55505 64695.095 0.000156000 -8.003 0.015497143 0.437100 4 55505 64711.095 0.000131000 -7.971 0.014496253 0.409026 4 55505 64743.095 0.89000 -7.927 0.013560011 0.382917 4 55505 64871.095 0.81000 -7.769 0.012684229 0.362527 4 55505 64887.095 0.000108000 -7.743 0.011865014 0.339241 5 55505 65065.095 0.65000 -7.699 0.011098715 0.317714 5 55505 65325.095 0.85000 -7.614 0.010381899 0.298685 5 55505 65421.095 0.000131000 -7.566 0.009711391 0.279909 5 55505 65616.095 0.88000 -7.501 0.009084187 0.262852 5 howland initial offset 93.7 ms 55505 66464.549 0.093707184 -7.247 0.033130493 0.00 4 55505 66468.551 0.085202918 -7.247 0.031136252 0.00 4 55505 66476.551 0.065630650 -7.247 0.029936050 0.00 4 55505 66494.550 0.036251561 -7.247 0.029866998 0.00 4 55505 66512.549 0.020055273 -7.247 0.028518816 0.00 4 55505 66532.549 0.011178952 -7.247 0.026860866 0.00 4 55505 66550.549 0.006275409 -7.247 0.025185779 0.00 4 55505 66568.549 0.003628803 -7.247 0.023577714 0.00 4 55505 66588.549 
0.002073854 -7.247 0.022061783 0.00 4 55505 66606.548 0.001265987 -7.247 0.020638884 0.00 4 55505 66624.549 0.000840837 -7.247 0.019306494 0.00 4 55505 2.549 0.000598000 -5.376 0.018059775 0.661350 4 55505 66736.549 0.49000 -5.321 0.016894488 0.618946 4 55505 66870.549 -0.000102000 -5.530 0.015803437 0.583647 4 55505 66886.549 -0.000258000 -5.593 0.014782864 0.546406 4 55505 67014.549 -0.000187000 -5.958 0.013828126 0.527176 4 55505 67110.548 -0.000212000 -6.268 0.012935031 0.505203 4 55505 67126.549 -0.000198000 -6.317 0.012099614 0.472883 4 55505 67254.548 -0.000151000 -6.612 0.011318165 0.454465 5 55505 67270.549 -0.9 -6.617 0.010587196 0.425117 5 55505 67336.549 -0.000115000 -6.646 0.009903419 0.397792 5 In both cases, the zeros in the wander column confirm that the frequency is not adjusted until either the offset falls below 0.5 ms or the 5-minute threshold runs out and the residual offset tickles the frequency. Considering this, it seems a stretch that the frequency is kicked leading to a massive frequency error. If your tests confirm this, please advise. Now to the apparent initial frequency error. This is new, as tests in the past have not confirmed that. I need to plant some debug code in direct-freq(). Dave Dave Hart wrote: I have reproduced the problems Miroslav reports with the new startup behavior on FreeBSD, indicating the problem is not isolated to Linux. My guess is Solaris is the outlier here, because its adjtime() doesn't enforce a 500 PPM cap. The system I'm using is psp-fb2, a FreeBSD 6.4 x86 machine with the latest ntp-dev snapshot which is known to normally keep time well. I ran several tests, but I think the last is the most interesting. I stopped ntpd and removed its drift file (containing 80.530), then slewed the time backward 100 msec using adjtime() and patience. After restarting it takes a few minutes for its three manycast sources to be found and sys.peer to be declared. 
Five minutes later, the frequency is mis-estimated at 236 PPM, about 155 PPM too much. As a result, the system sets up for quite a long settling period with many steps. Below are the logs and some billboard snapshots. Please ignore the refclock, it is marked noselect. If your FreeBSD test machine has a negative ntp.drift, you might see worse behavior with a positive initial offset. I am repeating the test with the correct drift file in place now. Cheers, Dave Hart loopstats 55505 59851.663 0.102341957 0.000 0.036183346 0.00 6 55505 60190.664 0.080163684 236.465 0.032321622 0.00 6 55505 60257.664 0.053082763 236.465 0.031713930 0.00 6 55505 60325.664 0.009274475 236.465 0.033465616 0.00 6 55505 60393.664 -0.029456239 236.465 0.034168151 0.00 6 55505 60458.664 -0.041054653 236.465 0.032223363 0.00 6 55505 60459.664 -0.041380233 236.465 0.030142416 0.00 6
Re: [ntp:questions] What level of timesynch error is typical on WinXP?
Miroslav, The NTP daemon purposely ignores the leftover from adjtime(). To do otherwise would invite massive instability. Each time an NTP update is received, a new offset estimate is available regardless of past history. Therefore, the intent is to ignore all past history and start with a fresh update. Note that the slew rate of adjtime() is not a factor with the kernel discipline. Dave Miroslav Lichvar wrote: On Wed, Nov 03, 2010 at 04:06:39PM +, Dave Hart wrote: On Wed, Nov 3, 2010 at 09:24 UTC, Miroslav Lichvar mlich...@redhat.com wrote: On Tue, Nov 02, 2010 at 10:03:30PM +, David L. Mills wrote: I ran the same test here on four different machines with the expected results. These included Solaris on both SPARC and Intel machines, as well as two FreeBSD machines. [...] Ok, I think I have found the problem. The adj_systime() routine is called from adj_host_clock() with adjustments over 500 microseconds, which means ntpd is trying to adjust the clock at a rate higher than what the Linux adjtime() uses. It can't keep up, and the lost offset correction is what makes the ~170 ppm frequency error. Congratulations on isolating the problem. If adjtime() is returning failure, ntpd will log that mentioning adj_systime. Do you see any of those? No, it's not an error in usage; adjtime() just doesn't have enough time to apply the whole correction, and ntpd doesn't check the leftover, so the offset is actually adjusted more slowly than what ntpd is assuming. Is it a feature or a bug that FreeBSD and Solaris can apparently slew faster than 500 PPM using adjtime()? If it's a feature, is there a way we can detect at configure time what the adjtime() slew limit is without actually trying it? We don't want to require root for configure. Probably not. I think I saw on one BSD system only a 100 ppm rate, so it will have to be clamped to either the lowest rate from all supported systems or to a constant defined in the configure script based on the system and version. 
Re: [ntp:questions] What level of timesynch error is typical on WinXP?
Miroslav, Wrong. The daemon starts off by setting the frequency to zero, as you can see in the protostats. When the frequency calibration is complete, the frequency is set directly, as you can also see in the protostats. It could be that a massively broken motherboard might set the frequency larger than 500 PPM, but in your case it set it at 172 PPM, well within the tolerance. During the 5 minutes following the direct set, the frequency update is suppressed, so there is no clamp. So, please explain where you find the bug. Dave Miroslav Lichvar wrote: On Wed, Nov 03, 2010 at 03:54:33PM +, David L. Mills wrote: The daemon clamps the adjtime() (sic) offset to 500 PPM, which is consistent with ordinary Unix semantics. No, during that new fast phase correction on start it's not clamped to anything. That's the bug I'm hitting here. If 500 ppm is the standard rate, Linux is working fine and the other systems are the bad ones.
Re: [ntp:questions] What level of timesynch error is typical on WinXP?
Miroslav, IT IS NOT A BUG. Specifically, the Unix adjtime() semantics allows any argument, even if bizarre. The slew rate is constant at 500 PPM; the duration of the slew is calculated to amortize the argument as given. There is no way to exceed the slew rate; it is constant. You and Linux may have a different view, but the NTP implementation conforms to the traditional Unix semantics. Dave Miroslav Lichvar wrote: On Thu, Nov 04, 2010 at 08:32:06PM +, David L. Mills wrote: Wrong. The daemon starts off by setting the frequency to zero, as you can see in the protostats. When the frequency calibration is complete, the frequency is set directly, as you can also see in the protostats. It could be that a massively broken motherboard might set the frequency larger than 500 PPM, but in your case it set it at 172 PPM, well within the tolerance. During the 5 minutes following the direct set, the frequency update is suppressed, so there is no clamp. So, please explain where you find the bug. The bug is in the adj_host_clock() function in ntp_loopfilter.c. On startup, when freq_cnt > 0, a reduced time constant is used, which makes the adjustment so large that the adjtime() argument is over 500 microseconds. On systems following the standard slew rate of 500 ppm, the adjustment will be applied only partially in the one-second interval it has, and so there will be an error in clock_offset. The missing offset is the cause of the 172 ppm error in the frequency estimation. In order to fix this bug without checking the adjtime() leftover, a clamp for the adjustment has to be added to that function, so adjustment + drift_comp stays below 500 microseconds (or whatever value is appropriate for the system). 
For example: if (adjustment + drift_comp > 500e-6) adjustment = 500e-6 - drift_comp; else if (adjustment + drift_comp < -500e-6) adjustment = -500e-6 - drift_comp; clock_offset -= adjustment; adj_systime(adjustment + drift_comp);
Re: [ntp:questions] What level of timesynch error is typical on WinXP?
Bill, You have absolutely no idea what you are talking about, and you do reveal an abysmal lack of understanding of control theory. To incorporate past history in future controls when the current control variable is measured violates the time delay constraint. Go back to the books. Dave unruh wrote: On 2010-11-04, David L. Mills mi...@udel.edu wrote: Miroslav, The NTP daemon purposely ignores the leftover from adjtime(). To do otherwise would invite massive instability. Each time an NTP update is received, a new offset estimate is available regardless of past history. Therefore, the intent is to ignore all past history and start with a fresh update. Note that the slew rate of adjtime() is not a factor with the kernel discipline. That is of course a philosophical position, and a strange one. Clocks are largely predictable systems (that is why they are used as clocks). Thus the past history is strongly determinative of what the future behaviour will be. To act as if this is not true, that each new measurement should be treated as if it is completely disconnected from the past, is a strange way of treating a highly predictable system. That is of course one of the key places where ntpd and chrony differ. The evidence is that properly taking account of the past does not create massive instability but rather creates far more accurate disciplining of the clock than does past amnesia. Dave Miroslav Lichvar wrote: On Wed, Nov 03, 2010 at 04:06:39PM +, Dave Hart wrote: On Wed, Nov 3, 2010 at 09:24 UTC, Miroslav Lichvar mlich...@redhat.com wrote: On Tue, Nov 02, 2010 at 10:03:30PM +, David L. Mills wrote: I ran the same test here on four different machines with the expected results. These included Solaris on both SPARC and Intel machines, as well as two FreeBSD machines. [...] Ok, I think I have found the problem. 
The adj_systime() routine is called from adj_host_clock() with adjustments over 500 microseconds, which means ntpd is trying to adjust the clock at a rate higher than the Linux adjtime() allows. It can't keep up, and the lost offset correction is what makes the ~170 ppm frequency error. Congratulations on isolating the problem. If adjtime() is returning failure, ntpd will log that mentioning adj_systime. Do you see any of those? No, it's not an error in usage; adjtime() just doesn't have enough time to apply the whole correction and ntpd doesn't check the leftover, so the offset is actually adjusted more slowly than ntpd is assuming. Is it a feature or a bug that FreeBSD and Solaris can apparently slew faster than 500 PPM using adjtime()? If it's a feature, is there a way we can detect at configure time what the adjtime() slew limit is without actually trying it? We don't want to require root for configure. Probably not. I think I saw only a 100 ppm rate on one BSD system, so it will have to be clamped to either the lowest rate from all supported systems or to a constant defined in the configure script based on the system and version.
Re: [ntp:questions] What level of timesynch error is typical on WinXP?
Dave, How does this issue persist? NTP does not limit the slew rate, adjtime() does, and NTP does not care. The adjtime() semantics is the same whether or not the slew rate is exceeded and for any programmed offset. The only issue is whether the programmed offset is completed in the time allowed, in this case five minutes. If it does not complete, the discipline reverts to the ordinary algorithm, which will probably surge as it would without the initial training period. At 500 PPM, the slew rate is 0.5 ms/s, or 30 ms/min, or 150 ms in five minutes. This is sufficient for a maximum offset of 128 ms before the step kicks in. And yes, as reported several times, I have tested it with and without when the step is exceeded, but not with Linux. Dave Dave Hart wrote: On Thu, Nov 4, 2010 at 22:22 UTC, David L. Mills mi...@udel.edu wrote: Miroslav, IT IS NOT A BUG. Specifically, the Unix adjtime() semantics allows any argument, even if bizarre. The slew rate is constant at 500 PPM; the duration of the slew is calculated to amortize the argument as given. There is no way to exceed the slew rate; it is constant. You and Linux may have a different view, but the NTP implementation conforms to the traditional Unix semantics. Dave Dr. Mills, I think you will find Miroslav in agreement with your view about how adjtime() works. As I understand it, the issue he's raising is that during the initial offset convergence period only, ntpd does not limit its slew rate to 500 PPM, as it does otherwise. When it exceeds 500 PPM, ntpd is assuming the full adjustment + drift_comp value has been applied, when only the first 500 PPM has been. As a result, ntpd stops slewing the clock sooner than it should. If that analysis is correct, two possible solutions come to mind. Either ntpd can limit its slew rate during initial convergence to 500 PPM, or it can go faster on systems which allow it by using the residual returned by adjtime() to accurately account for how much the clock is being slewed.
Incidentally, Miroslav wrote a simple test program to measure the maximum slew rate of adjtime(), by requesting a constant 100 ms correction every second and displaying the difference between the requested amount and the residual returned:

    #include <stdio.h>
    #include <sys/time.h>
    #include <unistd.h>

    int main()
    {
        struct timeval tv, otv;

        tv.tv_sec = 0;
        tv.tv_usec = 100000;
        while (1) {
            if (adjtime(&tv, &otv)) {
                printf("fail\n");
                return 1;
            }
            printf("%ld\n", tv.tv_usec - otv.tv_usec);
            sleep(1);
        }
        return 0;
    }

On Linux and FreeBSD 6.4, this displays a rock-steady 500 every second after the first. On OpenSolaris, it's around 63000. Presumably the Solaris 10 you're using is similar. If so, you should be able to reproduce Miroslav's problem on FreeBSD, but not Solaris, which will happily apply up to 63 msec/sec of slew. By starting with an initial offset just under the step threshold, ntpd should exceed FreeBSD adjtime()'s ability to keep up during the initial offset convergence. Cheers, Dave Hart
Re: [ntp:questions] calldelay syntax error ntp4.2.7p77 ?
BlackLists, The calldelay option is not mentioned in the master copy of the documentation that resides here. Sometimes there can be considerable delay for it to be published at ntp.org. Sorry about that, but I have no control over the publishing process. I am told the relevant documentation is in the current ntp-dev that leaves here. In any case, calldelay is no more, believe me. Dave BlackLists wrote: Somewhere between ntp4.2.7p41 and ntp4.2.7p77, calldelay 69 became invalid ntp.conf syntax? {Commenting it out eliminates the error.} log: syntax error in ntp.conf line 15, column 1 line 15 column 1 syntax error, unexpected T_String, expecting $end Nevermind, I found it in the diffs circa 4.2.7p63/64, 2010-Oct-13/15. FYI, it is still mentioned twice in http://doc.ntp.org/4.2.6/confopt.html with a link to http://doc.ntp.org/4.2.6/miscopt.html
Re: [ntp:questions] What level of timesynch error is typical on WinXP?
Miroslav, Why didn't you tell me you are using Linux? All bets are off. You are on your own. The daemon clamps the adjtime() (sic) offset to 500 PPM, which is consistent with ordinary Unix semantics. The Unix adjtime() syscall can return the amount of time not amortized since the last call and leaves it up to the user to include it in the next call. For ntpd, the leftover is ignored, since a new update has been measured from scratch. I don't know how Linux got other ideas. You might consider using FreeBSD. Dave Miroslav Lichvar wrote: On Tue, Nov 02, 2010 at 10:03:30PM +, David L. Mills wrote: I ran the same test here on four different machines with the expected results. These included Solaris on both SPARC and Intel machines, as well as two FreeBSD machines. I tested with and without the kernel, with initial offset 300 ms (including step correction) and 100 ms. I tested with initial poll interval of 16 s and 64 s. At 16 s, the leftover offset after the frequency update was about half a millisecond, within spec, but this torqued the frequency about 2 PPM as expected. It settled down within 1 PPM within 20 min. Bottom line: I cannot verify your experience. Ok, I think I have found the problem. The adj_systime() routine is called from adj_host_clock() with adjustments over 500 microseconds, which means ntpd is trying to adjust the clock at a rate higher than the Linux adjtime() allows. It can't keep up and the lost offset correction is what makes the ~170 ppm frequency error. But I have to wonder at what rate adjtime() slews the clock on Solaris and FreeBSD; it certainly has to be over 3000 ppm. Anyway, clamping the adjustment in adj_host_clock() so adjustment + drift_comp doesn't go over 500 microseconds fixed the problem for me. The error in frequency estimation is now below 2 ppm. I have also rerun the original test with a drift file and ntpd now reaches the 200us sync in less than 2000 seconds with all initial offsets.
Re: [ntp:questions] What level of timesynch error is typical on WinXP?
David, In Miroslav's test, the frequency was computed at 172 PPM, which is well within the capabilities of the algorithm. On the other hand, even if the intrinsic hardware frequency error is more than 500 PPM, the frequency will be set to 500 PPM and the daemon will continue normally, but will not be able to reduce the residual time offset to zero. Miroslav's experience is far different; apparently, the computed frequency was not installed and the daemon continued at that frequency error and exceeded the step threshold every twenty minutes. This apparently occurred whether or not the kernel was enabled and whether or not the simulator was involved. Again I stress no such behavior occurs with Solaris or FreeBSD, so something might be affected by Linux adjtime functionality. It might help to eyeball ntp_loopfilter.c and the direct_freq() routine. There might be some angst with the Linux semantics. Dave David Woolley wrote: David L. Mills wrote: I don't think that is right. The adjtime() argument can in principle be anything, according to the Solaris and FreeBSD man pages, but the rate of adjustment is fixed at 500 PPM in the Unix implementation. If the Linux argument is limited to 500 microseconds, Linux is essentially unusable with NTP. I would be surprised if this were the case. I think what he is really saying is that he is not using the kernel discipline and ntpd is tweaking the clock every second, but he has broken hardware which requires a correction of more than 500 ppm, and, as he is describing it, adjtime has a residual correction to apply before the next tweak, or more likely ntpd is limiting it to 500 ppm. As to Linux, I would guess most users of ntpd are using Linux. Miroslav: ntpd requires an uncorrected clock that is good to significantly better than 500 ppm. You can probably get away with 450 ppm, but the transient response will be compromised. A good quality PC should be within about 10 ppm. A cheap one should be within about 50 ppm. 500 ppm is broken.
You can use tickadj to compensate in steps of 100ppm, but a machine with that error is likely to have other problems; the crystal may be barely disciplining the oscillator.
Re: [ntp:questions] What level of timesynch error is typical on WinXP?
Miroslav, You have something seriously wrong. The frequency was apparently set correctly at -168 PPM, then shortly after that there was a series of step corrections of about 172 ms in 20 min, which is about 170 PPM. In other words, the frequency change did not work. It certainly worked here, even with the kernel enabled. You should see the frequency change in the loopstats. If not, try disabling the kernel to see if that is the problem. Dave Miroslav Lichvar wrote: On Wed, Oct 27, 2010 at 11:55:22PM +, David L. Mills wrote: See the most recent ntp-dev. It needed some tuning. Hm, with ntp-dev-4.2.7p74 it still doesn't seem to work as expected. In the same testing conditions as I used the last time, the frequency estimate is now 170 ppm off and the clock is stepped several times before it settles down. http://mlichvar.fedorapeople.org/tmp/ntp_start4.png

 1 Jan 01:00:00 ntpd.new2[3135]: ntpd 4.2.7...@1.2307 Mon Nov 1 12:20:56 UTC 2010 (1)
 1 Jan 01:00:00 ntpd.new2[3135]: proto: precision = 0.101 usec
 1 Jan 01:00:00 ntpd.new2[3135]: ntp_io: estimated max descriptors: 1024, initial socket boundary: 16
 1 Jan 01:00:00 ntpd.new2[3135]: Listen and drop on 0 v4wildcard 0.0.0.0 UDP 123
 1 Jan 01:00:00 ntpd.new2[3135]: Listen normally on 1 lo 127.0.0.1 UDP 123
 1 Jan 01:00:00 ntpd.new2[3135]: Listen normally on 2 eth0 192.168.123.4 UDP 123
 1 Jan 01:00:00 ntpd.new2[3135]: 192.168.123.1 8011 81 mobilize assoc 9201
 1 Jan 01:00:00 ntpd.new2[3135]: 0.0.0.0 c016 06 restart
 1 Jan 01:00:00 ntpd.new2[3135]: 0.0.0.0 c012 02 freq_set ntpd 0.000 PPM
 1 Jan 01:00:00 ntpd.new2[3135]: 0.0.0.0 c011 01 freq_not_set
 1 Jan 01:00:01 ntpd.new2[3135]: 192.168.123.1 8024 84 reachable
 1 Jan 01:03:17 ntpd.new2[3135]: 192.168.123.1 963a 8a sys_peer
 1 Jan 01:03:17 ntpd.new2[3135]: 0.0.0.0 c614 04 freq_mode
 1 Jan 01:08:43 ntpd.new2[3135]: 0.0.0.0 0612 02 freq_set ntpd -168.875 PPM
 1 Jan 01:08:43 ntpd.new2[3135]: 0.0.0.0 0615 05 clock_sync
 1 Jan 01:32:42 ntpd.new2[3135]: 0.0.0.0 0613 03 spike_detect +0.160001 s
 1 Jan 01:41:23 ntpd.new2[3135]: 0.0.0.0 061c 0c clock_step +0.172040 s
 1 Jan 01:41:23 ntpd.new2[3135]: 0.0.0.0 0615 05 clock_sync
 1 Jan 01:41:24 ntpd.new2[3135]: 0.0.0.0 c618 08 no_sys_peer
 1 Jan 01:41:24 ntpd.new2[3135]: 192.168.123.1 8044 84 reachable
 1 Jan 01:45:53 ntpd.new2[3135]: 192.168.123.1 965a 8a sys_peer
 1 Jan 01:58:07 ntpd.new2[3135]: 0.0.0.0 0613 03 spike_detect +0.140452 s
 1 Jan 02:06:52 ntpd.new2[3135]: 0.0.0.0 061c 0c clock_step +0.170957 s
 1 Jan 02:06:52 ntpd.new2[3135]: 0.0.0.0 0615 05 clock_sync
 1 Jan 02:06:53 ntpd.new2[3135]: 0.0.0.0 c618 08 no_sys_peer
Re: [ntp:questions] systems won't synchronize no matter what
Miroslav, There is a very good reason. First, the kernel can only be switched between PLL and FLL mode discretely, while the daemon has a gradual transition between modes so that the poll interval can vary seamlessly between 8 s and 36 hr. Second, the kernel PLL is most useful to minimize sawtooth errors, and is no better than the daemon loop at tracking incidental frequency noise. Seldom if ever is it useful to switch to FLL mode at poll intervals less than 1024 s, unless the incidental phase noise is less than a microsecond. It may happen that the kernel PLL switches modes occasionally during operation at 1024 s, but that does not disturb timekeeping accuracy. Recent versions of ntpd suppress log messages when switching modes like that. Dave Miroslav Lichvar wrote: On Sat, Oct 30, 2010 at 09:09:17AM -0700, Chuck Swiger wrote: http://www.ece.udel.edu/~mills/database/papers/nano/nano.pdf In operation, PLL mode is preferred at small update intervals and time constants and FLL mode at large intervals and time constants. The optimum crossover point between the PLL and FLL modes, as determined by simulation and analysis, is the Allan intercept. As a compromise, the PLL/FLL algorithm operates in PLL mode for update intervals of 256 s and smaller and in FLL mode for intervals of 1024 s and larger. Between 256 s and 1024 s the mode is specified by the API. This behavior parallels the NTP daemon behavior, except that in the latter the weight given the FLL prediction is linearly interpolated from zero at 256 s to unity at 1024 s. Is there a reason why ntpd doesn't make use of the STA_FLL flag? I think it would be nice if ntpd switched the FLL limit to 256 s with tinker allan 8 or lower.
Re: [ntp:questions] peers using same server
Miroslav, Depends on the synchronization distance. See the online page How NTP Works. Dave Miroslav Lichvar wrote: I've come across an interesting problem and I'm not sure if this is a bug or feature. When two peers are configured to use the same server as their source and the link between one peer and the server goes down, the peer doesn't switch to the other peer and will stay unsynchronized.

 1 Jan 01:00:00 ntpd[3468]: ntpd 4.2.7...@1.2297 Mon Oct 25 09:43:53 UTC 2010 (1)
 1 Jan 01:00:00 ntpd[3468]: proto: precision = 0.101 usec
 1 Jan 01:00:00 ntpd[3468]: ntp_io: estimated max descriptors: 1024, initial socket boundary: 16
 1 Jan 01:00:00 ntpd[3468]: Listen and drop on 0 v4wildcard 0.0.0.0 UDP 123
 1 Jan 01:00:00 ntpd[3468]: Listen normally on 1 lo 127.0.0.1 UDP 123
 1 Jan 01:00:00 ntpd[3468]: Listen normally on 2 eth0 192.168.123.4 UDP 123
 1 Jan 01:00:00 ntpd[3468]: 192.168.123.1 8011 81 mobilize assoc 9201
 1 Jan 01:00:00 ntpd[3468]: 192.168.123.3 8011 81 mobilize assoc 9202
 1 Jan 01:00:00 ntpd[3468]: 0.0.0.0 c016 06 restart
 1 Jan 01:00:00 ntpd[3468]: 0.0.0.0 c012 02 freq_set kernel 100.000 PPM
 1 Jan 01:05:29 ntpd[3468]: 192.168.123.3 8024 84 reachable
 1 Jan 01:08:44 ntpd[3468]: 192.168.123.3 963a 8a sys_peer
 1 Jan 01:08:44 ntpd[3468]: 0.0.0.0 c615 05 clock_sync
 1 Jan 19:11:37 ntpd[3468]: 192.168.123.1 8024 84 reachable
 1 Jan 19:40:48 ntpd[3468]: 192.168.123.1 963a 8a sys_peer
 1 Jan 22:03:04 ntpd[3468]: 192.168.123.1 8643 83 unreachable
 1 Jan 22:07:08 ntpd[3468]: 0.0.0.0 0618 08 no_sys_peer
 2 Jan 16:07:01 ntpd[3468]: 192.168.123.1 8054 84 reachable
 2 Jan 16:12:32 ntpd[3468]: 192.168.123.1 966a 8a sys_peer
 2 Jan 18:58:12 ntpd[3468]: 192.168.123.1 8673 83 unreachable
 2 Jan 19:00:34 ntpd[3468]: 0.0.0.0 0628 08 no_sys_peer

But when each peer uses a different server, they will switch to the other peer quickly when the link goes down.

 1 Jan 01:00:00 ntpd[3635]: ntpd 4.2.7...@1.2297 Mon Oct 25 09:43:53 UTC 2010 (1)
 1 Jan 01:00:00 ntpd[3635]: proto: precision = 0.101 usec
 1 Jan 01:00:00 ntpd[3635]: ntp_io: estimated max descriptors: 1024, initial socket boundary: 16
 1 Jan 01:00:00 ntpd[3635]: Listen and drop on 0 v4wildcard 0.0.0.0 UDP 123
 1 Jan 01:00:00 ntpd[3635]: Listen normally on 1 lo 127.0.0.1 UDP 123
 1 Jan 01:00:00 ntpd[3635]: Listen normally on 2 eth0 192.168.123.4 UDP 123
 1 Jan 01:00:00 ntpd[3635]: 192.168.123.2 8011 81 mobilize assoc 9201
 1 Jan 01:00:00 ntpd[3635]: 192.168.123.3 8011 81 mobilize assoc 9202
 1 Jan 01:00:00 ntpd[3635]: 0.0.0.0 c016 06 restart
 1 Jan 01:00:00 ntpd[3635]: 0.0.0.0 c012 02 freq_set kernel 100.000 PPM
 1 Jan 01:05:29 ntpd[3635]: 192.168.123.3 8024 84 reachable
 1 Jan 01:09:54 ntpd[3635]: 192.168.123.3 963a 8a sys_peer
 1 Jan 01:09:54 ntpd[3635]: 0.0.0.0 c615 05 clock_sync
 1 Jan 02:49:28 ntpd[3635]: 192.168.123.2 8024 84 reachable
 1 Jan 03:11:52 ntpd[3635]: 192.168.123.2 963a 8a sys_peer
 1 Jan 05:40:36 ntpd[3635]: 192.168.123.2 8643 83 unreachable
 1 Jan 05:48:17 ntpd[3635]: 192.168.123.3 961a 8a sys_peer
 1 Jan 07:22:11 ntpd[3635]: 192.168.123.2 8054 84 reachable
 1 Jan 08:01:28 ntpd[3635]: 192.168.123.2 966a 8a sys_peer
 1 Jan 10:13:37 ntpd[3635]: 192.168.123.2 8673 83 unreachable
 1 Jan 10:20:08 ntpd[3635]: 192.168.123.3 961a 8a sys_peer

Any explanation for this? I've tried versions 4.2.2, 4.2.4, 4.2.6 and the latest dev; the result is always the same. Thanks,
Re: [ntp:questions] What level of timesynch error is typical on WinXP?
Miroslav, I uttered an egregious, terrible lie: the hold interval is not 600 s; it is 300 s, so the maximum time to converge, given a frequency file within 1 PPM, is five minutes; without a frequency file, within ten minutes to 0.5 ms and 1 PPM. Again, I caution that if for some reason the initial frequency file is in error by something like 500 PPM, there will be considerable additional time to converge. The model for this scheme is not to fix broken frequency files, but to quickly converge after a laptop has been off for a few hours. Dave David L. Mills wrote: Miroslav, No, it is not expected, unless you are referring to broadcast mode when started with +-100 ms initial offset. That has been corrected as per your bug report. For the record, a hold timer is started when the first update is received after startup and ends when the residual offset is less than 0.5 ms or after a timeout of 600 s. During the hold interval the PLL loop time constant is set very low and the frequency discipline is disabled. With this arrangement, the offset typically converges within 600 s, even with initial offsets up to +-100 ms, and much less if the initial offset is in the 10-50 ms range. If you see different behavior, either with client/server or broadcast modes, please report. Note that, if the initial frequency error is significant, there may still be a surge correction. If the frequency file is not present at startup, the frequency will be measured, typically within +-1 PPM, within 600 s, following which the above scheme will be in effect. Under worst case conditions, there still could be a wobble following startup not exceeding 1 ms. If somebody finds an extraordinarily unlikely set of circumstances leading to, say, 2 ms, I'm not going to lose sleep over that. Dave Miroslav Lichvar wrote: On Fri, Oct 22, 2010 at 11:39:47AM +0100, David J Taylor wrote: Thanks, Dave.
I may be missing something here, but it seems to me that 4.2.7p58 still takes a number of hours to reach the accuracy limits where thermal effects dominate. It's that which matters to me, rather than something in the first few minutes. I agree the graphs would not show such short time-scale initial disturbances. Did the clock frequency change before you started the new version? I played with the latest ntp-dev a bit and there indeed is an improvement on start, mainly when the initial offset is around 0.01-0.05 s. But the frequency error has to be very small to make a difference, see these plots: http://mlichvar.fedorapeople.org/tmp/ntp_start_offset.png http://mlichvar.fedorapeople.org/tmp/ntp_start_freq.png Also, I've noticed that when ntpd is started without a driftfile and the initial offset is over 0.05 second, the overshoot can easily reach 100 percent; is this expected?
Re: [ntp:questions] ntp orphan mode without manycast or multicast or broadcast ?
Jerzy, See the online documentation at www.ntp.org. The release notes page has a link to a tutorial on orphan mode. Note the online documentation applies to the latest development version; the feature may or may not appear in the release version. Dave Miernik, Jerzy (Jerzy) wrote: The Orphan Mode description related to the NTP Version 4 Release Notes does not require either manycast or multicast or broadcast to be configured. Is it then possible to have ntpd get into orphan mode, in a mesh, with just unicasts? I think it should be possible to configure several servers in each client and all clients in each server, peer servers in servers, all IP addresses unicast, to have orphan mode operational, if needed. Servers could be of stratum lower by 1 than clients. 1. Could I get an expert comment on whether this would work? 2. Has anyone ever tried such multi-unicast addressing for orphan mode? 3. If unicast addresses of servers are ordered in ntp.conf in ascending order of hop numbers, would ntpd try the closest servers before trying more distant ones? 4. If I still may, any config examples? Or something to avoid? Best regards, Jerzy.
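For what it's worth, a unicast-only orphan mesh along the lines Jerzy describes might look like the following ntp.conf sketch. The addresses and the orphan stratum are placeholders, and this is an untested outline to be checked against the orphan-mode tutorial, not a verified configuration:

```conf
# --- on each local server (the orphan parents) ---
server 0.pool.ntp.org iburst    # real upstream source, while reachable
peer 192.168.1.11               # the other local servers, plain unicast
peer 192.168.1.12
tos orphan 6                    # act as an orphan parent at stratum 6
                                # if all upstream sources are lost

# --- on each client ---
server 192.168.1.10 iburst      # list all local servers so the client
server 192.168.1.11 iburst      # can fail over among them
server 192.168.1.12 iburst
```

The key point is that the orphan parents must be able to see each other (here via the unicast peer lines) so they can agree on a single winner when upstream connectivity is lost.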
Re: [ntp:questions] GPS
John, My experience with SA some years ago was that the timing accuracy was on the order of LORAN C, that is, about one microsecond. However, the oscillator in my Austron 2201 GPS receiver was disciplined in frequency by LORAN C, with the result that the timing accuracy was on the order of 50 ns. I calibrated it with a cesium oscillator. In other words, even if SA were turned back on, it is easily defeated. Dave John Hasler wrote: Chris H writes: With Selective Availability enabled, would that cause time accuracy issues with the signals from the satellite? I believe that it would inject a maximum of about 300 ns of random error. However, SA is obsolete. The newest satellites don't even implement it.
Re: [ntp:questions] Allan deviation survey
Dave, I'm glad it's gone, as the code was never intended to measure resolution. It is intended to measure precision, defined in the specification as the time to read the system clock. This turns out to be really important for a client reading two or more sources on the same fast Ethernet. The intent is to avoid non-intersecting correctness intervals. Dave Dave Hart wrote: On Thu, Sep 16, 2010 at 1:23 AM, David L. Mills mi...@udel.edu wrote: Miroslav, The fastest machine I can find on campus has precision -22, or about 230 ns. Then, I peeked at time.nist.gov, which is actually three machines behind a load leveler. It reports to be an i386 running FreeBSD 6.1. Are you ready for this? It reports precision -29, or 1.9 ns! I'm rather suspicious about that number. I think this can be attributed to some code that used to be in ntpd which, on FreeBSD only, used for precision an OS estimate of the clock resolution in place of the measured latency to read the clock used on every other platform. That FreeBSD exception was removed from ntpd years ago, but apparently after the version in use by NIST. Dave Hart
Re: [ntp:questions] Allan deviation survey
Miroslav, The fastest machine I can find on campus has precision -22, or about 230 ns. Then, I peeked at time.nist.gov, which is actually three machines behind a load leveler. It reports to be an i386 running FreeBSD 6.1. Are you ready for this? It reports precision -29, or 1.9 ns! I'm rather suspicious about that number. What processor and operating system are you using? What is the precision reported by ntpd? For historic perspective, the time to read the clock on a Sun SPARC IPC in the late 1980s was 42 microseconds. Thanks for the jitter.c update. Dave Miroslav Lichvar wrote: On Tue, Sep 14, 2010 at 07:17:04PM +, David L. Mills wrote: Miroslav, Better recalibrate your slide rule. On a 2.8 GHz dual-core Pentium running OpenSolaris 10, the measured precision is -21, which works out to 470 ns. That probably just means the system on your machine is not using the rdtsc instruction when reading time. You claim ten times faster. What snake oil are you using for your processor? To check, try running the jitter.c program in the distribution.

    Average 0.00088
    First rank
     0 0.00082
     1 0.00082
     2 0.00082
     3 0.00082
     4 0.00082
     5 0.00082
     6 0.00082
     7 0.00082
     8 0.00082
     9 0.00082
    Last rank
     70 0.13065
     71 0.13324
     72 0.13477
     73 0.13545
     74 0.13782
     75 0.14010
     76 0.19316
     77 0.23405
     78 0.34991
     79 0.000111950

But I had to apply the following patch, because the time was stored as seconds since 1900 in double format, which for the current time gives only about 119 ns resolution, and so the differences ended up as zero.

    --- jitter.c.orig	2008-07-16 23:20:59.0 +0200
    +++ jitter.c	2010-09-15 10:10:06.0 +0200
    @@ -15,6 +15,7 @@
     #include <sys/time.h>
     #include <stdlib.h>
     #include "jitter.h"
    +#include <time.h>
     
     #define NBUF	82
     #define FRAC	4294967296.	/* a billion */
    @@ -33,7 +34,7 @@
     	char *argv[]
     	)
     {
    -	l_fp tr;
    +	l_fp tr, first;
     	int i, j;
     	double dtemp, gtod[NBUF];
     
    @@ -43,11 +44,13 @@
     	for (i = 0; i < NBUF; i ++)
     		gtod[i] = 0;
     
    +	get_systime(&first);
     	/*
     	 * Construct gtod array
     	 */
     	for (i = 0; i < NBUF; i ++) {
     		get_systime(&tr);
    +		tr.l_i -= first.l_i;
     		LFPTOD(tr, gtod[i]);
     	}
Re: [ntp:questions] Allan deviation survey
Miroslav, Better recalibrate your slide rule. On a 2.8 GHz dual-core Pentium running OpenSolaris 10, the measured precision is -21, which works out to 470 ns. That probably just means the system on your machine is not using the rdtsc instruction when reading time. You claim ten times faster. What snake oil are you using for your processor? To check, try running the jitter.c program in the distribution. Dave Miroslav Lichvar wrote: On Tue, Sep 14, 2010 at 02:45:01AM +, David L. Mills wrote: Don't get fooled by the MINSTEP. Precision is defined by the time to read the system clock at the user interface and I have never seen anything less than 500 ns for that, more typically 1000 ns. This is what ntpd prints here (patched to use smaller MINSTEP): ntpd[9357]: proto: precision = 0.086 usec I've seen values as low as 40 ns. The system has to use TSC as clocksource though.
Re: [ntp:questions] Allan deviation survey
Bill, A feedback loop that minimizes time and frequency errors is a type-2 loop whether linear or not. NTP as specified and implemented is not linear either, since it can use old samples in the clock filter algorithm, turns into an FLL at larger poll intervals, and has an automatic poll-adjust mechanism. For the purposes here, only the clock filter is significant. I suspect you use the precision time support in the kernel and the ntp_adjtime() syscall or their equivalents in Linux. This code, which is a descendant of code I wrote for the Alpha, implements a linear, type-2 loop with the same impulse response and time constant as the daemon loop used when the precision time support is not available. Both the daemon and kernel loops, and yours as well, are, crudely put, lowpass filters. What I suspect you did was control the frequency directly and move the corner frequency of the PLL to out-of-the-way values by decreasing the time constant exponent (shift), then doing the lowpass function yourself. That might even give better results than the PLL alone. However, the ntpd measurements are after the clock filter and before the kernel call, while I suspect yours are after the discipline and before the kernel call. The two measurements are not comparable. Look at it this way. At a poll interval of 16 s, the PLL reduces a given time offset by a factor of 256, so in fact a 1-ms offset actually causes a 4-microsecond change in the clock phase. If that were the criterion to judge performance, ntpd would look 256 times better than advertised. I am not here judging whether chrony is better than ntpd or not, just that the performance measurements be comparable and honest. Dave unruh wrote: On 2010-09-14, David L. Mills mi...@udel.edu wrote: Miroslav, I think we are talking right past each other.
Both Chrony and NTP implement the clock discipline using a second-order feedback loop that chrony does not use a second-order feedback loop; it is a high-order, and variable-order, feedback loop. It remembers not just the slope and offset, as does ntp, but also past values of the errors. It is, as far as I can tell, stable (poles in the left half of the complex plane). The variable order also makes it non-linear. The high order and the non-linearity both make it very different from ntp. can minimize error in both time and frequency, although each uses a different loop filter. Chrony uses a least-squares technique; NTP uses a traditional phase-lock loop. The response of these loops is characterized by risetime and overshoot, or alternatively time constant and damping factor. If Chrony were designed to have similar risetime and overshoot characteristics and equivalent time constant, when operated under the same conditions (trace 1) it will perform in a manner similar to NTP. That was and is my claim. ..
Re: [ntp:questions] Allan deviation survey
Miroslav, I think we are talking right past each other. Both Chrony and NTP implement the clock discipline using a second-order feedback loop that can minimize error in both time and frequency, although each uses a different loop filter. Chrony uses a least-squares technique; NTP uses a traditional phase-lock loop. The response of these loops is characterized by risetime and overshoot, or alternatively time constant and damping factor. If Chrony were designed to have similar risetime and overshoot characteristics and equivalent time constant, when operated under the same conditions (trace 1) it will perform in a manner similar to NTP. That was and is my claim. I read your message very carefully and conclude you have done something very similar to what I have. You generated phase noise from an exponential distribution and verified it has slope -0.5 on a variance-time plot, then generated random-walk frequency noise and verified it has slope near zero on a variance-time plot, or used some other equivalent technique to verify the distributions. Using trial and error you found appropriate factors to combine the phase and frequency noise to produce an Allan variance characteristic similar to trace 1. All this is not hard using Matlab, but you might have used something else. The interesting thing to me is how you used that information to develop the claim that Chrony is far better than NTP. To support your claim, you would have to confront both Chrony and NTP with samples drawn from the resulting distribution and compare statistics. The cumulative probability distributions in Chapter 6 of my book were made using the NTP simulator included in the NTP distribution. I assume you have something similar. It would be interesting to repeat the experiment with trace 3 and NTP operating at a poll interval of 16 s. Dave Miroslav Lichvar wrote: On Fri, Sep 10, 2010 at 10:10:08PM +, David L.
Mills wrote: A previous message implied that, once the Allan characteristic was determined, it would show chrony to be better than ntpd. Be advised the default time constant (at 64 s poll interval) was specifically chosen to match trace 1 on the graph mentioned above. Wasn't that rather for the 16 s poll interval? From simulations it seems that the phase noise would have to be 10-30 times higher (or the frequency noise lower, but that's unrealistic) for ntpd to perform well at a 64 s poll interval. In other words, it is in fact optimum for that characteristic and chrony can do no better. Well, it does better. With phase noise and random-walk frequency corresponding to the trace 1 from your graph chrony is about 5 times better than ntpd. With 30 times higher phase noise the difference is only on the order of tens of percent, but it's still better. ___ questions mailing list questions@lists.ntp.org http://lists.ntp.org/listinfo/questions
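The second-order loop argued over above can be illustrated in miniature. The following toy simulation is a hedged sketch, not ntpd's loop filter: the gains kp and ki, the poll interval, and the noise-free measurement are assumptions chosen only to show how a phase (proportional) term plus an integrated frequency term drive both time and frequency error toward zero.

```python
# Toy second-order clock discipline: each measured offset feeds a
# proportional (phase) correction and an integral (frequency) correction,
# the structure Mills describes for NTP's PLL. Gains are illustrative
# assumptions, not ntpd's actual constants.

def discipline(freq_ppm, poll=64.0, kp=0.25, ki=0.01, steps=500):
    """Simulate disciplining a clock with a constant frequency error."""
    phase = 0.005                 # initial offset: 5 ms
    true_freq = freq_ppm * 1e-6   # real frequency error (s/s)
    freq_est = 0.0                # loop's accumulated frequency correction
    history = []
    for _ in range(steps):
        # clock drifts between updates by the residual frequency error
        phase += (true_freq - freq_est) * poll
        measured = phase          # noise-free measurement for clarity
        freq_est += ki * measured / poll   # integral (frequency) term
        phase -= kp * measured             # proportional (phase) term
        history.append(phase)
    return history

offsets = discipline(50.0)  # a clock running 50 ppm fast
```

With these gains both the 5 ms initial offset and the 50 ppm frequency error are absorbed over time; chrony's least-squares filter replaces the fixed gains with a fit over past samples, which is exactly the design difference being argued above.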
Re: [ntp:questions] Allan deviation survey
Miroslav, You don't need a week for that, since the anticipated intercept is on the order of 200 s (trace 3). However, plots such as these are really susceptible to little hidden resonances, so I tend to prefer a long tail and lots and lots of samples. For comparison, the averaging time for the PPS signal in the kernel is 256 s, which is close to the expected Allan intercept for modern systems. Don't get fooled by the MINSTEP. Precision is defined by the time to read the system clock at the user interface and I have never seen anything less than 500 ns for that, more typically 1000 ns. Dave Miroslav Lichvar wrote: On Fri, Sep 10, 2010 at 08:48:58PM +, David L. Mills wrote: Miroslav, I've done this many times with several machines in several places and reported the results in Chapters 12 and 6 in both the first and second editions of my book, as well as my 1995 paper in ACM Trans. Networking. Judah Levine of NIST has done the same thing and reported in IEEE Transactions. He pointed out valuable precautions when making these measurements. You need to disconnect all time disciplines and let the computer clock free-wheel. You need to continue the measurements for at least a week, ten times longer than the largest lag in the plot. You need to display on log-log coordinates and look for straight lines intersecting at what I have called the Allan intercept. I have Matlab programs here that do that and produce graphs like the attached. For the simulation and development purposes I'm interested in, the most important part of the graph is the point at which the line starts to diverge from the -1 slope. With a good PPS signal one day of collecting data should be enough. For those that might want to repeat the experiments, see the attached figure. Trace 1 is from an old Sun SPARC IPC; trace 2 is from a Digital Alpha. Thanks, that's very helpful. Traces 3 and 4 were generated using artificial noise sources with parameters chosen to closely match the measured characteristics.
Phase noise is generated from an exponential distribution, while frequency noise is generated from the integral of a Gaussian distribution, in other words a random walk. Trace 4 is the interesting one. It shows the projected performance with a precision of one nanosecond. The fastest machines I have found have a precision of about 500 ns. Note, precision is the time taken to read the kernel clock and is not the resolution. With current CPUs the precision is well below 100 ns (thus the MINSTEP constant used in ntpd's precision routine is too high). ___ questions mailing list questions@lists.ntp.org http://lists.ntp.org/listinfo/questions
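Precision in Mills' sense (the time taken to read the clock, not the clock's resolution) is easy to estimate. A minimal sketch, using Python's time.perf_counter_ns as a stand-in for the kernel clock read; the sample count is an arbitrary assumption:

```python
# Estimate "precision" in Mills' sense: the smallest observed interval
# between back-to-back clock reads. perf_counter_ns stands in for the
# kernel clock syscall; the sample count is an arbitrary assumption.
import time

def clock_read_precision(samples=100000):
    best = None
    prev = time.perf_counter_ns()
    for _ in range(samples):
        now = time.perf_counter_ns()
        if now != prev and (best is None or now - prev < best):
            best = now - prev
        prev = now
    return best  # nanoseconds; None if the clock never advanced
```

On modern hardware this typically reports tens of nanoseconds, consistent with the thread's point that sub-500 ns precision is now common.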
Re: [ntp:questions] Allan deviation survey
Bill, Please reread the definition of Allan deviation. It is a measure of frequency differences, not time errors. In principle, it could be applied to a virtual machine with virtual timer interrupts, but nobody familiar with the principles would do that and it would serve no useful purpose. The phase noise would be huge due to processor scheduling, etc. This would push the Allan intercept to very large values. The NTP clock discipline does that automatically. So, it makes no sense at all to use Allan analysis in such cases. I'm getting really tired of this discussion, and it serves no useful purpose. And, by the way, mail sent to your alleged mail address is returned to sender as undeliverable. Dave unruh wrote: On 2010-09-11, David L. Mills mi...@udel.edu wrote: David, With due respect, your comment has nothing to do with the issue. Allan deviation is between a quartz crystal oscillator, timer interrupt, interpolation mechanism and a kernel syscall to read the clock. It has nothing whatsoever to do with virtual machines. ?? Allan deviation is a measure of the error of a clock as a function of lag. It does NOT specify the error source. It is not simply defined for only certain machines used in certain ways. Now it may be simple for some systems (like your lightly loaded systems) but that is largely irrelevant. The purpose of doing things like measuring Allan deviation is to understand the noise sources affecting a clock. If those happen to be diurnal temperature variations, then that is what needs to be handled. If it is virtual machines and their clock reading then that is what you need to look at. Errors are errors and understanding them is crucial to designing a decent error mitigation procedure. Closing one's eyes to a dominant error source will simply mean that the error mitigation procedure will suck. Dave David Woolley wrote: David L.
Mills wrote: Bill, Running a precision time server on a busy public machine with a widely varying load is not a good idea and I have no interest in that. Running As indicated by the sort of questions the group is getting recently, it is becoming the norm to run time servers on virtual machines, because that is how businesses now run all their servers. The whole point of virtual machines is that the host is busy and running a varied load! ___ questions mailing list questions@lists.ntp.org http://lists.ntp.org/listinfo/questions
Re: [ntp:questions] Allan deviation survey
David, With due respect, your comment has nothing to do with the issue. Allan deviation is between a quartz crystal oscillator, timer interrupt, interpolation mechanism and a kernel syscall to read the clock. It has nothing whatsoever to do with virtual machines. Dave David Woolley wrote: David L. Mills wrote: Bill, Running a precision time server on a busy public machine with a widely varying load is not a good idea and I have no interest in that. Running As indicated by the sort of questions the group is getting recently, it is becoming the norm to run time servers on virtual machines, because that is how businesses now run all their servers. The whole point of virtual machines is that the host is busy and running a varied load! ___ questions mailing list questions@lists.ntp.org http://lists.ntp.org/listinfo/questions
Re: [ntp:questions] Allan deviation survey
David, I have no idea where you are coming from. At my feet are two GPS/CDMA time servers running embedded Linux systems. I have two more on campus plus two dedicated Unix machines connected to GPS receivers. NIST has about a dozen dedicated time servers running FreeBSD. USNO has about a dozen running HP-UX. The NRC in Canada runs at least two of them, as does an unknown number in Europe, Japan and Australia. There is even one in Antarctica and occasionally one or two in space and one on the seafloor of the Pacific Ocean. Maybe they are abnormal in your view, but they are the ones I am concerned about and they are the ones the Allan deviation analysis is intended for. If you want to run NTP in a virtual machine, the performance will depend on many factors, none of which have to do with Allan deviation. Dave David Woolley wrote: David L. Mills wrote: I beg to differ. All the machines I used are PCs or similar workstations. They really and truly behave according to an exponential As you note in another reply, you seem to use them in a way that is abnormal for most users of NTP, i.e. as dedicated real machines in well controlled environments. ___ questions mailing list questions@lists.ntp.org http://lists.ntp.org/listinfo/questions
Re: [ntp:questions] Allan deviation survey
Miroslav, I've done this many times with several machines in several places and reported the results in Chapters 12 and 6 in both the first and second editions of my book, as well as my 1995 paper in ACM Trans. Networking. Judah Levine of NIST has done the same thing and reported in IEEE Transactions. He pointed out valuable precautions when making these measurements. You need to disconnect all time disciplines and let the computer clock free-wheel. You need to continue the measurements for at least a week, ten times longer than the largest lag in the plot. You need to display on log-log coordinates and look for straight lines intersecting at what I have called the Allan intercept. I have Matlab programs here that do that and produce graphs like the attached. Most of the prior work was done 15 years ago. I strongly suspect the Allan intercept has moved to lower time values due to the fact that modern processors are faster and the interrupt latency is smaller. The current NTP distribution includes an NTP simulator that can be excited with white phase noise and random-walk frequency noise that very nicely models the real noise sources. For those that might want to repeat the experiments, see the attached figure. Trace 1 is from an old Sun SPARC IPC; trace 2 is from a Digital Alpha. Traces 3 and 4 were generated using artificial noise sources with parameters chosen to closely match the measured characteristics. Phase noise is generated from an exponential distribution, while frequency noise is generated from the integral of a Gaussian distribution, in other words a random walk. Trace 4 is the interesting one. It shows the projected performance with a precision of one nanosecond. The fastest machines I have found have a precision of about 500 ns. Note, precision is the time taken to read the kernel clock and is not the resolution.
Dave Miroslav Lichvar wrote: Hi, I'm trying to find out how a typical computer clock oscillator performs in normal conditions without temperature stabilization or a stable CPU load and how far it is from the ideal case which includes only a random-walk frequency noise. A very useful statistic is the Allan deviation. It can be used to compare performance of oscillators, to make a guess of the optimal polling interval, whether enabling the ntpd daemon loop to use FLL will help, how much better chrony will be than ntpd, etc. If you have a PPS device and would be willing to run the machine unsynchronized for a day, I'd like to ask you to measure the Allan deviation and send it to me. I wrote a small ncurses program that can be used with LinuxPPS to capture the PPS samples and create an Allan deviation plot. An overview is displayed and continuously updated while samples are collected. Data which can be used to make an accurate graph (e.g. in gnuplot) are written to the file specified by the -p option when the program is ended or when the 'w' key is pressed. Available at: http://mlichvar.fedorapeople.org/ppsallan-0.1.tar.gz Obligatory screenshot :-) Allan deviation plot (span 11:09:55, skew +0.0) [ASCII plot: Allan deviation falling from about 1e-05 at tau 1e+00 s to about 1e-08 at tau 1e+05 s] w:Write q:Quit r:Reset 1:Skew 0.0 2:Skew +1.0 3:Skew -0.5 To make a good plot: 1. disable everything that could make system clock adjustments 2. start ./ppsallan -p adev.plot /sys/devices/virtual/pps/pps0/assert (change the sys file as appropriate) 3. let it collect the PPS samples for at least one day 4. hit q and send me the adev.plot file Thanks, ___ questions mailing list questions@lists.ntp.org http://lists.ntp.org/listinfo/questions
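The statistic ppsallan plots can be computed in a few lines. Below is a sketch of the standard overlapping second-difference Allan deviation estimator; it is an illustration, not ppsallan's actual code:

```python
# Overlapping Allan deviation of phase samples x (seconds), taken every
# tau0 seconds, at averaging time m*tau0, using the second-difference
# estimator: AVAR(tau) = sum((x[i+2m] - 2*x[i+m] + x[i])**2) / (2*tau^2*N)
import math

def adev(x, m, tau0=1.0):
    d = [x[i + 2*m] - 2*x[i + m] + x[i] for i in range(len(x) - 2*m)]
    avar = sum(v * v for v in d) / (2.0 * (m * tau0) ** 2 * len(d))
    return math.sqrt(avar)
```

On a log-log plot of adev against tau, white phase noise falls with slope -1 (the straight line Mills refers to) while random-walk frequency noise rises; the crossover is the Allan intercept discussed throughout this thread.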
Re: [ntp:questions] Allan deviation survey
Bill, All my measurements were in temperature-controlled environments, such as a campus lab or home office, and the data were collected over one week. The temperature varied less than a degree C. However, I have data from Poul-Henning Kamp for a similar experiment done in summertime Denmark where the environment was not controlled. As expected, it looked somewhat worse than shown by the graph attached to my previous message. The Allan characteristic tends to fizzle out at lags greater than about one-fourth the span of the measurements. Thus, if you collect samples for only one day the maximum lag could not be more than a few hours and the diurnal effects would not be apparent. A previous message implied that, once the Allan characteristic was determined, it would show chrony to be better than ntpd. Be advised the default time constant (at 64 s poll interval) was specifically chosen to match trace 1 on the graph mentioned above. In other words, it is in fact optimum for that characteristic and chrony can do no better. Having said that, modern machines are faster and with less phase noise, although with the same rotten clock oscillator. Thus, I would expect a modern machine to behave something like trace 3 on my plot, where the intercept is more like 200 s than 2000 s. From anecdotal evidence, 16 s is about right, but 8 s is too vulnerable to jitter in the Ethernet NICs and switches. NIC jitter can vary widely, even using the same Ethernet chip, specifically the PCNET chips that do scatter-gather on the fly and coalesce interrupts. I have measured jitter components from 150 ns with an i386 running FreeBSD to over 1 ms with a Sun Ultra running Solaris 10. So, any performance comparisons must take these differences into account.
Dave unruh wrote: On 2010-09-10, Miroslav Lichvar mlich...@redhat.com wrote: [Allan deviation survey announcement, quoted in full earlier in the thread, including the ppsallan screenshot and measurement instructions] Except you know that the typical computer clock is driven mostly by temperature variations, and those are time of day dependent.
People tend to work during the day, thus their computer works during the day and does not at night. I.e., there is a very strong daily cycle in the temperature of the computer. That is NOT within the Allan model, and the Allan deviation and minimum are really irrelevant with this highly non-stochastic noise model. ___ questions mailing list questions@lists.ntp.org http://lists.ntp.org/listinfo/questions
Re: [ntp:questions] Allan deviation survey
David, I beg to differ. All the machines I used are PCs or similar workstations. They really and truly behave according to an exponential distribution with a small mean of a few to a few tens of microseconds. I have done a tedious histogram from which I can pick out the cache replacement, context-switch and timer interrupts. I've used both uniform and exponential noise models with substantially the same results. Since I was looking for the best performers, most of the data was collected on lightly loaded machines; the characteristics with a heavily loaded campus server are much worse. Dave David Woolley wrote: Miroslav Lichvar wrote: A very useful statistic is the Allan deviation. It can be used to compare performance of oscillators, to make a guess of the optimal Surely that is based on a particular model of the phase noise and the big argument about ntpd is that PC's don't follow that model. ___ questions mailing list questions@lists.ntp.org http://lists.ntp.org/listinfo/questions
Re: [ntp:questions] Allan deviation survey
Bill, Running a precision time server on a busy public machine with a widely varying load is not a good idea and I have no interest in that. Running experiments on a dedicated, but very busy, time server such as rackety.udel.edu is much more interesting. As for load-induced temperature variations, even on a busy NTP server, the CPU is loaded to about five percent and the load is constant. As for your concern about diurnal variations for any reason, that's what the clock discipline algorithm is for and has nothing to do with Allan deviation. As for the question about the graph, it's from my book. However, there are examples in the Precision Time Synchronization briefing slides on the NTP project page at www.eecis.udel.edu/~mills/ntp.html. Be advised, most of those briefings are from the 1990s. Dave unruh wrote: On 2010-09-10, David L. Mills mi...@udel.edu wrote: Miroslav, I've done this many times with several machines in several places and reported the results in Chapters 12 and 6 in both the first and second editions of my book, as well as my 1995 paper in ACM Trans. Networking. Judah Levine of NIST has done the same thing and reported in IEEE Transactions. He pointed out valuable precautions when making these measurements. You need to disconnect all time disciplines and let the computer clock free-wheel. You need to continue the measurements for at least a week, ten times longer than the largest lag in the plot. You need to display on log-log coordinates and look for straight lines intersecting at what I have called the Allan intercept. I have Matlab programs here that do that and produce graphs like the attached. What was the load on those computers? Were they running just this time measurement software or were they being used for real work by normal people?
From what I have seen from machines that are in use, they heat up during the day when people use them and cool off at night when they are idle. The amplitude of this component depends on how much work is being done. This does not fit the model of random phase noise/random walk frequency noise. It has a strong periodic component with a period of a day. I see this in most of my systems. Such a periodic noise is not part of the noise model on which the Allan intercept is based. Also this assumes a very particular model of how the measurements are made, of how the time corrections are made and of what is desired from the system. Most of the prior work was done 15 years ago. I strongly suspect the Allan intercept has moved to lower time values due to the fact that modern processors are faster and the interrupt latency is smaller. The current NTP distribution includes an NTP simulator that can be excited with white phase noise and random-walk frequency noise that very nicely models the real noise sources. For those that might want to repeat the experiments, see the attached figure. Trace 1 is from an old Sun SPARC IPC; trace 2 is from a Digital Alpha. Traces 3 and 4 were generated using artificial noise sources with parameters chosen to closely match the measured characteristics. Phase noise is generated from an exponential distribution, while frequency noise is generated from the integral of a Gaussian distribution, in other words a random walk. Trace 4 is the interesting one. It shows the projected performance with a precision of one nanosecond. The fastest machines I have found have a precision of about 500 ns. Note, precision is the time taken to read the kernel clock and is not the resolution. Your graph of course did not make it through to usenet. Have you archived the figure somewhere?
___ questions mailing list questions@lists.ntp.org http://lists.ntp.org/listinfo/questions
Re: [ntp:questions] [Pool] 4000 packets a second?
Dave, Looks like you got impaled on my own bug. The intent was indeed to express the minimum headway, aka guard time (ntp_minpkt), in seconds, not as an exponent. This is so folks could specify 3 s, for instance. This is what the configuration code assumes. On the other hand, the minimum average headway (ntp_minpoll) is specified as an exponent of two, since this is consistent with the poll interval. Both of these items are used elsewhere in the rate management code. However, and it must be my bug, I mistakenly coded it in ntp_monitor.c and ntp_proto.c as an exponent. We can discuss offline how to fix this simple bug. Apparently, I never did get around to documenting the command other than an orphan reference on the rate management page. That fix is in process. Dave Dave Hart wrote: On Wed, Sep 1, 2010 at 03:43 UTC, David L. Mills mi...@udel.edu wrote: Dave, The code I wrote in ntp_monitor.c has apparently been rewritten. Yes, I take what I like to think is credit for rewriting ntp_restrict.c and ntp_monitor.c. I was unhappy with the way code and structure definitions had been duplicated when IPv6 support was added, and wanted to share more of the code between v4 and v6. However, it was intended to be (and I believe it is) functionally equivalent to the original code. The MRU resolution is in seconds. As part of the rewrite the resolution recorded in the MRU list was changed to keep the list correctly ordered at a subsecond level and to enable the iterative retrieval used by ntpq's mrulist command (replacing ntpdc's monlist). However, decisions are still made based on whole-second calculations. The original interpretation of minimum was that any headway less than this would be dropped. Setting that to zero would mean nothing would be dropped. Apparently, the current code is contrary to the original intent and documentation. You seemingly claim the units of discard minimum 0 are seconds and that would mean zero minimum seconds between requests.
The units are log-base-2 seconds, as with minpoll and maxpoll. See the documentation you have so lovingly maintained [1]. This was true with the old code and remains true with the rewritten code. I have tested 4.2.6 builds on the pool server I'm involved with, and the rate limiting behavior appears unchanged by the rewrite, except it is more effective on a busy server where 600 monlist entries was inadequate for rate-limiting to be enforced. I didn't check to see if the probabilistic choice to preempt old entries if the list is full remains. My earlier experience is that this is important for the busiest servers. The code is still there, but it is much less likely to come into play with the 600-entry cap lifted. I also remember puzzling quite a bit over that snippet of code and the documentation for discard monitor describing it. I recall thinking the code did not appear to be doing what the documentation stated. I welcome review of discard monitor behavior in 4.2.7, where it can be made more relevant by limiting the size of the MRU list using mru maxdepth 100 or so. Cheers, Dave Hart [1] http://www.eecis.udel.edu/~mills/ntp/html/accopt.html#discard Dave Dave Hart wrote: On Wed, Sep 1, 2010 at 00:42 UTC, David L. Mills mi...@udel.edu wrote: Did you intend the discard minimum 0? That effectively disables the rate control defense mechanism. you should leave it out. 
That has not been my experience on the pool server I'm involved with: h...@pool1 fgrep discard /etc/ntp.conf # discard minimum 0 (power of 2 like poll interval) is needed discard minimum 0 average 3 h...@psp-fb1 ntpq -c sysstats uptime: 1059862 sysstats reset: 1059862 packets received: 263004216 current version:144454930 older version: 99867648 bad length or format: 18635251 authentication failed: 316799 declined: 3179 restricted: 14857 rate limited: 56970859 KoD responses: 1405175 processed for time: 76220 h...@pool1 ntpdc -c sysstats time since restart: 1059868 time since reset: 1059868 packets received: 263005578 packets processed: 76220 current version:144455895 previous version: 99867947 declined: 3179 access denied: 14857 bad length or format: 18635348 bad authentication: 316800 rate exceeded: 56971000 h...@pool1 A bit over 20% of incoming traffic has exceeded rate limits with discard minimum 0 used (1s minimum). Cheers, Dave Hart ___ questions mailing list questions@lists.ntp.org http://lists.ntp.org/listinfo/questions
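The unit dispute above comes down to a single comparison. A hedged sketch (not ntpd's actual code) of the minimum-headway check when the configured value is interpreted as an exponent of two:

```python
# If "discard minimum" is read as log2 seconds, a setting of 0 still
# enforces a 2**0 = 1 second guard time between packets from one source,
# consistent with the ~20% rate-limited traffic in the stats above.

def rate_exceeded(last_arrival, now, minimum_log2=0):
    """True if a packet arrives sooner than the minimum headway."""
    return (now - last_arrival) < (1 << minimum_log2)
```

Under Mills' stated original intent (plain seconds), the same setting of 0 would disable the check entirely, which is exactly the discrepancy the two Daves are sorting out.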
Re: [ntp:questions] [Pool] 4000 packets a second?
Dave, The code I wrote in ntp_monitor.c has apparently been rewritten. The MRU resolution is in seconds. The original interpretation of minimum was that any headway less than this would be dropped. Setting that to zero would mean nothing would be dropped. Apparently, the current code is contrary to the original intent and documentation. I didn't check to see if the probabilistic choice to preempt old entries if the list is full remains. My earlier experience is that this is important for the busiest servers. Dave Dave Hart wrote: On Wed, Sep 1, 2010 at 00:42 UTC, David L. Mills mi...@udel.edu wrote: Did you intend the discard minimum 0? That effectively disables the rate control defense mechanism. you should leave it out. That has not been my experience on the pool server I'm involved with: h...@pool1 fgrep discard /etc/ntp.conf # discard minimum 0 (power of 2 like poll interval) is needed discard minimum 0 average 3 h...@psp-fb1 ntpq -c sysstats uptime: 1059862 sysstats reset: 1059862 packets received: 263004216 current version:144454930 older version: 99867648 bad length or format: 18635251 authentication failed: 316799 declined: 3179 restricted: 14857 rate limited: 56970859 KoD responses: 1405175 processed for time: 76220 h...@pool1 ntpdc -c sysstats time since restart: 1059868 time since reset: 1059868 packets received: 263005578 packets processed: 76220 current version:144455895 previous version: 99867947 declined: 3179 access denied: 14857 bad length or format: 18635348 bad authentication: 316800 rate exceeded: 56971000 h...@pool1 A bit over 20% of incoming traffic has exceeded rate limits with discard minimum 0 used (1s minimum). Cheers, Dave Hart ___ questions mailing list questions@lists.ntp.org http://lists.ntp.org/listinfo/questions
Re: [ntp:questions] noise and stability values
Jaap, Please see the Event Messages and Status Codes and the ntpq pages in the documentation in your release. Dave Jaap Winius wrote: Hi folks, A few years ago I started graphing my NTP server's performance. The machine ran Debian lenny with ntp v4.2.4. However, after recently upgrading to squeeze, which comes with ntp v4.2.6, I noticed that two system variables included in my graphs -- noise and stability -- are no longer present: neither in the output of ntpq -c rv, nor in its associated manual, ntpq.html, which is part of the ntp-doc package. Well, actually the noise variable wasn't mentioned in the previous version of the manual either (dated July 28, 2005), but it was present in the output from the above command. So, can ntpq no longer be used to examine a system's noise and stability variables, or is it currently necessary to use a different command? Cheers, Jaap ___ questions mailing list questions@lists.ntp.org http://lists.ntp.org/listinfo/questions
Re: [ntp:questions] Q: No state variable in ntpd 4....@1.2089-o?
Ulrich, The state variable was never intended to be externally visible. It has changed in many ways for many reasons. The variables designed for external monitoring are explicitly revealed in the documentation, in particular the ntpq page and the Event Messages and Status Words page. Anything else is and will be grossly misleading. Dave Ulrich Windl wrote: Hi, in a program I wrote I used the existence of the state variable to detect a ntpd v4. To my surprise I found a server that runs NTPv4 without a state variable: ntpq rl assID=0 status=0415 leap_none, sync_uhf_clock, 1 event, event_clock_reset, version=ntpd 4@1.2089-o Mon Feb 8 15:25:35 UTC 2010 (1), processor=i686, system=Linux/2.6.32.7-ppsbeta, leap=00, stratum=1, precision=-20, rootdelay=0.000, rootdisp=0.141, refid=GPS, reftime=d00f9725.13579120 Fri, Aug 13 2010 12:04:21.075, clock=d00f972d.e89a3fc1 Fri, Aug 13 2010 12:04:29.908, peer=46241, tc=4, mintc=3, offset=0.002, frequency=26.025, sys_jitter=0.002, clk_jitter=0.003, clk_wander=0.000 Is that standard, or is it a patched version? While I might like the additional variables, I wonder where the old ones went to... Regards, Ulrich ___ questions mailing list questions@lists.ntp.org http://lists.ntp.org/listinfo/questions
Re: [ntp:questions] Advice for a (LAN-interconnected) WLAN Testbed w/ ARM nodes
Guys, May I suggest you review the definitions of system offset (THETA), root distance (LAMBDA) and system jitter (PSI) in Section 11.2 of RFC 5905. Dave David Woolley wrote: Miroslav Lichvar wrote: But there will be more clock updates. Noise in frequency may go up, but the offset will be the same or better, unless there is network congestion that lasts longer than the clock filter can handle (8 * poll interval). The offset may be better, but offset is not the offset from true time, it is the offset from the sum of the upstream server time and the measurement errors for the last poll. There should be no way of measuring the error from UTC. The fact that you can do so, to some extent, is due to the NTP algorithms being less than ideal. ___ questions mailing list questions@lists.ntp.org http://lists.ntp.org/listinfo/questions
Re: [ntp:questions] Is a packet with stratum 1 allowed to contain a KoD code?
Danny, KoD packets have the leap bits set to 3 (unsynchronized); the stratum is not significant. The reference implementation sets the stratum to 16 for the RATE kiss code. However, packet stratum 16 is mapped to stratum 0 as visible to the monitoring function. Codes like INIT and STEP are used as labels for associations and used only for monitoring purposes. Dave Danny Mayer wrote: On 7/19/2010 8:43 AM, Christer Eriksson wrote: Hi, Is an NTP packet with stratum set to 1 ever allowed to contain a kiss of death code? I got a server (NTPv4) that sends NTP packets with stratum 1 and KoD codes like INIT or STEP and I fail to find a confirmation in any RFC relating to version 4 of NTP whether this is allowed or not. See RFC 5905 Section 7.4 for KoD packets. This has nothing to do with stratum. Why would you assume that stratum 1 is somehow exempt? INIT and STEP are just states of the server and it is instructing the client that it is not yet ready to deliver accurate timestamps. Danny Thanks Best Regards Christer Eriksson ___ questions mailing list questions@lists.ntp.org http://lists.ntp.org/listinfo/questions
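The rule discussed above can be sketched as a packet check. The kiss-code set below is a partial list from RFC 5905, and the function is only an illustration, not the reference implementation:

```python
# KoD detection per the discussion: leap = 3 (unsynchronized) with the
# kiss code carried in the reference ID field; RFC 5905 puts stratum 0
# on the wire for KoD. KISS_CODES is a partial list for illustration.
KISS_CODES = {b"RATE", b"DENY", b"RSTR"}

def is_kod(leap, stratum, refid):
    return leap == 3 and stratum == 0 and refid in KISS_CODES
```

Note Dave's caveat that the stratum field should not be relied on by itself: the leap bits and the kiss code in the refid are the meaningful markers.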
Re: [ntp:questions] Do logs indicate a config problem?
Ulrich, From context I suspect Linux has incorporated the PPS kernel discipline code I wrote in the 1990s. That code has several provisions to groom noisy PPS signals, including a median filter, popcorn spike suppressor and range gate. Apparently, the PPS signal in this case is very noisy and these provisions are doing their job. Dave Ulrich Windl wrote: David Lord sn...@lordynet.org writes: Do logs indicate a config problem? system server1: MSFa, PPSa peer=server2 and remote servers system server2: peer=server1 and remote servers system server3: GPSb, PPSb, server2 and server1 This is total logged over period Feb 7 to Feb 12: ntp.log.server1: 7 Feb 19:22:39 ntpd[26820]: clock SHM(0) event 'clk_noreply' (0x01) 7 Feb 19:22:40 ntpd[26820]: clock PPS(0) event 'clk_noreply' (0x01) 7 Feb 19:27:01 ntpd[26820]: synchronized to SHM(0), stratum 0 7 Feb 19:27:01 ntpd[26820]: kernel time sync status change 2001 7 Feb 19:28:01 ntpd[26820]: synchronized to PPS(0), stratum 0 12 Feb 10:13:44 ntpd[26820]: sendto(xxx.xxx.xx.xx) (fd=27): Host is down 12 Feb 10:20:41 ntpd[26820]: ntpd exiting on signal 15 This is tiny section of log on server1 with similar entries repeating continuously over whole period Feb 12 to Feb 14. There were too many reboots and restarts over period for peerstats to be useful. I swapped kernels attempting to get frequency offset down from just over 50ppm but the autoconfig still gave 50ppm (on NetBSD-3.1 I could set options TIMER_FREQ= to get 10ppm without problem). 
ntp.log.server1: 13 Feb 07:07:48 ntpd[22483]: kernel time sync status change 2901 13 Feb 07:08:53 ntpd[22483]: kernel time sync status change 2101 13 Feb 07:09:59 ntpd[22483]: kernel time sync status change 2301 13 Feb 07:17:28 ntpd[22483]: kernel time sync status change 2501 13 Feb 07:18:32 ntpd[22483]: kernel time sync status change 2301 13 Feb 07:19:38 ntpd[22483]: kernel time sync status change 2901 13 Feb 07:20:41 ntpd[22483]: kernel time sync status change 2301 13 Feb 07:21:47 ntpd[22483]: kernel time sync status change 2101 13 Feb 07:22:50 ntpd[22483]: kernel time sync status change 2501 13 Feb 07:23:54 ntpd[22483]: kernel time sync status change 2101

from timex.h:
0x0001 enable pll updates
0x0002 enable pps freq discipline
0x0004 enable pps time discipline
0x0100 pps signal present
0x0200 pps signal jitter exceeded
0x0400 pps signal wander exceeded
0x0800 pps signal calibration error

Hi! I'd say all 0x800 and 0x400 should never happen unless there is a severe problem. 0x200 should also happen extremely rarely under normal conditions. (based on my NTP/Linux experience several years ago) Regards, Ulrich This is total logged over period Feb 14 to Feb 19: ntp.log.server1 14 Feb 10:49:34 ntpd[169]: clock SHM(0) event 'clk_noreply' (0x01) 14 Feb 10:52:53 ntpd[169]: synchronized to xx.xxx.xx.xxx, stratum 2 14 Feb 10:52:53 ntpd[169]: kernel time sync status change 2001 14 Feb 10:58:05 ntpd[169]: synchronized to SHM(0), stratum 0 15 Feb 10:32:40 ntpd[169]: ntpd exiting on signal 15 15 Feb 10:34:00 ntpd[169]: clock SHM(0) event 'clk_noreply' (0x01) 15 Feb 10:40:27 ntpd[169]: synchronized to xx.xxx.xx.xxx, stratum 2 15 Feb 10:40:27 ntpd[169]: kernel time sync status change 2001 15 Feb 10:41:47 ntpd[169]: synchronized to xx.xxx.xx.xxx, stratum 2 15 Feb 10:42:36 ntpd[169]: synchronized to SHM(0), stratum 0 This is tiny section of log on server3 with similar entries repeating continuously over whole period Feb 7 to Feb 19.
ntp.log.server3: 19 Feb 13:08:24 ntpd[6745]: kernel time sync error \ 0x2307 PLL,PPSFREQ,PPSTIME,PPSSIGNAL,PPSJITTER,NANO,MODE=0x0=PLL,CLK=0x0=A 19 Feb 13:08:39 ntpd[6745]: kernel time sync status change \ 0x2107 PLL,PPSFREQ,PPSTIME,PPSSIGNAL,NANO,MODE=0x0=PLL,CLK=0x0=A 19 Feb 13:13:31 ntpd[6745]: kernel time sync error \ 0x2307 PLL,PPSFREQ,PPSTIME,PPSSIGNAL,PPSJITTER,NANO,MODE=0x0=PLL,CLK=0x0=A 19 Feb 13:13:46 ntpd[6745]: kernel time sync status change \ 0x2107 PLL,PPSFREQ,PPSTIME,PPSSIGNAL,NANO,MODE=0x0=PLL,CLK=0x0=A Otherwise timekeeping is good enough for me:

peerstats: host, period, ident, mean, rms, max

MSF using serial dsr with shm and pps to serial dcd with atom:
server1 20100208-11 127.127.22.0  0.045  1.038  7.100  PPSa
server3 20100208-11 127.127.22.0 -0.043  0.288  4.544  PPSb
server3 20100208-11 server1       0.031  0.416  3.485
server3 20100208-11 server2       0.422  0.577  5.196

MSF config changed to use only serial dcd with shm driver:
server1 20100215-18 127.127.28.0  0.197  0.427  2.723  MSFa
server3 20100215-18 127.127.22.0  0.000 -0.002  0.021  PPSb
server3 20100215-18 server1       1.371  0.412  1.139
server3 20100215-18 server2       1.452  0.504  1.222

Offset from MSF receiver varies significantly with temperature, around 1ms/deg based on offset of server1 seen from
Re: [ntp:questions] monitoring symmetric peers
Kostas, Your symmetric peers have the same upstream system peer, so by rule they are not going to believe each other unless one of them switches to a different upstream source. Even if they have different upstream sources, one of them will not believe the other, since that would create a loop. Dave Kostas Magkos wrote: Hi all, I have two Debian etch ntp servers, running on stock 2.6.18-5-amd64 kernel with ntp 4.2@1.1585-o. The two systems are configured as stratum 2 ntp servers, synchronised with various public stratum 1 servers. They are configured as symmetric active peers. How can I ensure that the two servers are actually symmetric peers? When I list their associations, I can see that each server lists the other one as reject, outlyer or falseticker. I understand that they couldn't be sys.peers, but shouldn't they be candidates for each other, in that way participating in the cluster? Thanks in advance, Kostas Magkos
Re: [ntp:questions] SNTP with 1ms of precision?
David, The basic definition of SNTP has not changed over the years, although rfc5905 does clarify the intended scope and role of primary servers, secondary servers and clients. It was the expected, but not required, model that the Unix adjtime() system call be used if the offset was less than an unspecified value and settimeofday() if greater. There was no intent, either in the earlier SNTP specifications or rfc5905, to specify the SNTP clock discipline algorithm itself. Dave David Woolley wrote: Danny Mayer wrote: On 6/16/2010 5:22 PM, Maarten Wiltink wrote: Marcelo Pimenta marcelopiment...@gmail.com wrote in message news:aanlktilq6m8apeoasibr-o8mhwifqkfv9xyf6mudr...@mail.gmail.com... [...] The NTP algorithm is much more complicated than the SNTP algorithm. The short, short version: there is no SNTP algorithm. SNTP is NTP _without_ the algorithms. Using NTP means continuously adjusting the speed of your clock so it tracks real time as best you can make it, while SNTP is simply asking what time [they think] it is. This is a totally inaccurate statement. See RFC 5905 Section 14. SNTP is That RFC was published after this thread was started! You can't go changing the definitions just for your convenience. Even if it had been published, say six months earlier, the reality is that de facto and historic definitions would still dominate the market. merely a subset of the full NTP protocol. An SNTP server is one with its own refclock and not dependent on any other upstream servers, while an SNTP client is one with a single upstream server and no dependent clients. An SNTP client or an SNTP server should be disciplining the clock in the same way as an NTP server. An SNTP server should continuously adjust the speed of your clock, otherwise it's not SNTP compliant. In reality, most SNTP clients step the clock. A few may use a simple frequency control scheme. Once you go much beyond that, it becomes simpler to use a full NTP client, but maybe configure only one server.
In fact, RFC 1305 doesn't require any specific clock discipline for NTPv3 clients; that is in an appendix, rather than in the main specification. The important clarification about SNTP, ignoring any recent attempt to redefine it, is that it doesn't specify an algorithm, rather than that it requires the use of only a trivial algorithm.
Re: [ntp:questions] Clock and Network Simulator
Miroslav, You have not revealed the result of the experiment I suggested, so I don't know whether the Linux kernel performs as expected with the original design parameters. I think we are done with this discussion. The kernel discipline loop is conservatively designed according to sound engineering practice. This is the practice in the systems engineering course I taught over several years. You are invited to obstruct that practice to your own ends, but not in the public distribution. Dave Miroslav Lichvar wrote: On Wed, Jun 30, 2010 at 10:00:06PM +, David L. Mills wrote: Is there somebody around here that understands feedback control theory? You are doing extreme violence to determine a really simple thing, the discipline loop impulse response. There is a much simpler way. It was a demonstration of what clknetsim can do. You may be able to predict the result, but I'm not. I think being able to verify a theory with simulations is always a good thing. Of particular importance is the damping factor, which is evident from the overshoot. If SHIFT_PLL is radically changed, I would expect the overshoot to be replaced by an exponentially decaying ring characteristic. That's not what I see in tests on real hw and simulations with SHIFT_PLL 2. The change in SHIFT_PLL would result in unstable behavior below 5 (32 s), as well as serious transients if the discipline shifts from the daemon to the kernel and back. All feedback loops become unstable unless the time constant is at least several times the frequency update interval, which in this case is one second. If you do want to explore how stability may be affected, restore the original design and recompile the distribution with NTP_MINPOLL changed from 3 to 1. Is poll 1 SHIFT_PLL 4 really equal to poll 3 SHIFT_PLL 2 in this respect? If you can provide information how to demonstrate the instability with SHIFT_PLL 2 and normal polls, it'll be much easier to convince the kernel folks to change it back to 4.
With polls 3-10 and SHIFT_PLL 2, the only instability I've seen is with very long update intervals (e.g. when the network connection repeatedly goes up and down), the frequency will eventually start jumping between +500 and -500 ppm. But kernel loop with SHIFT_PLL 4 and daemon loop with small poll intervals have the same problem, the threshold is just 4 times higher for them. clknetsim has a pll_clamp option which can be enabled to avoid this instability, it clamps the PLL update interval to tc * (1 << (ntp_shift_pll + 1)), where tc is the time constant in seconds. I will be doing more testing with it and possibly propose to include a similar code in the kernel. As for runtime switching between daemon and kernel discipline, I haven't tried that. I didn't even know it is supported by ntpd. To fix the original problem reported to me, change the frequency gain (only) by the square of 100 divided by the new clock frequency in Hz. For instance, to preserve the loop dynamics with a 1000-Hz clock, divide the frequency gain parameter by 100. In the original nanokernel routine ktime.c at line 60 there is a line SHIFT_PLL * 2 + time_constant. Replacing it by SHIFT_PLL * 20 + time_constant would fix the problem for 1000-Hz clocks. I'm not a kernel developer, but I think this is already fixed. Current kernels can be configured to use a dynamic HZ (CONFIG_NO_HZ aka tickless mode), so the ntp code had to be rewritten to allow such operation. With SHIFT_PLL 4, the response and the overshoot is exactly as you describe it should be. BTW, the effect of changing SHIFT_PLL to 2 on clock accuracy in various network conditions is shown here: http://fedorapeople.org/~mlichvar/clknetsim/test1_exp.png With poll 6 and 10ppb/s wander, the crossover is around 10ms jitter. With larger jitters SHIFT_PLL 2 can be up to 2 times worse (it seems this can't be improved by lowering the poll interval) and with very small jitters it can be about 50 times better.
Re: [ntp:questions] Clock and Network Simulator
Miroslav, Exactly as expected. The overshoot exceeds the design limit of 10 percent by as much as 40 percent. That's exactly what the design is intended to avoid. Dave Miroslav Lichvar wrote: On Wed, Jun 30, 2010 at 10:00:06PM +, David L. Mills wrote: The change in SHIFT_PLL would result in unstable behavior below 5 (32 s), as well as serious transients if the discipline shifts from the daemon to the kernel and back. All feedback loops become unstable unless the time constant is at least several times the frequency update interval, which in this case is one second. If you do want to explore how stability may be affected, restore the original design and recompile the distribution with NTP_MINPOLL changed from 3 to 1. I recompiled ntpd with NTP_MINPOLL 1 and here is a PLL response plot for SHIFT_PLL 4 poll 1 and SHIFT_PLL 2 poll 3: http://fedorapeople.org/~mlichvar/clknetsim/test6.png The initial offset is 0.1 second, after crossing zero offset, they both stay in negative.
Re: [ntp:questions] Clock and Network Simulator
Rob, With due respect, I don't think you know what you are talking about. The original discipline loop described in rfc1305 was refined as described in my 1995 paper and further refined over the years since then. For each and every refinement a series of tests, both in simulation and in situ, were performed with initial offsets up to +-500 PPM and +-100 ms to verify correct behavior with the original parameters. The daemon loop is required to operate over a time constant between 8 s and 36 h, which is an extremely large range as verified by ongoing configurations here. The kernel loop is designed to replicate the daemon loop over a much narrower range between 8 s and 1024 s. The ideal poll interval is between 16 and 64 s, matching the Allan intercept as described in the literature. It was first implemented for the Alpha in 1992 and refined as described in my 1995 paper and not changed since then. Correct behavior must be confirmed by an experiment such as I suggested in a previous message. Sound engineering principles project that the behavior at other time constants will be as I described. If you don't believe those principles, you are not exercising sound engineering judgment. Dr. Dave Rob wrote: Miroslav Lichvar mlich...@redhat.com wrote: On Wed, Jun 30, 2010 at 10:00:06PM +, David L. Mills wrote: Is there somebody around here that understands feedback control theory? You are doing extreme violence to determine a really simple thing, the discipline loop impulse response. There is a much simpler way. It was a demonstration of what clknetsim can do. You may be able to predict the result, but I'm not. I think being able to verify a theory with simulations is always a good thing. Mr Mills is of the school that says the design predicts that the program behaves like that, and the implementation has not changed for 10 years, so it must be correct.
While I normally adhere to the same principles, it happened just a week or two ago that he had to admit that there was a bug in code that he firmly believed was correct and that an observed behaviour could not possibly happen. So I agree with you that it never hurts to test something that theory has already proved to be correct. Maybe the actually released program does not really implement the mechanism that was designed, without the programmer knowing it.
Re: [ntp:questions] Clock and Network Simulator
Rob, Your comment makes no sense. The actual code implemented from my design was tested in Solaris and also in FreeBSD. In both cases the tests confirmed the behavior described previously. I have not tested it in Linux. If it performs other than as I described, the port is broken. Dave Rob wrote: David L. Mills mi...@udel.edu wrote: Rob, With due respect, I don't think you know what you are talking about. Read it again. I don't question your design, I question your claims that the code implements the design which are upheld even when the contrary is shown in observations. Even when your design is perfect and your belief is firm, there can still be bugs.
Re: [ntp:questions] Clock and Network Simulator
Miroslav, Is there somebody around here that understands feedback control theory? You are doing extreme violence to determine a really simple thing, the discipline loop impulse response. There is a much simpler way. Forget everything except the tools that come with the NTP distribution. Find a good, stable server and light up a selected client. Make sure the client kernel is enabled. Set minpoll and maxpoll to 6. Configure the loopstats monitoring function. Run the client until operation stabilizes as determined by the loopstats data. While the daemon is running, use ntptime to set the clock offset to 100 ms. Go away and do something useful for a couple of hours. Inspect the loopstats data. It should start at 100 ms, exponentially decay to zero in about 3000 s, overshoot by about six percent of the initial offset, then slowly decrease to zero over a period of hours. This is the intended nominal behavior for a poll interval (same as time constant) of 6. If you increase (decrease) the poll interval by one, the impulse response will look the same, but at double (half) the time scale. This should hold true for poll intervals from 3 to 10. Of particular importance is the damping factor, which is evident from the overshoot. If SHIFT_PLL is radically changed, I would expect the overshoot to be replaced by an exponentially decaying ring characteristic. With the intended loop constants the behavior should have a single overshoot characteristic in the order of a few percent. From a mathematical and engineering point of view the intended behavior provides the fastest convergence, relative to the chosen time constant, with only nominal overshoot. If the intended effect of the SHIFT_PLL change was to decrease the convergence time, that is the absolute worst thing to do.
The nanokernel allows the time constant to range from zero to ten and carefully scales the state variables to match, although it (and the daemon discipline) starts to become unstable at values below 3, the minimum enforced by the daemon. The change in SHIFT_PLL would result in unstable behavior below 5 (32 s), as well as serious transients if the discipline shifts from the daemon to the kernel and back. All feedback loops become unstable unless the time constant is at least several times the frequency update interval, which in this case is one second. If you do want to explore how stability may be affected, restore the original design and recompile the distribution with NTP_MINPOLL changed from 3 to 1. Now to the issue of multiple tandem server/clients. You don't need to explore the behavior; it can be reliably predicted. Assume the server and all downstream clients are started at the same time. The impulse response of the first downstream client of the original client operating as a server is the convolution of the original impulse response with itself. Roughly speaking, the offset decay is slower, reaching zero in twice the original time or about 6000 s. The behavior of the next downstream client is the convolution of this convolution and the original impulse response and so on. To fix the original problem reported to me, change the frequency gain (only) by the square of 100 divided by the new clock frequency in Hz. For instance, to preserve the loop dynamics with a 1000-Hz clock, divide the frequency gain parameter by 100. In the original nanokernel routine ktime.c at line 60 there is a line SHIFT_PLL * 2 + time_constant. Replacing it by SHIFT_PLL * 20 + time_constant would fix the problem for 1000-Hz clocks. Dave Miroslav Lichvar wrote: On Tue, Jun 29, 2010 at 06:31:01PM +, David L. Mills wrote: From your description your simulator is designed to do something else, but what else is not clear from your messages.
It might help to describe an experiment using your simulator and show what results it produces. It's designed to test NTP implementations, but it uses a more general approach. Ntpdsim tests ntpd as an NTP client with simulated NTP servers in a simulated network. Clknetsim doesn't simulate NTP servers, it simulates only a network to which real NTP clients and servers are connected. The difference is that ntpdsim tests one NTP client and clknetsim tests a whole NTP network. Say we want to test how the Linux SHIFT_PLL change affects an NTP network. There is a chain of seven ntpd daemons configured, all using poll 6. Strata 1, 3, 5, 7 have SHIFT_PLL 2 and strata 2, 4, 6 have SHIFT_PLL 4. Stratum 1 has a clock with zero wander and frequency offset and is using the LOCAL driver, the rest have clocks with 1ppb/s wander. Between all nodes there is a network delay with an exponential distribution and a constant jitter. The simulations are repeated with jitter starting from 10 microseconds and increased to 0.1 second in 28 steps. Each simulation is 400 seconds long and the result is a list of RMS offsets, one for each stratum. After finishing all iterations, we'll make an RMS
Re: [ntp:questions] Clock and Network Simulator
Bill, The ntpdsim simulator uses the real system clock, which is modeled as the Allan variance. However, the server clock is modeled as a random-walk process computed as the integral of a Gaussian process. The network is modeled as an exponential distribution, although provisions have been made to model transients in the form of step changes. Dave unruh wrote: On 2010-06-30, David Woolley da...@ex.djwhome.demon.invalid wrote: Miroslav Lichvar wrote: and is using the LOCAL driver, the rest have clocks with 1ppb/s wander. Between all nodes is network delay with exponential distribution and a constant jitter. The simulations are repeated with Real world NTP networks don't behave like that, and most of the things that annoy people relate to the real world behaviour. Real networks are subject to diurnal and near step changes in frequency. ? It seems you are discussing the behaviour of the clocks, not of the network. (the network has no frequency). His simulator works with real clocks which have exactly the behaviour you describe. Now it may be that the network model is not the best (random long delays in one-way trip time due to network overload). Dr Mills modelling also has this problem.
Re: [ntp:questions] Clock and Network Simulator
Guys, Mercy me! I lied through my incisors. Upon review, the NTP simulator simulates the system interfaces to set, adjust and read the hardware system clock, so the actual hardware is not involved. It's been several years since I used the simulator and it has been updated since then. Dave David L. Mills wrote: Bill, The ntpdsim simulator uses the real system clock, which is modeled as the Allan variance. However, the server clock is modeled as a random-walk process computed as the integral of a Gaussian process. The network is modeled as an exponential distribution, although provisions have been made to model transients in the form of step changes. Dave unruh wrote: On 2010-06-30, David Woolley da...@ex.djwhome.demon.invalid wrote: Miroslav Lichvar wrote: and is using the LOCAL driver, the rest have clocks with 1ppb/s wander. Between all nodes is network delay with exponential distribution and a constant jitter. The simulations are repeated with Real world NTP networks don't behave like that, and most of the things that annoy people relate to the real world behaviour. Real networks are subject to diurnal and near step changes in frequency. ? It seems you are discussing the behaviour of the clocks, not of the network. (the network has no frequency). His simulator works with real clocks which have exactly the behaviour you describe. Now it may be that the network model is not the best (random long delays in one-way trip time due to network overload). Dr Mills modelling also has this problem.
Re: [ntp:questions] Clock and Network Simulator
Miroslav, You don't need to read the code, just the documentation. The ntpdsim is what the literature calls a probabilistic discrete event simulator (PDES). It provides a virtual environment including multiple servers and simulated network behavior within which the actual ntpd code runs. Everything, including packet send and receive, is simulated independent of operating system. It runs hundreds of times faster than the real system. Its intent is to study the synchronization behavior under low-probability and long-baseline conditions. It is not a simple system; it simulates the actual conditions in a real world, multiple server environment and provides the same error reports, statistics reports and synchronization data as the real system. From your description your simulator is designed to do something else, but what else is not clear from your messages. It might help to describe an experiment using your simulator and show what results it produces. Dave Miroslav Lichvar wrote: On Mon, Jun 28, 2010 at 05:35:34PM +, David L. Mills wrote: How is your simulator different than the one included in the NTP software distribution? If I read the code correctly, ntpdsim simulates inside ntpd a minimalistic NTP server (or multiple servers) with configured wander and network delay, and an ideal local clock. clknetsim simulates clocks and a network to which unmodified ntpd daemons are connected. The simulation is transparent for them. With symbol preloading (dynamic linker's LD_PRELOAD variable) they don't use the real system calls like sendto(), recvfrom(), select(), gettimeofday(), ntp_adjtime(), but they use the symbols provided by clknetsim instead. The system calls are passed to the clknetsim server, which synchronizes all events, adjusts the clocks, forwards NTP packets and monitors the real time and frequency offsets of the virtual clocks.
So, for a simulation similar to ntpdsim, at least two ntpd daemons have to be connected to clknetsim, one configured as a server with LOCAL clock (adding a refclock source is on my todo list) and a client which will be the subject of testing. The advantage of this approach is that it allows you to test pretty much anything that can be tested in a real network, a chain of NTP servers (a few such tests are on the clknetsim webpage), server/peer modes, broadcast modes, authentication, compatibility with older versions or different NTP implementations, etc. The disadvantage is that the simulation runs slower.
Re: [ntp:questions] Reference clock driver for /dev/rtc
Miroslav, Changing the value of SHIFT_PLL from 4 to 2 makes the kernel discipline behave quite differently from the daemon discipline. When switching from one to the other, the result will be a serious transient. With a kernel update interval of one second, this value violates the minimum delay requirement of the feedback loop. The design was very carefully done according to sound engineering theory and practice. The design insures optimum rise time relative to the time constant along with a controlled overshoot of about five percent. The change to SHIFT_PLL will compromise the intended behavior. If you make changes like that, be sure someone is around with knowledge of feedback control theory. The frequency gain problem was reported some time ago and I provided a simple fix which reduces the frequency gain by a factor of the square of the ratio of the actual clock frequency to 100 Hz. It has nothing to do with a tickless kernel. The extra pole in the adjtime() emulation was reported to me some time ago and might have since been removed. The result with the extra pole will be underdamping at large poll intervals resulting in oscillatory behavior. It is most serious where the adjtime() pole frequency is close to the discipline poll frequency. If it makes any difference, SGI has the same problem. Linux users should be told the incompatibility with the ntp_gettime() call means the TAI-UTC offset feature provided by the NIST leapseconds file and the Autokey protocol will be unavailable. This is most important for NASA/JPL users when converting to and from UTC and TAI and eventually to TDB for deep space missions. Dave Miroslav Lichvar wrote: On Sat, Jun 26, 2010 at 03:07:33PM +, David L. Mills wrote: Another case in which the engineering model in Linux and NTP are not compatible. Neither is necessarily wrong, just different. The following issues are known to me. 1.
The Linux kernel discipline code adapted from my Alpha code of the 1990s does not account for the frequency gain at other than a 100-Hz clock. With a 1000-Hz clock this results in serious instability. I pointed this out some years ago and it is a trivial modification, but so far as I know it has not been fixed. I think this was fixed a few years ago when the tickless mode was introduced in the kernel. However, current kernels have one compile time constant set differently from the standard implementation, the PLL gain shift (SHIFT_PLL) is 2 instead of 4. BTW, on the clknetsim page I announced in my previous post there are some tests done with both SHIFT_PLL constants. 2. The Linux adjtime mechanism inserts an extra pole in the impulse response, presumably to speed convergence when relatively large adjustments are made. This makes NTP unstable at the larger poll intervals when the kernel discipline is not in use. Both the kernel and daemon discipline loops are carefully designed according to sound engineering principles for optimum response, but the extra pole defeats the design. In which Linux version was this, or how large did the adjustment need to be? I don't remember ever seeing it and the ntpd daemon mode works fine as far as I can tell. 3. The calling sequence for the ntp_gettime() system call is incompatible with current use. As a result, access to the TAI-UTC offset by application programs is not available. This probably won't be fixed as it would break glibc compatibility.
Re: [ntp:questions] Reference clock driver for /dev/rtc
Bill, Another case in which the engineering model in Linux and NTP are not compatible. Neither is necessarily wrong, just different. The following issues are known to me. 1. The Linux kernel discipline code adapted from my Alpha code of the 1990s does not account for the frequency gain at other than a 100-Hz clock. With a 1000-Hz clock this results in serious instability. I pointed this out some years ago and it is a trivial modification, but so far as I know it has not been fixed. 2. The Linux adjtime mechanism inserts an extra pole in the impulse response, presumably to speed convergence when relatively large adjustments are made. This makes NTP unstable at the larger poll intervals when the kernel discipline is not in use. Both the kernel and daemon discipline loops are carefully designed according to sound engineering principles for optimum response, but the extra pole defeats the design. 3. The calling sequence for the ntp_gettime() system call is incompatible with current use. As a result, access to the TAI-UTC offset by application programs is not available. 4. As in the current instance, management of the RTC and system clock is incompatible. This issue should be reviewed in the context of the various models, whether the kernel or daemon discipline is in use and whether the system is awake or sleeping. There are probably others and they probably could be resolved to insure a consistent model between Linux and other operating systems. Dave unruh wrote: On 2010-06-23, David L. Mills mi...@udel.edu wrote: Pavel, Linux has many, many times broken the NTP model compatible with other systems such as Solaris and FreeBSD, among others. I have no trouble with that as long as whatever modifications are required in NTP to make the RTC driver work remain proprietary to Linux and never leak to other systems. I have no idea what the Linux 11-minute process is about, but it probably conflicts with the NTP 1-hour RTC alignment. 
Linux, depending on the setting of a flag in the adjtimex setup, sets the rtc from the system time once every 11 min. This is a disaster if you have a procedure to discipline the rtc (e.g. hwclock, or chrony), and the synchronization flag must be kept unset to prevent this behaviour. On most systems the rtc is used to set the clock when the computer is down, i.e. the rtc in those cases CANNOT be disciplined. All you can do is determine the offset and drift rate of the rtc to make its use as accurate as possible when the system comes up again. It is very hard to determine the drift rate of a clock that keeps getting reset. To then use the rtc mechanism in a VM seems to me to be overloading the mechanism, making it hard to do anything reasonable with it. If your driver requires the Linux model, whatever modifications are required in the base code (#ifdefs) will not be supported here and may conflict with future developments. On the other hand, it could be, for example, that the RTC provides a 1-second interrupt similar to the PPS signal now. On that assumption the base code might have a feature that supports the RTC in much the same way the NMEA driver does now. That would be a generic solution and nicely fit NTP. Dave Krejci, Pavel wrote: Hello Dave, From: David L. Mills [mailto:mi...@udel.edu] Sent: Wednesday, June 23, 2010 4:42 AM To: Krejci, Pavel Cc: questions@lists.ntp.org Subject: Re: [ntp:questions] Reference clock driver for /dev/rtc Pavel, It's not as simple as that. Normally, ntpd uses settimeofday() once per hour to set the system clock, which has the side effect of setting the RTC. Obviously, you don't want that. If the RTC refclock is enabled, that has to be disabled, so some kind of interlock must be devised. This can be a tricky business and have unintended consequences if something or other fails. The interlocks with the PPS signal come to mind.
Do you mean the 11-minute mode in Linux, where the system time is periodically written to the rtc at 11-minute intervals? This is triggered by the synch status (the time_status variable in the kernel). I've solved this by periodically resetting this synch status in my refclock driver. You are correct in that the RTC has in general far better temperature compensation than either the system clock or the TSC/PCC counter. However, its resolution is generally far worse. Even so, the lowpass character of the clock discipline masks this, so actual delivered system time should be quite good. Chapter 15 of my new book due in September contains an extensive discussion of these issues. Theoretically the worst RTC resolution is 1 second, but usually it offers an update IRQ whenever the seconds counter changes. And this gives good resolution for my system. Attached is the /dev/rtc peerstats from my qemu guest
Re: [ntp:questions] Reference clock driver for /dev/rtc
Kalle, Calling settimeofday() is completely transparent to the kernel and ntpd state variables, including the UNSYNC bit; however, the actions in Linux might violate this design. Setting the RTC is a byproduct of settimeofday(), but in general setting the time to the current time is a no-op, at least to within 800 microseconds on a 1986 SPARC IPC running SunOS 4, but much less in modern times. See the comments at about line 228 in ntp_util.c; note the code is enabled by the DOSYNCTODR define. If there is a more generic way to set the RTC over all or most operating systems, it should be reconsidered. The Linux folks are invited to contribute #ifdefs as necessary. As it is, the current code goes back to SunOS circa 1986. Dave Kalle Pokki wrote: On Sat, Jun 26, 2010 at 11:48, David Woolley da...@ex.djwhome.demon.invalid wrote: I think you missed Dave Mills' point that ntpd does this every 60 minutes, so will also break mechanisms for compensating for RTC drift whilst the processor is powered down. I don't understand. Settimeofday() isn't about updating the RTC. It updates the system clock. Why would ntpd call that regularly and cause unnecessary jumps in system time? Calling settimeofday() also clears all NTP state variables inside the kernel and sets the UNSYNC bit. ___ questions mailing list questions@lists.ntp.org http://lists.ntp.org/listinfo/questions
Re: [ntp:questions] Reference clock driver for /dev/rtc
Pavel, Linux has many, many times broken the NTP model compatible with other systems such as Solaris and FreeBSD, among others. I have no trouble with that as long as whatever modifications are required in NTP to make the RTC driver work remain proprietary to Linux and never leak to other systems. I have no idea what the Linux 11-minute process is about, but it probably conflicts with the NTP 1-hour RTC alignment. If your driver requires the Linux model, whatever modifications are required in the base code (#ifdefs) will not be supported here and may conflict with future developments. On the other hand, it could be, for example, that the RTC provides a 1-second interrupt similar to the PPS signal now. On that assumption the base code might have a feature that supports the RTC in much the same way the NMEA driver does now. That would be a generic solution and nicely fit NTP. Dave Krejci, Pavel wrote: Hello Dave, From: David L. Mills [mailto:mi...@udel.edu] Sent: Wednesday, June 23, 2010 4:42 AM To: Krejci, Pavel Cc: questions@lists.ntp.org Subject: Re: [ntp:questions] Reference clock driver for /dev/rtc Pavel, It's not as simple as that. Normally, ntpd uses settimeofday() once per hour to set the system clock, which has the side effect of setting the RTC. Obviously, you don't want that. If the RTC refclock is enabled, that has to be disabled, so some kind of interlock must be devised. This can be a tricky business and have unintended consequences if something or other fails. The interlocks with the PPS signal come to mind. Do you mean the 11-minute mode in Linux, where the system time is periodically written to the rtc at 11-minute intervals? This is triggered by the synch status (the time_status variable in the kernel). I've solved this by periodically resetting this synch status in my refclock driver. You are correct in that the RTC has in general far better temperature compensation than either the system clock or the TSC/PCC counter.
However, its resolution is generally far worse. Even so, the lowpass character of the clock discipline masks this, so actual delivered system time should be quite good. Chapter 15 of my new book due in September contains an extensive discussion of these issues. Theoretically the worst RTC resolution is 1 second, but usually it offers an update IRQ whenever the seconds counter changes. And this gives good resolution for my system. Attached is the /dev/rtc peerstats from my qemu guest system. The clock offset stays under 1 millisecond, which is enough for our purposes. I will check your book when published. Regards Pavel Dave Krejci, Pavel wrote: Hi, well, then, do you find it useful? How should I proceed to contribute into the ntpd project? Thanks Pavel -Original Message- From: unruh [mailto:un...@wormhole.physics.ubc.ca] Sent: Thursday, June 17, 2010 11:48 AM To: questions@lists.ntp.org Subject: Re: [ntp:questions] Reference clock driver for /dev/rtc On 2010-06-16, Krejci, Pavel pavel.kre...@siemens-enterprise.com wrote: Hi, -Original Message- From: unruh [mailto:un...@wormhole.physics.ubc.ca] Sent: Tuesday, June 15, 2010 7:15 PM To: questions@lists.ntp.org Subject: Re: [ntp:questions] Reference clock driver for /dev/rtc On 2010-06-15, Krejci, Pavel pavel.kre...@siemens-enterprise.com wrote: Hi, since I cannot use kvm-clock as the clock source (older guest kernel) I am using pit as the clock source. According to my tests this is the most stable clock source among tsc and hpet, but it can still drift. Since qemu keeps the /dev/rtc perfectly synchronized with the Host's system time, it is a good time source for the ntpd on the guest. The host itself is then synchronized via NTP with the external time server. I don't know any other way to read the system time from the Host; please offer one if you have it. I do not understand. If your driver can read the rtc, it can read the system clock instead. I am not reading the Host's /dev/rtc.
I am reading the Guest's /dev/rtc, which is synchronized with the Host's system clock. OK, if that is the way your virtual system works (i.e. it delivers the system time via /dev/rtc), then so be it. I would say it is terrible, since it uses a predefined item (the rtc) to deliver something totally different (the system time of the underlying host). The rtc has numerous idiosyncrasies, not least being that it delivers only times with one-second precision. It also delivers an interrupt on one-second boundaries, and is written with a displacement of .5 sec (i.e. if you write the time x to it, that time refers to the time of the rtc .5 sec in the future.) I