Re: [ntp:questions] ntpdate.c unsafe buffer write
Hello Ulrich,

Ulrich Windl wrote:
> Is that too simple?
> msyslog(LOG_ERR, "authentication key %lu unknown", (unsigned long)sys_authkey);

Oooh, of course that's the best fix. I had already prepared a patch, but I must have been blind not to see the obvious solution. Since the patch has not yet been committed, I'll update it once more.

Thanks,
Martin
-- 
Martin Burnicki
Meinberg Funkuhren
Bad Pyrmont
Germany
___
questions mailing list
questions@lists.ntp.org
https://lists.ntp.org/mailman/listinfo/questions
Re: [ntp:questions] ntpdate.c unsafe buffer write
Ulrich Windl wrote:
> Unruh [EMAIL PROTECTED] writes:
>> In ntpdate.c around line 542 (4.2.4p4) is the sequence
>>
>>     if (!authistrusted(sys_authkey)) {
>>         char buf[10];
>>
>>         (void) sprintf(buf, "%lu", (unsigned long)sys_authkey);
>>         msyslog(LOG_ERR, "authentication key %s unknown", buf);
>
> Is that too simple?
>     msyslog(LOG_ERR, "authentication key %lu unknown", (unsigned long)sys_authkey);

In this case it's the right solution. There's no need for an intermediate buffer here.

Danny
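A minimal C sketch of the fix Ulrich proposes and Danny endorses: let the formatter convert the key itself with %lu, so there is no scratch buffer whose size has to be guessed. ntpd's msyslog() is not reproduced here; the hypothetical format_unknown_key() formats into a caller-supplied buffer with snprintf() so the result can be inspected.

```c
#include <stdio.h>

/* Hypothetical stand-in for the msyslog() call: builds the log text
 * in a caller-supplied buffer so it can be inspected. */
static int format_unknown_key(char *out, size_t outlen, unsigned long key)
{
    /* %lu converts the key directly; no intermediate buffer is needed,
     * and snprintf() never writes more than outlen bytes (NUL included).
     * The return value is the full length the message would have. */
    return snprintf(out, outlen, "authentication key %lu unknown", key);
}
```

With key 4294967295 the message is 37 characters long. If the output buffer is too small, snprintf() stores what fits plus the NUL and still returns the full length, which is how a caller can detect truncation.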
Re: [ntp:questions] ntpdate.c unsafe buffer write
Unruh [EMAIL PROTECTED] writes:
> In ntpdate.c around line 542 (4.2.4p4) is the sequence
>
>     if (!authistrusted(sys_authkey)) {
>         char buf[10];
>
>         (void) sprintf(buf, "%lu", (unsigned long)sys_authkey);
>         msyslog(LOG_ERR, "authentication key %s unknown", buf);

Is that too simple?
    msyslog(LOG_ERR, "authentication key %lu unknown", (unsigned long)sys_authkey);

>         exit(1);
>     }
>
> Since unsigned long does not have the same length on all machines, and the string with its trailing zero can certainly be longer than 10 bytes, that buf is ripe for a buffer overflow. It should be something like
>     char buf[(sizeof(unsigned long)*12/5+2)];
> And/or the sprintf should be an snprintf.
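If an intermediate buffer is kept, Unruh's size expression can be combined with snprintf(). A sketch, using a hypothetical helper ulong_to_dec(): the expression yields 11 bytes for a 4-byte unsigned long and 21 bytes for an 8-byte one, exactly the 10 or 20 digits of the largest value plus the terminating NUL. Note that even the 32-bit maximum, 4294967295, needs 11 bytes, so the original char buf[10] could overflow on any platform once a key reached ten digits.

```c
#include <limits.h>
#include <stdio.h>

/* Unruh's bound: about 2.4 decimal digits per byte of unsigned long,
 * plus slack for rounding and the terminating NUL.
 * 4-byte long: 4*12/5+2 = 11; 8-byte long: 8*12/5+2 = 21.
 * ULONG_MAX (limits.h) is the longest value it has to hold. */
#define ULONG_DEC_BUFSZ (sizeof(unsigned long) * 12 / 5 + 2)

/* Hypothetical helper: writes key in decimal into buf.
 * Returns 0 if the whole number fit, -1 if it was truncated. */
static int ulong_to_dec(char *buf, size_t buflen, unsigned long key)
{
    int n = snprintf(buf, buflen, "%lu", key);
    return (n >= 0 && (size_t)n < buflen) ? 0 : -1;
}
```

Using snprintf() as well as the computed size means a wrong estimate would truncate the number instead of overwriting the stack.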
Re: [ntp:questions] ntpdate.c unsafe buffer write
David L. Mills [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]
>>> Is there also a random backoff after an increase of the polling interval?
>>
>> No. However, there is a small dither of a few percent at all poll intervals to resist self-synchronization.
>
> The natural behavior of a bunch of oscillators near the same frequency is to become one giant phase-locked oscillator. Adding a bit of random fuzz at each poll turns each oscillator into a mini random walk, which breaks up that tendency. The fuzz is not a lot, like 10 percent.

Do you mean the dither alluded to above is cumulative?

I was never much good with statistics and remember only that the expectation of the offset after N steps in a random walk is sqrt(N) times the average step size. Not a clue what the distribution might be. Intuitively, I would be aiming for uniform, and randomly adding half a polling interval of delay when doubling it seemed to me like it would do that.

Groetjes,
Maarten Wiltink
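Maarten's recollection can be checked with a small simulation. The sketch assumes (this is not taken from ntpd) that each poll time is shifted by an independent random amount uniform in [-a, +a] and that the shifts accumulate. Under that assumption the mean squared shift after N polls is N·a²/3, so the RMS shift grows like sqrt(N) times the RMS step a/sqrt(3), and by the central limit theorem the distribution tends towards a normal one rather than a uniform one. The generator is Marsaglia's xorshift64, used only to make runs reproducible.

```c
#include <stdint.h>

/* Marsaglia xorshift64: small deterministic PRNG for reproducible runs. */
static uint64_t rng_state = 88172645463325252ULL;

static double uniform01(void)
{
    rng_state ^= rng_state << 13;
    rng_state ^= rng_state >> 7;
    rng_state ^= rng_state << 17;
    return (double)(rng_state >> 11) / 9007199254740992.0; /* 2^53 */
}

/* Each of nsteps polls is moved by a random amount in [-a, a); the
 * moves accumulate. Returns the mean squared accumulated shift over
 * ntrials independent clients (its square root is the RMS shift). */
static double mean_square_drift(int nsteps, double a, int ntrials)
{
    double sumsq = 0.0;

    for (int t = 0; t < ntrials; t++) {
        double phase = 0.0;
        for (int i = 0; i < nsteps; i++)
            phase += (2.0 * uniform01() - 1.0) * a;
        sumsq += phase * phase;
    }
    return sumsq / ntrials;
}
```

With a = 10% of a 64 s poll interval and 100 polls, the result should come out close to 100 · 6.4² / 3, i.e. an RMS shift of about 37 s.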
Re: [ntp:questions] ntpdate.c unsafe buffer write
Hal Murray wrote:
> 20 ms sounds like a typical DSL link. That 1 ms accuracy goes out the window if you are doing a big download. (At least on my DSL link.)

People don't generally do big downloads during the boot of a machine! On a big network, the most likely reason for rebooting a timeserver in prime time is a power failure, in which case the whole network is likely to be down.

At worst, using ntpdate -b, you only get something like the current ntpd behaviour. Typically you end up within a millisecond.
Re: [ntp:questions] ntpdate.c unsafe buffer write
David L. Mills [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]
>>> No, there is no random delay at startup. Each association starts one second after the previous one. The random backoff occurs only after a step.
>>
>> Is there also a random backoff after an increase of the polling interval?
>
> No. However, there is a small dither of a few percent at all poll intervals to resist self-synchronization.

Wouldn't that be a nice feature to add? If it's currently polling a server at, say, second 100 (reckoned externally) of 256, go to either 100 _or 356_ of 512.

I understand that there are already some random waits in the client code, and Internet servers are well protected by random noise. But for large numbers of clients in a uniform environment that were all started at about the same time, is there any way they tend to naturally disperse across the final 1024 s polling interval?

Groetjes,
Maarten Wiltink
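Maarten's proposal, written as a hypothetical C helper (per Dave's answer, ntpd does not do this): at each doubling of the poll interval the client either keeps its phase or moves it one old interval later, chosen at random. After k doublings, clients that started together can occupy 2^k distinct phases in the final interval, e.g. 16 phases 64 s apart after going from 64 s to 1024 s.

```c
/* Hypothetical helper for Maarten's proposal (not ntpd behaviour).
 * phase is the offset of the client's poll within the old interval,
 * 0 <= phase < old_interval; coin is a random bit. The result lies in
 * [0, 2 * old_interval), i.e. within the new, doubled interval. */
static long next_phase_after_doubling(long phase, long old_interval, int coin)
{
    return coin ? phase + old_interval : phase;
}

/* Apply `doublings` successive doublings, taking coin i from bit i of
 * `bits` (in practice these would be random bits). */
static long phase_after_doublings(long phase, long interval, int doublings,
                                  unsigned bits)
{
    for (int i = 0; i < doublings; i++) {
        phase = next_phase_after_doubling(phase, interval,
                                          (int)((bits >> i) & 1u));
        interval *= 2;
    }
    return phase;
}
```

Maarten's own example corresponds to next_phase_after_doubling(100, 256, coin), giving 100 or 356.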
Re: [ntp:questions] ntpdate.c unsafe buffer write
Maarten,

The natural behavior of a bunch of oscillators near the same frequency is to become one giant phase-locked oscillator. Adding a bit of random fuzz at each poll turns each oscillator into a mini random walk, which breaks up that tendency. The fuzz is not a lot, like 10 percent.

Dave

Maarten Wiltink wrote:
> [...]
> Wouldn't that be a nice feature to add? If it's currently polling a server at, say, second 100 (reckoned externally) of 256, go to either 100 _or 356_ of 512.
>
> I understand that there are already some random waits in the client code, and Internet servers are well protected by random noise. But for large numbers of clients in a uniform environment that were all started at about the same time, is there any way they tend to naturally disperse across the final 1024 s polling interval?
>
> Groetjes,
> Maarten Wiltink
Re: [ntp:questions] ntpdate.c unsafe buffer write
Dave,

David L. Mills wrote:
> Serge,
>
> The behavior after a step is deliberate. The iburst volley after a step is delayed a random fraction of the poll interval to avoid implosion at a busy server. An additional delay may be enforced to avoid violating the headway restrictions. This is not to protect your applications; it is to protect the server.

Is it really necessary to insert a random delay after a step? There has already been a random delay immediately after startup, before the client has decided that a step was required.

So even if a bunch of clients started up at the same time and had to step, they wouldn't step at the same time, and thus wouldn't send the next iburst volley at the same time anyway.

Martin
-- 
Martin Burnicki
Meinberg Funkuhren
Bad Pyrmont
Germany
Re: [ntp:questions] ntpdate.c unsafe buffer write
Harlan Stenn wrote:
> For the general use case (LAN and/or WAN and/or jerky path) ntpd behaves well.

We are talking about typical rather than general cases. In the typical case, 1 ms after 1 second is a reasonable expectation on a WAN, especially when a site is restarting, e.g. after a power failure, or when a home system is switched on, because the network load is then low.
Re: [ntp:questions] ntpdate.c unsafe buffer write
Dave,

David L. Mills wrote:
> [...]
> The ntpd time constant is purposely set somewhat large at 2000 s, which results in a risetime of about 3000 s. This is a compromise between stable acquisition for herky-jerky Internet paths and speed of convergence for LANs. For typical Internet paths the Allan intercept is about 2000 s. For fast LANs with nanosecond clock resolution, the Allan intercept can be as low as 250 s, which is what the kernel PPS loop is designed for.

Wouldn't it make sense to adjust the time constant depending on the time since startup and/or the quality of the responses from the upstream servers?

I.e., the time constant could be smaller after startup to get a fast initial correction, and then increase depending on the requirements.

The packet delay and jitter should also give a good indication of whether an upstream server is on the local LAN or on the Internet.

So the settings that make ntpd work well in the worst cases could be used when those cases apply, and the limitations could be relaxed in the other cases.

Martin
-- 
Martin Burnicki
Meinberg Funkuhren
Bad Pyrmont
Germany
Re: [ntp:questions] ntpdate.c unsafe buffer write
Unruh wrote:
> David Woolley [EMAIL PROTECTED] writes:
>> Harlan Stenn wrote:
>>> For the general use case (LAN and/or WAN and/or jerky path) ntpd behaves well.
>>
>> We are talking typical rather than general cases. In the typical case, 1ms after 1 second is a reasonable expectation on a WAN, especially when a site is restarting, e.g. after a power failure, or a home system switching on, and, therefore, the network load is low.
>
> I think you got your units mixed up. Computer A goes down for three days due to an avalanche cutting the power. It takes a lot longer than one second to resync that computer. A few hours is more like it.

I was talking about what people could expect from software that behaved well; I think you are describing what ntpd actually does here.

My point was that ntpd's ability to tolerate really rotten links is irrelevant for most users, who are only about 20 ms away from their ISP's time server and can expect to read it to about 1 ms accuracy.

> If you mean "I shut down ntp and restart it immediately", then 1 ms in 1 minute is reasonable (you cannot have made enough measurements in 1 s to even know if it is accurate).
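Why "20 ms away" and "about 1 ms" are compatible but not guaranteed follows from the NTP on-wire calculation (RFC 5905). A C sketch: from the four timestamps the client computes offset and round-trip delay, and any asymmetry between the outbound and return paths biases the offset by half the difference between the two legs, at most half the delay. With a 20 ms round trip the worst case is therefore about ±10 ms; reaching 1 ms needs the two legs to be within about 2 ms of each other, which is also why a big download that loads one direction of a DSL link (Hal's point) hurts.

```c
/* NTP on-wire calculation (RFC 5905). Timestamps in seconds:
 * t1 client transmit, t2 server receive, t3 server transmit,
 * t4 client receive. t1/t4 come from the client clock,
 * t2/t3 from the server clock. */
struct ntp_sample {
    double offset;  /* estimated server clock minus client clock */
    double delay;   /* round-trip network delay */
};

static struct ntp_sample ntp_offset_delay(double t1, double t2,
                                          double t3, double t4)
{
    struct ntp_sample s;

    s.offset = ((t2 - t1) + (t3 - t4)) / 2.0;
    s.delay = (t4 - t1) - (t3 - t2);
    /* If the two path legs differ, the true offset can lie anywhere in
     * offset +/- delay/2; the estimate is exact only for equal legs. */
    return s;
}
```

In a made-up example the server is 5 ms ahead, the request takes 15 ms and the reply 5 ms (timestamps 0, 0.020, 0.021, 0.021): the computed offset is 10 ms and the delay 20 ms, so the 5 ms error is half the 10 ms asymmetry and stays within the ±delay/2 bound.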
Re: [ntp:questions] ntpdate.c unsafe buffer write
Bill,

In article [EMAIL PROTECTED], Unruh [EMAIL PROTECTED] writes:

Unruh> Why not? The power comes on on your computer farm of 2000 machines,
Unruh> all the clients are the same type so the bootup sequence is
Unruh> identical. They all start ntp at the same time, to within a second or
Unruh> so. And suddenly the poor server is flooded.

2000 packets hitting ntpd all at once should not be a problem for an ntp server in that environment.

-- 
Harlan Stenn [EMAIL PROTECTED]
http://ntpforum.isc.org - be a member!
Re: [ntp:questions] ntpdate.c unsafe buffer write
Maarten,

No. However, there is a small dither of a few percent at all poll intervals to resist self-synchronization.

Dave

Maarten Wiltink wrote:
> David L. Mills [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]
>> No, there is no random delay at startup. Each association starts one second after the previous one. The random backoff occurs only after a step.
>
> Is there also a random backoff after an increase of the polling interval?
>
> Groetjes,
> Maarten Wiltink
Re: [ntp:questions] ntpdate.c unsafe buffer write
Unruh,

Depends who the clients are. An ntpd client will not come up in the first second, although successive associations will come up at 2-s intervals. I would not expect 2000 clients to come up at exactly the same time anyway, due to ordinary latency variations in the boot process. I would be more worried about a broadcast server coming up with 2000 corporate broadcast clients, but in that case the initial client response is randomized over the poll interval.

Dave

Unruh wrote:
> Martin Burnicki [EMAIL PROTECTED] writes:
>> [...]
>> Is it really necessary to insert a random delay after a step? There has already been a random delay immediately after startup, before the client has decided that a step was required.
>>
>> So even if a bunch of clients started up at the same time and had to step, they wouldn't step at the same time, and thus wouldn't send the next iburst volley at the same time anyway.
>
> Why not? The power comes on on your computer farm of 2000 machines, all the clients are the same type so the bootup sequence is identical. They all start ntp at the same time, to within a second or so. And suddenly the poor server is flooded.
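The load question can be explored with a toy model. The sketch assumes (hypothetically) that each of N clients sends its first request after a uniformly random whole number of seconds in [0, spread), and counts the requests in the busiest second. With spread 1, all 2000 requests land in the same second. Spreading them over a 64 s poll interval leaves at least 32 in the busiest second (2000/64 is 31.25) and, with a reasonable generator, not many more. The xorshift64 generator only makes runs reproducible.

```c
#include <stdint.h>
#include <string.h>

#define MAX_SPREAD 1024

/* Marsaglia xorshift64, used only for reproducible runs. */
static uint64_t xs_state = 88172645463325252ULL;

static uint64_t xs_next(void)
{
    xs_state ^= xs_state << 13;
    xs_state ^= xs_state >> 7;
    xs_state ^= xs_state << 17;
    return xs_state;
}

/* Toy model: each of nclients sends its first request after a random
 * whole number of seconds in [0, spread). Returns the number of
 * requests in the busiest second, or -1 if spread is out of range. */
static int peak_load(int nclients, int spread)
{
    int per_second[MAX_SPREAD];
    int peak = 0;

    if (spread < 1 || spread > MAX_SPREAD)
        return -1;
    memset(per_second, 0, sizeof per_second);
    for (int i = 0; i < nclients; i++) {
        int slot = (int)(xs_next() % (uint64_t)spread);
        if (++per_second[slot] > peak)
            peak = per_second[slot];
    }
    return peak;
}
```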
Re: [ntp:questions] ntpdate.c unsafe buffer write
> I was talking about what people could expect from software that behaved well; I think you are describing what ntpd actually does here.
>
> My point was that ntpd's ability to tolerate really rotten links is irrelevant for most users, who are only about 20ms away from their ISP's time server, and can expect to read it to about 1ms accuracy.

20 ms sounds like a typical DSL link. That 1 ms accuracy goes out the window if you are doing a big download. (At least on my DSL link.)

-- 
These are my opinions, not necessarily my employer's. I hate spam.
Re: [ntp:questions] ntpdate.c unsafe buffer write
Hello David,

On Tuesday, February 12, 2008 at 15:04:45 +, David L. Mills wrote:

> Serge Bets wrote:
>> ntpd -q can make use of the driftfile to set the kernel frequency
>
> That was removed as a significant security hazard.

Why exactly?

> If you want to explicitly set the frequency, use ntptime -f.

Sure: I can preset the frequency by hand. But not setting the frequency is not a sensible option: it's required for good ntpd -q operation, otherwise slews don't end on the zero.

> This scheme is designed so you can run ntpd until the kernel frequency has stabilized, then kill ntpd and run an SNTP client at regular intervals.

There is no obstacle to that. When ntpd quits, the kernel runs on the last computed frequency. Without a driftfile, ntpd -q runs on top of this frequency. With a driftfile, ntpd -q could even run on top of this frequency after a reboot.

The obstacle, if one existed, would be a frequency reset to zero at startup, as done by loop_config(LOOP_DRIFTINIT). Fortunately this doesn't happen in mode_ntpdate (the -q flag).

Serge.
-- 
Serge point Bets arobase laposte point net
Re: [ntp:questions] ntpdate.c unsafe buffer write
Martin Burnicki wrote:
> Wouldn't it make sense to adjust the time constant depending on the time after startup, and/or the quality of the responses from the upstream servers?

It does get adjusted. We are talking about the minimum value!
Re: [ntp:questions] ntpdate.c unsafe buffer write
Martin Burnicki [EMAIL PROTECTED] writes:
> Dave,
>
> David L. Mills wrote:
>> Serge,
>>
>> The behavior after a step is deliberate. The iburst volley after a step is delayed a random fraction of the poll interval to avoid implosion at a busy server. An additional delay may be enforced to avoid violating the headway restrictions. This is not to protect your applications; it is to protect the server.
>
> Is it really necessary to insert a random delay after a step? There has already been a random delay immediately after startup, before the client has decided that a step was required.
>
> So even if a bunch of clients started up at the same time and had to step, they wouldn't step at the same time, and thus wouldn't send the next iburst volley at the same time anyway.

Why not? The power comes on on your computer farm of 2000 machines, all the clients are the same type so the bootup sequence is identical. They all start ntp at the same time, to within a second or so. And suddenly the poor server is flooded.
Re: [ntp:questions] ntpdate.c unsafe buffer write
Serge,

That was removed as a significant security hazard. If you want to explicitly set the frequency, use ntptime -f. This scheme is designed so you can run ntpd until the kernel frequency has stabilized, then kill ntpd and run an SNTP client at regular intervals. I surely wouldn't recommend that, but folks have their biases.

Dave

Serge Bets wrote:
> Hello David,
>
> On Tuesday, February 12, 2008 at 2:43:06 +, David L. Mills wrote:
>> Just for clarity, neither the daemon nor kernel frequency is adjusted in any way with ntpd -q.
>
> ntpd -q can make use of the driftfile to set the kernel frequency:
> [...]
Re: [ntp:questions] ntpdate.c unsafe buffer write
Hello David,

On Tuesday, February 12, 2008 at 3:03:37 +, David L. Mills wrote:

> The behavior after a step is deliberate. The iburst volley after a step is delayed a random fraction of the poll interval to avoid implosion at a busy server.

Ah OK, I understand now! Thank you.

This makes me wonder: when ntpd -gq is started, does a step and quits, and the ntpd daemon is then started immediately, this sequence sends 2 iburst volleys within around 14 seconds, without the said random delay in between. Is that not rude to servers?

The slew_sleeping script should be modified to sleep some time after a step. How long? 16 to 64 s?

| BEGIN { srand() }   # seed rand() from the clock; unseeded, it repeats every run
| /^ntpd: time set .*s$/ {
|         sleep = 16 + int(rand() * 49)   # 16..64 s
|         success = 1
| }

Serge.
-- 
Serge point Bets arobase laposte point net
Re: [ntp:questions] ntpdate.c unsafe buffer write
Hello David,

On Tuesday, February 12, 2008 at 2:43:06 +, David L. Mills wrote:

> Just for clarity, neither the daemon nor kernel frequency is adjusted in any way with ntpd -q.

ntpd -q can make use of the driftfile to set the kernel frequency:

| # ntpd -q -d | grep frequency
| addto_syslog: frequency initialized -1.752 PPM from /var/lib/ntp/ntp.drift

Note that this is simply necessary for the correct operation of ntpd -q. If the kernel frequency were not initialised, a slew would not end right on the zero offset.

Serge.
-- 
Serge point Bets arobase laposte point net
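A rough estimate of what Serge means, assuming the kernel frequency correction were left at zero during the slew: the clock then keeps drifting by its frequency error for the whole slew, so the slew ends off zero by about frequency error × slew duration. Taking the -1.752 PPM from Serge's log and the 256 s that Dave gives for slewing 128 ms at 500 PPM, that is roughly 0.45 ms. A C sketch with a hypothetical helper:

```c
/* Offset (s) accumulated while a slew runs for `duration` seconds on a
 * clock whose frequency error of `freq_ppm` parts per million is left
 * uncorrected. The sign follows freq_ppm. Hypothetical helper. */
static double residual_offset(double freq_ppm, double duration)
{
    return freq_ppm * 1e-6 * duration;
}
```

For example, residual_offset(-1.752, 256.0) is about -0.000449 s. The error grows with the length of the slew, so larger initial offsets make the missing frequency correction more visible.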
Re: [ntp:questions] ntpdate.c unsafe buffer write
Hello Harlan,

On Tuesday, February 12, 2008 at 3:22:59 +, Harlan Stenn wrote:

> Interesting script - thanks. Would you like me to put it in the distribution?

Excellent idea! As a contrib example, or installed in bindir along with ntp-wait?

> what benefit do we get by using the script to delay things while we are waiting for a slew to finish while in state 4?

I don't understand the reasoning preceding your questions, but I can reply to this one at face value: if we didn't delay after ntpd -q, the daemon would be started while the slew is still in progress, for some minutes. This is not a sane situation: the daemon gathers biased data.

Serge.
-- 
Serge point Bets arobase laposte point net
Re: [ntp:questions] ntpdate.c unsafe buffer write
Martin,

No, there is no random delay at startup. Each association starts one second after the previous one. The random backoff occurs only after a step. The fact that the initial backoff is small means that the client population is crudely synchronized and could well gang up after a step.

There have been incremental changes over the years to randomize and even out the load for busy servers, some of which made folks sad. Originally, the code did randomize at startup, but folks hated that since it resulted in an initial delay averaging 30 s. Now the backoff occurs only when stepped, which is by every measure a rare event. I don't think a step has ever happened with our production servers, except after extensive downtime for repair.

You can easily modify the peer_clear() routine in ntp_proto.c to remove the backoff. If you do, you will not be able to use any server running the reference implementation, as the rate violation will result in a dropped packet and, if configured, a KoD.

Dave

Martin Burnicki wrote:
> Dave,
>
> David L. Mills wrote:
>> Serge,
>>
>> The behavior after a step is deliberate. The iburst volley after a step is delayed a random fraction of the poll interval to avoid implosion at a busy server. An additional delay may be enforced to avoid violating the headway restrictions. This is not to protect your applications; it is to protect the server.
>
> Is it really necessary to insert a random delay after a step? There has already been a random delay immediately after startup, before the client has decided that a step was required.
>
> So even if a bunch of clients started up at the same time and had to step, they wouldn't step at the same time, and thus wouldn't send the next iburst volley at the same time anyway.
>
> Martin
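Dave's description of when the first poll happens can be summarized as a small model. In this hypothetical C helper (not ntpd's peer_clear()), association i polls i × spacing seconds after startup (Dave gives one second here and 2 s elsewhere in the thread, so spacing is a parameter), while after a step the iburst volley waits a random fraction of the poll interval, supplied by the caller as rnd in [0, 1).

```c
/* Hypothetical model of the first-poll schedule Dave describes; not
 * ntpd's peer_clear(). assoc_index counts associations from 0, spacing
 * is the gap between their start-ups, rnd is a random value in [0, 1). */
static double first_poll_delay(int assoc_index, int stepped,
                               double spacing, double poll_interval,
                               double rnd)
{
    if (stepped)
        return rnd * poll_interval;   /* random backoff after a step */
    return assoc_index * spacing;     /* staggered start, no randomness */
}
```

The model makes Dave's point visible: without a step, identical machines booted together compute identical delays, so only the post-step branch spreads them out.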
Re: [ntp:questions] ntpdate.c unsafe buffer write
ntpdate -b steps the clock. That's the function under discussion, the one that's used nearly universally in boot sequences.

-Tom

David L. Mills wrote:
> Guys,
>
> There seems to be some misinformation here. Both ntpdate and ntpd -q set the offset with adjtime() and then exit. After that, stock Unix adjtime() slews the clock at a rate of 500 PPM, which indeed could take 256 s for an initial offset of 128 ms. A prudent response would be to measure the initial offset and compute the time to wait. The ntp-wait script waits for ntpd to enter state 4, which could happen with an initial offset as high as 128 ms.
>
> The ntpd time constant is purposely set somewhat large at 2000 s, which results in a risetime of about 3000 s. This is a compromise between stable acquisition for herky-jerky Internet paths and speed of convergence for LANs. For typical Internet paths the Allan intercept is about 2000 s. For fast LANs with nanosecond clock resolution, the Allan intercept can be as low as 250 s, which is what the kernel PPS loop is designed for. Both the daemon and kernel loops are engineered so that the time constant is directly proportional to the poll interval and the risetime scales directly. If the poll exponent is set to the minimum 4 (16 s), the risetime is 500 s. While not admitted in public, the latest snapshot can set the poll interval to 3 (8 s), so the risetime is 250 s. This works just fine on a LAN, but I would never do this on an outside circuit.
>
> Dave
>
> Unruh wrote:
>> Harlan Stenn [EMAIL PROTECTED] writes:
>> [...]
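Dave's statement that the risetime scales directly with the poll interval gives a one-line rule. This sketch anchors it at his figure of 500 s for poll exponent 4 (16 s), which also reproduces his 250 s for exponent 3 (8 s); values for other exponents are an extrapolation under that proportionality assumption, not numbers from the thread.

```c
/* Risetime (s) assumed proportional to the poll interval 2^poll_exponent s,
 * anchored at Dave's 500 s for exponent 4 (16 s). */
static double risetime_seconds(int poll_exponent)
{
    double poll = (double)(1L << poll_exponent);  /* poll interval in s */
    return 500.0 * poll / 16.0;
}
```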
Re: [ntp:questions] ntpdate.c unsafe buffer write
Harlan Stenn [EMAIL PROTECTED] writes:
> In article [EMAIL PROTECTED], David Woolley [EMAIL PROTECTED] writes:
>
> David> Harlan Stenn wrote:
> David>> Why would ntpd be exiting during a warm start?
>
> David> Because we are discussing using it with the -q option. If you just
> David> use -g, it will take a lot longer to converge within a few
> David> milliseconds, as it will not slew at the maximum rate. If you use
> David> -q, you need to force a step if you want fast convergence.
>
> I still maintain you are barking up the wrong tree.
>
> In terms of the behavior model of ntp, state 4 is as good as it gets.

You are in the right ballpark. And as has been commented on numerous times, ntp in state 4 is very slow to converge to the best possible time control. This was a deliberate design decision, as I understand it, so that in steady state the time is averaged over a large number of samples (not helped by the fact that 85% of samples are thrown away), to reduce the statistical error in the clock control. Note that at poll 7 the number of actual samples averaged over in the time scale of the ntp feedback loop is only about 3, so the statistical averaging, even with such a long time constant, is not very good.

> If you want something else, something you consider better than state 4, please make a case for this and lobby for it.

I think many people have lobbied for faster response. In the discussion of the chrony/ntp comparison, chrony is much faster to correct errors, and, at least on a local network, better at disciplining the clock as well (in part, I think, because on such a minimal round-trip network the frequency fluctuations dominate over the offset measurement errors, i.e., the Allan intercept is much lower than the assumed 1500 s in that kind of situation; also, the drift on real systems is not well modeled by 1/f noise).

So what I think the point is: using ntpdate, one can rapidly bring the clock within a few ms of the correct time, rather than waiting for the feedback loop to finally eliminate that last 128 ms of offset.

> For the case I'm describing, the startup script sequence is to fire up 'ntpd -g' early. If there are applications that need the system clock to be on-track stable (even if a wiggle is being dealt with), that's 'state 4', and running 'ntp-wait' before starting those services is, to the best of my knowledge, all that is required.
>
> David> State 4 means within 128ms and using the normal control loop, which
> David> has a time constant of around an hour.
>
> OK, and so what? Is State 4 insufficient for your needs, or are you just splitting hairs?
>
> David> For a cold start, it won't reach state 4 for a further 900 seconds
> David> after first priming the clock filter.
>
> If the system has a good drift file, I disagree with you.
>
> David> The definition of cold start is that there is no drift file.
>
> OK, now I know what the definitions are. I don't recall offhand the expected time to hit state 4 without a drift file.
>
> 1) This should not be the ordinary case
> 2) How does this have any bearing on the ntpdate -b discussion?
>
> And what is the big deal with using different config files? The config file mechanism has include capability, so it is trivial to maintain a common 'base' configuration with customizations for separate start/run phases.
>
> David> You are now talking about using -q. The difficulty is that people
> David> have enough trouble getting the run phase config file right.
>
> I mention it because it's what you seem to be insisting on talking about. I was providing a way to address the problems you describe with the (IMO bad) mechanism (-q) under discussion. But the bigger problem is: why are you insisting on separate start/run phases? This has not been best practice for quite a while, and if you insist on using this method you will run into the exact problems you are describing.
>
> No, the best advice is to understand why you have been using ntpdate -b so far and to understand the pros/cons of the new choices.
>
> David> We are talking about system managers and package creators, neither of
> David> which have much time to study the details.
>
> Blessed are those who get what they deserve. These are the same folks who must get ssh configurations and various other network configurations working. If the stock things work well enough for folks, great. If folks have suggestions for improvements I welcome them. If folks want something different I invite them to make a case for it. Please remember the scope and complexity of the problem case. It's much easier to have a simpler solution if one is prepared to ignore certain problems. Another case in point is Maildir.
>
> If somebody is in the situation where they know they have specific requirements for time, they are in the situation where they have enough altitude on their requirements to know the costs/benefits of what is involved in getting there.

Well, I disagree. The sign of a good piece of software is that it does what it needs to do
Re: [ntp:questions] ntpdate.c unsafe buffer write
Hello David,

On Monday, February 11, 2008 at 19:03:36 +, David L. Mills wrote:

> Both ntpdate and ntpd -q set the offset with adjtime() and then exit. After that, stock Unix adjtime() slews the clock at rate 500 PPM, which indeed could take 256 s for an initial offset of 128 ms.

And on some systems, adjtime() calls adjtimex(ADJ_OFFSET_SINGLESHOT) to do the job. Note that ntpdate does not stop slewing when it reaches the zero offset, but deliberately overshoots by 50%. That's why ntpdate -b (forced step) or ntpd -q (exact slew until zero) are so much better.

> A prudent response would be to measure the initial offset and compute the time to wait.

Thanks! That's exactly what the slew_sleeping script does:

    #!/bin/sh

    slew_sleeping() {
        awk '
            { print }
            /^ntpd: time slew .*s$/ {
                sleep = $4 * 2000
                if (sleep < 0)
                    sleep = -sleep
                sleep = int(sleep + 0.99)    # rounded up
                success = 1
            }
            /^ntpd: time set .*s$/ {
                success = 1
            }
            END {
                if (sleep) {
                    printf "wait for the end of time slew, sleeping %d seconds\n", sleep
                    system("sleep " sleep)
                }
                exit success
            }
        '
    }

    # echo "ntpd: time slew -0.003000s" | slew_sleeping; exit

    while ntpd -gq | slew_sleeping; do :; done; ntpd

Serge.
-- 
Serge point Bets arobase laposte point net
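The arithmetic behind the script: at the 500 PPM slew rate Dave quotes, each second of offset takes 1 / 500e-6 = 2000 s to remove, hence $4 * 2000, and 128 ms takes 256 s. The same calculation in C, as a hypothetical helper that rounds the way the script does, with int(x + 0.99):

```c
/* Hypothetical helper: seconds to wait for a fixed-rate adjtime() slew
 * of `offset` seconds at `rate_ppm` parts per million, rounded like the
 * script's int(x + 0.99), i.e. up unless the fraction is under 0.01 s. */
static long slew_wait_seconds(double offset, double rate_ppm)
{
    double secs_per_sec = 1e6 / rate_ppm;   /* 2000 at 500 PPM */
    double secs = (offset < 0 ? -offset : offset) * secs_per_sec;
    return (long)(secs + 0.99);
}
```

For the script's own test line, slew_wait_seconds(-0.003, 500.0) gives 6 s, and slew_wait_seconds(0.128, 500.0) gives Dave's 256 s. This only holds for a plain fixed-rate adjtime(); systems that slew differently need a different rate.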
Re: [ntp:questions] ntpdate.c unsafe buffer write
In article [EMAIL PROTECTED], David L. Mills [EMAIL PROTECTED] writes: David Serge, I didn't believe what you said until I checked the code and it David does increase the correction by 50%, but limits the overshoot to 50 David ms. Why in the would it overshoot at all? Dave, this is one of the many problems with ntpdate and why we wanted to kill it off since nobody was maintaining it. As I recall, somebody said "For folks who want to run ntpdate out of cron, we should do a bit of overshoot so we can home in on the right adjustment." As I recall, the thought was "If we start off with no overshoot and make our adjustment, the next time we run ntpdate we will make the same adjustment that we just did. So let's overshoot so next time we will be a bit closer." I didn't say the idea makes a lot of sense, but hey. -- Harlan Stenn [EMAIL PROTECTED] http://ntpforum.isc.org - be a member!
Re: [ntp:questions] ntpdate.c unsafe buffer write
In article [EMAIL PROTECTED], Tom Smith [EMAIL PROTECTED] writes: Tom ntpdate -b steps the clock. That's the function under discussion. The Tom one that's used nearly universally in boot sequences. Then change the boot sequence. Using ntpdate -b to step the clock before starting ntpd is no longer best common practice, and it hasn't been for a decent hunk of time. -- Harlan Stenn [EMAIL PROTECTED] http://ntpforum.isc.org - be a member!
Re: [ntp:questions] ntpdate.c unsafe buffer write
Serge, I didn't believe what you said until I checked the code and it does increase the correction by 50%, but limits the overshoot to 50 ms. Why in the would it overshoot at all? Dave Serge Bets wrote: Hello David, On Monday, February 11, 2008 at 19:03:36 +, David L. Mills wrote: Both ntpdate and ntpd -q set the offset with adjtime() and then exit. After that, stock Unix adjtime() slews the clock at rate 500 PPM, which indeed could take 256 s for an initial offset of 128 ms. And on some systems, adjtime() calls adjtimex(ADJ_OFFSET_SINGLESHOT) to do the job. Note that ntpdate does not stop slewing when it reaches the zero offset, but voluntarily overshoots by 50%. That's why ntpdate -b (forced step) or ntpd -q (exact slew until zero) are so much better.
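To make the overshoot concrete, here is a sketch of the adjustment as Dave describes it (correction increased by 50%, with the extra amount capped at 50 ms). It models only the behaviour described in this thread; it is not ntpdate's actual source, and the constant and function names are made up for illustration.

```c
#include <assert.h>

#define OVERSHOOT_FRACTION 0.5    /* "increase the correction by 50%" */
#define MAX_OVERSHOOT_S    0.050  /* "limits the overshoot to 50 ms" */

/* Return the slew amount (seconds) handed to adjtime() for a measured
 * offset, including the deliberate overshoot described in the thread. */
double overshoot_correction(double offset_s)
{
    double magnitude = offset_s < 0 ? -offset_s : offset_s;
    double extra = magnitude * OVERSHOOT_FRACTION;

    if (extra > MAX_OVERSHOOT_S)
        extra = MAX_OVERSHOOT_S;
    magnitude += extra;
    /* Keep the sign of the original offset. */
    return offset_s < 0 ? -magnitude : magnitude;
}
```

So a 20 ms offset becomes a 30 ms slew, while a 200 ms offset becomes 250 ms rather than 300 ms. The idea Harlan recalls was to home in over repeated cron runs; for a single boot-time run, the overshoot is simply left behind as residual offset.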
Re: [ntp:questions] ntpdate.c unsafe buffer write
Serge, Interesting script - thanks. Would you like me to put it in the distribution? This brings up an underlying question. It is possible for events to unfold in a way that while in state 4, events will be such that there will be future wiggles. Some of them may even take us out of state 4. Agreed? If so, what benefit do we get by using the script to delay things while we are waiting for a slew to finish while in state 4? What difference does it make if the system in question is a client as opposed to a server? -- Harlan Stenn [EMAIL PROTECTED] http://ntpforum.isc.org - be a member!
Re: [ntp:questions] ntpdate.c unsafe buffer write
Guys, Just for clarity, neither the daemon nor kernel frequency is adjusted in any way with ntpd -q. Serge Bets wrote: On Monday, February 11, 2008 at 7:38:53 +, David Woolley wrote: Serge Bets wrote: the kind of slew (singleshot) initiated by ntpd -q comes *above* the usual frequency correction That assumes the use of the kernel time discipline Indeed: I sometimes forget this can lack or be disabled, sorry. Serge.
Re: [ntp:questions] ntpdate.c unsafe buffer write
Hello Harlan, On Monday, February 11, 2008 at 0:33:36 +, Harlan Stenn wrote: 1) what are you trying to accomplish by the sequence: ntpd -gq ; wait a bit; ntpd that you do not get with: ntpd -g ; ntp-wait

Let's compare. I used a several-weeks-old ntp-dev 4.2.5p95, because the latest p113 seems to behave strangely (clearing STA_UNSYNC long before the clock is really synced). The driftfile exists and has a correct value. ntp.conf declares one reachable LAN server with iburst. There are 4 main cases: initial phase offset bigger than 128 ms or below it, and your startup method or my method.

-1) Initial phase offset over 128 ms, ntp-wait method:

| 0:00 # ntpd -g; ntp-wait; time_critical_apps
| 0:07 time step == the clock is very near 0 offset (less than a ms),
|      stratum 16, refid .STEP., state 4
| 0:12 ntp-wait terminates == time critical apps can be started
| 1:20 *synchronized, stratum x == ntpd starts serving good time

Timings are in minutes:seconds, relative to startup. Note this last *sync stage, when ntpd takes a non-16 stratum, comes at a seemingly random moment, sometimes as early as 0:40.

-2) Initial phase offset over 128 ms, my slew_sleeping script:

| 0:00 # ntpd -gq | slew_sleeping; ntpd
| 0:07 time step, no sleep == near 0 offset (time critical apps can be
|      started)
| 0:14 *synchronized == ntpd starts serving good time

-3) Initial phase offset below 128 ms, ntp-wait method (worst case):

| 0:00 # ntpd -g; ntp-wait; time_critical_apps
| 0:07 *synchronized == ntpd starts serving time, a still bad time,
|      because the 128 ms offset is not yet slewed
| 0:12 ntp-wait terminates == time critical apps are started
| 7:30 offset crosses the zero line for the first time, and begins an
|      excursion on the other side (up to maybe 40 ms). The initial good
|      frequency has been modified to slew the phase offset, and is now
|      wildly bad (by perhaps 50 or 70 ppm). The chaos begins, and will
|      stabilize some hours later.
-4) Initial phase offset below 128 ms, slew_sleeping script:

| 0:00 ntpd -gq | slew_sleeping; ntpd
| 0:07 begin max rate slew, sleeping all the necessary time (max 256
|      seconds)
| 4:23 wake-up == near 0 offset, time critical apps can be started
| 4:30 *synchronized == ntpd starts serving good time

Summary: The ntp-wait method is good at protecting apps against steps, but not against large offsets (tens of ms, up to a hundred). The daemon itself can start serving such less-than-good time. Startup takes more time to reach a near 0 offset, and can wreck the frequency.

The ntpd -gq method also avoids steps to applications, if all works well. But it's not a 100% protection, and that is not the goal. It also protects apps against large offsets, never serves bad time, and never squashes the driftfile. It makes for a much saner daemon startup, more stable, where the chaos situation described above (case #3) doesn't happen. It starts up faster, except in the cases where ntp-wait cheats by tolerating not-yet-good offsets.

If necessary, slew_sleeping and ntp-wait can be combined, for a better level of protection. What about the following, which should survive even a server temporarily unavailable during startup, further delaying time critical apps:

| # ntpd -gq | slew_sleeping; ntpd -g; ntp-wait; time_critical_apps

One could also imagine looping ntpd -gq until it works, then sleep, then ntpd and time_critical_apps (the slew_sleeping script has to be modified to return a success code):

| # while ntpd -gq | slew_sleeping; do :; done; ntpd; time_critical_apps

Serge. -- Serge point Bets arobase laposte point net
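Serge's remark about p113 clearing STA_UNSYNC early concerns the kernel clock status word. On Linux with glibc (and similar systems that provide <sys/timex.h>), that word can be read without privileges by calling ntp_adjtime() with modes set to 0. The sketch below, whose helper names are hypothetical, shows that "the kernel no longer says unsynchronized" is a single bit, separate from the kernel's own error estimate; a clear flag does not by itself mean the offset is small.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/timex.h>

/* True when the kernel status word does not carry STA_UNSYNC. This only
 * says the flag is clear; it says nothing about how small the offset is. */
int status_claims_sync(int status)
{
    return (status & STA_UNSYNC) == 0;
}

/* Read-only query of the kernel clock: modes = 0 requests no changes, so
 * no privileges are needed. Returns 0 on success, -1 on failure. */
int read_kernel_clock(int *status, long *esterror_us)
{
    struct timex tx;

    assert(status != NULL && esterror_us != NULL);
    memset(&tx, 0, sizeof tx);
    tx.modes = 0;
    if (ntp_adjtime(&tx) == -1)
        return -1;
    *status = tx.status;
    *esterror_us = tx.esterror;   /* kernel's estimated error, microseconds */
    return 0;
}
```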
Re: [ntp:questions] ntpdate.c unsafe buffer write
Tom, With tinker step .001 in the configuration file, ntpd -q will step the clock, unless the residual offset is less than .001 s. This is probably more complexity than you can stand. Just keep using ntpdate and be happy. Dave Tom Smith wrote: ntpdate -b steps the clock. That's the function under discussion. The one that's used nearly universally in boot sequences. -Tom David L. Mills wrote: Guys, There seems to be some misinformation here. Both ntpdate and ntpd -q set the offset with adjtime() and then exit. After that, stock Unix adjtime() slews the clock at rate 500 PPM, which indeed could take 256 s for an initial offset of 128 ms. A prudent response would be to measure the initial offset and compute the time to wait. The ntp-wait script waits for ntpd to enter state 4, which could happen with an initial offset as high as 128 ms. The ntpd time constant is purposely set somewhat large at 2000 s, which results in a risetime of about 3000 s. This is a compromise for stable acquisition for herky-jerky Internet paths and speed of convergence for LANs. For typical Internet paths the Allan intercept is about 2000 s. For fast LANs with nanosecond clock resolution, the Allan intercept can be as low as 250s, which is what the kernel PPS loop is designed for. Both the daemon and kernel loops are engineered so that the time constant is directly proportional to the poll interval and the risetime scales directly. If the poll exponent is set to the minimum 4 (16 s) the risetime is 500 s. While not admitted in public, the latest snapshot can set the poll interval to 3 (8 s), so the risetime is 250 s. This works just fine on a LAN, but I would never do this on an outside circuit. Dave snip
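Dave's scaling rule can be turned into quick arithmetic: he gives a risetime of 500 s at poll exponent 4 (16 s) and 250 s at poll exponent 3 (8 s), and says the risetime scales directly with the poll interval. The sketch below extrapolates linearly from those two figures only; it is a rule of thumb, not ntpd's loop-filter code, and the function name is illustrative.

```c
#include <assert.h>

/* Risetime in seconds for a poll exponent, scaled linearly from the
 * 500 s at poll exponent 4 (16 s) figure quoted in the thread.
 * Equivalent to 500 * 2^(poll_exp - 4), i.e. about 31 poll intervals. */
long risetime_s(int poll_exp)
{
    assert(poll_exp >= 3 && poll_exp <= 17);
    return (500L << poll_exp) / 16;
}
```

By this rule poll exponent 6 (64 s) gives 2000 s and poll exponent 10 (1024 s) gives 32000 s. Dave's separate figure of a 2000 s time constant with about 3000 s risetime does not sit exactly on this line, so treat the results as order-of-magnitude numbers.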
Re: [ntp:questions] ntpdate.c unsafe buffer write
Harlan Stenn [EMAIL PROTECTED] writes: In article [EMAIL PROTECTED], Unruh [EMAIL PROTECTED] writes: Harlan In terms of the behavior model of ntp, state 4 is as good as it Harlan gets. You are in the right ballpark. Unruh And as has been commented on numerous times, ntp in state 4 is very Unruh slow to converge to the best possible time control. This was a Unruh deliberate design decision, as I understand it, so that in steady Unruh state the time is averaged over a large number of samples (not Unruh helped by the fact that 85% of samples are thrown away), to reduce Unruh the statistical error in the clock control. Note that at poll 7 the Unruh number of actual samples averaged over in the time scale of the ntp Unruh feedback loop is only about 3, so the statistical averaging, even with Unruh such a long time constant, is not very good. OK, and please don't take this the wrong way, but So What? For the general use case (LAN and/or WAN and/or jerky path) ntpd behaves well. The question is not does it work well, but does it work the best it can. As Dave recently replied, if you are only interested in LAN performance there are tweaks that can be made that will improve the performance. No, I am interested in the behaviour in general. That is why I am trying to test it on an ADSL link as well. The current setup will Just Work regardless of the network environment. This, to me, is the sign of a good piece of software. If somebody with extra knowledge can make a local optimization based on tighter specs, great. The question is whether or not it can be made better in general. I suspect that if Dave can be shown that whatever chrony is doing will behave in the wider space that NTP covers, he will be OK making changes to use those algorithms. There may even be a way to choose different algorithms based on the behavior in evidence. But you seem to be talking about how improvements can be made and I thought this original thread was about how there was a *problem*.
This original thread was about how ntpdate had too small a buffer for a given use-- a very easily fixable problem. It then wandered to whether ntpdate should be axed or not. And then as an aside I mentioned further experiments I was doing on the comparison of chrony and ntp-- mentioned because one of the reasons ntpdate is used is the slow convergence of ntp to the true time. I mentioned that chrony has much faster convergence. So as sometimes happens in threads, they wander, and in this case I was at least partially responsible for part of the wander. If you want something else, something you consider better than state 4, please make a case for this and lobby for it. Unruh I think many people have lobbied for faster response. In the Unruh discussion of the chrony/ntp comparison, chrony is much faster to Unruh correct errors, and at least on a local network, better at Unruh disciplining the clock as well (in part I think because on such a Unruh minimal round trip network, the frequency fluctuations dominate over Unruh the offset measurement errors-- Ie, the Allan intercept is much lower Unruh than the assumed 1500 sec. in that kind of situation-- also the drift Unruh model on real systems is not well modeled by 1/f noise.) So, what I Unruh think the point is that using ntpdate, one can rapidly bring the Unruh clock into a few msec of the correct time, rather than waiting for Unruh the feedback loop to finally eliminate that last 128msec of offset. OK, and again, I'm seeing you lobby for an enhancement/improvement here (and I'm all for that). David (I think) was talking about a *problem*. I agree with you that we can do better. I am trying to see if there is also a problem. Too many potential problems. I am confused about which one. Harlan If folks have suggestions for improvements I welcome them. Harlan If folks want something different I invite them to make a case for Harlan it. Please remember the scope and complexity of the problem case.
Harlan It's much easier to have a simpler solution if one is prepared to Harlan ignore certain problems. Another case in point is Maildir. Harlan If somebody is in the situation where they know they have specific Harlan requirements for time, they are in the situation where they have Harlan enough altitude on their requirements to know the costs/benefits of Harlan what is involved in getting there. Unruh Well, I disagree. The sign of a good piece of software is that it Unruh does what it needs to do despite the user having a bad idea of how to Unruh accomplish the task. Sounds like NTP. Folks often have pretty bad ideas about what they need to do or what problems they think they are solving by doing strange things and the code works anyway. Mine was a specific response to the comment that Harlan made. But more to the point, what is the *problem* you are trying to solve? You are still communicating to me that we can do *better* and I agree with you. You
Re: [ntp:questions] ntpdate.c unsafe buffer write
I've tried to keep quiet and bite my tongue at this whole ntp vs chrony thing... But something has been nagging me in the back of my head that I just have to know the answer to... How are you measuring your results? From what I've skimmed over you are simply using each program's own generated statistics... Wouldn't a more correct way be to use an external (and calibrated) device to measure / compare to ensure the results are actually valid? Otherwise you are in essence comparing apples to oranges...
Re: [ntp:questions] ntpdate.c unsafe buffer write
So, no, I am comparing apples to apples (the offsets as determined from the ntp packet exchange mechanism which both use and both report). Another approach is to set up a 3rd machine to watch both. -- These are my opinions, not necessarily my employer's. I hate spam.
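For reference, the offset both daemons report from "the ntp packet exchange mechanism" comes from four timestamps per exchange. Below is a minimal sketch of the standard NTP on-wire calculation (struct and function names are illustrative). It assumes symmetric paths, so any path asymmetry shows up as offset error that neither program can see from its own statistics, which is the point of checking against a third machine or an external reference.

```c
#include <assert.h>
#include <stddef.h>

/* Timestamps of one client/server exchange, in seconds:
 * t1 = client transmit, t2 = server receive,
 * t3 = server transmit, t4 = client receive. */
struct ntp_exchange {
    double t1, t2, t3, t4;
};

/* Estimated offset of the server clock relative to the client clock. */
double ntp_offset(const struct ntp_exchange *x)
{
    assert(x != NULL);
    return ((x->t2 - x->t1) + (x->t3 - x->t4)) / 2.0;
}

/* Round-trip delay, excluding the time the server held the packet. */
double ntp_delay(const struct ntp_exchange *x)
{
    assert(x != NULL);
    return (x->t4 - x->t1) - (x->t3 - x->t2);
}
```

With t1 = 0, t2 = 0.6, t3 = 0.61 and t4 = 0.21 (client 0.5 s behind the server, 0.1 s each way), the offset is 0.5 s and the delay 0.2 s.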
Re: [ntp:questions] ntpdate.c unsafe buffer write
Guys, There seems to be some misinformation here. Both ntpdate and ntpd -q set the offset with adjtime() and then exit. After that, stock Unix adjtime() slews the clock at rate 500 PPM, which indeed could take 256 s for an initial offset of 128 ms. A prudent response would be to measure the initial offset and compute the time to wait. The ntp-wait script waits for ntpd to enter state 4, which could happen with an initial offset as high as 128 ms. The ntpd time constant is purposely set somewhat large at 2000 s, which results in a risetime of about 3000 s. This is a compromise for stable acquisition for herky-jerky Internet paths and speed of convergence for LANs. For typical Internet paths the Allan intercept is about 2000 s. For fast LANs with nanosecond clock resolution, the Allan intercept can be as low as 250s, which is what the kernel PPS loop is designed for. Both the daemon and kernel loops are engineered so that the time constant is directly proportional to the poll interval and the risetime scales directly. If the poll exponent is set to the minimum 4 (16 s) the risetime is 500 s. While not admitted in public, the latest snapshot can set the poll interval to 3 (8 s), so the risetime is 250 s. This works just fine on a LAN, but I would never do this on an outside circuit. Dave Unruh wrote: Harlan Stenn [EMAIL PROTECTED] writes: In article [EMAIL PROTECTED], David Woolley [EMAIL PROTECTED] writes: David Harlan Stenn wrote: Why would ntpd be exiting during a warm start? David Because we are discussing using it with the -q option. If you just David use -g, it will take a lot longer to converge within a few David milliseconds, as it will not slew at the maximum rate. If you use David -q, you need to force a step if you want fast convergence. I still maintain you are barking up the wrong tree. In terms of the behavior model of ntp, state 4 is as good as it gets. You are in the right ballpark.
And as has been commented on numerous times, ntp in state 4 is very slow to converge to the best possible time control. This was a deliberate design decision, as I understand it, so that in steady state the time is averaged over a large number of samples (not helped by the fact that 85% of samples are thrown away), to reduce the statistical error in the clock control. Note that at poll 7 the number of actual samples averaged over in the time scale of the ntp feedback loop is only about 3, so the statistical averaging, even with such a long time constant, is not very good. If you want something else, something you consider better than state 4, please make a case for this and lobby for it. I think many people have lobbied for faster response. In the discussion of the chrony/ntp comparison, chrony is much faster to correct errors, and at least on a local network, better at disciplining the clock as well (in part I think because on such a minimal round trip network, the frequency fluctuations dominate over the offset measurement errors-- Ie, the Allan intercept is much lower than the assumed 1500 sec. in that kind of situation-- also the drift model on real systems is not well modeled by 1/f noise.) So, what I think the point is that using ntpdate, one can rapidly bring the clock into a few msec of the correct time, rather than waiting for the feedback loop to finally eliminate that last 128msec of offset. For the case I'm describing the startup script sequence is to fire up 'ntpd -g' early. If there are applications that need the system clock to be on-track stable (even if a wiggle is being dealt with), that's 'state 4', and running 'ntp-wait' before starting those services is, to the best of my knowledge, all that is required. David State 4 means within 128ms and using the normal control loop, which David has a time constant of around an hour. OK, and so what? Is State 4 insufficient for your needs, or are you just splitting hairs?
David For a cold start, it won't reach state 4 for a further 900 seconds David after first priming the clock filter. If the system has a good drift file, I disagree with you. David The definition of cold start is that there is no drift file. OK, now I know what the definitions are. I don't recall offhand the expected time to hit state 4 without a drift file. 1) This should not be the ordinary case 2) How does this have any bearing on the ntpdate -b discussion? And what is the big deal with using different config files? The config file mechanism has include capability so it is trivial to maintain a common 'base' configuration with customizations for separate start/run phases. David You are now talking about using -q. The difficulty is that people David have enough trouble getting the run phase config file right. I mention it because it's what you seem to be insisting on talking about. I was providing a way to address the problems you describe with
Re: [ntp:questions] ntpdate.c unsafe buffer write
In article [EMAIL PROTECTED], David Woolley [EMAIL PROTECTED] writes: David Harlan Stenn wrote: Why would ntpd be exiting during a warm start? David Because we are discussing using it with the -q option. If you just David use -g, it will take a lot longer to converge within a few David milliseconds, as it will not slew at the maximum rate. If you use David -q, you need to force a step if you want fast convergence. I still maintain you are barking up the wrong tree. In terms of the behavior model of ntp, state 4 is as good as it gets. You are in the right ballpark. If you want something else, something you consider better than state 4, please make a case for this and lobby for it. For the case I'm describing the startup script sequence is to fire up 'ntpd -g' early. If there are applications that need the system clock to be on-track stable (even if a wiggle is being dealt with), that's 'state 4', and running 'ntp-wait' before starting those services is, to the best of my knowledge, all that is required. David State 4 means within 128ms and using the normal control loop, which David has a time constant of around an hour. OK, and so what? Is State 4 insufficient for your needs, or are you just splitting hairs? David For a cold start, it won't reach state 4 for a further 900 seconds David after first priming the clock filter. If the system has a good drift file, I disagree with you. David The definition of cold start is that there is no drift file. OK, now I know what the definitions are. I don't recall offhand the expected time to hit state 4 without a drift file. 1) This should not be the ordinary case 2) How does this have any bearing on the ntpdate -b discussion? And what is the big deal with using different config files? The config file mechanism has include capability so it is trivial to maintain a common 'base' configuration with customizations for separate start/run phases. David You are now talking about using -q.
The difficulty is that people David have enough trouble getting the run phase config file right. I mention it because it's what you seem to be insisting on talking about. I was providing a way to address the problems you describe with the (IMO bad) mechanism (-q) under discussion. But the bigger problem is: why are you insisting on separate start/run phases? This has not been best practice for quite a while, and if you insist on using this method you will be running into the exact problems you are describing. No, the best advice is to understand why you have been using ntpdate -b so far and understand the pros/cons of the new choices. David We are talking about system managers and package creators, neither of David which have much time to study the details. Blessed are those who get what they deserve. These are the same folks who must get ssh configurations and various other network configurations working. If the stock things work well enough for folks, great. If folks have suggestions for improvements I welcome them. If folks want something different I invite them to make a case for it. Please remember the scope and complexity of the problem case. It's much easier to have a simpler solution if one is prepared to ignore certain problems. Another case in point is Maildir. If somebody is in the situation where they know they have specific requirements for time, they are in the situation where they have enough altitude on their requirements to know the costs/benefits of what is involved in getting there. -- Harlan Stenn [EMAIL PROTECTED] http://ntpforum.isc.org - be a member!
Re: [ntp:questions] ntpdate.c unsafe buffer write
Hello David, On Sunday, February 10, 2008 at 10:55:29 +, David Woolley wrote: However, if it wasn't stepped, because it was already within 128ms, it will be slewing at maximum rate. Allowing 100ppm for motherboard tolerances, that means that it can take up to a further 320 seconds to reach the low milliseconds. Only 256 seconds maximum, because the kind of slew (singleshot) initiated by ntpd -q comes *above* the usual frequency correction already annihilating the motherboard error. I don't believe it would be safe to start ntpd in normal mode within that period. Indeed: the daemon then behaves strangely, not sane at all. Last year I published here an awk script calling ntpd -gq and then sleeping until any slew in progress is finished. After that, normal mode ntpd can be started safely. And of course the daemon really appreciates starting up with a near-zero initial phase offset. Serge. -- Serge point Bets arobase laposte point net
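The 320 s versus 256 s disagreement is plain arithmetic on the same 128 ms offset: David's figure assumes a 100 ppm motherboard frequency error works against the 500 PPM slew, leaving 400 ppm of net correction, while Serge's assumes the normal frequency correction has already cancelled that error. A small sketch (the function name is hypothetical):

```c
#include <assert.h>

/* Seconds to remove offset_s when slewing at slew_ppm while an
 * uncorrected frequency error of err_ppm works against the slew. */
double slew_duration_s(double offset_s, double slew_ppm, double err_ppm)
{
    assert(slew_ppm > err_ppm);
    if (offset_s < 0)
        offset_s = -offset_s;
    return offset_s * 1e6 / (slew_ppm - err_ppm);
}
```

slew_duration_s(0.128, 500, 100) gives 320 s; with err_ppm = 0 it gives 256 s.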
Re: [ntp:questions] ntpdate.c unsafe buffer write
In article [EMAIL PROTECTED], Serge Bets [EMAIL PROTECTED] writes: David I don't believe it would be safe to start ntpd in normal mode within David that period. Serge Indeed: the daemon then behaves strangely, not sane at all. Last year Serge I published here an awk script calling ntpd -gq and then sleeping Serge until an eventual slew is finished. After that, normal mode ntpd can Serge be started safely. And of course the daemon really appreciates to Serge startup with a near-zero initial phase offset. 1) what are you trying to accomplish by the sequence: ntpd -gq ; wait a bit; ntpd that you do not get with: ntpd -g ; ntp-wait 2) there have been recent changes to the initial frequency/offset situation with ntp-dev. Have you tried the latest code to see how it behaves? -- Harlan Stenn [EMAIL PROTECTED] http://ntpforum.isc.org - be a member!
Re: [ntp:questions] ntpdate.c unsafe buffer write
Serge Bets wrote: Only 256 seconds maximum, because the kind of slew (singleshot) initiated by ntpd -q comes *above* the usual frequency correction already annihilating the motherboard error. That assumes the use of the kernel time discipline, although if you don't have that, it is even more important to use ntpdate -b, if you want fast phase convergence, as the time won't drift much between the initial set and the start of ntpd.
Re: [ntp:questions] ntpdate.c unsafe buffer write
David L. Mills wrote: Harlan, You make some good points. However, if folks want SNTP from here I think they would prefer it in its own distribution rather than bundle it with the huge NTP distribution. You can make a strong argument to host here I don't think you are ever going to get rid of ntpdate from the distribution (as supplied by packagers and vendors) until ntpd offers a mode which sets the time within about one second of being started. I'm not convinced that SNTP will displace ntpdate for this purpose. People don't want to delay boot sequences, but they also don't want to start applications until the time has been set.
Re: [ntp:questions] ntpdate.c unsafe buffer write
David Woolley wrote: David L. Mills wrote: Harlan, You make some good points. However, if folks want SNTP from here I think they would prefer it in its own distribution rather than bundle it with the huge NTP distribution. You can make a strong argument to host here I don't think you are ever going to get rid of ntpdate from the distribution (as supplied by packagers and vendors) until ntpd offers a mode which sets the time within about one second of being started. I'm not convinced that SNTP will displace ntpdate for this purpose. People don't want to delay boot sequences, but they also don't want to start applications until the time has been set. How long does ntpd -g take to set the time? As I understand it, it's supposed to query the configured servers, make a best guess as to what time it is, set that, and then go to normal operation. That should put you within a second or so. If you need better, either wait for it, or keep your server alive 24x7x365. I think most data centers do run 24x7x365. If you're talking about a data center that lives under the boss's desk, consider buying a UPS and hope that the power doesn't fail for longer than the run time.
Re: [ntp:questions] ntpdate.c unsafe buffer write
Richard B. Gilbert wrote: David Woolley wrote: David L. Mills wrote: Harlan, You make some good points. However, if folks want SNTP from here I think they would prefer it in its own distribution rather than bundle it with the huge NTP distribution. You can make a strong argument to host here I don't think you are ever going to get rid of ntpdate from the distribution (as supplied by packagers and vendors) until ntpd offers a mode which sets the time within about one second of being started. I'm not convinced that SNTP will displace ntpdate for this purpose. People don't want to delay boot sequences, but they also don't want to start applications until the time has been set. How long does ntpd -g take to set the time? As I understand it, it's supposed to query the configured servers, make a best guess as to what time it is, set that, and then go to normal operation. That should put you within a second or so. If you need better, either wait for it, or keep your server alive 24x7x365. I think most data centers do run 24x7x365. If you're talking about a data center that lives under the boss's desk, consider buying a UPS and hope that the power doesn't fail for longer than the run time. David is right. He means be done with it, including hard-setting the clock, within a second. The accuracy expected, based on ntpdate -b as the benchmark you are trying to replace, is within a small number of milliseconds of the specified servers. Sorry, ntpd -q doesn't meet the requirements. -Tom
Re: [ntp:questions] ntpdate.c unsafe buffer write
On 2008-02-09, Tom Smith [EMAIL PROTECTED] wrote: He means be done with it, including hard-setting the clock, within a second. The accuracy expected, based on ntpdate -b as the benchmark you are trying to replace, is within a small number of milliseconds of the specified servers. Sorry, ntpd -q doesn't meet the requirements. You need to be realistic about your requirements. In the case of systems which run time sensitive services, or are rarely rebooted, an ~11 second pause, which _is_ about the amount of time it takes for 'ntpd -gq' to do a quick sanity check on your configured time servers and set the clock, is not unreasonable. In the case of systems which do not run time critical services there is no reason why ntpd can not be started with -g and be allowed to set the clock as the boot progresses. In most cases the clock will be set before, or very shortly after, the boot sequence is completed. The big issue in the ntpdate vs ntpd -gq debate is the fact that the former may be used over unprivileged ports while the latter can not. This gives ntpdate the advantage in situations where a firewall is blocking port 123/UDP. That's what you should be complaining about, not some trivial 11 second delay. -- Steve Kostecke [EMAIL PROTECTED] NTP Public Services Project - http://support.ntp.org/
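The port issue comes down to which local port the client binds. A sketch using the POSIX sockets API: binding to port 0 lets the kernel choose an unprivileged ephemeral port, which is the idea behind ntpdate -u, while binding to port 123 (as ntpd does) normally requires root, and replies then come back to local port 123, where a firewall dropping inbound 123/UDP will stop them. The function name is hypothetical and error handling is minimal.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define NTP_PORT 123

/* Open a UDP socket for NTP client queries. With use_ntp_port != 0 the
 * socket is bound to local port 123 (usually needs root); otherwise port 0
 * lets the kernel pick an unprivileged ephemeral port. Returns the fd or
 * -1 on error, and stores the actually bound port in *bound_port. */
int open_ntp_client_socket(int use_ntp_port, unsigned short *bound_port)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    int fd;

    assert(bound_port != NULL);
    fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(use_ntp_port ? NTP_PORT : 0);

    /* getsockname() reports which ephemeral port the kernel chose. */
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        close(fd);
        return -1;
    }
    *bound_port = ntohs(addr.sin_port);
    return fd;
}
```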
Re: [ntp:questions] ntpdate.c unsafe buffer write
In article [EMAIL PROTECTED], David Woolley [EMAIL PROTECTED] writes: David I don't think you are ever going to get rid of ntpdate from the David distribution (as supplied by packagers and vendors) until ntpd offers David a mode which sets the time within about one second of being started. The current sntp code can do this now. David I'm not convinced that SNTP will displace ntpdate for this purpose. Why not? David People don't want to delay boot sequences, but they also don't want David to start applications until the time has been set. Then I submit you are focusing a bit too deeply on the details and invite you to take a step back. I believe the current set of tools can be used in a variety of combinations that will handle the various cases to the best that we know how to do them. If you want to get the time set *now* and then start, regardless of how well the system can maintain that time, we can do that (sntp/ntpdate+ntpd). If you want to set the time ASAP and have stable system time before starting your apps, in the usual case you are talking about 11 seconds for this to happen (ntpd -g, with iburst, early in the boot sequence, using ntp-wait later in the boot sequence, just before starting time-critical services). Near as I can recall, any other cases have looser constraints so they're not particularly interesting for this conversation. -- Harlan Stenn [EMAIL PROTECTED] http://ntpforum.isc.org - be a member!
Re: [ntp:questions] ntpdate.c unsafe buffer write
In article [EMAIL PROTECTED], David L. Mills [EMAIL PROTECTED] writes: David Harlan, You make some good points. However, if folks want SNTP from David here I think they would prefer it in its own distribution rather than David bundle it with the huge NTP distribution. That's not the feedback I have received, but I will note it would be possible to have an ntp+sntp distribution and a separate sntp distribution. It would take a couple of days' time to do this, and I have much hotter fires to put out first. Additionally, there will be significant changes in the code layout as the sntp code is overhauled, so I'd prefer to wait on this additional distribution tarball until that effort is completed. David You can make a strong David argument to host here if you can claim that both NTP and SNTP are David strictly specification conformant. That's why I rewrote the SNTP David documentation to take out all mention that it could be used as a David server. OK. David The three of us that wrote rfc 2030 had just come down from a massive David clogging situation at UWisc and NIST and were frantic to get across David the need for polite client behavior. This has to do with DNS lookups, David poll intervals and behavior when no response is received. Even so, David there remain at least three violators of those principles right now David on two of our public servers. Therefore, if an SNTP product leaves David here, it really and surely should comply with the on-wire protocol David in the NTPv4 spec and these best practices. We're on the same page. David As an aside, I should reveal my biases. At the moment, configuring the David current software on a Sun Ultra 5 takes 12 minutes: 6 minutes for David NTP and 6 minutes for SNTP. But it takes only 8 minutes to compile David and link all programs, including both NTP and SNTP. It is not now David possible to build either separately. I'm not sure what you mean about building separately.
We *used* to be able to build:
- ntp + sntp: configure ; make
- ntp only: configure --without-sntp ; make
- sntp only: cd sntp ; configure ; make
About a year and a half ago we got the SNTP code to the point where it would build on Unix (nobody has done the work for Windows, but apparently nobody is asking for it there either - http://bugs.ntp.org/500 has the details). Since we've been announcing that ntpdate will be deprecated because its functionality can be replaced by various combinations of ntpd and sntp, we made sntp a 'required' part of the NTP build. David As I have said privately before, the NTP daemon can be operated in David SNTP mode which does everything NTP does, but terminates just after David the clock has been set for the first time. Yes, it has a rather large David footprint, but it lasts only about 11 seconds. The downside is that David it requires a configuration file containing a list of servers. If David this were done on the command line, NTP in SNTP mode would be David indistinguishable from SNTP other than a command line option. You have provided a mechanism for doing this. It will be an acceptable choice for a good number of people. But there is a significant group of people for whom this particular mechanism will not work. They require any or all of the following:
- a small footprint
- set the time with the smallest possible delay
While we might be able to achieve the smallest delay with ntpd, I don't currently see how we can do that while also offering full NTP support from a single binary and achieve the small footprint. David So, the ideal solution would seem to include a list of links on the David NTP home page to external sites and in addition internal links to the David NTP and SNTP distributions along with a statement that both are David strictly specification conformant. That might inspire other wannabes David to make and enforce similar claims. We already have internal and external links on the ntp.org site.
And if somebody wants additional or different information there, contact information is also listed in what should be obvious places. -- Harlan Stenn [EMAIL PROTECTED] http://ntpforum.isc.org - be a member!
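For reference, the client side of the on-wire exchange Dave mentions starts with a 48-byte request. A minimal sketch, assuming the RFC 4330 / RFC 5905 header layout: the first byte packs LI (2 bits), VN (3 bits) and Mode (3 bits), and the transmit timestamp at bytes 40..47 counts seconds since 1900 rather than 1970. build_sntp_request and put_be32 are hypothetical helpers, not functions from the NTP or SNTP sources.

```c
#include <stdint.h>
#include <string.h>

#define NTP_PACKET_LEN  48
#define NTP_UNIX_OFFSET 2208988800UL  /* seconds from 1900-01-01 to 1970-01-01 */

/* Store a 32-bit value in network (big-endian) byte order. */
static void put_be32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}

/* Fill a client request: LI = 0, VN = 4, Mode = 3 (client). All other
 * header fields are zero except the transmit timestamp, which holds the
 * send time as NTP seconds (wrapping modulo 2^32) plus a 32-bit binary
 * fraction of a second. */
void build_sntp_request(unsigned char pkt[NTP_PACKET_LEN],
                        uint32_t unix_seconds, uint32_t fraction)
{
    memset(pkt, 0, NTP_PACKET_LEN);
    pkt[0] = (0 << 6) | (4 << 3) | 3;   /* LI | VN | Mode = 0x23 */
    put_be32(pkt + 40, (uint32_t)(unix_seconds + NTP_UNIX_OFFSET));
    put_be32(pkt + 44, fraction);
}
```

The buffer would then be sent with sendto() to UDP port 123 on the server; a conforming server copies these transmit-timestamp bytes into the originate timestamp of its reply, which lets the client match replies to requests.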
Re: [ntp:questions] ntpdate.c unsafe buffer write
Harlan Stenn wrote: Guys, This is all discussed pretty well at: http://support.ntp.org/bin/view/Dev/DeprecatingNtpdate So far everything I have seen in this thread has already been covered on that page. I just followed the above link. I see ONE feature missing! ntpdate -Du (I think it's -D) does NOT set the clock; it simply tells you what it would have done had it been permitted to do so. I suppose this feature is not essential but I've used it a time or two to find out how my time agreed, or disagreed, with some other server.
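What a query-only run like the one Richard describes reports is the standard NTP clock offset and round-trip delay, computed from four timestamps. A minimal sketch, assuming the timestamps have already been converted to seconds as doubles on a common scale; ntp_sample and ntp_compute_sample are hypothetical names, not ntpdate functions.

```c
/* t1 = client transmit, t2 = server receive,
 * t3 = server transmit, t4 = client receive (all in seconds). */
typedef struct {
    double offset;  /* positive: local clock is behind the server */
    double delay;   /* round-trip network delay */
} ntp_sample;

ntp_sample ntp_compute_sample(double t1, double t2, double t3, double t4)
{
    ntp_sample s;

    /* Average of the apparent offsets on the outbound and return legs. */
    s.offset = ((t2 - t1) + (t3 - t4)) / 2.0;
    /* Total elapsed time minus the time the server held the packet. */
    s.delay = (t4 - t1) - (t3 - t2);
    return s;
}
```

Reporting these two numbers, without stepping or slewing the clock, is all a "tell me what you would do" mode needs.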
Re: [ntp:questions] ntpdate.c unsafe buffer write
In article [EMAIL PROTECTED], David Woolley [EMAIL PROTECTED] writes: David Harlan Stenn wrote: In article [EMAIL PROTECTED], David Woolley [EMAIL PROTECTED] writes: David I'm not convinced that SNTP will displace ntpdate for this purpose. Why not? David Because ntpdate is fixed in the popular culture and, for the ordinary David user, SNTP doesn't offer any obvious advantages. Well, The Plan is to remove ntpdate. So unless somebody writes a contributed script, the fact that ntpdate (with its known bugs) is going away and a documented set of functional equivalents will be available will probably be all the convincing that is needed. If you want to get the time set *now* and then start, regardless of how well the system can maintain that time, we can do that (sntp/ntpdate+ntpd). David Not in Dave Mills future of ntpd, as you don't get ntpdate or SNTP. That would be true if Dave controlled the contents of the distribution. There is a set of required functionality out there that will be met by the distribution I control. There may be distributions I roll that have subset functionality, and Dave may choose to offer other distributions. I see no benefit and many problems in forcing this issue too soon, so at the moment it is a topic for discussion and the situation seems to be on track right now. This is, by no means, the most important thing we're all working on right now. Getting the sntp code up to spec is far more important, IMO. If you want to set the time ASAP and have stable system time before starting your apps, in the usual case you are talking about 11 seconds for this to happen (ntpd -g, with iburst, early in the boot sequence, using ntp-wait later in the boot sequence, just before starting time-critical services). David I suspect that only sets the time to the nearest 128ms, unless it David does something that ntpd doesn't normally do. I suspect you are mistaken, and what I describe is correct. 
In the case I describe, at the end of that O(11 second) period the clock is Real Close (ie, the offset is low enough), the frequency drift is known and compensated for, and ntpd is in state 4. -- Harlan Stenn [EMAIL PROTECTED] http://ntpforum.isc.org - be a member!
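David's 128 ms figure corresponds to ntpd's default step threshold: by default, offsets larger than 0.128 s are corrected by stepping the clock and smaller ones by slewing it gradually. A sketch of just that decision, with hypothetical names; real ntpd also applies a stepout interval and a panic threshold (which -g relaxes for the first correction), none of which is modelled here.

```c
#define STEP_THRESHOLD 0.128  /* seconds; ntpd's default step threshold */

typedef enum { ACTION_SLEW, ACTION_STEP } clock_action;

/* Decide how a measured offset (in seconds) would be corrected:
 * step if its magnitude exceeds the threshold, otherwise slew. */
clock_action choose_clock_action(double offset)
{
    double magnitude = offset < 0 ? -offset : offset;

    return magnitude > STEP_THRESHOLD ? ACTION_STEP : ACTION_SLEW;
}
```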
Re: [ntp:questions] ntpdate.c unsafe buffer write
Harlan Stenn [EMAIL PROTECTED] writes: In article [EMAIL PROTECTED], Unruh [EMAIL PROTECTED] writes: Unruh Harlan Stenn [EMAIL PROTECTED] writes: Bill, ntpdate is being deprecated. Unruh Maybe, but it should still not have bugs if it is actually still part Unruh of the distro. I mostly agree with you. And one reason there are a bunch of outstanding bugs in ntpdate is that nobody has stepped forward to maintain it, especially after the last round of bugs where we decided that the best thing to do for ntpdate was kill it off and replace it with sntp. Speaking of which, I need to ping the folks who volunteered to work on the SNTP code and see what the status is. And it is *much* better to file reports like this using bugs.ntp.org as otherwise they tend to get lost in the wind. Unruh OK. Will do. I saw that bug get filed - thanks a bunch! Where it met with the same reaction: ntpdate is deprecated, so why fix the bug? Do you want to bet that ntpdate will still be there in 2010?
Re: [ntp:questions] ntpdate.c unsafe buffer write
In article [EMAIL PROTECTED], David L. Mills [EMAIL PROTECTED] writes: David Harlan, My position on ntpdate and sntp has always been clear. Remove David them both from the distribution and let other folks contribute sntp David products. Yes, your position has been clear and your opinion has been noted. David The standards labs in various countries do not recommend the David NTP reference implementation, they recommend other shrinkwrap David products. I'd appreciate references on this point. And how is it germane to this discussion? David There is no need for folks to download the reference David implementation only to bring up an sntp product. Yes, which is why the sntp code can be trivially bundled separately. The feedback I have received is that the majority of folks want the distribution to contain both ntp and sntp. David The matter of concern is an sntp product that strictly conforms to David the NTPv4 specification as it applies to sntp. There is at least one David contributor who is testing the kiss-o'-death rate limit and has apparently David actually read rfc 2030. On the other hand, there are numerous David examples of clients that casually violate the rate rules both at David servers we operate here and at the national labs. Yup. David What we should be David doing is supporting those products that play by the rules and that David are maintained by other players. This depends first on the definition of "we", and then on the definition of "supporting". The people who talk to me want an SNTP implementation from the NTP Project. Nobody is expecting you to ride herd over any SNTP code that may or may not be part of the same tarball that includes NTP. I am mulling over different ideas in this regard. Two obvious ways to go on NTP/SNTP are to have shared code, or completely separate codebases. There is some middle ground regarding support libraries. I see difficult tradeoffs with either approach. -- Harlan Stenn [EMAIL PROTECTED] http://ntpforum.isc.org - be a member!
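Both Harlan and Dave mention clients testing (or ignoring) the kiss-o'-death rate limit. Under NTPv4 (RFC 5905), a server signals a kiss-o'-death by replying with stratum 0 and a four-character ASCII kiss code such as RATE, DENY or RSTR in the reference ID field. A minimal sketch of how a client might recognize one; the function name is hypothetical, and the follow-up policy (stop using the server for DENY or RSTR, poll less often for RATE) is left to the caller.

```c
#include <stddef.h>
#include <string.h>

/* If the server reply is a kiss-o'-death packet (stratum, byte 1, is 0),
 * copy its four-character kiss code from the reference ID (bytes 12..15)
 * into code as a NUL-terminated string and return 1; otherwise return 0. */
int ntp_is_kiss_of_death(const unsigned char *pkt, size_t len, char code[5])
{
    if (len < 48 || pkt[1] != 0)
        return 0;
    memcpy(code, pkt + 12, 4);
    code[4] = '\0';
    return 1;
}
```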
Re: [ntp:questions] ntpdate.c unsafe buffer write
Harlan, You make some good points. However, if folks want SNTP from here I think they would prefer it in its own distribution rather than bundle it with the huge NTP distribution. You can make a strong argument to host here if you can claim that both NTP and SNTP are strictly specification conformant. That's why I rewrote the SNTP documentation to take out all mention that it could be used as a server. The three of us that wrote rfc 2030 had just come down from a massive clogging situation at UWisc and NIST and were frantic to get across the need for polite client behavior. This has to do with DNS lookups, poll intervals and behavior when no response is received. Even so, there remain at least three violators of those principles right now on two of our public servers. Therefore, if an SNTP product leaves here, it really and surely should comply with the on-wire protocol in the NTPv4 spec and these best practices. As an aside, I should reveal my biases. At the moment, configuring the current software on a Sun Ultra 5 takes 12 minutes: 6 minutes for NTP and 6 minutes for SNTP. But it takes only 8 minutes to compile and link all programs, including both NTP and SNTP. It is not now possible to build either separately. As I have said privately before, the NTP daemon can be operated in SNTP mode which does everything NTP does, but terminates just after the clock has been set for the first time. Yes, it has a rather large footprint, but it lasts only about 11 seconds. The downside is that it requires a configuration file containing a list of servers. If this were done on the command line, NTP in SNTP mode would be indistinguishable from SNTP other than a command line option. So, the ideal solution would seem to include a list of links on the NTP home page to external sites and in addition internal links to the NTP and SNTP distributions along with a statement that both are strictly specification conformant.
That might inspire other wannabes to make and enforce similar claims. Dave Harlan Stenn wrote: In article [EMAIL PROTECTED], David L. Mills [EMAIL PROTECTED] writes: David Harlan, My position on ntpdate and sntp has always been clear. Remove David them both from the distribution and let other folks contribute sntp David products. Yes, your position has been clear and your opinion has been noted. David The standards labs in various countries do not recommend the David NTP reference implementation, they recommend other shrinkwrap David products. I'd appreciate references on this point. And how is it germane to this discussion? David There is no need for folks to download the reference David implementation only to bring up an sntp product. Yes, which is why the sntp code can be trivially bundled separately. The feedback I have received is that the majority of folks want the distribution to contain both ntp and sntp. David The matter of concern is an sntp product that strictly conforms to David the NTPv4 specification as it applies to sntp. There is at least one David contributor who is testing the kiss-o'-death rate limit and has apparently David actually read rfc 2030. On the other hand, there are numerous David examples of clients that casually violate the rate rules both at David servers we operate here and at the national labs. Yup. David What we should be David doing is supporting those products that play by the rules and that David are maintained by other players. This depends first on the definition of "we", and then on the definition of "supporting". The people who talk to me want an SNTP implementation from the NTP Project. Nobody is expecting you to ride herd over any SNTP code that may or may not be part of the same tarball that includes NTP. I am mulling over different ideas in this regard. Two obvious ways to go on NTP/SNTP are to have shared code, or completely separate codebases. There is some middle ground regarding support libraries.
I see difficult tradeoffs with either approach.
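Dave's "polite client behavior" covers poll intervals and what a client does when no response arrives. A minimal sketch of one such policy, assuming the 15-second minimum poll interval and the exponential backoff on missing responses recommended in the RFC 4330 best-practices section (RFC 4330 superseded the RFC 2030 mentioned above); the 1024-second cap and the function name are illustrative assumptions.

```c
#define MIN_POLL_SECONDS   15u   /* floor for SNTP clients per RFC 4330 */
#define MAX_POLL_SECONDS 1024u   /* assumed cap, for illustration only */

/* Return the next poll interval in seconds: double it after an
 * unanswered request (exponential backoff), keep it otherwise, and
 * always clamp the result to [MIN_POLL_SECONDS, MAX_POLL_SECONDS]. */
unsigned next_poll_interval(unsigned current, int got_response)
{
    unsigned next = current;

    if (!got_response)
        next = (current > MAX_POLL_SECONDS / 2) ? MAX_POLL_SECONDS
                                                : current * 2;
    if (next < MIN_POLL_SECONDS)
        next = MIN_POLL_SECONDS;
    if (next > MAX_POLL_SECONDS)
        next = MAX_POLL_SECONDS;
    return next;
}
```

Checking the halfway point before doubling keeps the multiplication from overflowing for very large inputs.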
[ntp:questions] ntpdate.c unsafe buffer write
In ntpdate.c around line 542 (4.2.4p4) is the sequence

    if (!authistrusted(sys_authkey)) {
        char buf[10];

        (void) sprintf(buf, "%lu", (unsigned long)sys_authkey);
        msyslog(LOG_ERR, "authentication key %s unknown", buf);
        exit(1);
    }

Since unsigned long does not have a definite length on all machines, and its decimal representation plus the trailing zero byte can be longer than 10 bytes, that buf is ripe for buffer overflow. It should be something like

    char buf[(sizeof(unsigned long)*12/5+2)];

and/or the sprintf should be an snprintf.
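A minimal, self-contained sketch of Bill's two suggestions combined: a buffer sized from sizeof(unsigned long) and an snprintf call that cannot write past the end of it. ULONG_DEC_BUFSIZE and format_keyid are hypothetical names, not ntpdate code, and a return code stands in for ntpdate's msyslog()/exit() handling.

```c
#include <stdio.h>

/* Proposed bound: about 2.4 decimal digits per byte of the type, plus
 * 2 bytes of slack that also covers the terminating NUL. This gives 11
 * for a 4-byte unsigned long and 21 for an 8-byte one, exactly enough
 * for the largest value in each case. */
#define ULONG_DEC_BUFSIZE (sizeof(unsigned long) * 12 / 5 + 2)

/* Format keyid as decimal into buf (buflen bytes). snprintf never writes
 * more than buflen bytes; a return value >= buflen means the text was
 * truncated. Returns 0 on success, -1 on truncation or encoding error. */
int format_keyid(char *buf, size_t buflen, unsigned long keyid)
{
    int n = snprintf(buf, buflen, "%lu", keyid);

    if (n < 0 || (size_t)n >= buflen)
        return -1;
    return 0;
}
```

Even with a 4-byte unsigned long, the original char buf[10] is one byte short for any key ID of 1,000,000,000 or more, because 10 digits plus the NUL need 11 bytes.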
Re: [ntp:questions] ntpdate.c unsafe buffer write
Bill, ntpdate is being deprecated. And it is *much* better to file reports like this using bugs.ntp.org as otherwise they tend to get lost in the wind. H -- In article [EMAIL PROTECTED], Unruh [EMAIL PROTECTED] writes: Unruh In ntpdate.c around line 542 (4.2.4p4) is the sequence if Unruh (!authistrusted(sys_authkey)) { char buf[10]; Unruh (void) sprintf(buf, "%lu", (unsigned long)sys_authkey); Unruh msyslog(LOG_ERR, "authentication key %s unknown", buf); exit(1); Unruh } Unruh Since unsigned long does not have a definite length on all machines, Unruh and with the trailing zero certainly is potentially longer than 10 Unruh bytes, that buf is ripe for buffer overflow. It should be something Unruh like char buf[(sizeof(unsigned long)*12/5+2)]; And/or the sprintf Unruh should be an snprintf. -- Harlan Stenn [EMAIL PROTECTED] http://ntpforum.isc.org - be a member!
Re: [ntp:questions] ntpdate.c unsafe buffer write
Harlan Stenn [EMAIL PROTECTED] writes: Bill, ntpdate is being deprecated. Maybe, but it should still not have bugs if it is actually still part of the distro. And it is *much* better to file reports like this using bugs.ntp.org as otherwise they tend to get lost in the wind. OK. Will do. H -- In article [EMAIL PROTECTED], Unruh [EMAIL PROTECTED] writes: Unruh In ntpdate.c around line 542 (4.2.4p4) is the sequence if Unruh (!authistrusted(sys_authkey)) { char buf[10]; Unruh (void) sprintf(buf, "%lu", (unsigned long)sys_authkey); Unruh msyslog(LOG_ERR, "authentication key %s unknown", buf); exit(1); Unruh } Unruh Since unsigned long does not have a definite length on all machines, Unruh and with the trailing zero certainly is potentially longer than 10 Unruh bytes, that buf is ripe for buffer overflow. It should be something Unruh like char buf[(sizeof(unsigned long)*12/5+2)]; And/or the sprintf Unruh should be an snprintf. -- Harlan Stenn [EMAIL PROTECTED] http://ntpforum.isc.org - be a member!
Re: [ntp:questions] ntpdate.c unsafe buffer write
In article [EMAIL PROTECTED], Unruh [EMAIL PROTECTED] writes: Unruh Harlan Stenn [EMAIL PROTECTED] writes: Bill, ntpdate is being deprecated. Unruh Maybe, but it should still not have bugs if it is actually still part Unruh of the distro. I mostly agree with you. And one reason there are a bunch of outstanding bugs in ntpdate is that nobody has stepped forward to maintain it, especially after the last round of bugs where we decided that the best thing to do for ntpdate was kill it off and replace it with sntp. Speaking of which, I need to ping the folks who volunteered to work on the SNTP code and see what the status is. And it is *much* better to file reports like this using bugs.ntp.org as otherwise they tend to get lost in the wind. Unruh OK. Will do. I saw that bug get filed - thanks a bunch! -- Harlan Stenn [EMAIL PROTECTED] http://ntpforum.isc.org - be a member!