Re: [ntp:questions] Time reset
On another note, I wish that NTP had an option to disable backward jumps in time. Is it something that has been considered before? TIA ___ questions mailing list questions@lists.ntp.org https://lists.ntp.org/mailman/listinfo/questions
Re: [ntp:questions] Time reset
Evandro Menezes wrote: On another note, I wish that NTP had an option to disable backward jumps in time. Is it something that has been considered before? TIA
If you are seeing backward jumps on a running ntpd, something is VERY wrong. Normally the only time you will see a jump in either direction is when ntpd initializes. If you keep your server running 24x7 you should not see jumps at any other time. Some Linux systems have been reported to lose clock interrupts during periods of heavy disk activity but this results in forward steps.
Re: [ntp:questions] Time reset
Unruh wrote: [] But ntp arguably should handle jumps in time or frequency faster than the hours or days it takes now.
Why? A jump (to me) implies an error in the hardware. Who is to say that it's a one-off jump, or that it might not occur again in a few seconds, minutes or hours? Should it not be the hardware which is corrected, not NTP? G
Cheers, David
Re: [ntp:questions] Time reset
David J Taylor wrote:
Unruh wrote: [] But ntp arguably should handle jumps in time or frequency faster than the hours or days it takes now.
Why? A jump (to me) implies an error in the hardware. Who is to say that it's a one-off jump, or that it might not occur again in a few seconds, minutes or hours? Should it not be the hardware which is corrected, not NTP?
I think one has to distinguish between jumps in time and in frequency. The naive test uses a jump in time, which is not a real life situation for serviceable equipment. However, jumps in frequency are quite possible as the result of rapid temperature changes, and as the result of aging processes in the electronics. Unruh's main issue with ntpd is actually its response to frequency transients, not its response to phase transients, and he is diluting his case by introducing phase transients. The one exception to this is ntpd's poor response to the startup phase transient.
Re: [ntp:questions] Time reset
David J Taylor [EMAIL PROTECTED] writes:
Unruh wrote: [] But ntp arguably should handle jumps in time or frequency faster than the hours or days it takes now.
Why? A jump (to me) implies an error in the hardware. Who is to say that
Or an error in software (missed timer ticks), or...
it's a one-off jump, or that it might not occur again in a few seconds,
Yes, who is to say. So ntp should correct each one as quickly as possible.
minutes or hours? Should it not be the hardware which is corrected, not NTP?
On that argument ntp should not do anything more than rdate did, since if the oscillators on the clocks were perfect there would be no need for anything but a single setting of the clock once per bootup. So ntp should not try to correct any errors but rather people should be advised to buy new hardware. The whole point of ntp is to try to keep the clocks operating as close to real time as possible in the real world -- with bad oscillators, time jumps in the OS. Agreed it ain't perfect, the question is how close to perfection can we get it.
G Cheers, David
Re: [ntp:questions] Time reset
jkvbe [EMAIL PROTECTED] writes: The ntp log file shows when NTP steps the time. But then the potential harm is already done. Especially if the time moves backward, our server might have serious trouble. Is there a log event which indicates that the time is going to be reset in order to enable us to take appropriate action before the actual reset?
Actually I think: Yes. The symptom typically is a state that is different from 4 over multiple polling/update intervals (ntpq -c rl).
Ulrich
Re: [ntp:questions] Time reset
Ulrich,
The current (development) code has a new statistics file, called protostats, that records significant events, like system peer changes, mobilization/demobilization and error events. Among the events is a spike detected event, which indicates that an offset arrived greater than the step threshold (128 ms). Most of the time the spike is a one-off and is simply discarded. However, if the offset persists beyond the stepout threshold (900 s), it will result in a step correction. Some might interpret the spike event as a warning that a step might occur 15 minutes later, but then would have to determine that this is indeed not a spike but a legitimate warning.
Should it be interpreted as a warning, it is not at all clear what an application should do. For instance, some IBM mainframers recommend that all applications should be completely shut down during the changeover between standard and daylight time. On the other hand, a protostats event triggers a trap, which could be caught by a monitoring program that kills power to the machine room.
Dave
Ulrich Windl wrote:
jkvbe [EMAIL PROTECTED] writes: The ntp log file shows when NTP steps the time. But then the potential harm is already done. Especially if the time moves backward, our server might have serious trouble. Is there a log event which indicates that the time is going to be reset in order to enable us to take appropriate action before the actual reset?
Actually I think: Yes. The symptom typically is a state that is different from 4 over multiple polling/update intervals (ntpq -c rl).
Ulrich
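The spike and stepout rule described in this message can be sketched as a small state machine. This is only a toy model under stated assumptions, not ntpd's actual code: the class and method names are hypothetical, the step threshold is set to 0.128 s (ntpd's default), and the 900 s stepout is the figure quoted in the message.

```python
from dataclasses import dataclass
from typing import Optional

STEP_THRESHOLD = 0.128  # seconds; assumed default step threshold
STEPOUT = 900.0         # seconds; stepout interval quoted in the message


@dataclass
class SpikeWatcher:
    """Toy model of the spike/stepout rule (hypothetical, not ntpd code)."""

    spike_started: Optional[float] = None  # time of the first oversized offset

    def update(self, now: float, offset: float) -> str:
        """Classify one offset sample (seconds) taken at time `now` (seconds)."""
        if abs(offset) <= STEP_THRESHOLD:
            self.spike_started = None  # back in range: forget any pending spike
            return "sync"
        if self.spike_started is None:
            self.spike_started = now   # first oversized sample: treat as a spike
            return "spike"
        if now - self.spike_started >= STEPOUT:
            self.spike_started = None  # persisted past stepout: step the clock
            return "step"
        return "spike"                 # still oversized, stepout not yet reached


watcher = SpikeWatcher()
for t, off in [(0, 0.01), (64, 0.5), (128, 0.5), (964, 0.5)]:
    print(t, watcher.update(t, off))
```

In this model a monitoring program that reacts to the first "spike" result has up to the stepout interval before a step can happen, but it cannot tell in advance whether the spike will persist.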
Re: [ntp:questions] Time reset
Ulrich Windl wrote:
David Woolley [EMAIL PROTECTED] writes:
Richard B. Gilbert wrote: that people do some very strange things with computer clocks. I'm thinking, in particular, of at least one individual who deliberately set his clock to an incorrect time in order to see if Ntpd would correct it.
Many people do this. It is the naive users' way of testing that ntpd works.
Yes, but many forget to restart ntpd immediately after having changed the
They don't forget. The naive users who run this sort of test have no idea that it is an unreasonable thing to do. In fact, restarting ntpd on the client would invalidate their test, as the test is about the ability of ntpd to track time jumps without any manual intervention. I'm not suggesting that ntpd should accommodate naive user tests, but just pointing out that they are rather common.
clock. Not having done so is as if the clock suddenly got a transient hardware defect. NTP is not designed for repairing hardware defects. It syncs (good) clocks. (I had to say)
Ulrich
Re: [ntp:questions] Time reset
David Woolley [EMAIL PROTECTED] writes:
Ulrich Windl wrote:
David Woolley [EMAIL PROTECTED] writes:
Richard B. Gilbert wrote: that people do some very strange things with computer clocks. I'm thinking, in particular, of at least one individual who deliberately set his clock to an incorrect time in order to see if Ntpd would correct it.
Many people do this. It is the naive users' way of testing that ntpd works.
Yes, but many forget to restart ntpd immediately after having changed the
They don't forget. The naive users who run this sort of test have no idea that it is an unreasonable thing to do. In fact, restarting ntpd on the client would invalidate their test, as the test is about the ability of ntpd to track time jumps without any manual intervention. I'm not suggesting that ntpd should accommodate naive user tests, but just pointing out that they are rather common.
But ntp arguably should handle jumps in time or frequency faster than the hours or days it takes now.
clock. Not having done so is as if the clock suddenly got a transient hardware defect. NTP is not designed for repairing hardware defects. It syncs (good) clocks. (I had to say)
Ulrich
Re: [ntp:questions] Time reset
Danny Mayer wrote: You cannot have advanced warning before stepping. It doesn't know until it decides it needs to do it.
Yes and no. You do not know that it will definitely step until the step occurs, but you know that it definitely won't step 900 seconds earlier. It can only step if it has been in the SPIK state for 900 seconds. Note that the comment tabulating the state transitions in 4.2p4 is wrong.
Yes. You can use -x to slew always.
-x doesn't prevent stepping. What it does, in recent ntpd's, is to set the magnitude of the offset before a step will be considered to 600ms, which is a strange choice, as the maximum value for which the kernel discipline is used is 499ms! If you actually want to completely stop stepping, you need to tinker the value to the magic value of zero. Note both -x and tinkering to zero effectively turn off the kernel discipline.
Re: [ntp:questions] Time reset
Hal Murray wrote: The ntp log file shows when NTP steps the time. But then the potential harm is already done. Especially if the time moves backward, our server might have serious trouble. Is there a log event which indicates that the time is going to be reset in order to enable us to take appropriate action before the actual reset?
I don't know of any way to get advanced warning when ntpd is about to step the time.
You cannot have advanced warning before stepping. It doesn't know until it decides it needs to do it.
There are command line switches to prevent stepping and to allow one step at startup time.
Yes. You can use -x to slew always.
Danny
Re: [ntp:questions] Time reset
Danny Mayer wrote:
Hal Murray wrote: The ntp log file shows when NTP steps the time. But then the potential harm is already done. Especially if the time moves backward, our server might have serious trouble. Is there a log event which indicates that the time is going to be reset in order to enable us to take appropriate action before the actual reset?
I don't know of any way to get advanced warning when ntpd is about to step the time.
You cannot have advanced warning before stepping. It doesn't know until it decides it needs to do it.
There are command line switches to prevent stepping and to allow one step at startup time.
Yes. You can use -x to slew always. Danny
And if the clock is off by many hours, it could take days or weeks to synchronize. Use the -g option when you start ntpd. This will set the clock once only. IF your system is working properly, ntpd should be able to synchronize the clock fairly quickly and to keep it synchronized as long as it has a good source of time. If ntpd steps the time AFTER startup, you have a serious problem somewhere. IMHO the best way to deal with such stepping is to find out what's broken that makes stepping necessary and fix it!
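The cost of slewing instead of stepping can be estimated with simple arithmetic. The sketch below assumes a maximum slew rate of 500 ppm, which is the usual limit for ntpd; the function name and the sample offsets are only illustrative.

```python
MAX_SLEW_PPM = 500.0  # assumed maximum slew rate, in parts per million


def slew_time_seconds(offset_seconds: float, slew_ppm: float = MAX_SLEW_PPM) -> float:
    """Time needed to remove an offset by slewing at a constant rate."""
    return abs(offset_seconds) / (slew_ppm * 1e-6)


# Illustrative offsets: the step threshold, one second, and one hour.
for offset in (0.128, 1.0, 3600.0):
    seconds = slew_time_seconds(offset)
    print(f"offset {offset:8.3f} s -> {seconds:12.0f} s ({seconds / 86400:.2f} days)")
```

Under that assumption every second of offset costs about 2000 seconds of slewing, so an error of a full hour takes on the order of months, which is why a single step at startup with -g is the usual recommendation.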
Re: [ntp:questions] Time reset
David Woolley wrote:
Andy Helten wrote: offset never went 128ms). With time steps enabled the drift value settles 90ppm (and again, no step actually occurs).
90ms is a relatively bad static frequency error; a good machine will be around 10ms. That won't help a clean cold start. I didn't check, but did you have the default min and maxpoll values. A high minpoll might make it difficult to get the loop to converge from there without overflows.
My current problem is that drift settles at 82ppm (what I called 90 in previous email) in one run and then 32ppm in another run (with a reboot between). This is similar to the problem I had with stepping disabled where drift would go from +500ppm in one run and then swing all the way to -500ppm in another run (usually with a reboot between). I am not going to spend another minute troubleshooting this problem until we get an updated linux kernel. I will dig into it more deeply if the new kernel exhibits this same drift instability.
Our system is considered real-time and thus has many constraints on it, namely that it will run in an isolated environment with no Internet connection. Our setup runs one machine with NTP as a local stratum 1 server using an IRIG-B time source. On that machine I have minpoll set to the lowest value (16 seconds). I had to do this so that NTP would begin serving sync requests in a reasonable amount of time. Startup time is another constraint and we have other boards running as NTP clients that must sync with the NTP server before they can finish initialization. I don't set maxpoll on the server because I've never caught the server changing the polling interval from 16 seconds -- maybe it's a reference clock feature.
All other boards in the system run as NTP clients and I use minpoll 5 maxpoll 9 for them. I'm not 100% sure why I chose those values, but I think the idea was to improve NTP reaction time to changes in the synchronization environment. I'm not sure whether those poll settings achieve that, but it sounds like you are suggesting a lower minpoll may speed convergence in cases of higher drift.
Andy
Re: [ntp:questions] Time reset
[EMAIL PROTECTED] (Andy Helten) writes: David Woolley wrote: Andy Helten wrote: offset never went 128ms). With time steps enabled the drift value settles 90ppm (and again, no step actually occurs). 90ms is a relatively bad static frequency error; a good machine will be around 10ms. That won't help a clean cold start. I didn't check, but did you have the default min and maxpoll values. A high minpoll might make it difficult to get the loop to converge from there without overflows. My current problem is that drift settles at 82ppm (what I called 90 in previous email) in one run and then 32ppm in another run (with a reboot between). This is similar to the problem I had with stepping disabled where drift would go from +500ppm in one run and then swing all the way to -500ppm in another run (usually with a reboot between). I am not going to spend another minute troubleshooting this problem until we get an updated linux kernel. I will dig into it more deeply if the new kernel exhibits this same drift instability. That is an incredibly unstable clock. It is hard to imagine that this is a kernel problem. This is on one of your machines? It is not the server connected to the IRIG-B is it? Our system is considered real-time and thus has many constraints on it, namely that it will run in an isolated environment with no Internet connection. Our setup runs one machine with NTP as a local stratum 1 server using an IRIG-B time source. On that machine I have minpoll set No need for internet if you have a local clock. to the lowest (16 seconds). I had to do this so that NTP would begin serving sync requests in a reasonable amount. Startup time is another constraint and we have other boards running as NTP clients that must sync with the NTP server before they can finish initialization. I don't set maxpoll on the server because I've never caught the server changing the polling interval from 16 seconds -- maybe it's a reference clock feature. 
All other boards in the system run as NTP clients and I use minpoll 5 maxpoll 9 for them. I'm not 100% sure why I chose those values, but I think the idea was to improve NTP reaction time to changes in the synchronization environment. I'm not sure whether those poll settings achieve that, but it sounds like you are suggesting a lower minpoll may speed convergence in cases of higher drift.
No. He meant if you had minpoll say 8 or 10 it would make settling down long if the system did not start with a good drift value. However, even minpoll 5 means one data sample every 4 hours roughly (since ntp throws away roughly 7/8 of the samples in the clock_filter). That's a slow convergence. And even minpoll 4, the minimum, is only one sample every 2 hrs.
Re: [ntp:questions] Time reset
My current problem is that drift settles at 82ppm (what I called 90 in previous email) in one run and then 32ppm in another run (with a reboot between). This is similar to the problem I had with stepping disabled where drift would go from +500ppm in one run and then swing all the way to -500ppm in another run (usually with a reboot between). I am not going to spend another minute troubleshooting this problem until we get an updated linux kernel. I will dig into it more deeply if the new kernel exhibits this same drift instability.
I think we are talking about two different bugs here. The different drifts on reboot are due to a quirk in the tsc calibration code in the kernel. Grep your sys log for messages like these:
Mar 30 21:56:23 shuksan kernel: Detected 2793.091 MHz processor.
Mar 30 22:23:28 shuksan kernel: Detected 2793.067 MHz processor.
Mar 30 22:42:31 shuksan kernel: Detected 2793.037 MHz processor.
Mar 30 23:03:21 shuksan kernel: Detected 2793.085 MHz processor.
Mar 31 00:07:37 shuksan kernel: Detected 2793.147 MHz processor.
Those bottom bits jumping around correspond to the different drift values. If you only have one system, you can pick one and hack your kernel to smash in a constant value at the right place. Or you can add something like this to your boot line:
clocksource=acpi_pm
That's assuming your hardware has acpi and whatever. I've been using it for a while. I haven't noticed any quirks, but who knows.
-- These are my opinions, not necessarily my employer's. I hate spam.
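To see how much drift the scatter in those calibration messages would account for, the log lines can be parsed and each value expressed in ppm relative to the mean. This is a rough sketch: the log text is copied from the message above, while the helper names and the choice of the mean as the reference are assumptions made for illustration.

```python
import re

# Log lines copied from the message above.
LOG = """\
Mar 30 21:56:23 shuksan kernel: Detected 2793.091 MHz processor.
Mar 30 22:23:28 shuksan kernel: Detected 2793.067 MHz processor.
Mar 30 22:42:31 shuksan kernel: Detected 2793.037 MHz processor.
Mar 30 23:03:21 shuksan kernel: Detected 2793.085 MHz processor.
Mar 31 00:07:37 shuksan kernel: Detected 2793.147 MHz processor.
"""

PATTERN = re.compile(r"Detected (\d+\.\d+) MHz processor")


def calibrated_mhz(text: str) -> list:
    """Extract every calibrated CPU frequency (MHz) from kernel log text."""
    return [float(value) for value in PATTERN.findall(text)]


def ppm_offsets(values: list) -> list:
    """Express each value as a deviation from the mean, in parts per million."""
    reference = sum(values) / len(values)
    return [(v - reference) / reference * 1e6 for v in values]


values = calibrated_mhz(LOG)
offsets = ppm_offsets(values)
for mhz, ppm in zip(values, offsets):
    print(f"{mhz:.3f} MHz  {ppm:+6.1f} ppm")
print(f"spread: {max(offsets) - min(offsets):.1f} ppm")
```

For these five boots the spread comes to roughly 40 ppm, the same order as the run-to-run drift changes being discussed.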
Re: [ntp:questions] Time reset
Unruh wrote: [EMAIL PROTECTED] (Andy Helten) writes: David Woolley wrote: Andy Helten wrote: offset never went 128ms). With time steps enabled the drift value settles 90ppm (and again, no step actually occurs). 90ms is a relatively bad static frequency error; a good machine will be around 10ms. That won't help a clean cold start. I didn't check, but did you have the default min and maxpoll values. A high minpoll might make it difficult to get the loop to converge from there without overflows. My current problem is that drift settles at 82ppm (what I called 90 in previous email) in one run and then 32ppm in another run (with a reboot between). This is similar to the problem I had with stepping disabled where drift would go from +500ppm in one run and then swing all the way to -500ppm in another run (usually with a reboot between). I am not going to spend another minute troubleshooting this problem until we get an updated linux kernel. I will dig into it more deeply if the new kernel exhibits this same drift instability. That is an incredibly unstable clock. It is hard to imagine that this is a kernel problem. This is on one of your machines? It is not the server connected to the IRIG-B is it? Our system is considered real-time and thus has many constraints on it, namely that it will run in an isolated environment with no Internet connection. Our setup runs one machine with NTP as a local stratum 1 server using an IRIG-B time source. On that machine I have minpoll set No need for internet if you have a local clock. to the lowest (16 seconds). I had to do this so that NTP would begin serving sync requests in a reasonable amount. Startup time is another constraint and we have other boards running as NTP clients that must sync with the NTP server before they can finish initialization. I don't set maxpoll on the server because I've never caught the server changing the polling interval from 16 seconds -- maybe it's a reference clock feature. 
All other boards in the system run as NTP clients and I use minpoll 5 maxpoll 9 for them. I'm not 100% sure why I chose those values, but I think the idea was to improve NTP reaction time to changes in the synchronization environment. I'm not sure whether those poll settings achieve that, but it sounds like you are suggesting a lower minpoll may speed convergence in cases of higher drift.
No. He meant if you had minpoll say 8 or 10 it would make settling down long if the system did not start with a good drift value. However, even minpoll 5 means one data sample every 4 hours roughly (since ntp throws away roughly 7/8 of the samples in the clock_filter). That's a slow convergence. And even minpoll 4, the minimum, is only one sample every 2 hrs.
I must be missing something! Minpoll=5 means 2^5 seconds is the minimum poll interval. How are you getting to every four hours from that? ISTR that the default minpoll is 6 which gives 2^6 or 64 seconds. If the server lines in ntp.conf include the iburst keyword, the servers will be polled with an initial burst of eight requests sent at two second intervals. This fills the pipeline and pacifies the filter. Thereafter, ntpd adjusts the polling interval as it thinks best. Normally the poll interval will increase to somewhere between 256 and 1024 seconds once the clock is synchronized. In general, the better the network connection the higher the maximum poll interval. It's interesting to watch the performance of ntpd improve as the network quiets down during the hours when most people sleep!
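The arithmetic behind this exchange is easy to check. The sketch below converts poll exponents to seconds and, taking at face value the claim made earlier in the thread that the clock filter keeps roughly one sample in eight, estimates the interval between samples that are actually used. The function names are hypothetical and the one-in-eight factor is the thread's approximation, not a measured value.

```python
CLOCK_FILTER_KEEP_ONE_IN = 8  # approximation quoted in the thread


def poll_interval_seconds(poll_exponent: int) -> int:
    """minpoll/maxpoll are exponents of two: the interval is 2**exponent seconds."""
    return 2 ** poll_exponent


def effective_sample_interval(poll_exponent: int) -> int:
    """Interval between used samples if only one poll in eight survives the filter."""
    return poll_interval_seconds(poll_exponent) * CLOCK_FILTER_KEEP_ONE_IN


for exponent in (4, 5, 6, 8, 10):
    poll = poll_interval_seconds(exponent)
    used = effective_sample_interval(exponent)
    print(f"poll {exponent:2d}: {poll:5d} s between polls, "
          f"about {used / 60:.1f} min between used samples")
```

Under that assumption minpoll 5 gives one used sample about every 256 seconds, which is minutes rather than hours.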
Re: [ntp:questions] Time reset
On 3 apr, 23:10, Richard B. Gilbert [EMAIL PROTECTED] wrote: ... DOES the time step backward? If ntpd is working properly it should NOT need to step the time at all with the possible exception of a single step when ntpd is first started. If ntpd is stepping time regularly, you have some other problem. If you find and fix that problem, ntpd should stop stepping the time. There are/were known issues with some Linux systems; during periods of high disk usage, clock interrupts would be lost resulting in a FORWARD step. AFAIK these issues were related to EIDE disks used in PIO mode rather than DMA mode. ISTR reading that the problem has been fixed in recent versions of Linux. YMMV
I agree that ntpd should not be stepping time regularly and that it points to a problem if it happens regularly. But we develop an appliance and we don't control how customers deploy it. Given the adverse effects of stepping time (especially if it moves backwards), I would have liked to be protected against badly set-up NTP infrastructure or time servers that are compromised.
Jan
Re: [ntp:questions] Time reset
jkvbe wrote:
On 3 apr, 23:10, Richard B. Gilbert [EMAIL PROTECTED] wrote: ... DOES the time step backward? If ntpd is working properly it should NOT need to step the time at all with the possible exception of a single step when ntpd is first started. If ntpd is stepping time regularly, you have some other problem. If you find and fix that problem, ntpd should stop stepping the time. There are/were known issues with some Linux systems; during periods of high disk usage, clock interrupts would be lost resulting in a FORWARD step. AFAIK these issues were related to EIDE disks used in PIO mode rather than DMA mode. ISTR reading that the problem has been fixed in recent versions of Linux. YMMV
I agree that ntpd should not be stepping time regularly and that it points to a problem if it happens regularly. But we develop an appliance and we don't control how customers deploy it. Given the adverse effects of stepping time (especially if it moves backwards), I would have liked to be protected against badly set-up NTP infrastructure or time servers that are compromised. Jan
It seems to me that, in the circumstance you describe, supplying correct time is the customer's problem! Having read this newsgroup for the last four or five years, I'm aware that people do some very strange things with computer clocks. I'm thinking, in particular, of at least one individual who deliberately set his clock to an incorrect time in order to see if Ntpd would correct it. Ntpd did so, of course, but he was not happy with the way it was done or the amount of time it took!
If it's not under your control, it's not your responsibility! Your instructions for the appliance should point this out pretty explicitly; e.g.
IF YOUR TIME SERVERS CAUSE TIME TO STEP, THE FOLLOWING ADVERSE CONSEQUENCES CAN BE EXPECTED TO OCCUR: list of adverse consequences
It is YOUR responsibility to ensure that this does not happen!
The only halfway legitimate thing I can think of that would cause time to step would be a leap second.
Re: [ntp:questions] Time reset
Richard B. Gilbert wrote: that people do some very strange things with computer clocks. I'm thinking, in particular, of at least one individual who deliberately set his clock to an incorrect time in order to see if Ntpd would correct it.
Many people do this. It is the naive users' way of testing that ntpd works.
Ntpd did so, of course, but he was not happy with the way it was done or the amount of time it took!
Re: [ntp:questions] Time reset
Hal wrote:
My current problem is that drift settles at 82ppm (what I called 90 in previous email) in one run and then 32ppm in another run (with a reboot between). This is similar to the problem I had with stepping disabled where drift would go from +500ppm in one run and then swing all the way to -500ppm in another run (usually with a reboot between). I am not going to spend another minute troubleshooting this problem until we get an updated linux kernel. I will dig into it more deeply if the new kernel exhibits this same drift instability.
I think we are talking about two different bugs here. The different drifts on reboot are due to a quirk in the tsc calibration code in the kernel. Grep your sys log for messages like these:
Mar 30 21:56:23 shuksan kernel: Detected 2793.091 MHz processor.
Mar 30 22:23:28 shuksan kernel: Detected 2793.067 MHz processor.
Mar 30 22:42:31 shuksan kernel: Detected 2793.037 MHz processor.
Mar 30 23:03:21 shuksan kernel: Detected 2793.085 MHz processor.
Mar 31 00:07:37 shuksan kernel: Detected 2793.147 MHz processor.
Those bottom bits jumping around correspond to the different drift values. If you only have one system, you can pick one and hack your kernel to smash in a constant value at the right place. Or you can add something like this to your boot line:
clocksource=acpi_pm
That's assuming your hardware has acpi and whatever. I've been using it for a while. I haven't noticed any quirks, but who knows.
YES! The slight variation in measured CPU speed seems to explain my continued drift instability (where continued means even with stepping enabled). I was able to retrieve four CPU speed measurements that had corresponding NTP loop logs. The table below shows the perfect correlation between linux-measured CPU speed and NTP-measured drift. Clearly the real CPU speed is somewhere around 2000.200 MHz.
measured CPU speed | measured drift
      (MHz)        |     (ppm)
-------------------+---------------
     2000.153      |      -23
     2000.215      |        8
     2000.321      |       61
     2000.367      |       84
As I've stated before, I don't believe the oscillator is really this unstable, but I could be wrong. After all, my CPU measurements varied much more than yours, especially from one run to the next. However, I'm still open to the possibility that linux's approach to speed measurement is less than perfect (at least for my version of linux). These measurements were on a core 2 duo (2 processors) running RedHawk linux 2.6.18.8. Hal, can you tell me which version of linux resulted in your list of speed measurements?
I also wonder if the use of two processors has any impact on this behavior. I tried forcing CPU affinity for the NTP process, but it didn't have any effect on the measured drift value. This means that either there truly is no difference between CPUs (as in different speed/frequency characteristics) or I wasn't actually moving the process between CPUs (using /proc/pid/affinity). I'm assuming both CPUs have the same oscillator, so it makes sense that they would measure the same drift.
Thanks, Andy
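The correlation claimed in this message can be checked with a least-squares line through the four table entries. The data points are taken from the table above; the function names are hypothetical, and a plain linear fit is only one reasonable way to look at four points.

```python
# (measured CPU speed in MHz, measured drift in ppm), copied from the table above.
MEASUREMENTS = [(2000.153, -23.0), (2000.215, 8.0), (2000.321, 61.0), (2000.367, 84.0)]


def fit_line(points):
    """Least-squares fit; returns (slope, x_mean, y_mean) for y = slope*(x - x_mean) + y_mean."""
    n = len(points)
    x_mean = sum(x for x, _ in points) / n
    y_mean = sum(y for _, y in points) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in points)
    sxx = sum((x - x_mean) ** 2 for x, _ in points)
    return sxy / sxx, x_mean, y_mean


def zero_drift_mhz(points):
    """CPU speed at which the fitted line predicts zero drift."""
    slope, x_mean, y_mean = fit_line(points)
    return x_mean - y_mean / slope


slope, _, _ = fit_line(MEASUREMENTS)
print(f"slope: {slope:.1f} ppm per MHz")
print(f"zero-drift speed: {zero_drift_mhz(MEASUREMENTS):.3f} MHz")
```

A 1 MHz error on a 2000 MHz clock is 500 ppm, so a fitted slope near 500 ppm per MHz is what a pure calibration error would produce, and the zero-drift crossing lands close to the 2000.200 MHz estimate given in the message.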
Re: [ntp:questions] Time reset
As I've stated before, I don't believe the oscillator is really this unstable, but I could be wrong. After all, my CPU measurements varied much more than yours, especially from one run to the next. However, I'm still open to the possibility that linux's approach to speed measurement is less than perfect (at least for my version of linux). These measurements were on a core 2 duo (2 processors) running RedHawk linux 2.6.18.8. Hal, can you tell me which version of linux resulted in your list of speed measurements?
Your crystal is probably fine. At one point, I hacked my kernel to call the calibration routine several times and print out the answer. A batch of answers from the same time (and hopefully same temperature) had the same sort of scatter. I'm running 2.6.23 with a few local hacks. 2.6.19 has similar problems.
-- These are my opinions, not necessarily my employer's. I hate spam.
Re: [ntp:questions] Time reset
The ntp log file shows when NTP steps the time. But then the potential harm is already done. Especially if the time moves backward, our server might have serious trouble. Is there a log event which indicates that the time is going to be reset in order to enable us to take appropriate action before the actual reset?
I don't know of any way to get advanced warning when ntpd is about to step the time. There are command line switches to prevent stepping and to allow one step at startup time. The disadvantage with preventing steps is that it might take a long time to correct the time. But if you start with good time your clock will never get off far enough to cause problems. Is there a wiki page on this topic?
-- These are my opinions, not necessarily my employer's. I hate spam.
Re: [ntp:questions] Time reset
Hal Murray wrote:
> > The ntp log file shows when NTP steps the time. But then the potential harm is already done. Especially if the time moves backward, our server might have serious trouble. Is there a log event which indicates that the time is going to be reset in order to enable us to take appropriate action before the actual reset?
>
> I don't know of any way to get advance warning when ntpd is about to step the time.
>
> There are command line switches to prevent stepping and to allow one step at startup time. The disadvantage with preventing steps is that it might take a long time to correct the time. But if you start with good time your clock will never get off far enough to cause problems.
>
> Is there a wiki page on this topic?

Another disadvantage of preventing steps is that it isn't really a supported mode (because it's a tinker) and, as I've found, it doesn't always work. When I disable time steps on a Linux 2.6.18 kernel, the drift value goes to +/-500 and can actually swap sign from one run to the next. This happens even though a time step was never needed (i.e. the offset never reached 128ms). With time steps enabled, the drift value settles at 90ppm (and again, no step actually occurs).

From what I've been able to piece together, this difference in behavior between step and !step is probably due to the kernel time discipline being disabled with !step, coupled with a (potential) bug in Linux that forces NTP's manual adjustments to have a granularity of 1ms (i.e. somewhere an adjustment is rounded up or down). I've not verified that the bug is present in my 2.6.18 Linux kernel, so don't quote me on it. One might ask why the kernel time discipline is preemptively disabled in this manner -- maybe there is a good reason.

Our application also does not currently handle backward time steps. Our workaround for the problematic !step is to recognize, as others on this list have pointed out, that a time step should never occur in a normally functioning system. If a step does occur, we probably have bigger problems than those caused by the step itself, such as lost timer interrupts, failing hardware, a runaway process, a kernel bug, an NTP bug, etc.

Andy
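As a rough illustration of the thresholds involved: by default ntpd slews offsets up to 128 ms, considers a step above that, and gives up above a panic threshold of 1000 s unless told otherwise. The sketch below is deliberately simplified. It ignores the stepout interval that ntpd waits before actually stepping, and the function and constant names are illustrative, not ntpd's own.

```python
STEP_THRESHOLD_S = 0.128    # default step threshold (128 ms)
PANIC_THRESHOLD_S = 1000.0  # default panic threshold


def classify_offset(offset_s, step=STEP_THRESHOLD_S, panic=PANIC_THRESHOLD_S):
    """Return 'slew', 'step' or 'panic' for a measured offset in seconds.

    Passing step=0 models the "steps disabled" configuration discussed
    above: every offset below the panic threshold is then slewed.
    """
    magnitude = abs(offset_s)
    if magnitude > panic:
        return "panic"
    if step > 0 and magnitude > step:
        return "step"
    return "slew"
```

With the defaults, `classify_offset(2.22004)` returns `"step"`, matching the 2.22 s reset in the log quoted in this thread, while `classify_offset(2.22004, step=0)` returns `"slew"`.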
Re: [ntp:questions] Time reset
Hi Hal,

Could you post the TWiki URL once you have put it up?

Venu

Hal Murray wrote:
> > So it's the disk I/O that's causing loss of ticks?
>
> My first Red Hat system defaulted to no-DMA for IDE disks. (Yes, that was a long time ago.) With that setup, it was simple to generate lots of lost timer interrupts: just keep the disk busy doing reads. (Seeks don't count. Read consecutive sectors.) My problem went away when I turned on DMA.
>
> I have a hack that I originally wrote for measuring disk transfer rates. I'll dust it off and put it on the wiki if people think it will be helpful for discussions like this.
>
> It compiles on Linux. It should be easy to get it to work on other *nix systems.
Re: [ntp:questions] Time reset
D.Venu Gopal wrote:
> David Woolley wrote:
> > Venu Gopal wrote:
> > > It's clear that CPU is heavily loaded which might be leading to loss of ticks.
> >
> > CPU loading doesn't cause lost timer interrupts. (More precisely overruns.)
>
> So it's the disk I/O that's causing loss of ticks?
>
> > > Yet to check the DMA status for IDE DISK. I'll be out for about a week, after returning I'll give few more stats based on combination of CPU load, Disk I/O and Network I/O.

There have been several examples of drivers which kept interrupts disabled for too long, so that timer ticks couldn't get through. In most cases (AFAIK) these have been drivers for IDE disks, especially if they didn't use DMA.

This has happened across several operating systems (I remember Linux, Windows, and formerly OS/2), with drivers which had not been designed properly. So this depends on a specific version of the OS and a specific version of a specific driver. You cannot say in general that IDE drivers cause lost timer ticks, but they are good candidates.

Unfortunately, the new clock routines in the Linux kernel seem to be causing problems sometimes. This seems to be due to certain combinations of a clock module, which handles a particular timer on the mainboard, and the timer itself, whose implementation may vary by chipset. This is not exactly the same as lost timer ticks, but the results are similar, i.e. the clock drift can be so large, or change so much, that ntpd fails to correct it.

Martin
--
Martin Burnicki
Meinberg Funkuhren
Bad Pyrmont
Germany
Re: [ntp:questions] Time reset
> So it's the disk I/O that's causing loss of ticks?

My first Red Hat system defaulted to no-DMA for IDE disks. (Yes, that was a long time ago.) With that setup, it was simple to generate lots of lost timer interrupts: just keep the disk busy doing reads. (Seeks don't count. Read consecutive sectors.) My problem went away when I turned on DMA.

I have a hack that I originally wrote for measuring disk transfer rates. I'll dust it off and put it on the wiki if people think it will be helpful for discussions like this.

It compiles on Linux. It should be easy to get it to work on other *nix systems.

--
These are my opinions, not necessarily my employer's. I hate spam.
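Hal's program is not reproduced in this thread. As a stand-in, here is a minimal Python sketch of the same idea: read a file sequentially in large chunks to keep the disk busy, and report the transfer rate. The function names and the 1 MiB chunk size are assumptions for illustration. To exercise the disk rather than the page cache, the file would have to be larger than RAM, or a raw device.

```python
import time


def sequential_read(path, chunk_size=1 << 20):
    """Read path from start to end; return (bytes_read, elapsed_seconds)."""
    total = 0
    start = time.monotonic()
    # buffering=0 gives an unbuffered binary file, so each read() call
    # goes straight to the operating system.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty bytes object means end of file
                break
            total += len(chunk)
    return total, time.monotonic() - start


def report(path):
    """Return a one-line summary of the sequential read rate for path."""
    nbytes, secs = sequential_read(path)
    rate = nbytes / secs / 1e6 if secs > 0 else float("inf")
    return f"{nbytes} bytes in {secs:.3f} s ({rate:.1f} MB/s)"
```

Running `print(report("/path/to/large/file"))` while watching ntpd's offset would show whether sustained sequential reads coincide with the clock falling behind.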
Re: [ntp:questions] Time reset
Hal Murray wrote:
> > So it's the disk I/O that's causing loss of ticks?
>
> My first Red Hat system defaulted to no-DMA for IDE disks. (Yes, that was a long time ago.) With that setup, it was simple to generate lots of lost timer interrupts: just keep the disk busy doing reads. (Seeks don't count. Read consecutive sectors.) My problem went away when I turned on DMA.
>
> I have a hack that I originally wrote for measuring disk transfer rates. I'll dust it off and put it on the wiki if people think it will be helpful for discussions like this.

OK Hal. This will certainly be helpful.

> It compiles on Linux. It should be easy to get it to work on other *nix systems.
Re: [ntp:questions] Time reset
David Woolley wrote:
> Venu Gopal wrote:
> > It's clear that CPU is heavily loaded which might be leading to loss of ticks.
>
> CPU loading doesn't cause lost timer interrupts. (More precisely overruns.)

So it's the disk I/O that's causing loss of ticks?

> > Yet to check the DMA status for IDE DISK. I'll be out for about a week, after returning I'll give few more stats based on combination of CPU load, Disk I/O and Network I/O.
Re: [ntp:questions] Time reset
Venu Gopal [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]
[...]
> So it's true that when CPU load is high, the kernel might be losing ticks. When I repeated the same in other clients the drift was in the order of a few milliseconds. I suppose it has something to do with the amount of CPU load and disk I/O when crond performs its tasks.

More often disk I/O than CPU load. And then often because DMA is disabled. Could you check that?

Groetjes,
Maarten Wiltink
Re: [ntp:questions] Time reset
Hi,

crond has only one weekly task: updating the whatis database. It is scheduled to run makewhatis with the path for the man pages.

Venu
Re: [ntp:questions] Time reset
Hi,

I need to experiment a bit to give some stats on CPU load and disk I/O when the crond tasks are run.

Venu
Re: [ntp:questions] Time reset
[EMAIL PROTECTED] (Venu Gopal) wrote in news:[EMAIL PROTECTED]:
> Hi, I need to experiment a bit to give some stats on CPU load and DISK I/O when crond tasks are run.
> Venu

Hi all,

To measure the CPU load and disk I/O, I used iostat while running cron's weekly task of updating the whatis database. Below is a slice of the log:

avg-cpu:  %user   %nice    %sys   %idle
          77.00    0.00   16.00    7.00
Device:      tps  Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn
dev3-0    132.50     1032.00       68.00      2064       136

avg-cpu:  %user   %nice    %sys   %idle
          91.00    0.00    4.50    4.50
Device:      tps  Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn
dev3-0     50.50      296.00     2492.00       592      4984

avg-cpu:  %user   %nice    %sys   %idle
          82.00    0.00    2.00   16.00
Device:      tps  Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn
dev3-0     43.50      420.00        0.00       840         0

avg-cpu:  %user   %nice    %sys   %idle
          99.00    0.00    1.00    0.00
Device:      tps  Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn
dev3-0      1.00        4.00        4.00         8         8

avg-cpu:  %user   %nice    %sys   %idle
          98.00    0.00    2.00    0.00
Device:      tps  Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn
dev3-0     22.00        0.00     2412.00         0      4824

avg-cpu:  %user   %nice    %sys   %idle
          97.50    0.00    2.50    0.00
Device:      tps  Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn
dev3-0      1.00        0.00        8.00         0        16

avg-cpu:  %user   %nice    %sys   %idle
          98.50    0.00    1.50    0.00
Device:      tps  Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn
dev3-0     59.50        0.00     3340.00         0      6680

avg-cpu:  %user   %nice    %sys   %idle
          98.00    0.00    2.00    0.00
Device:      tps  Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn
dev3-0      1.00        0.00        8.00         0        16

avg-cpu:  %user   %nice    %sys   %idle
          14.00    0.00    1.00   85.00
Device:      tps  Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn
dev3-0      4.00       36.00        8.00        72        16

It's clear that the CPU is heavily loaded, which might be leading to loss of ticks. I have yet to check the DMA status for the IDE disk. I'll be out for about a week; after returning I'll give a few more stats based on combinations of CPU load, disk I/O and network I/O.

Venu
Re: [ntp:questions] Time reset
Venu Gopal wrote:
> It's clear that CPU is heavily loaded which might be leading to loss of ticks.

CPU loading doesn't cause lost timer interrupts. (More precisely, overruns.)

> Yet to check the DMA status for IDE DISK. I'll be out for about a week, after returning I'll give few more stats based on combination of CPU load, Disk I/O and Network I/O.
Re: [ntp:questions] Time reset
Hi Harlan,

It's an 'HP COMPAQ DX200 MT' running RedHat-9.0 (Linux-2.4.20-8). The previous machine was a similar one, where a time reset used to happen at least once a day.

I referred to http://support.ntp.org/Support for the troubleshooting pages. I tried to get the system manuals, but the model has been phased out and no documentation is available on the HP/COMPAQ sites. I am trying to find the material supplied along with the machines.

But is there a way to debug this problem? It has been a long-standing problem (almost 2 years!).

Venu
Re: [ntp:questions] Time reset
In article [EMAIL PROTECTED], [EMAIL PROTECTED] (Venu Gopal) writes:

Venu> Hi Harlan, Its 'HP COMPAQ DX200 MT' running RedHat-9.0
Venu> (Linux-2.4.20-8). The previous machine is a similar one where time
Venu> reset used to happen at least once a day.

What about:

http://support.ntp.org/bin/view/Support/KnownOsIssues#Section_9.2.3.1.

which talks about a lost interrupt problem for that kernel.

--
Harlan Stenn [EMAIL PROTECTED]
http://ntpforum.isc.org - be a member!
Re: [ntp:questions] Time reset
Hi Harlan,

Thanks a lot for the references.

Venu
[ntp:questions] Time reset
Hi,

We have a time server with about seven clients on a LAN. On only one client, we observe that the time is reset almost daily. We have replaced that client with a new system. It no longer happens daily, but a time reset has occurred once. Below is the ntp log:

2 Mar 02:33:52 ntpd[1566]: offset -0.33 sec freq 0.245 ppm error 0.20 poll 4
2 Mar 03:33:52 ntpd[1566]: offset 0.10 sec freq 0.267 ppm error 0.17 poll 4
2 Mar 04:33:52 ntpd[1566]: offset -0.20 sec freq 0.232 ppm error 0.057544 poll 4
2 Mar 04:37:56 ntpd[1566]: time reset 2.220040 s
2 Mar 04:37:56 ntpd[1566]: synchronisation lost
2 Mar 04:37:56 ntpd[1566]: system event 'event_clock_reset' (0x05) status 'leap_none, sync_unspec, 15 events, event_peer/strat_chg' (0xf4)
2 Mar 04:37:56 ntpd[1566]: system event 'event_peer/strat_chg' (0x04) status 'leap_none, sync_unspec, 15 events, event_clock_reset' (0xf5)
2 Mar 04:37:57 ntpd[1566]: peer 10.10.10.10 event 'event_reach' (0x84) status 'unreach, conf, 5 events, event_reach' (0x8054)
2 Mar 04:38:46 ntpd[1566]: peer 10.10.10.11 event 'event_reach' (0x84) status 'unreach, conf, 5 events, event_reach' (0x8054)
2 Mar 04:38:50 ntpd[1566]: peer 10.10.10.12 event 'event_reach' (0x84) status 'unreach, conf, 7 events, event_reach' (0x8074)
2 Mar 05:33:54 ntpd[1566]: offset -0.000327 sec freq 3.477 ppm error 0.15 poll 4
2 Mar 06:33:54 ntpd[1566]: offset -0.09 sec freq 0.246 ppm error 0.19 poll 4
2 Mar 07:33:54 ntpd[1566]: offset 0.05 sec freq 0.246 ppm error 0.14 poll 4

In the third line, the error value jumped from 0.17 to 0.057544. Could this be related to misbehaving clock interrupts?

Venu
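Since ntpd gives no advance warning of a step, the practical option is to detect steps after the fact. Below is a minimal sketch that scans log lines in the format shown above for `time reset` entries. The regular expression is inferred only from this sample and may not match other ntpd versions or syslog formats; the function name is illustrative.

```python
import re

# Matches lines such as: "2 Mar 04:37:56 ntpd[1566]: time reset 2.220040 s"
RESET_RE = re.compile(
    r"^\s*(?P<stamp>\d{1,2} \w{3} \d{2}:\d{2}:\d{2}) ntpd\[\d+\]: "
    r"time reset (?P<offset>[+-]?\d+(?:\.\d+)?) s\b"
)


def find_time_resets(lines):
    """Return a list of (timestamp, offset_seconds) for each 'time reset' line."""
    resets = []
    for line in lines:
        match = RESET_RE.match(line)
        if match:
            resets.append((match.group("stamp"), float(match.group("offset"))))
    return resets


# Three lines copied from the log above.
SAMPLE = [
    "2 Mar 04:33:52 ntpd[1566]: offset -0.20 sec freq 0.232 ppm error 0.057544 poll 4",
    "2 Mar 04:37:56 ntpd[1566]: time reset 2.220040 s",
    "2 Mar 04:37:56 ntpd[1566]: synchronisation lost",
]
```

Calling `find_time_resets(SAMPLE)` picks out the single reset at 04:37:56 with its 2.22 s offset. The same function can be fed an open log file, since it only iterates over lines.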
Re: [ntp:questions] Time reset
Venu,

What hardware and OS is this?

Have you seen the troubleshooting pages (known hardware and OS problems) at http://support.ntp.org/Support ?

--
Harlan Stenn [EMAIL PROTECTED]
http://ntpforum.isc.org - be a member!