Re: [ntp:questions] Time reset

2008-05-05 Thread Evandro Menezes
On another note, I wish that NTP had an option to disable backward
jumps in time.  Is it something that has been considered before?

TIA

___
questions mailing list
questions@lists.ntp.org
https://lists.ntp.org/mailman/listinfo/questions


Re: [ntp:questions] Time reset

2008-05-05 Thread Richard B. Gilbert
Evandro Menezes wrote:
 On another note, I wish that NTP had an option to disable backward
 jumps in time.  Is it something that has been considered before?
 
 TIA

If you are seeing backward jumps on a running ntpd, something is VERY 
wrong.  Normally the only time you will see a jump in either direction 
is when ntpd initializes.  If you keep your server running 24x7 you 
should not see jumps at any other time.  Some Linux systems have been 
reported to lose clock interrupts during periods of heavy disk activity 
but this results in forward steps.



Re: [ntp:questions] Time reset

2008-05-03 Thread David J Taylor
Unruh wrote:
[]
 But ntp arguably should handle jumps in time or frequency faster than
 the hours or days it takes now.

Why?  A jump (to me) implies an error in the hardware.  Who is to say that 
it's a one-off jump, or that it might not occur again in a few seconds, 
minutes or hours?  Should it not be the hardware which is corrected, not 
NTP?

G

Cheers,
David 




Re: [ntp:questions] Time reset

2008-05-03 Thread David Woolley
David J Taylor wrote:
 Unruh wrote:
 []
 But ntp arguably should handle jumps in time or frequency faster than
 the hours or days it takes now.
 
 Why?  A jump (to me) implies an error in the hardware.  Who is to say that 
 it's a one-off jump, or that it might not occur again in a few seconds, 
 minutes or hours?  Should it not be the hardware which is corrected, not 
 NTP?

I think one has to distinguish between jumps in time and in frequency. 
The naive test uses a jump in time, which is not a real life situation 
for serviceable equipment.  However, jumps in frequency are quite 
possible as the result of rapid temperature changes, and as the results 
of aging processes in the electronics.

Unruh's main issue with ntpd is actually its response to frequency 
transients, not its response to phase transients, and he is diluting his 
case by introducing phase transients.  The one exception to this is 
ntpd's poor response to the startup phase transient.



Re: [ntp:questions] Time reset

2008-05-03 Thread Unruh
David J Taylor [EMAIL PROTECTED] writes:

Unruh wrote:
[]
 But ntp arguably should handle jumps in time or frequency faster than
 the hours or days it takes now.

Why?  A jump (to me) implies an error in the hardware.  Who is to say that 

Or an error in software (missed timer ticks), or...

it's a one-off jump, or that it might not occur again in a few seconds, 

Yes, who is to say. So ntp should correct each one as quickly as possible. 


minutes or hours?  Should it not be the hardware which is corrected, not 
NTP?

On that argument ntp should not do anything more than rdate did, since if
the oscillator on the clocks were perfect there would be no need for
anything but a single setting of the clock once per bootup. So ntp should
not try to correct any errors but rather people should be advised to buy
new hardware. The whole point to ntp is to try to keep the clocks operating
as close to real time as possible in the real world-- with bad oscillators,
time jumps in the OS, 
Agreed it ain't perfect, the question is how close to perfection can we get
it.


G

Cheers,
David 




Re: [ntp:questions] Time reset

2008-05-02 Thread Ulrich Windl
jkvbe [EMAIL PROTECTED] writes:

 The ntp log file shows when NTP steps the time. But then the potential harm
 is already done. Especially if the time moves backward, our server might
 have serious trouble. Is there a log event which indicates that the time is
 going to be reset in order to enable us to take appropriate action before
 the actual reset?

Actually I think: Yes. The symptom typically is a state that is different
from 4 over multiple polling/update intervals (ntpq -c rl).
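
A monitoring script along these lines could watch for the condition
described above. This is only a sketch: it assumes the output of
"ntpq -c rl" contains a field of the form state=N among its
comma-separated variables, and the sample text is made up for
illustration rather than captured from a real server. The function
names are hypothetical.

```python
import re

def clock_state(rl_output):
    """Return the integer after 'state=' in ntpq-style output, or None."""
    match = re.search(r"\bstate=(\d+)", rl_output)
    return int(match.group(1)) if match else None

def looks_unsettled(states, settled=4):
    """True if the two most recent polls both reported a state other
    than the settled one (4, per the post above)."""
    recent = states[-2:]
    return len(recent) == 2 and all(s != settled for s in recent)

# Hypothetical fragment, NOT real ntpq output.
sample = "stratum=2, precision=-20, state=4, offset=0.123"
print(clock_state(sample))
```

How many consecutive non-4 readings should count as a warning is a
judgment call; two is used here only to keep the example short.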

Ulrich



Re: [ntp:questions] Time reset

2008-05-02 Thread David L. Mills
Ulrich,

The current (development) code has a new statistics file, called 
protostats, that records significant events, like system peer changes, 
mobilization/demobilization and error events. Among the events is a 
spike detected event, which indicates that an offset arrived greater 
than the step threshold (125 ms). Most of the time the spike is one-off 
and is simply discarded. However, if the offset persists beyond the 
stepout threshold (900 s), it will result in a step correction.
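
The rule in the paragraph above can be modelled in a few lines. This is
a simplified sketch for illustration, not ntpd's actual state machine:
the function name and the event labels are invented here, and the
thresholds are simply the figures quoted above.

```python
STEP_THRESHOLD = 0.125  # seconds; figure quoted above (other posts cite 128 ms)
STEPOUT = 900.0         # seconds an over-threshold offset must persist

def classify(samples):
    """samples: iterable of (time_s, offset_s) pairs in time order.
    Returns a list of (time_s, event), event being 'spike' or 'step'."""
    events = []
    spike_start = None
    for t, offset in samples:
        if abs(offset) <= STEP_THRESHOLD:
            spike_start = None          # back in range: earlier spike was one-off
            continue
        if spike_start is None:
            spike_start = t             # first over-threshold sample
            events.append((t, "spike"))
        elif t - spike_start >= STEPOUT:
            events.append((t, "step"))  # persisted past stepout: step the clock
            spike_start = None
    return events

print(classify([(0, 0.01), (64, 0.3), (128, 0.01)]))
print(classify([(0, 0.3), (512, 0.3), (1024, 0.3)]))
```

The first call reports only a spike at t=64; the second reports a spike
at t=0 followed by a step at t=1024, the first sample at or beyond 900
seconds after the spike began.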

Some might interpret the spike event as a warning that a step might 
occur 15 minutes later, but then would have to determine that this is 
indeed not a spike but a legitimate warning. Should it be interpreted as 
a warning, it is not at all clear what an application should do. For 
instance, some IBM mainframers recommend that all applications should 
be completely shut down during the changeover between standard and 
daylight time. On the other hand, a protostats event triggers a trap, 
which could be caught by a monitoring program that kills power to the 
machine room.

Dave

Ulrich Windl wrote:

 jkvbe [EMAIL PROTECTED] writes:
 
 
The ntp log file shows when NTP steps the time. But then the potential harm
is already done. Especially if the time moves backward, our server might
have serious trouble. Is there a log event which indicates that the time is
going to be reset in order to enable us to take appropriate action before
the actual reset?
 
 
 Actually I think: Yes. The symptom typically is a state that is different
 from 4 over multiple polling/update intervals (ntpq -c rl).
 
 Ulrich



Re: [ntp:questions] Time reset

2008-05-02 Thread David Woolley
Ulrich Windl wrote:
 David Woolley [EMAIL PROTECTED] writes:
 
 Richard B. Gilbert wrote:

 that people do some very strange things with computer clocks.  I'm thinking,
 in particular, of at least one individual who deliberately set his clock to
 an incorrect time in order to see if Ntpd would correct it.
 Many people do this.  It is the naive users' way of testing that ntpd 
 works.
 
 Yes, but many forget to restart ntpd immediately after having changed the

They don't forget.  The naive users who run this sort of test have no 
idea that it is an unreasonable thing to do.  In fact, restarting ntpd 
on the client would invalidate their test, as the test is about the 
ability of ntpd to track time jumps without any manual intervention.

I'm not suggesting that ntpd should accommodate naive user tests, but 
just pointing out that they are rather common.

 clock. Not having done so is as if the clock suddenly got a transient hardware
 defect. NTP is not designed for repairing hardware defects. It syncs (good)
 clocks.
 
 (I had to say)
 
 Ulrich



Re: [ntp:questions] Time reset

2008-05-02 Thread Unruh
David Woolley [EMAIL PROTECTED] writes:

Ulrich Windl wrote:
 David Woolley [EMAIL PROTECTED] writes:
 
 Richard B. Gilbert wrote:

 that people do some very strange things with computer clocks.  I'm 
 thinking,
 in particular, of at least one individual who deliberately set his clock to
 an incorrect time in order to see if Ntpd would correct it.
 Many people do this.  It is the naive users' way of testing that ntpd 
 works.
 
 Yes, but many forget to restart ntpd immediately after having changed the

They don't forget.  The naive users who run this sort of test have no 
idea that it is an unreasonable thing to do.  In fact, restarting ntpd 
on the client would invalidate their test, as the test is about the 
ability of ntpd to track time jumps without any manual intervention.

I'm not suggesting that ntpd should accommodate naive user tests, but 
just pointing out that they are rather common.

But ntp arguably should handle jumps in time or frequency faster than the
hours or days it takes now. 



 clock. Not having done so is as if the clock suddenly got a transient 
 hardware
 defect. NTP is not designed for repairing hardware defects. It syncs (good)
 clocks.
 
 (I had to say)
 
 Ulrich



Re: [ntp:questions] Time reset

2008-04-06 Thread David Woolley
Danny Mayer wrote:
 
 You cannot have advanced warning before stepping. It doesn't know until 
 it decides it needs to do it.

Yes and no.  You do not know that it will definitely step until the step 
occurs, but you know that it definitely won't step 900 seconds earlier. 
  It can only step if it has been in the SPIK state for 900 seconds. 
Note that the comment tabulating the state transitions in 4.2p4 is wrong.

 
 Yes. You can use -x to slew always.

-x doesn't prevent stepping.  What it does, in recent ntpd's, is to set 
the magnitude of the offset before a step will be considered to 600ms, 
which is a strange choice, as the maximum value for which the kernel 
discipline is used is 499ms!

If you actually want to completely stop stepping, you need to tinker the 
value to the magic value of zero.

Note both -x and tinkering to zero effectively turn off the kernel 
discipline.



Re: [ntp:questions] Time reset

2008-04-05 Thread Danny Mayer
Hal Murray wrote:
 The ntp log file shows when NTP steps the time. But then the potential harm
 is already done. Especially if the time moves backward, our server might
 have serious trouble. Is there a log event which indicates that the time is
 going to be reset in order to enable us to take appropriate action before
 the actual reset?
 
 I don't know of any way to get advanced warning when ntpd is about to
 step the time.
 

You cannot have advanced warning before stepping. It doesn't know until 
it decides it needs to do it.

 There are command line switches to prevent stepping and to allow
 one step at startup time.
 

Yes. You can use -x to slew always.

Danny


Re: [ntp:questions] Time reset

2008-04-05 Thread Richard B. Gilbert
Danny Mayer wrote:
 Hal Murray wrote:
 
The ntp log file shows when NTP steps the time. But then the potential harm
is already done. Especially if the time moves backward, our server might
have serious trouble. Is there a log event which indicates that the time is
going to be reset in order to enable us to take appropriate action before
the actual reset?

I don't know of any way to get advanced warning when ntpd is about to
step the time.

 
 
 You cannot have advanced warning before stepping. It doesn't know until 
 it decides it needs to do it.
 
 
There are command line switches to prevent stepping and to allow
one step at startup time.

 
 
 Yes. You can use -x to slew always.
 
 Danny

And if the clock is off by many hours, it could take days or weeks to 
synchronize.
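
To put rough numbers on that, here is a small sketch. It assumes the
slew rate is capped at 500 ppm, which is the usual limit for ntpd; the
function name is purely illustrative.

```python
MAX_SLEW_PPM = 500  # assumed cap on the slew rate, in parts per million

def slew_seconds(offset_seconds, slew_ppm=MAX_SLEW_PPM):
    """Wall-clock seconds needed to slew away an offset at a fixed rate."""
    return abs(offset_seconds) * 1_000_000 / slew_ppm

for label, offset in (("1 second", 1.0), ("1 minute", 60.0), ("1 hour", 3600.0)):
    days = slew_seconds(offset) / 86400
    print(f"{label:>8} off: {slew_seconds(offset):>10.0f} s ({days:.2f} days)")
```

At that rate each second of error costs 2000 seconds of slewing, so an
error of one hour works out to roughly 83 days.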

Use the -g option when you start ntpd.  This will set the clock once 
only.  IF your system is working properly, ntpd should be able to 
synchronize the clock fairly quickly and to keep it synchronized as long 
as it has a good source of time.

If ntpd steps the time AFTER startup, you have a serious problem 
somewhere.  IMHO the best way to deal with such stepping is to find out 
what's broken that makes stepping necessary and fix it!



Re: [ntp:questions] Time reset

2008-04-04 Thread Andy Helten

David Woolley wrote:
 Andy Helten wrote:
   
 offset never exceeded 128ms).  With time steps enabled the drift value
 settles at about 90ppm (and again, no step actually occurs).
 

 90ppm is a relatively bad static frequency error; a good machine will be 
 around 10ppm.  That won't help a clean cold start.

 I didn't check, but did you have the default min and maxpoll values.  A 
 high minpoll might make it difficult to get the loop to converge from 
 there without overflows.
   

My current problem is that drift settles at 82ppm (what I called 90 in
previous email) in one run and then 32ppm in another run (with a reboot
between).  This is similar to the problem I had with stepping disabled
where drift would go from +500ppm in one run and then swing all the way
to -500ppm in another run (usually with a reboot between).  I am not
going to spend another minute troubleshooting this problem until we get
an updated linux kernel.  I will dig into it more deeply if the new
kernel exhibits this same drift instability.

Our system is considered real-time and thus has many constraints on
it, namely that it will run in an isolated environment with no Internet
connection.  Our setup runs one machine with NTP as a local stratum 1
server using an IRIG-B time source.  On that machine I have minpoll set
to the lowest (16 seconds).  I had to do this so that NTP would begin
serving sync requests in a reasonable amount of time.  Startup time is another
constraint and we have other boards running as NTP clients that must
sync with the NTP server before they can finish initialization.  I don't
set maxpoll on the server because I've never caught the server changing
the polling interval from 16 seconds -- maybe it's a reference clock
feature.

All other boards in the system run as NTP clients and I use minpoll 5
maxpoll 9 for them.  I'm not 100% sure why I chose those values, but I
think the idea was to improve NTP reaction time to changes in the
synchronization environment.  I'm not sure whether those poll settings
achieve that, but it sounds like you are suggesting a lower minpoll may
speed convergence in cases of higher drift.

Andy



Re: [ntp:questions] Time reset

2008-04-04 Thread Unruh
[EMAIL PROTECTED] (Andy Helten) writes:


David Woolley wrote:
 Andy Helten wrote:
   
 offset never exceeded 128ms).  With time steps enabled the drift value
 settles at about 90ppm (and again, no step actually occurs).
 

 90ppm is a relatively bad static frequency error; a good machine will be 
 around 10ppm.  That won't help a clean cold start.

 I didn't check, but did you have the default min and maxpoll values.  A 
 high minpoll might make it difficult to get the loop to converge from 
 there without overflows.
   

My current problem is that drift settles at 82ppm (what I called 90 in
previous email) in one run and then 32ppm in another run (with a reboot
between).  This is similar to the problem I had with stepping disabled
where drift would go from +500ppm in one run and then swing all the way
to -500ppm in another run (usually with a reboot between).  I am not
going to spend another minute troubleshooting this problem until we get
an updated linux kernel.  I will dig into it more deeply if the new
kernel exhibits this same drift instability.


That is an incredibly unstable clock. It is hard to imagine that this is a
kernel problem. This is on one of your machines? It is not the server
connected to the IRIG-B is it? 

Our system is considered real-time and thus has many constraints on
it, namely that it will run in an isolated environment with no Internet
connection.  Our setup runs one machine with NTP as a local stratum 1
server using an IRIG-B time source.  On that machine I have minpoll set

No need for internet if you have a local clock. 


to the lowest (16 seconds).  I had to do this so that NTP would begin
serving sync requests in a reasonable amount.  Startup time is another
constraint and we have other boards running as NTP clients that must
sync with the NTP server before they can finish initialization.  I don't
set maxpoll on the server because I've never caught the server changing
the polling interval from 16 seconds -- maybe it's a reference clock
feature.

All other boards in the system run as NTP clients and I use minpoll 5
maxpoll 9 for them.  I'm not 100% sure why I chose those values, but I
think the idea was to improve NTP reaction time to changes in the
synchronization environment.  I'm not sure whether those poll settings
achieve that, but it sounds like you are suggesting a lower minpoll may
speed convergence in cases of higher drift.

No. He meant if you had minpoll say 8 or 10 it would make settling down
take a long time if the system did not start with a good drift value. 
However, even minpoll 5 means one data sample every 4 hours roughly(since
ntp throws away roughly 7/8 of the samples in the clock_filter). That's a slow
convergence. And even minpoll 4, the minimum, is only one sample every 2
hrs.




Re: [ntp:questions] Time reset

2008-04-04 Thread Hal Murray

My current problem is that drift settles at 82ppm (what I called 90 in
previous email) in one run and then 32ppm in another run (with a reboot
between).  This is similar to the problem I had with stepping disabled
where drift would go from +500ppm in one run and then swing all the way
to -500ppm in another run (usually with a reboot between).  I am not
going to spend another minute troubleshooting this problem until we get
an updated linux kernel.  I will dig into it more deeply if the new
kernel exhibits this same drift instability.

I think we are talking about two different bugs here.

The different drifts on reboot are due to a quirk in the tsc
calibration code in the kernel.  Grep your sys log for messages
like these:
  Mar 30 21:56:23 shuksan kernel: Detected 2793.091 MHz processor.
  Mar 30 22:23:28 shuksan kernel: Detected 2793.067 MHz processor.
  Mar 30 22:42:31 shuksan kernel: Detected 2793.037 MHz processor.
  Mar 30 23:03:21 shuksan kernel: Detected 2793.085 MHz processor.
  Mar 31 00:07:37 shuksan kernel: Detected 2793.147 MHz processor.
Those bottom bits jumping around correspond to the different
drift values.
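
As a rough check of how much drift that scatter represents, the
readings can be pulled out of the log and compared. A sketch, using
the five lines quoted above as input; the regular expression assumes
the message wording shown there and the function names are made up for
the example.

```python
import re

LOG_LINES = """\
Mar 30 21:56:23 shuksan kernel: Detected 2793.091 MHz processor.
Mar 30 22:23:28 shuksan kernel: Detected 2793.067 MHz processor.
Mar 30 22:42:31 shuksan kernel: Detected 2793.037 MHz processor.
Mar 30 23:03:21 shuksan kernel: Detected 2793.085 MHz processor.
Mar 31 00:07:37 shuksan kernel: Detected 2793.147 MHz processor.
""".splitlines()

PATTERN = re.compile(r"Detected (\d+\.\d+) MHz processor")

def detected_mhz(lines):
    """Extract the calibrated CPU frequency from each matching log line."""
    found = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            found.append(float(match.group(1)))
    return found

def spread_ppm(values):
    """Largest minus smallest reading, relative to the smallest, in ppm."""
    return (max(values) - min(values)) / min(values) * 1e6

readings = detected_mhz(LOG_LINES)
print(f"{len(readings)} readings, spread {spread_ppm(readings):.1f} ppm")
```

For these five boots the spread comes to about 39 ppm, which is the
same order as the run-to-run drift differences being discussed.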

If you only have one system, you can pick one and hack your
kernel to smash in a constant value at the right place.

Or you can add something like this to your boot line:
  clocksource=acpi_pm
That's assuming your hardware has acpi and whatever.

I've been using it for a while.  I haven't noticed any quirks,
but who knows.

-- 
These are my opinions, not necessarily my employer's.  I hate spam.



Re: [ntp:questions] Time reset

2008-04-04 Thread Richard B. Gilbert
Unruh wrote:
 [EMAIL PROTECTED] (Andy Helten) writes:
 
 
 
David Woolley wrote:

Andy Helten wrote:
  

offset never exceeded 128ms).  With time steps enabled the drift value
settles at about 90ppm (and again, no step actually occurs).


90ppm is a relatively bad static frequency error; a good machine will be 
around 10ppm.  That won't help a clean cold start.

I didn't check, but did you have the default min and maxpoll values.  A 
high minpoll might make it difficult to get the loop to converge from 
there without overflows.
  

 
My current problem is that drift settles at 82ppm (what I called 90 in
previous email) in one run and then 32ppm in another run (with a reboot
between).  This is similar to the problem I had with stepping disabled
where drift would go from +500ppm in one run and then swing all the way
to -500ppm in another run (usually with a reboot between).  I am not
going to spend another minute troubleshooting this problem until we get
an updated linux kernel.  I will dig into it more deeply if the new
kernel exhibits this same drift instability.
 
 
 
 That is an incredibly unstable clock. It is hard to imagine that this is a
 kernel problem. This is on one of your machines? It is not the server
 connected to the IRIG-B is it? 
 
 
Our system is considered real-time and thus has many constraints on
it, namely that it will run in an isolated environment with no Internet
connection.  Our setup runs one machine with NTP as a local stratum 1
server using an IRIG-B time source.  On that machine I have minpoll set
 
 
 No need for internet if you have a local clock. 
 
 
 
to the lowest (16 seconds).  I had to do this so that NTP would begin
serving sync requests in a reasonable amount.  Startup time is another
constraint and we have other boards running as NTP clients that must
sync with the NTP server before they can finish initialization.  I don't
set maxpoll on the server because I've never caught the server changing
the polling interval from 16 seconds -- maybe it's a reference clock
feature.
 
 
All other boards in the system run as NTP clients and I use minpoll 5
maxpoll 9 for them.  I'm not 100% sure why I chose those values, but I
think the idea was to improve NTP reaction time to changes in the
synchronization environment.  I'm not sure whether those poll settings
achieve that, but it sounds like you are suggesting a lower minpoll may
speed convergence in cases of higher drift.
 
 
 No. He meant if you had minpoll say 8 or 10 it would make settling down
 long if the system did not start with a good drift value. 
 However, even minpoll 5 means one data sample every 4 hours roughly(since
 ntp throws away roughly 7/8 of the samples in the clock_filter). That's a slow
 convergence. And even minpoll 4, the minimum, is only one sample every 2
 hrs.
 
 

I must be missing something!  Minpoll=5 means 2^5 seconds is the minimum 
poll interval.  How are you getting to every four hours from that?  ISTR 
that the default minpoll is 6 which gives 2^6 or 64 seconds.
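
The arithmetic is easy to check. minpoll and maxpoll are exponents of
two, in seconds. The second function below takes the figure of roughly
one sample in eight surviving the clock filter, quoted earlier in this
thread, as an assumption rather than a verified property of ntpd; the
function names are illustrative.

```python
def poll_seconds(poll_exponent):
    """Poll interval in seconds for a given minpoll/maxpoll exponent."""
    return 2 ** poll_exponent

def used_sample_spacing(poll_exponent, keep_one_in=8):
    """Seconds between samples that survive the filter, ASSUMING only
    one in 'keep_one_in' polls is actually used."""
    return poll_seconds(poll_exponent) * keep_one_in

for exponent in (4, 5, 6, 10):
    print(exponent, poll_seconds(exponent), used_sample_spacing(exponent))
```

With minpoll 5 this gives 32 seconds between polls and, under the
one-in-eight assumption, 256 seconds between samples that get used,
which is minutes rather than hours.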

If the server lines in ntp.conf include the iburst keyword, the 
servers will be polled with an initial burst of eight requests sent at 
two second intervals.  This fills the pipeline and pacifies the 
filter.  Thereafter, ntpd adjusts the polling interval as it thinks 
best.  Normally the poll interval will increase to somewhere between 256 
and 1024 seconds once the clock is synchronized.  In general, the better 
the network connection the higher the maximum poll interval.

It's interesting to watch the performance of ntpd improve as the network 
quiets down during the hours when most people sleep!




Re: [ntp:questions] Time reset

2008-04-04 Thread jkvbe
On 3 apr, 23:10, Richard B. Gilbert [EMAIL PROTECTED] wrote:
...

 DOES the time step backward?

 If ntpd is working properly it should NOT need to step the time at all
 with the possible exception of a single step when ntpd is first started.

 If ntpd is stepping time regularly, you have some other problem.  If you
 find and fix that problem, ntpd should stop stepping the time.

 There are/were known issues with some Linux systems; during periods of
 high disk usage, clock interrupts would be lost resulting in a FORWARD
 step.  AFAIK these issues were related to EIDE disks used in PIO mode
 rather than DMA mode.  ISTR reading that the problem has been fixed in
 recent versions of Linux.  YMMV

I agree that ntpd should not be stepping time regularly and that it
points to a problem if it happens regularly. But we develop an
appliance and we don't control how customers deploy it. Given the
adverse effects of stepping time (especially if it moves backwards),
I would have liked to be protected against badly set-up NTP
infrastructure or time servers that are compromised.
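
Independently of how ntpd is configured, an application can protect
its own interval measurements from clock steps by not using wall-clock
time for them. The sketch below is offered as one possible mitigation,
not something proposed in this thread. It relies on Python's
time.monotonic(), which is documented as unaffected by system clock
updates; the helper name is made up.

```python
import time

def elapsed_seconds(work):
    """Run work() and return how long it took, measured on the
    monotonic clock so a backward step of the wall clock cannot
    produce a negative duration."""
    start = time.monotonic()
    work()
    return time.monotonic() - start

duration = elapsed_seconds(lambda: sum(range(100_000)))
print(f"took {duration:.6f} s")
```

This only helps with durations and timeouts. Timestamps that must
agree with other machines still depend on the wall clock and so on the
quality of the time servers.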

Jan



Re: [ntp:questions] Time reset

2008-04-04 Thread Richard B. Gilbert
jkvbe wrote:
 On 3 apr, 23:10, Richard B. Gilbert [EMAIL PROTECTED] wrote:
 ...
 
DOES the time step backward?

If ntpd is working properly it should NOT need to step the time at all
with the possible exception of a single step when ntpd is first started.

If ntpd is stepping time regularly, you have some other problem.  If you
find and fix that problem, ntpd should stop stepping the time.

There are/were known issues with some Linux systems; during periods of
high disk usage, clock interrupts would be lost resulting in a FORWARD
step.  AFAIK these issues were related to EIDE disks used in PIO mode
rather than DMA mode.  ISTR reading that the problem has been fixed in
recent versions of Linux.  YMMV
 
 
 I agree that ntpd should not be stepping time regularly and that it
 points to a problem if it happens regularly. But we develop an
 appliance and we don't control how customers deploy it. Given the
 adverse effects of stepping time (especially if it moves backwards),
 I would have liked to be protected against badly set-up NTP
 infrastructure or time servers that are compromised.
 
 Jan

It seems to me that, in the circumstance you describe, supplying correct 
time is the customer's problem!

Having read this newsgroup for the last four or five years, I'm aware 
that people do some very strange things with computer clocks.  I'm 
thinking, in particular, of at least one individual who deliberately set 
his clock to an incorrect time in order to see if Ntpd would correct it.
Ntpd did so, of course, but he was not happy with the way it was done or 
the amount of time it took!

If it's not under your control, it's not your responsibility!  Your 
instructions for the appliance should point this out pretty explicitly;
e.g. IF YOUR TIME SERVERS CAUSE TIME TO STEP, THE FOLLOWING ADVERSE 
CONSEQUENCES CAN BE EXPECTED TO OCCUR: list of adverse consequences
It is YOUR responsibility to ensure that this does not happen!

The only halfway legitimate thing I can think of that would cause time 
to step would be a leap second.



Re: [ntp:questions] Time reset

2008-04-04 Thread David Woolley
Richard B. Gilbert wrote:

 that people do some very strange things with computer clocks.  I'm 
 thinking, in particular, of at least one individual who deliberately set 
 his clock to an incorrect time in order to see if Ntpd would correct it.

Many people do this.  It is the naive users' way of testing that ntpd 
works.

 Ntpd did so, of course, but he was not happy with the way it was done or 
 the amount of time it took!



Re: [ntp:questions] Time reset

2008-04-04 Thread Andy Helten

Hal wrote:

 My current problem is that drift settles at 82ppm (what I called 90 in
 previous email) in one run and then 32ppm in another run (with a reboot
 between).  This is similar to the problem I had with stepping disabled
 where drift would go from +500ppm in one run and then swing all the way
 to -500ppm in another run (usually with a reboot between).  I am not
 going to spend another minute troubleshooting this problem until we get
 an updated linux kernel.  I will dig into it more deeply if the new
 kernel exhibits this same drift instability.
 

 I think we are talking about two different bugs here.

 The different drifts on reboot are due to a quirk in the tsc
 calibration code in the kernel.  Grep your sys log for messages
 like these:
   Mar 30 21:56:23 shuksan kernel: Detected 2793.091 MHz processor.
   Mar 30 22:23:28 shuksan kernel: Detected 2793.067 MHz processor.
   Mar 30 22:42:31 shuksan kernel: Detected 2793.037 MHz processor.
   Mar 30 23:03:21 shuksan kernel: Detected 2793.085 MHz processor.
   Mar 31 00:07:37 shuksan kernel: Detected 2793.147 MHz processor.
 Those bottom bits jumping around correspond to the different
 drift values.

 If you only have one system, you can pick one and hack your
 kernel to smash in a constant value at the right place.

 Or you can add something like this to your boot line:
   clocksource=acpi_pm
 That's assuming your hardware has acpi and whatever.

 I've been using it for a while.  I haven't noticed any quirks,
 but who knows.

   

YES!  The slight variation in measured CPU speed seems to explain my
continued drift instability (where continued means even with stepping
enabled).  I was able to retrieve four CPU speed measurements that had
corresponding NTP loop logs.  The table below shows the perfect
correlation between linux-measured CPU speed and NTP-measured drift. 
Clearly the real CPU speed is somewhere around 2000.200 MHz.


measured CPU speed  |  measured drift
(MHz)               |  (ppm)
--------------------+----------------
  2000.153          |   -23
  2000.215          |     8
  2000.321          |    61
  2000.367          |    84
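
The correlation can be checked numerically. The sketch below takes the
first row as an arbitrary baseline, converts each speed difference to
ppm and sets it beside the corresponding difference in drift. The
choice of baseline and the function name are assumptions made for the
example.

```python
# (measured CPU speed in MHz, measured drift in ppm), from the table above
MEASUREMENTS = [
    (2000.153, -23),
    (2000.215, 8),
    (2000.321, 61),
    (2000.367, 84),
]

def deltas_vs_first(rows):
    """For each row, return (speed change in ppm, drift change in ppm)
    relative to the first row. Speed change is rounded to whole ppm."""
    base_mhz, base_drift = rows[0]
    result = []
    for mhz, drift in rows:
        speed_ppm = (mhz - base_mhz) / base_mhz * 1e6
        result.append((round(speed_ppm), drift - base_drift))
    return result

for speed_ppm, drift_ppm in deltas_vs_first(MEASUREMENTS):
    print(f"speed {speed_ppm:+4d} ppm   drift {drift_ppm:+4d} ppm")
```

To the nearest ppm the two columns agree on every row (0, 31, 84 and
107), so the change in drift tracks the change in measured CPU speed
one for one.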


As I've stated before, I don't believe the oscillator is really this
unstable, but I could be wrong.  After all, my CPU measurements varied
much more than yours, especially from one run to the next.  However, I'm
still open to the possibility that linux's approach to speed measurement
is less than perfect (at least for my version of linux).  These
measurements were on a core 2 duo (2 processors) running RedHawk linux
2.6.18.8.  Hal, can you tell me which version of linux resulted in your
list of speed measurements?

I also wonder if the use of two processors has any impact on this
behavior.  I tried forcing CPU affinity for the NTP process, but it
didn't have any effect on the measured drift value.  This means that
either there truly is no difference between CPUs (as in different
speed/frequency characteristics) or I wasn't actually moving the process
between CPUs (using /proc/pid/affinity).  I'm assuming both CPUs have
the same oscillator, so it makes sense that they would measure the same
drift.

Thanks,
Andy




Re: [ntp:questions] Time reset

2008-04-04 Thread Hal Murray

As I've stated before, I don't believe the oscillator is really this
unstable, but I could be wrong.  After all, my CPU measurements varied
much more than yours, especially from one run to the next.  However, I'm
still open to the possibility that linux's approach to speed measurement
is less than perfect (at least for my version of linux).  These
measurements were on a core 2 duo (2 processors) running RedHawk linux
2.6.18.8.  Hal, can you tell me which version of linux resulted in your
list of speed measurements?

Your crystal is probably fine.

At one point, I hacked my kernel to call the calibration routine
several times and printout the answer.  A batch of answers from
the same time (and hopefully same temperature) had the same sort
of scatter.

I'm running 2.6.23 with a few local hacks.  2.6.19 has similar problems.


-- 
These are my opinions, not necessarily my employer's.  I hate spam.



Re: [ntp:questions] Time reset

2008-04-03 Thread Hal Murray

The ntp log file shows when NTP steps the time. But then the potential harm
is already done. Especially if the time moves backward, our server might
have serious trouble. Is there a log event which indicates that the time is
going to be reset in order to enable us to take appropriate action before
the actual reset?

I don't know of any way to get advanced warning when ntpd is about to
step the time.

There are command line switches to prevent stepping and to allow
one step at startup time.

The disadvantage with preventing steps is that it might take a long
time to correct the time.  But if you start with good time your clock
will never get off far enough to cause problems.
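
To put a number on "a long time", here is a minimal sketch of the arithmetic. It assumes the clock is slewed at a constant 500 ppm, the commonly cited maximum slew rate; the function name is made up for illustration, and the real loop will usually be slower than this lower bound.

```python
# Assumption: the clock is slewed at a constant rate, capped at 500 ppm.
MAX_SLEW_PPM = 500.0


def slew_seconds(offset_s, slew_ppm=MAX_SLEW_PPM):
    """Return the seconds needed to remove offset_s by slewing at slew_ppm."""
    return abs(offset_s) / (slew_ppm * 1e-6)


if __name__ == "__main__":
    for offset in (0.128, 2.22, 60.0):
        t = slew_seconds(offset)
        print(f"{offset:8.3f} s offset -> {t:10.0f} s ({t / 3600:.2f} h)")
```

Under that assumption a 2.22 s error takes about 4440 s (roughly 74 minutes) to slew away, and a full minute takes more than a day.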

Is there a wiki page on this topic?

-- 
These are my opinions, not necessarily my employer's.  I hate spam.



Re: [ntp:questions] Time reset

2008-04-03 Thread Andy Helten

Hal Murray wrote:
 The ntp log file shows when NTP steps the time. But then the potential harm
 is already done. Especially if the time moves backward, our server might
 have serious trouble. Is there a log event which indicates that the time is
 going to be reset in order to enable us to take appropriate action before
 the actual reset?
 

 I don't know of any way to get advanced warning when ntpd is about to
 step the time.

 There are command line switches to prevent stepping and to allow
 one step at startup time.

 The disadvantage with preventing steps is that it might take a long
 time to correct the time.  But if you start with good time your clock
 will never get off far enough to cause problems.

 Is there a wiki page on this topic?
   

Another disadvantage with preventing steps is that it isn't really a
supported mode (because it's a tinker) and, as I've found, it doesn't
always work.  When I disable time steps on a Linux 2.6.18 kernel, the
drift value goes to +/-500 ppm and can actually swap sign from one run to
the next.  This happens even though a time step was never needed (i.e.
the offset never exceeded 128 ms).  With time steps enabled the drift value
settles at 90 ppm (and again, no step actually occurs).
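
As a rough illustration of the thresholds involved, the sketch below classifies an offset the way the step/!step discussion describes it. It is a simplification, not ntpd's actual code: the 128 ms step threshold is the one mentioned above, the 1000 s panic threshold is assumed to be the default, the stepout timer is ignored, and the function name is hypothetical.

```python
# Simplified model of the step/slew decision; not ntpd's real logic.
STEP_THRESHOLD_S = 0.128    # step threshold (the 128 ms mentioned above)
PANIC_THRESHOLD_S = 1000.0  # assumption: default panic threshold


def classify_offset(offset_s, allow_step=True):
    """Return 'slew', 'step' or 'panic' for a measured offset in seconds."""
    magnitude = abs(offset_s)
    if magnitude > PANIC_THRESHOLD_S:
        return "panic"
    if magnitude > STEP_THRESHOLD_S and allow_step:
        return "step"
    # Small offsets, or any offset when stepping is disabled, are slewed.
    return "slew"


if __name__ == "__main__":
    print(classify_offset(0.05))                     # slew
    print(classify_offset(2.220040))                 # step
    print(classify_offset(2.220040, allow_step=False))  # slew
```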

From what I've been able to piece together, this different behavior
between step/!step is probably due to the kernel time discipline being
disabled with !step, coupled with a (potential) bug in linux that forces
NTP's manual adjustments to have a granularity of 1ms (i.e. somewhere
an adjustment is rounded up or down).  I've not verified the bug is
present in my 2.6.18 linux kernel, so don't quote me on it.  One might
ask why the kernel time discipline is preemptively disabled in this
manner -- maybe there is a good reason.

Our application also does not currently handle backward time steps.  Our
workaround to the problematic !step is to realize, as others on this
list have pointed out, that a time step should never occur in a normally
functioning system.  If a step does occur, we probably have bigger
problems than those caused by the step itself, such as:  lost timer
interrupts, failing hardware, runaway process, kernel bug, NTP bug, etc.
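
For code that only measures intervals, one common way to become insensitive to steps is to use a monotonic clock instead of wall-clock time. The sketch below is generic and makes no claim about how the application above works; the helper name is hypothetical. Note that on Linux the monotonic clock is still slewed by frequency adjustments, it is just never stepped.

```python
import time


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed_seconds).

    time.monotonic() cannot go backwards, so elapsed is never negative
    even if the wall clock is stepped while fn is running.
    """
    start = time.monotonic()
    result = fn(*args)
    elapsed = time.monotonic() - start
    return result, elapsed


if __name__ == "__main__":
    total, elapsed = timed(sum, range(1_000_000))
    print(f"sum={total} computed in {elapsed:.6f} s")
```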

Andy



Re: [ntp:questions] Time reset

2008-03-18 Thread Venu Gopal
Hi Hal,

Could you post the TWiki URL once it is up?

Venu

Hal Murray wrote:
 So it's the DISK I/O that's causing loss of ticks?
 
 My first Red Hat system defaulted to no-DMA for IDE disks.  (Yes,
 that was a long time ago.)  With that setup, it was simple to generate
 lots of lost timer interrupts: just keep the disk busy doing reads.
 (Seeks don't count.  Read consecutive sectors.)
 
 My problem went away when I turned on DMA.
 
 I have a hack that I originally wrote for measuring disk
 transfer rates.  I'll dust it off and put it on the wiki
 if people think it will be helpful for discussions like this.
 
 It compiles on Linux.  It should be easy to get it to work on
 other *nix systems.
 



Re: [ntp:questions] Time reset

2008-03-13 Thread Martin Burnicki
D.Venu Gopal wrote:
 David Woolley wrote:
 Venu Gopal wrote:
 
 It's clear that the CPU is heavily loaded, which might be leading to loss of
 ticks. Yet to check the DMA status for
 
 CPU loading doesn't cause lost timer interrupts. (More  precisely
 overruns.)

 
 So it's the DISK I/O that's causing loss of ticks?
 
 IDE DISK. I'll be out for about a week, after returning I'll
 give few more stats based on combination of CPU load, Disk I/O
 and Network I/O.

There have been several examples of drivers which kept interrupts disabled
for too long, so that timer ticks couldn't get through. In most cases
(AFAIK) these have been drivers for IDE disks, especially if they didn't use
DMA.

This has happened across several operating systems (I remember Linux,
Windows, and formerly OS/2), with drivers which had not been designed
properly. So this depends on a specific version of the OS and a specific
version of a specific driver. You can not say in general that IDE drivers
cause lost timer ticks, but they are good candidates.

Unfortunately, the new clock routines in the Linux kernel seem to cause
problems sometimes. This seems to be due to certain combinations of a clock
module, which handles a particular timer on the mainboard, and that timer
itself, whose implementation may vary with the chipset.

This is not exactly the same as lost timer ticks, but the results are
similar, i.e. the clock drift can be so large or changing so much that ntpd
fails to correct it.
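
As a back-of-the-envelope check, an observed step can be converted into a number of lost ticks. This is only a sketch: it assumes the whole offset comes from lost timer interrupts, and the tick rates used (100 and 512 Hz) are illustrative guesses, since the kernel's actual HZ is not stated in this thread. The 2.220040 s figure is the "time reset" value from the log that started the thread.

```python
def lost_ticks(offset_s, hz):
    """Estimate how many timer ticks of 1/hz seconds add up to offset_s."""
    return round(offset_s * hz)


if __name__ == "__main__":
    # Assumption: HZ=100, i.e. 10 ms per tick.
    print(lost_ticks(2.220040, 100))  # prints 222
```

A lost tick makes the system clock fall behind, so the correction is a forward step, which matches the positive reset value in that log.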

Martin
-- 
Martin Burnicki

Meinberg Funkuhren
Bad Pyrmont
Germany



Re: [ntp:questions] Time reset

2008-03-13 Thread Hal Murray

 So it's the DISK I/O that's causing loss of ticks?

My first Red Hat system defaulted to no-DMA for IDE disks.  (Yes,
that was a long time ago.)  With that setup, it was simple to generate
lots of lost timer interrupts: just keep the disk busy doing reads.
(Seeks don't count.  Read consecutive sectors.)

My problem went away when I turned on DMA.

I have a hack that I originally wrote for measuring disk
transfer rates.  I'll dust it off and put it on the wiki
if people think it will be helpful for discussions like this.

It compiles on Linux.  It should be easy to get it to work on
other *nix systems.
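
Until the real program is posted, here is a minimal stand-in sketch of the same idea: read a file sequentially and report the transfer rate. It is not the hack described above, and the function name and block size are arbitrary. Reads served from the page cache will not touch the disk, so use a file larger than RAM (or a freshly written one after dropping caches) if the goal is to keep the disk busy.

```python
import sys
import time


def read_throughput(path, block_size=1 << 20):
    """Read path sequentially; return (bytes_read, elapsed_seconds)."""
    total = 0
    start = time.monotonic()
    # buffering=0 gives raw reads of up to block_size bytes each.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.monotonic() - start
    return total, elapsed


if __name__ == "__main__" and len(sys.argv) > 1:
    nbytes, secs = read_throughput(sys.argv[1])
    rate = nbytes / secs / 1e6 if secs > 0 else float("inf")
    print(f"{nbytes} bytes in {secs:.3f} s ({rate:.1f} MB/s)")
```

Running it against a large file while watching ntpd's offset is one way to see whether sustained sequential reads coincide with the clock falling behind.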

-- 
These are my opinions, not necessarily my employer's.  I hate spam.



Re: [ntp:questions] Time reset

2008-03-13 Thread Venu Gopal
Hal Murray wrote:
 So it's the DISK I/O that's causing loss of ticks?
 
 My first Red Hat system defaulted to no-DMA for IDE disks.  (Yes,
 that was a long time ago.)  With that setup, it was simple to generate
 lots of lost timer interrupts: just keep the disk busy doing reads.
 (Seeks don't count.  Read consecutive sectors.)
 
 My problem went away when I turned on DMA.
 
 I have a hack that I originally wrote for measuring disk
 transfer rates.  I'll dust it off and put it on the wiki
 if people think it will be helpful for discussions like this.

Ok Hal. This will be certainly helpful.

 It compiles on Linux.  It should be easy to get it to work on
 other *nix systems.
 



Re: [ntp:questions] Time reset

2008-03-12 Thread D.Venu Gopal
David Woolley wrote:
 Venu Gopal wrote:
 
 It's clear that the CPU is heavily loaded, which might be leading to loss of
 ticks. Yet to check the DMA status for
 
 CPU loading doesn't cause lost timer interrupts. (More  precisely
 overruns.)


So it's the DISK I/O that's causing loss of ticks?

 IDE DISK. I'll be out for about a week, after returning I'll
 give few more stats based on combination of CPU load, Disk I/O
 and Network I/O.



Re: [ntp:questions] Time reset

2008-03-05 Thread Maarten Wiltink
Venu Gopal [EMAIL PROTECTED] wrote in message
news:[EMAIL PROTECTED]
[...]
 So it's true that when CPU load is high, the kernel might be losing ticks.
 When I repeated the same in other clients the drift was on the order of
 a few milliseconds. I suppose it has something to do with the amount
 of CPU load and disk I/O when crond performs its tasks.

More often disk I/O than CPU load. And then often because DMA is disabled.
Could you check that?

Groetjes,
Maarten Wiltink




Re: [ntp:questions] Time reset

2008-03-05 Thread Venu Gopal
Hi ,

crond has only one weekly task: updating the whatis database.
It is scheduled to run makewhatis with the path to the man pages.

Venu


Re: [ntp:questions] Time reset

2008-03-05 Thread Venu Gopal
Hi,

I need to experiment a bit to give some stats on CPU load
and DISK I/O when crond tasks are run.

Venu


Re: [ntp:questions] Time reset

2008-03-05 Thread Venu Gopal
[EMAIL PROTECTED] (Venu Gopal) wrote in 
news:[EMAIL PROTECTED]:

 Hi,
 
 I need to experiment a bit to give some stats on CPU load
 and DISK I/O when crond tasks are run.
 
 Venu

Hi all,

To measure the CPU load and disk I/O I used iostat
while running cron's weekly task of updating the
whatis database. Below is a slice of the log:



avg-cpu:  %user   %nice    %sys   %idle
          77.00    0.00   16.00    7.00

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
dev3-0          132.50      1032.00        68.00       2064        136

avg-cpu:  %user   %nice    %sys   %idle
          91.00    0.00    4.50    4.50

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
dev3-0           50.50       296.00      2492.00        592       4984

avg-cpu:  %user   %nice    %sys   %idle
          82.00    0.00    2.00   16.00

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
dev3-0           43.50       420.00         0.00        840          0

avg-cpu:  %user   %nice    %sys   %idle
          99.00    0.00    1.00    0.00

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
dev3-0            1.00         4.00         4.00          8          8

avg-cpu:  %user   %nice    %sys   %idle
          98.00    0.00    2.00    0.00

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
dev3-0           22.00         0.00      2412.00          0       4824

avg-cpu:  %user   %nice    %sys   %idle
          97.50    0.00    2.50    0.00

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
dev3-0            1.00         0.00         8.00          0         16

avg-cpu:  %user   %nice    %sys   %idle
          98.50    0.00    1.50    0.00

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
dev3-0           59.50         0.00      3340.00          0       6680

avg-cpu:  %user   %nice    %sys   %idle
          98.00    0.00    2.00    0.00

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
dev3-0            1.00         0.00         8.00          0         16

avg-cpu:  %user   %nice    %sys   %idle
          14.00    0.00    1.00   85.00

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
dev3-0            4.00        36.00         8.00         72         16



It's clear that the CPU is heavily loaded, which might be
leading to loss of ticks. I have yet to check the DMA status for
the IDE disk. I'll be out for about a week; after returning I'll
give a few more stats based on combinations of CPU load, disk I/O
and network I/O.

Venu



Re: [ntp:questions] Time reset

2008-03-05 Thread David Woolley
Venu Gopal wrote:

 It's clear that the CPU is heavily loaded, which might be
 leading to loss of ticks. Yet to check the DMA status for

CPU loading doesn't cause lost timer interrupts. (More precisely,
overruns.)

 IDE DISK. I'll be out for about a week, after returning I'll
 give few more stats based on combination of CPU load, Disk I/O
 and Network I/O.



Re: [ntp:questions] Time reset

2008-03-04 Thread Venu Gopal
Hi Harlan,

It's an 'HP COMPAQ DX200 MT' running RedHat-9.0 (Linux-2.4.20-8).
The previous machine was a similar one, where a time reset used to happen
at least once a day.

I referred to the troubleshooting pages at http://support.ntp.org/Support.
I tried to get the system manuals, but the model has been phased out and no
documentation is available on the HP/COMPAQ sites. I am trying to find the
material supplied along with the machines.

But is there a way to debug this problem?
This has been a long-standing problem (almost 2 years!).

Venu


Re: [ntp:questions] Time reset

2008-03-04 Thread Harlan Stenn
 In article [EMAIL PROTECTED], [EMAIL PROTECTED] (Venu Gopal) writes:

Venu Hi Harlan, Its 'HP COMPAQ DX200 MT' running RedHat-9.0
Venu (Linux-2.4.20-8).  The previous machine is a similar one where time
Venu reset used to happen at least once a day.

What about:

 http://support.ntp.org/bin/view/Support/KnownOsIssues#Section_9.2.3.1.

which talks about a lost interrupt problem for that kernel.
-- 
Harlan Stenn [EMAIL PROTECTED]
http://ntpforum.isc.org  - be a member!



Re: [ntp:questions] Time reset

2008-03-04 Thread Venu Gopal
Hi Harlan,

Thanks a lot for the references.

Venu


[ntp:questions] Time reset

2008-03-03 Thread venu gopal
Hi,

We have a time-server with about seven clients over LAN.
On only one client, we observe that the time is reset almost daily.
We have replaced that client with a new system and, although it no longer
happens daily, a time reset has occurred once. Below is the ntp log:

2 Mar 02:33:52 ntpd[1566]: offset -0.33 sec freq 0.245 ppm error 0.20 poll 4
2 Mar 03:33:52 ntpd[1566]: offset 0.10 sec freq 0.267 ppm error 0.17 poll 4
2 Mar 04:33:52 ntpd[1566]: offset -0.20 sec freq 0.232 ppm error 0.057544 poll 4
2 Mar 04:37:56 ntpd[1566]: time reset 2.220040 s
2 Mar 04:37:56 ntpd[1566]: synchronisation lost
2 Mar 04:37:56 ntpd[1566]: system event 'event_clock_reset' (0x05) status 'leap_none, sync_unspec, 15 events, event_peer/strat_chg' (0xf4)
2 Mar 04:37:56 ntpd[1566]: system event 'event_peer/strat_chg' (0x04) status 'leap_none, sync_unspec, 15 events, event_clock_reset' (0xf5)
2 Mar 04:37:57 ntpd[1566]: peer 10.10.10.10 event 'event_reach' (0x84) status 'unreach, conf, 5 events, event_reach' (0x8054)
2 Mar 04:38:46 ntpd[1566]: peer 10.10.10.11 event 'event_reach' (0x84) status 'unreach, conf, 5 events, event_reach' (0x8054)
2 Mar 04:38:50 ntpd[1566]: peer 10.10.10.12 event 'event_reach' (0x84) status 'unreach, conf, 7 events, event_reach' (0x8074)
2 Mar 05:33:54 ntpd[1566]: offset -0.000327 sec freq 3.477 ppm error 0.15 poll 4
2 Mar 06:33:54 ntpd[1566]: offset -0.09 sec freq 0.246 ppm error 0.19 poll 4
2 Mar 07:33:54 ntpd[1566]: offset 0.05 sec freq 0.246 ppm error 0.14 poll 4

In the third line the error value jumped from 0.17 to 0.057544.
Could this have something to do with faulty clock interrupts?
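
For anyone who wants to pull such events out of a longer log, here is a small sketch that scans for the "time reset" lines. The regular expression is inferred only from the line format shown above and may need adjusting for other ntpd versions; the function name is made up. It can only report a step after the fact.

```python
import re

# Pattern inferred from the log excerpt above; other ntpd versions may differ.
RESET_RE = re.compile(r"ntpd\[\d+\]: time reset ([+-]?\d+(?:\.\d+)?) s")


def find_time_resets(lines):
    """Return a list of (timestamp_text, step_seconds) for each reset line."""
    resets = []
    for line in lines:
        match = RESET_RE.search(line)
        if match:
            timestamp = line[:match.start()].strip()
            resets.append((timestamp, float(match.group(1))))
    return resets


if __name__ == "__main__":
    sample = [
        "2 Mar 04:33:52 ntpd[1566]: offset -0.20 sec freq 0.232 ppm error 0.057544 poll 4",
        "2 Mar 04:37:56 ntpd[1566]: time reset 2.220040 s",
        "2 Mar 04:37:56 ntpd[1566]: synchronisation lost",
    ]
    print(find_time_resets(sample))  # [('2 Mar 04:37:56', 2.22004)]
```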

Venu


Re: [ntp:questions] Time reset

2008-03-03 Thread Harlan Stenn
Venu,

What hardware and OS is this?

Have you seen the troubleshooting pages (known hardware and OS problems) at
http://support.ntp.org/Support ?
-- 
Harlan Stenn [EMAIL PROTECTED]
http://ntpforum.isc.org  - be a member!
