Re: [ntp:questions] testing slew only mode (-x), not slewing correctly (linux sles10, ntpd v 4.1.1)

Brian Utterback Sat, 24 Oct 2009 10:45:18 -0700

Unruh wrote:

> Utterback was, I believe, claiming that on some systems, the rtc would
> be used not just at startup, but while the system was running in order
> to keep the time. I.e., let's say 3 hours after the system has started
> up, the system reads the rtc and uses that information in the delivery
> of the system time. I was wondering on which systems that occurred?
> Startup is obviously when the rtc is used, since there is no other
> possible source of the time. 
> 
>


Sorry I wasn't clearer. Solaris specifically interacts with the
TOD/RTC battery-backed clock, even after boot. This is an artifact
of the original system design, where system ticks could be lost under
certain conditions. Since tick interrupts did not queue, the clock
would lose ticks unpredictably, so the TOD was checked once per second
to see if the kernel clock had dropped behind. Because the TOD only
reads integer seconds and the kernel clock is in nanoseconds, the only
way to be sure that the TOD time and the kernel time are at least
1 second different is if the two readings are at least 2 seconds
apart. So, if the kernel clock and the TOD get 2 seconds or more
apart, the kernel clock is set to match the TOD. This means that a
Solaris system, just sitting idle, minding its own business and
without NTP running, could step the clock back 2 seconds. However, due
to customer complaints, the code was later changed to make the
adjustment by slewing, so the clock no longer steps or jumps back.
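
To make the 2 second reasoning concrete, here is a minimal sketch in
Python. It is not the Solaris code; the function name and the
constants are made up for illustration, and it assumes the TOD
reading is simply the true TOD time truncated to whole seconds.

```python
# Illustrative sketch only, not the actual Solaris dosynctodr logic.
# Assumption: the TOD reading is the true TOD time truncated to whole
# seconds, so the true TOD time lies in [tod_sec, tod_sec + 1).

NSEC_PER_SEC = 1_000_000_000


def tod_correction_needed(kernel_ns: int, tod_sec: int,
                          threshold_sec: int = 2) -> bool:
    """Return True if the kernel clock and the TOD reading differ by
    at least threshold_sec seconds (hypothetical helper)."""
    diff_ns = abs(kernel_ns - tod_sec * NSEC_PER_SEC)
    return diff_ns >= threshold_sec * NSEC_PER_SEC


# Kernel clock at 11.999999999 s, TOD reads 10: the true TOD time could
# be as late as 10.999999999 s, so the real gap may be barely 1 second.
# With the 2 second threshold no correction is triggered.
near_miss = tod_correction_needed(11 * NSEC_PER_SEC + 999_999_999, 10)

# Kernel clock at 12.0 s, TOD reads 10: the true TOD time is below
# 11 s, so the real gap is certainly more than 1 second.
certain_gap = tod_correction_needed(12 * NSEC_PER_SEC, 10)
```

With a 1 second threshold the first case would trigger a correction
even though the two clocks might differ by only a tiny fraction of a
second more than the truncation error, which is why the comparison
has to use 2.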

Unfortunately, two things have changed since all this was implemented. 
One was the introduction of cyclic timers in Solaris 8. The ticks were 
converted to use the cyclics, so ticks were no longer lost. The other 
(of course) was the introduction of NTP in Solaris 2.6. Needless to 
say, NTP gets upset if the clock is stepped by the TOD.

The good news, which mitigates this problem, is that whenever the
kernel clock is set or adjusted for any reason, the TOD is written, so
in theory the two should never, ever get more than a small amount
apart. The bad news is that the closer the kernel's frequency
adjustment gets to correct, the longer NTP goes between actual
adjustments. If there is no adjustment for long enough, the TOD might
eventually drift far enough to cause the clock to slew, confusing NTP
no end. Another problem is that not all TODs behave well. There have
been hardware bugs that caused intermittent TOD read errors, and of
course the battery might go dead and cause the TOD to behave
erratically as it dies.
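
As a rough back-of-the-envelope illustration of the "long enough"
above, the sketch below estimates how long an unwritten TOD would
take to wander 2 seconds away from the kernel clock. The 10 ppm drift
rate is purely an assumed figure for the example, not a measured
value for any particular hardware, and the function name is
hypothetical.

```python
# Rough arithmetic sketch; the drift rate is an assumed example value.

def seconds_until_threshold(drift_ppm: float,
                            threshold_sec: float = 2.0) -> float:
    """Time for a clock drifting at drift_ppm (parts per million)
    relative to the kernel clock to accumulate threshold_sec of
    offset, assuming a constant drift and no TOD writes in between."""
    return threshold_sec * 1_000_000 / drift_ppm


elapsed = seconds_until_threshold(10.0)  # assumed 10 ppm TOD drift
elapsed_days = elapsed / 86_400          # roughly 2.3 days
```

So under that assumption, a couple of days without any adjustment
would be enough for the TOD check to kick in.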

It is possible to disable this interaction with the TOD clock, but
unfortunately this also prevents the writing of the TOD when the clock
is adjusted. That means that for a system that is up and running for
months or years, the TOD might get arbitrarily far off from the real
time, which could bite you when the system reboots.

The code that does all this is rather difficult to puzzle out, which
led me to file Solaris bug 4514730, "dosynctodr code has structure
similar to game of fizzbin".

Hope that clears it all up.

Brian Utterback

