David Woolley wrote:

In article <[EMAIL PROTECTED]>,
[EMAIL PROTECTED] (probably David Mills with an IT department that is overzealous about preventing spam) wrote:


The modern NTP feedback loop is much more intricate than you report. It is represented as a hybrid phase/frequency feedback loop with a


There may be various finesses, but it is still the essentially analogue
nature of the process that causes people to complain about overshoots
and runaway frequency excursions.


state-machine driven initial frequency measurement. Details are in the


As I understand it, the initial frequency measurement is only applied
when cold started (no ntp.drift). Moreover, the perceived problem being
reported here is about the initial phase correction.  It is normal
to have to make phase corrections many times the mean phase error
on a restart, even though it isn't normal to have to do a significant
frequency correction.


There are lots of nasty little approximations in the PLL/FLL code due to
imprecise measurement of some time intervals.  While the design target
for overshoot is 5-6 percent, I would not be surprised if in some cases
it is 10 percent.
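
(For a rough feel for those figures, here is the textbook relation between
damping and peak overshoot in a second-order loop, evaluated directly.  It
is a generic control-loop sketch, not the actual ntpd PLL/FLL code.)

/* Percent overshoot of an underdamped second-order step response as a
 * function of the damping ratio zeta.  Illustrative only. */
#include <math.h>
#include <stdio.h>

#define PI 3.14159265358979323846

static double overshoot_percent(double zeta)
{
    return 100.0 * exp(-PI * zeta / sqrt(1.0 - zeta * zeta));
}

int main(void)
{
    /* zeta around 0.7 gives roughly 5 percent overshoot; if the
     * effective damping sags to 0.6 it is nearer 10 percent. */
    printf("zeta 0.707 -> %.1f%% overshoot\n", overshoot_percent(0.707));
    printf("zeta 0.600 -> %.1f%% overshoot\n", overshoot_percent(0.600));
    return 0;   /* compile with -lm */
}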


I think the problem here is that a human trying to manually control the
effective frequency might have overshot by only 0.1%.  They would have
slewed the phase in at the maximum acceptable rate and then made a
step change in frequency at the moment the measured phase error crossed
zero, stepping by minus the average rate of phase change during the
slew-in.  Only then would they start operating anything like the current
algorithm.
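
A rough sketch of that sequencing, with made-up numbers for the initial
offset and the slew limit (this is not how ntpd itself is structured):

/* Toy simulation of "slew at the maximum rate, then step the frequency
 * back at the first zero crossing".  All values are assumptions. */
#include <stdio.h>

int main(void)
{
    double offset   = 0.128;    /* assumed initial phase error, seconds */
    double max_slew = 500e-6;   /* 500 ppm maximum acceptable slew rate */
    double dt       = 1.0;      /* simulation step, seconds             */
    double t        = 0.0;
    double slew     = (offset > 0.0) ? -max_slew : max_slew;

    /* Phase 1: slew toward zero at the maximum acceptable rate. */
    while (offset * (offset + slew * dt) > 0.0) {
        offset += slew * dt;
        t += dt;
    }

    /* Phase 2: at the first zero crossing, step the frequency by minus
     * the slew just applied and hand over to the normal discipline. */
    printf("zero crossing after %.0f s, residual %.6f s\n", t, offset);
    return 0;
}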

What they are seeing is 10% of the original error after about an hour,
when they know that they could have achieved 0.1% in under 10 minutes,
assuming a 500 ppm slew rate limit.  (They'd probably need some automation
to time the transition accurately enough to get to 100 microseconds, as
assumed here.)

The best way of implementing this is probably to provide the system with
memory about the likely phase measurement noise, but a simpler approach
of detecting the first zero crossing would probably work quite well.


I believe that Dave Mills has already explained that the problem is due
to changes in the adjtime() routine in both Sun Solaris and Linux.  This
being the case, the choices would seem to be:
a. Live with it.
b. Get Sun and the Linux developers to back out the change to adjtime()
   that broke ntpd.
c. Provide a custom adjtime() for each platform affected.
I suspect that the routine in question runs in kernel mode and may be part
of the kernel, so this may be easier said than done!
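
For reference, the interface being discussed is the BSD-style adjtime()
call; a minimal sketch of requesting a slew looks like this (the 100 ms
offset is just an example, and the call needs appropriate privileges):

/* Request that the kernel slew the clock by a small offset. */
#include <stdio.h>
#include <sys/time.h>

int main(void)
{
    struct timeval delta = { 0, 100000 };   /* +100 ms, slewed gradually */
    struct timeval olddelta;

    if (adjtime(&delta, &olddelta) != 0) {
        perror("adjtime");
        return 1;
    }
    /* Any previously requested adjustment still outstanding is returned. */
    printf("pending adjustment was %ld.%06ld s\n",
           (long)olddelta.tv_sec, (long)olddelta.tv_usec);
    return 0;
}

How quickly that slew is applied, and what happens to an adjustment that
is still outstanding, is exactly the behaviour reported to have changed;
and since, as suspected above, the slewing is done inside the kernel,
replacing it per platform really is easier said than done.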
