I have a couple of machines on the same LAN behind consumer broadband
NAT running ntp 4.2.4p6.

One is running Windows XP, the other Windows Vista, both 32-bit OSes.
The Windows XP machine has done well with NTP, usually keeping the
offset below 10ms.  The Vista machine, however, has been erratic, with
offset spikes of plus or minus 30-90ms and a wandering frequency
(drift) estimate.

I've taken a blunt object to nt_clockstuff.c and was able to
dramatically improve the performance on Vista by detecting its 1ms
system time granularity (vs 10-15ms on earlier Windows) and switching
QueryPerformanceCounter-based interpolation off at runtime, relying
solely on GetSystemTimeAsFileTime to satisfy higher-level NTP requests
for the current time.

I've posted three loopstats graphs that illustrate the difference on:

http://davehart.net/ntp/vista/blunt/index.html

The downside (and why I call it a blunt object) is that
QueryPerformanceCounter-based interpolation gives NTP a high-quality
system clock reference, which this heuristic workaround tosses out in
favor of an NTP system clock that steps in 1ms increments.

It would be nice to find a new interpolation strategy that doesn't
break down in the face of a millisecond-grained system clock.  The
existing approach uses a high-priority thread to run a routine 1000 times
per second which looks at the current system time (passed to it
conveniently by the OS waitable timer interface, saving a system
call).  If the system time has changed since the last go-round a
millisecond before, a snapshot is taken of the current
QueryPerformanceCounter value and stowed away with the current system
time expressed in 0.1us units.  So with pre-Vista Windows, 9/10ths or
14/15ths of the 1000Hz timer callbacks would compare two 32-bit
counters, find them equal, and be done.  Now Vista is updating the
system time every millisecond, so our timer thread is establishing a
new system time to performance counter equivalence baseline and
attempting to acquire a lock to update those stored global values
every millisecond as well.  Apparently lock contention is what is
causing NTP to fall down so badly on Vista.  Presumably, Win7 will be
similarly affected.

There are tweaks that could be done to keep the model nearly unchanged
and reduce the lock contention, but I question whether it really makes
sense to re-establish that baseline 1000 times per second.  The
waitable timers used can't fire any more frequently than NTP is using
now, and the intended goal of catching the tick of the system clock is
unattainable, even if you solve the contention issue.

How often do we need to establish the correlation between system time
and QueryPerformanceCounter values to maintain high-quality
interpolation?  I suspect infrequently enough that the high-priority
timer thread can still be done away with.

Feedback solicited.
(cross posted to comp.protocols.time.ntp and hack...@lists.ntp.org)
