Bob, the system clock may be broken for SMP, or your hardware may be broken. Your software configuration (SpeedStep, ACPI throttling, PowerNow!, etc.) may also be incompatible with NTP.
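One rough way to judge how badly the clock is misbehaving is to turn the "time reset" lines from the quoted log into an apparent drift rate. The sketch below is only an illustration: the function names `parse_resets` and `apparent_drift_ppm` are made up for this example, the year is assumed to be 2005 (syslog lines carry no year), and it assumes each step corrects only the error accumulated since the previous step, ignoring whatever ntpd slewed in between. Month parsing with `%b` also assumes an English or C locale.

```python
import re
from datetime import datetime

# The four "time reset" lines copied from the quoted /var/log/messages excerpt.
LOG = """\
Dec 1 20:30:28 (none) ntpd[27203]: time reset 0.613771 s
Dec 1 20:50:45 (none) ntpd[27203]: time reset 0.931388 s
Dec 1 21:19:23 (none) ntpd[27203]: time reset 0.451491 s
Dec 1 21:36:24 (none) ntpd[27203]: time reset 0.391510 s
"""

# Matches "<Mon> <day> <HH:MM:SS> ... ntpd[pid]: time reset <seconds> s".
RESET = re.compile(
    r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) .*ntpd\[\d+\]: time reset ([-+]?\d+\.\d+) s"
)


def parse_resets(text, year=2005):
    """Return a list of (timestamp, step_seconds) for every step ntpd logged.

    The year is an assumption, since syslog timestamps omit it.
    """
    events = []
    for line in text.splitlines():
        m = RESET.match(line)
        if m:
            when = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
            events.append((when, float(m.group(2))))
    return events


def apparent_drift_ppm(events):
    """Estimate drift between consecutive steps as step size / elapsed time.

    This is a crude figure: it treats each step as the error built up since
    the previous step and ignores any slewing ntpd did in the meantime.
    """
    rates = []
    for (t0, _), (t1, step) in zip(events, events[1:]):
        elapsed = (t1 - t0).total_seconds()
        rates.append(step / elapsed * 1e6)
    return rates


if __name__ == "__main__":
    for rate in apparent_drift_ppm(parse_resets(LOG)):
        print(f"{rate:.0f} ppm")
```

For the quoted log this gives figures of several hundred ppm between steps. ntpd can only slew the clock at up to 500 ppm and steps once the offset exceeds 128 ms by default, so errors of this size are what I would expect from lost timer ticks or a misbehaving clock source, not from ordinary oscillator drift.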
If you want to experiment, there's a set of patches by John Stultz (IBM, I think) for Linux 2.6 that implements a completely new system clock (experimental status right now).

Regards,
Ulrich

[EMAIL PROTECTED] (Bob Robison) writes:

> I'm running a moderate number (around 50) dual-opterons that are
> diskless booting a Linux 2.6.12 smp kernel and trying to synch with a
> Symmetricom XLI-GPS stratum-1 NTP server on an isolated network.
>
> The problem I have is that when I run "ntpq -c peers" on a number of
> these machines to check the status of the ntp synchronization, I see
> offsets ranging over almost 1000 msecs. If I grep through the /var/log/
> messages file, I see that there are often messages around every 20
> minutes like this:
>
> Dec 1 20:30:28 (none) ntpd[27203]: time reset 0.613771 s
> Dec 1 20:30:28 (none) ntpd[27203]: synchronisation lost
> Dec 1 20:50:45 (none) ntpd[27203]: time reset 0.931388 s
> Dec 1 20:50:45 (none) ntpd[27203]: synchronisation lost
> Dec 1 21:19:23 (none) ntpd[27203]: time reset 0.451491 s
> Dec 1 21:19:23 (none) ntpd[27203]: synchronisation lost
> Dec 1 21:36:24 (none) ntpd[27203]: time reset 0.391510 s
> Dec 1 21:36:24 (none) ntpd[27203]: synchronisation lost
>
> This seems like large (and frequent) steps to be occurring. I have a
> fairly simple ntp.conf file:
> ---------------------------------
> restrict default ignore
> restrict 10.2.40.1 mask 255.255.255.255 nomodify notrap noquery
> restrict 127.0.0.1
>
> server 10.2.40.1 iburst
> server 127.127.1.0 iburst # local clock
> fudge 127.127.1.0 stratum 5 # default was 10
>
> driftfile /var/lib/ntp/drift
> ----------------------------------
>
> These machines each have a Gigabit network connection to a high-end
> network switch. I believe the NTP Server probably has only a 100MBit
> link, and he has all the traffic, but I don't think that is the
> problem.
>
> Probably the main issue is the CPU and I/O loading on these opteron
> machines. They are each handling streaming data from a firewire card
> (IEEE-1394a) and the CPUs stay fairly busy handling that data -- though
> they are not pegged at 100% or anything.
>
> Here is a typical ntpq output:
> ntpq> as
> ind assID status conf reach auth condition last_event cnt
> ===========================================================
> 1 48644 9634 yes yes none sys.peer reachable 3
> 2 48645 9034 yes yes none reject reachable 3
> ntpq> rv 48644
> status=9634 reach, conf, sel_sys.peer, 3 events, event_reach,
> srcadr=ntpserv, srcport=123, dstadr=10.1.1.1, dstport=123, leap=00,
> stratum=1, precision=-9, rootdelay=0.000, rootdispersion=5.554,
> refid=GPSM, reach=377, unreach=0, hmode=3, pmode=4, hpoll=7, ppoll=7,
> flash=00 ok, keyid=0, offset=360.879, delay=2.544, dispersion=3.803,
> jitter=6.636, reftime=c739efcd.cf993b0f Thu, Dec 1 2005 21:55:25.810,
> org=c739efde.6ea22848 Thu, Dec 1 2005 21:55:42.432,
> rec=c739efde.1292f6e8 Thu, Dec 1 2005 21:55:42.072,
> xmt=c739efde.0c8ede54 Thu, Dec 1 2005 21:55:42.049,
> filtdelay= 2.54 4.42 2.50 2.98 2.55 2.61 2.44 2.68,
> filtoffset= 360.88 354.24 412.02 412.20 464.11 -95.25 -78.39 -56.90,
> filtdisp= 1.96 3.90 5.82 7.77 9.70 11.62 12.61 13.57
>
> If anyone has any suggestions about what might be happening, or how to
> keep these guys synched up more tightly, I would certainly appreciate
> it. I've dug around through FAQs, Wiki's, Docs, etc... but not sure
> exactly why my time is bouncing around so much.
>
> thanks in advance,
> bob
> --
> Bob Robison [EMAIL PROTECTED]
> Staff Engineer 210-522-3935
> Southwest Research Institute San Antonio, TX
> _______________________________________________
> questions mailing list
> [email protected]
> https://lists.ntp.isc.org/mailman/listinfo/questions
