In article <[EMAIL PROTECTED]>,
 [EMAIL PROTECTED] (Bob Robison) writes:
>I'm running a moderate number (around 50) dual-opterons that are
>diskless booting a Linux 2.6.12 smp kernel and trying to synch with a
>Symmetricon XLI-GPS stratum-1 NTP server on an isolated network.
>
>The problem I have is that when I run "ntpq -c peers" on a number of
>these machines to check the status of the ntp synchronization, I see
>offsets ranging over almost 1000 msecs. If I grep through the /var/log/
>messages file, I see that there are often messages around every 20
>minutes like this:
>
>Dec 1 20:30:28 (none) ntpd[27203]: time reset 0.613771 s
>Dec 1 20:30:28 (none) ntpd[27203]: synchronisation lost
>Dec 1 20:50:45 (none) ntpd[27203]: time reset 0.931388 s
>Dec 1 20:50:45 (none) ntpd[27203]: synchronisation lost
>Dec 1 21:19:23 (none) ntpd[27203]: time reset 0.451491 s
>Dec 1 21:19:23 (none) ntpd[27203]: synchronisation lost
>Dec 1 21:36:24 (none) ntpd[27203]: time reset 0.391510 s
>Dec 1 21:36:24 (none) ntpd[27203]: synchronisation lost

Somebody else suggested lost interrupts. That would be pretty high
on my list.

What happens if you let one of the systems just sit there without
doing anything? If it keeps good time, your problem is probably
caused by your normal workload.

> Probably the main issue is the CPU and I/O loading on these opteron
> machines. They are each handling streaming data from a firewire card
> (IEEE-1394a) and the CPUs stay fairly busy handling that data -- though
> they are not pegged at 100% or anything.

The issue is not so much whether you are using all the CPU, but
whether the clock update interrupt routine is being locked out long
enough to miss an interrupt because the second one comes in before
the first one has been processed.

If I were trying to understand this, I'd consider patching the
firewire interrupt routine to turn on a printer port bit at the
start and turn it off at the end, and then put a scope on that pin
to see how long it was on. Most modern (digital) scopes have a
"trigger on X longer than Y" mode that will show you the bad cases.

Or do it all in software by grabbing the cycle counter at the start
and end of the routine and making a histogram of the differences.

--
The suespammers.org mail server is located in California. So are all
my other mailboxes. Please do not send unsolicited bulk e-mail or
unsolicited commercial e-mail to my suespammers.org address or any of my
other addresses. These are my opinions, not necessarily my employer's.
I hate spam.
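
For the software version of that measurement, the bookkeeping might look
roughly like the sketch below. It is only an illustration of the histogram
step, not code from the firewire driver: it assumes you have already
collected a list of handler durations in CPU cycles (end count minus start
count), and the names histogram_cycles and format_histogram, the 2 GHz
clock rate, the 1000 Hz timer tick, and the sample values are all made up
for the example. Samples longer than one tick period are the candidates
for a lost clock interrupt.

```python
from collections import Counter


def histogram_cycles(durations, cpu_hz, tick_hz):
    """Bucket handler durations (in CPU cycles) by power of two.

    Returns (buckets, too_long). buckets maps a bucket index b to the
    number of samples with 2**b <= cycles < 2**(b + 1). too_long counts
    the samples that lasted longer than one timer tick period.
    """
    cycles_per_tick = cpu_hz / tick_hz
    buckets = Counter()
    too_long = 0
    for cycles in durations:
        if cycles <= 0:
            continue  # skip bogus samples (e.g. counter read out of order)
        buckets[cycles.bit_length() - 1] += 1
        if cycles > cycles_per_tick:
            too_long += 1
    return buckets, too_long


def format_histogram(buckets, cpu_hz):
    """Return one text line per non-empty bucket, lower bound in microseconds."""
    lines = []
    for b in sorted(buckets):
        low_us = (2 ** b) / cpu_hz * 1e6
        lines.append(f">= {low_us:10.1f} us : {buckets[b]}")
    return lines


if __name__ == "__main__":
    CPU_HZ = 2_000_000_000  # assumed 2 GHz cycle counter
    TICK_HZ = 1000          # assumed timer interrupt rate
    # Made-up sample durations in cycles, for illustration only.
    samples = [40_000, 45_000, 50_000, 120_000, 2_500_000]
    buckets, too_long = histogram_cycles(samples, CPU_HZ, TICK_HZ)
    for line in format_histogram(buckets, CPU_HZ):
        print(line)
    print(f"samples longer than one tick: {too_long}")
```

Power-of-two buckets keep the table short while still separating the
typical case from the rare long one, which is the part you care about.
A sample longer than one tick does not prove an interrupt was lost, but
those are the cases worth looking at first.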
