Hi,

Thank you all for your replies, and sorry I have not got back with comments sooner; I got caught up in other things at work.
Having got my hands on another 8 units from production, I set them all up in a rack. Of the 8 units, 4 locked up when first synchronised to NTP, with the drift file stuck at -500. These 4 units were then left for 48 hours, and they were still in the same condition:

-- The drift file read between -495 and -500. There seemed to be small changes over time, but it was mostly -500.
-- Using ntpq -p to monitor the lock, you could see the time initially within 15 ms of the server, then over a few minutes climb to an offset of 500 ms, at which point a step change was made and the whole process started all over again.
-- The good units were locked to the same time server and were within +/- 15 ms or better. The server is local but locked to PPS NTP servers over the internet, which is not perfect, as the delay and jitter change through the day with internet usage. I have seen them locked to within 2 ms when used on a customer's network with a local GPS-controlled server.

After 48 hours in this stuck condition, I logged in and stopped the ntp process, cleared the drift file back to 0, used ntpdate to set the system time to that of the server, and restarted the ntp process. (The drift file always starts at zero on these systems, as they operate from read-only flash; the /var directory is constructed in /dev/shm at boot.) After 3 hours or so I checked them, and all 8 systems were locked to the server, with drift files ranging from about -20 to -75, so none of the processor clocks seem to be far off.

From this I can only conclude it's a problem with the control algorithm in ntp, which under some start conditions gets stuck applying the maximum compensation. If it were a hardware problem, I would have expected it to still be there after stopping and restarting the ntp process. And if it were a transient hardware problem at power-up, why would it still be stuck after 48 hours?
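As a rough check on those numbers: ntpd's drift file holds the frequency correction in PPM (capped at 500 PPM), and 1 PPM is 1 microsecond of error per second. The sketch below assumes, loosely, that the oscillators really need something in the -20 to -75 PPM range seen after the restart; it works out how fast the offset grows when the correction is pinned at -500 PPM instead. The function names are hypothetical, for illustration only.

```python
def offset_growth_rate(applied_ppm: float, natural_ppm: float) -> float:
    """Clock error gained per second of real time, in seconds per second."""
    # The residual error is the gap between the correction ntpd applies and the
    # correction the oscillator actually needs; 1 PPM is 1e-6 s per second.
    return abs(applied_ppm - natural_ppm) * 1e-6


def seconds_to_drift(offset_s: float, applied_ppm: float, natural_ppm: float) -> float:
    """Seconds needed for the offset to grow by offset_s at that residual error."""
    return offset_s / offset_growth_rate(applied_ppm, natural_ppm)


if __name__ == "__main__":
    # Assumption: correction pinned at -500 PPM, oscillator needing -20..-75 PPM.
    for natural in (-20.0, -75.0):
        t = seconds_to_drift(0.500 - 0.015, -500.0, natural)
        print(f"needs {natural:+.0f} PPM: 15 ms -> 500 ms in about {t / 60:.0f} min")
```

Under these assumptions the residual error is about 425 to 480 PPM, so the climb from 15 ms to 500 ms takes roughly 17 to 19 minutes.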
Looking more closely at the differences between the systems, I found that one of the units that had initially failed to lock had a completely dead CMOS battery. Apparently production had received a batch of 100 dead batteries, and this was one that slipped through the net. Checking all the others turned up one other dead battery, but that was from a unit that locked OK.

All of the real-time clocks in the units were set to some time in December 2008, having never been set. The BIOS was written around then, so it seems that's the time they start with when the battery is first fitted. This is partly down to the way the units are used with a read-only file system: they are designed to simply be turned off rather than shut down, so the system time never gets written to the RTC, which as I understand it is normally done during shutdown. (In normal use these units will be on all the time.)

I have since tried to reproduce the lockup with these units, returning their flash drives to the production image and starting all over again, with the RTC set to various times in the past and future. None of these efforts has so far reproduced the right conditions; they always seem to lock with no problems at all.

I have seen one other unit lock up recently: a unit used by one of the software development engineers, who had been on holiday for two weeks. When he returned and powered his unit up, it failed to lock. Unfortunately I did not get to examine it before he had shut it down and rebooted, so I don't know the state of the RTC when it was first booted; the shutdown had set it to the server time. So there may be some merit in testing a few more systems that have been off for a while to see what happens?

I think my next step is to get the software, or maybe just a cron script, to:
a) set the RTC to the system time once locked to NTP;
b) if the drift file reads -500, stop ntp, set the time to the server time, and restart ntp with a zeroed drift file.
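Steps (a) and (b) could be sketched as a small cron job along these lines. This is a minimal sketch, not a tested tool: the drift-file path, server address, init-script name and the -495 PPM threshold are all hypothetical placeholders to be replaced with whatever the units actually use. It relies only on `ntpq -pn`, `ntpdate` and `hwclock --systohc`, and treats a leading `*` in the `ntpq` peer list as "locked".

```python
from __future__ import annotations

import subprocess
from pathlib import Path

# Hypothetical settings: the drift-file path, server address, init script and
# threshold are assumptions for illustration, not taken from these units.
DRIFT_FILE = Path("/var/lib/ntp/drift/ntp.drift")
NTP_SERVER = "192.0.2.1"        # placeholder (documentation address range)
NTP_INIT = "/etc/init.d/ntp"    # assumed init script name
STUCK_PPM = 495.0               # treat -495 PPM or beyond as "stuck"


def read_drift(text: str) -> float:
    """Parse the frequency correction (PPM) from the drift file contents."""
    return float(text.split()[0])


def is_stuck(drift_ppm: float, threshold: float = STUCK_PPM) -> bool:
    """True when the correction is pinned at or near ntpd's -500 PPM limit."""
    return drift_ppm <= -threshold


def is_locked(ntpq_output: str) -> bool:
    """In `ntpq -p` output, a leading '*' marks the selected system peer."""
    return any(line.startswith("*") for line in ntpq_output.splitlines())


def recovery_plan(server: str, drift_file: Path) -> list[list[str]]:
    """Step (b): stop ntpd, step the clock, zero the drift file, restart."""
    return [
        [NTP_INIT, "stop"],
        ["ntpdate", server],
        ["sh", "-c", f"echo 0 > {drift_file}"],
        [NTP_INIT, "start"],
    ]


def main(dry_run: bool = True) -> None:
    if not DRIFT_FILE.exists():
        print(f"no drift file at {DRIFT_FILE}; nothing to do")
        return
    if is_stuck(read_drift(DRIFT_FILE.read_text())):
        cmds = recovery_plan(NTP_SERVER, DRIFT_FILE)
    else:
        # Step (a): once ntpd has a system peer, copy the system time to the RTC.
        peers = subprocess.run(["ntpq", "-pn"], capture_output=True, text=True).stdout
        cmds = [["hwclock", "--systohc"]] if is_locked(peers) else []
    for cmd in cmds:
        print("would run:" if dry_run else "running:", " ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    main()  # dry run by default; pass dry_run=False only after checking the plan
```

It only prints what it would do unless called with `dry_run=False`. The -495 threshold mirrors the -495 to -500 readings seen on the stuck units rather than waiting for an exact -500.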
PS: the systems are running Linux version 2.6.22.18-0.2-default (ge...@buildhost) (gcc version 4.2.1 (SUSE Linux))

Dave
