Re: [ntp:questions] quirky adjtimex behaviour [SOLVED]
Hello Dean and Hal,

On Tuesday, January 22, 2008 at 1:08:00 +, Dean S. Messing wrote:
> hal-usenet wrote:
>> try changing the code that reads the CMOS clock to spin in a loop reading it until it changes. That will give you the time early in the second.

The adjtimex code is already designed to detect the exact beginning of an RTC second, either via the /dev/rtc update-ended interrupt or by busy-waiting for the fall of the update-in-progress (UIP) flag.

But nevertheless your analysis of the facts seems good, Hal: this tick synchronisation probably fails for some unknown reason in Dean's case.

> I just replaced version 1.23 of adjtimex with an old version 1.20 and the quirky behaviour disappeared. I first noticed it on my new Fedora 7 with version 1.21.

Interesting: adjtimex 1.21 was the first version to use the /dev/rtc interrupt by default to detect the clock beat. The problem might be there. Adjtimex 1.23 has an option to force the UIP method: does it show the quirky offsets?

| # adjtimex --utc --compare=20 --interval=10 --directisa

Anyway, the default /dev/rtc method is preferable. The 1.23 debug output may reveal what's up with your interrupts:

| # adjtimex --utc --compare=1 --verbose

Serge.
--
Serge point Bets arobase laposte point net
___
questions mailing list
questions@lists.ntp.org
https://lists.ntp.org/mailman/listinfo/questions
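For readers following the thread, the busy-wait idea discussed in this message can be sketched in a few lines. This is only an illustration of the general technique (poll a whole-second clock until its value changes), not the adjtimex implementation: the names `wait_for_second_edge`, `read_seconds` and `coarse_clock` are hypothetical, and the "RTC" here is just the system time truncated to whole seconds.

```python
import time
from typing import Callable


def wait_for_second_edge(read_seconds: Callable[[], int],
                         poll_delay: float = 0.0) -> int:
    """Poll a whole-second clock until its value changes; return the new value.

    The return happens just after the clock ticked over, i.e. early in the
    new second, which is the point of the technique.
    """
    start = read_seconds()
    while True:
        now = read_seconds()
        if now != start:
            return now
        if poll_delay:
            time.sleep(poll_delay)  # optional: trade precision for less CPU use


def coarse_clock() -> int:
    # Assumption: a stand-in for a real RTC read, not an actual hardware access.
    return int(time.time())


if __name__ == "__main__":
    edge = wait_for_second_edge(coarse_clock)
    # Sub-second phase at which we noticed the tick; should be close to zero.
    print(edge, f"{time.time() - edge:.6f}")
```

Passing the reader in as a function keeps the sketch independent of how the seconds value is obtained, so the same loop could be tried against any coarse clock source.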
Re: [ntp:questions] quirky adjtimex behaviour [SOLVED]
hal-usenet wrote:
> Dean Messing wrote:
>> I am seeing strange behaviour on my _x86_64 Fedora 7 desktop workstation with regard to the system-cmos time that `adjtimex' reports.
>
> <snip>
>
>> It seems that leaves two other possibilities: a bug in adjtimex or a bug in the kernel. That's where I am right now.
>
> My guess is that the system/kernel is working correctly and that the adjtimex utility is printing out misleading stuff.
>
> The CMOS/hardware clock only returns the time to the nearest second. I think that would cause quirks like this if the code has a loop that does a bit of work and sleeps for N seconds. If the bit of work takes 0.1 second, the time when the CMOS clock is read will drift by 0.1 second each time around the loop.
>
> If you want to play and you can find the source, try changing the code that reads the CMOS clock to spin in a loop reading it until it changes. That will give you the time early in the second.

Your guess is right, Hal. It's been nearly three weeks since I've had a few minutes to further pursue this.

I just replaced version 1.23 of adjtimex with an old version 1.20 and the quirky behaviour disappeared. I first noticed it on my new Fedora 7 with version 1.21. When I looked on the adjtimex site I saw it was up to 1.23, so I thought that surely this problem had been detected and fixed. When it didn't go away in 1.23 I looked elsewhere: 64 bit machine, new kernel, etc.

I'll write the author and report the bug. I'm really surprised nobody has reported it already.

Dean
Re: [ntp:questions] quirky adjtimex behaviour
Hi Dean.

Dean S. Messing wrote:
>> Can I however suggest that you first try and eliminate CPU frequency scaling as a cause of the symptoms you're seeing: use cpufreq-set -g to select a policy that results in a constant CPU frequency and then check if this changes the behaviour (or renders it more predictable).
>
> ...
> analyzing CPU 0:
>   no or unknown cpufreq driver is active on this CPU
> ...

OK, this eliminates CPU frequency scaling as the cause of your problem. Sorry to have sent you off on a tangent; from recent experience this seemed like promising low-hanging fruit (but it turned out not to be).

> analyzing CPU 1:
>   no or unknown cpufreq driver is active on this CPU
> analyzing CPU 2:
>   no or unknown cpufreq driver is active on this CPU
> analyzing CPU 3:
>   no or unknown cpufreq driver is active on this CPU
> ...
> If you or others wouldn't mind reading my whole original post (it's not _that_ long :-) maybe some other ideas might occur. Thanks.

Sorry, I haven't a clue. Also note that I don't have any experience with SMP at all (let alone timekeeping on SMP machines). I'm very interested in this subject, but I've never been able to justify the hardware cost just so that I could play around with this.

Cheers, Jan
Re: [ntp:questions] quirky adjtimex behaviour
> I am seeing strange behaviour on my _x86_64 Fedora 7 desktop workstation with regard to the system-cmos time that `adjtimex' reports.
>
> That leaves the RTC doing the jumping. But having an RTC that is running nearly 1 ppm slower than my system clock and which jumps ahead every 10 seconds seems absurd.
>
> It seems that leaves two other possibilities: a bug in adjtimex or a bug in the kernel. That's where I am right now.

My guess is that the system/kernel is working correctly and that the adjtimex utility is printing out misleading stuff.

The CMOS/hardware clock only returns the time to the nearest second. I think that would cause quirks like this if the code has a loop that does a bit of work and sleeps for N seconds. If the bit of work takes 0.1 second, the time when the CMOS clock is read will drift by 0.1 second each time around the loop.

If you want to play and you can find the source, try changing the code that reads the CMOS clock to spin in a loop reading it until it changes. That will give you the time early in the second.

--
These are my opinions, not necessarily my employer's. I hate spam.
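The effect guessed at in this message can be reproduced with a small simulation. This is a sketch under stated assumptions, not a model of the actual adjtimex code: the function name `simulate_compare` and the values (10 s sleep, 100 ms of work, 400 ms starting phase) are hypothetical, and the "CMOS" reading is simply the true time truncated to whole seconds. Neither clock drifts in this model, yet the reported offset creeps by the work time on every pass and wraps once it reaches a full second.

```python
def simulate_compare(samples: int, sleep_ms: int, work_ms: int,
                     start_ms: int = 0) -> list:
    """Return (cmos_seconds, system_minus_cmos_ms) for each loop iteration.

    Times are integer milliseconds so the arithmetic is exact.
    """
    rows = []
    now_ms = start_ms
    for _ in range(samples):
        cmos_s = now_ms // 1000              # the RTC only resolves whole seconds
        offset_ms = now_ms - cmos_s * 1000   # sub-second phase of this read
        rows.append((cmos_s, offset_ms))
        now_ms += sleep_ms + work_ms         # sleep N seconds plus a bit of work
    return rows


for cmos_s, offset_ms in simulate_compare(samples=12, sleep_ms=10_000,
                                          work_ms=100, start_ms=400):
    print(f"{cmos_s:6d}  {offset_ms / 1000:.3f}")
```

In the printed rows the second column climbs in steps of 0.1 and then falls back to 0.0, while the first column occasionally advances by one extra second, which is the same sawtooth shape as in the reported output.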
Re: [ntp:questions] quirky adjtimex behaviour
Dean,

Dean S. Messing wrote:
> I am seeing strange behaviour on my _x86_64 Fedora 7 desktop workstation with regard to the system-cmos time that `adjtimex' reports.

I've not read your whole post; it's clear that you've been wrestling with this problem for a while and have done quite a bit of work already.

Can I however suggest that you first try and eliminate CPU frequency scaling as a cause of the symptoms you're seeing: use cpufreq-set -g to select a policy that results in a constant CPU frequency and then check if this changes the behaviour (or renders it more predictable).

HTH. Jan
Re: [ntp:questions] quirky adjtimex behaviour
Hi Jan, all.

Jan Ceuleers wrote:
> Dean S. Messing wrote:
>> I am seeing strange behaviour on my _x86_64 Fedora 7 desktop workstation with regard to the system-cmos time that `adjtimex' reports.
>
> I've not read your whole post; it's clear that you've been wrestling with this problem for a while and have done quite a bit of work already.

Well, I've done what I can but I'm really no expert on this stuff. That's why I wrote to this list, which seems to be populated by _many_ very knowledgeable people.

> Can I however suggest that you first try and eliminate CPU frequency scaling as a cause of the symptoms you're seeing: use cpufreq-set -g to select a policy that results in a constant CPU frequency and then check if this changes the behaviour (or renders it more predictable).

I installed the cpufreq-utils package. The result of `cpufreq-info' is:

[EMAIL PROTECTED] ~]# cpufreq-info
cpufrequtils 002: cpufreq-info (C) Dominik Brodowski 2004-2006
Report errors and bugs to [EMAIL PROTECTED], please.
analyzing CPU 0:
  no or unknown cpufreq driver is active on this CPU
analyzing CPU 1:
  no or unknown cpufreq driver is active on this CPU
analyzing CPU 2:
  no or unknown cpufreq driver is active on this CPU
analyzing CPU 3:
  no or unknown cpufreq driver is active on this CPU

Also /sys/devices/system/cpu/cpu{0,1,2,3}/cpufreq/ does not exist on this system.

I don't know much about cpufreq adjustments. Should I be looking elsewhere? Note that this is a desktop workstation. Will the cpufreq (actually there are four CPUs in two dual-core units) change on such a machine?

If you or others wouldn't mind reading my whole original post (it's not _that_ long :-) maybe some other ideas might occur. Thanks.
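The sysfs check mentioned in this message (whether a cpufreq directory exists under each cpuN entry) can also be scripted. The following is a minimal sketch, assuming the /sys/devices/system/cpu layout quoted above; the function name `cpus_with_cpufreq` and its `sysfs_root` parameter are hypothetical and exist only for this illustration.

```python
from pathlib import Path


def cpus_with_cpufreq(sysfs_root: str = "/sys/devices/system/cpu") -> dict:
    """Map each cpuN directory name to True if it has a cpufreq subdirectory."""
    root = Path(sysfs_root)
    result = {}
    if not root.is_dir():
        return result  # no sysfs CPU tree here (e.g. not a Linux system)
    # "cpu[0-9]*" matches cpu0, cpu1, ... but not siblings such as "cpufreq".
    for cpu_dir in sorted(root.glob("cpu[0-9]*")):
        result[cpu_dir.name] = (cpu_dir / "cpufreq").is_dir()
    return result


if __name__ == "__main__":
    for name, has_cpufreq in cpus_with_cpufreq().items():
        print(name, "cpufreq present" if has_cpufreq else "no cpufreq directory")
```

A result of False for every CPU would match the situation described here, where no cpufreq driver is active and frequency scaling is therefore unlikely to be in play.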
[ntp:questions] quirky adjtimex behaviour
First, apologies if this is the wrong list for this. Please direct me to the right place if it is.

I am seeing strange behaviour on my _x86_64 Fedora 7 desktop workstation with regard to the system-cmos time that `adjtimex' reports. Below is an illustration:

[EMAIL PROTECTED] ~]# adjtimex --utc --compare=20 --interval=10
                                      --- current ---   -- suggested --
cmos time     system-cmos  error_ppm   tick      freq    tick      freq
1199380337       0.438974
1199380346       0.540422    10144.8  10001       395
1199380355       0.642382    10196.0  10001       395    9899   4212512
1199380364       0.744336    10195.4  10001       395    9899   4251575
1199380373       0.845293    10095.7  10001       395    9900   4230787
1199380382       0.946259    10096.6  10001       395    9900   4172975
1199380392       0.047206   -89905.3  10001       395   10900   4297975
1199380401       0.149166    10196.0  10001       395    9899   4210950
1199380410       0.251134    10196.8  10001       395    9899   4160950
1199380419       0.353096    10196.2  10001       395    9899   4198450
1199380428       0.455055    10195.9  10001       395    9899   4218762
1199380437       0.557004    10194.9  10001       395    9899   4284387
1199380446       0.658973    10196.9  10001       395    9899   4153137
1199380455       0.759925    10095.2  10001       395    9900   4265162
1199380464       0.860889    10096.4  10001       395    9900   4185475
1199380473       0.961848    10095.9  10001       395    9900   4218287
1199380483       0.063806   -89804.2  10001       395   10899   4225012
1199380492       0.165759    10195.3  10001       395    9899   4257825
1199380501       0.267719    10196.0  10001       395    9899   4212512
1199380510       0.369682    10196.3  10001       395    9899   4192200
[EMAIL PROTECTED] ~]#

As you can see, the system time appears to advance by almost exactly 0.1 seconds every 10 seconds relative to the RTC. Then, just as they are about to get out of phase by 1 second, something causes either the system clock to jump back by ~1 second or the RTC to jump forward, or so it appears.
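As a quick arithmetic check on the listing above: the error_ppm values appear to be consistent with the change in system-cmos between two consecutive rows, divided by the --interval value and expressed in parts per million. That relationship is inferred from the printed numbers only, not taken from the adjtimex source, and the helper name `apparent_error_ppm` is hypothetical.

```python
from decimal import Decimal


def apparent_error_ppm(prev_offset: str, offset: str, interval_s: int) -> Decimal:
    """Change in system-cmos between two rows, per second of interval, in ppm.

    Offsets are passed as strings so Decimal keeps the printed digits exact.
    """
    delta = Decimal(offset) - Decimal(prev_offset)
    return (delta / interval_s * Decimal(1_000_000)).quantize(Decimal("0.1"))


# An ordinary row: system-cmos went from 0.540422 to 0.642382.
print(apparent_error_ppm("0.540422", "0.642382", 10))
# The wrap-around row: system-cmos went from 0.946259 to 0.047206.
print(apparent_error_ppm("0.946259", "0.047206", 10))
```

Under this reading, a step of about 0.1 s per line at a 10 s interval shows up as roughly 10,000 ppm, and the wrap back towards zero shows up as the large negative value.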
Furthermore, if I change --interval to something odd (like 17) the delta from line to line remains about the same at 0.1 sec, which it should not if there was a real slew occurring:

[EMAIL PROTECTED] ~]# adjtimex --utc --compare=10 --interval=17
                                      --- current ---   -- suggested --
cmos time     system-cmos  error_ppm   tick      freq    tick      freq
1199380633       0.540237
1199380649       0.642055     5989.3  10001       395
1199380665       0.743996     5996.5  10001       395    9941   4177948
1199380681       0.845918     5995.4  10001       395    9941   4250558
1199380697       0.947846     5995.8  10001       395    9941   4227580
1199380714       0.048774   -52886.6  10001       395   10530   3070784
1199380730       0.150698     5995.5  10001       395    9941   4243205
1199380746       0.252637     5996.4  10001       395    9941   4185301
1199380762       0.354556     5995.2  10001       395    9941   4261588
1199380778       0.456482     5995.6  10001       395    9941   4235852

From my investigations, the system time is _not_ advancing faster than UTC. In fact:

[EMAIL PROTECTED] ~]# ntpdate -q montpelier.ilan.caltech.edu
server 192.12.19.20, stratum 1, offset -0.001267, delay 0.05643
3 Jan 09:24:28 ntpdate[18831]: adjust time server 192.12.19.20 offset -0.001267 sec

so my system is currently about 1 ms ahead of the Stratum 1 server, montpelier.ilan.caltech.edu. This offset is nearly constant over several minutes. Also, if I execute the command at several random times over the period of a minute, the offset only fluctuates by 1/2 a ms or so.

Conclusion: my system clock is not jumping around.

That leaves the RTC doing the jumping. But having an RTC that is running nearly 1 ppm slower than my system clock and which jumps ahead every 10 seconds seems absurd. In fact two results seem to prove that the RTC is running smoothly:

-- If I look at the seconds digit changing within the desktop BIOS (pre-boot) it does not change its relative phase w.r.t. the displayed system clock on my laptop screen as I hold the latter next to the BIOS clock. (A 10% second retard would be quite visible, as would the jump.)
-- The delta from line to line in the `adjtimex' output remains at ~0.1 no matter the --interval used. This should not be the case if there was a true slew between the system clock and RTC.

It seems that leaves two other possibilities: a bug in adjtimex or a bug in the kernel. That's where I am right now.

For reference here's a few lines of output from the laptop (running Fedora 6, kernel 2.6.20, and being an i386 32-bit machine):

[EMAIL PROTECTED] ~]# adjtimex --utc --compare=20 --interval=17
--- current --- -- suggested --
cmos time system-cmos error_ppm tick