Re: [ntp:questions] quirky adjtimex behaviour [SOLVED]

2008-01-28 Thread Serge Bets
Hello Dean and Hal,

 On Tuesday, January 22, 2008 at 1:08:00 +, Dean S. Messing wrote:

 hal-usenet wrote:
 try changing the code that reads the CMOS clock to spin in a loop
 reading it until it changes.  That will give you the time early in
 the second.

The adjtimex code is already designed to detect the exact beginning of
an RTC second. Either via the /dev/rtc update-ended interrupt, or by
busywaiting for the fall of the update-in-progress (UIP) flag. But
nevertheless your analysis of facts seems good, Hal: This tick
synchronisation probably fails for some unknown reason in Dean's case.
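
For readers who want to see what the interrupt method looks like from user space, here is a rough Python sketch. It is not adjtimex's actual code. It assumes Linux on x86, a readable /dev/rtc with update interrupts available, and the ioctl numbers as <linux/rtc.h> encodes them on that architecture. The function names are made up for illustration.

```python
import calendar
import fcntl
import os
import struct
import time

# ioctl request numbers from <linux/rtc.h> as encoded on x86 Linux
# (assumption: other architectures may encode them differently).
RTC_UIE_ON = 0x7003        # enable update-ended interrupts
RTC_UIE_OFF = 0x7004       # disable them again
RTC_RD_TIME = 0x80247009   # read struct rtc_time (nine C ints, 36 bytes)


def rtc_to_epoch(buf):
    """Convert a packed struct rtc_time holding UTC into Unix seconds."""
    sec, minute, hour, mday, mon, year = struct.unpack("9i", buf)[:6]
    # tm_mon counts from 0 and tm_year from 1900, as in struct tm.
    return calendar.timegm((year + 1900, mon + 1, mday, hour, minute, sec, 0, 0, 0))


def system_minus_cmos(path="/dev/rtc"):
    """Block until the RTC second ticks, then return system time minus RTC time."""
    fd = os.open(path, os.O_RDONLY)
    try:
        fcntl.ioctl(fd, RTC_UIE_ON)
        try:
            os.read(fd, struct.calcsize("L"))   # returns at the update-ended interrupt
            system = time.time()                # sampled right at the RTC tick
        finally:
            fcntl.ioctl(fd, RTC_UIE_OFF)
        raw = fcntl.ioctl(fd, RTC_RD_TIME, bytes(36))
        return system - rtc_to_epoch(raw)
    finally:
        os.close(fd)
```

Calling system_minus_cmos() as root should return a value comparable to the system-cmos column, provided the RTC is kept in UTC. If the blocking read returned at some arbitrary instant instead of at the tick, the result would wander in the way Dean reports.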


 I just replaced version 1.23 of adjtimex with an old version 1.20 and
 the quirky behaviour disappeared.  I first noticed it on my new
 Fedora 7 with version 1.21.

Interesting: adjtimex 1.21 was the first version to use the /dev/rtc
interrupt by default to detect the clock beat. The problem might be there.
Adjtimex 1.23 has an option to force the UIP method: does it show the
quirky offsets?

| # adjtimex --utc --compare=20 --interval=10 --directisa

Anyway the default /dev/rtc method is preferable. The 1.23 debug output
may reveal what's up with your interrupts:

| # adjtimex --utc --compare=1 --verbose


Serge.
-- 
Serge point Bets arobase laposte point net

___
questions mailing list
questions@lists.ntp.org
https://lists.ntp.org/mailman/listinfo/questions


Re: [ntp:questions] quirky adjtimex behaviour [SOLVED]

2008-01-21 Thread Dean S. Messing

hal-usenet wrote:
 Dean Messing wrote:
 I am seeing strange behaviour on my _x86_64 Fedora 7 desktop
 workstation with regard to the system-cmos time that `adjtimex'
 reports.
 <snip>
 
 It seems that leaves two other possibilities: a bug in adjtimex or a
 bug in the kernel.  That's where I am right now.
 
 My guess is that the system/kernel is working correctly and that
 the adjtimex utility is printing out misleading stuff.
 
 The CMOS/hardware clock only returns the time to the nearest second.
 I think that would cause quirks like this.  If the code has a loop that
 does a bit of work and then sleeps for N seconds, and the bit of work
 takes 0.1 second, the time when the CMOS clock is read will drift
 by 0.1 second each time around the loop.
 
 If you want to play and you can find the source, try changing
 the code that reads the CMOS clock to spin in a loop reading
 it until it changes.  That will give you the time early in the
 second.

Your guess is right, Hal.

It's been nearly three weeks since I've had a few minutes to further
pursue this.  I just replaced version 1.23 of adjtimex with an old
version 1.20 and the quirky behaviour disappeared.  I first noticed it
on my new Fedora 7 with version 1.21.

When I looked on the adjtimex site I saw it was up to 1.23 so I
thought that surely this problem has been detected and fixed.  When it
didn't go away in 1.23 I looked elsewhere: 64 bit machine, new kernel,
etc.

I'll write the author and report the bug.  I'm really surprised nobody
has reported it already.

Dean 


Re: [ntp:questions] quirky adjtimex behaviour

2008-01-05 Thread Jan Ceuleers
Hi Dean.

Dean S. Messing wrote:
 Can I however suggest that you first try and eliminate CPU frequency 
 scaling as a cause of the symptoms you're seeing: use cpufreq-set -g to 
 select a policy that results in a constant CPU frequency and then check 
 if this changes the behaviour (or renders it more predictable).
...
 analyzing CPU 0:
   no or unknown cpufreq driver is active on this CPU
...

OK, this eliminates CPU frequency scaling as the cause of your problem.

Sorry to have sent you off on a tangent; from recent experience this 
seemed like a promising low-hanging fruit (but it turned out not to be).

 analyzing CPU 1:
   no or unknown cpufreq driver is active on this CPU
 analyzing CPU 2:
   no or unknown cpufreq driver is active on this CPU
 analyzing CPU 3:
   no or unknown cpufreq driver is active on this CPU
...
 If you or others wouldn't mind reading my whole original post (it's
 not _that_ long :-) maybe some other ideas might occur.  Thanks.

Sorry, I haven't a clue. Also note that I don't have any experience with 
SMP at all (let alone timekeeping on SMP machines). I'm very interested 
in this subject, but I've never been able to justify the hardware cost 
just so that I could play around with this.

Cheers, Jan



Re: [ntp:questions] quirky adjtimex behaviour

2008-01-05 Thread Hal Murray

I am seeing strange behaviour on my _x86_64 Fedora 7 desktop
workstation with regard to the system-cmos time that `adjtimex'
reports.

That leaves the RTC doing the jumping.  But having an RTC that is
running about 1% (roughly 10,000 ppm) slower than my system clock and
which jumps ahead by a second roughly every 90 seconds seems absurd.

It seems that leaves two other possibilities: a bug in adjtimex or a
bug in the kernel.  That's where I am right now.

My guess is that the system/kernel is working correctly and that
the adjtimex utility is printing out misleading stuff.

The CMOS/hardware clock only returns the time to the nearest second.
I think that would cause quirks like this.  If the code has a loop that
does a bit of work and then sleeps for N seconds, and the bit of work
takes 0.1 second, the time when the CMOS clock is read will drift
by 0.1 second each time around the loop.
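
The effect is easy to reproduce without any hardware. The sketch below is a pure simulation under stated assumptions: both clocks run perfectly, the CMOS clock reports whole seconds only, and the loop period of 10.1 s (a 10 s sleep plus 0.1 s of work) is a hypothetical figure. The function name is illustrative.

```python
import math


def unsynced_offsets(start, period, reads):
    """Simulate system-cmos readings when the RTC is read at arbitrary instants.

    Both clocks are assumed perfect; the RTC just reports whole seconds.
    """
    offsets = []
    t = start
    for _ in range(reads):
        cmos = math.floor(t)        # the RTC only returns whole seconds
        offsets.append(t - cmos)    # so the "offset" is just the fraction of t
        t += period                 # sleep plus the bit of work
    return offsets


# Hypothetical numbers: a 10 s sleep plus 0.1 s of work per loop.
for value in unsynced_offsets(start=0.44, period=10.1, reads=8):
    print(f"{value:.2f}")
```

The printed offsets climb by 0.1 per reading and then drop by 0.9 when the fraction wraps past a whole second. That is the same sawtooth as in the reported output, even though neither simulated clock drifts at all.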

If you want to play and you can find the source, try changing
the code that reads the CMOS clock to spin in a loop reading
it until it changes.  That will give you the time early in the
second.
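
A minimal sketch of that spin loop, assuming the coarse clock is wrapped in a function that returns whole seconds. The names wait_for_tick and read_seconds are illustrative, and nothing here touches the real CMOS registers.

```python
import time


def wait_for_tick(read_seconds, now=time.time):
    """Spin until read_seconds() changes; return (new_second, timestamp)."""
    last = read_seconds()
    while True:
        current = read_seconds()
        if current != last:
            # Sample the fine-grained clock immediately after the change,
            # so the timestamp falls early in the new second.
            return current, now()
```

As a stand-in for the CMOS read, wait_for_tick(lambda: int(time.time())) can be tried on any machine. The returned timestamp should have a small fractional part, since it is taken right after the second changes.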


-- 
These are my opinions, not necessarily my employer's.  I hate spam.



Re: [ntp:questions] quirky adjtimex behaviour

2008-01-04 Thread Jan Ceuleers
Dean,

Dean S. Messing wrote:
 I am seeing strange behaviour on my _x86_64 Fedora 7 desktop
 workstation with regard to the system-cmos time that `adjtimex'
 reports.

I've not read your whole post; it's clear that you've been wrestling 
with this problem for a while and have done quite a bit of work already.

Can I however suggest that you first try and eliminate CPU frequency 
scaling as a cause of the symptoms you're seeing: use cpufreq-set -g to 
select a policy that results in a constant CPU frequency and then check 
if this changes the behaviour (or renders it more predictable).

HTH.

Jan



Re: [ntp:questions] quirky adjtimex behaviour

2008-01-04 Thread Dean S. Messing

Hi Jan, all.

Jan Ceuleers wrote:
 Dean S. Messing wrote:
  I am seeing strange behaviour on my _x86_64 Fedora 7 desktop
  workstation with regard to the system-cmos time that `adjtimex'
  reports.
 
 I've not read your whole post; it's clear that you've been wrestling 
 with this problem for a while and have done quite a bit of work already.

Well, I've done what I can but I'm really no expert on this stuff.
That's why I wrote to this list, which seems to be populated by _many_
very knowledgeable people.

 Can I however suggest that you first try and eliminate CPU frequency 
 scaling as a cause of the symptoms you're seeing: use cpufreq-set -g to 
 select a policy that results in a constant CPU frequency and then check 
 if this changes the behaviour (or renders it more predictable).

I installed the cpufreq-utils package.
The result of `cpufreq-info' is:

[EMAIL PROTECTED] ~]# cpufreq-info
cpufrequtils 002: cpufreq-info (C) Dominik Brodowski 2004-2006
Report errors and bugs to [EMAIL PROTECTED], please.
analyzing CPU 0:
  no or unknown cpufreq driver is active on this CPU
analyzing CPU 1:
  no or unknown cpufreq driver is active on this CPU
analyzing CPU 2:
  no or unknown cpufreq driver is active on this CPU
analyzing CPU 3:
  no or unknown cpufreq driver is active on this CPU

Also /sys/devices/system/cpu/cpu{0,1,2,3}/cpufreq/ does not exist on
this system.  I don't know much about cpufreq adjustments.  Should I
be looking elsewhere?  Note that this is a desktop workstation.  Will
the cpufreq (actually there are four CPUs in two dual-core units)
change on such a machine?
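
The same check can be scripted. This is a small sketch that assumes the usual Linux sysfs layout, where each CPU with an active cpufreq driver has a cpufreq/scaling_governor file. The function name and the sysfs_root parameter are illustrative.

```python
import glob
import os


def cpufreq_governors(sysfs_root="/sys/devices/system/cpu"):
    """Map each CPU that has a cpufreq directory to its current scaling governor."""
    governors = {}
    pattern = os.path.join(sysfs_root, "cpu[0-9]*", "cpufreq", "scaling_governor")
    for path in sorted(glob.glob(pattern)):
        cpu = path.split(os.sep)[-3]            # e.g. "cpu0"
        with open(path) as handle:
            governors[cpu] = handle.read().strip()
    return governors
```

An empty dictionary means no CPU exposes a cpufreq directory, which matches the "no or unknown cpufreq driver" messages above.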

If you or others wouldn't mind reading my whole original post (it's
not _that_ long :-) maybe some other ideas might occur.  Thanks.




[ntp:questions] quirky adjtimex behaviour

2008-01-03 Thread Dean S. Messing
First, apologies if this is the wrong list for this.
Please direct me to the right place if it is.

I am seeing strange behaviour on my _x86_64 Fedora 7 desktop
workstation with regard to the system-cmos time that `adjtimex'
reports.

Below is an illustration:

[EMAIL PROTECTED] ~]# adjtimex --utc --compare=20 --interval=10
                                      --- current ---   -- suggested --
cmos time     system-cmos  error_ppm   tick      freq    tick      freq
1199380337       0.438974
1199380346       0.540422    10144.8  10001       395
1199380355       0.642382    10196.0  10001       395    9899   4212512
1199380364       0.744336    10195.4  10001       395    9899   4251575
1199380373       0.845293    10095.7  10001       395    9900   4230787
1199380382       0.946259    10096.6  10001       395    9900   4172975
1199380392       0.047206   -89905.3  10001       395   10900   4297975
1199380401       0.149166    10196.0  10001       395    9899   4210950
1199380410       0.251134    10196.8  10001       395    9899   4160950
1199380419       0.353096    10196.2  10001       395    9899   4198450
1199380428       0.455055    10195.9  10001       395    9899   4218762
1199380437       0.557004    10194.9  10001       395    9899   4284387
1199380446       0.658973    10196.9  10001       395    9899   4153137
1199380455       0.759925    10095.2  10001       395    9900   4265162
1199380464       0.860889    10096.4  10001       395    9900   4185475
1199380473       0.961848    10095.9  10001       395    9900   4218287
1199380483       0.063806   -89804.2  10001       395   10899   4225012
1199380492       0.165759    10195.3  10001       395    9899   4257825
1199380501       0.267719    10196.0  10001       395    9899   4212512
1199380510       0.369682    10196.3  10001       395    9899   4192200
[EMAIL PROTECTED] ~]# 

As you can see, the system time appears to advance by almost exactly
0.1 seconds every 10 seconds relative to the RTC.  Then, just as they
are about to get out of phase by 1 second, something causes either the
system clock to jump back by ~1 second or the RTC to jump forward,
or so it appears.

Furthermore if I change --interval to something odd (like 17) the
delta from line to line remains about the same at 0.1 sec, which it
should not if there was a real slew occurring:

[EMAIL PROTECTED] ~]# adjtimex --utc --compare=10 --interval=17
                                      --- current ---   -- suggested --
cmos time     system-cmos  error_ppm   tick      freq    tick      freq
1199380633       0.540237
1199380649       0.642055     5989.3  10001       395
1199380665       0.743996     5996.5  10001       395    9941   4177948
1199380681       0.845918     5995.4  10001       395    9941   4250558
1199380697       0.947846     5995.8  10001       395    9941   4227580
1199380714       0.048774   -52886.6  10001       395   10530   3070784
1199380730       0.150698     5995.5  10001       395    9941   4243205
1199380746       0.252637     5996.4  10001       395    9941   4185301
1199380762       0.354556     5995.2  10001       395    9941   4261588
1199380778       0.456482     5995.6  10001       395    9941   4235852
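
The arithmetic behind that argument can be checked directly. The sketch below assumes error_ppm is simply the change in system-cmos divided by the requested interval, which is consistent with the figures in both runs. The function name is illustrative.

```python
def error_ppm(prev_offset, offset, interval):
    """Apparent drift between two system-cmos readings, in parts per million."""
    return (offset - prev_offset) / interval * 1e6


# Values copied from the first two rows of each run above.
print(round(error_ppm(0.438974, 0.540422, 10), 1))   # --interval=10
print(round(error_ppm(0.540237, 0.642055, 17), 1))   # --interval=17

# If the 10 s run showed a real slew, a 17 s interval would accumulate this much:
real_slew_step = (0.540422 - 0.438974) / 10 * 17
print(round(real_slew_step, 3), "s expected vs", round(0.642055 - 0.540237, 3), "s observed")
```

A genuine rate difference would scale with the interval and give a step of about 0.172 s at 17 seconds. The observed step stays near 0.102 s, so the reported ppm figure falls instead of staying constant.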


From my investigations, the system time is _not_ 
advancing faster than UTC.  In fact:

[EMAIL PROTECTED] ~]# ntpdate -q  montpelier.ilan.caltech.edu
server 192.12.19.20, stratum 1, offset -0.001267, delay 0.05643
 3 Jan 09:24:28 ntpdate[18831]: adjust time server 192.12.19.20 offset -0.001267 sec

so my system is currently about 1 ms ahead of the Stratum 1 server,
montpelier.ilan.caltech.edu.

This offset is nearly constant over several minutes. Also, if I
execute the command at several random times over the period of a
minute, the offset only fluctuates by 1/2 a ms or so.  

Conclusion: my system clock is not jumping around.

That leaves the RTC doing the jumping.  But having an RTC that is
running about 1% (roughly 10,000 ppm) slower than my system clock and
which jumps ahead by a second roughly every 90 seconds seems absurd.

In fact two results seem to prove that the RTC is running smoothly:

-- If I watch the seconds digit changing in the desktop BIOS (pre-boot),
   it does not change its phase relative to the system clock displayed
   on my laptop screen as I hold the latter next to the BIOS clock.
   (A 0.1-second lag every 10 seconds would be quite visible, as would
   the jump.)

-- The delta from line to line in the `adjtimex' output remains
   at ~0.1 no matter the --interval used.  This should not be the case
   if there were a true slew between the system clock and the RTC.

It seems that leaves two other possibilities: a bug in adjtimex or a
bug in the kernel.  That's where I am right now.

For reference here's a few lines of output from the laptop (running
Fedora 6, kernel 2.6.20, and being an i386 32-bit machine):

[EMAIL PROTECTED] ~]# adjtimex --utc --compare=20 --interval=17
  --- current ---   -- suggested --
cmos time system-cmos  error_ppm   tick