Re: [ntp:questions] Q: Disabling 11 minute mode

2008-01-25 Thread Dean S. Messing

Serge Bets wrote:
  On Tuesday, January 22, 2008 at 19:02:23 +, Dean S. Messing wrote:
 
  Is it possible to disable 11 minute mode from ntp.conf?
 
 No. You have to tweak the kernel. If you have the PPSkit:
 
 | $ echo 0 > /proc/sys/kernel/time/rtc_update
 
 Otherwise you have to patch time.c in the kernel. Dead easy, just a
 matter of commenting out a line or two. I'm so patching all my kernels,
 reading and writing the RTC exclusively with hwclock 2.31, and am
 getting a far better accuracy.
 
 The main purpose of an RTC is to initialise the system time at powerup,
 isn't it? Most people startup in the morning at around half a second of
 the true time, and later ntpd has to step this to UTC. I routinely
 startup at some low milliseconds of the true time, offset quickly
 slewed. My last step event was years ago.

Thanks Serge.  I looked up PPSkit.  Looks good, but I'm going to have
to learn how to patch the Fedora kernel to install PPSkit.

But I'm discovering that I have rather deeper problems on my machine
(a Dell 490 Precision).  Using adjtimex --compare to track the drift
between system and cmos clock (ntpd not running), I see that the RTC
is behaving _very_ strangely.  It will begin to return screwy values
after several hours of doing adjtimex --compare and then get to the
point where hwclock --show hangs.  So my desire to turn off 11
minute mode when ntp is running is moot.

For your amusement, here's a snippet of the output of
adjtimex --compare with an interval of 60 seconds:

1200982902   0.001784      -2.0  10001  3929312  10001  4060301
1200982962   0.001792       0.1  10001  3929312  10001  3920719
1200983022   0.002051       4.3  10001  3929312  10001  3646240
1200983082   0.001828      -3.7  10001  3929312  10001  4173062
1200983142   0.001756      -1.2  10001  3929312  10001  4007957
1200983202   0.002025       4.5  10001  3926656  10001  3632906
1200983261   0.500370    8305.8  10001  3926288   9918  3549307
1200983281  40.001689  658355.3  10001  3926288   3418   301130
1200983341  40.001931       4.0  10001  3926288  10001  3661966
1200983407  34.001894     -10.6  10001  3926288  11001  3966652
1200983461  40.001646       5.9  10001  3926288   9001  4197121
1200983521  40.001890       4.1  10001  3926288  10001  3659882
1200983609  12.001763     -48.8  10001  3924640  14668  1878649
1200983641  40.001606      44.0  10001  3924640   5334  6280787
1200983741   0.001726     -64.7  10001  3924640  16668  1609118
1200983761  40.001911      69.8  10001  3924640   3334  5907090
1200983821  40.001553      -6.0  10001  3924640  10001  4315525
1200983921   0.001748     -63.4  10001  3924640  16668  1527086
1200983941  40.001894      69.1  10001  3924640   3334  5949798
1200984001  40.001554      -5.7  10001  3924640  10001  4295994
1200984101   0.001700     -64.2  10001  3921488  16668  1577580
1200984161   0.001291      -6.8  10001  3921104  10001  4367718
1200984221   0.001532       4.0  10001  3921104  10001  3657823
1200984275   6.001806      14.6  10001  3921104   9001  3621886
1200984301  40.001722      55.3  10001  3920368   4334  6196568
1200984361  40.001974       4.2  10001  3920368  10001  3645108
1200984427  34.001868     -11.8  10001  3920368  11001  4036253
1200984481  40.001679       6.8  10001  3920368   9001  4126878
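
A rough cross-check of the snippet above, as a minimal sketch.  It
assumes each row starts with a 10-digit cmos time followed by the
system-cmos difference (that reading of the run-together columns is an
assumption, as are the helper names).  Adding the two columns gives the
system time implied by each row, so the spacing of the implied system
time can be compared with the spacing of the raw CMOS readings:

```python
# (cmos time, system-cmos) pairs read from four consecutive rows of the
# snippet, under the column-splitting assumption stated above.
rows = [
    (1200983341, 40.001931),
    (1200983407, 34.001894),
    (1200983461, 40.001646),
    (1200983521, 40.001890),
]


def implied_system_times(pairs):
    """System time implied by each row: cmos time plus system-cmos."""
    return [cmos + diff for cmos, diff in pairs]


def steps(values):
    """Differences between consecutive values, rounded to milliseconds."""
    return [round(b - a, 3) for a, b in zip(values, values[1:])]


system_steps = steps(implied_system_times(rows))
cmos_steps = steps([cmos for cmos, _ in rows])

print("implied system time steps:", system_steps)
print("raw CMOS reading steps:   ", cmos_steps)
```

For these four rows the implied system time moves in even 60-second
steps while the CMOS readings do not, which is consistent with the
jumps coming from the RTC side rather than from the system clock.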

Things got so bad that the output eventually became:

199345540  1001658696.064552   1592732.9  10001  3879376  -5926  1725431
199345717  1001658600.500585  -1592732.8  10001  3879376  25928  6027853
199345718  1001658696.023830   1592054.1  10001  3879376  -5919   335126
199345896  1001658600.500586  -1592054.1  10001  3879376  25922   868985
199345897  1001658696.045414   1592413.8  10001  3879376  -5923  2975047


Before it went crazy, it had run smoothly for 5 or 6 hours.
When I rebooted into the BIOS and looked at the RTC it was off by
several years.

This has now happened thrice, but only when adjtimex is running in the
compare mode for long periods.  I have no idea what this means.  The
cmos battery does not appear to be the problem since, after a reboot,
the RTC remains at proper time indefinitely (modulo drift), unless
and until I run adjtimex --compare for several hours.

Anyway, thanks for the info on 11 minute mode.  Wish I could fix my
RTC problem.

Dean
___
questions mailing list
questions@lists.ntp.org
https://lists.ntp.org/mailman/listinfo/questions


[ntp:questions] Q: Disabling 11 minute mode

2008-01-22 Thread Dean S. Messing

Is it possible to disable 11 minute mode from ntp.conf?
I've tried the `disable kernel' command, but that appears only to
change the way time discipline is maintained and does nothing for
11 minute mode.

If using ntp.conf is not the way, what is?

Thanks for your help.
Dean


Re: [ntp:questions] Q: Disabling 11 minute mode

2008-01-22 Thread Dean S. Messing

Richard B. Gilbert wrote:
 Dean S. Messing wrote:
  Is it possible to disable 11 minute mode from ntp.conf?
  I've tried using the command disable kernel but that
  appears to change the way time discipline is maintained, but
  does nothing for 11 minute mode.
  
  If using ntp.conf is not the way, what is?
  
  Thanks for your help.
  Dean
 
 What IS 11 minute mode??

Oops.  Sorry!  I thought everyone reading this list
(who could answer my question :-) would know.

David Woolley gave you a good answer already so
I'll only add that if you want to see if you
are in 11 minute mode, do adjtimex -p and
look at the status: value. If it's odd (LSB==1),
then your kernel is in 11 minute mode.
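
For reading the status value by hand, here is a minimal sketch that
decodes the two bits that come up in this thread.  The bit values are
the STA_PLL and STA_UNSYNC constants from the Linux <sys/timex.h>
header; the function name and the choice of which bits to show are
assumptions made for illustration, and the sketch makes no claim of its
own about which bit controls 11 minute mode:

```python
# Bit values from the Linux <sys/timex.h> header.
STA_PLL = 0x0001     # low bit: kernel PLL updates enabled
STA_UNSYNC = 0x0040  # clock flagged as unsynchronized (decimal 64)


def describe_status(status):
    """Report the two status bits discussed in this thread."""
    return {
        "odd (STA_PLL set)": bool(status & STA_PLL),
        "STA_UNSYNC set": bool(status & STA_UNSYNC),
    }


if __name__ == "__main__":
    # 64 is the value passed to adjtimex -S later in this message.
    for value in (1, 64, 65):
        print(value, describe_status(value))
```

The "odd" test above corresponds to the first entry; the value 64 used
with adjtimex -S sets only the second.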

Now, if someone would tell me how to disable it
(short of hacking time.c) I'd be most thankful.

I tried turning it off with adjtimex -S 64, but
ntp changes it back again in a few minutes.

I'd like to disable it, but keep ntp kernel
discipline so I can do some analysis of my RTC.

Dean


Re: [ntp:questions] quirky adjtimex behaviour [SOLVED]

2008-01-21 Thread Dean S. Messing

hal-usenet wrote:
 Dean Messing wrote:
 I am seeing strange behaviour on my _x86_64 Fedora 7 desktop
 workstation with regard to the system-cmos time that `adjtimex'
 reports.
 <snip>
 
 It seems that leaves two other possibilities: a bug in adjtimex or a
 bug in the kernel.  That's where I am right now.
 
 My guess is that the system/kernel is working correctly and that
 the adjtimex utility is printing out misleading stuff.
 
 The CMOS/hardware clock only returns the time to the nearest second.
 I think that would cause quirks like this if the code has a loop that
 does a bit of work and then sleeps for N seconds.  If the bit of work
 takes 0.1 second, the time when the CMOS clock is read will drift
 by 0.1 second each time around the loop.
 
 If you want to play and you can find the source, try changing
 the code that reads the CMOS clock to spin in a loop reading
 it until it changes.  That will give you the time early in the
 second.

Your guess is right, Hal.
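
Hal's explanation can be sketched as a small simulation.  This is only
an illustration under stated assumptions: both clocks are treated as
perfect, the CMOS clock is readable only to the whole second, and the
function name, the 0.44 s starting phase, and the 0.1 s of per-pass
work are made up for the example.  It is not the adjtimex source:

```python
import math


def simulate_compare(interval_s, work_s, samples, start=0.44,
                     wait_for_tick=False):
    """Return the system-cmos values a polling loop would report.

    Both clocks are assumed perfect; the CMOS clock can only be read
    to the whole second.
    """
    t = start                        # true time at the first reading
    diffs = []
    for _ in range(samples):
        if wait_for_tick:
            t = float(math.ceil(t))  # spin until the CMOS second changes
        cmos = math.floor(t)         # CMOS reading, whole seconds only
        diffs.append(round(t - cmos, 3))
        t += interval_s + work_s     # sleep, plus the per-pass work
    return diffs


naive = simulate_compare(10, 0.1, 8)
synced = simulate_compare(10, 0.1, 8, wait_for_tick=True)
print("naive: ", naive)
print("synced:", synced)
```

With no real drift at all, the naive loop shows a difference that grows
by 0.1 s per pass and then wraps, much like the output in the original
post, while waiting for the second to change keeps it constant.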

It's been nearly three weeks since I've had a few minutes to further
pursue this.  I just replaced version 1.23 of adjtimex with an old
version 1.20 and the quirky behaviour disappeared.  I first noticed it
on my new Fedora 7 with version 1.21.

When I looked on the adjtimex site I saw it was up to 1.23 so I
thought that surely this problem has been detected and fixed.  When it
didn't go away in 1.23 I looked elsewhere: 64 bit machine, new kernel,
etc.

I'll write the author and report the bug.  I'm really surprised nobody
has reported it already.

Dean 


Re: [ntp:questions] quirky adjtimex behaviour

2008-01-04 Thread Dean S. Messing

Hi Jan, all.

Jan Ceuleers wrote:
 Dean S. Messing wrote:
  I am seeing strange behaviour on my _x86_64 Fedora 7 desktop
  workstation with regard to the system-cmos time that `adjtimex'
  reports.
 
 I've not read your whole post; it's clear that you've been wrestling 
 with this problem for a while and have done quite a bit of work already.

Well, I've done what I can but I'm really no expert on this stuff.
That's why I wrote to this list, which seems to be populated by _many_
very knowledgeable people.

 Can I however suggest that you first try and eliminate CPU frequency 
 scaling as a cause of the symptoms you're seeing: use cpufreq-set -g to 
 select a policy that results in a constant CPU frequency and then check 
 if this changes the behaviour (or renders it more predictable).

I installed the cpufreq-utils package.
The result of `cpufreq-info' is:

[EMAIL PROTECTED] ~]# cpufreq-info
cpufrequtils 002: cpufreq-info (C) Dominik Brodowski 2004-2006
Report errors and bugs to [EMAIL PROTECTED], please.
analyzing CPU 0:
  no or unknown cpufreq driver is active on this CPU
analyzing CPU 1:
  no or unknown cpufreq driver is active on this CPU
analyzing CPU 2:
  no or unknown cpufreq driver is active on this CPU
analyzing CPU 3:
  no or unknown cpufreq driver is active on this CPU

Also /sys/devices/system/cpu/cpu{0,1,2,3}/cpufreq/ does not exist on
this system.  I don't know much about cpufreq adjustments.  Should I
be looking elsewhere?  Note that this is a desktop workstation.  Will
the cpufreq (actually there are four CPUs in two dual-core units)
change on such a machine?
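
The check for the cpufreq directories can be scripted.  A minimal
sketch, assuming the sysfs layout named above; the function name and
the base parameter are hypothetical and exist only so the check can be
pointed at another directory:

```python
from pathlib import Path


def cpus_with_cpufreq(base="/sys/devices/system/cpu"):
    """Return names of cpuN directories that contain a cpufreq subdirectory."""
    root = Path(base)
    if not root.is_dir():
        return []
    return sorted(
        p.name for p in root.glob("cpu[0-9]*") if (p / "cpufreq").is_dir()
    )


if __name__ == "__main__":
    found = cpus_with_cpufreq()
    print(found if found else "no cpufreq directories found")
```

An empty result matches what cpufreq-info reports here: no cpufreq
driver is active on any CPU.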

If you or others wouldn't mind reading my whole original post (it's
not _that_ long :-) maybe some other ideas might occur.  Thanks.




[ntp:questions] quirky adjtimex behaviour

2008-01-03 Thread Dean S. Messing

First, apologies if this is the wrong list for this.
Please direct me to the right place if it is.

I am seeing strange behaviour on my _x86_64 Fedora 7 desktop
workstation with regard to the system-cmos time that `adjtimex'
reports.

Below is an illustration:

[EMAIL PROTECTED] ~]# adjtimex --utc --compare=20 --interval=10
                                   --- current --- -- suggested --
cmos time   system-cmos  error_ppm   tick     freq   tick     freq
1199380337     0.438974
1199380346     0.540422    10144.8  10001      395
1199380355     0.642382    10196.0  10001      395   9899  4212512
1199380364     0.744336    10195.4  10001      395   9899  4251575
1199380373     0.845293    10095.7  10001      395   9900  4230787
1199380382     0.946259    10096.6  10001      395   9900  4172975
1199380392     0.047206   -89905.3  10001      395  10900  4297975
1199380401     0.149166    10196.0  10001      395   9899  4210950
1199380410     0.251134    10196.8  10001      395   9899  4160950
1199380419     0.353096    10196.2  10001      395   9899  4198450
1199380428     0.455055    10195.9  10001      395   9899  4218762
1199380437     0.557004    10194.9  10001      395   9899  4284387
1199380446     0.658973    10196.9  10001      395   9899  4153137
1199380455     0.759925    10095.2  10001      395   9900  4265162
1199380464     0.860889    10096.4  10001      395   9900  4185475
1199380473     0.961848    10095.9  10001      395   9900  4218287
1199380483     0.063806   -89804.2  10001      395  10899  4225012
1199380492     0.165759    10195.3  10001      395   9899  4257825
1199380501     0.267719    10196.0  10001      395   9899  4212512
1199380510     0.369682    10196.3  10001      395   9899  4192200
[EMAIL PROTECTED] ~]# 

As you can see, the system time appears to advance by almost exactly
0.1 seconds every 10 seconds relative to the RTC.  Then, just as they
are about to get out of phase by 1 second, something causes either the
system clock to jump back by ~1 second or the RTC to jump forward,
or so it appears.

Furthermore, if I change --interval to something odd (like 17) the
delta from line to line remains about the same at 0.1 sec, which it
should not if a real slew were occurring:

[EMAIL PROTECTED] ~]# adjtimex --utc --compare=10 --interval=17
                                   --- current --- -- suggested --
cmos time   system-cmos  error_ppm   tick     freq   tick     freq
1199380633     0.540237
1199380649     0.642055     5989.3  10001      395
1199380665     0.743996     5996.5  10001      395   9941  4177948
1199380681     0.845918     5995.4  10001      395   9941  4250558
1199380697     0.947846     5995.8  10001      395   9941  4227580
1199380714     0.048774   -52886.6  10001      395  10530  3070784
1199380730     0.150698     5995.5  10001      395   9941  4243205
1199380746     0.252637     5996.4  10001      395   9941  4185301
1199380762     0.354556     5995.2  10001      395   9941  4261588
1199380778     0.456482     5995.6  10001      395   9941  4235852
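
The arithmetic behind that argument can be written out.  A minimal
sketch, assuming error_ppm is simply the change in system-cmos divided
by the interval (that formula and the names below are assumptions; it
does match the error_ppm values printed in the corresponding rows):

```python
def apparent_ppm(delta_s, interval_s):
    """Rate error (ppm) implied by a change of delta_s over interval_s."""
    return delta_s / interval_s * 1e6


# Consecutive system-cmos readings taken from the two runs above.
step_10 = 0.642382 - 0.540422    # run with --interval=10
step_17 = 0.743996 - 0.642055    # run with --interval=17

ppm_10 = apparent_ppm(step_10, 10)
ppm_17 = apparent_ppm(step_17, 17)
print(f"apparent error: {ppm_10:.1f} ppm at 10 s, {ppm_17:.1f} ppm at 17 s")

# If the 10 s figure were a genuine rate difference, the step over 17 s
# would scale with the interval instead of staying near 0.1 s.
expected_step_17 = ppm_10 * 1e-6 * 17
print(f"step over 17 s: expected {expected_step_17:.3f} s, "
      f"observed {step_17:.3f} s")
```

A real rate difference would give the same ppm figure at both
intervals; here the step stays near 0.1 s and the ppm figure changes
with the interval instead.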


From my investigations, the system time is _not_ 
advancing faster than UTC.  In fact:

[EMAIL PROTECTED] ~]# ntpdate -q  montpelier.ilan.caltech.edu
server 192.12.19.20, stratum 1, offset -0.001267, delay 0.05643
 3 Jan 09:24:28 ntpdate[18831]: adjust time server 192.12.19.20 offset -0.001267 sec

so my system is currently about 1 ms ahead of the Stratum 1 server,
montpelier.ilan.caltech.edu.

This offset is nearly constant over several minutes. Also, if I
execute the command at several random times over the period of a
minute, the offset only fluctuates by 1/2 a ms or so.  

Conclusion: my system clock is not jumping around.

That leaves the RTC doing the jumping.  But having an RTC that is
running nearly 1% (about 10,000 ppm) slower than my system clock and
which jumps ahead by a second roughly every 100 seconds seems absurd.

In fact two results seem to prove that the RTC is running smoothly:

-- If I look at the seconds digit changing w/in the desktop BIOS (pre-boot),
   it does not change its relative phase w.r.t. the displayed system
   clock on my laptop screen as I hold the latter next to the BIOS clock.
   (A lag of 10% of a second would be quite visible, as would the jump.)

-- The delta from line to line in the `adjtimex' output remains
   at ~0.1 no matter the --interval used.  This should not be the case
   if there were a true slew between the system clock and RTC.

It seems that leaves two other possibilities: a bug in adjtimex or a
bug in the kernel.  That's where I am right now.

For reference, here are a few lines of output from the laptop (running
Fedora 6, kernel 2.6.20, and being an i386 32-bit machine):

[EMAIL PROTECTED] ~]# adjtimex --utc --compare=20 --interval=17
  --- current ---   -- suggested --
cmos time system-cmos  error_ppm   tick