Hal Murray wrote:

> Spoon wrote:
>
>> I've noticed something I find very strange on the systems I have to work
>> with. Every time I reboot the computer, the clock skew of the local
>> clock changes, sometimes by what seems to be a huge amount.
>> 
>> For example, I boot the computer, let ntpd run for 12 hours, and the
>> value recorded in the drift file is 35 ppm. I reboot the computer, let
>> ntpd run for 12 hours, and I get 5 ppm...
> 
> I'm chasing the same glitch.
> 
> I've seen it on two systems, both i386 running Linux 2.6 kernel.
> 
> I think I've tracked it to tsc_init, which calls calculate_cpu_khz.
> Both are in ./arch/i386/kernel/tsc.c.
> tsc_init prints a line like this:
>   kernel: Detected 2793.226 MHz processor.
> 
> The problem is that calculate_cpu_khz doesn't return the
> same answer.  I hacked the code to call/print it 10 times
> and I get things like this:
>  kernel: Detected 2793.287 MHz processor.
>  kernel: Detected 2793.225 MHz processor.
>  kernel: Detected 2793.228 MHz processor.
>  kernel: Detected 2793.304 MHz processor.
>  kernel: Detected 2793.242 MHz processor.
>  kernel: Detected 2793.192 MHz processor.
>  kernel: Detected 2793.334 MHz processor.
>  kernel: Detected 2793.203 MHz processor.
>  kernel: Detected 2793.292 MHz processor.
>  kernel: Detected 2793.237 MHz processor.
> 
> That's a spread of about 50 ppm which matches what I've seen
> before I started looking for this glitch.

I believe you've nailed the problem.
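
As a sanity check on the "about 50 ppm" figure, here is a small C sketch
that computes the peak-to-peak spread of your ten readings relative to
their mean. This is my own throwaway code, not anything from the kernel;
spread_ppm and hal_mhz are names I made up for the illustration.

```c
#include <stddef.h>

/* Peak-to-peak spread of n frequency readings, in parts per million
 * of their mean. Assumes positive readings; returns 0.0 if n is 0. */
double spread_ppm(const double *readings, size_t n)
{
    if (n == 0)
        return 0.0;

    double min = readings[0], max = readings[0], sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (readings[i] < min)
            min = readings[i];
        if (readings[i] > max)
            max = readings[i];
        sum += readings[i];
    }
    return (max - min) / (sum / (double)n) * 1e6;
}

/* The ten "Detected ... MHz processor" values quoted above, in MHz. */
static const double hal_mhz[] = {
    2793.287, 2793.225, 2793.228, 2793.304, 2793.242,
    2793.192, 2793.334, 2793.203, 2793.292, 2793.237,
};
```

Calling spread_ppm(hal_mhz, 10) gives roughly 51 ppm (0.142 MHz out of
about 2793.25 MHz), so your estimate holds.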

I patched my kernel with:
--- tsc.c       2007-04-11 10:04:50.000000000 +0200
+++ tsc.c       2007-04-11 10:13:13.000000000 +0200
@@ -123,6 +123,7 @@
        int i;
        unsigned long flags;

+       printk("DEBUG: INSIDE calculate_cpu_khz()\n");
        local_irq_save(flags);

        /* run 3 times to ensure the cache is warm */
@@ -187,7 +188,7 @@
        if (!cpu_has_tsc || tsc_disable)
                goto out_no_tsc;

-       cpu_khz = calculate_cpu_khz();
+       cpu_khz = 1266700;
        tsc_khz = cpu_khz;

        if (!cpu_khz)

I tested the new kernel on two identical systems.

The frequency offset computed by ntpd is now consistent to within 1-2
ppm from one boot to the next. That residual dispersion could easily be
due to temperature variation, I think.
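
My reasoning for why a fixed cpu_khz helps, as a rough sketch: if the
kernel converts TSC cycles to time using an assumed frequency while the
counter really ticks at some other frequency, the clock picks up a
constant rate error. The function below is only my simplified model of
that, not kernel code; clock_error_ppm is a made-up name, and the "true"
frequency is a hypothetical value, since nobody knows it exactly.

```c
/* Apparent clock rate error, in ppm, when elapsed time is computed as
 * cycles / assumed_khz while the counter really runs at true_khz.
 * Positive means the computed clock runs fast. Simplified model only. */
double clock_error_ppm(double true_khz, double assumed_khz)
{
    return (true_khz - assumed_khz) / assumed_khz * 1e6;
}
```

Taking a hypothetical true frequency of 2793250 kHz and Hal's lowest and
highest calibration results (2793192 and 2793334 kHz) as the assumed
values, the two rate errors differ by roughly 51 ppm. With the value
hardcoded, the assumed frequency no longer changes between boots, so
only the real oscillator error is left for ntpd to track.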

Sometime next week, I'll try to understand *why* the calibration in
Linux gives a different result on every boot. I've been told to look
into SMI (System Management Interrupts) and SMM (System Management Mode).

Keep me posted if you get other interesting results.

Regards.

