Re: PATCH: Assume PM Timer to be reliable on broken board/BIOS

2005-08-10 Thread Olivier Fourdan
On 7/27/05, Robert Hancock <[EMAIL PROTECTED]> wrote:
> > In a nutshell, sometimes, the PIT/TSC timer runs 3x too fast [1]. That
> > causes many issues, including DMA errors, MCE, and clock running way too
> > fast (making the laptop unusable for any software development). So far,
> > no BIOS update was able to fix the issue for me.
> 
> Shouldn't this be looked into further rather than adding this
> workaround? Surely Windows is using the PIT as well, so there must be
> some way to get it to behave properly..

Sorry for the late follow-up. Well, the timer management in Windows
depends on the HAL used. By default, the ACPI HAL is used on
this laptop.

I reinstalled Windows XP, forcing the "Standard PC" HAL during
installation, and without ACPI Windows exhibits the exact same
problem as Linux or any other system: the clock runs 3 times too fast
in Windows too...

So my guess is that the ACPI HAL in Windows does more or less the same
thing as my patch (updated, available here:
http://www.xfce.org/~olivier/r3000): it calibrates the PIT timer based
on the ACPI (PM) timer.

Cheers,
Olivier.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Time Flies (Twice as Fast)

2005-07-28 Thread Olivier Fourdan

Kurt

Did you try with the "no_timer_check" boot option?

HTH
Olivier.

On Thu, 2005-07-28 at 22:03 -0400, Kurt Wall wrote:
> Hola,
> 
> I have an eMachines T6212 Opteron system on which the system clock
> seems to run at ~twice the speed of the wall clock. The main board
> is an ASUS K8 of some description with an ATI SB400 southbridge and
> an ATI RS480 northbridge. Kernel version is 2.6.12.3.
> 
> If I disable ACPI, the clock slows down to what seems to be the proper
> speed, but then my NIC doesn't work, presumably because it shares
> an interrupt with something else.
> 
> I've tried booting with clock=tsc and clock=pit to no effect. Based
> on my review of the list archives, there appear to be issues with
> the chipset, but I haven't been able to sort out what the real problem
> is and the appropriate solution.
> 
> There's an ACPI error that seems potentially troublesome:
> 
> ACPI: Subsystem revision 20050309
> ACPI-0352: *** Error: Looking up [\_SB_.PCI0.LPC0.LNK0] in namespace, 
> AE_NOT_FOUND
> search_node 81001fec9440 start_node 81001fec9440 return_node 
> 
> 
> I also see this message from the PCI subsystem:
> 
> PCI: Ignoring BAR0-3 of IDE controller :00:14.1
> 
> As a starting point, I've attached lspci output and the boot log. I'm
> willing to provide more information and try patches and such.
> 
> Thanks.
> 
> Kurt
> 




Re: PATCH: Assume PM Timer to be reliable on broken board/BIOS

2005-07-26 Thread Olivier Fourdan
On Tue, 2005-07-26 at 17:34 -0600, Robert Hancock wrote:
> > In a nutshell, sometimes, the PIT/TSC timer runs 3x too fast [1]. That
> > causes many issues, including DMA errors, MCE, and clock running way too
> > fast (making the laptop unusable for any software development). So far,
> > no BIOS update was able to fix the issue for me.
> 
> Shouldn't this be looked into further rather than adding this 
> workaround? Surely Windows is using the PIT as well, so there must be 
> some way to get it to behave properly..

Surely, but I've been desperately trying to find the cause without
success for months.

My first idea was that the BIOS doesn't set the CPU voltage properly at
boot, so I made up a patch that sets the right fid/vid before any
calibration, but that didn't help.

The BIOS is wrong (i.e. the BIOS reports 1/3 of the actual CPU speed),
and memtest86+, which doesn't use any ACPI or whatever, reports the wrong
time too, so it's definitely not a Linux bug.

My guess is that Windows reinitializes some register, but it's hard to
tell.

Cheers,
Olivier.




PATCH: Assume PM Timer to be reliable on broken board/BIOS

2005-07-26 Thread Olivier Fourdan
Hi all,


Background
==========

I have a laptop (Compaq R3480EA, AMD 64 3400+ with NForce3) and reported
multiple problems related to timer issues.

In a nutshell, sometimes, the PIT/TSC timer runs 3x too fast [1]. That
causes many issues, including DMA errors, MCE, and clock running way too
fast (making the laptop unusable for any software development). So far,
no BIOS update was able to fix the issue for me.

As I first reported to the LKML back in March [2], the only reliable
time source on this laptop seems to be the PM timer. However, the time
in Linux is tick based and forcing the PM timer doesn't help.

Also, because the PIT timer is used to calibrate the lpj (loops per
jiffy), the wrong lpj was causing the nasty errors I had with DMA and
MCE. Although the lpj can be forced at boot, having it right in the
first place, even on hardware as broken as my laptop, can save novice
users quite a lot of time and investigation.

Many similar reports can be found on the web for the Compaq R3000 and HP
zv5000 laptops, either with 64 or 32 bit CPU [3]. Similar bug reports
with no fix can also be found in SuSE and Red Hat bugzilla databases.

What the patch does
===================

Basically, the patch adjusts the values passed to the PIT/TSC based on
the PM timer rate.

The PM timer is compared to the TSC/PIT rate and a multiplier is
computed. On a "normal" system, the ratio is 1. On my broken laptop, the
ratio is 3.

That ratio is then applied to all values passed to the PIT timer.

For example, instead of using:

   outb_p(LATCH & 0xff, PIT_CH0);
   outb(LATCH >> 8, PIT_CH0);

The patch uses:

   outb_p((LATCH * timer_mult) & 0xff, PIT_CH0);
   outb((LATCH * timer_mult) >> 8, PIT_CH0);
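
To illustrate the two writes above, here is a minimal sketch (not part of the patch) of how a scaled reload value splits into the low and high bytes sent to PIT channel 0. The helper name pit_scaled_latch is hypothetical. The value LATCH = 1193 used in the note below assumes HZ=1000 and the usual 1193182 Hz PIT clock. The range check is an addition of this sketch: the PIT counter is 16 bits wide, so a scaled value above 0xffff cannot be programmed.

```c
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only: compute the two bytes that
 * would be written to PIT channel 0 for a reload value of latch * mult.
 * Returns 0 on success, -1 if the scaled value is zero or does not fit
 * the 16-bit PIT counter.
 */
int pit_scaled_latch(uint32_t latch, uint16_t mult, uint8_t *lo, uint8_t *hi)
{
    uint32_t scaled = latch * (uint32_t)mult;

    if (scaled == 0 || scaled > 0xffff)
        return -1;

    *lo = (uint8_t)(scaled & 0xff);   /* first write: low byte  */
    *hi = (uint8_t)(scaled >> 8);     /* second write: high byte */
    return 0;
}
```

With latch = 1193 and mult = 3 the scaled value is 3579 (0x0dfb), so the low byte is 0xfb and the high byte is 0x0d.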

Also, the ratio is computed/used only if the user has specified the
"clock=pmtmr" boot option on i386 or "pmtmr" on x86_64. If the user has
not explicitly asked for the PM timer to be used, and if there is a
delta of more than 5% between the PM timer and the PIT, then the PM
timer is not used (just like it is in the current implementation for
i386 arch).
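
The decision described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the patch: the function name pm_timer_mult, its parameters, and the rounding of the ratio are all hypothetical. It takes the number of PM timer ticks counted during an interval timed by the PIT, and the number of ticks that interval should contain if the PIT were correct.

```c
#include <stdint.h>

/*
 * Hypothetical sketch of the policy described above.
 *   pm_ticks: PM timer ticks counted over a PIT-timed interval
 *   expected: PM timer ticks that interval should contain
 *   forced:   nonzero if the user explicitly asked for the PM timer
 * Returns the multiplier to apply to PIT values (1 on a sane system),
 * or 0 if the PM timer should not be used at all.
 */
unsigned int pm_timer_mult(uint32_t pm_ticks, uint32_t expected, int forced)
{
    uint64_t scaled = (uint64_t)pm_ticks * 100;
    uint32_t mult;

    if (pm_ticks == 0 || expected == 0)
        return 0;

    /* Within 5% of the expected rate: nothing to correct. */
    if (scaled >= (uint64_t)expected * 95 &&
        scaled <= (uint64_t)expected * 105)
        return 1;

    /* Out of tolerance and not requested by the user: reject. */
    if (!forced)
        return 0;

    /* Trust the PM timer: rounded ratio of expected to measured ticks. */
    mult = (expected + pm_ticks / 2) / pm_ticks;
    return mult ? mult : 1;
}
```

For example, the PM timer runs at 3.579545 MHz, so a correct 10 ms interval holds about 35795 ticks. If the PIT runs 3x too fast, the interval is cut short and only about 11932 ticks are counted, which gives a multiplier of 3 when the PM timer is forced and a rejection otherwise.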

What is included in the patch
=============================

The patch includes the code that implements the workaround described
above for x86_64 and i386 arch.

The patch applies to Linux 2.6.12.3.

Documentation is also updated.
==


Please let me know if there are some fixes or improvements to add and if
such a patch could be suitable in the kernel.

As a side note, this patch is very useful for me as it makes the laptop
usable under Linux and I plan to keep it available somewhere on xfce.org
so that other Compaq R3000 and HP zv5000 owners can use it.

Ref.

[1] http://kerneltrap.org/mailarchive/1/message/43741/thread
[2] http://lkml.org/lkml/2005/3/29/265
[3] http://lists.pcxperience.com/pipermail/linuxr3000/2004-September/003678.html
    http://lists.pcxperience.com/pipermail/linuxr3000/2004-September/003788.html
    http://lists.pcxperience.com/pipermail/linuxr3000/2005-July/006763.html
    http://lists.pcxperience.com/pipermail/linuxr3000/2005-January/004650.html

Thanks,
Regards,
Olivier.

diff -Naur linux-2.6.12.3/arch/i386/kernel/time.c linux-2.6.12.3-pmtimer/arch/i386/kernel/time.c
--- linux-2.6.12.3/arch/i386/kernel/time.c	2005-06-17 21:48:29.0 +0200
+++ linux-2.6.12.3-pmtimer/arch/i386/kernel/time.c	2005-07-26 22:30:52.0 +0200
@@ -77,6 +77,12 @@
 
 EXPORT_SYMBOL(jiffies_64);
 
+/* 
+ * timer_mult is a multiplier used to work around some very buggy BIOS
+ * or hardware where the PIT/TSC timer runs n times too fast.
+ */
+u16 timer_mult = 1;
+
 unsigned long cpu_khz;	/* Detected as we calibrate the TSC */
 
 extern unsigned long wall_jiffies;
diff -Naur linux-2.6.12.3/arch/i386/kernel/timers/timer_cyclone.c linux-2.6.12.3-pmtimer/arch/i386/kernel/timers/timer_cyclone.c
--- linux-2.6.12.3/arch/i386/kernel/timers/timer_cyclone.c	2005-06-17 21:48:29.0 +0200
+++ linux-2.6.12.3-pmtimer/arch/i386/kernel/timers/timer_cyclone.c	2005-07-26 22:52:24.0 +0200
@@ -21,6 +21,12 @@
 
 extern spinlock_t i8253_lock;
 
+/* 
+ * timer_mult is a multiplier used to work around some very buggy BIOS
+ * or hardware where the PIT/TSC timer runs n times too fast.
+ */
+extern u16 timer_mult;
+
 /* Number of usecs that the last interrupt was delayed */
 static int delay_at_last_interrupt;
 
@@ -70,8 +76,8 @@
 	 */
 	if (count > LATCH) {
 		outb_p(0x34, PIT_MODE);
-		outb_p(LATCH & 0xff, PIT_CH0);
-		outb(LATCH >> 8, PIT_CH0);
+		outb_p((LATCH * timer_mult) & 0xff, PIT_CH0);
+		outb((LATCH * timer_mult) >> 8, PIT_CH0);
 		count = LATCH - 1;
 	}
 	spin_unlock(&i8253_lock);
diff -Naur linux-2.6.12.3/arch/i386/kernel/timers/timer_pit.c linux-2.6.12.3-pmtimer/arch/i386/kernel/timers/timer_pit.c
--- linux-2.6.12.3/arch/i386/kernel/timers/timer_pit.c	2005-06-17 21:48:29.0 +0200
+++ linux-2.6.12.3-pmtimer/arch/i386/kernel/timers/timer_pit.c	

Is it possible to "reset" the processor to a sane state at boot?

2005-04-10 Thread Olivier Fourdan
Hi,

Sorry if this post sounds a bit off topic now. It seems I've narrowed
down the issue with the timer running too fast on my AMD 64 based Compaq
laptop.

As said previously, after a cold restart, the system runs 3x too fast.
The processor speed as reported by both the Linux kernel and memtest86
is 266MHz while the lowest speed is actually 800MHz (1).
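
The 266MHz figure is what one would expect if the CPU speed is calibrated against a reference timer that runs 3x too fast. A minimal sketch of that arithmetic, assuming a hypothetical 50 ms calibration window (the function name calibrated_khz and the window length are illustrative, not taken from the kernel or memtest86):

```c
#include <stdint.h>

/*
 * Illustration only: CPU frequency in kHz as seen by a calibration loop
 * that counts TSC cycles while a reference timer claims that interval_us
 * microseconds have elapsed.
 */
uint32_t calibrated_khz(uint64_t tsc_cycles, uint32_t interval_us)
{
    if (interval_us == 0)
        return 0;
    /* cycles per millisecond == kHz */
    return (uint32_t)(tsc_cycles * 1000 / interval_us);
}
```

At a real 800MHz, a true 50 ms window contains 40,000,000 cycles, which yields 800000 kHz. If the reference timer runs 3x too fast, the "50 ms" window really lasts about 16.7 ms and contains only about 13,333,333 cycles, which yields 266666 kHz.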

Even the BIOS shows that problem: instead of reporting the correct
800MHz speed for the CPU (like it does normally when the system is
fine), it shows "???MHz" at boot. So it's probably a hardware or
a BIOS issue (or both).

What puzzles me is that it doesn't make any difference for WinXP.
Everything works just fine in WinXP (2). So I wonder, is there a way to
"reset" the processor to a sane state? If such a workaround is doable,
could someone point me to where I should look?

Thanks in advance

Olivier


(1) memtest86 uses "rdtsc" to compute cpu speed.
(2) The laptop came preloaded with WinXP and it runs fine with it, so I
guess that from a "support" point of view, the system is fine.





Re: Clock 3x too fast on AMD64 laptop [WAS Re: Various issues after rebooting]

2005-03-31 Thread Olivier Fourdan
Hi John, Dominik,


On Tue, 2005-03-29 at 14:11 -0800, john stultz wrote:
> Yea. From your description this is most likely the cause of the issue.
> Currently the time of day is still tick-based, using the tsc/pmtmr/hpet
> only for interpolating between ticks. 

Sorry for the late follow-up. Unfortunately, a quick hack to disable the
"pmtmr" check shows that even when "trusting" the PM-Timer, the clock
and interrupts still run 3x too fast. That makes no difference.
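
A simplified model of what John describes may help show why trusting the PM-Timer changes nothing here. This is a sketch under assumptions, not the kernel's code: HZ=1000 is assumed, and the function name tod_usec and the clamping of the interpolated offset are illustrative.

```c
#include <stdint.h>

#define HZ            1000u
#define USEC_PER_TICK (1000000u / HZ)

/*
 * Simplified model of tick-based time of day:
 *   ticks:     number of timer interrupts counted so far
 *   offset_us: time since the last tick, read from tsc/pmtmr/hpet
 * The fine-grained source only fills in the gap between two ticks, so
 * its contribution is bounded by one tick period.
 */
uint64_t tod_usec(uint64_t ticks, uint32_t offset_us)
{
    if (offset_us >= USEC_PER_TICK)
        offset_us = USEC_PER_TICK - 1;
    return ticks * USEC_PER_TICK + offset_us;
}
```

In this model, one real second on a sane system delivers 1000 ticks and the clock advances by 1 s. If the PIT fires 3x too often, 3000 ticks arrive in the same real second and the clock advances by 3 s, whichever source is used for the offset.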

> Well, if you tried the time of day re-work I've been working on it would
> mask the issue somewhat, but you'd still have the problem that you are
> taking too many timer interrupts.

Where could I get that patch from? I'd be glad to do some testing for
you if you need it.

> One thing you could try is playing with the CLOCK_TICK_RATE value to see
> if you just have very unique hardware. 

Problem is that the issue shows exactly after one quick power off/power
on sequence. It doesn't show after a real cold start (leaving the laptop
off for a couple of hours) or even after a reboot.

> A similar sounding issue has also been reported here:
> http://bugme.osdl.org/show_bug.cgi?id=3927

Not sure if that's the exact same problem. What I can say, after reading
that bug report, is that disabling ACPI and/or APIC makes no difference.
Specifying the clock=... makes no difference either. It doesn't seem
related to the AMD64 part of the kernel since it shows equally when
using a 64bit kernel and a 32bit kernel.

Moreover, when that bug shows, there are other different problems
showing (such as the cdrom not being able to mount anything, or
ndiswrapper crashing the system with an MCE error).

At first, I thought the issue might be related to the nforce3, but the
bug refers to an ATI chipset so I guess it's not related to the nforce.

Anyway, it doesn't seem to be an uncommon issue with AMD64 based
hardware. I don't know where to start from though.

Cheers,
Olivier.





Re: Clock 3x too fast on AMD64 laptop [WAS Re: Various issues after rebooting]

2005-03-29 Thread Olivier Fourdan
Hi,

A quick look at the source shows that the error is triggered in
arch/i386/kernel/timers/timer_pm.c by the verify_pmtmr_rate() function.

My guess is that the pmtmr timer is right and the pit is wrong in my
case. That would explain why the clock is wrong when being based on pit
(like when forced with "clock=pit").

Maybe, if I can prove my guesses, a fix could be to "trust" the pmtmr
clock when the user has passed a "clock=pmtmr" argument? Does that make
any sense?

TIA
Olivier.



On Tue, 2005-03-29 at 23:28 +0200, Olivier Fourdan wrote:
> Hi all
> 
> Following my own thread, I found the following error in dmesg:
> 
> PM-Timer running at invalid rate: 33% of normal - aborting.
> 
> I found that interesting because 33% is 1/3 and the clock runs exactly
> 3x faster than normal...
> 
> A bit of search on google gave me several links to posts from other
> people with the exact same problem on similar hardware (AMD64 laptop)
> but I couldn't find neither the cause nor the fix of that issue (as I
> think it might be related to the other issues I observe when the clock
> goes too fast)
> 
> Does that PM-Timer message makes sense to someone knowledgeable?
> 
> Thanks in advance,
> 
> Cheers,
> Olivier.
> 
> On Mon, 2005-03-28 at 21:39 +0200, Willy Tarreau wrote:
> > On Mon, Mar 28, 2005 at 09:30:26PM +0200, Olivier Fourdan wrote:
> > > Hi Willy
> > > 
> > > On Mon, 2005-03-28 at 21:20 +0200, Willy Tarreau wrote:
> > > > Now I have a compaq (nc8000) which does not exhibit such buggy behaviour,
> > > > but you can try disabling the APIC too just in case it's a similar problem
> > > > (at least in 32 bits, I don't know if you can disable it in 64 bits mode).
> > > 
> > > Thanks for the hint, but unfortunately, it's one of the first things I
> > > tried, and that makes no difference.
> > 
> > Sorry, at first I only noticed ACPI in your mail, but after reading it
> > again, I also noticed APIC. So now, you can only try not to initialize
> > some peripherals (IDE, network, display, etc...) by removing their drivers
> > from the kernel. You may end up with a kernel panic, but that does not
> > matter if you boot it with "panic=5" so that it automatically reboots
> > 5 seconds after the panic. You should then finally identify the subsystem
> > which is responsible for your problems. Perhaps you'll even need to remove
> > PCI support :-(
> > 
> > Regards,
> > Willy
> > 
> > 
> 
> 


Clock 3x too fast on AMD64 laptop [WAS Re: Various issues after rebooting]

2005-03-29 Thread Olivier Fourdan
Hi all

Following my own thread, I found the following error in dmesg:

PM-Timer running at invalid rate: 33% of normal - aborting.

I found that interesting because 33% is 1/3 and the clock runs exactly
3x faster than normal...
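
The 33% figure is consistent with the PM timer being measured over an interval timed by a PIT that runs 3x too fast. A small sketch of that arithmetic, where the helper name pmtmr_percent_of_normal and the 10 ms window are assumptions made for illustration rather than the kernel's actual check:

```c
#include <stdint.h>

/*
 * Illustration only: PM timer rate as a percentage of the expected rate,
 * given the ticks counted over a PIT-timed interval and the ticks that
 * interval should contain.
 */
unsigned int pmtmr_percent_of_normal(uint32_t measured_ticks,
                                     uint32_t expected_ticks)
{
    if (expected_ticks == 0)
        return 0;
    return (unsigned int)((uint64_t)measured_ticks * 100 / expected_ticks);
}
```

The PM timer ticks at 3.579545 MHz, so a correct 10 ms window holds about 35795 ticks. If the PIT ends the window after only a third of that time, about 11932 ticks are counted, and 11932 * 100 / 35795 gives 33.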

A bit of searching on Google gave me several links to posts from other
people with the exact same problem on similar hardware (AMD64 laptop),
but I could find neither the cause nor the fix for that issue (which I
think might be related to the other issues I observe when the clock
goes too fast).

Does that PM-Timer message make sense to someone knowledgeable?

Thanks in advance,

Cheers,
Olivier.

On Mon, 2005-03-28 at 21:39 +0200, Willy Tarreau wrote:
> On Mon, Mar 28, 2005 at 09:30:26PM +0200, Olivier Fourdan wrote:
> > Hi Willy
> > 
> > On Mon, 2005-03-28 at 21:20 +0200, Willy Tarreau wrote:
> > > Now I have a compaq (nc8000) which does not exhibit such buggy behaviour,
> > > but you can try disabling the APIC too just in case it's a similar problem
> > > (at least in 32 bits, I don't know if you can disable it in 64 bits mode).
> > 
> > Thanks for the hint, but unfortunately, it's one of the first things I
> > tried, and that makes no difference.
> 
> Sorry, at first I only noticed ACPI in your mail, but after reading it
> again, I also noticed APIC. So now, you can only try not to initialize
> some peripherals (IDE, network, display, etc...) by removing their drivers
> from the kernel. You may end up with a kernel panic, but that does not
> matter if you boot it with "panic=5" so that it automatically reboots
> 5 seconds after the panic. You should then finally identify the subsystem
> which is responsible for your problems. Perhaps you'll even need to remove
> PCI support :-(
> 
> Regards,
> Willy
> 
> 




Re: Various issues after rebooting

2005-03-28 Thread Olivier Fourdan
Hi Willy,

On Mon, 2005-03-28 at 21:39 +0200, Willy Tarreau wrote:
> Sorry, at first I only noticed ACPI in your mail, but after reading it
> again, I also noticed APIC. So now, you can only try not to initialize
> some peripherals (IDE, network, display, etc...) by removing their drivers
> from the kernel. You may end up with a kernel panic, but that does not
> matter if you boot it with "panic=5" so that it automatically reboots
> 5 seconds after the panic. You should then finally identify the subsystem
> which is responsible for your problems. Perhaps you'll even need to remove
> PCI support :-(

Well, actually, the system runs (at least) unless I try to load
"ndiswrapper" which leads to a kernel panic.

I tried to bring the issue to the ndiswrapper ML but I doubt that
ndiswrapper is faulty.

I can reliably predict the crash. If the clock (and all other time-based
events) is running too fast, then modprobing ndiswrapper will crash the
system, and mounting a CD-ROM will fail as well.
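
To put a number on "too fast" before trying to load a module, the elapsed
local time can be compared with an external reference such as a wristwatch
or another machine. The following is only a minimal sketch: the function
names and the 5% tolerance are hypothetical choices for illustration, not
anything taken from the kernel.

```python
def clock_speed_ratio(local_start, local_end, ref_start, ref_end):
    """Return how fast the local clock ran relative to a reference clock.

    All arguments are timestamps in seconds. 1.0 means the two clocks
    agree; 3.0 means the local clock advanced three times as fast.
    """
    ref_elapsed = ref_end - ref_start
    if ref_elapsed <= 0:
        raise ValueError("reference interval must be positive")
    return (local_end - local_start) / ref_elapsed


def looks_affected(ratio, tolerance=0.05):
    """True if the ratio deviates from 1.0 by more than the tolerance.

    The default tolerance is an arbitrary illustrative value.
    """
    return abs(ratio - 1.0) > tolerance
```

For example, if the laptop's clock advances 180 seconds while a wristwatch
advances 60, clock_speed_ratio(0, 180, 0, 60) gives 3.0, which is the
behaviour described in this thread.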

I think the clock speed and the other effects are just symptoms, not the
cause of the problem. What I'd like to determine is the root cause, or at
least whether anything can be done in Linux to work around it.

I just tried "acpi_fake_ecdt" but that leads to an immediate kernel
panic.

PS: Given the crash (Machine check exception), the sleep option seems to
have no effect.

Thanks,
Olivier.






Re: Various issues after rebooting

2005-03-28 Thread Olivier Fourdan
Hi Willy

On Mon, 2005-03-28 at 21:20 +0200, Willy Tarreau wrote:
> Now I have a compaq (nc8000) which does not exhibit such buggy behaviour,
> but you can try disabling the APIC too just in case it's a similar problem
> (at least in 32 bits, I don't know if you can disable it in 64 bits mode).

Thanks for the hint, but unfortunately, it's one of the first things I
tried, and that makes no difference.

Regards,
Olivier.





Various issues after rebooting

2005-03-28 Thread Olivier Fourdan
Hi all,

I'm facing various odd issues with an AMD64-based laptop (Compaq
R3480EA) I bought recently.

On first boot, everything is all right. The laptop runs flawlessly. But
if I shut down the laptop and restart it, I can see all kinds of strange
things happening.

1) the system clock runs 3 times faster,
2) the system is unable to mount cdroms,
3) modprobing ndiswrapper causes a whole system freeze with the following
message:

CPU 0: Machine Check Exception: 0004
Bank 4: b2070f0f
Kernel panic - not syncing: CPU context corrupt
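
The panic text can be cross-checked against the flag bits of the bank
status. The sketch below is hedged on one loud assumption: that the word
printed after "Bank 4:" is the upper 32 bits of the 64-bit MCi_STATUS
register (the report shows only one 32-bit word, so this cannot be
confirmed from the message itself). The bit positions are the architectural
x86 machine-check flags; the helper names are made up for illustration.

```python
# Flag bits in the upper 32 bits of an x86 MCi_STATUS register.
# Bit 63 of the full register is bit 31 of the upper word, and so on.
STATUS_HI_FLAGS = [
    (31, "VAL", "register contains a valid error"),
    (30, "OVER", "an earlier error was overwritten"),
    (29, "UC", "error was not corrected by hardware"),
    (28, "EN", "reporting was enabled for this error"),
    (27, "MISCV", "MCi_MISC holds extra information"),
    (26, "ADDRV", "MCi_ADDR holds the failing address"),
    (25, "PCC", "processor context may be corrupt"),
]


def decode_status_hi(word):
    """Return the names of the flag bits set in the upper status word."""
    return [name for bit, name, _ in STATUS_HI_FLAGS if word & (1 << bit)]


if __name__ == "__main__":
    # ASSUMPTION: 0xb2070f0f is the upper half of the bank 4 status.
    reported = 0xB2070F0F
    for bit, name, meaning in STATUS_HI_FLAGS:
        if reported & (1 << bit):
            print(f"{name:5} (bit {bit + 32}): {meaning}")
```

Under that assumption the set flags are VAL, UC, EN and PCC, i.e. an
uncorrected error with possibly corrupt processor context, which would be
consistent with the "CPU context corrupt" panic above. The lower bits
(error code) are model specific and are not decoded here.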

I've tried various kernels and distributions in 32-bit and 64-bit
modes but that makes no difference.

I also tried disabling ACPI, setting clock=[tsc|pmtmr|pit], disabling the
APIC, etc. No luck. No matter how many reboots I do, the problem remains.
The only way to fix the problem is to keep the laptop off for a couple of
hours.

I thought of a hardware issue, but in WinXP, everything is fine. And in
the case of a hardware issue, I guess the problem would always show, not
just in Linux after a reboot. 

My guess is that the BIOS doesn't re-initialize the hardware correctly
after a quick shutdown/reboot, while WinXP might be initializing things
by itself (it's a guess, I'm probably completely wrong).

Does that make any sense to someone? How could I help track down this
issue?

Thanks in advance,

Best regards,
Olivier.



