Re: [REGRESSION from 2.6.23-rc8] (was: Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents)

2007-09-30 Thread Andi Kleen
On Sunday 30 September 2007 16:06:59 Thomas Gleixner wrote:
> On Sun, 30 Sep 2007, Andi Kleen wrote:
> 
> >>> OK, this explains 2) and 3). I just looked into the code and the logic
> >>> vs. noapictimer on SMP is completely broken.
> >
> > noapictimer really doesn't make any sense on non SMP imho with the old
> > timer architecture. That is why I never bothered to implement it.
> > It's purely a UP hack.
> 
> It does not matter whether it makes sense to you or not. It is a command 
> line option which bricks systems. 

A lot of command line options do that -- if not they would be usually
default or automatically used by the kernel.

> There is neither an explanation in
> Documentation/kernel-parameters.txt nor a check in the code, which
> disables this completely.

Fair enough. I can add a warning in the Documentation.
 
> It makes a lot of sense even with the existing architecture. Trouble 
> shooting a box, where the local apic timer does not work correctly is not 
> an UP only requirement.

It should not be needed with current systems as far as I know
(see my previous mail) 

> I understand the code quite well. I'm just surprised from time to time by 
> interesting hacks in the so clean x8664 tree.

No hack in this area as far as I know.
 
> > [1]  Or let's call it "I trust all my time to the CPU" and no more southbridge
> > aka put all eggs in one basket. Given the trends in CPU power saving that
> > is a quite dangerous strategy.
> 
> No, it's not dangerous. 

It definitely caused a lot of problems in the single socket multi core world;
but yes you probably worked around all of them that I'm aware of currently.
What I just objected to was that you complained that the current x86-64
time code -- which works much more conservatively and thus needs fewer
workarounds -- doesn't have all of them. You basically tried to apply the
special debugging strategies for clockevents to the old code and then
complained that they don't work.

> We spent quite some time to make the clock events  
> layer flexible enough to handle the current problems and the design allows 
> to add more infrastructure when necessary.

Grand words for relatively simple changes. Anyway, as far as I know
even for hypothetical future C2+ capable multi socket systems the current
x86-64 time code should work -- it should automatically select broadcasting.
The only thing it relies on is that there are no multi socket C1E systems
with broken APIC timers. Those could only be future CPUs anyway,
and I haven't seen any indication that any of the upcoming CPUs will have
such a broken C1.

> The maybe new (mis)features of
> upcoming CPUs need to be addressed with or without clock events and they
> need to be done carefully and not by random hacks.

Not sure what random hacks you refer to.

-Andi
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [REGRESSION from 2.6.23-rc8] (was: Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents)

2007-09-30 Thread Thomas Gleixner

On Sun, 30 Sep 2007, Andi Kleen wrote:

> > OK, this explains 2) and 3). I just looked into the code and the logic
> > vs. noapictimer on SMP is completely broken.
>
> noapictimer really doesn't make any sense on non SMP imho with the old
> timer architecture. That is why I never bothered to implement it.
> It's purely a UP hack.

It does not matter whether it makes sense to you or not. It is a command
line option which bricks systems. There is neither an explanation in
Documentation/kernel-parameters.txt nor a check in the code, which
disables this completely.

It makes a lot of sense even with the existing architecture. Trouble
shooting a box, where the local apic timer does not work correctly is not
an UP only requirement.

Yes, it is a hack, a _bad_ hack.

> > ..and thanks for the explanation.
> >
> > Thanks for finding it so quickly guys. Sounds like this will be fixed
> > properly in 2.6.24 with the x86 merge (which hopefully brings in the hrt
> > patch too)
>
> There is nothing really to fix currently.  Clockevents changes behaviour
> majorly (always using APIC timers without irq 0 backups[1]) and that causes
> problems that need new workarounds and new fixes (surprise surprise!)
>
> That merge would probably fix a few more such "Thomas doesn't understand
> the code" bugs I guess because he hacks much more on i386 than x86-64;
> but if the overall result will be really better is a totally different
> question.

I understand the code quite well. I'm just surprised from time to time by
interesting hacks in the so clean x8664 tree.

> [1]  Or let's call it "I trust all my time to the CPU" and no more southbridge
> aka put all eggs in one basket. Given the trends in CPU power saving that
> is a quite dangerous strategy.

No, it's not dangerous. We spent quite some time to make the clock events
layer flexible enough to handle the current problems and the design allows
to add more infrastructure when necessary. The maybe new (mis)features of
upcoming CPUs need to be addressed with or without clock events and they
need to be done carefully and not by random hacks.


  tglx


Re: [REGRESSION from 2.6.23-rc8] (was: Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents)

2007-09-30 Thread Andi Kleen

> 
> PIT keeps jiffies (and the system) running, but the local APIC timer
> interrupts can get out of sync due to this C1E effect. 

The way C1e works on AMD is that even when one core is woken up
by the PIT the APIC timer resumes on the other core on the socket too because
the deep power saving that breaks the APIC timer is only active
with both cores idle.

And on true multi socket systems there is currently no such deep
C1e -- apic timer should always work.

At least that is how it was supposed to work and while I admit
I haven't read every mail in this endless thread closely I didn't
think Rafael's box contradicted that.
 
> I don't think this is a critical problem, but it is wrong nevertheless.
> 
> I think it's safe to revert the C1E patch

Yes the C1e patch is completely redundant on a non clockevents kernel.

> and postpone the fix to the 
> clock events conversion.

Well, a change is only needed together with clockevent's "apicrunsmaintimer"
default; but not on any non clockevents kernel.


-Andi


Re: [REGRESSION from 2.6.23-rc8] (was: Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents)

2007-09-30 Thread Andi Kleen

> 
> > OK, this explains 2) and 3). I just looked into the code and the logic
> > vs. noapictimer on SMP is completely broken.

noapictimer really doesn't make any sense on non SMP imho with the old
timer architecture. That is why I never bothered to implement it.
It's purely a UP hack.
 
> ..and thanks for the explanation.
> 
> Thanks for finding it so quickly guys. Sounds like this will be fixed 
> properly in 2.6.24 with the x86 merge (which hopefully brings in the hrt 
> patch too)

There is nothing really to fix currently.  Clockevents changes behaviour
majorly (always using APIC timers without irq 0 backups[1]) and that causes
problems that need new workarounds and new fixes (surprise surprise!)

That merge would probably fix a few more such "Thomas doesn't understand
the code" bugs I guess because he hacks much more on i386 than x86-64;
but if the overall result will be really better is a totally different
question.

-Andi

[1]  Or let's call it "I trust all my time to the CPU" and no more southbridge
aka put all eggs in one basket. Given the trends in CPU power saving that
is a quite dangerous strategy.


Re: [REGRESSION from 2.6.23-rc8] (was: Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents)

2007-09-27 Thread Rafael J. Wysocki
On Thursday, 27 September 2007 01:21, Thomas Gleixner wrote:
> On Thu, 2007-09-27 at 01:30 +0200, Rafael J. Wysocki wrote:
> > > > Tested for a couple of times with each kernel, the results seem to be
> > > > reproducible 100% of the time.
> > > 
> > > Thanks for going through this debug marathon.
> > 
> > No big deal.  I'm glad that you've found what's up.
> > 
> > Well, we still have the "CPU hotplug during suspend w/ the hrt patch" problem
> > to debug ... ;-)
> 
> Yeah. Knowing the actual line of code where it breaks might be helpful.

Instead, I have a fix (appended, against 2.6.23-rc8-mm2). :-)

Next, I'm going to enable NO_HZ and HIGH_RES_TIMERS and see what happens. ;-)

Greetings,
Rafael

---
From: Rafael J. Wysocki <[EMAIL PROTECTED]>

Fix CPU hotplug breakage on HP nx6325 and similar boxes caused by the reference
to disable_apic_timer (labeled as __initdata) from the CPU initialization code.

Signed-off-by: Rafael J. Wysocki <[EMAIL PROTECTED]>
---
 arch/x86_64/kernel/apic.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6.23-rc8-mm2/arch/x86_64/kernel/apic.c
===
--- linux-2.6.23-rc8-mm2.orig/arch/x86_64/kernel/apic.c
+++ linux-2.6.23-rc8-mm2/arch/x86_64/kernel/apic.c
@@ -42,7 +42,7 @@
 
 int apic_verbosity;
 static int apic_calibrate_pmtmr __initdata;
-int disable_apic_timer __initdata;
+int disable_apic_timer __cpuinitdata;
 
 /* Local APIC timer works in C2? */
 int local_apic_timer_c2_ok;
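
Why the section annotation matters can be modelled in user-space C. This is
only a sketch under stated assumptions, not kernel code: it assumes that
init-only data is released and overwritten once boot finishes, while data
marked for CPU initialization stays valid when CPU hotplug is configured.
The names model_free_initmem, model_cpu_up_skips_timer, flag_initdata,
flag_cpuinitdata and MODEL_POISON are hypothetical, and the fill byte is an
illustrative choice.

```c
#include <string.h>

#define MODEL_POISON 0xCC /* illustrative fill byte (assumption) */

/* Two ints standing in for the same flag placed in different sections. */
static int flag_initdata;    /* models "int disable_apic_timer __initdata" */
static int flag_cpuinitdata; /* models "int disable_apic_timer __cpuinitdata" */

/* Models the end of boot: init-only data is released and overwritten. */
static void model_free_initmem(void)
{
    memset(&flag_initdata, MODEL_POISON, sizeof(flag_initdata));
}

/*
 * Models CPU bring-up after resume: returns 1 if the code would treat
 * the APIC timer as disabled, based on whatever the flag now contains.
 */
static int model_cpu_up_skips_timer(const int *disable_flag)
{
    return *disable_flag != 0;
}
```

With both flags initially 0, calling model_free_initmem() makes the
init-only copy read as non-zero garbage, so a CPU brought up later sees
"timer disabled" even though nobody asked for that. The copy that survives
boot still reads 0.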


Re: [REGRESSION from 2.6.23-rc8] (was: Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents)

2007-09-26 Thread Thomas Gleixner
On Thu, 2007-09-27 at 01:30 +0200, Rafael J. Wysocki wrote:
> > > Tested for a couple of times with each kernel, the results seem to be
> > > reproducible 100% of the time.
> > 
> > Thanks for going through this debug marathon.
> 
> No big deal.  I'm glad that you've found what's up.
> 
> Well, we still have the "CPU hotplug during suspend w/ the hrt patch" problem
> to debug ... ;-)

Yeah. Knowing the actual line of code where it breaks might be helpful.

tglx




Re: [REGRESSION from 2.6.23-rc8] (was: Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents)

2007-09-26 Thread Rafael J. Wysocki
Thomas,

On Wednesday, 26 September 2007 23:34, Thomas Gleixner wrote:
> Rafael,
> 
> On Wed, 2007-09-26 at 23:00 +0200, Rafael J. Wysocki wrote:
> > > > > First, with the "x86-64: Disable local APIC timer use on AMD systems with C1E"
> > > > > patch and my collection of suspend patches applied, the box doesn't boot
> > > > > (the suspend patches don't even touch the boot code, so they should be
> > > > > irrelevant here).  However, it boots if patch-2.6.23-rc7-hrt1.patch (adjusted
> > > > > for 2.6.23-rc8) is applied in addition.  Is this expected?
> > > > 
> > > > No. That's odd. It is nothing else than adding "noapictimer" to the
> > > > kernel command line.
> > > 
> > > Seems to be reproducible, though.  I'll investigate further.
> > 
> > So far, the results are the following:
> > 
> > 1) current Linus' tree doesn't boot with any command line (regression)
> > 
> > [  Linus, please revert commit e66485d747505e9d960b864fc6c37f8b2afafaf0
> > 
> >    x86-64: Disable local APIC timer use on AMD systems with C1E
> > 
> >    It's not necessary for 2.6.23 and actually kills the box that it's supposed
> >    to fix. ]
> > 
> > 2) 2.6.23-rc8 w/ the "x86-64: Disable local APIC timer use on AMD systems with C1E"
> >    patch applied behaves like the current -git
> > 
> > 3) 2.6.23-rc8 w/o this patch doesn't boot with either "noapictimer" _or_
> 
> OK, this explains 2) and 3). I just looked into the code and the logic
> vs. noapictimer on SMP is completely broken.
> 
> On i386 the noapictimer option not only disables the local APIC timer,
> it also registers the CPUs for broadcasting via IPI on SMP systems. 
> 
> The x8664 code uses the broadcast only when the local apic timer is
> active, i.e. "noapictimer" is not on the command line. This defeats the
> whole purpose of "noapictimer". It should be there to make boxen work,
> where the local APIC timer actually has a hardware problem, e.g. the
> nx6325.
> 
> The current implementation of x86_64 only fixes the ACPI c-states
> related problem where the APIC timer stops in C3(2), nothing else.
> 
> On nx6325 and other AMD X2 equipped systems which have the C1E enabled
> we run into the following:
> 
> PIT keeps jiffies (and the system) running, but the local APIC timer
> interrupts can get out of sync due to this C1E effect. 
> 
> I don't think this is a critical problem, but it is wrong nevertheless.
> 
> I think it's safe to revert the C1E patch and postpone the fix to the
> clock events conversion.
> 
> >   "apicmaintimer"
> 
> on your box is not going to work. See the C1E patch. "apicmaintimer"
> switches off PIT and then waits for ever for the local APIC timer
> interrupts.
> 
> > 4) 2.6.22 behaves like 2.6.23-rc8
> 
> No surprise
> 
> > 5) 2.6.23-rc8 with (adjusted) patch-2.6.23-rc7-hrt1.patch boots only with
> >    "noapictimer"
> > 
> > 6) 2.6.23-rc8 with (adjusted) patch-2.6.23-rc7-hrt1.patch and with the
> >    "x86-64: Disable local APIC timer use on AMD systems with C1E" patch boots
> >    without any extra command line options
> 
> That's consistent behaviour.
> 
> > Tested for a couple of times with each kernel, the results seem to be
> > reproducible 100% of the time.
> 
> Thanks for going through this debug marathon.

No big deal.  I'm glad that you've found what's up.

Well, we still have the "CPU hotplug during suspend w/ the hrt patch" problem
to debug ... ;-)

Greetings,
Rafael


Re: [REGRESSION from 2.6.23-rc8] (was: Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents)

2007-09-26 Thread Thomas Gleixner
On Wed, 2007-09-26 at 15:22 -0700, Linus Torvalds wrote:
> 
> On Wed, 26 Sep 2007, Thomas Gleixner wrote:
> > > 
> > > 1) current Linus' tree doesn't boot with any command line (regression)
> > > 
> > > [  Linus, please revert commit e66485d747505e9d960b864fc6c37f8b2afafaf0
> 
> Reverted.
> 
> > OK, this explains 2) and 3). I just looked into the code and the logic
> > vs. noapictimer on SMP is completely broken.
> 
> ..and thanks for the explanation.
> 
> Thanks for finding it so quickly guys. Sounds like this will be fixed 
> properly in 2.6.24 with the x86 merge (which hopefully brings in the hrt 
> patch too)

It's even worse than I thought on the first check:

"noapictimer" on the command line of an SMP box prevents _ONLY_ the boot
CPU apic timer from being used. But the secondary CPU is still
unconditionally setting up the APIC timer and uses the uncalibrated
variable calibration_result, which is of course 0, to set up the APIC
timer. Wreckage guaranteed.

tglx
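
The failure mode described above can be sketched in plain user-space C.
This is a minimal model, not the code in arch/x86_64/kernel/apic.c: only
the names disable_apic_timer and calibration_result come from the thread,
every function below is hypothetical, and the value 1000 is a made-up
stand-in for a real calibration.

```c
#include <stdbool.h>

/* Stand-ins for kernel state; illustrative only. */
static bool disable_apic_timer;          /* set by "noapictimer" */
static unsigned int calibration_result;  /* stays 0 until calibrated */

/* Boot CPU path: honours "noapictimer", so calibration never runs. */
static void model_boot_cpu_setup_timer(void)
{
    if (disable_apic_timer)
        return;
    calibration_result = 1000; /* pretend calibration succeeded */
}

/* Secondary CPU as described: programs the timer unconditionally. */
static unsigned int model_secondary_count_unchecked(void)
{
    return calibration_result; /* 0 when the boot CPU skipped calibration */
}

/*
 * One possible guard (an assumption, not the actual fix): refuse to
 * program the timer when it is disabled or was never calibrated.
 * Returns 0 on success and -1 when the timer must not be used.
 */
static int model_secondary_count_checked(unsigned int *count)
{
    if (disable_apic_timer || calibration_result == 0)
        return -1;
    *count = calibration_result;
    return 0;
}
```

With disable_apic_timer set, the unchecked path hands a count of 0 to the
timer setup, which is the wreckage described; the checked variant reports
an error and leaves the caller's value alone.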




Re: [REGRESSION from 2.6.23-rc8] (was: Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents)

2007-09-26 Thread Linus Torvalds


On Wed, 26 Sep 2007, Thomas Gleixner wrote:
> > 
> > 1) current Linus' tree doesn't boot with any command line (regression)
> > 
> > [  Linus, please revert commit e66485d747505e9d960b864fc6c37f8b2afafaf0

Reverted.

> OK, this explains 2) and 3). I just looked into the code and the logic
> vs. noapictimer on SMP is completely broken.

..and thanks for the explanation.

Thanks for finding it so quickly guys. Sounds like this will be fixed 
properly in 2.6.24 with the x86 merge (which hopefully brings in the hrt 
patch too)

Linus


Re: [REGRESSION from 2.6.23-rc8] (was: Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents)

2007-09-26 Thread Thomas Gleixner
Rafael,

On Wed, 2007-09-26 at 23:00 +0200, Rafael J. Wysocki wrote:
> > > > First, with the "x86-64: Disable local APIC timer use on AMD systems with C1E"
> > > > patch and my collection of suspend patches applied, the box doesn't boot
> > > > (the suspend patches don't even touch the boot code, so they should be
> > > > irrelevant here).  However, it boots if patch-2.6.23-rc7-hrt1.patch (adjusted
> > > > for 2.6.23-rc8) is applied in addition.  Is this expected?
> > > 
> > > No. That's odd. It is nothing else than adding "noapictimer" to the
> > > kernel command line.
> > 
> > Seems to be reproducible, though.  I'll investigate further.
> 
> So far, the results are the following:
> 
> 1) current Linus' tree doesn't boot with any command line (regression)
> 
> [  Linus, please revert commit e66485d747505e9d960b864fc6c37f8b2afafaf0
> 
>    x86-64: Disable local APIC timer use on AMD systems with C1E
> 
>    It's not necessary for 2.6.23 and actually kills the box that it's supposed
>    to fix. ]
> 
> 2) 2.6.23-rc8 w/ the "x86-64: Disable local APIC timer use on AMD systems with C1E"
>    patch applied behaves like the current -git
> 
> 3) 2.6.23-rc8 w/o this patch doesn't boot with either "noapictimer" _or_

OK, this explains 2) and 3). I just looked into the code and the logic
vs. noapictimer on SMP is completely broken.

On i386 the noapictimer option not only disables the local APIC timer,
it also registers the CPUs for broadcasting via IPI on SMP systems. 

The x8664 code uses the broadcast only when the local apic timer is
active, i.e. "noapictimer" is not on the command line. This defeats the
whole purpose of "noapictimer". It should be there to make boxen work,
where the local APIC timer actually has a hardware problem, e.g. the
nx6325.

The current implementation of x86_64 only fixes the ACPI c-states
related problem where the APIC timer stops in C3(2), nothing else.

On nx6325 and other AMD X2 equipped systems which have the C1E enabled
we run into the following:

PIT keeps jiffies (and the system) running, but the local APIC timer
interrupts can get out of sync due to this C1E effect. 

I don't think this is a critical problem, but it is wrong nevertheless.

I think it's safe to revert the C1E patch and postpone the fix to the
clock events conversion.

>   "apicmaintimer"

on your box is not going to work. See the C1E patch. "apicmaintimer"
switches off PIT and then waits for ever for the local APIC timer
interrupts.

> 4) 2.6.22 behaves like 2.6.23-rc8

No surprise

> 5) 2.6.23-rc8 with (adjusted) patch-2.6.23-rc7-hrt1.patch boots only with
>    "noapictimer"
> 
> 6) 2.6.23-rc8 with (adjusted) patch-2.6.23-rc7-hrt1.patch and with the
>    "x86-64: Disable local APIC timer use on AMD systems with C1E" patch boots
>    without any extra command line options

That's consistent behaviour.

> Tested for a couple of times with each kernel, the results seem to be
> reproducible 100% of the time.

Thanks for going through this debug marathon.

tglx
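
The i386 versus x86_64 handling of "noapictimer" described in this message
can be written down as a small decision model. It is a sketch that follows
the description in the text only: the enum and both functions are
hypothetical, the C-state fallback to broadcast is left out, and TICK_NONE
is a simplification of "no working tick source is set up".

```c
#include <stdbool.h>

/* Where a CPU's periodic tick comes from in this toy model. */
enum tick_source {
    TICK_LOCAL_APIC,    /* the CPU's own local APIC timer */
    TICK_BROADCAST_IPI, /* PIT tick forwarded to the CPU via IPI */
    TICK_NONE           /* nothing usable drives the tick */
};

/*
 * i386 as described: "noapictimer" turns the local APIC timer off and
 * registers the CPU for broadcast via IPI.
 */
static enum tick_source model_i386_tick(bool noapictimer)
{
    return noapictimer ? TICK_BROADCAST_IPI : TICK_LOCAL_APIC;
}

/*
 * x86_64 as described: broadcast is only wired up while the local APIC
 * timer is in use, so "noapictimer" leaves the CPU without a tick.
 */
static enum tick_source model_x86_64_tick(bool noapictimer)
{
    return noapictimer ? TICK_NONE : TICK_LOCAL_APIC;
}
```

The two functions differ only in the noapictimer branch, which is the case
the option exists for: a box whose local APIC timer cannot be trusted.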




[REGRESSION from 2.6.23-rc8] (was: Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents)

2007-09-26 Thread Rafael J. Wysocki
On Wednesday, 26 September 2007 21:49, Rafael J. Wysocki wrote:
> On Wednesday, 26 September 2007 20:51, Thomas Gleixner wrote:
> > On Wed, 2007-09-26 at 17:25 +0200, Rafael J. Wysocki wrote:
> > > There still are some oddities.
> > > 
> > > First, with the "x86-64: Disable local APIC timer use on AMD systems with C1E"
> > > patch and my collection of suspend patches applied, the box doesn't boot
> > > (the suspend patches don't even touch the boot code, so they should be
> > > irrelevant here).  However, it boots if patch-2.6.23-rc7-hrt1.patch (adjusted
> > > for 2.6.23-rc8) is applied in addition.  Is this expected?
> > 
> > No. That's odd. It is nothing else than adding "noapictimer" to the
> > kernel command line.
> 
> Seems to be reproducible, though.  I'll investigate further.

So far, the results are the following:

1) current Linus' tree doesn't boot with any command line (regression)

[  Linus, please revert commit e66485d747505e9d960b864fc6c37f8b2afafaf0

   x86-64: Disable local APIC timer use on AMD systems with C1E

   It's not necessary for 2.6.23 and actually kills the box that it's supposed
   to fix. ]

2) 2.6.23-rc8 w/ the "x86-64: Disable local APIC timer use on AMD systems with
   C1E" patch applied behaves like the current -git

3) 2.6.23-rc8 w/o this patch doesn't boot with either "noapictimer" _or_
   "apicmaintimer"

4) 2.6.22 behaves like 2.6.23-rc8

5) 2.6.23-rc8 with (adjusted) patch-2.6.23-rc7-hrt1.patch boots only with
   "noapictimer"

6) 2.6.23-rc8 with (adjusted) patch-2.6.23-rc7-hrt1.patch and with the
   "x86-64: Disable local APIC timer use on AMD systems with C1E" patch boots
   without any extra command line options

Tested for a couple of times with each kernel, the results seem to be
reproducible 100% of the time.

Greetings,
Rafael


Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents

2007-09-26 Thread Rafael J. Wysocki
On Wednesday, 26 September 2007 20:51, Thomas Gleixner wrote:
> On Wed, 2007-09-26 at 17:25 +0200, Rafael J. Wysocki wrote:
> > There still are some oddities.
> > 
> > First, with the "x86-64: Disable local APIC timer use on AMD systems with C1E"
> > patch and my collection of suspend patches applied, the box doesn't boot
> > (the suspend patches don't even touch the boot code, so they should be
> > irrelevant here).  However, it boots if patch-2.6.23-rc7-hrt1.patch (adjusted
> > for 2.6.23-rc8) is applied in addition.  Is this expected?
> 
> No. That's odd. It is nothing else than adding "noapictimer" to the
> kernel command line.

Seems to be reproducible, though.  I'll investigate further.

> > Next, on 2.6.23-rc8 with the patches from:
> > 
> > http://www.sisk.pl/kernel/hibernation_and_suspend/2.6.23-rc8/patches/
> > 
> > plus the "x86-64: Disable local APIC timer use on AMD systems with C1E" patch
> > and patch-2.6.23-rc7-hrt1.patch (adjusted for 2.6.23-rc8), hibernation doesn't
> > work correctly.  Although the box hibernates and restores, there is a temporary
> > "hang" during the "resume hardware" sequence, after which the "lock" led starts
> > to blink (and remains in this state) and something like this appears in dmesg:
> > 
> > Extended CMOS year: 2000
> > Enabling non-boot CPUs ...
> > SMP alternatives: switching to SMP code
> > Booting processor 1/2 APIC 0x1
> > Initializing CPU#1
> > Calibrating delay using timer specific routine.. 3990.36 BogoMIPS (lpj=7980735)
> > CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
> > CPU: L2 Cache: 512K (64 bytes/line)
> > Unable to handle kernel paging request at 806c64d4 RIP:
> >  [802104cb] identify_cpu+0x2ac/0x5a1
> 
> Hmm. That's really early in the CPU bring up. The only change in this
> area is the C1E patch. Can you decode the exact source line, where it is
> failing ?

Yes, I can, but I'll first see what's wrong with the boot.

Greetings,
Rafael


Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents

2007-09-26 Thread Thomas Gleixner
On Wed, 2007-09-26 at 17:25 +0200, Rafael J. Wysocki wrote:
> There still are some oddities.
> 
> First, with the "x86-64: Disable local APIC timer use on AMD systems with C1E"
> patch and my collection of suspend patches applied, the box doesn't boot
> (the suspend patches don't even touch the boot code, so they should be
> irrelevant here).  However, it boots if patch-2.6.23-rc7-hrt1.patch (adjusted
> for 2.6.23-rc8) is applied in addition.  Is this expected?

No. That's odd. It is nothing else than adding "noapictimer" to the
kernel command line.

> Next, on 2.6.23-rc8 with the patches from:
> 
> http://www.sisk.pl/kernel/hibernation_and_suspend/2.6.23-rc8/patches/
> 
> plus the "x86-64: Disable local APIC timer use on AMD systems with C1E" patch
> and patch-2.6.23-rc7-hrt1.patch (adjusted for 2.6.23-rc8), hibernation doesn't
> work correctly.  Although the box hibernates and restores, there is a temporary
> "hang" during the "resume hardware" sequence, after which the "lock" led starts
> to blink (and remains in this state) and something like this appears in dmesg:
> 
> Extended CMOS year: 2000
> Enabling non-boot CPUs ...
> SMP alternatives: switching to SMP code
> Booting processor 1/2 APIC 0x1
> Initializing CPU#1
> Calibrating delay using timer specific routine.. 3990.36 BogoMIPS (lpj=7980735)
> CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
> CPU: L2 Cache: 512K (64 bytes/line)
> Unable to handle kernel paging request at 806c64d4 RIP:
>  [802104cb] identify_cpu+0x2ac/0x5a1

Hmm. That's really early in the CPU bring up. The only change in this
area is the C1E patch. Can you decode the exact source line, where it is
failing ?

tglx





Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents

2007-09-26 Thread Rafael J. Wysocki
Thomas,

On Tuesday, 25 September 2007 23:24, Thomas Gleixner wrote:
> Rafael,
> 
> On Tue, 2007-09-25 at 23:28 +0200, Rafael J. Wysocki wrote:
> > > I'm a bit confused by your earlier confirmation, that mainline w/o the
> > > -hrt patches boots fine, when you add "apicmaintimer" to the kernel
> > > command line. "apicmaintimer" stops the PIT like we do in -hrt and we
> > > just use the local APIC timer for everything. Can you please retest and
> > > confirm that this is correct ?
> > 
> > No, it's not.  The mainline _usually_ doesn't boot with "apicmaintimer".
> > 
> > It seems to me that _sometimes_ the CPU just doesn't enter this C1E state
> > and then everything goes fine ...
> 
> I'm relieved. I really started to go nuts on these contradicting
> patterns.
> 
> Your box seems to be worse than the VAIO, it has some random surprise
> generator built in :)
> 
> > > Is the 32 bit kernel working on that box ?
> > 
> > Can't tell, I have only 64-bit userland here.
> 
> Should be fine. The check is there since late 2.6.21-rc. I really could
> kick my own ass that I did not remember the nx6325 wreckage in the
> 2.6.21-rc time frame. Sigh, way too much broken hardware out there to
> keep track of it.
> 
> > > Thanks for your patience.
> > 
> > Well, I'm only making sure that future kernels will run on my box. ;-)
> 
> Nothing wrong with that. Thanks again for your help,

There still are some oddities.

First, with the "x86-64: Disable local APIC timer use on AMD systems with C1E"
patch and my collection of suspend patches applied, the box doesn't boot
(the suspend patches don't even touch the boot code, so they should be
irrelevant here).  However, it boots if patch-2.6.23-rc7-hrt1.patch (adjusted
for 2.6.23-rc8) is applied in addition.  Is this expected?

Next, on 2.6.23-rc8 with the patches from:

http://www.sisk.pl/kernel/hibernation_and_suspend/2.6.23-rc8/patches/

plus the "x86-64: Disable local APIC timer use on AMD systems with C1E" patch
and patch-2.6.23-rc7-hrt1.patch (adjusted for 2.6.23-rc8), hibernation doesn't
work correctly.  Although the box hibernates and restores, there is a temporary
"hang" during the "resume hardware" sequence, after which the "lock" led starts
to blink (and remains in this state) and something like this appears in dmesg:

Extended CMOS year: 2000
Enabling non-boot CPUs ...
SMP alternatives: switching to SMP code
Booting processor 1/2 APIC 0x1
Initializing CPU#1
Calibrating delay using timer specific routine.. 3990.36 BogoMIPS (lpj=7980735)
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 512K (64 bytes/line)
Unable to handle kernel paging request at 806c64d4 RIP: 
 [802104cb] identify_cpu+0x2ac/0x5a1
PGD 203067 PUD 207063 PMD 37fb4163 PTE 6c6000
Oops: 0002 [1] SMP 
CPU 1 
Modules linked in: ip6t_LOG nf_conntrack_ipv6 xt_pkttype ipt_LOG xt_limit 
cpufreq_conservative cpufreq_ondemand cpufreq_userspace cpufreq_powersave 
powernow_k8 freq_table thermal processor fan snd_pcm_oss button snd_mixer_oss 
snd_seq battery snd_seq_device ac ip6t_REJECT xt_tcpudp ipt_REJECT xt_state 
iptable_mangle iptable_nat nf_nat iptable_filter ip6table_mangle 
nf_conntrack_ipv4 nf_conntrack nfnetlink ip_tables ip6table_filter ip6_tables 
x_tables ipv6 loop dm_mod rfcomm hidp l2cap usbhid ff_memless psmouse hci_usb 
bluetooth pcmcia tg3 ohci_hcd snd_hda_intel ehci_hcd yenta_socket 
rsrc_nonstatic ide_cd ohci1394 k8temp i2c_piix4 pcmcia_core sdhci shpchp 
snd_pcm usbcore hwmon i2c_core rtc_cmos rtc_core rtc_lib ieee1394 mmc_core 
tifm_7xx1 tifm_core pci_hotplug snd_timer cdrom snd firmware_class 
ieee80211softmac ieee80211 ieee80211_crypt soundcore snd_page_alloc ext3 jbd 
edd atiixp ide_disk ide_core sg
Pid: 0, comm: swapper Not tainted 2.6.23-rc8-rjw #6
RIP: 0010:[802104cb]  [802104cb] identify_cpu+0x2ac/0x5a1
RSP: 0018:810037abdea8  EFLAGS: 00010006
RAX: 14008015 RBX: 01020800 RCX: c0010055
RDX:  RSI: 0004 RDI: 0001
RBP: 810037abded8 R08:  R09: 80444ad0
R10: 8070c860 R11: 0001 R12: 805920c0
R13:  R14:  R15: 
FS:  () GS:810037ac3e88() knlGS:
CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
CR2: 806c64d4 CR3: 00201000 CR4: 06a0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Process swapper (pid: 0, threadinfo 810037abc000, task 810037a8f800)
Stack:  000f4e5a1540 059f 0001 805920c0
 0001  810037abdef8 8021acaa
 059f  810037abdf48 8021b380
Call Trace:
 [8021acaa] smp_callin+0xc8/0xde
 [8021b380] start_secondary+0x1b/0x2e8


Code: c7 05 ff 5f 4b 00 01 00 00 00 e9 4f 01 00 00 4c 89 e7 e8 27 
RIP  [802104cb] identify_cpu+0x2ac/0x5a1
 RSP 
CR2: 806c64d4

Re: [REGRESSION from 2.6.23-rc8] (was: Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents)

2007-09-26 Thread Thomas Gleixner
Rafael,

On Wed, 2007-09-26 at 23:00 +0200, Rafael J. Wysocki wrote:
> > > > First, with the "x86-64: Disable local APIC timer use on AMD systems with C1E"
> > > > patch and my collection of suspend patches applied, the box doesn't boot
> > > > (the suspend patches don't even touch the boot code, so they should be
> > > > irrelevant here).  However, it boots if patch-2.6.23-rc7-hrt1.patch (adjusted
> > > > for 2.6.23-rc8) is applied in addition.  Is this expected?
> > > 
> > > No. That's odd. It is nothing else than adding "noapictimer" to the
> > > kernel command line.
> > 
> > Seems to be reproducible, though.  I'll investigate further.
> 
> So far, the results are the following:
> 
> 1) current Linus' tree doesn't boot with any command line (regression)
> 
> [  Linus, please revert commit e66485d747505e9d960b864fc6c37f8b2afafaf0
> 
>    x86-64: Disable local APIC timer use on AMD systems with C1E
> 
>    It's not necessary for 2.6.23 and actually kills the box that it's
>    supposed to fix. ]
> 
> 2) 2.6.23-rc8 w/ the "x86-64: Disable local APIC timer use on AMD systems
>    with C1E" patch applied behaves like the current -git
> 
> 3) 2.6.23-rc8 w/o this patch doesn't boot with either "noapictimer" _or_

OK, this explains 2) and 3). I just looked into the code and the logic
vs. noapictimer on SMP is completely broken.

On i386 the noapictimer option not only disables the local APIC timer,
it also registers the CPUs for broadcasting via IPI on SMP systems. 

The x8664 code uses the broadcast only when the local apic timer is
active, i.e. noapictimer is not on the command line. This defeats the
whole purpose of noapictimer. It should be there to make boxen work,
where the local APIC timer actually has a hardware problem, e.g. the
nx6325.

The current implementation of x86_64 only fixes the ACPI c-states
related problem where the APIC timer stops in C3(2), nothing else.
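
The difference between the two architectures, as described in this message,
can be sketched as a small Python model. This is not kernel code: it covers
only how a secondary CPU on an SMP box gets its ticks, the class and function
names are invented for the sketch, and the behaviour without "noapictimer"
is an assumption made to complete the picture.

```python
from dataclasses import dataclass


@dataclass
class CpuTimerSetup:
    local_apic_timer: bool  # the CPU programs its own local APIC timer
    ipi_broadcast: bool     # the CPU is registered for ticks broadcast via IPI

    def receives_ticks(self) -> bool:
        return self.local_apic_timer or self.ipi_broadcast


def i386_secondary(noapictimer: bool) -> CpuTimerSetup:
    if noapictimer:
        # Local APIC timer disabled, CPU registered for IPI broadcast instead.
        return CpuTimerSetup(local_apic_timer=False, ipi_broadcast=True)
    # Assumption: without the option the local APIC timer is simply used.
    return CpuTimerSetup(local_apic_timer=True, ipi_broadcast=False)


def x86_64_secondary(noapictimer: bool) -> CpuTimerSetup:
    if noapictimer:
        # Broadcast is tied to an active local APIC timer, so no tick source is left.
        return CpuTimerSetup(local_apic_timer=False, ipi_broadcast=False)
    # Broadcast is only available as the fallback for the C-state case.
    return CpuTimerSetup(local_apic_timer=True, ipi_broadcast=True)


if __name__ == "__main__":
    print(i386_secondary(noapictimer=True).receives_ticks())
    print(x86_64_secondary(noapictimer=True).receives_ticks())
```

Under these assumptions the i386 secondary CPU still receives ticks with
"noapictimer", while the x86_64 one does not, which is the sense in which the
option's purpose is defeated.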

On nx6325 and other AMD X2 equipped systems which have the C1E enabled
we run into the following:

PIT keeps jiffies (and the system) running, but the local APIC timer
interrupts can get out of sync due to this C1E effect. 

I don't think this is a critical problem, but it is wrong nevertheless.

I think it's safe to revert the C1E patch and postpone the fix to the
clock events conversion.

>   "apicmaintimer"

on your box is not going to work. See the C1E patch. "apicmaintimer"
switches off PIT and then waits for ever for the local APIC timer
interrupts.

> 4) 2.6.22 behaves like 2.6.23-rc8

No surprise

> 5) 2.6.23-rc8 with (adjusted) patch-2.6.23-rc7-hrt1.patch boots only with
>    "noapictimer"
> 
> 6) 2.6.23-rc8 with (adjusted) patch-2.6.23-rc7-hrt1.patch and with the
>    "x86-64: Disable local APIC timer use on AMD systems with C1E" patch boots
>    without any extra command line options

That's consistent behaviour.

> Tested for a couple of times with each kernel, the results seem to be
> reproducible 100% of the time.

Thanks for going through this debug marathon.

tglx




Re: [REGRESSION from 2.6.23-rc8] (was: Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents)

2007-09-26 Thread Linus Torvalds


On Wed, 26 Sep 2007, Thomas Gleixner wrote:
  
> > 1) current Linus' tree doesn't boot with any command line (regression)
> > 
> > [  Linus, please revert commit e66485d747505e9d960b864fc6c37f8b2afafaf0

Reverted.

> OK, this explains 2) and 3). I just looked into the code and the logic
> vs. noapictimer on SMP is completely broken.

..and thanks for the explanation.

Thanks for finding it so quickly guys. Sounds like this will be fixed 
properly in 2.6.24 with the x86 merge (which hopefully brings in the hrt 
patch too)

Linus


Re: [REGRESSION from 2.6.23-rc8] (was: Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents)

2007-09-26 Thread Thomas Gleixner
On Wed, 2007-09-26 at 15:22 -0700, Linus Torvalds wrote:
 
> On Wed, 26 Sep 2007, Thomas Gleixner wrote:
> > > 
> > > 1) current Linus' tree doesn't boot with any command line (regression)
> > > 
> > > [  Linus, please revert commit e66485d747505e9d960b864fc6c37f8b2afafaf0
> 
> Reverted.
> 
> > OK, this explains 2) and 3). I just looked into the code and the logic
> > vs. noapictimer on SMP is completely broken.
> 
> ..and thanks for the explanation.
> 
> Thanks for finding it so quickly guys. Sounds like this will be fixed
> properly in 2.6.24 with the x86 merge (which hopefully brings in the hrt
> patch too)

It's even worse than I thought on the first check:

noapictimer on the command line of an SMP box prevents _ONLY_ the boot
CPU apic timer from being used. But the secondary CPU is still
unconditionally setting up the APIC timer and uses the non calibrated
variable calibration_result, which is of course 0, to setup the APIC
timer. Wreckage guaranteed.
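
The sequence described above can be mimicked with a few lines of Python. This
is a toy model, not the kernel's code: only the name calibration_result comes
from this message, while the class, the method names and the sample
calibration value are made up for illustration.

```python
class ApicTimerModel:
    def __init__(self, noapictimer):
        self.noapictimer = noapictimer
        # Stays 0 until the boot CPU has run the calibration.
        self.calibration_result = 0

    def setup_boot_cpu(self, measured_result):
        """Boot CPU path: honours noapictimer and skips calibration and setup."""
        if self.noapictimer:
            return None
        self.calibration_result = measured_result
        return self.calibration_result

    def setup_secondary_cpu(self):
        """Secondary CPU path: unconditional, programs the timer from calibration_result."""
        return self.calibration_result


if __name__ == "__main__":
    model = ApicTimerModel(noapictimer=True)
    model.setup_boot_cpu(measured_result=12500)
    # The secondary CPU programs its APIC timer with the uncalibrated value.
    print(model.setup_secondary_cpu())
```

With noapictimer set, the secondary CPU ends up programming its timer with 0;
without it, the calibrated value is used.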

tglx




Re: [REGRESSION from 2.6.23-rc8] (was: Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents)

2007-09-26 Thread Rafael J. Wysocki
Thomas,

On Wednesday, 26 September 2007 23:34, Thomas Gleixner wrote:
> Rafael,
> 
> On Wed, 2007-09-26 at 23:00 +0200, Rafael J. Wysocki wrote:
> > > > > First, with the "x86-64: Disable local APIC timer use on AMD systems
> > > > > with C1E" patch and my collection of suspend patches applied, the box
> > > > > doesn't boot (the suspend patches don't even touch the boot code, so
> > > > > they should be irrelevant here).  However, it boots if
> > > > > patch-2.6.23-rc7-hrt1.patch (adjusted for 2.6.23-rc8) is applied in
> > > > > addition.  Is this expected?
> > > > 
> > > > No. That's odd. It is nothing else than adding "noapictimer" to the
> > > > kernel command line.
> > > 
> > > Seems to be reproducible, though.  I'll investigate further.
> > 
> > So far, the results are the following:
> > 
> > 1) current Linus' tree doesn't boot with any command line (regression)
> > 
> > [  Linus, please revert commit e66485d747505e9d960b864fc6c37f8b2afafaf0
> > 
> >    x86-64: Disable local APIC timer use on AMD systems with C1E
> > 
> >    It's not necessary for 2.6.23 and actually kills the box that it's
> >    supposed to fix. ]
> > 
> > 2) 2.6.23-rc8 w/ the "x86-64: Disable local APIC timer use on AMD systems
> >    with C1E" patch applied behaves like the current -git
> > 
> > 3) 2.6.23-rc8 w/o this patch doesn't boot with either "noapictimer" _or_
> 
> OK, this explains 2) and 3). I just looked into the code and the logic
> vs. noapictimer on SMP is completely broken.
> 
> On i386 the noapictimer option not only disables the local APIC timer,
> it also registers the CPUs for broadcasting via IPI on SMP systems.
> 
> The x8664 code uses the broadcast only when the local apic timer is
> active, i.e. noapictimer is not on the command line. This defeats the
> whole purpose of noapictimer. It should be there to make boxen work,
> where the local APIC timer actually has a hardware problem, e.g. the
> nx6325.
> 
> The current implementation of x86_64 only fixes the ACPI c-states
> related problem where the APIC timer stops in C3(2), nothing else.
> 
> On nx6325 and other AMD X2 equipped systems which have the C1E enabled
> we run into the following:
> 
> PIT keeps jiffies (and the system) running, but the local APIC timer
> interrupts can get out of sync due to this C1E effect.
> 
> I don't think this is a critical problem, but it is wrong nevertheless.
> 
> I think it's safe to revert the C1E patch and postpone the fix to the
> clock events conversion.
> 
> >   "apicmaintimer"
> 
> on your box is not going to work. See the C1E patch. "apicmaintimer"
> switches off PIT and then waits for ever for the local APIC timer
> interrupts.
> 
> > 4) 2.6.22 behaves like 2.6.23-rc8
> 
> No surprise
> 
> > 5) 2.6.23-rc8 with (adjusted) patch-2.6.23-rc7-hrt1.patch boots only with
> >    "noapictimer"
> > 
> > 6) 2.6.23-rc8 with (adjusted) patch-2.6.23-rc7-hrt1.patch and with the
> >    "x86-64: Disable local APIC timer use on AMD systems with C1E" patch
> >    boots without any extra command line options
> 
> That's consistent behaviour.
> 
> > Tested for a couple of times with each kernel, the results seem to be
> > reproducible 100% of the time.
> 
> Thanks for going through this debug marathon.

No big deal.  I'm glad that you've found what's up.

Well, we still have the CPU hotplug during suspend w/ the hrt patch problem
to debug ... ;-)

Greetings,
Rafael


Re: [REGRESSION from 2.6.23-rc8] (was: Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents)

2007-09-26 Thread Thomas Gleixner
On Thu, 2007-09-27 at 01:30 +0200, Rafael J. Wysocki wrote:
> > > Tested for a couple of times with each kernel, the results seem to be
> > > reproducible 100% of the time.
> > 
> > Thanks for going through this debug marathon.
> 
> No big deal.  I'm glad that you've found what's up.
> 
> Well, we still have the CPU hotplug during suspend w/ the hrt patch problem
> to debug ... ;-)

Yeah. Knowing the actual line of code where it breaks might be helpful.

tglx




Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents

2007-09-25 Thread Thomas Gleixner
Rafael,

On Tue, 2007-09-25 at 23:28 +0200, Rafael J. Wysocki wrote:
> > I'm a bit confused by your earlier confirmation, that mainline w/o the
> > -hrt patches boots fine, when you add "apicmaintimer" to the kernel
> > command line. "apicmaintimer" stops the PIT like we do in -hrt and we
> > just use the local APIC timer for everything. Can you please retest and
> > confirm that this is correct ?
> 
> No, it's not.  The mainline _usually_ doesn't boot with "apicmaintimer".
> 
> It seems to me that _sometimes_ the CPU just doesn't enter this C1E state
> and then everything goes fine ...

I'm relieved. I was really starting to go nuts over these contradictory
patterns.

Your box seems to be worse than the VAIO, it has some random surprise
generator built in :)

> > Is the 32 bit kernel working on that box ?
> 
> Can't tell, I have only 64-bit userland here.

Should be fine. The check has been there since late 2.6.21-rc. I really could
kick my own ass for not remembering the nx6325 wreckage from the
2.6.21-rc time frame. Sigh, there is way too much broken hardware out there to
keep track of.

> > Thanks for your patience.
> 
> Well, I'm only making sure that future kernels will run on my box. ;-)

Nothing wrong with that. Thanks again for your help,

tglx




Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents

2007-09-25 Thread Rafael J. Wysocki
Thomas,

On Tuesday, 25 September 2007 22:46, Thomas Gleixner wrote:
> Rafael,
> 
> On Tue, 2007-09-25 at 22:07 +0200, Rafael J. Wysocki wrote:
> > On Tuesday, 25 September 2007 15:17, Thomas Gleixner wrote:
> > > On Tue, 2007-09-25 at 15:16 +0200, Rafael J. Wysocki wrote:
> > [--snip--]
> > > 
> > > I start to get desperate. Below is a patch, which moves the apic timer
> > > disable check after the calibration routine. Can you please apply on top
> > > of -hrt and add "noapictimer" to the command line ? Does it boot ?
> >
> > 2.6.23-rc7 with patch-2.6.23-rc7-hrt1.patch and the patch below applied boots
> > with noapictimer and doesn't boot without it.
> 
> That was expected. I explicitly asked to add "noapictimer" to the kernel
> command line.
> 
> Ok, so we ruled out the apic timer calibration routine. I did not expect
> that this would be the culprit, but with "dark screen" as the only debug
> info, I need to resort to small steps.
> 
> Can you please send me the output of /proc/timer_list of 2.6.23-rc7-hrt1
> after booting with "noapictimer" ?

Sure, attached.  [Note: the kernel has been compiled with both NO_HZ and
HIGH_RES_TIMERS unset.]

> I'm a bit confused by your earlier confirmation, that mainline w/o the
> -hrt patches boots fine, when you add "apicmaintimer" to the kernel
> command line. "apicmaintimer" stops the PIT like we do in -hrt and we
> just use the local APIC timer for everything. Can you please retest and
> confirm that this is correct ?

No, it's not.  The mainline _usually_ doesn't boot with "apicmaintimer".

It seems to me that _sometimes_ the CPU just doesn't enter this C1E state
and then everything goes fine ...

> Is the 32 bit kernel working on that box ?

Can't tell, I have only 64-bit userland here.

> Thanks for your patience.

Well, I'm only making sure that future kernels will run on my box. ;-)

>   tglx
> 
> PS: I just sent out the "disable APIC timer for AMD C1E boxen" patch.

Yes, I've already tested it and sent a reply.  It works. :-)

> We debugged this half a year ago on a nx6325, but I completely forgot about
> that. The explanation from AMD was sensible, but your "apicmaintimer"
> works statement is contradictory.

Well, it was wrong.

I have some problems with resuming from suspend to RAM using 2.6.23-rc8-mm1
with this patch applied, but I think they are related to something else.  I'll
wait for the next -mm before debugging that.

For now, I'm going to build 2.6.23-rc8 with my collection of suspend patches
plus patch-2.6.23-rc7-hrt1.patch and the "disable APIC timer for AMD C1E boxes"
patch applied.  I'll play with that a bit and let you know how it's behaving.

Greetings,
Rafael
Timer List Version: v0.3
HRTIMER_MAX_CLOCK_BASES: 2
now at 279792107058 nsecs

cpu: 0
 clock 0:
  .index:  0
  .resolution: 4000250 nsecs
  .get_time:   ktime_get_real
active timers:
 clock 1:
  .index:  1
  .resolution: 4000250 nsecs
  .get_time:   ktime_get
active timers:
 #0: , hrtimer_wakeup, S:01, do_nanosleep, kwrapper/4664
 # expires at 280207419178 nsecs [in 415312120 nsecs]
 #1: , hrtimer_wakeup, S:01, futex_wait, nscd/4080
 # expires at 282678021548 nsecs [in 2885914490 nsecs]
 #2: , hrtimer_wakeup, S:01, futex_wait, nscd/4082
 # expires at 282678129670 nsecs [in 2886022612 nsecs]
 #3: , it_real_fn, S:01, do_setitimer, qmgr/4239
 # expires at 378654389676 nsecs [in 98862282618 nsecs]
 #4: , it_real_fn, S:01, do_setitimer, pickup/4238
 # expires at 557809025993 nsecs [in 278016918935 nsecs]
 #5: , it_real_fn, S:01, do_setitimer, master/4216
 # expires at 557809137746 nsecs [in 278017030688 nsecs]

cpu: 1
 clock 0:
  .index:  0
  .resolution: 4000250 nsecs
  .get_time:   ktime_get_real
active timers:
 clock 1:
  .index:  1
  .resolution: 4000250 nsecs
  .get_time:   ktime_get
active timers:
 #0: , it_real_fn, S:01, do_setitimer, Xorg/4355
 # expires at 279804542721 nsecs [in 12435663 nsecs]
 #1: , it_real_fn, S:01, do_setitimer, ssh-agent/4611
 # expires at 279962268496 nsecs [in 170161438 nsecs]
 #2: , hrtimer_wakeup, S:01, do_nanosleep, hald-addon-stor/4148
 # expires at 280071774352 nsecs [in 279667294 nsecs]
 #3: , hrtimer_wakeup, S:01, futex_wait, nscd/4081
 # expires at 282678034680 nsecs [in 2885927622 nsecs]
 #4: , hrtimer_wakeup, S:01, do_nanosleep, cron/4241
 # expires at 335311096287 nsecs [in 55518989229 nsecs]
 #5: , it_real_fn, S:01, do_setitimer, dhcpcd/5128
 # expires at 604918992928181 nsecs [in 604639200821123 nsecs]
 #6: , hrtimer_wakeup, S:01, do_nanosleep, dhcpcd/5128
 # expires at 604918992950531 nsecs [in 604639200843473 nsecs]


Tick Device: mode: 0
Clock Event Device: pit
 max_delta_ns:   27461866
 min_delta_ns:   12571
 mult:   5124677
 shift:  32
 mode:   2
 next_event: 9223372036854775807 nsecs
 set_next_event: pit_next_event
 set_mode:   init_pit_timer
 event_handler:  tick_handle_periodic_broadcast
tick_broadcast_mask: 0003


Tick Device: mode: 0
Clock Event Device: lapic
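
[Editorial note] The mult, max_delta_ns and min_delta_ns values in the "pit"
clock event device above can be reproduced with plain integer arithmetic. The
sketch below is only an illustration: the PIT input clock of 1193182 Hz and
the latch limits 0x7FFF and 0xF are assumptions (they are not printed in the
dump), and the helper names merely echo the kernel's div_sc() and
clockevent_delta2ns() as I recall them. With those assumptions the numbers
match the dump.

```python
NSEC_PER_SEC = 1_000_000_000
PIT_TICK_RATE = 1_193_182   # Hz; assumed i8253 input clock, not shown in the dump
SHIFT = 32                  # "shift: 32" from the dump


def div_sc(ticks: int, nsec: int, shift: int) -> int:
    """Scaled multiplier: device ticks per nanosecond, scaled by 2**shift."""
    return (ticks << shift) // nsec


def delta2ns(latch: int, mult: int, shift: int) -> int:
    """Convert a number of device ticks to nanoseconds, clamped to >= 1000 ns."""
    ns = (latch << shift) // mult
    return max(ns, 1000)


mult = div_sc(PIT_TICK_RATE, NSEC_PER_SEC, SHIFT)
max_delta_ns = delta2ns(0x7FFF, mult, SHIFT)   # assumed upper latch limit
min_delta_ns = delta2ns(0xF, mult, SHIFT)      # assumed lower latch limit

print(mult, max_delta_ns, min_delta_ns)        # 5124677 27461866 12571
```

The roughly 27 ms upper bound is why a PIT in one-shot mode cannot sleep for
long periods, whereas in the dump above it runs in periodic broadcast mode.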
 

Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents

2007-09-25 Thread Thomas Gleixner
Rafael,

On Tue, 2007-09-25 at 22:07 +0200, Rafael J. Wysocki wrote:
> On Tuesday, 25 September 2007 15:17, Thomas Gleixner wrote:
> > On Tue, 2007-09-25 at 15:16 +0200, Rafael J. Wysocki wrote:
> [--snip--]
> > 
> > I start to get desperate. Below is a patch, which moves the apic timer
> > disable check after the calibration routine. Can you please apply on top
> > of -hrt and add "noapictimer" to the command line ? Does it boot ?
>
> 2.6.23-rc7 with patch-2.6.23-rc7-hrt1.patch and the patch below applied boots
> with noapictimer and doesn't boot without it.

That was expected. I explicitly asked to add "noapictimer" to the kernel
command line.

Ok, so we ruled out the apic timer calibration routine. I did not expect
that this would be the culprit, but with "dark screen" as the only debug
info, I need to resort to small steps.

Can you please send me the output of /proc/timer_list of 2.6.23-rc7-hrt1
after booting with "noapictimer" ?

I'm a bit confused by your earlier confirmation, that mainline w/o the
-hrt patches boots fine, when you add "apicmaintimer" to the kernel
command line. "apicmaintimer" stops the PIT like we do in -hrt and we
just use the local APIC timer for everything. Can you please retest and
confirm that this is correct ?

Is the 32 bit kernel working on that box ?

Thanks for your patience.

tglx

PS: I just sent out the "disable APIC timer for AMD C1E boxen" patch. We
debugged this half a year ago on a nx6325, but I completely forgot about
that. The explanation from AMD was sensible, but your "apicmaintimer"
works statement is contradictory.




Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents

2007-09-25 Thread Rafael J. Wysocki
On Tuesday, 25 September 2007 15:17, Thomas Gleixner wrote:
> On Tue, 2007-09-25 at 15:16 +0200, Rafael J. Wysocki wrote:
[--snip--]
> 
> I start to get desperate. Below is a patch, which moves the apic timer
> disable check after the calibration routine. Can you please apply on top
> of -hrt and add "noapictimer" to the command line ? Does it boot ?

2.6.23-rc7 with patch-2.6.23-rc7-hrt1.patch and the patch below applied boots
with noapictimer and doesn't boot without it.

Also, attached is the output of

# cat /proc/interrupts; sleep 10; cat /proc/interrupts

from the current mainline.

Greetings,
Rafael


> Index: linux-2.6.23-rc7/arch/x86_64/kernel/apic.c
> ===
> --- linux-2.6.23-rc7.orig/arch/x86_64/kernel/apic.c	2007-09-24 20:30:00.0 +0200
> +++ linux-2.6.23-rc7/arch/x86_64/kernel/apic.c	2007-09-25 15:05:32.0 +0200
> @@ -927,6 +927,7 @@ static void __init calibrate_APIC_clock(
>  
>  void __init setup_boot_APIC_clock (void)
>  {
> +#if 0
>  	/*
>  	 * The local apic timer can be disabled via the kernel commandline.
>  	 * Register the lapic timer as a dummy clock event source on SMP
> @@ -940,7 +941,7 @@ void __init setup_boot_APIC_clock (void)
>  		setup_APIC_timer();
>  		return;
>  	}
> -
> +#endif
>  	printk(KERN_INFO "Using local APIC timer interrupts.\n");
>  	calibrate_APIC_clock();
>  
> @@ -949,11 +950,13 @@ void __init setup_boot_APIC_clock (void)
>  	 * PIT/HPET going.  Otherwise register lapic as a dummy
>  	 * device.
>  	 */
> -	if (nmi_watchdog != NMI_IO_APIC)
> +	if (!disable_apic_timer && nmi_watchdog != NMI_IO_APIC)
>  		lapic_clockevent.features &= ~CLOCK_EVT_FEAT_DUMMY;
> +#if 0
>  	else
>  		printk(KERN_WARNING "APIC timer registered as dummy,"
>  		       " due to nmi_watchdog=1!\n");
> +#endif
>  
>  	setup_APIC_timer();
>  }
> 
> 
> 
> 

-- 
"Premature optimization is the root of all evil." - Donald Knuth
albercik:~ # cat /proc/interrupts; sleep 10; cat /proc/interrupts
   CPU0   CPU1
  0:  62489  0  local-APIC-edge  timer
  1:  3232   IO-APIC-edge  i8042
  8:  0  0   IO-APIC-edge  rtc
 12:  1147   IO-APIC-edge  i8042
 14: 15   1947   IO-APIC-edge  ide0
 16:193  14151   IO-APIC-fasteoi   sata_sil, HDA Intel
 19: 76  43153   IO-APIC-fasteoi   ohci_hcd:usb1, ehci_hcd:usb2, ohci_hcd:usb3
 20:  0  4   IO-APIC-fasteoi   ohci1394, tifm_7xx1, yenta, sdhci:slot0
 21:  7172   IO-APIC-fasteoi   acpi
NMI:  0  0
LOC:  62454  62082
ERR:  0
   CPU0   CPU1
  0:  64993  0  local-APIC-edge  timer
  1:  3233   IO-APIC-edge  i8042
  8:  0  0   IO-APIC-edge  rtc
 12:  1147   IO-APIC-edge  i8042
 14: 15   2037   IO-APIC-edge  ide0
 16:194  14265   IO-APIC-fasteoi   sata_sil, HDA Intel
 19: 77  45155   IO-APIC-fasteoi   ohci_hcd:usb1, ehci_hcd:usb2, ohci_hcd:usb3
 20:  0  4   IO-APIC-fasteoi   ohci1394, tifm_7xx1, yenta, sdhci:slot0
 21:  7176   IO-APIC-fasteoi   acpi
NMI:  0  0
LOC:  64958  64586
ERR:  0
albercik:~ #
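
[Editorial note] The point of taking two snapshots ten seconds apart is to see
how fast each timer source ticks on each CPU. A minimal sketch of that
comparison follows; the helper names are hypothetical, and the sample text
contains only the IRQ 0 and LOC rows from the output above, re-spaced so that
the columns are unambiguous.

```python
def parse_interrupts(text):
    """Map each row label ("0", "LOC", ...) to its list of per-CPU counts."""
    counts = {}
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        if not sep:
            continue                      # header row "CPU0 CPU1" has no colon
        values = []
        for field in rest.split():
            if not field.isdigit():
                break                     # reached the chip/handler description
            values.append(int(field))
        if values:
            counts[label.strip()] = values
    return counts


def rates(before, after, seconds):
    """Interrupts per second for every row present in both snapshots."""
    a, b = parse_interrupts(before), parse_interrupts(after)
    return {k: [(y - x) / seconds for x, y in zip(a[k], b[k])]
            for k in a if k in b}


BEFORE = """\
           CPU0       CPU1
  0:      62489          0    local-APIC-edge  timer
LOC:      62454      62082
"""
AFTER = """\
           CPU0       CPU1
  0:      64993          0    local-APIC-edge  timer
LOC:      64958      64586
"""

print(rates(BEFORE, AFTER, 10))
# {'0': [250.4, 0.0], 'LOC': [250.4, 250.4]}
```

For this snapshot pair both CPUs take about 250 local timer interrupts per
second, which is what a periodic tick at HZ=250 would look like (the HZ value
is an inference, not something stated in the thread).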


Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents

2007-09-25 Thread Thomas Gleixner
On Tue, 2007-09-25 at 15:16 +0200, Rafael J. Wysocki wrote:
> > > There seems to be a history effect in the box, to make things more
> > > "interesting".
> > 
> > Did you connect this box to Andrews VAIO during KS ?
> 
> No, but it's famous for being interestingly broken nevertheless.

:)

> > > I think the only solid data point so far is that "noapictimer" makes the box
> > > boot.
> > 
> > Ok. Can you add "nmi_watchdog=1" to the command line please. This runs
> > through the calibration of APIC, but registers it as a dummy clock
> > source (the PIT must run to make the watchdog work).
> > 
> > If it boots, please provide the output of /proc/timer_list
> 
> No, it doesn't.

I start to get desperate. Below is a patch, which moves the apic timer
disable check after the calibration routine. Can you please apply on top
of -hrt and add "noapictimer" to the command line ? Does it boot ?

tglx

Index: linux-2.6.23-rc7/arch/x86_64/kernel/apic.c
===
--- linux-2.6.23-rc7.orig/arch/x86_64/kernel/apic.c	2007-09-24 20:30:00.0 +0200
+++ linux-2.6.23-rc7/arch/x86_64/kernel/apic.c	2007-09-25 15:05:32.0 +0200
@@ -927,6 +927,7 @@ static void __init calibrate_APIC_clock(
 
 void __init setup_boot_APIC_clock (void)
 {
+#if 0
 	/*
 	 * The local apic timer can be disabled via the kernel commandline.
 	 * Register the lapic timer as a dummy clock event source on SMP
@@ -940,7 +941,7 @@ void __init setup_boot_APIC_clock (void)
 		setup_APIC_timer();
 		return;
 	}
-
+#endif
 	printk(KERN_INFO "Using local APIC timer interrupts.\n");
 	calibrate_APIC_clock();
 
@@ -949,11 +950,13 @@ void __init setup_boot_APIC_clock (void)
 	 * PIT/HPET going.  Otherwise register lapic as a dummy
 	 * device.
 	 */
-	if (nmi_watchdog != NMI_IO_APIC)
+	if (!disable_apic_timer && nmi_watchdog != NMI_IO_APIC)
 		lapic_clockevent.features &= ~CLOCK_EVT_FEAT_DUMMY;
+#if 0
 	else
 		printk(KERN_WARNING "APIC timer registered as dummy,"
 		       " due to nmi_watchdog=1!\n");
+#endif
 
 	setup_APIC_timer();
 }
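
[Editorial note] What the debug patch changes can be summarised as a small
truth table. The sketch below is a simplified model of the control flow
visible in the diff, not kernel code: the function name is hypothetical,
NMI_IO_APIC = 1 is inferred from the "nmi_watchdog=1" message, and the
unpatched "disabled" path is assumed to register the lapic as a dummy device
without calibrating (as the comment in the diff describes for SMP).

```python
NMI_IO_APIC = 1  # assumed value, inferred from the "nmi_watchdog=1" message


def lapic_setup(disable_apic_timer, nmi_watchdog, patched):
    """Return (calibration_ran, lapic_registered_as_dummy)."""
    if not patched and disable_apic_timer:
        # Original code: early return, calibrate_APIC_clock() is never reached.
        return (False, True)
    dummy = nmi_watchdog == NMI_IO_APIC
    if patched:
        # Debug patch: "noapictimer" is only honoured after calibration.
        dummy = dummy or disable_apic_timer
    return (True, dummy)


for patched in (False, True):
    for disabled in (False, True):
        print(patched, disabled, lapic_setup(disabled, 0, patched))
```

With the patch and "noapictimer", calibration runs but the lapic still ends up
as a dummy device, so a successful boot in that configuration points away from
the calibration routine itself.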




Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents

2007-09-25 Thread Rafael J. Wysocki
On Monday, 24 September 2007 21:13, Thomas Gleixner wrote:
> On Mon, 2007-09-24 at 21:11 +0200, Rafael J. Wysocki wrote:
> > > /me scratches head
> > 
> > Retested.
> > 
> > > We know, that
> > > - disabling local apic timers work
> > 
> > This works reproducibly accross the board.
> 
> Ok
> 
> > > - local apic timers (which turn off PIT) work. when noacpiFSCKEDPARSING
> > 
> > This stopped working, although it evidently worked yesterday (wtf?).
> > 
> > There seems to be a history effect in the box, to make things more
> > "interesting".
> 
> Did you connect this box to Andrews VAIO during KS ?

No, but it's famous for being interestingly broken nevertheless.

> > I think the only solid data point so far is that "noapictimer" makes the box
> > boot.
> 
> Ok. Can you add "nmi_watchdog=1" to the command line please. This runs
> through the calibration of APIC, but registers it as a dummy clock
> source (the PIT must run to make the watchdog work).
> 
> If it boots, please provide the output of /proc/timer_list

No, it doesn't.

Greetings,
Rafael


Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents

2007-09-25 Thread Rafael J. Wysocki
On Tuesday, 25 September 2007 14:52, Rafael J. Wysocki wrote:
> On Tuesday, 25 September 2007 14:28, Thomas Gleixner wrote:
> > On Tue, 2007-09-25 at 14:20 +0200, Rafael J. Wysocki wrote:
> > > > > > As i can see from the log, you are booting on computer with dualcore AMD
> > > > > > processor. Do you have C1E feature enabled? 
> > > 
> > > That's possible, how to check?
> > > 
> > > > > > i386 kernel disable lapic on dualcore AMD with C1E support (see 
> > > > > > http://lkml.org/lkml/2007/3/29/199). x86_64 kernel do not have this
> > > > > > patch still (it's required for tickless kernel only).
> > > > > 
> > > > > Well it is required for non tickless mode as well.
> > > > > 
> > > > > >  As result, if
> > > > > > you run x86_64 kernel with hrt patch on such computer, the system
> > > > > > will stall during boot on lapic timer calibration.
> > > > > 
> > > > > Thanks for the reminder. I have a look into this.
> > > > 
> > > > Can you please boot mainline and provide the output of:
> > > > 
> > > > # cat /proc/interrupts; sleep 10; cat /proc/interrupts
> > > 
> > > albercik:~ # cat /proc/interrupts; sleep 10; cat /proc/interrupts
> > >CPU0   CPU1
> > >   0:1159492  0  local-APIC-edge  timer
> > > LOC:  01158220   Local interrupts
> > >
> > >   0:1161996  0  local-APIC-edge  timer
> > > LOC:  01160723   Local interrupts
> > 
> > Hmm. That's strange. It looks like the local apic timer is not used, but
> > x86_64 definitely lacks the above check.
> 
> Ouch, sorry.  This is from the kernel booted with "noapictimer".
> 
> I'll get the correct output in a little while.

OK, this one is from -rc7 with no extra command line:

albercik:~ # cat /proc/interrupts; sleep 10; cat /proc/interrupts
   CPU0   CPU1
  0:  27311  0  local-APIC-edge  timer
  1:  1 77   IO-APIC-edge  i8042
  8:  0  0   IO-APIC-edge  rtc
 12:  0148   IO-APIC-edge  i8042
 14: 19683   IO-APIC-edge  ide0
 16:178  12443   IO-APIC-fasteoi   sata_sil, HDA Intel
 19:111  15197   IO-APIC-fasteoi   ehci_hcd:usb1, ohci_hcd:usb2, ohci_hcd:usb3
 20:  0  3   IO-APIC-fasteoi   tifm_7xx1, yenta, sdhci:slot0, ohci1394
 21:  0113   IO-APIC-fasteoi   acpi
NMI:  0  0
LOC:  27270  27119
ERR:  2
   CPU0   CPU1
  0:  29815  0  local-APIC-edge  timer
  1:  1 77   IO-APIC-edge  i8042
  8:  0  0   IO-APIC-edge  rtc
 12:  0148   IO-APIC-edge  i8042
 14: 20772   IO-APIC-edge  ide0
 16:178  12451   IO-APIC-fasteoi   sata_sil, HDA Intel
 19:112  17199   IO-APIC-fasteoi   ehci_hcd:usb1, ohci_hcd:usb2, ohci_hcd:usb3
 20:  0  3   IO-APIC-fasteoi   tifm_7xx1, yenta, sdhci:slot0, ohci1394
 21:  0117   IO-APIC-fasteoi   acpi
NMI:  0  0
LOC:  29774  29623
ERR:  2
albercik:~ #

Greetings,
Rafael



Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents

2007-09-25 Thread Rafael J. Wysocki
On Tuesday, 25 September 2007 14:28, Thomas Gleixner wrote:
> On Tue, 2007-09-25 at 14:20 +0200, Rafael J. Wysocki wrote:
> > > > > As i can see from the log, you are booting on computer with dualcore AMD
> > > > > processor. Do you have C1E feature enabled? 
> > 
> > That's possible, how to check?
> > 
> > > > > i386 kernel disable lapic on dualcore AMD with C1E support (see 
> > > > > http://lkml.org/lkml/2007/3/29/199). x86_64 kernel do not have this
> > > > > patch still (it's required for tickless kernel only).
> > > > 
> > > > Well it is required for non tickless mode as well.
> > > > 
> > > > >  As result, if
> > > > > you run x86_64 kernel with hrt patch on such computer, the system
> > > > > will stall during boot on lapic timer calibration.
> > > > 
> > > > Thanks for the reminder. I have a look into this.
> > > 
> > > Can you please boot mainline and provide the output of:
> > > 
> > > # cat /proc/interrupts; sleep 10; cat /proc/interrupts
> > 
> > albercik:~ # cat /proc/interrupts; sleep 10; cat /proc/interrupts
> >CPU0   CPU1
> >   0:1159492  0  local-APIC-edge  timer
> > LOC:  01158220   Local interrupts
> >
> >   0:1161996  0  local-APIC-edge  timer
> > LOC:  01160723   Local interrupts
> 
> Hmm. That's strange. It looks like the local apic timer is not used, but
> x86_64 definitely lacks the above check.

Ouch, sorry.  This is from the kernel booted with "noapictimer".

I'll get the correct output in a little while.

Greetings,
Rafael


Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents

2007-09-25 Thread Thomas Gleixner
On Tue, 2007-09-25 at 14:20 +0200, Rafael J. Wysocki wrote:
> > > > As i can see from the log, you are booting on computer with dualcore AMD
> > > > processor. Do you have C1E feature enabled? 
> 
> That's possible, how to check?
> 
> > > > i386 kernel disable lapic on dualcore AMD with C1E support (see 
> > > > http://lkml.org/lkml/2007/3/29/199). x86_64 kernel do not have this
> > > > patch still (it's required for tickless kernel only).
> > > 
> > > Well it is required for non tickless mode as well.
> > > 
> > > >  As result, if
> > > > you run x86_64 kernel with hrt patch on such computer, the system
> > > > will stall during boot on lapic timer calibration.
> > > 
> > > Thanks for the reminder. I have a look into this.
> > 
> > Can you please boot mainline and provide the output of:
> > 
> > # cat /proc/interrupts; sleep 10; cat /proc/interrupts
> 
> albercik:~ # cat /proc/interrupts; sleep 10; cat /proc/interrupts
>CPU0   CPU1
>   0:1159492  0  local-APIC-edge  timer
> LOC:  01158220   Local interrupts
>
>   0:1161996  0  local-APIC-edge  timer
> LOC:  01160723   Local interrupts

Hmm. That's strange. It looks like the local apic timer is not used, but
x86_64 definitely lacks the above check. Can you please remove/disable
the acpi processor module and recheck ?

tglx






Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents

2007-09-25 Thread Rafael J. Wysocki
Thomas,

On Tuesday, 25 September 2007 11:30, Thomas Gleixner wrote:
> Rafael,
> 
> On Tue, 2007-09-25 at 10:07 +0200, Thomas Gleixner wrote:
> > On Tue, 2007-09-25 at 10:14 +0400, Mikhail Kshevetskiy wrote:
> > > Hello Thomas, Rafael
> > > 
> > > > We know, that
> > > > - disabling local apic timers work
> > > 
> > > As i can see from the log, you are booting on computer with dualcore AMD
> > > processor. Do you have C1E feature enabled? 

That's possible, how to check?

> > > i386 kernel disable lapic on dualcore AMD with C1E support (see 
> > > http://lkml.org/lkml/2007/3/29/199). x86_64 kernel do not have this
> > > patch still (it's required for tickless kernel only).
> > 
> > Well it is required for non tickless mode as well.
> > 
> > >  As result, if
> > > you run x86_64 kernel with hrt patch on such computer, the system
> > > will stall during boot on lapic timer calibration.
> > 
> > Thanks for the reminder. I have a look into this.
> 
> Can you please boot mainline and provide the output of:
> 
> # cat /proc/interrupts; sleep 10; cat /proc/interrupts

albercik:~ # cat /proc/interrupts; sleep 10; cat /proc/interrupts
   CPU0   CPU1
  0:1159492  0  local-APIC-edge  timer
  1:   6892   1692   IO-APIC-edge  i8042
  8:  0  0   IO-APIC-edge  rtc
 12:156110   IO-APIC-edge  i8042
 14:  29613  11409   IO-APIC-edge  ide0
 16:  23365  21934   IO-APIC-fasteoi   sata_sil, HDA Intel
 18:196  88386   IO-APIC-fasteoi   bcm43xx
 19: 744874 279433   IO-APIC-fasteoi   ohci_hcd:usb1, ohci_hcd:usb2, ehci_hcd:usb3
 20:  2  4   IO-APIC-fasteoi   ohci1394, yenta, tifm_7xx1, sdhci:slot0
 21:   1408592   IO-APIC-fasteoi   acpi
NMI:  0  0   Non-maskable interrupts
LOC:  01158220   Local interrupts
RES: 260520 295387   Rescheduling interrupts
CAL:419652   function call interrupts
TLB:864541   TLB shootdowns
TRM:  0  0   Thermal event interrupts
THR:  0  0   Threshold APIC interrupts
SPU:  0  0   Spurious interrupts
ERR: 13
   CPU0   CPU1
  0:1161996  0  local-APIC-edge  timer
  1:   6893   1692   IO-APIC-edge  i8042
  8:  0  0   IO-APIC-edge  rtc
 12:156110   IO-APIC-edge  i8042
 14:  29703  11409   IO-APIC-edge  ide0
 16:  23393  21934   IO-APIC-fasteoi   sata_sil, HDA Intel
 18:196  88490   IO-APIC-fasteoi   bcm43xx
 19: 747268 279433   IO-APIC-fasteoi   ohci_hcd:usb1, ohci_hcd:usb2, ehci_hcd:usb3
 20:  2  4   IO-APIC-fasteoi   ohci1394, yenta, tifm_7xx1, sdhci:slot0
 21:   1412592   IO-APIC-fasteoi   acpi
NMI:  0  0   Non-maskable interrupts
LOC:  01160723   Local interrupts
RES: 260567 295433   Rescheduling interrupts
CAL:419652   function call interrupts
TLB:866543   TLB shootdowns
TRM:  0  0   Thermal event interrupts
THR:  0  0   Threshold APIC interrupts
SPU:  0  0   Spurious interrupts
ERR: 13
albercik:~ #


Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents

2007-09-25 Thread Thomas Gleixner
Rafael,

On Tue, 2007-09-25 at 10:07 +0200, Thomas Gleixner wrote:
> On Tue, 2007-09-25 at 10:14 +0400, Mikhail Kshevetskiy wrote:
> > Hello Thomas, Rafael
> > 
> > > We know, that
> > > - disabling local apic timers work
> > 
> > As i can see from the log, you are booting on computer with dualcore AMD
> > processor. Do you have C1E feature enabled? 
> > 
> > i386 kernel disable lapic on dualcore AMD with C1E support (see 
> > http://lkml.org/lkml/2007/3/29/199). x86_64 kernel do not have this
> > patch still (it's required for tickless kernel only).
> 
> Well it is required for non tickless mode as well.
> 
> >  As result, if
> > you run x86_64 kernel with hrt patch on such computer, the system
> > will stall during boot on lapic timer calibration.
> 
> Thanks for the reminder. I have a look into this.

Can you please boot mainline and provide the output of:

# cat /proc/interrupts; sleep 10; cat /proc/interrupts

Thanks,

tglx




Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents

2007-09-25 Thread Thomas Gleixner
On Tue, 2007-09-25 at 10:14 +0400, Mikhail Kshevetskiy wrote:
> Hello Thomas, Rafael
> 
> > We know, that
> > - disabling local apic timers work
> 
> As i can see from the log, you are booting on computer with dualcore AMD
> processor. Do you have C1E feature enabled? 
> 
> i386 kernel disable lapic on dualcore AMD with C1E support (see 
> http://lkml.org/lkml/2007/3/29/199). x86_64 kernel do not have this
> patch still (it's required for tickless kernel only).

Well it is required for non tickless mode as well.

>  As result, if
> you run x86_64 kernel with hrt patch on such computer, the system
> will stall during boot on lapic timer calibration.

Thanks for the reminder. I have a look into this.
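
[Editorial note] On the "how to check?" question raised elsewhere in this
thread: one user-space way to look for C1E is to read the relevant AMD MSR
through the msr driver. The sketch below is an assumption-laden illustration,
not the kernel's check: the register number 0xC0010055 and the mask 0x18000000
are the values I recall from the i386 check referenced above, and reading
/dev/cpu/N/msr needs root and a loaded msr module.

```python
import os
import struct

MSR_K8_ENABLE_C1E = 0xC0010055   # assumed register number (AMD K8 and later)
ENABLE_C1E_MASK = 0x18000000     # assumed mask: bits 27 and 28


def c1e_enabled(msr_value):
    """True if either C1E enable bit is set in the given MSR value."""
    return bool(msr_value & ENABLE_C1E_MASK)


def read_msr(cpu, reg):
    """Read a 64-bit MSR via the Linux msr driver (root only, msr module loaded)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        return struct.unpack("<Q", os.pread(fd, 8, reg))[0]
    finally:
        os.close(fd)


# Typical use on the affected machine (not executed here):
#   print(c1e_enabled(read_msr(0, MSR_K8_ENABLE_C1E)))
```

A True result on such a box would be consistent with the local APIC timer
stopping while the CPU sits in C1E, which is the failure discussed here.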

tglx




Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents

2007-09-25 Thread Thomas Gleixner
On Tue, 2007-09-25 at 14:20 +0200, Rafael J. Wysocki wrote:
> > > > > > As i can see from the log, you are booting on computer with dualcore AMD
> > > > > > processor. Do you have C1E feature enabled?
> > > > >
> > > > > That's possible, how to check?
> > > > >
> > > > i386 kernel disable lapic on dualcore AMD with C1E support (see
> > > > http://lkml.org/lkml/2007/3/29/199). x86_64 kernel do not have this
> > > > patch still (it's required for tickless kernel only).
> > >
> > > Well it is required for non tickless mode as well.
> > >
> > > > As result, if
> > > > you run x86_64 kernel with hrt patch on such computer, the system
> > > > will stall during boot on lapic timer calibration.
> > >
> > > Thanks for the reminder. I have a look into this.
> >
> > Can you please boot mainline and provide the output of:
> >
> > # cat /proc/interrupts; sleep 10; cat /proc/interrupts
>
> albercik:~ # cat /proc/interrupts; sleep 10; cat /proc/interrupts
>    CPU0   CPU1
>   0:1159492  0  local-APIC-edge  timer
> LOC:  01158220   Local interrupts
>
>   0:1161996  0  local-APIC-edge  timer
> LOC:  01160723   Local interrupts

Hmm. That's strange. It looks like the local apic timer is not used, but
x86_64 definitely lacks the above check. Can you please remove/disable
the acpi processor module and recheck ?

tglx






Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents

2007-09-25 Thread Rafael J. Wysocki
On Tuesday, 25 September 2007 14:28, Thomas Gleixner wrote:
> On Tue, 2007-09-25 at 14:20 +0200, Rafael J. Wysocki wrote:
> > > > > > > As i can see from the log, you are booting on computer with dualcore AMD
> > > > > > > processor. Do you have C1E feature enabled?
> > > > > >
> > > > > > That's possible, how to check?
> > > > > >
> > > > > i386 kernel disable lapic on dualcore AMD with C1E support (see
> > > > > http://lkml.org/lkml/2007/3/29/199). x86_64 kernel do not have this
> > > > > patch still (it's required for tickless kernel only).
> > > >
> > > > Well it is required for non tickless mode as well.
> > > >
> > > > > As result, if
> > > > > you run x86_64 kernel with hrt patch on such computer, the system
> > > > > will stall during boot on lapic timer calibration.
> > > >
> > > > Thanks for the reminder. I have a look into this.
> > >
> > > Can you please boot mainline and provide the output of:
> > >
> > > # cat /proc/interrupts; sleep 10; cat /proc/interrupts
> >
> > albercik:~ # cat /proc/interrupts; sleep 10; cat /proc/interrupts
> >    CPU0   CPU1
> >   0:1159492  0  local-APIC-edge  timer
> > LOC:  01158220   Local interrupts
> >
> >   0:1161996  0  local-APIC-edge  timer
> > LOC:  01160723   Local interrupts
>
> Hmm. That's strange. It looks like the local apic timer is not used, but
> x86_64 definitely lacks the above check.

Ouch, sorry.  This is from the kernel booted with noapictimer.

I'll get the correct output in a little while.

Greetings,
Rafael


Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents

2007-09-25 Thread Rafael J. Wysocki
On Tuesday, 25 September 2007 14:52, Rafael J. Wysocki wrote:
> On Tuesday, 25 September 2007 14:28, Thomas Gleixner wrote:
> > On Tue, 2007-09-25 at 14:20 +0200, Rafael J. Wysocki wrote:
> > > > > > > > As i can see from the log, you are booting on computer with dualcore AMD
> > > > > > > > processor. Do you have C1E feature enabled?
> > > > > > >
> > > > > > > That's possible, how to check?
> > > > > > >
> > > > > > i386 kernel disable lapic on dualcore AMD with C1E support (see
> > > > > > http://lkml.org/lkml/2007/3/29/199). x86_64 kernel do not have this
> > > > > > patch still (it's required for tickless kernel only).
> > > > >
> > > > > Well it is required for non tickless mode as well.
> > > > >
> > > > > > As result, if
> > > > > > you run x86_64 kernel with hrt patch on such computer, the system
> > > > > > will stall during boot on lapic timer calibration.
> > > > >
> > > > > Thanks for the reminder. I have a look into this.
> > > >
> > > > Can you please boot mainline and provide the output of:
> > > >
> > > > # cat /proc/interrupts; sleep 10; cat /proc/interrupts
> > >
> > > albercik:~ # cat /proc/interrupts; sleep 10; cat /proc/interrupts
> > >    CPU0   CPU1
> > >   0:1159492  0  local-APIC-edge  timer
> > > LOC:  01158220   Local interrupts
> > >
> > >   0:1161996  0  local-APIC-edge  timer
> > > LOC:  01160723   Local interrupts
> >
> > Hmm. That's strange. It looks like the local apic timer is not used, but
> > x86_64 definitely lacks the above check.
>
> Ouch, sorry.  This is from the kernel booted with noapictimer.
>
> I'll get the correct output in a little while.

OK, this one is from -rc7 with no extra command line:

albercik:~ # cat /proc/interrupts; sleep 10; cat /proc/interrupts
   CPU0   CPU1
  0:  27311  0  local-APIC-edge  timer
  1:  1 77   IO-APIC-edge  i8042
  8:  0  0   IO-APIC-edge  rtc
 12:  0148   IO-APIC-edge  i8042
 14: 19683   IO-APIC-edge  ide0
 16:178  12443   IO-APIC-fasteoi   sata_sil, HDA Intel
 19:111  15197   IO-APIC-fasteoi   ehci_hcd:usb1, ohci_hcd:usb2, 
ohci_hcd:usb3
 20:  0  3   IO-APIC-fasteoi   tifm_7xx1, yenta, sdhci:slot0, 
ohci1394
 21:  0113   IO-APIC-fasteoi   acpi
NMI:  0  0
LOC:  27270  27119
ERR:  2
   CPU0   CPU1
  0:  29815  0  local-APIC-edge  timer
  1:  1 77   IO-APIC-edge  i8042
  8:  0  0   IO-APIC-edge  rtc
 12:  0148   IO-APIC-edge  i8042
 14: 20772   IO-APIC-edge  ide0
 16:178  12451   IO-APIC-fasteoi   sata_sil, HDA Intel
 19:112  17199   IO-APIC-fasteoi   ehci_hcd:usb1, ohci_hcd:usb2, 
ohci_hcd:usb3
 20:  0  3   IO-APIC-fasteoi   tifm_7xx1, yenta, sdhci:slot0, 
ohci1394
 21:  0117   IO-APIC-fasteoi   acpi
NMI:  0  0
LOC:  29774  29623
ERR:  2
albercik:~ #
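
A quick way to read the two snapshots is to divide each counter's increase by the 10-second sleep. The sketch below is only a userspace illustration; irq_rate() is a hypothetical helper, not a kernel function. With the -rc7 numbers above, IRQ 0 on CPU0 goes from 27311 to 29815, and LOC goes from 27270 to 29774 on CPU0 and from 27119 to 29623 on CPU1, so all three advance at roughly 250 per second, which suggests the local APIC timer interrupt fires on both CPUs in this boot.

```c
#include <assert.h>

/*
 * Hypothetical helper: interrupts per second between two
 * /proc/interrupts samples taken `seconds` apart.
 */
static unsigned long irq_rate(unsigned long before, unsigned long after,
			      unsigned int seconds)
{
	/* Counters only grow, and the interval must be non-zero. */
	assert(seconds > 0 && after >= before);
	return (after - before) / seconds;
}
```

For example, irq_rate(27311, 29815, 10) gives the IRQ 0 rate on CPU0 for the output above.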

Greetings,
Rafael



Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents

2007-09-25 Thread Rafael J. Wysocki
On Monday, 24 September 2007 21:13, Thomas Gleixner wrote:
> On Mon, 2007-09-24 at 21:11 +0200, Rafael J. Wysocki wrote:
> > > /me scratches head
> >
> > Retested.
> >
> > > We know, that
> > > - disabling local apic timers work
> >
> > This works reproducibly accross the board.
>
> Ok
>
> > > - local apic timers (which turn off PIT) work. when noacpiFSCKEDPARSING
> >
> > This stopped working, although it evidently worked yesterday (wtf?).
> >
> > There seems to be a history effect in the box, to make things more
> > "interesting".
>
> Did you connect this box to Andrews VAIO during KS ?

No, but it's famous for being interestingly broken nevertheless.

> > I think the only solid data point so far is that "noapictimer" makes the box
> > boot.
>
> Ok. Can you add "nmi_watchdog=1" to the command line please. This runs
> through the calibration of APIC, but registers it as a dummy clock
> source (the PIT must run to make the watchdog work).
>
> If it boots, please provide the output of /proc/timer_list

No, it doesn't.

Greetings,
Rafael


Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents

2007-09-25 Thread Thomas Gleixner
On Tue, 2007-09-25 at 15:16 +0200, Rafael J. Wysocki wrote:
> > > There seems to be a history effect in the box, to make things more
> > > "interesting".
> >
> > Did you connect this box to Andrews VAIO during KS ?
>
> No, but it's famous for being interestingly broken nevertheless.

:)

> > > I think the only solid data point so far is that "noapictimer" makes the box
> > > boot.
> >
> > Ok. Can you add "nmi_watchdog=1" to the command line please. This runs
> > through the calibration of APIC, but registers it as a dummy clock
> > source (the PIT must run to make the watchdog work).
> >
> > If it boots, please provide the output of /proc/timer_list
>
> No, it doesn't.

I start to get desperate. Below is a patch, which moves the apic timer
disable check after the calibration routine. Can you please apply on top
of -hrt and add noapictimer to the command line ? Does it boot ?

tglx

Index: linux-2.6.23-rc7/arch/x86_64/kernel/apic.c
===
--- linux-2.6.23-rc7.orig/arch/x86_64/kernel/apic.c	2007-09-24 20:30:00.0 +0200
+++ linux-2.6.23-rc7/arch/x86_64/kernel/apic.c	2007-09-25 15:05:32.0 +0200
@@ -927,6 +927,7 @@ static void __init calibrate_APIC_clock(
 
 void __init setup_boot_APIC_clock (void)
 {
+#if 0
 	/*
 	 * The local apic timer can be disabled via the kernel commandline.
 	 * Register the lapic timer as a dummy clock event source on SMP
@@ -940,7 +941,7 @@ void __init setup_boot_APIC_clock (void)
 		setup_APIC_timer();
 		return;
 	}
-
+#endif
 	printk(KERN_INFO "Using local APIC timer interrupts.\n");
 	calibrate_APIC_clock();
 
@@ -949,11 +950,13 @@ void __init setup_boot_APIC_clock (void)
 	 * PIT/HPET going.  Otherwise register lapic as a dummy
 	 * device.
 	 */
-	if (nmi_watchdog != NMI_IO_APIC)
+	if (!disable_apic_timer && nmi_watchdog != NMI_IO_APIC)
 		lapic_clockevent.features &= ~CLOCK_EVT_FEAT_DUMMY;
+#if 0
 	else
 		printk(KERN_WARNING "APIC timer registered as dummy,"
 		       " due to nmi_watchdog=1!\n");
+#endif
 
 	setup_APIC_timer();
 }
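
The effect of the changed condition can be modelled outside the kernel. The following is a sketch under stated assumptions, not the kernel code itself: lapic_is_dummy() and lapic_dummy_examples() are hypothetical names, and the enum values are stand-ins for the real constants in the kernel headers. It only mirrors the boolean test from the patch: the lapic clock event device loses its dummy flag when the timer is not disabled on the command line and the IO-APIC NMI watchdog is not in use.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in values (assumption); the real constants live in the kernel. */
enum nmi_mode { NMI_NONE = 0, NMI_IO_APIC = 1 };

/*
 * Hypothetical model of the patched test in setup_boot_APIC_clock():
 * returns true when the lapic would stay registered as a dummy device.
 */
static bool lapic_is_dummy(bool disable_apic_timer, enum nmi_mode nmi_watchdog)
{
	return !(!disable_apic_timer && nmi_watchdog != NMI_IO_APIC);
}

/* The three boot configurations discussed in this thread. */
void lapic_dummy_examples(void)
{
	assert(lapic_is_dummy(true, NMI_NONE));      /* noapictimer */
	assert(lapic_is_dummy(false, NMI_IO_APIC));  /* nmi_watchdog=1 */
	assert(!lapic_is_dummy(false, NMI_NONE));    /* no extra options */
}
```

With the patch applied, calibration always runs, and only the registration as a real or dummy clock event device depends on the command line.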




Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents

2007-09-25 Thread Rafael J. Wysocki
On Tuesday, 25 September 2007 15:17, Thomas Gleixner wrote:
> On Tue, 2007-09-25 at 15:16 +0200, Rafael J. Wysocki wrote:
[--snip--]
>
> I start to get desperate. Below is a patch, which moves the apic timer
> disable check after the calibration routine. Can you please apply on top
> of -hrt and add noapictimer to the command line ? Does it boot ?

2.6.23-rc7 with patch-2.6.23-rc7-hrt1.patch and the patch below applied boots
with noapictimer and doesn't boot without it.

Also, attached is the output of

# cat /proc/interrupts; sleep 10; cat /proc/interrupts

from the current mainline.

Greetings,
Rafael


> Index: linux-2.6.23-rc7/arch/x86_64/kernel/apic.c
> ===
> --- linux-2.6.23-rc7.orig/arch/x86_64/kernel/apic.c	2007-09-24 20:30:00.0 +0200
> +++ linux-2.6.23-rc7/arch/x86_64/kernel/apic.c	2007-09-25 15:05:32.0 +0200
> @@ -927,6 +927,7 @@ static void __init calibrate_APIC_clock(
>
>  void __init setup_boot_APIC_clock (void)
>  {
> +#if 0
>  	/*
>  	 * The local apic timer can be disabled via the kernel commandline.
>  	 * Register the lapic timer as a dummy clock event source on SMP
> @@ -940,7 +941,7 @@ void __init setup_boot_APIC_clock (void)
>  		setup_APIC_timer();
>  		return;
>  	}
> -
> +#endif
>  	printk(KERN_INFO "Using local APIC timer interrupts.\n");
>  	calibrate_APIC_clock();
>
> @@ -949,11 +950,13 @@ void __init setup_boot_APIC_clock (void)
>  	 * PIT/HPET going.  Otherwise register lapic as a dummy
>  	 * device.
>  	 */
> -	if (nmi_watchdog != NMI_IO_APIC)
> +	if (!disable_apic_timer && nmi_watchdog != NMI_IO_APIC)
>  		lapic_clockevent.features &= ~CLOCK_EVT_FEAT_DUMMY;
> +#if 0
>  	else
>  		printk(KERN_WARNING "APIC timer registered as dummy,"
>  		       " due to nmi_watchdog=1!\n");
> +#endif
>
>  	setup_APIC_timer();
>  }
 
 
 
 

-- 
Premature optimization is the root of all evil. - Donald Knuth
albercik:~ # cat /proc/interrupts; sleep 10; cat /proc/interrupts
   CPU0   CPU1
  0:  62489  0  local-APIC-edge  timer
  1:  3232   IO-APIC-edge  i8042
  8:  0  0   IO-APIC-edge  rtc
 12:  1147   IO-APIC-edge  i8042
 14: 15   1947   IO-APIC-edge  ide0
 16:193  14151   IO-APIC-fasteoi   sata_sil, HDA Intel
 19: 76  43153   IO-APIC-fasteoi   ohci_hcd:usb1, ehci_hcd:usb2, ohci_hcd:usb3
 20:  0  4   IO-APIC-fasteoi   ohci1394, tifm_7xx1, yenta, sdhci:slot0
 21:  7172   IO-APIC-fasteoi   acpi
NMI:  0  0
LOC:  62454  62082
ERR:  0
   CPU0   CPU1
  0:  64993  0  local-APIC-edge  timer
  1:  3233   IO-APIC-edge  i8042
  8:  0  0   IO-APIC-edge  rtc
 12:  1147   IO-APIC-edge  i8042
 14: 15   2037   IO-APIC-edge  ide0
 16:194  14265   IO-APIC-fasteoi   sata_sil, HDA Intel
 19: 77  45155   IO-APIC-fasteoi   ohci_hcd:usb1, ehci_hcd:usb2, ohci_hcd:usb3
 20:  0  4   IO-APIC-fasteoi   ohci1394, tifm_7xx1, yenta, sdhci:slot0
 21:  7176   IO-APIC-fasteoi   acpi
NMI:  0  0
LOC:  64958  64586
ERR:  0
albercik:~ #


Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents

2007-09-25 Thread Thomas Gleixner
Rafael,

On Tue, 2007-09-25 at 22:07 +0200, Rafael J. Wysocki wrote:
> On Tuesday, 25 September 2007 15:17, Thomas Gleixner wrote:
> > On Tue, 2007-09-25 at 15:16 +0200, Rafael J. Wysocki wrote:
> [--snip--]
> >
> > I start to get desperate. Below is a patch, which moves the apic timer
> > disable check after the calibration routine. Can you please apply on top
> > of -hrt and add noapictimer to the command line ? Does it boot ?
>
> 2.6.23-rc7 with patch-2.6.23-rc7-hrt1.patch and the patch below applied boots
> with noapictimer and doesn't boot without it.

That was expected. I explicitly asked to add noapictimer to the kernel
command line.

Ok, so we ruled out the apic timer calibration routine. I did not expect
that this would be the culprit, but with dark screen as the only debug
info, I need to resort to small steps.

Can you please send me the output of /proc/timer_list of 2.6.23-rc7-hrt1
after booting with noapictimer ?

I'm a bit confused by your earlier confirmation, that mainline w/o the
-hrt patches boots fine, when you add apicmaintimer to the kernel
command line. apicmaintimer stops the PIT like we do in -hrt and we
just use the local APIC timer for everything. Can you please retest and
confirm that this is correct ?

Is the 32 bit kernel working on that box ?

Thanks for your patience.

tglx

PS: I just sent out the disable APIC timer for AMD C1E boxen patch. We
debugged this half a year ago on a nx6325, but I completely forgot about
that. The explanation from AMD was sensible, but your apicmaintimer
works statement is contradictory.




Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents

2007-09-25 Thread Rafael J. Wysocki
Thomas,

On Tuesday, 25 September 2007 22:46, Thomas Gleixner wrote:
> Rafael,
>
> On Tue, 2007-09-25 at 22:07 +0200, Rafael J. Wysocki wrote:
> > On Tuesday, 25 September 2007 15:17, Thomas Gleixner wrote:
> > > On Tue, 2007-09-25 at 15:16 +0200, Rafael J. Wysocki wrote:
> > [--snip--]
> > >
> > > I start to get desperate. Below is a patch, which moves the apic timer
> > > disable check after the calibration routine. Can you please apply on top
> > > of -hrt and add noapictimer to the command line ? Does it boot ?
> >
> > 2.6.23-rc7 with patch-2.6.23-rc7-hrt1.patch and the patch below applied boots
> > with noapictimer and doesn't boot without it.
>
> That was expected. I explicitly asked to add noapictimer to the kernel
> command line.
>
> Ok, so we ruled out the apic timer calibration routine. I did not expect
> that this would be the culprit, but with dark screen as the only debug
> info, I need to resort to small steps.
>
> Can you please send me the output of /proc/timer_list of 2.6.23-rc7-hrt1
> after booting with noapictimer ?

Sure, attached.  [Note: the kernel has been compiled with both NO_HZ and
HIGH_RES_TIMERS unset.]

> I'm a bit confused by your earlier confirmation, that mainline w/o the
> -hrt patches boots fine, when you add apicmaintimer to the kernel
> command line. apicmaintimer stops the PIT like we do in -hrt and we
> just use the local APIC timer for everything. Can you please retest and
> confirm that this is correct ?

No, it's not.  The mainline _usually_ doesn't boot with apicmaintimer.

It seems to me that _sometimes_ the CPU just doesn't enter this C1E state
and then everything goes fine ...

> Is the 32 bit kernel working on that box ?

Can't tell, I have only 64-bit userland here.

> Thanks for your patience.

Well, I'm only making sure that future kernels will run on my box. ;-)

> tglx
>
> PS: I just sent out the disable APIC timer for AMD C1E boxen patch.

Yes, I've already tested it and sent a reply.  It works. :-)

> We debugged this half a year ago on a nx6325, but I completely forgot about
> that. The explanation from AMD was sensible, but your apicmaintimer
> works statement is contradictory.

Well, it was wrong.

I have some problems with resuming from suspend to RAM using 2.6.23-rc8-mm1
with this patch applied, but I think they are related to something else.  I'll
wait for the next -mm with debugging that.

For now, I'm going to build 2.6.23-rc8 with my collection of suspend patches
plus patch-2.6.23-rc7-hrt1.patch and the disable APIC timer for AMD C1E boxes
patch applied.  I'll play with that a bit and let you know how it's behaving.

Greetings,
Rafael
Timer List Version: v0.3
HRTIMER_MAX_CLOCK_BASES: 2
now at 279792107058 nsecs

cpu: 0
 clock 0:
  .index:  0
  .resolution: 4000250 nsecs
  .get_time:   ktime_get_real
active timers:
 clock 1:
  .index:  1
  .resolution: 4000250 nsecs
  .get_time:   ktime_get
active timers:
 #0: 81004f98bda8, hrtimer_wakeup, S:01, do_nanosleep, kwrapper/4664
 # expires at 280207419178 nsecs [in 415312120 nsecs]
 #1: 81004f98bda8, hrtimer_wakeup, S:01, futex_wait, nscd/4080
 # expires at 282678021548 nsecs [in 2885914490 nsecs]
 #2: 81004f98bda8, hrtimer_wakeup, S:01, futex_wait, nscd/4082
 # expires at 282678129670 nsecs [in 2886022612 nsecs]
 #3: 81004f98bda8, it_real_fn, S:01, do_setitimer, qmgr/4239
 # expires at 378654389676 nsecs [in 98862282618 nsecs]
 #4: 81004f98bda8, it_real_fn, S:01, do_setitimer, pickup/4238
 # expires at 557809025993 nsecs [in 278016918935 nsecs]
 #5: 81004f98bda8, it_real_fn, S:01, do_setitimer, master/4216
 # expires at 557809137746 nsecs [in 278017030688 nsecs]

cpu: 1
 clock 0:
  .index:  0
  .resolution: 4000250 nsecs
  .get_time:   ktime_get_real
active timers:
 clock 1:
  .index:  1
  .resolution: 4000250 nsecs
  .get_time:   ktime_get
active timers:
 #0: 81004f98bda8, it_real_fn, S:01, do_setitimer, Xorg/4355
 # expires at 279804542721 nsecs [in 12435663 nsecs]
 #1: 81004f98bda8, it_real_fn, S:01, do_setitimer, ssh-agent/4611
 # expires at 279962268496 nsecs [in 170161438 nsecs]
 #2: 81004f98bda8, hrtimer_wakeup, S:01, do_nanosleep, 
hald-addon-stor/4148
 # expires at 280071774352 nsecs [in 279667294 nsecs]
 #3: 81004f98bda8, hrtimer_wakeup, S:01, futex_wait, nscd/4081
 # expires at 282678034680 nsecs [in 2885927622 nsecs]
 #4: 81004f98bda8, hrtimer_wakeup, S:01, do_nanosleep, cron/4241
 # expires at 335311096287 nsecs [in 55518989229 nsecs]
 #5: 81004f98bda8, it_real_fn, S:01, do_setitimer, dhcpcd/5128
 # expires at 604918992928181 nsecs [in 604639200821123 nsecs]
 #6: 81004f98bda8, hrtimer_wakeup, S:01, do_nanosleep, dhcpcd/5128
 # expires at 604918992950531 nsecs [in 604639200843473 nsecs]


Tick Device: mode: 0
Clock Event Device: pit
 max_delta_ns:   27461866
 min_delta_ns:   12571
 mult:   5124677
 shift:  32
 mode:   2
 next_event: 9223372036854775807 nsecs
 set_next_event: pit_next_event
 set_mode:   init_pit_timer
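
The min_delta_ns and max_delta_ns values in the listing above appear to follow directly from mult and shift. The sketch below is a userspace illustration with a hypothetical helper name, ticks_to_ns(); it assumes that the PIT limits are 0xF and 0x7FFF device ticks, which is not shown in the output itself. Under that assumption, a tick count converts to nanoseconds as (ticks << shift) / mult.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: convert device ticks to nanoseconds using the
 * mult/shift pair printed by /proc/timer_list.
 * Assumption: ns = (ticks << shift) / mult, truncated.
 */
static uint64_t ticks_to_ns(uint64_t ticks, uint32_t mult, uint32_t shift)
{
	/* Guard against division by zero and an undefined shift width. */
	assert(mult != 0 && shift < 64);
	return (ticks << shift) / mult;
}
```

With mult 5124677 and shift 32 from the listing, ticks_to_ns(0x7FFF, 5124677, 32) yields 27461866 and ticks_to_ns(0xF, 5124677, 32) yields 12571, matching the reported max_delta_ns and min_delta_ns.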
 

Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents

2007-09-25 Thread Thomas Gleixner
Rafael,

On Tue, 2007-09-25 at 23:28 +0200, Rafael J. Wysocki wrote:
> > I'm a bit confused by your earlier confirmation, that mainline w/o the
> > -hrt patches boots fine, when you add apicmaintimer to the kernel
> > command line. apicmaintimer stops the PIT like we do in -hrt and we
> > just use the local APIC timer for everything. Can you please retest and
> > confirm that this is correct ?
>
> No, it's not.  The mainline _usually_ doesn't boot with apicmaintimer.
>
> It seems to me that _sometimes_ the CPU just doesn't enter this C1E state
> and then everything goes fine ...

I'm relieved. I really started to go nuts on this contradicting
patterns.

Your box seems to be worse than the VAIO, it has some random surprise
generator built in :)

> > Is the 32 bit kernel working on that box ?
>
> Can't tell, I have only 64-bit userland here.

Should be fine. The check is there since late 2.6.21-rc. I really could
kick my own ass that I did not remember the nx6325 wreckage in the
2.6.21-rc time frame. Sigh, way too much broken hardware out there to
keep track of it.

> > Thanks for your patience.
>
> Well, I'm only making sure that future kernels will run on my box. ;-)

Nothing wrong with that. Thanks again for your help,

tglx




Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents

2007-09-24 Thread Thomas Gleixner
On Mon, 2007-09-24 at 21:11 +0200, Rafael J. Wysocki wrote:
> > /me scratches head
> 
> Retested.
> 
> > We know, that
> > - disabling local apic timers work
> 
> This works reproducibly accross the board.

Ok

> > - local apic timers (which turn off PIT) work. when noacpiFSCKEDPARSING
> 
> This stopped working, although it evidently worked yesterday (wtf?).
> 
> There seems to be a history effect in the box, to make things more
> "interesting".

Did you connect this box to Andrews VAIO during KS ?

> I think the only solid data point so far is that "noapictimer" makes the box
> boot.

Ok. Can you add "nmi_watchdog=1" to the command line please. This runs
through the calibration of APIC, but registers it as a dummy clock
source (the PIT must run to make the watchdog work).

If it boots, please provide the output of /proc/timer_list

Thanks, 

tlgx




Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents

2007-09-24 Thread Rafael J. Wysocki
On Monday, 24 September 2007 18:46, Thomas Gleixner wrote:
> On Mon, 2007-09-24 at 17:18 +0200, Rafael J. Wysocki wrote:
> > > > Well, "noacpi" seems to be a synonym for "pci=noacpi".
> > > > 
> > > > Anyway, it causes acpi_disable_pci() to be executed, which according to
> > > > Documentation/kernel-parameters.txt means "Do not use ACPI for IRQ 
> > > > routing or
> > > > for PCI scanning" (it works like this on x86_64 too, although the doc 
> > > > says it's
> > > > x86_32-specific).
> > > 
> > > Hrm. The local apic timer calibration does not use anything which is
> > > related to interrupts, but if we use the local APIC timer we switch off
> > > PIT.
> > > 
> > > Can you boot Linus latest (w/o hrt patches) and add "apicmaintimer" to
> > > the kernel command line please ?
> > 
> > Works, dmesg attached.
> 
> /me scratches head

Retested.

> We know, that
> - disabling local apic timers work

This works reproducibly accross the board.

> - local apic timers (which turn off PIT) work. when noacpiFSCKEDPARSING

This stopped working, although it evidently worked yesterday (wtf?).

There seems to be a history effect in the box, to make things more
"interesting".

> is given on the kernel command line.
> 
> I have no clue, what might be the difference of noacpiFSCKEDPARSING. The
> boot log is not giving any hint at all.
> 
> acpi_disable_pci() sets acpi_pci_disabled and acpi_noirq to 1.
> 
> What happens, if you set "acpi=noirq" instead ?

That obviously doesn't help.

I think the only solid data point so far is that "noapictimer" makes the box
boot.

Greetings,
Rafael


Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents

2007-09-24 Thread Thomas Gleixner
On Mon, 2007-09-24 at 17:18 +0200, Rafael J. Wysocki wrote:
> > > Well, "noacpi" seems to be a synonym for "pci=noacpi".
> > > 
> > > Anyway, it causes acpi_disable_pci() to be executed, which according to
> > > Documentation/kernel-parameters.txt means "Do not use ACPI for IRQ 
> > > routing or
> > > for PCI scanning" (it works like this on x86_64 too, although the doc 
> > > says it's
> > > x86_32-specific).
> > 
> > Hrm. The local apic timer calibration does not use anything which is
> > related to interrupts, but if we use the local APIC timer we switch off
> > PIT.
> > 
> > Can you boot Linus latest (w/o hrt patches) and add "apicmaintimer" to
> > the kernel command line please ?
> 
> Works, dmesg attached.

/me scratches head

We know, that
- disabling local apic timers work
- local apic timers (which turn off PIT) work. when noacpiFSCKEDPARSING
is given on the kernel command line.

I have no clue, what might be the difference of noacpiFSCKEDPARSING. The
boot log is not giving any hint at all.

acpi_disable_pci() sets acpi_pci_disabled and acpi_noirq to 1.

What happens, if you set "acpi=noirq" instead ?

tglx










Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents

2007-09-24 Thread Rafael J. Wysocki
On Monday, 24 September 2007 16:23, Thomas Gleixner wrote:
> On Mon, 2007-09-24 at 15:52 +0200, Rafael J. Wysocki wrote:
> > > > > So I really wonder, why noacpitimer on the kernel command line makes 
> > > > > any
> > > > > difference. I'm confused.
> > > > 
> > > > \metoo
> > > > 
> > > > Well, it was probably read as "noacpi". :-)
> > > 
> > > Hmm, ACPI is in the log all over the place.
> > 
> > Well, "noacpi" seems to be a synonym for "pci=noacpi".
> > 
> > Anyway, it causes acpi_disable_pci() to be executed, which according to
> > Documentation/kernel-parameters.txt means "Do not use ACPI for IRQ routing 
> > or
> > for PCI scanning" (it works like this on x86_64 too, although the doc says 
> > it's
> > x86_32-specific).
> 
> Hrm. The local apic timer calibration does not use anything which is
> related to interrupts, but if we use the local APIC timer we switch off
> PIT.
> 
> Can you boot Linus latest (w/o hrt patches) and add "apicmaintimer" to
> the kernel command line please ?

Works, dmesg attached.

Greetings,
Rafael
Linux version 2.6.23-rc7test ([EMAIL PROTECTED]) (gcc version 4.1.2 20061115 (prerelease) (SUSE Linux)) #19 SMP Mon Sep 24 16:55:05 CEST 2007
Command line: root=/dev/sda3 vga=792 resume=/dev/sda1 apicmaintimer apic=verbose 2
BIOS-provided physical RAM map:
 BIOS-e820:  - 0009fc00 (usable)
 BIOS-e820: 0009fc00 - 000a (reserved)
 BIOS-e820: 000e - 0010 (reserved)
 BIOS-e820: 0010 - 77fd (usable)
 BIOS-e820: 77fd - 77fe5600 (reserved)
 BIOS-e820: 77fe5600 - 77ff8000 (ACPI NVS)
 BIOS-e820: 77ff8000 - 8000 (reserved)
 BIOS-e820: e000 - f000 (reserved)
 BIOS-e820: fec0 - fec02000 (reserved)
 BIOS-e820: ffbc - ffcc (reserved)
 BIOS-e820: fff0 - 0001 (reserved)
Entering add_active_range(0, 0, 159) 0 entries of 256 used
Entering add_active_range(0, 256, 491472) 1 entries of 256 used
end_pfn_map = 1048576
DMI 2.4 present.
ACPI: RSDP 000F7D30, 0024 (r2 HP)
ACPI: XSDT 77FE57B4, 0054 (r1 HP 0944  6070620 HP  1)
ACPI: FACP 77FE5684, 00F4 (r4 HP 09443 HP  1)
ACPI: DSDT 77FE58DC, EE7A (r1 HPSB4001 MSFT  10E)
ACPI: FACS 77FF7E80, 0040
ACPI: APIC 77FE5808, 0062 (r1 HP 09441 HP  1)
ACPI: MCFG 77FE586C, 003C (r1 HP 09441 HP  1)
ACPI: TCPA 77FE58A8, 0032 (r2 HP 09441 HP  1)
ACPI: SSDT 77FF4756, 0059 (r1 HP   HPQNLP1 MSFT  10E)
ACPI: SSDT 77FF47AF, 0206 (r1 HP PSSTBLID1 HP  1)
Entering add_active_range(0, 0, 159) 0 entries of 256 used
Entering add_active_range(0, 256, 491472) 1 entries of 256 used
No mptable found.
Zone PFN ranges:
  DMA 0 -> 4096
  DMA324096 ->  1048576
  Normal1048576 ->  1048576
Movable zone start PFN for each node
early_node_map[2] active PFN ranges
0:0 ->  159
0:  256 ->   491472
On node 0 totalpages: 491375
  DMA zone: 56 pages used for memmap
  DMA zone: 1442 pages reserved
  DMA zone: 2501 pages, LIFO batch:0
  DMA32 zone: 6663 pages used for memmap
  DMA32 zone: 480713 pages, LIFO batch:31
  Normal zone: 0 pages used for memmap
  Movable zone: 0 pages used for memmap
ATI board detected. Disabling timer routing over 8254.
ACPI: PM-Timer IO Port: 0x8008
ACPI: Local APIC address 0xfee0
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
Processor #0 (Bootup-CPU)
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
Processor #1
ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
ACPI: IOAPIC (id[0x02] address[0xfec0] gsi_base[0])
IOAPIC[0]: apic_id 2, address 0xfec0, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 21 low level)
ACPI: IRQ0 used by override.
ACPI: IRQ2 used by override.
Setting APIC routing to flat
Using ACPI (MADT) for SMP configuration information
mapped APIC to ff5fb000 (fee0)
mapped IOAPIC to ff5fa000 (fec0)
swsusp: Registered nosave memory region: 0009f000 - 000a
swsusp: Registered nosave memory region: 000a - 000e
swsusp: Registered nosave memory region: 000e - 0010
Allocating PCI resources starting at 8800 (gap: 8000:6000)
SMP: Allowing 2 CPUs, 0 hotplug CPUs
PERCPU: Allocating 47320 bytes of per cpu data
Built 1 zonelists in Zone order.  Total pages: 483214
Kernel command line: root=/dev/sda3 vga=792 resume=/dev/sda1 apicmaintimer apic=verbose 2
Initializing CPU#0
PID hash table entries: 4096 (order: 12, 32768 bytes)
Extended CMOS year: 2000
Marking TSC unstable due to TSCs unsynchronized
time.c: Detected 1995.108 MHz processor.
Console: colour dummy device 80x25
console [tty0] enabled
Dentry cache hash table 

Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents

2007-09-24 Thread Thomas Gleixner
On Mon, 2007-09-24 at 15:52 +0200, Rafael J. Wysocki wrote:
> > > > So I really wonder, why noacpitimer on the kernel command line makes any
> > > > difference. I'm confused.
> > > 
> > > \metoo
> > > 
> > > Well, it was probably read as "noacpi". :-)
> > 
> > Hmm, ACPI is in the log all over the place.
> 
> Well, "noacpi" seems to be a synonym for "pci=noacpi".
> 
> Anyway, it causes acpi_disable_pci() to be executed, which according to
> Documentation/kernel-parameters.txt means "Do not use ACPI for IRQ routing or
> for PCI scanning" (it works like this on x86_64 too, although the doc says 
> it's
> x86_32-specific).

Hrm. The local apic timer calibration does not use anything which is
related to interrupts, but if we use the local APIC timer we switch off
PIT.

Can you boot Linus latest (w/o hrt patches) and add "apicmaintimer" to
the kernel command line please ?

> And yes, it matches "noacpiwhatever" in the command line with "noacpi".  Sigh.

Urgh.

tglx




Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents

2007-09-24 Thread Rafael J. Wysocki
On Monday, 24 September 2007 15:05, Thomas Gleixner wrote:
> On Mon, 2007-09-24 at 14:57 +0200, Rafael J. Wysocki wrote:
> > > > http://tglx.de/projects/hrtimers/2.6.23-rc4/patch-2.6.23-rc4-hrt1.patches.tar.bz2
> > > > 
> > > > applied.  I also have the 2.6.23-rc6-mm1 dmesg output ready, but 
> > > > there's some
> > > > -mm-specific noise in it.  Please let me know if you want it, though.
> > > 
> > > Hmm:
> > > 
> > > > Command line: root=/dev/sda3 vga=792 resume=/dev/sda1 noacpitimer 
> > > > apic=verbose 2
> > > ^^^
> > > 
> > > noacpitimer is not a valid commandline option.
> > > 
> > > I asked for: 
> > > >> > > noapictimer
> > 
> > I'm blind, sorry.
> > 
> > > So I really wonder, why noacpitimer on the kernel command line makes any
> > > difference. I'm confused.
> > 
> > \metoo
> > 
> > Well, it was probably read as "noacpi". :-)
> 
> Hmm, ACPI is in the log all over the place.

Well, "noacpi" seems to be a synonym for "pci=noacpi".

Anyway, it causes acpi_disable_pci() to be executed, which according to
Documentation/kernel-parameters.txt means "Do not use ACPI for IRQ routing or
for PCI scanning" (it works like this on x86_64 too, although the doc says it's
x86_32-specific).

And yes, it matches "noacpiwhatever" in the command line with "noacpi".  Sigh.
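
For illustration, a minimal sketch of how a prefix comparison produces the
behaviour described above. This is not the kernel's actual option parser;
match_prefix() and match_exact() are hypothetical helpers written only for
this example.

```c
#include <string.h>

/*
 * Prefix comparison: only the first six characters are checked, so any
 * option that merely starts with "noacpi" is accepted.
 */
static int match_prefix(const char *opt)
{
    return strncmp(opt, "noacpi", 6) == 0;
}

/* Exact comparison: only the literal option "noacpi" is accepted. */
static int match_exact(const char *opt)
{
    return strcmp(opt, "noacpi") == 0;
}
```

With these helpers, match_prefix("noacpitimer") returns 1 while
match_exact("noacpitimer") returns 0; "noapictimer" matches neither.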

> > Fortunately, noapictimer helps as well, dmesg attached (I have the one
> > from 2.6.23-rc6-mm1 ready, too).
> 
> Ok, at which point is the box stopping, when you omit noa* ? Is
> earlyprintk giving you any useful info ?

earlyprintk=vga doesn't display anything (ie. black screen) and there are no
serial ports in the box.

Greetings,
Rafael


Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents

2007-09-24 Thread Thomas Gleixner
On Mon, 2007-09-24 at 14:57 +0200, Rafael J. Wysocki wrote:
> > > http://tglx.de/projects/hrtimers/2.6.23-rc4/patch-2.6.23-rc4-hrt1.patches.tar.bz2
> > > 
> > > applied.  I also have the 2.6.23-rc6-mm1 dmesg output ready, but there's 
> > > some
> > > -mm-specific noise in it.  Please let me know if you want it, though.
> > 
> > Hmm:
> > 
> > > Command line: root=/dev/sda3 vga=792 resume=/dev/sda1 noacpitimer 
> > > apic=verbose 2
> > ^^^
> > 
> > noacpitimer is not a valid commandline option.
> > 
> > I asked for: 
> > >> > > noapictimer
> 
> I'm blind, sorry.
> 
> > So I really wonder, why noacpitimer on the kernel command line makes any
> > difference. I'm confused.
> 
> \metoo
> 
> Well, it was probably read as "noacpi". :-)

Hmm, ACPI is in the log all over the place.

> Fortunately, noapictimer helps as well, dmesg attached (I have the one
> from 2.6.23-rc6-mm1 ready, too).

Ok, at which point is the box stopping, when you omit noa* ? Is
earlyprintk giving you any useful info ?

tglx


Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents

2007-09-24 Thread Rafael J. Wysocki
On Monday, 24 September 2007 10:07, Thomas Gleixner wrote:
> On Sun, 2007-09-23 at 22:52 +0200, Rafael J. Wysocki wrote:
> > > > Second, noacpitimer added to the command line makes all of the kernels, 
> > > > up to
> > > > and including 2.6.23-rc6-mm1, boot (this seems to be 100% reproducible).
> > > 
> > > That's valuable information. Can you please provide a boot log of one of
> > > those with an additional "apic=verbose" on the command line ?
> > 
> > Attached is the dmesg output from the 2.6.23-rc6 kernel with the patchset:
> > 
> > http://tglx.de/projects/hrtimers/2.6.23-rc4/patch-2.6.23-rc4-hrt1.patches.tar.bz2
> > 
> > applied.  I also have the 2.6.23-rc6-mm1 dmesg output ready, but there's 
> > some
> > -mm-specific noise in it.  Please let me know if you want it, though.
> 
> Hmm:
> 
> > Command line: root=/dev/sda3 vga=792 resume=/dev/sda1 noacpitimer 
> > apic=verbose 2
> ^^^
> 
> noacpitimer is not a valid commandline option.
> 
> I asked for: 
> >> > > noapictimer

I'm blind, sorry.

> So I really wonder, why noacpitimer on the kernel command line makes any
> difference. I'm confused.

\metoo

Well, it was probably read as "noacpi". :-)

Fortunately, noapictimer helps as well, dmesg attached (I have the one
from 2.6.23-rc6-mm1 ready, too).

Greetings,
Rafael
Linux version 2.6.23-rc6-hrt ([EMAIL PROTECTED]) (gcc version 4.1.2 20061115 (prerelease) (SUSE Linux)) #1 SMP Sat Sep 22 22:38:18 CEST 2007
Command line: root=/dev/sda3 vga=792 resume=/dev/sda1 noapictimer apic=verbose 2
BIOS-provided physical RAM map:
 BIOS-e820:  - 0009fc00 (usable)
 BIOS-e820: 0009fc00 - 000a (reserved)
 BIOS-e820: 000e - 0010 (reserved)
 BIOS-e820: 0010 - 77fd (usable)
 BIOS-e820: 77fd - 77fe5600 (reserved)
 BIOS-e820: 77fe5600 - 77ff8000 (ACPI NVS)
 BIOS-e820: 77ff8000 - 8000 (reserved)
 BIOS-e820: e000 - f000 (reserved)
 BIOS-e820: fec0 - fec02000 (reserved)
 BIOS-e820: ffbc - ffcc (reserved)
 BIOS-e820: fff0 - 0001 (reserved)
Entering add_active_range(0, 0, 159) 0 entries of 256 used
Entering add_active_range(0, 256, 491472) 1 entries of 256 used
end_pfn_map = 1048576
DMI 2.4 present.
ACPI: RSDP 000F7D30, 0024 (r2 HP)
ACPI: XSDT 77FE57B4, 0054 (r1 HP 0944  6070620 HP  1)
ACPI: FACP 77FE5684, 00F4 (r4 HP 09443 HP  1)
ACPI: DSDT 77FE58DC, EE7A (r1 HPSB4001 MSFT  10E)
ACPI: FACS 77FF7E80, 0040
ACPI: APIC 77FE5808, 0062 (r1 HP 09441 HP  1)
ACPI: MCFG 77FE586C, 003C (r1 HP 09441 HP  1)
ACPI: TCPA 77FE58A8, 0032 (r2 HP 09441 HP  1)
ACPI: SSDT 77FF4756, 0059 (r1 HP   HPQNLP1 MSFT  10E)
ACPI: SSDT 77FF47AF, 0206 (r1 HP PSSTBLID1 HP  1)
Entering add_active_range(0, 0, 159) 0 entries of 256 used
Entering add_active_range(0, 256, 491472) 1 entries of 256 used
No mptable found.
Zone PFN ranges:
  DMA             0 ->     4096
  DMA32        4096 ->  1048576
  Normal    1048576 ->  1048576
Movable zone start PFN for each node
early_node_map[2] active PFN ranges
    0:        0 ->      159
    0:      256 ->   491472
On node 0 totalpages: 491375
  DMA zone: 56 pages used for memmap
  DMA zone: 1446 pages reserved
  DMA zone: 2497 pages, LIFO batch:0
  DMA32 zone: 6663 pages used for memmap
  DMA32 zone: 480713 pages, LIFO batch:31
  Normal zone: 0 pages used for memmap
  Movable zone: 0 pages used for memmap
ATI board detected. Disabling timer routing over 8254.
ACPI: PM-Timer IO Port: 0x8008
ACPI: Local APIC address 0xfee0
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
Processor #0 (Bootup-CPU)
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
Processor #1
ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
ACPI: IOAPIC (id[0x02] address[0xfec0] gsi_base[0])
IOAPIC[0]: apic_id 2, address 0xfec0, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 21 low level)
ACPI: IRQ0 used by override.
ACPI: IRQ2 used by override.
Setting APIC routing to flat
Using ACPI (MADT) for SMP configuration information
mapped APIC to ff5fb000 (fee0)
mapped IOAPIC to ff5fa000 (fec0)
swsusp: Registered nosave memory region: 0009f000 - 000a
swsusp: Registered nosave memory region: 000a - 000e
swsusp: Registered nosave memory region: 000e - 0010
Allocating PCI resources starting at 8800 (gap: 8000:6000)
SMP: Allowing 2 CPUs, 0 hotplug CPUs
PERCPU: Allocating 47576 bytes of per cpu data
Built 1 zonelists in Zone order.  Total pages: 483210
Kernel command line: root=/dev/sda3 vga=792 

Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents

2007-09-24 Thread Thomas Gleixner
On Sun, 2007-09-23 at 22:52 +0200, Rafael J. Wysocki wrote:
> > > Second, noacpitimer added to the command line makes all of the kernels, 
> > > up to
> > > and including 2.6.23-rc6-mm1, boot (this seems to be 100% reproducible).
> > 
> > That's valuable information. Can you please provide a boot log of one of
> > those with an additional "apic=verbose" on the command line ?
> 
> Attached is the dmesg output from the 2.6.23-rc6 kernel with the patchset:
> 
> http://tglx.de/projects/hrtimers/2.6.23-rc4/patch-2.6.23-rc4-hrt1.patches.tar.bz2
> 
> applied.  I also have the 2.6.23-rc6-mm1 dmesg output ready, but there's some
> -mm-specific noise in it.  Please let me know if you want it, though.

Hmm:

> Command line: root=/dev/sda3 vga=792 resume=/dev/sda1 noacpitimer 
> apic=verbose 2
^^^

noacpitimer is not a valid commandline option.

I asked for: 
>> > > noapictimer

So I really wonder, why noacpitimer on the kernel command line makes any
difference. I'm confused.

tglx




Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents

2007-09-24 Thread Rafael J. Wysocki
On Monday, 24 September 2007 16:23, Thomas Gleixner wrote:
> On Mon, 2007-09-24 at 15:52 +0200, Rafael J. Wysocki wrote:
> > > > > So I really wonder, why noacpitimer on the kernel command line makes any
> > > > > difference. I'm confused.
> > > > 
> > > > \metoo
> > > > 
> > > > Well, it was probably read as "noacpi". :-)
> > > 
> > > Hmm, ACPI is in the log all over the place.
> > 
> > Well, "noacpi" seems to be a synonym for "pci=noacpi".
> > 
> > Anyway, it causes acpi_disable_pci() to be executed, which according to
> > Documentation/kernel-parameters.txt means "Do not use ACPI for IRQ routing or
> > for PCI scanning" (it works like this on x86_64 too, although the doc says it's
> > x86_32-specific).
> 
> Hrm. The local apic timer calibration does not use anything which is
> related to interrupts, but if we use the local APIC timer we switch off
> PIT.
> 
> Can you boot Linus latest (w/o hrt patches) and add "apicmaintimer" to
> the kernel command line please ?

Works, dmesg attached.

Greetings,
Rafael
Linux version 2.6.23-rc7test ([EMAIL PROTECTED]) (gcc version 4.1.2 20061115 (prerelease) (SUSE Linux)) #19 SMP Mon Sep 24 16:55:05 CEST 2007
Command line: root=/dev/sda3 vga=792 resume=/dev/sda1 apicmaintimer apic=verbose 2
BIOS-provided physical RAM map:
 BIOS-e820:  - 0009fc00 (usable)
 BIOS-e820: 0009fc00 - 000a (reserved)
 BIOS-e820: 000e - 0010 (reserved)
 BIOS-e820: 0010 - 77fd (usable)
 BIOS-e820: 77fd - 77fe5600 (reserved)
 BIOS-e820: 77fe5600 - 77ff8000 (ACPI NVS)
 BIOS-e820: 77ff8000 - 8000 (reserved)
 BIOS-e820: e000 - f000 (reserved)
 BIOS-e820: fec0 - fec02000 (reserved)
 BIOS-e820: ffbc - ffcc (reserved)
 BIOS-e820: fff0 - 0001 (reserved)
Entering add_active_range(0, 0, 159) 0 entries of 256 used
Entering add_active_range(0, 256, 491472) 1 entries of 256 used
end_pfn_map = 1048576
DMI 2.4 present.
ACPI: RSDP 000F7D30, 0024 (r2 HP)
ACPI: XSDT 77FE57B4, 0054 (r1 HP 0944  6070620 HP  1)
ACPI: FACP 77FE5684, 00F4 (r4 HP 09443 HP  1)
ACPI: DSDT 77FE58DC, EE7A (r1 HPSB4001 MSFT  10E)
ACPI: FACS 77FF7E80, 0040
ACPI: APIC 77FE5808, 0062 (r1 HP 09441 HP  1)
ACPI: MCFG 77FE586C, 003C (r1 HP 09441 HP  1)
ACPI: TCPA 77FE58A8, 0032 (r2 HP 09441 HP  1)
ACPI: SSDT 77FF4756, 0059 (r1 HP   HPQNLP1 MSFT  10E)
ACPI: SSDT 77FF47AF, 0206 (r1 HP PSSTBLID1 HP  1)
Entering add_active_range(0, 0, 159) 0 entries of 256 used
Entering add_active_range(0, 256, 491472) 1 entries of 256 used
No mptable found.
Zone PFN ranges:
  DMA             0 ->     4096
  DMA32        4096 ->  1048576
  Normal    1048576 ->  1048576
Movable zone start PFN for each node
early_node_map[2] active PFN ranges
    0:        0 ->      159
    0:      256 ->   491472
On node 0 totalpages: 491375
  DMA zone: 56 pages used for memmap
  DMA zone: 1442 pages reserved
  DMA zone: 2501 pages, LIFO batch:0
  DMA32 zone: 6663 pages used for memmap
  DMA32 zone: 480713 pages, LIFO batch:31
  Normal zone: 0 pages used for memmap
  Movable zone: 0 pages used for memmap
ATI board detected. Disabling timer routing over 8254.
ACPI: PM-Timer IO Port: 0x8008
ACPI: Local APIC address 0xfee0
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
Processor #0 (Bootup-CPU)
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
Processor #1
ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
ACPI: IOAPIC (id[0x02] address[0xfec0] gsi_base[0])
IOAPIC[0]: apic_id 2, address 0xfec0, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 21 low level)
ACPI: IRQ0 used by override.
ACPI: IRQ2 used by override.
Setting APIC routing to flat
Using ACPI (MADT) for SMP configuration information
mapped APIC to ff5fb000 (fee0)
mapped IOAPIC to ff5fa000 (fec0)
swsusp: Registered nosave memory region: 0009f000 - 000a
swsusp: Registered nosave memory region: 000a - 000e
swsusp: Registered nosave memory region: 000e - 0010
Allocating PCI resources starting at 8800 (gap: 8000:6000)
SMP: Allowing 2 CPUs, 0 hotplug CPUs
PERCPU: Allocating 47320 bytes of per cpu data
Built 1 zonelists in Zone order.  Total pages: 483214
Kernel command line: root=/dev/sda3 vga=792 resume=/dev/sda1 apicmaintimer apic=verbose 2
Initializing CPU#0
PID hash table entries: 4096 (order: 12, 32768 bytes)
Extended CMOS year: 2000
Marking TSC unstable due to TSCs unsynchronized
time.c: Detected 1995.108 MHz processor.
Console: colour dummy device 80x25
console [tty0] enabled
Dentry cache hash table entries: 262144 (order: 9, 2097152 bytes)
Inode-cache hash table entries: 131072 

Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents

2007-09-24 Thread Thomas Gleixner
On Mon, 2007-09-24 at 17:18 +0200, Rafael J. Wysocki wrote:
> > > Well, "noacpi" seems to be a synonym for "pci=noacpi".
> > > 
> > > Anyway, it causes acpi_disable_pci() to be executed, which according to
> > > Documentation/kernel-parameters.txt means "Do not use ACPI for IRQ routing or
> > > for PCI scanning" (it works like this on x86_64 too, although the doc says it's
> > > x86_32-specific).
> > 
> > Hrm. The local apic timer calibration does not use anything which is
> > related to interrupts, but if we use the local APIC timer we switch off
> > PIT.
> > 
> > Can you boot Linus latest (w/o hrt patches) and add "apicmaintimer" to
> > the kernel command line please ?
> 
> Works, dmesg attached.

/me scratches head

We know, that
- disabling local apic timers work
- local apic timers (which turn off PIT) work. when noacpiFSCKEDPARSING
is given on the kernel command line.

I have no clue, what might be the difference of noacpiFSCKEDPARSING. The
boot log is not giving any hint at all.

acpi_disable_pci() sets acpi_pci_disabled and acpi_noirq to 1.

What happens, if you set acpi=noirq instead ?
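
A simplified model of the flags involved, based only on the statement above
that acpi_disable_pci() sets both acpi_pci_disabled and acpi_noirq. That
acpi=noirq sets acpi_noirq alone is an assumption of this sketch; the struct
and the model_* functions are hypothetical and are not kernel code.

```c
#include <string.h>

/* Simplified stand-ins for the two flags named in the thread. */
struct acpi_flags {
    int acpi_pci_disabled;
    int acpi_noirq;
};

/* Models acpi_disable_pci() as described above: both flags are set to 1. */
static void model_acpi_disable_pci(struct acpi_flags *f)
{
    f->acpi_pci_disabled = 1;
    f->acpi_noirq = 1;
}

/* Assumption of this sketch: acpi=noirq sets only acpi_noirq. */
static void model_acpi_noirq(struct acpi_flags *f)
{
    f->acpi_noirq = 1;
}

/* Apply one command-line option to the model; other options are ignored. */
static void model_apply(struct acpi_flags *f, const char *opt)
{
    if (strcmp(opt, "pci=noacpi") == 0)
        model_acpi_disable_pci(f);
    else if (strcmp(opt, "acpi=noirq") == 0)
        model_acpi_noirq(f);
}
```

In this model the only difference between the two options is
acpi_pci_disabled, so a boot with acpi=noirq would separate the effect of
IRQ routing from the effect of PCI scanning.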

tglx


Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents

2007-09-24 Thread Rafael J. Wysocki
On Monday, 24 September 2007 18:46, Thomas Gleixner wrote:
> On Mon, 2007-09-24 at 17:18 +0200, Rafael J. Wysocki wrote:
> > > > Well, "noacpi" seems to be a synonym for "pci=noacpi".
> > > > 
> > > > Anyway, it causes acpi_disable_pci() to be executed, which according to
> > > > Documentation/kernel-parameters.txt means "Do not use ACPI for IRQ routing or
> > > > for PCI scanning" (it works like this on x86_64 too, although the doc says it's
> > > > x86_32-specific).
> > > 
> > > Hrm. The local apic timer calibration does not use anything which is
> > > related to interrupts, but if we use the local APIC timer we switch off
> > > PIT.
> > > 
> > > Can you boot Linus latest (w/o hrt patches) and add "apicmaintimer" to
> > > the kernel command line please ?
> > 
> > Works, dmesg attached.
> 
> /me scratches head

Retested.

> We know, that
> - disabling local apic timers work

This works reproducibly across the board.

> - local apic timers (which turn off PIT) work. when noacpiFSCKEDPARSING

This stopped working, although it evidently worked yesterday (wtf?).

There seems to be a history effect in the box, to make things more
interesting.

> is given on the kernel command line.
> 
> I have no clue, what might be the difference of noacpiFSCKEDPARSING. The
> boot log is not giving any hint at all.
> 
> acpi_disable_pci() sets acpi_pci_disabled and acpi_noirq to 1.
> 
> What happens, if you set acpi=noirq instead ?

That obviously doesn't help.

I think the only solid data point so far is that noapictimer makes the box
boot.

Greetings,
Rafael


Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents

2007-09-24 Thread Thomas Gleixner
On Mon, 2007-09-24 at 21:11 +0200, Rafael J. Wysocki wrote:
> > /me scratches head
> 
> Retested.
> 
> > We know, that
> > - disabling local apic timers work
> 
> This works reproducibly across the board.

Ok

> > - local apic timers (which turn off PIT) work. when noacpiFSCKEDPARSING
> 
> This stopped working, although it evidently worked yesterday (wtf?).
> 
> There seems to be a history effect in the box, to make things more
> interesting.

Did you connect this box to Andrews VAIO during KS ?

> I think the only solid data point so far is that noapictimer makes the box
> boot.

Ok. Can you add nmi_watchdog=1 to the command line please. This runs
through the calibration of APIC, but registers it as a dummy clock
source (the PIT must run to make the watchdog work).

If it boots, please provide the output of /proc/timer_list
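
A small sketch for pulling the device lines out of that file. It assumes
that /proc/timer_list labels devices with lines beginning "Tick Device:" and
"Clock Event Device:"; that is an assumption about the output format and
should be checked against the actual file. is_device_line() and
print_device_lines() are hypothetical helpers.

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if the line starts with one of the assumed device labels. */
static int is_device_line(const char *line)
{
    return strncmp(line, "Tick Device:", 12) == 0 ||
           strncmp(line, "Clock Event Device:", 19) == 0;
}

/* Copy the matching lines from in to out; returns how many were copied. */
static int print_device_lines(FILE *in, FILE *out)
{
    char line[512];
    int n = 0;

    while (fgets(line, sizeof line, in)) {
        if (is_device_line(line)) {
            fputs(line, out);
            n++;
        }
    }
    return n;
}
```

To use it, open /proc/timer_list with fopen() and pass the stream together
with stdout to print_device_lines().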

Thanks, 

tglx




Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents

2007-09-23 Thread Rafael J. Wysocki
On Sunday, 23 September 2007 21:59, Thomas Gleixner wrote:
> On Sun, 2007-09-23 at 22:08 +0200, Rafael J. Wysocki wrote:
> > > > Since the boot fails very early, before any messages reach the (VGA) 
> > > > console,
> > > > I have no idea what to do next, except for digging in the code.
> > > 
> > > Ok, lets track it down. Is there any difference when you add:
> > > 
> > > nohz=off
> > > highres=off
> > > noapictimer
> > > 
> > > or any combinations of the above to the kernel command line ?
> > 
> > First, for now, I build all kernels with NO_HZ and HIGH_RES_TIMERS unset
> > (.config for 2.6.23-rc6-mm1 is attached).
> > 
> > Second, noacpitimer added to the command line makes all of the kernels, up 
> > to
> > and including 2.6.23-rc6-mm1, boot (this seems to be 100% reproducible).
> 
> That's valuable information. Can you please provide a boot log of one of
> those with an additional "apic=verbose" on the command line ?

Attached is the dmesg output from the 2.6.23-rc6 kernel with the patchset:

http://tglx.de/projects/hrtimers/2.6.23-rc4/patch-2.6.23-rc4-hrt1.patches.tar.bz2

applied.  I also have the 2.6.23-rc6-mm1 dmesg output ready, but there's some
-mm-specific noise in it.  Please let me know if you want it, though.

Greetings,
Rafael
Linux version 2.6.23-rc6-hrt ([EMAIL PROTECTED]) (gcc version 4.1.2 20061115 (prerelease) (SUSE Linux)) #1 SMP Sat Sep 22 22:38:18 CEST 2007
Command line: root=/dev/sda3 vga=792 resume=/dev/sda1 noacpitimer apic=verbose 2
BIOS-provided physical RAM map:
 BIOS-e820:  - 0009fc00 (usable)
 BIOS-e820: 0009fc00 - 000a (reserved)
 BIOS-e820: 000e - 0010 (reserved)
 BIOS-e820: 0010 - 77fd (usable)
 BIOS-e820: 77fd - 77fe5600 (reserved)
 BIOS-e820: 77fe5600 - 77ff8000 (ACPI NVS)
 BIOS-e820: 77ff8000 - 8000 (reserved)
 BIOS-e820: e000 - f000 (reserved)
 BIOS-e820: fec0 - fec02000 (reserved)
 BIOS-e820: ffbc - ffcc (reserved)
 BIOS-e820: fff0 - 0001 (reserved)
Entering add_active_range(0, 0, 159) 0 entries of 256 used
Entering add_active_range(0, 256, 491472) 1 entries of 256 used
end_pfn_map = 1048576
DMI 2.4 present.
ACPI: RSDP 000F7D30, 0024 (r2 HP)
ACPI: XSDT 77FE57B4, 0054 (r1 HP 0944  6070620 HP  1)
ACPI: FACP 77FE5684, 00F4 (r4 HP 09443 HP  1)
ACPI: DSDT 77FE58DC, EE7A (r1 HPSB4001 MSFT  10E)
ACPI: FACS 77FF7E80, 0040
ACPI: APIC 77FE5808, 0062 (r1 HP 09441 HP  1)
ACPI: MCFG 77FE586C, 003C (r1 HP 09441 HP  1)
ACPI: TCPA 77FE58A8, 0032 (r2 HP 09441 HP  1)
ACPI: SSDT 77FF4756, 0059 (r1 HP   HPQNLP1 MSFT  10E)
ACPI: SSDT 77FF47AF, 0206 (r1 HP PSSTBLID1 HP  1)
Entering add_active_range(0, 0, 159) 0 entries of 256 used
Entering add_active_range(0, 256, 491472) 1 entries of 256 used
No mptable found.
Zone PFN ranges:
  DMA             0 ->     4096
  DMA32        4096 ->  1048576
  Normal    1048576 ->  1048576
Movable zone start PFN for each node
early_node_map[2] active PFN ranges
    0:        0 ->      159
    0:      256 ->   491472
On node 0 totalpages: 491375
  DMA zone: 56 pages used for memmap
  DMA zone: 1446 pages reserved
  DMA zone: 2497 pages, LIFO batch:0
  DMA32 zone: 6663 pages used for memmap
  DMA32 zone: 480713 pages, LIFO batch:31
  Normal zone: 0 pages used for memmap
  Movable zone: 0 pages used for memmap
ATI board detected. Disabling timer routing over 8254.
ACPI: PM-Timer IO Port: 0x8008
ACPI: Local APIC address 0xfee0
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
Processor #0 (Bootup-CPU)
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
Processor #1
ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
ACPI: IOAPIC (id[0x02] address[0xfec0] gsi_base[0])
IOAPIC[0]: apic_id 2, address 0xfec0, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 21 low level)
ACPI: IRQ0 used by override.
ACPI: IRQ2 used by override.
Setting APIC routing to flat
Using ACPI (MADT) for SMP configuration information
mapped APIC to ff5fb000 (fee0)
mapped IOAPIC to ff5fa000 (fec0)
swsusp: Registered nosave memory region: 000000000009f000 - 00000000000a0000
swsusp: Registered nosave memory region: 00000000000a0000 - 00000000000e0000
swsusp: Registered nosave memory region: 00000000000e0000 - 0000000000100000
Allocating PCI resources starting at 88000000 (gap: 80000000:60000000)
SMP: Allowing 2 CPUs, 0 hotplug CPUs
PERCPU: Allocating 47576 bytes of per cpu data
Built 1 zonelists in Zone order.  Total pages: 483210
Kernel command line: root=/dev/sda3 vga=792 resume=/dev/sda1 noacpitimer apic=verbose 2
Initializing CPU#0
PID hash table entries: 4096 (order: 12, 32768 bytes)
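
The page counts in the log above can be cross-checked against the two active PFN ranges. The following is only a reader's sketch, not part of the original posting: the numbers are copied from the log, while the helper name `pages_in` and the reading of each zone as "memmap + reserved + remaining pages" are assumptions made for illustration.

```python
# Numbers copied from the boot log; names are illustrative, not kernel symbols.
active_ranges = [(0, 159), (256, 491472)]   # early_node_map, as [start, end) PFNs
zone_bounds = {"DMA": (0, 4096), "DMA32": (4096, 1048576)}


def pages_in(lo, hi, ranges):
    """Count page frames in [lo, hi) that lie inside any active range."""
    return sum(max(0, min(hi, end) - max(lo, start)) for start, end in ranges)


total_pages = pages_in(0, 1048576, active_ranges)
dma_pages = pages_in(*zone_bounds["DMA"], active_ranges)
dma32_pages = pages_in(*zone_bounds["DMA32"], active_ranges)

print("node 0 total:", total_pages)   # compare with "On node 0 totalpages"
print("DMA zone:", dma_pages)         # compare with 56 + 1446 + 2497
print("DMA32 zone:", dma32_pages)     # compare with 6663 + 480713
```

Under these assumptions the per-zone figures add up to the reported totals, which suggests the memory map itself was parsed consistently on this boot.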

Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents

2007-09-23 Thread Thomas Gleixner
On Sun, 2007-09-23 at 22:08 +0200, Rafael J. Wysocki wrote:
> > > Since the boot fails very early, before any messages reach the (VGA) console,
> > > I have no idea what to do next, except for digging in the code.
> > 
> > Ok, let's track it down. Is there any difference when you add:
> > 
> > nohz=off
> > highres=off
> > noapictimer
> > 
> > or any combination of the above to the kernel command line?
> 
> First, for now, I build all kernels with NO_HZ and HIGH_RES_TIMERS unset
> (.config for 2.6.23-rc6-mm1 is attached).
> 
> Second, noacpitimer added to the command line makes all of the kernels, up to
> and including 2.6.23-rc6-mm1, boot (this seems to be 100% reproducible).

That's valuable information. Can you please provide a boot log of one of
those with an additional "apic=verbose" on the command line?

Thanks,

tglx


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents

2007-09-23 Thread Rafael J. Wysocki
On Sunday, 23 September 2007 21:10, Thomas Gleixner wrote:
> On Sun, 2007-09-23 at 12:57 +0200, Rafael J. Wysocki wrote:
> > Hi Thomas,
> > 
> > Unfortunately, my observation that the patch series:
> > 
> > http://tglx.de/projects/hrtimers/2.6.23-rc4/patch-2.6.23-rc4-hrt1.patches.tar.bz2
> > 
> > worked with 2.6.23-rc4 was wrong.  It _sometimes_ works, but usually doesn't
> > boot, just like 2.6.23-rc4-mm1, 2.6.23-rc6-mm1 and everything in between with
> > the above patch series applied.  I've also tried:
> > 
> > http://tglx.de/projects/hrtimers/2.6.23-rc5/patch-2.6.23-rc5-hrt1.patches.tar.bz2
> > http://tglx.de/projects/hrtimers/2.6.23-rc6/patch-2.6.23-rc6-hrt2.patch
> > 
> > with the same result.
> > 
> > The problematic patch is x86_64-convert-to-clockevents.patch .
> > 
> > Since the boot fails very early, before any messages reach the (VGA) console,
> > I have no idea what to do next, except for digging in the code.
> 
> Ok, let's track it down. Is there any difference when you add:
> 
> nohz=off
> highres=off
> noapictimer
> 
> or any combination of the above to the kernel command line?

First, for now, I build all kernels with NO_HZ and HIGH_RES_TIMERS unset
(.config for 2.6.23-rc6-mm1 is attached).

Second, noacpitimer added to the command line makes all of the kernels, up to
and including 2.6.23-rc6-mm1, boot (this seems to be 100% reproducible).

Greetings,
Rafael
#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.23-rc6-mm1
# Tue Sep 18 22:52:04 2007
#
CONFIG_X86_64=y
CONFIG_64BIT=y
CONFIG_X86=y
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_NONIRQ_WAKEUP=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_ZONE_DMA32=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_SEMAPHORE_SLEEPERS=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_QUICKLIST=y
CONFIG_NR_QUICK=2
CONFIG_RWSEM_GENERIC_SPINLOCK=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_X86_CMPXCHG=y
CONFIG_EARLY_PRINTK=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_ARCH_POPULATES_NODE_MAP=y
CONFIG_DMI=y
CONFIG_AUDIT_ARCH=y
CONFIG_GENERIC_BUG=y
# CONFIG_ARCH_HAS_ILOG2_U32 is not set
# CONFIG_ARCH_HAS_ILOG2_U64 is not set
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"

#
# General setup
#
CONFIG_EXPERIMENTAL=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_LOCALVERSION=""
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_BSD_PROCESS_ACCT_V3=y
CONFIG_TASKSTATS=y
CONFIG_TASK_DELAY_ACCT=y
# CONFIG_TASK_XACCT is not set
# CONFIG_USER_NS is not set
CONFIG_AUDIT=y
CONFIG_AUDITSYSCALL=y
CONFIG_AUDIT_TREE=y
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=18
# CONFIG_CONTAINERS is not set
CONFIG_SYSFS_DEPRECATED=y
# CONFIG_RELAY is not set
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_SYSCTL=y
# CONFIG_EMBEDDED is not set
CONFIG_UID16=y
CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
# CONFIG_KALLSYMS_EXTRA_PASS is not set
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_ANON_INODES=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_SLAB=y
# CONFIG_SLUB is not set
# CONFIG_SLOB is not set
CONFIG_PROC_PAGE_MONITOR=y
CONFIG_PROC_KPAGEMAP=y
CONFIG_RT_MUTEXES=y
# CONFIG_TINY_SHMEM is not set
CONFIG_BASE_SMALL=0
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
CONFIG_MODULE_FORCE_UNLOAD=y
CONFIG_MODVERSIONS=y
CONFIG_MODULE_SRCVERSION_ALL=y
CONFIG_KMOD=y
CONFIG_STOP_MACHINE=y
CONFIG_BLOCK=y
# CONFIG_BLK_DEV_IO_TRACE is not set
# CONFIG_BLK_DEV_BSG is not set

#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_AS=y
CONFIG_IOSCHED_DEADLINE=y
CONFIG_IOSCHED_CFQ=y
# CONFIG_DEFAULT_AS is not set
# CONFIG_DEFAULT_DEADLINE is not set
CONFIG_DEFAULT_CFQ=y
# CONFIG_DEFAULT_NOOP is not set
CONFIG_DEFAULT_IOSCHED="cfq"

#
# Processor type and features
#
# CONFIG_TICK_ONESHOT is not set
# CONFIG_NO_HZ is not set
# CONFIG_HIGH_RES_TIMERS is not set
CONFIG_GENERIC_CLOCKEVENTS_BUILD=y
CONFIG_X86_PC=y
# CONFIG_X86_VSMP is not set
CONFIG_MK8=y
# CONFIG_MPSC is not set
# CONFIG_MCORE2 is not set
# CONFIG_GENERIC_CPU is not set
CONFIG_X86_L1_CACHE_BYTES=64
CONFIG_X86_L1_CACHE_SHIFT=6
CONFIG_X86_INTERNODE_CACHE_BYTES=64
CONFIG_X86_TSC=y
CONFIG_X86_GOOD_APIC=y
# CONFIG_MICROCODE is not set
CONFIG_X86_MSR=m
CONFIG_X86_CPUID=m
CONFIG_X86_IO_APIC=y
CONFIG_X86_LOCAL_APIC=y
CONFIG_MTRR=y
CONFIG_SMP=y
# CONFIG_SCHED_SMT is not set
CONFIG_SCHED_MC=y
# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set
CONFIG_PREEMPT_BKL=y
# CONFIG_NUMA is not set
CONFIG_ARCH_SPARSEMEM_ENABLE=y
CONFIG_ARCH_FLATMEM_ENABLE=y
CONFIG_SELECT_MEMORY_MODEL=y
CONFIG_FLATMEM_MANUAL=y
# CONFIG_DISCONTIGMEM_MANUAL is not set
# CONFIG_SPARSEMEM_MANUAL is not set
CONFIG_FLATMEM=y
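
The statement that NO_HZ and HIGH_RES_TIMERS are unset can be checked mechanically against a .config like the one attached. The snippet below is a minimal sketch added for illustration, assuming the usual two line shapes seen in the attachment (`CONFIG_X=value` and `# CONFIG_X is not set`); `parse_kconfig` is a hypothetical helper, not a kernel tool, and the sample is a handful of lines copied from the attachment.

```python
def parse_kconfig(text):
    """Map option name -> value; options marked 'is not set' map to None."""
    suffix = " is not set"
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(suffix):
            # Drop the leading "# " and the trailing " is not set".
            options[line[2:-len(suffix)]] = None
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value.strip('"')
    return options


sample = """\
# CONFIG_TICK_ONESHOT is not set
# CONFIG_NO_HZ is not set
# CONFIG_HIGH_RES_TIMERS is not set
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_SMP=y
"""

config = parse_kconfig(sample)
wanted_unset = ("CONFIG_NO_HZ", "CONFIG_HIGH_RES_TIMERS")
unset = [name for name in wanted_unset if config.get(name, "missing") is None]
print(unset)
```

In the sample both options come back as unset while CONFIG_SMP=y stays enabled, matching the description of an SMP kernel built without dynamic ticks or high-resolution timers.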

Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents

2007-09-23 Thread Thomas Gleixner
On Sun, 2007-09-23 at 12:57 +0200, Rafael J. Wysocki wrote:
> Hi Thomas,
> 
> Unfortunately, my observation that the patch series:
> 
> http://tglx.de/projects/hrtimers/2.6.23-rc4/patch-2.6.23-rc4-hrt1.patches.tar.bz2
> 
> worked with 2.6.23-rc4 was wrong.  It _sometimes_ works, but usually doesn't
> boot, just like 2.6.23-rc4-mm1, 2.6.23-rc6-mm1 and everything in between with
> the above patch series applied.  I've also tried:
> 
> http://tglx.de/projects/hrtimers/2.6.23-rc5/patch-2.6.23-rc5-hrt1.patches.tar.bz2
> http://tglx.de/projects/hrtimers/2.6.23-rc6/patch-2.6.23-rc6-hrt2.patch
> 
> with the same result.
> 
> The problematic patch is x86_64-convert-to-clockevents.patch .
> 
> Since the boot fails very early, before any messages reach the (VGA) console,
> I have no idea what to do next, except for digging in the code.

Ok, let's track it down. Is there any difference when you add:

nohz=off
highres=off
noapictimer

or any combination of the above to the kernel command line?

tglx
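
For readers who want to run through the combinations systematically, the seven non-empty subsets of the three options can be listed with a few lines of Python. This is only a sketch added for illustration: the base command line is abbreviated from the boot log posted elsewhere in this thread, and `command_lines` is a hypothetical helper name.

```python
from itertools import combinations

# The three options suggested above.
options = ["nohz=off", "highres=off", "noapictimer"]
# Abbreviated from the reporter's boot log; adjust for your own machine.
base = "root=/dev/sda3 vga=792 resume=/dev/sda1"


def command_lines(base, options):
    """Yield the base command line extended by every non-empty option subset."""
    for size in range(1, len(options) + 1):
        for subset in combinations(options, size):
            yield " ".join([base, *subset])


lines = list(command_lines(base, options))
for line in lines:
    print(line)
```

Booting each resulting line once and noting which ones come up gives a small truth table that narrows down which timer feature is involved.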




2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents

2007-09-23 Thread Rafael J. Wysocki
Hi Thomas,

Unfortunately, my observation that the patch series:

http://tglx.de/projects/hrtimers/2.6.23-rc4/patch-2.6.23-rc4-hrt1.patches.tar.bz2

worked with 2.6.23-rc4 was wrong.  It _sometimes_ works, but usually doesn't
boot, just like 2.6.23-rc4-mm1, 2.6.23-rc6-mm1 and everything in between with
the above patch series applied.  I've also tried:

http://tglx.de/projects/hrtimers/2.6.23-rc5/patch-2.6.23-rc5-hrt1.patches.tar.bz2
http://tglx.de/projects/hrtimers/2.6.23-rc6/patch-2.6.23-rc6-hrt2.patch

with the same result.

The problematic patch is x86_64-convert-to-clockevents.patch .

Since the boot fails very early, before any messages reach the (VGA) console,
I have no idea what to do next, except for digging in the code.

Greetings,
Rafael


Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents

2007-09-23 Thread Rafael J. Wysocki
On Sunday, 23 September 2007 21:59, Thomas Gleixner wrote:
> On Sun, 2007-09-23 at 22:08 +0200, Rafael J. Wysocki wrote:
> > > > Since the boot fails very early, before any messages reach the (VGA) console,
> > > > I have no idea what to do next, except for digging in the code.
> > > 
> > > Ok, let's track it down. Is there any difference when you add:
> > > 
> > > nohz=off
> > > highres=off
> > > noapictimer
> > > 
> > > or any combination of the above to the kernel command line?
> > 
> > First, for now, I build all kernels with NO_HZ and HIGH_RES_TIMERS unset
> > (.config for 2.6.23-rc6-mm1 is attached).
> > 
> > Second, noacpitimer added to the command line makes all of the kernels, up to
> > and including 2.6.23-rc6-mm1, boot (this seems to be 100% reproducible).
> 
> That's valuable information. Can you please provide a boot log of one of
> those with an additional "apic=verbose" on the command line?

Attached is the dmesg output from the 2.6.23-rc6 kernel with the patchset:

http://tglx.de/projects/hrtimers/2.6.23-rc4/patch-2.6.23-rc4-hrt1.patches.tar.bz2

applied.  I also have the 2.6.23-rc6-mm1 dmesg output ready, but there's some
-mm-specific noise in it.  Please let me know if you want it, though.

Greetings,
Rafael
Linux version 2.6.23-rc6-hrt ([EMAIL PROTECTED]) (gcc version 4.1.2 20061115 (prerelease) (SUSE Linux)) #1 SMP Sat Sep 22 22:38:18 CEST 2007
Command line: root=/dev/sda3 vga=792 resume=/dev/sda1 noacpitimer apic=verbose 2
BIOS-provided physical RAM map:
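 BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
 BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 0000000077fd0000 (usable)
 BIOS-e820: 0000000077fd0000 - 0000000077fe5600 (reserved)
 BIOS-e820: 0000000077fe5600 - 0000000077ff8000 (ACPI NVS)
 BIOS-e820: 0000000077ff8000 - 0000000080000000 (reserved)
 BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
 BIOS-e820: 00000000fec00000 - 00000000fec02000 (reserved)
 BIOS-e820: 00000000ffbc0000 - 00000000ffcc0000 (reserved)
 BIOS-e820: 00000000fff00000 - 0000000100000000 (reserved)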
Entering add_active_range(0, 0, 159) 0 entries of 256 used
Entering add_active_range(0, 256, 491472) 1 entries of 256 used
end_pfn_map = 1048576
DMI 2.4 present.
ACPI: RSDP 000F7D30, 0024 (r2 HP)
ACPI: XSDT 77FE57B4, 0054 (r1 HP 0944  6070620 HP  1)
ACPI: FACP 77FE5684, 00F4 (r4 HP 09443 HP  1)
ACPI: DSDT 77FE58DC, EE7A (r1 HPSB4001 MSFT  10E)
ACPI: FACS 77FF7E80, 0040
ACPI: APIC 77FE5808, 0062 (r1 HP 09441 HP  1)
ACPI: MCFG 77FE586C, 003C (r1 HP 09441 HP  1)
ACPI: TCPA 77FE58A8, 0032 (r2 HP 09441 HP  1)
ACPI: SSDT 77FF4756, 0059 (r1 HP   HPQNLP1 MSFT  10E)
ACPI: SSDT 77FF47AF, 0206 (r1 HP PSSTBLID1 HP  1)
Entering add_active_range(0, 0, 159) 0 entries of 256 used
Entering add_active_range(0, 256, 491472) 1 entries of 256 used
No mptable found.
Zone PFN ranges:
  DMA             0 ->     4096
  DMA32        4096 ->  1048576
  Normal    1048576 ->  1048576
Movable zone start PFN for each node
early_node_map[2] active PFN ranges
    0:        0 ->      159
    0:      256 ->   491472
On node 0 totalpages: 491375
  DMA zone: 56 pages used for memmap
  DMA zone: 1446 pages reserved
  DMA zone: 2497 pages, LIFO batch:0
  DMA32 zone: 6663 pages used for memmap
  DMA32 zone: 480713 pages, LIFO batch:31
  Normal zone: 0 pages used for memmap
  Movable zone: 0 pages used for memmap
ATI board detected. Disabling timer routing over 8254.
ACPI: PM-Timer IO Port: 0x8008
ACPI: Local APIC address 0xfee00000
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
Processor #0 (Bootup-CPU)
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
Processor #1
ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 2, address 0xfec00000, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 21 low level)
ACPI: IRQ0 used by override.
ACPI: IRQ2 used by override.
Setting APIC routing to flat
Using ACPI (MADT) for SMP configuration information
mapped APIC to ff5fb000 (fee00000)
mapped IOAPIC to ff5fa000 (fec00000)
swsusp: Registered nosave memory region: 000000000009f000 - 00000000000a0000
swsusp: Registered nosave memory region: 00000000000a0000 - 00000000000e0000
swsusp: Registered nosave memory region: 00000000000e0000 - 0000000000100000
Allocating PCI resources starting at 88000000 (gap: 80000000:60000000)
SMP: Allowing 2 CPUs, 0 hotplug CPUs
PERCPU: Allocating 47576 bytes of per cpu data
Built 1 zonelists in Zone order.  Total pages: 483210
Kernel command line: root=/dev/sda3 vga=792 resume=/dev/sda1 noacpitimer apic=verbose 2
Initializing CPU#0
PID hash table entries: 4096 (order: 12, 32768 bytes)
Extended CMOS year: 2000
TSC calibrated against