Re: [REGRESSION from 2.6.23-rc8] (was: Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents)
On Sunday 30 September 2007 16:06:59 Thomas Gleixner wrote:
> On Sun, 30 Sep 2007, Andi Kleen wrote:
>
> >>> OK, this explains 2) and 3). I just looked into the code and the logic
> >>> vs. noapictimer on SMP is completely broken.
> >
> > noapictimer really doesn't make any sense on non SMP imho with the old
> > timer architecture. That is why I never bothered to implement it.
> > It's purely a UP hack.
>
> It does not matter whether it makes sense to you or not. It is a command
> line option which bricks systems.

A lot of command line options do that -- if not they would be usually default
or automatically used by the kernel.

> There is neither an explanation in
> Documentation/kernel-parameters.txt nor a check in the code, which
> disables this completely.

Fair enough. I can add a warning in the Documentation.

> It makes a lot of sense even with the existing architecture. Trouble
> shooting a box, where the local apic timer does not work correctly is not
> an UP only requirement.

It should not be needed with current systems as far as I know (see my previous
mail)

> I understand the code quite well. I'm just surprised from time to time by
> interesting hacks in the so clean x86_64 tree.

No hack in this area as far as I know.

> > [1] Or let's call it "I trust all my time to the CPU" and no more
> > southbridge
> > aka put all eggs in one basket. Given the trends in CPU power saving that
> > is a quite dangerous strategy.
>
> No, it's not dangerous.

It definitely caused a lot of problems in the single socket multi core world;
but yes you probably worked around all of them that I'm aware of currently.

What I just objected to was that you complained that the current x86-64
time code -- which works much more conservatively and thus needs fewer
workarounds -- doesn't have all of them. You basically tried to apply the
special debugging strategies for clockevents to the old code and then
complained that they don't work.

> We spent quite some time to make the clock events
> layer flexible enough to handle the current problems and the design allows
> to add more infrastructure when necessary.

Grand words for relatively simple changes.

Anyways as far as I know even for hypothetical future C2+ capable multi socket
systems the current x86-64 time code should work -- it should automatically
select broadcasting. The only thing it relies on is that there are no multi
socket C1E systems with broken APIC timers. That could only be future CPUs
anyways, and I haven't seen any indication that any of the upcoming CPUs will
have such a broken C1.

> The maybe new (mis)features of
> upcoming CPUs need to be addressed with or without clock events and they
> need to be done carefully and not by random hacks.

Not sure what random hacks you refer to.

-Andi
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [REGRESSION from 2.6.23-rc8] (was: Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents)
On Sun, 30 Sep 2007, Andi Kleen wrote:

>>> OK, this explains 2) and 3). I just looked into the code and the logic
>>> vs. noapictimer on SMP is completely broken.
>
> noapictimer really doesn't make any sense on non SMP imho with the old
> timer architecture. That is why I never bothered to implement it.
> It's purely a UP hack.

It does not matter whether it makes sense to you or not. It is a command
line option which bricks systems. There is neither an explanation in
Documentation/kernel-parameters.txt nor a check in the code, which
disables this completely.

It makes a lot of sense even with the existing architecture. Trouble
shooting a box, where the local apic timer does not work correctly is not
an UP only requirement.

Yes, it is a hack, a _bad_ hack.

>> ..and thanks for the explanation.
>>
>> Thanks for finding it so quickly guys. Sounds like this will be fixed
>> properly in 2.6.24 with the x86 merge (which hopefully brings in the hrt
>> patch too)
>
> There is nothing really to fix currently. Clockevents changes behaviour
> majorly (always using APIC timers without irq 0 backups[1]) and that causes
> problems that need new workarounds and new fixes (surprise surprise!)
>
> That merge would probably fix a few more such "Thomas doesn't understand the
> code" bugs I guess because he hacks much more on i386 than x86-64; but if the
> overall result will be really better is a totally different question.

I understand the code quite well. I'm just surprised from time to time by
interesting hacks in the so clean x86_64 tree.

> [1] Or let's call it "I trust all my time to the CPU" and no more southbridge
> aka put all eggs in one basket. Given the trends in CPU power saving that
> is a quite dangerous strategy.

No, it's not dangerous. We spent quite some time to make the clock events
layer flexible enough to handle the current problems and the design allows
to add more infrastructure when necessary.

The maybe new (mis)features of upcoming CPUs need to be addressed with or
without clock events and they need to be done carefully and not by random
hacks.

	tglx
Re: [REGRESSION from 2.6.23-rc8] (was: Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents)
> PIT keeps jiffies (and the system) running, but the local APIC timer
> interrupts can get out of sync due to this C1E effect.

The way C1E works on AMD is that even when one core is woken up by the PIT
the APIC timer resumes on the other core on the socket too because the deep
power saving that breaks the APIC timer is only active with both cores idle.

And on true multi socket systems there is currently no such deep C1E --
apic timer should always work.

At least that is how it was supposed to work and while I admit I haven't
read every mail in this endless thread closely I didn't think Rafael's box
contradicted that.

> I don't think this is a critical problem, but it is wrong nevertheless.
>
> I think it's safe to revert the C1E patch

Yes the C1E patch is completely redundant on a non clockevents kernel.

> and postpone the fix to the
> clock events conversion.

Well, a change is only needed together with clockevent's "apicrunsmaintimer"
default; but not on any non clockevents kernel.

-Andi
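The C1E behaviour described above can be sketched as a toy model in C. This is only an illustration under stated assumptions: `core_state` and `apic_timer_stopped` are hypothetical names, not kernel identifiers, and the model encodes nothing beyond the claim that the deep power saving (and with it the stalled local APIC timer) only applies while every core on the socket is idle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-core state for one socket; not a kernel structure. */
enum core_state { CORE_RUNNING, CORE_IDLE };

/*
 * Per the description above, the deep power saving that stops the
 * local APIC timer only engages once every core on the socket is idle.
 * A single woken core (e.g. by the PIT) keeps all timers running.
 */
static bool apic_timer_stopped(const enum core_state *cores, size_t n)
{
	assert(cores != NULL);

	for (size_t i = 0; i < n; i++)
		if (cores[i] != CORE_IDLE)
			return false;	/* one busy core keeps the timers alive */

	return n > 0;	/* all cores idle: timers may stall */
}
```

With two cores, the function reports a stalled timer only for the idle/idle combination; waking either core makes it return false for the whole socket.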
Re: [REGRESSION from 2.6.23-rc8] (was: Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents)
> > OK, this explains 2) and 3). I just looked into the code and the logic
> > vs. noapictimer on SMP is completely broken.

noapictimer really doesn't make any sense on non SMP imho with the old
timer architecture. That is why I never bothered to implement it.
It's purely a UP hack.

> ..and thanks for the explanation.
>
> Thanks for finding it so quickly guys. Sounds like this will be fixed
> properly in 2.6.24 with the x86 merge (which hopefully brings in the hrt
> patch too)

There is nothing really to fix currently. Clockevents changes behaviour
majorly (always using APIC timers without irq 0 backups[1]) and that causes
problems that need new workarounds and new fixes (surprise surprise!)

That merge would probably fix a few more such "Thomas doesn't understand the
code" bugs I guess because he hacks much more on i386 than x86-64; but if the
overall result will be really better is a totally different question.

-Andi

[1] Or let's call it "I trust all my time to the CPU" and no more southbridge
aka put all eggs in one basket. Given the trends in CPU power saving that
is a quite dangerous strategy.
Re: [REGRESSION from 2.6.23-rc8] (was: Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents)
On Thursday, 27 September 2007 01:21, Thomas Gleixner wrote:
> On Thu, 2007-09-27 at 01:30 +0200, Rafael J. Wysocki wrote:
> > > > Tested for a couple of times with each kernel, the results seem to be
> > > > reproducible 100% of the time.
> > >
> > > Thanks for going through this debug marathon.
> >
> > No big deal. I'm glad that you've found what's up.
> >
> > Well, we still have the "CPU hotplug during suspend w/ the hrt patch" problem
> > to debug ... ;-)
>
> Yeah. Knowing the actual line of code where it breaks might be helpful.

Instead, I have a fix (appended, against 2.6.23-rc8-mm2). :-)

Next, I'm going to enable NO_HZ and HIGH_RES_TIMERS and see what happens. ;-)

Greetings,
Rafael

---
From: Rafael J. Wysocki <[EMAIL PROTECTED]>

Fix CPU hotplug breakage on HP nx6325 and similar boxes caused by the reference
to disable_apic_timer (labeled as __initdata) from the CPU initialization code.

Signed-off-by: Rafael J. Wysocki <[EMAIL PROTECTED]>
---
 arch/x86_64/kernel/apic.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6.23-rc8-mm2/arch/x86_64/kernel/apic.c
===================================================================
--- linux-2.6.23-rc8-mm2.orig/arch/x86_64/kernel/apic.c
+++ linux-2.6.23-rc8-mm2/arch/x86_64/kernel/apic.c
@@ -42,7 +42,7 @@ int apic_verbosity;
 
 static int apic_calibrate_pmtmr __initdata;
 
-int disable_apic_timer __initdata;
+int disable_apic_timer __cpuinitdata;
 
 /* Local APIC timer works in C2? */
 int local_apic_timer_c2_ok;
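Why the one-word change in the patch above matters can be sketched with a small userspace model. This is a simplified illustration, not kernel code: `toy_kernel`, `section_is_live` and the `SEC_*` names are hypothetical, and the model assumes that plain init data is released after boot while CPU-init data stays available as long as CPUs can still be brought online (as happens on resume from suspend).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the kernel's init sections; all names are hypothetical. */
enum section { SEC_INITDATA, SEC_CPUINITDATA };

struct toy_kernel {
	bool boot_done;		/* init memory has been released */
	bool hotplug_enabled;	/* CPUs may still be brought up after boot */
};

/* Is a variable placed in the given section still safe to read right now? */
static bool section_is_live(const struct toy_kernel *k, enum section s)
{
	assert(k != NULL);

	if (!k->boot_done)
		return true;			/* everything is present during boot */
	if (s == SEC_CPUINITDATA)
		return k->hotplug_enabled;	/* kept while CPUs can come online */
	return false;				/* plain init data is gone after boot */
}
```

In this model, a CPU bring-up path that runs after boot and reads a variable in `SEC_INITDATA` touches memory that is no longer valid, which matches the paging fault seen when CPU 1 is re-enabled during resume; moving the variable to `SEC_CPUINITDATA` keeps it readable.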
Re: [REGRESSION from 2.6.23-rc8] (was: Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents)
On Thu, 2007-09-27 at 01:30 +0200, Rafael J. Wysocki wrote:
> > > Tested for a couple of times with each kernel, the results seem to be
> > > reproducible 100% of the time.
> >
> > Thanks for going through this debug marathon.
>
> No big deal. I'm glad that you've found what's up.
>
> Well, we still have the "CPU hotplug during suspend w/ the hrt patch" problem
> to debug ... ;-)

Yeah. Knowing the actual line of code where it breaks might be helpful.

	tglx
Re: [REGRESSION from 2.6.23-rc8] (was: Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents)
Thomas,

On Wednesday, 26 September 2007 23:34, Thomas Gleixner wrote:
> Rafael,
>
> On Wed, 2007-09-26 at 23:00 +0200, Rafael J. Wysocki wrote:
> > > > > First, with the "x86-64: Disable local APIC timer use on AMD systems with C1E"
> > > > > patch and my collection of suspend patches applied, the box doesn't boot
> > > > > (the suspend patches don't even touch the boot code, so they should be
> > > > > irrelevant here). However, it boots if patch-2.6.23-rc7-hrt1.patch (adjusted
> > > > > for 2.6.23-rc8) is applied in addition. Is this expected?
> > > >
> > > > No. That's odd. It is nothing else than adding "noapictimer" to the
> > > > kernel command line.
> > >
> > > Seems to be reproducible, though. I'll investigate further.
> >
> > So far, the results are the following:
> >
> > 1) current Linus' tree doesn't boot with any command line (regression)
> >
> >    [ Linus, please revert commit e66485d747505e9d960b864fc6c37f8b2afafaf0
> >
> >    x86-64: Disable local APIC timer use on AMD systems with C1E
> >
> >    It's not necessary for 2.6.23 and actually kills the box that it's
> >    supposed to fix. ]
> >
> > 2) 2.6.23-rc8 w/ the "x86-64: Disable local APIC timer use on AMD systems with C1E"
> >    patch applied behaves like the current -git
> >
> > 3) 2.6.23-rc8 w/o this patch doesn't boot with either "noapictimer" _or_
>
> OK, this explains 2) and 3). I just looked into the code and the logic
> vs. noapictimer on SMP is completely broken.
>
> On i386 the noapictimer option not only disables the local APIC timer,
> it also registers the CPUs for broadcasting via IPI on SMP systems.
>
> The x86_64 code uses the broadcast only when the local apic timer is
> active, i.e. "noapictimer" is not on the command line. This defeats the
> whole purpose of "noapictimer". It should be there to make boxen work,
> where the local APIC timer actually has a hardware problem, e.g. the
> nx6325.
>
> The current implementation of x86_64 only fixes the ACPI c-states
> related problem where the APIC timer stops in C3(2), nothing else.
>
> On nx6325 and other AMD X2 equipped systems which have the C1E enabled
> we run into the following:
>
> PIT keeps jiffies (and the system) running, but the local APIC timer
> interrupts can get out of sync due to this C1E effect.
>
> I don't think this is a critical problem, but it is wrong nevertheless.
>
> I think it's safe to revert the C1E patch and postpone the fix to the
> clock events conversion.
>
> > "apicmaintimer"
>
> on your box is not going to work. See the C1E patch. "apicmaintimer"
> switches off PIT and then waits for ever for the local APIC timer
> interrupts.
>
> > 4) 2.6.22 behaves like 2.6.23-rc8
>
> No surprise
>
> > 5) 2.6.23-rc8 with (adjusted) patch-2.6.23-rc7-hrt1.patch boots only with
> >    "noapictimer"
> >
> > 6) 2.6.23-rc8 with (adjusted) patch-2.6.23-rc7-hrt1.patch and with the
> >    "x86-64: Disable local APIC timer use on AMD systems with C1E" patch boots
> >    without any extra command line options
>
> That's consistent behaviour.
>
> > Tested for a couple of times with each kernel, the results seem to be
> > reproducible 100% of the time.
>
> Thanks for going through this debug marathon.

No big deal. I'm glad that you've found what's up.

Well, we still have the "CPU hotplug during suspend w/ the hrt patch" problem
to debug ... ;-)

Greetings,
Rafael
Re: [REGRESSION from 2.6.23-rc8] (was: Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents)
On Wed, 2007-09-26 at 15:22 -0700, Linus Torvalds wrote:
>
> On Wed, 26 Sep 2007, Thomas Gleixner wrote:
> > >
> > > 1) current Linus' tree doesn't boot with any command line (regression)
> > >
> > >    [ Linus, please revert commit e66485d747505e9d960b864fc6c37f8b2afafaf0
>
> Reverted.
>
> > OK, this explains 2) and 3). I just looked into the code and the logic
> > vs. noapictimer on SMP is completely broken.
>
> ..and thanks for the explanation.
>
> Thanks for finding it so quickly guys. Sounds like this will be fixed
> properly in 2.6.24 with the x86 merge (which hopefully brings in the hrt
> patch too)

It's even worse than I thought on the first check:

"noapictimer" on the command line of an SMP box prevents _ONLY_ the boot
CPU apic timer from being used. But the secondary CPU is still
unconditionally setting up the APIC timer and uses the non calibrated
variable calibration_result, which is of course 0, to setup the APIC
timer. Wreckage guaranteed.

	tglx
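The failure mode described in this message can be illustrated with a minimal userspace sketch. It is not the actual apic.c code: `toy_apic`, `boot_cpu_setup`, `secondary_setup_unchecked` and `secondary_setup_checked` are hypothetical names, and the model simply treats the calibration result as the initial timer count. Only `disable_apic_timer` and `calibration_result` are names taken from the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of the state involved; the struct itself is hypothetical. */
struct toy_apic {
	bool disable_apic_timer;		/* "noapictimer" was given */
	unsigned int calibration_result;	/* stays 0 if calibration is skipped */
};

/* Boot CPU path: calibration only happens when the timer is allowed. */
static void boot_cpu_setup(struct toy_apic *a, unsigned int measured)
{
	assert(a != NULL);
	if (!a->disable_apic_timer)
		a->calibration_result = measured;
}

/* Secondary CPU path as described: no check, so it may program a count of 0. */
static unsigned int secondary_setup_unchecked(const struct toy_apic *a)
{
	return a->calibration_result;
}

/* Guarded variant: refuses to arm the timer instead of using a count of 0. */
static bool secondary_setup_checked(const struct toy_apic *a, unsigned int *count)
{
	if (a->disable_apic_timer || a->calibration_result == 0)
		return false;
	*count = a->calibration_result;
	return true;
}
```

With `disable_apic_timer` set, the boot CPU never calibrates, the unchecked secondary path returns a count of 0, and the checked path declines to arm the timer at all. Whether the real fix takes this shape is not stated in the thread; the guard is an assumption made for illustration.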
Re: [REGRESSION from 2.6.23-rc8] (was: Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents)
On Wed, 26 Sep 2007, Thomas Gleixner wrote:
> >
> > 1) current Linus' tree doesn't boot with any command line (regression)
> >
> >    [ Linus, please revert commit e66485d747505e9d960b864fc6c37f8b2afafaf0

Reverted.

> OK, this explains 2) and 3). I just looked into the code and the logic
> vs. noapictimer on SMP is completely broken.

..and thanks for the explanation.

Thanks for finding it so quickly guys. Sounds like this will be fixed
properly in 2.6.24 with the x86 merge (which hopefully brings in the hrt
patch too)

		Linus
Re: [REGRESSION from 2.6.23-rc8] (was: Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents)
Rafael,

On Wed, 2007-09-26 at 23:00 +0200, Rafael J. Wysocki wrote:
> > > > First, with the "x86-64: Disable local APIC timer use on AMD systems with C1E"
> > > > patch and my collection of suspend patches applied, the box doesn't boot
> > > > (the suspend patches don't even touch the boot code, so they should be
> > > > irrelevant here). However, it boots if patch-2.6.23-rc7-hrt1.patch (adjusted
> > > > for 2.6.23-rc8) is applied in addition. Is this expected?
> > >
> > > No. That's odd. It is nothing else than adding "noapictimer" to the
> > > kernel command line.
> >
> > Seems to be reproducible, though. I'll investigate further.
>
> So far, the results are the following:
>
> 1) current Linus' tree doesn't boot with any command line (regression)
>
>    [ Linus, please revert commit e66485d747505e9d960b864fc6c37f8b2afafaf0
>
>    x86-64: Disable local APIC timer use on AMD systems with C1E
>
>    It's not necessary for 2.6.23 and actually kills the box that it's
>    supposed to fix. ]
>
> 2) 2.6.23-rc8 w/ the "x86-64: Disable local APIC timer use on AMD systems with C1E"
>    patch applied behaves like the current -git
>
> 3) 2.6.23-rc8 w/o this patch doesn't boot with either "noapictimer" _or_

OK, this explains 2) and 3). I just looked into the code and the logic
vs. noapictimer on SMP is completely broken.

On i386 the noapictimer option not only disables the local APIC timer,
it also registers the CPUs for broadcasting via IPI on SMP systems.

The x86_64 code uses the broadcast only when the local apic timer is
active, i.e. "noapictimer" is not on the command line. This defeats the
whole purpose of "noapictimer". It should be there to make boxen work,
where the local APIC timer actually has a hardware problem, e.g. the
nx6325.

The current implementation of x86_64 only fixes the ACPI c-states
related problem where the APIC timer stops in C3(2), nothing else.

On nx6325 and other AMD X2 equipped systems which have the C1E enabled
we run into the following:

PIT keeps jiffies (and the system) running, but the local APIC timer
interrupts can get out of sync due to this C1E effect.

I don't think this is a critical problem, but it is wrong nevertheless.

I think it's safe to revert the C1E patch and postpone the fix to the
clock events conversion.

> "apicmaintimer"

on your box is not going to work. See the C1E patch. "apicmaintimer"
switches off PIT and then waits for ever for the local APIC timer
interrupts.

> 4) 2.6.22 behaves like 2.6.23-rc8

No surprise

> 5) 2.6.23-rc8 with (adjusted) patch-2.6.23-rc7-hrt1.patch boots only with
>    "noapictimer"
>
> 6) 2.6.23-rc8 with (adjusted) patch-2.6.23-rc7-hrt1.patch and with the
>    "x86-64: Disable local APIC timer use on AMD systems with C1E" patch boots
>    without any extra command line options

That's consistent behaviour.

> Tested for a couple of times with each kernel, the results seem to be
> reproducible 100% of the time.

Thanks for going through this debug marathon.

	tglx
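The difference between the two architectures' handling of "noapictimer", as described in this message, can be written down as a small decision model. This is a sketch under assumptions: `timer_plan`, `plan_i386`, `plan_x86_64` and `secondary_cpus_get_ticks` are hypothetical names, and the model captures only the two behaviours stated above, not the real kernel code paths.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of how per-CPU ticks get delivered. */
struct timer_plan {
	bool use_local_apic_timer;
	bool use_ipi_broadcast;
};

/* i386 as described: disabling the timer also registers for IPI broadcast. */
static struct timer_plan plan_i386(bool noapictimer, bool smp)
{
	struct timer_plan p;

	p.use_local_apic_timer = !noapictimer;
	p.use_ipi_broadcast = noapictimer && smp;
	return p;
}

/*
 * x86_64 as described: broadcast is only used while the local APIC timer
 * is active (to cover C-states that stop it), so "noapictimer" turns off
 * both mechanisms at once.
 */
static struct timer_plan plan_x86_64(bool noapictimer, bool c_state_stops_timer)
{
	struct timer_plan p;

	p.use_local_apic_timer = !noapictimer;
	p.use_ipi_broadcast = !noapictimer && c_state_stops_timer;
	return p;
}

/* Do secondary CPUs still have some source of timer ticks? */
static bool secondary_cpus_get_ticks(struct timer_plan p)
{
	return p.use_local_apic_timer || p.use_ipi_broadcast;
}
```

In this model an SMP i386 box booted with "noapictimer" still delivers ticks to secondary CPUs via broadcast, while the x86_64 plan leaves them without any tick source, which is the sense in which the option "defeats the whole purpose".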
[REGRESSION from 2.6.23-rc8] (was: Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents)
On Wednesday, 26 September 2007 21:49, Rafael J. Wysocki wrote: > On Wednesday, 26 September 2007 20:51, Thomas Gleixner wrote: > > On Wed, 2007-09-26 at 17:25 +0200, Rafael J. Wysocki wrote: > > > There still are some oddities. > > > > > > First, with the "x86-64: Disable local APIC timer use on AMD systems with > > > C1E" > > > patch and my collection of suspend patches applied, the box doesn't boot > > > (the suspend patches don't even thouch the boot code, so they should be > > > irrelevant here). However, it boots if patch-2.6.23-rc7-hrt1.patch > > > (adjusted > > > for 2.6.23-rc8) is applied in addition. Is this expected? > > > > No. That's odd. It is nothing else than adding "noapictimer" to the > > kernel command line. > > Seems to be reproducible, though. I'll investigate further. So far, the results are the following: 1) current Linus' tree doesn't boot with any command line (regression) [ Linus, please revert commit e66485d747505e9d960b864fc6c37f8b2afafaf0 x86-64: Disable local APIC timer use on AMD systems with C1E It's not necessary for 2.6.23 and actually kills the box that it's supposed to fix. ] 2) 2.6.23-rc8 w/ the "x86-64: Disable local APIC timer use on AMD systems with C1E" patch applied behaves like the current -git 3) 2.6.23-rc8 w/o this patch doesn't boot with either "noapictimer" _or_ "apicmaintimer" 4) 2.6.22 behaves like 2.6.23-rc8 5) 2.6.23-rc8 with (adjusted) patch-2.6.23-rc7-hrt1.patch boots only with "noapictimer" 6) 2.6.23-rc8 with (adjusted) patch-2.6.23-rc7-hrt1.patch and with the "x86-64: Disable local APIC timer use on AMD systems with C1E" patch boots without any extra command line options Tested for a couple of times with each kernel, the results seem to be reproducible 100% of the time. 
Greetings,
Rafael
Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents
On Wednesday, 26 September 2007 20:51, Thomas Gleixner wrote:
> On Wed, 2007-09-26 at 17:25 +0200, Rafael J. Wysocki wrote:
> > There still are some oddities.
> >
> > First, with the "x86-64: Disable local APIC timer use on AMD systems
> > with C1E" patch and my collection of suspend patches applied, the box
> > doesn't boot (the suspend patches don't even touch the boot code, so
> > they should be irrelevant here). However, it boots if
> > patch-2.6.23-rc7-hrt1.patch (adjusted for 2.6.23-rc8) is applied in
> > addition. Is this expected?
>
> No. That's odd. It is nothing else than adding "noapictimer" to the
> kernel command line.

Seems to be reproducible, though. I'll investigate further.

> > Next, on 2.6.23-rc8 with the patches from:
> >
> > http://www.sisk.pl/kernel/hibernation_and_suspend/2.6.23-rc8/patches/
> >
> > plus the "x86-64: Disable local APIC timer use on AMD systems with C1E"
> > patch and patch-2.6.23-rc7-hrt1.patch (adjusted for 2.6.23-rc8),
> > hibernation doesn't work correctly. Although the box hibernates and
> > restores, there is a temporary "hang" during the "resume hardware"
> > sequence, after which the "lock" led starts to blink (and remains in
> > this state) and something like this appears in dmesg:
> >
> > Extended CMOS year: 2000
> > Enabling non-boot CPUs ...
> > SMP alternatives: switching to SMP code
> > Booting processor 1/2 APIC 0x1
> > Initializing CPU#1
> > Calibrating delay using timer specific routine.. 3990.36 BogoMIPS (lpj=7980735)
> > CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
> > CPU: L2 Cache: 512K (64 bytes/line)
> > Unable to handle kernel paging request at 806c64d4 RIP:
> > [802104cb] identify_cpu+0x2ac/0x5a1
>
> Hmm. That's really early in the CPU bring up. The only change in this
> area is the C1E patch. Can you decode the exact source line, where it is
> failing ?

Yes, I can, but I'll first see what's wrong with the boot.

Greetings,
Rafael
Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents
On Wed, 2007-09-26 at 17:25 +0200, Rafael J. Wysocki wrote:
> There still are some oddities.
>
> First, with the "x86-64: Disable local APIC timer use on AMD systems with
> C1E" patch and my collection of suspend patches applied, the box doesn't
> boot (the suspend patches don't even touch the boot code, so they should
> be irrelevant here). However, it boots if patch-2.6.23-rc7-hrt1.patch
> (adjusted for 2.6.23-rc8) is applied in addition. Is this expected?

No. That's odd. It is nothing else than adding "noapictimer" to the
kernel command line.

> Next, on 2.6.23-rc8 with the patches from:
>
> http://www.sisk.pl/kernel/hibernation_and_suspend/2.6.23-rc8/patches/
>
> plus the "x86-64: Disable local APIC timer use on AMD systems with C1E"
> patch and patch-2.6.23-rc7-hrt1.patch (adjusted for 2.6.23-rc8),
> hibernation doesn't work correctly. Although the box hibernates and
> restores, there is a temporary "hang" during the "resume hardware"
> sequence, after which the "lock" led starts to blink (and remains in this
> state) and something like this appears in dmesg:
>
> Extended CMOS year: 2000
> Enabling non-boot CPUs ...
> SMP alternatives: switching to SMP code
> Booting processor 1/2 APIC 0x1
> Initializing CPU#1
> Calibrating delay using timer specific routine.. 3990.36 BogoMIPS (lpj=7980735)
> CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
> CPU: L2 Cache: 512K (64 bytes/line)
> Unable to handle kernel paging request at 806c64d4 RIP:
> [802104cb] identify_cpu+0x2ac/0x5a1

Hmm. That's really early in the CPU bring up. The only change in this
area is the C1E patch. Can you decode the exact source line, where it is
failing ?

tglx
Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents
Thomas,

On Tuesday, 25 September 2007 23:24, Thomas Gleixner wrote:
> Rafael,
>
> On Tue, 2007-09-25 at 23:28 +0200, Rafael J. Wysocki wrote:
> > > I'm a bit confused by your earlier confirmation, that mainline w/o the
> > > -hrt patches boots fine, when you add "apicmaintimer" to the kernel
> > > command line. "apicmaintimer" stops the PIT like we do in -hrt and we
> > > just use the local APIC timer for everything. Can you please retest and
> > > confirm that this is correct ?
> >
> > No, it's not. The mainline _usually_ doesn't boot with "apicmaintimer".
> >
> > It seems to me that _sometimes_ the CPU just doesn't enter this C1E state
> > and then everything goes fine ...
>
> I'm relieved. I really started to go nuts on these contradicting
> patterns.
>
> Your box seems to be worse than the VAIO, it has some random surprise
> generator built in :)
>
> > > Is the 32 bit kernel working on that box ?
> >
> > Can't tell, I have only 64-bit userland here.
>
> Should be fine. The check is there since late 2.6.21-rc. I really could
> kick my own ass that I did not remember the nx6325 wreckage in the
> 2.6.21-rc time frame. Sigh, way too much broken hardware out there to
> keep track of it.
>
> > > Thanks for your patience.
> >
> > Well, I'm only making sure that future kernels will run on my box. ;-)
>
> Nothing wrong with that. Thanks again for your help,

There still are some oddities.

First, with the "x86-64: Disable local APIC timer use on AMD systems with
C1E" patch and my collection of suspend patches applied, the box doesn't
boot (the suspend patches don't even touch the boot code, so they should be
irrelevant here). However, it boots if patch-2.6.23-rc7-hrt1.patch (adjusted
for 2.6.23-rc8) is applied in addition. Is this expected?

Next, on 2.6.23-rc8 with the patches from:

http://www.sisk.pl/kernel/hibernation_and_suspend/2.6.23-rc8/patches/

plus the "x86-64: Disable local APIC timer use on AMD systems with C1E"
patch and patch-2.6.23-rc7-hrt1.patch (adjusted for 2.6.23-rc8), hibernation
doesn't work correctly.
Although the box hibernates and restores, there is a temporary hang during the resume hardware sequence, after which the lock led starts to blink (and remains in this state) and something like this appears in dmesg: Extended CMOS year: 2000 Enabling non-boot CPUs ... SMP alternatives: switching to SMP code Booting processor 1/2 APIC 0x1 Initializing CPU#1 Calibrating delay using timer specific routine.. 3990.36 BogoMIPS (lpj=7980735) CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line) CPU: L2 Cache: 512K (64 bytes/line) Unable to handle kernel paging request at 806c64d4 RIP: [802104cb] identify_cpu+0x2ac/0x5a1 PGD 203067 PUD 207063 PMD 37fb4163 PTE 6c6000 Oops: 0002 [1] SMP CPU 1 Modules linked in: ip6t_LOG nf_conntrack_ipv6 xt_pkttype ipt_LOG xt_limit cpufreq_conservative cpufreq_ondemand cpufreq_userspace cpufreq_powersave powernow_k8 freq_table thermal processor fan snd_pcm_oss button snd_mixer_oss snd_seq battery snd_seq_device ac ip6t_REJECT xt_tcpudp ipt_REJECT xt_state iptable_mangle iptable_nat nf_nat iptable_filter ip6table_mangle nf_conntrack_ipv4 nf_conntrack nfnetlink ip_tables ip6table_filter ip6_tables x_tables ipv6 loop dm_mod rfcomm hidp l2cap usbhid ff_memless psmouse hci_usb bluetooth pcmcia tg3 ohci_hcd snd_hda_intel ehci_hcd yenta_socket rsrc_nonstatic ide_cd ohci1394 k8temp i2c_piix4 pcmcia_core sdhci shpchp snd_pcm usbcore hwmon i2c_core rtc_cmos rtc_core rtc_lib ieee1394 mmc_core tifm_7xx1 tifm_core pci_hotplug snd_timer cdrom snd firmware_class ieee80211softmac ieee80211 ieee80211_crypt soundcore snd_page_alloc ext3 jbd edd atiixp ide_disk ide_core sg Pid: 0, comm: swapper Not tainted 2.6.23-rc8-rjw #6 RIP: 0010:[802104cb] [802104cb] identify_cpu+0x2ac/0x5a1 RSP: 0018:810037abdea8 EFLAGS: 00010006 RAX: 14008015 RBX: 01020800 RCX: c0010055 RDX: RSI: 0004 RDI: 0001 RBP: 810037abded8 R08: R09: 80444ad0 R10: 8070c860 R11: 0001 R12: 805920c0 R13: R14: R15: FS: () GS:810037ac3e88() knlGS: CS: 0010 DS: 0018 ES: 0018 CR0: 8005003b CR2: 
806c64d4 CR3: 00201000 CR4: 06a0 DR0: DR1: DR2: DR3: DR6: 0ff0 DR7: 0400 Process swapper (pid: 0, threadinfo 810037abc000, task 810037a8f800) Stack: 000f4e5a1540 059f 0001 805920c0 0001 810037abdef8 8021acaa 059f 810037abdf48 8021b380 Call Trace: [8021acaa] smp_callin+0xc8/0xde [8021b380] start_secondary+0x1b/0x2e8 Code: c7 05 ff 5f 4b 00 01 00 00 00 e9 4f 01 00 00 4c 89 e7 e8 27 RIP [802104cb] identify_cpu+0x2ac/0x5a1 RSP
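
The oops above can be cross-checked by hand. "Oops: 0002" is the x86 page-fault error code for a kernel-mode write to a page that is not present, and the first ten bytes of the Code: line (c7 05 ff 5f 4b 00 01 00 00 00) encode "movl $0x1, 0x4b5fff(%rip)", a store of the constant 1 to a global. The following is only a minimal sketch in C, assuming that the Code: dump starts at the faulting instruction and taking the addresses exactly as printed in this archive; the helper name is hypothetical and not part of any kernel tool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: decode "c7 05 <disp32> <imm32>", i.e.
 * movl $imm32, disp32(%rip), and return the absolute address it writes to.
 * rip is the address of the instruction itself. Returns 0 if the buffer is
 * too short or holds a different encoding.
 */
static uint64_t rip_relative_store_target(uint64_t rip, const uint8_t *code,
                                          size_t len, uint32_t *imm)
{
    const size_t insn_len = 10; /* opcode + ModRM + disp32 + imm32 */
    uint32_t disp;

    assert(code != NULL && imm != NULL);

    if (len < insn_len || code[0] != 0xc7 || code[1] != 0x05)
        return 0;

    /* Both fields are stored little-endian. */
    disp = (uint32_t)code[2] | (uint32_t)code[3] << 8 |
           (uint32_t)code[4] << 16 | (uint32_t)code[5] << 24;
    *imm = (uint32_t)code[6] | (uint32_t)code[7] << 8 |
           (uint32_t)code[8] << 16 | (uint32_t)code[9] << 24;

    /* RIP-relative operands are relative to the *next* instruction. */
    return rip + insn_len + (uint64_t)(int64_t)(int32_t)disp;
}
```

With rip = 0x802104cb and the ten bytes quoted above, the helper yields 0x806c64d4 with an immediate of 1, which is the CR2 value reported in the oops. That agreement supports the assumption about where the dump starts, but it does not by itself identify the source line.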
Re: [REGRESSION from 2.6.23-rc8] (was: Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents)
Rafael,

On Wed, 2007-09-26 at 23:00 +0200, Rafael J. Wysocki wrote:
> > > > First, with the "x86-64: Disable local APIC timer use on AMD systems
> > > > with C1E" patch and my collection of suspend patches applied, the box
> > > > doesn't boot (the suspend patches don't even touch the boot code, so
> > > > they should be irrelevant here). However, it boots if
> > > > patch-2.6.23-rc7-hrt1.patch (adjusted for 2.6.23-rc8) is applied in
> > > > addition. Is this expected?
> > >
> > > No. That's odd. It is nothing else than adding "noapictimer" to the
> > > kernel command line.
> >
> > Seems to be reproducible, though. I'll investigate further.
>
> So far, the results are the following:
>
> 1) current Linus' tree doesn't boot with any command line (regression)
>
>    [ Linus, please revert commit e66485d747505e9d960b864fc6c37f8b2afafaf0
>      x86-64: Disable local APIC timer use on AMD systems with C1E
>      It's not necessary for 2.6.23 and actually kills the box that it's
>      supposed to fix. ]
>
> 2) 2.6.23-rc8 w/ the "x86-64: Disable local APIC timer use on AMD systems
>    with C1E" patch applied behaves like the current -git
>
> 3) 2.6.23-rc8 w/o this patch doesn't boot with either "noapictimer" _or_

OK, this explains 2) and 3). I just looked into the code and the logic
vs. noapictimer on SMP is completely broken.

On i386 the noapictimer option not only disables the local APIC timer, it
also registers the CPUs for broadcasting via IPI on SMP systems. The x86_64
code uses the broadcast only when the local APIC timer is active, i.e.
noapictimer is not on the command line. This defeats the whole purpose of
noapictimer. It should be there to make boxen work, where the local APIC
timer actually has a hardware problem, e.g. the nx6325.

The current implementation of x86_64 only fixes the ACPI c-states related
problem where the APIC timer stops in C3(2), nothing else.

On nx6325 and other AMD X2 equipped systems which have the C1E enabled we
run into the following: PIT keeps jiffies (and the system) running, but the
local APIC timer interrupts can get out of sync due to this C1E effect. I
don't think this is a critical problem, but it is wrong nevertheless. I
think it's safe to revert the C1E patch and postpone the fix to the clock
events conversion.

> "apicmaintimer"

on your box is not going to work. See the C1E patch. "apicmaintimer"
switches off PIT and then waits for ever for the local APIC timer
interrupts.

> 4) 2.6.22 behaves like 2.6.23-rc8

No surprise

> 5) 2.6.23-rc8 with (adjusted) patch-2.6.23-rc7-hrt1.patch boots only with
>    "noapictimer"
>
> 6) 2.6.23-rc8 with (adjusted) patch-2.6.23-rc7-hrt1.patch and with the
>    "x86-64: Disable local APIC timer use on AMD systems with C1E" patch
>    boots without any extra command line options

That's consistent behaviour.

> Tested for a couple of times with each kernel, the results seem to be
> reproducible 100% of the time.

Thanks for going through this debug marathon.

tglx
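
The difference between the two architectures described in this mail can be written down as a small decision table. The following C sketch is a toy model of the mail's wording only, not the kernel code: the struct, its fields and all function names are hypothetical, and it looks only at a secondary CPU on an SMP system.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model; every name here is hypothetical. */
struct tick_setup {
    bool local_apic_timer; /* CPU uses its own local APIC timer */
    bool ipi_broadcast;    /* CPU is registered for tick broadcast via IPI */
};

/*
 * i386 as described: noapictimer disables the local APIC timer and
 * registers the CPUs for broadcasting via IPI instead.
 */
static struct tick_setup i386_tick_setup(bool noapictimer)
{
    struct tick_setup s = {
        .local_apic_timer = !noapictimer,
        .ipi_broadcast = noapictimer,
    };
    return s;
}

/*
 * x86_64 as described: the broadcast is only used while the local APIC
 * timer is active, so noapictimer removes both tick sources.
 */
static struct tick_setup x86_64_tick_setup(bool noapictimer)
{
    struct tick_setup s = {
        .local_apic_timer = !noapictimer,
        .ipi_broadcast = !noapictimer,
    };
    return s;
}

/* A secondary CPU needs at least one tick source to make progress. */
static bool secondary_cpu_has_tick(const struct tick_setup *s)
{
    assert(s != NULL);
    return s->local_apic_timer || s->ipi_broadcast;
}
```

In this model a secondary CPU booted with noapictimer still has a tick source on i386 but none on x86_64, which is the sense in which the option fails to rescue a box whose local APIC timer is unreliable.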
Re: [REGRESSION from 2.6.23-rc8] (was: Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents)
On Wed, 26 Sep 2007, Thomas Gleixner wrote:
> > 1) current Linus' tree doesn't boot with any command line (regression)
> >
> >    [ Linus, please revert commit e66485d747505e9d960b864fc6c37f8b2afafaf0

Reverted.

> OK, this explains 2) and 3). I just looked into the code and the logic
> vs. noapictimer on SMP is completely broken.

..and thanks for the explanation.

Thanks for finding it so quickly guys. Sounds like this will be fixed
properly in 2.6.24 with the x86 merge (which hopefully brings in the hrt
patch too)

Linus
Re: [REGRESSION from 2.6.23-rc8] (was: Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents)
On Wed, 2007-09-26 at 15:22 -0700, Linus Torvalds wrote:
> On Wed, 26 Sep 2007, Thomas Gleixner wrote:
> > > 1) current Linus' tree doesn't boot with any command line (regression)
> > >
> > >    [ Linus, please revert commit e66485d747505e9d960b864fc6c37f8b2afafaf0
>
> Reverted.
>
> > OK, this explains 2) and 3). I just looked into the code and the logic
> > vs. noapictimer on SMP is completely broken.
>
> ..and thanks for the explanation.
>
> Thanks for finding it so quickly guys. Sounds like this will be fixed
> properly in 2.6.24 with the x86 merge (which hopefully brings in the hrt
> patch too)

It's even worse than I thought on the first check:

noapictimer on the command line of an SMP box prevents _ONLY_ the boot CPU
APIC timer from being used. But the secondary CPU is still unconditionally
setting up the APIC timer and uses the non-calibrated variable
calibration_result, which is of course 0, to set up the APIC timer.
Wreckage guaranteed.

tglx
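
The failure mode described here, a secondary CPU programming its timer from a value that was never calibrated, can be illustrated with a few lines of C. This is a minimal sketch under stated assumptions rather than the kernel implementation: only the variable name calibration_result comes from the mail, every function name is hypothetical, and the guarded variant is merely one possible check, not necessarily the fix that was eventually merged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stays 0 until the boot CPU has calibrated its local APIC timer. */
static unsigned int calibration_result;

/* Boot CPU (model): with noapictimer the calibration never runs. */
static void boot_cpu_timer_setup(bool noapictimer, unsigned int measured)
{
    if (noapictimer)
        return;
    calibration_result = measured;
}

/*
 * Secondary CPU as described in the mail: no check at all, whatever is in
 * calibration_result is used as the timer count.
 */
static unsigned int secondary_timer_count_unchecked(void)
{
    return calibration_result;
}

/* Hypothetical guard: refuse to program an uncalibrated timer. */
static bool secondary_timer_count_checked(unsigned int *count)
{
    assert(count != NULL);
    if (calibration_result == 0)
        return false;
    *count = calibration_result;
    return true;
}
```

After boot_cpu_timer_setup(true, ...) the unchecked path returns 0, which is the "wreckage" case, while the guarded path reports failure and leaves the caller's count untouched.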
Re: [REGRESSION from 2.6.23-rc8] (was: Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents)
Thomas,

On Wednesday, 26 September 2007 23:34, Thomas Gleixner wrote:
> Rafael,
>
> On Wed, 2007-09-26 at 23:00 +0200, Rafael J. Wysocki wrote:
> > > > > First, with the "x86-64: Disable local APIC timer use on AMD
> > > > > systems with C1E" patch and my collection of suspend patches
> > > > > applied, the box doesn't boot (the suspend patches don't even touch
> > > > > the boot code, so they should be irrelevant here). However, it
> > > > > boots if patch-2.6.23-rc7-hrt1.patch (adjusted for 2.6.23-rc8) is
> > > > > applied in addition. Is this expected?
> > > >
> > > > No. That's odd. It is nothing else than adding "noapictimer" to the
> > > > kernel command line.
> > >
> > > Seems to be reproducible, though. I'll investigate further.
> >
> > So far, the results are the following:
> >
> > 1) current Linus' tree doesn't boot with any command line (regression)
> >
> >    [ Linus, please revert commit e66485d747505e9d960b864fc6c37f8b2afafaf0
> >      x86-64: Disable local APIC timer use on AMD systems with C1E
> >      It's not necessary for 2.6.23 and actually kills the box that it's
> >      supposed to fix. ]
> >
> > 2) 2.6.23-rc8 w/ the "x86-64: Disable local APIC timer use on AMD systems
> >    with C1E" patch applied behaves like the current -git
> >
> > 3) 2.6.23-rc8 w/o this patch doesn't boot with either "noapictimer" _or_
>
> OK, this explains 2) and 3). I just looked into the code and the logic
> vs. noapictimer on SMP is completely broken.
>
> On i386 the noapictimer option not only disables the local APIC timer, it
> also registers the CPUs for broadcasting via IPI on SMP systems. The x86_64
> code uses the broadcast only when the local APIC timer is active, i.e.
> noapictimer is not on the command line. This defeats the whole purpose of
> noapictimer. It should be there to make boxen work, where the local APIC
> timer actually has a hardware problem, e.g. the nx6325.
>
> The current implementation of x86_64 only fixes the ACPI c-states related
> problem where the APIC timer stops in C3(2), nothing else.
>
> On nx6325 and other AMD X2 equipped systems which have the C1E enabled we
> run into the following: PIT keeps jiffies (and the system) running, but the
> local APIC timer interrupts can get out of sync due to this C1E effect. I
> don't think this is a critical problem, but it is wrong nevertheless. I
> think it's safe to revert the C1E patch and postpone the fix to the clock
> events conversion.
>
> > "apicmaintimer"
>
> on your box is not going to work. See the C1E patch. "apicmaintimer"
> switches off PIT and then waits for ever for the local APIC timer
> interrupts.
>
> > 4) 2.6.22 behaves like 2.6.23-rc8
>
> No surprise
>
> > 5) 2.6.23-rc8 with (adjusted) patch-2.6.23-rc7-hrt1.patch boots only with
> >    "noapictimer"
> >
> > 6) 2.6.23-rc8 with (adjusted) patch-2.6.23-rc7-hrt1.patch and with the
> >    "x86-64: Disable local APIC timer use on AMD systems with C1E" patch
> >    boots without any extra command line options
>
> That's consistent behaviour.
>
> > Tested for a couple of times with each kernel, the results seem to be
> > reproducible 100% of the time.
>
> Thanks for going through this debug marathon.

No big deal. I'm glad that you've found what's up.

Well, we still have the CPU hotplug during suspend w/ the hrt patch problem
to debug ... ;-)

Greetings,
Rafael
Re: [REGRESSION from 2.6.23-rc8] (was: Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents)
On Thu, 2007-09-27 at 01:30 +0200, Rafael J. Wysocki wrote:
> > > Tested for a couple of times with each kernel, the results seem to be
> > > reproducible 100% of the time.
> >
> > Thanks for going through this debug marathon.
>
> No big deal. I'm glad that you've found what's up.
>
> Well, we still have the CPU hotplug during suspend w/ the hrt patch problem
> to debug ... ;-)

Yeah. Knowing the actual line of code where it breaks might be helpful.

tglx
Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents
Rafael,

On Tue, 2007-09-25 at 23:28 +0200, Rafael J. Wysocki wrote:
> > I'm a bit confused by your earlier confirmation, that mainline w/o the
> > -hrt patches boots fine, when you add "apicmaintimer" to the kernel
> > command line. "apicmaintimer" stops the PIT like we do in -hrt and we
> > just use the local APIC timer for everything. Can you please retest and
> > confirm that this is correct ?
>
> No, it's not. The mainline _usually_ doesn't boot with "apicmaintimer".
>
> It seems to me that _sometimes_ the CPU just doesn't enter this C1E state
> and then everything goes fine ...

I'm relieved. I really started to go nuts on these contradicting
patterns.

Your box seems to be worse than the VAIO, it has some random surprise
generator built in :)

> > Is the 32 bit kernel working on that box ?
>
> Can't tell, I have only 64-bit userland here.

Should be fine. The check is there since late 2.6.21-rc. I really could
kick my own ass that I did not remember the nx6325 wreckage in the
2.6.21-rc time frame. Sigh, way too much broken hardware out there to
keep track of it.

> > Thanks for your patience.
>
> Well, I'm only making sure that future kernels will run on my box. ;-)

Nothing wrong with that. Thanks again for your help,

tglx
Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents
Thomas,

On Tuesday, 25 September 2007 22:46, Thomas Gleixner wrote:
> Rafael,
>
> On Tue, 2007-09-25 at 22:07 +0200, Rafael J. Wysocki wrote:
> > On Tuesday, 25 September 2007 15:17, Thomas Gleixner wrote:
> > > On Tue, 2007-09-25 at 15:16 +0200, Rafael J. Wysocki wrote:
> > [--snip--]
> > >
> > > I start to get desperate. Below is a patch, which moves the apic timer
> > > disable check after the calibration routine. Can you please apply on top
> > > of -hrt and add "noapictimer" to the command line ? Does it boot ?
> >
> > 2.6.23-rc7 with patch-2.6.23-rc7-hrt1.patch and the patch below applied
> > boots with noapictimer and doesn't boot without it.
>
> That was expected. I explicitly asked to add "noapictimer" to the kernel
> command line.
>
> Ok, so we ruled out the apic timer calibration routine. I did not expect
> that this would be the culprit, but with "dark screen" as the only debug
> info, I need to resort to small steps.
>
> Can you please send me the output of /proc/timer_list of 2.6.23-rc7-hrt1
> after booting with "noapictimer" ?

Sure, attached. [Note: the kernel has been compiled with both NO_HZ and
HIGH_RES_TIMERS unset.]

> I'm a bit confused by your earlier confirmation, that mainline w/o the
> -hrt patches boots fine, when you add "apicmaintimer" to the kernel
> command line. "apicmaintimer" stops the PIT like we do in -hrt and we
> just use the local APIC timer for everything. Can you please retest and
> confirm that this is correct ?

No, it's not. The mainline _usually_ doesn't boot with "apicmaintimer".
It seems to me that _sometimes_ the CPU just doesn't enter this C1E state
and then everything goes fine ...

> Is the 32 bit kernel working on that box ?

Can't tell, I have only 64-bit userland here.

> Thanks for your patience.

Well, I'm only making sure that future kernels will run on my box. ;-)

> tglx
>
> PS: I just sent out the "disable APIC timer for AMD C1E boxen" patch.

Yes, I've already tested it and sent a reply. It works. :-)

> We debugged this half a year ago on a nx6325, but I completely forgot about
> that. The explanation from AMD was sensible, but your "apicmaintimer"
> works statement is contradictory.

Well, it was wrong.

I have some problems with resuming from suspend to RAM using 2.6.23-rc8-mm1
with this patch applied, but I think they are related to something else.
I'll wait for the next -mm with debugging that.

For now, I'm going to build 2.6.23-rc8 with my collection of suspend patches
plus patch-2.6.23-rc7-hrt1.patch and the "disable APIC timer for AMD C1E
boxes" patch applied. I'll play with that a bit and let you know how it's
behaving.

Greetings,
Rafael

Timer List Version: v0.3
HRTIMER_MAX_CLOCK_BASES: 2
now at 279792107058 nsecs

cpu: 0
 clock 0:
  .index: 0
  .resolution: 4000250 nsecs
  .get_time: ktime_get_real
active timers:
 clock 1:
  .index: 1
  .resolution: 4000250 nsecs
  .get_time: ktime_get
active timers:
 #0: , hrtimer_wakeup, S:01, do_nanosleep, kwrapper/4664
 # expires at 280207419178 nsecs [in 415312120 nsecs]
 #1: , hrtimer_wakeup, S:01, futex_wait, nscd/4080
 # expires at 282678021548 nsecs [in 2885914490 nsecs]
 #2: , hrtimer_wakeup, S:01, futex_wait, nscd/4082
 # expires at 282678129670 nsecs [in 2886022612 nsecs]
 #3: , it_real_fn, S:01, do_setitimer, qmgr/4239
 # expires at 378654389676 nsecs [in 98862282618 nsecs]
 #4: , it_real_fn, S:01, do_setitimer, pickup/4238
 # expires at 557809025993 nsecs [in 278016918935 nsecs]
 #5: , it_real_fn, S:01, do_setitimer, master/4216
 # expires at 557809137746 nsecs [in 278017030688 nsecs]

cpu: 1
 clock 0:
  .index: 0
  .resolution: 4000250 nsecs
  .get_time: ktime_get_real
active timers:
 clock 1:
  .index: 1
  .resolution: 4000250 nsecs
  .get_time: ktime_get
active timers:
 #0: , it_real_fn, S:01, do_setitimer, Xorg/4355
 # expires at 279804542721 nsecs [in 12435663 nsecs]
 #1: , it_real_fn, S:01, do_setitimer, ssh-agent/4611
 # expires at 279962268496 nsecs [in 170161438 nsecs]
 #2: , hrtimer_wakeup, S:01, do_nanosleep, hald-addon-stor/4148
 # expires at 280071774352 nsecs [in 279667294 nsecs]
 #3: , hrtimer_wakeup, S:01, futex_wait, nscd/4081
 # expires at 282678034680 nsecs [in 2885927622 nsecs]
 #4: , hrtimer_wakeup, S:01, do_nanosleep, cron/4241
 # expires at 335311096287 nsecs [in 55518989229 nsecs]
 #5: , it_real_fn, S:01, do_setitimer, dhcpcd/5128
 # expires at 604918992928181 nsecs [in 604639200821123 nsecs]
 #6: , hrtimer_wakeup, S:01, do_nanosleep, dhcpcd/5128
 # expires at 604918992950531 nsecs [in 604639200843473 nsecs]

Tick Device: mode: 0
Clock Event Device: pit
 max_delta_ns: 27461866
 min_delta_ns: 12571
 mult: 5124677
 shift: 32
 mode: 2
 next_event: 9223372036854775807 nsecs
 set_next_event: pit_next_event
 set_mode: init_pit_timer
 event_handler: tick_handle_periodic_broadcast
tick_broadcast_mask: 0003

Tick Device: mode: 0
Clock Event Device: lapic
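[Editor's note] The PIT entry in the timer_list dump above can be cross-checked with a few lines of integer arithmetic. This is only a sketch: it assumes the usual PIT input clock of 1193182 Hz and assumes the delta bounds come from latch values 0x7FFF and 0xF, neither of which is stated in the dump itself. The helper names div_sc and delta2ns are borrowed from the kernel's naming for illustration; the Python below is not kernel code.

```python
# Sketch: reproduce the "mult", "max_delta_ns" and "min_delta_ns" values
# shown for the "pit" clock event device in the dump above.
# Assumptions (not stated in the thread): PIT_TICK_RATE and the latch bounds.

NSEC_PER_SEC = 1_000_000_000
PIT_TICK_RATE = 1_193_182   # Hz, assumed PIT input clock
SHIFT = 32                  # "shift: 32" from the dump


def div_sc(ticks_per_sec, nsec, shift):
    """Scaled conversion factor: device ticks per nanosecond times 2**shift."""
    return (ticks_per_sec << shift) // nsec


def delta2ns(latch, mult, shift):
    """Convert a number of device ticks to nanoseconds using mult/shift."""
    return (latch << shift) // mult


if __name__ == "__main__":
    mult = div_sc(PIT_TICK_RATE, NSEC_PER_SEC, SHIFT)
    print("mult:        ", mult)
    print("max_delta_ns:", delta2ns(0x7FFF, mult, SHIFT))  # assumed upper latch
    print("min_delta_ns:", delta2ns(0xF, mult, SHIFT))     # assumed lower latch
```

Under those assumptions the three numbers agree with the dump, which suggests the PIT device was registered with its normal parameters on this boot.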
Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents
Rafael,

On Tue, 2007-09-25 at 22:07 +0200, Rafael J. Wysocki wrote:
> On Tuesday, 25 September 2007 15:17, Thomas Gleixner wrote:
> > On Tue, 2007-09-25 at 15:16 +0200, Rafael J. Wysocki wrote:
> [--snip--]
> >
> > I start to get desperate. Below is a patch, which moves the apic timer
> > disable check after the calibration routine. Can you please apply on top
> > of -hrt and add "noapictimer" to the command line ? Does it boot ?
>
> 2.6.23-rc7 with patch-2.6.23-rc7-hrt1.patch and the patch below applied boots
> with noapictimer and doesn't boot without it.

That was expected. I explicitly asked to add "noapictimer" to the kernel
command line.

Ok, so we ruled out the apic timer calibration routine. I did not expect
that this would be the culprit, but with "dark screen" as the only debug
info, I need to resort to small steps.

Can you please send me the output of /proc/timer_list of 2.6.23-rc7-hrt1
after booting with "noapictimer" ?

I'm a bit confused by your earlier confirmation, that mainline w/o the
-hrt patches boots fine, when you add "apicmaintimer" to the kernel
command line. "apicmaintimer" stops the PIT like we do in -hrt and we
just use the local APIC timer for everything. Can you please retest and
confirm that this is correct ?

Is the 32 bit kernel working on that box ?

Thanks for your patience.

tglx

PS: I just sent out the "disable APIC timer for AMD C1E boxen" patch.
We debugged this half a year ago on a nx6325, but I completely forgot about
that. The explanation from AMD was sensible, but your "apicmaintimer"
works statement is contradictory.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents
On Tuesday, 25 September 2007 15:17, Thomas Gleixner wrote:
> On Tue, 2007-09-25 at 15:16 +0200, Rafael J. Wysocki wrote:
[--snip--]
>
> I start to get desperate. Below is a patch, which moves the apic timer
> disable check after the calibration routine. Can you please apply on top
> of -hrt and add "noapictimer" to the command line ? Does it boot ?

2.6.23-rc7 with patch-2.6.23-rc7-hrt1.patch and the patch below applied boots
with noapictimer and doesn't boot without it.

Also, attached is the output of

# cat /proc/interrupts; sleep 10; cat /proc/interrupts

from the current mainline.

Greetings,
Rafael

> Index: linux-2.6.23-rc7/arch/x86_64/kernel/apic.c
> ===
> --- linux-2.6.23-rc7.orig/arch/x86_64/kernel/apic.c	2007-09-24 20:30:00.0 +0200
> +++ linux-2.6.23-rc7/arch/x86_64/kernel/apic.c	2007-09-25 15:05:32.0 +0200
> @@ -927,6 +927,7 @@ static void __init calibrate_APIC_clock(
>
>  void __init setup_boot_APIC_clock (void)
>  {
> +#if 0
>  	/*
>  	 * The local apic timer can be disabled via the kernel commandline.
>  	 * Register the lapic timer as a dummy clock event source on SMP
> @@ -940,7 +941,7 @@ void __init setup_boot_APIC_clock (void)
>  		setup_APIC_timer();
>  		return;
>  	}
> -
> +#endif
>  	printk(KERN_INFO "Using local APIC timer interrupts.\n");
>  	calibrate_APIC_clock();
>
> @@ -949,11 +950,13 @@ void __init setup_boot_APIC_clock (void)
>  	 * PIT/HPET going. Otherwise register lapic as a dummy
>  	 * device.
>  	 */
> -	if (nmi_watchdog != NMI_IO_APIC)
> +	if (!disable_apic_timer && nmi_watchdog != NMI_IO_APIC)
>  		lapic_clockevent.features &= ~CLOCK_EVT_FEAT_DUMMY;
> +#if 0
>  	else
>  		printk(KERN_WARNING "APIC timer registered as dummy,"
>  			" due to nmi_watchdog=1!\n");
> +#endif
>
>  	setup_APIC_timer();
>  }
>
>

--
"Premature optimization is the root of all evil." - Donald Knuth

albercik:~ # cat /proc/interrupts; sleep 10; cat /proc/interrupts
 CPU0 CPU1
 0: 62489 0 local-APIC-edge timer
 1: 3232 IO-APIC-edge i8042
 8: 0 0 IO-APIC-edge rtc
 12: 1147 IO-APIC-edge i8042
 14: 15 1947 IO-APIC-edge ide0
 16:193 14151 IO-APIC-fasteoi sata_sil, HDA Intel
 19: 76 43153 IO-APIC-fasteoi ohci_hcd:usb1, ehci_hcd:usb2, ohci_hcd:usb3
 20: 0 4 IO-APIC-fasteoi ohci1394, tifm_7xx1, yenta, sdhci:slot0
 21: 7172 IO-APIC-fasteoi acpi
NMI: 0 0
LOC: 62454 62082
ERR: 0
 CPU0 CPU1
 0: 64993 0 local-APIC-edge timer
 1: 3233 IO-APIC-edge i8042
 8: 0 0 IO-APIC-edge rtc
 12: 1147 IO-APIC-edge i8042
 14: 15 2037 IO-APIC-edge ide0
 16:194 14265 IO-APIC-fasteoi sata_sil, HDA Intel
 19: 77 45155 IO-APIC-fasteoi ohci_hcd:usb1, ehci_hcd:usb2, ohci_hcd:usb3
 20: 0 4 IO-APIC-fasteoi ohci1394, tifm_7xx1, yenta, sdhci:slot0
 21: 7176 IO-APIC-fasteoi acpi
NMI: 0 0
LOC: 64958 64586
ERR: 0
albercik:~ #
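[Editor's note] The point of taking two /proc/interrupts snapshots 10 seconds apart is to read off interrupt rates. A minimal sketch of that arithmetic follows. The parser is hypothetical and assumes the normal whitespace-separated layout; several rows in the archived copy above have their columns fused, so only the two unambiguous rows (IRQ 0 and LOC) are used as sample data.

```python
# Sketch: per-CPU interrupt rates from two /proc/interrupts snapshots.
# Only rows whose columns survived intact in the archive are used below.

BEFORE = """
           CPU0       CPU1
  0:      62489          0    local-APIC-edge  timer
LOC:      62454      62082
"""

AFTER = """
           CPU0       CPU1
  0:      64993          0    local-APIC-edge  timer
LOC:      64958      64586
"""


def parse_interrupts(text):
    """Map each row label to its list of per-CPU counters."""
    lines = text.strip().splitlines()
    ncpus = len(lines[0].split())          # header row: CPU0 CPU1 ...
    counts = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        nums = []
        for field in rest.split()[:ncpus]:
            if not field.isdigit():        # reached the chip/driver name columns
                break
            nums.append(int(field))
        counts[label.strip()] = nums
    return counts


def rates(before, after, seconds):
    """Interrupts per second for every row present in both snapshots."""
    a, b = parse_interrupts(before), parse_interrupts(after)
    return {k: [(y - x) / seconds for x, y in zip(a[k], b[k])]
            for k in a if k in b}


if __name__ == "__main__":
    for label, per_cpu in rates(BEFORE, AFTER, 10.0).items():
        print(label, per_cpu)
```

With the sample rows, IRQ 0 advances by 2504 on CPU0 and LOC advances by 2504 on both CPUs over the 10 seconds, i.e. roughly 250 ticks per second, which would be consistent with a HZ=250 configuration.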
Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents
On Tue, 2007-09-25 at 15:16 +0200, Rafael J. Wysocki wrote:
> > > There seems to be a history effect in the box, to make things more
> > > "interesting".
> >
> > Did you connect this box to Andrews VAIO during KS ?
>
> No, but it's famous for being interestingly broken nevertheless.

:)

> > > I think the only solid data point so far is that "noapictimer" makes the
> > > box boot.
> >
> > Ok. Can you add "nmi_watchdog=1" to the command line please. This runs
> > through the calibration of APIC, but registers it as a dummy clock
> > source (the PIT must run to make the watchdog work).
> >
> > If it boots, please provide the output of /proc/timer_list
>
> No, it doesn't.

I start to get desperate. Below is a patch, which moves the apic timer
disable check after the calibration routine. Can you please apply on top
of -hrt and add "noapictimer" to the command line ? Does it boot ?

tglx

Index: linux-2.6.23-rc7/arch/x86_64/kernel/apic.c
===
--- linux-2.6.23-rc7.orig/arch/x86_64/kernel/apic.c	2007-09-24 20:30:00.0 +0200
+++ linux-2.6.23-rc7/arch/x86_64/kernel/apic.c	2007-09-25 15:05:32.0 +0200
@@ -927,6 +927,7 @@ static void __init calibrate_APIC_clock(

 void __init setup_boot_APIC_clock (void)
 {
+#if 0
 	/*
 	 * The local apic timer can be disabled via the kernel commandline.
 	 * Register the lapic timer as a dummy clock event source on SMP
@@ -940,7 +941,7 @@ void __init setup_boot_APIC_clock (void)
 		setup_APIC_timer();
 		return;
 	}
-
+#endif
 	printk(KERN_INFO "Using local APIC timer interrupts.\n");
 	calibrate_APIC_clock();

@@ -949,11 +950,13 @@ void __init setup_boot_APIC_clock (void)
 	 * PIT/HPET going. Otherwise register lapic as a dummy
 	 * device.
 	 */
-	if (nmi_watchdog != NMI_IO_APIC)
+	if (!disable_apic_timer && nmi_watchdog != NMI_IO_APIC)
 		lapic_clockevent.features &= ~CLOCK_EVT_FEAT_DUMMY;
+#if 0
 	else
 		printk(KERN_WARNING "APIC timer registered as dummy,"
 			" due to nmi_watchdog=1!\n");
+#endif

 	setup_APIC_timer();
 }
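[Editor's note] What the debugging patch above changes is easier to see as a small truth table. The following is a toy model in Python, not kernel code: it only mirrors the control flow visible in the diff, and it assumes that the lapic clock event device starts out with its DUMMY flag set (the diff only shows the flag being cleared). The Outcome type and the function name are hypothetical.

```python
# Toy model of setup_boot_APIC_clock() before and after the debug patch.
# Assumption: the lapic device is flagged DUMMY until the flag is cleared.

from typing import NamedTuple


class Outcome(NamedTuple):
    calibrated: bool  # calibrate_APIC_clock() was reached
    dummy: bool       # lapic stays registered as a dummy clock event device


def setup_boot_apic_clock(disable_apic_timer: bool,
                          nmi_watchdog_io_apic: bool,
                          patched: bool) -> Outcome:
    dummy = True
    if not patched and disable_apic_timer:
        # Unpatched "noapictimer": early return, calibration is skipped.
        return Outcome(calibrated=False, dummy=dummy)
    # From here on the calibration routine has run.
    if patched:
        use_lapic = (not disable_apic_timer) and (not nmi_watchdog_io_apic)
    else:
        use_lapic = not nmi_watchdog_io_apic
    if use_lapic:
        dummy = False
    return Outcome(calibrated=True, dummy=dummy)


if __name__ == "__main__":
    for patched in (False, True):
        for noapictimer in (False, True):
            print("patched" if patched else "stock  ",
                  "noapictimer" if noapictimer else "default    ",
                  setup_boot_apic_clock(noapictimer, False, patched))
```

In this model the only case the patch changes is "noapictimer": the calibration now runs, but the lapic still ends up as a dummy device. A successful boot in that configuration therefore points away from the calibration routine itself.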
Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents
On Monday, 24 September 2007 21:13, Thomas Gleixner wrote:
> On Mon, 2007-09-24 at 21:11 +0200, Rafael J. Wysocki wrote:
> > > /me scratches head
> >
> > Retested.
> >
> > > We know, that
> > > - disabling local apic timers work
> >
> > This works reproducibly across the board.
>
> Ok
>
> > > - local apic timers (which turn off PIT) work. when noacpiFSCKEDPARSING
> >
> > This stopped working, although it evidently worked yesterday (wtf?).
> >
> > There seems to be a history effect in the box, to make things more
> > "interesting".
>
> Did you connect this box to Andrews VAIO during KS ?

No, but it's famous for being interestingly broken nevertheless.

> > I think the only solid data point so far is that "noapictimer" makes the box
> > boot.
>
> Ok. Can you add "nmi_watchdog=1" to the command line please. This runs
> through the calibration of APIC, but registers it as a dummy clock
> source (the PIT must run to make the watchdog work).
>
> If it boots, please provide the output of /proc/timer_list

No, it doesn't.

Greetings,
Rafael
Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents
On Tuesday, 25 September 2007 14:52, Rafael J. Wysocki wrote:
> On Tuesday, 25 September 2007 14:28, Thomas Gleixner wrote:
> > On Tue, 2007-09-25 at 14:20 +0200, Rafael J. Wysocki wrote:
> > > > > > As i can see from the log, you are booting on computer with
> > > > > > dualcore AMD processor. Do you have C1E feature enabled?
> > >
> > > That's possible, how to check?
> > >
> > > > > > i386 kernel disable lapic on dualcore AMD with C1E support (see
> > > > > > http://lkml.org/lkml/2007/3/29/199). x86_64 kernel do not have this
> > > > > > patch still (it's required for tickless kernel only).
> > > > >
> > > > > Well it is required for non tickless mode as well.
> > > > >
> > > > > > As result, if
> > > > > > you run x86_64 kernel with hrt patch on such computer, the system
> > > > > > will stall during boot on lapic timer calibration.
> > > > >
> > > > > Thanks for the reminder. I have a look into this.
> > > >
> > > > Can you please boot mainline and provide the output of:
> > > >
> > > > # cat /proc/interrupts; sleep 10; cat /proc/interrupts
> > >
> > > albercik:~ # cat /proc/interrupts; sleep 10; cat /proc/interrupts
> > > CPU0 CPU1
> > > 0:1159492 0 local-APIC-edge timer
> > > LOC: 01158220 Local interrupts
> > >
> > > 0:1161996 0 local-APIC-edge timer
> > > LOC: 01160723 Local interrupts
> >
> > Hmm. That's strange. It looks like the local apic timer is not used, but
> > x86_64 definitely lacks the above check.
>
> Ouch, sorry. This is from the kernel booted with "noapictimer".
>
> I'll get the correct output in a little while.

OK, this one is from -rc7 with no extra command line:

albercik:~ # cat /proc/interrupts; sleep 10; cat /proc/interrupts
 CPU0 CPU1
 0: 27311 0 local-APIC-edge timer
 1: 1 77 IO-APIC-edge i8042
 8: 0 0 IO-APIC-edge rtc
 12: 0148 IO-APIC-edge i8042
 14: 19683 IO-APIC-edge ide0
 16:178 12443 IO-APIC-fasteoi sata_sil, HDA Intel
 19:111 15197 IO-APIC-fasteoi ehci_hcd:usb1, ohci_hcd:usb2, ohci_hcd:usb3
 20: 0 3 IO-APIC-fasteoi tifm_7xx1, yenta, sdhci:slot0, ohci1394
 21: 0113 IO-APIC-fasteoi acpi
NMI: 0 0
LOC: 27270 27119
ERR: 2
 CPU0 CPU1
 0: 29815 0 local-APIC-edge timer
 1: 1 77 IO-APIC-edge i8042
 8: 0 0 IO-APIC-edge rtc
 12: 0148 IO-APIC-edge i8042
 14: 20772 IO-APIC-edge ide0
 16:178 12451 IO-APIC-fasteoi sata_sil, HDA Intel
 19:112 17199 IO-APIC-fasteoi ehci_hcd:usb1, ohci_hcd:usb2, ohci_hcd:usb3
 20: 0 3 IO-APIC-fasteoi tifm_7xx1, yenta, sdhci:slot0, ohci1394
 21: 0117 IO-APIC-fasteoi acpi
NMI: 0 0
LOC: 29774 29623
ERR: 2
albercik:~ #

Greetings,
Rafael
Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents
On Tuesday, 25 September 2007 14:28, Thomas Gleixner wrote:
> On Tue, 2007-09-25 at 14:20 +0200, Rafael J. Wysocki wrote:
> > > > > As i can see from the log, you are booting on computer with dualcore
> > > > > AMD processor. Do you have C1E feature enabled?
> >
> > That's possible, how to check?
> >
> > > > > i386 kernel disable lapic on dualcore AMD with C1E support (see
> > > > > http://lkml.org/lkml/2007/3/29/199). x86_64 kernel do not have this
> > > > > patch still (it's required for tickless kernel only).
> > > >
> > > > Well it is required for non tickless mode as well.
> > > >
> > > > > As result, if
> > > > > you run x86_64 kernel with hrt patch on such computer, the system
> > > > > will stall during boot on lapic timer calibration.
> > > >
> > > > Thanks for the reminder. I have a look into this.
> > >
> > > Can you please boot mainline and provide the output of:
> > >
> > > # cat /proc/interrupts; sleep 10; cat /proc/interrupts
> >
> > albercik:~ # cat /proc/interrupts; sleep 10; cat /proc/interrupts
> > CPU0 CPU1
> > 0:1159492 0 local-APIC-edge timer
> > LOC: 01158220 Local interrupts
> >
> > 0:1161996 0 local-APIC-edge timer
> > LOC: 01160723 Local interrupts
>
> Hmm. That's strange. It looks like the local apic timer is not used, but
> x86_64 definitely lacks the above check.

Ouch, sorry. This is from the kernel booted with "noapictimer".

I'll get the correct output in a little while.

Greetings,
Rafael
Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents
On Tue, 2007-09-25 at 14:20 +0200, Rafael J. Wysocki wrote:
> > > > As i can see from the log, you are booting on computer with dualcore AMD
> > > > processor. Do you have C1E feature enabled?
>
> That's possible, how to check?
>
> > > > i386 kernel disable lapic on dualcore AMD with C1E support (see
> > > > http://lkml.org/lkml/2007/3/29/199). x86_64 kernel do not have this
> > > > patch still (it's required for tickless kernel only).
> > >
> > > Well it is required for non tickless mode as well.
> > >
> > > > As result, if
> > > > you run x86_64 kernel with hrt patch on such computer, the system
> > > > will stall during boot on lapic timer calibration.
> > >
> > > Thanks for the reminder. I have a look into this.
> >
> > Can you please boot mainline and provide the output of:
> >
> > # cat /proc/interrupts; sleep 10; cat /proc/interrupts
>
> albercik:~ # cat /proc/interrupts; sleep 10; cat /proc/interrupts
> CPU0 CPU1
> 0:1159492 0 local-APIC-edge timer
> LOC: 01158220 Local interrupts
>
> 0:1161996 0 local-APIC-edge timer
> LOC: 01160723 Local interrupts

Hmm. That's strange. It looks like the local apic timer is not used, but
x86_64 definitely lacks the above check.

Can you please remove/disable the acpi processor module and recheck ?

tglx
Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents
Thomas,

On Tuesday, 25 September 2007 11:30, Thomas Gleixner wrote:
> Rafael,
>
> On Tue, 2007-09-25 at 10:07 +0200, Thomas Gleixner wrote:
> > On Tue, 2007-09-25 at 10:14 +0400, Mikhail Kshevetskiy wrote:
> > > Hello Thomas, Rafael
> > >
> > > > We know, that
> > > > - disabling local apic timers work
> > >
> > > As i can see from the log, you are booting on computer with dualcore AMD
> > > processor. Do you have C1E feature enabled?

That's possible, how to check?

> > > i386 kernel disable lapic on dualcore AMD with C1E support (see
> > > http://lkml.org/lkml/2007/3/29/199). x86_64 kernel do not have this
> > > patch still (it's required for tickless kernel only).
> >
> > Well it is required for non tickless mode as well.
> >
> > > As result, if
> > > you run x86_64 kernel with hrt patch on such computer, the system
> > > will stall during boot on lapic timer calibration.
> >
> > Thanks for the reminder. I have a look into this.
>
> Can you please boot mainline and provide the output of:
>
> # cat /proc/interrupts; sleep 10; cat /proc/interrupts

albercik:~ # cat /proc/interrupts; sleep 10; cat /proc/interrupts
 CPU0 CPU1
 0:1159492 0 local-APIC-edge timer
 1: 6892 1692 IO-APIC-edge i8042
 8: 0 0 IO-APIC-edge rtc
 12:156110 IO-APIC-edge i8042
 14: 29613 11409 IO-APIC-edge ide0
 16: 23365 21934 IO-APIC-fasteoi sata_sil, HDA Intel
 18:196 88386 IO-APIC-fasteoi bcm43xx
 19: 744874 279433 IO-APIC-fasteoi ohci_hcd:usb1, ohci_hcd:usb2, ehci_hcd:usb3
 20: 2 4 IO-APIC-fasteoi ohci1394, yenta, tifm_7xx1, sdhci:slot0
 21: 1408592 IO-APIC-fasteoi acpi
NMI: 0 0 Non-maskable interrupts
LOC: 01158220 Local interrupts
RES: 260520 295387 Rescheduling interrupts
CAL:419652 function call interrupts
TLB:864541 TLB shootdowns
TRM: 0 0 Thermal event interrupts
THR: 0 0 Threshold APIC interrupts
SPU: 0 0 Spurious interrupts
ERR: 13
 CPU0 CPU1
 0:1161996 0 local-APIC-edge timer
 1: 6893 1692 IO-APIC-edge i8042
 8: 0 0 IO-APIC-edge rtc
 12:156110 IO-APIC-edge i8042
 14: 29703 11409 IO-APIC-edge ide0
 16: 23393 21934 IO-APIC-fasteoi sata_sil, HDA Intel
 18:196 88490 IO-APIC-fasteoi bcm43xx
 19: 747268 279433 IO-APIC-fasteoi ohci_hcd:usb1, ohci_hcd:usb2, ehci_hcd:usb3
 20: 2 4 IO-APIC-fasteoi ohci1394, yenta, tifm_7xx1, sdhci:slot0
 21: 1412592 IO-APIC-fasteoi acpi
NMI: 0 0 Non-maskable interrupts
LOC: 01160723 Local interrupts
RES: 260567 295433 Rescheduling interrupts
CAL:419652 function call interrupts
TLB:866543 TLB shootdowns
TRM: 0 0 Thermal event interrupts
THR: 0 0 Threshold APIC interrupts
SPU: 0 0 Spurious interrupts
ERR: 13
albercik:~ #
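[Editor's note] On the "how to check?" question about C1E: one possible userland check is to read the AMD K8 "interrupt pending" MSR and test its C1E bits. The sketch below is an assumption-laden illustration, not something given in this thread: the MSR address 0xC0010055 and the mask 0x18000000 (bits 27 and 28) are quoted from memory of the i386 workaround of that era and should be verified against AMD's documentation for the specific CPU. Reading /dev/cpu/N/msr requires root and the msr driver.

```python
# Sketch: test whether C1E appears to be enabled on an AMD K8-class CPU.
# MSR address and bit mask are assumptions to verify against AMD docs.

import os
import struct

MSR_K8_INT_PENDING_MSG = 0xC0010055   # assumed MSR address
C1E_ACTIVE_MASK = 0x18000000          # assumed: bits 27 and 28


def c1e_active(msr_value: int) -> bool:
    """True if either assumed C1E bit is set in the MSR value."""
    return bool(msr_value & C1E_ACTIVE_MASK)


def read_msr(cpu: int, reg: int) -> int:
    """Read one 64-bit MSR through the msr character device (root only)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        return struct.unpack("<Q", os.pread(fd, 8, reg))[0]
    finally:
        os.close(fd)


if __name__ == "__main__":
    try:
        value = read_msr(0, MSR_K8_INT_PENDING_MSG)
    except OSError as exc:
        print("cannot read MSR (need root and the msr module?):", exc)
    else:
        print(f"MSR {MSR_K8_INT_PENDING_MSG:#x} = {value:#018x},",
              "C1E bits set" if c1e_active(value) else "C1E bits clear")
```

Note that firmware may enable C1E later (for example when AC power is removed), so a single reading at boot does not rule it out.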
Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents
Rafael,

On Tue, 2007-09-25 at 10:07 +0200, Thomas Gleixner wrote:
> On Tue, 2007-09-25 at 10:14 +0400, Mikhail Kshevetskiy wrote:
> > Hello Thomas, Rafael
> >
> > > We know, that
> > > - disabling local apic timers work
> >
> > As i can see from the log, you are booting on computer with dualcore AMD
> > processor. Do you have C1E feature enabled?
> >
> > i386 kernel disable lapic on dualcore AMD with C1E support (see
> > http://lkml.org/lkml/2007/3/29/199). x86_64 kernel do not have this
> > patch still (it's required for tickless kernel only).
>
> Well it is required for non tickless mode as well.
>
> > As result, if
> > you run x86_64 kernel with hrt patch on such computer, the system
> > will stall during boot on lapic timer calibration.
>
> Thanks for the reminder. I have a look into this.

Can you please boot mainline and provide the output of:

# cat /proc/interrupts; sleep 10; cat /proc/interrupts

Thanks,

tglx
Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents
On Tue, 2007-09-25 at 10:14 +0400, Mikhail Kshevetskiy wrote:
> Hello Thomas, Rafael
>
> > We know, that
> > - disabling local apic timers work
>
> As i can see from the log, you are booting on computer with dualcore AMD
> processor. Do you have C1E feature enabled?
>
> i386 kernel disable lapic on dualcore AMD with C1E support (see
> http://lkml.org/lkml/2007/3/29/199). x86_64 kernel do not have this
> patch still (it's required for tickless kernel only).

Well it is required for non tickless mode as well.

> As result, if
> you run x86_64 kernel with hrt patch on such computer, the system
> will stall during boot on lapic timer calibration.

Thanks for the reminder. I have a look into this.

tglx
Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents
On Tue, 2007-09-25 at 10:14 +0400, Mikhail Kshevetskiy wrote: Hello Thomas, Rafael We know, that - disabling local apic timers work As i can see from the log, you are booting on computer with dualcore AMD processor. Do you have C1E feature enabled? i386 kernel disable lapic on dualcore AMD with C1E support (see http://lkml.org/lkml/2007/3/29/199). x86_64 kernel do not have this patch still (it's required for tickless kernel only). Well it is required for non tickless mode as well. As result, if you run x86_64 kernel with hrt patch on such computer, the system will stall during boot on lapic timer calibration. Thanks for the reminder. I have a look into this. tglx - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents
Rafael, On Tue, 2007-09-25 at 10:07 +0200, Thomas Gleixner wrote: On Tue, 2007-09-25 at 10:14 +0400, Mikhail Kshevetskiy wrote: Hello Thomas, Rafael We know, that - disabling local apic timers work As i can see from the log, you are booting on computer with dualcore AMD processor. Do you have C1E feature enabled? i386 kernel disable lapic on dualcore AMD with C1E support (see http://lkml.org/lkml/2007/3/29/199). x86_64 kernel do not have this patch still (it's required for tickless kernel only). Well it is required for non tickless mode as well. As result, if you run x86_64 kernel with hrt patch on such computer, the system will stall during boot on lapic timer calibration. Thanks for the reminder. I have a look into this. Can you please boot mainline and provide the output of: # cat /proc/interrupts; sleep 10; cat /proc/interrupts Thanks, tglx - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents
Thomas, On Tuesday, 25 September 2007 11:30, Thomas Gleixner wrote: Rafael, On Tue, 2007-09-25 at 10:07 +0200, Thomas Gleixner wrote: On Tue, 2007-09-25 at 10:14 +0400, Mikhail Kshevetskiy wrote: Hello Thomas, Rafael We know, that - disabling local apic timers work As i can see from the log, you are booting on computer with dualcore AMD processor. Do you have C1E feature enabled? That's possible, how to check? i386 kernel disable lapic on dualcore AMD with C1E support (see http://lkml.org/lkml/2007/3/29/199). x86_64 kernel do not have this patch still (it's required for tickless kernel only). Well it is required for non tickless mode as well. As result, if you run x86_64 kernel with hrt patch on such computer, the system will stall during boot on lapic timer calibration. Thanks for the reminder. I have a look into this. Can you please boot mainline and provide the output of: # cat /proc/interrupts; sleep 10; cat /proc/interrupts albercik:~ # cat /proc/interrupts; sleep 10; cat /proc/interrupts CPU0 CPU1 0:1159492 0 local-APIC-edge timer 1: 6892 1692 IO-APIC-edge i8042 8: 0 0 IO-APIC-edge rtc 12:156110 IO-APIC-edge i8042 14: 29613 11409 IO-APIC-edge ide0 16: 23365 21934 IO-APIC-fasteoi sata_sil, HDA Intel 18:196 88386 IO-APIC-fasteoi bcm43xx 19: 744874 279433 IO-APIC-fasteoi ohci_hcd:usb1, ohci_hcd:usb2, ehci_hcd:usb3 20: 2 4 IO-APIC-fasteoi ohci1394, yenta, tifm_7xx1, sdhci:slot0 21: 1408592 IO-APIC-fasteoi acpi NMI: 0 0 Non-maskable interrupts LOC: 01158220 Local interrupts RES: 260520 295387 Rescheduling interrupts CAL:419652 function call interrupts TLB:864541 TLB shootdowns TRM: 0 0 Thermal event interrupts THR: 0 0 Threshold APIC interrupts SPU: 0 0 Spurious interrupts ERR: 13 CPU0 CPU1 0:1161996 0 local-APIC-edge timer 1: 6893 1692 IO-APIC-edge i8042 8: 0 0 IO-APIC-edge rtc 12:156110 IO-APIC-edge i8042 14: 29703 11409 IO-APIC-edge ide0 16: 23393 21934 IO-APIC-fasteoi sata_sil, HDA Intel 18:196 88490 IO-APIC-fasteoi bcm43xx 19: 747268 279433 IO-APIC-fasteoi 
ohci_hcd:usb1, ohci_hcd:usb2, ehci_hcd:usb3 20: 2 4 IO-APIC-fasteoi ohci1394, yenta, tifm_7xx1, sdhci:slot0 21: 1412592 IO-APIC-fasteoi acpi NMI: 0 0 Non-maskable interrupts LOC: 01160723 Local interrupts RES: 260567 295433 Rescheduling interrupts CAL:419652 function call interrupts TLB:866543 TLB shootdowns TRM: 0 0 Thermal event interrupts THR: 0 0 Threshold APIC interrupts SPU: 0 0 Spurious interrupts ERR: 13 albercik:~ # - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents
On Tue, 2007-09-25 at 14:20 +0200, Rafael J. Wysocki wrote: As i can see from the log, you are booting on computer with dualcore AMD processor. Do you have C1E feature enabled? That's possible, how to check? i386 kernel disable lapic on dualcore AMD with C1E support (see http://lkml.org/lkml/2007/3/29/199). x86_64 kernel do not have this patch still (it's required for tickless kernel only). Well it is required for non tickless mode as well. As result, if you run x86_64 kernel with hrt patch on such computer, the system will stall during boot on lapic timer calibration. Thanks for the reminder. I have a look into this. Can you please boot mainline and provide the output of: # cat /proc/interrupts; sleep 10; cat /proc/interrupts albercik:~ # cat /proc/interrupts; sleep 10; cat /proc/interrupts CPU0 CPU1 0:1159492 0 local-APIC-edge timer LOC: 01158220 Local interrupts 0:1161996 0 local-APIC-edge timer LOC: 01160723 Local interrupts Hmm. That's strange. It looks like the local apic timer is not used, but x86_64 definitely lacks the above check. Can you please remove/disable the acpi processor module and recheck ? tglx - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents
On Tuesday, 25 September 2007 14:28, Thomas Gleixner wrote: On Tue, 2007-09-25 at 14:20 +0200, Rafael J. Wysocki wrote: As i can see from the log, you are booting on computer with dualcore AMD processor. Do you have C1E feature enabled? That's possible, how to check? i386 kernel disable lapic on dualcore AMD with C1E support (see http://lkml.org/lkml/2007/3/29/199). x86_64 kernel do not have this patch still (it's required for tickless kernel only). Well it is required for non tickless mode as well. As result, if you run x86_64 kernel with hrt patch on such computer, the system will stall during boot on lapic timer calibration. Thanks for the reminder. I have a look into this. Can you please boot mainline and provide the output of: # cat /proc/interrupts; sleep 10; cat /proc/interrupts albercik:~ # cat /proc/interrupts; sleep 10; cat /proc/interrupts CPU0 CPU1 0:1159492 0 local-APIC-edge timer LOC: 01158220 Local interrupts 0:1161996 0 local-APIC-edge timer LOC: 01160723 Local interrupts Hmm. That's strange. It looks like the local apic timer is not used, but x86_64 definitely lacks the above check. Ouch, sorry. This is from the kernel booted with noapictimer. I'll get the correct output in a little while. Greetings, Rafael - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents
On Tuesday, 25 September 2007 14:52, Rafael J. Wysocki wrote: On Tuesday, 25 September 2007 14:28, Thomas Gleixner wrote: On Tue, 2007-09-25 at 14:20 +0200, Rafael J. Wysocki wrote: As i can see from the log, you are booting on computer with dualcore AMD processor. Do you have C1E feature enabled? That's possible, how to check? i386 kernel disable lapic on dualcore AMD with C1E support (see http://lkml.org/lkml/2007/3/29/199). x86_64 kernel do not have this patch still (it's required for tickless kernel only). Well it is required for non tickless mode as well. As result, if you run x86_64 kernel with hrt patch on such computer, the system will stall during boot on lapic timer calibration. Thanks for the reminder. I have a look into this. Can you please boot mainline and provide the output of: # cat /proc/interrupts; sleep 10; cat /proc/interrupts albercik:~ # cat /proc/interrupts; sleep 10; cat /proc/interrupts CPU0 CPU1 0:1159492 0 local-APIC-edge timer LOC: 01158220 Local interrupts 0:1161996 0 local-APIC-edge timer LOC: 01160723 Local interrupts Hmm. That's strange. It looks like the local apic timer is not used, but x86_64 definitely lacks the above check. Ouch, sorry. This is from the kernel booted with noapictimer. I'll get the correct output in a little while. 
OK, this one is from -rc7 with no extra command line: albercik:~ # cat /proc/interrupts; sleep 10; cat /proc/interrupts CPU0 CPU1 0: 27311 0 local-APIC-edge timer 1: 1 77 IO-APIC-edge i8042 8: 0 0 IO-APIC-edge rtc 12: 0148 IO-APIC-edge i8042 14: 19683 IO-APIC-edge ide0 16:178 12443 IO-APIC-fasteoi sata_sil, HDA Intel 19:111 15197 IO-APIC-fasteoi ehci_hcd:usb1, ohci_hcd:usb2, ohci_hcd:usb3 20: 0 3 IO-APIC-fasteoi tifm_7xx1, yenta, sdhci:slot0, ohci1394 21: 0113 IO-APIC-fasteoi acpi NMI: 0 0 LOC: 27270 27119 ERR: 2 CPU0 CPU1 0: 29815 0 local-APIC-edge timer 1: 1 77 IO-APIC-edge i8042 8: 0 0 IO-APIC-edge rtc 12: 0148 IO-APIC-edge i8042 14: 20772 IO-APIC-edge ide0 16:178 12451 IO-APIC-fasteoi sata_sil, HDA Intel 19:112 17199 IO-APIC-fasteoi ehci_hcd:usb1, ohci_hcd:usb2, ohci_hcd:usb3 20: 0 3 IO-APIC-fasteoi tifm_7xx1, yenta, sdhci:slot0, ohci1394 21: 0117 IO-APIC-fasteoi acpi NMI: 0 0 LOC: 29774 29623 ERR: 2 albercik:~ # Greetings, Rafael - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents
On Monday, 24 September 2007 21:13, Thomas Gleixner wrote:
> On Mon, 2007-09-24 at 21:11 +0200, Rafael J. Wysocki wrote:
> > > /me scratches head
> >
> > Retested.
> >
> > > We know, that
> > > - disabling local apic timers work
> >
> > This works reproducibly across the board.
>
> Ok
>
> > > - local apic timers (which turn off PIT) work. when noacpiFSCKEDPARSING
> >
> > This stopped working, although it evidently worked yesterday (wtf?).
> >
> > There seems to be a history effect in the box, to make things more
> > "interesting".
>
> Did you connect this box to Andrews VAIO during KS ?

No, but it's famous for being interestingly broken nevertheless.

> > I think the only solid data point so far is that "noapictimer" makes the box
> > boot.
>
> Ok. Can you add "nmi_watchdog=1" to the command line please. This runs
> through the calibration of APIC, but registers it as a dummy clock
> source (the PIT must run to make the watchdog work).
>
> If it boots, please provide the output of /proc/timer_list

No, it doesn't.

Greetings,
Rafael
Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents
On Tue, 2007-09-25 at 15:16 +0200, Rafael J. Wysocki wrote:
> > > There seems to be a history effect in the box, to make things more
> > > "interesting".
> >
> > Did you connect this box to Andrews VAIO during KS ?
>
> No, but it's famous for being interestingly broken nevertheless.

:)

> > > I think the only solid data point so far is that "noapictimer" makes the box
> > > boot.
> >
> > Ok. Can you add "nmi_watchdog=1" to the command line please. This runs
> > through the calibration of APIC, but registers it as a dummy clock
> > source (the PIT must run to make the watchdog work).
> >
> > If it boots, please provide the output of /proc/timer_list
>
> No, it doesn't.

I start to get desperate. Below is a patch, which moves the apic timer disable check after the calibration routine. Can you please apply on top of -hrt and add noapictimer to the command line ? Does it boot ?

	tglx

Index: linux-2.6.23-rc7/arch/x86_64/kernel/apic.c
===
--- linux-2.6.23-rc7.orig/arch/x86_64/kernel/apic.c	2007-09-24 20:30:00.0 +0200
+++ linux-2.6.23-rc7/arch/x86_64/kernel/apic.c	2007-09-25 15:05:32.0 +0200
@@ -927,6 +927,7 @@ static void __init calibrate_APIC_clock(
 
 void __init setup_boot_APIC_clock (void)
 {
+#if 0
 	/*
 	 * The local apic timer can be disabled via the kernel commandline.
 	 * Register the lapic timer as a dummy clock event source on SMP
@@ -940,7 +941,7 @@ void __init setup_boot_APIC_clock (void)
 		setup_APIC_timer();
 		return;
 	}
-
+#endif
 	printk(KERN_INFO "Using local APIC timer interrupts.\n");
 	calibrate_APIC_clock();
 
@@ -949,11 +950,13 @@ void __init setup_boot_APIC_clock (void)
 	 * PIT/HPET going. Otherwise register lapic as a dummy
 	 * device.
 	 */
-	if (nmi_watchdog != NMI_IO_APIC)
+	if (!disable_apic_timer && nmi_watchdog != NMI_IO_APIC)
 		lapic_clockevent.features &= ~CLOCK_EVT_FEAT_DUMMY;
+#if 0
 	else
 		printk(KERN_WARNING "APIC timer registered as dummy,"
 			" due to nmi_watchdog=1!\n");
+#endif
 
 	setup_APIC_timer();
 }
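What the debug patch above changes can be tabulated with a small model. This is only a sketch of the control flow as it reads from the diff, under stated assumptions: the condition in the last hunk is taken to be a logical AND, lapic_clockevent is taken to start out with CLOCK_EVT_FEAT_DUMMY set, and the numeric values of the NMI constants are illustrative.

```python
# Illustrative values; the thread only implies that nmi_watchdog=1 selects NMI_IO_APIC.
NMI_NONE, NMI_IO_APIC = 0, 1


def setup_boot_apic_clock(disable_apic_timer, nmi_watchdog, patched):
    """Model of setup_boot_APIC_clock(): return (calibration_ran, lapic_is_dummy)."""
    if disable_apic_timer and not patched:
        # Unpatched: noapictimer registers the dummy device and returns early,
        # so calibrate_APIC_clock() is never reached.
        return (False, True)

    calibration_ran = True   # calibrate_APIC_clock() is called on this path
    lapic_is_dummy = True    # assumed initial state: DUMMY feature bit set

    if patched:
        keep_real = (not disable_apic_timer) and nmi_watchdog != NMI_IO_APIC
    else:
        keep_real = nmi_watchdog != NMI_IO_APIC
    if keep_real:
        lapic_is_dummy = False  # models clearing CLOCK_EVT_FEAT_DUMMY

    return (calibration_ran, lapic_is_dummy)


for patched in (False, True):
    for noapictimer in (False, True):
        print(patched, noapictimer,
              setup_boot_apic_clock(noapictimer, NMI_NONE, patched))
```

In this model the only difference the patch makes with noapictimer is that calibration runs before the dummy device is registered, which is the step the test is meant to isolate.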
Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents
On Tuesday, 25 September 2007 15:17, Thomas Gleixner wrote:
> On Tue, 2007-09-25 at 15:16 +0200, Rafael J. Wysocki wrote:
[--snip--]
> I start to get desperate. Below is a patch, which moves the apic timer
> disable check after the calibration routine. Can you please apply on top
> of -hrt and add noapictimer to the command line ? Does it boot ?

2.6.23-rc7 with patch-2.6.23-rc7-hrt1.patch and the patch below applied boots with noapictimer and doesn't boot without it.

Also, attached is the output of

# cat /proc/interrupts; sleep 10; cat /proc/interrupts

from the current mainline.

Greetings,
Rafael

Index: linux-2.6.23-rc7/arch/x86_64/kernel/apic.c
===
--- linux-2.6.23-rc7.orig/arch/x86_64/kernel/apic.c	2007-09-24 20:30:00.0 +0200
+++ linux-2.6.23-rc7/arch/x86_64/kernel/apic.c	2007-09-25 15:05:32.0 +0200
@@ -927,6 +927,7 @@ static void __init calibrate_APIC_clock(
 
 void __init setup_boot_APIC_clock (void)
 {
+#if 0
 	/*
 	 * The local apic timer can be disabled via the kernel commandline.
 	 * Register the lapic timer as a dummy clock event source on SMP
@@ -940,7 +941,7 @@ void __init setup_boot_APIC_clock (void)
 		setup_APIC_timer();
 		return;
 	}
-
+#endif
 	printk(KERN_INFO "Using local APIC timer interrupts.\n");
 	calibrate_APIC_clock();
 
@@ -949,11 +950,13 @@ void __init setup_boot_APIC_clock (void)
 	 * PIT/HPET going. Otherwise register lapic as a dummy
 	 * device.
 	 */
-	if (nmi_watchdog != NMI_IO_APIC)
+	if (!disable_apic_timer && nmi_watchdog != NMI_IO_APIC)
 		lapic_clockevent.features &= ~CLOCK_EVT_FEAT_DUMMY;
+#if 0
 	else
 		printk(KERN_WARNING "APIC timer registered as dummy,"
 			" due to nmi_watchdog=1!\n");
+#endif
 
 	setup_APIC_timer();
 }

-- 
Premature optimization is the root of all evil.
		- Donald Knuth

albercik:~ # cat /proc/interrupts; sleep 10; cat /proc/interrupts
CPU0 CPU1
0: 62489 0 local-APIC-edge timer
1: 3232 IO-APIC-edge i8042
8: 0 0 IO-APIC-edge rtc
12: 1147 IO-APIC-edge i8042
14: 15 1947 IO-APIC-edge ide0
16: 193 14151 IO-APIC-fasteoi sata_sil, HDA Intel
19: 76 43153 IO-APIC-fasteoi ohci_hcd:usb1, ehci_hcd:usb2, ohci_hcd:usb3
20: 0 4 IO-APIC-fasteoi ohci1394, tifm_7xx1, yenta, sdhci:slot0
21: 7172 IO-APIC-fasteoi acpi
NMI: 0 0
LOC: 62454 62082
ERR: 0

CPU0 CPU1
0: 64993 0 local-APIC-edge timer
1: 3233 IO-APIC-edge i8042
8: 0 0 IO-APIC-edge rtc
12: 1147 IO-APIC-edge i8042
14: 15 2037 IO-APIC-edge ide0
16: 194 14265 IO-APIC-fasteoi sata_sil, HDA Intel
19: 77 45155 IO-APIC-fasteoi ohci_hcd:usb1, ehci_hcd:usb2, ohci_hcd:usb3
20: 0 4 IO-APIC-fasteoi ohci1394, tifm_7xx1, yenta, sdhci:slot0
21: 7176 IO-APIC-fasteoi acpi
NMI: 0 0
LOC: 64958 64586
ERR: 0
albercik:~ #
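The two snapshots above are easier to read as rates. A minimal sketch, assuming the usual /proc/interrupts layout (a label before the first colon, then one counter per CPU, then a description); the helper names are made up for illustration, and only the timer and LOC rows from the mainline output above are used as sample data.

```python
def parse_interrupts(text):
    """Map each row label to its list of per-CPU counters."""
    counts = {}
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        if not sep:
            continue  # the "CPU0 CPU1" header has no colon
        values = []
        for token in rest.split():
            if not token.isdigit():
                break  # reached the controller/device description
            values.append(int(token))
        counts[label.strip()] = values
    return counts


def rates(before, after, seconds):
    """Per-CPU interrupts per second for every row present in both snapshots."""
    first, second = parse_interrupts(before), parse_interrupts(after)
    return {
        irq: [(new - old) / seconds for old, new in zip(first[irq], second[irq])]
        for irq in first
        if irq in second
    }


BEFORE = """CPU0 CPU1
0: 62489 0 local-APIC-edge timer
LOC: 62454 62082
"""
AFTER = """CPU0 CPU1
0: 64993 0 local-APIC-edge timer
LOC: 64958 64586
"""

result = rates(BEFORE, AFTER, 10)
print(result)
```

Both IRQ 0 on CPU0 and the LOC counters on both CPUs advance by 2504 in the 10 second window, roughly 250 per second, which would be consistent with a 250 Hz tick. The `sleep 10` is not an exact interval, so the rate is approximate.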
Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents
Rafael,

On Tue, 2007-09-25 at 22:07 +0200, Rafael J. Wysocki wrote:
> On Tuesday, 25 September 2007 15:17, Thomas Gleixner wrote:
> > On Tue, 2007-09-25 at 15:16 +0200, Rafael J. Wysocki wrote:
> [--snip--]
> > I start to get desperate. Below is a patch, which moves the apic timer
> > disable check after the calibration routine. Can you please apply on top
> > of -hrt and add noapictimer to the command line ? Does it boot ?
>
> 2.6.23-rc7 with patch-2.6.23-rc7-hrt1.patch and the patch below applied boots
> with noapictimer and doesn't boot without it.

That was expected. I explicitly asked to add noapictimer to the kernel command line.

Ok, so we ruled out the apic timer calibration routine. I did not expect that this would be the culprit, but with dark screen as the only debug info, I need to resort to small steps.

Can you please send me the output of /proc/timer_list of 2.6.23-rc7-hrt1 after booting with noapictimer ?

I'm a bit confused by your earlier confirmation, that mainline w/o the -hrt patches boots fine, when you add apicmaintimer to the kernel command line. apicmaintimer stops the PIT like we do in -hrt and we just use the local APIC timer for everything. Can you please retest and confirm that this is correct ?

Is the 32 bit kernel working on that box ?

Thanks for your patience.

	tglx

PS: I just sent out the "disable APIC timer for AMD C1E boxen" patch. We debugged this half a year ago on a nx6325, but I completely forgot about that. The explanation from AMD was sensible, but your "apicmaintimer works" statement is contradictory.
Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents
Thomas,

On Tuesday, 25 September 2007 22:46, Thomas Gleixner wrote:
> Rafael,
>
> On Tue, 2007-09-25 at 22:07 +0200, Rafael J. Wysocki wrote:
> > On Tuesday, 25 September 2007 15:17, Thomas Gleixner wrote:
> > > On Tue, 2007-09-25 at 15:16 +0200, Rafael J. Wysocki wrote:
> > [--snip--]
> > > I start to get desperate. Below is a patch, which moves the apic timer
> > > disable check after the calibration routine. Can you please apply on top
> > > of -hrt and add noapictimer to the command line ? Does it boot ?
> >
> > 2.6.23-rc7 with patch-2.6.23-rc7-hrt1.patch and the patch below applied boots
> > with noapictimer and doesn't boot without it.
>
> That was expected. I explicitly asked to add noapictimer to the kernel
> command line.
>
> Ok, so we ruled out the apic timer calibration routine. I did not expect
> that this would be the culprit, but with dark screen as the only debug
> info, I need to resort to small steps.
>
> Can you please send me the output of /proc/timer_list of 2.6.23-rc7-hrt1
> after booting with noapictimer ?

Sure, attached. [Note: the kernel has been compiled with both NO_HZ and HIGH_RES_TIMERS unset.]

> I'm a bit confused by your earlier confirmation, that mainline w/o the
> -hrt patches boots fine, when you add apicmaintimer to the kernel
> command line. apicmaintimer stops the PIT like we do in -hrt and we just
> use the local APIC timer for everything. Can you please retest and
> confirm that this is correct ?

No, it's not. The mainline _usually_ doesn't boot with apicmaintimer. It seems to me that _sometimes_ the CPU just doesn't enter this C1E state and then everything goes fine ...

> Is the 32 bit kernel working on that box ?

Can't tell, I have only 64-bit userland here.

> Thanks for your patience.

Well, I'm only making sure that future kernels will run on my box. ;-)

> 	tglx
>
> PS: I just sent out the "disable APIC timer for AMD C1E boxen" patch.

Yes, I've already tested it and sent a reply. It works. :-)

> We debugged this half a year ago on a nx6325, but I completely forgot
> about that. The explanation from AMD was sensible, but your
> "apicmaintimer works" statement is contradictory.

Well, it was wrong.

I have some problems with resuming from suspend to RAM using 2.6.23-rc8-mm1 with this patch applied, but I think they are related to something else. I'll wait for the next -mm with debugging that.

For now, I'm going to build 2.6.23-rc8 with my collection of suspend patches plus patch-2.6.23-rc7-hrt1.patch and the "disable APIC timer for AMD C1E boxes" patch applied. I'll play with that a bit and let you know how it's behaving.

Greetings,
Rafael

Timer List Version: v0.3
HRTIMER_MAX_CLOCK_BASES: 2
now at 279792107058 nsecs

cpu: 0
 clock 0:
  .index: 0
  .resolution: 4000250 nsecs
  .get_time: ktime_get_real
active timers:
 clock 1:
  .index: 1
  .resolution: 4000250 nsecs
  .get_time: ktime_get
active timers:
 #0: 81004f98bda8, hrtimer_wakeup, S:01, do_nanosleep, kwrapper/4664
 # expires at 280207419178 nsecs [in 415312120 nsecs]
 #1: 81004f98bda8, hrtimer_wakeup, S:01, futex_wait, nscd/4080
 # expires at 282678021548 nsecs [in 2885914490 nsecs]
 #2: 81004f98bda8, hrtimer_wakeup, S:01, futex_wait, nscd/4082
 # expires at 282678129670 nsecs [in 2886022612 nsecs]
 #3: 81004f98bda8, it_real_fn, S:01, do_setitimer, qmgr/4239
 # expires at 378654389676 nsecs [in 98862282618 nsecs]
 #4: 81004f98bda8, it_real_fn, S:01, do_setitimer, pickup/4238
 # expires at 557809025993 nsecs [in 278016918935 nsecs]
 #5: 81004f98bda8, it_real_fn, S:01, do_setitimer, master/4216
 # expires at 557809137746 nsecs [in 278017030688 nsecs]

cpu: 1
 clock 0:
  .index: 0
  .resolution: 4000250 nsecs
  .get_time: ktime_get_real
active timers:
 clock 1:
  .index: 1
  .resolution: 4000250 nsecs
  .get_time: ktime_get
active timers:
 #0: 81004f98bda8, it_real_fn, S:01, do_setitimer, Xorg/4355
 # expires at 279804542721 nsecs [in 12435663 nsecs]
 #1: 81004f98bda8, it_real_fn, S:01, do_setitimer, ssh-agent/4611
 # expires at 279962268496 nsecs [in 170161438 nsecs]
 #2: 81004f98bda8, hrtimer_wakeup, S:01, do_nanosleep, hald-addon-stor/4148
 # expires at 280071774352 nsecs [in 279667294 nsecs]
 #3: 81004f98bda8, hrtimer_wakeup, S:01, futex_wait, nscd/4081
 # expires at 282678034680 nsecs [in 2885927622 nsecs]
 #4: 81004f98bda8, hrtimer_wakeup, S:01, do_nanosleep, cron/4241
 # expires at 335311096287 nsecs [in 55518989229 nsecs]
 #5: 81004f98bda8, it_real_fn, S:01, do_setitimer, dhcpcd/5128
 # expires at 604918992928181 nsecs [in 604639200821123 nsecs]
 #6: 81004f98bda8, hrtimer_wakeup, S:01, do_nanosleep, dhcpcd/5128
 # expires at 604918992950531 nsecs [in 604639200843473 nsecs]

Tick Device: mode: 0
Clock Event Device: pit
 max_delta_ns: 27461866
 min_delta_ns: 12571
 mult: 5124677
 shift: 32
 mode: 2
 next_event: 9223372036854775807 nsecs
 set_next_event: pit_next_event
 set_mode: init_pit_timer
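The numbers in the timer_list dump above can be cross-checked with a little arithmetic. The sketch below assumes the common clockevents fixed-point convention, ticks = (ns * mult) >> shift, so that a tick count converts back as ns = (ticks << shift) / mult. The tick limits 0x7FFF and 0xF are an assumption about the PIT driver and do not appear in the thread; they are included because they reproduce the max_delta_ns and min_delta_ns values shown.

```python
MULT, SHIFT = 5124677, 32   # from the "Clock Event Device: pit" section
NOW_NS = 279792107058       # "now at ... nsecs"

# (expires_at_ns, remaining_ns) for a few of the cpu 0 timers listed above.
TIMERS = [
    (280207419178, 415312120),
    (282678021548, 2885914490),
    (378654389676, 98862282618),
]


def delta_to_ns(ticks, mult, shift):
    """Convert device ticks to nanoseconds (inverse of ns * mult >> shift)."""
    return (ticks << shift) // mult


def device_frequency_hz(mult, shift):
    """Tick rate implied by mult/shift: ticks per ns scaled to ticks per second."""
    return mult * 10**9 / 2**shift


print(device_frequency_hz(MULT, SHIFT))    # close to the 1.193182 MHz PIT clock
print(delta_to_ns(0x7FFF, MULT, SHIFT))    # compare with max_delta_ns
print(delta_to_ns(0xF, MULT, SHIFT))       # compare with min_delta_ns
print(all(expires - NOW_NS == remaining for expires, remaining in TIMERS))
```

The "expires at" and "[in ...]" columns agree with the "now at" timestamp, and the 4000250 ns resolution is what one would expect from a PIT programmed for a 250 Hz tick.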
Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents
Rafael,

On Tue, 2007-09-25 at 23:28 +0200, Rafael J. Wysocki wrote:
> > I'm a bit confused by your earlier confirmation, that mainline w/o the
> > -hrt patches boots fine, when you add apicmaintimer to the kernel
> > command line. apicmaintimer stops the PIT like we do in -hrt and we just
> > use the local APIC timer for everything. Can you please retest and
> > confirm that this is correct ?
>
> No, it's not. The mainline _usually_ doesn't boot with apicmaintimer. It seems
> to me that _sometimes_ the CPU just doesn't enter this C1E state and then
> everything goes fine ...

I'm relieved. I really started to go nuts on these contradicting patterns. Your box seems to be worse than the VAIO, it has some random surprise generator built in :)

> > Is the 32 bit kernel working on that box ?
>
> Can't tell, I have only 64-bit userland here.

Should be fine. The check is there since late 2.6.21-rc. I really could kick my own ass that I did not remember the nx6325 wreckage in the 2.6.21-rc time frame. Sigh, way too much broken hardware out there to keep track of it.

> > Thanks for your patience.
>
> Well, I'm only making sure that future kernels will run on my box. ;-)

Nothing wrong with that.

Thanks again for your help,

	tglx
Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents
On Mon, 2007-09-24 at 21:11 +0200, Rafael J. Wysocki wrote:
> > /me scratches head
>
> Retested.
>
> > We know, that
> > - disabling local apic timers work
>
> This works reproducibly across the board.

Ok

> > - local apic timers (which turn off PIT) work. when noacpiFSCKEDPARSING
>
> This stopped working, although it evidently worked yesterday (wtf?).
>
> There seems to be a history effect in the box, to make things more
> "interesting".

Did you connect this box to Andrews VAIO during KS ?

> I think the only solid data point so far is that "noapictimer" makes the box
> boot.

Ok. Can you add "nmi_watchdog=1" to the command line please. This runs through the calibration of APIC, but registers it as a dummy clock source (the PIT must run to make the watchdog work).

If it boots, please provide the output of /proc/timer_list

Thanks,

	tglx
Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents
On Monday, 24 September 2007 18:46, Thomas Gleixner wrote:
> On Mon, 2007-09-24 at 17:18 +0200, Rafael J. Wysocki wrote:
> > > > Well, "noacpi" seems to be a synonym for "pci=noacpi".
> > > >
> > > > Anyway, it causes acpi_disable_pci() to be executed, which according to
> > > > Documentation/kernel-parameters.txt means "Do not use ACPI for IRQ routing or
> > > > for PCI scanning" (it works like this on x86_64 too, although the doc says it's
> > > > x86_32-specific).
> > >
> > > Hrm. The local apic timer calibration does not use anything which is
> > > related to interrupts, but if we use the local APIC timer we switch off
> > > PIT.
> > >
> > > Can you boot Linus latest (w/o hrt patches) and add "apicmaintimer" to
> > > the kernel command line please ?
> >
> > Works, dmesg attached.
>
> /me scratches head

Retested.

> We know, that
> - disabling local apic timers work

This works reproducibly across the board.

> - local apic timers (which turn off PIT) work. when noacpiFSCKEDPARSING

This stopped working, although it evidently worked yesterday (wtf?).

There seems to be a history effect in the box, to make things more "interesting".

> is given on the kernel command line.
>
> I have no clue, what might be the difference of noacpiFSCKEDPARSING. The
> boot log is not giving any hint at all.
>
> acpi_disable_pci() sets acpi_pci_disabled and acpi_noirq to 1.
>
> What happens, if you set "acpi=noirq" instead ?

That obviously doesn't help.

I think the only solid data point so far is that "noapictimer" makes the box boot.

Greetings,
Rafael
Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents
On Mon, 2007-09-24 at 17:18 +0200, Rafael J. Wysocki wrote:
> > > Well, "noacpi" seems to be a synonym for "pci=noacpi".
> > >
> > > Anyway, it causes acpi_disable_pci() to be executed, which according to
> > > Documentation/kernel-parameters.txt means "Do not use ACPI for IRQ routing or
> > > for PCI scanning" (it works like this on x86_64 too, although the doc says it's
> > > x86_32-specific).
> >
> > Hrm. The local apic timer calibration does not use anything which is
> > related to interrupts, but if we use the local APIC timer we switch off
> > PIT.
> >
> > Can you boot Linus latest (w/o hrt patches) and add "apicmaintimer" to
> > the kernel command line please ?
>
> Works, dmesg attached.

/me scratches head

We know, that
- disabling local apic timers work
- local apic timers (which turn off PIT) work. when noacpiFSCKEDPARSING
  is given on the kernel command line.

I have no clue, what might be the difference of noacpiFSCKEDPARSING. The boot log is not giving any hint at all.

acpi_disable_pci() sets acpi_pci_disabled and acpi_noirq to 1.

What happens, if you set "acpi=noirq" instead ?

	tglx
Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents
On Monday, 24 September 2007 16:23, Thomas Gleixner wrote: > On Mon, 2007-09-24 at 15:52 +0200, Rafael J. Wysocki wrote: > > > > > So I really wonder, why noacpitimer on the kernel command line makes > > > > > any > > > > > difference. I'm confused. > > > > > > > > \metoo > > > > > > > > Well, it was probably read as "noacpi". :-) > > > > > > Hmm, ACPI is in the log all over the place. > > > > Well, "noacpi" seems to be a synonym for "pci=noacpi". > > > > Anyway, it causes acpi_disable_pci() to be executed, which according to > > Documentation/kernel-parameters.txt means "Do not use ACPI for IRQ routing > > or > > for PCI scanning" (it works like this on x86_64 too, although the doc says > > it's > > x86_32-specific). > > Hrm. The local apic timer calibration does not use anything which is > related to interrupts, but if we use the local APIC timer we switch off > PIT. > > Can you boot Linus latest (w/o hrt patches) and add "apicmaintimer" to > the kernel command line please ? Works, dmesg attached. Greetings, Rafael Linux version 2.6.23-rc7test ([EMAIL PROTECTED]) (gcc version 4.1.2 20061115 (prerelease) (SUSE Linux)) #19 SMP Mon Sep 24 16:55:05 CEST 2007 Command line: root=/dev/sda3 vga=792 resume=/dev/sda1 apicmaintimer apic=verbose 2 BIOS-provided physical RAM map: BIOS-e820: - 0009fc00 (usable) BIOS-e820: 0009fc00 - 000a (reserved) BIOS-e820: 000e - 0010 (reserved) BIOS-e820: 0010 - 77fd (usable) BIOS-e820: 77fd - 77fe5600 (reserved) BIOS-e820: 77fe5600 - 77ff8000 (ACPI NVS) BIOS-e820: 77ff8000 - 8000 (reserved) BIOS-e820: e000 - f000 (reserved) BIOS-e820: fec0 - fec02000 (reserved) BIOS-e820: ffbc - ffcc (reserved) BIOS-e820: fff0 - 0001 (reserved) Entering add_active_range(0, 0, 159) 0 entries of 256 used Entering add_active_range(0, 256, 491472) 1 entries of 256 used end_pfn_map = 1048576 DMI 2.4 present. 
ACPI: RSDP 000F7D30, 0024 (r2 HP) ACPI: XSDT 77FE57B4, 0054 (r1 HP 0944 6070620 HP 1) ACPI: FACP 77FE5684, 00F4 (r4 HP 09443 HP 1) ACPI: DSDT 77FE58DC, EE7A (r1 HPSB4001 MSFT 10E) ACPI: FACS 77FF7E80, 0040 ACPI: APIC 77FE5808, 0062 (r1 HP 09441 HP 1) ACPI: MCFG 77FE586C, 003C (r1 HP 09441 HP 1) ACPI: TCPA 77FE58A8, 0032 (r2 HP 09441 HP 1) ACPI: SSDT 77FF4756, 0059 (r1 HP HPQNLP1 MSFT 10E) ACPI: SSDT 77FF47AF, 0206 (r1 HP PSSTBLID1 HP 1) Entering add_active_range(0, 0, 159) 0 entries of 256 used Entering add_active_range(0, 256, 491472) 1 entries of 256 used No mptable found. Zone PFN ranges: DMA 0 -> 4096 DMA324096 -> 1048576 Normal1048576 -> 1048576 Movable zone start PFN for each node early_node_map[2] active PFN ranges 0:0 -> 159 0: 256 -> 491472 On node 0 totalpages: 491375 DMA zone: 56 pages used for memmap DMA zone: 1442 pages reserved DMA zone: 2501 pages, LIFO batch:0 DMA32 zone: 6663 pages used for memmap DMA32 zone: 480713 pages, LIFO batch:31 Normal zone: 0 pages used for memmap Movable zone: 0 pages used for memmap ATI board detected. Disabling timer routing over 8254. ACPI: PM-Timer IO Port: 0x8008 ACPI: Local APIC address 0xfee0 ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled) Processor #0 (Bootup-CPU) ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled) Processor #1 ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1]) ACPI: IOAPIC (id[0x02] address[0xfec0] gsi_base[0]) IOAPIC[0]: apic_id 2, address 0xfec0, GSI 0-23 ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 21 low level) ACPI: IRQ0 used by override. ACPI: IRQ2 used by override. 
Setting APIC routing to flat Using ACPI (MADT) for SMP configuration information mapped APIC to ff5fb000 (fee0) mapped IOAPIC to ff5fa000 (fec0) swsusp: Registered nosave memory region: 0009f000 - 000a swsusp: Registered nosave memory region: 000a - 000e swsusp: Registered nosave memory region: 000e - 0010 Allocating PCI resources starting at 8800 (gap: 8000:6000) SMP: Allowing 2 CPUs, 0 hotplug CPUs PERCPU: Allocating 47320 bytes of per cpu data Built 1 zonelists in Zone order. Total pages: 483214 Kernel command line: root=/dev/sda3 vga=792 resume=/dev/sda1 apicmaintimer apic=verbose 2 Initializing CPU#0 PID hash table entries: 4096 (order: 12, 32768 bytes) Extended CMOS year: 2000 Marking TSC unstable due to TSCs unsynchronized time.c: Detected 1995.108 MHz processor. Console: colour dummy device 80x25 console [tty0] enabled Dentry cache hash table
Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents
On Mon, 2007-09-24 at 15:52 +0200, Rafael J. Wysocki wrote:
> > > > So I really wonder, why noacpitimer on the kernel command line makes any
> > > > difference. I'm confused.
> > >
> > > \metoo
> > >
> > > Well, it was probably read as "noacpi". :-)
> >
> > Hmm, ACPI is in the log all over the place.
>
> Well, "noacpi" seems to be a synonym for "pci=noacpi".
>
> Anyway, it causes acpi_disable_pci() to be executed, which according to
> Documentation/kernel-parameters.txt means "Do not use ACPI for IRQ routing or
> for PCI scanning" (it works like this on x86_64 too, although the doc says it's
> x86_32-specific).

Hrm. The local apic timer calibration does not use anything which is related to interrupts, but if we use the local APIC timer we switch off PIT.

Can you boot Linus latest (w/o hrt patches) and add "apicmaintimer" to the kernel command line please ?

> And yes, it matches "noacpiwhatever" in the command line with "noacpi". Sigh.

Urgh.

	tglx
Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents
On Monday, 24 September 2007 15:05, Thomas Gleixner wrote:
> On Mon, 2007-09-24 at 14:57 +0200, Rafael J. Wysocki wrote:
> > > > http://tglx.de/projects/hrtimers/2.6.23-rc4/patch-2.6.23-rc4-hrt1.patches.tar.bz2
> > > >
> > > > applied. I also have the 2.6.23-rc6-mm1 dmesg output ready, but there's some
> > > > -mm-specific noise in it. Please let me know if you want it, though.
> > >
> > > Hmm:
> > >
> > > > Command line: root=/dev/sda3 vga=792 resume=/dev/sda1 noacpitimer apic=verbose 2
> > > ^^^
> > >
> > > noacpitimer is not a valid commandline option.
> > >
> > > I asked for:
> > > >> > > noapictimer
> >
> > I'm blind, sorry.
> >
> > > So I really wonder, why noacpitimer on the kernel command line makes any
> > > difference. I'm confused.
> >
> > \metoo
> >
> > Well, it was probably read as "noacpi". :-)
>
> Hmm, ACPI is in the log all over the place.

Well, "noacpi" seems to be a synonym for "pci=noacpi".

Anyway, it causes acpi_disable_pci() to be executed, which according to Documentation/kernel-parameters.txt means "Do not use ACPI for IRQ routing or for PCI scanning" (it works like this on x86_64 too, although the doc says it's x86_32-specific).

And yes, it matches "noacpiwhatever" in the command line with "noacpi". Sigh.

> > Fortunately, noapictimer helps as well, dmesg attached (I have the one
> > from 2.6.23-rc6-mm1 ready, too).
>
> Ok, at which point is the box stopping, when you omit noa* ? Is
> earlyprintk giving you any useful info ?

earlyprintk=vga doesn't display anything (ie. black screen) and there are no serial ports in the box.

Greetings,
Rafael
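The "noacpiwhatever" observation above is what one would see if registered option names are compared as prefixes of each command-line word rather than as whole words. The sketch below only models that difference; it is not the kernel's parser, and the option table and function names are hypothetical apart from the option strings taken from the thread.

```python
# Hypothetical table of registered option names; the strings themselves
# appear in the thread, the table is only for illustration.
REGISTERED = ["noacpi", "noapictimer", "apicmaintimer"]


def prefix_matches(arg):
    """Options whose name is a prefix of the argument (strncmp-style compare)."""
    return [name for name in REGISTERED if arg.startswith(name)]


def exact_matches(arg):
    """Options whose name equals the argument up to an optional '=value'."""
    key = arg.split("=", 1)[0]
    return [name for name in REGISTERED if key == name]


for arg in ("noacpitimer", "noacpiFSCKEDPARSING", "noapictimer"):
    print(arg, prefix_matches(arg), exact_matches(arg))
```

With prefix comparison the mistyped "noacpitimer" silently selects the "noacpi" handler, while an exact comparison would leave it unmatched and make the typo visible.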
Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents
On Mon, 2007-09-24 at 14:57 +0200, Rafael J. Wysocki wrote:
> > > http://tglx.de/projects/hrtimers/2.6.23-rc4/patch-2.6.23-rc4-hrt1.patches.tar.bz2
> > >
> > > applied. I also have the 2.6.23-rc6-mm1 dmesg output ready, but there's some
> > > -mm-specific noise in it. Please let me know if you want it, though.
> >
> > Hmm:
> >
> > > Command line: root=/dev/sda3 vga=792 resume=/dev/sda1 noacpitimer apic=verbose 2
> > ^^^
> >
> > noacpitimer is not a valid commandline option.
> >
> > I asked for:
> > >> > > noapictimer
>
> I'm blind, sorry.
>
> > So I really wonder, why noacpitimer on the kernel command line makes any
> > difference. I'm confused.
>
> \metoo
>
> Well, it was probably read as "noacpi". :-)

Hmm, ACPI is in the log all over the place.

> Fortunately, noapictimer helps as well, dmesg attached (I have the one
> from 2.6.23-rc6-mm1 ready, too).

Ok, at which point is the box stopping, when you omit noa* ? Is earlyprintk giving you any useful info ?

	tglx
Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents
On Monday, 24 September 2007 10:07, Thomas Gleixner wrote:
> On Sun, 2007-09-23 at 22:52 +0200, Rafael J. Wysocki wrote:
> > > > Second, noacpitimer added to the command line makes all of the kernels, up to
> > > > and including 2.6.23-rc6-mm1, boot (this seems to be 100% reproducible).
> > >
> > > That's valuable information. Can you please provide a boot log of one of
> > > those with an additional "apic=verbose" on the command line ?
> >
> > Attached is the dmesg output from the 2.6.23-rc6 kernel with the patchset:
> >
> > http://tglx.de/projects/hrtimers/2.6.23-rc4/patch-2.6.23-rc4-hrt1.patches.tar.bz2
> >
> > applied. I also have the 2.6.23-rc6-mm1 dmesg output ready, but there's some
> > -mm-specific noise in it. Please let me know if you want it, though.
>
> Hmm:
>
> > Command line: root=/dev/sda3 vga=792 resume=/dev/sda1 noacpitimer apic=verbose 2
> ^^^
>
> noacpitimer is not a valid commandline option.
>
> I asked for:
> >> > > noapictimer

I'm blind, sorry.

> So I really wonder, why noacpitimer on the kernel command line makes any
> difference. I'm confused.

\metoo

Well, it was probably read as "noacpi". :-)

Fortunately, noapictimer helps as well, dmesg attached (I have the one from 2.6.23-rc6-mm1 ready, too).
Greetings, Rafael Linux version 2.6.23-rc6-hrt ([EMAIL PROTECTED]) (gcc version 4.1.2 20061115 (prerelease) (SUSE Linux)) #1 SMP Sat Sep 22 22:38:18 CEST 2007 Command line: root=/dev/sda3 vga=792 resume=/dev/sda1 noapictimer apic=verbose 2 BIOS-provided physical RAM map: BIOS-e820: - 0009fc00 (usable) BIOS-e820: 0009fc00 - 000a (reserved) BIOS-e820: 000e - 0010 (reserved) BIOS-e820: 0010 - 77fd (usable) BIOS-e820: 77fd - 77fe5600 (reserved) BIOS-e820: 77fe5600 - 77ff8000 (ACPI NVS) BIOS-e820: 77ff8000 - 8000 (reserved) BIOS-e820: e000 - f000 (reserved) BIOS-e820: fec0 - fec02000 (reserved) BIOS-e820: ffbc - ffcc (reserved) BIOS-e820: fff0 - 0001 (reserved) Entering add_active_range(0, 0, 159) 0 entries of 256 used Entering add_active_range(0, 256, 491472) 1 entries of 256 used end_pfn_map = 1048576 DMI 2.4 present. ACPI: RSDP 000F7D30, 0024 (r2 HP) ACPI: XSDT 77FE57B4, 0054 (r1 HP 0944 6070620 HP 1) ACPI: FACP 77FE5684, 00F4 (r4 HP 09443 HP 1) ACPI: DSDT 77FE58DC, EE7A (r1 HPSB4001 MSFT 10E) ACPI: FACS 77FF7E80, 0040 ACPI: APIC 77FE5808, 0062 (r1 HP 09441 HP 1) ACPI: MCFG 77FE586C, 003C (r1 HP 09441 HP 1) ACPI: TCPA 77FE58A8, 0032 (r2 HP 09441 HP 1) ACPI: SSDT 77FF4756, 0059 (r1 HP HPQNLP1 MSFT 10E) ACPI: SSDT 77FF47AF, 0206 (r1 HP PSSTBLID1 HP 1) Entering add_active_range(0, 0, 159) 0 entries of 256 used Entering add_active_range(0, 256, 491472) 1 entries of 256 used No mptable found. Zone PFN ranges: DMA 0 -> 4096 DMA324096 -> 1048576 Normal1048576 -> 1048576 Movable zone start PFN for each node early_node_map[2] active PFN ranges 0:0 -> 159 0: 256 -> 491472 On node 0 totalpages: 491375 DMA zone: 56 pages used for memmap DMA zone: 1446 pages reserved DMA zone: 2497 pages, LIFO batch:0 DMA32 zone: 6663 pages used for memmap DMA32 zone: 480713 pages, LIFO batch:31 Normal zone: 0 pages used for memmap Movable zone: 0 pages used for memmap ATI board detected. Disabling timer routing over 8254. 
ACPI: PM-Timer IO Port: 0x8008 ACPI: Local APIC address 0xfee0 ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled) Processor #0 (Bootup-CPU) ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled) Processor #1 ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1]) ACPI: IOAPIC (id[0x02] address[0xfec0] gsi_base[0]) IOAPIC[0]: apic_id 2, address 0xfec0, GSI 0-23 ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 21 low level) ACPI: IRQ0 used by override. ACPI: IRQ2 used by override. Setting APIC routing to flat Using ACPI (MADT) for SMP configuration information mapped APIC to ff5fb000 (fee0) mapped IOAPIC to ff5fa000 (fec0) swsusp: Registered nosave memory region: 0009f000 - 000a swsusp: Registered nosave memory region: 000a - 000e swsusp: Registered nosave memory region: 000e - 0010 Allocating PCI resources starting at 8800 (gap: 8000:6000) SMP: Allowing 2 CPUs, 0 hotplug CPUs PERCPU: Allocating 47576 bytes of per cpu data Built 1 zonelists in Zone order. Total pages: 483210 Kernel command line: root=/dev/sda3 vga=792
Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents
On Sun, 2007-09-23 at 22:52 +0200, Rafael J. Wysocki wrote:
> > > Second, noacpitimer added to the command line makes all of the kernels,
> > > up to and including 2.6.23-rc6-mm1, boot (this seems to be 100%
> > > reproducible).
> >
> > That's valuable information. Can you please provide a boot log of one of
> > those with an additional "apic=verbose" on the command line ?
>
> Attached is the dmesg output from the 2.6.23-rc6 kernel with the patchset:
>
> http://tglx.de/projects/hrtimers/2.6.23-rc4/patch-2.6.23-rc4-hrt1.patches.tar.bz2
>
> applied. I also have the 2.6.23-rc6-mm1 dmesg output ready, but there's some
> -mm-specific noise in it. Please let me know if you want it, though.

Hmm:

> Command line: root=/dev/sda3 vga=792 resume=/dev/sda1 noacpitimer
> apic=verbose 2
  ^^^

noacpitimer is not a valid commandline option. I asked for:

> > > > noapictimer

So I really wonder, why noacpitimer on the kernel command line makes any
difference. I'm confused.

tglx

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents
On Monday, 24 September 2007 10:07, Thomas Gleixner wrote:
> On Sun, 2007-09-23 at 22:52 +0200, Rafael J. Wysocki wrote:
> > > > Second, noacpitimer added to the command line makes all of the
> > > > kernels, up to and including 2.6.23-rc6-mm1, boot (this seems to be
> > > > 100% reproducible).
> > >
> > > That's valuable information. Can you please provide a boot log of one
> > > of those with an additional "apic=verbose" on the command line ?
> >
> > Attached is the dmesg output from the 2.6.23-rc6 kernel with the patchset:
> >
> > http://tglx.de/projects/hrtimers/2.6.23-rc4/patch-2.6.23-rc4-hrt1.patches.tar.bz2
> >
> > applied. I also have the 2.6.23-rc6-mm1 dmesg output ready, but there's
> > some -mm-specific noise in it. Please let me know if you want it, though.
>
> Hmm:
>
> > Command line: root=/dev/sda3 vga=792 resume=/dev/sda1 noacpitimer
> > apic=verbose 2
>   ^^^
>
> noacpitimer is not a valid commandline option. I asked for:
>
> > > > > noapictimer

I'm blind, sorry.

> So I really wonder, why noacpitimer on the kernel command line makes any
> difference. I'm confused.

\metoo

Well, it was probably read as noacpi. :-)

Fortunately, noapictimer helps as well, dmesg attached (I have the one from
2.6.23-rc6-mm1 ready, too).

Greetings,
Rafael

Linux version 2.6.23-rc6-hrt ([EMAIL PROTECTED]) (gcc version 4.1.2 20061115 (prerelease) (SUSE Linux)) #1 SMP Sat Sep 22 22:38:18 CEST 2007
Command line: root=/dev/sda3 vga=792 resume=/dev/sda1 noapictimer apic=verbose 2
BIOS-provided physical RAM map:
 BIOS-e820: - 0009fc00 (usable)
 BIOS-e820: 0009fc00 - 000a (reserved)
 BIOS-e820: 000e - 0010 (reserved)
 BIOS-e820: 0010 - 77fd (usable)
 BIOS-e820: 77fd - 77fe5600 (reserved)
 BIOS-e820: 77fe5600 - 77ff8000 (ACPI NVS)
 BIOS-e820: 77ff8000 - 8000 (reserved)
 BIOS-e820: e000 - f000 (reserved)
 BIOS-e820: fec0 - fec02000 (reserved)
 BIOS-e820: ffbc - ffcc (reserved)
 BIOS-e820: fff0 - 0001 (reserved)
Entering add_active_range(0, 0, 159) 0 entries of 256 used
Entering add_active_range(0, 256, 491472) 1 entries of 256 used
end_pfn_map = 1048576
DMI 2.4 present.
ACPI: RSDP 000F7D30, 0024 (r2 HP)
ACPI: XSDT 77FE57B4, 0054 (r1 HP 0944 6070620 HP 1)
ACPI: FACP 77FE5684, 00F4 (r4 HP 09443 HP 1)
ACPI: DSDT 77FE58DC, EE7A (r1 HPSB4001 MSFT 10E)
ACPI: FACS 77FF7E80, 0040
ACPI: APIC 77FE5808, 0062 (r1 HP 09441 HP 1)
ACPI: MCFG 77FE586C, 003C (r1 HP 09441 HP 1)
ACPI: TCPA 77FE58A8, 0032 (r2 HP 09441 HP 1)
ACPI: SSDT 77FF4756, 0059 (r1 HP HPQNLP1 MSFT 10E)
ACPI: SSDT 77FF47AF, 0206 (r1 HP PSSTBLID1 HP 1)
Entering add_active_range(0, 0, 159) 0 entries of 256 used
Entering add_active_range(0, 256, 491472) 1 entries of 256 used
No mptable found.
Zone PFN ranges:
  DMA 0 -> 4096
  DMA32 4096 -> 1048576
  Normal 1048576 -> 1048576
Movable zone start PFN for each node
early_node_map[2] active PFN ranges
  0: 0 -> 159
  0: 256 -> 491472
On node 0 totalpages: 491375
  DMA zone: 56 pages used for memmap
  DMA zone: 1446 pages reserved
  DMA zone: 2497 pages, LIFO batch:0
  DMA32 zone: 6663 pages used for memmap
  DMA32 zone: 480713 pages, LIFO batch:31
  Normal zone: 0 pages used for memmap
  Movable zone: 0 pages used for memmap
ATI board detected. Disabling timer routing over 8254.
ACPI: PM-Timer IO Port: 0x8008
ACPI: Local APIC address 0xfee0
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
Processor #0 (Bootup-CPU)
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
Processor #1
ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
ACPI: IOAPIC (id[0x02] address[0xfec0] gsi_base[0])
IOAPIC[0]: apic_id 2, address 0xfec0, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 21 low level)
ACPI: IRQ0 used by override.
ACPI: IRQ2 used by override.
Setting APIC routing to flat
Using ACPI (MADT) for SMP configuration information
mapped APIC to ff5fb000 (fee0)
mapped IOAPIC to ff5fa000 (fec0)
swsusp: Registered nosave memory region: 0009f000 - 000a
swsusp: Registered nosave memory region: 000a - 000e
swsusp: Registered nosave memory region: 000e - 0010
Allocating PCI resources starting at 8800 (gap: 8000:6000)
SMP: Allowing 2 CPUs, 0 hotplug CPUs
PERCPU: Allocating 47576 bytes of per cpu data
Built 1 zonelists in Zone order. Total pages: 483210
Kernel command line: root=/dev/sda3 vga=792 resume=/dev/sda1 noapictimer apic=verbose 2
Initializing CPU#0
PID hash
Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents
On Mon, 2007-09-24 at 14:57 +0200, Rafael J. Wysocki wrote:
> > > http://tglx.de/projects/hrtimers/2.6.23-rc4/patch-2.6.23-rc4-hrt1.patches.tar.bz2
> > >
> > > applied. I also have the 2.6.23-rc6-mm1 dmesg output ready, but there's
> > > some -mm-specific noise in it. Please let me know if you want it, though.
> >
> > Hmm:
> >
> > > Command line: root=/dev/sda3 vga=792 resume=/dev/sda1 noacpitimer
> > > apic=verbose 2
> >   ^^^
> >
> > noacpitimer is not a valid commandline option. I asked for:
> >
> > > > > > noapictimer
>
> I'm blind, sorry.
>
> > So I really wonder, why noacpitimer on the kernel command line makes any
> > difference. I'm confused.
>
> \metoo
>
> Well, it was probably read as noacpi. :-)

Hmm, ACPI is in the log all over the place.

> Fortunately, noapictimer helps as well, dmesg attached (I have the one from
> 2.6.23-rc6-mm1 ready, too).

Ok, at which point is the box stopping, when you omit noa* ? Is
earlyprintk giving you any useful info ?

tglx
Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents
On Monday, 24 September 2007 15:05, Thomas Gleixner wrote:
> On Mon, 2007-09-24 at 14:57 +0200, Rafael J. Wysocki wrote:
> > > > http://tglx.de/projects/hrtimers/2.6.23-rc4/patch-2.6.23-rc4-hrt1.patches.tar.bz2
> > > >
> > > > applied. I also have the 2.6.23-rc6-mm1 dmesg output ready, but
> > > > there's some -mm-specific noise in it. Please let me know if you want
> > > > it, though.
> > >
> > > Hmm:
> > >
> > > > Command line: root=/dev/sda3 vga=792 resume=/dev/sda1 noacpitimer
> > > > apic=verbose 2
> > >   ^^^
> > >
> > > noacpitimer is not a valid commandline option. I asked for:
> > >
> > > > > > > noapictimer
> >
> > I'm blind, sorry.
> >
> > > So I really wonder, why noacpitimer on the kernel command line makes any
> > > difference. I'm confused.
> >
> > \metoo
> >
> > Well, it was probably read as noacpi. :-)
>
> Hmm, ACPI is in the log all over the place.

Well, noacpi seems to be a synonym for pci=noacpi. Anyway, it causes
acpi_disable_pci() to be executed, which according to
Documentation/kernel-parameters.txt means "Do not use ACPI for IRQ routing
or for PCI scanning" (it works like this on x86_64 too, although the doc
says it's x86_32-specific).

And yes, it matches noacpiwhatever in the command line with noacpi. Sigh.

> > Fortunately, noapictimer helps as well, dmesg attached (I have the one
> > from 2.6.23-rc6-mm1 ready, too).
>
> Ok, at which point is the box stopping, when you omit noa* ? Is
> earlyprintk giving you any useful info ?

earlyprintk=vga doesn't display anything (ie. black screen) and there are no
serial ports in the box.

Greetings,
Rafael
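The matching behaviour described above, where noacpitimer ends up being treated as noacpi, can be sketched in a few lines. This is a standalone Python illustration, not the kernel's parser: the helper names (prefix_match, exact_match, recognised) and the two-entry option table are assumptions made up for the example. Only the command line itself is taken from the boot log in this thread.

```python
# Illustrative sketch only, NOT the kernel's command line parser.
# It contrasts prefix matching (the behaviour reported in this thread)
# with whole-name matching. All helper names here are hypothetical.

KNOWN_OPTIONS = ("noacpi", "noapictimer")  # assumed subset, for illustration


def prefix_match(arg, option):
    """Accept arg if it merely starts with the option name."""
    return arg.startswith(option)


def exact_match(arg, option):
    """Accept arg only if its name (the part before '=') equals the option."""
    name = arg.split("=", 1)[0]
    return name == option


def recognised(cmdline, matcher):
    """Return (argument, option) pairs that the given matcher accepts."""
    hits = []
    for arg in cmdline.split():
        for option in KNOWN_OPTIONS:
            if matcher(arg, option):
                hits.append((arg, option))
    return hits


# Command line from the boot log with the mistyped option.
cmdline = "root=/dev/sda3 vga=792 resume=/dev/sda1 noacpitimer apic=verbose 2"

print(recognised(cmdline, prefix_match))  # the typo is accepted as "noacpi"
print(recognised(cmdline, exact_match))   # the typo matches nothing
```

With prefix matching the mistyped noacpitimer silently selects noacpi, which would explain why the typo changed the boot behaviour at all. With whole-name matching it would simply be ignored.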
Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents
On Mon, 2007-09-24 at 15:52 +0200, Rafael J. Wysocki wrote:
> > > > So I really wonder, why noacpitimer on the kernel command line makes
> > > > any difference. I'm confused.
> > >
> > > \metoo
> > >
> > > Well, it was probably read as noacpi. :-)
> >
> > Hmm, ACPI is in the log all over the place.
>
> Well, noacpi seems to be a synonym for pci=noacpi. Anyway, it causes
> acpi_disable_pci() to be executed, which according to
> Documentation/kernel-parameters.txt means "Do not use ACPI for IRQ routing
> or for PCI scanning" (it works like this on x86_64 too, although the doc
> says it's x86_32-specific).

Hrm. The local apic timer calibration does not use anything which is
related to interrupts, but if we use the local APIC timer we switch off
PIT.

Can you boot Linus latest (w/o hrt patches) and add apicmaintimer to the
kernel command line please ?

> And yes, it matches noacpiwhatever in the command line with noacpi. Sigh.

Urgh.

tglx
Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents
On Monday, 24 September 2007 16:23, Thomas Gleixner wrote:
> On Mon, 2007-09-24 at 15:52 +0200, Rafael J. Wysocki wrote:
> > > > > So I really wonder, why noacpitimer on the kernel command line makes
> > > > > any difference. I'm confused.
> > > >
> > > > \metoo
> > > >
> > > > Well, it was probably read as noacpi. :-)
> > >
> > > Hmm, ACPI is in the log all over the place.
> >
> > Well, noacpi seems to be a synonym for pci=noacpi. Anyway, it causes
> > acpi_disable_pci() to be executed, which according to
> > Documentation/kernel-parameters.txt means "Do not use ACPI for IRQ routing
> > or for PCI scanning" (it works like this on x86_64 too, although the doc
> > says it's x86_32-specific).
>
> Hrm. The local apic timer calibration does not use anything which is
> related to interrupts, but if we use the local APIC timer we switch off
> PIT.
>
> Can you boot Linus latest (w/o hrt patches) and add apicmaintimer to the
> kernel command line please ?

Works, dmesg attached.

Greetings,
Rafael

Linux version 2.6.23-rc7test ([EMAIL PROTECTED]) (gcc version 4.1.2 20061115 (prerelease) (SUSE Linux)) #19 SMP Mon Sep 24 16:55:05 CEST 2007
Command line: root=/dev/sda3 vga=792 resume=/dev/sda1 apicmaintimer apic=verbose 2
BIOS-provided physical RAM map:
 BIOS-e820: - 0009fc00 (usable)
 BIOS-e820: 0009fc00 - 000a (reserved)
 BIOS-e820: 000e - 0010 (reserved)
 BIOS-e820: 0010 - 77fd (usable)
 BIOS-e820: 77fd - 77fe5600 (reserved)
 BIOS-e820: 77fe5600 - 77ff8000 (ACPI NVS)
 BIOS-e820: 77ff8000 - 8000 (reserved)
 BIOS-e820: e000 - f000 (reserved)
 BIOS-e820: fec0 - fec02000 (reserved)
 BIOS-e820: ffbc - ffcc (reserved)
 BIOS-e820: fff0 - 0001 (reserved)
Entering add_active_range(0, 0, 159) 0 entries of 256 used
Entering add_active_range(0, 256, 491472) 1 entries of 256 used
end_pfn_map = 1048576
DMI 2.4 present.
ACPI: RSDP 000F7D30, 0024 (r2 HP)
ACPI: XSDT 77FE57B4, 0054 (r1 HP 0944 6070620 HP 1)
ACPI: FACP 77FE5684, 00F4 (r4 HP 09443 HP 1)
ACPI: DSDT 77FE58DC, EE7A (r1 HPSB4001 MSFT 10E)
ACPI: FACS 77FF7E80, 0040
ACPI: APIC 77FE5808, 0062 (r1 HP 09441 HP 1)
ACPI: MCFG 77FE586C, 003C (r1 HP 09441 HP 1)
ACPI: TCPA 77FE58A8, 0032 (r2 HP 09441 HP 1)
ACPI: SSDT 77FF4756, 0059 (r1 HP HPQNLP1 MSFT 10E)
ACPI: SSDT 77FF47AF, 0206 (r1 HP PSSTBLID1 HP 1)
Entering add_active_range(0, 0, 159) 0 entries of 256 used
Entering add_active_range(0, 256, 491472) 1 entries of 256 used
No mptable found.
Zone PFN ranges:
  DMA 0 -> 4096
  DMA32 4096 -> 1048576
  Normal 1048576 -> 1048576
Movable zone start PFN for each node
early_node_map[2] active PFN ranges
  0: 0 -> 159
  0: 256 -> 491472
On node 0 totalpages: 491375
  DMA zone: 56 pages used for memmap
  DMA zone: 1442 pages reserved
  DMA zone: 2501 pages, LIFO batch:0
  DMA32 zone: 6663 pages used for memmap
  DMA32 zone: 480713 pages, LIFO batch:31
  Normal zone: 0 pages used for memmap
  Movable zone: 0 pages used for memmap
ATI board detected. Disabling timer routing over 8254.
ACPI: PM-Timer IO Port: 0x8008
ACPI: Local APIC address 0xfee0
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
Processor #0 (Bootup-CPU)
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
Processor #1
ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
ACPI: IOAPIC (id[0x02] address[0xfec0] gsi_base[0])
IOAPIC[0]: apic_id 2, address 0xfec0, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 21 low level)
ACPI: IRQ0 used by override.
ACPI: IRQ2 used by override.
Setting APIC routing to flat
Using ACPI (MADT) for SMP configuration information
mapped APIC to ff5fb000 (fee0)
mapped IOAPIC to ff5fa000 (fec0)
swsusp: Registered nosave memory region: 0009f000 - 000a
swsusp: Registered nosave memory region: 000a - 000e
swsusp: Registered nosave memory region: 000e - 0010
Allocating PCI resources starting at 8800 (gap: 8000:6000)
SMP: Allowing 2 CPUs, 0 hotplug CPUs
PERCPU: Allocating 47320 bytes of per cpu data
Built 1 zonelists in Zone order. Total pages: 483214
Kernel command line: root=/dev/sda3 vga=792 resume=/dev/sda1 apicmaintimer apic=verbose 2
Initializing CPU#0
PID hash table entries: 4096 (order: 12, 32768 bytes)
Extended CMOS year: 2000
Marking TSC unstable due to TSCs unsynchronized
time.c: Detected 1995.108 MHz processor.
Console: colour dummy device 80x25
console [tty0] enabled
Dentry cache hash table entries: 262144 (order: 9, 2097152 bytes)
Inode-cache hash table entries: 131072
Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents
On Mon, 2007-09-24 at 17:18 +0200, Rafael J. Wysocki wrote:
> > > Well, noacpi seems to be a synonym for pci=noacpi. Anyway, it causes
> > > acpi_disable_pci() to be executed, which according to
> > > Documentation/kernel-parameters.txt means "Do not use ACPI for IRQ
> > > routing or for PCI scanning" (it works like this on x86_64 too, although
> > > the doc says it's x86_32-specific).
> >
> > Hrm. The local apic timer calibration does not use anything which is
> > related to interrupts, but if we use the local APIC timer we switch off
> > PIT.
> >
> > Can you boot Linus latest (w/o hrt patches) and add apicmaintimer to the
> > kernel command line please ?
>
> Works, dmesg attached.

/me scratches head

We know, that

- disabling local apic timers work
- local apic timers (which turn off PIT) work. when noacpiFSCKEDPARSING
  is given on the kernel command line.

I have no clue, what might be the difference of noacpiFSCKEDPARSING. The
boot log is not giving any hint at all.

acpi_disable_pci() sets acpi_pci_disabled and acpi_noirq to 1.

What happens, if you set acpi=noirq instead ?

tglx
Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents
On Monday, 24 September 2007 18:46, Thomas Gleixner wrote:
> On Mon, 2007-09-24 at 17:18 +0200, Rafael J. Wysocki wrote:
> > > > Well, noacpi seems to be a synonym for pci=noacpi. Anyway, it causes
> > > > acpi_disable_pci() to be executed, which according to
> > > > Documentation/kernel-parameters.txt means "Do not use ACPI for IRQ
> > > > routing or for PCI scanning" (it works like this on x86_64 too,
> > > > although the doc says it's x86_32-specific).
> > >
> > > Hrm. The local apic timer calibration does not use anything which is
> > > related to interrupts, but if we use the local APIC timer we switch off
> > > PIT.
> > >
> > > Can you boot Linus latest (w/o hrt patches) and add apicmaintimer to the
> > > kernel command line please ?
> >
> > Works, dmesg attached.
>
> /me scratches head

Retested.

> We know, that
>
> - disabling local apic timers work

This works reproducibly across the board.

> - local apic timers (which turn off PIT) work. when noacpiFSCKEDPARSING

This stopped working, although it evidently worked yesterday (wtf?). There
seems to be a history effect in the box, to make things more interesting.

> is given on the kernel command line.
>
> I have no clue, what might be the difference of noacpiFSCKEDPARSING. The
> boot log is not giving any hint at all.
>
> acpi_disable_pci() sets acpi_pci_disabled and acpi_noirq to 1.
>
> What happens, if you set acpi=noirq instead ?

That obviously doesn't help.

I think the only solid data point so far is that noapictimer makes the box
boot.

Greetings,
Rafael
Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents
On Mon, 2007-09-24 at 21:11 +0200, Rafael J. Wysocki wrote:
> > /me scratches head
>
> Retested.
>
> > We know, that
> >
> > - disabling local apic timers work
>
> This works reproducibly across the board.

Ok

> > - local apic timers (which turn off PIT) work. when noacpiFSCKEDPARSING
>
> This stopped working, although it evidently worked yesterday (wtf?). There
> seems to be a history effect in the box, to make things more interesting.

Did you connect this box to Andrews VAIO during KS ?

> I think the only solid data point so far is that noapictimer makes the box
> boot.

Ok. Can you add nmi_watchdog=1 to the command line please. This runs
through the calibration of APIC, but registers it as a dummy clock source
(the PIT must run to make the watchdog work).

If it boots, please provide the output of /proc/timer_list

Thanks,

tglx
Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents
On Sunday, 23 September 2007 21:59, Thomas Gleixner wrote:
> On Sun, 2007-09-23 at 22:08 +0200, Rafael J. Wysocki wrote:
> > > > Since the boot fails very early, before any messages reach the (VGA)
> > > > console, I have no idea what to do next, except for digging in the
> > > > code.
> > >
> > > Ok, lets track it down. Is there any difference when you add:
> > >
> > > nohz=off
> > > highres=off
> > > noapictimer
> > >
> > > or any combinations of the above to the kernel command line ?
> >
> > First, for now, I build all kernels with NO_HZ and HIGH_RES_TIMERS unset
> > (.config for 2.6.23-rc6-mm1 is attached).
> >
> > Second, noacpitimer added to the command line makes all of the kernels, up
> > to and including 2.6.23-rc6-mm1, boot (this seems to be 100% reproducible).
>
> That's valuable information. Can you please provide a boot log of one of
> those with an additional "apic=verbose" on the command line ?

Attached is the dmesg output from the 2.6.23-rc6 kernel with the patchset:

http://tglx.de/projects/hrtimers/2.6.23-rc4/patch-2.6.23-rc4-hrt1.patches.tar.bz2

applied. I also have the 2.6.23-rc6-mm1 dmesg output ready, but there's some
-mm-specific noise in it. Please let me know if you want it, though.

Greetings,
Rafael

Linux version 2.6.23-rc6-hrt ([EMAIL PROTECTED]) (gcc version 4.1.2 20061115 (prerelease) (SUSE Linux)) #1 SMP Sat Sep 22 22:38:18 CEST 2007
Command line: root=/dev/sda3 vga=792 resume=/dev/sda1 noacpitimer apic=verbose 2
BIOS-provided physical RAM map:
 BIOS-e820: - 0009fc00 (usable)
 BIOS-e820: 0009fc00 - 000a (reserved)
 BIOS-e820: 000e - 0010 (reserved)
 BIOS-e820: 0010 - 77fd (usable)
 BIOS-e820: 77fd - 77fe5600 (reserved)
 BIOS-e820: 77fe5600 - 77ff8000 (ACPI NVS)
 BIOS-e820: 77ff8000 - 8000 (reserved)
 BIOS-e820: e000 - f000 (reserved)
 BIOS-e820: fec0 - fec02000 (reserved)
 BIOS-e820: ffbc - ffcc (reserved)
 BIOS-e820: fff0 - 0001 (reserved)
Entering add_active_range(0, 0, 159) 0 entries of 256 used
Entering add_active_range(0, 256, 491472) 1 entries of 256 used
end_pfn_map = 1048576
DMI 2.4 present.
ACPI: RSDP 000F7D30, 0024 (r2 HP)
ACPI: XSDT 77FE57B4, 0054 (r1 HP 0944 6070620 HP 1)
ACPI: FACP 77FE5684, 00F4 (r4 HP 09443 HP 1)
ACPI: DSDT 77FE58DC, EE7A (r1 HPSB4001 MSFT 10E)
ACPI: FACS 77FF7E80, 0040
ACPI: APIC 77FE5808, 0062 (r1 HP 09441 HP 1)
ACPI: MCFG 77FE586C, 003C (r1 HP 09441 HP 1)
ACPI: TCPA 77FE58A8, 0032 (r2 HP 09441 HP 1)
ACPI: SSDT 77FF4756, 0059 (r1 HP HPQNLP1 MSFT 10E)
ACPI: SSDT 77FF47AF, 0206 (r1 HP PSSTBLID1 HP 1)
Entering add_active_range(0, 0, 159) 0 entries of 256 used
Entering add_active_range(0, 256, 491472) 1 entries of 256 used
No mptable found.
Zone PFN ranges:
  DMA 0 -> 4096
  DMA32 4096 -> 1048576
  Normal 1048576 -> 1048576
Movable zone start PFN for each node
early_node_map[2] active PFN ranges
  0: 0 -> 159
  0: 256 -> 491472
On node 0 totalpages: 491375
  DMA zone: 56 pages used for memmap
  DMA zone: 1446 pages reserved
  DMA zone: 2497 pages, LIFO batch:0
  DMA32 zone: 6663 pages used for memmap
  DMA32 zone: 480713 pages, LIFO batch:31
  Normal zone: 0 pages used for memmap
  Movable zone: 0 pages used for memmap
ATI board detected. Disabling timer routing over 8254.
ACPI: PM-Timer IO Port: 0x8008
ACPI: Local APIC address 0xfee0
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
Processor #0 (Bootup-CPU)
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
Processor #1
ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
ACPI: IOAPIC (id[0x02] address[0xfec0] gsi_base[0])
IOAPIC[0]: apic_id 2, address 0xfec0, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 21 low level)
ACPI: IRQ0 used by override.
ACPI: IRQ2 used by override.
Setting APIC routing to flat
Using ACPI (MADT) for SMP configuration information
mapped APIC to ff5fb000 (fee0)
mapped IOAPIC to ff5fa000 (fec0)
swsusp: Registered nosave memory region: 0009f000 - 000a
swsusp: Registered nosave memory region: 000a - 000e
swsusp: Registered nosave memory region: 000e - 0010
Allocating PCI resources starting at 8800 (gap: 8000:6000)
SMP: Allowing 2 CPUs, 0 hotplug CPUs
PERCPU: Allocating 47576 bytes of per cpu data
Built 1 zonelists in Zone order. Total pages: 483210
Kernel command line: root=/dev/sda3 vga=792 resume=/dev/sda1 noacpitimer apic=verbose 2
Initializing CPU#0
PID hash table entries: 4096 (order: 12,
Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents
On Sun, 2007-09-23 at 22:08 +0200, Rafael J. Wysocki wrote:
> > > Since the boot fails very early, before any messages reach the (VGA)
> > > console, I have no idea what to do next, except for digging in the code.
> >
> > Ok, lets track it down. Is there any difference when you add:
> >
> > nohz=off
> > highres=off
> > noapictimer
> >
> > or any combinations of the above to the kernel command line ?
>
> First, for now, I build all kernels with NO_HZ and HIGH_RES_TIMERS unset
> (.config for 2.6.23-rc6-mm1 is attached).
>
> Second, noacpitimer added to the command line makes all of the kernels, up to
> and including 2.6.23-rc6-mm1, boot (this seems to be 100% reproducible).

That's valuable information. Can you please provide a boot log of one of
those with an additional "apic=verbose" on the command line ?

Thanks,

tglx
Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents
On Sunday, 23 September 2007 21:10, Thomas Gleixner wrote:
> On Sun, 2007-09-23 at 12:57 +0200, Rafael J. Wysocki wrote:
> > Hi Thomas,
> >
> > Unfortunately, my observation that the patch series:
> >
> > http://tglx.de/projects/hrtimers/2.6.23-rc4/patch-2.6.23-rc4-hrt1.patches.tar.bz2
> >
> > worked with 2.6.23-rc4 was wrong. It _sometimes_ works, but usually doesn't
> > boot, just like 2.6.23-rc4-mm1, 2.6.23-rc6-mm1 and everything in between
> > with the above patch series applied. I've also tried:
> >
> > http://tglx.de/projects/hrtimers/2.6.23-rc5/patch-2.6.23-rc5-hrt1.patches.tar.bz2
> > http://tglx.de/projects/hrtimers/2.6.23-rc6/patch-2.6.23-rc6-hrt2.patch
> >
> > with the same result.
> >
> > The problematic patch is x86_64-convert-to-clockevents.patch .
> >
> > Since the boot fails very early, before any messages reach the (VGA)
> > console, I have no idea what to do next, except for digging in the code.
>
> Ok, lets track it down. Is there any difference when you add:
>
> nohz=off
> highres=off
> noapictimer
>
> or any combinations of the above to the kernel command line ?

First, for now, I build all kernels with NO_HZ and HIGH_RES_TIMERS unset
(.config for 2.6.23-rc6-mm1 is attached).

Second, noacpitimer added to the command line makes all of the kernels, up to
and including 2.6.23-rc6-mm1, boot (this seems to be 100% reproducible).
Greetings,
Rafael

#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.23-rc6-mm1
# Tue Sep 18 22:52:04 2007
#
CONFIG_X86_64=y
CONFIG_64BIT=y
CONFIG_X86=y
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_NONIRQ_WAKEUP=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_ZONE_DMA32=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_SEMAPHORE_SLEEPERS=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_QUICKLIST=y
CONFIG_NR_QUICK=2
CONFIG_RWSEM_GENERIC_SPINLOCK=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_X86_CMPXCHG=y
CONFIG_EARLY_PRINTK=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_ARCH_POPULATES_NODE_MAP=y
CONFIG_DMI=y
CONFIG_AUDIT_ARCH=y
CONFIG_GENERIC_BUG=y
# CONFIG_ARCH_HAS_ILOG2_U32 is not set
# CONFIG_ARCH_HAS_ILOG2_U64 is not set
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"

#
# General setup
#
CONFIG_EXPERIMENTAL=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_LOCALVERSION=""
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_BSD_PROCESS_ACCT_V3=y
CONFIG_TASKSTATS=y
CONFIG_TASK_DELAY_ACCT=y
# CONFIG_TASK_XACCT is not set
# CONFIG_USER_NS is not set
CONFIG_AUDIT=y
CONFIG_AUDITSYSCALL=y
CONFIG_AUDIT_TREE=y
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=18
# CONFIG_CONTAINERS is not set
CONFIG_SYSFS_DEPRECATED=y
# CONFIG_RELAY is not set
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_SYSCTL=y
# CONFIG_EMBEDDED is not set
CONFIG_UID16=y
CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
# CONFIG_KALLSYMS_EXTRA_PASS is not set
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_ANON_INODES=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_SLAB=y
# CONFIG_SLUB is not set
# CONFIG_SLOB is not set
CONFIG_PROC_PAGE_MONITOR=y
CONFIG_PROC_KPAGEMAP=y
CONFIG_RT_MUTEXES=y
# CONFIG_TINY_SHMEM is not set
CONFIG_BASE_SMALL=0
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
CONFIG_MODULE_FORCE_UNLOAD=y
CONFIG_MODVERSIONS=y
CONFIG_MODULE_SRCVERSION_ALL=y
CONFIG_KMOD=y
CONFIG_STOP_MACHINE=y
CONFIG_BLOCK=y
# CONFIG_BLK_DEV_IO_TRACE is not set
# CONFIG_BLK_DEV_BSG is not set

#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_AS=y
CONFIG_IOSCHED_DEADLINE=y
CONFIG_IOSCHED_CFQ=y
# CONFIG_DEFAULT_AS is not set
# CONFIG_DEFAULT_DEADLINE is not set
CONFIG_DEFAULT_CFQ=y
# CONFIG_DEFAULT_NOOP is not set
CONFIG_DEFAULT_IOSCHED="cfq"

#
# Processor type and features
#
# CONFIG_TICK_ONESHOT is not set
# CONFIG_NO_HZ is not set
# CONFIG_HIGH_RES_TIMERS is not set
CONFIG_GENERIC_CLOCKEVENTS_BUILD=y
CONFIG_X86_PC=y
# CONFIG_X86_VSMP is not set
CONFIG_MK8=y
# CONFIG_MPSC is not set
# CONFIG_MCORE2 is not set
# CONFIG_GENERIC_CPU is not set
CONFIG_X86_L1_CACHE_BYTES=64
CONFIG_X86_L1_CACHE_SHIFT=6
CONFIG_X86_INTERNODE_CACHE_BYTES=64
CONFIG_X86_TSC=y
CONFIG_X86_GOOD_APIC=y
# CONFIG_MICROCODE is not set
CONFIG_X86_MSR=m
CONFIG_X86_CPUID=m
CONFIG_X86_IO_APIC=y
CONFIG_X86_LOCAL_APIC=y
CONFIG_MTRR=y
CONFIG_SMP=y
# CONFIG_SCHED_SMT is not set
CONFIG_SCHED_MC=y
# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set
CONFIG_PREEMPT_BKL=y
# CONFIG_NUMA is not set
CONFIG_ARCH_SPARSEMEM_ENABLE=y
CONFIG_ARCH_FLATMEM_ENABLE=y
CONFIG_SELECT_MEMORY_MODEL=y
CONFIG_FLATMEM_MANUAL=y
# CONFIG_DISCONTIGMEM_MANUAL is not
Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents
On Sun, 2007-09-23 at 12:57 +0200, Rafael J. Wysocki wrote:
> Hi Thomas,
>
> Unfortunately, my observation that the patch series:
>
> http://tglx.de/projects/hrtimers/2.6.23-rc4/patch-2.6.23-rc4-hrt1.patches.tar.bz2
>
> worked with 2.6.23-rc4 was wrong. It _sometimes_ works, but usually doesn't
> boot, just like 2.6.23-rc4-mm1, 2.6.23-rc6-mm1 and everything in between with
> the above patch series applied. I've also tried:
>
> http://tglx.de/projects/hrtimers/2.6.23-rc5/patch-2.6.23-rc5-hrt1.patches.tar.bz2
> http://tglx.de/projects/hrtimers/2.6.23-rc6/patch-2.6.23-rc6-hrt2.patch
>
> with the same result.
>
> The problematic patch is x86_64-convert-to-clockevents.patch .
>
> Since the boot fails very early, before any messages reach the (VGA) console,
> I have no idea what to do next, except for digging in the code.

Ok, lets track it down. Is there any difference when you add:

nohz=off
highres=off
noapictimer

or any combinations of the above to the kernel command line ?

tglx
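For anyone repeating the experiment requested above, the set of command lines to try can be enumerated mechanically. The following is a small Python sketch under stated assumptions: the base command line is copied from the boot logs posted in this thread, while the helper name candidate_cmdlines is made up for illustration and is not part of any kernel tooling.

```python
from itertools import combinations

# Base command line as it appears in the boot logs posted in this thread.
BASE = "root=/dev/sda3 vga=792 resume=/dev/sda1"

# The three options suggested for testing.
CANDIDATES = ("nohz=off", "highres=off", "noapictimer")


def candidate_cmdlines(base, options):
    """Yield base plus every non-empty subset of options, smallest first.

    Hypothetical helper, for illustration only.
    """
    for size in range(1, len(options) + 1):
        for subset in combinations(options, size):
            yield " ".join((base,) + subset)


for line in candidate_cmdlines(BASE, CANDIDATES):
    print(line)
```

Three options give seven non-empty combinations, so a full sweep is seven boots in addition to the unmodified command line.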
2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents
Hi Thomas,

Unfortunately, my observation that the patch series:

http://tglx.de/projects/hrtimers/2.6.23-rc4/patch-2.6.23-rc4-hrt1.patches.tar.bz2

worked with 2.6.23-rc4 was wrong. It _sometimes_ works, but usually doesn't
boot, just like 2.6.23-rc4-mm1, 2.6.23-rc6-mm1 and everything in between with
the above patch series applied. I've also tried:

http://tglx.de/projects/hrtimers/2.6.23-rc5/patch-2.6.23-rc5-hrt1.patches.tar.bz2
http://tglx.de/projects/hrtimers/2.6.23-rc6/patch-2.6.23-rc6-hrt2.patch

with the same result.

The problematic patch is x86_64-convert-to-clockevents.patch .

Since the boot fails very early, before any messages reach the (VGA) console,
I have no idea what to do next, except for digging in the code.

Greetings,
Rafael
Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents
On Sunday, 23 September 2007 21:10, Thomas Gleixner wrote:
> On Sun, 2007-09-23 at 12:57 +0200, Rafael J. Wysocki wrote:
> > Hi Thomas,
> >
> > Unfortunately, my observation that the patch series:
> >
> > http://tglx.de/projects/hrtimers/2.6.23-rc4/patch-2.6.23-rc4-hrt1.patches.tar.bz2
> >
> > worked with 2.6.23-rc4 was wrong. It _sometimes_ works, but usually doesn't
> > boot, just like 2.6.23-rc4-mm1, 2.6.23-rc6-mm1 and everything in between with
> > the above patch series applied. I've also tried:
> >
> > http://tglx.de/projects/hrtimers/2.6.23-rc5/patch-2.6.23-rc5-hrt1.patches.tar.bz2
> > http://tglx.de/projects/hrtimers/2.6.23-rc6/patch-2.6.23-rc6-hrt2.patch
> >
> > with the same result.
> >
> > The problematic patch is x86_64-convert-to-clockevents.patch .
> >
> > Since the boot fails very early, before any messages reach the (VGA) console,
> > I have no idea what to do next, except for digging in the code.
>
> Ok, let's track it down. Is there any difference when you add:
>
> nohz=off
> highres=off
> noapictimer
>
> or any combinations of the above to the kernel command line ?

First, for now, I build all kernels with NO_HZ and HIGH_RES_TIMERS unset
(.config for 2.6.23-rc6-mm1 is attached).

Second, noacpitimer added to the command line makes all of the kernels, up to
and including 2.6.23-rc6-mm1, boot (this seems to be 100% reproducible).

Greetings,
Rafael

#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.23-rc6-mm1
# Tue Sep 18 22:52:04 2007
#
CONFIG_X86_64=y
CONFIG_64BIT=y
CONFIG_X86=y
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_NONIRQ_WAKEUP=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_ZONE_DMA32=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_SEMAPHORE_SLEEPERS=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_QUICKLIST=y
CONFIG_NR_QUICK=2
CONFIG_RWSEM_GENERIC_SPINLOCK=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_X86_CMPXCHG=y
CONFIG_EARLY_PRINTK=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_ARCH_POPULATES_NODE_MAP=y
CONFIG_DMI=y
CONFIG_AUDIT_ARCH=y
CONFIG_GENERIC_BUG=y
# CONFIG_ARCH_HAS_ILOG2_U32 is not set
# CONFIG_ARCH_HAS_ILOG2_U64 is not set
CONFIG_DEFCONFIG_LIST=/lib/modules/$UNAME_RELEASE/.config

#
# General setup
#
CONFIG_EXPERIMENTAL=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_LOCALVERSION=
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_BSD_PROCESS_ACCT_V3=y
CONFIG_TASKSTATS=y
CONFIG_TASK_DELAY_ACCT=y
# CONFIG_TASK_XACCT is not set
# CONFIG_USER_NS is not set
CONFIG_AUDIT=y
CONFIG_AUDITSYSCALL=y
CONFIG_AUDIT_TREE=y
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=18
# CONFIG_CONTAINERS is not set
CONFIG_SYSFS_DEPRECATED=y
# CONFIG_RELAY is not set
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_SYSCTL=y
# CONFIG_EMBEDDED is not set
CONFIG_UID16=y
CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
# CONFIG_KALLSYMS_EXTRA_PASS is not set
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_ANON_INODES=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_SLAB=y
# CONFIG_SLUB is not set
# CONFIG_SLOB is not set
CONFIG_PROC_PAGE_MONITOR=y
CONFIG_PROC_KPAGEMAP=y
CONFIG_RT_MUTEXES=y
# CONFIG_TINY_SHMEM is not set
CONFIG_BASE_SMALL=0
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
CONFIG_MODULE_FORCE_UNLOAD=y
CONFIG_MODVERSIONS=y
CONFIG_MODULE_SRCVERSION_ALL=y
CONFIG_KMOD=y
CONFIG_STOP_MACHINE=y
CONFIG_BLOCK=y
# CONFIG_BLK_DEV_IO_TRACE is not set
# CONFIG_BLK_DEV_BSG is not set

#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_AS=y
CONFIG_IOSCHED_DEADLINE=y
CONFIG_IOSCHED_CFQ=y
# CONFIG_DEFAULT_AS is not set
# CONFIG_DEFAULT_DEADLINE is not set
CONFIG_DEFAULT_CFQ=y
# CONFIG_DEFAULT_NOOP is not set
CONFIG_DEFAULT_IOSCHED=cfq

#
# Processor type and features
#
# CONFIG_TICK_ONESHOT is not set
# CONFIG_NO_HZ is not set
# CONFIG_HIGH_RES_TIMERS is not set
CONFIG_GENERIC_CLOCKEVENTS_BUILD=y
CONFIG_X86_PC=y
# CONFIG_X86_VSMP is not set
CONFIG_MK8=y
# CONFIG_MPSC is not set
# CONFIG_MCORE2 is not set
# CONFIG_GENERIC_CPU is not set
CONFIG_X86_L1_CACHE_BYTES=64
CONFIG_X86_L1_CACHE_SHIFT=6
CONFIG_X86_INTERNODE_CACHE_BYTES=64
CONFIG_X86_TSC=y
CONFIG_X86_GOOD_APIC=y
# CONFIG_MICROCODE is not set
CONFIG_X86_MSR=m
CONFIG_X86_CPUID=m
CONFIG_X86_IO_APIC=y
CONFIG_X86_LOCAL_APIC=y
CONFIG_MTRR=y
CONFIG_SMP=y
# CONFIG_SCHED_SMT is not set
CONFIG_SCHED_MC=y
# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set
CONFIG_PREEMPT_BKL=y
# CONFIG_NUMA is not set
CONFIG_ARCH_SPARSEMEM_ENABLE=y
CONFIG_ARCH_FLATMEM_ENABLE=y
CONFIG_SELECT_MEMORY_MODEL=y
CONFIG_FLATMEM_MANUAL=y
# CONFIG_DISCONTIGMEM_MANUAL is not set
# CONFIG_SPARSEMEM_MANUAL is not set
CONFIG_FLATMEM=y
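The statement that NO_HZ and HIGH_RES_TIMERS are unset can be checked mechanically against a .config such as the one attached. A minimal Python sketch, assuming the usual one-option-per-line .config layout; `parse_kconfig` and `SAMPLE` are illustrative names, and the sample lines are copied from the attached config:

```python
NOT_SET_SUFFIX = " is not set"


def parse_kconfig(text):
    """Map each CONFIG_ symbol to its value, or to None if it is not set."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# CONFIG_") and line.endswith(NOT_SET_SUFFIX):
            # "# CONFIG_FOO is not set" marks a disabled option.
            name = line[2:-len(NOT_SET_SUFFIX)]
            options[name] = None
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


# Excerpt from the "Processor type and features" section of the config.
SAMPLE = """\
# CONFIG_TICK_ONESHOT is not set
# CONFIG_NO_HZ is not set
# CONFIG_HIGH_RES_TIMERS is not set
CONFIG_GENERIC_CLOCKEVENTS_BUILD=y
CONFIG_X86_LOCAL_APIC=y
CONFIG_SMP=y
"""

if __name__ == "__main__":
    opts = parse_kconfig(SAMPLE)
    for name in ("CONFIG_NO_HZ", "CONFIG_HIGH_RES_TIMERS", "CONFIG_SMP"):
        state = opts.get(name)
        print(name, "unset" if state is None else state)
```

Disabled options map to `None` rather than being dropped, so an option that is explicitly "not set" can be told apart from one that never appears in the file by testing `name in opts`.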
Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents
On Sun, 2007-09-23 at 22:08 +0200, Rafael J. Wysocki wrote:
> > > Since the boot fails very early, before any messages reach the (VGA) console,
> > > I have no idea what to do next, except for digging in the code.
> >
> > Ok, let's track it down. Is there any difference when you add:
> >
> > nohz=off
> > highres=off
> > noapictimer
> >
> > or any combinations of the above to the kernel command line ?
>
> First, for now, I build all kernels with NO_HZ and HIGH_RES_TIMERS unset
> (.config for 2.6.23-rc6-mm1 is attached).
>
> Second, noacpitimer added to the command line makes all of the kernels, up to
> and including 2.6.23-rc6-mm1, boot (this seems to be 100% reproducible).

That's valuable information. Can you please provide a boot log of one of
those with an additional apic=verbose on the command line ?

Thanks,

	tglx
Re: 2.6.23-rc4-mm1 and -rc6-mm1: boot failure on HP nx6325, related to clockevents
On Sunday, 23 September 2007 21:59, Thomas Gleixner wrote:
> On Sun, 2007-09-23 at 22:08 +0200, Rafael J. Wysocki wrote:
> > > > Since the boot fails very early, before any messages reach the (VGA) console,
> > > > I have no idea what to do next, except for digging in the code.
> > >
> > > Ok, let's track it down. Is there any difference when you add:
> > >
> > > nohz=off
> > > highres=off
> > > noapictimer
> > >
> > > or any combinations of the above to the kernel command line ?
> >
> > First, for now, I build all kernels with NO_HZ and HIGH_RES_TIMERS unset
> > (.config for 2.6.23-rc6-mm1 is attached).
> >
> > Second, noacpitimer added to the command line makes all of the kernels, up to
> > and including 2.6.23-rc6-mm1, boot (this seems to be 100% reproducible).
>
> That's valuable information. Can you please provide a boot log of one of
> those with an additional apic=verbose on the command line ?

Attached is the dmesg output from the 2.6.23-rc6 kernel with the patchset:

http://tglx.de/projects/hrtimers/2.6.23-rc4/patch-2.6.23-rc4-hrt1.patches.tar.bz2

applied.

I also have the 2.6.23-rc6-mm1 dmesg output ready, but there's some -mm-specific
noise in it. Please let me know if you want it, though.

Greetings,
Rafael

Linux version 2.6.23-rc6-hrt ([EMAIL PROTECTED]) (gcc version 4.1.2 20061115 (prerelease) (SUSE Linux)) #1 SMP Sat Sep 22 22:38:18 CEST 2007
Command line: root=/dev/sda3 vga=792 resume=/dev/sda1 noacpitimer apic=verbose 2
BIOS-provided physical RAM map:
BIOS-e820: - 0009fc00 (usable)
BIOS-e820: 0009fc00 - 000a (reserved)
BIOS-e820: 000e - 0010 (reserved)
BIOS-e820: 0010 - 77fd (usable)
BIOS-e820: 77fd - 77fe5600 (reserved)
BIOS-e820: 77fe5600 - 77ff8000 (ACPI NVS)
BIOS-e820: 77ff8000 - 8000 (reserved)
BIOS-e820: e000 - f000 (reserved)
BIOS-e820: fec0 - fec02000 (reserved)
BIOS-e820: ffbc - ffcc (reserved)
BIOS-e820: fff0 - 0001 (reserved)
Entering add_active_range(0, 0, 159) 0 entries of 256 used
Entering add_active_range(0, 256, 491472) 1 entries of 256 used
end_pfn_map = 1048576
DMI 2.4 present.
ACPI: RSDP 000F7D30, 0024 (r2 HP)
ACPI: XSDT 77FE57B4, 0054 (r1 HP 0944 6070620 HP 1)
ACPI: FACP 77FE5684, 00F4 (r4 HP 09443 HP 1)
ACPI: DSDT 77FE58DC, EE7A (r1 HPSB4001 MSFT 10E)
ACPI: FACS 77FF7E80, 0040
ACPI: APIC 77FE5808, 0062 (r1 HP 09441 HP 1)
ACPI: MCFG 77FE586C, 003C (r1 HP 09441 HP 1)
ACPI: TCPA 77FE58A8, 0032 (r2 HP 09441 HP 1)
ACPI: SSDT 77FF4756, 0059 (r1 HP HPQNLP1 MSFT 10E)
ACPI: SSDT 77FF47AF, 0206 (r1 HP PSSTBLID1 HP 1)
Entering add_active_range(0, 0, 159) 0 entries of 256 used
Entering add_active_range(0, 256, 491472) 1 entries of 256 used
No mptable found.
Zone PFN ranges:
DMA 0 - 4096
DMA32 4096 - 1048576
Normal 1048576 - 1048576
Movable zone start PFN for each node
early_node_map[2] active PFN ranges
0: 0 - 159
0: 256 - 491472
On node 0 totalpages: 491375
DMA zone: 56 pages used for memmap
DMA zone: 1446 pages reserved
DMA zone: 2497 pages, LIFO batch:0
DMA32 zone: 6663 pages used for memmap
DMA32 zone: 480713 pages, LIFO batch:31
Normal zone: 0 pages used for memmap
Movable zone: 0 pages used for memmap
ATI board detected. Disabling timer routing over 8254.
ACPI: PM-Timer IO Port: 0x8008
ACPI: Local APIC address 0xfee0
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
Processor #0 (Bootup-CPU)
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
Processor #1
ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
ACPI: IOAPIC (id[0x02] address[0xfec0] gsi_base[0])
IOAPIC[0]: apic_id 2, address 0xfec0, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 21 low level)
ACPI: IRQ0 used by override.
ACPI: IRQ2 used by override.
Setting APIC routing to flat
Using ACPI (MADT) for SMP configuration information
mapped APIC to ff5fb000 (fee0)
mapped IOAPIC to ff5fa000 (fec0)
swsusp: Registered nosave memory region: 0009f000 - 000a
swsusp: Registered nosave memory region: 000a - 000e
swsusp: Registered nosave memory region: 000e - 0010
Allocating PCI resources starting at 8800 (gap: 8000:6000)
SMP: Allowing 2 CPUs, 0 hotplug CPUs
PERCPU: Allocating 47576 bytes of per cpu data
Built 1 zonelists in Zone order. Total pages: 483210
Kernel command line: root=/dev/sda3 vga=792 resume=/dev/sda1 noacpitimer apic=verbose 2
Initializing CPU#0
PID hash table entries: 4096 (order: 12, 32768 bytes)
Extended CMOS year: 2000
TSC calibrated against