Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING
On Fri 2007-09-21 10:06:15, Thomas Gleixner wrote:
> On Fri, 2007-09-21 at 14:51 +1000, Paul Mackerras wrote:
> > Linus Torvalds writes:
> >
> > > It would indeed be nice if we could just take CPU's down early (while
> > > everything is working), and run the whole suspend code with just one
> > > CPU, rather than having to worry about the ordering between CPU and
> > > device takedown.
> >
> > That is certainly what we want to do on powerpc.
>
> I would have expected that we do it exactly this way and it took me by
> surprise, that we do not.

Well, we used to do that, but acpi spec forbids that, and it means
userspace sees plugs/unplugs.

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING
Thomas,

On Friday, 21 September 2007 21:16, Thomas Gleixner wrote:
> Rafael,
>
> On Fri, 2007-09-21 at 21:20 +0200, Rafael J. Wysocki wrote:
> > On Friday, 21 September 2007 18:27, Thomas Gleixner wrote:
> > > I simply rmmod'ed the processor module before suspend and the problem
> > > is solved as well. The cpuidle patches make this problem more
> > > prominent due to the possible more direct switch into lower power
> > > states, when we wait for a long time on something.
> >
> > So, perhaps we can add .suspend()/.resume() routines to the processor
> > driver and use them to disable/enable the cpuidle functionality during
> > a suspend/resume?
>
> http://tglx.de/private/tglx/p.diff
>
> untested yet, but I'm on the way to do that :)

Heh, I thought of the same thing. :-)

Greetings,
Rafael
Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING
Rafael,

On Fri, 2007-09-21 at 21:20 +0200, Rafael J. Wysocki wrote:
> On Friday, 21 September 2007 18:27, Thomas Gleixner wrote:
> > I simply rmmod'ed the processor module before suspend and the problem
> > is solved as well. The cpuidle patches make this problem more prominent
> > due to the possible more direct switch into lower power states, when we
> > wait for a long time on something.
>
> So, perhaps we can add .suspend()/.resume() routines to the processor
> driver and use them to disable/enable the cpuidle functionality during a
> suspend/resume?

http://tglx.de/private/tglx/p.diff

untested yet, but I'm on the way to do that :)

tglx
Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING
On Friday, 21 September 2007 18:27, Thomas Gleixner wrote:
> Rafael,
>
> On Fri, 2007-09-21 at 16:20 +0200, Rafael J. Wysocki wrote:
> > > > If you need any help from me with that, please let me know.
> > >
> > > I'm zooming in. It seems, that the ACPI idle code plays tricks with
> > > us. After debugging the swsusp_suspend() code path I figured out,
> > > that we end up in C2 or deeper power states while we run the suspend
> > > code. The same happens when we come back on resume. It looks like we
> > > disable stuff in the ACPI BIOS, which makes the C2 and deeper power
> > > states misbehave.
> >
> > Hm, can you please run the test I've suggested in another branch of the
> > thread, ie.
> >
> > # echo shutdown > /sys/power/disk
> > # echo disk > /sys/power/state
> >
> > without your debugging code in disk.c?
> >
> > This makes the hibernation code omit the major ACPI hooks, so if it
> > works, we'll know that these hooks are responsible for the problem.
>
> Yes, this works fine. We still go into C3, but this no longer seems to
> brick the box.
>
> > > I hacked the idle loop arch code to use halt() right before we call
> > > device_suspend() and switch back to the acpi idle code right after
> > > device_resume(). This solves the problem as well.
> >
> > Well, that seems less intrusive than changing the code ordering right
> > before the major kernel release, but I think we should do our best to
> > understand what _exactly_ is happening here.
>
> I found some other subtle thinko in the clock events code while I was
> heading down the swsusp_suspend code path. I wait for confirmation that
> it does not brick some endangered boxen, though. Still with this change
> in the clock events code, my VAIO goes into C2 or C3 and causes the box
> to wait for a helping keystroke.
>
> The correct solution would be, that the ACPI code ignores the lower
> C-states during suspend / resume.

Yes, certainly.

> I simply rmmod'ed the processor module before suspend and the problem is
> solved as well. The cpuidle patches make this problem more prominent due
> to the possible more direct switch into lower power states, when we wait
> for a long time on something.

So, perhaps we can add .suspend()/.resume() routines to the processor
driver and use them to disable/enable the cpuidle functionality during a
suspend/resume?

> I think we really should not fiddle with the various cpu states during
> the critical parts of suspend / resume. Let's keep it simple. We have
> the same policy during boot and I think the suspend / resume critical
> parts have similar constraints.

I completely agree.

Greetings,
Rafael
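[Editorial sketch] The idea in the message above (suspend/resume hooks that
stop the idle code from picking C2 or deeper while a suspend or resume is
running) can be modelled in plain userspace C. Everything below is
hypothetical: struct idle_policy and the idle_policy_*() functions are
illustration names only, not the ACPI processor driver's interface and not
the contents of p.diff, which is not reproduced in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; these names do not exist in the kernel. */
enum cstate { C1 = 1, C2 = 2, C3 = 3 };

struct idle_policy {
    enum cstate deepest_allowed;  /* deepest C-state the platform offers */
    bool suspend_in_progress;     /* set by the suspend hook, cleared on resume */
};

/* Would be called from a driver's .suspend() hook in this model. */
static void idle_policy_suspend(struct idle_policy *p)
{
    p->suspend_in_progress = true;
}

/* Would be called from a driver's .resume() hook in this model. */
static void idle_policy_resume(struct idle_policy *p)
{
    p->suspend_in_progress = false;
}

/* Pick the C-state to enter: never deeper than C1 while suspending. */
static enum cstate idle_policy_pick(const struct idle_policy *p,
                                    enum cstate wanted)
{
    assert(wanted >= C1 && wanted <= C3);
    if (p->suspend_in_progress)
        return C1;
    return wanted > p->deepest_allowed ? p->deepest_allowed : wanted;
}
```

In this model the idle loop always goes through idle_policy_pick(), so a
request for C3 made between idle_policy_suspend() and idle_policy_resume()
is answered with C1, which corresponds to the halt() fallback described
earlier in the thread.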
Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING
Rafael,

On Fri, 2007-09-21 at 16:20 +0200, Rafael J. Wysocki wrote:
> > > If you need any help from me with that, please let me know.
> >
> > I'm zooming in. It seems, that the ACPI idle code plays tricks with us.
> > After debugging the swsusp_suspend() code path I figured out, that we
> > end up in C2 or deeper power states while we run the suspend code. The
> > same happens when we come back on resume. It looks like we disable
> > stuff in the ACPI BIOS, which makes the C2 and deeper power states
> > misbehave.
>
> Hm, can you please run the test I've suggested in another branch of the
> thread, ie.
>
> # echo shutdown > /sys/power/disk
> # echo disk > /sys/power/state
>
> without your debugging code in disk.c?
>
> This makes the hibernation code omit the major ACPI hooks, so if it
> works, we'll know that these hooks are responsible for the problem.

Yes, this works fine. We still go into C3, but this no longer seems to
brick the box.

> > I hacked the idle loop arch code to use halt() right before we call
> > device_suspend() and switch back to the acpi idle code right after
> > device_resume(). This solves the problem as well.
>
> Well, that seems less intrusive than changing the code ordering right
> before the major kernel release, but I think we should do our best to
> understand what _exactly_ is happening here.

I found some other subtle thinko in the clock events code while I was
heading down the swsusp_suspend code path. I wait for confirmation that
it does not brick some endangered boxen, though. Still with this change
in the clock events code, my VAIO goes into C2 or C3 and causes the box
to wait for a helping keystroke.

The correct solution would be, that the ACPI code ignores the lower
C-states during suspend / resume.

I simply rmmod'ed the processor module before suspend and the problem is
solved as well. The cpuidle patches make this problem more prominent due
to the possible more direct switch into lower power states, when we wait
for a long time on something.

I think we really should not fiddle with the various cpu states during
the critical parts of suspend / resume. Let's keep it simple. We have
the same policy during boot and I think the suspend / resume critical
parts have similar constraints.

tglx
Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING
Thomas,

On Friday, 21 September 2007 14:59, Thomas Gleixner wrote:
> Rafael,
>
> On Fri, 2007-09-21 at 00:30 +0200, Rafael J. Wysocki wrote:
> > > -ETOOTIRED led me to a wrong conclusion, but still it is a valuable
> > > hint that this change is making things work again.
> >
> > Yes, it is.
> >
> > > I need to go down into the details of the swsusp_suspend() code path
> > > to figure out, what's the root cause.
> >
> > If you need any help from me with that, please let me know.
>
> I'm zooming in. It seems, that the ACPI idle code plays tricks with us.
> After debugging the swsusp_suspend() code path I figured out, that we
> end up in C2 or deeper power states while we run the suspend code. The
> same happens when we come back on resume. It looks like we disable stuff
> in the ACPI BIOS, which makes the C2 and deeper power states misbehave.

Hm, can you please run the test I've suggested in another branch of the
thread, ie.

# echo shutdown > /sys/power/disk
# echo disk > /sys/power/state

without your debugging code in disk.c?

This makes the hibernation code omit the major ACPI hooks, so if it works,
we'll know that these hooks are responsible for the problem.

> I hacked the idle loop arch code to use halt() right before we call
> device_suspend() and switch back to the acpi idle code right after
> device_resume(). This solves the problem as well.

Well, that seems less intrusive than changing the code ordering right
before the major kernel release, but I think we should do our best to
understand what _exactly_ is happening here.

Greetings,
Rafael
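[Editorial sketch] The two echo commands in the message above can also be
issued from a small C helper, which is handy when the test has to be
repeated from a harness. This is a minimal sketch under stated assumptions:
write_attr() and hibernate_shutdown_mode() are hypothetical helper names,
and the directory is passed in as a parameter so the helper can be tried
against a scratch directory first. Pointing it at /sys/power as root
starts a real hibernation.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write one value to one attribute file. */
static int write_attr(const char *path, const char *value)
{
    assert(path != NULL && value != NULL);
    if (strlen(value) == 0)
        return -1;

    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;

    int ok = fputs(value, f) >= 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/*
 * Equivalent of:
 *   echo shutdown > <power_dir>/disk
 *   echo disk     > <power_dir>/state
 * Returns 0 on success, -1 on the first failure.
 */
static int hibernate_shutdown_mode(const char *power_dir)
{
    char path[256];
    int n;

    n = snprintf(path, sizeof path, "%s/disk", power_dir);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;
    if (write_attr(path, "shutdown") != 0)
        return -1;

    n = snprintf(path, sizeof path, "%s/state", power_dir);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;
    return write_attr(path, "disk");
}
```

On a real system the call would be hibernate_shutdown_mode("/sys/power").
The order matters: the mode is selected through the disk attribute before
the state attribute triggers the hibernation.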
Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING
Rafael,

On Fri, 2007-09-21 at 00:30 +0200, Rafael J. Wysocki wrote:
> > -ETOOTIRED led me to a wrong conclusion, but still it is a valuable
> > hint that this change is making things work again.
>
> Yes, it is.
>
> > I need to go down into the details of the swsusp_suspend() code path to
> > figure out, what's the root cause.
>
> If you need any help from me with that, please let me know.

I'm zooming in. It seems, that the ACPI idle code plays tricks with us.
After debugging the swsusp_suspend() code path I figured out, that we
end up in C2 or deeper power states while we run the suspend code. The
same happens when we come back on resume. It looks like we disable stuff
in the ACPI BIOS, which makes the C2 and deeper power states misbehave.

I hacked the idle loop arch code to use halt() right before we call
device_suspend() and switch back to the acpi idle code right after
device_resume(). This solves the problem as well.

Len, any opinion on this one?

tglx
Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING
On Friday, 21 September 2007 09:56, Thomas Gleixner wrote:
> On Thu, 2007-09-20 at 19:35 -0400, Len Brown wrote:
> > > > (Btw, the above commit message points to just my response with a
> > > > testing patch to the real email: the actual explanation of the
> > > > INSANE ordering is from Len Brown in
> > > >
> > > > https://lists.linux-foundation.org/pipermail/linux-pm/2006-November/004161.html
> > > >
> > > > and there Len claims that we *must* wake up CPU's early).
> > >
> > > ..and points to commit 1a38416cea8ac801ae8f261074721f35317613dc which
> > > in turn talks about http://bugzilla.kernel.org/show_bug.cgi?id=5651
> > >
> > > However, it seems that bugzilla entry may just be bogus. It talks
> > > about "it appears that some firmware in the future may depend on that
> > > sequence for correction operation"
> > >
> > > Len, Shaohua, what are the real issues here?
> >
> > Intel's reference BIOS for Core Duo performs some re-initialization
> > in _WAK that will get blown away if INIT follows _WAK.
> > IIR, it is related to re-initializing the thermal sensors.
> > I opened bug 5651 when the BIOS team informed me of this issue.
> >
> > Yes, bringing a processor offline and then online again w/o
> > an intervening suspend or reset would not evaluate _WAK,
> > and thus may still run into the issue.
>
> If this is true, then we should disable the sys//cpu/online entry
> right away.

Or drop the execution of _INI from the CPU hotplug, if possible ...

Greetings,
Rafael
Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING
Thomas,

On Thursday, 20 September 2007 23:53, Thomas Gleixner wrote:
> Rafael,
>
> On Thu, 2007-09-20 at 23:45 +0200, Rafael J. Wysocki wrote:
> > > We disable everything in device_suspend()
> >
> > No, we don't. sysdevs are _not_ suspended in device_suspend().
> > They are suspended in device_power_down(), which is called
> > _after_ disable_nonboot_cpus() (from swsusp_suspend()).
> >
> > > including timekeeping,
> >
> > No, the timekeeping is suspended in device_power_down() (or at least it
> > should be).
>
> Damn, you are right. Reading through 30 different logs confused me.
>
> > > enable_nonboot_cpus();
> >
> > Actually, we can't do this here, because of ACPI and some interrupt
> > handling related problems. Unfortunately, platform_finish() needs to go
> > _after_ enable_nonboot_cpus() and device_resume() needs to go after
> > platform_finish().
> > Analogously, disable_nonboot_cpus() has to go after platform_prepare().
> >
> > Otherwise, some systems will break.
>
> Well, I don't buy this one. The system would break in the same way, when
> I take CPU#1 offline before I initiate the suspend.
>
> > > and non-surprisingly the "my VAIO needs help from keyboard" problem
> > > went away immediately. See patch below. (on top of rc7-hrt1, -mm1
> > > does not work at all on my VAIO due to some yet not identified
> > > wreckage)
> >
> > Hm, I really don't know why it helps, but that's not because of the
> > timekeeping suspend, IMO.
>
> It is related. We rely on some subtle thing which is not up when we
> resume the non boot cpu.
>
> > > I did not yet look into the suspend to ram code, but I guess that
> > > there is an equivalent problem.
> >
> > Yes, the code ordering is the same, but it's not totally wrong, IMHO.
> >
> > > But I have no idea why this affects Andrew's jinxed VAIO (UP machine),
> > > though I suspect that we have more timekeeping/timer depending code
> > > somewhere waiting to bite us.
> >
> > That's possible.
> >
> > > Also I still need to debug why the HIBERNATION_TEST code path (which
> > > has a msleep(5000) in it) does not fail,
> >
> > See above. :-)
>
> Yes. It makes sense. When I change the TEST code path to:
>
> -	printk("swsusp debug: Waiting for 5 seconds.\n");
> -	msleep(5000);
> +	printk("swsusp debug: before swsusp_suspend\n");
> +	error = swsusp_suspend();
>
> then I have the same effect as I get from real hibernation. And we
> actually shut down time keeping somewhere in that code path.
>
> ACPI: PCI interrupt for device :00:1b.0 disabled
> swsusp debug: before swsusp_suspend
> Suspend timekeeping
> swsusp: critical section:
> swsusp: Need to copy 112429 pages
> swsusp: Normal pages needed: 35399 + 1024 + 40, available pages: 193876
> swsusp: critical section: done (112429 pages copied)
> Intel machine check architecture supported.
> Intel machine check reporting enabled on CPU#0.
> Resume timekeeping
> ACPI: PCI Interrupt :00:02.0[A] -> GSI 16 (level, low) -> IRQ 16
> -> works fine
>
> This is with my patch applied. Without that I get:
>
> CPU1 is down
> swsusp debug: before swsusp_suspend
> Suspend timekeeping
> swsusp: critical section:
> swsusp: Need to copy 112429 pages
> swsusp: Normal pages needed: 35399 + 1024 + 40, available pages: 193876
> swsusp: critical section: done (112429 pages copied)
> Intel machine check architecture supported.
> Intel machine check reporting enabled on CPU#0.
> Resume timekeeping
> Enabling non-boot CPUs
> --> Waits for ever until a key is pressed

Can you please run one more test? Namely, without your debugging code in
disk.c, please try

# echo shutdown > /sys/power/disk
# echo disk > /sys/power/state

Greetings,
Rafael
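[Editorial sketch] The ordering constraints argued about in the message
above can be written down as a small table plus a check. The step names
are the kernel functions named in the thread; the table itself, the
position of device_suspend() at the front, and the helpers step_index()
and comes_before() are assumptions made for illustration and are not
copied from kernel/power/disk.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* One possible hibernation ordering that satisfies the thread's constraints. */
static const char *const hibernate_steps[] = {
    "device_suspend",        /* assumption: its exact position is not stated */
    "platform_prepare",
    "disable_nonboot_cpus",  /* must follow platform_prepare() */
    "device_power_down",     /* sysdevs and timekeeping are suspended here */
    "enable_nonboot_cpus",
    "platform_finish",       /* must follow enable_nonboot_cpus() */
    "device_resume",         /* must follow platform_finish() */
};

/* Return the position of a step in the table, or -1 if it is unknown. */
static int step_index(const char *name)
{
    size_t n = sizeof hibernate_steps / sizeof hibernate_steps[0];

    for (size_t i = 0; i < n; i++)
        if (strcmp(hibernate_steps[i], name) == 0)
            return (int)i;
    return -1;
}

/* True if step a runs strictly before step b in the table. */
static bool comes_before(const char *a, const char *b)
{
    int ia = step_index(a);
    int ib = step_index(b);

    assert(ia >= 0 && ib >= 0);
    return ia < ib;
}
```

A check such as comes_before("disable_nonboot_cpus", "device_power_down")
then states the point made above in one line: timekeeping is already
suspended while the non-boot CPUs are still offline, which matches the
"Resume timekeeping" / "Enabling non-boot CPUs" order in the failing log.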
Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING
On Fri, 2007-09-21 at 14:51 +1000, Paul Mackerras wrote:
> Linus Torvalds writes:
>
> > It would indeed be nice if we could just take CPU's down early (while
> > everything is working), and run the whole suspend code with just one
> > CPU, rather than having to worry about the ordering between CPU and
> > device takedown.
>
> That is certainly what we want to do on powerpc.

I would have expected that we do it exactly this way and it took me by
surprise, that we do not.

tglx
Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING
On Thu, 2007-09-20 at 19:35 -0400, Len Brown wrote:
> > > (Btw, the above commit message points to just my response with a
> > > testing patch to the real email: the actual explanation of the INSANE
> > > ordering is from Len Brown in
> > >
> > > https://lists.linux-foundation.org/pipermail/linux-pm/2006-November/004161.html
> > >
> > > and there Len claims that we *must* wake up CPU's early).
> >
> > ..and points to commit 1a38416cea8ac801ae8f261074721f35317613dc which in
> > turn talks about http://bugzilla.kernel.org/show_bug.cgi?id=5651
> >
> > However, it seems that bugzilla entry may just be bogus. It talks about
> > "it appears that some firmware in the future may depend on that sequence
> > for correction operation"
> >
> > Len, Shaohua, what are the real issues here?
>
> Intel's reference BIOS for Core Duo performs some re-initialization
> in _WAK that will get blown away if INIT follows _WAK.
> IIR, it is related to re-initializing the thermal sensors.
> I opened bug 5651 when the BIOS team informed me of this issue.
>
> Yes, bringing a processor offline and then online again w/o
> an intervening suspend or reset would not evaluate _WAK,
> and thus may still run into the issue.

If this is true, then we should disable the sys//cpu/online entry
right away.

tglx
Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING
On Thu, 2007-09-20 at 19:35 -0400, Len Brown wrote: (Btw, the above commit message points to just my response with a testing patch to the real email: the actual explanation of the INSANE ordering is from Len Brown in https://lists.linux-foundation.org/pipermail/linux-pm/2006-November/004161.html and there Len claims that we *must* wake up CPU's early). ..and points to commit 1a38416cea8ac801ae8f261074721f35317613dc which in turn talks about http://bugzilla.kernel.org/show_bug.cgi?id=5651 Howerver, it seems that bugzilla entry may just be bogus. It talks about it appears that some firmware in the future may depend on that sequence for correction operation Len, Shaohua, what are the real issues here? Intel's reference BIOS for Core Duo performs some re-initialization in _WAK that will get blow away if INIT follows _WAK. IIR, it is related to re-initializing the thermal sensors. I opened bug 5651 when the BIOS team informed me of this issue. Yes, bringing a processor offline and then online again w/o an intervening suspend or reset would not evaluate _WAK, and thus may still run into the issue. If this is true, then we should disable the sys//cpu/online entry right away. tglx - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING
On Fri, 2007-09-21 at 14:51 +1000, Paul Mackerras wrote: Linus Torvalds writes: It would indeed be nice if we could just take CPU's down early (while everything is working), and run the whole suspend code with just one CPU, rather than having to worry about the ordering between CPU and device takedown. That is certainly what we want to do on powerpc. I would have expected that we do it exactly this way and it took me by surprise, that we do not. tglx - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING
Thomas, On Thursday, 20 September 2007 23:53, Thomas Gleixner wrote: Rafael, On Thu, 2007-09-20 at 23:45 +0200, Rafael J. Wysocki wrote: We disable everything in device_suspend() No, we don't. sysdevs are _not_ suspended in device_suspend(). They are suspended in device_power_down(), which is called _after_ disable_nonboot_cpus() (from swsusp_suspend()). including timekeeping, No, the timekeeping is suspended in device_power_down() (or at least it should be). Damn, you are right. Reading through 30 different logs confused me. enable_nonboot_cpus(); Actually, we can't do this here, because of ACPI and some interrupt handling related problems. Unfortunately, platform_finish() needs to go _after_ enable_nonboot_cpus() and device_resume() needs to go after platform_finish(). Analogously, disable_nonboot_cpus() has to go after platform_prepare(). Otherwise, some systems will break. Well, I don't buy this one. The system would break in the same way, when I take CPU#1 offline before I initiate the suspend. and non-surprisingly the my VAIO needs help from keyboard problem went away immediately. See patch below. (on top of rc7-hrt1, -mm1 does not work at all on my VAIO due to some yet not identified wreckage) Hm, I really don't know why it helps, but that's not because of the timekeeping suspend, IMO. It is related. We rely on some subtle thing which is not up when we resume the non boot cpu. I did not yet look into the suspend to ram code, but I guess that there is an equivalent problem. Yes, the code ordering is the same, but it's not totally wrong, IMHO. But I have no idea why this affects Andrews jinxed VAIO (UP machine), though I suspect that we have more timekeeping/timer depending code somewhere waiting to bite us. That's possible. Also I still need to debug why the HIBERNATION_TEST code path (which has a msleep(5000) in it) does not fail, See above. :-) Yes. It makes sense. 
When I change the TEST code path to: - printk(swsusp debug: Waiting for 5 seconds.\n); - msleep(5000); + printk(swsusp debug: before swsusp_suspend\n); + error = swsusp_suspend(); then I have the same effect as I get from real hibernation. And we actually shut down time keeping somewhere in that code path. ACPI: PCI interrupt for device :00:1b.0 disabled swsusp debug: before swsusp_suspend Suspend timekeeping swsusp: critical section: swsusp: Need to copy 112429 pages swsusp: Normal pages needed: 35399 + 1024 + 40, available pages: 193876 swsusp: critical section: done (112429 pages copied) Intel machine check architecture supported. Intel machine check reporting enabled on CPU#0. Resume timekeeping ACPI: PCI Interrupt :00:02.0[A] - GSI 16 (level, low) - IRQ 16 - works fine This is with my patch applied. Without that I get: CPU1 is down swsusp debug: before swsusp_suspend Suspend timekeeping swsusp: critical section: swsusp: Need to copy 112429 pages swsusp: Normal pages needed: 35399 + 1024 + 40, available pages: 193876 swsusp: critical section: done (112429 pages copied) Intel machine check architecture supported. Intel machine check reporting enabled on CPU#0. Resume timekeeping Enabling non-boot CPUs -- Waits for ever until a key is pressed Can you please run one more test? Namely, without your debugging code in disk.c, please try # echo shutdown /sys/power/disk # echo disk /sys/power/state Greetings, Rafael - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING
On Friday, 21 September 2007 09:56, Thomas Gleixner wrote: On Thu, 2007-09-20 at 19:35 -0400, Len Brown wrote: (Btw, the above commit message points to just my response with a testing patch to the real email: the actual explanation of the INSANE ordering is from Len Brown in https://lists.linux-foundation.org/pipermail/linux-pm/2006-November/004161.html and there Len claims that we *must* wake up CPU's early). ..and points to commit 1a38416cea8ac801ae8f261074721f35317613dc which in turn talks about http://bugzilla.kernel.org/show_bug.cgi?id=5651 Howerver, it seems that bugzilla entry may just be bogus. It talks about it appears that some firmware in the future may depend on that sequence for correction operation Len, Shaohua, what are the real issues here? Intel's reference BIOS for Core Duo performs some re-initialization in _WAK that will get blow away if INIT follows _WAK. IIR, it is related to re-initializing the thermal sensors. I opened bug 5651 when the BIOS team informed me of this issue. Yes, bringing a processor offline and then online again w/o an intervening suspend or reset would not evaluate _WAK, and thus may still run into the issue. If this is true, then we should disable the sys//cpu/online entry right away. Or drop the execution of _INI from the CPU hotplug, if possible ... Greetings, Rafael - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING
Rafael, On Fri, 2007-09-21 at 00:30 +0200, Rafael J. Wysocki wrote: -ETOOTIRED led me too a wrong conclusion, but still it is a valuable hint that this change is making things work again. Yes, it is. I need to go down into the details of the swsusp_suspend() code path to figure out, what's the root cause. If you need any help from me with that, please let me know. I'm zooming in. It seems, that the ACPI idle code plays tricks with us. After debugging the swsusp_suspend() code path I figured out, that we end up in C2 or deeper power states while we run the suspend code. The same happens when we come back on resume. It looks like we disable stuff in the ACPI BIOS, which makes the C2 and deeper power states misbehave. I hacked the idle loop arch code to use halt() right before we call device_suspend() and switch back to the acpi idle code right after device_resume(). This solves the problem as well. Len, any opinion on this one ? tglx - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING
Thomas, On Friday, 21 September 2007 14:59, Thomas Gleixner wrote: Rafael, On Fri, 2007-09-21 at 00:30 +0200, Rafael J. Wysocki wrote: -ETOOTIRED led me too a wrong conclusion, but still it is a valuable hint that this change is making things work again. Yes, it is. I need to go down into the details of the swsusp_suspend() code path to figure out, what's the root cause. If you need any help from me with that, please let me know. I'm zooming in. It seems, that the ACPI idle code plays tricks with us. After debugging the swsusp_suspend() code path I figured out, that we end up in C2 or deeper power states while we run the suspend code. The same happens when we come back on resume. It looks like we disable stuff in the ACPI BIOS, which makes the C2 and deeper power states misbehave. Hm, can you please run the test I've suggested in another branch of the thread, ie. # echo shutdown /sys/power/disk # echo disk /sys/power/state without your debugging code in disk.c? This makes the hibernation code omit the major ACPI hooks, so if it works, we'll know that these hooks are responsible for the problem. I hacked the idle loop arch code to use halt() right before we call device_suspend() and switch back to the acpi idle code right after device_resume(). This solves the problem as well. Well, that seems less intrusive than changing the code ordering right before the major kernel release, but I think we should do our best to understand what _exactly_ is happening here. Greetings, Rafael - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING
Rafael,

On Fri, 2007-09-21 at 16:20 +0200, Rafael J. Wysocki wrote:
> > > If you need any help from me with that, please let me know.
> >
> > I'm zooming in. It seems that the ACPI idle code plays tricks on us.
> > After debugging the swsusp_suspend() code path I figured out that we
> > end up in C2 or deeper power states while we run the suspend code. The
> > same happens when we come back on resume. It looks like we disable
> > stuff in the ACPI BIOS, which makes the C2 and deeper power states
> > misbehave.
>
> Hm, can you please run the test I've suggested in another branch of the
> thread, i.e.
>
> # echo shutdown > /sys/power/disk
> # echo disk > /sys/power/state
>
> without your debugging code in disk.c?
>
> This makes the hibernation code omit the major ACPI hooks, so if it
> works, we'll know that these hooks are responsible for the problem.

Yes, this works fine. We still go into C3, but this no longer seems to
brick the box.

> > I hacked the idle loop arch code to use halt() right before we call
> > device_suspend() and to switch back to the ACPI idle code right after
> > device_resume(). This solves the problem as well.
>
> Well, that seems less intrusive than changing the code ordering right
> before the major kernel release, but I think we should do our best to
> understand what _exactly_ is happening here.

I found some other subtle thinko in the clock events code while I was
heading down the swsusp_suspend() code path. I am waiting for confirmation
that it does not brick some endangered boxen, though.

Even with this change in the clock events code, my VAIO goes into C2 or C3
and causes the box to wait for a helping keystroke.

The correct solution would be that the ACPI code ignores the lower
C-states during suspend / resume.

I simply rmmod'ed the processor module before suspend and the problem is
solved as well.

The cpuidle patches make this problem more prominent due to the possibly
more direct switch into lower power states when we wait for a long time on
something.

I think we really should not fiddle with the various cpu states during the
critical parts of suspend / resume. Let's keep it simple. We have the same
policy during boot and I think the suspend / resume critical parts have
similar constraints.

	tglx
Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING
On Friday, 21 September 2007 18:27, Thomas Gleixner wrote:
> Rafael,
>
> On Fri, 2007-09-21 at 16:20 +0200, Rafael J. Wysocki wrote:
> > > > If you need any help from me with that, please let me know.
> > >
> > > I'm zooming in. It seems that the ACPI idle code plays tricks on us.
> > > After debugging the swsusp_suspend() code path I figured out that we
> > > end up in C2 or deeper power states while we run the suspend code.
> > > The same happens when we come back on resume. It looks like we
> > > disable stuff in the ACPI BIOS, which makes the C2 and deeper power
> > > states misbehave.
> >
> > Hm, can you please run the test I've suggested in another branch of
> > the thread, i.e.
> >
> > # echo shutdown > /sys/power/disk
> > # echo disk > /sys/power/state
> >
> > without your debugging code in disk.c?
> >
> > This makes the hibernation code omit the major ACPI hooks, so if it
> > works, we'll know that these hooks are responsible for the problem.
>
> Yes, this works fine. We still go into C3, but this no longer seems to
> brick the box.
>
> > > I hacked the idle loop arch code to use halt() right before we call
> > > device_suspend() and to switch back to the ACPI idle code right
> > > after device_resume(). This solves the problem as well.
> >
> > Well, that seems less intrusive than changing the code ordering right
> > before the major kernel release, but I think we should do our best to
> > understand what _exactly_ is happening here.
>
> I found some other subtle thinko in the clock events code while I was
> heading down the swsusp_suspend() code path. I am waiting for
> confirmation that it does not brick some endangered boxen, though.
>
> Even with this change in the clock events code, my VAIO goes into C2 or
> C3 and causes the box to wait for a helping keystroke.
>
> The correct solution would be that the ACPI code ignores the lower
> C-states during suspend / resume.

Yes, certainly.

> I simply rmmod'ed the processor module before suspend and the problem is
> solved as well.
>
> The cpuidle patches make this problem more prominent due to the possibly
> more direct switch into lower power states when we wait for a long time
> on something.

So, perhaps we can add .suspend()/.resume() routines to the processor
driver and use them to disable/enable the cpuidle functionality during a
suspend/resume?

> I think we really should not fiddle with the various cpu states during
> the critical parts of suspend / resume. Let's keep it simple. We have
> the same policy during boot and I think the suspend / resume critical
> parts have similar constraints.

I completely agree.

Greetings,
Rafael
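The suggestion in the message above, giving the processor driver .suspend()/.resume() routines so that deep C-states are not used during the critical parts of suspend and resume, can be modelled roughly as follows. This is only a user-space sketch under stated assumptions, not kernel code and not the patch later posted in the thread: `struct processor_model`, `model_suspend()`, `model_resume()` and `pick_cstate()` are hypothetical names invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these names exist in the kernel. */
enum cstate { C1 = 1, C2 = 2, C3 = 3 };

struct processor_model {
    enum cstate deepest;   /* deepest C-state the platform offers */
    bool suspended;        /* true between the suspend and resume hooks */
};

/* Modelled .suspend() hook: from now on only C1 (plain halt) is allowed. */
int model_suspend(struct processor_model *pr)
{
    pr->suspended = true;
    return 0;
}

/* Modelled .resume() hook: deeper C-states may be used again. */
int model_resume(struct processor_model *pr)
{
    pr->suspended = false;
    return 0;
}

/* Idle-state selection: clamp to C1 while a suspend/resume is in flight,
 * otherwise never go deeper than the platform supports. */
enum cstate pick_cstate(const struct processor_model *pr, enum cstate wanted)
{
    if (pr->suspended)
        return C1;
    return wanted > pr->deepest ? pr->deepest : wanted;
}
```

In this model the idle loop would call `pick_cstate()` on every idle entry, so a request for C3 made between `model_suspend()` and `model_resume()` silently degrades to C1, which mirrors the effect of the halt() hack and of unloading the processor module described earlier in the thread.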
Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING
Thomas,

On Friday, 21 September 2007 21:16, Thomas Gleixner wrote:
> Rafael,
>
> On Fri, 2007-09-21 at 21:20 +0200, Rafael J. Wysocki wrote:
> > On Friday, 21 September 2007 18:27, Thomas Gleixner wrote:
> > > I simply rmmod'ed the processor module before suspend and the
> > > problem is solved as well.
> > >
> > > The cpuidle patches make this problem more prominent due to the
> > > possibly more direct switch into lower power states when we wait for
> > > a long time on something.
> >
> > So, perhaps we can add .suspend()/.resume() routines to the processor
> > driver and use them to disable/enable the cpuidle functionality during
> > a suspend/resume?
>
> http://tglx.de/private/tglx/p.diff
>
> untested yet, but I'm on the way to do that :)

Heh, I thought of the same thing. :-)

Greetings,
Rafael
Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING
Rafael,

On Fri, 2007-09-21 at 21:20 +0200, Rafael J. Wysocki wrote:
> On Friday, 21 September 2007 18:27, Thomas Gleixner wrote:
> > I simply rmmod'ed the processor module before suspend and the problem
> > is solved as well.
> >
> > The cpuidle patches make this problem more prominent due to the
> > possibly more direct switch into lower power states when we wait for a
> > long time on something.
>
> So, perhaps we can add .suspend()/.resume() routines to the processor
> driver and use them to disable/enable the cpuidle functionality during a
> suspend/resume?

http://tglx.de/private/tglx/p.diff

untested yet, but I'm on the way to do that :)

	tglx
Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING
Linus Torvalds writes:

> It would indeed be nice if we could just take CPU's down early (while
> everything is working), and run the whole suspend code with just one CPU,
> rather than having to worry about the ordering between CPU and device
> takedown.

That is certainly what we want to do on powerpc.

Paul.
Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING
On Thursday 20 September 2007 17:55, Linus Torvalds wrote:
>
> On Thu, 20 Sep 2007, Linus Torvalds wrote:
> >
> > (Btw, the above commit message points to just my response with a testing
> > patch to the real email: the actual explanation of the INSANE ordering is
> > from Len Brown in
> >
> >   https://lists.linux-foundation.org/pipermail/linux-pm/2006-November/004161.html
> >
> > and there Len claims that we *must* wake up CPU's early).
>
> ..and points to commit 1a38416cea8ac801ae8f261074721f35317613dc which in
> turn talks about http://bugzilla.kernel.org/show_bug.cgi?id=5651
>
> However, it seems that bugzilla entry may just be bogus. It talks about
> "it appears that some firmware in the future may depend on that sequence
> for correction operation"
>
> Len, Shaohua, what are the real issues here?

Intel's reference BIOS for Core Duo performs some re-initialization in
_WAK that will get blown away if INIT follows _WAK. IIRC, it is related to
re-initializing the thermal sensors. I opened bug 5651 when the BIOS team
informed me of this issue.

Yes, bringing a processor offline and then online again w/o an intervening
suspend or reset would not evaluate _WAK, and thus may still run into the
issue.

I don't know if this is a widespread issue and a commonly used BIOS hook,
or if it is specific to certain processors.

-Len

> It would indeed be nice if we could just take CPU's down early (while
> everything is working), and run the whole suspend code with just one CPU,
> rather than having to worry about the ordering between CPU and device
> takedown.
>
> That said, at least with STR, the situation is:
>
>   1) suspend_console
>   2) device_suspend(PMSG_SUSPEND) (== ->suspend)
>   3) disable_nonboot_cpus()
>   4) device_power_down(PMSG_SUSPEND) (== ->suspend_late)
>   5) pm_ops->enter()
>   6) device_power_up() (== ->resume_early)
>   7) enable_nonboot_cpus()
>   8) pm_finish()
>   9) device_resume() (== ->resume)
>  10) resume_console
>
> So if we agree that things like timers etc should *never* be suspended by
> the early suspend, and *always* use "suspend_late/resume_early", then at
> least STR should be ok.
>
> And I think that's a damn reasonable thing to agree on: timers (and
> anything else that CPU shutdown/bringup could *possibly* care about)
> should be considered core enough that they had better be on the
> suspend_late/resume_early list.
>
> Thomas, Rafael, can you verify that at least STR is ok in this respect?
>
> 		Linus
Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING
On Friday, 21 September 2007 00:05, Thomas Gleixner wrote:
> Linus,
>
> On Thu, 2007-09-20 at 14:55 -0700, Linus Torvalds wrote:
> > And I think that's a damn reasonable thing to agree on: timers (and
> > anything else that CPU shutdown/bringup could *possibly* care about)
> > should be considered core enough that they had better be on the
> > suspend_late/resume_early list.
> >
> > Thomas, Rafael, can you verify that at least STR is ok in this respect?
>
> -ETOOTIRED led me to a wrong conclusion, but still it is a valuable
> hint that this change is making things work again.

Yes, it is.

> I need to go down into the details of the swsusp_suspend() code path to
> figure out what the root cause is.

If you need any help from me with that, please let me know.

Greetings,
Rafael
Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING
Thomas, On Thursday, 20 September 2007 23:53, Thomas Gleixner wrote: > Rafael, > > On Thu, 2007-09-20 at 23:45 +0200, Rafael J. Wysocki wrote: > > > We disable everything in device_suspend() > > > > No, we don't. sysdevs are _not_ suspended in device_suspend(). > > They are suspended in device_power_down(), which is called > > _after_ disable_nonboot_cpus() (from swsusp_suspend()). > > > > > including timekeeping, > > > > No, the timekeeping is suspended in device_power_down() (or at least it > > should > > be). > > Damn, you are right. Reading through 30 different logs confused me. > > > > enable_nonboot_cpus(); > > > > Actually, we can't do this here, because of ACPI and some interrupt handling > > related problems. Unfortunately, platform_finish() needs to go _after_ > > enable_nonboot_cpus() and device_resume() needs to go after > > platform_finish(). > > Analogously, disable_nonboot_cpus() has to go after platform_prepare(). > > > > Otherwise, some systems will break. > > Well, I don't buy this one. The system would break in the same way, when > I take CPU#1 offline before I initiate the suspend. I was referring to the resume part. If we call enable_nonboot_cpus(), which executes the _INI ACPI control method, after platform_finish(), which executes the _WAK global ACPI control method, things will break. That already happened in the past, when the code ordering was different, AFAICS. > > > and non-surprisingly the "my VAIO needs help from keyboard" problem went > > > away immediately. See patch below. (on top of rc7-hrt1, -mm1 does not > > > work at all on my VAIO due to some yet not identified wreckage) > > > > Hm, I really don't know why it helps, but that's not because of the > > timekeeping > > suspend, IMO. > > It is related. We rely on some subtle thing which is not up when we > resume the non boot cpu. Yes, it looks so. > > > I did not yet look into the suspend to ram code, but I guess that there > > > is an equivalent problem. 
> > > > Yes, the code ordering is the same, but it's not totally wrong, IMHO. > > > > > But I have no idea why this affects Andrews jinxed VAIO (UP machine), > > > though I suspect that we have more timekeeping/timer depending code > > > somewhere waiting to bite us. > > > > That's possible. > > > > > Also I still need to debug why the HIBERNATION_TEST code path (which has > > > a msleep(5000) in it) does not fail, > > > > See above. :-) > > Yes. It makes sense. When I change the TEST code path to: > > - printk("swsusp debug: Waiting for 5 seconds.\n"); > - msleep(5000); > + printk("swsusp debug: before swsusp_suspend\n"); > + error = swsusp_suspend(); > > then I have the same effect as I get from real hibernation. And we > actually shut down time keeping somewhere in that code path. > > ACPI: PCI interrupt for device :00:1b.0 disabled > swsusp debug: before swsusp_suspend > Suspend timekeeping Exactly. timekeeping_suspend() is called from device_power_down(), which is called from swsusp_suspend() (after disabling interrupts). > swsusp: critical section: > swsusp: Need to copy 112429 pages > swsusp: Normal pages needed: 35399 + 1024 + 40, available pages: 193876 > swsusp: critical section: done (112429 pages copied) > Intel machine check architecture supported. > Intel machine check reporting enabled on CPU#0. > Resume timekeeping > ACPI: PCI Interrupt :00:02.0[A] -> GSI 16 (level, low) -> IRQ 16 > -> works fine > > This is with my patch applied. Without that I get: > > CPU1 is down > swsusp debug: before swsusp_suspend > Suspend timekeeping > swsusp: critical section: > swsusp: Need to copy 112429 pages > swsusp: Normal pages needed: 35399 + 1024 + 40, available pages: 193876 > swsusp: critical section: done (112429 pages copied) > Intel machine check architecture supported. > Intel machine check reporting enabled on CPU#0. 
> Resume timekeeping > Enabling non-boot CPUs > --> Waits for ever until a key is pressed Well, perhaps there's something else that we should suspend late and resume early, but we don't? Greetings, Rafael - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING
Linus,

On Thu, 2007-09-20 at 14:55 -0700, Linus Torvalds wrote:
> And I think that's a damn reasonable thing to agree on: timers (and
> anything else that CPU shutdown/bringup could *possibly* care about)
> should be considered core enough that they had better be on the
> suspend_late/resume_early list.
>
> Thomas, Rafael, can you verify that at least STR is ok in this respect?

-ETOOTIRED led me to a wrong conclusion, but still it is a valuable hint
that this change is making things work again.

I need to go down into the details of the swsusp_suspend() code path to
figure out what the root cause is.

Sorry for the noise, but I'm zooming in.

	tglx
Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING
Rafael,

On Thu, 2007-09-20 at 23:54 +0200, Rafael J. Wysocki wrote:
> > Hmm. This is close to the ordering we have in STR too.
> >
> > I have some dim memory of there being some ACPI reason why it had to be
> > done that way.
>
> Yes. We're executing _INI from the CPU initialization code and that
> shouldn't be done after _WAK, which is called from platform_finish().

If I tear down CPU#1 right before I tell the kernel to hibernate, then the
box must explode in the same way. It does not. On none of 4 tested
laptops. Of course only the jinxed VAIO one exposes the "please press a
key" problem.

I need to follow down the swsusp_suspend() code path to figure out why
this breaks the box.

	tglx
Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING
On Thu, 20 Sep 2007, Linus Torvalds wrote:
>
> (Btw, the above commit message points to just my response with a testing
> patch to the real email: the actual explanation of the INSANE ordering is
> from Len Brown in
>
>   https://lists.linux-foundation.org/pipermail/linux-pm/2006-November/004161.html
>
> and there Len claims that we *must* wake up CPU's early).

..and points to commit 1a38416cea8ac801ae8f261074721f35317613dc which in
turn talks about http://bugzilla.kernel.org/show_bug.cgi?id=5651

However, it seems that bugzilla entry may just be bogus. It talks about
"it appears that some firmware in the future may depend on that sequence
for correction operation"

Len, Shaohua, what are the real issues here?

It would indeed be nice if we could just take CPU's down early (while
everything is working), and run the whole suspend code with just one CPU,
rather than having to worry about the ordering between CPU and device
takedown.

That said, at least with STR, the situation is:

  1) suspend_console
  2) device_suspend(PMSG_SUSPEND) (== ->suspend)
  3) disable_nonboot_cpus()
  4) device_power_down(PMSG_SUSPEND) (== ->suspend_late)
  5) pm_ops->enter()
  6) device_power_up() (== ->resume_early)
  7) enable_nonboot_cpus()
  8) pm_finish()
  9) device_resume() (== ->resume)
 10) resume_console

So if we agree that things like timers etc should *never* be suspended by
the early suspend, and *always* use "suspend_late/resume_early", then at
least STR should be ok.

And I think that's a damn reasonable thing to agree on: timers (and
anything else that CPU shutdown/bringup could *possibly* care about)
should be considered core enough that they had better be on the
suspend_late/resume_early list.

Thomas, Rafael, can you verify that at least STR is ok in this respect?

		Linus
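The ten-step STR sequence in the message above, together with the rule that timers belong on the suspend_late/resume_early list, can be written down as a small check. This is a user-space sketch for illustration only: the step names are copied from the list in the message, while `str_steps`, `step_index()` and `usable_across_cpu_hotplug()` are hypothetical helpers that do not exist in the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The STR sequence as listed in the message, in order. */
const char *const str_steps[] = {
    "suspend_console",
    "device_suspend",        /* ->suspend */
    "disable_nonboot_cpus",
    "device_power_down",     /* ->suspend_late */
    "pm_ops->enter",
    "device_power_up",       /* ->resume_early */
    "enable_nonboot_cpus",
    "pm_finish",
    "device_resume",         /* ->resume */
    "resume_console",
};
const size_t str_nsteps = sizeof(str_steps) / sizeof(str_steps[0]);

/* Return the position of a step in the sequence, or -1 if absent. */
int step_index(const char *name)
{
    for (size_t i = 0; i < str_nsteps; i++)
        if (strcmp(str_steps[i], name) == 0)
            return (int)i;
    return -1;
}

/*
 * Hypothetical check: something that goes down in `down_step` and comes
 * back in `up_step` stays usable during CPU takedown and bringup only if
 * it goes down after disable_nonboot_cpus and returns before
 * enable_nonboot_cpus.
 */
int usable_across_cpu_hotplug(const char *down_step, const char *up_step)
{
    int down = step_index(down_step), up = step_index(up_step);

    if (down < 0 || up < 0)
        return 0;
    return down > step_index("disable_nonboot_cpus") &&
           up < step_index("enable_nonboot_cpus");
}
```

With this model, a timer handled in `device_power_down`/`device_power_up` passes the check, while one handled in `device_suspend`/`device_resume` does not, which is the distinction the message asks to be verified for STR.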
Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING
Rafael, On Thu, 2007-09-20 at 23:45 +0200, Rafael J. Wysocki wrote: > > We disable everything in device_suspend() > > No, we don't. sysdevs are _not_ suspended in device_suspend(). > They are suspended in device_power_down(), which is called > _after_ disable_nonboot_cpus() (from swsusp_suspend()). > > > including timekeeping, > > No, the timekeeping is suspended in device_power_down() (or at least it should > be). Damn, you are right. Reading through 30 different logs confused me. > > enable_nonboot_cpus(); > > Actually, we can't do this here, because of ACPI and some interrupt handling > related problems. Unfortunately, platform_finish() needs to go _after_ > enable_nonboot_cpus() and device_resume() needs to go after platform_finish(). > Analogously, disable_nonboot_cpus() has to go after platform_prepare(). > > Otherwise, some systems will break. Well, I don't buy this one. The system would break in the same way, when I take CPU#1 offline before I initiate the suspend. > > and non-surprisingly the "my VAIO needs help from keyboard" problem went > > away immediately. See patch below. (on top of rc7-hrt1, -mm1 does not > > work at all on my VAIO due to some yet not identified wreckage) > > Hm, I really don't know why it helps, but that's not because of the > timekeeping > suspend, IMO. It is related. We rely on some subtle thing which is not up when we resume the non boot cpu. > > I did not yet look into the suspend to ram code, but I guess that there > > is an equivalent problem. > > Yes, the code ordering is the same, but it's not totally wrong, IMHO. > > > But I have no idea why this affects Andrews jinxed VAIO (UP machine), > > though I suspect that we have more timekeeping/timer depending code > > somewhere waiting to bite us. > > That's possible. > > > Also I still need to debug why the HIBERNATION_TEST code path (which has > > a msleep(5000) in it) does not fail, > > See above. :-) Yes. It makes sense. 
When I change the TEST code path to: - printk("swsusp debug: Waiting for 5 seconds.\n"); - msleep(5000); + printk("swsusp debug: before swsusp_suspend\n"); + error = swsusp_suspend(); then I have the same effect as I get from real hibernation. And we actually shut down time keeping somewhere in that code path. ACPI: PCI interrupt for device :00:1b.0 disabled swsusp debug: before swsusp_suspend Suspend timekeeping swsusp: critical section: swsusp: Need to copy 112429 pages swsusp: Normal pages needed: 35399 + 1024 + 40, available pages: 193876 swsusp: critical section: done (112429 pages copied) Intel machine check architecture supported. Intel machine check reporting enabled on CPU#0. Resume timekeeping ACPI: PCI Interrupt :00:02.0[A] -> GSI 16 (level, low) -> IRQ 16 -> works fine This is with my patch applied. Without that I get: CPU1 is down swsusp debug: before swsusp_suspend Suspend timekeeping swsusp: critical section: swsusp: Need to copy 112429 pages swsusp: Normal pages needed: 35399 + 1024 + 40, available pages: 193876 swsusp: critical section: done (112429 pages copied) Intel machine check architecture supported. Intel machine check reporting enabled on CPU#0. Resume timekeeping Enabling non-boot CPUs --> Waits for ever until a key is pressed Thanks, tglx - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING
On Thursday, 20 September 2007 23:35, Linus Torvalds wrote: > > On Thu, 20 Sep 2007, Thomas Gleixner wrote: > > > > In meantime I figured out what's happening. The ordering in > > hibernate_snapshot() is wrong. It does: Actually, this is incorrect. Please read my reply to Thomas, just sent. > Hmm. This is close to the ordering we have in STR too. > > I have some dim memory of there being some ACPI reason why it had to be > done that way. Yes. We're executing _INI from the CPU initialization code and that shouldn't be done after _WAK, which is called from platform_finish(). > In fact, this was done in commit e3c7db621bed4afb8e231cb005057f2feb5db557, > long ago, by Rafael: > > As indicated in a recent thread on Linux-PM, it's necessary to call > pm_ops->finish() before devce_resume(), but enable_nonboot_cpus() has to > be > called before pm_ops->finish() (cf. > http://lists.osdl.org/pipermail/linux-pm/2006-November/004164.html). For > consistency, it seems reasonable to call disable_nonboot_cpus() after > device_suspend(). > > This way the suspend code will remain symmetrical with respect to the > resume > code and it may allow us to speed up things in the future by suspending > and > resuming devices and/or saving the suspend image in many threads. > > The following series of patches reorders the suspend and resume code so > that > nonboot CPUs are disabled after devices have been suspended and enabled > before > the devices are resumed. It also causes pm_ops->finish() to be called > after > enable_nonboot_cpus() wherever necessary. > > Hmm? > > It's entirely possible that that commit was simply just buggy, and we > should indeed move the CPU down/up to be early/late - we've fixed other > ordering issues since that commit went in. But this whole area is very > murky. 
> > (Btw, the above commit message points to just my response with a testing > patch to the real email: the actual explanation of the INSANE ordering is > from Len Brown in > > > https://lists.linux-foundation.org/pipermail/linux-pm/2006-November/004161.html > > and there Len claims that we *must* wake up CPU's early). > > I personally think that the whole ACPI ordering requirements are just > insane, but the point of this email is to point these different > requirements out, and hopefully we can get something that works for > everybody. Sure. Greetings, Rafael - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING
On Thu, 20 Sep 2007, Thomas Gleixner wrote: > > In meantime I figured out what's happening. The ordering in > hibernate_snapshot() is wrong. It does: Hmm. This is close to the ordering we have in STR too. I have some dim memory of there being some ACPI reason why it had to be done that way. In fact, this was done in commit e3c7db621bed4afb8e231cb005057f2feb5db557, long ago, by Rafael: As indicated in a recent thread on Linux-PM, it's necessary to call pm_ops->finish() before devce_resume(), but enable_nonboot_cpus() has to be called before pm_ops->finish() (cf. http://lists.osdl.org/pipermail/linux-pm/2006-November/004164.html). For consistency, it seems reasonable to call disable_nonboot_cpus() after device_suspend(). This way the suspend code will remain symmetrical with respect to the resume code and it may allow us to speed up things in the future by suspending and resuming devices and/or saving the suspend image in many threads. The following series of patches reorders the suspend and resume code so that nonboot CPUs are disabled after devices have been suspended and enabled before the devices are resumed. It also causes pm_ops->finish() to be called after enable_nonboot_cpus() wherever necessary. Hmm? It's entirely possible that that commit was simply just buggy, and we should indeed move the CPU down/up to be early/late - we've fixed other ordering issues since that commit went in. But this whole area is very murky. (Btw, the above commit message points to just my response with a testing patch to the real email: the actual explanation of the INSANE ordering is from Len Brown in https://lists.linux-foundation.org/pipermail/linux-pm/2006-November/004161.html and there Len claims that we *must* wake up CPU's early). I personally think that the whole ACPI ordering requirements are just insane, but the point of this email is to point these different requirements out, and hopefully we can get something that works for everybody. Len added to Cc. Len? 
Thomas wants to call 'disable_nonboot_cpus()' early, and 'enable_nonboot_cpus()' late. Can you explain why that is wrong? Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING
Thomas,

On Thursday, 20 September 2007 23:08, Thomas Gleixner wrote:
> Rafael,
>
> On Thu, 2007-09-20 at 22:39 +0200, Rafael J. Wysocki wrote:
> > > Works as well. What's the difference between this and the real thing ?
> >
> > The real thing also calls device_power_down(PMSG_FREEZE), which is a
> > counterpart of sysdev_shutdown(), more or less, and I think that's what goes
> > belly up.
> >
> > You can use the patch below (on top of -rc6-mm1), which just disables the
> > image creation (that should be irrelevant anyway) and see what happens.
>
> In meantime I figured out what's happening. The ordering in
> hibernate_snapshot() is wrong. It does:
>
> 	swsusp_shrink_memory();
> 	suspend_console();
> 	device_suspend(PMSG_FREEZE);
> 	platform_prepare(platform_mode);
>
> 	disable_nonboot_cpus();
>
> 	swsusp_suspend();
>
> 	enable_nonboot_cpus();
>
> 	platform_finish(platform_mode);
> 	device_resume();
> 	resume_console();
>
> We disable everything in device_suspend()

No, we don't. sysdevs are _not_ suspended in device_suspend(). They are
suspended in device_power_down(), which is called _after_
disable_nonboot_cpus() (from swsusp_suspend()).

> including timekeeping,

No, the timekeeping is suspended in device_power_down() (or at least it should
be).

> so any code which is depending on working timekeeping and timer
> functionality (which is suspended in timekeeping_suspend() as well) is
> busted.
>
> enable_nonboot_cpus() definitely relies on working timekeeping and
> timers depending on the codepath. It's just a surprise that this did not
> blow up earlier (also before clock events).
>
> I changed the ordering of the above to:
>
> 	disable_nonboot_cpus();
>
> 	swsusp_shrink_memory();
> 	suspend_console();
> 	device_suspend(PMSG_FREEZE);
> 	platform_prepare(platform_mode);
> 	swsusp_suspend();
> 	platform_finish(platform_mode);
> 	device_resume();
> 	resume_console();
>
> 	enable_nonboot_cpus();

Actually, we can't do this here, because of ACPI and some interrupt handling
related problems. Unfortunately, platform_finish() needs to go _after_
enable_nonboot_cpus() and device_resume() needs to go after platform_finish().
Analogously, disable_nonboot_cpus() has to go after platform_prepare().
Otherwise, some systems will break.

> and non-surprisingly the "my VAIO needs help from keyboard" problem went
> away immediately. See patch below. (on top of rc7-hrt1, -mm1 does not
> work at all on my VAIO due to some yet not identified wreckage)

Hm, I really don't know why it helps, but that's not because of the timekeeping
suspend, IMO.

> I did not yet look into the suspend to ram code, but I guess that there
> is an equivalent problem.

Yes, the code ordering is the same, but it's not totally wrong, IMHO.

> But I have no idea why this affects Andrews jinxed VAIO (UP machine),
> though I suspect that we have more timekeeping/timer depending code
> somewhere waiting to bite us.

That's possible.

> Also I still need to debug why the HIBERNATION_TEST code path (which has
> a msleep(5000) in it) does not fail,

See above. :-)

> but I postpone this until tomorrow morning. I'm dead tired after hunting
> this Heisenbug which changes with every other printk added to the code.
> I'm going to add some really noisy messages for everything which accesses
> timekeeping / timers _after_ those systems have been shut down.
>
> We really need to fix this once and forever _before_ 2.6.23 final, even
> if it requires a -rc8.

Agreed.

Greetings,
Rafael
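As an illustration of the ordering constraints stated in the reply above, here is a small user-space sketch (not kernel code, and not part of the original message). It encodes the three "A must come before B" rules as data and counts how many of them a given call sequence breaks. The names `order_rule`, `rules`, `count_violations`, `current_order` and `proposed_order` are made up for this example; only the function names inside the strings come from the thread.

```c
/* User-space illustration only; this is not kernel code. */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One ordering rule: `first` has to be called before `second`. */
struct order_rule {
	const char *first;
	const char *second;
};

/* The three constraints named in the reply (hypothetical encoding). */
const struct order_rule rules[] = {
	{ "platform_prepare",    "disable_nonboot_cpus" },
	{ "enable_nonboot_cpus", "platform_finish"      },
	{ "platform_finish",     "device_resume"        },
};

/* Position of `name` in the sequence, or -1 if it is absent. */
static int index_of(const char *const *seq, size_t n, const char *name)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(seq[i], name) == 0)
			return (int)i;
	return -1;
}

/* Number of violated rules; 0 means the sequence satisfies all of them. */
int count_violations(const char *const *seq, size_t n)
{
	int bad = 0;

	assert(seq != NULL);
	for (size_t i = 0; i < sizeof(rules) / sizeof(rules[0]); i++) {
		int a = index_of(seq, n, rules[i].first);
		int b = index_of(seq, n, rules[i].second);

		/* A missing call or a reversed pair both count as a violation. */
		if (a < 0 || b < 0 || a > b)
			bad++;
	}
	return bad;
}

/* Call order as listed in the quoted description of the existing code. */
const char *const current_order[] = {
	"swsusp_shrink_memory", "suspend_console", "device_suspend",
	"platform_prepare", "disable_nonboot_cpus", "swsusp_suspend",
	"enable_nonboot_cpus", "platform_finish", "device_resume",
	"resume_console",
};

/* Call order after the proposed reordering. */
const char *const proposed_order[] = {
	"disable_nonboot_cpus", "swsusp_shrink_memory", "suspend_console",
	"device_suspend", "platform_prepare", "swsusp_suspend",
	"platform_finish", "device_resume", "resume_console",
	"enable_nonboot_cpus",
};
```

Calling `count_violations()` on each array from a small `main()` shows which sequence respects the stated rules. The sketch only checks relative order; it says nothing about why the constraints exist or whether the list is complete.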
Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING
Rafael,

On Thu, 2007-09-20 at 22:39 +0200, Rafael J. Wysocki wrote:
> > Works as well. What's the difference between this and the real thing ?
>
> The real thing also calls device_power_down(PMSG_FREEZE), which is a
> counterpart of sysdev_shutdown(), more or less, and I think that's what goes
> belly up.
>
> You can use the patch below (on top of -rc6-mm1), which just disables the
> image creation (that should be irrelevant anyway) and see what happens.

In meantime I figured out what's happening. The ordering in
hibernate_snapshot() is wrong. It does:

	swsusp_shrink_memory();
	suspend_console();
	device_suspend(PMSG_FREEZE);
	platform_prepare(platform_mode);

	disable_nonboot_cpus();

	swsusp_suspend();

	enable_nonboot_cpus();

	platform_finish(platform_mode);
	device_resume();
	resume_console();

We disable everything in device_suspend() including timekeeping, so any
code which is depending on working timekeeping and timer functionality
(which is suspended in timekeeping_suspend() as well) is busted.

enable_nonboot_cpus() definitely relies on working timekeeping and
timers depending on the codepath. It's just a surprise that this did not
blow up earlier (also before clock events).

I changed the ordering of the above to:

	disable_nonboot_cpus();

	swsusp_shrink_memory();
	suspend_console();
	device_suspend(PMSG_FREEZE);
	platform_prepare(platform_mode);
	swsusp_suspend();
	platform_finish(platform_mode);
	device_resume();
	resume_console();

	enable_nonboot_cpus();

and non-surprisingly the "my VAIO needs help from keyboard" problem went
away immediately. See patch below. (on top of rc7-hrt1, -mm1 does not
work at all on my VAIO due to some yet not identified wreckage)

I did not yet look into the suspend to ram code, but I guess that there
is an equivalent problem.

But I have no idea why this affects Andrews jinxed VAIO (UP machine),
though I suspect that we have more timekeeping/timer depending code
somewhere waiting to bite us.

Also I still need to debug why the HIBERNATION_TEST code path (which has
a msleep(5000) in it) does not fail, but I postpone this until tomorrow
morning. I'm dead tired after hunting this Heisenbug which changes with
every other printk added to the code. I'm going to add some really noisy
messages for everything which accesses timekeeping / timers _after_
those systems have been shut down.

We really need to fix this once and forever _before_ 2.6.23 final, even
if it requires a -rc8.

Thanks,

	tglx

--- a/kernel/power/disk.c	2007-09-11 09:25:24.0 +0200
+++ b/kernel/power/disk.c	2007-09-20 22:47:30.0 +0200
@@ -130,10 +130,14 @@ int hibernation_snapshot(int platform_mo
 {
 	int error;
 
+	error = disable_nonboot_cpus();
+	if (error)
+		goto resume_cpus;
+
 	/* Free memory before shutting down devices. */
 	error = swsusp_shrink_memory();
 	if (error)
-		return error;
+		goto resume_cpus;
 
 	suspend_console();
 	error = device_suspend(PMSG_FREEZE);
@@ -144,23 +148,22 @@ int hibernation_snapshot(int platform_mo
 	if (error)
 		goto Resume_devices;
 
-	error = disable_nonboot_cpus();
-	if (!error) {
-		if (hibernation_mode != HIBERNATION_TEST) {
-			in_suspend = 1;
-			error = swsusp_suspend();
-			/* Control returns here after successful restore */
-		} else {
-			printk("swsusp debug: Waiting for 5 seconds.\n");
-			mdelay(5000);
-		}
+	if (hibernation_mode != HIBERNATION_TEST) {
+		in_suspend = 1;
+		error = swsusp_suspend();
+		/* Control returns here after successful restore */
+	} else {
+		printk("swsusp debug: Waiting for 5 seconds.\n");
+		mdelay(5000);
 	}
-	enable_nonboot_cpus();
+
  Resume_devices:
 	platform_finish(platform_mode);
 	device_resume();
  Resume_console:
 	resume_console();
+resume_cpus:
+	enable_nonboot_cpus();
 	return error;
 }
Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING
On Thursday, 20 September 2007 16:12, Rafael J. Wysocki wrote:
> On Thursday, 20 September 2007 15:43, Thomas Gleixner wrote:
> > On Thu, 2007-09-20 at 15:29 +0200, Rafael J. Wysocki wrote:
> > > > > I haven't had the time to check if any special command line
> > > > > arguments help. Will check tomorrow.
> > > >
> > > > Can you please disable the patches, which I sent Linus wards:
> > > >
> > > > timekeeping-access-rtc-outside-xtime-lock.patch
> > > > xtime-supsend-resume-fixup.patch
> > > > acpi-reevaluate-c-p-t-states.patch
> > > > clockevents-enforce-broadcast-on-resume.patch
> > > > clockevents-do-not-shutdown-broadcast-device-in-oneshot-mode.patch
> > > > clockevents-prevent-stale-tick-update-on-offline-cpu.patch
> > >
> > > I have skipped all of them, but the resulting kernel behaves in the same
> > > way (ie. doesn't boot).
> > >
> > > > Without those patches you get the state of rc4-mm1. It would be
> > > > interesting to know which one interferes with the acpi stuff.
> > >
> > > It looks like something else went in between -rc4 and -rc6 that broke your
> > > patch. I wonder what it might be ...
> >
> > Hmm. Can you please go back in the -hrt project history:
> > http://tglx.de/projects/hrtimers/2.6.23-rc5/patch-2.6.23-rc5-hrt1.patches.tar.bz2
> > http://tglx.de/projects/hrtimers/2.6.23-rc4/patch-2.6.23-rc4-hrt1.patches.tar.bz2

Each of them on top of 2.6.23-rc6 gives the same symptoms as rc6-hrt2 (ie. the
box doesn't boot).

I'm going to check if -rc5 with patch-2.6.23-rc4-hrt1 on top of it works and if
not (I suspect so), I'll bisect the Linus' tree between -rc4 and -rc5 in order
to identify the responsible patch.

Greetings,
Rafael
Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING
On Thursday, 20 September 2007 17:49, Thomas Gleixner wrote:
> On Thu, 2007-09-20 at 16:50 +0200, Thomas Gleixner wrote:
> > > > > Well, the above may affect SMP systems, but the Vaio is UP. Hmm?
> > > >
> > > > My jinxed VAIO variant is SMP, but it looks like the same mysterious
> > > > error.
> > >
> > > Hm. Have you tried
> > >
> > > # echo test > /sys/power/disk
> > > # echo disk > /sys/power/state
> > >
> > > (should suspend devices and disable the nonboot CPUs, wait for 5 sec. and
> > > restore everything)?
> >
> > Works fine, but I need to reboot into a non debug kernel to verify.
>
> Works as well. What's the difference between this and the real thing ?

The real thing also calls device_power_down(PMSG_FREEZE), which is a
counterpart of sysdev_shutdown(), more or less, and I think that's what goes
belly up.

You can use the patch below (on top of -rc6-mm1), which just disables the image
creation (that should be irrelevant anyway) and see what happens.

Greetings,
Rafael

---
 kernel/power/disk.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

Index: linux-2.6.23-rc6-mm1/kernel/power/disk.c
===
--- linux-2.6.23-rc6-mm1.orig/kernel/power/disk.c
+++ linux-2.6.23-rc6-mm1/kernel/power/disk.c
@@ -168,13 +168,14 @@ int create_image(int platform_mode)
 	}
 
 	save_processor_state();
-	error = swsusp_arch_suspend();
-	if (error)
-		printk(KERN_ERR "Error %d while creating the image\n", error);
+	//error = swsusp_arch_suspend();
+	//if (error)
+	//	printk(KERN_ERR "Error %d while creating the image\n", error);
 	/* Restore control flow magically appears here */
 	restore_processor_state();
-	if (!in_suspend)
-		platform_leave(platform_mode);
+	//if (!in_suspend)
+	//	platform_leave(platform_mode);
+	in_suspend = 0;
 	/* NOTE: device_power_up() is just a resume() for devices
 	 * that suspended with irqs off ... no overall powerup.
 	 */
Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING
On Thu, 2007-09-20 at 16:50 +0200, Thomas Gleixner wrote:
> > > > Well, the above may affect SMP systems, but the Vaio is UP. Hmm?
> > >
> > > My jinxed VAIO variant is SMP, but it looks like the same mysterious
> > > error.
> >
> > Hm. Have you tried
> >
> > # echo test > /sys/power/disk
> > # echo disk > /sys/power/state
> >
> > (should suspend devices and disable the nonboot CPUs, wait for 5 sec. and
> > restore everything)?
>
> Works fine, but I need to reboot into a non debug kernel to verify.

Works as well. What's the difference between this and the real thing ?

	tglx
Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING
On Thu, 2007-09-20 at 16:47 +0200, Rafael J. Wysocki wrote:
> On Thursday, 20 September 2007 15:53, Thomas Gleixner wrote:
> > On Thu, 2007-09-20 at 16:12 +0200, Rafael J. Wysocki wrote:
> > > > Vs. the suspend / resume wreckage of rc6-mm1 / rc6-hrt2:
> > >
> > > ie. the one on the Vaio (I assume).
> > >
> > > > I'm still fishing in rather dark water. Depending on the added
> > > > instrumentation points the problem mutates up to the point where it
> > > > vanishes completely. The hang, which requires key strokes again, happens
> > > > consistently at the same place:
> > > >
> > > > The notifier call in kernel/cpu.c::_cpu_up()
> > > >
> > > > 	ret = __raw_notifier_call_chain(&cpu_chain, CPU_UP_PREPARE | mod, hcpu,
> > > > 					-1, &nr_calls);
> > > >
> > > > does not return, but _all_ registered notifiers are called and reach
> > > > their return statement. This reminds me on:
> > > >
> > > > http://lkml.org/lkml/2007/5/9/46
> > > >
> > > > Sigh. I have no clue where to dig further.
> > >
> > > Well, the above may affect SMP systems, but the Vaio is UP. Hmm?
> >
> > My jinxed VAIO variant is SMP, but it looks like the same mysterious
> > error.
>
> Hm. Have you tried
>
> # echo test > /sys/power/disk
> # echo disk > /sys/power/state
>
> (should suspend devices and disable the nonboot CPUs, wait for 5 sec. and
> restore everything)?

Works fine, but I need to reboot into a non debug kernel to verify.

	tglx
Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING
On Thursday, 20 September 2007 15:53, Thomas Gleixner wrote:
> On Thu, 2007-09-20 at 16:12 +0200, Rafael J. Wysocki wrote:
> > > Vs. the suspend / resume wreckage of rc6-mm1 / rc6-hrt2:
> >
> > ie. the one on the Vaio (I assume).
> >
> > > I'm still fishing in rather dark water. Depending on the added
> > > instrumentation points the problem mutates up to the point where it
> > > vanishes completely. The hang, which requires key strokes again, happens
> > > consistently at the same place:
> > >
> > > The notifier call in kernel/cpu.c::_cpu_up()
> > >
> > > 	ret = __raw_notifier_call_chain(&cpu_chain, CPU_UP_PREPARE | mod, hcpu,
> > > 					-1, &nr_calls);
> > >
> > > does not return, but _all_ registered notifiers are called and reach
> > > their return statement. This reminds me on:
> > >
> > > http://lkml.org/lkml/2007/5/9/46
> > >
> > > Sigh. I have no clue where to dig further.
> >
> > Well, the above may affect SMP systems, but the Vaio is UP. Hmm?
>
> My jinxed VAIO variant is SMP, but it looks like the same mysterious
> error.

Hm. Have you tried

# echo test > /sys/power/disk
# echo disk > /sys/power/state

(should suspend devices and disable the nonboot CPUs, wait for 5 sec. and
restore everything)?

Greetings,
Rafael
Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING
On Thu, 2007-09-20 at 16:12 +0200, Rafael J. Wysocki wrote:
> > Vs. the suspend / resume wreckage of rc6-mm1 / rc6-hrt2:
>
> ie. the one on the Vaio (I assume).
>
> > I'm still fishing in rather dark water. Depending on the added
> > instrumentation points the problem mutates up to the point where it
> > vanishes completely. The hang, which requires key strokes again, happens
> > consistently at the same place:
> >
> > The notifier call in kernel/cpu.c::_cpu_up()
> >
> > 	ret = __raw_notifier_call_chain(&cpu_chain, CPU_UP_PREPARE | mod, hcpu,
> > 					-1, &nr_calls);
> >
> > does not return, but _all_ registered notifiers are called and reach
> > their return statement. This reminds me on:
> >
> > http://lkml.org/lkml/2007/5/9/46
> >
> > Sigh. I have no clue where to dig further.
>
> Well, the above may affect SMP systems, but the Vaio is UP. Hmm?

My jinxed VAIO variant is SMP, but it looks like the same mysterious
error.

	tglx
Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING
On Thursday, 20 September 2007 15:43, Thomas Gleixner wrote:
> On Thu, 2007-09-20 at 15:29 +0200, Rafael J. Wysocki wrote:
> > > > I haven't had the time to check if any special command line
> > > > arguments help. Will check tomorrow.
> > >
> > > Can you please disable the patches, which I sent Linus wards:
> > >
> > > timekeeping-access-rtc-outside-xtime-lock.patch
> > > xtime-supsend-resume-fixup.patch
> > > acpi-reevaluate-c-p-t-states.patch
> > > clockevents-enforce-broadcast-on-resume.patch
> > > clockevents-do-not-shutdown-broadcast-device-in-oneshot-mode.patch
> > > clockevents-prevent-stale-tick-update-on-offline-cpu.patch
> >
> > I have skipped all of them, but the resulting kernel behaves in the same
> > way (ie. doesn't boot).
> >
> > > Without those patches you get the state of rc4-mm1. It would be
> > > interesting to know which one interferes with the acpi stuff.
> >
> > It looks like something else went in between -rc4 and -rc6 that broke your
> > patch. I wonder what it might be ...
>
> Hmm. Can you please go back in the -hrt project history:
> http://tglx.de/projects/hrtimers/2.6.23-rc5/patch-2.6.23-rc5-hrt1.patches.tar.bz2
> http://tglx.de/projects/hrtimers/2.6.23-rc4/patch-2.6.23-rc4-hrt1.patches.tar.bz2

Sure, but it'll take some time. :-)

> Also, can you send me your .config file please ?

Attached is the one I'm using on 2.6.23-rc6 w/ your patches.

> Vs. the suspend / resume wreckage of rc6-mm1 / rc6-hrt2:

ie. the one on the Vaio (I assume).

> I'm still fishing in rather dark water. Depending on the added
> instrumentation points the problem mutates up to the point where it
> vanishes completely. The hang, which requires key strokes again, happens
> consistently at the same place:
>
> The notifier call in kernel/cpu.c::_cpu_up()
>
> 	ret = __raw_notifier_call_chain(&cpu_chain, CPU_UP_PREPARE | mod, hcpu,
> 					-1, &nr_calls);
>
> does not return, but _all_ registered notifiers are called and reach
> their return statement. This reminds me on:
>
> http://lkml.org/lkml/2007/5/9/46
>
> Sigh. I have no clue where to dig further.

Well, the above may affect SMP systems, but the Vaio is UP. Hmm?

Greetings,
Rafael

#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.23-rc6-hrt
# Thu Sep 20 14:26:03 2007
#
CONFIG_X86_64=y
CONFIG_64BIT=y
CONFIG_X86=y
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_NONIRQ_WAKEUP=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_ZONE_DMA32=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_SEMAPHORE_SLEEPERS=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_QUICKLIST=y
CONFIG_NR_QUICK=2
CONFIG_RWSEM_GENERIC_SPINLOCK=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_X86_CMPXCHG=y
CONFIG_EARLY_PRINTK=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_ARCH_POPULATES_NODE_MAP=y
CONFIG_DMI=y
CONFIG_AUDIT_ARCH=y
CONFIG_GENERIC_BUG=y
# CONFIG_ARCH_HAS_ILOG2_U32 is not set
# CONFIG_ARCH_HAS_ILOG2_U64 is not set
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"

#
# General setup
#
CONFIG_EXPERIMENTAL=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_LOCALVERSION=""
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_BSD_PROCESS_ACCT_V3=y
CONFIG_TASKSTATS=y
CONFIG_TASK_DELAY_ACCT=y
# CONFIG_TASK_XACCT is not set
# CONFIG_USER_NS is not set
CONFIG_AUDIT=y
CONFIG_AUDITSYSCALL=y
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=18
CONFIG_CPUSETS=y
CONFIG_SYSFS_DEPRECATED=y
# CONFIG_RELAY is not set
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_SYSCTL=y
# CONFIG_EMBEDDED is not set
CONFIG_UID16=y
CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
# CONFIG_KALLSYMS_EXTRA_PASS is not set
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_ANON_INODES=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_TIMERFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_SLAB=y
# CONFIG_SLUB is not set
# CONFIG_SLOB is not set
CONFIG_RT_MUTEXES=y
# CONFIG_TINY_SHMEM is not set
CONFIG_BASE_SMALL=0
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
CONFIG_MODULE_FORCE_UNLOAD=y
CONFIG_MODVERSIONS=y
CONFIG_MODULE_SRCVERSION_ALL=y
CONFIG_KMOD=y
CONFIG_STOP_MACHINE=y
CONFIG_BLOCK=y
# CONFIG_BLK_DEV_IO_TRACE is not set
# CONFIG_BLK_DEV_BSG is not set

#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_AS=y
CONFIG_IOSCHED_DEADLINE=y
CONFIG_IOSCHED_CFQ=y
# CONFIG_DEFAULT_AS is not set
# CONFIG_DEFAULT_DEADLINE is not set
CONFIG_DEFAULT_CFQ=y
# CONFIG_DEFAULT_NOOP is not set
CONFIG_DEFAULT_IOSCHED="cfq"

#
# Processor type and features
#
# CONFIG_TICK_ONESHOT is not set
# CONFIG_NO_HZ is not set
# CONFIG_HIGH_RES_TIMERS is not set
Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING
On Thu, 2007-09-20 at 15:29 +0200, Rafael J. Wysocki wrote:
> > > I haven't had the time to check if any special command line
> > > arguments help. Will check tomorrow.
> >
> > Can you please disable the patches, which I sent Linus wards:
> >
> > timekeeping-access-rtc-outside-xtime-lock.patch
> > xtime-supsend-resume-fixup.patch
> > acpi-reevaluate-c-p-t-states.patch
> > clockevents-enforce-broadcast-on-resume.patch
> > clockevents-do-not-shutdown-broadcast-device-in-oneshot-mode.patch
> > clockevents-prevent-stale-tick-update-on-offline-cpu.patch
>
> I have skipped all of them, but the resulting kernel behaves in the same
> way (ie. doesn't boot).
>
> > Without those patches you get the state of rc4-mm1. It would be
> > interesting to know which one interferes with the acpi stuff.
>
> It looks like something else went in between -rc4 and -rc6 that broke your
> patch. I wonder what it might be ...

Hmm. Can you please go back in the -hrt project history:

http://tglx.de/projects/hrtimers/2.6.23-rc5/patch-2.6.23-rc5-hrt1.patches.tar.bz2
http://tglx.de/projects/hrtimers/2.6.23-rc4/patch-2.6.23-rc4-hrt1.patches.tar.bz2

Also, can you send me your .config file please ?

Vs. the suspend / resume wreckage of rc6-mm1 / rc6-hrt2:

I'm still fishing in rather dark water. Depending on the added
instrumentation points the problem mutates up to the point where it
vanishes completely. The hang, which requires key strokes again, happens
consistently at the same place:

The notifier call in kernel/cpu.c::_cpu_up()

	ret = __raw_notifier_call_chain(&cpu_chain, CPU_UP_PREPARE | mod, hcpu,
					-1, &nr_calls);

does not return, but _all_ registered notifiers are called and reach
their return statement. This reminds me on:

http://lkml.org/lkml/2007/5/9/46

Sigh. I have no clue where to dig further.

	tglx
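For readers unfamiliar with the call quoted above, the following user-space sketch (not kernel code, and not part of the original message) shows what the last two arguments mean as far as I understand the helper: a limit on how many notifiers to invoke, where -1 effectively means "all of them", and an optional counter that records how many were actually called. The names `struct notifier`, `call_chain`, `ok_callback` and `stop_callback` are invented for this example, and the `NOTIFY_*` values are assumed to mirror the kernel's definitions.

```c
/* User-space illustration only; names resemble the kernel's but this is not kernel code. */
#include <assert.h>
#include <stddef.h>

#define NOTIFY_DONE      0x0000	/* callback is not interested in the event */
#define NOTIFY_OK        0x0001	/* callback handled the event */
#define NOTIFY_STOP_MASK 0x8000	/* set in a return value to stop the walk */

/* One entry of a singly linked notifier chain (hypothetical layout). */
struct notifier {
	int (*call)(unsigned long event, void *data);
	struct notifier *next;
};

/*
 * Walk the chain and invoke each callback in turn.
 * nr_to_call: maximum number of callbacks to run; -1 never reaches zero
 *             in practice, so every entry is visited.
 * nr_calls:   if non-NULL, incremented once per callback that was run.
 * Returns the value returned by the last callback that ran.
 */
int call_chain(struct notifier *head, unsigned long event, void *data,
	       int nr_to_call, int *nr_calls)
{
	int ret = NOTIFY_DONE;

	for (struct notifier *nb = head; nb && nr_to_call; nb = nb->next) {
		assert(nb->call != NULL);
		ret = nb->call(event, data);
		if (nr_calls)
			(*nr_calls)++;
		if (ret & NOTIFY_STOP_MASK)
			break;
		nr_to_call--;
	}
	return ret;
}

/* Sample callback that accepts the event. */
int ok_callback(unsigned long event, void *data)
{
	(void)event;
	(void)data;
	return NOTIFY_OK;
}

/* Sample callback that asks the walk to stop after it. */
int stop_callback(unsigned long event, void *data)
{
	(void)event;
	(void)data;
	return NOTIFY_STOP_MASK | NOTIFY_OK;
}
```

In this model the function returns as soon as the loop ends, so a chain in which every callback returns normally cannot hang inside `call_chain()` itself. That is what makes the behaviour described in the message puzzling.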
Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING
On Thursday, 20 September 2007 08:18, Thomas Gleixner wrote:
> On Thu, 2007-09-20 at 02:06 +0200, Rafael J. Wysocki wrote:
> > On Wednesday, 19 September 2007 21:21, Thomas Gleixner wrote:
> > > On Wed, 2007-09-19 at 19:44 +0200, Rafael J. Wysocki wrote:
> > > > > > It boots with nohpet alone and suspend/hibernation seem to work
> > > > > > (still, it didn't want to boot right after hibernation, but booted
> > > > > > after I'd switched it off/on manually).
> > > > >
> > > > > Can you please check, whether
> > > > >
> > > > > http://tglx.de/projects/hrtimers/2.6.23-rc6/patch-2.6.23-rc6-hrt2.patch
> > > > >
> > > > > works for you ?
> > > >
> > > > Nope. It's a total disaster. :-(
> > >
> > > True. I have instrumented it to the point where the broadcast device is
> > > programmed, but no interrupt comes in for totally unknown reasons.
> > >
> > > > Doesn't boot at all, even with "noacpitimer nohpet", and that's with
> > > > NO_HZ and HIGH_RES_TIMERS unset.
> > >
> > > > If you have a bisectable patch series, I can try to identify the
> > > > responsible patch.
> > >
> > > http://tglx.de/projects/hrtimers/2.6.23-rc6/patch-2.6.23-rc6-hrt2.patches.tar.bz2
> > >
> > > The first patches in the queue are the mainline fixups.
> >
> > It's x86_64-convert-to-clockevents.patch (ie. after applying it the box
> > stops to boot).
> >
> > I haven't had the time to check if any special command line arguments help.
> > Will check tomorrow.
>
> Can you please disable the patches, which I sent Linus wards:
>
> timekeeping-access-rtc-outside-xtime-lock.patch
> xtime-supsend-resume-fixup.patch
> acpi-reevaluate-c-p-t-states.patch
> clockevents-enforce-broadcast-on-resume.patch
> clockevents-do-not-shutdown-broadcast-device-in-oneshot-mode.patch
> clockevents-prevent-stale-tick-update-on-offline-cpu.patch

I have skipped all of them, but the resulting kernel behaves in the same
way (ie. doesn't boot).

> Without those patches you get the state of rc4-mm1. It would be
> interesting to know which one interferes with the acpi stuff.

It looks like something else went in between -rc4 and -rc6 that broke your
patch. I wonder what it might be ...

Greetings,
Rafael
Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING
On Thu, 2007-09-20 at 02:06 +0200, Rafael J. Wysocki wrote:
> On Wednesday, 19 September 2007 21:21, Thomas Gleixner wrote:
> > On Wed, 2007-09-19 at 19:44 +0200, Rafael J. Wysocki wrote:
> > > > > It boots with nohpet alone and suspend/hibernation seem to work
> > > > > (still, it didn't want to boot right after hibernation, but booted
> > > > > after I'd switched it off/on manually).
> > > >
> > > > Can you please check, whether
> > > >
> > > > http://tglx.de/projects/hrtimers/2.6.23-rc6/patch-2.6.23-rc6-hrt2.patch
> > > >
> > > > works for you ?
> > >
> > > Nope. It's a total disaster. :-(
> >
> > True. I have instrumented it to the point where the broadcast device is
> > programmed, but no interrupt comes in for totally unknown reasons.
> >
> > > Doesn't boot at all, even with "noacpitimer nohpet", and that's with
> > > NO_HZ and HIGH_RES_TIMERS unset.
> >
> > > If you have a bisectable patch series, I can try to identify the
> > > responsible patch.
> >
> > http://tglx.de/projects/hrtimers/2.6.23-rc6/patch-2.6.23-rc6-hrt2.patches.tar.bz2
> >
> > The first patches in the queue are the mainline fixups.
>
> It's x86_64-convert-to-clockevents.patch (ie. after applying it the box stops
> to boot).
>
> I haven't had the time to check if any special command line arguments help.
> Will check tomorrow.

Can you please disable the patches, which I sent Linus wards:

timekeeping-access-rtc-outside-xtime-lock.patch
xtime-supsend-resume-fixup.patch
acpi-reevaluate-c-p-t-states.patch
clockevents-enforce-broadcast-on-resume.patch
clockevents-do-not-shutdown-broadcast-device-in-oneshot-mode.patch
clockevents-prevent-stale-tick-update-on-offline-cpu.patch

Without those patches you get the state of rc4-mm1. It would be
interesting to know which one interferes with the acpi stuff.

	tglx
Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING
On Thu, 2007-09-20 at 02:06 +0200, Rafael J. Wysocki wrote: On Wednesday, 19 September 2007 21:21, Thomas Gleixner wrote: On Wed, 2007-09-19 at 19:44 +0200, Rafael J. Wysocki wrote: It boots with nohpet alone and suspend/hibernation seem to work (still, it didn't want to boot right after hibernation, but booted after I'd switched it off/on manually). Can you please check, whether http://tglx.de/projects/hrtimers/2.6.23-rc6/patch-2.6.23-rc6-hrt2.patch works for you ? Nope. It's a total disaster. :-( True. I have instrumented it to the point where the broadcast device is programmed, but no interrupt comes in for totally unknown reasons. Doesn't boot at all, even with noacpitimer nohpet, and that's with NO_HZ and HIGH_RES_TIMERS unset. If you have a bisectable patch series, I can try to identify the responsible patch. http://tglx.de/projects/hrtimers/2.6.23-rc6/patch-2.6.23-rc6-hrt2.patches.tar.bz2 The first patches in the queue are the mainline fixups. It's x86_64-convert-to-clockevents.patch (ie. after applying it the box stops to boot). I haven't had the time to check if any special command line arguments help. Will check tomorrow. Can you please disable the patches, which I sent Linus wards: timekeeping-access-rtc-outside-xtime-lock.patch xtime-supsend-resume-fixup.patch acpi-reevaluate-c-p-t-states.patch clockevents-enforce-broadcast-on-resume.patch clockevents-do-not-shutdown-broadcast-device-in-oneshot-mode.patch clockevents-prevent-stale-tick-update-on-offline-cpu.patch Without those patches you get the state of rc4-mm1. It would be interesting to know which one interferes with the acpi stuff. tglx - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING
On Thursday, 20 September 2007 08:18, Thomas Gleixner wrote: On Thu, 2007-09-20 at 02:06 +0200, Rafael J. Wysocki wrote: On Wednesday, 19 September 2007 21:21, Thomas Gleixner wrote: On Wed, 2007-09-19 at 19:44 +0200, Rafael J. Wysocki wrote: It boots with nohpet alone and suspend/hibernation seem to work (still, it didn't want to boot right after hibernation, but booted after I'd switched it off/on manually). Can you please check, whether http://tglx.de/projects/hrtimers/2.6.23-rc6/patch-2.6.23-rc6-hrt2.patch works for you ? Nope. It's a total disaster. :-( True. I have instrumented it to the point where the broadcast device is programmed, but no interrupt comes in for totally unknown reasons. Doesn't boot at all, even with noacpitimer nohpet, and that's with NO_HZ and HIGH_RES_TIMERS unset. If you have a bisectable patch series, I can try to identify the responsible patch. http://tglx.de/projects/hrtimers/2.6.23-rc6/patch-2.6.23-rc6-hrt2.patches.tar.bz2 The first patches in the queue are the mainline fixups. It's x86_64-convert-to-clockevents.patch (ie. after applying it the box stops to boot). I haven't had the time to check if any special command line arguments help. Will check tomorrow. Can you please disable the patches, which I sent Linus wards: timekeeping-access-rtc-outside-xtime-lock.patch xtime-supsend-resume-fixup.patch acpi-reevaluate-c-p-t-states.patch clockevents-enforce-broadcast-on-resume.patch clockevents-do-not-shutdown-broadcast-device-in-oneshot-mode.patch clockevents-prevent-stale-tick-update-on-offline-cpu.patch I have skipped all of them, but the resulting kernel behaves in the same way (ie. doesn't boot). Without those patches you get the state of rc4-mm1. It would be interesting to know which one interferes with the acpi stuff. It looks like something else went in between -rc4 and -rc6 that broke your patch. I wonder what it might be ... 
Greetings, Rafael - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING
On Thu, 2007-09-20 at 15:29 +0200, Rafael J. Wysocki wrote: I haven't had the time to check if any special command line arguments help. Will check tomorrow. Can you please disable the patches, which I sent Linus wards: timekeeping-access-rtc-outside-xtime-lock.patch xtime-supsend-resume-fixup.patch acpi-reevaluate-c-p-t-states.patch clockevents-enforce-broadcast-on-resume.patch clockevents-do-not-shutdown-broadcast-device-in-oneshot-mode.patch clockevents-prevent-stale-tick-update-on-offline-cpu.patch I have skipped all of them, but the resulting kernel behaves in the same way (ie. doesn't boot). Without those patches you get the state of rc4-mm1. It would be interesting to know which one interferes with the acpi stuff. It looks like something else went in between -rc4 and -rc6 that broke your patch. I wonder what it might be ... Hmm. Can you please go back in the -hrt project history: http://tglx.de/projects/hrtimers/2.6.23-rc5/patch-2.6.23-rc5-hrt1.patches.tar.bz2 http://tglx.de/projects/hrtimers/2.6.23-rc4/patch-2.6.23-rc4-hrt1.patches.tar.bz2 Also, can you send me your .config file please ? Vs. the suspend / resume wreckage of rc6-mm1 / rc6-hrt2: I'm still fishing in rather dark water. Depending on the added instrumentation points the problem mutates up to the point where it vanishes completely. The hang, which requires key strokes again, happens consistently at the same place: The notifier call in kernel/cpu.c::_cpu_up() ret = __raw_notifier_call_chain(cpu_chain, CPU_UP_PREPARE | mod, hcpu, -1, nr_calls); does not return, but _all_ registered notifiers are called and reach their return statement. This reminds me on: http://lkml.org/lkml/2007/5/9/46 Sigh. I have no clue where to dig further. tglx - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING
On Thu, 2007-09-20 at 16:12 +0200, Rafael J. Wysocki wrote:
> > Vs. the suspend / resume wreckage of rc6-mm1 / rc6-hrt2:
>
> ie. the one on the Vaio (I assume).
>
> > I'm still fishing in rather dark water. Depending on the added
> > instrumentation points the problem mutates up to the point where it
> > vanishes completely.
> >
> > The hang, which requires key strokes again, happens consistently at the
> > same place:
> >
> > The notifier call in kernel/cpu.c::_cpu_up()
> >
> >	ret = __raw_notifier_call_chain(&cpu_chain, CPU_UP_PREPARE | mod, hcpu,
> >					-1, &nr_calls);
> >
> > does not return, but _all_ registered notifiers are called and reach
> > their return statement.
> >
> > This reminds me on: http://lkml.org/lkml/2007/5/9/46
> >
> > Sigh. I have no clue where to dig further.
>
> Well, the above may affect SMP systems, but the Vaio is UP. Hmm?

My jinxed VAIO variant is SMP, but it looks like the same mysterious
error.

	tglx
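For readers unfamiliar with the call discussed above: the sketch below is a simplified userspace model of how a notifier chain is walked, written only to illustrate what the -1 (no limit on the number of callbacks) and the nr_calls counter mean. It is an assumption-laden illustration, not the kernel's code: the struct layout is reduced, there is no RCU or locking, and call_chain(), ok_cb(), bad_cb() and demo_nr_calls() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Return codes modelled on the kernel's notifier conventions. */
#define NOTIFY_DONE      0x0000
#define NOTIFY_OK        0x0001
#define NOTIFY_STOP_MASK 0x8000
#define NOTIFY_BAD       (NOTIFY_STOP_MASK | 0x0002)

/* Reduced, hypothetical stand-in for the kernel's notifier_block. */
struct notifier_block {
	int (*notifier_call)(struct notifier_block *nb, unsigned long val,
			     void *v);
	struct notifier_block *next;
};

/*
 * Walk the chain, calling at most nr_to_call callbacks (-1 means no
 * limit in practice) and counting each invocation in *nr_calls.
 * A callback that sets NOTIFY_STOP_MASK ends the walk early.
 */
int call_chain(struct notifier_block *head, unsigned long val, void *v,
	       int nr_to_call, int *nr_calls)
{
	int ret = NOTIFY_DONE;
	struct notifier_block *nb = head;

	while (nb && nr_to_call) {
		struct notifier_block *next = nb->next;

		ret = nb->notifier_call(nb, val, v);
		if (nr_calls)
			(*nr_calls)++;
		if ((ret & NOTIFY_STOP_MASK) == NOTIFY_STOP_MASK)
			break;
		nb = next;
		nr_to_call--;
	}
	return ret;
}

static int ok_cb(struct notifier_block *nb, unsigned long val, void *v)
{
	(void)nb; (void)val; (void)v;
	return NOTIFY_OK;
}

static int bad_cb(struct notifier_block *nb, unsigned long val, void *v)
{
	(void)nb; (void)val; (void)v;
	return NOTIFY_BAD;
}

/* Build a three-entry chain and report how many callbacks ran. */
int demo_nr_calls(int veto_second)
{
	struct notifier_block third = { ok_cb, NULL };
	struct notifier_block second = { veto_second ? bad_cb : ok_cb, &third };
	struct notifier_block first = { ok_cb, &second };
	int nr_calls = 0;

	call_chain(&first, 0, NULL, -1, &nr_calls);
	return nr_calls;
}
```

In this model the walk itself is trivial once every callback has returned, which is why a hang "inside" the chain call after all notifiers have returned is so puzzling.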
Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING
On Thursday, 20 September 2007 15:43, Thomas Gleixner wrote: On Thu, 2007-09-20 at 15:29 +0200, Rafael J. Wysocki wrote: I haven't had the time to check if any special command line arguments help. Will check tomorrow. Can you please disable the patches, which I sent Linus wards: timekeeping-access-rtc-outside-xtime-lock.patch xtime-supsend-resume-fixup.patch acpi-reevaluate-c-p-t-states.patch clockevents-enforce-broadcast-on-resume.patch clockevents-do-not-shutdown-broadcast-device-in-oneshot-mode.patch clockevents-prevent-stale-tick-update-on-offline-cpu.patch I have skipped all of them, but the resulting kernel behaves in the same way (ie. doesn't boot). Without those patches you get the state of rc4-mm1. It would be interesting to know which one interferes with the acpi stuff. It looks like something else went in between -rc4 and -rc6 that broke your patch. I wonder what it might be ... Hmm. Can you please go back in the -hrt project history: http://tglx.de/projects/hrtimers/2.6.23-rc5/patch-2.6.23-rc5-hrt1.patches.tar.bz2 http://tglx.de/projects/hrtimers/2.6.23-rc4/patch-2.6.23-rc4-hrt1.patches.tar.bz2 Sure, but it'll take some time. :-) Also, can you send me your .config file please ? Attached is the one I'm using on 2.6.23-rc6 w/ your patches. Vs. the suspend / resume wreckage of rc6-mm1 / rc6-hrt2: ie. the one on the Vaio (I assume). I'm still fishing in rather dark water. Depending on the added instrumentation points the problem mutates up to the point where it vanishes completely. The hang, which requires key strokes again, happens consistently at the same place: The notifier call in kernel/cpu.c::_cpu_up() ret = __raw_notifier_call_chain(cpu_chain, CPU_UP_PREPARE | mod, hcpu, -1, nr_calls); does not return, but _all_ registered notifiers are called and reach their return statement. This reminds me on: http://lkml.org/lkml/2007/5/9/46 Sigh. I have no clue where to dig further. Well, the above may affect SMP systems, but the Vaio is UP. Hmm? 
Greetings, Rafael # # Automatically generated make config: don't edit # Linux kernel version: 2.6.23-rc6-hrt # Thu Sep 20 14:26:03 2007 # CONFIG_X86_64=y CONFIG_64BIT=y CONFIG_X86=y CONFIG_GENERIC_TIME=y CONFIG_GENERIC_CLOCKEVENTS=y CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y CONFIG_NONIRQ_WAKEUP=y CONFIG_GENERIC_TIME_VSYSCALL=y CONFIG_GENERIC_CMOS_UPDATE=y CONFIG_ZONE_DMA32=y CONFIG_LOCKDEP_SUPPORT=y CONFIG_STACKTRACE_SUPPORT=y CONFIG_SEMAPHORE_SLEEPERS=y CONFIG_MMU=y CONFIG_ZONE_DMA=y CONFIG_QUICKLIST=y CONFIG_NR_QUICK=2 CONFIG_RWSEM_GENERIC_SPINLOCK=y CONFIG_GENERIC_HWEIGHT=y CONFIG_GENERIC_CALIBRATE_DELAY=y CONFIG_X86_CMPXCHG=y CONFIG_EARLY_PRINTK=y CONFIG_GENERIC_ISA_DMA=y CONFIG_GENERIC_IOMAP=y CONFIG_ARCH_MAY_HAVE_PC_FDC=y CONFIG_ARCH_POPULATES_NODE_MAP=y CONFIG_DMI=y CONFIG_AUDIT_ARCH=y CONFIG_GENERIC_BUG=y # CONFIG_ARCH_HAS_ILOG2_U32 is not set # CONFIG_ARCH_HAS_ILOG2_U64 is not set CONFIG_DEFCONFIG_LIST=/lib/modules/$UNAME_RELEASE/.config # # General setup # CONFIG_EXPERIMENTAL=y CONFIG_LOCK_KERNEL=y CONFIG_INIT_ENV_ARG_LIMIT=32 CONFIG_LOCALVERSION= # CONFIG_LOCALVERSION_AUTO is not set CONFIG_SWAP=y CONFIG_SYSVIPC=y CONFIG_SYSVIPC_SYSCTL=y CONFIG_POSIX_MQUEUE=y CONFIG_BSD_PROCESS_ACCT=y CONFIG_BSD_PROCESS_ACCT_V3=y CONFIG_TASKSTATS=y CONFIG_TASK_DELAY_ACCT=y # CONFIG_TASK_XACCT is not set # CONFIG_USER_NS is not set CONFIG_AUDIT=y CONFIG_AUDITSYSCALL=y CONFIG_IKCONFIG=y CONFIG_IKCONFIG_PROC=y CONFIG_LOG_BUF_SHIFT=18 CONFIG_CPUSETS=y CONFIG_SYSFS_DEPRECATED=y # CONFIG_RELAY is not set CONFIG_BLK_DEV_INITRD=y CONFIG_INITRAMFS_SOURCE= CONFIG_CC_OPTIMIZE_FOR_SIZE=y CONFIG_SYSCTL=y # CONFIG_EMBEDDED is not set CONFIG_UID16=y CONFIG_SYSCTL_SYSCALL=y CONFIG_KALLSYMS=y CONFIG_KALLSYMS_ALL=y # CONFIG_KALLSYMS_EXTRA_PASS is not set CONFIG_HOTPLUG=y CONFIG_PRINTK=y CONFIG_BUG=y CONFIG_ELF_CORE=y CONFIG_BASE_FULL=y CONFIG_FUTEX=y CONFIG_ANON_INODES=y CONFIG_EPOLL=y CONFIG_SIGNALFD=y CONFIG_TIMERFD=y CONFIG_EVENTFD=y CONFIG_SHMEM=y CONFIG_VM_EVENT_COUNTERS=y CONFIG_SLAB=y 
# CONFIG_SLUB is not set # CONFIG_SLOB is not set CONFIG_RT_MUTEXES=y # CONFIG_TINY_SHMEM is not set CONFIG_BASE_SMALL=0 CONFIG_MODULES=y CONFIG_MODULE_UNLOAD=y CONFIG_MODULE_FORCE_UNLOAD=y CONFIG_MODVERSIONS=y CONFIG_MODULE_SRCVERSION_ALL=y CONFIG_KMOD=y CONFIG_STOP_MACHINE=y CONFIG_BLOCK=y # CONFIG_BLK_DEV_IO_TRACE is not set # CONFIG_BLK_DEV_BSG is not set # # IO Schedulers # CONFIG_IOSCHED_NOOP=y CONFIG_IOSCHED_AS=y CONFIG_IOSCHED_DEADLINE=y CONFIG_IOSCHED_CFQ=y # CONFIG_DEFAULT_AS is not set # CONFIG_DEFAULT_DEADLINE is not set CONFIG_DEFAULT_CFQ=y # CONFIG_DEFAULT_NOOP is not set CONFIG_DEFAULT_IOSCHED=cfq # # Processor type and features # # CONFIG_TICK_ONESHOT is not set # CONFIG_NO_HZ is not set # CONFIG_HIGH_RES_TIMERS is not set CONFIG_GENERIC_CLOCKEVENTS_BUILD=y CONFIG_X86_PC=y # CONFIG_X86_VSMP is not set CONFIG_MK8=y #
Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING
On Thursday, 20 September 2007 15:53, Thomas Gleixner wrote:
> On Thu, 2007-09-20 at 16:12 +0200, Rafael J. Wysocki wrote:
> > > Vs. the suspend / resume wreckage of rc6-mm1 / rc6-hrt2:
> >
> > ie. the one on the Vaio (I assume).
> >
> > > I'm still fishing in rather dark water. Depending on the added
> > > instrumentation points the problem mutates up to the point where it
> > > vanishes completely.
> > >
> > > The hang, which requires key strokes again, happens consistently at the
> > > same place:
> > >
> > > The notifier call in kernel/cpu.c::_cpu_up()
> > >
> > >	ret = __raw_notifier_call_chain(&cpu_chain, CPU_UP_PREPARE | mod, hcpu,
> > >					-1, &nr_calls);
> > >
> > > does not return, but _all_ registered notifiers are called and reach
> > > their return statement.
> > >
> > > This reminds me on: http://lkml.org/lkml/2007/5/9/46
> > >
> > > Sigh. I have no clue where to dig further.
> >
> > Well, the above may affect SMP systems, but the Vaio is UP. Hmm?
>
> My jinxed VAIO variant is SMP, but it looks like the same mysterious
> error.

Hm. Have you tried

# echo test > /sys/power/disk
# echo disk > /sys/power/state

(should suspend devices and disable the nonboot CPUs, wait for 5 sec. and
restore everything)?

Greetings,
Rafael
Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING
On Thu, 2007-09-20 at 16:50 +0200, Thomas Gleixner wrote:
> > > > Well, the above may affect SMP systems, but the Vaio is UP. Hmm?
> > >
> > > My jinxed VAIO variant is SMP, but it looks like the same mysterious
> > > error.
> >
> > Hm. Have you tried
> >
> > # echo test > /sys/power/disk
> > # echo disk > /sys/power/state
> >
> > (should suspend devices and disable the nonboot CPUs, wait for 5 sec. and
> > restore everything)?
>
> Works fine, but I need to reboot into a non debug kernel to verify.

Works as well. What's the difference between this and the real thing ?

	tglx
Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING
On Thursday, 20 September 2007 17:49, Thomas Gleixner wrote:
> On Thu, 2007-09-20 at 16:50 +0200, Thomas Gleixner wrote:
> > > > > Well, the above may affect SMP systems, but the Vaio is UP. Hmm?
> > > >
> > > > My jinxed VAIO variant is SMP, but it looks like the same mysterious
> > > > error.
> > >
> > > Hm. Have you tried
> > >
> > > # echo test > /sys/power/disk
> > > # echo disk > /sys/power/state
> > >
> > > (should suspend devices and disable the nonboot CPUs, wait for 5 sec. and
> > > restore everything)?
> >
> > Works fine, but I need to reboot into a non debug kernel to verify.
>
> Works as well. What's the difference between this and the real thing ?

The real thing also calls device_power_down(PMSG_FREEZE), which is a
counterpart of sysdev_shutdown(), more or less, and I think that's what goes
belly up.

You can use the patch below (on top of -rc6-mm1), which just disables the image
creation (that should be irrelevant anyway) and see what happens.

Greetings,
Rafael

---
 kernel/power/disk.c |   11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

Index: linux-2.6.23-rc6-mm1/kernel/power/disk.c
===================================================================
--- linux-2.6.23-rc6-mm1.orig/kernel/power/disk.c
+++ linux-2.6.23-rc6-mm1/kernel/power/disk.c
@@ -168,13 +168,14 @@ int create_image(int platform_mode)
 	}
 
 	save_processor_state();
-	error = swsusp_arch_suspend();
-	if (error)
-		printk(KERN_ERR "Error %d while creating the image\n", error);
+	//error = swsusp_arch_suspend();
+	//if (error)
+	//	printk(KERN_ERR "Error %d while creating the image\n", error);
 	/* Restore control flow magically appears here */
 	restore_processor_state();
-	if (!in_suspend)
-		platform_leave(platform_mode);
+	//if (!in_suspend)
+	//	platform_leave(platform_mode);
+	in_suspend = 0;
 	/* NOTE:  device_power_up() is just a resume() for devices
 	 * that suspended with irqs off ... no overall powerup.
 	 */
Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING
On Thursday, 20 September 2007 16:12, Rafael J. Wysocki wrote: On Thursday, 20 September 2007 15:43, Thomas Gleixner wrote: On Thu, 2007-09-20 at 15:29 +0200, Rafael J. Wysocki wrote: I haven't had the time to check if any special command line arguments help. Will check tomorrow. Can you please disable the patches, which I sent Linus wards: timekeeping-access-rtc-outside-xtime-lock.patch xtime-supsend-resume-fixup.patch acpi-reevaluate-c-p-t-states.patch clockevents-enforce-broadcast-on-resume.patch clockevents-do-not-shutdown-broadcast-device-in-oneshot-mode.patch clockevents-prevent-stale-tick-update-on-offline-cpu.patch I have skipped all of them, but the resulting kernel behaves in the same way (ie. doesn't boot). Without those patches you get the state of rc4-mm1. It would be interesting to know which one interferes with the acpi stuff. It looks like something else went in between -rc4 and -rc6 that broke your patch. I wonder what it might be ... Hmm. Can you please go back in the -hrt project history: http://tglx.de/projects/hrtimers/2.6.23-rc5/patch-2.6.23-rc5-hrt1.patches.tar.bz2 http://tglx.de/projects/hrtimers/2.6.23-rc4/patch-2.6.23-rc4-hrt1.patches.tar.bz2 Each of them on top of 2.6.23-rc6 gives the same symptoms as rc6-hrt2 (ie. the box doesn't boot). I'm going to check if -rc5 with patch-2.6.23-rc4-hrt1 on top of it works and if not (I suspect so), I'll bisect the Linus' tree between -rc4 and -rc5 in order to identify the responsible patch. Greetings, Rafael - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING
Rafael,

On Thu, 2007-09-20 at 22:39 +0200, Rafael J. Wysocki wrote:
> > Works as well. What's the difference between this and the real thing ?
>
> The real thing also calls device_power_down(PMSG_FREEZE), which is a
> counterpart of sysdev_shutdown(), more or less, and I think that's what goes
> belly up.
>
> You can use the patch below (on top of -rc6-mm1), which just disables the image
> creation (that should be irrelevant anyway) and see what happens.

In meantime I figured out what's happening.

The ordering in hibernate_snapshot() is wrong. It does:

	swsusp_shrink_memory();
	suspend_console();
	device_suspend(PMSG_FREEZE);
	platform_prepare(platform_mode);
	disable_nonboot_cpus();
	swsusp_suspend();
	enable_nonboot_cpus();
	platform_finish(platform_mode);
	device_resume();
	resume_console();

We disable everything in device_suspend() including timekeeping, so any
code which is depending on working timekeeping and timer functionality
(which is suspended in timekeeping_suspend() as well) is busted.

enable_nonboot_cpus() definitely relies on working timekeeping and
timers depending on the codepath. It's just a surprise that this did not
blow up earlier (also before clock events).

I changed the ordering of the above to:

	disable_nonboot_cpus();
	swsusp_shrink_memory();
	suspend_console();
	device_suspend(PMSG_FREEZE);
	platform_prepare(platform_mode);
	swsusp_suspend();
	platform_finish(platform_mode);
	device_resume();
	resume_console();
	enable_nonboot_cpus();

and non-surprisingly the "my VAIO needs help from keyboard" problem went
away immediately.

See patch below. (on top of rc7-hrt1, -mm1 does not work at all on my
VAIO due to some yet not identified wreckage)

I did not yet look into the suspend to ram code, but I guess that there
is an equivalent problem.

But I have no idea why this affects Andrews jinxed VAIO (UP machine),
though I suspect that we have more timekeeping/timer depending code
somewhere waiting to bite us.

Also I still need to debug why the HIBERNATION_TEST code path (which has
a msleep(5000) in it) does not fail, but I postpone this until tomorrow
morning. I'm dead tired after hunting this Heisenbug which changes with
every other printk added to the code.

I'm going to add some really noisy messages for everything which
accesses timekeeping / timers _after_ those systems have been shut down.

We really need to fix this once and forever _before_ 2.6.23 final, even
if it requires a -rc8.

Thanks,

	tglx

--- a/kernel/power/disk.c	2007-09-11 09:25:24.000000000 +0200
+++ b/kernel/power/disk.c	2007-09-20 22:47:30.000000000 +0200
@@ -130,10 +130,14 @@ int hibernation_snapshot(int platform_mo
 {
 	int error;
 
+	error = disable_nonboot_cpus();
+	if (error)
+		goto resume_cpus;
+
 	/* Free memory before shutting down devices. */
 	error = swsusp_shrink_memory();
 	if (error)
-		return error;
+		goto resume_cpus;
 
 	suspend_console();
 	error = device_suspend(PMSG_FREEZE);
@@ -144,23 +148,22 @@ int hibernation_snapshot(int platform_mo
 	if (error)
 		goto Resume_devices;
 
-	error = disable_nonboot_cpus();
-	if (!error) {
-		if (hibernation_mode != HIBERNATION_TEST) {
-			in_suspend = 1;
-			error = swsusp_suspend();
-			/* Control returns here after successful restore */
-		} else {
-			printk("swsusp debug: Waiting for 5 seconds.\n");
-			mdelay(5000);
-		}
+	if (hibernation_mode != HIBERNATION_TEST) {
+		in_suspend = 1;
+		error = swsusp_suspend();
+		/* Control returns here after successful restore */
+	} else {
+		printk("swsusp debug: Waiting for 5 seconds.\n");
+		mdelay(5000);
 	}
-	enable_nonboot_cpus();
+
  Resume_devices:
 	platform_finish(platform_mode);
 	device_resume();
  Resume_console:
 	resume_console();
+resume_cpus:
+	enable_nonboot_cpus();
 	return error;
 }
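The error handling in the patch above follows the usual kernel goto-unwind pattern: each failure jumps to the label that undoes only what has been set up so far. The following is a minimal userspace sketch of that control flow under the reordered sequence, assuming stub steps that merely record their names. Everything here (snapshot_reordered(), the stage enum, the trace buffer) is hypothetical scaffolding for illustration and not kernel code.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stage ids used to inject a failure at a chosen step. */
enum stage {
	DISABLE_CPUS = 1,
	SHRINK_MEMORY,
	DEVICE_SUSPEND,
	PLATFORM_PREPARE,
	SNAPSHOT
};

static char trace[256];

/* Append a step name to the trace, space separated. */
static void note(const char *name)
{
	if (trace[0])
		strcat(trace, " ");
	strcat(trace, name);
}

/* Record the step and fail it if it is the one selected by fail_at. */
static int step(const char *name, int id, int fail_at)
{
	note(name);
	return id == fail_at ? -1 : 0;
}

/* Model of the reordered sequence; fail_at == 0 means nothing fails. */
int snapshot_reordered(int fail_at)
{
	int error;

	trace[0] = '\0';

	error = step("disable_cpus", DISABLE_CPUS, fail_at);
	if (error)
		goto resume_cpus;
	error = step("shrink_memory", SHRINK_MEMORY, fail_at);
	if (error)
		goto resume_cpus;
	note("suspend_console");
	error = step("device_suspend", DEVICE_SUSPEND, fail_at);
	if (error)
		goto resume_console;
	error = step("platform_prepare", PLATFORM_PREPARE, fail_at);
	if (error)
		goto resume_devices;
	error = step("snapshot", SNAPSHOT, fail_at);
resume_devices:
	note("platform_finish");
	note("device_resume");
resume_console:
	note("resume_console");
resume_cpus:
	note("enable_cpus");
	return error;
}

/* Expose the recorded call order of the last run. */
const char *last_trace(void)
{
	return trace;
}
```

Calling snapshot_reordered(SHRINK_MEMORY), for instance, records only the CPU takedown, the failed memory shrink and the CPU bring-up, which mirrors why the patch replaces the early "return error" with "goto resume_cpus".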
Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING
Thomas,

On Thursday, 20 September 2007 23:08, Thomas Gleixner wrote:
> Rafael,
>
> On Thu, 2007-09-20 at 22:39 +0200, Rafael J. Wysocki wrote:
> > > Works as well. What's the difference between this and the real thing ?
> >
> > The real thing also calls device_power_down(PMSG_FREEZE), which is a
> > counterpart of sysdev_shutdown(), more or less, and I think that's what goes
> > belly up.
> >
> > You can use the patch below (on top of -rc6-mm1), which just disables the image
> > creation (that should be irrelevant anyway) and see what happens.
>
> In meantime I figured out what's happening.
>
> The ordering in hibernate_snapshot() is wrong. It does:
>
>	swsusp_shrink_memory();
>	suspend_console();
>	device_suspend(PMSG_FREEZE);
>	platform_prepare(platform_mode);
>	disable_nonboot_cpus();
>	swsusp_suspend();
>	enable_nonboot_cpus();
>	platform_finish(platform_mode);
>	device_resume();
>	resume_console();
>
> We disable everything in device_suspend()

No, we don't.  sysdevs are _not_ suspended in device_suspend().  They are
suspended in device_power_down(), which is called _after_
disable_nonboot_cpus() (from swsusp_suspend()).

> including timekeeping,

No, the timekeeping is suspended in device_power_down() (or at least it should
be).

> so any code which is depending on working timekeeping and timer functionality
> (which is suspended in timekeeping_suspend() as well) is busted.
>
> enable_nonboot_cpus() definitely relies on working timekeeping and
> timers depending on the codepath. It's just a surprise that this did not
> blow up earlier (also before clock events).
>
> I changed the ordering of the above to:
>
>	disable_nonboot_cpus();
>	swsusp_shrink_memory();
>	suspend_console();
>	device_suspend(PMSG_FREEZE);
>	platform_prepare(platform_mode);
>	swsusp_suspend();
>	platform_finish(platform_mode);
>	device_resume();
>	resume_console();
>	enable_nonboot_cpus();

Actually, we can't do this here, because of ACPI and some interrupt handling
related problems.

Unfortunately, platform_finish() needs to go _after_ enable_nonboot_cpus()
and device_resume() needs to go after platform_finish().  Analogously,
disable_nonboot_cpus() has to go after platform_prepare().  Otherwise, some
systems will break.

> and non-surprisingly the "my VAIO needs help from keyboard" problem went
> away immediately.
>
> See patch below. (on top of rc7-hrt1, -mm1 does not work at all on my
> VAIO due to some yet not identified wreckage)

Hm, I really don't know why it helps, but that's not because of the timekeeping
suspend, IMO.

> I did not yet look into the suspend to ram code, but I guess that there
> is an equivalent problem.

Yes, the code ordering is the same, but it's not totally wrong, IMHO.

> But I have no idea why this affects Andrews jinxed VAIO (UP machine),
> though I suspect that we have more timekeeping/timer depending code
> somewhere waiting to bite us.

That's possible.

> Also I still need to debug why the HIBERNATION_TEST code path (which has
> a msleep(5000) in it) does not fail,

See above. :-)

> but I postpone this until tomorrow morning. I'm dead tired after hunting
> this Heisenbug which changes with every other printk added to the code.
>
> I'm going to add some really noisy messages for everything which
> accesses timekeeping / timers _after_ those systems have been shut down.
>
> We really need to fix this once and forever _before_ 2.6.23 final, even
> if it requires a -rc8.

Agreed.

Greetings,
Rafael
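The three ordering requirements stated in the reply above can be written down as "A must come before B" pairs and checked mechanically against both call sequences from the thread. The sketch below does only that; it is a userspace illustration in which the sequences are plain strings, and pos(), before() and count_violations() are hypothetical helper names. It says nothing about whether the ACPI requirements themselves are justified, which is exactly what the thread is disputing.

```c
#include <assert.h>
#include <string.h>

#define N_STEPS 10

/* Ordering currently used by the hibernation code, as quoted in the thread. */
static const char *const current_order[N_STEPS] = {
	"swsusp_shrink_memory", "suspend_console", "device_suspend",
	"platform_prepare", "disable_nonboot_cpus", "swsusp_suspend",
	"enable_nonboot_cpus", "platform_finish", "device_resume",
	"resume_console"
};

/* Ordering proposed in the patch, as quoted in the thread. */
static const char *const proposed_order[N_STEPS] = {
	"disable_nonboot_cpus", "swsusp_shrink_memory", "suspend_console",
	"device_suspend", "platform_prepare", "swsusp_suspend",
	"platform_finish", "device_resume", "resume_console",
	"enable_nonboot_cpus"
};

/* Index of a step in the sequence, or -1 if it is absent. */
static int pos(const char *const *seq, int n, const char *name)
{
	for (int i = 0; i < n; i++)
		if (strcmp(seq[i], name) == 0)
			return i;
	return -1;
}

/* 1 if both steps are present and 'first' precedes 'second'. */
static int before(const char *const *seq, int n,
		  const char *first, const char *second)
{
	int a = pos(seq, n, first);
	int b = pos(seq, n, second);

	return a >= 0 && b >= 0 && a < b;
}

/* Count how many of the three stated requirements a sequence breaks. */
int count_violations(const char *const *seq, int n)
{
	int bad = 0;

	bad += !before(seq, n, "platform_prepare", "disable_nonboot_cpus");
	bad += !before(seq, n, "enable_nonboot_cpus", "platform_finish");
	bad += !before(seq, n, "platform_finish", "device_resume");
	return bad;
}
```

Under these stated requirements the current ordering breaks none of them, while the proposed one breaks the two that involve the nonboot CPUs.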
Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING
On Thu, 20 Sep 2007, Thomas Gleixner wrote:
>
> In meantime I figured out what's happening.
>
> The ordering in hibernate_snapshot() is wrong. It does:

Hmm. This is close to the ordering we have in STR too. I have some dim
memory of there being some ACPI reason why it had to be done that way.

In fact, this was done in commit e3c7db621bed4afb8e231cb005057f2feb5db557,
long ago, by Rafael:

    As indicated in a recent thread on Linux-PM, it's necessary to call
    pm_ops->finish() before devce_resume(), but enable_nonboot_cpus() has to be
    called before pm_ops->finish() (cf.
    http://lists.osdl.org/pipermail/linux-pm/2006-November/004164.html).  For
    consistency, it seems reasonable to call disable_nonboot_cpus() after
    device_suspend().

    This way the suspend code will remain symmetrical with respect to the resume
    code and it may allow us to speed up things in the future by suspending and
    resuming devices and/or saving the suspend image in many threads.

    The following series of patches reorders the suspend and resume code so that
    nonboot CPUs are disabled after devices have been suspended and enabled before
    the devices are resumed.  It also causes pm_ops->finish() to be called after
    enable_nonboot_cpus() wherever necessary.

Hmm? It's entirely possible that that commit was simply just buggy, and we
should indeed move the CPU down/up to be early/late - we've fixed other
ordering issues since that commit went in. But this whole area is very
murky.

(Btw, the above commit message points to just my response with a testing
patch to the real email: the actual explanation of the INSANE ordering is
from Len Brown in

	https://lists.linux-foundation.org/pipermail/linux-pm/2006-November/004161.html

and there Len claims that we *must* wake up CPU's early).

I personally think that the whole ACPI ordering requirements are just
insane, but the point of this email is to point these different
requirements out, and hopefully we can get something that works for
everybody.

Len added to Cc.

Len? Thomas wants to call 'disable_nonboot_cpus()' early, and
'enable_nonboot_cpus()' late. Can you explain why that is wrong?

		Linus
Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING
On Thursday, 20 September 2007 23:35, Linus Torvalds wrote: On Thu, 20 Sep 2007, Thomas Gleixner wrote: In meantime I figured out what's happening. The ordering in hibernate_snapshot() is wrong. It does: Actually, this is incorrect. Please read my reply to Thomas, just sent. Hmm. This is close to the ordering we have in STR too. I have some dim memory of there being some ACPI reason why it had to be done that way. Yes. We're executing _INI from the CPU initialization code and that shouldn't be done after _WAK, which is called from platform_finish(). In fact, this was done in commit e3c7db621bed4afb8e231cb005057f2feb5db557, long ago, by Rafael: As indicated in a recent thread on Linux-PM, it's necessary to call pm_ops-finish() before devce_resume(), but enable_nonboot_cpus() has to be called before pm_ops-finish() (cf. http://lists.osdl.org/pipermail/linux-pm/2006-November/004164.html). For consistency, it seems reasonable to call disable_nonboot_cpus() after device_suspend(). This way the suspend code will remain symmetrical with respect to the resume code and it may allow us to speed up things in the future by suspending and resuming devices and/or saving the suspend image in many threads. The following series of patches reorders the suspend and resume code so that nonboot CPUs are disabled after devices have been suspended and enabled before the devices are resumed. It also causes pm_ops-finish() to be called after enable_nonboot_cpus() wherever necessary. Hmm? It's entirely possible that that commit was simply just buggy, and we should indeed move the CPU down/up to be early/late - we've fixed other ordering issues since that commit went in. But this whole area is very murky. 
(Btw, the above commit message points to just my response with a testing patch to the real email: the actual explanation of the INSANE ordering is from Len Brown in https://lists.linux-foundation.org/pipermail/linux-pm/2006-November/004161.html and there Len claims that we *must* wake up CPU's early). I personally think that the whole ACPI ordering requirements are just insane, but the point of this email is to point these different requirements out, and hopefully we can get something that works for everybody. Sure. Greetings, Rafael - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING
Rafael,

On Thu, 2007-09-20 at 23:45 +0200, Rafael J. Wysocki wrote:
> > We disable everything in device_suspend()
>
> No, we don't.  sysdevs are _not_ suspended in device_suspend().  They are
> suspended in device_power_down(), which is called _after_
> disable_nonboot_cpus() (from swsusp_suspend()).
>
> > including timekeeping,
>
> No, the timekeeping is suspended in device_power_down() (or at least it should
> be).

Damn, you are right. Reading through 30 different logs confused me.

> >	enable_nonboot_cpus();
>
> Actually, we can't do this here, because of ACPI and some interrupt handling
> related problems.
>
> Unfortunately, platform_finish() needs to go _after_ enable_nonboot_cpus()
> and device_resume() needs to go after platform_finish().  Analogously,
> disable_nonboot_cpus() has to go after platform_prepare().  Otherwise, some
> systems will break.

Well, I don't buy this one. The system would break in the same way, when
I take CPU#1 offline before I initiate the suspend.

> > and non-surprisingly the "my VAIO needs help from keyboard" problem went
> > away immediately.
> >
> > See patch below. (on top of rc7-hrt1, -mm1 does not work at all on my
> > VAIO due to some yet not identified wreckage)
>
> Hm, I really don't know why it helps, but that's not because of the timekeeping
> suspend, IMO.

It is related. We rely on some subtle thing which is not up when we
resume the non boot cpu.

> > I did not yet look into the suspend to ram code, but I guess that there
> > is an equivalent problem.
>
> Yes, the code ordering is the same, but it's not totally wrong, IMHO.
>
> > But I have no idea why this affects Andrews jinxed VAIO (UP machine),
> > though I suspect that we have more timekeeping/timer depending code
> > somewhere waiting to bite us.
>
> That's possible.
>
> > Also I still need to debug why the HIBERNATION_TEST code path (which has
> > a msleep(5000) in it) does not fail,
>
> See above. :-)

Yes. It makes sense. When I change the TEST code path to:

-		printk("swsusp debug: Waiting for 5 seconds.\n");
-		msleep(5000);
+		printk("swsusp debug: before swsusp_suspend\n");
+		error = swsusp_suspend();

then I have the same effect as I get from real hibernation.

And we actually shut down time keeping somewhere in that code path.

ACPI: PCI interrupt for device 0000:00:1b.0 disabled
swsusp debug: before swsusp_suspend
Suspend timekeeping
swsusp: critical section:
swsusp: Need to copy 112429 pages
swsusp: Normal pages needed: 35399 + 1024 + 40, available pages: 193876
swsusp: critical section: done (112429 pages copied)
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
Resume timekeeping
ACPI: PCI Interrupt 0000:00:02.0[A] -> GSI 16 (level, low) -> IRQ 16

-> works fine

This is with my patch applied. Without that I get:

CPU1 is down
swsusp debug: before swsusp_suspend
Suspend timekeeping
swsusp: critical section:
swsusp: Need to copy 112429 pages
swsusp: Normal pages needed: 35399 + 1024 + 40, available pages: 193876
swsusp: critical section: done (112429 pages copied)
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
Resume timekeeping
Enabling non-boot CPUs

--> Waits for ever until a key is pressed

Thanks,

	tglx
Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING
On Thu, 20 Sep 2007, Linus Torvalds wrote:
>
> (Btw, the above commit message points to just my response with a testing
> patch to the real email: the actual explanation of the INSANE ordering is
> from Len Brown in
>
>	https://lists.linux-foundation.org/pipermail/linux-pm/2006-November/004161.html
>
> and there Len claims that we *must* wake up CPU's early).

..and points to commit 1a38416cea8ac801ae8f261074721f35317613dc which in
turn talks about

	http://bugzilla.kernel.org/show_bug.cgi?id=5651

However, it seems that bugzilla entry may just be bogus. It talks about

	"it appears that some firmware in the future may depend on that
	 sequence for correction operation"

Len, Shaohua, what are the real issues here?

It would indeed be nice if we could just take CPU's down early (while
everything is working), and run the whole suspend code with just one CPU,
rather than having to worry about the ordering between CPU and device
takedown.

That said, at least with STR, the situation is:

 1) suspend_console
 2) device_suspend(PMSG_SUSPEND)	(== ->suspend)
 3) disable_nonboot_cpus()
 4) device_power_down(PMSG_SUSPEND)	(== ->suspend_late)
 5) pm_ops->enter()
 6) device_power_up()			(== ->resume_early)
 7) enable_nonboot_cpus()
 8) pm_finish()
 9) device_resume()			(== ->resume)
10) resume_console

So if we agree that things like timers etc should *never* be suspended by
the early suspend, and *always* use suspend_late/resume_early, then at
least STR should be ok.

And I think that's a damn reasonable thing to agree on: timers (and
anything else that CPU shutdown/bringup could *possibly* care about)
should be considered core enough that they had better be on the
suspend_late/resume_early list.

Thomas, Rafael, can you verify that at least STR is ok in this respect?

		Linus
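The argument about the ten STR steps can be checked with a toy state machine: track whether timers are running, walk the steps in order, and count how many CPU hotplug operations happen while timers are down. This is a deliberately crude sketch that assumes timers are a single on/off flag; hotplug_ops_without_timers() and the timer_hooks enum are hypothetical names introduced here, and the function models only the step order listed above, not any real kernel state.

```c
#include <assert.h>

/* Which pair of callbacks the (modelled) timer code is hooked into. */
enum timer_hooks {
	TIMERS_EARLY,	/* ->suspend / ->resume */
	TIMERS_LATE	/* ->suspend_late / ->resume_early */
};

/*
 * Walk the ten STR steps and return how many CPU hotplug operations
 * (steps 3 and 7) run while the modelled timers are suspended.
 */
int hotplug_ops_without_timers(enum timer_hooks hooks)
{
	int timers_running = 1;
	int bad = 0;

	/* 1) suspend_console: no effect in this model */

	/* 2) device_suspend(PMSG_SUSPEND), i.e. ->suspend */
	if (hooks == TIMERS_EARLY)
		timers_running = 0;

	/* 3) disable_nonboot_cpus() */
	if (!timers_running)
		bad++;

	/* 4) device_power_down(PMSG_SUSPEND), i.e. ->suspend_late */
	if (hooks == TIMERS_LATE)
		timers_running = 0;

	/* 5) pm_ops->enter(): the machine sleeps here */

	/* 6) device_power_up(), i.e. ->resume_early */
	if (hooks == TIMERS_LATE)
		timers_running = 1;

	/* 7) enable_nonboot_cpus() */
	if (!timers_running)
		bad++;

	/* 8) pm_finish(): no effect in this model */

	/* 9) device_resume(), i.e. ->resume */
	if (hooks == TIMERS_EARLY)
		timers_running = 1;

	/* 10) resume_console: no effect in this model */
	return bad;
}
```

With the late hooks, both CPU takedown and bring-up see working timers; with the early hooks, both run without them, which is the situation the message argues must never be allowed.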
Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING
Rafael,

On Thu, 2007-09-20 at 23:54 +0200, Rafael J. Wysocki wrote:
> > Hmm. This is close to the ordering we have in STR too. I have some dim
> > memory of there being some ACPI reason why it had to be done that way.
>
> Yes.  We're executing _INI from the CPU initialization code and that shouldn't
> be done after _WAK, which is called from platform_finish().

If I tear down CPU#1 right before I tell the kernel to hibernate, then
the box must explode in the same way. It does not. On none of 4 tested
laptops. Of course only the jinxed VAIO one exposes the "please press a
key" problem.

I need to follow down the swsusp_suspend() code path to figure out, why
this breaks the box.

	tglx
Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING
Linus,

On Thu, 2007-09-20 at 14:55 -0700, Linus Torvalds wrote:
> And I think that's a damn reasonable thing to agree on: timers (and
> anything else that CPU shutdown/bringup could *possibly* care about)
> should be considered core enough that they had better be on the
> suspend_late/resume_early list.
>
> Thomas, Rafael, can you verify that at least STR is ok in this respect?

-ETOOTIRED led me to a wrong conclusion, but still it is a valuable
hint that this change is making things work again.

I need to go down into the details of the swsusp_suspend() code path to
figure out, what's the root cause.

Sorry for the noise, but I'm zooming in.

	tglx
Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING
On Friday, 21 September 2007 00:05, Thomas Gleixner wrote:
> Linus,
>
> On Thu, 2007-09-20 at 14:55 -0700, Linus Torvalds wrote:
> > And I think that's a damn reasonable thing to agree on: timers (and
> > anything else that CPU shutdown/bringup could *possibly* care about)
> > should be considered core enough that they had better be on the
> > suspend_late/resume_early list.
> >
> > Thomas, Rafael, can you verify that at least STR is ok in this respect?
>
> -ETOOTIRED led me to a wrong conclusion, but still it is a valuable hint
> that this change is making things work again.

Yes, it is.

> I need to go down into the details of the swsusp_suspend() code path to
> figure out what the root cause is.

If you need any help from me with that, please let me know.

Greetings,
Rafael
Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING
Thomas,

On Thursday, 20 September 2007 23:53, Thomas Gleixner wrote:

Rafael,

On Thu, 2007-09-20 at 23:45 +0200, Rafael J. Wysocki wrote:

We disable everything in device_suspend()

No, we don't. sysdevs are _not_ suspended in device_suspend(). They are
suspended in device_power_down(), which is called _after_
disable_nonboot_cpus() (from swsusp_suspend()).

including timekeeping,

No, the timekeeping is suspended in device_power_down() (or at least it
should be).

Damn, you are right. Reading through 30 different logs confused me.

enable_nonboot_cpus();

Actually, we can't do this here, because of ACPI and some interrupt
handling related problems. Unfortunately, platform_finish() needs to go
_after_ enable_nonboot_cpus() and device_resume() needs to go after
platform_finish(). Analogously, disable_nonboot_cpus() has to go after
platform_prepare(). Otherwise, some systems will break.

Well, I don't buy this one. The system would break in the same way, when
I take CPU#1 offline before I initiate the suspend.

I was referring to the resume part. If we call enable_nonboot_cpus(),
which executes the _INI ACPI control method, after platform_finish(),
which executes the _WAK global ACPI control method, things will break.
That already happened in the past, when the code ordering was different,
AFAICS.

and non-surprisingly the "my VAIO needs help from keyboard" problem went
away immediately. See patch below. (on top of rc7-hrt1, -mm1 does not
work at all on my VAIO due to some yet not identified wreckage)

Hm, I really don't know why it helps, but that's not because of the
timekeeping suspend, IMO.

It is related. We rely on some subtle thing which is not up when we
resume the non boot cpu.

Yes, it looks so.

I did not yet look into the suspend to ram code, but I guess that there
is an equivalent problem.

Yes, the code ordering is the same, but it's not totally wrong, IMHO.

But I have no idea why this affects Andrews jinxed VAIO (UP machine),
though I suspect that we have more timekeeping/timer depending code
somewhere waiting to bite us.

That's possible.

Also I still need to debug why the HIBERNATION_TEST code path (which has
a msleep(5000) in it) does not fail,

See above. :-)

Yes. It makes sense. When I change the TEST code path to:

-	printk("swsusp debug: Waiting for 5 seconds.\n");
-	msleep(5000);
+	printk("swsusp debug: before swsusp_suspend\n");
+	error = swsusp_suspend();

then I have the same effect as I get from real hibernation. And we
actually shut down time keeping somewhere in that code path.

  ACPI: PCI interrupt for device :00:1b.0 disabled
  swsusp debug: before swsusp_suspend
  Suspend timekeeping

Exactly. timekeeping_suspend() is called from device_power_down(), which
is called from swsusp_suspend() (after disabling interrupts).

  swsusp: critical section:
  swsusp: Need to copy 112429 pages
  swsusp: Normal pages needed: 35399 + 1024 + 40, available pages: 193876
  swsusp: critical section: done (112429 pages copied)
  Intel machine check architecture supported.
  Intel machine check reporting enabled on CPU#0.
  Resume timekeeping
  ACPI: PCI Interrupt :00:02.0[A] -> GSI 16 (level, low) -> IRQ 16

-> works fine

This is with my patch applied. Without that I get:

  CPU1 is down
  swsusp debug: before swsusp_suspend
  Suspend timekeeping
  swsusp: critical section:
  swsusp: Need to copy 112429 pages
  swsusp: Normal pages needed: 35399 + 1024 + 40, available pages: 193876
  swsusp: critical section: done (112429 pages copied)
  Intel machine check architecture supported.
  Intel machine check reporting enabled on CPU#0.
  Resume timekeeping
  Enabling non-boot CPUs -- Waits for ever until a key is pressed

Well, perhaps there's something else that we should suspend late and
resume early, but we don't?

Greetings,
Rafael
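The three ordering requirements stated in this message (disable_nonboot_cpus() after platform_prepare(), platform_finish() after enable_nonboot_cpus(), device_resume() after platform_finish()) can be written down as a small rule table. This is a sketch under stated assumptions, not code from swsusp: `struct order_rule`, `find_step()` and `sequence_obeys_rules()` are hypothetical names, and a rule is simply skipped when one of its steps does not appear in the sequence being checked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* One "first must run before then" requirement (hypothetical type). */
struct order_rule {
    const char *first;
    const char *then;
};

/* The three requirements named in the message above. */
static const struct order_rule rules[] = {
    { "platform_prepare",    "disable_nonboot_cpus" },
    { "enable_nonboot_cpus", "platform_finish" },  /* _INI before _WAK */
    { "platform_finish",     "device_resume" },
};

/* Position of name in seq, or -1 if it does not occur. */
static int find_step(const char *const *seq, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(seq[i], name) == 0)
            return (int)i;
    return -1;
}

/* True if no rule is violated by the order of the steps in seq. */
static bool sequence_obeys_rules(const char *const *seq, size_t n)
{
    assert(seq != NULL);
    for (size_t i = 0; i < sizeof(rules) / sizeof(rules[0]); i++) {
        int a = find_step(seq, n, rules[i].first);
        int b = find_step(seq, n, rules[i].then);

        /* A rule only applies when both of its steps are present. */
        if (a >= 0 && b >= 0 && a > b)
            return false;
    }
    return true;
}
```

A sequence that calls platform_finish() before enable_nonboot_cpus() is rejected by the second rule, which is the case described above as having broken systems in the past.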
Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING
On Thursday 20 September 2007 17:55, Linus Torvalds wrote:
> On Thu, 20 Sep 2007, Linus Torvalds wrote:
> >
> > (Btw, the above commit message points to just my response with a testing
> > patch to the real email: the actual explanation of the INSANE ordering is
> > from Len Brown in
> >
> >   https://lists.linux-foundation.org/pipermail/linux-pm/2006-November/004161.html
> >
> > and there Len claims that we *must* wake up CPU's early).
>
> ..and points to commit 1a38416cea8ac801ae8f261074721f35317613dc which in
> turn talks about
>
>   http://bugzilla.kernel.org/show_bug.cgi?id=5651
>
> However, it seems that bugzilla entry may just be bogus. It talks about
>
>   "it appears that some firmware in the future may depend on that
>    sequence for correction operation"
>
> Len, Shaohua, what are the real issues here?

Intel's reference BIOS for Core Duo performs some re-initialization
in _WAK that will get blown away if INIT follows _WAK. IIRC, it is
related to re-initializing the thermal sensors. I opened bug 5651
when the BIOS team informed me of this issue.

Yes, bringing a processor offline and then online again w/o an
intervening suspend or reset would not evaluate _WAK, and thus may
still run into the issue.

I don't know if this is a widespread issue and a commonly used BIOS
hook, or if it is specific to certain processors.

-Len

> It would indeed be nice if we could just take CPU's down early (while
> everything is working), and run the whole suspend code with just one CPU,
> rather than having to worry about the ordering between CPU and device
> takedown.
>
> That said, at least with STR, the situation is:
>
>   1) suspend_console
>   2) device_suspend(PMSG_SUSPEND) (== ->suspend)
>   3) disable_nonboot_cpus()
>   4) device_power_down(PMSG_SUSPEND) (== ->suspend_late)
>   5) pm_ops->enter()
>   6) device_power_up() (== ->resume_early)
>   7) enable_nonboot_cpus()
>   8) pm_finish()
>   9) device_resume() (== ->resume)
>  10) resume_console
>
> So if we agree that things like timers etc should *never* be suspended by
> the early suspend, and *always* use suspend_late/resume_early, then at
> least STR should be ok.
>
> And I think that's a damn reasonable thing to agree on: timers (and
> anything else that CPU shutdown/bringup could *possibly* care about)
> should be considered core enough that they had better be on the
> suspend_late/resume_early list.
>
> Thomas, Rafael, can you verify that at least STR is ok in this respect?
>
> Linus
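Len's description can be reduced to a toy state model: _WAK sets up some platform state and a later CPU INIT wipes it. The sketch below is purely illustrative and makes that assumption explicit. `struct platform_model` and the `model_*` functions are hypothetical, and whether INIT really discards the state on a given machine is exactly what the message above leaves open.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy platform state: hypothetical, not a kernel or ACPI structure. */
struct platform_model {
    bool thermal_configured;   /* set up by _WAK, assumed lost on INIT */
    int  wak_calls;            /* how often _WAK was evaluated */
};

/* Model of evaluating _WAK: firmware re-initializes the thermal sensors. */
static void model_wak(struct platform_model *p)
{
    p->thermal_configured = true;
    p->wak_calls++;
}

/* Model of a CPU INIT: assumed to wipe what _WAK configured. */
static void model_cpu_init(struct platform_model *p)
{
    p->thermal_configured = false;
}

/* Resume with the non-boot CPU brought up before _WAK. */
static bool resume_init_then_wak(void)
{
    struct platform_model p = { false, 0 };

    model_cpu_init(&p);
    model_wak(&p);
    return p.thermal_configured;
}

/* Resume with _WAK evaluated first and the CPU brought up afterwards. */
static bool resume_wak_then_init(void)
{
    struct platform_model p = { false, 0 };

    model_wak(&p);
    model_cpu_init(&p);
    return p.thermal_configured;
}

/* CPU offline/online at run time: no suspend, so _WAK never runs. */
static bool hotplug_without_suspend(void)
{
    struct platform_model p = { true, 0 };   /* configured since boot */

    model_cpu_init(&p);
    assert(p.wak_calls == 0);
    return p.thermal_configured;
}
```

In this model only the INIT-then-_WAK order leaves the sensors configured. The run-time hotplug case ends up unconfigured as well, which matches Len's remark that offlining and onlining a CPU without an intervening suspend may still run into the issue.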
Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING
Linus Torvalds writes:

> It would indeed be nice if we could just take CPU's down early (while
> everything is working), and run the whole suspend code with just one CPU,
> rather than having to worry about the ordering between CPU and device
> takedown.

That is certainly what we want to do on powerpc.

Paul.
Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING
On Wednesday, 19 September 2007 21:21, Thomas Gleixner wrote:
> On Wed, 2007-09-19 at 19:44 +0200, Rafael J. Wysocki wrote:
> > > > It boots with nohpet alone and suspend/hibernation seem to work (still,
> > > > it didn't want to boot right after hibernation, but booted after I'd
> > > > switched it off/on manually).
> > >
> > > Can you please check, whether
> > >
> > > http://tglx.de/projects/hrtimers/2.6.23-rc6/patch-2.6.23-rc6-hrt2.patch
> > >
> > > works for you ?
> >
> > Nope. It's a total disaster. :-(
>
> True. I have instrumented it to the point where the broadcast device is
> programmed, but no interrupt comes in for totally unknown reasons.
>
> > Doesn't boot at all, even with "noacpitimer nohpet", and that's with
> > NO_HZ and HIGH_RES_TIMERS unset.
>
> > If you have a bisectable patch series, I can try to identify the responsible
> > patch.
>
> http://tglx.de/projects/hrtimers/2.6.23-rc6/patch-2.6.23-rc6-hrt2.patches.tar.bz2
>
> The first patches in the queue are the mainline fixups.

It's x86_64-convert-to-clockevents.patch (i.e. after applying it the box
stops booting).

I haven't had the time to check if any special command line arguments help.
Will check tomorrow.

Greetings,
Rafael
Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING
On Wed, 2007-09-19 at 19:44 +0200, Rafael J. Wysocki wrote:
> > > It boots with nohpet alone and suspend/hibernation seem to work (still,
> > > it didn't want to boot right after hibernation, but booted after I'd
> > > switched it off/on manually).
> >
> > Can you please check, whether
> >
> > http://tglx.de/projects/hrtimers/2.6.23-rc6/patch-2.6.23-rc6-hrt2.patch
> >
> > works for you ?
>
> Nope. It's a total disaster. :-(

True. I have instrumented it to the point where the broadcast device is
programmed, but no interrupt comes in for totally unknown reasons.

> Doesn't boot at all, even with "noacpitimer nohpet", and that's with
> NO_HZ and HIGH_RES_TIMERS unset.

> If you have a bisectable patch series, I can try to identify the responsible
> patch.

http://tglx.de/projects/hrtimers/2.6.23-rc6/patch-2.6.23-rc6-hrt2.patches.tar.bz2

The first patches in the queue are the mainline fixups.

tglx
Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING
On Wednesday, 19 September 2007 09:06, Thomas Gleixner wrote:
> On Tue, 2007-09-18 at 23:37 +0200, Rafael J. Wysocki wrote:
> > > > > - The Vaio also hangs during resume-from-RAM, due to git-acpi.patch
> > > > >
> > > > > - And it hangs during suspend-to-RAM, due to git-acpi.patch
> > >
> > > Sorry, I was wrong.
> > >
> > > > On my HP nx6325 it only boots with "noacpitimer nohpet" on the command
> > > > line, but then it works.
> > >
> > > It _sometimes_ boots with "noacpitimer nohpet" and that's if I press the
> > > power button for a couple of times during boot (before any messages appear
> > > on the console).
> > >
> > > > Suspend-to-RAM and hibernation work too. :-)
> > >
> > > No, they don't (I must have booted -rc6 instead of it by mistake, sigh).
> > >
> > > > Since 2.6.23-rc4-mm1 only booted with nohpet because of
> > > >
> > > > x86_64-convert-to-clockevents.patch
> > > >
> > > > I guess that the boot problems with this one result from the same patch.
> > >
> > > Not sure any more ...
> > >
> > > I'll try to compile it with NO_HZ and HIGH_RES_TIMERS unset.
> >
> > OK, in that configuration it's much better.
> >
> > It boots with nohpet alone and suspend/hibernation seem to work (still,
> > it didn't want to boot right after hibernation, but booted after I'd
> > switched it off/on manually).
>
> Can you please check, whether
>
> http://tglx.de/projects/hrtimers/2.6.23-rc6/patch-2.6.23-rc6-hrt2.patch
>
> works for you ?

Nope. It's a total disaster. :-(

Doesn't boot at all, even with "noacpitimer nohpet", and that's with
NO_HZ and HIGH_RES_TIMERS unset.

If you have a bisectable patch series, I can try to identify the responsible
patch.

Greetings,
Rafael
Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING
On Tue, 2007-09-18 at 23:37 +0200, Rafael J. Wysocki wrote:
> > > > - The Vaio also hangs during resume-from-RAM, due to git-acpi.patch
> > > >
> > > > - And it hangs during suspend-to-RAM, due to git-acpi.patch
> >
> > Sorry, I was wrong.
> >
> > > On my HP nx6325 it only boots with "noacpitimer nohpet" on the command
> > > line, but then it works.
> >
> > It _sometimes_ boots with "noacpitimer nohpet" and that's if I press the
> > power button for a couple of times during boot (before any messages appear
> > on the console).
> >
> > > Suspend-to-RAM and hibernation work too. :-)
> >
> > No, they don't (I must have booted -rc6 instead of it by mistake, sigh).
> >
> > > Since 2.6.23-rc4-mm1 only booted with nohpet because of
> > >
> > > x86_64-convert-to-clockevents.patch
> > >
> > > I guess that the boot problems with this one result from the same patch.
> >
> > Not sure any more ...
> >
> > I'll try to compile it with NO_HZ and HIGH_RES_TIMERS unset.
>
> OK, in that configuration it's much better.
>
> It boots with nohpet alone and suspend/hibernation seem to work (still,
> it didn't want to boot right after hibernation, but booted after I'd
> switched it off/on manually).

Can you please check, whether

http://tglx.de/projects/hrtimers/2.6.23-rc6/patch-2.6.23-rc6-hrt2.patch

works for you ?

tglx
Re: 2.6.23-rc6-mm1: failure to boot on HP nx6325, no sound when booted, USB-related WARNING
On Tuesday, 18 September 2007 22:54, Rafael J. Wysocki wrote:
> On Tuesday, 18 September 2007 22:21, Rafael J. Wysocki wrote:
> > On Tuesday, 18 September 2007 10:18, Andrew Morton wrote:
> > >
> > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.23-rc6/2.6.23-rc6-mm1/
> > >
> > > 2.6.23-rc6-mm1 is a 29MB diff against 2.6.23-rc6.
> > >
> > > It took me over two solid days to get this lot compiling and booting on a
> > > few boxes. This required around ninety fixup patches and patch droppings.
> > > There are several bugs in here which I know of (details below) and
> > > presumably many more which I don't know of. I have to say that this just
> > > isn't working any more.
> > >
> > > - The Vaio hangs when quitting X due to x86_64-mm-cpa-clflush.patch, but
> > >   I didn't drop that patch because the iommu patch series depends on it.
> > >
> > > - The Vaio also hangs during resume-from-RAM, due to git-acpi.patch
> > >
> > > - And it hangs during suspend-to-RAM, due to git-acpi.patch
>
> Sorry, I was wrong.
>
> > On my HP nx6325 it only boots with "noacpitimer nohpet" on the command line,
> > but then it works.
>
> It _sometimes_ boots with "noacpitimer nohpet" and that's if I press the power
> button for a couple of times during boot (before any messages appear on the
> console).
>
> > Suspend-to-RAM and hibernation work too. :-)
>
> No, they don't (I must have booted -rc6 instead of it by mistake, sigh).
>
> > Since 2.6.23-rc4-mm1 only booted with nohpet because of
> >
> > x86_64-convert-to-clockevents.patch
> >
> > I guess that the boot problems with this one result from the same patch.
>
> Not sure any more ...
>
> I'll try to compile it with NO_HZ and HIGH_RES_TIMERS unset.

OK, in that configuration it's much better.

It boots with nohpet alone and suspend/hibernation seem to work (still,
it didn't want to boot right after hibernation, but booted after I'd switched
it off/on manually).

Unfortunately, I get this in dmesg:

ALSA /home/rafael/src/mm/linux-2.6.23-rc6-mm1/sound/pci/hda/hda_intel.c:1758: hda-intel: ioremap error

and (obviously) the sound card doesn't work.

Additionally, I've got a couple of these:

WARNING: at /home/rafael/src/mm/linux-2.6.23-rc6-mm1/drivers/usb/core/driver.c:1217 usb_autopm_do_device()

Call Trace:
 [8813885e] :usbcore:usb_autopm_do_device+0x60/0xe9
 [88138910] :usbcore:usb_autosuspend_device+0xc/0xe
 [88131aa8] :usbcore:usb_disconnect+0x15f/0x18c
 [88133305] :usbcore:hub_thread+0x691/0x10a1
 [8024a077] autoremove_wake_function+0x0/0x38
 [88132c74] :usbcore:hub_thread+0x0/0x10a1
 [80249f50] kthread+0x49/0x79
 [8020ce98] child_rip+0xa/0x12
 [80249f07] kthread+0x0/0x79
 [8020ce8e] child_rip+0x0/0x12

Greetings,
Rafael