Re: 2.6.11-rc4-mm1: something is wrong with swsusp powerdown
Andrew Morton wrote:
> (Please do reply-to-all)
>
> Jindrich Makovicka <[EMAIL PROTECTED]> wrote:
> >
> > Pavel Machek wrote:
> > > Hi!
> > >
> > > In `subj` kernel, machine no longer powers down at the end of
> > > swsusp. 2.6.11-rc5-pavel works ok, as does 2.6.11-bk.
> >
> > For me, power down stopped working since the introduction of softlockup
> > detection. After disabling CONFIG_DETECT_SOFTLOCKUP, powerdown works fine.
>
> Could you send the output which CONFIG_DETECT_SOFTLOCKUP generates?
>
> I had one CONFIG_DETECT_SOFTLOCKUP failure with suspend, on SMP. The
> machine was stuck somewhere under mce_work_fn(). Perhaps in the
> smp_call_function(). It only happened the once.

Strangely enough, softlockup produces no additional output. The kernel
just prints "acpi_power_off called" and freezes. Without softlockup
detection compiled in, it turns off normally.

At first I was under the impression that this was caused by
acpi_power_off-bug-fix.patch mentioned above, but unfortunately removing
it didn't actually solve the problem. Later I found I had missed that
softlockup detection had sneaked in, turned on by default, and disabling
it made power off work again.

Power down via APM produced some softlockup output, but I am not sure
whether APM actually worked on my machine before. I just tried APM to see
whether it works when ACPI doesn't, and didn't bother taking a snapshot.
I can recompile an APM kernel with softlockup enabled and disabled and
test it, if that would help.

--
Jindrich Makovicka
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.11-rc4-mm1: something is wrong with swsusp powerdown
(Please do reply-to-all)

Jindrich Makovicka <[EMAIL PROTECTED]> wrote:
>
> Pavel Machek wrote:
> > Hi!
> >
> > In `subj` kernel, machine no longer powers down at the end of
> > swsusp. 2.6.11-rc5-pavel works ok, as does 2.6.11-bk.
>
> For me, power down stopped working since the introduction of softlockup
> detection. After disabling CONFIG_DETECT_SOFTLOCKUP, powerdown works fine.

Could you send the output which CONFIG_DETECT_SOFTLOCKUP generates?

I had one CONFIG_DETECT_SOFTLOCKUP failure with suspend, on SMP. The
machine was stuck somewhere under mce_work_fn(). Perhaps in the
smp_call_function(). It only happened the once.
Re: 2.6.11-rc4-mm1: something is wrong with swsusp powerdown
Pavel Machek wrote:
> Hi!
>
> In `subj` kernel, machine no longer powers down at the end of
> swsusp. 2.6.11-rc5-pavel works ok, as does 2.6.11-bk.

For me, power down stopped working since the introduction of softlockup
detection. After disabling CONFIG_DETECT_SOFTLOCKUP, powerdown works fine.

--
Jindrich Makovicka
Re: 2.6.11-rc4-mm1: something is wrong with swsusp powerdown
Hi!

> > > Relocating pagedir |
> > > Reading image data (8157 pages): 100% 8157 done.
> > > Stopping tasks: |
> > > Freeing memory... done (0 pages freed)
> > > Freezing CPUs (at 1)...Sleeping in:
> > >  [] dump_stack+0x19/0x20
> > >  [] smp_pause+0x1f/0x54
> > >  [] smp_call_function_interrupt+0x3b/0x60
> > >  [] call_function_interrupt+0x1c/0x24
> > >  [] cpu_idle+0x55/0x64
> > >  [] start_secondary+0x71/0x78
> > >  [<>] 0x0
> > >  [] 0xcffa5fbc
> > > ok
> > > double fault, gdt at c1203260 [255 bytes]
> > > NMI Watchdog detected LOCKUP on CPU1, eip c0133c96, registers:
>
> Note the double fault.

Yes, I can see it, it scares me. SMP swsusp is not in good state
because I do not have easy access to SMP or HT hardware. I guess I'll
just have to get into suse at the night and steal some P4 ;-).

								Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!
Re: 2.6.11-rc4-mm1: something is wrong with swsusp powerdown
Pavel Machek <[EMAIL PROTECTED]> wrote:
>
> I can fix disk going yo-yo without switching pm_message_t to struct,
> but will have to back parts of that later. Do you want patch?

No thanks, I was just pointing it out. It sounds like you have it under
control.
Re: 2.6.11-rc4-mm1: something is wrong with swsusp powerdown
Pavel Machek <[EMAIL PROTECTED]> wrote:
>
> Hi!
>
> > Resume on SMP locks up.
>
> Does it work on UP kernel on same hardware?

yup.

> NMI watchdog is problem
> for suspend, it takes long to do various phases. Can you disable it
> for testing?

Will try to remember to do that.

> > Relocating pagedir |
> > Reading image data (8157 pages): 100% 8157 done.
> > Stopping tasks: |
> > Freeing memory... done (0 pages freed)
> > Freezing CPUs (at 1)...Sleeping in:
> >  [] dump_stack+0x19/0x20
> >  [] smp_pause+0x1f/0x54
> >  [] smp_call_function_interrupt+0x3b/0x60
> >  [] call_function_interrupt+0x1c/0x24
> >  [] cpu_idle+0x55/0x64
> >  [] start_secondary+0x71/0x78
> >  [<>] 0x0
> >  [] 0xcffa5fbc
> > ok
> > double fault, gdt at c1203260 [255 bytes]
> > NMI Watchdog detected LOCKUP on CPU1, eip c0133c96, registers:

Note the double fault.
Re: 2.6.11-rc4-mm1: something is wrong with swsusp powerdown
Pavel Machek <[EMAIL PROTECTED]> writes:

> > I threw it together to test a specific code path, and the fact it
> > fails in software suspend is actually almost confirmation that I am on
> > the right track. This actually fixed the case I was testing.
> >
> > In this case the failure is simply because system_state is
> > not set to SYSTEM_POWER_OFF before
> > kernel/power/disk.c:power_down() calls device_shutdown().
> > The appropriate reboot notifier is also not called..
>
> Can you suggest patch to do it right? Or perhaps there should be
> just_plain_power_machine_down() that does all necessary
> trickery?

I would call it kernel_power_down() and that is what I am suggesting is
the right fix. We have it open coded in kernel/sys.c:sys_reboot() in the
switch case for: LINUX_REBOOT_CMD_POWER_OFF

So after the code gets factored out from there, all of the cases that
call machine_power_off() and pm_power_off() directly need to be updated.

There are similar cases for machine_restart() and machine_halt(). But
the power off case seems to be the most acute.

My biggest problem with this is that I get into the recursive code
cleanup problem, where I fix one piece and a bug is exposed somewhere
else. And that then requires investigation and fixing.

Fixing the callers of machine_power_off() is about the fifth bug fix
down the chain triggered by disabling UP interrupts in
device_shutdown(); SMP interrupts have always been disabled. The first
bug fix was to create system devices in the device tree.

I haven't a clue where fixing this one will lead. Recursive code fixes
are a hard thing to schedule :(

Eric
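The factoring suggested above can be sketched in userspace roughly as follows. This is only an illustration under stated assumptions: every stub_* function is a stand-in rather than kernel code, kernel_power_down_sketch() is a hypothetical name borrowed from the suggestion in the message, and the ordering (reboot notifiers, then system_state, then device_shutdown(), then machine_power_off()) is assumed from the description of the LINUX_REBOOT_CMD_POWER_OFF case, not copied from kernel/sys.c.

```c
/* Userspace sketch only: nothing here is real kernel code. */
#include <assert.h>
#include <string.h>

enum system_states { SYSTEM_RUNNING, SYSTEM_POWER_OFF };

static enum system_states system_state = SYSTEM_RUNNING;
static char call_log[128];	/* records the order in which the stubs ran */

static void log_call(const char *name)
{
	assert(name != NULL);
	if (call_log[0] != '\0')
		strncat(call_log, ",", sizeof(call_log) - strlen(call_log) - 1);
	strncat(call_log, name, sizeof(call_log) - strlen(call_log) - 1);
}

/* Stand-in for calling the reboot notifier chain. */
static void stub_reboot_notifiers(void)
{
	log_call("notifiers");
}

/* Stand-in for device_shutdown(): code run from here may look at system_state. */
static void stub_device_shutdown(void)
{
	log_call(system_state == SYSTEM_POWER_OFF ?
		 "shutdown(power_off)" : "shutdown(running)");
}

/* Stand-in for machine_power_off(). */
static void stub_machine_power_off(void)
{
	log_call("machine_power_off");
}

/*
 * Hypothetical wrapper: one place that performs every step, so callers
 * such as power_down() no longer skip the notifier or the state change.
 */
void kernel_power_down_sketch(void)
{
	stub_reboot_notifiers();
	system_state = SYSTEM_POWER_OFF;
	stub_device_shutdown();
	stub_machine_power_off();
}

/* Accessor so a caller can inspect the recorded order. */
const char *sketch_call_log(void)
{
	return call_log;
}
```

Calling kernel_power_down_sketch() once and then reading sketch_call_log() shows the shutdown stub observing SYSTEM_POWER_OFF, which is the condition the message says power_down() currently fails to establish.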
Re: 2.6.11-rc4-mm1: something is wrong with swsusp powerdown
Hi!

> btw, suspend is a bit messy. The disk spins down. Then up. Then down
> again. And:

Here's a preview patch to make the disk not do the stupid yo-yo. Please
do not apply (it will probably not apply cleanly anyway).

I can fix the disk going yo-yo without switching pm_message_t to struct,
but will have to back parts of that out later. Do you want a patch?

								Pavel

--- clean/drivers/base/power/resume.c 2004-12-25 13:34:59.0 +0100
+++ linux/drivers/base/power/resume.c 2005-02-28 15:38:51.0 +0100
@@ -41,7 +41,7 @@
 		list_add_tail(entry, _active);
 		up(_list_sem);
-		if (!dev->power.prev_state)
+		if (!dev->power.prev_state EVENT)
 			resume_device(dev);
 		down(_list_sem);
 		put_device(dev);
--- clean/drivers/base/power/runtime.c 2005-01-12 11:07:39.0 +0100
+++ linux/drivers/base/power/runtime.c 2005-02-28 15:42:10.0 +0100
@@ -13,10 +13,10 @@
 static void runtime_resume(struct device * dev)
 {
 	dev_dbg(dev, "resuming\n");
-	if (!dev->power.power_state)
+	if (!dev->power.power_state EVENT)
 		return;
 	if (!resume_device(dev))
-		dev->power.power_state = 0;
+		dev->power.power_state = PMSG_ON;
 }
@@ -49,10 +49,10 @@
 	int error = 0;
 	down(_sem);
-	if (dev->power.power_state == state)
+	if (dev->power.power_state EVENT == state EVENT)
 		goto Done;
-	if (dev->power.power_state)
+	if (dev->power.power_state EVENT)
 		runtime_resume(dev);
 	if (!(error = suspend_device(dev, state)))
--- clean/drivers/base/power/shutdown.c 2004-08-15 19:14:55.0 +0200
+++ linux/drivers/base/power/shutdown.c 2005-01-12 10:57:23.0 +0100
@@ -29,7 +29,8 @@
 		dev->driver->shutdown(dev);
 		return 0;
 	}
-	return dpm_runtime_suspend(dev, dev->detach_state);
+	/* FIXME */
+	return dpm_runtime_suspend(dev, PMSG_FREEZE);
 }
--- clean/drivers/base/power/suspend.c 2005-01-12 11:07:39.0 +0100
+++ linux/drivers/base/power/suspend.c 2005-02-28 21:30:13.0 +0100
@@ -43,7 +43,7 @@
 	dev->power.prev_state = dev->power.power_state;
-	if (dev->bus && dev->bus->suspend && !dev->power.power_state)
+	if (dev->bus && dev->bus->suspend && (!dev->power.power_state EVENT))
 		error = dev->bus->suspend(dev, state);
 	return error;
@@ -134,6 +134,8 @@
  Done:
 	return error;
  Error:
+	printk(KERN_ERR "Could not power down device %s: "
+		"error %d\n", kobject_name(>kobj), error);
 	dpm_power_up();
 	goto Done;
 }
--- clean/drivers/base/power/sysfs.c 2004-08-15 19:14:55.0 +0200
+++ linux/drivers/base/power/sysfs.c 2005-02-28 15:43:57.0 +0100
@@ -26,19 +26,20 @@
 static ssize_t state_show(struct device * dev, char * buf)
 {
-	return sprintf(buf, "%u\n", dev->power.power_state);
+	return sprintf(buf, "%u\n", dev->power.power_state EVENT);
 }
 static ssize_t state_store(struct device * dev, const char * buf, size_t n)
 {
-	u32 state;
+	pm_message_t state;
 	char * rest;
 	int error = 0;
-	state = simple_strtoul(buf, , 10);
+	state EVENT = simple_strtoul(buf, , 10);
+//	state.flags = PFL_RUNTIME;
 	if (*rest)
 		return -EINVAL;
-	if (state)
+	if (state EVENT)
 		error = dpm_runtime_suspend(dev, state);
 	else
 		dpm_runtime_resume(dev);
--- clean/drivers/ide/ide-disk.c 2005-02-14 14:12:21.0 +0100
+++ linux/drivers/ide/ide-disk.c 2005-02-14 22:34:43.0 +0100
@@ -872,7 +872,7 @@
 {
 	switch (rq->pm->pm_step) {
 	case idedisk_pm_flush_cache:	/* Suspend step 1 (flush cache) complete */
-		if (rq->pm->pm_state == 4)
+		if (rq->pm->pm_state == EVENT_FREEZE)
 			rq->pm->pm_step = ide_pm_state_completed;
 		else
 			rq->pm->pm_step = idedisk_pm_standby;
@@ -1155,8 +1155,7 @@
 		return;
 	}
-	printk("Shutdown: %s\n", drive->name);
-	dev->bus->suspend(dev, PM_SUSPEND_STANDBY);
+	dev->bus->suspend(dev, PMSG_SUSPEND);
 }
 /*
--- clean/drivers/ide/ide.c 2005-02-28 00:50:42.0 +0100
+++ linux/drivers/ide/ide.c 2005-02-28 15:48:21.0 +0100
@@ -1398,7 +1398,7 @@
 	rq.special = rq.pm = rqpm.pm_step = ide_pm_state_start_suspend;
-	rqpm.pm_state = state;
+	rqpm.pm_state = state EVENT;
 	return ide_do_drive_cmd(drive, , ide_wait);
 }
@@ -1417,7 +1417,7 @@
 	rq.special = rq.pm = rqpm.pm_step = ide_pm_state_start_resume;
-	rqpm.pm_state = 0;
+	rqpm.pm_state = EVENT_ON;
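To illustrate where the EVENT placeholder in the preview patch appears to be heading, here is a minimal userspace sketch, assuming pm_message_t ends up as a struct with a single .event member. The numeric event values, the PMSG_* macro definitions, struct sketch_device and both sketch_* functions are hypothetical and exist only for this example; they are not taken from the kernel tree.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event codes: the names echo the patch, the values are made up. */
enum { EVENT_ON = 0, EVENT_FREEZE = 1, EVENT_SUSPEND = 2 };

/* Assumed shape of pm_message_t once it becomes a struct. */
typedef struct pm_message {
	int event;
} pm_message_t;

#define PMSG_ON		((pm_message_t){ .event = EVENT_ON })
#define PMSG_FREEZE	((pm_message_t){ .event = EVENT_FREEZE })
#define PMSG_SUSPEND	((pm_message_t){ .event = EVENT_SUSPEND })

/* Toy device: just enough state to exercise the comparison. */
struct sketch_device {
	pm_message_t power_state;
	int suspend_calls;
};

/*
 * Same shape as the runtime-suspend check in the patch. Structs cannot be
 * compared with ==, so the .event members are compared instead.
 * Returns 1 if the device was moved to a new state, 0 if nothing changed.
 */
int sketch_runtime_suspend(struct sketch_device *dev, pm_message_t state)
{
	assert(dev != NULL);
	if (dev->power_state.event == state.event)
		return 0;
	dev->power_state = state;
	dev->suspend_calls++;
	return 1;
}

/*
 * Same shape as the ide-disk hunk: after the cache flush, a freeze is
 * complete, while any other event goes on to put the disk in standby.
 */
int sketch_needs_standby(pm_message_t state)
{
	return state.event != EVENT_FREEZE;
}
```

With a named event the ide-disk step can tell a freeze (snapshot about to be written, so the disk is still needed) from a real suspend, which is the distinction the bare number 4 was standing in for in the old code.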
Re: 2.6.11-rc4-mm1: something is wrong with swsusp powerdown
Hi!

> btw, suspend is a bit messy. The disk spins down. Then up. Then down
> again. And:

Yes, this is going to be properly solved by switching pm_message_t to
struct (preview patch attached, EVENT will become .event, this is just
for me). I could do some hack to make disk not go up-down-up (and will
need to do it for suse9.3, anyway), but I do not think that would
belong to mainline.

> Powering off system
> Debug: sleeping function called from invalid context at include/linux/rwsem.h:66
> in_atomic():0, irqs_disabled():1
>
>  [] dump_stack+0x19/0x20
>  [] __might_sleep+0x91/0x9c
>  [] device_shutdown+0x16/0x82
>  [] power_down+0x47/0x74
>  [] pm_suspend_disk+0x5a/0x74
>  [] enter_state+0x2e/0x70
>  [] software_suspend+0xa/0x10
>  [] acpi_system_write_sleep+0x73/0x98
>  [] vfs_write+0xaf/0x118
>  [] sys_write+0x3c/0x68
>  [] sysenter_past_esp+0x52/0x75
> Synchronizing SCSI cache for disk sda:
> Shutdown: hda
> acpi_power_off called

Hmm, device_shutdown is confused. Should it be called with interrupts
enabled or disabled? It uses rwsem, that suggests interrupts enabled,
but I do not think sysdev_shutdown with enabled interrupts is good idea
(and comment suggests it should be called with interrupts disabled).

								Pavel

/**
 * We handle system devices differently - we suspend and shut them
 * down last and resume them first. That way, we don't do anything stupid like
 * shutting down the interrupt controller before any devices..
 *
 * Note that there are not different stages for power management calls -
 * they only get one called once when interrupts are disabled.
 */
extern int sysdev_shutdown(void);

/**
 * device_shutdown - call ->shutdown() on each device to shutdown.
 */
void device_shutdown(void)
{
	struct device * dev;

	down_write(_subsys.rwsem);
	list_for_each_entry_reverse(dev, _subsys.kset.list, kobj.entry) {
		pr_debug("shutting down %s: ", dev->bus_id);
		if (dev->driver && dev->driver->shutdown) {
			pr_debug("Ok\n");
			dev->driver->shutdown(dev);
		} else
			pr_debug("Ignored.\n");
	}
	up_write(_subsys.rwsem);

	sysdev_shutdown();
}

--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!
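The conflict described above (a sleeping lock taken while interrupts are off, followed by sysdev_shutdown(), which wants them off) can be modelled in a few lines of userspace C. Everything here is a stand-in: the sketch_* names are hypothetical, the interrupt flag is a plain variable, and the "split" ordering is only one conceivable arrangement, not something the thread settles on.

```c
#include <assert.h>

static int sketch_irqs_disabled;	/* stand-in for irqs_disabled() */
static int sketch_warnings;		/* counts "sleeping function called from invalid context" */

/* Stand-in for might_sleep(): complain when called with interrupts off. */
static void sketch_might_sleep(void)
{
	if (sketch_irqs_disabled)
		sketch_warnings++;
}

/* Stand-in for the device list walk, which takes a sleeping rwsem. */
static void sketch_walk_devices(void)
{
	sketch_might_sleep();	/* down_write() may sleep */
	/* ... each device's ->shutdown() would be called here ... */
}

/* Stand-in for sysdev_shutdown(): expects interrupts to be off already. */
static void sketch_sysdev_shutdown(void)
{
	assert(sketch_irqs_disabled);
}

/* Path seen in the trace: interrupts are already off on entry. */
int sketch_shutdown_irqs_off(void)
{
	sketch_warnings = 0;
	sketch_irqs_disabled = 1;
	sketch_walk_devices();
	sketch_sysdev_shutdown();
	return sketch_warnings;
}

/* One conceivable ordering: walk devices first, then turn interrupts off. */
int sketch_shutdown_split(void)
{
	sketch_warnings = 0;
	sketch_irqs_disabled = 0;
	sketch_walk_devices();
	sketch_irqs_disabled = 1;
	sketch_sysdev_shutdown();
	return sketch_warnings;
}
```

In this model sketch_shutdown_irqs_off() records one warning, matching the single debug message in the log, while sketch_shutdown_split() records none. Whether the real device_shutdown() can be split this way depends on its callers, which is the open question in the message.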
Re: 2.6.11-rc4-mm1: something is wrong with swsusp powerdown
Hi!

> > > In `subj` kernel, machine no longer powers down at the end of
> > > swsusp. 2.6.11-rc5-pavel works ok, as does 2.6.11-bk.
> >
> > Binary searching indicates that this is due to
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.11-rc5/2.6.11-rc5-mm1/broken-out/acpi_power_off-bug-fix.patch.
> >
> > I'll drop it. That patch is pretty ugly-looking anyway (ACPI code in
> > drivers/base/power/?).
> >
> > Perhaps someone who is hitting the problem which that patch addresses could
> > raise a bugzilla entry.
> >
> > Oh. It has one. http://bugme.osdl.org/show_bug.cgi?id=4041
> >
> > Anyway. It needs more work.
>
> Agreed.
>
> I threw it together to test a specific code path, and the fact it
> fails in software suspend is actually almost confirmation that I am on
> the right track. This actually fixed the case I was testing.
>
> In this case the failure is simply because system_state is
> not set to SYSTEM_POWER_OFF before
> kernel/power/disk.c:power_down() calls device_shutdown().
> The appropriate reboot notifier is also not called..

Can you suggest patch to do it right? Or perhaps there should be
just_plain_power_machine_down() that does all necessary
trickery?

								Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!
Re: 2.6.11-rc4-mm1: something is wrong with swsusp powerdown
Hi!

> > Yes, the patch is very ugly. If something like this needs to be done,
> > then perhaps acpi should properly register into driver model and do
> > the work there. This will also mean code will be called consistently.
>
> I totally agree. Do you have an example of how a non-device
> can do this?
>
> In particular something that gets as close to shutting down
> the system devices as possible. But gets called before that.
>
> Or perhaps acpi should simply be setup to be the first system device?

I believe that's the preferred solution.

								Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!
Re: 2.6.11-rc4-mm1: something is wrong with swsusp powerdown
Pavel Machek <[EMAIL PROTECTED]> writes:

> Yes, the patch is very ugly. If something like this needs to be done,
> then perhaps acpi should properly register into driver model and do
> the work there. This will also mean code will be called consistently.

I totally agree. Do you have an example of how a non-device
can do this?

In particular something that gets as close to shutting down
the system devices as possible. But gets called before that.

Or perhaps acpi should simply be setup to be the first system device?

Eric
Re: 2.6.11-rc4-mm1: something is wrong with swsusp powerdown
Andrew Morton <[EMAIL PROTECTED]> writes:

> Pavel Machek <[EMAIL PROTECTED]> wrote:
> >
> > In `subj` kernel, machine no longer powers down at the end of
> > swsusp. 2.6.11-rc5-pavel works ok, as does 2.6.11-bk.
>
> Binary searching indicates that this is due to
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.11-rc5/2.6.11-rc5-mm1/broken-out/acpi_power_off-bug-fix.patch.
>
> I'll drop it. That patch is pretty ugly-looking anyway (ACPI code in
> drivers/base/power/?).
>
> Perhaps someone who is hitting the problem which that patch addresses could
> raise a bugzilla entry.
>
> Oh. It has one. http://bugme.osdl.org/show_bug.cgi?id=4041
>
> Anyway. It needs more work.

Agreed.

I threw it together to test a specific code path, and the fact it
fails in software suspend is actually almost confirmation that I am on
the right track. This actually fixed the case I was testing.

In this case the failure is simply because system_state is
not set to SYSTEM_POWER_OFF before
kernel/power/disk.c:power_down() calls device_shutdown().
The appropriate reboot notifier is also not called.

So to fix this properly, all of the places that call machine_power_off()
now need to call a wrapper that does all of the appropriate things and
then calls machine_power_off(). Likewise with the other reboot
functions.

In addition, we need a clean way to get device_shutdown() to call
acpi_power_off_prepare() at roughly the location where I have it hard
coded.

The fundamental issue this patch was starting to address, before I ran
out of steam, is that acpi_power_off_prepare() must be called with
interrupts enabled, and after we have shut down the system devices
(i.e. the interrupt controllers) we can't guarantee interrupts are
working. I don't know how much earlier it is safe to call
acpi_power_off_prepare(). But mostly I think we need to throw in a fake
device to attach acpi_power_off_prepare() to.

Eric
Re: 2.6.11-rc4-mm1: something is wrong with swsusp powerdown
Hi!

> Resume on SMP locks up.

Does it work on UP kernel on same hardware? NMI watchdog is problem
for suspend, it takes long to do various phases. Can you disable it
for testing?

								Pavel

> Relocating pagedir |
> Reading image data (8157 pages): 100% 8157 done.
> Stopping tasks: |
> Freeing memory... done (0 pages freed)
> Freezing CPUs (at 1)...Sleeping in:
>  [] dump_stack+0x19/0x20
>  [] smp_pause+0x1f/0x54
>  [] smp_call_function_interrupt+0x3b/0x60
>  [] call_function_interrupt+0x1c/0x24
>  [] cpu_idle+0x55/0x64
>  [] start_secondary+0x71/0x78
>  [<>] 0x0
>  [] 0xcffa5fbc
> ok
> double fault, gdt at c1203260 [255 bytes]
> NMI Watchdog detected LOCKUP on CPU1, eip c0133c96, registers:
> Modules linked in: video thermal processor pcc_acpi fan button battery ac
> CPU:1
> EIP:0060:[]Not tainted VLI
> EFLAGS: 0002 (2.6.11-rc5)
> EIP is at smp_pause+0x36/0x54
> eax: 0001 ebx: cffa5f20 ecx: fffbe4e6 edx: cffa5f20
> esi: cffa4000 edi: 0080 ebp: cffa5f58 esp: cffa5f1c
> ds: 007b es: 007b ss: 0068
> Process swapper (pid: 0, threadinfo=cffa4000 task=c18ac540)
> Stack: 007b 0068 80050033 005d3000 06f0 00ff0001
>        c120b260 07ff5f4c c0577000 0088 cffa0080 c011eed4 cffa5f68 cffa5f68
>        c010ee27 0001 cffa5fa4 c01037d4 0001 c120b260 fffbe4e5
> Call Trace:
>  [] show_stack+0x7b/0x88
>  [] show_registers+0x112/0x188
>  [] die_nmi+0x41/0x74
>  [] nmi_watchdog_tick+0x54/0xcc
>  [] default_do_nmi+0x73/0xfc
>  [] do_nmi+0x39/0x4c
>  [] nmi_stack_correct+0x1d/0x2a
>  [] smp_call_function_interrupt+0x3b/0x60
>  [] call_function_interrupt+0x1c/0x24
>  [] cpu_idle+0x55/0x64
>  [] start_secondary+0x71/0x78
>  [<>] 0x0
>  [] 0xcffa5fbc
> Code: e8 60 e0 24 00 68 0c 7a 40 c0 e8 c2 68 fe ff e8 85 ff fc ff 83 c4 08 f0
> ff 05 4c 20 5e c0 a1 50 20 5e c0 89 da 85 c0 74 0b f3
> console shuts up ...

--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!
Re: 2.6.11-rc4-mm1: something is wrong with swsusp powerdown
Hi!

> > In `subj` kernel, machine no longer powers down at the end of
> > swsusp. 2.6.11-rc5-pavel works ok, as does 2.6.11-bk.
>
> Binary searching indicates that this is due to
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.11-rc5/2.6.11-rc5-mm1/broken-out/acpi_power_off-bug-fix.patch.
>
> I'll drop it. That patch is pretty ugly-looking anyway (ACPI code in
> drivers/base/power/?).
>
> Perhaps someone who is hitting the problem which that patch addresses could
> raise a bugzilla entry.
>
> Oh. It has one. http://bugme.osdl.org/show_bug.cgi?id=4041
>
> Anyway. It needs more work.

Yes, the patch is very ugly. If something like this needs to be done,
then perhaps acpi should properly register into driver model and do
the work there. This will also mean code will be called consistently.

								Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!
Re: 2.6.11-rc4-mm1: something is wrong with swsusp powerdown
Hi!

> btw, suspend is a bit messy. The disk spins down. Then up. Then down
> again. And:

Yes, that's known, pm_message_t needs to become struct to solve disk
pingpong properly.

> Debug: sleeping function called from invalid context at mm/slab.c:2082
> in_atomic():0, irqs_disabled():1
>  [] dump_stack+0x19/0x20
>  [] __might_sleep+0x91/0x9c
>  [] kmem_cache_alloc+0x23/0x84
>  [] acpi_evaluate_integer+0x3c/0xac
>  [] acpi_bus_get_status+0x39/0x94
>  [] acpi_pci_link_set+0x16d/0x1e8
>  [] acpi_pci_link_resume+0x1d/0x28
>  [] irqrouter_resume+0x1a/0x38
>  [] sysdev_resume+0x2c/0xae
>  [] device_power_up+0x8/0x11
>  [] swsusp_suspend+0x4b/0x58
>  [] pm_suspend_disk+0x35/0x74
>  [] enter_state+0x2e/0x70
>  [] software_suspend+0xa/0x10
>  [] acpi_system_write_sleep+0x73/0x98
>  [] vfs_write+0xaf/0x118
>  [] sys_write+0x3c/0x68
>  [] sysenter_past_esp+0x52/0x75

ACPI problem, patches are available (s/GFP_KERNEL/GFP_ATOMIC), but Len
claims better solution is ready... OTOH he claims that for half a year
already so we may push him a bit (added to cc).

> Powering off system
> Debug: sleeping function called from invalid context at include/linux/rwsem.h:66
> in_atomic():0, irqs_disabled():1
>
>  [] dump_stack+0x19/0x20
>  [] __might_sleep+0x91/0x9c
>  [] device_shutdown+0x16/0x82
>  [] power_down+0x47/0x74
>  [] pm_suspend_disk+0x5a/0x74
>  [] enter_state+0x2e/0x70
>  [] software_suspend+0xa/0x10
>  [] acpi_system_write_sleep+0x73/0x98
>  [] vfs_write+0xaf/0x118
>  [] sys_write+0x3c/0x68
>  [] sysenter_past_esp+0x52/0x75

I'll look at this one.

								Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!
Re: 2.6.11-rc4-mm1: something is wrong with swsusp powerdown
Resume on SMP locks up.

Relocating pagedir |
Reading image data (8157 pages): 100% 8157 done.
Stopping tasks: |
Freeing memory... done (0 pages freed)
Freezing CPUs (at 1)...Sleeping in:
 [] dump_stack+0x19/0x20
 [] smp_pause+0x1f/0x54
 [] smp_call_function_interrupt+0x3b/0x60
 [] call_function_interrupt+0x1c/0x24
 [] cpu_idle+0x55/0x64
 [] start_secondary+0x71/0x78
 [<>] 0x0
 [] 0xcffa5fbc
ok
double fault, gdt at c1203260 [255 bytes]
NMI Watchdog detected LOCKUP on CPU1, eip c0133c96, registers:
Modules linked in: video thermal processor pcc_acpi fan button battery ac
CPU:1
EIP:0060:[]Not tainted VLI
EFLAGS: 0002 (2.6.11-rc5)
EIP is at smp_pause+0x36/0x54
eax: 0001 ebx: cffa5f20 ecx: fffbe4e6 edx: cffa5f20
esi: cffa4000 edi: 0080 ebp: cffa5f58 esp: cffa5f1c
ds: 007b es: 007b ss: 0068
Process swapper (pid: 0, threadinfo=cffa4000 task=c18ac540)
Stack: 007b 0068 80050033 005d3000 06f0 00ff0001
       c120b260 07ff5f4c c0577000 0088 cffa0080 c011eed4 cffa5f68 cffa5f68
       c010ee27 0001 cffa5fa4 c01037d4 0001 c120b260 fffbe4e5
Call Trace:
 [] show_stack+0x7b/0x88
 [] show_registers+0x112/0x188
 [] die_nmi+0x41/0x74
 [] nmi_watchdog_tick+0x54/0xcc
 [] default_do_nmi+0x73/0xfc
 [] do_nmi+0x39/0x4c
 [] nmi_stack_correct+0x1d/0x2a
 [] smp_call_function_interrupt+0x3b/0x60
 [] call_function_interrupt+0x1c/0x24
 [] cpu_idle+0x55/0x64
 [] start_secondary+0x71/0x78
 [<>] 0x0
 [] 0xcffa5fbc
Code: e8 60 e0 24 00 68 0c 7a 40 c0 e8 c2 68 fe ff e8 85 ff fc ff 83 c4 08 f0
ff 05 4c 20 5e c0 a1 50 20 5e c0 89 da 85 c0 74 0b f3
console shuts up ...
Re: 2.6.11-rc4-mm1: something is wrong with swsusp powerdown
btw, suspend is a bit messy. The disk spins down. Then up. Then down
again. And:

Stopping tasks: ==|
Freeing memory... done (7069 pages freed)
swsusp: Need to copy 7847 pages
swsusp: critical section/: done (7879 pages copied)
swsusp: Restoring Highmem
Debug: sleeping function called from invalid context at mm/slab.c:2082
in_atomic():0, irqs_disabled():1
 [] dump_stack+0x19/0x20
 [] __might_sleep+0x91/0x9c
 [] kmem_cache_alloc+0x23/0x84
 [] acpi_evaluate_integer+0x3c/0xac
 [] acpi_bus_get_status+0x39/0x94
 [] acpi_pci_link_set+0x16d/0x1e8
 [] acpi_pci_link_resume+0x1d/0x28
 [] irqrouter_resume+0x1a/0x38
 [] sysdev_resume+0x2c/0xae
 [] device_power_up+0x8/0x11
 [] swsusp_suspend+0x4b/0x58
 [] pm_suspend_disk+0x35/0x74
 [] enter_state+0x2e/0x70
 [] software_suspend+0xa/0x10
 [] acpi_system_write_sleep+0x73/0x98
 [] vfs_write+0xaf/0x118
 [] sys_write+0x3c/0x68
 [] sysenter_past_esp+0x52/0x75
PCI: Setting latency timer of device :00:1f.2 to 64
ACPI: PCI interrupt :00:1f.5[B] -> GSI 9 (level, low) -> IRQ 9
PCI: Setting latency timer of device :00:1f.5 to 64
ACPI: PCI interrupt :01:00.0[A] -> GSI 11 (level, low) -> IRQ 11
ehci_hcd :02:01.2: USB 2.0 restarted, EHCI 0.95, driver 10 Dec 2004
ACPI: PCI interrupt :02:0c.0[A] -> GSI 9 (level, low) -> IRQ 9
e100: eth0: e100_watchdog: link up, 100Mbps, full-duplex
Writing data to swap (7879 pages)... done
Writing pagedir (31 pages)
S|
Powering off system
Debug: sleeping function called from invalid context at include/linux/rwsem.h:66
in_atomic():0, irqs_disabled():1
 [] dump_stack+0x19/0x20
 [] __might_sleep+0x91/0x9c
 [] device_shutdown+0x16/0x82
 [] power_down+0x47/0x74
 [] pm_suspend_disk+0x5a/0x74
 [] enter_state+0x2e/0x70
 [] software_suspend+0xa/0x10
 [] acpi_system_write_sleep+0x73/0x98
 [] vfs_write+0xaf/0x118
 [] sys_write+0x3c/0x68
 [] sysenter_past_esp+0x52/0x75
Synchronizing SCSI cache for disk sda:
Shutdown: hda
acpi_power_off called
Re: 2.6.11-rc4-mm1: something is wrong with swsusp powerdown
Pavel Machek <[EMAIL PROTECTED]> wrote:
>
> In `subj` kernel, machine no longer powers down at the end of
> swsusp. 2.6.11-rc5-pavel works ok, as does 2.6.11-bk.

Binary searching indicates that this is due to
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.11-rc5/2.6.11-rc5-mm1/broken-out/acpi_power_off-bug-fix.patch

I'll drop it. That patch is pretty ugly-looking anyway (ACPI code in drivers/base/power/?).

Perhaps someone who is hitting the problem which that patch addresses could raise a bugzilla entry.

Oh. It has one. http://bugme.osdl.org/show_bug.cgi?id=4041

Anyway. It needs more work.
Re: 2.6.11-rc4-mm1: something is wrong with swsusp powerdown
Hi!

> btw, suspend is a bit messy. The disk spins down. Then up. Then down again. And:

Yes, that's known; pm_message_t needs to become a struct to solve the disk ping-pong properly.

> Debug: sleeping function called from invalid context at mm/slab.c:2082
> in_atomic():0, irqs_disabled():1
>  [c010318d] dump_stack+0x19/0x20
>  [c0111731] __might_sleep+0x91/0x9c
>  [c01365df] kmem_cache_alloc+0x23/0x84
>  [c0232d50] acpi_evaluate_integer+0x3c/0xac
>  [c024b3d9] acpi_bus_get_status+0x39/0x94
>  [c024ca99] acpi_pci_link_set+0x16d/0x1e8
>  [c024ce65] acpi_pci_link_resume+0x1d/0x28
>  [c024ce8a] irqrouter_resume+0x1a/0x38
>  [c0281e3c] sysdev_resume+0x2c/0xae
>  [c0285ea8] device_power_up+0x8/0x11
>  [c012a873] swsusp_suspend+0x4b/0x58
>  [c012ac35] pm_suspend_disk+0x35/0x74
>  [c01292ea] enter_state+0x2e/0x70
>  [c0129336] software_suspend+0xa/0x10
>  [c024a8a7] acpi_system_write_sleep+0x73/0x98
>  [c0149f1b] vfs_write+0xaf/0x118
>  [c014a028] sys_write+0x3c/0x68
>  [c0102c05] sysenter_past_esp+0x52/0x75

ACPI problem. Patches are available (s/GFP_KERNEL/GFP_ATOMIC), but Len claims a better solution is ready... OTOH he has been claiming that for half a year already, so we may push him a bit (added to cc).

> Powering off system
> Debug: sleeping function called from invalid context at include/linux/rwsem.h:66
> in_atomic():0, irqs_disabled():1
>  [c010318d] dump_stack+0x19/0x20
>  [c0111731] __might_sleep+0x91/0x9c
>  [c0285872] device_shutdown+0x16/0x82
>  [c012aa97] power_down+0x47/0x74
>  [c012ac5a] pm_suspend_disk+0x5a/0x74
>  [c01292ea] enter_state+0x2e/0x70
>  [c0129336] software_suspend+0xa/0x10
>  [c024a8a7] acpi_system_write_sleep+0x73/0x98
>  [c0149f1b] vfs_write+0xaf/0x118
>  [c014a028] sys_write+0x3c/0x68
>  [c0102c05] sysenter_past_esp+0x52/0x75

I'll look at this one.

								Pavel
-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!
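For readers following the two "sleeping function called from invalid context" reports, here is a minimal userspace sketch of the condition being checked. It is a simplified model, not the kernel's `__might_sleep()`: the names `model_gfp`, `model_may_sleep` and `model_would_warn` are hypothetical, and the only rule modeled is that an allocation which is allowed to sleep warns when `in_atomic()` or `irqs_disabled()` is true.

```c
#include <assert.h>
#include <stdbool.h>

/* Allocation modes, named after the kernel flags they stand in for.
 * These are model values, not the real GFP_* bit masks. */
enum model_gfp { MODEL_GFP_KERNEL, MODEL_GFP_ATOMIC };

/* Assumption of this model: only a GFP_KERNEL-style allocation may sleep. */
bool model_may_sleep(enum model_gfp mode)
{
	return mode == MODEL_GFP_KERNEL;
}

/* True when the call would produce the "sleeping function called from
 * invalid context" debug message in this simplified model. */
bool model_would_warn(bool in_atomic, bool irqs_disabled, enum model_gfp mode)
{
	return model_may_sleep(mode) && (in_atomic || irqs_disabled);
}
```

With the values from the log (`in_atomic():0, irqs_disabled():1`), `model_would_warn(false, true, MODEL_GFP_KERNEL)` is true while the same call with `MODEL_GFP_ATOMIC` is false, which is why the s/GFP_KERNEL/GFP_ATOMIC patches silence the first report without changing the calling context.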
Re: 2.6.11-rc4-mm1: something is wrong with swsusp powerdown
Hi!

> > In `subj` kernel, machine no longer powers down at the end of
> > swsusp. 2.6.11-rc5-pavel works ok, as does 2.6.11-bk.
>
> Binary searching indicates that this is due to
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.11-rc5/2.6.11-rc5-mm1/broken-out/acpi_power_off-bug-fix.patch
>
> I'll drop it. That patch is pretty ugly-looking anyway (ACPI code in
> drivers/base/power/?).
>
> Perhaps someone who is hitting the problem which that patch addresses could
> raise a bugzilla entry.
>
> Oh. It has one. http://bugme.osdl.org/show_bug.cgi?id=4041
>
> Anyway. It needs more work.

Yes, the patch is very ugly. If something like this needs to be done, then perhaps acpi should properly register into the driver model and do the work there. This will also mean the code will be called consistently.

								Pavel
-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!
Re: 2.6.11-rc4-mm1: something is wrong with swsusp powerdown
Hi!

> Resume on SMP locks up.

Does it work on UP kernel on same hardware?

NMI watchdog is a problem for suspend, it takes long to do various phases. Can you disable it for testing?

								Pavel

> Relocating pagedir |
> Reading image data (8157 pages): 100% 8157 done.
> Stopping tasks: |
> Freeing memory... done (0 pages freed)
> Freezing CPUs (at 1)...Sleeping in:
>  [c0103c1d] dump_stack+0x19/0x20
>  [c0133c7f] smp_pause+0x1f/0x54
>  [c010ee27] smp_call_function_interrupt+0x3b/0x60
>  [c01037d4] call_function_interrupt+0x1c/0x24
>  [c010] cpu_idle+0x55/0x64
>  [c05929ed] start_secondary+0x71/0x78
>  [] 0x0
>  [cffa5fbc] 0xcffa5fbc
> ok
> double fault, gdt at c1203260 [255 bytes]
> NMI Watchdog detected LOCKUP on CPU1, eip c0133c96, registers:
> Modules linked in: video thermal processor pcc_acpi fan button battery ac
> CPU:    1
> EIP:    0060:[c0133c96]    Not tainted VLI
> EFLAGS: 0002   (2.6.11-rc5)
> EIP is at smp_pause+0x36/0x54
> eax: 0001   ebx: cffa5f20   ecx: fffbe4e6   edx: cffa5f20
> esi: cffa4000   edi: 0080   ebp: cffa5f58   esp: cffa5f1c
> ds: 007b   es: 007b   ss: 0068
> Process swapper (pid: 0, threadinfo=cffa4000 task=c18ac540)
> Stack: 007b 0068 80050033 005d3000 06f0 00ff0001 c120b260 07ff5f4c c0577000 0088 cffa0080 c011eed4 cffa5f68 cffa5f68 c010ee27 0001 cffa5fa4 c01037d4 0001 c120b260 fffbe4e5
> Call Trace:
>  [c0103bf7] show_stack+0x7b/0x88
>  [c0103d36] show_registers+0x112/0x188
>  [c01046f1] die_nmi+0x41/0x74
>  [c010fcb4] nmi_watchdog_tick+0x54/0xcc
>  [c0104797] default_do_nmi+0x73/0xfc
>  [c0104865] do_nmi+0x39/0x4c
>  [c010395c] nmi_stack_correct+0x1d/0x2a
>  [c010ee27] smp_call_function_interrupt+0x3b/0x60
>  [c01037d4] call_function_interrupt+0x1c/0x24
>  [c010] cpu_idle+0x55/0x64
>  [c05929ed] start_secondary+0x71/0x78
>  [] 0x0
>  [cffa5fbc] 0xcffa5fbc
> Code: e8 60 e0 24 00 68 0c 7a 40 c0 e8 c2 68 fe ff e8 85 ff fc ff 83 c4 08 f0 ff 05 4c 20 5e c0 a1 50 20 5e c0 89 da 85 c0 74 0b f3
> console shuts up ...

-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!
Re: 2.6.11-rc4-mm1: something is wrong with swsusp powerdown
Andrew Morton <[EMAIL PROTECTED]> writes:

> Pavel Machek <[EMAIL PROTECTED]> wrote:
> >
> > In `subj` kernel, machine no longer powers down at the end of
> > swsusp. 2.6.11-rc5-pavel works ok, as does 2.6.11-bk.
>
> Binary searching indicates that this is due to
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.11-rc5/2.6.11-rc5-mm1/broken-out/acpi_power_off-bug-fix.patch
>
> I'll drop it. That patch is pretty ugly-looking anyway (ACPI code in
> drivers/base/power/?).
>
> Perhaps someone who is hitting the problem which that patch addresses could
> raise a bugzilla entry.
>
> Oh. It has one. http://bugme.osdl.org/show_bug.cgi?id=4041
>
> Anyway. It needs more work.

Agreed.

I threw it together to test a specific code path, and the fact that it fails in software suspend is actually almost confirmation that I am on the right track. This actually fixed the case I was testing.

In this case the failure is simply because system_state is not set to SYSTEM_POWER_OFF before kernel/power/disk.c:power_down() calls device_shutdown(). The appropriate reboot notifier is also not called.

So to fix this properly, all of the places that call machine_power_off() now need to call a wrapper that does all of the appropriate things and then calls machine_power_off(). Likewise with the other reboot functions. In addition, we need a clean way to get device_shutdown() to call acpi_power_off_prepare() at roughly the location where I have it hard coded.

The fundamental issue this patch was starting to address before I ran out of steam is that acpi_power_off_prepare() must be called with interrupts enabled, and after we have shut down the system devices (i.e. the interrupt controllers) we can't guarantee interrupts are working.

I don't know how much earlier it is safe to call acpi_power_off_prepare(). But mostly I think we need to throw in a fake device to attach acpi_power_off_prepare() to.

Eric
Re: 2.6.11-rc4-mm1: something is wrong with swsusp powerdown
Pavel Machek <[EMAIL PROTECTED]> writes:

> Yes, the patch is very ugly. If something like this needs to be done,
> then perhaps acpi should properly register into driver model and do
> the work there. This will also mean code will be called consistently.

I totally agree.

Do you have an example of how a non-device can do this? In particular something that gets as close to shutting down the system devices as possible. But gets called before that.

Or perhaps acpi should simply be setup to be the first system device?

Eric
Re: 2.6.11-rc4-mm1: something is wrong with swsusp powerdown
Hi!

> > Yes, the patch is very ugly. If something like this needs to be done,
> > then perhaps acpi should properly register into driver model and do
> > the work there. This will also mean code will be called consistently.
>
> I totally agree.
>
> Do you have an example of how a non-device can do this?
> In particular something that gets as close to shutting down
> the system devices as possible. But gets called before that.
>
> Or perhaps acpi should simply be setup to be the first system device?

I believe that's the preferred solution.

								Pavel
-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!
Re: 2.6.11-rc4-mm1: something is wrong with swsusp powerdown
Hi!

> > > In `subj` kernel, machine no longer powers down at the end of
> > > swsusp. 2.6.11-rc5-pavel works ok, as does 2.6.11-bk.
> >
> > Binary searching indicates that this is due to
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.11-rc5/2.6.11-rc5-mm1/broken-out/acpi_power_off-bug-fix.patch
> >
> > I'll drop it. That patch is pretty ugly-looking anyway (ACPI code in
> > drivers/base/power/?).
> >
> > Perhaps someone who is hitting the problem which that patch addresses could
> > raise a bugzilla entry.
> >
> > Oh. It has one. http://bugme.osdl.org/show_bug.cgi?id=4041
> >
> > Anyway. It needs more work.
>
> Agreed.
>
> I threw it together to test a specific code path, and the fact it
> fails in software suspend is actually almost confirmation that I am on
> the right track. This actually fixed the case I was testing.
>
> In this case the failure is simply because system_state is not set to
> SYSTEM_POWER_OFF before kernel/power/disk.c:power_down() calls
> device_shutdown(). The appropriate reboot notifier is also not called.

Can you suggest a patch to do it right? Or perhaps there should be a just_plain_power_machine_down() that does all the necessary trickery?

								Pavel
-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!
Re: 2.6.11-rc4-mm1: something is wrong with swsusp powerdown
Hi!

> btw, suspend is a bit messy. The disk spins down. Then up. Then down again. And:

Yes, this is going to be properly solved by switching pm_message_t to struct (preview patch attached, EVENT will become .event, this is just for me).

I could do some hack to make the disk not go up-down-up (and will need to do it for suse9.3, anyway), but I do not think that would belong in mainline.

> Powering off system
> Debug: sleeping function called from invalid context at include/linux/rwsem.h:66
> in_atomic():0, irqs_disabled():1
>  [c010318d] dump_stack+0x19/0x20
>  [c0111731] __might_sleep+0x91/0x9c
>  [c0285872] device_shutdown+0x16/0x82
>  [c012aa97] power_down+0x47/0x74
>  [c012ac5a] pm_suspend_disk+0x5a/0x74
>  [c01292ea] enter_state+0x2e/0x70
>  [c0129336] software_suspend+0xa/0x10
>  [c024a8a7] acpi_system_write_sleep+0x73/0x98
>  [c0149f1b] vfs_write+0xaf/0x118
>  [c014a028] sys_write+0x3c/0x68
>  [c0102c05] sysenter_past_esp+0x52/0x75
> Synchronizing SCSI cache for disk sda:
> Shutdown: hda
> acpi_power_off called

Hmm, device_shutdown is confused. Should it be called with interrupts enabled or disabled? It uses an rwsem, which suggests interrupts enabled, but I do not think sysdev_shutdown with enabled interrupts is a good idea (and the comment suggests it should be called with interrupts disabled).

								Pavel

/**
 * We handle system devices differently - we suspend and shut them
 * down last and resume them first. That way, we don't do anything stupid like
 * shutting down the interrupt controller before any devices..
 *
 * Note that there are not different stages for power management calls -
 * they only get one called once when interrupts are disabled.
 */

extern int sysdev_shutdown(void);

/**
 * device_shutdown - call ->shutdown() on each device to shutdown.
 */
void device_shutdown(void)
{
	struct device * dev;

	down_write(&devices_subsys.rwsem);
	list_for_each_entry_reverse(dev, &devices_subsys.kset.list, kobj.entry) {
		pr_debug("shutting down %s: ", dev->bus_id);
		if (dev->driver && dev->driver->shutdown) {
			pr_debug("Ok\n");
			dev->driver->shutdown(dev);
		} else
			pr_debug("Ignored.\n");
	}
	up_write(&devices_subsys.rwsem);

	sysdev_shutdown();
}

-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!
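To make the ordering in the quoted device_shutdown() easier to see, the following is a small userspace sketch of the same traversal. It is only a model under stated assumptions: `struct model_device`, `model_shutdown_order` and the `"sysdev"` marker are hypothetical names, an array stands in for the kernel's device list, and no locking or interrupt state is modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for struct device: a name plus a flag saying
 * whether the bound driver provides a ->shutdown() hook. */
struct model_device {
	const char *name;
	int has_shutdown;
};

static char model_log[128];

/* Append one step name to the log, separated by ';'. */
static void model_record(const char *step)
{
	size_t used = strlen(model_log);

	snprintf(model_log + used, sizeof(model_log) - used, "%s;", step);
}

/* Walk the devices newest-first (mirroring the reverse list walk in the
 * quoted code), skip devices without a shutdown hook, and handle the
 * system devices last. Returns the recorded order. */
const char *model_shutdown_order(const struct model_device *devs, size_t n)
{
	model_log[0] = '\0';
	for (size_t i = n; i-- > 0; ) {
		if (devs[i].has_shutdown)
			model_record(devs[i].name);
	}
	model_record("sysdev");
	return model_log;
}
```

For devices registered in the order pci, ide, dummy (no hook), eth0, the model records eth0, ide, pci and only then the system devices, which is the property the quoted comment relies on: interrupt controllers and other system devices go down after everything that might still need them.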
Re: 2.6.11-rc4-mm1: something is wrong with swsusp powerdown
Hi!

> btw, suspend is a bit messy. The disk spins down. Then up. Then down again. And:

Here's preview patch to make disk not do stupid yo-yo. Please do not apply (it will probably not apply cleanly anyway).

I can fix disk going yo-yo without switching pm_message_t to struct, but will have to back parts of that later. Do you want patch?

								Pavel

--- clean/drivers/base/power/resume.c	2004-12-25 13:34:59.0 +0100
+++ linux/drivers/base/power/resume.c	2005-02-28 15:38:51.0 +0100
@@ -41,7 +41,7 @@
 		list_add_tail(entry, &dpm_active);
 
 		up(&dpm_list_sem);
-		if (!dev->power.prev_state)
+		if (!dev->power.prev_state EVENT)
 			resume_device(dev);
 		down(&dpm_list_sem);
 		put_device(dev);
--- clean/drivers/base/power/runtime.c	2005-01-12 11:07:39.0 +0100
+++ linux/drivers/base/power/runtime.c	2005-02-28 15:42:10.0 +0100
@@ -13,10 +13,10 @@
 static void runtime_resume(struct device * dev)
 {
 	dev_dbg(dev, "resuming\n");
-	if (!dev->power.power_state)
+	if (!dev->power.power_state EVENT)
 		return;
 	if (!resume_device(dev))
-		dev->power.power_state = 0;
+		dev->power.power_state = PMSG_ON;
 }
 
 
@@ -49,10 +49,10 @@
 	int error = 0;
 
 	down(&dpm_sem);
-	if (dev->power.power_state == state)
+	if (dev->power.power_state EVENT == state EVENT)
 		goto Done;
 
-	if (dev->power.power_state)
+	if (dev->power.power_state EVENT)
 		runtime_resume(dev);
 
 	if (!(error = suspend_device(dev, state)))
--- clean/drivers/base/power/shutdown.c	2004-08-15 19:14:55.0 +0200
+++ linux/drivers/base/power/shutdown.c	2005-01-12 10:57:23.0 +0100
@@ -29,7 +29,8 @@
 		dev->driver->shutdown(dev);
 		return 0;
 	}
-	return dpm_runtime_suspend(dev, dev->detach_state);
+	/* FIXME */
+	return dpm_runtime_suspend(dev, PMSG_FREEZE);
 }
 
 
--- clean/drivers/base/power/suspend.c	2005-01-12 11:07:39.0 +0100
+++ linux/drivers/base/power/suspend.c	2005-02-28 21:30:13.0 +0100
@@ -43,7 +43,7 @@
 
 	dev->power.prev_state = dev->power.power_state;
 
-	if (dev->bus && dev->bus->suspend && !dev->power.power_state)
+	if (dev->bus && dev->bus->suspend && (!dev->power.power_state EVENT))
 		error = dev->bus->suspend(dev, state);
 
 	return error;
@@ -134,6 +134,8 @@
  Done:
 	return error;
  Error:
+	printk(KERN_ERR "Could not power down device %s: "
+		"error %d\n", kobject_name(&dev->kobj), error);
 	dpm_power_up();
 	goto Done;
 }
--- clean/drivers/base/power/sysfs.c	2004-08-15 19:14:55.0 +0200
+++ linux/drivers/base/power/sysfs.c	2005-02-28 15:43:57.0 +0100
@@ -26,19 +26,20 @@
 
 static ssize_t state_show(struct device * dev, char * buf)
 {
-	return sprintf(buf, "%u\n", dev->power.power_state);
+	return sprintf(buf, "%u\n", dev->power.power_state EVENT);
 }
 
 static ssize_t state_store(struct device * dev, const char * buf, size_t n)
 {
-	u32 state;
+	pm_message_t state;
 	char * rest;
 	int error = 0;
 
-	state = simple_strtoul(buf, &rest, 10);
+	state EVENT = simple_strtoul(buf, &rest, 10);
+//	state.flags = PFL_RUNTIME;
 	if (*rest)
 		return -EINVAL;
-	if (state)
+	if (state EVENT)
 		error = dpm_runtime_suspend(dev, state);
 	else
 		dpm_runtime_resume(dev);
--- clean/drivers/ide/ide-disk.c	2005-02-14 14:12:21.0 +0100
+++ linux/drivers/ide/ide-disk.c	2005-02-14 22:34:43.0 +0100
@@ -872,7 +872,7 @@
 {
 	switch (rq->pm->pm_step) {
 	case idedisk_pm_flush_cache:	/* Suspend step 1 (flush cache) complete */
-		if (rq->pm->pm_state == 4)
+		if (rq->pm->pm_state == EVENT_FREEZE)
 			rq->pm->pm_step = ide_pm_state_completed;
 		else
 			rq->pm->pm_step = idedisk_pm_standby;
@@ -1155,8 +1155,7 @@
 		return;
 	}
 
-	printk("Shutdown: %s\n", drive->name);
-	dev->bus->suspend(dev, PM_SUSPEND_STANDBY);
+	dev->bus->suspend(dev, PMSG_SUSPEND);
 }
 
 /*
--- clean/drivers/ide/ide.c	2005-02-28 00:50:42.0 +0100
+++ linux/drivers/ide/ide.c	2005-02-28 15:48:21.0 +0100
@@ -1398,7 +1398,7 @@
 	rq.special = &args;
 	rq.pm = &rqpm;
 	rqpm.pm_step = ide_pm_state_start_suspend;
-	rqpm.pm_state = state;
+	rqpm.pm_state = state EVENT;
 
 	return ide_do_drive_cmd(drive, &rq, ide_wait);
 }
@@ -1417,7 +1417,7 @@
 	rq.special = &args;
 	rq.pm = &rqpm;
 	rqpm.pm_step = ide_pm_state_start_resume;
-	rqpm.pm_state = 0;
+	rqpm.pm_state = EVENT_ON;
 
 	return
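The `state EVENT` spelling in the preview patch reads oddly until the macro behind it is spelled out. The sketch below shows one way such a transitional macro could work, assuming (as the earlier message says) that EVENT is empty while pm_message_t is a scalar and becomes `.event` once it is a struct. The switch `PM_MESSAGE_IS_SCALAR` and the helpers `model_make_state` and `model_same_state` are hypothetical and exist only for this illustration.

```c
#include <assert.h>

/* Build with -DPM_MESSAGE_IS_SCALAR to model the "before" state; the
 * default models the "after" state where pm_message_t is a struct. */
#ifdef PM_MESSAGE_IS_SCALAR
typedef unsigned int pm_message_t;
#define EVENT			/* expands to nothing: "state EVENT" is "state" */
#else
typedef struct { int event; } pm_message_t;
#define EVENT .event		/* "state EVENT" is "state.event" */
#endif

/* Build a message from an event number; compiles in both configurations. */
pm_message_t model_make_state(int ev)
{
	pm_message_t state;

	state EVENT = ev;
	return state;
}

/* Compare two messages the way the patch compares power_state with state. */
int model_same_state(pm_message_t a, pm_message_t b)
{
	return a EVENT == b EVENT;
}
```

The design point is that every use site is converted once, to `x EVENT`, and the type can then be flipped in a single place. A plain `a == b` on two structs would not compile in C, which is why comparisons such as `dev->power.power_state == state` have to be rewritten before the switch.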
Re: 2.6.11-rc4-mm1: something is wrong with swsusp powerdown
Pavel Machek <[EMAIL PROTECTED]> writes:

> > I threw it together to test a specific code path, and the fact it
> > fails in software suspend is actually almost confirmation that I am on
> > the right track. This actually fixed the case I was testing.
> >
> > In this case the failure is simply because system_state is not set to
> > SYSTEM_POWER_OFF before kernel/power/disk.c:power_down() calls
> > device_shutdown(). The appropriate reboot notifier is also not called.
>
> Can you suggest patch to do it right? Or perhaps there should be
> just_plain_power_machine_down() that does all necessary trickery?

I would call it kernel_power_down(), and that is what I am suggesting is the right fix. We have it open coded in kernel/sys.c:sys_reboot() in the switch case for LINUX_REBOOT_CMD_POWER_OFF.

So after the code gets factored out from there, all of the cases that call machine_power_off() and pm_power_off() directly need to be updated. There are similar cases for machine_restart() and machine_halt(). But the power off case seems to be the most acute.

My biggest problem with this is that I get into the recursive code cleanup problem, where I fix one piece and a bug is exposed somewhere else, and that then requires investigation and fixing.

Fixing the callers of machine_power_off() is about the fifth bug fix down the chain triggered by disabling UP interrupts in device_shutdown(); SMP interrupts have always been disabled. The first bug fix was to create system devices in the device tree. I haven't a clue where fixing this one will lead.

Recursive code fixes are a hard thing to schedule :(

Eric
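As a rough illustration of the wrapper being proposed, here is a userspace model of the two power-off paths described in this thread. Nothing in it is kernel code: every `model_` name is hypothetical, the steps only append to a trace string, and the relative order of the notifier call and the state change is an assumption, since the thread only says that both must happen before device_shutdown().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for the kernel's system_state values. */
enum model_system_state { MODEL_SYSTEM_RUNNING, MODEL_SYSTEM_POWER_OFF };

enum model_system_state model_state = MODEL_SYSTEM_RUNNING;
char model_trace[128];

static void model_record(const char *step)
{
	size_t used = strlen(model_trace);

	snprintf(model_trace + used, sizeof(model_trace) - used, "%s;", step);
}

/* Reset the model between runs. */
void model_reset(void)
{
	model_state = MODEL_SYSTEM_RUNNING;
	model_trace[0] = '\0';
}

static void model_reboot_notifiers(void)
{
	model_record("notify");
}

/* Shutdown code can only tell a power-off apart from normal operation
 * if the state was updated before it runs. */
static void model_device_shutdown(void)
{
	model_record(model_state == MODEL_SYSTEM_POWER_OFF ?
		     "shutdown(power_off)" : "shutdown(running)");
}

static void model_machine_power_off(void)
{
	model_record("power_off");
}

/* Hypothetical wrapper in the spirit of the proposed kernel_power_down(). */
void model_kernel_power_down(void)
{
	model_reboot_notifiers();
	model_state = MODEL_SYSTEM_POWER_OFF;
	model_device_shutdown();
	model_machine_power_off();
}

/* The path the thread attributes to swsusp's power_down(): no notifier
 * call and no state change before device_shutdown(). */
void model_swsusp_power_down(void)
{
	model_device_shutdown();
	model_machine_power_off();
}
```

Calling `model_reset()` followed by each entry point and printing `model_trace` shows the difference the thread is about: only the wrapper path lets the shutdown step observe the power-off state, which is what a hook like acpi_power_off_prepare() would presumably need to key off.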
Re: 2.6.11-rc4-mm1: something is wrong with swsusp powerdown
Pavel Machek <[EMAIL PROTECTED]> wrote:
>
> Hi!
>
> > Resume on SMP locks up.
>
> Does it work on UP kernel on same hardware?

yup.

> NMI watchdog is problem for suspend, it takes long to do various
> phases. Can you disable it for testing?

Will try to remember to do that.

> > Relocating pagedir |
> > Reading image data (8157 pages): 100% 8157 done.
> > Stopping tasks: |
> > Freeing memory... done (0 pages freed)
> > Freezing CPUs (at 1)...Sleeping in:
> >  [c0103c1d] dump_stack+0x19/0x20
> >  [c0133c7f] smp_pause+0x1f/0x54
> >  [c010ee27] smp_call_function_interrupt+0x3b/0x60
> >  [c01037d4] call_function_interrupt+0x1c/0x24
> >  [c010] cpu_idle+0x55/0x64
> >  [c05929ed] start_secondary+0x71/0x78
> >  [] 0x0
> >  [cffa5fbc] 0xcffa5fbc
> > ok
> > double fault, gdt at c1203260 [255 bytes]
> > NMI Watchdog detected LOCKUP on CPU1, eip c0133c96, registers:

Note the double fault.
Re: 2.6.11-rc4-mm1: something is wrong with swsusp powerdown
Pavel Machek <[EMAIL PROTECTED]> wrote:
>
> I can fix disk going yo-yo without switching pm_message_t to struct,
> but will have to back parts of that later. Do you want patch?

No thanks, I was just pointing it out. It sounds like you have it under control.
Re: 2.6.11-rc4-mm1: something is wrong with swsusp powerdown
Hi!

> > > Relocating pagedir |
> > > Reading image data (8157 pages): 100% 8157 done.
> > > Stopping tasks: |
> > > Freeing memory... done (0 pages freed)
> > > Freezing CPUs (at 1)...Sleeping in:
> > >  [c0103c1d] dump_stack+0x19/0x20
> > >  [c0133c7f] smp_pause+0x1f/0x54
> > >  [c010ee27] smp_call_function_interrupt+0x3b/0x60
> > >  [c01037d4] call_function_interrupt+0x1c/0x24
> > >  [c010] cpu_idle+0x55/0x64
> > >  [c05929ed] start_secondary+0x71/0x78
> > >  [] 0x0
> > >  [cffa5fbc] 0xcffa5fbc
> > > ok
> > > double fault, gdt at c1203260 [255 bytes]
> > > NMI Watchdog detected LOCKUP on CPU1, eip c0133c96, registers:
>
> Note the double fault.

Yes, I can see it, it scares me. SMP swsusp is not in a good state because I do not have easy access to SMP or HT hardware. I guess I'll just have to get into suse at night and steal some P4 ;-).

								Pavel
-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!
Re: 2.6.11-rc4-mm1: something is wrong with swsusp powerdown
On 01.03.2005 00:17, Pavel Machek wrote:
> Hi!
>
> In `subj` kernel, machine no longer powers down at the end of
> swsusp. 2.6.11-rc5-pavel works ok, as does 2.6.11-bk.
> 								Pavel

Hello,

I noticed this behaviour, too. Can't remember if it came with 2.6.11-rc3-mm2 or with 2.6.11-rc4-mm1. Didn't try another kernel.

I was able to work around this problem by doing
"echo platform > /sys/power/disk"
before
"echo disk > /sys/power/state"

The box is a desktop with an Asus A7V133 mb (VIA 82Cxxx chipset), Athlon XP 1600+ CPU and NVidia GeForce2 MX400 graphics.

~~
laurent
2.6.11-rc4-mm1: something is wrong with swsusp powerdown
Hi!

In `subj` kernel, machine no longer powers down at the end of swsusp. 2.6.11-rc5-pavel works ok, as does 2.6.11-bk.

								Pavel
-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!