Re: 2.6.11-rc4-mm1: something is wrong with swsusp powerdown

Hi!

> > > Relocating pagedir | 
> > > Reading image data (8157 pages): 100% 8157 done.
> > > Stopping tasks: |   
> > > Freeing memory... done (0 pages freed)
> > > Freezing CPUs (at 1)...Sleeping in:   
> > >  [] dump_stack+0x19/0x20 
> > >  [] smp_pause+0x1f/0x54 
> > >  [] smp_call_function_interrupt+0x3b/0x60
> > >  [] call_function_interrupt+0x1c/0x24
> > >  [] cpu_idle+0x55/0x64   
> > >  [] start_secondary+0x71/0x78
> > >  [<>] 0x0  
> > >  [] 0xcffa5fbc
> > > ok  
> > > double fault, gdt at c1203260 [255 bytes]
> > > NMI Watchdog detected LOCKUP on CPU1, eip c0133c96, registers:
> 
> Note the double fault.

Yes, I can see it, and it scares me. SMP swsusp is not in a good state
because I do not have easy access to SMP or HT hardware. I guess I'll
just have to get into SUSE at night and steal some P4 ;-).

Pavel
-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: 2.6.11-rc4-mm1: something is wrong with swsusp powerdown

Pavel Machek <[EMAIL PROTECTED]> wrote:
>
> I can fix disk going yo-yo without switching pm_message_t to struct,
>  but will have to back out parts of that later. Do you want patch?

No thanks, I was just pointing it out.  It sounds like you have it under
control.

Re: 2.6.11-rc4-mm1: something is wrong with swsusp powerdown

Pavel Machek <[EMAIL PROTECTED]> wrote:
>
> Hi!
> 
> > Resume on SMP locks up.
> 
> Does it work on UP kernel on same hardware?

yup.

> NMI watchdog is problem
> for suspend, it takes long to do various phases. Can you disable it
> for testing?

Will try to remember to do that.

> > Relocating pagedir | 
> > Reading image data (8157 pages): 100% 8157 done.
> > Stopping tasks: |   
> > Freeing memory... done (0 pages freed)
> > Freezing CPUs (at 1)...Sleeping in:   
> >  [] dump_stack+0x19/0x20 
> >  [] smp_pause+0x1f/0x54 
> >  [] smp_call_function_interrupt+0x3b/0x60
> >  [] call_function_interrupt+0x1c/0x24
> >  [] cpu_idle+0x55/0x64   
> >  [] start_secondary+0x71/0x78
> >  [<>] 0x0  
> >  [] 0xcffa5fbc
> > ok  
> > double fault, gdt at c1203260 [255 bytes]
> > NMI Watchdog detected LOCKUP on CPU1, eip c0133c96, registers:

Note the double fault.


Re: 2.6.11-rc4-mm1: something is wrong with swsusp powerdown

Pavel Machek <[EMAIL PROTECTED]> writes:

> > I threw it together to test a specific code path, and the fact it
> > fails in software suspend is actually almost confirmation that I am on
> > the right track.  This actually fixed the case I was testing.
> > 
> > In this case the failure is simply because system_state is
> > not set to SYSTEM_POWER_OFF before
> > kernel/power/disk.c:power_down() calls device_shutdown().
> > The appropriate reboot notifier is also not called.
> 
> Can you suggest patch to do it right? Or perhaps there should be
> just_plain_power_machine_down() that does all necessary
> trickery?

I would call it kernel_power_down() and that
is what I am suggesting is the right fix.

We have it open coded in kernel/sys.c:sys_reboot()
in the switch case for: LINUX_REBOOT_CMD_POWER_OFF

So after the code gets factored out from there all
of the cases that call machine_power_off() and pm_power_off()
directly need to be updated.

There are similar cases for machine_restart() and machine_halt().
But the power off case seems to be the most acute.
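
The factoring suggested above can be sketched roughly as follows. This is a userspace mock, not kernel code: every `*_stub` and `*_sketch` name is invented for illustration, and the order of the notifier call relative to setting system_state is an assumption. The thread only says that both must happen before device_shutdown().

```c
#include <assert.h>
#include <string.h>

/* Userspace stand-ins for kernel state; none of this is real kernel code. */
enum system_states { SYSTEM_RUNNING, SYSTEM_POWER_OFF };
static enum system_states system_state = SYSTEM_RUNNING;

/* Records the order in which the stubbed hooks run. */
static char call_log[128];

static void log_call(const char *name)
{
	/* Guard against overflowing the fixed-size log buffer. */
	assert(strlen(call_log) + strlen(name) + 2 <= sizeof(call_log));
	if (call_log[0] != '\0')
		strcat(call_log, ",");
	strcat(call_log, name);
}

/* Stub: stands in for running the reboot notifier chain. */
static void reboot_notifiers_stub(void)
{
	log_call("notifiers");
}

/* Stub: stands in for device_shutdown(); logs the state it observed. */
static void device_shutdown_stub(void)
{
	log_call(system_state == SYSTEM_POWER_OFF ?
		 "shutdown(POWER_OFF)" : "shutdown(RUNNING)");
}

/* Stub: stands in for machine_power_off(). */
static void machine_power_off_stub(void)
{
	log_call("power_off");
}

/* Hypothetical wrapper: do all the preparation, then power off. */
static void kernel_power_down_sketch(void)
{
	reboot_notifiers_stub();
	system_state = SYSTEM_POWER_OFF;
	device_shutdown_stub();
	machine_power_off_stub();
}

/* The failing pattern described in the thread: no notifier, no state change. */
static void open_coded_power_down_sketch(void)
{
	device_shutdown_stub();
	machine_power_off_stub();
}

/* Reset the mock so the two paths can be compared. */
static void reset_sketch(void)
{
	system_state = SYSTEM_RUNNING;
	call_log[0] = '\0';
}
```

Running each path after reset_sketch() and comparing call_log shows the difference: only the wrapper lets the shutdown stub see SYSTEM_POWER_OFF. Callers that invoke machine_power_off() directly would switch to the wrapper.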

My biggest problem with this is that I get into the recursive code
cleanup problem, where I fix one piece and a bug is exposed somewhere
else, which then requires investigation and fixing.

Fixing the callers of machine_power_off() is about the fifth bug
fix down the chain triggered by disabling UP interrupts in
device_shutdown(); SMP interrupts have always been disabled.  The
first bug fix was to create system devices in the device tree.

I haven't a clue where fixing this one will lead.  Recursive
code fixes are a hard thing to schedule :(

Eric

Re: 2.6.11-rc4-mm1: something is wrong with swsusp powerdown

Hi!

> btw, suspend is a bit messy.  The disk spins down.  Then up.  Then down
> again.  And:

Here's preview patch to make disk not do stupid yo-yo. Please do not
apply (it will probably not apply cleanly anyway).

I can fix disk going yo-yo without switching pm_message_t to struct,
but will have to back out parts of that later. Do you want patch?
Pavel

--- clean/drivers/base/power/resume.c   2004-12-25 13:34:59.0 +0100
+++ linux/drivers/base/power/resume.c   2005-02-28 15:38:51.0 +0100
@@ -41,7 +41,7 @@
list_add_tail(entry, _active);
 
up(_list_sem);
-   if (!dev->power.prev_state)
+   if (!dev->power.prev_state EVENT)
resume_device(dev);
down(_list_sem);
put_device(dev);
--- clean/drivers/base/power/runtime.c  2005-01-12 11:07:39.0 +0100
+++ linux/drivers/base/power/runtime.c  2005-02-28 15:42:10.0 +0100
@@ -13,10 +13,10 @@
 static void runtime_resume(struct device * dev)
 {
dev_dbg(dev, "resuming\n");
-   if (!dev->power.power_state)
+   if (!dev->power.power_state EVENT)
return;
if (!resume_device(dev))
-   dev->power.power_state = 0;
+   dev->power.power_state = PMSG_ON;
 }
 
 
@@ -49,10 +49,10 @@
int error = 0;
 
down(_sem);
-   if (dev->power.power_state == state)
+   if (dev->power.power_state EVENT == state EVENT)
goto Done;
 
-   if (dev->power.power_state)
+   if (dev->power.power_state EVENT)
runtime_resume(dev);
 
if (!(error = suspend_device(dev, state)))
--- clean/drivers/base/power/shutdown.c 2004-08-15 19:14:55.0 +0200
+++ linux/drivers/base/power/shutdown.c 2005-01-12 10:57:23.0 +0100
@@ -29,7 +29,8 @@
dev->driver->shutdown(dev);
return 0;
}
-   return dpm_runtime_suspend(dev, dev->detach_state);
+   /* FIXME */
+   return dpm_runtime_suspend(dev, PMSG_FREEZE);
 }
 
 
--- clean/drivers/base/power/suspend.c  2005-01-12 11:07:39.0 +0100
+++ linux/drivers/base/power/suspend.c  2005-02-28 21:30:13.0 +0100
@@ -43,7 +43,7 @@
 
dev->power.prev_state = dev->power.power_state;
 
-   if (dev->bus && dev->bus->suspend && !dev->power.power_state)
+   if (dev->bus && dev->bus->suspend && (!dev->power.power_state EVENT))
error = dev->bus->suspend(dev, state);
 
return error;
@@ -134,6 +134,8 @@
  Done:
return error;
  Error:
+   printk(KERN_ERR "Could not power down device %s: "
+   "error %d\n", kobject_name(>kobj), error);
dpm_power_up();
goto Done;
 }
--- clean/drivers/base/power/sysfs.c  2004-08-15 19:14:55.0 +0200
+++ linux/drivers/base/power/sysfs.c  2005-02-28 15:43:57.0 +0100
@@ -26,19 +26,20 @@
 
 static ssize_t state_show(struct device * dev, char * buf)
 {
-   return sprintf(buf, "%u\n", dev->power.power_state);
+   return sprintf(buf, "%u\n", dev->power.power_state EVENT);
 }
 
 static ssize_t state_store(struct device * dev, const char * buf, size_t n)
 {
-   u32 state;
+   pm_message_t state;
char * rest;
int error = 0;
 
-   state = simple_strtoul(buf, , 10);
+   state EVENT = simple_strtoul(buf, , 10);
+// state.flags = PFL_RUNTIME;
if (*rest)
return -EINVAL;
-   if (state)
+   if (state EVENT)
error = dpm_runtime_suspend(dev, state);
else
dpm_runtime_resume(dev);
--- clean/drivers/ide/ide-disk.c  2005-02-14 14:12:21.0 +0100
+++ linux/drivers/ide/ide-disk.c  2005-02-14 22:34:43.0 +0100
@@ -872,7 +872,7 @@
 {
switch (rq->pm->pm_step) {
case idedisk_pm_flush_cache: /* Suspend step 1 (flush cache) complete */
-   if (rq->pm->pm_state == 4)
+   if (rq->pm->pm_state == EVENT_FREEZE)
rq->pm->pm_step = ide_pm_state_completed;
else
rq->pm->pm_step = idedisk_pm_standby;
@@ -1155,8 +1155,7 @@
return;
}
 
-   printk("Shutdown: %s\n", drive->name);
-   dev->bus->suspend(dev, PM_SUSPEND_STANDBY);
+   dev->bus->suspend(dev, PMSG_SUSPEND);
 }
 
 /*
--- clean/drivers/ide/ide.c 2005-02-28 00:50:42.0 +0100
+++ linux/drivers/ide/ide.c 2005-02-28 15:48:21.0 +0100
@@ -1398,7 +1398,7 @@
rq.special = 
rq.pm = 
rqpm.pm_step = ide_pm_state_start_suspend;
-   rqpm.pm_state = state;
+   rqpm.pm_state = state EVENT;
 
return ide_do_drive_cmd(drive, , ide_wait);
 }
@@ -1417,7 +1417,7 @@
rq.special = 
rq.pm = 
rqpm.pm_step = ide_pm_state_start_resume;
-   rqpm.pm_state = 0;
+   rqpm.pm_state = EVENT_ON;

Re: 2.6.11-rc4-mm1: something is wrong with swsusp powerdown

Hi!

> btw, suspend is a bit messy.  The disk spins down.  Then up.  Then down
> again.  And:

Yes, this is going to be properly solved by switching pm_message_t to
a struct (preview patch attached; EVENT will become .event, this is just
for me). I could do some hack to make the disk not go up-down-up (and will
need to do it for suse9.3, anyway), but I do not think that would
belong in mainline.
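
As a rough illustration of the struct approach, here is a minimal userspace sketch. It assumes pm_message_t becomes a struct with an `.event` field and that the disk driver skips the standby (spin-down) step when the event is a freeze, as the ide-disk hunk of the preview patch appears to do. The numeric event values and the `disk_pm_step` / `step_after_flush` names are invented for this example.

```c
#include <assert.h>

/* Event names echo the EVENT_* placeholders in the preview patch;
 * the numeric values are made up for this sketch. */
enum { EVENT_ON = 0, EVENT_FREEZE = 1, EVENT_SUSPEND = 2 };

/* pm_message_t as a struct instead of a bare integer. */
typedef struct pm_message {
	int event;
} pm_message_t;

/* Hypothetical names for the step that follows the cache flush. */
enum disk_pm_step { STEP_STANDBY, STEP_COMPLETED };

/*
 * After the cache flush: a freeze only needs the disk quiescent, so the
 * sequence ends there; any other event goes on to standby (spin down).
 */
static enum disk_pm_step step_after_flush(pm_message_t state)
{
	if (state.event == EVENT_FREEZE)
		return STEP_COMPLETED;
	return STEP_STANDBY;
}
```

With the event carried explicitly, a driver can tell "stop I/O so the image can be written" apart from "really power down". That distinction is what would prevent the spin-down, spin-up, spin-down sequence.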

> Powering off system
> Debug: sleeping function called from invalid context at include/linux/rwsem.h:66
> in_atomic():0, irqs_disabled():1  
>   
>  [] dump_stack+0x19/0x20
>  [] __might_sleep+0x91/0x9c
>  [] device_shutdown+0x16/0x82
>  [] power_down+0x47/0x74 
>  [] pm_suspend_disk+0x5a/0x74
>  [] enter_state+0x2e/0x70
>  [] software_suspend+0xa/0x10
>  [] acpi_system_write_sleep+0x73/0x98
>  [] vfs_write+0xaf/0x118 
>  [] sys_write+0x3c/0x68 
>  [] sysenter_past_esp+0x52/0x75
> Synchronizing SCSI cache for disk sda:   
> Shutdown: hda  
> acpi_power_off called

Hmm, device_shutdown is confused. Should it be called with interrupts
enabled or disabled? It uses an rwsem, which suggests interrupts enabled,
but I do not think sysdev_shutdown with interrupts enabled is a good
idea (and the comment suggests it should be called with interrupts disabled).

Pavel

/**
 * We handle system devices differently - we suspend and shut them
 * down last and resume them first. That way, we don't do anything stupid like
 * shutting down the interrupt controller before any devices..
 *
 * Note that there are not different stages for power management calls -
 * they only get one called once when interrupts are disabled.
 */

extern int sysdev_shutdown(void);

/**
 * device_shutdown - call ->shutdown() on each device to shutdown.
 */
void device_shutdown(void)
{
	struct device * dev;

	down_write(_subsys.rwsem);
	list_for_each_entry_reverse(dev, _subsys.kset.list, kobj.entry) {
		pr_debug("shutting down %s: ", dev->bus_id);
		if (dev->driver && dev->driver->shutdown) {
			pr_debug("Ok\n");
			dev->driver->shutdown(dev);
		} else
			pr_debug("Ignored.\n");
	}
	up_write(_subsys.rwsem);

	sysdev_shutdown();
}
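
The conflict in the function above can be modelled in userspace. The sketch below is only a mock: the `sketch_*` variables stand in for CPU state, and `might_sleep_sketch()` imitates the debug check that produced the "sleeping function called from invalid context" message. It assumes, as the log suggests, that the check fires whenever in_atomic() or irqs_disabled() is true.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-ins for CPU state; all names here are invented. */
static bool sketch_irqs_disabled;
static bool sketch_in_atomic;
static int sketch_warnings;

/* Imitates the debug check: complain if sleeping is not allowed here. */
static void might_sleep_sketch(void)
{
	if (sketch_in_atomic || sketch_irqs_disabled)
		sketch_warnings++;
}

/* Stand-in for taking a sleeping lock, such as an rwsem for writing. */
static void down_write_sketch(void)
{
	might_sleep_sketch();
}

/* Mirrors the shape of device_shutdown(): sleeping lock first, sysdevs last. */
static void device_shutdown_sketch(void)
{
	down_write_sketch();	/* device list walk happens under the rwsem */
	/* per-device ->shutdown() calls would go here */
	/* sysdev shutdown would follow, wanting interrupts disabled */
}
```

Calling device_shutdown_sketch() with sketch_irqs_disabled set bumps the warning counter, and calling it with interrupts "enabled" does not. That matches the trace, where irqs_disabled() is 1 on the power_down() path.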




Re: 2.6.11-rc4-mm1: something is wrong with swsusp powerdown

Hi!

> > > In `subj` kernel, machine no longer powers down at the end of
> > >  swsusp. 2.6.11-rc5-pavel works ok, as does 2.6.11-bk.
> > 
> > Binary searching indicates that this is due to
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.11-rc5/2.6.11-rc5-mm1/broken-out/acpi_power_off-bug-fix.patch.
> > 
> > 
> > I'll drop it.  That patch is pretty ugly-looking anyway (ACPI code in
> > drivers/base/power/?).
> > 
> > Perhaps someone who is hitting the problem which that patch addresses could
> > raise a bugzilla entry.
> > 
> > Oh.  It has one.  http://bugme.osdl.org/show_bug.cgi?id=4041
> > 
> > Anyway.  It needs more work.
> 
> Agreed.
> 
> I threw it together to test a specific code path, and the fact it
> fails in software suspend is actually almost confirmation that I am on
> the right track.  This actually fixed the case I was testing.
> 
> In this case the failure is simply because system_state is
> not set to SYSTEM_POWER_OFF before
> kernel/power/disk.c:power_down() calls device_shutdown().
> The appropriate reboot notifier is also not called.

Can you suggest patch to do it right? Or perhaps there should be
just_plain_power_machine_down() that does all necessary
trickery?
Pavel

Re: 2.6.11-rc4-mm1: something is wrong with swsusp powerdown

Hi!

> > Yes, the patch is very ugly. If something like this needs to be done,
> > then perhaps acpi should properly register into driver model and do
> > the work there. This will also mean code will be called consistently.
> 
> I totally agree.  Do you have an example of how a non-device
> can do this?
> 
> In particular something that gets as close to shutting down
> the system devices as possible.  But gets called before that.
> 
> Or perhaps acpi should simply be setup to be the first system device?

I believe that's the preferred solution.
Pavel

Re: 2.6.11-rc4-mm1: something is wrong with swsusp powerdown

Pavel Machek <[EMAIL PROTECTED]> writes:

> Yes, the patch is very ugly. If something like this needs to be done,
> then perhaps acpi should properly register into driver model and do
> the work there. This will also mean code will be called consistently.

I totally agree.  Do you have an example of how a non-device
can do this?

In particular something that gets as close to shutting down
the system devices as possible.  But gets called before that.

Or perhaps acpi should simply be set up to be the first system device?

Eric

Re: 2.6.11-rc4-mm1: something is wrong with swsusp powerdown

Andrew Morton <[EMAIL PROTECTED]> writes:

> Pavel Machek <[EMAIL PROTECTED]> wrote:
> >
> > In `subj` kernel, machine no longer powers down at the end of
> >  swsusp. 2.6.11-rc5-pavel works ok, as does 2.6.11-bk.
> 
> Binary searching indicates that this is due to
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.11-rc5/2.6.11-rc5-mm1/broken-out/acpi_power_off-bug-fix.patch.
> 
> 
> I'll drop it.  That patch is pretty ugly-looking anyway (ACPI code in
> drivers/base/power/?).
> 
> Perhaps someone who is hitting the problem which that patch addresses could
> raise a bugzilla entry.
> 
> Oh.  It has one.  http://bugme.osdl.org/show_bug.cgi?id=4041
> 
> Anyway.  It needs more work.

Agreed.

I threw it together to test a specific code path, and the fact it
fails in software suspend is actually almost confirmation that I am on
the right track.  This actually fixed the case I was testing.

In this case the failure is simply because system_state is
not set to SYSTEM_POWER_OFF before
kernel/power/disk.c:power_down() calls device_shutdown().
The appropriate reboot notifier is also not called.

So to fix this properly all of the places
that call machine_power_off now need to call a wrapper
that does all of the appropriate things and then calls
machine_power_off.

Likewise with the other reboot functions.

In addition, we need a clean way to get device_shutdown() to
call acpi_power_off_prepare() at roughly the location where
I have it hard coded.

The fundamental issue this patch was starting to address
before I ran out of steam is that acpi_power_off_prepare()
must be called with interrupts enabled, and after we have shut down
the system devices (i.e. the interrupt controllers) we can't
guarantee interrupts are working.

I don't know how much earlier it is safe to call
acpi_power_off_prepare().  But mostly I think we need to
throw in a fake device to attach acpi_power_off_prepare() to.

Eric

Re: 2.6.11-rc4-mm1: something is wrong with swsusp powerdown

Hi!

> Resume on SMP locks up.

Does it work on UP kernel on same hardware? NMI watchdog is a problem
for suspend, as the various phases take a long time. Can you disable it
for testing?
Pavel

> Relocating pagedir | 
> Reading image data (8157 pages): 100% 8157 done.
> Stopping tasks: |   
> Freeing memory... done (0 pages freed)
> Freezing CPUs (at 1)...Sleeping in:   
>  [] dump_stack+0x19/0x20 
>  [] smp_pause+0x1f/0x54 
>  [] smp_call_function_interrupt+0x3b/0x60
>  [] call_function_interrupt+0x1c/0x24
>  [] cpu_idle+0x55/0x64   
>  [] start_secondary+0x71/0x78
>  [<>] 0x0  
>  [] 0xcffa5fbc
> ok  
> double fault, gdt at c1203260 [255 bytes]
> NMI Watchdog detected LOCKUP on CPU1, eip c0133c96, registers:
> Modules linked in: video thermal processor pcc_acpi fan button battery ac
> CPU:1
> EIP:0060:[]Not tainted VLI
> EFLAGS: 0002   (2.6.11-rc5) 
> EIP is at smp_pause+0x36/0x54   
> eax: 0001   ebx: cffa5f20   ecx: fffbe4e6   edx: cffa5f20
> esi: cffa4000   edi: 0080   ebp: cffa5f58   esp: cffa5f1c
> ds: 007b   es: 007b   ss: 0068   
> Process swapper (pid: 0, threadinfo=cffa4000 task=c18ac540)
> Stack:  007b 0068 80050033  005d3000 06f0 00ff0001 
>    c120b260 07ff5f4c c0577000 0088 cffa0080 c011eed4 cffa5f68 cffa5f68 
>    c010ee27  0001 cffa5fa4 c01037d4 0001 c120b260 fffbe4e5 
> Call Trace:   
>  
>  [] show_stack+0x7b/0x88
>  [] show_registers+0x112/0x188
>  [] die_nmi+0x41/0x74 
>  [] nmi_watchdog_tick+0x54/0xcc
>  [] default_do_nmi+0x73/0xfc   
>  [] do_nmi+0x39/0x4c
>  [] nmi_stack_correct+0x1d/0x2a
>  [] smp_call_function_interrupt+0x3b/0x60
>  [] call_function_interrupt+0x1c/0x24
>  [] cpu_idle+0x55/0x64   
>  [] start_secondary+0x71/0x78
>  [<>] 0x0  
>  [] 0xcffa5fbc
> Code: e8 60 e0 24 00 68 0c 7a 40 c0 e8 c2 68 fe ff e8 85 ff fc ff 83 c4 08 f0 
> ff 05 4c 20 5e c0 a1 50 20 5e c0 89 da 85 c0 74 0b f3 
> console shuts up ...  
>  
> 


Re: 2.6.11-rc4-mm1: something is wrong with swsusp powerdown

Hi!

> > In `subj` kernel, machine no longer powers down at the end of
> >  swsusp. 2.6.11-rc5-pavel works ok, as does 2.6.11-bk.
> 
> Binary searching indicates that this is due to
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.11-rc5/2.6.11-rc5-mm1/broken-out/acpi_power_off-bug-fix.patch.
> 
> I'll drop it.  That patch is pretty ugly-looking anyway (ACPI code in
> drivers/base/power/?).
> 
> Perhaps someone who is hitting the problem which that patch addresses could
> raise a bugzilla entry.
> 
> Oh.  It has one.  http://bugme.osdl.org/show_bug.cgi?id=4041
> 
> Anyway.  It needs more work.

Yes, the patch is very ugly. If something like this needs to be done,
then perhaps acpi should properly register into driver model and do
the work there. This will also mean code will be called consistently.

Pavel

Re: 2.6.11-rc4-mm1: something is wrong with swsusp powerdown

Hi!

> btw, suspend is a bit messy.  The disk spins down.  Then up.  Then down
> again.  And:

Yes, that's known, pm_message_t needs to become a struct to solve disk
pingpong properly.

> Debug: sleeping function called from invalid context at mm/slab.c:2082
> in_atomic():0, irqs_disabled():1  
>  [] dump_stack+0x19/0x20
>  [] __might_sleep+0x91/0x9c
>  [] kmem_cache_alloc+0x23/0x84
>  [] acpi_evaluate_integer+0x3c/0xac
>  [] acpi_bus_get_status+0x39/0x94  
>  [] acpi_pci_link_set+0x16d/0x1e8
>  [] acpi_pci_link_resume+0x1d/0x28
>  [] irqrouter_resume+0x1a/0x38
>  [] sysdev_resume+0x2c/0xae   
>  [] device_power_up+0x8/0x11
>  [] swsusp_suspend+0x4b/0x58
>  [] pm_suspend_disk+0x35/0x74
>  [] enter_state+0x2e/0x70
>  [] software_suspend+0xa/0x10
>  [] acpi_system_write_sleep+0x73/0x98
>  [] vfs_write+0xaf/0x118 
>  [] sys_write+0x3c/0x68 
>  [] sysenter_past_esp+0x52/0x75

ACPI problem, patches are available (s/GFP_KERNEL/GFP_ATOMIC/), but Len
claims a better solution is ready... OTOH he has been claiming that for
half a year already, so we may push him a bit (added to cc).
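
To make the s/GFP_KERNEL/GFP_ATOMIC/ idea concrete, here is a small userspace mock. It is not the ACPI code: the `SKETCH_GFP_*` flags and `sketch_alloc()` are invented names. The only assumption it encodes is that a GFP_KERNEL-style allocation may sleep and is therefore flagged when interrupts are off, while a GFP_ATOMIC-style one is not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical flags mirroring GFP_KERNEL (may sleep) and GFP_ATOMIC. */
enum sketch_gfp { SKETCH_GFP_KERNEL, SKETCH_GFP_ATOMIC };

/* Userspace stand-ins for interrupt state and the debug counter. */
static bool sketch_irqs_off;
static int sketch_sleep_warnings;

/* Mock allocator: flags a possibly-sleeping allocation made with irqs off. */
static void *sketch_alloc(size_t size, enum sketch_gfp flags)
{
	if (flags == SKETCH_GFP_KERNEL && sketch_irqs_off)
		sketch_sleep_warnings++;
	return malloc(size);
}
```

In the resume trace the allocation is reached from sysdev_resume() with irqs_disabled() at 1, so only the atomic variant would pass the check in this model. The caller still has to free the returned memory and cope with a NULL return.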

> Powering off system
> Debug: sleeping function called from invalid context at include/linux/rwsem.h:66
> in_atomic():0, irqs_disabled():1  
>   
>  [] dump_stack+0x19/0x20
>  [] __might_sleep+0x91/0x9c
>  [] device_shutdown+0x16/0x82
>  [] power_down+0x47/0x74 
>  [] pm_suspend_disk+0x5a/0x74
>  [] enter_state+0x2e/0x70
>  [] software_suspend+0xa/0x10
>  [] acpi_system_write_sleep+0x73/0x98
>  [] vfs_write+0xaf/0x118 
>  [] sys_write+0x3c/0x68 
>  [] sysenter_past_esp+0x52/0x75

I'll look at this one.
Pavel

Re: 2.6.11-rc4-mm1: something is wrong with swsusp powerdown


Resume on SMP locks up.


Relocating pagedir | 
Reading image data (8157 pages): 100% 8157 done.
Stopping tasks: |   
Freeing memory... done (0 pages freed)
Freezing CPUs (at 1)...Sleeping in:   
 [] dump_stack+0x19/0x20 
 [] smp_pause+0x1f/0x54 
 [] smp_call_function_interrupt+0x3b/0x60
 [] call_function_interrupt+0x1c/0x24
 [] cpu_idle+0x55/0x64   
 [] start_secondary+0x71/0x78
 [<>] 0x0  
 [] 0xcffa5fbc
ok  
double fault, gdt at c1203260 [255 bytes]
NMI Watchdog detected LOCKUP on CPU1, eip c0133c96, registers:
Modules linked in: video thermal processor pcc_acpi fan button battery ac
CPU:1
EIP:0060:[]Not tainted VLI
EFLAGS: 0002   (2.6.11-rc5) 
EIP is at smp_pause+0x36/0x54   
eax: 0001   ebx: cffa5f20   ecx: fffbe4e6   edx: cffa5f20
esi: cffa4000   edi: 0080   ebp: cffa5f58   esp: cffa5f1c
ds: 007b   es: 007b   ss: 0068   
Process swapper (pid: 0, threadinfo=cffa4000 task=c18ac540)
Stack:  007b 0068 80050033  005d3000 06f0 00ff0001 
   c120b260 07ff5f4c c0577000 0088 cffa0080 c011eed4 cffa5f68 cffa5f68 
   c010ee27  0001 cffa5fa4 c01037d4 0001 c120b260 fffbe4e5 
Call Trace:
 [] show_stack+0x7b/0x88
 [] show_registers+0x112/0x188
 [] die_nmi+0x41/0x74 
 [] nmi_watchdog_tick+0x54/0xcc
 [] default_do_nmi+0x73/0xfc   
 [] do_nmi+0x39/0x4c
 [] nmi_stack_correct+0x1d/0x2a
 [] smp_call_function_interrupt+0x3b/0x60
 [] call_function_interrupt+0x1c/0x24
 [] cpu_idle+0x55/0x64   
 [] start_secondary+0x71/0x78
 [<>] 0x0  
 [] 0xcffa5fbc
Code: e8 60 e0 24 00 68 0c 7a 40 c0 e8 c2 68 fe ff e8 85 ff fc ff 83 c4 08 f0 
ff 05 4c 20 5e c0 a1 50 20 5e c0 89 da 85 c0 74 0b f3 
console shuts up ...
   


Re: 2.6.11-rc4-mm1: something is wrong with swsusp powerdown


btw, suspend is a bit messy.  The disk spins down.  Then up.  Then down
again.  And:



Stopping tasks: ==|
Freeing memory... done (7069 pages freed)  
swsusp: Need to copy 7847 pages  
swsusp: critical section/: done (7879 pages copied)
swsusp: Restoring Highmem  
Debug: sleeping function called from invalid context at mm/slab.c:2082
in_atomic():0, irqs_disabled():1  
 [] dump_stack+0x19/0x20
 [] __might_sleep+0x91/0x9c
 [] kmem_cache_alloc+0x23/0x84
 [] acpi_evaluate_integer+0x3c/0xac
 [] acpi_bus_get_status+0x39/0x94  
 [] acpi_pci_link_set+0x16d/0x1e8
 [] acpi_pci_link_resume+0x1d/0x28
 [] irqrouter_resume+0x1a/0x38
 [] sysdev_resume+0x2c/0xae   
 [] device_power_up+0x8/0x11
 [] swsusp_suspend+0x4b/0x58
 [] pm_suspend_disk+0x35/0x74
 [] enter_state+0x2e/0x70
 [] software_suspend+0xa/0x10
 [] acpi_system_write_sleep+0x73/0x98
 [] vfs_write+0xaf/0x118 
 [] sys_write+0x3c/0x68 
 [] sysenter_past_esp+0x52/0x75
PCI: Setting latency timer of device :00:1f.2 to 64
ACPI: PCI interrupt :00:1f.5[B] -> GSI 9 (level, low) -> IRQ 9
PCI: Setting latency timer of device :00:1f.5 to 64   
ACPI: PCI interrupt :01:00.0[A] -> GSI 11 (level, low) -> IRQ 11
ehci_hcd :02:01.2: USB 2.0 restarted, EHCI 0.95, driver 10 Dec 2004
ACPI: PCI interrupt :02:0c.0[A] -> GSI 9 (level, low) -> IRQ 9 
e100: eth0: e100_watchdog: link up, 100Mbps, full-duplex  
Writing data to swap (7879 pages)... done   
Writing pagedir (31 pages)   
S|
Powering off system
Debug: sleeping function called from invalid context at include/linux/rwsem.h:66
in_atomic():0, irqs_disabled():1
 [] dump_stack+0x19/0x20
 [] __might_sleep+0x91/0x9c
 [] device_shutdown+0x16/0x82
 [] power_down+0x47/0x74 
 [] pm_suspend_disk+0x5a/0x74
 [] enter_state+0x2e/0x70
 [] software_suspend+0xa/0x10
 [] acpi_system_write_sleep+0x73/0x98
 [] vfs_write+0xaf/0x118 
 [] sys_write+0x3c/0x68 
 [] sysenter_past_esp+0x52/0x75
Synchronizing SCSI cache for disk sda:   
Shutdown: hda  
acpi_power_off called

Re: 2.6.11-rc4-mm1: something is wrong with swsusp powerdown

Pavel Machek <[EMAIL PROTECTED]> wrote:
>
> In `subj` kernel, machine no longer powers down at the end of
>  swsusp. 2.6.11-rc5-pavel works ok, as does 2.6.11-bk.

Binary searching indicates that this is due to
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.11-rc5/2.6.11-rc5-mm1/broken-out/acpi_power_off-bug-fix.patch.

I'll drop it.  That patch is pretty ugly-looking anyway (ACPI code in
drivers/base/power/?).

Perhaps someone who is hitting the problem which that patch addresses could
raise a bugzilla entry.

Oh.  It has one.  http://bugme.osdl.org/show_bug.cgi?id=4041

Anyway.  It needs more work.


Re: 2.6.11-rc4-mm1: something is wrong with swsusp powerdown


btw, suspend is a bit messy.  The disk spins down.  Then up.  Then down
again.  And:



Stopping tasks: ==|
Freeing memory... done (7069 pages freed)  
swsusp: Need to copy 7847 pages  
swsusp: critical section/: done (7879 pages copied)
swsusp: Restoring Highmem  
Debug: sleeping function called from invalid context at mm/slab.c:2082
in_atomic():0, irqs_disabled():1  
 [c010318d] dump_stack+0x19/0x20
 [c0111731] __might_sleep+0x91/0x9c
 [c01365df] kmem_cache_alloc+0x23/0x84
 [c0232d50] acpi_evaluate_integer+0x3c/0xac
 [c024b3d9] acpi_bus_get_status+0x39/0x94  
 [c024ca99] acpi_pci_link_set+0x16d/0x1e8
 [c024ce65] acpi_pci_link_resume+0x1d/0x28
 [c024ce8a] irqrouter_resume+0x1a/0x38
 [c0281e3c] sysdev_resume+0x2c/0xae   
 [c0285ea8] device_power_up+0x8/0x11
 [c012a873] swsusp_suspend+0x4b/0x58
 [c012ac35] pm_suspend_disk+0x35/0x74
 [c01292ea] enter_state+0x2e/0x70
 [c0129336] software_suspend+0xa/0x10
 [c024a8a7] acpi_system_write_sleep+0x73/0x98
 [c0149f1b] vfs_write+0xaf/0x118 
 [c014a028] sys_write+0x3c/0x68 
 [c0102c05] sysenter_past_esp+0x52/0x75
PCI: Setting latency timer of device :00:1f.2 to 64
ACPI: PCI interrupt :00:1f.5[B] - GSI 9 (level, low) - IRQ 9
PCI: Setting latency timer of device :00:1f.5 to 64   
ACPI: PCI interrupt :01:00.0[A] - GSI 11 (level, low) - IRQ 11
ehci_hcd :02:01.2: USB 2.0 restarted, EHCI 0.95, driver 10 Dec 2004
ACPI: PCI interrupt :02:0c.0[A] - GSI 9 (level, low) - IRQ 9 
e100: eth0: e100_watchdog: link up, 100Mbps, full-duplex  
Writing data to swap (7879 pages)... done   
Writing pagedir (31 pages)   
S|
Powering off system
Debug: sleeping function called from invalid context at include/linux/rwsem.h:66
in_atomic():0, irqs_disabled():1
 [c010318d] dump_stack+0x19/0x20
 [c0111731] __might_sleep+0x91/0x9c
 [c0285872] device_shutdown+0x16/0x82
 [c012aa97] power_down+0x47/0x74 
 [c012ac5a] pm_suspend_disk+0x5a/0x74
 [c01292ea] enter_state+0x2e/0x70
 [c0129336] software_suspend+0xa/0x10
 [c024a8a7] acpi_system_write_sleep+0x73/0x98
 [c0149f1b] vfs_write+0xaf/0x118 
 [c014a028] sys_write+0x3c/0x68 
 [c0102c05] sysenter_past_esp+0x52/0x75
Synchronizing SCSI cache for disk sda:   
Shutdown: hda  
acpi_power_off called

Re: 2.6.11-rc4-mm1: something is wrong with swsusp powerdown


Resume on SMP locks up.


Relocating pagedir | 
Reading image data (8157 pages): 100% 8157 done.
Stopping tasks: |   
Freeing memory... done (0 pages freed)
Freezing CPUs (at 1)...Sleeping in:   
 [c0103c1d] dump_stack+0x19/0x20 
 [c0133c7f] smp_pause+0x1f/0x54 
 [c010ee27] smp_call_function_interrupt+0x3b/0x60
 [c01037d4] call_function_interrupt+0x1c/0x24
 [c010] cpu_idle+0x55/0x64   
 [c05929ed] start_secondary+0x71/0x78
 [] 0x0  
 [cffa5fbc] 0xcffa5fbc
ok  
double fault, gdt at c1203260 [255 bytes]
NMI Watchdog detected LOCKUP on CPU1, eip c0133c96, registers:
Modules linked in: video thermal processor pcc_acpi fan button battery ac
CPU:1
EIP:0060:[c0133c96]Not tainted VLI
EFLAGS: 0002   (2.6.11-rc5) 
EIP is at smp_pause+0x36/0x54   
eax: 0001   ebx: cffa5f20   ecx: fffbe4e6   edx: cffa5f20
esi: cffa4000   edi: 0080   ebp: cffa5f58   esp: cffa5f1c
ds: 007b   es: 007b   ss: 0068   
Process swapper (pid: 0, threadinfo=cffa4000 task=c18ac540)
Stack:  007b 0068 80050033  005d3000 06f0 00ff0001 
   c120b260 07ff5f4c c0577000 0088 cffa0080 c011eed4 cffa5f68 cffa5f68 
   c010ee27  0001 cffa5fa4 c01037d4 0001 c120b260 fffbe4e5 
Call Trace:
 [c0103bf7] show_stack+0x7b/0x88
 [c0103d36] show_registers+0x112/0x188
 [c01046f1] die_nmi+0x41/0x74 
 [c010fcb4] nmi_watchdog_tick+0x54/0xcc
 [c0104797] default_do_nmi+0x73/0xfc   
 [c0104865] do_nmi+0x39/0x4c
 [c010395c] nmi_stack_correct+0x1d/0x2a
 [c010ee27] smp_call_function_interrupt+0x3b/0x60
 [c01037d4] call_function_interrupt+0x1c/0x24
 [c010] cpu_idle+0x55/0x64   
 [c05929ed] start_secondary+0x71/0x78
 [] 0x0  
 [cffa5fbc] 0xcffa5fbc
Code: e8 60 e0 24 00 68 0c 7a 40 c0 e8 c2 68 fe ff e8 85 ff fc ff 83 c4 08 f0 
ff 05 4c 20 5e c0 a1 50 20 5e c0 89 da 85 c0 74 0b f3 
console shuts up ...
   


Re: 2.6.11-rc4-mm1: something is wrong with swsusp powerdown

Hi!

> btw, suspend is a bit messy.  The disk spins down.  Then up.  Then down
> again.  And:

Yes, that's known, pm_message_t needs to become struct to solve disk
pingpong properly.

> Debug: sleeping function called from invalid context at mm/slab.c:2082
> in_atomic():0, irqs_disabled():1
>  [c010318d] dump_stack+0x19/0x20
>  [c0111731] __might_sleep+0x91/0x9c
>  [c01365df] kmem_cache_alloc+0x23/0x84
>  [c0232d50] acpi_evaluate_integer+0x3c/0xac
>  [c024b3d9] acpi_bus_get_status+0x39/0x94
>  [c024ca99] acpi_pci_link_set+0x16d/0x1e8
>  [c024ce65] acpi_pci_link_resume+0x1d/0x28
>  [c024ce8a] irqrouter_resume+0x1a/0x38
>  [c0281e3c] sysdev_resume+0x2c/0xae
>  [c0285ea8] device_power_up+0x8/0x11
>  [c012a873] swsusp_suspend+0x4b/0x58
>  [c012ac35] pm_suspend_disk+0x35/0x74
>  [c01292ea] enter_state+0x2e/0x70
>  [c0129336] software_suspend+0xa/0x10
>  [c024a8a7] acpi_system_write_sleep+0x73/0x98
>  [c0149f1b] vfs_write+0xaf/0x118
>  [c014a028] sys_write+0x3c/0x68
>  [c0102c05] sysenter_past_esp+0x52/0x75

ACPI problem, patches are available (s/GFP_KERNEL/GFP_ATOMIC), but Len
claims better solution is ready... OTOH he claims that for half a year
already so we may push him a bit (added to cc). 
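
A minimal userspace sketch of why the s/GFP_KERNEL/GFP_ATOMIC swap silences the warning above. It is a model under stated assumptions, not ACPI or slab code: every model_* name is hypothetical, and malloc() merely stands in for the kernel allocator. The point it illustrates is that an allocation which is allowed to sleep is reported when interrupts are disabled, while an atomic one is not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel's allocation flags. */
enum model_gfp { MODEL_GFP_KERNEL, MODEL_GFP_ATOMIC };

/* Models irqs_disabled(); true means interrupts are off. */
static bool model_irqs_disabled;

/* Counts "sleeping function called from invalid context" reports. */
static int model_might_sleep_warnings;

/* Allocation that is only allowed to sleep with MODEL_GFP_KERNEL. */
static void *model_alloc(size_t size, enum model_gfp flags)
{
    assert(size > 0);
    if (flags == MODEL_GFP_KERNEL && model_irqs_disabled)
        model_might_sleep_warnings++;   /* the case seen in the trace */
    return malloc(size);
}

/* Picks the flag from the current context instead of hard-coding it. */
static enum model_gfp model_pick_gfp(void)
{
    return model_irqs_disabled ? MODEL_GFP_ATOMIC : MODEL_GFP_KERNEL;
}
```

Calling model_alloc(16, MODEL_GFP_KERNEL) with model_irqs_disabled set bumps the warning counter; passing model_pick_gfp() instead does not. The swap only hides the symptom in the resume path, which is presumably why a better solution is still wanted.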

> Powering off system
> Debug: sleeping function called from invalid context at include/linux/rwsem.h:66
> in_atomic():0, irqs_disabled():1
>  [c010318d] dump_stack+0x19/0x20
>  [c0111731] __might_sleep+0x91/0x9c
>  [c0285872] device_shutdown+0x16/0x82
>  [c012aa97] power_down+0x47/0x74
>  [c012ac5a] pm_suspend_disk+0x5a/0x74
>  [c01292ea] enter_state+0x2e/0x70
>  [c0129336] software_suspend+0xa/0x10
>  [c024a8a7] acpi_system_write_sleep+0x73/0x98
>  [c0149f1b] vfs_write+0xaf/0x118
>  [c014a028] sys_write+0x3c/0x68
>  [c0102c05] sysenter_past_esp+0x52/0x75

I'll look at this one.
Pavel
-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

Re: 2.6.11-rc4-mm1: something is wrong with swsusp powerdown

Hi!

> > In `subj` kernel, machine no longer powers down at the end of
> > swsusp. 2.6.11-rc5-pavel works ok, as does 2.6.11-bk.
>
> Binary searching indicates that this is due to
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.11-rc5/2.6.11-rc5-mm1/broken-out/acpi_power_off-bug-fix.patch.
>
> I'll drop it.  That patch is pretty ugly-looking anyway (ACPI code in
> drivers/base/power/?).
>
> Perhaps someone who is hitting the problem which that patch addresses could
> raise a bugzilla entry.
>
> Oh.  It has one.  http://bugme.osdl.org/show_bug.cgi?id=4041
>
> Anyway.  It needs more work.

Yes, the patch is very ugly. If something like this needs to be done,
then perhaps acpi should properly register into driver model and do
the work there. This will also mean code will be called consistently.

Pavel
-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

Re: 2.6.11-rc4-mm1: something is wrong with swsusp powerdown

Hi!

> Resume on SMP locks up.

Does it work on UP kernel on same hardware? NMI watchdog is problem
for suspend, it takes long to do various phases. Can you disable it
for testing?
Pavel

> Relocating pagedir |
> Reading image data (8157 pages): 100% 8157 done.
> Stopping tasks: |
> Freeing memory... done (0 pages freed)
> Freezing CPUs (at 1)...Sleeping in:
>  [c0103c1d] dump_stack+0x19/0x20
>  [c0133c7f] smp_pause+0x1f/0x54
>  [c010ee27] smp_call_function_interrupt+0x3b/0x60
>  [c01037d4] call_function_interrupt+0x1c/0x24
>  [c010] cpu_idle+0x55/0x64
>  [c05929ed] start_secondary+0x71/0x78
>  [] 0x0
>  [cffa5fbc] 0xcffa5fbc
> ok
> double fault, gdt at c1203260 [255 bytes]
> NMI Watchdog detected LOCKUP on CPU1, eip c0133c96, registers:
> Modules linked in: video thermal processor pcc_acpi fan button battery ac
> CPU:1
> EIP:0060:[c0133c96]Not tainted VLI
> EFLAGS: 0002   (2.6.11-rc5)
> EIP is at smp_pause+0x36/0x54
> eax: 0001   ebx: cffa5f20   ecx: fffbe4e6   edx: cffa5f20
> esi: cffa4000   edi: 0080   ebp: cffa5f58   esp: cffa5f1c
> ds: 007b   es: 007b   ss: 0068
> Process swapper (pid: 0, threadinfo=cffa4000 task=c18ac540)
> Stack:  007b 0068 80050033  005d3000 06f0 00ff0001
>    c120b260 07ff5f4c c0577000 0088 cffa0080 c011eed4 cffa5f68 cffa5f68
>    c010ee27  0001 cffa5fa4 c01037d4 0001 c120b260 fffbe4e5
> Call Trace:
>  [c0103bf7] show_stack+0x7b/0x88
>  [c0103d36] show_registers+0x112/0x188
>  [c01046f1] die_nmi+0x41/0x74
>  [c010fcb4] nmi_watchdog_tick+0x54/0xcc
>  [c0104797] default_do_nmi+0x73/0xfc
>  [c0104865] do_nmi+0x39/0x4c
>  [c010395c] nmi_stack_correct+0x1d/0x2a
>  [c010ee27] smp_call_function_interrupt+0x3b/0x60
>  [c01037d4] call_function_interrupt+0x1c/0x24
>  [c010] cpu_idle+0x55/0x64
>  [c05929ed] start_secondary+0x71/0x78
>  [] 0x0
>  [cffa5fbc] 0xcffa5fbc
> Code: e8 60 e0 24 00 68 0c 7a 40 c0 e8 c2 68 fe ff e8 85 ff fc ff 83 c4 08 f0
> ff 05 4c 20 5e c0 a1 50 20 5e c0 89 da 85 c0 74 0b f3
> console shuts up ...

-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

Re: 2.6.11-rc4-mm1: something is wrong with swsusp powerdown

Andrew Morton [EMAIL PROTECTED] writes:

> Pavel Machek [EMAIL PROTECTED] wrote:
>
> > In `subj` kernel, machine no longer powers down at the end of
> > swsusp. 2.6.11-rc5-pavel works ok, as does 2.6.11-bk.
>
> Binary searching indicates that this is due to
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.11-rc5/2.6.11-rc5-mm1/broken-out/acpi_power_off-bug-fix.patch.
>
> I'll drop it. That patch is pretty ugly-looking anyway (ACPI code in
> drivers/base/power/?).
>
> Perhaps someone who is hitting the problem which that patch addresses could
> raise a bugzilla entry.
>
> Oh. It has one. http://bugme.osdl.org/show_bug.cgi?id=4041
>
> Anyway. It needs more work.

Agreed.

I threw it together to test a specific code path, and the fact it
fails in software suspend is actually almost confirmation that I am on
the right track. This actually fixed the case I was testing.

In this case the failure is simply because system_state is
not set to SYSTEM_POWER_OFF before
kernel/power/disk.c:power_down() calls device_shutdown().
The appropriate reboot notifier is also not called.

So to fix this properly, all of the places
that call machine_power_off() now need to call a wrapper
that does all of the appropriate things and then calls
machine_power_off().

Likewise with the other reboot functions.

In addition, we need a clean way to get device_shutdown() to
call acpi_power_off_prepare() at roughly the location
where I have it hard coded.

The fundamental issue this patch was starting to address
before I ran out of steam is that acpi_power_off_prepare()
must be called with interrupts enabled, and after we have shut down
the system devices (i.e. the interrupt controllers) we can't
guarantee interrupts are working.

I don't know how much earlier it is safe to call
acpi_power_off_prepare(). But mostly I think we need to
throw in a fake device to attach acpi_power_off_prepare() to.

Eric

Re: 2.6.11-rc4-mm1: something is wrong with swsusp powerdown

Pavel Machek [EMAIL PROTECTED] writes:

> Yes, the patch is very ugly. If something like this needs to be done,
> then perhaps acpi should properly register into driver model and do
> the work there. This will also mean code will be called consistently.

I totally agree.  Do you have an example of how a non-device
can do this?

In particular something that gets as close to shutting down
the system devices as possible.  But gets called before that.

Or perhaps acpi should simply be setup to be the first system device?

Eric

Re: 2.6.11-rc4-mm1: something is wrong with swsusp powerdown

Hi!

> > Yes, the patch is very ugly. If something like this needs to be done,
> > then perhaps acpi should properly register into driver model and do
> > the work there. This will also mean code will be called consistently.
>
> I totally agree.  Do you have an example of how a non-device
> can do this?
>
> In particular something that gets as close to shutting down
> the system devices as possible.  But gets called before that.
>
> Or perhaps acpi should simply be setup to be the first system device?

I believe that's the preferred solution.
Pavel
-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

Re: 2.6.11-rc4-mm1: something is wrong with swsusp powerdown

Hi!

> > > In `subj` kernel, machine no longer powers down at the end of
> > > swsusp. 2.6.11-rc5-pavel works ok, as does 2.6.11-bk.
> >
> > Binary searching indicates that this is due to
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.11-rc5/2.6.11-rc5-mm1/broken-out/acpi_power_off-bug-fix.patch.
> >
> > I'll drop it. That patch is pretty ugly-looking anyway (ACPI code in
> > drivers/base/power/?).
> >
> > Perhaps someone who is hitting the problem which that patch addresses could
> > raise a bugzilla entry.
> >
> > Oh. It has one. http://bugme.osdl.org/show_bug.cgi?id=4041
> >
> > Anyway. It needs more work.
>
> Agreed.
>
> I threw it together to test a specific code path, and the fact it
> fails in software suspend is actually almost confirmation that I am on
> the right track. This actually fixed the case I was testing.

Can you suggest a patch to do it right? Or perhaps there should be
just_plain_power_machine_down() that does all the necessary
trickery?
Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

Re: 2.6.11-rc4-mm1: something is wrong with swsusp powerdown

Hi!

> btw, suspend is a bit messy.  The disk spins down.  Then up.  Then down
> again.  And:

Yes, this is going to be properly solved by switching pm_message_t to
struct (preview patch attached, EVENT will become .event, this is just
for me). I could do some hack to make disk not go up-down-up (and will
need to do it for suse9.3, anyway), but I do not think that would
belong to mainline.
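
A small sketch of the transitional trick described above, where call sites are written as "state EVENT" so that the same line compiles both before and after pm_message_t becomes a struct. This is an illustration under assumptions rather than the preview patch itself: the PM_MESSAGE_IS_STRUCT switch, the make_pmsg() and needs_resume() helpers and the numeric event values are all hypothetical; only pm_message_t, EVENT, .event, EVENT_ON and EVENT_FREEZE are names that appear in this thread.

```c
#include <assert.h>

/* Set to 0 to model the old integer-typed pm_message_t. */
#ifndef PM_MESSAGE_IS_STRUCT
#define PM_MESSAGE_IS_STRUCT 1
#endif

#if PM_MESSAGE_IS_STRUCT
typedef struct { int event; } pm_message_t;
#define EVENT .event    /* "state EVENT" expands to "state .event" */
#else
typedef int pm_message_t;
#define EVENT           /* "state EVENT" expands to plain "state" */
#endif

/* Illustrative event codes; the values are assumptions. */
enum { EVENT_ON = 0, EVENT_FREEZE = 1, EVENT_SUSPEND = 2 };

/* Builds a message; the assignment is valid under both typedefs. */
static pm_message_t make_pmsg(int ev)
{
    pm_message_t m;

    assert(ev >= EVENT_ON);
    m EVENT = ev;
    return m;
}

/* Same source line for either representation of pm_message_t. */
static int needs_resume(pm_message_t state)
{
    return state EVENT != EVENT_ON;
}
```

The macro only matches the bare token EVENT, so identifiers such as EVENT_ON are unaffected. Once every call site carries the marker, replacing " EVENT" with ".event" is a mechanical edit.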

> Powering off system
> Debug: sleeping function called from invalid context at include/linux/rwsem.h:66
> in_atomic():0, irqs_disabled():1
>  [c010318d] dump_stack+0x19/0x20
>  [c0111731] __might_sleep+0x91/0x9c
>  [c0285872] device_shutdown+0x16/0x82
>  [c012aa97] power_down+0x47/0x74
>  [c012ac5a] pm_suspend_disk+0x5a/0x74
>  [c01292ea] enter_state+0x2e/0x70
>  [c0129336] software_suspend+0xa/0x10
>  [c024a8a7] acpi_system_write_sleep+0x73/0x98
>  [c0149f1b] vfs_write+0xaf/0x118
>  [c014a028] sys_write+0x3c/0x68
>  [c0102c05] sysenter_past_esp+0x52/0x75
> Synchronizing SCSI cache for disk sda:
> Shutdown: hda
> acpi_power_off called

Hmm, device_shutdown is confused. Should it be called with interrupts
enabled or disabled? It uses rwsem, that suggests interrupts enabled,
but I do not think sysdev_shutdown with enabled interrupts is good
idea (and comment suggests it should be called with interrupts disabled).

Pavel

/**
 * We handle system devices differently - we suspend and shut them
 * down last and resume them first. That way, we don't do anything stupid like
 * shutting down the interrupt controller before any devices..
 *
 * Note that there are not different stages for power management calls -
 * they only get one called once when interrupts are disabled.
 */

extern int sysdev_shutdown(void);

/**
 * device_shutdown - call ->shutdown() on each device to shutdown.
 */
void device_shutdown(void)
{
	struct device * dev;

	down_write(&devices_subsys.rwsem);
	list_for_each_entry_reverse(dev, &devices_subsys.kset.list, kobj.entry) {
		pr_debug("shutting down %s: ", dev->bus_id);
		if (dev->driver && dev->driver->shutdown) {
			pr_debug("Ok\n");
			dev->driver->shutdown(dev);
		} else
			pr_debug("Ignored.\n");
	}
	up_write(&devices_subsys.rwsem);

	sysdev_shutdown();
}
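
The conflict in the function above can be modelled in userspace. The sketch below is only a model: all names are hypothetical, a counter stands in for the __might_sleep() report, and the "split" variant is one conceivable ordering, not something this thread has settled on. It shows that taking a sleeping lock with interrupts off is what gets reported, while the system-device phase is the part that wants interrupts off.

```c
#include <assert.h>
#include <stdbool.h>

static bool irqs_off;        /* models irqs_disabled() */
static int sleep_warnings;   /* counts __might_sleep()-style reports */

/* Stand-in for the check performed before a sleeping lock is taken. */
static void model_might_sleep(void)
{
    if (irqs_off)
        sleep_warnings++;
}

/* Phase 1: walk the device list under a sleeping lock (the rwsem). */
static void model_shutdown_devices(void)
{
    model_might_sleep();     /* down_write() may sleep */
}

/* Phase 2: system devices; the quoted comment expects interrupts off. */
static void model_sysdev_shutdown(void)
{
    assert(irqs_off);
}

/* Ordering seen in the trace: interrupts are already off on entry. */
static void model_shutdown_irqs_off_throughout(void)
{
    irqs_off = true;
    model_shutdown_devices();
    model_sysdev_shutdown();
}

/* Hypothetical alternative: disable interrupts between the phases. */
static void model_shutdown_split(void)
{
    irqs_off = false;
    model_shutdown_devices();
    irqs_off = true;
    model_sysdev_shutdown();
}
```

The first variant produces one report, matching the "sleeping function called from invalid context" message; the second produces none while still entering the system-device phase with interrupts disabled.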



-- 
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

Re: 2.6.11-rc4-mm1: something is wrong with swsusp powerdown

Hi!

> btw, suspend is a bit messy.  The disk spins down.  Then up.  Then down
> again.  And:

Here's preview patch to make disk not do stupid yo-yo. Please do not
apply (it will probably not apply cleanly anyway).

I can fix disk going yo-yo without switching pm_message_t to struct,
but will have to back parts of that later. Do you want patch?
Pavel

--- clean/drivers/base/power/resume.c   2004-12-25 13:34:59.0 +0100
+++ linux/drivers/base/power/resume.c   2005-02-28 15:38:51.0 +0100
@@ -41,7 +41,7 @@
 		list_add_tail(entry, &dpm_active);
 
 		up(&dpm_list_sem);
-		if (!dev->power.prev_state)
+		if (!dev->power.prev_state EVENT)
 			resume_device(dev);
 		down(&dpm_list_sem);
 		put_device(dev);
--- clean/drivers/base/power/runtime.c  2005-01-12 11:07:39.0 +0100
+++ linux/drivers/base/power/runtime.c  2005-02-28 15:42:10.0 +0100
@@ -13,10 +13,10 @@
 static void runtime_resume(struct device * dev)
 {
 	dev_dbg(dev, "resuming\n");
-	if (!dev->power.power_state)
+	if (!dev->power.power_state EVENT)
 		return;
 	if (!resume_device(dev))
-		dev->power.power_state = 0;
+		dev->power.power_state = PMSG_ON;
 }
 
 
@@ -49,10 +49,10 @@
 	int error = 0;
 
 	down(&dpm_sem);
-	if (dev->power.power_state == state)
+	if (dev->power.power_state EVENT == state EVENT)
 		goto Done;
 
-	if (dev->power.power_state)
+	if (dev->power.power_state EVENT)
 		runtime_resume(dev);
 
 	if (!(error = suspend_device(dev, state)))
--- clean/drivers/base/power/shutdown.c 2004-08-15 19:14:55.0 +0200
+++ linux/drivers/base/power/shutdown.c 2005-01-12 10:57:23.0 +0100
@@ -29,7 +29,8 @@
 			dev->driver->shutdown(dev);
 		return 0;
 	}
-	return dpm_runtime_suspend(dev, dev->detach_state);
+	/* FIXME */
+	return dpm_runtime_suspend(dev, PMSG_FREEZE);
 }
 
 
--- clean/drivers/base/power/suspend.c  2005-01-12 11:07:39.0 +0100
+++ linux/drivers/base/power/suspend.c  2005-02-28 21:30:13.0 +0100
@@ -43,7 +43,7 @@
 
 	dev->power.prev_state = dev->power.power_state;
 
-	if (dev->bus && dev->bus->suspend && !dev->power.power_state)
+	if (dev->bus && dev->bus->suspend && (!dev->power.power_state EVENT))
 		error = dev->bus->suspend(dev, state);
 
 	return error;
@@ -134,6 +134,8 @@
  Done:
 	return error;
  Error:
+	printk(KERN_ERR "Could not power down device %s: "
+		"error %d\n", kobject_name(&dev->kobj), error);
 	dpm_power_up();
 	goto Done;
 }
--- clean/drivers/base/power/sysfs.c   2004-08-15 19:14:55.0 +0200
+++ linux/drivers/base/power/sysfs.c   2005-02-28 15:43:57.0 +0100
@@ -26,19 +26,20 @@
 
 static ssize_t state_show(struct device * dev, char * buf)
 {
-	return sprintf(buf, "%u\n", dev->power.power_state);
+	return sprintf(buf, "%u\n", dev->power.power_state EVENT);
 }
 
 static ssize_t state_store(struct device * dev, const char * buf, size_t n)
 {
-	u32 state;
+	pm_message_t state;
 	char * rest;
 	int error = 0;
 
-	state = simple_strtoul(buf, &rest, 10);
+	state EVENT = simple_strtoul(buf, &rest, 10);
+//	state.flags = PFL_RUNTIME;
 	if (*rest)
 		return -EINVAL;
-	if (state)
+	if (state EVENT)
 		error = dpm_runtime_suspend(dev, state);
 	else
 		dpm_runtime_resume(dev);
--- clean/drivers/ide/ide-disk.c   2005-02-14 14:12:21.0 +0100
+++ linux/drivers/ide/ide-disk.c   2005-02-14 22:34:43.0 +0100
@@ -872,7 +872,7 @@
 {
 	switch (rq->pm->pm_step) {
 	case idedisk_pm_flush_cache:	/* Suspend step 1 (flush cache) complete */
-		if (rq->pm->pm_state == 4)
+		if (rq->pm->pm_state == EVENT_FREEZE)
 			rq->pm->pm_step = ide_pm_state_completed;
 		else
 			rq->pm->pm_step = idedisk_pm_standby;
@@ -1155,8 +1155,7 @@
 		return;
 	}
 
-	printk("Shutdown: %s\n", drive->name);
-	dev->bus->suspend(dev, PM_SUSPEND_STANDBY);
+	dev->bus->suspend(dev, PMSG_SUSPEND);
 }
 
 /*
--- clean/drivers/ide/ide.c 2005-02-28 00:50:42.0 +0100
+++ linux/drivers/ide/ide.c 2005-02-28 15:48:21.0 +0100
@@ -1398,7 +1398,7 @@
 	rq.special = &args;
 	rq.pm = &rqpm;
 	rqpm.pm_step = ide_pm_state_start_suspend;
-	rqpm.pm_state = state;
+	rqpm.pm_state = state EVENT;
 
 	return ide_do_drive_cmd(drive, &rq, ide_wait);
 }
@@ -1417,7 +1417,7 @@
 	rq.special = &args;
 	rq.pm = &rqpm;
 	rqpm.pm_step = ide_pm_state_start_resume;
-	rqpm.pm_state = 0;
+	rqpm.pm_state = EVENT_ON;
 
 	return

Re: 2.6.11-rc4-mm1: something is wrong with swsusp powerdown

Pavel Machek [EMAIL PROTECTED] writes:

> > I threw it together to test a specific code path, and the fact it
> > fails in software suspend is actually almost confirmation that I am on
> > the right track.  This actually fixed the case I was testing.
> >
> > In this case the failure is simply because system_state is
> > not set to SYSTEM_POWER_OFF before
> > kernel/power/disk.c:power_down() calls device_shutdown().
> > The appropriate reboot notifier is also not called.
>
> Can you suggest a patch to do it right? Or perhaps there should be
> just_plain_power_machine_down() that does all the necessary
> trickery?

I would call it kernel_power_down() and that
is what I am suggesting is the right fix.

We have it open coded in kernel/sys.c:sys_reboot()
in the switch case for: LINUX_REBOOT_CMD_POWER_OFF

So after the code gets factored out from there all
of the cases that call machine_power_off() and pm_power_off()
directly need to be updated.
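
A userspace sketch of the kind of wrapper being proposed here, written as a model rather than kernel code. Every model_* name is hypothetical, and the order shown (notifiers, then system state, then device shutdown, then power off) is an assumption based on the description in this thread, not a copy of sys_reboot(). A log string records the sequence so the direct path and the wrapped path can be compared.

```c
#include <assert.h>
#include <string.h>

enum model_system_state { MODEL_SYSTEM_RUNNING, MODEL_SYSTEM_POWER_OFF };

static enum model_system_state model_state = MODEL_SYSTEM_RUNNING;

/* Records the order of calls so the sequence can be inspected. */
static char model_log[64];

static void model_step(char tag)
{
    size_t n = strlen(model_log);

    assert(n + 1 < sizeof(model_log));
    model_log[n] = tag;
    model_log[n + 1] = '\0';
}

static void model_reboot_notifiers(void) { model_step('N'); }

static void model_device_shutdown(void)
{
    /* 'D' if the power-off state was announced first, 'd' if not. */
    model_step(model_state == MODEL_SYSTEM_POWER_OFF ? 'D' : 'd');
}

static void model_machine_power_off(void) { model_step('P'); }

/* Models the direct call path described for power_down(). */
static void model_power_down_direct(void)
{
    model_device_shutdown();
    model_machine_power_off();
}

/* Models the wrapper: notify, set state, shut down devices, power off. */
static void model_kernel_power_down(void)
{
    model_reboot_notifiers();
    model_state = MODEL_SYSTEM_POWER_OFF;
    model_device_shutdown();
    model_machine_power_off();
}
```

The direct path logs "dP": devices are shut down without the state being set and without notifiers running. The wrapper logs "NDP". Routing every caller through one such function is what would keep the paths from drifting apart again.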

There are similar cases for machine_restart() and machine_halt().
But the power off case seems to be the most acute.

My biggest problem with this is that I get into the recursive code
cleanup problem, where I fix one piece and a bug is exposed somewhere
else, which then requires investigation and fixing.

Fixing the callers of machine_power_off() is about the fifth bug
fix down the chain triggered by disabling UP interrupts in
device_shutdown(); SMP interrupts have always been disabled.  The
first bug fix was to create system devices in the device tree.

I haven't a clue where fixing this one will lead.  Recursive
code fixes are a hard thing to schedule :(

Eric

Re: 2.6.11-rc4-mm1: something is wrong with swsusp powerdown

Pavel Machek [EMAIL PROTECTED] wrote:

> Hi!
>
> > Resume on SMP locks up.
>
> Does it work on UP kernel on same hardware?

yup.

> NMI watchdog is problem
> for suspend, it takes long to do various phases. Can you disable it
> for testing?

Will try to remember to do that.

> > Relocating pagedir |
> > Reading image data (8157 pages): 100% 8157 done.
> > Stopping tasks: |
> > Freeing memory... done (0 pages freed)
> > Freezing CPUs (at 1)...Sleeping in:
> >  [c0103c1d] dump_stack+0x19/0x20
> >  [c0133c7f] smp_pause+0x1f/0x54
> >  [c010ee27] smp_call_function_interrupt+0x3b/0x60
> >  [c01037d4] call_function_interrupt+0x1c/0x24
> >  [c010] cpu_idle+0x55/0x64
> >  [c05929ed] start_secondary+0x71/0x78
> >  [] 0x0
> >  [cffa5fbc] 0xcffa5fbc
> > ok
> > double fault, gdt at c1203260 [255 bytes]
> > NMI Watchdog detected LOCKUP on CPU1, eip c0133c96, registers:

Note the double fault.


Re: 2.6.11-rc4-mm1: something is wrong with swsusp powerdown

Pavel Machek [EMAIL PROTECTED] wrote:

> I can fix disk going yo-yo without switching pm_message_t to struct,
> but will have to back parts of that later. Do you want patch?

No thanks, I was just pointing it out.  It sounds like you have it under
control.

Re: 2.6.11-rc4-mm1: something is wrong with swsusp powerdown