Re: [patch 0/2] suspend/resume regression fixes

2007-09-29 Thread Bill Davidsen

Mark Lord wrote:

Linus Torvalds wrote:


On Sat, 22 Sep 2007, Thomas Gleixner wrote:

My final enlightment was, when I removed the ACPI processor module,
which controls the lower idle C-states, right before resume; this
worked fine all the time even without all the workaround hacks.

I really hope that this two patches finally set an end to the "jinxed
VAIO heisenbug series", which started when we removed the periodic
tick with the clockevents/dyntick patches.


Ok, so the patches look fine, but I somehow have this slight feeling 
that you gave up a bit too soon on the "*why* does this happen?" 
question.


On a closely related note:  I just now submitted a patch to fix 
SMP-poweroff,

by having it do disable_nonboot_cpus before doing poweroff.

Which has led me to thinking..
..are similar precautions perhaps necessary for *all* ACPI BIOS calls?

Because one never knows what the other CPUs are doing at the same time,
and what the side effects may be on the ACPI BIOS functions.

And also, I wonder if at a minimum we should be guaranteeing ACPI BIOS 
calls
only ever happen from CPU#0 (or the "boot" CPU)?   Or do we do that 
already?



Boot CPU, and AFAIK suspend is the only place which does it.

--
Bill Davidsen <[EMAIL PROTECTED]>
  "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 0/2] suspend/resume regression fixes

2007-09-29 Thread Bill Davidsen

Mark Lord wrote:

Linus Torvalds wrote:


On Sat, 22 Sep 2007, Thomas Gleixner wrote:

My final enlightment was, when I removed the ACPI processor module,
which controls the lower idle C-states, right before resume; this
worked fine all the time even without all the workaround hacks.

I really hope that this two patches finally set an end to the jinxed
VAIO heisenbug series, which started when we removed the periodic
tick with the clockevents/dyntick patches.


Ok, so the patches look fine, but I somehow have this slight feeling 
that you gave up a bit too soon on the *why* does this happen? 
question.


On a closely related note:  I just now submitted a patch to fix 
SMP-poweroff,

by having it do disable_nonboot_cpus before doing poweroff.

Which has led me to thinking..
..are similar precautions perhaps necessary for *all* ACPI BIOS calls?

Because one never knows what the other CPUs are doing at the same time,
and what the side effects may be on the ACPI BIOS functions.

And also, I wonder if at a minimum we should be guaranteeing ACPI BIOS 
calls
only ever happen from CPU#0 (or the boot CPU)?   Or do we do that 
already?



Boot CPU, and AFAIK suspend is the only place which does it.

--
Bill Davidsen [EMAIL PROTECTED]
  We have more to fear from the bungling of the incompetent than from
the machinations of the wicked.  - from Slashdot

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 0/2] suspend/resume regression fixes

2007-09-28 Thread Rafael J. Wysocki
On Friday, 28 September 2007 23:17, Mark Lord wrote:
> Thomas Gleixner wrote:
> > On Fri, 2007-09-28 at 16:27 -0400, Mark Lord wrote:
> ..
> >> On a closely related note:  I just now submitted a patch to fix 
> >> SMP-poweroff,
> >> by having it do disable_nonboot_cpus before doing poweroff.
> >>
> >> Which has led me to thinking..
> >> ..are similar precautions perhaps necessary for *all* ACPI BIOS calls?
> >>
> >> Because one never knows what the other CPUs are doing at the same time,
> >> and what the side effects may be on the ACPI BIOS functions.
> >>
> >> And also, I wonder if at a minimum we should be guaranteeing ACPI BIOS 
> >> calls
> >> only ever happen from CPU#0 (or the "boot" CPU)?   Or do we do that 
> >> already?
> > 
> > The ACPI calls are serialized in the kernel, AFAICT. But the fragile
> > situations (suspend, resume, shutdown, reboot) are probably those, where
> > some BIOS implementation expect that certain things are not called or
> > not active.
> 
> Mmm.. *do* we actually do this for reboot?  I don't see it there.
> And how about for kexec?
> 
> I'm probably just missing seeing it.  Right?

Nope.

Till now, only hibernation and suspend disabled the nonboot CPUs before
invoking the platform firmware.

Greetings,
Rafael
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 0/2] suspend/resume regression fixes

2007-09-28 Thread Mark Lord

Thomas Gleixner wrote:

On Fri, 2007-09-28 at 16:27 -0400, Mark Lord wrote:

..

On a closely related note:  I just now submitted a patch to fix SMP-poweroff,
by having it do disable_nonboot_cpus before doing poweroff.

Which has led me to thinking..
..are similar precautions perhaps necessary for *all* ACPI BIOS calls?

Because one never knows what the other CPUs are doing at the same time,
and what the side effects may be on the ACPI BIOS functions.

And also, I wonder if at a minimum we should be guaranteeing ACPI BIOS calls
only ever happen from CPU#0 (or the "boot" CPU)?   Or do we do that already?


The ACPI calls are serialized in the kernel, AFAICT. But the fragile
situations (suspend, resume, shutdown, reboot) are probably those, where
some BIOS implementation expect that certain things are not called or
not active.


Mmm.. *do* we actually do this for reboot?  I don't see it there.
And how about for kexec?

I'm probably just missing seeing it.  Right?

Cheers
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 0/2] suspend/resume regression fixes

2007-09-28 Thread Alan Cox
> And also, I wonder if at a minimum we should be guaranteeing ACPI BIOS calls
> only ever happen from CPU#0 (or the "boot" CPU)?   Or do we do that already?

The real question that matters is "does windows" which possibly someone
who touches Windows can definitively answer. For APM we lock to CPU 0 and
have to because its what MS did, not because of the spec.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 0/2] suspend/resume regression fixes

2007-09-28 Thread Thomas Gleixner
On Fri, 2007-09-28 at 16:27 -0400, Mark Lord wrote:
> Linus Torvalds wrote:
> > 
> > On Sat, 22 Sep 2007, Thomas Gleixner wrote:
> >> My final enlightment was, when I removed the ACPI processor module,
> >> which controls the lower idle C-states, right before resume; this
> >> worked fine all the time even without all the workaround hacks.
> >>
> >> I really hope that this two patches finally set an end to the "jinxed
> >> VAIO heisenbug series", which started when we removed the periodic
> >> tick with the clockevents/dyntick patches.
> > 
> > Ok, so the patches look fine, but I somehow have this slight feeling that 
> > you gave up a bit too soon on the "*why* does this happen?" question.
> 
> On a closely related note:  I just now submitted a patch to fix SMP-poweroff,
> by having it do disable_nonboot_cpus before doing poweroff.
> 
> Which has led me to thinking..
> ..are similar precautions perhaps necessary for *all* ACPI BIOS calls?
> 
> Because one never knows what the other CPUs are doing at the same time,
> and what the side effects may be on the ACPI BIOS functions.
> 
> And also, I wonder if at a minimum we should be guaranteeing ACPI BIOS calls
> only ever happen from CPU#0 (or the "boot" CPU)?   Or do we do that already?

The ACPI calls are serialized in the kernel, AFAICT. But the fragile
situations (suspend, resume, shutdown, reboot) are probably those, where
some BIOS implementation expect that certain things are not called or
not active.

tglx


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 0/2] suspend/resume regression fixes

2007-09-28 Thread Mark Lord

Linus Torvalds wrote:


On Sat, 22 Sep 2007, Thomas Gleixner wrote:

My final enlightment was, when I removed the ACPI processor module,
which controls the lower idle C-states, right before resume; this
worked fine all the time even without all the workaround hacks.

I really hope that this two patches finally set an end to the "jinxed
VAIO heisenbug series", which started when we removed the periodic
tick with the clockevents/dyntick patches.


Ok, so the patches look fine, but I somehow have this slight feeling that 
you gave up a bit too soon on the "*why* does this happen?" question.


On a closely related note:  I just now submitted a patch to fix SMP-poweroff,
by having it do disable_nonboot_cpus before doing poweroff.

Which has led me to thinking..
..are similar precautions perhaps necessary for *all* ACPI BIOS calls?

Because one never knows what the other CPUs are doing at the same time,
and what the side effects may be on the ACPI BIOS functions.

And also, I wonder if at a minimum we should be guaranteeing ACPI BIOS calls
only ever happen from CPU#0 (or the "boot" CPU)?   Or do we do that already?

-ml
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 0/2] suspend/resume regression fixes

2007-09-28 Thread Mark Lord

Linus Torvalds wrote:


On Sat, 22 Sep 2007, Thomas Gleixner wrote:

My final enlightment was, when I removed the ACPI processor module,
which controls the lower idle C-states, right before resume; this
worked fine all the time even without all the workaround hacks.

I really hope that this two patches finally set an end to the jinxed
VAIO heisenbug series, which started when we removed the periodic
tick with the clockevents/dyntick patches.


Ok, so the patches look fine, but I somehow have this slight feeling that 
you gave up a bit too soon on the *why* does this happen? question.


On a closely related note:  I just now submitted a patch to fix SMP-poweroff,
by having it do disable_nonboot_cpus before doing poweroff.

Which has led me to thinking..
..are similar precautions perhaps necessary for *all* ACPI BIOS calls?

Because one never knows what the other CPUs are doing at the same time,
and what the side effects may be on the ACPI BIOS functions.

And also, I wonder if at a minimum we should be guaranteeing ACPI BIOS calls
only ever happen from CPU#0 (or the boot CPU)?   Or do we do that already?

-ml
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 0/2] suspend/resume regression fixes

2007-09-28 Thread Alan Cox
 And also, I wonder if at a minimum we should be guaranteeing ACPI BIOS calls
 only ever happen from CPU#0 (or the boot CPU)?   Or do we do that already?

The real question that matters is does windows which possibly someone
who touches Windows can definitively answer. For APM we lock to CPU 0 and
have to because its what MS did, not because of the spec.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 0/2] suspend/resume regression fixes

2007-09-28 Thread Thomas Gleixner
On Fri, 2007-09-28 at 16:27 -0400, Mark Lord wrote:
 Linus Torvalds wrote:
  
  On Sat, 22 Sep 2007, Thomas Gleixner wrote:
  My final enlightment was, when I removed the ACPI processor module,
  which controls the lower idle C-states, right before resume; this
  worked fine all the time even without all the workaround hacks.
 
  I really hope that this two patches finally set an end to the jinxed
  VAIO heisenbug series, which started when we removed the periodic
  tick with the clockevents/dyntick patches.
  
  Ok, so the patches look fine, but I somehow have this slight feeling that 
  you gave up a bit too soon on the *why* does this happen? question.
 
 On a closely related note:  I just now submitted a patch to fix SMP-poweroff,
 by having it do disable_nonboot_cpus before doing poweroff.
 
 Which has led me to thinking..
 ..are similar precautions perhaps necessary for *all* ACPI BIOS calls?
 
 Because one never knows what the other CPUs are doing at the same time,
 and what the side effects may be on the ACPI BIOS functions.
 
 And also, I wonder if at a minimum we should be guaranteeing ACPI BIOS calls
 only ever happen from CPU#0 (or the boot CPU)?   Or do we do that already?

The ACPI calls are serialized in the kernel, AFAICT. But the fragile
situations (suspend, resume, shutdown, reboot) are probably those, where
some BIOS implementation expect that certain things are not called or
not active.

tglx


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 0/2] suspend/resume regression fixes

2007-09-28 Thread Rafael J. Wysocki
On Friday, 28 September 2007 23:17, Mark Lord wrote:
 Thomas Gleixner wrote:
  On Fri, 2007-09-28 at 16:27 -0400, Mark Lord wrote:
 ..
  On a closely related note:  I just now submitted a patch to fix 
  SMP-poweroff,
  by having it do disable_nonboot_cpus before doing poweroff.
 
  Which has led me to thinking..
  ..are similar precautions perhaps necessary for *all* ACPI BIOS calls?
 
  Because one never knows what the other CPUs are doing at the same time,
  and what the side effects may be on the ACPI BIOS functions.
 
  And also, I wonder if at a minimum we should be guaranteeing ACPI BIOS 
  calls
  only ever happen from CPU#0 (or the boot CPU)?   Or do we do that 
  already?
  
  The ACPI calls are serialized in the kernel, AFAICT. But the fragile
  situations (suspend, resume, shutdown, reboot) are probably those, where
  some BIOS implementation expect that certain things are not called or
  not active.
 
 Mmm.. *do* we actually do this for reboot?  I don't see it there.
 And how about for kexec?
 
 I'm probably just missing seeing it.  Right?

Nope.

Till now, only hibernation and suspend disabled the nonboot CPUs before
invoking the platform firmware.

Greetings,
Rafael
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 0/2] suspend/resume regression fixes

2007-09-28 Thread Mark Lord

Thomas Gleixner wrote:

On Fri, 2007-09-28 at 16:27 -0400, Mark Lord wrote:

..

On a closely related note:  I just now submitted a patch to fix SMP-poweroff,
by having it do disable_nonboot_cpus before doing poweroff.

Which has led me to thinking..
..are similar precautions perhaps necessary for *all* ACPI BIOS calls?

Because one never knows what the other CPUs are doing at the same time,
and what the side effects may be on the ACPI BIOS functions.

And also, I wonder if at a minimum we should be guaranteeing ACPI BIOS calls
only ever happen from CPU#0 (or the boot CPU)?   Or do we do that already?


The ACPI calls are serialized in the kernel, AFAICT. But the fragile
situations (suspend, resume, shutdown, reboot) are probably those, where
some BIOS implementation expect that certain things are not called or
not active.


Mmm.. *do* we actually do this for reboot?  I don't see it there.
And how about for kexec?

I'm probably just missing seeing it.  Right?

Cheers
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 0/2] suspend/resume regression fixes

2007-09-23 Thread Matthew Garrett
On Sat, Sep 22, 2007 at 08:11:06PM -0700, Linus Torvalds wrote:

> From a "future behaviour" standpoint it would probably be interesting to 
> hear whether Mihai can make his machine with not with the old IDE layer 
> (which distributions are migrating away from) but with the ATA layer 
> (libata) instead. It too should hopefully know about using ACPI to restore 
> any ATA controller quirks.

It does, but it's disabled by default. libata needs to be loaded with 
noacpi=0.

-- 
Matthew Garrett | [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 0/2] suspend/resume regression fixes

2007-09-23 Thread Mihai Donțu
On Sunday 23 September 2007, Alan Cox wrote:
> >   I switched to libata, but it behaves like the old IDE without ACPI. I
> >   did not manage to get a full dmesg (apparently all volumes are mounted
> >   r/o right after a power up from a s2ram) but I did make a picture, from
> >   which I quote (if I may say so):
> 
> Device errors. 
> 
> Libata currently (wrongly IMHO) defaults to avoiding the use of ACPI
> suspend/resume methods
> 
> So you also need to boot with  "libata.noacpi=0" and if that works beat
> up Jeff a bit ..

  You were right, 'libata.noacpi=0' does the trick :)

  It's interesting my mmc is unmounted before s2ram and then mounted back
  on resume.

  So, to sum  up, I have a working suspend-to-ram with libata. Not bad, not
  bad at all...

  Thanks,

-- 
Mihai Donțu
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 0/2] suspend/resume regression fixes

2007-09-23 Thread Alan Cox
>   I switched to libata, but it behaves like the old IDE without ACPI. I
>   did not manage to get a full dmesg (apparently all volumes are mounted
>   r/o right after a power up from a s2ram) but I did make a picture, from
>   which I quote (if I may say so):

Device errors. 

Libata currently (wrongly IMHO) defaults to avoiding the use of ACPI
suspend/resume methods

So you also need to boot with  "libata.noacpi=0" and if that works beat
up Jeff a bit ..

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 0/2] suspend/resume regression fixes

2007-09-23 Thread Rafael J. Wysocki
On Sunday, 23 September 2007 00:59, Linus Torvalds wrote:
> 
> On Sat, 22 Sep 2007, Thomas Gleixner wrote:
> > 
> > My final enlightment was, when I removed the ACPI processor module,
> > which controls the lower idle C-states, right before resume; this
> > worked fine all the time even without all the workaround hacks.
> > 
> > I really hope that this two patches finally set an end to the "jinxed
> > VAIO heisenbug series", which started when we removed the periodic
> > tick with the clockevents/dyntick patches.
> 
> Ok, so the patches look fine, but I somehow have this slight feeling that 
> you gave up a bit too soon on the "*why* does this happen?" question.
> 
> I realize that the answer is easily "because ACPI screwed up", but I'm 
> wondering if there's something we do to trigger that screw-up.
> 
> In particular, I also suspect that this may not really fix the problem - 
> maybe it just makes the window sufficiently small that it no longer 
> triggers. Because we don't necessarily understand what the real background 
> for the problem is, I'm not sure we can say that it is solved.
> 
> The reason I say this is that I have a suspicion on what triggers it.
> 
> I suspect that the problem is that we do
> 
>   pm_ops->prepare();
>   disable_nonboot_cpus()
>   suspend_enter();
>   enable_nonboot_cpus()
>   pm_finish()
> 
> and here the big thing to notice is that "pm_ops->prepare()" call, which 
> sets the wakup vector etc etc.
> 
> So maybe the real problem here is that once we've done the "->prepare()" 
> call and ACPI has set up various stuff, we MUST NOT do any calls to any 
> ACPI routines to set low-power states, because the stupid firmware isn't 
> expecting it.

I think that this is the case.

> Now, if this is the cause, then I think your patch should indeed fix it, 
> since you get called by the early-suspend code (which happens *before* the 
> "->prepare()" call), but at the same time, I wonder if maybe it would be 
> slightly "more correct" to instead of using the suspend/resume callbacks, 
> simply do this in the "acpi_pm_prepare()" stage, since that is likely the 
> thing that triggers it?
> 
> But hey, I think I'll apply the patches as-is. I'd just feel even better 
> if we actually understood *why* doing the CPU Cx states is not something 
> we can do around the suspend code!

Greetings,
Rafael
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 0/2] suspend/resume regression fixes

2007-09-23 Thread Rafael J. Wysocki
On Sunday, 23 September 2007 00:59, Linus Torvalds wrote:
 
 On Sat, 22 Sep 2007, Thomas Gleixner wrote:
  
  My final enlightment was, when I removed the ACPI processor module,
  which controls the lower idle C-states, right before resume; this
  worked fine all the time even without all the workaround hacks.
  
  I really hope that this two patches finally set an end to the jinxed
  VAIO heisenbug series, which started when we removed the periodic
  tick with the clockevents/dyntick patches.
 
 Ok, so the patches look fine, but I somehow have this slight feeling that 
 you gave up a bit too soon on the *why* does this happen? question.
 
 I realize that the answer is easily because ACPI screwed up, but I'm 
 wondering if there's something we do to trigger that screw-up.
 
 In particular, I also suspect that this may not really fix the problem - 
 maybe it just makes the window sufficiently small that it no longer 
 triggers. Because we don't necessarily understand what the real background 
 for the problem is, I'm not sure we can say that it is solved.
 
 The reason I say this is that I have a suspicion on what triggers it.
 
 I suspect that the problem is that we do
 
   pm_ops-prepare();
   disable_nonboot_cpus()
   suspend_enter();
   enable_nonboot_cpus()
   pm_finish()
 
 and here the big thing to notice is that pm_ops-prepare() call, which 
 sets the wakup vector etc etc.
 
 So maybe the real problem here is that once we've done the -prepare() 
 call and ACPI has set up various stuff, we MUST NOT do any calls to any 
 ACPI routines to set low-power states, because the stupid firmware isn't 
 expecting it.

I think that this is the case.

 Now, if this is the cause, then I think your patch should indeed fix it, 
 since you get called by the early-suspend code (which happens *before* the 
 -prepare() call), but at the same time, I wonder if maybe it would be 
 slightly more correct to instead of using the suspend/resume callbacks, 
 simply do this in the acpi_pm_prepare() stage, since that is likely the 
 thing that triggers it?
 
 But hey, I think I'll apply the patches as-is. I'd just feel even better 
 if we actually understood *why* doing the CPU Cx states is not something 
 we can do around the suspend code!

Greetings,
Rafael
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 0/2] suspend/resume regression fixes

2007-09-23 Thread Alan Cox
   I switched to libata, but it behaves like the old IDE without ACPI. I
   did not manage to get a full dmesg (apparently all volumes are mounted
   r/o right after a power up from a s2ram) but I did make a picture, from
   which I quote (if I may say so):

Device errors. 

Libata currently (wrongly IMHO) defaults to avoiding the use of ACPI
suspend/resume methods

So you also need to boot with  libata.noacpi=0 and if that works beat
up Jeff a bit ..

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 0/2] suspend/resume regression fixes

2007-09-23 Thread Mihai Donțu
On Sunday 23 September 2007, Alan Cox wrote:
I switched to libata, but it behaves like the old IDE without ACPI. I
did not manage to get a full dmesg (apparently all volumes are mounted
r/o right after a power up from a s2ram) but I did make a picture, from
which I quote (if I may say so):
 
 Device errors. 
 
 Libata currently (wrongly IMHO) defaults to avoiding the use of ACPI
 suspend/resume methods
 
 So you also need to boot with  libata.noacpi=0 and if that works beat
 up Jeff a bit ..

  You were right, 'libata.noacpi=0' does the trick :)

  It's interesting my mmc is unmounted before s2ram and then mounted back
  on resume.

  So, to sum  up, I have a working suspend-to-ram with libata. Not bad, not
  bad at all...

  Thanks,

-- 
Mihai Donțu
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 0/2] suspend/resume regression fixes

2007-09-23 Thread Matthew Garrett
On Sat, Sep 22, 2007 at 08:11:06PM -0700, Linus Torvalds wrote:

 From a future behaviour standpoint it would probably be interesting to 
 hear whether Mihai can make his machine with not with the old IDE layer 
 (which distributions are migrating away from) but with the ATA layer 
 (libata) instead. It too should hopefully know about using ACPI to restore 
 any ATA controller quirks.

It does, but it's disabled by default. libata needs to be loaded with 
noacpi=0.

-- 
Matthew Garrett | [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 0/2] suspend/resume regression fixes

2007-09-22 Thread Mihai Donțu
On Sunday 23 September 2007, Linus Torvalds wrote:
> From a "future behaviour" standpoint it would probably be interesting to 
> hear whether Mihai can make his machine with not with the old IDE layer 
> (which distributions are migrating away from) but with the ATA layer 
> (libata) instead. It too should hopefully know about using ACPI to restore 
> any ATA controller quirks.

  I switched to libata, but it behaves like the old IDE without ACPI. I
  did not manage to get a full dmesg (apparently all volumes are mounted
  r/o right after a power up from a s2ram) but I did make a picture, from
  which I quote (if I may say so):

  "
  ata1.00: configured for PIO0
  sd 0:0:0:0: [sda] Result: hostbyte=0x00 driverbyte=0x08
  sd 0:0:0:0: [sda] Sense key : 0xb [current] [descriptor]
  Descriptor sense data with sense descriptors (in hex):
  72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
  07 65 35 25
  sd 0:0:0:0: [sda] ASC=0x0 ASCQ=0x0
  end_request: I/O error, dev sda, sector 124073509
  ata1: EH complete
  sd 0:0:0:0: [sda] Write Protect is off
  sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
  ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
  ata1.00: cmd c5/00:10:75:36:65/00:00:00:00:00/e7 tag 0 cdb 0x0 data 8192 out
   res 51/04:10:75:36:65/00:00:00:00:00/e7 Emask 0x1 (device error)
  ata1.00: configured for PIO0
  ata1: EH complete
  "

  The last six lines repeat six times after which the whole things goes
  from the beginning:
  "
  ata1.00: configured for PIO0
  ...
  "

  It all gets crazy the moment I (or a process) try to access the root
  (or any other drive), until then, everything is nice and quiet.

  Mmm... in the excerpt above it says: "Write Protect is off" but when I did
  $ mount -o remount,rw /
  I got something like: "the device is write protected".

  I tried to save the dmesg on a mmc, but after powering up it said:
  "out of disk space"

  These are about all symptoms that I noticed... oh, and 'scsi_eh_0/1'
  enters disk-sleep often.

  I attached the dmesg pre s2ram.

  Thanks,

-- 
Mihai Donțu
[0.00] Linux version 2.6.23-rc7 ([EMAIL PROTECTED]) (gcc version 4.1.2 
(Gentoo 4.1.2)) #10 PREEMPT Sun Sep 23 06:54:24 EEST 2007
[0.00] Command line: nohz=on root=/dev/sda7
[0.00] BIOS-provided physical RAM map:
[0.00]  BIOS-e820:  - 0009fc00 (usable)
[0.00]  BIOS-e820: 0009fc00 - 000a (reserved)
[0.00]  BIOS-e820: 000e - 0010 (reserved)
[0.00]  BIOS-e820: 0010 - 37fd (usable)
[0.00]  BIOS-e820: 37fd - 37fefc00 (reserved)
[0.00]  BIOS-e820: 37fefc00 - 37ffb000 (ACPI NVS)
[0.00]  BIOS-e820: 37ffb000 - 4000 (reserved)
[0.00]  BIOS-e820: e000 - f000 (reserved)
[0.00]  BIOS-e820: fec0 - fec02000 (reserved)
[0.00]  BIOS-e820: ffb8 - ffc0 (reserved)
[0.00]  BIOS-e820: fff8 - 0001 (reserved)
[0.00] Entering add_active_range(0, 0, 159) 0 entries of 256 used
[0.00] Entering add_active_range(0, 256, 229328) 1 entries of 256 used
[0.00] end_pfn_map = 1048576
[0.00] DMI 2.3 present.
[0.00] ACPI: RSDP 000FE270, 0014 (r0 HP)
[0.00] ACPI: RSDT 37FEFC84, 0034 (r1 HP 0944 22110520 HP
  1)
[0.00] ACPI: FACP 37FEFC00, 0084 (r2 HP 09442 HP
  1)
[0.00] ACPI: DSDT 37FEFD50, 7489 (r1 HPSB4001 MSFT  
10E)
[0.00] ACPI: FACS 37FFAE80, 0040
[0.00] ACPI: APIC 37FEFCB8, 005A (r1 HP 09441 HP
  1)
[0.00] ACPI: MCFG 37FEFD14, 003C (r1 HP 09441 HP
  1)
[0.00] ACPI: SSDT 37FF71D9, 0382 (r1 HP   HPQPpc 1001 MSFT  
10E)
[0.00] Entering add_active_range(0, 0, 159) 0 entries of 256 used
[0.00] Entering add_active_range(0, 256, 229328) 1 entries of 256 used
[0.00] No mptable found.
[0.00] Zone PFN ranges:
[0.00]   DMA 0 -> 4096
[0.00]   DMA324096 ->  1048576
[0.00]   Normal1048576 ->  1048576
[0.00] Movable zone start PFN for each node
[0.00] early_node_map[2] active PFN ranges
[0.00] 0:0 ->  159
[0.00] 0:  256 ->   229328
[0.00] On node 0 totalpages: 229231
[0.00]   DMA zone: 56 pages used for memmap
[0.00]   DMA zone: 1911 pages reserved
[0.00]   DMA zone: 2032 pages, LIFO batch:0
[0.00]   DMA32 zone: 3079 pages used for memmap
[0.00]   DMA32 zone: 222153 pages, LIFO batch:31
[0.00]   Normal zone: 0 pages used for memmap
[0.00]   Movable zone: 0 pages used for memmap
[0.00] ATI board detected. Disabling timer routing over 8254.
[ 

Re: [patch 0/2] suspend/resume regression fixes

2007-09-22 Thread Linus Torvalds


On Sun, 23 Sep 2007, Oleg Verych wrote:
>
> As i never had any suspend working ever, let me point this LKML post
> 
> http://mid.gmane.org/[EMAIL PROTECTED]
> 
> from Mihal, who just managed to do some other magic in sligtly
> different context (maybe yet another anti "ACPI screwed up").
> 
> Mihai, you can find whole thread by above URL, requesting message-ids
> from in-reply-to or references headers of this message.

>From a "future behaviour" standpoint it would probably be interesting to 
hear whether Mihai can make his machine with not with the old IDE layer 
(which distributions are migrating away from) but with the ATA layer 
(libata) instead. It too should hopefully know about using ACPI to restore 
any ATA controller quirks.

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 0/2] suspend/resume regression fixes

2007-09-22 Thread Oleg Verych
* Sat, 22 Sep 2007 15:59:25 -0700 (PDT)

As i never had any suspend working ever, let me point this LKML post

http://mid.gmane.org/[EMAIL PROTECTED]

from Mihal, who just managed to do some other magic in sligtly
different context (maybe yet another anti "ACPI screwed up").

Mihai, you can find whole thread by above URL, requesting message-ids
from in-reply-to or references headers of this message.
_
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 0/2] suspend/resume regression fixes

2007-09-22 Thread Thomas Gleixner
Linus,

On Sat, 2007-09-22 at 15:59 -0700, Linus Torvalds wrote:
> > My final enlightment was, when I removed the ACPI processor module,
> > which controls the lower idle C-states, right before resume; this
> > worked fine all the time even without all the workaround hacks.
> > 
> > I really hope that this two patches finally set an end to the "jinxed
> > VAIO heisenbug series", which started when we removed the periodic
> > tick with the clockevents/dyntick patches.
> 
> Ok, so the patches look fine, but I somehow have this slight feeling that 
> you gave up a bit too soon on the "*why* does this happen?" question.

Yeah, I gave up at the point where I was not longer able to dig
deeper :)

> I realize that the answer is easily "because ACPI screwed up", but I'm 
> wondering if there's something we do to trigger that screw-up.

Fair enough.

> In particular, I also suspect that this may not really fix the problem - 
> maybe it just makes the window sufficiently small that it no longer 
> triggers. Because we don't necessarily understand what the real background 
> for the problem is, I'm not sure we can say that it is solved.
> 
> The reason I say this is that I have a suspicion on what triggers it.
> 
> I suspect that the problem is that we do
> 
>   pm_ops->prepare();
>   disable_nonboot_cpus()
>   suspend_enter();
>   enable_nonboot_cpus()
>   pm_finish()
> 
> and here the big thing to notice is that "pm_ops->prepare()" call, which 
> sets the wakup vector etc etc.
> 
> So maybe the real problem here is that once we've done the "->prepare()" 
> call and ACPI has set up various stuff, we MUST NOT do any calls to any 
> ACPI routines to set low-power states, because the stupid firmware isn't 
> expecting it.

That's what I suspect and deduced from the various experiments including
a force the cpu into a lower c-state one, which triggered the problem
fully reproducible. Note that in case of the "force a lower c-state" I
verified, that the PIT was activated to avoid the local apic stops in c3
issue. But I never got an PIT interrupt. Either the box was completely
stuck or I was able to recover by hitting a key, which is as well one of
the unexplained phenomenons.

> Now, if this is the cause, then I think your patch should indeed fix it, 
> since you get called by the early-suspend code (which happens *before* the 
> "->prepare()" call), but at the same time, I wonder if maybe it would be 
> slightly "more correct" to instead of using the suspend/resume callbacks, 
> simply do this in the "acpi_pm_prepare()" stage, since that is likely the 
> thing that triggers it?

Yeah, probably that's the correct point, but I leave this to the ACPI
wizards.

> But hey, I think I'll apply the patches as-is. I'd just feel even better 
> if we actually understood *why* doing the CPU Cx states is not something 
> we can do around the suspend code!

That needs some explanation of the folks who can actually look beyond
the ACPI/BIOS internals.

tglx


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 0/2] suspend/resume regression fixes

2007-09-22 Thread Linus Torvalds


On Sat, 22 Sep 2007, Thomas Gleixner wrote:
> 
> My final enlightment was, when I removed the ACPI processor module,
> which controls the lower idle C-states, right before resume; this
> worked fine all the time even without all the workaround hacks.
> 
> I really hope that this two patches finally set an end to the "jinxed
> VAIO heisenbug series", which started when we removed the periodic
> tick with the clockevents/dyntick patches.

Ok, so the patches look fine, but I somehow have this slight feeling that 
you gave up a bit too soon on the "*why* does this happen?" question.

I realize that the answer is easily "because ACPI screwed up", but I'm 
wondering if there's something we do to trigger that screw-up.

In particular, I also suspect that this may not really fix the problem - 
maybe it just makes the window sufficiently small that it no longer 
triggers. Because we don't necessarily understand what the real background 
for the problem is, I'm not sure we can say that it is solved.

The reason I say this is that I have a suspicion on what triggers it.

I suspect that the problem is that we do

pm_ops->prepare();
disable_nonboot_cpus()
suspend_enter();
enable_nonboot_cpus()
pm_finish()

and here the big thing to notice is that "pm_ops->prepare()" call, which 
sets the wakup vector etc etc.

So maybe the real problem here is that once we've done the "->prepare()" 
call and ACPI has set up various stuff, we MUST NOT do any calls to any 
ACPI routines to set low-power states, because the stupid firmware isn't 
expecting it.

Now, if this is the cause, then I think your patch should indeed fix it, 
since you get called by the early-suspend code (which happens *before* the 
"->prepare()" call), but at the same time, I wonder if maybe it would be 
slightly "more correct" to instead of using the suspend/resume callbacks, 
simply do this in the "acpi_pm_prepare()" stage, since that is likely the 
thing that triggers it?

But hey, I think I'll apply the patches as-is. I'd just feel even better 
if we actually understood *why* doing the CPU Cx states is not something 
we can do around the suspend code!

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[patch 0/2] suspend/resume regression fixes

2007-09-22 Thread Thomas Gleixner
Sorry, it took me quite a while to realize the real root cause of the
VAIO - and probably many other machines - suspend/resume regressions,
which were unearthed by the dyntick / clockevents patches.

We disable a lot of ACPI/BIOS functionality during suspend, but we
keep the lower idle C-states functionality active across
suspend/resume. It seems that this causes trouble with certain BIOSes,
but I assume that the problem is more wide spread and just not
surfacing due to the various scenarios in which a machine goes into
suspend/resume. I spent some quality time to figure out a set of debug
mechanisms, which did not influence the problem. So it is quite likely
that a lot of machines might be affected by this, but due to the
configuration, interrupt scenarios,  the problem just does
not show up. 

My final enlightment was, when I removed the ACPI processor module,
which controls the lower idle C-states, right before resume; this
worked fine all the time even without all the workaround hacks.

I really hope that this two patches finally set an end to the "jinxed
VAIO heisenbug series", which started when we removed the periodic
tick with the clockevents/dyntick patches.

Venki, can you please add the analogous fix to the cpuidle patch set ?

Thanks,

tglx
-- 

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[patch 0/2] suspend/resume regression fixes

2007-09-22 Thread Thomas Gleixner
Sorry, it took me quite a while to realize the real root cause of the
VAIO - and probably many other machines - suspend/resume regressions,
which were unearthed by the dyntick / clockevents patches.

We disable a lot of ACPI/BIOS functionality during suspend, but we
keep the lower idle C-states functionality active across
suspend/resume. It seems that this causes trouble with certain BIOSes,
but I assume that the problem is more wide spread and just not
surfacing due to the various scenarios in which a machine goes into
suspend/resume. I spent some quality time to figure out a set of debug
mechanisms, which did not influence the problem. So it is quite likely
that a lot of machines might be affected by this, but due to the
configuration, interrupt scenarios,  the problem just does
not show up. 

My final enlightment was, when I removed the ACPI processor module,
which controls the lower idle C-states, right before resume; this
worked fine all the time even without all the workaround hacks.

I really hope that this two patches finally set an end to the jinxed
VAIO heisenbug series, which started when we removed the periodic
tick with the clockevents/dyntick patches.

Venki, can you please add the analogous fix to the cpuidle patch set ?

Thanks,

tglx
-- 

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 0/2] suspend/resume regression fixes

2007-09-22 Thread Linus Torvalds


On Sat, 22 Sep 2007, Thomas Gleixner wrote:
 
 My final enlightment was, when I removed the ACPI processor module,
 which controls the lower idle C-states, right before resume; this
 worked fine all the time even without all the workaround hacks.
 
 I really hope that this two patches finally set an end to the jinxed
 VAIO heisenbug series, which started when we removed the periodic
 tick with the clockevents/dyntick patches.

Ok, so the patches look fine, but I somehow have this slight feeling that 
you gave up a bit too soon on the *why* does this happen? question.

I realize that the answer is easily because ACPI screwed up, but I'm 
wondering if there's something we do to trigger that screw-up.

In particular, I also suspect that this may not really fix the problem - 
maybe it just makes the window sufficiently small that it no longer 
triggers. Because we don't necessarily understand what the real background 
for the problem is, I'm not sure we can say that it is solved.

The reason I say this is that I have a suspicion on what triggers it.

I suspect that the problem is that we do

pm_ops-prepare();
disable_nonboot_cpus()
suspend_enter();
enable_nonboot_cpus()
pm_finish()

and here the big thing to notice is that pm_ops-prepare() call, which 
sets the wakup vector etc etc.

So maybe the real problem here is that once we've done the -prepare() 
call and ACPI has set up various stuff, we MUST NOT do any calls to any 
ACPI routines to set low-power states, because the stupid firmware isn't 
expecting it.

Now, if this is the cause, then I think your patch should indeed fix it, 
since you get called by the early-suspend code (which happens *before* the 
-prepare() call), but at the same time, I wonder if maybe it would be 
slightly more correct to instead of using the suspend/resume callbacks, 
simply do this in the acpi_pm_prepare() stage, since that is likely the 
thing that triggers it?

But hey, I think I'll apply the patches as-is. I'd just feel even better 
if we actually understood *why* doing the CPU Cx states is not something 
we can do around the suspend code!

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 0/2] suspend/resume regression fixes

2007-09-22 Thread Thomas Gleixner
Linus,

On Sat, 2007-09-22 at 15:59 -0700, Linus Torvalds wrote:
  My final enlightment was, when I removed the ACPI processor module,
  which controls the lower idle C-states, right before resume; this
  worked fine all the time even without all the workaround hacks.
  
  I really hope that this two patches finally set an end to the jinxed
  VAIO heisenbug series, which started when we removed the periodic
  tick with the clockevents/dyntick patches.
 
 Ok, so the patches look fine, but I somehow have this slight feeling that 
 you gave up a bit too soon on the *why* does this happen? question.

Yeah, I gave up at the point where I was not longer able to dig
deeper :)

 I realize that the answer is easily because ACPI screwed up, but I'm 
 wondering if there's something we do to trigger that screw-up.

Fair enough.

 In particular, I also suspect that this may not really fix the problem - 
 maybe it just makes the window sufficiently small that it no longer 
 triggers. Because we don't necessarily understand what the real background 
 for the problem is, I'm not sure we can say that it is solved.
 
 The reason I say this is that I have a suspicion on what triggers it.
 
 I suspect that the problem is that we do
 
   pm_ops-prepare();
   disable_nonboot_cpus()
   suspend_enter();
   enable_nonboot_cpus()
   pm_finish()
 
 and here the big thing to notice is that pm_ops-prepare() call, which 
 sets the wakup vector etc etc.
 
 So maybe the real problem here is that once we've done the -prepare() 
 call and ACPI has set up various stuff, we MUST NOT do any calls to any 
 ACPI routines to set low-power states, because the stupid firmware isn't 
 expecting it.

That's what I suspect and deduced from the various experiments including
a force the cpu into a lower c-state one, which triggered the problem
fully reproducible. Note that in case of the force a lower c-state I
verified, that the PIT was activated to avoid the local apic stops in c3
issue. But I never got an PIT interrupt. Either the box was completely
stuck or I was able to recover by hitting a key, which is as well one of
the unexplained phenomenons.

 Now, if this is the cause, then I think your patch should indeed fix it, 
 since you get called by the early-suspend code (which happens *before* the 
 -prepare() call), but at the same time, I wonder if maybe it would be 
 slightly more correct to instead of using the suspend/resume callbacks, 
 simply do this in the acpi_pm_prepare() stage, since that is likely the 
 thing that triggers it?

Yeah, probably that's the correct point, but I leave this to the ACPI
wizards.

 But hey, I think I'll apply the patches as-is. I'd just feel even better 
 if we actually understood *why* doing the CPU Cx states is not something 
 we can do around the suspend code!

That needs some explanation of the folks who can actually look beyond
the ACPI/BIOS internals.

tglx


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 0/2] suspend/resume regression fixes

2007-09-22 Thread Oleg Verych
* Sat, 22 Sep 2007 15:59:25 -0700 (PDT)

As i never had any suspend working ever, let me point this LKML post

http://mid.gmane.org/[EMAIL PROTECTED]

from Mihal, who just managed to do some other magic in sligtly
different context (maybe yet another anti ACPI screwed up).

Mihai, you can find whole thread by above URL, requesting message-ids
from in-reply-to or references headers of this message.
_
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 0/2] suspend/resume regression fixes

2007-09-22 Thread Linus Torvalds


On Sun, 23 Sep 2007, Oleg Verych wrote:

 As i never had any suspend working ever, let me point this LKML post
 
 http://mid.gmane.org/[EMAIL PROTECTED]
 
 from Mihal, who just managed to do some other magic in sligtly
 different context (maybe yet another anti ACPI screwed up).
 
 Mihai, you can find whole thread by above URL, requesting message-ids
 from in-reply-to or references headers of this message.

From a future behaviour standpoint it would probably be interesting to 
hear whether Mihai can make his machine with not with the old IDE layer 
(which distributions are migrating away from) but with the ATA layer 
(libata) instead. It too should hopefully know about using ACPI to restore 
any ATA controller quirks.

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 0/2] suspend/resume regression fixes

2007-09-22 Thread Mihai Donțu
On Sunday 23 September 2007, Linus Torvalds wrote:
 From a future behaviour standpoint it would probably be interesting to 
 hear whether Mihai can make his machine with not with the old IDE layer 
 (which distributions are migrating away from) but with the ATA layer 
 (libata) instead. It too should hopefully know about using ACPI to restore 
 any ATA controller quirks.

  I switched to libata, but it behaves like the old IDE without ACPI. I
  did not manage to get a full dmesg (apparently all volumes are mounted
  r/o right after a power up from a s2ram) but I did make a picture, from
  which I quote (if I may say so):

  
  ata1.00: configured for PIO0
  sd 0:0:0:0: [sda] Result: hostbyte=0x00 driverbyte=0x08
  sd 0:0:0:0: [sda] Sense key : 0xb [current] [descriptor]
  Descriptor sense data with sense descriptors (in hex):
  72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
  07 65 35 25
  sd 0:0:0:0: [sda] ASC=0x0 ASCQ=0x0
  end_request: I/O error, dev sda, sector 124073509
  ata1: EH complete
  sd 0:0:0:0: [sda] Write Protect is off
  sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
  ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
  ata1.00: cmd c5/00:10:75:36:65/00:00:00:00:00/e7 tag 0 cdb 0x0 data 8192 out
   res 51/04:10:75:36:65/00:00:00:00:00/e7 Emask 0x1 (device error)
  ata1.00: configured for PIO0
  ata1: EH complete
  

  The last six lines repeat six times after which the whole things goes
  from the beginning:
  
  ata1.00: configured for PIO0
  ...
  

  It all gets crazy the moment I (or a process) try to access the root
  (or any other drive), until then, everything is nice and quiet.

  Mmm... in the excerpt above it says: Write Protect is off but when I did
  $ mount -o remount,rw /
  I got something like: the device is write protected.

  I tried to save the dmesg on a mmc, but after powering up it said:
  out of disk space

  These are about all symptoms that I noticed... oh, and 'scsi_eh_0/1'
  enters disk-sleep often.

  I attached the dmesg pre s2ram.

  Thanks,

-- 
Mihai Donțu
[0.00] Linux version 2.6.23-rc7 ([EMAIL PROTECTED]) (gcc version 4.1.2 
(Gentoo 4.1.2)) #10 PREEMPT Sun Sep 23 06:54:24 EEST 2007
[0.00] Command line: nohz=on root=/dev/sda7
[0.00] BIOS-provided physical RAM map:
[0.00]  BIOS-e820:  - 0009fc00 (usable)
[0.00]  BIOS-e820: 0009fc00 - 000a (reserved)
[0.00]  BIOS-e820: 000e - 0010 (reserved)
[0.00]  BIOS-e820: 0010 - 37fd (usable)
[0.00]  BIOS-e820: 37fd - 37fefc00 (reserved)
[0.00]  BIOS-e820: 37fefc00 - 37ffb000 (ACPI NVS)
[0.00]  BIOS-e820: 37ffb000 - 4000 (reserved)
[0.00]  BIOS-e820: e000 - f000 (reserved)
[0.00]  BIOS-e820: fec0 - fec02000 (reserved)
[0.00]  BIOS-e820: ffb8 - ffc0 (reserved)
[0.00]  BIOS-e820: fff8 - 0001 (reserved)
[0.00] Entering add_active_range(0, 0, 159) 0 entries of 256 used
[0.00] Entering add_active_range(0, 256, 229328) 1 entries of 256 used
[0.00] end_pfn_map = 1048576
[0.00] DMI 2.3 present.
[0.00] ACPI: RSDP 000FE270, 0014 (r0 HP)
[0.00] ACPI: RSDT 37FEFC84, 0034 (r1 HP 0944 22110520 HP
  1)
[0.00] ACPI: FACP 37FEFC00, 0084 (r2 HP 09442 HP
  1)
[0.00] ACPI: DSDT 37FEFD50, 7489 (r1 HPSB4001 MSFT  
10E)
[0.00] ACPI: FACS 37FFAE80, 0040
[0.00] ACPI: APIC 37FEFCB8, 005A (r1 HP 09441 HP
  1)
[0.00] ACPI: MCFG 37FEFD14, 003C (r1 HP 09441 HP
  1)
[0.00] ACPI: SSDT 37FF71D9, 0382 (r1 HP   HPQPpc 1001 MSFT  
10E)
[0.00] Entering add_active_range(0, 0, 159) 0 entries of 256 used
[0.00] Entering add_active_range(0, 256, 229328) 1 entries of 256 used
[0.00] No mptable found.
[0.00] Zone PFN ranges:
[0.00]   DMA 0 - 4096
[0.00]   DMA324096 -  1048576
[0.00]   Normal1048576 -  1048576
[0.00] Movable zone start PFN for each node
[0.00] early_node_map[2] active PFN ranges
[0.00] 0:0 -  159
[0.00] 0:  256 -   229328
[0.00] On node 0 totalpages: 229231
[0.00]   DMA zone: 56 pages used for memmap
[0.00]   DMA zone: 1911 pages reserved
[0.00]   DMA zone: 2032 pages, LIFO batch:0
[0.00]   DMA32 zone: 3079 pages used for memmap
[0.00]   DMA32 zone: 222153 pages, LIFO batch:31
[0.00]   Normal zone: 0 pages used for memmap
[0.00]   Movable zone: 0 pages used for memmap
[0.00] ATI board detected. Disabling timer routing over 8254.
[0.00] ACPI: