Re: resuming swsusp twice

2005-07-16 Thread Pavel Machek
Hi!

> > I of course won't say that this cannot happen, but by design, the swsusp
> > signature is invalidated even before reading the image, so theoretically
> > it should not happen.
> 
> Yes, I'd seen that happen on earlier swsusps, so I was quite surprised
> when it blew up like this.
> 
> Perhaps the image should be more rigorously checked?  I'm wishing that
> it would verify that the header and the image matched, after it finishes
> reading the image.  For example, computing the hash
> 
> MD5(header || image) (|| denotes "concatenate" in crypto pseudocode.)
> 
> and storing that hash in a final trailing block.  Additionally, of
> course, as soon as the resume has read the image it should overwrite the
> header; and the header should include jiffies or something along those
> lines to ensure that it won't accidentally have the same contents as the
> previous image's header.
> 
> The hash doesn't have to be MD5; even a CRC should suffice I think...

Actually, what you want is "if filesystems are newer than suspend
image, panic" test. There is more than one way how that can happen.

Are you sure you did not do

suspend kernel 1
boot kernel 2
attempt to suspend kernel 2 but fail ("not enough swap space")
boot kernel 1 ("and successfully resume, corrupting data")

?
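
A minimal sketch of the "filesystems newer than suspend image" test, assuming the image header records when it was written and the filesystem records its last write. The names `SuspendImage`, `created_at`, `fs_last_write` and `check_resume` are hypothetical and the timestamps are made up; this is not how swsusp is implemented.

```python
from dataclasses import dataclass


class StaleImageError(RuntimeError):
    """Raised when a filesystem changed after the image was written."""


@dataclass
class SuspendImage:
    kernel: str
    created_at: int  # hypothetical timestamp stored in the image header


def check_resume(image: SuspendImage, fs_last_write: int) -> None:
    """Refuse to resume if the filesystem is newer than the image."""
    if fs_last_write > image.created_at:
        raise StaleImageError(
            f"filesystem written at {fs_last_write}, "
            f"image created at {image.created_at}"
        )


# The four-step scenario above, replayed with made-up timestamps.
image = SuspendImage(kernel="kernel 1", created_at=100)  # suspend kernel 1
fs_last_write = 200  # boot kernel 2, which writes to the filesystem
# The attempt to suspend kernel 2 fails, so the old image stays in place.
try:
    check_resume(image, fs_last_write)  # boot kernel 1
    outcome = "resumed"
except StaleImageError:
    outcome = "refused"
```

With a check like this the last step is refused rather than resumed, whatever the reason the stale image was left on disk.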
Pavel
-- 
teflon -- maybe it is a trademark, but it should not be.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: resuming swsusp twice

2005-07-16 Thread Pavel Machek
Hi!

> > I of course won't say that this cannot happen, but by design, the swsusp
> > signature is invalidated even before reading the image, so theoretically
> > it should not happen.
> 
> Yes, I'd seen that happen on earlier swsusps, so I was quite surprised
> when it blew up like this.
> 
> Perhaps the image should be more rigorously checked?  I'm wishing that
> it would verify that the header and the image matched, after it finishes
> reading the image.  For example, computing the hash
> 
> MD5(header || image) (|| denotes "concatenate" in crypto pseudocode.)
> 
> and storing that hash in a final trailing block.  Additionally, of
> course, as soon as the resume has read the image it should overwrite the
> header; and the header should include jiffies or something along those
> lines to ensure that it won't accidentally have the same contents as the
> previous image's header.
> 
> The hash doesn't have to be MD5; even a CRC should suffice I think...

That's quite a lot of complexity... just fix the bug.

Pavel
-- 
teflon -- maybe it is a trademark, but it should not be.


Re: resuming swsusp twice

2005-07-16 Thread Pavel Machek
Hi!

> Yesterday I booted my laptop to 2.6.13-rc2-mm1, suspended to swsusp, and
> then resumed.  It ran fine overnight, including a fair amount of IO
> (running firefox, rsyncing ~/Mail/archive from my mail server, hg pull,
> etc).  This morning I did a swsusp:
> 
>   echo shutdown > /sys/power/disk
>   echo disk > /sys/power/state
> 
> and got a panic along the lines of "Unable to find swap space, try
> swapon -a".  Unfortunately I was in a hurry and didn't record the error
> messages.  I powered off, then a few minutes later powered on again.
> 
> At this point, it resumed *to the swsusp state from yesterday*!
> As soon as I realized what had happened, I powered off (not
> shutdown) and rebooted.

Bad, very bad.

> On the next boot it did not find a swsusp signature and booted normally;
> ext3 did a normal recovery and seemed OK, but I was suspicious and did a
> fsck -f, which revealed a lot of damage; most of the damage seemed to be
> in the hg repo which had been pulled from www.kernel.org/hg/.

You should not let ext3 do journal replay. At that point, hopefully
damage will be slightly better. 

> It's extremely unfortunate that there is *any* failure mode in swsusp
> that can result in this behavior.

Well, I've never seen that one before...
Pavel
-- 
teflon -- maybe it is a trademark, but it should not be.


Re: resuming swsusp twice

2005-07-14 Thread Andy Isaacson
On Thu, Jul 14, 2005 at 08:36:15PM +0200, Stefan Seyfried wrote:
> But the failure you have seen now - failure to invalidate the resume
> header - could also happen as long as we do not fix the reason for your
> failure. If we fix it, we don't need additional security nets ;-)

So if the header is overwritten before the pages are read back in, that
implies that the overwriting IO did not get to disk in my failing case.
Since plenty of other IO did end up on disk (scribbling on my ext3 in the
process), I wonder what could be different there...
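
A minimal sketch of the ordering discussed here, using an ordinary file to stand in for the swap partition. The marker value, the file layout and the `resume` helper are assumptions made for illustration, not the swsusp on-disk format. The signature is overwritten and flushed before any of the image is read, so a second attempt finds nothing to resume.

```python
import os
import tempfile

SIGNATURE = b"S1SUSPEND"            # marker value chosen for this sketch
INVALID = b"\0" * len(SIGNATURE)


def resume(path):
    """Return the image payload, or None if no valid signature is present."""
    with open(path, "r+b") as f:
        if f.read(len(SIGNATURE)) != SIGNATURE:
            return None
        # Invalidate first, then push the write towards stable storage
        # before reading any of the image.
        f.seek(0)
        f.write(INVALID)
        f.flush()
        os.fsync(f.fileno())
        f.seek(len(SIGNATURE))
        return f.read()


# Simulate one suspend followed by two resume attempts.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(SIGNATURE + b"image pages")

first = resume(path)    # finds the signature and returns the payload
second = resume(path)   # signature already invalidated
os.unlink(path)
```

If the invalidating write never reaches the disk, as seems to have happened in the failing case, the second attempt would find the old signature still valid.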

> But i have no idea what went wrong for you, i'll have a look at the code
> but i doubt that i'll find much of interest.
> 
> One thing which would be interesting:
> You don't eventually have multiple swap partitions?

One root partition, one swap partition, no swap files or anything.

The only interesting thing I can think of is that my swap partition is
only 512MB while the machine has 1.25GB RAM.  (Installed Ubuntu and took
the defaults before installing the SODIMM.)
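
A small sketch for comparing swap size with RAM, assuming the usual `/proc/meminfo` layout of `Name: value kB` lines. The helper names are hypothetical, and the sample text only approximates the machine described above.

```python
def parse_meminfo(text: str) -> dict:
    """Map /proc/meminfo field names to their numeric values."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return fields


def swap_to_ram_ratio(text: str) -> float:
    """Return SwapTotal divided by MemTotal."""
    info = parse_meminfo(text)
    return info["SwapTotal"] / info["MemTotal"]


# Sample values approximating 1.25GB of RAM and 512MB of swap.
sample = """MemTotal:       1310720 kB
SwapTotal:       524288 kB
"""
ratio = swap_to_ram_ratio(sample)
```

On a live system the text would come from reading `/proc/meminfo`. A ratio below 1 only shows that swap cannot hold a full copy of RAM; it does not by itself show that a suspend will fail.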

FWIW, I have suspended and resumed a few times since the failure and
haven't seen a repeat of the problem.  I am seeing some other problems
with 2.6.13-rc2-mm1 that I didn't see before - DRM/i830 lockups after
swsusp - that might be masking the problem, but I have done the
boot-swsusp-resume-swsusp-resume successfully.

I'm at a loss as to what I might have done to trigger the problem.

-andy


Re: resuming swsusp twice

2005-07-14 Thread Stefan Seyfried
Andy Isaacson wrote:

> Perhaps the image should be more rigorously checked?  I'm wishing that
> it would verify that the header and the image matched, after it finishes

in your case, the header and the image matched. There was no new image
on disk. And no new header.

> reading the image.  For example, computing the hash
> 
> MD5(header || image) (|| denotes "concatenate" in crypto pseudocode.)
> 
> and storing that hash in a final trailing block.  Additionally, of
> course, as soon as the resume has read the image it should overwrite the
> header; and the header should include jiffies or something along those

the header is actually overwritten _prior_ to reading the image back. Or
it should be, obviously it was not in your case.

> lines to ensure that it won't accidentally have the same contents as the
> previous image's header.
> 
> The hash doesn't have to be MD5; even a CRC should suffice I think...

But the failure you have seen now - failure to invalidate the resume
header - could also happen as long as we do not fix the reason for your
failure. If we fix it, we don't need additional security nets ;-)

But i have no idea what went wrong for you, i'll have a look at the code
but i doubt that i'll find much of interest.

One thing which would be interesting:
You don't eventually have multiple swap partitions?
-- 
Stefan Seyfried  \ "I didn't want to write for pay. I
QA / R&D Team Mobile Devices  \ wanted to be paid for what I write."
SUSE LINUX Products GmbH, Nürnberg \-- Leonard Cohen


Re: resuming swsusp twice

2005-07-14 Thread Andy Isaacson
On Thu, Jul 14, 2005 at 04:58:12PM +0200, Stefan Seyfried wrote:
> Andy Isaacson wrote:
> > Yesterday I booted my laptop to 2.6.13-rc2-mm1, suspended to swsusp, and
[snip]
> > and got a panic along the lines of "Unable to find swap space, try
>
> a panic? it should only be an error message, but the machine should
> still be alive.

Well, the console was left on the swsusp VT (guess that's not surprising)
and I was hurrying to catch the train, so I didn't investigate, I just
held down the power button for 5 seconds.

> > swapon -a".  Unfortunately I was in a hurry and didn't record the error
> > messages.  I powered off, then a few minutes later powered on again.
>
> Powered off hard or "shutdown -h now"?

Hard.  It's a Thinkpad X40 with ACPI, so I hold down the power button
for a few seconds to power off.

> > At this point, it resumed *to the swsusp state from yesterday*!
[snip severe ext3 damage]
> > It's extremely unfortunate that there is *any* failure mode in swsusp
> > that can result in this behavior.
>
> I of course won't say that this cannot happen, but by design, the swsusp
> signature is invalidated even before reading the image, so theoretically
> it should not happen.

Yes, I'd seen that happen on earlier swsusps, so I was quite surprised
when it blew up like this.

Perhaps the image should be more rigorously checked?  I'm wishing that
it would verify that the header and the image matched, after it finishes
reading the image.  For example, computing the hash

MD5(header || image) (|| denotes "concatenate" in crypto pseudocode.)

and storing that hash in a final trailing block.  Additionally, of
course, as soon as the resume has read the image it should overwrite the
header; and the header should include jiffies or something along those
lines to ensure that it won't accidentally have the same contents as the
previous image's header.

The hash doesn't have to be MD5; even a CRC should suffice I think...
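
A minimal sketch of the check proposed above, using CRC-32 in place of MD5 and a `jiffies`-style counter in the header. The header layout and the names `make_header`, `seal` and `verify` are assumptions made for illustration, not the swsusp format.

```python
import struct
import zlib


def make_header(jiffies: int) -> bytes:
    """Build a header whose contents differ from one suspend to the next."""
    return b"SWSUSP" + struct.pack("<Q", jiffies)


def seal(header: bytes, image: bytes) -> bytes:
    """Return the trailing block: CRC-32 of header || image."""
    return struct.pack("<I", zlib.crc32(header + image))


def verify(header: bytes, image: bytes, trailer: bytes) -> bool:
    """Check that the trailing block matches this header and image."""
    return trailer == seal(header, image)


# Yesterday's suspend: header, image and trailer all written together.
old_header = make_header(jiffies=1000)
old_image = b"pages from yesterday"
old_trailer = seal(old_header, old_image)

# A header from a later suspend paired with the old image and trailer.
new_header = make_header(jiffies=2000)

matching = verify(old_header, old_image, old_trailer)
mismatched = verify(new_header, old_image, old_trailer)
```

As Stefan Seyfried points out in his reply, a check of this kind still passes when an old header and its own old image are both left on disk; it only catches a header and image that do not belong together.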

-andy


Re: resuming swsusp twice

2005-07-14 Thread Stefan Seyfried
Andy Isaacson wrote:
> Yesterday I booted my laptop to 2.6.13-rc2-mm1, suspended to swsusp, and
> then resumed.  It ran fine overnight, including a fair amount of IO
> (running firefox, rsyncing ~/Mail/archive from my mail server, hg pull,
> etc).  This morning I did a swsusp:
> 
>   echo shutdown > /sys/power/disk
>   echo disk > /sys/power/state
> 
> and got a panic along the lines of "Unable to find swap space, try

a panic? it should only be an error message, but the machine should
still be alive.

> swapon -a".  Unfortunately I was in a hurry and didn't record the error
> messages.  I powered off, then a few minutes later powered on again.

Powered off hard or "shutdown -h now"?

> At this point, it resumed *to the swsusp state from yesterday*!
> As soon as I realized what had happened, I powered off (not
> shutdown) and rebooted.

Good.

> On the next boot it did not find a swsusp signature and booted normally;
> ext3 did a normal recovery and seemed OK, but I was suspicious and did a
> fsck -f, which revealed a lot of damage; most of the damage seemed to be

this is expected in this case, unfortunately.

> in the hg repo which had been pulled from www.kernel.org/hg/.
> 
> It's extremely unfortunate that there is *any* failure mode in swsusp
> that can result in this behavior.

I of course won't say that this cannot happen, but by design, the swsusp
signature is invalidated even before reading the image, so theoretically
it should not happen.

> I will try to reproduce, but I'm curious if anyone else has seen this.

i have not seen anything like that, but i am not always running the
latest & greatest kernel.
-- 
Stefan Seyfried  \ "I didn't want to write for pay. I
QA / R&D Team Mobile Devices  \ wanted to be paid for what I write."
SUSE LINUX Products GmbH, Nürnberg \-- Leonard Cohen


resuming swsusp twice

2005-07-13 Thread Andy Isaacson
Yesterday I booted my laptop to 2.6.13-rc2-mm1, suspended to swsusp, and
then resumed.  It ran fine overnight, including a fair amount of IO
(running firefox, rsyncing ~/Mail/archive from my mail server, hg pull,
etc).  This morning I did a swsusp:

echo shutdown > /sys/power/disk
echo disk > /sys/power/state

and got a panic along the lines of "Unable to find swap space, try
swapon -a".  Unfortunately I was in a hurry and didn't record the error
messages.  I powered off, then a few minutes later powered on again.

At this point, it resumed *to the swsusp state from yesterday*!
As soon as I realized what had happened, I powered off (not
shutdown) and rebooted.

On the next boot it did not find a swsusp signature and booted normally;
ext3 did a normal recovery and seemed OK, but I was suspicious and did a
fsck -f, which revealed a lot of damage; most of the damage seemed to be
in the hg repo which had been pulled from www.kernel.org/hg/.

It's extremely unfortunate that there is *any* failure mode in swsusp
that can result in this behavior.

I will try to reproduce, but I'm curious if anyone else has seen this.

-andy

