Re: resuming swsusp twice
Hi!

> > I of course won't say that this cannot happen, but by design, the
> > swsusp signature is invalidated even before reading the image, so
> > theoretically it should not happen.
>
> Yes, I'd seen that happen on earlier swsusps, so I was quite surprised
> when it blew up like this.
>
> Perhaps the image should be more rigorously checked? I'm wishing that
> it would verify that the header and the image matched, after it finishes
> reading the image. For example, computing the hash
>
>     MD5(header || image)    (|| denotes "concatenate" in crypto pseudocode.)
>
> and storing that hash in a final trailing block. Additionally, of
> course, as soon as the resume has read the image it should overwrite the
> header; and the header should include jiffies or something along those
> lines to ensure that it won't accidentally have the same contents as the
> previous image's header.
>
> The hash doesn't have to be MD5; even a CRC should suffice I think...

Actually, what you want is an "if filesystems are newer than suspend
image, panic" test. There is more than one way that can happen. Are you
sure you did not do

suspend kernel 1
boot kernel 2
attempt to suspend kernel 2 but fail ("not enough swap space")
boot kernel 1 ("and successfully resume, corrupting data")

?

Pavel
--
teflon -- maybe it is a trademark, but it should not be.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
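Below is a minimal C sketch of the "filesystems newer than suspend image" test Pavel describes. It makes two hypothetical assumptions. The first is that the image header records when the image was written. The second is that resume can read each filesystem's last-mount and last-write times (ext2/ext3 superblocks do keep such timestamps). The struct and function names are illustrative and do not come from swsusp. A real check would also have to cope with the clock changing between boots.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: what the suspend side would record in the image header. */
struct image_info {
    uint64_t created;   /* seconds since the epoch when the image was written */
};

/* Hypothetical: timestamps resume could read from each filesystem's
 * superblock (ext2/ext3 keep a last-mount and a last-write time). */
struct fs_times {
    uint64_t last_mount;
    uint64_t last_write;
};

/* Returns true only if no filesystem was mounted or written after the
 * image was created. If this returns false, resuming would restore
 * in-memory filesystem state that is older than what is on disk. */
static bool image_newer_than_filesystems(const struct image_info *img,
                                         const struct fs_times *fs,
                                         size_t nfs)
{
    for (size_t i = 0; i < nfs; i++) {
        if (fs[i].last_mount > img->created ||
            fs[i].last_write > img->created)
            return false;
    }
    return true;
}
```

In Pavel's proposal, resume would panic instead of continuing whenever this returns false. That covers both his "boot kernel 2, then resume kernel 1" sequence and a second resume of an image that has already been used once.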
Re: resuming swsusp twice
Hi!

> > I of course won't say that this cannot happen, but by design, the
> > swsusp signature is invalidated even before reading the image, so
> > theoretically it should not happen.
>
> Yes, I'd seen that happen on earlier swsusps, so I was quite surprised
> when it blew up like this.
>
> Perhaps the image should be more rigorously checked? I'm wishing that
> it would verify that the header and the image matched, after it finishes
> reading the image. For example, computing the hash
>
>     MD5(header || image)    (|| denotes "concatenate" in crypto pseudocode.)
>
> and storing that hash in a final trailing block. Additionally, of
> course, as soon as the resume has read the image it should overwrite the
> header; and the header should include jiffies or something along those
> lines to ensure that it won't accidentally have the same contents as the
> previous image's header.
>
> The hash doesn't have to be MD5; even a CRC should suffice I think...

That's quite a lot of complexity... just fix the bug.

Pavel
Re: resuming swsusp twice
Hi!

> Yesterday I booted my laptop to 2.6.13-rc2-mm1, suspended to swsusp, and
> then resumed. It ran fine overnight, including a fair amount of IO
> (running firefox, rsyncing ~/Mail/archive from my mail server, hg pull,
> etc). This morning I did a swsusp:
>
> echo shutdown > /sys/power/disk
> echo disk > /sys/power/state
>
> and got a panic along the lines of "Unable to find swap space, try
> swapon -a". Unfortunately I was in a hurry and didn't record the error
> messages. I powered off, then a few minutes later powered on again.
>
> At this point, it resumed *to the swsusp state from yesterday*!
> As soon as I realized what had happened, I powered off (not
> shutdown) and rebooted.

Bad, very bad.

> On the next boot it did not find a swsusp signature and booted normally;
> ext3 did a normal recovery and seemed OK, but I was suspicious and did a
> fsck -f, which revealed a lot of damage; most of the damage seemed to be
> in the hg repo which had been pulled from www.kernel.org/hg/.

You should not let ext3 do journal replay at that point; that way, the
damage will hopefully be somewhat smaller.

> It's extremely unfortunate that there is *any* failure mode in swsusp
> that can result in this behavior.

Well, I've never seen that one before...

Pavel
Re: resuming swsusp twice
On Thu, Jul 14, 2005 at 08:36:15PM +0200, Stefan Seyfried wrote:
> But the failure you have seen now - failure to invalidate the resume
> header - could also happen as long as we do not fix the reason for your
> failure. If we fix it, we don't need additional security nets ;-)

So if the header is overwritten before the pages are read back in, that
implies that the overwriting IO did not get to disk in my failing case.
Since plenty of other IO did end up on disk (scribbling on my ext3 in the
process), I wonder what could be different there...

> But i have no idea what went wrong for you, i'll have a look at the code
> but i doubt that i'll find much of interest.
>
> One thing which would be interesting:
> You don't by any chance have multiple swap partitions?

One root partition, one swap partition, no swap files or anything. The
only interesting thing I can think of is that my swap partition is only
512MB while the machine has 1.25GB RAM. (Installed Ubuntu and took the
defaults before installing the SODIMM.)

FWIW, I have suspended and resumed a few times since the failure and
haven't seen a repeat of the problem. I am seeing some other problems
with 2.6.13-rc2-mm1 that I didn't see before - DRM/i830 lockups after
swsusp - that might be masking the problem, but I have done
boot-swsusp-resume-swsusp-resume successfully. I'm at a loss as to what I
might have done to trigger the problem.

-andy
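Andy's reasoning depends on whether the write that invalidates the signature actually reached the disk before the image was read. Below is a C sketch of that ordering in userspace style. It assumes a resume device that can be opened like a file, with a 10-byte signature in the last bytes of the first 4 KiB page. The Linux swap header keeps its "SWAPSPACE2" magic at that position. SIG_OFFSET and invalidate_signature() are hypothetical names, and the in-kernel swsusp code uses its own block I/O rather than these calls. fsync() hands the data to the device, but depending on the kernel and drive, a volatile write cache may still hold it.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical layout: signature in the last 10 bytes of a 4 KiB first page. */
#define SIG_OFFSET 4086L
#define SIG_LEN    10

/* Restore the original signature and wait until the write is durable.
 * Returns 0 on success, -1 with errno set on failure. */
static int invalidate_signature(const char *resume_dev,
                                const char orig_sig[SIG_LEN])
{
    int fd = open(resume_dev, O_WRONLY);
    if (fd < 0)
        return -1;

    ssize_t n = pwrite(fd, orig_sig, SIG_LEN, (off_t)SIG_OFFSET);
    if (n != SIG_LEN) {
        int saved = (n < 0) ? errno : EIO;  /* short write counts as I/O error */
        close(fd);
        errno = saved;
        return -1;
    }

    /* Without this, the write may sit in the page cache; a hard power-off
     * (as in Andy's case) could lose it and leave the old signature valid. */
    if (fsync(fd) != 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return close(fd);
}
```

In this sketch, resume would read the image only after invalidate_signature() returns 0. If it fails, resume should treat the failure as fatal rather than continue with a signature that might survive a crash.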
Re: resuming swsusp twice
Andy Isaacson wrote:
> Perhaps the image should be more rigorously checked? I'm wishing that
> it would verify that the header and the image matched, after it finishes

in your case, the header and the image matched. There was no new image on
disk. And no new header.

> reading the image. For example, computing the hash
>
>     MD5(header || image)    (|| denotes "concatenate" in crypto pseudocode.)
>
> and storing that hash in a final trailing block. Additionally, of
> course, as soon as the resume has read the image it should overwrite the
> header; and the header should include jiffies or something along those

the header is actually overwritten _prior_ to reading the image back. Or
it should be; obviously it was not in your case.

> lines to ensure that it won't accidentally have the same contents as the
> previous image's header.
>
> The hash doesn't have to be MD5; even a CRC should suffice I think...

But the failure you have seen now - failure to invalidate the resume
header - could also happen as long as we do not fix the reason for your
failure. If we fix it, we don't need additional security nets ;-)

But i have no idea what went wrong for you, i'll have a look at the code
but i doubt that i'll find much of interest.

One thing which would be interesting:
You don't by any chance have multiple swap partitions?
--
Stefan Seyfried                  \ "I didn't want to write for pay. I
QA / R&D Team Mobile Devices      \ wanted to be paid for what I write."
SUSE LINUX Products GmbH, Nürnberg \            -- Leonard Cohen
Re: resuming swsusp twice
On Thu, Jul 14, 2005 at 04:58:12PM +0200, Stefan Seyfried wrote:
> Andy Isaacson wrote:
> > Yesterday I booted my laptop to 2.6.13-rc2-mm1, suspended to swsusp,
> > and
[snip]
> > and got a panic along the lines of "Unable to find swap space, try
>
> a panic? it should only be an error message, but the machine should
> still be alive.

Well, the console was left on the swsusp VT (guess that's not surprising)
and I was hurrying to catch the train, so I didn't investigate; I just
held down the power button for 5 seconds.

> > swapon -a". Unfortunately I was in a hurry and didn't record the
> > error messages. I powered off, then a few minutes later powered on
> > again.
>
> Powered off hard or "shutdown -h now"?

Hard. It's a Thinkpad X40 with ACPI, so I hold down the power button for
a few seconds to power off.

> > At this point, it resumed *to the swsusp state from yesterday*!
[snip severe ext3 damage]
> > It's extremely unfortunate that there is *any* failure mode in
> > swsusp that can result in this behavior.
>
> I of course won't say that this cannot happen, but by design, the
> swsusp signature is invalidated even before reading the image, so
> theoretically it should not happen.

Yes, I'd seen that happen on earlier swsusps, so I was quite surprised
when it blew up like this.

Perhaps the image should be more rigorously checked? I'm wishing that
it would verify that the header and the image matched, after it finishes
reading the image. For example, computing the hash

    MD5(header || image)    (|| denotes "concatenate" in crypto pseudocode.)

and storing that hash in a final trailing block. Additionally, of
course, as soon as the resume has read the image it should overwrite the
header; and the header should include jiffies or something along those
lines to ensure that it won't accidentally have the same contents as the
previous image's header.

The hash doesn't have to be MD5; even a CRC should suffice I think...

-andy
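Below is a minimal C sketch of Andy's header || image check. It uses CRC-32, which he suggests would suffice, in place of MD5. A nonce such as jiffies goes into the header so that two headers never match by accident. The structs, field names and helpers are hypothetical. As Stefan points out in his reply, the stale header and image in this failure matched each other. A check like this therefore catches a torn or mismatched image but not an intact old one. That case still depends on the signature being invalidated.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header: the nonce (e.g. jiffies at suspend time) makes
 * each image's header differ from the previous one's. */
struct image_header {
    uint64_t nonce;
    uint64_t nr_pages;
};

/* Hypothetical trailing block written after the last image page. */
struct image_trailer {
    uint32_t crc;   /* CRC-32 over header || image */
};

/* Bitwise CRC-32 (IEEE, reflected polynomial 0xEDB88320).
 * Start with crc = 0; calls can be chained to cover concatenated data. */
static uint32_t crc32_update(uint32_t crc, const void *data, size_t len)
{
    const unsigned char *p = data;
    crc = ~crc;
    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* CRC of header || image: computed at suspend, stored in the trailer. */
static uint32_t image_crc(const struct image_header *h,
                          const void *image, size_t len)
{
    uint32_t crc = crc32_update(0, h, sizeof *h);
    return crc32_update(crc, image, len);
}

/* At resume, after reading everything: recompute and compare. */
static bool image_matches(const struct image_header *h, const void *image,
                          size_t len, const struct image_trailer *t)
{
    return image_crc(h, image, len) == t->crc;
}
```

Chaining crc32_update() gives the same result as one pass over the concatenated bytes. Because of that, the header and the image pages can be fed in as they are read, and the whole image never needs to sit in one buffer.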
Re: resuming swsusp twice
Andy Isaacson wrote:
> Yesterday I booted my laptop to 2.6.13-rc2-mm1, suspended to swsusp, and
> then resumed. It ran fine overnight, including a fair amount of IO
> (running firefox, rsyncing ~/Mail/archive from my mail server, hg pull,
> etc). This morning I did a swsusp:
>
> echo shutdown > /sys/power/disk
> echo disk > /sys/power/state
>
> and got a panic along the lines of "Unable to find swap space, try

a panic? it should only be an error message, but the machine should still
be alive.

> swapon -a". Unfortunately I was in a hurry and didn't record the error
> messages. I powered off, then a few minutes later powered on again.

Powered off hard or "shutdown -h now"?

> At this point, it resumed *to the swsusp state from yesterday*!
> As soon as I realized what had happened, I powered off (not
> shutdown) and rebooted.

Good.

> On the next boot it did not find a swsusp signature and booted normally;
> ext3 did a normal recovery and seemed OK, but I was suspicious and did a
> fsck -f, which revealed a lot of damage; most of the damage seemed to be

this is expected in this case, unfortunately.

> in the hg repo which had been pulled from www.kernel.org/hg/.
>
> It's extremely unfortunate that there is *any* failure mode in swsusp
> that can result in this behavior.

I of course won't say that this cannot happen, but by design, the swsusp
signature is invalidated even before reading the image, so theoretically
it should not happen.

> I will try to reproduce, but I'm curious if anyone else has seen this.

i have not seen anything like that, but i am not always running the
latest & greatest kernel.
--
Stefan Seyfried
resuming swsusp twice
Yesterday I booted my laptop to 2.6.13-rc2-mm1, suspended to swsusp, and
then resumed. It ran fine overnight, including a fair amount of IO
(running firefox, rsyncing ~/Mail/archive from my mail server, hg pull,
etc). This morning I did a swsusp:

echo shutdown > /sys/power/disk
echo disk > /sys/power/state

and got a panic along the lines of "Unable to find swap space, try
swapon -a". Unfortunately I was in a hurry and didn't record the error
messages. I powered off, then a few minutes later powered on again.

At this point, it resumed *to the swsusp state from yesterday*!
As soon as I realized what had happened, I powered off (not
shutdown) and rebooted.

On the next boot it did not find a swsusp signature and booted normally;
ext3 did a normal recovery and seemed OK, but I was suspicious and did a
fsck -f, which revealed a lot of damage; most of the damage seemed to be
in the hg repo which had been pulled from www.kernel.org/hg/.

It's extremely unfortunate that there is *any* failure mode in swsusp
that can result in this behavior.

I will try to reproduce, but I'm curious if anyone else has seen this.

-andy
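The failed suspend began with "Unable to find swap space, try swapon -a". Below is a hypothetical userspace pre-check in C that could run before writing to /sys/power/state. It parses text in the /proc/swaps format and sums the free swap. That format is a header line followed by one entry per swap area: Filename, Type, Size, Used, Priority, with sizes in KiB. Whether an image actually fits depends on how much memory swsusp manages to free first. A positive result is therefore necessary but not sufficient. free_swap_kib() is an illustrative name and not part of any existing tool.

```c
#include <stdio.h>
#include <string.h>

/* Sum (Size - Used), in KiB, over all entries of /proc/swaps-formatted
 * text. Returns -1 if an entry line cannot be parsed. */
static long long free_swap_kib(const char *text)
{
    long long total = 0;
    const char *line = strchr(text, '\n');   /* skip the header line */
    if (!line)
        return 0;
    line++;

    while (*line) {
        char name[256], type[32];
        long long size, used;
        int prio;
        if (sscanf(line, "%255s %31s %lld %lld %d",
                   name, type, &size, &used, &prio) != 5)
            return -1;
        total += size - used;

        const char *next = strchr(line, '\n');
        if (!next)
            break;
        line = next + 1;
    }
    return total;
}
```

A suspend script could read /proc/swaps into a buffer, pass it to free_swap_kib(), and skip the "echo disk > /sys/power/state" step unless the result is positive. With 512MB of swap and 1.25GB of RAM, as in Andy's setup, this check would pass even though the image might still not fit.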