Re: [PATCH] move suspend includes into right place (was Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy))
On Sat, Jun 30, 2007 at 12:44:22AM +0200, Pavel Machek wrote: > Hi! > > > By the way. > > > > > diff --git a/kernel/power/power.h b/kernel/power/power.h > > > index eb461b8..dc13af5 100644 > > > --- a/kernel/power/power.h > > > +++ b/kernel/power/power.h > > > > > > Don't these definitions need to be exported to userspace? That > > definitely is not a header file for userspace. > > Yes, they do. Does this look like a fix? > Pavel > > --- > > Split userinterface part of power.h into separate file. >... You should also add it to include/linux/Kbuild. cu Adrian -- "Is there not promise of rain?" Ling Tan asked suddenly out of the darkness. There had been need of rain for many days. "Only a promise," Lao Er said. Pearl S. Buck - Dragon Seed - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] move suspend includes into right place (was Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy))
Hi! > By the way. > > > diff --git a/kernel/power/power.h b/kernel/power/power.h > > index eb461b8..dc13af5 100644 > > --- a/kernel/power/power.h > > +++ b/kernel/power/power.h > > > Don't these definitions need to be exported to userspace? That > definitely is not a header file for userspace. Yes, they do. Does this look like a fix? Pavel --- Split userinterface part of power.h into separate file. Signed-off-by: Pavel Machek <[EMAIL PROTECTED]> diff --git a/include/linux/power.h b/include/linux/power.h new file mode 100644 index 000..37bf890 --- /dev/null +++ b/include/linux/power.h @@ -0,0 +1,31 @@ +#ifndef INCLUDE_LINUX_POWER_H +#define INCLUDE_LINUX_POWER_H + +/* + * This structure is used to pass the values needed for the identification + * of the resume swap area from a user space to the kernel via the + * SNAPSHOT_SET_SWAP_AREA ioctl + */ +struct resume_swap_area { + u_int64_t offset; + u_int32_t dev; +} __attribute__((packed)); + +#define SNAPSHOT_IOC_MAGIC '3' +#define SNAPSHOT_FREEZE_IO(SNAPSHOT_IOC_MAGIC, 1) +#define SNAPSHOT_UNFREEZE _IO(SNAPSHOT_IOC_MAGIC, 2) +#define SNAPSHOT_ATOMIC_SNAPSHOT _IOW(SNAPSHOT_IOC_MAGIC, 3, u32) /* void * */ +#define SNAPSHOT_ATOMIC_RESTORE_IO(SNAPSHOT_IOC_MAGIC, 4) +#define SNAPSHOT_FREE _IO(SNAPSHOT_IOC_MAGIC, 5) +#define SNAPSHOT_SET_IMAGE_SIZE_IOW(SNAPSHOT_IOC_MAGIC, 6, u32) /* unsigned long */ +#define SNAPSHOT_AVAIL_SWAP_IOR(SNAPSHOT_IOC_MAGIC, 7, u32) /* void * */ +#define SNAPSHOT_GET_SWAP_PAGE _IOR(SNAPSHOT_IOC_MAGIC, 8, u32) /* void * */ +#define SNAPSHOT_FREE_SWAP_PAGES _IO(SNAPSHOT_IOC_MAGIC, 9) +#define SNAPSHOT_SET_SWAP_FILE _IOW(SNAPSHOT_IOC_MAGIC, 10, u32) /* unsigned int */ +#define SNAPSHOT_S2RAM _IO(SNAPSHOT_IOC_MAGIC, 11) +#define SNAPSHOT_PMOPS _IOW(SNAPSHOT_IOC_MAGIC, 12, u32) /* unsigned int */ +#define SNAPSHOT_SET_SWAP_AREA _IOW(SNAPSHOT_IOC_MAGIC, 13, \ + struct resume_swap_area) +#define SNAPSHOT_IOC_MAXNR 13 + +#endif diff --git a/kernel/power/power.h b/kernel/power/power.h index 41d33eb..e68352b 100644 --- a/kernel/power/power.h +++ b/kernel/power/power.h @@ -1,5 +1,9 @@ +#ifndef KERNEL_POWER_POWER_H +#define KERNEL_POWER_POWER_H + #include #include +#include struct swsusp_info { struct new_utsname uts; @@ -114,33 +118,6 @@ extern int snapshot_write_next(struct sn extern void snapshot_write_finalize(struct snapshot_handle *handle); extern int snapshot_image_loaded(struct snapshot_handle *handle); -/* - * This structure is used to pass the values needed for the identification - * of the resume swap area from a user space to the kernel via the - * SNAPSHOT_SET_SWAP_AREA ioctl - */ -struct resume_swap_area { - u_int64_t offset; - u_int32_t dev; -} __attribute__((packed)); - -#define SNAPSHOT_IOC_MAGIC '3' -#define SNAPSHOT_FREEZE_IO(SNAPSHOT_IOC_MAGIC, 1) -#define SNAPSHOT_UNFREEZE _IO(SNAPSHOT_IOC_MAGIC, 2) -#define SNAPSHOT_ATOMIC_SNAPSHOT _IOW(SNAPSHOT_IOC_MAGIC, 3, u32) /* void * */ -#define SNAPSHOT_ATOMIC_RESTORE_IO(SNAPSHOT_IOC_MAGIC, 4) -#define SNAPSHOT_FREE _IO(SNAPSHOT_IOC_MAGIC, 5) -#define SNAPSHOT_SET_IMAGE_SIZE_IOW(SNAPSHOT_IOC_MAGIC, 6, u32) /* unsigned long */ -#define SNAPSHOT_AVAIL_SWAP_IOR(SNAPSHOT_IOC_MAGIC, 7, u32) /* void * */ -#define SNAPSHOT_GET_SWAP_PAGE _IOR(SNAPSHOT_IOC_MAGIC, 8, u32) /* void * */ -#define SNAPSHOT_FREE_SWAP_PAGES _IO(SNAPSHOT_IOC_MAGIC, 9) -#define SNAPSHOT_SET_SWAP_FILE _IOW(SNAPSHOT_IOC_MAGIC, 10, u32) /* unsigned int */ -#define SNAPSHOT_S2RAM _IO(SNAPSHOT_IOC_MAGIC, 11) -#define SNAPSHOT_PMOPS _IOW(SNAPSHOT_IOC_MAGIC, 12, u32) /* unsigned int */ -#define SNAPSHOT_SET_SWAP_AREA _IOW(SNAPSHOT_IOC_MAGIC, 13, \ - struct resume_swap_area) -#define SNAPSHOT_IOC_MAXNR 13 - #define PMOPS_PREPARE 1 #define PMOPS_ENTER2 #define PMOPS_FINISH 3 @@ -165,3 +142,5 @@ extern int suspend_enter(suspend_state_t struct timeval; extern void swsusp_show_speed(struct timeval *, struct timeval *, unsigned int, char *); + +#endif -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
Am Samstag 26 Mai 2007 schrieb Rafael J. Wysocki: Hi Rafael! > The outcome was, more-or-less, that we'll work on merging suspend2 or > at least some parts of it. > > However, in the meantime there have been some discussions implying that > we have some important problems with suspend/hibernation that suspend2 > doesn't solve and that IMHO are more urgent than the merging of > suspend2 right not. > > So, as far as I'm concerned, the plan is to fix the more urgent > problems first and to work on merging suspend2 as far as there's time > to do this. > > The problem is there are only a few people working on it and there's a > lot to do, so I can only ask you to be patient. ;-) Thats fine with me - I understand that. I just thought that there has been no outcome at all. I will try to be patient as long as I do not dig into kernel hacking myself deeply enough to be able to help with that - did not do more than to put together two conflicting patches to compile my own kernels till now and forward port a patch for a sundance network card. I can help with testing once there is something testable tough. Regards, -- Martin 'Helios' Steigerwald - http://www.Lichtvoll.de GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7 signature.asc Description: This is a digitally signed message part.
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
On Saturday, 26 May 2007 19:37, Martin Steigerwald wrote: > Am Mittwoch 25 April 2007 schrieb Pavel Machek: > > Hi! > > > > > This is why there's a lot to be said for > > > > > > echo mem > /sys/power/state > > > > > > and being able to follow the path through _one_ object (the kernel) > > > over trying to figure out the interaction between many different > > > parts with different versions. > > > > The 'promise' is 'if you can get echo disk > /sys/power/state working, > > uswsusp will work. too'. IOW it should be ok to debug the in-kernel > > parts, only. > > Hello Nigel, Pavel, Rafael and everyone else who is involved, > > I would like to ask what come out of the suspend2 merge discussion. Nigel > just told that suspend2 likely won't be merged anytime soon and thats its > business as usual: > > - > It's pretty much business as usual. Linus doesn't want another > implementation merged, and he wants the three of us (Pavel, Rafael and > myself) to agree on a way forward. He also believes that we're > approaching things from the wrong direction at the moment. Funnily > enough, this is the one area on which we do all agree. > - > http://lists.suspend2.net/lurker/message/20070510.021641.fe306add.en.html > > > Has there been any further discussion and preferably agreement on the way > to go forward? The outcome was, more-or-less, that we'll work on merging suspend2 or at least some parts of it. However, in the meantime there have been some discussions implying that we have some important problems with suspend/hibernation that suspend2 doesn't solve and that IMHO are more urgent than the merging of suspend2 right not. So, as far as I'm concerned, the plan is to fix the more urgent problems first and to work on merging suspend2 as far as there's time to do this. The problem is there are only a few people working on it and there's a lot to do, so I can only ask you to be patient. ;-) Greetings, Rafael - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
Am Mittwoch 25 April 2007 schrieb Pavel Machek: > Hi! > > > This is why there's a lot to be said for > > > > echo mem > /sys/power/state > > > > and being able to follow the path through _one_ object (the kernel) > > over trying to figure out the interaction between many different > > parts with different versions. > > The 'promise' is 'if you can get echo disk > /sys/power/state working, > uswsusp will work. too'. IOW it should be ok to debug the in-kernel > parts, only. Hello Nigel, Pavel, Rafael and everyone else who is involved, I would like to ask what come out of the suspend2 merge discussion. Nigel just told that suspend2 likely won't be merged anytime soon and thats its business as usual: - It's pretty much business as usual. Linus doesn't want another implementation merged, and he wants the three of us (Pavel, Rafael and myself) to agree on a way forward. He also believes that we're approaching things from the wrong direction at the moment. Funnily enough, this is the one area on which we do all agree. - http://lists.suspend2.net/lurker/message/20070510.021641.fe306add.en.html Has there been any further discussion and preferably agreement on the way to go forward? Although you Linus, as I read from different mails only use suspend to RAM, there are many users out there who use suspend to disk daily. I used in kernel software suspend initially and it worked quite nice with starting from 2.6.10 or 2.6.11 where suspend2 didn't work for me before 2.6.14 with the hibernate script. But from then on suspend2 worked better than in kernel software suspend for me and colleagues on: - ThinkPad T23 - ThinkPad T42 - Possibly some other ThinkPads - as well various Dell workstations we have at work It was faster and more reliable, yielding uptimes up to 40 days on my workstation recently (with 2.6.17.7 still). And even that uptime was only ended by booting a newly build kernel (2.6.21 with sws 2.2.9.13). For me in the role of a user actually this is a really satisfying solution! I tried userspace software suspend from time to time but then just was fed up with it, cause I could not get it to work within any sensible amount of time - even with some bog standard Debian kernel, I think it was some 2.6.18 one. Maybe I am dumb, but so be it, it should not be that complicated to get it to work. Recently I didn't even bother to try anymore. Well and I read in the suspend2 merge discussion that even in kernel suspend does not work reliably anymore. As long I cannot be convinced that the vanilla kernel contains a suspend to disk solution that works as good as suspend2 I will patch suspend2 into all of the desktop kernels I build. Thats quite bad IMHO for exactly the same reason than having drivers maintained out of the kernel. For the same reason I think swap prefetch should go in as soon as possible. It will never have the adoption and care taking of an in-kernel-tree solution. I am convinced that a working suspend to RAM just is not enough - well it wasn't working correctly last time I checked. But I even don't bother about suspend to RAM anymore. I can wait those additional seconds for suspend to disk and it allows me to drive my laptops without batteries most of the time and have workstations switched off completely so that they do not consume standby power. So please, pretty please consider working together on a reliable, fast, stable, easy to use and configurable in-kernel-tree snapshot solution! Actually I as a user I couldn't care less about the implementation details, but as someone who is interested in kernel technologies I like it to be a clean and well designed solution, too. ;-) Maybe when the Linux Foundation organizes a meeting for Nigel, Pavel, Rafael and other kernel developers interested in creating such a solution it will help. To me it seems such a concentrated meeting in a good atmosphere could be more effective than endless mailing list discussions not leading to a clear result. When its not easy for the involved people to work together maybe a casual bystander who understands enough of kernel details should moderate the meeting and help finding an agreement. It would just be such a pity to miss the chance to have a nicely working snapshot solution in the Linux kernel, that may even be interesting for virtualization (you could store a backup of a machine state permanently or even more of them - if not already available through other technologies like well suspend2 with filewriter for example). Regards, -- Martin 'Helios' Steigerwald - http://www.Lichtvoll.de GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7 signature.asc Description: This is a digitally signed message part.
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
Matthew Garrett wrote: > On Sat, Apr 28, 2007 at 12:15:50PM -0700, David Lang wrote: >> with dynaticks now in the kernel it may even be possible to have the idle >> process decide that the next event is far enough away that it should >> suspend-to-ram until that point. > > This would be ideal (and it's broadly what the OLPC guys are aiming for, > I think), but on most platforms you're looking at at least a second or > so to resume. As far as I know, we're still looking at ~60 ticks a > second at best for an average desktop, so that's not going to be a win. As long as the system has a clear idea of how long it will take to resume, it can schedule a wakeup for a reasonable amount of time before that. Ideally, a completely unused system might not need a tick for several seconds. However, I agree that it doesn't make sense to add such functionality to the kernel until someone can show a system that can actually wait that long between ticks. - Josh Triplett - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
On Sat, Apr 28, 2007 at 12:15:50PM -0700, David Lang wrote: > with dynaticks now in the kernel it may even be possible to have the idle > process decide that the next event is far enough away that it should > suspend-to-ram until that point. This would be ideal (and it's broadly what the OLPC guys are aiming for, I think), but on most platforms you're looking at at least a second or so to resume. As far as I know, we're still looking at ~60 ticks a second at best for an average desktop, so that's not going to be a win. -- Matthew Garrett | [EMAIL PROTECTED] - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
On Sat, 28 Apr 2007, Bill Davidsen wrote: Linus Torvalds wrote: (And for me personally, I'd love to have all my machines "sleep" by default, but wake up by eithernet and keyboard - I'd love for my screen saver to literally put the machine to sleep, but not have to worry about touching a keyboard - just ssh'ing into them should still wake up. It's *technically* doable, but it's just a pain to do right now) And timer somehow so cron jobs could still run. Ideal for critical but rarely used machines like fallover servers, the user documentation download site, or similar. with dynaticks now in the kernel it may even be possible to have the idle process decide that the next event is far enough away that it should suspend-to-ram until that point. David Lang - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
Linus Torvalds wrote: (And for me personally, I'd love to have all my machines "sleep" by default, but wake up by eithernet and keyboard - I'd love for my screen saver to literally put the machine to sleep, but not have to worry about touching a keyboard - just ssh'ing into them should still wake up. It's *technically* doable, but it's just a pain to do right now) And timer somehow so cron jobs could still run. Ideal for critical but rarely used machines like fallover servers, the user documentation download site, or similar. -- Bill Davidsen <[EMAIL PROTECTED]> "We have more to fear from the bungling of the incompetent than from the machinations of the wicked." - from Slashdot - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
On Friday, 27 April 2007 12:19, Johannes Berg wrote: > On Fri, 2007-04-27 at 12:18 +0200, Rafael J. Wysocki wrote: > > > 1) We define platform_hibernation if CONFIG_ACPI is set. > > Let's just define it always then in the common code so we don't have > even more magic bits platforms need to define even if they don't care at > all. And please don't put #ifdef CONFIG_ACPI into the common code ;) > Maybe #ifdef CONFIG_ARCH_NEEDS_HIBERNATE_HOOKS or something. > > > 2) In the ACPI code we do > > > > if (can do S4) > > platform_hibernation = 1; > > Gotcha. > > > 3) We have functions arch_platform_prepare()/finish()/enter() that are > > defined > > to be noops for anything but ACPI systems and for ACPI systems they are > > defined like this: > > > > int arch_platform_enter(void) > > { > > if (!platform_hibernation) > > return 0; > > > > ... > > } > > > > I think it should work. > > You could reduce code churn in all other platforms by making these weak > symbols like the irq hooks I did for pm_ops. It looks like it can work > and possibly is even less intrusive than my hibernate_ops patch. Though > then again my hibernate_ops patch removed a lot of stuff that is now no > longer necessary, and also completely removed the PM_SUSPEND_DISK foo... > we probably want that regardless of how we invoke ACPI. Yes. Still, I'd like to rework your patch to deal with ACPI without introducing hibernate_ops . I'm going to do this later today if you don't mind. :-) Rafael - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
On Fri, 2007-04-27 at 14:09 +0200, Rafael J. Wysocki wrote: > Yes. Still, I'd like to rework your patch to deal with ACPI without > introducing hibernate_ops . I'm going to do this later today if you don't > mind. :-) Not at all :) That's why I actually sent it out instead of just saying "well I give up it breaks user.c" johannes signature.asc Description: This is a digitally signed message part
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
On Fri, 2007-04-27 at 12:18 +0200, Rafael J. Wysocki wrote: > 1) We define platform_hibernation if CONFIG_ACPI is set. Let's just define it always then in the common code so we don't have even more magic bits platforms need to define even if they don't care at all. And please don't put #ifdef CONFIG_ACPI into the common code ;) Maybe #ifdef CONFIG_ARCH_NEEDS_HIBERNATE_HOOKS or something. > 2) In the ACPI code we do > > if (can do S4) > platform_hibernation = 1; Gotcha. > 3) We have functions arch_platform_prepare()/finish()/enter() that are defined > to be noops for anything but ACPI systems and for ACPI systems they are > defined like this: > > int arch_platform_enter(void) > { > if (!platform_hibernation) > return 0; > > ... > } > > I think it should work. You could reduce code churn in all other platforms by making these weak symbols like the irq hooks I did for pm_ops. It looks like it can work and possibly is even less intrusive than my hibernate_ops patch. Though then again my hibernate_ops patch removed a lot of stuff that is now no longer necessary, and also completely removed the PM_SUSPEND_DISK foo... we probably want that regardless of how we invoke ACPI. johannes signature.asc Description: This is a digitally signed message part
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
On Friday, 27 April 2007 11:41, Johannes Berg wrote: > On Thu, 2007-04-26 at 21:02 +0200, Rafael J. Wysocki wrote: > > > Yes. That's because we want to be able to repeat creating the image > > without closing the fd in some situations. > > Oh yeah, I just checked and it's not in fact necessary. I'm just > confused. > > > Still, we could use a global var 'platform_hibernation' or something like > > this, > > I think. Then, we can do > > > > #define platform_hibernation0 > > > > on the architectures that don't need it and make ACPI use it instead of this > > "dynamic linking". > > No, because acpi doesn't know at build time whether it can actually do > S4 or not. That's not a problem, I think. 1) We define platform_hibernation if CONFIG_ACPI is set. 2) In the ACPI code we do if (can do S4) platform_hibernation = 1; 3) We have functions arch_platform_prepare()/finish()/enter() that are defined to be noops for anything but ACPI systems and for ACPI systems they are defined like this: int arch_platform_enter(void) { if (!platform_hibernation) return 0; ... } I think it should work. Greetings, Rafael - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [linux-pm] Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
On Fri, 2007-04-27 at 11:41 +0200, Johannes Berg wrote: > No, because acpi doesn't know at build time whether it can actually do > S4 or not. Actually, you could probably do it by making some weak symbol for it that only ACPI overrides, and then check in the ACPI code if S4 is possible, otherwise somehow invoke the old symbol or copy the code or something. Seems a bit more fragile though. johannes signature.asc Description: This is a digitally signed message part
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
On Thu, 2007-04-26 at 21:02 +0200, Rafael J. Wysocki wrote: > Yes. That's because we want to be able to repeat creating the image > without closing the fd in some situations. Oh yeah, I just checked and it's not in fact necessary. I'm just confused. > Still, we could use a global var 'platform_hibernation' or something like > this, > I think. Then, we can do > > #define platform_hibernation 0 > > on the architectures that don't need it and make ACPI use it instead of this > "dynamic linking". No, because acpi doesn't know at build time whether it can actually do S4 or not. johannes signature.asc Description: This is a digitally signed message part
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
Hi. On Thu, 2007-04-26 at 23:22 +0200, Rafael J. Wysocki wrote: > On Thursday, 26 April 2007 22:55, Nigel Cunningham wrote: > > Hi. > > > > On Thu, 2007-04-26 at 22:37 +0200, Rafael J. Wysocki wrote: > > > On Thursday, 26 April 2007 22:16, Nigel Cunningham wrote: > > > > Hi. > > > > > > > > On Thu, 2007-04-26 at 21:28 +0200, Rafael J. Wysocki wrote: > > > > > On Thursday, 26 April 2007 18:10, Pekka Enberg wrote: > > > > > > > > > > > > On 4/26/2007, "Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote: > > > > > > > In principle, we could add suspend2 as an alternative (in analogy > > > > > > > with the I/O > > > > > > > schedulers, for example), but I think for this purpose it should > > > > > > > be reviewed > > > > > > > properly. > > > > > > > > > > > > Yeah, this makes sense. > > > > > > > > > > > > On 4/26/2007, "Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote: > > > > > > > There also is a real problem with how it uses the LRU pages. It > > > > > > > _seems_ to > > > > > > > work, but at least to me it seems to be potentially dangerous. > > > > > > > > > > > > I am new to suspend2 so can you please explain what exactly is > > > > > > dangerous > > > > > > about it? > > > > > > > > > > After freezing tasks, it first saves the contents of the LRU pages, > > > > > freezes > > > > > devices and then uses the LRU pages for storing the suspend image (if > > > > > more > > > > > memory is needed, it's allocated, but that's irrelevant here). Now, > > > > > we have no > > > > > warranty that the LRU pages are not updated after we've saved their > > > > > contents > > > > > (first potential problem here). > > > > > > > > > > After the image has been created, we have to unfreeze devices and > > > > > save the > > > > > image. Now, we have no warranty that no one will be writing to the > > > > > LRU pages > > > > > that we have used to store the image, for whatever reasons known to > > > > > him, so the > > > > > image can potentially get corrupted while it's being saved. > > > > > > > > > > In principle, device drivers can do this and there are some kernel > > > > > threads that > > > > > also can do this (we don't freeze them, because they're needed for > > > > > the image > > > > > saving). > > > > > > > > > > The design is conceptually really really complicated and it makes > > > > > strong > > > > > assumptions about the behavior of different subsystems. While these > > > > > assumptions _may_ be satisfied right now, we'd have to ensure the > > > > > satisfaction > > > > > of them in the future if suspend2 were merged. > > > > > > > > That's a good description of the issue, although I think _may_ and > > > > _seems_ are stating things a bit more pessimistically than is > > > > necessary. > > > > > > I've used them to express my personal concerns. > > > > > > > You see, we need to remember that the pages which are saved separately > > > > are LRU pages. Because userspace is frozen, their contents are going to > > > > be static. The only possibilities for modifying them come from timer > > > > routines, improperly frozen filesystems and device drivers. > > > > > > And kernel threads that we don't freeze deliberately. Currently, these > > > are > > > all worker threads, dm-related kernel threads and some others. > > > > > > > We have code to check that the LRU isn't changing, and I've only seen > > > > one report of modifications to about 20 LRU pages. I haven't had the > > > > time yet to chase down the cause, but hope to do so soon. > > > > > > I didn't say that would be common. If it had been, you'd have seen > > > problems > > > with it. To me the problem is the lack of warranty that it won't happen. > > > > > > > The general scheme has been working for four or five years - if there > > > > was a fundamental issue, we would have found it by now. > > > > > > > > The scheme isn't complicated. > > > > > > Conceptually, it is complicated just because you're using the LRU. > > > > Well, I'm willing to look at other ideas. I actually selected LRU > > because it was simple. Prior to this, we did have a try at just > > iterating over the pages of frozen processes, but it didn't yield enough > > pages to be viable. I wouldn't be surprised if hunting down the cause of > > these changing pages leads to doing the opposite - starting with LRU > > pages and then removing all the ones owned by processes. (Am I right in > > thinking the remainder would be anonymous pages? I must learn more mm > > inards :>). > > I think we can discuss that, and the other things too. I'm open to > cooperation. :-) Great! Nigel signature.asc Description: This is a digitally signed message part
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
On Thursday, 26 April 2007 22:55, Nigel Cunningham wrote: > Hi. > > On Thu, 2007-04-26 at 22:37 +0200, Rafael J. Wysocki wrote: > > On Thursday, 26 April 2007 22:16, Nigel Cunningham wrote: > > > Hi. > > > > > > On Thu, 2007-04-26 at 21:28 +0200, Rafael J. Wysocki wrote: > > > > On Thursday, 26 April 2007 18:10, Pekka Enberg wrote: > > > > > > > > > > On 4/26/2007, "Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote: > > > > > > In principle, we could add suspend2 as an alternative (in analogy > > > > > > with the I/O > > > > > > schedulers, for example), but I think for this purpose it should be > > > > > > reviewed > > > > > > properly. > > > > > > > > > > Yeah, this makes sense. > > > > > > > > > > On 4/26/2007, "Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote: > > > > > > There also is a real problem with how it uses the LRU pages. It > > > > > > _seems_ to > > > > > > work, but at least to me it seems to be potentially dangerous. > > > > > > > > > > I am new to suspend2 so can you please explain what exactly is > > > > > dangerous > > > > > about it? > > > > > > > > After freezing tasks, it first saves the contents of the LRU pages, > > > > freezes > > > > devices and then uses the LRU pages for storing the suspend image (if > > > > more > > > > memory is needed, it's allocated, but that's irrelevant here). Now, we > > > > have no > > > > warranty that the LRU pages are not updated after we've saved their > > > > contents > > > > (first potential problem here). > > > > > > > > After the image has been created, we have to unfreeze devices and save > > > > the > > > > image. Now, we have no warranty that no one will be writing to the LRU > > > > pages > > > > that we have used to store the image, for whatever reasons known to > > > > him, so the > > > > image can potentially get corrupted while it's being saved. > > > > > > > > In principle, device drivers can do this and there are some kernel > > > > threads that > > > > also can do this (we don't freeze them, because they're needed for the > > > > image > > > > saving). > > > > > > > > The design is conceptually really really complicated and it makes strong > > > > assumptions about the behavior of different subsystems. While these > > > > assumptions _may_ be satisfied right now, we'd have to ensure the > > > > satisfaction > > > > of them in the future if suspend2 were merged. > > > > > > That's a good description of the issue, although I think _may_ and > > > _seems_ are stating things a bit more pessimistically than is > > > necessary. > > > > I've used them to express my personal concerns. > > > > > You see, we need to remember that the pages which are saved separately > > > are LRU pages. Because userspace is frozen, their contents are going to > > > be static. The only possibilities for modifying them come from timer > > > routines, improperly frozen filesystems and device drivers. > > > > And kernel threads that we don't freeze deliberately. Currently, these are > > all worker threads, dm-related kernel threads and some others. > > > > > We have code to check that the LRU isn't changing, and I've only seen > > > one report of modifications to about 20 LRU pages. I haven't had the > > > time yet to chase down the cause, but hope to do so soon. > > > > I didn't say that would be common. If it had been, you'd have seen problems > > with it. To me the problem is the lack of warranty that it won't happen. > > > > > The general scheme has been working for four or five years - if there > > > was a fundamental issue, we would have found it by now. > > > > > > The scheme isn't complicated. > > > > Conceptually, it is complicated just because you're using the LRU. > > Well, I'm willing to look at other ideas. I actually selected LRU > because it was simple. Prior to this, we did have a try at just > iterating over the pages of frozen processes, but it didn't yield enough > pages to be viable. I wouldn't be surprised if hunting down the cause of > these changing pages leads to doing the opposite - starting with LRU > pages and then removing all the ones owned by processes. (Am I right in > thinking the remainder would be anonymous pages? I must learn more mm > inards :>). I think we can discuss that, and the other things too. I'm open to cooperation. :-) Greetings, Rafael - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
Hi! > > #define SNAPSHOT_SET_IMAGE_SIZE _IOW(SNAPSHOT_IOC_MAGIC, 6, > > unsigned long) > > So I'm not supposed to be able to suspend the 16Gb-ram, 32bits servers > I have here? (You are right, this should have been u64) Snapshot image is by design limited by ammount of lowmem. If you want to change that, this unsigned long limit will be least of your problems. (And no, I'd not expect loaded 6GB box to suspend properly. It will just realize it does not have enough lowmem and refuse to suspend). Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
On Thu, 26 Apr 2007, Rafael J. Wysocki wrote: The general scheme has been working for four or five years - if there was a fundamental issue, we would have found it by now. The scheme isn't complicated. Conceptually, it is complicated just because you're using the LRU. I know that I've seen many projects that are working on or claim to have suceeded in being able to do live migration of processes from one system to another. has anyone looked at useing any of these mechanisms for snapshoting the user processes for the std situation? if you can do this a process at a time you may be able to avoid the massive blob of a write instead of what linus was saying buff = snapshot() write(buff) it would be start_snapshot() /* stops all userspace schedulers except for this process */ foreach(pid) { buff = snapshotpid(pid) write(buff) } David Lang - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
Hi. On Thu, 2007-04-26 at 17:06 -0400, Theodore Tso wrote: > On Thu, Apr 26, 2007 at 08:56:48AM -0700, Linus Torvalds wrote: > > > > > > On Thu, 26 Apr 2007, Pavel Machek wrote: > > > > > > Yes, probably will. The other option is to break existing 32-bit > > > userspace, which is a bit more common AFAICT. > > > > And *this* is why kernel/userspace things simply should not be done. > > > > It's simply better to do things entirely in the kernel. Because you add > > bugs and complications otherwise, and doing it in the kernel allows you > > to just switch things around. > > > > As it is, it appears that user-space suspend is just broken whichever way > > we turn. > > Well, in that case maybe suspend2 should be very seriously considered, > since it has the features of uswsusp --- basic features which every > single Microsoft and MacOSX user are used to like, like progress bars > --- and it's all done in the kernel. Umm. I don't want to be picky, but that's not quite true. The progress bar is done in userspace. There's also the possibility of using a userspace app to manage storage too (I did work on establishing/tearing down an NBD connection as necessary but didn't quite get it finished and have never released it). That said, this bit can be torn out by simply removing a file and the Makefile line. Nigel signature.asc Description: This is a digitally signed message part
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
On Thu, Apr 26, 2007 at 08:56:48AM -0700, Linus Torvalds wrote: > > > On Thu, 26 Apr 2007, Pavel Machek wrote: > > > > Yes, probably will. The other option is to break existing 32-bit > > userspace, which is a bit more common AFAICT. > > And *this* is why kernel/userspace things simply should not be done. > > It's simply better to do things entirely in the kernel. Because you add > bugs and complications otherwise, and doing it in the kernel allows you > to just switch things around. > > As it is, it appears that user-space suspend is just broken whichever way > we turn. Well, in that case maybe suspend2 should be very seriously considered, since it has the features of uswsusp --- basic features which every single Microsoft and MacOSX user are used to like, like progress bars --- and it's all done in the kernel. - Ted - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
Hi! > > > See? Two *totally* different cases. They have *nothing* in common. Not the > > > call sequence, not the logic, not *anything*. > > > > Except that both methods cannot rely upon hot-pluggable devices > > still being present on resume/restore. It is exceptionally common > > to unplug all USB/firewire cables, mouse, keyboard, docking cables etc.. > > after a machine is in S2R state. > > Right, and that has nothing to do with suspend/resume. You'd better be > able to handle unexpected hotplugs _regardless_. Actually, with suspend/resume it is quite easy to cheat, and just "unplug" the hardware on suspend, then "plug it back" on resume. That works very well for devices like keyboards and mice (where you can't tell if you are talking to the same hw, anyway). Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
Hi. On Thu, 2007-04-26 at 22:37 +0200, Rafael J. Wysocki wrote: > On Thursday, 26 April 2007 22:16, Nigel Cunningham wrote: > > Hi. > > > > On Thu, 2007-04-26 at 21:28 +0200, Rafael J. Wysocki wrote: > > > On Thursday, 26 April 2007 18:10, Pekka Enberg wrote: > > > > > > > > On 4/26/2007, "Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote: > > > > > In principle, we could add suspend2 as an alternative (in analogy > > > > > with the I/O > > > > > schedulers, for example), but I think for this purpose it should be > > > > > reviewed > > > > > properly. > > > > > > > > Yeah, this makes sense. > > > > > > > > On 4/26/2007, "Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote: > > > > > There also is a real problem with how it uses the LRU pages. It > > > > > _seems_ to > > > > > work, but at least to me it seems to be potentially dangerous. > > > > > > > > I am new to suspend2 so can you please explain what exactly is dangerous > > > > about it? > > > > > > After freezing tasks, it first saves the contents of the LRU pages, > > > freezes > > > devices and then uses the LRU pages for storing the suspend image (if more > > > memory is needed, it's allocated, but that's irrelevant here). Now, we > > > have no > > > warranty that the LRU pages are not updated after we've saved their > > > contents > > > (first potential problem here). > > > > > > After the image has been created, we have to unfreeze devices and save the > > > image. Now, we have no warranty that no one will be writing to the LRU > > > pages > > > that we have used to store the image, for whatever reasons known to him, > > > so the > > > image can potentially get corrupted while it's being saved. > > > > > > In principle, device drivers can do this and there are some kernel > > > threads that > > > also can do this (we don't freeze them, because they're needed for the > > > image > > > saving). > > > > > > The design is conceptually really really complicated and it makes strong > > > assumptions about the behavior of different subsystems. While these > > > assumptions _may_ be satisfied right now, we'd have to ensure the > > > satisfaction > > > of them in the future if suspend2 were merged. > > > > That's a good description of the issue, although I think _may_ and > > _seems_ are stating things a bit more pessimistically than is > > necessary. > > I've used them to express my personal concerns. > > > You see, we need to remember that the pages which are saved separately > > are LRU pages. Because userspace is frozen, their contents are going to > > be static. The only possibilities for modifying them come from timer > > routines, improperly frozen filesystems and device drivers. > > And kernel threads that we don't freeze deliberately. Currently, these are > all worker threads, dm-related kernel threads and some others. > > > We have code to check that the LRU isn't changing, and I've only seen > > one report of modifications to about 20 LRU pages. I haven't had the > > time yet to chase down the cause, but hope to do so soon. > > I didn't say that would be common. If it had been, you'd have seen problems > with it. To me the problem is the lack of warranty that it won't happen. > > > The general scheme has been working for four or five years - if there > > was a fundamental issue, we would have found it by now. > > > > The scheme isn't complicated. > > Conceptually, it is complicated just because you're using the LRU. Well, I'm willing to look at other ideas. I actually selected LRU because it was simple. Prior to this, we did have a try at just iterating over the pages of frozen processes, but it didn't yield enough pages to be viable. I wouldn't be surprised if hunting down the cause of these changing pages leads to doing the opposite - starting with LRU pages and then removing all the ones owned by processes. (Am I right in thinking the remainder would be anonymous pages? I must learn more mm inards :>). Nigel signature.asc Description: This is a digitally signed message part
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
On Thursday, 26 April 2007 22:16, Nigel Cunningham wrote: > Hi. > > On Thu, 2007-04-26 at 21:28 +0200, Rafael J. Wysocki wrote: > > On Thursday, 26 April 2007 18:10, Pekka Enberg wrote: > > > > > > On 4/26/2007, "Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote: > > > > In principle, we could add suspend2 as an alternative (in analogy with > > > > the I/O > > > > schedulers, for example), but I think for this purpose it should be > > > > reviewed > > > > properly. > > > > > > Yeah, this makes sense. > > > > > > On 4/26/2007, "Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote: > > > > There also is a real problem with how it uses the LRU pages. It > > > > _seems_ to > > > > work, but at least to me it seems to be potentially dangerous. > > > > > > I am new to suspend2 so can you please explain what exactly is dangerous > > > about it? > > > > After freezing tasks, it first saves the contents of the LRU pages, freezes > > devices and then uses the LRU pages for storing the suspend image (if more > > memory is needed, it's allocated, but that's irrelevant here). Now, we > > have no > > warranty that the LRU pages are not updated after we've saved their contents > > (first potential problem here). > > > > After the image has been created, we have to unfreeze devices and save the > > image. Now, we have no warranty that no one will be writing to the LRU > > pages > > that we have used to store the image, for whatever reasons known to him, so > > the > > image can potentially get corrupted while it's being saved. > > > > In principle, device drivers can do this and there are some kernel threads > > that > > also can do this (we don't freeze them, because they're needed for the image > > saving). > > > > The design is conceptually really really complicated and it makes strong > > assumptions about the behavior of different subsystems. While these > > assumptions _may_ be satisfied right now, we'd have to ensure the > > satisfaction > > of them in the future if suspend2 were merged. > > That's a good description of the issue, although I think _may_ and > _seems_ are stating things a bit more pessimistically than is > necessary. I've used them to express my personal concerns. > You see, we need to remember that the pages which are saved separately > are LRU pages. Because userspace is frozen, their contents are going to > be static. The only possibilities for modifying them come from timer > routines, improperly frozen filesystems and device drivers. And kernel threads that we don't freeze deliberately. Currently, these are all worker threads, dm-related kernel threads and some others. > We have code to check that the LRU isn't changing, and I've only seen > one report of modifications to about 20 LRU pages. I haven't had the > time yet to chase down the cause, but hope to do so soon. I didn't say that would be common. If it had been, you'd have seen problems with it. To me the problem is the lack of warranty that it won't happen. > The general scheme has been working for four or five years - if there > was a fundamental issue, we would have found it by now. > > The scheme isn't complicated. Conceptually, it is complicated just because you're using the LRU. Greetings, Rafael - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
Hi. On Thu, 2007-04-26 at 21:28 +0200, Rafael J. Wysocki wrote: > On Thursday, 26 April 2007 18:10, Pekka Enberg wrote: > > > > On 4/26/2007, "Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote: > > > In principle, we could add suspend2 as an alternative (in analogy with > > > the I/O > > > schedulers, for example), but I think for this purpose it should be > > > reviewed > > > properly. > > > > Yeah, this makes sense. > > > > On 4/26/2007, "Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote: > > > There also is a real problem with how it uses the LRU pages. It _seems_ > > > to > > > work, but at least to me it seems to be potentially dangerous. > > > > I am new to suspend2 so can you please explain what exactly is dangerous > > about it? > > After freezing tasks, it first saves the contents of the LRU pages, freezes > devices and then uses the LRU pages for storing the suspend image (if more > memory is needed, it's allocated, but that's irrelevant here). Now, we have > no > warranty that the LRU pages are not updated after we've saved their contents > (first potential problem here). > > After the image has been created, we have to unfreeze devices and save the > image. Now, we have no warranty that no one will be writing to the LRU pages > that we have used to store the image, for whatever reasons known to him, so > the > image can potentially get corrupted while it's being saved. > > In principle, device drivers can do this and there are some kernel threads > that > also can do this (we don't freeze them, because they're needed for the image > saving). > > The design is conceptually really really complicated and it makes strong > assumptions about the behavior of different subsystems. While these > assumptions _may_ be satisfied right now, we'd have to ensure the satisfaction > of them in the future if suspend2 were merged. That's a good description of the issue, although I think _may_ and _seems_ are stating things a bit more pessimistically than is necessary. You see, we need to remember that the pages which are saved separately are LRU pages. Because userspace is frozen, their contents are going to be static. The only possibilities for modifying them come from timer routines, improperly frozen filesystems and device drivers. We have code to check that the LRU isn't changing, and I've only seen one report of modifications to about 20 LRU pages. I haven't had the time yet to chase down the cause, but hope to do so soon. The general scheme has been working for four or five years - if there was a fundamental issue, we would have found it by now. The scheme isn't complicated. The algo for figuring out whether to save the page in an atomic copy just says: Iterate through all LRU pages. For each page, ask: Is this used by the thread suspending, or by userui? No? Save separately. Yes? Save in the atomic copy oh, and save everything else that needs to be saved in the atomic copy. Regards, Nigel signature.asc Description: This is a digitally signed message part
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
On Thursday, 26 April 2007 02:34, Linus Torvalds wrote: > > On Wed, 25 Apr 2007, Linus Torvalds wrote: > > > > The *thaw* needs to happen with devices quiescent. > > Btw, I sure as hell hope you didn't use "suspend()" for that. You're > (again) much better off having a totally separate function that just > freezes stuff. > > So in the "snapshot+shutdown" path, you should have: > > - prepare_to_snapshot() - allocate memory, and possibly return errors > >We can skip this, if we just make the rule be that any devices that >want to support snapshotting must always have the memory required for >snapshotting pre-allocated. Most devices really do allocate memory for >their state anyway, and the only real reason for the "prepare" stage >here is becasue the final snapshot has to happen with interrupts off, >obviously. So *if* we don't need to allocate any memory, and if we >don't expect to want to accept some early error case, this is likely >useless. I think we need this. Apparently, some device drivers need as much as 30 meg of RAM at later stages (I don't know why and what for). > - snapshot() - actually save device state that is consistent with the >memory image at the time. Called with interrupts off, but the device >has to be usable both before and afterwards! > > And I would seriously suggest that "snapshot()" be documented to not rely > on any DMA memory, exactly because the device has to be accessible both > before and after (before - because we're running and allocating memory, > and after - because we'll be writing thigns out). But see later: Please note that some drivers are compiled as modules and they may deal with uninitialized hardware (or worse, with the hardware initialized by the BIOS in a crazy way) after restart_snapshot(). It may be better for them to actually quiesce the devices here too to avoid problems after restart_snapshot() . > For the "resume snapshot" path, I would suggest having > > - freeze(): quiesce the device. This literally just does the absolute >minimum to make sure that the device doesn't do anything surprising (no >interrupts, no DMA, no nothing). For many devices, it's a no-op, even >if they can do DMA (eg most disk controllers will do DMA, but only as >an actual result of a request, and upper layers will be quiescent >anyway, so they do *not* need to disable DMA) > >NOTE! The "freeze()" gets called from the *old* kernel just _before_ a >snapshot unpacking!! Yes, and usually the majority of modules is not loaded at that time. > - restart_snapshot() - actually restart the snapshot (and usually this >would involve re-setting the device, not so much trying to restore all >the saved state. IOW, it's easier to just re-initialize the DMA command >queues than to try to make them "atomic" in the snapshot). > >NOTE! This gets called by the *new* kernel _after_ the snapshot resume! I think devices _should_ be resetted in restart_snapshot(), unless it's possible to check if they have already been initialized by the "old" kernel - but this information would have to be available from the device itself. > And if you *want* to, I can see that you might want to actually do a > "unfreeze()" thing too, and make the actual shapshotting be: What unfreeze() would be needed for in that case? > /* We may not even need this.. */ > for_each_device() { > err = prepare_to_snapshot(); > if (err) > return err; > } We need to free as much memory as we'll need for the image creation at this point. > /* This is the real work for snapshotting */ > cli(); > for_each_device() > freeze(dev); You've added freeze() here, but it's not on your list above? > for_each_device() > snapshot(dev); > .. snapshot current memory image .. > for_each_device_depth_first() > unfreeze(dev); > sti(); > > and maybe it's worth it, but I would almost suggest that you just make the > rule be that any DMA etc just *has* to be re-initialized by > "restart_snapshot()", in which case it's not even necessary to > freeze/unfreeze over the device, and "snapshot()" itself only needs to > make sure any non-DMA data is safe. > > But adding the freeze/unfreeze (which is a no-op for most hardware anyway) > might make things easier to think about, so I would certainly not *object* > to it, even if I suspect it's not necessary. I think it's not necessary. > Anyway, the restore_snapshot() sequence should be: > > /* Old kernel.. Normal boot, load snapshot image */ > cli() > for_each_device() > freeze(dev); > restore_snapshot_image(); > restore_regs_and_jump_to_image(); > /* noreturn */ > > > /* New kernel, gets called at the snapshot restore address >* with interrupts off and devices fro
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
On Thursday, 26 April 2007 18:10, Pekka Enberg wrote: > > On 4/26/2007, "Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote: > > In principle, we could add suspend2 as an alternative (in analogy with the > > I/O > > schedulers, for example), but I think for this purpose it should be reviewed > > properly. > > Yeah, this makes sense. > > On 4/26/2007, "Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote: > > There also is a real problem with how it uses the LRU pages. It _seems_ to > > work, but at least to me it seems to be potentially dangerous. > > I am new to suspend2 so can you please explain what exactly is dangerous > about it? After freezing tasks, it first saves the contents of the LRU pages, freezes devices and then uses the LRU pages for storing the suspend image (if more memory is needed, it's allocated, but that's irrelevant here). Now, we have no warranty that the LRU pages are not updated after we've saved their contents (first potential problem here). After the image has been created, we have to unfreeze devices and save the image. Now, we have no warranty that no one will be writing to the LRU pages that we have used to store the image, for whatever reasons known to him, so the image can potentially get corrupted while it's being saved. In principle, device drivers can do this and there are some kernel threads that also can do this (we don't freeze them, because they're needed for the image saving). The design is conceptually really really complicated and it makes strong assumptions about the behavior of different subsystems. While these assumptions _may_ be satisfied right now, we'd have to ensure the satisfaction of them in the future if suspend2 were merged. Greetings, Rafael - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
Matt Mackall wrote: On Tue, Apr 24, 2007 at 11:29:56PM +0200, Pavel Machek wrote: We do not want to fragment the testing base, and suspend2 does not really have any interesting features over uswsusp. The testing base is already fragmented! What the current situation means is that you simply never hear from the people who get fed up with suspend but who manage to get suspend2 working. I have to feel that having a *working resume* capability is "any interesting features" enough. What you say about "simply never hear from" is unfortunately true. On 04/25/2007 05:30 PM EDT, Pavel Machek wrote: >It is not Rafael's fault. Actually it is quite hard to work with >Nigel, because he implements every feature someone asks for, and wants >to merge them all :-( . I don't expect to ever agree with Nigel on >anything important, sorry. The fact that Pavel thinks giving the users what they want is a problem certainly defines the difference between them, the populist "give them what they want" and the elitist "let's them make do with what I think they should have." I do respect Pavel for all the stuff he has done and is doing, I wish I could have found a nicer way to say that. -- Bill Davidsen <[EMAIL PROTECTED]> "We have more to fear from the bungling of the incompetent than from the machinations of the wicked." - from Slashdot - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
On Thursday, 26 April 2007 20:40, Johannes Berg wrote: > On Thu, 2007-04-26 at 20:40 +0200, Rafael J. Wysocki wrote: > > > > * it surfaces kernel implementation details about pm_ops and thus makes > > >the whole thing very fragile > > > > Can you elaborate? > > Well it tells userspace about pm_ops->enter/prepare/finish etc. > Also, it seems that it needs a "release memory now" operation instead of > just releasing it when the fd is closed? Yes. That's because we want to be able to repeat creating the image without closing the fd in some situations. > > > * it has yet another interface (yuck) to determine whether to reboot, > > >shut down etc, doesn't use /sys/power/disk > > > > Yes. In fact it was meant as a replacement for /sys/power/disk at one > > point. > > Heh. > > > > * I generally had no idea wtf it is doing in some places > > > > I could have told you if you had asked. :-) > > I was offline ;) > > > Do we need hibernate_ops at all? There's only one user anyway and I'm not > > sure there will be more of them in the future. > > I'm pretty sure there won't be, but there's no way to do it cleanly > without pm_ops since even acpi doesn't do this all the time but only > when some set of conditions is true. Hence, it needs to be able to > determine the availability of the platform mode at run time rather than > build time (build time => we could use weak symbols, arch hooks, ...) Still, we could use a global var 'platform_hibernation' or something like this, I think. Then, we can do #define platform_hibernation0 on the architectures that don't need it and make ACPI use it instead of this "dynamic linking". Greetings, Rafael - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
On Thu, 2007-04-26 at 20:40 +0200, Rafael J. Wysocki wrote: > > * it surfaces kernel implementation details about pm_ops and thus makes > >the whole thing very fragile > > Can you elaborate? Well it tells userspace about pm_ops->enter/prepare/finish etc. Also, it seems that it needs a "release memory now" operation instead of just releasing it when the fd is closed? > > * it has yet another interface (yuck) to determine whether to reboot, > >shut down etc, doesn't use /sys/power/disk > > Yes. In fact it was meant as a replacement for /sys/power/disk at one point. Heh. > > * I generally had no idea wtf it is doing in some places > > I could have told you if you had asked. :-) I was offline ;) > Do we need hibernate_ops at all? There's only one user anyway and I'm not > sure there will be more of them in the future. I'm pretty sure there won't be, but there's no way to do it cleanly without pm_ops since even acpi doesn't do this all the time but only when some set of conditions is true. Hence, it needs to be able to determine the availability of the platform mode at run time rather than build time (build time => we could use weak symbols, arch hooks, ...) johannes signature.asc Description: This is a digitally signed message part
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
On Thursday, 26 April 2007 18:31, Johannes Berg wrote: > On Thu, 2007-04-26 at 13:30 +0200, Pavel Machek wrote: > > > > From looking at pm_ops which I was recently working with a lot, it seems > > > that it was designed by somebody who was reading the ACPI documentation > > > and was otherwise pretty clueless, even at that level std tries to look > > > like suspend. IMHO that is one of the first things that should be ripped > > > out, no pm_ops for STD, it's a pain to work with. > > > > That code goes back to Patrick, AFAICT. (And yes, ACPI S3 and ACPI S4 > > low-level enter is pretty similar). > > > > Patches would be welcome > > That was easier than I thought. This applies on top of a patch that > makes kernel/power/user.c optional since I had no idea how to fix it, > problems I see: > * it surfaces kernel implementation details about pm_ops and thus makes >the whole thing very fragile Can you elaborate? > * it has yet another interface (yuck) to determine whether to reboot, >shut down etc, doesn't use /sys/power/disk Yes. In fact it was meant as a replacement for /sys/power/disk at one point. > * I generally had no idea wtf it is doing in some places I could have told you if you had asked. :-) > Anyway, this patch is only compile tested, it > * introduces include/linux/hibernate.h with hibernate_ops and >a new hibernate() function to hibernate the system Do we need hibernate_ops at all? There's only one user anyway and I'm not sure there will be more of them in the future. > * rips apart a lot of the suspend code and puts it back together using >the hibernate_ops > * switches ACPI to hibernate_ops (the only user of pm_ops.pm_disk_mode) > * might apply/compile against -mm, I have all my and some of Rafael's >suspend/hibernate work in my tree. > * breaks user suspend as I noted above > * is incomplete, somewhere pm_suspend_disk() is still defined iirc I think I can fix it up, just give me some time. The idea is good, I think we should do someting like this. Greetings, Rafael - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
On Thu, Apr 26, 2007 at 01:09:53PM +0200, Pavel Machek wrote: > #define SNAPSHOT_SET_IMAGE_SIZE _IOW(SNAPSHOT_IOC_MAGIC, 6, > unsigned long) So I'm not supposed to be able to suspend the 16Gb-ram, 32bits servers I have here? OG. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
Xavier Bestel wrote: On Wed, 2007-04-25 at 18:50 +1000, Nigel Cunningham wrote: (And guess what, it uses APM and suspend is really faster and way more reliable than each kernel implementation I could try). If you tried Suspend2 and had problems with reliability, please send me logs. I'll do all I can to help. (I have to qualify it a bit, because I'm not able to fix drivers, but if it's a Suspend2 issue, tell me and I'll fix it). Does suspend2 work with APM ? After much trying, I think now the ACPI implementation of my laptop (a vintage Compaq Armada 1700) is busted, only APM works. AFAIR the problem with suspend2 was that it didn't poweroff some parts of the laptop (the led of the wifi pcmcia card was on, and the lcd light was on too), but that was last year. Kernel's suspend kind of worked but didn't resume (no reaction on button press). As I tried all this last year, I may have forgotten some things. Honestly, I like this laptop when it works flawlessly, so I don't see many reasons to try *susp* again. I'll do it when I'm bored, just not today. Actually on some old laptops I just use the apm command, with -s (or -S, I forget by now), and that works. -- Bill Davidsen <[EMAIL PROTECTED]> "We have more to fear from the bungling of the incompetent than from the machinations of the wicked." - from Slashdot - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
On Thu, 26 Apr 2007, Josh Triplett wrote: > > http://www.gettysfamily.org/wordpress/?p=32 > http://www.gettysfamily.org/wordpress/?p=33 > > You cannot resume "too fast"; suspend to RAM should become so fast that you > use it without even thinking about it. Yes. I think the OLPC is a prime example of why suspend-to-ram should basically try to not do any setup at all: leave all the DMA command structures etc in memory, leave even things like USB URB's pending, just "stop the actual controllers and suspend it". If you want to just save power by shutting down the CPU, it's a great idea to leave the NIC powered up etc, so that you don't have to renegotiate the link speed etc, and that you can go into s2ram and come back in less than a second. Then, if you really want to *sleep*, you can have a s2ram script that actually turns off devices for real and requires more tear-down and setup, but having something that tries to get in and out of the processor sleep modes as fast as possible is definitely interesting. Obviously, it's *more* interesting on the OLPC than on many other machines (since that one can keep its display refreshed while it's sleeping), so it may well be that the OLPC is one extreme example that isn't very realistic for some other uses, but I think it's one of the things we want to support. (And for me personally, I'd love to have all my machines "sleep" by default, but wake up by eithernet and keyboard - I'd love for my screen saver to literally put the machine to sleep, but not have to worry about touching a keyboard - just ssh'ing into them should still wake up. It's *technically* doable, but it's just a pain to do right now) Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
Pavel Machek wrote: >> For suspend to ram, in contrast, since you *know* that nobody will be >> touching the hardware, and since the timings are very different anyway >> (you'd hope that you can resume in a second or two), you'd generally want >> to keep the DMA engine tables right where they are, and just literally >> suspend the PCI chip itself. > > I'd actually prefer resume to be similar to module insert, too... Do > you think that resume is _that_ time critical? http://www.gettysfamily.org/wordpress/?p=32 http://www.gettysfamily.org/wordpress/?p=33 You cannot resume "too fast"; suspend to RAM should become so fast that you use it without even thinking about it. - Josh Triplett - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
On Thu, 2007-04-26 at 10:14 -0600, Chris Friesen wrote: > Johannes Berg wrote: > > > Judging from experience with the wext 32/64 bit fiasco it seems to be > > rather uncommon to use 32-bit userspace on 64-bit machines. > > I disagree...it's quite common. I think its the standard way of doing > things for ppc64, for instance. I know. My only 64-bit machine is ppc64 :) But still nobody noticed the wext 32/64 bit compat bug for like forever. On the other hand, maybe that just means that most 64-bit machines are desktop machines without wireless. johannes signature.asc Description: This is a digitally signed message part
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
On Thu, 2007-04-26 at 13:30 +0200, Pavel Machek wrote: > > From looking at pm_ops which I was recently working with a lot, it seems > > that it was designed by somebody who was reading the ACPI documentation > > and was otherwise pretty clueless, even at that level std tries to look > > like suspend. IMHO that is one of the first things that should be ripped > > out, no pm_ops for STD, it's a pain to work with. > > That code goes back to Patrick, AFAICT. (And yes, ACPI S3 and ACPI S4 > low-level enter is pretty similar). > > Patches would be welcome That was easier than I thought. This applies on top of a patch that makes kernel/power/user.c optional since I had no idea how to fix it, problems I see: * it surfaces kernel implementation details about pm_ops and thus makes the whole thing very fragile * it has yet another interface (yuck) to determine whether to reboot, shut down etc, doesn't use /sys/power/disk * I generally had no idea wtf it is doing in some places Anyway, this patch is only compile tested, it * introduces include/linux/hibernate.h with hibernate_ops and a new hibernate() function to hibernate the system * rips apart a lot of the suspend code and puts it back together using the hibernate_ops * switches ACPI to hibernate_ops (the only user of pm_ops.pm_disk_mode) * might apply/compile against -mm, I have all my and some of Rafael's suspend/hibernate work in my tree. * breaks user suspend as I noted above * is incomplete, somewhere pm_suspend_disk() is still defined iirc johannes --- Documentation/power/userland-swsusp.txt | 26 +++ drivers/acpi/sleep/main.c | 89 drivers/acpi/sleep/proc.c |3 drivers/i2c/chips/tps65010.c|2 include/linux/hibernate.h | 36 + include/linux/pm.h | 31 kernel/power/disk.c | 117 +++- kernel/power/main.c | 47 +--- kernel/power/power.h| 13 --- kernel/power/user.c | 28 +-- kernel/sys.c|3 11 files changed, 231 insertions(+), 164 deletions(-) --- wireless-dev.orig/include/linux/pm.h2007-04-26 18:15:00.440691185 +0200 +++ wireless-dev/include/linux/pm.h 2007-04-26 18:15:09.410691185 +0200 @@ -107,26 +107,11 @@ typedef int __bitwise suspend_state_t; #define PM_SUSPEND_ON ((__force suspend_state_t) 0) #define PM_SUSPEND_STANDBY ((__force suspend_state_t) 1) #define PM_SUSPEND_MEM ((__force suspend_state_t) 3) -#define PM_SUSPEND_DISK((__force suspend_state_t) 4) -#define PM_SUSPEND_MAX ((__force suspend_state_t) 5) - -typedef int __bitwise suspend_disk_method_t; - -/* invalid must be 0 so struct pm_ops initialisers can leave it out */ -#define PM_DISK_INVALID((__force suspend_disk_method_t) 0) -#definePM_DISK_PLATFORM((__force suspend_disk_method_t) 1) -#definePM_DISK_SHUTDOWN((__force suspend_disk_method_t) 2) -#definePM_DISK_REBOOT ((__force suspend_disk_method_t) 3) -#definePM_DISK_TEST((__force suspend_disk_method_t) 4) -#definePM_DISK_TESTPROC((__force suspend_disk_method_t) 5) -#definePM_DISK_MAX ((__force suspend_disk_method_t) 6) +#define PM_SUSPEND_MAX ((__force suspend_state_t) 4) /** * struct pm_ops - Callbacks for managing platform dependent suspend states. * @valid: Callback to determine whether the given state can be entered. - * If %CONFIG_SOFTWARE_SUSPEND is set then %PM_SUSPEND_DISK is - * always valid and never passed to this call. If not assigned, - * no suspend states are valid. * Valid states are advertised in /sys/power/state but can still * be rejected by prepare or enter if the conditions aren't right. * There is a %pm_valid_only_mem function available that can be assigned @@ -140,24 +125,12 @@ typedef int __bitwise suspend_disk_metho * * @finish: Called when the system has left the given state and all devices * are resumed. The return value is ignored. - * - * @pm_disk_mode: The generic code always allows one of the shutdown methods - * %PM_DISK_SHUTDOWN, %PM_DISK_REBOOT, %PM_DISK_TEST and - * %PM_DISK_TESTPROC. If this variable is set, the mode it is set - * to is allowed in addition to those modes and is also made default. - * When this mode is sent selected, the @prepare call will be called - * before suspending to disk (if present), the @enter call should be - * present and will be called after all state has been saved and the - * machine is ready to be powered off; the @finish callback is called - * after state has been restored. All these calls are called with - * %PM_SUSPEND_DISK as the state. */ struct pm_ops { int (*valid)(susp
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
By the way. > diff --git a/kernel/power/power.h b/kernel/power/power.h > index eb461b8..dc13af5 100644 > --- a/kernel/power/power.h > +++ b/kernel/power/power.h Don't these definitions need to be exported to userspace? That definitely is not a header file for userspace. johannes signature.asc Description: This is a digitally signed message part
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
On Thu, 26 Apr 2007, Chris Friesen wrote: > > I disagree...it's quite common. I think its the standard way of doing things > for ppc64, for instance. It is, although most x86-64 installations seem to be 64-bit user space *if*you*install*from*scatch*. Of course, at least some users (yeah, I've done it) started with a 32-bit CD they had lying around, and upgraded just the kernel. And I'm sure some distro out there just defaults to 32-bit binaries just because (in practice, you have to use a 32-bit firefox anyway if you want flash etc, so you need all the 32-bit libraries, so the argument might go that you might as well use 32-bit stuff for all the common stuff, and only 64-bit binaries when actually needed). Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
Johannes Berg wrote: Judging from experience with the wext 32/64 bit fiasco it seems to be rather uncommon to use 32-bit userspace on 64-bit machines. I disagree...it's quite common. I think its the standard way of doing things for ppc64, for instance. Chris - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
On Thu, 26 Apr 2007, Mark Lord wrote: > Linus Torvalds wrote: > > > > See? Two *totally* different cases. They have *nothing* in common. Not the > > call sequence, not the logic, not *anything*. > > Except that both methods cannot rely upon hot-pluggable devices > still being present on resume/restore. It is exceptionally common > to unplug all USB/firewire cables, mouse, keyboard, docking cables etc.. > after a machine is in S2R state. Right, and that has nothing to do with suspend/resume. You'd better be able to handle unexpected hotplugs _regardless_. For example, it's quite common that people just "remove" the pcmcia/cardbus card while the driver is active. And in fact, when that happens, it's also quite common that the hardware raises the irq for that (active) driver (in fact, it's more than common: since the "card removal" interrupt for the Cardbus controller is generally always the same as the "card interrupt" interrupt for the low-level card driver, you can pretty much *guarantee* that you get that interrupt). So the end result is that the interrupt handler and all normal IO routines for a hotpluggable piece of hardware baically _have_ to be able to gracefully handle the "oops, the hw simply isn't there any more" case! The resume code isn't any different at all. It should run perfectly normally, but for hotpluggable devices, it has to follow all the same rules: handle the "oops, the hw is gone" case gracefully. No different, and it's totally unrelated to suspend/resume: it's a *generic* issue. In fact, suspend/resume is better off than a lot of the other code is, simply because it's easier to test that case and know you hit that particular sequence! It's much harder to verify that the "send packet" case is safe, because how are you going to know to remove the card at the right point to trigger it? Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
On 4/26/2007, "Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote: > In principle, we could add suspend2 as an alternative (in analogy with the I/O > schedulers, for example), but I think for this purpose it should be reviewed > properly. Yeah, this makes sense. On 4/26/2007, "Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote: > There also is a real problem with how it uses the LRU pages. It _seems_ to > work, but at least to me it seems to be potentially dangerous. I am new to suspend2 so can you please explain what exactly is dangerous about it? Pekka - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
On Thu, 26 Apr 2007, Alan Cox wrote: > > > The PCI spec for controlling DMA is really pretty nasty. You can disable > > it in the PCI config word, of course, but that usually just messes up the > > device entirely. > > And some devices ignore it. Some of the older Cyrix stuff I have appears > not to care how the master bit is set. I'm not surprised. If the choice is between locking up the PCI bus by hanging the device in endless retries, or just ignoring the bit, I suspect "just ignore it" is actually the better choice. Of course, in a perfect world you'd happily honor it, raise a PCI error, and all is good, but in practice the internal state machine of most non-trivial hardware is simply so complicated that the "abort gracefully" simply isn't an option. The hw people have enough problems in getting things to work when everything is peachy and well, and a lot of companies end up releasing stuff with known errata for even the _normal_ cases, just because they expect software to work around them ("Doctor, doctor, it hurts when I do the documented access!" "You didn't read errata #317, did you? Don't do that, then!") Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
On Thu, 26 Apr 2007, Pavel Machek wrote: > > Yes, probably will. The other option is to break existing 32-bit > userspace, which is a bit more common AFAICT. And *this* is why kernel/userspace things simply should not be done. It's simply better to do things entirely in the kernel. Because you add bugs and complications otherwise, and doing it in the kernel allows you to just switch things around. As it is, it appears that user-space suspend is just broken whichever way we turn. Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
On Thu, 26 Apr 2007, Pavel Machek wrote: > > #define SNAPSHOT_ATOMIC_SNAPSHOT _IOW(SNAPSHOT_IOC_MAGIC, 3, void *) > #define SNAPSHOT_SET_IMAGE_SIZE _IOW(SNAPSHOT_IOC_MAGIC, 6, > unsigned long) > #define SNAPSHOT_AVAIL_SWAP _IOR(SNAPSHOT_IOC_MAGIC, 7, void *) > #define SNAPSHOT_GET_SWAP_PAGE_IOR(SNAPSHOT_IOC_MAGIC, 8, > void *) > #define SNAPSHOT_SET_SWAP_FILE_IOW(SNAPSHOT_IOC_MAGIC, 10, > unsigned int) > #define SNAPSHOT_PMOPS_IOW(SNAPSHOT_IOC_MAGIC, 12, > unsigned int) > > Are these a problem? Do we need to just use u32 as a argument to keep > ioctl numbers same between 32 and 64bit versions? No, you need to use the *proper* type as an argument, and assuming that type has the same representation in both 32-bit and 64-bit world, the numbers will automatically match. Using "void *" is totally bogus. It's supposed to be the actual argument you pass in, not the pointer to it. If your argument doesn't have a "struct xyz" kind of format, then you could use "int" (or u32 or something: but realistically int is 32-bit for the forseeable future), but it's always wrong to pass in "void *" or "unsigned long", since either of those are just a sign of the interface being either (a) misunderstood or (b) broken. Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
On Thursday, 26 April 2007 13:12, Pekka Enberg wrote: > On 4/25/07, Pavel Machek <[EMAIL PROTECTED]> wrote: > > > Please stop using FUD. > > > Graphical progress it's not in the kernel, even with suspend2. > > > > It was ascii-art, but still 'graphical', last time I checked. > > Suspend2 talks to an userspace client via netlink. While I find the > name of the message ("redraw UI") rather appaling, there's nothing > wrong in principle that userspace starts the suspend process and the > kernel keeps feeding back progress information ("I froze all processes > now") so it can display a graphical progress bar. > > The real question here is what to do with compression and encryption. > However, if you settle for one compression algorithm (such as LZF in > the case of suspend2) and use the _existing_ in-kernel crypto API for > encryption, suddenly the benefits of userspace suspend are not clear. > > As you and Rafael seem to be mostly interested in uswsusp, why don't > we replace the old in-kernel implementation with suspend2? It has a lot of common code with uswsusp. Practically, the saving of the image is the only part of it that could be removed, but this is simple and _really_ helps with debugging. In principle, we could add suspend2 as an alternative (in analogy with the I/O schedulers, for example), but I think for this purpose it should be reviewed properly. There also is a real problem with how it uses the LRU pages. It _seems_ to work, but at least to me it seems to be potentially dangerous. Greetings, Rafael - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
Linus Torvalds wrote: See? Two *totally* different cases. They have *nothing* in common. Not the call sequence, not the logic, not *anything*. Except that both methods cannot rely upon hot-pluggable devices still being present on resume/restore. It is exceptionally common to unplug all USB/firewire cables, mouse, keyboard, docking cables etc.. after a machine is in S2R state. Cheers - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
> The PCI spec for controlling DMA is really pretty nasty. You can disable > it in the PCI config word, of course, but that usually just messes up the > device entirely. And some devices ignore it. Some of the older Cyrix stuff I have appears not to care how the master bit is set. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
Hi! > > > The interface isn't even 64/32-bit compatible... > > > > It's not . And it's one of the worst interface I've seen lately. Did > > anyone actually review this crap before it went in? I completely > > agree with Linus that these kind of boundaries that lead to horribly > > complex ioctl interface are totally wrong. > > it's a bit hard to see the point of it anyway: the resume binary (much > of the focus of the ioctls) fundamentally lives as an 'initrd binary' - > and most of the stuff that wants to execute in an initrd is > fundamentally tied to the kernel anyway. Typically... yes, it needs to be in initrd. And yes, klibc would help here. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
* Christoph Hellwig <[EMAIL PROTECTED]> wrote: > > The interface isn't even 64/32-bit compatible... > > It's not . And it's one of the worst interface I've seen lately. Did > anyone actually review this crap before it went in? I completely > agree with Linus that these kind of boundaries that lead to horribly > complex ioctl interface are totally wrong. it's a bit hard to see the point of it anyway: the resume binary (much of the focus of the ioctls) fundamentally lives as an 'initrd binary' - and most of the stuff that wants to execute in an initrd is fundamentally tied to the kernel anyway. Perhaps we should allow "in-kernel userspace" that would be allowed to grow ad-hoc interfaces and linking that would only be compatible with the kernel they are embedded into: e.g. the klibc stuff in linux/usr/* could link to the kernel (via whatever method) and just be in essence another type of kernel code - but happening to execute in user-space, having access to the normal user-space facilities and being able to link to (GPL) user-space libraries. Perhaps this would bridge the "i want to tinker in user-space because it's technically easier/cleaner there" and "fine but that needs formalized ABIs for your connection to kernel-space" gap. > Now suspend2 wasn't exactly nice either when I last reviewed it, but > we should probably give it another attempt if we can sort out a proper > incremental merge. yeah, it still has quite a bit of work left, but it looked fundamentally split-uppable. Ingo - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
On Thu, 2007-04-26 at 13:30 +0200, Pavel Machek wrote: > That code goes back to Patrick, AFAICT. (And yes, ACPI S3 and ACPI S4 > low-level enter is pretty similar). But that doesn't excuse abusing the same interface, IMHO. > Patches would be welcome :) I'll see what I can do. Shouldn't be too hard to add an interface just for ACPI here and get platform disk-mode into there from a different angle. johannes signature.asc Description: This is a digitally signed message part
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
On Thu, Apr 26, 2007 at 12:17:12PM +0200, Johannes Berg wrote: > On Tue, 2007-04-24 at 23:24 +0200, Pavel Machek wrote: > > > I believe uswsusp user/kernel separation is clean enough. Kernel > > provides "snapshot image" and "resume image". (Thanks go to Rafael for > > very clean interface). > > The interface isn't even 64/32-bit compatible... It's not . And it's one of the worst interface I've seen lately. Did anyone actually review this crap before it went in? I completely agree with Linus that these kind of boundaries that lead to horribly complex ioctl interface are totally wrong. Now suspend2 wasn't exactly nice either when I last reviewed it, but we should probably give it another attempt if we can sort out a proper incremental merge. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
Hi! > > Yes, probably will. The other option is to break existing 32-bit > > userspace, which is a bit more common AFAICT. > > Judging from experience with the wext 32/64 bit fiasco it seems to be > rather uncommon to use 32-bit userspace on 64-bit machines. Well, it would break 32-bit userspace on 32-bit kernel, which is the most common version, AFAICT. > Rafael hinted that we could just add these numbers, keep the existing > ones and then phase them out over time, but I haven't really given it > much thought. We could probably do that... but it is slightly ugly. -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
On Thu, 2007-04-26 at 13:26 +0200, Pavel Machek wrote: > Yes, probably will. The other option is to break existing 32-bit > userspace, which is a bit more common AFAICT. Judging from experience with the wext 32/64 bit fiasco it seems to be rather uncommon to use 32-bit userspace on 64-bit machines. Rafael hinted that we could just add these numbers, keep the existing ones and then phase them out over time, but I haven't really given it much thought. johannes signature.asc Description: This is a digitally signed message part
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
Hi! > > That's where I started: whole "suspend to disk" thing actually has _more_ > > to do with "shutdown" than with "suspend". > > From looking at pm_ops which I was recently working with a lot, it seems > that it was designed by somebody who was reading the ACPI documentation > and was otherwise pretty clueless, even at that level std tries to look > like suspend. IMHO that is one of the first things that should be ripped > out, no pm_ops for STD, it's a pain to work with. That code goes back to Patrick, AFAICT. (And yes, ACPI S3 and ACPI S4 low-level enter is pretty similar). Patches would be welcome, as would be "suspend-to-ram maintainer". Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
Hi! > > We do not want to fragment the testing base, and suspend2 does not > > really have any interesting features over uswsusp. > > The testing base is already fragmented! > > What the current situation means is that you simply never hear from > the people who get fed up with suspend but who manage to get suspend2 > working. Well, and what can I do? I can wait for suspend2 to slowly disappear on their own. Merging suspend2 would just make testing base _more_ fragmented then it is today. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
Hi! > > This one should prevent ioctl numbers changing, too. > > > -#define SNAPSHOT_ATOMIC_SNAPSHOT _IOW(SNAPSHOT_IOC_MAGIC, 3, void *) > > +#define SNAPSHOT_ATOMIC_SNAPSHOT _IOW(SNAPSHOT_IOC_MAGIC, 3, u32) /* > > void * */ > > Afaict that'll actually change ioctl numbers breaking existing 64-bit > userspace. Yes, probably will. The other option is to break existing 32-bit userspace, which is a bit more common AFAICT. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
On Thu, 2007-04-26 at 13:16 +0200, Pavel Machek wrote: > This one should prevent ioctl numbers changing, too. > -#define SNAPSHOT_ATOMIC_SNAPSHOT _IOW(SNAPSHOT_IOC_MAGIC, 3, void *) > +#define SNAPSHOT_ATOMIC_SNAPSHOT _IOW(SNAPSHOT_IOC_MAGIC, 3, u32) /* > void * */ Afaict that'll actually change ioctl numbers breaking existing 64-bit userspace. johannes signature.asc Description: This is a digitally signed message part
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
Hi! > > Does this seem to help? > > No idea, I haven't actually tried it yet, last time I tried uswsusp on > my 32/32 machine it didn't work due to endian problems that were > supposed to be resolved but I haven't had a chance to pick all the bits > together that you need. This one should prevent ioctl numbers changing, too. diff --git a/kernel/power/power.h b/kernel/power/power.h index eb461b8..a18b85a 100644 --- a/kernel/power/power.h +++ b/kernel/power/power.h @@ -114,23 +114,23 @@ extern int snapshot_image_loaded(struct * SNAPSHOT_SET_SWAP_AREA ioctl */ struct resume_swap_area { - loff_t offset; + u_int64_t offset; u_int32_t dev; } __attribute__((packed)); #define SNAPSHOT_IOC_MAGIC '3' #define SNAPSHOT_FREEZE_IO(SNAPSHOT_IOC_MAGIC, 1) #define SNAPSHOT_UNFREEZE _IO(SNAPSHOT_IOC_MAGIC, 2) -#define SNAPSHOT_ATOMIC_SNAPSHOT _IOW(SNAPSHOT_IOC_MAGIC, 3, void *) +#define SNAPSHOT_ATOMIC_SNAPSHOT _IOW(SNAPSHOT_IOC_MAGIC, 3, u32) /* void * */ #define SNAPSHOT_ATOMIC_RESTORE_IO(SNAPSHOT_IOC_MAGIC, 4) #define SNAPSHOT_FREE _IO(SNAPSHOT_IOC_MAGIC, 5) -#define SNAPSHOT_SET_IMAGE_SIZE_IOW(SNAPSHOT_IOC_MAGIC, 6, unsigned long) -#define SNAPSHOT_AVAIL_SWAP_IOR(SNAPSHOT_IOC_MAGIC, 7, void *) -#define SNAPSHOT_GET_SWAP_PAGE _IOR(SNAPSHOT_IOC_MAGIC, 8, void *) +#define SNAPSHOT_SET_IMAGE_SIZE_IOW(SNAPSHOT_IOC_MAGIC, 6, u32) /* unsigned long */ +#define SNAPSHOT_AVAIL_SWAP_IOR(SNAPSHOT_IOC_MAGIC, 7, u32) /* void * */ +#define SNAPSHOT_GET_SWAP_PAGE _IOR(SNAPSHOT_IOC_MAGIC, 8, u32) /* void * */ #define SNAPSHOT_FREE_SWAP_PAGES _IO(SNAPSHOT_IOC_MAGIC, 9) -#define SNAPSHOT_SET_SWAP_FILE _IOW(SNAPSHOT_IOC_MAGIC, 10, unsigned int) +#define SNAPSHOT_SET_SWAP_FILE _IOW(SNAPSHOT_IOC_MAGIC, 10, u32) /* unsigned int */ #define SNAPSHOT_S2RAM _IO(SNAPSHOT_IOC_MAGIC, 11) -#define SNAPSHOT_PMOPS _IOW(SNAPSHOT_IOC_MAGIC, 12, unsigned int) +#define SNAPSHOT_PMOPS _IOW(SNAPSHOT_IOC_MAGIC, 12, u32) /* unsigned int */ #define SNAPSHOT_SET_SWAP_AREA _IOW(SNAPSHOT_IOC_MAGIC, 13, \ struct resume_swap_area) #define SNAPSHOT_IOC_MAXNR 13 diff --git a/kernel/power/user.c b/kernel/power/user.c index 558e18e..d0730c1 100644 --- a/kernel/power/user.c +++ b/kernel/power/user.c @@ -215,8 +215,7 @@ static int snapshot_ioctl(struct inode * { int error = 0; struct snapshot_data *data; - loff_t avail; - sector_t offset; + u64 avail, offset; if (_IOC_TYPE(cmd) != SNAPSHOT_IOC_MAGIC) return -ENOTTY; @@ -286,7 +285,7 @@ static int snapshot_ioctl(struct inode * case SNAPSHOT_AVAIL_SWAP: avail = count_swap_pages(data->swap, 1); avail <<= PAGE_SHIFT; - error = put_user(avail, (loff_t __user *)arg); + error = put_user(avail, (u64 __user *)arg); break; case SNAPSHOT_GET_SWAP_PAGE: @@ -304,7 +303,7 @@ static int snapshot_ioctl(struct inode * offset = alloc_swapdev_block(data->swap, data->bitmap); if (offset) { offset <<= PAGE_SHIFT; - error = put_user(offset, (sector_t __user *)arg); + error = put_user(offset, (u64 __user *)arg); } else { error = -ENOSPC; } -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
On 4/25/07, Pavel Machek <[EMAIL PROTECTED]> wrote: > Please stop using FUD. > Graphical progress it's not in the kernel, even with suspend2. It was ascii-art, but still 'graphical', last time I checked. Suspend2 talks to an userspace client via netlink. While I find the name of the message ("redraw UI") rather appaling, there's nothing wrong in principle that userspace starts the suspend process and the kernel keeps feeding back progress information ("I froze all processes now") so it can display a graphical progress bar. The real question here is what to do with compression and encryption. However, if you settle for one compression algorithm (such as LZF in the case of suspend2) and use the _existing_ in-kernel crypto API for encryption, suddenly the benefits of userspace suspend are not clear. As you and Rafael seem to be mostly interested in uswsusp, why don't we replace the old in-kernel implementation with suspend2? Pekka - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
Hi! > > > > I believe uswsusp user/kernel separation is clean enough. Kernel > > > > provides "snapshot image" and "resume image". (Thanks go to Rafael for > > > > very clean interface). > > > > > > The interface isn't even 64/32-bit compatible... > > > > Which parts? > > ioctl numbers last time I talked about it with Rafael. No effort was > made to fix it. #define SNAPSHOT_ATOMIC_SNAPSHOT_IOW(SNAPSHOT_IOC_MAGIC, 3, void *) #define SNAPSHOT_SET_IMAGE_SIZE _IOW(SNAPSHOT_IOC_MAGIC, 6, unsigned long) #define SNAPSHOT_AVAIL_SWAP _IOR(SNAPSHOT_IOC_MAGIC, 7, void *) #define SNAPSHOT_GET_SWAP_PAGE _IOR(SNAPSHOT_IOC_MAGIC, 8, void *) #define SNAPSHOT_SET_SWAP_FILE _IOW(SNAPSHOT_IOC_MAGIC, 10, unsigned int) #define SNAPSHOT_PMOPS _IOW(SNAPSHOT_IOC_MAGIC, 12, unsigned int) Are these a problem? Do we need to just use u32 as a argument to keep ioctl numbers same between 32 and 64bit versions? Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
On Thu, 2007-04-26 at 12:40 +0200, Pavel Machek wrote: > Does this seem to help? No idea, I haven't actually tried it yet, last time I tried uswsusp on my 32/32 machine it didn't work due to endian problems that were supposed to be resolved but I haven't had a chance to pick all the bits together that you need. johannes signature.asc Description: This is a digitally signed message part
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
On Thu, 2007-04-26 at 12:30 +0200, Pavel Machek wrote: > On Thu 2007-04-26 12:17:12, Johannes Berg wrote: > > On Tue, 2007-04-24 at 23:24 +0200, Pavel Machek wrote: > > > > > I believe uswsusp user/kernel separation is clean enough. Kernel > > > provides "snapshot image" and "resume image". (Thanks go to Rafael for > > > very clean interface). > > > > The interface isn't even 64/32-bit compatible... > > Which parts? ioctl numbers last time I talked about it with Rafael. No effort was made to fix it. johannes signature.asc Description: This is a digitally signed message part
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
On Wed, 2007-04-25 at 15:55 -0700, Linus Torvalds wrote: > That's where I started: whole "suspend to disk" thing actually has _more_ > to do with "shutdown" than with "suspend". From looking at pm_ops which I was recently working with a lot, it seems that it was designed by somebody who was reading the ACPI documentation and was otherwise pretty clueless, even at that level std tries to look like suspend. IMHO that is one of the first things that should be ripped out, no pm_ops for STD, it's a pain to work with. johannes signature.asc Description: This is a digitally signed message part
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
Hi! > > The interface isn't even 64/32-bit compatible... > > Which parts? > > ioctl(AVAIL_SWAP, > ...hmm, is this the one you are complaining about? It returns > loff_t through a pointer. Maybe there's another interface > that can return available swap, and we should use that, >instead? loff_t is 64bit on i386, so I do not see immediate problem here, but maybe we should just explicitely pass u64? > ioctl(GET_SWAP_PAGE, > returns sector_t through a pointer. NOt sure if that's good > idea, either. Ok, that's very bad idea, because sector_t can be 32-bit or 64-bit, depending on CONFIG_LBD. We need to use u64 here. > ioctl(SET_SWAP_FILE, > does old_decode_dev(arg). Is that ok? > > ioctl(SET_SWAP_AREA, > shares struct resume_swap_area between user and kernel. I > guess that's bad..? struct resume_swap_area { loff_t offset; u_int32_t dev; } __attribute__((packed)); ...I guess we should change loff_t -> u64 and problem is solved? Old_decode_dev takes u16. That sucks for majors/minors > 256, but fortunately those are not common. Does this seem to help? Pavel diff --git a/kernel/power/power.h b/kernel/power/power.h index eb461b8..dc13af5 100644 --- a/kernel/power/power.h +++ b/kernel/power/power.h @@ -114,7 +114,7 @@ extern int snapshot_image_loaded(struct * SNAPSHOT_SET_SWAP_AREA ioctl */ struct resume_swap_area { - loff_t offset; + u_int64_t offset; u_int32_t dev; } __attribute__((packed)); diff --git a/kernel/power/user.c b/kernel/power/user.c index 558e18e..d0730c1 100644 --- a/kernel/power/user.c +++ b/kernel/power/user.c @@ -215,8 +215,7 @@ static int snapshot_ioctl(struct inode * { int error = 0; struct snapshot_data *data; - loff_t avail; - sector_t offset; + u64 avail, offset; if (_IOC_TYPE(cmd) != SNAPSHOT_IOC_MAGIC) return -ENOTTY; @@ -286,7 +285,7 @@ static int snapshot_ioctl(struct inode * case SNAPSHOT_AVAIL_SWAP: avail = count_swap_pages(data->swap, 1); avail <<= PAGE_SHIFT; - error = put_user(avail, (loff_t __user *)arg); + error = put_user(avail, (u64 __user *)arg); break; case SNAPSHOT_GET_SWAP_PAGE: @@ -304,7 +303,7 @@ static int snapshot_ioctl(struct inode * offset = alloc_swapdev_block(data->swap, data->bitmap); if (offset) { offset <<= PAGE_SHIFT; - error = put_user(offset, (sector_t __user *)arg); + error = put_user(offset, (u64 __user *)arg); } else { error = -ENOSPC; } -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
On Thu 2007-04-26 12:17:12, Johannes Berg wrote: > On Tue, 2007-04-24 at 23:24 +0200, Pavel Machek wrote: > > > I believe uswsusp user/kernel separation is clean enough. Kernel > > provides "snapshot image" and "resume image". (Thanks go to Rafael for > > very clean interface). > > The interface isn't even 64/32-bit compatible... Which parts? read/write on /dev/snapshot looks ok. ioctl(SNAPSHOT_FREEZE, UNFREEZE, ATOMIC_RESTORE, FREE, FREE_SWAP_PAGE, SNAPSHOT_S2RAM, is okay, because it does not pass any data. ioctl(ATOMIC_SNAPSHOT, returns 0/1 through pointer. Should be ok. (Maybe we should do if (!error) error = put_user(in_suspend, (u32 __user *)arg); ...instead, to make it very explicit? ioctl(SET_IMAGE_SIZE, is okay, because it just uses arg directly. ioctl(PMOPS, is okay, because it just uses arg directly... and it is in range 0-3 or something. ioctl(AVAIL_SWAP, ...hmm, is this the one you are complaining about? It returns loff_t through a pointer. Maybe there's another interface that can return available swap, and we should use that, instead? ioctl(GET_SWAP_PAGE, returns sector_t through a pointer. NOt sure if that's good idea, either. ioctl(SET_SWAP_FILE, does old_decode_dev(arg). Is that ok? ioctl(SET_SWAP_AREA, shares struct resume_swap_area between user and kernel. I guess that's bad..? Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
On Tue, 2007-04-24 at 23:24 +0200, Pavel Machek wrote: > I believe uswsusp user/kernel separation is clean enough. Kernel > provides "snapshot image" and "resume image". (Thanks go to Rafael for > very clean interface). The interface isn't even 64/32-bit compatible... johannes signature.asc Description: This is a digitally signed message part
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
Hi. On Thu, 2007-04-26 at 00:27 -0700, David Lang wrote: > On Thu, 26 Apr 2007, Nigel Cunningham wrote: > > > Hi. > > > > On Thu, 2007-04-26 at 01:33 +0200, Olivier Galibert wrote: > >> On Wed, Apr 25, 2007 at 11:50:45AM -0700, Linus Torvalds wrote: > >>> .. but if the alternative is a feature that just isn't worth it, and > >>> likely to not only have its own bugs, but cause bugs elsewhere? (And yes, > >>> I believe STD is both of those. There's a reason it's called "STD". Go > >>> to google and type "STD" and press "I'm feeling lucky". Google is God). > >> > >> If it was correctly designed, it would be possible to change the > >> hardware or even the kernel through a STD cycle. And that would be > >> damn interesting on servers. > > > > Those are different issues - hardware hot/cold plugging for the first. > > > > Changing the kernel through a cycle - that's not a design fault. The > > problem there is that the kernel and it's associated data structures are > > part of the state. Changing the kernel and keeping the image would > > require exactly correspondence in data structures, memory map and so on. > > That's why the same kernel is required. > > that depends on exactly what you save in your snapshot. > > one approach is to try and save absolutly everything in ram (this is the > current > approach) > > if you do this then you do need to use the same kernel for the reasons that > you > list. > > however, you could also decide to only save the information about processes > on > the system (i.e. what you absolutly have to) and let the kernel re-initialize > itself (along with it's devices) then you could use a different kernel > safely. > doing this should also save you a significant amount of storage when makeing > your snapshot Well, there is cryopid for individual processes. I suppose you could potentially try doing a mass cryopiding. That would make things a lot more complicated though. I'm not saying it's not doable. Regards, Nigel signature.asc Description: This is a digitally signed message part
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
On Thu, 26 Apr 2007, Nigel Cunningham wrote: Hi. On Thu, 2007-04-26 at 01:33 +0200, Olivier Galibert wrote: On Wed, Apr 25, 2007 at 11:50:45AM -0700, Linus Torvalds wrote: .. but if the alternative is a feature that just isn't worth it, and likely to not only have its own bugs, but cause bugs elsewhere? (And yes, I believe STD is both of those. There's a reason it's called "STD". Go to google and type "STD" and press "I'm feeling lucky". Google is God). If it was correctly designed, it would be possible to change the hardware or even the kernel through a STD cycle. And that would be damn interesting on servers. Those are different issues - hardware hot/cold plugging for the first. Changing the kernel through a cycle - that's not a design fault. The problem there is that the kernel and it's associated data structures are part of the state. Changing the kernel and keeping the image would require exactly correspondence in data structures, memory map and so on. That's why the same kernel is required. that depends on exactly what you save in your snapshot. one approach is to try and save absolutly everything in ram (this is the current approach) if you do this then you do need to use the same kernel for the reasons that you list. however, you could also decide to only save the information about processes on the system (i.e. what you absolutly have to) and let the kernel re-initialize itself (along with it's devices) then you could use a different kernel safely. doing this should also save you a significant amount of storage when makeing your snapshot David Lang - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
Hi. On Thu, 2007-04-26 at 01:33 +0200, Olivier Galibert wrote: > On Wed, Apr 25, 2007 at 11:50:45AM -0700, Linus Torvalds wrote: > > .. but if the alternative is a feature that just isn't worth it, and > > likely to not only have its own bugs, but cause bugs elsewhere? (And yes, > > I believe STD is both of those. There's a reason it's called "STD". Go > > to google and type "STD" and press "I'm feeling lucky". Google is God). > > If it was correctly designed, it would be possible to change the > hardware or even the kernel through a STD cycle. And that would be > damn interesting on servers. Those are different issues - hardware hot/cold plugging for the first. Changing the kernel through a cycle - that's not a design fault. The problem there is that the kernel and it's associated data structures are part of the state. Changing the kernel and keeping the image would require exactly correspondence in data structures, memory map and so on. That's why the same kernel is required. > In any case, if I could trust it, I'd use it when I need to move > servers around and I don't want to lose what is running. Riding power > cuts that way would be nice. That's what Rafael and I working on. Nigel signature.asc Description: This is a digitally signed message part
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
Hello. On Wed, 2007-04-25 at 23:30 +0200, Pavel Machek wrote: > Hi! > > > > Please ask anyone who's worked with me if he's had any problem with that. > > > If anyone say I'm unable to work with anybody else, I'd say you're right. > > > Till > > > then, I feel offended. > > > > I'll apologise (and virtually kiss your hairy feet) if you could actually > > show me a single implementation that people can agree on. > > > > But until then, I claim that the suspend-to-disk people cannot work with > > each other. > > It is not Rafael's fault. Actually it is quite hard to work with > Nigel, because he implements every feature someone asks for, and wants > to merge them all :-(. I don't expect to ever agree with Nigel on > anything important, sorry. I'm sorry that you feel that way, Pavel. I can agree that I implement features that people ask for, but I think saying "every feature someone asks for" is going a bit far (I won't ask you to prove that). My desire is to provide Linux with hibernation support that does more than just the bare minimum. Different people have different usage scenarios, and this has led to me implementing more and different features. As to wanting to merge them all, this is true. No one wants to put time into something only to have it left out. But I don't see why you think this is a bad thing. Many kernel guys claim the thing follows an evolutionary model. Well, here's software that has been developed out of tree - evolved if you like - and which many people would consider more mature ('evolved'?) than [u]swsusp. If evolutionary theory is to be followed, let the fittest survive! Nigel signature.asc Description: This is a digitally signed message part
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
Hi. Hmm. Perhaps I should have added to that last reply that recognising that they store similar information doesn't mean I think they need the same high-level routine for both state transitions. I'd really like to see each driver have some sort of state machine controlling its power management, into which these calls were just another input (an important one, but just another alongside information about policy, whether we're on battery (UPS or laptop) or AC, whether the device is actually being used and so on. Nigel signature.asc Description: This is a digitally signed message part
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
Alan Cox wrote: > Mind you some laptops think S2RAM is just a transition state on the way > to disk, leave them in ACPI S2RAM and the firmware will magically turn it > into a save to disk and back to ram if the battery runs low or you leave > it idle too long. The OS does this (or at least it's supposed to). STR with battery low, it comes back on fully via a battery wake event, and then STD (aka snapshot/poweroff). The ACPI state machine always goes through S0, FWIW. OK, now back to the "should we have 2 function pointers or 4" debate... -- Andy - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
Hi. On Wed, 2007-04-25 at 20:03 -0700, Linus Torvalds wrote: > > On Thu, 26 Apr 2007, Nigel Cunningham wrote: > > > > Sorry. I wasn't clear. I wasn't saying that suspend to ram has a > > snapshot point. I was trying to say it has a point where you're seeking > > to save information (PCI state / SCSI transaction number or whatever) > > that you'll need to get the hardware into the same state at a later > > stage. That (saving information) is the point of similarity. > > Yes, they do both save information, but I'm not actually convinced they > would necessarily even save the *same* information. > > Let's just take an example of USB, and to make things more interesting, > say that the disk you want to suspend to is itself over USB (not > necessarily something you _want_ to do, but I think we can all agree that > it's something that should potentially work, no?) Agreed - it would be nice. > Now, USB devices actually have per-connection state (at a minimum, the > "toggle" bit or whatever), and that's obviously something that will > inevitably *change* as a result of the device being used after > snapshotting (and even if not used, by the rediscovery by the first kernel > to boot), and we fundamentally cannot put the final toggle state in the > snapshot. > > So in the snapshot-to-disk scenario, there are some pieces of data that > simply fundamentally *cannot* be snapshotted, because they are not > controller state, they are "connection" state. > > So in that case, you basically know that you *have* to rebuild the > connection when you do the "snapshot_resume()" thing. So there's no point > in even keeping these kinds of connection states (the same is true of > keyboards, mice, anything else - it's how USB works). Sort of agree - you might want to record some serial number that might let you recognise it as the same thing at resume time when everything is re-hotplugged (assuming it's even there then). Nevertheless, I don't think that diminishes what you're saying. > In contrast, in suspend-to-RAM, USB connections might just be things you > actually want to keep open and active, and you *can* do so, in ways you > simply cannot do with "snapshot to disk". In fact, if you are something > like an OLPC and actually go to s2ram very aggressively, you might well > want to keep the connection established, because it's conceivable that you > might otherwise lose keypresses etc issues) > > See? There are real *technical* reasons to believe that the two "save > state" operations are really fundamentally different. There are reasons to > believe that a s2ram can actually happen while keeping some connections > open that cannot be kept open over a disk snapshot. > > Do they *have* to be different? Of course not. For many devices the "save" > and "freeze" operations will likely all be no-ops, and there would be > absolutely no difference between suspending and snapshotting, because the > driver state already natively contains all the information needed to get > the device going again. > > Equally, I don't doubt that in many drivers you'll have very similar "save > state" logic, but in fact I believe that in many cases that "save state" > logic will often just be a simple > > pci_save_state(dev); > > call, so it's literally the case that they will not be just shared between > the "suspend" and "snapshot" case, they'll be shared across all simple PCI > devices too! > > But that doesn't mean that the functions to do so should be the same. You > might have > > static int mypcidevice_suspend(struct pci_dev *dev) > { > pci_save_state(dev); > pci_set_power_state(dev, PCI_D3); > return 0; > } > > static int mupcidevice_snapshot(struct pci_dev *dev) > { > pci_save_state(dev); > return 0; > } > > and who cares if they both have that same call to a shared "save state" > function? They're still totally different operations, and the fact that > *some* devices may save the same things doesn't make them any more > similar! See above why some devices might save totally *different* things > for a "snapshot" vs a "suspend" event. No disagreement here. > > I suppose that's another point of similarity - for snapshotting, the > > same ordering is probably needed? > > I agree that you're likely to walk the device list in the same order. The > whole "shut down leaf devices first", "start up root devices first" is > pretty fundamental. > > But that's true of reboot and device discovery too. Should that ordering > mean that we should use the "discovery()" function and pass it a flag and > say "you shouldn't discover, you should snapshot or suspend now"? No. > Everybody agrees that device discovery is something different from device > suspend. The fact that it's done in a topological order and thus they bear > some kind of inverse relationship to each other doesn't make them "
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
Hi. On Wed, 2007-04-25 at 19:04 -0700, Linus Torvalds wrote: > > On Thu, 26 Apr 2007, Nigel Cunningham wrote: > > > > That's where I think you're overstretching the argument. Like suspend > >(to ram), we're concerned at the snapshot point with getting the hardware > >in the same state at a later stage. > > Really, no. > > "suspend to ram" doesn't _have_ a "snapshot point". Sorry. I wasn't clear. I wasn't saying that suspend to ram has a snapshot point. I was trying to say it has a point where you're seeking to save information (PCI state / SCSI transaction number or whatever) that you'll need to get the hardware into the same state at a later stage. That (saving information) is the point of similarity. > I've tried to explain this multiple times, I don't know why it's not > apparently sinking in. This is much more fundamental than the fact that > you don't want to stop disks for snapshotting, although it really boils > down to all the same issues: the operations are simply not at all the > same! Miscommunication, I think. Does the above help? > I agree 100% that "snapshot to disk" is a "snapshot event". You have to > create a single point in time when everything is stable. And I'd much > rather call it "snapshot to disk" than "suspend to disk" to make it clear > that it's something _totally_ different from "suspend". > > Because the thing is, "suspend to ram" is *not* a snapshot event. At no > point do you actually need to "snapshot" the system at all. You can just > gradually shut more and more things down, and equally gradually bring them > back up. There simply is *never* any "snapshot" time from a device > standpoint, because you can just shut down devices in the right order AND > YOU ARE DONE. > > Really. I suppose that's another point of similarity - for snapshotting, the same ordering is probably needed? > [ Obviously s2ram does have one "magic moment", namely the time when the > CPU does the magic read from the northbridge that actually turns off > power for the CPU. But that's really a total non-event from a device > standpoint, so while it's undoubtedly a very interesting moment in the > suspend sequence, it's not really relevant in any way for device > drivers in general. Not at all like the "snapshot moment" that requires > the whole system to be totally quiescent in a "snapshot to disk"! ] > > And the reason s2ram doesn't have a that "snapshot" moment is exactly that > the RAM contents are just always there, so there's no need to have a > "synchronization event" when ram and devices match. The RAM will *always* > match whatever any particular device has done to it, and the proper way to > handle things is to just do a simple per-device "save-and-suspend" event. Yeah. > And yes, the _individual_ "save-and-suspend" events obviously needs to be > "atomic", but it's purely about that particular individual device, so > there's never any cross-device issues about that. No interdependencies? I'm not sure. > For example, if you're a USB hub controller, which is just about the most > complex issue you can have, you obviously want to "save the state" with > the controller in a STOPPED state, but that should just go without saying: > if the controller isn't stopped, you simply *cannot* save the state, since > the state is changing under you. > > The difference is, that the USB driver needs to just "stop, save, and > suspend" as one simple operation for s2ram. In contrast, when doing > snapshot to disk, it cannot do that, because while it does want to do the > "stop" part, it needs to do so _separately_ from the "save" part because > you need to stop everything else *too* before you "save" anythng at all. Agree. Nigel signature.asc Description: This is a digitally signed message part
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
Hi. On Thu, 2007-04-26 at 01:45 +0200, Pavel Machek wrote: > > What's your argument? Your argument seems to be that it's not stupid, > > because it can work. Can't you see that that simply isn't an > > argument at > > I tried keeping module_init/thaw/resume similar code, so that driver > authors can debug suspend-to-disk, cross their fingers, and have > suspend-to-ram work, too. > Now, perhaps enough people do std/str these days so this is not > important any longer... lets hope so. N! It's important and getting more important. More and more, people are going to be demanding better power saving (climate change and all that stuff). The best power saving is to have the thing completely off, so STD is more important. The second best power saving is STR, so that's important too. But even more important is good power saving all the time. For that reason, I agree completely with Linus. The current model is far too limited. It shouldn't be so suspend-to-ram/disk centric, and should instead focus on run-time power management, with suspend to ram and disk as particular instances of run-time power management. It should make appropriate differentiation between snapshotting and suspending to ram. I do disagree that the current suspend-to-disk algorithm is broken. We do need a point at which we say "Ok, drivers, record your state." - the current device_suspend and device_resume calls. But that doesn't mean the need to be called device_suspend/resume or do what they do now. I'd love to help make this happen, but I'm afraid I just don't have the time. Nigel signature.asc Description: This is a digitally signed message part
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
Hi. On Wed, 2007-04-25 at 15:18 -0700, Linus Torvalds wrote: > > On Wed, 25 Apr 2007, Pavel Machek wrote: > > > > Not the same... but they are still related. "freeze" (for atomic > > snapshot) is actually subset of "suspend"... freeze needs DMAs off and > > saved state, and you need DMAs off and saved state for "suspend". > > THEY HAVE ABSOLUTELY NOTHING IN COMMON! > > Nobody in their right mind thinks that "disable DMA" and "suspend" are > similar operations. > > > So it is actually correct to do "suspend" when you want "freeze"; it > > is just slow. That's why they only differ in parameter these days. > > It is *not* correct to "suspend" when you want "freeze". > > I don't understand how you can even *claim* something like that. > > Here's a trivial example: > - SCSI disk > > Tell me, what does "suspend" do, and what does "freeze" (snapshot) do? > > And name *one* thing that have in common. Set/reset the scsi transaction id thingy? Hibernation didn't work with SCSI for a long time precisely because that support was missing. Don't get me wrong, I agree on the whole - Suspend2 worked fine on the whole under 2.4 without a driver model. But they do have a bit in common. Regards, Nigel signature.asc Description: This is a digitally signed message part
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
Hi. On Wed, 2007-04-25 at 15:55 -0700, Linus Torvalds wrote: > > On Thu, 26 Apr 2007, Nigel Cunningham wrote: > > > > > > And name *one* thing that have in common. > > > > Set/reset the scsi transaction id thingy? Hibernation didn't work with > > SCSI for a long time precisely because that support was missing. > > And by "hibernation", you mean what? You mean "snapshot + shutdown", > right? > > Think about it for five seconds, and then ask yourself: at which point in > the "snapshot + shutdown" sequence would you actually tell a disk to shut > down? > > If you said "snapshot", then you'd be *wrong*. No, I didn't. I agree with you that they should be separate and distinct. I'm just pointing out that you're overstretching your argument a little. There are some similiarities in that in both cases we want to get the driver into some quiet state and out of it again. The difference comes from the fact that the quite states don't have to be the same and shouldn't be the same. I won't insult your intelligence by describing the differences in more detail. > That's my _point_. The snapshot() function should not (and MUST NOT) tell > disks to shut down, because unlike suspend(), we're still going to _use_ > those disks afterwards (why? To write out the snapshot image!). > > In other words, the act of creating a snapshot has *nothing* to do with > suspend. Absolutely. It's about getting the data we need to restore it to the same state post-hibernation-cycle (or, more correctly) post-atomic-restore. > Now, after you've created (and written out) the snapshot, what do you > actually end up doing? > > That's right - you end up _shutting down_ the machine, and yes, as part > of the _shutdown_ sequence you may actually end up doing a lot of the > things that a suspend would do. But that's long *after* you've actually > done the "snapshot" part, and has absolutely nothing to do with it. > > That's where I started: whole "suspend to disk" thing actually has _more_ > to do with "shutdown" than with "suspend". That's where I think you're overstretching the argument. Like suspend (to ram), we're concerned at the snapshot point with getting the hardware in the same state at a later stage. Unlike suspend, we don't necessarily want it to enter a low power-usage state as part of that state preservation. Nigel signature.asc Description: This is a digitally signed message part
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
On Thu, 26 Apr 2007, Nigel Cunningham wrote: > > Sorry. I wasn't clear. I wasn't saying that suspend to ram has a > snapshot point. I was trying to say it has a point where you're seeking > to save information (PCI state / SCSI transaction number or whatever) > that you'll need to get the hardware into the same state at a later > stage. That (saving information) is the point of similarity. Yes, they do both save information, but I'm not actually convinced they would necessarily even save the *same* information. Let's just take an example of USB, and to make things more interesting, say that the disk you want to suspend to is itself over USB (not necessarily something you _want_ to do, but I think we can all agree that it's something that should potentially work, no?) Now, USB devices actually have per-connection state (at a minimum, the "toggle" bit or whatever), and that's obviously something that will inevitably *change* as a result of the device being used after snapshotting (and even if not used, by the rediscovery by the first kernel to boot), and we fundamentally cannot put the final toggle state in the snapshot. So in the snapshot-to-disk scenario, there are some pieces of data that simply fundamentally *cannot* be snapshotted, because they are not controller state, they are "connection" state. So in that case, you basically know that you *have* to rebuild the connection when you do the "snapshot_resume()" thing. So there's no point in even keeping these kinds of connection states (the same is true of keyboards, mice, anything else - it's how USB works). In contrast, in suspend-to-RAM, USB connections might just be things you actually want to keep open and active, and you *can* do so, in ways you simply cannot do with "snapshot to disk". In fact, if you are something like an OLPC and actually go to s2ram very aggressively, you might well want to keep the connection established, because it's conceivable that you might otherwise lose keypresses etc issues) See? There are real *technical* reasons to believe that the two "save state" operations are really fundamentally different. There are reasons to believe that a s2ram can actually happen while keeping some connections open that cannot be kept open over a disk snapshot. Do they *have* to be different? Of course not. For many devices the "save" and "freeze" operations will likely all be no-ops, and there would be absolutely no difference between suspending and snapshotting, because the driver state already natively contains all the information needed to get the device going again. Equally, I don't doubt that in many drivers you'll have very similar "save state" logic, but in fact I believe that in many cases that "save state" logic will often just be a simple pci_save_state(dev); call, so it's literally the case that they will not be just shared between the "suspend" and "snapshot" case, they'll be shared across all simple PCI devices too! But that doesn't mean that the functions to do so should be the same. You might have static int mypcidevice_suspend(struct pci_dev *dev) { pci_save_state(dev); pci_set_power_state(dev, PCI_D3); return 0; } static int mupcidevice_snapshot(struct pci_dev *dev) { pci_save_state(dev); return 0; } and who cares if they both have that same call to a shared "save state" function? They're still totally different operations, and the fact that *some* devices may save the same things doesn't make them any more similar! See above why some devices might save totally *different* things for a "snapshot" vs a "suspend" event. > I suppose that's another point of similarity - for snapshotting, the > same ordering is probably needed? I agree that you're likely to walk the device list in the same order. The whole "shut down leaf devices first", "start up root devices first" is pretty fundamental. But that's true of reboot and device discovery too. Should that ordering mean that we should use the "discovery()" function and pass it a flag and say "you shouldn't discover, you should snapshot or suspend now"? No. Everybody agrees that device discovery is something different from device suspend. The fact that it's done in a topological order and thus they bear some kind of inverse relationship to each other doesn't make them "the same". > > And yes, the _individual_ "save-and-suspend" events obviously needs to be > > "atomic", but it's purely about that particular individual device, so > > there's never any cross-device issues about that. > > No interdependencies? I'm not sure. Well, we pretty much count on it, since we will *suspend* the devices at the same time. So if they had interdependencies that aren't described by the ordering we enforce, they are pretty much screwed anyway ;) So yes, the device list needs to be topologically sort
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
On Wed, 25 Apr 2007, H. Peter Anvin wrote: > > That was the 1990s. On a brand new server system: > > 00:08.0 System peripheral: Intel Corporation 5000 Series Chipset DMA > Engine (rev b1) > > For better or worse, slave DMA seems to be making a comeback of sorts. > Not to mention all kinds of embedded crap^Whardware with optimized DMA > engines which look nothing like PCI at all. Well, the solution to that tends to be to just leave them be, and hold them on until the very end - and just ignore them (and just make-believe that it's actually the device itself that does the DMA transfer). The PCI spec for controlling DMA is really pretty nasty. You can disable it in the PCI config word, of course, but that usually just messes up the device entirely. So in practice, the way to shut up DMA (regardless of whether it's an internal DMA engine or an external one) is that you just tell the device not to listen any more (for example, for a network controller, the way to make sure it doesn't do DMA is just to make sure that you're not sending any frames, but also that it's not listening to any either)! So whether it's internal to the device, or some "system DMA controller", the sequence for shutting down DMA always ends up being the same: - make sure the host itself doesn't generate any new traffic (eg shut down the send-queue). This is generally a higher-level thing anyway, ie not really a driver decision. - the driver needs to tell the hardware to stop listening (ie "stop scanning the command mailboxes" or "stop walking USB command structures" or "stop receiving data") - the driver then needs to wait for the controller to say "ok, I'm idle". because regardless of whether it's the system DMA controller or some on-chip DMA controller, you generally can *not* just say "stop transferring DMA data", because that will generally just lock the chip up or cause other major unhappiness. So I don't think an external DMA controller (like the i8237, ugh!) really _changes_ anything. Except for just the horrible pain of serializing access to them for programming etc horrible resource handling issues, of course (but that's not specific to suspend/resume). Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
Linus Torvalds wrote: > > On Thu, 26 Apr 2007, Pavel Machek wrote: >> Ok, I guess I'll have nightmares of DMA controllers doing DMAs from >> chips that are no longer there tonight. > > Umm. Welcome to the 21st century: we don't do that "separate DMA > controller" thing any more. All devices do their own DMA. > That was the 1990s. On a brand new server system: 00:08.0 System peripheral: Intel Corporation 5000 Series Chipset DMA Engine (rev b1) For better or worse, slave DMA seems to be making a comeback of sorts. Not to mention all kinds of embedded crap^Whardware with optimized DMA engines which look nothing like PCI at all. -hpa - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
On Thu, 26 Apr 2007, Nigel Cunningham wrote: > > That's where I think you're overstretching the argument. Like suspend >(to ram), we're concerned at the snapshot point with getting the hardware >in the same state at a later stage. Really, no. "suspend to ram" doesn't _have_ a "snapshot point". I've tried to explain this multiple times, I don't know why it's not apparently sinking in. This is much more fundamental than the fact that you don't want to stop disks for snapshotting, although it really boils down to all the same issues: the operations are simply not at all the same! I agree 100% that "snapshot to disk" is a "snapshot event". You have to create a single point in time when everything is stable. And I'd much rather call it "snapshot to disk" than "suspend to disk" to make it clear that it's something _totally_ different from "suspend". Because the thing is, "suspend to ram" is *not* a snapshot event. At no point do you actually need to "snapshot" the system at all. You can just gradually shut more and more things down, and equally gradually bring them back up. There simply is *never* any "snapshot" time from a device standpoint, because you can just shut down devices in the right order AND YOU ARE DONE. Really. [ Obviously s2ram does have one "magic moment", namely the time when the CPU does the magic read from the northbridge that actually turns off power for the CPU. But that's really a total non-event from a device standpoint, so while it's undoubtedly a very interesting moment in the suspend sequence, it's not really relevant in any way for device drivers in general. Not at all like the "snapshot moment" that requires the whole system to be totally quiescent in a "snapshot to disk"! ] And the reason s2ram doesn't have a that "snapshot" moment is exactly that the RAM contents are just always there, so there's no need to have a "synchronization event" when ram and devices match. The RAM will *always* match whatever any particular device has done to it, and the proper way to handle things is to just do a simple per-device "save-and-suspend" event. And yes, the _individual_ "save-and-suspend" events obviously needs to be "atomic", but it's purely about that particular individual device, so there's never any cross-device issues about that. For example, if you're a USB hub controller, which is just about the most complex issue you can have, you obviously want to "save the state" with the controller in a STOPPED state, but that should just go without saying: if the controller isn't stopped, you simply *cannot* save the state, since the state is changing under you. The difference is, that the USB driver needs to just "stop, save, and suspend" as one simple operation for s2ram. In contrast, when doing snapshot to disk, it cannot do that, because while it does want to do the "stop" part, it needs to do so _separately_ from the "save" part because you need to stop everything else *too* before you "save" anythng at all. Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
On Wed, 2007-04-25 at 21:25 +0200, Adrian Bunk wrote: > On Wed, Apr 25, 2007 at 11:50:45AM -0700, Linus Torvalds wrote: > > > > > > On Wed, 25 Apr 2007, Adrian Bunk wrote: > > > > > > 3W for the complete system? In CPU state S1? [1] > > > > In STR, 3W is quite realistic. The CPU is off, all (or most - up to you) > > the devices are off, but the motherboard and memory is powered. > > As far as I understand it, the CPU isn't off in S1. > > > > And even 3W would still be a waste of energy. It is, especially if you're living in a place where power infrastructure is unreliable (such as where I live). Currently, because of the summer heat, power demand exceeds power supply so we experience practically daily rotating 4-hour power interruption. That 3W saved multiplied by the total number of computers is a lot. In this perspective, S2D (or shutdown) is preferred over S2RAM. Tony - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
On Thu, 26 Apr 2007, Alan Cox wrote: > > You bet there is. We need to know if data arrived or not, because there > is no guarantee that the data retrieved if we inadvertently re-execute a > command will be the same. The hardware state itself isn't the problem, > its the combination of hardware state and internal state which need to > match in some cases. ... which is why "suspend()" suspends the hardware. Is that so hard to understand? Once the hardware is suspended, it's not doing anything. But STR doesn't have any need for atomicity guarantees _between_devices_. That's a really *fundamental* difference. The reason s2ram is *so* different from snapshot-to-disk is exactly the fact that s2ram can (and does) work on one device at a time. In contrast, snapshot-to-disk needs to snapshot all the devices *together*, since it has a separate disk image. See? Two *totally* different cases. They have *nothing* in common. Not the call sequence, not the logic, not *anything*. Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
Sort of my 2-many-cents story on why I need "snapshot/restore"... Am Wed, 25 Apr 2007 13:08:09 -0700 (PDT) schrieb Linus Torvalds <[EMAIL PROTECTED]>: > > > On Wed, 25 Apr 2007, Kenneth Crudup wrote: > > > > Any working suspend-to-disk method takes care of that for me. (I'm > > really not sure why Linus hates S2D so much, though. Back in the day > > there was a lot more BIOS support, but that's been years now.) > > The really sad part is that APM actually did this better.. This really triggers a nerve in me. My laptops (always used models from some years ago, even) didn't necessarily get easier with respect to power management (suspend) over time. My first laptop (Siemens Scenic Mobile 710, 200Mhz Pentium, maxed to 192MB RAM) worked just fine with APM, be it s2ram or s2disk. Everything handled by the BIOS. Admittedly, S2disk was quite slow as it stored all ram and didn't write to the disk as fast as possible, but it worked. S2ram was also a viable option because I was even able to easily swap batteries because the thing had two bays to put batteries in. The next one was a Toshiba Portege 7020 CT (366MHz Pentium2 with dynamic clock, 192MB), supporting both APM and ACPI. Installing Linux was not that easy, I think I remember that APM in kernel froze the box (early 2.6 kernel), while ACPI needed some headache to set up (compiling a fixed DSDT into the kernel, for example)... I needed experimental toshiba_acpi to get functions and the acpi_pm_timer to get something like continuous system clock (special cpu throttling has funny effects). Well, I got it together after some time. Used suspend2 for "snapshot/restore" and actually was able to use ACPI S3 with the glitch of having to unload/load psmouse driver ... until I realized that it only resumed in about 80% of cases (BIOS ). So suspend2 was a badly needed "hack" around the hardware/BIOS to get some sane workflow. I remember dealing with swsusp / pmdisk before... but I really ended up with suspend2 as the thing that works (and I wouldn't have bothered finding this patch if the in-kernel stuff worked for me). Of course this was a long time ago and recently I have seen that in-kernel swsusp works ok, just this unresponsiveness after "restore" due to missing page cache... Now I have an IBM ThinkPad X31 (600-1.4GHz Pentium M, 512MB). ACPI. SpeedStep. The machine generally works fine, hardware config via ACPI seems to be fine. But doing S3/STR? Well... this machine has the odd idea that turning the system off but the screen backlight back on after a second is a good idea. Of course just now S3 worked fine... you cannot even depend on the malfunction -- could have something to do with changing bootup video from LCD to VGA output for some other reason recently. Hm. Perhaps it even may work (after tricking the BIOS!?). But I doubt I'll suddenly develop trust in that. I _had_ trust in APM STR and STD. I am quite confident in suspend2 being able to correctly resume (restore) after a successful suspend (snapshot/restore). And then, STR doesn't help me on the road when I need to exchange the battery (I'd need this special extra battery to put under the ThinkPad for that). Another thing is that the old Siemens has a nice auxilliary monochrome LCD that shows the charge status of the batteries in 5 levels, so you have some means to predict the time you have in STR. The Thinkpad has greed LED for "battery level OK" and red for "battery level low". Well, but the Linux kernel won't change that... Perhaps at some time ACPI implementations in BIOS get to something reliable (hm, should I get a PowerBook instead?) and can be a good partner for Linux which struggles for many years now to get into the post-APM era. Remember reading desktop PC test reports in the c't magazine in the last years, S3 usually did _not_ work; with Windows, even. Well, there must be a reason Microsoft chose to implement the "hibernate" (it _is_ in software, right?). The APM->ACPI transition made me use the software STD (snapshot/restore...;-) and I think I will stay with it for the forseeable future, and be it because I can do fancy things like image encryption. ACPI S3 / STR is a nice addition when it works, for the smaller pauses (changing a train at the station, leaving office for half an hour...), but I consider STD really to be the more important feature that enables me to _never_ close my applications unless I want to do a kernel update. I really must say that some sort of STD is a total must for a laptop for me. On the other hand I once had a Psion 5MX, which basically was on STR all the (non-working) time -- and enabled well over 20h of working time on two AAs. When laptops enter that range of battery life, I guess I could arrange with just doing STR and won't have to worry about changing batteries without AC connection;-) Alrighty then, Thomas. signature.asc Description: PGP signature
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
On Thu, 26 Apr 2007, Pavel Machek wrote: > > Ok, I guess I'll have nightmares of DMA controllers doing DMAs from > chips that are no longer there tonight. Umm. Welcome to the 21st century: we don't do that "separate DMA controller" thing any more. All devices do their own DMA. > Only the fact that we are currently using same device call during > snapshot() and during restore(). We obviously could do _5_ device > calls > > (suspend/resume/freeze/quiesce_disable_dma/thaw) > > ...but that looks like too many calls to me. I'd much rather have five or even more functions that each do *one* obvious thing. Think like a device driver writer: would you prefer to just implement five functions that do something very specific that you know trivially how to do ("I know how to disable interrupts and DMA") or would you want to do some high-level opertion that you don't even know why the caller asks you to suspend? What does "suspend()" even mean when the caller is just going to wake up up immediately again? Is it performance-critical? Should I tear down all my DMA's? I dunno! In other words, splitting things up actually makes things simpler. That's *doubly* true if you can then give each specific function some really clear goals. Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
On Wed, 25 Apr 2007, Linus Torvalds wrote: > > The *thaw* needs to happen with devices quiescent. Btw, I sure as hell hope you didn't use "suspend()" for that. You're (again) much better off having a totally separate function that just freezes stuff. So in the "snapshot+shutdown" path, you should have: - prepare_to_snapshot() - allocate memory, and possibly return errors We can skip this, if we just make the rule be that any devices that want to support snapshotting must always have the memory required for snapshotting pre-allocated. Most devices really do allocate memory for their state anyway, and the only real reason for the "prepare" stage here is becasue the final snapshot has to happen with interrupts off, obviously. So *if* we don't need to allocate any memory, and if we don't expect to want to accept some early error case, this is likely useless. - snapshot() - actually save device state that is consistent with the memory image at the time. Called with interrupts off, but the device has to be usable both before and afterwards! And I would seriously suggest that "snapshot()" be documented to not rely on any DMA memory, exactly because the device has to be accessible both before and after (before - because we're running and allocating memory, and after - because we'll be writing thigns out). But see later: For the "resume snapshot" path, I would suggest having - freeze(): quiesce the device. This literally just does the absolute minimum to make sure that the device doesn't do anything surprising (no interrupts, no DMA, no nothing). For many devices, it's a no-op, even if they can do DMA (eg most disk controllers will do DMA, but only as an actual result of a request, and upper layers will be quiescent anyway, so they do *not* need to disable DMA) NOTE! The "freeze()" gets called from the *old* kernel just _before_ a snapshot unpacking!! - restart_snapshot() - actually restart the snapshot (and usually this would involve re-setting the device, not so much trying to restore all the saved state. IOW, it's easier to just re-initialize the DMA command queues than to try to make them "atomic" in the snapshot). NOTE! This gets called by the *new* kernel _after_ the snapshot resume! And if you *want* to, I can see that you might want to actually do a "unfreeze()" thing too, and make the actual shapshotting be: /* We may not even need this.. */ for_each_device() { err = prepare_to_snapshot(); if (err) return err; } /* This is the real work for snapshotting */ cli(); for_each_device() freeze(dev); for_each_device() snapshot(dev); .. snapshot current memory image .. for_each_device_depth_first() unfreeze(dev); sti(); and maybe it's worth it, but I would almost suggest that you just make the rule be that any DMA etc just *has* to be re-initialized by "restart_snapshot()", in which case it's not even necessary to freeze/unfreeze over the device, and "snapshot()" itself only needs to make sure any non-DMA data is safe. But adding the freeze/unfreeze (which is a no-op for most hardware anyway) might make things easier to think about, so I would certainly not *object* to it, even if I suspect it's not necessary. Anyway, the restore_snapshot() sequence should be: /* Old kernel.. Normal boot, load snapshot image */ cli() for_each_device() freeze(dev); restore_snapshot_image(); restore_regs_and_jump_to_image(); /* noreturn */ /* New kernel, gets called at the snapshot restore address * with interrupts off and devices frozen, and memory image * constsntent with what it was at "snapshot()" time */ for_each_dev_depth_first() restore_snapshot(dev); /* And if you want to, just to be "symmetric" for_each_dev_depth_first() unfreeze(dev) although I think you could just make "restore_snapshot()" implicitly unfreeze it too.. */ sti(); /* We're up */ and notice how *different* this is from what happens for s2ram. There really isn't anything in common here. Exactly because s2ram simply doesn't _have_ any of the issues with atomic memory images. So s2ram is just for_each_dev() suspend(dev); cli(); for_each_dev() late_suspend(dev); .. go to sleep .. for_each_dev_depth_first() early_resume(dev); sti(); for_each_dev_depth_first() resume(dev); and has none of the "freeze" issues at all. Doesn't that seem a lot more straightforward? Yes, it's more functions, but each function is a lot more obvio
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
On Thu, 26 Apr 2007, Pavel Machek wrote: Now, if the old kernel left DMAs running, it could be overwriting the data we are copying in. The *thaw* needs to happen with devices quiescent. But that sure doesn't have anythign to do with the "snapshot()" path. In fact, you'll have rebooted the machine in between. Only the fact that we are currently using same device call during snapshot() and during restore(). We obviously could do _5_ device calls (suspend/resume/freeze/quiesce_disable_dma/thaw) ...but that looks like too many calls to me. So what does that have to do with "snapshotting"? I'm not comfortable with memory I'm copying changing under my hands because of some DMA. It just looks like asking for trouble. I _think_ we can get away with DMA running during snapshot, because driver may not assume anything about the DMA result before it got completion interrupt, but... the key is that with STR you don't need to copy the memory (it's staying where it is) for STD you need to copy the memory, and there you halt DMA becouse you need to make an atomic snapshot. David Lang - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
> STR does not need to "ensure that you have a consistent snapshot". Linus I think someone's been spiking your guinness again... > Why? Becuase there is no _room_ for inconsistency. There's nothing to be > "inconsistent with", since any changes to memory (by things like DMA or > other setup that happens while the suspend process is going on) is by > _definition_ consistent with the resume image (becasue there is no > separate image). You bet there is. We need to know if data arrived or not, because there is no guarantee that the data retrieved if we inadvertently re-execute a command will be the same. The hardware state itself isn't the problem, its the combination of hardware state and internal state which need to match in some cases. > off DMA and try to make the hardware be wevy wevy quiet while it's hunting > wabbits, it's a lot easier to just do nothing at all on "freeze", and just > make sure that "thaw" will re-initialze the DMA tables entirely! All Who cares about DMA mapping tables, those are easy to deal with, not even that bad with an IOMMU to restore. More problematic is the users data because if we have a device where re-executing a command is not repeatable (eg O_DIRECT SCSI on a shared bus) then we risk being inconsistent in our S2RAM. If we re-run the command on resume having allowed it to prattle on while doing S2anything then we'll get the wrong answer. Now there are lots of devices we don't care about as they don't have state in the form that causes problems - network cards, TV capture etc, but there are cases where it matters that every operation is either finished or not started and there is no doubt about them getting done during the S2RAM/S2DISK S2DISK/S2RAM both need to synchronize the state of a device so it can build a valid snapshot. That bit is a shared requirement just like you said didn't exist. Doesn't even need to involve turning DMA off, just getting a consistent state. The rest can be quite different. Mind you some laptops think S2RAM is just a transition state on the way to disk, leave them in ACPI S2RAM and the firmware will magically turn it into a save to disk and back to ram if the battery runs low or you leave it idle too long. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
Hi! > > > Why? Becuase there is no _room_ for inconsistency. There's nothing to be > > > "inconsistent with", since any changes to memory (by things like DMA or > > > other setup that happens while the suspend process is going on) is by > > > _definition_ consistent with the resume image (becasue there is no > > > separate image). > > > > Do you propose to keep DMAs running while suspending-to-RAM? > > What part of "suspend a chip" do you have trouble with? > > DMA obviously does *not* happen with a suspended device. There's no need > to turn DMA even off - it just doesn't happen! Ok, I guess I'll have nightmares of DMA controllers doing DMAs from chips that are no longer there tonight. > > > For example, the whole myth that "freeze" needs to shut off DMA is a > > > total > > > and utter *myth*. It needs nothing of the sort at all. Rather than shut > > > off DMA and try to make the hardware be wevy wevy quiet while it's > > > hunting > > > wabbits, it's a lot easier to just do nothing at all on "freeze", > > > > No. Sorry, you are wrong here. > > > > Remember that during resume we run > > > > freeze() > > copy old data into memory > > thaw() > > > > Now, if the old kernel left DMAs running, it could be overwriting > > the data we are copying in. > > The *thaw* needs to happen with devices quiescent. > > But that sure doesn't have anythign to do with the "snapshot()" path. In > fact, you'll have rebooted the machine in between. Only the fact that we are currently using same device call during snapshot() and during restore(). We obviously could do _5_ device calls (suspend/resume/freeze/quiesce_disable_dma/thaw) ...but that looks like too many calls to me. > So what does that have to do with "snapshotting"? I'm not comfortable with memory I'm copying changing under my hands because of some DMA. It just looks like asking for trouble. I _think_ we can get away with DMA running during snapshot, because driver may not assume anything about the DMA result before it got completion interrupt, but... Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
On Thu, 26 Apr 2007, Pavel Machek wrote: > > > > Why? Becuase there is no _room_ for inconsistency. There's nothing to be > > "inconsistent with", since any changes to memory (by things like DMA or > > other setup that happens while the suspend process is going on) is by > > _definition_ consistent with the resume image (becasue there is no > > separate image). > > Do you propose to keep DMAs running while suspending-to-RAM? What part of "suspend a chip" do you have trouble with? DMA obviously does *not* happen with a suspended device. There's no need to turn DMA even off - it just doesn't happen! > > For example, the whole myth that "freeze" needs to shut off DMA is a total > > and utter *myth*. It needs nothing of the sort at all. Rather than shut > > off DMA and try to make the hardware be wevy wevy quiet while it's hunting > > wabbits, it's a lot easier to just do nothing at all on "freeze", > > No. Sorry, you are wrong here. > > Remember that during resume we run > > freeze() > copy old data into memory > thaw() > > Now, if the old kernel left DMAs running, it could be overwriting > the data we are copying in. The *thaw* needs to happen with devices quiescent. But that sure doesn't have anythign to do with the "snapshot()" path. In fact, you'll have rebooted the machine in between. So what does that have to do with "snapshotting"? Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
On Thu, 26 Apr 2007, Pavel Machek wrote: > > > For suspend to ram, in contrast, since you *know* that nobody will be > > touching the hardware, and since the timings are very different anyway > > (you'd hope that you can resume in a second or two), you'd generally want > > to keep the DMA engine tables right where they are, and just literally > > suspend the PCI chip itself. > > I'd actually prefer resume to be similar to module insert, too... Do > you think that resume is _that_ time critical? I think it probably depends on the device, and it should depend on the driver writer how he wants to do it. My _point_ is that there is absolutely zero reason to think that the two events are the same. We *know* that for snapshot+shutdown, we need to actually keep the DMA tables intact *over* the snapshot (because writing out the snapshot may _need_ them). But exactly because we keep them intact, a driver writer may sanely say "I didn't even bother shutting them down, so on thaw, I cannot trust them, so I'll just re-initialize them entirely". In contrast, over suspend-to-ram, it's entirely reasonable to just leave them in memory, and just keep them. There's no *reason* not to. And that's my whole point in this argument: the two paths are fundamentally totally different. You *claim* that "snapshot()" needs to stop DMA etc, but that's simply not true. So I claim: - for a lot of devices, it's actually a *lot* easier to just have snapshot not do anythign at all, and re-initialze on thaw - for those same devices, for s2ram, since the tables are *safe* and don't get touched by anything else, it's probably easier to just let them be. See? The "it's easier to do X" is a _different_ X for the two cases. So the whole "suspend is a superset of freeze" is simply not true. > [I'd like you to drop me a line saying you understand current design > and that it works -- even if it is very inelegant] I _do_ understand the current design. I just think that it's totally and seriously broken. I've ranted against it before. I think it's stupid to play like you're "suspending" something just to save some state, especially since most users probably don't even *want* to suspend the state, and would quite happily re-initialize the chip instead. And I think it's horrible to have a dynamic flag to tell the difference between two or more state changes that the devices should statically be able to determine. _If_ some driver really does have the same routine, just use the same routine. There are no downsides to splitting them up. > Now, we can separate suspend/freeze and resume/thaw (with some common > helpers). It will speed the code up by avoiding unneccessary > operations. It also needs attetion from driver writers (ouch). > > Do we want to do that? I'd personally certainly want to do that. But I want to split up the callers too. Right now we mix those a lot as well. I suspect that would automatically be fixed by just forcing them to separate out (since they now call different functions of the devices), but I'm not 100% sure. There might be other issues. Just as an example: one of the most painful things there is in the suspend sequence is that we shut off the console (because the console device will be suspended in hw, and it's thus not safe to use it over a suspend/resume sequence). That should just go away entirely for "snapshot()", because there is *never* any excuse for actually turning off the console during a snapshot: even a network console should be entirely functional. Things like that - things that sound like small issues, but that really aren't. (Right now you can enable the "don't disable the console" config option, but since network drivers will actually shut down etc, it just means that you'll have oopses etc if you do, and you have netconsole enabled) Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
Hi! > > Both of them have to ensure you can make a consistent snapshot. > > Bzzt. Wrong again. Very much so. > > STR does not need to "ensure that you have a consistent snapshot". > > Why? Becuase there is no _room_ for inconsistency. There's nothing to be > "inconsistent with", since any changes to memory (by things like DMA or > other setup that happens while the suspend process is going on) is by > _definition_ consistent with the resume image (becasue there is no > separate image). Do you propose to keep DMAs running while suspending-to-RAM? That sounds really unsafe; we are shutting down our PCI controllers at that time; doing that while DMAs are running sounds bad. > That's TOTALLY DIFFERENT from "suspend to disk". In suspend to disk, you > need a completely different kind of mindset, namely you need a single > consistent image, where the image is consistent not only with memory, but > with all the devices. > > For example, the whole myth that "freeze" needs to shut off DMA is a total > and utter *myth*. It needs nothing of the sort at all. Rather than shut > off DMA and try to make the hardware be wevy wevy quiet while it's hunting > wabbits, it's a lot easier to just do nothing at all on "freeze", No. Sorry, you are wrong here. Remember that during resume we run freeze() copy old data into memory thaw() . Now, if the old kernel left DMAs running, it could be overwriting the data we are copying in. It is not about DMA tables. While resuming, CPU needs to be alone, without interference from DMA engines (or other CPUs), because copying back old image means writing to memory that was not properly alocated. (Now, we could add one more hook, turn_off_dmas_for_copyback(), but that looks like way too many hooks to me. And I'm not comfortable with DMA engines running while I'm trying to copy image. They may be overwriting data I'm trying to copy...) Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
Hi! > > Current design is: > > Broken. Yes. I've tried to tell you. Ok. ... > It's worse than just confusing, it's *idiotic*. > > It _can_ work in practice, but > - we have pretty damn solid evidence that it doesn't work all that often >in practice > - the fact that something *can* be done the stupid way is in no way an >argument that it *should* be done the stupid way. > > I claim that the current STD is *stupid*. Yes, it can work. But that > doesn't make it less stupid. Good. So you understand how it works. > What's your argument? Your argument seems to be that it's not stupid, > because it can work. Can't you see that that simply isn't an > argument at I tried keeping module_init/thaw/resume similar code, so that driver authors can debug suspend-to-disk, cross their fingers, and have suspend-to-ram work, too. Now, perhaps enough people do std/str these days so this is not important any longer... lets hope so. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
On Wed, Apr 25, 2007 at 11:50:45AM -0700, Linus Torvalds wrote: > .. but if the alternative is a feature that just isn't worth it, and > likely to not only have its own bugs, but cause bugs elsewhere? (And yes, > I believe STD is both of those. There's a reason it's called "STD". Go > to google and type "STD" and press "I'm feeling lucky". Google is God). If it was correctly designed, it would be possible to change the hardware or even the kernel through a STD cycle. And that would be damn interesting on servers. In any case, if I could trust it, I'd use it when I need to move servers around and I don't want to lose what is running. Riding power cuts that way would be nice. OG. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
On Thu, 26 Apr 2007, Pavel Machek wrote: > > Current design is: Broken. Yes. I've tried to tell you. > Twice. Once during snapshot (then we spin it up when the snapshot is > done), and once during shutdown. And nobody can possibly say that is "sane". But it's a direct result of the incorrect thinking that "suspend()" and "snapshot()" have anything what-so-ever to do with each other. > Yep, we optimize away spindown, because it takes too long, so SCSI > disks are actually very bad example. No. SCSI disks are a *good* example. It's an example of how you (incorrectly) call the same function for two totally different things, and then that function is smart enough that it *understands* that they are totally different. But the *confusion* remains. It remains in your head, and you've poisoned people like Alan too, that usually are not confused. And THAT is the main problem (although there are also indirect problems like "fixing one may break the other", but I actually think that the fundamental problem is the confusion it creates, which in turn causes bugs to happen because people are confused and think that they should do the same thing for suspend and for snapshot). > No, I'd like you to understand that we actually CAN tell the disks to > spin down, because we'll call resume and spin them back again before > writing the image. We used to do it. We still can do it, but it is > slow. > > Yes, it is quite confusing. It's worse than just confusing, it's *idiotic*. It _can_ work in practice, but - we have pretty damn solid evidence that it doesn't work all that often in practice - the fact that something *can* be done the stupid way is in no way an argument that it *should* be done the stupid way. I claim that the current STD is *stupid*. Yes, it can work. But that doesn't make it less stupid. What's your argument? Your argument seems to be that it's not stupid, because it can work. Can't you see that that simply isn't an argument at all? "stupid and wrong" doesn't mean "cannot work in theory". But it *does* mean that people get confused, and it *does* mean that there are likely more bugs, because confused people do not tend to write very good code. I'm not claiming that the current code cannot work. It clearly *does* work for a lot of people. But I'm claiming that it's STUPID. So don't argue that "it works". Windows works, kind of. That doesn't make it less stupid and badly designed! Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
Hi! > > > I don't understand how you can even *claim* something like that. > > > > BTW most problems are in thaw/resume functions. > > And do you realize that the thaw/resume functions are totally different > too? > > Or rather, they *would* be, if you allowed them to. > > For example, for "snapshot + thaw", the _sane_ thing is to actually make > the snapshot just throw away all the DMA tables etc, and let the thawing > just do a full initialization (as it did on boot). It basically needs to > do that anyway, and it simplifies the whole thing (ie you don't even > *want* to save things like the DMA command queues etc - the ones that will > quite often be stepped on by the final "write snapshot to disk" stuff > anyway). I'd prefer thaw to be similar to module insert, yes. > For suspend to ram, in contrast, since you *know* that nobody will be > touching the hardware, and since the timings are very different anyway > (you'd hope that you can resume in a second or two), you'd generally want > to keep the DMA engine tables right where they are, and just literally > suspend the PCI chip itself. I'd actually prefer resume to be similar to module insert, too... Do you think that resume is _that_ time critical? > You think they have things in common just because your whole (incorrect) > mindset has _forced_ them to have things in common, becasue your setup > stupidly thinks that "resume" is the same as "thaw", the same way you > think "freeze" is the same as "suspend". > > NEITHER is true. You've _made_ them true in your mind, but there's > absolutely zero reason that they *should* be true. [I'd like you to drop me a line saying you understand current design and that it works -- even if it is very inelegant] Now, we can separate suspend/freeze and resume/thaw (with some common helpers). It will speed the code up by avoiding unneccessary operations. It also needs attetion from driver writers (ouch). Do we want to do that? Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
On Wed, 25 Apr 2007, Alan Cox wrote: > > Both of them have to ensure you can make a consistent snapshot. Bzzt. Wrong again. Very much so. STR does not need to "ensure that you have a consistent snapshot". Why? Becuase there is no _room_ for inconsistency. There's nothing to be "inconsistent with", since any changes to memory (by things like DMA or other setup that happens while the suspend process is going on) is by _definition_ consistent with the resume image (becasue there is no separate image). > Doing that means you've got to be able to define a single "point" at > which the snapshot is made and is internally self-consistent. That in > both cases tends to mean you've got to ensure nothing occurs which pees > on the image while you are making that snapshot (such as outstanding > O_DIRECT I/O to user pages). Get off the drugs, Alan. There *is* no snapshot with suspend-to-ram. Which is the whole point I'm trying to make! A _lot_ of people are confused about this. With suspend-to-ram, you don't need to do a damn thing to the chip, except suspend it and resume it. There are _zero_ consistency issues. There is no need to freeze anything at any point. You can suspend each device totally independently of all other devices (taking into account things like bus topology, of course), and there is no "atomic" snapshot that needs to ever exist. That's TOTALLY DIFFERENT from "suspend to disk". In suspend to disk, you need a completely different kind of mindset, namely you need a single consistent image, where the image is consistent not only with memory, but with all the devices. For example, the whole myth that "freeze" needs to shut off DMA is a total and utter *myth*. It needs nothing of the sort at all. Rather than shut off DMA and try to make the hardware be wevy wevy quiet while it's hunting wabbits, it's a lot easier to just do nothing at all on "freeze", and just make sure that "thaw" will re-initialze the DMA tables entirely! All drivers have code to do that anyway, since that's what you need to do at boot. Notice? Totally different. Absolutely NOTHING in common. Not on a practical plane, and not even conceptually. The current (broken!) implementation has forced a totally idiotic model on things, where instead of snapshotting doing the sane and simple thing, it ends up doing extra work that is totally unnecessary, but *becomes* necessary just because it *also* shares the "resume" path (which should _not_ be the same either!) Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
Hi! > > > And name *one* thing that have in common. > > > > Set/reset the scsi transaction id thingy? Hibernation didn't work with > > SCSI for a long time precisely because that support was missing. > > And by "hibernation", you mean what? You mean "snapshot + shutdown", > right? > > Think about it for five seconds, and then ask yourself: at which point in > the "snapshot + shutdown" sequence would you actually tell a disk to shut > down? Current design is: Twice. Once during snapshot (then we spin it up when the snapshot is done), and once during shutdown. Yep, we optimize away spindown, because it takes too long, so SCSI disks are actually very bad example. > If you said "snapshot", then you'd be *wrong*. > > That's my _point_. The snapshot() function should not (and MUST NOT) tell > disks to shut down, because unlike suspend(), we're still going to _use_ > those disks afterwards (why? To write out the snapshot image!). No, I'd like you to understand that we actually CAN tell the disks to spin down, because we'll call resume and spin them back again before writing the image. We used to do it. We still can do it, but it is slow. Yes, it is quite confusing. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
On Thu, 26 Apr 2007, Pavel Machek wrote: > > > > I don't understand how you can even *claim* something like that. > > BTW most problems are in thaw/resume functions. And do you realize that the thaw/resume functions are totally different too? Or rather, they *would* be, if you allowed them to. For example, for "snapshot + thaw", the _sane_ thing is to actually make the snapshot just throw away all the DMA tables etc, and let the thawing just do a full initialization (as it did on boot). It basically needs to do that anyway, and it simplifies the whole thing (ie you don't even *want* to save things like the DMA command queues etc - the ones that will quite often be stepped on by the final "write snapshot to disk" stuff anyway). For suspend to ram, in contrast, since you *know* that nobody will be touching the hardware, and since the timings are very different anyway (you'd hope that you can resume in a second or two), you'd generally want to keep the DMA engine tables right where they are, and just literally suspend the PCI chip itself. See? Again, *nothing* in common. You think they have things in common just because your whole (incorrect) mindset has _forced_ them to have things in common, becasue your setup stupidly thinks that "resume" is the same as "thaw", the same way you think "freeze" is the same as "suspend". NEITHER is true. You've _made_ them true in your mind, but there's absolutely zero reason that they *should* be true. Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/