Re: Bugs in MMC [was: [Bug 10030] Suspend doesn't work when SD card is inserted]
> > What's the purpose of the card-counting loop in > > host/omap.c:mmc_omap_switch_handler()? It looks like dead code. > > > > I'm not too familiar with that driver, but they've been playing around > with multiplexing several cards into one controller. Might be bits and > pieces of that. > AFAIK, that came after mmc multislot support. Omap2 has one mmc controller for several (2 in practice) mm slots (cards). It's not really a dead code since it's used in n800 and n810, it's in linux-omap archives and git tree. Carlos Aguiar, added in the loop, might have some comments about it. -- Best Regards, Felipe Balbi [EMAIL PROTECTED] -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Bugs in MMC [was: [Bug 10030] Suspend doesn't work when SD card is inserted]
On Mon, 25 Feb 2008, Pierre Ossman wrote: > > What do you think of the patch attached to comment #40 in the Bugzilla > > entry? > > > > Looks ok. As long as those two synchronization points are guaranteed, > then I'm happy. Maybe a better approach would be to leave the workqueue unfreezable, and keep cancel_delayed_work_sync() in mmc_suspend_host(). It would then be necessary to add a test to verify, if there is a card attached, that the card is indeed suspended. After all, it's possible that the cancel_delayed_work_sync() ended up waiting for a job already running on the workqueue to register a new card! (The same would be true even with flush_scheduled_work.) Also, as a bit of defensive programming, it might be a good idea to add a "suspended" flag to the mmc_host structure. If mmc_rescan() sees that the flag is set then it should return immediately. This would protect against host drivers that aren't careful to disable detect IRQs before calling mmc_suspend_host(). Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Bugs in MMC [was: [Bug 10030] Suspend doesn't work when SD card is inserted]
On Mon, 25 Feb 2008 12:58:30 -0500 (EST) Alan Stern <[EMAIL PROTECTED]> wrote: > Thanks for the explanations. > > On Mon, 25 Feb 2008, Pierre Ossman wrote: > > > Trying to keep up with the PM changes is a full time job. For now I've > > mostly ignored it as I don't even have time for the other stuff. > > Patches welcome. > > What do you think of the patch attached to comment #40 in the Bugzilla > entry? > Looks ok. As long as those two synchronization points are guaranteed, then I'm happy. Rgds -- -- Pierre Ossman Linux kernel, MMC maintainerhttp://www.kernel.org PulseAudio, core developer http://pulseaudio.org rdesktop, core developer http://www.rdesktop.org -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Bugs in MMC [was: [Bug 10030] Suspend doesn't work when SD card is inserted]
Thanks for the explanations. On Mon, 25 Feb 2008, Pierre Ossman wrote: > Trying to keep up with the PM changes is a full time job. For now I've > mostly ignored it as I don't even have time for the other stuff. > Patches welcome. What do you think of the patch attached to comment #40 in the Bugzilla entry? Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Bugs in MMC [was: [Bug 10030] Suspend doesn't work when SD card is inserted]
On Sun, 24 Feb 2008 10:33:34 -0500 (EST) Alan Stern <[EMAIL PROTECTED]> wrote: > > Well, the patch itself isn't really adequate; it was just a first pass > intended to illustrate the nature of the problem. > > The problems in the MMC subsystem go deeper. > > The first is the way it uses flush_workqueue(). This is almost always > a bad function to call, because it introduces unnecessary dependencies. > cancel_delayed_work_sync() is much better. > > But even changing that won't solve the second issue, which is a genuine > bug. There is a race between detect events and suspend events. The > mmc_suspend_host() routine starts out by flushing the kmmcd workqueue > before calling the host's suspend routine. So what happens if another > detect event occurs in between? > The idea is that host drivers shouldn't do that. Once they've called mmc_suspend_host(), then they shouldn't be poking the MMC core in any other way. None of this is of course properly documented. :/ > This whole area of synchronization between card insertion, card > removal, suspend, and resume needs to be thought out better. > Especially keeping in mind that device registration or unregistration > during a system sleep is likely to block until the sleep is over. Trying to keep up with the PM changes is a full time job. For now I've mostly ignored it as I don't even have time for the other stuff. Patches welcome. > > Lastly, there are some other questions about confusing aspects of the > subsystem. What's the story with __mmc_claim_host()? Is it really > nothing more than an abortable mutex_lock()? Why does it ever need to > be aborted? Why is its second argument an atomic_t instead of a normal > int? > It got that way when SDIO got in. It was needed to make it abortable to solve a rather nasty deadlock situation. I don't remember the details right now, but it should be in the LKML archives. > Why are mmc_detect() and related routines described in comments as > "Card detection callback from host"? They don't handle card > _detection_ -- they handle card _removal_. They handle both. > > What's the purpose of the card-counting loop in > host/omap.c:mmc_omap_switch_handler()? It looks like dead code. > I'm not too familiar with that driver, but they've been playing around with multiplexing several cards into one controller. Might be bits and pieces of that. Rgds -- -- Pierre Ossman Linux kernel, MMC maintainerhttp://www.kernel.org PulseAudio, core developer http://pulseaudio.org rdesktop, core developer http://www.rdesktop.org -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: using long instead of atomic_t when only set/read is required (was Re: [Bug 10030] Suspend doesn't work when SD card is inserted)
On Mon, 25 Feb 2008, Pavel Machek wrote: > Hi! > > Alan thinks that `subj` is correct... More precisely, reads and writes of pointers are always atomic. That is, if a write and a read occur concurrently, it is guaranteed that the read will obtain either the old or the new value of the pointer, never a mish-mash of the two. If this were not so then RCU wouldn't work. > I guess it only works as long as longs are aligned? Should it be > written down to atomic_ops.txt? Forget it, I'm going to withdraw the entire thing. The whole approach is wrong. More details in a new email thread, to follow soon. Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bug 10030] Suspend doesn't work when SD card is inserted
On Monday, 25 of February 2008, Zdenek Kabelac wrote: > Hi Hi, [CCs restored, added CC to Dave] > I'm finally going to test some kernel - because I'd been trying it > against the HEAD - but unfortunately it looks like there is something > seriously broken with -rc3 and sata merge - anyway - I'm going to make > a test with this head commit 85b80ebfa4384b8ea30cc1af9617db30319a9ccd > which should be the last one before merge of SATA. > > So I'm going to check this tree with you patch: > pm-remove-locking-from-core.patch If that's the one from http://marc.info/?l=linux-acpi=120389632114090=2 , please do it. > --- > drivers/base/core.c |8 --- > drivers/base/power/main.c | 97 > +++--- > drivers/usb/core/usb.c|2 > 3 files changed, 8 insertions(+), 99 deletions(-) > > Do you wan to test also the other patch ? Yes. Please also test the patch that Alan asked you to test here: http://lkml.org/lkml/2008/2/23/402 It's experimantal, but it is important to us to know if you see the symptoms (and which ones if you do) with this patch applied. Thanks, Rafael -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bug 10030] Suspend doesn't work when SD card is inserted
On Monday, 25 of February 2008, Alan Stern wrote: > On Sun, 24 Feb 2008, Rafael J. Wysocki wrote: > > > > Very subtly wrong ;-). > > > > > > imagine suspending_task == 0xabcdef01. Now task "R" with current == > > > 0xabcd reads suspending_task while the other cpu is writing to it, > > > and sees 0xabcd (0xef01 was not yet written) -- and mistakenly > > > believes that "R" == suspending_task. > > > > > > I agree it is very unlikely, and it will not happen on i386. But what > > > about just using atomic_t suspending_task, and store current->pid into > > > it? > > > > I'd rather use a lock, frankly. For example, we can require the readers to > > take pm_sleep_rwsem for reading in order to access that. > > That certainly won't work. Imagine what would happen when the reader > _was_ the suspending task. Yeah, I've already realized it was a stupid idea. :-) > But if you really twist my arm, I'll go along with Pavel's suggestion. So can you do that, pretty please? Thanks, Rafael -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bug 10030] Suspend doesn't work when SD card is inserted
On Monday, 25 of February 2008, Alan Stern wrote: > On Sun, 24 Feb 2008, Pavel Machek wrote: > > > > > At the very least, you'd need rmb() before reading it and wmb() after > > > > writing to it, but I'm not sure if that's enough on every obscure > > > > architecture out there. > > > > > > No, neither one is needed because of the way suspending_task is used. > > > > > > It's not necessary for a reader R to see the variable's actual value; > > > all R needs to know is whether or not suspending_task is equal to R. > > > Since the only process which can set suspending_task to R is R itself, > > > and since R will set suspending_task back to NULL before releasing the > > > write lock on pm_sleep_rwsem, there's never any ambiguity. > > > > Subtle. > > > > Very subtly wrong ;-). > > > > imagine suspending_task == 0xabcdef01. Now task "R" with current == > > 0xabcd reads suspending_task while the other cpu is writing to it, > > and sees 0xabcd (0xef01 was not yet written) -- and mistakenly > > believes that "R" == suspending_task. > > I always thought that reads and writes of pointers are atomic, just > like reads and writes of longs. Is that wrong? That depends on the architecture. It may be wrong on alpha, IIRC. > Now if pointers were the same size as long long then I would agree with > you. That certainly is true on x86-64. On alpha probably too. > > I agree it is very unlikely, and it will not happen on i386. But what > > about just using atomic_t suspending_task, and store current->pid into > > it? > > That would work just as well. Except that it wouldn't need to be > atomic_t, because current->pid is always an integer (not guaranteed, I > suppose, but that's what it is on all current architectures) and > reads/writes of ints _are_ atomic. That also depends on the arch, I'm afraid, and in general if we assume something to be atomic, it's better to make the code reflect that assumption. Thanks, Rafael -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
using long instead of atomic_t when only set/read is required (was Re: [Bug 10030] Suspend doesn't work when SD card is inserted)
Hi! Alan thinks that `subj` is correct... > > > > At the very least, you'd need rmb() before reading it and wmb() after > > > > writing to it, but I'm not sure if that's enough on every obscure > > > > architecture out there. > > > > > > No, neither one is needed because of the way suspending_task is used. > > > > > > It's not necessary for a reader R to see the variable's actual value; > > > all R needs to know is whether or not suspending_task is equal to R. > > > Since the only process which can set suspending_task to R is R itself, > > > and since R will set suspending_task back to NULL before releasing the > > > write lock on pm_sleep_rwsem, there's never any ambiguity. > > > > Subtle. > > > > Very subtly wrong ;-). > > > > imagine suspending_task == 0xabcdef01. Now task "R" with current == > > 0xabcd reads suspending_task while the other cpu is writing to it, > > and sees 0xabcd (0xef01 was not yet written) -- and mistakenly > > believes that "R" == suspending_task. > > I always thought that reads and writes of pointers are atomic, just > like reads and writes of longs. Is that wrong? ...but I'm not that sure. Can someone clarify? I guess it only works as long as longs are aligned? Should it be written down to atomic_ops.txt? Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bug 10030] Suspend doesn't work when SD card is inserted
On Monday, 25 of February 2008, Alan Stern wrote: On Sun, 24 Feb 2008, Pavel Machek wrote: At the very least, you'd need rmb() before reading it and wmb() after writing to it, but I'm not sure if that's enough on every obscure architecture out there. No, neither one is needed because of the way suspending_task is used. It's not necessary for a reader R to see the variable's actual value; all R needs to know is whether or not suspending_task is equal to R. Since the only process which can set suspending_task to R is R itself, and since R will set suspending_task back to NULL before releasing the write lock on pm_sleep_rwsem, there's never any ambiguity. Subtle. Very subtly wrong ;-). imagine suspending_task == 0xabcdef01. Now task R with current == 0xabcd reads suspending_task while the other cpu is writing to it, and sees 0xabcd (0xef01 was not yet written) -- and mistakenly believes that R == suspending_task. I always thought that reads and writes of pointers are atomic, just like reads and writes of longs. Is that wrong? That depends on the architecture. It may be wrong on alpha, IIRC. Now if pointers were the same size as long long then I would agree with you. That certainly is true on x86-64. On alpha probably too. I agree it is very unlikely, and it will not happen on i386. But what about just using atomic_t suspending_task, and store current-pid into it? That would work just as well. Except that it wouldn't need to be atomic_t, because current-pid is always an integer (not guaranteed, I suppose, but that's what it is on all current architectures) and reads/writes of ints _are_ atomic. That also depends on the arch, I'm afraid, and in general if we assume something to be atomic, it's better to make the code reflect that assumption. Thanks, Rafael -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bug 10030] Suspend doesn't work when SD card is inserted
On Monday, 25 of February 2008, Alan Stern wrote: On Sun, 24 Feb 2008, Rafael J. Wysocki wrote: Very subtly wrong ;-). imagine suspending_task == 0xabcdef01. Now task R with current == 0xabcd reads suspending_task while the other cpu is writing to it, and sees 0xabcd (0xef01 was not yet written) -- and mistakenly believes that R == suspending_task. I agree it is very unlikely, and it will not happen on i386. But what about just using atomic_t suspending_task, and store current-pid into it? I'd rather use a lock, frankly. For example, we can require the readers to take pm_sleep_rwsem for reading in order to access that. That certainly won't work. Imagine what would happen when the reader _was_ the suspending task. Yeah, I've already realized it was a stupid idea. :-) But if you really twist my arm, I'll go along with Pavel's suggestion. So can you do that, pretty please? Thanks, Rafael -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
using long instead of atomic_t when only set/read is required (was Re: [Bug 10030] Suspend doesn't work when SD card is inserted)
Hi! Alan thinks that `subj` is correct... At the very least, you'd need rmb() before reading it and wmb() after writing to it, but I'm not sure if that's enough on every obscure architecture out there. No, neither one is needed because of the way suspending_task is used. It's not necessary for a reader R to see the variable's actual value; all R needs to know is whether or not suspending_task is equal to R. Since the only process which can set suspending_task to R is R itself, and since R will set suspending_task back to NULL before releasing the write lock on pm_sleep_rwsem, there's never any ambiguity. Subtle. Very subtly wrong ;-). imagine suspending_task == 0xabcdef01. Now task R with current == 0xabcd reads suspending_task while the other cpu is writing to it, and sees 0xabcd (0xef01 was not yet written) -- and mistakenly believes that R == suspending_task. I always thought that reads and writes of pointers are atomic, just like reads and writes of longs. Is that wrong? ...but I'm not that sure. Can someone clarify? I guess it only works as long as longs are aligned? Should it be written down to atomic_ops.txt? Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bug 10030] Suspend doesn't work when SD card is inserted
On Monday, 25 of February 2008, Zdenek Kabelac wrote: Hi Hi, [CCs restored, added CC to Dave] I'm finally going to test some kernel - because I'd been trying it against the HEAD - but unfortunately it looks like there is something seriously broken with -rc3 and sata merge - anyway - I'm going to make a test with this head commit 85b80ebfa4384b8ea30cc1af9617db30319a9ccd which should be the last one before merge of SATA. So I'm going to check this tree with you patch: pm-remove-locking-from-core.patch If that's the one from http://marc.info/?l=linux-acpim=120389632114090w=2 , please do it. --- drivers/base/core.c |8 --- drivers/base/power/main.c | 97 +++--- drivers/usb/core/usb.c|2 3 files changed, 8 insertions(+), 99 deletions(-) Do you wan to test also the other patch ? Yes. Please also test the patch that Alan asked you to test here: http://lkml.org/lkml/2008/2/23/402 It's experimantal, but it is important to us to know if you see the symptoms (and which ones if you do) with this patch applied. Thanks, Rafael -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: using long instead of atomic_t when only set/read is required (was Re: [Bug 10030] Suspend doesn't work when SD card is inserted)
On Mon, 25 Feb 2008, Pavel Machek wrote: Hi! Alan thinks that `subj` is correct... More precisely, reads and writes of pointers are always atomic. That is, if a write and a read occur concurrently, it is guaranteed that the read will obtain either the old or the new value of the pointer, never a mish-mash of the two. If this were not so then RCU wouldn't work. I guess it only works as long as longs are aligned? Should it be written down to atomic_ops.txt? Forget it, I'm going to withdraw the entire thing. The whole approach is wrong. More details in a new email thread, to follow soon. Alan Stern -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Bugs in MMC [was: [Bug 10030] Suspend doesn't work when SD card is inserted]
On Sun, 24 Feb 2008 10:33:34 -0500 (EST) Alan Stern [EMAIL PROTECTED] wrote: Well, the patch itself isn't really adequate; it was just a first pass intended to illustrate the nature of the problem. The problems in the MMC subsystem go deeper. The first is the way it uses flush_workqueue(). This is almost always a bad function to call, because it introduces unnecessary dependencies. cancel_delayed_work_sync() is much better. But even changing that won't solve the second issue, which is a genuine bug. There is a race between detect events and suspend events. The mmc_suspend_host() routine starts out by flushing the kmmcd workqueue before calling the host's suspend routine. So what happens if another detect event occurs in between? The idea is that host drivers shouldn't do that. Once they've called mmc_suspend_host(), then they shouldn't be poking the MMC core in any other way. None of this is of course properly documented. :/ This whole area of synchronization between card insertion, card removal, suspend, and resume needs to be thought out better. Especially keeping in mind that device registration or unregistration during a system sleep is likely to block until the sleep is over. Trying to keep up with the PM changes is a full time job. For now I've mostly ignored it as I don't even have time for the other stuff. Patches welcome. Lastly, there are some other questions about confusing aspects of the subsystem. What's the story with __mmc_claim_host()? Is it really nothing more than an abortable mutex_lock()? Why does it ever need to be aborted? Why is its second argument an atomic_t instead of a normal int? It got that way when SDIO got in. It was needed to make it abortable to solve a rather nasty deadlock situation. I don't remember the details right now, but it should be in the LKML archives. Why are mmc_detect() and related routines described in comments as Card detection callback from host? They don't handle card _detection_ -- they handle card _removal_. They handle both. What's the purpose of the card-counting loop in host/omap.c:mmc_omap_switch_handler()? It looks like dead code. I'm not too familiar with that driver, but they've been playing around with multiplexing several cards into one controller. Might be bits and pieces of that. Rgds -- -- Pierre Ossman Linux kernel, MMC maintainerhttp://www.kernel.org PulseAudio, core developer http://pulseaudio.org rdesktop, core developer http://www.rdesktop.org -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Bugs in MMC [was: [Bug 10030] Suspend doesn't work when SD card is inserted]
Thanks for the explanations. On Mon, 25 Feb 2008, Pierre Ossman wrote: Trying to keep up with the PM changes is a full time job. For now I've mostly ignored it as I don't even have time for the other stuff. Patches welcome. What do you think of the patch attached to comment #40 in the Bugzilla entry? Alan Stern -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Bugs in MMC [was: [Bug 10030] Suspend doesn't work when SD card is inserted]
On Mon, 25 Feb 2008 12:58:30 -0500 (EST) Alan Stern [EMAIL PROTECTED] wrote: Thanks for the explanations. On Mon, 25 Feb 2008, Pierre Ossman wrote: Trying to keep up with the PM changes is a full time job. For now I've mostly ignored it as I don't even have time for the other stuff. Patches welcome. What do you think of the patch attached to comment #40 in the Bugzilla entry? Looks ok. As long as those two synchronization points are guaranteed, then I'm happy. Rgds -- -- Pierre Ossman Linux kernel, MMC maintainerhttp://www.kernel.org PulseAudio, core developer http://pulseaudio.org rdesktop, core developer http://www.rdesktop.org -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Bugs in MMC [was: [Bug 10030] Suspend doesn't work when SD card is inserted]
On Mon, 25 Feb 2008, Pierre Ossman wrote: What do you think of the patch attached to comment #40 in the Bugzilla entry? Looks ok. As long as those two synchronization points are guaranteed, then I'm happy. Maybe a better approach would be to leave the workqueue unfreezable, and keep cancel_delayed_work_sync() in mmc_suspend_host(). It would then be necessary to add a test to verify, if there is a card attached, that the card is indeed suspended. After all, it's possible that the cancel_delayed_work_sync() ended up waiting for a job already running on the workqueue to register a new card! (The same would be true even with flush_scheduled_work.) Also, as a bit of defensive programming, it might be a good idea to add a suspended flag to the mmc_host structure. If mmc_rescan() sees that the flag is set then it should return immediately. This would protect against host drivers that aren't careful to disable detect IRQs before calling mmc_suspend_host(). Alan Stern -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Bugs in MMC [was: [Bug 10030] Suspend doesn't work when SD card is inserted]
snip What's the purpose of the card-counting loop in host/omap.c:mmc_omap_switch_handler()? It looks like dead code. I'm not too familiar with that driver, but they've been playing around with multiplexing several cards into one controller. Might be bits and pieces of that. AFAIK, that came after mmc multislot support. Omap2 has one mmc controller for several (2 in practice) mm slots (cards). It's not really a dead code since it's used in n800 and n810, it's in linux-omap archives and git tree. Carlos Aguiar, added in the loop, might have some comments about it. -- Best Regards, Felipe Balbi [EMAIL PROTECTED] -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bug 10030] Suspend doesn't work when SD card is inserted
On Sun, 24 Feb 2008, Rafael J. Wysocki wrote: > > Very subtly wrong ;-). > > > > imagine suspending_task == 0xabcdef01. Now task "R" with current == > > 0xabcd reads suspending_task while the other cpu is writing to it, > > and sees 0xabcd (0xef01 was not yet written) -- and mistakenly > > believes that "R" == suspending_task. > > > > I agree it is very unlikely, and it will not happen on i386. But what > > about just using atomic_t suspending_task, and store current->pid into > > it? > > I'd rather use a lock, frankly. For example, we can require the readers to > take pm_sleep_rwsem for reading in order to access that. That certainly won't work. Imagine what would happen when the reader _was_ the suspending task. But if you really twist my arm, I'll go along with Pavel's suggestion. Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bug 10030] Suspend doesn't work when SD card is inserted
On Sun, 24 Feb 2008, Pavel Machek wrote: > > > At the very least, you'd need rmb() before reading it and wmb() after > > > writing to it, but I'm not sure if that's enough on every obscure > > > architecture out there. > > > > No, neither one is needed because of the way suspending_task is used. > > > > It's not necessary for a reader R to see the variable's actual value; > > all R needs to know is whether or not suspending_task is equal to R. > > Since the only process which can set suspending_task to R is R itself, > > and since R will set suspending_task back to NULL before releasing the > > write lock on pm_sleep_rwsem, there's never any ambiguity. > > Subtle. > > Very subtly wrong ;-). > > imagine suspending_task == 0xabcdef01. Now task "R" with current == > 0xabcd reads suspending_task while the other cpu is writing to it, > and sees 0xabcd (0xef01 was not yet written) -- and mistakenly > believes that "R" == suspending_task. I always thought that reads and writes of pointers are atomic, just like reads and writes of longs. Is that wrong? Now if pointers were the same size as long long then I would agree with you. > I agree it is very unlikely, and it will not happen on i386. But what > about just using atomic_t suspending_task, and store current->pid into > it? That would work just as well. Except that it wouldn't need to be atomic_t, because current->pid is always an integer (not guaranteed, I suppose, but that's what it is on all current architectures) and reads/writes of ints _are_ atomic. Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bug 10030] Suspend doesn't work when SD card is inserted
On Sunday, 24 of February 2008, Pavel Machek wrote: > On Sun 2008-02-24 15:33:01, Alan Stern wrote: > > On Sun, 24 Feb 2008, Pavel Machek wrote: > > > > > > > What locking protects this variable? What happens when suspending_task > > > > > exits? (Hmm, that would probably be bug, anyway?) > > > > > > > > It's protected by whatever existing locking scheme allows only one > > > > task to start a system sleep at a time. For example, the suspending > > > > task has to get a write lock on pm_sleep_rwsem. > > > > > > And readers of suspending_task are protected by? > > > > I added a comment about that too. > > > > > At the very least, you'd need rmb() before reading it and wmb() after > > > writing to it, but I'm not sure if that's enough on every obscure > > > architecture out there. > > > > No, neither one is needed because of the way suspending_task is used. > > > > It's not necessary for a reader R to see the variable's actual value; > > all R needs to know is whether or not suspending_task is equal to R. > > Since the only process which can set suspending_task to R is R itself, > > and since R will set suspending_task back to NULL before releasing the > > write lock on pm_sleep_rwsem, there's never any ambiguity. > > Subtle. > > Very subtly wrong ;-). > > imagine suspending_task == 0xabcdef01. Now task "R" with current == > 0xabcd reads suspending_task while the other cpu is writing to it, > and sees 0xabcd (0xef01 was not yet written) -- and mistakenly > believes that "R" == suspending_task. > > I agree it is very unlikely, and it will not happen on i386. But what > about just using atomic_t suspending_task, and store current->pid into > it? I'd rather use a lock, frankly. For example, we can require the readers to take pm_sleep_rwsem for reading in order to access that. Thanks, Rafael -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bug 10030] Suspend doesn't work when SD card is inserted
On Sunday, 24 of February 2008, Alan Stern wrote: > On Sun, 24 Feb 2008, Rafael J. Wysocki wrote: > > > From: Rafael J. Wysocki <[EMAIL PROTECTED]> > > > > Remove the code that acquires all device semaphores from the suspend > > code path as it causes multiple problems to appear (most notably, > > http://bugzilla.kernel.org/show_bug.cgi?id=10030) and revert the > > change introduced by commit 4145ed6dc597a9bea5f6ae8c574653b2de10620f > > depending on the code being removed. > > > > Remove pm_sleep_lock()/pm_sleep_unlock() from device_add() to avoid > > the issue reported at http://bugzilla.kernel.org/show_bug.cgi?id=9874. > > This looks okay. > > Are you going to submit it just for 2.6.25, leaving the existing code > as-is for 2.6.26? No, I'd like to start over on top of 2.6.25. That will be much cleaner from the upstream point of view. We seem to have all the pieces anyway, so that's only a matter of putting them back together. Thanks, Rafael -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bug 10030] Suspend doesn't work when SD card is inserted
On Sun 2008-02-24 15:33:01, Alan Stern wrote: > On Sun, 24 Feb 2008, Pavel Machek wrote: > > > > > What locking protects this variable? What happens when suspending_task > > > > exits? (Hmm, that would probably be bug, anyway?) > > > > > > It's protected by whatever existing locking scheme allows only one > > > task to start a system sleep at a time. For example, the suspending > > > task has to get a write lock on pm_sleep_rwsem. > > > > And readers of suspending_task are protected by? > > I added a comment about that too. > > > At the very least, you'd need rmb() before reading it and wmb() after > > writing to it, but I'm not sure if that's enough on every obscure > > architecture out there. > > No, neither one is needed because of the way suspending_task is used. > > It's not necessary for a reader R to see the variable's actual value; > all R needs to know is whether or not suspending_task is equal to R. > Since the only process which can set suspending_task to R is R itself, > and since R will set suspending_task back to NULL before releasing the > write lock on pm_sleep_rwsem, there's never any ambiguity. Subtle. Very subtly wrong ;-). imagine suspending_task == 0xabcdef01. Now task "R" with current == 0xabcd reads suspending_task while the other cpu is writing to it, and sees 0xabcd (0xef01 was not yet written) -- and mistakenly believes that "R" == suspending_task. I agree it is very unlikely, and it will not happen on i386. But what about just using atomic_t suspending_task, and store current->pid into it? Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bug 10030] Suspend doesn't work when SD card is inserted
On Sun, 24 Feb 2008, Rafael J. Wysocki wrote: > From: Rafael J. Wysocki <[EMAIL PROTECTED]> > > Remove the code that acquires all device semaphores from the suspend > code path as it causes multiple problems to appear (most notably, > http://bugzilla.kernel.org/show_bug.cgi?id=10030) and revert the > change introduced by commit 4145ed6dc597a9bea5f6ae8c574653b2de10620f > depending on the code being removed. > > Remove pm_sleep_lock()/pm_sleep_unlock() from device_add() to avoid > the issue reported at http://bugzilla.kernel.org/show_bug.cgi?id=9874. This looks okay. Are you going to submit it just for 2.6.25, leaving the existing code as-is for 2.6.26? Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bug 10030] Suspend doesn't work when SD card is inserted
On Sunday, 24 of February 2008, Alan Stern wrote: > On Sun, 24 Feb 2008, Rafael J. Wysocki wrote: > > > From: Rafael J. Wysocki <[EMAIL PROTECTED]> > > > > Remove the code that acquires all device semaphores from the suspend > > code path as it causes multiple problems to appear (most notably, > > http://bugzilla.kernel.org/show_bug.cgi?id=10030) and revert the > > change introduced by commit 4145ed6dc597a9bea5f6ae8c574653b2de10620f > > depending on the code being removed. > > > > Remove pm_sleep_lock()/pm_sleep_unlock() from device add to avoid > > the issue reported at http://bugzilla.kernel.org/show_bug.cgi?id=9874. > > Do you actually know whether removing that lock fixes the reported bug? In fact I'm waiting for the confirmation, but given the symptoms described by Lukas I'm quite confident that the problem shouldn't be reproducible with the lock removed. > I agree, the lock should be removed for now. But I'd sure like to get > some feedback about what's going wrong. No question about that. > It's starting to look as though we'd be a lot better off blocking device > registration during sleep instead of failing it. Agreed. > Shouldn't resume_device() and suspend_device() now acquire the device > semaphore before calling the various methods? That's the way they > used to work. Yes, thanks for reminding me about that. Updated patch follows. Thanks, Rafael --- From: Rafael J. Wysocki <[EMAIL PROTECTED]> Remove the code that acquires all device semaphores from the suspend code path as it causes multiple problems to appear (most notably, http://bugzilla.kernel.org/show_bug.cgi?id=10030) and revert the change introduced by commit 4145ed6dc597a9bea5f6ae8c574653b2de10620f depending on the code being removed. Remove pm_sleep_lock()/pm_sleep_unlock() from device_add() to avoid the issue reported at http://bugzilla.kernel.org/show_bug.cgi?id=9874. Signed-off-by: Rafael J. Wysocki <[EMAIL PROTECTED]> --- drivers/base/core.c |8 --- drivers/base/power/main.c | 106 ++ drivers/usb/core/usb.c|2 3 files changed, 17 insertions(+), 99 deletions(-) Index: linux-2.6/drivers/base/power/main.c === --- linux-2.6.orig/drivers/base/power/main.c +++ linux-2.6/drivers/base/power/main.c @@ -48,7 +48,6 @@ */ LIST_HEAD(dpm_active); -static LIST_HEAD(dpm_locked); static LIST_HEAD(dpm_off); static LIST_HEAD(dpm_off_irq); static LIST_HEAD(dpm_destroy); @@ -81,28 +80,6 @@ void device_pm_add(struct device *dev) */ void device_pm_remove(struct device *dev) { - /* -* If this function is called during a suspend, it will be blocked, -* because we're holding the device's semaphore at that time, which may -* lead to a deadlock. In that case we want to print a warning. -* However, it may also be called by unregister_dropped_devices() with -* the device's semaphore released, in which case the warning should -* not be printed. -*/ - if (down_trylock(>sem)) { - if (down_read_trylock(_sleep_rwsem)) { - /* No suspend in progress, wait on dev->sem */ - down(>sem); - up_read(_sleep_rwsem); - } else { - /* Suspend in progress, we may deadlock */ - dev_warn(dev, "Suspicious %s during suspend\n", - __FUNCTION__); - dump_stack(); - /* The user has been warned ... */ - down(>sem); - } - } pr_debug("PM: Removing info for %s:%s\n", dev->bus ? dev->bus->name : "No Bus", kobject_name(>kobj)); @@ -110,7 +87,6 @@ void device_pm_remove(struct device *dev dpm_sysfs_remove(dev); list_del_init(>power.entry); mutex_unlock(_list_mtx); - up(>sem); } /** @@ -230,6 +206,8 @@ static int resume_device(struct device * TRACE_DEVICE(dev); TRACE_RESUME(0); + down(>sem); + if (dev->bus && dev->bus->resume) { dev_dbg(dev,"resuming\n"); error = dev->bus->resume(dev); @@ -245,6 +223,8 @@ static int resume_device(struct device * error = dev->class->resume(dev); } + up(>sem); + TRACE_RESUME(error); return error; } @@ -266,7 +246,7 @@ static void dpm_resume(void) struct list_head *entry = dpm_off.next; struct device *dev = to_device(entry); - list_move_tail(entry, _locked); + list_move_tail(entry, _active); mutex_unlock(_list_mtx); resume_device(dev); mutex_lock(_list_mtx); @@ -275,25 +255,6 @@ static void dpm_resume(void) } /** - * unlock_all_devices - Release each device's semaphore - * - * Go
Re: [Bug 10030] Suspend doesn't work when SD card is inserted
On Sun, 24 Feb 2008, Rafael J. Wysocki wrote: > From: Rafael J. Wysocki <[EMAIL PROTECTED]> > > Remove the code that acquires all device semaphores from the suspend > code path as it causes multiple problems to appear (most notably, > http://bugzilla.kernel.org/show_bug.cgi?id=10030) and revert the > change introduced by commit 4145ed6dc597a9bea5f6ae8c574653b2de10620f > depending on the code being removed. > > Remove pm_sleep_lock()/pm_sleep_unlock() from device add to avoid > the issue reported at http://bugzilla.kernel.org/show_bug.cgi?id=9874. Do you actually know whether removing that lock fixes the reported bug? I agree, the lock should be removed for now. But I'd sure like to get some feedback about what's going wrong. It's starting to look as though we'd be a lot better off blocking device registration during sleep instead of failing it. Shouldn't resume_device() and suspend_device() now acquire the device semaphore before calling the various methods? That's the way they used to work. Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bug 10030] Suspend doesn't work when SD card is inserted
On Sun, 24 Feb 2008, Pavel Machek wrote: > > > What locking protects this variable? What happens when suspending_task > > > exits? (Hmm, that would probably be bug, anyway?) > > > > It's protected by whatever existing locking scheme allows only one > > task to start a system sleep at a time. For example, the suspending > > task has to get a write lock on pm_sleep_rwsem. > > And readers of suspending_task are protected by? I added a comment about that too. > At the very least, you'd need rmb() before reading it and wmb() after > writing to it, but I'm not sure if that's enough on every obscure > architecture out there. No, neither one is needed because of the way suspending_task is used. It's not necessary for a reader R to see the variable's actual value; all R needs to know is whether or not suspending_task is equal to R. Since the only process which can set suspending_task to R is R itself, and since R will set suspending_task back to NULL before releasing the write lock on pm_sleep_rwsem, there's never any ambiguity. Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bug 10030] Suspend doesn't work when SD card is inserted
On Sunday, 24 of February 2008, Rafael J. Wysocki wrote: > On Sunday, 24 of February 2008, Alan Stern wrote: > > On Sun, 24 Feb 2008, Rafael J. Wysocki wrote: > > > > > On Sunday, 24 of February 2008, Alan Stern wrote: > > > > On Sat, 23 Feb 2008, Rafael J. Wysocki wrote: > > > > > > > > > Ultimately no, it's not. However, we are now late in the -rc2 time > > > > > frame and > > > > > the release of -rc3 seems to be imminent. At this point, IMO, that's > > > > > the > > > > > safest thing to do. BTW, appended is the patch I'd like to get > > > > > applied. > > > > > > > > In the interest of having a safe release, I guess you're right. > > > > > > That's what I'm concerned about at the moment. I'm afraid there won't be > > > enough time to nail down all the issues (there may be some we're not even > > > aware of). > > > > All right. You'll need to enlarge your big reversal patch by reverting > > commit 4145ed6dc597a9bea5f6ae8c574653b2de10620f. > > Thanks for the hint. > > In fact, I had to add a couple of fixes in device_power_down() and > dpm_suspend(). The entire patch is appended. I hope it's fine by everyone to push the patch below to Greg for 2.6.25. Thanks, Rafael --- From: Rafael J. Wysocki <[EMAIL PROTECTED]> Remove the code that acquires all device semaphores from the suspend code path as it causes multiple problems to appear (most notably, http://bugzilla.kernel.org/show_bug.cgi?id=10030) and revert the change introduced by commit 4145ed6dc597a9bea5f6ae8c574653b2de10620f depending on the code being removed. Remove pm_sleep_lock()/pm_sleep_unlock() from device add to avoid the issue reported at http://bugzilla.kernel.org/show_bug.cgi?id=9874. Signed-off-by: Rafael J. Wysocki <[EMAIL PROTECTED]> --- drivers/base/core.c |8 --- drivers/base/power/main.c | 97 +++--- drivers/usb/core/usb.c|2 3 files changed, 8 insertions(+), 99 deletions(-) Index: linux-2.6/drivers/base/power/main.c === --- linux-2.6.orig/drivers/base/power/main.c +++ linux-2.6/drivers/base/power/main.c @@ -48,7 +48,6 @@ */ LIST_HEAD(dpm_active); -static LIST_HEAD(dpm_locked); static LIST_HEAD(dpm_off); static LIST_HEAD(dpm_off_irq); static LIST_HEAD(dpm_destroy); @@ -81,28 +80,6 @@ void device_pm_add(struct device *dev) */ void device_pm_remove(struct device *dev) { - /* -* If this function is called during a suspend, it will be blocked, -* because we're holding the device's semaphore at that time, which may -* lead to a deadlock. In that case we want to print a warning. -* However, it may also be called by unregister_dropped_devices() with -* the device's semaphore released, in which case the warning should -* not be printed. -*/ - if (down_trylock(>sem)) { - if (down_read_trylock(_sleep_rwsem)) { - /* No suspend in progress, wait on dev->sem */ - down(>sem); - up_read(_sleep_rwsem); - } else { - /* Suspend in progress, we may deadlock */ - dev_warn(dev, "Suspicious %s during suspend\n", - __FUNCTION__); - dump_stack(); - /* The user has been warned ... */ - down(>sem); - } - } pr_debug("PM: Removing info for %s:%s\n", dev->bus ? dev->bus->name : "No Bus", kobject_name(>kobj)); @@ -110,7 +87,6 @@ void device_pm_remove(struct device *dev dpm_sysfs_remove(dev); list_del_init(>power.entry); mutex_unlock(_list_mtx); - up(>sem); } /** @@ -266,7 +242,7 @@ static void dpm_resume(void) struct list_head *entry = dpm_off.next; struct device *dev = to_device(entry); - list_move_tail(entry, _locked); + list_move_tail(entry, _active); mutex_unlock(_list_mtx); resume_device(dev); mutex_lock(_list_mtx); @@ -275,25 +251,6 @@ static void dpm_resume(void) } /** - * unlock_all_devices - Release each device's semaphore - * - * Go through the dpm_off list. Put each device on the dpm_active - * list and unlock it. - */ -static void unlock_all_devices(void) -{ - mutex_lock(_list_mtx); - while (!list_empty(_locked)) { - struct list_head *entry = dpm_locked.prev; - struct device *dev = to_device(entry); - - list_move(entry, _active); - up(>sem); - } - mutex_unlock(_list_mtx); -} - -/** * unregister_dropped_devices - Unregister devices scheduled for removal * * Unregister all devices on the dpm_destroy list. @@ -305,7 +262,6 @@ static void unregister_dropped_devices(v
Re: [Bug 10030] Suspend doesn't work when SD card is inserted
On Sunday, 24 of February 2008, Zdenek Kabelac wrote: > 2008/2/24, Alan Stern <[EMAIL PROTECTED]>: > > On Sun, 24 Feb 2008, Rafael J. Wysocki wrote: > > > > > > > > You know, with this new patch we probably don't need > > > > device_pm_schedule_removal() any more. > > > > > > No, we don't. However, because of that dpm_suspend() and > > device_power_down() > > > need to be changed not to assume that the removed devices will end up on > > > dpm_destroy. > > > > > > with analogous changes to dpm_suspend(). > > > > Any more suggestions for updates? > > > > Is there any current patch I should actually test - it looks there is > ongoing discussion from the time Rafael has proposed his last patch to > check by me: http://bugzilla.kernel.org/attachment.cgi?id=14961=view > > But I think it's visible the only suspend update will not resolve my > problems and the problems is deeper in the mmc. Yes, there are problems in there, but your feedback is needed nevertheless. > So should I still test yesterdays' patch ? Yes please. It will be valuable information to us whether it fixes the suspend issue at least. Thanks, Rafael -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bug 10030] Suspend doesn't work when SD card is inserted
Hi! > > > @@ -25,6 +25,7 @@ > > > #include > > > #include > > > #include > > > +#include > > > > > > #include "../base.h" > > > #include "power.h" > > > @@ -59,6 +60,13 @@ static DECLARE_RWSEM(pm_sleep_rwsem); > > > > > > int (*platform_enable_wakeup)(struct device *dev, int is_on); > > > > > > +static struct task_struct *suspending_task; > > > > What locking protects this variable? What happens when suspending_task > > exits? (Hmm, that would probably be bug, anyway?) > > It's protected by whatever existing locking scheme allows only one > task to start a system sleep at a time. For example, the suspending > task has to get a write lock on pm_sleep_rwsem. And readers of suspending_task are protected by? At the very least, you'd need rmb() before reading it and wmb() after writing to it, but I'm not sure if that's enough on every obscure architecture out there. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bug 10030] Suspend doesn't work when SD card is inserted
2008/2/24, Alan Stern <[EMAIL PROTECTED]>: > On Sun, 24 Feb 2008, Rafael J. Wysocki wrote: > > > > > You know, with this new patch we probably don't need > > > device_pm_schedule_removal() any more. > > > > No, we don't. However, because of that dpm_suspend() and > device_power_down() > > need to be changed not to assume that the removed devices will end up on > > dpm_destroy. > > > with analogous changes to dpm_suspend(). > > Any more suggestions for updates? > Is there any current patch I should actually test - it looks there is ongoing discussion from the time Rafael has proposed his last patch to check by me: http://bugzilla.kernel.org/attachment.cgi?id=14961=view But I think it's visible the only suspend update will not resolve my problems and the problems is deeper in the mmc. So should I still test yesterdays' patch ? Zdenek Zdenek -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bug 10030] Suspend doesn't work when SD card is inserted
On Sun, 24 Feb 2008, Rafael J. Wysocki wrote: > > You know, with this new patch we probably don't need > > device_pm_schedule_removal() any more. > > No, we don't. However, because of that dpm_suspend() and device_power_down() > need to be changed not to assume that the removed devices will end up on > dpm_destroy. Yes. The safest approach will be to check whether the device is still at the end of the dpm_locked or dpm_off list; if it hasn't then we can assume that it has been removed (or scheduled for removal). > IMO you also should add this change in device_power_down(): > > @@ -388,18 +343,15 @@ int device_power_down(pm_message_t state > struct list_head *entry = dpm_off.prev; > struct device *dev = to_device(entry); > > - list_del_init(>power.entry); > error = suspend_device_late(dev, state); > if (error) { > printk(KERN_ERR "Could not power down device %s: " > "error %d\n", > kobject_name(>kobj), error); > - if (list_empty(>power.entry)) > - list_add(>power.entry, _off); > break; > } > - if (list_empty(>power.entry)) > - list_add(>power.entry, _off_irq); > + if (!list_empty(>power.entry)) > + list_move(>power.entry, _off_irq); > } Actually I changed the last two lines to: + + /* Don't move the entry if the device has been unregistered */ + if (entry == dpm_off.prev) + list_move(entry, _off_irq); with analogous changes to dpm_suspend(). Any more suggestions for updates? Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bug 10030] Suspend doesn't work when SD card is inserted
On Sun, 24 Feb 2008, Pavel Machek wrote: > Hi! > > > Index: usb-2.6/drivers/base/power/main.c > > === > > --- usb-2.6.orig/drivers/base/power/main.c > > +++ usb-2.6/drivers/base/power/main.c > > @@ -25,6 +25,7 @@ > > #include > > #include > > #include > > +#include > > > > #include "../base.h" > > #include "power.h" > > @@ -59,6 +60,13 @@ static DECLARE_RWSEM(pm_sleep_rwsem); > > > > int (*platform_enable_wakeup)(struct device *dev, int is_on); > > > > +static struct task_struct *suspending_task; > > What locking protects this variable? What happens when suspending_task > exits? (Hmm, that would probably be bug, anyway?) It's protected by whatever existing locking scheme allows only one task to start a system sleep at a time. For example, the suspending task has to get a write lock on pm_sleep_rwsem. Yes, if the suspending task exits before the system has woken up, you're in trouble regardless. > Or are we running UP when this is accessed? This at least needs a big > fat comment. > > > +bool in_suspend_context(void) > > +{ > > + return (suspending_task == current); > > +} We aren't necessarily UP. But since all that matters is whether or not suspend_task is equal to the current task, no extra locking is needed. I'll add a comment explaining all this. Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bug 10030] Suspend doesn't work when SD card is inserted
Hi! > Index: usb-2.6/drivers/base/power/main.c > === > --- usb-2.6.orig/drivers/base/power/main.c > +++ usb-2.6/drivers/base/power/main.c > @@ -25,6 +25,7 @@ > #include > #include > #include > +#include > > #include "../base.h" > #include "power.h" > @@ -59,6 +60,13 @@ static DECLARE_RWSEM(pm_sleep_rwsem); > > int (*platform_enable_wakeup)(struct device *dev, int is_on); > > +static struct task_struct *suspending_task; What locking protects this variable? What happens when suspending_task exits? (Hmm, that would probably be bug, anyway?) Or are we running UP when this is accessed? This at least needs a big fat comment. > +bool in_suspend_context(void) > +{ > + return (suspending_task == current); > +} Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Bugs in MMC [was: [Bug 10030] Suspend doesn't work when SD card is inserted]
On Sun, 24 Feb 2008, Rafael J. Wysocki wrote: > > In fact this turns out to be exactly the problem with the MMC > > subsystem. It uses a dedicated workqueue (kmmcd) to handle state > > changes, and mmc_suspend_host() does a flush_workqueue(). If the > > workqueue contains an entry to register or unregister a device, this is > > guaranteed to deadlock -- and that's exactly what happens when Zdenek > > inserts or removes a card during the disk-drive spindown time of a > > system sleep. > > > > It looks like the best solution is for mmc_suspend_host() not to flush > > the workqueue; instead the workqueue should prevent itself from > > running during a suspend. Right now the easiest way to accomplish this > > is for the workqueue routines (mmc_detect() and siblings) to acquire > > the mmc_host's device semaphore. If in the future the MMC subsystem > > adds dynamic PM support then a more complicated arrangement will be > > needed. > > > > So the patch below, in combination with the previous patch, might just > > fix the whole problem. > > The patch looks sane and I think you're right. Well, the patch itself isn't really adequate; it was just a first pass intended to illustrate the nature of the problem. The problems in the MMC subsystem go deeper. The first is the way it uses flush_workqueue(). This is almost always a bad function to call, because it introduces unnecessary dependencies. cancel_delayed_work_sync() is much better. But even changing that won't solve the second issue, which is a genuine bug. There is a race between detect events and suspend events. The mmc_suspend_host() routine starts out by flushing the kmmcd workqueue before calling the host's suspend routine. So what happens if another detect event occurs in between? This whole area of synchronization between card insertion, card removal, suspend, and resume needs to be thought out better. Especially keeping in mind that device registration or unregistration during a system sleep is likely to block until the sleep is over. Lastly, there are some other questions about confusing aspects of the subsystem. What's the story with __mmc_claim_host()? Is it really nothing more than an abortable mutex_lock()? Why does it ever need to be aborted? Why is its second argument an atomic_t instead of a normal int? Why are mmc_detect() and related routines described in comments as "Card detection callback from host"? They don't handle card _detection_ -- they handle card _removal_. What's the purpose of the card-counting loop in host/omap.c:mmc_omap_switch_handler()? It looks like dead code. Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [linux-pm] [Bug 10030] Suspend doesn't work when SD card is inserted
On Sunday, 24 of February 2008, Alan Stern wrote: > On Sat, 23 Feb 2008, Alan Stern wrote: > > > It happened in a workqueue. There could be lots of similar cases: Some > > interrupt-driven event causes a hotplug action. Since the action can't > > be carried out in interrupt context, the driver has no choice but to > > defer it to a workqueue or kernel thread. Blocking that workqueue or > > kernel thread won't cause a problem _unless_ the driver's > > suspend/resume methods try to synchronize with it. That's the > > difficult case. > > In fact this turns out to be exactly the problem with the MMC > subsystem. It uses a dedicated workqueue (kmmcd) to handle state > changes, and mmc_suspend_host() does a flush_workqueue(). If the > workqueue contains an entry to register or unregister a device, this is > guaranteed to deadlock -- and that's exactly what happens when Zdenek > inserts or removes a card during the disk-drive spindown time of a > system sleep. > > It looks like the best solution is for mmc_suspend_host() not to flush > the workqueue; instead the workqueue should prevent itself from > running during a suspend. Right now the easiest way to accomplish this > is for the workqueue routines (mmc_detect() and siblings) to acquire > the mmc_host's device semaphore. If in the future the MMC subsystem > adds dynamic PM support then a more complicated arrangement will be > needed. > > So the patch below, in combination with the previous patch, might just > fix the whole problem. The patch looks sane and I think you're right. Still, it depends on us holding the device semaphores (if I understand it correctly). For this reason, I'd prefer to include it into the patch that adds device locking to the PM core. I think we should first remove the device locking from drivers/base/power/main.c (that's the patch I'd like to push upstream now) and then create a patch that will add it back along with all of the necessary additional changes we know of (BTW, I've just received a message from another user affected by it). but we need to be careful with all the details. IMO, it would be Thanks, Rafael > Index: usb-2.6/drivers/mmc/core/core.c > === > --- usb-2.6.orig/drivers/mmc/core/core.c > +++ usb-2.6/drivers/mmc/core/core.c > @@ -731,8 +731,6 @@ void mmc_stop_host(struct mmc_host *host > */ > int mmc_suspend_host(struct mmc_host *host, pm_message_t state) > { > - mmc_flush_scheduled_work(); > - > mmc_bus_get(host); > if (host->bus_ops && !host->bus_dead) { > if (host->bus_ops->suspend) > Index: usb-2.6/drivers/mmc/core/mmc.c > === > --- usb-2.6.orig/drivers/mmc/core/mmc.c > +++ usb-2.6/drivers/mmc/core/mmc.c > @@ -441,6 +441,7 @@ static void mmc_detect(struct mmc_host * > BUG_ON(!host); > BUG_ON(!host->card); > > + down(_classdev(host)->sem); > mmc_claim_host(host); > > /* > @@ -449,6 +450,7 @@ static void mmc_detect(struct mmc_host * > err = mmc_send_status(host->card, NULL); > > mmc_release_host(host); > + up(_classdev(host)->sem); > > if (err) { > mmc_remove(host); > Index: usb-2.6/drivers/mmc/core/sd.c > === > --- usb-2.6.orig/drivers/mmc/core/sd.c > +++ usb-2.6/drivers/mmc/core/sd.c > @@ -500,6 +500,7 @@ static void mmc_sd_detect(struct mmc_hos > BUG_ON(!host); > BUG_ON(!host->card); > > + down(_classdev(host)->sem); > mmc_claim_host(host); > > /* > @@ -508,6 +509,7 @@ static void mmc_sd_detect(struct mmc_hos > err = mmc_send_status(host->card, NULL); > > mmc_release_host(host); > + up(_classdev(host)->sem); > > if (err) { > mmc_sd_remove(host); > Index: usb-2.6/drivers/mmc/core/sdio.c > === > --- usb-2.6.orig/drivers/mmc/core/sdio.c > +++ usb-2.6/drivers/mmc/core/sdio.c > @@ -195,6 +195,7 @@ static void mmc_sdio_detect(struct mmc_h > BUG_ON(!host); > BUG_ON(!host->card); > > + down(_classdev(host)->sem); > mmc_claim_host(host); > > /* > @@ -203,6 +204,7 @@ static void mmc_sdio_detect(struct mmc_h > err = mmc_select_card(host->card); > > mmc_release_host(host); > + up(_classdev(host)->sem); > > if (err) { > mmc_sdio_remove(host); > > > -- "Premature optimization is the root of all evil." - Donald Knuth -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bug 10030] Suspend doesn't work when SD card is inserted
On Sunday, 24 of February 2008, Alan Stern wrote: > On Sun, 24 Feb 2008, Rafael J. Wysocki wrote: > > On Sunday, 24 of February 2008, Alan Stern wrote: > > > On Sat, 23 Feb 2008, Rafael J. Wysocki wrote: > > > > I still have no clear idea about what's going on with the ACPI bug in > > > #9874. > > > > It seems that the ACPI code is not prepared to cope with a failing device > > registration, in which case it'd need fixing. Frankly, I'm afraid there may > > be more issues like this throughout the tree. > > The fact remains that it is unsafe to register a device during a sleep > transition, although if the device's driver doesn't have a suspend or > resume method you can probably get away with it. > > > > However perhaps the bug wouldn't occur if we blocked device > > > registration until after the resume was finished, instead of failing it > > > outright. Since the registration in this case was done in a workqueue > > > thread, blocking wouldn't cause an immediate deadlock. > > > > No, it wouldn't, but the fact that it happens in the ACPI code makes me > > worry. > > It happened in a workqueue. There could be lots of similar cases: Some > interrupt-driven event causes a hotplug action. Since the action can't > be carried out in interrupt context, the driver has no choice but to > defer it to a workqueue or kernel thread. Blocking that workqueue or > kernel thread won't cause a problem _unless_ the driver's > suspend/resume methods try to synchronize with it. That's the > difficult case. Yes. > > If we block that code and the things it's supposed to do turn out to be > > necessary for suspending later on, we'll be in trouble. > > Exactly. > > > > @@ -94,14 +102,15 @@ void device_pm_remove(struct device *dev > > > /* No suspend in progress, wait on dev->sem */ > > > down(>sem); > > > up_read(_sleep_rwsem); > > > - } else { > > > - /* Suspend in progress, we may deadlock */ > > > + } else if (!in_suspend_context()) { > > > + /* Suspending in another task, we may deadlock */ > > > > Well, that shouldn't really deadlock. If it does, there is a potential > > design > > issue somewhere. I think it might be better to set up a timer in here too. > > At this point the driver core owns the device semaphore, so the > unregistration task will block until the sleep is over. If the > driver's resume method has to synchronize with the unregistration task > then it will deadlock. I agree, it would be a design issue. In fact, > it's the same design issue described earlier in this email. > > > Although IMO it would be even better to just set up a timer and remove the > > warning altogehter. > > That warning was just for debugging purposes. A timer in the resume > path could do the same thing with fewer false alarms, just like the > timer in the suspend path. I have added it to the new patch. OK > > > dev_warn(dev, "Suspicious %s during suspend\n", > > > __FUNCTION__); > > > dump_stack(); > > > /* The user has been warned ... */ > > > down(>sem); > > > } > > > + /* Otherwise we're okay */ > > > } > > > pr_debug("PM: Removing info for %s:%s\n", > > >dev->bus ? dev->bus->name : "No Bus", > > > > There's a problem here, because we shouldn't release the semaphore if we're > > in the suspend context. > > Yes, you're right. It's fixed in the new version. > > > > +static void suspend_timeout(unsigned long _dev) > > > +{ > > > + struct device *dev = (struct device *) _dev; > > > + > > > + dev_err(dev, "deadlock during suspend!\n"); > > > + show_state(); > > > > The show_state() seems to be overkill and won't really help if the user > > can't > > collect the output on the fly using a serial console or something like this. > > [The debug stuff printed here should fit a typical laptop screen, ideally.] > > Changed. > > > I'd prefer > > > > if (in_suspend_context()) { > > __device_release_driver(dev); > > } else { > > down(>sem); > > __device_release_driver(dev); > > up(>sem); > > } > > Okay. > > You know, with this new patch we probably don't need > device_pm_schedule_removal() any more. No, we don't. However, because of that dpm_suspend() and device_power_down() need to be changed not to assume that the removed devices will end up on dpm_destroy. > However I have left it in for now. Well, it's used by the b43, leds and hwrng drivers as well as by some CPU hotplug notifiers. They all will have to be changed along with removing it. > > Well, I'd really feel much more comfortable if we removed the troublesome > > code > > for 2.6.25 and reintroduced it along with the above safeguards for 2.6.26. > > > > Of course, this doesn't mean we can't send debug patches for testing to
Re: [Bug 10030] Suspend doesn't work when SD card is inserted
On Sunday, 24 of February 2008, Alan Stern wrote: > On Sun, 24 Feb 2008, Rafael J. Wysocki wrote: > > > On Sunday, 24 of February 2008, Alan Stern wrote: > > > On Sat, 23 Feb 2008, Rafael J. Wysocki wrote: > > > > > > > Ultimately no, it's not. However, we are now late in the -rc2 time > > > > frame and > > > > the release of -rc3 seems to be imminent. At this point, IMO, that's > > > > the > > > > safest thing to do. BTW, appended is the patch I'd like to get applied. > > > > > > In the interest of having a safe release, I guess you're right. > > > > That's what I'm concerned about at the moment. I'm afraid there won't be > > enough time to nail down all the issues (there may be some we're not even > > aware of). > > All right. You'll need to enlarge your big reversal patch by reverting > commit 4145ed6dc597a9bea5f6ae8c574653b2de10620f. Thanks for the hint. In fact, I had to add a couple of fixes in device_power_down() and dpm_suspend(). The entire patch is appended. I'll comment your new patch in a separate message. Thanks, Rafael --- drivers/base/power/main.c | 97 +++--- drivers/usb/core/usb.c|2 2 files changed, 8 insertions(+), 91 deletions(-) Index: linux-2.6/drivers/base/power/main.c === --- linux-2.6.orig/drivers/base/power/main.c +++ linux-2.6/drivers/base/power/main.c @@ -48,7 +48,6 @@ */ LIST_HEAD(dpm_active); -static LIST_HEAD(dpm_locked); static LIST_HEAD(dpm_off); static LIST_HEAD(dpm_off_irq); static LIST_HEAD(dpm_destroy); @@ -81,28 +80,6 @@ void device_pm_add(struct device *dev) */ void device_pm_remove(struct device *dev) { - /* -* If this function is called during a suspend, it will be blocked, -* because we're holding the device's semaphore at that time, which may -* lead to a deadlock. In that case we want to print a warning. -* However, it may also be called by unregister_dropped_devices() with -* the device's semaphore released, in which case the warning should -* not be printed. -*/ - if (down_trylock(>sem)) { - if (down_read_trylock(_sleep_rwsem)) { - /* No suspend in progress, wait on dev->sem */ - down(>sem); - up_read(_sleep_rwsem); - } else { - /* Suspend in progress, we may deadlock */ - dev_warn(dev, "Suspicious %s during suspend\n", - __FUNCTION__); - dump_stack(); - /* The user has been warned ... */ - down(>sem); - } - } pr_debug("PM: Removing info for %s:%s\n", dev->bus ? dev->bus->name : "No Bus", kobject_name(>kobj)); @@ -110,7 +87,6 @@ void device_pm_remove(struct device *dev dpm_sysfs_remove(dev); list_del_init(>power.entry); mutex_unlock(_list_mtx); - up(>sem); } /** @@ -266,7 +242,7 @@ static void dpm_resume(void) struct list_head *entry = dpm_off.next; struct device *dev = to_device(entry); - list_move_tail(entry, _locked); + list_move_tail(entry, _active); mutex_unlock(_list_mtx); resume_device(dev); mutex_lock(_list_mtx); @@ -275,25 +251,6 @@ static void dpm_resume(void) } /** - * unlock_all_devices - Release each device's semaphore - * - * Go through the dpm_off list. Put each device on the dpm_active - * list and unlock it. - */ -static void unlock_all_devices(void) -{ - mutex_lock(_list_mtx); - while (!list_empty(_locked)) { - struct list_head *entry = dpm_locked.prev; - struct device *dev = to_device(entry); - - list_move(entry, _active); - up(>sem); - } - mutex_unlock(_list_mtx); -} - -/** * unregister_dropped_devices - Unregister devices scheduled for removal * * Unregister all devices on the dpm_destroy list. @@ -305,7 +262,6 @@ static void unregister_dropped_devices(v struct list_head *entry = dpm_destroy.next; struct device *dev = to_device(entry); - up(>sem); mutex_unlock(_list_mtx); /* This also removes the device from the list */ device_unregister(dev); @@ -324,7 +280,6 @@ void device_resume(void) { might_sleep(); dpm_resume(); - unlock_all_devices(); unregister_dropped_devices(); up_write(_sleep_rwsem); } @@ -388,18 +343,15 @@ int device_power_down(pm_message_t state struct list_head *entry = dpm_off.prev; struct device *dev = to_device(entry); - list_del_init(>power.entry); error =
Re: [Bug 10030] Suspend doesn't work when SD card is inserted
On Sunday, 24 of February 2008, Alan Stern wrote: On Sun, 24 Feb 2008, Rafael J. Wysocki wrote: On Sunday, 24 of February 2008, Alan Stern wrote: On Sat, 23 Feb 2008, Rafael J. Wysocki wrote: Ultimately no, it's not. However, we are now late in the -rc2 time frame and the release of -rc3 seems to be imminent. At this point, IMO, that's the safest thing to do. BTW, appended is the patch I'd like to get applied. In the interest of having a safe release, I guess you're right. That's what I'm concerned about at the moment. I'm afraid there won't be enough time to nail down all the issues (there may be some we're not even aware of). All right. You'll need to enlarge your big reversal patch by reverting commit 4145ed6dc597a9bea5f6ae8c574653b2de10620f. Thanks for the hint. In fact, I had to add a couple of fixes in device_power_down() and dpm_suspend(). The entire patch is appended. I'll comment your new patch in a separate message. Thanks, Rafael --- drivers/base/power/main.c | 97 +++--- drivers/usb/core/usb.c|2 2 files changed, 8 insertions(+), 91 deletions(-) Index: linux-2.6/drivers/base/power/main.c === --- linux-2.6.orig/drivers/base/power/main.c +++ linux-2.6/drivers/base/power/main.c @@ -48,7 +48,6 @@ */ LIST_HEAD(dpm_active); -static LIST_HEAD(dpm_locked); static LIST_HEAD(dpm_off); static LIST_HEAD(dpm_off_irq); static LIST_HEAD(dpm_destroy); @@ -81,28 +80,6 @@ void device_pm_add(struct device *dev) */ void device_pm_remove(struct device *dev) { - /* -* If this function is called during a suspend, it will be blocked, -* because we're holding the device's semaphore at that time, which may -* lead to a deadlock. In that case we want to print a warning. -* However, it may also be called by unregister_dropped_devices() with -* the device's semaphore released, in which case the warning should -* not be printed. -*/ - if (down_trylock(dev-sem)) { - if (down_read_trylock(pm_sleep_rwsem)) { - /* No suspend in progress, wait on dev-sem */ - down(dev-sem); - up_read(pm_sleep_rwsem); - } else { - /* Suspend in progress, we may deadlock */ - dev_warn(dev, Suspicious %s during suspend\n, - __FUNCTION__); - dump_stack(); - /* The user has been warned ... */ - down(dev-sem); - } - } pr_debug(PM: Removing info for %s:%s\n, dev-bus ? dev-bus-name : No Bus, kobject_name(dev-kobj)); @@ -110,7 +87,6 @@ void device_pm_remove(struct device *dev dpm_sysfs_remove(dev); list_del_init(dev-power.entry); mutex_unlock(dpm_list_mtx); - up(dev-sem); } /** @@ -266,7 +242,7 @@ static void dpm_resume(void) struct list_head *entry = dpm_off.next; struct device *dev = to_device(entry); - list_move_tail(entry, dpm_locked); + list_move_tail(entry, dpm_active); mutex_unlock(dpm_list_mtx); resume_device(dev); mutex_lock(dpm_list_mtx); @@ -275,25 +251,6 @@ static void dpm_resume(void) } /** - * unlock_all_devices - Release each device's semaphore - * - * Go through the dpm_off list. Put each device on the dpm_active - * list and unlock it. - */ -static void unlock_all_devices(void) -{ - mutex_lock(dpm_list_mtx); - while (!list_empty(dpm_locked)) { - struct list_head *entry = dpm_locked.prev; - struct device *dev = to_device(entry); - - list_move(entry, dpm_active); - up(dev-sem); - } - mutex_unlock(dpm_list_mtx); -} - -/** * unregister_dropped_devices - Unregister devices scheduled for removal * * Unregister all devices on the dpm_destroy list. @@ -305,7 +262,6 @@ static void unregister_dropped_devices(v struct list_head *entry = dpm_destroy.next; struct device *dev = to_device(entry); - up(dev-sem); mutex_unlock(dpm_list_mtx); /* This also removes the device from the list */ device_unregister(dev); @@ -324,7 +280,6 @@ void device_resume(void) { might_sleep(); dpm_resume(); - unlock_all_devices(); unregister_dropped_devices(); up_write(pm_sleep_rwsem); } @@ -388,18 +343,15 @@ int device_power_down(pm_message_t state struct list_head *entry = dpm_off.prev; struct device *dev = to_device(entry); - list_del_init(dev-power.entry); error =
Re: [Bug 10030] Suspend doesn't work when SD card is inserted
On Sunday, 24 of February 2008, Alan Stern wrote: On Sun, 24 Feb 2008, Rafael J. Wysocki wrote: On Sunday, 24 of February 2008, Alan Stern wrote: On Sat, 23 Feb 2008, Rafael J. Wysocki wrote: I still have no clear idea about what's going on with the ACPI bug in #9874. It seems that the ACPI code is not prepared to cope with a failing device registration, in which case it'd need fixing. Frankly, I'm afraid there may be more issues like this throughout the tree. The fact remains that it is unsafe to register a device during a sleep transition, although if the device's driver doesn't have a suspend or resume method you can probably get away with it. However perhaps the bug wouldn't occur if we blocked device registration until after the resume was finished, instead of failing it outright. Since the registration in this case was done in a workqueue thread, blocking wouldn't cause an immediate deadlock. No, it wouldn't, but the fact that it happens in the ACPI code makes me worry. It happened in a workqueue. There could be lots of similar cases: Some interrupt-driven event causes a hotplug action. Since the action can't be carried out in interrupt context, the driver has no choice but to defer it to a workqueue or kernel thread. Blocking that workqueue or kernel thread won't cause a problem _unless_ the driver's suspend/resume methods try to synchronize with it. That's the difficult case. Yes. If we block that code and the things it's supposed to do turn out to be necessary for suspending later on, we'll be in trouble. Exactly. @@ -94,14 +102,15 @@ void device_pm_remove(struct device *dev /* No suspend in progress, wait on dev-sem */ down(dev-sem); up_read(pm_sleep_rwsem); - } else { - /* Suspend in progress, we may deadlock */ + } else if (!in_suspend_context()) { + /* Suspending in another task, we may deadlock */ Well, that shouldn't really deadlock. If it does, there is a potential design issue somewhere. I think it might be better to set up a timer in here too. At this point the driver core owns the device semaphore, so the unregistration task will block until the sleep is over. If the driver's resume method has to synchronize with the unregistration task then it will deadlock. I agree, it would be a design issue. In fact, it's the same design issue described earlier in this email. Although IMO it would be even better to just set up a timer and remove the warning altogehter. That warning was just for debugging purposes. A timer in the resume path could do the same thing with fewer false alarms, just like the timer in the suspend path. I have added it to the new patch. OK dev_warn(dev, Suspicious %s during suspend\n, __FUNCTION__); dump_stack(); /* The user has been warned ... */ down(dev-sem); } + /* Otherwise we're okay */ } pr_debug(PM: Removing info for %s:%s\n, dev-bus ? dev-bus-name : No Bus, There's a problem here, because we shouldn't release the semaphore if we're in the suspend context. Yes, you're right. It's fixed in the new version. +static void suspend_timeout(unsigned long _dev) +{ + struct device *dev = (struct device *) _dev; + + dev_err(dev, deadlock during suspend!\n); + show_state(); The show_state() seems to be overkill and won't really help if the user can't collect the output on the fly using a serial console or something like this. [The debug stuff printed here should fit a typical laptop screen, ideally.] Changed. I'd prefer if (in_suspend_context()) { __device_release_driver(dev); } else { down(dev-sem); __device_release_driver(dev); up(dev-sem); } Okay. You know, with this new patch we probably don't need device_pm_schedule_removal() any more. No, we don't. However, because of that dpm_suspend() and device_power_down() need to be changed not to assume that the removed devices will end up on dpm_destroy. However I have left it in for now. Well, it's used by the b43, leds and hwrng drivers as well as by some CPU hotplug notifiers. They all will have to be changed along with removing it. Well, I'd really feel much more comfortable if we removed the troublesome code for 2.6.25 and reintroduced it along with the above safeguards for 2.6.26. Of course, this doesn't mean we can't send debug patches for testing to the users who have alraedy reported problems with it. :-) Certainly! Here's the new version. IMO you also should add this change in device_power_down(): @@ -388,18 +343,15 @@ int
Re: [linux-pm] [Bug 10030] Suspend doesn't work when SD card is inserted
On Sunday, 24 of February 2008, Alan Stern wrote: On Sat, 23 Feb 2008, Alan Stern wrote: It happened in a workqueue. There could be lots of similar cases: Some interrupt-driven event causes a hotplug action. Since the action can't be carried out in interrupt context, the driver has no choice but to defer it to a workqueue or kernel thread. Blocking that workqueue or kernel thread won't cause a problem _unless_ the driver's suspend/resume methods try to synchronize with it. That's the difficult case. In fact this turns out to be exactly the problem with the MMC subsystem. It uses a dedicated workqueue (kmmcd) to handle state changes, and mmc_suspend_host() does a flush_workqueue(). If the workqueue contains an entry to register or unregister a device, this is guaranteed to deadlock -- and that's exactly what happens when Zdenek inserts or removes a card during the disk-drive spindown time of a system sleep. It looks like the best solution is for mmc_suspend_host() not to flush the workqueue; instead the workqueue should prevent itself from running during a suspend. Right now the easiest way to accomplish this is for the workqueue routines (mmc_detect() and siblings) to acquire the mmc_host's device semaphore. If in the future the MMC subsystem adds dynamic PM support then a more complicated arrangement will be needed. So the patch below, in combination with the previous patch, might just fix the whole problem. The patch looks sane and I think you're right. Still, it depends on us holding the device semaphores (if I understand it correctly). For this reason, I'd prefer to include it into the patch that adds device locking to the PM core. I think we should first remove the device locking from drivers/base/power/main.c (that's the patch I'd like to push upstream now) and then create a patch that will add it back along with all of the necessary additional changes we know of (BTW, I've just received a message from another user affected by it). but we need to be careful with all the details. IMO, it would be Thanks, Rafael Index: usb-2.6/drivers/mmc/core/core.c === --- usb-2.6.orig/drivers/mmc/core/core.c +++ usb-2.6/drivers/mmc/core/core.c @@ -731,8 +731,6 @@ void mmc_stop_host(struct mmc_host *host */ int mmc_suspend_host(struct mmc_host *host, pm_message_t state) { - mmc_flush_scheduled_work(); - mmc_bus_get(host); if (host-bus_ops !host-bus_dead) { if (host-bus_ops-suspend) Index: usb-2.6/drivers/mmc/core/mmc.c === --- usb-2.6.orig/drivers/mmc/core/mmc.c +++ usb-2.6/drivers/mmc/core/mmc.c @@ -441,6 +441,7 @@ static void mmc_detect(struct mmc_host * BUG_ON(!host); BUG_ON(!host-card); + down(mmc_classdev(host)-sem); mmc_claim_host(host); /* @@ -449,6 +450,7 @@ static void mmc_detect(struct mmc_host * err = mmc_send_status(host-card, NULL); mmc_release_host(host); + up(mmc_classdev(host)-sem); if (err) { mmc_remove(host); Index: usb-2.6/drivers/mmc/core/sd.c === --- usb-2.6.orig/drivers/mmc/core/sd.c +++ usb-2.6/drivers/mmc/core/sd.c @@ -500,6 +500,7 @@ static void mmc_sd_detect(struct mmc_hos BUG_ON(!host); BUG_ON(!host-card); + down(mmc_classdev(host)-sem); mmc_claim_host(host); /* @@ -508,6 +509,7 @@ static void mmc_sd_detect(struct mmc_hos err = mmc_send_status(host-card, NULL); mmc_release_host(host); + up(mmc_classdev(host)-sem); if (err) { mmc_sd_remove(host); Index: usb-2.6/drivers/mmc/core/sdio.c === --- usb-2.6.orig/drivers/mmc/core/sdio.c +++ usb-2.6/drivers/mmc/core/sdio.c @@ -195,6 +195,7 @@ static void mmc_sdio_detect(struct mmc_h BUG_ON(!host); BUG_ON(!host-card); + down(mmc_classdev(host)-sem); mmc_claim_host(host); /* @@ -203,6 +204,7 @@ static void mmc_sdio_detect(struct mmc_h err = mmc_select_card(host-card); mmc_release_host(host); + up(mmc_classdev(host)-sem); if (err) { mmc_sdio_remove(host); -- Premature optimization is the root of all evil. - Donald Knuth -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Bugs in MMC [was: [Bug 10030] Suspend doesn't work when SD card is inserted]
On Sun, 24 Feb 2008, Rafael J. Wysocki wrote: In fact this turns out to be exactly the problem with the MMC subsystem. It uses a dedicated workqueue (kmmcd) to handle state changes, and mmc_suspend_host() does a flush_workqueue(). If the workqueue contains an entry to register or unregister a device, this is guaranteed to deadlock -- and that's exactly what happens when Zdenek inserts or removes a card during the disk-drive spindown time of a system sleep. It looks like the best solution is for mmc_suspend_host() not to flush the workqueue; instead the workqueue should prevent itself from running during a suspend. Right now the easiest way to accomplish this is for the workqueue routines (mmc_detect() and siblings) to acquire the mmc_host's device semaphore. If in the future the MMC subsystem adds dynamic PM support then a more complicated arrangement will be needed. So the patch below, in combination with the previous patch, might just fix the whole problem. The patch looks sane and I think you're right. Well, the patch itself isn't really adequate; it was just a first pass intended to illustrate the nature of the problem. The problems in the MMC subsystem go deeper. The first is the way it uses flush_workqueue(). This is almost always a bad function to call, because it introduces unnecessary dependencies. cancel_delayed_work_sync() is much better. But even changing that won't solve the second issue, which is a genuine bug. There is a race between detect events and suspend events. The mmc_suspend_host() routine starts out by flushing the kmmcd workqueue before calling the host's suspend routine. So what happens if another detect event occurs in between? This whole area of synchronization between card insertion, card removal, suspend, and resume needs to be thought out better. Especially keeping in mind that device registration or unregistration during a system sleep is likely to block until the sleep is over. Lastly, there are some other questions about confusing aspects of the subsystem. What's the story with __mmc_claim_host()? Is it really nothing more than an abortable mutex_lock()? Why does it ever need to be aborted? Why is its second argument an atomic_t instead of a normal int? Why are mmc_detect() and related routines described in comments as Card detection callback from host? They don't handle card _detection_ -- they handle card _removal_. What's the purpose of the card-counting loop in host/omap.c:mmc_omap_switch_handler()? It looks like dead code. Alan Stern -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bug 10030] Suspend doesn't work when SD card is inserted
Hi! Index: usb-2.6/drivers/base/power/main.c === --- usb-2.6.orig/drivers/base/power/main.c +++ usb-2.6/drivers/base/power/main.c @@ -25,6 +25,7 @@ #include linux/pm.h #include linux/resume-trace.h #include linux/rwsem.h +#include linux/sched.h #include ../base.h #include power.h @@ -59,6 +60,13 @@ static DECLARE_RWSEM(pm_sleep_rwsem); int (*platform_enable_wakeup)(struct device *dev, int is_on); +static struct task_struct *suspending_task; What locking protects this variable? What happens when suspending_task exits? (Hmm, that would probably be bug, anyway?) Or are we running UP when this is accessed? This at least needs a big fat comment. +bool in_suspend_context(void) +{ + return (suspending_task == current); +} Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bug 10030] Suspend doesn't work when SD card is inserted
On Sun, 24 Feb 2008, Pavel Machek wrote: Hi! Index: usb-2.6/drivers/base/power/main.c === --- usb-2.6.orig/drivers/base/power/main.c +++ usb-2.6/drivers/base/power/main.c @@ -25,6 +25,7 @@ #include linux/pm.h #include linux/resume-trace.h #include linux/rwsem.h +#include linux/sched.h #include ../base.h #include power.h @@ -59,6 +60,13 @@ static DECLARE_RWSEM(pm_sleep_rwsem); int (*platform_enable_wakeup)(struct device *dev, int is_on); +static struct task_struct *suspending_task; What locking protects this variable? What happens when suspending_task exits? (Hmm, that would probably be bug, anyway?) It's protected by whatever existing locking scheme allows only one task to start a system sleep at a time. For example, the suspending task has to get a write lock on pm_sleep_rwsem. Yes, if the suspending task exits before the system has woken up, you're in trouble regardless. Or are we running UP when this is accessed? This at least needs a big fat comment. +bool in_suspend_context(void) +{ + return (suspending_task == current); +} We aren't necessarily UP. But since all that matters is whether or not suspend_task is equal to the current task, no extra locking is needed. I'll add a comment explaining all this. Alan Stern -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bug 10030] Suspend doesn't work when SD card is inserted
On Sun, 24 Feb 2008, Rafael J. Wysocki wrote: You know, with this new patch we probably don't need device_pm_schedule_removal() any more. No, we don't. However, because of that dpm_suspend() and device_power_down() need to be changed not to assume that the removed devices will end up on dpm_destroy. Yes. The safest approach will be to check whether the device is still at the end of the dpm_locked or dpm_off list; if it hasn't then we can assume that it has been removed (or scheduled for removal). IMO you also should add this change in device_power_down(): @@ -388,18 +343,15 @@ int device_power_down(pm_message_t state struct list_head *entry = dpm_off.prev; struct device *dev = to_device(entry); - list_del_init(dev-power.entry); error = suspend_device_late(dev, state); if (error) { printk(KERN_ERR Could not power down device %s: error %d\n, kobject_name(dev-kobj), error); - if (list_empty(dev-power.entry)) - list_add(dev-power.entry, dpm_off); break; } - if (list_empty(dev-power.entry)) - list_add(dev-power.entry, dpm_off_irq); + if (!list_empty(dev-power.entry)) + list_move(dev-power.entry, dpm_off_irq); } Actually I changed the last two lines to: + + /* Don't move the entry if the device has been unregistered */ + if (entry == dpm_off.prev) + list_move(entry, dpm_off_irq); with analogous changes to dpm_suspend(). Any more suggestions for updates? Alan Stern -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bug 10030] Suspend doesn't work when SD card is inserted
2008/2/24, Alan Stern [EMAIL PROTECTED]: On Sun, 24 Feb 2008, Rafael J. Wysocki wrote: You know, with this new patch we probably don't need device_pm_schedule_removal() any more. No, we don't. However, because of that dpm_suspend() and device_power_down() need to be changed not to assume that the removed devices will end up on dpm_destroy. with analogous changes to dpm_suspend(). Any more suggestions for updates? Is there any current patch I should actually test - it looks there is ongoing discussion from the time Rafael has proposed his last patch to check by me: http://bugzilla.kernel.org/attachment.cgi?id=14961action=view But I think it's visible the only suspend update will not resolve my problems and the problems is deeper in the mmc. So should I still test yesterdays' patch ? Zdenek Zdenek -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bug 10030] Suspend doesn't work when SD card is inserted
Hi! @@ -25,6 +25,7 @@ #include linux/pm.h #include linux/resume-trace.h #include linux/rwsem.h +#include linux/sched.h #include ../base.h #include power.h @@ -59,6 +60,13 @@ static DECLARE_RWSEM(pm_sleep_rwsem); int (*platform_enable_wakeup)(struct device *dev, int is_on); +static struct task_struct *suspending_task; What locking protects this variable? What happens when suspending_task exits? (Hmm, that would probably be bug, anyway?) It's protected by whatever existing locking scheme allows only one task to start a system sleep at a time. For example, the suspending task has to get a write lock on pm_sleep_rwsem. And readers of suspending_task are protected by? At the very least, you'd need rmb() before reading it and wmb() after writing to it, but I'm not sure if that's enough on every obscure architecture out there. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bug 10030] Suspend doesn't work when SD card is inserted
On Sunday, 24 of February 2008, Zdenek Kabelac wrote: 2008/2/24, Alan Stern [EMAIL PROTECTED]: On Sun, 24 Feb 2008, Rafael J. Wysocki wrote: You know, with this new patch we probably don't need device_pm_schedule_removal() any more. No, we don't. However, because of that dpm_suspend() and device_power_down() need to be changed not to assume that the removed devices will end up on dpm_destroy. with analogous changes to dpm_suspend(). Any more suggestions for updates? Is there any current patch I should actually test - it looks there is ongoing discussion from the time Rafael has proposed his last patch to check by me: http://bugzilla.kernel.org/attachment.cgi?id=14961action=view But I think it's visible the only suspend update will not resolve my problems and the problems is deeper in the mmc. Yes, there are problems in there, but your feedback is needed nevertheless. So should I still test yesterdays' patch ? Yes please. It will be valuable information to us whether it fixes the suspend issue at least. Thanks, Rafael -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bug 10030] Suspend doesn't work when SD card is inserted
On Sunday, 24 of February 2008, Rafael J. Wysocki wrote: On Sunday, 24 of February 2008, Alan Stern wrote: On Sun, 24 Feb 2008, Rafael J. Wysocki wrote: On Sunday, 24 of February 2008, Alan Stern wrote: On Sat, 23 Feb 2008, Rafael J. Wysocki wrote: Ultimately no, it's not. However, we are now late in the -rc2 time frame and the release of -rc3 seems to be imminent. At this point, IMO, that's the safest thing to do. BTW, appended is the patch I'd like to get applied. In the interest of having a safe release, I guess you're right. That's what I'm concerned about at the moment. I'm afraid there won't be enough time to nail down all the issues (there may be some we're not even aware of). All right. You'll need to enlarge your big reversal patch by reverting commit 4145ed6dc597a9bea5f6ae8c574653b2de10620f. Thanks for the hint. In fact, I had to add a couple of fixes in device_power_down() and dpm_suspend(). The entire patch is appended. I hope it's fine by everyone to push the patch below to Greg for 2.6.25. Thanks, Rafael --- From: Rafael J. Wysocki [EMAIL PROTECTED] Remove the code that acquires all device semaphores from the suspend code path as it causes multiple problems to appear (most notably, http://bugzilla.kernel.org/show_bug.cgi?id=10030) and revert the change introduced by commit 4145ed6dc597a9bea5f6ae8c574653b2de10620f depending on the code being removed. Remove pm_sleep_lock()/pm_sleep_unlock() from device add to avoid the issue reported at http://bugzilla.kernel.org/show_bug.cgi?id=9874. Signed-off-by: Rafael J. Wysocki [EMAIL PROTECTED] --- drivers/base/core.c |8 --- drivers/base/power/main.c | 97 +++--- drivers/usb/core/usb.c|2 3 files changed, 8 insertions(+), 99 deletions(-) Index: linux-2.6/drivers/base/power/main.c === --- linux-2.6.orig/drivers/base/power/main.c +++ linux-2.6/drivers/base/power/main.c @@ -48,7 +48,6 @@ */ LIST_HEAD(dpm_active); -static LIST_HEAD(dpm_locked); static LIST_HEAD(dpm_off); static LIST_HEAD(dpm_off_irq); static LIST_HEAD(dpm_destroy); @@ -81,28 +80,6 @@ void device_pm_add(struct device *dev) */ void device_pm_remove(struct device *dev) { - /* -* If this function is called during a suspend, it will be blocked, -* because we're holding the device's semaphore at that time, which may -* lead to a deadlock. In that case we want to print a warning. -* However, it may also be called by unregister_dropped_devices() with -* the device's semaphore released, in which case the warning should -* not be printed. -*/ - if (down_trylock(dev-sem)) { - if (down_read_trylock(pm_sleep_rwsem)) { - /* No suspend in progress, wait on dev-sem */ - down(dev-sem); - up_read(pm_sleep_rwsem); - } else { - /* Suspend in progress, we may deadlock */ - dev_warn(dev, Suspicious %s during suspend\n, - __FUNCTION__); - dump_stack(); - /* The user has been warned ... */ - down(dev-sem); - } - } pr_debug(PM: Removing info for %s:%s\n, dev-bus ? dev-bus-name : No Bus, kobject_name(dev-kobj)); @@ -110,7 +87,6 @@ void device_pm_remove(struct device *dev dpm_sysfs_remove(dev); list_del_init(dev-power.entry); mutex_unlock(dpm_list_mtx); - up(dev-sem); } /** @@ -266,7 +242,7 @@ static void dpm_resume(void) struct list_head *entry = dpm_off.next; struct device *dev = to_device(entry); - list_move_tail(entry, dpm_locked); + list_move_tail(entry, dpm_active); mutex_unlock(dpm_list_mtx); resume_device(dev); mutex_lock(dpm_list_mtx); @@ -275,25 +251,6 @@ static void dpm_resume(void) } /** - * unlock_all_devices - Release each device's semaphore - * - * Go through the dpm_off list. Put each device on the dpm_active - * list and unlock it. - */ -static void unlock_all_devices(void) -{ - mutex_lock(dpm_list_mtx); - while (!list_empty(dpm_locked)) { - struct list_head *entry = dpm_locked.prev; - struct device *dev = to_device(entry); - - list_move(entry, dpm_active); - up(dev-sem); - } - mutex_unlock(dpm_list_mtx); -} - -/** * unregister_dropped_devices - Unregister devices scheduled for removal * * Unregister all devices on the dpm_destroy list. @@ -305,7 +262,6 @@ static void unregister_dropped_devices(v struct list_head *entry = dpm_destroy.next;
Re: [Bug 10030] Suspend doesn't work when SD card is inserted
On Sun, 24 Feb 2008, Pavel Machek wrote: What locking protects this variable? What happens when suspending_task exits? (Hmm, that would probably be bug, anyway?) It's protected by whatever existing locking scheme allows only one task to start a system sleep at a time. For example, the suspending task has to get a write lock on pm_sleep_rwsem. And readers of suspending_task are protected by? I added a comment about that too. At the very least, you'd need rmb() before reading it and wmb() after writing to it, but I'm not sure if that's enough on every obscure architecture out there. No, neither one is needed because of the way suspending_task is used. It's not necessary for a reader R to see the variable's actual value; all R needs to know is whether or not suspending_task is equal to R. Since the only process which can set suspending_task to R is R itself, and since R will set suspending_task back to NULL before releasing the write lock on pm_sleep_rwsem, there's never any ambiguity. Alan Stern -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bug 10030] Suspend doesn't work when SD card is inserted
On Sun, 24 Feb 2008, Rafael J. Wysocki wrote: From: Rafael J. Wysocki [EMAIL PROTECTED] Remove the code that acquires all device semaphores from the suspend code path as it causes multiple problems to appear (most notably, http://bugzilla.kernel.org/show_bug.cgi?id=10030) and revert the change introduced by commit 4145ed6dc597a9bea5f6ae8c574653b2de10620f depending on the code being removed. Remove pm_sleep_lock()/pm_sleep_unlock() from device add to avoid the issue reported at http://bugzilla.kernel.org/show_bug.cgi?id=9874. Do you actually know whether removing that lock fixes the reported bug? I agree, the lock should be removed for now. But I'd sure like to get some feedback about what's going wrong. It's starting to look as though we'd be a lot better off blocking device registration during sleep instead of failing it. Shouldn't resume_device() and suspend_device() now acquire the device semaphore before calling the various methods? That's the way they used to work. Alan Stern -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bug 10030] Suspend doesn't work when SD card is inserted
On Sun, 24 Feb 2008, Rafael J. Wysocki wrote: From: Rafael J. Wysocki [EMAIL PROTECTED] Remove the code that acquires all device semaphores from the suspend code path as it causes multiple problems to appear (most notably, http://bugzilla.kernel.org/show_bug.cgi?id=10030) and revert the change introduced by commit 4145ed6dc597a9bea5f6ae8c574653b2de10620f depending on the code being removed. Remove pm_sleep_lock()/pm_sleep_unlock() from device_add() to avoid the issue reported at http://bugzilla.kernel.org/show_bug.cgi?id=9874. This looks okay. Are you going to submit it just for 2.6.25, leaving the existing code as-is for 2.6.26? Alan Stern -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bug 10030] Suspend doesn't work when SD card is inserted
On Sunday, 24 of February 2008, Alan Stern wrote: On Sun, 24 Feb 2008, Rafael J. Wysocki wrote: From: Rafael J. Wysocki [EMAIL PROTECTED] Remove the code that acquires all device semaphores from the suspend code path as it causes multiple problems to appear (most notably, http://bugzilla.kernel.org/show_bug.cgi?id=10030) and revert the change introduced by commit 4145ed6dc597a9bea5f6ae8c574653b2de10620f depending on the code being removed. Remove pm_sleep_lock()/pm_sleep_unlock() from device add to avoid the issue reported at http://bugzilla.kernel.org/show_bug.cgi?id=9874. Do you actually know whether removing that lock fixes the reported bug? In fact I'm waiting for the confirmation, but given the symptoms described by Lukas I'm quite confident that the problem shouldn't be reproducible with the lock removed. I agree, the lock should be removed for now. But I'd sure like to get some feedback about what's going wrong. No question about that. It's starting to look as though we'd be a lot better off blocking device registration during sleep instead of failing it. Agreed. Shouldn't resume_device() and suspend_device() now acquire the device semaphore before calling the various methods? That's the way they used to work. Yes, thanks for reminding me about that. Updated patch follows. Thanks, Rafael --- From: Rafael J. Wysocki [EMAIL PROTECTED] Remove the code that acquires all device semaphores from the suspend code path as it causes multiple problems to appear (most notably, http://bugzilla.kernel.org/show_bug.cgi?id=10030) and revert the change introduced by commit 4145ed6dc597a9bea5f6ae8c574653b2de10620f depending on the code being removed. Remove pm_sleep_lock()/pm_sleep_unlock() from device_add() to avoid the issue reported at http://bugzilla.kernel.org/show_bug.cgi?id=9874. Signed-off-by: Rafael J. Wysocki [EMAIL PROTECTED] --- drivers/base/core.c |8 --- drivers/base/power/main.c | 106 ++ drivers/usb/core/usb.c|2 3 files changed, 17 insertions(+), 99 deletions(-) Index: linux-2.6/drivers/base/power/main.c === --- linux-2.6.orig/drivers/base/power/main.c +++ linux-2.6/drivers/base/power/main.c @@ -48,7 +48,6 @@ */ LIST_HEAD(dpm_active); -static LIST_HEAD(dpm_locked); static LIST_HEAD(dpm_off); static LIST_HEAD(dpm_off_irq); static LIST_HEAD(dpm_destroy); @@ -81,28 +80,6 @@ void device_pm_add(struct device *dev) */ void device_pm_remove(struct device *dev) { - /* -* If this function is called during a suspend, it will be blocked, -* because we're holding the device's semaphore at that time, which may -* lead to a deadlock. In that case we want to print a warning. -* However, it may also be called by unregister_dropped_devices() with -* the device's semaphore released, in which case the warning should -* not be printed. -*/ - if (down_trylock(dev-sem)) { - if (down_read_trylock(pm_sleep_rwsem)) { - /* No suspend in progress, wait on dev-sem */ - down(dev-sem); - up_read(pm_sleep_rwsem); - } else { - /* Suspend in progress, we may deadlock */ - dev_warn(dev, Suspicious %s during suspend\n, - __FUNCTION__); - dump_stack(); - /* The user has been warned ... */ - down(dev-sem); - } - } pr_debug(PM: Removing info for %s:%s\n, dev-bus ? dev-bus-name : No Bus, kobject_name(dev-kobj)); @@ -110,7 +87,6 @@ void device_pm_remove(struct device *dev dpm_sysfs_remove(dev); list_del_init(dev-power.entry); mutex_unlock(dpm_list_mtx); - up(dev-sem); } /** @@ -230,6 +206,8 @@ static int resume_device(struct device * TRACE_DEVICE(dev); TRACE_RESUME(0); + down(dev-sem); + if (dev-bus dev-bus-resume) { dev_dbg(dev,resuming\n); error = dev-bus-resume(dev); @@ -245,6 +223,8 @@ static int resume_device(struct device * error = dev-class-resume(dev); } + up(dev-sem); + TRACE_RESUME(error); return error; } @@ -266,7 +246,7 @@ static void dpm_resume(void) struct list_head *entry = dpm_off.next; struct device *dev = to_device(entry); - list_move_tail(entry, dpm_locked); + list_move_tail(entry, dpm_active); mutex_unlock(dpm_list_mtx); resume_device(dev); mutex_lock(dpm_list_mtx); @@ -275,25 +255,6 @@ static void dpm_resume(void) } /** - * unlock_all_devices - Release each device's semaphore - * - * Go through the
Re: [Bug 10030] Suspend doesn't work when SD card is inserted
On Sun 2008-02-24 15:33:01, Alan Stern wrote: On Sun, 24 Feb 2008, Pavel Machek wrote: What locking protects this variable? What happens when suspending_task exits? (Hmm, that would probably be bug, anyway?) It's protected by whatever existing locking scheme allows only one task to start a system sleep at a time. For example, the suspending task has to get a write lock on pm_sleep_rwsem. And readers of suspending_task are protected by? I added a comment about that too. At the very least, you'd need rmb() before reading it and wmb() after writing to it, but I'm not sure if that's enough on every obscure architecture out there. No, neither one is needed because of the way suspending_task is used. It's not necessary for a reader R to see the variable's actual value; all R needs to know is whether or not suspending_task is equal to R. Since the only process which can set suspending_task to R is R itself, and since R will set suspending_task back to NULL before releasing the write lock on pm_sleep_rwsem, there's never any ambiguity. Subtle. Very subtly wrong ;-). imagine suspending_task == 0xabcdef01. Now task R with current == 0xabcd reads suspending_task while the other cpu is writing to it, and sees 0xabcd (0xef01 was not yet written) -- and mistakenly believes that R == suspending_task. I agree it is very unlikely, and it will not happen on i386. But what about just using atomic_t suspending_task, and store current-pid into it? Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bug 10030] Suspend doesn't work when SD card is inserted
On Sunday, 24 of February 2008, Pavel Machek wrote: On Sun 2008-02-24 15:33:01, Alan Stern wrote: On Sun, 24 Feb 2008, Pavel Machek wrote: What locking protects this variable? What happens when suspending_task exits? (Hmm, that would probably be bug, anyway?) It's protected by whatever existing locking scheme allows only one task to start a system sleep at a time. For example, the suspending task has to get a write lock on pm_sleep_rwsem. And readers of suspending_task are protected by? I added a comment about that too. At the very least, you'd need rmb() before reading it and wmb() after writing to it, but I'm not sure if that's enough on every obscure architecture out there. No, neither one is needed because of the way suspending_task is used. It's not necessary for a reader R to see the variable's actual value; all R needs to know is whether or not suspending_task is equal to R. Since the only process which can set suspending_task to R is R itself, and since R will set suspending_task back to NULL before releasing the write lock on pm_sleep_rwsem, there's never any ambiguity. Subtle. Very subtly wrong ;-). imagine suspending_task == 0xabcdef01. Now task R with current == 0xabcd reads suspending_task while the other cpu is writing to it, and sees 0xabcd (0xef01 was not yet written) -- and mistakenly believes that R == suspending_task. I agree it is very unlikely, and it will not happen on i386. But what about just using atomic_t suspending_task, and store current-pid into it? I'd rather use a lock, frankly. For example, we can require the readers to take pm_sleep_rwsem for reading in order to access that. Thanks, Rafael -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bug 10030] Suspend doesn't work when SD card is inserted
On Sunday, 24 of February 2008, Alan Stern wrote: On Sun, 24 Feb 2008, Rafael J. Wysocki wrote: From: Rafael J. Wysocki [EMAIL PROTECTED] Remove the code that acquires all device semaphores from the suspend code path as it causes multiple problems to appear (most notably, http://bugzilla.kernel.org/show_bug.cgi?id=10030) and revert the change introduced by commit 4145ed6dc597a9bea5f6ae8c574653b2de10620f depending on the code being removed. Remove pm_sleep_lock()/pm_sleep_unlock() from device_add() to avoid the issue reported at http://bugzilla.kernel.org/show_bug.cgi?id=9874. This looks okay. Are you going to submit it just for 2.6.25, leaving the existing code as-is for 2.6.26? No, I'd like to start over on top of 2.6.25. That will be much cleaner from the upstream point of view. We seem to have all the pieces anyway, so that's only a matter of putting them back together. Thanks, Rafael -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bug 10030] Suspend doesn't work when SD card is inserted
On Sun, 24 Feb 2008, Rafael J. Wysocki wrote: Very subtly wrong ;-). imagine suspending_task == 0xabcdef01. Now task R with current == 0xabcd reads suspending_task while the other cpu is writing to it, and sees 0xabcd (0xef01 was not yet written) -- and mistakenly believes that R == suspending_task. I agree it is very unlikely, and it will not happen on i386. But what about just using atomic_t suspending_task, and store current-pid into it? I'd rather use a lock, frankly. For example, we can require the readers to take pm_sleep_rwsem for reading in order to access that. That certainly won't work. Imagine what would happen when the reader _was_ the suspending task. But if you really twist my arm, I'll go along with Pavel's suggestion. Alan Stern -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bug 10030] Suspend doesn't work when SD card is inserted
On Sun, 24 Feb 2008, Pavel Machek wrote: At the very least, you'd need rmb() before reading it and wmb() after writing to it, but I'm not sure if that's enough on every obscure architecture out there. No, neither one is needed because of the way suspending_task is used. It's not necessary for a reader R to see the variable's actual value; all R needs to know is whether or not suspending_task is equal to R. Since the only process which can set suspending_task to R is R itself, and since R will set suspending_task back to NULL before releasing the write lock on pm_sleep_rwsem, there's never any ambiguity. Subtle. Very subtly wrong ;-). imagine suspending_task == 0xabcdef01. Now task R with current == 0xabcd reads suspending_task while the other cpu is writing to it, and sees 0xabcd (0xef01 was not yet written) -- and mistakenly believes that R == suspending_task. I always thought that reads and writes of pointers are atomic, just like reads and writes of longs. Is that wrong? Now if pointers were the same size as long long then I would agree with you. I agree it is very unlikely, and it will not happen on i386. But what about just using atomic_t suspending_task, and store current-pid into it? That would work just as well. Except that it wouldn't need to be atomic_t, because current-pid is always an integer (not guaranteed, I suppose, but that's what it is on all current architectures) and reads/writes of ints _are_ atomic. Alan Stern -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [linux-pm] [Bug 10030] Suspend doesn't work when SD card is inserted
On Sat, 23 Feb 2008, Alan Stern wrote: > It happened in a workqueue. There could be lots of similar cases: Some > interrupt-driven event causes a hotplug action. Since the action can't > be carried out in interrupt context, the driver has no choice but to > defer it to a workqueue or kernel thread. Blocking that workqueue or > kernel thread won't cause a problem _unless_ the driver's > suspend/resume methods try to synchronize with it. That's the > difficult case. In fact this turns out to be exactly the problem with the MMC subsystem. It uses a dedicated workqueue (kmmcd) to handle state changes, and mmc_suspend_host() does a flush_workqueue(). If the workqueue contains an entry to register or unregister a device, this is guaranteed to deadlock -- and that's exactly what happens when Zdenek inserts or removes a card during the disk-drive spindown time of a system sleep. It looks like the best solution is for mmc_suspend_host() not to flush the workqueue; instead the workqueue should prevent itself from running during a suspend. Right now the easiest way to accomplish this is for the workqueue routines (mmc_detect() and siblings) to acquire the mmc_host's device semaphore. If in the future the MMC subsystem adds dynamic PM support then a more complicated arrangement will be needed. So the patch below, in combination with the previous patch, might just fix the whole problem. Alan Stern Index: usb-2.6/drivers/mmc/core/core.c === --- usb-2.6.orig/drivers/mmc/core/core.c +++ usb-2.6/drivers/mmc/core/core.c @@ -731,8 +731,6 @@ void mmc_stop_host(struct mmc_host *host */ int mmc_suspend_host(struct mmc_host *host, pm_message_t state) { - mmc_flush_scheduled_work(); - mmc_bus_get(host); if (host->bus_ops && !host->bus_dead) { if (host->bus_ops->suspend) Index: usb-2.6/drivers/mmc/core/mmc.c === --- usb-2.6.orig/drivers/mmc/core/mmc.c +++ usb-2.6/drivers/mmc/core/mmc.c @@ -441,6 +441,7 @@ static void mmc_detect(struct mmc_host * BUG_ON(!host); BUG_ON(!host->card); + down(_classdev(host)->sem); mmc_claim_host(host); /* @@ -449,6 +450,7 @@ static void mmc_detect(struct mmc_host * err = mmc_send_status(host->card, NULL); mmc_release_host(host); + up(_classdev(host)->sem); if (err) { mmc_remove(host); Index: usb-2.6/drivers/mmc/core/sd.c === --- usb-2.6.orig/drivers/mmc/core/sd.c +++ usb-2.6/drivers/mmc/core/sd.c @@ -500,6 +500,7 @@ static void mmc_sd_detect(struct mmc_hos BUG_ON(!host); BUG_ON(!host->card); + down(_classdev(host)->sem); mmc_claim_host(host); /* @@ -508,6 +509,7 @@ static void mmc_sd_detect(struct mmc_hos err = mmc_send_status(host->card, NULL); mmc_release_host(host); + up(_classdev(host)->sem); if (err) { mmc_sd_remove(host); Index: usb-2.6/drivers/mmc/core/sdio.c === --- usb-2.6.orig/drivers/mmc/core/sdio.c +++ usb-2.6/drivers/mmc/core/sdio.c @@ -195,6 +195,7 @@ static void mmc_sdio_detect(struct mmc_h BUG_ON(!host); BUG_ON(!host->card); + down(_classdev(host)->sem); mmc_claim_host(host); /* @@ -203,6 +204,7 @@ static void mmc_sdio_detect(struct mmc_h err = mmc_select_card(host->card); mmc_release_host(host); + up(_classdev(host)->sem); if (err) { mmc_sdio_remove(host); -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bug 10030] Suspend doesn't work when SD card is inserted
On Sun, 24 Feb 2008, Rafael J. Wysocki wrote: > On Sunday, 24 of February 2008, Alan Stern wrote: > > On Sat, 23 Feb 2008, Rafael J. Wysocki wrote: > > > > > Ultimately no, it's not. However, we are now late in the -rc2 time frame > > > and > > > the release of -rc3 seems to be imminent. At this point, IMO, that's the > > > safest thing to do. BTW, appended is the patch I'd like to get applied. > > > > In the interest of having a safe release, I guess you're right. > > That's what I'm concerned about at the moment. I'm afraid there won't be > enough time to nail down all the issues (there may be some we're not even > aware of). All right. You'll need to enlarge your big reversal patch by reverting commit 4145ed6dc597a9bea5f6ae8c574653b2de10620f. > > Below is my suggested approach, based on yours. It might even solve > > the MMC problem, who knows? And it adds some potentially useful > > debugging, although it would be better to dump just the stack of > > suspending_task. Do you know how to do that from within a timer > > routine? > > No, I don't. On further checking the answer turns out to be sched_show_task(). That's the routine which gets called for each process when you type Alt-SysRq-t. > > I still have no clear idea about what's going on with the ACPI bug in > > #9874. > > It seems that the ACPI code is not prepared to cope with a failing device > registration, in which case it'd need fixing. Frankly, I'm afraid there may > be more issues like this throughout the tree. The fact remains that it is unsafe to register a device during a sleep transition, although if the device's driver doesn't have a suspend or resume method you can probably get away with it. > > However perhaps the bug wouldn't occur if we blocked device > > registration until after the resume was finished, instead of failing it > > outright. Since the registration in this case was done in a workqueue > > thread, blocking wouldn't cause an immediate deadlock. > > No, it wouldn't, but the fact that it happens in the ACPI code makes me worry. It happened in a workqueue. There could be lots of similar cases: Some interrupt-driven event causes a hotplug action. Since the action can't be carried out in interrupt context, the driver has no choice but to defer it to a workqueue or kernel thread. Blocking that workqueue or kernel thread won't cause a problem _unless_ the driver's suspend/resume methods try to synchronize with it. That's the difficult case. > If we block that code and the things it's supposed to do turn out to be > necessary for suspending later on, we'll be in trouble. Exactly. > > @@ -94,14 +102,15 @@ void device_pm_remove(struct device *dev > > /* No suspend in progress, wait on dev->sem */ > > down(>sem); > > up_read(_sleep_rwsem); > > - } else { > > - /* Suspend in progress, we may deadlock */ > > + } else if (!in_suspend_context()) { > > + /* Suspending in another task, we may deadlock */ > > Well, that shouldn't really deadlock. If it does, there is a potential design > issue somewhere. I think it might be better to set up a timer in here too. At this point the driver core owns the device semaphore, so the unregistration task will block until the sleep is over. If the driver's resume method has to synchronize with the unregistration task then it will deadlock. I agree, it would be a design issue. In fact, it's the same design issue described earlier in this email. > Although IMO it would be even better to just set up a timer and remove the > warning altogehter. That warning was just for debugging purposes. A timer in the resume path could do the same thing with fewer false alarms, just like the timer in the suspend path. I have added it to the new patch. > > dev_warn(dev, "Suspicious %s during suspend\n", > > __FUNCTION__); > > dump_stack(); > > /* The user has been warned ... */ > > down(>sem); > > } > > + /* Otherwise we're okay */ > > } > > pr_debug("PM: Removing info for %s:%s\n", > > dev->bus ? dev->bus->name : "No Bus", > > There's a problem here, because we shouldn't release the semaphore if we're > in the suspend context. Yes, you're right. It's fixed in the new version. > > +static void suspend_timeout(unsigned long _dev) > > +{ > > + struct device *dev = (struct device *) _dev; > > + > > + dev_err(dev, "deadlock during suspend!\n"); > > + show_state(); > > The show_state() seems to be overkill and won't really help if the user can't > collect the output on the fly using a serial console or something like this. > [The debug stuff printed here should fit a typical laptop screen, ideally.] Changed. > I'd prefer > > if (in_suspend_context()) { >
Re: [Bug 10030] Suspend doesn't work when SD card is inserted
On Sunday, 24 of February 2008, Alan Stern wrote: > On Sat, 23 Feb 2008, Rafael J. Wysocki wrote: > > > Ultimately no, it's not. However, we are now late in the -rc2 time frame > > and > > the release of -rc3 seems to be imminent. At this point, IMO, that's the > > safest thing to do. BTW, appended is the patch I'd like to get applied. > > In the interest of having a safe release, I guess you're right. That's what I'm concerned about at the moment. I'm afraid there won't be enough time to nail down all the issues (there may be some we're not even aware of). > It's unfortunate, though -- there's no good way to get thorough testing > without putting the code in Linus's tree. Absolutely. Still, the code in question introduces unexpected behavior that we don't really understand and it's not safe to leave it in the mainline. > > > How would we then learn about drivers trying to register or unregister a > > > device during a sleep transition? > > > > I think we should fix the ones we know about and try to reintroduce this > > change in the 2.6.26 time frame. It also seems reasonable to reconsider > > what > > we want to achieve, as there may be a better way to do that. > > Below is my suggested approach, based on yours. It might even solve > the MMC problem, who knows? And it adds some potentially useful > debugging, although it would be better to dump just the stack of > suspending_task. Do you know how to do that from within a timer > routine? No, I don't. > I still have no clear idea about what's going on with the ACPI bug in > #9874. It seems that the ACPI code is not prepared to cope with a failing device registration, in which case it'd need fixing. Frankly, I'm afraid there may be more issues like this throughout the tree. > However perhaps the bug wouldn't occur if we blocked device > registration until after the resume was finished, instead of failing it > outright. Since the registration in this case was done in a workqueue > thread, blocking wouldn't cause an immediate deadlock. No, it wouldn't, but the fact that it happens in the ACPI code makes me worry. If we block that code and the things it's supposed to do turn out to be necessary for suspending later on, we'll be in trouble. > Index: usb-2.6/drivers/base/power/main.c > === > --- usb-2.6.orig/drivers/base/power/main.c > +++ usb-2.6/drivers/base/power/main.c > @@ -25,6 +25,7 @@ > #include > #include > #include > +#include > > #include "../base.h" > #include "power.h" > @@ -59,6 +60,13 @@ static DECLARE_RWSEM(pm_sleep_rwsem); > > int (*platform_enable_wakeup)(struct device *dev, int is_on); > > +static struct task_struct *suspending_task; > + > +bool in_suspend_context(void) > +{ > + return (suspending_task == current); > +} > + > /** > * device_pm_add - add a device to the list of active devices > * @dev: Device to be added to the list > @@ -94,14 +102,15 @@ void device_pm_remove(struct device *dev > /* No suspend in progress, wait on dev->sem */ > down(>sem); > up_read(_sleep_rwsem); > - } else { > - /* Suspend in progress, we may deadlock */ > + } else if (!in_suspend_context()) { > + /* Suspending in another task, we may deadlock */ Well, that shouldn't really deadlock. If it does, there is a potential design issue somewhere. I think it might be better to set up a timer in here too. Although IMO it would be even better to just set up a timer and remove the warning altogehter. > dev_warn(dev, "Suspicious %s during suspend\n", > __FUNCTION__); > dump_stack(); > /* The user has been warned ... */ > down(>sem); > } > + /* Otherwise we're okay */ > } > pr_debug("PM: Removing info for %s:%s\n", >dev->bus ? dev->bus->name : "No Bus", There's a problem here, because we shouldn't release the semaphore if we're in the suspend context. > @@ -272,6 +281,7 @@ static void dpm_resume(void) > mutex_lock(_list_mtx); > } > mutex_unlock(_list_mtx); > + suspending_task = NULL; > } > > /** > @@ -410,6 +420,17 @@ int device_power_down(pm_message_t state > } > EXPORT_SYMBOL_GPL(device_power_down); > > +/* Provide debugging info if we hang or deadlock during suspend */ > +static struct timer_list suspend_timer; > + > +static void suspend_timeout(unsigned long _dev) > +{ > + struct device *dev = (struct device *) _dev; > + > + dev_err(dev, "deadlock during suspend!\n"); > + show_state(); The show_state() seems to be overkill and won't really help if the user can't collect the output on the fly using a serial console or something like this. [The debug stuff printed here should
Re: [Bug 10030] Suspend doesn't work when SD card is inserted
On Sat, 23 Feb 2008, Rafael J. Wysocki wrote: > Ultimately no, it's not. However, we are now late in the -rc2 time frame and > the release of -rc3 seems to be imminent. At this point, IMO, that's the > safest thing to do. BTW, appended is the patch I'd like to get applied. In the interest of having a safe release, I guess you're right. It's unfortunate, though -- there's no good way to get thorough testing without putting the code in Linus's tree. > > How would we then learn about drivers trying to register or unregister a > > device during a sleep transition? > > I think we should fix the ones we know about and try to reintroduce this > change in the 2.6.26 time frame. It also seems reasonable to reconsider what > we want to achieve, as there may be a better way to do that. Below is my suggested approach, based on yours. It might even solve the MMC problem, who knows? And it adds some potentially useful debugging, although it would be better to dump just the stack of suspending_task. Do you know how to do that from within a timer routine? I still have no clear idea about what's going on with the ACPI bug in #9874. However perhaps the bug wouldn't occur if we blocked device registration until after the resume was finished, instead of failing it outright. Since the registration in this case was done in a workqueue thread, blocking wouldn't cause an immediate deadlock. Alan Stern Index: usb-2.6/drivers/base/power/main.c === --- usb-2.6.orig/drivers/base/power/main.c +++ usb-2.6/drivers/base/power/main.c @@ -25,6 +25,7 @@ #include #include #include +#include #include "../base.h" #include "power.h" @@ -59,6 +60,13 @@ static DECLARE_RWSEM(pm_sleep_rwsem); int (*platform_enable_wakeup)(struct device *dev, int is_on); +static struct task_struct *suspending_task; + +bool in_suspend_context(void) +{ + return (suspending_task == current); +} + /** * device_pm_add - add a device to the list of active devices * @dev: Device to be added to the list @@ -94,14 +102,15 @@ void device_pm_remove(struct device *dev /* No suspend in progress, wait on dev->sem */ down(>sem); up_read(_sleep_rwsem); - } else { - /* Suspend in progress, we may deadlock */ + } else if (!in_suspend_context()) { + /* Suspending in another task, we may deadlock */ dev_warn(dev, "Suspicious %s during suspend\n", __FUNCTION__); dump_stack(); /* The user has been warned ... */ down(>sem); } + /* Otherwise we're okay */ } pr_debug("PM: Removing info for %s:%s\n", dev->bus ? dev->bus->name : "No Bus", @@ -272,6 +281,7 @@ static void dpm_resume(void) mutex_lock(_list_mtx); } mutex_unlock(_list_mtx); + suspending_task = NULL; } /** @@ -410,6 +420,17 @@ int device_power_down(pm_message_t state } EXPORT_SYMBOL_GPL(device_power_down); +/* Provide debugging info if we hang or deadlock during suspend */ +static struct timer_list suspend_timer; + +static void suspend_timeout(unsigned long _dev) +{ + struct device *dev = (struct device *) _dev; + + dev_err(dev, "deadlock during suspend!\n"); + show_state(); +} + /** * suspend_device - Save state of one device. * @dev: Device. @@ -419,6 +440,10 @@ int suspend_device(struct device *dev, p { int error = 0; + /* Provide debugging output in case of a deadlock */ + setup_timer(_timer, suspend_timeout, (unsigned long) dev); + mod_timer(_timer, jiffies + 5*HZ); + if (dev->power.power_state.event) { dev_dbg(dev, "PM: suspend %d-->%d\n", dev->power.power_state.event, state.event); @@ -441,6 +466,8 @@ int suspend_device(struct device *dev, p error = dev->bus->suspend(dev, state); suspend_report_result(dev->bus->suspend, error); } + + del_timer_sync(_timer); return error; } @@ -460,6 +487,7 @@ static int dpm_suspend(pm_message_t stat { int error = 0; + suspending_task = current; mutex_lock(_list_mtx); while (!list_empty(_locked)) { struct list_head *entry = dpm_locked.prev; Index: usb-2.6/drivers/base/power/power.h === --- usb-2.6.orig/drivers/base/power/power.h +++ usb-2.6/drivers/base/power/power.h @@ -11,6 +11,7 @@ static inline struct device *to_device(s return container_of(entry, struct device, power.entry); } +extern bool in_suspend_context(void); extern void device_pm_add(struct device *); extern void device_pm_remove(struct
Re: [Bug 10030] Suspend doesn't work when SD card is inserted
On Saturday, 23 of February 2008, Alan Stern wrote: > On Sat, 23 Feb 2008, Rafael J. Wysocki wrote: > > > Unfortunately, I missed your Bugzilla comment at > > http://bugzilla.kernel.org/show_bug.cgi?id=10030#c28 > > Strange... But obviously you did see it eventually. > > > Well, in the face of it, I'm considering to remove the code that > > acquires device semaphores from the suspend core for now. Evidently, this > > change turns out to be painfully premature. > > I wonder if that's really the best thing. Ultimately no, it's not. However, we are now late in the -rc2 time frame and the release of -rc3 seems to be imminent. At this point, IMO, that's the safest thing to do. BTW, appended is the patch I'd like to get applied. > How would we then learn about drivers trying to register or unregister a > device during a sleep transition? I think we should fix the ones we know about and try to reintroduce this change in the 2.6.26 time frame. It also seems reasonable to reconsider what we want to achieve, as there may be a better way to do that. > Do you think it might be possible instead to somehow allow these > unregistrations to go through, while still failing or blocking > registrations? Yes, that should be possible, but as I said above, I think it's not the right time for doing that right now. > It shouldn't be too hard to modify the driver core so that it calls the > driver's remove() method without trying to acquire dev->sem if your > in_suspend_context() test succeeds. I agree that the in_suspend_context() test may be useful and this is one of the reasons why I think the entire approach requires reconsideration. > I have to admit, I still don't understand what's going on with the MMC > driver. Me neither, and that alone is a good enough reason for resigning from acquiring device semaphores in the suspend core, at least in the mainline kernel, until we understand the problem. > Why is there a workqueue involved? If the workqueue fails to > unregister the device, why should it bother the suspend routine? After > all, if the suspend routine can afford to wait for the workqueue to > finish then it could just as well afford to do the unregistration > itself. Yes, that's strange. > > Also, we have apparent problems with pm_sleep_lock() > > being take in device_add() (see > > http://bugzilla.kernel.org/show_bug.cgi?id=9874). > > We'll have to get more information from the bug reporter to figure out > what really happened there. Yes. > Ultimately it may turn out some drivers just aren't very careful about > when they try to register new devices. That's almost certain to me. > Doing the registration by way of a workqueue can be problematic if the > workqueue happens to run during a system sleep transition. That will still > be true if you revert the acquire-all-semaphores patch. Yes, but our taking of device semaphores exposes these problems in a rather drastic fashion. :-) Thanks, Rafael --- drivers/base/power/main.c | 84 ++ 1 file changed, 4 insertions(+), 80 deletions(-) Index: linux-2.6/drivers/base/power/main.c === --- linux-2.6.orig/drivers/base/power/main.c +++ linux-2.6/drivers/base/power/main.c @@ -48,7 +48,6 @@ */ LIST_HEAD(dpm_active); -static LIST_HEAD(dpm_locked); static LIST_HEAD(dpm_off); static LIST_HEAD(dpm_off_irq); static LIST_HEAD(dpm_destroy); @@ -81,28 +80,6 @@ void device_pm_add(struct device *dev) */ void device_pm_remove(struct device *dev) { - /* -* If this function is called during a suspend, it will be blocked, -* because we're holding the device's semaphore at that time, which may -* lead to a deadlock. In that case we want to print a warning. -* However, it may also be called by unregister_dropped_devices() with -* the device's semaphore released, in which case the warning should -* not be printed. -*/ - if (down_trylock(>sem)) { - if (down_read_trylock(_sleep_rwsem)) { - /* No suspend in progress, wait on dev->sem */ - down(>sem); - up_read(_sleep_rwsem); - } else { - /* Suspend in progress, we may deadlock */ - dev_warn(dev, "Suspicious %s during suspend\n", - __FUNCTION__); - dump_stack(); - /* The user has been warned ... */ - down(>sem); - } - } pr_debug("PM: Removing info for %s:%s\n", dev->bus ? dev->bus->name : "No Bus", kobject_name(>kobj)); @@ -110,7 +87,6 @@ void device_pm_remove(struct device *dev dpm_sysfs_remove(dev); list_del_init(>power.entry); mutex_unlock(_list_mtx); - up(>sem); } /** @@ -266,7 +242,7 @@ static
Re: [Bug 10030] Suspend doesn't work when SD card is inserted
On Saturday, 23 of February 2008, Alan Stern wrote: On Sat, 23 Feb 2008, Rafael J. Wysocki wrote: Unfortunately, I missed your Bugzilla comment at http://bugzilla.kernel.org/show_bug.cgi?id=10030#c28 Strange... But obviously you did see it eventually. Well, in the face of it, I'm considering to remove the code that acquires device semaphores from the suspend core for now. Evidently, this change turns out to be painfully premature. I wonder if that's really the best thing. Ultimately no, it's not. However, we are now late in the -rc2 time frame and the release of -rc3 seems to be imminent. At this point, IMO, that's the safest thing to do. BTW, appended is the patch I'd like to get applied. How would we then learn about drivers trying to register or unregister a device during a sleep transition? I think we should fix the ones we know about and try to reintroduce this change in the 2.6.26 time frame. It also seems reasonable to reconsider what we want to achieve, as there may be a better way to do that. Do you think it might be possible instead to somehow allow these unregistrations to go through, while still failing or blocking registrations? Yes, that should be possible, but as I said above, I think it's not the right time for doing that right now. It shouldn't be too hard to modify the driver core so that it calls the driver's remove() method without trying to acquire dev-sem if your in_suspend_context() test succeeds. I agree that the in_suspend_context() test may be useful and this is one of the reasons why I think the entire approach requires reconsideration. I have to admit, I still don't understand what's going on with the MMC driver. Me neither, and that alone is a good enough reason for resigning from acquiring device semaphores in the suspend core, at least in the mainline kernel, until we understand the problem. Why is there a workqueue involved? If the workqueue fails to unregister the device, why should it bother the suspend routine? After all, if the suspend routine can afford to wait for the workqueue to finish then it could just as well afford to do the unregistration itself. Yes, that's strange. Also, we have apparent problems with pm_sleep_lock() being take in device_add() (see http://bugzilla.kernel.org/show_bug.cgi?id=9874). We'll have to get more information from the bug reporter to figure out what really happened there. Yes. Ultimately it may turn out some drivers just aren't very careful about when they try to register new devices. That's almost certain to me. Doing the registration by way of a workqueue can be problematic if the workqueue happens to run during a system sleep transition. That will still be true if you revert the acquire-all-semaphores patch. Yes, but our taking of device semaphores exposes these problems in a rather drastic fashion. :-) Thanks, Rafael --- drivers/base/power/main.c | 84 ++ 1 file changed, 4 insertions(+), 80 deletions(-) Index: linux-2.6/drivers/base/power/main.c === --- linux-2.6.orig/drivers/base/power/main.c +++ linux-2.6/drivers/base/power/main.c @@ -48,7 +48,6 @@ */ LIST_HEAD(dpm_active); -static LIST_HEAD(dpm_locked); static LIST_HEAD(dpm_off); static LIST_HEAD(dpm_off_irq); static LIST_HEAD(dpm_destroy); @@ -81,28 +80,6 @@ void device_pm_add(struct device *dev) */ void device_pm_remove(struct device *dev) { - /* -* If this function is called during a suspend, it will be blocked, -* because we're holding the device's semaphore at that time, which may -* lead to a deadlock. In that case we want to print a warning. -* However, it may also be called by unregister_dropped_devices() with -* the device's semaphore released, in which case the warning should -* not be printed. -*/ - if (down_trylock(dev-sem)) { - if (down_read_trylock(pm_sleep_rwsem)) { - /* No suspend in progress, wait on dev-sem */ - down(dev-sem); - up_read(pm_sleep_rwsem); - } else { - /* Suspend in progress, we may deadlock */ - dev_warn(dev, Suspicious %s during suspend\n, - __FUNCTION__); - dump_stack(); - /* The user has been warned ... */ - down(dev-sem); - } - } pr_debug(PM: Removing info for %s:%s\n, dev-bus ? dev-bus-name : No Bus, kobject_name(dev-kobj)); @@ -110,7 +87,6 @@ void device_pm_remove(struct device *dev dpm_sysfs_remove(dev); list_del_init(dev-power.entry); mutex_unlock(dpm_list_mtx); - up(dev-sem); } /** @@ -266,7 +242,7 @@ static void dpm_resume(void)
Re: [Bug 10030] Suspend doesn't work when SD card is inserted
On Sat, 23 Feb 2008, Rafael J. Wysocki wrote: Ultimately no, it's not. However, we are now late in the -rc2 time frame and the release of -rc3 seems to be imminent. At this point, IMO, that's the safest thing to do. BTW, appended is the patch I'd like to get applied. In the interest of having a safe release, I guess you're right. It's unfortunate, though -- there's no good way to get thorough testing without putting the code in Linus's tree. How would we then learn about drivers trying to register or unregister a device during a sleep transition? I think we should fix the ones we know about and try to reintroduce this change in the 2.6.26 time frame. It also seems reasonable to reconsider what we want to achieve, as there may be a better way to do that. Below is my suggested approach, based on yours. It might even solve the MMC problem, who knows? And it adds some potentially useful debugging, although it would be better to dump just the stack of suspending_task. Do you know how to do that from within a timer routine? I still have no clear idea about what's going on with the ACPI bug in #9874. However perhaps the bug wouldn't occur if we blocked device registration until after the resume was finished, instead of failing it outright. Since the registration in this case was done in a workqueue thread, blocking wouldn't cause an immediate deadlock. Alan Stern Index: usb-2.6/drivers/base/power/main.c === --- usb-2.6.orig/drivers/base/power/main.c +++ usb-2.6/drivers/base/power/main.c @@ -25,6 +25,7 @@ #include linux/pm.h #include linux/resume-trace.h #include linux/rwsem.h +#include linux/sched.h #include ../base.h #include power.h @@ -59,6 +60,13 @@ static DECLARE_RWSEM(pm_sleep_rwsem); int (*platform_enable_wakeup)(struct device *dev, int is_on); +static struct task_struct *suspending_task; + +bool in_suspend_context(void) +{ + return (suspending_task == current); +} + /** * device_pm_add - add a device to the list of active devices * @dev: Device to be added to the list @@ -94,14 +102,15 @@ void device_pm_remove(struct device *dev /* No suspend in progress, wait on dev-sem */ down(dev-sem); up_read(pm_sleep_rwsem); - } else { - /* Suspend in progress, we may deadlock */ + } else if (!in_suspend_context()) { + /* Suspending in another task, we may deadlock */ dev_warn(dev, Suspicious %s during suspend\n, __FUNCTION__); dump_stack(); /* The user has been warned ... */ down(dev-sem); } + /* Otherwise we're okay */ } pr_debug(PM: Removing info for %s:%s\n, dev-bus ? dev-bus-name : No Bus, @@ -272,6 +281,7 @@ static void dpm_resume(void) mutex_lock(dpm_list_mtx); } mutex_unlock(dpm_list_mtx); + suspending_task = NULL; } /** @@ -410,6 +420,17 @@ int device_power_down(pm_message_t state } EXPORT_SYMBOL_GPL(device_power_down); +/* Provide debugging info if we hang or deadlock during suspend */ +static struct timer_list suspend_timer; + +static void suspend_timeout(unsigned long _dev) +{ + struct device *dev = (struct device *) _dev; + + dev_err(dev, deadlock during suspend!\n); + show_state(); +} + /** * suspend_device - Save state of one device. * @dev: Device. @@ -419,6 +440,10 @@ int suspend_device(struct device *dev, p { int error = 0; + /* Provide debugging output in case of a deadlock */ + setup_timer(suspend_timer, suspend_timeout, (unsigned long) dev); + mod_timer(suspend_timer, jiffies + 5*HZ); + if (dev-power.power_state.event) { dev_dbg(dev, PM: suspend %d--%d\n, dev-power.power_state.event, state.event); @@ -441,6 +466,8 @@ int suspend_device(struct device *dev, p error = dev-bus-suspend(dev, state); suspend_report_result(dev-bus-suspend, error); } + + del_timer_sync(suspend_timer); return error; } @@ -460,6 +487,7 @@ static int dpm_suspend(pm_message_t stat { int error = 0; + suspending_task = current; mutex_lock(dpm_list_mtx); while (!list_empty(dpm_locked)) { struct list_head *entry = dpm_locked.prev; Index: usb-2.6/drivers/base/power/power.h === --- usb-2.6.orig/drivers/base/power/power.h +++ usb-2.6/drivers/base/power/power.h @@ -11,6 +11,7 @@ static inline struct device *to_device(s return container_of(entry, struct device, power.entry); } +extern bool in_suspend_context(void); extern void
Re: [Bug 10030] Suspend doesn't work when SD card is inserted
On Sunday, 24 of February 2008, Alan Stern wrote: On Sat, 23 Feb 2008, Rafael J. Wysocki wrote: Ultimately no, it's not. However, we are now late in the -rc2 time frame and the release of -rc3 seems to be imminent. At this point, IMO, that's the safest thing to do. BTW, appended is the patch I'd like to get applied. In the interest of having a safe release, I guess you're right. That's what I'm concerned about at the moment. I'm afraid there won't be enough time to nail down all the issues (there may be some we're not even aware of). It's unfortunate, though -- there's no good way to get thorough testing without putting the code in Linus's tree. Absolutely. Still, the code in question introduces unexpected behavior that we don't really understand and it's not safe to leave it in the mainline. How would we then learn about drivers trying to register or unregister a device during a sleep transition? I think we should fix the ones we know about and try to reintroduce this change in the 2.6.26 time frame. It also seems reasonable to reconsider what we want to achieve, as there may be a better way to do that. Below is my suggested approach, based on yours. It might even solve the MMC problem, who knows? And it adds some potentially useful debugging, although it would be better to dump just the stack of suspending_task. Do you know how to do that from within a timer routine? No, I don't. I still have no clear idea about what's going on with the ACPI bug in #9874. It seems that the ACPI code is not prepared to cope with a failing device registration, in which case it'd need fixing. Frankly, I'm afraid there may be more issues like this throughout the tree. However perhaps the bug wouldn't occur if we blocked device registration until after the resume was finished, instead of failing it outright. Since the registration in this case was done in a workqueue thread, blocking wouldn't cause an immediate deadlock. No, it wouldn't, but the fact that it happens in the ACPI code makes me worry. If we block that code and the things it's supposed to do turn out to be necessary for suspending later on, we'll be in trouble. Index: usb-2.6/drivers/base/power/main.c === --- usb-2.6.orig/drivers/base/power/main.c +++ usb-2.6/drivers/base/power/main.c @@ -25,6 +25,7 @@ #include linux/pm.h #include linux/resume-trace.h #include linux/rwsem.h +#include linux/sched.h #include ../base.h #include power.h @@ -59,6 +60,13 @@ static DECLARE_RWSEM(pm_sleep_rwsem); int (*platform_enable_wakeup)(struct device *dev, int is_on); +static struct task_struct *suspending_task; + +bool in_suspend_context(void) +{ + return (suspending_task == current); +} + /** * device_pm_add - add a device to the list of active devices * @dev: Device to be added to the list @@ -94,14 +102,15 @@ void device_pm_remove(struct device *dev /* No suspend in progress, wait on dev-sem */ down(dev-sem); up_read(pm_sleep_rwsem); - } else { - /* Suspend in progress, we may deadlock */ + } else if (!in_suspend_context()) { + /* Suspending in another task, we may deadlock */ Well, that shouldn't really deadlock. If it does, there is a potential design issue somewhere. I think it might be better to set up a timer in here too. Although IMO it would be even better to just set up a timer and remove the warning altogehter. dev_warn(dev, Suspicious %s during suspend\n, __FUNCTION__); dump_stack(); /* The user has been warned ... */ down(dev-sem); } + /* Otherwise we're okay */ } pr_debug(PM: Removing info for %s:%s\n, dev-bus ? dev-bus-name : No Bus, There's a problem here, because we shouldn't release the semaphore if we're in the suspend context. @@ -272,6 +281,7 @@ static void dpm_resume(void) mutex_lock(dpm_list_mtx); } mutex_unlock(dpm_list_mtx); + suspending_task = NULL; } /** @@ -410,6 +420,17 @@ int device_power_down(pm_message_t state } EXPORT_SYMBOL_GPL(device_power_down); +/* Provide debugging info if we hang or deadlock during suspend */ +static struct timer_list suspend_timer; + +static void suspend_timeout(unsigned long _dev) +{ + struct device *dev = (struct device *) _dev; + + dev_err(dev, deadlock during suspend!\n); + show_state(); The show_state() seems to be overkill and won't really help if the user can't collect the output on the fly using a serial console or something like this. [The debug stuff printed here should fit a typical laptop screen, ideally.] +} + /**
Re: [Bug 10030] Suspend doesn't work when SD card is inserted
On Sun, 24 Feb 2008, Rafael J. Wysocki wrote: On Sunday, 24 of February 2008, Alan Stern wrote: On Sat, 23 Feb 2008, Rafael J. Wysocki wrote: Ultimately no, it's not. However, we are now late in the -rc2 time frame and the release of -rc3 seems to be imminent. At this point, IMO, that's the safest thing to do. BTW, appended is the patch I'd like to get applied. In the interest of having a safe release, I guess you're right. That's what I'm concerned about at the moment. I'm afraid there won't be enough time to nail down all the issues (there may be some we're not even aware of). All right. You'll need to enlarge your big reversal patch by reverting commit 4145ed6dc597a9bea5f6ae8c574653b2de10620f. Below is my suggested approach, based on yours. It might even solve the MMC problem, who knows? And it adds some potentially useful debugging, although it would be better to dump just the stack of suspending_task. Do you know how to do that from within a timer routine? No, I don't. On further checking the answer turns out to be sched_show_task(). That's the routine which gets called for each process when you type Alt-SysRq-t. I still have no clear idea about what's going on with the ACPI bug in #9874. It seems that the ACPI code is not prepared to cope with a failing device registration, in which case it'd need fixing. Frankly, I'm afraid there may be more issues like this throughout the tree. The fact remains that it is unsafe to register a device during a sleep transition, although if the device's driver doesn't have a suspend or resume method you can probably get away with it. However perhaps the bug wouldn't occur if we blocked device registration until after the resume was finished, instead of failing it outright. Since the registration in this case was done in a workqueue thread, blocking wouldn't cause an immediate deadlock. No, it wouldn't, but the fact that it happens in the ACPI code makes me worry. It happened in a workqueue. There could be lots of similar cases: Some interrupt-driven event causes a hotplug action. Since the action can't be carried out in interrupt context, the driver has no choice but to defer it to a workqueue or kernel thread. Blocking that workqueue or kernel thread won't cause a problem _unless_ the driver's suspend/resume methods try to synchronize with it. That's the difficult case. If we block that code and the things it's supposed to do turn out to be necessary for suspending later on, we'll be in trouble. Exactly. @@ -94,14 +102,15 @@ void device_pm_remove(struct device *dev /* No suspend in progress, wait on dev-sem */ down(dev-sem); up_read(pm_sleep_rwsem); - } else { - /* Suspend in progress, we may deadlock */ + } else if (!in_suspend_context()) { + /* Suspending in another task, we may deadlock */ Well, that shouldn't really deadlock. If it does, there is a potential design issue somewhere. I think it might be better to set up a timer in here too. At this point the driver core owns the device semaphore, so the unregistration task will block until the sleep is over. If the driver's resume method has to synchronize with the unregistration task then it will deadlock. I agree, it would be a design issue. In fact, it's the same design issue described earlier in this email. Although IMO it would be even better to just set up a timer and remove the warning altogehter. That warning was just for debugging purposes. A timer in the resume path could do the same thing with fewer false alarms, just like the timer in the suspend path. I have added it to the new patch. dev_warn(dev, Suspicious %s during suspend\n, __FUNCTION__); dump_stack(); /* The user has been warned ... */ down(dev-sem); } + /* Otherwise we're okay */ } pr_debug(PM: Removing info for %s:%s\n, dev-bus ? dev-bus-name : No Bus, There's a problem here, because we shouldn't release the semaphore if we're in the suspend context. Yes, you're right. It's fixed in the new version. +static void suspend_timeout(unsigned long _dev) +{ + struct device *dev = (struct device *) _dev; + + dev_err(dev, deadlock during suspend!\n); + show_state(); The show_state() seems to be overkill and won't really help if the user can't collect the output on the fly using a serial console or something like this. [The debug stuff printed here should fit a typical laptop screen, ideally.] Changed. I'd prefer if (in_suspend_context()) { __device_release_driver(dev); } else { down(dev-sem); __device_release_driver(dev);
Re: [linux-pm] [Bug 10030] Suspend doesn't work when SD card is inserted
On Sat, 23 Feb 2008, Alan Stern wrote: It happened in a workqueue. There could be lots of similar cases: Some interrupt-driven event causes a hotplug action. Since the action can't be carried out in interrupt context, the driver has no choice but to defer it to a workqueue or kernel thread. Blocking that workqueue or kernel thread won't cause a problem _unless_ the driver's suspend/resume methods try to synchronize with it. That's the difficult case. In fact this turns out to be exactly the problem with the MMC subsystem. It uses a dedicated workqueue (kmmcd) to handle state changes, and mmc_suspend_host() does a flush_workqueue(). If the workqueue contains an entry to register or unregister a device, this is guaranteed to deadlock -- and that's exactly what happens when Zdenek inserts or removes a card during the disk-drive spindown time of a system sleep. It looks like the best solution is for mmc_suspend_host() not to flush the workqueue; instead the workqueue should prevent itself from running during a suspend. Right now the easiest way to accomplish this is for the workqueue routines (mmc_detect() and siblings) to acquire the mmc_host's device semaphore. If in the future the MMC subsystem adds dynamic PM support then a more complicated arrangement will be needed. So the patch below, in combination with the previous patch, might just fix the whole problem. Alan Stern Index: usb-2.6/drivers/mmc/core/core.c === --- usb-2.6.orig/drivers/mmc/core/core.c +++ usb-2.6/drivers/mmc/core/core.c @@ -731,8 +731,6 @@ void mmc_stop_host(struct mmc_host *host */ int mmc_suspend_host(struct mmc_host *host, pm_message_t state) { - mmc_flush_scheduled_work(); - mmc_bus_get(host); if (host-bus_ops !host-bus_dead) { if (host-bus_ops-suspend) Index: usb-2.6/drivers/mmc/core/mmc.c === --- usb-2.6.orig/drivers/mmc/core/mmc.c +++ usb-2.6/drivers/mmc/core/mmc.c @@ -441,6 +441,7 @@ static void mmc_detect(struct mmc_host * BUG_ON(!host); BUG_ON(!host-card); + down(mmc_classdev(host)-sem); mmc_claim_host(host); /* @@ -449,6 +450,7 @@ static void mmc_detect(struct mmc_host * err = mmc_send_status(host-card, NULL); mmc_release_host(host); + up(mmc_classdev(host)-sem); if (err) { mmc_remove(host); Index: usb-2.6/drivers/mmc/core/sd.c === --- usb-2.6.orig/drivers/mmc/core/sd.c +++ usb-2.6/drivers/mmc/core/sd.c @@ -500,6 +500,7 @@ static void mmc_sd_detect(struct mmc_hos BUG_ON(!host); BUG_ON(!host-card); + down(mmc_classdev(host)-sem); mmc_claim_host(host); /* @@ -508,6 +509,7 @@ static void mmc_sd_detect(struct mmc_hos err = mmc_send_status(host-card, NULL); mmc_release_host(host); + up(mmc_classdev(host)-sem); if (err) { mmc_sd_remove(host); Index: usb-2.6/drivers/mmc/core/sdio.c === --- usb-2.6.orig/drivers/mmc/core/sdio.c +++ usb-2.6/drivers/mmc/core/sdio.c @@ -195,6 +195,7 @@ static void mmc_sdio_detect(struct mmc_h BUG_ON(!host); BUG_ON(!host-card); + down(mmc_classdev(host)-sem); mmc_claim_host(host); /* @@ -203,6 +204,7 @@ static void mmc_sdio_detect(struct mmc_h err = mmc_select_card(host-card); mmc_release_host(host); + up(mmc_classdev(host)-sem); if (err) { mmc_sdio_remove(host); -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bug 10030] Suspend doesn't work when SD card is inserted
On Sat, 23 Feb 2008, Rafael J. Wysocki wrote: > Unfortunately, I missed your Bugzilla comment at > http://bugzilla.kernel.org/show_bug.cgi?id=10030#c28 Strange... But obviously you did see it eventually. > Well, in the face of it, I'm considering to remove the code that > acquires device semaphores from the suspend core for now. Evidently, this > change turns out to be painfully premature. I wonder if that's really the best thing. How would we then learn about drivers trying to register or unregister a device during a sleep transition? Do you think it might be possible instead to somehow allow these unregistrations to go through, while still failing or blocking registrations? It shouldn't be too hard to modify the driver core so that it calls the driver's remove() method without trying to acquire dev->sem if your in_suspend_context() test succeeds. I have to admit, I still don't understand what's going on with the MMC driver. Why is there a workqueue involved? If the workqueue fails to unregister the device, why should it bother the suspend routine? After all, if the suspend routine can afford to wait for the workqueue to finish then it could just as well afford to do the unregistration itself. > Also, we have apparent problems with pm_sleep_lock() > being take in device_add() (see > http://bugzilla.kernel.org/show_bug.cgi?id=9874). We'll have to get more information from the bug reporter to figure out what really happened there. Ultimately it may turn out some drivers just aren't very careful about when they try to register new devices. Doing the registration by way of a workqueue can be problematic if the workqueue happens to run during a system sleep transition. That will still be true if you revert the acquire-all-semaphores patch. Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bug 10030] Suspend doesn't work when SD card is inserted
On Friday, 22 of February 2008, Alan Stern wrote: > On Thu, 21 Feb 2008, Rafael J. Wysocki wrote: > > > BTW, below is a simplified version of the patch, without the mutex > > protecting > > suspending_task. I'd like to push it upstream if it looks good. > > It does look good. Go ahead and push. > > Acked-by: Alan Stern <[EMAIL PROTECTED]> > > > Please also have a look at http://bugzilla.kernel.org/show_bug.cgi?id=10030. > > There seems to be another issue related to us holding devices' semaphores. > > Namely, it looks like, when the user removes the card, a concurrent thread > > (from a workqueue) calls device_del() and blocks on the dev->sem held by > > us and then something else deadlocks with this thread. I'll be looking into > > this tomorrow. > > I've been too busy with other things to look at the activity on that > bug report. Tonight or tomorrow... Unfortunately, I missed your Bugzilla comment at http://bugzilla.kernel.org/show_bug.cgi?id=10030#c28 Well, in the face of it, I'm considering to remove the code that acquires device semaphores from the suspend core for now. Evidently, this change turns out to be painfully premature. Also, we have apparent problems with pm_sleep_lock() being take in device_add() (see http://bugzilla.kernel.org/show_bug.cgi?id=9874). Thanks, Rafael -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bug 10030] Suspend doesn't work when SD card is inserted
On Friday, 22 of February 2008, Alan Stern wrote: On Thu, 21 Feb 2008, Rafael J. Wysocki wrote: BTW, below is a simplified version of the patch, without the mutex protecting suspending_task. I'd like to push it upstream if it looks good. It does look good. Go ahead and push. Acked-by: Alan Stern [EMAIL PROTECTED] Please also have a look at http://bugzilla.kernel.org/show_bug.cgi?id=10030. There seems to be another issue related to us holding devices' semaphores. Namely, it looks like, when the user removes the card, a concurrent thread (from a workqueue) calls device_del() and blocks on the dev-sem held by us and then something else deadlocks with this thread. I'll be looking into this tomorrow. I've been too busy with other things to look at the activity on that bug report. Tonight or tomorrow... Unfortunately, I missed your Bugzilla comment at http://bugzilla.kernel.org/show_bug.cgi?id=10030#c28 Well, in the face of it, I'm considering to remove the code that acquires device semaphores from the suspend core for now. Evidently, this change turns out to be painfully premature. Also, we have apparent problems with pm_sleep_lock() being take in device_add() (see http://bugzilla.kernel.org/show_bug.cgi?id=9874). Thanks, Rafael -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bug 10030] Suspend doesn't work when SD card is inserted
On Sat, 23 Feb 2008, Rafael J. Wysocki wrote: Unfortunately, I missed your Bugzilla comment at http://bugzilla.kernel.org/show_bug.cgi?id=10030#c28 Strange... But obviously you did see it eventually. Well, in the face of it, I'm considering to remove the code that acquires device semaphores from the suspend core for now. Evidently, this change turns out to be painfully premature. I wonder if that's really the best thing. How would we then learn about drivers trying to register or unregister a device during a sleep transition? Do you think it might be possible instead to somehow allow these unregistrations to go through, while still failing or blocking registrations? It shouldn't be too hard to modify the driver core so that it calls the driver's remove() method without trying to acquire dev-sem if your in_suspend_context() test succeeds. I have to admit, I still don't understand what's going on with the MMC driver. Why is there a workqueue involved? If the workqueue fails to unregister the device, why should it bother the suspend routine? After all, if the suspend routine can afford to wait for the workqueue to finish then it could just as well afford to do the unregistration itself. Also, we have apparent problems with pm_sleep_lock() being take in device_add() (see http://bugzilla.kernel.org/show_bug.cgi?id=9874). We'll have to get more information from the bug reporter to figure out what really happened there. Ultimately it may turn out some drivers just aren't very careful about when they try to register new devices. Doing the registration by way of a workqueue can be problematic if the workqueue happens to run during a system sleep transition. That will still be true if you revert the acquire-all-semaphores patch. Alan Stern -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bug 10030] Suspend doesn't work when SD card is inserted
On Thu, 21 Feb 2008, Rafael J. Wysocki wrote: > > > --- linux-2.6.orig/drivers/base/core.c > > > +++ linux-2.6/drivers/base/core.c > > > @@ -929,6 +929,11 @@ void device_del(struct device *dev) > > > struct device *parent = dev->parent; > > > struct class_interface *class_intf; > > > > > > + if (in_suspend_context()) { > > > + get_device(dev); > > > > Where is this get_device() undone? Shouldn't there be an extra > > put_device() added to unregister_dropped_devices()? > > No, I don't think so, because unregister_dropped_devices() calls > device_unregister() that does the put_device() eventually. Ah, yes. > If we are called by device_unregister(), the get_device() is needed to balance > the put_device() that will be called by device_unregister() after we return. > > OTOH, if we are called directly, then we need to balance the put_device() > that will be done by device_unregister() called from > unregister_dropped_devices(). > > I hope I didn't miss anything. Okay, that sounds right. > > > + device_pm_schedule_removal(dev); > > > + return; > > > + } > > > device_pm_remove(dev); > > > if (parent) > > > klist_del(>knode_parent); > > > > And now the change to device_destroy() isn't needed at all. > > No, it's not. Didn't I remove it? I thought I did. Oh yes, you did. I see a possible problem in device_resume(). My copy isn't fully up-to-date, but it looks like you call unregister_dropped_devices() before doing the up_write(_sleep_rwsem). Won't this cause warnings in device_del() about a suspicious caller? Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bug 10030] Suspend doesn't work when SD card is inserted
On Thursday, 21 of February 2008, Alan Stern wrote: > On Thu, 21 Feb 2008, Rafael J. Wysocki wrote: > > > > > +bool in_suspend_context(void) > > > > +{ > > > > + bool result; > > > > + > > > > + mutex_lock(_task_mtx); > > > > + result = (suspending_task == current); > > > > + mutex_unlock(_task_mtx); > > > > + return result; > > > > +} > > > > > > If suspending_task == current then you are guaranteed to be serialized, > > > because everything a single task does is serial. > > > > As I said before (but that doesn't seem to reach the list, so I'm > > repeating), > > this is to protect other tasks from reading an inconsistent value of > > suspending_task in case they attempt to remove a device concurrently with > > respect to us. > > > > While this is not likely to happen right now, because of the freezer, it may > > very well happen when the freezer is finally removed. > > Sorry, I don't understand. Are you worried that process A might set > suspending_task = A but then process B might still see suspending_task > == NULL? Or that A might set suspend_task = NULL but then B might > still see suspending_task == A? > > Neither one will cause any problem, since the only case that matters is > when B sees suspending_task == B -- and that can happen if and only if > B was the last process to set suspending_task. > > In fact, you might as well get rid of the set_suspending_task() routine > entirely and just put the assignments inline. OK, I will. > > --- linux-2.6.orig/drivers/base/core.c > > +++ linux-2.6/drivers/base/core.c > > @@ -929,6 +929,11 @@ void device_del(struct device *dev) > > struct device *parent = dev->parent; > > struct class_interface *class_intf; > > > > + if (in_suspend_context()) { > > + get_device(dev); > > Where is this get_device() undone? Shouldn't there be an extra > put_device() added to unregister_dropped_devices()? No, I don't think so, because unregister_dropped_devices() calls device_unregister() that does the put_device() eventually. If we are called by device_unregister(), the get_device() is needed to balance the put_device() that will be called by device_unregister() after we return. OTOH, if we are called directly, then we need to balance the put_device() that will be done by device_unregister() called from unregister_dropped_devices(). I hope I didn't miss anything. > > + device_pm_schedule_removal(dev); > > + return; > > + } > > device_pm_remove(dev); > > if (parent) > > klist_del(>knode_parent); > > And now the change to device_destroy() isn't needed at all. No, it's not. Didn't I remove it? I thought I did. Thanks, Rafael -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bug 10030] Suspend doesn't work when SD card is inserted
On Thu, 21 Feb 2008, Rafael J. Wysocki wrote: > > > +bool in_suspend_context(void) > > > +{ > > > + bool result; > > > + > > > + mutex_lock(_task_mtx); > > > + result = (suspending_task == current); > > > + mutex_unlock(_task_mtx); > > > + return result; > > > +} > > > > If suspending_task == current then you are guaranteed to be serialized, > > because everything a single task does is serial. > > As I said before (but that doesn't seem to reach the list, so I'm repeating), > this is to protect other tasks from reading an inconsistent value of > suspending_task in case they attempt to remove a device concurrently with > respect to us. > > While this is not likely to happen right now, because of the freezer, it may > very well happen when the freezer is finally removed. Sorry, I don't understand. Are you worried that process A might set suspending_task = A but then process B might still see suspending_task == NULL? Or that A might set suspend_task = NULL but then B might still see suspending_task == A? Neither one will cause any problem, since the only case that matters is when B sees suspending_task == B -- and that can happen if and only if B was the last process to set suspending_task. In fact, you might as well get rid of the set_suspending_task() routine entirely and just put the assignments inline. > --- linux-2.6.orig/drivers/base/core.c > +++ linux-2.6/drivers/base/core.c > @@ -929,6 +929,11 @@ void device_del(struct device *dev) > struct device *parent = dev->parent; > struct class_interface *class_intf; > > + if (in_suspend_context()) { > + get_device(dev); Where is this get_device() undone? Shouldn't there be an extra put_device() added to unregister_dropped_devices()? > + device_pm_schedule_removal(dev); > + return; > + } > device_pm_remove(dev); > if (parent) > klist_del(>knode_parent); And now the change to device_destroy() isn't needed at all. Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bug 10030] Suspend doesn't work when SD card is inserted
On Thursday, 21 of February 2008, Alan Stern wrote: On Thu, 21 Feb 2008, Rafael J. Wysocki wrote: +bool in_suspend_context(void) +{ + bool result; + + mutex_lock(suspending_task_mtx); + result = (suspending_task == current); + mutex_unlock(suspending_task_mtx); + return result; +} If suspending_task == current then you are guaranteed to be serialized, because everything a single task does is serial. As I said before (but that doesn't seem to reach the list, so I'm repeating), this is to protect other tasks from reading an inconsistent value of suspending_task in case they attempt to remove a device concurrently with respect to us. While this is not likely to happen right now, because of the freezer, it may very well happen when the freezer is finally removed. Sorry, I don't understand. Are you worried that process A might set suspending_task = A but then process B might still see suspending_task == NULL? Or that A might set suspend_task = NULL but then B might still see suspending_task == A? Neither one will cause any problem, since the only case that matters is when B sees suspending_task == B -- and that can happen if and only if B was the last process to set suspending_task. In fact, you might as well get rid of the set_suspending_task() routine entirely and just put the assignments inline. OK, I will. --- linux-2.6.orig/drivers/base/core.c +++ linux-2.6/drivers/base/core.c @@ -929,6 +929,11 @@ void device_del(struct device *dev) struct device *parent = dev-parent; struct class_interface *class_intf; + if (in_suspend_context()) { + get_device(dev); Where is this get_device() undone? Shouldn't there be an extra put_device() added to unregister_dropped_devices()? No, I don't think so, because unregister_dropped_devices() calls device_unregister() that does the put_device() eventually. If we are called by device_unregister(), the get_device() is needed to balance the put_device() that will be called by device_unregister() after we return. OTOH, if we are called directly, then we need to balance the put_device() that will be done by device_unregister() called from unregister_dropped_devices(). I hope I didn't miss anything. + device_pm_schedule_removal(dev); + return; + } device_pm_remove(dev); if (parent) klist_del(dev-knode_parent); And now the change to device_destroy() isn't needed at all. No, it's not. Didn't I remove it? I thought I did. Thanks, Rafael -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bug 10030] Suspend doesn't work when SD card is inserted
On Thu, 21 Feb 2008, Rafael J. Wysocki wrote: +bool in_suspend_context(void) +{ + bool result; + + mutex_lock(suspending_task_mtx); + result = (suspending_task == current); + mutex_unlock(suspending_task_mtx); + return result; +} If suspending_task == current then you are guaranteed to be serialized, because everything a single task does is serial. As I said before (but that doesn't seem to reach the list, so I'm repeating), this is to protect other tasks from reading an inconsistent value of suspending_task in case they attempt to remove a device concurrently with respect to us. While this is not likely to happen right now, because of the freezer, it may very well happen when the freezer is finally removed. Sorry, I don't understand. Are you worried that process A might set suspending_task = A but then process B might still see suspending_task == NULL? Or that A might set suspend_task = NULL but then B might still see suspending_task == A? Neither one will cause any problem, since the only case that matters is when B sees suspending_task == B -- and that can happen if and only if B was the last process to set suspending_task. In fact, you might as well get rid of the set_suspending_task() routine entirely and just put the assignments inline. --- linux-2.6.orig/drivers/base/core.c +++ linux-2.6/drivers/base/core.c @@ -929,6 +929,11 @@ void device_del(struct device *dev) struct device *parent = dev-parent; struct class_interface *class_intf; + if (in_suspend_context()) { + get_device(dev); Where is this get_device() undone? Shouldn't there be an extra put_device() added to unregister_dropped_devices()? + device_pm_schedule_removal(dev); + return; + } device_pm_remove(dev); if (parent) klist_del(dev-knode_parent); And now the change to device_destroy() isn't needed at all. Alan Stern -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bug 10030] Suspend doesn't work when SD card is inserted
On Thu, 21 Feb 2008, Rafael J. Wysocki wrote: --- linux-2.6.orig/drivers/base/core.c +++ linux-2.6/drivers/base/core.c @@ -929,6 +929,11 @@ void device_del(struct device *dev) struct device *parent = dev-parent; struct class_interface *class_intf; + if (in_suspend_context()) { + get_device(dev); Where is this get_device() undone? Shouldn't there be an extra put_device() added to unregister_dropped_devices()? No, I don't think so, because unregister_dropped_devices() calls device_unregister() that does the put_device() eventually. Ah, yes. If we are called by device_unregister(), the get_device() is needed to balance the put_device() that will be called by device_unregister() after we return. OTOH, if we are called directly, then we need to balance the put_device() that will be done by device_unregister() called from unregister_dropped_devices(). I hope I didn't miss anything. Okay, that sounds right. + device_pm_schedule_removal(dev); + return; + } device_pm_remove(dev); if (parent) klist_del(dev-knode_parent); And now the change to device_destroy() isn't needed at all. No, it's not. Didn't I remove it? I thought I did. Oh yes, you did. I see a possible problem in device_resume(). My copy isn't fully up-to-date, but it looks like you call unregister_dropped_devices() before doing the up_write(pm_sleep_rwsem). Won't this cause warnings in device_del() about a suspicious caller? Alan Stern -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bug 10030] Suspend doesn't work when SD card is inserted
On Wednesday, 20 of February 2008, Alan Stern wrote: > On Wed, 20 Feb 2008, Rafael J. Wysocki wrote: > > > Well, below is an uncompiled and untested but illustrating the idea that > > might allow people not to bother with device_pm_schedule_removal() > > explicitly and can fix the issue at hand. > > > > [There are some cases that need handling and are not covered here.] > > > > Please have a look. > > > > Thanks, > > Rafael > > > +static struct task_struct *suspending_task; > > +static DEFINE_MUTEX(suspending_task_mtx); > > I suspect you don't really need this mutex. > > > +bool in_suspend_context(void) > > +{ > > + bool result; > > + > > + mutex_lock(_task_mtx); > > + result = (suspending_task == current); > > + mutex_unlock(_task_mtx); > > + return result; > > +} > > If suspending_task == current then you are guaranteed to be serialized, > because everything a single task does is serial. As I said before (but that doesn't seem to reach the list, so I'm repeating), this is to protect other tasks from reading an inconsistent value of suspending_task in case they attempt to remove a device concurrently with respect to us. While this is not likely to happen right now, because of the freezer, it may very well happen when the freezer is finally removed. > > @@ -1162,7 +1162,10 @@ void device_destroy(struct class *class, > > dev = class_find_device(class, , __match_devt); > > if (dev) { > > put_device(dev); > > - device_unregister(dev); > > + if (in_suspend_context()) > > + device_pm_schedule_removal(dev); > > + else > > + device_unregister(dev); > > } > > } > > But what about device_del()? Can a similar change be made there? Something like the patch below, perhaps? Again, untested, but compiled this time. :-) Thanks, Rafael --- drivers/base/core.c|5 + drivers/base/power/main.c | 22 ++ drivers/base/power/power.h |5 + 3 files changed, 32 insertions(+) Index: linux-2.6/drivers/base/power/main.c === --- linux-2.6.orig/drivers/base/power/main.c +++ linux-2.6/drivers/base/power/main.c @@ -59,6 +59,26 @@ static DECLARE_RWSEM(pm_sleep_rwsem); int (*platform_enable_wakeup)(struct device *dev, int is_on); +static struct task_struct *suspending_task; +static DEFINE_MUTEX(suspending_task_mtx); + +bool in_suspend_context(void) +{ + bool result; + + mutex_lock(_task_mtx); + result = (suspending_task == current); + mutex_unlock(_task_mtx); + return result; +} + +static void set_suspending_task(struct task_struct *task) +{ + mutex_lock(_task_mtx); + suspending_task = task; + mutex_unlock(_task_mtx); +} + /** * device_pm_add - add a device to the list of active devices * @dev: Device to be added to the list @@ -272,6 +292,7 @@ static void dpm_resume(void) mutex_lock(_list_mtx); } mutex_unlock(_list_mtx); + set_suspending_task(NULL); } /** @@ -460,6 +481,7 @@ static int dpm_suspend(pm_message_t stat { int error = 0; + set_suspending_task(current); mutex_lock(_list_mtx); while (!list_empty(_locked)) { struct list_head *entry = dpm_locked.prev; Index: linux-2.6/drivers/base/core.c === --- linux-2.6.orig/drivers/base/core.c +++ linux-2.6/drivers/base/core.c @@ -929,6 +929,11 @@ void device_del(struct device *dev) struct device *parent = dev->parent; struct class_interface *class_intf; + if (in_suspend_context()) { + get_device(dev); + device_pm_schedule_removal(dev); + return; + } device_pm_remove(dev); if (parent) klist_del(>knode_parent); Index: linux-2.6/drivers/base/power/power.h === --- linux-2.6.orig/drivers/base/power/power.h +++ linux-2.6/drivers/base/power/power.h @@ -11,6 +11,7 @@ static inline struct device *to_device(s return container_of(entry, struct device, power.entry); } +extern bool in_suspend_context(void); extern void device_pm_add(struct device *); extern void device_pm_remove(struct device *); extern int pm_sleep_lock(void); @@ -18,6 +19,10 @@ extern void pm_sleep_unlock(void); #else /* CONFIG_PM_SLEEP */ +static inline bool in_suspend_context(void) +{ + return false; +} static inline void device_pm_add(struct device *dev) { -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bug 10030] Suspend doesn't work when SD card is inserted
On Wednesday, 20 of February 2008, Alan Stern wrote: > On Wed, 20 Feb 2008, Rafael J. Wysocki wrote: > > > Well, below is an uncompiled and untested but illustrating the idea that > > might allow people not to bother with device_pm_schedule_removal() > > explicitly and can fix the issue at hand. > > > > [There are some cases that need handling and are not covered here.] > > > > Please have a look. > > > > Thanks, > > Rafael > > > +static struct task_struct *suspending_task; > > +static DEFINE_MUTEX(suspending_task_mtx); > > I suspect you don't really need this mutex. > > > +bool in_suspend_context(void) > > +{ > > + bool result; > > + > > + mutex_lock(_task_mtx); > > + result = (suspending_task == current); > > + mutex_unlock(_task_mtx); > > + return result; > > +} > > If suspending_task == current then you are guaranteed to be serialized, > because everything a single task does is serial. But in principle there could be a concurrent thread removind the device and that should block on dev->sem held by us. Right now that's not very likely to happen thanks to the freezer, but we're doing all this stuff, because we want to get rid of the freezer eventually. :-) > > @@ -1162,7 +1162,10 @@ void device_destroy(struct class *class, > > dev = class_find_device(class, , __match_devt); > > if (dev) { > > put_device(dev); > > - device_unregister(dev); > > + if (in_suspend_context()) > > + device_pm_schedule_removal(dev); > > + else > > + device_unregister(dev); > > } > > } > > But what about device_del()? Can a similar change be made there? I believe so. I'm working on a more complete patch right now. Thanks, Rafael -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bug 10030] Suspend doesn't work when SD card is inserted
On Wed, 20 Feb 2008, Rafael J. Wysocki wrote: > Well, below is an uncompiled and untested but illustrating the idea that > might allow people not to bother with device_pm_schedule_removal() > explicitly and can fix the issue at hand. > > [There are some cases that need handling and are not covered here.] > > Please have a look. > > Thanks, > Rafael > +static struct task_struct *suspending_task; > +static DEFINE_MUTEX(suspending_task_mtx); I suspect you don't really need this mutex. > +bool in_suspend_context(void) > +{ > + bool result; > + > + mutex_lock(_task_mtx); > + result = (suspending_task == current); > + mutex_unlock(_task_mtx); > + return result; > +} If suspending_task == current then you are guaranteed to be serialized, because everything a single task does is serial. > @@ -1162,7 +1162,10 @@ void device_destroy(struct class *class, > dev = class_find_device(class, , __match_devt); > if (dev) { > put_device(dev); > - device_unregister(dev); > + if (in_suspend_context()) > + device_pm_schedule_removal(dev); > + else > + device_unregister(dev); > } > } But what about device_del()? Can a similar change be made there? Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bug 10030] Suspend doesn't work when SD card is inserted
On Wednesday, 20 of February 2008, Alan Stern wrote: > On Wed, 20 Feb 2008, Pierre Ossman wrote: > > > > And why not simply fail the suspend if the resume routine doesn't exist > > > and the suspend routine does? Maybe with an error message in the > > > system log. > > > > For the asymmetric case, I guess that would do. But I still want to remove > > devices when the bus handler has no suspend handling. > > Then it appears the correct approach is to use the new > device_pm_schedule_removal() routine. It schedules the removal of a > suspended device for a time when the system suspend is over. Well, below is an uncompiled and untested but illustrating the idea that might allow people not to bother with device_pm_schedule_removal() explicitly and can fix the issue at hand. [There are some cases that need handling and are not covered here.] Please have a look. Thanks, Rafael --- drivers/base/core.c|5 - drivers/base/power/main.c | 22 ++ drivers/base/power/power.h |5 + 3 files changed, 31 insertions(+), 1 deletion(-) Index: linux-2.6/drivers/base/power/main.c === --- linux-2.6.orig/drivers/base/power/main.c +++ linux-2.6/drivers/base/power/main.c @@ -59,6 +59,26 @@ static DECLARE_RWSEM(pm_sleep_rwsem); int (*platform_enable_wakeup)(struct device *dev, int is_on); +static struct task_struct *suspending_task; +static DEFINE_MUTEX(suspending_task_mtx); + +bool in_suspend_context(void) +{ + bool result; + + mutex_lock(_task_mtx); + result = (suspending_task == current); + mutex_unlock(_task_mtx); + return result; +} + +static void set_suspending_task(struct task_struct *task) +{ + mutex_lock(_task_mtx); + suspending_task = task; + mutex_unlock(_task_mtx); +} + /** * device_pm_add - add a device to the list of active devices * @dev: Device to be added to the list @@ -272,6 +292,7 @@ static void dpm_resume(void) mutex_lock(_list_mtx); } mutex_unlock(_list_mtx); + set_suspending_task(NULL); } /** @@ -460,6 +481,7 @@ static int dpm_suspend(pm_message_t stat { int error = 0; + set_suspending_task(current); mutex_lock(_list_mtx); while (!list_empty(_locked)) { struct list_head *entry = dpm_locked.prev; Index: linux-2.6/drivers/base/core.c === --- linux-2.6.orig/drivers/base/core.c +++ linux-2.6/drivers/base/core.c @@ -1162,7 +1162,10 @@ void device_destroy(struct class *class, dev = class_find_device(class, , __match_devt); if (dev) { put_device(dev); - device_unregister(dev); + if (in_suspend_context()) + device_pm_schedule_removal(dev); + else + device_unregister(dev); } } EXPORT_SYMBOL_GPL(device_destroy); Index: linux-2.6/drivers/base/power/power.h === --- linux-2.6.orig/drivers/base/power/power.h +++ linux-2.6/drivers/base/power/power.h @@ -11,6 +11,7 @@ static inline struct device *to_device(s return container_of(entry, struct device, power.entry); } +extern bool in_suspend_context(void); extern void device_pm_add(struct device *); extern void device_pm_remove(struct device *); extern int pm_sleep_lock(void); @@ -18,6 +19,10 @@ extern void pm_sleep_unlock(void); #else /* CONFIG_PM_SLEEP */ +static inline bool in_suspend_context(void) +{ + return false; +} static inline void device_pm_add(struct device *dev) { -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bug 10030] Suspend doesn't work when SD card is inserted
On Wed, 20 Feb 2008, Pierre Ossman wrote: > > And why not simply fail the suspend if the resume routine doesn't exist > > and the suspend routine does? Maybe with an error message in the > > system log. > > For the asymmetric case, I guess that would do. But I still want to remove > devices when the bus handler has no suspend handling. Then it appears the correct approach is to use the new device_pm_schedule_removal() routine. It schedules the removal of a suspended device for a time when the system suspend is over. Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bug 10030] Suspend doesn't work when SD card is inserted
On Wednesday, 20 of February 2008, Pierre Ossman wrote: > On Wed, 20 Feb 2008 14:26:16 -0500 (EST) > Alan Stern <[EMAIL PROTECTED]> wrote: > > > > > Do I understand this correctly? You've got special handling for the > > case where a bus handler doesn't have a resume routine, but no special > > handling for the case where it doesn't have a suspend routine? > > Hmm... There should be checks for both, but the code seems to suggest > otherwise. > > > Why bother to remove the device if neither routine exists (there won't be > > any need to revive it since the bus never got suspended)? > > The bus always gets suspended. The checks are to determine if state is saved > or not. If it isn't, then a suspend/resume is treated as a removal/insertion. > > > And why not simply fail the suspend if the resume routine doesn't exist > > and the suspend routine does? Maybe with an error message in the > > system log. > > For the asymmetric case, I guess that would do. But I still want to remove > devices when the bus handler has no suspend handling. I think I know how to handle that, but there's a more urgent issue I need to fix first. Stay tuned. :-) Thanks, Rafael -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bug 10030] Suspend doesn't work when SD card is inserted
On Wed, 20 Feb 2008 14:26:16 -0500 (EST) Alan Stern <[EMAIL PROTECTED]> wrote: > > Do I understand this correctly? You've got special handling for the > case where a bus handler doesn't have a resume routine, but no special > handling for the case where it doesn't have a suspend routine? Hmm... There should be checks for both, but the code seems to suggest otherwise. > Why bother to remove the device if neither routine exists (there won't be > any need to revive it since the bus never got suspended)? The bus always gets suspended. The checks are to determine if state is saved or not. If it isn't, then a suspend/resume is treated as a removal/insertion. > And why not simply fail the suspend if the resume routine doesn't exist > and the suspend routine does? Maybe with an error message in the > system log. For the asymmetric case, I guess that would do. But I still want to remove devices when the bus handler has no suspend handling. Rgds -- -- Pierre Ossman Linux kernel, MMC maintainerhttp://www.kernel.org PulseAudio, core developer http://pulseaudio.org rdesktop, core developer http://www.rdesktop.org -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bug 10030] Suspend doesn't work when SD card is inserted
On Wed, 20 Feb 2008, Pierre Ossman wrote: > Not really. But you have some things confused. What it checks is if > the mmc bus handler (not a proper driver model, just a way of > separating the MMC, SD and SDIO stuff) has a resume function. And if > it doesn't, it removes the card (since it cannot revive it at > resume). > > So the only thing I can think of is to delay the removal until the > resume routine, if that is safer. Do I understand this correctly? You've got special handling for the case where a bus handler doesn't have a resume routine, but no special handling for the case where it doesn't have a suspend routine? Why bother to remove the device if neither routine exists (there won't be any need to revive it since the bus never got suspended)? And why not simply fail the suspend if the resume routine doesn't exist and the suspend routine does? Maybe with an error message in the system log. Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bug 10030] Suspend doesn't work when SD card is inserted
On Wed, 20 Feb 2008 11:42:56 -0500 (EST) Alan Stern <[EMAIL PROTECTED]> wrote: > > > > --- Comment #14 from [EMAIL PROTECTED] 2008-02-19 15:23 --- > > Thanks a lot for the debugging work! > > > > First, the patch triggers, which means that the problem discovered by Alan > > is > > troubling us. [Alan, do you have an idea how to fix that cleanly?] > > I suggest we ask the maintainer for the MMC subsystem. > > Pierre, you can find the details in the bugzilla entry. Briefly, > there's a pathway in the MMC core suspend routine (if the driver > doesn't implement a resume hook) which could lead to the host being > removed during a system suspend. This is an illegal operation and it > will deadlock. > > Do you have a suggestion for a way to fix it? > Not really. But you have some things confused. What it checks is if the mmc bus handler (not a proper driver model, just a way of separating the MMC, SD and SDIO stuff) has a resume function. And if it doesn't, it removes the card (since it cannot revive it at resume). So the only thing I can think of is to delay the removal until the resume routine, if that is safer. Rgds -- -- Pierre Ossman Linux kernel, MMC maintainerhttp://www.kernel.org PulseAudio, core developer http://pulseaudio.org rdesktop, core developer http://www.rdesktop.org -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bug 10030] Suspend doesn't work when SD card is inserted
On Tue, 19 Feb 2008 [EMAIL PROTECTED] wrote: > http://bugzilla.kernel.org/show_bug.cgi?id=10030 > > > > > > --- Comment #14 from [EMAIL PROTECTED] 2008-02-19 15:23 --- > Thanks a lot for the debugging work! > > First, the patch triggers, which means that the problem discovered by Alan is > troubling us. [Alan, do you have an idea how to fix that cleanly?] I suggest we ask the maintainer for the MMC subsystem. Pierre, you can find the details in the bugzilla entry. Briefly, there's a pathway in the MMC core suspend routine (if the driver doesn't implement a resume hook) which could lead to the host being removed during a system suspend. This is an illegal operation and it will deadlock. Do you have a suggestion for a way to fix it? Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bug 10030] Suspend doesn't work when SD card is inserted
On Tue, 19 Feb 2008 [EMAIL PROTECTED] wrote: http://bugzilla.kernel.org/show_bug.cgi?id=10030 --- Comment #14 from [EMAIL PROTECTED] 2008-02-19 15:23 --- Thanks a lot for the debugging work! First, the patch triggers, which means that the problem discovered by Alan is troubling us. [Alan, do you have an idea how to fix that cleanly?] I suggest we ask the maintainer for the MMC subsystem. Pierre, you can find the details in the bugzilla entry. Briefly, there's a pathway in the MMC core suspend routine (if the driver doesn't implement a resume hook) which could lead to the host being removed during a system suspend. This is an illegal operation and it will deadlock. Do you have a suggestion for a way to fix it? Alan Stern -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bug 10030] Suspend doesn't work when SD card is inserted
On Wed, 20 Feb 2008 11:42:56 -0500 (EST) Alan Stern [EMAIL PROTECTED] wrote: --- Comment #14 from [EMAIL PROTECTED] 2008-02-19 15:23 --- Thanks a lot for the debugging work! First, the patch triggers, which means that the problem discovered by Alan is troubling us. [Alan, do you have an idea how to fix that cleanly?] I suggest we ask the maintainer for the MMC subsystem. Pierre, you can find the details in the bugzilla entry. Briefly, there's a pathway in the MMC core suspend routine (if the driver doesn't implement a resume hook) which could lead to the host being removed during a system suspend. This is an illegal operation and it will deadlock. Do you have a suggestion for a way to fix it? Not really. But you have some things confused. What it checks is if the mmc bus handler (not a proper driver model, just a way of separating the MMC, SD and SDIO stuff) has a resume function. And if it doesn't, it removes the card (since it cannot revive it at resume). So the only thing I can think of is to delay the removal until the resume routine, if that is safer. Rgds -- -- Pierre Ossman Linux kernel, MMC maintainerhttp://www.kernel.org PulseAudio, core developer http://pulseaudio.org rdesktop, core developer http://www.rdesktop.org -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bug 10030] Suspend doesn't work when SD card is inserted
On Wed, 20 Feb 2008, Pierre Ossman wrote: Not really. But you have some things confused. What it checks is if the mmc bus handler (not a proper driver model, just a way of separating the MMC, SD and SDIO stuff) has a resume function. And if it doesn't, it removes the card (since it cannot revive it at resume). So the only thing I can think of is to delay the removal until the resume routine, if that is safer. Do I understand this correctly? You've got special handling for the case where a bus handler doesn't have a resume routine, but no special handling for the case where it doesn't have a suspend routine? Why bother to remove the device if neither routine exists (there won't be any need to revive it since the bus never got suspended)? And why not simply fail the suspend if the resume routine doesn't exist and the suspend routine does? Maybe with an error message in the system log. Alan Stern -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bug 10030] Suspend doesn't work when SD card is inserted
On Wed, 20 Feb 2008 14:26:16 -0500 (EST) Alan Stern [EMAIL PROTECTED] wrote: Do I understand this correctly? You've got special handling for the case where a bus handler doesn't have a resume routine, but no special handling for the case where it doesn't have a suspend routine? Hmm... There should be checks for both, but the code seems to suggest otherwise. Why bother to remove the device if neither routine exists (there won't be any need to revive it since the bus never got suspended)? The bus always gets suspended. The checks are to determine if state is saved or not. If it isn't, then a suspend/resume is treated as a removal/insertion. And why not simply fail the suspend if the resume routine doesn't exist and the suspend routine does? Maybe with an error message in the system log. For the asymmetric case, I guess that would do. But I still want to remove devices when the bus handler has no suspend handling. Rgds -- -- Pierre Ossman Linux kernel, MMC maintainerhttp://www.kernel.org PulseAudio, core developer http://pulseaudio.org rdesktop, core developer http://www.rdesktop.org -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bug 10030] Suspend doesn't work when SD card is inserted
On Wednesday, 20 of February 2008, Pierre Ossman wrote: On Wed, 20 Feb 2008 14:26:16 -0500 (EST) Alan Stern [EMAIL PROTECTED] wrote: Do I understand this correctly? You've got special handling for the case where a bus handler doesn't have a resume routine, but no special handling for the case where it doesn't have a suspend routine? Hmm... There should be checks for both, but the code seems to suggest otherwise. Why bother to remove the device if neither routine exists (there won't be any need to revive it since the bus never got suspended)? The bus always gets suspended. The checks are to determine if state is saved or not. If it isn't, then a suspend/resume is treated as a removal/insertion. And why not simply fail the suspend if the resume routine doesn't exist and the suspend routine does? Maybe with an error message in the system log. For the asymmetric case, I guess that would do. But I still want to remove devices when the bus handler has no suspend handling. I think I know how to handle that, but there's a more urgent issue I need to fix first. Stay tuned. :-) Thanks, Rafael -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bug 10030] Suspend doesn't work when SD card is inserted
On Wed, 20 Feb 2008, Pierre Ossman wrote: And why not simply fail the suspend if the resume routine doesn't exist and the suspend routine does? Maybe with an error message in the system log. For the asymmetric case, I guess that would do. But I still want to remove devices when the bus handler has no suspend handling. Then it appears the correct approach is to use the new device_pm_schedule_removal() routine. It schedules the removal of a suspended device for a time when the system suspend is over. Alan Stern -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bug 10030] Suspend doesn't work when SD card is inserted
On Wednesday, 20 of February 2008, Alan Stern wrote: On Wed, 20 Feb 2008, Pierre Ossman wrote: And why not simply fail the suspend if the resume routine doesn't exist and the suspend routine does? Maybe with an error message in the system log. For the asymmetric case, I guess that would do. But I still want to remove devices when the bus handler has no suspend handling. Then it appears the correct approach is to use the new device_pm_schedule_removal() routine. It schedules the removal of a suspended device for a time when the system suspend is over. Well, below is an uncompiled and untested but illustrating the idea that might allow people not to bother with device_pm_schedule_removal() explicitly and can fix the issue at hand. [There are some cases that need handling and are not covered here.] Please have a look. Thanks, Rafael --- drivers/base/core.c|5 - drivers/base/power/main.c | 22 ++ drivers/base/power/power.h |5 + 3 files changed, 31 insertions(+), 1 deletion(-) Index: linux-2.6/drivers/base/power/main.c === --- linux-2.6.orig/drivers/base/power/main.c +++ linux-2.6/drivers/base/power/main.c @@ -59,6 +59,26 @@ static DECLARE_RWSEM(pm_sleep_rwsem); int (*platform_enable_wakeup)(struct device *dev, int is_on); +static struct task_struct *suspending_task; +static DEFINE_MUTEX(suspending_task_mtx); + +bool in_suspend_context(void) +{ + bool result; + + mutex_lock(suspending_task_mtx); + result = (suspending_task == current); + mutex_unlock(suspending_task_mtx); + return result; +} + +static void set_suspending_task(struct task_struct *task) +{ + mutex_lock(suspending_task_mtx); + suspending_task = task; + mutex_unlock(suspending_task_mtx); +} + /** * device_pm_add - add a device to the list of active devices * @dev: Device to be added to the list @@ -272,6 +292,7 @@ static void dpm_resume(void) mutex_lock(dpm_list_mtx); } mutex_unlock(dpm_list_mtx); + set_suspending_task(NULL); } /** @@ -460,6 +481,7 @@ static int dpm_suspend(pm_message_t stat { int error = 0; + set_suspending_task(current); mutex_lock(dpm_list_mtx); while (!list_empty(dpm_locked)) { struct list_head *entry = dpm_locked.prev; Index: linux-2.6/drivers/base/core.c === --- linux-2.6.orig/drivers/base/core.c +++ linux-2.6/drivers/base/core.c @@ -1162,7 +1162,10 @@ void device_destroy(struct class *class, dev = class_find_device(class, devt, __match_devt); if (dev) { put_device(dev); - device_unregister(dev); + if (in_suspend_context()) + device_pm_schedule_removal(dev); + else + device_unregister(dev); } } EXPORT_SYMBOL_GPL(device_destroy); Index: linux-2.6/drivers/base/power/power.h === --- linux-2.6.orig/drivers/base/power/power.h +++ linux-2.6/drivers/base/power/power.h @@ -11,6 +11,7 @@ static inline struct device *to_device(s return container_of(entry, struct device, power.entry); } +extern bool in_suspend_context(void); extern void device_pm_add(struct device *); extern void device_pm_remove(struct device *); extern int pm_sleep_lock(void); @@ -18,6 +19,10 @@ extern void pm_sleep_unlock(void); #else /* CONFIG_PM_SLEEP */ +static inline bool in_suspend_context(void) +{ + return false; +} static inline void device_pm_add(struct device *dev) { -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bug 10030] Suspend doesn't work when SD card is inserted
On Wed, 20 Feb 2008, Rafael J. Wysocki wrote: Well, below is an uncompiled and untested but illustrating the idea that might allow people not to bother with device_pm_schedule_removal() explicitly and can fix the issue at hand. [There are some cases that need handling and are not covered here.] Please have a look. Thanks, Rafael +static struct task_struct *suspending_task; +static DEFINE_MUTEX(suspending_task_mtx); I suspect you don't really need this mutex. +bool in_suspend_context(void) +{ + bool result; + + mutex_lock(suspending_task_mtx); + result = (suspending_task == current); + mutex_unlock(suspending_task_mtx); + return result; +} If suspending_task == current then you are guaranteed to be serialized, because everything a single task does is serial. @@ -1162,7 +1162,10 @@ void device_destroy(struct class *class, dev = class_find_device(class, devt, __match_devt); if (dev) { put_device(dev); - device_unregister(dev); + if (in_suspend_context()) + device_pm_schedule_removal(dev); + else + device_unregister(dev); } } But what about device_del()? Can a similar change be made there? Alan Stern -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bug 10030] Suspend doesn't work when SD card is inserted
On Wednesday, 20 of February 2008, Alan Stern wrote: On Wed, 20 Feb 2008, Rafael J. Wysocki wrote: Well, below is an uncompiled and untested but illustrating the idea that might allow people not to bother with device_pm_schedule_removal() explicitly and can fix the issue at hand. [There are some cases that need handling and are not covered here.] Please have a look. Thanks, Rafael +static struct task_struct *suspending_task; +static DEFINE_MUTEX(suspending_task_mtx); I suspect you don't really need this mutex. +bool in_suspend_context(void) +{ + bool result; + + mutex_lock(suspending_task_mtx); + result = (suspending_task == current); + mutex_unlock(suspending_task_mtx); + return result; +} If suspending_task == current then you are guaranteed to be serialized, because everything a single task does is serial. But in principle there could be a concurrent thread removind the device and that should block on dev-sem held by us. Right now that's not very likely to happen thanks to the freezer, but we're doing all this stuff, because we want to get rid of the freezer eventually. :-) @@ -1162,7 +1162,10 @@ void device_destroy(struct class *class, dev = class_find_device(class, devt, __match_devt); if (dev) { put_device(dev); - device_unregister(dev); + if (in_suspend_context()) + device_pm_schedule_removal(dev); + else + device_unregister(dev); } } But what about device_del()? Can a similar change be made there? I believe so. I'm working on a more complete patch right now. Thanks, Rafael -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bug 10030] Suspend doesn't work when SD card is inserted
On Wednesday, 20 of February 2008, Alan Stern wrote: On Wed, 20 Feb 2008, Rafael J. Wysocki wrote: Well, below is an uncompiled and untested but illustrating the idea that might allow people not to bother with device_pm_schedule_removal() explicitly and can fix the issue at hand. [There are some cases that need handling and are not covered here.] Please have a look. Thanks, Rafael +static struct task_struct *suspending_task; +static DEFINE_MUTEX(suspending_task_mtx); I suspect you don't really need this mutex. +bool in_suspend_context(void) +{ + bool result; + + mutex_lock(suspending_task_mtx); + result = (suspending_task == current); + mutex_unlock(suspending_task_mtx); + return result; +} If suspending_task == current then you are guaranteed to be serialized, because everything a single task does is serial. As I said before (but that doesn't seem to reach the list, so I'm repeating), this is to protect other tasks from reading an inconsistent value of suspending_task in case they attempt to remove a device concurrently with respect to us. While this is not likely to happen right now, because of the freezer, it may very well happen when the freezer is finally removed. @@ -1162,7 +1162,10 @@ void device_destroy(struct class *class, dev = class_find_device(class, devt, __match_devt); if (dev) { put_device(dev); - device_unregister(dev); + if (in_suspend_context()) + device_pm_schedule_removal(dev); + else + device_unregister(dev); } } But what about device_del()? Can a similar change be made there? Something like the patch below, perhaps? Again, untested, but compiled this time. :-) Thanks, Rafael --- drivers/base/core.c|5 + drivers/base/power/main.c | 22 ++ drivers/base/power/power.h |5 + 3 files changed, 32 insertions(+) Index: linux-2.6/drivers/base/power/main.c === --- linux-2.6.orig/drivers/base/power/main.c +++ linux-2.6/drivers/base/power/main.c @@ -59,6 +59,26 @@ static DECLARE_RWSEM(pm_sleep_rwsem); int (*platform_enable_wakeup)(struct device *dev, int is_on); +static struct task_struct *suspending_task; +static DEFINE_MUTEX(suspending_task_mtx); + +bool in_suspend_context(void) +{ + bool result; + + mutex_lock(suspending_task_mtx); + result = (suspending_task == current); + mutex_unlock(suspending_task_mtx); + return result; +} + +static void set_suspending_task(struct task_struct *task) +{ + mutex_lock(suspending_task_mtx); + suspending_task = task; + mutex_unlock(suspending_task_mtx); +} + /** * device_pm_add - add a device to the list of active devices * @dev: Device to be added to the list @@ -272,6 +292,7 @@ static void dpm_resume(void) mutex_lock(dpm_list_mtx); } mutex_unlock(dpm_list_mtx); + set_suspending_task(NULL); } /** @@ -460,6 +481,7 @@ static int dpm_suspend(pm_message_t stat { int error = 0; + set_suspending_task(current); mutex_lock(dpm_list_mtx); while (!list_empty(dpm_locked)) { struct list_head *entry = dpm_locked.prev; Index: linux-2.6/drivers/base/core.c === --- linux-2.6.orig/drivers/base/core.c +++ linux-2.6/drivers/base/core.c @@ -929,6 +929,11 @@ void device_del(struct device *dev) struct device *parent = dev-parent; struct class_interface *class_intf; + if (in_suspend_context()) { + get_device(dev); + device_pm_schedule_removal(dev); + return; + } device_pm_remove(dev); if (parent) klist_del(dev-knode_parent); Index: linux-2.6/drivers/base/power/power.h === --- linux-2.6.orig/drivers/base/power/power.h +++ linux-2.6/drivers/base/power/power.h @@ -11,6 +11,7 @@ static inline struct device *to_device(s return container_of(entry, struct device, power.entry); } +extern bool in_suspend_context(void); extern void device_pm_add(struct device *); extern void device_pm_remove(struct device *); extern int pm_sleep_lock(void); @@ -18,6 +19,10 @@ extern void pm_sleep_unlock(void); #else /* CONFIG_PM_SLEEP */ +static inline bool in_suspend_context(void) +{ + return false; +} static inline void device_pm_add(struct device *dev) { -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/