Re: Bugs in MMC [was: [Bug 10030] Suspend doesn't work when SD card is inserted]

2008-02-25 Thread Felipe Balbi


> > What's the purpose of the card-counting loop in
> > host/omap.c:mmc_omap_switch_handler()? It looks like dead code.
> >
>
> I'm not too familiar with that driver, but they've been playing around
> with multiplexing several cards into one controller. Might be bits and
> pieces of that.
>

AFAIK, that came after mmc multislot support. Omap2 has one mmc
controller  for several (2 in practice) mm slots (cards).
It's not really a dead code since it's used in n800 and n810, it's in
linux-omap archives and git tree.

Carlos Aguiar, added in the loop, might have some comments about it.

--
Best Regards,
Felipe Balbi
[EMAIL PROTECTED]
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Bugs in MMC [was: [Bug 10030] Suspend doesn't work when SD card is inserted]

2008-02-25 Thread Alan Stern
On Mon, 25 Feb 2008, Pierre Ossman wrote:

> > What do you think of the patch attached to comment #40 in the Bugzilla 
> > entry?
> > 
> 
> Looks ok. As long as those two synchronization points are guaranteed,
> then I'm happy.

Maybe a better approach would be to leave the workqueue unfreezable,
and keep cancel_delayed_work_sync() in mmc_suspend_host().  It would
then be necessary to add a test to verify, if there is a card attached,
that the card is indeed suspended.  After all, it's possible that the
cancel_delayed_work_sync() ended up waiting for a job already running
on the workqueue to register a new card!  (The same would be true even 
with flush_scheduled_work.)

Also, as a bit of defensive programming, it might be a good idea to add
a "suspended" flag to the mmc_host structure.  If mmc_rescan() sees
that the flag is set then it should return immediately.  This would
protect against host drivers that aren't careful to disable detect
IRQs before calling mmc_suspend_host().

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Bugs in MMC [was: [Bug 10030] Suspend doesn't work when SD card is inserted]

2008-02-25 Thread Pierre Ossman
On Mon, 25 Feb 2008 12:58:30 -0500 (EST)
Alan Stern <[EMAIL PROTECTED]> wrote:

> Thanks for the explanations.
> 
> On Mon, 25 Feb 2008, Pierre Ossman wrote:
> 
> > Trying to keep up with the PM changes is a full time job. For now I've
> > mostly ignored it as I don't even have time for the other stuff.
> > Patches welcome.
> 
> What do you think of the patch attached to comment #40 in the Bugzilla 
> entry?
> 

Looks ok. As long as those two synchronization points are guaranteed,
then I'm happy.

Rgds
-- 
 -- Pierre Ossman

  Linux kernel, MMC maintainerhttp://www.kernel.org
  PulseAudio, core developer  http://pulseaudio.org
  rdesktop, core developer  http://www.rdesktop.org
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Bugs in MMC [was: [Bug 10030] Suspend doesn't work when SD card is inserted]

2008-02-25 Thread Alan Stern
Thanks for the explanations.

On Mon, 25 Feb 2008, Pierre Ossman wrote:

> Trying to keep up with the PM changes is a full time job. For now I've
> mostly ignored it as I don't even have time for the other stuff.
> Patches welcome.

What do you think of the patch attached to comment #40 in the Bugzilla 
entry?

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Bugs in MMC [was: [Bug 10030] Suspend doesn't work when SD card is inserted]

2008-02-25 Thread Pierre Ossman
On Sun, 24 Feb 2008 10:33:34 -0500 (EST)
Alan Stern <[EMAIL PROTECTED]> wrote:

> 
> Well, the patch itself isn't really adequate; it was just a first pass 
> intended to illustrate the nature of the problem.
> 
> The problems in the MMC subsystem go deeper.
> 
> The first is the way it uses flush_workqueue().  This is almost always 
> a bad function to call, because it introduces unnecessary dependencies.  
> cancel_delayed_work_sync() is much better.
> 
> But even changing that won't solve the second issue, which is a genuine
> bug.  There is a race between detect events and suspend events.  The
> mmc_suspend_host() routine starts out by flushing the kmmcd workqueue
> before calling the host's suspend routine.  So what happens if another
> detect event occurs in between?
> 

The idea is that host drivers shouldn't do that. Once they've called
mmc_suspend_host(), then they shouldn't be poking the MMC core in any
other way. None of this is of course properly documented. :/

> This whole area of synchronization between card insertion, card
> removal, suspend, and resume needs to be thought out better.  
> Especially keeping in mind that device registration or unregistration
> during a system sleep is likely to block until the sleep is over.

Trying to keep up with the PM changes is a full time job. For now I've
mostly ignored it as I don't even have time for the other stuff.
Patches welcome.

> 
> Lastly, there are some other questions about confusing aspects of the
> subsystem.  What's the story with __mmc_claim_host()?  Is it really
> nothing more than an abortable mutex_lock()?  Why does it ever need to
> be aborted?  Why is its second argument an atomic_t instead of a normal
> int?
> 

It got that way when SDIO got in. It was needed to make it abortable to
solve a rather nasty deadlock situation. I don't remember the details
right now, but it should be in the LKML archives.

> Why are mmc_detect() and related routines described in comments as
> "Card detection callback from host"?  They don't handle card
> _detection_ -- they handle card _removal_.

They handle both.

> 
> What's the purpose of the card-counting loop in 
> host/omap.c:mmc_omap_switch_handler()?  It looks like dead code.
> 

I'm not too familiar with that driver, but they've been playing around
with multiplexing several cards into one controller. Might be bits and
pieces of that.

Rgds
-- 
 -- Pierre Ossman

  Linux kernel, MMC maintainerhttp://www.kernel.org
  PulseAudio, core developer  http://pulseaudio.org
  rdesktop, core developer  http://www.rdesktop.org
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: using long instead of atomic_t when only set/read is required (was Re: [Bug 10030] Suspend doesn't work when SD card is inserted)

2008-02-25 Thread Alan Stern
On Mon, 25 Feb 2008, Pavel Machek wrote:

> Hi!
> 
> Alan thinks that `subj` is correct...

More precisely, reads and writes of pointers are always atomic.  That 
is, if a write and a read occur concurrently, it is guaranteed that the 
read will obtain either the old or the new value of the pointer, never 
a mish-mash of the two.  If this were not so then RCU wouldn't work.

> I guess it only works as long as longs are aligned? Should it be
> written down to atomic_ops.txt?

Forget it, I'm going to withdraw the entire thing.  The whole approach 
is wrong.  More details in a new email thread, to follow soon.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-25 Thread Rafael J. Wysocki
On Monday, 25 of February 2008, Zdenek Kabelac wrote:
> Hi

Hi,

[CCs restored, added CC to Dave]

> I'm finally going to test some kernel - because I'd been trying it
> against the HEAD - but unfortunately  it looks like there is something
> seriously broken with -rc3 and sata merge - anyway - I'm going to make
> a test with this head commit 85b80ebfa4384b8ea30cc1af9617db30319a9ccd
> which should be the last one before merge of SATA.
> 
> So I'm going to check this tree with you patch:
> pm-remove-locking-from-core.patch

If that's the one from http://marc.info/?l=linux-acpi=120389632114090=2 ,
please do it.

> ---
>  drivers/base/core.c   |8 ---
>  drivers/base/power/main.c |   97 
> +++---
>  drivers/usb/core/usb.c|2
>  3 files changed, 8 insertions(+), 99 deletions(-)
> 
> Do you wan to test also the other patch ?

Yes.  Please also test the patch that Alan asked you to test here:
http://lkml.org/lkml/2008/2/23/402
It's experimantal, but it is important to us to know if you see the symptoms
(and which ones if you do) with this patch applied.

Thanks,
Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-25 Thread Rafael J. Wysocki
On Monday, 25 of February 2008, Alan Stern wrote:
> On Sun, 24 Feb 2008, Rafael J. Wysocki wrote:
> 
> > > Very subtly wrong ;-).
> > > 
> > > imagine suspending_task == 0xabcdef01. Now task "R" with current ==
> > > 0xabcd reads suspending_task while the other cpu is writing to it,
> > > and sees 0xabcd (0xef01 was not yet written) -- and mistakenly
> > > believes that  "R" == suspending_task.
> > > 
> > > I agree it is very unlikely, and it will not happen on i386. But what
> > > about just using atomic_t suspending_task, and store current->pid into
> > > it?
> > 
> > I'd rather use a lock, frankly.  For example, we can require the readers to
> > take pm_sleep_rwsem for reading in order to access that.
> 
> That certainly won't work.  Imagine what would happen when the reader 
> _was_ the suspending task.

Yeah, I've already realized it was a stupid idea. :-)

> But if you really twist my arm, I'll go along with Pavel's suggestion.

So can you do that, pretty please?

Thanks,
Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-25 Thread Rafael J. Wysocki
On Monday, 25 of February 2008, Alan Stern wrote:
> On Sun, 24 Feb 2008, Pavel Machek wrote:
> 
> > > > At the very least, you'd need rmb() before reading it and wmb() after
> > > > writing to it, but I'm not sure if that's enough on every obscure
> > > > architecture out there.
> > > 
> > > No, neither one is needed because of the way suspending_task is used.  
> > > 
> > > It's not necessary for a reader R to see the variable's actual value;  
> > > all R needs to know is whether or not suspending_task is equal to R.  
> > > Since the only process which can set suspending_task to R is R itself,
> > > and since R will set suspending_task back to NULL before releasing the
> > > write lock on pm_sleep_rwsem, there's never any ambiguity.
> > 
> > Subtle.
> > 
> > Very subtly wrong ;-).
> > 
> > imagine suspending_task == 0xabcdef01. Now task "R" with current ==
> > 0xabcd reads suspending_task while the other cpu is writing to it,
> > and sees 0xabcd (0xef01 was not yet written) -- and mistakenly
> > believes that  "R" == suspending_task.
> 
> I always thought that reads and writes of pointers are atomic, just 
> like reads and writes of longs.  Is that wrong?

That depends on the architecture.  It may be wrong on alpha, IIRC.

> Now if pointers were the same size as long long then I would agree with 
> you.

That certainly is true on x86-64.  On alpha probably too.

> > I agree it is very unlikely, and it will not happen on i386. But what
> > about just using atomic_t suspending_task, and store current->pid into
> > it?
> 
> That would work just as well.  Except that it wouldn't need to be
> atomic_t, because current->pid is always an integer (not guaranteed, I
> suppose, but that's what it is on all current architectures) and
> reads/writes of ints _are_ atomic.

That also depends on the arch, I'm afraid, and in general if we assume
something to be atomic, it's better to make the code reflect that assumption.

Thanks,
Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


using long instead of atomic_t when only set/read is required (was Re: [Bug 10030] Suspend doesn't work when SD card is inserted)

2008-02-25 Thread Pavel Machek
Hi!

Alan thinks that `subj` is correct...

> > > > At the very least, you'd need rmb() before reading it and wmb() after
> > > > writing to it, but I'm not sure if that's enough on every obscure
> > > > architecture out there.
> > > 
> > > No, neither one is needed because of the way suspending_task is used.  
> > > 
> > > It's not necessary for a reader R to see the variable's actual value;  
> > > all R needs to know is whether or not suspending_task is equal to R.  
> > > Since the only process which can set suspending_task to R is R itself,
> > > and since R will set suspending_task back to NULL before releasing the
> > > write lock on pm_sleep_rwsem, there's never any ambiguity.
> > 
> > Subtle.
> > 
> > Very subtly wrong ;-).
> > 
> > imagine suspending_task == 0xabcdef01. Now task "R" with current ==
> > 0xabcd reads suspending_task while the other cpu is writing to it,
> > and sees 0xabcd (0xef01 was not yet written) -- and mistakenly
> > believes that  "R" == suspending_task.
> 
> I always thought that reads and writes of pointers are atomic, just 
> like reads and writes of longs.  Is that wrong?

...but I'm not that sure. Can someone clarify?

I guess it only works as long as longs are aligned? Should it be
written down to atomic_ops.txt?
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-25 Thread Rafael J. Wysocki
On Monday, 25 of February 2008, Alan Stern wrote:
 On Sun, 24 Feb 2008, Pavel Machek wrote:
 
At the very least, you'd need rmb() before reading it and wmb() after
writing to it, but I'm not sure if that's enough on every obscure
architecture out there.
   
   No, neither one is needed because of the way suspending_task is used.  
   
   It's not necessary for a reader R to see the variable's actual value;  
   all R needs to know is whether or not suspending_task is equal to R.  
   Since the only process which can set suspending_task to R is R itself,
   and since R will set suspending_task back to NULL before releasing the
   write lock on pm_sleep_rwsem, there's never any ambiguity.
  
  Subtle.
  
  Very subtly wrong ;-).
  
  imagine suspending_task == 0xabcdef01. Now task R with current ==
  0xabcd reads suspending_task while the other cpu is writing to it,
  and sees 0xabcd (0xef01 was not yet written) -- and mistakenly
  believes that  R == suspending_task.
 
 I always thought that reads and writes of pointers are atomic, just 
 like reads and writes of longs.  Is that wrong?

That depends on the architecture.  It may be wrong on alpha, IIRC.

 Now if pointers were the same size as long long then I would agree with 
 you.

That certainly is true on x86-64.  On alpha probably too.

  I agree it is very unlikely, and it will not happen on i386. But what
  about just using atomic_t suspending_task, and store current-pid into
  it?
 
 That would work just as well.  Except that it wouldn't need to be
 atomic_t, because current-pid is always an integer (not guaranteed, I
 suppose, but that's what it is on all current architectures) and
 reads/writes of ints _are_ atomic.

That also depends on the arch, I'm afraid, and in general if we assume
something to be atomic, it's better to make the code reflect that assumption.

Thanks,
Rafael
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-25 Thread Rafael J. Wysocki
On Monday, 25 of February 2008, Alan Stern wrote:
 On Sun, 24 Feb 2008, Rafael J. Wysocki wrote:
 
   Very subtly wrong ;-).
   
   imagine suspending_task == 0xabcdef01. Now task R with current ==
   0xabcd reads suspending_task while the other cpu is writing to it,
   and sees 0xabcd (0xef01 was not yet written) -- and mistakenly
   believes that  R == suspending_task.
   
   I agree it is very unlikely, and it will not happen on i386. But what
   about just using atomic_t suspending_task, and store current-pid into
   it?
  
  I'd rather use a lock, frankly.  For example, we can require the readers to
  take pm_sleep_rwsem for reading in order to access that.
 
 That certainly won't work.  Imagine what would happen when the reader 
 _was_ the suspending task.

Yeah, I've already realized it was a stupid idea. :-)

 But if you really twist my arm, I'll go along with Pavel's suggestion.

So can you do that, pretty please?

Thanks,
Rafael
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


using long instead of atomic_t when only set/read is required (was Re: [Bug 10030] Suspend doesn't work when SD card is inserted)

2008-02-25 Thread Pavel Machek
Hi!

Alan thinks that `subj` is correct...

At the very least, you'd need rmb() before reading it and wmb() after
writing to it, but I'm not sure if that's enough on every obscure
architecture out there.
   
   No, neither one is needed because of the way suspending_task is used.  
   
   It's not necessary for a reader R to see the variable's actual value;  
   all R needs to know is whether or not suspending_task is equal to R.  
   Since the only process which can set suspending_task to R is R itself,
   and since R will set suspending_task back to NULL before releasing the
   write lock on pm_sleep_rwsem, there's never any ambiguity.
  
  Subtle.
  
  Very subtly wrong ;-).
  
  imagine suspending_task == 0xabcdef01. Now task R with current ==
  0xabcd reads suspending_task while the other cpu is writing to it,
  and sees 0xabcd (0xef01 was not yet written) -- and mistakenly
  believes that  R == suspending_task.
 
 I always thought that reads and writes of pointers are atomic, just 
 like reads and writes of longs.  Is that wrong?

...but I'm not that sure. Can someone clarify?

I guess it only works as long as longs are aligned? Should it be
written down to atomic_ops.txt?
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-25 Thread Rafael J. Wysocki
On Monday, 25 of February 2008, Zdenek Kabelac wrote:
 Hi

Hi,

[CCs restored, added CC to Dave]

 I'm finally going to test some kernel - because I'd been trying it
 against the HEAD - but unfortunately  it looks like there is something
 seriously broken with -rc3 and sata merge - anyway - I'm going to make
 a test with this head commit 85b80ebfa4384b8ea30cc1af9617db30319a9ccd
 which should be the last one before merge of SATA.
 
 So I'm going to check this tree with you patch:
 pm-remove-locking-from-core.patch

If that's the one from http://marc.info/?l=linux-acpim=120389632114090w=2 ,
please do it.

 ---
  drivers/base/core.c   |8 ---
  drivers/base/power/main.c |   97 
 +++---
  drivers/usb/core/usb.c|2
  3 files changed, 8 insertions(+), 99 deletions(-)
 
 Do you wan to test also the other patch ?

Yes.  Please also test the patch that Alan asked you to test here:
http://lkml.org/lkml/2008/2/23/402
It's experimantal, but it is important to us to know if you see the symptoms
(and which ones if you do) with this patch applied.

Thanks,
Rafael
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: using long instead of atomic_t when only set/read is required (was Re: [Bug 10030] Suspend doesn't work when SD card is inserted)

2008-02-25 Thread Alan Stern
On Mon, 25 Feb 2008, Pavel Machek wrote:

 Hi!
 
 Alan thinks that `subj` is correct...

More precisely, reads and writes of pointers are always atomic.  That 
is, if a write and a read occur concurrently, it is guaranteed that the 
read will obtain either the old or the new value of the pointer, never 
a mish-mash of the two.  If this were not so then RCU wouldn't work.

 I guess it only works as long as longs are aligned? Should it be
 written down to atomic_ops.txt?

Forget it, I'm going to withdraw the entire thing.  The whole approach 
is wrong.  More details in a new email thread, to follow soon.

Alan Stern

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Bugs in MMC [was: [Bug 10030] Suspend doesn't work when SD card is inserted]

2008-02-25 Thread Pierre Ossman
On Sun, 24 Feb 2008 10:33:34 -0500 (EST)
Alan Stern [EMAIL PROTECTED] wrote:

 
 Well, the patch itself isn't really adequate; it was just a first pass 
 intended to illustrate the nature of the problem.
 
 The problems in the MMC subsystem go deeper.
 
 The first is the way it uses flush_workqueue().  This is almost always 
 a bad function to call, because it introduces unnecessary dependencies.  
 cancel_delayed_work_sync() is much better.
 
 But even changing that won't solve the second issue, which is a genuine
 bug.  There is a race between detect events and suspend events.  The
 mmc_suspend_host() routine starts out by flushing the kmmcd workqueue
 before calling the host's suspend routine.  So what happens if another
 detect event occurs in between?
 

The idea is that host drivers shouldn't do that. Once they've called
mmc_suspend_host(), then they shouldn't be poking the MMC core in any
other way. None of this is of course properly documented. :/

 This whole area of synchronization between card insertion, card
 removal, suspend, and resume needs to be thought out better.  
 Especially keeping in mind that device registration or unregistration
 during a system sleep is likely to block until the sleep is over.

Trying to keep up with the PM changes is a full time job. For now I've
mostly ignored it as I don't even have time for the other stuff.
Patches welcome.

 
 Lastly, there are some other questions about confusing aspects of the
 subsystem.  What's the story with __mmc_claim_host()?  Is it really
 nothing more than an abortable mutex_lock()?  Why does it ever need to
 be aborted?  Why is its second argument an atomic_t instead of a normal
 int?
 

It got that way when SDIO got in. It was needed to make it abortable to
solve a rather nasty deadlock situation. I don't remember the details
right now, but it should be in the LKML archives.

 Why are mmc_detect() and related routines described in comments as
 Card detection callback from host?  They don't handle card
 _detection_ -- they handle card _removal_.

They handle both.

 
 What's the purpose of the card-counting loop in 
 host/omap.c:mmc_omap_switch_handler()?  It looks like dead code.
 

I'm not too familiar with that driver, but they've been playing around
with multiplexing several cards into one controller. Might be bits and
pieces of that.

Rgds
-- 
 -- Pierre Ossman

  Linux kernel, MMC maintainerhttp://www.kernel.org
  PulseAudio, core developer  http://pulseaudio.org
  rdesktop, core developer  http://www.rdesktop.org
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Bugs in MMC [was: [Bug 10030] Suspend doesn't work when SD card is inserted]

2008-02-25 Thread Alan Stern
Thanks for the explanations.

On Mon, 25 Feb 2008, Pierre Ossman wrote:

 Trying to keep up with the PM changes is a full time job. For now I've
 mostly ignored it as I don't even have time for the other stuff.
 Patches welcome.

What do you think of the patch attached to comment #40 in the Bugzilla 
entry?

Alan Stern

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Bugs in MMC [was: [Bug 10030] Suspend doesn't work when SD card is inserted]

2008-02-25 Thread Pierre Ossman
On Mon, 25 Feb 2008 12:58:30 -0500 (EST)
Alan Stern [EMAIL PROTECTED] wrote:

 Thanks for the explanations.
 
 On Mon, 25 Feb 2008, Pierre Ossman wrote:
 
  Trying to keep up with the PM changes is a full time job. For now I've
  mostly ignored it as I don't even have time for the other stuff.
  Patches welcome.
 
 What do you think of the patch attached to comment #40 in the Bugzilla 
 entry?
 

Looks ok. As long as those two synchronization points are guaranteed,
then I'm happy.

Rgds
-- 
 -- Pierre Ossman

  Linux kernel, MMC maintainerhttp://www.kernel.org
  PulseAudio, core developer  http://pulseaudio.org
  rdesktop, core developer  http://www.rdesktop.org
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Bugs in MMC [was: [Bug 10030] Suspend doesn't work when SD card is inserted]

2008-02-25 Thread Alan Stern
On Mon, 25 Feb 2008, Pierre Ossman wrote:

  What do you think of the patch attached to comment #40 in the Bugzilla 
  entry?
  
 
 Looks ok. As long as those two synchronization points are guaranteed,
 then I'm happy.

Maybe a better approach would be to leave the workqueue unfreezable,
and keep cancel_delayed_work_sync() in mmc_suspend_host().  It would
then be necessary to add a test to verify, if there is a card attached,
that the card is indeed suspended.  After all, it's possible that the
cancel_delayed_work_sync() ended up waiting for a job already running
on the workqueue to register a new card!  (The same would be true even 
with flush_scheduled_work.)

Also, as a bit of defensive programming, it might be a good idea to add
a suspended flag to the mmc_host structure.  If mmc_rescan() sees
that the flag is set then it should return immediately.  This would
protect against host drivers that aren't careful to disable detect
IRQs before calling mmc_suspend_host().

Alan Stern

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Bugs in MMC [was: [Bug 10030] Suspend doesn't work when SD card is inserted]

2008-02-25 Thread Felipe Balbi
snip

  What's the purpose of the card-counting loop in
  host/omap.c:mmc_omap_switch_handler()? It looks like dead code.
 

 I'm not too familiar with that driver, but they've been playing around
 with multiplexing several cards into one controller. Might be bits and
 pieces of that.


AFAIK, that came after mmc multislot support. Omap2 has one mmc
controller  for several (2 in practice) mm slots (cards).
It's not really a dead code since it's used in n800 and n810, it's in
linux-omap archives and git tree.

Carlos Aguiar, added in the loop, might have some comments about it.

--
Best Regards,
Felipe Balbi
[EMAIL PROTECTED]
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-24 Thread Alan Stern
On Sun, 24 Feb 2008, Rafael J. Wysocki wrote:

> > Very subtly wrong ;-).
> > 
> > imagine suspending_task == 0xabcdef01. Now task "R" with current ==
> > 0xabcd reads suspending_task while the other cpu is writing to it,
> > and sees 0xabcd (0xef01 was not yet written) -- and mistakenly
> > believes that  "R" == suspending_task.
> > 
> > I agree it is very unlikely, and it will not happen on i386. But what
> > about just using atomic_t suspending_task, and store current->pid into
> > it?
> 
> I'd rather use a lock, frankly.  For example, we can require the readers to
> take pm_sleep_rwsem for reading in order to access that.

That certainly won't work.  Imagine what would happen when the reader 
_was_ the suspending task.

But if you really twist my arm, I'll go along with Pavel's suggestion.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-24 Thread Alan Stern
On Sun, 24 Feb 2008, Pavel Machek wrote:

> > > At the very least, you'd need rmb() before reading it and wmb() after
> > > writing to it, but I'm not sure if that's enough on every obscure
> > > architecture out there.
> > 
> > No, neither one is needed because of the way suspending_task is used.  
> > 
> > It's not necessary for a reader R to see the variable's actual value;  
> > all R needs to know is whether or not suspending_task is equal to R.  
> > Since the only process which can set suspending_task to R is R itself,
> > and since R will set suspending_task back to NULL before releasing the
> > write lock on pm_sleep_rwsem, there's never any ambiguity.
> 
> Subtle.
> 
> Very subtly wrong ;-).
> 
> imagine suspending_task == 0xabcdef01. Now task "R" with current ==
> 0xabcd reads suspending_task while the other cpu is writing to it,
> and sees 0xabcd (0xef01 was not yet written) -- and mistakenly
> believes that  "R" == suspending_task.

I always thought that reads and writes of pointers are atomic, just 
like reads and writes of longs.  Is that wrong?

Now if pointers were the same size as long long then I would agree with 
you.

> I agree it is very unlikely, and it will not happen on i386. But what
> about just using atomic_t suspending_task, and store current->pid into
> it?

That would work just as well.  Except that it wouldn't need to be
atomic_t, because current->pid is always an integer (not guaranteed, I
suppose, but that's what it is on all current architectures) and
reads/writes of ints _are_ atomic.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-24 Thread Rafael J. Wysocki
On Sunday, 24 of February 2008, Pavel Machek wrote:
> On Sun 2008-02-24 15:33:01, Alan Stern wrote:
> > On Sun, 24 Feb 2008, Pavel Machek wrote:
> > 
> > > > > What locking protects this variable? What happens when suspending_task
> > > > > exits? (Hmm, that would probably be bug, anyway?)
> > > > 
> > > > It's protected by whatever existing locking scheme allows only one
> > > > task to start a system sleep at a time.  For example, the suspending 
> > > > task has to get a write lock on pm_sleep_rwsem.
> > > 
> > > And readers of suspending_task are protected by?
> > 
> > I added a comment about that too.
> > 
> > > At the very least, you'd need rmb() before reading it and wmb() after
> > > writing to it, but I'm not sure if that's enough on every obscure
> > > architecture out there.
> > 
> > No, neither one is needed because of the way suspending_task is used.  
> > 
> > It's not necessary for a reader R to see the variable's actual value;  
> > all R needs to know is whether or not suspending_task is equal to R.  
> > Since the only process which can set suspending_task to R is R itself,
> > and since R will set suspending_task back to NULL before releasing the
> > write lock on pm_sleep_rwsem, there's never any ambiguity.
> 
> Subtle.
> 
> Very subtly wrong ;-).
> 
> imagine suspending_task == 0xabcdef01. Now task "R" with current ==
> 0xabcd reads suspending_task while the other cpu is writing to it,
> and sees 0xabcd (0xef01 was not yet written) -- and mistakenly
> believes that  "R" == suspending_task.
> 
> I agree it is very unlikely, and it will not happen on i386. But what
> about just using atomic_t suspending_task, and store current->pid into
> it?

I'd rather use a lock, frankly.  For example, we can require the readers to
take pm_sleep_rwsem for reading in order to access that.

Thanks,
Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-24 Thread Rafael J. Wysocki
On Sunday, 24 of February 2008, Alan Stern wrote:
> On Sun, 24 Feb 2008, Rafael J. Wysocki wrote:
> 
> > From: Rafael J. Wysocki <[EMAIL PROTECTED]>
> > 
> > Remove the code that acquires all device semaphores from the suspend
> > code path as it causes multiple problems to appear (most notably,
> > http://bugzilla.kernel.org/show_bug.cgi?id=10030) and revert the
> > change introduced by commit 4145ed6dc597a9bea5f6ae8c574653b2de10620f
> > depending on the code being removed.
> > 
> > Remove pm_sleep_lock()/pm_sleep_unlock() from device_add() to avoid
> > the issue reported at http://bugzilla.kernel.org/show_bug.cgi?id=9874.
> 
> This looks okay.
> 
> Are you going to submit it just for 2.6.25, leaving the existing code 
> as-is for 2.6.26?

No, I'd like to start over on top of 2.6.25.  That will be much cleaner from
the upstream point of view.

We seem to have all the pieces anyway, so that's only a matter of putting them
back together.

Thanks,
Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-24 Thread Pavel Machek
On Sun 2008-02-24 15:33:01, Alan Stern wrote:
> On Sun, 24 Feb 2008, Pavel Machek wrote:
> 
> > > > What locking protects this variable? What happens when suspending_task
> > > > exits? (Hmm, that would probably be bug, anyway?)
> > > 
> > > It's protected by whatever existing locking scheme allows only one
> > > task to start a system sleep at a time.  For example, the suspending 
> > > task has to get a write lock on pm_sleep_rwsem.
> > 
> > And readers of suspending_task are protected by?
> 
> I added a comment about that too.
> 
> > At the very least, you'd need rmb() before reading it and wmb() after
> > writing to it, but I'm not sure if that's enough on every obscure
> > architecture out there.
> 
> No, neither one is needed because of the way suspending_task is used.  
> 
> It's not necessary for a reader R to see the variable's actual value;  
> all R needs to know is whether or not suspending_task is equal to R.  
> Since the only process which can set suspending_task to R is R itself,
> and since R will set suspending_task back to NULL before releasing the
> write lock on pm_sleep_rwsem, there's never any ambiguity.

Subtle.

Very subtly wrong ;-).

imagine suspending_task == 0xabcdef01. Now task "R" with current ==
0xabcd reads suspending_task while the other cpu is writing to it,
and sees 0xabcd (0xef01 was not yet written) -- and mistakenly
believes that  "R" == suspending_task.

I agree it is very unlikely, and it will not happen on i386. But what
about just using atomic_t suspending_task, and store current->pid into
it?
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-24 Thread Alan Stern
On Sun, 24 Feb 2008, Rafael J. Wysocki wrote:

> From: Rafael J. Wysocki <[EMAIL PROTECTED]>
> 
> Remove the code that acquires all device semaphores from the suspend
> code path as it causes multiple problems to appear (most notably,
> http://bugzilla.kernel.org/show_bug.cgi?id=10030) and revert the
> change introduced by commit 4145ed6dc597a9bea5f6ae8c574653b2de10620f
> depending on the code being removed.
> 
> Remove pm_sleep_lock()/pm_sleep_unlock() from device_add() to avoid
> the issue reported at http://bugzilla.kernel.org/show_bug.cgi?id=9874.

This looks okay.

Are you going to submit it just for 2.6.25, leaving the existing code 
as-is for 2.6.26?

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-24 Thread Rafael J. Wysocki
On Sunday, 24 of February 2008, Alan Stern wrote:
> On Sun, 24 Feb 2008, Rafael J. Wysocki wrote:
> 
> > From: Rafael J. Wysocki <[EMAIL PROTECTED]>
> > 
> > Remove the code that acquires all device semaphores from the suspend
> > code path as it causes multiple problems to appear (most notably,
> > http://bugzilla.kernel.org/show_bug.cgi?id=10030) and revert the
> > change introduced by commit 4145ed6dc597a9bea5f6ae8c574653b2de10620f
> > depending on the code being removed.
> > 
> > Remove pm_sleep_lock()/pm_sleep_unlock() from device add to avoid
> > the issue reported at http://bugzilla.kernel.org/show_bug.cgi?id=9874.
> 
> Do you actually know whether removing that lock fixes the reported bug?

In fact I'm waiting for the confirmation, but given the symptoms described by
Lukas I'm quite confident that the problem shouldn't be reproducible with
the lock removed.

> I agree, the lock should be removed for now.  But I'd sure like to get 
> some feedback about what's going wrong.

No question about that.

> It's starting to look as though we'd be a lot better off blocking device
> registration during sleep instead of failing it.

Agreed.

> Shouldn't resume_device() and suspend_device() now acquire the device 
> semaphore before calling the various methods?  That's the way they 
> used to work.

Yes, thanks for reminding me about that.

Updated patch follows.

Thanks,
Rafael


---
From: Rafael J. Wysocki <[EMAIL PROTECTED]>

Remove the code that acquires all device semaphores from the suspend
code path as it causes multiple problems to appear (most notably,
http://bugzilla.kernel.org/show_bug.cgi?id=10030) and revert the
change introduced by commit 4145ed6dc597a9bea5f6ae8c574653b2de10620f
depending on the code being removed.

Remove pm_sleep_lock()/pm_sleep_unlock() from device_add() to avoid
the issue reported at http://bugzilla.kernel.org/show_bug.cgi?id=9874.

Signed-off-by: Rafael J. Wysocki <[EMAIL PROTECTED]>
---
 drivers/base/core.c   |8 ---
 drivers/base/power/main.c |  106 ++
 drivers/usb/core/usb.c|2 
 3 files changed, 17 insertions(+), 99 deletions(-)

Index: linux-2.6/drivers/base/power/main.c
===
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -48,7 +48,6 @@
  */
 
 LIST_HEAD(dpm_active);
-static LIST_HEAD(dpm_locked);
 static LIST_HEAD(dpm_off);
 static LIST_HEAD(dpm_off_irq);
 static LIST_HEAD(dpm_destroy);
@@ -81,28 +80,6 @@ void device_pm_add(struct device *dev)
  */
 void device_pm_remove(struct device *dev)
 {
-   /*
-* If this function is called during a suspend, it will be blocked,
-* because we're holding the device's semaphore at that time, which may
-* lead to a deadlock.  In that case we want to print a warning.
-* However, it may also be called by unregister_dropped_devices() with
-* the device's semaphore released, in which case the warning should
-* not be printed.
-*/
-   if (down_trylock(>sem)) {
-   if (down_read_trylock(_sleep_rwsem)) {
-   /* No suspend in progress, wait on dev->sem */
-   down(>sem);
-   up_read(_sleep_rwsem);
-   } else {
-   /* Suspend in progress, we may deadlock */
-   dev_warn(dev, "Suspicious %s during suspend\n",
-   __FUNCTION__);
-   dump_stack();
-   /* The user has been warned ... */
-   down(>sem);
-   }
-   }
pr_debug("PM: Removing info for %s:%s\n",
 dev->bus ? dev->bus->name : "No Bus",
 kobject_name(>kobj));
@@ -110,7 +87,6 @@ void device_pm_remove(struct device *dev
dpm_sysfs_remove(dev);
list_del_init(>power.entry);
mutex_unlock(_list_mtx);
-   up(>sem);
 }
 
 /**
@@ -230,6 +206,8 @@ static int resume_device(struct device *
TRACE_DEVICE(dev);
TRACE_RESUME(0);
 
+   down(>sem);
+
if (dev->bus && dev->bus->resume) {
dev_dbg(dev,"resuming\n");
error = dev->bus->resume(dev);
@@ -245,6 +223,8 @@ static int resume_device(struct device *
error = dev->class->resume(dev);
}
 
+   up(>sem);
+
TRACE_RESUME(error);
return error;
 }
@@ -266,7 +246,7 @@ static void dpm_resume(void)
struct list_head *entry = dpm_off.next;
struct device *dev = to_device(entry);
 
-   list_move_tail(entry, _locked);
+   list_move_tail(entry, _active);
mutex_unlock(_list_mtx);
resume_device(dev);
mutex_lock(_list_mtx);
@@ -275,25 +255,6 @@ static void dpm_resume(void)
 }
 
 /**
- * unlock_all_devices - Release each device's semaphore
- *
- * Go 

Re: [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-24 Thread Alan Stern
On Sun, 24 Feb 2008, Rafael J. Wysocki wrote:

> From: Rafael J. Wysocki <[EMAIL PROTECTED]>
> 
> Remove the code that acquires all device semaphores from the suspend
> code path as it causes multiple problems to appear (most notably,
> http://bugzilla.kernel.org/show_bug.cgi?id=10030) and revert the
> change introduced by commit 4145ed6dc597a9bea5f6ae8c574653b2de10620f
> depending on the code being removed.
> 
> Remove pm_sleep_lock()/pm_sleep_unlock() from device add to avoid
> the issue reported at http://bugzilla.kernel.org/show_bug.cgi?id=9874.

Do you actually know whether removing that lock fixes the reported bug?

I agree, the lock should be removed for now.  But I'd sure like to get 
some feedback about what's going wrong.  It's starting to look as 
though we'd be a lot better off blocking device registration during 
sleep instead of failing it.

Shouldn't resume_device() and suspend_device() now acquire the device 
semaphore before calling the various methods?  That's the way they 
used to work.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-24 Thread Alan Stern
On Sun, 24 Feb 2008, Pavel Machek wrote:

> > > What locking protects this variable? What happens when suspending_task
> > > exits? (Hmm, that would probably be bug, anyway?)
> > 
> > It's protected by whatever existing locking scheme allows only one
> > task to start a system sleep at a time.  For example, the suspending 
> > task has to get a write lock on pm_sleep_rwsem.
> 
> And readers of suspending_task are protected by?

I added a comment about that too.

> At the very least, you'd need rmb() before reading it and wmb() after
> writing to it, but I'm not sure if that's enough on every obscure
> architecture out there.

No, neither one is needed because of the way suspending_task is used.  

It's not necessary for a reader R to see the variable's actual value;  
all R needs to know is whether or not suspending_task is equal to R.  
Since the only process which can set suspending_task to R is R itself,
and since R will set suspending_task back to NULL before releasing the
write lock on pm_sleep_rwsem, there's never any ambiguity.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-24 Thread Rafael J. Wysocki
On Sunday, 24 of February 2008, Rafael J. Wysocki wrote:
> On Sunday, 24 of February 2008, Alan Stern wrote:
> > On Sun, 24 Feb 2008, Rafael J. Wysocki wrote:
> > 
> > > On Sunday, 24 of February 2008, Alan Stern wrote:
> > > > On Sat, 23 Feb 2008, Rafael J. Wysocki wrote:
> > > > 
> > > > > Ultimately no, it's not.  However, we are now late in the -rc2 time 
> > > > > frame and
> > > > > the release of -rc3 seems to be imminent.  At this point, IMO, that's 
> > > > > the
> > > > > safest thing to do.  BTW, appended is the patch I'd like to get 
> > > > > applied.
> > > > 
> > > > In the interest of having a safe release, I guess you're right.
> > > 
> > > That's what I'm concerned about at the moment.  I'm afraid there won't be
> > > enough time to nail down all the issues (there may be some we're not even
> > > aware of).
> > 
> > All right.  You'll need to enlarge your big reversal patch by reverting 
> > commit 4145ed6dc597a9bea5f6ae8c574653b2de10620f.
> 
> Thanks for the hint.
> 
> In fact, I had to add a couple of fixes in device_power_down() and
> dpm_suspend().  The entire patch is appended.

I hope it's fine by everyone to push the patch below to Greg for 2.6.25.

Thanks,
Rafael


---
From: Rafael J. Wysocki <[EMAIL PROTECTED]>

Remove the code that acquires all device semaphores from the suspend
code path as it causes multiple problems to appear (most notably,
http://bugzilla.kernel.org/show_bug.cgi?id=10030) and revert the
change introduced by commit 4145ed6dc597a9bea5f6ae8c574653b2de10620f
depending on the code being removed.

Remove pm_sleep_lock()/pm_sleep_unlock() from device add to avoid
the issue reported at http://bugzilla.kernel.org/show_bug.cgi?id=9874.

Signed-off-by: Rafael J. Wysocki <[EMAIL PROTECTED]>
---
 drivers/base/core.c   |8 ---
 drivers/base/power/main.c |   97 +++---
 drivers/usb/core/usb.c|2 
 3 files changed, 8 insertions(+), 99 deletions(-)

Index: linux-2.6/drivers/base/power/main.c
===
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -48,7 +48,6 @@
  */
 
 LIST_HEAD(dpm_active);
-static LIST_HEAD(dpm_locked);
 static LIST_HEAD(dpm_off);
 static LIST_HEAD(dpm_off_irq);
 static LIST_HEAD(dpm_destroy);
@@ -81,28 +80,6 @@ void device_pm_add(struct device *dev)
  */
 void device_pm_remove(struct device *dev)
 {
-   /*
-* If this function is called during a suspend, it will be blocked,
-* because we're holding the device's semaphore at that time, which may
-* lead to a deadlock.  In that case we want to print a warning.
-* However, it may also be called by unregister_dropped_devices() with
-* the device's semaphore released, in which case the warning should
-* not be printed.
-*/
-   if (down_trylock(>sem)) {
-   if (down_read_trylock(_sleep_rwsem)) {
-   /* No suspend in progress, wait on dev->sem */
-   down(>sem);
-   up_read(_sleep_rwsem);
-   } else {
-   /* Suspend in progress, we may deadlock */
-   dev_warn(dev, "Suspicious %s during suspend\n",
-   __FUNCTION__);
-   dump_stack();
-   /* The user has been warned ... */
-   down(>sem);
-   }
-   }
pr_debug("PM: Removing info for %s:%s\n",
 dev->bus ? dev->bus->name : "No Bus",
 kobject_name(>kobj));
@@ -110,7 +87,6 @@ void device_pm_remove(struct device *dev
dpm_sysfs_remove(dev);
list_del_init(>power.entry);
mutex_unlock(_list_mtx);
-   up(>sem);
 }
 
 /**
@@ -266,7 +242,7 @@ static void dpm_resume(void)
struct list_head *entry = dpm_off.next;
struct device *dev = to_device(entry);
 
-   list_move_tail(entry, _locked);
+   list_move_tail(entry, _active);
mutex_unlock(_list_mtx);
resume_device(dev);
mutex_lock(_list_mtx);
@@ -275,25 +251,6 @@ static void dpm_resume(void)
 }
 
 /**
- * unlock_all_devices - Release each device's semaphore
- *
- * Go through the dpm_off list.  Put each device on the dpm_active
- * list and unlock it.
- */
-static void unlock_all_devices(void)
-{
-   mutex_lock(_list_mtx);
-   while (!list_empty(_locked)) {
-   struct list_head *entry = dpm_locked.prev;
-   struct device *dev = to_device(entry);
-
-   list_move(entry, _active);
-   up(>sem);
-   }
-   mutex_unlock(_list_mtx);
-}
-
-/**
  * unregister_dropped_devices - Unregister devices scheduled for removal
  *
  * Unregister all devices on the dpm_destroy list.
@@ -305,7 +262,6 @@ static void unregister_dropped_devices(v

Re: [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-24 Thread Rafael J. Wysocki
On Sunday, 24 of February 2008, Zdenek Kabelac wrote:
> 2008/2/24, Alan Stern <[EMAIL PROTECTED]>:
> > On Sun, 24 Feb 2008, Rafael J. Wysocki wrote:
> >
> >
> > > > You know, with this new patch we probably don't need
> >  > > device_pm_schedule_removal() any more.
> >  >
> >  > No, we don't.  However, because of that dpm_suspend() and 
> > device_power_down()
> >  > need to be changed not to assume that the removed devices will end up on
> >  > dpm_destroy.
> >
> >
> >  with analogous changes to dpm_suspend().
> >
> >  Any more suggestions for updates?
> >
> 
> Is there any current patch I should actually test - it looks there is
> ongoing discussion from the time Rafael has proposed his last patch to
> check by me: http://bugzilla.kernel.org/attachment.cgi?id=14961=view
> 
> But I think it's visible the only suspend update will not resolve my
> problems and the problems is deeper in the mmc.

Yes, there are problems in there, but your feedback is needed nevertheless.
 
> So should I still test yesterdays' patch ?

Yes please.  It will be valuable information to us whether it fixes the suspend
issue at least.

Thanks,
Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-24 Thread Pavel Machek
Hi!

> > > @@ -25,6 +25,7 @@
> > >  #include 
> > >  #include 
> > >  #include 
> > > +#include 
> > >  
> > >  #include "../base.h"
> > >  #include "power.h"
> > > @@ -59,6 +60,13 @@ static DECLARE_RWSEM(pm_sleep_rwsem);
> > >  
> > >  int (*platform_enable_wakeup)(struct device *dev, int is_on);
> > >  
> > > +static struct task_struct *suspending_task;
> > 
> > What locking protects this variable? What happens when suspending_task
> > exits? (Hmm, that would probably be bug, anyway?)
> 
> It's protected by whatever existing locking scheme allows only one
> task to start a system sleep at a time.  For example, the suspending 
> task has to get a write lock on pm_sleep_rwsem.

And readers of suspending_task are protected by?

At the very least, you'd need rmb() before reading it and wmb() after
writing to it, but I'm not sure if that's enough on every obscure
architecture out there.

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-24 Thread Zdenek Kabelac
2008/2/24, Alan Stern <[EMAIL PROTECTED]>:
> On Sun, 24 Feb 2008, Rafael J. Wysocki wrote:
>
>
> > > You know, with this new patch we probably don't need
>  > > device_pm_schedule_removal() any more.
>  >
>  > No, we don't.  However, because of that dpm_suspend() and 
> device_power_down()
>  > need to be changed not to assume that the removed devices will end up on
>  > dpm_destroy.
>
>
>  with analogous changes to dpm_suspend().
>
>  Any more suggestions for updates?
>

Is there any current patch I should actually test - it looks there is
ongoing discussion from the time Rafael has proposed his last patch to
check by me: http://bugzilla.kernel.org/attachment.cgi?id=14961=view

But I think it's visible the only suspend update will not resolve my
problems and the problems is deeper in the mmc.

So should I still test yesterdays' patch ?

Zdenek

Zdenek
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-24 Thread Alan Stern
On Sun, 24 Feb 2008, Rafael J. Wysocki wrote:

> > You know, with this new patch we probably don't need 
> > device_pm_schedule_removal() any more.
> 
> No, we don't.  However, because of that dpm_suspend() and device_power_down()
> need to be changed not to assume that the removed devices will end up on
> dpm_destroy.

Yes.  The safest approach will be to check whether the device is still 
at the end of the dpm_locked or dpm_off list; if it hasn't then we can 
assume that it has been removed (or scheduled for removal).

> IMO you also should add this change in device_power_down():
> 
> @@ -388,18 +343,15 @@ int device_power_down(pm_message_t state
>   struct list_head *entry = dpm_off.prev;
>   struct device *dev = to_device(entry);
>  
> - list_del_init(>power.entry);
>   error = suspend_device_late(dev, state);
>   if (error) {
>   printk(KERN_ERR "Could not power down device %s: "
>   "error %d\n",
>   kobject_name(>kobj), error);
> - if (list_empty(>power.entry))
> - list_add(>power.entry, _off);
>   break;
>   }
> - if (list_empty(>power.entry))
> - list_add(>power.entry, _off_irq);
> + if (!list_empty(>power.entry))
> + list_move(>power.entry, _off_irq);
>   }

Actually I changed the last two lines to:

+
+   /* Don't move the entry if the device has been unregistered */
+   if (entry == dpm_off.prev)
+   list_move(entry, _off_irq);

with analogous changes to dpm_suspend().

Any more suggestions for updates?

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-24 Thread Alan Stern
On Sun, 24 Feb 2008, Pavel Machek wrote:

> Hi!
> 
> > Index: usb-2.6/drivers/base/power/main.c
> > ===
> > --- usb-2.6.orig/drivers/base/power/main.c
> > +++ usb-2.6/drivers/base/power/main.c
> > @@ -25,6 +25,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  
> >  #include "../base.h"
> >  #include "power.h"
> > @@ -59,6 +60,13 @@ static DECLARE_RWSEM(pm_sleep_rwsem);
> >  
> >  int (*platform_enable_wakeup)(struct device *dev, int is_on);
> >  
> > +static struct task_struct *suspending_task;
> 
> What locking protects this variable? What happens when suspending_task
> exits? (Hmm, that would probably be bug, anyway?)

It's protected by whatever existing locking scheme allows only one
task to start a system sleep at a time.  For example, the suspending 
task has to get a write lock on pm_sleep_rwsem.

Yes, if the suspending task exits before the system has woken up, 
you're in trouble regardless.

> Or are we running UP when this is accessed? This at least needs a big
> fat comment.
> 
> > +bool in_suspend_context(void)
> > +{
> > +   return (suspending_task == current);
> > +}

We aren't necessarily UP.  But since all that matters is whether or not 
suspend_task is equal to the current task, no extra locking is needed.  

I'll add a comment explaining all this.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-24 Thread Pavel Machek
Hi!

> Index: usb-2.6/drivers/base/power/main.c
> ===
> --- usb-2.6.orig/drivers/base/power/main.c
> +++ usb-2.6/drivers/base/power/main.c
> @@ -25,6 +25,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include "../base.h"
>  #include "power.h"
> @@ -59,6 +60,13 @@ static DECLARE_RWSEM(pm_sleep_rwsem);
>  
>  int (*platform_enable_wakeup)(struct device *dev, int is_on);
>  
> +static struct task_struct *suspending_task;

What locking protects this variable? What happens when suspending_task
exits? (Hmm, that would probably be bug, anyway?)

Or are we running UP when this is accessed? This at least needs a big
fat comment.

> +bool in_suspend_context(void)
> +{
> + return (suspending_task == current);
> +}


Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Bugs in MMC [was: [Bug 10030] Suspend doesn't work when SD card is inserted]

2008-02-24 Thread Alan Stern
On Sun, 24 Feb 2008, Rafael J. Wysocki wrote:

> > In fact this turns out to be exactly the problem with the MMC
> > subsystem.  It uses a dedicated workqueue (kmmcd) to handle state
> > changes, and mmc_suspend_host() does a flush_workqueue().  If the
> > workqueue contains an entry to register or unregister a device, this is
> > guaranteed to deadlock -- and that's exactly what happens when Zdenek 
> > inserts or removes a card during the disk-drive spindown time of a 
> > system sleep.
> > 
> > It looks like the best solution is for mmc_suspend_host() not to flush
> > the workqueue; instead the workqueue should prevent itself from
> > running during a suspend.  Right now the easiest way to accomplish this
> > is for the workqueue routines (mmc_detect() and siblings) to acquire
> > the mmc_host's device semaphore.  If in the future the MMC subsystem
> > adds dynamic PM support then a more complicated arrangement will be 
> > needed.
> > 
> > So the patch below, in combination with the previous patch, might just 
> > fix the whole problem.
> 
> The patch looks sane and I think you're right.

Well, the patch itself isn't really adequate; it was just a first pass 
intended to illustrate the nature of the problem.

The problems in the MMC subsystem go deeper.

The first is the way it uses flush_workqueue().  This is almost always 
a bad function to call, because it introduces unnecessary dependencies.  
cancel_delayed_work_sync() is much better.

But even changing that won't solve the second issue, which is a genuine
bug.  There is a race between detect events and suspend events.  The
mmc_suspend_host() routine starts out by flushing the kmmcd workqueue
before calling the host's suspend routine.  So what happens if another
detect event occurs in between?

This whole area of synchronization between card insertion, card
removal, suspend, and resume needs to be thought out better.  
Especially keeping in mind that device registration or unregistration
during a system sleep is likely to block until the sleep is over.

Lastly, there are some other questions about confusing aspects of the
subsystem.  What's the story with __mmc_claim_host()?  Is it really
nothing more than an abortable mutex_lock()?  Why does it ever need to
be aborted?  Why is its second argument an atomic_t instead of a normal
int?

Why are mmc_detect() and related routines described in comments as
"Card detection callback from host"?  They don't handle card
_detection_ -- they handle card _removal_.

What's the purpose of the card-counting loop in 
host/omap.c:mmc_omap_switch_handler()?  It looks like dead code.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [linux-pm] [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-24 Thread Rafael J. Wysocki
On Sunday, 24 of February 2008, Alan Stern wrote:
> On Sat, 23 Feb 2008, Alan Stern wrote:
> 
> > It happened in a workqueue.  There could be lots of similar cases: Some 
> > interrupt-driven event causes a hotplug action.  Since the action can't 
> > be carried out in interrupt context, the driver has no choice but to 
> > defer it to a workqueue or kernel thread.  Blocking that workqueue or 
> > kernel thread won't cause a problem _unless_ the driver's 
> > suspend/resume methods try to synchronize with it.  That's the 
> > difficult case.
> 
> In fact this turns out to be exactly the problem with the MMC
> subsystem.  It uses a dedicated workqueue (kmmcd) to handle state
> changes, and mmc_suspend_host() does a flush_workqueue().  If the
> workqueue contains an entry to register or unregister a device, this is
> guaranteed to deadlock -- and that's exactly what happens when Zdenek 
> inserts or removes a card during the disk-drive spindown time of a 
> system sleep.
> 
> It looks like the best solution is for mmc_suspend_host() not to flush
> the workqueue; instead the workqueue should prevent itself from
> running during a suspend.  Right now the easiest way to accomplish this
> is for the workqueue routines (mmc_detect() and siblings) to acquire
> the mmc_host's device semaphore.  If in the future the MMC subsystem
> adds dynamic PM support then a more complicated arrangement will be 
> needed.
> 
> So the patch below, in combination with the previous patch, might just 
> fix the whole problem.

The patch looks sane and I think you're right.

Still, it depends on us holding the device semaphores (if I understand it
correctly).  For this reason, I'd prefer to include it into the patch that adds
device locking to the PM core.

I think we should first remove the device locking from
drivers/base/power/main.c (that's the patch I'd like to push upstream now) and
then create a patch that will add it back along with all of the necessary
additional changes we know of (BTW, I've just received a message from another
user affected by it).

but we need to be careful with
all the details.  IMO, it would be 

Thanks,
Rafael

 
> Index: usb-2.6/drivers/mmc/core/core.c
> ===
> --- usb-2.6.orig/drivers/mmc/core/core.c
> +++ usb-2.6/drivers/mmc/core/core.c
> @@ -731,8 +731,6 @@ void mmc_stop_host(struct mmc_host *host
>   */
>  int mmc_suspend_host(struct mmc_host *host, pm_message_t state)
>  {
> - mmc_flush_scheduled_work();
> -
>   mmc_bus_get(host);
>   if (host->bus_ops && !host->bus_dead) {
>   if (host->bus_ops->suspend)
> Index: usb-2.6/drivers/mmc/core/mmc.c
> ===
> --- usb-2.6.orig/drivers/mmc/core/mmc.c
> +++ usb-2.6/drivers/mmc/core/mmc.c
> @@ -441,6 +441,7 @@ static void mmc_detect(struct mmc_host *
>   BUG_ON(!host);
>   BUG_ON(!host->card);
>  
> + down(_classdev(host)->sem);
>   mmc_claim_host(host);
>  
>   /*
> @@ -449,6 +450,7 @@ static void mmc_detect(struct mmc_host *
>   err = mmc_send_status(host->card, NULL);
>  
>   mmc_release_host(host);
> + up(_classdev(host)->sem);
>  
>   if (err) {
>   mmc_remove(host);
> Index: usb-2.6/drivers/mmc/core/sd.c
> ===
> --- usb-2.6.orig/drivers/mmc/core/sd.c
> +++ usb-2.6/drivers/mmc/core/sd.c
> @@ -500,6 +500,7 @@ static void mmc_sd_detect(struct mmc_hos
>   BUG_ON(!host);
>   BUG_ON(!host->card);
>  
> + down(_classdev(host)->sem);
>   mmc_claim_host(host);
>  
>   /*
> @@ -508,6 +509,7 @@ static void mmc_sd_detect(struct mmc_hos
>   err = mmc_send_status(host->card, NULL);
>  
>   mmc_release_host(host);
> + up(_classdev(host)->sem);
>  
>   if (err) {
>   mmc_sd_remove(host);
> Index: usb-2.6/drivers/mmc/core/sdio.c
> ===
> --- usb-2.6.orig/drivers/mmc/core/sdio.c
> +++ usb-2.6/drivers/mmc/core/sdio.c
> @@ -195,6 +195,7 @@ static void mmc_sdio_detect(struct mmc_h
>   BUG_ON(!host);
>   BUG_ON(!host->card);
>  
> + down(_classdev(host)->sem);
>   mmc_claim_host(host);
>  
>   /*
> @@ -203,6 +204,7 @@ static void mmc_sdio_detect(struct mmc_h
>   err = mmc_select_card(host->card);
>  
>   mmc_release_host(host);
> + up(_classdev(host)->sem);
>  
>   if (err) {
>   mmc_sdio_remove(host);
> 
> 
> 



-- 
"Premature optimization is the root of all evil." - Donald Knuth
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-24 Thread Rafael J. Wysocki
On Sunday, 24 of February 2008, Alan Stern wrote:
> On Sun, 24 Feb 2008, Rafael J. Wysocki wrote:
> > On Sunday, 24 of February 2008, Alan Stern wrote:
> > > On Sat, 23 Feb 2008, Rafael J. Wysocki wrote:
> 
> > > I still have no clear idea about what's going on with the ACPI bug in 
> > > #9874.
> > 
> > It seems that the ACPI code is not prepared to cope with a failing device
> > registration, in which case it'd need fixing.  Frankly, I'm afraid there may
> > be more issues like this throughout the tree.
> 
> The fact remains that it is unsafe to register a device during a sleep 
> transition, although if the device's driver doesn't have a suspend or 
> resume method you can probably get away with it.
> 
> > > However perhaps the bug wouldn't occur if we blocked device  
> > > registration until after the resume was finished, instead of failing it 
> > > outright.  Since the registration in this case was done in a workqueue 
> > > thread, blocking wouldn't cause an immediate deadlock.
> > 
> > No, it wouldn't, but the fact that it happens in the ACPI code makes me 
> > worry.
> 
> It happened in a workqueue.  There could be lots of similar cases: Some 
> interrupt-driven event causes a hotplug action.  Since the action can't 
> be carried out in interrupt context, the driver has no choice but to 
> defer it to a workqueue or kernel thread.  Blocking that workqueue or 
> kernel thread won't cause a problem _unless_ the driver's 
> suspend/resume methods try to synchronize with it.  That's the 
> difficult case.

Yes.

> > If we block that code and the things it's supposed to do turn out to be
> > necessary for suspending later on, we'll be in trouble.
> 
> Exactly.
> 
> > > @@ -94,14 +102,15 @@ void device_pm_remove(struct device *dev
> > >   /* No suspend in progress, wait on dev->sem */
> > >   down(>sem);
> > >   up_read(_sleep_rwsem);
> > > - } else {
> > > - /* Suspend in progress, we may deadlock */
> > > + } else if (!in_suspend_context()) {
> > > + /* Suspending in another task, we may deadlock */
> > 
> > Well, that shouldn't really deadlock.  If it does, there is a potential 
> > design
> > issue somewhere.  I think it might be better to set up a timer in here too.
> 
> At this point the driver core owns the device semaphore, so the 
> unregistration task will block until the sleep is over.  If the 
> driver's resume method has to synchronize with the unregistration task 
> then it will deadlock.  I agree, it would be a design issue.  In fact, 
> it's the same design issue described earlier in this email.
> 
> > Although IMO it would be even better to just set up a timer and remove the
> > warning altogehter.
> 
> That warning was just for debugging purposes.  A timer in the resume
> path could do the same thing with fewer false alarms, just like the 
> timer in the suspend path.  I have added it to the new patch.

OK

> > >   dev_warn(dev, "Suspicious %s during suspend\n",
> > >   __FUNCTION__);
> > >   dump_stack();
> > >   /* The user has been warned ... */
> > >   down(>sem);
> > >   }
> > > + /* Otherwise we're okay */
> > >   }
> > >   pr_debug("PM: Removing info for %s:%s\n",
> > >dev->bus ? dev->bus->name : "No Bus",
> > 
> > There's a problem here, because we shouldn't release the semaphore if we're
> > in the suspend context.
> 
> Yes, you're right.  It's fixed in the new version.
> 
> > > +static void suspend_timeout(unsigned long _dev)
> > > +{
> > > + struct device *dev = (struct device *) _dev;
> > > +
> > > + dev_err(dev, "deadlock during suspend!\n");
> > > + show_state();
> > 
> > The show_state() seems to be overkill and won't really help if the user 
> > can't
> > collect the output on the fly using a serial console or something like this.
> > [The debug stuff printed here should fit a typical laptop screen, ideally.]
> 
> Changed.
> 
> > I'd prefer
> > 
> > if (in_suspend_context()) {
> > __device_release_driver(dev);
> > } else {
> > down(>sem);
> > __device_release_driver(dev);
> > up(>sem);
> > }
> 
> Okay.
> 
> You know, with this new patch we probably don't need 
> device_pm_schedule_removal() any more.

No, we don't.  However, because of that dpm_suspend() and device_power_down()
need to be changed not to assume that the removed devices will end up on
dpm_destroy.

> However I have left it in for now.

Well, it's used by the b43, leds and hwrng drivers as well as by some CPU
hotplug notifiers.  They all will have to be changed along with removing it.

> > Well, I'd really feel much more comfortable if we removed the troublesome 
> > code
> > for 2.6.25 and reintroduced it along with the above safeguards for 2.6.26.
> > 
> > Of course, this doesn't mean we can't send debug patches for testing to 

Re: [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-24 Thread Rafael J. Wysocki
On Sunday, 24 of February 2008, Alan Stern wrote:
> On Sun, 24 Feb 2008, Rafael J. Wysocki wrote:
> 
> > On Sunday, 24 of February 2008, Alan Stern wrote:
> > > On Sat, 23 Feb 2008, Rafael J. Wysocki wrote:
> > > 
> > > > Ultimately no, it's not.  However, we are now late in the -rc2 time 
> > > > frame and
> > > > the release of -rc3 seems to be imminent.  At this point, IMO, that's 
> > > > the
> > > > safest thing to do.  BTW, appended is the patch I'd like to get applied.
> > > 
> > > In the interest of having a safe release, I guess you're right.
> > 
> > That's what I'm concerned about at the moment.  I'm afraid there won't be
> > enough time to nail down all the issues (there may be some we're not even
> > aware of).
> 
> All right.  You'll need to enlarge your big reversal patch by reverting 
> commit 4145ed6dc597a9bea5f6ae8c574653b2de10620f.

Thanks for the hint.

In fact, I had to add a couple of fixes in device_power_down() and
dpm_suspend().  The entire patch is appended.

I'll comment your new patch in a separate message.

Thanks,
Rafael

---
 drivers/base/power/main.c |   97 +++---
 drivers/usb/core/usb.c|2 
 2 files changed, 8 insertions(+), 91 deletions(-)

Index: linux-2.6/drivers/base/power/main.c
===
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -48,7 +48,6 @@
  */
 
 LIST_HEAD(dpm_active);
-static LIST_HEAD(dpm_locked);
 static LIST_HEAD(dpm_off);
 static LIST_HEAD(dpm_off_irq);
 static LIST_HEAD(dpm_destroy);
@@ -81,28 +80,6 @@ void device_pm_add(struct device *dev)
  */
 void device_pm_remove(struct device *dev)
 {
-   /*
-* If this function is called during a suspend, it will be blocked,
-* because we're holding the device's semaphore at that time, which may
-* lead to a deadlock.  In that case we want to print a warning.
-* However, it may also be called by unregister_dropped_devices() with
-* the device's semaphore released, in which case the warning should
-* not be printed.
-*/
-   if (down_trylock(>sem)) {
-   if (down_read_trylock(_sleep_rwsem)) {
-   /* No suspend in progress, wait on dev->sem */
-   down(>sem);
-   up_read(_sleep_rwsem);
-   } else {
-   /* Suspend in progress, we may deadlock */
-   dev_warn(dev, "Suspicious %s during suspend\n",
-   __FUNCTION__);
-   dump_stack();
-   /* The user has been warned ... */
-   down(>sem);
-   }
-   }
pr_debug("PM: Removing info for %s:%s\n",
 dev->bus ? dev->bus->name : "No Bus",
 kobject_name(>kobj));
@@ -110,7 +87,6 @@ void device_pm_remove(struct device *dev
dpm_sysfs_remove(dev);
list_del_init(>power.entry);
mutex_unlock(_list_mtx);
-   up(>sem);
 }
 
 /**
@@ -266,7 +242,7 @@ static void dpm_resume(void)
struct list_head *entry = dpm_off.next;
struct device *dev = to_device(entry);
 
-   list_move_tail(entry, _locked);
+   list_move_tail(entry, _active);
mutex_unlock(_list_mtx);
resume_device(dev);
mutex_lock(_list_mtx);
@@ -275,25 +251,6 @@ static void dpm_resume(void)
 }
 
 /**
- * unlock_all_devices - Release each device's semaphore
- *
- * Go through the dpm_off list.  Put each device on the dpm_active
- * list and unlock it.
- */
-static void unlock_all_devices(void)
-{
-   mutex_lock(_list_mtx);
-   while (!list_empty(_locked)) {
-   struct list_head *entry = dpm_locked.prev;
-   struct device *dev = to_device(entry);
-
-   list_move(entry, _active);
-   up(>sem);
-   }
-   mutex_unlock(_list_mtx);
-}
-
-/**
  * unregister_dropped_devices - Unregister devices scheduled for removal
  *
  * Unregister all devices on the dpm_destroy list.
@@ -305,7 +262,6 @@ static void unregister_dropped_devices(v
struct list_head *entry = dpm_destroy.next;
struct device *dev = to_device(entry);
 
-   up(>sem);
mutex_unlock(_list_mtx);
/* This also removes the device from the list */
device_unregister(dev);
@@ -324,7 +280,6 @@ void device_resume(void)
 {
might_sleep();
dpm_resume();
-   unlock_all_devices();
unregister_dropped_devices();
up_write(_sleep_rwsem);
 }
@@ -388,18 +343,15 @@ int device_power_down(pm_message_t state
struct list_head *entry = dpm_off.prev;
struct device *dev = to_device(entry);
 
-   list_del_init(>power.entry);
error = 

Re: [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-24 Thread Rafael J. Wysocki
On Sunday, 24 of February 2008, Alan Stern wrote:
 On Sun, 24 Feb 2008, Rafael J. Wysocki wrote:
 
  On Sunday, 24 of February 2008, Alan Stern wrote:
   On Sat, 23 Feb 2008, Rafael J. Wysocki wrote:
   
Ultimately no, it's not.  However, we are now late in the -rc2 time 
frame and
the release of -rc3 seems to be imminent.  At this point, IMO, that's 
the
safest thing to do.  BTW, appended is the patch I'd like to get applied.
   
   In the interest of having a safe release, I guess you're right.
  
  That's what I'm concerned about at the moment.  I'm afraid there won't be
  enough time to nail down all the issues (there may be some we're not even
  aware of).
 
 All right.  You'll need to enlarge your big reversal patch by reverting 
 commit 4145ed6dc597a9bea5f6ae8c574653b2de10620f.

Thanks for the hint.

In fact, I had to add a couple of fixes in device_power_down() and
dpm_suspend().  The entire patch is appended.

I'll comment your new patch in a separate message.

Thanks,
Rafael

---
 drivers/base/power/main.c |   97 +++---
 drivers/usb/core/usb.c|2 
 2 files changed, 8 insertions(+), 91 deletions(-)

Index: linux-2.6/drivers/base/power/main.c
===
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -48,7 +48,6 @@
  */
 
 LIST_HEAD(dpm_active);
-static LIST_HEAD(dpm_locked);
 static LIST_HEAD(dpm_off);
 static LIST_HEAD(dpm_off_irq);
 static LIST_HEAD(dpm_destroy);
@@ -81,28 +80,6 @@ void device_pm_add(struct device *dev)
  */
 void device_pm_remove(struct device *dev)
 {
-   /*
-* If this function is called during a suspend, it will be blocked,
-* because we're holding the device's semaphore at that time, which may
-* lead to a deadlock.  In that case we want to print a warning.
-* However, it may also be called by unregister_dropped_devices() with
-* the device's semaphore released, in which case the warning should
-* not be printed.
-*/
-   if (down_trylock(dev-sem)) {
-   if (down_read_trylock(pm_sleep_rwsem)) {
-   /* No suspend in progress, wait on dev-sem */
-   down(dev-sem);
-   up_read(pm_sleep_rwsem);
-   } else {
-   /* Suspend in progress, we may deadlock */
-   dev_warn(dev, Suspicious %s during suspend\n,
-   __FUNCTION__);
-   dump_stack();
-   /* The user has been warned ... */
-   down(dev-sem);
-   }
-   }
pr_debug(PM: Removing info for %s:%s\n,
 dev-bus ? dev-bus-name : No Bus,
 kobject_name(dev-kobj));
@@ -110,7 +87,6 @@ void device_pm_remove(struct device *dev
dpm_sysfs_remove(dev);
list_del_init(dev-power.entry);
mutex_unlock(dpm_list_mtx);
-   up(dev-sem);
 }
 
 /**
@@ -266,7 +242,7 @@ static void dpm_resume(void)
struct list_head *entry = dpm_off.next;
struct device *dev = to_device(entry);
 
-   list_move_tail(entry, dpm_locked);
+   list_move_tail(entry, dpm_active);
mutex_unlock(dpm_list_mtx);
resume_device(dev);
mutex_lock(dpm_list_mtx);
@@ -275,25 +251,6 @@ static void dpm_resume(void)
 }
 
 /**
- * unlock_all_devices - Release each device's semaphore
- *
- * Go through the dpm_off list.  Put each device on the dpm_active
- * list and unlock it.
- */
-static void unlock_all_devices(void)
-{
-   mutex_lock(dpm_list_mtx);
-   while (!list_empty(dpm_locked)) {
-   struct list_head *entry = dpm_locked.prev;
-   struct device *dev = to_device(entry);
-
-   list_move(entry, dpm_active);
-   up(dev-sem);
-   }
-   mutex_unlock(dpm_list_mtx);
-}
-
-/**
  * unregister_dropped_devices - Unregister devices scheduled for removal
  *
  * Unregister all devices on the dpm_destroy list.
@@ -305,7 +262,6 @@ static void unregister_dropped_devices(v
struct list_head *entry = dpm_destroy.next;
struct device *dev = to_device(entry);
 
-   up(dev-sem);
mutex_unlock(dpm_list_mtx);
/* This also removes the device from the list */
device_unregister(dev);
@@ -324,7 +280,6 @@ void device_resume(void)
 {
might_sleep();
dpm_resume();
-   unlock_all_devices();
unregister_dropped_devices();
up_write(pm_sleep_rwsem);
 }
@@ -388,18 +343,15 @@ int device_power_down(pm_message_t state
struct list_head *entry = dpm_off.prev;
struct device *dev = to_device(entry);
 
-   list_del_init(dev-power.entry);
error = 

Re: [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-24 Thread Rafael J. Wysocki
On Sunday, 24 of February 2008, Alan Stern wrote:
 On Sun, 24 Feb 2008, Rafael J. Wysocki wrote:
  On Sunday, 24 of February 2008, Alan Stern wrote:
   On Sat, 23 Feb 2008, Rafael J. Wysocki wrote:
 
   I still have no clear idea about what's going on with the ACPI bug in 
   #9874.
  
  It seems that the ACPI code is not prepared to cope with a failing device
  registration, in which case it'd need fixing.  Frankly, I'm afraid there may
  be more issues like this throughout the tree.
 
 The fact remains that it is unsafe to register a device during a sleep 
 transition, although if the device's driver doesn't have a suspend or 
 resume method you can probably get away with it.
 
   However perhaps the bug wouldn't occur if we blocked device  
   registration until after the resume was finished, instead of failing it 
   outright.  Since the registration in this case was done in a workqueue 
   thread, blocking wouldn't cause an immediate deadlock.
  
  No, it wouldn't, but the fact that it happens in the ACPI code makes me 
  worry.
 
 It happened in a workqueue.  There could be lots of similar cases: Some 
 interrupt-driven event causes a hotplug action.  Since the action can't 
 be carried out in interrupt context, the driver has no choice but to 
 defer it to a workqueue or kernel thread.  Blocking that workqueue or 
 kernel thread won't cause a problem _unless_ the driver's 
 suspend/resume methods try to synchronize with it.  That's the 
 difficult case.

Yes.

  If we block that code and the things it's supposed to do turn out to be
  necessary for suspending later on, we'll be in trouble.
 
 Exactly.
 
   @@ -94,14 +102,15 @@ void device_pm_remove(struct device *dev
 /* No suspend in progress, wait on dev-sem */
 down(dev-sem);
 up_read(pm_sleep_rwsem);
   - } else {
   - /* Suspend in progress, we may deadlock */
   + } else if (!in_suspend_context()) {
   + /* Suspending in another task, we may deadlock */
  
  Well, that shouldn't really deadlock.  If it does, there is a potential 
  design
  issue somewhere.  I think it might be better to set up a timer in here too.
 
 At this point the driver core owns the device semaphore, so the 
 unregistration task will block until the sleep is over.  If the 
 driver's resume method has to synchronize with the unregistration task 
 then it will deadlock.  I agree, it would be a design issue.  In fact, 
 it's the same design issue described earlier in this email.
 
  Although IMO it would be even better to just set up a timer and remove the
  warning altogehter.
 
 That warning was just for debugging purposes.  A timer in the resume
 path could do the same thing with fewer false alarms, just like the 
 timer in the suspend path.  I have added it to the new patch.

OK

 dev_warn(dev, Suspicious %s during suspend\n,
 __FUNCTION__);
 dump_stack();
 /* The user has been warned ... */
 down(dev-sem);
 }
   + /* Otherwise we're okay */
 }
 pr_debug(PM: Removing info for %s:%s\n,
  dev-bus ? dev-bus-name : No Bus,
  
  There's a problem here, because we shouldn't release the semaphore if we're
  in the suspend context.
 
 Yes, you're right.  It's fixed in the new version.
 
   +static void suspend_timeout(unsigned long _dev)
   +{
   + struct device *dev = (struct device *) _dev;
   +
   + dev_err(dev, deadlock during suspend!\n);
   + show_state();
  
  The show_state() seems to be overkill and won't really help if the user 
  can't
  collect the output on the fly using a serial console or something like this.
  [The debug stuff printed here should fit a typical laptop screen, ideally.]
 
 Changed.
 
  I'd prefer
  
  if (in_suspend_context()) {
  __device_release_driver(dev);
  } else {
  down(dev-sem);
  __device_release_driver(dev);
  up(dev-sem);
  }
 
 Okay.
 
 You know, with this new patch we probably don't need 
 device_pm_schedule_removal() any more.

No, we don't.  However, because of that dpm_suspend() and device_power_down()
need to be changed not to assume that the removed devices will end up on
dpm_destroy.

 However I have left it in for now.

Well, it's used by the b43, leds and hwrng drivers as well as by some CPU
hotplug notifiers.  They all will have to be changed along with removing it.

  Well, I'd really feel much more comfortable if we removed the troublesome 
  code
  for 2.6.25 and reintroduced it along with the above safeguards for 2.6.26.
  
  Of course, this doesn't mean we can't send debug patches for testing to the
  users who have alraedy reported problems with it. :-)
 
 Certainly!
 
 Here's the new version.

IMO you also should add this change in device_power_down():

@@ -388,18 +343,15 @@ int 

Re: [linux-pm] [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-24 Thread Rafael J. Wysocki
On Sunday, 24 of February 2008, Alan Stern wrote:
 On Sat, 23 Feb 2008, Alan Stern wrote:
 
  It happened in a workqueue.  There could be lots of similar cases: Some 
  interrupt-driven event causes a hotplug action.  Since the action can't 
  be carried out in interrupt context, the driver has no choice but to 
  defer it to a workqueue or kernel thread.  Blocking that workqueue or 
  kernel thread won't cause a problem _unless_ the driver's 
  suspend/resume methods try to synchronize with it.  That's the 
  difficult case.
 
 In fact this turns out to be exactly the problem with the MMC
 subsystem.  It uses a dedicated workqueue (kmmcd) to handle state
 changes, and mmc_suspend_host() does a flush_workqueue().  If the
 workqueue contains an entry to register or unregister a device, this is
 guaranteed to deadlock -- and that's exactly what happens when Zdenek 
 inserts or removes a card during the disk-drive spindown time of a 
 system sleep.
 
 It looks like the best solution is for mmc_suspend_host() not to flush
 the workqueue; instead the workqueue should prevent itself from
 running during a suspend.  Right now the easiest way to accomplish this
 is for the workqueue routines (mmc_detect() and siblings) to acquire
 the mmc_host's device semaphore.  If in the future the MMC subsystem
 adds dynamic PM support then a more complicated arrangement will be 
 needed.
 
 So the patch below, in combination with the previous patch, might just 
 fix the whole problem.

The patch looks sane and I think you're right.

Still, it depends on us holding the device semaphores (if I understand it
correctly).  For this reason, I'd prefer to include it into the patch that adds
device locking to the PM core.

I think we should first remove the device locking from
drivers/base/power/main.c (that's the patch I'd like to push upstream now) and
then create a patch that will add it back along with all of the necessary
additional changes we know of (BTW, I've just received a message from another
user affected by it).

but we need to be careful with
all the details.  IMO, it would be 

Thanks,
Rafael

 
 Index: usb-2.6/drivers/mmc/core/core.c
 ===
 --- usb-2.6.orig/drivers/mmc/core/core.c
 +++ usb-2.6/drivers/mmc/core/core.c
 @@ -731,8 +731,6 @@ void mmc_stop_host(struct mmc_host *host
   */
  int mmc_suspend_host(struct mmc_host *host, pm_message_t state)
  {
 - mmc_flush_scheduled_work();
 -
   mmc_bus_get(host);
   if (host-bus_ops  !host-bus_dead) {
   if (host-bus_ops-suspend)
 Index: usb-2.6/drivers/mmc/core/mmc.c
 ===
 --- usb-2.6.orig/drivers/mmc/core/mmc.c
 +++ usb-2.6/drivers/mmc/core/mmc.c
 @@ -441,6 +441,7 @@ static void mmc_detect(struct mmc_host *
   BUG_ON(!host);
   BUG_ON(!host-card);
  
 + down(mmc_classdev(host)-sem);
   mmc_claim_host(host);
  
   /*
 @@ -449,6 +450,7 @@ static void mmc_detect(struct mmc_host *
   err = mmc_send_status(host-card, NULL);
  
   mmc_release_host(host);
 + up(mmc_classdev(host)-sem);
  
   if (err) {
   mmc_remove(host);
 Index: usb-2.6/drivers/mmc/core/sd.c
 ===
 --- usb-2.6.orig/drivers/mmc/core/sd.c
 +++ usb-2.6/drivers/mmc/core/sd.c
 @@ -500,6 +500,7 @@ static void mmc_sd_detect(struct mmc_hos
   BUG_ON(!host);
   BUG_ON(!host-card);
  
 + down(mmc_classdev(host)-sem);
   mmc_claim_host(host);
  
   /*
 @@ -508,6 +509,7 @@ static void mmc_sd_detect(struct mmc_hos
   err = mmc_send_status(host-card, NULL);
  
   mmc_release_host(host);
 + up(mmc_classdev(host)-sem);
  
   if (err) {
   mmc_sd_remove(host);
 Index: usb-2.6/drivers/mmc/core/sdio.c
 ===
 --- usb-2.6.orig/drivers/mmc/core/sdio.c
 +++ usb-2.6/drivers/mmc/core/sdio.c
 @@ -195,6 +195,7 @@ static void mmc_sdio_detect(struct mmc_h
   BUG_ON(!host);
   BUG_ON(!host-card);
  
 + down(mmc_classdev(host)-sem);
   mmc_claim_host(host);
  
   /*
 @@ -203,6 +204,7 @@ static void mmc_sdio_detect(struct mmc_h
   err = mmc_select_card(host-card);
  
   mmc_release_host(host);
 + up(mmc_classdev(host)-sem);
  
   if (err) {
   mmc_sdio_remove(host);
 
 
 



-- 
Premature optimization is the root of all evil. - Donald Knuth
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Bugs in MMC [was: [Bug 10030] Suspend doesn't work when SD card is inserted]

2008-02-24 Thread Alan Stern
On Sun, 24 Feb 2008, Rafael J. Wysocki wrote:

  In fact this turns out to be exactly the problem with the MMC
  subsystem.  It uses a dedicated workqueue (kmmcd) to handle state
  changes, and mmc_suspend_host() does a flush_workqueue().  If the
  workqueue contains an entry to register or unregister a device, this is
  guaranteed to deadlock -- and that's exactly what happens when Zdenek 
  inserts or removes a card during the disk-drive spindown time of a 
  system sleep.
  
  It looks like the best solution is for mmc_suspend_host() not to flush
  the workqueue; instead the workqueue should prevent itself from
  running during a suspend.  Right now the easiest way to accomplish this
  is for the workqueue routines (mmc_detect() and siblings) to acquire
  the mmc_host's device semaphore.  If in the future the MMC subsystem
  adds dynamic PM support then a more complicated arrangement will be 
  needed.
  
  So the patch below, in combination with the previous patch, might just 
  fix the whole problem.
 
 The patch looks sane and I think you're right.

Well, the patch itself isn't really adequate; it was just a first pass 
intended to illustrate the nature of the problem.

The problems in the MMC subsystem go deeper.

The first is the way it uses flush_workqueue().  This is almost always 
a bad function to call, because it introduces unnecessary dependencies.  
cancel_delayed_work_sync() is much better.

But even changing that won't solve the second issue, which is a genuine
bug.  There is a race between detect events and suspend events.  The
mmc_suspend_host() routine starts out by flushing the kmmcd workqueue
before calling the host's suspend routine.  So what happens if another
detect event occurs in between?

This whole area of synchronization between card insertion, card
removal, suspend, and resume needs to be thought out better.  
Especially keeping in mind that device registration or unregistration
during a system sleep is likely to block until the sleep is over.

Lastly, there are some other questions about confusing aspects of the
subsystem.  What's the story with __mmc_claim_host()?  Is it really
nothing more than an abortable mutex_lock()?  Why does it ever need to
be aborted?  Why is its second argument an atomic_t instead of a normal
int?

Why are mmc_detect() and related routines described in comments as
Card detection callback from host?  They don't handle card
_detection_ -- they handle card _removal_.

What's the purpose of the card-counting loop in 
host/omap.c:mmc_omap_switch_handler()?  It looks like dead code.

Alan Stern

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-24 Thread Pavel Machek
Hi!

 Index: usb-2.6/drivers/base/power/main.c
 ===
 --- usb-2.6.orig/drivers/base/power/main.c
 +++ usb-2.6/drivers/base/power/main.c
 @@ -25,6 +25,7 @@
  #include linux/pm.h
  #include linux/resume-trace.h
  #include linux/rwsem.h
 +#include linux/sched.h
  
  #include ../base.h
  #include power.h
 @@ -59,6 +60,13 @@ static DECLARE_RWSEM(pm_sleep_rwsem);
  
  int (*platform_enable_wakeup)(struct device *dev, int is_on);
  
 +static struct task_struct *suspending_task;

What locking protects this variable? What happens when suspending_task
exits? (Hmm, that would probably be bug, anyway?)

Or are we running UP when this is accessed? This at least needs a big
fat comment.

 +bool in_suspend_context(void)
 +{
 + return (suspending_task == current);
 +}


Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-24 Thread Alan Stern
On Sun, 24 Feb 2008, Pavel Machek wrote:

 Hi!
 
  Index: usb-2.6/drivers/base/power/main.c
  ===
  --- usb-2.6.orig/drivers/base/power/main.c
  +++ usb-2.6/drivers/base/power/main.c
  @@ -25,6 +25,7 @@
   #include linux/pm.h
   #include linux/resume-trace.h
   #include linux/rwsem.h
  +#include linux/sched.h
   
   #include ../base.h
   #include power.h
  @@ -59,6 +60,13 @@ static DECLARE_RWSEM(pm_sleep_rwsem);
   
   int (*platform_enable_wakeup)(struct device *dev, int is_on);
   
  +static struct task_struct *suspending_task;
 
 What locking protects this variable? What happens when suspending_task
 exits? (Hmm, that would probably be bug, anyway?)

It's protected by whatever existing locking scheme allows only one
task to start a system sleep at a time.  For example, the suspending 
task has to get a write lock on pm_sleep_rwsem.

Yes, if the suspending task exits before the system has woken up, 
you're in trouble regardless.

 Or are we running UP when this is accessed? This at least needs a big
 fat comment.
 
  +bool in_suspend_context(void)
  +{
  +   return (suspending_task == current);
  +}

We aren't necessarily UP.  But since all that matters is whether or not 
suspend_task is equal to the current task, no extra locking is needed.  

I'll add a comment explaining all this.

Alan Stern

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-24 Thread Alan Stern
On Sun, 24 Feb 2008, Rafael J. Wysocki wrote:

  You know, with this new patch we probably don't need 
  device_pm_schedule_removal() any more.
 
 No, we don't.  However, because of that dpm_suspend() and device_power_down()
 need to be changed not to assume that the removed devices will end up on
 dpm_destroy.

Yes.  The safest approach will be to check whether the device is still 
at the end of the dpm_locked or dpm_off list; if it hasn't then we can 
assume that it has been removed (or scheduled for removal).

 IMO you also should add this change in device_power_down():
 
 @@ -388,18 +343,15 @@ int device_power_down(pm_message_t state
   struct list_head *entry = dpm_off.prev;
   struct device *dev = to_device(entry);
  
 - list_del_init(dev-power.entry);
   error = suspend_device_late(dev, state);
   if (error) {
   printk(KERN_ERR Could not power down device %s: 
   error %d\n,
   kobject_name(dev-kobj), error);
 - if (list_empty(dev-power.entry))
 - list_add(dev-power.entry, dpm_off);
   break;
   }
 - if (list_empty(dev-power.entry))
 - list_add(dev-power.entry, dpm_off_irq);
 + if (!list_empty(dev-power.entry))
 + list_move(dev-power.entry, dpm_off_irq);
   }

Actually I changed the last two lines to:

+
+   /* Don't move the entry if the device has been unregistered */
+   if (entry == dpm_off.prev)
+   list_move(entry, dpm_off_irq);

with analogous changes to dpm_suspend().

Any more suggestions for updates?

Alan Stern

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-24 Thread Zdenek Kabelac
2008/2/24, Alan Stern [EMAIL PROTECTED]:
 On Sun, 24 Feb 2008, Rafael J. Wysocki wrote:


   You know, with this new patch we probably don't need
device_pm_schedule_removal() any more.
  
   No, we don't.  However, because of that dpm_suspend() and 
 device_power_down()
   need to be changed not to assume that the removed devices will end up on
   dpm_destroy.


  with analogous changes to dpm_suspend().

  Any more suggestions for updates?


Is there any current patch I should actually test - it looks there is
ongoing discussion from the time Rafael has proposed his last patch to
check by me: http://bugzilla.kernel.org/attachment.cgi?id=14961action=view

But I think it's visible the only suspend update will not resolve my
problems and the problems is deeper in the mmc.

So should I still test yesterdays' patch ?

Zdenek

Zdenek
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-24 Thread Pavel Machek
Hi!

   @@ -25,6 +25,7 @@
#include linux/pm.h
#include linux/resume-trace.h
#include linux/rwsem.h
   +#include linux/sched.h

#include ../base.h
#include power.h
   @@ -59,6 +60,13 @@ static DECLARE_RWSEM(pm_sleep_rwsem);

int (*platform_enable_wakeup)(struct device *dev, int is_on);

   +static struct task_struct *suspending_task;
  
  What locking protects this variable? What happens when suspending_task
  exits? (Hmm, that would probably be bug, anyway?)
 
 It's protected by whatever existing locking scheme allows only one
 task to start a system sleep at a time.  For example, the suspending 
 task has to get a write lock on pm_sleep_rwsem.

And readers of suspending_task are protected by?

At the very least, you'd need rmb() before reading it and wmb() after
writing to it, but I'm not sure if that's enough on every obscure
architecture out there.

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-24 Thread Rafael J. Wysocki
On Sunday, 24 of February 2008, Zdenek Kabelac wrote:
 2008/2/24, Alan Stern [EMAIL PROTECTED]:
  On Sun, 24 Feb 2008, Rafael J. Wysocki wrote:
 
 
You know, with this new patch we probably don't need
 device_pm_schedule_removal() any more.
   
No, we don't.  However, because of that dpm_suspend() and 
  device_power_down()
need to be changed not to assume that the removed devices will end up on
dpm_destroy.
 
 
   with analogous changes to dpm_suspend().
 
   Any more suggestions for updates?
 
 
 Is there any current patch I should actually test - it looks there is
 ongoing discussion from the time Rafael has proposed his last patch to
 check by me: http://bugzilla.kernel.org/attachment.cgi?id=14961action=view
 
 But I think it's visible the only suspend update will not resolve my
 problems and the problems is deeper in the mmc.

Yes, there are problems in there, but your feedback is needed nevertheless.
 
 So should I still test yesterdays' patch ?

Yes please.  It will be valuable information to us whether it fixes the suspend
issue at least.

Thanks,
Rafael
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-24 Thread Rafael J. Wysocki
On Sunday, 24 of February 2008, Rafael J. Wysocki wrote:
 On Sunday, 24 of February 2008, Alan Stern wrote:
  On Sun, 24 Feb 2008, Rafael J. Wysocki wrote:
  
   On Sunday, 24 of February 2008, Alan Stern wrote:
On Sat, 23 Feb 2008, Rafael J. Wysocki wrote:

 Ultimately no, it's not.  However, we are now late in the -rc2 time 
 frame and
 the release of -rc3 seems to be imminent.  At this point, IMO, that's 
 the
 safest thing to do.  BTW, appended is the patch I'd like to get 
 applied.

In the interest of having a safe release, I guess you're right.
   
   That's what I'm concerned about at the moment.  I'm afraid there won't be
   enough time to nail down all the issues (there may be some we're not even
   aware of).
  
  All right.  You'll need to enlarge your big reversal patch by reverting 
  commit 4145ed6dc597a9bea5f6ae8c574653b2de10620f.
 
 Thanks for the hint.
 
 In fact, I had to add a couple of fixes in device_power_down() and
 dpm_suspend().  The entire patch is appended.

I hope it's fine by everyone to push the patch below to Greg for 2.6.25.

Thanks,
Rafael


---
From: Rafael J. Wysocki [EMAIL PROTECTED]

Remove the code that acquires all device semaphores from the suspend
code path as it causes multiple problems to appear (most notably,
http://bugzilla.kernel.org/show_bug.cgi?id=10030) and revert the
change introduced by commit 4145ed6dc597a9bea5f6ae8c574653b2de10620f
depending on the code being removed.

Remove pm_sleep_lock()/pm_sleep_unlock() from device add to avoid
the issue reported at http://bugzilla.kernel.org/show_bug.cgi?id=9874.

Signed-off-by: Rafael J. Wysocki [EMAIL PROTECTED]
---
 drivers/base/core.c   |8 ---
 drivers/base/power/main.c |   97 +++---
 drivers/usb/core/usb.c|2 
 3 files changed, 8 insertions(+), 99 deletions(-)

Index: linux-2.6/drivers/base/power/main.c
===
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -48,7 +48,6 @@
  */
 
 LIST_HEAD(dpm_active);
-static LIST_HEAD(dpm_locked);
 static LIST_HEAD(dpm_off);
 static LIST_HEAD(dpm_off_irq);
 static LIST_HEAD(dpm_destroy);
@@ -81,28 +80,6 @@ void device_pm_add(struct device *dev)
  */
 void device_pm_remove(struct device *dev)
 {
-   /*
-* If this function is called during a suspend, it will be blocked,
-* because we're holding the device's semaphore at that time, which may
-* lead to a deadlock.  In that case we want to print a warning.
-* However, it may also be called by unregister_dropped_devices() with
-* the device's semaphore released, in which case the warning should
-* not be printed.
-*/
-   if (down_trylock(dev-sem)) {
-   if (down_read_trylock(pm_sleep_rwsem)) {
-   /* No suspend in progress, wait on dev-sem */
-   down(dev-sem);
-   up_read(pm_sleep_rwsem);
-   } else {
-   /* Suspend in progress, we may deadlock */
-   dev_warn(dev, Suspicious %s during suspend\n,
-   __FUNCTION__);
-   dump_stack();
-   /* The user has been warned ... */
-   down(dev-sem);
-   }
-   }
pr_debug(PM: Removing info for %s:%s\n,
 dev-bus ? dev-bus-name : No Bus,
 kobject_name(dev-kobj));
@@ -110,7 +87,6 @@ void device_pm_remove(struct device *dev
dpm_sysfs_remove(dev);
list_del_init(dev-power.entry);
mutex_unlock(dpm_list_mtx);
-   up(dev-sem);
 }
 
 /**
@@ -266,7 +242,7 @@ static void dpm_resume(void)
struct list_head *entry = dpm_off.next;
struct device *dev = to_device(entry);
 
-   list_move_tail(entry, dpm_locked);
+   list_move_tail(entry, dpm_active);
mutex_unlock(dpm_list_mtx);
resume_device(dev);
mutex_lock(dpm_list_mtx);
@@ -275,25 +251,6 @@ static void dpm_resume(void)
 }
 
 /**
- * unlock_all_devices - Release each device's semaphore
- *
- * Go through the dpm_off list.  Put each device on the dpm_active
- * list and unlock it.
- */
-static void unlock_all_devices(void)
-{
-   mutex_lock(dpm_list_mtx);
-   while (!list_empty(dpm_locked)) {
-   struct list_head *entry = dpm_locked.prev;
-   struct device *dev = to_device(entry);
-
-   list_move(entry, dpm_active);
-   up(dev-sem);
-   }
-   mutex_unlock(dpm_list_mtx);
-}
-
-/**
  * unregister_dropped_devices - Unregister devices scheduled for removal
  *
  * Unregister all devices on the dpm_destroy list.
@@ -305,7 +262,6 @@ static void unregister_dropped_devices(v
struct list_head *entry = dpm_destroy.next;

Re: [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-24 Thread Alan Stern
On Sun, 24 Feb 2008, Pavel Machek wrote:

   What locking protects this variable? What happens when suspending_task
   exits? (Hmm, that would probably be bug, anyway?)
  
  It's protected by whatever existing locking scheme allows only one
  task to start a system sleep at a time.  For example, the suspending 
  task has to get a write lock on pm_sleep_rwsem.
 
 And readers of suspending_task are protected by?

I added a comment about that too.

 At the very least, you'd need rmb() before reading it and wmb() after
 writing to it, but I'm not sure if that's enough on every obscure
 architecture out there.

No, neither one is needed because of the way suspending_task is used.  

It's not necessary for a reader R to see the variable's actual value;  
all R needs to know is whether or not suspending_task is equal to R.  
Since the only process which can set suspending_task to R is R itself,
and since R will set suspending_task back to NULL before releasing the
write lock on pm_sleep_rwsem, there's never any ambiguity.

Alan Stern

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-24 Thread Alan Stern
On Sun, 24 Feb 2008, Rafael J. Wysocki wrote:

 From: Rafael J. Wysocki [EMAIL PROTECTED]
 
 Remove the code that acquires all device semaphores from the suspend
 code path as it causes multiple problems to appear (most notably,
 http://bugzilla.kernel.org/show_bug.cgi?id=10030) and revert the
 change introduced by commit 4145ed6dc597a9bea5f6ae8c574653b2de10620f
 depending on the code being removed.
 
 Remove pm_sleep_lock()/pm_sleep_unlock() from device add to avoid
 the issue reported at http://bugzilla.kernel.org/show_bug.cgi?id=9874.

Do you actually know whether removing that lock fixes the reported bug?

I agree, the lock should be removed for now.  But I'd sure like to get 
some feedback about what's going wrong.  It's starting to look as 
though we'd be a lot better off blocking device registration during 
sleep instead of failing it.

Shouldn't resume_device() and suspend_device() now acquire the device 
semaphore before calling the various methods?  That's the way they 
used to work.

Alan Stern

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-24 Thread Alan Stern
On Sun, 24 Feb 2008, Rafael J. Wysocki wrote:

 From: Rafael J. Wysocki [EMAIL PROTECTED]
 
 Remove the code that acquires all device semaphores from the suspend
 code path as it causes multiple problems to appear (most notably,
 http://bugzilla.kernel.org/show_bug.cgi?id=10030) and revert the
 change introduced by commit 4145ed6dc597a9bea5f6ae8c574653b2de10620f
 depending on the code being removed.
 
 Remove pm_sleep_lock()/pm_sleep_unlock() from device_add() to avoid
 the issue reported at http://bugzilla.kernel.org/show_bug.cgi?id=9874.

This looks okay.

Are you going to submit it just for 2.6.25, leaving the existing code 
as-is for 2.6.26?

Alan Stern

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-24 Thread Rafael J. Wysocki
On Sunday, 24 of February 2008, Alan Stern wrote:
 On Sun, 24 Feb 2008, Rafael J. Wysocki wrote:
 
  From: Rafael J. Wysocki [EMAIL PROTECTED]
  
  Remove the code that acquires all device semaphores from the suspend
  code path as it causes multiple problems to appear (most notably,
  http://bugzilla.kernel.org/show_bug.cgi?id=10030) and revert the
  change introduced by commit 4145ed6dc597a9bea5f6ae8c574653b2de10620f
  depending on the code being removed.
  
  Remove pm_sleep_lock()/pm_sleep_unlock() from device add to avoid
  the issue reported at http://bugzilla.kernel.org/show_bug.cgi?id=9874.
 
 Do you actually know whether removing that lock fixes the reported bug?

In fact I'm waiting for the confirmation, but given the symptoms described by
Lukas I'm quite confident that the problem shouldn't be reproducible with
the lock removed.

 I agree, the lock should be removed for now.  But I'd sure like to get 
 some feedback about what's going wrong.

No question about that.

 It's starting to look as though we'd be a lot better off blocking device
 registration during sleep instead of failing it.

Agreed.

 Shouldn't resume_device() and suspend_device() now acquire the device 
 semaphore before calling the various methods?  That's the way they 
 used to work.

Yes, thanks for reminding me about that.

Updated patch follows.

Thanks,
Rafael


---
From: Rafael J. Wysocki [EMAIL PROTECTED]

Remove the code that acquires all device semaphores from the suspend
code path as it causes multiple problems to appear (most notably,
http://bugzilla.kernel.org/show_bug.cgi?id=10030) and revert the
change introduced by commit 4145ed6dc597a9bea5f6ae8c574653b2de10620f
depending on the code being removed.

Remove pm_sleep_lock()/pm_sleep_unlock() from device_add() to avoid
the issue reported at http://bugzilla.kernel.org/show_bug.cgi?id=9874.

Signed-off-by: Rafael J. Wysocki [EMAIL PROTECTED]
---
 drivers/base/core.c   |8 ---
 drivers/base/power/main.c |  106 ++
 drivers/usb/core/usb.c|2 
 3 files changed, 17 insertions(+), 99 deletions(-)

Index: linux-2.6/drivers/base/power/main.c
===
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -48,7 +48,6 @@
  */
 
 LIST_HEAD(dpm_active);
-static LIST_HEAD(dpm_locked);
 static LIST_HEAD(dpm_off);
 static LIST_HEAD(dpm_off_irq);
 static LIST_HEAD(dpm_destroy);
@@ -81,28 +80,6 @@ void device_pm_add(struct device *dev)
  */
 void device_pm_remove(struct device *dev)
 {
-   /*
-* If this function is called during a suspend, it will be blocked,
-* because we're holding the device's semaphore at that time, which may
-* lead to a deadlock.  In that case we want to print a warning.
-* However, it may also be called by unregister_dropped_devices() with
-* the device's semaphore released, in which case the warning should
-* not be printed.
-*/
-   if (down_trylock(dev-sem)) {
-   if (down_read_trylock(pm_sleep_rwsem)) {
-   /* No suspend in progress, wait on dev-sem */
-   down(dev-sem);
-   up_read(pm_sleep_rwsem);
-   } else {
-   /* Suspend in progress, we may deadlock */
-   dev_warn(dev, Suspicious %s during suspend\n,
-   __FUNCTION__);
-   dump_stack();
-   /* The user has been warned ... */
-   down(dev-sem);
-   }
-   }
pr_debug(PM: Removing info for %s:%s\n,
 dev-bus ? dev-bus-name : No Bus,
 kobject_name(dev-kobj));
@@ -110,7 +87,6 @@ void device_pm_remove(struct device *dev
dpm_sysfs_remove(dev);
list_del_init(dev-power.entry);
mutex_unlock(dpm_list_mtx);
-   up(dev-sem);
 }
 
 /**
@@ -230,6 +206,8 @@ static int resume_device(struct device *
TRACE_DEVICE(dev);
TRACE_RESUME(0);
 
+   down(dev-sem);
+
if (dev-bus  dev-bus-resume) {
dev_dbg(dev,resuming\n);
error = dev-bus-resume(dev);
@@ -245,6 +223,8 @@ static int resume_device(struct device *
error = dev-class-resume(dev);
}
 
+   up(dev-sem);
+
TRACE_RESUME(error);
return error;
 }
@@ -266,7 +246,7 @@ static void dpm_resume(void)
struct list_head *entry = dpm_off.next;
struct device *dev = to_device(entry);
 
-   list_move_tail(entry, dpm_locked);
+   list_move_tail(entry, dpm_active);
mutex_unlock(dpm_list_mtx);
resume_device(dev);
mutex_lock(dpm_list_mtx);
@@ -275,25 +255,6 @@ static void dpm_resume(void)
 }
 
 /**
- * unlock_all_devices - Release each device's semaphore
- *
- * Go through the 

Re: [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-24 Thread Pavel Machek
On Sun 2008-02-24 15:33:01, Alan Stern wrote:
 On Sun, 24 Feb 2008, Pavel Machek wrote:
 
What locking protects this variable? What happens when suspending_task
exits? (Hmm, that would probably be bug, anyway?)
   
   It's protected by whatever existing locking scheme allows only one
   task to start a system sleep at a time.  For example, the suspending 
   task has to get a write lock on pm_sleep_rwsem.
  
  And readers of suspending_task are protected by?
 
 I added a comment about that too.
 
  At the very least, you'd need rmb() before reading it and wmb() after
  writing to it, but I'm not sure if that's enough on every obscure
  architecture out there.
 
 No, neither one is needed because of the way suspending_task is used.  
 
 It's not necessary for a reader R to see the variable's actual value;  
 all R needs to know is whether or not suspending_task is equal to R.  
 Since the only process which can set suspending_task to R is R itself,
 and since R will set suspending_task back to NULL before releasing the
 write lock on pm_sleep_rwsem, there's never any ambiguity.

Subtle.

Very subtly wrong ;-).

imagine suspending_task == 0xabcdef01. Now task R with current ==
0xabcd reads suspending_task while the other cpu is writing to it,
and sees 0xabcd (0xef01 was not yet written) -- and mistakenly
believes that  R == suspending_task.

I agree it is very unlikely, and it will not happen on i386. But what
about just using atomic_t suspending_task, and store current-pid into
it?
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-24 Thread Rafael J. Wysocki
On Sunday, 24 of February 2008, Pavel Machek wrote:
 On Sun 2008-02-24 15:33:01, Alan Stern wrote:
  On Sun, 24 Feb 2008, Pavel Machek wrote:
  
 What locking protects this variable? What happens when suspending_task
 exits? (Hmm, that would probably be bug, anyway?)

It's protected by whatever existing locking scheme allows only one
task to start a system sleep at a time.  For example, the suspending 
task has to get a write lock on pm_sleep_rwsem.
   
   And readers of suspending_task are protected by?
  
  I added a comment about that too.
  
   At the very least, you'd need rmb() before reading it and wmb() after
   writing to it, but I'm not sure if that's enough on every obscure
   architecture out there.
  
  No, neither one is needed because of the way suspending_task is used.  
  
  It's not necessary for a reader R to see the variable's actual value;  
  all R needs to know is whether or not suspending_task is equal to R.  
  Since the only process which can set suspending_task to R is R itself,
  and since R will set suspending_task back to NULL before releasing the
  write lock on pm_sleep_rwsem, there's never any ambiguity.
 
 Subtle.
 
 Very subtly wrong ;-).
 
 imagine suspending_task == 0xabcdef01. Now task R with current ==
 0xabcd reads suspending_task while the other cpu is writing to it,
 and sees 0xabcd (0xef01 was not yet written) -- and mistakenly
 believes that  R == suspending_task.
 
 I agree it is very unlikely, and it will not happen on i386. But what
 about just using atomic_t suspending_task, and store current-pid into
 it?

I'd rather use a lock, frankly.  For example, we can require the readers to
take pm_sleep_rwsem for reading in order to access that.

Thanks,
Rafael
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-24 Thread Rafael J. Wysocki
On Sunday, 24 of February 2008, Alan Stern wrote:
 On Sun, 24 Feb 2008, Rafael J. Wysocki wrote:
 
  From: Rafael J. Wysocki [EMAIL PROTECTED]
  
  Remove the code that acquires all device semaphores from the suspend
  code path as it causes multiple problems to appear (most notably,
  http://bugzilla.kernel.org/show_bug.cgi?id=10030) and revert the
  change introduced by commit 4145ed6dc597a9bea5f6ae8c574653b2de10620f
  depending on the code being removed.
  
  Remove pm_sleep_lock()/pm_sleep_unlock() from device_add() to avoid
  the issue reported at http://bugzilla.kernel.org/show_bug.cgi?id=9874.
 
 This looks okay.
 
 Are you going to submit it just for 2.6.25, leaving the existing code 
 as-is for 2.6.26?

No, I'd like to start over on top of 2.6.25.  That will be much cleaner from
the upstream point of view.

We seem to have all the pieces anyway, so that's only a matter of putting them
back together.

Thanks,
Rafael
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-24 Thread Alan Stern
On Sun, 24 Feb 2008, Rafael J. Wysocki wrote:

  Very subtly wrong ;-).
  
  imagine suspending_task == 0xabcdef01. Now task R with current ==
  0xabcd reads suspending_task while the other cpu is writing to it,
  and sees 0xabcd (0xef01 was not yet written) -- and mistakenly
  believes that  R == suspending_task.
  
  I agree it is very unlikely, and it will not happen on i386. But what
  about just using atomic_t suspending_task, and store current-pid into
  it?
 
 I'd rather use a lock, frankly.  For example, we can require the readers to
 take pm_sleep_rwsem for reading in order to access that.

That certainly won't work.  Imagine what would happen when the reader 
_was_ the suspending task.

But if you really twist my arm, I'll go along with Pavel's suggestion.

Alan Stern

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-24 Thread Alan Stern
On Sun, 24 Feb 2008, Pavel Machek wrote:

   At the very least, you'd need rmb() before reading it and wmb() after
   writing to it, but I'm not sure if that's enough on every obscure
   architecture out there.
  
  No, neither one is needed because of the way suspending_task is used.  
  
  It's not necessary for a reader R to see the variable's actual value;  
  all R needs to know is whether or not suspending_task is equal to R.  
  Since the only process which can set suspending_task to R is R itself,
  and since R will set suspending_task back to NULL before releasing the
  write lock on pm_sleep_rwsem, there's never any ambiguity.
 
 Subtle.
 
 Very subtly wrong ;-).
 
 imagine suspending_task == 0xabcdef01. Now task R with current ==
 0xabcd reads suspending_task while the other cpu is writing to it,
 and sees 0xabcd (0xef01 was not yet written) -- and mistakenly
 believes that  R == suspending_task.

I always thought that reads and writes of pointers are atomic, just 
like reads and writes of longs.  Is that wrong?

Now if pointers were the same size as long long then I would agree with 
you.

 I agree it is very unlikely, and it will not happen on i386. But what
 about just using atomic_t suspending_task, and store current-pid into
 it?

That would work just as well.  Except that it wouldn't need to be
atomic_t, because current-pid is always an integer (not guaranteed, I
suppose, but that's what it is on all current architectures) and
reads/writes of ints _are_ atomic.

Alan Stern

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [linux-pm] [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-23 Thread Alan Stern
On Sat, 23 Feb 2008, Alan Stern wrote:

> It happened in a workqueue.  There could be lots of similar cases: Some 
> interrupt-driven event causes a hotplug action.  Since the action can't 
> be carried out in interrupt context, the driver has no choice but to 
> defer it to a workqueue or kernel thread.  Blocking that workqueue or 
> kernel thread won't cause a problem _unless_ the driver's 
> suspend/resume methods try to synchronize with it.  That's the 
> difficult case.

In fact this turns out to be exactly the problem with the MMC
subsystem.  It uses a dedicated workqueue (kmmcd) to handle state
changes, and mmc_suspend_host() does a flush_workqueue().  If the
workqueue contains an entry to register or unregister a device, this is
guaranteed to deadlock -- and that's exactly what happens when Zdenek 
inserts or removes a card during the disk-drive spindown time of a 
system sleep.

It looks like the best solution is for mmc_suspend_host() not to flush
the workqueue; instead the workqueue should prevent itself from
running during a suspend.  Right now the easiest way to accomplish this
is for the workqueue routines (mmc_detect() and siblings) to acquire
the mmc_host's device semaphore.  If in the future the MMC subsystem
adds dynamic PM support then a more complicated arrangement will be 
needed.

So the patch below, in combination with the previous patch, might just 
fix the whole problem.

Alan Stern



Index: usb-2.6/drivers/mmc/core/core.c
===
--- usb-2.6.orig/drivers/mmc/core/core.c
+++ usb-2.6/drivers/mmc/core/core.c
@@ -731,8 +731,6 @@ void mmc_stop_host(struct mmc_host *host
  */
 int mmc_suspend_host(struct mmc_host *host, pm_message_t state)
 {
-   mmc_flush_scheduled_work();
-
mmc_bus_get(host);
if (host->bus_ops && !host->bus_dead) {
if (host->bus_ops->suspend)
Index: usb-2.6/drivers/mmc/core/mmc.c
===
--- usb-2.6.orig/drivers/mmc/core/mmc.c
+++ usb-2.6/drivers/mmc/core/mmc.c
@@ -441,6 +441,7 @@ static void mmc_detect(struct mmc_host *
BUG_ON(!host);
BUG_ON(!host->card);
 
+   down(_classdev(host)->sem);
mmc_claim_host(host);
 
/*
@@ -449,6 +450,7 @@ static void mmc_detect(struct mmc_host *
err = mmc_send_status(host->card, NULL);
 
mmc_release_host(host);
+   up(_classdev(host)->sem);
 
if (err) {
mmc_remove(host);
Index: usb-2.6/drivers/mmc/core/sd.c
===
--- usb-2.6.orig/drivers/mmc/core/sd.c
+++ usb-2.6/drivers/mmc/core/sd.c
@@ -500,6 +500,7 @@ static void mmc_sd_detect(struct mmc_hos
BUG_ON(!host);
BUG_ON(!host->card);
 
+   down(_classdev(host)->sem);
mmc_claim_host(host);
 
/*
@@ -508,6 +509,7 @@ static void mmc_sd_detect(struct mmc_hos
err = mmc_send_status(host->card, NULL);
 
mmc_release_host(host);
+   up(_classdev(host)->sem);
 
if (err) {
mmc_sd_remove(host);
Index: usb-2.6/drivers/mmc/core/sdio.c
===
--- usb-2.6.orig/drivers/mmc/core/sdio.c
+++ usb-2.6/drivers/mmc/core/sdio.c
@@ -195,6 +195,7 @@ static void mmc_sdio_detect(struct mmc_h
BUG_ON(!host);
BUG_ON(!host->card);
 
+   down(_classdev(host)->sem);
mmc_claim_host(host);
 
/*
@@ -203,6 +204,7 @@ static void mmc_sdio_detect(struct mmc_h
err = mmc_select_card(host->card);
 
mmc_release_host(host);
+   up(_classdev(host)->sem);
 
if (err) {
mmc_sdio_remove(host);

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-23 Thread Alan Stern
On Sun, 24 Feb 2008, Rafael J. Wysocki wrote:

> On Sunday, 24 of February 2008, Alan Stern wrote:
> > On Sat, 23 Feb 2008, Rafael J. Wysocki wrote:
> > 
> > > Ultimately no, it's not.  However, we are now late in the -rc2 time frame 
> > > and
> > > the release of -rc3 seems to be imminent.  At this point, IMO, that's the
> > > safest thing to do.  BTW, appended is the patch I'd like to get applied.
> > 
> > In the interest of having a safe release, I guess you're right.
> 
> That's what I'm concerned about at the moment.  I'm afraid there won't be
> enough time to nail down all the issues (there may be some we're not even
> aware of).

All right.  You'll need to enlarge your big reversal patch by reverting 
commit 4145ed6dc597a9bea5f6ae8c574653b2de10620f.

> > Below is my suggested approach, based on yours.  It might even solve
> > the MMC problem, who knows?  And it adds some potentially useful
> > debugging, although it would be better to dump just the stack of
> > suspending_task.  Do you know how to do that from within a timer
> > routine?
> 
> No, I don't.

On further checking the answer turns out to be sched_show_task().  
That's the routine which gets called for each process when you type 
Alt-SysRq-t.

> > I still have no clear idea about what's going on with the ACPI bug in 
> > #9874.
> 
> It seems that the ACPI code is not prepared to cope with a failing device
> registration, in which case it'd need fixing.  Frankly, I'm afraid there may
> be more issues like this throughout the tree.

The fact remains that it is unsafe to register a device during a sleep 
transition, although if the device's driver doesn't have a suspend or 
resume method you can probably get away with it.

> > However perhaps the bug wouldn't occur if we blocked device  
> > registration until after the resume was finished, instead of failing it 
> > outright.  Since the registration in this case was done in a workqueue 
> > thread, blocking wouldn't cause an immediate deadlock.
> 
> No, it wouldn't, but the fact that it happens in the ACPI code makes me worry.

It happened in a workqueue.  There could be lots of similar cases: Some 
interrupt-driven event causes a hotplug action.  Since the action can't 
be carried out in interrupt context, the driver has no choice but to 
defer it to a workqueue or kernel thread.  Blocking that workqueue or 
kernel thread won't cause a problem _unless_ the driver's 
suspend/resume methods try to synchronize with it.  That's the 
difficult case.

> If we block that code and the things it's supposed to do turn out to be
> necessary for suspending later on, we'll be in trouble.

Exactly.

> > @@ -94,14 +102,15 @@ void device_pm_remove(struct device *dev
> > /* No suspend in progress, wait on dev->sem */
> > down(>sem);
> > up_read(_sleep_rwsem);
> > -   } else {
> > -   /* Suspend in progress, we may deadlock */
> > +   } else if (!in_suspend_context()) {
> > +   /* Suspending in another task, we may deadlock */
> 
> Well, that shouldn't really deadlock.  If it does, there is a potential design
> issue somewhere.  I think it might be better to set up a timer in here too.

At this point the driver core owns the device semaphore, so the 
unregistration task will block until the sleep is over.  If the 
driver's resume method has to synchronize with the unregistration task 
then it will deadlock.  I agree, it would be a design issue.  In fact, 
it's the same design issue described earlier in this email.

> Although IMO it would be even better to just set up a timer and remove the
> warning altogehter.

That warning was just for debugging purposes.  A timer in the resume
path could do the same thing with fewer false alarms, just like the 
timer in the suspend path.  I have added it to the new patch.

> > dev_warn(dev, "Suspicious %s during suspend\n",
> > __FUNCTION__);
> > dump_stack();
> > /* The user has been warned ... */
> > down(>sem);
> > }
> > +   /* Otherwise we're okay */
> > }
> > pr_debug("PM: Removing info for %s:%s\n",
> >  dev->bus ? dev->bus->name : "No Bus",
> 
> There's a problem here, because we shouldn't release the semaphore if we're
> in the suspend context.

Yes, you're right.  It's fixed in the new version.

> > +static void suspend_timeout(unsigned long _dev)
> > +{
> > +   struct device *dev = (struct device *) _dev;
> > +
> > +   dev_err(dev, "deadlock during suspend!\n");
> > +   show_state();
> 
> The show_state() seems to be overkill and won't really help if the user can't
> collect the output on the fly using a serial console or something like this.
> [The debug stuff printed here should fit a typical laptop screen, ideally.]

Changed.

> I'd prefer
> 
>   if (in_suspend_context()) {
>   

Re: [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-23 Thread Rafael J. Wysocki
On Sunday, 24 of February 2008, Alan Stern wrote:
> On Sat, 23 Feb 2008, Rafael J. Wysocki wrote:
> 
> > Ultimately no, it's not.  However, we are now late in the -rc2 time frame 
> > and
> > the release of -rc3 seems to be imminent.  At this point, IMO, that's the
> > safest thing to do.  BTW, appended is the patch I'd like to get applied.
> 
> In the interest of having a safe release, I guess you're right.

That's what I'm concerned about at the moment.  I'm afraid there won't be
enough time to nail down all the issues (there may be some we're not even
aware of).

> It's unfortunate, though -- there's no good way to get thorough testing 
> without putting the code in Linus's tree.

Absolutely.  Still, the code in question introduces unexpected behavior that
we don't really understand and it's not safe to leave it in the mainline.
 
> > > How would we then learn about drivers trying to register or unregister a
> > > device during a sleep transition?
> > 
> > I think we should fix the ones we know about and try to reintroduce this
> > change in the 2.6.26 time frame.  It also seems reasonable to reconsider 
> > what
> > we want to achieve, as there may be a better way to do that.
> 
> Below is my suggested approach, based on yours.  It might even solve
> the MMC problem, who knows?  And it adds some potentially useful
> debugging, although it would be better to dump just the stack of
> suspending_task.  Do you know how to do that from within a timer
> routine?

No, I don't.

> I still have no clear idea about what's going on with the ACPI bug in 
> #9874.

It seems that the ACPI code is not prepared to cope with a failing device
registration, in which case it'd need fixing.  Frankly, I'm afraid there may
be more issues like this throughout the tree.

> However perhaps the bug wouldn't occur if we blocked device  
> registration until after the resume was finished, instead of failing it 
> outright.  Since the registration in this case was done in a workqueue 
> thread, blocking wouldn't cause an immediate deadlock.

No, it wouldn't, but the fact that it happens in the ACPI code makes me worry.

If we block that code and the things it's supposed to do turn out to be
necessary for suspending later on, we'll be in trouble.

> Index: usb-2.6/drivers/base/power/main.c
> ===
> --- usb-2.6.orig/drivers/base/power/main.c
> +++ usb-2.6/drivers/base/power/main.c
> @@ -25,6 +25,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include "../base.h"
>  #include "power.h"
> @@ -59,6 +60,13 @@ static DECLARE_RWSEM(pm_sleep_rwsem);
>  
>  int (*platform_enable_wakeup)(struct device *dev, int is_on);
>  
> +static struct task_struct *suspending_task;
> +
> +bool in_suspend_context(void)
> +{
> + return (suspending_task == current);
> +}
> +
>  /**
>   *   device_pm_add - add a device to the list of active devices
>   *   @dev:   Device to be added to the list
> @@ -94,14 +102,15 @@ void device_pm_remove(struct device *dev
>   /* No suspend in progress, wait on dev->sem */
>   down(>sem);
>   up_read(_sleep_rwsem);
> - } else {
> - /* Suspend in progress, we may deadlock */
> + } else if (!in_suspend_context()) {
> + /* Suspending in another task, we may deadlock */

Well, that shouldn't really deadlock.  If it does, there is a potential design
issue somewhere.  I think it might be better to set up a timer in here too.

Although IMO it would be even better to just set up a timer and remove the
warning altogehter.

>   dev_warn(dev, "Suspicious %s during suspend\n",
>   __FUNCTION__);
>   dump_stack();
>   /* The user has been warned ... */
>   down(>sem);
>   }
> + /* Otherwise we're okay */
>   }
>   pr_debug("PM: Removing info for %s:%s\n",
>dev->bus ? dev->bus->name : "No Bus",

There's a problem here, because we shouldn't release the semaphore if we're
in the suspend context.

> @@ -272,6 +281,7 @@ static void dpm_resume(void)
>   mutex_lock(_list_mtx);
>   }
>   mutex_unlock(_list_mtx);
> + suspending_task = NULL;
>  }
>  
>  /**
> @@ -410,6 +420,17 @@ int device_power_down(pm_message_t state
>  }
>  EXPORT_SYMBOL_GPL(device_power_down);
>  
> +/* Provide debugging info if we hang or deadlock during suspend */
> +static struct timer_list suspend_timer;
> +
> +static void suspend_timeout(unsigned long _dev)
> +{
> + struct device *dev = (struct device *) _dev;
> +
> + dev_err(dev, "deadlock during suspend!\n");
> + show_state();

The show_state() seems to be overkill and won't really help if the user can't
collect the output on the fly using a serial console or something like this.
[The debug stuff printed here should 

Re: [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-23 Thread Alan Stern
On Sat, 23 Feb 2008, Rafael J. Wysocki wrote:

> Ultimately no, it's not.  However, we are now late in the -rc2 time frame and
> the release of -rc3 seems to be imminent.  At this point, IMO, that's the
> safest thing to do.  BTW, appended is the patch I'd like to get applied.

In the interest of having a safe release, I guess you're right.  It's 
unfortunate, though -- there's no good way to get thorough testing 
without putting the code in Linus's tree.

> > How would we then learn about drivers trying to register or unregister a
> > device during a sleep transition?
> 
> I think we should fix the ones we know about and try to reintroduce this
> change in the 2.6.26 time frame.  It also seems reasonable to reconsider what
> we want to achieve, as there may be a better way to do that.

Below is my suggested approach, based on yours.  It might even solve
the MMC problem, who knows?  And it adds some potentially useful
debugging, although it would be better to dump just the stack of
suspending_task.  Do you know how to do that from within a timer
routine?

I still have no clear idea about what's going on with the ACPI bug in 
#9874.  However perhaps the bug wouldn't occur if we blocked device 
registration until after the resume was finished, instead of failing it 
outright.  Since the registration in this case was done in a workqueue 
thread, blocking wouldn't cause an immediate deadlock.

Alan Stern



Index: usb-2.6/drivers/base/power/main.c
===
--- usb-2.6.orig/drivers/base/power/main.c
+++ usb-2.6/drivers/base/power/main.c
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "../base.h"
 #include "power.h"
@@ -59,6 +60,13 @@ static DECLARE_RWSEM(pm_sleep_rwsem);
 
 int (*platform_enable_wakeup)(struct device *dev, int is_on);
 
+static struct task_struct *suspending_task;
+
+bool in_suspend_context(void)
+{
+   return (suspending_task == current);
+}
+
 /**
  * device_pm_add - add a device to the list of active devices
  * @dev:   Device to be added to the list
@@ -94,14 +102,15 @@ void device_pm_remove(struct device *dev
/* No suspend in progress, wait on dev->sem */
down(>sem);
up_read(_sleep_rwsem);
-   } else {
-   /* Suspend in progress, we may deadlock */
+   } else if (!in_suspend_context()) {
+   /* Suspending in another task, we may deadlock */
dev_warn(dev, "Suspicious %s during suspend\n",
__FUNCTION__);
dump_stack();
/* The user has been warned ... */
down(>sem);
}
+   /* Otherwise we're okay */
}
pr_debug("PM: Removing info for %s:%s\n",
 dev->bus ? dev->bus->name : "No Bus",
@@ -272,6 +281,7 @@ static void dpm_resume(void)
mutex_lock(_list_mtx);
}
mutex_unlock(_list_mtx);
+   suspending_task = NULL;
 }
 
 /**
@@ -410,6 +420,17 @@ int device_power_down(pm_message_t state
 }
 EXPORT_SYMBOL_GPL(device_power_down);
 
+/* Provide debugging info if we hang or deadlock during suspend */
+static struct timer_list suspend_timer;
+
+static void suspend_timeout(unsigned long _dev)
+{
+   struct device *dev = (struct device *) _dev;
+
+   dev_err(dev, "deadlock during suspend!\n");
+   show_state();
+}
+
 /**
  * suspend_device - Save state of one device.
  * @dev:   Device.
@@ -419,6 +440,10 @@ int suspend_device(struct device *dev, p
 {
int error = 0;
 
+   /* Provide debugging output in case of a deadlock */
+   setup_timer(_timer, suspend_timeout, (unsigned long) dev);
+   mod_timer(_timer, jiffies + 5*HZ);
+
if (dev->power.power_state.event) {
dev_dbg(dev, "PM: suspend %d-->%d\n",
dev->power.power_state.event, state.event);
@@ -441,6 +466,8 @@ int suspend_device(struct device *dev, p
error = dev->bus->suspend(dev, state);
suspend_report_result(dev->bus->suspend, error);
}
+
+   del_timer_sync(_timer);
return error;
 }
 
@@ -460,6 +487,7 @@ static int dpm_suspend(pm_message_t stat
 {
int error = 0;
 
+   suspending_task = current;
mutex_lock(_list_mtx);
while (!list_empty(_locked)) {
struct list_head *entry = dpm_locked.prev;
Index: usb-2.6/drivers/base/power/power.h
===
--- usb-2.6.orig/drivers/base/power/power.h
+++ usb-2.6/drivers/base/power/power.h
@@ -11,6 +11,7 @@ static inline struct device *to_device(s
return container_of(entry, struct device, power.entry);
 }
 
+extern bool in_suspend_context(void);
 extern void device_pm_add(struct device *);
 extern void device_pm_remove(struct 

Re: [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-23 Thread Rafael J. Wysocki
On Saturday, 23 of February 2008, Alan Stern wrote:
> On Sat, 23 Feb 2008, Rafael J. Wysocki wrote:
> 
> > Unfortunately, I missed your Bugzilla comment at
> > http://bugzilla.kernel.org/show_bug.cgi?id=10030#c28
> 
> Strange...  But obviously you did see it eventually.
> 
> > Well, in the face of it, I'm considering to remove the code that
> > acquires device semaphores from the suspend core for now.  Evidently, this
> > change turns out to be painfully premature.
> 
> I wonder if that's really the best thing.

Ultimately no, it's not.  However, we are now late in the -rc2 time frame and
the release of -rc3 seems to be imminent.  At this point, IMO, that's the
safest thing to do.  BTW, appended is the patch I'd like to get applied.

> How would we then learn about drivers trying to register or unregister a
> device during a sleep transition?

I think we should fix the ones we know about and try to reintroduce this
change in the 2.6.26 time frame.  It also seems reasonable to reconsider what
we want to achieve, as there may be a better way to do that.

> Do you think it might be possible instead to somehow allow these
> unregistrations to go through, while still failing or blocking
> registrations?

Yes, that should be possible, but as I said above, I think it's not the right
time for doing that right now.

> It shouldn't be too hard to modify the driver core so that it calls the
> driver's remove() method without trying to acquire dev->sem if your
> in_suspend_context() test succeeds. 

I agree that the in_suspend_context() test may be useful and this is one
of the reasons why I think the entire approach requires reconsideration.

> I have to admit, I still don't understand what's going on with the MMC 
> driver.

Me neither, and that alone is a good enough reason for resigning from acquiring
device semaphores in the suspend core, at least in the mainline kernel, until
we understand the problem.

> Why is there a workqueue involved?  If the workqueue fails to  
> unregister the device, why should it bother the suspend routine?  After 
> all, if the suspend routine can afford to wait for the workqueue to 
> finish then it could just as well afford to do the unregistration 
> itself.

Yes, that's strange.

> > Also, we have apparent problems with pm_sleep_lock()
> > being take in device_add() (see
> > http://bugzilla.kernel.org/show_bug.cgi?id=9874).
> 
> We'll have to get more information from the bug reporter to figure out 
> what really happened there.

Yes.

> Ultimately it may turn out some drivers just aren't very careful about
> when they try to register new devices.

That's almost certain to me.

> Doing the registration by way of a workqueue can be problematic if the
> workqueue happens to run during a system sleep transition.  That will still
> be true if you revert the acquire-all-semaphores patch.

Yes, but our taking of device semaphores exposes these problems in a rather
drastic fashion. :-)

Thanks,
Rafael

---
 drivers/base/power/main.c |   84 ++
 1 file changed, 4 insertions(+), 80 deletions(-)

Index: linux-2.6/drivers/base/power/main.c
===
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -48,7 +48,6 @@
  */
 
 LIST_HEAD(dpm_active);
-static LIST_HEAD(dpm_locked);
 static LIST_HEAD(dpm_off);
 static LIST_HEAD(dpm_off_irq);
 static LIST_HEAD(dpm_destroy);
@@ -81,28 +80,6 @@ void device_pm_add(struct device *dev)
  */
 void device_pm_remove(struct device *dev)
 {
-   /*
-* If this function is called during a suspend, it will be blocked,
-* because we're holding the device's semaphore at that time, which may
-* lead to a deadlock.  In that case we want to print a warning.
-* However, it may also be called by unregister_dropped_devices() with
-* the device's semaphore released, in which case the warning should
-* not be printed.
-*/
-   if (down_trylock(>sem)) {
-   if (down_read_trylock(_sleep_rwsem)) {
-   /* No suspend in progress, wait on dev->sem */
-   down(>sem);
-   up_read(_sleep_rwsem);
-   } else {
-   /* Suspend in progress, we may deadlock */
-   dev_warn(dev, "Suspicious %s during suspend\n",
-   __FUNCTION__);
-   dump_stack();
-   /* The user has been warned ... */
-   down(>sem);
-   }
-   }
pr_debug("PM: Removing info for %s:%s\n",
 dev->bus ? dev->bus->name : "No Bus",
 kobject_name(>kobj));
@@ -110,7 +87,6 @@ void device_pm_remove(struct device *dev
dpm_sysfs_remove(dev);
list_del_init(>power.entry);
mutex_unlock(_list_mtx);
-   up(>sem);
 }
 
 /**
@@ -266,7 +242,7 @@ static 

Re: [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-23 Thread Rafael J. Wysocki
On Saturday, 23 of February 2008, Alan Stern wrote:
 On Sat, 23 Feb 2008, Rafael J. Wysocki wrote:
 
  Unfortunately, I missed your Bugzilla comment at
  http://bugzilla.kernel.org/show_bug.cgi?id=10030#c28
 
 Strange...  But obviously you did see it eventually.
 
  Well, in the face of it, I'm considering to remove the code that
  acquires device semaphores from the suspend core for now.  Evidently, this
  change turns out to be painfully premature.
 
 I wonder if that's really the best thing.

Ultimately no, it's not.  However, we are now late in the -rc2 time frame and
the release of -rc3 seems to be imminent.  At this point, IMO, that's the
safest thing to do.  BTW, appended is the patch I'd like to get applied.

 How would we then learn about drivers trying to register or unregister a
 device during a sleep transition?

I think we should fix the ones we know about and try to reintroduce this
change in the 2.6.26 time frame.  It also seems reasonable to reconsider what
we want to achieve, as there may be a better way to do that.

 Do you think it might be possible instead to somehow allow these
 unregistrations to go through, while still failing or blocking
 registrations?

Yes, that should be possible, but as I said above, I think it's not the right
time for doing that right now.

 It shouldn't be too hard to modify the driver core so that it calls the
 driver's remove() method without trying to acquire dev-sem if your
 in_suspend_context() test succeeds. 

I agree that the in_suspend_context() test may be useful and this is one
of the reasons why I think the entire approach requires reconsideration.

 I have to admit, I still don't understand what's going on with the MMC 
 driver.

Me neither, and that alone is a good enough reason for resigning from acquiring
device semaphores in the suspend core, at least in the mainline kernel, until
we understand the problem.

 Why is there a workqueue involved?  If the workqueue fails to  
 unregister the device, why should it bother the suspend routine?  After 
 all, if the suspend routine can afford to wait for the workqueue to 
 finish then it could just as well afford to do the unregistration 
 itself.

Yes, that's strange.

  Also, we have apparent problems with pm_sleep_lock()
  being take in device_add() (see
  http://bugzilla.kernel.org/show_bug.cgi?id=9874).
 
 We'll have to get more information from the bug reporter to figure out 
 what really happened there.

Yes.

 Ultimately it may turn out some drivers just aren't very careful about
 when they try to register new devices.

That's almost certain to me.

 Doing the registration by way of a workqueue can be problematic if the
 workqueue happens to run during a system sleep transition.  That will still
 be true if you revert the acquire-all-semaphores patch.

Yes, but our taking of device semaphores exposes these problems in a rather
drastic fashion. :-)

Thanks,
Rafael

---
 drivers/base/power/main.c |   84 ++
 1 file changed, 4 insertions(+), 80 deletions(-)

Index: linux-2.6/drivers/base/power/main.c
===
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -48,7 +48,6 @@
  */
 
 LIST_HEAD(dpm_active);
-static LIST_HEAD(dpm_locked);
 static LIST_HEAD(dpm_off);
 static LIST_HEAD(dpm_off_irq);
 static LIST_HEAD(dpm_destroy);
@@ -81,28 +80,6 @@ void device_pm_add(struct device *dev)
  */
 void device_pm_remove(struct device *dev)
 {
-   /*
-* If this function is called during a suspend, it will be blocked,
-* because we're holding the device's semaphore at that time, which may
-* lead to a deadlock.  In that case we want to print a warning.
-* However, it may also be called by unregister_dropped_devices() with
-* the device's semaphore released, in which case the warning should
-* not be printed.
-*/
-   if (down_trylock(dev-sem)) {
-   if (down_read_trylock(pm_sleep_rwsem)) {
-   /* No suspend in progress, wait on dev-sem */
-   down(dev-sem);
-   up_read(pm_sleep_rwsem);
-   } else {
-   /* Suspend in progress, we may deadlock */
-   dev_warn(dev, Suspicious %s during suspend\n,
-   __FUNCTION__);
-   dump_stack();
-   /* The user has been warned ... */
-   down(dev-sem);
-   }
-   }
pr_debug(PM: Removing info for %s:%s\n,
 dev-bus ? dev-bus-name : No Bus,
 kobject_name(dev-kobj));
@@ -110,7 +87,6 @@ void device_pm_remove(struct device *dev
dpm_sysfs_remove(dev);
list_del_init(dev-power.entry);
mutex_unlock(dpm_list_mtx);
-   up(dev-sem);
 }
 
 /**
@@ -266,7 +242,7 @@ static void dpm_resume(void)
   

Re: [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-23 Thread Alan Stern
On Sat, 23 Feb 2008, Rafael J. Wysocki wrote:

 Ultimately no, it's not.  However, we are now late in the -rc2 time frame and
 the release of -rc3 seems to be imminent.  At this point, IMO, that's the
 safest thing to do.  BTW, appended is the patch I'd like to get applied.

In the interest of having a safe release, I guess you're right.  It's 
unfortunate, though -- there's no good way to get thorough testing 
without putting the code in Linus's tree.

  How would we then learn about drivers trying to register or unregister a
  device during a sleep transition?
 
 I think we should fix the ones we know about and try to reintroduce this
 change in the 2.6.26 time frame.  It also seems reasonable to reconsider what
 we want to achieve, as there may be a better way to do that.

Below is my suggested approach, based on yours.  It might even solve
the MMC problem, who knows?  And it adds some potentially useful
debugging, although it would be better to dump just the stack of
suspending_task.  Do you know how to do that from within a timer
routine?

I still have no clear idea about what's going on with the ACPI bug in 
#9874.  However perhaps the bug wouldn't occur if we blocked device 
registration until after the resume was finished, instead of failing it 
outright.  Since the registration in this case was done in a workqueue 
thread, blocking wouldn't cause an immediate deadlock.

Alan Stern



Index: usb-2.6/drivers/base/power/main.c
===
--- usb-2.6.orig/drivers/base/power/main.c
+++ usb-2.6/drivers/base/power/main.c
@@ -25,6 +25,7 @@
 #include linux/pm.h
 #include linux/resume-trace.h
 #include linux/rwsem.h
+#include linux/sched.h
 
 #include ../base.h
 #include power.h
@@ -59,6 +60,13 @@ static DECLARE_RWSEM(pm_sleep_rwsem);
 
 int (*platform_enable_wakeup)(struct device *dev, int is_on);
 
+static struct task_struct *suspending_task;
+
+bool in_suspend_context(void)
+{
+   return (suspending_task == current);
+}
+
 /**
  * device_pm_add - add a device to the list of active devices
  * @dev:   Device to be added to the list
@@ -94,14 +102,15 @@ void device_pm_remove(struct device *dev
/* No suspend in progress, wait on dev-sem */
down(dev-sem);
up_read(pm_sleep_rwsem);
-   } else {
-   /* Suspend in progress, we may deadlock */
+   } else if (!in_suspend_context()) {
+   /* Suspending in another task, we may deadlock */
dev_warn(dev, Suspicious %s during suspend\n,
__FUNCTION__);
dump_stack();
/* The user has been warned ... */
down(dev-sem);
}
+   /* Otherwise we're okay */
}
pr_debug(PM: Removing info for %s:%s\n,
 dev-bus ? dev-bus-name : No Bus,
@@ -272,6 +281,7 @@ static void dpm_resume(void)
mutex_lock(dpm_list_mtx);
}
mutex_unlock(dpm_list_mtx);
+   suspending_task = NULL;
 }
 
 /**
@@ -410,6 +420,17 @@ int device_power_down(pm_message_t state
 }
 EXPORT_SYMBOL_GPL(device_power_down);
 
+/* Provide debugging info if we hang or deadlock during suspend */
+static struct timer_list suspend_timer;
+
+static void suspend_timeout(unsigned long _dev)
+{
+   struct device *dev = (struct device *) _dev;
+
+   dev_err(dev, deadlock during suspend!\n);
+   show_state();
+}
+
 /**
  * suspend_device - Save state of one device.
  * @dev:   Device.
@@ -419,6 +440,10 @@ int suspend_device(struct device *dev, p
 {
int error = 0;
 
+   /* Provide debugging output in case of a deadlock */
+   setup_timer(suspend_timer, suspend_timeout, (unsigned long) dev);
+   mod_timer(suspend_timer, jiffies + 5*HZ);
+
if (dev-power.power_state.event) {
dev_dbg(dev, PM: suspend %d--%d\n,
dev-power.power_state.event, state.event);
@@ -441,6 +466,8 @@ int suspend_device(struct device *dev, p
error = dev-bus-suspend(dev, state);
suspend_report_result(dev-bus-suspend, error);
}
+
+   del_timer_sync(suspend_timer);
return error;
 }
 
@@ -460,6 +487,7 @@ static int dpm_suspend(pm_message_t stat
 {
int error = 0;
 
+   suspending_task = current;
mutex_lock(dpm_list_mtx);
while (!list_empty(dpm_locked)) {
struct list_head *entry = dpm_locked.prev;
Index: usb-2.6/drivers/base/power/power.h
===
--- usb-2.6.orig/drivers/base/power/power.h
+++ usb-2.6/drivers/base/power/power.h
@@ -11,6 +11,7 @@ static inline struct device *to_device(s
return container_of(entry, struct device, power.entry);
 }
 
+extern bool in_suspend_context(void);
 extern void 

Re: [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-23 Thread Rafael J. Wysocki
On Sunday, 24 of February 2008, Alan Stern wrote:
 On Sat, 23 Feb 2008, Rafael J. Wysocki wrote:
 
  Ultimately no, it's not.  However, we are now late in the -rc2 time frame 
  and
  the release of -rc3 seems to be imminent.  At this point, IMO, that's the
  safest thing to do.  BTW, appended is the patch I'd like to get applied.
 
 In the interest of having a safe release, I guess you're right.

That's what I'm concerned about at the moment.  I'm afraid there won't be
enough time to nail down all the issues (there may be some we're not even
aware of).

 It's unfortunate, though -- there's no good way to get thorough testing 
 without putting the code in Linus's tree.

Absolutely.  Still, the code in question introduces unexpected behavior that
we don't really understand and it's not safe to leave it in the mainline.
 
   How would we then learn about drivers trying to register or unregister a
   device during a sleep transition?
  
  I think we should fix the ones we know about and try to reintroduce this
  change in the 2.6.26 time frame.  It also seems reasonable to reconsider 
  what
  we want to achieve, as there may be a better way to do that.
 
 Below is my suggested approach, based on yours.  It might even solve
 the MMC problem, who knows?  And it adds some potentially useful
 debugging, although it would be better to dump just the stack of
 suspending_task.  Do you know how to do that from within a timer
 routine?

No, I don't.

 I still have no clear idea about what's going on with the ACPI bug in 
 #9874.

It seems that the ACPI code is not prepared to cope with a failing device
registration, in which case it'd need fixing.  Frankly, I'm afraid there may
be more issues like this throughout the tree.

 However perhaps the bug wouldn't occur if we blocked device  
 registration until after the resume was finished, instead of failing it 
 outright.  Since the registration in this case was done in a workqueue 
 thread, blocking wouldn't cause an immediate deadlock.

No, it wouldn't, but the fact that it happens in the ACPI code makes me worry.

If we block that code and the things it's supposed to do turn out to be
necessary for suspending later on, we'll be in trouble.

 Index: usb-2.6/drivers/base/power/main.c
 ===
 --- usb-2.6.orig/drivers/base/power/main.c
 +++ usb-2.6/drivers/base/power/main.c
 @@ -25,6 +25,7 @@
  #include linux/pm.h
  #include linux/resume-trace.h
  #include linux/rwsem.h
 +#include linux/sched.h
  
  #include ../base.h
  #include power.h
 @@ -59,6 +60,13 @@ static DECLARE_RWSEM(pm_sleep_rwsem);
  
  int (*platform_enable_wakeup)(struct device *dev, int is_on);
  
 +static struct task_struct *suspending_task;
 +
 +bool in_suspend_context(void)
 +{
 + return (suspending_task == current);
 +}
 +
  /**
   *   device_pm_add - add a device to the list of active devices
   *   @dev:   Device to be added to the list
 @@ -94,14 +102,15 @@ void device_pm_remove(struct device *dev
   /* No suspend in progress, wait on dev-sem */
   down(dev-sem);
   up_read(pm_sleep_rwsem);
 - } else {
 - /* Suspend in progress, we may deadlock */
 + } else if (!in_suspend_context()) {
 + /* Suspending in another task, we may deadlock */

Well, that shouldn't really deadlock.  If it does, there is a potential design
issue somewhere.  I think it might be better to set up a timer in here too.

Although IMO it would be even better to just set up a timer and remove the
warning altogehter.

   dev_warn(dev, Suspicious %s during suspend\n,
   __FUNCTION__);
   dump_stack();
   /* The user has been warned ... */
   down(dev-sem);
   }
 + /* Otherwise we're okay */
   }
   pr_debug(PM: Removing info for %s:%s\n,
dev-bus ? dev-bus-name : No Bus,

There's a problem here, because we shouldn't release the semaphore if we're
in the suspend context.

 @@ -272,6 +281,7 @@ static void dpm_resume(void)
   mutex_lock(dpm_list_mtx);
   }
   mutex_unlock(dpm_list_mtx);
 + suspending_task = NULL;
  }
  
  /**
 @@ -410,6 +420,17 @@ int device_power_down(pm_message_t state
  }
  EXPORT_SYMBOL_GPL(device_power_down);
  
 +/* Provide debugging info if we hang or deadlock during suspend */
 +static struct timer_list suspend_timer;
 +
 +static void suspend_timeout(unsigned long _dev)
 +{
 + struct device *dev = (struct device *) _dev;
 +
 + dev_err(dev, deadlock during suspend!\n);
 + show_state();

The show_state() seems to be overkill and won't really help if the user can't
collect the output on the fly using a serial console or something like this.
[The debug stuff printed here should fit a typical laptop screen, ideally.]

 +}
 +
  /**
   

Re: [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-23 Thread Alan Stern
On Sun, 24 Feb 2008, Rafael J. Wysocki wrote:

 On Sunday, 24 of February 2008, Alan Stern wrote:
  On Sat, 23 Feb 2008, Rafael J. Wysocki wrote:
  
   Ultimately no, it's not.  However, we are now late in the -rc2 time frame 
   and
   the release of -rc3 seems to be imminent.  At this point, IMO, that's the
   safest thing to do.  BTW, appended is the patch I'd like to get applied.
  
  In the interest of having a safe release, I guess you're right.
 
 That's what I'm concerned about at the moment.  I'm afraid there won't be
 enough time to nail down all the issues (there may be some we're not even
 aware of).

All right.  You'll need to enlarge your big reversal patch by reverting 
commit 4145ed6dc597a9bea5f6ae8c574653b2de10620f.

  Below is my suggested approach, based on yours.  It might even solve
  the MMC problem, who knows?  And it adds some potentially useful
  debugging, although it would be better to dump just the stack of
  suspending_task.  Do you know how to do that from within a timer
  routine?
 
 No, I don't.

On further checking the answer turns out to be sched_show_task().  
That's the routine which gets called for each process when you type 
Alt-SysRq-t.

  I still have no clear idea about what's going on with the ACPI bug in 
  #9874.
 
 It seems that the ACPI code is not prepared to cope with a failing device
 registration, in which case it'd need fixing.  Frankly, I'm afraid there may
 be more issues like this throughout the tree.

The fact remains that it is unsafe to register a device during a sleep 
transition, although if the device's driver doesn't have a suspend or 
resume method you can probably get away with it.

  However perhaps the bug wouldn't occur if we blocked device  
  registration until after the resume was finished, instead of failing it 
  outright.  Since the registration in this case was done in a workqueue 
  thread, blocking wouldn't cause an immediate deadlock.
 
 No, it wouldn't, but the fact that it happens in the ACPI code makes me worry.

It happened in a workqueue.  There could be lots of similar cases: Some 
interrupt-driven event causes a hotplug action.  Since the action can't 
be carried out in interrupt context, the driver has no choice but to 
defer it to a workqueue or kernel thread.  Blocking that workqueue or 
kernel thread won't cause a problem _unless_ the driver's 
suspend/resume methods try to synchronize with it.  That's the 
difficult case.

 If we block that code and the things it's supposed to do turn out to be
 necessary for suspending later on, we'll be in trouble.

Exactly.

  @@ -94,14 +102,15 @@ void device_pm_remove(struct device *dev
  /* No suspend in progress, wait on dev-sem */
  down(dev-sem);
  up_read(pm_sleep_rwsem);
  -   } else {
  -   /* Suspend in progress, we may deadlock */
  +   } else if (!in_suspend_context()) {
  +   /* Suspending in another task, we may deadlock */
 
 Well, that shouldn't really deadlock.  If it does, there is a potential design
 issue somewhere.  I think it might be better to set up a timer in here too.

At this point the driver core owns the device semaphore, so the 
unregistration task will block until the sleep is over.  If the 
driver's resume method has to synchronize with the unregistration task 
then it will deadlock.  I agree, it would be a design issue.  In fact, 
it's the same design issue described earlier in this email.

 Although IMO it would be even better to just set up a timer and remove the
 warning altogehter.

That warning was just for debugging purposes.  A timer in the resume
path could do the same thing with fewer false alarms, just like the 
timer in the suspend path.  I have added it to the new patch.

  dev_warn(dev, Suspicious %s during suspend\n,
  __FUNCTION__);
  dump_stack();
  /* The user has been warned ... */
  down(dev-sem);
  }
  +   /* Otherwise we're okay */
  }
  pr_debug(PM: Removing info for %s:%s\n,
   dev-bus ? dev-bus-name : No Bus,
 
 There's a problem here, because we shouldn't release the semaphore if we're
 in the suspend context.

Yes, you're right.  It's fixed in the new version.

  +static void suspend_timeout(unsigned long _dev)
  +{
  +   struct device *dev = (struct device *) _dev;
  +
  +   dev_err(dev, deadlock during suspend!\n);
  +   show_state();
 
 The show_state() seems to be overkill and won't really help if the user can't
 collect the output on the fly using a serial console or something like this.
 [The debug stuff printed here should fit a typical laptop screen, ideally.]

Changed.

 I'd prefer
 
   if (in_suspend_context()) {
   __device_release_driver(dev);
   } else {
   down(dev-sem);
   __device_release_driver(dev);
   

Re: [linux-pm] [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-23 Thread Alan Stern
On Sat, 23 Feb 2008, Alan Stern wrote:

 It happened in a workqueue.  There could be lots of similar cases: Some 
 interrupt-driven event causes a hotplug action.  Since the action can't 
 be carried out in interrupt context, the driver has no choice but to 
 defer it to a workqueue or kernel thread.  Blocking that workqueue or 
 kernel thread won't cause a problem _unless_ the driver's 
 suspend/resume methods try to synchronize with it.  That's the 
 difficult case.

In fact this turns out to be exactly the problem with the MMC
subsystem.  It uses a dedicated workqueue (kmmcd) to handle state
changes, and mmc_suspend_host() does a flush_workqueue().  If the
workqueue contains an entry to register or unregister a device, this is
guaranteed to deadlock -- and that's exactly what happens when Zdenek 
inserts or removes a card during the disk-drive spindown time of a 
system sleep.

It looks like the best solution is for mmc_suspend_host() not to flush
the workqueue; instead the workqueue should prevent itself from
running during a suspend.  Right now the easiest way to accomplish this
is for the workqueue routines (mmc_detect() and siblings) to acquire
the mmc_host's device semaphore.  If in the future the MMC subsystem
adds dynamic PM support then a more complicated arrangement will be 
needed.

So the patch below, in combination with the previous patch, might just 
fix the whole problem.

Alan Stern



Index: usb-2.6/drivers/mmc/core/core.c
===
--- usb-2.6.orig/drivers/mmc/core/core.c
+++ usb-2.6/drivers/mmc/core/core.c
@@ -731,8 +731,6 @@ void mmc_stop_host(struct mmc_host *host
  */
 int mmc_suspend_host(struct mmc_host *host, pm_message_t state)
 {
-   mmc_flush_scheduled_work();
-
mmc_bus_get(host);
if (host-bus_ops  !host-bus_dead) {
if (host-bus_ops-suspend)
Index: usb-2.6/drivers/mmc/core/mmc.c
===
--- usb-2.6.orig/drivers/mmc/core/mmc.c
+++ usb-2.6/drivers/mmc/core/mmc.c
@@ -441,6 +441,7 @@ static void mmc_detect(struct mmc_host *
BUG_ON(!host);
BUG_ON(!host-card);
 
+   down(mmc_classdev(host)-sem);
mmc_claim_host(host);
 
/*
@@ -449,6 +450,7 @@ static void mmc_detect(struct mmc_host *
err = mmc_send_status(host-card, NULL);
 
mmc_release_host(host);
+   up(mmc_classdev(host)-sem);
 
if (err) {
mmc_remove(host);
Index: usb-2.6/drivers/mmc/core/sd.c
===
--- usb-2.6.orig/drivers/mmc/core/sd.c
+++ usb-2.6/drivers/mmc/core/sd.c
@@ -500,6 +500,7 @@ static void mmc_sd_detect(struct mmc_hos
BUG_ON(!host);
BUG_ON(!host-card);
 
+   down(mmc_classdev(host)-sem);
mmc_claim_host(host);
 
/*
@@ -508,6 +509,7 @@ static void mmc_sd_detect(struct mmc_hos
err = mmc_send_status(host-card, NULL);
 
mmc_release_host(host);
+   up(mmc_classdev(host)-sem);
 
if (err) {
mmc_sd_remove(host);
Index: usb-2.6/drivers/mmc/core/sdio.c
===
--- usb-2.6.orig/drivers/mmc/core/sdio.c
+++ usb-2.6/drivers/mmc/core/sdio.c
@@ -195,6 +195,7 @@ static void mmc_sdio_detect(struct mmc_h
BUG_ON(!host);
BUG_ON(!host-card);
 
+   down(mmc_classdev(host)-sem);
mmc_claim_host(host);
 
/*
@@ -203,6 +204,7 @@ static void mmc_sdio_detect(struct mmc_h
err = mmc_select_card(host-card);
 
mmc_release_host(host);
+   up(mmc_classdev(host)-sem);
 
if (err) {
mmc_sdio_remove(host);

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-22 Thread Alan Stern
On Sat, 23 Feb 2008, Rafael J. Wysocki wrote:

> Unfortunately, I missed your Bugzilla comment at
> http://bugzilla.kernel.org/show_bug.cgi?id=10030#c28

Strange...  But obviously you did see it eventually.

> Well, in the face of it, I'm considering to remove the code that
> acquires device semaphores from the suspend core for now.  Evidently, this
> change turns out to be painfully premature.

I wonder if that's really the best thing.  How would we then learn 
about drivers trying to register or unregister a device during a sleep 
transition?

Do you think it might be possible instead to somehow allow these
unregistrations to go through, while still failing or blocking
registrations?  It shouldn't be too hard to modify the driver core so
that it calls the driver's remove() method without trying to acquire
dev->sem if your in_suspend_context() test succeeds.

I have to admit, I still don't understand what's going on with the MMC 
driver.  Why is there a workqueue involved?  If the workqueue fails to 
unregister the device, why should it bother the suspend routine?  After 
all, if the suspend routine can afford to wait for the workqueue to 
finish then it could just as well afford to do the unregistration 
itself.

> Also, we have apparent problems with pm_sleep_lock()
> being take in device_add() (see
> http://bugzilla.kernel.org/show_bug.cgi?id=9874).

We'll have to get more information from the bug reporter to figure out 
what really happened there.

Ultimately it may turn out some drivers just aren't very careful about
when they try to register new devices.  Doing the registration by way
of a workqueue can be problematic if the workqueue happens to run
during a system sleep transition.  That will still be true if you
revert the acquire-all-semaphores patch.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-22 Thread Rafael J. Wysocki
On Friday, 22 of February 2008, Alan Stern wrote:
> On Thu, 21 Feb 2008, Rafael J. Wysocki wrote:
> 
> > BTW, below is a simplified version of the patch, without the mutex 
> > protecting
> > suspending_task.  I'd like to push it upstream if it looks good.
> 
> It does look good.  Go ahead and push.
> 
> Acked-by: Alan Stern <[EMAIL PROTECTED]>
> 
> > Please also have a look at http://bugzilla.kernel.org/show_bug.cgi?id=10030.
> > There seems to be another issue related to us holding devices' semaphores.
> > Namely, it looks like, when the user removes the card, a concurrent thread
> > (from a workqueue) calls device_del() and blocks on the dev->sem held by
> > us and then something else deadlocks with this thread.  I'll be looking into
> > this tomorrow.
> 
> I've been too busy with other things to look at the activity on that 
> bug report.  Tonight or tomorrow...

Unfortunately, I missed your Bugzilla comment at
http://bugzilla.kernel.org/show_bug.cgi?id=10030#c28

Well, in the face of it, I'm considering to remove the code that
acquires device semaphores from the suspend core for now.  Evidently, this
change turns out to be painfully premature.

Also, we have apparent problems with pm_sleep_lock()
being take in device_add() (see
http://bugzilla.kernel.org/show_bug.cgi?id=9874).

Thanks,
Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-22 Thread Rafael J. Wysocki
On Friday, 22 of February 2008, Alan Stern wrote:
 On Thu, 21 Feb 2008, Rafael J. Wysocki wrote:
 
  BTW, below is a simplified version of the patch, without the mutex 
  protecting
  suspending_task.  I'd like to push it upstream if it looks good.
 
 It does look good.  Go ahead and push.
 
 Acked-by: Alan Stern [EMAIL PROTECTED]
 
  Please also have a look at http://bugzilla.kernel.org/show_bug.cgi?id=10030.
  There seems to be another issue related to us holding devices' semaphores.
  Namely, it looks like, when the user removes the card, a concurrent thread
  (from a workqueue) calls device_del() and blocks on the dev-sem held by
  us and then something else deadlocks with this thread.  I'll be looking into
  this tomorrow.
 
 I've been too busy with other things to look at the activity on that 
 bug report.  Tonight or tomorrow...

Unfortunately, I missed your Bugzilla comment at
http://bugzilla.kernel.org/show_bug.cgi?id=10030#c28

Well, in the face of it, I'm considering to remove the code that
acquires device semaphores from the suspend core for now.  Evidently, this
change turns out to be painfully premature.

Also, we have apparent problems with pm_sleep_lock()
being take in device_add() (see
http://bugzilla.kernel.org/show_bug.cgi?id=9874).

Thanks,
Rafael
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-22 Thread Alan Stern
On Sat, 23 Feb 2008, Rafael J. Wysocki wrote:

 Unfortunately, I missed your Bugzilla comment at
 http://bugzilla.kernel.org/show_bug.cgi?id=10030#c28

Strange...  But obviously you did see it eventually.

 Well, in the face of it, I'm considering to remove the code that
 acquires device semaphores from the suspend core for now.  Evidently, this
 change turns out to be painfully premature.

I wonder if that's really the best thing.  How would we then learn 
about drivers trying to register or unregister a device during a sleep 
transition?

Do you think it might be possible instead to somehow allow these
unregistrations to go through, while still failing or blocking
registrations?  It shouldn't be too hard to modify the driver core so
that it calls the driver's remove() method without trying to acquire
dev-sem if your in_suspend_context() test succeeds.

I have to admit, I still don't understand what's going on with the MMC 
driver.  Why is there a workqueue involved?  If the workqueue fails to 
unregister the device, why should it bother the suspend routine?  After 
all, if the suspend routine can afford to wait for the workqueue to 
finish then it could just as well afford to do the unregistration 
itself.

 Also, we have apparent problems with pm_sleep_lock()
 being take in device_add() (see
 http://bugzilla.kernel.org/show_bug.cgi?id=9874).

We'll have to get more information from the bug reporter to figure out 
what really happened there.

Ultimately it may turn out some drivers just aren't very careful about
when they try to register new devices.  Doing the registration by way
of a workqueue can be problematic if the workqueue happens to run
during a system sleep transition.  That will still be true if you
revert the acquire-all-semaphores patch.

Alan Stern

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-21 Thread Alan Stern
On Thu, 21 Feb 2008, Rafael J. Wysocki wrote:

> > > --- linux-2.6.orig/drivers/base/core.c
> > > +++ linux-2.6/drivers/base/core.c
> > > @@ -929,6 +929,11 @@ void device_del(struct device *dev)
> > >   struct device *parent = dev->parent;
> > >   struct class_interface *class_intf;
> > >  
> > > + if (in_suspend_context()) {
> > > + get_device(dev);
> > 
> > Where is this get_device() undone?  Shouldn't there be an extra 
> > put_device() added to unregister_dropped_devices()?
> 
> No, I don't think so, because unregister_dropped_devices() calls
> device_unregister() that does the put_device() eventually.

Ah, yes.

> If we are called by device_unregister(), the get_device() is needed to balance
> the put_device() that will be called by device_unregister() after we return.
> 
> OTOH, if we are called directly, then we need to balance the put_device()
> that will be done by device_unregister() called from
> unregister_dropped_devices().
> 
> I hope I didn't miss anything.

Okay, that sounds right.

> > > + device_pm_schedule_removal(dev);
> > > + return;
> > > + }
> > >   device_pm_remove(dev);
> > >   if (parent)
> > >   klist_del(>knode_parent);
> > 
> > And now the change to device_destroy() isn't needed at all.
> 
> No, it's not.  Didn't I remove it?  I thought I did.

Oh yes, you did.

I see a possible problem in device_resume().  My copy isn't fully 
up-to-date, but it looks like you call unregister_dropped_devices() 
before doing the up_write(_sleep_rwsem).  Won't this cause 
warnings in device_del() about a suspicious caller?

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-21 Thread Rafael J. Wysocki
On Thursday, 21 of February 2008, Alan Stern wrote:
> On Thu, 21 Feb 2008, Rafael J. Wysocki wrote:
> 
> > > > +bool in_suspend_context(void)
> > > > +{
> > > > +   bool result;
> > > > +
> > > > +   mutex_lock(_task_mtx);
> > > > +   result = (suspending_task == current);
> > > > +   mutex_unlock(_task_mtx);
> > > > +   return result;
> > > > +}
> > > 
> > > If suspending_task == current then you are guaranteed to be serialized, 
> > > because everything a single task does is serial.
> > 
> > As I said before (but that doesn't seem to reach the list, so I'm 
> > repeating),
> > this is to protect other tasks from reading an inconsistent value of
> > suspending_task in case they attempt to remove a device concurrently with
> > respect to us.
> > 
> > While this is not likely to happen right now, because of the freezer, it may
> > very well happen when the freezer is finally removed.
> 
> Sorry, I don't understand.  Are you worried that process A might set
> suspending_task = A but then process B might still see suspending_task
> == NULL?  Or that A might set suspend_task = NULL but then B might
> still see suspending_task == A?
> 
> Neither one will cause any problem, since the only case that matters is
> when B sees suspending_task == B -- and that can happen if and only if
> B was the last process to set suspending_task.
> 
> In fact, you might as well get rid of the set_suspending_task() routine 
> entirely and just put the assignments inline.

OK, I will.

> > --- linux-2.6.orig/drivers/base/core.c
> > +++ linux-2.6/drivers/base/core.c
> > @@ -929,6 +929,11 @@ void device_del(struct device *dev)
> > struct device *parent = dev->parent;
> > struct class_interface *class_intf;
> >  
> > +   if (in_suspend_context()) {
> > +   get_device(dev);
> 
> Where is this get_device() undone?  Shouldn't there be an extra 
> put_device() added to unregister_dropped_devices()?

No, I don't think so, because unregister_dropped_devices() calls
device_unregister() that does the put_device() eventually.

If we are called by device_unregister(), the get_device() is needed to balance
the put_device() that will be called by device_unregister() after we return.

OTOH, if we are called directly, then we need to balance the put_device()
that will be done by device_unregister() called from
unregister_dropped_devices().

I hope I didn't miss anything.

> > +   device_pm_schedule_removal(dev);
> > +   return;
> > +   }
> > device_pm_remove(dev);
> > if (parent)
> > klist_del(>knode_parent);
> 
> And now the change to device_destroy() isn't needed at all.

No, it's not.  Didn't I remove it?  I thought I did.

Thanks,
Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-21 Thread Alan Stern
On Thu, 21 Feb 2008, Rafael J. Wysocki wrote:

> > > +bool in_suspend_context(void)
> > > +{
> > > + bool result;
> > > +
> > > + mutex_lock(_task_mtx);
> > > + result = (suspending_task == current);
> > > + mutex_unlock(_task_mtx);
> > > + return result;
> > > +}
> > 
> > If suspending_task == current then you are guaranteed to be serialized, 
> > because everything a single task does is serial.
> 
> As I said before (but that doesn't seem to reach the list, so I'm repeating),
> this is to protect other tasks from reading an inconsistent value of
> suspending_task in case they attempt to remove a device concurrently with
> respect to us.
> 
> While this is not likely to happen right now, because of the freezer, it may
> very well happen when the freezer is finally removed.

Sorry, I don't understand.  Are you worried that process A might set
suspending_task = A but then process B might still see suspending_task
== NULL?  Or that A might set suspend_task = NULL but then B might
still see suspending_task == A?

Neither one will cause any problem, since the only case that matters is
when B sees suspending_task == B -- and that can happen if and only if
B was the last process to set suspending_task.

In fact, you might as well get rid of the set_suspending_task() routine 
entirely and just put the assignments inline.

> --- linux-2.6.orig/drivers/base/core.c
> +++ linux-2.6/drivers/base/core.c
> @@ -929,6 +929,11 @@ void device_del(struct device *dev)
>   struct device *parent = dev->parent;
>   struct class_interface *class_intf;
>  
> + if (in_suspend_context()) {
> + get_device(dev);

Where is this get_device() undone?  Shouldn't there be an extra 
put_device() added to unregister_dropped_devices()?

> + device_pm_schedule_removal(dev);
> + return;
> + }
>   device_pm_remove(dev);
>   if (parent)
>   klist_del(>knode_parent);

And now the change to device_destroy() isn't needed at all.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-21 Thread Rafael J. Wysocki
On Thursday, 21 of February 2008, Alan Stern wrote:
 On Thu, 21 Feb 2008, Rafael J. Wysocki wrote:
 
+bool in_suspend_context(void)
+{
+   bool result;
+
+   mutex_lock(suspending_task_mtx);
+   result = (suspending_task == current);
+   mutex_unlock(suspending_task_mtx);
+   return result;
+}
   
   If suspending_task == current then you are guaranteed to be serialized, 
   because everything a single task does is serial.
  
  As I said before (but that doesn't seem to reach the list, so I'm 
  repeating),
  this is to protect other tasks from reading an inconsistent value of
  suspending_task in case they attempt to remove a device concurrently with
  respect to us.
  
  While this is not likely to happen right now, because of the freezer, it may
  very well happen when the freezer is finally removed.
 
 Sorry, I don't understand.  Are you worried that process A might set
 suspending_task = A but then process B might still see suspending_task
 == NULL?  Or that A might set suspend_task = NULL but then B might
 still see suspending_task == A?
 
 Neither one will cause any problem, since the only case that matters is
 when B sees suspending_task == B -- and that can happen if and only if
 B was the last process to set suspending_task.
 
 In fact, you might as well get rid of the set_suspending_task() routine 
 entirely and just put the assignments inline.

OK, I will.

  --- linux-2.6.orig/drivers/base/core.c
  +++ linux-2.6/drivers/base/core.c
  @@ -929,6 +929,11 @@ void device_del(struct device *dev)
  struct device *parent = dev-parent;
  struct class_interface *class_intf;
   
  +   if (in_suspend_context()) {
  +   get_device(dev);
 
 Where is this get_device() undone?  Shouldn't there be an extra 
 put_device() added to unregister_dropped_devices()?

No, I don't think so, because unregister_dropped_devices() calls
device_unregister() that does the put_device() eventually.

If we are called by device_unregister(), the get_device() is needed to balance
the put_device() that will be called by device_unregister() after we return.

OTOH, if we are called directly, then we need to balance the put_device()
that will be done by device_unregister() called from
unregister_dropped_devices().

I hope I didn't miss anything.

  +   device_pm_schedule_removal(dev);
  +   return;
  +   }
  device_pm_remove(dev);
  if (parent)
  klist_del(dev-knode_parent);
 
 And now the change to device_destroy() isn't needed at all.

No, it's not.  Didn't I remove it?  I thought I did.

Thanks,
Rafael
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-21 Thread Alan Stern
On Thu, 21 Feb 2008, Rafael J. Wysocki wrote:

   +bool in_suspend_context(void)
   +{
   + bool result;
   +
   + mutex_lock(suspending_task_mtx);
   + result = (suspending_task == current);
   + mutex_unlock(suspending_task_mtx);
   + return result;
   +}
  
  If suspending_task == current then you are guaranteed to be serialized, 
  because everything a single task does is serial.
 
 As I said before (but that doesn't seem to reach the list, so I'm repeating),
 this is to protect other tasks from reading an inconsistent value of
 suspending_task in case they attempt to remove a device concurrently with
 respect to us.
 
 While this is not likely to happen right now, because of the freezer, it may
 very well happen when the freezer is finally removed.

Sorry, I don't understand.  Are you worried that process A might set
suspending_task = A but then process B might still see suspending_task
== NULL?  Or that A might set suspend_task = NULL but then B might
still see suspending_task == A?

Neither one will cause any problem, since the only case that matters is
when B sees suspending_task == B -- and that can happen if and only if
B was the last process to set suspending_task.

In fact, you might as well get rid of the set_suspending_task() routine 
entirely and just put the assignments inline.

 --- linux-2.6.orig/drivers/base/core.c
 +++ linux-2.6/drivers/base/core.c
 @@ -929,6 +929,11 @@ void device_del(struct device *dev)
   struct device *parent = dev-parent;
   struct class_interface *class_intf;
  
 + if (in_suspend_context()) {
 + get_device(dev);

Where is this get_device() undone?  Shouldn't there be an extra 
put_device() added to unregister_dropped_devices()?

 + device_pm_schedule_removal(dev);
 + return;
 + }
   device_pm_remove(dev);
   if (parent)
   klist_del(dev-knode_parent);

And now the change to device_destroy() isn't needed at all.

Alan Stern

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-21 Thread Alan Stern
On Thu, 21 Feb 2008, Rafael J. Wysocki wrote:

   --- linux-2.6.orig/drivers/base/core.c
   +++ linux-2.6/drivers/base/core.c
   @@ -929,6 +929,11 @@ void device_del(struct device *dev)
 struct device *parent = dev-parent;
 struct class_interface *class_intf;

   + if (in_suspend_context()) {
   + get_device(dev);
  
  Where is this get_device() undone?  Shouldn't there be an extra 
  put_device() added to unregister_dropped_devices()?
 
 No, I don't think so, because unregister_dropped_devices() calls
 device_unregister() that does the put_device() eventually.

Ah, yes.

 If we are called by device_unregister(), the get_device() is needed to balance
 the put_device() that will be called by device_unregister() after we return.
 
 OTOH, if we are called directly, then we need to balance the put_device()
 that will be done by device_unregister() called from
 unregister_dropped_devices().
 
 I hope I didn't miss anything.

Okay, that sounds right.

   + device_pm_schedule_removal(dev);
   + return;
   + }
 device_pm_remove(dev);
 if (parent)
 klist_del(dev-knode_parent);
  
  And now the change to device_destroy() isn't needed at all.
 
 No, it's not.  Didn't I remove it?  I thought I did.

Oh yes, you did.

I see a possible problem in device_resume().  My copy isn't fully 
up-to-date, but it looks like you call unregister_dropped_devices() 
before doing the up_write(pm_sleep_rwsem).  Won't this cause 
warnings in device_del() about a suspicious caller?

Alan Stern

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-20 Thread Rafael J. Wysocki
On Wednesday, 20 of February 2008, Alan Stern wrote:
> On Wed, 20 Feb 2008, Rafael J. Wysocki wrote:
> 
> > Well, below is an uncompiled and untested but illustrating the idea that
> > might allow people not to bother with device_pm_schedule_removal()
> > explicitly and can fix the issue at hand.
> > 
> > [There are some cases that need handling and are not covered here.]
> > 
> > Please have a look.
> > 
> > Thanks,
> > Rafael
> 
> > +static struct task_struct *suspending_task;
> > +static DEFINE_MUTEX(suspending_task_mtx);
> 
> I suspect you don't really need this mutex.
> 
> > +bool in_suspend_context(void)
> > +{
> > +   bool result;
> > +
> > +   mutex_lock(_task_mtx);
> > +   result = (suspending_task == current);
> > +   mutex_unlock(_task_mtx);
> > +   return result;
> > +}
> 
> If suspending_task == current then you are guaranteed to be serialized, 
> because everything a single task does is serial.

As I said before (but that doesn't seem to reach the list, so I'm repeating),
this is to protect other tasks from reading an inconsistent value of
suspending_task in case they attempt to remove a device concurrently with
respect to us.

While this is not likely to happen right now, because of the freezer, it may
very well happen when the freezer is finally removed.

> > @@ -1162,7 +1162,10 @@ void device_destroy(struct class *class,
> > dev = class_find_device(class, , __match_devt);
> > if (dev) {
> > put_device(dev);
> > -   device_unregister(dev);
> > +   if (in_suspend_context())
> > +   device_pm_schedule_removal(dev);
> > +   else
> > +   device_unregister(dev);
> > }
> >  }
> 
> But what about device_del()?  Can a similar change be made there?

Something like the patch below, perhaps?  Again, untested, but compiled this
time. :-)

Thanks,
Rafael

---
 drivers/base/core.c|5 +
 drivers/base/power/main.c  |   22 ++
 drivers/base/power/power.h |5 +
 3 files changed, 32 insertions(+)

Index: linux-2.6/drivers/base/power/main.c
===
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -59,6 +59,26 @@ static DECLARE_RWSEM(pm_sleep_rwsem);
 
 int (*platform_enable_wakeup)(struct device *dev, int is_on);
 
+static struct task_struct *suspending_task;
+static DEFINE_MUTEX(suspending_task_mtx);
+
+bool in_suspend_context(void)
+{
+   bool result;
+
+   mutex_lock(_task_mtx);
+   result = (suspending_task == current);
+   mutex_unlock(_task_mtx);
+   return result;
+}
+
+static void set_suspending_task(struct task_struct *task)
+{
+   mutex_lock(_task_mtx);
+   suspending_task = task;
+   mutex_unlock(_task_mtx);
+}
+
 /**
  * device_pm_add - add a device to the list of active devices
  * @dev:   Device to be added to the list
@@ -272,6 +292,7 @@ static void dpm_resume(void)
mutex_lock(_list_mtx);
}
mutex_unlock(_list_mtx);
+   set_suspending_task(NULL);
 }
 
 /**
@@ -460,6 +481,7 @@ static int dpm_suspend(pm_message_t stat
 {
int error = 0;
 
+   set_suspending_task(current);
mutex_lock(_list_mtx);
while (!list_empty(_locked)) {
struct list_head *entry = dpm_locked.prev;
Index: linux-2.6/drivers/base/core.c
===
--- linux-2.6.orig/drivers/base/core.c
+++ linux-2.6/drivers/base/core.c
@@ -929,6 +929,11 @@ void device_del(struct device *dev)
struct device *parent = dev->parent;
struct class_interface *class_intf;
 
+   if (in_suspend_context()) {
+   get_device(dev);
+   device_pm_schedule_removal(dev);
+   return;
+   }
device_pm_remove(dev);
if (parent)
klist_del(>knode_parent);
Index: linux-2.6/drivers/base/power/power.h
===
--- linux-2.6.orig/drivers/base/power/power.h
+++ linux-2.6/drivers/base/power/power.h
@@ -11,6 +11,7 @@ static inline struct device *to_device(s
return container_of(entry, struct device, power.entry);
 }
 
+extern bool in_suspend_context(void);
 extern void device_pm_add(struct device *);
 extern void device_pm_remove(struct device *);
 extern int pm_sleep_lock(void);
@@ -18,6 +19,10 @@ extern void pm_sleep_unlock(void);
 
 #else /* CONFIG_PM_SLEEP */
 
+static inline bool in_suspend_context(void)
+{
+   return false;
+}
 
 static inline void device_pm_add(struct device *dev)
 {
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-20 Thread Rafael J. Wysocki
On Wednesday, 20 of February 2008, Alan Stern wrote:
> On Wed, 20 Feb 2008, Rafael J. Wysocki wrote:
> 
> > Well, below is an uncompiled and untested but illustrating the idea that
> > might allow people not to bother with device_pm_schedule_removal()
> > explicitly and can fix the issue at hand.
> > 
> > [There are some cases that need handling and are not covered here.]
> > 
> > Please have a look.
> > 
> > Thanks,
> > Rafael
> 
> > +static struct task_struct *suspending_task;
> > +static DEFINE_MUTEX(suspending_task_mtx);
> 
> I suspect you don't really need this mutex.
> 
> > +bool in_suspend_context(void)
> > +{
> > +   bool result;
> > +
> > +   mutex_lock(_task_mtx);
> > +   result = (suspending_task == current);
> > +   mutex_unlock(_task_mtx);
> > +   return result;
> > +}
> 
> If suspending_task == current then you are guaranteed to be serialized, 
> because everything a single task does is serial.

But in principle there could be a concurrent thread removind the device and
that should block on dev->sem held by us.

Right now that's not very likely to happen thanks to the freezer, but we're
doing all this stuff, because we want to get rid of the freezer eventually. :-)

> > @@ -1162,7 +1162,10 @@ void device_destroy(struct class *class,
> > dev = class_find_device(class, , __match_devt);
> > if (dev) {
> > put_device(dev);
> > -   device_unregister(dev);
> > +   if (in_suspend_context())
> > +   device_pm_schedule_removal(dev);
> > +   else
> > +   device_unregister(dev);
> > }
> >  }
> 
> But what about device_del()?  Can a similar change be made there?

I believe so.

I'm working on a more complete patch right now.

Thanks,
Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-20 Thread Alan Stern
On Wed, 20 Feb 2008, Rafael J. Wysocki wrote:

> Well, below is an uncompiled and untested but illustrating the idea that
> might allow people not to bother with device_pm_schedule_removal()
> explicitly and can fix the issue at hand.
> 
> [There are some cases that need handling and are not covered here.]
> 
> Please have a look.
> 
> Thanks,
> Rafael

> +static struct task_struct *suspending_task;
> +static DEFINE_MUTEX(suspending_task_mtx);

I suspect you don't really need this mutex.

> +bool in_suspend_context(void)
> +{
> + bool result;
> +
> + mutex_lock(_task_mtx);
> + result = (suspending_task == current);
> + mutex_unlock(_task_mtx);
> + return result;
> +}

If suspending_task == current then you are guaranteed to be serialized, 
because everything a single task does is serial.

> @@ -1162,7 +1162,10 @@ void device_destroy(struct class *class,
>   dev = class_find_device(class, , __match_devt);
>   if (dev) {
>   put_device(dev);
> - device_unregister(dev);
> + if (in_suspend_context())
> + device_pm_schedule_removal(dev);
> + else
> + device_unregister(dev);
>   }
>  }

But what about device_del()?  Can a similar change be made there?

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-20 Thread Rafael J. Wysocki
On Wednesday, 20 of February 2008, Alan Stern wrote:
> On Wed, 20 Feb 2008, Pierre Ossman wrote:
> 
> > > And why not simply fail the suspend if the resume routine doesn't exist
> > > and the suspend routine does?  Maybe with an error message in the
> > > system log.
> > 
> > For the asymmetric case, I guess that would do. But I still want to remove 
> > devices when the bus handler has no suspend handling.
> 
> Then it appears the correct approach is to use the new 
> device_pm_schedule_removal() routine.  It schedules the removal of a 
> suspended device for a time when the system suspend is over.

Well, below is an uncompiled and untested but illustrating the idea that
might allow people not to bother with device_pm_schedule_removal()
explicitly and can fix the issue at hand.

[There are some cases that need handling and are not covered here.]

Please have a look.

Thanks,
Rafael

---
 drivers/base/core.c|5 -
 drivers/base/power/main.c  |   22 ++
 drivers/base/power/power.h |5 +
 3 files changed, 31 insertions(+), 1 deletion(-)

Index: linux-2.6/drivers/base/power/main.c
===
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -59,6 +59,26 @@ static DECLARE_RWSEM(pm_sleep_rwsem);
 
 int (*platform_enable_wakeup)(struct device *dev, int is_on);
 
+static struct task_struct *suspending_task;
+static DEFINE_MUTEX(suspending_task_mtx);
+
+bool in_suspend_context(void)
+{
+   bool result;
+
+   mutex_lock(_task_mtx);
+   result = (suspending_task == current);
+   mutex_unlock(_task_mtx);
+   return result;
+}
+
+static void set_suspending_task(struct task_struct *task)
+{
+   mutex_lock(_task_mtx);
+   suspending_task = task;
+   mutex_unlock(_task_mtx);
+}
+
 /**
  * device_pm_add - add a device to the list of active devices
  * @dev:   Device to be added to the list
@@ -272,6 +292,7 @@ static void dpm_resume(void)
mutex_lock(_list_mtx);
}
mutex_unlock(_list_mtx);
+   set_suspending_task(NULL);
 }
 
 /**
@@ -460,6 +481,7 @@ static int dpm_suspend(pm_message_t stat
 {
int error = 0;
 
+   set_suspending_task(current);
mutex_lock(_list_mtx);
while (!list_empty(_locked)) {
struct list_head *entry = dpm_locked.prev;
Index: linux-2.6/drivers/base/core.c
===
--- linux-2.6.orig/drivers/base/core.c
+++ linux-2.6/drivers/base/core.c
@@ -1162,7 +1162,10 @@ void device_destroy(struct class *class,
dev = class_find_device(class, , __match_devt);
if (dev) {
put_device(dev);
-   device_unregister(dev);
+   if (in_suspend_context())
+   device_pm_schedule_removal(dev);
+   else
+   device_unregister(dev);
}
 }
 EXPORT_SYMBOL_GPL(device_destroy);
Index: linux-2.6/drivers/base/power/power.h
===
--- linux-2.6.orig/drivers/base/power/power.h
+++ linux-2.6/drivers/base/power/power.h
@@ -11,6 +11,7 @@ static inline struct device *to_device(s
return container_of(entry, struct device, power.entry);
 }
 
+extern bool in_suspend_context(void);
 extern void device_pm_add(struct device *);
 extern void device_pm_remove(struct device *);
 extern int pm_sleep_lock(void);
@@ -18,6 +19,10 @@ extern void pm_sleep_unlock(void);
 
 #else /* CONFIG_PM_SLEEP */
 
+static inline bool in_suspend_context(void)
+{
+   return false;
+}
 
 static inline void device_pm_add(struct device *dev)
 {
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-20 Thread Alan Stern
On Wed, 20 Feb 2008, Pierre Ossman wrote:

> > And why not simply fail the suspend if the resume routine doesn't exist
> > and the suspend routine does?  Maybe with an error message in the
> > system log.
> 
> For the asymmetric case, I guess that would do. But I still want to remove 
> devices when the bus handler has no suspend handling.

Then it appears the correct approach is to use the new 
device_pm_schedule_removal() routine.  It schedules the removal of a 
suspended device for a time when the system suspend is over.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-20 Thread Rafael J. Wysocki
On Wednesday, 20 of February 2008, Pierre Ossman wrote:
> On Wed, 20 Feb 2008 14:26:16 -0500 (EST)
> Alan Stern <[EMAIL PROTECTED]> wrote:
> 
> > 
> > Do I understand this correctly?  You've got special handling for the 
> > case where a bus handler doesn't have a resume routine, but no special 
> > handling for the case where it doesn't have a suspend routine?
> 
> Hmm... There should be checks for both, but the code seems to suggest 
> otherwise.
> 
> > Why bother to remove the device if neither routine exists (there won't be 
> > any need to revive it since the bus never got suspended)?
> 
> The bus always gets suspended. The checks are to determine if state is saved 
> or not. If it isn't, then a suspend/resume is treated as a removal/insertion.
> 
> > And why not simply fail the suspend if the resume routine doesn't exist
> > and the suspend routine does?  Maybe with an error message in the
> > system log.
> 
> For the asymmetric case, I guess that would do. But I still want to remove 
> devices when the bus handler has no suspend handling.

I think I know how to handle that, but there's a more urgent issue I need to
fix first.  Stay tuned. :-)

Thanks,
Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-20 Thread Pierre Ossman
On Wed, 20 Feb 2008 14:26:16 -0500 (EST)
Alan Stern <[EMAIL PROTECTED]> wrote:

> 
> Do I understand this correctly?  You've got special handling for the 
> case where a bus handler doesn't have a resume routine, but no special 
> handling for the case where it doesn't have a suspend routine?

Hmm... There should be checks for both, but the code seems to suggest otherwise.

> Why bother to remove the device if neither routine exists (there won't be 
> any need to revive it since the bus never got suspended)?

The bus always gets suspended. The checks are to determine if state is saved or 
not. If it isn't, then a suspend/resume is treated as a removal/insertion.

> And why not simply fail the suspend if the resume routine doesn't exist
> and the suspend routine does?  Maybe with an error message in the
> system log.

For the asymmetric case, I guess that would do. But I still want to remove 
devices when the bus handler has no suspend handling.

Rgds
-- 
 -- Pierre Ossman

  Linux kernel, MMC maintainerhttp://www.kernel.org
  PulseAudio, core developer  http://pulseaudio.org
  rdesktop, core developer  http://www.rdesktop.org
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-20 Thread Alan Stern
On Wed, 20 Feb 2008, Pierre Ossman wrote:

> Not really. But you have some things confused. What it checks is if
> the mmc bus handler (not a proper driver model, just a way of
> separating the MMC, SD and SDIO stuff) has a resume function. And if
> it doesn't, it removes the card (since it cannot revive it at
> resume).
> 
> So the only thing I can think of is to delay the removal until the
> resume routine, if that is safer.

Do I understand this correctly?  You've got special handling for the 
case where a bus handler doesn't have a resume routine, but no special 
handling for the case where it doesn't have a suspend routine?  Why 
bother to remove the device if neither routine exists (there won't be 
any need to revive it since the bus never got suspended)?

And why not simply fail the suspend if the resume routine doesn't exist
and the suspend routine does?  Maybe with an error message in the
system log.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-20 Thread Pierre Ossman
On Wed, 20 Feb 2008 11:42:56 -0500 (EST)
Alan Stern <[EMAIL PROTECTED]> wrote:

> > 
> > --- Comment #14 from [EMAIL PROTECTED]  2008-02-19 15:23 ---
> > Thanks a lot for the debugging work!
> > 
> > First, the patch triggers, which means that the problem discovered by Alan 
> > is
> > troubling us.  [Alan, do you have an idea how to fix that cleanly?]
> 
> I suggest we ask the maintainer for the MMC subsystem.
> 
> Pierre, you can find the details in the bugzilla entry.  Briefly,
> there's a pathway in the MMC core suspend routine (if the driver
> doesn't implement a resume hook) which could lead to the host being
> removed during a system suspend.  This is an illegal operation and it
> will deadlock.
> 
> Do you have a suggestion for a way to fix it?
> 

Not really. But you have some things confused. What it checks is if the mmc bus 
handler (not a proper driver model, just a way of separating the MMC, SD and 
SDIO stuff) has a resume function. And if it doesn't, it removes the card 
(since it cannot revive it at resume).

So the only thing I can think of is to delay the removal until the resume 
routine, if that is safer.

Rgds
-- 
 -- Pierre Ossman

  Linux kernel, MMC maintainerhttp://www.kernel.org
  PulseAudio, core developer  http://pulseaudio.org
  rdesktop, core developer  http://www.rdesktop.org
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-20 Thread Alan Stern
On Tue, 19 Feb 2008 [EMAIL PROTECTED] wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=10030
> 
> 
> 
> 
> 
> --- Comment #14 from [EMAIL PROTECTED]  2008-02-19 15:23 ---
> Thanks a lot for the debugging work!
> 
> First, the patch triggers, which means that the problem discovered by Alan is
> troubling us.  [Alan, do you have an idea how to fix that cleanly?]

I suggest we ask the maintainer for the MMC subsystem.

Pierre, you can find the details in the bugzilla entry.  Briefly,
there's a pathway in the MMC core suspend routine (if the driver
doesn't implement a resume hook) which could lead to the host being
removed during a system suspend.  This is an illegal operation and it
will deadlock.

Do you have a suggestion for a way to fix it?

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-20 Thread Alan Stern
On Tue, 19 Feb 2008 [EMAIL PROTECTED] wrote:

 http://bugzilla.kernel.org/show_bug.cgi?id=10030
 
 
 
 
 
 --- Comment #14 from [EMAIL PROTECTED]  2008-02-19 15:23 ---
 Thanks a lot for the debugging work!
 
 First, the patch triggers, which means that the problem discovered by Alan is
 troubling us.  [Alan, do you have an idea how to fix that cleanly?]

I suggest we ask the maintainer for the MMC subsystem.

Pierre, you can find the details in the bugzilla entry.  Briefly,
there's a pathway in the MMC core suspend routine (if the driver
doesn't implement a resume hook) which could lead to the host being
removed during a system suspend.  This is an illegal operation and it
will deadlock.

Do you have a suggestion for a way to fix it?

Alan Stern

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-20 Thread Pierre Ossman
On Wed, 20 Feb 2008 11:42:56 -0500 (EST)
Alan Stern [EMAIL PROTECTED] wrote:

  
  --- Comment #14 from [EMAIL PROTECTED]  2008-02-19 15:23 ---
  Thanks a lot for the debugging work!
  
  First, the patch triggers, which means that the problem discovered by Alan 
  is
  troubling us.  [Alan, do you have an idea how to fix that cleanly?]
 
 I suggest we ask the maintainer for the MMC subsystem.
 
 Pierre, you can find the details in the bugzilla entry.  Briefly,
 there's a pathway in the MMC core suspend routine (if the driver
 doesn't implement a resume hook) which could lead to the host being
 removed during a system suspend.  This is an illegal operation and it
 will deadlock.
 
 Do you have a suggestion for a way to fix it?
 

Not really. But you have some things confused. What it checks is if the mmc bus 
handler (not a proper driver model, just a way of separating the MMC, SD and 
SDIO stuff) has a resume function. And if it doesn't, it removes the card 
(since it cannot revive it at resume).

So the only thing I can think of is to delay the removal until the resume 
routine, if that is safer.

Rgds
-- 
 -- Pierre Ossman

  Linux kernel, MMC maintainerhttp://www.kernel.org
  PulseAudio, core developer  http://pulseaudio.org
  rdesktop, core developer  http://www.rdesktop.org
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-20 Thread Alan Stern
On Wed, 20 Feb 2008, Pierre Ossman wrote:

 Not really. But you have some things confused. What it checks is if
 the mmc bus handler (not a proper driver model, just a way of
 separating the MMC, SD and SDIO stuff) has a resume function. And if
 it doesn't, it removes the card (since it cannot revive it at
 resume).
 
 So the only thing I can think of is to delay the removal until the
 resume routine, if that is safer.

Do I understand this correctly?  You've got special handling for the 
case where a bus handler doesn't have a resume routine, but no special 
handling for the case where it doesn't have a suspend routine?  Why 
bother to remove the device if neither routine exists (there won't be 
any need to revive it since the bus never got suspended)?

And why not simply fail the suspend if the resume routine doesn't exist
and the suspend routine does?  Maybe with an error message in the
system log.

Alan Stern

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-20 Thread Pierre Ossman
On Wed, 20 Feb 2008 14:26:16 -0500 (EST)
Alan Stern [EMAIL PROTECTED] wrote:

 
 Do I understand this correctly?  You've got special handling for the 
 case where a bus handler doesn't have a resume routine, but no special 
 handling for the case where it doesn't have a suspend routine?

Hmm... There should be checks for both, but the code seems to suggest otherwise.

 Why bother to remove the device if neither routine exists (there won't be 
 any need to revive it since the bus never got suspended)?

The bus always gets suspended. The checks are to determine if state is saved or 
not. If it isn't, then a suspend/resume is treated as a removal/insertion.

 And why not simply fail the suspend if the resume routine doesn't exist
 and the suspend routine does?  Maybe with an error message in the
 system log.

For the asymmetric case, I guess that would do. But I still want to remove 
devices when the bus handler has no suspend handling.

Rgds
-- 
 -- Pierre Ossman

  Linux kernel, MMC maintainerhttp://www.kernel.org
  PulseAudio, core developer  http://pulseaudio.org
  rdesktop, core developer  http://www.rdesktop.org
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-20 Thread Rafael J. Wysocki
On Wednesday, 20 of February 2008, Pierre Ossman wrote:
 On Wed, 20 Feb 2008 14:26:16 -0500 (EST)
 Alan Stern [EMAIL PROTECTED] wrote:
 
  
  Do I understand this correctly?  You've got special handling for the 
  case where a bus handler doesn't have a resume routine, but no special 
  handling for the case where it doesn't have a suspend routine?
 
 Hmm... There should be checks for both, but the code seems to suggest 
 otherwise.
 
  Why bother to remove the device if neither routine exists (there won't be 
  any need to revive it since the bus never got suspended)?
 
 The bus always gets suspended. The checks are to determine if state is saved 
 or not. If it isn't, then a suspend/resume is treated as a removal/insertion.
 
  And why not simply fail the suspend if the resume routine doesn't exist
  and the suspend routine does?  Maybe with an error message in the
  system log.
 
 For the asymmetric case, I guess that would do. But I still want to remove 
 devices when the bus handler has no suspend handling.

I think I know how to handle that, but there's a more urgent issue I need to
fix first.  Stay tuned. :-)

Thanks,
Rafael
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-20 Thread Alan Stern
On Wed, 20 Feb 2008, Pierre Ossman wrote:

  And why not simply fail the suspend if the resume routine doesn't exist
  and the suspend routine does?  Maybe with an error message in the
  system log.
 
 For the asymmetric case, I guess that would do. But I still want to remove 
 devices when the bus handler has no suspend handling.

Then it appears the correct approach is to use the new 
device_pm_schedule_removal() routine.  It schedules the removal of a 
suspended device for a time when the system suspend is over.

Alan Stern

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-20 Thread Rafael J. Wysocki
On Wednesday, 20 of February 2008, Alan Stern wrote:
 On Wed, 20 Feb 2008, Pierre Ossman wrote:
 
   And why not simply fail the suspend if the resume routine doesn't exist
   and the suspend routine does?  Maybe with an error message in the
   system log.
  
  For the asymmetric case, I guess that would do. But I still want to remove 
  devices when the bus handler has no suspend handling.
 
 Then it appears the correct approach is to use the new 
 device_pm_schedule_removal() routine.  It schedules the removal of a 
 suspended device for a time when the system suspend is over.

Well, below is an uncompiled and untested but illustrating the idea that
might allow people not to bother with device_pm_schedule_removal()
explicitly and can fix the issue at hand.

[There are some cases that need handling and are not covered here.]

Please have a look.

Thanks,
Rafael

---
 drivers/base/core.c|5 -
 drivers/base/power/main.c  |   22 ++
 drivers/base/power/power.h |5 +
 3 files changed, 31 insertions(+), 1 deletion(-)

Index: linux-2.6/drivers/base/power/main.c
===
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -59,6 +59,26 @@ static DECLARE_RWSEM(pm_sleep_rwsem);
 
 int (*platform_enable_wakeup)(struct device *dev, int is_on);
 
+static struct task_struct *suspending_task;
+static DEFINE_MUTEX(suspending_task_mtx);
+
+bool in_suspend_context(void)
+{
+   bool result;
+
+   mutex_lock(suspending_task_mtx);
+   result = (suspending_task == current);
+   mutex_unlock(suspending_task_mtx);
+   return result;
+}
+
+static void set_suspending_task(struct task_struct *task)
+{
+   mutex_lock(suspending_task_mtx);
+   suspending_task = task;
+   mutex_unlock(suspending_task_mtx);
+}
+
 /**
  * device_pm_add - add a device to the list of active devices
  * @dev:   Device to be added to the list
@@ -272,6 +292,7 @@ static void dpm_resume(void)
mutex_lock(dpm_list_mtx);
}
mutex_unlock(dpm_list_mtx);
+   set_suspending_task(NULL);
 }
 
 /**
@@ -460,6 +481,7 @@ static int dpm_suspend(pm_message_t stat
 {
int error = 0;
 
+   set_suspending_task(current);
mutex_lock(dpm_list_mtx);
while (!list_empty(dpm_locked)) {
struct list_head *entry = dpm_locked.prev;
Index: linux-2.6/drivers/base/core.c
===
--- linux-2.6.orig/drivers/base/core.c
+++ linux-2.6/drivers/base/core.c
@@ -1162,7 +1162,10 @@ void device_destroy(struct class *class,
dev = class_find_device(class, devt, __match_devt);
if (dev) {
put_device(dev);
-   device_unregister(dev);
+   if (in_suspend_context())
+   device_pm_schedule_removal(dev);
+   else
+   device_unregister(dev);
}
 }
 EXPORT_SYMBOL_GPL(device_destroy);
Index: linux-2.6/drivers/base/power/power.h
===
--- linux-2.6.orig/drivers/base/power/power.h
+++ linux-2.6/drivers/base/power/power.h
@@ -11,6 +11,7 @@ static inline struct device *to_device(s
return container_of(entry, struct device, power.entry);
 }
 
+extern bool in_suspend_context(void);
 extern void device_pm_add(struct device *);
 extern void device_pm_remove(struct device *);
 extern int pm_sleep_lock(void);
@@ -18,6 +19,10 @@ extern void pm_sleep_unlock(void);
 
 #else /* CONFIG_PM_SLEEP */
 
+static inline bool in_suspend_context(void)
+{
+   return false;
+}
 
 static inline void device_pm_add(struct device *dev)
 {
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-20 Thread Alan Stern
On Wed, 20 Feb 2008, Rafael J. Wysocki wrote:

 Well, below is an uncompiled and untested but illustrating the idea that
 might allow people not to bother with device_pm_schedule_removal()
 explicitly and can fix the issue at hand.
 
 [There are some cases that need handling and are not covered here.]
 
 Please have a look.
 
 Thanks,
 Rafael

 +static struct task_struct *suspending_task;
 +static DEFINE_MUTEX(suspending_task_mtx);

I suspect you don't really need this mutex.

 +bool in_suspend_context(void)
 +{
 + bool result;
 +
 + mutex_lock(suspending_task_mtx);
 + result = (suspending_task == current);
 + mutex_unlock(suspending_task_mtx);
 + return result;
 +}

If suspending_task == current then you are guaranteed to be serialized, 
because everything a single task does is serial.

 @@ -1162,7 +1162,10 @@ void device_destroy(struct class *class,
   dev = class_find_device(class, devt, __match_devt);
   if (dev) {
   put_device(dev);
 - device_unregister(dev);
 + if (in_suspend_context())
 + device_pm_schedule_removal(dev);
 + else
 + device_unregister(dev);
   }
  }

But what about device_del()?  Can a similar change be made there?

Alan Stern

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-20 Thread Rafael J. Wysocki
On Wednesday, 20 of February 2008, Alan Stern wrote:
 On Wed, 20 Feb 2008, Rafael J. Wysocki wrote:
 
  Well, below is an uncompiled and untested but illustrating the idea that
  might allow people not to bother with device_pm_schedule_removal()
  explicitly and can fix the issue at hand.
  
  [There are some cases that need handling and are not covered here.]
  
  Please have a look.
  
  Thanks,
  Rafael
 
  +static struct task_struct *suspending_task;
  +static DEFINE_MUTEX(suspending_task_mtx);
 
 I suspect you don't really need this mutex.
 
  +bool in_suspend_context(void)
  +{
  +   bool result;
  +
  +   mutex_lock(suspending_task_mtx);
  +   result = (suspending_task == current);
  +   mutex_unlock(suspending_task_mtx);
  +   return result;
  +}
 
 If suspending_task == current then you are guaranteed to be serialized, 
 because everything a single task does is serial.

But in principle there could be a concurrent thread removind the device and
that should block on dev-sem held by us.

Right now that's not very likely to happen thanks to the freezer, but we're
doing all this stuff, because we want to get rid of the freezer eventually. :-)

  @@ -1162,7 +1162,10 @@ void device_destroy(struct class *class,
  dev = class_find_device(class, devt, __match_devt);
  if (dev) {
  put_device(dev);
  -   device_unregister(dev);
  +   if (in_suspend_context())
  +   device_pm_schedule_removal(dev);
  +   else
  +   device_unregister(dev);
  }
   }
 
 But what about device_del()?  Can a similar change be made there?

I believe so.

I'm working on a more complete patch right now.

Thanks,
Rafael
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Bug 10030] Suspend doesn't work when SD card is inserted

2008-02-20 Thread Rafael J. Wysocki
On Wednesday, 20 of February 2008, Alan Stern wrote:
 On Wed, 20 Feb 2008, Rafael J. Wysocki wrote:
 
  Well, below is an uncompiled and untested but illustrating the idea that
  might allow people not to bother with device_pm_schedule_removal()
  explicitly and can fix the issue at hand.
  
  [There are some cases that need handling and are not covered here.]
  
  Please have a look.
  
  Thanks,
  Rafael
 
  +static struct task_struct *suspending_task;
  +static DEFINE_MUTEX(suspending_task_mtx);
 
 I suspect you don't really need this mutex.
 
  +bool in_suspend_context(void)
  +{
  +   bool result;
  +
  +   mutex_lock(suspending_task_mtx);
  +   result = (suspending_task == current);
  +   mutex_unlock(suspending_task_mtx);
  +   return result;
  +}
 
 If suspending_task == current then you are guaranteed to be serialized, 
 because everything a single task does is serial.

As I said before (but that doesn't seem to reach the list, so I'm repeating),
this is to protect other tasks from reading an inconsistent value of
suspending_task in case they attempt to remove a device concurrently with
respect to us.

While this is not likely to happen right now, because of the freezer, it may
very well happen when the freezer is finally removed.

  @@ -1162,7 +1162,10 @@ void device_destroy(struct class *class,
  dev = class_find_device(class, devt, __match_devt);
  if (dev) {
  put_device(dev);
  -   device_unregister(dev);
  +   if (in_suspend_context())
  +   device_pm_schedule_removal(dev);
  +   else
  +   device_unregister(dev);
  }
   }
 
 But what about device_del()?  Can a similar change be made there?

Something like the patch below, perhaps?  Again, untested, but compiled this
time. :-)

Thanks,
Rafael

---
 drivers/base/core.c|5 +
 drivers/base/power/main.c  |   22 ++
 drivers/base/power/power.h |5 +
 3 files changed, 32 insertions(+)

Index: linux-2.6/drivers/base/power/main.c
===
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -59,6 +59,26 @@ static DECLARE_RWSEM(pm_sleep_rwsem);
 
 int (*platform_enable_wakeup)(struct device *dev, int is_on);
 
+static struct task_struct *suspending_task;
+static DEFINE_MUTEX(suspending_task_mtx);
+
+bool in_suspend_context(void)
+{
+   bool result;
+
+   mutex_lock(suspending_task_mtx);
+   result = (suspending_task == current);
+   mutex_unlock(suspending_task_mtx);
+   return result;
+}
+
+static void set_suspending_task(struct task_struct *task)
+{
+   mutex_lock(suspending_task_mtx);
+   suspending_task = task;
+   mutex_unlock(suspending_task_mtx);
+}
+
 /**
  * device_pm_add - add a device to the list of active devices
  * @dev:   Device to be added to the list
@@ -272,6 +292,7 @@ static void dpm_resume(void)
mutex_lock(dpm_list_mtx);
}
mutex_unlock(dpm_list_mtx);
+   set_suspending_task(NULL);
 }
 
 /**
@@ -460,6 +481,7 @@ static int dpm_suspend(pm_message_t stat
 {
int error = 0;
 
+   set_suspending_task(current);
mutex_lock(dpm_list_mtx);
while (!list_empty(dpm_locked)) {
struct list_head *entry = dpm_locked.prev;
Index: linux-2.6/drivers/base/core.c
===
--- linux-2.6.orig/drivers/base/core.c
+++ linux-2.6/drivers/base/core.c
@@ -929,6 +929,11 @@ void device_del(struct device *dev)
struct device *parent = dev-parent;
struct class_interface *class_intf;
 
+   if (in_suspend_context()) {
+   get_device(dev);
+   device_pm_schedule_removal(dev);
+   return;
+   }
device_pm_remove(dev);
if (parent)
klist_del(dev-knode_parent);
Index: linux-2.6/drivers/base/power/power.h
===
--- linux-2.6.orig/drivers/base/power/power.h
+++ linux-2.6/drivers/base/power/power.h
@@ -11,6 +11,7 @@ static inline struct device *to_device(s
return container_of(entry, struct device, power.entry);
 }
 
+extern bool in_suspend_context(void);
 extern void device_pm_add(struct device *);
 extern void device_pm_remove(struct device *);
 extern int pm_sleep_lock(void);
@@ -18,6 +19,10 @@ extern void pm_sleep_unlock(void);
 
 #else /* CONFIG_PM_SLEEP */
 
+static inline bool in_suspend_context(void)
+{
+   return false;
+}
 
 static inline void device_pm_add(struct device *dev)
 {
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


  1   2   >