Re: Freezeable workqueues [Was: 2.6.22-rc1: Broken suspend on SMP with tifm]
On Sunday, 20 May 2007 23:06, Oleg Nesterov wrote:
> On 05/20, Rafael J. Wysocki wrote:
> >
> > On Sunday, 20 May 2007 21:54, Oleg Nesterov wrote:
> > >
> > > I am a bit afraid of too many yes/no options for the freezer, a couple
> > > of naive questions.
> > >
> > > 1. Can't we make all wqs freezable? I still can't see the reason to
> > >    have both freezable and not freezable wqs.
> >
> > The reason might be the same as for having freezable and nonfreezable
> > kernel threads in general. For example, there are some kernel threads
> > that we need for saving the image and I don't see why there shouldn't
> > be any such workqueues.
>
> OK, I see.
>
> > > 2. Why do we need CPU_TASKS_FROZEN? Can't we change cpu-hotplug to
> > >    always freeze tasks right now, without any additional changes?
> >
> > In principle, we can, but for this purpose we'd have to modify all
> > NOFREEZE tasks.
>
> Why?

Ah, sorry, I didn't understand the question correctly.

> > That wouldn't fly, I'm afraid.
> >
> > > Any subsystem should handle correctly the case when _cpu_down() (say)
> > > is called with tasks_frozen == 1 anyway. So, why can't we simplify
> > > things and do
> > >
> > >	_cpu_down(int tasks_frozen)
> > >
> > >		if (!tasks_frozen)
> > >			freeze_processes();
> > >		...
> > >
> > > right now?

Yes, we can do this, I think.

> > But we call _cpu_down() after device_suspend(), so many tasks are
> > already frozen at this point. We'd only need to freeze those that are
> > not frozen and in _cpu_up() we'd have to thaw them.
>
> Not sure I understand. When we call _cpu_down() after device_suspend(),
> we check tasks_frozen == 1, and do not call freeze_processes(). If the
> task could be frozen, it is already frozen.
>
> When _cpu_down() sees tasks_frozen = 0, it does freeze_processes()
> itself, and thaw_tasks() on return.
>
> IOW, we never send (say) CPU_DEAD, always CPU_DEAD_FROZEN.

Yes, that seems reasonable.

This means that every user of freezable kernel threads who installs a CPU
hotplug notifier will have to assume that its kernel threads are frozen
when the notifier is called.

Greetings,
Rafael
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Re: Freezeable workqueues [Was: 2.6.22-rc1: Broken suspend on SMP with tifm]
On 05/20, Rafael J. Wysocki wrote:
>
> On Sunday, 20 May 2007 21:54, Oleg Nesterov wrote:
> >
> > I am a bit afraid of too many yes/no options for the freezer, a couple
> > of naive questions.
> >
> > 1. Can't we make all wqs freezable? I still can't see the reason to
> >    have both freezable and not freezable wqs.
>
> The reason might be the same as for having freezable and nonfreezable
> kernel threads in general. For example, there are some kernel threads
> that we need for saving the image and I don't see why there shouldn't be
> any such workqueues.

OK, I see.

> > 2. Why do we need CPU_TASKS_FROZEN? Can't we change cpu-hotplug to
> >    always freeze tasks right now, without any additional changes?
>
> In principle, we can, but for this purpose we'd have to modify all
> NOFREEZE tasks.

Why?

> That wouldn't fly, I'm afraid.
>
> > Any subsystem should handle correctly the case when _cpu_down() (say)
> > is called with tasks_frozen == 1 anyway. So, why can't we simplify
> > things and do
> >
> >	_cpu_down(int tasks_frozen)
> >
> >		if (!tasks_frozen)
> >			freeze_processes();
> >		...
> >
> > right now?
>
> But we call _cpu_down() after device_suspend(), so many tasks are already
> frozen at this point. We'd only need to freeze those that are not frozen
> and in _cpu_up() we'd have to thaw them.

Not sure I understand. When we call _cpu_down() after device_suspend(), we
check tasks_frozen == 1, and do not call freeze_processes(). If the task
could be frozen, it is already frozen.

When _cpu_down() sees tasks_frozen = 0, it does freeze_processes() itself,
and thaw_tasks() on return.

IOW, we never send (say) CPU_DEAD, always CPU_DEAD_FROZEN.

Wouldn't fly?

Oleg.
Re: Freezeable workqueues [Was: 2.6.22-rc1: Broken suspend on SMP with tifm]
On Sunday, 20 May 2007 21:54, Oleg Nesterov wrote:
> On 05/15, Rafael J. Wysocki wrote:
> >
> > On Monday, 14 May 2007 23:48, Oleg Nesterov wrote:
> > >
> > > So, in the long term, should we change this only user, or we think we
> > > better fix freezeable wqs again?
> >
> > Long term, I'd like to have freezable workqueues, so that people don't
> > have to use "raw" kernel threads only because they need some
> > synchronization with hibernation/suspend. Plus some cases in which
> > workqueues are used by fs-related code make me worry.
>
> OK, so we should fix them. It would be great to also fix the last known
> problem as well (work->func() vs hotplug callback deadlocks).
>
> I am a bit afraid of too many yes/no options for the freezer, a couple
> of naive questions.
>
> 1. Can't we make all wqs freezable? I still can't see the reason to have
>    both freezable and not freezable wqs.

The reason might be the same as for having freezable and nonfreezable
kernel threads in general. For example, there are some kernel threads that
we need for saving the image and I don't see why there shouldn't be any
such workqueues.

> 2. Why do we need CPU_TASKS_FROZEN? Can't we change cpu-hotplug to always
>    freeze tasks right now, without any additional changes?

In principle, we can, but for this purpose we'd have to modify all NOFREEZE
tasks. That wouldn't fly, I'm afraid.

> Any subsystem should handle correctly the case when _cpu_down() (say)
> is called with tasks_frozen == 1 anyway. So, why can't we simplify
> things and do
>
>	_cpu_down(int tasks_frozen)
>
>		if (!tasks_frozen)
>			freeze_processes();
>		...
>
> right now?

But we call _cpu_down() after device_suspend(), so many tasks are already
frozen at this point. We'd only need to freeze those that are not frozen
and in _cpu_up() we'd have to thaw them.

> > [*] The problem is, though, that freezable workqueues have some
> > potential to fail the freezer. Namely, suppose task A calls
> > flush_workqueue() on a freezable workqueue, finds some work items in
> > there, inserts the barrier and waits for completion
> > (TASK_UNINTERRUPTIBLE). In the meantime, TIF_FREEZE is set on the
> > worker thread, which is then woken up and goes to the refrigerator.
> > Thus if A is not NOFREEZE, the freezing of tasks will fail (A must be
> > a kernel thread for this to happen, but still). Worse yet, if A is
> > NOFREEZE, it will be blocked until the worker thread is woken up.
>
> Yes, this is yet another dependency which freezer can't handle. Probably
> it is better to ignore this problem for now.
>
> > To avoid this, I think, we may need to redesign the freezer, so that
> > freezable worker threads are frozen after all of the other kernel
> > threads.
>
> I doubt we can find a very clean way to do this. Besides, what if
> work->func() does flush_workqueue(another_wq)? How can we decide which
> wq to freeze first?

We can't. I think it would be a mistake to even try to remove all
limitations from the freezer. Any other synchronization mechanisms have
some limitations as well. The code that uses these mechanisms is usually
expected to use them in a sane way and I don't see why we shouldn't expect
the freezer users to do the same. ;-)

> > Additionally, we'd need to make a rule that NOFREEZE kernel threads
> > must not call flush_workqueue() or cancel_work_sync() on freezable
> > workqueues.
>
> cancel_work_sync() is OK, it can be used safely even if workqueue is
> frozen. flush_workqueue() and destroy_workqueue() are not.

Yes, you're right.

Greetings,
Rafael
Re: Freezeable workqueues [Was: 2.6.22-rc1: Broken suspend on SMP with tifm]
On 05/15, Rafael J. Wysocki wrote:
>
> On Monday, 14 May 2007 23:48, Oleg Nesterov wrote:
> >
> > So, in the long term, should we change this only user, or we think we
> > better fix freezeable wqs again?
>
> Long term, I'd like to have freezable workqueues, so that people don't
> have to use "raw" kernel threads only because they need some
> synchronization with hibernation/suspend. Plus some cases in which
> workqueues are used by fs-related code make me worry.

OK, so we should fix them. It would be great to also fix the last known
problem as well (work->func() vs hotplug callback deadlocks).

I am a bit afraid of too many yes/no options for the freezer, a couple of
naive questions.

1. Can't we make all wqs freezable? I still can't see the reason to have
   both freezable and not freezable wqs.

2. Why do we need CPU_TASKS_FROZEN? Can't we change cpu-hotplug to always
   freeze tasks right now, without any additional changes?

   Any subsystem should handle correctly the case when _cpu_down() (say)
   is called with tasks_frozen == 1 anyway. So, why can't we simplify
   things and do

	_cpu_down(int tasks_frozen)

		if (!tasks_frozen)
			freeze_processes();
		...

   right now?

> [*] The problem is, though, that freezable workqueues have some potential
> to fail the freezer. Namely, suppose task A calls flush_workqueue() on a
> freezable workqueue, finds some work items in there, inserts the barrier
> and waits for completion (TASK_UNINTERRUPTIBLE). In the meantime,
> TIF_FREEZE is set on the worker thread, which is then woken up and goes
> to the refrigerator. Thus if A is not NOFREEZE, the freezing of tasks
> will fail (A must be a kernel thread for this to happen, but still).
> Worse yet, if A is NOFREEZE, it will be blocked until the worker thread
> is woken up.

Yes, this is yet another dependency which freezer can't handle. Probably it
is better to ignore this problem for now.

> To avoid this, I think, we may need to redesign the freezer, so that
> freezable worker threads are frozen after all of the other kernel
> threads.

I doubt we can find a very clean way to do this. Besides, what if
work->func() does flush_workqueue(another_wq)? How can we decide which wq
to freeze first?

> Additionally, we'd need to make a rule that NOFREEZE kernel threads must
> not call flush_workqueue() or cancel_work_sync() on freezable workqueues.

cancel_work_sync() is OK, it can be used safely even if workqueue is
frozen. flush_workqueue() and destroy_workqueue() are not.

Oleg.
Re: Freezeable workqueues [Was: 2.6.22-rc1: Broken suspend on SMP with tifm]
On Monday, 14 May 2007 23:48, Oleg Nesterov wrote:
> On 05/14, Rafael J. Wysocki wrote:
> >
> > On Monday, 14 May 2007 18:55, Oleg Nesterov wrote:
> > >
> > > Rafael, I am afraid we are making too much noise, and this may
> > > confuse Alex and Andrew.
> > >
> > > First, we should decide how to fix the bug we have in 2.6.22. I
> > > prefer a simple "make freezeable workqueues singlethread" I sent. It
> > > was acked by Alex, it is simple, and it is also good because tifm
> > > doesn't need multithreaded wq anyway.
> >
> > Yes, I've already agreed with that.
>
> Ah, OK, I misunderstood your message as if you propose this fix for
> 2.6.22.

Never mind. :-)

> > > - Do we need freezeable workqueues ?
> >
> > Well, we have at least one case in which they appear to be useful.
>
> So, in the long term, should we change this only user, or we think we
> better fix freezeable wqs again?

Long term, I'd like to have freezable workqueues, so that people don't have
to use "raw" kernel threads only because they need some synchronization
with hibernation/suspend. Plus some cases in which workqueues are used by
fs-related code make me worry. OTOH, I have some concerns with that (please
see [*] below).

> > > WORK2->func() completes.
> > >
> > > freezer comes. cwq->thread notices TIF_FREEZE and goes to
> > > refrigerator before executing that barrier.
>
> This is not possible. cwq->thread _must_ notice the barrier before it
> goes to refrigerator.
>
> So, given that we have cpu_populated_map we can re-introduce
> take_over_work() along with migrate_sequence and thus we can fix
> freezeable multithreaded wqs.

[*] The problem is, though, that freezable workqueues have some potential
to fail the freezer. Namely, suppose task A calls flush_workqueue() on a
freezable workqueue, finds some work items in there, inserts the barrier
and waits for completion (TASK_UNINTERRUPTIBLE). In the meantime,
TIF_FREEZE is set on the worker thread, which is then woken up and goes to
the refrigerator. Thus if A is not NOFREEZE, the freezing of tasks will
fail (A must be a kernel thread for this to happen, but still). Worse yet,
if A is NOFREEZE, it will be blocked until the worker thread is woken up.

To avoid this, I think, we may need to redesign the freezer, so that
freezable worker threads are frozen after all of the other kernel threads.
Additionally, we'd need to make a rule that NOFREEZE kernel threads must
not call flush_workqueue() or cancel_work_sync() on freezable workqueues.

> > > If we have take_over_work() we should use it for any workqueue,
> > > freezeable or not. Otherwise this is just a mess, imho.
>
> Still, this is imho true. So we'd better do some other changes to be
> consistent.

Agreed.

Greetings,
Rafael
Re: Freezeable workqueues [Was: 2.6.22-rc1: Broken suspend on SMP with tifm]
On Tuesday, 15 May 2007 02:56, Alex Dubov wrote:
> > > > - Do we need freezeable workqueues ?
> > >
> > > Well, we have at least one case in which they appear to be useful.
>
> I need freezeable wq exactly for the fact that they are synchronized with
> suspend/resume. My workitem may do device_register/unregister and it can
> (and will be) scheduled from irq handler during resume. As far as I
> understand, before freezeable wqs, kthreads were the only way to achieve
> this behavior,

That's correct.

> which is less convenient.

Thanks for the explanation.

Greetings,
Rafael
Re: Freezeable workqueues [Was: 2.6.22-rc1: Broken suspend on SMP with tifm]
> > > - Do we need freezeable workqueues ?
> >
> > Well, we have at least one case in which they appear to be useful.

I need freezeable wq exactly for the fact that they are synchronized with
suspend/resume. My workitem may do device_register/unregister and it can
(and will be) scheduled from irq handler during resume. As far as I
understand, before freezeable wqs, kthreads were the only way to achieve
this behavior, which is less convenient.
Re: Freezeable workqueues [Was: 2.6.22-rc1: Broken suspend on SMP with tifm]
On 05/14, Rafael J. Wysocki wrote:
>
> On Monday, 14 May 2007 18:55, Oleg Nesterov wrote:
> >
> > Rafael, I am afraid we are making too much noise, and this may confuse
> > Alex and Andrew.
> >
> > First, we should decide how to fix the bug we have in 2.6.22. I prefer
> > a simple "make freezeable workqueues singlethread" I sent. It was
> > acked by Alex, it is simple, and it is also good because tifm doesn't
> > need multithreaded wq anyway.
>
> Yes, I've already agreed with that.

Ah, OK, I misunderstood your message as if you propose this fix for 2.6.22.

> > - Do we need freezeable workqueues ?
>
> Well, we have at least one case in which they appear to be useful.

So, in the long term, should we change this only user, or we think we
better fix freezeable wqs again?

> > WORK2->func() completes.
> >
> > freezer comes. cwq->thread notices TIF_FREEZE and goes to refrigerator
> > before executing that barrier.

This is not possible. cwq->thread _must_ notice the barrier before it goes
to refrigerator.

So, given that we have cpu_populated_map we can re-introduce
take_over_work() along with migrate_sequence and thus we can fix freezeable
multithreaded wqs.

> > If we have take_over_work() we should use it for any workqueue,
> > freezeable or not. Otherwise this is just a mess, imho.

Still, this is imho true. So we'd better do some other changes to be
consistent.

---
> Yeah, I need to learn more.

No. I should read the patches more carefully before complain.

> Sorry for the confusion I made.

Rafael, it is me who have to apologize.
---

Oleg.
Re: Freezeable workqueues [Was: 2.6.22-rc1: Broken suspend on SMP with tifm]
On Monday, 14 May 2007 18:55, Oleg Nesterov wrote:
> Long. Please read.
>
> On 05/14, Rafael J. Wysocki wrote:
> >
> > I think I have solved this particular problem without any locking:
>
> Rafael, I am afraid we are making too much noise, and this may confuse Alex
> and Andrew.
>
> First, we should decide how to fix the bug we have in 2.6.22. I prefer the
> simple "make freezeable workqueues singlethread" patch I sent. It was acked
> by Alex, it is simple, and it is also good because tifm doesn't need a
> multithreaded wq anyway.

Yes, I've already agreed with that.

> I'll comment on the patch you sent below, but for a start
>
> ---
> Guys, let's have a plan !!!
>
> I do not understand what's going on. With or without the recent changes in
> workqueue.c, multithreaded freezeable workqueues were broken by other
> changes in suspend.
>
> It was decided we don't need them and they should go away. We even had a
> patch which removed freezeable workqueues (multithreaded or not) completely.
> But it conflicted with other changes in -mm, so it was deleted, and then
> forgotten.
>
> I never understood why we need freezeable workqueues. Are they needed to
> synchronize with suspend? In that case, shouldn't we have some form of
> notification which allows the driver to cancel the work which should not
> run at suspend time?

To make a long story short, we see some suspend-related problems that may be attributed to the workqueues used by XFS, but not directly, AFAICS. That's why we tried to introduce the freezability of workqueues, but I think that wasn't a good idea.

> OK, so you think we should re-introduce them.

I just think we *might* introduce them, if there are some users. Obviously we have one user right now, but he only needs a singlethread workqueue, so your small patch is the right thing to do for 2.6.22.

> What about the incoming CPU-hotplug changes? The goal was - again! - to
> remove the "singlethread" parameter, make them all freezeable, and freeze
> all processes to handle CPU_DEAD.
> In that case we don't have any problems, we can re-introduce
> take_over_work() without the migrate_sequence this patch adds.
>
> So.
> 	- Do we need freezeable workqueues ?

Well, we have at least one case in which they appear to be useful.

> 	- Do we need multithreaded freezeable workqueues ?

Not right now, if ever.

> 	- Do we need them for 2.6.22 ?

Certainly not.

> 	- Should we wait for CPU-hotplug changes, or should we
> 	  re-introduce them right now ?

We don't need to reintroduce freezable multithreaded workqueues right now.

> > --- linux-2.6.22-rc1.orig/kernel/workqueue.c
> > +++ linux-2.6.22-rc1/kernel/workqueue.c
> > @@ -62,6 +62,9 @@ struct workqueue_struct {
> > 	const char *name;
> > 	int singlethread;
> > 	int freezeable;		/* Freeze threads during suspend */
> > +	atomic_t work_sw;	/* Used to indicate that some work has been
> > +				 * moved from one CPU to another
> > +				 */
> > };
>
> "work_sw" should not be atomic, and since the race is _very_ unlikely it
> could be global. We had such a thing, it was called "migrate_sequence" and
> it was removed.
>
> It didn't work, but it _can_ work now because we have cpu_populated_map.
> However, this needs more thinking, because it breaks cancel_work_sync()
> in a very subtle way.
> > +static void take_over_work(struct workqueue_struct *wq, unsigned int cpu)
> > +{
> > +	struct cpu_workqueue_struct *cwq = per_cpu_ptr(wq->cpu_wq, cpu);
> > +	struct list_head list;
> > +	struct work_struct *work;
> > +
> > +	spin_lock_irq(&cwq->lock);
> > +	list_replace_init(&cwq->worklist, &list);
> > +
> > +	if (!list_empty(&list)) {
> > +		/*
> > +		 * Tell flush_workqueue() that it should repeat the loop over
> > +		 * CPUs
> > +		 */
> > +		atomic_inc(&wq->work_sw);
> > +		while (!list_empty(&list)) {
> > +			printk("Taking work for %s\n", wq->name);
> > +			work = list_entry(list.next, struct work_struct, entry);
> > +			list_del(&work->entry);
> > +			__queue_work(per_cpu_ptr(wq->cpu_wq,
> > +					smp_processor_id()), work);
> > +		}
> > +	}
> > +	spin_unlock_irq(&cwq->lock);
> > +}
> > +
>
> Suppose that we have some pending WORK1 on CPU 0 which needs some LOCK.
>
> We take this LOCK and call cancel_work_sync(WORK2) which is running on CPU 1.
> cancel_work_sync() inserts a barrier after WORK2 and waits for completion.
>
> WORK2->func() completes.
>
> freezer comes. cwq->thread notices TIF_FREEZE and goes to refrigerator
> before executing that barrier.
>
> CPU_DEAD_FROZEN(cpu == 1) moves that barrier to CPU 0.
>
> thaw_processes().
>
> DEADLOCK.
>
> We hold the LOCK and sleep waiting for the completion of that barrier. But
> there is WORK1 on the queue which runs first, and needs this LOCK to
> complete.

Yeah, I need to learn more.
Freezeable workqueues [Was: 2.6.22-rc1: Broken suspend on SMP with tifm]
Long. Please read.

On 05/14, Rafael J. Wysocki wrote:
>
> I think I have solved this particular problem without any locking:

Rafael, I am afraid we are making too much noise, and this may confuse Alex and Andrew.

First, we should decide how to fix the bug we have in 2.6.22. I prefer the simple "make freezeable workqueues singlethread" patch I sent. It was acked by Alex, it is simple, and it is also good because tifm doesn't need a multithreaded wq anyway.

I'll comment on the patch you sent below, but for a start

---
Guys, let's have a plan !!!

I do not understand what's going on. With or without the recent changes in workqueue.c, multithreaded freezeable workqueues were broken by other changes in suspend.

It was decided we don't need them and they should go away. We even had a patch which removed freezeable workqueues (multithreaded or not) completely. But it conflicted with other changes in -mm, so it was deleted, and then forgotten.

I never understood why we need freezeable workqueues. Are they needed to synchronize with suspend? In that case, shouldn't we have some form of notification which allows the driver to cancel the work which should not run at suspend time?

OK, so you think we should re-introduce them.

What about the incoming CPU-hotplug changes? The goal was - again! - to remove the "singlethread" parameter, make them all freezeable, and freeze all processes to handle CPU_DEAD. In that case we don't have any problems; we can re-introduce take_over_work() without the migrate_sequence this patch adds.

So.
	- Do we need freezeable workqueues ?
	- Do we need multithreaded freezeable workqueues ?
	- Do we need them for 2.6.22 ?
	- Should we wait for the CPU-hotplug changes, or should we
	  re-introduce them right now ?
> --- linux-2.6.22-rc1.orig/kernel/workqueue.c
> +++ linux-2.6.22-rc1/kernel/workqueue.c
> @@ -62,6 +62,9 @@ struct workqueue_struct {
> 	const char *name;
> 	int singlethread;
> 	int freezeable;		/* Freeze threads during suspend */
> +	atomic_t work_sw;	/* Used to indicate that some work has been
> +				 * moved from one CPU to another
> +				 */
> };

"work_sw" should not be atomic, and since the race is _very_ unlikely it could be global. We had such a thing, it was called "migrate_sequence" and it was removed.

It didn't work, but it _can_ work now because we have cpu_populated_map. However, this needs more thinking, because it breaks cancel_work_sync() in a very subtle way.

> +static void take_over_work(struct workqueue_struct *wq, unsigned int cpu)
> +{
> +	struct cpu_workqueue_struct *cwq = per_cpu_ptr(wq->cpu_wq, cpu);
> +	struct list_head list;
> +	struct work_struct *work;
> +
> +	spin_lock_irq(&cwq->lock);
> +	list_replace_init(&cwq->worklist, &list);
> +
> +	if (!list_empty(&list)) {
> +		/*
> +		 * Tell flush_workqueue() that it should repeat the loop over
> +		 * CPUs
> +		 */
> +		atomic_inc(&wq->work_sw);
> +		while (!list_empty(&list)) {
> +			printk("Taking work for %s\n", wq->name);
> +			work = list_entry(list.next, struct work_struct, entry);
> +			list_del(&work->entry);
> +			__queue_work(per_cpu_ptr(wq->cpu_wq,
> +					smp_processor_id()), work);
> +		}
> +	}
> +	spin_unlock_irq(&cwq->lock);
> +}
> +

Suppose that we have some pending WORK1 on CPU 0 which needs some LOCK.

We take this LOCK and call cancel_work_sync(WORK2) which is running on CPU 1. cancel_work_sync() inserts a barrier after WORK2 and waits for completion.

WORK2->func() completes.

freezer comes. cwq->thread notices TIF_FREEZE and goes to refrigerator before executing that barrier.

CPU_DEAD_FROZEN(cpu == 1) moves that barrier to CPU 0.

thaw_processes().

DEADLOCK.

We hold the LOCK and sleep waiting for the completion of that barrier. But there is WORK1 on the queue which runs first, and needs this LOCK to complete.
> @@ -819,20 +860,31 @@ static int __devinit workqueue_cpu_callb
>
> +	case CPU_DEAD_FROZEN:
> +		if (wq->freezeable) {
> +			take_over_work(wq, cpu);
> +			thaw_process(cwq->thread);
> +		}
> +		cleanup_workqueue_thread(cwq, cpu);
> +		break;

If we have take_over_work() we should use it for any workqueue, freezeable or not. Otherwise this is just a mess, imho.

Rafael, this is tricky. Probably we can fix this, but it needs more changes. I can _try_ to do this, but not now (unless we think we need freezeable workqueues for 2.6.22). I have other cleanups for workqueues, but I'd prefer to do nothing except bugfixes right now. A lot of non-reviewed intrusive changes were merged.
Re: 2.6.22-rc1: Broken suspend on SMP with tifm
On Sunday, 13 May 2007 23:34, Oleg Nesterov wrote:
> On 05/13, Rafael J. Wysocki wrote:

[--snip--]

> > @@ -819,20 +843,31 @@ static int __devinit workqueue_cpu_callb
> >
> > +
> > +	case CPU_DEAD_FROZEN:
> > +		if (wq->freezeable) {
> > +			take_over_work(wq, cpu);
> > +			thaw_process(cwq->thread);
>
> Suppose that PF_NOFREEZE task T does flush_workqueue(), and CPU 1 has
> pending works. T does flush_cpu_workqueue(0), CPU_DEAD_FROZEN moves works
> from CPU 1 to CPU 0, T does flush_cpu_workqueue(1) and finds nothing.

I think I have solved this particular problem without any locking:

Define an atomic field in workqueue_struct (let's call it 'switch'), initially equal to 0. Whenever work is taken from an offlined CPU, take_over_work() increases 'switch' by 1. In turn, flush_workqueue() reads 'switch' before looping over the CPUs and saves its value in a local variable. On exit, it compares the current value of 'switch' with the saved one and, if they differ, it repeats the loop over the CPUs.

Updated patch follows.
Rafael

---
 kernel/workqueue.c |   58 ++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 55 insertions(+), 3 deletions(-)

Index: linux-2.6.22-rc1/kernel/workqueue.c
===================================================================
--- linux-2.6.22-rc1.orig/kernel/workqueue.c
+++ linux-2.6.22-rc1/kernel/workqueue.c
@@ -62,6 +62,9 @@ struct workqueue_struct {
 	const char *name;
 	int singlethread;
 	int freezeable;		/* Freeze threads during suspend */
+	atomic_t work_sw;	/* Used to indicate that some work has been
+				 * moved from one CPU to another
+				 */
 };
 
 /* All the per-cpu workqueues on the system, for hotplug cpu to add/remove
@@ -381,10 +384,15 @@ void fastcall flush_workqueue(struct wor
 {
 	const cpumask_t *cpu_map = wq_cpu_map(wq);
 	int cpu;
+	int val;
 
 	might_sleep();
+ start:
+	val = atomic_read(&wq->work_sw);
 	for_each_cpu_mask(cpu, *cpu_map)
 		flush_cpu_workqueue(per_cpu_ptr(wq->cpu_wq, cpu));
+	if (unlikely(val != atomic_read(&wq->work_sw)))
+		goto start;
 }
 EXPORT_SYMBOL_GPL(flush_workqueue);
@@ -710,6 +718,7 @@ struct workqueue_struct *__create_workqu
 	wq->name = name;
 	wq->singlethread = singlethread;
 	wq->freezeable = freezeable;
+	atomic_set(&wq->work_sw, 0);
 	INIT_LIST_HEAD(&wq->list);
 
 	if (singlethread) {
@@ -791,6 +800,40 @@ void destroy_workqueue(struct workqueue_
 }
 EXPORT_SYMBOL_GPL(destroy_workqueue);
 
+/**
+ * take_over_work - if the workqueue is freezable and CPUs are being taken down
+ * due to a hibernation/suspend, we need to take the work out of their worker
+ * threads, because they might need to use some devices to do the work and
+ * the devices are suspended at this point.
+ * @wq: target workqueue
+ * @cpu: CPU being offlined
+ */
+static void take_over_work(struct workqueue_struct *wq, unsigned int cpu)
+{
+	struct cpu_workqueue_struct *cwq = per_cpu_ptr(wq->cpu_wq, cpu);
+	struct list_head list;
+	struct work_struct *work;
+
+	spin_lock_irq(&cwq->lock);
+	list_replace_init(&cwq->worklist, &list);
+
+	if (!list_empty(&list)) {
+		/*
+		 * Tell flush_workqueue() that it should repeat the loop over
+		 * CPUs
+		 */
+		atomic_inc(&wq->work_sw);
+		while (!list_empty(&list)) {
+			printk("Taking work for %s\n", wq->name);
+			work = list_entry(list.next, struct work_struct, entry);
+			list_del(&work->entry);
+			__queue_work(per_cpu_ptr(wq->cpu_wq,
+					smp_processor_id()), work);
+		}
+	}
+	spin_unlock_irq(&cwq->lock);
+}
+
 static int __devinit workqueue_cpu_callback(struct notifier_block *nfb,
 						unsigned long action,
 						void *hcpu)
@@ -799,9 +842,7 @@ static int __devinit workqueue_cpu_callb
 	struct cpu_workqueue_struct *cwq;
 	struct workqueue_struct *wq;
 
-	action &= ~CPU_TASKS_FROZEN;
-
-	switch (action) {
+	switch (action & ~CPU_TASKS_FROZEN) {
 	case CPU_LOCK_ACQUIRE:
 		mutex_lock(&workqueue_mutex);
 		return NOTIFY_OK;
@@ -819,20 +860,31 @@ static int __devinit workqueue_cpu_callb
 	switch (action) {
 	case CPU_UP_PREPARE:
+	case CPU_UP_PREPARE_FROZEN:
 		if (!create_workqueue_thread(cwq, cpu))
 			break;
 		printk(KERN_ERR "workqueue for %i failed\n", cpu);
 		return NOTIFY_BAD;
 
 	case CPU_ONLINE:
+	case CPU_ONLINE_FROZEN:
Re: 2.6.22-rc1: Broken suspend on SMP with tifm
I don't have any particular need for a multithreaded wq, but I do need it freezeable. Freezeability simplifies things quite a lot. This is ok with me:

> -#define create_freezeable_workqueue(name) __create_workqueue((name), 0, 1)
> +#define create_freezeable_workqueue(name) __create_workqueue((name), 1, 1)
Re: 2.6.22-rc1: Broken suspend on SMP with tifm
On 05/14, Rafael J. Wysocki wrote:
>
> Hmm, I guess we could add an additional mutex that would only be taken in
> flush_workqueue() and in _cpu_down()/_cpu_up() via workqueue_cpu_callback()
> with CPU_LOCK_ACQUIRE?

This will deadlock if work->func() does flush_workqueue(), because it may run while _cpu_down() holds this lock (note that it doesn't help if we re-introduce take_over_work()). This is a reason why mutex_lock(&workqueue_mutex) was removed from flush_workqueue().

> It doesn't seem to be a good idea to run flush_workqueue() while CPUs are
> being taken up and down anyway.

We can freeze all tasks :) Otherwise we can't forbid them to call flush_workqueue(). flush_workqueue() is OK. create/destroy is a problem.

Oleg.
Re: 2.6.22-rc1: Broken suspend on SMP with tifm
On Sunday, 13 May 2007 23:54, Oleg Nesterov wrote:
> On 05/13, Rafael J. Wysocki wrote:
> >
> > On Sunday, 13 May 2007 23:34, Oleg Nesterov wrote:
> > > On 05/13, Rafael J. Wysocki wrote:
> > > >
> > > > @@ -819,20 +843,31 @@ static int __devinit workqueue_cpu_callb
> > > >
> > > > +
> > > > +	case CPU_DEAD_FROZEN:
> > > > +		if (wq->freezeable) {
> > > > +			take_over_work(wq, cpu);
> > > > +			thaw_process(cwq->thread);
> > >
> > > Suppose that PF_NOFREEZE task T does flush_workqueue(), and CPU 1 has
> > > pending works. T does flush_cpu_workqueue(0), CPU_DEAD_FROZEN moves
> > > works from CPU 1 to CPU 0, T does flush_cpu_workqueue(1) and finds
> > > nothing.
> >
> > I don't think this is possible, because we've acquired workqueue_mutex in
> > _cpu_down().
>
> Yes, we did... but flush_workqueue() doesn't take it?

I was looking at the 2.6.21 code, sorry.

Hmm, I guess we could add an additional mutex that would only be taken in flush_workqueue() and in _cpu_down()/_cpu_up() via workqueue_cpu_callback() with CPU_LOCK_ACQUIRE?

It doesn't seem to be a good idea to run flush_workqueue() while CPUs are being taken up and down anyway.

Greetings,
Rafael
Re: 2.6.22-rc1: Broken suspend on SMP with tifm
On 05/13, Rafael J. Wysocki wrote:
>
> On Sunday, 13 May 2007 23:34, Oleg Nesterov wrote:
> > On 05/13, Rafael J. Wysocki wrote:
> > >
> > > @@ -819,20 +843,31 @@ static int __devinit workqueue_cpu_callb
> > >
> > > +
> > > +	case CPU_DEAD_FROZEN:
> > > +		if (wq->freezeable) {
> > > +			take_over_work(wq, cpu);
> > > +			thaw_process(cwq->thread);
> >
> > Suppose that PF_NOFREEZE task T does flush_workqueue(), and CPU 1 has
> > pending works. T does flush_cpu_workqueue(0), CPU_DEAD_FROZEN moves works
> > from CPU 1 to CPU 0, T does flush_cpu_workqueue(1) and finds nothing.
>
> I don't think this is possible, because we've acquired workqueue_mutex in
> _cpu_down().

Yes, we did... but flush_workqueue() doesn't take it?

Oleg.
Re: 2.6.22-rc1: Broken suspend on SMP with tifm
On Sunday, 13 May 2007 23:34, Oleg Nesterov wrote:
> On 05/13, Rafael J. Wysocki wrote:
> >
> > On Sunday, 13 May 2007 22:30, Oleg Nesterov wrote:
> > >
> > > > > I think the better fix (at least for now) is
> > > > >
> > > > > - #define create_freezeable_workqueue(name) __create_workqueue((name), 0, 1)
> > > > > + #define create_freezeable_workqueue(name) __create_workqueue((name), 1, 1)
> > > > >
> > > > > Alex, do you really need a multithreaded wq?
> > > > >
> > > > > Rafael, what do you think?
> >
> > Sure, if a singlethread workqueue is sufficient for Alex, I agree that
> > this would be preferable.
>
> Great. Alex?
>
> > @@ -819,20 +843,31 @@ static int __devinit workqueue_cpu_callb
> >
> > +
> > +	case CPU_DEAD_FROZEN:
> > +		if (wq->freezeable) {
> > +			take_over_work(wq, cpu);
> > +			thaw_process(cwq->thread);
>
> Suppose that PF_NOFREEZE task T does flush_workqueue(), and CPU 1 has
> pending works. T does flush_cpu_workqueue(0), CPU_DEAD_FROZEN moves works
> from CPU 1 to CPU 0, T does flush_cpu_workqueue(1) and finds nothing.

I don't think this is possible, because we've acquired workqueue_mutex in _cpu_down().

Greetings,
Rafael
Re: 2.6.22-rc1: Broken suspend on SMP with tifm
On 05/13, Rafael J. Wysocki wrote:
>
> On Sunday, 13 May 2007 22:30, Oleg Nesterov wrote:
> >
> > > > I think the better fix (at least for now) is
> > > >
> > > > - #define create_freezeable_workqueue(name) __create_workqueue((name), 0, 1)
> > > > + #define create_freezeable_workqueue(name) __create_workqueue((name), 1, 1)
> > > >
> > > > Alex, do you really need a multithreaded wq?
> > > >
> > > > Rafael, what do you think?
>
> Sure, if a singlethread workqueue is sufficient for Alex, I agree that this
> would be preferable.

Great. Alex?

> @@ -819,20 +843,31 @@ static int __devinit workqueue_cpu_callb
>
> +
> +	case CPU_DEAD_FROZEN:
> +		if (wq->freezeable) {
> +			take_over_work(wq, cpu);
> +			thaw_process(cwq->thread);

Suppose that PF_NOFREEZE task T does flush_workqueue(), and CPU 1 has pending works. T does flush_cpu_workqueue(0), CPU_DEAD_FROZEN moves works from CPU 1 to CPU 0, T does flush_cpu_workqueue(1) and finds nothing.

Oleg.
Re: 2.6.22-rc1: Broken suspend on SMP with tifm
On Sunday, 13 May 2007 22:50, Oleg Nesterov wrote:
> On 05/13, Rafael J. Wysocki wrote:
> >
> > On Sunday, 13 May 2007 22:30, Oleg Nesterov wrote:
> > >
> > > > > --- linux-2.6.22-rc1.orig/kernel/workqueue.c
> > > > > +++ linux-2.6.22-rc1/kernel/workqueue.c
> > > > > @@ -799,9 +799,7 @@ static int __devinit workqueue_cpu_callb
> > > > > 	struct cpu_workqueue_struct *cwq;
> > > > > 	struct workqueue_struct *wq;
> > > > >
> > > > > -	action &= ~CPU_TASKS_FROZEN;
> > > > > -
> > > > > -	switch (action) {
> > > > > +	switch (action & ~CPU_TASKS_FROZEN) {
> > > >
> > > > Confused. How can we see, say, CPU_UP_PREPARE_FROZEN, if we cleared
> > > > the CPU_TASKS_FROZEN bit?
> > >
> > > So, unless I missed something stupid, this patch is not 100% right.
> >
> > Well, it isn't, but for a different reason (see [*] below).
>
> Yes, I missed something stupid :)
>
> > > I think the better fix (at least for now) is
> > >
> > > - #define create_freezeable_workqueue(name) __create_workqueue((name), 0, 1)
> > > + #define create_freezeable_workqueue(name) __create_workqueue((name), 1, 1)
> > >
> > > Alex, do you really need a multithreaded wq?
> > >
> > > Rafael, what do you think?
> >
> > That would be misleading if the driver needs the threads to be frozen.
>
> Hm? The thread will be frozen; the "patch" above changes "singlethread", not
> "freezeable".

Ah, I'm sorry.

> > [*] Getting back to the patch, it seems to me that we should do something
> > like take_over_work() before thawing the frozen thread, because there may
> > be a queue to process and the device is suspended at that point.
>
> Yes, exactly because the driver wants this wq to be frozen.
>
> So, could you take a second look at the "patch" above ?

Sure, if a singlethread workqueue is sufficient for Alex, I agree that this would be preferable. Anyway, I've added take_over_work() to the patch (appended), just in case.
;-)

Rafael

---
Prevent freezable workqueues from deadlocking with CPU hotplug during a
suspend/hibernation by thawing their worker threads before they get stopped.

Signed-off-by: Rafael J. Wysocki <[EMAIL PROTECTED]>
---
 kernel/workqueue.c | 41 ++---
 1 file changed, 38 insertions(+), 3 deletions(-)

Index: linux-2.6.22-rc1/kernel/workqueue.c
===
--- linux-2.6.22-rc1.orig/kernel/workqueue.c
+++ linux-2.6.22-rc1/kernel/workqueue.c
@@ -791,6 +791,32 @@ void destroy_workqueue(struct workqueue_
 }
 EXPORT_SYMBOL_GPL(destroy_workqueue);

+/**
+ * take_over_work - if the workqueue is freezable and CPUs are being taken down
+ * due to a hibernation/suspend, we need to take the work out of their worker
+ * threads, because they might need to use some devices to do the work and
+ * the devices are suspended at this point.
+ * @wq: target workqueue
+ * @cpu: CPU being offlined
+ */
+static void take_over_work(struct workqueue_struct *wq, unsigned int cpu)
+{
+	struct cpu_workqueue_struct *cwq = per_cpu_ptr(wq->cpu_wq, cpu);
+	struct list_head list;
+	struct work_struct *work;
+
+	spin_lock_irq(&cwq->lock);
+	list_replace_init(&cwq->worklist, &list);
+
+	while (!list_empty(&list)) {
+		printk("Taking work for %s\n", wq->name);
+		work = list_entry(list.next, struct work_struct, entry);
+		list_del(&work->entry);
+		__queue_work(per_cpu_ptr(wq->cpu_wq, smp_processor_id()), work);
+	}
+	spin_unlock_irq(&cwq->lock);
+}
+
 static int __devinit workqueue_cpu_callback(struct notifier_block *nfb,
 					    unsigned long action,
 					    void *hcpu)
@@ -799,9 +825,7 @@ static int __devinit workqueue_cpu_callb
 	struct cpu_workqueue_struct *cwq;
 	struct workqueue_struct *wq;

-	action &= ~CPU_TASKS_FROZEN;
-
-	switch (action) {
+	switch (action & ~CPU_TASKS_FROZEN) {
 	case CPU_LOCK_ACQUIRE:
 		mutex_lock(&workqueue_mutex);
 		return NOTIFY_OK;
@@ -819,20 +843,31 @@ static int __devinit workqueue_cpu_callb

 	switch (action) {
 	case CPU_UP_PREPARE:
+	case CPU_UP_PREPARE_FROZEN:
 		if (!create_workqueue_thread(cwq, cpu))
 			break;
 		printk(KERN_ERR "workqueue for %i failed\n", cpu);
 		return NOTIFY_BAD;

 	case CPU_ONLINE:
+	case CPU_ONLINE_FROZEN:
 		start_workqueue_thread(cwq, cpu);
 		break;

 	case CPU_UP_CANCELED:
+	case CPU_UP_CANCELED_FROZEN:
 		start_workqueue_thread(cwq, -1);
 	case CPU_DEAD:
 		cleanup_workqueue_thread(cwq, cpu);
 		break;
+
+	case CPU_DEAD_FROZEN:
+		if (wq->freezeable) {
+			take_over_work(wq, cpu);
+			thaw_process(cwq->thread);
Re: 2.6.22-rc1: Broken suspend on SMP with tifm
On 05/13, Rafael J. Wysocki wrote:
>
> On Sunday, 13 May 2007 22:30, Oleg Nesterov wrote:
> >
> > > > --- linux-2.6.22-rc1.orig/kernel/workqueue.c
> > > > +++ linux-2.6.22-rc1/kernel/workqueue.c
> > > > @@ -799,9 +799,7 @@ static int __devinit workqueue_cpu_callb
> > > > 	struct cpu_workqueue_struct *cwq;
> > > > 	struct workqueue_struct *wq;
> > > >
> > > > -	action &= ~CPU_TASKS_FROZEN;
> > > > -
> > > > -	switch (action) {
> > > > +	switch (action & ~CPU_TASKS_FROZEN) {
> > >
> > > Confused. How can we see, say CPU_UP_PREPARE_FROZEN, if we cleared
> > > CPU_TASKS_FROZEN bit?
> >
> > So, unless I missed something stupid, this patch is not 100% right.
>
> Well, it isn't, but for a different reason (see [*] below).

Yes, I missed something stupid :)

> > I think the better fix (at least for now) is
> >
> > - #define create_freezeable_workqueue(name)	__create_workqueue((name), 0, 1)
> > + #define create_freezeable_workqueue(name)	__create_workqueue((name), 1, 1)
> >
> > Alex, do you really need a multithreaded wq?
> >
> > Rafael, what do you think?
>
> That would be misleading if the driver needs the threads to be frozen.

Hm? The thread will be frozen, the "patch" above changes "singlethread", not
"freezeable".

> [*] Getting back to the patch, it seems to me that we should do something like
> take_over_work() before thawing the frozen thread, because there may be a
> queue to process and the device is suspended at that point.

Yes, exactly because the driver wants this wq to be frozen.

So, could you take a second look at the "patch" above ?

Oleg.
Re: 2.6.22-rc1: Broken suspend on SMP with tifm
On Sunday, 13 May 2007 22:30, Oleg Nesterov wrote:
> On 05/14, Oleg Nesterov wrote:
> >
> > On 05/13, Rafael J. Wysocki wrote:
> > >
> > > The suspend/hibernation is broken on SMP due to:
> > >
> > > commit 3540af8ffddcdbc7573451ac0b5cd57a2eaf8af5
> > > tifm: replace per-adapter kthread with freezeable workqueue
> > >
> > > Well, it looks like freezable workqueues still deadlock with CPU hotplug
> > > when worker threads are frozen.
> >
> > Ugh. I thought we deprecated create_freezeable_workqueue(), exactly
> > because suspend was changed to call _cpu_down() after freeze().
> >
> > It is not that "looks like freezable workqueues still deadlock", it
> > is "of course, freezable workqueues deadlock" on CPU_DEAD.
> >
> > The ->freezeable is still here just because of incoming "cpu-hotplug
> > using freezer" rework.
> >
> > No?
> >
> > > --- linux-2.6.22-rc1.orig/kernel/workqueue.c
> > > +++ linux-2.6.22-rc1/kernel/workqueue.c
> > > @@ -799,9 +799,7 @@ static int __devinit workqueue_cpu_callb
> > > 	struct cpu_workqueue_struct *cwq;
> > > 	struct workqueue_struct *wq;
> > >
> > > -	action &= ~CPU_TASKS_FROZEN;
> > > -
> > > -	switch (action) {
> > > +	switch (action & ~CPU_TASKS_FROZEN) {
> >
> > Confused. How can we see, say CPU_UP_PREPARE_FROZEN, if we cleared
> > CPU_TASKS_FROZEN bit?
>
> So, unless I missed something stupid, this patch is not 100% right.

Well, it isn't, but for a different reason (see [*] below).

> I think the better fix (at least for now) is
>
> - #define create_freezeable_workqueue(name)	__create_workqueue((name), 0, 1)
> + #define create_freezeable_workqueue(name)	__create_workqueue((name), 1, 1)
>
> Alex, do you really need a multithreaded wq?
>
> Rafael, what do you think?

That would be misleading if the driver needs the threads to be frozen.

I would prefer to revert the commit that caused the problem to appear, but it
doesn't revert cleanly and I hate to invalidate someone else's work because of
my own mistakes.

[*] Getting back to the patch, it seems to me that we should do something like
take_over_work() before thawing the frozen thread, because there may be a
queue to process and the device is suspended at that point.

Greetings,
Rafael
Re: 2.6.22-rc1: Broken suspend on SMP with tifm
On 05/14, Oleg Nesterov wrote:
>
> On 05/13, Rafael J. Wysocki wrote:
> >
> > The suspend/hibernation is broken on SMP due to:
> >
> > commit 3540af8ffddcdbc7573451ac0b5cd57a2eaf8af5
> > tifm: replace per-adapter kthread with freezeable workqueue
> >
> > Well, it looks like freezable workqueues still deadlock with CPU hotplug
> > when worker threads are frozen.
>
> Ugh. I thought we deprecated create_freezeable_workqueue(), exactly
> because suspend was changed to call _cpu_down() after freeze().
>
> It is not that "looks like freezable workqueues still deadlock", it
> is "of course, freezable workqueues deadlock" on CPU_DEAD.
>
> The ->freezeable is still here just because of incoming "cpu-hotplug
> using freezer" rework.
>
> No?
>
> > --- linux-2.6.22-rc1.orig/kernel/workqueue.c
> > +++ linux-2.6.22-rc1/kernel/workqueue.c
> > @@ -799,9 +799,7 @@ static int __devinit workqueue_cpu_callb
> > 	struct cpu_workqueue_struct *cwq;
> > 	struct workqueue_struct *wq;
> >
> > -	action &= ~CPU_TASKS_FROZEN;
> > -
> > -	switch (action) {
> > +	switch (action & ~CPU_TASKS_FROZEN) {
>
> Confused. How can we see, say CPU_UP_PREPARE_FROZEN, if we cleared
> CPU_TASKS_FROZEN bit?

So, unless I missed something stupid, this patch is not 100% right.

I think the better fix (at least for now) is

- #define create_freezeable_workqueue(name)	__create_workqueue((name), 0, 1)
+ #define create_freezeable_workqueue(name)	__create_workqueue((name), 1, 1)

Alex, do you really need a multithreaded wq?

Rafael, what do you think?

Oleg.
Re: 2.6.22-rc1: Broken suspend on SMP with tifm
On Sunday, 13 May 2007 22:08, Oleg Nesterov wrote:
> On 05/13, Rafael J. Wysocki wrote:
> >
> > The suspend/hibernation is broken on SMP due to:
> >
> > commit 3540af8ffddcdbc7573451ac0b5cd57a2eaf8af5
> > tifm: replace per-adapter kthread with freezeable workqueue
> >
> > Well, it looks like freezable workqueues still deadlock with CPU hotplug
> > when worker threads are frozen.
>
> Ugh. I thought we deprecated create_freezeable_workqueue(), exactly
> because suspend was changed to call _cpu_down() after freeze().

Well, apparently no one has told it to Alex ...

> It is not that "looks like freezable workqueues still deadlock", it
> is "of course, freezable workqueues deadlock" on CPU_DEAD.
>
> The ->freezeable is still here just because of incoming "cpu-hotplug
> using freezer" rework.
>
> No?

Yes, but we failed to communicate that to the others clearly enough.

> > --- linux-2.6.22-rc1.orig/kernel/workqueue.c
> > +++ linux-2.6.22-rc1/kernel/workqueue.c
> > @@ -799,9 +799,7 @@ static int __devinit workqueue_cpu_callb
> > 	struct cpu_workqueue_struct *cwq;
> > 	struct workqueue_struct *wq;
> >
> > -	action &= ~CPU_TASKS_FROZEN;
> > -
> > -	switch (action) {
> > +	switch (action & ~CPU_TASKS_FROZEN) {
>
> Confused. How can we see, say CPU_UP_PREPARE_FROZEN, if we cleared
> CPU_TASKS_FROZEN bit?

There's another 'switch ()' in there where the flag is not cleared (that's
why I removed the 'action &= ~CPU_TASKS_FROZEN' above).

> > 	case CPU_LOCK_ACQUIRE:
> > 		mutex_lock(&workqueue_mutex);
> > 		return NOTIFY_OK;
> > @@ -819,20 +817,29 @@ static int __devinit workqueue_cpu_callb
> >
> > 	switch (action) {
> > 	case CPU_UP_PREPARE:
> > +	case CPU_UP_PREPARE_FROZEN:
> > 		if (!create_workqueue_thread(cwq, cpu))
> > 			break;
> > 		printk(KERN_ERR "workqueue for %i failed\n", cpu);
> > 		return NOTIFY_BAD;
> >
> > 	case CPU_ONLINE:
> > +	case CPU_ONLINE_FROZEN:
> > 		start_workqueue_thread(cwq, cpu);
> > 		break;
> >
> > 	case CPU_UP_CANCELED:
> > +	case CPU_UP_CANCELED_FROZEN:
> > 		start_workqueue_thread(cwq, -1);
> > 	case CPU_DEAD:
> > 		cleanup_workqueue_thread(cwq, cpu);
> > 		break;
> > +
> > +	case CPU_DEAD_FROZEN:
> > +		if (wq->freezeable)
> > +			thaw_process(cwq->thread);
> > +		cleanup_workqueue_thread(cwq, cpu);
> > +		break;
> > 	}
> > }
>
> Minor, but can't we do
>
> 	...
> 	case CPU_UP_CANCELED:
> 	case CPU_UP_CANCELED_FROZEN:
> 		start_workqueue_thread(cwq, -1);
> 	case CPU_DEAD_FROZEN:
> 		if (wq->freezeable)
> 			// we can't see PF_FROZEN if it was CPU_UP_CANCELED
> 			thaw_process(cwq->thread);
> 	case CPU_DEAD:
> 		cleanup_workqueue_thread(cwq, cpu);
> 		break;
>
> ?

Yes, we can, but that means one redundant check more in the CPU_UP_CANCELED
path. Besides, I prefer having different cases clearly separated if that
makes sense.

Greetings,
Rafael
Re: 2.6.22-rc1: Broken suspend on SMP with tifm
On 05/13, Rafael J. Wysocki wrote:
>
> The suspend/hibernation is broken on SMP due to:
>
> commit 3540af8ffddcdbc7573451ac0b5cd57a2eaf8af5
> tifm: replace per-adapter kthread with freezeable workqueue
>
> Well, it looks like freezable workqueues still deadlock with CPU hotplug
> when worker threads are frozen.

Ugh. I thought we deprecated create_freezeable_workqueue(), exactly
because suspend was changed to call _cpu_down() after freeze().

It is not that "looks like freezable workqueues still deadlock", it
is "of course, freezable workqueues deadlock" on CPU_DEAD.

The ->freezeable is still here just because of incoming "cpu-hotplug
using freezer" rework.

No?

> --- linux-2.6.22-rc1.orig/kernel/workqueue.c
> +++ linux-2.6.22-rc1/kernel/workqueue.c
> @@ -799,9 +799,7 @@ static int __devinit workqueue_cpu_callb
> 	struct cpu_workqueue_struct *cwq;
> 	struct workqueue_struct *wq;
>
> -	action &= ~CPU_TASKS_FROZEN;
> -
> -	switch (action) {
> +	switch (action & ~CPU_TASKS_FROZEN) {

Confused. How can we see, say CPU_UP_PREPARE_FROZEN, if we cleared
CPU_TASKS_FROZEN bit?

> 	case CPU_LOCK_ACQUIRE:
> 		mutex_lock(&workqueue_mutex);
> 		return NOTIFY_OK;
> @@ -819,20 +817,29 @@ static int __devinit workqueue_cpu_callb
>
> 	switch (action) {
> 	case CPU_UP_PREPARE:
> +	case CPU_UP_PREPARE_FROZEN:
> 		if (!create_workqueue_thread(cwq, cpu))
> 			break;
> 		printk(KERN_ERR "workqueue for %i failed\n", cpu);
> 		return NOTIFY_BAD;
>
> 	case CPU_ONLINE:
> +	case CPU_ONLINE_FROZEN:
> 		start_workqueue_thread(cwq, cpu);
> 		break;
>
> 	case CPU_UP_CANCELED:
> +	case CPU_UP_CANCELED_FROZEN:
> 		start_workqueue_thread(cwq, -1);
> 	case CPU_DEAD:
> 		cleanup_workqueue_thread(cwq, cpu);
> 		break;
> +
> +	case CPU_DEAD_FROZEN:
> +		if (wq->freezeable)
> +			thaw_process(cwq->thread);
> +		cleanup_workqueue_thread(cwq, cpu);
> +		break;
> 	}
> }

Minor, but can't we do

	...
	case CPU_UP_CANCELED:
	case CPU_UP_CANCELED_FROZEN:
		start_workqueue_thread(cwq, -1);
	case CPU_DEAD_FROZEN:
		if (wq->freezeable)
			// we can't see PF_FROZEN if it was CPU_UP_CANCELED
			thaw_process(cwq->thread);
	case CPU_DEAD:
		cleanup_workqueue_thread(cwq, cpu);
		break;

?

Oleg.
2.6.22-rc1: Broken suspend on SMP with tifm
Hi,

The suspend/hibernation is broken on SMP due to:

commit 3540af8ffddcdbc7573451ac0b5cd57a2eaf8af5
tifm: replace per-adapter kthread with freezeable workqueue

Well, it looks like freezable workqueues still deadlock with CPU hotplug
when worker threads are frozen.

The appended patch fixes the issue, but I think we could even avoid the
killing of frozen worker threads.

Greetings,
Rafael

---
Prevent freezable workqueues from deadlocking with CPU hotplug during a
suspend/hibernation by thawing their worker threads before they get stopped.

Signed-off-by: Rafael J. Wysocki <[EMAIL PROTECTED]>
---
 kernel/workqueue.c | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

Index: linux-2.6.22-rc1/kernel/workqueue.c
===
--- linux-2.6.22-rc1.orig/kernel/workqueue.c
+++ linux-2.6.22-rc1/kernel/workqueue.c
@@ -799,9 +799,7 @@ static int __devinit workqueue_cpu_callb
 	struct cpu_workqueue_struct *cwq;
 	struct workqueue_struct *wq;

-	action &= ~CPU_TASKS_FROZEN;
-
-	switch (action) {
+	switch (action & ~CPU_TASKS_FROZEN) {
 	case CPU_LOCK_ACQUIRE:
 		mutex_lock(&workqueue_mutex);
 		return NOTIFY_OK;
@@ -819,20 +817,29 @@ static int __devinit workqueue_cpu_callb

 	switch (action) {
 	case CPU_UP_PREPARE:
+	case CPU_UP_PREPARE_FROZEN:
 		if (!create_workqueue_thread(cwq, cpu))
 			break;
 		printk(KERN_ERR "workqueue for %i failed\n", cpu);
 		return NOTIFY_BAD;

 	case CPU_ONLINE:
+	case CPU_ONLINE_FROZEN:
 		start_workqueue_thread(cwq, cpu);
 		break;

 	case CPU_UP_CANCELED:
+	case CPU_UP_CANCELED_FROZEN:
 		start_workqueue_thread(cwq, -1);
 	case CPU_DEAD:
 		cleanup_workqueue_thread(cwq, cpu);
 		break;
+
+	case CPU_DEAD_FROZEN:
+		if (wq->freezeable)
+			thaw_process(cwq->thread);
+		cleanup_workqueue_thread(cwq, cpu);
+		break;
 	}
 }
2.6.22-rc1: Broken suspend on SMP with tifm
Hi, The suspend/hibernation is broken on SMP due to: commit 3540af8ffddcdbc7573451ac0b5cd57a2eaf8af5 tifm: replace per-adapter kthread with freezeable workqueue Well, it looks like freezable worqueues still deadlock with CPU hotplug when worker threads are frozen. The appended patch fixes the issue, but I think we could even avoid the killing of frozen worker threads. Greetings, Rafael --- Prevent freezable worqueues from deadlocking with CPU hotplug during a suspend/hibernation by thawing their worker threads before they get stopped. Signed-off-by: Rafael J. Wysocki [EMAIL PROTECTED] --- kernel/workqueue.c | 13 ++--- 1 file changed, 10 insertions(+), 3 deletions(-) Index: linux-2.6.22-rc1/kernel/workqueue.c === --- linux-2.6.22-rc1.orig/kernel/workqueue.c +++ linux-2.6.22-rc1/kernel/workqueue.c @@ -799,9 +799,7 @@ static int __devinit workqueue_cpu_callb struct cpu_workqueue_struct *cwq; struct workqueue_struct *wq; - action = ~CPU_TASKS_FROZEN; - - switch (action) { + switch (action ~CPU_TASKS_FROZEN) { case CPU_LOCK_ACQUIRE: mutex_lock(workqueue_mutex); return NOTIFY_OK; @@ -819,20 +817,29 @@ static int __devinit workqueue_cpu_callb switch (action) { case CPU_UP_PREPARE: + case CPU_UP_PREPARE_FROZEN: if (!create_workqueue_thread(cwq, cpu)) break; printk(KERN_ERR workqueue for %i failed\n, cpu); return NOTIFY_BAD; case CPU_ONLINE: + case CPU_ONLINE_FROZEN: start_workqueue_thread(cwq, cpu); break; case CPU_UP_CANCELED: + case CPU_UP_CANCELED_FROZEN: start_workqueue_thread(cwq, -1); case CPU_DEAD: cleanup_workqueue_thread(cwq, cpu); break; + + case CPU_DEAD_FROZEN: + if (wq-freezeable) + thaw_process(cwq-thread); + cleanup_workqueue_thread(cwq, cpu); + break; } } - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.22-rc1: Broken suspend on SMP with tifm
On 05/13, Rafael J. Wysocki wrote: The suspend/hibernation is broken on SMP due to: commit 3540af8ffddcdbc7573451ac0b5cd57a2eaf8af5 tifm: replace per-adapter kthread with freezeable workqueue Well, it looks like freezable worqueues still deadlock with CPU hotplug when worker threads are frozen. Ugh. I thought we deprecated create_freezeable_workqueue(), exactly because suspend was changed to call _cpu_down() after freeze(). It is not that looks like freezable worqueues still deadlock, it is of course, freezable worqueues deadlocks on CPU_DEAD. The -freezeable is still here just because of incoming cpu-hotplug using freezer rework. No? --- linux-2.6.22-rc1.orig/kernel/workqueue.c +++ linux-2.6.22-rc1/kernel/workqueue.c @@ -799,9 +799,7 @@ static int __devinit workqueue_cpu_callb struct cpu_workqueue_struct *cwq; struct workqueue_struct *wq; - action = ~CPU_TASKS_FROZEN; - - switch (action) { + switch (action ~CPU_TASKS_FROZEN) { Confused. How can we see, say CPU_UP_PREPARE_FROZEN, if we cleared CPU_TASKS_FROZEN bit? case CPU_LOCK_ACQUIRE: mutex_lock(workqueue_mutex); return NOTIFY_OK; @@ -819,20 +817,29 @@ static int __devinit workqueue_cpu_callb switch (action) { case CPU_UP_PREPARE: + case CPU_UP_PREPARE_FROZEN: if (!create_workqueue_thread(cwq, cpu)) break; printk(KERN_ERR workqueue for %i failed\n, cpu); return NOTIFY_BAD; case CPU_ONLINE: + case CPU_ONLINE_FROZEN: start_workqueue_thread(cwq, cpu); break; case CPU_UP_CANCELED: + case CPU_UP_CANCELED_FROZEN: start_workqueue_thread(cwq, -1); case CPU_DEAD: cleanup_workqueue_thread(cwq, cpu); break; + + case CPU_DEAD_FROZEN: + if (wq-freezeable) + thaw_process(cwq-thread); + cleanup_workqueue_thread(cwq, cpu); + break; } } Minor, but can't we do ... case CPU_UP_CANCELED: case CPU_UP_CANCELED_FROZEN: start_workqueue_thread(cwq, -1); case CPU_DEAD_FROZEN: if (wq-freezeable) // we can't see PF_FROZEN if it was CPU_UP_CANCELED thaw_process(cwq-thread); case CPU_DEAD: cleanup_workqueue_thread(cwq, cpu); break; ? Oleg. 
- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.22-rc1: Broken suspend on SMP with tifm
On Sunday, 13 May 2007 22:08, Oleg Nesterov wrote: On 05/13, Rafael J. Wysocki wrote: The suspend/hibernation is broken on SMP due to: commit 3540af8ffddcdbc7573451ac0b5cd57a2eaf8af5 tifm: replace per-adapter kthread with freezeable workqueue Well, it looks like freezable worqueues still deadlock with CPU hotplug when worker threads are frozen. Ugh. I thought we deprecated create_freezeable_workqueue(), exactly because suspend was changed to call _cpu_down() after freeze(). Well, apparently no one has told it to Alex ... It is not that looks like freezable worqueues still deadlock, it is of course, freezable worqueues deadlocks on CPU_DEAD. The -freezeable is still here just because of incoming cpu-hotplug using freezer rework. No? Yes, but we failed to communicate that to the others clearly enough. --- linux-2.6.22-rc1.orig/kernel/workqueue.c +++ linux-2.6.22-rc1/kernel/workqueue.c @@ -799,9 +799,7 @@ static int __devinit workqueue_cpu_callb struct cpu_workqueue_struct *cwq; struct workqueue_struct *wq; - action = ~CPU_TASKS_FROZEN; - - switch (action) { + switch (action ~CPU_TASKS_FROZEN) { Confused. How can we see, say CPU_UP_PREPARE_FROZEN, if we cleared CPU_TASKS_FROZEN bit? There's another 'switch ()' in there where the flag is not cleared (that's why I removed the 'action = ~CPU_TASKS_FROZEN' above). 
case CPU_LOCK_ACQUIRE: mutex_lock(workqueue_mutex); return NOTIFY_OK; @@ -819,20 +817,29 @@ static int __devinit workqueue_cpu_callb switch (action) { case CPU_UP_PREPARE: + case CPU_UP_PREPARE_FROZEN: if (!create_workqueue_thread(cwq, cpu)) break; printk(KERN_ERR workqueue for %i failed\n, cpu); return NOTIFY_BAD; case CPU_ONLINE: + case CPU_ONLINE_FROZEN: start_workqueue_thread(cwq, cpu); break; case CPU_UP_CANCELED: + case CPU_UP_CANCELED_FROZEN: start_workqueue_thread(cwq, -1); case CPU_DEAD: cleanup_workqueue_thread(cwq, cpu); break; + + case CPU_DEAD_FROZEN: + if (wq-freezeable) + thaw_process(cwq-thread); + cleanup_workqueue_thread(cwq, cpu); + break; } } Minor, but can't we do ... case CPU_UP_CANCELED: case CPU_UP_CANCELED_FROZEN: start_workqueue_thread(cwq, -1); case CPU_DEAD_FROZEN: if (wq-freezeable) // we can't see PF_FROZEN if it was CPU_UP_CANCELED thaw_process(cwq-thread); case CPU_DEAD: cleanup_workqueue_thread(cwq, cpu); break; ? Yes, we can, but that means one redundant check more in the CPU_UP_CANCELLED path. Besides, I prefer having different cases clearly separated if that makes sense. Greetings, Rafael - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.22-rc1: Broken suspend on SMP with tifm
On 05/14, Oleg Nesterov wrote: On 05/13, Rafael J. Wysocki wrote: The suspend/hibernation is broken on SMP due to: commit 3540af8ffddcdbc7573451ac0b5cd57a2eaf8af5 tifm: replace per-adapter kthread with freezeable workqueue Well, it looks like freezable worqueues still deadlock with CPU hotplug when worker threads are frozen. Ugh. I thought we deprecated create_freezeable_workqueue(), exactly because suspend was changed to call _cpu_down() after freeze(). It is not that looks like freezable worqueues still deadlock, it is of course, freezable worqueues deadlocks on CPU_DEAD. The -freezeable is still here just because of incoming cpu-hotplug using freezer rework. No? --- linux-2.6.22-rc1.orig/kernel/workqueue.c +++ linux-2.6.22-rc1/kernel/workqueue.c @@ -799,9 +799,7 @@ static int __devinit workqueue_cpu_callb struct cpu_workqueue_struct *cwq; struct workqueue_struct *wq; - action = ~CPU_TASKS_FROZEN; - - switch (action) { + switch (action ~CPU_TASKS_FROZEN) { Confused. How can we see, say CPU_UP_PREPARE_FROZEN, if we cleared CPU_TASKS_FROZEN bit? So, unless I missed something stupid, this patch is not 100% right. I think the better fix (at least for now) is - #define create_freezeable_workqueue(name) __create_workqueue((name), 0, 1) + #define create_freezeable_workqueue(name) __create_workqueue((name), 1, 1) Alex, do you really need a multithreaded wq? Rafael, what do you think? Oleg. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.22-rc1: Broken suspend on SMP with tifm
On Sunday, 13 May 2007 22:30, Oleg Nesterov wrote: On 05/14, Oleg Nesterov wrote: On 05/13, Rafael J. Wysocki wrote: The suspend/hibernation is broken on SMP due to: commit 3540af8ffddcdbc7573451ac0b5cd57a2eaf8af5 tifm: replace per-adapter kthread with freezeable workqueue Well, it looks like freezable worqueues still deadlock with CPU hotplug when worker threads are frozen. Ugh. I thought we deprecated create_freezeable_workqueue(), exactly because suspend was changed to call _cpu_down() after freeze(). It is not that looks like freezable worqueues still deadlock, it is of course, freezable worqueues deadlocks on CPU_DEAD. The -freezeable is still here just because of incoming cpu-hotplug using freezer rework. No? --- linux-2.6.22-rc1.orig/kernel/workqueue.c +++ linux-2.6.22-rc1/kernel/workqueue.c @@ -799,9 +799,7 @@ static int __devinit workqueue_cpu_callb struct cpu_workqueue_struct *cwq; struct workqueue_struct *wq; - action = ~CPU_TASKS_FROZEN; - - switch (action) { + switch (action ~CPU_TASKS_FROZEN) { Confused. How can we see, say CPU_UP_PREPARE_FROZEN, if we cleared CPU_TASKS_FROZEN bit? So, unless I missed something stupid, this patch is not 100% right. Well, it isn't, but for a different reason (see [*] below). I think the better fix (at least for now) is - #define create_freezeable_workqueue(name) __create_workqueue((name), 0, 1) + #define create_freezeable_workqueue(name) __create_workqueue((name), 1, 1) Alex, do you really need a multithreaded wq? Rafael, what do you think? That would be misleading if the driver needs the threads to be frozen. I would prefer to revert the commit that caused the problem to appear, but it doesn't revert cleanly and I hate to invalidate someone else's work becuase of my own mistakes. [*] Getting back to the patch, it seems to me that we should do something like take_over_work() before thawing the frozen thread, because there may be a queue to process and the device is suspended at that point. 
Greetings, Rafael - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.22-rc1: Broken suspend on SMP with tifm
On 05/13, Rafael J. Wysocki wrote: On Sunday, 13 May 2007 22:30, Oleg Nesterov wrote: --- linux-2.6.22-rc1.orig/kernel/workqueue.c +++ linux-2.6.22-rc1/kernel/workqueue.c @@ -799,9 +799,7 @@ static int __devinit workqueue_cpu_callb struct cpu_workqueue_struct *cwq; struct workqueue_struct *wq; - action = ~CPU_TASKS_FROZEN; - - switch (action) { + switch (action ~CPU_TASKS_FROZEN) { Confused. How can we see, say CPU_UP_PREPARE_FROZEN, if we cleared CPU_TASKS_FROZEN bit? So, unless I missed something stupid, this patch is not 100% right. Well, it isn't, but for a different reason (see [*] below). Yes, I missed something stupid :) I think the better fix (at least for now) is - #define create_freezeable_workqueue(name) __create_workqueue((name), 0, 1) + #define create_freezeable_workqueue(name) __create_workqueue((name), 1, 1) Alex, do you really need a multithreaded wq? Rafael, what do you think? That would be misleading if the driver needs the threads to be frozen. Hm? The thread will be frozen, the patch above changes singlethread, not freezeable. [*] Getting back to the patch, it seems to me that we should do something like take_over_work() before thawing the frozen thread, because there may be a queue to process and the device is suspended at that point. Yes, exactly because the driver wants this wq to be frozen. So, could you take a second look at the patch above ? Oleg. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.22-rc1: Broken suspend on SMP with tifm
On Sunday, 13 May 2007 22:50, Oleg Nesterov wrote:
> On 05/13, Rafael J. Wysocki wrote:
> > On Sunday, 13 May 2007 22:30, Oleg Nesterov wrote:
> > >
> > > 	--- linux-2.6.22-rc1.orig/kernel/workqueue.c
> > > 	+++ linux-2.6.22-rc1/kernel/workqueue.c
> > > 	@@ -799,9 +799,7 @@ static int __devinit workqueue_cpu_callb
> > > 	 	struct cpu_workqueue_struct *cwq;
> > > 	 	struct workqueue_struct *wq;
> > > 	 
> > > 	-	action &= ~CPU_TASKS_FROZEN;
> > > 	-
> > > 	-	switch (action) {
> > > 	+	switch (action & ~CPU_TASKS_FROZEN) {
> > >
> > > Confused. How can we see, say, CPU_UP_PREPARE_FROZEN, if we cleared the
> > > CPU_TASKS_FROZEN bit? So, unless I missed something stupid, this patch
> > > is not 100% right.
> >
> > Well, it isn't, but for a different reason (see [*] below).
>
> Yes, I missed something stupid :)
>
> I think the better fix (at least for now) is
>
> 	-#define create_freezeable_workqueue(name) __create_workqueue((name), 0, 1)
> 	+#define create_freezeable_workqueue(name) __create_workqueue((name), 1, 1)
>
> Alex, do you really need a multithreaded wq? Rafael, what do you think?
>
> > That would be misleading if the driver needs the threads to be frozen.
>
> Hm? The thread will be frozen, the patch above changes singlethread, not
> freezeable.

Ah, I'm sorry.

> > [*] Getting back to the patch, it seems to me that we should do something
> > like take_over_work() before thawing the frozen thread, because there may
> > be a queue to process and the device is suspended at that point.
>
> Yes, exactly because the driver wants this wq to be frozen. So, could you
> take a second look at the patch above?

Sure, if a singlethread workqueue is sufficient for Alex, I agree that this
would be preferable.

Anyway, I've added take_over_work() to the patch (appended), just in case. ;-)

Rafael

---
Prevent freezable workqueues from deadlocking with CPU hotplug during a
suspend/hibernation by thawing their worker threads before they get stopped.

Signed-off-by: Rafael J. Wysocki [EMAIL PROTECTED]

---
 kernel/workqueue.c |   41 ++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 38 insertions(+), 3 deletions(-)

Index: linux-2.6.22-rc1/kernel/workqueue.c
===================================================================
--- linux-2.6.22-rc1.orig/kernel/workqueue.c
+++ linux-2.6.22-rc1/kernel/workqueue.c
@@ -791,6 +791,32 @@ void destroy_workqueue(struct workqueue_
 }
 EXPORT_SYMBOL_GPL(destroy_workqueue);
 
+/**
+ * take_over_work - if the workqueue is freezable and CPUs are being taken down
+ * due to a hibernation/suspend, we need to take the work out of their worker
+ * threads, because they might need to use some devices to do the work and
+ * the devices are suspended at this point.
+ * @wq: target workqueue
+ * @cpu: CPU being offlined
+ */
+static void take_over_work(struct workqueue_struct *wq, unsigned int cpu)
+{
+	struct cpu_workqueue_struct *cwq = per_cpu_ptr(wq->cpu_wq, cpu);
+	struct list_head list;
+	struct work_struct *work;
+
+	spin_lock_irq(&cwq->lock);
+	list_replace_init(&cwq->worklist, &list);
+
+	while (!list_empty(&list)) {
+		printk("Taking work for %s\n", wq->name);
+		work = list_entry(list.next, struct work_struct, entry);
+		list_del(&work->entry);
+		__queue_work(per_cpu_ptr(wq->cpu_wq, smp_processor_id()), work);
+	}
+	spin_unlock_irq(&cwq->lock);
+}
+
 static int __devinit workqueue_cpu_callback(struct notifier_block *nfb,
 						unsigned long action,
 						void *hcpu)
@@ -799,9 +825,7 @@ static int __devinit workqueue_cpu_callb
 	struct cpu_workqueue_struct *cwq;
 	struct workqueue_struct *wq;
 
-	action &= ~CPU_TASKS_FROZEN;
-
-	switch (action) {
+	switch (action & ~CPU_TASKS_FROZEN) {
 	case CPU_LOCK_ACQUIRE:
 		mutex_lock(&workqueue_mutex);
 		return NOTIFY_OK;
@@ -819,20 +843,31 @@ static int __devinit workqueue_cpu_callb
 		switch (action) {
 		case CPU_UP_PREPARE:
+		case CPU_UP_PREPARE_FROZEN:
 			if (!create_workqueue_thread(cwq, cpu))
 				break;
 			printk(KERN_ERR "workqueue for %i failed\n", cpu);
 			return NOTIFY_BAD;
 
 		case CPU_ONLINE:
+		case CPU_ONLINE_FROZEN:
 			start_workqueue_thread(cwq, cpu);
 			break;
 
 		case CPU_UP_CANCELED:
+		case CPU_UP_CANCELED_FROZEN:
 			start_workqueue_thread(cwq, -1);
 		case CPU_DEAD:
 			cleanup_workqueue_thread(cwq, cpu);
 			break;
+
+		case CPU_DEAD_FROZEN:
+			if (wq->freezeable) {
+				take_over_work(wq, cpu);
+
Re: 2.6.22-rc1: Broken suspend on SMP with tifm
On 05/13, Rafael J. Wysocki wrote:
> On Sunday, 13 May 2007 22:30, Oleg Nesterov wrote:
> > I think the better fix (at least for now) is
> >
> > 	-#define create_freezeable_workqueue(name) __create_workqueue((name), 0, 1)
> > 	+#define create_freezeable_workqueue(name) __create_workqueue((name), 1, 1)
> >
> > Alex, do you really need a multithreaded wq? Rafael, what do you think?
>
> Sure, if a singlethread workqueue is sufficient for Alex, I agree that this
> would be preferable.

Great. Alex?

> @@ -819,20 +843,31 @@ static int __devinit workqueue_cpu_callb
> +
> +		case CPU_DEAD_FROZEN:
> +			if (wq->freezeable) {
> +				take_over_work(wq, cpu);
> +				thaw_process(cwq->thread);

Suppose that PF_NOFREEZE task T does flush_workqueue(), and CPU 1 has pending
works. T does flush_cpu_workqueue(0), CPU_DEAD_FROZEN moves works from CPU 1
to CPU 0, T does flush_cpu_workqueue(1) and finds nothing.

Oleg.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Re: 2.6.22-rc1: Broken suspend on SMP with tifm
On Sunday, 13 May 2007 23:34, Oleg Nesterov wrote:
> On 05/13, Rafael J. Wysocki wrote:
> > On Sunday, 13 May 2007 22:30, Oleg Nesterov wrote:
> > > I think the better fix (at least for now) is
> > >
> > > 	-#define create_freezeable_workqueue(name) __create_workqueue((name), 0, 1)
> > > 	+#define create_freezeable_workqueue(name) __create_workqueue((name), 1, 1)
> > >
> > > Alex, do you really need a multithreaded wq? Rafael, what do you think?
> >
> > Sure, if a singlethread workqueue is sufficient for Alex, I agree that
> > this would be preferable.
>
> Great. Alex?
>
> > @@ -819,20 +843,31 @@ static int __devinit workqueue_cpu_callb
> > +
> > +		case CPU_DEAD_FROZEN:
> > +			if (wq->freezeable) {
> > +				take_over_work(wq, cpu);
> > +				thaw_process(cwq->thread);
>
> Suppose that PF_NOFREEZE task T does flush_workqueue(), and CPU 1 has
> pending works. T does flush_cpu_workqueue(0), CPU_DEAD_FROZEN moves works
> from CPU 1 to CPU 0, T does flush_cpu_workqueue(1) and finds nothing.

I don't think this is possible, because we've acquired workqueue_mutex in
_cpu_down().

Greetings,
Rafael
Re: 2.6.22-rc1: Broken suspend on SMP with tifm
On 05/13, Rafael J. Wysocki wrote:
> On Sunday, 13 May 2007 23:34, Oleg Nesterov wrote:
> > On 05/13, Rafael J. Wysocki wrote:
> > > @@ -819,20 +843,31 @@ static int __devinit workqueue_cpu_callb
> > > +
> > > +		case CPU_DEAD_FROZEN:
> > > +			if (wq->freezeable) {
> > > +				take_over_work(wq, cpu);
> > > +				thaw_process(cwq->thread);
> >
> > Suppose that PF_NOFREEZE task T does flush_workqueue(), and CPU 1 has
> > pending works. T does flush_cpu_workqueue(0), CPU_DEAD_FROZEN moves works
> > from CPU 1 to CPU 0, T does flush_cpu_workqueue(1) and finds nothing.
>
> I don't think this is possible, because we've acquired workqueue_mutex in
> _cpu_down().

Yes, we did... but flush_workqueue() doesn't take it?

Oleg.
Re: 2.6.22-rc1: Broken suspend on SMP with tifm
On Sunday, 13 May 2007 23:54, Oleg Nesterov wrote:
> On 05/13, Rafael J. Wysocki wrote:
> > On Sunday, 13 May 2007 23:34, Oleg Nesterov wrote:
> > > On 05/13, Rafael J. Wysocki wrote:
> > > > @@ -819,20 +843,31 @@ static int __devinit workqueue_cpu_callb
> > > > +
> > > > +		case CPU_DEAD_FROZEN:
> > > > +			if (wq->freezeable) {
> > > > +				take_over_work(wq, cpu);
> > > > +				thaw_process(cwq->thread);
> > >
> > > Suppose that PF_NOFREEZE task T does flush_workqueue(), and CPU 1 has
> > > pending works. T does flush_cpu_workqueue(0), CPU_DEAD_FROZEN moves
> > > works from CPU 1 to CPU 0, T does flush_cpu_workqueue(1) and finds
> > > nothing.
> >
> > I don't think this is possible, because we've acquired workqueue_mutex
> > in _cpu_down().
>
> Yes, we did... but flush_workqueue() doesn't take it?

I was looking at the 2.6.21 code, sorry.

Hmm, I guess we could add an additional mutex that would only be taken in
flush_workqueue() and in _cpu_down()/_cpu_up() via workqueue_cpu_callback()
with CPU_LOCK_ACQUIRE?

It doesn't seem to be a good idea to run flush_workqueue() while CPUs are
being taken up and down anyway.

Greetings,
Rafael
Re: 2.6.22-rc1: Broken suspend on SMP with tifm
On 05/14, Rafael J. Wysocki wrote:
> Hmm, I guess we could add an additional mutex that would only be taken in
> flush_workqueue() and in _cpu_down()/_cpu_up() via workqueue_cpu_callback()
> with CPU_LOCK_ACQUIRE?

This will deadlock if work->func() does flush_workqueue(), because it may run
when _cpu_down() holds this lock (note that it doesn't help if we re-introduce
take_over_work()). This is a reason why mutex_lock(workqueue_mutex) was
removed from flush_workqueue().

> It doesn't seem to be a good idea to run flush_workqueue() while CPUs are
> being taken up and down anyway.

We can freeze all tasks :) Otherwise we can't forbid them to call
flush_workqueue(). flush_workqueue() is OK. create/destroy is a problem.

Oleg.
Re: 2.6.22-rc1: Broken suspend on SMP with tifm
I don't have any particular need in multithreaded wq, but I do need it
freezeable. Freezeability simplifies things quite a lot.

This is ok with me:

-#define create_freezeable_workqueue(name) __create_workqueue((name), 0, 1)
+#define create_freezeable_workqueue(name) __create_workqueue((name), 1, 1)
Re: 2.6.22-rc1: Broken suspend on SMP with tifm
On Sunday, 13 May 2007 23:34, Oleg Nesterov wrote:
> On 05/13, Rafael J. Wysocki wrote:
[--snip--]
> > @@ -819,20 +843,31 @@ static int __devinit workqueue_cpu_callb
> > +
> > +		case CPU_DEAD_FROZEN:
> > +			if (wq->freezeable) {
> > +				take_over_work(wq, cpu);
> > +				thaw_process(cwq->thread);
>
> Suppose that PF_NOFREEZE task T does flush_workqueue(), and CPU 1 has
> pending works. T does flush_cpu_workqueue(0), CPU_DEAD_FROZEN moves works
> from CPU 1 to CPU 0, T does flush_cpu_workqueue(1) and finds nothing.

I think I have solved this particular problem without any locking:

Define an atomic field in workqueue_struct (let's call it 'switch'),
initially equal to 0. Whenever work is taken from an offlined CPU,
take_over_work() increases 'switch' by 1. In turn, flush_workqueue() reads
'switch' before looping over CPUs and saves its value in a local variable.
On exit, it compares the current value of 'switch' with the saved one and if
they differ, it repeats the loop over CPUs.

Updated patch follows.

Rafael

---
 kernel/workqueue.c |   58 +++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 55 insertions(+), 3 deletions(-)

Index: linux-2.6.22-rc1/kernel/workqueue.c
===================================================================
--- linux-2.6.22-rc1.orig/kernel/workqueue.c
+++ linux-2.6.22-rc1/kernel/workqueue.c
@@ -62,6 +62,9 @@ struct workqueue_struct {
 	const char *name;
 	int singlethread;
 	int freezeable;		/* Freeze threads during suspend */
+	atomic_t work_sw;	/* Used to indicate that some work has been
+				 * moved from one CPU to another
+				 */
 };
 
 /* All the per-cpu workqueues on the system, for hotplug cpu to add/remove
@@ -381,10 +384,15 @@ void fastcall flush_workqueue(struct wor
 {
 	const cpumask_t *cpu_map = wq_cpu_map(wq);
 	int cpu;
+	int val;
 
 	might_sleep();
+ start:
+	val = atomic_read(&wq->work_sw);
 	for_each_cpu_mask(cpu, *cpu_map)
 		flush_cpu_workqueue(per_cpu_ptr(wq->cpu_wq, cpu));
+	if (unlikely(val != atomic_read(&wq->work_sw)))
+		goto start;
 }
 EXPORT_SYMBOL_GPL(flush_workqueue);
 
@@ -710,6 +718,7 @@ struct workqueue_struct *__create_workqu
 	wq->name = name;
 	wq->singlethread = singlethread;
 	wq->freezeable = freezeable;
+	atomic_set(&wq->work_sw, 0);
 	INIT_LIST_HEAD(&wq->list);
 
 	if (singlethread) {
@@ -791,6 +800,40 @@ void destroy_workqueue(struct workqueue_
 }
 EXPORT_SYMBOL_GPL(destroy_workqueue);
 
+/**
+ * take_over_work - if the workqueue is freezable and CPUs are being taken down
+ * due to a hibernation/suspend, we need to take the work out of their worker
+ * threads, because they might need to use some devices to do the work and
+ * the devices are suspended at this point.
+ * @wq: target workqueue
+ * @cpu: CPU being offlined
+ */
+static void take_over_work(struct workqueue_struct *wq, unsigned int cpu)
+{
+	struct cpu_workqueue_struct *cwq = per_cpu_ptr(wq->cpu_wq, cpu);
+	struct list_head list;
+	struct work_struct *work;
+
+	spin_lock_irq(&cwq->lock);
+	list_replace_init(&cwq->worklist, &list);
+
+	if (!list_empty(&list)) {
+		/*
+		 * Tell flush_workqueue() that it should repeat the loop over
+		 * CPUs
+		 */
+		atomic_inc(&wq->work_sw);
+		while (!list_empty(&list)) {
+			printk("Taking work for %s\n", wq->name);
+			work = list_entry(list.next, struct work_struct, entry);
+			list_del(&work->entry);
+			__queue_work(per_cpu_ptr(wq->cpu_wq,
+					smp_processor_id()), work);
+		}
+	}
+	spin_unlock_irq(&cwq->lock);
+}
+
 static int __devinit workqueue_cpu_callback(struct notifier_block *nfb,
 						unsigned long action,
 						void *hcpu)
@@ -799,9 +842,7 @@ static int __devinit workqueue_cpu_callb
 	struct cpu_workqueue_struct *cwq;
 	struct workqueue_struct *wq;
 
-	action &= ~CPU_TASKS_FROZEN;
-
-	switch (action) {
+	switch (action & ~CPU_TASKS_FROZEN) {
 	case CPU_LOCK_ACQUIRE:
 		mutex_lock(&workqueue_mutex);
 		return NOTIFY_OK;
@@ -819,20 +860,31 @@ static int __devinit workqueue_cpu_callb
 		switch (action) {
 		case CPU_UP_PREPARE:
+		case CPU_UP_PREPARE_FROZEN:
 			if (!create_workqueue_thread(cwq, cpu))
 				break;
 			printk(KERN_ERR "workqueue for %i failed\n", cpu);
 			return NOTIFY_BAD;
 
 		case CPU_ONLINE:
+		case CPU_ONLINE_FROZEN: