Re: [PATCH 04/10 V4] workqueue: add manage_workers_slowpath()

2012-09-05 Thread Lai Jiangshan
On 09/05/2012 09:12 AM, Tejun Heo wrote:
> Hello, Lai.
> 
> On Sun, Sep 02, 2012 at 12:28:22AM +0800, Lai Jiangshan wrote:
>> If the hotplug code has grabbed the manager_mutex and worker_thread tries
>> to create a worker, manage_workers() will return false and worker_thread
>> will go on to process work items. Now, on that CPU, all workers are busy
>> processing work items and no idle worker is left/ready for managing. This
>> breaks the workqueue concept and is a bug.
>>
>> So when manage_workers() fails to grab the manager_mutex, it should enter
>> normal work-processing context and then compete for the manager_mutex
>> instead of returning false.
>>
>> To do this safely, we add manage_workers_slowpath(): the worker switches
>> to work-item-processing mode to do the managing job. The managing job is
>> thus processed via a work item and is free to compete for the
>> manager_mutex.
> 
> Ummm this seems overly complicated.  How about scheduling rebind
> work to a worker and forcing it to break out of the work processing
> loop?  I think it can be done fairly easily using POOL_MANAGE_WORKERS
> - set it from the rebind function, break out of work processing loop
> if it's set, replace need_to_manage_workers() with POOL_MANAGE_WORKERS
> test (the function really isn't necessary) and always jump back to
> recheck afterwards.  It might need a bit more mangling here and there
> but that should be the essence of it.  I'll give a stab at it later
> today.
> 

This approach is a little like my unsent approach3 (I will explain it in
another mail), but it is more complicated and changes more code if it is
implemented.

First, rebind/unbind would have to be done separately per pool, because if
we queue the rebind work to the high-pri pool we break the normal pool,
and vice versa.

That in turn forces us to move DISASSOCIATED into the pool flags, and to
add more code to the CPU-notify path.

Second, we would have to reuse POOL_MANAGE_WORKERS or add a new flag.

Third, we would need to restructure rebind/unbind and change a lot in
worker_thread().

My partial/unsent approach3 has almost the same problems.
(The difference is that approach3 doesn't use a work item; it is checked
and called from the "recheck" label of worker_thread(). It is called with
the WORKER_PREP bit set and uses "mutex_trylock" to grab the lock, like
manage_workers() does.)

How much code would be changed for just the unbind part of this approach:


 kernel/workqueue.c |  103 ++--
 1 files changed, 76 insertions(+), 27 deletions(-)


Thanks
Lai

Re: [PATCH 04/10 V4] workqueue: add manage_workers_slowpath()

2012-09-04 Thread Tejun Heo
Hello, Lai.

On Sun, Sep 02, 2012 at 12:28:22AM +0800, Lai Jiangshan wrote:
> If the hotplug code has grabbed the manager_mutex and worker_thread tries
> to create a worker, manage_workers() will return false and worker_thread
> will go on to process work items. Now, on that CPU, all workers are busy
> processing work items and no idle worker is left/ready for managing. This
> breaks the workqueue concept and is a bug.
> 
> So when manage_workers() fails to grab the manager_mutex, it should enter
> normal work-processing context and then compete for the manager_mutex
> instead of returning false.
> 
> To do this safely, we add manage_workers_slowpath(): the worker switches
> to work-item-processing mode to do the managing job. The managing job is
> thus processed via a work item and is free to compete for the
> manager_mutex.

Ummm this seems overly complicated.  How about scheduling rebind
work to a worker and forcing it to break out of the work processing
loop?  I think it can be done fairly easily using POOL_MANAGE_WORKERS
- set it from the rebind function, break out of work processing loop
if it's set, replace need_to_manage_workers() with POOL_MANAGE_WORKERS
test (the function really isn't necessary) and always jump back to
recheck afterwards.  It might need a bit more mangling here and there
but that should be the essence of it.  I'll give a stab at it later
today.
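
Very roughly, the shape of it in worker_thread() would be something like
this (an untested sketch only, to show the intent; names as in the current
workqueue.c):

	recheck:
		...
		do {
			struct work_struct *work =
				list_first_entry(&pool->worklist,
						 struct work_struct, entry);
			/* process @work, and anything it chains onto
			 * worker->scheduled, exactly as today */
			...
		} while (keep_working(pool) &&
			 !(pool->flags & POOL_MANAGE_WORKERS));

		worker_set_flags(worker, WORKER_PREP, false);
	sleep:
		/* the flag test replaces need_to_manage_workers() */
		if (unlikely(pool->flags & POOL_MANAGE_WORKERS) &&
		    manage_workers(worker))
			goto recheck;

with rebind_workers() setting pool->flags |= POOL_MANAGE_WORKERS and
waking the workers up.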

Thanks.

-- 
tejun

[PATCH 04/10 V4] workqueue: add manage_workers_slowpath()

2012-09-01 Thread Lai Jiangshan
If the hotplug code has grabbed the manager_mutex and worker_thread tries
to create a worker, manage_workers() will return false and worker_thread
will go on to process work items. Now, on that CPU, all workers are busy
processing work items and no idle worker is left/ready for managing. This
breaks the workqueue concept and is a bug.

So when manage_workers() fails to grab the manager_mutex, it should enter
normal work-processing context and then compete for the manager_mutex
instead of returning false.

To do this safely, we add manage_workers_slowpath(): the worker switches to
work-item-processing mode to do the managing job. The managing job is thus
processed via a work item and is free to compete for the manager_mutex.

After this patch, the manager_mutex can be grabbed anywhere when needed;
doing so no longer causes the CPU to consume all of its idle worker threads.

Note that POOL_MANAGING_WORKERS is still needed, to tell us why
manage_workers() failed to grab the manager_mutex.

This slowpath is hard to trigger, so for testing I changed
"if (unlikely(!mutex_trylock(&pool->manager_mutex)))"
to "if (1 || unlikely(!mutex_trylock(&pool->manager_mutex)))"
so that manage_workers_slowpath() is always used.


Signed-off-by: Lai Jiangshan 
---
 kernel/workqueue.c |   89 ++-
 1 files changed, 87 insertions(+), 2 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 979ef4f..d40e8d7 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1808,6 +1808,81 @@ static bool maybe_destroy_workers(struct worker_pool *pool)
return ret;
 }
 
+/* manage workers via work item */
+static void manage_workers_slowpath_fn(struct work_struct *work)
+{
+   struct worker *worker = kthread_data(current);
+   struct worker_pool *pool = worker->pool;
+
+   mutex_lock(&pool->manager_mutex);
+   spin_lock_irq(&pool->gcwq->lock);
+
+   pool->flags &= ~POOL_MANAGE_WORKERS;
+   maybe_destroy_workers(pool);
+   maybe_create_worker(pool);
+
+   spin_unlock_irq(&pool->gcwq->lock);
+   mutex_unlock(&pool->manager_mutex);
+}
+
+static void process_scheduled_works(struct worker *worker);
+
+/*
+ * manage_workers_slowpath - manage worker pool via work item
+ * @worker: self
+ *
+ * Manage workers when rebind_workers() or gcwq_unbind_fn() beat us to
+ * the manager_mutex. The worker can't release gcwq->lock and then
+ * compete for the manager_mutex, because any worker must always be in
+ * at least one of these states:
+ * 1) holding gcwq->lock
+ * 2) holding pool->manager_mutex (manage_workers() fast path)
+ * 3) queued on the idle_list
+ * 4) processing a work item and queued on the busy hash table
+ *
+ * So we move the managing job into a work item and process it, leaving
+ * manage_workers_slowpath_fn() free to compete for the manager_mutex.
+ *
+ * CONTEXT:
+ * with WORKER_PREP bit set
+ * spin_lock_irq(gcwq->lock) which will be released and regrabbed
+ * multiple times.  Does GFP_KERNEL allocations.
+ */
+static void manage_workers_slowpath(struct worker *worker)
+{
+   struct worker_pool *pool = worker->pool;
+   struct work_struct manage_work;
+   int cpu = pool->gcwq->cpu;
+   struct cpu_workqueue_struct *cwq;
+
+   pool->flags |= POOL_MANAGING_WORKERS;
+
+   INIT_WORK_ONSTACK(&manage_work, manage_workers_slowpath_fn);
+   __set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(&manage_work));
+
+   /* see the comment on the same statement in worker_thread() */
+   BUG_ON(!list_empty(&worker->scheduled));
+
+   /* wq doesn't matter, use the default one */
+   if (cpu == WORK_CPU_UNBOUND)
+   cwq = get_cwq(cpu, system_unbound_wq);
+   else
+   cwq = get_cwq(cpu, system_wq);
+
+   /* insert the work to the worker's own scheduled list */
+   debug_work_activate(&manage_work);
+   insert_work(cwq, &manage_work, &worker->scheduled,
+   work_color_to_flags(WORK_NO_COLOR));
+
+   /*
+* Do the worker management. May also process busy_worker_rebind_fn()
+* work items queued by rebind_workers().
+*/
+   process_scheduled_works(worker);
+
+   pool->flags &= ~POOL_MANAGING_WORKERS;
+}
+
 /**
  * manage_workers - manage worker pool
  * @worker: self
@@ -1833,8 +1908,18 @@ static bool manage_workers(struct worker *worker)
struct worker_pool *pool = worker->pool;
bool ret = false;
 
-   if (!mutex_trylock(&pool->manager_mutex))
-   return ret;
+   if (pool->flags & POOL_MANAGING_WORKERS)
+   return false;
+
+   if (unlikely(!mutex_trylock(&pool->manager_mutex))) {
+   /*
+* Ouch! rebind_workers() or gcwq_unbind_fn() beat us to it,
+* but we can't return without making any progress.
+* Fall back to manage_workers_slowpath().
+*/
+   manage_workers_slowpath(worker);
+   return true;
+   }
 
pool->flags &= ~POOL_MANAGE_WORKERS;
pool->flags |= POOL_MANAGING_WORKERS;
-- 
1.7.4.4
