Re: [PATCH 04/10 V4] workqueue: add manage_workers_slowpath()
On 09/05/2012 09:12 AM, Tejun Heo wrote:
> Hello, Lai.
>
> On Sun, Sep 02, 2012 at 12:28:22AM +0800, Lai Jiangshan wrote:
>> If the hotplug code has grabbed the manager_mutex and a worker_thread
>> tries to create a worker, manage_workers() will return false and the
>> worker_thread will go on to process work items. Now, on that CPU, all
>> workers are processing work items and no idle worker is left/ready for
>> managing. This breaks the concept of workqueue and is a bug.
>>
>> So when manage_workers() fails to grab the manager_mutex, it should try
>> to enter normal work-processing context and then compete for the
>> manager_mutex instead of returning false.
>>
>> To do this safely, we add manage_workers_slowpath(): the worker switches
>> to work-item-processing mode to do the managing job. The managing job is
>> thus processed via a work item and is free to compete for the
>> manager_mutex.
>
> Ummm, this seems overly complicated. How about scheduling the rebind work
> to a worker and forcing it to break out of the work processing loop? I
> think it can be done fairly easily using POOL_MANAGE_WORKERS: set it from
> the rebind function, break out of the work processing loop if it's set,
> replace need_to_manage_workers() with a POOL_MANAGE_WORKERS test (the
> function really isn't necessary), and always jump back to recheck
> afterwards. It might need a bit more mangling here and there, but that
> should be the essence of it. I'll take a stab at it later today.

This approach is a little like my unsent approach 3 (I will explain it in
another mail). It is more complicated and changes more code if implemented:

First, we would have to separate rebind/unbind by pool, because if we
queue the rebind work to the high-pri pool we break the normal pool, and
vice versa. This forces us to move DISASSOCIATED into the pool flags,
which in turn forces us to add more code to the CPU-notify path.

Second, we have to reuse POOL_MANAGE_WORKERS or add a new flag.

Third, we need to restructure rebind/unbind and change a lot in
worker_thread().

My partial/unsent approach 3 has almost the same problems. (The difference
is that approach 3 doesn't use a work item; the managing is checked for
and called from the "recheck" label of worker_thread(). It is called with
the WORKER_PREP bit set and uses "mutex_trylock" to grab the lock, like
manage_workers() does.)

How much code would change for only the unbind part of this approach:

 kernel/workqueue.c |  103 ++--
 1 files changed, 76 insertions(+), 27 deletions(-)

Thanks,
Lai
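[A rough sketch of approach 3 as described above — hypothetical and
untested; the exact names and placement are assumptions, not code from a
real patch:]

	/* at the "recheck" label of worker_thread(), WORKER_PREP still set */
recheck:
	/*
	 * No work item here: trylock like manage_workers() does, and a
	 * contended manager_mutex simply loops back to recheck instead of
	 * abandoning the managing duty. gcwq->lock handling is elided.
	 */
	if (need_to_manage_workers(pool) &&
	    mutex_trylock(&pool->manager_mutex)) {
		pool->flags &= ~POOL_MANAGE_WORKERS;
		maybe_destroy_workers(pool);
		maybe_create_worker(pool);
		mutex_unlock(&pool->manager_mutex);
		goto recheck;
	}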
Re: [PATCH 04/10 V4] workqueue: add manage_workers_slowpath()
Hello, Lai.

On Sun, Sep 02, 2012 at 12:28:22AM +0800, Lai Jiangshan wrote:
> If the hotplug code has grabbed the manager_mutex and a worker_thread
> tries to create a worker, manage_workers() will return false and the
> worker_thread will go on to process work items. Now, on that CPU, all
> workers are processing work items and no idle worker is left/ready for
> managing. This breaks the concept of workqueue and is a bug.
>
> So when manage_workers() fails to grab the manager_mutex, it should try
> to enter normal work-processing context and then compete for the
> manager_mutex instead of returning false.
>
> To do this safely, we add manage_workers_slowpath(): the worker switches
> to work-item-processing mode to do the managing job. The managing job is
> thus processed via a work item and is free to compete for the
> manager_mutex.

Ummm, this seems overly complicated. How about scheduling the rebind work
to a worker and forcing it to break out of the work processing loop? I
think it can be done fairly easily using POOL_MANAGE_WORKERS: set it from
the rebind function, break out of the work processing loop if it's set,
replace need_to_manage_workers() with a POOL_MANAGE_WORKERS test (the
function really isn't necessary), and always jump back to recheck
afterwards. It might need a bit more mangling here and there, but that
should be the essence of it. I'll take a stab at it later today.

Thanks.

-- 
tejun
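[A rough sketch of the change suggested above, reconstructed from the
description — untested, and the exact names and placement are assumptions:]

	/* in worker_thread(); the rebind function sets POOL_MANAGE_WORKERS */
recheck:
	...
	do {
		struct work_struct *work =
			list_first_entry(&pool->worklist,
					 struct work_struct, entry);
		/* ... process the work item as before ... */
	} while (keep_working(pool) &&
		 !(pool->flags & POOL_MANAGE_WORKERS));	/* break out if set */

	worker_set_flags(worker, WORKER_PREP, false);
sleep:
	/* the POOL_MANAGE_WORKERS test replaces need_to_manage_workers() */
	if (pool->flags & POOL_MANAGE_WORKERS) {
		manage_workers(worker);
		goto recheck;	/* always jump back to recheck afterwards */
	}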
[PATCH 04/10 V4] workqueue: add manage_workers_slowpath()
If the hotplug code has grabbed the manager_mutex and a worker_thread
tries to create a worker, manage_workers() will return false and the
worker_thread will go on to process work items. Now, on that CPU, all
workers are processing work items and no idle worker is left/ready for
managing. This breaks the concept of workqueue and is a bug.

So when manage_workers() fails to grab the manager_mutex, it should try
to enter normal work-processing context and then compete for the
manager_mutex instead of returning false.

To do this safely, we add manage_workers_slowpath(): the worker switches
to work-item-processing mode to do the managing job. The managing job is
thus processed via a work item and is free to compete for the
manager_mutex.

After this patch, the manager_mutex can be grabbed anywhere when needed;
it will not cause a CPU to consume all of its idle worker_threads.

By the way, POOL_MANAGING_WORKERS is still needed to tell us why
manage_workers() failed to grab the manager_mutex.

This slowpath is hard to trigger, so for testing I changed
"if (unlikely(!mutex_trylock(&pool->manager_mutex)))" to
"if (1 || unlikely(!mutex_trylock(&pool->manager_mutex)))", which makes
manage_workers_slowpath() be used always.

Signed-off-by: Lai Jiangshan <la...@cn.fujitsu.com>
---
 kernel/workqueue.c |   89 ++-
 1 files changed, 87 insertions(+), 2 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 979ef4f..d40e8d7 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1808,6 +1808,81 @@ static bool maybe_destroy_workers(struct worker_pool *pool)
 	return ret;
 }
 
+/* manage workers via work item */
+static void manage_workers_slowpath_fn(struct work_struct *work)
+{
+	struct worker *worker = kthread_data(current);
+	struct worker_pool *pool = worker->pool;
+
+	mutex_lock(&pool->manager_mutex);
+	spin_lock_irq(&pool->gcwq->lock);
+
+	pool->flags &= ~POOL_MANAGE_WORKERS;
+	maybe_destroy_workers(pool);
+	maybe_create_worker(pool);
+
+	spin_unlock_irq(&pool->gcwq->lock);
+	mutex_unlock(&pool->manager_mutex);
+}
+
+static void process_scheduled_works(struct worker *worker);
+
+/*
+ * manage_workers_slowpath - manage worker pool via work item
+ * @worker: self
+ *
+ * Manage workers when rebind_workers() or gcwq_unbind_fn() beat us to
+ * the manager_mutex. The worker can't release the gcwq->lock and then
+ * compete for the manager_mutex, because any worker must have at least
+ * one of:
+ *   1) gcwq->lock held
+ *   2) pool->manager_mutex held (manage_workers() fast path)
+ *   3) a place on the idle_list
+ *   4) a work item being processed, queued on the busy hash table
+ *
+ * So we move the managing job into a work item and process it; thus
+ * manage_workers_slowpath_fn() is fully free to compete for the
+ * manager_mutex.
+ *
+ * CONTEXT:
+ * Called with the WORKER_PREP bit set and with
+ * spin_lock_irq(gcwq->lock) held, which may be released and regrabbed
+ * multiple times. Does GFP_KERNEL allocations.
+ */
+static void manage_workers_slowpath(struct worker *worker)
+{
+	struct worker_pool *pool = worker->pool;
+	struct work_struct manage_work;
+	int cpu = pool->gcwq->cpu;
+	struct cpu_workqueue_struct *cwq;
+
+	pool->flags |= POOL_MANAGING_WORKERS;
+
+	INIT_WORK_ONSTACK(&manage_work, manage_workers_slowpath_fn);
+	__set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(&manage_work));
+
+	/* see the comment on the same statement in worker_thread() */
+	BUG_ON(!list_empty(&worker->scheduled));
+
+	/* the wq doesn't matter, use the default one */
+	if (cpu == WORK_CPU_UNBOUND)
+		cwq = get_cwq(cpu, system_unbound_wq);
+	else
+		cwq = get_cwq(cpu, system_wq);
+
+	/* insert the work onto the worker's own scheduled list */
+	debug_work_activate(&manage_work);
+	insert_work(cwq, &manage_work, &worker->scheduled,
+		    work_color_to_flags(WORK_NO_COLOR));
+
+	/*
+	 * Do manage workers. This may also process busy_worker_rebind_fn()
+	 * queued by rebind_workers().
+	 */
+	process_scheduled_works(worker);
+
+	pool->flags &= ~POOL_MANAGING_WORKERS;
+}
+
 /**
  * manage_workers - manage worker pool
  * @worker: self
@@ -1833,8 +1908,18 @@ static bool manage_workers(struct worker *worker)
 	struct worker_pool *pool = worker->pool;
 	bool ret = false;
 
-	if (!mutex_trylock(&pool->manager_mutex))
-		return ret;
+	if (pool->flags & POOL_MANAGING_WORKERS)
+		return false;
+
+	if (unlikely(!mutex_trylock(&pool->manager_mutex))) {
+		/*
+		 * Ouch! rebind_workers() or gcwq_unbind_fn() beat us to it,
+		 * but we can't return without making any progress.
+		 * Fall back to manage_workers_slowpath().
+		 */
+		manage_workers_slowpath(worker);
+		return true;
+	}
 
 	pool->flags &= ~POOL_MANAGE_WORKERS;
 	pool->flags |= POOL_MANAGING_WORKERS;
-- 
1.7.4.4
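[For reference, the testing hack described in the changelog looks like
this in context — testing only, not part of the patch proper:]

	/* TESTING ONLY: force every manage attempt through the slowpath */
	if (1 || unlikely(!mutex_trylock(&pool->manager_mutex))) {
		manage_workers_slowpath(worker);
		return true;
	}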