Re: [PATCH 1/2] workqueue: Move wq_update_unbound_numa() to the beginning of CPU_ONLINE
On Tue, Jun 21, 2016 at 09:47:19PM +0200, Peter Zijlstra wrote: > On Tue, Jun 21, 2016 at 03:43:56PM -0400, Tejun Heo wrote: > > On Tue, Jun 21, 2016 at 09:37:09PM +0200, Peter Zijlstra wrote: > > > Hurm.. So I've applied it, just to get this issue sorted, but I'm not > > > entirely sure I like it. > > > > > > I think I prefer ego's version because that makes it harder to get stuff > > > to run on !active,online cpus. I think we really want to be careful what > > > gets to run during that state. > > > > The original patch just did set_cpus_allowed one more time late enough > > so that the target kthread (in most cases) doesn't have to go through > > fallback rq selection afterwards. I don't know what the long term > > solution is but CPU_ONLINE callbacks should be able to bind kthreads > > to the new CPU one way or the other. > > Fair enough; clearly I need to stare harder. In any case, patch is on > its way to sched/urgent. Thanks Tejun, Peter! > -- Regards gautham. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 1/2] workqueue: Move wq_update_unbound_numa() to the beginning of CPU_ONLINE
On Tue, Jun 21, 2016 at 03:43:56PM -0400, Tejun Heo wrote: > On Tue, Jun 21, 2016 at 09:37:09PM +0200, Peter Zijlstra wrote: > > Hurm.. So I've applied it, just to get this issue sorted, but I'm not > > entirely sure I like it. > > > > I think I prefer ego's version because that makes it harder to get stuff > > to run on !active,online cpus. I think we really want to be careful what > > gets to run during that state. > > The original patch just did set_cpus_allowed one more time late enough > so that the target kthread (in most cases) doesn't have to go through > fallback rq selection afterwards. I don't know what the long term > solution is but CPU_ONLINE callbacks should be able to bind kthreads > to the new CPU one way or the other. Fair enough; clearly I need to stare harder. In any case, patch is on its way to sched/urgent. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 1/2] workqueue: Move wq_update_unbound_numa() to the beginning of CPU_ONLINE
On Tue, Jun 21, 2016 at 09:37:09PM +0200, Peter Zijlstra wrote: > Hurm.. So I've applied it, just to get this issue sorted, but I'm not > entirely sure I like it. > > I think I prefer ego's version because that makes it harder to get stuff > to run on !active,online cpus. I think we really want to be careful what > gets to run during that state. The original patch just did set_cpus_allowed one more time late enough so that the target kthread (in most cases) doesn't have to go through fallback rq selection afterwards. I don't know what the long term solution is but CPU_ONLINE callbacks should be able to bind kthreads to the new CPU one way or the other. Thanks. -- tejun ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 1/2] workqueue: Move wq_update_unbound_numa() to the beginning of CPU_ONLINE
On Tue, Jun 21, 2016 at 11:36:51AM -0400, Tejun Heo wrote: > On Tue, Jun 21, 2016 at 07:42:31PM +0530, Gautham R Shenoy wrote: > > > Subject: [PATCH] sched: allow kthreads to fallback to online && !active > > > cpus > > > > > > During CPU hotplug, CPU_ONLINE callbacks are run while the CPU is > > > online but not active. A CPU_ONLINE callback may create or bind a > > > kthread so that its cpus_allowed mask only allows the CPU which is > > > being brought online. The kthread may start executing before the CPU > > > is made active and can end up in select_fallback_rq(). > > > > > > In such cases, the expected behavior is selecting the CPU which is > > > coming online; however, because select_fallback_rq() only chooses from > > > active CPUs, it determines that the task doesn't have any viable CPU > > > in its allowed mask and ends up overriding it to cpu_possible_mask. > > > > > > CPU_ONLINE callbacks should be able to put kthreads on the CPU which > > > is coming online. Update select_fallback_rq() so that it follows > > > cpu_online() rather than cpu_active() for kthreads. > > > > > > Signed-off-by: Tejun Heo> > > Reported-by: Gautham R Shenoy > > > > Hi Tejun, > > > > This patch fixes the issue on POWER. I am able to see the worker > > threads of the unbound workqueues of the newly onlined node with this. > > > > Tested-by: Gautham R. Shenoy > > Peter? Hurm.. So I've applied it, just to get this issue sorted, but I'm not entirely sure I like it. I think I prefer ego's version because that makes it harder to get stuff to run on !active,online cpus. I think we really want to be careful what gets to run during that state. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 1/2] workqueue: Move wq_update_unbound_numa() to the beginning of CPU_ONLINE
On Tue, Jun 21, 2016 at 07:42:31PM +0530, Gautham R Shenoy wrote: > > Subject: [PATCH] sched: allow kthreads to fallback to online && !active cpus > > > > During CPU hotplug, CPU_ONLINE callbacks are run while the CPU is > > online but not active. A CPU_ONLINE callback may create or bind a > > kthread so that its cpus_allowed mask only allows the CPU which is > > being brought online. The kthread may start executing before the CPU > > is made active and can end up in select_fallback_rq(). > > > > In such cases, the expected behavior is selecting the CPU which is > > coming online; however, because select_fallback_rq() only chooses from > > active CPUs, it determines that the task doesn't have any viable CPU > > in its allowed mask and ends up overriding it to cpu_possible_mask. > > > > CPU_ONLINE callbacks should be able to put kthreads on the CPU which > > is coming online. Update select_fallback_rq() so that it follows > > cpu_online() rather than cpu_active() for kthreads. > > > > Signed-off-by: Tejun Heo> > Reported-by: Gautham R Shenoy > > Hi Tejun, > > This patch fixes the issue on POWER. I am able to see the worker > threads of the unbound workqueues of the newly onlined node with this. > > Tested-by: Gautham R. Shenoy Peter? -- tejun ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 1/2] workqueue: Move wq_update_unbound_numa() to the beginning of CPU_ONLINE
Hi Tejun, On Thu, Jun 16, 2016 at 03:35:04PM -0400, Tejun Heo wrote: > Hello, > > So, the issue of the initial worker not having its affinity set > correctly wasn't caused by the order of the operations. Reordering > just made set_cpus_allowed tried one more time late enough so that it > hides the race condition most of the time. The problem is that > CPU_ONLINE callbacks are called while the cpu being onlined is online > but not active and select_fallback_rq() only considers active cpus, so > if a kthread gets scheduled in the meantime and it doesn't have any > cpu which is active in its allowed mask, it's allowed mask gets reset > to cpu_possible_mask. > > Would something like the following make sense? > > Thanks. > -- 8< -- > Subject: [PATCH] sched: allow kthreads to fallback to online && !active cpus > > During CPU hotplug, CPU_ONLINE callbacks are run while the CPU is > online but not active. A CPU_ONLINE callback may create or bind a > kthread so that its cpus_allowed mask only allows the CPU which is > being brought online. The kthread may start executing before the CPU > is made active and can end up in select_fallback_rq(). > > In such cases, the expected behavior is selecting the CPU which is > coming online; however, because select_fallback_rq() only chooses from > active CPUs, it determines that the task doesn't have any viable CPU > in its allowed mask and ends up overriding it to cpu_possible_mask. > > CPU_ONLINE callbacks should be able to put kthreads on the CPU which > is coming online. Update select_fallback_rq() so that it follows > cpu_online() rather than cpu_active() for kthreads. > > Signed-off-by: Tejun Heo> Reported-by: Gautham R Shenoy Hi Tejun, This patch fixes the issue on POWER. I am able to see the worker threads of the unbound workqueues of the newly onlined node with this. Tested-by: Gautham R. Shenoy > --- > kernel/sched/core.c |4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > index 017d539..a12e3db 100644 > --- a/kernel/sched/core.c > +++ b/kernel/sched/core.c > @@ -1536,7 +1536,9 @@ static int select_fallback_rq(int cpu, struct > task_struct *p) > for (;;) { > /* Any allowed, online CPU? */ > for_each_cpu(dest_cpu, tsk_cpus_allowed(p)) { > - if (!cpu_active(dest_cpu)) > + if (!(p->flags & PF_KTHREAD) && !cpu_active(dest_cpu)) > + continue; > + if (!cpu_online(dest_cpu)) > continue; > goto out; > } > ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 1/2] workqueue: Move wq_update_unbound_numa() to the beginning of CPU_ONLINE
Hello, So, the issue of the initial worker not having its affinity set correctly wasn't caused by the order of the operations. Reordering just made set_cpus_allowed tried one more time late enough so that it hides the race condition most of the time. The problem is that CPU_ONLINE callbacks are called while the cpu being onlined is online but not active and select_fallback_rq() only considers active cpus, so if a kthread gets scheduled in the meantime and it doesn't have any cpu which is active in its allowed mask, it's allowed mask gets reset to cpu_possible_mask. Would something like the following make sense? Thanks. -- 8< -- Subject: [PATCH] sched: allow kthreads to fallback to online && !active cpus During CPU hotplug, CPU_ONLINE callbacks are run while the CPU is online but not active. A CPU_ONLINE callback may create or bind a kthread so that its cpus_allowed mask only allows the CPU which is being brought online. The kthread may start executing before the CPU is made active and can end up in select_fallback_rq(). In such cases, the expected behavior is selecting the CPU which is coming online; however, because select_fallback_rq() only chooses from active CPUs, it determines that the task doesn't have any viable CPU in its allowed mask and ends up overriding it to cpu_possible_mask. CPU_ONLINE callbacks should be able to put kthreads on the CPU which is coming online. Update select_fallback_rq() so that it follows cpu_online() rather than cpu_active() for kthreads. Signed-off-by: Tejun HeoReported-by: Gautham R Shenoy --- kernel/sched/core.c |4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 017d539..a12e3db 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -1536,7 +1536,9 @@ static int select_fallback_rq(int cpu, struct task_struct *p) for (;;) { /* Any allowed, online CPU? */ for_each_cpu(dest_cpu, tsk_cpus_allowed(p)) { - if (!cpu_active(dest_cpu)) + if (!(p->flags & PF_KTHREAD) && !cpu_active(dest_cpu)) + continue; + if (!cpu_online(dest_cpu)) continue; goto out; } ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 1/2] workqueue: Move wq_update_unbound_numa() to the beginning of CPU_ONLINE
Hello Tejun, On Wed, Jun 15, 2016 at 11:53:50AM -0400, Tejun Heo wrote: > Hello, > > On Tue, Jun 07, 2016 at 08:44:02PM +0530, Gautham R. Shenoy wrote: > > Currently in the CPU_ONLINE workqueue handler, the > > restore_unbound_workers_cpumask() will never call > > set_cpus_allowed_ptr() for a newly created unbound worker thread. > > Hmmm... did you actually verify that this happens? A new kworker > always gets bound to the cpumask that it's assigned to in > create_worker(). Yes I have verified that this happens despite the fact that create_worker() calls kthread_bind_mask() to bind the worker thread to attrs->cpumask. However, this doesn't seem to be sufficient. Consider the following case of a 2-node POWER machine running 4.7-rc3. CPUs 0-79 belong to the 1st node and CPUs 80-159 belong to the second. == root@fir01:~# uname -r 4.7.0-rc3-vanilla root@fir01:~# numactl -H available: 2 nodes (0,8) node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 node 0 size: 65246 MB node 0 free: 64025 MB node 8 cpus: 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 node 8 size: 65304 MB node 8 free: 64985 MB node distances: node 0 8 0: 10 40 8: 40 10 == If we inspect the unbound worker threads affinity we will observe that the ordered unbound worker threads (pids 6, 1088, 1122) are affined to the online CPUs while the remaining unbound worker threads are affined to the nodemask. == root@fir01:/home/ego# ./pr_unbound_workers_affinity.sh #See [1] below pid 6's current affinity list: 0-159 pid 7's current affinity list: 0-79 pid 1018's current affinity list: 80-159 pid 1054's current affinity list: 80-159 pid 1088's current affinity list: 0-159 pid 1089's current affinity list: 0-79 pid 1090's current affinity list: 80-159 pid 1122's current affinity list: 0-159 pid 1176's current affinity list: 0-79 pid 3683's current affinity list: 0-79 == At this point if we offline all but CPU0, the only unbound workers that exist are the unordered workers and those affined to the first node. == root@fir01:/home/ego# ./cpuhp.sh 0 1 159 #See [2] below root@fir01:/home/ego# ./pr_unbound_workers_affinity.sh pid 6's current affinity list: 0 pid 7's current affinity list: 0 pid 1088's current affinity list: 0 pid 1089's current affinity list: 0 pid 1122's current affinity list: 0 pid 1176's current affinity list: 0 pid 3683's current affinity list: 0 == We now online CPU80 which is the first CPU in the second node. We would expect that an unbound worker thread corresponding to the second node would be created and would have the mask 80-159. However, the newly created workers (pid 4109 and 4110) are affined to CPU0 instead of CPU80! == root@fir01:/home/ego# ./cpuhp.sh 1 80 80 root@fir01:/home/ego# ./pr_unbound_workers_affinity.sh pid 6's current affinity list: 0,80 pid 7's current affinity list: 0 pid 1088's current affinity list: 0,80 pid 1089's current affinity list: 0 pid 1122's current affinity list: 0,80 pid 1176's current affinity list: 0 pid 3683's current affinity list: 0 pid 4109's current affinity list: 0 pid 4110's current affinity list: 0 == Furthermore, if we now bring all the CPUs online, we don't expect new worker threads to be created since the representative for the second node would have been created with CPU80 coming online. However, we do expect that those worker threads are affined to CPUs 80-159. But that's not the case either! == root@fir01:/home/ego# ./cpuhp.sh 1 1 159 root@fir01:/home/ego# ./pr_unbound_workers_affinity.sh pid 6's current affinity list: 0-159 pid 7's current affinity list: 0-79 pid 1088's current affinity list: 0-159 pid 1089's current affinity list: 0-79 pid 1122's current affinity list: 0-159 pid 1176's current affinity list: 0-79 pid 3683's current affinity list: 0-79 pid 4109's current affinity list: 0 pid 4110's current affinity list: 0 == Note: [1]
Re: [PATCH 1/2] workqueue: Move wq_update_unbound_numa() to the beginning of CPU_ONLINE
Hello, On Tue, Jun 07, 2016 at 08:44:02PM +0530, Gautham R. Shenoy wrote: > Currently in the CPU_ONLINE workqueue handler, the > restore_unbound_workers_cpumask() will never call > set_cpus_allowed_ptr() for a newly created unbound worker thread. Hmmm... did you actually verify that this happens? A new kworker always gets bound to the cpumask that it's assigned to in create_worker(). Thanks. -- tejun ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 1/2] workqueue: Move wq_update_unbound_numa() to the beginning of CPU_ONLINE
Currently in the CPU_ONLINE workqueue handler, the restore_unbound_workers_cpumask() will never call set_cpus_allowed_ptr() for a newly created unbound worker thread. This is because the function which creates a new unbound worker thread when the first CPU in the node comes online [wq_update_unbound_numa()] is invoked after the call to restore_unbound_workers_cpumask(). Thus the affinity is never set for this worker thread when the first CPU in the node comes online. Furthermore, due to an optimization in restore_unbound_workers_cpumask(), set_cpus_allowed_ptr() is not called when subsequent CPUs in the node come online since it assumes that the affinity would have been set when the first CPU has come online. This patch fixes this issue by invoking wq_update_unbound_numa() before the calling restore_unbound_workers_cpumask(). Cc: Peter ZijlstraCc: Thomas Gleixner Cc: Tejun Heo Cc: Michael Ellerman Signed-off-by: Gautham R. Shenoy --- kernel/workqueue.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/kernel/workqueue.c b/kernel/workqueue.c index e1c0e99..e412794 100644 --- a/kernel/workqueue.c +++ b/kernel/workqueue.c @@ -4638,6 +4638,10 @@ static int workqueue_cpu_up_callback(struct notifier_block *nfb, case CPU_ONLINE: mutex_lock(_pool_mutex); + /* update NUMA affinity of unbound workqueues */ + list_for_each_entry(wq, , list) + wq_update_unbound_numa(wq, cpu, true); + for_each_pool(pool, pi) { mutex_lock(>attach_mutex); @@ -4649,10 +4653,6 @@ static int workqueue_cpu_up_callback(struct notifier_block *nfb, mutex_unlock(>attach_mutex); } - /* update NUMA affinity of unbound workqueues */ - list_for_each_entry(wq, , list) - wq_update_unbound_numa(wq, cpu, true); - mutex_unlock(_pool_mutex); break; } -- 1.9.3 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev