On Tue, Jan 5, 2021 at 10:41 AM Lai Jiangshan <jiangshan...@gmail.com> wrote: > > On Mon, Jan 4, 2021 at 9:56 PM Peter Zijlstra <pet...@infradead.org> wrote: > > > > On Sat, Dec 26, 2020 at 10:51:11AM +0800, Lai Jiangshan wrote: > > > From: Lai Jiangshan <la...@linux.alibaba.com> > > > > > > wq_online_cpumask is the cached result of cpu_online_mask with the > > > going-down cpu cleared. > > > > You can't use cpu_active_mask ? > > > When a cpu is going out: > (cpu_active_mask is not protected by workqueue mutexs.) > > create_worker() for unbound pool | cpu offlining > check cpu_active_mask | check wq_online_cpumask > | remove bit from cpu_active_mask > | no cpu in pool->attrs->cpumask is active > set pool->attrs->cpumask to worker| > and hit the warning | remove bit from wq_online_cpumask
Even with the help of wq_online_cpumask, the patchset can't silence the warning in __set_cpus_allowed_ptr() in this case. It is indeed hard to suppress the warning for unbound pools. Maybe we need something like this (outmost callback of CPUHP_AP_WORKQUEUE_UNBOUND_ONLINE, so that workqueue can do preparation when offlining before AP_ACTIVE): diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h index 0042ef362511..ac2103deb20b 100644 --- a/include/linux/cpuhotplug.h +++ b/include/linux/cpuhotplug.h @@ -20,6 +20,9 @@ * | ^ * v | * AP_ACTIVE AP_ACTIVE + * | ^ + * v | + * ONLINE ONLINE */ enum cpuhp_state { @@ -194,6 +197,7 @@ enum cpuhp_state { CPUHP_AP_X86_HPET_ONLINE, CPUHP_AP_X86_KVM_CLK_ONLINE, CPUHP_AP_ACTIVE, + CPUHP_AP_WORKQUEUE_UNBOUND_ONLINE, CPUHP_ONLINE, }; The other way is to modify __set_cpus_allowed_ptr() to suppress the warning for kworkers and believe/let the workqueue handle cpumask correctly. And the third way is to use get_online_cpus() in worker_attach_to_pool() which might delay work items to be processed during cpuhotplug and might be dangerous when someone call flush_work() in cpuhotplug callbacks. Any thoughts? Thanks, Lai > > > And when a cpu is onlining, there may be some workers which were just created > after the workqueue hotplug callback is finished but before cpu_active_mask > was updated. workqueue has not call back after cpu_active_mask updated and > these workers' cpumask is not updated. > > For percpu workers, these problems can be handled with the help of > POOL_DISASSOCIATED which is protected by workqueue mutexs and the > help of sched/core.c which doesn't warn when per-cpu-kthread. > > For unbound workers, the way to handle it without using wq_online_cpumask > is much more complex when a cpu is going out.