Hello, Michael. On Wed, May 24, 2017 at 06:39:49PM -0500, Michael Bringmann wrote: > [ 321.310961] ------------[ cut here ]------------ > [ 321.310997] WARNING: CPU: 184 PID: 13201 at kernel/workqueue.c:3375 > alloc_unbound_pwq+0x5c0/0x5e0 > [ 321.311005] Modules linked in: rpadlpar_io rpaphp dccp_diag dccp tcp_diag > udp_diag inet_diag unix_diag af_packet_diag netlink_diag sg pseries_rng > ghash_generic gf128mul xts vmx_crypto binfmt_misc ip_tables xfs libcrc32c > sd_mod ibmvscsi ibmveth scsi_transport_srp dm_mirror dm_region_hash dm_log > dm_mod > [ 321.311097] CPU: 184 PID: 13201 Comm: cpuhp/184 Not tainted > 4.12.0-rc1.wi91275_debug_03.ppc64le+ #8 > [ 321.311106] task: c000000408961080 task.stack: c000000406394000 > [ 321.311113] NIP: c000000000116c80 LR: c000000000116c7c CTR: > 0000000000000000 > [ 321.311121] REGS: c0000004063977b0 TRAP: 0700 Not tainted > (4.12.0-rc1.wi91275_debug_03.ppc64le+) > [ 321.311128] MSR: 8000000000029033 <SF,EE,ME,IR,DR,RI,LE> > [ 321.311150] CR: 28000082 XER: 00000000 > [ 321.311159] CFAR: c000000000a2dc80 SOFTE: 1 > [ 321.311159] GPR00: c000000000116c7c c000000406397a30 c0000000013ae900 > 000000000000003b > [ 321.311159] GPR04: c000000408961a38 0000000000000006 00000000a49e41e5 > ffffffffa4a5a483 > [ 321.311159] GPR08: 00000000000062cc 0000000000000000 0000000000000000 > c000000408961a38 > [ 321.311159] GPR12: 0000000000000000 c00000000fb38c00 c00000000011e858 > c00000040a902ac0 > [ 321.311159] GPR16: 0000000000000000 0000000000000000 0000000000000000 > 0000000000000000 > [ 321.311159] GPR20: c000000406394000 0000000000000002 c000000406394000 > 0000000000000000 > [ 321.311159] GPR24: c000000405075400 c000000404fc0000 0000000000000110 > c0000000015a4c88 > [ 321.311159] GPR28: 0000000000000000 c0000004fe256000 c0000004fe256008 > c0000004fe052800 > [ 321.311290] NIP [c000000000116c80] alloc_unbound_pwq+0x5c0/0x5e0 > [ 321.311298] LR [c000000000116c7c] alloc_unbound_pwq+0x5bc/0x5e0 > [ 321.311305] Call Trace: > [ 321.311310] [c000000406397a30] [c000000000116c7c] > alloc_unbound_pwq+0x5bc/0x5e0 (unreliable) > [ 321.311323] [c000000406397ad0] [c000000000116e30] > wq_update_unbound_numa+0x190/0x270 > [ 321.311334] [c000000406397b60] [c000000000118eb0] > workqueue_offline_cpu+0xe0/0x130 > [ 321.311345] [c000000406397bf0] [c0000000000e9f20] > cpuhp_invoke_callback+0x240/0xcd0 > [ 321.311355] [c000000406397cb0] [c0000000000eab28] > cpuhp_down_callbacks+0x78/0xf0 > [ 321.311365] [c000000406397d00] [c0000000000eae6c] > cpuhp_thread_fun+0x18c/0x1a0 > [ 321.311376] [c000000406397d30] [c0000000001251cc] > smpboot_thread_fn+0x2fc/0x3b0 > [ 321.311386] [c000000406397dc0] [c00000000011e9c0] kthread+0x170/0x1b0 > [ 321.311397] [c000000406397e30] [c00000000000b4f4] > ret_from_kernel_thread+0x5c/0x68
wq_update_unbound_numa() should have never called into alloc_unbound_pwq() w/ empty node cpu mask. It should have fallen back to the dfl_pwq. It looks like I just messed up the logic there from the initial commit of the feature. Can you please see whether the following fixes the problem? Thanks. diff --git a/kernel/workqueue.c b/kernel/workqueue.c index c74bf39ef764..e2c248d5223a 100644 --- a/kernel/workqueue.c +++ b/kernel/workqueue.c @@ -3559,29 +3559,21 @@ static struct pool_workqueue *alloc_unbound_pwq(struct workqueue_struct *wq, * stable. * * Return: %true if the resulting @cpumask is different from @attrs->cpumask, - * %false if equal. + * %false if equal. On %false return, the content of @cpumask is undefined. */ static bool wq_calc_node_cpumask(const struct workqueue_attrs *attrs, int node, int cpu_going_down, cpumask_t *cpumask) { if (!wq_numa_enabled || attrs->no_numa) - goto use_dfl; + return false; /* does @node have any online CPUs @attrs wants? */ cpumask_and(cpumask, cpumask_of_node(node), attrs->cpumask); if (cpu_going_down >= 0) cpumask_clear_cpu(cpu_going_down, cpumask); - if (cpumask_empty(cpumask)) - goto use_dfl; - - /* yeap, return possible CPUs in @node that @attrs wants */ - cpumask_and(cpumask, attrs->cpumask, wq_numa_possible_cpumask[node]); - return !cpumask_equal(cpumask, attrs->cpumask); - -use_dfl: - cpumask_copy(cpumask, attrs->cpumask); - return false; + return !cpumask_empty(cpumask) && + !cpumask_equal(cpumask, attrs->cpumask); } /* install @pwq into @wq's numa_pwq_tbl[] for @node and return the old pwq */