Hello, Michael.

On Wed, May 24, 2017 at 06:39:49PM -0500, Michael Bringmann wrote:
> [  321.310961] ------------[ cut here ]------------
> [  321.310997] WARNING: CPU: 184 PID: 13201 at kernel/workqueue.c:3375 
> alloc_unbound_pwq+0x5c0/0x5e0
> [  321.311005] Modules linked in: rpadlpar_io rpaphp dccp_diag dccp tcp_diag 
> udp_diag inet_diag unix_diag af_packet_diag netlink_diag sg pseries_rng 
> ghash_generic gf128mul xts vmx_crypto binfmt_misc ip_tables xfs libcrc32c 
> sd_mod ibmvscsi ibmveth scsi_transport_srp dm_mirror dm_region_hash dm_log 
> dm_mod
> [  321.311097] CPU: 184 PID: 13201 Comm: cpuhp/184 Not tainted 
> 4.12.0-rc1.wi91275_debug_03.ppc64le+ #8
> [  321.311106] task: c000000408961080 task.stack: c000000406394000
> [  321.311113] NIP: c000000000116c80 LR: c000000000116c7c CTR: 
> 0000000000000000
> [  321.311121] REGS: c0000004063977b0 TRAP: 0700   Not tainted  
> (4.12.0-rc1.wi91275_debug_03.ppc64le+)
> [  321.311128] MSR: 8000000000029033 <SF,EE,ME,IR,DR,RI,LE>
> [  321.311150]   CR: 28000082  XER: 00000000
> [  321.311159] CFAR: c000000000a2dc80 SOFTE: 1 
> [  321.311159] GPR00: c000000000116c7c c000000406397a30 c0000000013ae900 
> 000000000000003b 
> [  321.311159] GPR04: c000000408961a38 0000000000000006 00000000a49e41e5 
> ffffffffa4a5a483 
> [  321.311159] GPR08: 00000000000062cc 0000000000000000 0000000000000000 
> c000000408961a38 
> [  321.311159] GPR12: 0000000000000000 c00000000fb38c00 c00000000011e858 
> c00000040a902ac0 
> [  321.311159] GPR16: 0000000000000000 0000000000000000 0000000000000000 
> 0000000000000000 
> [  321.311159] GPR20: c000000406394000 0000000000000002 c000000406394000 
> 0000000000000000 
> [  321.311159] GPR24: c000000405075400 c000000404fc0000 0000000000000110 
> c0000000015a4c88 
> [  321.311159] GPR28: 0000000000000000 c0000004fe256000 c0000004fe256008 
> c0000004fe052800 
> [  321.311290] NIP [c000000000116c80] alloc_unbound_pwq+0x5c0/0x5e0
> [  321.311298] LR [c000000000116c7c] alloc_unbound_pwq+0x5bc/0x5e0
> [  321.311305] Call Trace:
> [  321.311310] [c000000406397a30] [c000000000116c7c] 
> alloc_unbound_pwq+0x5bc/0x5e0 (unreliable)
> [  321.311323] [c000000406397ad0] [c000000000116e30] 
> wq_update_unbound_numa+0x190/0x270
> [  321.311334] [c000000406397b60] [c000000000118eb0] 
> workqueue_offline_cpu+0xe0/0x130
> [  321.311345] [c000000406397bf0] [c0000000000e9f20] 
> cpuhp_invoke_callback+0x240/0xcd0
> [  321.311355] [c000000406397cb0] [c0000000000eab28] 
> cpuhp_down_callbacks+0x78/0xf0
> [  321.311365] [c000000406397d00] [c0000000000eae6c] 
> cpuhp_thread_fun+0x18c/0x1a0
> [  321.311376] [c000000406397d30] [c0000000001251cc] 
> smpboot_thread_fn+0x2fc/0x3b0
> [  321.311386] [c000000406397dc0] [c00000000011e9c0] kthread+0x170/0x1b0
> [  321.311397] [c000000406397e30] [c00000000000b4f4] 
> ret_from_kernel_thread+0x5c/0x68

wq_update_unbound_numa() should have never called into
alloc_unbound_pwq() w/ empty node cpu mask.  It should have fallen
back to the dfl_pwq.  It looks like I just messed up the logic there
from the initial commit of the feature.  Can you please see whether
the following fixes the problem?

Thanks.

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index c74bf39ef764..e2c248d5223a 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3559,29 +3559,21 @@ static struct pool_workqueue *alloc_unbound_pwq(struct 
workqueue_struct *wq,
  * stable.
  *
  * Return: %true if the resulting @cpumask is different from @attrs->cpumask,
- * %false if equal.
+ * %false if equal.  On %false return, the content of @cpumask is undefined.
  */
 static bool wq_calc_node_cpumask(const struct workqueue_attrs *attrs, int node,
                                 int cpu_going_down, cpumask_t *cpumask)
 {
        if (!wq_numa_enabled || attrs->no_numa)
-               goto use_dfl;
+               return false;
 
        /* does @node have any online CPUs @attrs wants? */
        cpumask_and(cpumask, cpumask_of_node(node), attrs->cpumask);
        if (cpu_going_down >= 0)
                cpumask_clear_cpu(cpu_going_down, cpumask);
 
-       if (cpumask_empty(cpumask))
-               goto use_dfl;
-
-       /* yeap, return possible CPUs in @node that @attrs wants */
-       cpumask_and(cpumask, attrs->cpumask, wq_numa_possible_cpumask[node]);
-       return !cpumask_equal(cpumask, attrs->cpumask);
-
-use_dfl:
-       cpumask_copy(cpumask, attrs->cpumask);
-       return false;
+       return !cpumask_empty(cpumask) &&
+               !cpumask_equal(cpumask, attrs->cpumask);
 }
 
 /* install @pwq into @wq's numa_pwq_tbl[] for @node and return the old pwq */

Reply via email to