Hello,

On Tue, Jun 06, 2017 at 11:18:36AM -0500, Michael Bringmann wrote:
> On 05/25/2017 10:30 AM, Michael Bringmann wrote:
> > I will try that patch shortly.  I also updated my patch to be conditional
> > on whether the pool's cpumask attribute was empty.  You should have
> > received V2 of that patch by now.
> 
> Let's try this again.
> 
> The hotplug problem goes away with the changes that you provided earlier, and
> shown in the patch below.  I kept this change to get_unbound_pool() as a
> just-in-case to explain the crash in the event that it occurs again:
> 
> 	if (!cpumask_weight(pool->attrs->cpumask))
> 		cpumask_copy(pool->attrs->cpumask, cpumask_of(smp_processor_id()));
> 
> I could also insert
> 
> 	BUG_ON(!cpumask_weight(pool->attrs->cpumask));
> 
> at that place, but I really prefer not to crash the system if there is a
> workaround.

So, that means we're ending up in situations where NUMA online is a proper
superset of NUMA possible.

I'm not sure about that workaround: it doesn't make any logical sense, and
it's not right in terms of correctness.  The above would be able to enable
CPUs which are explicitly excluded from a workqueue.  The only fallback
which makes sense is falling back to the default pwq.
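Roughly, that fallback has the shape below.  This is a simplified sketch
from memory of the 4.12-era wq_update_unbound_numa() in kernel/workqueue.c,
not a verbatim excerpt; pwq, old_pwq, wq, node and target_attrs come from
the enclosing function, and earlier failure paths also jump to use_dfl_pwq:

	/* try to build the node-affine pwq first */
	pwq = alloc_unbound_pwq(wq, target_attrs);
	if (!pwq)
		goto use_dfl_pwq;

	/* normal path: install the node-affine pwq */
	mutex_lock(&wq->mutex);
	old_pwq = numa_pwq_tbl_install(wq, node, pwq);
	goto out_unlock;

use_dfl_pwq:
	/* fall back to the workqueue's default pwq for this node */
	mutex_lock(&wq->mutex);
	spin_lock_irq(&wq->dfl_pwq->pool->lock);
	get_pwq(wq->dfl_pwq);		/* ref held by the per-node table */
	spin_unlock_irq(&wq->dfl_pwq->pool->lock);
	old_pwq = numa_pwq_tbl_install(wq, node, wq->dfl_pwq);
out_unlock:
	mutex_unlock(&wq->mutex);
	put_pwq_unlocked(old_pwq);	/* release the displaced pwq, if any */

That keeps work running (on the default, unrestricted pwq) without ever
widening a pool's cpumask behind the workqueue's back.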
> > Can you please post the messages with the debug patch from the prev
> > thread?  In fact, let's please continue on that thread.  I'm having a
> > hard time following what's going wrong with the code.
> 
> Are these the failure logs that you requested?
> 
> Red Hat Enterprise Linux Server 7.3 (Maipo)
> Kernel 4.12.0-rc1.wi91275_debug_03.ppc64le+ on an ppc64le
> 
> ltcalpine2-lp20 login: root
> Password:
> Last login: Wed May 24 18:45:40 from oc1554177480.austin.ibm.com
> [root@ltcalpine2-lp20 ~]# numactl -H
> available: 2 nodes (0,6)
> node 0 cpus:
> node 0 size: 0 MB
> node 0 free: 0 MB
> node 6 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
> 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
> 51 52 53 54 55 56 57 58 59 60 61 62 63
> node 6 size: 19858 MB
> node 6 free: 16920 MB
> node distances:
> node   0   6
>   0:  10  40
>   6:  40  10
> [root@ltcalpine2-lp20 ~]# numactl -H
> available: 2 nodes (0,6)
> node 0 cpus:
> node 0 size: 0 MB
> node 0 free: 0 MB
> node 6 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
> 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
> 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76
> 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101
> 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120
> 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139
> 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158
> 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177
> 178 179 180 181 182 183 184 185 186 187 188 189 190 191
> node 6 size: 19858 MB
> node 6 free: 16362 MB
> node distances:
> node   0   6
>   0:  10  40
>   6:  40  10
> [root@ltcalpine2-lp20 ~]# [  321.310943] workqueue:get_unbound_pool has empty
> cpumask for pool attrs
> [  321.310961] ------------[ cut here ]------------
> [  321.310997] WARNING: CPU: 184 PID: 13201 at kernel/workqueue.c:3375
> alloc_unbound_pwq+0x5c0/0x5e0
> [  321.311005] Modules linked in: rpadlpar_io rpaphp dccp_diag dccp tcp_diag
> udp_diag inet_diag unix_diag af_packet_diag netlink_diag sg pseries_rng
> ghash_generic gf128mul xts vmx_crypto binfmt_misc ip_tables xfs libcrc32c
> sd_mod ibmvscsi ibmveth scsi_transport_srp dm_mirror dm_region_hash dm_log
> dm_mod
> [  321.311097] CPU: 184 PID: 13201 Comm: cpuhp/184 Not tainted
> 4.12.0-rc1.wi91275_debug_03.ppc64le+ #8
> [  321.311106] task: c000000408961080 task.stack: c000000406394000
> [  321.311113] NIP: c000000000116c80 LR: c000000000116c7c CTR:
> 0000000000000000
> [  321.311121] REGS: c0000004063977b0 TRAP: 0700   Not tainted
> (4.12.0-rc1.wi91275_debug_03.ppc64le+)
> [  321.311128] MSR: 8000000000029033 <SF,EE,ME,IR,DR,RI,LE>
> [  321.311150] CR: 28000082  XER: 00000000
> [  321.311159] CFAR: c000000000a2dc80 SOFTE: 1
> [  321.311159] GPR00: c000000000116c7c c000000406397a30 c0000000013ae900
> 000000000000003b
> [  321.311159] GPR04: c000000408961a38 0000000000000006 00000000a49e41e5
> ffffffffa4a5a483
> [  321.311159] GPR08: 00000000000062cc 0000000000000000 0000000000000000
> c000000408961a38
> [  321.311159] GPR12: 0000000000000000 c00000000fb38c00 c00000000011e858
> c00000040a902ac0
> [  321.311159] GPR16: 0000000000000000 0000000000000000 0000000000000000
> 0000000000000000
> [  321.311159] GPR20: c000000406394000 0000000000000002 c000000406394000
> 0000000000000000
> [  321.311159] GPR24: c000000405075400 c000000404fc0000 0000000000000110
> c0000000015a4c88
> [  321.311159] GPR28: 0000000000000000 c0000004fe256000 c0000004fe256008
> c0000004fe052800
> [  321.311290] NIP [c000000000116c80] alloc_unbound_pwq+0x5c0/0x5e0
> [  321.311298] LR [c000000000116c7c] alloc_unbound_pwq+0x5bc/0x5e0
> [  321.311305] Call Trace:
> [  321.311310] [c000000406397a30] [c000000000116c7c]
> alloc_unbound_pwq+0x5bc/0x5e0 (unreliable)
> [  321.311323] [c000000406397ad0] [c000000000116e30]
> wq_update_unbound_numa+0x190/0x270
> [  321.311334] [c000000406397b60] [c000000000118eb0]
> workqueue_offline_cpu+0xe0/0x130
> [  321.311345] [c000000406397bf0] [c0000000000e9f20]
> cpuhp_invoke_callback+0x240/0xcd0
> [  321.311355] [c000000406397cb0] [c0000000000eab28]
> cpuhp_down_callbacks+0x78/0xf0
> [  321.311365] [c000000406397d00] [c0000000000eae6c]
> cpuhp_thread_fun+0x18c/0x1a0
> [  321.311376] [c000000406397d30] [c0000000001251cc]
> smpboot_thread_fn+0x2fc/0x3b0
> [  321.311386] [c000000406397dc0] [c00000000011e9c0] kthread+0x170/0x1b0
> [  321.311397] [c000000406397e30] [c00000000000b4f4]
> ret_from_kernel_thread+0x5c/0x68
> [  321.311406] Instruction dump:
> [  321.311413] 3d42fff0 892ac565 2f890000 40fefd98 39200001 3c62ff89 3c82ff6c
> 3863d590
> [  321.311437] 38847cb0 992ac565 48916fc9 60000000 <0fe00000> 4bfffd70
> 60000000 60420000

The only way offlining can lead to this failure is when the wq NUMA possible
cpumask is a proper subset of the matching online mask.  Can you please print
out the NUMA online cpu and wq_numa_possible_cpumask masks and verify that
online stays within the possible mask for each node?  If not, the ppc arch
init code needs to be updated so that the cpu <-> node binding is established
for all possible cpus on boot.
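Something like the following would do (a sketch, untested; it belongs in a
void function in kernel/workqueue.c, since wq_numa_possible_cpumask is the
static per-node array that workqueue.c builds from cpu_to_node() at boot):

	int node;
	cpumask_var_t online;

	if (!zalloc_cpumask_var(&online, GFP_KERNEL))
		return;

	for_each_node(node) {
		/* CPUs of this node which are currently online */
		cpumask_and(online, cpumask_of_node(node), cpu_online_mask);

		pr_info("node %d: online %*pbl, wq possible %*pbl\n",
			node, cpumask_pr_args(online),
			cpumask_pr_args(wq_numa_possible_cpumask[node]));

		/* online must stay within the boot-time possible mask */
		WARN_ON(!cpumask_subset(online,
					wq_numa_possible_cpumask[node]));
	}

	free_cpumask_var(online);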
Note that this isn't a requirement coming solely from wq.  All node affine
(thus percpu) allocations depend on that.
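For example (an illustration, not code from this thread; "struct foo" is a
stand-in type), any early node-affine allocation keyed off a possible but
not-yet-onlined CPU already leans on that binding:

	/* cpu_to_node() must be sane for every possible CPU here */
	struct foo *p = kzalloc_node(sizeof(*p), GFP_KERNEL,
				     cpu_to_node(cpu));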
Thanks.

-- 
tejun