On Wed, 2016-02-03 at 11:24 -0500, Tejun Heo wrote:
> On Wed, Feb 03, 2016 at 01:28:56PM +0100, Michal Hocko wrote:
> > > The CPU was 168, and that one was offlined in the meantime. So
> > > __queue_work fails at:
> > >     if (!(wq->flags & WQ_UNBOUND))
> > >         pwq = per_cpu_ptr(wq->cpu_pwqs, cpu);
> > >     else
> > >         pwq = unbound_pwq_by_node(wq, cpu_to_node(cpu));
> > >         ^^^                           ^^^^ NODE is -1
> > >           \ pwq is NULL
> > >
> > >     if (last_pool && last_pool != pwq->pool) { <--- BOOM
>
> So, the proper fix here is keeping cpu <-> node mapping stable across
> cpu on/offlining which has been being worked on for a long time now.
> The patchset is pending and it fixes other issues too.
>
> > So I think 874bbfe600a6 is really bogus. It should be reverted. We
> > already have a proper fix for vmstat 176bed1de5bf ("vmstat:
> > explicitly schedule per-cpu work on the CPU we need it to run on").
> > This which should be used for the stable trees as a replacement.
>
> It's not bogus. We can't flip a property that has been guaranteed
> without any provision for verification. Why do you think vmstat blow
> up in the first place? vmstat would be the canary case as it runs
> frequently on all systems. It's exactly the sign that we can't break
> this guarantee willy-nilly.

If the intent of the below is to fulfill a guarantee...

+	/* timer isn't guaranteed to run in this cpu, record earlier */
+	if (cpu == WORK_CPU_UNBOUND)
+		cpu = raw_smp_processor_id();
 	dwork->cpu = cpu;
 	timer->expires = jiffies + delay;

-	if (unlikely(cpu != WORK_CPU_UNBOUND))
-		add_timer_on(timer, cpu);
-	else
-		add_timer(timer);
+	add_timer_on(timer, cpu);

...it appears to be incomplete. Hotplug aside, when adding a timer
with the expectation that it stay put, should it not also be pinned?

	-Mike
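
For readers following the quoted crash analysis, here is a minimal userspace sketch of that failure path. It is not kernel code: every name prefixed with model_, the table sizes, and the two structs are hypothetical stand-ins, and the sketch simply assumes that an offlined CPU maps to node -1 and that the lookup then yields no pwq, as described in the quoted text.

```c
#include <stddef.h>

#define MODEL_NR_CPUS   4
#define MODEL_NR_NODES  2
#define MODEL_NO_NODE   (-1)

/* Hypothetical stand-ins for the kernel structures, not the real definitions. */
struct model_pool { int id; };
struct model_pwq  { struct model_pool *pool; };

static struct model_pool model_pools[MODEL_NR_NODES] = { { 0 }, { 1 } };
static struct model_pwq  model_pwqs[MODEL_NR_NODES] = {
	{ &model_pools[0] }, { &model_pools[1] }
};

/* Assumed cpu -> node map in which CPU 3 was offlined and lost its node. */
static int model_cpu_node[MODEL_NR_CPUS] = { 0, 0, 1, MODEL_NO_NODE };

static int model_cpu_to_node(int cpu)
{
	return model_cpu_node[cpu];
}

/* Model lookup: there is no pwq for an invalid node. */
static struct model_pwq *model_pwq_by_node(int node)
{
	if (node < 0 || node >= MODEL_NR_NODES)
		return NULL;
	return &model_pwqs[node];
}

/*
 * Returns the pool id the work would be queued on, or -1 when the CPU
 * no longer maps to a node. Without the NULL check, the pwq->pool
 * dereference below is the "BOOM" from the quoted analysis.
 */
static int model_queue_work(int cpu)
{
	struct model_pwq *pwq = model_pwq_by_node(model_cpu_to_node(cpu));

	if (!pwq)
		return -1;
	return pwq->pool->id;
}
```

Calling model_queue_work() with a CPU that still has a node returns that node's pool id, while the offlined CPU in this model returns -1 instead of dereferencing NULL. The guard only illustrates where the dereference goes wrong; it is not a proposed fix for the thread's question about keeping the mapping stable or pinning the timer.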