On Sat, Dec 26, 2020 at 10:51:08AM +0800, Lai Jiangshan wrote:
> From: Lai Jiangshan <[email protected]>
> 
> 06249738a41a ("workqueue: Manually break affinity on hotplug")
> said that scheduler will not force break affinity for us.

So I've been looking at this the past day or so, and the more I look,
the more I think commit:

  1cf12e08bc4d ("sched/hotplug: Consolidate task migration on CPU unplug")

is a real problem and we need to revert it (at least for now).

Let me attempt a brain dump:

 - the assumption that per-cpu kernel threads are 'well behaved' on
   hot-plug has, I think, been proven incorrect; it's far worse than
   just bound workqueues. Therefore, it makes sense to provide the old
   semantics.

 - making the current code provide the old semantics (force-breaking
   affinity on per-cpu kernel threads) is tricky, but could probably
   be done:

    * we need to disallow new per-cpu kthreads while going down
    * we need to force push more aggressively; basically when
      rcuwait_active(rq->hotplug_wait) push everything except that task,
      irrespective of is_per_cpu_kthread()
    * we need to disallow wakeups of anything other than the hotplug
      thread or stop-machine once we are in the rcuwait_wait_event()

   and I have patches for most of that... except they're adding more
   complexity than 1cf12e08bc4d ever deleted.
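
To make the three rules above concrete, here is a rough user-space
sketch in C. It is only a toy model under stated assumptions: struct
toy_task, the toy_*() helpers and the boolean flags are made up for
illustration and are not kernel APIs. The real changes would sit in the
scheduler's push and wakeup paths and would be considerably hairier.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a task; not the kernel's task_struct. */
struct toy_task {
	bool per_cpu_kthread;	/* what is_per_cpu_kthread() would report */
	bool hotplug_waiter;	/* the task sitting in rcuwait_wait_event() */
	bool stopper;		/* the stop-machine thread for this CPU */
};

/*
 * Push rule: once the hotplug waiter is active, push everything except
 * the waiter itself. per_cpu_kthread is deliberately not consulted.
 */
static bool toy_should_push(bool waiter_active, const struct toy_task *p)
{
	return waiter_active && !p->hotplug_waiter;
}

/*
 * Wakeup rule: while the waiter is active, only the hotplug thread and
 * the stopper may be woken on the outgoing CPU.
 */
static bool toy_wakeup_allowed(bool waiter_active, const struct toy_task *p)
{
	return !waiter_active || p->hotplug_waiter || p->stopper;
}

/* Creation rule: new per-cpu kthreads are refused while going down. */
static bool toy_may_create_per_cpu_kthread(bool cpu_going_down)
{
	return !cpu_going_down;
}

/* Self-check of the three rules above. */
static void toy_selfcheck(void)
{
	struct toy_task kworker = { .per_cpu_kthread = true };
	struct toy_task waiter = { .per_cpu_kthread = true, .hotplug_waiter = true };
	struct toy_task stopper = { .per_cpu_kthread = true, .stopper = true };

	assert(toy_should_push(true, &kworker));	/* pushed despite being per-cpu */
	assert(!toy_should_push(true, &waiter));	/* the waiter stays put */
	assert(!toy_should_push(false, &kworker));	/* nothing happens before the wait */
	assert(!toy_wakeup_allowed(true, &kworker));
	assert(toy_wakeup_allowed(true, &waiter));
	assert(toy_wakeup_allowed(true, &stopper));
	assert(!toy_may_create_per_cpu_kthread(true));
}
```

Taken literally, the push rule here also covers the stopper; whether it
should be special-cased is not settled by the list above, so the sketch
leaves it alone.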

However, even with all that, there's a further problem...

Fundamentally, waiting for !rq_has_pinned_tasks() so late in
hot-un-plug is wrong, I think. It means that migrate_disable() code
might encounter a mostly torn down CPU. This is OK-ish for per-cpu
kernel threads [*], but is now exposed to any random odd kernel code
that does migrate_disable().

[*] arguably running 'work' this late is similarly problematic.

Let me go do lunch and ponder this further..
