On Thu, Sep 17 2020 at 16:24, peterz wrote:
> On Thu, Sep 17, 2020 at 11:42:11AM +0200, Thomas Gleixner wrote:
>
>> +static inline void update_nr_migratory(struct task_struct *p, long delta)
>> +{
>> +	if (p->nr_cpus_allowed > 1 && p->sched_class->update_migratory)
>> +		p->sched_class->update_migratory(p, delta);
>> +}
>
> Right, so as you know, I totally hate this thing :-) It adds a second
> (and radically different) version of changing affinity. I'm working on a
> version that uses the normal *set_cpus_allowed*() interface.
Tried that back and forth and either ended up in locking hell or with
race conditions of sorts, but my scheduler foo is rusty.

>> +static inline void sched_migration_ctrl(struct task_struct *prev, int cpu)
>> +{
>> +	if (!prev->migration_ctrl.disable_cnt ||
>> +	    prev->cpus_ptr != &prev->cpus_mask)
>> +		return;
>> +
>> +	prev->cpus_ptr = cpumask_of(cpu);
>> +	update_nr_migratory(prev, -1);
>> +	prev->nr_cpus_allowed = 1;
>> +}
>
> So this thing is called from schedule(), with only rq->lock held, and
> that violates the locking rules for changing the affinity.
>
> I have a comment that explains how it's broken and why it's sort-of
> working.

Yeah :(

>> +void migrate_disable(void)
>> +{
>> +	unsigned long flags;
>> +
>> +	if (!current->migration_ctrl.disable_cnt) {
>> +		raw_spin_lock_irqsave(&current->pi_lock, flags);
>> +		current->migration_ctrl.disable_cnt++;
>> +		raw_spin_unlock_irqrestore(&current->pi_lock, flags);
>> +	} else {
>> +		current->migration_ctrl.disable_cnt++;
>> +	}
>> +}
>
> That pi_lock seems unfortunate, and it isn't obvious what the point of
> it is.

Indeed. That obviously lacks a big fat comment.

current->migration_ctrl.disable_cnt++ is obviously a RMW operation. So
you end up with the following:

CPU0                                    CPU1

migrate_disable()
  R = current->migration_ctrl.disable_cnt;
                                        set_cpus_allowed_ptr()
                                          task_rq_lock();
                                          if (!p->migration_ctrl.disable_cnt) {
  current->migration_ctrl.disable_cnt = R + 1;
                                            stop_one_cpu();
                                        ---> stopper_thread()
                                               BUG_ON(task->migration_ctrl.disable_cnt);

I tried to back out from that instead of BUG(), but that ended up being
even more of a hacky trainwreck than just biting the bullet and taking
pi_lock.

> So, what I'm missing with all this are the design constraints for this
> trainwreck. Because the 'sane' solution was having migrate_disable()
> imply cpus_read_lock(). But that didn't fly because we can't have
> migrate_disable() / migrate_enable() schedule for raisins.

Yeah.
The original code had some magic

    if (preemptible())
        cpus_read_lock();
    else
        p->atomic_migrate_disable++;

but that caused another set of horrors with asymmetric code like the
below and stuff like try_lock().

> And if I'm not mistaken, the above migrate_enable() *does* require being
> able to schedule, and our favourite piece of futex:
>
>	raw_spin_lock_irq(&q.pi_state->pi_mutex.wait_lock);
>	spin_unlock(q.lock_ptr);
>
> is broken. Consider that spin_unlock() doing migrate_enable() with a
> pending sched_setaffinity().

Yes, we have the extra migrate_disable()/enable() pair around that.

The other way I solved that was to have a spin_[un]lock() variant which
does not have a migrate disable/enable. That works in that code because
there is no per CPUness requirement. Not pretty either...

Thanks,

        tglx