On 07/06/20 16:49, Valentin Schneider wrote: > > On 06/07/20 15:28, Qais Yousef wrote: > > CC: linux-fsde...@vger.kernel.org > > --- > > > > Peter > > > > I didn't do the > > > > read_lock(&taslist_lock); > > smp_mb__after_spinlock(); > > read_unlock(&tasklist_lock); > > > > dance you suggested on IRC as it didn't seem necessary. But maybe I missed > > something. > > > > So the annoying bit with just uclamp_fork() is that it happens *before* the > task is appended to the tasklist. This means without too much care we > would have (if we'd do a sync at uclamp_fork()): > > CPU0 (sysctl write) CPU1 (concurrent forker) > > copy_process() > uclamp_fork() > p.uclamp_min = > state > state = foo > > for_each_process_thread(p, t) > update_state(t); > list_add(p) > > i.e. that newly forked process would entirely sidestep the update. Now, > with Peter's suggested approach we can be in a much better situation. If we > have this in the sysctl update: > > state = foo; > > read_lock(&taslist_lock); > smp_mb__after_spinlock(); > read_unlock(&tasklist_lock); > > for_each_process_thread(p, t) > update_state(t); > > While having this in the fork: > > write_lock(&tasklist_lock); > list_add(p); > write_unlock(&tasklist_lock); > > sched_post_fork(p); // state re-read here; probably wants an mb first > > Then we can no longer miss an update. If the forked p doesn't see the new > value, it *must* have been added to the tasklist before the updater loops > over it, so the loop will catch it. If it sees the new value, we're done.
uclamp_fork() has nothing to do with the race. If copy_process() duplicates the task_struct of an RT task, it'll copy the old value. I'd expect the newly introduced sched_post_fork() (also in copy_process() after the list update) to prevent this race altogether. Now we could end up with a problem if for_each_process_thread() doesn't see the newly forked task _after_ sched_post_fork(). Hence my question to Peter. > > AIUI, the above strategy doesn't require any use of RCU. The update_state() > and sched_post_fork() can race, but as per the above they should both be > writing the same value. for_each_process_thread() must be protected by either tasklist_lock or rcu_read_lock(). The other RCU logic I added is not to protect against the race above. I describe the other race condition in a comment. Basically another updater on a different cpu via fork() and sched_setattr() might read an old value and get preempted. The rcu synchronization will ensure concurrent updaters have finished before iterating the list. Thanks -- Qais Yousef