On 03/07/20 14:32, Peter Zijlstra wrote:
> Dave hit the problem fixed by commit:
>
>   b6e13e85829f ("sched/core: Fix ttwu() race")
>
> and failed to understand much of the code involved. Per his request a
> few comments to (hopefully) clarify things.
>
> Requested-by: Dave Chinner <[email protected]>
> Signed-off-by: Peter Zijlstra (Intel) <[email protected]>

Small extra comment below, otherwise FWIW:

Reviewed-by: Valentin Schneider <[email protected]>

> +++ b/kernel/sched/core.c
> @@ -79,6 +79,100 @@ __read_mostly int scheduler_running;
>   */
>  int sysctl_sched_rt_runtime = 950000;
>
> +
> +/*
> + * Serialization rules:
> + *
> + * Lock order:
> + *
> + *   p->pi_lock
> + *     rq->lock
> + *       hrtimer_cpu_base->lock (hrtimer_start() for bandwidth controls)
> + *
> + *  rq1->lock
> + *    rq2->lock  where: rq1 < rq2
> + *
> + * Regular state:
> + *
> + * Normal scheduling state is serialized by rq->lock. __schedule() takes the
> + * local CPU's rq->lock, it optionally removes the task from the runqueue and
> + * always looks at the local rq data structures to find the most elegible 
> task
> + * to run next.
> + *
> + * Task enqueue is also under rq->lock, possibly taken from another CPU.
> + * Wakeups from another LLC domain might use an IPI to transfer the enqueue 
> to
> + * the local CPU to avoid bouncing the runqueue state around [ see
> + * ttwu_queue_wakelist() ]
> + *
> + * Task wakeup, specifically wakeups that involve migration, are horribly
> + * complicated to avoid having to take two rq->locks.
> + *
> + * Special state:
> + *
> + * System-calls and anything external will use task_rq_lock() which acquires
> + * both p->pi_lock and rq->lock. As a consequence the state they change is
> + * stable while holding either lock:
> + *
> + *  - sched_setaffinity()/
> + *    set_cpus_allowed_ptr():        p->cpus_ptr, p->nr_cpus_allowed
> + *  - set_user_nice():               p->se.load, p->*prio
> + *  - __sched_setscheduler():        p->sched_class, p->policy, p->*prio,
> + *                           p->se.load, p->rt_priority,
> + *                           p->dl.dl_{runtime, deadline, period, flags, bw, 
> density}

Uclamp stuff can also get updated in __sched_setscheduler(); see
__setscheduler_uclamp(). It's only p->uclamp_req AFAICT, but I don't think
there's harm in just saying p->uclamp*.

> + *  - sched_setnuma():               p->numa_preferred_nid
> + *  - sched_move_task()/
> + *    cpu_cgroup_fork():     p->sched_task_group
> + *  - uclamp_update_active() p->uclamp*
> + *

Reply via email to