On 13-Apr 12:47, Patrick Bellasi wrote:
> On 13-Apr 13:36, Peter Zijlstra wrote:
> > On Fri, Apr 13, 2018 at 12:15:10PM +0100, Patrick Bellasi wrote:
> > > On 13-Apr 10:43, Peter Zijlstra wrote:
> > > > On Mon, Apr 09, 2018 at 05:56:09PM +0100, Patrick Bellasi wrote:
> > > > > +static inline void uclamp_task_update(struct rq *rq, struct task_struct *p)
> > > > > +{
> > > > > +	int cpu = cpu_of(rq);
> > > > > +	int clamp_id;
> > > > > +
> > > > > +	/* The idle task does not affect CPU's clamps */
> > > > > +	if (unlikely(p->sched_class == &idle_sched_class))
> > > > > +		return;
> > > > > +	/* DEADLINE tasks do not affect CPU's clamps */
> > > > > +	if (unlikely(p->sched_class == &dl_sched_class))
> > > > > +		return;
> > > > > +
> > > > > +	for (clamp_id = 0; clamp_id < UCLAMP_CNT; ++clamp_id) {
> > > > > +		if (uclamp_task_affects(p, clamp_id))
> > > > > +			uclamp_cpu_put(p, cpu, clamp_id);
> > > > > +		else
> > > > > +			uclamp_cpu_get(p, cpu, clamp_id);
> > > > > +	}
> > > > > +}
> > > >
> > > > Is that uclamp_task_affects() thing there to fix up the fact you failed
> > > > to propagate the calling context (enqueue/dequeue) ?
> > >
> > > Not really, it's intended by design: we back annotate the clamp_group
> > > a task has been refcounted in.
> > >
> > > The uclamp_task_affects() tells if we are refcounted now and then we
> > > know from the back-annotation from which refcounter we need to remove
> > > the task.
> > >
> > > I found this solution much less racy and effective in avoiding to
> > > screw up the refcounter whenever we look at a task at either
> > > dequeue/migration time and these operations can overlaps with the
> > > slow-path. Meaning, when we change the task specific clamp_group
> > > either via syscall or cgroups attributes.
> > >
> > > IOW, the back annotation allows to decouple refcounting from
> > > clamp_group configuration in a lockless way.
> >
> > But it adds extra state and logic, to a fastpath, for no reason.
> >
> > I suspect you messed up the cgroup side; because the syscall should
> > already have done task_rq_lock() and hold both p->pi_lock and rq->lock
> > and have dequeued the task when changing the attribute.
>
> Yes, actually I'm using task_rq_lock() from the cgroup callback to
> update each task already queued. And I do the same from the
> sched_setattr syscall...
>
> > It is actually really hard to make the syscall do it wrong.
>
> ... thus, I'll look better into this.
>
> Not sure now if there was some other corner-case.
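
To make the back-annotation quoted above a bit more concrete, here is a
rough userspace sketch of the idea. It is not the code from the patch:
all the names (struct cpu_clamp, refcounted_in, clamp_get(), clamp_put(),
task_update(), ...) are made up for illustration, and it models a single
CPU and a single clamp index. It only shows the put side releasing the
annotated group rather than the currently configured one.

```c
#include <assert.h>

/* Userspace sketch only: every name below is hypothetical. */
#define NR_GROUPS	4
#define NOT_REFCOUNTED	(-1)

struct cpu_clamp {
	int tasks[NR_GROUPS];	/* per-group count of refcounted tasks */
};

struct task {
	int group_id;		/* configured group, the slow path may change it */
	int refcounted_in;	/* back annotation: group we hold a ref on */
};

static void task_init(struct task *p, int group_id)
{
	p->group_id = group_id;
	p->refcounted_in = NOT_REFCOUNTED;
}

/* Same role as uclamp_task_affects(): are we refcounted right now? */
static int task_affects(const struct task *p)
{
	return p->refcounted_in != NOT_REFCOUNTED;
}

static void clamp_get(struct cpu_clamp *cc, struct task *p)
{
	cc->tasks[p->group_id]++;
	p->refcounted_in = p->group_id;	/* remember where the ref lives */
}

static void clamp_put(struct cpu_clamp *cc, struct task *p)
{
	/* Release the annotated group, not the configured p->group_id */
	int group_id = p->refcounted_in;

	assert(cc->tasks[group_id] > 0);
	cc->tasks[group_id]--;
	p->refcounted_in = NOT_REFCOUNTED;
}

/* Mirrors the toggle in uclamp_task_update(): no calling context needed */
static void task_update(struct cpu_clamp *cc, struct task *p)
{
	if (task_affects(p))
		clamp_put(cc, p);
	else
		clamp_get(cc, p);
}
```

In this model, if group_id changes while the task is still refcounted,
the next task_update() still decrements the counter of the group that
was actually taken, so the counters stay balanced. Whether that
situation can happen at all with proper locking is the open question
above.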

Actually, I've just remembered another use-case for that back-annotation.

That's used when we have cgroups and the per-task API asserting two
different clamp values. For example, a task in a TG with max_clamp=50
is further clamped with a task specific max_clamp=10.

The back annotation tracks the group_id in which we are refcounted
right now, which is the task specific group in the previous example.

-- 
#include <best/regards.h>

Patrick Bellasi