On Mon, Jul 23, 2018 at 01:49:46PM +0100, Patrick Bellasi wrote:
> On 23-Jul 11:49, Peter Zijlstra wrote:
> 
> [...]
> 
> > > -void __getparam_dl(struct task_struct *p, struct sched_attr *attr)
> > > +void __getparam_dl(struct task_struct *p, struct sched_attr *attr,
> > > +            unsigned int flags)
> > >  {
> > >   struct sched_dl_entity *dl_se = &p->dl;
> > >  
> > >   attr->sched_priority = p->rt_priority;
> > > - attr->sched_runtime = dl_se->dl_runtime;
> > > - attr->sched_deadline = dl_se->dl_deadline;
> > > +
> > > + if (flags & SCHED_GETATTR_FLAGS_DL_ABSOLUTE) {
> > > +         /*
> > > +          * If the task is not running, its runtime is already
> > > +          * properly accounted. Otherwise, update clocks and the
> > > +          * statistics for the task.
> > > +          */
> > > +         if (task_running(task_rq(p), p)) {
> > > +                 struct rq_flags rf;
> > > +                 struct rq *rq;
> > > +
> > > +                 rq = task_rq_lock(p, &rf);
> > > +                 sched_clock_tick();
> > > +                 update_rq_clock(rq);
> > > +                 task_tick_dl(rq, p, 0);
> > 
> > Do we really want task_tick_dl() here, or update_curr_dl()?
> 
> I think this was to cover the case of a syscall being called while the
> task is running and we are midway between two ticks...

Sure, I know what it's there for, just saying that update_curr_dl()
would've updated the accounting as well. Calling tick stuff from !tick
context is a wee bit dodgy.
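
Something like the below is roughly what I mean; completely untested,
and it assumes we stay in kernel/sched/deadline.c where the static
update_curr_dl() is visible:

	if (task_running(task_rq(p), p)) {
		struct rq_flags rf;
		struct rq *rq;

		rq = task_rq_lock(p, &rf);
		update_rq_clock(rq);
		/* Only current's dl_se needs freshening; sleepers are up to date. */
		if (task_current(rq, p))
			update_curr_dl(rq);
		task_rq_unlock(rq, p, &rf);
	}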

> > Also, who says the task still is dl ? :-)
> 
> Good point, but what should be the rule in general for these cases?
> 
> We already have:
> 
>    SYSCALL_DEFINE4(sched_getattr())
>        ....
>        if (task_has_dl_policy(p))
>             __getparam_dl(p, &attr);
> 
> which is also potentially racy, isn't it?

Yes, but only in so far as the whole syscall is racy by definition. Even
if we were to lock the rq and get absolutely accurate values, everything
can change the moment we release the locks and return to userspace
again.

> Or just make the syscall return the most updated metrics for all the
> scheduling classes, since we cannot guarantee the user anything about
> what the task will be once we return to userspace?

This.

> > > +                 task_rq_unlock(rq, p, &rf);
> > > +         }
> > > +
> > > +         /*
> > > +          * If the task is throttled, this value could be negative,
> > > +          * but sched_runtime is unsigned.
> > > +          */
> > > +         attr->sched_runtime = dl_se->runtime <= 0 ? 0 : dl_se->runtime;
> > > +         attr->sched_deadline = dl_se->deadline;
> > 
> > This is all very racy..
> > 
> > Even if the task wasn't running when you did the task_running() test, it
> > could be running now. And if it was running, it might not be running
> > anymore by the time you've acquired the rq->lock.
> 
> Which means we should use something like:
> 
>    if (flags & SCHED_GETATTR_FLAGS_DL_ABSOLUTE) {
>         /* Lock the task and the RQ before any other check and update */
>         rq = task_rq_lock(p, &rf);
> 
>         /* Check the task is still DL? */
> 
>         /* Update task stats */
> 
>         task_rq_unlock(rq, p, &rf);
>    }
> 
> right?

Yeah, something along those lines.
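
FWIW, filling in that skeleton (untested, declarations omitted as in
your snippet, just to make the ordering concrete; update_curr_dl()
again assumes we're in deadline.c):

	/* Static parameters as the default / fallback. */
	attr->sched_runtime = dl_se->dl_runtime;
	attr->sched_deadline = dl_se->dl_deadline;

	if (flags & SCHED_GETATTR_FLAGS_DL_ABSOLUTE) {
		rq = task_rq_lock(p, &rf);

		/* Re-check under the lock; p may have left SCHED_DEADLINE. */
		if (task_has_dl_policy(p)) {
			if (task_current(rq, p)) {
				update_rq_clock(rq);
				update_curr_dl(rq);
			}
			/* Throttled tasks can have a negative runtime. */
			attr->sched_runtime =
				dl_se->runtime <= 0 ? 0 : dl_se->runtime;
			attr->sched_deadline = dl_se->deadline;
		}

		task_rq_unlock(rq, p, &rf);
	}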

> If that's better, then perhaps it would be even better to move the
> task_rq_lock to the beginning of SYSCALL_DEFINE4(sched_getattr())?

Hurm.. yes, we should probably have the has_dl_policy test under the
lock too. Which is really annoying, because this basically turns a
lockless syscall into a locked one.

Another method would be to have __getparam_dl() 'fail' and retry if it
finds !has_dl_policy() once we have the lock. That would retain the
lockless nature for all current use-cases and only incur the locking
overhead for this new case.
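
Something along these lines, changing the return type so the caller can
tell (again completely untested; the int return and -EAGAIN are only
there to illustrate the idea):

	int __getparam_dl(struct task_struct *p, struct sched_attr *attr,
			  unsigned int flags)
	{
		struct sched_dl_entity *dl_se = &p->dl;
		struct rq_flags rf;
		struct rq *rq;

		attr->sched_priority = p->rt_priority;

		if (!(flags & SCHED_GETATTR_FLAGS_DL_ABSOLUTE)) {
			/* Lockless path, unchanged for existing users. */
			attr->sched_runtime = dl_se->dl_runtime;
			attr->sched_deadline = dl_se->dl_deadline;
			return 0;
		}

		rq = task_rq_lock(p, &rf);
		if (!task_has_dl_policy(p)) {
			/* Raced with a class change; have the caller retry. */
			task_rq_unlock(rq, p, &rf);
			return -EAGAIN;
		}

		if (task_current(rq, p)) {
			update_rq_clock(rq);
			update_curr_dl(rq);
		}

		/* Throttled tasks can have a negative runtime. */
		attr->sched_runtime = dl_se->runtime <= 0 ? 0 : dl_se->runtime;
		attr->sched_deadline = dl_se->deadline;

		task_rq_unlock(rq, p, &rf);
		return 0;
	}

The task_has_dl_policy() test in sched_getattr() can then stay lockless,
with the caller simply redoing the policy check (or taking the !dl path)
when __getparam_dl() fails.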
