Hi, On 12/02/16 18:10, Steven Rostedt wrote: > I'm writing a test case for SCHED_DEADLINE, and notice a strange > anomaly. Every so often, a deadline is missed and when I looked into > it, it happened because the sched_yield() had no effect (it didn't end > the previous period and let the start of the next runtime happen on the > end of the old period). > > deadline-2228 7...1 116.778420: sys_enter_sched_yield: > deadline-2228 7d..3 116.778421: hrtimer_cancel: > hrtimer=0xffff88011ebd79a0 > deadline-2228 7d..2 116.778422: rcu_utilization: Start context > switch > deadline-2228 7d..2 116.778423: rcu_utilization: End context switch > deadline-2228 7d..4 116.778423: hrtimer_start: > hrtimer=0xffff88011ebd79a0 function=hrtick/0x0 expires=116124420428 > softexpires=116124420428 > deadline-2228 7...1 116.778425: sys_exit_sched_yield: 0x0 > > > Schedule was never called. A added some trace_printks() and discovered > that this happens when sched_yield() is called right after a tick that > updates its current bandwidth. > > When the schedule tick happens that updates the current bandwidth, > update_curr_dl() is called, where it updates curr->se.exec_start to > rq_clock_task(rq). > > The rq_clock_task(rq) gets updated by update_rq_clock_task() that gets > update by various points in the scheduler. > > Now, if the user task calls sched_yield() just after a bandwidth update > synced curr->se.exec_start to rq_clock_task(rq), when sched_yield() > calls into update_curr_dl() we have: > > delta_exec = rq_clock_task(rq) - curr->se.exec_start; > if (unlikely((s64)delta_exec <= 0)) > return; > > Coming in here from a sched_yield() will have delta_exec == 0 if the > sched_yield() was called after a DL tick and before another > update_rq_clock_task() is called. > > This means that the task will not release its remaining runtime, and > the will start off in the current period when it expected to be in the > next period. > > The fix that appears to work for me is to add a test in > update_curr_dl() to not exit if delta_exec is zero and > dl_se->dl_yielded is true. > > Signed-off-by: Steven Rostedt <rost...@goodmis.org> > --- > diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c > index cd64c979d0e1..1dd180cda574 100644 > --- a/kernel/sched/deadline.c > +++ b/kernel/sched/deadline.c > @@ -735,7 +735,7 @@ static void update_curr_dl(struct rq *rq) > * approach need further study. > */ > delta_exec = rq_clock_task(rq) - curr->se.exec_start; > - if (unlikely((s64)delta_exec <= 0)) > + if (unlikely((s64)delta_exec <= 0 && !dl_se->dl_yielded)) > return; >
This looks good to me. Do you think we could also skip some of the following updates/accounting in this case? Not sure we win anything by doing that, though. Thanks, - Juri