Hi,

On 12/02/16 18:10, Steven Rostedt wrote:
> I'm writing a test case for SCHED_DEADLINE, and notice a strange
> anomaly. Every so often, a deadline is missed and when I looked into
> it, it happened because the sched_yield() had no effect (it didn't end
> the previous period and let the start of the next runtime happen on the
> end of the old period).
> 
> deadline-2228    7...1   116.778420: sys_enter_sched_yield: 
> deadline-2228    7d..3   116.778421: hrtimer_cancel:       
> hrtimer=0xffff88011ebd79a0
> deadline-2228    7d..2   116.778422: rcu_utilization:      Start context 
> switch
> deadline-2228    7d..2   116.778423: rcu_utilization:      End context switch
> deadline-2228    7d..4   116.778423: hrtimer_start:        
> hrtimer=0xffff88011ebd79a0 function=hrtick/0x0 expires=116124420428 
> softexpires=116124420428
> deadline-2228    7...1   116.778425: sys_exit_sched_yield: 0x0
> 
> 
> Schedule was never called. A added some trace_printks() and discovered
> that this happens when sched_yield() is called right after a tick that
> updates its current bandwidth.
> 
> When the schedule tick happens that updates the current bandwidth,
> update_curr_dl() is called, where it updates curr->se.exec_start to
> rq_clock_task(rq).
> 
> The rq_clock_task(rq) gets updated by update_rq_clock_task() that gets
> update by various points in the scheduler.
> 
> Now, if the user task calls sched_yield() just after a bandwidth update
> synced curr->se.exec_start to rq_clock_task(rq), when sched_yield()
> calls into update_curr_dl() we have:
> 
>       delta_exec = rq_clock_task(rq) - curr->se.exec_start;
>       if (unlikely((s64)delta_exec <= 0))
>               return;
> 
> Coming in here from a sched_yield() will have delta_exec == 0 if the
> sched_yield() was called after a DL tick and before another
> update_rq_clock_task() is called.
> 
> This means that the task will not release its remaining runtime, and
> the will start off in the current period when it expected to be in the
> next period.
> 
> The fix that appears to work for me is to add a test in
> update_curr_dl() to not exit if delta_exec is zero and
> dl_se->dl_yielded is true.
> 
> Signed-off-by: Steven Rostedt <rost...@goodmis.org>
> ---
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index cd64c979d0e1..1dd180cda574 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -735,7 +735,7 @@ static void update_curr_dl(struct rq *rq)
>        * approach need further study.
>        */
>       delta_exec = rq_clock_task(rq) - curr->se.exec_start;
> -     if (unlikely((s64)delta_exec <= 0))
> +     if (unlikely((s64)delta_exec <= 0 && !dl_se->dl_yielded))
>               return;
>

This looks good to me. Do you think we could also skip some of the
following updates/accounting in this case? Not sure we win anything by
doing that, though.

Thanks,

- Juri

Reply via email to