On 30 June 2017 at 15:58, Vincent Guittot <vincent.guit...@linaro.org> wrote:
> The running state is a subset of runnable state which means that running
> can't be set if runnable (weight) is cleared. There are corner cases
> where the current sched_entity has been already dequeued but cfs_rq->curr
> has not been updated yet and still points to the dequeued sched_entity.
> If ___update_load_avg is called at that time, weight will be 0 and running
> will be set which is not possible.
>
> This case happens during pick_next_task_fair() when a cfs_rq becomes idle.
> The current sched_entity has been dequeued so se->on_rq is cleared and
> cfs_rq->weight is null. But cfs_rq->curr still points to se (it will be
> cleared when picking the idle thread). Because the cfs_rq becomes idle,
> idle_balance() is called and ends up calling update_blocked_averages()
> with these wrong running and runnable states.
>
> Add a test in ___update_load_avg to correct the running state in this case.
>
> Signed-off-by: Vincent Guittot <vincent.guit...@linaro.org>

The v1 is just wrong. I have sent a v2 with the correct patch.
Sorry for the disturbance.

Vincent

> ---
>  kernel/sched/fair.c | 18 ++++++++++++++++++
>  1 file changed, 18 insertions(+)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 008c514..5fdcb42 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2968,6 +2968,24 @@ ___update_load_avg(u64 now, int cpu, struct sched_avg *sa,
>  	sa->last_update_time += delta << 10;
>
>  	/*
> +	 * running is a subset of runnable (weight) so running can't be set if
> +	 * runnable is clear. But there are some corner cases where the current
> +	 * se has been already dequeued but cfs_rq->curr still points to it.
> +	 * This means that weight will be 0 but not running for a sched_entity
> +	 * but also for a cfs_rq if the latter becomes idle. As an example,
> +	 * this happens during idle_balance() which calls
> +	 * update_blocked_averages()
> +	 */
> +	if (weight)
> +		running = 1;
> +
> +	/*
> +	 * Scale time to reflect the amount a computation effectively done
> +	 * during the time slot at current capacity
> +	 */
> +	delta = scale_time(delta, cpu, sa, weight, running);
> +
> +	/*
>  	 * Now we know we crossed measurement unit boundaries. The *_avg
>  	 * accrues by two steps:
>  	 *
> --
> 2.7.4
>
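
For readers following the thread: the commit message describes clearing the
running state when weight is 0, while the v1 hunk quoted above sets running
when weight is non-zero. The v2 patch is not quoted in this mail, so the
snippet below is only a sketch of the check as the commit message words it,
not the actual fix. sanitize_running() is a hypothetical helper name used
for illustration; it does not exist in kernel/sched/fair.c.

```c
/*
 * Illustrative sketch only, not kernel code.
 *
 * Invariant from the commit message: running is a subset of runnable,
 * so running must not be set when weight (runnable) is 0.
 *
 * sanitize_running() is a hypothetical name chosen for this example.
 */
int sanitize_running(unsigned long weight, int running)
{
	/*
	 * A dequeued sched_entity, or a cfs_rq that just became idle,
	 * has weight == 0 even though cfs_rq->curr may still point to it.
	 * In that case the running state is stale and is dropped.
	 */
	if (!weight)
		return 0;

	/* Otherwise the caller's running state is kept unchanged. */
	return running;
}
```

With this helper, a stale state such as weight == 0 and running == 1 is
turned into running == 0, while states with a non-zero weight pass through
untouched.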