Hi Gabriel,

On Sat, 16 Feb 2019 at 00:06, Gabriel Hartmann <gabriel.hartm...@gmail.com> wrote:
>
> Hi Vincent,
>
> Apologies for the slow turn around on this. We have tried both approaches to
> fixing the bug now. In both cases for a particularly long duration CPU
> intensive workload we are seeing ~33% slowdown.

This was somewhat expected: with my patch the unused cfs_rq are no longer
removed from the list, but at least the list is correctly ordered.
The official version of this patch is here:
https://lkml.org/lkml/2019/2/4/121

Since then, more patches have been queued that remove unused cfs_rq while
keeping the list correctly ordered:
https://lkml.org/lkml/2019/2/6/499

With these 3 patches, the slowdown should disappear and the list ordering
will stay correct.

Regards,
Vincent

>
> -- Gabriel
>
> On Fri, Jan 25, 2019 at 6:31 AM Vincent Guittot <vincent.guit...@linaro.org>
> wrote:
>>
>> Hi Sargun,
>>
>> On Mon, 21 Jan 2019 at 15:46, Vincent Guittot
>> <vincent.guit...@linaro.org> wrote:
>> >
>> > Hi Sargun,
>> >
>> > On Friday 18 Jan 2019 at 15:06:28 (+0100), Vincent Guittot wrote:
>> > > On Fri, 18 Jan 2019 at 11:16, Vincent Guittot
>> > > <vincent.guit...@linaro.org> wrote:
>> > > >
>> > > > On Wed, 9 Jan 2019 at 23:43, Sargun Dhillon <sar...@sargun.me> wrote:
>> > > > >
>> > > > > On Wed, Jan 9, 2019 at 2:14 PM Sargun Dhillon <sar...@sargun.me>
>> > > > > wrote:
>> > > > > >
>> > > > > > I picked up c40f7d74c741a907cfaeb73a7697081881c497d0 sched/fair:
>> > > > > > Fix infinite loop in update_blocked_averages() by reverting
>> > > > > > a9e7f6544b9c and put it on top of 4.19.13. In addition to this,
>> > > > > > I uninlined list_add_leaf_cfs_rq for debugging.
>> > >
>> > > With the fix above applied, the code that manages the leaf_cfs_rq_list
>> > > has been the same since v4.9.
>> > > Have you noticed a similar problem on other, older kernel versions
>> > > between v4.9 and v4.19? The problem might have been introduced while
>> > > modifying another part of the scheduler, like the sequence for
>> > > adding/removing cgroup.
>> > >
>> > > Knowing the most recent kernel version without the problem could help
>> > > to narrow down the problem.
>> > >
>> > > Thanks,
>> > > Vincent
>> > >
>> > > > > >
>> > > > > > This revealed a new bug that we didn't get to because we kept
>> > > > > > getting crashes from the previous issue. When we are running
>> > > > > > with cgroups that are rapidly changing, with CFS bandwidth
>> > > > > > control, and in addition using the cpusets cgroup, we see this
>> > > > > > crash. Specifically, it seems to occur with cgroups that are
>> > > > > > throttled and we change the allowed cpuset.
>> > > >
>> > > > Thanks for the context, I will try to reproduce the problem and
>> > > > understand how we can stop in the middle of walking the
>> > > > sched_entity branch with a parent not already added.
>> > > >
>> > > > How many cgroup levels do you have in your setup?
>> > > >
>> > > > > >
>> > > > >
>> > > > > This patch from Gabriel should fix the problem:
>> > > > >
>> > > > >
>> > > > > [PATCH] sched/fair: Reset tmp_alone_branch on cfs_rq delete
>> > > > >
>> > > > > When a child cfs_rq is added to the leaf cfs_rq list before its
>> > > > > parent, tmp_alone_branch is set to point to the child in
>> > > > > preparation for the parent being added.
>> > > > >
>> > > > > If the child is deleted before the parent is added then
>> > > > > tmp_alone_branch points to a freed cfs_rq. Any future reference
>> > > > > to tmp_alone_branch will result in a use after free.
>> > > >
>> > > > So, the patch below is a temporary fix that helps to recover from the
>> > > > situation where tmp_alone_branch doesn't end up pointing back to
>> > > > rq->leaf_cfs_rq_list.
>> > > > But this situation should not happen in the first place.
>> >
>> > I have been able to reproduce the situation where tmp_alone_branch doesn't
>> > point to rq->leaf_cfs_rq_list after enqueuing a task.
>> >
>> > Can you try the patch below, which ensures all cfs_rq of a cgroup branch
>> > will be added in the list even if throttled?
>>
>> Did you get a chance to test this patch?
>>
>> Regards,
>> Vincent
>>
>> >
>> > The algorithm used to order cfs_rq in rq->leaf_cfs_rq_list assumes that
>> > it will walk down to root the 1st time a cfs_rq is used and we will
>> > finish by adding either a cfs_rq without parent or a cfs_rq with a parent
>> > that is already on the list. But this is not always true in presence of
>> > throttling. Because a cfs_rq can be throttled even if it has never been
>> > used but other CPUs of the cgroup have already used all the bandwidth, we
>> > are not sure to go down to the root and add all cfs_rq in the list.
>> >
>> > Ensure that all cfs_rq will be added in the list even if they are
>> > throttled.
>> >
>> > Signed-off-by: Vincent Guittot <vincent.guit...@linaro.org>
>> > ---
>> >  kernel/sched/fair.c | 17 +++++++++++++++++
>> >  1 file changed, 17 insertions(+)
>> >
>> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> > index 6483834..ae468ab 100644
>> > --- a/kernel/sched/fair.c
>> > +++ b/kernel/sched/fair.c
>> > @@ -352,6 +352,20 @@ static inline void list_del_leaf_cfs_rq(struct cfs_rq *cfs_rq)
>> >  	}
>> >  }
>> >
>> > +static inline void list_add_branch_cfs_rq(struct sched_entity *se, struct rq *rq)
>> > +{
>> > +	struct cfs_rq *cfs_rq;
>> > +
>> > +	for_each_sched_entity(se) {
>> > +		cfs_rq = cfs_rq_of(se);
>> > +		list_add_leaf_cfs_rq(cfs_rq);
>> > +
>> > +		/* If parent is already in the list, we can stop */
>> > +		if (rq->tmp_alone_branch == &rq->leaf_cfs_rq_list)
>> > +			break;
>> > +	}
>> > +}
>> > +
>> >  /* Iterate through all leaf cfs_rq's on a runqueue: */
>> >  #define for_each_leaf_cfs_rq(rq, cfs_rq) \
>> >  	list_for_each_entry_rcu(cfs_rq, &rq->leaf_cfs_rq_list, leaf_cfs_rq_list)
>> > @@ -5177,6 +5191,9 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
>> >
>> >  	}
>> >
>> > +	/* Ensure that all cfs_rq have been added to the list */
>> > +	list_add_branch_cfs_rq(se, rq);
>> > +
>> >  	hrtick_update(rq);
>> >  }
>> >
>> >
>> >
>> > > >
>> > > >
>> > > > >
>> > > > > Signed-off-by: Gabriel Hartmann <gabriel.hartm...@gmail.com>
>> > > > > Reported-by: Sargun Dhillon <sar...@sargun.me>
>> > > > > ---
>> > > > >  kernel/sched/fair.c | 5 +++++
>> > > > >  1 file changed, 5 insertions(+)
>> > > > >
>> > > > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> > > > > index 7137bc343b4a..0987629cbb76 100644
>> > > > > --- a/kernel/sched/fair.c
>> > > > > +++ b/kernel/sched/fair.c
>> > > > > @@ -347,6 +347,11 @@ static inline void list_add_leaf_cfs_rq(struct cfs_rq *cfs_rq)
>> > > > >  static inline void list_del_leaf_cfs_rq(struct cfs_rq *cfs_rq)
>> > > > >  {
>> > > > >  	if (cfs_rq->on_list) {
>> > > > > +		struct rq *rq = rq_of(cfs_rq);
>> > > > > +
>> > > > > +		if (rq->tmp_alone_branch == &cfs_rq->leaf_cfs_rq_list)
>> > > > > +			rq->tmp_alone_branch = &rq->leaf_cfs_rq_list;
>> > > > > +
>> > > > >  		list_del_rcu(&cfs_rq->leaf_cfs_rq_list);
>> > > > >  		cfs_rq->on_list = 0;
>> > > > >  	}
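
For readers following the thread, the dangling-pointer scenario described in
Gabriel's commit message can be modelled in user space. The sketch below is
only an illustration under stated assumptions: every toy_* name is
hypothetical, the list helpers are simplified stand-ins for the kernel's
list_head API (no RCU, no locking), and it does not reproduce the real
list_add_leaf_cfs_rq() logic. It shows a child being added before its parent,
tmp_alone_branch pointing at the child, and the reset on delete bringing it
back to the list head.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void toy_list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert entry right after pos. */
static void toy_list_add(struct list_head *entry, struct list_head *pos)
{
	entry->next = pos->next;
	entry->prev = pos;
	pos->next->prev = entry;
	pos->next = entry;
}

static void toy_list_del(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	entry->next = NULL;
	entry->prev = NULL;
}

/* Hypothetical stand-ins for struct rq and struct cfs_rq. */
struct toy_rq {
	struct list_head leaf_cfs_rq_list;
	struct list_head *tmp_alone_branch;
};

struct toy_cfs_rq {
	struct list_head leaf_cfs_rq_list;
	struct toy_rq *rq;
	bool on_list;
};

static void toy_rq_init(struct toy_rq *rq)
{
	toy_list_init(&rq->leaf_cfs_rq_list);
	rq->tmp_alone_branch = &rq->leaf_cfs_rq_list;
}

/* Child added before its parent: it becomes the start of an open branch. */
static void toy_add_child_before_parent(struct toy_cfs_rq *cfs_rq)
{
	struct toy_rq *rq = cfs_rq->rq;

	toy_list_add(&cfs_rq->leaf_cfs_rq_list, rq->tmp_alone_branch);
	rq->tmp_alone_branch = &cfs_rq->leaf_cfs_rq_list;
	cfs_rq->on_list = true;
}

/* Deletion including the reset proposed in the quoted patch. */
static void toy_del_leaf_cfs_rq(struct toy_cfs_rq *cfs_rq)
{
	struct toy_rq *rq = cfs_rq->rq;

	if (!cfs_rq->on_list)
		return;

	/* Without this reset, tmp_alone_branch would keep pointing at the child. */
	if (rq->tmp_alone_branch == &cfs_rq->leaf_cfs_rq_list)
		rq->tmp_alone_branch = &rq->leaf_cfs_rq_list;

	toy_list_del(&cfs_rq->leaf_cfs_rq_list);
	cfs_rq->on_list = false;
}

/* Returns true if tmp_alone_branch is back on the list head after the delete. */
static bool toy_demo(void)
{
	struct toy_rq rq;
	struct toy_cfs_rq child = { .rq = &rq, .on_list = false };

	toy_rq_init(&rq);
	toy_add_child_before_parent(&child);
	assert(rq.tmp_alone_branch == &child.leaf_cfs_rq_list);

	toy_del_leaf_cfs_rq(&child);
	return rq.tmp_alone_branch == &rq.leaf_cfs_rq_list
		&& rq.leaf_cfs_rq_list.next == &rq.leaf_cfs_rq_list;
}
```

Calling toy_demo() from a small main() exercises the sequence. As noted
earlier in the thread, this reset only recovers from the bad state; the later
patch addresses why the branch is left open in the first place.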