Re: [PATCH] sched/fair: Fix enqueue_task_fair warning some more

2020-05-07 Thread Vincent Guittot
On Thu, 7 May 2020 at 20:04, Phil Auld  wrote:
>
> Hi Vincent,
>
> On Thu, May 07, 2020 at 05:06:29PM +0200 Vincent Guittot wrote:
> > Hi Phil,
> >
> > On Wed, 6 May 2020 at 20:05, Phil Auld  wrote:
> > >
> > > Hi Vincent,
> > >
> > > Thanks for taking a look. More below...
> > >
> > > On Wed, May 06, 2020 at 06:36:45PM +0200 Vincent Guittot wrote:
> > > > Hi Phil,
> > > >
> > > > - reply to all this time
> > > >
> > > > On Wed, 6 May 2020 at 16:18, Phil Auld  wrote:
> > > > >
> > > > > sched/fair: Fix enqueue_task_fair warning some more
> > > > >
> > > > > The recent patch, fe61468b2cb (sched/fair: Fix enqueue_task_fair 
> > > > > warning)
> > > > > did not fully resolve the issues with the (rq->tmp_alone_branch !=
> > > > > &rq->leaf_cfs_rq_list) warning in enqueue_task_fair. There is a case 
> > > > > where
> > > > > the first for_each_sched_entity loop exits due to on_rq, having 
> > > > > incompletely
> > > > > updated the list.  In this case the second for_each_sched_entity loop 
> > > > > can
> > > > > further modify se. The later code to fix up the list management fails 
> > > > > to do
> > > >
> > > > But for the 2nd for_each_sched_entity, the cfs_rq should already be
> > > > in the list, shouldn't it?
> > >
> > > No. In this case we hit the parent not on list case in 
> > > list_add_leaf_cfs_rq
> > > which sets rq->tmp_alone_branch to &cfs_rq->leaf_cfs_rq_list, which is not
> > > the same. It returns false expecting the parent to be added later.
> > >
> > > But then the parent doesn't get there because it's on_rq.
> > >
> > > >
> > > > The third for_each_sched_entity loop is there for the throttled case but is
> > > > useless for the other cases
> > > >
> > >
> > > There actually is throttling involved, usually. The second loop breaks 
> > > out
> > > early because one of the parents is throttled. But not before it advances
> > > se at least once.
> >
> > Ok, so it is indeed because of the throttling that the problem occurs
> >
> > >
> > > Then the 3rd loop doesn't fix the tmp_alone_branch because it doesn't 
> > > start
> > > with the right se.
> > >
> > > > Could you provide us some details about the use case that creates such
> > > > a situation?
> > > >
> > >
> > > I admit I had to add trace_printks to get here. Here's what it showed 
> > > (sorry
> > > for the long lines...)
> > >
> > > 1)  sh-6271  [044]  1271.322317: bprint: enqueue_task_fair: se 
> > > 0xa085e7e30080 on_rq 0 cfs_rq = 0xa085e93da200
> > > 2)  sh-6271  [044]  1271.322320: bprint: enqueue_entity: Add_leaf_rq: cpu 
> > > 17: nr_r 2; cfs 0xa085e93da200 onlist 0 tmp_a_b = 0xa085ef92c868 
> > > &rq->l_c_r_l = 0xa085ef92c868
> > > 3)  sh-6271  [044]  1271.322322: bprint: enqueue_entity: Add_leaf_rq: cpu 
> > > 17: nr_r 2: parent not onlist  Set t_a_branch to 0xa085e93da340 
> > > rq->l_c_r_l = 0xa085ef92c868
> > > 4)  sh-6271  [044]  1271.322323: bprint: enqueue_task_fair: se 
> > > 0xa085e93d8800 on_rq 1 cfs_rq = 0xa085dbfaea00
> > > 5)  sh-6271  [044]  1271.322324: bprint: enqueue_task_fair: Done 
> > > enqueues, se=0xa085e93d8800, pid=3642
> > > 6)  sh-6271  [044]  1271.322326: bprint: enqueue_task_fair: update: cfs 
> > > 0xa085e48ce000 throttled, se = 0xa085dbfafc00
> > > 7)  sh-6271  [044]  1271.322326: bprint: enqueue_task_fair: current se = 
> > > 0xa085dbfafc00, orig_se = 0xa085e7e30080
> > > 8)  sh-6271  [044]  1271.322327: bprint: enqueue_task_fair: Add_leaf_rq: 
> > > cpu 17: nr_r 2; cfs 0xa085e48ce000 onlist 1 tmp_a_b = 
> > > 0xa085e93da340 &rq->l_c_r_l = 0xa085ef92c868
> > > 9)  sh-6271  [044]  1271.322328: bprint: enqueue_task_fair: Add_leaf_rq: 
> > > cpu 17: nr_r 0; cfs 0xa085ef92bf80 onlist 1 tmp_a_b = 
> > > 0xa085e93da340 &rq->l_c_r_l = 0xa085ef92c868
> > > 10) sh-6271  [044]  1271.672599: bprint: enqueue_task_fair: cpu 17: 
> > > rq->tmp_alone_branch = 0xa085e93da340 != &rq->leaf_cfs_rq_list = 
> > > 0xa085ef92c868
> > >
> > >
> > > Lines 1 and 4 are from the first loop in enqueue_task_fair. Lines 2 and 3 
> > > are from the
> > > first call to list_add_leaf_cfs_rq with line 2 being at the start and line 3 
> > > showing which
> > > of the 3 cases we hit.
> > >
> > > Line 5 is right after the first loop.
> > >
> > > Line 6 is the second trip through the 2nd loop and is in the 
> > > if(throttled) condition.
> > > Line 7 is right below the enqueue_throttle label.
> > >
> > > Lines 8 and 9 are from the fixup loop and since onlist is set for both of 
> > > these it doesn't
> > > do anything. But we've left rq->tmp_alone_branch pointing to the 
> > > cfs_rq->leaf_cfs_rq_list
> > > from the one call to list_add_leaf_cfs_rq that did something and so the 
> > > cleanup doesn't work.
> > >
> > > Based on the comment at the clean up, it looked like it expected the se 
> > > to be what it was
> > > when the first loop broke, not whatever it was left at after the second 
> > > loop.  Could have
> > > been NULL there too I guess but I didn't hit that case.
> > 

Re: [PATCH] sched/fair: Fix enqueue_task_fair warning some more

2020-05-07 Thread Phil Auld
Hi Vincent,

On Thu, May 07, 2020 at 05:06:29PM +0200 Vincent Guittot wrote:
> Hi Phil,
> 
> On Wed, 6 May 2020 at 20:05, Phil Auld  wrote:
> >
> > Hi Vincent,
> >
> > Thanks for taking a look. More below...
> >
> > On Wed, May 06, 2020 at 06:36:45PM +0200 Vincent Guittot wrote:
> > > Hi Phil,
> > >
> > > - reply to all this time
> > >
> > > On Wed, 6 May 2020 at 16:18, Phil Auld  wrote:
> > > >
> > > > sched/fair: Fix enqueue_task_fair warning some more
> > > >
> > > > The recent patch, fe61468b2cb (sched/fair: Fix enqueue_task_fair 
> > > > warning)
> > > > did not fully resolve the issues with the (rq->tmp_alone_branch !=
> > > > &rq->leaf_cfs_rq_list) warning in enqueue_task_fair. There is a case 
> > > > where
> > > > the first for_each_sched_entity loop exits due to on_rq, having 
> > > > incompletely
> > > > updated the list.  In this case the second for_each_sched_entity loop 
> > > > can
> > > > further modify se. The later code to fix up the list management fails 
> > > > to do
> > >
> > > But for the 2nd for_each_sched_entity, the cfs_rq should already be
> > > in the list, shouldn't it?
> >
> > No. In this case we hit the parent not on list case in list_add_leaf_cfs_rq
> > which sets rq->tmp_alone_branch to &cfs_rq->leaf_cfs_rq_list, which is not
> > the same. It returns false expecting the parent to be added later.
> >
> > But then the parent doesn't get there because it's on_rq.
> >
> > >
> > > The third for_each_sched_entity loop is there for the throttled case but is
> > > useless for the other cases
> > >
> >
> > There actually is throttling involved, usually. The second loop breaks out
> > early because one of the parents is throttled. But not before it advances
> > se at least once.
> 
> Ok, so it is indeed because of the throttling that the problem occurs
> 
> >
> > Then the 3rd loop doesn't fix the tmp_alone_branch because it doesn't start
> > with the right se.
> >
> > > Could you provide us some details about the use case that creates such
> > > a situation?
> > >
> >
> > I admit I had to add trace_printks to get here. Here's what it showed (sorry
> > for the long lines...)
> >
> > 1)  sh-6271  [044]  1271.322317: bprint: enqueue_task_fair: se 
> > 0xa085e7e30080 on_rq 0 cfs_rq = 0xa085e93da200
> > 2)  sh-6271  [044]  1271.322320: bprint: enqueue_entity: Add_leaf_rq: cpu 
> > 17: nr_r 2; cfs 0xa085e93da200 onlist 0 tmp_a_b = 0xa085ef92c868 
> > &rq->l_c_r_l = 0xa085ef92c868
> > 3)  sh-6271  [044]  1271.322322: bprint: enqueue_entity: Add_leaf_rq: cpu 
> > 17: nr_r 2: parent not onlist  Set t_a_branch to 0xa085e93da340 
> > rq->l_c_r_l = 0xa085ef92c868
> > 4)  sh-6271  [044]  1271.322323: bprint: enqueue_task_fair: se 
> > 0xa085e93d8800 on_rq 1 cfs_rq = 0xa085dbfaea00
> > 5)  sh-6271  [044]  1271.322324: bprint: enqueue_task_fair: Done enqueues, 
> > se=0xa085e93d8800, pid=3642
> > 6)  sh-6271  [044]  1271.322326: bprint: enqueue_task_fair: update: cfs 
> > 0xa085e48ce000 throttled, se = 0xa085dbfafc00
> > 7)  sh-6271  [044]  1271.322326: bprint: enqueue_task_fair: current se = 
> > 0xa085dbfafc00, orig_se = 0xa085e7e30080
> > 8)  sh-6271  [044]  1271.322327: bprint: enqueue_task_fair: Add_leaf_rq: 
> > cpu 17: nr_r 2; cfs 0xa085e48ce000 onlist 1 tmp_a_b = 
> > 0xa085e93da340 &rq->l_c_r_l = 0xa085ef92c868
> > 9)  sh-6271  [044]  1271.322328: bprint: enqueue_task_fair: Add_leaf_rq: 
> > cpu 17: nr_r 0; cfs 0xa085ef92bf80 onlist 1 tmp_a_b = 
> > 0xa085e93da340 &rq->l_c_r_l = 0xa085ef92c868
> > 10) sh-6271  [044]  1271.672599: bprint: enqueue_task_fair: cpu 17: 
> > rq->tmp_alone_branch = 0xa085e93da340 != &rq->leaf_cfs_rq_list = 
> > 0xa085ef92c868
> >
> >
> > Lines 1 and 4 are from the first loop in enqueue_task_fair. Lines 2 and 3 
> > are from the
> > first call to list_add_leaf_cfs_rq with line 2 being at the start and line 3 
> > showing which
> > of the 3 cases we hit.
> >
> > Line 5 is right after the first loop.
> >
> > Line 6 is the second trip through the 2nd loop and is in the if(throttled) 
> > condition.
> > Line 7 is right below the enqueue_throttle label.
> >
> > Lines 8 and 9 are from the fixup loop and since onlist is set for both of 
> > these it doesn't
> > do anything. But we've left rq->tmp_alone_branch pointing to the 
> > cfs_rq->leaf_cfs_rq_list
> > from the one call to list_add_leaf_cfs_rq that did something and so the cleanup 
> > doesn't work.
> >
> > Based on the comment at the clean up, it looked like it expected the se to 
> > be what it was
> > when the first loop broke, not whatever it was left at after the second 
> > loop.  Could have
> > been NULL there too I guess but I didn't hit that case.
> >
> > This is 100% reproducible. And completely gone with the fix. I have a trace 
> > showing that.
> >
> > Does that make more sense?
> 
> Yes, Good catch
> And thanks for the detailed explanation.

No problem. I had to see it all myself anyway :)


> 
> >
> >
> >
> > Cheers,
> > Phil
> >
> >
> > > >

Re: [PATCH] sched/fair: Fix enqueue_task_fair warning some more

2020-05-07 Thread Vincent Guittot
Hi Phil,

On Wed, 6 May 2020 at 20:05, Phil Auld  wrote:
>
> Hi Vincent,
>
> Thanks for taking a look. More below...
>
> On Wed, May 06, 2020 at 06:36:45PM +0200 Vincent Guittot wrote:
> > Hi Phil,
> >
> > - reply to all this time
> >
> > On Wed, 6 May 2020 at 16:18, Phil Auld  wrote:
> > >
> > > sched/fair: Fix enqueue_task_fair warning some more
> > >
> > > The recent patch, fe61468b2cb (sched/fair: Fix enqueue_task_fair warning)
> > > did not fully resolve the issues with the (rq->tmp_alone_branch !=
> > > &rq->leaf_cfs_rq_list) warning in enqueue_task_fair. There is a case where
> > > the first for_each_sched_entity loop exits due to on_rq, having 
> > > incompletely
> > > updated the list.  In this case the second for_each_sched_entity loop can
> > > further modify se. The later code to fix up the list management fails to 
> > > do
> >
> > But for the 2nd for_each_sched_entity, the cfs_rq should already be
> > in the list, shouldn't it?
>
> No. In this case we hit the parent not on list case in list_add_leaf_cfs_rq
> which sets rq->tmp_alone_branch to &cfs_rq->leaf_cfs_rq_list, which is not
> the same. It returns false expecting the parent to be added later.
>
> But then the parent doesn't get there because it's on_rq.
>
> >
> > The third for_each_sched_entity loop is there for the throttled case but is
> > useless for the other cases
> >
>
> There actually is throttling involved, usually. The second loop breaks out
> early because one of the parents is throttled. But not before it advances
> se at least once.

Ok, so it is indeed because of the throttling that the problem occurs

>
> Then the 3rd loop doesn't fix the tmp_alone_branch because it doesn't start
> with the right se.
>
> > Could you provide us some details about the use case that creates such
> > a situation?
> >
>
> I admit I had to add trace_printks to get here. Here's what it showed (sorry
> for the long lines...)
>
> 1)  sh-6271  [044]  1271.322317: bprint: enqueue_task_fair: se 
> 0xa085e7e30080 on_rq 0 cfs_rq = 0xa085e93da200
> 2)  sh-6271  [044]  1271.322320: bprint: enqueue_entity: Add_leaf_rq: cpu 17: 
> nr_r 2; cfs 0xa085e93da200 onlist 0 tmp_a_b = 0xa085ef92c868 
> &rq->l_c_r_l = 0xa085ef92c868
> 3)  sh-6271  [044]  1271.322322: bprint: enqueue_entity: Add_leaf_rq: cpu 17: 
> nr_r 2: parent not onlist  Set t_a_branch to 0xa085e93da340 rq->l_c_r_l = 
> 0xa085ef92c868
> 4)  sh-6271  [044]  1271.322323: bprint: enqueue_task_fair: se 
> 0xa085e93d8800 on_rq 1 cfs_rq = 0xa085dbfaea00
> 5)  sh-6271  [044]  1271.322324: bprint: enqueue_task_fair: Done enqueues, 
> se=0xa085e93d8800, pid=3642
> 6)  sh-6271  [044]  1271.322326: bprint: enqueue_task_fair: update: cfs 
> 0xa085e48ce000 throttled, se = 0xa085dbfafc00
> 7)  sh-6271  [044]  1271.322326: bprint: enqueue_task_fair: current se = 
> 0xa085dbfafc00, orig_se = 0xa085e7e30080
> 8)  sh-6271  [044]  1271.322327: bprint: enqueue_task_fair: Add_leaf_rq: cpu 
> 17: nr_r 2; cfs 0xa085e48ce000 onlist 1 tmp_a_b = 0xa085e93da340 
> &rq->l_c_r_l = 0xa085ef92c868
> 9)  sh-6271  [044]  1271.322328: bprint: enqueue_task_fair: Add_leaf_rq: cpu 
> 17: nr_r 0; cfs 0xa085ef92bf80 onlist 1 tmp_a_b = 0xa085e93da340 
> &rq->l_c_r_l = 0xa085ef92c868
> 10) sh-6271  [044]  1271.672599: bprint: enqueue_task_fair: cpu 17: 
> rq->tmp_alone_branch = 0xa085e93da340 != &rq->leaf_cfs_rq_list = 
> 0xa085ef92c868
>
>
> Lines 1 and 4 are from the first loop in enqueue_task_fair. Lines 2 and 3 are 
> from the
> first call to list_add_leaf_cfs_rq with line 2 being at the start and line 3 
> showing which
> of the 3 cases we hit.
>
> Line 5 is right after the first loop.
>
> Line 6 is the second trip through the 2nd loop and is in the if(throttled) 
> condition.
> Line 7 is right below the enqueue_throttle label.
>
> Lines 8 and 9 are from the fixup loop and since onlist is set for both of 
> these it doesn't
> do anything. But we've left rq->tmp_alone_branch pointing to the 
> cfs_rq->leaf_cfs_rq_list
> from the one call to list_add_leaf_cfs_rq that did something and so the cleanup 
> doesn't work.
>
> Based on the comment at the clean up, it looked like it expected the se to be 
> what it was
> when the first loop broke, not whatever it was left at after the second loop.  
> Could have
> been NULL there too I guess but I didn't hit that case.
>
> This is 100% reproducible. And completely gone with the fix. I have a trace 
> showing that.
>
> Does that make more sense?

Yes, Good catch
And thanks for the detailed explanation.

>
>
>
> Cheers,
> Phil
>
>
> > > what is needed because se does not point to the sched_entity which broke 
> > > out
> > > of the first loop.
> > >
> > > Address this issue by saving the se pointer when the first loop exits and
> > > resetting it before doing the fix up, if needed.
> > >
> > > Signed-off-by: Phil Auld 
> > > Cc: Peter Zijlstra (Intel) 
> > > Cc: Vincent Guittot 
> > > Cc: Ingo Molnar 
> > > Cc: Juri Lelli 
> > > ---
>

Re: [PATCH] sched/fair: Fix enqueue_task_fair warning some more

2020-05-06 Thread Phil Auld
Hi Vincent,

Thanks for taking a look. More below...

On Wed, May 06, 2020 at 06:36:45PM +0200 Vincent Guittot wrote:
> Hi Phil,
> 
> - reply to all this time
> 
> On Wed, 6 May 2020 at 16:18, Phil Auld  wrote:
> >
> > sched/fair: Fix enqueue_task_fair warning some more
> >
> > The recent patch, fe61468b2cb (sched/fair: Fix enqueue_task_fair warning)
> > did not fully resolve the issues with the (rq->tmp_alone_branch !=
> > &rq->leaf_cfs_rq_list) warning in enqueue_task_fair. There is a case where
> > the first for_each_sched_entity loop exits due to on_rq, having incompletely
> > updated the list.  In this case the second for_each_sched_entity loop can
> > further modify se. The later code to fix up the list management fails to do
> 
> But for the 2nd for_each_sched_entity, the cfs_rq should already be
> in the list, shouldn't it?

No. In this case we hit the parent not on list case in list_add_leaf_cfs_rq
which sets rq->tmp_alone_branch to &cfs_rq->leaf_cfs_rq_list, which is not
the same. It returns false expecting the parent to be added later.

But then the parent doesn't get there because it's on_rq.
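
For reference, here's roughly what list_add_leaf_cfs_rq() does in the cases
that matter here (condensed sketch, not the exact source; branch_is_complete()
is just my shorthand for two of the three cases, parent already on the list or
no parent at all):

	static inline bool list_add_leaf_cfs_rq(struct cfs_rq *cfs_rq)
	{
		struct rq *rq = rq_of(cfs_rq);

		if (cfs_rq->on_list)
			/* nothing to add; report whether the marker is back in place */
			return rq->tmp_alone_branch == &rq->leaf_cfs_rq_list;

		cfs_rq->on_list = 1;

		if (branch_is_complete(cfs_rq)) {
			/* parent already on the list, or this is the root cfs_rq:
			 * link cfs_rq (details omitted) and reset the marker */
			rq->tmp_alone_branch = &rq->leaf_cfs_rq_list;
			return true;
		}

		/*
		 * Parent not on the list yet: link cfs_rq onto the tmp branch
		 * and leave tmp_alone_branch pointing at it, expecting a later
		 * call for an ancestor to finish the job.  This is the case we
		 * hit here.
		 */
		list_add_rcu(&cfs_rq->leaf_cfs_rq_list, rq->tmp_alone_branch);
		rq->tmp_alone_branch = &cfs_rq->leaf_cfs_rq_list;
		return false;
	}

The warning itself is assert_list_leaf_cfs_rq(), i.e.
SCHED_WARN_ON(rq->tmp_alone_branch != &rq->leaf_cfs_rq_list), so anything that
leaves the marker parked on the tmp branch when enqueue_task_fair() returns
will trip it.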

> 
> The third for_each_sched_entity loop is there for the throttled case but is
> useless for the other cases
>

There actually is throttling involved, usually. The second loop breaks out
early because one of the parents is throttled. But not before it advances
se at least once. 

Then the 3rd loop doesn't fix the tmp_alone_branch because it doesn't start
with the right se.
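
To make that concrete, the shape of enqueue_task_fair() here is roughly the
following (heavily condensed sketch; only the parts that matter for the leaf
list bookkeeping, everything else elided):

	enqueue_task_fair(rq, p, flags)
	{
		/* loop 1 */
		for_each_sched_entity(se) {
			if (se->on_rq)
				break;	/* <-- breaks here, list half built */
			cfs_rq = cfs_rq_of(se);
			/* may hit the "parent not on list" case above */
			enqueue_entity(cfs_rq, se, flags);
			...
		}

		/* loop 2: continues from the se that loop 1 stopped at */
		for_each_sched_entity(se) {
			cfs_rq = cfs_rq_of(se);
			...
			if (cfs_rq_throttled(cfs_rq))
				goto enqueue_throttle;	/* <-- se already advanced */
		}
		...
	enqueue_throttle:
		/*
		 * loop 3: the fixup walks up from wherever se points now,
		 * not from the entity that broke out of loop 1.
		 */
		for_each_sched_entity(se) {
			cfs_rq = cfs_rq_of(se);
			if (list_add_leaf_cfs_rq(cfs_rq))
				break;
		}
		assert_list_leaf_cfs_rq(rq);	/* fires if the marker is left dangling */
	}

That matches the trace further down: the on_rq break in loop 1, the throttled
goto in loop 2, and a fixup walk that only ever sees cfs_rqs which are already
on the list.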

> Could you provide us some details about the use case that creates such
> a situation?
>

I admit I had to add trace_printks to get here. Here's what it showed (sorry
for the long lines...)

1)  sh-6271  [044]  1271.322317: bprint: enqueue_task_fair: se 
0xa085e7e30080 on_rq 0 cfs_rq = 0xa085e93da200
2)  sh-6271  [044]  1271.322320: bprint: enqueue_entity: Add_leaf_rq: cpu 17: 
nr_r 2; cfs 0xa085e93da200 onlist 0 tmp_a_b = 0xa085ef92c868 
&rq->l_c_r_l = 0xa085ef92c868
3)  sh-6271  [044]  1271.322322: bprint: enqueue_entity: Add_leaf_rq: cpu 17: 
nr_r 2: parent not onlist  Set t_a_branch to 0xa085e93da340 rq->l_c_r_l = 
0xa085ef92c868
4)  sh-6271  [044]  1271.322323: bprint: enqueue_task_fair: se 
0xa085e93d8800 on_rq 1 cfs_rq = 0xa085dbfaea00
5)  sh-6271  [044]  1271.322324: bprint: enqueue_task_fair: Done enqueues, 
se=0xa085e93d8800, pid=3642
6)  sh-6271  [044]  1271.322326: bprint: enqueue_task_fair: update: cfs 
0xa085e48ce000 throttled, se = 0xa085dbfafc00
7)  sh-6271  [044]  1271.322326: bprint: enqueue_task_fair: current se = 
0xa085dbfafc00, orig_se = 0xa085e7e30080
8)  sh-6271  [044]  1271.322327: bprint: enqueue_task_fair: Add_leaf_rq: cpu 
17: nr_r 2; cfs 0xa085e48ce000 onlist 1 tmp_a_b = 0xa085e93da340 
&rq->l_c_r_l = 0xa085ef92c868
9)  sh-6271  [044]  1271.322328: bprint: enqueue_task_fair: Add_leaf_rq: cpu 
17: nr_r 0; cfs 0xa085ef92bf80 onlist 1 tmp_a_b = 0xa085e93da340 
&rq->l_c_r_l = 0xa085ef92c868
10) sh-6271  [044]  1271.672599: bprint: enqueue_task_fair: cpu 17: 
rq->tmp_alone_branch = 0xa085e93da340 != &rq->leaf_cfs_rq_list = 
0xa085ef92c868


Lines 1 and 4 are from the first loop in enqueue_task_fair. Lines 2 and 3 are 
from the
first call to list_add_leaf_cfs_rq with line 2 being at the start and line 3 
showing which
of the 3 cases we hit.

Line 5 is right after the first loop.

Line 6 is the second trip through the 2nd loop and is in the if(throttled) 
condition.
Line 7 is right below the enqueue_throttle label.

Lines 8 and 9 are from the fixup loop and since onlist is set for both of these 
it doesn't
do anything. But we've left rq->tmp_alone_branch pointing to the 
cfs_rq->leaf_cfs_rq_list
from the one call to list_add_leaf_cfs_rq that did something and so the cleanup 
doesn't work.

Based on the comment at the clean up, it looked like it expected the se to be 
what it was
when the first loop broke, not whatever it was left at after the second loop.  
Could have
been NULL there too I guess but I didn't hit that case.
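
(For reference, trace_printk()s along these lines produce output in that shape;
this is reconstructed from the output above, not necessarily the exact strings.
Note that bprint attributes them to enqueue_entity or enqueue_task_fair because
list_add_leaf_cfs_rq() is static inline and gets inlined into its caller.)

	/* in the first loop of enqueue_task_fair(): lines 1 and 4 */
	trace_printk("se 0x%p on_rq %d cfs_rq = 0x%p\n",
		     se, se->on_rq, cfs_rq_of(se));

	/* at the top of list_add_leaf_cfs_rq(): lines 2, 8 and 9 */
	trace_printk("Add_leaf_rq: cpu %d: nr_r %d; cfs 0x%p onlist %d tmp_a_b = 0x%p &rq->l_c_r_l = 0x%p\n",
		     cpu_of(rq), cfs_rq->nr_running, cfs_rq, cfs_rq->on_list,
		     rq->tmp_alone_branch, &rq->leaf_cfs_rq_list);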

This is 100% reproducible. And completely gone with the fix. I have a trace 
showing that.

Does that make more sense?



Cheers,
Phil


> > what is needed because se does not point to the sched_entity which broke out
> > of the first loop.
> >
> > Address this issue by saving the se pointer when the first loop exits and
> > resetting it before doing the fix up, if needed.
> >
> > Signed-off-by: Phil Auld 
> > Cc: Peter Zijlstra (Intel) 
> > Cc: Vincent Guittot 
> > Cc: Ingo Molnar 
> > Cc: Juri Lelli 
> > ---
> >  kernel/sched/fair.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 02f323b85b6d..719c996317e3 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -5432,6 +5432,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct 
> > *p, int flags)
> >  {
> > struct cfs_rq *cfs_rq;
> > struct sched_entity *se = &p->se;
>

Re: [PATCH] sched/fair: Fix enqueue_task_fair warning some more

2020-05-06 Thread Vincent Guittot
Hi Phil,

- reply to all this time

On Wed, 6 May 2020 at 16:18, Phil Auld  wrote:
>
> sched/fair: Fix enqueue_task_fair warning some more
>
> The recent patch, fe61468b2cb (sched/fair: Fix enqueue_task_fair warning)
> did not fully resolve the issues with the (rq->tmp_alone_branch !=
> &rq->leaf_cfs_rq_list) warning in enqueue_task_fair. There is a case where
> the first for_each_sched_entity loop exits due to on_rq, having incompletely
> updated the list.  In this case the second for_each_sched_entity loop can
> further modify se. The later code to fix up the list management fails to do

But for the 2nd for_each_sched_entity, the cfs_rq should already be
in the list, shouldn't it?

The third for_each_sched_entity loop is there for the throttled case but is
useless for the other cases

Could you provide us some details about the use case that creates such
a situation?

> what is needed because se does not point to the sched_entity which broke out
> of the first loop.
>
> Address this issue by saving the se pointer when the first loop exits and
> resetting it before doing the fix up, if needed.
>
> Signed-off-by: Phil Auld 
> Cc: Peter Zijlstra (Intel) 
> Cc: Vincent Guittot 
> Cc: Ingo Molnar 
> Cc: Juri Lelli 
> ---
>  kernel/sched/fair.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 02f323b85b6d..719c996317e3 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5432,6 +5432,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, 
> int flags)
>  {
> struct cfs_rq *cfs_rq;
> struct sched_entity *se = &p->se;
> +   struct sched_entity *saved_se = NULL;
> int idle_h_nr_running = task_has_idle_policy(p);
>
> /*
> @@ -5466,6 +5467,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, 
> int flags)
> flags = ENQUEUE_WAKEUP;
> }
>
> +   saved_se = se;
> for_each_sched_entity(se) {
> cfs_rq = cfs_rq_of(se);
>
> @@ -5510,6 +5512,8 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, 
> int flags)
>  * leaf list maintenance, resulting in triggering the 
> assertion
>  * below.
>  */
> +   if (saved_se)
> +   se = saved_se;
> for_each_sched_entity(se) {
> cfs_rq = cfs_rq_of(se);
>
> --
> 2.18.0
>
>
> Cheers,
> Phil
>


[PATCH] sched/fair: Fix enqueue_task_fair warning some more

2020-05-06 Thread Phil Auld
sched/fair: Fix enqueue_task_fair warning some more

The recent patch, fe61468b2cb (sched/fair: Fix enqueue_task_fair warning)
did not fully resolve the issues with the (rq->tmp_alone_branch !=
&rq->leaf_cfs_rq_list) warning in enqueue_task_fair. There is a case where
the first for_each_sched_entity loop exits due to on_rq, having incompletely
updated the list.  In this case the second for_each_sched_entity loop can
further modify se. The later code to fix up the list management fails to do
what is needed because se does not point to the sched_entity which broke out
of the first loop.

Address this issue by saving the se pointer when the first loop exits and
resetting it before doing the fix up, if needed.

Signed-off-by: Phil Auld 
Cc: Peter Zijlstra (Intel) 
Cc: Vincent Guittot 
Cc: Ingo Molnar 
Cc: Juri Lelli 
---
 kernel/sched/fair.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 02f323b85b6d..719c996317e3 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5432,6 +5432,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 {
struct cfs_rq *cfs_rq;
struct sched_entity *se = &p->se;
+   struct sched_entity *saved_se = NULL;
int idle_h_nr_running = task_has_idle_policy(p);
 
/*
@@ -5466,6 +5467,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
flags = ENQUEUE_WAKEUP;
}
 
+   saved_se = se;
for_each_sched_entity(se) {
cfs_rq = cfs_rq_of(se);
 
@@ -5510,6 +5512,8 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 * leaf list maintenance, resulting in triggering the assertion
 * below.
 */
+   if (saved_se)
+   se = saved_se;
for_each_sched_entity(se) {
cfs_rq = cfs_rq_of(se);
 
-- 
2.18.0


Cheers,
Phil