Re: [tip:sched/core] sched/fair: Correct unit of load_above_capacity
On Tue, May 24, 2016 at 04:24:01AM +0800, Yuyang Du wrote: > On Thu, May 19, 2016 at 04:36:38PM +0100, Morten Rasmussen wrote: > > > > And this is exactly you get with this patch :-) load_above_capacity > > (through max_pull) is multiplied by the group capacity to compute that > > actual amount of [load] to remove: > > > > env->imbalance = load_above_capacity * busiest->group_capacity / > > SCHED_CAPACITY_SCALE > > > > = 1*NICE_0_LOAD * 3*SCHED_CAPACITY_SCALE / > > SCHED_CAPACITY_SCALE > > > > = 3*NICE_0_LOAD > > > > I don't think we disagree on how it should work :-) Without the capacity > > scaling in this patch you get: > > > > env->imbalance = (6*SCHED_CAPACITY_SCALE - 3*SCHED_CAPACITY_SCALE) * > > 3*SCHED_CAPACITY_SCALE / SCHED_CAPACITY_SCALE > > > > = 9*SCHED_CAPACITY_SCALE > > > > Coming back to Yuyang's question. I think it should be NICE_0_LOAD to > > ensure that the resulting imbalance has the proper unit [load]. > > Sorry, I'm still confused. After this patch, the unit is indeed [load], the > load same as the weight visible to the user. However, you literally compared > it with sg_lb_stats's avg_load and load_per_task, which has the unit of > load_avg, which is scaled_load_down(NICE_0_LOAD). Am I missing something? Hello?
Re: [tip:sched/core] sched/fair: Correct unit of load_above_capacity
On Thu, May 19, 2016 at 04:36:38PM +0100, Morten Rasmussen wrote: > > And this is exactly you get with this patch :-) load_above_capacity > (through max_pull) is multiplied by the group capacity to compute that > actual amount of [load] to remove: > > env->imbalance= load_above_capacity * busiest->group_capacity / > SCHED_CAPACITY_SCALE > > = 1*NICE_0_LOAD * 3*SCHED_CAPACITY_SCALE / > SCHED_CAPACITY_SCALE > > = 3*NICE_0_LOAD > > I don't think we disagree on how it should work :-) Without the capacity > scaling in this patch you get: > > env->imbalance = (6*SCHED_CAPACITY_SCALE - 3*SCHED_CAPACITY_SCALE) * > 3*SCHED_CAPACITY_SCALE / SCHED_CAPACITY_SCALE > > = 9*SCHED_CAPACITY_SCALE > > Coming back to Yuyang's question. I think it should be NICE_0_LOAD to > ensure that the resulting imbalance has the proper unit [load]. Sorry, I'm still confused. After this patch, the unit is indeed [load], the load same as the weight visible to the user. However, you literally compared it with sg_lb_stats's avg_load and load_per_task, which has the unit of load_avg, which is scaled_load_down(NICE_0_LOAD). Am I missing something?
Re: [tip:sched/core] sched/fair: Correct unit of load_above_capacity
Hi Morten, On 19 May 2016 at 17:36, Morten Rasmussen wrote: > On Fri, May 13, 2016 at 10:22:19AM +0200, Vincent Guittot wrote: >> On 12 May 2016 at 23:48, Yuyang Du wrote: >> > On Thu, May 12, 2016 at 03:31:51AM -0700, tip-bot for Morten Rasmussen >> > wrote: >> >> Commit-ID: cfa10334318d8212d007da8c771187643c9cef35 >> >> Gitweb: >> >> http://git.kernel.org/tip/cfa10334318d8212d007da8c771187643c9cef35 >> >> Author: Morten Rasmussen >> >> AuthorDate: Fri, 29 Apr 2016 20:32:40 +0100 >> >> Committer: Ingo Molnar >> >> CommitDate: Thu, 12 May 2016 09:55:33 +0200 >> >> >> >> sched/fair: Correct unit of load_above_capacity >> >> >> >> In calculate_imbalance() load_above_capacity currently has the unit >> >> [capacity] while it is used as being [load/capacity]. Not only is it >> >> wrong it also makes it unlikely that load_above_capacity is ever used >> >> as the subsequent code picks the smaller of load_above_capacity and >> >> the avg_load >> >> >> >> This patch ensures that load_above_capacity has the right unit >> >> [load/capacity]. >> >> load_above_capacity has the unit [load] and is then compared with other load > > I'm not sure if talk about the same code, I'm referring to: > > max_pull = min(busiest->avg_load - sds->avg_load, load_above_capacity); > > a bit further down. Here the first option has unit [load/capacity], and > the subsequent code multiplies the result with [capacity] to get back to > [load]: My understand is that it multiplies and divides by capacity so we select the minimum between max_pull * busiest->group_capacity/SCHED_CAPACITY_SCALE and (sds->avg_load - local->avg_load) * local->group_capacity / SCHED_CAPACITY_SCALE so the unit of imbalance is the same as max_pull because we multiply and divide by [capacity] the imbalance's unit is [load] so max_pull also has a unit [load] and max_pull has the same unit of load_above_capacity > > /* How much load to actually move to equalise the imbalance */ > env->imbalance = min( > max_pull * busiest->group_capacity, > (sds->avg_load - local->avg_load) * local->group_capacity > ) / SCHED_CAPACITY_SCALE; > > That lead me to the conclusion that load_above_capacity would have to be > 'per capacity', i.e. [load/capacity], as well for the code to make > sense. > >> >> >> >> >> Signed-off-by: Morten Rasmussen >> >> [ Changed changelog to note it was in capacity unit; +rebase. ] >> >> Signed-off-by: Peter Zijlstra (Intel) >> >> Cc: Dietmar Eggemann >> >> Cc: Linus Torvalds >> >> Cc: Mike Galbraith >> >> Cc: Peter Zijlstra >> >> Cc: Thomas Gleixner >> >> Cc: linux-kernel@vger.kernel.org >> >> Link: >> >> http://lkml.kernel.org/r/1461958364-675-4-git-send-email-dietmar.eggem...@arm.com >> >> Signed-off-by: Ingo Molnar >> >> --- >> >> kernel/sched/fair.c | 6 -- >> >> 1 file changed, 4 insertions(+), 2 deletions(-) >> >> >> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c >> >> index 2338105..218f8e8 100644 >> >> --- a/kernel/sched/fair.c >> >> +++ b/kernel/sched/fair.c >> >> @@ -7067,9 +7067,11 @@ static inline void calculate_imbalance(struct >> >> lb_env *env, struct sd_lb_stats *s >> >> if (busiest->group_type == group_overloaded && >> >> local->group_type == group_overloaded) { >> >> load_above_capacity = busiest->sum_nr_running * >> >> SCHED_CAPACITY_SCALE; >> >> - if (load_above_capacity > busiest->group_capacity) >> >> + if (load_above_capacity > busiest->group_capacity) { >> >> load_above_capacity -= busiest->group_capacity; >> >> - else >> >> + load_above_capacity *= NICE_0_LOAD; >> >> + load_above_capacity /= busiest->group_capacity; >> >> If I follow correctly the sequence, >> load_above_capacity = (busiest->sum_nr_running * SCHED_CAPACITY_SCALE >> - busiest->group_capacity) * NICE_0_LOAD / busiest->group_capacity >> so >> load_above_capacity = busiest->sum_nr_running * SCHED_CAPACITY_SCALE / >> busiest->group_capacity * NICE_0_LOAD - NICE_0_LOAD >> >> so the unit is [load] > > I'm with you for the equation, but not for the unit and I find it hard > convince myself what the unit really is :-( the sole NICE_0_LOAD in the equation above tends to confirms that it's only [load] > > I think it is down to the fact that we are comparing [load] directly > with [capacity] here, basically saying [load] == [capacity]. Most other > places we scale [load] by [capacity], we don't compare the two or > otherwise imply that they are the same unit. Do we have any other > examples of this? IIUC the goal of this part of calculate_imbalance, we make the assumption that one task with NICE_0_LOAD should fit in one core with SCHED_CAPACITY_SCALE capacity and we want to ensure that we will not make a CPU idle > > The name load_above_capacity suggests that it should have unit [load], > but we actually compute it by subtracting to values, where the latter > clearly has unit [capacit
Re: [tip:sched/core] sched/fair: Correct unit of load_above_capacity
On Fri, May 13, 2016 at 10:22:19AM +0200, Vincent Guittot wrote: > On 12 May 2016 at 23:48, Yuyang Du wrote: > > On Thu, May 12, 2016 at 03:31:51AM -0700, tip-bot for Morten Rasmussen > > wrote: > >> Commit-ID: cfa10334318d8212d007da8c771187643c9cef35 > >> Gitweb: > >> http://git.kernel.org/tip/cfa10334318d8212d007da8c771187643c9cef35 > >> Author: Morten Rasmussen > >> AuthorDate: Fri, 29 Apr 2016 20:32:40 +0100 > >> Committer: Ingo Molnar > >> CommitDate: Thu, 12 May 2016 09:55:33 +0200 > >> > >> sched/fair: Correct unit of load_above_capacity > >> > >> In calculate_imbalance() load_above_capacity currently has the unit > >> [capacity] while it is used as being [load/capacity]. Not only is it > >> wrong it also makes it unlikely that load_above_capacity is ever used > >> as the subsequent code picks the smaller of load_above_capacity and > >> the avg_load > >> > >> This patch ensures that load_above_capacity has the right unit > >> [load/capacity]. > > load_above_capacity has the unit [load] and is then compared with other load I'm not sure if talk about the same code, I'm referring to: max_pull = min(busiest->avg_load - sds->avg_load, load_above_capacity); a bit further down. Here the first option has unit [load/capacity], and the subsequent code multiplies the result with [capacity] to get back to [load]: /* How much load to actually move to equalise the imbalance */ env->imbalance = min( max_pull * busiest->group_capacity, (sds->avg_load - local->avg_load) * local->group_capacity ) / SCHED_CAPACITY_SCALE; That lead me to the conclusion that load_above_capacity would have to be 'per capacity', i.e. [load/capacity], as well for the code to make sense. > > >> > >> Signed-off-by: Morten Rasmussen > >> [ Changed changelog to note it was in capacity unit; +rebase. ] > >> Signed-off-by: Peter Zijlstra (Intel) > >> Cc: Dietmar Eggemann > >> Cc: Linus Torvalds > >> Cc: Mike Galbraith > >> Cc: Peter Zijlstra > >> Cc: Thomas Gleixner > >> Cc: linux-kernel@vger.kernel.org > >> Link: > >> http://lkml.kernel.org/r/1461958364-675-4-git-send-email-dietmar.eggem...@arm.com > >> Signed-off-by: Ingo Molnar > >> --- > >> kernel/sched/fair.c | 6 -- > >> 1 file changed, 4 insertions(+), 2 deletions(-) > >> > >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > >> index 2338105..218f8e8 100644 > >> --- a/kernel/sched/fair.c > >> +++ b/kernel/sched/fair.c > >> @@ -7067,9 +7067,11 @@ static inline void calculate_imbalance(struct > >> lb_env *env, struct sd_lb_stats *s > >> if (busiest->group_type == group_overloaded && > >> local->group_type == group_overloaded) { > >> load_above_capacity = busiest->sum_nr_running * > >> SCHED_CAPACITY_SCALE; > >> - if (load_above_capacity > busiest->group_capacity) > >> + if (load_above_capacity > busiest->group_capacity) { > >> load_above_capacity -= busiest->group_capacity; > >> - else > >> + load_above_capacity *= NICE_0_LOAD; > >> + load_above_capacity /= busiest->group_capacity; > > If I follow correctly the sequence, > load_above_capacity = (busiest->sum_nr_running * SCHED_CAPACITY_SCALE > - busiest->group_capacity) * NICE_0_LOAD / busiest->group_capacity > so > load_above_capacity = busiest->sum_nr_running * SCHED_CAPACITY_SCALE / > busiest->group_capacity * NICE_0_LOAD - NICE_0_LOAD > > so the unit is [load] I'm with you for the equation, but not for the unit and I find it hard convince myself what the unit really is :-( I think it is down to the fact that we are comparing [load] directly with [capacity] here, basically saying [load] == [capacity]. Most other places we scale [load] by [capacity], we don't compare the two or otherwise imply that they are the same unit. Do we have any other examples of this? The name load_above_capacity suggests that it should have unit [load], but we actually compute it by subtracting to values, where the latter clearly has unit [capacity]. PeterZ's recent patch (1be0eb2a97 sched/fair: Clean up scale confusion) changes the former to be [capacity]. load_above_capacity is later multiplied by [capacity] to determine the imbalance which must have unit [load]. So working our way backwards, load_above_capacity must have unit [load/capacity]. However, if [load] = [capacity] then what we really have is just group-normalized [load]. As said in my other reply, the code only really makes sense for under special circumstances where [load] == [capacity]. The easy, and possibly best, way out of this mess would be to replace the code with something like PeterZ's suggestion that I tried to implement in the patch included in my other reply. > Lets take the example of a group made of 2 cores with 2 threads per > core and the capacity of each core is 1,5* SCHED_CAPACITY_SCALE so the > capacity of the group is 3*SCHED_CAPACITY_SCALE. > If we have 6 tasks on this group : > lo
Re: [tip:sched/core] sched/fair: Correct unit of load_above_capacity
On 12 May 2016 at 23:48, Yuyang Du wrote: > On Thu, May 12, 2016 at 03:31:51AM -0700, tip-bot for Morten Rasmussen wrote: >> Commit-ID: cfa10334318d8212d007da8c771187643c9cef35 >> Gitweb: >> http://git.kernel.org/tip/cfa10334318d8212d007da8c771187643c9cef35 >> Author: Morten Rasmussen >> AuthorDate: Fri, 29 Apr 2016 20:32:40 +0100 >> Committer: Ingo Molnar >> CommitDate: Thu, 12 May 2016 09:55:33 +0200 >> >> sched/fair: Correct unit of load_above_capacity >> >> In calculate_imbalance() load_above_capacity currently has the unit >> [capacity] while it is used as being [load/capacity]. Not only is it >> wrong it also makes it unlikely that load_above_capacity is ever used >> as the subsequent code picks the smaller of load_above_capacity and >> the avg_load >> >> This patch ensures that load_above_capacity has the right unit >> [load/capacity]. load_above_capacity has the unit [load] and is then compared with other load >> >> Signed-off-by: Morten Rasmussen >> [ Changed changelog to note it was in capacity unit; +rebase. ] >> Signed-off-by: Peter Zijlstra (Intel) >> Cc: Dietmar Eggemann >> Cc: Linus Torvalds >> Cc: Mike Galbraith >> Cc: Peter Zijlstra >> Cc: Thomas Gleixner >> Cc: linux-kernel@vger.kernel.org >> Link: >> http://lkml.kernel.org/r/1461958364-675-4-git-send-email-dietmar.eggem...@arm.com >> Signed-off-by: Ingo Molnar >> --- >> kernel/sched/fair.c | 6 -- >> 1 file changed, 4 insertions(+), 2 deletions(-) >> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c >> index 2338105..218f8e8 100644 >> --- a/kernel/sched/fair.c >> +++ b/kernel/sched/fair.c >> @@ -7067,9 +7067,11 @@ static inline void calculate_imbalance(struct lb_env >> *env, struct sd_lb_stats *s >> if (busiest->group_type == group_overloaded && >> local->group_type == group_overloaded) { >> load_above_capacity = busiest->sum_nr_running * >> SCHED_CAPACITY_SCALE; >> - if (load_above_capacity > busiest->group_capacity) >> + if (load_above_capacity > busiest->group_capacity) { >> load_above_capacity -= busiest->group_capacity; >> - else >> + load_above_capacity *= NICE_0_LOAD; >> + load_above_capacity /= busiest->group_capacity; If I follow correctly the sequence, load_above_capacity = (busiest->sum_nr_running * SCHED_CAPACITY_SCALE - busiest->group_capacity) * NICE_0_LOAD / busiest->group_capacity so load_above_capacity = busiest->sum_nr_running * SCHED_CAPACITY_SCALE / busiest->group_capacity * NICE_0_LOAD - NICE_0_LOAD so the unit is [load] Lets take the example of a group made of 2 cores with 2 threads per core and the capacity of each core is 1,5* SCHED_CAPACITY_SCALE so the capacity of the group is 3*SCHED_CAPACITY_SCALE. If we have 6 tasks on this group : load_above capacity = 1 *NICE_0_LOAD which means we want to remove no more than 1 tasks from the group and let 5 tasks in the group whereas we don't expect to have more than 4 tasks as we have 4 CPUs and probably less because the compute capacity of each CPU is less than the default one So i would have expected load_above_capacity = busiest->sum_nr_running * NICE_0_LOAD - busiest->group_capacity * NICE_0_LOAD / SCHED_CAPACITY_SCALE load_above capacity = 3*NICE_0_LOAD which means we want to remove no more than 3 tasks and let 3 tasks in the group Regards, Vincent >> + } else >> load_above_capacity = ~0UL; >> } > > Hi Morten, > > I got the feeling this might be wrong, the NICE_0_LOAD should be scaled down. > But I hope I am wrong. > > Vincent, could you take a look?
Re: [tip:sched/core] sched/fair: Correct unit of load_above_capacity
On Thu, May 12, 2016 at 03:31:51AM -0700, tip-bot for Morten Rasmussen wrote: > Commit-ID: cfa10334318d8212d007da8c771187643c9cef35 > Gitweb: http://git.kernel.org/tip/cfa10334318d8212d007da8c771187643c9cef35 > Author: Morten Rasmussen > AuthorDate: Fri, 29 Apr 2016 20:32:40 +0100 > Committer: Ingo Molnar > CommitDate: Thu, 12 May 2016 09:55:33 +0200 > > sched/fair: Correct unit of load_above_capacity > > In calculate_imbalance() load_above_capacity currently has the unit > [capacity] while it is used as being [load/capacity]. Not only is it > wrong it also makes it unlikely that load_above_capacity is ever used > as the subsequent code picks the smaller of load_above_capacity and > the avg_load > > This patch ensures that load_above_capacity has the right unit > [load/capacity]. > > Signed-off-by: Morten Rasmussen > [ Changed changelog to note it was in capacity unit; +rebase. ] > Signed-off-by: Peter Zijlstra (Intel) > Cc: Dietmar Eggemann > Cc: Linus Torvalds > Cc: Mike Galbraith > Cc: Peter Zijlstra > Cc: Thomas Gleixner > Cc: linux-kernel@vger.kernel.org > Link: > http://lkml.kernel.org/r/1461958364-675-4-git-send-email-dietmar.eggem...@arm.com > Signed-off-by: Ingo Molnar > --- > kernel/sched/fair.c | 6 -- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > index 2338105..218f8e8 100644 > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -7067,9 +7067,11 @@ static inline void calculate_imbalance(struct lb_env > *env, struct sd_lb_stats *s > if (busiest->group_type == group_overloaded && > local->group_type == group_overloaded) { > load_above_capacity = busiest->sum_nr_running * > SCHED_CAPACITY_SCALE; > - if (load_above_capacity > busiest->group_capacity) > + if (load_above_capacity > busiest->group_capacity) { > load_above_capacity -= busiest->group_capacity; > - else > + load_above_capacity *= NICE_0_LOAD; > + load_above_capacity /= busiest->group_capacity; > + } else > load_above_capacity = ~0UL; > } Hi Morten, I got the feeling this might be wrong, the NICE_0_LOAD should be scaled down. But I hope I am wrong. Vincent, could you take a look?
[tip:sched/core] sched/fair: Correct unit of load_above_capacity
Commit-ID: cfa10334318d8212d007da8c771187643c9cef35 Gitweb: http://git.kernel.org/tip/cfa10334318d8212d007da8c771187643c9cef35 Author: Morten Rasmussen AuthorDate: Fri, 29 Apr 2016 20:32:40 +0100 Committer: Ingo Molnar CommitDate: Thu, 12 May 2016 09:55:33 +0200 sched/fair: Correct unit of load_above_capacity In calculate_imbalance() load_above_capacity currently has the unit [capacity] while it is used as being [load/capacity]. Not only is it wrong it also makes it unlikely that load_above_capacity is ever used as the subsequent code picks the smaller of load_above_capacity and the avg_load This patch ensures that load_above_capacity has the right unit [load/capacity]. Signed-off-by: Morten Rasmussen [ Changed changelog to note it was in capacity unit; +rebase. ] Signed-off-by: Peter Zijlstra (Intel) Cc: Dietmar Eggemann Cc: Linus Torvalds Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/1461958364-675-4-git-send-email-dietmar.eggem...@arm.com Signed-off-by: Ingo Molnar --- kernel/sched/fair.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 2338105..218f8e8 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -7067,9 +7067,11 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s if (busiest->group_type == group_overloaded && local->group_type == group_overloaded) { load_above_capacity = busiest->sum_nr_running * SCHED_CAPACITY_SCALE; - if (load_above_capacity > busiest->group_capacity) + if (load_above_capacity > busiest->group_capacity) { load_above_capacity -= busiest->group_capacity; - else + load_above_capacity *= NICE_0_LOAD; + load_above_capacity /= busiest->group_capacity; + } else load_above_capacity = ~0UL; }