Re: [PATCH 4/4] sched: bias to target cpu load to reduce task moving
On 01/07/2014 08:59 PM, Peter Zijlstra wrote:
> On Tue, Jan 07, 2014 at 12:55:18PM +, Morten Rasmussen wrote:
>> My understanding is that should_we_balance() decides which cpu is
>> eligible for doing the load balancing for a given domain (and the
>> domains above). That is, only one cpu in a group is allowed to load
>> balance between the local group and other groups. That cpu would
>> therefore be responsible for pulling enough load that the groups are
>> balanced even if it means temporarily overloading itself. The other cpus
>> in the group will take care of load balancing the extra load within the
>> local group later.

Thanks to both of you for the comments and explanations! :)
I know this patch's change is arguable and my attempt isn't tuned well yet,
but I believe the direction is correct. :) Let me explain this patch a bit more.

First, cpu_load already contains load history, so repeatedly decaying it and
then consuming that history again makes little sense. On top of that, the old
source_load()/target_load() picks either the history load or the current load
simply by taking the min/max, which has never had a good explanation either.

Second, the bias is already accounted for in source_load()/target_load(), yet
imbalance_pct is applied again as a final check when finding the idlest/busiest
group. That is redundant: if we take the source/target bias into account, we
should not apply imbalance_pct a second time.

Last, imbalance_pct gets over-applied as cpu core counts increase. Take
find_busiest_group() with a two-group domain where each group has 8 cores:
the target group is biased by 8 * (imbalance_pct - 100) = 8 * (125 - 100) = 200.
Since each cpu is biased by 0.25 times its load, the 8 cpus together bias the
comparison by 2 times the average cpu load between the groups, which is too
much. With only 2 cores per group (the common case when this code was
introduced), the bias is just 2 * 25 / 100 = 0.5 times the average cpu load.

So this patchset removes the cpu_load array to avoid the repeated history
decay, reorganizes the imbalance_pct usage to avoid redundant balance bias,
and reduces the bias between cpu groups -- maybe it isn't tuned well yet. :)

>
> Correct.
>
>> I may have missed something, but I don't understand the reason for the
>> performance improvements that you are reporting. I see better numbers
>> for a few benchmarks, but I still don't understand why the code makes
>> sense after the cleanup. If we don't understand why it works, we cannot
>> be sure that it doesn't harm other benchmarks. There is always a chance
>> that we miss something but, IMHO, not having any idea to begin with
>> increases the chances for problems later significantly. So why not get
>> to the bottom of the problem of cleaning up cpu_load?
>>
>> Have you done more extensive benchmarking? Have you seen any regressions
>> in other benchmarks?
>
> I only remember hackbench numbers, and that generally fares well with a
> more aggressive balancer: since it has no actual work to speak of, the
> migration penalty is very low, and because there's a metric ton of tasks
> the aggressive leveling makes for more coherent 'throughput'.

I just tested hackbench on ARM. With more test runs plus a rebase to 3.13-rc6,
the variation increased and the benefit became unclear; anyway, still no
regression found in either the perf-stat cpu-migration count or the real
execution time.

The 0day performance testing covers kbuild, hackbench, aim7, dbench, tbench,
sysbench, netperf, etc. No regression was found there. The 0day performance
testing also caught a reduction in cpu migrations on a KVM guest:
https://lkml.org/lkml/2013/12/21/135

And another benchmark got a benefit on the old patchset, as the results from
the 0day performance testing show: https://lkml.org/lkml/2013/12/4/102

Hi Alex,

We observed a 150% performance gain with the vm-scalability/300s-mmap-pread-seq
testcase with this patch applied. Here is a list of changes we got so far:

testbox : brickland
testcase: vm-scalability/300s-mmap-pread-seq

    f1b6442c7dd12802e622      d70495ef86f397816d73
         (parent commit)             (this commit)

         26393249.80     +150.9%      66223933.60   vm-scalability.throughput

              225.12      -49.9%           112.75   time.elapsed_time
            36333.40      -90.7%          3392.20   vmstat.system.cs
                2.40     +375.0%            11.40   vmstat.cpu.id
          3770081.60      -97.7%         87673.40   time.major_page_faults
          3975276.20      -97.0%        117409.60   time.voluntary_context_switches
                3.05     +301.7%            12.24   iostat.cpu.idle
            21118.41      -70.3%          6277.19   time.system_time
               18.40     +130.4%            42.40   vmstat.cpu.us
               77.00      -41.3%            45.20   vmstat.cpu.sy
            47459.60      -31.3%         32592.20   vmstat.system.in
            82435.40      -12.1%         72443.60   time.involuntary_context_switches
Re: [PATCH 4/4] sched: bias to target cpu load to reduce task moving
On Tue, Jan 07, 2014 at 03:16:32PM +, Morten Rasmussen wrote:
> From a load perspective wouldn't it be better to pick the least loaded
> cpu in the group? It is not cheap to implement, but in theory it should
> give less balancing within the group later and less unfairness until it
> happens.

I tried that; see 04f733b4afac5dc93ae9b0a8703c60b87def491e for why it
doesn't work.

> Rotating the cpu is probably good enough for most cases and certainly
> easier to implement.

Indeed.

> The bias continues after the first round of load balance by the other
> cpus?

The cost, yes. Even when perfectly balanced, we still get to iterate the
entire machine computing s[gd]_lb_stats to find out we're good and don't
need to move tasks about.

> Pulling everything to one cpu is not ideal from a performance point of
> view. You lose some available cpu cycles until the balance settles.
> However, it is not easy to do better and maintain scalability at the
> same time.

Right, it's part of the cost we pay for scaling better.

And rotating this cost around a bit would alleviate the obvious bias.
Re: [PATCH 4/4] sched: bias to target cpu load to reduce task moving
On Tue, Jan 07, 2014 at 01:15:23PM +, Peter Zijlstra wrote:
> On Tue, Jan 07, 2014 at 01:59:30PM +0100, Peter Zijlstra wrote:
> > On Tue, Jan 07, 2014 at 12:55:18PM +, Morten Rasmussen wrote:
> > > My understanding is that should_we_balance() decides which cpu is
> > > eligible for doing the load balancing for a given domain (and the
> > > domains above). That is, only one cpu in a group is allowed to load
> > > balance between the local group and other groups. That cpu would
> > > therefore be responsible for pulling enough load that the groups are
> > > balanced even if it means temporarily overloading itself. The other cpus
> > > in the group will take care of load balancing the extra load within the
> > > local group later.
> >
> > Correct.
>
> On that; one of the things I wanted to (and previously did attempt but
> failed) is trying to rotate this cpu. Currently it's always the first cpu
> (of the group) and that gives a noticeable bias.
>
> If we could slowly rotate the cpu that does this, that would alleviate
> both the load and cost bias.

From a load perspective wouldn't it be better to pick the least loaded
cpu in the group? It is not cheap to implement, but in theory it should
give less balancing within the group later and less unfairness until it
happens.

Rotating the cpu is probably good enough for most cases and certainly
easier to implement.

>
> One thing I was thinking of is keeping a global counter maybe:
> 'x := jiffies >> n'
> might be good enough and using the 'x % nr_cpus_in_group'-th cpu
> instead.
>
> Then again, these are micro issues and not a lot of people complain
> about this.

The bias continues after the first round of load balance by the other
cpus?

Pulling everything to one cpu is not ideal from a performance point of
view. You lose some available cpu cycles until the balance settles.
However, it is not easy to do better and maintain scalability at the
same time.
Re: [PATCH 4/4] sched: bias to target cpu load to reduce task moving
On Tue, Jan 07, 2014 at 02:32:07PM +0100, Vincent Guittot wrote:
> On 7 January 2014 14:15, Peter Zijlstra wrote:
> > On Tue, Jan 07, 2014 at 01:59:30PM +0100, Peter Zijlstra wrote:
> >> On Tue, Jan 07, 2014 at 12:55:18PM +, Morten Rasmussen wrote:
> >> > My understanding is that should_we_balance() decides which cpu is
> >> > eligible for doing the load balancing for a given domain (and the
> >> > domains above). That is, only one cpu in a group is allowed to load
> >> > balance between the local group and other groups. That cpu would
> >> > therefore be responsible for pulling enough load that the groups are
> >> > balanced even if it means temporarily overloading itself. The other cpus
> >> > in the group will take care of load balancing the extra load within the
> >> > local group later.
> >>
> >> Correct.
> >
> > On that; one of the things I wanted to (and previously did attempt but
> > failed) is trying to rotate this cpu. Currently it's always the first cpu
> > (of the group) and that gives a noticeable bias.
>
> Isn't the current policy (the 1st idle cpu gets priority) a good enough
> way to rotate the cpus? Do you need the rotation for the loaded use case
> too?

Yeah, it's for the fully loaded case. And like I said, there haven't been
many complaints about this.

The 'problem' is that it's always the same cpu that does the most expensive
full-machine balance, and always that cpu that is the one that gains extra
load to redistribute in the group. So it's penalized twice.

Like I said, a really minor issue. Just something I thought I'd mention.
Re: [PATCH 4/4] sched: bias to target cpu load to reduce task moving
On 7 January 2014 14:15, Peter Zijlstra wrote:
> On Tue, Jan 07, 2014 at 01:59:30PM +0100, Peter Zijlstra wrote:
>> On Tue, Jan 07, 2014 at 12:55:18PM +, Morten Rasmussen wrote:
>> > My understanding is that should_we_balance() decides which cpu is
>> > eligible for doing the load balancing for a given domain (and the
>> > domains above). That is, only one cpu in a group is allowed to load
>> > balance between the local group and other groups. That cpu would
>> > therefore be responsible for pulling enough load that the groups are
>> > balanced even if it means temporarily overloading itself. The other cpus
>> > in the group will take care of load balancing the extra load within the
>> > local group later.
>>
>> Correct.
>
> On that; one of the things I wanted to (and previously did attempt but
> failed) is trying to rotate this cpu. Currently it's always the first cpu
> (of the group) and that gives a noticeable bias.

Isn't the current policy (the 1st idle cpu gets priority) a good enough
way to rotate the cpus? Do you need the rotation for the loaded use case
too?

> If we could slowly rotate the cpu that does this, that would alleviate
> both the load and cost bias.
>
> One thing I was thinking of is keeping a global counter maybe:
> 'x := jiffies >> n'
> might be good enough and using the 'x % nr_cpus_in_group'-th cpu
> instead.
>
> Then again, these are micro issues and not a lot of people complain
> about this.
Re: [PATCH 4/4] sched: bias to target cpu load to reduce task moving
On Tue, Jan 07, 2014 at 01:59:30PM +0100, Peter Zijlstra wrote:
> On Tue, Jan 07, 2014 at 12:55:18PM +, Morten Rasmussen wrote:
> > My understanding is that should_we_balance() decides which cpu is
> > eligible for doing the load balancing for a given domain (and the
> > domains above). That is, only one cpu in a group is allowed to load
> > balance between the local group and other groups. That cpu would
> > therefore be responsible for pulling enough load that the groups are
> > balanced even if it means temporarily overloading itself. The other cpus
> > in the group will take care of load balancing the extra load within the
> > local group later.
>
> Correct.

On that; one of the things I wanted to (and previously did attempt but
failed) is trying to rotate this cpu. Currently it's always the first cpu
(of the group) and that gives a noticeable bias.

If we could slowly rotate the cpu that does this, that would alleviate
both the load and cost bias.

One thing I was thinking of is keeping a global counter maybe:
'x := jiffies >> n'
might be good enough and using the 'x % nr_cpus_in_group'-th cpu
instead.

Then again, these are micro issues and not a lot of people complain
about this.
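To make the counter idea above concrete, here is a minimal sketch (hypothetical code, not from any posted patch; the helper name and the shift value are made up) of how the balancing cpu could be rotated instead of always being the first cpu of the group:

/*
 * Sketch only: pick a slowly rotating cpu out of the group, so the cost
 * of the full-machine balance and the temporary extra load are not
 * always paid by the same (first) cpu.
 */
static int rotated_balance_cpu(struct sched_group *sg)
{
	struct cpumask *mask = sched_group_cpus(sg);
	unsigned int nr = cpumask_weight(mask);
	/* 'x := jiffies >> n'; n = 10 is a guess, roughly 1s at HZ=1000 */
	unsigned int x = jiffies >> 10;
	unsigned int target = x % nr;	/* the 'x % nr_cpus_in_group'-th cpu */
	int cpu;

	for_each_cpu(cpu, mask) {
		if (!target--)
			break;
	}
	return cpu;
}

In the fully loaded case, should_we_balance() could then compare env->dst_cpu against this rotating cpu rather than against group_balance_cpu(sg), which is what keeps the cost pinned to the first cpu of the group today.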
Re: [PATCH 4/4] sched: bias to target cpu load to reduce task moving
On Tue, Jan 07, 2014 at 12:55:18PM +, Morten Rasmussen wrote:
> My understanding is that should_we_balance() decides which cpu is
> eligible for doing the load balancing for a given domain (and the
> domains above). That is, only one cpu in a group is allowed to load
> balance between the local group and other groups. That cpu would
> therefore be responsible for pulling enough load that the groups are
> balanced even if it means temporarily overloading itself. The other cpus
> in the group will take care of load balancing the extra load within the
> local group later.

Correct.

> I may have missed something, but I don't understand the reason for the
> performance improvements that you are reporting. I see better numbers
> for a few benchmarks, but I still don't understand why the code makes
> sense after the cleanup. If we don't understand why it works, we cannot
> be sure that it doesn't harm other benchmarks. There is always a chance
> that we miss something but, IMHO, not having any idea to begin with
> increases the chances for problems later significantly. So why not get
> to the bottom of the problem of cleaning up cpu_load?
>
> Have you done more extensive benchmarking? Have you seen any regressions
> in other benchmarks?

I only remember hackbench numbers, and that generally fares well with a
more aggressive balancer: since it has no actual work to speak of, the
migration penalty is very low, and because there's a metric ton of tasks
the aggressive leveling makes for more coherent 'throughput'.
Re: [PATCH 4/4] sched: bias to target cpu load to reduce task moving
On Mon, Jan 06, 2014 at 01:35:39PM +, Alex Shi wrote: > On 01/03/2014 12:04 AM, Morten Rasmussen wrote: > > On Wed, Dec 25, 2013 at 02:58:26PM +, Alex Shi wrote: > >> > From 5cd67d975001edafe2ee820e0be5d86881a23bd6 Mon Sep 17 00:00:00 2001 > From: Alex Shi > Date: Sat, 23 Nov 2013 23:18:09 +0800 > Subject: [PATCH 4/4] sched: bias to target cpu load to reduce task moving > > Task migration happens when target just a bit less then source cpu load. > To reduce such situation happens, aggravate the target cpu load with > sd->imbalance_pct/100 in wake_affine. > > In find_idlest/busiest_group, change the aggravate to local cpu only > from old group aggravation. > > on my pandaboard ES. > > latest kernel 527d1511310a89+ whole patchset > hackbench -T -g 10 -f 40 > 23.25" 21.99" > 23.16" 21.20" > 24.24" 21.89" > hackbench -p -g 10 -f 40 > 26.52" 21.46" > 23.89" 22.96" > 25.65" 22.73" > hackbench -P -g 10 -f 40 > 20.14" 19.72" > 19.96" 19.10" > 21.76" 20.03" > > Signed-off-by: Alex Shi > --- > kernel/sched/fair.c | 35 --- > 1 file changed, 16 insertions(+), 19 deletions(-) > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > index bccdd89..3623ba4 100644 > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -978,7 +978,7 @@ static inline unsigned long group_weight(struct > task_struct *p, int nid) > > static unsigned long weighted_cpuload(const int cpu); > static unsigned long source_load(int cpu); > -static unsigned long target_load(int cpu); > +static unsigned long target_load(int cpu, int imbalance_pct); > static unsigned long power_of(int cpu); > static long effective_load(struct task_group *tg, int cpu, long wl, > long wg); > > @@ -3809,11 +3809,17 @@ static unsigned long source_load(int cpu) > * Return a high guess at the load of a migration-target cpu weighted > * according to the scheduling class and "nice" value. 
> */ > -static unsigned long target_load(int cpu) > +static unsigned long target_load(int cpu, int imbalance_pct) > { > struct rq *rq = cpu_rq(cpu); > unsigned long total = weighted_cpuload(cpu); > > + /* > + * without cpu_load decay, in most of time cpu_load is same as total > + * so we need to make target a bit heavier to reduce task migration > + */ > + total = total * imbalance_pct / 100; > + > if (!sched_feat(LB_BIAS)) > return total; > > @@ -4033,7 +4039,7 @@ static int wake_affine(struct sched_domain *sd, > struct task_struct *p, int sync) > this_cpu = smp_processor_id(); > prev_cpu = task_cpu(p); > load = source_load(prev_cpu); > - this_load = target_load(this_cpu); > + this_load = target_load(this_cpu, 100); > > /* > * If sync wakeup then subtract the (maximum possible) > @@ -4089,7 +4095,7 @@ static int wake_affine(struct sched_domain *sd, > struct task_struct *p, int sync) > > if (balanced || > (this_load <= load && > - this_load + target_load(prev_cpu) <= tl_per_task)) { > + this_load + target_load(prev_cpu, 100) <= tl_per_task)) { > /* > * This domain has SD_WAKE_AFFINE and > * p is cache cold in this domain, and > @@ -4112,7 +4118,6 @@ find_idlest_group(struct sched_domain *sd, struct > task_struct *p, int this_cpu) > { > struct sched_group *idlest = NULL, *group = sd->groups; > unsigned long min_load = ULONG_MAX, this_load = 0; > - int imbalance = 100 + (sd->imbalance_pct-100)/2; > > do { > unsigned long load, avg_load; > @@ -4132,10 +4137,10 @@ find_idlest_group(struct sched_domain *sd, > struct task_struct *p, int this_cpu) > > for_each_cpu(i, sched_group_cpus(group)) { > /* Bias balancing toward cpus of our domain */ > - if (local_group) > + if (i == this_cpu) > >>> > >>> What is the motivation for changing the local_group load calculation? > >>> Now the load contributions of all cpus in the local group, except > >>> this_cpu, will contribute more as their contribution (this_load) is > >>> determined using target_load() instead. > >> > >> This part code 147cbb4bbe99, written i
Re: [PATCH 4/4] sched: bias to target cpu load to reduce task moving
On 01/03/2014 12:04 AM, Morten Rasmussen wrote: > On Wed, Dec 25, 2013 at 02:58:26PM +, Alex Shi wrote: >> From 5cd67d975001edafe2ee820e0be5d86881a23bd6 Mon Sep 17 00:00:00 2001 From: Alex Shi Date: Sat, 23 Nov 2013 23:18:09 +0800 Subject: [PATCH 4/4] sched: bias to target cpu load to reduce task moving Task migration happens when target just a bit less then source cpu load. To reduce such situation happens, aggravate the target cpu load with sd->imbalance_pct/100 in wake_affine. In find_idlest/busiest_group, change the aggravate to local cpu only from old group aggravation. on my pandaboard ES. latest kernel 527d1511310a89+ whole patchset hackbench -T -g 10 -f 40 23.25" 21.99" 23.16" 21.20" 24.24" 21.89" hackbench -p -g 10 -f 40 26.52" 21.46" 23.89" 22.96" 25.65" 22.73" hackbench -P -g 10 -f 40 20.14" 19.72" 19.96" 19.10" 21.76" 20.03" Signed-off-by: Alex Shi --- kernel/sched/fair.c | 35 --- 1 file changed, 16 insertions(+), 19 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index bccdd89..3623ba4 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -978,7 +978,7 @@ static inline unsigned long group_weight(struct task_struct *p, int nid) static unsigned long weighted_cpuload(const int cpu); static unsigned long source_load(int cpu); -static unsigned long target_load(int cpu); +static unsigned long target_load(int cpu, int imbalance_pct); static unsigned long power_of(int cpu); static long effective_load(struct task_group *tg, int cpu, long wl, long wg); @@ -3809,11 +3809,17 @@ static unsigned long source_load(int cpu) * Return a high guess at the load of a migration-target cpu weighted * according to the scheduling class and "nice" value. */ -static unsigned long target_load(int cpu) +static unsigned long target_load(int cpu, int imbalance_pct) { struct rq *rq = cpu_rq(cpu); unsigned long total = weighted_cpuload(cpu); + /* + * without cpu_load decay, in most of time cpu_load is same as total + * so we need to make target a bit heavier to reduce task migration + */ + total = total * imbalance_pct / 100; + if (!sched_feat(LB_BIAS)) return total; @@ -4033,7 +4039,7 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p, int sync) this_cpu = smp_processor_id(); prev_cpu = task_cpu(p); load = source_load(prev_cpu); - this_load = target_load(this_cpu); + this_load = target_load(this_cpu, 100); /* * If sync wakeup then subtract the (maximum possible) @@ -4089,7 +4095,7 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p, int sync) if (balanced || (this_load <= load && - this_load + target_load(prev_cpu) <= tl_per_task)) { + this_load + target_load(prev_cpu, 100) <= tl_per_task)) { /* * This domain has SD_WAKE_AFFINE and * p is cache cold in this domain, and @@ -4112,7 +4118,6 @@ find_idlest_group(struct sched_domain *sd, struct task_struct *p, int this_cpu) { struct sched_group *idlest = NULL, *group = sd->groups; unsigned long min_load = ULONG_MAX, this_load = 0; - int imbalance = 100 + (sd->imbalance_pct-100)/2; do { unsigned long load, avg_load; @@ -4132,10 +4137,10 @@ find_idlest_group(struct sched_domain *sd, struct task_struct *p, int this_cpu) for_each_cpu(i, sched_group_cpus(group)) { /* Bias balancing toward cpus of our domain */ - if (local_group) + if (i == this_cpu) >>> >>> What is the motivation for changing the local_group load calculation? 
>>> Now the load contributions of all cpus in the local group, except >>> this_cpu, will contribute more as their contribution (this_load) is >>> determined using target_load() instead. >> >> This part code 147cbb4bbe99, written in 2005 for x86, at that time, only >> 2 cores(guess no HT at that time) in cpu socket. With the cores number > > NUMA support was already present. I guess that means support for systems > with significantly more than two cpus. Thanks a lot for comments, Morten! the
Re: [PATCH 4/4] sched: bias to target cpu load to reduce task moving
On Wed, Dec 25, 2013 at 02:58:26PM +, Alex Shi wrote: > > >> From 5cd67d975001edafe2ee820e0be5d86881a23bd6 Mon Sep 17 00:00:00 2001 > >> From: Alex Shi > >> Date: Sat, 23 Nov 2013 23:18:09 +0800 > >> Subject: [PATCH 4/4] sched: bias to target cpu load to reduce task moving > >> > >> Task migration happens when target just a bit less then source cpu load. > >> To reduce such situation happens, aggravate the target cpu load with > >> sd->imbalance_pct/100 in wake_affine. > >> > >> In find_idlest/busiest_group, change the aggravate to local cpu only > >> from old group aggravation. > >> > >> on my pandaboard ES. > >> > >>latest kernel 527d1511310a89+ whole patchset > >> hackbench -T -g 10 -f 40 > >>23.25" 21.99" > >>23.16" 21.20" > >>24.24" 21.89" > >> hackbench -p -g 10 -f 40 > >>26.52" 21.46" > >>23.89" 22.96" > >>25.65" 22.73" > >> hackbench -P -g 10 -f 40 > >>20.14" 19.72" > >>19.96" 19.10" > >>21.76" 20.03" > >> > >> Signed-off-by: Alex Shi > >> --- > >> kernel/sched/fair.c | 35 --- > >> 1 file changed, 16 insertions(+), 19 deletions(-) > >> > >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > >> index bccdd89..3623ba4 100644 > >> --- a/kernel/sched/fair.c > >> +++ b/kernel/sched/fair.c > >> @@ -978,7 +978,7 @@ static inline unsigned long group_weight(struct > >> task_struct *p, int nid) > >> > >> static unsigned long weighted_cpuload(const int cpu); > >> static unsigned long source_load(int cpu); > >> -static unsigned long target_load(int cpu); > >> +static unsigned long target_load(int cpu, int imbalance_pct); > >> static unsigned long power_of(int cpu); > >> static long effective_load(struct task_group *tg, int cpu, long wl, long > >> wg); > >> > >> @@ -3809,11 +3809,17 @@ static unsigned long source_load(int cpu) > >> * Return a high guess at the load of a migration-target cpu weighted > >> * according to the scheduling class and "nice" value. 
> >> */ > >> -static unsigned long target_load(int cpu) > >> +static unsigned long target_load(int cpu, int imbalance_pct) > >> { > >>struct rq *rq = cpu_rq(cpu); > >>unsigned long total = weighted_cpuload(cpu); > >> > >> + /* > >> + * without cpu_load decay, in most of time cpu_load is same as total > >> + * so we need to make target a bit heavier to reduce task migration > >> + */ > >> + total = total * imbalance_pct / 100; > >> + > >>if (!sched_feat(LB_BIAS)) > >>return total; > >> > >> @@ -4033,7 +4039,7 @@ static int wake_affine(struct sched_domain *sd, > >> struct task_struct *p, int sync) > >>this_cpu = smp_processor_id(); > >>prev_cpu = task_cpu(p); > >>load = source_load(prev_cpu); > >> - this_load = target_load(this_cpu); > >> + this_load = target_load(this_cpu, 100); > >> > >>/* > >> * If sync wakeup then subtract the (maximum possible) > >> @@ -4089,7 +4095,7 @@ static int wake_affine(struct sched_domain *sd, > >> struct task_struct *p, int sync) > >> > >>if (balanced || > >>(this_load <= load && > >> - this_load + target_load(prev_cpu) <= tl_per_task)) { > >> + this_load + target_load(prev_cpu, 100) <= tl_per_task)) { > >>/* > >> * This domain has SD_WAKE_AFFINE and > >> * p is cache cold in this domain, and > >> @@ -4112,7 +4118,6 @@ find_idlest_group(struct sched_domain *sd, struct > >> task_struct *p, int this_cpu) > >> { > >>struct sched_group *idlest = NULL, *group = sd->groups; > >>unsigned long min_load = ULONG_MAX, this_load = 0; > >> - int imbalance = 100 + (sd->imbalance_pct-100)/2; > >> > >>do { > >>unsigned long load, avg_load; > >> @@ -4132,10 +4137,10 @@ find_idlest_group(struct sched_domain *sd, struct > >> task_struct *p, int this_cpu) > >> > >>for_each_cpu(i, sched_group_cpus(group)) { > >>/* Bias balancing toward cpus of our domain */ > >> - if (local_group) > >> + if (i == this_cpu) > > > > What is the motivation for changing the local_group load calculation? > > Now the load contributions of all cpus in the local group, except > > this_cpu, will contribute more as their contribution (this_load) is > > determined using target_load() instead. > > This part code 147cbb4bbe99, written in 2005 for x86, at that time, only > 2 cores(guess no HT at that time) in cpu socket. With the cores number NUMA support was already present. I guess that means support for systems with significantly more than two cpus. > increasing trend, the sched_group become large and large, to give whole > group this bias valu
Re: [PATCH 4/4] sched: bias to target cpu load to reduce task moving
>> From 5cd67d975001edafe2ee820e0be5d86881a23bd6 Mon Sep 17 00:00:00 2001 >> From: Alex Shi >> Date: Sat, 23 Nov 2013 23:18:09 +0800 >> Subject: [PATCH 4/4] sched: bias to target cpu load to reduce task moving >> >> Task migration happens when target just a bit less then source cpu load. >> To reduce such situation happens, aggravate the target cpu load with >> sd->imbalance_pct/100 in wake_affine. >> >> In find_idlest/busiest_group, change the aggravate to local cpu only >> from old group aggravation. >> >> on my pandaboard ES. >> >> latest kernel 527d1511310a89+ whole patchset >> hackbench -T -g 10 -f 40 >> 23.25" 21.99" >> 23.16" 21.20" >> 24.24" 21.89" >> hackbench -p -g 10 -f 40 >> 26.52" 21.46" >> 23.89" 22.96" >> 25.65" 22.73" >> hackbench -P -g 10 -f 40 >> 20.14" 19.72" >> 19.96" 19.10" >> 21.76" 20.03" >> >> Signed-off-by: Alex Shi >> --- >> kernel/sched/fair.c | 35 --- >> 1 file changed, 16 insertions(+), 19 deletions(-) >> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c >> index bccdd89..3623ba4 100644 >> --- a/kernel/sched/fair.c >> +++ b/kernel/sched/fair.c >> @@ -978,7 +978,7 @@ static inline unsigned long group_weight(struct >> task_struct *p, int nid) >> >> static unsigned long weighted_cpuload(const int cpu); >> static unsigned long source_load(int cpu); >> -static unsigned long target_load(int cpu); >> +static unsigned long target_load(int cpu, int imbalance_pct); >> static unsigned long power_of(int cpu); >> static long effective_load(struct task_group *tg, int cpu, long wl, long >> wg); >> >> @@ -3809,11 +3809,17 @@ static unsigned long source_load(int cpu) >> * Return a high guess at the load of a migration-target cpu weighted >> * according to the scheduling class and "nice" value. >> */ >> -static unsigned long target_load(int cpu) >> +static unsigned long target_load(int cpu, int imbalance_pct) >> { >> struct rq *rq = cpu_rq(cpu); >> unsigned long total = weighted_cpuload(cpu); >> >> +/* >> + * without cpu_load decay, in most of time cpu_load is same as total >> + * so we need to make target a bit heavier to reduce task migration >> + */ >> +total = total * imbalance_pct / 100; >> + >> if (!sched_feat(LB_BIAS)) >> return total; >> >> @@ -4033,7 +4039,7 @@ static int wake_affine(struct sched_domain *sd, struct >> task_struct *p, int sync) >> this_cpu = smp_processor_id(); >> prev_cpu = task_cpu(p); >> load = source_load(prev_cpu); >> -this_load = target_load(this_cpu); >> +this_load = target_load(this_cpu, 100); >> >> /* >> * If sync wakeup then subtract the (maximum possible) >> @@ -4089,7 +4095,7 @@ static int wake_affine(struct sched_domain *sd, struct >> task_struct *p, int sync) >> >> if (balanced || >> (this_load <= load && >> - this_load + target_load(prev_cpu) <= tl_per_task)) { >> + this_load + target_load(prev_cpu, 100) <= tl_per_task)) { >> /* >> * This domain has SD_WAKE_AFFINE and >> * p is cache cold in this domain, and >> @@ -4112,7 +4118,6 @@ find_idlest_group(struct sched_domain *sd, struct >> task_struct *p, int this_cpu) >> { >> struct sched_group *idlest = NULL, *group = sd->groups; >> unsigned long min_load = ULONG_MAX, this_load = 0; >> -int imbalance = 100 + (sd->imbalance_pct-100)/2; >> >> do { >> unsigned long load, avg_load; >> @@ -4132,10 +4137,10 @@ find_idlest_group(struct sched_domain *sd, struct >> task_struct *p, int this_cpu) >> >> for_each_cpu(i, sched_group_cpus(group)) { >> /* Bias balancing toward cpus of our domain */ >> -if (local_group) >> +if (i == this_cpu) > > What is the motivation for changing the local_group 
load calculation?
> Now the load contributions of all cpus in the local group, except
> this_cpu, will contribute more as their contribution (this_load) is
> determined using target_load() instead.

This part of the code, 147cbb4bbe99, was written in 2005 for x86; at that
time there were only 2 cores (and probably no HT) per cpu socket. With the
trend of increasing core counts, sched_groups are becoming larger and
larger, so giving this bias value to the whole group no longer makes sense.
It therefore looks reasonable to bias just this cpu.

>
> If I'm not mistaken, that will lead to more frequent load balancing as
> the local_group bias has been reduced. That is the opposite of your
> intentions based on your comment in target_load().
>
>> load =
Re: [PATCH 4/4] sched: bias to target cpu load to reduce task moving
On 12/20/2013 07:19 PM, Morten Rasmussen wrote: >> @@ -4132,10 +4137,10 @@ find_idlest_group(struct sched_domain *sd, struct >> task_struct *p, int this_cpu) >> > >> >for_each_cpu(i, sched_group_cpus(group)) { >> >/* Bias balancing toward cpus of our domain */ >> > - if (local_group) >> > + if (i == this_cpu) > What is the motivation for changing the local_group load calculation? > Now the load contributions of all cpus in the local group, except > this_cpu, will contribute more as their contribution (this_load) is > determined using target_load() instead. > > If I'm not mistaken, that will lead to more frequent load balancing as > the local_group bias has been reduced. That is the opposite of your > intentions based on your comment in target_load(). Good catch. will reconsider this again. :) > >> >load = source_load(i); >> >else >> > - load = target_load(i); >> > + load = target_load(i, sd->imbalance_pct); > You scale by sd->imbalance_pct instead of 100+(sd->imbalance_pct-100)/2 > that you removed above. sd->imbalance_pct may have been arbitrarily > chosen in the past, but changing it may affect behavior. > -- Thanks Alex -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 4/4] sched: bias to target cpu load to reduce task moving
On Thu, Dec 19, 2013 at 01:34:08PM +, Alex Shi wrote: > On 12/17/2013 11:38 PM, Peter Zijlstra wrote: > > On Tue, Dec 17, 2013 at 02:10:12PM +, Morten Rasmussen wrote: > >>> @@ -4135,7 +4141,7 @@ find_idlest_group(struct sched_domain *sd, struct > >>> task_struct *p, int this_cpu) > >>> if (local_group) > >>> load = source_load(i); > >>> else > >>> - load = target_load(i); > >>> + load = target_load(i, sd->imbalance_pct); > >> > >> Don't you apply imbalance_pct twice here? Later on in > >> find_idlest_group() you have: > >> > >>if (!idlest || 100*this_load < imbalance*min_load) > >>return NULL; > >> > >> where min_load comes from target_load(). > > > > Yes! exactly! this doesn't make any sense. > > Thanks a lot for review and comments! > > I changed the patch to following shape. and push it under Fengguang's testing > system monitor. Any testing are appreciated! > > BTW, Seems lots of changes in scheduler come from kinds of > scenarios/benchmarks > experience. But I still like to take any theoretical comments/suggestions. > > -- > Thanks > Alex > > === > > From 5cd67d975001edafe2ee820e0be5d86881a23bd6 Mon Sep 17 00:00:00 2001 > From: Alex Shi > Date: Sat, 23 Nov 2013 23:18:09 +0800 > Subject: [PATCH 4/4] sched: bias to target cpu load to reduce task moving > > Task migration happens when target just a bit less then source cpu load. > To reduce such situation happens, aggravate the target cpu load with > sd->imbalance_pct/100 in wake_affine. > > In find_idlest/busiest_group, change the aggravate to local cpu only > from old group aggravation. > > on my pandaboard ES. > > latest kernel 527d1511310a89+ whole patchset > hackbench -T -g 10 -f 40 > 23.25" 21.99" > 23.16" 21.20" > 24.24" 21.89" > hackbench -p -g 10 -f 40 > 26.52" 21.46" > 23.89" 22.96" > 25.65" 22.73" > hackbench -P -g 10 -f 40 > 20.14" 19.72" > 19.96" 19.10" > 21.76" 20.03" > > Signed-off-by: Alex Shi > --- > kernel/sched/fair.c | 35 --- > 1 file changed, 16 insertions(+), 19 deletions(-) > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > index bccdd89..3623ba4 100644 > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -978,7 +978,7 @@ static inline unsigned long group_weight(struct > task_struct *p, int nid) > > static unsigned long weighted_cpuload(const int cpu); > static unsigned long source_load(int cpu); > -static unsigned long target_load(int cpu); > +static unsigned long target_load(int cpu, int imbalance_pct); > static unsigned long power_of(int cpu); > static long effective_load(struct task_group *tg, int cpu, long wl, long wg); > > @@ -3809,11 +3809,17 @@ static unsigned long source_load(int cpu) > * Return a high guess at the load of a migration-target cpu weighted > * according to the scheduling class and "nice" value. 
> */ > -static unsigned long target_load(int cpu) > +static unsigned long target_load(int cpu, int imbalance_pct) > { > struct rq *rq = cpu_rq(cpu); > unsigned long total = weighted_cpuload(cpu); > > + /* > + * without cpu_load decay, in most of time cpu_load is same as total > + * so we need to make target a bit heavier to reduce task migration > + */ > + total = total * imbalance_pct / 100; > + > if (!sched_feat(LB_BIAS)) > return total; > > @@ -4033,7 +4039,7 @@ static int wake_affine(struct sched_domain *sd, struct > task_struct *p, int sync) > this_cpu = smp_processor_id(); > prev_cpu = task_cpu(p); > load = source_load(prev_cpu); > - this_load = target_load(this_cpu); > + this_load = target_load(this_cpu, 100); > > /* >* If sync wakeup then subtract the (maximum possible) > @@ -4089,7 +4095,7 @@ static int wake_affine(struct sched_domain *sd, struct > task_struct *p, int sync) > > if (balanced || > (this_load <= load && > - this_load + target_load(prev_cpu) <= tl_per_task)) { > + this_load + target_load(prev_cpu, 100) <= tl_per_task)) { > /* >* This domain has SD_WAKE_AFFINE and >* p is cache cold in this domain, and > @@ -4112,7 +4118,6 @@ find_idlest_group(struct sched_domain *sd, struct > task_struct *p, int this_cpu) > { > struct sched_group *idlest = NULL, *group = sd->groups; > unsigned long min_load = ULONG_MAX, this_load = 0; > - int imbalance = 100 + (sd->imbalance_pct-100)/2; > > do { > unsi
Re: [PATCH 4/4] sched: bias to target cpu load to reduce task moving
On 12/17/2013 11:38 PM, Peter Zijlstra wrote: > On Tue, Dec 17, 2013 at 02:10:12PM +, Morten Rasmussen wrote: >>> @@ -4135,7 +4141,7 @@ find_idlest_group(struct sched_domain *sd, struct >>> task_struct *p, int this_cpu) >>> if (local_group) >>> load = source_load(i); >>> else >>> - load = target_load(i); >>> + load = target_load(i, sd->imbalance_pct); >> >> Don't you apply imbalance_pct twice here? Later on in >> find_idlest_group() you have: >> >> if (!idlest || 100*this_load < imbalance*min_load) >> return NULL; >> >> where min_load comes from target_load(). > > Yes! exactly! this doesn't make any sense. Thanks a lot for review and comments! I changed the patch to following shape. and push it under Fengguang's testing system monitor. Any testing are appreciated! BTW, Seems lots of changes in scheduler come from kinds of scenarios/benchmarks experience. But I still like to take any theoretical comments/suggestions. -- Thanks Alex === >From 5cd67d975001edafe2ee820e0be5d86881a23bd6 Mon Sep 17 00:00:00 2001 From: Alex Shi Date: Sat, 23 Nov 2013 23:18:09 +0800 Subject: [PATCH 4/4] sched: bias to target cpu load to reduce task moving Task migration happens when target just a bit less then source cpu load. To reduce such situation happens, aggravate the target cpu load with sd->imbalance_pct/100 in wake_affine. In find_idlest/busiest_group, change the aggravate to local cpu only from old group aggravation. on my pandaboard ES. latest kernel 527d1511310a89+ whole patchset hackbench -T -g 10 -f 40 23.25" 21.99" 23.16" 21.20" 24.24" 21.89" hackbench -p -g 10 -f 40 26.52" 21.46" 23.89" 22.96" 25.65" 22.73" hackbench -P -g 10 -f 40 20.14" 19.72" 19.96" 19.10" 21.76" 20.03" Signed-off-by: Alex Shi --- kernel/sched/fair.c | 35 --- 1 file changed, 16 insertions(+), 19 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index bccdd89..3623ba4 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -978,7 +978,7 @@ static inline unsigned long group_weight(struct task_struct *p, int nid) static unsigned long weighted_cpuload(const int cpu); static unsigned long source_load(int cpu); -static unsigned long target_load(int cpu); +static unsigned long target_load(int cpu, int imbalance_pct); static unsigned long power_of(int cpu); static long effective_load(struct task_group *tg, int cpu, long wl, long wg); @@ -3809,11 +3809,17 @@ static unsigned long source_load(int cpu) * Return a high guess at the load of a migration-target cpu weighted * according to the scheduling class and "nice" value. 
*/ -static unsigned long target_load(int cpu) +static unsigned long target_load(int cpu, int imbalance_pct) { struct rq *rq = cpu_rq(cpu); unsigned long total = weighted_cpuload(cpu); + /* +* without cpu_load decay, in most of time cpu_load is same as total +* so we need to make target a bit heavier to reduce task migration +*/ + total = total * imbalance_pct / 100; + if (!sched_feat(LB_BIAS)) return total; @@ -4033,7 +4039,7 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p, int sync) this_cpu = smp_processor_id(); prev_cpu = task_cpu(p); load = source_load(prev_cpu); - this_load = target_load(this_cpu); + this_load = target_load(this_cpu, 100); /* * If sync wakeup then subtract the (maximum possible) @@ -4089,7 +4095,7 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p, int sync) if (balanced || (this_load <= load && -this_load + target_load(prev_cpu) <= tl_per_task)) { +this_load + target_load(prev_cpu, 100) <= tl_per_task)) { /* * This domain has SD_WAKE_AFFINE and * p is cache cold in this domain, and @@ -4112,7 +4118,6 @@ find_idlest_group(struct sched_domain *sd, struct task_struct *p, int this_cpu) { struct sched_group *idlest = NULL, *group = sd->groups; unsigned long min_load = ULONG_MAX, this_load = 0; - int imbalance = 100 + (sd->imbalance_pct-100)/2; do { unsigned long load, avg_load; @@ -4132,10 +4137,10 @@ find_idlest_group(struct sched_domain *sd, struct task_struct *p, int this_cpu) for_each_cpu(i, sched_group_cpus(group)) { /* Bias bala
Re: [PATCH 4/4] sched: bias to target cpu load to reduce task moving
On Tue, Dec 17, 2013 at 02:10:12PM +, Morten Rasmussen wrote: > > @@ -4135,7 +4141,7 @@ find_idlest_group(struct sched_domain *sd, struct > > task_struct *p, int this_cpu) > > if (local_group) > > load = source_load(i); > > else > > - load = target_load(i); > > + load = target_load(i, sd->imbalance_pct); > > Don't you apply imbalance_pct twice here? Later on in > find_idlest_group() you have: > > if (!idlest || 100*this_load < imbalance*min_load) > return NULL; > > where min_load comes from target_load(). Yes! exactly! this doesn't make any sense. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 4/4] sched: bias to target cpu load to reduce task moving
On Tue, Dec 03, 2013 at 09:05:56AM +, Alex Shi wrote: > Task migration happens when target just a bit less then source cpu load. > To reduce such situation happens, aggravate the target cpu load with > sd->imbalance_pct/100. > > This patch removes the hackbench thread regression on Daniel's > Intel Core2 server. > > a5d6e63 +patch1~3 +patch1~4 > hackbench -T -s 4096 -l 1000 -g 10 -f 40 > 27.914" 38.694" 28.587" > 28.390" 38.341" 29.513" > 28.048" 38.626" 28.706" > > Signed-off-by: Alex Shi > --- > kernel/sched/fair.c | 18 -- > 1 file changed, 12 insertions(+), 6 deletions(-) > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > index bccdd89..c49b7ba 100644 > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -978,7 +978,7 @@ static inline unsigned long group_weight(struct > task_struct *p, int nid) > > static unsigned long weighted_cpuload(const int cpu); > static unsigned long source_load(int cpu); > -static unsigned long target_load(int cpu); > +static unsigned long target_load(int cpu, int imbalance_pct); > static unsigned long power_of(int cpu); > static long effective_load(struct task_group *tg, int cpu, long wl, long wg); > > @@ -3809,11 +3809,17 @@ static unsigned long source_load(int cpu) > * Return a high guess at the load of a migration-target cpu weighted > * according to the scheduling class and "nice" value. > */ > -static unsigned long target_load(int cpu) > +static unsigned long target_load(int cpu, int imbalance_pct) > { > struct rq *rq = cpu_rq(cpu); > unsigned long total = weighted_cpuload(cpu); > > + /* > + * without cpu_load decay, in most of time cpu_load is same as total > + * so we need to make target a bit heavier to reduce task migration > + */ > + total = total * imbalance_pct / 100; > + > if (!sched_feat(LB_BIAS)) > return total; > > @@ -4033,7 +4039,7 @@ static int wake_affine(struct sched_domain *sd, struct > task_struct *p, int sync) > this_cpu = smp_processor_id(); > prev_cpu = task_cpu(p); > load = source_load(prev_cpu); > - this_load = target_load(this_cpu); > + this_load = target_load(this_cpu, 100); > > /* >* If sync wakeup then subtract the (maximum possible) > @@ -4089,7 +4095,7 @@ static int wake_affine(struct sched_domain *sd, struct > task_struct *p, int sync) > > if (balanced || > (this_load <= load && > - this_load + target_load(prev_cpu) <= tl_per_task)) { > + this_load + target_load(prev_cpu, 100) <= tl_per_task)) { > /* >* This domain has SD_WAKE_AFFINE and >* p is cache cold in this domain, and > @@ -4135,7 +4141,7 @@ find_idlest_group(struct sched_domain *sd, struct > task_struct *p, int this_cpu) > if (local_group) > load = source_load(i); > else > - load = target_load(i); > + load = target_load(i, sd->imbalance_pct); Don't you apply imbalance_pct twice here? Later on in find_idlest_group() you have: if (!idlest || 100*this_load < imbalance*min_load) return NULL; where min_load comes from target_load(). > > avg_load += load; > } > @@ -5478,7 +5484,7 @@ static inline void update_sg_lb_stats(struct lb_env > *env, > > /* Bias balancing toward cpus of our domain */ > if (local_group) > - load = target_load(i); > + load = target_load(i, env->sd->imbalance_pct); You probably have the same problem here. Morten -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
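To spell out the double application Morten points at (illustration only, assuming the common imbalance_pct = 125 at this domain level), the find_idlest_group() path in this version of the patch ends up biasing the candidate groups twice:

	int imbalance = 100 + (sd->imbalance_pct - 100) / 2;	/* = 112 with imbalance_pct = 125 */
	...
	load = target_load(i, sd->imbalance_pct);	/* each remote cpu's load already scaled by 125/100 */
	...
	if (!idlest || 100*this_load < imbalance*min_load)	/* min_load gets scaled again by 112/100 */
		return NULL;

So a remote group is only chosen once the local load exceeds roughly 1.12 * 1.25 ~= 1.4 times the remote group's real average load, rather than only the ~1.12 factor that the unpatched check applies.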
Re: [PATCH 4/4] sched: bias to target cpu load to reduce task moving
> We observed a 150% performance gain with the vm-scalability/300s-mmap-pread-seq
> testcase with this patch applied. Here is a list of changes we got so far:
>
> testbox : brickland

I found some explanation of brickland on the wiki: a high-end server
platform based on the Ivy Bridge-EX processor.

> testcase: vm-scalability/300s-mmap-pread-seq

https://github.com/aristeu/vm-scalability

Thanks a lot for testing! :)
Do you have data for the base upstream commit?

>
>     f1b6442c7dd12802e622      d70495ef86f397816d73
>          (parent commit)             (this commit)
>
>          26393249.80     +150.9%      66223933.60   vm-scalability.throughput
>
>               225.12      -49.9%           112.75   time.elapsed_time
>             36333.40      -90.7%          3392.20   vmstat.system.cs
>                 2.40     +375.0%            11.40   vmstat.cpu.id
>           3770081.60      -97.7%         87673.40   time.major_page_faults
>           3975276.20      -97.0%        117409.60   time.voluntary_context_switches
>                 3.05     +301.7%            12.24   iostat.cpu.idle
>             21118.41      -70.3%          6277.19   time.system_time
>                18.40     +130.4%            42.40   vmstat.cpu.us
>                77.00      -41.3%            45.20   vmstat.cpu.sy
>             47459.60      -31.3%         32592.20   vmstat.system.in
>             82435.40      -12.1%         72443.60   time.involuntary_context_switches
>              5128.13      +14.0%          5848.30   time.user_time
>             11656.20       -7.8%         10745.60   time.percent_of_cpu_this_job_got
>        1069997484.80       +0.3%    1073679919.00   time.minor_page_faults
>

-- 
Thanks
    Alex
Re: [PATCH 4/4] sched: bias to target cpu load to reduce task moving
On Tue, Dec 03, 2013 at 05:05:56PM +0800, Alex Shi wrote: > Task migration happens when target just a bit less then source cpu load. > To reduce such situation happens, aggravate the target cpu load with > sd->imbalance_pct/100. > > This patch removes the hackbench thread regression on Daniel's > Intel Core2 server. > > a5d6e63 +patch1~3 +patch1~4 > hackbench -T -s 4096 -l 1000 -g 10 -f 40 > 27.914" 38.694" 28.587" > 28.390" 38.341" 29.513" > 28.048" 38.626" 28.706" > > Signed-off-by: Alex Shi Hi Alex, We obsevered 150% performance gain with vm-scalability/300s-mmap-pread-seq testcase with this patch applied. Here is a list of changes we got so far: testbox : brickland testcase: vm-scalability/300s-mmap-pread-seq f1b6442c7dd12802e622 d70495ef86f397816d73 (parent commit)(this commit) 26393249.80 +150.9% 66223933.60 vm-scalability.throughput 225.12 -49.9% 112.75 time.elapsed_time 36333.40 -90.7% 3392.20 vmstat.system.cs 2.40 +375.0%11.40 vmstat.cpu.id 3770081.60 -97.7% 87673.40 time.major_page_faults 3975276.20 -97.0%117409.60 time.voluntary_context_switches 3.05 +301.7%12.24 iostat.cpu.idle 21118.41 -70.3% 6277.19 time.system_time 18.40 +130.4%42.40 vmstat.cpu.us 77.00 -41.3%45.20 vmstat.cpu.sy 47459.60 -31.3% 32592.20 vmstat.system.in 82435.40 -12.1% 72443.60 time.involuntary_context_switches 5128.13 +14.0% 5848.30 time.user_time 11656.20-7.8% 10745.60 time.percent_of_cpu_this_job_got 1069997484.80+0.3% 1073679919.00 time.minor_page_faults --yliu > --- > kernel/sched/fair.c | 18 -- > 1 file changed, 12 insertions(+), 6 deletions(-) > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > index bccdd89..c49b7ba 100644 > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -978,7 +978,7 @@ static inline unsigned long group_weight(struct > task_struct *p, int nid) > > static unsigned long weighted_cpuload(const int cpu); > static unsigned long source_load(int cpu); > -static unsigned long target_load(int cpu); > +static unsigned long target_load(int cpu, int imbalance_pct); > static unsigned long power_of(int cpu); > static long effective_load(struct task_group *tg, int cpu, long wl, long wg); > > @@ -3809,11 +3809,17 @@ static unsigned long source_load(int cpu) > * Return a high guess at the load of a migration-target cpu weighted > * according to the scheduling class and "nice" value. 
> */ > -static unsigned long target_load(int cpu) > +static unsigned long target_load(int cpu, int imbalance_pct) > { > struct rq *rq = cpu_rq(cpu); > unsigned long total = weighted_cpuload(cpu); > > + /* > + * without cpu_load decay, in most of time cpu_load is same as total > + * so we need to make target a bit heavier to reduce task migration > + */ > + total = total * imbalance_pct / 100; > + > if (!sched_feat(LB_BIAS)) > return total; > > @@ -4033,7 +4039,7 @@ static int wake_affine(struct sched_domain *sd, struct > task_struct *p, int sync) > this_cpu = smp_processor_id(); > prev_cpu = task_cpu(p); > load = source_load(prev_cpu); > - this_load = target_load(this_cpu); > + this_load = target_load(this_cpu, 100); > > /* >* If sync wakeup then subtract the (maximum possible) > @@ -4089,7 +4095,7 @@ static int wake_affine(struct sched_domain *sd, struct > task_struct *p, int sync) > > if (balanced || > (this_load <= load && > - this_load + target_load(prev_cpu) <= tl_per_task)) { > + this_load + target_load(prev_cpu, 100) <= tl_per_task)) { > /* >* This domain has SD_WAKE_AFFINE and >* p is cache cold in this domain, and > @@ -4135,7 +4141,7 @@ find_idlest_group(struct sched_domain *sd, struct > task_struct *p, int this_cpu) > if (local_group) > load = source_load(i); > else > - load = target_load(i); > + load = target_load(i, sd->imbalance_pct); > > avg_load += load; > } > @@ -5478,7 +5484,7 @@ static inline void update_sg_lb_stats(struct lb_env > *env, > > /* Bias balancing toward cpus of our domain */ > if (local_group) > - load = target_l