Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling

2013-02-05 Thread Alex Shi
BTW,

Since NUMA balance scheduling is also a kind of CPU locality policy, it
is naturally compatible with power aware scheduling.

The v2/v3 of this patchset were developed on tip/master; testing showed the
two scheduling policies work well together.

-- 
Thanks Alex


Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling

2013-02-04 Thread Alex Shi

>> Ingo, I appreciate for any comments from you. :)
> 
> Have you tried to quantify the actual real or expected power 
> savings with the knob enabled?

Thanks a lot for your comments! :)

Yes, the following power data is copied from the 17th patch:
---
A test can show the effect of the different policies:
for ((i = 0; i < I; i++)) ; do while true; do :; done  &   done

On my SNB laptop with 4 cores * HT (the data is in Watts):
        powersaving   balance   performance
i = 2   40            54        54
i = 4   57            64*       68
i = 8   68            68        68

Note:
When i = 4 with the balance policy, the power may vary between 57~68 Watts,
since the HT capacity and core capacity are both 1.

On an SNB EP machine with 2 sockets * 8 cores * HT:
        powersaving   balance   performance
i = 4   190           201       238
i = 8   205           241       268
i = 16  271           348       376

If the system has a few continuously running tasks, using a power policy can
give a performance/power gain, e.g. the sysbench fileio randrw test with 16
threads on the SNB EP box.
=

and the following is from the 18th patch:
---
On my SNB EP 2 sockets machine with 8 cores * HT: 'make -j x vmlinux'
results:

         powersaving        balance            performance
x = 1    175.603 /417 13    175.220 /416 13    176.073 /407 13
x = 2    192.215 /218 23    194.522 /202 25    217.393 /200 23
x = 4    205.226 /124 39    208.823 /114 42    230.425 /105 41
x = 8    236.369 /71 59     249.005 /65 61     257.661 /62 62
x = 16   283.842 /48 73     307.465 /40 81     309.336 /39 82
x = 32   325.197 /32 96     333.503 /32 93     336.138 /32 92

data format: 175.603 /417 13
175.603: average Watts
417: seconds (compile time)
13: scaled performance/power = 100 / seconds / watts
=
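
As a quick sanity check on the 'scaled performance/power' column: the literal
'100 / seconds / watts' does not reproduce the printed values, but a scale
factor of 1,000,000 does, so the column looks like it is simply scaled into a
readable integer range. A minimal sketch of that check (the 1,000,000 factor
is my assumption, not taken from the patch):

  # x = 1, powersaving: 175.603 W over 417 s -> column value "13"
  awk 'BEGIN { printf "%.1f\n", 1000000 / 417 / 175.603 }'   # prints 13.7
  # x = 32, powersaving: 325.197 W over 32 s -> column value "96"
  awk 'BEGIN { printf "%.1f\n", 1000000 / 32 / 325.197 }'    # prints 96.1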

Some data for parallel compression: https://lkml.org/lkml/2012/12/11/155
---
Another test of parallel compression with pigz on Linus' git tree.
The results show we get much better performance/power with the powersaving and
balance policies:

testing command:
#pigz -k -c  -p$x -r linux* &> /dev/null

On a NHM EP box
         powersaving        balance            performance
x = 4    166.516 /88 68     170.515 /82 71     165.283 /103 58
x = 8    173.654 /61 94     177.693 /60 93     172.31 /76 76

On a 2 sockets SNB EP box.
         powersaving        balance            performance
x = 4    190.995 /149 35    200.6 /129 38      208.561 /135 35
x = 8    197.969 /108 46    208.885 /103 46    213.96 /108 43
x = 16   205.163 /76 64     212.144 /91 51     229.287 /97 44

data format is: 166.516 /88 68
166.516: average Watts
88: seconds(compress time)
68:  scaled performance/power = 100 / time / power
=

BTW, bltk-game with openarena dropped 0.3/1.5 Watts with the powersaving policy
and 0.2/0.5 Watts with the balance policy on my WSM/SNB laptops.
> 
> I'd also love to have an automatic policy here, with a knob that 
> has 3 values:
> 
>0: always disabled
>1: automatic
>2: always enabled
> 
> here enabled/disabled is your current knob's functionality, and 
> those can also be used by user-space policy daemons/handlers.

Sure, this patchset has a knob for user-space policy selection:

$cat /sys/devices/system/cpu/sched_policy/available_sched_policy
performance powersaving balance

Users can change the policy with the 'echo' command:
 echo performance > /sys/devices/system/cpu/sched_policy/current_sched_policy

The 'performance' policy means 'always disabled' power friendly scheduling.

The 'balance'/'powersaving' policies are automatic power friendly scheduling,
since the system automatically bypasses power scheduling when the CPU
utilisation in a sched domain goes beyond the domain's CPU weight (powersaving)
or beyond the domain's capacity (balance).

There is no 'always enabled' power scheduling, since the patchset is based on
'race to idle', but it is easy to add this function if needed.

> 
> The interesting thing would be '1' which should be the default: 
> on laptops that are on battery it should result in a power 
> saving policy, on laptops that are on AC or on battery-less 
> systems it should mean 'performance' policy.

Yes, with the above sysfs interface it is easy to do. :)
> 
> It should generally default to 'performance', switching to 
> 'power saving on' only if there's positive, reliable information 
> somewhere in the kernel that we are operating on battery power. 
> A callback or two would have to go into the ACPI battery driver 
> I suspect.
> 
> So I'd like this feature to be a tangible improvement for laptop 
> users (as long as the laptop hardware is passing us battery/AC 
> events reliably).

Maybe it is better to let the system admin change it from user space? I am
not sure anyone would like to add a callback in the ACPI battery driver.
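
For illustration, a minimal user-space sketch of such an AC/battery policy
handler built on the sysfs knob above (the power_supply path is an assumption;
it varies by machine and driver):

  #!/bin/sh
  # Hypothetical policy switcher: could be run from a udev rule or pm-utils hook.
  # Assumes the AC adapter is visible at /sys/class/power_supply/AC/online.
  POLICY=/sys/devices/system/cpu/sched_policy/current_sched_policy
  if [ "$(cat /sys/class/power_supply/AC/online 2>/dev/null)" = "1" ]; then
          echo performance > "$POLICY"   # on AC: power friendly scheduling disabled
  else
          echo powersaving > "$POLICY"   # on battery: automatic power friendly scheduling
  fi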

CC to Zhang Rui.
> 
> Or something like that - with .config 

Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling

2013-02-04 Thread Ingo Molnar

* Alex Shi  wrote:

> On 01/24/2013 11:06 AM, Alex Shi wrote:
> > Since the runnable info needs 345ms to accumulate, balancing
> > doesn't do well for many tasks burst waking. After talking with Mike
> > Galbraith, we are agree to just use runnable avg in power friendly 
> > scheduling and keep current instant load in performance scheduling for 
> > low latency.
> > 
> > So the biggest change in this version is removing runnable load avg in
> > balance and just using runnable data in power balance.
> > 
> > The patchset bases on Linus' tree, includes 3 parts,
> > ** 1, bug fix and fork/wake balancing clean up. patch 1~5,
> > --
> > the first patch remove one domain level. patch 2~5 simplified fork/wake
> > balancing, it can increase 10+% hackbench performance on our 4 sockets
> > SNB EP machine.
> > 
> > V3 change:
> > a, added the first patch to remove one domain level on x86 platform.
> > b, some small changes according to Namhyung Kim's comments, thanks!
> > 
> > ** 2, bug fix of load avg and remove the CONFIG_FAIR_GROUP_SCHED limit
> > --
> > patch 6~8, That using runnable avg in load balancing, with
> > two initial runnable variables fix.
> > 
> > V4 change:
> > a, remove runnable log avg using in balancing.
> > 
> > V3 change:
> > a, use rq->cfs.runnable_load_avg as cpu load not
> > rq->avg.load_avg_contrib, since the latter need much time to accumulate
> > for new forked task,
> > b, a build issue fixed with Namhyung Kim's reminder.
> > 
> > ** 3, power awareness scheduling, patch 9~18.
> > --
> > The subset implement/consummate the rough power aware scheduling
> > proposal: https://lkml.org/lkml/2012/8/13/139.
> > It defines 2 new power aware policy 'balance' and 'powersaving' and then
> > try to spread or pack tasks on each sched groups level according the
> > different scheduler policy. That can save much power when task number in
> > system is no more then LCPU number.
> > 
> > As mentioned in the power aware scheduler proposal, Power aware
> > scheduling has 2 assumptions:
> > 1, race to idle is helpful for power saving
> > 2, pack tasks on less sched_groups will reduce power consumption
> > 
> > The first assumption make performance policy take over scheduling when
> > system busy.
> > The second assumption make power aware scheduling try to move
> > disperse tasks into fewer groups until that groups are full of tasks.
> > 
> > Some power testing data is in the last 2 patches.
> > 
> > V4 change:
> > a, fix few bugs and clean up code according to Morten Rasmussen, Mike
> > Galbraith and Namhyung Kim. Thanks!
> > b, take Morten's suggestion to set different criteria for different
> > policy in small task packing.
> > c, shorter latency in power aware scheduling.
> > 
> > V3 change:
> > a, engaged nr_running in max potential utils consideration in periodic
> > power balancing.
> > b, try exec/wake small tasks on running cpu not idle cpu.
> > 
> > V2 change:
> > a, add lazy power scheduling to deal with kbuild like benchmark.
> > 
> > 
> > Thanks Fengguang Wu for the build testing of this patchset!
> 
> 
> Add some testing report summary that were posted:
> Alex Shi tested the benchmarks: kbuild, specjbb2005, oltp, tbench, aim9, 
> hackbench, fileio-cfq of sysbench, dbench, aiostress, multhreads
> loopback netperf. on core2, nhm, wsm, snb, platforms: 
>   a, no clear performance change on performance balance
>   b, specjbb2005 drop 5~7% on balance/powersaving policy on SNB/NHM 
> platforms; hackbench drop 30~70% SNB EP4S machine.
>   c, no other peformance change on balance/powersaving machine.
> 
> test result from Mike Galbraith:
> -
> With aim7 compute on 4 node 40 core box, I see stable throughput
> improvement at tasks = nr_cores and below w. balance and powersaving. 
> 
>          3.8.0-performance       3.8.0-balance           3.8.0-powersaving
> Tasks    jobs/min/task   cpu     jobs/min/task   cpu     jobs/min/task   cpu
>     1         432.8571   3.99         433.4764   3.97         433.1665   3.98
>     5         480.1902  12.49         510.9612   7.55         497.5369   8.22
>    10         429.1785  40.14         533.4507  11.13         518.3918  12.15
>    20         424.3697  63.14         529.7203  23.72         528.7958  22.08
>    40         419.0871 171.42         500.8264  51.44         517.0648  42.45
> 
> No deltas after that.  There were also no deltas between patched kernel
> using performance policy and virgin source.
> --
> 
> Ingo, I appreciate for any comments from you. :)

Have you tried to quantify the actual real or expected power 
savings with the knob enabled?

I'd also love to have an automatic policy here, with a knob that 
has 3 values:

   0: always disabled
   1: automatic
   2: always enabled

here enabled/disabled is your current knob's functionality, and 
those can also be used by user-space policy daemons/handlers.

Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling

2013-02-03 Thread Alex Shi
On 01/24/2013 11:06 AM, Alex Shi wrote:
> Since the runnable info needs 345ms to accumulate, balancing
> doesn't do well for many tasks burst waking. After talking with Mike
> Galbraith, we are agree to just use runnable avg in power friendly 
> scheduling and keep current instant load in performance scheduling for 
> low latency.
> 
> So the biggest change in this version is removing runnable load avg in
> balance and just using runnable data in power balance.
> 
> The patchset bases on Linus' tree, includes 3 parts,
> ** 1, bug fix and fork/wake balancing clean up. patch 1~5,
> --
> the first patch remove one domain level. patch 2~5 simplified fork/wake
> balancing, it can increase 10+% hackbench performance on our 4 sockets
> SNB EP machine.
> 
> V3 change:
> a, added the first patch to remove one domain level on x86 platform.
> b, some small changes according to Namhyung Kim's comments, thanks!
> 
> ** 2, bug fix of load avg and remove the CONFIG_FAIR_GROUP_SCHED limit
> --
> patch 6~8, That using runnable avg in load balancing, with
> two initial runnable variables fix.
> 
> V4 change:
> a, remove runnable log avg using in balancing.
> 
> V3 change:
> a, use rq->cfs.runnable_load_avg as cpu load not
> rq->avg.load_avg_contrib, since the latter need much time to accumulate
> for new forked task,
> b, a build issue fixed with Namhyung Kim's reminder.
> 
> ** 3, power awareness scheduling, patch 9~18.
> --
> The subset implement/consummate the rough power aware scheduling
> proposal: https://lkml.org/lkml/2012/8/13/139.
> It defines 2 new power aware policy 'balance' and 'powersaving' and then
> try to spread or pack tasks on each sched groups level according the
> different scheduler policy. That can save much power when task number in
> system is no more then LCPU number.
> 
> As mentioned in the power aware scheduler proposal, Power aware
> scheduling has 2 assumptions:
> 1, race to idle is helpful for power saving
> 2, pack tasks on less sched_groups will reduce power consumption
> 
> The first assumption make performance policy take over scheduling when
> system busy.
> The second assumption make power aware scheduling try to move
> disperse tasks into fewer groups until that groups are full of tasks.
> 
> Some power testing data is in the last 2 patches.
> 
> V4 change:
> a, fix few bugs and clean up code according to Morten Rasmussen, Mike
> Galbraith and Namhyung Kim. Thanks!
> b, take Morten's suggestion to set different criteria for different
> policy in small task packing.
> c, shorter latency in power aware scheduling.
> 
> V3 change:
> a, engaged nr_running in max potential utils consideration in periodic
> power balancing.
> b, try exec/wake small tasks on running cpu not idle cpu.
> 
> V2 change:
> a, add lazy power scheduling to deal with kbuild like benchmark.
> 
> 
> Thanks Fengguang Wu for the build testing of this patchset!


Adding some testing report summaries that were posted:
Alex Shi tested the benchmarks: kbuild, specjbb2005, oltp, tbench, aim9,
hackbench, fileio-cfq of sysbench, dbench, aiostress, multithreaded
loopback netperf, on core2, nhm, wsm, snb platforms:
a, no clear performance change with the performance policy
b, specjbb2005 drops 5~7% with the balance/powersaving policies on SNB/NHM
platforms; hackbench drops 30~70% on the SNB EP4S machine.
c, no other performance change with the balance/powersaving policies.

test result from Mike Galbraith:
-
With aim7 compute on 4 node 40 core box, I see stable throughput
improvement at tasks = nr_cores and below w. balance and powersaving. 

         3.8.0-performance       3.8.0-balance           3.8.0-powersaving
Tasks    jobs/min/task   cpu     jobs/min/task   cpu     jobs/min/task   cpu
    1         432.8571   3.99         433.4764   3.97         433.1665   3.98
    5         480.1902  12.49         510.9612   7.55         497.5369   8.22
   10         429.1785  40.14         533.4507  11.13         518.3918  12.15
   20         424.3697  63.14         529.7203  23.72         528.7958  22.08
   40         419.0871 171.42         500.8264  51.44         517.0648  42.45

No deltas after that.  There were also no deltas between patched kernel
using performance policy and virgin source.
--

Ingo, I would appreciate any comments from you. :)



Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling

2013-01-28 Thread Alex Shi
On 01/28/2013 01:19 PM, Alex Shi wrote:
> On 01/27/2013 06:40 PM, Borislav Petkov wrote:
>> On Sun, Jan 27, 2013 at 10:41:40AM +0800, Alex Shi wrote:
>>> Just rerun some benchmarks: kbuild, specjbb2005, oltp, tbench, aim9,
>>> hackbench, fileio-cfq of sysbench, dbench, aiostress, multhreads
>>> loopback netperf. on my core2, nhm, wsm, snb, platforms. no clear
>>> performance change found.
>>
>> Ok, good, You could put that in one of the commit messages so that it is
>> there and people know that this patchset doesn't cause perf regressions
>> with the bunch of benchmarks.
>>
>>> I also tested balance policy/powersaving policy with above benchmark,
>>> found, the specjbb2005 drop much 30~50% on both of policy whenever
>>> with openjdk or jrockit. and hackbench drops a lots with powersaving
>>> policy on snb 4 sockets platforms. others has no clear change.

Sorry, the testing configuration was unfair for these specjbb2005 results;
I had set a JVM hard pin and used hugepages for peak performance.

With the hard pin removed and no hugepages, balance/powersaving both
drop about 5% vs. the performance policy, and the performance policy result is
similar to 3.8-rc5.

>>
>> I guess this is expected because there has to be some performance hit
>> when saving power...
>>
> 
> BTW, I had tested the v3 version based on sched numa -- on tip/master.
> The specjbb just has about 5~7% dropping on balance/powersaving policy.
> The power scheduling done after the numa scheduling logical.
> 


-- 
Thanks Alex


Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling

2013-01-28 Thread Mike Galbraith
On Tue, 2013-01-29 at 09:45 +0800, Alex Shi wrote: 
> On 01/28/2013 11:47 PM, Mike Galbraith wrote:

> > monteverdi:/abuild/mike/:[0]# echo 1 > /sys/devices/system/cpu/cpufreq/boost
> > monteverdi:/abuild/mike/:[0]# massive_intr 10 60
> > 014635  00058160
> > 014633  00058592
> > 014638  00058592
> > 014636  00058160
> > 014632  00058200
> > 014634  00058704
> > 014639  00058704
> > 014641  00058200
> > 014640  00058560
> > 014637  00058560
> > monteverdi:/abuild/mike/:[0]# massive_intr 10 60
> > 014673  00059504
> > 014676  00059504
> > 014674  00059064
> > 014672  00059064
> > 014675  00058560
> > 014671  00058560
> > 014677  00059248
> > 014668  00058864
> > 014669  00059248
> > 014670  00058864
> > monteverdi:/abuild/mike/:[0]# massive_intr 10 60
> > 014686  00043472
> > 014689  00043472
> > 014685  00043760
> > 014690  00043760
> > 014687  00043528
> > 014688  00043528  (hmm)
> > 014683  00043216
> > 014692  00043208
> > 014684  00043336
> > 014691  00043336
> 
> I am sorry Mike, do the above 3 test runs use the same sched policy? And the
> same question for the following tests.

Yeah, they're back to back repeats.  Using dirt simple massive_intr
didn't help clarify aim7 oddity.

aim7 is fully repeatable, seems to be saying that consolidation of small
independent jobs is a win, that spreading before fully saturated has its
price, just as consolidation of large coordinated burst has its price.

Seems to cut both ways.. but why not, everything else does.

-Mike



Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling

2013-01-28 Thread Alex Shi
On 01/28/2013 11:47 PM, Mike Galbraith wrote:
> 014776  00059528
> 
> Ok box, whatever blows your skirt up.  I'm done.

Many thanks for so much fruitful testing! :D

-- 
Thanks Alex


Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling

2013-01-28 Thread Alex Shi
On 01/28/2013 11:47 PM, Mike Galbraith wrote:
> On Mon, 2013-01-28 at 06:17 +0100, Mike Galbraith wrote:
> 
> Ok damnit.
> 
>> monteverdi:/abuild/mike/:[0]# echo powersaving > 
>> /sys/devices/system/cpu/sched_policy/current_sched_policy
>> monteverdi:/abuild/mike/:[0]# massive_intr 10 60
>> 043321  00058616
>> 043313  00058616
>> 043318  00058968
>> 043317  00058968
>> 043316  00059184
>> 043319  00059192
>> 043320  00059048
>> 043314  00059048
>> 043312  00058176
>> 043315  00058184
> 
> That was boost if you like, and free to roam 4 nodes.
> 
> monteverdi:/abuild/mike/:[0]# echo powersaving > 
> /sys/devices/system/cpu/sched_policy/current_sched_policy
> monteverdi:/abuild/mike/:[0]# echo 0 > /sys/devices/system/cpu/cpufreq/boost
> monteverdi:/abuild/mike/:[0]# massive_intr 10 60
> 014618  00039616
> 014623  00039256
> 014617  00039256
> 014620  00039304
> 014621  00039304  (wait a minute, you said..)
> 014616  00039080
> 014625  00039064
> 014622  00039672
> 014624  00039624
> 014619  00039672
> monteverdi:/abuild/mike/:[0]# echo 1 > /sys/devices/system/cpu/cpufreq/boost
> monteverdi:/abuild/mike/:[0]# massive_intr 10 60
> 014635  00058160
> 014633  00058592
> 014638  00058592
> 014636  00058160
> 014632  00058200
> 014634  00058704
> 014639  00058704
> 014641  00058200
> 014640  00058560
> 014637  00058560
> monteverdi:/abuild/mike/:[0]# massive_intr 10 60
> 014673  00059504
> 014676  00059504
> 014674  00059064
> 014672  00059064
> 014675  00058560
> 014671  00058560
> 014677  00059248
> 014668  00058864
> 014669  00059248
> 014670  00058864
> monteverdi:/abuild/mike/:[0]# massive_intr 10 60
> 014686  00043472
> 014689  00043472
> 014685  00043760
> 014690  00043760
> 014687  00043528
> 014688  00043528  (hmm)
> 014683  00043216
> 014692  00043208
> 014684  00043336
> 014691  00043336

I am sorry Mike, do the above 3 test runs use the same sched policy? And the
same question for the following tests.

> monteverdi:/abuild/mike/:[0]# echo 0 > /sys/devices/system/cpu/cpufreq/boost
> monteverdi:/abuild/mike/:[0]# massive_intr 10 60
> 014701  00039344
> 014707  00039344
> 014709  00038976
> 014700  00038976
> 014708  00039256  (hmm)
> 014703  00039256
> 014705  00039400
> 014704  00039400
> 014706  00039320
> 014702  00039320
> monteverdi:/abuild/mike/:[0]# massive_intr 10 60
> 014713  00058552
> 014716  00058664
> 014719  00058600
> 014715  00058600
> 014718  00058520
> 014722  00058400
> 014721  00058768
> 014717  00058768
> 014714  00058552
> 014720  00058560
> monteverdi:/abuild/mike/:[0]# massive_intr 10 60
> 014732  00058736
> 014734  00058760
> 014729  00040872
> 014736  00059184
> 014728  00059184
> 014727  00058744
> 014733  00058760
> 014731  00059320
> 014730  00059280
> 014735  00041072
> monteverdi:/abuild/mike/:[0]# numactl --cpunodebind=0 massive_intr 10 60
> 014749  00040608
> 014748  00040616
> 014745  00039360
> 014750  00039360
> 014751  00039416
> 014747  00039416
> 014752  00039336
> 014746  00039336
> 014744  00039480
> 014753  00039480
> monteverdi:/abuild/mike/:[0]# numactl --cpunodebind=0 massive_intr 10 60
> 014757  00039272
> 014761  00039272
> 014765  00039528
> 014756  00039528
> 014759  00039352
> 014760  00039352
> 014764  00039248
> 014762  00039248
> 014758  00039352
> 014763  00039352
> monteverdi:/abuild/mike/:[0]# numactl --cpunodebind=0 massive_intr 10 60
> 014773  00059680
> 014769  00059680
> 014768  00059144
> 014777  00059144
> 014775  00059688
> 014774  00059688
> 014770  00059264
> 014771  00059264
> 014772  00059528
> 014776  00059528
> 
> Ok box, whatever blows your skirt up.  I'm done.
> 
> Non
> Uniform
> Mysterious
> Artifacts
> 


-- 
Thanks Alex


Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling

2013-01-28 Thread Alex Shi
On 01/28/2013 11:55 PM, Mike Galbraith wrote:
> On Mon, 2013-01-28 at 16:22 +0100, Borislav Petkov wrote: 
>> On Mon, Jan 28, 2013 at 12:40:46PM +0100, Mike Galbraith wrote:
>>>> No no, that's not restricted to one node.  It's just overloaded because
>>>> I turned balancing off at the NODE domain level.
>>>
>>> Which shows only that I was multitasking, and in a rush.  Boy was that
>>> dumb.  Hohum.
>>
>> Ok, let's take a step back and slow it down a bit so that people like me
>> can understand it: you want to try it with disabled load balancing on
>> the node level, AFAICT. But with that many tasks, perf will suck anyway,
>> no? Unless you want to benchmark the numa-aware aspect and see whether
>> load balancing on the node level feels differently, perf-wise?
> 
> The broken thought was, since it's not wakeup path, stop node balance..
> but killing all of it killed FORK/EXEC balance, oops.

Um, sure. So I guess all of the tasks were just running on one node.
> 
> I think I'm done with this thing though.  See mail I just sent.   There
> are better things to do than letting box jerk my chain endlessly ;-)
> 
> -Mike
> 


-- 
Thanks Alex


Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling

2013-01-28 Thread Alex Shi

> Benchmark   Version Machine Run Date
> AIM Multiuser Benchmark - Suite VII "1.1"   performance   Jan 28 08:09:20 2013
> 
> Tasks   Jobs/Min    JTI   Real    CPU      Jobs/sec/task
> 1       438.8       100   13.8    3.8      7.3135
> 5       2634.8      99    11.5    7.2      8.7826
> 10      5396.3      99    11.2    11.4     8.9938
> 20      10725.7     99    11.3    24.0     8.9381
> 40      20183.2     99    12.0    38.5     8.4097
> 80      35620.9     99    13.6    71.4     7.4210
> 160     57203.5     98    16.9    137.8    5.9587
> 320     81995.8     98    23.7    271.3    4.2706
> 
> then the above no_node-load_balance thing suffers a small-ish dip at 320
> tasks, yeah.
> 
> And AFAICR, the effect of disabling boosting will be visible in the
> small count tasks cases anyway because if you saturate the cores with
> tasks, the boosting algorithms tend to get the box out of boosting for
> the simple reason that the power/perf headroom simply disappears due to
> the SOC being busy.

Sure. And according to the context of this email series, I guess this result
has boosting enabled, right?


> 
>> 640     100294.8    98    38.7    570.9    2.6118
>> 1280    115998.2    97    66.9    1132.8   1.5104
>> 2560    125820.0    97    123.3   2256.6   0.8191
> 
> I dunno about those. maybe this is expected with so many tasks or do we
> want to optimize that case further?
> 


-- 
Thanks Alex


Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling

2013-01-28 Thread Alex Shi

>> then the above no_node-load_balance thing suffers a small-ish dip at 320
>> tasks, yeah.
> 
> No no, that's not restricted to one node.  It's just overloaded because
> I turned balancing off at the NODE domain level.
> 
>> And AFAICR, the effect of disabling boosting will be visible in the
>> small count tasks cases anyway because if you saturate the cores with
>> tasks, the boosting algorithms tend to get the box out of boosting for
>> the simple reason that the power/perf headroom simply disappears due to
>> the SOC being busy.
>>
>>> 640     100294.8    98    38.7    570.9    2.6118
>>> 1280    115998.2    97    66.9    1132.8   1.5104
>>> 2560    125820.0    97    123.3   2256.6   0.8191
>>
>> I dunno about those. maybe this is expected with so many tasks or do we
>> want to optimize that case further?
> 
> When using all 4 nodes properly, that's still scaling.  Here, I

Without regular node balancing, only wakeup balancing is left in
select_task_rq_fair for the aim7 testing (I assume you used the shared
workfile; most of the testing is CPU-bound with only a little exec/fork load).

Since wakeup balancing only happens within the same LLC domain, I guess that is
the reason for this.

> intentionally screwed up balancing to watch the low end.  High end is
> expected wreckage.




Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling

2013-01-28 Thread Alex Shi
On 01/28/2013 02:42 PM, Mike Galbraith wrote:
> Back to original 1ms sleep, 8ms work, turning NUMA box into a single
> node 10 core box with numactl.
> 
> monteverdi:/abuild/mike/:[0]# echo powersaving > 
> /sys/devices/system/cpu/sched_policy/current_sched_policy
> monteverdi:/abuild/mike/:[0]# numactl --cpunodebind=0 massive_intr 10 60
> 045286  00043872
> 045289  00043464
> 045284  00043488
> 045287  00043440
> 045283  00043416
> 045281  00044456
> 045285  00043456
> 045288  00044312
> 045280  00043048
> 045282  00043240

Um, no idea why the powersaving data is so low.
> monteverdi:/abuild/mike/:[0]# echo balance > 
> /sys/devices/system/cpu/sched_policy/current_sched_policy
> monteverdi:/abuild/mike/:[0]# numactl --cpunodebind=0 massive_intr 10 60
> 045300  00052536
> 045307  00052472
> 045304  00052536
> 045299  00052536
> 045305  00052520
> 045306  00052528
> 045302  00052528
> 045303  00052528
> 045308  00052512
> 045301  00052520
> monteverdi:/abuild/mike/:[0]# echo performance > 
> /sys/devices/system/cpu/sched_policy/current_sched_policy
> monteverdi:/abuild/mike/:[0]# numactl --cpunodebind=0 massive_intr 10 60
> 045339  00052600
> 045340  00052608
> 045338  00052600
> 045337  00052608
> 045343  00052600
> 045341  00052600
> 045336  00052608
> 045335  00052616
> 045334  00052576


-- 
Thanks Alex


Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling

2013-01-28 Thread Mike Galbraith
On Mon, 2013-01-28 at 16:22 +0100, Borislav Petkov wrote: 
> On Mon, Jan 28, 2013 at 12:40:46PM +0100, Mike Galbraith wrote:
> > > No no, that's not restricted to one node.  It's just overloaded because
> > > I turned balancing off at the NODE domain level.
> > 
> > Which shows only that I was multitasking, and in a rush.  Boy was that
> > dumb.  Hohum.
> 
> Ok, let's take a step back and slow it down a bit so that people like me
> can understand it: you want to try it with disabled load balancing on
> the node level, AFAICT. But with that many tasks, perf will suck anyway,
> no? Unless you want to benchmark the numa-aware aspect and see whether
> load balancing on the node level feels differently, perf-wise?

The broken thought was, since it's not wakeup path, stop node balance..
but killing all of it killed FORK/EXEC balance, oops.

I think I'm done with this thing though.  See mail I just sent.   There
are better things to do than letting box jerk my chain endlessly ;-)

-Mike



Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling

2013-01-28 Thread Mike Galbraith
On Mon, 2013-01-28 at 06:17 +0100, Mike Galbraith wrote:

Ok damnit.

> monteverdi:/abuild/mike/:[0]# echo powersaving > 
> /sys/devices/system/cpu/sched_policy/current_sched_policy
> monteverdi:/abuild/mike/:[0]# massive_intr 10 60
> 043321  00058616
> 043313  00058616
> 043318  00058968
> 043317  00058968
> 043316  00059184
> 043319  00059192
> 043320  00059048
> 043314  00059048
> 043312  00058176
> 043315  00058184

That was boost if you like, and free to roam 4 nodes.

monteverdi:/abuild/mike/:[0]# echo powersaving > 
/sys/devices/system/cpu/sched_policy/current_sched_policy
monteverdi:/abuild/mike/:[0]# echo 0 > /sys/devices/system/cpu/cpufreq/boost
monteverdi:/abuild/mike/:[0]# massive_intr 10 60
014618  00039616
014623  00039256
014617  00039256
014620  00039304
014621  00039304  (wait a minute, you said..)
014616  00039080
014625  00039064
014622  00039672
014624  00039624
014619  00039672
monteverdi:/abuild/mike/:[0]# echo 1 > /sys/devices/system/cpu/cpufreq/boost
monteverdi:/abuild/mike/:[0]# massive_intr 10 60
014635  00058160
014633  00058592
014638  00058592
014636  00058160
014632  00058200
014634  00058704
014639  00058704
014641  00058200
014640  00058560
014637  00058560
monteverdi:/abuild/mike/:[0]# massive_intr 10 60
014673  00059504
014676  00059504
014674  00059064
014672  00059064
014675  00058560
014671  00058560
014677  00059248
014668  00058864
014669  00059248
014670  00058864
monteverdi:/abuild/mike/:[0]# massive_intr 10 60
014686  00043472
014689  00043472
014685  00043760
014690  00043760
014687  00043528
014688  00043528  (hmm)
014683  00043216
014692  00043208
014684  00043336
014691  00043336
monteverdi:/abuild/mike/:[0]# echo 0 > /sys/devices/system/cpu/cpufreq/boost
monteverdi:/abuild/mike/:[0]# massive_intr 10 60
014701  00039344
014707  00039344
014709  00038976
014700  00038976
014708  00039256  (hmm)
014703  00039256
014705  00039400
014704  00039400
014706  00039320
014702  00039320
monteverdi:/abuild/mike/:[0]# massive_intr 10 60
014713  00058552
014716  00058664
014719  00058600
014715  00058600
014718  00058520
014722  00058400
014721  00058768
014717  00058768
014714  00058552
014720  00058560
monteverdi:/abuild/mike/:[0]# massive_intr 10 60
014732  00058736
014734  00058760
014729  00040872
014736  00059184
014728  00059184
014727  00058744
014733  00058760
014731  00059320
014730  00059280
014735  00041072
monteverdi:/abuild/mike/:[0]# numactl --cpunodebind=0 massive_intr 10 60
014749  00040608
014748  00040616
014745  00039360
014750  00039360
014751  00039416
014747  00039416
014752  00039336
014746  00039336
014744  00039480
014753  00039480
monteverdi:/abuild/mike/:[0]# numactl --cpunodebind=0 massive_intr 10 60
014757  00039272
014761  00039272
014765  00039528
014756  00039528
014759  00039352
014760  00039352
014764  00039248
014762  00039248
014758  00039352
014763  00039352
monteverdi:/abuild/mike/:[0]# numactl --cpunodebind=0 massive_intr 10 60
014773  00059680
014769  00059680
014768  00059144
014777  00059144
014775  00059688
014774  00059688
014770  00059264
014771  00059264
014772  00059528
014776  00059528

Ok box, whatever blows your skirt up.  I'm done.

Non
Uniform
Mysterious
Artifacts



Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling

2013-01-28 Thread Borislav Petkov
On Mon, Jan 28, 2013 at 12:40:46PM +0100, Mike Galbraith wrote:
> > No no, that's not restricted to one node.  It's just overloaded because
> > I turned balancing off at the NODE domain level.
> 
> Which shows only that I was multitasking, and in a rush.  Boy was that
> dumb.  Hohum.

Ok, let's take a step back and slow it down a bit so that people like me
can understand it: you want to try it with disabled load balancing on
the node level, AFAICT. But with that many tasks, perf will suck anyway,
no? Unless you want to benchmark the numa-aware aspect and see whether
load balancing on the node level feels differently, perf-wise?

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.


Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling

2013-01-28 Thread Mike Galbraith
On Mon, 2013-01-28 at 12:32 +0100, Mike Galbraith wrote: 
> On Mon, 2013-01-28 at 12:29 +0100, Borislav Petkov wrote: 
> > On Mon, Jan 28, 2013 at 11:44:44AM +0100, Mike Galbraith wrote:
> > > On Mon, 2013-01-28 at 10:55 +0100, Borislav Petkov wrote: 
> > > > On Mon, Jan 28, 2013 at 06:17:46AM +0100, Mike Galbraith wrote:
> > > > > Zzzt.  Wish I could turn turbo thingy off.
> > > > 
> > > > Try setting /sys/devices/system/cpu/cpufreq/boost to 0.
> > > 
> > > How convenient (test) works too.
> > > 
> > > So much for turbo boost theory.  Nothing changed until I turned load
> > > balancing off at NODE.  High end went to hell (gee), but low end... 
> > >   
> > > Benchmark   Version Machine Run Date
> > > AIM Multiuser Benchmark - Suite VII "1.1"   performance-no-node-load_balance   Jan 28 11:20:12 2013
> > > 
> > > Tasks   Jobs/Min    JTI   Real    CPU      Jobs/sec/task
> > > 1       436.3       100   13.9    3.9      7.2714
> > > 5       2637.1      99    11.5    7.3      8.7903
> > > 10      5415.5      99    11.2    11.3     9.0259
> > > 20      10603.7     99    11.4    24.8     8.8364
> > > 40      20066.2     99    12.1    40.5     8.3609
> > > 80      35079.6     99    13.8    75.5     7.3082
> > > 160     55884.7     98    17.3    145.6    5.8213
> > > 320     79345.3     98    24.4    287.4    4.1326
> > 
> > If you're talking about those results from earlier:
> > 
> > Benchmark   Version Machine Run Date
> > AIM Multiuser Benchmark - Suite VII "1.1"   performance   Jan 28 08:09:20 2013
> > 
> > Tasks   Jobs/Min    JTI   Real    CPU      Jobs/sec/task
> > 1       438.8       100   13.8    3.8      7.3135
> > 5       2634.8      99    11.5    7.2      8.7826
> > 10      5396.3      99    11.2    11.4     8.9938
> > 20      10725.7     99    11.3    24.0     8.9381
> > 40      20183.2     99    12.0    38.5     8.4097
> > 80      35620.9     99    13.6    71.4     7.4210
> > 160     57203.5     98    16.9    137.8    5.9587
> > 320     81995.8     98    23.7    271.3    4.2706
> > 
> > then the above no_node-load_balance thing suffers a small-ish dip at 320
> > tasks, yeah.
> 
> No no, that's not restricted to one node.  It's just overloaded because
> I turned balancing off at the NODE domain level.

Which shows only that I was multitasking, and in a rush.  Boy was that
dumb.  Hohum.

-Mike



Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling

2013-01-28 Thread Mike Galbraith
On Mon, 2013-01-28 at 12:29 +0100, Borislav Petkov wrote: 
> On Mon, Jan 28, 2013 at 11:44:44AM +0100, Mike Galbraith wrote:
> > On Mon, 2013-01-28 at 10:55 +0100, Borislav Petkov wrote: 
> > > On Mon, Jan 28, 2013 at 06:17:46AM +0100, Mike Galbraith wrote:
> > > > Zzzt.  Wish I could turn turbo thingy off.
> > > 
> > > Try setting /sys/devices/system/cpu/cpufreq/boost to 0.
> > 
> > How convenient (test) works too.
> > 
> > So much for turbo boost theory.  Nothing changed until I turned load
> > balancing off at NODE.  High end went to hell (gee), but low end... 
> >   
> > Benchmark   Version Machine Run Date
> > AIM Multiuser Benchmark - Suite VII "1.1"   performance-no-node-load_balance   Jan 28 11:20:12 2013
> > 
> > Tasks   Jobs/Min    JTI   Real    CPU      Jobs/sec/task
> > 1       436.3       100   13.9    3.9      7.2714
> > 5       2637.1      99    11.5    7.3      8.7903
> > 10      5415.5      99    11.2    11.3     9.0259
> > 20      10603.7     99    11.4    24.8     8.8364
> > 40      20066.2     99    12.1    40.5     8.3609
> > 80      35079.6     99    13.8    75.5     7.3082
> > 160     55884.7     98    17.3    145.6    5.8213
> > 320     79345.3     98    24.4    287.4    4.1326
> 
> If you're talking about those results from earlier:
> 
> Benchmark   Version Machine Run Date
> AIM Multiuser Benchmark - Suite VII "1.1"   performance   Jan 28 08:09:20 2013
> 
> Tasks   Jobs/Min    JTI   Real    CPU      Jobs/sec/task
> 1       438.8       100   13.8    3.8      7.3135
> 5       2634.8      99    11.5    7.2      8.7826
> 10      5396.3      99    11.2    11.4     8.9938
> 20      10725.7     99    11.3    24.0     8.9381
> 40      20183.2     99    12.0    38.5     8.4097
> 80      35620.9     99    13.6    71.4     7.4210
> 160     57203.5     98    16.9    137.8    5.9587
> 320     81995.8     98    23.7    271.3    4.2706
> 
> then the above no_node-load_balance thing suffers a small-ish dip at 320
> tasks, yeah.

No no, that's not restricted to one node.  It's just overloaded because
I turned balancing off at the NODE domain level.

> And AFAICR, the effect of disabling boosting will be visible in the
> small count tasks cases anyway because if you saturate the cores with
> tasks, the boosting algorithms tend to get the box out of boosting for
> the simple reason that the power/perf headroom simply disappears due to
> the SOC being busy.
> 
> > 640     100294.8    98    38.7    570.9    2.6118
> > 1280    115998.2    97    66.9    1132.8   1.5104
> > 2560    125820.0    97    123.3   2256.6   0.8191
> 
> I dunno about those. maybe this is expected with so many tasks or do we
> want to optimize that case further?

When using all 4 nodes properly, that's still scaling.  Here, I
intentionally screwed up balancing to watch the low end.  High end is
expected wreckage.

-Mike




Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling

2013-01-28 Thread Borislav Petkov
On Mon, Jan 28, 2013 at 11:44:44AM +0100, Mike Galbraith wrote:
> On Mon, 2013-01-28 at 10:55 +0100, Borislav Petkov wrote: 
> > On Mon, Jan 28, 2013 at 06:17:46AM +0100, Mike Galbraith wrote:
> > > Zzzt.  Wish I could turn turbo thingy off.
> > 
> > Try setting /sys/devices/system/cpu/cpufreq/boost to 0.
> 
> How convenient (test) works too.
> 
> So much for turbo boost theory.  Nothing changed until I turned load
> balancing off at NODE.  High end went to hell (gee), but low end... 
>   
> Benchmark   Version Machine Run Date
> AIM Multiuser Benchmark - Suite VII "1.1"   performance-no-node-load_balance   Jan 28 11:20:12 2013
> 
> Tasks   Jobs/Min    JTI   Real    CPU      Jobs/sec/task
> 1       436.3       100   13.9    3.9      7.2714
> 5       2637.1      99    11.5    7.3      8.7903
> 10      5415.5      99    11.2    11.3     9.0259
> 20      10603.7     99    11.4    24.8     8.8364
> 40      20066.2     99    12.1    40.5     8.3609
> 80      35079.6     99    13.8    75.5     7.3082
> 160     55884.7     98    17.3    145.6    5.8213
> 320     79345.3     98    24.4    287.4    4.1326

If you're talking about those results from earlier:

Benchmark   Version Machine Run Date
AIM Multiuser Benchmark - Suite VII "1.1"   performance   Jan 28 08:09:20 2013

Tasks   Jobs/Min    JTI   Real    CPU      Jobs/sec/task
1       438.8       100   13.8    3.8      7.3135
5       2634.8      99    11.5    7.2      8.7826
10      5396.3      99    11.2    11.4     8.9938
20      10725.7     99    11.3    24.0     8.9381
40      20183.2     99    12.0    38.5     8.4097
80      35620.9     99    13.6    71.4     7.4210
160     57203.5     98    16.9    137.8    5.9587
320     81995.8     98    23.7    271.3    4.2706

then the above no_node-load_balance thing suffers a small-ish dip at 320
tasks, yeah.

And AFAICR, the effect of disabling boosting will be visible in the
small count tasks cases anyway because if you saturate the cores with
tasks, the boosting algorithms tend to get the box out of boosting for
the simple reason that the power/perf headroom simply disappears due to
the SOC being busy.

> 640     100294.8    98    38.7    570.9    2.6118
> 1280    115998.2    97    66.9    1132.8   1.5104
> 2560    125820.0    97    123.3   2256.6   0.8191

I dunno about those. maybe this is expected with so many tasks or do we
want to optimize that case further?

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling

2013-01-28 Thread Mike Galbraith
On Mon, 2013-01-28 at 10:55 +0100, Borislav Petkov wrote: 
> On Mon, Jan 28, 2013 at 06:17:46AM +0100, Mike Galbraith wrote:
> > Zzzt.  Wish I could turn turbo thingy off.
> 
> Try setting /sys/devices/system/cpu/cpufreq/boost to 0.

How convenient (test) works too.

So much for turbo boost theory.  Nothing changed until I turned load
balancing off at NODE.  High end went to hell (gee), but low end... 
  
Benchmark                             Version   Machine                            Run Date
AIM Multiuser Benchmark - Suite VII   "1.1"     performance-no-node-load_balance   Jan 28 11:20:12 2013

Tasks   Jobs/Min   JTI   Real    CPU      Jobs/sec/task
1       436.3      100   13.9    3.9      7.2714
5       2637.1     99    11.5    7.3      8.7903
10      5415.5     99    11.2    11.3     9.0259
20      10603.7    99    11.4    24.8     8.8364
40      20066.2    99    12.1    40.5     8.3609
80      35079.6    99    13.8    75.5     7.3082
160     55884.7    98    17.3    145.6    5.8213
320     79345.3    98    24.4    287.4    4.1326
640     100294.8   98    38.7    570.9    2.6118
1280    115998.2   97    66.9    1132.8   1.5104
2560    125820.0   97    123.3   2256.6   0.8191

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling

2013-01-28 Thread Borislav Petkov
On Mon, Jan 28, 2013 at 06:17:46AM +0100, Mike Galbraith wrote:
> Zzzt.  Wish I could turn turbo thingy off.

Try setting /sys/devices/system/cpu/cpufreq/boost to 0.
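
(That knob is the cpufreq boost attribute, typically provided by
acpi-cpufreq; a minimal check-and-toggle sequence, assuming the file
exists on this box:)

  cat /sys/devices/system/cpu/cpufreq/boost         # 1 = turbo allowed
  echo 0 > /sys/devices/system/cpu/cpufreq/boost    # disable for the test
  echo 1 > /sys/devices/system/cpu/cpufreq/boost    # restore afterwards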

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling

2013-01-28 Thread Mike Galbraith
On Mon, 2013-01-28 at 12:32 +0100, Mike Galbraith wrote: 
 On Mon, 2013-01-28 at 12:29 +0100, Borislav Petkov wrote: 
  On Mon, Jan 28, 2013 at 11:44:44AM +0100, Mike Galbraith wrote:
   On Mon, 2013-01-28 at 10:55 +0100, Borislav Petkov wrote: 
On Mon, Jan 28, 2013 at 06:17:46AM +0100, Mike Galbraith wrote:
 Zzzt.  Wish I could turn turbo thingy off.

Try setting /sys/devices/system/cpu/cpufreq/boost to 0.
   
   How convenient (test) works too.
   
   So much for turbo boost theory.  Nothing changed until I turned load
   balancing off at NODE.  High end went to hell (gee), but low end... 
 
   Benchmark   Version Machine Run Date
   AIM Multiuser Benchmark - Suite VII 1.1   
   performance-no-node-load_balance Jan 28 11:20:12 2013
   
   Tasks   Jobs/MinJTI RealCPU Jobs/sec/task
   1   436.3   100 13.93.9 7.2714
   5   2637.1  99  11.57.3 8.7903
   10  5415.5  99  11.211.39.0259
   20  10603.7 99  11.424.88.8364
   40  20066.2 99  12.140.58.3609
   80  35079.6 99  13.875.57.3082
   160 55884.7 98  17.3145.6   5.8213
   320 79345.3 98  24.4287.4   4.1326
  
  If you're talking about those results from earlier:
  
  Benchmark   Version Machine Run Date
  AIM Multiuser Benchmark - Suite VII 1.1   performance Jan 28 
  08:09:20 2013
  
  Tasks   Jobs/MinJTI RealCPU Jobs/sec/task
  1   438.8   100 13.83.8 7.3135
  5   2634.8  99  11.57.2 8.7826
  10  5396.3  99  11.211.48.9938
  20  10725.7 99  11.324.08.9381
  40  20183.2 99  12.038.58.4097
  80  35620.9 99  13.671.47.4210
  160 57203.5 98  16.9137.8   5.9587
  320 81995.8 98  23.7271.3   4.2706
  
  then the above no_node-load_balance thing suffers a small-ish dip at 320
  tasks, yeah.
 
 No no, that's not restricted to one node.  It's just overloaded because
 I turned balancing off at the NODE domain level.

Which shows only that I was multitasking, and in a rush.  Boy was that
dumb.  Hohum.

-Mike

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling

2013-01-28 Thread Borislav Petkov
On Mon, Jan 28, 2013 at 12:40:46PM +0100, Mike Galbraith wrote:
  No no, that's not restricted to one node.  It's just overloaded because
  I turned balancing off at the NODE domain level.
 
 Which shows only that I was multitasking, and in a rush.  Boy was that
 dumb.  Hohum.

Ok, let's take a step back and slow it down a bit so that people like me
can understand it: you want to try it with disabled load balancing on
the node level, AFAICT. But with that many tasks, perf will suck anyway,
no? Unless you want to benchmark the numa-aware aspect and see whether
load balancing on the node level feels differently, perf-wise?

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling

2013-01-28 Thread Mike Galbraith
On Mon, 2013-01-28 at 06:17 +0100, Mike Galbraith wrote:

Ok damnit.

 monteverdi:/abuild/mike/:[0]# echo powersaving > 
 /sys/devices/system/cpu/sched_policy/current_sched_policy
 monteverdi:/abuild/mike/:[0]# massive_intr 10 60
 043321  00058616
 043313  00058616
 043318  00058968
 043317  00058968
 043316  00059184
 043319  00059192
 043320  00059048
 043314  00059048
 043312  00058176
 043315  00058184

That was boost if you like, and free to roam 4 nodes.

monteverdi:/abuild/mike/:[0]# echo powersaving > 
/sys/devices/system/cpu/sched_policy/current_sched_policy
monteverdi:/abuild/mike/:[0]# echo 0 > /sys/devices/system/cpu/cpufreq/boost
monteverdi:/abuild/mike/:[0]# massive_intr 10 60
014618  00039616
014623  00039256
014617  00039256
014620  00039304
014621  00039304  (wait a minute, you said..)
014616  00039080
014625  00039064
014622  00039672
014624  00039624
014619  00039672
monteverdi:/abuild/mike/:[0]# echo 1 > /sys/devices/system/cpu/cpufreq/boost
monteverdi:/abuild/mike/:[0]# massive_intr 10 60
014635  00058160
014633  00058592
014638  00058592
014636  00058160
014632  00058200
014634  00058704
014639  00058704
014641  00058200
014640  00058560
014637  00058560
monteverdi:/abuild/mike/:[0]# massive_intr 10 60
014673  00059504
014676  00059504
014674  00059064
014672  00059064
014675  00058560
014671  00058560
014677  00059248
014668  00058864
014669  00059248
014670  00058864
monteverdi:/abuild/mike/:[0]# massive_intr 10 60
014686  00043472
014689  00043472
014685  00043760
014690  00043760
014687  00043528
014688  00043528  (hmm)
014683  00043216
014692  00043208
014684  00043336
014691  00043336
monteverdi:/abuild/mike/:[0]# echo 0 > /sys/devices/system/cpu/cpufreq/boost
monteverdi:/abuild/mike/:[0]# massive_intr 10 60
014701  00039344
014707  00039344
014709  00038976
014700  00038976
014708  00039256  (hmm)
014703  00039256
014705  00039400
014704  00039400
014706  00039320
014702  00039320
monteverdi:/abuild/mike/:[0]# massive_intr 10 60
014713  00058552
014716  00058664
014719  00058600
014715  00058600
014718  00058520
014722  00058400
014721  00058768
014717  00058768
014714  00058552
014720  00058560
monteverdi:/abuild/mike/:[0]# massive_intr 10 60
014732  00058736
014734  00058760
014729  00040872
014736  00059184
014728  00059184
014727  00058744
014733  00058760
014731  00059320
014730  00059280
014735  00041072
monteverdi:/abuild/mike/:[0]# numactl --cpunodebind=0 massive_intr 10 60
014749  00040608
014748  00040616
014745  00039360
014750  00039360
014751  00039416
014747  00039416
014752  00039336
014746  00039336
014744  00039480
014753  00039480
monteverdi:/abuild/mike/:[0]# numactl --cpunodebind=0 massive_intr 10 60
014757  00039272
014761  00039272
014765  00039528
014756  00039528
014759  00039352
014760  00039352
014764  00039248
014762  00039248
014758  00039352
014763  00039352
monteverdi:/abuild/mike/:[0]# numactl --cpunodebind=0 massive_intr 10 60
014773  00059680
014769  00059680
014768  00059144
014777  00059144
014775  00059688
014774  00059688
014770  00059264
014771  00059264
014772  00059528
014776  00059528

Ok box, whatever blows your skirt up.  I'm done.

Non
Uniform
Mysterious
Artifacts

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling

2013-01-28 Thread Mike Galbraith
On Mon, 2013-01-28 at 16:22 +0100, Borislav Petkov wrote: 
 On Mon, Jan 28, 2013 at 12:40:46PM +0100, Mike Galbraith wrote:
   No no, that's not restricted to one node.  It's just overloaded because
   I turned balancing off at the NODE domain level.
  
  Which shows only that I was multitasking, and in a rush.  Boy was that
  dumb.  Hohum.
 
 Ok, let's take a step back and slow it down a bit so that people like me
 can understand it: you want to try it with disabled load balancing on
 the node level, AFAICT. But with that many tasks, perf will suck anyway,
 no? Unless you want to benchmark the numa-aware aspect and see whether
 load balancing on the node level feels differently, perf-wise?

The broken thought was, since it's not wakeup path, stop node balance..
but killing all of it killed FORK/EXEC balance, oops.
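
(For reference, a sketch of the runtime poking involved, assuming
CONFIG_SCHED_DEBUG and that domain2 happens to be the node level on
this box -- the domain index and flag set differ per machine, and this
only clears SD_LOAD_BALANCE rather than every flag:)

  cat /proc/sys/kernel/sched_domain/cpu0/domain*/name   # find the NODE level
  # SD_LOAD_BALANCE is 0x1, SD_BALANCE_EXEC 0x4, SD_BALANCE_FORK 0x8
  for f in /proc/sys/kernel/sched_domain/cpu*/domain2/flags; do
          echo $(( $(cat $f) & ~0x1 )) > $f
  done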

I think I'm done with this thing though.  See mail I just sent.   There
are better things to do than letting box jerk my chain endlessly ;-)

-Mike

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling

2013-01-28 Thread Alex Shi
On 01/28/2013 02:42 PM, Mike Galbraith wrote:
 Back to original 1ms sleep, 8ms work, turning NUMA box into a single
 node 10 core box with numactl.
 
 monteverdi:/abuild/mike/:[0]# echo powersaving > 
 /sys/devices/system/cpu/sched_policy/current_sched_policy
 monteverdi:/abuild/mike/:[0]# numactl --cpunodebind=0 massive_intr 10 60
 045286  00043872
 045289  00043464
 045284  00043488
 045287  00043440
 045283  00043416
 045281  00044456
 045285  00043456
 045288  00044312
 045280  00043048
 045282  00043240

Um, no idea why the powersaving data is so low.
 monteverdi:/abuild/mike/:[0]# echo balance > 
 /sys/devices/system/cpu/sched_policy/current_sched_policy
 monteverdi:/abuild/mike/:[0]# numactl --cpunodebind=0 massive_intr 10 60
 045300  00052536
 045307  00052472
 045304  00052536
 045299  00052536
 045305  00052520
 045306  00052528
 045302  00052528
 045303  00052528
 045308  00052512
 045301  00052520
 monteverdi:/abuild/mike/:[0]# echo performance > 
 /sys/devices/system/cpu/sched_policy/current_sched_policy
 monteverdi:/abuild/mike/:[0]# numactl --cpunodebind=0 massive_intr 10 60
 045339  00052600
 045340  00052608
 045338  00052600
 045337  00052608
 045343  00052600
 045341  00052600
 045336  00052608
 045335  00052616
 045334  00052576


-- 
Thanks Alex
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling

2013-01-28 Thread Alex Shi

 then the above no_node-load_balance thing suffers a small-ish dip at 320
 tasks, yeah.
 
 No no, that's not restricted to one node.  It's just overloaded because
 I turned balancing off at the NODE domain level.
 
 And AFAICR, the effect of disabling boosting will be visible in the
 small count tasks cases anyway because if you saturate the cores with
 tasks, the boosting algorithms tend to get the box out of boosting for
 the simple reason that the power/perf headroom simply disappears due to
 the SOC being busy.

 640    100294.8   98   38.7    570.9    2.6118
 1280   115998.2   97   66.9    1132.8   1.5104
 2560   125820.0   97   123.3   2256.6   0.8191

 I dunno about those. maybe this is expected with so many tasks or do we
 want to optimize that case further?
 
 When using all 4 nodes properly, that's still scaling.  Here, I

Without regular node balancing, only wakeup balancing is left in
select_task_rq_fair for the aim7 testing (I assume you used the shared
workfile; most of the testing is cpu-bound with only a little exec/fork
load).

Since wakeup balancing only happens within the same LLC domain, I guess
that is the reason for this.
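
(A quick way to see how far that wakeup domain reaches, i.e. which cpus
share the last level cache with cpu0 -- index3 is the L3 on these
boxes, the index may differ elsewhere:)

  cat /sys/devices/system/cpu/cpu0/cache/index3/shared_cpu_list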

 intentionally screwed up balancing to watch the low end.  High end is
 expected wreckage.


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling

2013-01-28 Thread Alex Shi

 Benchmark                             Version   Machine       Run Date
 AIM Multiuser Benchmark - Suite VII   1.1       performance   Jan 28 08:09:20 2013

 Tasks   Jobs/Min   JTI   Real   CPU     Jobs/sec/task
 1       438.8      100   13.8   3.8     7.3135
 5       2634.8     99    11.5   7.2     8.7826
 10      5396.3     99    11.2   11.4    8.9938
 20      10725.7    99    11.3   24.0    8.9381
 40      20183.2    99    12.0   38.5    8.4097
 80      35620.9    99    13.6   71.4    7.4210
 160     57203.5    98    16.9   137.8   5.9587
 320     81995.8    98    23.7   271.3   4.2706
 
 then the above no_node-load_balance thing suffers a small-ish dip at 320
 tasks, yeah.
 
 And AFAICR, the effect of disabling boosting will be visible in the
 small count tasks cases anyway because if you saturate the cores with
 tasks, the boosting algorithms tend to get the box out of boosting for
 the simple reason that the power/perf headroom simply disappears due to
 the SOC being busy.

Sure. And according to the context of this email thread, I guess this
result has boosting enabled, right?


 
 640    100294.8   98   38.7    570.9    2.6118
 1280   115998.2   97   66.9    1132.8   1.5104
 2560   125820.0   97   123.3   2256.6   0.8191
 
 I dunno about those. maybe this is expected with so many tasks or do we
 want to optimize that case further?
 


-- 
Thanks Alex
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling

2013-01-28 Thread Alex Shi
On 01/28/2013 11:55 PM, Mike Galbraith wrote:
 On Mon, 2013-01-28 at 16:22 +0100, Borislav Petkov wrote: 
 On Mon, Jan 28, 2013 at 12:40:46PM +0100, Mike Galbraith wrote:
 No no, that's not restricted to one node.  It's just overloaded because
 I turned balancing off at the NODE domain level.

 Which shows only that I was multitasking, and in a rush.  Boy was that
 dumb.  Hohum.

 Ok, let's take a step back and slow it down a bit so that people like me
 can understand it: you want to try it with disabled load balancing on
 the node level, AFAICT. But with that many tasks, perf will suck anyway,
 no? Unless you want to benchmark the numa-aware aspect and see whether
 load balancing on the node level feels differently, perf-wise?
 
 The broken thought was, since it's not wakeup path, stop node balance..
 but killing all of it killed FORK/EXEC balance, oops.

Um, sure. So I guess all of the tasks were just running on one node.
 
 I think I'm done with this thing though.  See mail I just sent.   There
 are better things to do than letting box jerk my chain endlessly ;-)
 
 -Mike
 


-- 
Thanks Alex
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling

2013-01-28 Thread Alex Shi
On 01/28/2013 11:47 PM, Mike Galbraith wrote:
 On Mon, 2013-01-28 at 06:17 +0100, Mike Galbraith wrote:
 
 Ok damnit.
 
 monteverdi:/abuild/mike/:[0]# echo powersaving > 
 /sys/devices/system/cpu/sched_policy/current_sched_policy
 monteverdi:/abuild/mike/:[0]# massive_intr 10 60
 043321  00058616
 043313  00058616
 043318  00058968
 043317  00058968
 043316  00059184
 043319  00059192
 043320  00059048
 043314  00059048
 043312  00058176
 043315  00058184
 
 That was boost if you like, and free to roam 4 nodes.
 
 monteverdi:/abuild/mike/:[0]# echo powersaving > 
 /sys/devices/system/cpu/sched_policy/current_sched_policy
 monteverdi:/abuild/mike/:[0]# echo 0 > /sys/devices/system/cpu/cpufreq/boost
 monteverdi:/abuild/mike/:[0]# massive_intr 10 60
 014618  00039616
 014623  00039256
 014617  00039256
 014620  00039304
 014621  00039304  (wait a minute, you said..)
 014616  00039080
 014625  00039064
 014622  00039672
 014624  00039624
 014619  00039672
 monteverdi:/abuild/mike/:[0]# echo 1 > /sys/devices/system/cpu/cpufreq/boost
 monteverdi:/abuild/mike/:[0]# massive_intr 10 60
 014635  00058160
 014633  00058592
 014638  00058592
 014636  00058160
 014632  00058200
 014634  00058704
 014639  00058704
 014641  00058200
 014640  00058560
 014637  00058560
 monteverdi:/abuild/mike/:[0]# massive_intr 10 60
 014673  00059504
 014676  00059504
 014674  00059064
 014672  00059064
 014675  00058560
 014671  00058560
 014677  00059248
 014668  00058864
 014669  00059248
 014670  00058864
 monteverdi:/abuild/mike/:[0]# massive_intr 10 60
 014686  00043472
 014689  00043472
 014685  00043760
 014690  00043760
 014687  00043528
 014688  00043528  (hmm)
 014683  00043216
 014692  00043208
 014684  00043336
 014691  00043336

I am sorry, Mike. Do the above 3 runs all use the same sched policy? And
the same question for the following runs.

 monteverdi:/abuild/mike/:[0]# echo 0 > /sys/devices/system/cpu/cpufreq/boost
 monteverdi:/abuild/mike/:[0]# massive_intr 10 60
 014701  00039344
 014707  00039344
 014709  00038976
 014700  00038976
 014708  00039256  (hmm)
 014703  00039256
 014705  00039400
 014704  00039400
 014706  00039320
 014702  00039320
 monteverdi:/abuild/mike/:[0]# massive_intr 10 60
 014713  00058552
 014716  00058664
 014719  00058600
 014715  00058600
 014718  00058520
 014722  00058400
 014721  00058768
 014717  00058768
 014714  00058552
 014720  00058560
 monteverdi:/abuild/mike/:[0]# massive_intr 10 60
 014732  00058736
 014734  00058760
 014729  00040872
 014736  00059184
 014728  00059184
 014727  00058744
 014733  00058760
 014731  00059320
 014730  00059280
 014735  00041072
 monteverdi:/abuild/mike/:[0]# numactl --cpunodebind=0 massive_intr 10 60
 014749  00040608
 014748  00040616
 014745  00039360
 014750  00039360
 014751  00039416
 014747  00039416
 014752  00039336
 014746  00039336
 014744  00039480
 014753  00039480
 monteverdi:/abuild/mike/:[0]# numactl --cpunodebind=0 massive_intr 10 60
 014757  00039272
 014761  00039272
 014765  00039528
 014756  00039528
 014759  00039352
 014760  00039352
 014764  00039248
 014762  00039248
 014758  00039352
 014763  00039352
 monteverdi:/abuild/mike/:[0]# numactl --cpunodebind=0 massive_intr 10 60
 014773  00059680
 014769  00059680
 014768  00059144
 014777  00059144
 014775  00059688
 014774  00059688
 014770  00059264
 014771  00059264
 014772  00059528
 014776  00059528
 
 Ok box, whatever blows your skirt up.  I'm done.
 
 Non
 Uniform
 Mysterious
 Artifacts
 


-- 
Thanks Alex
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling

2013-01-28 Thread Alex Shi
On 01/28/2013 11:47 PM, Mike Galbraith wrote:
 014776  00059528
 
 Ok box, whatever blows your skirt up.  I'm done.

Many thanks for so much fruitful testing! :D

-- 
Thanks Alex
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling

2013-01-28 Thread Mike Galbraith
On Tue, 2013-01-29 at 09:45 +0800, Alex Shi wrote: 
 On 01/28/2013 11:47 PM, Mike Galbraith wrote:

  monteverdi:/abuild/mike/:[0]# echo 1 > /sys/devices/system/cpu/cpufreq/boost
  monteverdi:/abuild/mike/:[0]# massive_intr 10 60
  014635  00058160
  014633  00058592
  014638  00058592
  014636  00058160
  014632  00058200
  014634  00058704
  014639  00058704
  014641  00058200
  014640  00058560
  014637  00058560
  monteverdi:/abuild/mike/:[0]# massive_intr 10 60
  014673  00059504
  014676  00059504
  014674  00059064
  014672  00059064
  014675  00058560
  014671  00058560
  014677  00059248
  014668  00058864
  014669  00059248
  014670  00058864
  monteverdi:/abuild/mike/:[0]# massive_intr 10 60
  014686  00043472
  014689  00043472
  014685  00043760
  014690  00043760
  014687  00043528
  014688  00043528  (hmm)
  014683  00043216
  014692  00043208
  014684  00043336
  014691  00043336
 
 I am sorry, Mike. Do the above 3 runs all use the same sched policy? And
 the same question for the following runs.

Yeah, they're back to back repeats.  Using dirt simple massive_intr
didn't help clarify the aim7 oddity.

aim7 is fully repeatable, and seems to be saying that consolidation of
small independent jobs is a win, that spreading before the box is fully
saturated has its price, just as consolidation of a large coordinated
burst has its price.

Seems to cut both ways.. but why not, everything else does.

-Mike

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling

2013-01-28 Thread Alex Shi
On 01/28/2013 01:19 PM, Alex Shi wrote:
 On 01/27/2013 06:40 PM, Borislav Petkov wrote:
 On Sun, Jan 27, 2013 at 10:41:40AM +0800, Alex Shi wrote:
 Just rerun some benchmarks: kbuild, specjbb2005, oltp, tbench, aim9,
 hackbench, fileio-cfq of sysbench, dbench, aiostress, multhreads
 loopback netperf. on my core2, nhm, wsm, snb, platforms. no clear
 performance change found.

 Ok, good, You could put that in one of the commit messages so that it is
 there and people know that this patchset doesn't cause perf regressions
 with the bunch of benchmarks.

 I also tested balance policy/powersaving policy with above benchmark,
 found, the specjbb2005 drop much 30~50% on both of policy whenever
 with openjdk or jrockit. and hackbench drops a lots with powersaving
 policy on snb 4 sockets platforms. others has no clear change.

Sorry, the testing configuration was unfair for these specjbb2005
results: I hard-pinned the JVM and used hugepages for peak performance.

With the hard pinning and hugepages removed, balance/powersaving both
drop about 5% vs the performance policy, and the performance policy
result is similar to 3.8-rc5.
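
(Roughly what that peak-performance setup looks like; the node choice
and JVM flags below are only illustrative, not the exact ones used in
the runs:)

  numactl --cpunodebind=0 --membind=0 \
          java -XX:+UseLargePages <specjbb launch args>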


 I guess this is expected because there has to be some performance hit
 when saving power...

 
 BTW, I had tested the v3 version based on sched numa -- on tip/master.
 The specjbb just has about 5~7% dropping on balance/powersaving policy.
 The power scheduling done after the numa scheduling logical.
 


-- 
Thanks Alex
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling

2013-01-27 Thread Mike Galbraith
On Mon, 2013-01-28 at 15:17 +0800, Alex Shi wrote: 
> On 01/28/2013 02:49 PM, Mike Galbraith wrote:
> > On Mon, 2013-01-28 at 13:19 +0800, Alex Shi wrote: 
> >> On 01/27/2013 06:40 PM, Borislav Petkov wrote:
> >>> On Sun, Jan 27, 2013 at 10:41:40AM +0800, Alex Shi wrote:
>  Just rerun some benchmarks: kbuild, specjbb2005, oltp, tbench, aim9,
>  hackbench, fileio-cfq of sysbench, dbench, aiostress, multhreads
>  loopback netperf. on my core2, nhm, wsm, snb, platforms. no clear
>  performance change found.
> >>>
> >>> Ok, good, You could put that in one of the commit messages so that it is
> >>> there and people know that this patchset doesn't cause perf regressions
> >>> with the bunch of benchmarks.
> >>>
>  I also tested balance policy/powersaving policy with above benchmark,
>  found, the specjbb2005 drop much 30~50% on both of policy whenever
>  with openjdk or jrockit. and hackbench drops a lots with powersaving
>  policy on snb 4 sockets platforms. others has no clear change.
> >>>
> >>> I guess this is expected because there has to be some performance hit
> >>> when saving power...
> >>>
> >>
> >> BTW, I had tested the v3 version based on sched numa -- on tip/master.
> >> The specjbb just has about 5~7% dropping on balance/powersaving policy.
> >> The power scheduling done after the numa scheduling logical.
> > 
> > That makes sense.  How the numa scheduling numbers compare to mainline?
> > Do you have all three available, mainline, and tip w. w/o powersaving
> > policy?
> > 
> 
> I once caught 20~40% performance increasing on sched numa VS mainline
> 3.7-rc5. but have no baseline to compare balance/powersaving performance
> since lower data are acceptable for balance/powersaving and
> tip/master changes too quickly to follow up at that time.
> :)

(wow.  dram sucks, dram+smp sucks more, dram+smp+numa _sucks rocks_;)

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling

2013-01-27 Thread Mike Galbraith
On Mon, 2013-01-28 at 07:42 +0100, Mike Galbraith wrote:

> Back to original 1ms sleep, 8ms work, turning NUMA box into a single
> node 10 core box with numactl.

(aim7 in one 10 core node.. so spread, no delta.)
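
(The single node setup is plain numactl; the aim7 invocation itself is
elided here:)

  numactl --hardware                # list nodes and which cpus they own
  numactl --cpunodebind=0 <aim7>    # confine the whole run to node 0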

Benchmark                             Version   Machine       Run Date
AIM Multiuser Benchmark - Suite VII   "1.1"     powersaving   Jan 28 08:04:14 2013

Tasks   Jobs/Min   JTI   Real   CPU     Jobs/sec/task
1       441.0      100   13.7   3.7     7.3508
5       2516.6     98    12.0   8.1     8.3887
10      5215.1     98    11.6   11.9    8.6919
20      10475.4    99    11.6   21.7    8.7295
40      20216.8    99    12.0   38.2    8.4237
80      35568.6    99    13.6   71.4    7.4101
160     57102.5    98    17.0   138.2   5.9482
320     82099.9    97    23.6   271.1   4.2760

Benchmark                             Version   Machine   Run Date
AIM Multiuser Benchmark - Suite VII   "1.1"     balance   Jan 28 08:06:49 2013

Tasks   Jobs/Min   JTI   Real   CPU     Jobs/sec/task
1       439.4      100   13.8   3.8     7.3241
5       2583.1     98    11.7   7.2     8.6104
10      5325.1     99    11.4   11.0    8.8752
20      10687.8    99    11.3   23.6    8.9065
40      20200.0    99    12.0   38.7    8.4167
80      35464.5    98    13.7   71.4    7.3884
160     57203.5    98    16.9   137.9   5.9587
320     82065.2    98    23.6   271.1   4.2742

Benchmark                             Version   Machine       Run Date
AIM Multiuser Benchmark - Suite VII   "1.1"     performance   Jan 28 08:09:20 2013

Tasks   Jobs/Min   JTI   Real   CPU     Jobs/sec/task
1       438.8      100   13.8   3.8     7.3135
5       2634.8     99    11.5   7.2     8.7826
10      5396.3     99    11.2   11.4    8.9938
20      10725.7    99    11.3   24.0    8.9381
40      20183.2    99    12.0   38.5    8.4097
80      35620.9    99    13.6   71.4    7.4210
160     57203.5    98    16.9   137.8   5.9587
320     81995.8    98    23.7   271.3   4.2706

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling

2013-01-27 Thread Alex Shi
On 01/28/2013 02:49 PM, Mike Galbraith wrote:
> On Mon, 2013-01-28 at 13:19 +0800, Alex Shi wrote: 
>> On 01/27/2013 06:40 PM, Borislav Petkov wrote:
>>> On Sun, Jan 27, 2013 at 10:41:40AM +0800, Alex Shi wrote:
 Just rerun some benchmarks: kbuild, specjbb2005, oltp, tbench, aim9,
 hackbench, fileio-cfq of sysbench, dbench, aiostress, multhreads
 loopback netperf. on my core2, nhm, wsm, snb, platforms. no clear
 performance change found.
>>>
>>> Ok, good, You could put that in one of the commit messages so that it is
>>> there and people know that this patchset doesn't cause perf regressions
>>> with the bunch of benchmarks.
>>>
 I also tested balance policy/powersaving policy with above benchmark,
 found, the specjbb2005 drop much 30~50% on both of policy whenever
 with openjdk or jrockit. and hackbench drops a lots with powersaving
 policy on snb 4 sockets platforms. others has no clear change.
>>>
>>> I guess this is expected because there has to be some performance hit
>>> when saving power...
>>>
>>
>> BTW, I had tested the v3 version based on sched numa -- on tip/master.
>> The specjbb just has about 5~7% dropping on balance/powersaving policy.
>> The power scheduling done after the numa scheduling logical.
> 
> That makes sense.  How the numa scheduling numbers compare to mainline?
> Do you have all three available, mainline, and tip w. w/o powersaving
> policy?
> 

I once saw a 20~40% performance increase with sched numa vs mainline
3.7-rc5, but I have no baseline to compare balance/powersaving
performance against, since lower numbers are acceptable for
balance/powersaving and tip/master changed too quickly to follow up on
at that time.
:)

> -Mike
> 
> 


-- 
Thanks Alex
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling

2013-01-27 Thread Mike Galbraith
On Mon, 2013-01-28 at 13:19 +0800, Alex Shi wrote: 
> On 01/27/2013 06:40 PM, Borislav Petkov wrote:
> > On Sun, Jan 27, 2013 at 10:41:40AM +0800, Alex Shi wrote:
> >> Just rerun some benchmarks: kbuild, specjbb2005, oltp, tbench, aim9,
> >> hackbench, fileio-cfq of sysbench, dbench, aiostress, multhreads
> >> loopback netperf. on my core2, nhm, wsm, snb, platforms. no clear
> >> performance change found.
> > 
> > Ok, good, You could put that in one of the commit messages so that it is
> > there and people know that this patchset doesn't cause perf regressions
> > with the bunch of benchmarks.
> > 
> >> I also tested balance policy/powersaving policy with above benchmark,
> >> found, the specjbb2005 drop much 30~50% on both of policy whenever
> >> with openjdk or jrockit. and hackbench drops a lots with powersaving
> >> policy on snb 4 sockets platforms. others has no clear change.
> > 
> > I guess this is expected because there has to be some performance hit
> > when saving power...
> > 
> 
> BTW, I had tested the v3 version based on sched numa -- on tip/master.
> The specjbb just has about 5~7% dropping on balance/powersaving policy.
> The power scheduling done after the numa scheduling logical.

That makes sense.  How do the numa scheduling numbers compare to
mainline?  Do you have all three available: mainline, and tip with and
without the powersaving policy?

-Mike


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling

2013-01-27 Thread Mike Galbraith
On Mon, 2013-01-28 at 07:15 +0100, Mike Galbraith wrote: 
> On Mon, 2013-01-28 at 13:51 +0800, Alex Shi wrote: 
> > On 01/28/2013 01:17 PM, Mike Galbraith wrote:
> > > On Sun, 2013-01-27 at 16:51 +0100, Mike Galbraith wrote: 
> > >> On Sun, 2013-01-27 at 21:25 +0800, Alex Shi wrote: 
> > >>> On 01/27/2013 06:35 PM, Borislav Petkov wrote:
> >  On Sun, Jan 27, 2013 at 05:36:25AM +0100, Mike Galbraith wrote:
> > > With aim7 compute on 4 node 40 core box, I see stable throughput
> > > improvement at tasks = nr_cores and below w. balance and powersaving. 
> > >> ... 
> >  Ok, this is sick. How is balance and powersaving better than perf? Both
> >  have much more jobs per minute than perf; is that because we do pack
> >  much more tasks per cpu with balance and powersaving?
> > >>>
> > >>> Maybe it is due to the lazy balancing on balance/powersaving. You can
> > >>> check the CS times in /proc/pid/status.
> > >>
> > >> Well, it's not wakeup path, limiting entry frequency per waker did zip
> > >> squat nada to any policy throughput.
> > > 
> > > monteverdi:/abuild/mike/:[0]# echo powersaving > 
> > > /sys/devices/system/cpu/sched_policy/current_sched_policy
> > > monteverdi:/abuild/mike/:[0]# massive_intr 10 60
> > > 043321  00058616
> > > 043313  00058616
> > > 043318  00058968
> > > 043317  00058968
> > > 043316  00059184
> > > 043319  00059192
> > > 043320  00059048
> > > 043314  00059048
> > > 043312  00058176
> > > 043315  00058184
> > > monteverdi:/abuild/mike/:[0]# echo balance > 
> > > /sys/devices/system/cpu/sched_policy/current_sched_policy
> > > monteverdi:/abuild/mike/:[0]# massive_intr 10 60
> > > 043337  00053448
> > > 04  00053456
> > > 043338  00052992
> > > 043331  00053448
> > > 043332  00053488
> > > 043335  00053496
> > > 043334  00053480
> > > 043329  00053288
> > > 043336  00053464
> > > 043330  00053496
> > > monteverdi:/abuild/mike/:[0]# echo performance > 
> > > /sys/devices/system/cpu/sched_policy/current_sched_policy
> > > monteverdi:/abuild/mike/:[0]# massive_intr 10 60
> > > 043348  00052488
> > > 043344  00052488
> > > 043349  00052744
> > > 043343  00052504
> > > 043347  00052504
> > > 043352  00052888
> > > 043345  00052504
> > > 043351  00052496
> > > 043346  00052496
> > > 043350  00052304
> > > monteverdi:/abuild/mike/:[0]#
> > 
> > similar with aim7 results. Thanks, Mike!
> > 
> > Would you like to collect vmstat info in background?
> > > 
> > > Zzzt.  Wish I could turn turbo thingy off.
> > 
> > Do you mean the turbo mode of cpu frequency? I remember some of machine
> > can disable it in BIOS.
> 
> Yeah, I can do that in my local x3550 box.  I can't fiddle with BIOS
> settings on the remote NUMA box.
> 
> This can't be anything but turbo gizmo mucking up the numbers I think,
> not that the numbers are invalid or anything, better numbers are better
> numbers no matter where/how they come about ;-)
> 
> The massive_intr load is dirt simple sleep/spin with bean counting.  It
> sleeps 1ms spins 8ms.  Change that to sleep 8ms, grind away for 1ms...
> 
> monteverdi:/abuild/mike/:[0]# ./massive_intr 10 60
> 045150  6484
> 045157  6427
> 045156  6401
> 045152  6428
> 045155  6372
> 045154  6370
> 045158  6453
> 045149  6372
> 045151  6371
> 045153  6371
> monteverdi:/abuild/mike/:[0]# echo balance > 
> /sys/devices/system/cpu/sched_policy/current_sched_policy
> monteverdi:/abuild/mike/:[0]# ./massive_intr 10 60
> 045170  6380
> 045172  6374
> 045169  6376
> 045175  6376
> 045171  6334
> 045176  6380
> 045168  6374
> 045174  6334
> 045177  6375
> 045173  6376
> monteverdi:/abuild/mike/:[0]# echo performance > 
> /sys/devices/system/cpu/sched_policy/current_sched_policy
> monteverdi:/abuild/mike/:[0]# ./massive_intr 10 60
> 045198  6408
> 045191  6408
> 045197  6408
> 045192  6411
> 045194  6409
> 045196  6409
> 045195  6336
> 045189  6336
> 045193  6411
> 045190  6410

Back to original 1ms sleep, 8ms work, turning NUMA box into a single
node 10 core box with numactl.

monteverdi:/abuild/mike/:[0]# echo powersaving > 
/sys/devices/system/cpu/sched_policy/current_sched_policy
monteverdi:/abuild/mike/:[0]# numactl --cpunodebind=0 massive_intr 10 60
045286  00043872
045289  00043464
045284  00043488
045287  00043440
045283  00043416
045281  00044456
045285  00043456
045288  00044312
045280  00043048
045282  00043240
monteverdi:/abuild/mike/:[0]# echo balance > 
/sys/devices/system/cpu/sched_policy/current_sched_policy
monteverdi:/abuild/mike/:[0]# numactl --cpunodebind=0 massive_intr 10 60
045300  00052536
045307  00052472
045304  00052536
045299  00052536
045305  00052520
045306  00052528
045302  00052528
045303  00052528
045308  00052512
045301  00052520
monteverdi:/abuild/mike/:[0]# echo performance > 
/sys/devices/system/cpu/sched_policy/current_sched_policy
monteverdi:/abuild/mike/:[0]# numactl --cpunodebind=0 massive_intr 10 60
045339  

Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling

2013-01-27 Thread Mike Galbraith
On Mon, 2013-01-28 at 13:51 +0800, Alex Shi wrote: 
> On 01/28/2013 01:17 PM, Mike Galbraith wrote:
> > On Sun, 2013-01-27 at 16:51 +0100, Mike Galbraith wrote: 
> >> On Sun, 2013-01-27 at 21:25 +0800, Alex Shi wrote: 
> >>> On 01/27/2013 06:35 PM, Borislav Petkov wrote:
>  On Sun, Jan 27, 2013 at 05:36:25AM +0100, Mike Galbraith wrote:
> > With aim7 compute on 4 node 40 core box, I see stable throughput
> > improvement at tasks = nr_cores and below w. balance and powersaving. 
> >> ... 
>  Ok, this is sick. How is balance and powersaving better than perf? Both
>  have much more jobs per minute than perf; is that because we do pack
>  much more tasks per cpu with balance and powersaving?
> >>>
> >>> Maybe it is due to the lazy balancing on balance/powersaving. You can
> >>> check the CS times in /proc/pid/status.
> >>
> >> Well, it's not wakeup path, limiting entry frequency per waker did zip
> >> squat nada to any policy throughput.
> > 
> > monteverdi:/abuild/mike/:[0]# echo powersaving > 
> > /sys/devices/system/cpu/sched_policy/current_sched_policy
> > monteverdi:/abuild/mike/:[0]# massive_intr 10 60
> > 043321  00058616
> > 043313  00058616
> > 043318  00058968
> > 043317  00058968
> > 043316  00059184
> > 043319  00059192
> > 043320  00059048
> > 043314  00059048
> > 043312  00058176
> > 043315  00058184
> > monteverdi:/abuild/mike/:[0]# echo balance > 
> > /sys/devices/system/cpu/sched_policy/current_sched_policy
> > monteverdi:/abuild/mike/:[0]# massive_intr 10 60
> > 043337  00053448
> > 04  00053456
> > 043338  00052992
> > 043331  00053448
> > 043332  00053488
> > 043335  00053496
> > 043334  00053480
> > 043329  00053288
> > 043336  00053464
> > 043330  00053496
> > monteverdi:/abuild/mike/:[0]# echo performance > 
> > /sys/devices/system/cpu/sched_policy/current_sched_policy
> > monteverdi:/abuild/mike/:[0]# massive_intr 10 60
> > 043348  00052488
> > 043344  00052488
> > 043349  00052744
> > 043343  00052504
> > 043347  00052504
> > 043352  00052888
> > 043345  00052504
> > 043351  00052496
> > 043346  00052496
> > 043350  00052304
> > monteverdi:/abuild/mike/:[0]#
> 
> similar with aim7 results. Thanks, Mike!
> 
> Would you like to collect vmstat info in background?
> > 
> > Zzzt.  Wish I could turn turbo thingy off.
> 
> Do you mean the turbo mode of cpu frequency? I remember some of machine
> can disable it in BIOS.

Yeah, I can do that in my local x3550 box.  I can't fiddle with BIOS
settings on the remote NUMA box.

This can't be anything but turbo gizmo mucking up the numbers I think,
not that the numbers are invalid or anything, better numbers are better
numbers no matter where/how they come about ;-)

The massive_intr load is dirt simple sleep/spin with bean counting.  It
sleeps 1ms spins 8ms.  Change that to sleep 8ms, grind away for 1ms...

monteverdi:/abuild/mike/:[0]# ./massive_intr 10 60
045150  6484
045157  6427
045156  6401
045152  6428
045155  6372
045154  6370
045158  6453
045149  6372
045151  6371
045153  6371
monteverdi:/abuild/mike/:[0]# echo balance > 
/sys/devices/system/cpu/sched_policy/current_sched_policy
monteverdi:/abuild/mike/:[0]# ./massive_intr 10 60
045170  6380
045172  6374
045169  6376
045175  6376
045171  6334
045176  6380
045168  6374
045174  6334
045177  6375
045173  6376
monteverdi:/abuild/mike/:[0]# echo performance > 
/sys/devices/system/cpu/sched_policy/current_sched_policy
monteverdi:/abuild/mike/:[0]# ./massive_intr 10 60
045198  6408
045191  6408
045197  6408
045192  6411
045194  6409
045196  6409
045195  6336
045189  6336
045193  6411
045190  6410

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling

2013-01-27 Thread Alex Shi
On 01/28/2013 01:17 PM, Mike Galbraith wrote:
> On Sun, 2013-01-27 at 16:51 +0100, Mike Galbraith wrote: 
>> On Sun, 2013-01-27 at 21:25 +0800, Alex Shi wrote: 
>>> On 01/27/2013 06:35 PM, Borislav Petkov wrote:
 On Sun, Jan 27, 2013 at 05:36:25AM +0100, Mike Galbraith wrote:
> With aim7 compute on 4 node 40 core box, I see stable throughput
> improvement at tasks = nr_cores and below w. balance and powersaving. 
>> ... 
 Ok, this is sick. How is balance and powersaving better than perf? Both
 have much more jobs per minute than perf; is that because we do pack
 much more tasks per cpu with balance and powersaving?
>>>
>>> Maybe it is due to the lazy balancing on balance/powersaving. You can
>>> check the CS times in /proc/pid/status.
>>
>> Well, it's not wakeup path, limiting entry frequency per waker did zip
>> squat nada to any policy throughput.
> 
> monteverdi:/abuild/mike/:[0]# echo powersaving > 
> /sys/devices/system/cpu/sched_policy/current_sched_policy
> monteverdi:/abuild/mike/:[0]# massive_intr 10 60
> 043321  00058616
> 043313  00058616
> 043318  00058968
> 043317  00058968
> 043316  00059184
> 043319  00059192
> 043320  00059048
> 043314  00059048
> 043312  00058176
> 043315  00058184
> monteverdi:/abuild/mike/:[0]# echo balance > 
> /sys/devices/system/cpu/sched_policy/current_sched_policy
> monteverdi:/abuild/mike/:[0]# massive_intr 10 60
> 043337  00053448
> 04  00053456
> 043338  00052992
> 043331  00053448
> 043332  00053488
> 043335  00053496
> 043334  00053480
> 043329  00053288
> 043336  00053464
> 043330  00053496
> monteverdi:/abuild/mike/:[0]# echo performance > 
> /sys/devices/system/cpu/sched_policy/current_sched_policy
> monteverdi:/abuild/mike/:[0]# massive_intr 10 60
> 043348  00052488
> 043344  00052488
> 043349  00052744
> 043343  00052504
> 043347  00052504
> 043352  00052888
> 043345  00052504
> 043351  00052496
> 043346  00052496
> 043350  00052304
> monteverdi:/abuild/mike/:[0]#

Similar to the aim7 results. Thanks, Mike!

Would you like to collect vmstat info in the background?
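
(Something as simple as this would do; the log name is just a label per
policy run:)

  vmstat 1 60 > vmstat-$policy.log 2>&1 &
  ./massive_intr 10 60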
> 
> Zzzt.  Wish I could turn turbo thingy off.

Do you mean the cpu frequency turbo mode? I remember some machines can
disable it in the BIOS.
> 
> -Mike
> 


-- 
Thanks Alex
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling

2013-01-27 Thread Alex Shi
On 01/27/2013 06:40 PM, Borislav Petkov wrote:
> On Sun, Jan 27, 2013 at 10:41:40AM +0800, Alex Shi wrote:
>> Just rerun some benchmarks: kbuild, specjbb2005, oltp, tbench, aim9,
>> hackbench, fileio-cfq of sysbench, dbench, aiostress, multhreads
>> loopback netperf. on my core2, nhm, wsm, snb, platforms. no clear
>> performance change found.
> 
> Ok, good, You could put that in one of the commit messages so that it is
> there and people know that this patchset doesn't cause perf regressions
> with the bunch of benchmarks.
> 
>> I also tested balance policy/powersaving policy with above benchmark,
>> found, the specjbb2005 drop much 30~50% on both of policy whenever
>> with openjdk or jrockit. and hackbench drops a lots with powersaving
>> policy on snb 4 sockets platforms. others has no clear change.
> 
> I guess this is expected because there has to be some performance hit
> when saving power...
> 

BTW, I had tested the v3 version based on sched numa -- on tip/master.
There specjbb drops only about 5~7% with the balance/powersaving policy.
The power-aware scheduling logic runs after the numa scheduling logic.



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling

2013-01-27 Thread Mike Galbraith
On Sun, 2013-01-27 at 16:51 +0100, Mike Galbraith wrote: 
> On Sun, 2013-01-27 at 21:25 +0800, Alex Shi wrote: 
> > On 01/27/2013 06:35 PM, Borislav Petkov wrote:
> > > On Sun, Jan 27, 2013 at 05:36:25AM +0100, Mike Galbraith wrote:
> > >> With aim7 compute on 4 node 40 core box, I see stable throughput
> > >> improvement at tasks = nr_cores and below w. balance and powersaving. 
> ... 
> > > Ok, this is sick. How is balance and powersaving better than perf? Both
> > > have much more jobs per minute than perf; is that because we do pack
> > > much more tasks per cpu with balance and powersaving?
> > 
> > Maybe it is due to the lazy balancing on balance/powersaving. You can
> > check the CS times in /proc/pid/status.
> 
> Well, it's not wakeup path, limiting entry frequency per waker did zip
> squat nada to any policy throughput.

monteverdi:/abuild/mike/:[0]# echo powersaving > 
/sys/devices/system/cpu/sched_policy/current_sched_policy
monteverdi:/abuild/mike/:[0]# massive_intr 10 60
043321  00058616
043313  00058616
043318  00058968
043317  00058968
043316  00059184
043319  00059192
043320  00059048
043314  00059048
043312  00058176
043315  00058184
monteverdi:/abuild/mike/:[0]# echo balance > 
/sys/devices/system/cpu/sched_policy/current_sched_policy
monteverdi:/abuild/mike/:[0]# massive_intr 10 60
043337  00053448
04  00053456
043338  00052992
043331  00053448
043332  00053488
043335  00053496
043334  00053480
043329  00053288
043336  00053464
043330  00053496
monteverdi:/abuild/mike/:[0]# echo performance > 
/sys/devices/system/cpu/sched_policy/current_sched_policy
monteverdi:/abuild/mike/:[0]# massive_intr 10 60
043348  00052488
043344  00052488
043349  00052744
043343  00052504
043347  00052504
043352  00052888
043345  00052504
043351  00052496
043346  00052496
043350  00052304
monteverdi:/abuild/mike/:[0]#

Zzzt.  Wish I could turn turbo thingy off.

-Mike

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling

2013-01-27 Thread Alex Shi
On 01/24/2013 11:06 AM, Alex Shi wrote:
> Since the runnable info needs 345ms to accumulate, balancing
> doesn't do well when many tasks wake up in a burst. After talking with
> Mike Galbraith, we agreed to just use the runnable avg in power friendly
> scheduling and keep the current instant load in performance scheduling
> for low latency.
> 
> So the biggest change in this version is removing runnable load avg in
> balance and just using runnable data in power balance.
> 
> The patchset bases on Linus' tree, includes 3 parts,

Would you like to give some comments, Ingo? :)

Best regards!


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling

2013-01-27 Thread Mike Galbraith
On Sun, 2013-01-27 at 21:25 +0800, Alex Shi wrote: 
> On 01/27/2013 06:35 PM, Borislav Petkov wrote:
> > On Sun, Jan 27, 2013 at 05:36:25AM +0100, Mike Galbraith wrote:
> >> With aim7 compute on 4 node 40 core box, I see stable throughput
> >> improvement at tasks = nr_cores and below w. balance and powersaving. 
... 
> > Ok, this is sick. How is balance and powersaving better than perf? Both
> > have much more jobs per minute than perf; is that because we do pack
> > much more tasks per cpu with balance and powersaving?
> 
> Maybe it is due to the lazy balancing on balance/powersaving. You can
> check the CS times in /proc/pid/status.

Well, it's not wakeup path, limiting entry frequency per waker did zip
squat nada to any policy throughput.

-Mike

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling

2013-01-27 Thread Alex Shi
On 01/27/2013 06:40 PM, Borislav Petkov wrote:
> On Sun, Jan 27, 2013 at 10:41:40AM +0800, Alex Shi wrote:
>> Just rerun some benchmarks: kbuild, specjbb2005, oltp, tbench, aim9,
>> hackbench, fileio-cfq of sysbench, dbench, aiostress, multhreads
>> loopback netperf. on my core2, nhm, wsm, snb, platforms. no clear
>> performance change found.
> 
> Ok, good, You could put that in one of the commit messages so that it is
> there and people know that this patchset doesn't cause perf regressions
> with the bunch of benchmarks.

Thanks for the suggestion!
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling

2013-01-27 Thread Alex Shi
On 01/27/2013 06:35 PM, Borislav Petkov wrote:
> On Sun, Jan 27, 2013 at 05:36:25AM +0100, Mike Galbraith wrote:
>> With aim7 compute on 4 node 40 core box, I see stable throughput
>> improvement at tasks = nr_cores and below w. balance and powersaving. 
>>
>>                   3.8.0-performance                         3.8.0-balance                             3.8.0-powersaving
>> Tasks   jobs/min  jti  jobs/min/task   real    cpu     jobs/min  jti  jobs/min/task   real    cpu     jobs/min  jti  jobs/min/task   real    cpu
>>     1     432.86  100       432.8571  14.00   3.99       433.48  100       433.4764  13.98   3.97       433.17  100       433.1665  13.99   3.98
>>     1     437.23  100       437.2294  13.86   3.85       436.60  100       436.5994  13.88   3.86       435.66  100       435.6578  13.91   3.90
>>     1     434.10  100       434.0974  13.96   3.95       436.29  100       436.2851  13.89   3.89       436.29  100       436.2851  13.89   3.87
>>     5    2400.95   99       480.1902  12.62  12.49      2554.81   98       510.9612  11.86   7.55      2487.68   98       497.5369  12.18   8.22
>>     5    2341.58   99       468.3153  12.94  13.95      2578.72   99       515.7447  11.75   7.25      2527.11   99       505.4212  11.99   7.90
>>     5    2350.66   99       470.1319  12.89  13.66      2600.86   99       520.1717  11.65   7.09      2508.28   98       501.6556  12.08   8.24
>>    10    4291.78   99       429.1785  14.12  40.14      5334.51   99       533.4507  11.36  11.13      5183.92   98       518.3918  11.69  12.15
>>    10    4334.76   99       433.4764  13.98  38.70      5311.13   99       531.1131  11.41  11.23      5215.15   99       521.5146  11.62  12.53
>>    10    4273.62   99       427.3625  14.18  40.29      5287.96   99       528.7958  11.46  11.46      5144.31   98       514.4312  11.78  12.32
>>    20    8487.39   94       424.3697  14.28  63.14     10594.41   99       529.7203  11.44  23.72     10575.92   99       528.7958  11.46  22.08
>>    20    8387.54   97       419.3772  14.45  77.01     10575.92   98       528.7958  11.46  23.41     10520.83   99       526.0417  11.52  21.88
>>    20    8713.16   95       435.6578  13.91  55.10     10659.63   99       532.9815  11.37  24.17     10539.13   99       526.9565  11.50  22.13
>>    40   16786.70   99       419.6676  14.44 170.08     19469.88   98       486.7470  12.45  60.78     19967.05   98       499.1763  12.14  51.40
>>    40   16728.78   99       418.2195  14.49 172.96     19627.53   98       490.6883  12.35  65.26     20386.88   98       509.6720  11.89  46.91
>>    40   16763.49   99       419.0871  14.46 171.42     20033.06   98       500.8264  12.10  51.44     20682.59   98       517.0648  11.72  42.45
> 
> Ok, this is sick. How is balance and powersaving better than perf? Both
> have much more jobs per minute than perf; is that because we do pack
> much more tasks per cpu with balance and powersaving?

Maybe it is due to the lazy balancing in balance/powersaving. You can
check the context-switch (CS) counts in /proc/<pid>/status.
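
For reference, here is a small stand-alone helper (just a sketch, not part
of the patchset) that prints those counters -- the voluntary_ctxt_switches
and nonvoluntary_ctxt_switches lines of /proc/<pid>/status -- for a given pid:

#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
        char path[64], line[256];
        FILE *f;

        if (argc < 2) {
                fprintf(stderr, "usage: %s <pid>\n", argv[0]);
                return 1;
        }
        snprintf(path, sizeof(path), "/proc/%s/status", argv[1]);
        f = fopen(path, "r");
        if (!f) {
                perror(path);
                return 1;
        }
        /* print the voluntary/nonvoluntary context-switch counters */
        while (fgets(line, sizeof(line), f))
                if (strstr(line, "ctxt_switches"))
                        fputs(line, stdout);
        fclose(f);
        return 0;
}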
> 
> Thanks.
> 


-- 
Thanks
Alex
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling

2013-01-27 Thread Borislav Petkov
On Sun, Jan 27, 2013 at 10:41:40AM +0800, Alex Shi wrote:
> Just rerun some benchmarks: kbuild, specjbb2005, oltp, tbench, aim9,
> hackbench, fileio-cfq of sysbench, dbench, aiostress, multhreads
> loopback netperf. on my core2, nhm, wsm, snb, platforms. no clear
> performance change found.

Ok, good, You could put that in one of the commit messages so that it is
there and people know that this patchset doesn't cause perf regressions
with the bunch of benchmarks.

> I also tested balance policy/powersaving policy with above benchmark,
> found, the specjbb2005 drop much 30~50% on both of policy whenever
> with openjdk or jrockit. and hackbench drops a lots with powersaving
> policy on snb 4 sockets platforms. others has no clear change.

I guess this is expected because there has to be some performance hit
when saving power...

Thanks.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling

2013-01-27 Thread Borislav Petkov
On Sun, Jan 27, 2013 at 05:36:25AM +0100, Mike Galbraith wrote:
> With aim7 compute on 4 node 40 core box, I see stable throughput
> improvement at tasks = nr_cores and below w. balance and powersaving. 
> 
>                   3.8.0-performance                         3.8.0-balance                             3.8.0-powersaving
> Tasks   jobs/min  jti  jobs/min/task   real    cpu     jobs/min  jti  jobs/min/task   real    cpu     jobs/min  jti  jobs/min/task   real    cpu
>     1     432.86  100       432.8571  14.00   3.99       433.48  100       433.4764  13.98   3.97       433.17  100       433.1665  13.99   3.98
>     1     437.23  100       437.2294  13.86   3.85       436.60  100       436.5994  13.88   3.86       435.66  100       435.6578  13.91   3.90
>     1     434.10  100       434.0974  13.96   3.95       436.29  100       436.2851  13.89   3.89       436.29  100       436.2851  13.89   3.87
>     5    2400.95   99       480.1902  12.62  12.49      2554.81   98       510.9612  11.86   7.55      2487.68   98       497.5369  12.18   8.22
>     5    2341.58   99       468.3153  12.94  13.95      2578.72   99       515.7447  11.75   7.25      2527.11   99       505.4212  11.99   7.90
>     5    2350.66   99       470.1319  12.89  13.66      2600.86   99       520.1717  11.65   7.09      2508.28   98       501.6556  12.08   8.24
>    10    4291.78   99       429.1785  14.12  40.14      5334.51   99       533.4507  11.36  11.13      5183.92   98       518.3918  11.69  12.15
>    10    4334.76   99       433.4764  13.98  38.70      5311.13   99       531.1131  11.41  11.23      5215.15   99       521.5146  11.62  12.53
>    10    4273.62   99       427.3625  14.18  40.29      5287.96   99       528.7958  11.46  11.46      5144.31   98       514.4312  11.78  12.32
>    20    8487.39   94       424.3697  14.28  63.14     10594.41   99       529.7203  11.44  23.72     10575.92   99       528.7958  11.46  22.08
>    20    8387.54   97       419.3772  14.45  77.01     10575.92   98       528.7958  11.46  23.41     10520.83   99       526.0417  11.52  21.88
>    20    8713.16   95       435.6578  13.91  55.10     10659.63   99       532.9815  11.37  24.17     10539.13   99       526.9565  11.50  22.13
>    40   16786.70   99       419.6676  14.44 170.08     19469.88   98       486.7470  12.45  60.78     19967.05   98       499.1763  12.14  51.40
>    40   16728.78   99       418.2195  14.49 172.96     19627.53   98       490.6883  12.35  65.26     20386.88   98       509.6720  11.89  46.91
>    40   16763.49   99       419.0871  14.46 171.42     20033.06   98       500.8264  12.10  51.44     20682.59   98       517.0648  11.72  42.45

Ok, this is sick. How is balance and powersaving better than perf? Both
have much more jobs per minute than perf; is that because we do pack
much more tasks per cpu with balance and powersaving?

Thanks.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling

2013-01-27 Thread Alex Shi
On 01/27/2013 06:40 PM, Borislav Petkov wrote:
 On Sun, Jan 27, 2013 at 10:41:40AM +0800, Alex Shi wrote:
 Just rerun some benchmarks: kbuild, specjbb2005, oltp, tbench, aim9,
 hackbench, fileio-cfq of sysbench, dbench, aiostress, multhreads
 loopback netperf. on my core2, nhm, wsm, snb, platforms. no clear
 performance change found.
 
 Ok, good, You could put that in one of the commit messages so that it is
 there and people know that this patchset doesn't cause perf regressions
 with the bunch of benchmarks.
 
 I also tested balance policy/powersaving policy with above benchmark,
 found, the specjbb2005 drop much 30~50% on both of policy whenever
 with openjdk or jrockit. and hackbench drops a lots with powersaving
 policy on snb 4 sockets platforms. others has no clear change.
 
 I guess this is expected because there has to be some performance hit
 when saving power...
 

BTW, I had tested the v3 version based on sched numa -- on tip/master.
There the specjbb drop is only about 5~7% under the balance/powersaving
policies. The power scheduling is done after the numa scheduling logic.



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling

2013-01-27 Thread Alex Shi
On 01/28/2013 01:17 PM, Mike Galbraith wrote:
 On Sun, 2013-01-27 at 16:51 +0100, Mike Galbraith wrote: 
 On Sun, 2013-01-27 at 21:25 +0800, Alex Shi wrote: 
 On 01/27/2013 06:35 PM, Borislav Petkov wrote:
 On Sun, Jan 27, 2013 at 05:36:25AM +0100, Mike Galbraith wrote:
 With aim7 compute on 4 node 40 core box, I see stable throughput
 improvement at tasks = nr_cores and below w. balance and powersaving. 
 ... 
 Ok, this is sick. How is balance and powersaving better than perf? Both
 have much more jobs per minute than perf; is that because we do pack
 much more tasks per cpu with balance and powersaving?

 Maybe it is due to the lazy balancing on balance/powersaving. You can
 check the CS times in /proc/pid/status.

 Well, it's not wakeup path, limiting entry frequency per waker did zip
 squat nada to any policy throughput.
 
 monteverdi:/abuild/mike/:[0]# echo powersaving  
 /sys/devices/system/cpu/sched_policy/current_sched_policy
 monteverdi:/abuild/mike/:[0]# massive_intr 10 60
 043321  00058616
 043313  00058616
 043318  00058968
 043317  00058968
 043316  00059184
 043319  00059192
 043320  00059048
 043314  00059048
 043312  00058176
 043315  00058184
 monteverdi:/abuild/mike/:[0]# echo balance  
 /sys/devices/system/cpu/sched_policy/current_sched_policy
 monteverdi:/abuild/mike/:[0]# massive_intr 10 60
 043337  00053448
 04  00053456
 043338  00052992
 043331  00053448
 043332  00053488
 043335  00053496
 043334  00053480
 043329  00053288
 043336  00053464
 043330  00053496
 monteverdi:/abuild/mike/:[0]# echo performance  
 /sys/devices/system/cpu/sched_policy/current_sched_policy
 monteverdi:/abuild/mike/:[0]# massive_intr 10 60
 043348  00052488
 043344  00052488
 043349  00052744
 043343  00052504
 043347  00052504
 043352  00052888
 043345  00052504
 043351  00052496
 043346  00052496
 043350  00052304
 monteverdi:/abuild/mike/:[0]#

Similar to the aim7 results. Thanks, Mike!

Would you like to collect vmstat info in the background?
 
 Zzzt.  Wish I could turn turbo thingy off.

Do you mean the turbo mode of the cpu frequency? I remember some machines
can disable it in the BIOS.
 
 -Mike
 


-- 
Thanks Alex
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling

2013-01-27 Thread Mike Galbraith
On Mon, 2013-01-28 at 13:51 +0800, Alex Shi wrote: 
 On 01/28/2013 01:17 PM, Mike Galbraith wrote:
  On Sun, 2013-01-27 at 16:51 +0100, Mike Galbraith wrote: 
  On Sun, 2013-01-27 at 21:25 +0800, Alex Shi wrote: 
  On 01/27/2013 06:35 PM, Borislav Petkov wrote:
  On Sun, Jan 27, 2013 at 05:36:25AM +0100, Mike Galbraith wrote:
  With aim7 compute on 4 node 40 core box, I see stable throughput
  improvement at tasks = nr_cores and below w. balance and powersaving. 
  ... 
  Ok, this is sick. How is balance and powersaving better than perf? Both
  have much more jobs per minute than perf; is that because we do pack
  much more tasks per cpu with balance and powersaving?
 
  Maybe it is due to the lazy balancing on balance/powersaving. You can
  check the CS times in /proc/pid/status.
 
  Well, it's not wakeup path, limiting entry frequency per waker did zip
  squat nada to any policy throughput.
  
  monteverdi:/abuild/mike/:[0]# echo powersaving  
  /sys/devices/system/cpu/sched_policy/current_sched_policy
  monteverdi:/abuild/mike/:[0]# massive_intr 10 60
  043321  00058616
  043313  00058616
  043318  00058968
  043317  00058968
  043316  00059184
  043319  00059192
  043320  00059048
  043314  00059048
  043312  00058176
  043315  00058184
  monteverdi:/abuild/mike/:[0]# echo balance  
  /sys/devices/system/cpu/sched_policy/current_sched_policy
  monteverdi:/abuild/mike/:[0]# massive_intr 10 60
  043337  00053448
  04  00053456
  043338  00052992
  043331  00053448
  043332  00053488
  043335  00053496
  043334  00053480
  043329  00053288
  043336  00053464
  043330  00053496
  monteverdi:/abuild/mike/:[0]# echo performance  
  /sys/devices/system/cpu/sched_policy/current_sched_policy
  monteverdi:/abuild/mike/:[0]# massive_intr 10 60
  043348  00052488
  043344  00052488
  043349  00052744
  043343  00052504
  043347  00052504
  043352  00052888
  043345  00052504
  043351  00052496
  043346  00052496
  043350  00052304
  monteverdi:/abuild/mike/:[0]#
 
 similar with aim7 results. Thanks, Mike!
 
 Wold you like to collect vmstat info in background?
  
  Zzzt.  Wish I could turn turbo thingy off.
 
 Do you mean the turbo mode of cpu frequency? I remember some of machine
 can disable it in BIOS.

Yeah, I can do that in my local x3550 box.  I can't fiddle with BIOS
settings on the remote NUMA box.

This can't be anything but turbo gizmo mucking up the numbers I think,
not that the numbers are invalid or anything, better numbers are better
numbers no matter where/how they come about ;-)

The massive_intr load is dirt simple sleep/spin with bean counting.  It
sleeps 1ms, then spins 8ms.  Change that to sleep 8ms, grind away for 1ms...
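
Roughly, the load is shaped like this -- a minimal sketch only (the real
massive_intr source differs in details), assuming N forked children that
each sleep 1ms, spin 8ms and print their pid plus completed work periods;
the two *_NS constants are the knobs being flipped for the 8ms-sleep/1ms-work
variant whose numbers follow:

/* sketch of a sleep/spin worker in the spirit of massive_intr */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>
#include <sys/wait.h>

#define SLEEP_NS  1000000LL     /* 1ms sleep (the "interactive" part) */
#define WORK_NS   8000000LL     /* 8ms spin  (the CPU-bound burst)    */

static long long now_ns(void)
{
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

static void worker(int secs)
{
        long long end = now_ns() + (long long)secs * 1000000000LL;
        long periods = 0;

        while (now_ns() < end) {
                struct timespec ts = { 0, SLEEP_NS };
                nanosleep(&ts, NULL);                    /* sleep phase */
                long long spin_end = now_ns() + WORK_NS;
                while (now_ns() < spin_end)              /* spin phase  */
                        ;
                periods++;
        }
        printf("%06d\t%08ld\n", (int)getpid(), periods); /* pid, work done */
}

int main(int argc, char **argv)
{
        int nproc = argc > 1 ? atoi(argv[1]) : 10;
        int secs  = argc > 2 ? atoi(argv[2]) : 60;

        for (int i = 0; i < nproc; i++)
                if (fork() == 0) {
                        worker(secs);
                        return 0;
                }
        while (wait(NULL) > 0)
                ;
        return 0;
}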

monteverdi:/abuild/mike/:[0]# ./massive_intr 10 60
045150  6484
045157  6427
045156  6401
045152  6428
045155  6372
045154  6370
045158  6453
045149  6372
045151  6371
045153  6371
monteverdi:/abuild/mike/:[0]# echo balance > 
/sys/devices/system/cpu/sched_policy/current_sched_policy
monteverdi:/abuild/mike/:[0]# ./massive_intr 10 60
045170  6380
045172  6374
045169  6376
045175  6376
045171  6334
045176  6380
045168  6374
045174  6334
045177  6375
045173  6376
monteverdi:/abuild/mike/:[0]# echo performance > 
/sys/devices/system/cpu/sched_policy/current_sched_policy
monteverdi:/abuild/mike/:[0]# ./massive_intr 10 60
045198  6408
045191  6408
045197  6408
045192  6411
045194  6409
045196  6409
045195  6336
045189  6336
045193  6411
045190  6410

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling

2013-01-27 Thread Mike Galbraith
On Mon, 2013-01-28 at 07:15 +0100, Mike Galbraith wrote: 
 On Mon, 2013-01-28 at 13:51 +0800, Alex Shi wrote: 
  On 01/28/2013 01:17 PM, Mike Galbraith wrote:
   On Sun, 2013-01-27 at 16:51 +0100, Mike Galbraith wrote: 
   On Sun, 2013-01-27 at 21:25 +0800, Alex Shi wrote: 
   On 01/27/2013 06:35 PM, Borislav Petkov wrote:
   On Sun, Jan 27, 2013 at 05:36:25AM +0100, Mike Galbraith wrote:
   With aim7 compute on 4 node 40 core box, I see stable throughput
   improvement at tasks = nr_cores and below w. balance and powersaving. 
   ... 
   Ok, this is sick. How is balance and powersaving better than perf? Both
   have much more jobs per minute than perf; is that because we do pack
   much more tasks per cpu with balance and powersaving?
  
   Maybe it is due to the lazy balancing on balance/powersaving. You can
   check the CS times in /proc/pid/status.
  
   Well, it's not wakeup path, limiting entry frequency per waker did zip
   squat nada to any policy throughput.
   
   monteverdi:/abuild/mike/:[0]# echo powersaving  
   /sys/devices/system/cpu/sched_policy/current_sched_policy
   monteverdi:/abuild/mike/:[0]# massive_intr 10 60
   043321  00058616
   043313  00058616
   043318  00058968
   043317  00058968
   043316  00059184
   043319  00059192
   043320  00059048
   043314  00059048
   043312  00058176
   043315  00058184
   monteverdi:/abuild/mike/:[0]# echo balance  
   /sys/devices/system/cpu/sched_policy/current_sched_policy
   monteverdi:/abuild/mike/:[0]# massive_intr 10 60
   043337  00053448
   04  00053456
   043338  00052992
   043331  00053448
   043332  00053488
   043335  00053496
   043334  00053480
   043329  00053288
   043336  00053464
   043330  00053496
   monteverdi:/abuild/mike/:[0]# echo performance  
   /sys/devices/system/cpu/sched_policy/current_sched_policy
   monteverdi:/abuild/mike/:[0]# massive_intr 10 60
   043348  00052488
   043344  00052488
   043349  00052744
   043343  00052504
   043347  00052504
   043352  00052888
   043345  00052504
   043351  00052496
   043346  00052496
   043350  00052304
   monteverdi:/abuild/mike/:[0]#
  
  similar with aim7 results. Thanks, Mike!
  
  Wold you like to collect vmstat info in background?
   
   Zzzt.  Wish I could turn turbo thingy off.
  
  Do you mean the turbo mode of cpu frequency? I remember some of machine
  can disable it in BIOS.
 
 Yeah, I can do that in my local x3550 box.  I can't fiddle with BIOS
 settings on the remote NUMA box.
 
 This can't be anything but turbo gizmo mucking up the numbers I think,
 not that the numbers are invalid or anything, better numbers are better
 numbers no matter where/how they come about ;-)
 
 The massive_intr load is dirt simple sleep/spin with bean counting.  It
 sleeps 1ms spins 8ms.  Change that to sleep 8ms, grind away for 1ms...
 
 monteverdi:/abuild/mike/:[0]# ./massive_intr 10 60
 045150  6484
 045157  6427
 045156  6401
 045152  6428
 045155  6372
 045154  6370
 045158  6453
 045149  6372
 045151  6371
 045153  6371
 monteverdi:/abuild/mike/:[0]# echo balance  
 /sys/devices/system/cpu/sched_policy/current_sched_policy
 monteverdi:/abuild/mike/:[0]# ./massive_intr 10 60
 045170  6380
 045172  6374
 045169  6376
 045175  6376
 045171  6334
 045176  6380
 045168  6374
 045174  6334
 045177  6375
 045173  6376
 monteverdi:/abuild/mike/:[0]# echo performance  
 /sys/devices/system/cpu/sched_policy/current_sched_policy
 monteverdi:/abuild/mike/:[0]# ./massive_intr 10 60
 045198  6408
 045191  6408
 045197  6408
 045192  6411
 045194  6409
 045196  6409
 045195  6336
 045189  6336
 045193  6411
 045190  6410

Back to the original 1ms sleep, 8ms work, turning the NUMA box into a
single-node 10-core box with numactl.

monteverdi:/abuild/mike/:[0]# echo powersaving > 
/sys/devices/system/cpu/sched_policy/current_sched_policy
monteverdi:/abuild/mike/:[0]# numactl --cpunodebind=0 massive_intr 10 60
045286  00043872
045289  00043464
045284  00043488
045287  00043440
045283  00043416
045281  00044456
045285  00043456
045288  00044312
045280  00043048
045282  00043240
monteverdi:/abuild/mike/:[0]# echo balance > 
/sys/devices/system/cpu/sched_policy/current_sched_policy
monteverdi:/abuild/mike/:[0]# numactl --cpunodebind=0 massive_intr 10 60
045300  00052536
045307  00052472
045304  00052536
045299  00052536
045305  00052520
045306  00052528
045302  00052528
045303  00052528
045308  00052512
045301  00052520
monteverdi:/abuild/mike/:[0]# echo performance > 
/sys/devices/system/cpu/sched_policy/current_sched_policy
monteverdi:/abuild/mike/:[0]# numactl --cpunodebind=0 massive_intr 10 60
045339  00052600
045340  00052608
045338  00052600
045337  00052608
045343  00052600
045341  00052600
045336  00052608
045335  00052616
045334  00052576
045342  00052600

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to 

Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling

2013-01-27 Thread Mike Galbraith
On Mon, 2013-01-28 at 13:19 +0800, Alex Shi wrote: 
 On 01/27/2013 06:40 PM, Borislav Petkov wrote:
  On Sun, Jan 27, 2013 at 10:41:40AM +0800, Alex Shi wrote:
  Just rerun some benchmarks: kbuild, specjbb2005, oltp, tbench, aim9,
  hackbench, fileio-cfq of sysbench, dbench, aiostress, multhreads
  loopback netperf. on my core2, nhm, wsm, snb, platforms. no clear
  performance change found.
  
  Ok, good, You could put that in one of the commit messages so that it is
  there and people know that this patchset doesn't cause perf regressions
  with the bunch of benchmarks.
  
  I also tested balance policy/powersaving policy with above benchmark,
  found, the specjbb2005 drop much 30~50% on both of policy whenever
  with openjdk or jrockit. and hackbench drops a lots with powersaving
  policy on snb 4 sockets platforms. others has no clear change.
  
  I guess this is expected because there has to be some performance hit
  when saving power...
  
 
 BTW, I had tested the v3 version based on sched numa -- on tip/master.
 The specjbb just has about 5~7% dropping on balance/powersaving policy.
 The power scheduling done after the numa scheduling logical.

That makes sense.  How do the numa scheduling numbers compare to mainline?
Do you have all three available: mainline, and tip with and without the
powersaving policy?

-Mike


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling

2013-01-27 Thread Alex Shi
On 01/28/2013 02:49 PM, Mike Galbraith wrote:
 On Mon, 2013-01-28 at 13:19 +0800, Alex Shi wrote: 
 On 01/27/2013 06:40 PM, Borislav Petkov wrote:
 On Sun, Jan 27, 2013 at 10:41:40AM +0800, Alex Shi wrote:
 Just rerun some benchmarks: kbuild, specjbb2005, oltp, tbench, aim9,
 hackbench, fileio-cfq of sysbench, dbench, aiostress, multhreads
 loopback netperf. on my core2, nhm, wsm, snb, platforms. no clear
 performance change found.

 Ok, good, You could put that in one of the commit messages so that it is
 there and people know that this patchset doesn't cause perf regressions
 with the bunch of benchmarks.

 I also tested balance policy/powersaving policy with above benchmark,
 found, the specjbb2005 drop much 30~50% on both of policy whenever
 with openjdk or jrockit. and hackbench drops a lots with powersaving
 policy on snb 4 sockets platforms. others has no clear change.

 I guess this is expected because there has to be some performance hit
 when saving power...


 BTW, I had tested the v3 version based on sched numa -- on tip/master.
 The specjbb just has about 5~7% dropping on balance/powersaving policy.
 The power scheduling done after the numa scheduling logical.
 
 That makes sense.  How the numa scheduling numbers compare to mainline?
 Do you have all three available, mainline, and tip w. w/o powersaving
 policy?
 

I once saw a 20~40% performance increase on sched numa vs mainline
3.7-rc5, but I have no baseline to compare balance/powersaving performance
against, since lower numbers are acceptable for balance/powersaving and
tip/master changed too quickly to follow up at that time.
:)

 -Mike
 
 


-- 
Thanks Alex
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling

2013-01-27 Thread Mike Galbraith
On Mon, 2013-01-28 at 07:42 +0100, Mike Galbraith wrote:

 Back to original 1ms sleep, 8ms work, turning NUMA box into a single
 node 10 core box with numactl.

(aim7 in one 10 core node.. so spread, no delta.)

Benchmark                               Version  Machine      Run Date
AIM Multiuser Benchmark - Suite VII     1.1      powersaving  Jan 28 08:04:14 2013

Tasks   Jobs/Min   JTI   Real    CPU     Jobs/sec/task
1          441.0   100   13.7     3.7    7.3508
5         2516.6    98   12.0     8.1    8.3887
10        5215.1    98   11.6    11.9    8.6919
20       10475.4    99   11.6    21.7    8.7295
40       20216.8    99   12.0    38.2    8.4237
80       35568.6    99   13.6    71.4    7.4101
160      57102.5    98   17.0   138.2    5.9482
320      82099.9    97   23.6   271.1    4.2760

Benchmark                               Version  Machine      Run Date
AIM Multiuser Benchmark - Suite VII     1.1      balance      Jan 28 08:06:49 2013

Tasks   Jobs/Min   JTI   Real    CPU     Jobs/sec/task
1          439.4   100   13.8     3.8    7.3241
5         2583.1    98   11.7     7.2    8.6104
10        5325.1    99   11.4    11.0    8.8752
20       10687.8    99   11.3    23.6    8.9065
40       20200.0    99   12.0    38.7    8.4167
80       35464.5    98   13.7    71.4    7.3884
160      57203.5    98   16.9   137.9    5.9587
320      82065.2    98   23.6   271.1    4.2742

Benchmark                               Version  Machine      Run Date
AIM Multiuser Benchmark - Suite VII     1.1      performance  Jan 28 08:09:20 2013

Tasks   Jobs/Min   JTI   Real    CPU     Jobs/sec/task
1          438.8   100   13.8     3.8    7.3135
5         2634.8    99   11.5     7.2    8.7826
10        5396.3    99   11.2    11.4    8.9938
20       10725.7    99   11.3    24.0    8.9381
40       20183.2    99   12.0    38.5    8.4097
80       35620.9    99   13.6    71.4    7.4210
160      57203.5    98   16.9   137.8    5.9587
320      81995.8    98   23.7   271.3    4.2706

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling

2013-01-27 Thread Mike Galbraith
On Mon, 2013-01-28 at 15:17 +0800, Alex Shi wrote: 
 On 01/28/2013 02:49 PM, Mike Galbraith wrote:
  On Mon, 2013-01-28 at 13:19 +0800, Alex Shi wrote: 
  On 01/27/2013 06:40 PM, Borislav Petkov wrote:
  On Sun, Jan 27, 2013 at 10:41:40AM +0800, Alex Shi wrote:
  Just rerun some benchmarks: kbuild, specjbb2005, oltp, tbench, aim9,
  hackbench, fileio-cfq of sysbench, dbench, aiostress, multhreads
  loopback netperf. on my core2, nhm, wsm, snb, platforms. no clear
  performance change found.
 
  Ok, good, You could put that in one of the commit messages so that it is
  there and people know that this patchset doesn't cause perf regressions
  with the bunch of benchmarks.
 
  I also tested balance policy/powersaving policy with above benchmark,
  found, the specjbb2005 drop much 30~50% on both of policy whenever
  with openjdk or jrockit. and hackbench drops a lots with powersaving
  policy on snb 4 sockets platforms. others has no clear change.
 
  I guess this is expected because there has to be some performance hit
  when saving power...
 
 
  BTW, I had tested the v3 version based on sched numa -- on tip/master.
  The specjbb just has about 5~7% dropping on balance/powersaving policy.
  The power scheduling done after the numa scheduling logical.
  
  That makes sense.  How the numa scheduling numbers compare to mainline?
  Do you have all three available, mainline, and tip w. w/o powersaving
  policy?
  
 
 I once caught 20~40% performance increasing on sched numa VS mainline
 3.7-rc5. but have no baseline to compare balance/powersaving performance
 since lower data are acceptable for balance/powersaving and
 tip/master changes too quickly to follow up at that time.
 :)

(wow.  dram sucks, dram+smp sucks more, dram+smp+numa _sucks rocks_;)

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling

2013-01-26 Thread Mike Galbraith
On Sun, 2013-01-27 at 10:41 +0800, Alex Shi wrote: 
> On 01/24/2013 11:07 PM, Alex Shi wrote:
> > On 01/24/2013 05:44 PM, Borislav Petkov wrote:
> >> On Thu, Jan 24, 2013 at 11:06:42AM +0800, Alex Shi wrote:
> >>> Since the runnable info needs 345ms to accumulate, balancing
> >>> doesn't do well for many tasks burst waking. After talking with Mike
> >>> Galbraith, we are agree to just use runnable avg in power friendly 
> >>> scheduling and keep current instant load in performance scheduling for 
> >>> low latency.
> >>>
> >>> So the biggest change in this version is removing runnable load avg in
> >>> balance and just using runnable data in power balance.
> >>>
> >>> The patchset bases on Linus' tree, includes 3 parts,
> >>> ** 1, bug fix and fork/wake balancing clean up. patch 1~5,
> >>> --
> >>> the first patch remove one domain level. patch 2~5 simplified fork/wake
> >>> balancing, it can increase 10+% hackbench performance on our 4 sockets
> >>> SNB EP machine.
> >>
> >> Ok, I see some benchmarking results here and there in the commit
> >> messages but since this is touching the scheduler, you probably would
> >> need to make sure it doesn't introduce performance regressions vs
> >> mainline with a comprehensive set of benchmarks.
> >>
> > 
> > Thanks a lot for your comments, Borislav! :)
> > 
> > For this patchset, the code will just check current policy, if it is
> > performance, the code patch will back to original performance code at
> > once. So there should no performance change on performance policy.
> > 
> > I once tested the balance policy performance with benchmark
> > kbuild/hackbench/aim9/dbench/tbench on version 2, only hackbench has a
> > bit drop ~3%. others have no clear change.
> > 
> >> And, AFAICR, mainline does by default the 'performance' scheme by
> >> spreading out tasks to idle cores, so have you tried comparing vanilla
> >> mainline to your patchset in the 'performance' setting so that you can
> >> make sure there are no problems there? And not only hackbench or a
> >> microbenchmark but aim9 (I saw that in a commit message somewhere) and
> >> whatever else multithreaded benchmark you can get your hands on.
> >>
> >> Also, you might want to run it on other machines too, not only SNB :-)
> > 
> > Anyway I will redo the performance testing on this version again on all
> > machine. but doesn't expect something change. :)
> 
> Just rerun some benchmarks: kbuild, specjbb2005, oltp, tbench, aim9,
> hackbench, fileio-cfq of sysbench, dbench, aiostress, multhreads
> loopback netperf. on my core2, nhm, wsm, snb, platforms. no clear
> performance change found.

With aim7 compute on 4 node 40 core box, I see stable throughput
improvement at tasks = nr_cores and below w. balance and powersaving. 

                    3.8.0-performance                         3.8.0-balance                             3.8.0-powersaving
Tasks   jobs/min  jti  jobs/min/task   real    cpu     jobs/min  jti  jobs/min/task   real    cpu     jobs/min  jti  jobs/min/task   real    cpu
    1     432.86  100       432.8571  14.00   3.99       433.48  100       433.4764  13.98   3.97       433.17  100       433.1665  13.99   3.98
    1     437.23  100       437.2294  13.86   3.85       436.60  100       436.5994  13.88   3.86       435.66  100       435.6578  13.91   3.90
    1     434.10  100       434.0974  13.96   3.95       436.29  100       436.2851  13.89   3.89       436.29  100       436.2851  13.89   3.87
    5    2400.95   99       480.1902  12.62  12.49      2554.81   98       510.9612  11.86   7.55      2487.68   98       497.5369  12.18   8.22
    5    2341.58   99       468.3153  12.94  13.95      2578.72   99       515.7447  11.75   7.25      2527.11   99       505.4212  11.99   7.90
    5    2350.66   99       470.1319  12.89  13.66      2600.86   99       520.1717  11.65   7.09      2508.28   98       501.6556  12.08   8.24
   10    4291.78   99       429.1785  14.12  40.14      5334.51   99       533.4507  11.36  11.13      5183.92   98       518.3918  11.69  12.15
   10    4334.76   99       433.4764  13.98  38.70      5311.13   99       531.1131  11.41  11.23      5215.15   99       521.5146  11.62  12.53
   10    4273.62   99       427.3625  14.18  40.29      5287.96   99       528.7958  11.46  11.46      5144.31   98       514.4312  11.78  12.32
   20    8487.39   94       424.3697  14.28  63.14     10594.41   99       529.7203  11.44  23.72     10575.92   99       528.7958  11.46  22.08
   20    8387.54   97       419.3772  14.45  77.01     10575.92   98       528.7958  11.46  23.41     10520.83   99       526.0417  11.52  21.88
   20    8713.16   95       435.6578  13.91  55.10     10659.63   99       532.9815  11.37  24.17     10539.13   99       526.9565  11.50  22.13
   40   16786.70   99       419.6676  14.44 170.08     19469.88   98       486.7470  12.45  60.78     19967.05   98       499.1763  12.14  51.40
   40   16728.78   99       418.2195  14.49 172.96     19627.53   98       490.6883  12.35  65.26     20386.88   98       509.6720  11.89  46.91
   40   16763.49   99       419.0871  14.46 171.42     20033.06   98       500.8264  12.10  51.44     20682.59   98       517.0648  11.72  42.45

Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling

2013-01-26 Thread Alex Shi
On 01/24/2013 11:07 PM, Alex Shi wrote:
> On 01/24/2013 05:44 PM, Borislav Petkov wrote:
>> On Thu, Jan 24, 2013 at 11:06:42AM +0800, Alex Shi wrote:
>>> Since the runnable info needs 345ms to accumulate, balancing
>>> doesn't do well for many tasks burst waking. After talking with Mike
>>> Galbraith, we are agree to just use runnable avg in power friendly 
>>> scheduling and keep current instant load in performance scheduling for 
>>> low latency.
>>>
>>> So the biggest change in this version is removing runnable load avg in
>>> balance and just using runnable data in power balance.
>>>
>>> The patchset bases on Linus' tree, includes 3 parts,
>>> ** 1, bug fix and fork/wake balancing clean up. patch 1~5,
>>> --
>>> the first patch remove one domain level. patch 2~5 simplified fork/wake
>>> balancing, it can increase 10+% hackbench performance on our 4 sockets
>>> SNB EP machine.
>>
>> Ok, I see some benchmarking results here and there in the commit
>> messages but since this is touching the scheduler, you probably would
>> need to make sure it doesn't introduce performance regressions vs
>> mainline with a comprehensive set of benchmarks.
>>
> 
> Thanks a lot for your comments, Borislav! :)
> 
> For this patchset, the code will just check current policy, if it is
> performance, the code patch will back to original performance code at
> once. So there should no performance change on performance policy.
> 
> I once tested the balance policy performance with benchmark
> kbuild/hackbench/aim9/dbench/tbench on version 2, only hackbench has a
> bit drop ~3%. others have no clear change.
> 
>> And, AFAICR, mainline does by default the 'performance' scheme by
>> spreading out tasks to idle cores, so have you tried comparing vanilla
>> mainline to your patchset in the 'performance' setting so that you can
>> make sure there are no problems there? And not only hackbench or a
>> microbenchmark but aim9 (I saw that in a commit message somewhere) and
>> whatever else multithreaded benchmark you can get your hands on.
>>
>> Also, you might want to run it on other machines too, not only SNB :-)
> 
> Anyway I will redo the performance testing on this version again on all
> machine. but doesn't expect something change. :)

Just reran some benchmarks: kbuild, specjbb2005, oltp, tbench, aim9,
hackbench, fileio-cfq of sysbench, dbench, aiostress, and multithreaded
loopback netperf, on my core2, nhm, wsm and snb platforms. No clear
performance change found.

I also tested the balance and powersaving policies with the above
benchmarks and found that specjbb2005 drops a lot, 30~50%, on both
policies, whether with openjdk or jrockit, and hackbench drops a lot
with the powersaving policy on snb 4-socket platforms. Others show no
clear change.

> 
>> And what about ARM, maybe someone there can run your patchset too?
>>
>> So, it would be cool to see comprehensive results from all those runs
>> and see what the numbers say.
>>
>> Thanks.
>>
> 
> 


-- 
Thanks
Alex
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling

2013-01-24 Thread Alex Shi
On 01/24/2013 05:44 PM, Borislav Petkov wrote:
> On Thu, Jan 24, 2013 at 11:06:42AM +0800, Alex Shi wrote:
>> Since the runnable info needs 345ms to accumulate, balancing
>> doesn't do well for bursts of many waking tasks. After talking with Mike
>> Galbraith, we agreed to just use the runnable avg in power friendly
>> scheduling and keep the current instant load in performance scheduling for
>> low latency.
>>
>> So the biggest change in this version is removing the runnable load avg from
>> balancing and using the runnable data only in power balancing.
>>
>> The patchset is based on Linus' tree and includes 3 parts.
>> ** 1, bug fixes and fork/wake balancing clean up. patch 1~5,
>> --
>> The first patch removes one domain level. Patches 2~5 simplify fork/wake
>> balancing, which can increase hackbench performance by 10+% on our 4 socket
>> SNB EP machine.
> 
> Ok, I see some benchmarking results here and there in the commit
> messages but since this is touching the scheduler, you probably would
> need to make sure it doesn't introduce performance regressions vs
> mainline with a comprehensive set of benchmarks.
> 

Thanks a lot for your comments, Borislav! :)

For this patchset, the code just checks the current policy; if it is
performance, the code path falls back to the original performance code at
once. So there should be no performance change with the performance policy.
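
A minimal userspace sketch of that guard idea (the enum and function names
below are made up for illustration, not the actual patch code):

#include <stdio.h>

enum sched_policy {
	SCHED_POLICY_PERFORMANCE,
	SCHED_POLICY_BALANCE,
	SCHED_POLICY_POWERSAVING,
};

static enum sched_policy sched_policy = SCHED_POLICY_PERFORMANCE;

static void performance_balance(void) { puts("original performance path"); }
static void power_aware_balance(void) { puts("power-aware path"); }

static void load_balance(void)
{
	/* bail out to the original code at once when policy is performance */
	if (sched_policy == SCHED_POLICY_PERFORMANCE) {
		performance_balance();
		return;
	}
	power_aware_balance();
}

int main(void)
{
	load_balance();				/* takes the original path */
	sched_policy = SCHED_POLICY_POWERSAVING;
	load_balance();				/* takes the power-aware path */
	return 0;
}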

I once tested balance policy performance with the
kbuild/hackbench/aim9/dbench/tbench benchmarks on version 2; only hackbench
shows a small drop of ~3%, the others show no clear change.

> And, AFAICR, mainline does by default the 'performance' scheme by
> spreading out tasks to idle cores, so have you tried comparing vanilla
> mainline to your patchset in the 'performance' setting so that you can
> make sure there are no problems there? And not only hackbench or a
> microbenchmark but aim9 (I saw that in a commit message somewhere) and
> whatever else multithreaded benchmark you can get your hands on.
> 
> Also, you might want to run it on other machines too, not only SNB :-)

Anyway I will redo the performance testing on this version on all
machines, but I don't expect anything to change. :)

> And what about ARM, maybe someone there can run your patchset too?
> 
> So, it would be cool to see comprehensive results from all those runs
> and see what the numbers say.
> 
> Thanks.
> 


-- 
Thanks
Alex
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling

2013-01-24 Thread Borislav Petkov
On Thu, Jan 24, 2013 at 11:06:42AM +0800, Alex Shi wrote:
> Since the runnable info needs 345ms to accumulate, balancing
> doesn't do well for bursts of many waking tasks. After talking with Mike
> Galbraith, we agreed to just use the runnable avg in power friendly
> scheduling and keep the current instant load in performance scheduling for
> low latency.
> 
> So the biggest change in this version is removing the runnable load avg from
> balancing and using the runnable data only in power balancing.
> 
> The patchset is based on Linus' tree and includes 3 parts.
> ** 1, bug fixes and fork/wake balancing clean up. patch 1~5,
> --
> The first patch removes one domain level. Patches 2~5 simplify fork/wake
> balancing, which can increase hackbench performance by 10+% on our 4 socket
> SNB EP machine.

Ok, I see some benchmarking results here and there in the commit
messages but since this is touching the scheduler, you probably would
need to make sure it doesn't introduce performance regressions vs
mainline with a comprehensive set of benchmarks.

And, AFAICR, mainline does by default the 'performance' scheme by
spreading out tasks to idle cores, so have you tried comparing vanilla
mainline to your patchset in the 'performance' setting so that you can
make sure there are no problems there? And not only hackbench or a
microbenchmark but aim9 (I saw that in a commit message somewhere) and
whatever else multithreaded benchmark you can get your hands on.

Also, you might want to run it on other machines too, not only SNB :-)
And what about ARM, maybe someone there can run your patchset too?

So, it would be cool to see comprehensive results from all those runs
and see what the numbers say.

Thanks.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling

2013-01-23 Thread Alex Shi
Since the runnable info needs 345ms to accumulate, balancing
doesn't do well for bursts of many waking tasks. After talking with Mike
Galbraith, we agreed to just use the runnable avg in power friendly
scheduling and keep the current instant load in performance scheduling for
low latency.
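
(For reference, my reading of where the 345ms figure comes from: the
per-entity runnable average is a geometric sum over ~1ms periods with a
32ms half-life, so a task that just started running needs roughly 345
periods before its tracked value stops growing noticeably. A quick
standalone check of that, not kernel code:

#include <math.h>
#include <stdio.h>

int main(void)
{
	const double y = pow(0.5, 1.0 / 32.0);	/* decay factor: y^32 == 0.5 */
	const double max = 1024.0 / (1.0 - y);	/* limit of the geometric sum */
	double sum = 0.0;

	for (int ms = 1; ms <= 400; ms++) {
		sum = sum * y + 1024.0;		/* one fully runnable ~1ms period */
		if (ms % 100 == 0 || ms == 345)
			printf("%3d ms: %.1f%% of max\n", ms, 100.0 * sum / max);
	}
	return 0;
}

By 345ms the sum is within ~0.1% of its limit, which, as far as I can
tell, is the saturation point the load tracking code assumes.)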

So the biggest change in this version is removing the runnable load avg from
balancing and using the runnable data only in power balancing.
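
A toy illustration of that split (the struct and helper below are invented
for this example, not the real fair.c code): the performance path reads an
instantaneous weighted load that reacts the moment tasks are enqueued,
while the power-aware path reads the slower, decayed runnable average.

#include <stdio.h>

struct cpu_load {
	unsigned long instant;		/* sum of enqueued task weights */
	unsigned long runnable_avg;	/* decayed average, ramps up slowly */
};

static unsigned long balance_load(const struct cpu_load *l, int power_aware)
{
	return power_aware ? l->runnable_avg : l->instant;
}

int main(void)
{
	/* a burst of fresh wakeups: the instant load jumps, the avg lags */
	struct cpu_load cpu = { .instant = 4096, .runnable_avg = 600 };

	printf("performance balancing sees %lu, power balancing sees %lu\n",
	       balance_load(&cpu, 0), balance_load(&cpu, 1));
	return 0;
}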

The patchset is based on Linus' tree and includes 3 parts.
** 1, bug fixes and fork/wake balancing clean up. patch 1~5,
--
The first patch removes one domain level. Patches 2~5 simplify fork/wake
balancing, which can increase hackbench performance by 10+% on our 4 socket
SNB EP machine.

V3 change:
a, added the first patch to remove one domain level on the x86 platform.
b, some small changes according to Namhyung Kim's comments, thanks!

** 2, bug fix of load avg and remove the CONFIG_FAIR_GROUP_SCHED limit
--
patches 6~8, which use the runnable avg in load balancing, with
fixes for two initial runnable variables.

V4 change:
a, removed the use of the runnable load avg in balancing.

V3 change:
a, use rq->cfs.runnable_load_avg as the cpu load, not
rq->avg.load_avg_contrib, since the latter needs much time to accumulate
for a newly forked task.
b, fixed a build issue thanks to Namhyung Kim's reminder.

** 3, power awareness scheduling, patch 9~18.
--
This subset implements the rough power aware scheduling
proposal: https://lkml.org/lkml/2012/8/13/139.
It defines 2 new power aware policies, 'balance' and 'powersaving', and then
tries to spread or pack tasks at each sched group level according to the
selected scheduler policy. That can save much power when the number of tasks
in the system is no more than the LCPU number.

As mentioned in the power aware scheduler proposal, power aware
scheduling has 2 assumptions:
1, racing to idle is helpful for power saving
2, packing tasks onto fewer sched_groups will reduce power consumption

The first assumption makes the performance policy take over scheduling when
the system is busy.
The second assumption makes power aware scheduling try to move
dispersed tasks into fewer groups until those groups are full of tasks.
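
A toy model of the packing behaviour (the group structure and selection
loop below are simplified assumptions, not the actual balancing changes):
under a power aware policy, a waking task goes to the first group that
still has room instead of to the emptiest group.

#include <stdio.h>

struct group {
	int nr_running;		/* tasks currently on the group */
	int capacity;		/* logical CPUs in the group */
};

static int pick_group_powersaving(const struct group *g, int ngroups)
{
	for (int i = 0; i < ngroups; i++)
		if (g[i].nr_running < g[i].capacity)
			return i;	/* pack: first group with free room */
	return 0;			/* all full: caller falls back to normal balancing */
}

int main(void)
{
	struct group groups[2] = { { 3, 4 }, { 0, 4 } };

	/* picks group 0 (one CPU still free) rather than the idle group 1 */
	printf("picked group %d\n", pick_group_powersaving(groups, 2));
	return 0;
}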

Some power testing data is in the last 2 patches.

V4 change:
a, fixed a few bugs and cleaned up code according to comments from Morten
Rasmussen, Mike Galbraith and Namhyung Kim. Thanks!
b, took Morten's suggestion to set different criteria for different
policies in small task packing.
c, shorter latency in power aware scheduling.

V3 change:
a, take nr_running into account when estimating the maximum potential
utilization in periodic power balancing.
b, try to exec/wake small tasks on a running cpu instead of an idle cpu.

V2 change:
a, added lazy power scheduling to deal with kbuild-like benchmarks.


Thanks Fengguang Wu for the build testing of this patchset!

Any comments are appreciated!

-- 
Thanks
Alex

[patch v4 01/18] sched: set SD_PREFER_SIBLING on MC domain to reduce
[patch v4 02/18] sched: select_task_rq_fair clean up
[patch v4 03/18] sched: fix find_idlest_group mess logical
[patch v4 04/18] sched: don't need go to smaller sched domain
[patch v4 05/18] sched: quicker balancing on fork/exec/wake
[patch v4 06/18] sched: give initial value for runnable avg of sched
[patch v4 07/18] sched: set initial load avg of new forked task
[patch v4 08/18] Revert "sched: Introduce temporary FAIR_GROUP_SCHED
[patch v4 09/18] sched: add sched_policies in kernel
[patch v4 10/18] sched: add sysfs interface for sched_policy
[patch v4 11/18] sched: log the cpu utilization at rq
[patch v4 12/18] sched: add power aware scheduling in fork/exec/wake
[patch v4 13/18] sched: packing small tasks in wake/exec balancing
[patch v4 14/18] sched: add power/performance balance allowed flag
[patch v4 15/18] sched: pull all tasks from source group
[patch v4 16/18] sched: don't care if the local group has capacity
[patch v4 17/18] sched: power aware load balance,
[patch v4 18/18] sched: lazy power balance
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

