Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling
BTW, since NUMA balance scheduling is also a kind of cpu locality policy, it is naturally compatible with power aware scheduling. The v2/v3 of this patchset were developed on tip/master, and testing showed the above 2 scheduling policies work well together.

--
Thanks
Alex
Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling
>> Ingo, I appreciate any comments from you. :)
>
> Have you tried to quantify the actual real or expected power
> savings with the knob enabled?

Thanks a lot for your comments! :)

Yes, the following power data is copied from patch 17:
---
A test can show the effect of the different policies:
	for ((i = 0; i < I; i++)) ; do while true; do :; done & done

On my SNB laptop with 4 cores * HT, the data is in Watts:

	        powersaving   balance   performance
	i = 2   40            54        54
	i = 4   57            64*       68
	i = 8   68            68        68

Note: when i = 4 with the balance policy, the power may vary in the
57~68 Watt range, since the HT capacity and core capacity are both 1.

On an SNB EP machine with 2 sockets * 8 cores * HT:

	        powersaving   balance   performance
	i = 4   190           201       238
	i = 8   205           241       268
	i = 16  271           348       376

If the system has a few long-running tasks, using a power policy can
still give a performance/power gain, like the sysbench fileio randrw
test with 16 threads on the SNB EP box.
====================
and the following is from patch 18:
---
On my SNB EP 2 sockets machine with 8 cores * HT, 'make -j x vmlinux'
results:

	        powersaving        balance            performance
	x = 1   175.603 /417 13    175.220 /416 13    176.073 /407 13
	x = 2   192.215 /218 23    194.522 /202 25    217.393 /200 23
	x = 4   205.226 /124 39    208.823 /114 42    230.425 /105 41
	x = 8   236.369 /71  59    249.005 /65  61    257.661 /62  62
	x = 16  283.842 /48  73    307.465 /40  81    309.336 /39  82
	x = 32  325.197 /32  96    333.503 /32  93    336.138 /32  92

data explained, e.g. 175.603 /417 13:
	175.603: average Watts
	417: seconds (compile time)
	13: scaled performance/power = 100 / seconds / watts
====================
some data for parallel compress: https://lkml.org/lkml/2012/12/11/155
---
Another test of parallel compress with pigz on Linus' git tree. The
results show we get much better performance/power with the powersaving
and balance policies.

testing command:
	#pigz -k -c -p$x -r linux* &> /dev/null

On a NHM EP box:

	        powersaving        balance            performance
	x = 4   166.516 /88  68    170.515 /82  71    165.283 /103 58
	x = 8   173.654 /61  94    177.693 /60  93    172.31  /76  76

On a 2 sockets SNB EP box:

	        powersaving        balance            performance
	x = 4   190.995 /149 35    200.6   /129 38    208.561 /135 35
	x = 8   197.969 /108 46    208.885 /103 46    213.96  /108 43
	x = 16  205.163 /76  64    212.144 /91  51    229.287 /97  44

data format, e.g. 166.516 /88 68:
	166.516: average Watts
	88: seconds (compress time)
	68: scaled performance/power = 100 / time / power
====================
BTW, bltk-game with openarena dropped 0.3/1.5 Watt with the powersaving
policy, or 0.2/0.5 Watt with the balance policy, on my wsm/snb laptops.

>
> I'd also love to have an automatic policy here, with a knob that
> has 3 values:
>
>  0: always disabled
>  1: automatic
>  2: always enabled
>
> here enabled/disabled is your current knob's functionality, and
> those can also be used by user-space policy daemons/handlers.

Sure, this patch has a knob for user-space policy selection:

	$cat /sys/devices/system/cpu/sched_policy/available_sched_policy
	performance powersaving balance

The user can change the policy with the 'echo' command:

	echo performance > /sys/devices/system/cpu/sched_policy/current_sched_policy

The 'performance' policy means power friendly scheduling is 'always
disabled'. The 'balance/powersaving' policies are automatic power
friendly scheduling, since the system will automatically bypass power
scheduling when cpu utilisation in a sched domain goes beyond the
domain's cpu weight (powersaving) or beyond the domain's capacity
(balance).

There is no 'always enabled' power scheduling, since the patchset is
based on 'race to idle', but it is easy to add this function if needed.

> The interesting thing would be '1' which should be the default:
> on laptops that are on battery it should result in a power
> saving policy, on laptops that are on AC or on battery-less
> systems it should mean 'performance' policy.

Yes, with the above sysfs interface it is easy to do. :)

> It should generally default to 'performance', switching to
> 'power saving on' only if there's positive, reliable information
> somewhere in the kernel that we are operating on battery power.
> A callback or two would have to go into the ACPI battery driver
> I suspect.
>
> So I'd like this feature to be a tangible improvement for laptop
> users (as long as the laptop hardware is passing us battery/AC
> events reliably).

Maybe it is better to let the system admin change it from user space?
I am not sure someone would like to enable a callback in the ACPI
battery driver. CC to Zhang Rui.

> Or something like that - with .config switches to influence these
> values as
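For anyone wanting to repeat the spinner measurement above, a minimal sketch of the sweep follows, assuming the sched_policy sysfs knob from this patchset is present; the read_watts helper is only a hypothetical placeholder for whatever power meter or tool is actually used:

#!/bin/bash
# Sweep the three policies with i pure busy loops each, as in the test
# quoted above. Requires root for the sysfs write; the sched_policy knob
# only exists with this patchset applied.
POLICY=/sys/devices/system/cpu/sched_policy/current_sched_policy

read_watts() {              # hypothetical placeholder: read the average
    echo "n/a"              # watts from an external power meter here
}

for policy in powersaving balance performance; do
    echo "$policy" > "$POLICY"
    for i in 2 4 8; do
        for ((n = 0; n < i; n++)); do
            while true; do :; done &            # one pure spinner per loop
        done
        sleep 60                                # measurement window
        watts=$(read_watts)
        kill $(jobs -p) 2>/dev/null
        wait 2>/dev/null
        echo "policy=$policy i=$i watts=$watts"
    done
done

The 'scaled performance/power' figures in the kbuild and pigz tables above are then derived from the same formula given in the mail, 100 / seconds / watts, using the measured compile or compress time and the average watts.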
Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling
* Alex Shi wrote: > On 01/24/2013 11:06 AM, Alex Shi wrote: > > Since the runnable info needs 345ms to accumulate, balancing > > doesn't do well for many tasks burst waking. After talking with Mike > > Galbraith, we are agree to just use runnable avg in power friendly > > scheduling and keep current instant load in performance scheduling for > > low latency. > > > > So the biggest change in this version is removing runnable load avg in > > balance and just using runnable data in power balance. > > > > The patchset bases on Linus' tree, includes 3 parts, > > ** 1, bug fix and fork/wake balancing clean up. patch 1~5, > > -- > > the first patch remove one domain level. patch 2~5 simplified fork/wake > > balancing, it can increase 10+% hackbench performance on our 4 sockets > > SNB EP machine. > > > > V3 change: > > a, added the first patch to remove one domain level on x86 platform. > > b, some small changes according to Namhyung Kim's comments, thanks! > > > > ** 2, bug fix of load avg and remove the CONFIG_FAIR_GROUP_SCHED limit > > -- > > patch 6~8, That using runnable avg in load balancing, with > > two initial runnable variables fix. > > > > V4 change: > > a, remove runnable log avg using in balancing. > > > > V3 change: > > a, use rq->cfs.runnable_load_avg as cpu load not > > rq->avg.load_avg_contrib, since the latter need much time to accumulate > > for new forked task, > > b, a build issue fixed with Namhyung Kim's reminder. > > > > ** 3, power awareness scheduling, patch 9~18. > > -- > > The subset implement/consummate the rough power aware scheduling > > proposal: https://lkml.org/lkml/2012/8/13/139. > > It defines 2 new power aware policy 'balance' and 'powersaving' and then > > try to spread or pack tasks on each sched groups level according the > > different scheduler policy. That can save much power when task number in > > system is no more then LCPU number. > > > > As mentioned in the power aware scheduler proposal, Power aware > > scheduling has 2 assumptions: > > 1, race to idle is helpful for power saving > > 2, pack tasks on less sched_groups will reduce power consumption > > > > The first assumption make performance policy take over scheduling when > > system busy. > > The second assumption make power aware scheduling try to move > > disperse tasks into fewer groups until that groups are full of tasks. > > > > Some power testing data is in the last 2 patches. > > > > V4 change: > > a, fix few bugs and clean up code according to Morten Rasmussen, Mike > > Galbraith and Namhyung Kim. Thanks! > > b, take Morten's suggestion to set different criteria for different > > policy in small task packing. > > c, shorter latency in power aware scheduling. > > > > V3 change: > > a, engaged nr_running in max potential utils consideration in periodic > > power balancing. > > b, try exec/wake small tasks on running cpu not idle cpu. > > > > V2 change: > > a, add lazy power scheduling to deal with kbuild like benchmark. > > > > > > Thanks Fengguang Wu for the build testing of this patchset! > > > Add some testing report summary that were posted: > Alex Shi tested the benchmarks: kbuild, specjbb2005, oltp, tbench, aim9, > hackbench, fileio-cfq of sysbench, dbench, aiostress, multhreads > loopback netperf. on core2, nhm, wsm, snb, platforms: > a, no clear performance change on performance balance > b, specjbb2005 drop 5~7% on balance/powersaving policy on SNB/NHM > platforms; hackbench drop 30~70% SNB EP4S machine. > c, no other peformance change on balance/powersaving machine. 
>
> test result from Mike Galbraith:
> --------------------------------
> With aim7 compute on a 4 node 40 core box, I see stable throughput
> improvement at tasks = nr_cores and below w. balance and powersaving.
>
>          3.8.0-performance       3.8.0-balance           3.8.0-powersaving
> Tasks    jobs/min/task  cpu      jobs/min/task  cpu      jobs/min/task  cpu
>     1         432.8571  3.99          433.4764  3.97          433.1665  3.98
>     5         480.1902  12.49         510.9612  7.55          497.5369  8.22
>    10         429.1785  40.14         533.4507  11.13         518.3918  12.15
>    20         424.3697  63.14         529.7203  23.72         528.7958  22.08
>    40         419.0871  171.42        500.8264  51.44         517.0648  42.45
>
> No deltas after that. There were also no deltas between the patched
> kernel using the performance policy and virgin source.
> --------------------------------
>
> Ingo, I appreciate any comments from you. :)

Have you tried to quantify the actual real or expected power
savings with the knob enabled?

I'd also love to have an automatic policy here, with a knob that
has 3 values:

 0: always disabled
 1: automatic
 2: always enabled

here enabled/disabled is your current knob's functionality, and
those can also be used by user-space policy daemons/handlers.
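As a rough illustration of the user-space handler idea discussed in this subthread (as opposed to a kernel-side ACPI callback), a policy switcher could key off the standard power_supply sysfs attributes. A minimal sketch; the sched_policy path is the knob added by this patchset, and the adapter detection is an assumption that may need adjusting per machine:

#!/bin/bash
# Pick the power-aware policy on battery and performance on AC, using the
# standard /sys/class/power_supply attributes ("Mains" entries are AC
# adapters, online=1 means plugged in).
KNOB=/sys/devices/system/cpu/sched_policy/current_sched_policy

on_ac() {
    local psu
    for psu in /sys/class/power_supply/*; do
        if [ "$(cat "$psu/type" 2>/dev/null)" = "Mains" ] &&
           [ "$(cat "$psu/online" 2>/dev/null)" = "1" ]; then
            return 0
        fi
    done
    return 1
}

if on_ac; then
    echo performance > "$KNOB"
else
    echo powersaving > "$KNOB"
fi

Hooked up to udev or acpid power_supply events, such a script would give the 'automatic' behaviour from user space without any new kernel-side battery callbacks.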
Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling
On 01/24/2013 11:06 AM, Alex Shi wrote:
> Since the runnable info needs 345ms to accumulate, balancing
> doesn't do well for many tasks burst waking. After talking with Mike
> Galbraith, we agreed to just use the runnable avg in power friendly
> scheduling and keep the current instant load in performance scheduling
> for low latency.
>
> So the biggest change in this version is removing the runnable load avg
> from balance and only using runnable data in power balance.
>
> The patchset is based on Linus' tree and includes 3 parts:
>
> ** 1, bug fix and fork/wake balancing clean up. patch 1~5
> --------------------
> The first patch removes one domain level. Patches 2~5 simplify fork/wake
> balancing, which can increase hackbench performance by 10+% on our
> 4 sockets SNB EP machine.
>
> V3 change:
> a, added the first patch to remove one domain level on the x86 platform.
> b, some small changes according to Namhyung Kim's comments, thanks!
>
> ** 2, bug fix of load avg and removal of the CONFIG_FAIR_GROUP_SCHED limit
> --------------------
> Patches 6~8 use the runnable avg in load balancing, with fixes for two
> initial runnable variables.
>
> V4 change:
> a, removed use of the runnable load avg in balancing.
>
> V3 change:
> a, use rq->cfs.runnable_load_avg as cpu load, not
> rq->avg.load_avg_contrib, since the latter needs much time to accumulate
> for newly forked tasks,
> b, a build issue fixed with Namhyung Kim's reminder.
>
> ** 3, power awareness scheduling, patch 9~18
> --------------------
> This subset implements/completes the rough power aware scheduling
> proposal: https://lkml.org/lkml/2012/8/13/139.
> It defines 2 new power aware policies, 'balance' and 'powersaving', and
> then tries to spread or pack tasks at each sched group level according
> to the chosen policy. That can save much power when the number of tasks
> in the system is no more than the LCPU number.
>
> As mentioned in the power aware scheduler proposal, power aware
> scheduling has 2 assumptions:
> 1, race to idle is helpful for power saving
> 2, packing tasks onto fewer sched_groups will reduce power consumption
>
> The first assumption makes the performance policy take over scheduling
> when the system is busy.
> The second assumption makes power aware scheduling try to move dispersed
> tasks into fewer groups until those groups are full of tasks.
>
> Some power testing data is in the last 2 patches.
>
> V4 change:
> a, fix a few bugs and clean up code according to Morten Rasmussen, Mike
> Galbraith and Namhyung Kim. Thanks!
> b, take Morten's suggestion to set different criteria for different
> policies in small task packing.
> c, shorter latency in power aware scheduling.
>
> V3 change:
> a, engaged nr_running in the max potential utils consideration in
> periodic power balancing.
> b, try to exec/wake small tasks on a running cpu, not an idle cpu.
>
> V2 change:
> a, add lazy power scheduling to deal with kbuild-like benchmarks.
>
>
> Thanks Fengguang Wu for the build testing of this patchset!

Add a summary of the testing reports that were posted:

Alex Shi tested the benchmarks: kbuild, specjbb2005, oltp, tbench, aim9,
hackbench, fileio-cfq of sysbench, dbench, aiostress, multithreaded
loopback netperf, on core2, nhm, wsm, snb platforms:
a, no clear performance change with the performance policy.
b, specjbb2005 drops 5~7% with the balance/powersaving policies on
   SNB/NHM platforms; hackbench drops 30~70% on the SNB EP 4S machine.
c, no other performance change with balance/powersaving.

test result from Mike Galbraith:
--------------------------------
With aim7 compute on a 4 node 40 core box, I see stable throughput
improvement at tasks = nr_cores and below w. balance and powersaving.

         3.8.0-performance       3.8.0-balance           3.8.0-powersaving
Tasks    jobs/min/task  cpu      jobs/min/task  cpu      jobs/min/task  cpu
    1         432.8571  3.99          433.4764  3.97          433.1665  3.98
    5         480.1902  12.49         510.9612  7.55          497.5369  8.22
   10         429.1785  40.14         533.4507  11.13         518.3918  12.15
   20         424.3697  63.14         529.7203  23.72         528.7958  22.08
   40         419.0871  171.42        500.8264  51.44         517.0648  42.45

No deltas after that. There were also no deltas between the patched kernel
using the performance policy and virgin source.
--------------------------------

Ingo, I appreciate any comments from you. :)
Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling
On 01/28/2013 01:19 PM, Alex Shi wrote:
> On 01/27/2013 06:40 PM, Borislav Petkov wrote:
>> On Sun, Jan 27, 2013 at 10:41:40AM +0800, Alex Shi wrote:
>>> Just reran some benchmarks: kbuild, specjbb2005, oltp, tbench, aim9,
>>> hackbench, fileio-cfq of sysbench, dbench, aiostress, multithreaded
>>> loopback netperf, on my core2, nhm, wsm, snb platforms. No clear
>>> performance change found.
>>
>> Ok, good. You could put that in one of the commit messages so that it is
>> there and people know that this patchset doesn't cause perf regressions
>> with that bunch of benchmarks.
>>
>>> I also tested the balance/powersaving policies with the above
>>> benchmarks and found that specjbb2005 drops a lot, 30~50%, on both
>>> policies, whether with openjdk or jrockit, and hackbench drops a lot
>>> with the powersaving policy on snb 4 sockets platforms. Others show no
>>> clear change.

Sorry, the testing configuration was unfair for those specjbb2005 results.
I had set a JVM hard pin and used hugepages for peak performance. With the
hard pin removed and no hugepages, balance/powersaving both drop about 5%
vs the performance policy, and the performance policy result is similar to
3.8-rc5.

>>
>> I guess this is expected because there has to be some performance hit
>> when saving power...
>>
>
> BTW, I had tested the v3 version based on sched numa -- on tip/master.
> The specjbb drop is just about 5~7% with the balance/powersaving policies.
> The power scheduling is done after the numa scheduling logic.
>

--
Thanks
Alex
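To make the two configurations being compared concrete, here is a hedged sketch; the heap sizes, node choice and property file are illustrative guesses, not the exact setup from the mail:

# "hard pin + hugepage" configuration: JVM bound to one node, large pages on.
# This is the setup that exaggerated the balance/powersaving drop to 30~50%.
numactl --cpunodebind=0 --membind=0 \
    java -Xms4g -Xmx4g -XX:+UseLargePages \
         spec.jbb.JBBmain -propfile SPECjbb.props

# plain configuration: no pinning, no large pages. Here balance/powersaving
# only drop about 5% vs the performance policy.
java -Xms4g -Xmx4g spec.jbb.JBBmain -propfile SPECjbb.props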
Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling
On Tue, 2013-01-29 at 09:45 +0800, Alex Shi wrote: > On 01/28/2013 11:47 PM, Mike Galbraith wrote: > > monteverdi:/abuild/mike/:[0]# echo 1 > /sys/devices/system/cpu/cpufreq/boost > > monteverdi:/abuild/mike/:[0]# massive_intr 10 60 > > 014635 00058160 > > 014633 00058592 > > 014638 00058592 > > 014636 00058160 > > 014632 00058200 > > 014634 00058704 > > 014639 00058704 > > 014641 00058200 > > 014640 00058560 > > 014637 00058560 > > monteverdi:/abuild/mike/:[0]# massive_intr 10 60 > > 014673 00059504 > > 014676 00059504 > > 014674 00059064 > > 014672 00059064 > > 014675 00058560 > > 014671 00058560 > > 014677 00059248 > > 014668 00058864 > > 014669 00059248 > > 014670 00058864 > > monteverdi:/abuild/mike/:[0]# massive_intr 10 60 > > 014686 00043472 > > 014689 00043472 > > 014685 00043760 > > 014690 00043760 > > 014687 00043528 > > 014688 00043528 (hmm) > > 014683 00043216 > > 014692 00043208 > > 014684 00043336 > > 014691 00043336 > > I am sorry Mike. does above 3 times testing has a same sched policy? and > same question for the following testing. Yeah, they're back to back repeats. Using dirt simple massive_intr didn't help clarify aim7 oddity. aim7 is fully repeatable, seems to be saying that consolidation of small independent jobs is a win, that spreading before fully saturated has its price, just as consolidation of large coordinated burst has its price. Seems to cut both ways.. but why not, everything else does. -Mike -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling
On 01/28/2013 11:47 PM, Mike Galbraith wrote:
> 014776 00059528
>
> Ok box, whatever blows your skirt up. I'm done.

Many thanks for so much fruitful testing! :D

--
Thanks
Alex
Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling
On 01/28/2013 11:47 PM, Mike Galbraith wrote: > On Mon, 2013-01-28 at 06:17 +0100, Mike Galbraith wrote: > > Ok damnit. > >> monteverdi:/abuild/mike/:[0]# echo powersaving > >> /sys/devices/system/cpu/sched_policy/current_sched_policy >> monteverdi:/abuild/mike/:[0]# massive_intr 10 60 >> 043321 00058616 >> 043313 00058616 >> 043318 00058968 >> 043317 00058968 >> 043316 00059184 >> 043319 00059192 >> 043320 00059048 >> 043314 00059048 >> 043312 00058176 >> 043315 00058184 > > That was boost if you like, and free to roam 4 nodes. > > monteverdi:/abuild/mike/:[0]# echo powersaving > > /sys/devices/system/cpu/sched_policy/current_sched_policy > monteverdi:/abuild/mike/:[0]# echo 0 > /sys/devices/system/cpu/cpufreq/boost > monteverdi:/abuild/mike/:[0]# massive_intr 10 60 > 014618 00039616 > 014623 00039256 > 014617 00039256 > 014620 00039304 > 014621 00039304 (wait a minute, you said..) > 014616 00039080 > 014625 00039064 > 014622 00039672 > 014624 00039624 > 014619 00039672 > monteverdi:/abuild/mike/:[0]# echo 1 > /sys/devices/system/cpu/cpufreq/boost > monteverdi:/abuild/mike/:[0]# massive_intr 10 60 > 014635 00058160 > 014633 00058592 > 014638 00058592 > 014636 00058160 > 014632 00058200 > 014634 00058704 > 014639 00058704 > 014641 00058200 > 014640 00058560 > 014637 00058560 > monteverdi:/abuild/mike/:[0]# massive_intr 10 60 > 014673 00059504 > 014676 00059504 > 014674 00059064 > 014672 00059064 > 014675 00058560 > 014671 00058560 > 014677 00059248 > 014668 00058864 > 014669 00059248 > 014670 00058864 > monteverdi:/abuild/mike/:[0]# massive_intr 10 60 > 014686 00043472 > 014689 00043472 > 014685 00043760 > 014690 00043760 > 014687 00043528 > 014688 00043528 (hmm) > 014683 00043216 > 014692 00043208 > 014684 00043336 > 014691 00043336 I am sorry Mike. does above 3 times testing has a same sched policy? and same question for the following testing. > monteverdi:/abuild/mike/:[0]# echo 0 > /sys/devices/system/cpu/cpufreq/boost > monteverdi:/abuild/mike/:[0]# massive_intr 10 60 > 014701 00039344 > 014707 00039344 > 014709 00038976 > 014700 00038976 > 014708 00039256 (hmm) > 014703 00039256 > 014705 00039400 > 014704 00039400 > 014706 00039320 > 014702 00039320 > monteverdi:/abuild/mike/:[0]# massive_intr 10 60 > 014713 00058552 > 014716 00058664 > 014719 00058600 > 014715 00058600 > 014718 00058520 > 014722 00058400 > 014721 00058768 > 014717 00058768 > 014714 00058552 > 014720 00058560 > monteverdi:/abuild/mike/:[0]# massive_intr 10 60 > 014732 00058736 > 014734 00058760 > 014729 00040872 > 014736 00059184 > 014728 00059184 > 014727 00058744 > 014733 00058760 > 014731 00059320 > 014730 00059280 > 014735 00041072 > monteverdi:/abuild/mike/:[0]# numactl --cpunodebind=0 massive_intr 10 60 > 014749 00040608 > 014748 00040616 > 014745 00039360 > 014750 00039360 > 014751 00039416 > 014747 00039416 > 014752 00039336 > 014746 00039336 > 014744 00039480 > 014753 00039480 > monteverdi:/abuild/mike/:[0]# numactl --cpunodebind=0 massive_intr 10 60 > 014757 00039272 > 014761 00039272 > 014765 00039528 > 014756 00039528 > 014759 00039352 > 014760 00039352 > 014764 00039248 > 014762 00039248 > 014758 00039352 > 014763 00039352 > monteverdi:/abuild/mike/:[0]# numactl --cpunodebind=0 massive_intr 10 60 > 014773 00059680 > 014769 00059680 > 014768 00059144 > 014777 00059144 > 014775 00059688 > 014774 00059688 > 014770 00059264 > 014771 00059264 > 014772 00059528 > 014776 00059528 > > Ok box, whatever blows your skirt up. I'm done. 
>
> Non
> Uniform
> Mysterious
> Artifacts
>

--
Thanks
Alex
Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling
On 01/28/2013 11:55 PM, Mike Galbraith wrote:
> On Mon, 2013-01-28 at 16:22 +0100, Borislav Petkov wrote:
>> On Mon, Jan 28, 2013 at 12:40:46PM +0100, Mike Galbraith wrote:
>>>> No no, that's not restricted to one node. It's just overloaded because
>>>> I turned balancing off at the NODE domain level.
>>>
>>> Which shows only that I was multitasking, and in a rush. Boy was that
>>> dumb. Hohum.
>>
>> Ok, let's take a step back and slow it down a bit so that people like me
>> can understand it: you want to try it with disabled load balancing on
>> the node level, AFAICT. But with that many tasks, perf will suck anyway,
>> no? Unless you want to benchmark the numa-aware aspect and see whether
>> load balancing on the node level feels differently, perf-wise?
>
> The broken thought was, since it's not wakeup path, stop node balance..
> but killing all of it killed FORK/EXEC balance, oops.

Um, sure. So I guess all of the tasks were just running on one node.

> I think I'm done with this thing though. See mail I just sent. There
> are better things to do than letting box jerk my chain endlessly ;-)
>
> -Mike

--
Thanks
Alex
Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling
> Benchmark Version Machine Run Date > AIM Multiuser Benchmark - Suite VII "1.1" performance Jan 28 > 08:09:20 2013 > > Tasks Jobs/MinJTI RealCPU Jobs/sec/task > 1 438.8 100 13.83.8 7.3135 > 5 2634.8 99 11.57.2 8.7826 > 10 5396.3 99 11.211.48.9938 > 20 10725.7 99 11.324.08.9381 > 40 20183.2 99 12.038.58.4097 > 80 35620.9 99 13.671.47.4210 > 160 57203.5 98 16.9137.8 5.9587 > 320 81995.8 98 23.7271.3 4.2706 > > then the above no_node-load_balance thing suffers a small-ish dip at 320 > tasks, yeah. > > And AFAICR, the effect of disabling boosting will be visible in the > small count tasks cases anyway because if you saturate the cores with > tasks, the boosting algorithms tend to get the box out of boosting for > the simple reason that the power/perf headroom simply disappears due to > the SOC being busy. Sure. and according to the context of serial email. guess this result has boosting enabled, right? > >> 640 100294.898 38.7570.9 2.6118 >> 1280115998.297 66.91132.8 1.5104 >> 2560125820.097 123.3 2256.6 0.8191 > > I dunno about those. maybe this is expected with so many tasks or do we > want to optimize that case further? > -- Thanks Alex -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling
>> then the above no_node-load_balance thing suffers a small-ish dip at 320
>> tasks, yeah.
>
> No no, that's not restricted to one node. It's just overloaded because
> I turned balancing off at the NODE domain level.
>
>> And AFAICR, the effect of disabling boosting will be visible in the
>> small count tasks cases anyway because if you saturate the cores with
>> tasks, the boosting algorithms tend to get the box out of boosting for
>> the simple reason that the power/perf headroom simply disappears due to
>> the SOC being busy.
>>
>>>  640   100294.8   98    38.7    570.9   2.6118
>>> 1280   115998.2   97    66.9   1132.8   1.5104
>>> 2560   125820.0   97   123.3   2256.6   0.8191
>>
>> I dunno about those. maybe this is expected with so many tasks or do we
>> want to optimize that case further?
>
> When using all 4 nodes properly, that's still scaling. Here, I

Without regular node balancing, only wake balancing is left in
select_task_rq_fair for the aim7 test (I assume you used the shared
workfile; most of the test is cpu-intensive with only a little exec/fork
load). Since wake balancing only happens within the same llc domain, I
guess that is the reason for this.

> intentionally screwed up balancing to watch the low end. High end is
> expected wreckage.
Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling
On 01/28/2013 02:42 PM, Mike Galbraith wrote: > Back to original 1ms sleep, 8ms work, turning NUMA box into a single > node 10 core box with numactl. > > monteverdi:/abuild/mike/:[0]# echo powersaving > > /sys/devices/system/cpu/sched_policy/current_sched_policy > monteverdi:/abuild/mike/:[0]# numactl --cpunodebind=0 massive_intr 10 60 > 045286 00043872 > 045289 00043464 > 045284 00043488 > 045287 00043440 > 045283 00043416 > 045281 00044456 > 045285 00043456 > 045288 00044312 > 045280 00043048 > 045282 00043240 Um, no idea why the powersaving data is so low. > monteverdi:/abuild/mike/:[0]# echo balance > > /sys/devices/system/cpu/sched_policy/current_sched_policy > monteverdi:/abuild/mike/:[0]# numactl --cpunodebind=0 massive_intr 10 60 > 045300 00052536 > 045307 00052472 > 045304 00052536 > 045299 00052536 > 045305 00052520 > 045306 00052528 > 045302 00052528 > 045303 00052528 > 045308 00052512 > 045301 00052520 > monteverdi:/abuild/mike/:[0]# echo performance > > /sys/devices/system/cpu/sched_policy/current_sched_policy > monteverdi:/abuild/mike/:[0]# numactl --cpunodebind=0 massive_intr 10 60 > 045339 00052600 > 045340 00052608 > 045338 00052600 > 045337 00052608 > 045343 00052600 > 045341 00052600 > 045336 00052608 > 045335 00052616 > 045334 00052576 -- Thanks Alex -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling
On Mon, 2013-01-28 at 16:22 +0100, Borislav Petkov wrote: > On Mon, Jan 28, 2013 at 12:40:46PM +0100, Mike Galbraith wrote: > > > No no, that's not restricted to one node. It's just overloaded because > > > I turned balancing off at the NODE domain level. > > > > Which shows only that I was multitasking, and in a rush. Boy was that > > dumb. Hohum. > > Ok, let's take a step back and slow it down a bit so that people like me > can understand it: you want to try it with disabled load balancing on > the node level, AFAICT. But with that many tasks, perf will suck anyway, > no? Unless you want to benchmark the numa-aware aspect and see whether > load balancing on the node level feels differently, perf-wise? The broken thought was, since it's not wakeup path, stop node balance.. but killing all of it killed FORK/EXEC balance, oops. I think I'm done with this thing though. See mail I just sent. There are better things to do than letting box jerk my chain endlessly ;-) -Mike -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling
On Mon, 2013-01-28 at 06:17 +0100, Mike Galbraith wrote: Ok damnit. > monteverdi:/abuild/mike/:[0]# echo powersaving > > /sys/devices/system/cpu/sched_policy/current_sched_policy > monteverdi:/abuild/mike/:[0]# massive_intr 10 60 > 043321 00058616 > 043313 00058616 > 043318 00058968 > 043317 00058968 > 043316 00059184 > 043319 00059192 > 043320 00059048 > 043314 00059048 > 043312 00058176 > 043315 00058184 That was boost if you like, and free to roam 4 nodes. monteverdi:/abuild/mike/:[0]# echo powersaving > /sys/devices/system/cpu/sched_policy/current_sched_policy monteverdi:/abuild/mike/:[0]# echo 0 > /sys/devices/system/cpu/cpufreq/boost monteverdi:/abuild/mike/:[0]# massive_intr 10 60 014618 00039616 014623 00039256 014617 00039256 014620 00039304 014621 00039304 (wait a minute, you said..) 014616 00039080 014625 00039064 014622 00039672 014624 00039624 014619 00039672 monteverdi:/abuild/mike/:[0]# echo 1 > /sys/devices/system/cpu/cpufreq/boost monteverdi:/abuild/mike/:[0]# massive_intr 10 60 014635 00058160 014633 00058592 014638 00058592 014636 00058160 014632 00058200 014634 00058704 014639 00058704 014641 00058200 014640 00058560 014637 00058560 monteverdi:/abuild/mike/:[0]# massive_intr 10 60 014673 00059504 014676 00059504 014674 00059064 014672 00059064 014675 00058560 014671 00058560 014677 00059248 014668 00058864 014669 00059248 014670 00058864 monteverdi:/abuild/mike/:[0]# massive_intr 10 60 014686 00043472 014689 00043472 014685 00043760 014690 00043760 014687 00043528 014688 00043528 (hmm) 014683 00043216 014692 00043208 014684 00043336 014691 00043336 monteverdi:/abuild/mike/:[0]# echo 0 > /sys/devices/system/cpu/cpufreq/boost monteverdi:/abuild/mike/:[0]# massive_intr 10 60 014701 00039344 014707 00039344 014709 00038976 014700 00038976 014708 00039256 (hmm) 014703 00039256 014705 00039400 014704 00039400 014706 00039320 014702 00039320 monteverdi:/abuild/mike/:[0]# massive_intr 10 60 014713 00058552 014716 00058664 014719 00058600 014715 00058600 014718 00058520 014722 00058400 014721 00058768 014717 00058768 014714 00058552 014720 00058560 monteverdi:/abuild/mike/:[0]# massive_intr 10 60 014732 00058736 014734 00058760 014729 00040872 014736 00059184 014728 00059184 014727 00058744 014733 00058760 014731 00059320 014730 00059280 014735 00041072 monteverdi:/abuild/mike/:[0]# numactl --cpunodebind=0 massive_intr 10 60 014749 00040608 014748 00040616 014745 00039360 014750 00039360 014751 00039416 014747 00039416 014752 00039336 014746 00039336 014744 00039480 014753 00039480 monteverdi:/abuild/mike/:[0]# numactl --cpunodebind=0 massive_intr 10 60 014757 00039272 014761 00039272 014765 00039528 014756 00039528 014759 00039352 014760 00039352 014764 00039248 014762 00039248 014758 00039352 014763 00039352 monteverdi:/abuild/mike/:[0]# numactl --cpunodebind=0 massive_intr 10 60 014773 00059680 014769 00059680 014768 00059144 014777 00059144 014775 00059688 014774 00059688 014770 00059264 014771 00059264 014772 00059528 014776 00059528 Ok box, whatever blows your skirt up. I'm done. Non Uniform Mysterious Artifacts -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
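The by-hand matrix above (turbo boost on/off, free-roaming vs. bound to node 0, under the currently selected policy) can also be scripted. A minimal sketch, assuming the acpi-cpufreq boost knob shown in the thread and a local massive_intr binary:

#!/bin/bash
# Repeat massive_intr (10 tasks, 60 seconds) for each combination of turbo
# boost and node binding, under whatever sched policy is currently selected.
BOOST=/sys/devices/system/cpu/cpufreq/boost

for boost in 0 1; do
    echo "$boost" > "$BOOST"
    echo "=== boost=$boost, free to roam ==="
    ./massive_intr 10 60
    echo "=== boost=$boost, bound to node 0 ==="
    numactl --cpunodebind=0 ./massive_intr 10 60
done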
Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling
On Mon, Jan 28, 2013 at 12:40:46PM +0100, Mike Galbraith wrote:
> > No no, that's not restricted to one node. It's just overloaded because
> > I turned balancing off at the NODE domain level.
>
> Which shows only that I was multitasking, and in a rush. Boy was that
> dumb. Hohum.

Ok, let's take a step back and slow it down a bit so that people like me
can understand it: you want to try it with disabled load balancing on
the node level, AFAICT. But with that many tasks, perf will suck anyway,
no? Unless you want to benchmark the numa-aware aspect and see whether
load balancing on the node level feels differently, perf-wise?

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--
Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling
On Mon, 2013-01-28 at 12:32 +0100, Mike Galbraith wrote: > On Mon, 2013-01-28 at 12:29 +0100, Borislav Petkov wrote: > > On Mon, Jan 28, 2013 at 11:44:44AM +0100, Mike Galbraith wrote: > > > On Mon, 2013-01-28 at 10:55 +0100, Borislav Petkov wrote: > > > > On Mon, Jan 28, 2013 at 06:17:46AM +0100, Mike Galbraith wrote: > > > > > Zzzt. Wish I could turn turbo thingy off. > > > > > > > > Try setting /sys/devices/system/cpu/cpufreq/boost to 0. > > > > > > How convenient (test) works too. > > > > > > So much for turbo boost theory. Nothing changed until I turned load > > > balancing off at NODE. High end went to hell (gee), but low end... > > > > > > Benchmark Version Machine Run Date > > > AIM Multiuser Benchmark - Suite VII "1.1" > > > performance-no-node-load_balance Jan 28 11:20:12 2013 > > > > > > Tasks Jobs/MinJTI RealCPU Jobs/sec/task > > > 1 436.3 100 13.93.9 7.2714 > > > 5 2637.1 99 11.57.3 8.7903 > > > 10 5415.5 99 11.211.39.0259 > > > 20 10603.7 99 11.424.88.8364 > > > 40 20066.2 99 12.140.58.3609 > > > 80 35079.6 99 13.875.57.3082 > > > 160 55884.7 98 17.3145.6 5.8213 > > > 320 79345.3 98 24.4287.4 4.1326 > > > > If you're talking about those results from earlier: > > > > Benchmark Version Machine Run Date > > AIM Multiuser Benchmark - Suite VII "1.1" performance Jan 28 > > 08:09:20 2013 > > > > Tasks Jobs/MinJTI RealCPU Jobs/sec/task > > 1 438.8 100 13.83.8 7.3135 > > 5 2634.8 99 11.57.2 8.7826 > > 10 5396.3 99 11.211.48.9938 > > 20 10725.7 99 11.324.08.9381 > > 40 20183.2 99 12.038.58.4097 > > 80 35620.9 99 13.671.47.4210 > > 160 57203.5 98 16.9137.8 5.9587 > > 320 81995.8 98 23.7271.3 4.2706 > > > > then the above no_node-load_balance thing suffers a small-ish dip at 320 > > tasks, yeah. > > No no, that's not restricted to one node. It's just overloaded because > I turned balancing off at the NODE domain level. Which shows only that I was multitasking, and in a rush. Boy was that dumb. Hohum. -Mike -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling
On Mon, 2013-01-28 at 12:29 +0100, Borislav Petkov wrote: > On Mon, Jan 28, 2013 at 11:44:44AM +0100, Mike Galbraith wrote: > > On Mon, 2013-01-28 at 10:55 +0100, Borislav Petkov wrote: > > > On Mon, Jan 28, 2013 at 06:17:46AM +0100, Mike Galbraith wrote: > > > > Zzzt. Wish I could turn turbo thingy off. > > > > > > Try setting /sys/devices/system/cpu/cpufreq/boost to 0. > > > > How convenient (test) works too. > > > > So much for turbo boost theory. Nothing changed until I turned load > > balancing off at NODE. High end went to hell (gee), but low end... > > > > Benchmark Version Machine Run Date > > AIM Multiuser Benchmark - Suite VII "1.1" > > performance-no-node-load_balance Jan 28 11:20:12 2013 > > > > Tasks Jobs/MinJTI RealCPU Jobs/sec/task > > 1 436.3 100 13.93.9 7.2714 > > 5 2637.1 99 11.57.3 8.7903 > > 10 5415.5 99 11.211.39.0259 > > 20 10603.7 99 11.424.88.8364 > > 40 20066.2 99 12.140.58.3609 > > 80 35079.6 99 13.875.57.3082 > > 160 55884.7 98 17.3145.6 5.8213 > > 320 79345.3 98 24.4287.4 4.1326 > > If you're talking about those results from earlier: > > Benchmark Version Machine Run Date > AIM Multiuser Benchmark - Suite VII "1.1" performance Jan 28 > 08:09:20 2013 > > Tasks Jobs/MinJTI RealCPU Jobs/sec/task > 1 438.8 100 13.83.8 7.3135 > 5 2634.8 99 11.57.2 8.7826 > 10 5396.3 99 11.211.48.9938 > 20 10725.7 99 11.324.08.9381 > 40 20183.2 99 12.038.58.4097 > 80 35620.9 99 13.671.47.4210 > 160 57203.5 98 16.9137.8 5.9587 > 320 81995.8 98 23.7271.3 4.2706 > > then the above no_node-load_balance thing suffers a small-ish dip at 320 > tasks, yeah. No no, that's not restricted to one node. It's just overloaded because I turned balancing off at the NODE domain level. > And AFAICR, the effect of disabling boosting will be visible in the > small count tasks cases anyway because if you saturate the cores with > tasks, the boosting algorithms tend to get the box out of boosting for > the simple reason that the power/perf headroom simply disappears due to > the SOC being busy. > > > 640 100294.898 38.7570.9 2.6118 > > 1280115998.297 66.91132.8 1.5104 > > 2560125820.097 123.3 2256.6 0.8191 > > I dunno about those. maybe this is expected with so many tasks or do we > want to optimize that case further? When using all 4 nodes properly, that's still scaling. Here, I intentionally screwed up balancing to watch the low end. High end is expected wreckage. -Mike -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling
On Mon, Jan 28, 2013 at 11:44:44AM +0100, Mike Galbraith wrote:
> On Mon, 2013-01-28 at 10:55 +0100, Borislav Petkov wrote:
> > On Mon, Jan 28, 2013 at 06:17:46AM +0100, Mike Galbraith wrote:
> > > Zzzt. Wish I could turn turbo thingy off.
> >
> > Try setting /sys/devices/system/cpu/cpufreq/boost to 0.
>
> How convenient (test) works too.
>
> So much for turbo boost theory. Nothing changed until I turned load
> balancing off at NODE. High end went to hell (gee), but low end...
>
> Benchmark       Version Machine Run Date
> AIM Multiuser Benchmark - Suite VII "1.1" performance-no-node-load_balance Jan 28 11:20:12 2013
>
> Tasks   Jobs/Min    JTI    Real     CPU     Jobs/sec/task
>     1      436.3    100    13.9       3.9   7.2714
>     5     2637.1     99    11.5       7.3   8.7903
>    10     5415.5     99    11.2      11.3   9.0259
>    20    10603.7     99    11.4      24.8   8.8364
>    40    20066.2     99    12.1      40.5   8.3609
>    80    35079.6     99    13.8      75.5   7.3082
>   160    55884.7     98    17.3     145.6   5.8213
>   320    79345.3     98    24.4     287.4   4.1326

If you're talking about those results from earlier:

Benchmark       Version Machine Run Date
AIM Multiuser Benchmark - Suite VII "1.1" performance    Jan 28 08:09:20 2013

Tasks   Jobs/Min    JTI    Real     CPU     Jobs/sec/task
    1      438.8    100    13.8       3.8   7.3135
    5     2634.8     99    11.5       7.2   8.7826
   10     5396.3     99    11.2      11.4   8.9938
   20    10725.7     99    11.3      24.0   8.9381
   40    20183.2     99    12.0      38.5   8.4097
   80    35620.9     99    13.6      71.4   7.4210
  160    57203.5     98    16.9     137.8   5.9587
  320    81995.8     98    23.7     271.3   4.2706

then the above no_node-load_balance thing suffers a small-ish dip at 320
tasks, yeah.

And AFAICR, the effect of disabling boosting will be visible in the
small count tasks cases anyway because if you saturate the cores with
tasks, the boosting algorithms tend to get the box out of boosting for
the simple reason that the power/perf headroom simply disappears due to
the SOC being busy.

>  640   100294.8   98    38.7    570.9   2.6118
> 1280   115998.2   97    66.9   1132.8   1.5104
> 2560   125820.0   97   123.3   2256.6   0.8191

I dunno about those. Maybe this is expected with so many tasks, or do we
want to optimize that case further?

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--
Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling
On Mon, 2013-01-28 at 10:55 +0100, Borislav Petkov wrote:
> On Mon, Jan 28, 2013 at 06:17:46AM +0100, Mike Galbraith wrote:
> > Zzzt. Wish I could turn turbo thingy off.
>
> Try setting /sys/devices/system/cpu/cpufreq/boost to 0.

How convenient (test) works too.

So much for turbo boost theory. Nothing changed until I turned load
balancing off at NODE. High end went to hell (gee), but low end...

Benchmark       Version Machine Run Date
AIM Multiuser Benchmark - Suite VII "1.1" performance-no-node-load_balance Jan 28 11:20:12 2013

Tasks   Jobs/Min    JTI    Real     CPU     Jobs/sec/task
    1      436.3    100    13.9       3.9   7.2714
    5     2637.1     99    11.5       7.3   8.7903
   10     5415.5     99    11.2      11.3   9.0259
   20    10603.7     99    11.4      24.8   8.8364
   40    20066.2     99    12.1      40.5   8.3609
   80    35079.6     99    13.8      75.5   7.3082
  160    55884.7     98    17.3     145.6   5.8213
  320    79345.3     98    24.4     287.4   4.1326
  640   100294.8     98    38.7     570.9   2.6118
 1280   115998.2     97    66.9    1132.8   1.5104
 2560   125820.0     97   123.3    2256.6   0.8191
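For reference, one way to get "balancing off at NODE" without rebuilding is to clear SD_LOAD_BALANCE in the per-domain flags files. A sketch, assuming CONFIG_SCHED_DEBUG is enabled (otherwise these files are absent) and that SD_LOAD_BALANCE is still bit 0x1 on this kernel:

#!/bin/bash
# Clear SD_LOAD_BALANCE (assumed bit 0x1) on every NODE/NUMA level sched
# domain exposed under /proc/sys/kernel/sched_domain (CONFIG_SCHED_DEBUG).
for dom in /proc/sys/kernel/sched_domain/cpu*/domain*; do
    name=$(cat "$dom/name" 2>/dev/null)
    case "$name" in
    NODE|NUMA)
        flags=$(cat "$dom/flags")
        echo $(( flags & ~0x1 )) > "$dom/flags"   # drop load balancing here
        ;;
    esac
done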
Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling
On Mon, Jan 28, 2013 at 06:17:46AM +0100, Mike Galbraith wrote:
> Zzzt. Wish I could turn turbo thingy off.

Try setting /sys/devices/system/cpu/cpufreq/boost to 0.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--
Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling
On Mon, 2013-01-28 at 10:55 +0100, Borislav Petkov wrote: On Mon, Jan 28, 2013 at 06:17:46AM +0100, Mike Galbraith wrote: Zzzt. Wish I could turn turbo thingy off. Try setting /sys/devices/system/cpu/cpufreq/boost to 0. How convenient (test) works too. So much for turbo boost theory. Nothing changed until I turned load balancing off at NODE. High end went to hell (gee), but low end... Benchmark Version Machine Run Date AIM Multiuser Benchmark - Suite VII 1.1 performance-no-node-load_balance Jan 28 11:20:12 2013 Tasks Jobs/MinJTI RealCPU Jobs/sec/task 1 436.3 100 13.93.9 7.2714 5 2637.1 99 11.57.3 8.7903 10 5415.5 99 11.211.39.0259 20 10603.7 99 11.424.88.8364 40 20066.2 99 12.140.58.3609 80 35079.6 99 13.875.57.3082 160 55884.7 98 17.3145.6 5.8213 320 79345.3 98 24.4287.4 4.1326 640 100294.898 38.7570.9 2.6118 1280115998.297 66.91132.8 1.5104 2560125820.097 123.3 2256.6 0.8191 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling
On Mon, Jan 28, 2013 at 11:44:44AM +0100, Mike Galbraith wrote: On Mon, 2013-01-28 at 10:55 +0100, Borislav Petkov wrote: On Mon, Jan 28, 2013 at 06:17:46AM +0100, Mike Galbraith wrote: Zzzt. Wish I could turn turbo thingy off. Try setting /sys/devices/system/cpu/cpufreq/boost to 0. How convenient (test) works too. So much for turbo boost theory. Nothing changed until I turned load balancing off at NODE. High end went to hell (gee), but low end... Benchmark Version Machine Run Date AIM Multiuser Benchmark - Suite VII 1.1 performance-no-node-load_balance Jan 28 11:20:12 2013 Tasks Jobs/MinJTI RealCPU Jobs/sec/task 1 436.3 100 13.93.9 7.2714 5 2637.1 99 11.57.3 8.7903 10 5415.5 99 11.211.39.0259 20 10603.7 99 11.424.88.8364 40 20066.2 99 12.140.58.3609 80 35079.6 99 13.875.57.3082 160 55884.7 98 17.3145.6 5.8213 320 79345.3 98 24.4287.4 4.1326 If you're talking about those results from earlier: Benchmark Version Machine Run Date AIM Multiuser Benchmark - Suite VII 1.1 performance Jan 28 08:09:20 2013 Tasks Jobs/MinJTI RealCPU Jobs/sec/task 1 438.8 100 13.83.8 7.3135 5 2634.8 99 11.57.2 8.7826 10 5396.3 99 11.211.48.9938 20 10725.7 99 11.324.08.9381 40 20183.2 99 12.038.58.4097 80 35620.9 99 13.671.47.4210 160 57203.5 98 16.9137.8 5.9587 320 81995.8 98 23.7271.3 4.2706 then the above no_node-load_balance thing suffers a small-ish dip at 320 tasks, yeah. And AFAICR, the effect of disabling boosting will be visible in the small count tasks cases anyway because if you saturate the cores with tasks, the boosting algorithms tend to get the box out of boosting for the simple reason that the power/perf headroom simply disappears due to the SOC being busy. 640 100294.898 38.7570.9 2.6118 1280115998.297 66.91132.8 1.5104 2560125820.097 123.3 2256.6 0.8191 I dunno about those. maybe this is expected with so many tasks or do we want to optimize that case further? -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine. -- -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling
On Mon, 2013-01-28 at 12:29 +0100, Borislav Petkov wrote: On Mon, Jan 28, 2013 at 11:44:44AM +0100, Mike Galbraith wrote: On Mon, 2013-01-28 at 10:55 +0100, Borislav Petkov wrote: On Mon, Jan 28, 2013 at 06:17:46AM +0100, Mike Galbraith wrote: Zzzt. Wish I could turn turbo thingy off. Try setting /sys/devices/system/cpu/cpufreq/boost to 0. How convenient (test) works too. So much for turbo boost theory. Nothing changed until I turned load balancing off at NODE. High end went to hell (gee), but low end... Benchmark Version Machine Run Date AIM Multiuser Benchmark - Suite VII 1.1 performance-no-node-load_balance Jan 28 11:20:12 2013 Tasks Jobs/MinJTI RealCPU Jobs/sec/task 1 436.3 100 13.93.9 7.2714 5 2637.1 99 11.57.3 8.7903 10 5415.5 99 11.211.39.0259 20 10603.7 99 11.424.88.8364 40 20066.2 99 12.140.58.3609 80 35079.6 99 13.875.57.3082 160 55884.7 98 17.3145.6 5.8213 320 79345.3 98 24.4287.4 4.1326 If you're talking about those results from earlier: Benchmark Version Machine Run Date AIM Multiuser Benchmark - Suite VII 1.1 performance Jan 28 08:09:20 2013 Tasks Jobs/MinJTI RealCPU Jobs/sec/task 1 438.8 100 13.83.8 7.3135 5 2634.8 99 11.57.2 8.7826 10 5396.3 99 11.211.48.9938 20 10725.7 99 11.324.08.9381 40 20183.2 99 12.038.58.4097 80 35620.9 99 13.671.47.4210 160 57203.5 98 16.9137.8 5.9587 320 81995.8 98 23.7271.3 4.2706 then the above no_node-load_balance thing suffers a small-ish dip at 320 tasks, yeah. No no, that's not restricted to one node. It's just overloaded because I turned balancing off at the NODE domain level. And AFAICR, the effect of disabling boosting will be visible in the small count tasks cases anyway because if you saturate the cores with tasks, the boosting algorithms tend to get the box out of boosting for the simple reason that the power/perf headroom simply disappears due to the SOC being busy. 640 100294.898 38.7570.9 2.6118 1280115998.297 66.91132.8 1.5104 2560125820.097 123.3 2256.6 0.8191 I dunno about those. maybe this is expected with so many tasks or do we want to optimize that case further? When using all 4 nodes properly, that's still scaling. Here, I intentionally screwed up balancing to watch the low end. High end is expected wreckage. -Mike -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling
On Mon, 2013-01-28 at 12:32 +0100, Mike Galbraith wrote: On Mon, 2013-01-28 at 12:29 +0100, Borislav Petkov wrote: On Mon, Jan 28, 2013 at 11:44:44AM +0100, Mike Galbraith wrote: On Mon, 2013-01-28 at 10:55 +0100, Borislav Petkov wrote: On Mon, Jan 28, 2013 at 06:17:46AM +0100, Mike Galbraith wrote: Zzzt. Wish I could turn turbo thingy off. Try setting /sys/devices/system/cpu/cpufreq/boost to 0. How convenient (test) works too. So much for turbo boost theory. Nothing changed until I turned load balancing off at NODE. High end went to hell (gee), but low end... Benchmark Version Machine Run Date AIM Multiuser Benchmark - Suite VII 1.1 performance-no-node-load_balance Jan 28 11:20:12 2013 Tasks Jobs/MinJTI RealCPU Jobs/sec/task 1 436.3 100 13.93.9 7.2714 5 2637.1 99 11.57.3 8.7903 10 5415.5 99 11.211.39.0259 20 10603.7 99 11.424.88.8364 40 20066.2 99 12.140.58.3609 80 35079.6 99 13.875.57.3082 160 55884.7 98 17.3145.6 5.8213 320 79345.3 98 24.4287.4 4.1326 If you're talking about those results from earlier: Benchmark Version Machine Run Date AIM Multiuser Benchmark - Suite VII 1.1 performance Jan 28 08:09:20 2013 Tasks Jobs/MinJTI RealCPU Jobs/sec/task 1 438.8 100 13.83.8 7.3135 5 2634.8 99 11.57.2 8.7826 10 5396.3 99 11.211.48.9938 20 10725.7 99 11.324.08.9381 40 20183.2 99 12.038.58.4097 80 35620.9 99 13.671.47.4210 160 57203.5 98 16.9137.8 5.9587 320 81995.8 98 23.7271.3 4.2706 then the above no_node-load_balance thing suffers a small-ish dip at 320 tasks, yeah. No no, that's not restricted to one node. It's just overloaded because I turned balancing off at the NODE domain level. Which shows only that I was multitasking, and in a rush. Boy was that dumb. Hohum. -Mike -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling
On Mon, Jan 28, 2013 at 12:40:46PM +0100, Mike Galbraith wrote: No no, that's not restricted to one node. It's just overloaded because I turned balancing off at the NODE domain level. Which shows only that I was multitasking, and in a rush. Boy was that dumb. Hohum. Ok, let's take a step back and slow it down a bit so that people like me can understand it: you want to try it with disabled load balancing on the node level, AFAICT. But with that many tasks, perf will suck anyway, no? Unless you want to benchmark the numa-aware aspect and see whether load balancing on the node level feels differently, perf-wise? -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine. -- -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling
On Mon, 2013-01-28 at 06:17 +0100, Mike Galbraith wrote: Ok damnit. monteverdi:/abuild/mike/:[0]# echo powersaving /sys/devices/system/cpu/sched_policy/current_sched_policy monteverdi:/abuild/mike/:[0]# massive_intr 10 60 043321 00058616 043313 00058616 043318 00058968 043317 00058968 043316 00059184 043319 00059192 043320 00059048 043314 00059048 043312 00058176 043315 00058184 That was boost if you like, and free to roam 4 nodes. monteverdi:/abuild/mike/:[0]# echo powersaving /sys/devices/system/cpu/sched_policy/current_sched_policy monteverdi:/abuild/mike/:[0]# echo 0 /sys/devices/system/cpu/cpufreq/boost monteverdi:/abuild/mike/:[0]# massive_intr 10 60 014618 00039616 014623 00039256 014617 00039256 014620 00039304 014621 00039304 (wait a minute, you said..) 014616 00039080 014625 00039064 014622 00039672 014624 00039624 014619 00039672 monteverdi:/abuild/mike/:[0]# echo 1 /sys/devices/system/cpu/cpufreq/boost monteverdi:/abuild/mike/:[0]# massive_intr 10 60 014635 00058160 014633 00058592 014638 00058592 014636 00058160 014632 00058200 014634 00058704 014639 00058704 014641 00058200 014640 00058560 014637 00058560 monteverdi:/abuild/mike/:[0]# massive_intr 10 60 014673 00059504 014676 00059504 014674 00059064 014672 00059064 014675 00058560 014671 00058560 014677 00059248 014668 00058864 014669 00059248 014670 00058864 monteverdi:/abuild/mike/:[0]# massive_intr 10 60 014686 00043472 014689 00043472 014685 00043760 014690 00043760 014687 00043528 014688 00043528 (hmm) 014683 00043216 014692 00043208 014684 00043336 014691 00043336 monteverdi:/abuild/mike/:[0]# echo 0 /sys/devices/system/cpu/cpufreq/boost monteverdi:/abuild/mike/:[0]# massive_intr 10 60 014701 00039344 014707 00039344 014709 00038976 014700 00038976 014708 00039256 (hmm) 014703 00039256 014705 00039400 014704 00039400 014706 00039320 014702 00039320 monteverdi:/abuild/mike/:[0]# massive_intr 10 60 014713 00058552 014716 00058664 014719 00058600 014715 00058600 014718 00058520 014722 00058400 014721 00058768 014717 00058768 014714 00058552 014720 00058560 monteverdi:/abuild/mike/:[0]# massive_intr 10 60 014732 00058736 014734 00058760 014729 00040872 014736 00059184 014728 00059184 014727 00058744 014733 00058760 014731 00059320 014730 00059280 014735 00041072 monteverdi:/abuild/mike/:[0]# numactl --cpunodebind=0 massive_intr 10 60 014749 00040608 014748 00040616 014745 00039360 014750 00039360 014751 00039416 014747 00039416 014752 00039336 014746 00039336 014744 00039480 014753 00039480 monteverdi:/abuild/mike/:[0]# numactl --cpunodebind=0 massive_intr 10 60 014757 00039272 014761 00039272 014765 00039528 014756 00039528 014759 00039352 014760 00039352 014764 00039248 014762 00039248 014758 00039352 014763 00039352 monteverdi:/abuild/mike/:[0]# numactl --cpunodebind=0 massive_intr 10 60 014773 00059680 014769 00059680 014768 00059144 014777 00059144 014775 00059688 014774 00059688 014770 00059264 014771 00059264 014772 00059528 014776 00059528 Ok box, whatever blows your skirt up. I'm done. Non Uniform Mysterious Artifacts -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling
On Mon, 2013-01-28 at 16:22 +0100, Borislav Petkov wrote: On Mon, Jan 28, 2013 at 12:40:46PM +0100, Mike Galbraith wrote: No no, that's not restricted to one node. It's just overloaded because I turned balancing off at the NODE domain level. Which shows only that I was multitasking, and in a rush. Boy was that dumb. Hohum. Ok, let's take a step back and slow it down a bit so that people like me can understand it: you want to try it with disabled load balancing on the node level, AFAICT. But with that many tasks, perf will suck anyway, no? Unless you want to benchmark the numa-aware aspect and see whether load balancing on the node level feels differently, perf-wise? The broken thought was, since it's not wakeup path, stop node balance.. but killing all of it killed FORK/EXEC balance, oops. I think I'm done with this thing though. See mail I just sent. There are better things to do than letting box jerk my chain endlessly ;-) -Mike -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling
On 01/28/2013 02:42 PM, Mike Galbraith wrote: Back to original 1ms sleep, 8ms work, turning NUMA box into a single node 10 core box with numactl. monteverdi:/abuild/mike/:[0]# echo powersaving /sys/devices/system/cpu/sched_policy/current_sched_policy monteverdi:/abuild/mike/:[0]# numactl --cpunodebind=0 massive_intr 10 60 045286 00043872 045289 00043464 045284 00043488 045287 00043440 045283 00043416 045281 00044456 045285 00043456 045288 00044312 045280 00043048 045282 00043240 Um, no idea why the powersaving data is so low. monteverdi:/abuild/mike/:[0]# echo balance /sys/devices/system/cpu/sched_policy/current_sched_policy monteverdi:/abuild/mike/:[0]# numactl --cpunodebind=0 massive_intr 10 60 045300 00052536 045307 00052472 045304 00052536 045299 00052536 045305 00052520 045306 00052528 045302 00052528 045303 00052528 045308 00052512 045301 00052520 monteverdi:/abuild/mike/:[0]# echo performance /sys/devices/system/cpu/sched_policy/current_sched_policy monteverdi:/abuild/mike/:[0]# numactl --cpunodebind=0 massive_intr 10 60 045339 00052600 045340 00052608 045338 00052600 045337 00052608 045343 00052600 045341 00052600 045336 00052608 045335 00052616 045334 00052576 -- Thanks Alex -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling
>> then the above no_node-load_balance thing suffers a small-ish dip at
>> 320 tasks, yeah.
>
> No no, that's not restricted to one node. It's just overloaded because
> I turned balancing off at the NODE domain level.
>
>> And AFAICR, the effect of disabling boosting will be visible in the
>> small count tasks cases anyway because if you saturate the cores with
>> tasks, the boosting algorithms tend to get the box out of boosting for
>> the simple reason that the power/perf headroom simply disappears due to
>> the SOC being busy.
>>
>>>   640   100294.8    98    38.7    570.9          2.6118
>>>  1280   115998.2    97    66.9   1132.8          1.5104
>>>  2560   125820.0    97   123.3   2256.6          0.8191
>>
>> I dunno about those. maybe this is expected with so many tasks or do we
>> want to optimize that case further?
>
> When using all 4 nodes properly, that's still scaling. Here, I
> intentionally screwed up balancing to watch the low end. High end is
> expected wreckage.

Without regular node balancing, only the wakeup balancing in
select_task_rq_fair is left for the aim7 run (I assume you used the shared
workfile; most of the test is plain cpu work with only a little exec/fork
load). Since wakeup balancing only happens within the same LLC domain, I
guess that is the reason for this.
Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling
Benchmark Version Machine Run Date AIM Multiuser Benchmark - Suite VII 1.1 performance Jan 28 08:09:20 2013 Tasks Jobs/MinJTI RealCPU Jobs/sec/task 1 438.8 100 13.83.8 7.3135 5 2634.8 99 11.57.2 8.7826 10 5396.3 99 11.211.48.9938 20 10725.7 99 11.324.08.9381 40 20183.2 99 12.038.58.4097 80 35620.9 99 13.671.47.4210 160 57203.5 98 16.9137.8 5.9587 320 81995.8 98 23.7271.3 4.2706 then the above no_node-load_balance thing suffers a small-ish dip at 320 tasks, yeah. And AFAICR, the effect of disabling boosting will be visible in the small count tasks cases anyway because if you saturate the cores with tasks, the boosting algorithms tend to get the box out of boosting for the simple reason that the power/perf headroom simply disappears due to the SOC being busy. Sure. and according to the context of serial email. guess this result has boosting enabled, right? 640 100294.898 38.7570.9 2.6118 1280115998.297 66.91132.8 1.5104 2560125820.097 123.3 2256.6 0.8191 I dunno about those. maybe this is expected with so many tasks or do we want to optimize that case further? -- Thanks Alex -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling
On 01/28/2013 11:55 PM, Mike Galbraith wrote:
> On Mon, 2013-01-28 at 16:22 +0100, Borislav Petkov wrote:
> > On Mon, Jan 28, 2013 at 12:40:46PM +0100, Mike Galbraith wrote:
> > > > No no, that's not restricted to one node. It's just overloaded
> > > > because I turned balancing off at the NODE domain level.
> > >
> > > Which shows only that I was multitasking, and in a rush. Boy was
> > > that dumb. Hohum.
> >
> > Ok, let's take a step back and slow it down a bit so that people like
> > me can understand it: you want to try it with disabled load balancing
> > on the node level, AFAICT. But with that many tasks, perf will suck
> > anyway, no?
> >
> > Unless you want to benchmark the numa-aware aspect and see whether
> > load balancing on the node level feels differently, perf-wise?
>
> The broken thought was, since it's not wakeup path, stop node balance..
> but killing all of it killed FORK/EXEC balance, oops.

Um, sure. So I guess all of the tasks were just running on one node.

> I think I'm done with this thing though. See mail I just sent. There
> are better things to do than letting box jerk my chain endlessly ;-)
>
> -Mike

--
Thanks
    Alex
Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling
On 01/28/2013 11:47 PM, Mike Galbraith wrote: On Mon, 2013-01-28 at 06:17 +0100, Mike Galbraith wrote: Ok damnit. monteverdi:/abuild/mike/:[0]# echo powersaving /sys/devices/system/cpu/sched_policy/current_sched_policy monteverdi:/abuild/mike/:[0]# massive_intr 10 60 043321 00058616 043313 00058616 043318 00058968 043317 00058968 043316 00059184 043319 00059192 043320 00059048 043314 00059048 043312 00058176 043315 00058184 That was boost if you like, and free to roam 4 nodes. monteverdi:/abuild/mike/:[0]# echo powersaving /sys/devices/system/cpu/sched_policy/current_sched_policy monteverdi:/abuild/mike/:[0]# echo 0 /sys/devices/system/cpu/cpufreq/boost monteverdi:/abuild/mike/:[0]# massive_intr 10 60 014618 00039616 014623 00039256 014617 00039256 014620 00039304 014621 00039304 (wait a minute, you said..) 014616 00039080 014625 00039064 014622 00039672 014624 00039624 014619 00039672 monteverdi:/abuild/mike/:[0]# echo 1 /sys/devices/system/cpu/cpufreq/boost monteverdi:/abuild/mike/:[0]# massive_intr 10 60 014635 00058160 014633 00058592 014638 00058592 014636 00058160 014632 00058200 014634 00058704 014639 00058704 014641 00058200 014640 00058560 014637 00058560 monteverdi:/abuild/mike/:[0]# massive_intr 10 60 014673 00059504 014676 00059504 014674 00059064 014672 00059064 014675 00058560 014671 00058560 014677 00059248 014668 00058864 014669 00059248 014670 00058864 monteverdi:/abuild/mike/:[0]# massive_intr 10 60 014686 00043472 014689 00043472 014685 00043760 014690 00043760 014687 00043528 014688 00043528 (hmm) 014683 00043216 014692 00043208 014684 00043336 014691 00043336 I am sorry Mike. does above 3 times testing has a same sched policy? and same question for the following testing. monteverdi:/abuild/mike/:[0]# echo 0 /sys/devices/system/cpu/cpufreq/boost monteverdi:/abuild/mike/:[0]# massive_intr 10 60 014701 00039344 014707 00039344 014709 00038976 014700 00038976 014708 00039256 (hmm) 014703 00039256 014705 00039400 014704 00039400 014706 00039320 014702 00039320 monteverdi:/abuild/mike/:[0]# massive_intr 10 60 014713 00058552 014716 00058664 014719 00058600 014715 00058600 014718 00058520 014722 00058400 014721 00058768 014717 00058768 014714 00058552 014720 00058560 monteverdi:/abuild/mike/:[0]# massive_intr 10 60 014732 00058736 014734 00058760 014729 00040872 014736 00059184 014728 00059184 014727 00058744 014733 00058760 014731 00059320 014730 00059280 014735 00041072 monteverdi:/abuild/mike/:[0]# numactl --cpunodebind=0 massive_intr 10 60 014749 00040608 014748 00040616 014745 00039360 014750 00039360 014751 00039416 014747 00039416 014752 00039336 014746 00039336 014744 00039480 014753 00039480 monteverdi:/abuild/mike/:[0]# numactl --cpunodebind=0 massive_intr 10 60 014757 00039272 014761 00039272 014765 00039528 014756 00039528 014759 00039352 014760 00039352 014764 00039248 014762 00039248 014758 00039352 014763 00039352 monteverdi:/abuild/mike/:[0]# numactl --cpunodebind=0 massive_intr 10 60 014773 00059680 014769 00059680 014768 00059144 014777 00059144 014775 00059688 014774 00059688 014770 00059264 014771 00059264 014772 00059528 014776 00059528 Ok box, whatever blows your skirt up. I'm done. Non Uniform Mysterious Artifacts -- Thanks Alex -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling
On 01/28/2013 11:47 PM, Mike Galbraith wrote: 014776 00059528 Ok box, whatever blows your skirt up. I'm done. Many thanks for so much fruitful testing! :D -- Thanks Alex -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling
On Tue, 2013-01-29 at 09:45 +0800, Alex Shi wrote: On 01/28/2013 11:47 PM, Mike Galbraith wrote: monteverdi:/abuild/mike/:[0]# echo 1 /sys/devices/system/cpu/cpufreq/boost monteverdi:/abuild/mike/:[0]# massive_intr 10 60 014635 00058160 014633 00058592 014638 00058592 014636 00058160 014632 00058200 014634 00058704 014639 00058704 014641 00058200 014640 00058560 014637 00058560 monteverdi:/abuild/mike/:[0]# massive_intr 10 60 014673 00059504 014676 00059504 014674 00059064 014672 00059064 014675 00058560 014671 00058560 014677 00059248 014668 00058864 014669 00059248 014670 00058864 monteverdi:/abuild/mike/:[0]# massive_intr 10 60 014686 00043472 014689 00043472 014685 00043760 014690 00043760 014687 00043528 014688 00043528 (hmm) 014683 00043216 014692 00043208 014684 00043336 014691 00043336 I am sorry Mike. does above 3 times testing has a same sched policy? and same question for the following testing. Yeah, they're back to back repeats. Using dirt simple massive_intr didn't help clarify aim7 oddity. aim7 is fully repeatable, seems to be saying that consolidation of small independent jobs is a win, that spreading before fully saturated has its price, just as consolidation of large coordinated burst has its price. Seems to cut both ways.. but why not, everything else does. -Mike -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling
On 01/28/2013 01:19 PM, Alex Shi wrote:
> On 01/27/2013 06:40 PM, Borislav Petkov wrote:
>> On Sun, Jan 27, 2013 at 10:41:40AM +0800, Alex Shi wrote:
>>> Just rerun some benchmarks: kbuild, specjbb2005, oltp, tbench, aim9,
>>> hackbench, fileio-cfq of sysbench, dbench, aiostress, multhreads
>>> loopback netperf. on my core2, nhm, wsm, snb, platforms. no clear
>>> performance change found.
>>
>> Ok, good, You could put that in one of the commit messages so that it is
>> there and people know that this patchset doesn't cause perf regressions
>> with the bunch of benchmarks.
>>
>>> I also tested balance policy/powersaving policy with above benchmark,
>>> found, the specjbb2005 drop much 30~50% on both of policy whenever
>>> with openjdk or jrockit. and hackbench drops a lots with powersaving
>>> policy on snb 4 sockets platforms. others has no clear change.

Sorry, the testing configuration was unfair for those specjbb2005 results.
I had set a JVM hard pin and used hugepages to get peak performance. With
the hard pin removed and no hugepages, balance/powersaving both drop about
5% vs. the performance policy, and the performance policy result is similar
to 3.8-rc5.

>> I guess this is expected because there has to be some performance hit
>> when saving power...
>
> BTW, I had tested the v3 version based on sched numa -- on tip/master.
> The specjbb just has about 5~7% dropping on balance/powersaving policy.
> The power scheduling done after the numa scheduling logical.

--
Thanks
    Alex
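For reference, a "hard pin + hugepages" SPECjbb2005 setup typically looks something like the sketch below; the exact JVM options, heap sizes, and page counts Alex used are not in the thread, so everything here is illustrative rather than his actual configuration:

    # illustrative only -- sizes and flags are assumptions, not the real test setup
    echo 4096 > /proc/sys/vm/nr_hugepages           # reserve 2M pages for the JVM heap
    numactl --cpunodebind=0 --membind=0 \
        java -XX:+UseLargePages -Xms8g -Xmx8g \
             -cp jbb.jar spec.jbb.JBBmain -propfile SPECjbb.props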
Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling
On Mon, 2013-01-28 at 15:17 +0800, Alex Shi wrote: > On 01/28/2013 02:49 PM, Mike Galbraith wrote: > > On Mon, 2013-01-28 at 13:19 +0800, Alex Shi wrote: > >> On 01/27/2013 06:40 PM, Borislav Petkov wrote: > >>> On Sun, Jan 27, 2013 at 10:41:40AM +0800, Alex Shi wrote: > Just rerun some benchmarks: kbuild, specjbb2005, oltp, tbench, aim9, > hackbench, fileio-cfq of sysbench, dbench, aiostress, multhreads > loopback netperf. on my core2, nhm, wsm, snb, platforms. no clear > performance change found. > >>> > >>> Ok, good, You could put that in one of the commit messages so that it is > >>> there and people know that this patchset doesn't cause perf regressions > >>> with the bunch of benchmarks. > >>> > I also tested balance policy/powersaving policy with above benchmark, > found, the specjbb2005 drop much 30~50% on both of policy whenever > with openjdk or jrockit. and hackbench drops a lots with powersaving > policy on snb 4 sockets platforms. others has no clear change. > >>> > >>> I guess this is expected because there has to be some performance hit > >>> when saving power... > >>> > >> > >> BTW, I had tested the v3 version based on sched numa -- on tip/master. > >> The specjbb just has about 5~7% dropping on balance/powersaving policy. > >> The power scheduling done after the numa scheduling logical. > > > > That makes sense. How the numa scheduling numbers compare to mainline? > > Do you have all three available, mainline, and tip w. w/o powersaving > > policy? > > > > I once caught 20~40% performance increasing on sched numa VS mainline > 3.7-rc5. but have no baseline to compare balance/powersaving performance > since lower data are acceptable for balance/powersaving and > tip/master changes too quickly to follow up at that time. > :) (wow. dram sucks, dram+smp sucks more, dram+smp+numa _sucks rocks_;) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling
On Mon, 2013-01-28 at 07:42 +0100, Mike Galbraith wrote:
> Back to original 1ms sleep, 8ms work, turning NUMA box into a single
> node 10 core box with numactl.

(aim7 in one 10 core node.. so spread, no delta.)

Benchmark                            Version  Machine        Run Date
AIM Multiuser Benchmark - Suite VII  "1.1"    powersaving    Jan 28 08:04:14 2013

Tasks   Jobs/Min   JTI    Real      CPU   Jobs/sec/task
    1      441.0   100    13.7      3.7          7.3508
    5     2516.6    98    12.0      8.1          8.3887
   10     5215.1    98    11.6     11.9          8.6919
   20    10475.4    99    11.6     21.7          8.7295
   40    20216.8    99    12.0     38.2          8.4237
   80    35568.6    99    13.6     71.4          7.4101
  160    57102.5    98    17.0    138.2          5.9482
  320    82099.9    97    23.6    271.1          4.2760

Benchmark                            Version  Machine        Run Date
AIM Multiuser Benchmark - Suite VII  "1.1"    balance        Jan 28 08:06:49 2013

Tasks   Jobs/Min   JTI    Real      CPU   Jobs/sec/task
    1      439.4   100    13.8      3.8          7.3241
    5     2583.1    98    11.7      7.2          8.6104
   10     5325.1    99    11.4     11.0          8.8752
   20    10687.8    99    11.3     23.6          8.9065
   40    20200.0    99    12.0     38.7          8.4167
   80    35464.5    98    13.7     71.4          7.3884
  160    57203.5    98    16.9    137.9          5.9587
  320    82065.2    98    23.6    271.1          4.2742

Benchmark                            Version  Machine        Run Date
AIM Multiuser Benchmark - Suite VII  "1.1"    performance    Jan 28 08:09:20 2013

Tasks   Jobs/Min   JTI    Real      CPU   Jobs/sec/task
    1      438.8   100    13.8      3.8          7.3135
    5     2634.8    99    11.5      7.2          8.7826
   10     5396.3    99    11.2     11.4          8.9938
   20    10725.7    99    11.3     24.0          8.9381
   40    20183.2    99    12.0     38.5          8.4097
   80    35620.9    99    13.6     71.4          7.4210
  160    57203.5    98    16.9    137.8          5.9587
  320    81995.8    98    23.7    271.3          4.2706
Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling
On 01/28/2013 02:49 PM, Mike Galbraith wrote: > On Mon, 2013-01-28 at 13:19 +0800, Alex Shi wrote: >> On 01/27/2013 06:40 PM, Borislav Petkov wrote: >>> On Sun, Jan 27, 2013 at 10:41:40AM +0800, Alex Shi wrote: Just rerun some benchmarks: kbuild, specjbb2005, oltp, tbench, aim9, hackbench, fileio-cfq of sysbench, dbench, aiostress, multhreads loopback netperf. on my core2, nhm, wsm, snb, platforms. no clear performance change found. >>> >>> Ok, good, You could put that in one of the commit messages so that it is >>> there and people know that this patchset doesn't cause perf regressions >>> with the bunch of benchmarks. >>> I also tested balance policy/powersaving policy with above benchmark, found, the specjbb2005 drop much 30~50% on both of policy whenever with openjdk or jrockit. and hackbench drops a lots with powersaving policy on snb 4 sockets platforms. others has no clear change. >>> >>> I guess this is expected because there has to be some performance hit >>> when saving power... >>> >> >> BTW, I had tested the v3 version based on sched numa -- on tip/master. >> The specjbb just has about 5~7% dropping on balance/powersaving policy. >> The power scheduling done after the numa scheduling logical. > > That makes sense. How the numa scheduling numbers compare to mainline? > Do you have all three available, mainline, and tip w. w/o powersaving > policy? > I once caught 20~40% performance increasing on sched numa VS mainline 3.7-rc5. but have no baseline to compare balance/powersaving performance since lower data are acceptable for balance/powersaving and tip/master changes too quickly to follow up at that time. :) > -Mike > > -- Thanks Alex -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling
On Mon, 2013-01-28 at 13:19 +0800, Alex Shi wrote: > On 01/27/2013 06:40 PM, Borislav Petkov wrote: > > On Sun, Jan 27, 2013 at 10:41:40AM +0800, Alex Shi wrote: > >> Just rerun some benchmarks: kbuild, specjbb2005, oltp, tbench, aim9, > >> hackbench, fileio-cfq of sysbench, dbench, aiostress, multhreads > >> loopback netperf. on my core2, nhm, wsm, snb, platforms. no clear > >> performance change found. > > > > Ok, good, You could put that in one of the commit messages so that it is > > there and people know that this patchset doesn't cause perf regressions > > with the bunch of benchmarks. > > > >> I also tested balance policy/powersaving policy with above benchmark, > >> found, the specjbb2005 drop much 30~50% on both of policy whenever > >> with openjdk or jrockit. and hackbench drops a lots with powersaving > >> policy on snb 4 sockets platforms. others has no clear change. > > > > I guess this is expected because there has to be some performance hit > > when saving power... > > > > BTW, I had tested the v3 version based on sched numa -- on tip/master. > The specjbb just has about 5~7% dropping on balance/powersaving policy. > The power scheduling done after the numa scheduling logical. That makes sense. How the numa scheduling numbers compare to mainline? Do you have all three available, mainline, and tip w. w/o powersaving policy? -Mike -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling
On Mon, 2013-01-28 at 07:15 +0100, Mike Galbraith wrote: > On Mon, 2013-01-28 at 13:51 +0800, Alex Shi wrote: > > On 01/28/2013 01:17 PM, Mike Galbraith wrote: > > > On Sun, 2013-01-27 at 16:51 +0100, Mike Galbraith wrote: > > >> On Sun, 2013-01-27 at 21:25 +0800, Alex Shi wrote: > > >>> On 01/27/2013 06:35 PM, Borislav Petkov wrote: > > On Sun, Jan 27, 2013 at 05:36:25AM +0100, Mike Galbraith wrote: > > > With aim7 compute on 4 node 40 core box, I see stable throughput > > > improvement at tasks = nr_cores and below w. balance and powersaving. > > >> ... > > Ok, this is sick. How is balance and powersaving better than perf? Both > > have much more jobs per minute than perf; is that because we do pack > > much more tasks per cpu with balance and powersaving? > > >>> > > >>> Maybe it is due to the lazy balancing on balance/powersaving. You can > > >>> check the CS times in /proc/pid/status. > > >> > > >> Well, it's not wakeup path, limiting entry frequency per waker did zip > > >> squat nada to any policy throughput. > > > > > > monteverdi:/abuild/mike/:[0]# echo powersaving > > > > /sys/devices/system/cpu/sched_policy/current_sched_policy > > > monteverdi:/abuild/mike/:[0]# massive_intr 10 60 > > > 043321 00058616 > > > 043313 00058616 > > > 043318 00058968 > > > 043317 00058968 > > > 043316 00059184 > > > 043319 00059192 > > > 043320 00059048 > > > 043314 00059048 > > > 043312 00058176 > > > 043315 00058184 > > > monteverdi:/abuild/mike/:[0]# echo balance > > > > /sys/devices/system/cpu/sched_policy/current_sched_policy > > > monteverdi:/abuild/mike/:[0]# massive_intr 10 60 > > > 043337 00053448 > > > 04 00053456 > > > 043338 00052992 > > > 043331 00053448 > > > 043332 00053488 > > > 043335 00053496 > > > 043334 00053480 > > > 043329 00053288 > > > 043336 00053464 > > > 043330 00053496 > > > monteverdi:/abuild/mike/:[0]# echo performance > > > > /sys/devices/system/cpu/sched_policy/current_sched_policy > > > monteverdi:/abuild/mike/:[0]# massive_intr 10 60 > > > 043348 00052488 > > > 043344 00052488 > > > 043349 00052744 > > > 043343 00052504 > > > 043347 00052504 > > > 043352 00052888 > > > 043345 00052504 > > > 043351 00052496 > > > 043346 00052496 > > > 043350 00052304 > > > monteverdi:/abuild/mike/:[0]# > > > > similar with aim7 results. Thanks, Mike! > > > > Wold you like to collect vmstat info in background? > > > > > > Zzzt. Wish I could turn turbo thingy off. > > > > Do you mean the turbo mode of cpu frequency? I remember some of machine > > can disable it in BIOS. > > Yeah, I can do that in my local x3550 box. I can't fiddle with BIOS > settings on the remote NUMA box. > > This can't be anything but turbo gizmo mucking up the numbers I think, > not that the numbers are invalid or anything, better numbers are better > numbers no matter where/how they come about ;-) > > The massive_intr load is dirt simple sleep/spin with bean counting. It > sleeps 1ms spins 8ms. Change that to sleep 8ms, grind away for 1ms... 
> > monteverdi:/abuild/mike/:[0]# ./massive_intr 10 60 > 045150 6484 > 045157 6427 > 045156 6401 > 045152 6428 > 045155 6372 > 045154 6370 > 045158 6453 > 045149 6372 > 045151 6371 > 045153 6371 > monteverdi:/abuild/mike/:[0]# echo balance > > /sys/devices/system/cpu/sched_policy/current_sched_policy > monteverdi:/abuild/mike/:[0]# ./massive_intr 10 60 > 045170 6380 > 045172 6374 > 045169 6376 > 045175 6376 > 045171 6334 > 045176 6380 > 045168 6374 > 045174 6334 > 045177 6375 > 045173 6376 > monteverdi:/abuild/mike/:[0]# echo performance > > /sys/devices/system/cpu/sched_policy/current_sched_policy > monteverdi:/abuild/mike/:[0]# ./massive_intr 10 60 > 045198 6408 > 045191 6408 > 045197 6408 > 045192 6411 > 045194 6409 > 045196 6409 > 045195 6336 > 045189 6336 > 045193 6411 > 045190 6410 Back to original 1ms sleep, 8ms work, turning NUMA box into a single node 10 core box with numactl. monteverdi:/abuild/mike/:[0]# echo powersaving > /sys/devices/system/cpu/sched_policy/current_sched_policy monteverdi:/abuild/mike/:[0]# numactl --cpunodebind=0 massive_intr 10 60 045286 00043872 045289 00043464 045284 00043488 045287 00043440 045283 00043416 045281 00044456 045285 00043456 045288 00044312 045280 00043048 045282 00043240 monteverdi:/abuild/mike/:[0]# echo balance > /sys/devices/system/cpu/sched_policy/current_sched_policy monteverdi:/abuild/mike/:[0]# numactl --cpunodebind=0 massive_intr 10 60 045300 00052536 045307 00052472 045304 00052536 045299 00052536 045305 00052520 045306 00052528 045302 00052528 045303 00052528 045308 00052512 045301 00052520 monteverdi:/abuild/mike/:[0]# echo performance > /sys/devices/system/cpu/sched_policy/current_sched_policy monteverdi:/abuild/mike/:[0]# numactl --cpunodebind=0 massive_intr 10 60 045339
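Pulling the ad-hoc commands above together, the single-node runs can be scripted along these lines; the sysfs paths and the massive_intr invocation are the ones used in the thread, while the loop itself is just a convenience wrapper that is not part of the original testing:

    for policy in powersaving balance performance; do
        echo $policy > /sys/devices/system/cpu/sched_policy/current_sched_policy
        for boost in 0 1; do
            echo $boost > /sys/devices/system/cpu/cpufreq/boost
            echo "== policy=$policy boost=$boost, pinned to node 0 =="
            numactl --cpunodebind=0 massive_intr 10 60
        done
    done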
Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling
On Mon, 2013-01-28 at 13:51 +0800, Alex Shi wrote: > On 01/28/2013 01:17 PM, Mike Galbraith wrote: > > On Sun, 2013-01-27 at 16:51 +0100, Mike Galbraith wrote: > >> On Sun, 2013-01-27 at 21:25 +0800, Alex Shi wrote: > >>> On 01/27/2013 06:35 PM, Borislav Petkov wrote: > On Sun, Jan 27, 2013 at 05:36:25AM +0100, Mike Galbraith wrote: > > With aim7 compute on 4 node 40 core box, I see stable throughput > > improvement at tasks = nr_cores and below w. balance and powersaving. > >> ... > Ok, this is sick. How is balance and powersaving better than perf? Both > have much more jobs per minute than perf; is that because we do pack > much more tasks per cpu with balance and powersaving? > >>> > >>> Maybe it is due to the lazy balancing on balance/powersaving. You can > >>> check the CS times in /proc/pid/status. > >> > >> Well, it's not wakeup path, limiting entry frequency per waker did zip > >> squat nada to any policy throughput. > > > > monteverdi:/abuild/mike/:[0]# echo powersaving > > > /sys/devices/system/cpu/sched_policy/current_sched_policy > > monteverdi:/abuild/mike/:[0]# massive_intr 10 60 > > 043321 00058616 > > 043313 00058616 > > 043318 00058968 > > 043317 00058968 > > 043316 00059184 > > 043319 00059192 > > 043320 00059048 > > 043314 00059048 > > 043312 00058176 > > 043315 00058184 > > monteverdi:/abuild/mike/:[0]# echo balance > > > /sys/devices/system/cpu/sched_policy/current_sched_policy > > monteverdi:/abuild/mike/:[0]# massive_intr 10 60 > > 043337 00053448 > > 04 00053456 > > 043338 00052992 > > 043331 00053448 > > 043332 00053488 > > 043335 00053496 > > 043334 00053480 > > 043329 00053288 > > 043336 00053464 > > 043330 00053496 > > monteverdi:/abuild/mike/:[0]# echo performance > > > /sys/devices/system/cpu/sched_policy/current_sched_policy > > monteverdi:/abuild/mike/:[0]# massive_intr 10 60 > > 043348 00052488 > > 043344 00052488 > > 043349 00052744 > > 043343 00052504 > > 043347 00052504 > > 043352 00052888 > > 043345 00052504 > > 043351 00052496 > > 043346 00052496 > > 043350 00052304 > > monteverdi:/abuild/mike/:[0]# > > similar with aim7 results. Thanks, Mike! > > Wold you like to collect vmstat info in background? > > > > Zzzt. Wish I could turn turbo thingy off. > > Do you mean the turbo mode of cpu frequency? I remember some of machine > can disable it in BIOS. Yeah, I can do that in my local x3550 box. I can't fiddle with BIOS settings on the remote NUMA box. This can't be anything but turbo gizmo mucking up the numbers I think, not that the numbers are invalid or anything, better numbers are better numbers no matter where/how they come about ;-) The massive_intr load is dirt simple sleep/spin with bean counting. It sleeps 1ms spins 8ms. Change that to sleep 8ms, grind away for 1ms... 
monteverdi:/abuild/mike/:[0]# ./massive_intr 10 60 045150 6484 045157 6427 045156 6401 045152 6428 045155 6372 045154 6370 045158 6453 045149 6372 045151 6371 045153 6371 monteverdi:/abuild/mike/:[0]# echo balance > /sys/devices/system/cpu/sched_policy/current_sched_policy monteverdi:/abuild/mike/:[0]# ./massive_intr 10 60 045170 6380 045172 6374 045169 6376 045175 6376 045171 6334 045176 6380 045168 6374 045174 6334 045177 6375 045173 6376 monteverdi:/abuild/mike/:[0]# echo performance > /sys/devices/system/cpu/sched_policy/current_sched_policy monteverdi:/abuild/mike/:[0]# ./massive_intr 10 60 045198 6408 045191 6408 045197 6408 045192 6411 045194 6409 045196 6409 045195 6336 045189 6336 045193 6411 045190 6410 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling
On 01/28/2013 01:17 PM, Mike Galbraith wrote: > On Sun, 2013-01-27 at 16:51 +0100, Mike Galbraith wrote: >> On Sun, 2013-01-27 at 21:25 +0800, Alex Shi wrote: >>> On 01/27/2013 06:35 PM, Borislav Petkov wrote: On Sun, Jan 27, 2013 at 05:36:25AM +0100, Mike Galbraith wrote: > With aim7 compute on 4 node 40 core box, I see stable throughput > improvement at tasks = nr_cores and below w. balance and powersaving. >> ... Ok, this is sick. How is balance and powersaving better than perf? Both have much more jobs per minute than perf; is that because we do pack much more tasks per cpu with balance and powersaving? >>> >>> Maybe it is due to the lazy balancing on balance/powersaving. You can >>> check the CS times in /proc/pid/status. >> >> Well, it's not wakeup path, limiting entry frequency per waker did zip >> squat nada to any policy throughput. > > monteverdi:/abuild/mike/:[0]# echo powersaving > > /sys/devices/system/cpu/sched_policy/current_sched_policy > monteverdi:/abuild/mike/:[0]# massive_intr 10 60 > 043321 00058616 > 043313 00058616 > 043318 00058968 > 043317 00058968 > 043316 00059184 > 043319 00059192 > 043320 00059048 > 043314 00059048 > 043312 00058176 > 043315 00058184 > monteverdi:/abuild/mike/:[0]# echo balance > > /sys/devices/system/cpu/sched_policy/current_sched_policy > monteverdi:/abuild/mike/:[0]# massive_intr 10 60 > 043337 00053448 > 04 00053456 > 043338 00052992 > 043331 00053448 > 043332 00053488 > 043335 00053496 > 043334 00053480 > 043329 00053288 > 043336 00053464 > 043330 00053496 > monteverdi:/abuild/mike/:[0]# echo performance > > /sys/devices/system/cpu/sched_policy/current_sched_policy > monteverdi:/abuild/mike/:[0]# massive_intr 10 60 > 043348 00052488 > 043344 00052488 > 043349 00052744 > 043343 00052504 > 043347 00052504 > 043352 00052888 > 043345 00052504 > 043351 00052496 > 043346 00052496 > 043350 00052304 > monteverdi:/abuild/mike/:[0]# similar with aim7 results. Thanks, Mike! Wold you like to collect vmstat info in background? > > Zzzt. Wish I could turn turbo thingy off. Do you mean the turbo mode of cpu frequency? I remember some of machine can disable it in BIOS. > > -Mike > -- Thanks Alex -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
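The vmstat collection Alex asks for can be done with something as simple as the following sketch (illustrative; the 60-second window just matches the massive_intr run length, and the log name is made up):

    vmstat 1 60 > vmstat.$(cat /sys/devices/system/cpu/sched_policy/current_sched_policy).log &
    massive_intr 10 60
    wait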
Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling
On 01/27/2013 06:40 PM, Borislav Petkov wrote: > On Sun, Jan 27, 2013 at 10:41:40AM +0800, Alex Shi wrote: >> Just rerun some benchmarks: kbuild, specjbb2005, oltp, tbench, aim9, >> hackbench, fileio-cfq of sysbench, dbench, aiostress, multhreads >> loopback netperf. on my core2, nhm, wsm, snb, platforms. no clear >> performance change found. > > Ok, good, You could put that in one of the commit messages so that it is > there and people know that this patchset doesn't cause perf regressions > with the bunch of benchmarks. > >> I also tested balance policy/powersaving policy with above benchmark, >> found, the specjbb2005 drop much 30~50% on both of policy whenever >> with openjdk or jrockit. and hackbench drops a lots with powersaving >> policy on snb 4 sockets platforms. others has no clear change. > > I guess this is expected because there has to be some performance hit > when saving power... > BTW, I had tested the v3 version based on sched numa -- on tip/master. The specjbb just has about 5~7% dropping on balance/powersaving policy. The power scheduling done after the numa scheduling logical. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling
On Sun, 2013-01-27 at 16:51 +0100, Mike Galbraith wrote: > On Sun, 2013-01-27 at 21:25 +0800, Alex Shi wrote: > > On 01/27/2013 06:35 PM, Borislav Petkov wrote: > > > On Sun, Jan 27, 2013 at 05:36:25AM +0100, Mike Galbraith wrote: > > >> With aim7 compute on 4 node 40 core box, I see stable throughput > > >> improvement at tasks = nr_cores and below w. balance and powersaving. > ... > > > Ok, this is sick. How is balance and powersaving better than perf? Both > > > have much more jobs per minute than perf; is that because we do pack > > > much more tasks per cpu with balance and powersaving? > > > > Maybe it is due to the lazy balancing on balance/powersaving. You can > > check the CS times in /proc/pid/status. > > Well, it's not wakeup path, limiting entry frequency per waker did zip > squat nada to any policy throughput. monteverdi:/abuild/mike/:[0]# echo powersaving > /sys/devices/system/cpu/sched_policy/current_sched_policy monteverdi:/abuild/mike/:[0]# massive_intr 10 60 043321 00058616 043313 00058616 043318 00058968 043317 00058968 043316 00059184 043319 00059192 043320 00059048 043314 00059048 043312 00058176 043315 00058184 monteverdi:/abuild/mike/:[0]# echo balance > /sys/devices/system/cpu/sched_policy/current_sched_policy monteverdi:/abuild/mike/:[0]# massive_intr 10 60 043337 00053448 04 00053456 043338 00052992 043331 00053448 043332 00053488 043335 00053496 043334 00053480 043329 00053288 043336 00053464 043330 00053496 monteverdi:/abuild/mike/:[0]# echo performance > /sys/devices/system/cpu/sched_policy/current_sched_policy monteverdi:/abuild/mike/:[0]# massive_intr 10 60 043348 00052488 043344 00052488 043349 00052744 043343 00052504 043347 00052504 043352 00052888 043345 00052504 043351 00052496 043346 00052496 043350 00052304 monteverdi:/abuild/mike/:[0]# Zzzt. Wish I could turn turbo thingy off. -Mike -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling
On 01/24/2013 11:06 AM, Alex Shi wrote: > Since the runnable info needs 345ms to accumulate, balancing > doesn't do well for many tasks burst waking. After talking with Mike > Galbraith, we are agree to just use runnable avg in power friendly > scheduling and keep current instant load in performance scheduling for > low latency. > > So the biggest change in this version is removing runnable load avg in > balance and just using runnable data in power balance. > > The patchset bases on Linus' tree, includes 3 parts, Would you like to give some comments, Ingo? :) Best regards! -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling
On Sun, 2013-01-27 at 21:25 +0800, Alex Shi wrote: > On 01/27/2013 06:35 PM, Borislav Petkov wrote: > > On Sun, Jan 27, 2013 at 05:36:25AM +0100, Mike Galbraith wrote: > >> With aim7 compute on 4 node 40 core box, I see stable throughput > >> improvement at tasks = nr_cores and below w. balance and powersaving. ... > > Ok, this is sick. How is balance and powersaving better than perf? Both > > have much more jobs per minute than perf; is that because we do pack > > much more tasks per cpu with balance and powersaving? > > Maybe it is due to the lazy balancing on balance/powersaving. You can > check the CS times in /proc/pid/status. Well, it's not wakeup path, limiting entry frequency per waker did zip squat nada to any policy throughput. -Mike -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling
On 01/27/2013 06:40 PM, Borislav Petkov wrote:
> On Sun, Jan 27, 2013 at 10:41:40AM +0800, Alex Shi wrote:
>> Just rerun some benchmarks: kbuild, specjbb2005, oltp, tbench, aim9,
>> hackbench, fileio-cfq of sysbench, dbench, aiostress, multhreads
>> loopback netperf. on my core2, nhm, wsm, snb, platforms. no clear
>> performance change found.
>
> Ok, good, You could put that in one of the commit messages so that it is
> there and people know that this patchset doesn't cause perf regressions
> with the bunch of benchmarks.

Thanks for the suggestion!
Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling
On 01/27/2013 06:35 PM, Borislav Petkov wrote: > On Sun, Jan 27, 2013 at 05:36:25AM +0100, Mike Galbraith wrote: >> With aim7 compute on 4 node 40 core box, I see stable throughput >> improvement at tasks = nr_cores and below w. balance and powersaving. >> >> 3.8.0-performance 3.8.0-balance >> 3.8.0-powersaving >> Tasksjobs/min jti jobs/min/task real cpu jobs/min jti >> jobs/min/task real cpu jobs/min jti jobs/min/task real >> cpu >> 1 432.86 100 432.8571 14.00 3.99 433.48 100 >>433.4764 13.98 3.97 433.17 100 433.1665 13.99 >> 3.98 >> 1 437.23 100 437.2294 13.86 3.85 436.60 100 >>436.5994 13.88 3.86 435.66 100 435.6578 13.91 >> 3.90 >> 1 434.10 100 434.0974 13.96 3.95 436.29 100 >>436.2851 13.89 3.89 436.29 100 436.2851 13.89 >> 3.87 >> 5 2400.95 99 480.1902 12.62 12.492554.81 98 >>510.9612 11.86 7.552487.68 98 497.5369 12.18 >> 8.22 >> 5 2341.58 99 468.3153 12.94 13.952578.72 99 >>515.7447 11.75 7.252527.11 99 505.4212 11.99 >> 7.90 >> 5 2350.66 99 470.1319 12.89 13.662600.86 99 >>520.1717 11.65 7.092508.28 98 501.6556 12.08 >> 8.24 >>10 4291.78 99 429.1785 14.12 40.145334.51 99 >>533.4507 11.36 11.135183.92 98 518.3918 11.69 >> 12.15 >>10 4334.76 99 433.4764 13.98 38.705311.13 99 >>531.1131 11.41 11.235215.15 99 521.5146 11.62 >> 12.53 >>10 4273.62 99 427.3625 14.18 40.295287.96 99 >>528.7958 11.46 11.465144.31 98 514.4312 11.78 >> 12.32 >>20 8487.39 94 424.3697 14.28 63.14 10594.41 99 >>529.7203 11.44 23.72 10575.92 99 528.7958 11.46 >> 22.08 >>20 8387.54 97 419.3772 14.45 77.01 10575.92 98 >>528.7958 11.46 23.41 10520.83 99 526.0417 11.52 >> 21.88 >>20 8713.16 95 435.6578 13.91 55.10 10659.63 99 >>532.9815 11.37 24.17 10539.13 99 526.9565 11.50 >> 22.13 >>4016786.70 99 419.6676 14.44170.08 19469.88 98 >>486.7470 12.45 60.78 19967.05 98 499.1763 12.14 >> 51.40 >>4016728.78 99 418.2195 14.49172.96 19627.53 98 >>490.6883 12.35 65.26 20386.88 98 509.6720 11.89 >> 46.91 >>4016763.49 99 419.0871 14.46171.42 20033.06 98 >>500.8264 12.10 51.44 20682.59 98 517.0648 11.72 >> 42.45 > > Ok, this is sick. How is balance and powersaving better than perf? Both > have much more jobs per minute than perf; is that because we do pack > much more tasks per cpu with balance and powersaving? Maybe it is due to the lazy balancing on balance/powersaving. You can check the CS times in /proc/pid/status. > > Thanks. > -- Thanks Alex -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling
On Sun, Jan 27, 2013 at 10:41:40AM +0800, Alex Shi wrote: > Just rerun some benchmarks: kbuild, specjbb2005, oltp, tbench, aim9, > hackbench, fileio-cfq of sysbench, dbench, aiostress, multhreads > loopback netperf. on my core2, nhm, wsm, snb, platforms. no clear > performance change found. Ok, good, You could put that in one of the commit messages so that it is there and people know that this patchset doesn't cause perf regressions with the bunch of benchmarks. > I also tested balance policy/powersaving policy with above benchmark, > found, the specjbb2005 drop much 30~50% on both of policy whenever > with openjdk or jrockit. and hackbench drops a lots with powersaving > policy on snb 4 sockets platforms. others has no clear change. I guess this is expected because there has to be some performance hit when saving power... Thanks. -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine. -- -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling
On Sun, Jan 27, 2013 at 05:36:25AM +0100, Mike Galbraith wrote:
> With aim7 compute on 4 node 40 core box, I see stable throughput
> improvement at tasks = nr_cores and below w. balance and powersaving.
>
>        3.8.0-performance                             3.8.0-balance                                 3.8.0-powersaving
> Tasks  jobs/min  jti  jobs/min/task   real     cpu   jobs/min  jti  jobs/min/task   real     cpu   jobs/min  jti  jobs/min/task   real     cpu
>     1    432.86  100       432.8571  14.00    3.99     433.48  100       433.4764  13.98    3.97     433.17  100       433.1665  13.99    3.98
>     1    437.23  100       437.2294  13.86    3.85     436.60  100       436.5994  13.88    3.86     435.66  100       435.6578  13.91    3.90
>     1    434.10  100       434.0974  13.96    3.95     436.29  100       436.2851  13.89    3.89     436.29  100       436.2851  13.89    3.87
>     5   2400.95   99       480.1902  12.62   12.49    2554.81   98       510.9612  11.86    7.55    2487.68   98       497.5369  12.18    8.22
>     5   2341.58   99       468.3153  12.94   13.95    2578.72   99       515.7447  11.75    7.25    2527.11   99       505.4212  11.99    7.90
>     5   2350.66   99       470.1319  12.89   13.66    2600.86   99       520.1717  11.65    7.09    2508.28   98       501.6556  12.08    8.24
>    10   4291.78   99       429.1785  14.12   40.14    5334.51   99       533.4507  11.36   11.13    5183.92   98       518.3918  11.69   12.15
>    10   4334.76   99       433.4764  13.98   38.70    5311.13   99       531.1131  11.41   11.23    5215.15   99       521.5146  11.62   12.53
>    10   4273.62   99       427.3625  14.18   40.29    5287.96   99       528.7958  11.46   11.46    5144.31   98       514.4312  11.78   12.32
>    20   8487.39   94       424.3697  14.28   63.14   10594.41   99       529.7203  11.44   23.72   10575.92   99       528.7958  11.46   22.08
>    20   8387.54   97       419.3772  14.45   77.01   10575.92   98       528.7958  11.46   23.41   10520.83   99       526.0417  11.52   21.88
>    20   8713.16   95       435.6578  13.91   55.10   10659.63   99       532.9815  11.37   24.17   10539.13   99       526.9565  11.50   22.13
>    40  16786.70   99       419.6676  14.44  170.08   19469.88   98       486.7470  12.45   60.78   19967.05   98       499.1763  12.14   51.40
>    40  16728.78   99       418.2195  14.49  172.96   19627.53   98       490.6883  12.35   65.26   20386.88   98       509.6720  11.89   46.91
>    40  16763.49   99       419.0871  14.46  171.42   20033.06   98       500.8264  12.10   51.44   20682.59   98       517.0648  11.72   42.45

Ok, this is sick. How is balance and powersaving better than perf? Both
have much more jobs per minute than perf; is that because we do pack
much more tasks per cpu with balance and powersaving?

Thanks.

--
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling
On 01/27/2013 06:35 PM, Borislav Petkov wrote:
> On Sun, Jan 27, 2013 at 05:36:25AM +0100, Mike Galbraith wrote:
>> With aim7 compute on 4 node 40 core box, I see stable throughput
>> improvement at tasks = nr_cores and below w. balance and powersaving.
>>
>>              3.8.0-performance                      3.8.0-balance                        3.8.0-powersaving
>> Tasks  jobs/min jti jobs/min/task  real    cpu   jobs/min jti jobs/min/task  real   cpu   jobs/min jti jobs/min/task  real   cpu
>>     1    432.86 100      432.8571 14.00   3.99     433.48 100      433.4764 13.98  3.97     433.17 100      433.1665 13.99  3.98
>>     1    437.23 100      437.2294 13.86   3.85     436.60 100      436.5994 13.88  3.86     435.66 100      435.6578 13.91  3.90
>>     1    434.10 100      434.0974 13.96   3.95     436.29 100      436.2851 13.89  3.89     436.29 100      436.2851 13.89  3.87
>>     5   2400.95  99      480.1902 12.62  12.49    2554.81  98      510.9612 11.86  7.55    2487.68  98      497.5369 12.18  8.22
>>     5   2341.58  99      468.3153 12.94  13.95    2578.72  99      515.7447 11.75  7.25    2527.11  99      505.4212 11.99  7.90
>>     5   2350.66  99      470.1319 12.89  13.66    2600.86  99      520.1717 11.65  7.09    2508.28  98      501.6556 12.08  8.24
>>    10   4291.78  99      429.1785 14.12  40.14    5334.51  99      533.4507 11.36 11.13    5183.92  98      518.3918 11.69 12.15
>>    10   4334.76  99      433.4764 13.98  38.70    5311.13  99      531.1131 11.41 11.23    5215.15  99      521.5146 11.62 12.53
>>    10   4273.62  99      427.3625 14.18  40.29    5287.96  99      528.7958 11.46 11.46    5144.31  98      514.4312 11.78 12.32
>>    20   8487.39  94      424.3697 14.28  63.14   10594.41  99      529.7203 11.44 23.72   10575.92  99      528.7958 11.46 22.08
>>    20   8387.54  97      419.3772 14.45  77.01   10575.92  98      528.7958 11.46 23.41   10520.83  99      526.0417 11.52 21.88
>>    20   8713.16  95      435.6578 13.91  55.10   10659.63  99      532.9815 11.37 24.17   10539.13  99      526.9565 11.50 22.13
>>    40  16786.70  99      419.6676 14.44 170.08   19469.88  98      486.7470 12.45 60.78   19967.05  98      499.1763 12.14 51.40
>>    40  16728.78  99      418.2195 14.49 172.96   19627.53  98      490.6883 12.35 65.26   20386.88  98      509.6720 11.89 46.91
>>    40  16763.49  99      419.0871 14.46 171.42   20033.06  98      500.8264 12.10 51.44   20682.59  98      517.0648 11.72 42.45
>
> Ok, this is sick. How is balance and powersaving better than perf? Both
> have much more jobs per minute than perf; is that because we do pack
> much more tasks per cpu with balance and powersaving?

Maybe it is due to the lazy balancing on balance/powersaving. You can
check the CS times in /proc/pid/status.

Thanks.

--
Thanks
    Alex
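For reference, the counters Alex points at are the voluntary_ctxt_switches and
nonvoluntary_ctxt_switches fields of /proc/<pid>/status. A minimal sketch of a
helper that dumps them for a given pid (hypothetical helper, not part of the
patchset) could be:

#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    char path[64], line[256];
    FILE *f;

    if (argc < 2) {
        fprintf(stderr, "usage: %s <pid>\n", argv[0]);
        return 1;
    }
    snprintf(path, sizeof(path), "/proc/%s/status", argv[1]);
    f = fopen(path, "r");
    if (!f)
        return 1;
    /* print the voluntary/nonvoluntary context-switch counters */
    while (fgets(line, sizeof(line), f))
        if (strstr(line, "ctxt_switches"))
            fputs(line, stdout);
    fclose(f);
    return 0;
}

Sampling these before and after a run gives a quick view of how often the
benchmark tasks were switched under each policy.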
Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling
On 01/27/2013 06:40 PM, Borislav Petkov wrote:
> On Sun, Jan 27, 2013 at 10:41:40AM +0800, Alex Shi wrote:
>> Just rerun some benchmarks: kbuild, specjbb2005, oltp, tbench, aim9,
>> hackbench, fileio-cfq of sysbench, dbench, aiostress, multhreads
>> loopback netperf. on my core2, nhm, wsm, snb, platforms. no clear
>> performance change found.
>
> Ok, good, You could put that in one of the commit messages so that it is
> there and people know that this patchset doesn't cause perf regressions
> with the bunch of benchmarks.

Thanks for the suggestion!
Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling
On Sun, 2013-01-27 at 21:25 +0800, Alex Shi wrote:
> On 01/27/2013 06:35 PM, Borislav Petkov wrote:
>> On Sun, Jan 27, 2013 at 05:36:25AM +0100, Mike Galbraith wrote:
>>> With aim7 compute on 4 node 40 core box, I see stable throughput
>>> improvement at tasks = nr_cores and below w. balance and powersaving.
>>> ...
>> Ok, this is sick. How is balance and powersaving better than perf? Both
>> have much more jobs per minute than perf; is that because we do pack
>> much more tasks per cpu with balance and powersaving?
>
> Maybe it is due to the lazy balancing on balance/powersaving. You can
> check the CS times in /proc/pid/status.

Well, it's not wakeup path, limiting entry frequency per waker did zip
squat nada to any policy throughput.

	-Mike
Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling
On 01/24/2013 11:06 AM, Alex Shi wrote:
> Since the runnable info needs 345ms to accumulate, balancing
> doesn't do well for many tasks burst waking. After talking with Mike
> Galbraith, we are agree to just use runnable avg in power friendly
> scheduling and keep current instant load in performance scheduling for
> low latency.
>
> So the biggest change in this version is removing runnable load avg in
> balance and just using runnable data in power balance.
>
> The patchset bases on Linus' tree, includes 3 parts,

Would you like to give some comments, Ingo? :)

Best regards!
Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling
On Sun, 2013-01-27 at 16:51 +0100, Mike Galbraith wrote:
> On Sun, 2013-01-27 at 21:25 +0800, Alex Shi wrote:
>> On 01/27/2013 06:35 PM, Borislav Petkov wrote:
>>> On Sun, Jan 27, 2013 at 05:36:25AM +0100, Mike Galbraith wrote:
>>>> With aim7 compute on 4 node 40 core box, I see stable throughput
>>>> improvement at tasks = nr_cores and below w. balance and powersaving.
>>>> ...
>>> Ok, this is sick. How is balance and powersaving better than perf? Both
>>> have much more jobs per minute than perf; is that because we do pack
>>> much more tasks per cpu with balance and powersaving?
>>
>> Maybe it is due to the lazy balancing on balance/powersaving. You can
>> check the CS times in /proc/pid/status.
>
> Well, it's not wakeup path, limiting entry frequency per waker did zip
> squat nada to any policy throughput.

monteverdi:/abuild/mike/:[0]# echo powersaving > /sys/devices/system/cpu/sched_policy/current_sched_policy
monteverdi:/abuild/mike/:[0]# massive_intr 10 60
043321  00058616
043313  00058616
043318  00058968
043317  00058968
043316  00059184
043319  00059192
043320  00059048
043314  00059048
043312  00058176
043315  00058184
monteverdi:/abuild/mike/:[0]# echo balance > /sys/devices/system/cpu/sched_policy/current_sched_policy
monteverdi:/abuild/mike/:[0]# massive_intr 10 60
043337  00053448
04      00053456
043338  00052992
043331  00053448
043332  00053488
043335  00053496
043334  00053480
043329  00053288
043336  00053464
043330  00053496
monteverdi:/abuild/mike/:[0]# echo performance > /sys/devices/system/cpu/sched_policy/current_sched_policy
monteverdi:/abuild/mike/:[0]# massive_intr 10 60
043348  00052488
043344  00052488
043349  00052744
043343  00052504
043347  00052504
043352  00052888
043345  00052504
043351  00052496
043346  00052496
043350  00052304
monteverdi:/abuild/mike/:[0]#

Zzzt. Wish I could turn turbo thingy off.

	-Mike
Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling
On 01/27/2013 06:40 PM, Borislav Petkov wrote:
> On Sun, Jan 27, 2013 at 10:41:40AM +0800, Alex Shi wrote:
>> Just rerun some benchmarks: kbuild, specjbb2005, oltp, tbench, aim9,
>> hackbench, fileio-cfq of sysbench, dbench, aiostress, multhreads
>> loopback netperf. on my core2, nhm, wsm, snb, platforms. no clear
>> performance change found.
>
> Ok, good, You could put that in one of the commit messages so that it is
> there and people know that this patchset doesn't cause perf regressions
> with the bunch of benchmarks.
>
>> I also tested balance policy/powersaving policy with above benchmark,
>> found, the specjbb2005 drop much 30~50% on both of policy whenever
>> with openjdk or jrockit. and hackbench drops a lots with powersaving
>> policy on snb 4 sockets platforms. others has no clear change.
>
> I guess this is expected because there has to be some performance hit
> when saving power...

BTW, I had tested the v3 version based on sched numa -- on tip/master.
The specjbb just has about 5~7% dropping on balance/powersaving policy.
The power scheduling is done after the numa scheduling logic.
Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling
On 01/28/2013 01:17 PM, Mike Galbraith wrote:
> On Sun, 2013-01-27 at 16:51 +0100, Mike Galbraith wrote:
>> On Sun, 2013-01-27 at 21:25 +0800, Alex Shi wrote:
>>> On 01/27/2013 06:35 PM, Borislav Petkov wrote:
>>>> On Sun, Jan 27, 2013 at 05:36:25AM +0100, Mike Galbraith wrote:
>>>>> With aim7 compute on 4 node 40 core box, I see stable throughput
>>>>> improvement at tasks = nr_cores and below w. balance and powersaving.
>>>>> ...
>>>> Ok, this is sick. How is balance and powersaving better than perf? Both
>>>> have much more jobs per minute than perf; is that because we do pack
>>>> much more tasks per cpu with balance and powersaving?
>>>
>>> Maybe it is due to the lazy balancing on balance/powersaving. You can
>>> check the CS times in /proc/pid/status.
>>
>> Well, it's not wakeup path, limiting entry frequency per waker did zip
>> squat nada to any policy throughput.
>
> monteverdi:/abuild/mike/:[0]# echo powersaving > /sys/devices/system/cpu/sched_policy/current_sched_policy
> monteverdi:/abuild/mike/:[0]# massive_intr 10 60
> 043321 00058616   043313 00058616   043318 00058968   043317 00058968   043316 00059184
> 043319 00059192   043320 00059048   043314 00059048   043312 00058176   043315 00058184
> monteverdi:/abuild/mike/:[0]# echo balance > /sys/devices/system/cpu/sched_policy/current_sched_policy
> monteverdi:/abuild/mike/:[0]# massive_intr 10 60
> 043337 00053448   04 00053456       043338 00052992   043331 00053448   043332 00053488
> 043335 00053496   043334 00053480   043329 00053288   043336 00053464   043330 00053496
> monteverdi:/abuild/mike/:[0]# echo performance > /sys/devices/system/cpu/sched_policy/current_sched_policy
> monteverdi:/abuild/mike/:[0]# massive_intr 10 60
> 043348 00052488   043344 00052488   043349 00052744   043343 00052504   043347 00052504
> 043352 00052888   043345 00052504   043351 00052496   043346 00052496   043350 00052304
> monteverdi:/abuild/mike/:[0]#

similar with aim7 results. Thanks, Mike! Would you like to collect
vmstat info in background?

> Zzzt. Wish I could turn turbo thingy off.

Do you mean the turbo mode of cpu frequency? I remember some of machine
can disable it in BIOS.

> 	-Mike

--
Thanks
    Alex
Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling
On Mon, 2013-01-28 at 13:51 +0800, Alex Shi wrote:
> On 01/28/2013 01:17 PM, Mike Galbraith wrote:
>> On Sun, 2013-01-27 at 16:51 +0100, Mike Galbraith wrote:
>>> On Sun, 2013-01-27 at 21:25 +0800, Alex Shi wrote:
>>>> On 01/27/2013 06:35 PM, Borislav Petkov wrote:
>>>>> On Sun, Jan 27, 2013 at 05:36:25AM +0100, Mike Galbraith wrote:
>>>>>> With aim7 compute on 4 node 40 core box, I see stable throughput
>>>>>> improvement at tasks = nr_cores and below w. balance and powersaving.
>>>>>> ...
>>>>> Ok, this is sick. How is balance and powersaving better than perf? Both
>>>>> have much more jobs per minute than perf; is that because we do pack
>>>>> much more tasks per cpu with balance and powersaving?
>>>>
>>>> Maybe it is due to the lazy balancing on balance/powersaving. You can
>>>> check the CS times in /proc/pid/status.
>>>
>>> Well, it's not wakeup path, limiting entry frequency per waker did zip
>>> squat nada to any policy throughput.
>>
>> monteverdi:/abuild/mike/:[0]# echo powersaving > /sys/devices/system/cpu/sched_policy/current_sched_policy
>> monteverdi:/abuild/mike/:[0]# massive_intr 10 60
>> 043321 00058616   043313 00058616   043318 00058968   043317 00058968   043316 00059184
>> 043319 00059192   043320 00059048   043314 00059048   043312 00058176   043315 00058184
>> monteverdi:/abuild/mike/:[0]# echo balance > /sys/devices/system/cpu/sched_policy/current_sched_policy
>> monteverdi:/abuild/mike/:[0]# massive_intr 10 60
>> 043337 00053448   04 00053456       043338 00052992   043331 00053448   043332 00053488
>> 043335 00053496   043334 00053480   043329 00053288   043336 00053464   043330 00053496
>> monteverdi:/abuild/mike/:[0]# echo performance > /sys/devices/system/cpu/sched_policy/current_sched_policy
>> monteverdi:/abuild/mike/:[0]# massive_intr 10 60
>> 043348 00052488   043344 00052488   043349 00052744   043343 00052504   043347 00052504
>> 043352 00052888   043345 00052504   043351 00052496   043346 00052496   043350 00052304
>> monteverdi:/abuild/mike/:[0]#
>
> similar with aim7 results. Thanks, Mike! Would you like to collect
> vmstat info in background?
>
>> Zzzt. Wish I could turn turbo thingy off.
>
> Do you mean the turbo mode of cpu frequency? I remember some of machine
> can disable it in BIOS.

Yeah, I can do that in my local x3550 box. I can't fiddle with BIOS
settings on the remote NUMA box.

This can't be anything but turbo gizmo mucking up the numbers I think,
not that the numbers are invalid or anything, better numbers are better
numbers no matter where/how they come about ;-)

The massive_intr load is dirt simple sleep/spin with bean counting. It
sleeps 1ms spins 8ms. Change that to sleep 8ms, grind away for 1ms...

monteverdi:/abuild/mike/:[0]# ./massive_intr 10 60
045150  6484
045157  6427
045156  6401
045152  6428
045155  6372
045154  6370
045158  6453
045149  6372
045151  6371
045153  6371
monteverdi:/abuild/mike/:[0]# echo balance > /sys/devices/system/cpu/sched_policy/current_sched_policy
monteverdi:/abuild/mike/:[0]# ./massive_intr 10 60
045170  6380
045172  6374
045169  6376
045175  6376
045171  6334
045176  6380
045168  6374
045174  6334
045177  6375
045173  6376
monteverdi:/abuild/mike/:[0]# echo performance > /sys/devices/system/cpu/sched_policy/current_sched_policy
monteverdi:/abuild/mike/:[0]# ./massive_intr 10 60
045198  6408
045191  6408
045197  6408
045192  6411
045194  6409
045196  6409
045195  6336
045189  6336
045193  6411
045190  6410
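For readers who want a feel for the workload, here is a rough user-space sketch of
the sleep/spin loop Mike describes (spin for a work slice, sleep briefly, count
completed slices per child). It only illustrates the shape of the load; it is not
the actual massive_intr.c, and the bean-count units will differ:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>
#include <sys/wait.h>

#define WORK_MS   8     /* spin length; swap with SLEEP_MS for the 8ms-sleep variant */
#define SLEEP_MS  1

static double now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000.0 + ts.tv_nsec / 1e6;
}

static void worker(int secs)
{
    double end = now_ms() + secs * 1000.0;
    unsigned long loops = 0;

    while (now_ms() < end) {
        double t = now_ms();
        while (now_ms() - t < WORK_MS)      /* burn CPU for the work slice */
            ;
        usleep(SLEEP_MS * 1000);            /* short sleep, then wake again */
        loops++;
    }
    printf("%06d %08lu\n", (int)getpid(), loops);  /* pid, completed slices */
}

int main(int argc, char **argv)
{
    int nproc = argc > 1 ? atoi(argv[1]) : 10;  /* e.g. ./sketch 10 60 */
    int secs  = argc > 2 ? atoi(argv[2]) : 60;

    for (int i = 0; i < nproc; i++)
        if (fork() == 0) {
            worker(secs);
            exit(0);
        }
    while (wait(NULL) > 0)
        ;
    return 0;
}

Swapping WORK_MS and SLEEP_MS gives the "sleep 8ms, grind away for 1ms" variant
Mike runs above.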
Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling
On Mon, 2013-01-28 at 07:15 +0100, Mike Galbraith wrote:
> On Mon, 2013-01-28 at 13:51 +0800, Alex Shi wrote:
>> On 01/28/2013 01:17 PM, Mike Galbraith wrote:
>>> On Sun, 2013-01-27 at 16:51 +0100, Mike Galbraith wrote:
>>>> On Sun, 2013-01-27 at 21:25 +0800, Alex Shi wrote:
>>>>> On 01/27/2013 06:35 PM, Borislav Petkov wrote:
>>>>>> On Sun, Jan 27, 2013 at 05:36:25AM +0100, Mike Galbraith wrote:
>>>>>>> With aim7 compute on 4 node 40 core box, I see stable throughput
>>>>>>> improvement at tasks = nr_cores and below w. balance and powersaving.
>>>>>>> ...
>>>>>> Ok, this is sick. How is balance and powersaving better than perf? Both
>>>>>> have much more jobs per minute than perf; is that because we do pack
>>>>>> much more tasks per cpu with balance and powersaving?
>>>>>
>>>>> Maybe it is due to the lazy balancing on balance/powersaving. You can
>>>>> check the CS times in /proc/pid/status.
>>>>
>>>> Well, it's not wakeup path, limiting entry frequency per waker did zip
>>>> squat nada to any policy throughput.
>>>
>>> monteverdi:/abuild/mike/:[0]# echo powersaving > /sys/devices/system/cpu/sched_policy/current_sched_policy
>>> monteverdi:/abuild/mike/:[0]# massive_intr 10 60
>>> 043321 00058616   043313 00058616   043318 00058968   043317 00058968   043316 00059184
>>> 043319 00059192   043320 00059048   043314 00059048   043312 00058176   043315 00058184
>>> monteverdi:/abuild/mike/:[0]# echo balance > /sys/devices/system/cpu/sched_policy/current_sched_policy
>>> monteverdi:/abuild/mike/:[0]# massive_intr 10 60
>>> 043337 00053448   04 00053456       043338 00052992   043331 00053448   043332 00053488
>>> 043335 00053496   043334 00053480   043329 00053288   043336 00053464   043330 00053496
>>> monteverdi:/abuild/mike/:[0]# echo performance > /sys/devices/system/cpu/sched_policy/current_sched_policy
>>> monteverdi:/abuild/mike/:[0]# massive_intr 10 60
>>> 043348 00052488   043344 00052488   043349 00052744   043343 00052504   043347 00052504
>>> 043352 00052888   043345 00052504   043351 00052496   043346 00052496   043350 00052304
>>> monteverdi:/abuild/mike/:[0]#
>>
>> similar with aim7 results. Thanks, Mike! Would you like to collect
>> vmstat info in background?
>>
>>> Zzzt. Wish I could turn turbo thingy off.
>>
>> Do you mean the turbo mode of cpu frequency? I remember some of machine
>> can disable it in BIOS.
>
> Yeah, I can do that in my local x3550 box. I can't fiddle with BIOS
> settings on the remote NUMA box.
>
> This can't be anything but turbo gizmo mucking up the numbers I think,
> not that the numbers are invalid or anything, better numbers are better
> numbers no matter where/how they come about ;-)
>
> The massive_intr load is dirt simple sleep/spin with bean counting. It
> sleeps 1ms spins 8ms. Change that to sleep 8ms, grind away for 1ms...
>
> monteverdi:/abuild/mike/:[0]# ./massive_intr 10 60
> 045150 6484   045157 6427   045156 6401   045152 6428   045155 6372
> 045154 6370   045158 6453   045149 6372   045151 6371   045153 6371
> monteverdi:/abuild/mike/:[0]# echo balance > /sys/devices/system/cpu/sched_policy/current_sched_policy
> monteverdi:/abuild/mike/:[0]# ./massive_intr 10 60
> 045170 6380   045172 6374   045169 6376   045175 6376   045171 6334
> 045176 6380   045168 6374   045174 6334   045177 6375   045173 6376
> monteverdi:/abuild/mike/:[0]# echo performance > /sys/devices/system/cpu/sched_policy/current_sched_policy
> monteverdi:/abuild/mike/:[0]# ./massive_intr 10 60
> 045198 6408   045191 6408   045197 6408   045192 6411   045194 6409
> 045196 6409   045195 6336   045189 6336   045193 6411   045190 6410

Back to original 1ms sleep, 8ms work, turning NUMA box into a single
node 10 core box with numactl.

monteverdi:/abuild/mike/:[0]# echo powersaving > /sys/devices/system/cpu/sched_policy/current_sched_policy
monteverdi:/abuild/mike/:[0]# numactl --cpunodebind=0 massive_intr 10 60
045286  00043872
045289  00043464
045284  00043488
045287  00043440
045283  00043416
045281  00044456
045285  00043456
045288  00044312
045280  00043048
045282  00043240
monteverdi:/abuild/mike/:[0]# echo balance > /sys/devices/system/cpu/sched_policy/current_sched_policy
monteverdi:/abuild/mike/:[0]# numactl --cpunodebind=0 massive_intr 10 60
045300  00052536
045307  00052472
045304  00052536
045299  00052536
045305  00052520
045306  00052528
045302  00052528
045303  00052528
045308  00052512
045301  00052520
monteverdi:/abuild/mike/:[0]# echo performance > /sys/devices/system/cpu/sched_policy/current_sched_policy
monteverdi:/abuild/mike/:[0]# numactl --cpunodebind=0 massive_intr 10 60
045339  00052600
045340  00052608
045338  00052600
045337  00052608
045343  00052600
045341  00052600
045336  00052608
045335  00052616
045334  00052576
045342  00052600
Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling
On Mon, 2013-01-28 at 13:19 +0800, Alex Shi wrote:
> On 01/27/2013 06:40 PM, Borislav Petkov wrote:
>> On Sun, Jan 27, 2013 at 10:41:40AM +0800, Alex Shi wrote:
>>> Just rerun some benchmarks: kbuild, specjbb2005, oltp, tbench, aim9,
>>> hackbench, fileio-cfq of sysbench, dbench, aiostress, multhreads
>>> loopback netperf. on my core2, nhm, wsm, snb, platforms. no clear
>>> performance change found.
>>
>> Ok, good, You could put that in one of the commit messages so that it is
>> there and people know that this patchset doesn't cause perf regressions
>> with the bunch of benchmarks.
>>
>>> I also tested balance policy/powersaving policy with above benchmark,
>>> found, the specjbb2005 drop much 30~50% on both of policy whenever
>>> with openjdk or jrockit. and hackbench drops a lots with powersaving
>>> policy on snb 4 sockets platforms. others has no clear change.
>>
>> I guess this is expected because there has to be some performance hit
>> when saving power...
>
> BTW, I had tested the v3 version based on sched numa -- on tip/master.
> The specjbb just has about 5~7% dropping on balance/powersaving policy.
> The power scheduling is done after the numa scheduling logic.

That makes sense. How do the numa scheduling numbers compare to
mainline? Do you have all three available, mainline, and tip w. w/o
powersaving policy?

	-Mike
Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling
On 01/28/2013 02:49 PM, Mike Galbraith wrote:
> On Mon, 2013-01-28 at 13:19 +0800, Alex Shi wrote:
>> On 01/27/2013 06:40 PM, Borislav Petkov wrote:
>>> On Sun, Jan 27, 2013 at 10:41:40AM +0800, Alex Shi wrote:
>>>> Just rerun some benchmarks: kbuild, specjbb2005, oltp, tbench, aim9,
>>>> hackbench, fileio-cfq of sysbench, dbench, aiostress, multhreads
>>>> loopback netperf. on my core2, nhm, wsm, snb, platforms. no clear
>>>> performance change found.
>>>
>>> Ok, good, You could put that in one of the commit messages so that it is
>>> there and people know that this patchset doesn't cause perf regressions
>>> with the bunch of benchmarks.
>>>
>>>> I also tested balance policy/powersaving policy with above benchmark,
>>>> found, the specjbb2005 drop much 30~50% on both of policy whenever
>>>> with openjdk or jrockit. and hackbench drops a lots with powersaving
>>>> policy on snb 4 sockets platforms. others has no clear change.
>>>
>>> I guess this is expected because there has to be some performance hit
>>> when saving power...
>>
>> BTW, I had tested the v3 version based on sched numa -- on tip/master.
>> The specjbb just has about 5~7% dropping on balance/powersaving policy.
>> The power scheduling is done after the numa scheduling logic.
>
> That makes sense. How do the numa scheduling numbers compare to
> mainline? Do you have all three available, mainline, and tip w. w/o
> powersaving policy?

I once caught 20~40% performance increase on sched numa vs mainline
3.7-rc5, but have no baseline to compare balance/powersaving
performance since lower data are acceptable for balance/powersaving and
tip/master changes too quickly to follow up at that time. :)

> 	-Mike

--
Thanks
    Alex
Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling
On Mon, 2013-01-28 at 07:42 +0100, Mike Galbraith wrote:
> Back to original 1ms sleep, 8ms work, turning NUMA box into a single
> node 10 core box with numactl.

(aim7 in one 10 core node.. so spread, no delta.)

Benchmark                               Version  Machine      Run Date
AIM Multiuser Benchmark - Suite VII     1.1      powersaving  Jan 28 08:04:14 2013

Tasks    Jobs/Min   JTI   Real     CPU   Jobs/sec/task
    1       441.0   100   13.7     3.7          7.3508
    5      2516.6    98   12.0     8.1          8.3887
   10      5215.1    98   11.6    11.9          8.6919
   20     10475.4    99   11.6    21.7          8.7295
   40     20216.8    99   12.0    38.2          8.4237
   80     35568.6    99   13.6    71.4          7.4101
  160     57102.5    98   17.0   138.2          5.9482
  320     82099.9    97   23.6   271.1          4.2760

Benchmark                               Version  Machine      Run Date
AIM Multiuser Benchmark - Suite VII     1.1      balance      Jan 28 08:06:49 2013

Tasks    Jobs/Min   JTI   Real     CPU   Jobs/sec/task
    1       439.4   100   13.8     3.8          7.3241
    5      2583.1    98   11.7     7.2          8.6104
   10      5325.1    99   11.4    11.0          8.8752
   20     10687.8    99   11.3    23.6          8.9065
   40     20200.0    99   12.0    38.7          8.4167
   80     35464.5    98   13.7    71.4          7.3884
  160     57203.5    98   16.9   137.9          5.9587
  320     82065.2    98   23.6   271.1          4.2742

Benchmark                               Version  Machine      Run Date
AIM Multiuser Benchmark - Suite VII     1.1      performance  Jan 28 08:09:20 2013

Tasks    Jobs/Min   JTI   Real     CPU   Jobs/sec/task
    1       438.8   100   13.8     3.8          7.3135
    5      2634.8    99   11.5     7.2          8.7826
   10      5396.3    99   11.2    11.4          8.9938
   20     10725.7    99   11.3    24.0          8.9381
   40     20183.2    99   12.0    38.5          8.4097
   80     35620.9    99   13.6    71.4          7.4210
  160     57203.5    98   16.9   137.8          5.9587
  320     81995.8    98   23.7   271.3          4.2706
Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling
On Mon, 2013-01-28 at 15:17 +0800, Alex Shi wrote:
> On 01/28/2013 02:49 PM, Mike Galbraith wrote:
>> On Mon, 2013-01-28 at 13:19 +0800, Alex Shi wrote:
>>> On 01/27/2013 06:40 PM, Borislav Petkov wrote:
>>>> On Sun, Jan 27, 2013 at 10:41:40AM +0800, Alex Shi wrote:
>>>>> Just rerun some benchmarks: kbuild, specjbb2005, oltp, tbench, aim9,
>>>>> hackbench, fileio-cfq of sysbench, dbench, aiostress, multhreads
>>>>> loopback netperf. on my core2, nhm, wsm, snb, platforms. no clear
>>>>> performance change found.
>>>>
>>>> Ok, good, You could put that in one of the commit messages so that it is
>>>> there and people know that this patchset doesn't cause perf regressions
>>>> with the bunch of benchmarks.
>>>>
>>>>> I also tested balance policy/powersaving policy with above benchmark,
>>>>> found, the specjbb2005 drop much 30~50% on both of policy whenever
>>>>> with openjdk or jrockit. and hackbench drops a lots with powersaving
>>>>> policy on snb 4 sockets platforms. others has no clear change.
>>>>
>>>> I guess this is expected because there has to be some performance hit
>>>> when saving power...
>>>
>>> BTW, I had tested the v3 version based on sched numa -- on tip/master.
>>> The specjbb just has about 5~7% dropping on balance/powersaving policy.
>>> The power scheduling is done after the numa scheduling logic.
>>
>> That makes sense. How do the numa scheduling numbers compare to
>> mainline? Do you have all three available, mainline, and tip w. w/o
>> powersaving policy?
>
> I once caught 20~40% performance increase on sched numa vs mainline
> 3.7-rc5, but have no baseline to compare balance/powersaving
> performance since lower data are acceptable for balance/powersaving and
> tip/master changes too quickly to follow up at that time. :)

(wow. dram sucks, dram+smp sucks more, dram+smp+numa _sucks rocks_;)
Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling
On Sun, 2013-01-27 at 10:41 +0800, Alex Shi wrote:
> On 01/24/2013 11:07 PM, Alex Shi wrote:
> > On 01/24/2013 05:44 PM, Borislav Petkov wrote:
> >> On Thu, Jan 24, 2013 at 11:06:42AM +0800, Alex Shi wrote:
> >>> Since the runnable info needs 345ms to accumulate, balancing
> >>> doesn't do well for many tasks burst waking. After talking with Mike
> >>> Galbraith, we are agree to just use runnable avg in power friendly
> >>> scheduling and keep current instant load in performance scheduling for
> >>> low latency.
> >>>
> >>> So the biggest change in this version is removing runnable load avg in
> >>> balance and just using runnable data in power balance.
> >>>
> >>> The patchset bases on Linus' tree, includes 3 parts,
> >>> ** 1, bug fix and fork/wake balancing clean up. patch 1~5,
> >>> --
> >>> the first patch remove one domain level. patch 2~5 simplified fork/wake
> >>> balancing, it can increase 10+% hackbench performance on our 4 sockets
> >>> SNB EP machine.
> >>
> >> Ok, I see some benchmarking results here and there in the commit
> >> messages but since this is touching the scheduler, you probably would
> >> need to make sure it doesn't introduce performance regressions vs
> >> mainline with a comprehensive set of benchmarks.
> >>
> >
> > Thanks a lot for your comments, Borislav! :)
> >
> > For this patchset, the code will just check current policy, if it is
> > performance, the code patch will back to original performance code at
> > once. So there should no performance change on performance policy.
> >
> > I once tested the balance policy performance with benchmark
> > kbuild/hackbench/aim9/dbench/tbench on version 2, only hackbench has a
> > bit drop ~3%. others have no clear change.
> >
> >> And, AFAICR, mainline does by default the 'performance' scheme by
> >> spreading out tasks to idle cores, so have you tried comparing vanilla
> >> mainline to your patchset in the 'performance' setting so that you can
> >> make sure there are no problems there? And not only hackbench or a
> >> microbenchmark but aim9 (I saw that in a commit message somewhere) and
> >> whatever else multithreaded benchmark you can get your hands on.
> >>
> >> Also, you might want to run it on other machines too, not only SNB :-)
> >
> > Anyway I will redo the performance testing on this version again on all
> > machine. but doesn't expect something change. :)
>
> Just rerun some benchmarks: kbuild, specjbb2005, oltp, tbench, aim9,
> hackbench, fileio-cfq of sysbench, dbench, aiostress, multhreads
> loopback netperf. on my core2, nhm, wsm, snb, platforms. no clear
> performance change found.

With aim7 compute on 4 node 40 core box, I see stable throughput
improvement at tasks = nr_cores and below w. balance and powersaving.

             3.8.0-performance                      3.8.0-balance                        3.8.0-powersaving
Tasks  jobs/min jti jobs/min/task  real    cpu   jobs/min jti jobs/min/task  real   cpu   jobs/min jti jobs/min/task  real   cpu
    1    432.86 100      432.8571 14.00   3.99     433.48 100      433.4764 13.98  3.97     433.17 100      433.1665 13.99  3.98
    1    437.23 100      437.2294 13.86   3.85     436.60 100      436.5994 13.88  3.86     435.66 100      435.6578 13.91  3.90
    1    434.10 100      434.0974 13.96   3.95     436.29 100      436.2851 13.89  3.89     436.29 100      436.2851 13.89  3.87
    5   2400.95  99      480.1902 12.62  12.49    2554.81  98      510.9612 11.86  7.55    2487.68  98      497.5369 12.18  8.22
    5   2341.58  99      468.3153 12.94  13.95    2578.72  99      515.7447 11.75  7.25    2527.11  99      505.4212 11.99  7.90
    5   2350.66  99      470.1319 12.89  13.66    2600.86  99      520.1717 11.65  7.09    2508.28  98      501.6556 12.08  8.24
   10   4291.78  99      429.1785 14.12  40.14    5334.51  99      533.4507 11.36 11.13    5183.92  98      518.3918 11.69 12.15
   10   4334.76  99      433.4764 13.98  38.70    5311.13  99      531.1131 11.41 11.23    5215.15  99      521.5146 11.62 12.53
   10   4273.62  99      427.3625 14.18  40.29    5287.96  99      528.7958 11.46 11.46    5144.31  98      514.4312 11.78 12.32
   20   8487.39  94      424.3697 14.28  63.14   10594.41  99      529.7203 11.44 23.72   10575.92  99      528.7958 11.46 22.08
   20   8387.54  97      419.3772 14.45  77.01   10575.92  98      528.7958 11.46 23.41   10520.83  99      526.0417 11.52 21.88
   20   8713.16  95      435.6578 13.91  55.10   10659.63  99      532.9815 11.37 24.17   10539.13  99      526.9565 11.50 22.13
   40  16786.70  99      419.6676 14.44 170.08   19469.88  98      486.7470 12.45 60.78   19967.05  98      499.1763 12.14 51.40
   40  16728.78  99      418.2195 14.49 172.96   19627.53  98      490.6883 12.35 65.26   20386.88  98      509.6720 11.89 46.91
   40  16763.49  99      419.0871 14.46 171.42   20033.06  98      500.8264 12.10 51.44   20682.59  98      517.0648 11.72 42.45
Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling
On 01/24/2013 11:07 PM, Alex Shi wrote:
> On 01/24/2013 05:44 PM, Borislav Petkov wrote:
>> On Thu, Jan 24, 2013 at 11:06:42AM +0800, Alex Shi wrote:
>>> Since the runnable info needs 345ms to accumulate, balancing
>>> doesn't do well for many tasks burst waking. After talking with Mike
>>> Galbraith, we are agree to just use runnable avg in power friendly
>>> scheduling and keep current instant load in performance scheduling for
>>> low latency.
>>>
>>> So the biggest change in this version is removing runnable load avg in
>>> balance and just using runnable data in power balance.
>>>
>>> The patchset bases on Linus' tree, includes 3 parts,
>>> ** 1, bug fix and fork/wake balancing clean up. patch 1~5,
>>> --
>>> the first patch remove one domain level. patch 2~5 simplified fork/wake
>>> balancing, it can increase 10+% hackbench performance on our 4 sockets
>>> SNB EP machine.
>>
>> Ok, I see some benchmarking results here and there in the commit
>> messages but since this is touching the scheduler, you probably would
>> need to make sure it doesn't introduce performance regressions vs
>> mainline with a comprehensive set of benchmarks.
>
> Thanks a lot for your comments, Borislav! :)
>
> For this patchset, the code will just check current policy, if it is
> performance, the code patch will back to original performance code at
> once. So there should no performance change on performance policy.
>
> I once tested the balance policy performance with benchmark
> kbuild/hackbench/aim9/dbench/tbench on version 2, only hackbench has a
> bit drop ~3%. others have no clear change.
>
>> And, AFAICR, mainline does by default the 'performance' scheme by
>> spreading out tasks to idle cores, so have you tried comparing vanilla
>> mainline to your patchset in the 'performance' setting so that you can
>> make sure there are no problems there? And not only hackbench or a
>> microbenchmark but aim9 (I saw that in a commit message somewhere) and
>> whatever else multithreaded benchmark you can get your hands on.
>>
>> Also, you might want to run it on other machines too, not only SNB :-)
>
> Anyway I will redo the performance testing on this version again on all
> machine. but doesn't expect something change. :)

Just rerun some benchmarks: kbuild, specjbb2005, oltp, tbench, aim9,
hackbench, fileio-cfq of sysbench, dbench, aiostress, multhreads
loopback netperf. on my core2, nhm, wsm, snb, platforms. no clear
performance change found.

I also tested balance policy/powersaving policy with above benchmark,
found, the specjbb2005 drop much 30~50% on both of policy whenever
with openjdk or jrockit. and hackbench drops a lots with powersaving
policy on snb 4 sockets platforms. others has no clear change.

>> And what about ARM, maybe someone there can run your patchset too?
>>
>> So, it would be cool to see comprehensive results from all those runs
>> and see what the numbers say.
>>
>> Thanks.

--
Thanks
    Alex
Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling
On 01/24/2013 05:44 PM, Borislav Petkov wrote:
> On Thu, Jan 24, 2013 at 11:06:42AM +0800, Alex Shi wrote:
>> Since the runnable info needs 345ms to accumulate, balancing
>> doesn't do well for many tasks burst waking. After talking with Mike
>> Galbraith, we are agree to just use runnable avg in power friendly
>> scheduling and keep current instant load in performance scheduling for
>> low latency.
>>
>> So the biggest change in this version is removing runnable load avg in
>> balance and just using runnable data in power balance.
>>
>> The patchset bases on Linus' tree, includes 3 parts,
>> ** 1, bug fix and fork/wake balancing clean up. patch 1~5,
>> --
>> the first patch remove one domain level. patch 2~5 simplified fork/wake
>> balancing, it can increase 10+% hackbench performance on our 4 sockets
>> SNB EP machine.
>
> Ok, I see some benchmarking results here and there in the commit
> messages but since this is touching the scheduler, you probably would
> need to make sure it doesn't introduce performance regressions vs
> mainline with a comprehensive set of benchmarks.

Thanks a lot for your comments, Borislav! :)

For this patchset, the code will just check current policy, if it is
performance, the code patch will back to original performance code at
once. So there should no performance change on performance policy.

I once tested the balance policy performance with benchmark
kbuild/hackbench/aim9/dbench/tbench on version 2, only hackbench has a
bit drop ~3%. others have no clear change.

> And, AFAICR, mainline does by default the 'performance' scheme by
> spreading out tasks to idle cores, so have you tried comparing vanilla
> mainline to your patchset in the 'performance' setting so that you can
> make sure there are no problems there? And not only hackbench or a
> microbenchmark but aim9 (I saw that in a commit message somewhere) and
> whatever else multithreaded benchmark you can get your hands on.
>
> Also, you might want to run it on other machines too, not only SNB :-)

Anyway I will redo the performance testing on this version again on all
machine. but doesn't expect something change. :)

> And what about ARM, maybe someone there can run your patchset too?
>
> So, it would be cool to see comprehensive results from all those runs
> and see what the numbers say.
>
> Thanks.

--
Thanks
    Alex
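To illustrate the point Alex makes here, the power-aware paths are gated on the
current policy, so under 'performance' the pre-existing logic runs unchanged. A
rough user-space sketch of that gating (all names below are invented for
illustration; this is not code from the patches):

#include <stdio.h>

enum sched_policy { POLICY_PERFORMANCE, POLICY_BALANCE, POLICY_POWERSAVING };

static enum sched_policy current_policy = POLICY_PERFORMANCE;

/* stand-in for the existing spread-out selection */
static int pick_cpu_spread(int prev_cpu) { return prev_cpu; }
/* stand-in for the new packing selection */
static int pick_cpu_packed(int prev_cpu) { return 0; }

static int select_task_cpu(int prev_cpu)
{
    /* under 'performance', fall back to the original path at once */
    if (current_policy == POLICY_PERFORMANCE)
        return pick_cpu_spread(prev_cpu);

    return pick_cpu_packed(prev_cpu);
}

int main(void)
{
    printf("performance -> cpu %d\n", select_task_cpu(3));
    current_policy = POLICY_POWERSAVING;
    printf("powersaving -> cpu %d\n", select_task_cpu(3));
    return 0;
}

This early bail-out is why the 'performance' policy is expected to behave like
mainline in the benchmarks discussed above.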
Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling
On Thu, Jan 24, 2013 at 11:06:42AM +0800, Alex Shi wrote:
> Since the runnable info needs 345ms to accumulate, balancing
> doesn't do well for many tasks burst waking. After talking with Mike
> Galbraith, we are agree to just use runnable avg in power friendly
> scheduling and keep current instant load in performance scheduling for
> low latency.
>
> So the biggest change in this version is removing runnable load avg in
> balance and just using runnable data in power balance.
>
> The patchset bases on Linus' tree, includes 3 parts,
> ** 1, bug fix and fork/wake balancing clean up. patch 1~5,
> --
> the first patch remove one domain level. patch 2~5 simplified fork/wake
> balancing, it can increase 10+% hackbench performance on our 4 sockets
> SNB EP machine.

Ok, I see some benchmarking results here and there in the commit
messages but since this is touching the scheduler, you probably would
need to make sure it doesn't introduce performance regressions vs
mainline with a comprehensive set of benchmarks.

And, AFAICR, mainline does by default the 'performance' scheme by
spreading out tasks to idle cores, so have you tried comparing vanilla
mainline to your patchset in the 'performance' setting so that you can
make sure there are no problems there? And not only hackbench or a
microbenchmark but aim9 (I saw that in a commit message somewhere) and
whatever else multithreaded benchmark you can get your hands on.

Also, you might want to run it on other machines too, not only SNB :-)
And what about ARM, maybe someone there can run your patchset too?

So, it would be cool to see comprehensive results from all those runs
and see what the numbers say.

Thanks.

--
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
[patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling
Since the runnable info needs 345ms to accumulate, balancing doesn't do well for
many tasks burst waking. After talking with Mike Galbraith, we are agree to just
use runnable avg in power friendly scheduling and keep current instant load in
performance scheduling for low latency.

So the biggest change in this version is removing runnable load avg in balance
and just using runnable data in power balance.

The patchset bases on Linus' tree, includes 3 parts,

** 1, bug fix and fork/wake balancing clean up. patch 1~5,
--
the first patch remove one domain level. patch 2~5 simplified fork/wake
balancing, it can increase 10+% hackbench performance on our 4 sockets SNB EP
machine.

V3 change:
a, added the first patch to remove one domain level on x86 platform.
b, some small changes according to Namhyung Kim's comments, thanks!

** 2, bug fix of load avg and remove the CONFIG_FAIR_GROUP_SCHED limit
--
patch 6~8, That using runnable avg in load balancing, with two initial runnable
variables fix.

V4 change:
a, remove runnable log avg using in balancing.

V3 change:
a, use rq->cfs.runnable_load_avg as cpu load not rq->avg.load_avg_contrib,
   since the latter need much time to accumulate for new forked task,
b, a build issue fixed with Namhyung Kim's reminder.

** 3, power awareness scheduling, patch 9~18.
--
The subset implement/consummate the rough power aware scheduling proposal:
https://lkml.org/lkml/2012/8/13/139.

It defines 2 new power aware policy 'balance' and 'powersaving' and then try to
spread or pack tasks on each sched groups level according the different
scheduler policy. That can save much power when task number in system is no
more then LCPU number.

As mentioned in the power aware scheduler proposal, power aware scheduling has
2 assumptions:
1, race to idle is helpful for power saving
2, pack tasks on less sched_groups will reduce power consumption

The first assumption make performance policy take over scheduling when system
busy. The second assumption make power aware scheduling try to move disperse
tasks into fewer groups until that groups are full of tasks.

Some power testing data is in the last 2 patches.

V4 change:
a, fix few bugs and clean up code according to Morten Rasmussen, Mike Galbraith
   and Namhyung Kim. Thanks!
b, take Morten's suggestion to set different criteria for different policy in
   small task packing.
c, shorter latency in power aware scheduling.

V3 change:
a, engaged nr_running in max potential utils consideration in periodic power
   balancing.
b, try exec/wake small tasks on running cpu not idle cpu.

V2 change:
a, add lazy power scheduling to deal with kbuild like benchmark.

Thanks Fengguang Wu for the build testing of this patchset!

Any comments are appreciated!
-- 
Thanks Alex

[patch v4 01/18] sched: set SD_PREFER_SIBLING on MC domain to reduce
[patch v4 02/18] sched: select_task_rq_fair clean up
[patch v4 03/18] sched: fix find_idlest_group mess logical
[patch v4 04/18] sched: don't need go to smaller sched domain
[patch v4 05/18] sched: quicker balancing on fork/exec/wake
[patch v4 06/18] sched: give initial value for runnable avg of sched
[patch v4 07/18] sched: set initial load avg of new forked task
[patch v4 08/18] Revert "sched: Introduce temporary FAIR_GROUP_SCHED
[patch v4 09/18] sched: add sched_policies in kernel
[patch v4 10/18] sched: add sysfs interface for sched_policy
[patch v4 11/18] sched: log the cpu utilization at rq
[patch v4 12/18] sched: add power aware scheduling in fork/exec/wake
[patch v4 13/18] sched: packing small tasks in wake/exec balancing
[patch v4 14/18] sched: add power/performance balance allowed flag
[patch v4 15/18] sched: pull all tasks from source group
[patch v4 16/18] sched: don't care if the local group has capacity
[patch v4 17/18] sched: power aware load balance,
[patch v4 18/18] sched: lazy power balance