RE: Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags

2005-07-29 Thread Chen, Kenneth W
Ingo Molnar wrote on Friday, July 29, 2005 4:26 AM
> * Chen, Kenneth W <[EMAIL PROTECTED]> wrote:
> > To demonstrate the problem, we turned off these two flags in the cpu 
> > sd domain and measured a stunning 2.15% performance gain!  And 
> > deleting all the code in try_to_wake_up() pertaining to load 
> > balancing gives us another 0.2% gain.
> 
> another thing: do you have a HT-capable ia64 CPU, and do you have 
> CONFIG_SCHED_SMT turned on? If yes then could you try to turn off 
> SD_WAKE_IDLE too, i found it to bring further performance improvements 
> in certain workloads.

The scheduler experiments done so far are on a non-SMT CPU (Madison processor).
We have another db setup with a multithread-capable ia64 CPU (Montecito; to
be precise, it is SoEMT capable).  We are just about to do scheduler experiments
on that setup.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags

2005-07-29 Thread Ingo Molnar

* Nick Piggin <[EMAIL PROTECTED]> wrote:

> Chen, Kenneth W wrote:
> >Nick Piggin wrote on Thursday, July 28, 2005 7:01 PM
> 
> >This clearly outlines an issue with the implementation.  Optimizing for one
> >type of workload has a detrimental effect on another workload, and vice versa.
> >
> 
> Yep. That comes up fairly regularly when tuning the scheduler :(

in this particular case we can clearly separate the two workloads 
though: CPU-overload (Ken's benchmark) vs. half-load (3-task tbench). So 
by checking for migration target/source idleness we can have a hard 
separator for wakeup balancing. (whether it works out for both types of 
workloads remains to be seen)
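The idleness gate can be sketched in userspace Python (hypothetical names; this is only an illustration of the idea, not the actual try_to_wake_up() logic):

```python
def wakeup_balance(prev_cpu, waking_cpu, cpu_is_idle):
    """Hard separator for wakeup balancing: only pull the wakee over
    when the migration target (the waker's CPU) is idle and the source
    (the wakee's previous CPU) is busy; otherwise preserve cache
    affinity by leaving the task where it last ran."""
    if cpu_is_idle[waking_cpu] and not cpu_is_idle[prev_cpu]:
        return waking_cpu  # half-load case: migrating is a clear win
    return prev_cpu        # overload case: don't disturb, keep affinity
```

Under CPU overload neither CPU is idle, so the task always stays put (Ken's case); at half load the idle waker CPU pulls the task (the tbench case).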

Ingo


Re: Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags

2005-07-29 Thread Ingo Molnar

* Chen, Kenneth W <[EMAIL PROTECTED]> wrote:

> To demonstrate the problem, we turned off these two flags in the cpu 
> sd domain and measured a stunning 2.15% performance gain!  And 
> deleting all the code in try_to_wake_up() pertaining to load 
> balancing gives us another 0.2% gain.

another thing: do you have a HT-capable ia64 CPU, and do you have 
CONFIG_SCHED_SMT turned on? If yes then could you try to turn off 
SD_WAKE_IDLE too, i found it to bring further performance improvements 
in certain workloads.

Ingo


Re: Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags

2005-07-29 Thread Ingo Molnar

* Nick Piggin <[EMAIL PROTECTED]> wrote:

> Well, you can easily see suboptimal scheduling decisions on many 
> programs with lots of interprocess communication. For example, tbench 
> on a dual Xeon:
> 
> processes:   1               2               3              4
> 
> 2.6.13-rc4:  187, 183, 179   260, 259, 256   340, 320, 349  504, 496, 500
> no wake-bal: 180, 180, 177   254, 254, 253   268, 270, 348  345, 290, 500
> 
> Numbers are MB/s, higher is better.

i cannot see any difference with/without wake-balancing in this 
workload, on a dual Xeon. Could you try the quick hack below and do:

echo 1 > /proc/sys/kernel/panic # turn on wake-balancing
echo 0 > /proc/sys/kernel/panic # turn off wake-balancing

does the runtime switching show any effects on the throughput numbers 
tbench is showing? I'm using dbench-3.03. (i only checked the status 
numbers, didn't do full runs)

(did you have SCHED_SMT enabled?)

Ingo

 kernel/sched.c |    2 ++
 1 files changed, 2 insertions(+)

Index: linux-prefetch-task/kernel/sched.c
===================================================================
--- linux-prefetch-task.orig/kernel/sched.c
+++ linux-prefetch-task/kernel/sched.c
@@ -1155,6 +1155,8 @@ static int try_to_wake_up(task_t * p, un
 		goto out_activate;
 
 	new_cpu = cpu;
+	if (!panic_timeout)
+		goto out_set_cpu;
 
 	schedstat_inc(rq, ttwu_cnt);
 	if (cpu == this_cpu) {


Re: Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags

2005-07-29 Thread Ingo Molnar

* Nick Piggin <[EMAIL PROTECTED]> wrote:

> >>processes:   1               2               3              4
> >>
> >>2.6.13-rc4:  187, 183, 179   260, 259, 256   340, 320, 349  504, 496, 500
> >>no wake-bal: 180, 180, 177   254, 254, 253   268, 270, 348  345, 290, 500
> >>
> >>Numbers are MB/s, higher is better.
> >
> >
> >what type of network was used - localhost or a real one?
> >
> 
> Localhost. Yeah it isn't a real world test, but it does show the 
> erratic behaviour without wake affine.

yeah - fine enough. (It's not representative of IO workloads, but it is 
representative of local IPC workloads; i just wanted to know precisely 
which workload it was.)

Ingo


Re: Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags

2005-07-29 Thread Nick Piggin

Ingo Molnar wrote:

> * Nick Piggin <[EMAIL PROTECTED]> wrote:
> 
>> processes:   1               2               3              4
>> 
>> 2.6.13-rc4:  187, 183, 179   260, 259, 256   340, 320, 349  504, 496, 500
>> no wake-bal: 180, 180, 177   254, 254, 253   268, 270, 348  345, 290, 500
>> 
>> Numbers are MB/s, higher is better.
> 
> what type of network was used - localhost or a real one?

Localhost. Yeah, it isn't a real-world test, but it does show the
erratic behaviour without wake affine.

I don't have a setup with multiple fast network adapters, otherwise
I would have run a similar test using a real network.

Send instant messages to your online friends http://au.messenger.yahoo.com 




Re: Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags

2005-07-29 Thread Ingo Molnar

* Nick Piggin <[EMAIL PROTECTED]> wrote:

> Well, you can easily see suboptimal scheduling decisions on many 
> programs with lots of interprocess communication. For example, tbench 
> on a dual Xeon:
> 
> processes:   1               2               3              4
> 
> 2.6.13-rc4:  187, 183, 179   260, 259, 256   340, 320, 349  504, 496, 500
> no wake-bal: 180, 180, 177   254, 254, 253   268, 270, 348  345, 290, 500
> 
> Numbers are MB/s, higher is better.

what type of network was used - localhost or a real one?

Ingo


Re: Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags

2005-07-29 Thread Nick Piggin

Chen, Kenneth W wrote:

> Nick Piggin wrote on Thursday, July 28, 2005 7:01 PM
> 
> This clearly outlines an issue with the implementation.  Optimizing for one
> type of workload has a detrimental effect on another workload, and vice versa.

Yep. That comes up fairly regularly when tuning the scheduler :(

> I won't try to compromise between the two.  If we do, we would end up
> with two half-baked raw turkeys.  Making load balancing in the wakeup
> path less aggressive would probably reduce performance for the type of
> workload you quoted earlier, and for the db workload we don't want any
> of it at all, not even the code that determines whether it should be
> balanced or not.

Well, that remains to be seen. If it can be made _smarter_, then you
may not have to take such a big compromise.

But either way, there will have to be some compromise made. At the
very least you have to find some acceptable default.

> Do you have an example of the workload you mentioned earlier that depends
> on SD_WAKE_BALANCE?  I would like to experiment with it so we can move
> this forward instead of paper talk.

Well, you can easily see suboptimal scheduling decisions in many
programs with lots of interprocess communication. For example, tbench
on a dual Xeon:

processes:   1               2               3              4

2.6.13-rc4:  187, 183, 179   260, 259, 256   340, 320, 349  504, 496, 500
no wake-bal: 180, 180, 177   254, 254, 253   268, 270, 348  345, 290, 500

Numbers are MB/s, higher is better.

Networking or other IO workloads where processes are tightly coupled
to a specific adapter / interrupt source can also see pretty good
gains.

--
SUSE Labs, Novell Inc.



RE: Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags

2005-07-29 Thread Chen, Kenneth W
Nick Piggin wrote on Thursday, July 28, 2005 7:01 PM
> Chen, Kenneth W wrote:
> >Well, that's exactly what I'm trying to do: make them not aggressive
> >at all by not performing any load balance :-)  The workload gets maximum
> >benefit with zero aggressiveness.
> 
> Unfortunately we can't forget about other workloads, and we're
> trying to stay away from runtime tunables in the scheduler.


This clearly outlines an issue with the implementation.  Optimizing for one
type of workload has a detrimental effect on another workload, and vice versa.


> If we can get performance to within a couple of tenths of a percent
> of the zero balancing case, then that would be preferable I think.

I won't try to compromise between the two.  If we do, we would end up
with two half-baked raw turkeys.  Making load balancing in the wakeup
path less aggressive would probably reduce performance for the type of
workload you quoted earlier, and for the db workload we don't want any
of it at all, not even the code that determines whether it should be
balanced or not.

Do you have an example of the workload you mentioned earlier that depends
on SD_WAKE_BALANCE?  I would like to experiment with it so we can move
this forward instead of paper talk.

- Ken



Re: Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags

2005-07-28 Thread Nick Piggin

Chen, Kenneth W wrote:

> Nick Piggin wrote on Thursday, July 28, 2005 6:46 PM
> 
>> I'd like to try making them less aggressive first if possible.
> 
> Well, that's exactly what I'm trying to do: make them not aggressive
> at all by not performing any load balancing :-)  The workload gets maximum
> benefit with zero aggressiveness.

Unfortunately we can't forget about other workloads, and we're
trying to stay away from runtime tunables in the scheduler.

If we can get performance to within a couple of tenths of a percent
of the zero-balancing case, then that would be preferable, I think.




RE: Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags

2005-07-28 Thread Chen, Kenneth W
Nick Piggin wrote on Thursday, July 28, 2005 6:46 PM
> I'd like to try making them less aggressive first if possible.

Well, that's exactly what I'm trying to do: make them not aggressive
at all by not performing any load balancing :-)  The workload gets maximum
benefit with zero aggressiveness.



Re: Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags

2005-07-28 Thread Nick Piggin

Chen, Kenneth W wrote:

> Nick Piggin wrote on Thursday, July 28, 2005 6:25 PM
> 
>> Well, pipes are just an example. It could be any type of communication.
>> What's more, even the synchronous wakeup uses the wake balancing path
>> (although that could be modified to only do wake balancing for synch
>> wakeups; I'd have to be convinced we should special-case pipes and not
>> e.g. semaphores or AF_UNIX sockets).
> 
> Why is the normal load balancing path not enough (or not able to do the
> right thing)?  rebalance_tick and idle_balance ought to be enough to take
> care of the imbalance.  What makes load balancing in the wakeup path so
> special?

Well, the normal load balancing path treats all tasks the same, while
the wake path knows when a CPU is waking a remote task and can attempt
to maximise the number of local wakeups.

> Oh, I'd like to hear your opinion on what to do with these two flags:
> make them runtime configurable? (I'm of the opinion that we should
> delete them altogether.)

I'd like to try making them less aggressive first if possible.




RE: Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags

2005-07-28 Thread Chen, Kenneth W
Nick Piggin wrote on Thursday, July 28, 2005 6:25 PM
> Well pipes are just an example. It could be any type of communication.
> What's more, even the synchronous wakeup uses the wake balancing path
> (although that could be modified to only do wake balancing for synch
> wakeups, I'd have to be convinced we should special case pipes and not
> eg. semaphores or AF_UNIX sockets).


Why is the normal load balancing path not enough (or not able to do the
right thing)?  rebalance_tick and idle_balance ought to be enough to take
care of the imbalance.  What makes load balancing in the wakeup path so special?

Oh, I'd like to hear your opinion on what to do with these two flags: make
them runtime configurable? (I'm of the opinion that we should delete them
altogether.)

- Ken



Re: Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags

2005-07-28 Thread Nick Piggin

Chen, Kenneth W wrote:

> Nick Piggin wrote on Thursday, July 28, 2005 4:35 PM
> 
>> Wake balancing provides an opportunity to provide some input bias
>> into the load balancer.
>> 
>> For example, if you started 100 pairs of tasks which communicate
>> through a pipe on a 2 CPU system without wake balancing, probably
>> half of the pairs would be on different CPUs. With wake balancing,
>> it should be much better.
> 
> Shouldn't the pipe code use synchronous wakeup?

Well, pipes are just an example. It could be any type of communication.
What's more, even the synchronous wakeup uses the wake balancing path
(although that could be modified to only do wake balancing for synch
wakeups; I'd have to be convinced we should special-case pipes and not
e.g. semaphores or AF_UNIX sockets).

>> I hear you might be having problems with recent 2.6.13 kernels? If so,
>> it would be really good to have a look at that before 2.6.13 goes out
>> the door.
> 
> Yes I do :-(  Apparently, bumping up cache_hot_time won't give us the
> performance boost we used to see.

OK, there are probably a number of things we can explore, depending on
what the symptoms are (e.g. excessive idle time, bad cache performance).

Unfortunately it is kind of difficult to tune 2.6.13 on the basis of
2.6.12 results - although that's not to say it won't indicate a good
avenue to investigate.




RE: Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags

2005-07-28 Thread Chen, Kenneth W
Nick Piggin wrote on Thursday, July 28, 2005 4:35 PM
> Wake balancing provides an opportunity to provide some input bias
> into the load balancer.
> 
> For example, if you started 100 pairs of tasks which communicate
> through a pipe on a 2 CPU system without wake balancing, probably
> half of the pairs would be on different CPUs. With wake balancing,
> it should be much better.

Shouldn't the pipe code use synchronous wakeup?


> I hear you might be having problems with recent 2.6.13 kernels? If so,
> it would be really good to have a look at that before 2.6.13 goes out
> the door.

Yes I do :-(  Apparently, bumping up cache_hot_time won't give us the
performance boost we used to see.

- Ken



Re: Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags

2005-07-28 Thread Nick Piggin

Chen, Kenneth W wrote:

> What sort of workload needs SD_WAKE_AFFINE and SD_WAKE_BALANCE?
> SD_WAKE_AFFINE is not useful in conjunction with interrupt binding.
> In fact, it creates more harm than usefulness, causing detrimental
> process migration, destroying process cache affinity, etc.  Also,
> SD_WAKE_BALANCE is giving us performance grief with our industry
> standard OLTP workload.

The periodic load balancer basically makes completely undirected,
random choices when picking which tasks to move where.

Wake balancing provides an opportunity to provide some input bias
into the load balancer.

For example, if you started 100 pairs of tasks which communicate
through a pipe on a 2 CPU system without wake balancing, probably
half of the pairs would be on different CPUs. With wake balancing,
it should be much better.
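The pair-splitting effect is easy to model in a few lines of Python - purely an illustrative toy with made-up names, not scheduler code:

```python
import random

def count_split_pairs(n_pairs=100, n_cpus=2, wake_affine=False, seed=1):
    """Toy model: place communicating waker/wakee pairs on CPUs and
    count how many pairs end up split across CPUs.  Random placement
    stands in for the undirected periodic balancer; with wake_affine
    the wakee always lands on the waker's CPU."""
    rng = random.Random(seed)
    split = 0
    for _ in range(n_pairs):
        waker = rng.randrange(n_cpus)
        wakee = waker if wake_affine else rng.randrange(n_cpus)
        if waker != wakee:
            split += 1
    return split
```

Random placement splits roughly half of the 100 pairs across the two CPUs, while wake-affine placement splits none.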

I've also been told that it improves IO efficiency significantly -
obviously that depends on the system and workload.

> To demonstrate the problem, we turned off these two flags in the cpu
> sd domain and measured a stunning 2.15% performance gain!  And deleting
> all the code in try_to_wake_up() pertaining to load balancing gives us
> another 0.2% gain.
> 
> The wakeup path should be made simple: just put the waking task on
> the runqueue of the CPU it previously ran on.  Simple and elegant.
> 
> I'm proposing we either delete these two flags or make them runtime
> configurable.

There have been lots of changes since 2.6.12, including less aggressive
wake balancing.

I hear you might be having problems with recent 2.6.13 kernels? If so,
it would be really good to have a look at that before 2.6.13 goes out
the door.

I appreciate all the effort you're putting into this!

I appreciate all the effort you're putting into this!

Nick

--
SUSE Labs, Novell Inc.



Re: Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags

2005-07-28 Thread Nick Piggin

Chen, Kenneth W wrote:

What sort of workload needs SD_WAKE_AFFINE and SD_WAKE_BALANCE?
SD_WAKE_AFFINE are not useful in conjunction with interrupt binding.
In fact, it creates more harm than usefulness, causing detrimental
process migration and destroy process cache affinity etc.  Also
SD_WAKE_BALANCE is giving us performance grief with our industry
standard OLTP workload.



The periodic load balancer basically makes completely undirected,
random choices when picking which tasks to move where.

Wake balancing provides an opportunity to provide some input bias
into the load balancer.

For example, if you started 100 pairs of tasks which communicate
through a pipe. On a 2 CPU system without wake balancing, probably
half of the pairs will be on different CPUs. With wake balancing,
it should be much better.

I've also been told that it impoves IO efficiency significantly -
obviously that depends on the system and workload.


> To demonstrate the problem, we turned off these two flags in the cpu
> sd domain and measured a stunning 2.15% performance gain!  And deleting
> all the code in try_to_wake_up() pertaining to load balancing gives us
> another 0.2% gain.
>
> The wake-up path should be kept simple: just put the waking task on the
> runqueue of the CPU it previously ran on.  Simple and elegant.
>
> I'm proposing we either delete these two flags or make them runtime
> configurable.



There have been lots of changes since 2.6.12. Including less aggressive
wake balancing.

I hear you might be having problems with recent 2.6.13 kernels? If so,
it would be really good to have a look at that before 2.6.13 goes out the
door.

I appreciate all the effort you're putting into this!

Nick

--
SUSE Labs, Novell Inc.





RE: Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags

2005-07-28 Thread Chen, Kenneth W
Nick Piggin wrote on Thursday, July 28, 2005 4:35 PM
> Wake balancing provides an opportunity to provide some input bias
> into the load balancer.
>
> For example, if you started 100 pairs of tasks which communicate
> through a pipe. On a 2 CPU system without wake balancing, probably
> half of the pairs will be on different CPUs. With wake balancing,
> it should be much better.

Shouldn't the pipe code use synchronous wakeup?


> I hear you might be having problems with recent 2.6.13 kernels? If so,
> it would be really good to have a look at that before 2.6.13 goes out the
> door.

Yes I do :-(, apparently bumping up cache_hot_time won't give us the
performance boost we used to see.

- Ken



Re: Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags

2005-07-28 Thread Nick Piggin

Chen, Kenneth W wrote:
> Nick Piggin wrote on Thursday, July 28, 2005 4:35 PM
>> Wake balancing provides an opportunity to provide some input bias
>> into the load balancer.
>>
>> For example, if you started 100 pairs of tasks which communicate
>> through a pipe. On a 2 CPU system without wake balancing, probably
>> half of the pairs will be on different CPUs. With wake balancing,
>> it should be much better.
>
> Shouldn't the pipe code use synchronous wakeup?




Well pipes are just an example. It could be any type of communication.
What's more, even the synchronous wakeup uses the wake balancing path
(although that could be modified to only do wake balancing for synch
wakeups, I'd have to be convinced we should special case pipes and not
eg. semaphores or AF_UNIX sockets).




>> I hear you might be having problems with recent 2.6.13 kernels? If so,
>> it would be really good to have a look at that before 2.6.13 goes out the
>> door.
>
> Yes I do :-(, apparently bumping up cache_hot_time won't give us the
> performance boost we used to see.



OK there are probably a number of things we can explore depending on
what are the symptoms (eg. excessive idle time, bad cache performance).

Unfortunately it is kind of difficult to tune 2.6.13 on the basis of
2.6.12 results - although that's not to say it won't indicate a good
avenue to investigate.






RE: Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags

2005-07-28 Thread Chen, Kenneth W
Nick Piggin wrote on Thursday, July 28, 2005 6:25 PM
> Well pipes are just an example. It could be any type of communication.
> What's more, even the synchronous wakeup uses the wake balancing path
> (although that could be modified to only do wake balancing for synch
> wakeups, I'd have to be convinced we should special case pipes and not
> eg. semaphores or AF_UNIX sockets).


Why is the normal load balance path not enough (or not able to do the
right thing)?  The rebalance_tick and idle_balance ought to be enough to
take care of the imbalance.  What makes load balancing in the wake-up
path so special?

Oh, I'd like to hear your opinion on what to do with these two flags, make
them runtime configurable? (I'm of the opinion to delete them altogether)

- Ken



Re: Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags

2005-07-28 Thread Nick Piggin

Chen, Kenneth W wrote:
> Nick Piggin wrote on Thursday, July 28, 2005 6:25 PM
>> Well pipes are just an example. It could be any type of communication.
>> What's more, even the synchronous wakeup uses the wake balancing path
>> (although that could be modified to only do wake balancing for synch
>> wakeups, I'd have to be convinced we should special case pipes and not
>> eg. semaphores or AF_UNIX sockets).
>
> Why is the normal load balance path not enough (or not able to do the
> right thing)?  The rebalance_tick and idle_balance ought to be enough to
> take care of the imbalance.  What makes load balancing in the wake-up
> path so special?




Well the normal load balancing path treats all tasks the same, while
the wake path knows if a CPU is waking a remote task and can attempt
to maximise the number of local wakeups.


> Oh, I'd like to hear your opinion on what to do with these two flags, make
> them runtime configurable? (I'm of the opinion to delete them altogether)




I'd like to try making them less aggressive first if possible.






RE: Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags

2005-07-28 Thread Chen, Kenneth W
Nick Piggin wrote on Thursday, July 28, 2005 6:46 PM
> I'd like to try making them less aggressive first if possible.

Well, that's exactly what I'm trying to do: make them not aggressive
at all by not performing any load balance :-)  The workload gets maximum
benefit with zero aggressiveness.



Re: Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags

2005-07-28 Thread Nick Piggin

Chen, Kenneth W wrote:
> Nick Piggin wrote on Thursday, July 28, 2005 6:46 PM
>> I'd like to try making them less aggressive first if possible.
>
> Well, that's exactly what I'm trying to do: make them not aggressive
> at all by not performing any load balance :-)  The workload gets maximum
> benefit with zero aggressiveness.




Unfortunately we can't forget about other workloads, and we're
trying to stay away from runtime tunables in the scheduler.

If we can get performance to within a couple of tenths of a percent
of the zero balancing case, then that would be preferable I think.



