RE: Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags
Ingo Molnar wrote on Friday, July 29, 2005 4:26 AM
> * Chen, Kenneth W <[EMAIL PROTECTED]> wrote:
> > To demonstrate the problem, we turned off these two flags in the cpu
> > sd domain and measured a stunning 2.15% performance gain! And
> > deleting all the code in try_to_wake_up() pertaining to load
> > balancing gives us another 0.2% gain.
>
> another thing: do you have a HT-capable ia64 CPU, and do you have
> CONFIG_SCHED_SMT turned on? If yes then could you try to turn off
> SD_WAKE_IDLE too, i found it to bring further performance improvements
> in certain workloads.

The scheduler experiments done so far are on a non-SMT CPU (Madison
processor). We have another db setup with a multi-thread capable ia64
CPU (Montecito; to be precise, it is SoEMT capable). We are just about
to do scheduler experiments on that setup.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags
* Nick Piggin <[EMAIL PROTECTED]> wrote:
> Chen, Kenneth W wrote:
> > Nick Piggin wrote on Thursday, July 28, 2005 7:01 PM
> >
> > This clearly outlines an issue with the implementation. Optimizing for
> > one type of workload has a detrimental effect on another workload, and
> > vice versa.
>
> Yep. That comes up fairly regularly when tuning the scheduler :(

in this particular case we can clearly separate the two workloads
though: CPU-overload (Ken's benchmark) vs. half-load (3-task tbench). So
by checking for migration target/source idleness we can have a hard
separator for wakeup balancing. (whether it works out for both types of
workloads remains to be seen)

	Ingo
Re: Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags
* Chen, Kenneth W <[EMAIL PROTECTED]> wrote:
> To demonstrate the problem, we turned off these two flags in the cpu
> sd domain and measured a stunning 2.15% performance gain! And
> deleting all the code in try_to_wake_up() pertaining to load
> balancing gives us another 0.2% gain.

another thing: do you have a HT-capable ia64 CPU, and do you have
CONFIG_SCHED_SMT turned on? If yes then could you try to turn off
SD_WAKE_IDLE too, i found it to bring further performance improvements
in certain workloads.

	Ingo
Re: Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags
* Nick Piggin <[EMAIL PROTECTED]> wrote:
> Well, you can easily see suboptimal scheduling decisions on many
> programs with lots of interprocess communication. For example, tbench
> on a dual Xeon:
>
> processes       1              2              3              4
>
> 2.6.13-rc4:     187, 183, 179  260, 259, 256  340, 320, 349  504, 496, 500
> no wake-bal:    180, 180, 177  254, 254, 253  268, 270, 348  345, 290, 500
>
> Numbers are MB/s, higher is better.

i cannot see any difference with/without wake-balancing in this
workload, on a dual Xeon. Could you try the quick hack below and do:

  echo 1 > /proc/sys/kernel/panic   # turn on wake-balancing
  echo 0 > /proc/sys/kernel/panic   # turn off wake-balancing

does the runtime switching show any effects on the throughput numbers
tbench is showing? I'm using dbench-3.03. (i only checked the status
numbers, didnt do full runs) (did you have SCHED_SMT enabled?)

	Ingo

 kernel/sched.c |    2 ++
 1 files changed, 2 insertions(+)

Index: linux-prefetch-task/kernel/sched.c
===================================================================
--- linux-prefetch-task.orig/kernel/sched.c
+++ linux-prefetch-task/kernel/sched.c
@@ -1155,6 +1155,8 @@ static int try_to_wake_up(task_t * p, un
 		goto out_activate;
 
 	new_cpu = cpu;
+	if (!panic_timeout)
+		goto out_set_cpu;
 
 	schedstat_inc(rq, ttwu_cnt);
 	if (cpu == this_cpu) {
Re: Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags
* Nick Piggin <[EMAIL PROTECTED]> wrote:
> > > processes       1              2              3              4
> > >
> > > 2.6.13-rc4:     187, 183, 179  260, 259, 256  340, 320, 349  504, 496, 500
> > > no wake-bal:    180, 180, 177  254, 254, 253  268, 270, 348  345, 290, 500
> > >
> > > Numbers are MB/s, higher is better.
> >
> > what type of network was used - localhost or a real one?
>
> Localhost. Yeah it isn't a real world test, but it does show the
> erratic behaviour without wake affine.

yeah - fine enough. (It's not representative for IO workloads, but it's
representative for local IPC workloads, just wanted to know precisely
which workload it is.)

	Ingo
Re: Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags
Ingo Molnar wrote:
> * Nick Piggin <[EMAIL PROTECTED]> wrote:
> > processes       1              2              3              4
> >
> > 2.6.13-rc4:     187, 183, 179  260, 259, 256  340, 320, 349  504, 496, 500
> > no wake-bal:    180, 180, 177  254, 254, 253  268, 270, 348  345, 290, 500
> >
> > Numbers are MB/s, higher is better.
>
> what type of network was used - localhost or a real one?

Localhost. Yeah it isn't a real world test, but it does show the
erratic behaviour without wake affine.

I don't have a setup with multiple fast network adapters, otherwise I
would have run a similar test using a real network.
Re: Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags
* Nick Piggin <[EMAIL PROTECTED]> wrote:
> Well, you can easily see suboptimal scheduling decisions on many
> programs with lots of interprocess communication. For example, tbench
> on a dual Xeon:
>
> processes       1              2              3              4
>
> 2.6.13-rc4:     187, 183, 179  260, 259, 256  340, 320, 349  504, 496, 500
> no wake-bal:    180, 180, 177  254, 254, 253  268, 270, 348  345, 290, 500
>
> Numbers are MB/s, higher is better.

what type of network was used - localhost or a real one?

	Ingo
Re: Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags
Chen, Kenneth W wrote:
> Nick Piggin wrote on Thursday, July 28, 2005 7:01 PM
>
> This clearly outlines an issue with the implementation. Optimizing for
> one type of workload has a detrimental effect on another workload, and
> vice versa.

Yep. That comes up fairly regularly when tuning the scheduler :(

> I won't try to compromise between the two. If you do so, we would end
> up with two half-baked raw turkeys. Making load balancing in the
> wake-up path less aggressive would probably reduce performance for the
> type of workload you quoted earlier, and for the db workload we don't
> want any of it at all, not even the code to determine whether it
> should be balanced or not.

Well, that remains to be seen. If it can be made _smarter_, then you
may not have to take such a big compromise. But either way, there will
have to be some compromise made. At the very least you have to find
some acceptable default.

> Do you have an example workload you mentioned earlier that depends on
> SD_WAKE_BALANCE? I would like to experiment with it so we can move
> this forward instead of paper talk.

Well, you can easily see suboptimal scheduling decisions on many
programs with lots of interprocess communication. For example, tbench
on a dual Xeon:

processes       1              2              3              4

2.6.13-rc4:     187, 183, 179  260, 259, 256  340, 320, 349  504, 496, 500
no wake-bal:    180, 180, 177  254, 254, 253  268, 270, 348  345, 290, 500

Numbers are MB/s, higher is better.

Networking or other IO workloads where processes are tightly coupled to
a specific adapter / interrupt source can also see pretty good gains.

-- 
SUSE Labs, Novell Inc.
RE: Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags
Nick Piggin wrote on Thursday, July 28, 2005 7:01 PM
> Chen, Kenneth W wrote:
> > Well, that's exactly what I'm trying to do: make them not aggressive
> > at all by not performing any load balance :-) The workload gets
> > maximum benefit with zero aggressiveness.
>
> Unfortunately we can't forget about other workloads, and we're
> trying to stay away from runtime tunables in the scheduler.

This clearly outlines an issue with the implementation. Optimizing for
one type of workload has a detrimental effect on another workload, and
vice versa.

> If we can get performance to within a couple of tenths of a percent
> of the zero balancing case, then that would be preferable I think.

I won't try to compromise between the two. If you do so, we would end
up with two half-baked raw turkeys. Making load balancing in the
wake-up path less aggressive would probably reduce performance for the
type of workload you quoted earlier, and for the db workload we don't
want any of it at all, not even the code to determine whether it should
be balanced or not.

Do you have an example workload you mentioned earlier that depends on
SD_WAKE_BALANCE? I would like to experiment with it so we can move this
forward instead of paper talk.

- Ken
Re: Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags
Chen, Kenneth W wrote:
> Nick Piggin wrote on Thursday, July 28, 2005 6:46 PM
> > I'd like to try making them less aggressive first if possible.
>
> Well, that's exactly what I'm trying to do: make them not aggressive
> at all by not performing any load balance :-) The workload gets
> maximum benefit with zero aggressiveness.

Unfortunately we can't forget about other workloads, and we're trying
to stay away from runtime tunables in the scheduler.

If we can get performance to within a couple of tenths of a percent of
the zero balancing case, then that would be preferable I think.
RE: Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags
Nick Piggin wrote on Thursday, July 28, 2005 6:46 PM
> I'd like to try making them less aggressive first if possible.

Well, that's exactly what I'm trying to do: make them not aggressive
at all by not performing any load balance :-) The workload gets maximum
benefit with zero aggressiveness.
Re: Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags
Chen, Kenneth W wrote:
> Nick Piggin wrote on Thursday, July 28, 2005 6:25 PM
> > Well pipes are just an example. It could be any type of communication.
> > What's more, even the synchronous wakeup uses the wake balancing path
> > (although that could be modified to only do wake balancing for synch
> > wakeups, I'd have to be convinced we should special-case pipes and not
> > eg. semaphores or AF_UNIX sockets).
>
> Why is the normal load balance path not enough (or not able to do the
> right thing)? The rebalance_tick and idle_balance ought to be enough
> to take care of the imbalance. What makes load balancing in the
> wake-up path so special?

Well, the normal load balancing path treats all tasks the same, while
the wake path knows if a CPU is waking a remote task and can attempt to
maximise the number of local wakeups.

> Oh, I'd like to hear your opinion on what to do with these two flags,
> make them runtime configurable? (I'm of the opinion to delete them
> altogether)

I'd like to try making them less aggressive first if possible.
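[Editor's note: the distinction Nick draws above - the wake path knows both the
waker's CPU and the CPU the sleeper last ran on, while the periodic balancer
sees only aggregate load - can be illustrated with a toy model. This is a
hypothetical sketch, not the kernel's actual try_to_wake_up() logic; the
function name and load inputs are invented for illustration.]

```c
#include <assert.h>

/*
 * Toy model of a wake-affine style decision (hypothetical): on wakeup,
 * pull the sleeping task to the waker's CPU when the waker's runqueue
 * is no busier than the task's previous CPU, so the wakeup becomes a
 * local one (shared cache, no cross-CPU IPI). Otherwise preserve the
 * task's old cache affinity. The periodic balancer cannot make this
 * call because it does not know who is waking whom.
 */
static int wake_affine_cpu(int prev_cpu, int waker_cpu,
			   int prev_load, int waker_load)
{
	if (prev_cpu == waker_cpu)
		return prev_cpu;	/* already local, nothing to do */

	if (waker_load <= prev_load)
		return waker_cpu;	/* pull: make the wakeup local */

	return prev_cpu;		/* waker busier: keep old affinity */
}
```

In Ken's zero-balancing proposal, the whole function would collapse to
`return prev_cpu;` - always waking the task where it last ran.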
RE: Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags
Nick Piggin wrote on Thursday, July 28, 2005 6:25 PM
> Well pipes are just an example. It could be any type of communication.
> What's more, even the synchronous wakeup uses the wake balancing path
> (although that could be modified to only do wake balancing for synch
> wakeups, I'd have to be convinced we should special-case pipes and not
> eg. semaphores or AF_UNIX sockets).

Why is the normal load balance path not enough (or not able to do the
right thing)? The rebalance_tick and idle_balance ought to be enough to
take care of the imbalance. What makes load balancing in the wake-up
path so special?

Oh, I'd like to hear your opinion on what to do with these two flags,
make them runtime configurable? (I'm of the opinion to delete them
altogether)

- Ken
Re: Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags
Chen, Kenneth W wrote:
> Nick Piggin wrote on Thursday, July 28, 2005 4:35 PM
> > Wake balancing provides an opportunity to provide some input bias
> > into the load balancer.
> >
> > For example, if you started 100 pairs of tasks which communicate
> > through a pipe: on a 2 CPU system without wake balancing, probably
> > half of the pairs will be on different CPUs. With wake balancing,
> > it should be much better.
>
> Shouldn't the pipe code use synchronous wakeup?

Well pipes are just an example. It could be any type of communication.
What's more, even the synchronous wakeup uses the wake balancing path
(although that could be modified to only do wake balancing for synch
wakeups, I'd have to be convinced we should special-case pipes and not
eg. semaphores or AF_UNIX sockets).

> > I hear you might be having problems with recent 2.6.13 kernels? If
> > so, it would be really good to have a look at that before 2.6.13
> > goes out the door.
>
> Yes I do :-(, apparently bumping up cache_hot_time won't give us the
> performance boost we used to see.

OK, there are probably a number of things we can explore depending on
what the symptoms are (eg. excessive idle time, bad cache performance).
Unfortunately it is kind of difficult to tune 2.6.13 on the basis of
2.6.12 results - although that's not to say it won't indicate a good
avenue to investigate.
RE: Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags
Nick Piggin wrote on Thursday, July 28, 2005 4:35 PM
> Wake balancing provides an opportunity to provide some input bias
> into the load balancer.
>
> For example, if you started 100 pairs of tasks which communicate
> through a pipe: on a 2 CPU system without wake balancing, probably
> half of the pairs will be on different CPUs. With wake balancing,
> it should be much better.

Shouldn't the pipe code use synchronous wakeup?

> I hear you might be having problems with recent 2.6.13 kernels? If so,
> it would be really good to have a look at that before 2.6.13 goes out
> the door.

Yes I do :-(, apparently bumping up cache_hot_time won't give us the
performance boost we used to see.

- Ken
Re: Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags
Chen, Kenneth W wrote:
> What sort of workload needs SD_WAKE_AFFINE and SD_WAKE_BALANCE?
> SD_WAKE_AFFINE is not useful in conjunction with interrupt binding.
> In fact, it creates more harm than usefulness, causing detrimental
> process migration, destroying process cache affinity, etc. Also,
> SD_WAKE_BALANCE is giving us performance grief with our industry
> standard OLTP workload.

The periodic load balancer basically makes completely undirected,
random choices when picking which tasks to move where. Wake balancing
provides an opportunity to provide some input bias into the load
balancer.

For example, if you started 100 pairs of tasks which communicate
through a pipe: on a 2 CPU system without wake balancing, probably
half of the pairs will be on different CPUs. With wake balancing, it
should be much better.

I've also been told that it improves IO efficiency significantly -
obviously that depends on the system and workload.

> To demonstrate the problem, we turned off these two flags in the cpu
> sd domain and measured a stunning 2.15% performance gain! And
> deleting all the code in try_to_wake_up() pertaining to load
> balancing gives us another 0.2% gain. The wake-up path should be made
> simple: just put the waking task on the runqueue of the cpu it
> previously ran on. Simple and elegant.
>
> I'm proposing we either delete these two flags or make them run time
> configurable.

There have been lots of changes since 2.6.12, including less aggressive
wake balancing. I hear you might be having problems with recent 2.6.13
kernels? If so, it would be really good to have a look at that before
2.6.13 goes out the door.

I appreciate all the effort you're putting into this!

Nick

-- 
SUSE Labs, Novell Inc.
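[Editor's note: the pipe-pair workload Nick describes above can be sketched as
a pair of processes ping-ponging a token through pipes. This is a hypothetical
minimal demo of the workload shape (one round trip of one pair), not a
benchmark; run many such pairs concurrently to reproduce the scheduling
behaviour being discussed.]

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * One communicating pair: the parent sends a message down one pipe,
 * the child echoes it back up another, and the parent collects the
 * reply. Each write triggers a wakeup of the other end - exactly the
 * wakeups that the SD_WAKE_AFFINE / SD_WAKE_BALANCE logic decides how
 * to place. Returns 0 on success, -1 on failure.
 */
static int pipe_pair_roundtrip(const char *msg, char *reply, size_t len)
{
	int to_child[2], to_parent[2];
	pid_t pid;

	if (pipe(to_child) < 0 || pipe(to_parent) < 0)
		return -1;

	pid = fork();
	if (pid < 0)
		return -1;

	if (pid == 0) {
		/* child: echo whatever arrives back to the parent */
		char buf[64];
		ssize_t n = read(to_child[0], buf, sizeof(buf));
		if (n > 0)
			write(to_parent[1], buf, (size_t)n);
		_exit(0);
	}

	/* parent: send the token (incl. NUL), wait for the echo */
	write(to_child[1], msg, strlen(msg) + 1);
	read(to_parent[0], reply, len);
	waitpid(pid, NULL, 0);
	return 0;
}
```

With wake balancing, both ends of such a pair tend to end up on one
CPU, so every wakeup is local; without it, placement is left to the
undirected periodic balancer, which is where the erratic tbench
numbers come from.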
Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags
What sort of workload needs SD_WAKE_AFFINE and SD_WAKE_BALANCE?
SD_WAKE_AFFINE is not useful in conjunction with interrupt binding. In
fact, it creates more harm than usefulness, causing detrimental process
migration, destroying process cache affinity, etc. Also,
SD_WAKE_BALANCE is giving us performance grief with our industry
standard OLTP workload.

To demonstrate the problem, we turned off these two flags in the cpu
sd domain and measured a stunning 2.15% performance gain! And deleting
all the code in try_to_wake_up() pertaining to load balancing gives us
another 0.2% gain. The wake-up path should be made simple: just put
the waking task on the runqueue of the cpu it previously ran on.
Simple and elegant.

I'm proposing we either delete these two flags or make them run time
configurable.

- Ken

--- linux-2.6.12/include/linux/topology.h.orig	2005-07-28 15:54:05.007399685 -0700
+++ linux-2.6.12/include/linux/topology.h	2005-07-28 15:54:39.29215 -0700
@@ -118,9 +118,7 @@
 	.flags			= SD_LOAD_BALANCE	\
 				| SD_BALANCE_NEWIDLE	\
 				| SD_BALANCE_EXEC	\
-				| SD_WAKE_AFFINE	\
-				| SD_WAKE_IDLE		\
-				| SD_WAKE_BALANCE,	\
+				| SD_WAKE_IDLE,		\
 	.last_balance		= jiffies,		\
 	.balance_interval	= 1,			\
 	.nr_balance_failed	= 0,			\
Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags
What sort of workload needs SD_WAKE_AFFINE and SD_WAKE_BALANCE? SD_WAKE_AFFINE are not useful in conjunction with interrupt binding. In fact, it creates more harm than usefulness, causing detrimental process migration and destroy process cache affinity etc. Also SD_WAKE_BALANCE is giving us performance grief with our industry standard OLTP workload. To demonstrate the problem, we turned off these two flags in the cpu sd domain and measured a stunning 2.15% performance gain! And deleting all the code in the try_to_wake_up() pertain to load balancing gives us another 0.2% gain. The wake up patch should be made simple, just put the waking task on the previously ran cpu runqueue. Simple and elegant. I'm proposing we either delete these two flags or make them run time configurable. - Ken --- linux-2.6.12/include/linux/topology.h.orig 2005-07-28 15:54:05.007399685 -0700 +++ linux-2.6.12/include/linux/topology.h 2005-07-28 15:54:39.29215 -0700 @@ -118,9 +118,7 @@ .flags = SD_LOAD_BALANCE \ | SD_BALANCE_NEWIDLE\ | SD_BALANCE_EXEC \ - | SD_WAKE_AFFINE\ - | SD_WAKE_IDLE \ - | SD_WAKE_BALANCE, \ + | SD_WAKE_IDLE, \ .last_balance = jiffies, \ .balance_interval = 1,\ .nr_balance_failed = 0,\ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags
Chen, Kenneth W wrote:

> What sort of workload needs SD_WAKE_AFFINE and SD_WAKE_BALANCE?
> SD_WAKE_AFFINE is not useful in conjunction with interrupt binding. In
> fact, it does more harm than good, causing detrimental process
> migration and destroying process cache affinity. SD_WAKE_BALANCE is
> also giving us performance grief with our industry-standard OLTP
> workload.

The periodic load balancer basically makes completely undirected,
random choices when picking which tasks to move where. Wake balancing
provides an opportunity to provide some input bias into the load
balancer.

For example, suppose you started 100 pairs of tasks which communicate
through a pipe. On a 2-CPU system without wake balancing, probably half
of the pairs will end up on different CPUs; with wake balancing, it
should be much better. I've also been told that it improves IO
efficiency significantly - obviously that depends on the system and
workload.

> To demonstrate the problem, we turned off these two flags in the cpu
> sd domain and measured a stunning 2.15% performance gain! And deleting
> all the code in try_to_wake_up() pertaining to load balancing gives us
> another 0.2% gain. The wake up path should be made simple: just put
> the waking task on the runqueue of the cpu it previously ran on.
> Simple and elegant. I'm proposing we either delete these two flags or
> make them run time configurable.

There have been lots of changes since 2.6.12, including less aggressive
wake balancing. I hear you might be having problems with recent 2.6.13
kernels? If so, it would be really good to have a look at that before
2.6.13 goes out the door.

I appreciate all the effort you're putting into this!

Nick

--
SUSE Labs, Novell Inc.
RE: Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags
Nick Piggin wrote on Thursday, July 28, 2005 4:35 PM:

> Wake balancing provides an opportunity to provide some input bias into
> the load balancer. For example, suppose you started 100 pairs of tasks
> which communicate through a pipe. On a 2-CPU system without wake
> balancing, probably half of the pairs will end up on different CPUs;
> with wake balancing, it should be much better.

Shouldn't the pipe code use synchronous wakeups?

> I hear you might be having problems with recent 2.6.13 kernels? If so,
> it would be really good to have a look at that before 2.6.13 goes out
> the door.

Yes I do :-(. Apparently bumping up cache_hot_time won't give us the
performance boost we used to see.

- Ken
Re: Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags
Chen, Kenneth W wrote:

> Nick Piggin wrote on Thursday, July 28, 2005 4:35 PM:
>> Wake balancing provides an opportunity to provide some input bias
>> into the load balancer. For example, suppose you started 100 pairs
>> of tasks which communicate through a pipe. On a 2-CPU system without
>> wake balancing, probably half of the pairs will end up on different
>> CPUs; with wake balancing, it should be much better.
>
> Shouldn't the pipe code use synchronous wakeups?

Well, pipes are just an example; it could be any type of communication.
What's more, even the synchronous wakeup uses the wake-balancing path
(although that could be modified to only do wake balancing for sync
wakeups, I'd have to be convinced we should special-case pipes and not
e.g. semaphores or AF_UNIX sockets).

>> I hear you might be having problems with recent 2.6.13 kernels? If
>> so, it would be really good to have a look at that before 2.6.13
>> goes out the door.
>
> Yes I do :-(. Apparently bumping up cache_hot_time won't give us the
> performance boost we used to see.

OK, there are probably a number of things we can explore depending on
what the symptoms are (e.g. excessive idle time, bad cache
performance). Unfortunately it is kind of difficult to tune 2.6.13 on
the basis of 2.6.12 results - although that's not to say it won't
indicate a good avenue to investigate.
RE: Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags
Nick Piggin wrote on Thursday, July 28, 2005 6:25 PM:

> Well, pipes are just an example; it could be any type of
> communication. What's more, even the synchronous wakeup uses the
> wake-balancing path (although that could be modified to only do wake
> balancing for sync wakeups, I'd have to be convinced we should
> special-case pipes and not e.g. semaphores or AF_UNIX sockets).

Why is the normal load-balance path not enough (or not able to do the
right thing)? The rebalance_tick and idle_balance ought to be enough to
take care of the imbalance. What makes load balancing in the wakeup
path so special?

Oh, I'd like to hear your opinion on what to do with these two flags -
make them runtime configurable? (I'm of the opinion to delete them
altogether.)

- Ken
Re: Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags
Chen, Kenneth W wrote:

> Why is the normal load-balance path not enough (or not able to do the
> right thing)? The rebalance_tick and idle_balance ought to be enough
> to take care of the imbalance. What makes load balancing in the
> wakeup path so special?

Well, the normal load-balancing path treats all tasks the same, while
the wake path knows when a CPU is waking a remote task and can attempt
to maximise the number of local wakeups.

> Oh, I'd like to hear your opinion on what to do with these two flags
> - make them runtime configurable? (I'm of the opinion to delete them
> altogether.)

I'd like to try making them less aggressive first, if possible.
RE: Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags
Nick Piggin wrote on Thursday, July 28, 2005 6:46 PM:

> I'd like to try making them less aggressive first, if possible.

Well, that's exactly what I'm trying to do: make them not aggressive at
all by not performing any load balancing :-) The workload gets maximum
benefit with zero aggressiveness.
Re: Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags
Chen, Kenneth W wrote:

> Well, that's exactly what I'm trying to do: make them not aggressive
> at all by not performing any load balancing :-) The workload gets
> maximum benefit with zero aggressiveness.

Unfortunately we can't forget about other workloads, and we're trying
to stay away from runtime tunables in the scheduler. If we can get
performance to within a couple of tenths of a percent of the
zero-balancing case, then that would be preferable, I think.