Re: [PATCH v3 17/22] sched: packing small tasks in wake/exec balancing
On 01/16/2013 11:08 PM, Morten Rasmussen wrote:
> On Wed, Jan 16, 2013 at 07:32:49AM +, Alex Shi wrote:
>> On 01/15/2013 01:00 AM, Morten Rasmussen wrote:
>>>>> Why multiply rq->util by nr_running?
>>>>>
>>>>> Let's take an example where rq->util = 50, nr_running = 2, and
>>>>> putil = 10. In this case the value of putil doesn't really matter,
>>>>> as vacancy would be negative anyway since FULL_UTIL - rq->util *
>>>>> nr_running is -1. However, with rq->util = 50 there should be
>>>>> plenty of spare cpu time to take another task.
>>>>
>>>> For this example, the util may not be full because the task has just
>>>> woken up; it could still go on to run full time. So I try to give it
>>>> a large guessed load.
>>>
>>> I don't see why rq->util should be treated differently depending on
>>> the number of tasks causing the load. rq->util = 50 means that the
>>> cpu is busy about 50% of the time no matter how many tasks contribute
>>> to that load.
>>>
>>> If nr_running = 1 instead in my example, you would consider the cpu
>>> vacant if putil = 6, but if nr_running > 1 you would not. Why should
>>> the two scenarios be treated differently?
>>>
>>>>> Also, why multiply putil by 8? rq->util must be very close to 0 for
>>>>> vacancy to be positive if putil is close to 12 (12.5%).
>>>>
>>>> I just want to pack small-util tasks, since packing can hurt
>>>> performance.
>>>
>>> I agree that packing may affect performance. But why don't you reduce
>>> FULL_UTIL instead of multiplying by 8? With the current expression
>>> you will not pack a 10% task if rq->util = 20 and nr_running = 1, but
>>> you would pack a 6% task even if rq->util = 50 and the resulting cpu
>>> load is much higher.
>>
>> Yes, the threshold has no strong theoretical or experimental support.
>> I tried cyclictest, which Vincent used, but that case's load avg is
>> too small to be caught, so I just use half of Vincent's value, 12.5%.
>> If you have a more reasonable value, let me know.
>>
>> As to nr_running being used as a multiplier, it is based on 2 reasons:
>>
>> 1, load avg/util needs 345ms to accumulate to 100%. So even if a task
>> costs full cpu time, it still spends 345ms with rq->util < 100%.
>
> I agree that load avg may not be accurate, especially for new tasks.
> But why use it if you don't trust its value anyway?
>
> The load avg (sum/period) of a new task will reach 100% instantly if
> the task is consuming all the cpu time it can get. An old task can
> reach 50% within 32ms. So you should fairly quickly be able to see
> whether it is a light task or not. You may under-estimate its load in
> the beginning, but only for a very short time.

This packing is done at wakeup time, so there is not even 'a very short
time' here. :)

>> 2, if there are more tasks, like 2 tasks running on one cpu, they may
>> have the capacity to burn 200% cpu time, while the biggest rq->util
>> is still 100%.
>
> If you want a better metric for how much cpu time the tasks on the
> runqueue could potentially use, I would suggest using
> cfs_rq->runnable_load_avg, which is the load_avg_contrib sum of all
> tasks on the runqueue. It would give you 200% in your example above.

runnable_load_avg also needs much time to accumulate its value, so it
is no better than util here.

> On the other hand, I think rq->util is fine for this purpose. If
> rq->util < 100% you know for sure that the cpu is not fully utilized
> no matter how many tasks you have on the runqueue. So as long as
> rq->util is well below 100% (like < 50%) it should be safe to pack
> more small tasks on that cpu even if it has multiple tasks running
> already.
>
>> Figuring out precise utils is complicated and costs much, so I do
>> this simple calculation. It is not very precise, but it is efficient
>> and more biased toward performance.
>
> It is indeed very biased towards performance. I would prefer more
> focus on saving power in a power scheduling policy :)

Agreed, and I don't refuse to change the criteria for power. :) But
without reliable benchmarks or data, everything is a guess.

--
Thanks
    Alex
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v3 17/22] sched: packing small tasks in wake/exec balancing
On Wed, Jan 16, 2013 at 07:32:49AM +, Alex Shi wrote:
> On 01/15/2013 01:00 AM, Morten Rasmussen wrote:
>>>> Why multiply rq->util by nr_running?
>>>>
>>>> Let's take an example where rq->util = 50, nr_running = 2, and
>>>> putil = 10. In this case the value of putil doesn't really matter,
>>>> as vacancy would be negative anyway since FULL_UTIL - rq->util *
>>>> nr_running is -1. However, with rq->util = 50 there should be plenty
>>>> of spare cpu time to take another task.
>>>
>>> For this example, the util may not be full because the task has just
>>> woken up; it could still go on to run full time. So I try to give it
>>> a large guessed load.
>>
>> I don't see why rq->util should be treated differently depending on
>> the number of tasks causing the load. rq->util = 50 means that the cpu
>> is busy about 50% of the time no matter how many tasks contribute to
>> that load.
>>
>> If nr_running = 1 instead in my example, you would consider the cpu
>> vacant if putil = 6, but if nr_running > 1 you would not. Why should
>> the two scenarios be treated differently?
>>
>>>> Also, why multiply putil by 8? rq->util must be very close to 0 for
>>>> vacancy to be positive if putil is close to 12 (12.5%).
>>>
>>> I just want to pack small-util tasks, since packing can hurt
>>> performance.
>>
>> I agree that packing may affect performance. But why don't you reduce
>> FULL_UTIL instead of multiplying by 8? With the current expression you
>> will not pack a 10% task if rq->util = 20 and nr_running = 1, but you
>> would pack a 6% task even if rq->util = 50 and the resulting cpu load
>> is much higher.
>
> Yes, the threshold has no strong theoretical or experimental support.
> I tried cyclictest, which Vincent used, but that case's load avg is too
> small to be caught, so I just use half of Vincent's value, 12.5%. If
> you have a more reasonable value, let me know.
>
> As to nr_running being used as a multiplier, it is based on 2 reasons:
>
> 1, load avg/util needs 345ms to accumulate to 100%. So even if a task
> costs full cpu time, it still spends 345ms with rq->util < 100%.

I agree that load avg may not be accurate, especially for new tasks. But
why use it if you don't trust its value anyway?

The load avg (sum/period) of a new task will reach 100% instantly if the
task is consuming all the cpu time it can get. An old task can reach 50%
within 32ms. So you should fairly quickly be able to see whether it is a
light task or not. You may under-estimate its load in the beginning, but
only for a very short time.

> 2, if there are more tasks, like 2 tasks running on one cpu, they may
> have the capacity to burn 200% cpu time, while the biggest rq->util is
> still 100%.

If you want a better metric for how much cpu time the tasks on the
runqueue could potentially use, I would suggest using
cfs_rq->runnable_load_avg, which is the load_avg_contrib sum of all
tasks on the runqueue. It would give you 200% in your example above.

On the other hand, I think rq->util is fine for this purpose. If
rq->util < 100% you know for sure that the cpu is not fully utilized no
matter how many tasks you have on the runqueue. So as long as rq->util
is well below 100% (like < 50%) it should be safe to pack more small
tasks on that cpu even if it has multiple tasks running already.

> Figuring out precise utils is complicated and costs much, so I do this
> simple calculation. It is not very precise, but it is efficient and
> more biased toward performance.

It is indeed very biased towards performance. I would prefer more focus
on saving power in a power scheduling policy :)

Morten
Re: [PATCH v3 17/22] sched: packing small tasks in wake/exec balancing
On Wed, 16 Jan 2013 14:11:53 +0800, Alex Shi wrote:
> On 01/14/2013 03:13 PM, Namhyung Kim wrote:
>> On Fri, 11 Jan 2013 11:47:03 +0800, Alex Shi wrote:
>>> Um, change it to leader_cpu?
>>
>> vacantest? ;-)
>
> Hard to find that word in Google. Are you sure it is better than
> leader_cpu? :)

Nope. My English is just bad. :) But I'm not sure "leader" is the right
one either. Why is it called "leader" in the first place?

Thanks,
Namhyung
Re: [PATCH v3 17/22] sched: packing small tasks in wake/exec balancing
On 01/15/2013 01:00 AM, Morten Rasmussen wrote:
>>> Why multiply rq->util by nr_running?
>>>
>>> Let's take an example where rq->util = 50, nr_running = 2, and
>>> putil = 10. In this case the value of putil doesn't really matter, as
>>> vacancy would be negative anyway since FULL_UTIL - rq->util *
>>> nr_running is -1. However, with rq->util = 50 there should be plenty
>>> of spare cpu time to take another task.
>>
>> For this example, the util may not be full because the task has just
>> woken up; it could still go on to run full time. So I try to give it a
>> large guessed load.
>
> I don't see why rq->util should be treated differently depending on the
> number of tasks causing the load. rq->util = 50 means that the cpu is
> busy about 50% of the time no matter how many tasks contribute to that
> load.
>
> If nr_running = 1 instead in my example, you would consider the cpu
> vacant if putil = 6, but if nr_running > 1 you would not. Why should
> the two scenarios be treated differently?
>
>>> Also, why multiply putil by 8? rq->util must be very close to 0 for
>>> vacancy to be positive if putil is close to 12 (12.5%).
>>
>> I just want to pack small-util tasks, since packing can hurt
>> performance.
>
> I agree that packing may affect performance. But why don't you reduce
> FULL_UTIL instead of multiplying by 8? With the current expression you
> will not pack a 10% task if rq->util = 20 and nr_running = 1, but you
> would pack a 6% task even if rq->util = 50 and the resulting cpu load
> is much higher.

Yes, the threshold has no strong theoretical or experimental support. I
tried cyclictest, which Vincent used, but that case's load avg is too
small to be caught, so I just use half of Vincent's value, 12.5%. If you
have a more reasonable value, let me know.

As to nr_running being used as a multiplier, it is based on 2 reasons:

1, load avg/util needs 345ms to accumulate to 100%. So even if a task
costs full cpu time, it still spends 345ms with rq->util < 100%.

2, if there are more tasks, like 2 tasks running on one cpu, they may
have the capacity to burn 200% cpu time, while the biggest rq->util is
still 100%.

Figuring out precise utils is complicated and costs much, so I do this
simple calculation. It is not very precise, but it is efficient and more
biased toward performance.

--
Thanks
    Alex
Re: [PATCH v3 17/22] sched: packing small tasks in wake/exec balancing
On 01/14/2013 03:13 PM, Namhyung Kim wrote:
> On Fri, 11 Jan 2013 11:47:03 +0800, Alex Shi wrote:
>> On 01/11/2013 01:17 AM, Morten Rasmussen wrote:
>>> On Sat, Jan 05, 2013 at 08:37:46AM +, Alex Shi wrote:
>>>> If the wake/exec task is small enough, utils < 12.5%, it will have
>>>> the chance to be packed into a cpu which is busy but still has
>>>> space to handle it.
>>>>
>>>> Signed-off-by: Alex Shi
>>>> ---
> [snip]
>>> I may be missing something, but could the expression be something
>>> like the below instead?
>>>
>>> Create a putil < 12.5% check before the loop. There is no reason to
>>> recheck it every iteration. Then:
>
> Agreed. I also suggest that the check of the local cpu be moved before
> the loop, so that it can be picked without going through the loop if
> it is vacant enough.

Yes, thanks for the suggestion!

>>> vacancy = FULL_UTIL - (rq->util + putil)
>>>
>>> should be enough?
>>>
>>>> +
>>>> +		/* bias toward local cpu */
>>>> +		if (vacancy > 0 && (i == this_cpu))
>>>> +			return i;
>>>> +
>>>> +		if (vacancy > 0 && vacancy < min_vacancy) {
>>>> +			min_vacancy = vacancy;
>>>> +			idlest = i;
>>>
>>> "idlest" may be a bit misleading here as you actually select the
>>> busiest cpu that has enough spare capacity to take the task.
>>
>> Um, change it to leader_cpu?
>
> vacantest? ;-)

Hard to find that word in Google. Are you sure it is better than
leader_cpu? :)

> Thanks,
> Namhyung

--
Thanks
    Alex
Re: [PATCH v3 17/22] sched: packing small tasks in wake/exec balancing
On Fri, Jan 11, 2013 at 03:47:03AM +, Alex Shi wrote:
> On 01/11/2013 01:17 AM, Morten Rasmussen wrote:
>> On Sat, Jan 05, 2013 at 08:37:46AM +, Alex Shi wrote:
>>> If the wake/exec task is small enough, utils < 12.5%, it will have
>>> the chance to be packed into a cpu which is busy but still has space
>>> to handle it.
>>>
>>> Signed-off-by: Alex Shi
>>> ---
>>>  kernel/sched/fair.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++------
>>>  1 file changed, 45 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> index 8d0d3af..0596e81 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> @@ -3471,19 +3471,57 @@ static inline int get_sd_sched_policy(struct sched_domain *sd,
>>>  }
>>>
>>>  /*
>>> + * find_leader_cpu - find the busiest but still has enough leisure time cpu
>>> + * among the cpus in group.
>>> + */
>>> +static int
>>> +find_leader_cpu(struct sched_group *group, struct task_struct *p, int this_cpu)
>>> +{
>>> +	unsigned vacancy, min_vacancy = UINT_MAX;
>>
>> unsigned int?
>
> yes
>
>>> +	int idlest = -1;
>>> +	int i;
>>> +	/* percentage the task's util */
>>> +	unsigned putil = p->se.avg.runnable_avg_sum * 100
>>> +		/ (p->se.avg.runnable_avg_period + 1);
>>
>> Alternatively you could use se.avg.load_avg_contrib which is the same
>> ratio scaled by the task priority (se->load.weight). In the above
>> expression you don't take priority into account.
>
> Sure, but this seems more directly meaningful.
>
>>> +
>>> +	/* Traverse only the allowed CPUs */
>>> +	for_each_cpu_and(i, sched_group_cpus(group), tsk_cpus_allowed(p)) {
>>> +		struct rq *rq = cpu_rq(i);
>>> +		int nr_running = rq->nr_running > 0 ? rq->nr_running : 1;
>>> +
>>> +		/* only pack task which putil < 12.5% */
>>> +		vacancy = FULL_UTIL - (rq->util * nr_running + putil * 8);
>>
>> I can't follow this expression.
>>
>> The variables can have the following values:
>> FULL_UTIL = 99
>> rq->util = [0..99]
>> nr_running = [1..inf]
>> putil = [0..99]
>>
>> Why multiply rq->util by nr_running?
>>
>> Let's take an example where rq->util = 50, nr_running = 2, and
>> putil = 10. In this case the value of putil doesn't really matter, as
>> vacancy would be negative anyway since FULL_UTIL - rq->util *
>> nr_running is -1. However, with rq->util = 50 there should be plenty
>> of spare cpu time to take another task.
>
> For this example, the util may not be full because the task has just
> woken up; it could still go on to run full time. So I try to give it a
> large guessed load.

I don't see why rq->util should be treated differently depending on the
number of tasks causing the load. rq->util = 50 means that the cpu is
busy about 50% of the time no matter how many tasks contribute to that
load.

If nr_running = 1 instead in my example, you would consider the cpu
vacant if putil = 6, but if nr_running > 1 you would not. Why should the
two scenarios be treated differently?

>> Also, why multiply putil by 8? rq->util must be very close to 0 for
>> vacancy to be positive if putil is close to 12 (12.5%).
>
> I just want to pack small-util tasks, since packing can hurt
> performance.

I agree that packing may affect performance. But why don't you reduce
FULL_UTIL instead of multiplying by 8? With the current expression you
will not pack a 10% task if rq->util = 20 and nr_running = 1, but you
would pack a 6% task even if rq->util = 50 and the resulting cpu load is
much higher.

>> The vacancy variable is declared unsigned, so it will underflow
>> instead of becoming negative. Is this intentional?
>
> Oops, my mistake.
>
>> I may be missing something, but could the expression be something like
>> the below instead?
>>
>> Create a putil < 12.5% check before the loop. There is no reason to
>> recheck it every iteration. Then:
>>
>> vacancy = FULL_UTIL - (rq->util + putil)
>>
>> should be enough?
>>
>>> +
>>> +		/* bias toward local cpu */
>>> +		if (vacancy > 0 && (i == this_cpu))
>>> +			return i;
>>> +
>>> +		if (vacancy > 0 && vacancy < min_vacancy) {
>>> +			min_vacancy = vacancy;
>>> +			idlest = i;
>>
>> "idlest" may be a bit misleading here as you actually select the
>> busiest cpu that has enough spare capacity to take the task.
>
> Um, change it to leader_cpu?

Fine by me.

Morten
Re: [PATCH v3 17/22] sched: packing small tasks in wake/exec balancing
On Fri, 11 Jan 2013 11:47:03 +0800, Alex Shi wrote: > On 01/11/2013 01:17 AM, Morten Rasmussen wrote: >> On Sat, Jan 05, 2013 at 08:37:46AM +, Alex Shi wrote: >>> If the wake/exec task is small enough, utils < 12.5%, it will >>> has the chance to be packed into a cpu which is busy but still has space to >>> handle it. >>> >>> Signed-off-by: Alex Shi >>> --- [snip] >> I may be missing something, but could the expression be something like >> the below instead? >> >> Create a putil < 12.5% check before the loop. There is no reason to >> recheck it every iteration. Then: Agreed. I'd also suggest moving the local cpu check before the loop, so that the local cpu can be selected without walking the group at all if it's vacant enough. >> >> vacancy = FULL_UTIL - (rq->util + putil) >> >> should be enough? >> >>> + >>> + /* bias toward local cpu */ >>> + if (vacancy > 0 && (i == this_cpu)) >>> + return i; >>> + >>> + if (vacancy > 0 && vacancy < min_vacancy) { >>> + min_vacancy = vacancy; >>> + idlest = i; >> >> "idlest" may be a bit misleading here as you actually select busiest cpu >> that have enough spare capacity to take the task. > > Um, change to leader_cpu? vacantest? ;-) Thanks, Namhyung
Re: [PATCH v3 17/22] sched: packing small tasks in wake/exec balancing
On 01/11/2013 01:17 AM, Morten Rasmussen wrote: > On Sat, Jan 05, 2013 at 08:37:46AM +, Alex Shi wrote: >> If the wake/exec task is small enough, utils < 12.5%, it will >> has the chance to be packed into a cpu which is busy but still has space to >> handle it. >> >> Signed-off-by: Alex Shi >> --- >> kernel/sched/fair.c | 51 +-- >> 1 file changed, 45 insertions(+), 6 deletions(-) >> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c >> index 8d0d3af..0596e81 100644 >> --- a/kernel/sched/fair.c >> +++ b/kernel/sched/fair.c >> @@ -3471,19 +3471,57 @@ static inline int get_sd_sched_policy(struct >> sched_domain *sd, >> } >> >> /* >> + * find_leader_cpu - find the busiest but still has enough leisure time cpu >> + * among the cpus in group. >> + */ >> +static int >> +find_leader_cpu(struct sched_group *group, struct task_struct *p, int >> this_cpu) >> +{ >> +unsigned vacancy, min_vacancy = UINT_MAX; > > unsigned int? yes > >> +int idlest = -1; >> +int i; >> +/* percentage the task's util */ >> +unsigned putil = p->se.avg.runnable_avg_sum * 100 >> +/ (p->se.avg.runnable_avg_period + 1); > > Alternatively you could use se.avg.load_avg_contrib which is the same > ratio scaled by the task priority (se->load.weight). In the above > expression you don't take priority into account. sure, but this seems more directly meaningful. > >> + >> +/* Traverse only the allowed CPUs */ >> +for_each_cpu_and(i, sched_group_cpus(group), tsk_cpus_allowed(p)) { >> +struct rq *rq = cpu_rq(i); >> +int nr_running = rq->nr_running > 0 ? rq->nr_running : 1; >> + >> +/* only pack task which putil < 12.5% */ >> +vacancy = FULL_UTIL - (rq->util * nr_running + putil * 8); > > I can't follow this expression. > > The variables can have the following values: > FULL_UTIL = 99 > rq->util = [0..99] > nr_running = [1..inf] > putil = [0..99] > > Why multiply rq->util by nr_running? > > Let's take an example where rq->util = 50, nr_running = 2, and putil = > 10.
In this case the value of putil doesn't really matter as vacancy > would be negative anyway since FULL_UTIL - rq->util * nr_running is -1. > However, with rq->util = 50 there should be plenty of spare cpu time to > take another task. for this example, the util may not be full simply because the task just woke up; it could still turn out to run full time. So I try to give it a large guessed load. > > Also, why multiply putil by 8? rq->util must be very close to 0 for > vacancy to be positive if putil is close to 12 (12.5%). I just want to pack small-util tasks, since packing can hurt performance. > > The vacancy variable is declared unsigned, so it will underflow instead > of becoming negative. Is this intentional? oops, my mistake. > > I may be missing something, but could the expression be something like > the below instead? > > Create a putil < 12.5% check before the loop. There is no reason to > recheck it every iteration. Then: > > vacancy = FULL_UTIL - (rq->util + putil) > > should be enough? > >> + >> +/* bias toward local cpu */ >> +if (vacancy > 0 && (i == this_cpu)) >> +return i; >> + >> +if (vacancy > 0 && vacancy < min_vacancy) { >> +min_vacancy = vacancy; >> +idlest = i; > > "idlest" may be a bit misleading here as you actually select busiest cpu > that have enough spare capacity to take the task. Um, change to leader_cpu? > > Morten
Re: [PATCH v3 17/22] sched: packing small tasks in wake/exec balancing
On Sat, Jan 05, 2013 at 08:37:46AM +, Alex Shi wrote: > If the wake/exec task is small enough, utils < 12.5%, it will > has the chance to be packed into a cpu which is busy but still has space to > handle it. > > Signed-off-by: Alex Shi > --- > kernel/sched/fair.c | 51 +-- > 1 file changed, 45 insertions(+), 6 deletions(-) > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > index 8d0d3af..0596e81 100644 > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -3471,19 +3471,57 @@ static inline int get_sd_sched_policy(struct > sched_domain *sd, > } > > /* > + * find_leader_cpu - find the busiest but still has enough leisure time cpu > + * among the cpus in group. > + */ > +static int > +find_leader_cpu(struct sched_group *group, struct task_struct *p, int > this_cpu) > +{ > + unsigned vacancy, min_vacancy = UINT_MAX; unsigned int? > + int idlest = -1; > + int i; > + /* percentage the task's util */ > + unsigned putil = p->se.avg.runnable_avg_sum * 100 > + / (p->se.avg.runnable_avg_period + 1); Alternatively you could use se.avg.load_avg_contrib which is the same ratio scaled by the task priority (se->load.weight). In the above expression you don't take priority into account. > + > + /* Traverse only the allowed CPUs */ > + for_each_cpu_and(i, sched_group_cpus(group), tsk_cpus_allowed(p)) { > + struct rq *rq = cpu_rq(i); > + int nr_running = rq->nr_running > 0 ? rq->nr_running : 1; > + > + /* only pack task which putil < 12.5% */ > + vacancy = FULL_UTIL - (rq->util * nr_running + putil * 8); I can't follow this expression. The variables can have the following values: FULL_UTIL = 99 rq->util = [0..99] nr_running = [1..inf] putil = [0..99] Why multiply rq->util by nr_running? Let's take an example where rq->util = 50, nr_running = 2, and putil = 10. In this case the value of putil doesn't really matter as vacancy would be negative anyway since FULL_UTIL - rq->util * nr_running is -1. 
However, with rq->util = 50 there should be plenty of spare cpu time to take another task. Also, why multiply putil by 8? rq->util must be very close to 0 for vacancy to be positive if putil is close to 12 (12.5%). The vacancy variable is declared unsigned, so it will underflow instead of becoming negative. Is this intentional? I may be missing something, but could the expression be something like the below instead? Create a putil < 12.5% check before the loop. There is no reason to recheck it every iteration. Then: vacancy = FULL_UTIL - (rq->util + putil) should be enough? > + > + /* bias toward local cpu */ > + if (vacancy > 0 && (i == this_cpu)) > + return i; > + > + if (vacancy > 0 && vacancy < min_vacancy) { > + min_vacancy = vacancy; > + idlest = i; "idlest" may be a bit misleading here as you actually select busiest cpu that have enough spare capacity to take the task. Morten > + } > + } > + return idlest; > +} > + > +/* > * If power policy is eligible for this domain, and it has task allowed cpu. > * we will select CPU from this domain. 
> */ > static int get_cpu_for_power_policy(struct sched_domain *sd, int cpu, > - struct task_struct *p, struct sd_lb_stats *sds) > + struct task_struct *p, struct sd_lb_stats *sds, int fork) > { > int policy; > int new_cpu = -1; > > policy = get_sd_sched_policy(sd, cpu, p, sds); > - if (policy != SCHED_POLICY_PERFORMANCE && sds->group_leader) > - new_cpu = find_idlest_cpu(sds->group_leader, p, cpu); > - > + if (policy != SCHED_POLICY_PERFORMANCE && sds->group_leader) { > + if (!fork) > + new_cpu = find_leader_cpu(sds->group_leader, p, cpu); > + /* for fork balancing and a little busy task */ > + if (new_cpu == -1) > + new_cpu = find_idlest_cpu(sds->group_leader, p, cpu); > + } > return new_cpu; > } > > @@ -3534,14 +3572,15 @@ select_task_rq_fair(struct task_struct *p, int > sd_flag, int flags) > if (tmp->flags & sd_flag) { > sd = tmp; > > - new_cpu = get_cpu_for_power_policy(sd, cpu, p, &sds); > + new_cpu = get_cpu_for_power_policy(sd, cpu, p, &sds, > + flags & SD_BALANCE_FORK); > if (new_cpu != -1) > goto unlock; > } > } > > if (affine_sd) { > - new_cpu = get_cpu_for_power_policy(affine_sd, cpu, p, &sds); > + new_cpu = get_cpu_for_power_policy(affine_sd, cpu, p, &sds, 0); > if (new_cpu != -1) > goto unlock; > > -- > 1.7.12
[PATCH v3 17/22] sched: packing small tasks in wake/exec balancing
If the wake/exec task is small enough, util < 12.5%, it will have the chance to be packed into a cpu which is busy but still has space to handle it. Signed-off-by: Alex Shi --- kernel/sched/fair.c | 51 +-- 1 file changed, 45 insertions(+), 6 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 8d0d3af..0596e81 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -3471,19 +3471,57 @@ static inline int get_sd_sched_policy(struct sched_domain *sd, } /* + * find_leader_cpu - find the busiest but still has enough leisure time cpu + * among the cpus in group. + */ +static int +find_leader_cpu(struct sched_group *group, struct task_struct *p, int this_cpu) +{ + unsigned vacancy, min_vacancy = UINT_MAX; + int idlest = -1; + int i; + /* percentage the task's util */ + unsigned putil = p->se.avg.runnable_avg_sum * 100 + / (p->se.avg.runnable_avg_period + 1); + + /* Traverse only the allowed CPUs */ + for_each_cpu_and(i, sched_group_cpus(group), tsk_cpus_allowed(p)) { + struct rq *rq = cpu_rq(i); + int nr_running = rq->nr_running > 0 ? rq->nr_running : 1; + + /* only pack task which putil < 12.5% */ + vacancy = FULL_UTIL - (rq->util * nr_running + putil * 8); + + /* bias toward local cpu */ + if (vacancy > 0 && (i == this_cpu)) + return i; + + if (vacancy > 0 && vacancy < min_vacancy) { + min_vacancy = vacancy; + idlest = i; + } + } + return idlest; +} + +/* * If power policy is eligible for this domain, and it has task allowed cpu. * we will select CPU from this domain.
*/ static int get_cpu_for_power_policy(struct sched_domain *sd, int cpu, - struct task_struct *p, struct sd_lb_stats *sds) + struct task_struct *p, struct sd_lb_stats *sds, int fork) { int policy; int new_cpu = -1; policy = get_sd_sched_policy(sd, cpu, p, sds); - if (policy != SCHED_POLICY_PERFORMANCE && sds->group_leader) - new_cpu = find_idlest_cpu(sds->group_leader, p, cpu); - + if (policy != SCHED_POLICY_PERFORMANCE && sds->group_leader) { + if (!fork) + new_cpu = find_leader_cpu(sds->group_leader, p, cpu); + /* for fork balancing and a little busy task */ + if (new_cpu == -1) + new_cpu = find_idlest_cpu(sds->group_leader, p, cpu); + } return new_cpu; } @@ -3534,14 +3572,15 @@ select_task_rq_fair(struct task_struct *p, int sd_flag, int flags) if (tmp->flags & sd_flag) { sd = tmp; - new_cpu = get_cpu_for_power_policy(sd, cpu, p, &sds); + new_cpu = get_cpu_for_power_policy(sd, cpu, p, &sds, + flags & SD_BALANCE_FORK); if (new_cpu != -1) goto unlock; } } if (affine_sd) { - new_cpu = get_cpu_for_power_policy(affine_sd, cpu, p, &sds); + new_cpu = get_cpu_for_power_policy(affine_sd, cpu, p, &sds, 0); if (new_cpu != -1) goto unlock; -- 1.7.12