Re: [RFC PATCH v3 3/6] sched: pack small tasks
On 26 April 2013 12:30, Peter Zijlstra wrote:
> On Wed, Mar 27, 2013 at 12:00:40PM +0100, Vincent Guittot wrote:
>> On 27 March 2013 11:21, Preeti U Murthy wrote:
>> > Hi,
>> >
>> > On 03/26/2013 05:56 PM, Peter Zijlstra wrote:
>> >> On Fri, 2013-03-22 at 13:25 +0100, Vincent Guittot wrote:
>> >>> +static bool is_buddy_busy(int cpu)
>> >>> +{
>> >>> +	struct rq *rq = cpu_rq(cpu);
>> >>> +
>> >>> +	/*
>> >>> +	 * A busy buddy is a CPU with a high load or a small load with
>> >>> +	 * a lot of running tasks.
>> >>> +	 */
>> >>> +	return (rq->avg.runnable_avg_sum >
>> >>> +		(rq->avg.runnable_avg_period / (rq->nr_running + 2)));
>> >>> +}
>> >>
>> >> Why does the comment talk about load but we don't see it in the
>> >> equation. Also, why does nr_running matter at all? I thought we'd
>> >> simply bother with utilization, if fully utilized we're done etc..
>>
>> By load, I mean: 100 * avg.runnable_avg_sum / avg.runnable_avg_period.
>> In addition, I take into account the number of tasks already in the
>> runqueue in order to define the busyness of a CPU. A CPU with a load
>> of 50% and no tasks in its runqueue is not busy at this time and we
>> can migrate tasks onto it, but if the CPU already has 2 tasks in its
>> runqueue, a newly woken task will have to share the CPU with the
>> other tasks, so we consider the CPU already busy and fall back to the
>> default behavior. The equation considers that a CPU is not busy if
>> 100 * avg.runnable_avg_sum / avg.runnable_avg_period < 100 / (nr_running + 2)
>
> I'm still somewhat confused by all this. So raising nr_running will
> lower the required utilization to be considered busy. Suppose we have
> 50 tasks running, all at 1% utilization (bulk wakeup); we'd consider
> the cpu busy, even though it's picking its nose for half the time.
>
> I'm assuming it's meant to limit process latency or so? Why are you
> concerned with that? This seems like an arbitrary power vs performance
> tweak without solid evidence it's needed or even wanted.

Yes, the goal was to limit the wakeup latency. This version was only
trying to modify the scheduler behavior when the system was not busy,
in order to pack small tasks like background activities without
decreasing performance, so we were concerned about wakeup latency. The
new version proposes a more aggressive mode that packs all tasks until
the CPUs become full.

Vincent
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH v3 3/6] sched: pack small tasks
Hi Peter,

On 04/26/2013 03:48 PM, Peter Zijlstra wrote:
> On Wed, Mar 27, 2013 at 03:51:51PM +0530, Preeti U Murthy wrote:
>> Hi,
>>
>> On 03/26/2013 05:56 PM, Peter Zijlstra wrote:
>>> On Fri, 2013-03-22 at 13:25 +0100, Vincent Guittot wrote:
>>>> +static bool is_buddy_busy(int cpu)
>>>> +{
>>>> +	struct rq *rq = cpu_rq(cpu);
>>>> +
>>>> +	/*
>>>> +	 * A busy buddy is a CPU with a high load or a small load with
>>>> +	 * a lot of running tasks.
>>>> +	 */
>>>> +	return (rq->avg.runnable_avg_sum >
>>>> +		(rq->avg.runnable_avg_period / (rq->nr_running + 2)));
>>>> +}
>>>
>>> Why does the comment talk about load but we don't see it in the
>>> equation. Also, why does nr_running matter at all? I thought we'd
>>> simply bother with utilization, if fully utilized we're done etc..
>>
>> Peter, let's say the run-queue has 50% utilization and is running 2
>> tasks. And we wish to find out if it is busy. We would compare this
>> metric with the cpu power, which let's say is 100.
>>
>> rq->util * 100 < cpu_of(rq)->power.
>>
>> In the above scenario would we declare the cpu _not_busy? Or would
>> we do the following:
>>
>> (rq->util * 100) * #nr_running < cpu_of(rq)->power and conclude that
>> it is just enough _busy_ to not take on more processes?
>
> That is just confused... ->power doesn't have anything to do with a
> per-cpu measure. ->power is an inter-cpu measure of relative compute
> capacity.

Ok.

> Mixing in nr_running confuses things even more; it doesn't matter how
> many tasks it takes to push utilization up to 100%; once it's there
> the cpu simply cannot run more.

True, this is from the perspective of the CPU. But will not the tasks
on this CPU get throttled if you find the utilization of this CPU
< 100% and decide to put more tasks on it?

Regards
Preeti U Murthy
Re: [RFC PATCH v3 3/6] sched: pack small tasks
On Wed, Mar 27, 2013 at 12:00:40PM +0100, Vincent Guittot wrote:
> On 27 March 2013 11:21, Preeti U Murthy wrote:
> > Hi,
> >
> > On 03/26/2013 05:56 PM, Peter Zijlstra wrote:
> >> On Fri, 2013-03-22 at 13:25 +0100, Vincent Guittot wrote:
> >>> +static bool is_buddy_busy(int cpu)
> >>> +{
> >>> +	struct rq *rq = cpu_rq(cpu);
> >>> +
> >>> +	/*
> >>> +	 * A busy buddy is a CPU with a high load or a small load with
> >>> +	 * a lot of running tasks.
> >>> +	 */
> >>> +	return (rq->avg.runnable_avg_sum >
> >>> +		(rq->avg.runnable_avg_period / (rq->nr_running + 2)));
> >>> +}
> >>
> >> Why does the comment talk about load but we don't see it in the
> >> equation. Also, why does nr_running matter at all? I thought we'd
> >> simply bother with utilization, if fully utilized we're done etc..
>
> By load, I mean: 100 * avg.runnable_avg_sum / avg.runnable_avg_period.
> In addition, I take into account the number of tasks already in the
> runqueue in order to define the busyness of a CPU. A CPU with a load
> of 50% and no tasks in its runqueue is not busy at this time and we
> can migrate tasks onto it, but if the CPU already has 2 tasks in its
> runqueue, a newly woken task will have to share the CPU with the other
> tasks, so we consider the CPU already busy and fall back to the
> default behavior. The equation considers that a CPU is not busy if
> 100 * avg.runnable_avg_sum / avg.runnable_avg_period < 100 / (nr_running + 2)

I'm still somewhat confused by all this. So raising nr_running will
lower the required utilization to be considered busy. Suppose we have
50 tasks running, all at 1% utilization (bulk wakeup); we'd consider
the cpu busy, even though it's picking its nose for half the time.

I'm assuming it's meant to limit process latency or so? Why are you
concerned with that? This seems like an arbitrary power vs performance
tweak without solid evidence it's needed or even wanted.
Re: [RFC PATCH v3 3/6] sched: pack small tasks
On Wed, Mar 27, 2013 at 03:51:51PM +0530, Preeti U Murthy wrote:
> Hi,
>
> On 03/26/2013 05:56 PM, Peter Zijlstra wrote:
> > On Fri, 2013-03-22 at 13:25 +0100, Vincent Guittot wrote:
> >> +static bool is_buddy_busy(int cpu)
> >> +{
> >> +	struct rq *rq = cpu_rq(cpu);
> >> +
> >> +	/*
> >> +	 * A busy buddy is a CPU with a high load or a small load with
> >> +	 * a lot of running tasks.
> >> +	 */
> >> +	return (rq->avg.runnable_avg_sum >
> >> +		(rq->avg.runnable_avg_period / (rq->nr_running + 2)));
> >> +}
> >
> > Why does the comment talk about load but we don't see it in the
> > equation. Also, why does nr_running matter at all? I thought we'd
> > simply bother with utilization, if fully utilized we're done etc..
>
> Peter, let's say the run-queue has 50% utilization and is running 2
> tasks. And we wish to find out if it is busy. We would compare this
> metric with the cpu power, which let's say is 100.
>
> rq->util * 100 < cpu_of(rq)->power.
>
> In the above scenario would we declare the cpu _not_busy? Or would we
> do the following:
>
> (rq->util * 100) * #nr_running < cpu_of(rq)->power and conclude that
> it is just enough _busy_ to not take on more processes?

That is just confused... ->power doesn't have anything to do with a
per-cpu measure. ->power is an inter-cpu measure of relative compute
capacity.

Mixing in nr_running confuses things even more; it doesn't matter how
many tasks it takes to push utilization up to 100%; once it's there the
cpu simply cannot run more.

So colour me properly confused..
Re: [RFC PATCH v3 3/6] sched: pack small tasks
On Wed, Mar 27, 2013 at 05:18:53PM +0000, Nicolas Pitre wrote:
> On Wed, 27 Mar 2013, Catalin Marinas wrote:
> > So if the above works, the scheduler guys can mandate that little
> > CPUs are always first and for ARM it would be a matter of getting
> > the right CPU topology in the DT (independent of what hw vendors
> > think of CPU topology) and booting Linux on CPU 4 etc.
>
> Just a note about that: if the scheduler mandates little CPUs first,
> that should _not_ have any implications on the DT content. DT is not
> about encoding Linux specific implementation details. It is simple
> enough to tweak the CPU logical map at run time when enumerating CPUs.

You are right, though a simpler way (hack) to tweak the cpu_logical_map
is to change the DT ;). But the problem is that the kernel doesn't know
which CPU is big and which is little, unless you specify this in some
way via the DT. It can be the cpu nodes' order or some other means.

--
Catalin
Re: [RFC PATCH v3 3/6] sched: pack small tasks
On Wed, 27 Mar 2013, Catalin Marinas wrote:
> So if the above works, the scheduler guys can mandate that little CPUs
> are always first and for ARM it would be a matter of getting the right
> CPU topology in the DT (independent of what hw vendors think of CPU
> topology) and booting Linux on CPU 4 etc.

Just a note about that: if the scheduler mandates little CPUs first,
that should _not_ have any implications on the DT content. DT is not
about encoding Linux specific implementation details. It is simple
enough to tweak the CPU logical map at run time when enumerating CPUs.

Nicolas
Re: [RFC PATCH v3 3/6] sched: pack small tasks
On Wed, 27 Mar 2013, Vincent Guittot wrote:
> On 27 March 2013 09:46, Peter Zijlstra wrote:
> > On Tue, 2013-03-26 at 08:29 -0700, Arjan van de Ven wrote:
> >> > Isn't this basically related to picking the NO_HZ cpu; if the
> >> > system isn't fully symmetric with its power gates you want the
> >> > NO_HZ cpu to be the 'special' cpu. If it is symmetric we really
> >> > don't care which core is left 'running' and we can even select a
> >> > new pack cpu from the idle cores once the old one is fully
> >> > utilized.
> >>
> >> you don't really care much sure, but there's some advantages for
> >> sorting "all the way left", e.g. to linux cpu 0. Some tasks only
> >> run there, and interrupts tend to be favored to that cpu as well
> >> on x86.
> >
> > Right, and I suspect all the big-little nonsense will have the
> > little cores on low numbers as well (is this architected or can a
> > creative licensee screw us over?)
>
> It's not mandatory to have little cores on low numbers even if it's
> advised

We can trivially move things around in the logical CPU mapping if that
simplifies things. However the boot CPU might not be CPU0 in that case,
which might be less trivial.

Nicolas
Re: [RFC PATCH v3 3/6] sched: pack small tasks
On 27 March 2013 11:21, Preeti U Murthy wrote:
> Hi,
>
> On 03/26/2013 05:56 PM, Peter Zijlstra wrote:
>> On Fri, 2013-03-22 at 13:25 +0100, Vincent Guittot wrote:
>>> +static bool is_buddy_busy(int cpu)
>>> +{
>>> +	struct rq *rq = cpu_rq(cpu);
>>> +
>>> +	/*
>>> +	 * A busy buddy is a CPU with a high load or a small load with
>>> +	 * a lot of running tasks.
>>> +	 */
>>> +	return (rq->avg.runnable_avg_sum >
>>> +		(rq->avg.runnable_avg_period / (rq->nr_running + 2)));
>>> +}
>>
>> Why does the comment talk about load but we don't see it in the
>> equation. Also, why does nr_running matter at all? I thought we'd
>> simply bother with utilization, if fully utilized we're done etc..
>
> Peter, let's say the run-queue has 50% utilization and is running 2
> tasks. And we wish to find out if it is busy. We would compare this
> metric with the cpu power, which let's say is 100.
>
> rq->util * 100 < cpu_of(rq)->power.

I don't use cpu_of(rq)->power in the definition of the busyness.

> In the above scenario would we declare the cpu _not_busy? Or would we
> do the following:

In the above scenario, the CPU is busy.

By load, I mean: 100 * avg.runnable_avg_sum / avg.runnable_avg_period.
In addition, I take into account the number of tasks already in the
runqueue in order to define the busyness of a CPU. A CPU with a load
of 50% and no tasks in its runqueue is not busy at this time and we can
migrate tasks onto it, but if the CPU already has 2 tasks in its
runqueue, a newly woken task will have to share the CPU with the other
tasks, so we consider the CPU already busy and fall back to the default
behavior. The equation considers that a CPU is not busy if

100 * avg.runnable_avg_sum / avg.runnable_avg_period < 100 / (nr_running + 2)

> (rq->util * 100) * #nr_running < cpu_of(rq)->power and conclude that
> it is just enough _busy_ to not take on more processes?
>
> @Vincent: Yes the comment above needs to be fixed. A busy buddy is a
> CPU with *high rq utilization*, as far as the equation goes.

I can update the comment. Is the comment below more clear?

/*
 * A busy buddy is a CPU with a high average running time or a small
 * average running time but a lot of running tasks in its runqueue
 * which are already sharing this CPU time.
 */

Vincent

> Regards
> Preeti U Murthy
Re: [RFC PATCH v3 3/6] sched: pack small tasks
Hi,

On 03/26/2013 05:56 PM, Peter Zijlstra wrote:
> On Fri, 2013-03-22 at 13:25 +0100, Vincent Guittot wrote:
>> +static bool is_buddy_busy(int cpu)
>> +{
>> +	struct rq *rq = cpu_rq(cpu);
>> +
>> +	/*
>> +	 * A busy buddy is a CPU with a high load or a small load with
>> +	 * a lot of running tasks.
>> +	 */
>> +	return (rq->avg.runnable_avg_sum >
>> +		(rq->avg.runnable_avg_period / (rq->nr_running + 2)));
>> +}
>
> Why does the comment talk about load but we don't see it in the
> equation. Also, why does nr_running matter at all? I thought we'd
> simply bother with utilization, if fully utilized we're done etc..

Peter, let's say the run-queue has 50% utilization and is running 2
tasks. And we wish to find out if it is busy. We would compare this
metric with the cpu power, which let's say is 100.

rq->util * 100 < cpu_of(rq)->power.

In the above scenario would we declare the cpu _not_busy? Or would we
do the following:

(rq->util * 100) * #nr_running < cpu_of(rq)->power and conclude that it
is just enough _busy_ to not take on more processes?

@Vincent: Yes the comment above needs to be fixed. A busy buddy is a
CPU with *high rq utilization*, as far as the equation goes.

Regards
Preeti U Murthy
Re: [RFC PATCH v3 3/6] sched: pack small tasks
On Wed, 2013-03-27 at 09:54 +0100, Vincent Guittot wrote:
> It's not mandatory to have little cores on low numbers even if it's
> advised

ARGH!
Re: [RFC PATCH v3 3/6] sched: pack small tasks
On Tue, 2013-03-26 at 08:29 -0700, Arjan van de Ven wrote:
> > Isn't this basically related to picking the NO_HZ cpu; if the system
> > isn't fully symmetric with its power gates you want the NO_HZ cpu to
> > be the 'special' cpu. If it is symmetric we really don't care which
> > core is left 'running' and we can even select a new pack cpu from
> > the idle cores once the old one is fully utilized.
>
> you don't really care much sure, but there's some advantages for
> sorting "all the way left", e.g. to linux cpu 0. Some tasks only run
> there, and interrupts tend to be favored to that cpu as well on x86.

Right, and I suspect all the big-little nonsense will have the little
cores on low numbers as well (is this architected or can a creative
licensee screw us over?)

So find_new_ilb() already does cpumask_first(), so it has a strong
leftmost preference. We just need to make sure it indeed does the right
thing and doesn't have some unintended side effect.
Re: [RFC PATCH v3 3/6] sched: pack small tasks
On 27 March 2013 09:46, Peter Zijlstra wrote:
> On Tue, 2013-03-26 at 08:29 -0700, Arjan van de Ven wrote:
>> > Isn't this basically related to picking the NO_HZ cpu; if the
>> > system isn't fully symmetric with its power gates you want the
>> > NO_HZ cpu to be the 'special' cpu. If it is symmetric we really
>> > don't care which core is left 'running' and we can even select a
>> > new pack cpu from the idle cores once the old one is fully
>> > utilized.
>>
>> you don't really care much sure, but there's some advantages for
>> sorting "all the way left", e.g. to linux cpu 0. Some tasks only run
>> there, and interrupts tend to be favored to that cpu as well on x86.
>
> Right, and I suspect all the big-little nonsense will have the little
> cores on low numbers as well (is this architected or can a creative
> licensee screw us over?)

It's not mandatory to have little cores on low numbers even if it's
advised

> So find_new_ilb() already does cpumask_first(), so it has a strong
> leftmost preference. We just need to make sure it indeed does the
> right thing and doesn't have some unintended side effect.
Re: [RFC PATCH v3 3/6] sched: pack small tasks
On Wed, 2013-03-27 at 12:48 +0800, Alex Shi wrote:
> Yes, the new forked runnable load was set as full utilisation in V5
> power aware scheduling. PJT, Mike and I all agree on this. PJT just
> discussed how to give the full load to a new forked task, and we
> reached agreement in my coming V6 power aware scheduling patchset.

Great! That means I can mark the v5 thread read! :-)
Re: [RFC PATCH v3 3/6] sched: pack small tasks
On Wed, 2013-03-27 at 12:48 +0800, Alex Shi wrote: Yes, the new forked runnable load was set as full utilisation in V5 power aware scheduling. PJT, Mike and I both agree on this. PJT just discussion how to give the full load to new forked task. and we get agreement in my coming V6 power aware scheduling patchset. Great!, means I can mark the v5 thread read! :-) -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH v3 3/6] sched: pack small tasks
On 27 March 2013 09:46, Peter Zijlstra pet...@infradead.org wrote: On Tue, 2013-03-26 at 08:29 -0700, Arjan van de Ven wrote: Isn't this basically related to picking the NO_HZ cpu; if the system isn't fully symmetric with its power gates you want the NO_HZ cpu to be the 'special' cpu. If it is symmetric we really don't care which core is left 'running' and we can even select a new pack cpu from the idle cores once the old one is fully utilized. you don't really care much sure, but there's some advantages for sorting all the way left, e.g. to linux cpu 0. Some tasks only run there, and interrupts tend to be favored to that cpu as well on x86. Right, and I suspect all the big-little nonsense will have the little cores on low numbers as well (is this architected or can a creative licensee screw us over?) It's not mandatory to have little cores on low numbers even if it's advised So find_new_ilb() already does cpumask_first(), so it has a strong leftmost preference. We just need to make sure it indeed does the right thing and doesn't have some unintended side effect. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH v3 3/6] sched: pack small tasks
On Tue, 2013-03-26 at 08:29 -0700, Arjan van de Ven wrote: Isn't this basically related to picking the NO_HZ cpu; if the system isn't fully symmetric with its power gates you want the NO_HZ cpu to be the 'special' cpu. If it is symmetric we really don't care which core is left 'running' and we can even select a new pack cpu from the idle cores once the old one is fully utilized. you don't really care much sure, but there's some advantages for sorting all the way left, e.g. to linux cpu 0. Some tasks only run there, and interrupts tend to be favored to that cpu as well on x86. Right, and I suspect all the big-little nonsense will have the little cores on low numbers as well (is this architected or can a creative licensee screw us over?) So find_new_ilb() already does cpumask_first(), so it has a strong leftmost preference. We just need to make sure it indeed does the right thing and doesn't have some unintended side effect. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH v3 3/6] sched: pack small tasks
On Wed, 2013-03-27 at 09:54 +0100, Vincent Guittot wrote:
> It's not mandatory to have little cores on low numbers even if it's advised

ARGH!
Re: [RFC PATCH v3 3/6] sched: pack small tasks
Hi,

On 03/26/2013 05:56 PM, Peter Zijlstra wrote:
> On Fri, 2013-03-22 at 13:25 +0100, Vincent Guittot wrote:
>> +static bool is_buddy_busy(int cpu)
>> +{
>> +	struct rq *rq = cpu_rq(cpu);
>> +
>> +	/*
>> +	 * A busy buddy is a CPU with a high load or a small load with a lot of
>> +	 * running tasks.
>> +	 */
>> +	return (rq->avg.runnable_avg_sum >
>> +		(rq->avg.runnable_avg_period / (rq->nr_running + 2)));
>> +}
>
> Why does the comment talk about load but we don't see it in the
> equation. Also, why does nr_running matter at all? I thought we'd
> simply bother with utilization, if fully utilized we're done etc..

Peter, let's say the run-queue has 50% utilization and is running 2 tasks, and we wish to find out if it is busy. We would compare this metric with the cpu power, which let's say is 100:

  rq->util * 100 < cpu_of(rq)->power

In the above scenario would we declare the cpu _not_busy? Or would we do the following:

  (rq->util * 100) * #nr_running >= cpu_of(rq)->power

and conclude that it is just enough _busy_ to not take on more processes?

@Vincent: Yes, the comment above needs to be fixed. A busy buddy is a CPU with *high rq utilization*, as far as the equation goes.

Regards
Preeti U Murthy
Re: [RFC PATCH v3 3/6] sched: pack small tasks
On 27 March 2013 11:21, Preeti U Murthy <pre...@linux.vnet.ibm.com> wrote:
> Hi,
>
> On 03/26/2013 05:56 PM, Peter Zijlstra wrote:
>> On Fri, 2013-03-22 at 13:25 +0100, Vincent Guittot wrote:
>>> +static bool is_buddy_busy(int cpu)
>>> +{
>>> +	struct rq *rq = cpu_rq(cpu);
>>> +
>>> +	/*
>>> +	 * A busy buddy is a CPU with a high load or a small load with a lot of
>>> +	 * running tasks.
>>> +	 */
>>> +	return (rq->avg.runnable_avg_sum >
>>> +		(rq->avg.runnable_avg_period / (rq->nr_running + 2)));
>>> +}
>>
>> Why does the comment talk about load but we don't see it in the
>> equation. Also, why does nr_running matter at all? I thought we'd
>> simply bother with utilization, if fully utilized we're done etc..
>
> Peter, let's say the run-queue has 50% utilization and is running 2
> tasks, and we wish to find out if it is busy. We would compare this
> metric with the cpu power, which let's say is 100:
>
>   rq->util * 100 < cpu_of(rq)->power

I don't use cpu_of(rq)->power in the definition of the busyness.

> In the above scenario would we declare the cpu _not_busy? Or would we
> do the following:

In the above scenario, the CPU is busy.

By load, I mean: 100 * avg.runnable_avg_sum / avg.runnable_avg_period. In addition, I take into account the number of tasks already in the runqueue in order to define the busyness of a CPU. A CPU with a load of 50% without any task in its runqueue is not busy at this time and we can migrate tasks to it, but if the CPU already has 2 tasks in its runqueue, it means that a newly woken up task will have to share the CPU with the other tasks, so we consider that the CPU is already busy and we fall back to the default behavior. The equation considers that a CPU is not busy if

  100 * avg.runnable_avg_sum / avg.runnable_avg_period < 100 / (nr_running + 2)

>   (rq->util * 100) * #nr_running >= cpu_of(rq)->power
>
> and conclude that it is just enough _busy_ to not take on more
> processes?
>
> @Vincent: Yes, the comment above needs to be fixed. A busy buddy is a
> CPU with *high rq utilization*, as far as the equation goes.

I can update the comment. Is the comment below clearer?

/*
 * A busy buddy is a CPU with a high average running time, or a small
 * average running time but a lot of running tasks in its runqueue
 * which are already sharing this CPU time.
 */

Vincent

> Regards
> Preeti U Murthy
Re: [RFC PATCH v3 3/6] sched: pack small tasks
On Wed, 27 Mar 2013, Vincent Guittot wrote:
> On 27 March 2013 09:46, Peter Zijlstra <pet...@infradead.org> wrote:
>> On Tue, 2013-03-26 at 08:29 -0700, Arjan van de Ven wrote:
>>> > Isn't this basically related to picking the NO_HZ cpu; if the system
>>> > isn't fully symmetric with its power gates you want the NO_HZ cpu to
>>> > be the 'special' cpu. If it is symmetric we really don't care which
>>> > core is left 'running' and we can even select a new pack cpu from the
>>> > idle cores once the old one is fully utilized.
>>>
>>> you don't really care much sure, but there's some advantages for
>>> sorting "all the way left", e.g. to linux cpu 0. Some tasks only run
>>> there, and interrupts tend to be favored to that cpu as well on x86.
>>
>> Right, and I suspect all the big-little nonsense will have the little
>> cores on low numbers as well (is this architected or can a creative
>> licensee screw us over?)
>
> It's not mandatory to have little cores on low numbers even if it's advised

We can trivially move things around in the logical CPU mapping if that simplifies things. However, the boot CPU might not be CPU0 in that case, which might be less trivial.

Nicolas
Re: [RFC PATCH v3 3/6] sched: pack small tasks
On Wed, 27 Mar 2013, Catalin Marinas wrote:
> So if the above works, the scheduler guys can mandate that little CPUs
> are always first and for ARM it would be a matter of getting the right
> CPU topology in the DT (independent of what hw vendors think of CPU
> topology) and booting Linux on CPU 4 etc.

Just a note about that: if the scheduler mandates little CPUs first, that should _not_ have any implications on the DT content. DT is not about encoding Linux specific implementation details.

It is simple enough to tweak the CPU logical map at run time when enumerating CPUs.

Nicolas
Re: [RFC PATCH v3 3/6] sched: pack small tasks
On Wed, Mar 27, 2013 at 05:18:53PM +, Nicolas Pitre wrote:
> On Wed, 27 Mar 2013, Catalin Marinas wrote:
>> So if the above works, the scheduler guys can mandate that little CPUs
>> are always first and for ARM it would be a matter of getting the right
>> CPU topology in the DT (independent of what hw vendors think of CPU
>> topology) and booting Linux on CPU 4 etc.
>
> Just a note about that: if the scheduler mandates little CPUs first,
> that should _not_ have any implications on the DT content. DT is not
> about encoding Linux specific implementation details.
>
> It is simple enough to tweak the CPU logical map at run time when
> enumerating CPUs.

You are right, though a simpler way (hack) to tweak the cpu_logical_map is to change the DT ;). But the problem is that the kernel doesn't know which CPU is big and which is little, unless you specify this in some way via the DT. It can be the cpu nodes order or some other means.

--
Catalin
Re: [RFC PATCH v3 3/6] sched: pack small tasks
On 03/27/2013 12:33 PM, Preeti U Murthy wrote:
> Hi Peter,
>
> On 03/26/2013 06:07 PM, Peter Zijlstra wrote:
>> On Fri, 2013-03-22 at 13:25 +0100, Vincent Guittot wrote:
>>> +static bool is_light_task(struct task_struct *p)
>>> +{
>>> +	/* A light task runs less than 20% in average */
>>> +	return ((p->se.avg.runnable_avg_sum * 5) <
>>> +		(p->se.avg.runnable_avg_period));
>>> +}
>>
>> OK, so we have a 'problem' here, we initialize runnable_avg_* to 0, but
>> we want to 'assume' a fresh task is fully 'loaded'. IIRC Alex ran into
>> this as well.
>>
>> PJT, do you have any sane solution for this, I forgot what the result
>> of the last discussion was -- was there any?
>
> The conclusion after last discussion between PJT and Alex was that the
> load contribution of a fresh task be set to "full" during "__sched_fork()".
>
> task->se.avg.load_avg_contrib = task->se.load.weight during
> __sched_fork() is reflected in the latest power aware scheduler patchset
> by Alex.

Yes, the newly forked runnable load was set to full utilisation in the V5 power aware scheduling patchset. PJT, Mike and I all agree on this. PJT just discussed how to give the full load to a newly forked task, and we reached agreement for my coming V6 power aware scheduling patchset.

--
Thanks
Alex
Re: [RFC PATCH v3 3/6] sched: pack small tasks
Hi Peter,

On 03/26/2013 06:07 PM, Peter Zijlstra wrote:
> On Fri, 2013-03-22 at 13:25 +0100, Vincent Guittot wrote:
>> +static bool is_light_task(struct task_struct *p)
>> +{
>> +	/* A light task runs less than 20% in average */
>> +	return ((p->se.avg.runnable_avg_sum * 5) <
>> +		(p->se.avg.runnable_avg_period));
>> +}
>
> OK, so we have a 'problem' here, we initialize runnable_avg_* to 0, but
> we want to 'assume' a fresh task is fully 'loaded'. IIRC Alex ran into
> this as well.
>
> PJT, do you have any sane solution for this, I forgot what the result
> of the last discussion was -- was there any?

The conclusion after last discussion between PJT and Alex was that the load contribution of a fresh task be set to "full" during "__sched_fork()".

task->se.avg.load_avg_contrib = task->se.load.weight during __sched_fork() is reflected in the latest power aware scheduler patchset by Alex.

Thanks

Regards
Preeti U Murthy
Re: [RFC PATCH v3 3/6] sched: pack small tasks
> Isn't this basically related to picking the NO_HZ cpu; if the system
> isn't fully symmetric with its power gates you want the NO_HZ cpu to be
> the 'special' cpu. If it is symmetric we really don't care which core
> is left 'running' and we can even select a new pack cpu from the idle
> cores once the old one is fully utilized.

you don't really care much sure, but there's some advantages for sorting "all the way left", e.g. to linux cpu 0. Some tasks only run there, and interrupts tend to be favored to that cpu as well on x86.
Re: [RFC PATCH v3 3/6] sched: pack small tasks
On 26 March 2013 13:46, Peter Zijlstra wrote:
> On Fri, 2013-03-22 at 13:25 +0100, Vincent Guittot wrote:
>> During the creation of sched_domain, we define a pack buddy CPU for
>> each CPU when one is available. We want to pack at all levels where a
>> group of CPU can be power gated independently from others.
>> On a system that can't power gate a group of CPUs independently, the
>> flag is set at all sched_domain level and the buddy is set to -1. This
>> is the default behavior.
>> On a dual clusters / dual cores system which can power gate each core
>> and cluster independently, the buddy configuration will be :
>>
>>        | Cluster 0   | Cluster 1   |
>>        | CPU0 | CPU1 | CPU2 | CPU3 |
>> -----------------------------------
>> buddy  | CPU0 | CPU0 | CPU0 | CPU2 |
>
> I suppose this is adequate for the 'small' systems you currently have;
> but given that Samsung is already bragging with its 'octo'-core Exynos
> 5 (4+4 big-little thing) does this solution scale?

The packing is only done at MC and CPU level to minimize the number of transitions.

> Isn't this basically related to picking the NO_HZ cpu; if the system
> isn't fully symmetric with its power gates you want the NO_HZ cpu to be
> the 'special' cpu. If it is symmetric we really don't care which core
> is left 'running' and we can even select a new pack cpu from the idle
> cores once the old one is fully utilized.

I agree that on a symmetric system we don't really care which core is selected, but we want to use the same one whenever possible to prevent a ping-pong between several cores or groups of cores, which is power consuming. By forcing a NOHZ cpu, your background activity will smoothly pack on this CPU and will not be spread over your system. When a CPU is fully loaded, we no longer fall in a low CPU load use case, and the periodic load balance can handle the situation and select a new target CPU which is close to the buddy CPU.

> Re-using (or integrating) with NO_HZ has the dual advantage that you'll
> make NO_HZ do the right thing for big-little (you typically want a
> little core to be the one staying 'awake' and once someone makes NO_HZ
> scale this all gets to scale along with it.

I think you have answered this question in your comment on patch 5, haven't you?

Vincent
Re: [RFC PATCH v3 3/6] sched: pack small tasks
On 26 March 2013 13:37, Peter Zijlstra wrote:
> On Fri, 2013-03-22 at 13:25 +0100, Vincent Guittot wrote:
>> +static bool is_light_task(struct task_struct *p)
>> +{
>> +	/* A light task runs less than 20% in average */
>> +	return ((p->se.avg.runnable_avg_sum * 5) <
>> +		(p->se.avg.runnable_avg_period));
>> +}
>
> OK, so we have a 'problem' here, we initialize runnable_avg_* to 0, but
> we want to 'assume' a fresh task is fully 'loaded'. IIRC Alex ran into
> this as well.

Hi Peter,

The packing of small tasks is only applied at wake up and not during fork or exec, so the runnable_avg_* should have been initialized. As you mentioned, we assume that a fresh task is fully loaded and let the default scheduler behavior select a target CPU.

Vincent

> PJT, do you have any sane solution for this, I forgot what the result
> of the last discussion was -- was there any?
Re: [RFC PATCH v3 3/6] sched: pack small tasks
On Fri, 2013-03-22 at 13:25 +0100, Vincent Guittot wrote:
> During the creation of sched_domain, we define a pack buddy CPU for
> each CPU when one is available. We want to pack at all levels where a
> group of CPU can be power gated independently from others.
> On a system that can't power gate a group of CPUs independently, the
> flag is set at all sched_domain level and the buddy is set to -1. This
> is the default behavior.
> On a dual clusters / dual cores system which can power gate each core
> and cluster independently, the buddy configuration will be :
>
>        | Cluster 0   | Cluster 1   |
>        | CPU0 | CPU1 | CPU2 | CPU3 |
> -----------------------------------
> buddy  | CPU0 | CPU0 | CPU0 | CPU2 |

I suppose this is adequate for the 'small' systems you currently have; but given that Samsung is already bragging with its 'octo'-core Exynos 5 (4+4 big-little thing) does this solution scale?

Isn't this basically related to picking the NO_HZ cpu; if the system isn't fully symmetric with its power gates you want the NO_HZ cpu to be the 'special' cpu. If it is symmetric we really don't care which core is left 'running' and we can even select a new pack cpu from the idle cores once the old one is fully utilized.

Re-using (or integrating) with NO_HZ has the dual advantage that you'll make NO_HZ do the right thing for big-little (you typically want a little core to be the one staying 'awake') and once someone makes NO_HZ scale this all gets to scale along with it.
Re: [RFC PATCH v3 3/6] sched: pack small tasks
On Fri, 2013-03-22 at 13:25 +0100, Vincent Guittot wrote:
> +static bool is_light_task(struct task_struct *p)
> +{
> +	/* A light task runs less than 20% in average */
> +	return ((p->se.avg.runnable_avg_sum * 5) <
> +		(p->se.avg.runnable_avg_period));
> +}

OK, so we have a 'problem' here, we initialize runnable_avg_* to 0, but we want to 'assume' a fresh task is fully 'loaded'. IIRC Alex ran into this as well.

PJT, do you have any sane solution for this, I forgot what the result of the last discussion was -- was there any?
Re: [RFC PATCH v3 3/6] sched: pack small tasks
On Fri, 2013-03-22 at 13:25 +0100, Vincent Guittot wrote:
> +static bool is_buddy_busy(int cpu)
> +{
> +	struct rq *rq = cpu_rq(cpu);
> +
> +	/*
> +	 * A busy buddy is a CPU with a high load or a small load with a lot of
> +	 * running tasks.
> +	 */
> +	return (rq->avg.runnable_avg_sum >
> +		(rq->avg.runnable_avg_period / (rq->nr_running + 2)));
> +}

Why does the comment talk about load but we don't see it in the equation. Also, why does nr_running matter at all? I thought we'd simply bother with utilization, if fully utilized we're done etc..
[RFC PATCH v3 3/6] sched: pack small tasks
During the creation of sched_domain, we define a pack buddy CPU for each CPU when one is available. We want to pack at all levels where a group of CPU can be power gated independently from others.
On a system that can't power gate a group of CPUs independently, the flag is set at all sched_domain level and the buddy is set to -1. This is the default behavior.
On a dual clusters / dual cores system which can power gate each core and cluster independently, the buddy configuration will be :

       | Cluster 0   | Cluster 1   |
       | CPU0 | CPU1 | CPU2 | CPU3 |
-----------------------------------
buddy  | CPU0 | CPU0 | CPU0 | CPU2 |

Small tasks tend to slip out of the periodic load balance so the best place to choose to migrate them is during their wake up. The decision is in O(1) as we only check again one buddy CPU.

Signed-off-by: Vincent Guittot <vincent.guit...@linaro.org>
Reviewed-by: Morten Rasmussen <morten.rasmus...@arm.com>
---
 kernel/sched/core.c  |   1 +
 kernel/sched/fair.c  | 115 ++
 kernel/sched/sched.h |   5 +++
 3 files changed, 121 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b827e0c..21c35ce 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5662,6 +5662,7 @@ cpu_attach_domain(struct sched_domain *sd, struct root_domain *rd, int cpu)
 	rcu_assign_pointer(rq->sd, sd);
 	destroy_sched_domains(tmp, cpu);
 
+	update_packing_domain(cpu);
 	update_top_cache_domain(cpu);
 }
 
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9c2f726..021c7b7 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -160,6 +160,76 @@ void sched_init_granularity(void)
 	update_sysctl();
 }
 
+#ifdef CONFIG_SMP
+/*
+ * Save the id of the optimal CPU that should be used to pack small tasks
+ * The value -1 is used when no buddy has been found
+ */
+DEFINE_PER_CPU(int, sd_pack_buddy);
+
+/*
+ * Look for the best buddy CPU that can be used to pack small tasks
+ * We make the assumption that it isn't worth packing on CPUs that share the
+ * same powerline. We look for the 1st sched_domain without the
+ * SD_SHARE_POWERDOMAIN flag. Then we look for the sched_group with the lowest
+ * power per core based on the assumption that their power efficiency is
+ * better
+ */
+void update_packing_domain(int cpu)
+{
+	struct sched_domain *sd;
+	int id = -1;
+
+	sd = highest_flag_domain(cpu, SD_SHARE_POWERDOMAIN);
+	if (!sd)
+		sd = rcu_dereference_check_sched_domain(cpu_rq(cpu)->sd);
+	else
+		sd = sd->parent;
+
+	while (sd && (sd->flags & SD_LOAD_BALANCE)
+		&& !(sd->flags & SD_SHARE_POWERDOMAIN)) {
+		struct sched_group *sg = sd->groups;
+		struct sched_group *pack = sg;
+		struct sched_group *tmp;
+
+		/*
+		 * The sched_domain of a CPU points on the local sched_group
+		 * and the 1st CPU of this local group is a good candidate
+		 */
+		id = cpumask_first(sched_group_cpus(pack));
+
+		/* loop the sched groups to find the best one */
+		for (tmp = sg->next; tmp != sg; tmp = tmp->next) {
+			if (tmp->sgp->power * pack->group_weight >
+					pack->sgp->power * tmp->group_weight)
+				continue;
+
+			if ((tmp->sgp->power * pack->group_weight ==
+					pack->sgp->power * tmp->group_weight)
+			 && (cpumask_first(sched_group_cpus(tmp)) >= id))
+				continue;
+
+			/* we have found a better group */
+			pack = tmp;
+
+			/* Take the 1st CPU of the new group */
+			id = cpumask_first(sched_group_cpus(pack));
+		}
+
+		/* Look for another CPU than itself */
+		if (id != cpu)
+			break;
+
+		sd = sd->parent;
+	}
+
+	pr_debug("CPU%d packing on CPU%d\n", cpu, id);
+	per_cpu(sd_pack_buddy, cpu) = id;
+}
+
+#endif /* CONFIG_SMP */
+
 #if BITS_PER_LONG == 32
 # define WMULT_CONST	(~0UL)
 #else
@@ -3291,6 +3361,47 @@ done:
 	return target;
 }
 
+static bool is_buddy_busy(int cpu)
+{
+	struct rq *rq = cpu_rq(cpu);
+
+	/*
+	 * A busy buddy is a CPU with a high load or a small load with a lot of
+	 * running tasks.
+	 */
+	return (rq->avg.runnable_avg_sum >
+			(rq->avg.runnable_avg_period / (rq->nr_running + 2)));
+}
+
+static bool is_light_task(struct task_struct *p)
+{
+	/* A light task runs less than 20% in average */
+	return ((p->se.avg.runnable_avg_sum * 5) <
+			(p->se.avg.runnable_avg_period));
+}
+
+static int check_pack_buddy(int cpu, struct task_struct *p)
+{
+	int buddy = per_cpu(sd_pack_buddy,