Re: [RFC PATCH RT] push waiting rt tasks to cpus with lower prios.
On 10/9/07, Steven Rostedt <[EMAIL PROTECTED]> wrote: > This has been compiled tested (and no more ;-) > > > The idea here is when we find a situation that we just scheduled in an > RT task and we either pushed a lesser RT task away or more than one RT > task was scheduled on this CPU before scheduling occurred. > > What this patch does is an O(n) search of CPUs for the > CPU with the lowest prio task running. When that CPU is found the next > highest RT task is pushed to that CPU. It can be extended: search for the CPU that is running either a lower priority task or a task of the same priority as the highest RT task being pushed. If any CPU is found running a lower priority task (the lowest among all CPUs), push the task to that CPU. Otherwise, if no CPU with a lower priority task was found, find a CPU running a task of the same priority. Here there are two cases. Case 1: if the task currently running on this CPU has higher priority than the remote CPU's running task (i.e. the active task's priority), then the RT task can be pushed to that CPU, where it competes with the equal-priority task in round-robin fashion. Case 2: if the priority of the running task and of the task being pushed are the same (from the same queue .. queue->next->next), then the balancing has to be done on the number of tasks running on these CPUs, making them run an equal (or almost equal, considering the ping-pong effect) number of tasks. > > Some notes: > > 1) no lock is taken while looking for the lowest priority CPU. When one > is found, only that CPU's lock is taken and after that a check is made > to see if it is still a candidate to push the RT task over. If not, we > try the search again, for a max of 3 tries. > > 2) I only do this for the second highest RT task on the CPU queue. This > can be easily changed to do it for all RT tasks until no more can be > pushed off to other CPUs. > > This is a simple approach right now, and is only being posted for > comments. 
I'm sure more can be done to make this more efficient or just > simply better. > > -- Steve > > Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]> > > Index: linux-2.6.23-rc9-rt2/kernel/sched.c > === > --- linux-2.6.23-rc9-rt2.orig/kernel/sched.c > +++ linux-2.6.23-rc9-rt2/kernel/sched.c > @@ -304,6 +304,7 @@ struct rq { > #ifdef CONFIG_PREEMPT_RT > unsigned long rt_nr_running; > unsigned long rt_nr_uninterruptible; > + int curr_prio; > #endif > > unsigned long switch_timestamp; > @@ -1485,6 +1486,87 @@ next_in_queue: > static int double_lock_balance(struct rq *this_rq, struct rq *busiest); > > /* > + * If the current CPU has more than one RT task, see if the non > + * running task can migrate over to a CPU that is running a task > + * of lesser priority. > + */ > +static int push_rt_task(struct rq *this_rq) > +{ > + struct task_struct *next_task; > + struct rq *lowest_rq = NULL; > + int tries; > + int cpu; > + int dst_cpu = -1; > + int ret = 0; > + > + BUG_ON(!spin_is_locked(&this_rq->lock)); > + > + next_task = rt_next_highest_task(this_rq); > + if (!next_task) > + return 0; > + > + /* We might release this_rq lock */ > + get_task_struct(next_task); > + > + /* Only try this algorithm three times */ > + for (tries = 0; tries < 3; tries++) { > + /* > +* Scan each rq for the lowest prio. > +*/ > + for_each_cpu_mask(cpu, next_task->cpus_allowed) { > + struct rq *rq = &per_cpu(runqueues, cpu); > + > + if (cpu == smp_processor_id()) > + continue; > + > + /* no locking for now */ > + if (rq->curr_prio > next_task->prio && > + (!lowest_rq || rq->curr_prio < > lowest_rq->curr_prio)) { > + dst_cpu = cpu; > + lowest_rq = rq; > + } > + } > + > + if (!lowest_rq) > + break; > + > + if (double_lock_balance(this_rq, lowest_rq)) { > + /* > +* We had to unlock the run queue. In > +* the mean time, next_task could have > +* migrated already or had its affinity changed. 
> +*/ > + if (unlikely(task_rq(next_task) != this_rq || > +!cpu_isset(dst_cpu, > next_task->cpus_allowed))) { > + spin_unlock(&lowest_rq->lock); > + break; > + } > + } > + > + /* if the prio of this runqueue changed, try again */ > + if (lowest_rq->curr_prio <= next_task->prio) { > +
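The extension proposed above can be sketched in plain userspace C. This is a hypothetical helper, not the patch's code: the arrays, names, and the two-pass structure are illustrative assumptions, with the kernel's convention that a lower prio value means higher priority.

```c
#include <assert.h>

/*
 * Hypothetical sketch of the extended selection policy: first look for
 * the CPU whose running task has the lowest priority (largest prio
 * value) below the pushed task's priority; if none exists, fall back to
 * a CPU running an equal-priority task, preferring the one with the
 * fewest runnable tasks.  Returns -1 when there is no candidate.
 */
int pick_dst_cpu(const int curr_prio[], const int nr_running[],
                 int nr_cpus, int this_cpu, int task_prio)
{
    int cpu, dst = -1;

    /* Pass 1: lowest-priority CPU whose task the pushed task preempts. */
    for (cpu = 0; cpu < nr_cpus; cpu++) {
        if (cpu == this_cpu)
            continue;
        if (curr_prio[cpu] > task_prio &&
            (dst < 0 || curr_prio[cpu] > curr_prio[dst]))
            dst = cpu;
    }
    if (dst >= 0)
        return dst;

    /* Pass 2: equal priority; balance on run-queue length instead. */
    for (cpu = 0; cpu < nr_cpus; cpu++) {
        if (cpu == this_cpu || curr_prio[cpu] != task_prio)
            continue;
        if (dst < 0 || nr_running[cpu] < nr_running[dst])
            dst = cpu;
    }
    return dst;
}
```

Case 1 of the proposal corresponds to pass 2 finding any equal-priority CPU (round-robin there); case 2 is the tie-break on `nr_running`.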
Re: [RFC PATCH RT] push waiting rt tasks to cpus with lower prios.
On Tue, Oct 09, 2007 at 04:50:47PM -0400, Steven Rostedt wrote: > > I did something like this a while ago for another scheduling project. > > A couple 'possible' optimizations to think about are: > > 1) Only scan the remote runqueues once and keep a local copy of the > >remote priorities for subsequent 'scans'. Accessing the remote > >runqueues (CPU specific cache lines) can be expensive. > > You mean to keep the copy for the next two tries? Yes. But with #2 below, your next try is the runqueue/CPU that is the next best candidate (after the trylock fails). The 'hope' is that there is more than one candidate CPU to push the task to. Of course, you always want to try and find the 'best' candidate. My thoughts were that if you could find ANY cpu to take the task that would be better than sending the IPI everywhere. With multiple runqueues/locks there is no way you can be guaranteed to make the 'best' placement. So, a good placement may be enough. > > 2) When verifying priorities, just perform spin_trylock() on the remote > >runqueue. If you can immediately get it great. If not, it implies > >someone else is messing with the runqueue and there is a good chance > >the data you pre-fetched (curr->Priority) is invalid. In this case > >it might be faster to just 'move on' to the next candidate runqueue/CPU. > >i.e. The next highest priority that the new task can preempt. -- Mike - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
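The two optimizations combine naturally: walk the candidate CPUs in best-first order, take each remote lock only with a trylock, and re-check the prefetched priority under the lock; a busy lock is treated as a hint that the cached priority is stale. A rough userspace sketch, with pthread mutexes standing in for runqueue spinlocks and all names hypothetical:

```c
#include <pthread.h>

/* Stand-in for a runqueue: just its lock and the running task's
 * priority (kernel convention: lower value = higher priority). */
struct fake_rq {
    pthread_mutex_t lock;
    int curr_prio;
};

/*
 * Walk candidates best-first.  trylock each one; if the lock is busy,
 * someone else is updating that runqueue, so skip to the next candidate
 * rather than spinning.  On success, returns the index of a candidate
 * whose running task is still preemptible (curr_prio > task_prio) and
 * leaves that candidate's lock held for the caller; returns -1 if no
 * candidate could be locked and validated.
 */
int lock_any_candidate(struct fake_rq *candidates[], int n, int task_prio)
{
    for (int i = 0; i < n; i++) {
        struct fake_rq *rq = candidates[i];

        if (pthread_mutex_trylock(&rq->lock) != 0)
            continue;                    /* busy: cached prio suspect */
        if (rq->curr_prio > task_prio)
            return i;                    /* still preemptible: push here */
        pthread_mutex_unlock(&rq->lock); /* prio changed under us */
    }
    return -1;
}
```

This matches the "a good placement may be enough" point: the loop settles for any CPU that validates rather than spinning on the lock of the best one.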
Re: [RFC PATCH RT] push waiting rt tasks to cpus with lower prios.
-- On Tue, 9 Oct 2007, mike kravetz wrote: > > I did something like this a while ago for another scheduling project. > A couple 'possible' optimizations to think about are: > 1) Only scan the remote runqueues once and keep a local copy of the >remote priorities for subsequent 'scans'. Accessing the remote >runqueues (CPU specific cache lines) can be expensive. You mean to keep the copy for the next two tries? > 2) When verifying priorities, just perform spin_trylock() on the remote >runqueue. If you can immediately get it great. If not, it implies >someone else is messing with the runqueue and there is a good chance >the data you pre-fetched (curr->Priority) is invalid. In this case >it might be faster to just 'move on' to the next candidate runqueue/CPU. >i.e. The next highest priority that the new task can preempt. I was a bit scared of grabbing the lock anyway, because that's another cache hit (write side). So only grabbing the lock when needed would save us from dirtying the runqueue lock for each CPU. > > Of course, these 'optimizations' would change the algorithm. Trying to > make any decision based on data that is changing is always a crap shoot. :) Yes indeed. The aim for now is to solve the latencies that you've been seeing. But really, there are still holes (small ones) that can cause a latency if a schedule happened "just right". Hopefully the final result of this work will close them too. -- Steve
Re: [RFC PATCH RT] push waiting rt tasks to cpus with lower prios.
On Tue, Oct 09, 2007 at 01:59:37PM -0400, Steven Rostedt wrote: > This has been compiled tested (and no more ;-) > > The idea here is when we find a situation that we just scheduled in an > RT task and we either pushed a lesser RT task away or more than one RT > task was scheduled on this CPU before scheduling occurred. > > What this patch does is an O(n) search of CPUs for the > CPU with the lowest prio task running. When that CPU is found the next > highest RT task is pushed to that CPU. > > Some notes: > > 1) no lock is taken while looking for the lowest priority CPU. When one > is found, only that CPU's lock is taken and after that a check is made > to see if it is still a candidate to push the RT task over. If not, we > try the search again, for a max of 3 tries. I did something like this a while ago for another scheduling project. A couple 'possible' optimizations to think about are: 1) Only scan the remote runqueues once and keep a local copy of the remote priorities for subsequent 'scans'. Accessing the remote runqueues (CPU specific cache lines) can be expensive. 2) When verifying priorities, just perform spin_trylock() on the remote runqueue. If you can immediately get it great. If not, it implies someone else is messing with the runqueue and there is a good chance the data you pre-fetched (curr->Priority) is invalid. In this case it might be faster to just 'move on' to the next candidate runqueue/CPU. i.e. The next highest priority that the new task can preempt. Of course, these 'optimizations' would change the algorithm. Trying to make any decision based on data that is changing is always a crap shoot. :) -- Mike
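Optimization 1 above, scan once and serve the retries from a local copy, might look like this as a userspace sketch (hypothetical names; kernel prio convention, lower value = higher priority). The snapshot can of course go stale, so the caller would still re-validate under the runqueue lock before actually pushing:

```c
#include <string.h>

#define NR_CPUS 8

/* Local, CPU-private copy of the remote curr_prio values, filled once
 * so later "scans" don't touch each remote runqueue's cache line again. */
struct prio_snapshot {
    int prio[NR_CPUS];
};

void snapshot_prios(const int live_prio[], struct prio_snapshot *snap, int n)
{
    memcpy(snap->prio, live_prio, n * sizeof(int));
}

/*
 * Best candidate according to the snapshot: the lowest-priority CPU
 * (largest prio value) still below task_prio, excluding this CPU and
 * any CPUs already tried on earlier attempts.  Returns -1 when the
 * snapshot has no remaining candidate.
 */
int snapshot_best(const struct prio_snapshot *snap, int n, int this_cpu,
                  int task_prio, const int tried[])
{
    int cpu, dst = -1;

    for (cpu = 0; cpu < n; cpu++) {
        if (cpu == this_cpu || tried[cpu])
            continue;
        if (snap->prio[cpu] > task_prio &&
            (dst < 0 || snap->prio[cpu] > snap->prio[dst]))
            dst = cpu;
    }
    return dst;
}
```

Each failed attempt marks its CPU in `tried[]`, so the retry loop walks the snapshot from best candidate to next best without rescanning the remote runqueues.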
Re: [RFC PATCH RT] push waiting rt tasks to cpus with lower prios.
-- On Tue, 9 Oct 2007, Peter Zijlstra wrote: > > Do we really want this PREEMPT_RT only? Yes, it will give us better benchmarks ;-) > > > Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]> > > > > Index: linux-2.6.23-rc9-rt2/kernel/sched.c > > === > > --- linux-2.6.23-rc9-rt2.orig/kernel/sched.c > > +++ linux-2.6.23-rc9-rt2/kernel/sched.c > > @@ -304,6 +304,7 @@ struct rq { > > #ifdef CONFIG_PREEMPT_RT > > unsigned long rt_nr_running; > > unsigned long rt_nr_uninterruptible; > > + int curr_prio; > > #endif > > > > unsigned long switch_timestamp; > > @@ -1485,6 +1486,87 @@ next_in_queue: > > static int double_lock_balance(struct rq *this_rq, struct rq *busiest); > > > > /* > > + * If the current CPU has more than one RT task, see if the non > > + * running task can migrate over to a CPU that is running a task > > + * of lesser priority. > > + */ > > +static int push_rt_task(struct rq *this_rq) > > +{ > > + struct task_struct *next_task; > > + struct rq *lowest_rq = NULL; > > + int tries; > > + int cpu; > > + int dst_cpu = -1; > > + int ret = 0; > > + > > + BUG_ON(!spin_is_locked(&this_rq->lock)); > > assert_spin_locked(&this_rq->lock); Damn! I know that. Thanks, will fix. > > > + > > + next_task = rt_next_highest_task(this_rq); > > + if (!next_task) > > + return 0; > > + > > + /* We might release this_rq lock */ > > + get_task_struct(next_task); > > Can the rest of the code suffer this? (the caller that is) I need to add a comment at the top to state that this function can do this. Now is it OK with the current caller? I need to look more closely. I might need to change where this is actually called. As stated, this hasn't been tested. But you are right, this needs to be looked closely at. > > > + /* Only try this algorithm three times */ > > + for (tries = 0; tries < 3; tries++) { > > magic numbers.. maybe a magic #define with a descriptive name? Hehe, that's one of the clean ups that need to be done ;-) > > > + /* > > +* Scan each rq for the lowest prio. 
> > +*/ > > + for_each_cpu_mask(cpu, next_task->cpus_allowed) { > > + struct rq *rq = &per_cpu(runqueues, cpu); > > + > > + if (cpu == smp_processor_id()) > > + continue; > > + > > + /* no locking for now */ > > + if (rq->curr_prio > next_task->prio && > > + (!lowest_rq || rq->curr_prio < > > lowest_rq->curr_prio)) { > > + dst_cpu = cpu; > > + lowest_rq = rq; > > + } > > + } > > + > > + if (!lowest_rq) > > + break; > > + > > + if (double_lock_balance(this_rq, lowest_rq)) { > > + /* > > +* We had to unlock the run queue. In > > +* the mean time, next_task could have > > +* migrated already or had its affinity changed. > > +*/ > > + if (unlikely(task_rq(next_task) != this_rq || > > +!cpu_isset(dst_cpu, > > next_task->cpus_allowed))) { > > + spin_unlock(&lowest_rq->lock); > > + break; > > + } > > + } > > + > > + /* if the prio of this runqueue changed, try again */ > > + if (lowest_rq->curr_prio <= next_task->prio) { > > + spin_unlock(&lowest_rq->lock); > > + continue; > > + } > > + > > + deactivate_task(this_rq, next_task, 0); > > + set_task_cpu(next_task, dst_cpu); > > + activate_task(lowest_rq, next_task, 0); > > + > > + set_tsk_need_resched(lowest_rq->curr); > > Use resched_task(), that will notify the remote cpu too. OK, will do. > > > + > > + spin_unlock(&lowest_rq->lock); > > + ret = 1; > > + > > + break; > > + } > > + > > + put_task_struct(next_task); > > + > > + return ret; > > +} > > + > > +/* > > * Pull RT tasks from other CPUs in the RT-overload > > * case. Interrupts are disabled, local rq is locked. > > */ > > @@ -2207,7 +2289,8 @@ static inline void finish_task_switch(st > > * If we pushed an RT task off the runqueue, > > * then kick other CPUs, they might run it: > > */ > > - if (unlikely(rt_task(current) && rq->rt_nr_running > 1)) { > > + rq->curr_prio = current->prio; > > + if (unlikely(rt_task(current) && push_rt_task(rq))) { > > schedstat_inc(rq, rto_schedule); > > smp_send_reschedule_allbutself_cpumask(current->cpus_allowed); > > Which will allow you to remove this thing. 
OK, will do. Note that this is where we need to see if it is OK to release the runqueue lock. > > > } > Index: linux-2.6.23-rc9-rt2/kernel/sched_rt.c > >
Re: [RFC PATCH RT] push waiting rt tasks to cpus with lower prios.
On Tue, 2007-10-09 at 13:59 -0400, Steven Rostedt wrote: > This has been compiled tested (and no more ;-) > > > The idea here is when we find a situation that we just scheduled in an > RT task and we either pushed a lesser RT task away or more than one RT > task was scheduled on this CPU before scheduling occurred. > > What this patch does is an O(n) search of CPUs for the > CPU with the lowest prio task running. When that CPU is found the next > highest RT task is pushed to that CPU. > > Some notes: > > 1) no lock is taken while looking for the lowest priority CPU. When one > is found, only that CPU's lock is taken and after that a check is made > to see if it is still a candidate to push the RT task over. If not, we > try the search again, for a max of 3 tries. > > 2) I only do this for the second highest RT task on the CPU queue. This > can be easily changed to do it for all RT tasks until no more can be > pushed off to other CPUs. > > This is a simple approach right now, and is only being posted for > comments. I'm sure more can be done to make this more efficient or just > simply better. > > -- Steve Do we really want this PREEMPT_RT only? > Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]> > > Index: linux-2.6.23-rc9-rt2/kernel/sched.c > === > --- linux-2.6.23-rc9-rt2.orig/kernel/sched.c > +++ linux-2.6.23-rc9-rt2/kernel/sched.c > @@ -304,6 +304,7 @@ struct rq { > #ifdef CONFIG_PREEMPT_RT > unsigned long rt_nr_running; > unsigned long rt_nr_uninterruptible; > + int curr_prio; > #endif > > unsigned long switch_timestamp; > @@ -1485,6 +1486,87 @@ next_in_queue: > static int double_lock_balance(struct rq *this_rq, struct rq *busiest); > > /* > + * If the current CPU has more than one RT task, see if the non > + * running task can migrate over to a CPU that is running a task > + * of lesser priority. 
> + */ > +static int push_rt_task(struct rq *this_rq) > +{ > + struct task_struct *next_task; > + struct rq *lowest_rq = NULL; > + int tries; > + int cpu; > + int dst_cpu = -1; > + int ret = 0; > + > + BUG_ON(!spin_is_locked(&this_rq->lock)); assert_spin_locked(&this_rq->lock); > + > + next_task = rt_next_highest_task(this_rq); > + if (!next_task) > + return 0; > + > + /* We might release this_rq lock */ > + get_task_struct(next_task); Can the rest of the code suffer this? (the caller that is) > + /* Only try this algorithm three times */ > + for (tries = 0; tries < 3; tries++) { magic numbers.. maybe a magic #define with a descriptive name? > + /* > + * Scan each rq for the lowest prio. > + */ > + for_each_cpu_mask(cpu, next_task->cpus_allowed) { > + struct rq *rq = &per_cpu(runqueues, cpu); > + > + if (cpu == smp_processor_id()) > + continue; > + > + /* no locking for now */ > + if (rq->curr_prio > next_task->prio && > + (!lowest_rq || rq->curr_prio < > lowest_rq->curr_prio)) { > + dst_cpu = cpu; > + lowest_rq = rq; > + } > + } > + > + if (!lowest_rq) > + break; > + > + if (double_lock_balance(this_rq, lowest_rq)) { > + /* > + * We had to unlock the run queue. In > + * the mean time, next_task could have > + * migrated already or had its affinity changed. > + */ > + if (unlikely(task_rq(next_task) != this_rq || > + !cpu_isset(dst_cpu, > next_task->cpus_allowed))) { > + spin_unlock(&lowest_rq->lock); > + break; > + } > + } > + > + /* if the prio of this runqueue changed, try again */ > + if (lowest_rq->curr_prio <= next_task->prio) { > + spin_unlock(&lowest_rq->lock); > + continue; > + } > + > + deactivate_task(this_rq, next_task, 0); > + set_task_cpu(next_task, dst_cpu); > + activate_task(lowest_rq, next_task, 0); > + > + set_tsk_need_resched(lowest_rq->curr); Use resched_task(), that will notify the remote cpu too. 
> + > + spin_unlock(&lowest_rq->lock); > + ret = 1; > + > + break; > + } > + > + put_task_struct(next_task); > + > + return ret; > +} > + > +/* > > * Pull RT tasks from other CPUs in the RT-overload > * case. Interrupts are disabled, local rq is locked. > */ > @@ -2207,7 +2289,8 @@ static inline void finish_task_switch(st >* If we pushed an RT task off the runqueue, >* then kick other CPUs,
Re: [RFC PATCH RT] push waiting rt tasks to cpus with lower prios.
-- On Tue, 9 Oct 2007, Steven Rostedt wrote: > This has been compiled tested (and no more ;-) > > > The idea here is when we find a situation that we just scheduled in an > RT task and we either pushed a lesser RT task away or more than one RT > task was scheduled on this CPU before scheduling occurred. > > What this patch does is an O(n) search of CPUs for the > CPU with the lowest prio task running. When that CPU is found the next > highest RT task is pushed to that CPU. I don't want that O(n) to scare anyone. It really is O(1) but with a constant K = NR_CPUS. I was saying that if you grow NR_CPUS the search grows too. -- Steve
Re: [RFC PATCH RT] push waiting rt tasks to cpus with lower prios.
-- On Tue, 9 Oct 2007, Steven Rostedt wrote: This has been complied tested (and no more ;-) The idea here is when we find a situation that we just scheduled in an RT task and we either pushed a lesser RT task away or more than one RT task was scheduled on this CPU before scheduling occurred. The answer that this patch does is to do a O(n) search of CPUs for the CPU with the lowest prio task running. When that CPU is found the next highest RT task is pushed to that CPU. I don't want that O(n) to scare anyone. It really is a O(1) but with a K = NR_CPUS. I was saying if you grow the NR_CPUS the search grows too. -- Steve - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH RT] push waiting rt tasks to cpus with lower prios.
On Tue, 2007-10-09 at 13:59 -0400, Steven Rostedt wrote: This has been complied tested (and no more ;-) The idea here is when we find a situation that we just scheduled in an RT task and we either pushed a lesser RT task away or more than one RT task was scheduled on this CPU before scheduling occurred. The answer that this patch does is to do a O(n) search of CPUs for the CPU with the lowest prio task running. When that CPU is found the next highest RT task is pushed to that CPU. Some notes: 1) no lock is taken while looking for the lowest priority CPU. When one is found, only that CPU's lock is taken and after that a check is made to see if it is still a candidate to push the RT task over. If not, we try the search again, for a max of 3 tries. 2) I only do this for the second highest RT task on the CPU queue. This can be easily changed to do it for all RT tasks until no more can be pushed off to other CPUs. This is a simple approach right now, and is only being posted for comments. I'm sure more can be done to make this more efficient or just simply better. -- Steve Do we really want this PREEMPT_RT only? Signed-off-by: Steven Rostedt [EMAIL PROTECTED] Index: linux-2.6.23-rc9-rt2/kernel/sched.c === --- linux-2.6.23-rc9-rt2.orig/kernel/sched.c +++ linux-2.6.23-rc9-rt2/kernel/sched.c @@ -304,6 +304,7 @@ struct rq { #ifdef CONFIG_PREEMPT_RT unsigned long rt_nr_running; unsigned long rt_nr_uninterruptible; + int curr_prio; #endif unsigned long switch_timestamp; @@ -1485,6 +1486,87 @@ next_in_queue: static int double_lock_balance(struct rq *this_rq, struct rq *busiest); /* + * If the current CPU has more than one RT task, see if the non + * running task can migrate over to a CPU that is running a task + * of lesser priority. 
+ */ +static int push_rt_task(struct rq *this_rq) +{ + struct task_struct *next_task; + struct rq *lowest_rq = NULL; + int tries; + int cpu; + int dst_cpu = -1; + int ret = 0; + + BUG_ON(!spin_is_locked(this_rq-lock)); assert_spin_locked(this_rq-lock); + + next_task = rt_next_highest_task(this_rq); + if (!next_task) + return 0; + + /* We might release this_rq lock */ + get_task_struct(next_task); Can the rest of the code suffer this? (the caller that is) + /* Only try this algorithm three times */ + for (tries = 0; tries 3; tries++) { magic numbers.. maybe a magic #define with a descriptive name? + /* + * Scan each rq for the lowest prio. + */ + for_each_cpu_mask(cpu, next_task-cpus_allowed) { + struct rq *rq = per_cpu(runqueues, cpu); + + if (cpu == smp_processor_id()) + continue; + + /* no locking for now */ + if (rq-curr_prio next_task-prio + (!lowest_rq || rq-curr_prio lowest_rq-curr_prio)) { + dst_cpu = cpu; + lowest_rq = rq; + } + } + + if (!lowest_rq) + break; + + if (double_lock_balance(this_rq, lowest_rq)) { + /* + * We had to unlock the run queue. In + * the mean time, next_task could have + * migrated already or had its affinity changed. + */ + if (unlikely(task_rq(next_task) != this_rq || + !cpu_isset(dst_cpu, next_task-cpus_allowed))) { + spin_unlock(lowest_rq-lock); + break; + } + } + + /* if the prio of this runqueue changed, try again */ + if (lowest_rq-curr_prio = next_task-prio) { + spin_unlock(lowest_rq-lock); + continue; + } + + deactivate_task(this_rq, next_task, 0); + set_task_cpu(next_task, dst_cpu); + activate_task(lowest_rq, next_task, 0); + + set_tsk_need_resched(lowest_rq-curr); Use resched_task(), that will notify the remote cpu too. + + spin_unlock(lowest_rq-lock); + ret = 1; + + break; + } + + put_task_struct(next_task); + + return ret; +} + +/* * Pull RT tasks from other CPUs in the RT-overload * case. Interrupts are disabled, local rq is locked. 
*/ @@ -2207,7 +2289,8 @@ static inline void finish_task_switch(st * If we pushed an RT task off the runqueue, * then kick other CPUs, they might run it: */ - if (unlikely(rt_task(current) rq-rt_nr_running 1)) { + rq-curr_prio =
Re: [RFC PATCH RT] push waiting rt tasks to cpus with lower prios.
-- On Tue, 9 Oct 2007, Peter Zijlstra wrote: Do we really want this PREEMPT_RT only? Yes, it will give us better benchmarks ;-) Signed-off-by: Steven Rostedt [EMAIL PROTECTED] Index: linux-2.6.23-rc9-rt2/kernel/sched.c === --- linux-2.6.23-rc9-rt2.orig/kernel/sched.c +++ linux-2.6.23-rc9-rt2/kernel/sched.c @@ -304,6 +304,7 @@ struct rq { #ifdef CONFIG_PREEMPT_RT unsigned long rt_nr_running; unsigned long rt_nr_uninterruptible; + int curr_prio; #endif unsigned long switch_timestamp; @@ -1485,6 +1486,87 @@ next_in_queue: static int double_lock_balance(struct rq *this_rq, struct rq *busiest); /* + * If the current CPU has more than one RT task, see if the non + * running task can migrate over to a CPU that is running a task + * of lesser priority. + */ +static int push_rt_task(struct rq *this_rq) +{ + struct task_struct *next_task; + struct rq *lowest_rq = NULL; + int tries; + int cpu; + int dst_cpu = -1; + int ret = 0; + + BUG_ON(!spin_is_locked(this_rq-lock)); assert_spin_locked(this_rq-lock); Damn! I know that. Thanks, will fix. + + next_task = rt_next_highest_task(this_rq); + if (!next_task) + return 0; + + /* We might release this_rq lock */ + get_task_struct(next_task); Can the rest of the code suffer this? (the caller that is) I need to add a comment at the top to state that this function can do this. Now is it OK with the current caller? I need to look more closely. I might need to change where this is actually called. As stated, this hasn't been tested. But you are right, this needs to be looked closely at. + /* Only try this algorithm three times */ + for (tries = 0; tries 3; tries++) { magic numbers.. maybe a magic #define with a descriptive name? Hehe, that's one of the clean ups that need to be done ;-) + /* +* Scan each rq for the lowest prio. 
+*/ + for_each_cpu_mask(cpu, next_task-cpus_allowed) { + struct rq *rq = per_cpu(runqueues, cpu); + + if (cpu == smp_processor_id()) + continue; + + /* no locking for now */ + if (rq-curr_prio next_task-prio + (!lowest_rq || rq-curr_prio lowest_rq-curr_prio)) { + dst_cpu = cpu; + lowest_rq = rq; + } + } + + if (!lowest_rq) + break; + + if (double_lock_balance(this_rq, lowest_rq)) { + /* +* We had to unlock the run queue. In +* the mean time, next_task could have +* migrated already or had its affinity changed. +*/ + if (unlikely(task_rq(next_task) != this_rq || +!cpu_isset(dst_cpu, next_task-cpus_allowed))) { + spin_unlock(lowest_rq-lock); + break; + } + } + + /* if the prio of this runqueue changed, try again */ + if (lowest_rq-curr_prio = next_task-prio) { + spin_unlock(lowest_rq-lock); + continue; + } + + deactivate_task(this_rq, next_task, 0); + set_task_cpu(next_task, dst_cpu); + activate_task(lowest_rq, next_task, 0); + + set_tsk_need_resched(lowest_rq-curr); Use resched_task(), that will notify the remote cpu too. OK, will do. + + spin_unlock(lowest_rq-lock); + ret = 1; + + break; + } + + put_task_struct(next_task); + + return ret; +} + +/* * Pull RT tasks from other CPUs in the RT-overload * case. Interrupts are disabled, local rq is locked. */ @@ -2207,7 +2289,8 @@ static inline void finish_task_switch(st * If we pushed an RT task off the runqueue, * then kick other CPUs, they might run it: */ - if (unlikely(rt_task(current) rq-rt_nr_running 1)) { + rq-curr_prio = current-prio; + if (unlikely(rt_task(current) push_rt_task(rq))) { schedstat_inc(rq, rto_schedule); smp_send_reschedule_allbutself_cpumask(current-cpus_allowed); Which will allow you to remove this thing. OK, will do. Note, that this is where we need to see if it is ok to release the runqueue lock. 
} Index: linux-2.6.23-rc9-rt2/kernel/sched_rt.c === --- linux-2.6.23-rc9-rt2.orig/kernel/sched_rt.c +++ linux-2.6.23-rc9-rt2/kernel/sched_rt.c @@ -96,6 +96,48 @@ static struct task_struct *pick_next_tas return next; } +#ifdef
Re: [RFC PATCH RT] push waiting rt tasks to cpus with lower prios.
On Tue, Oct 09, 2007 at 01:59:37PM -0400, Steven Rostedt wrote: This has been complied tested (and no more ;-) The idea here is when we find a situation that we just scheduled in an RT task and we either pushed a lesser RT task away or more than one RT task was scheduled on this CPU before scheduling occurred. The answer that this patch does is to do a O(n) search of CPUs for the CPU with the lowest prio task running. When that CPU is found the next highest RT task is pushed to that CPU. Some notes: 1) no lock is taken while looking for the lowest priority CPU. When one is found, only that CPU's lock is taken and after that a check is made to see if it is still a candidate to push the RT task over. If not, we try the search again, for a max of 3 tries. I did something like this a while ago for another scheduling project. A couple 'possible' optimizations to think about are: 1) Only scan the remote runqueues once and keep a local copy of the remote priorities for subsequent 'scans'. Accessing the remote runqueus (CPU specific cache lines) can be expensive. 2) When verifying priorities, just perform spin_trylock() on the remote runqueue. If you can immediately get it great. If not, it implies someone else is messing with the runqueue and there is a good chance the data you pre-fetched (curr-Priority) is invalid. In this case it might be faster to just 'move on' to the next candidate runqueue/CPU. i.e. The next highest priority that the new task can preempt. Of course, these 'optimizations' would change the algorithm. Trying to make any decision based on data that is changing is always a crap shoot. :) -- Mike - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH RT] push waiting rt tasks to cpus with lower prios.
-- On Tue, 9 Oct 2007, mike kravetz wrote: I did something like this a while ago for another scheduling project. A couple 'possible' optimizations to think about are: 1) Only scan the remote runqueues once and keep a local copy of the remote priorities for subsequent 'scans'. Accessing the remote runqueus (CPU specific cache lines) can be expensive. You mean to keep the copy for the next two tries? 2) When verifying priorities, just perform spin_trylock() on the remote runqueue. If you can immediately get it great. If not, it implies someone else is messing with the runqueue and there is a good chance the data you pre-fetched (curr-Priority) is invalid. In this case it might be faster to just 'move on' to the next candidate runqueue/CPU. i.e. The next highest priority that the new task can preempt. I was a bit scared of grabing the lock anyway, because that's another cache hit (write side). So only grabbing the lock when needed would save us from dirting the runqueue lock for each CPU. Of course, these 'optimizations' would change the algorithm. Trying to make any decision based on data that is changing is always a crap shoot. :) Yes indeed. The aim for now is to solve the latencies that you've been seeing. But really, there is still holes (small ones) that can cause a latency if a schedule happened just right. Hopefully the final result of this work will close them too. -- Steve - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH RT] push waiting rt tasks to cpus with lower prios.
On Tue, Oct 09, 2007 at 04:50:47PM -0400, Steven Rostedt wrote: I did something like this a while ago for another scheduling project. A couple 'possible' optimizations to think about are: 1) Only scan the remote runqueues once and keep a local copy of the remote priorities for subsequent 'scans'. Accessing the remote runqueus (CPU specific cache lines) can be expensive. You mean to keep the copy for the next two tries? Yes. But with #2 below, your next try is the runqueue/CPU that is the next best candidate (after the trylock fails). The 'hope' is that there is more than one candidate CPU to push the task to. Of course, you always want to try and find the 'best' candidate. My thoughts were that if you could find ANY cpu to take the task that would be better than sending the IPI everywhere. With multiple runqueues/locks there is no way you can be guaranteed of making the 'best' placement. So, a good placement may be enough. 2) When verifying priorities, just perform spin_trylock() on the remote runqueue. If you can immediately get it great. If not, it implies someone else is messing with the runqueue and there is a good chance the data you pre-fetched (curr-Priority) is invalid. In this case it might be faster to just 'move on' to the next candidate runqueue/CPU. i.e. The next highest priority that the new task can preempt. -- Mike - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
-- Steve

Signed-off-by: Steven Rostedt [EMAIL PROTECTED]

Index: linux-2.6.23-rc9-rt2/kernel/sched.c
===================================================================
--- linux-2.6.23-rc9-rt2.orig/kernel/sched.c
+++ linux-2.6.23-rc9-rt2/kernel/sched.c
@@ -304,6 +304,7 @@ struct rq {
 #ifdef CONFIG_PREEMPT_RT
 	unsigned long rt_nr_running;
 	unsigned long rt_nr_uninterruptible;
+	int curr_prio;
 #endif

 	unsigned long switch_timestamp;
@@ -1485,6 +1486,87 @@ next_in_queue:

 static int double_lock_balance(struct rq *this_rq, struct rq *busiest);

+/*
+ * If the current CPU has more than one RT task, see if the non
+ * running task can migrate over to a CPU that is running a task
+ * of lesser priority.
+ */
+static int push_rt_task(struct rq *this_rq)
+{
+	struct task_struct *next_task;
+	struct rq *lowest_rq = NULL;
+	int tries;
+	int cpu;
+	int dst_cpu = -1;
+	int ret = 0;
+
+	BUG_ON(!spin_is_locked(&this_rq->lock));
+
+	next_task = rt_next_highest_task(this_rq);
+	if (!next_task)
+		return 0;
+
+	/* We might release this_rq lock */
+	get_task_struct(next_task);
+
+	/* Only try this algorithm three times */
+	for (tries = 0; tries < 3; tries++) {
+		/*
+		 * Scan each rq for the lowest prio.
+		 */
+		for_each_cpu_mask(cpu, next_task->cpus_allowed) {
+			struct rq *rq = &per_cpu(runqueues, cpu);
+
+			if (cpu == smp_processor_id())
+				continue;
+
+			/* no locking for now */
+			if (rq->curr_prio > next_task->prio &&
+			    (!lowest_rq || rq->curr_prio > lowest_rq->curr_prio)) {
+				dst_cpu = cpu;
+				lowest_rq = rq;
+			}
+		}
+
+		if (!lowest_rq)
+			break;
+
+		if (double_lock_balance(this_rq, lowest_rq)) {
+			/*
+			 * We had to unlock the run queue. In
+			 * the mean time, next_task could have
+			 * migrated already or had its affinity changed.
+			 */
+			if (unlikely(task_rq(next_task) != this_rq ||
+				     !cpu_isset(dst_cpu, next_task->cpus_allowed))) {
+				spin_unlock(&lowest_rq->lock);
+				break;
+			}
+		}
+
+		/* if the prio of this runqueue changed, try again */
+		if (lowest_rq->curr_prio <= next_task->prio) {
+			spin_unlock(&lowest_rq->lock);
+			continue;
+		}
+