The idle_balance() code is called to do task load balancing just before going to idle. This makes sense as the CPU is about to sleep anyway. But currently it's called in the middle of the scheduler and in a place that must have interrupts disabled. That means, while the load balancing is going on, if a task wakes up on this CPU, it wont get to run while the interrupts are disabled. The idle task doing the balancing will be clueless about it.
There's no real reason that the idle_balance() needs to be called in the middle of schedule anyway. The only benefit is that if a task is pulled to this CPU, it can be scheduled without the need to schedule the idle task. But load balancing and migrating the task makes a switch to idle and back negligible. By using the post_schedule function pointer of the sched class, the unlikely branch in the hot path of the scheduler can be removed, and the idle task itself can do the load balancing. Another advantage of this, is that by moving the idle_balance to the post_schedule routine, interrupts can now be enabled in the load balance allowing for interrupts and wakeups to still occur on that CPU while a balance is taking place. The enabling of interrupts will come as a separate patch. Signed-off-by: Steven Rostedt <rost...@goodmis.org> Index: linux-trace.git/kernel/sched/core.c =================================================================== --- linux-trace.git.orig/kernel/sched/core.c +++ linux-trace.git/kernel/sched/core.c @@ -2926,9 +2926,6 @@ need_resched: pre_schedule(rq, prev); - if (unlikely(!rq->nr_running)) - idle_balance(cpu, rq); - put_prev_task(rq, prev); next = pick_next_task(rq); clear_tsk_need_resched(prev); Index: linux-trace.git/kernel/sched/idle_task.c =================================================================== --- linux-trace.git.orig/kernel/sched/idle_task.c +++ linux-trace.git/kernel/sched/idle_task.c @@ -25,6 +25,7 @@ static void check_preempt_curr_idle(stru static struct task_struct *pick_next_task_idle(struct rq *rq) { schedstat_inc(rq, sched_goidle); + rq->post_schedule = 1; return rq->idle; } @@ -69,6 +70,12 @@ static unsigned int get_rr_interval_idle return 0; } + +static void post_schedule_idle(struct rq *rq) +{ + idle_balance(smp_processor_id(), rq); +} + /* * Simple, special scheduling class for the per-CPU idle tasks: */ @@ -91,6 +98,8 @@ const struct sched_class idle_sched_clas .set_curr_task = set_curr_task_idle, .task_tick = task_tick_idle, + .post_schedule = post_schedule_idle, + .get_rr_interval = get_rr_interval_idle, .prio_changed = prio_changed_idle, -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/