Re: [RFC][PATCH 1/2] sched: Move idle_balance() to post_schedule

2013-01-05 Thread Frederic Weisbecker
2012/12/22 Steven Rostedt :
> The idle_balance() code is called to do task load balancing just before
> going to idle. This makes sense as the CPU is about to sleep anyway.
> But currently it's called in the middle of the scheduler and in a place
> that must have interrupts disabled. That means, while the load balancing
> is going on, if a task wakes up on this CPU, it wont get to run while
> the interrupts are disabled. The idle task doing the balancing will be
> clueless about it.
>
> There's no real reason that the idle_balance() needs to be called in the
> middle of schedule anyway. The only benefit is that if a task is pulled
> to this CPU, it can be scheduled without the need to schedule the idle
> task. But load balancing and migrating the task makes a switch to idle
> and back negligible.

This cleanup looks nice as it does not only let us enable interrupts
there but also debloats a bit the schedule() code from idle specific
code. So it would be a pity if the optimization that goes away with
your cleanup has any measurable impact. Is there any sensible
benchmark that can be run against this patch? Something that may
involve a lot of back and forth to idle with some bunch of tasks
running around on other CPUs?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH 1/2] sched: Move idle_balance() to post_schedule

2012-12-21 Thread Steven Rostedt
---
So much for saying it was well tested. As my email box is separate from
the development boxes, I didn't copy the write set of patches over. This
patch was tested thoroughly on 40 core and 4 core, but not UP. The UP
had some bugs that were fixed (but not sent :-p)

Here's the correct patch with the UP fixes. The other patch shouldn't be
affected.
---

sched: Move idle_balance() to post_schedule

The idle_balance() code is called to do task load balancing just before
going to idle. This makes sense as the CPU is about to sleep anyway.
But currently it's called in the middle of the scheduler and in a place
that must have interrupts disabled. That means, while the load balancing
is going on, if a task wakes up on this CPU, it wont get to run while
the interrupts are disabled. The idle task doing the balancing will be
clueless about it.

There's no real reason that the idle_balance() needs to be called in the
middle of schedule anyway. The only benefit is that if a task is pulled
to this CPU, it can be scheduled without the need to schedule the idle
task. But load balancing and migrating the task makes a switch to idle
and back negligible.

By using the post_schedule function pointer of the sched class, the
unlikely branch in the hot path of the scheduler can be removed, and
the idle task itself can do the load balancing.

Another advantage of this, is that by moving the idle_balance to the
post_schedule routine, interrupts can now be enabled in the load balance
allowing for interrupts and wakeups to still occur on that CPU while a
balance is taking place. The enabling of interrupts will come as a separate
patch.

Signed-off-by: Steven Rostedt 

Index: linux-trace.git/kernel/sched/core.c
===
--- linux-trace.git.orig/kernel/sched/core.c
+++ linux-trace.git/kernel/sched/core.c
@@ -2926,9 +2926,6 @@ need_resched:
 
pre_schedule(rq, prev);
 
-   if (unlikely(!rq->nr_running))
-   idle_balance(cpu, rq);
-
put_prev_task(rq, prev);
next = pick_next_task(rq);
clear_tsk_need_resched(prev);
Index: linux-trace.git/kernel/sched/idle_task.c
===
--- linux-trace.git.orig/kernel/sched/idle_task.c
+++ linux-trace.git/kernel/sched/idle_task.c
@@ -13,6 +13,11 @@ select_task_rq_idle(struct task_struct *
 {
return task_cpu(p); /* IDLE tasks as never migrated */
 }
+
+static void post_schedule_idle(struct rq *rq)
+{
+   idle_balance(smp_processor_id(), rq);
+}
 #endif /* CONFIG_SMP */
 /*
  * Idle tasks are unconditionally rescheduled:
@@ -25,6 +30,10 @@ static void check_preempt_curr_idle(stru
 static struct task_struct *pick_next_task_idle(struct rq *rq)
 {
schedstat_inc(rq, sched_goidle);
+#ifdef CONFIG_SMP
+   /* Trigger the post schedule to do an idle_balance */
+   rq->post_schedule = 1;
+#endif
return rq->idle;
 }
 
@@ -86,6 +95,7 @@ const struct sched_class idle_sched_clas
 
 #ifdef CONFIG_SMP
.select_task_rq = select_task_rq_idle,
+   .post_schedule  = post_schedule_idle,
 #endif
 
.set_curr_task  = set_curr_task_idle,


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[RFC][PATCH 1/2] sched: Move idle_balance() to post_schedule

2012-12-21 Thread Steven Rostedt
The idle_balance() code is called to do task load balancing just before
going to idle. This makes sense as the CPU is about to sleep anyway.
But currently it's called in the middle of the scheduler and in a place
that must have interrupts disabled. That means, while the load balancing
is going on, if a task wakes up on this CPU, it wont get to run while
the interrupts are disabled. The idle task doing the balancing will be
clueless about it.

There's no real reason that the idle_balance() needs to be called in the
middle of schedule anyway. The only benefit is that if a task is pulled
to this CPU, it can be scheduled without the need to schedule the idle
task. But load balancing and migrating the task makes a switch to idle
and back negligible.

By using the post_schedule function pointer of the sched class, the
unlikely branch in the hot path of the scheduler can be removed, and
the idle task itself can do the load balancing.

Another advantage of this, is that by moving the idle_balance to the
post_schedule routine, interrupts can now be enabled in the load balance
allowing for interrupts and wakeups to still occur on that CPU while a
balance is taking place. The enabling of interrupts will come as a separate
patch.

Signed-off-by: Steven Rostedt 

Index: linux-trace.git/kernel/sched/core.c
===
--- linux-trace.git.orig/kernel/sched/core.c
+++ linux-trace.git/kernel/sched/core.c
@@ -2926,9 +2926,6 @@ need_resched:
 
pre_schedule(rq, prev);
 
-   if (unlikely(!rq->nr_running))
-   idle_balance(cpu, rq);
-
put_prev_task(rq, prev);
next = pick_next_task(rq);
clear_tsk_need_resched(prev);
Index: linux-trace.git/kernel/sched/idle_task.c
===
--- linux-trace.git.orig/kernel/sched/idle_task.c
+++ linux-trace.git/kernel/sched/idle_task.c
@@ -25,6 +25,7 @@ static void check_preempt_curr_idle(stru
 static struct task_struct *pick_next_task_idle(struct rq *rq)
 {
schedstat_inc(rq, sched_goidle);
+   rq->post_schedule = 1;
return rq->idle;
 }
 
@@ -69,6 +70,12 @@ static unsigned int get_rr_interval_idle
return 0;
 }
 
+
+static void post_schedule_idle(struct rq *rq)
+{
+   idle_balance(smp_processor_id(), rq);
+}
+
 /*
  * Simple, special scheduling class for the per-CPU idle tasks:
  */
@@ -91,6 +98,8 @@ const struct sched_class idle_sched_clas
.set_curr_task  = set_curr_task_idle,
.task_tick  = task_tick_idle,
 
+   .post_schedule  = post_schedule_idle,
+
.get_rr_interval= get_rr_interval_idle,
 
.prio_changed   = prio_changed_idle,

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/