Re: [patch][rfc] optimise resched, idle task

2005-04-03 Thread Nick Piggin
Nick Piggin wrote:
> This actually improves performance noticeably (i.e. a % or so) on schedule /
> wakeup happy benchmarks (tbench, on a dual Xeon with HT using mwait idle).
Here are some numbers on a 2 socket Xeon with HT and mwait idle.
Average of 3 runs, tbench, single client and single server processes on:
                same thread     same cpu        other cpu
before patch:   188.684 MB/s    189.237 MB/s    172.306 MB/s
after patch:    188.425 MB/s    191.628 MB/s    174.224 MB/s
The improvement in the other cpu case should be due to the removal of
the locked RMW operation in resched_task; a sketch of that change follows.
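
For reference, the hot path change amounts to replacing a locked
test-and-set with a plain read plus a conditional set. Roughly (a
sketch of the logic, not the literal patch text; the full diff is in
the original posting below):

	/* Before: always a locked RMW on p's thread_info flags. */
	need_resched = test_and_set_tsk_thread_flag(p, TIF_NEED_RESCHED);

	/* After: plain read first. The flag is only ever cleared under
	 * the runqueue lock, which resched_task() already holds, so the
	 * read cannot race with a clear; if the flag is already set,
	 * the locked operation is skipped entirely. */
	if (test_tsk_thread_flag(p, TIF_NEED_RESCHED))
		return;
	set_tsk_thread_flag(p, TIF_NEED_RESCHED);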


[patch][rfc] optimise resched, idle task

2005-04-01 Thread Nick Piggin
I haven't finished (hardly started) the arch code for this yet (and probably
broke ppc64), so it is just an RFC at this stage.
The large CC list is because it is a reasonably big change, so I hope one
or two of you smart guys can see if it looks sound before I try to tackle
the rest of the arch code and start asking for testers.
This actually improves performance noticeably (i.e. a % or so) on schedule /
wakeup happy benchmarks (tbench, on a dual Xeon with HT using mwait idle).
--
SUSE Labs, Novell Inc.
Make some changes to the TIF_NEED_RESCHED and TIF_POLLING_NRFLAG flags to
reduce confusion and make their semantics rigid. Also have preempt
explicitly disabled in idle routines. This improves the efficiency of
resched_task and some cpu_idle routines.

* In resched_task:
- TIF_NEED_RESCHED is only cleared with the task's runqueue lock held,
  and since we hold it during resched_task, there is no need for an
  atomic test-and-set there. (The only case where the atomic test-and-set
  could have prevented an IPI is when the task's quantum expires in the
  timer interrupt; that race is far too rare to be worth the cost of
  the atomic op.)

- If TIF_NEED_RESCHED is set, then we don't need to do anything. It
  won't get unset until the task gets schedule()d off.

- If we are running on the same CPU as the task we resched, then set
  TIF_NEED_RESCHED and no further action is required.

- If we are running on another CPU, and TIF_POLLING_NRFLAG is *not* set
  after TIF_NEED_RESCHED has been set, then we need to send an IPI.

Using these rules, we are able to remove the atomic test-and-set in
resched_task, and make clear the previously vague semantics of
POLLING_NRFLAG; the pairing with the idle side is sketched below.
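
Concretely, the cross-CPU case pairs two barrier-separated store/load
sequences. The following sketch (assembled from the patch's own calls,
not a literal excerpt) shows both sides:

	/* Waker, in resched_task(), with p's runqueue lock held: */
	set_tsk_thread_flag(p, TIF_NEED_RESCHED);	/* store NEED_RESCHED */
	smp_mb();					/* order store before load below */
	if (!test_tsk_thread_flag(p, TIF_POLLING_NRFLAG))
		smp_send_reschedule(task_cpu(p));	/* target will really halt: IPI */

	/* Idler, just before a halt that needs an interrupt to wake: */
	clear_thread_flag(TIF_POLLING_NRFLAG);		/* store: stop polling */
	smp_mb__after_clear_bit();			/* order store before load below */
	if (!need_resched())				/* load NEED_RESCHED */
		safe_halt();				/* else skip the halt */

Either the idler's need_resched() load sees the waker's store, or the
waker's POLLING_NRFLAG load sees the idler's clear and an IPI is sent;
the barriers rule out both sides reading stale values, so a wakeup
cannot be lost.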

* In idle routines:
- Enter cpu_idle with preempt disabled. When the need_resched() condition
  becomes true, explicitly call schedule(). This makes things a bit clearer
  (IMO), but I haven't updated all architectures yet.

- Many idle routines do a test-and-clear of TIF_NEED_RESCHED for some
  reason. According to the resched_task rules above, this isn't needed
  (and it actually breaks the assumption that TIF_NEED_RESCHED is only
  cleared with the runqueue lock held), so remove it. This generally
  saves one locked memory op when switching to the idle thread.

- Many idle routines clear TIF_POLLING_NRFLAG, and only set it in the
  innermost polling idle loops. The resched_task semantics above allow it
  to stay set right up until the last need_resched() check before entering
  a halt that requires an interrupt wakeup.

  Many idle routines simply never enter such a halt, so POLLING_NRFLAG can
  always be left set, completely eliminating resched IPIs when rescheduling
  the idle task; see the sketch below.

  Widening the window in which POLLING_NRFLAG is set further reduces the
  chance of resched IPIs.
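
As an illustration (hypothetical code, not part of this patch), a
purely polling idle routine under the new rules takes roughly this
shape:

	static void poll_idle(void)
	{
		/* Set once and never cleared: resched_task() on another
		 * CPU sees it and skips the IPI; its setting of
		 * NEED_RESCHED alone breaks the inner loop. */
		set_thread_flag(TIF_POLLING_NRFLAG);

		while (1) {
			while (!need_resched())
				cpu_relax();

			/* cpu_idle now runs with preemption disabled, so
			 * drop it just long enough to reschedule. */
			preempt_enable_no_resched();
			schedule();
			preempt_disable();
		}
	}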

Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c	2005-03-27 00:25:49.000000000 +1100
+++ linux-2.6/kernel/sched.c	2005-03-27 00:27:30.000000000 +1100
@@ -796,21 +796,28 @@ static void deactivate_task(struct task_
 #ifdef CONFIG_SMP
 static void resched_task(task_t *p)
 {
-   int need_resched, nrpolling;
+   int cpu;
 
	assert_spin_locked(&task_rq(p)->lock);
 
-   /* minimise the chance of sending an interrupt to poll_idle() */
-   nrpolling = test_tsk_thread_flag(p,TIF_POLLING_NRFLAG);
-   need_resched = test_and_set_tsk_thread_flag(p,TIF_NEED_RESCHED);
-   nrpolling |= test_tsk_thread_flag(p,TIF_POLLING_NRFLAG);
+   if (test_tsk_thread_flag(p, TIF_NEED_RESCHED))
+   return;
+   
+   set_tsk_thread_flag(p, TIF_NEED_RESCHED);
 
-   if (!need_resched && !nrpolling && (task_cpu(p) != smp_processor_id()))
-   smp_send_reschedule(task_cpu(p));
+   cpu = task_cpu(p);
+   if (cpu == smp_processor_id())
+   return;
+
+   /* NEED_RESCHED must be visible before we test POLLING_NRFLAG */
+   smp_mb();
+   if (!test_tsk_thread_flag(p, TIF_POLLING_NRFLAG))
+   smp_send_reschedule(cpu);
 }
 #else
 static inline void resched_task(task_t *p)
 {
	assert_spin_locked(&task_rq(p)->lock);
set_tsk_need_resched(p);
 }
 #endif
Index: linux-2.6/arch/i386/kernel/process.c
===================================================================
--- linux-2.6.orig/arch/i386/kernel/process.c	2005-03-27 00:25:49.000000000 +1100
+++ linux-2.6/arch/i386/kernel/process.c	2005-03-27 00:27:30.000000000 +1100
@@ -95,14 +95,19 @@ EXPORT_SYMBOL(enable_hlt);
  */
 void default_idle(void)
 {
+   local_irq_enable();
+
if (!hlt_counter && boot_cpu_data.hlt_works_ok) {
-   local_irq_disable();
-   if (!need_resched())
+   clear_thread_flag(TIF_POLLING_NRFLAG);
+   smp_mb__after_clear_bit();
+   while (!need_resched()) {
+   local_irq_disable();
safe_halt();
-   else
-   local_irq_enable();
+   }
+   
