Re: [patch] sched: avoid div in rebalance_tick

2007-01-12 Thread Nick Piggin
On Fri, Jan 12, 2007 at 09:59:40AM +, Alan wrote:
> On Fri, 12 Jan 2007 07:02:13 +0100
> Nick Piggin <[EMAIL PROTECTED]> wrote:
> 
> > Just noticed this while looking at a bug.
> > Avoid an expensive integer divide 3 times per CPU per tick.
> 
> Integer divide is cheap on some modern processors, and multibit shift
> isn't on all embedded ones.
> 
> How about putting back scale = 1 and using
> 
> scale += scale;
> 
> instead of the shift and getting what ought to be even better results

OK, how about this? It only works out to be around 0.01% of my P3's CPU time
at 1000HZ, but it also did make the x86 code 16 bytes smaller.


--
Avoid expensive integer divide 3 times per CPU per tick.

A userspace test of this loop went from 26ns down to 19ns on a G5, and
from 123ns down to 28ns on a P3.

(Also avoid a variable bit shift, as suggested by Alan. The effect
of this wasn't noticeable on the CPUs I tested with.)

Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>

Index: linux-2.6/kernel/sched.c
===
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -2887,14 +2887,16 @@ static void active_load_balance(struct r
 static void update_load(struct rq *this_rq)
 {
 	unsigned long this_load;
-	int i, scale;
+	unsigned int i, scale;
 
 	this_load = this_rq->raw_weighted_load;
 
 	/* Update our load: */
-	for (i = 0, scale = 1; i < 3; i++, scale <<= 1) {
+	for (i = 0, scale = 1; i < 3; i++, scale += scale) {
 		unsigned long old_load, new_load;
 
+		/* scale is effectively 1 << i now, and >> i divides by scale */
+
 		old_load = this_rq->cpu_load[i];
 		new_load = this_load;
 		/*
@@ -2904,7 +2906,7 @@ static void update_load(struct rq *this_
 		 */
 		if (new_load > old_load)
 			new_load += scale-1;
-		this_rq->cpu_load[i] = (old_load*(scale-1) + new_load) / scale;
+		this_rq->cpu_load[i] = (old_load*(scale-1) + new_load) >> i;
 	}
 }
 
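A minimal userspace sketch in the spirit of the "userspace test of this loop" mentioned in the changelog (the actual test program was not posted, so this is not it). The loop body before and after the patch is copied into two hypothetical functions, update_load_div() and update_load_shift(), which work on a plain array instead of struct rq; versions_agree() is likewise a hypothetical helper that feeds both the same sequence of loads and compares every cpu_load[] slot.

```c
#include <assert.h>
#include <stdbool.h>

/* Number of averaged loads per runqueue, as in the patch (cpu_load[0..2]). */
#define NR_LOADS 3

/* Hypothetical copy of the pre-patch loop: scale doubles by shifting, average uses a divide. */
void update_load_div(unsigned long cpu_load[NR_LOADS], unsigned long this_load)
{
	unsigned int i, scale;

	for (i = 0, scale = 1; i < NR_LOADS; i++, scale <<= 1) {
		unsigned long old_load = cpu_load[i];
		unsigned long new_load = this_load;

		/* Round up while the load is rising. */
		if (new_load > old_load)
			new_load += scale - 1;
		cpu_load[i] = (old_load * (scale - 1) + new_load) / scale;
	}
}

/* Hypothetical copy of the patched loop: scale doubles by addition, divide becomes >> i. */
void update_load_shift(unsigned long cpu_load[NR_LOADS], unsigned long this_load)
{
	unsigned int i, scale;

	for (i = 0, scale = 1; i < NR_LOADS; i++, scale += scale) {
		unsigned long old_load = cpu_load[i];
		unsigned long new_load = this_load;

		/* The invariant the patch comment relies on. */
		assert(scale == 1u << i);
		if (new_load > old_load)
			new_load += scale - 1;
		/* Unsigned dividend and scale == 2^i, so >> i equals / scale. */
		cpu_load[i] = (old_load * (scale - 1) + new_load) >> i;
	}
}

/* Run both versions over the same load sequence; false if they ever differ. */
bool versions_agree(const unsigned long *loads, int n)
{
	unsigned long a[NR_LOADS] = {0}, b[NR_LOADS] = {0};
	int t, i;

	for (t = 0; t < n; t++) {
		update_load_div(a, loads[t]);
		update_load_shift(b, loads[t]);
		for (i = 0; i < NR_LOADS; i++)
			if (a[i] != b[i])
				return false;
	}
	return true;
}
```

Because the dividend is an unsigned long and scale is always a power of two, the divide and the shift give identical results. A check like this mainly guards against slips when rewriting the loop. Starting from all zeroes, a single update_load_shift() call with a load of 10 leaves cpu_load[] at {10, 5, 3}.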
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] sched: avoid div in rebalance_tick

2007-01-12 Thread Nick Piggin
On Fri, Jan 12, 2007 at 09:59:40AM +, Alan wrote:
> On Fri, 12 Jan 2007 07:02:13 +0100
> Nick Piggin <[EMAIL PROTECTED]> wrote:
> 
> > Just noticed this while looking at a bug.
> > Avoid an expensive integer divide 3 times per CPU per tick.
> 
> Integer divide is cheap on some modern processors, and multibit shift
> isn't on all embedded ones.

Well, the integer divide unit is non-pipelined on P4, K8, Core2 and probably
most processors, AFAIK. So the 3 divs would take perhaps 240 cycles on a
P4.

> How about putting back scale = 1 and using
> 
> scale += scale;
> 
> instead of the shift and getting what ought to be even better results

Yes, I guess we can do this as well, good idea. I'll make a
quick userspace benchmark and post some numbers with my next
submission.


Re: [patch] sched: avoid div in rebalance_tick

2007-01-12 Thread Alan
On Fri, 12 Jan 2007 07:02:13 +0100
Nick Piggin <[EMAIL PROTECTED]> wrote:

> Just noticed this while looking at a bug.
> Avoid an expensive integer divide 3 times per CPU per tick.

Integer divide is cheap on some modern processors, and multibit shift
isn't on all embedded ones.

How about putting back scale = 1 and using

scale += scale;

instead of the shift and getting what ought to be even better results
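Doubling by addition produces exactly the same scale values as the shift, so the suggestion changes only the instruction used, not the result. A minimal sketch of that check follows; doubling_matches_shift() is a hypothetical helper, not kernel code.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/*
 * Check that scale = 1; scale += scale produces the same powers of two
 * as 1u << i for the first n iterations.
 */
bool doubling_matches_shift(unsigned int n)
{
	unsigned int i, scale;

	/* 1u << i is undefined once i reaches the width of unsigned int. */
	assert(n <= CHAR_BIT * sizeof(unsigned int));

	for (i = 0, scale = 1; i < n; i++, scale += scale)
		if (scale != 1u << i)
			return false;
	return true;
}
```

With n = 3, the loop count used in update_load(), this compares 1, 2, 4 against 1 << 0, 1 << 1, 1 << 2.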


[patch] sched: avoid div in rebalance_tick

2007-01-11 Thread Nick Piggin
Just noticed this while looking at a bug.

--

Avoid an expensive integer divide 3 times per CPU per tick.

Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>

Index: linux-2.6/kernel/sched.c
===
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -2887,13 +2887,14 @@ static void active_load_balance(struct r
 static void update_load(struct rq *this_rq)
 {
 	unsigned long this_load;
-	int i, scale;
+	int i;
 
 	this_load = this_rq->raw_weighted_load;
 
 	/* Update our load: */
-	for (i = 0, scale = 1; i < 3; i++, scale <<= 1) {
+	for (i = 0; i < 3; i++) {
 		unsigned long old_load, new_load;
+		int scale;
 
 		old_load = this_rq->cpu_load[i];
 		new_load = this_load;
@@ -2902,9 +2903,11 @@ static void update_load(struct rq *this_
 		 * prevents us from getting stuck on 9 if the load is 10, for
 		 * example.
 		 */
+		scale = 1 << i;
 		if (new_load > old_load)
 			new_load += scale-1;
-		this_rq->cpu_load[i] = (old_load*(scale-1) + new_load) / scale;
+		this_rq->cpu_load[i] = (old_load*(scale-1) + new_load)
+				>> i; /* (divide by 'scale') */
 	}
 }
 
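The rounding comment in this hunk ("stuck on 9 if the load is 10") can be reproduced with a small worked case. Below is a userspace sketch of one averaging step for cpu_load[i]. The helpers avg_step() and settle() and the round_up flag are hypothetical and exist only for illustration. With i = 1 (scale = 2) and a steady load of 10, truncating division stops at 9, since (9*1 + 10) / 2 = 9. Adding scale-1 while the load is rising gives (9 + 11) / 2 = 10.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* One averaging step for cpu_load[i], with scale = 1 << i as in the patch. */
unsigned long avg_step(unsigned long old_load, unsigned long new_load,
		       unsigned int i, bool round_up)
{
	unsigned long scale;

	assert(i < CHAR_BIT * sizeof(unsigned long));
	scale = 1UL << i;

	/* The patch always rounds up on a rising load; the flag lets us compare. */
	if (round_up && new_load > old_load)
		new_load += scale - 1;
	return (old_load * (scale - 1) + new_load) >> i;
}

/* Apply a constant load for 'ticks' steps, starting from 'start'. */
unsigned long settle(unsigned long start, unsigned long load, unsigned int i,
		     bool round_up, int ticks)
{
	unsigned long v = start;

	while (ticks-- > 0)
		v = avg_step(v, load, i, round_up);
	return v;
}
```

Starting from 0 with a steady load of 10 and i = 1, settle() ends at 9 without rounding (0, 5, 7, 8, 9, 9, ...) and at 10 with it (0, 5, 8, 9, 10, 10, ...).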