Re: [patch] sched: avoid div in rebalance_tick
On Fri, Jan 12, 2007 at 09:59:40AM +, Alan wrote:
> On Fri, 12 Jan 2007 07:02:13 +0100
> Nick Piggin <[EMAIL PROTECTED]> wrote:
>
> > Just noticed this while looking at a bug.
> > Avoid an expensive integer divide 3 times per CPU per tick.
>
> Integer divide is cheap on some modern processors, and multibit shift
> isn't on all embedded ones.
>
> How about putting back scale = 1 and using
>
> 	scale += scale;
>
> instead of the shift and getting what ought to be even better results

OK, how about this? It only works out to be around 0.01% of my P3's CPU
time at 1000HZ, but it also did make the x86 code 16 bytes smaller.

--
Avoid expensive integer divide 3 times per CPU per tick.

A userspace test of this loop went from 26ns, down to 19ns on a G5; and
from 123ns down to 28ns on a P3.

(Also avoid a variable bit shift, as suggested by Alan. The effect of
this wasn't noticeable on the CPUs I tested with).

Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>

Index: linux-2.6/kernel/sched.c
===
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -2887,14 +2887,16 @@ static void active_load_balance(struct r
 static void update_load(struct rq *this_rq)
 {
 	unsigned long this_load;
-	int i, scale;
+	unsigned int i, scale;
 
 	this_load = this_rq->raw_weighted_load;
 
 	/* Update our load: */
-	for (i = 0, scale = 1; i < 3; i++, scale <<= 1) {
+	for (i = 0, scale = 1; i < 3; i++, scale += scale) {
 		unsigned long old_load, new_load;
 
+		/* scale is effectively 1 << i now, and >> i divides by scale */
+
 		old_load = this_rq->cpu_load[i];
 		new_load = this_load;
 		/*
@@ -2904,7 +2906,7 @@ static void update_load(struct rq *this_
 		 */
 		if (new_load > old_load)
 			new_load += scale-1;
-		this_rq->cpu_load[i] = (old_load*(scale-1) + new_load) / scale;
+		this_rq->cpu_load[i] = (old_load*(scale-1) + new_load) >> i;
 	}
 }
 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
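For scale, the 0.01% figure is consistent with the P3 timings quoted in the changelog. The working below is a back-of-envelope reading, not something spelled out in the message. It assumes update_load() runs once per tick on each CPU and that the userspace timings carry over to the kernel:

```latex
\begin{align*}
\text{time saved per tick} &= 123\,\text{ns} - 28\,\text{ns} = 95\,\text{ns} \\
\text{time saved per second at } 1000\,\text{Hz} &= 95\,\text{ns} \times 1000 = 95\,\mu\text{s} \\
\text{fraction of one CPU} &= \frac{95\,\mu\text{s}}{1\,\text{s}} = 9.5 \times 10^{-5} \approx 0.01\%
\end{align*}
```

Taking the full 123ns cost of the old loop instead of the saving gives about 0.012%, which rounds to the same figure.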
Re: [patch] sched: avoid div in rebalance_tick
On Fri, Jan 12, 2007 at 09:59:40AM +, Alan wrote:
> On Fri, 12 Jan 2007 07:02:13 +0100
> Nick Piggin <[EMAIL PROTECTED]> wrote:
>
> > Just noticed this while looking at a bug.
> > Avoid an expensive integer divide 3 times per CPU per tick.
>
> Integer divide is cheap on some modern processors, and multibit shift
> isn't on all embedded ones.

Well, the integer divide unit is non-pipelined on P4, K8, Core2 and
probably most processors, AFAIK. So the 3 divs would take 240 cycles
on a P4, perhaps.

> How about putting back scale = 1 and using
>
> 	scale += scale;
>
> instead of the shift and getting what ought to be even better results

Yes, I guess we can do this as well, good idea. I'll make a quick
userspace benchmark and post some numbers with my next submission.
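A userspace benchmark of the kind mentioned above might look roughly like the sketch below. This is not the benchmark that was actually run. The static cpu_load[] array, the two update_load_*() variants and the ns_per_call() helper are hypothetical stand-ins for illustration, and clock() measures CPU time at a coarse resolution, so a large iteration count is needed for a usable average.

```c
#include <time.h>

/* Hypothetical stand-in for the three per-CPU load averages. */
static unsigned long cpu_load[3];

/* Divide-based update, modelled on the loop before the patch. */
static void update_load_div(unsigned long this_load)
{
	int i, scale;

	for (i = 0, scale = 1; i < 3; i++, scale <<= 1) {
		unsigned long old_load = cpu_load[i];
		unsigned long new_load = this_load;

		if (new_load > old_load)
			new_load += scale - 1;
		cpu_load[i] = (old_load * (scale - 1) + new_load) / scale;
	}
}

/* Shift-based update, modelled on the loop after the patch. */
static void update_load_shift(unsigned long this_load)
{
	unsigned int i, scale;

	for (i = 0, scale = 1; i < 3; i++, scale += scale) {
		unsigned long old_load = cpu_load[i];
		unsigned long new_load = this_load;

		if (new_load > old_load)
			new_load += scale - 1;
		cpu_load[i] = (old_load * (scale - 1) + new_load) >> i;
	}
}

/*
 * Average CPU time per call in nanoseconds, measured with clock().
 * The input goes through a volatile to discourage the compiler from
 * folding the timed loop into a constant.
 */
static double ns_per_call(void (*fn)(unsigned long), unsigned long iters)
{
	volatile unsigned long input = 0;
	unsigned long n;
	clock_t start, end;

	start = clock();
	for (n = 0; n < iters; n++) {
		input = n & 1023;	/* vary the load between calls */
		fn(input);
	}
	end = clock();

	return (double)(end - start) / CLOCKS_PER_SEC * 1e9 / (double)iters;
}
```

Calling ns_per_call(update_load_div, N) and ns_per_call(update_load_shift, N) with the same large N gives two figures to compare. The absolute numbers depend heavily on the CPU, the compiler and the optimisation level.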
Re: [patch] sched: avoid div in rebalance_tick
On Fri, 12 Jan 2007 07:02:13 +0100
Nick Piggin <[EMAIL PROTECTED]> wrote:

> Just noticed this while looking at a bug.
> Avoid an expensive integer divide 3 times per CPU per tick.

Integer divide is cheap on some modern processors, and multibit shift
isn't on all embedded ones.

How about putting back scale = 1 and using

	scale += scale;

instead of the shift and getting what ought to be even better results
[patch] sched: avoid div in rebalance_tick
Just noticed this while looking at a bug.

--
Avoid an expensive integer divide 3 times per CPU per tick.

Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>

Index: linux-2.6/kernel/sched.c
===
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -2887,13 +2887,14 @@ static void active_load_balance(struct r
 static void update_load(struct rq *this_rq)
 {
 	unsigned long this_load;
-	int i, scale;
+	int i;
 
 	this_load = this_rq->raw_weighted_load;
 
 	/* Update our load: */
-	for (i = 0, scale = 1; i < 3; i++, scale <<= 1) {
+	for (i = 0; i < 3; i++) {
 		unsigned long old_load, new_load;
+		int scale;
 
 		old_load = this_rq->cpu_load[i];
 		new_load = this_load;
@@ -2902,9 +2903,11 @@ static void update_load(struct rq *this_
 		 * prevents us from getting stuck on 9 if the load is 10, for
 		 * example.
 		 */
+		scale = 1 << i;
 		if (new_load > old_load)
 			new_load += scale-1;
-		this_rq->cpu_load[i] = (old_load*(scale-1) + new_load) / scale;
+		this_rq->cpu_load[i] = (old_load*(scale-1) + new_load)
+					>> i; /* (divide by 'scale') */
 	}
 }
 
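The replacement relies on the fact that, for unsigned values, shifting right by i gives the same result as dividing by 1 << i. The userspace sketch below illustrates that and the round-up behaviour described in the code comment. It is not kernel code, and step_div(), step_shift(), forms_agree() and settle() are hypothetical helper names. It applies one update step at a time in both forms and checks that they agree.

```c
/* One update step for index i, using the divide form from before the patch. */
static unsigned long step_div(unsigned long old_load, unsigned long this_load,
			      int i)
{
	int scale = 1 << i;
	unsigned long new_load = this_load;

	if (new_load > old_load)
		new_load += scale - 1;	/* round up while the load is rising */
	return (old_load * (scale - 1) + new_load) / scale;
}

/* The same step using the shift form from the patch. */
static unsigned long step_shift(unsigned long old_load, unsigned long this_load,
				int i)
{
	int scale = 1 << i;
	unsigned long new_load = this_load;

	if (new_load > old_load)
		new_load += scale - 1;
	/* the operand is unsigned long, so >> i divides by scale */
	return (old_load * (scale - 1) + new_load) >> i;
}

/* Returns 1 if both forms give the same value on every one of 'ticks' steps. */
static int forms_agree(unsigned long start, unsigned long target, int i,
		       int ticks)
{
	unsigned long a = start, b = start;
	int t;

	for (t = 0; t < ticks; t++) {
		a = step_div(a, target, i);
		b = step_shift(b, target, i);
		if (a != b)
			return 0;
	}
	return 1;
}

/* Value reached after 'ticks' shift-form steps towards a constant target. */
static unsigned long settle(unsigned long start, unsigned long target, int i,
			    int ticks)
{
	unsigned long load = start;
	int t;

	for (t = 0; t < ticks; t++)
		load = step_shift(load, target, i);
	return load;
}
```

With a constant load of 10 and a starting value of 0, settle() reaches 10 for i = 1 and i = 2 rather than stopping at 9. That is the case the round-up in the rising branch is there to handle.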