Re: [RFC] div64_64 support
Sami Farin wrote: On Tue, Mar 06, 2007 at 23:53:49 +0200, Sami Farin wrote: ... And I found bug in gcc-4.1.2, it gave 0 for ncubic results when doing 1000 loops test... gcc-4.0.3 works. Found it. --- cbrt-test.c~ 2007-03-07 00:20:54.735248105 +0200 +++ cbrt-test.c 2007-03-07 00:21:03.964864343 +0200 @@ -209,7 +209,7 @@ __asm__(bsrl %1,%0\n\t cmovzl %2,%0 - : =r (r) : rm (x), rm (-1)); + : =r (r) : rm (x), rm (-1) : memory); return r+1; } Now Linux 2.6 does not have memory in fls, maybe it causes some gcc funnies some people are seeing. Can you post the difference in the generated code with that change? - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] div64_64 support
On Mon, Mar 05, 2007 at 03:57:14PM -0800, Stephen Hemminger wrote: On 03 Mar 2007 03:31:52 +0100 Andi Kleen [EMAIL PROTECTED] wrote: Stephen Hemminger [EMAIL PROTECTED] writes: Here is another way to handle the 64 bit divide case. It allows full 64 bit divide by adding the support routine GCC needs. Not supplying that was intentional by Linus so that people think twice (or more often) before they using such expensive operations. A plain / looks too innocent. Is it really needed by CUBIC anyways? It uses it for getting the cubic root, but the algorithm recommended by Hacker's Delight (great book) doesn't use any divisions at all. Probably better to use a better algorithm without divisions. I tried the code from Hacker's Delight. It is cool, but performance is CPU (and data) dependent: I did too. My experiences were mixed: on 32bit it was slower, on 64bit faster on average. Strangely the 32bit version ran faster again without -fomit-frame-pointer, so it's likely some funny interaction with 32bit long long code generation. The difference is never more than 100 cycles so it shouldn't be a big issue either way. For some input arguments (1% in my testing) it also gave an answer 1 off from the existing code, but I don't think that's a problem. But more importantly during testing I found that the cubic code gives a division by zero for input arguments 2^43. If you have a system with 16TB of memory this could actually be a remotely exploitable bug :) I still think it's a good idea to switch to the new function, especially since it's shorter code. Here's the patch. Note I didn't verify it with real large window TCP operations; only unit testing. -Andi Use Hacker's delight cube root algorithm in cubic TCP Shorter code and fixes a theoretically remote exploitable bug. Signed-off-by: Andi Kleen [EMAIL PROTECTED] Index: linux-2.6.21-rc1-net/net/ipv4/tcp_cubic.c === --- linux-2.6.21-rc1-net.orig/net/ipv4/tcp_cubic.c +++ linux-2.6.21-rc1-net/net/ipv4/tcp_cubic.c @@ -93,51 +93,24 @@ static void bictcp_init(struct sock *sk) tcp_sk(sk)-snd_ssthresh = initial_ssthresh; } -/* 64bit divisor, dividend and result. dynamic precision */ -static inline u_int64_t div64_64(u_int64_t dividend, u_int64_t divisor) +static u32 cubic_root(u64 x) { - u_int32_t d = divisor; - - if (divisor 0xULL) { - unsigned int shift = fls(divisor 32); - - d = divisor shift; - dividend = shift; - } - - /* avoid 64 bit division if possible */ - if (dividend 32) - do_div(dividend, d); - else - dividend = (uint32_t) dividend / d; - - return dividend; -} - -/* - * calculate the cubic root of x using Newton-Raphson - */ -static u32 cubic_root(u64 a) -{ - u32 x, x1; - - /* Initial estimate is based on: -* cbrt(x) = exp(log(x) / 3) -*/ - x = 1u (fls64(a)/3); - - /* -* Iteration based on: -* 2 -* x= ( 2 * x + a / x ) / 3 -* k+1 k k -*/ - do { - x1 = x; - x = (2 * x + (uint32_t) div64_64(a, x*x)) / 3; - } while (abs(x1 - x) 1); - - return x; + int s; + u32 y; + u64 b; + u64 bs; + + y = 0; + for (s = 63; s = 0; s -= 3) { + y = 2 * y; + b = 3 * y * (y+1) + 1; + bs = b s; + if (x = bs (b == (bss))) { /* avoid overflow */ + x -= bs; + y++; + } + } + return y; } /* - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] div64_64 support
On Mon, Mar 05, 2007 at 04:25:51PM -0800, David Miller wrote: Another thing is that the non-Hacker's Delight version iterates differently for different input values, so the input value space is very important to consider when comparing these two pieces of code. I did some stochastic testing on my version. It gave 1 off for 1% of the arguments. Probably not an issue. Besides it actually works for 2^43 @) -Andi - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] div64_64 support II
The problem with these algorithms that tradoff one or more multiplies in order to avoid a divide is that they don't give anything and often lose when both multiplies and divides are emulated in software. Actually on rereading this: is there really any Linux port that emulates multiplies in software? I thought that was only done on really small microcontrollers or smart cards; but anything 32bit+ that runs Linux should have hardware multiply, shouldn't it? -Andi - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] div64_64 support
On Tuesday 06 March 2007 14:34, Andi Kleen wrote: - return x; + int s; + u32 y; + u64 b; + u64 bs; + + y = 0; + for (s = 63; s = 0; s -= 3) { + y = 2 * y; + b = 3 * y * (y+1) + 1; + bs = b s; + if (x = bs (b == (bss))) { /* avoid overflow */ + x -= bs; + y++; + } + } + return y; } Andi rant Let me see... You throw code like that and expect someone to actually understand it in one year, and be able to correct a bug ? /rant Please add something, an URL or even better a nice explanation, per favor... Thank you - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] div64_64 support
rant Let me see... You throw code like that and expect someone to actually understand it in one year, and be able to correct a bug ? To be honest I don't expect any bugs in this function. /rant Please add something, an URL or even better a nice explanation, per favor... It's straight out of Hacker's delight which is referenced in the commit log. -Andi - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] div64_64 support II
Andi Kleen [EMAIL PROTECTED] writes: Actually on rereading this: is there really any Linux port that emulates multiplies in software? I thought that was only done on really small microcontrollers or smart cards; but anything 32bit+ that runs Linux should have hardware multiply, shouldn't it? SPARCv7 (sun4/sun4c) doesn't have hardware mul/div. This includes SparcStation 1, 1+, 2, SLC, ELC, IPC and IPX. -- ilmari A disappointingly low fraction of the human race is, at any given time, on fire. - Stig Sandbeck Mathisen - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] div64_64 support II
From: [EMAIL PROTECTED] (Dagfinn Ilmari Mannsåker) Date: Tue, 06 Mar 2007 18:43:14 +0100 Andi Kleen [EMAIL PROTECTED] writes: Actually on rereading this: is there really any Linux port that emulates multiplies in software? I thought that was only done on really small microcontrollers or smart cards; but anything 32bit+ that runs Linux should have hardware multiply, shouldn't it? SPARCv7 (sun4/sun4c) doesn't have hardware mul/div. This includes SparcStation 1, 1+, 2, SLC, ELC, IPC and IPX. Right and the cypress sun4m parts don't do multiply/divide in hardware either. I believe the Alpha does divide in software too, see arch/alpha/lib/divide.S - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] div64_64 support
Don't count the existing Newton-Raphson out. It turns out that to get enough precision for 32 bits, only 4 iterations are needed. By unrolling those, it gets much better timing. Slightly gross test program (with original cubic wraparound bug fixed). --- /* Test and measure perf of cube root algorithms. */ #include stdio.h #include stdlib.h #include stdint.h #include math.h #ifdef __x86_64 #define rdtscll(val) do { \ unsigned int __a,__d; \ asm volatile(rdtsc : =a (__a), =d (__d)); \ (val) = ((unsigned long)__a) | (((unsigned long)__d)32); \ } while(0) # define do_div(n,base) ({ \ uint32_t __base = (base); \ uint32_t __rem; \ __rem = ((uint64_t)(n)) % __base; \ (n) = ((uint64_t)(n)) / __base; \ __rem; \ }) /** * __ffs - find first bit in word. * @word: The word to search * * Undefined if no bit exists, so code should check against 0 first. */ static __inline__ unsigned long __ffs(unsigned long word) { __asm__(bsfq %1,%0 :=r (word) :rm (word)); return word; } /* * __fls: find last bit set. * @word: The word to search * * Undefined if no zero exists, so code should check against ~0UL first. */ static inline unsigned long __fls(unsigned long word) { __asm__(bsrq %1,%0 :=r (word) :rm (word)); return word; } /** * ffs - find first bit set * @x: the word to search * * This is defined the same way as * the libc and compiler builtin ffs routines, therefore * differs in spirit from the above ffz (man ffs). */ static __inline__ int ffs(int x) { int r; __asm__(bsfl %1,%0\n\t cmovzl %2,%0 : =r (r) : rm (x), r (-1)); return r+1; } /** * fls - find last bit set * @x: the word to search * * This is defined the same way as ffs. */ static inline int fls(int x) { int r; __asm__(bsrl %1,%0\n\t cmovzl %2,%0 : =r (r) : rm (x), rm (-1)); return r+1; } /** * fls64 - find last bit set in 64 bit word * @x: the word to search * * This is defined the same way as fls. */ static inline int fls64(uint64_t x) { if (x == 0) return 0; return __fls(x) + 1; } static inline uint64_t div64_64(uint64_t dividend, uint64_t divisor) { return dividend / divisor; } #elif __i386 #define rdtscll(val) \ __asm__ __volatile__(rdtsc : =A (val)) /** * ffs - find first bit set * @x: the word to search * * This is defined the same way as * the libc and compiler builtin ffs routines, therefore * differs in spirit from the above ffz() (man ffs). */ static inline int ffs(int x) { int r; __asm__(bsfl %1,%0\n\t jnz 1f\n\t movl $-1,%0\n 1: : =r (r) : rm (x)); return r+1; } /** * fls - find last bit set * @x: the word to search * * This is defined the same way as ffs(). */ static inline int fls(int x) { int r; __asm__(bsrl %1,%0\n\t jnz 1f\n\t movl $-1,%0\n 1: : =r (r) : rm (x)); return r+1; } static inline int fls64(uint64_t x) { uint32_t h = x 32; if (h) return fls(h) + 32; return fls(x); } #define do_div(n,base) ({ \ unsigned long __upper, __low, __high, __mod, __base; \ __base = (base); \ asm(:=a (__low), =d (__high):A (n)); \ __upper = __high; \ if (__high) { \ __upper = __high % (__base); \ __high = __high / (__base); \ } \ asm(divl %2:=a (__low), =d (__mod):rm (__base), 0 (__low), 1 (__upper)); \ asm(:=A (n):a (__low),d (__high)); \ __mod; \ }) /* 64bit divisor, dividend and result. dynamic precision */ static uint64_t div64_64(uint64_t dividend, uint64_t divisor) { uint32_t d = divisor; if (divisor 0xULL) { unsigned int shift = fls(divisor 32); d = divisor shift; dividend = shift; } /* avoid 64 bit division if possible */ if (dividend 32) do_div(dividend, d); else dividend = (uint32_t) dividend / d; return dividend; } #endif /* Andi Kleen's version */ uint32_t acbrt(uint64_t x) { uint32_t y = 0; int s; for (s = 63; s = 0; s -= 3) { uint64_t b, bs; y = 2 * y; b = 3 * y * (y+1) + 1; bs = b s; if (x = bs (b == (bss))) { /* avoid overflow */ x -= bs; y++; } } return y; } /*
Re: [RFC] div64_64 support II
Andi Kleen wrote: The problem with these algorithms that tradoff one or more multiplies in order to avoid a divide is that they don't give anything and often lose when both multiplies and divides are emulated in software. Actually on rereading this: is there really any Linux port that emulates multiplies in software? I thought that was only done on really small microcontrollers or smart cards; but anything 32bit+ that runs Linux should have hardware multiply, shouldn't it? SPARC v8 does multiplies using an MSTEP instruction. -hpa - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] div64_64 support
On Tue, Mar 06, 2007 at 10:29:41AM -0800, Stephen Hemminger wrote: Don't count the existing Newton-Raphson out. It turns out that to get enough precision for 32 bits, only 4 iterations are needed. By unrolling those, it gets much better timing. But did you fix the 2^43 bug too? SGI has already shipped 10TB Altixen, so it's not entirely theoretical. -Andi - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] div64_64 support
On Tue, 6 Mar 2007 20:48:41 +0100 Andi Kleen [EMAIL PROTECTED] wrote: On Tue, Mar 06, 2007 at 10:29:41AM -0800, Stephen Hemminger wrote: Don't count the existing Newton-Raphson out. It turns out that to get enough precision for 32 bits, only 4 iterations are needed. By unrolling those, it gets much better timing. But did you fix the 2^43 bug too? It was caused by not doing x^2 in 64 bit. SGI has already shipped 10TB Altixen, so it's not entirely theoretical. -Andi -- Stephen Hemminger [EMAIL PROTECTED] - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] div64_64 support
From: Stephen Hemminger [EMAIL PROTECTED] Date: Tue, 6 Mar 2007 10:29:41 -0800 /* calculate the cubic root of x using Newton-Raphson */ static uint32_t ncubic(uint64_t a) { uint64_t x; /* Initial estimate is based on: * cbrt(x) = exp(log(x) / 3) */ x = 1u (fls64(a)/3); /* Converges in 3 iterations to 32 bits */ x = (2 * x + div64_64(a, x*x)) / 3; x = (2 * x + div64_64(a, x*x)) / 3; x = (2 * x + div64_64(a, x*x)) / 3; return x; } Indeed that will be the fastest variant for cpus with hw integer division. I did a quick sparc64 port, here is what I got: Function clocks mean(us) max(us) std(us) total error ocubic 529 0.3515.16 0.66 545101 ncubic 498 0.3312.83 0.36 576263 acbrt 427 0.2811.04 0.33 547562 hcbrt 393 0.2610.18 0.47 2410 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] div64_64 support
On Tue, Mar 06, 2007 at 10:29:41 -0800, Stephen Hemminger wrote: Don't count the existing Newton-Raphson out. It turns out that to get enough precision for 32 bits, only 4 iterations are needed. By unrolling those, it gets much better timing. Slightly gross test program (with original cubic wraparound bug fixed). ... {~0, 2097151}, ^^^ this should be 2642245. Without serializing instruction before rdtsc and with one loop I do not get very accurate results (104 for ncubic, 1000 for others). #define rdtscll_serialize(val) \ __asm__ __volatile__(movl $0, %%eax\n\tcpuid\n\trdtsc\n : =A (val) : : ebx, ecx) Here Pentium D timings for 1000 loops. ~0, 2097151 Function clocks mean(us) max(us) std(us) total error ocubic 9120.306 20.3170.730 545101 ncubic 7770.261 14.7990.486 576263 acbrt 11680.392 21.6810.547 547562 hcbrt 8270.278 15.2440.3872410 ~0, 2642245 Function clocks mean(us) max(us) std(us) total error ocubic 9080.305 20.2100.656 7 ncubic 7750.260 14.7920.550 31169 acbrt 11760.395 22.0170.9702468 hcbrt 8260.278 15.3260.670 547504 And I found bug in gcc-4.1.2, it gave 0 for ncubic results when doing 1000 loops test... gcc-4.0.3 works. -- - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] div64_64 support
On Tue, Mar 06, 2007 at 23:53:49 +0200, Sami Farin wrote: ... And I found bug in gcc-4.1.2, it gave 0 for ncubic results when doing 1000 loops test... gcc-4.0.3 works. Found it. --- cbrt-test.c~2007-03-07 00:20:54.735248105 +0200 +++ cbrt-test.c 2007-03-07 00:21:03.964864343 +0200 @@ -209,7 +209,7 @@ __asm__(bsrl %1,%0\n\t cmovzl %2,%0 - : =r (r) : rm (x), rm (-1)); + : =r (r) : rm (x), rm (-1) : memory); return r+1; } Now Linux 2.6 does not have memory in fls, maybe it causes some gcc funnies some people are seeing. -- - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] div64_64 support
On 03 Mar 2007 03:31:52 +0100 Andi Kleen [EMAIL PROTECTED] wrote: Stephen Hemminger [EMAIL PROTECTED] writes: Here is another way to handle the 64 bit divide case. It allows full 64 bit divide by adding the support routine GCC needs. Not supplying that was intentional by Linus so that people think twice (or more often) before they using such expensive operations. A plain / looks too innocent. Is it really needed by CUBIC anyways? It uses it for getting the cubic root, but the algorithm recommended by Hacker's Delight (great book) doesn't use any divisions at all. Probably better to use a better algorithm without divisions. I tried the code from Hacker's Delight. It is cool, but performance is CPU (and data) dependent: Average # of usecs per operation: Hacker Newton Pentium 3 68.6 90.4 T2050 98.6 92.0 U1400 450415 Xeon70 90 Xeon (newer)71 78 EM64T 21.8 24.6 AMD64 23.4 32.0 It might be worth the change for code size reduction though. -- Stephen Hemminger [EMAIL PROTECTED] - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] div64_64 support
From: Stephen Hemminger [EMAIL PROTECTED] Date: Mon, 5 Mar 2007 15:57:14 -0800 I tried the code from Hacker's Delight. It is cool, but performance is CPU (and data) dependent: Average # of usecs per operation: Interesting results. The problem with these algorithms that tradoff one or more multiplies in order to avoid a divide is that they don't give anything and often lose when both multiplies and divides are emulated in software. This is particularly true in this cube-root case from Hacker's Delight, because it's using 3 multiplies per iteration in place of one divide per iteration. Actually, sorry, there is only one real multiply in there since the other two can be computed using addition and shifts. Another thing is that the non-Hacker's Delight version iterates differently for different input values, so the input value space is very important to consider when comparing these two pieces of code. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] div64_64 support
Stephen Hemminger [EMAIL PROTECTED] writes: Here is another way to handle the 64 bit divide case. It allows full 64 bit divide by adding the support routine GCC needs. Not supplying that was intentional by Linus so that people think twice (or more often) before they using such expensive operations. A plain / looks too innocent. Is it really needed by CUBIC anyways? It uses it for getting the cubic root, but the algorithm recommended by Hacker's Delight (great book) doesn't use any divisions at all. Probably better to use a better algorithm without divisions. -Andi - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] div64_64 support
On Feb 23 2007 17:05, Stephen Hemminger wrote: Since there already two users of full 64 bit division in the kernel, and other places maybe hiding out as well. Add a full 64/64 bit divide. Yes this expensive, but there are places where it is necessary. It is not clear if doing the scaling buys any advantage on 64 bit platforms, so for them a full divide is done. --- include/asm-arm/div64.h |2 ++ include/asm-generic/div64.h |8 include/asm-m68k/div64.h |2 ++ include/asm-mips/div64.h |8 include/asm-um/div64.h |1 + include/asm-xtensa/div64.h |1 + lib/div64.c | 22 ++ net/ipv4/tcp_cubic.c | 22 -- net/netfilter/xt_connbytes.c | 16 9 files changed, 44 insertions(+), 38 deletions(-) Actually, there is udivdi3 support in the kernel ./arch/arm26/lib/udivdi3.c ./arch/sh/lib/udivdi3.c ./arch/sparc/lib/udivdi3.S should not this be consolidated too? Jan -- ft: http://freshmeat.net/p/chaostables/ - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] div64_64 support
On Mon, 26 Feb 2007 21:09:26 +0100 (MET) Jan Engelhardt [EMAIL PROTECTED] wrote: On Feb 23 2007 17:05, Stephen Hemminger wrote: Since there already two users of full 64 bit division in the kernel, and other places maybe hiding out as well. Add a full 64/64 bit divide. Yes this expensive, but there are places where it is necessary. It is not clear if doing the scaling buys any advantage on 64 bit platforms, so for them a full divide is done. --- include/asm-arm/div64.h |2 ++ include/asm-generic/div64.h |8 include/asm-m68k/div64.h |2 ++ include/asm-mips/div64.h |8 include/asm-um/div64.h |1 + include/asm-xtensa/div64.h |1 + lib/div64.c | 22 ++ net/ipv4/tcp_cubic.c | 22 -- net/netfilter/xt_connbytes.c | 16 9 files changed, 44 insertions(+), 38 deletions(-) Actually, there is udivdi3 support in the kernel ./arch/arm26/lib/udivdi3.c ./arch/sh/lib/udivdi3.c ./arch/sparc/lib/udivdi3.S should not this be consolidated too? Hmm. Those are the GCC internal versions, that are picked up but doing divide in place. Do we want to allow general 64 bit in kernel to be easily used? It could cause sloppy slow code, but it would look cleaner. -- Stephen Hemminger [EMAIL PROTECTED] - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] div64_64 support
Here is another way to handle the 64 bit divide case. It allows full 64 bit divide by adding the support routine GCC needs. --- arch/alpha/Kconfig |4 arch/arm/Kconfig |4 arch/arm26/Kconfig |4 arch/avr32/Kconfig |4 arch/cris/Kconfig|4 arch/frv/Kconfig |4 arch/h8300/Kconfig |4 arch/i386/Kconfig|4 arch/m32r/Kconfig|4 arch/m68k/Kconfig|4 arch/m68knommu/Kconfig |4 arch/mips/Kconfig|4 arch/parisc/Kconfig |4 arch/powerpc/Kconfig |4 arch/ppc/Kconfig |4 arch/s390/Kconfig|4 arch/sh64/Kconfig|4 arch/v850/Kconfig|3 +++ arch/xtensa/Kconfig |4 lib/Makefile |1 + lib/udivdi3.c| 37 + net/ipv4/tcp_cubic.c | 26 ++ net/netfilter/xt_connbytes.c | 19 +-- 23 files changed, 116 insertions(+), 42 deletions(-) --- pktgen.orig/net/ipv4/tcp_cubic.c2007-02-26 13:40:08.0 -0800 +++ pktgen/net/ipv4/tcp_cubic.c 2007-02-26 14:30:00.0 -0800 @@ -51,7 +51,6 @@ module_param(tcp_friendliness, int, 0644); MODULE_PARM_DESC(tcp_friendliness, turn on/off tcp friendliness); -#include asm/div64.h /* BIC TCP Parameters */ struct bictcp { @@ -93,27 +92,6 @@ tcp_sk(sk)-snd_ssthresh = initial_ssthresh; } -/* 64bit divisor, dividend and result. dynamic precision */ -static inline u_int64_t div64_64(u_int64_t dividend, u_int64_t divisor) -{ - u_int32_t d = divisor; - - if (divisor 0xULL) { - unsigned int shift = fls(divisor 32); - - d = divisor shift; - dividend = shift; - } - - /* avoid 64 bit division if possible */ - if (dividend 32) - do_div(dividend, d); - else - dividend = (uint32_t) dividend / d; - - return dividend; -} - /* * calculate the cubic root of x using Newton-Raphson */ @@ -134,7 +112,7 @@ */ do { x1 = x; - x = (2 * x + (uint32_t) div64_64(a, x*x)) / 3; + x = (2 * x + (u32) (a / x*x)) / 3; } while (abs(x1 - x) 1); return x; --- pktgen.orig/net/netfilter/xt_connbytes.c2007-02-26 13:40:08.0 -0800 +++ pktgen/net/netfilter/xt_connbytes.c 2007-02-26 14:29:13.0 -0800 @@ -16,7 +16,6 @@ #include linux/netfilter/x_tables.h #include linux/netfilter/xt_connbytes.h -#include asm/div64.h #include asm/bitops.h MODULE_LICENSE(GPL); @@ -24,22 +23,6 @@ MODULE_DESCRIPTION(iptables match for matching number of pkts/bytes per connection); MODULE_ALIAS(ipt_connbytes); -/* 64bit divisor, dividend and result. dynamic precision */ -static u_int64_t div64_64(u_int64_t dividend, u_int64_t divisor) -{ - u_int32_t d = divisor; - - if (divisor 0xULL) { - unsigned int shift = fls(divisor 32); - - d = divisor shift; - dividend = shift; - } - - do_div(dividend, d); - return dividend; -} - static int match(const struct sk_buff *skb, const struct net_device *in, @@ -106,7 +89,7 @@ break; } if (pkts != 0) - what = div64_64(bytes, pkts); + what = bytes / pkts; break; } --- pktgen.orig/lib/Makefile2007-02-26 13:40:08.0 -0800 +++ pktgen/lib/Makefile 2007-02-26 14:17:31.0 -0800 @@ -28,6 +28,7 @@ lib-$(CONFIG_SEMAPHORE_SLEEPERS) += semaphore-sleepers.o lib-$(CONFIG_GENERIC_FIND_NEXT_BIT) += find_next_bit.o obj-$(CONFIG_GENERIC_HWEIGHT) += hweight.o +obj-$(CONFIG_GENERIC_UDIVDI3) += udivdi3.o obj-$(CONFIG_LOCK_KERNEL) += kernel_lock.o obj-$(CONFIG_PLIST) += plist.o obj-$(CONFIG_DEBUG_PREEMPT) += smp_processor_id.o --- pktgen.orig/arch/alpha/Kconfig 2007-02-26 13:51:29.0 -0800 +++ pktgen/arch/alpha/Kconfig 2007-02-26 13:54:35.0 -0800 @@ -33,6 +33,10 @@ bool default n +config GENERIC_UDIVDI3 + bool + default y + config GENERIC_FIND_NEXT_BIT bool default y --- pktgen.orig/arch/arm/Kconfig2007-02-26 13:51:29.0 -0800 +++ pktgen/arch/arm/Kconfig 2007-02-26 13:54:57.0 -0800 @@ -90,6 +90,10 @@ bool default n +config GENERIC_UDIVDI3 + bool + default y + config GENERIC_HWEIGHT bool default y --- pktgen.orig/arch/arm26/Kconfig 2007-02-26 13:48:46.0 -0800 +++ pktgen/arch/arm26/Kconfig 2007-02-26 13:55:24.0 -0800 @@ -49,6 +49,10 @@ bool default n +config GENERIC_UDIVDI3 +
Re: [RFC] div64_64 support
On Feb 26 2007 13:28, Stephen Hemminger wrote: ./arch/arm26/lib/udivdi3.c ./arch/sh/lib/udivdi3.c ./arch/sparc/lib/udivdi3.S should not this be consolidated too? Hmm. Those are the GCC internal versions, that are picked up but doing divide in place. Do we want to allow general 64 bit in kernel to be easily used? It could cause sloppy slow code, but it would look cleaner. Then our reviewers should catch it, and if not, the janitors will (/me winks at R.P.J.Day and trivial@). @@ -134,7 +112,7 @@ */ do { x1 = x; - x = (2 * x + (uint32_t) div64_64(a, x*x)) / 3; + x = (2 * x + (u32) (a / x*x)) / 3; Eye see a bug. Previously there was div64_64(a, x*x) which is equivalent to (a)/(x*x), or just: a/(x^2). But now you do a/x*x, which is equivalent to a*x/x (in the domain of real numbers). Furthermore, a/x*x is a-(a%x), which does not even remotely match a/(x^2). Please keep the math intact, thank you ;-) Jan -- - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] div64_64 support
On Feb 26 2007 15:44, Stephen Hemminger wrote: - x = (2 * x + (uint32_t) div64_64(a, x*x)) / 3; + x = (2 * x + (u32) (a / x*x)) / 3; Previously there was div64_64(a, x*x) which is equivalent to (a)/(x*x), or just: a/(x^2). But now you do a/x*x, which is equivalent to a*x/x (in the domain of real numbers). Furthermore, a/x*x is a-(a%x), which does not even remotely match a/(x^2). Been there, done that, don't want to repeat it... I am sorry I don't quite follow. Jan -- - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] div64_64 support
On Tue, 27 Feb 2007 01:05:26 +0100 (MET) Jan Engelhardt [EMAIL PROTECTED] wrote: On Feb 26 2007 15:44, Stephen Hemminger wrote: - x = (2 * x + (uint32_t) div64_64(a, x*x)) / 3; + x = (2 * x + (u32) (a / x*x)) / 3; Previously there was div64_64(a, x*x) which is equivalent to (a)/(x*x), or just: a/(x^2). But now you do a/x*x, which is equivalent to a*x/x (in the domain of real numbers). Furthermore, a/x*x is a-(a%x), which does not even remotely match a/(x^2). Been there, done that, don't want to repeat it... I am sorry I don't quite follow. Once before a missed paren's caused a TCP congestion window bug that took 6 months before it was found... -- Stephen Hemminger [EMAIL PROTECTED] - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] div64_64 support
On Feb 26 2007 16:07, Stephen Hemminger wrote: On Feb 26 2007 15:44, Stephen Hemminger wrote: -x = (2 * x + (uint32_t) div64_64(a, x*x)) / 3; +x = (2 * x + (u32) (a / x*x)) / 3; Previously there was div64_64(a, x*x) which is equivalent to (a)/(x*x), or just: a/(x^2). But now you do a/x*x, which is equivalent to a*x/x (in the domain of real numbers). Furthermore, a/x*x is a-(a%x), which does not even remotely match a/(x^2). Been there, done that, don't want to repeat it... I am sorry I don't quite follow. Once before a missed paren's caused a TCP congestion window bug that took 6 months before it was found... Hah, just as I expected. |On Tue, 27 Feb 2007 00:02:50 +0100 (MET), Jan Engelhardt wrote: |Then our reviewers should catch it, and if not, the janitors will. Jan -- - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] div64_64 support
Stephen Hemminger wrote: Hmm. Those are the GCC internal versions, that are picked up but doing divide in place. Do we want to allow general 64 bit in kernel to be easily used? It could cause sloppy slow code, but it would look cleaner. ... and it would handle datatypes which may be architecture-dependent a lot cleaner. I thought the motivation for div64() was that a 64:32-32 divide could be done a lot faster on a number of platforms (including the important x86) than a generic 64:64-64 divide, but gcc doesn't handle the devolution automatically -- there is no such libgcc function. -hpa - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] div64_64 support
I thought the motivation for div64() was that a 64:32-32 divide could be done a lot faster on a number of platforms (including the important x86) than a generic 64:64-64 divide, but gcc doesn't handle the devolution automatically -- there is no such libgcc function. That there's no such function in libgcc doesn't mean GCC cannot handle this; libgcc is a bunch of library functions that are really needed for generated code (because you really don't want to expand those functions inline everywhere) -- you won't find an addsi3 in libgcc either. There does exist a divmoddisi4, sort of. It used to be defined in three GCC targets, but commented out in all three. The NS32k is long gone. For Vax, a comment says the machine insn for this isn't used because it is just too slow. And the i386 version is disabled because it returns the wrong result on overflow (not the truncated 64-bit result, required by the implicit cast to 32-bit, but the i386 arch traps to the overflow handler). The only way to express the semantics you want in (GNU) C is to use asm() -- and that's exactly what div64() does :-) Blame it on the C language, but not on GCC. Not this time. Segher - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] div64_64 support
On 2/26/07, Stephen Hemminger [EMAIL PROTECTED] wrote: Here is another way to handle the 64 bit divide case. It allows full 64 bit divide by adding the support routine GCC needs. snip I know ARM already went through the process of removing __udivdi3 support: http://www.arm.linux.org.uk/developer/patches/viewpatch.php?id=2723/2 Copying Russell and Nicolas as a heads up. -- Dan - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] div64_64 support
On Fri, Feb 23, 2007 at 17:05:27 -0800, Stephen Hemminger wrote: Since there already two users of full 64 bit division in the kernel, and other places maybe hiding out as well. Add a full 64/64 bit divide. Yes this expensive, but there are places where it is necessary. It is not clear if doing the scaling buys any advantage on 64 bit platforms, so for them a full divide is done. Still does not work after these fixes... how came? WARNING: div64_64 [net/netfilter/xt_connbytes.ko] undefined! WARNING: div64_64 [net/ipv4/tcp_cubic.ko] undefined! --- linux-2.6.19/include/asm-i386/div64.h.bak 2006-11-29 23:57:37.0 +0200 +++ linux-2.6.19/include/asm-i386/div64.h 2007-02-24 16:24:55.822529880 +0200 @@ -45,4 +45,7 @@ div_ll_X_l_rem(long long divs, long div, return dum2; } + +extern uint64_t div64_64(uint64_t dividend, uint64_t divisor); + #endif --- linux-2.6.19/lib/div64.c.bak2007-02-24 16:10:03.686084000 +0200 +++ linux-2.6.19/lib/div64.c2007-02-24 17:01:11.224517353 +0200 @@ -80,4 +80,6 @@ uint64_t div64_64(uint64_t dividend, uin return dividend; } +EXPORT_SYMBOL(div64_64); + #endif /* BITS_PER_LONG == 32 */ -- - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html