Re: [RFC] div64_64 support

2007-03-07 Thread Chuck Ebbert
Sami Farin wrote:
 On Tue, Mar 06, 2007 at 23:53:49 +0200, Sami Farin wrote:
 ...
 And I found bug in gcc-4.1.2, it gave 0 for ncubic results
 when doing 1000 loops test... gcc-4.0.3 works.
 
 Found it.
 
 --- cbrt-test.c~  2007-03-07 00:20:54.735248105 +0200
 +++ cbrt-test.c   2007-03-07 00:21:03.964864343 +0200
 @@ -209,7 +209,7 @@
  
   __asm__(bsrl %1,%0\n\t
   cmovzl %2,%0
 - : =r (r) : rm (x), rm (-1));
 + : =r (r) : rm (x), rm (-1) : memory);
   return r+1;
  }
  
 Now Linux 2.6 does not have memory in fls, maybe it causes
 some gcc funnies some people are seeing.

Can you post the difference in the generated code with that change?

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] div64_64 support

2007-03-06 Thread Andi Kleen
On Mon, Mar 05, 2007 at 03:57:14PM -0800, Stephen Hemminger wrote:
 On 03 Mar 2007 03:31:52 +0100
 Andi Kleen [EMAIL PROTECTED] wrote:
 
  Stephen Hemminger [EMAIL PROTECTED] writes:
  
   Here is another way to handle the 64 bit divide case.
   It allows full 64 bit divide by adding the support routine
   GCC needs.
  
  Not supplying that was intentional by Linus so that people
  think twice (or more often) before they using such expensive
  operations. A plain / looks too innocent.
  
  Is it really needed by CUBIC anyways?  It uses it for getting
  the cubic root, but the algorithm recommended by Hacker's Delight
  (great book) doesn't use any divisions at all. Probably better 
  to use a better algorithm without divisions.
  
 
 I tried the code from Hacker's Delight.
 It is cool, but performance is CPU (and data) dependent:

I did too. My experiences were mixed: on 32bit it was slower,
on 64bit faster on average.  Strangely the 32bit version ran
faster again without -fomit-frame-pointer, so it's likely
some funny interaction with 32bit long long code generation.
The difference is never more than 100 cycles so it shouldn't
be a big issue either way.

For some input arguments (1% in my testing)
it also gave an answer 1 off from the existing code, 
but I don't think that's a problem.

But more importantly during testing I found that the cubic
code gives a division by zero for input arguments 2^43. If you
have a system with 16TB of memory this could actually be a remotely
exploitable bug :)

I still think it's a good idea to switch to the new function,
especially since it's shorter code.

Here's the patch. Note I didn't verify it with real large window
TCP operations; only unit testing.

-Andi

Use Hacker's delight cube root algorithm in cubic TCP

Shorter code and fixes a theoretically remote exploitable bug.

Signed-off-by: Andi Kleen [EMAIL PROTECTED]


Index: linux-2.6.21-rc1-net/net/ipv4/tcp_cubic.c
===
--- linux-2.6.21-rc1-net.orig/net/ipv4/tcp_cubic.c
+++ linux-2.6.21-rc1-net/net/ipv4/tcp_cubic.c
@@ -93,51 +93,24 @@ static void bictcp_init(struct sock *sk)
tcp_sk(sk)-snd_ssthresh = initial_ssthresh;
 }
 
-/* 64bit divisor, dividend and result. dynamic precision */
-static inline u_int64_t div64_64(u_int64_t dividend, u_int64_t divisor)
+static u32 cubic_root(u64 x)
 {
-   u_int32_t d = divisor;
-
-   if (divisor  0xULL) {
-   unsigned int shift = fls(divisor  32);
-
-   d = divisor  shift;
-   dividend = shift;
-   }
-
-   /* avoid 64 bit division if possible */
-   if (dividend  32)
-   do_div(dividend, d);
-   else
-   dividend = (uint32_t) dividend / d;
-
-   return dividend;
-}
-
-/*
- * calculate the cubic root of x using Newton-Raphson
- */
-static u32 cubic_root(u64 a)
-{
-   u32 x, x1;
-
-   /* Initial estimate is based on:
-* cbrt(x) = exp(log(x) / 3)
-*/
-   x = 1u  (fls64(a)/3);
-
-   /*
-* Iteration based on:
-* 2
-* x= ( 2 * x  +  a / x  ) / 3
-*  k+1  k k
-*/
-   do {
-   x1 = x;
-   x = (2 * x + (uint32_t) div64_64(a, x*x)) / 3;
-   } while (abs(x1 - x)  1);
-
-   return x;
+   int s;
+   u32 y;
+   u64 b;
+   u64 bs;
+
+   y = 0;
+   for (s = 63; s = 0; s -= 3) {
+   y = 2 * y;
+   b = 3 * y * (y+1) + 1;
+   bs = b  s;
+   if (x = bs  (b == (bss))) {  /* avoid overflow */
+   x -= bs;
+   y++;
+   }
+   }
+   return y;
 }
 
 /*
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] div64_64 support

2007-03-06 Thread Andi Kleen
On Mon, Mar 05, 2007 at 04:25:51PM -0800, David Miller wrote:
 Another thing is that the non-Hacker's Delight version iterates
 differently for different input values, so the input value space is
 very important to consider when comparing these two pieces of code.

I did some stochastic testing on my version. It gave 1 off for  1% of
the arguments.  Probably not an issue.

Besides it actually works for 2^43 @)

-Andi
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] div64_64 support II

2007-03-06 Thread Andi Kleen
 The problem with these algorithms that tradoff one or more
 multiplies in order to avoid a divide is that they don't
 give anything and often lose when both multiplies and
 divides are emulated in software.

Actually on rereading this: is there really any Linux port
that emulates multiplies in software? I thought that was only
done on really small microcontrollers or smart cards; but anything
32bit+ that runs Linux should have hardware multiply, shouldn't it?

-Andi
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] div64_64 support

2007-03-06 Thread Eric Dumazet
On Tuesday 06 March 2007 14:34, Andi Kleen wrote:

 - return x;
 + int s;
 + u32 y;
 + u64 b;
 + u64 bs;
 +
 + y = 0;
 + for (s = 63; s = 0; s -= 3) {
 + y = 2 * y;
 + b = 3 * y * (y+1) + 1;
 + bs = b  s;
 + if (x = bs  (b == (bss))) {  /* avoid overflow */
 + x -= bs;
 + y++;
 + }
 + }
 + return y;
  }


Andi

rant
Let me see... You throw code like that and expect someone to actually 
understand it in one year, and be able to correct a bug ?
/rant

Please add something, an URL or even better a nice explanation, per favor...

Thank you
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] div64_64 support

2007-03-06 Thread Andi Kleen
 rant
 Let me see... You throw code like that and expect someone to actually 
 understand it in one year, and be able to correct a bug ?

To be honest I don't expect any bugs in this function.

 /rant
 
 Please add something, an URL or even better a nice explanation, per favor...

It's straight out of Hacker's delight which is referenced in the commit
log.

-Andi
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] div64_64 support II

2007-03-06 Thread Dagfinn Ilmari Mannsåker
Andi Kleen [EMAIL PROTECTED] writes:

 Actually on rereading this: is there really any Linux port
 that emulates multiplies in software? I thought that was only
 done on really small microcontrollers or smart cards; but anything
 32bit+ that runs Linux should have hardware multiply, shouldn't it?

SPARCv7 (sun4/sun4c) doesn't have hardware mul/div. This includes
SparcStation 1, 1+, 2, SLC, ELC, IPC and IPX.

-- 
ilmari
A disappointingly low fraction of the human race is,
 at any given time, on fire. - Stig Sandbeck Mathisen
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] div64_64 support II

2007-03-06 Thread David Miller
From: [EMAIL PROTECTED] (Dagfinn Ilmari Mannsåker)
Date: Tue, 06 Mar 2007 18:43:14 +0100

 Andi Kleen [EMAIL PROTECTED] writes:
 
  Actually on rereading this: is there really any Linux port
  that emulates multiplies in software? I thought that was only
  done on really small microcontrollers or smart cards; but anything
  32bit+ that runs Linux should have hardware multiply, shouldn't it?
 
 SPARCv7 (sun4/sun4c) doesn't have hardware mul/div. This includes
 SparcStation 1, 1+, 2, SLC, ELC, IPC and IPX.

Right and the cypress sun4m parts don't do multiply/divide in
hardware either.

I believe the Alpha does divide in software too, see
arch/alpha/lib/divide.S
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] div64_64 support

2007-03-06 Thread Stephen Hemminger
Don't count the existing Newton-Raphson out. It turns out that to get enough
precision for 32 bits, only 4 iterations are needed. By unrolling those, it
gets much better timing.

Slightly gross test program (with original cubic wraparound bug fixed).

---
/* Test and measure perf of cube root algorithms.  */
#include stdio.h
#include stdlib.h
#include stdint.h
#include math.h

#ifdef __x86_64

#define rdtscll(val) do { \
 unsigned int __a,__d; \
 asm volatile(rdtsc : =a (__a), =d (__d)); \
 (val) = ((unsigned long)__a) | (((unsigned long)__d)32); \
} while(0)

# define do_div(n,base) ({  \
uint32_t __base = (base);   \
uint32_t __rem; \
__rem = ((uint64_t)(n)) % __base;   \
(n) = ((uint64_t)(n)) / __base; \
__rem;  \
 })


/**
 * __ffs - find first bit in word.
 * @word: The word to search
 *
 * Undefined if no bit exists, so code should check against 0 first.
 */
static __inline__ unsigned long __ffs(unsigned long word)
{
__asm__(bsfq %1,%0
:=r (word)
:rm (word));
return word;
}

/*
 * __fls: find last bit set.
 * @word: The word to search
 *
 * Undefined if no zero exists, so code should check against ~0UL first.
 */
static inline unsigned long __fls(unsigned long word)
{
__asm__(bsrq %1,%0
:=r (word)
:rm (word));
return word;
}

/**
 * ffs - find first bit set
 * @x: the word to search
 *
 * This is defined the same way as
 * the libc and compiler builtin ffs routines, therefore
 * differs in spirit from the above ffz (man ffs).
 */
static __inline__ int ffs(int x)
{
int r;

__asm__(bsfl %1,%0\n\t
cmovzl %2,%0 
: =r (r) : rm (x), r (-1));
return r+1;
}

/**
 * fls - find last bit set
 * @x: the word to search
 *
 * This is defined the same way as ffs.
 */
static inline int fls(int x)
{
int r;

__asm__(bsrl %1,%0\n\t
cmovzl %2,%0
: =r (r) : rm (x), rm (-1));
return r+1;
}

/**
 * fls64 - find last bit set in 64 bit word
 * @x: the word to search
 *
 * This is defined the same way as fls.
 */
static inline int fls64(uint64_t x)
{
if (x == 0)
return 0;
return __fls(x) + 1;
}

static inline uint64_t div64_64(uint64_t dividend, uint64_t divisor)
{
return dividend / divisor;
}

#elif __i386

#define rdtscll(val) \
 __asm__ __volatile__(rdtsc : =A (val))

/**
 * ffs - find first bit set
 * @x: the word to search
 *
 * This is defined the same way as
 * the libc and compiler builtin ffs routines, therefore
 * differs in spirit from the above ffz() (man ffs).
 */
static inline int ffs(int x)
{
int r;

__asm__(bsfl %1,%0\n\t
jnz 1f\n\t
movl $-1,%0\n
1: : =r (r) : rm (x));
return r+1;
}

/**
 * fls - find last bit set
 * @x: the word to search
 *
 * This is defined the same way as ffs().
 */
static inline int fls(int x)
{
int r;

__asm__(bsrl %1,%0\n\t
jnz 1f\n\t
movl $-1,%0\n
1: : =r (r) : rm (x));
return r+1;
}

static inline int fls64(uint64_t x)
{
uint32_t h = x  32;
if (h)
return fls(h) + 32;
return fls(x);
}


#define do_div(n,base) ({ \
unsigned long __upper, __low, __high, __mod, __base; \
__base = (base); \
asm(:=a (__low), =d (__high):A (n)); \
__upper = __high; \
if (__high) { \
__upper = __high % (__base); \
__high = __high / (__base); \
} \
asm(divl %2:=a (__low), =d (__mod):rm (__base), 0 (__low), 
1 (__upper)); \
asm(:=A (n):a (__low),d (__high)); \
__mod; \
})


/* 64bit divisor, dividend and result. dynamic precision */
static uint64_t div64_64(uint64_t dividend, uint64_t divisor)
{
uint32_t d = divisor;

if (divisor  0xULL) {
unsigned int shift = fls(divisor  32);

d = divisor  shift;
dividend = shift;
}

/* avoid 64 bit division if possible */
if (dividend  32)
do_div(dividend, d);
else
dividend = (uint32_t) dividend / d;

return dividend;
}
#endif

/* Andi Kleen's version */
uint32_t acbrt(uint64_t x)
{
uint32_t y = 0;
int s;

for (s = 63; s = 0; s -= 3) {
uint64_t b, bs;

y = 2 * y;
b = 3 * y * (y+1) + 1;
bs = b  s;
if (x = bs  (b == (bss))) {  /* avoid overflow */
x -= bs;
y++;
}
}
return y;
}

/* 

Re: [RFC] div64_64 support II

2007-03-06 Thread H. Peter Anvin

Andi Kleen wrote:

The problem with these algorithms that tradoff one or more
multiplies in order to avoid a divide is that they don't
give anything and often lose when both multiplies and
divides are emulated in software.


Actually on rereading this: is there really any Linux port
that emulates multiplies in software? I thought that was only
done on really small microcontrollers or smart cards; but anything
32bit+ that runs Linux should have hardware multiply, shouldn't it?


SPARC  v8 does multiplies using an MSTEP instruction.

-hpa
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] div64_64 support

2007-03-06 Thread Andi Kleen
On Tue, Mar 06, 2007 at 10:29:41AM -0800, Stephen Hemminger wrote:
 Don't count the existing Newton-Raphson out. It turns out that to get enough
 precision for 32 bits, only 4 iterations are needed. By unrolling those, it
 gets much better timing.

But did you fix the 2^43 bug too?

SGI has already shipped 10TB Altixen, so it's not entirely theoretical.

-Andi

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] div64_64 support

2007-03-06 Thread Stephen Hemminger
On Tue, 6 Mar 2007 20:48:41 +0100
Andi Kleen [EMAIL PROTECTED] wrote:

 On Tue, Mar 06, 2007 at 10:29:41AM -0800, Stephen Hemminger wrote:
  Don't count the existing Newton-Raphson out. It turns out that to get enough
  precision for 32 bits, only 4 iterations are needed. By unrolling those, it
  gets much better timing.
 
 But did you fix the 2^43 bug too?

It was caused by not doing x^2 in 64 bit.

 
 SGI has already shipped 10TB Altixen, so it's not entirely theoretical.
 
 -Andi
 


-- 
Stephen Hemminger [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] div64_64 support

2007-03-06 Thread David Miller
From: Stephen Hemminger [EMAIL PROTECTED]
Date: Tue, 6 Mar 2007 10:29:41 -0800

 /* calculate the cubic root of x using Newton-Raphson */
 static uint32_t ncubic(uint64_t a)
 {
   uint64_t x;
 
   /* Initial estimate is based on:
* cbrt(x) = exp(log(x) / 3)
*/
   x = 1u  (fls64(a)/3);
 
   /* Converges in 3 iterations to  32 bits */
 
   x = (2 * x + div64_64(a, x*x)) / 3;
   x = (2 * x + div64_64(a, x*x)) / 3;
   x = (2 * x + div64_64(a, x*x)) / 3;
 
   return x;
 }

Indeed that will be the fastest variant for cpus with hw
integer division.

I did a quick sparc64 port, here is what I got:

Function clocks  mean(us) max(us)  std(us)  total error
ocubic  529 0.3515.16 0.66 545101
ncubic  498 0.3312.83 0.36 576263
acbrt   427 0.2811.04 0.33 547562
hcbrt   393 0.2610.18 0.47 2410
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] div64_64 support

2007-03-06 Thread Sami Farin
On Tue, Mar 06, 2007 at 10:29:41 -0800, Stephen Hemminger wrote:
 Don't count the existing Newton-Raphson out. It turns out that to get enough
 precision for 32 bits, only 4 iterations are needed. By unrolling those, it
 gets much better timing.
 
 Slightly gross test program (with original cubic wraparound bug fixed).
...
   {~0, 2097151},
 ^^^
this should be 2642245.

Without serializing instruction before rdtsc and with one loop
I do not get very accurate results (104 for ncubic,  1000 for others).

#define rdtscll_serialize(val) \
  __asm__ __volatile__(movl $0, %%eax\n\tcpuid\n\trdtsc\n : =A (val) : : 
ebx, ecx)

Here Pentium D timings for 1000 loops. 

~0, 2097151

Function clocks mean(us)  max(us)  std(us) total error
ocubic  9120.306   20.3170.730  545101
ncubic  7770.261   14.7990.486  576263
acbrt  11680.392   21.6810.547  547562
hcbrt   8270.278   15.2440.3872410

~0, 2642245

Function clocks mean(us)  max(us)  std(us) total error
ocubic  9080.305   20.2100.656   7
ncubic  7750.260   14.7920.550   31169
acbrt  11760.395   22.0170.9702468
hcbrt   8260.278   15.3260.670  547504

And I found bug in gcc-4.1.2, it gave 0 for ncubic results
when doing 1000 loops test... gcc-4.0.3 works.

-- 
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] div64_64 support

2007-03-06 Thread Sami Farin
On Tue, Mar 06, 2007 at 23:53:49 +0200, Sami Farin wrote:
...
 And I found bug in gcc-4.1.2, it gave 0 for ncubic results
 when doing 1000 loops test... gcc-4.0.3 works.

Found it.

--- cbrt-test.c~2007-03-07 00:20:54.735248105 +0200
+++ cbrt-test.c 2007-03-07 00:21:03.964864343 +0200
@@ -209,7 +209,7 @@
 
__asm__(bsrl %1,%0\n\t
cmovzl %2,%0
-   : =r (r) : rm (x), rm (-1));
+   : =r (r) : rm (x), rm (-1) : memory);
return r+1;
 }
 
Now Linux 2.6 does not have memory in fls, maybe it causes
some gcc funnies some people are seeing.

-- 
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] div64_64 support

2007-03-05 Thread Stephen Hemminger
On 03 Mar 2007 03:31:52 +0100
Andi Kleen [EMAIL PROTECTED] wrote:

 Stephen Hemminger [EMAIL PROTECTED] writes:
 
  Here is another way to handle the 64 bit divide case.
  It allows full 64 bit divide by adding the support routine
  GCC needs.
 
 Not supplying that was intentional by Linus so that people
 think twice (or more often) before they using such expensive
 operations. A plain / looks too innocent.
 
 Is it really needed by CUBIC anyways?  It uses it for getting
 the cubic root, but the algorithm recommended by Hacker's Delight
 (great book) doesn't use any divisions at all. Probably better 
 to use a better algorithm without divisions.
 

I tried the code from Hacker's Delight.
It is cool, but performance is CPU (and data) dependent:

Average # of usecs per operation:

Hacker  Newton
Pentium 3   68.6   90.4
T2050   98.6   92.0
U1400   450415
Xeon70 90
Xeon (newer)71 78

EM64T   21.8   24.6
AMD64   23.4   32.0

It might be worth the change for code size reduction though.


-- 
Stephen Hemminger [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] div64_64 support

2007-03-05 Thread David Miller
From: Stephen Hemminger [EMAIL PROTECTED]
Date: Mon, 5 Mar 2007 15:57:14 -0800

 I tried the code from Hacker's Delight.
 It is cool, but performance is CPU (and data) dependent:
 
 Average # of usecs per operation:

Interesting results.

The problem with these algorithms that tradoff one or more
multiplies in order to avoid a divide is that they don't
give anything and often lose when both multiplies and
divides are emulated in software.

This is particularly true in this cube-root case from Hacker's
Delight, because it's using 3 multiplies per iteration in place of one
divide per iteration.

Actually, sorry, there is only one real multiply in there since the
other two can be computed using addition and shifts.

Another thing is that the non-Hacker's Delight version iterates
differently for different input values, so the input value space is
very important to consider when comparing these two pieces of code.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] div64_64 support

2007-03-02 Thread Andi Kleen
Stephen Hemminger [EMAIL PROTECTED] writes:

 Here is another way to handle the 64 bit divide case.
 It allows full 64 bit divide by adding the support routine
 GCC needs.

Not supplying that was intentional by Linus so that people
think twice (or more often) before they using such expensive
operations. A plain / looks too innocent.

Is it really needed by CUBIC anyways?  It uses it for getting
the cubic root, but the algorithm recommended by Hacker's Delight
(great book) doesn't use any divisions at all. Probably better 
to use a better algorithm without divisions.

-Andi
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] div64_64 support

2007-02-26 Thread Jan Engelhardt

On Feb 23 2007 17:05, Stephen Hemminger wrote:

Since there already two users of full 64 bit division in the kernel,
and other places maybe hiding out as well. Add a full 64/64 bit divide.

Yes this expensive, but there are places where it is necessary.
It is not clear if doing the scaling buys any advantage on 64 bit platforms,
so for them a full divide is done.

---
 include/asm-arm/div64.h  |2 ++
 include/asm-generic/div64.h  |8 
 include/asm-m68k/div64.h |2 ++
 include/asm-mips/div64.h |8 
 include/asm-um/div64.h   |1 +
 include/asm-xtensa/div64.h   |1 +
 lib/div64.c  |   22 ++
 net/ipv4/tcp_cubic.c |   22 --
 net/netfilter/xt_connbytes.c |   16 
 9 files changed, 44 insertions(+), 38 deletions(-)

Actually, there is udivdi3 support in the kernel

./arch/arm26/lib/udivdi3.c
./arch/sh/lib/udivdi3.c
./arch/sparc/lib/udivdi3.S

should not this be consolidated too?




Jan
-- 
ft: http://freshmeat.net/p/chaostables/
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] div64_64 support

2007-02-26 Thread Stephen Hemminger
On Mon, 26 Feb 2007 21:09:26 +0100 (MET)
Jan Engelhardt [EMAIL PROTECTED] wrote:

 
 On Feb 23 2007 17:05, Stephen Hemminger wrote:
 
 Since there already two users of full 64 bit division in the kernel,
 and other places maybe hiding out as well. Add a full 64/64 bit divide.
 
 Yes this expensive, but there are places where it is necessary.
 It is not clear if doing the scaling buys any advantage on 64 bit platforms,
 so for them a full divide is done.
 
 ---
  include/asm-arm/div64.h  |2 ++
  include/asm-generic/div64.h  |8 
  include/asm-m68k/div64.h |2 ++
  include/asm-mips/div64.h |8 
  include/asm-um/div64.h   |1 +
  include/asm-xtensa/div64.h   |1 +
  lib/div64.c  |   22 ++
  net/ipv4/tcp_cubic.c |   22 --
  net/netfilter/xt_connbytes.c |   16 
  9 files changed, 44 insertions(+), 38 deletions(-)
 
 Actually, there is udivdi3 support in the kernel
 
 ./arch/arm26/lib/udivdi3.c
 ./arch/sh/lib/udivdi3.c
 ./arch/sparc/lib/udivdi3.S
 
 should not this be consolidated too?


Hmm. Those are the GCC internal versions, that are picked up but
doing divide in place. Do we want to allow general 64 bit in kernel to
be easily used? It could cause sloppy slow code, but it would look
cleaner.

-- 
Stephen Hemminger [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] div64_64 support

2007-02-26 Thread Stephen Hemminger
Here is another way to handle the 64 bit divide case.
It allows full 64 bit divide by adding the support routine
GCC needs.

---
 arch/alpha/Kconfig   |4 
 arch/arm/Kconfig |4 
 arch/arm26/Kconfig   |4 
 arch/avr32/Kconfig   |4 
 arch/cris/Kconfig|4 
 arch/frv/Kconfig |4 
 arch/h8300/Kconfig   |4 
 arch/i386/Kconfig|4 
 arch/m32r/Kconfig|4 
 arch/m68k/Kconfig|4 
 arch/m68knommu/Kconfig   |4 
 arch/mips/Kconfig|4 
 arch/parisc/Kconfig  |4 
 arch/powerpc/Kconfig |4 
 arch/ppc/Kconfig |4 
 arch/s390/Kconfig|4 
 arch/sh64/Kconfig|4 
 arch/v850/Kconfig|3 +++
 arch/xtensa/Kconfig  |4 
 lib/Makefile |1 +
 lib/udivdi3.c|   37 +
 net/ipv4/tcp_cubic.c |   26 ++
 net/netfilter/xt_connbytes.c |   19 +--
 23 files changed, 116 insertions(+), 42 deletions(-)

--- pktgen.orig/net/ipv4/tcp_cubic.c2007-02-26 13:40:08.0 -0800
+++ pktgen/net/ipv4/tcp_cubic.c 2007-02-26 14:30:00.0 -0800
@@ -51,7 +51,6 @@
 module_param(tcp_friendliness, int, 0644);
 MODULE_PARM_DESC(tcp_friendliness, turn on/off tcp friendliness);
 
-#include asm/div64.h
 
 /* BIC TCP Parameters */
 struct bictcp {
@@ -93,27 +92,6 @@
tcp_sk(sk)-snd_ssthresh = initial_ssthresh;
 }
 
-/* 64bit divisor, dividend and result. dynamic precision */
-static inline u_int64_t div64_64(u_int64_t dividend, u_int64_t divisor)
-{
-   u_int32_t d = divisor;
-
-   if (divisor  0xULL) {
-   unsigned int shift = fls(divisor  32);
-
-   d = divisor  shift;
-   dividend = shift;
-   }
-
-   /* avoid 64 bit division if possible */
-   if (dividend  32)
-   do_div(dividend, d);
-   else
-   dividend = (uint32_t) dividend / d;
-
-   return dividend;
-}
-
 /*
  * calculate the cubic root of x using Newton-Raphson
  */
@@ -134,7 +112,7 @@
 */
do {
x1 = x;
-   x = (2 * x + (uint32_t) div64_64(a, x*x)) / 3;
+   x = (2 * x + (u32) (a / x*x)) / 3;
} while (abs(x1 - x)  1);
 
return x;
--- pktgen.orig/net/netfilter/xt_connbytes.c2007-02-26 13:40:08.0 
-0800
+++ pktgen/net/netfilter/xt_connbytes.c 2007-02-26 14:29:13.0 -0800
@@ -16,7 +16,6 @@
 #include linux/netfilter/x_tables.h
 #include linux/netfilter/xt_connbytes.h
 
-#include asm/div64.h
 #include asm/bitops.h
 
 MODULE_LICENSE(GPL);
@@ -24,22 +23,6 @@
 MODULE_DESCRIPTION(iptables match for matching number of pkts/bytes per 
connection);
 MODULE_ALIAS(ipt_connbytes);
 
-/* 64bit divisor, dividend and result. dynamic precision */
-static u_int64_t div64_64(u_int64_t dividend, u_int64_t divisor)
-{
-   u_int32_t d = divisor;
-
-   if (divisor  0xULL) {
-   unsigned int shift = fls(divisor  32);
-
-   d = divisor  shift;
-   dividend = shift;
-   }
-
-   do_div(dividend, d);
-   return dividend;
-}
-
 static int
 match(const struct sk_buff *skb,
   const struct net_device *in,
@@ -106,7 +89,7 @@
break;
}
if (pkts != 0)
-   what = div64_64(bytes, pkts);
+   what = bytes / pkts;
break;
}
 
--- pktgen.orig/lib/Makefile2007-02-26 13:40:08.0 -0800
+++ pktgen/lib/Makefile 2007-02-26 14:17:31.0 -0800
@@ -28,6 +28,7 @@
 lib-$(CONFIG_SEMAPHORE_SLEEPERS) += semaphore-sleepers.o
 lib-$(CONFIG_GENERIC_FIND_NEXT_BIT) += find_next_bit.o
 obj-$(CONFIG_GENERIC_HWEIGHT) += hweight.o
+obj-$(CONFIG_GENERIC_UDIVDI3) += udivdi3.o
 obj-$(CONFIG_LOCK_KERNEL) += kernel_lock.o
 obj-$(CONFIG_PLIST) += plist.o
 obj-$(CONFIG_DEBUG_PREEMPT) += smp_processor_id.o
--- pktgen.orig/arch/alpha/Kconfig  2007-02-26 13:51:29.0 -0800
+++ pktgen/arch/alpha/Kconfig   2007-02-26 13:54:35.0 -0800
@@ -33,6 +33,10 @@
bool
default n
 
+config GENERIC_UDIVDI3
+   bool
+   default y
+
 config GENERIC_FIND_NEXT_BIT
bool
default y
--- pktgen.orig/arch/arm/Kconfig2007-02-26 13:51:29.0 -0800
+++ pktgen/arch/arm/Kconfig 2007-02-26 13:54:57.0 -0800
@@ -90,6 +90,10 @@
bool
default n
 
+config GENERIC_UDIVDI3
+   bool
+   default y
+
 config GENERIC_HWEIGHT
bool
default y
--- pktgen.orig/arch/arm26/Kconfig  2007-02-26 13:48:46.0 -0800
+++ pktgen/arch/arm26/Kconfig   2007-02-26 13:55:24.0 -0800
@@ -49,6 +49,10 @@
bool
default n
 
+config GENERIC_UDIVDI3
+   

Re: [RFC] div64_64 support

2007-02-26 Thread Jan Engelhardt

On Feb 26 2007 13:28, Stephen Hemminger wrote:
 
 ./arch/arm26/lib/udivdi3.c
 ./arch/sh/lib/udivdi3.c
 ./arch/sparc/lib/udivdi3.S
 
 should not this be consolidated too?

Hmm. Those are the GCC internal versions, that are picked up but
doing divide in place. Do we want to allow general 64 bit in kernel to
be easily used? It could cause sloppy slow code, but it would look
cleaner.

Then our reviewers should catch it, and if not, the janitors will
(/me winks at R.P.J.Day and trivial@).

@@ -134,7 +112,7 @@
*/
   do {
   x1 = x;
-  x = (2 * x + (uint32_t) div64_64(a, x*x)) / 3;
+  x = (2 * x + (u32) (a / x*x)) / 3;

Eye see a bug.

Previously there was div64_64(a, x*x) which is equivalent to
(a)/(x*x), or just: a/(x^2). But now you do a/x*x, which is
equivalent to a*x/x (in the domain of real numbers). Furthermore,
a/x*x is a-(a%x), which does not even remotely match a/(x^2).

Please keep the math intact, thank you ;-)



Jan
-- 
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] div64_64 support

2007-02-26 Thread Jan Engelhardt

On Feb 26 2007 15:44, Stephen Hemminger wrote:
 -   x = (2 * x + (uint32_t) div64_64(a, x*x)) / 3;
 +   x = (2 * x + (u32) (a / x*x)) / 3;
 
 Previously there was div64_64(a, x*x) which is equivalent to
 (a)/(x*x), or just: a/(x^2). But now you do a/x*x, which is
 equivalent to a*x/x (in the domain of real numbers). Furthermore,
 a/x*x is a-(a%x), which does not even remotely match a/(x^2).
 
Been there, done that, don't want to repeat it...

I am sorry I don't quite follow.


Jan
-- 
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] div64_64 support

2007-02-26 Thread Stephen Hemminger
On Tue, 27 Feb 2007 01:05:26 +0100 (MET)
Jan Engelhardt [EMAIL PROTECTED] wrote:

 
 On Feb 26 2007 15:44, Stephen Hemminger wrote:
  - x = (2 * x + (uint32_t) div64_64(a, x*x)) / 3;
  + x = (2 * x + (u32) (a / x*x)) / 3;
  
  Previously there was div64_64(a, x*x) which is equivalent to
  (a)/(x*x), or just: a/(x^2). But now you do a/x*x, which is
  equivalent to a*x/x (in the domain of real numbers). Furthermore,
  a/x*x is a-(a%x), which does not even remotely match a/(x^2).
  
 Been there, done that, don't want to repeat it...
 
 I am sorry I don't quite follow.

Once before a missed paren's caused a TCP congestion window bug that
took 6 months before it was found...

-- 
Stephen Hemminger [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] div64_64 support

2007-02-26 Thread Jan Engelhardt

On Feb 26 2007 16:07, Stephen Hemminger wrote:
 On Feb 26 2007 15:44, Stephen Hemminger wrote:
  -x = (2 * x + (uint32_t) div64_64(a, x*x)) / 3;
  +x = (2 * x + (u32) (a / x*x)) / 3;
  
  Previously there was div64_64(a, x*x) which is equivalent to
  (a)/(x*x), or just: a/(x^2). But now you do a/x*x, which is
  equivalent to a*x/x (in the domain of real numbers). Furthermore,
  a/x*x is a-(a%x), which does not even remotely match a/(x^2).
  
 Been there, done that, don't want to repeat it...
 
 I am sorry I don't quite follow.

Once before a missed paren's caused a TCP congestion window bug that
took 6 months before it was found...

Hah, just as I expected.

|On Tue, 27 Feb 2007 00:02:50 +0100 (MET), Jan Engelhardt wrote:
|Then our reviewers should catch it, and if not, the janitors will.


Jan
-- 
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] div64_64 support

2007-02-26 Thread H. Peter Anvin

Stephen Hemminger wrote:


Hmm. Those are the GCC internal versions, that are picked up but
doing divide in place. Do we want to allow general 64 bit in kernel to
be easily used? It could cause sloppy slow code, but it would look
cleaner.



... and it would handle datatypes which may be architecture-dependent a 
lot cleaner.


I thought the motivation for div64() was that a 64:32-32 divide could 
be done a lot faster on a number of platforms (including the important 
x86) than a generic 64:64-64 divide, but gcc doesn't handle the 
devolution automatically -- there is no such libgcc function.


-hpa
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] div64_64 support

2007-02-26 Thread Segher Boessenkool
I thought the motivation for div64() was that a 64:32-32 divide could 
be done a lot faster on a number of platforms (including the important 
x86) than a generic 64:64-64 divide, but gcc doesn't handle the 
devolution automatically -- there is no such libgcc function.


That there's no such function in libgcc doesn't mean GCC
cannot handle this; libgcc is a bunch of library functions
that are really needed for generated code (because you
really don't want to expand those functions inline
everywhere) -- you won't find an addsi3 in libgcc either.

There does exist a divmoddisi4, sort of.

It used to be defined in three GCC targets, but commented
out in all three.  The NS32k is long gone.  For Vax, a
comment says the machine insn for this isn't used because
it is just too slow.  And the i386 version is disabled
because it returns the wrong result on overflow (not the
truncated 64-bit result, required by the implicit cast
to 32-bit, but the i386 arch traps to the overflow handler).

The only way to express the semantics you want in (GNU) C
is to use asm() -- and that's exactly what div64() does :-)

Blame it on the C language, but not on GCC.  Not this time.


Segher

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] div64_64 support

2007-02-26 Thread Dan Williams

On 2/26/07, Stephen Hemminger [EMAIL PROTECTED] wrote:

Here is another way to handle the 64 bit divide case.
It allows full 64 bit divide by adding the support routine
GCC needs.


snip

I know ARM already went through the process of removing __udivdi3 support:
http://www.arm.linux.org.uk/developer/patches/viewpatch.php?id=2723/2

Copying Russell and Nicolas as a heads up.

--
Dan
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] div64_64 support

2007-02-24 Thread Sami Farin
On Fri, Feb 23, 2007 at 17:05:27 -0800, Stephen Hemminger wrote:
 Since there already two users of full 64 bit division in the kernel,
 and other places maybe hiding out as well. Add a full 64/64 bit divide.
 
 Yes this expensive, but there are places where it is necessary.
 It is not clear if doing the scaling buys any advantage on 64 bit platforms,
 so for them a full divide is done.

Still does not work after these fixes... how came?

WARNING: div64_64 [net/netfilter/xt_connbytes.ko] undefined!
WARNING: div64_64 [net/ipv4/tcp_cubic.ko] undefined!

--- linux-2.6.19/include/asm-i386/div64.h.bak   2006-11-29 23:57:37.0 
+0200
+++ linux-2.6.19/include/asm-i386/div64.h   2007-02-24 16:24:55.822529880 
+0200
@@ -45,4 +45,7 @@ div_ll_X_l_rem(long long divs, long div,
return dum2;
 
 }
+
+extern uint64_t div64_64(uint64_t dividend, uint64_t divisor);
+
 #endif
--- linux-2.6.19/lib/div64.c.bak2007-02-24 16:10:03.686084000 +0200
+++ linux-2.6.19/lib/div64.c2007-02-24 17:01:11.224517353 +0200
@@ -80,4 +80,6 @@ uint64_t div64_64(uint64_t dividend, uin
return dividend;
 }
 
+EXPORT_SYMBOL(div64_64);
+
 #endif /* BITS_PER_LONG == 32 */

-- 
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html