On 11/2/06, Uros Bizjak <[EMAIL PROTECTED]> wrote:
> This testcase (similar to yours, but it actually compiles):
Hello,
Uros, thank you for looking into my problem. I upgraded gcc to 4.2
and am now using -march=i686 instead of -march=pentium4 for my
tests. gcc 4.2 resolved some, but not all, of my concerns. Please
see below.
> double test(int n, double a)
> {
>   double sum = 0.0;
>   int i;
>   for(i=0; i<n; ++i)
>   {
>     float x = logf((float)i);
>     sum += isnan(x) ? 0 : x;
>   }
>   return sum;
> }
> produces exactly the code you are looking for (using gcc-4.2 with -march=i686):
> .L5:
>         pushl   %ebx
>         fildl   (%esp)
>         addl    $4, %esp
>         fstps   (%esp)
>         fstpl   -24(%ebp)
>         call    logf
>         fucomi  %st(0), %st
>         fldz
>         fcmovnu %st(1), %st
>         fstp    %st(1)
>         addl    $1, %ebx
>         cmpl    %esi, %ebx
>         fldl    -24(%ebp)
>         faddp   %st, %st(1)
>         jne     .L5
I was unable to replicate your results with gcc 4.0.3, so I installed
gcc 4.2.0 20061103 (prerelease) from SVN. With that compiler I can
replicate the loop above exactly using -O2 -march=i686. It looks like
gcc 4.2 is willing to do this optimization, while gcc 4.0.3 is not. :-)
> The logf() function will be inlined by specifying
> -funsafe-math-optimizations; this flag also enables implicit
> float->double extensions for x87 math. As you probably don't need math
> errno from log(), -fno-math-errno should be added.
> Those two flags produce, IMO, an optimal loop:
> .L5:
>         pushl   %eax
>         fildl   (%esp)
>         addl    $4, %esp
>         fldln2
>         fxch    %st(1)
>         fyl2x
>         fucomi  %st(0), %st
>         fldz
>         fcmovnu %st(1), %st
>         fstp    %st(1)
>         addl    $1, %eax
>         cmpl    %edx, %eax
>         faddp   %st, %st(1)
>         jne     .L5
I have been unable to replicate this result. With these flags, both
gcc 4.0.3 and gcc 4.2.0 completely omit the fucomi test, and with it
the NaN check. I compiled the test case above verbatim using:
-O2 -march=i686 -funsafe-math-optimizations -fno-math-errno
The loop I get is:
.L5:
        pushl   %eax
        addl    $1, %eax
        fildl   (%esp)
        addl    $4, %esp
        cmpl    %edx, %eax
        fldln2
        fxch    %st(1)
        fyl2x
        faddp   %st, %st(1)
        jne     .L5
Now, for this particular code, that loop may be considered a valid
optimization, because log cannot produce a NaN from a non-negative
argument and i starts at 0. To be sure, I modified the code as follows:
double test(int i0, int n, double a)
{
  double sum = 0.0;
  int i;
  for(i=i0; i<n; ++i)
  {
    float x = logf((float)i);
    sum += isnan(x) ? 0 : x;
  }
  return sum;
}
I recompiled with the same flags. The assembly for the loop is
identical to the one I posted above, even though the code can now
actually produce NaNs (whenever i0 is negative).
Just to be sure, I also tested this on my modified loop:
int main(void) {
  printf("test(4, 6, 0) = %f\n", test(4,6,0));
  printf("test(0, 2, 0) = %f\n", test(0,2,0));
  printf("test(-2, 3, 0) = %f\n", test(-2,3,0));
  return 0;
}
[EMAIL PROTECTED]:~/project/cf/util$ /home/james/local/gcc/bin/gcc -O2
-march=i686 -funsafe-math-optimizations -fno-math-errno uros-test.c -o
test
[EMAIL PROTECTED]:~/project/cf/util$ ./test
test(4, 6, 0) = 2.995732
test(0, 2, 0) = -inf
test(-2, 3, 0) = nan
[EMAIL PROTECTED]:~/project/cf/util$ /home/james/local/gcc/bin/gcc -O2
-march=i686 uros-test.c -o test -lm
[EMAIL PROTECTED]:~/project/cf/util$ ./uros
test(4, 6, 0) = 2.995732
test(0, 2, 0) = -inf
test(-2, 3, 0) = -inf
[EMAIL PROTECTED]:~/project/cf/util$ /home/james/local/gcc/bin/gcc -v
Using built-in specs.
Target: i686-pc-linux-gnu
Configured with: ../gcc-4-2/configure --prefix=/home/james/local/gcc
Thread model: posix
gcc version 4.2.0 20061103 (prerelease)
Perhaps I have not replicated your working environment closely enough,
or you have a different macro in place of the isnan call. I compiled
all of the code above both with and without including the headers
<math.h> and <stdio.h>, and I get the same results either way.
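If the isnan expansion is the difference, one variant worth trying is a NaN test that inspects the bit pattern instead of comparing floating-point values. This is only a sketch under the assumption of 32-bit IEEE 754 floats; the name isnan_bits is hypothetical, and it has not been checked against the gcc 4.2 build above.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical bit-pattern NaN test, assuming 32-bit IEEE 754 floats.
 * A NaN has all exponent bits set and a nonzero mantissa. */
int isnan_bits(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);  /* copy the raw representation */
    /* Clear the sign bit; anything above the infinity pattern is a NaN. */
    return (bits & 0x7fffffffu) > 0x7f800000u;
}
```

Because the comparison is done on integers, it does not depend on how the compiler treats floating-point comparisons under the math flags.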
Again, any help is appreciated. Thanks.
Regards,
Michael James