Re: clang gets numerical underflow wrong, please fix.

2016-03-14 Thread Steve Kargl
On Mon, Mar 14, 2016 at 08:23:33PM +0100, Dimitry Andric wrote:
> 
> Maybe this is a usable workaround for libm.
> 

Thanks for looking into this.  I just read the audit
trail at llvm.org.  Searching the clang user manual
turns up

   The support for standard C in clang is feature-complete
   except for the C99 floating-point pragmas.

There is no other statement concerning implementation-defined
behavior.  The unstated assumption that FENV_ACCESS
is tacitly set to OFF should be documented.

It won't help possible libm issues.  The libm function is
trying to raise the FE_UNDERFLOW signal and return 0 to
a program.  As it is now, the libm function returns a
nonzero invalid result.

-- 
Steve
___
freebsd-toolchain@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-toolchain
To unsubscribe, send any mail to "freebsd-toolchain-unsubscr...@freebsd.org"


Re: clang gets numerical underflow wrong, please fix.

2016-03-14 Thread Dimitry Andric
On 14 Mar 2016, at 02:53, Steve Kargl wrote:
...
> #include <fenv.h>
> #include <stdio.h>
> 
> int
> main(void)
> {
>   int i;
>   float x = 1.f;
>   i = 0;
>   feclearexcept(FE_ALL_EXCEPT);
>   do {
>  x *= 2;
>  i++;
>  printf("%d %e\n", i, x);
>   } while(!fetestexcept(FE_OVERFLOW));
>   if (fetestexcept(FE_OVERFLOW)) printf("FE_UNDERFLOW: ");
>   printf("x = %e after %d iterations\n", x, i);
> 
>   return 0;
> }
> 
> You'll get a bunch of invalid output before the OVERFLOW.
> 
> % cc -O -o z b.c -lm && ./z | tail
> 1016 7.022239e+305  <-- not a valid float
> 1017 1.404448e+306  <-- not a valid float
> 1018 2.808896e+306  <-- not a valid float
> 1019 5.617791e+306  <-- not a valid float
> 1020 1.123558e+307  <-- not a valid float
> 1021 2.247116e+307  <-- not a valid float
> 1022 4.494233e+307  <-- not a valid float
> 1023 8.988466e+307  <-- not a valid float
> 1024 inf
> FE_UNDERFLOW: x = inf after 1024 iterations
> 
> Clang is broken with or without "#pragma STDC FENV_ACCESS ON".

Well, it simply doesn't support that #pragma [1], just like gcc [2]. :-(

Apparently compiler writers have trouble with this pragma, don't
implement it, and assume that it's always off, which then appears to
make most (or all) uses of the fenv.h functions undefined behavior.

That said, making 'x' in your test case volatile helps.  The main
loop was:

        fadd    %st(0), %st(0)
        fstl    -20(%ebp)
        incl    %esi
        movl    %esi, 4(%esp)
        fstpl   8(%esp)
        movl    $.L.str, (%esp)
        calll   printf
        fnstsw  -10(%ebp)

and becomes:

        flds    -16(%ebp)
        fadd    %st(0), %st(0)
        fstps   -16(%ebp)
        incl    %esi
        flds    -16(%ebp)
        fstpl   8(%esp)
        movl    %esi, 4(%esp)
        movl    $.L.str, (%esp)
        calll   printf
        #APP
        fnstsw  -10(%ebp)

So the fstps causes an overflow when 128 iterations are reached:

[...]
126 8.507059e+37
127 1.701412e+38
128 inf
FE_UNDERFLOW: x = inf after 128 iterations

Maybe this is a usable workaround for libm.

-Dimitry

[1] https://llvm.org/bugs/show_bug.cgi?id=8100
[2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=34678





Re: clang gets numerical underflow wrong, please fix.

2016-03-13 Thread Steve Kargl
On Mon, Mar 14, 2016 at 01:02:20AM +0100, Dimitry Andric wrote:
> 
> $ gcc -O overflow-iter.c -o overflow-iter-gcc -lm
> $ ./overflow-iter-gcc
> FE_OVERFLOW: x = inf after 1024 iterations
> $ gcc -O2 overflow-iter.c -o overflow-iter-gcc -lm
> $ ./overflow-iter-gcc
> FE_OVERFLOW: x = inf after 16384 iterations
> 

Change the program to 

#include <fenv.h>
#include <stdio.h>

int
main(void)
{
   int i;
   float x = 1.f;
   i = 0;
   feclearexcept(FE_ALL_EXCEPT);
   do {
  x *= 2;
  i++;
  printf("%d %e\n", i, x);
   } while(!fetestexcept(FE_OVERFLOW));
   if (fetestexcept(FE_OVERFLOW)) printf("FE_UNDERFLOW: ");
   printf("x = %e after %d iterations\n", x, i);

   return 0;
}

You'll get a bunch of invalid output before the OVERFLOW.

% cc -O -o z b.c -lm && ./z | tail
1016 7.022239e+305  <-- not a valid float
1017 1.404448e+306  <-- not a valid float
1018 2.808896e+306  <-- not a valid float
1019 5.617791e+306  <-- not a valid float
1020 1.123558e+307  <-- not a valid float
1021 2.247116e+307  <-- not a valid float
1022 4.494233e+307  <-- not a valid float
1023 8.988466e+307  <-- not a valid float
1024 inf
FE_UNDERFLOW: x = inf after 1024 iterations

Clang is broken with or without "#pragma STDC FENV_ACCESS ON".

-- 
Steve


Re: clang gets numerical underflow wrong, please fix.

2016-03-13 Thread Steve Kargl
On Mon, Mar 14, 2016 at 01:02:20AM +0100, Dimitry Andric wrote:
> On 13 Mar 2016, at 21:10, Steve Kargl wrote:
> > Thanks for the quick reply.  But, it must be using an 80-bit
> > extended double instead of a double for storage.  This variation
> > 
> > #include <fenv.h>
> > #include <stdio.h>
> > 
> > int
> > main(void)
> > {
> >   int i;
> > //   float x = 1.f;
> >   double x = 1.;
> >   i = 0;
> >   feclearexcept(FE_ALL_EXCEPT);
> >   do {
> >  x /= 2;
> >  i++;
> >   } while(!fetestexcept(FE_UNDERFLOW));
> >   if (fetestexcept(FE_UNDERFLOW)) printf("FE_UNDERFLOW: ");
> >   printf("x = %e after %d iterations\n", x, i);
> > 
> >   return 0;
> > }
> > 
> > yields
> > 
> > % cc -O -o z b.c -lm && ./z
> > FE_UNDERFLOW: x = 0.00e+00 after 16435 iterations
> > 
> > It should be 1075 iterations.
> > 
> > Note, there is a similar issue with OVERFLOW.  The upshot is
> > that clang on current is probably miscompiling libm.
> 
> With this example, I also get different results from gcc (4.8.5),
> depending on the optimization level:
> 
> $ gcc -O underflow-iter.c -o underflow-iter-gcc -lm
> $ ./underflow-iter-gcc
> FE_UNDERFLOW: x = 0.00e+00 after 1075 iterations
> $ gcc -O2 underflow-iter.c -o underflow-iter-gcc -lm
> $ ./underflow-iter-gcc
> FE_UNDERFLOW: x = 0.00e+00 after 16435 iterations
> 
> Similar for the overflow case:
> 
> $ gcc -O overflow-iter.c -o overflow-iter-gcc -lm
> $ ./overflow-iter-gcc
> FE_OVERFLOW: x = inf after 1024 iterations
> $ gcc -O2 overflow-iter.c -o overflow-iter-gcc -lm
> $ ./overflow-iter-gcc
> FE_OVERFLOW: x = inf after 16384 iterations
> 
> Are we depending on some sort of subtle undefined behavior here?

I don't know.  From n1256.pdf, 6.5.5, I can find

The result of the binary * operator is the product of the operands.

I can't find what happens when one operand is DBL_MAX and the
other is greater than 1.  The result is clearly an overflow 
condition.  Annex F is normative text, which defers to IEC
60559.  F.3 states 

-- The +, -, *, and / operators provide the IEC 60559 add,
   subtract, multiply, and divide operations.

Annex F contains a lot of text about "#pragma STDC FENV_ACCESS ON",
but of course neither gcc nor clang implements this pragma.  In
particular, in F.8.1 one has

Floating-point arithmetic operations ... may entail side effects
which optimization shall honor, at least where the state of the
FENV_ACCESS pragma is ``on''.  The flags ... in the floating-point
environment may be regarded as global variables; floating-point
operations (+, *, etc.) implicitly ... write the flags.

However, F.7.1 has

F.7.1 Environment management

IEC 60559 requires that floating-point operations implicitly raise
floating-point exception status flags, ...  When the state for the
FENV_ACCESS pragma (defined in <fenv.h>) is ``on'', these changes
to the floating-point state are treated as side effects which respect
sequence points.313)

313) If the state for the FENV_ACCESS pragma is ``off'', the
 implementation is free to assume the floating-point control
 modes will be the default ones and the floating-point status
 flags will not be tested, which allows certain optimizations
 (see F.8).

So, I'm guessing clang/llvm developers are going to claim that the
lack of implementation of the FENV_ACCESS pragma means "off".  So,
clang is unsuitable for real floating-point development.

-- 
Steve


Re: clang gets numerical underflow wrong, please fix.

2016-03-13 Thread Dimitry Andric
On 13 Mar 2016, at 21:10, Steve Kargl wrote:
> On Sun, Mar 13, 2016 at 09:03:57PM +0100, Dimitry Andric wrote:
...
>> So it's storing the intermediate result in a double, for some reason.
>> The fnstsw will then result in zero, since there was no underflow at
>> that point.
>> 
>> I will submit a bug for this upstream, thanks for the report.

Submitted upstream as: https://llvm.org/bugs/show_bug.cgi?id=26931


> Thanks for the quick reply.  But, it must be using an 80-bit
> extended double instead of a double for storage.  This variation
> 
> #include <fenv.h>
> #include <stdio.h>
> 
> int
> main(void)
> {
>   int i;
> //   float x = 1.f;
>   double x = 1.;
>   i = 0;
>   feclearexcept(FE_ALL_EXCEPT);
>   do {
>  x /= 2;
>  i++;
>   } while(!fetestexcept(FE_UNDERFLOW));
>   if (fetestexcept(FE_UNDERFLOW)) printf("FE_UNDERFLOW: ");
>   printf("x = %e after %d iterations\n", x, i);
> 
>   return 0;
> }
> 
> yields
> 
> % cc -O -o z b.c -lm && ./z
> FE_UNDERFLOW: x = 0.00e+00 after 16435 iterations
> 
> It should be 1075 iterations.
> 
> Note, there is a similar issue with OVERFLOW.  The upshot is
> that clang on current is probably miscompiling libm.

With this example, I also get different results from gcc (4.8.5),
depending on the optimization level:

$ gcc -O underflow-iter.c -o underflow-iter-gcc -lm
$ ./underflow-iter-gcc
FE_UNDERFLOW: x = 0.00e+00 after 1075 iterations
$ gcc -O2 underflow-iter.c -o underflow-iter-gcc -lm
$ ./underflow-iter-gcc
FE_UNDERFLOW: x = 0.00e+00 after 16435 iterations

Similar for the overflow case:

$ gcc -O overflow-iter.c -o overflow-iter-gcc -lm
$ ./overflow-iter-gcc
FE_OVERFLOW: x = inf after 1024 iterations
$ gcc -O2 overflow-iter.c -o overflow-iter-gcc -lm
$ ./overflow-iter-gcc
FE_OVERFLOW: x = inf after 16384 iterations

Are we depending on some sort of subtle undefined behavior here?  With
-O, the 'main loop' becomes:

.L3:
        fld1
        fstpl   24(%esp)
        movl    $0, %ebx
.L8:
        fldl    24(%esp)
        fld     %st(0)
        faddp   %st, %st(1)
        fstpl   24(%esp)
        addl    $1, %ebx
        fnstsw  %ax
        movl    %eax, %esi
        movl    __has_sse, %eax
        testl   %eax, %eax
        je      .L4
        cmpl    $2, %eax
        jne     .L5
        call    __test_sse
        testl   %eax, %eax
        je      .L5
.L4:
        stmxcsr 44(%esp)
        jmp     .L6
.L5:
        movl    $0, 44(%esp)
.L6:
        orl     44(%esp), %esi
        testl   $8, %esi
        je      .L8

With -O2, it becomes:

.L3:
        fld1
        xorl    %ebx, %ebx
.L12:
        fadd    %st(0), %st
        addl    $1, %ebx
        fnstsw  %ax
        testl   %edx, %edx
        movl    %eax, %esi
        je      .L10
        cmpl    $2, %edx
        je      .L27
.L9:
        xorl    %eax, %eax
.L8:
        orl     %eax, %esi
        andl    $8, %esi
        je      .L12

So it switches from using faddp and fstpl to direct fadd of %st(0) and
%st.  I assume that uses the internal 80 bit precision?  Gcc also
manages to move the __has_sse stuff out to further down in the function,
but it does not really affect the result.
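
The iteration counts seen in this thread (128, 1024, 16384) line up with
the maximum binary exponents of float, double and the x87 extended
format.  A small sketch using <float.h> to check the first two; the
helper names are made up, and volatile is used only to force each value
back to its declared width on every pass.

```c
#include <float.h>

/*
 * Hypothetical helper: doubles a value held in a volatile float until it
 * exceeds FLT_MAX (i.e. becomes infinity) and returns the count.
 */
int
float_doublings_to_inf(void)
{
	volatile float x = 1.f;
	int i = 0;

	while (x <= FLT_MAX && i < 100000) {	/* arbitrary cap */
		x *= 2;
		i++;
	}
	return (i);
}

/* Same idea with the value kept in a volatile double. */
int
double_doublings_to_inf(void)
{
	volatile double x = 1.;
	int i = 0;

	while (x <= DBL_MAX && i < 100000) {	/* arbitrary cap */
		x *= 2;
		i++;
	}
	return (i);
}
```

The two functions should return FLT_MAX_EXP (128) and DBL_MAX_EXP (1024).
Where long double is the x87 80-bit type, LDBL_MAX_EXP is 16384, which
would account for the -O2 result if the value never leaves the register.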

-Dimitry





Re: clang gets numerical underflow wrong, please fix.

2016-03-13 Thread Steve Kargl
On Sun, Mar 13, 2016 at 09:03:57PM +0100, Dimitry Andric wrote:
> On 13 Mar 2016, at 19:25, Steve Kargl wrote:
> > 
> > Consider this small piece of code:
> > 
> > #include <fenv.h>
> > #include <stdio.h>
> > 
> > float
> > foo()
> > {
> > static const volatile float tiny = 1.e-30f;
> > return (tiny * tiny);
> > }
> > 
> > int
> > main(void)
> > {
> >   float x;
> >   feclearexcept(FE_ALL_EXCEPT);
> >   x = foo();
> >   if (fetestexcept(FE_UNDERFLOW)) printf("FE_UNDERFLOW: ");
> >   printf("x = %e\n", x);
> >   return 0;
> > }
> > 
> > clang seems to get the underflow condition wrong.
> > 
> > % cc -o z a.c -lm && ./z
> > FE_UNDERFLOW: x = 0.00e+00
> > 
> > % cc -O -o z a.c -lm && ./z
> > x = 1.00e-60 <--- This is not a possible value!
> > 
> > % gcc -o z a.c -lm && ./z
> > FE_UNDERFLOW: x = 0.00e+00
> > 
> > % gcc -O -o z a.c -lm && ./z
> > FE_UNDERFLOW: x = 0.00e+00
> 
> Hmm, this is an interesting one.  On amd64, it works as expected with
> clang, but there it always uses SSE, obviously:
> 
> $ ./underflow-amd64
> FE_UNDERFLOW: x = 0.00e+00
> 
> The problem seems to be caused by the intermediate result being stored
> using fstpl instead of fstps, e.g. simplifying the sample program (to
> get rid of all the SSE stuff the fexxx() macros insert):
> 
> int main(void)
> {
>   float x;
>   __uint16_t status;
>   __fnclex();
>   x = foo();
>   __fnstsw(&status);
>   printf("status: %#x\n", (unsigned)status);
>   printf("x = %e\n", x);
>   return 0;
> }
> 
> With gcc, the assembly becomes:
> 
> foo:
>         flds    tiny.1853
>         flds    tiny.1853
>         fmulp   %st, %st(1)
>         ret
> [...]
> main:
> [...]
>         fnclex
>         call    foo
>         fstps   12(%esp)
>         fnstsw  %ax
> 
> In this case, fmulp does not generate an underflow, but the fstps will.
> With clang, the assembly becomes:
> 
> foo:
>         flds    foo.tiny
>         fmuls   foo.tiny
>         retl
> [...]
> main:
>         subl    $24, %esp
>         fnclex
>         calll   foo
>         fstpl   12(%esp)                # 8-byte Folded Spill
>         fnstsw  22(%esp)
> 
> So it's storing the intermediate result in a double, for some reason.
> The fnstsw will then result in zero, since there was no underflow at
> that point.
> 
> I will submit a bug for this upstream, thanks for the report.
> 

Thanks for the quick reply.  But, it must be using an 80-bit
extended double instead of a double for storage.  This variation

#include <fenv.h>
#include <stdio.h>

int
main(void)
{
   int i;
//   float x = 1.f;
   double x = 1.;
   i = 0;
   feclearexcept(FE_ALL_EXCEPT);
   do {
  x /= 2;
  i++;
   } while(!fetestexcept(FE_UNDERFLOW));
   if (fetestexcept(FE_UNDERFLOW)) printf("FE_UNDERFLOW: ");
   printf("x = %e after %d iterations\n", x, i);

   return 0;
}

yields

% cc -O -o z b.c -lm && ./z
FE_UNDERFLOW: x = 0.00e+00 after 16435 iterations

It should be 1075 iterations.
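
A worked version of the 1075 figure, and a possible explanation of
16435, as a sketch.  halvings_until_underflow is a made-up helper.  The
16435 case assumes the value stays in an x87 register, with the exponent
range of the extended format (minimum exponent -16381 in <float.h>
terms) but with precision control set to 53 bits.  That is an assumption
about this setup, not something stated in the thread.

```c
#include <float.h>

/*
 * The smallest positive subnormal is 2^(min_exp - mant_dig).  Halving 1.0
 * once more than the magnitude of that exponent rounds to zero and raises
 * underflow.
 */
int
halvings_until_underflow(int mant_dig, int min_exp)
{
	return (mant_dig - min_exp + 1);
}

/* Empirical check for double; volatile forces a store at double width. */
int
double_halvings_to_zero(void)
{
	volatile double x = 1.;
	int i = 0;

	while (x != 0 && i < 100000) {		/* arbitrary cap */
		x /= 2;
		i++;
	}
	return (i);
}
```

halvings_until_underflow(DBL_MANT_DIG, DBL_MIN_EXP) gives 53 + 1021 + 1 =
1075, and halvings_until_underflow(53, -16381) gives 53 + 16381 + 1 =
16435, the count reported above.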

Note, there is a similar issue with OVERFLOW.  The upshot is
that clang on current is probably miscompiling libm.
-- 
Steve


clang gets numerical underflow wrong, please fix.

2016-03-13 Thread Steve Kargl
Consider this small piece of code:

#include <fenv.h>
#include <stdio.h>

float
foo()
{
static const volatile float tiny = 1.e-30f;
return (tiny * tiny);
}

int
main(void)
{
   float x;
   feclearexcept(FE_ALL_EXCEPT);
   x = foo();
   if (fetestexcept(FE_UNDERFLOW)) printf("FE_UNDERFLOW: ");
   printf("x = %e\n", x);
   return 0;
}

clang seems to get the underflow condition wrong.

% cc -o z a.c -lm && ./z
FE_UNDERFLOW: x = 0.00e+00

% cc -O -o z a.c -lm && ./z
x = 1.00e-60 <--- This is not a possible value!

% gcc -o z a.c -lm && ./z
FE_UNDERFLOW: x = 0.00e+00

% gcc -O -o z a.c -lm && ./z
FE_UNDERFLOW: x = 0.00e+00
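
To make the "not a possible value" remark concrete: the product is an
ordinary double but lies far below the smallest float subnormal, so a
correctly narrowed result must be zero.  A sketch with made-up helper
names; FLT_MIN * FLT_EPSILON is used here as the smallest positive
subnormal float (2^-149).

```c
#include <float.h>

/* The product of the two floats, computed and kept in double precision. */
double
tiny_product_as_double(void)
{
	volatile float tiny = 1.e-30f;

	return ((double)tiny * (double)tiny);
}

/* Narrow a double to float through a volatile to force the conversion. */
float
narrow_to_float(double v)
{
	volatile float f = (float)v;

	return (f);
}
```

tiny_product_as_double() is about 1e-60, which is above DBL_MIN but below
(double)FLT_MIN * FLT_EPSILON, and narrow_to_float() of it should be
exactly 0.  A float printed as 1.00e-60 therefore means the value was
never rounded to float.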

% uname -a
FreeBSD laptop-kargl 11.0-CURRENT FreeBSD 11.0-CURRENT #1 r296724:
Sun Mar 13 09:12:38 PDT 2016

% cc --version
FreeBSD clang version 3.8.0 (tags/RELEASE_380/final 262564)
(based on LLVM 3.8.0)

% gcc --version
gcc (FreeBSD Ports Collection) 4.8.5

-- 
Steve