Re: clang gets numerical underflow wrong, please fix.
On Mon, Mar 14, 2016 at 08:23:33PM +0100, Dimitry Andric wrote:
>
> Maybe this is a usable workaround for libm.
>

Thanks for looking into this. I just read the audit trail at
llvm.org. Searching the clang user manual turns up

   The support for standard C in clang is feature-complete
   except for the C99 floating-point pragmas.

There is no other statement concerning the implementation defined
behavior. The understated assumption that FENV_ACCESS is tacitly
set to OFF should be documented.

It won't help possible libm issues. The libm function is trying to
raise the FE_UNDERFLOW signal and return 0 to a program. As it is
now, the libm function returns a nonzero invalid result.

--
Steve
___
freebsd-toolchain@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-toolchain
To unsubscribe, send any mail to "freebsd-toolchain-unsubscr...@freebsd.org"
Re: clang gets numerical underflow wrong, please fix.
On 14 Mar 2016, at 02:53, Steve Kargl wrote:
...
> #include <fenv.h>
> #include <stdio.h>
>
> int
> main(void)
> {
>   int i;
>   float x = 1.f;
>   i = 0;
>   feclearexcept(FE_ALL_EXCEPT);
>   do {
>      x *= 2;
>      i++;
>      printf("%d %e\n", i, x);
>   } while(!fetestexcept(FE_OVERFLOW));
>   if (fetestexcept(FE_OVERFLOW)) printf("FE_UNDERFLOW: ");
>   printf("x = %e after %d iterations\n", x, i);
>
>   return 0;
> }
>
> You'll get a bunch of invalid output before the OVERFLOW.
>
> % cc -O -o z b.c -lm && ./z | tail
> 1016 7.022239e+305   <-- not a valid float
> 1017 1.404448e+306   <-- not a valid float
> 1018 2.808896e+306   <-- not a valid float
> 1019 5.617791e+306   <-- not a valid float
> 1020 1.123558e+307   <-- not a valid float
> 1021 2.247116e+307   <-- not a valid float
> 1022 4.494233e+307   <-- not a valid float
> 1023 8.988466e+307   <-- not a valid float
> 1024 inf
> FE_UNDERFLOW: x = inf after 1024 iterations
>
> Clang is broken with or without #pragma FENV_ACCESS "on".

Well, it simply doesn't support that #pragma [1], just like gcc [2]. :-(

Apparently compiler writers have trouble with this pragma, don't
implement it, and assume that it's always off. Which then appears to
make most (or all) fenv.h functions into undefined behavior.

That said, making 'x' in your test case volatile helps, e.g. the main
loop was:

        fadd    %st(0), %st(0)
        fstl    -20(%ebp)
        incl    %esi
        movl    %esi, 4(%esp)
        fstpl   8(%esp)
        movl    $.L.str, (%esp)
        calll   printf
        fnstsw  -10(%ebp)

and becomes:

        flds    -16(%ebp)
        fadd    %st(0), %st(0)
        fstps   -16(%ebp)
        incl    %esi
        flds    -16(%ebp)
        fstpl   8(%esp)
        movl    %esi, 4(%esp)
        movl    $.L.str, (%esp)
        calll   printf
        #APP
        fnstsw  -10(%ebp)

So the fstps causes an overflow when 128 iterations are reached:

[...]
126 8.507059e+37
127 1.701412e+38
128 inf
FE_UNDERFLOW: x = inf after 128 iterations

Maybe this is a usable workaround for libm.

-Dimitry

[1] https://llvm.org/bugs/show_bug.cgi?id=8100
[2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=34678
Re: clang gets numerical underflow wrong, please fix.
On Mon, Mar 14, 2016 at 01:02:20AM +0100, Dimitry Andric wrote:
>
> $ gcc -O overflow-iter.c -o overflow-iter-gcc -lm
> $ ./overflow-iter-gcc
> FE_OVERFLOW: x = inf after 1024 iterations
> $ gcc -O2 overflow-iter.c -o overflow-iter-gcc -lm
> $ ./overflow-iter-gcc
> FE_OVERFLOW: x = inf after 16384 iterations
>

Change the program to

#include <fenv.h>
#include <stdio.h>

int
main(void)
{
  int i;
  float x = 1.f;
  i = 0;
  feclearexcept(FE_ALL_EXCEPT);
  do {
     x *= 2;
     i++;
     printf("%d %e\n", i, x);
  } while(!fetestexcept(FE_OVERFLOW));
  if (fetestexcept(FE_OVERFLOW)) printf("FE_UNDERFLOW: ");
  printf("x = %e after %d iterations\n", x, i);

  return 0;
}

You'll get a bunch of invalid output before the OVERFLOW.

% cc -O -o z b.c -lm && ./z | tail
1016 7.022239e+305   <-- not a valid float
1017 1.404448e+306   <-- not a valid float
1018 2.808896e+306   <-- not a valid float
1019 5.617791e+306   <-- not a valid float
1020 1.123558e+307   <-- not a valid float
1021 2.247116e+307   <-- not a valid float
1022 4.494233e+307   <-- not a valid float
1023 8.988466e+307   <-- not a valid float
1024 inf
FE_UNDERFLOW: x = inf after 1024 iterations

Clang is broken with or without #pragma FENV_ACCESS "on".

--
Steve
Re: clang gets numerical underflow wrong, please fix.
On Mon, Mar 14, 2016 at 01:02:20AM +0100, Dimitry Andric wrote:
> On 13 Mar 2016, at 21:10, Steve Kargl wrote:
> > Thanks for the quick reply.  But, it must be using an 80-bit
> > extended double instead of a double for storage.  This variation
> >
> > #include <fenv.h>
> > #include <stdio.h>
> >
> > int
> > main(void)
> > {
> >   int i;
> >   // float x = 1.f;
> >   double x = 1.;
> >   i = 0;
> >   feclearexcept(FE_ALL_EXCEPT);
> >   do {
> >      x /= 2;
> >      i++;
> >   } while(!fetestexcept(FE_UNDERFLOW));
> >   if (fetestexcept(FE_UNDERFLOW)) printf("FE_UNDERFLOW: ");
> >   printf("x = %e after %d iterations\n", x, i);
> >
> >   return 0;
> > }
> >
> > yields
> >
> > % cc -O -o z b.c -lm && ./z
> > FE_UNDERFLOW: x = 0.00e+00 after 16435 iterations
> >
> > It should be 1075 iterations.
> >
> > Note, there is a similar issue with OVERFLOW.  The upshot is
> > that clang on current is probably miscompiling libm.
>
> With this example, I also get different results from gcc (4.8.5),
> depending on the optimization level:
>
> $ gcc -O underflow-iter.c -o underflow-iter-gcc -lm
> $ ./underflow-iter-gcc
> FE_UNDERFLOW: x = 0.00e+00 after 1075 iterations
> $ gcc -O2 underflow-iter.c -o underflow-iter-gcc -lm
> $ ./underflow-iter-gcc
> FE_UNDERFLOW: x = 0.00e+00 after 16435 iterations
>
> Similar for the overflow case:
>
> $ gcc -O overflow-iter.c -o overflow-iter-gcc -lm
> $ ./overflow-iter-gcc
> FE_OVERFLOW: x = inf after 1024 iterations
> $ gcc -O2 overflow-iter.c -o overflow-iter-gcc -lm
> $ ./overflow-iter-gcc
> FE_OVERFLOW: x = inf after 16384 iterations
>
> Are we depending on some sort of subtle undefined behavior here?

I don't know. From n1256.pdf, 6.5.5, I can find

   The result of the binary * operator is the product of the operands.

I can't find what happens when one operand is DBL_MAX and the other
is greater than 1. The result is clearly an overflow condition.
Annex F is normative text, which defers to IEC 60559. F.3 states

   -- The +, -, *, and / operators provide the IEC 60559 add,
      subtract, multiply, and divide operations.

Annex F contains a lot of text about "#pragma STDC FENV_ACCESS ON",
but of course neither gcc nor clang implements this pragma. In
particular, in F.8.1 one has

   Floating-point arithmetic operations ... may entail side effects
   which optimization shall honor, at least where the state of the
   FENV_ACCESS pragma is ``on''. The flags ... in the floating-point
   environment may be regarded as global variables; floating-point
   operations (+, *, etc.) implicitly ... write the flags.

However, F.7.1 has

   F.7.1 Environment management

   IEC 60559 requires that floating-point operations implicitly raise
   floating-point exception status flags, ... When the state for the
   FENV_ACCESS pragma (defined in <fenv.h>) is ``on'', these changes
   to the floating-point state are treated as side effects which
   respect sequence points.313)

   313) If the state for the FENV_ACCESS pragma is ``off'', the
        implementation is free to assume the floating-point control
        modes will be the default ones and the floating-point status
        flags will not be tested, which allows certain optimizations
        (see F.8).

So, I'm guessing clang/llvm developers are going to claim that the
lack of implementation of the FENV_ACCESS pragma means "off".

So, clang is unsuitable for real floating-point development.

--
Steve
Re: clang gets numerical underflow wrong, please fix.
On 13 Mar 2016, at 21:10, Steve Kargl wrote:
> On Sun, Mar 13, 2016 at 09:03:57PM +0100, Dimitry Andric wrote:
...
>> So it's storing the intermediate result in a double, for some reason.
>> The fnstsw will then result in zero, since there was no underflow at
>> that point.
>>
>> I will submit a bug for this upstream, thanks for the report.

Submitted upstream as: https://llvm.org/bugs/show_bug.cgi?id=26931

> Thanks for the quick reply.  But, it must be using an 80-bit
> extended double instead of a double for storage.  This variation
>
> #include <fenv.h>
> #include <stdio.h>
>
> int
> main(void)
> {
>   int i;
>   // float x = 1.f;
>   double x = 1.;
>   i = 0;
>   feclearexcept(FE_ALL_EXCEPT);
>   do {
>      x /= 2;
>      i++;
>   } while(!fetestexcept(FE_UNDERFLOW));
>   if (fetestexcept(FE_UNDERFLOW)) printf("FE_UNDERFLOW: ");
>   printf("x = %e after %d iterations\n", x, i);
>
>   return 0;
> }
>
> yields
>
> % cc -O -o z b.c -lm && ./z
> FE_UNDERFLOW: x = 0.00e+00 after 16435 iterations
>
> It should be 1075 iterations.
>
> Note, there is a similar issue with OVERFLOW.  The upshot is
> that clang on current is probably miscompiling libm.

With this example, I also get different results from gcc (4.8.5),
depending on the optimization level:

$ gcc -O underflow-iter.c -o underflow-iter-gcc -lm
$ ./underflow-iter-gcc
FE_UNDERFLOW: x = 0.00e+00 after 1075 iterations
$ gcc -O2 underflow-iter.c -o underflow-iter-gcc -lm
$ ./underflow-iter-gcc
FE_UNDERFLOW: x = 0.00e+00 after 16435 iterations

Similar for the overflow case:

$ gcc -O overflow-iter.c -o overflow-iter-gcc -lm
$ ./overflow-iter-gcc
FE_OVERFLOW: x = inf after 1024 iterations
$ gcc -O2 overflow-iter.c -o overflow-iter-gcc -lm
$ ./overflow-iter-gcc
FE_OVERFLOW: x = inf after 16384 iterations

Are we depending on some sort of subtle undefined behavior here?

With -O, the 'main loop' becomes:

.L3:
        fld1
        fstpl   24(%esp)
        movl    $0, %ebx
.L8:
        fldl    24(%esp)
        fld     %st(0)
        faddp   %st, %st(1)
        fstpl   24(%esp)
        addl    $1, %ebx
        fnstsw  %ax
        movl    %eax, %esi
        movl    __has_sse, %eax
        testl   %eax, %eax
        je      .L4
        cmpl    $2, %eax
        jne     .L5
        call    __test_sse
        testl   %eax, %eax
        je      .L5
.L4:
        stmxcsr 44(%esp)
        jmp     .L6
.L5:
        movl    $0, 44(%esp)
.L6:
        orl     44(%esp), %esi
        testl   $8, %esi
        je      .L8

With -O2, it becomes:

.L3:
        fld1
        xorl    %ebx, %ebx
.L12:
        fadd    %st(0), %st
        addl    $1, %ebx
        fnstsw  %ax
        testl   %edx, %edx
        movl    %eax, %esi
        je      .L10
        cmpl    $2, %edx
        je      .L27
.L9:
        xorl    %eax, %eax
.L8:
        orl     %eax, %esi
        andl    $8, %esi
        je      .L12

So it switches from using faddp and fstpl to direct fadd of %st(0) and
%st. I assume that uses the internal 80 bit precision? Gcc also
manages to move the __has_sse stuff out to further down in the
function, but it does not really affect the result.

-Dimitry
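Whether intermediate results may be kept wider than their declared type is what the C99 macro FLT_EVAL_METHOD in <float.h> is meant to report. The sketch below maps its value to the meanings listed in C99 5.2.4.2.2. The helper names are made up for this example, and the macro only states what the implementation claims, so it does not by itself show what a given compiler does with the x87 register stack.

```c
#include <float.h>
#include <stdio.h>

/* Hypothetical helper: describe a FLT_EVAL_METHOD value (C99 5.2.4.2.2). */
const char *
eval_method_description(int method)
{
	switch (method) {
	case 0:
		return "range and precision of the type";
	case 1:
		return "float and double evaluated as double";
	case 2:
		return "everything evaluated as long double";
	case -1:
		return "indeterminable";
	default:
		return "implementation-defined";
	}
}

/* Print the evaluation method claimed for this translation unit. */
void
print_eval_method(void)
{
	printf("FLT_EVAL_METHOD = %d: %s\n", (int)FLT_EVAL_METHOD,
	    eval_method_description((int)FLT_EVAL_METHOD));
}
```

A value of 2 would be consistent with results staying in 80-bit registers until they are stored, as in the -O2 loop above.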
Re: clang gets numerical underflow wrong, please fix.
On Sun, Mar 13, 2016 at 09:03:57PM +0100, Dimitry Andric wrote:
> On 13 Mar 2016, at 19:25, Steve Kargl wrote:
> >
> > Consider this small piece of code:
> >
> > #include <fenv.h>
> > #include <stdio.h>
> >
> > float
> > foo()
> > {
> >   static const volatile float tiny = 1.e-30f;
> >   return (tiny * tiny);
> > }
> >
> > int
> > main(void)
> > {
> >   float x;
> >   feclearexcept(FE_ALL_EXCEPT);
> >   x = foo();
> >   if (fetestexcept(FE_UNDERFLOW)) printf("FE_UNDERFLOW: ");
> >   printf("x = %e\n", x);
> >   return 0;
> > }
> >
> > clang seems to get the underflow condition wrong.
> >
> > % cc -o z a.c -lm && ./z
> > FE_UNDERFLOW: x = 0.00e+00
> >
> > % cc -O -o z a.c -lm && ./z
> > x = 1.00e-60       <--- This is not a possible value!
> >
> > % gcc -o z a.c -lm && ./z
> > FE_UNDERFLOW: x = 0.00e+00
> >
> > % gcc -O -o z a.c -lm && ./z
> > FE_UNDERFLOW: x = 0.00e+00
>
> Hmm, this is an interesting one.  On amd64, it works as expected with
> clang, but there it always uses SSE, obviously:
>
> $ ./underflow-amd64
> FE_UNDERFLOW: x = 0.00e+00
>
> The problem seems to be caused by the intermediate result being stored
> using fstpl instead of fstps, e.g. simplifying the sample program (to
> get rid of all the SSE stuff the fexxx() macros insert):
>
> int main(void)
> {
>   float x;
>   __uint16_t status;
>   __fnclex();
>   x = foo();
>   __fnstsw(&status);
>   printf("status: %#x\n", (unsigned)status);
>   printf("x = %e\n", x);
>   return 0;
> }
>
> With gcc, the assembly becomes:
>
> foo:
>         flds    tiny.1853
>         flds    tiny.1853
>         fmulp   %st, %st(1)
>         ret
> [...]
> main:
> [...]
>         fnclex
>         call    foo
>         fstps   12(%esp)
>         fnstsw  %ax
>
> In this case, fmulp does not generate an underflow, but the fstps will.
> With clang, the assembly becomes:
>
> foo:
>         flds    foo.tiny
>         fmuls   foo.tiny
>         retl
> [...]
> main:
>         subl    $24, %esp
>         fnclex
>         calll   foo
>         fstpl   12(%esp)        # 8-byte Folded Spill
>         fnstsw  22(%esp)
>
> So it's storing the intermediate result in a double, for some reason.
> The fnstsw will then result in zero, since there was no underflow at
> that point.
>
> I will submit a bug for this upstream, thanks for the report.
>

Thanks for the quick reply.  But, it must be using an 80-bit
extended double instead of a double for storage.  This variation

#include <fenv.h>
#include <stdio.h>

int
main(void)
{
  int i;
  // float x = 1.f;
  double x = 1.;
  i = 0;
  feclearexcept(FE_ALL_EXCEPT);
  do {
     x /= 2;
     i++;
  } while(!fetestexcept(FE_UNDERFLOW));
  if (fetestexcept(FE_UNDERFLOW)) printf("FE_UNDERFLOW: ");
  printf("x = %e after %d iterations\n", x, i);

  return 0;
}

yields

% cc -O -o z b.c -lm && ./z
FE_UNDERFLOW: x = 0.00e+00 after 16435 iterations

It should be 1075 iterations.

Note, there is a similar issue with OVERFLOW.  The upshot is
that clang on current is probably miscompiling libm.

--
Steve
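The expected count of 1075 can be checked from the <float.h> parameters. Halving 1.0 stays exact down to the smallest subnormal, 2^(MIN_EXP - MANT_DIG), and the next halving is the first result that is both tiny and inexact. The helpers below are hypothetical and assume a binary IEEE 754 style format with gradual underflow and round-to-nearest. Note that the observed 16435 equals 16382 + 53, which would be consistent with the x87 extended exponent range combined with 53-bit precision control. That reading is an inference from the numbers, not something stated in the thread.

```c
#include <float.h>

/*
 * Hypothetical helper: number of times 1.0 can be halved before the
 * result is first tiny and inexact (FE_UNDERFLOW), for a binary format
 * described by <float.h>-style min_exp and mant_dig values.
 * The smallest subnormal is 2^(min_exp - mant_dig), so one more
 * halving than (mant_dig - min_exp) is needed.
 */
int
expected_halvings(int min_exp, int mant_dig)
{
	return mant_dig - min_exp + 1;
}

/*
 * Hypothetical helper: number of times 1.0 can be doubled before the
 * result overflows. The largest finite value is just below 2^max_exp.
 */
int
expected_doublings(int max_exp)
{
	return max_exp;
}
```

With DBL_MIN_EXP and DBL_MANT_DIG this gives 1075, and with FLT_MAX_EXP and DBL_MAX_EXP the doubling counts are 128 and 1024, the same figures that appear in the runs quoted in this thread.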
clang gets numerical underflow wrong, please fix.
Consider this small piece of code:

#include <fenv.h>
#include <stdio.h>

float
foo()
{
  static const volatile float tiny = 1.e-30f;
  return (tiny * tiny);
}

int
main(void)
{
  float x;
  feclearexcept(FE_ALL_EXCEPT);
  x = foo();
  if (fetestexcept(FE_UNDERFLOW)) printf("FE_UNDERFLOW: ");
  printf("x = %e\n", x);
  return 0;
}

clang seems to get the underflow condition wrong.

% cc -o z a.c -lm && ./z
FE_UNDERFLOW: x = 0.00e+00

% cc -O -o z a.c -lm && ./z
x = 1.00e-60       <--- This is not a possible value!

% gcc -o z a.c -lm && ./z
FE_UNDERFLOW: x = 0.00e+00

% gcc -O -o z a.c -lm && ./z
FE_UNDERFLOW: x = 0.00e+00

% uname -a
FreeBSD laptop-kargl 11.0-CURRENT FreeBSD 11.0-CURRENT #1 r296724: Sun Mar 13 09:12:38 PDT 2016

% cc --version
FreeBSD clang version 3.8.0 (tags/RELEASE_380/final 262564) (based on LLVM 3.8.0)

% gcc --version
gcc (FreeBSD Ports Collection) 4.8.5

--
Steve
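To make the "not a possible value" remark concrete: for an IEEE 754 single-precision float, the smallest positive subnormal is 2^-149 (about 1.4e-45), so a magnitude of 1e-60 lies outside the range of nonzero float values. The sketch below checks only the range, using <float.h> constants. The helper name in_float_range is made up for this example, and it does not test whether a value is exactly representable.

```c
#include <float.h>

/*
 * Hypothetical helper: nonzero if v is zero or if its magnitude lies
 * between the smallest positive subnormal float and FLT_MAX.
 * This is a range check only, not an exact-representability check.
 */
int
in_float_range(double v)
{
	/* FLT_MIN * FLT_EPSILON is 2^-126 * 2^-23 = 2^-149 for IEEE single. */
	const double smallest = (double)FLT_MIN * (double)FLT_EPSILON;
	double mag = v < 0 ? -v : v;

	return mag == 0 || (mag >= smallest && mag <= (double)FLT_MAX);
}
```

Under that assumption, in_float_range(1e-60) is 0 while in_float_range(1e-30) is 1, so the optimized clang output above can only have come from a value held in a wider format.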