https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95115
Jim Wilson <wilson at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |wilson at gcc dot gnu.org

--- Comment #3 from Jim Wilson <wilson at gcc dot gnu.org> ---
Marc Glisse's testcase fails even with old gcc versions. My x86_64 Ubuntu 16.04 gcc-5.4.0 also removes the divide with -O. My Ubuntu 18.04 gcc-7.5.0 gives the same result. This seems to be simple constant folding that we have always done. The assumption here seems to be that if the user is dividing constants, then we don't need to worry about setting exception bits. If I write (4.0 / 3.0), for instance, the compiler just folds it and doesn't worry about setting the inexact bit.

Aurelien Jarno's testcase in the attachment is more interesting, as that works with older gcc versions, just not gcc-10. I did a bisect, and tracked this down to Richard Biener's patch for pr83518. It looks like the glibc code was obfuscated a bit to try to avoid the usual trivial constant folding, and the patch for pr83518 just made gcc smart enough to recognize that constants are involved, and then optimize this case the same way we have always optimized FP constant divides.

Newlib incidentally uses (x-x)/(x-x) where x is the input value, so there are no constants involved, and the divide does not get optimized away. This still works with gcc-10. The result is a subtract followed by a divide.

At first glance, this looks more like a glibc problem to me than a gcc problem. But maybe the fact that constants were written to memory and then read back in should prevent the usual trivial FP constant divide folding.

I can almost make the glibc testcase work if I mark the unions as volatile. That prevents the union reads and writes from being optimized away, but the divide gets moved after the fetestexcept call. That looks like a gcc bug, though I think it is a different problem than this pr. The 234t.optimized dump is correct. The 236r.expand dump is wrong.
This happens for both x86_64 and RISC-V. The resulting code is bigger than what the newlib trick generates though.