https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95115

Jim Wilson <wilson at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |wilson at gcc dot gnu.org

--- Comment #3 from Jim Wilson <wilson at gcc dot gnu.org> ---
Marc Glisse's testcase fails even with old gcc versions.  My x86_64 Ubuntu
16.04 gcc-5.4.0 also removes the divide with -O.  My Ubuntu 18.04 gcc-7.5.0
gives the same result.  This seems to be simple constant folding that we have
always done.  The assumption here seems to be that if the user is dividing
constants, then we don't need to worry about setting exception bits.  If I
write (4.0 / 3.0), for instance, the compiler just folds it and doesn't worry
about setting the inexact bit.
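
A minimal sketch of that trivial folding, assuming a typical glibc target
(compile with -O, link with -lm); the divide is folded at compile time, so
FE_INEXACT is never raised at run time:

#include <fenv.h>
#include <stdio.h>

int
main (void)
{
  feclearexcept (FE_ALL_EXCEPT);
  double r = 4.0 / 3.0;        /* folded to a constant, no runtime divide */
  printf ("r = %g, inexact raised = %d\n", r,
          fetestexcept (FE_INEXACT) != 0);
  return 0;
}

This prints inexact raised = 0, where an actual runtime divide of 4.0 by
3.0 would have set FE_INEXACT.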

Aurelien Jarno's testcase in the attachment is more interesting, as that works
with older gcc versions, just not gcc-10.  I did a bisect, and tracked this
down to Richard Biener's patch for pr83518.  It looks like the glibc code
was obfuscated a bit to try to avoid the usual trivial constant folding, and
the patch for pr83518 just made gcc smart enough to recognize that constants
are involved, and then optimize this case the same way we have always optimized
FP constant divides.
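
Roughly the shape of that obfuscation, as a hedged sketch rather than the
attachment's exact code (names here are illustrative): the constants are
stored into unions and read back through memory, which used to hide them
from folding:

static double
div_via_memory (void)
{
  union { double d; unsigned long long u; } num, den;
  num.d = 1.0;
  den.d = 3.0;
  /* Before the pr83518 patch, gcc did not track the values through the
     stores and loads, so a real divide was emitted.  gcc-10 now sees
     that both operands are constants and folds the divide away.  */
  return num.d / den.d;
}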

Newlib incidentally uses (x-x)/(x-x) where x is the input value, so there are
no constants involved, and the divide does not get optimized away.  This still
works with gcc-10.  The result is a subtract followed by a divide.
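
For comparison, the newlib trick sketched with an illustrative function
name; for finite x this evaluates 0.0/0.0 at run time, which raises
FE_INVALID:

double
newlib_raise_invalid (double x)
{
  /* x is not a constant, so nothing can be folded: the compiler emits
     a subtract followed by a divide, and 0.0/0.0 sets FE_INVALID.  */
  return (x - x) / (x - x);
}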

At first glance, this looks more like a glibc problem to me than a gcc problem.
 But maybe the fact that constants were written to memory and then read back in
should prevent the usual trivial FP constant divide folding.

I can almost make the glibc testcase work if I mark the unions as volatile. 
That prevents the union reads and writes from being optimized away, but the
divide gets moved after the fetestexcept call.  That looks like a gcc bug,
though I think it is a different problem than this pr.  The 234t.optimized dump is
correct.  The 236r.expand dump is wrong.  This happens for both x86_64 and
RISC-V.  The resulting code is bigger than what the newlib trick generates
though.
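
The volatile variant, again as a sketch with illustrative names rather than
the attachment's exact code; the union accesses now survive, but the divide
still gets scheduled past the fetestexcept call at expand time:

#include <fenv.h>

static int
inexact_from_div (void)
{
  /* volatile keeps the stores and loads from being optimized away.  */
  volatile union { double d; } num = { .d = 1.0 }, den = { .d = 3.0 };
  feclearexcept (FE_ALL_EXCEPT);
  double r = num.d / den.d;
  (void) r;
  /* The buggy expand moves the divide after this call, so it reports
     no exception even though the divide is inexact.  */
  return fetestexcept (FE_INEXACT) != 0;
}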
