http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60418

--- Comment #10 from H.J. Lu <hjl.tools at gmail dot com> ---
The sources have many FP loops that contain code like:

rsq11             = dx11*dx11+dy11*dy11+dz11*dz11;

When they are compiled with

-O3 -funroll-loops -ffast-math -fwhole-program -flto=jobserver
-fuse-linker-plugin

LTO input IRs contain statements like

  powmult_241 = dy11_71 * dy11_71;
  powmult_240 = dz11_72 * dz11_72;
  _699 = powmult_240 + powmult_80;
  rsq11_77 = _699 + powmult_241;

During the final LTO link, lto1 repeatedly removes a loop preheader
in one pass and adds it back in the next pass.  Each removal/addition
changes the statements to

  powmult_213 = dy11_71 * dy11_71;
  _75 = powmult_213 + powmult_80;
  powmult_244 = dz11_72 * dz11_72;
  rsq11_77 = _75 + powmult_244;

Each such reordering may change the FP result slightly, because FP
addition is not associative.  These differences can accumulate to
such a degree that the end result is outside of tolerance.
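
A minimal sketch of the effect, not taken from the failing sources: the two functions below add three already-squared terms in the two orders shown above, assuming powmult_80 is dx11*dx11.  The function names and the input values are hypothetical, chosen only so that the rounding is easy to follow in IEEE double arithmetic with round-to-nearest-even.  It should be built without -ffast-math so that the compiler keeps each order as written.

```c
#include <float.h>

/* Hypothetical helper mirroring the first statement order:
   (dz11*dz11 + dx11*dx11) + dy11*dy11.
   The volatile temporary forces the partial sum to be rounded to
   double and keeps the compiler from merging the two additions. */
double rsq_order_a(double dx2, double dy2, double dz2)
{
    volatile double partial = dz2 + dx2;
    return partial + dy2;
}

/* Hypothetical helper mirroring the second statement order:
   (dy11*dy11 + dx11*dx11) + dz11*dz11. */
double rsq_order_b(double dx2, double dy2, double dz2)
{
    volatile double partial = dy2 + dx2;
    return partial + dz2;
}
```

With dx2 = 1.0, dy2 = DBL_EPSILON / 2 and dz2 = DBL_EPSILON, rsq_order_a returns 1.0 + 2 * DBL_EPSILON while rsq_order_b returns 1.0 + DBL_EPSILON: the same three terms differ by one unit in the last place depending only on the order of addition.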
