http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60418
--- Comment #10 from H.J. Lu <hjl.tools at gmail dot com> ---
The sources have many FP loops that contain code like:

  rsq11 = dx11*dx11+dy11*dy11+dz11*dz11;

When they are compiled with

  -O3 -funroll-loops -ffast-math -fwhole-program -flto=jobserver -fuse-linker-plugin

the LTO input IR contains statements like

  powmult_241 = dy11_71 * dy11_71;
  powmult_240 = dz11_72 * dz11_72;
  _699 = powmult_240 + powmult_80;
  rsq11_77 = _699 + powmult_241;

During the final LTO link, lto1 repeatedly removes a loop preheader in one pass and adds it back in the next pass. Each removal/addition changes the statements to

  powmult_213 = dy11_71 * dy11_71;
  _75 = powmult_213 + powmult_80;
  powmult_244 = dz11_72 * dz11_72;
  rsq11_77 = _75 + powmult_244;

Each such reordering may change the FP result slightly. The differences can accumulate to such a degree that the end result is outside of tolerance.
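
To illustrate why the two statement orders are not equivalent in floating point, here is a minimal sketch in C. It assumes that powmult_80 holds dx11*dx11 (the comment does not show its definition), and the function names rsq_order_a and rsq_order_b are hypothetical, made up for this example. The volatile temporaries are only there to stop the compiler from reassociating or contracting the operations itself.

```c
/* Order from the first IR dump: (dz*dz + dx*dx) + dy*dy.
 * Assumption: powmult_80 corresponds to dx*dx. */
double rsq_order_a(double dx, double dy, double dz)
{
    volatile double px = dx * dx;
    volatile double py = dy * dy;
    volatile double pz = dz * dz;
    volatile double t = pz + px;   /* _699 = powmult_240 + powmult_80 */
    volatile double r = t + py;    /* rsq11_77 = _699 + powmult_241  */
    return r;
}

/* Order from the second IR dump: (dy*dy + dx*dx) + dz*dz. */
double rsq_order_b(double dx, double dy, double dz)
{
    volatile double px = dx * dx;
    volatile double py = dy * dy;
    volatile double pz = dz * dz;
    volatile double t = py + px;   /* _75 = powmult_213 + powmult_80 */
    volatile double r = t + pz;    /* rsq11_77 = _75 + powmult_244  */
    return r;
}
```

With IEEE 754 doubles and round-to-nearest, calling both functions with dx = 1.5, dy = 1.5, dz = 134217728.0 (2^27) should give results that differ by 4.0: the first order rounds up twice, while the second adds the two small squares exactly before rounding once. A single difference of this kind is tiny relative to the value, but it shows how repeated reordering can move the result.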