https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113365
Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |INVALID

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
At -O0, GCC produces a lot of loads/stores to the stack, and subnormals always
have a penalty in HW for x87. Note double uses SSE while long double uses x87,
so the effect will show up more there.

Note if we use -O2 and change the variable to be a volatile variable (otherwise
the loop is just optimized away), the speed difference is gone because the
addition of a subnormal, which is what causes the slowdown, no longer happens.

Anyway, as I mentioned, this is not a GCC bug but rather a HW limitation.
There are many other targets where subnormals are slow even for double too.