https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89071

H.J. Lu <hjl.tools at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Depends on|                            |87007

--- Comment #1 from H.J. Lu <hjl.tools at gmail dot com> ---
vcvtsd2ss %xmm1, %xmm1, %xmm0

is faster than

vcvtsd2ss %xmm1, %xmm0, %xmm0

But

vxorps %xmm0, %xmm0, %xmm0
vcvtsd2ss %xmm1, %xmm0, %xmm0

are faster than both.  I have a patch for PR 87007:

https://gcc.gnu.org/ml/gcc-patches/2019-01/msg00298.html

which inserts a vxorps at the last possible position.  vxorps
will be executed only once in a function.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87007
[Bug 87007] [8/9 Regression] 10% slowdown with -march=skylake-avx512
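
For readers who want to look at the instruction discussed in the comment, here is a minimal C sketch of source code that narrows a double to a float, which is the operation vcvtsd2ss performs. The function name narrow_to_float is hypothetical and is not taken from the bug report or the patch. The exact registers and whether a vxorps is emitted depend on the GCC version and flags (for example -O2 -march=skylake-avx512), so the output should be inspected with -S rather than assumed. The middle operand in the AT&T syntax above supplies the upper bits of the result, which is why the choice of that register creates or avoids a dependency on its previous value.

```c
/* Hypothetical example, not from the bug report: a double-to-float
   narrowing conversion.  With AVX enabled, a compiler would typically
   implement the cast with a scalar vcvtsd2ss; check with -S. */
float narrow_to_float(double x)
{
    /* The cast is the only operation; it rounds x to single precision. */
    return (float)x;
}
```

Compiling this with and without the PR 87007 patch applied would be one way to compare the generated sequences, assuming the patch affects this case.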