https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78954
--- Comment #2 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
That depends on which CPU you tune for.  E.g. with -mtune=intel or
-mtune=core2 etc. you get what you are asking for; -mtune=generic takes into
account that the movd %edi, %xmm1 insn is very slow on some AMD CPUs, and
because moving through the stack isn't that much slower on Intel CPUs, it is a
compromise between those tunings.

/* X86_TUNE_INTER_UNIT_MOVES_TO_VEC: Enable moves in from integer
   to SSE registers.  If disabled, the moves will be done by storing
   the value to memory and reloading.  */
DEF_TUNE (X86_TUNE_INTER_UNIT_MOVES_TO_VEC, "inter_unit_moves_to_vec",
          ~(m_AMD_MULTIPLE | m_GENERIC))

/* X86_TUNE_INTER_UNIT_MOVES_FROM_VEC: Enable moves in from SSE
   to integer registers.  If disabled, the moves will be done by storing
   the value to memory and reloading.  */
DEF_TUNE (X86_TUNE_INTER_UNIT_MOVES_FROM_VEC, "inter_unit_moves_from_vec",
          ~m_ATHLON_K8)

where
#define m_AMD_MULTIPLE (m_ATHLON_K8 | m_AMDFAM10 | m_BDVER | m_BTVER \
                        | m_ZNVER1)
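To make the mask logic above concrete, here is a minimal sketch of how such a DEF_TUNE mask could be tested against the processor selected by -mtune. It assumes each processor corresponds to one bit and that a tuning is enabled when that bit is set in the mask. The enum values, the M() macro and the tune_enabled() helper are hypothetical and exist only for illustration; the real m_* macros in GCC cover more processor models and use different bit positions.

```c
#include <stdint.h>

/* Hypothetical processor indices, for illustration only.  */
enum processor
{
  PROC_GENERIC,
  PROC_CORE2,
  PROC_INTEL,
  PROC_ATHLON_K8,
  PROC_AMDFAM10,
  PROC_BDVER,
  PROC_BTVER,
  PROC_ZNVER1
};

/* One bit per processor (assumed encoding).  */
#define M(p) (UINT64_C (1) << (p))

#define m_GENERIC   M (PROC_GENERIC)
#define m_ATHLON_K8 M (PROC_ATHLON_K8)
#define m_AMDFAM10  M (PROC_AMDFAM10)
#define m_BDVER     M (PROC_BDVER)
#define m_BTVER     M (PROC_BTVER)
#define m_ZNVER1    M (PROC_ZNVER1)

/* Same composition as quoted in the comment above.  */
#define m_AMD_MULTIPLE (m_ATHLON_K8 | m_AMDFAM10 | m_BDVER | m_BTVER \
                        | m_ZNVER1)

/* The two masks from the quoted DEF_TUNE entries.  */
static const uint64_t inter_unit_moves_to_vec = ~(m_AMD_MULTIPLE | m_GENERIC);
static const uint64_t inter_unit_moves_from_vec = ~m_ATHLON_K8;

/* Return 1 if the tuning described by FEATURE_MASK is enabled
   for the processor TUNE, 0 otherwise.  */
int
tune_enabled (uint64_t feature_mask, enum processor tune)
{
  return (feature_mask & M (tune)) != 0;
}
```

Under these assumptions, tune_enabled (inter_unit_moves_to_vec, PROC_GENERIC) is 0, so the integer to SSE move goes through memory, while the same call with PROC_INTEL or PROC_CORE2 is 1 and the direct movd is allowed. For inter_unit_moves_from_vec only PROC_ATHLON_K8 yields 0.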