https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68961
--- Comment #11 from Richard Biener <rguenth at gcc dot gnu.org> ---

Note that the fix depends on the "bogus" cost for the vector construction on
x86_64. Currently it is two stmts (nunits / 2 + 1), but the vector can be
constructed by a single unpcklpd stmt. The correct cost is nunits - 1, which
is one stmt for the two-element vector here.

Similar to the PPC case, the main issue is that the exact overlap between the
incoming registers and the return value registers is hidden from the GIMPLE
IL:

pack (double a, double aa)
{
  struct x D.1756;

  <bb 2>:
  MEM[(struct x *)&D.1756] = a_2(D);
  MEM[(struct x *)&D.1756 + 8B] = aa_3(D);
  return D.1756;

}

Detecting the exact overlap is probably too hard. But it should at least be
possible to detect that we don't return in memory, so the store is not really
a store, and that we return in two different regs, so two vector extractions
are required.
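
For readers who want to reproduce a dump of this shape, here is a minimal sketch of C source that could lower to the GIMPLE above. The definition of struct x is an assumption (the comment only shows two double stores at offsets 0 and 8), and the local name u is hypothetical; this is not necessarily the testcase attached to the bug.

```c
#include <assert.h>

/* Assumed layout: two doubles, 16 bytes in total.  The comment does not
   show the definition of struct x; only the stores at offsets 0 and 8.  */
struct x { double d[2]; };

/* Sanity check on the assumed layout: no padding between the two fields.  */
static_assert(sizeof(struct x) == 2 * sizeof(double),
              "struct x should be exactly two doubles");

/* Both arguments arrive in registers and the struct is small enough to be
   returned in registers as well, so the two stores below need not touch
   memory in the final code.  */
struct x pack(double a, double aa)
{
    struct x u;
    u.d[0] = a;   /* corresponds to the store at offset 0 */
    u.d[1] = aa;  /* corresponds to the store at offset 8 */
    return u;
}
```

Compiling this with something like gcc -O2 -fdump-tree-optimized should show whether the two scalar stores were kept or merged into a vector construction followed by a single vector store; the exact result depends on the GCC version and target.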