https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68961

--- Comment #11 from Richard Biener <rguenth at gcc dot gnu.org> ---
Note that the fix depends on a "bogus" cost for the vector construction on
x86_64.
Currently it is costed as two stmts (nunits / 2 + 1), but the vector can be
constructed by a single unpcklpd stmt, so the correct cost is nunits - 1.
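
To make the arithmetic concrete, here is a minimal sketch of the two cost
formulas quoted above. The function names are hypothetical and are not the
names used in the x86 backend; only the formulas come from this comment.

```c
/* Hypothetical helpers illustrating the two formulas from the comment;
   these are not GCC functions.  */

/* Cost currently charged for constructing a vector of NUNITS elements.  */
static int
vec_construct_cost_current (int nunits)
{
  return nunits / 2 + 1;
}

/* Cost the comment argues is correct: one insn per element after the
   first.  */
static int
vec_construct_cost_proposed (int nunits)
{
  return nunits - 1;
}
```

For a two-element double vector (nunits = 2) the first formula gives 2 and
the second gives 1, which matches the single unpcklpd.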

Similar to the PPC case, the main issue is that the exact overlap between the
incoming registers and the return value registers is hidden from the GIMPLE
IL:

pack (double a, double aa)
{
  struct x D.1756;

  <bb 2>:
  MEM[(struct x *)&D.1756] = a_2(D);
  MEM[(struct x *)&D.1756 + 8B] = aa_3(D);
  return D.1756;
}
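
For reference, C source along the following lines would plausibly lower to
the GIMPLE above. This is a reconstruction, not the testcase from the PR: the
field names and the local variable name are assumptions, and only the layout
(two doubles at offsets 0 and 8) is taken from the dump.

```c
/* Assumed layout: two doubles, at byte offsets 0 and 8.  */
struct x
{
  double a;
  double aa;
};

/* Both arguments are stored into a local struct that is then returned
   by value, matching the two MEM[] stores followed by the return.  */
struct x
pack (double a, double aa)
{
  struct x r;
  r.a = a;
  r.aa = aa;
  return r;
}
```

Under the x86_64 SysV calling convention the two arguments arrive in xmm0 and
xmm1 and a struct of two doubles is returned in the same pair, so ideally no
data movement is needed at all.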

Detecting the exact overlap is probably too hard, but it should at least be
possible to detect that we don't return in memory (so the store is not really
a store) and that we return in two different regs (and thus require two
vector extractions).
