https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89071
--- Comment #10 from Peter Cordes <peter at cordes dot ca> ---
(In reply to Uroš Bizjak from comment #9)
> There was similar patch for sqrt [1], I think that the approach is
> straightforward, and could be applied to other reg->reg scalar insns as
> well, independently of PR87007 patch.
>
> [1] https://gcc.gnu.org/ml/gcc-patches/2018-05/msg00202.html

Yeah, that looks good.  So I think it's just vcvtss2sd and sd2ss, and
VROUNDSS/SD that aren't done yet.

That patch covers VSQRTSS/SD, VRCPSS, and VRSQRTSS.

It also bizarrely uses it for VMOVSS, which gcc should only emit if it
actually wants to merge (right?).  *If* this part of the patch isn't a bug

-       return "vmovss\t{%1, %0, %0|%0, %0, %1}";
+       return "vmovss\t{%d1, %0|%0, %d1}";

then even better would be vmovaps %1, %0 (which can benefit from
mov-elimination, and doesn't need a port-5-only ALU uop.)  Same for vmovsd
of course.
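
For anyone who wants to look at the two conversions that aren't covered yet, here is a minimal C sketch. The function names are made up for illustration and are not from the patch or this PR. Built with something like -O2 -mavx, these should compile to reg->reg vcvtss2sd and vcvtsd2ss, but the exact asm (and which register ends up as the merge source) depends on the gcc version and tuning, so check the output rather than trusting the comments.

```c
/* Hypothetical test functions, not taken from the patch or the PR. */

/* float -> double: with AVX enabled this is typically a reg->reg
   vcvtss2sd, one of the insns the comment lists as not done yet. */
double widen(float x)
{
    return x;
}

/* double -> float: typically a reg->reg vcvtsd2ss, the other
   conversion the comment lists as not done yet. */
float narrow(double x)
{
    return (float)x;
}

/* The float input stays live after the conversion, so the compiler
   may have to convert into a register other than the source.  That
   is the case where the choice of merge operand matters. */
double widen_keep(float x, float *copy)
{
    double d = x;
    *copy = x + 1.0f;
    return d;
}
```

Compiling with -S and comparing the operands of the conversion insns before and after a patch is probably the quickest way to see whether the merge source changed.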