On Thu, Jun 24, 2021 at 1:07 PM Richard Biener <rguent...@suse.de> wrote:

> This adds SLP pattern recognition for the SSE3/AVX [v]addsubp{ds} v0, v1
> instructions, which compute { v0[0] - v1[0], v0[1] + v1[1], ... },
> i.e. subtract and add alternate across lanes, starting with subtract.
>
> It adds a corresponding optab and direct internal function,
> vec_addsub$a3 and renames the existing i386 backend patterns to
> the new canonical name.
>
> The SLP pattern matches the exact alternating lane sequence rather
> than trying to be clever and anticipating incoming permutes - we
> could permute the two input vectors to the needed lane alternation,
> do the addsub and then permute the result vector back but that's
> only profitable in case the two input or the output permute will
> vanish - something Tamar's refactoring of SLP pattern recog should
> make possible.

Using the attached patch, I was also able to generate addsub for the
following testcase:

float x[2], y[2], z[2];

void foo ()
{
  x[0] = y[0] - z[0];
  x[1] = y[1] + z[1];
}

       vmovq   y(%rip), %xmm0
       vmovq   z(%rip), %xmm1
       vaddsubps       %xmm1, %xmm0, %xmm0
       vmovlps %xmm0, x(%rip)
       ret
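
For reference, the lane semantics that the testcase exercises can be
written as a scalar model. This is only a sketch for checking results
by hand; the name addsub_ref is hypothetical and is not part of the
patch or of GCC:

```c
#include <stddef.h>

/* Scalar reference model of the addsub lane pattern (hypothetical
   helper, not part of the patch): even lanes compute a - b, odd
   lanes compute a + b, so lane 0 starts with a subtract.  */
void
addsub_ref (float *dst, const float *a, const float *b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = (i & 1) ? a[i] + b[i] : a[i] - b[i];
}
```

With n = 2 this matches foo () above: x[0] = y[0] - z[0] and
x[1] = y[1] + z[1].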

Uros.
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index e887f03474d..5f10572718d 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -788,6 +788,24 @@ (define_insn "*mmx_haddsubv2sf3"
    (set_attr "prefix_extra" "1")
    (set_attr "mode" "V2SF")])
 
+(define_insn "vec_addsubv2sf3"
+  [(set (match_operand:V2SF 0 "register_operand" "=x,x")
+       (vec_merge:V2SF
+         (minus:V2SF
+           (match_operand:V2SF 1 "register_operand" "0,x")
+           (match_operand:V2SF 2 "register_operand" "x,x"))
+         (plus:V2SF (match_dup 1) (match_dup 2))
+         (const_int 1)))]
+  "TARGET_SSE3 && TARGET_MMX_WITH_SSE"
+  "@
+   addsubps\t{%2, %0|%0, %2}
+   vaddsubps\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "sseadd")
+   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix_rep" "1,*")
+   (set_attr "mode" "V4SF")])
+
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ;;
 ;; Parallel single-precision floating point comparisons
