On Wed, Apr 19, 2023 at 02:40:59AM +0000, Jiang, Haochen via Gcc-patches wrote:
> > >  (define_insn "aesenc"
> > > -  [(set (match_operand:V2DI 0 "register_operand" "=x,x")
> > > -       (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x")
> > > -                      (match_operand:V2DI 2 "vector_operand" "xBm,xm")]
> > > +  [(set (match_operand:V2DI 0 "register_operand" "=x,x,v")
> > > +       (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v")
> > > +                      (match_operand:V2DI 2 "vector_operand"
> > > + "xBm,xm,vm")]
> > >                       UNSPEC_AESENC))]
> > > -  "TARGET_AES"
> > > +  "TARGET_AES || (TARGET_VAES && TARGET_AVX512VL)"
> > >    "@
> > >     aesenc\t{%2, %0|%0, %2}
> > > +   vaesenc\t{%2, %1, %0|%0, %1, %2}
> > >     vaesenc\t{%2, %1, %0|%0, %1, %2}"
> > > -  [(set_attr "isa" "noavx,avx")
> > > +  [(set_attr "isa" "noavx,aes,avx512vl")
> > Shouldn't it be vaes_avx512vl and then remove " || (TARGET_VAES &&
> > TARGET_AVX512VL)" from condition.
> 
> Since VAES should not imply AES, we need that "|| (TARGET_VAES && 
> TARGET_AVX512VL)"
> 
> And there is no need to add vaes_avx512vl since the last alternative will only
> be hit when there is no aes. When there is no aes, the pattern will need vaes
> and avx512vl both or we could not use this pattern. avx512vl here is just like
> a placeholder.

As the following testcase shows, the above change was incorrect.

Using aes isa for the second alternative is obviously wrong, aes is enabled
whenever -maes is, regardless of -mavx or -mno-avx, so the above change
means that for -maes -mno-avx RA can choose, either it matches the first
alternative with the dup operand, or it matches the second one (but that
is of course wrong because vaesenc VEX encoded insn needs AES & AVX CPUID).

The big question is if "Since VAES should not imply AES" is the case or not.
Looking around at what LLVM does on godbolt, seems since clang 6 which added
-mvaes support -mvaes there implies -maes, but GCC treats those two
independent.

Now, if we'd take the LLVM path of making -mvaes imply -maes and -mno-aes
imply -mno-vaes, then we should probably just revert the above patch and
tweak common/config/i386/ to do the implications (+ add the testcase from
this patch).

If we keep the current behavior, where AES and VAES are completely
independent extensions, then we need to do more changes as the following
patch attempts to do.
We should use the aesenc etc. insns for noavx as before, we know at that
point that TARGET_AES must be true because (TARGET_VAES && TARGET_AVX512VL)
won't be true when !TARGET_AVX - TARGET_AVX512VL implies TARGET_AVX.
For the second alternative, i.e. the AVX AES VEX encoded case, the patch
uses aes_avx isa which requires both.  Now, for the third one we can't
use avx512vl isa attribute, because one could compile with
-maes -mavx512vl -mno-vaes and in that case we want VEX encoded vaesenc
which can't use %xmm16+ (nor EGPRs), so we need vaes_avx512vl isa to
ensure it is enabled only for -mvaes -mavx512vl.  And there is another
problem, with -mno-aes -mvaes -mavx512vl we could emit VEX encoded vaesenc
which requires AES and AVX ISAs rather than the VAES and AVX512VL which
are enabled.  So the patch uses the {evex} prefix for those cases.
And similarly for the vaes*_<mode> instructions, if they aren't 128-bit
or use %xmm16+ registers, the current case is fine, but if they are 128-bit
and use only %xmm0-15 registers, assembler would again emit VEX encoded insn
which needs AES & AVX CPUID, rather than the EVEX encoded ones which need
VAES & AVX512VL CPUIDs.
Still, I wonder if -mvaes shouldn't imply at least -mavx512f and
-mno-avx512f shouldn't imply -mno-vaes, because otherwise can't see how
it could use 512-bit registers (this part not done in the patch).

The following patch has been successfully bootstrapped/regtested on
x86_64-linux and i686-linux.

2024-04-04  Jakub Jelinek  <ja...@redhat.com>

        PR target/114576
        * config/i386/i386.md (isa): Remove aes, add aes_avx, vaes_avx512vl.
        (enabled): Remove aes isa check, add aes_avx and vaes_avx512vl.
        * config/i386/sse.md (aesenc, aesenclast, aesdec, aesdeclast): Add
        4th alternative, emit {evex} prefix for the third one, use
        noavx,aes_avx,vaes_avx512vl,vaes_avx512vl isa attribute, use jm
        rather than m constraint on the 2nd and 3rd alternative input.
        (vaesdec_<mode>, vaesdeclast_<mode>, vaesenc_<mode>,
        vaesenclast_<mode>): Add second alternative with x instead of v
        and jm instead of m.

        * gcc.target/i386/aes-pr114576.c: New test.

--- gcc/config/i386/i386.md.jj  2024-03-18 22:15:43.165839479 +0100
+++ gcc/config/i386/i386.md     2024-04-04 00:48:46.575511556 +0200
@@ -568,13 +568,14 @@ (define_attr "unit" "integer,i387,sse,mm
 
 ;; Used to control the "enabled" attribute on a per-instruction basis.
 (define_attr "isa" "base,x64,nox64,x64_sse2,x64_sse4,x64_sse4_noavx,
-                   x64_avx,x64_avx512bw,x64_avx512dq,aes,apx_ndd,
+                   x64_avx,x64_avx512bw,x64_avx512dq,apx_ndd,
                    sse_noavx,sse2,sse2_noavx,sse3,sse3_noavx,sse4,sse4_noavx,
                    avx,noavx,avx2,noavx2,bmi,bmi2,fma4,fma,avx512f,avx512f_512,
                    noavx512f,avx512bw,avx512bw_512,noavx512bw,avx512dq,
                    noavx512dq,fma_or_avx512vl,avx512vl,noavx512vl,avxvnni,
                    avx512vnnivl,avx512fp16,avxifma,avx512ifmavl,avxneconvert,
-                   avx512bf16vl,vpclmulqdqvl,avx_noavx512f,avx_noavx512vl"
+                   avx512bf16vl,vpclmulqdqvl,avx_noavx512f,avx_noavx512vl,
+                   aes_avx,vaes_avx512vl"
   (const_string "base"))
 
 ;; The (bounding maximum) length of an instruction immediate.
@@ -915,7 +916,6 @@ (define_attr "enabled" ""
           (symbol_ref "TARGET_64BIT && TARGET_AVX512BW")
         (eq_attr "isa" "x64_avx512dq")
           (symbol_ref "TARGET_64BIT && TARGET_AVX512DQ")
-        (eq_attr "isa" "aes") (symbol_ref "TARGET_AES")
         (eq_attr "isa" "sse_noavx")
           (symbol_ref "TARGET_SSE && !TARGET_AVX")
         (eq_attr "isa" "sse2") (symbol_ref "TARGET_SSE2")
@@ -968,6 +968,10 @@ (define_attr "enabled" ""
           (symbol_ref "TARGET_VPCLMULQDQ && TARGET_AVX512VL")
         (eq_attr "isa" "apx_ndd")
           (symbol_ref "TARGET_APX_NDD")
+        (eq_attr "isa" "aes_avx")
+          (symbol_ref "TARGET_AES && TARGET_AVX")
+        (eq_attr "isa" "vaes_avx512vl")
+          (symbol_ref "TARGET_VAES && TARGET_AVX512VL")
 
         (eq_attr "mmx_isa" "native")
           (symbol_ref "!TARGET_MMX_WITH_SSE")
--- gcc/config/i386/sse.md.jj   2024-03-18 22:15:43.168839437 +0100
+++ gcc/config/i386/sse.md      2024-04-04 00:58:56.482090689 +0200
@@ -26277,75 +26277,79 @@ (define_insn "xop_vpermil2<mode>3"
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 
 (define_insn "aesenc"
-  [(set (match_operand:V2DI 0 "register_operand" "=x,x,v")
-       (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v")
-                      (match_operand:V2DI 2 "vector_operand" "xja,xm,vm")]
+  [(set (match_operand:V2DI 0 "register_operand" "=x,x,x,v")
+       (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,x,v")
+                      (match_operand:V2DI 2 "vector_operand" "xja,xjm,xjm,vm")]
                      UNSPEC_AESENC))]
   "TARGET_AES || (TARGET_VAES && TARGET_AVX512VL)"
   "@
    aesenc\t{%2, %0|%0, %2}
    vaesenc\t{%2, %1, %0|%0, %1, %2}
+   %{evex%} vaesenc\t{%2, %1, %0|%0, %1, %2}
    vaesenc\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "isa" "noavx,aes,avx512vl")
+  [(set_attr "isa" "noavx,aes_avx,vaes_avx512vl,vaes_avx512vl")
    (set_attr "type" "sselog1")
-   (set_attr "addr" "gpr16,*,*")
+   (set_attr "addr" "gpr16,*,*,*")
    (set_attr "prefix_extra" "1")
-   (set_attr "prefix" "orig,vex,evex")
-   (set_attr "btver2_decode" "double,double,double")
+   (set_attr "prefix" "orig,vex,evex,evex")
+   (set_attr "btver2_decode" "double,double,double,double")
    (set_attr "mode" "TI")])
 
 (define_insn "aesenclast"
-  [(set (match_operand:V2DI 0 "register_operand" "=x,x,v")
-       (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v")
-                      (match_operand:V2DI 2 "vector_operand" "xja,xm,vm")]
+  [(set (match_operand:V2DI 0 "register_operand" "=x,x,x,v")
+       (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,x,v")
+                      (match_operand:V2DI 2 "vector_operand" "xja,xjm,xjm,vm")]
                      UNSPEC_AESENCLAST))]
   "TARGET_AES || (TARGET_VAES && TARGET_AVX512VL)"
   "@
    aesenclast\t{%2, %0|%0, %2}
    vaesenclast\t{%2, %1, %0|%0, %1, %2}
+   %{evex%} vaesenclast\t{%2, %1, %0|%0, %1, %2}
    vaesenclast\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "isa" "noavx,aes,avx512vl")
+  [(set_attr "isa" "noavx,aes_avx,vaes_avx512vl,vaes_avx512vl")
    (set_attr "type" "sselog1")
-   (set_attr "addr" "gpr16,*,*")
+   (set_attr "addr" "gpr16,*,*,*")
    (set_attr "prefix_extra" "1")
-   (set_attr "prefix" "orig,vex,evex")
-   (set_attr "btver2_decode" "double,double,double") 
+   (set_attr "prefix" "orig,vex,evex,evex")
+   (set_attr "btver2_decode" "double,double,double,double")
    (set_attr "mode" "TI")])
 
 (define_insn "aesdec"
-  [(set (match_operand:V2DI 0 "register_operand" "=x,x,v")
-       (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v")
-                      (match_operand:V2DI 2 "vector_operand" "xja,xm,vm")]
+  [(set (match_operand:V2DI 0 "register_operand" "=x,x,x,v")
+       (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,x,v")
+                      (match_operand:V2DI 2 "vector_operand" "xja,xjm,xjm,vm")]
                      UNSPEC_AESDEC))]
   "TARGET_AES || (TARGET_VAES && TARGET_AVX512VL)"
   "@
    aesdec\t{%2, %0|%0, %2}
    vaesdec\t{%2, %1, %0|%0, %1, %2}
+   %{evex%} vaesdec\t{%2, %1, %0|%0, %1, %2}
    vaesdec\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "isa" "noavx,aes,avx512vl")
+  [(set_attr "isa" "noavx,aes_avx,vaes_avx512vl,vaes_avx512vl")
    (set_attr "type" "sselog1")
-   (set_attr "addr" "gpr16,*,*")
+   (set_attr "addr" "gpr16,*,*,*")
    (set_attr "prefix_extra" "1")
-   (set_attr "prefix" "orig,vex,evex")
-   (set_attr "btver2_decode" "double,double,double") 
+   (set_attr "prefix" "orig,vex,evex,evex")
+   (set_attr "btver2_decode" "double,double,double,double") 
    (set_attr "mode" "TI")])
 
 (define_insn "aesdeclast"
-  [(set (match_operand:V2DI 0 "register_operand" "=x,x,v")
-       (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,v")
-                      (match_operand:V2DI 2 "vector_operand" "xja,xm,vm")]
+  [(set (match_operand:V2DI 0 "register_operand" "=x,x,x,v")
+       (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0,x,x,v")
+                      (match_operand:V2DI 2 "vector_operand" "xja,xjm,xjm,vm")]
                      UNSPEC_AESDECLAST))]
   "TARGET_AES || (TARGET_VAES && TARGET_AVX512VL)"
   "@
    aesdeclast\t{%2, %0|%0, %2}
    vaesdeclast\t{%2, %1, %0|%0, %1, %2}
+   %{evex%} vaesdeclast\t{%2, %1, %0|%0, %1, %2}
    vaesdeclast\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "isa" "noavx,aes,avx512vl")
-   (set_attr "addr" "gpr16,*,*")
+  [(set_attr "isa" "noavx,aes_avx,vaes_avx512vl,vaes_avx512vl")
+   (set_attr "addr" "gpr16,*,*,*")
    (set_attr "type" "sselog1")
    (set_attr "prefix_extra" "1")
-   (set_attr "prefix" "orig,vex,evex")
-   (set_attr "btver2_decode" "double,double,double")
+   (set_attr "prefix" "orig,vex,evex,evex")
+   (set_attr "btver2_decode" "double,double,double,double")
    (set_attr "mode" "TI")])
 
 (define_insn "aesimc"
@@ -30246,44 +30250,60 @@ (define_insn "vpdpwssds_<mode>_maskz_1"
    [(set_attr ("prefix") ("evex"))])
 
 (define_insn "vaesdec_<mode>"
-  [(set (match_operand:VI1_AVX512VL_F 0 "register_operand" "=v")
+  [(set (match_operand:VI1_AVX512VL_F 0 "register_operand" "=x,v")
        (unspec:VI1_AVX512VL_F
-         [(match_operand:VI1_AVX512VL_F 1 "register_operand" "v")
-          (match_operand:VI1_AVX512VL_F 2 "vector_operand" "vm")]
+         [(match_operand:VI1_AVX512VL_F 1 "register_operand" "x,v")
+          (match_operand:VI1_AVX512VL_F 2 "vector_operand" "xjm,vm")]
          UNSPEC_VAESDEC))]
   "TARGET_VAES"
-  "vaesdec\t{%2, %1, %0|%0, %1, %2}"
-)
+{
+  if (which_alternative == 0 && <MODE>mode == V16QImode)
+    return "%{evex%} vaesdec\t{%2, %1, %0|%0, %1, %2}";
+  else
+    return "vaesdec\t{%2, %1, %0|%0, %1, %2}";
+})
 
 (define_insn "vaesdeclast_<mode>"
-  [(set (match_operand:VI1_AVX512VL_F 0 "register_operand" "=v")
+  [(set (match_operand:VI1_AVX512VL_F 0 "register_operand" "=x,v")
        (unspec:VI1_AVX512VL_F
-         [(match_operand:VI1_AVX512VL_F 1 "register_operand" "v")
-          (match_operand:VI1_AVX512VL_F 2 "vector_operand" "vm")]
+         [(match_operand:VI1_AVX512VL_F 1 "register_operand" "x,v")
+          (match_operand:VI1_AVX512VL_F 2 "vector_operand" "xjm,vm")]
          UNSPEC_VAESDECLAST))]
   "TARGET_VAES"
-  "vaesdeclast\t{%2, %1, %0|%0, %1, %2}"
-)
+{
+  if (which_alternative == 0 && <MODE>mode == V16QImode)
+    return "%{evex%} vaesdeclast\t{%2, %1, %0|%0, %1, %2}";
+  else
+    return "vaesdeclast\t{%2, %1, %0|%0, %1, %2}";
+})
 
 (define_insn "vaesenc_<mode>"
-  [(set (match_operand:VI1_AVX512VL_F 0 "register_operand" "=v")
+  [(set (match_operand:VI1_AVX512VL_F 0 "register_operand" "=x,v")
        (unspec:VI1_AVX512VL_F
-         [(match_operand:VI1_AVX512VL_F 1 "register_operand" "v")
-          (match_operand:VI1_AVX512VL_F 2 "vector_operand" "vm")]
+         [(match_operand:VI1_AVX512VL_F 1 "register_operand" "x,v")
+          (match_operand:VI1_AVX512VL_F 2 "vector_operand" "xjm,vm")]
          UNSPEC_VAESENC))]
   "TARGET_VAES"
-  "vaesenc\t{%2, %1, %0|%0, %1, %2}"
-)
+{
+  if (which_alternative == 0 && <MODE>mode == V16QImode)
+    return "%{evex%} vaesenc\t{%2, %1, %0|%0, %1, %2}";
+  else
+    return "vaesenc\t{%2, %1, %0|%0, %1, %2}";
+})
 
 (define_insn "vaesenclast_<mode>"
-  [(set (match_operand:VI1_AVX512VL_F 0 "register_operand" "=v")
+  [(set (match_operand:VI1_AVX512VL_F 0 "register_operand" "=x,v")
        (unspec:VI1_AVX512VL_F
-         [(match_operand:VI1_AVX512VL_F 1 "register_operand" "v")
-          (match_operand:VI1_AVX512VL_F 2 "vector_operand" "vm")]
+         [(match_operand:VI1_AVX512VL_F 1 "register_operand" "x,v")
+          (match_operand:VI1_AVX512VL_F 2 "vector_operand" "xjm,vm")]
          UNSPEC_VAESENCLAST))]
   "TARGET_VAES"
-  "vaesenclast\t{%2, %1, %0|%0, %1, %2}"
-)
+{
+  if (which_alternative == 0 && <MODE>mode == V16QImode)
+    return "%{evex%} vaesenclast\t{%2, %1, %0|%0, %1, %2}";
+  else
+    return "vaesenclast\t{%2, %1, %0|%0, %1, %2}";
+})
 
 (define_insn "vpclmulqdq_<mode>"
   [(set (match_operand:VI8_FVL 0 "register_operand" "=v")
--- gcc/testsuite/gcc.target/i386/aes-pr114576.c.jj     2024-04-04 
09:50:17.117757179 +0200
+++ gcc/testsuite/gcc.target/i386/aes-pr114576.c        2024-04-04 
09:51:45.211544801 +0200
@@ -0,0 +1,63 @@
+/* PR target/114576 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -maes -mno-avx" } */
+/* { dg-final { scan-assembler-times "\taesenc\t" 2 } } */
+/* { dg-final { scan-assembler-times "\taesdec\t" 2 } } */
+/* { dg-final { scan-assembler-times "\taesenclast\t" 2 } } */
+/* { dg-final { scan-assembler-times "\taesdeclast\t" 2 } } */
+/* { dg-final { scan-assembler-not "\tvaesenc" } } */
+/* { dg-final { scan-assembler-not "\tvaesdec" } } */
+
+#include <immintrin.h>
+
+__m128i
+f1 (__m128i x, __m128i y)
+{
+  return _mm_aesenc_si128 (x, y);
+}
+
+__m128i
+f2 (__m128i x, __m128i y)
+{
+  __m128i z = _mm_aesenc_si128 (x, y);
+  return z + x + y;
+}
+
+__m128i
+f3 (__m128i x, __m128i y)
+{
+  return _mm_aesdec_si128 (x, y);
+}
+
+__m128i
+f4 (__m128i x, __m128i y)
+{
+  __m128i z = _mm_aesdec_si128 (x, y);
+  return z + x + y;
+}
+
+__m128i
+f5 (__m128i x, __m128i y)
+{
+  return _mm_aesenclast_si128 (x, y);
+}
+
+__m128i
+f6 (__m128i x, __m128i y)
+{
+  __m128i z = _mm_aesenclast_si128 (x, y);
+  return z + x + y;
+}
+
+__m128i
+f7 (__m128i x, __m128i y)
+{
+  return _mm_aesdeclast_si128 (x, y);
+}
+
+__m128i
+f8 (__m128i x, __m128i y)
+{
+  __m128i z = _mm_aesdeclast_si128 (x, y);
+  return z + x + y;
+}


        Jakub

Reply via email to