On Sat, Aug 22, 2020 at 10:11 AM Uros Bizjak <ubiz...@gmail.com> wrote:
>
> On Sat, Aug 22, 2020 at 6:27 PM H.J. Lu <hjl.to...@gmail.com> wrote:
> >
> > On Fri, Aug 21, 2020 at 9:45 AM H.J. Lu <hjl.to...@gmail.com> wrote:
> > >
> > > On Fri, Aug 21, 2020 at 9:35 AM H.J. Lu <hjl.to...@gmail.com> wrote:
> > > >
> > > > On Fri, Aug 21, 2020 at 9:29 AM Hongtao Liu <crazy...@gmail.com> wrote:
> > > > >
> > > > > On Fri, Aug 21, 2020 at 11:50 PM Uros Bizjak <ubiz...@gmail.com> 
> > > > > wrote:
> > > > > >
> > > > > > On Fri, Aug 21, 2020 at 5:41 PM Hongtao Liu <crazy...@gmail.com> 
> > > > > > wrote:
> > > > > > >
> > > > > > > On Fri, Aug 21, 2020 at 9:15 PM Uros Bizjak <ubiz...@gmail.com> 
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > > > > gcc/
> > > > > > > > > > >         PR target/88808
> > > > > > > > > > >         * config/i386/i386.c 
> > > > > > > > > > > (ix86_preferred_reload_class): Allow
> > > > > > > > > > >         QImode data go into mask registers.
> > > > > > > > > > >         * config/i386/i386.md: (*movhi_internal): Adjust 
> > > > > > > > > > > constraints
> > > > > > > > > > >         for mask registers.
> > > > > > > > > > >         (*movqi_internal): Ditto.
> > > > > > > > > > >         (*anddi_1): Support mask register operations
> > > > > > > > > > >         (*and<mode>_1): Ditto.
> > > > > > > > > > >         (*andqi_1): Ditto.
> > > > > > > > > > >         (*andn<mode>_1): Ditto.
> > > > > > > > > > >         (*<code><mode>_1): Ditto.
> > > > > > > > > > >         (*<code>qi_1): Ditto.
> > > > > > > > > > >         (*one_cmpl<mode>2_1): Ditto.
> > > > > > > > > > >         (*one_cmplsi2_1_zext): Ditto.
> > > > > > > > > > >         (*one_cmplqi2_1): Ditto.
> > > > > > > > > > >         (define_peephole2): Move constant 0/-1 directly 
> > > > > > > > > > > into mask
> > > > > > > > > > >         registers.
> > > > > > > > > > >         * config/i386/predicates.md (mask_reg_operand): 
> > > > > > > > > > > New predicate.
> > > > > > > > > > >         * config/i386/sse.md (define_split): Add 
> > > > > > > > > > > post-reload splitters
> > > > > > > > > > >         that would convert "generic" patterns to mask 
> > > > > > > > > > > patterns.
> > > > > > > > > > >         (*knotsi_1_zext): New define_insn.
> > > > > > > > > > >
> > > > > > > > > > > gcc/testsuite/
> > > > > > > > > > >         * gcc.target/i386/bitwise_mask_op-1.c: New test.
> > > > > > > > > > >         * gcc.target/i386/bitwise_mask_op-2.c: New test.
> > > > > > > > > > >         * gcc.target/i386/bitwise_mask_op-3.c: New test.
> > > > > > > > > > >         * gcc.target/i386/avx512bw-pr88465.c: New 
> > > > > > > > > > > testcase.
> > > > > > > > > > >         * gcc.target/i386/avx512bw-kunpckwd-1.c: Adjust 
> > > > > > > > > > > testcase.
> > > > > > > > > > >         * gcc.target/i386/avx512bw-kunpckwd-3.c: Ditto.
> > > > > > > > > > >         * gcc.target/i386/avx512dq-kmovb-5.c: Ditto.
> > > > > > > > > > >         * gcc.target/i386/avx512f-kmovw-5.c: Ditto.
> > > > > > > > > >
> > > > > > > > > > A little nit, please put new splitters after the 
> > > > > > > > > > instruction pattern.
> > > > > > > > > >
> > > > > > > > > > OK for the whole patch set with the above change,
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > Yes, thanks for the review.
> > > > > > > >
> > > > > > > > Please note that your patch introduces several testsuite fails 
> > > > > > > > with -m32:
> > > > > > > >
> > > > > > > > gcc -O2 -mavx512bitalg -mavx512bw -m32 -g 
> > > > > > > > avx512bitalg-vpopcntb-1.c
> > > > > > > >
> > > > > > >
> > > > > > > I can't reproduce this failure.
> > > > > >
> > > > > > Because you are running it on AVX512 enabled target.
> > > > > >
> > > > > > > > Program received signal SIGILL, Illegal instruction.
> > > > > > > > 0x080490ac in __get_cpuid_count (__edx=<synthetic pointer>,
> > > > > > > > __ecx=<synthetic pointer>, __ebx=<synthetic pointer>, 
> > > > > > > > __eax=<synthetic
> > > > > > > > pointer>,
> > > > > > > >     __subleaf=0, __leaf=7) at 
> > > > > > > > /hdd/uros/gcc-build-fast/gcc/include/cpuid.h:316
> > > > > > > > 316       __cpuid_count (__leaf, __subleaf, *__eax, *__ebx, 
> > > > > > > > *__ecx, *__edx);
> > > > > > > >
> > > > > > > >    0x080490a3 <+51>:    cpuid
> > > > > > > >    0x080490a5 <+53>:    mov    $0x1,%eax
> > > > > > > >    0x080490aa <+58>:    mov    %ecx,%esi
> > > > > > > > => 0x080490ac <+60>:    kmovd  %ebx,%k0
> > > > > > > >    0x080490b0 <+64>:    mov    %edi,%ecx
> > > > > > > >    0x080490b2 <+66>:    mov    %edi,%ebx
> > > > > > > >
> > > > > > > > kmov insn is generated for __cpuid_count function, where the 
> > > > > > > > binary
> > > > > > > > determines, if the new instructions are supported. The binary 
> > > > > > > > will
> > > > > > > > crash in the detection code if the processor lacks AVX512
> > > > > > > > instructions.
> > > > > > > >
> > > > > > >
> > > > > > > IMHO, the testcase shouldn't be run on processors without 
> > > > > > > AVX512BW.
> > > > > >
> > > > > > No, it could run, because it checks for AVX512BW at runtime.
> > > > > >
> > > > >
> > > > > Got it.
> > > > >
> > > > > > > Because in  avx512bitalg-vpopcntb-1.c, there's /*
> > > > > > > dg-require-effective-target avx512bw } */.
> > > > > >
> > > > > > This is to check the toolchain for support.
> > > > > >
> > > > > > > what's the version of your assembler?
> > > > > >
> > > > > > GNU assembler version 2.34-4.fc32
> > > > > >
> > > > >
> > > > > If assembler supports avx512bw, but processor not, the test would pass
> > > > > condition `dg-require-effective-target avx512bw` and be runned.
> > > > > then crashed for illegal instruction.
> > > > >
> > > > > > Please add something like
> > > > > > X86_TUNE_INTER_UNIT_MOVES_FROM_MASK/X86_TUNE_INTER_UNIT_MOVES_TO_MASK
> > > > > > and enable them only for m_CORE_AVX512 (or perhaps m_INTEL).
> > > > > >
> > > > > > Handle this in inline_secondary_memory_needed to reject direct moves
> > > > > > for all other targets. This should disable direct moves for generic
> > > > > > targets.
> > > > > >
> > > > >
> > > > > Yes, I'll add it.
> > > > >
> > > >
> > > >
> > > > (define_insn "*movsi_internal"
> > > >   [(set (match_operand:SI 0 "nonimmediate_operand"
> > > >     "=r,m ,*y,*y,?*y,?m,?r,?*y,*v,*v,*v,m ,?r,?*v,*k,*k ,*rm,*k")
> > > >         (match_operand:SI 1 "general_operand"
> > > >     "g ,re,C ,*y,m  ,*y,*y,r  ,C ,*v,m ,*v,*v,r  ,*r,*km,*k ,CBC"))]
> > > >   "!(MEM_P (operands[0]) && MEM_P (operands[1]))"
> > > > ...
> > > >  [(set (attr "isa")
> > > >      (cond [(eq_attr "alternative" "12,13")
> > > >               (const_string "sse2")
> > > >            ]
> > > >            (const_string "*")))
> > > >
> > > > is wrong.   mask register alternatives should be marked with avx512f.
> > > > Please fix it.   Other integer move patterns may have the same issue.
> > > > Once these are fixed,
> > > >
> > > > diff --git a/gcc/testsuite/gcc.target/i386/avx512-check.h
> > > > b/gcc/testsuite/gcc.target/i386/avx512-check.h
> > > > index 0a377dba1d5..576e9b390c6 100644
> > > > --- a/gcc/testsuite/gcc.target/i386/avx512-check.h
> > > > +++ b/gcc/testsuite/gcc.target/i386/avx512-check.h
> > > > @@ -25,6 +25,7 @@ do_test (void)
> > > >  }
> > > >  #endif
> > > >
> > > > +__attribute__((target ("no-avx512f")))
> > > >  static int
> > > >  check_osxsave (void)
> > > >  {
> > > > @@ -34,6 +35,7 @@ check_osxsave (void)
> > > >    return (ecx & bit_OSXSAVE) != 0;
> > > >  }
> > > >
> > > > +__attribute__((target ("no-avx512f")))
> > > >  int
> > > >  main ()
> > > >  {
> > > >
> > > > should work.
> > > >
> > >
> > > Like this.  You need to check all integer patterns with mskmov and msklog.
> >
> > Compile CPUID check with "-mno-sse -mfpmath=387" to disable SSE, AVX and
> > AVX512 during CPUID check to avoid vector and mask register operations.
>
> -mgeneral-regs-only ?
>

Here is a patch to add target("general-regs-only") function
attribute and use it for CPUID check.   OK for master if there
are no regressions?

Thanks.

-- 
H.J.
From fda2012d7b2100eabec4610595c113fa5fc83638 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" <hjl.to...@gmail.com>
Date: Fri, 21 Aug 2020 09:42:49 -0700
Subject: [PATCH] x86: Only use general-purpose registers during CPUID check

Compile CPUID check with -mgeneral-regs-only or -mno-sse -mfpmath=387
target attribute to only use general-purpose registers.

Note: -mno-sse -mfpmath=387 target attribute is used for GCCs older than
GCC 11.

gcc/

	PR target/96744
	* common/config/i386/i386-common.c (ix86_handle_option): Set
	x_ix86_fpmath to FPMATH_387 for -mgeneral-regs-only.
	* config/i386/cpuid.h: Add #pragma GCC target("general-regs-only")
	or #pragma GCC target("no-sse,fpmath=387") to only use
	general-purpose registers.
	* config/i386/i386-options.c (IX86_ATTR_IX86_YES): New.
	(IX86_ATTR_IX86_NO): Likewise.
	(ix86_opt_type): Add ix86_opt_ix86_yes and ix86_opt_ix86_no.
	(ix86_valid_target_attribute_inner_p): Handle general-regs-only,
	ix86_opt_ix86_yes and ix86_opt_ix86_no.
	* doc/extend.texi: Document target("general-regs-only") function
	attribute.

gcc/testsuite/

	PR target/96744
	* gcc.target/i386/adx-check.h: Add
	__attribute__((__target__("general-regs-only"))) to only use
	general-purpose registers.
	* gcc.target/i386/bmi-check.h: Likewise.
	* gcc.target/i386/bmi2-check.h: Likewise.
	* gcc.target/i386/pr77756.c: Likewise.
	* gcc.target/i386/pr95973.c: Likewise.
	* gcc.target/i386/rtm-check.h: Likewise.
	* gcc.target/i386/sha-check.h: Likewise.
	* gcc.target/i386/avx2-check.h: Add
	#pragma GCC target("general-regs-only") to only use
	general-purpose registers.
	* gcc.target/i386/avx512-check.h: Likewise.
---
 gcc/common/config/i386/i386-common.c         |  1 +
 gcc/config/i386/cpuid.h                      |  9 +++++
 gcc/config/i386/i386-options.c               | 37 ++++++++++++++++++++
 gcc/doc/extend.texi                          |  4 +++
 gcc/testsuite/gcc.target/i386/adx-check.h    |  2 +-
 gcc/testsuite/gcc.target/i386/avx2-check.h   |  5 +++
 gcc/testsuite/gcc.target/i386/avx512-check.h |  5 +++
 gcc/testsuite/gcc.target/i386/bmi-check.h    |  1 +
 gcc/testsuite/gcc.target/i386/bmi2-check.h   |  1 +
 gcc/testsuite/gcc.target/i386/pr77756.c      |  1 +
 gcc/testsuite/gcc.target/i386/pr95973.c      |  1 +
 gcc/testsuite/gcc.target/i386/rtm-check.h    |  1 +
 gcc/testsuite/gcc.target/i386/sha-check.h    |  1 +
 13 files changed, 68 insertions(+), 1 deletion(-)

diff --git a/gcc/common/config/i386/i386-common.c b/gcc/common/config/i386/i386-common.c
index bb14305ad7b..77a67d0dd20 100644
--- a/gcc/common/config/i386/i386-common.c
+++ b/gcc/common/config/i386/i386-common.c
@@ -333,6 +333,7 @@ ix86_handle_option (struct gcc_options *opts,
 	    |= OPTION_MASK_ISA2_GENERAL_REGS_ONLY_UNSET;
 
 	  opts->x_target_flags &= ~MASK_80387;
+	  opts->x_ix86_fpmath = FPMATH_387;
 	}
       else
 	gcc_unreachable ();
diff --git a/gcc/config/i386/cpuid.h b/gcc/config/i386/cpuid.h
index bca61d620db..d3bcf4ae313 100644
--- a/gcc/config/i386/cpuid.h
+++ b/gcc/config/i386/cpuid.h
@@ -24,6 +24,13 @@
 #ifndef _CPUID_H_INCLUDED
 #define _CPUID_H_INCLUDED
 
+#pragma GCC push_options
+#if __GNUC__ >= 11
+#pragma GCC target("general-regs-only")
+#else
+#pragma GCC target("no-sse,fpmath=387")
+#endif
+
 /* %eax */
 #define bit_AVX512BF16	(1 << 5)
 
@@ -324,4 +331,6 @@ __cpuidex (int __cpuid_info[4], int __leaf, int __subleaf)
 		 __cpuid_info[2], __cpuid_info[3]);
 }
 
+#pragma GCC pop_options
+
 #endif /* _CPUID_H_INCLUDED */
diff --git a/gcc/config/i386/i386-options.c b/gcc/config/i386/i386-options.c
index 26d1ea18ef1..27cba65ccf9 100644
--- a/gcc/config/i386/i386-options.c
+++ b/gcc/config/i386/i386-options.c
@@ -922,12 +922,18 @@ ix86_valid_target_attribute_inner_p (tree fndecl, tree args, char *p_strings[],
 #define IX86_ATTR_ENUM(S,O)  { S, sizeof (S)-1, ix86_opt_enum, O, 0 }
 #define IX86_ATTR_YES(S,O,M) { S, sizeof (S)-1, ix86_opt_yes, O, M }
 #define IX86_ATTR_NO(S,O,M)  { S, sizeof (S)-1, ix86_opt_no,  O, M }
+#define IX86_ATTR_IX86_YES(S,O,M) \
+  { S, sizeof (S)-1, ix86_opt_ix86_yes, O, M }
+#define IX86_ATTR_IX86_NO(S,O,M) \
+  { S, sizeof (S)-1, ix86_opt_ix86_no,  O, M }
 
   enum ix86_opt_type
   {
     ix86_opt_unknown,
     ix86_opt_yes,
     ix86_opt_no,
+    ix86_opt_ix86_yes,
+    ix86_opt_ix86_no,
     ix86_opt_str,
     ix86_opt_enum,
     ix86_opt_isa
@@ -1062,6 +1068,10 @@ ix86_valid_target_attribute_inner_p (tree fndecl, tree args, char *p_strings[],
     IX86_ATTR_YES ("recip",
 		   OPT_mrecip,
 		   MASK_RECIP),
+
+    IX86_ATTR_IX86_YES ("general-regs-only",
+			OPT_mgeneral_regs_only,
+			OPTION_MASK_GENERAL_REGS_ONLY),
   };
 
   location_t loc
@@ -1175,6 +1185,33 @@ ix86_valid_target_attribute_inner_p (tree fndecl, tree args, char *p_strings[],
 	    opts->x_target_flags &= ~mask;
 	}
 
+      else if (type == ix86_opt_ix86_yes || type == ix86_opt_ix86_no)
+	{
+	  if (mask == OPTION_MASK_GENERAL_REGS_ONLY)
+	    {
+	      if (type != ix86_opt_ix86_yes)
+		gcc_unreachable ();
+
+	      opts->x_ix86_target_flags |= mask;
+
+	      struct cl_decoded_option decoded;
+	      generate_option (opt, NULL, opt_set_p, CL_TARGET,
+			       &decoded);
+	      ix86_handle_option (opts, opts_set, &decoded,
+				  input_location);
+	    }
+	  else
+	    {
+	      if (type == ix86_opt_ix86_no)
+		opt_set_p = !opt_set_p;
+
+	      if (opt_set_p)
+		opts->x_ix86_target_flags |= mask;
+	      else
+		opts->x_ix86_target_flags &= ~mask;
+	    }
+	}
+
       else if (type == ix86_opt_str)
 	{
 	  if (p_strings[opt])
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index fd794961e0a..2bb9b2f72f5 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -6656,6 +6656,10 @@ Enable/disable the generation of RCPSS, RCPPS, RSQRTSS and RSQRTPS
 instructions followed an additional Newton-Raphson step instead of
 doing a floating-point division.
 
+@item general-regs-only
+@cindex @code{target("general-regs-only")} function attribute, x86
+Generate code which uses only the general registers.
+
 @item arch=@var{ARCH}
 @cindex @code{target("arch=@var{ARCH}")} function attribute, x86
 Specify the architecture to generate code for in compiling the function.
diff --git a/gcc/testsuite/gcc.target/i386/adx-check.h b/gcc/testsuite/gcc.target/i386/adx-check.h
index cfed1a38483..40f3b523f2b 100644
--- a/gcc/testsuite/gcc.target/i386/adx-check.h
+++ b/gcc/testsuite/gcc.target/i386/adx-check.h
@@ -8,6 +8,7 @@ static void __attribute__ ((noinline)) do_test (void)
   adx_test ();
 }
 
+__attribute__((__target__("general-regs-only")))
 int
 main ()
 {
@@ -31,4 +32,3 @@ main ()
 #endif
   return 0;
 }
-
diff --git a/gcc/testsuite/gcc.target/i386/avx2-check.h b/gcc/testsuite/gcc.target/i386/avx2-check.h
index 25bed5e0da6..dd0f99db0e2 100644
--- a/gcc/testsuite/gcc.target/i386/avx2-check.h
+++ b/gcc/testsuite/gcc.target/i386/avx2-check.h
@@ -10,6 +10,9 @@ static void __attribute__ ((noinline)) do_test (void)
   avx2_test ();
 }
 
+#pragma GCC push_options
+#pragma GCC target("general-regs-only")
+
 static int
 check_osxsave (void)
 {
@@ -42,3 +45,5 @@ main ()
 #endif
   return 0;
 }
+
+#pragma GCC pop_options
diff --git a/gcc/testsuite/gcc.target/i386/avx512-check.h b/gcc/testsuite/gcc.target/i386/avx512-check.h
index 0a377dba1d5..8aa3361dc63 100644
--- a/gcc/testsuite/gcc.target/i386/avx512-check.h
+++ b/gcc/testsuite/gcc.target/i386/avx512-check.h
@@ -25,6 +25,9 @@ do_test (void)
 }
 #endif
 
+#pragma GCC push_options
+#pragma GCC target("general-regs-only")
+
 static int
 check_osxsave (void)
 {
@@ -110,3 +113,5 @@ main ()
 #endif
   return 0;
 }
+
+#pragma GCC pop_options
diff --git a/gcc/testsuite/gcc.target/i386/bmi-check.h b/gcc/testsuite/gcc.target/i386/bmi-check.h
index 1973f3b6468..ee4cad96a8b 100644
--- a/gcc/testsuite/gcc.target/i386/bmi-check.h
+++ b/gcc/testsuite/gcc.target/i386/bmi-check.h
@@ -12,6 +12,7 @@ do_test (void)
   bmi_test ();
 }
 
+__attribute__((__target__("general-regs-only")))
 int
 main ()
 {
diff --git a/gcc/testsuite/gcc.target/i386/bmi2-check.h b/gcc/testsuite/gcc.target/i386/bmi2-check.h
index ba91ef9b780..7977e353640 100644
--- a/gcc/testsuite/gcc.target/i386/bmi2-check.h
+++ b/gcc/testsuite/gcc.target/i386/bmi2-check.h
@@ -11,6 +11,7 @@ do_test (void)
   bmi2_test ();
 }
 
+__attribute__((__target__("general-regs-only")))
 int
 main ()
 {
diff --git a/gcc/testsuite/gcc.target/i386/pr77756.c b/gcc/testsuite/gcc.target/i386/pr77756.c
index 1eee7cd5a00..bfe5d866bca 100644
--- a/gcc/testsuite/gcc.target/i386/pr77756.c
+++ b/gcc/testsuite/gcc.target/i386/pr77756.c
@@ -2,6 +2,7 @@
 
 #include "cpuid.h"
 
+__attribute__((__target__("general-regs-only")))
 int
 main ()
 {
diff --git a/gcc/testsuite/gcc.target/i386/pr95973.c b/gcc/testsuite/gcc.target/i386/pr95973.c
index 08c7dba8f46..9fc50afe43f 100644
--- a/gcc/testsuite/gcc.target/i386/pr95973.c
+++ b/gcc/testsuite/gcc.target/i386/pr95973.c
@@ -4,6 +4,7 @@
 #include <cpuid.h>
 #include <cpuid.h>
 
+__attribute__((__target__("general-regs-only")))
 int
 main ()
 {
diff --git a/gcc/testsuite/gcc.target/i386/rtm-check.h b/gcc/testsuite/gcc.target/i386/rtm-check.h
index bdb5a6dc0bf..b7936ed0455 100644
--- a/gcc/testsuite/gcc.target/i386/rtm-check.h
+++ b/gcc/testsuite/gcc.target/i386/rtm-check.h
@@ -8,6 +8,7 @@ static void __attribute__ ((noinline)) do_test (void)
   rtm_test ();
 }
 
+__attribute__((__target__("general-regs-only")))
 int
 main ()
 {
diff --git a/gcc/testsuite/gcc.target/i386/sha-check.h b/gcc/testsuite/gcc.target/i386/sha-check.h
index 5bc5a59ab80..2bf9e8315d4 100644
--- a/gcc/testsuite/gcc.target/i386/sha-check.h
+++ b/gcc/testsuite/gcc.target/i386/sha-check.h
@@ -10,6 +10,7 @@ do_test (void)
   sha_test ();
 }
 
+__attribute__((__target__("general-regs-only")))
 int
 main ()
 {
-- 
2.26.2

Reply via email to