On Sat, Aug 22, 2020 at 10:11 AM Uros Bizjak <ubiz...@gmail.com> wrote: > > On Sat, Aug 22, 2020 at 6:27 PM H.J. Lu <hjl.to...@gmail.com> wrote: > > > > On Fri, Aug 21, 2020 at 9:45 AM H.J. Lu <hjl.to...@gmail.com> wrote: > > > > > > On Fri, Aug 21, 2020 at 9:35 AM H.J. Lu <hjl.to...@gmail.com> wrote: > > > > > > > > On Fri, Aug 21, 2020 at 9:29 AM Hongtao Liu <crazy...@gmail.com> wrote: > > > > > > > > > > On Fri, Aug 21, 2020 at 11:50 PM Uros Bizjak <ubiz...@gmail.com> > > > > > wrote: > > > > > > > > > > > > On Fri, Aug 21, 2020 at 5:41 PM Hongtao Liu <crazy...@gmail.com> > > > > > > wrote: > > > > > > > > > > > > > > On Fri, Aug 21, 2020 at 9:15 PM Uros Bizjak <ubiz...@gmail.com> > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > gcc/ > > > > > > > > > > > PR target/88808 > > > > > > > > > > > * config/i386/i386.c > > > > > > > > > > > (ix86_preferred_reload_class): Allow > > > > > > > > > > > QImode data go into mask registers. > > > > > > > > > > > * config/i386/i386.md: (*movhi_internal): Adjust > > > > > > > > > > > constraints > > > > > > > > > > > for mask registers. > > > > > > > > > > > (*movqi_internal): Ditto. > > > > > > > > > > > (*anddi_1): Support mask register operations > > > > > > > > > > > (*and<mode>_1): Ditto. > > > > > > > > > > > (*andqi_1): Ditto. > > > > > > > > > > > (*andn<mode>_1): Ditto. > > > > > > > > > > > (*<code><mode>_1): Ditto. > > > > > > > > > > > (*<code>qi_1): Ditto. > > > > > > > > > > > (*one_cmpl<mode>2_1): Ditto. > > > > > > > > > > > (*one_cmplsi2_1_zext): Ditto. > > > > > > > > > > > (*one_cmplqi2_1): Ditto. > > > > > > > > > > > (define_peephole2): Move constant 0/-1 directly > > > > > > > > > > > into mask > > > > > > > > > > > registers. > > > > > > > > > > > * config/i386/predicates.md (mask_reg_operand): > > > > > > > > > > > New predicate. > > > > > > > > > > > * config/i386/sse.md (define_split): Add > > > > > > > > > > > post-reload splitters > > > > > > > > > > > that would convert "generic" patterns to mask > > > > > > > > > > > patterns. > > > > > > > > > > > (*knotsi_1_zext): New define_insn. > > > > > > > > > > > > > > > > > > > > > > gcc/testsuite/ > > > > > > > > > > > * gcc.target/i386/bitwise_mask_op-1.c: New test. > > > > > > > > > > > * gcc.target/i386/bitwise_mask_op-2.c: New test. > > > > > > > > > > > * gcc.target/i386/bitwise_mask_op-3.c: New test. > > > > > > > > > > > * gcc.target/i386/avx512bw-pr88465.c: New > > > > > > > > > > > testcase. > > > > > > > > > > > * gcc.target/i386/avx512bw-kunpckwd-1.c: Adjust > > > > > > > > > > > testcase. > > > > > > > > > > > * gcc.target/i386/avx512bw-kunpckwd-3.c: Ditto. > > > > > > > > > > > * gcc.target/i386/avx512dq-kmovb-5.c: Ditto. > > > > > > > > > > > * gcc.target/i386/avx512f-kmovw-5.c: Ditto. > > > > > > > > > > > > > > > > > > > > A little nit, please put new splitters after the > > > > > > > > > > instruction pattern. > > > > > > > > > > > > > > > > > > > > OK for the whole patch set with the above change, > > > > > > > > > > > > > > > > > > > > > > > > > > > > Yes, thanks for the review. > > > > > > > > > > > > > > > > Please note that your patch introduces several testsuite fails > > > > > > > > with -m32: > > > > > > > > > > > > > > > > gcc -O2 -mavx512bitalg -mavx512bw -m32 -g > > > > > > > > avx512bitalg-vpopcntb-1.c > > > > > > > > > > > > > > > > > > > > > > I can't reproduce this failure. > > > > > > > > > > > > Because you are running it on AVX512 enabled target. > > > > > > > > > > > > > > Program received signal SIGILL, Illegal instruction. > > > > > > > > 0x080490ac in __get_cpuid_count (__edx=<synthetic pointer>, > > > > > > > > __ecx=<synthetic pointer>, __ebx=<synthetic pointer>, > > > > > > > > __eax=<synthetic > > > > > > > > pointer>, > > > > > > > > __subleaf=0, __leaf=7) at > > > > > > > > /hdd/uros/gcc-build-fast/gcc/include/cpuid.h:316 > > > > > > > > 316 __cpuid_count (__leaf, __subleaf, *__eax, *__ebx, > > > > > > > > *__ecx, *__edx); > > > > > > > > > > > > > > > > 0x080490a3 <+51>: cpuid > > > > > > > > 0x080490a5 <+53>: mov $0x1,%eax > > > > > > > > 0x080490aa <+58>: mov %ecx,%esi > > > > > > > > => 0x080490ac <+60>: kmovd %ebx,%k0 > > > > > > > > 0x080490b0 <+64>: mov %edi,%ecx > > > > > > > > 0x080490b2 <+66>: mov %edi,%ebx > > > > > > > > > > > > > > > > kmov insn is generated for __cpuid_count function, where the > > > > > > > > binary > > > > > > > > determines, if the new instructions are supported. The binary > > > > > > > > will > > > > > > > > crash in the detection code if the processor lacks AVX512 > > > > > > > > instructions. > > > > > > > > > > > > > > > > > > > > > > IMHO, the testcase shouldn't be run on processors without > > > > > > > AVX512BW. > > > > > > > > > > > > No, it could run, because it checks for AVX512BW at runtime. > > > > > > > > > > > > > > > > Got it. > > > > > > > > > > > > Because in avx512bitalg-vpopcntb-1.c, there's /* > > > > > > > dg-require-effective-target avx512bw } */. > > > > > > > > > > > > This is to check the toolchain for support. > > > > > > > > > > > > > what's the version of your assembler? > > > > > > > > > > > > GNU assembler version 2.34-4.fc32 > > > > > > > > > > > > > > > > If assembler supports avx512bw, but processor not, the test would pass > > > > > condition `dg-require-effective-target avx512bw` and be runned. > > > > > then crashed for illegal instruction. > > > > > > > > > > > Please add something like > > > > > > X86_TUNE_INTER_UNIT_MOVES_FROM_MASK/X86_TUNE_INTER_UNIT_MOVES_TO_MASK > > > > > > and enable them only for m_CORE_AVX512 (or perhaps m_INTEL). > > > > > > > > > > > > Handle this in inline_secondary_memory_needed to reject direct moves > > > > > > for all other targets. This should disable direct moves for generic > > > > > > targets. > > > > > > > > > > > > > > > > Yes, I'll add it. > > > > > > > > > > > > > > > > > (define_insn "*movsi_internal" > > > > [(set (match_operand:SI 0 "nonimmediate_operand" > > > > "=r,m ,*y,*y,?*y,?m,?r,?*y,*v,*v,*v,m ,?r,?*v,*k,*k ,*rm,*k") > > > > (match_operand:SI 1 "general_operand" > > > > "g ,re,C ,*y,m ,*y,*y,r ,C ,*v,m ,*v,*v,r ,*r,*km,*k ,CBC"))] > > > > "!(MEM_P (operands[0]) && MEM_P (operands[1]))" > > > > ... > > > > [(set (attr "isa") > > > > (cond [(eq_attr "alternative" "12,13") > > > > (const_string "sse2") > > > > ] > > > > (const_string "*"))) > > > > > > > > is wrong. mask register alternatives should be marked with avx512f. > > > > Please fix it. Other integer move patterns may have the same issue. > > > > Once these are fixed, > > > > > > > > diff --git a/gcc/testsuite/gcc.target/i386/avx512-check.h > > > > b/gcc/testsuite/gcc.target/i386/avx512-check.h > > > > index 0a377dba1d5..576e9b390c6 100644 > > > > --- a/gcc/testsuite/gcc.target/i386/avx512-check.h > > > > +++ b/gcc/testsuite/gcc.target/i386/avx512-check.h > > > > @@ -25,6 +25,7 @@ do_test (void) > > > > } > > > > #endif > > > > > > > > +__attribute__((target ("no-avx512f"))) > > > > static int > > > > check_osxsave (void) > > > > { > > > > @@ -34,6 +35,7 @@ check_osxsave (void) > > > > return (ecx & bit_OSXSAVE) != 0; > > > > } > > > > > > > > +__attribute__((target ("no-avx512f"))) > > > > int > > > > main () > > > > { > > > > > > > > should work. > > > > > > > > > > Like this. You need to check all integer patterns with mskmov and msklog. > > > > Compile CPUID check with "-mno-sse -mfpmath=387" to disable SSE, AVX and > > AVX512 during CPUID check to avoid vector and mask register operations. > > -mgeneral-regs-only ? >
Here is a patch to add target("general-regs-only") function attribute and use it for CPUID check. OK for master if there are no regressions? Thanks. -- H.J.
From fda2012d7b2100eabec4610595c113fa5fc83638 Mon Sep 17 00:00:00 2001 From: "H.J. Lu" <hjl.to...@gmail.com> Date: Fri, 21 Aug 2020 09:42:49 -0700 Subject: [PATCH] x86: Only use general-purpose registers during CPUID check Compile CPUID check with -mgeneral-regs-only or -mno-sse -mfpmath=387 target attribute to only use general-purpose registers. Note: -mno-sse -mfpmath=387 target attribute is used for GCCs older than GCC 11. gcc/ PR target/96744 * common/config/i386/i386-common.c (ix86_handle_option): Set x_ix86_fpmath to FPMATH_387 for -mgeneral-regs-only. * config/i386/cpuid.h: Add #pragma GCC target("general-regs-only") or #pragma GCC target("no-sse,fpmath=387") to only use general-purpose registers. * config/i386/i386-options.c (IX86_ATTR_IX86_YES): New. (IX86_ATTR_IX86_NO): Likewise. (ix86_opt_type): Add ix86_opt_ix86_yes and ix86_opt_ix86_no. (ix86_valid_target_attribute_inner_p): Handle general-regs-only, ix86_opt_ix86_yes and ix86_opt_ix86_no. * doc/extend.texi: Document target("general-regs-only") function attribute. gcc/testsuite/ PR target/96744 * gcc.target/i386/adx-check.h: Add __attribute__((__target__("general-regs-only"))) to only use general-purpose registers. * gcc.target/i386/bmi-check.h: Likewise. * gcc.target/i386/bmi2-check.h: Likewise. * gcc.target/i386/pr77756.c: Likewise. * gcc.target/i386/pr95973.c: Likewise. * gcc.target/i386/rtm-check.h: Likewise. * gcc.target/i386/sha-check.h: Likewise. * gcc.target/i386/avx2-check.h: Add #pragma GCC target("general-regs-only") to only use general-purpose registers. * gcc.target/i386/avx512-check.h: Likewise. --- gcc/common/config/i386/i386-common.c | 1 + gcc/config/i386/cpuid.h | 9 +++++ gcc/config/i386/i386-options.c | 37 ++++++++++++++++++++ gcc/doc/extend.texi | 4 +++ gcc/testsuite/gcc.target/i386/adx-check.h | 2 +- gcc/testsuite/gcc.target/i386/avx2-check.h | 5 +++ gcc/testsuite/gcc.target/i386/avx512-check.h | 5 +++ gcc/testsuite/gcc.target/i386/bmi-check.h | 1 + gcc/testsuite/gcc.target/i386/bmi2-check.h | 1 + gcc/testsuite/gcc.target/i386/pr77756.c | 1 + gcc/testsuite/gcc.target/i386/pr95973.c | 1 + gcc/testsuite/gcc.target/i386/rtm-check.h | 1 + gcc/testsuite/gcc.target/i386/sha-check.h | 1 + 13 files changed, 68 insertions(+), 1 deletion(-) diff --git a/gcc/common/config/i386/i386-common.c b/gcc/common/config/i386/i386-common.c index bb14305ad7b..77a67d0dd20 100644 --- a/gcc/common/config/i386/i386-common.c +++ b/gcc/common/config/i386/i386-common.c @@ -333,6 +333,7 @@ ix86_handle_option (struct gcc_options *opts, |= OPTION_MASK_ISA2_GENERAL_REGS_ONLY_UNSET; opts->x_target_flags &= ~MASK_80387; + opts->x_ix86_fpmath = FPMATH_387; } else gcc_unreachable (); diff --git a/gcc/config/i386/cpuid.h b/gcc/config/i386/cpuid.h index bca61d620db..d3bcf4ae313 100644 --- a/gcc/config/i386/cpuid.h +++ b/gcc/config/i386/cpuid.h @@ -24,6 +24,13 @@ #ifndef _CPUID_H_INCLUDED #define _CPUID_H_INCLUDED +#pragma GCC push_options +#if __GNUC__ >= 11 +#pragma GCC target("general-regs-only") +#else +#pragma GCC target("no-sse,fpmath=387") +#endif + /* %eax */ #define bit_AVX512BF16 (1 << 5) @@ -324,4 +331,6 @@ __cpuidex (int __cpuid_info[4], int __leaf, int __subleaf) __cpuid_info[2], __cpuid_info[3]); } +#pragma GCC pop_options + #endif /* _CPUID_H_INCLUDED */ diff --git a/gcc/config/i386/i386-options.c b/gcc/config/i386/i386-options.c index 26d1ea18ef1..27cba65ccf9 100644 --- a/gcc/config/i386/i386-options.c +++ b/gcc/config/i386/i386-options.c @@ -922,12 +922,18 @@ ix86_valid_target_attribute_inner_p (tree fndecl, tree args, char *p_strings[], #define IX86_ATTR_ENUM(S,O) { S, sizeof (S)-1, ix86_opt_enum, O, 0 } #define IX86_ATTR_YES(S,O,M) { S, sizeof (S)-1, ix86_opt_yes, O, M } #define IX86_ATTR_NO(S,O,M) { S, sizeof (S)-1, ix86_opt_no, O, M } +#define IX86_ATTR_IX86_YES(S,O,M) \ + { S, sizeof (S)-1, ix86_opt_ix86_yes, O, M } +#define IX86_ATTR_IX86_NO(S,O,M) \ + { S, sizeof (S)-1, ix86_opt_ix86_no, O, M } enum ix86_opt_type { ix86_opt_unknown, ix86_opt_yes, ix86_opt_no, + ix86_opt_ix86_yes, + ix86_opt_ix86_no, ix86_opt_str, ix86_opt_enum, ix86_opt_isa @@ -1062,6 +1068,10 @@ ix86_valid_target_attribute_inner_p (tree fndecl, tree args, char *p_strings[], IX86_ATTR_YES ("recip", OPT_mrecip, MASK_RECIP), + + IX86_ATTR_IX86_YES ("general-regs-only", + OPT_mgeneral_regs_only, + OPTION_MASK_GENERAL_REGS_ONLY), }; location_t loc @@ -1175,6 +1185,33 @@ ix86_valid_target_attribute_inner_p (tree fndecl, tree args, char *p_strings[], opts->x_target_flags &= ~mask; } + else if (type == ix86_opt_ix86_yes || type == ix86_opt_ix86_no) + { + if (mask == OPTION_MASK_GENERAL_REGS_ONLY) + { + if (type != ix86_opt_ix86_yes) + gcc_unreachable (); + + opts->x_ix86_target_flags |= mask; + + struct cl_decoded_option decoded; + generate_option (opt, NULL, opt_set_p, CL_TARGET, + &decoded); + ix86_handle_option (opts, opts_set, &decoded, + input_location); + } + else + { + if (type == ix86_opt_ix86_no) + opt_set_p = !opt_set_p; + + if (opt_set_p) + opts->x_ix86_target_flags |= mask; + else + opts->x_ix86_target_flags &= ~mask; + } + } + else if (type == ix86_opt_str) { if (p_strings[opt]) diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index fd794961e0a..2bb9b2f72f5 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -6656,6 +6656,10 @@ Enable/disable the generation of RCPSS, RCPPS, RSQRTSS and RSQRTPS instructions followed an additional Newton-Raphson step instead of doing a floating-point division. +@item general-regs-only +@cindex @code{target("general-regs-only")} function attribute, x86 +Generate code which uses only the general registers. + @item arch=@var{ARCH} @cindex @code{target("arch=@var{ARCH}")} function attribute, x86 Specify the architecture to generate code for in compiling the function. diff --git a/gcc/testsuite/gcc.target/i386/adx-check.h b/gcc/testsuite/gcc.target/i386/adx-check.h index cfed1a38483..40f3b523f2b 100644 --- a/gcc/testsuite/gcc.target/i386/adx-check.h +++ b/gcc/testsuite/gcc.target/i386/adx-check.h @@ -8,6 +8,7 @@ static void __attribute__ ((noinline)) do_test (void) adx_test (); } +__attribute__((__target__("general-regs-only"))) int main () { @@ -31,4 +32,3 @@ main () #endif return 0; } - diff --git a/gcc/testsuite/gcc.target/i386/avx2-check.h b/gcc/testsuite/gcc.target/i386/avx2-check.h index 25bed5e0da6..dd0f99db0e2 100644 --- a/gcc/testsuite/gcc.target/i386/avx2-check.h +++ b/gcc/testsuite/gcc.target/i386/avx2-check.h @@ -10,6 +10,9 @@ static void __attribute__ ((noinline)) do_test (void) avx2_test (); } +#pragma GCC push_options +#pragma GCC target("general-regs-only") + static int check_osxsave (void) { @@ -42,3 +45,5 @@ main () #endif return 0; } + +#pragma GCC pop_options diff --git a/gcc/testsuite/gcc.target/i386/avx512-check.h b/gcc/testsuite/gcc.target/i386/avx512-check.h index 0a377dba1d5..8aa3361dc63 100644 --- a/gcc/testsuite/gcc.target/i386/avx512-check.h +++ b/gcc/testsuite/gcc.target/i386/avx512-check.h @@ -25,6 +25,9 @@ do_test (void) } #endif +#pragma GCC push_options +#pragma GCC target("general-regs-only") + static int check_osxsave (void) { @@ -110,3 +113,5 @@ main () #endif return 0; } + +#pragma GCC pop_options diff --git a/gcc/testsuite/gcc.target/i386/bmi-check.h b/gcc/testsuite/gcc.target/i386/bmi-check.h index 1973f3b6468..ee4cad96a8b 100644 --- a/gcc/testsuite/gcc.target/i386/bmi-check.h +++ b/gcc/testsuite/gcc.target/i386/bmi-check.h @@ -12,6 +12,7 @@ do_test (void) bmi_test (); } +__attribute__((__target__("general-regs-only"))) int main () { diff --git a/gcc/testsuite/gcc.target/i386/bmi2-check.h b/gcc/testsuite/gcc.target/i386/bmi2-check.h index ba91ef9b780..7977e353640 100644 --- a/gcc/testsuite/gcc.target/i386/bmi2-check.h +++ b/gcc/testsuite/gcc.target/i386/bmi2-check.h @@ -11,6 +11,7 @@ do_test (void) bmi2_test (); } +__attribute__((__target__("general-regs-only"))) int main () { diff --git a/gcc/testsuite/gcc.target/i386/pr77756.c b/gcc/testsuite/gcc.target/i386/pr77756.c index 1eee7cd5a00..bfe5d866bca 100644 --- a/gcc/testsuite/gcc.target/i386/pr77756.c +++ b/gcc/testsuite/gcc.target/i386/pr77756.c @@ -2,6 +2,7 @@ #include "cpuid.h" +__attribute__((__target__("general-regs-only"))) int main () { diff --git a/gcc/testsuite/gcc.target/i386/pr95973.c b/gcc/testsuite/gcc.target/i386/pr95973.c index 08c7dba8f46..9fc50afe43f 100644 --- a/gcc/testsuite/gcc.target/i386/pr95973.c +++ b/gcc/testsuite/gcc.target/i386/pr95973.c @@ -4,6 +4,7 @@ #include <cpuid.h> #include <cpuid.h> +__attribute__((__target__("general-regs-only"))) int main () { diff --git a/gcc/testsuite/gcc.target/i386/rtm-check.h b/gcc/testsuite/gcc.target/i386/rtm-check.h index bdb5a6dc0bf..b7936ed0455 100644 --- a/gcc/testsuite/gcc.target/i386/rtm-check.h +++ b/gcc/testsuite/gcc.target/i386/rtm-check.h @@ -8,6 +8,7 @@ static void __attribute__ ((noinline)) do_test (void) rtm_test (); } +__attribute__((__target__("general-regs-only"))) int main () { diff --git a/gcc/testsuite/gcc.target/i386/sha-check.h b/gcc/testsuite/gcc.target/i386/sha-check.h index 5bc5a59ab80..2bf9e8315d4 100644 --- a/gcc/testsuite/gcc.target/i386/sha-check.h +++ b/gcc/testsuite/gcc.target/i386/sha-check.h @@ -10,6 +10,7 @@ do_test (void) sha_test (); } +__attribute__((__target__("general-regs-only"))) int main () { -- 2.26.2