Re: [PATCH v5] : Add pragma GCC target("general-regs-only")

2021-08-03 Thread Richard Biener via Gcc-patches
On Sun, Jul 18, 2021 at 3:46 AM H.J. Lu  wrote:
>
> On Thu, Apr 22, 2021 at 7:30 AM Richard Biener via Gcc-patches
>  wrote:
> >
> > On Thu, Apr 22, 2021 at 2:52 PM Richard Biener
> >  wrote:
> > >
> > > On Thu, Apr 22, 2021 at 2:22 PM Jakub Jelinek  wrote:
> > > >
> > > > On Thu, Apr 22, 2021 at 01:23:20PM +0200, Richard Biener via 
> > > > Gcc-patches wrote:
> > > > > > The question is if the pragma GCC target right now behaves 
> > > > > > incrementally
> > > > > > or not, whether
> > > > > > #pragma GCC target("avx2")
> > > > > > adds -mavx2 to options if it was missing before and nothing 
> > > > > > otherwise, or if
> > > > > > it switches other options off.  If it is incremental, we could e.g. 
> > > > > > try to
> > > > > > use the second least significant bit of global_options_set.x_* to 
> > > > > > mean
> > > > > > this option has been set explicitly by some surrounding #pragma GCC 
> > > > > > target.
> > > > > > The normal tests - global_options_set.x_flag_whatever could still 
> > > > > > work
> > > > > > fine because they wouldn't care if the option was explicit from 
> > > > > > anywhere
> > > > > > (command line or GCC target or target attribute) and just & 2 would 
> > > > > > mean
> > > > > > it was explicit from pragma GCC target; though there is the case of
> > > > > > bitfields... And then the inlining decision could check the & 2 
> > > > > > flags to
> > > > > > see what is required and what is just from command line.
> > > > > > Or we can have some other pragma GCC that would be like target but 
> > > > > > would
> > > > > > have flags that are explicit (and could e.g. be more restricted, to 
> > > > > > ISA
> > > > > > options only, and let those use in addition to #pragma GCC target.
> > > > >
> > > > > I'm still curious as to what you think will break if always-inline 
> > > > > does what
> > > > > it is documented to do.
> > > >
> > > > We will silently accept calling intrinsics that must be used only in 
> > > > certain
> > > > ISA contexts, which will lead to people writing non-portable code.
> > > >
> > > > So -O2 -mno-avx
> > > > #include 
> > > >
> > > > void
> > > > foo (__m256 *x)
> > > > {
> > > >   x[0] = _mm256_sub_ps (x[1], x[2]);
> > > > }
> > > > etc. will now be accepted when it shouldn't be.
> > > > clang rejects it like gcc with:
> > > > 1.c:6:10: error: always_inline function '_mm256_sub_ps' requires target 
> > > > feature 'avx', but would be inlined into function 'foo' that is 
> > > > compiled without support for 'avx'
> > > >   x[0] = _mm256_sub_ps (x[1], x[2]);
> > > >  ^
> > > >
> > > > Note, if I do:
> > > > #include 
> > > >
> > > > __attribute__((target ("no-sse3"))) void
> > > > foo (__m256 *x)
> > > > {
> > > >   x[0] = _mm256_sub_ps (x[1], x[2]);
> > > > }
> > > > and compile
> > > > clang -S -O2 -mavx2 1.c
> > > > 1.c:6:10: error: always_inline function '_mm256_sub_ps' requires target 
> > > > feature 'avx', but would be inlined into function 'foo' that is 
> > > > compiled without support for 'avx'
> > > >   x[0] = _mm256_sub_ps (x[1], x[2]);
> > > >  ^
> > > > then from the error message it seems that unlike GCC, clang remembers
> > > > the exact target features that are needed for the intrinsics and checks 
> > > > just
> > > > those.
> > > > Though, looking at the preprocessed source, seems it uses
> > > > static __inline __m256 __attribute__((__always_inline__, __nodebug__, 
> > > > __target__("avx"), __min_vector_width__(256)))
> > > > _mm256_sub_ps(__m256 __a, __m256 __b)
> > > > {
> > > >   return (__m256)((__v8sf)__a-(__v8sf)__b);
> > > > }
> > > > and not target pragmas.
> > > >
> > > > Anyway, if we tweak our intrinsic headers so that
> > > > -#ifndef __AVX__
> > > >  #pragma GCC push_options
> > > >  #pragma GCC target("avx")
> > > > -#define __DISABLE_AVX__
> > > > -#endif /* __AVX__ */
> > > >
> > > > ...
> > > > -#ifdef __DISABLE_AVX__
> > > > -#undef __DISABLE_AVX__
> > > >  #pragma GCC pop_options
> > > > -#endif /* __DISABLE_AVX__ */
> > > > and do the opts_set->x_* & 2 stuff on explicit options coming out of
> > > > target/optimize pragmas and attributes, perhaps we don't even need
> > > > to introduce a new attribute and can handle everything magically:
> >
> > Oh, and any such changes will likely interact with Martins ideas to rework
> > how optimize and target attributes work (aka adding ontop of the
> > commandline options).  That is, attribute target will then not be enough
> > to remember the exact set of needed ISA features (as opposed to what
> > likely clang implements?)
> >
> > > > 1) if it is gnu_inline extern inline, allow indirect calls, otherwise
> > > > disallow them for always_inline functions
> > >
> > > There are a lot of intrinsics using extern inline __gnu_inline though...
> > >
> > > > 2) for the isa flags and option mismatches, only disallow opts_set->x_* 
> > > > & 2
> > > > stuff
> > > > This will keep both intrinsics and glibc fortify macros working fine
> > > > in all the needed use 

PING^1 [PATCH v5] : Add pragma GCC target("general-regs-only")

2021-07-31 Thread H.J. Lu via Gcc-patches
On Sat, Jul 17, 2021 at 6:45 PM H.J. Lu  wrote:
>
> On Thu, Apr 22, 2021 at 7:30 AM Richard Biener via Gcc-patches
>  wrote:
> >
> > On Thu, Apr 22, 2021 at 2:52 PM Richard Biener
> >  wrote:
> > >
> > > On Thu, Apr 22, 2021 at 2:22 PM Jakub Jelinek  wrote:
> > > >
> > > > On Thu, Apr 22, 2021 at 01:23:20PM +0200, Richard Biener via 
> > > > Gcc-patches wrote:
> > > > > > The question is if the pragma GCC target right now behaves 
> > > > > > incrementally
> > > > > > or not, whether
> > > > > > #pragma GCC target("avx2")
> > > > > > adds -mavx2 to options if it was missing before and nothing 
> > > > > > otherwise, or if
> > > > > > it switches other options off.  If it is incremental, we could e.g. 
> > > > > > try to
> > > > > > use the second least significant bit of global_options_set.x_* to 
> > > > > > mean
> > > > > > this option has been set explicitly by some surrounding #pragma GCC 
> > > > > > target.
> > > > > > The normal tests - global_options_set.x_flag_whatever could still 
> > > > > > work
> > > > > > fine because they wouldn't care if the option was explicit from 
> > > > > > anywhere
> > > > > > (command line or GCC target or target attribute) and just & 2 would 
> > > > > > mean
> > > > > > it was explicit from pragma GCC target; though there is the case of
> > > > > > bitfields... And then the inlining decision could check the & 2 
> > > > > > flags to
> > > > > > see what is required and what is just from command line.
> > > > > > Or we can have some other pragma GCC that would be like target but 
> > > > > > would
> > > > > > have flags that are explicit (and could e.g. be more restricted, to 
> > > > > > ISA
> > > > > > options only, and let those use in addition to #pragma GCC target.
> > > > >
> > > > > I'm still curious as to what you think will break if always-inline 
> > > > > does what
> > > > > it is documented to do.
> > > >
> > > > We will silently accept calling intrinsics that must be used only in 
> > > > certain
> > > > ISA contexts, which will lead to people writing non-portable code.
> > > >
> > > > So -O2 -mno-avx
> > > > #include 
> > > >
> > > > void
> > > > foo (__m256 *x)
> > > > {
> > > >   x[0] = _mm256_sub_ps (x[1], x[2]);
> > > > }
> > > > etc. will now be accepted when it shouldn't be.
> > > > clang rejects it like gcc with:
> > > > 1.c:6:10: error: always_inline function '_mm256_sub_ps' requires target 
> > > > feature 'avx', but would be inlined into function 'foo' that is 
> > > > compiled without support for 'avx'
> > > >   x[0] = _mm256_sub_ps (x[1], x[2]);
> > > >  ^
> > > >
> > > > Note, if I do:
> > > > #include 
> > > >
> > > > __attribute__((target ("no-sse3"))) void
> > > > foo (__m256 *x)
> > > > {
> > > >   x[0] = _mm256_sub_ps (x[1], x[2]);
> > > > }
> > > > and compile
> > > > clang -S -O2 -mavx2 1.c
> > > > 1.c:6:10: error: always_inline function '_mm256_sub_ps' requires target 
> > > > feature 'avx', but would be inlined into function 'foo' that is 
> > > > compiled without support for 'avx'
> > > >   x[0] = _mm256_sub_ps (x[1], x[2]);
> > > >  ^
> > > > then from the error message it seems that unlike GCC, clang remembers
> > > > the exact target features that are needed for the intrinsics and checks 
> > > > just
> > > > those.
> > > > Though, looking at the preprocessed source, seems it uses
> > > > static __inline __m256 __attribute__((__always_inline__, __nodebug__, 
> > > > __target__("avx"), __min_vector_width__(256)))
> > > > _mm256_sub_ps(__m256 __a, __m256 __b)
> > > > {
> > > >   return (__m256)((__v8sf)__a-(__v8sf)__b);
> > > > }
> > > > and not target pragmas.
> > > >
> > > > Anyway, if we tweak our intrinsic headers so that
> > > > -#ifndef __AVX__
> > > >  #pragma GCC push_options
> > > >  #pragma GCC target("avx")
> > > > -#define __DISABLE_AVX__
> > > > -#endif /* __AVX__ */
> > > >
> > > > ...
> > > > -#ifdef __DISABLE_AVX__
> > > > -#undef __DISABLE_AVX__
> > > >  #pragma GCC pop_options
> > > > -#endif /* __DISABLE_AVX__ */
> > > > and do the opts_set->x_* & 2 stuff on explicit options coming out of
> > > > target/optimize pragmas and attributes, perhaps we don't even need
> > > > to introduce a new attribute and can handle everything magically:
> >
> > Oh, and any such changes will likely interact with Martins ideas to rework
> > how optimize and target attributes work (aka adding ontop of the
> > commandline options).  That is, attribute target will then not be enough
> > to remember the exact set of needed ISA features (as opposed to what
> > likely clang implements?)
> >
> > > > 1) if it is gnu_inline extern inline, allow indirect calls, otherwise
> > > > disallow them for always_inline functions
> > >
> > > There are a lot of intrinsics using extern inline __gnu_inline though...
> > >
> > > > 2) for the isa flags and option mismatches, only disallow opts_set->x_* 
> > > > & 2
> > > > stuff
> > > > This will keep both intrinsics and glibc fortify macros working fine
> > > > in all the needed use 

[PATCH v5] : Add pragma GCC target("general-regs-only")

2021-07-17 Thread H.J. Lu via Gcc-patches
 new attribute and can handle everything magically:
>
> Oh, and any such changes will likely interact with Martins ideas to rework
> how optimize and target attributes work (aka adding ontop of the
> commandline options).  That is, attribute target will then not be enough
> to remember the exact set of needed ISA features (as opposed to what
> likely clang implements?)
>
> > > 1) if it is gnu_inline extern inline, allow indirect calls, otherwise
> > > disallow them for always_inline functions
> >
> > There are a lot of intrinsics using extern inline __gnu_inline though...
> >
> > > 2) for the isa flags and option mismatches, only disallow opts_set->x_* & 
> > > 2
> > > stuff
> > > This will keep both intrinsics and glibc fortify macros working fine
> > > in all the needed use cases.
> >
> > Yes, see my example in the other mail.
> >
> > I think before we add any new attributes we should sort out the
> > current mess, eventually adding some testcases for desired
> > diagnostic.
> >
> > Richard.
> >
> > > Jakub

Here is the v5 patch:

1. Intrinsics in  only require GPR ISAs.  Add

 #if defined __MMX__ || defined __SSE__
 #pragma GCC push_options
 #pragma GCC target("general-regs-only")
 #define __DISABLE_GENERAL_REGS_ONLY__
 #endif

and

 #ifdef __DISABLE_GENERAL_REGS_ONLY__
 #undef __DISABLE_GENERAL_REGS_ONLY__
 #pragma GCC pop_options
 #endif /* __DISABLE_GENERAL_REGS_ONLY__ */

to  to disable non-GPR ISAs so that they can be used in
functions with __attribute__ ((target("general-regs-only"))).
2. When checking always_inline attribute, if callee only uses GPRs,
ignore MASK_80387 since enable MASK_80387 in caller has no impact on
callee inline.

OK for master?

Thanks.

-- 
H.J.
From ac2c100dbaf838a378bd8b1209b57afc8a5c72fc Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Sat, 17 Jul 2021 07:44:45 -0700
Subject: [PATCH v5] : Add pragma GCC
 target("general-regs-only")

1. Intrinsics in  only require GPR ISAs.  Add

 #if defined __MMX__ || defined __SSE__
 #pragma GCC push_options
 #pragma GCC target("general-regs-only")
 #define __DISABLE_GENERAL_REGS_ONLY__
 #endif

and

 #ifdef __DISABLE_GENERAL_REGS_ONLY__
 #undef __DISABLE_GENERAL_REGS_ONLY__
 #pragma GCC pop_options
 #endif /* __DISABLE_GENERAL_REGS_ONLY__ */

to  to disable non-GPR ISAs so that they can be used in
functions with __attribute__ ((target("general-regs-only"))).
2. When checking always_inline attribute, if callee only uses GPRs,
ignore MASK_80387 since enable MASK_80387 in caller has no impact on
callee inline.

gcc/

	PR target/99744
	* config/i386/i386.c (ix86_can_inline_p): Ignore MASK_80387 if
	callee only uses GPRs.
	* config/i386/ia32intrin.h: Revert commit 5463cee2770.
	* config/i386/serializeintrin.h: Revert commit 71958f740f1.
	* config/i386/x86gprintrin.h: Add
	#pragma GCC target("general-regs-only") and #pragma GCC pop_options
	to disable non-GPR ISAs.

gcc/testsuite/

	PR target/99744
	* gcc.target/i386/pr99744-3.c: New test.
	* gcc.target/i386/pr99744-4.c: Likewise.
	* gcc.target/i386/pr99744-5.c: Likewise.
	* gcc.target/i386/pr99744-6.c: Likewise.
	* gcc.target/i386/pr99744-7.c: Likewise.
	* gcc.target/i386/pr99744-8.c: Likewise.
---
 gcc/config/i386/i386.c|   6 +-
 gcc/config/i386/ia32intrin.h  |  14 +-
 gcc/config/i386/serializeintrin.h |   7 +-
 gcc/config/i386/x86gprintrin.h|  11 +
 gcc/testsuite/gcc.target/i386/pr99744-3.c |  13 +
 gcc/testsuite/gcc.target/i386/pr99744-4.c | 357 ++
 gcc/testsuite/gcc.target/i386/pr99744-5.c |  25 ++
 gcc/testsuite/gcc.target/i386/pr99744-6.c |  23 ++
 gcc/testsuite/gcc.target/i386/pr99744-7.c |  14 +
 gcc/testsuite/gcc.target/i386/pr99744-8.c |  15 +
 10 files changed, 481 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr99744-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr99744-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr99744-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr99744-6.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr99744-7.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr99744-8.c

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 9d74b7a191b..da067b1586f 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -554,7 +554,7 @@ ix86_can_inline_p (tree caller, tree callee)
 
   /* Changes of those flags can be tolerated for always inlines. Lets hope
  user knows what he is doing.  */
-  const unsigned HOST_WIDE_INT always_inline_safe_mask
+  unsigned HOST_WIDE_INT always_inline_safe_mask
 	 = (MASK_USE_8BIT_IDIV | MASK_ACCUMULATE_OUTGOING_ARGS
 	| MASK_NO_ALIGN_STRINGOPS | MASK_AVX256_SPLIT_UNALIGNED_LOAD
 	| MASK_AVX256_SPLIT_UNALIGNED_STORE | MA