[PATCH] Fix tree-optimization/103257: Missed jump threading due to too early conversion of bool*A into bool?A:0

2021-11-17 Thread apinski--- via Gcc-patches
From: Andrew Pinski 

As with many optimizations at the gimple level, it sometimes makes sense
to do an optimization early and sometimes later. In this case, creating a
cond expression early causes other optimizations to be missed, so disable
the transformation until canonicalize_math_p () is false.
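
For illustration only, here is a minimal C sketch of the two forms the pattern converts between. The function names are made up for this sketch and do not appear in the patch; it only shows that the multiplication form and the conditional form compute the same value for integer operands.

```c
#include <assert.h>

/* Multiplication form: the comparison yields 0 or 1, which scales d.  */
int
mult_form (int m1, int m2, int d)
{
  return (m1 > m2) * d;
}

/* Conditional form, as produced by the (m1 CMP m2) * d pattern.  */
int
cond_form (int m1, int m2, int d)
{
  return (m1 > m2) ? d : 0;
}

/* Hypothetical self-check: both forms agree for the given inputs.  */
void
check_forms_agree (int m1, int m2, int d)
{
  assert (mult_form (m1, m2, d) == cond_form (m1, m2, d));
}
```

The two are equivalent in value; the question the patch addresses is only when the conversion should happen, not whether it is valid.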

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR tree-optimization/103257

gcc/ChangeLog:

* match.pd
(((m1 >/=/<= m2) * d -> (m1 >/=/<= m2) ? d : 0):
Disable until !canonicalize_math_p ().

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/vrp116.c: Check optimized instead of vrp1.
* gcc.dg/tree-ssa/pr103257-1.c: New test.
---
 gcc/match.pd   |  8 
 gcc/testsuite/gcc.dg/tree-ssa/pr103257-1.c | 11 +++
 gcc/testsuite/gcc.dg/tree-ssa/vrp116.c |  4 ++--
 3 files changed, 17 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr103257-1.c

diff --git a/gcc/match.pd b/gcc/match.pd
index dc3d505..0332d87 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -1781,10 +1781,10 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (convert (bit_and (bit_not @1) @0
 
 /* (m1 CMP m2) * d -> (m1 CMP m2) ? d : 0  */
-(for cmp (gt lt ge le)
-(simplify
- (mult (convert (cmp @0 @1)) @2)
-  (if (GIMPLE || !TREE_SIDE_EFFECTS (@2))
+(if (!canonicalize_math_p ())
+ (for cmp (gt lt ge le)
+  (simplify
+   (mult (convert (cmp @0 @1)) @2)
(cond (cmp @0 @1) @2 { build_zero_cst (type); }
 
 /* For integral types with undefined overflow and C != 0 fold
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr103257-1.c b/gcc/testsuite/gcc.dg/tree-ssa/pr103257-1.c
new file mode 100644
index 000..89f4f44
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr103257-1.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+void link_error(void);
+unsigned b, c;
+static short a(short e, short f) { return e * f; }
+int main() {
+  if (a(1  ^ ((0, 0) ^ 1 && b) <= b, c))
+link_error ();
+  c = 0;
+}
+/* { dg-final { scan-tree-dump-times "link_error" 0 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vrp116.c b/gcc/testsuite/gcc.dg/tree-ssa/vrp116.c
index d9d7b23..9e68a77 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/vrp116.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/vrp116.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-vrp1" } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
 
 int
 f (int m1, int m2, int c)
@@ -9,4 +9,4 @@ f (int m1, int m2, int c)
   return e ? m1 : m2;
 }
 
-/* { dg-final { scan-tree-dump-times "\\? c_\[0-9\]\\(D\\) : 0" 1 "vrp1" } } */
+/* { dg-final { scan-tree-dump-times "\\? c_\[0-9\]\\(D\\) : 0" 1 "optimized" } } */
-- 
1.8.3.1



Re: [PATCH] Fortran: Mark internal symbols as artificial [PR88009,PR68800]

2021-11-17 Thread Bernhard Reutner-Fischer via Gcc-patches
On Tue, 16 Nov 2021 21:46:32 +0100
Harald Anlauf via Fortran  wrote:

> Hi Bernhard,
> 
> I'm trying to understand your patch.  What does it really try to solve?

Compiler generated symbols should be marked artificial.
The fix for PR88009 (f8add009ce300f24b75e9c2e2cc5dd944a020c28,
r9-5194) added artificial just to the _final component and left out all
the rest.
Note that the majority of compiler generated symbols in class.c
already had artificial set properly.
The proposed patch amends the other generated symbols to be marked
artificial, too.

The other parts fix memory leaks.
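
As a rough sketch of the leak pattern in question (not the patch's actual fix, which switches to the stringpool), consider one pointer reused for two allocated names. make_name below is a hypothetical stand-in for xasprintf, and build_vtab_then_vtype is an invented name for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for xasprintf: returns a freshly allocated
   "<prefix><tname>" string that the caller must free.  */
char *
make_name (const char *prefix, const char *tname)
{
  assert (prefix != NULL && tname != NULL);
  int len = snprintf (NULL, 0, "%s%s", prefix, tname);
  if (len < 0)
    return NULL;
  char *name = malloc ((size_t) len + 1);
  if (name != NULL)
    snprintf (name, (size_t) len + 1, "%s%s", prefix, tname);
  return name;
}

/* Build two names in sequence through one pointer without leaking the
   first.  Returns the second name; the caller frees it.  */
char *
build_vtab_then_vtype (const char *tname)
{
  char *name = make_name ("__vtab_", tname);
  /* ... the vtab name would be used here ...  */
  free (name); /* Without this, the next assignment leaks the buffer.  */
  name = make_name ("__vtype_", tname);
  return name;
}
```

Interning the names in a string pool avoids the explicit free altogether, which is presumably why the patch goes that way.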

> 
> PR88009 is closed and seems to have nothing to do with this.

Well it marked only _final as artificial and forgot to adjust the
others as well.
We can remove the reference to PR88009 if you prefer?

thanks!
> 
> Harald
> 
> Am 14.11.21 um 23:17 schrieb Bernhard Reutner-Fischer via Fortran:
> > Hi!
> > 
> > Amend fix for PR88009 to mark all these class components as artificial.
> > 
> > gcc/fortran/ChangeLog:
> > 
> >  * class.c (gfc_build_class_symbol, generate_finalization_wrapper,
> >  (gfc_find_derived_vtab, find_intrinsic_vtab): Use stringpool for
> >  names. Mark internal symbols as artificial.
> >  * decl.c (gfc_match_decl_type_spec, gfc_match_end): Fix
> >  indentation.
> >  (gfc_match_derived_decl): Fix indentation. Check extension level
> >  before incrementing refs counter.
> >  * parse.c (parse_derived): Fix style.
> >  * resolve.c (resolve_global_procedure): Likewise.
> >  * symbol.c (gfc_check_conflict): Do not ignore artificial symbols.
> >  (gfc_add_flavor): Reorder condition, cheapest first.
> >  (gfc_new_symbol, gfc_get_sym_tree,
> >  generate_isocbinding_symbol): Fix style.
> >  * trans-expr.c (gfc_trans_subcomponent_assign): Remove
> >  restriction on !artificial.
> >  * match.c (gfc_match_equivalence): Special-case CLASS_DATA for
> >  warnings.
> > 
> > ---
> > gfc_match_equivalence(), too, should not bail-out early on the first
> > error but should diagnose all errors. I.e. not goto cleanup but set
> > err=true and continue in order to diagnose all constraints of a
> > statement. Maybe Sandra or somebody else will eventually find time to
> > tweak that.
> > 
> > I think it also plugs a very minor leak of name in gfc_find_derived_vtab
> > so I also tagged it [PR68800]. At least that was the initial
> > motivation to look at that spot.
> > We were doing
> > -  name = xasprintf ("__vtab_%s", tname);
> > ...
> >gfc_set_sym_referenced (vtab);
> > - name = xasprintf ("__vtype_%s", tname);
> > 
> > Bootstrapped and regtested without regressions on x86_64-unknown-linux.
> > Ok for trunk?
> >   
> 
> 



[PATCH, rs6000] optimization for vec_reve builtin [PR100868]

2021-11-17 Thread HAO CHEN GUI via Gcc-patches
Hi,

  This patch optimizes the vec_reve builtin on rs6000. For V2DI and V2DF, it
is implemented by xxswapd on all targets. When p9_vector is set, V16QI is
implemented by a quadword byte reverse alone, and V8HI, V4SI and V4SF are
implemented by a quadword byte reverse followed by a halfword/word byte
reverse.
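
To show why the two-step sequence reverses elements, here is a plain C model on a 16-byte array. It is only a sketch of the idea, not the RTL expansion; reverse_bytes and reverse_elements are names invented for this example.

```c
#include <assert.h>
#include <stddef.h>

#define VEC_BYTES 16

/* Reverse n bytes in place.  */
static void
reverse_bytes (unsigned char *p, size_t n)
{
  for (size_t i = 0; i < n / 2; i++)
    {
      unsigned char t = p[i];
      p[i] = p[n - 1 - i];
      p[n - 1 - i] = t;
    }
}

/* Reverse the order of the elem_size-byte elements of a 16-byte vector:
   first reverse all 16 bytes (the quadword step), then restore the byte
   order inside each element (the halfword/word step).  */
void
reverse_elements (unsigned char v[VEC_BYTES], size_t elem_size)
{
  assert (elem_size > 0 && VEC_BYTES % elem_size == 0);
  reverse_bytes (v, VEC_BYTES);
  for (size_t off = 0; off < VEC_BYTES; off += elem_size)
    reverse_bytes (v + off, elem_size);
}
```

With elem_size equal to 1 the second step is a no-op, which matches the V16QI case needing only the quadword byte reverse.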

  Bootstrapped and tested on powerpc64le-linux with no regressions. Is this 
okay for trunk? Any recommendations? Thanks a lot.

ChangeLog
2021-11-17 Haochen Gui 

gcc/
* config/rs6000/altivec.md (altivec_vreve2 for VEC_K): Use
xxbrq for v16qi, xxbrq + xxbrh for v8hi and xxbrq + xxbrw for v4si
or v4sf when p9_vector is set.
(altivec_vreve2 for VEC_64): Defined. Implemented by xxswapd.

gcc/testsuite/
* gcc.target/powerpc/vec_reve_1.c: New test.
* gcc.target/powerpc/vec_reve_2.c: Likewise.

patch.diff
diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 1351dafbc41..a1698ce85c0 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -4049,13 +4049,43 @@ (define_expand "altivec_negv4sf2"
   DONE;
 })

-;; Vector reverse elements
+;; Vector reverse elements for V16QI V8HI V4SI V4SF
 (define_expand "altivec_vreve2"
-  [(set (match_operand:VEC_A 0 "register_operand" "=v")
-   (unspec:VEC_A [(match_operand:VEC_A 1 "register_operand" "v")]
+  [(set (match_operand:VEC_K 0 "register_operand" "=v")
+   (unspec:VEC_K [(match_operand:VEC_K 1 "register_operand" "v")]
  UNSPEC_VREVEV))]
   "TARGET_ALTIVEC"
 {
+  if (TARGET_P9_VECTOR)
+{
+  if (mode == V16QImode)
+   emit_insn (gen_p9_xxbrq_v16qi (operands[0], operands[1]));
+  else if (mode == V8HImode)
+   {
+ rtx subreg1 = simplify_gen_subreg (V1TImode, operands[1],
+mode, 0);
+ rtx temp = gen_reg_rtx (V1TImode);
+ emit_insn (gen_p9_xxbrq_v1ti (temp, subreg1));
+ rtx subreg2 = simplify_gen_subreg (mode, temp,
+V1TImode, 0);
+ emit_insn (gen_p9_xxbrh_v8hi (operands[0], subreg2));
+   }
+  else /* V4SI and V4SF.  */
+   {
+ rtx subreg1 = simplify_gen_subreg (V1TImode, operands[1],
+mode, 0);
+ rtx temp = gen_reg_rtx (V1TImode);
+ emit_insn (gen_p9_xxbrq_v1ti (temp, subreg1));
+ rtx subreg2 = simplify_gen_subreg (mode, temp,
+V1TImode, 0);
+ if (mode == V4SImode)
+   emit_insn (gen_p9_xxbrw_v4si (operands[0], subreg2));
+ else
+   emit_insn (gen_p9_xxbrw_v4sf (operands[0], subreg2));
+   }
+  DONE;
+}
+
   int i, j, size, num_elements;
   rtvec v = rtvec_alloc (16);
   rtx mask = gen_reg_rtx (V16QImode);
@@ -4074,6 +4104,17 @@ (define_expand "altivec_vreve2"
   DONE;
 })

+;; Vector reverse elements for V2DI V2DF
+(define_expand "altivec_vreve2"
+  [(set (match_operand:VEC_64 0 "register_operand" "=v")
+   (unspec:VEC_64 [(match_operand:VEC_64 1 "register_operand" "v")]
+ UNSPEC_VREVEV))]
+  "TARGET_ALTIVEC"
+{
+  emit_insn (gen_xxswapd_ (operands[0], operands[1]));
+  DONE;
+})
+
 ;; Vector SIMD PEM v2.06c defines LVLX, LVLXL, LVRX, LVRXL,
 ;; STVLX, STVLXL, STVVRX, STVRXL are available only on Cell.
 (define_insn "altivec_lvlx"
diff --git a/gcc/testsuite/gcc.target/powerpc/vec_reve_1.c b/gcc/testsuite/gcc.target/powerpc/vec_reve_1.c
new file mode 100644
index 000..83a9206758b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec_reve_1.c
@@ -0,0 +1,16 @@
+/* { dg-require-effective-target powerpc_altivec_ok } */
+/* { dg-options "-O2 -maltivec" } */
+
+#include 
+
+vector double foo1 (vector double a)
+{
+   return vec_reve (a);
+}
+
+vector long long foo2 (vector long long a)
+{
+   return vec_reve (a);
+}
+
+/* { dg-final { scan-assembler-times {\mxxpermdi\M} 2 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec_reve_2.c b/gcc/testsuite/gcc.target/powerpc/vec_reve_2.c
new file mode 100644
index 000..b6dd33d6d79
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec_reve_2.c
@@ -0,0 +1,28 @@
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-mdejagnu-cpu=power9 -O2 -maltivec" } */
+
+#include 
+
+vector int foo1 (vector int a)
+{
+   return vec_reve (a);
+}
+
+vector float foo2 (vector float a)
+{
+   return vec_reve (a);
+}
+
+vector short foo3 (vector short a)
+{
+   return vec_reve (a);
+}
+
+vector char foo4 (vector char a)
+{
+   return vec_reve (a);
+}
+
+/* { dg-final { scan-assembler-times {\mxxbrq\M} 4 } } */
+/* { dg-final { scan-assembler-times {\mxxbrw\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxxbrh\M} 1 } } */

Re: [PATCH][GCC] arm: add armv9-a architecture to -march

2021-11-17 Thread Christophe Lyon via Gcc-patches
Hi,

Indeed I wasn't asking for a ~partial revert, rather making sure the new
behaviour is intended: current GCC trunk
won't build with A-profile multilibs unless using trunk binutils. We
probably had the same situation when
v8-a was introduced, just maybe I wasn't running CI with multilibs enabled
at that time.

Anyway, as Richard & Ramana said, if there is no real need for v9-a
multilibs, dropping them would save
build time and disk size.

Thanks,

Christophe


On Tue, Nov 16, 2021 at 1:29 PM Ramana Radhakrishnan <
ramana.radhakrish...@arm.com> wrote:

> Hi There,
>
>
>
> I think for AArch32 mapping it back to armv8-a sounds sufficient.  Unless
> we have string or math routines in newlib that make use of any ACLE guards
> that are beyond armv8-a …
>
>
>
> Ramana
>
>
>
>
>
> *From: *Richard Earnshaw 
> *Date: *Tuesday, 16 November 2021 at 11:48
> *To: *Christophe Lyon , Przemyslaw Wirkus <
> przemyslaw.wir...@arm.com>
> *Cc: *Ramana Radhakrishnan ,
> gcc-patches@gcc.gnu.org , Richard Earnshaw <
> richard.earns...@arm.com>
> *Subject: *Re: [PATCH][GCC] arm: add armv9-a architecture to -march
>
> You can't make an omelette without breaking eggs, as they say.  New
> architectures need new assemblers.
>
> However, I wonder if there's anything in v9-a that significantly affects
> the quality of the base multilib code needed for building the libraries.
>   It might be that we can deal with v9-a by just mapping it to the v8-a
> equivalents.  That would then avoid the need for an updated assembler,
> and reduce the build time and install footprint.
>
> R.
>
>
> On 16/11/2021 08:03, Christophe Lyon via Gcc-patches wrote:
> > Hi,
> >
> >
> > On Tue, Nov 9, 2021 at 12:36 PM Przemyslaw Wirkus via Gcc-patches <
> > gcc-patches@gcc.gnu.org> wrote:
> >
> > -Original Message-
> > From: Przemyslaw Wirkus
> > Sent: 18 October 2021 10:37
> > To: gcc-patches@gcc.gnu.org
> > Cc: Richard Earnshaw ; Ramana
> > Radhakrishnan ; Kyrylo Tkachov
> > ; ni...@redhat.com
> > Subject: [PATCH][GCC] arm: add armv9-a architecture to -march
> >
> > Hi,
> >
> > This patch is adding `armv9-a` to -march in Arm GCC.
> >
> > In this patch:
> >+ Add `armv9-a` to -march.
> >+ Update multilib with armv9-a and armv9-a+simd.
> >
> > After this patch three additional multilib directories are available:
> >
> > $ arm-none-eabi-gcc --print-multi-lib .; [...vanilla multi-lib
> > dirs...] thumb/v9-a/nofp;@mthumb@march=armv9-a@mfloat-abi=soft
> > thumb/v9-a+simd/softfp;@mthumb@march=armv9-a+simd@mfloat-
> > abi=softfp
> > thumb/v9-a+simd/hard;@mthumb@march=armv9-a+simd@mfloat-
> > abi=hard
> >
> >>
> >
> > This is causing a GCC build failure when using "old" binutils (I'm using
> > 2.36.1),
> > because the new -march=armv9-a option is not supported. This breaks the
> > multilib support.
> >
> > I don't remember how we handled similar cases in the past? Is that just
> > "expected", and
> > "current" GCC needs "current" binutils, or should we have a multilib list
> > dependent on
> > the actual binutils support? (I think this is not the case, and it sounds
> > like an undesirable
> > extra complication in an already overcrowded multilib-Makefile)
> >
> > Christophe
> >
>  New multi-lib directories under
> > $GCC_INSTALL_DIR/lib/gcc/arm-none-eabi/12.0.0/thumb are created:
> >
> > thumb/
> > +--- v9-a
> > ||--- nofp
> > |
> > +--- v9-a+simd
> >   |--- hard
> >   |--- softfp
> >
> > Regtested on arm-none-eabi cross and no issues.
> >
> > OK for master?
> >>
> >> Thanks.
> >>
> >> commit 32ba7860ccaddd5219e6dae94a3d0653e124c9dd
> >>
> >>> Ok.
> >>> Thanks,
> >>> Kyrill
> >>>
> >>>
> >
> > gcc/ChangeLog:
> >
> >* config/arm/arm-cpus.in (armv9): New define.
> >(ARMv9a): New group.
> >(armv9-a): New arch definition.
> >* config/arm/arm-tables.opt: Regenerate.
> >* config/arm/arm.h (BASE_ARCH_9A): New arch enum value.
> >* config/arm/t-aprofile: Added armv9-a and armv9+simd.
> >* config/arm/t-arm-elf: Added arm9-a, v9_fps and all_v9_archs
> >to MULTILIB_MATCHES.
> >* config/arm/t-multilib: Added v9_a_nosimd_variants and
> >v9_a_simd_variants to MULTILIB_MATCHES.
> >* doc/invoke.texi: Update docs.
> >
> > gcc/testsuite/ChangeLog:
> >
> >* gcc.target/arm/multilib.exp: Update test with armv9-a entries.
> >* lib/target-supports.exp (v9a): Add new armflag.
> >(__ARM_ARCH_9A__): Add new armdef.
> >
> > --
> > kind regards,
> > Przemyslaw Wirkus
> >>
> >>
>


Re: [RFC] c++: Print function template parms when relevant (was: [PATCH v4] c++: Add gnu::diagnose_as attribute)

2021-11-17 Thread Matthias Kretz
On Wednesday, 17 November 2021 07:09:18 CET Jason Merrill wrote:
> > -  if (CHECKING_P)
> > -SET_NON_DEFAULT_TEMPLATE_ARGS_COUNT (a, TREE_VEC_LENGTH (a));
> > +  SET_NON_DEFAULT_TEMPLATE_ARGS_COUNT (a, nondefault);
> 
> should have been
> 
> if (CHECKING_P || nondefault != TREE_VEC_LENGTH (a))
>SET_NON_DEFAULT_TEMPLATE_ARGS_COUNT (a, nondefault);

TBH, I don't understand the purpose of CHECKING_P here, or rather it makes me 
nervous because AFAIU I'm only testing with CHECKING_P enabled. Why make 
behavior dependent on CHECKING_P? I expected CHECKING_P to basically only add 
more assertions.

> > (copy_template_args): Jason?
> 
> Only copy the non-default template args count on TREE_VECs that should
> have it.

Why not simply set the count on all args? Is it a performance concern? The 
INTEGER_CST the TREE_CHAIN has to point to exists anyway, so it's not wasting 
any memory, right?

> 
> > +  /* Pretty print only template instantiations. Don't pretty print
> > explicit
> > + specializations like 'template <> void fun (int)'. 
> 
> This seems like a significant change of behavior unrelated to printing
> default template arguments.  What's the rationale for handling
> specializations differently from instantiations?

Right, this is about "The general idea of this change is to print template 
parms wherever they would appear in the source code as well".

Initially, the change to print function template arguments/parameters only if 
the args were explicitly specified led to printing 'void fun (T) [with T = 
...]' or 'template <> void fun (int)'. Both are not telling the full story, 
even if the former is how the function would be called. But if the reader 
should quickly recognize what code is getting called, it is helpful to see 
right away that a function template specialization is called. (It might also 
reveal an implementation detail of a library, so it's not 100% obvious how to 
choose here.) Also, saying 'T = int' is kind of wrong. Yes, 'int' was deduced. 
But there's no T in fun:

template  void fun (T);
template <> void fun (int);

__FUNCTION__ was 'fun' all the time, but __PRETTY_FUNCTION__ was 'void 
fun(T) [with T = int]'. It's more consistent that __PRETTY_FUNCTION__ contains 
__FUNCTION__, IMHO, so it would have to be at least 'void fun(T) [with T 
= int]'. But that's strange: How it uses T and int for the same type. So I 
settled on 'void fun(int)'.

> I also don't understand the purpose of TFF_AS_PRIMARY.

dump_function_decl generalizes the TEMPLATE_DECL (if flag_pretty_templates is 
true) and, before this change, passes the generalized TEMPLATE_DECL to 
dump_type (... DECL_CONTEXT (t) ...) and dump_function_name (... t ...). 
That's why the whole template is printed as primary template (i.e. with 
template parms instead of template args, as is needed for 
flag_pretty_templates). But this drops the count of non-default template args. 
To retain the count, dump_type and dump_function_name need to be called with 
the original TEMPLATE_DECL. But if I do this, pretty-templates is broken.
'template  struct A { template  void f(T, U); };' would 
print as 'A::f(T, U) [with U = float, T = int]'. To get back to 
'A::f(T, U) [with U = float, T = int]' I needed to tell 
dump_template_parms that even though the template args are there, it should 
print only the template parms. The most obvious way to do that was to carry it 
through via flags.

Note that this creates another problem. Given

template  struct Outer {
  template  struct A;
  template  struct A {
void f();
  };
};

we want to print e.g. 'void Outer::A::f() [with X = int, T0 = 
int]', but certainly not 'void Outer::A::f() [with X = int, T0 = 
int]'. However, specialized_t holds A which is printed as A 
with TFF_AS_PRIMARY. Only most_general_template of the function's 
TEMPLATE_DECL can give us A as DECL_CONTEXT.

I have a solution in the diagnose_as patch, where I had to solve a similar 
problem for the diagnose_as attribute (dump_template_scope).

> > +/* Print function template parameters if:
> > +   1. t is template, and
> > +   2. flags did not request "show only template-name", and
> > +   3. t is a specialization of a template (Why is this needed? This was
> > present +  since 1999 via !DECL_FRIEND_PSEUDO_TEMPLATE_INSTANTIATION:
> > "Don't crash if +  given a friend pseudo-instantiation". The
> > DECL_USE_TEMPLATE should likely +  inform the PRIMARY parameter of
> > dump_template_parms.), and
> 
> Good question.  It seems that the existing
> !DECL_FRIEND_PSEUDO_TEMPLATE_INSTANTIATION has mostly been excluding the
> most general template; removing that line changes things like
> 
> int bar(T) [with T = int]
> 
> to
> 
> int bar(T) [with T = int]
> 
> 
> 
> > +   4. either
> > +  - flags requests to show no function arguments, in which case
> > deduced +types could be hidden, or
> > +  - at least one function template argument was given explicitly, or
> > +  - we're printing a DWARF n

Re: [PATCH] x86: Add -mharden-sls=[none|all|return|indirect-branch]

2021-11-17 Thread Uros Bizjak via Gcc-patches
On Tue, Nov 16, 2021 at 7:20 PM H.J. Lu via Gcc-patches
 wrote:
>
> Add -mharden-sls= to mitigate against straight line speculation (SLS)
> for function return and indirect branch by adding an INT3 instruction
> after function return and indirect branch.
>
> gcc/
>
> PR target/102952
> * config/i386/i386-opts.h (harden_sls): New enum.
> * config/i386/i386.c (output_indirect_thunk): Mitigate against
> SLS for function return.
> (ix86_output_function_return): Likewise.
> (ix86_output_jmp_thunk_or_indirect): Mitigate against indirect
> branch.
> (ix86_output_indirect_jmp): Likewise.
> (ix86_output_call_insn): Likewise.
> * config/i386/i386.opt: Add -mharden-sls=.
> * doc/invoke.texi: Document -mharden-sls=.
>
> gcc/testsuite/
>
> PR target/102952
> * gcc.target/i386/harden-sls-1.c: New test.
> * gcc.target/i386/harden-sls-2.c: Likewise.
> * gcc.target/i386/harden-sls-3.c: Likewise.
> * gcc.target/i386/harden-sls-4.c: Likewise.
> ---
>  gcc/config/i386/i386-opts.h  |  7 +
>  gcc/config/i386/i386.c   | 30 
>  gcc/config/i386/i386.opt | 20 +
>  gcc/doc/invoke.texi  | 10 ++-
>  gcc/testsuite/gcc.target/i386/harden-sls-1.c | 14 +
>  gcc/testsuite/gcc.target/i386/harden-sls-2.c | 14 +
>  gcc/testsuite/gcc.target/i386/harden-sls-3.c | 14 +
>  gcc/testsuite/gcc.target/i386/harden-sls-4.c | 14 +
>  8 files changed, 116 insertions(+), 7 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/harden-sls-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/harden-sls-2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/harden-sls-3.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/harden-sls-4.c
>
> diff --git a/gcc/config/i386/i386-opts.h b/gcc/config/i386/i386-opts.h
> index 04e4ad608fb..171d3106d0a 100644
> --- a/gcc/config/i386/i386-opts.h
> +++ b/gcc/config/i386/i386-opts.h
> @@ -121,4 +121,11 @@ enum instrument_return {
>instrument_return_nop5
>  };
>
> +enum harden_sls {
> +  harden_sls_none = 0,
> +  harden_sls_return = 1 << 0,
> +  harden_sls_indirect_branch = 1 << 1,
> +  harden_sls_all = harden_sls_return | harden_sls_indirect_branch
> +};
> +
>  #endif
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index cc9f9322fad..0a902d66321 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -5914,6 +5914,8 @@ output_indirect_thunk (unsigned int regno)
>  }
>
>fputs ("\tret\n", asm_out_file);
> +  if ((ix86_harden_sls & harden_sls_return))
> +fputs ("\tint3\n", asm_out_file);
>  }
>
>  /* Output a funtion with a call and return thunk for indirect branch.
> @@ -15987,6 +15989,8 @@ ix86_output_jmp_thunk_or_indirect (const char 
> *thunk_name, const int regno)
>fprintf (asm_out_file, "\tjmp\t");
>assemble_name (asm_out_file, thunk_name);
>putc ('\n', asm_out_file);
> +  if ((ix86_harden_sls & harden_sls_indirect_branch))
> +   fputs ("\tint3\n", asm_out_file);
>  }
>else
>  output_indirect_thunk (regno);
> @@ -16212,10 +16216,14 @@ ix86_output_indirect_jmp (rtx call_op)
> gcc_unreachable ();
>
>ix86_output_indirect_branch (call_op, "%0", true);
> -  return "";
> +  if ((ix86_harden_sls & harden_sls_indirect_branch))
> +   return "int3";
> +  else
> +   return "";
>  }
>else
> -return "%!jmp\t%A0";
> +return ((ix86_harden_sls & harden_sls_indirect_branch)
> +   ? "%!jmp\t%A0\n\tint3" : "%!jmp\t%A0");
>  }

Just change existing returns to fputs and end function with:

return (ix86_harden_sls & harden_sls_indirect_branch) ? "int3" : "";
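
A small standalone C sketch of the shape suggested above, assuming the flag enum from the patch. The helper names parse_harden_sls and sls_suffix_for_indirect_jmp are hypothetical and exist only in this example; the real code lives in i386.c and the option machinery.

```c
#include <assert.h>
#include <string.h>

/* Bit flags mirroring the harden_sls enum from the patch.  */
enum harden_sls
{
  harden_sls_none = 0,
  harden_sls_return = 1 << 0,
  harden_sls_indirect_branch = 1 << 1,
  harden_sls_all = harden_sls_return | harden_sls_indirect_branch
};

/* Hypothetical helper: map an option argument to its flag value,
   or return -1 for an unknown argument.  */
int
parse_harden_sls (const char *arg)
{
  if (strcmp (arg, "none") == 0)
    return harden_sls_none;
  if (strcmp (arg, "all") == 0)
    return harden_sls_all;
  if (strcmp (arg, "return") == 0)
    return harden_sls_return;
  if (strcmp (arg, "indirect-branch") == 0)
    return harden_sls_indirect_branch;
  return -1;
}

/* Hypothetical helper: the single final return, yielding the trailing
   instruction to emit after an indirect jump.  */
const char *
sls_suffix_for_indirect_jmp (int flags)
{
  assert ((flags & ~harden_sls_all) == 0);
  return (flags & harden_sls_indirect_branch) ? "int3" : "";
}
```

Keeping one exit point means the hardening decision is made in exactly one place, however the jump itself was emitted.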

>  /* Output return instrumentation for current function if needed.  */
> @@ -16283,10 +16291,15 @@ ix86_output_function_return (bool long_p)
>return "";
>  }
>
> -  if (!long_p)
> -return "%!ret";
> +  if ((ix86_harden_sls & harden_sls_return))
> +return "%!ret\n\tint3";
> +  else
> +{
> +  if (!long_p)
> +   return "%!ret";
>
> -  return "rep%; ret";
> +  return "rep%; ret";
> +}
>  }

Also here.

>
>  /* Output indirect function return.  RET_OP is the function return
> @@ -16381,7 +16394,12 @@ ix86_output_call_insn (rtx_insn *insn, rtx call_op)
>if (output_indirect_p && !direct_p)
> ix86_output_indirect_branch (call_op, xasm, true);
>else
> -   output_asm_insn (xasm, &call_op);
> +   {
> + output_asm_insn (xasm, &call_op);
> + if (!direct_p
> + && (ix86_harden_sls & harden_sls_indirect_branch))
> +   return "int3";
> +   }
>return "";
>  }
>
> diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
> index b38ac13fc91..c5452c49597 100644
> --- a/gcc/config/i386/i386.opt
> +++ b/gcc/config/i386/i386.opt
> @@ -1

Re: [PATCH v4 1/1] [ARM] Add support for TLS register based stack protector canary access

2021-11-17 Thread Ramana Radhakrishnan via Gcc-patches
Thanks Ard and Qing.

I have been busy with other things in the last few weeks and I don’t work on 
GCC as part of my day job; however, I’ll try to find some time to review this 
patch set in the coming days.

Sorry about the delay.

Regards,
Ramana

From: Ard Biesheuvel 
Date: Tuesday, 9 November 2021 at 22:03
To: Qing Zhao 
Cc: Ramana Radhakrishnan , 
linux-harden...@vger.kernel.org , kees Cook 
, Keith Packard , 
thomas.preudho...@celest.fr , 
adhemerval.zane...@linaro.org , Richard 
Sandiford , gcc-patches@gcc.gnu.org 

Subject: Re: [PATCH v4 1/1] [ARM] Add support for TLS register based stack 
protector canary access
On Tue, 9 Nov 2021 at 21:45, Qing Zhao  wrote:
>
> Hi, Ard,
>
> Sorry for the late reply (since I don’t have the right to approve a patch, I 
> have been waiting for an arm port maintainer to review this patch).
> The following is the arm port maintainer information I got from the MAINTAINERS 
> file (you might want to explicitly cc one of them for a review)
>
> arm port    Nick Clifton
> arm port    Richard Earnshaw
> arm port    Ramana Radhakrishnan
> arm port    Kyrylo Tkachov
>
> I see that Ramana implemented a similar patch for aarch64 (commit 
> cd0b2d361df82c848dc7e1c3078651bb0624c3c6), so I am CCing him on this 
> email. Hopefully he will review this patch.
>

Thank you Qing. But I know Ramana well, and I know he no longer works
on GCC. I collaborated with him on the AArch64 implementation at the
time (but he wrote all the code)

> Anyway, I briefly read your patch (version 4), and have the following 
> questions and comments:
>
> 1.  When the option -mstack-protector-guard=tls is present, should the option 
> -mstack-protector-guard-offset=.. also be required?
>  If it is required, you might want to add that requirement to 
> the documentation, and also issue an error when it is missing.
>  This is not clear from the current implementation, so you might 
> need to update both "arm_option_override_internal" in arm.c
>  and doc/invoke.texi to make this clear.
>

An  offset of 0x0 is a reasonable default, so I don't think it is
necessary to require the offset param to be passed in that case.
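
To make the role of the offset concrete, here is a toy C model, not the ARM code the patch generates. tls_base stands in for the value held in the TLS register and is an ordinary buffer here; load_canary is a hypothetical name, and the offset is unsigned in this sketch for simplicity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a pointer-sized canary stored `offset` bytes past a per-thread
   base.  In this sketch `tls_base` is a plain buffer of `size` bytes
   standing in for the memory the TLS register points at.  */
uintptr_t
load_canary (const unsigned char *tls_base, size_t size, size_t offset)
{
  uintptr_t canary;
  assert (tls_base != NULL);
  assert (offset <= size && size - offset >= sizeof canary);
  memcpy (&canary, tls_base + offset, sizeof canary);
  return canary;
}
```

With an offset of 0 the canary is simply the first word at the TLS base, so each thread (or CPU) gets a distinct canary just by having a distinct base.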

> 2. For arm, is there only one system register that can be used for this purpose?
>

There are other registers that might be used in the same way, but the
TLS register is the obvious choice. On AArch64, we decided to use
'sysreg' and permit the user to specify the register because the Linux
kernel uses the user space stack pointer (SP_EL0), which is kind of
odd so we did not want to hard code that.

> 3. I didn’t see any test cases for the functionality you added; I 
> think test cases are needed.
>

Yes, I am aware of that. I'm just not sure I know how to proceed here:
any pointers?

> More comments are embedded below:
>
> > On Oct 28, 2021, at 6:27 AM, Ard Biesheuvel  wrote:
> >
> > Add support for accessing the stack canary value via the TLS register,
> > so that multiple threads running in the same address space can use
> > distinct canary values. This is intended for the Linux kernel running in
> > SMP mode, where processes entering the kernel are essentially threads
> > running the same program concurrently: using a global variable for the
> > canary in that context is problematic because it can never be rotated,
> > and so the OS is forced to use the same value as long as it remains up.
> >
> > Using the TLS register to index the stack canary helps with this, as it
> > allows each CPU to context switch the TLS register along with the rest
> > of the process, permitting each process to use its own value for the
> > stack canary.
> >
> > 2021-10-28 Ard Biesheuvel 
> >
> >   * config/arm/arm-opts.h (enum stack_protector_guard): New
> >   * config/arm/arm-protos.h (arm_stack_protect_tls_canary_mem):
> >   New
> >   * config/arm/arm.c (TARGET_STACK_PROTECT_GUARD): Define
> >   (arm_option_override_internal): Handle and put in error checks
> >   for stack protector guard options.
> >   (arm_option_reconfigure_globals): Likewise
> >   (arm_stack_protect_tls_canary_mem): New
> >   (arm_stack_protect_guard): New
> >   * config/arm/arm.md (stack_protect_set): New
> >   (stack_protect_set_tls): Likewise
> >   (stack_protect_test): Likewise
> >   (stack_protect_test_tls): Likewise
> >   (reload_tp_hard): Likewise
> >   * config/arm/arm.opt (-mstack-protector-guard): New
> >   (-mstack-protector-guard-offset): New.
> >   * doc/invoke.texi: Document new options
> >
> > Signed-off-by: Ard Biesheuvel 
> > ---
> > gcc/config/arm/arm-opts.h   |  6 ++
> > gcc/config/arm/arm-protos.h |  2 +
> > gcc/config/arm/arm.c| 55 +++
> > gcc/config/arm/arm.md   | 71 +++-
> > gcc/config/arm/arm.opt  | 22 ++
> > gcc/doc/invoke.texi |  9 +++
> > 6 files changed, 163 ins

Re: [PATCH] x86: Add -mindirect-branch-cs-prefix

2021-11-17 Thread Uros Bizjak via Gcc-patches
On Tue, Nov 16, 2021 at 7:51 PM H.J. Lu via Gcc-patches
 wrote:
>
> Add -mindirect-branch-cs-prefix to add CS prefix to call and jmp to thunk
> via r8-r15 registers when converting indirect call and jump to increase
> the instruction length to 6, allowing the non-thunk form to be inlined.
>
> gcc/
>
> PR target/102952
> * config/i386/i386.c (ix86_output_jmp_thunk_or_indirect): Emit
> CS prefix for -mindirect-branch-cs-prefix.
> (ix86_output_indirect_branch_via_reg): Likewise.
> * config/i386/i386.opt: Add -mindirect-branch-cs-prefix.
> * doc/invoke.texi: Document -mindirect-branch-cs-prefix.
>
> gcc/testsuite/
>
> PR target/102952
> * gcc.target/i386/indirect-thunk-cs-prefix-1.c: New test.
> * gcc.target/i386/indirect-thunk-cs-prefix-2.c: Likewise.
> ---
>  gcc/config/i386/i386.c|  6 ++
>  gcc/config/i386/i386.opt  |  4 
>  gcc/doc/invoke.texi   |  8 +++-
>  .../gcc.target/i386/indirect-thunk-cs-prefix-1.c  | 14 ++
>  .../gcc.target/i386/indirect-thunk-cs-prefix-2.c  | 15 +++
>  5 files changed, 46 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-cs-prefix-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-cs-prefix-2.c
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 7e9b7bc347f..0a902d66321 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -15983,6 +15983,9 @@ ix86_output_jmp_thunk_or_indirect (const char 
> *thunk_name, const int regno)
>  {
>if (thunk_name != NULL)
>  {
> +  if (regno >= FIRST_REX_INT_REG

 REX_INT_REGNO_P

> + && ix86_indirect_branch_cs_prefix)
> +   fprintf (asm_out_file, "\tcs\n");
>fprintf (asm_out_file, "\tjmp\t");
>assemble_name (asm_out_file, thunk_name);
>putc ('\n', asm_out_file);
> @@ -16036,6 +16039,9 @@ ix86_output_indirect_branch_via_reg (rtx call_op, 
> bool sibcall_p)
>  {
>if (thunk_name != NULL)
> {
> + if (regno >= FIRST_REX_INT_REG

 REX_INT_REGNO_P

> + && ix86_indirect_branch_cs_prefix)
> +   fprintf (asm_out_file, "\tcs\n");
>   fprintf (asm_out_file, "\tcall\t");
>   assemble_name (asm_out_file, thunk_name);
>   putc ('\n', asm_out_file);
> diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
> index 8d499a5a4df..c5452c49597 100644
> --- a/gcc/config/i386/i386.opt
> +++ b/gcc/config/i386/i386.opt
> @@ -1076,6 +1076,10 @@ Enum(indirect_branch) String(thunk-inline) 
> Value(indirect_branch_thunk_inline)
>  EnumValue
>  Enum(indirect_branch) String(thunk-extern) 
> Value(indirect_branch_thunk_extern)
>
> +mindirect-branch-cs-prefix
> +Target Var(ix86_indirect_branch_cs_prefix) Init(0)
> +Add CS prefix to call and jmp to thunk when converting indirect call and 
> jump.

This is not what the function really does. It adds cs to REX prefixed regs.

> +
>  mindirect-branch-register
>  Target Var(ix86_indirect_branch_register) Init(0)
>  Force indirect call and jump via register.
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index f3b4b467765..c992a7152f5 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -1425,7 +1425,8 @@ See RS/6000 and PowerPC Options.
>  -mstack-protector-guard-symbol=@var{symbol} @gol
>  -mgeneral-regs-only  -mcall-ms2sysv-xlogues -mrelax-cmpxchg-loop @gol
>  -mindirect-branch=@var{choice}  -mfunction-return=@var{choice} @gol
> --mindirect-branch-register -mharden-sls=@var{choice} -mneeded}
> +-mindirect-branch-register -mharden-sls=@var{choice} @gol
> +-mindirect-branch-cs-prefix -mneeded}
>
>  @emph{x86 Windows Options}
>  @gccoptlist{-mconsole  -mcygwin  -mno-cygwin  -mdll @gol
> @@ -32390,6 +32391,11 @@ hardening.  @samp{return} enables SLS hardening for 
> function return.
>  @samp{indirect-branch} enables SLS hardening for indirect branch.
>  @samp{all} enables all SLS hardening.
>
> +@item -mindirect-branch-cs-prefix
> +@opindex mindirect-branch-cs-prefix
> +Add CS prefix to call and jmp to thunk via r8-r15 registers when
> +converting indirect call and jump.
> +
>  @end table
>
>  These @samp{-m} switches are supported in addition to the above
> diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-cs-prefix-1.c 
> b/gcc/testsuite/gcc.target/i386/indirect-thunk-cs-prefix-1.c
> new file mode 100644
> index 000..db2f3416823
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-cs-prefix-1.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile { target { ! ia32 } } } */
> +/* { dg-options "-O2 -ffixed-rax -ffixed-rbx -ffixed-rcx -ffixed-rdx 
> -ffixed-rdi -ffixed-rsi -mindirect-branch-cs-prefix 
> -mindirect-branch=thunk-extern" } */
> +/* { dg-additional-options "-fno-pic" { target { ! *-*-darwin* } } } */
> +
> +extern void (*fptr) (void);
> +
> +void
> +foo (void)
> +{
>

[PATCH] lim: Reset flow sensitive info even for pointers [PR103192]

2021-11-17 Thread Jakub Jelinek via Gcc-patches
Hi!

Since 2014, lim has been clearing SSA_NAME_RANGE_INFO for integral SSA_NAMEs
when moving them from conditional contexts inside a loop to an unconditional
position before the loop.  As the miscompilation of gimplify.c shows, we need
to treat pointers the same way: for them too we need to reset whether the
pointer can/can't be null and the recorded pointer alignment.

This fixes
-FAIL: libgomp.c/../libgomp.c-c++-common/target-in-reduction-2.c (internal 
compiler error)
-FAIL: libgomp.c/../libgomp.c-c++-common/target-in-reduction-2.c (test for 
excess errors)
-UNRESOLVED: libgomp.c/../libgomp.c-c++-common/target-in-reduction-2.c 
compilation failed to produce executable
-FAIL: libgomp.c++/../libgomp.c-c++-common/target-in-reduction-2.c (internal 
compiler error)
-FAIL: libgomp.c++/../libgomp.c-c++-common/target-in-reduction-2.c (test for 
excess errors)
-UNRESOLVED: libgomp.c++/../libgomp.c-c++-common/target-in-reduction-2.c 
compilation failed to produce executable
-FAIL: libgomp.c++/target-in-reduction-2.C (internal compiler error)
-FAIL: libgomp.c++/target-in-reduction-2.C (test for excess errors)
-UNRESOLVED: libgomp.c++/target-in-reduction-2.C compilation failed to produce 
executable
on both x86_64 and i686.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2021-11-17  Jakub Jelinek  

PR tree-optimization/103192
* tree-ssa-loop-im.c (move_computations_worker): Use
reset_flow_sensitive_info instead of manually clearing
SSA_NAME_RANGE_INFO and do it for all SSA_NAMEs, not just ones
with integral types.

--- gcc/tree-ssa-loop-im.c.jj   2021-09-11 09:33:37.936331237 +0200
+++ gcc/tree-ssa-loop-im.c  2021-11-16 14:42:19.546562429 +0100
@@ -1183,14 +1183,10 @@ move_computations_worker (basic_block bb
  COND_EXPR, t, arg0, arg1);
  todo |= TODO_cleanup_cfg;
}
-  if (INTEGRAL_TYPE_P (TREE_TYPE (gimple_assign_lhs (new_stmt)))
- && (!ALWAYS_EXECUTED_IN (bb)
- || (ALWAYS_EXECUTED_IN (bb) != level
- && !flow_loop_nested_p (ALWAYS_EXECUTED_IN (bb), level
-   {
- tree lhs = gimple_assign_lhs (new_stmt);
- SSA_NAME_RANGE_INFO (lhs) = NULL;
-   }
+  if (!ALWAYS_EXECUTED_IN (bb)
+ || (ALWAYS_EXECUTED_IN (bb) != level
+ && !flow_loop_nested_p (ALWAYS_EXECUTED_IN (bb), level)))
+   reset_flow_sensitive_info (gimple_assign_lhs (new_stmt));
   gsi_insert_on_edge (loop_preheader_edge (level), new_stmt);
   remove_phi_node (&bsi, false);
 }
@@ -1253,14 +1249,10 @@ move_computations_worker (basic_block bb
   gsi_remove (&bsi, false);
   if (gimple_has_lhs (stmt)
  && TREE_CODE (gimple_get_lhs (stmt)) == SSA_NAME
- && INTEGRAL_TYPE_P (TREE_TYPE (gimple_get_lhs (stmt)))
  && (!ALWAYS_EXECUTED_IN (bb)
  || !(ALWAYS_EXECUTED_IN (bb) == level
   || flow_loop_nested_p (ALWAYS_EXECUTED_IN (bb), level
-   {
- tree lhs = gimple_get_lhs (stmt);
- SSA_NAME_RANGE_INFO (lhs) = NULL;
-   }
+   reset_flow_sensitive_info (gimple_get_lhs (stmt));
   /* In case this is a stmt that is not unconditionally executed
  when the target loop header is executed and the stmt may
 invoke undefined integer or pointer overflow rewrite it to

Jakub
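
To illustrate the kind of code affected, here is a minimal sketch (the names
struct pair and sum_b are made up for this example and do not come from the
PR).  The address &p->b is loop invariant but is only computed under the
p != NULL guard, so a "non-null" fact recorded for it holds only inside the
guard.  If lim hoists the computation in front of the loop, it runs
unconditionally, and that fact must be dropped.

```c
#include <stddef.h>

/* Hypothetical type used only for this illustration.  */
struct pair { int a; int b; };

/* Sum p->b n times, tolerating a null p.  */
int
sum_b (const struct pair *p, int n)
{
  int sum = 0;
  for (int i = 0; i < n; i++)
    if (p != NULL)
      {
        /* Loop invariant, but conditional: inside the guard q is known
           to be non-null.  Once hoisted before the loop, the same
           computation also executes when p == NULL, so any non-null or
           alignment info attached to q is no longer valid there.  */
        const int *q = &p->b;
        sum += *q;
      }
  return sum;
}
```

Calling sum_b (NULL, n) is valid and has to keep returning 0, which is why
stale flow-sensitive info on the hoisted name is dangerous.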



[PATCH] libcpp: Fix up handling of block comments in -fdirectives-only mode [PR103130]

2021-11-17 Thread Jakub Jelinek via Gcc-patches
Hi!

Normal preprocessing, -fdirectives-only preprocessing before Nathan's
rewrite, and all other compilers I've tried on godbolt treat even \*/
as the end of a block comment, but the new -fdirectives-only handling doesn't.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?

2021-11-17  Jakub Jelinek  

PR preprocessor/103130
* lex.c (cpp_directive_only_process): Treat even \*/ as end of block
comment.

* c-c++-common/cpp/dir-only-9.c: New test.

--- libcpp/lex.c.jj 2021-11-01 14:37:06.706853026 +0100
+++ libcpp/lex.c    2021-11-16 16:54:04.022644499 +0100
@@ -4493,7 +4493,7 @@ cpp_directive_only_process (cpp_reader *
break;
 
  case '*':
-   if (pos > peek && !esc)
+   if (pos > peek)
  star = is_block;
esc = false;
break;
--- gcc/testsuite/c-c++-common/cpp/dir-only-9.c.jj  2021-11-16 
16:56:57.121217975 +0100
+++ gcc/testsuite/c-c++-common/cpp/dir-only-9.c 2021-11-16 16:56:14.524815094 
+0100
@@ -0,0 +1,13 @@
+/* PR preprocessor/103130 */
+/* { dg-do preprocess } */
+/* { dg-options -fdirectives-only } */
+
+/*\
+ * this is a comment
+\*/
+
+int
+main ()
+{
+  return 0;
+}

Jakub
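
The intended behaviour can be sketched with a tiny standalone scanner.  This
is only an illustration, not the libcpp code: skip_block_comment is a
hypothetical helper, and it assumes buf starts with the opening "/*".  The
point is that a backslash right before the '*' does not stop "*/" from
closing the comment.

```c
#include <stddef.h>

/* Return the index just past the closing star-slash of the block comment
   that starts at buf[0], or 0 if the comment is not terminated within
   len bytes.  Hypothetical helper, for illustration only.  */
size_t
skip_block_comment (const char *buf, size_t len)
{
  int star = 0;  /* Was the previous character a '*'?  */

  /* Start after the opening slash-star so that its '*' cannot pair with
     an immediately following '/'.  */
  for (size_t pos = 2; pos < len; pos++)
    {
      char c = buf[pos];
      if (c == '/' && star)
        return pos + 1;
      /* Whatever preceded the '*', including a backslash, is ignored.  */
      star = (c == '*');
    }
  return 0;
}
```

With the comment from dir-only-9.c as input, the scan ends at the final
"\*/" rather than running to the end of the buffer.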



Re: [PATCH 2/6] Add returns_zero_on_success/failure attributes

2021-11-17 Thread Prathamesh Kulkarni via Gcc-patches
On Tue, 16 Nov 2021 at 03:42, David Malcolm  wrote:
>
> On Mon, 2021-11-15 at 12:33 +0530, Prathamesh Kulkarni wrote:
> > On Sun, 14 Nov 2021 at 02:07, David Malcolm via Gcc-patches
> >  wrote:
> > >
> > > This patch adds two new attributes.  The followup patch makes use of
> > > the attributes in -fanalyzer.
>
> [...snip...]
>
> > > +/* Handle "returns_zero_on_failure" and "returns_zero_on_success"
> > > attributes;
> > > +   arguments as in struct attribute_spec.handler.  */
> > > +
> > > +static tree
> > > +handle_returns_zero_on_attributes (tree *node, tree name, tree,
> > > int,
> > > +  bool *no_add_attrs)
> > > +{
> > > +  if (!INTEGRAL_TYPE_P (TREE_TYPE (*node)))
> > > +{
> > > +  error ("%qE attribute on a function not returning an
> > > integral type",
> > > +name);
> > > +  *no_add_attrs = true;
> > > +}
> > > +  return NULL_TREE;
> > Hi David,
>
> Thanks for the ideas.
>
> > Just curious if a warning should be emitted if the function is marked
> > with the attribute but its return value isn't actually 0?
>
> That sounds like a worthwhile extension of the idea.  It should be
> possible to identify functions that can't return zero or non-zero that
> have been marked as being able to.
>
> That said:
>
> (a) if you apply the attribute to a function pointer for a callback,
> you could have an implementation of the callback that always fails and
> returns, say, -1; should the warning complain that the function has the
> "returns_zero_on_success" property and is always returning -1?
Ah OK. In that case, emitting a diagnostic if the return value
isn't 0, doesn't make sense for "returns_zero_on_success" since the
function "always fails".
Thanks for pointing out!
>
> (b) the attributes introduce a concept of "success" vs "failure", which
> might be hard for a machine to determine.  It's only used later on in
> terms of the events presented to the user, so that -fanalyzer can emit
> e.g. "when 'copy_from_user' fails", which IMHO is easier to read than
> "when 'copy_from_user' returns non-zero".
Indeed.
>
> >
> > There are other constants like -1 or 1 that are often used to indicate
> > error, so maybe tweak the attribute to
> > take the integer as an argument ?
> > Sth like returns_int_on_success(cst) / returns_int_on_failure(cst) ?
>
> Those could work nicely; I like the idea of them being supplementary to
> the returns_zero_on_* ones.
>
> I got the urge to bikeshed about wording; some ideas:
>   success_return_value(CST)
>   failure_return_value(CST)
> or maybe additionally:
>   success_return_range(LOWER_BOUND_CST, UPPER_BOUND_CST)
>   failure_return_range(LOWER_BOUND_CST, UPPER_BOUND_CST)
Extending to range is a nice idea ;-)
Apart from success / failure, if we just had an attribute
return_range(low_cst, high_cst), I suppose that could
be useful for return value optimization ?
>
> I can also imagine a
>   sets_errno_on_failure
> attribute being useful (and perhaps a "doesnt_touch_errno"???)
More generally, would it be a good idea to provide attributes for
mod/ref analysis?
So sth like:
void foo(void) __attribute__((modifies(errno)));
which would state that foo modifies errno, but neither reads nor
modifies any other global var.
and
void bar(void) __attribute__((reads(errno)))
which would state that bar only reads errno, and doesn't modify or
read any other global var.
I guess that can benefit optimization, since we can have better
context about side-effects of a function call.
For success / failure context, we could add attributes
modifies_on_success, modifies_on_failure ?

Thanks,
Prathamesh
>
> > Also, would it make sense to extend it for pointers too for returning
> > NULL on success / failure ?
>
> Possibly expressible by generalizing it to allow pointer types, or by
> adding this pair:
>
>   returns_null_on_failure
>   returns_null_on_success
>
> or by using the "range" idea above.
>
> In terms of scope, for the trust boundary stuff, I want to be able to
> express the idea that a call can succeed vs fail, what the success vs
> failure is in terms of nonzero vs zero, and to be able to wire up the
> heuristic that if it looks like a "copy function" (use of access
> attributes and a size), that success/failure can mean "copies all of
> it" vs "copies none of it" (which seems to get decent test coverage on
> the Linux kernel with the copy_from/to_user fns).
>
> Thanks
> Dave
>
>
> >
> > Thanks,
> > Prathamesh
>
> [...snip...]
>
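
For readers following the thread, a minimal sketch of how the proposed
attribute might be applied.  The attribute is only a proposal in this patch
series, so the sketch hides it behind a macro that expands to nothing;
RETURNS_ZERO_ON_SUCCESS and copy_checked are names invented for this
example.

```c
#include <stddef.h>
#include <string.h>

/* The attribute below is only proposed in this thread.  Expanding the
   macro to nothing keeps the sketch compilable with any compiler; with
   the patch applied it could presumably expand to
   __attribute__((returns_zero_on_success)).  */
#define RETURNS_ZERO_ON_SUCCESS

/* Hypothetical "copy function": copies all of SRC or none of it.  */
RETURNS_ZERO_ON_SUCCESS
int
copy_checked (void *dst, size_t dst_size, const void *src, size_t n)
{
  if (dst == NULL || src == NULL || n > dst_size)
    return -1;          /* Failure: nothing was copied.  */
  memcpy (dst, src, n);
  return 0;             /* Success: everything was copied.  */
}
```

This matches the "copies all of it" vs "copies none of it" reading described
above, with zero meaning success and any non-zero value meaning failure.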


[PATCH] ranger: Fix up fold_using_range::range_of_address [PR103255]

2021-11-17 Thread Jakub Jelinek via Gcc-patches
Hi!

If on &base->member the offset isn't constant or isn't zero and
-fdelete-null-pointer-checks and not -fwrapv-pointer and base has a range
that doesn't include NULL, we return the range of the base.
Usually it isn't a big deal, because for most pointers we just use
varying, range_zero and range_nonzero ranges and nothing beyond that,
but if a pointer is initialized from a constant, we actually track the
exact range and in that case this causes miscompilation.
As discussed on IRC, I think doing something like:
  offset_int off2;
  if (off_cst && off.is_constant (&off2))
{
  tree cst = wide_int_to_tree (sizetype, off2 / BITS_PER_UNIT);
  // adjust range r with POINTER_PLUS_EXPR cst
  if (!range_includes_zero_p (&r))
return true;
}
  // Fallback
  r = range_nonzero (TREE_TYPE (gimple_assign_rhs1 (stmt)));
  return true;
could work, but given that most of the pointer ranges are just the simple ones,
perhaps it is too much for little benefit.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2021-11-17  Jakub Jelinek  

PR tree-optimization/103255
* gimple-range-fold.cc (fold_using_range::range_of_address): Return
range_nonzero rather than unadjusted base's range.  Formatting fixes.

* gcc.c-torture/execute/pr103255.c: New test.

--- gcc/gimple-range-fold.cc.jj 2021-11-04 12:27:02.341298923 +0100
+++ gcc/gimple-range-fold.cc    2021-11-16 22:10:44.453974329 +0100
@@ -720,14 +720,20 @@ fold_using_range::range_of_address (iran
}
   /* If &X->a is equal to X, the range of X is the result.  */
   if (off_cst && known_eq (off, 0))
- return true;
+   return true;
   else if (flag_delete_null_pointer_checks
   && !TYPE_OVERFLOW_WRAPS (TREE_TYPE (expr)))
{
-/* For -fdelete-null-pointer-checks -fno-wrapv-pointer we don't
-allow going from non-NULL pointer to NULL.  */
-  if(!range_includes_zero_p (&r))
-   return true;
+ /* For -fdelete-null-pointer-checks -fno-wrapv-pointer we don't
+allow going from non-NULL pointer to NULL.  */
+ if (!range_includes_zero_p (&r))
+   {
+ /* We could here instead adjust r by off >> LOG2_BITS_PER_UNIT
+using POINTER_PLUS_EXPR if off_cst and just fall back to
+this.  */
+ r = range_nonzero (TREE_TYPE (gimple_assign_rhs1 (stmt)));
+ return true;
+   }
}
   /* If MEM_REF has a "positive" offset, consider it non-NULL
 always, for -fdelete-null-pointer-checks also "negative"
--- gcc/testsuite/gcc.c-torture/execute/pr103255.c.jj   2021-11-16 
22:14:10.660118225 +0100
+++ gcc/testsuite/gcc.c-torture/execute/pr103255.c  2021-11-16 
22:13:56.506314265 +0100
@@ -0,0 +1,41 @@
+/* PR tree-optimization/103255 */
+
+struct H
+{
+  unsigned a;
+  unsigned b;
+  unsigned c;
+};
+
+#if __SIZEOF_POINTER__ >= 4
+#define ADDR 0x40
+#else
+#define ADDR 0x4000
+#endif
+#define OFF 0x20
+
+int
+main ()
+{
+  struct H *h = 0;
+  unsigned long o;
+  volatile int t = 1;
+
+  for (o = OFF; o <= OFF; o += 0x1000)
+{
+  struct H *u;
+  u = (struct H *) (ADDR + o);
+  if (t)
+   {
+ h = u;
+ break;
+   }
+}
+
+  if (h == 0)
+return 0;
+  unsigned *tt = &h->b;
+  if ((__SIZE_TYPE__) tt != (ADDR + OFF + __builtin_offsetof (struct H, b)))
+__builtin_abort ();
+  return 0;
+}

Jakub
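
The effect of the fix can be modelled with a toy range type.  This is a
sketch under stated assumptions, not the ranger API: struct ptr_range and
all function names below are invented, and the "varying" fallback stands in
for the further checks the real function performs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy closed interval [lo, hi] of pointer values.  */
struct ptr_range { uintptr_t lo, hi; };

static bool
includes_zero_sketch (struct ptr_range r)
{
  return r.lo == 0;
}

bool
range_contains_sketch (struct ptr_range r, uintptr_t v)
{
  return r.lo <= v && v <= r.hi;
}

/* Range of &base->member for a member at BYTE_OFF, assuming
   -fdelete-null-pointer-checks and no pointer wrapping.  */
struct ptr_range
member_address_range_sketch (struct ptr_range base, uintptr_t byte_off)
{
  const struct ptr_range nonzero = { 1, UINTPTR_MAX };
  const struct ptr_range varying = { 0, UINTPTR_MAX };

  if (byte_off == 0)
    return base;        /* &X->a == X, so the range of X is the result.  */
  if (!includes_zero_sketch (base))
    return nonzero;     /* Non-NULL stays non-NULL, but the exact value
                           has moved, so base itself would be wrong.  */
  return varying;
}
```

With base = [0x60, 0x60] (ADDR + OFF from the testcase) and an offset of 4
(assuming a 4-byte unsigned), the real address 0x64 lies outside the base
range but inside the non-zero range.  Returning the unadjusted base range
was therefore wrong.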



Finish lto parts of kill analysis

2021-11-17 Thread Jan Hubicka via Gcc-patches
Hi,
this patch adds the IPA part of modref kill analysis.  It just copies
what the local code already did.  I did not manage to push out all the
patches for modref I planned, and I will wait for the next stage1.  This one,
however, I would like to push since it is quite simple and it makes no sense
to leave the IPA bits being collected but unused.

Bootstrapped/regtested x86_64-linux. I will commit it tonight if there
are no complaints.

Honza

gcc/ChangeLog:

2021-11-17  Jan Hubicka  

* ipa-modref-tree.c: Include cgraph.h and tree-streamer.h.
(modref_access_node::stream_out): New member function.
(modref_access_node::stream_in): New member function.
* ipa-modref-tree.h (modref_access_node::stream_out,
modref_access_node::stream_in): Declare.
* ipa-modref.c (modref_summary_lto::useful_p): Free useless kills.
(modref_summary_lto::dump): Dump kills.
(analyze_store): Record kills for LTO
(analyze_stmt): Likewise.
(modref_summaries_lto::duplicate): Duplicate kills.
(write_modref_records): Use new stream_out member function.
(read_modref_records): Likewise.
(modref_write): Stream out kills.
(read_section): Stream in kills
(remap_kills): New function.
(update_signature): Use it.

diff --git a/gcc/ipa-modref-tree.c b/gcc/ipa-modref-tree.c
index bbe23a5a211..ece42ade225 100644
--- a/gcc/ipa-modref-tree.c
+++ b/gcc/ipa-modref-tree.c
@@ -27,6 +27,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "selftest.h"
 #include "tree-ssa-alias.h"
 #include "gimple.h"
+#include "cgraph.h"
+#include "tree-streamer.h"
 
 /* Return true if both accesses are the same.  */
 bool
@@ -458,6 +460,50 @@ modref_access_node::try_merge_with (vec 
 *&accesses,
   i++;
 }
 
+/* Stream out to OB.  */
+
+void
+modref_access_node::stream_out (struct output_block *ob) const
+{
+  streamer_write_hwi (ob, parm_index);
+  if (parm_index != -1)
+{
+  streamer_write_uhwi (ob, parm_offset_known);
+  if (parm_offset_known)
+   {
+ streamer_write_poly_int64 (ob, parm_offset);
+ streamer_write_poly_int64 (ob, offset);
+ streamer_write_poly_int64 (ob, size);
+ streamer_write_poly_int64 (ob, max_size);
+   }
+}
+}
+
+modref_access_node
+modref_access_node::stream_in (struct lto_input_block *ib)
+{
+  int parm_index = streamer_read_hwi (ib);
+  bool parm_offset_known = false;
+  poly_int64 parm_offset = 0;
+  poly_int64 offset = 0;
+  poly_int64 size = -1;
+  poly_int64 max_size = -1;
+
+  if (parm_index != -1)
+{
+  parm_offset_known = streamer_read_uhwi (ib);
+  if (parm_offset_known)
+   {
+ parm_offset = streamer_read_poly_int64 (ib);
+ offset = streamer_read_poly_int64 (ib);
+ size = streamer_read_poly_int64 (ib);
+ max_size = streamer_read_poly_int64 (ib);
+   }
+}
+  return {offset, size, max_size, parm_offset, parm_index,
+ parm_offset_known, false};
+}
+
 /* Insert access with OFFSET and SIZE.
Collapse tree if it has more than MAX_ACCESSES entries.
If RECORD_ADJUSTMENTs is true avoid too many interval extensions.
diff --git a/gcc/ipa-modref-tree.h b/gcc/ipa-modref-tree.h
index 1bf2aa8460e..0a097349ebd 100644
--- a/gcc/ipa-modref-tree.h
+++ b/gcc/ipa-modref-tree.h
@@ -99,6 +99,10 @@ struct GTY(()) modref_access_node
   tree get_call_arg (const gcall *stmt) const;
   /* Build ao_ref corresponding to the access and return true if succesful.  */
   bool get_ao_ref (const gcall *stmt, class ao_ref *ref) const;
+  /* Stream access to OB.  */
+  void stream_out (struct output_block *ob) const;
+  /* Stream access in from IB.  */
+  static modref_access_node stream_in (struct lto_input_block *ib);
   /* Insert A into vector ACCESSES.  Limit size of vector to MAX_ACCESSES and
  if RECORD_ADJUSTMENT is true keep track of adjustment counts.
  Return 0 if nothing changed, 1 is insertion suceeded and -1 if failed.  */
diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c
index 90cd1be764c..9ceecdd479f 100644
--- a/gcc/ipa-modref.c
+++ b/gcc/ipa-modref.c
@@ -410,6 +410,8 @@ modref_summary_lto::useful_p (int ecf_flags, bool 
check_flags)
&& (ecf_flags & ECF_LOOPING_CONST_OR_PURE));
   if (loads && !loads->every_base)
 return true;
+  else
+kills.release ();
   if (ecf_flags & ECF_PURE)
 return ((!side_effects || !nondeterministic)
&& (ecf_flags & ECF_LOOPING_CONST_OR_PURE));
@@ -634,6 +636,15 @@ modref_summary_lto::dump (FILE *out)
   dump_lto_records (loads, out);
   fprintf (out, "  stores:\n");
   dump_lto_records (stores, out);
+  if (kills.length ())
+{
+  fprintf (out, "  kills:\n");
+  for (auto kill : kills)
+   {
+ fprintf (out, "");
+ kill.dump (out);
+   }
+}
   if (writes_errno)
 fprintf (out, "  Writes errno\n");
   if (side_effects)
@@ -1527,15 +1538,17 @@ analyze_store (gimple *stmt, tree, tree op, void 

Re: [PATCH] Fix PR target/103100 -mstrict-align and memset on not aligned buffers

2021-11-17 Thread Richard Sandiford via Gcc-patches
apinski--- via Gcc-patches  writes:
> From: Andrew Pinski 
>
> The problem here is with -mstrict-align: aarch64_expand_setmem needs
> to check the alignment of the mode to make sure we can use it for
> doing the stores.
>
> gcc/ChangeLog:
>
>   PR target/103100
>   * config/aarch64/aarch64.c (aarch64_expand_setmem):
>   Add check for alignment of the mode if STRICT_ALIGNMENT is true.
> ---
>  gcc/config/aarch64/aarch64.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index fdf05505846..2c00583e12c 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -23738,7 +23738,9 @@ aarch64_expand_setmem (rtx *operands)
>over writing.  */
>opt_scalar_int_mode mode_iter;
>FOR_EACH_MODE_IN_CLASS (mode_iter, MODE_INT)
> - if (GET_MODE_BITSIZE (mode_iter.require ()) <= MIN (n, copy_limit))
> + if (GET_MODE_BITSIZE (mode_iter.require ()) <= MIN (n, copy_limit)
> + && (!STRICT_ALIGNMENT
> + || MEM_ALIGN (dst) >= GET_MODE_ALIGNMENT (mode_iter.require 
> (

Sorry for the slow review.  I think we should instead keep
track of the alignment of the start byte.  This will be MEM_ALIGN
for the first iteration but could decrease after writing some bytes.

The net effect should be the same in practice.  It just seems
more robust.

Thanks,
Richard

> cur_mode = mode_iter.require ();
>  
>gcc_assert (cur_mode != BLKmode);
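
The suggestion to track the alignment of the start byte can be sketched as
follows.  This is an illustration only, with a made-up helper name, and it
assumes align_bytes is a power of two.  After "written" bytes have been
stored, the known alignment of the next store address is limited by the
lowest set bit of "written".

```c
/* Largest power-of-two alignment (in bytes) still known for the address
   start + written, given that start is aligned to align_bytes.
   Hypothetical helper; align_bytes is assumed to be a power of two.  */
unsigned long
align_after_write (unsigned long align_bytes, unsigned long written)
{
  if (written == 0)
    return align_bytes;

  /* Isolate the lowest set bit of the offset.  */
  unsigned long low_bit = written & (~written + 1);
  return low_bit < align_bytes ? low_bit : align_bytes;
}
```

For a 16-byte aligned destination, the alignment stays at 16 after writing
16 bytes but drops to 8 after writing 8 or 24 bytes, so a strict-alignment
target could not use a 16-byte store at that point.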


Re: [PATCH 12/15] i386: Fix non-robust split condition in define_insn_and_split

2021-11-17 Thread Kewen.Lin via Gcc-patches
Hi Uros,

on 2021/11/17 3:13 PM, Uros Bizjak wrote:
> On Thu, Nov 11, 2021 at 12:25 PM Kewen Lin  wrote:
>>
>> This patch fixes some non-robust split conditions in some
>> define_insn_and_splits, so that each of them is applied on top of
>> the corresponding condition of the define_insn part; otherwise the
>> splitting could behave unexpectedly.
>>
>> gcc/ChangeLog:
>>
>> * config/i386/i386.md (*add3_doubleword, *addv4_doubleword,
>> *addv4_doubleword_1, *sub3_doubleword,
>> *subv4_doubleword, *subv4_doubleword_1,
>> *add3_doubleword_cc_overflow_1, *divmodsi4_const,
>> *neg2_doubleword, *tls_dynamic_gnu2_combine_64_): Fix 
>> split
>> condition.
> 
> OK.
> 

Thanks!  Committed as r12-5334.

BR,
Kewen

> Thanks,
> Uros.
> 
>> ---
>>  gcc/config/i386/i386.md | 20 ++--
>>  1 file changed, 10 insertions(+), 10 deletions(-)
>>
>> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
>> index 6eb9de81921..2bd09e502ae 100644
>> --- a/gcc/config/i386/i386.md
>> +++ b/gcc/config/i386/i386.md
>> @@ -5491,7 +5491,7 @@ (define_insn_and_split "*add3_doubleword"
>> (clobber (reg:CC FLAGS_REG))]
>>"ix86_binary_operator_ok (PLUS, mode, operands)"
>>"#"
>> -  "reload_completed"
>> +  "&& reload_completed"
>>[(parallel [(set (reg:CCC FLAGS_REG)
>>(compare:CCC
>>  (plus:DWIH (match_dup 1) (match_dup 2))
>> @@ -6300,7 +6300,7 @@ (define_insn_and_split "*addv4_doubleword"
>> (plus: (match_dup 1) (match_dup 2)))]
>>"ix86_binary_operator_ok (PLUS, mode, operands)"
>>"#"
>> -  "reload_completed"
>> +  "&& reload_completed"
>>[(parallel [(set (reg:CCC FLAGS_REG)
>>(compare:CCC
>>  (plus:DWIH (match_dup 1) (match_dup 2))
>> @@ -6347,7 +6347,7 @@ (define_insn_and_split "*addv4_doubleword_1"
>> && CONST_SCALAR_INT_P (operands[2])
>> && rtx_equal_p (operands[2], operands[3])"
>>"#"
>> -  "reload_completed"
>> +  "&& reload_completed"
>>[(parallel [(set (reg:CCC FLAGS_REG)
>>(compare:CCC
>>  (plus:DWIH (match_dup 1) (match_dup 2))
>> @@ -6641,7 +6641,7 @@ (define_insn_and_split "*sub3_doubleword"
>> (clobber (reg:CC FLAGS_REG))]
>>"ix86_binary_operator_ok (MINUS, mode, operands)"
>>"#"
>> -  "reload_completed"
>> +  "&& reload_completed"
>>[(parallel [(set (reg:CC FLAGS_REG)
>>(compare:CC (match_dup 1) (match_dup 2)))
>>   (set (match_dup 0)
>> @@ -6817,7 +6817,7 @@ (define_insn_and_split "*subv4_doubleword"
>> (minus: (match_dup 1) (match_dup 2)))]
>>"ix86_binary_operator_ok (MINUS, mode, operands)"
>>"#"
>> -  "reload_completed"
>> +  "&& reload_completed"
>>[(parallel [(set (reg:CC FLAGS_REG)
>>(compare:CC (match_dup 1) (match_dup 2)))
>>   (set (match_dup 0)
>> @@ -6862,7 +6862,7 @@ (define_insn_and_split "*subv4_doubleword_1"
>> && CONST_SCALAR_INT_P (operands[2])
>> && rtx_equal_p (operands[2], operands[3])"
>>"#"
>> -  "reload_completed"
>> +  "&& reload_completed"
>>[(parallel [(set (reg:CC FLAGS_REG)
>>(compare:CC (match_dup 1) (match_dup 2)))
>>   (set (match_dup 0)
>> @@ -7542,7 +7542,7 @@ (define_insn_and_split 
>> "*add3_doubleword_cc_overflow_1"
>> (plus: (match_dup 1) (match_dup 2)))]
>>"ix86_binary_operator_ok (PLUS, mode, operands)"
>>"#"
>> -  "reload_completed"
>> +  "&& reload_completed"
>>[(parallel [(set (reg:CCC FLAGS_REG)
>>(compare:CCC
>>  (plus:DWIH (match_dup 1) (match_dup 2))
>> @@ -9000,7 +9000,7 @@ (define_insn_and_split "*divmodsi4_const"
>> (clobber (reg:CC FLAGS_REG))]
>>"!optimize_function_for_size_p (cfun)"
>>"#"
>> -  "reload_completed"
>> +  "&& reload_completed"
>>[(set (match_dup 0) (match_dup 2))
>> (set (match_dup 1) (match_dup 4))
>> (parallel [(set (match_dup 0)
>> @@ -10515,7 +10515,7 @@ (define_insn_and_split "*neg2_doubleword"
>> (clobber (reg:CC FLAGS_REG))]
>>"ix86_unary_operator_ok (NEG, mode, operands)"
>>"#"
>> -  "reload_completed"
>> +  "&& reload_completed"
>>[(parallel
>>  [(set (reg:CCC FLAGS_REG)
>>   (ne:CCC (match_dup 1) (const_int 0)))
>> @@ -16898,7 +16898,7 @@ (define_insn_and_split 
>> "*tls_dynamic_gnu2_combine_64_"
>> (clobber (reg:CC FLAGS_REG))]
>>"TARGET_64BIT && TARGET_GNU2_TLS"
>>"#"
>> -  ""
>> +  "&& 1"
>>[(set (match_dup 0) (match_dup 4))]
>>  {
>>operands[4] = can_create_pseudo_p () ? gen_reg_rtx (ptr_mode) : 
>> operands[0];
>> --
>> 2.27.0
>>



Re: [PATCH] ranger: Fix up fold_using_range::range_of_address [PR103255]

2021-11-17 Thread Aldy Hernandez via Gcc-patches
On Wed, Nov 17, 2021 at 10:32 AM Jakub Jelinek  wrote:
>
> Hi!
>
> If on &base->member the offset isn't constant or isn't zero and
> -fdelete-null-pointer-checks and not -fwrapv-pointer and base has a range
> that doesn't include NULL, we return the range of the base.
> Usually it isn't a big deal, because for most pointers we just use
> varying, range_zero and range_nonzero ranges and nothing beyond that,
> but if a pointer is initialized from a constant, we actually track the
> exact range and in that case this causes miscompilation.
> As discussed on IRC, I think doing something like:
>   offset_int off2;
>   if (off_cst && off.is_constant (&off2))
> {
>   tree cst = wide_int_to_tree (sizetype, off2 / 
> BITS_PER_UNIT);
>   // adjust range r with POINTER_PLUS_EXPR cst
>   if (!range_includes_zero_p (&r))
> return true;
> }
>   // Fallback
>   r = range_nonzero (TREE_TYPE (gimple_assign_rhs1 (stmt)));
>   return true;
> could work, given that most of the pointer ranges are just the simple ones
> perhaps it is too much for little benefit.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.  Thanks for the formatting fixes btw.

Unrelated, but we could remove the return value for range_of_address
since it always returns true.  I suppose it's consistent with
range_of_*, but unlike range_of_address, the other ones can receive
statements they can't process.

Aldy

>
> 2021-11-17  Jakub Jelinek  
>
> PR tree-optimization/103255
> * gimple-range-fold.cc (fold_using_range::range_of_address): Return
> range_nonzero rather than unadjusted base's range.  Formatting fixes.
>
> * gcc.c-torture/execute/pr103255.c: New test.
>
> --- gcc/gimple-range-fold.cc.jj 2021-11-04 12:27:02.341298923 +0100
> +++ gcc/gimple-range-fold.cc    2021-11-16 22:10:44.453974329 +0100
> @@ -720,14 +720,20 @@ fold_using_range::range_of_address (iran
> }
>/* If &X->a is equal to X, the range of X is the result.  */
>if (off_cst && known_eq (off, 0))
> - return true;
> +   return true;
>else if (flag_delete_null_pointer_checks
>&& !TYPE_OVERFLOW_WRAPS (TREE_TYPE (expr)))
> {
> -/* For -fdelete-null-pointer-checks -fno-wrapv-pointer we don't
> -allow going from non-NULL pointer to NULL.  */
> -  if(!range_includes_zero_p (&r))
> -   return true;
> + /* For -fdelete-null-pointer-checks -fno-wrapv-pointer we don't
> +allow going from non-NULL pointer to NULL.  */
> + if (!range_includes_zero_p (&r))
> +   {
> + /* We could here instead adjust r by off >> LOG2_BITS_PER_UNIT
> +using POINTER_PLUS_EXPR if off_cst and just fall back to
> +this.  */
> + r = range_nonzero (TREE_TYPE (gimple_assign_rhs1 (stmt)));
> + return true;
> +   }
> }
>/* If MEM_REF has a "positive" offset, consider it non-NULL
>  always, for -fdelete-null-pointer-checks also "negative"
> --- gcc/testsuite/gcc.c-torture/execute/pr103255.c.jj   2021-11-16 
> 22:14:10.660118225 +0100
> +++ gcc/testsuite/gcc.c-torture/execute/pr103255.c  2021-11-16 
> 22:13:56.506314265 +0100
> @@ -0,0 +1,41 @@
> +/* PR tree-optimization/103255 */
> +
> +struct H
> +{
> +  unsigned a;
> +  unsigned b;
> +  unsigned c;
> +};
> +
> +#if __SIZEOF_POINTER__ >= 4
> +#define ADDR 0x40
> +#else
> +#define ADDR 0x4000
> +#endif
> +#define OFF 0x20
> +
> +int
> +main ()
> +{
> +  struct H *h = 0;
> +  unsigned long o;
> +  volatile int t = 1;
> +
> +  for (o = OFF; o <= OFF; o += 0x1000)
> +{
> +  struct H *u;
> +  u = (struct H *) (ADDR + o);
> +  if (t)
> +   {
> + h = u;
> + break;
> +   }
> +}
> +
> +  if (h == 0)
> +return 0;
> +  unsigned *tt = &h->b;
> +  if ((__SIZE_TYPE__) tt != (ADDR + OFF + __builtin_offsetof (struct H, b)))
> +__builtin_abort ();
> +  return 0;
> +}
>
> Jakub
>



Re: [PATCH][GCC] aarch64: Add new vector mode V8DI

2021-11-17 Thread Richard Sandiford via Gcc-patches
Oops, only just realised that I hadn't reviewed this.

Przemyslaw Wirkus  writes:
> Hi,
> This patch is adding new V8DI mode which will be used with new Armv8.7-A
> LS64 extension intrinsics.
>
> Regtested on aarch64-elf and no issues.
>
> OK for master?
>
> gcc/ChangeLog:
>
> 2021-11-10  Przemyslaw Wirkus  
>
> * config/aarch64/aarch64-modes.def (VECTOR_MODE): New V8DI mode.
> * config/aarch64/aarch64.c (aarch64_hard_regno_mode_ok): Handle
> V8DImode.
> * config/aarch64/iterators.md (define_mode_attr nunits): Add entry
> for V8DI.
>
> Kind regards,
> Przemyslaw Wirkus
>
> ---
>
> diff --git a/gcc/config/aarch64/aarch64-modes.def 
> b/gcc/config/aarch64/aarch64-modes.def
> index 
> ac97d222789c6701d858c014736f8c211512a4d9..62595b8af6e1eea8fc769885bba9fe54f0a9ec05
>  100644
> --- a/gcc/config/aarch64/aarch64-modes.def
> +++ b/gcc/config/aarch64/aarch64-modes.def
> @@ -81,6 +81,11 @@ INT_MODE (OI, 32);
>  INT_MODE (CI, 48);
>  INT_MODE (XI, 64);
>
> +/* V8DI mode.  */
> +VECTOR_MODE_WITH_PREFIX (V, INT, DI, 8, 5); \
> +  \
> +  ADJUST_ALIGNMENT (V8DI, 8);

The backslashes aren't needed here, can just be:

VECTOR_MODE_WITH_PREFIX (V, INT, DI, 8, 5);

ADJUST_ALIGNMENT (V8DI, 8);

> +
>  /* Define Advanced SIMD modes for structures of 2, 3 and 4 d-registers.  */
>  #define ADV_SIMD_D_REG_STRUCT_MODES(NVECS, VB, VH, VS, VD) \
>VECTOR_MODES_WITH_PREFIX (V##NVECS##x, INT, 8, 3); \
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 
> 69f08052ce808c140ed2933ab6b2e2617ca6f669..0e102a83a8dc34e715fafb58169897b12c9b3a20
>  100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -3376,6 +3376,9 @@ aarch64_hard_regno_nregs (unsigned regno, machine_mode 
> mode)
>  static bool
>  aarch64_hard_regno_mode_ok (unsigned regno, machine_mode mode)
>  {
> +  if (mode == V8DImode)
> +return IN_RANGE (regno, R0_REGNUM, R23_REGNUM);

As you pointed out off-list, this should also check for even registers:

return (IN_RANGE (regno, R0_REGNUM, R23_REGNUM)
&& multiple_p (regno - R0_REGNUM, 2));

OK with those changes, thanks.

Richard

> +
>if (GET_MODE_CLASS (mode) == MODE_CC)
>  return regno == CC_REGNUM;
>
> diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
> index 
> bdc8ba3576cf2c9b4ae96b45a382234e4e25b13f..cea277f3a03cfd20178e51e6abd7e256e206299f
>  100644
> --- a/gcc/config/aarch64/iterators.md
> +++ b/gcc/config/aarch64/iterators.md
> @@ -1053,7 +1053,7 @@ (define_mode_attr vas [(DI "") (SI ".2s")])
>  (define_mode_attr nunits [(V8QI "8") (V16QI "16")
>   (V4HI "4") (V8HI "8")
>   (V2SI "2") (V4SI "4")
> -(V2DI "2")
> + (V2DI "2") (V8DI "8")
>   (V4HF "4") (V8HF "8")
>   (V4BF "4") (V8BF "8")
>   (V2SF "2") (V4SF "4")
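
The register check requested in the review above (in range and starting on
an even register) can be modelled outside of GCC like this.  It is a sketch
only: the enum values and the function name are placeholders chosen for the
example, not the real aarch64 backend definitions.

```c
#include <stdbool.h>

/* Placeholder register numbers for this sketch; the real R0_REGNUM and
   R23_REGNUM are defined by the aarch64 backend.  */
enum { R0_REGNUM_SKETCH = 0, R23_REGNUM_SKETCH = 23 };

/* Model of the V8DImode case: REGNO must lie in [R0, R23] and be an even
   distance from R0.  */
bool
v8di_regno_ok_sketch (int regno)
{
  bool in_range = (regno >= R0_REGNUM_SKETCH
                   && regno <= R23_REGNUM_SKETCH);
  bool is_even = (regno - R0_REGNUM_SKETCH) % 2 == 0;
  return in_range && is_even;
}
```

Under these placeholder numbers, registers 0, 2, ..., 22 are accepted, while
odd registers and anything above 23 are rejected.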


Re: [PATCH v2] rs6000: Test case adjustments for new builtins

2021-11-17 Thread Segher Boessenkool
Hi!

On Tue, Nov 16, 2021 at 02:26:22PM -0600, Bill Schmidt wrote:
> Hi!  I recently submitted [1] to make adjustments to test cases for the new 
> builtins
> support, mostly due to error messages changing for consistency.  Thanks for 
> the
> previous review.  I've reviewed the reasons for the changes and removed 
> unrelated
> changes as requested.

And the results are?  This is much easier to write up, and to review, if
you split the patch into pieces with one theme each.  If you do that
right then most reviews will be rubber-stamping, and some might require
some thought (and some may even get objections).  The way things are it
is a puzzle hunt to review this.

>  - For fold-vect-splat-floatdouble.c and fold-vec-splat-longlong.c, the 
> existing
>test cases have some bad tests in them (checking two bits when only one bit
>is meaningful).  The new builtin support catches this but the old support 
> did
>not.  Removing those bad cases changes some of the scan-assembler-times 
> expected
>values.

Do this in a separate patch then, independent of the series?  With this
explanation in the commit message.  This is pre-approved.

>  - For int_128bit-runnable.c, I chose not to do gimple folding on the 128-bit
>comparison operations in the new implementation, because doing so results 
> in
>bad code that splits things into two 64-bit values.  That needs separate
>attention; but the point here is, when I did that, I started generating
>more of the vcmpequq, vcmpgtsq, and vcmpgtuq instructions.

And you now get worse code (albeit in some cases no longer invalid)?


> --- a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-exp-2.c
> +++ b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-exp-2.c
> @@ -14,7 +14,7 @@ get_exponent (double *p)
>  {
>double source = *p;
>  
> -  return scalar_extract_exp (source);/* { dg-error 
> "'__builtin_vec_scalar_extract_exp' is not supported in this compiler 
> configuration" } */
> +  return scalar_extract_exp (source);/* { dg-error 
> "'__builtin_vsx_scalar_extract_exp' requires the" } */
>  }

The testcase uses __builtin_vec_scalar_extract_exp, so this is not okay.

> --- a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-sig-2.c
> +++ b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-sig-2.c
> @@ -12,5 +12,5 @@ get_significand (double *p)
>  {
>double source = *p;
>  
> -  return __builtin_vec_scalar_extract_sig (source); /* { dg-error 
> "'__builtin_vec_scalar_extract_sig' is not supported in this compiler 
> configuration" } */
> +  return __builtin_vec_scalar_extract_sig (source); /* { dg-error 
> "'__builtin_vsx_scalar_extract_sig' requires the" } */
>  }

This not either.

> --- a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-insert-exp-2.c
> +++ b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-insert-exp-2.c
> @@ -16,5 +16,5 @@ insert_exponent (unsigned long long int *significand_p,
>unsigned long long int significand = *significand_p;
>unsigned long long int exponent = *exponent_p;
>  
> -  return scalar_insert_exp (significand, exponent); /* { dg-error 
> "'__builtin_vec_scalar_insert_exp' is not supported in this compiler 
> configuration" } */
> +  return scalar_insert_exp (significand, exponent); /* { dg-error 
> "'__builtin_vsx_scalar_insert_exp' requires the" } */

Or this.

> --- a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-insert-exp-5.c
> +++ b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-insert-exp-5.c
> @@ -16,5 +16,5 @@ insert_exponent (double *significand_p,
>double significand = *significand_p;
>unsigned long long int exponent = *exponent_p;
>  
> -  return scalar_insert_exp (significand, exponent); /* { dg-error 
> "'__builtin_vec_scalar_insert_exp' is not supported in this compiler 
> configuration" } */
> +  return scalar_insert_exp (significand, exponent); /* { dg-error 
> "'__builtin_vsx_scalar_insert_exp_dp' requires the" } */
>  }

Etc.

It is not okay to blindly adjust the testcases to accept what the new
code does.  This is a regression.  It is okay to have it regressed for a
while.  It is also okay to xfail things, if there is no expectation it
can be fixed before the next release (or some other suitably big time
frame, this isn't an exact science).

> --- a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-test-neg-2.c
> +++ b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-test-neg-2.c
> @@ -10,5 +10,5 @@ test_neg (float *p)
>  {
>float source = *p;
>  
> -  return __builtin_vec_scalar_test_neg_sp (source); /* { dg-error 
> "'__builtin_vsx_scalar_test_neg_sp' requires" } */
> +  return __builtin_vec_scalar_test_neg (source); /* { dg-error 
> "'__builtin_vsx_scalar_test_neg_sp' requires" } */
>  }

This one is very curious.  You change the test to use a more generic
builtin name, presumably because the (undocumented) more specific name
is no longer allowed, but the error message still uses that name?

> --- a/gcc/testsuite/gcc.target/powerpc/byte-in-set-2.c
> +++ b/gcc/tes

[PATCH] libsanitizer: Fix bootstrap on FreeBSD [PR102675]

2021-11-17 Thread Jakub Jelinek via Gcc-patches
On Mon, Nov 08, 2021 at 08:50:41AM +0100, Gerald Pfeifer wrote:
> This is the first part I committed on Friday, the second will 
> follow today.

Here is an alternative to the patch changing a file imported from
compiler-rt upstream, so that we don't need to carry a local patch for that
particular problem.

Bootstrapped/regtested on x86_64-linux and i686-linux (verified that
-DUSE_SYSTEM_MD5 is passed only when compiling
sanitizer_platform_limits_freebsd.cpp) and Gerald in the PR said
it passed bootstrap on FreeBSD as well.

Ok for trunk?

2021-11-17  Jakub Jelinek  

PR bootstrap/102675
* sanitizer_common/Makefile.am: Use -DUSE_SYSTEM_MD5 in AM_CXXFLAGS
of sanitizer_platform_limits_freebsd.cpp.
* sanitizer_common/Makefile.in: Regenerated.

--- libsanitizer/sanitizer_common/Makefile.am.jj   2021-11-05 00:43:22.647623646 +0100
+++ libsanitizer/sanitizer_common/Makefile.am   2021-11-16 12:29:58.574930436 +0100
@@ -17,6 +17,7 @@ AM_CXXFLAGS += -DSANITIZER_LIBBACKTRACE
 endif
 AM_CCASFLAGS = $(EXTRA_ASFLAGS)
 ACLOCAL_AMFLAGS = -I m4
+sanitizer_platform_limits_freebsd.lo: AM_CXXFLAGS += -DUSE_SYSTEM_MD5
 
 noinst_LTLIBRARIES = libsanitizer_common.la
 
--- libsanitizer/sanitizer_common/Makefile.in.jj   2021-11-05 00:43:22.647623646 +0100
+++ libsanitizer/sanitizer_common/Makefile.in   2021-11-16 12:30:58.611088913 +0100
@@ -796,6 +796,7 @@ uninstall-am:
 
 .PRECIOUS: Makefile
 
+sanitizer_platform_limits_freebsd.lo: AM_CXXFLAGS += -DUSE_SYSTEM_MD5
 
 # Tell versions [3.59,3.63) of GNU make to not export all variables.
 # Otherwise a system limit (for SysV at least) may be exceeded.

Jakub



Re: [PATCH] lim: Reset flow sensitive info even for pointers [PR103192]

2021-11-17 Thread Jeff Law via Gcc-patches




On 11/17/2021 2:16 AM, Jakub Jelinek via Gcc-patches wrote:

Hi!

Since 2014, lim has been clearing SSA_NAME_RANGE_INFO for integral SSA_NAMEs
when moving them from conditional contexts inside a loop to an unconditional
position before the loop.  But as the miscompilation of gimplify.c shows, we
need to treat pointers the same way: for them too we need to reset whether the
pointer can/can't be null and the recorded pointer alignment.
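
As a hypothetical illustration of the situation described above (this is not the PR's testcase, and whether a given compiler version actually hoists the statement is an assumption), consider a loop-invariant address that is only computed under a null check:

```c
#include <stddef.h>

struct node { int pad; int value; };

/* Hypothetical example.  The address &p->value does not change inside the
   loop, so loop invariant motion may move its computation in front of the
   loop.  Inside the guard, q can be assumed non-null; once the computation
   runs unconditionally (also when p is NULL), that assumption no longer
   holds, so any such recorded information has to be reset.  */
int
sum_value (const struct node *p, int n)
{
  int total = 0;
  for (int i = 0; i < n; i++)
    if (p != NULL)
      {
        const int *q = &p->value;
        total += *q;
      }
  return total;
}
```

The source-level behaviour is unaffected either way: the load through q stays guarded, only the address computation is a candidate for hoisting.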

This fixes
-FAIL: libgomp.c/../libgomp.c-c++-common/target-in-reduction-2.c (internal 
compiler error)
-FAIL: libgomp.c/../libgomp.c-c++-common/target-in-reduction-2.c (test for 
excess errors)
-UNRESOLVED: libgomp.c/../libgomp.c-c++-common/target-in-reduction-2.c 
compilation failed to produce executable
-FAIL: libgomp.c++/../libgomp.c-c++-common/target-in-reduction-2.c (internal 
compiler error)
-FAIL: libgomp.c++/../libgomp.c-c++-common/target-in-reduction-2.c (test for 
excess errors)
-UNRESOLVED: libgomp.c++/../libgomp.c-c++-common/target-in-reduction-2.c 
compilation failed to produce executable
-FAIL: libgomp.c++/target-in-reduction-2.C (internal compiler error)
-FAIL: libgomp.c++/target-in-reduction-2.C (test for excess errors)
-UNRESOLVED: libgomp.c++/target-in-reduction-2.C compilation failed to produce 
executable
on both x86_64 and i686.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2021-11-17  Jakub Jelinek  

PR tree-optimization/103192
* tree-ssa-loop-im.c (move_computations_worker): Use
reset_flow_sensitive_info instead of manually clearing
SSA_NAME_RANGE_INFO and do it for all SSA_NAMEs, not just ones
with integral types.

OK
jeff



Re: [AArch64] Enable generation of FRINTNZ instructions

2021-11-17 Thread Andre Vieira (lists) via Gcc-patches


On 16/11/2021 12:10, Richard Biener wrote:

On Fri, 12 Nov 2021, Andre Simoes Dias Vieira wrote:


On 12/11/2021 10:56, Richard Biener wrote:

On Thu, 11 Nov 2021, Andre Vieira (lists) wrote:


Hi,

This patch introduces two IFN's FTRUNC32 and FTRUNC64, the corresponding
optabs and mappings. It also creates a backend pattern to implement them
for
aarch64 and a match.pd pattern to idiom recognize these.
These IFN's (and optabs) represent a truncation towards zero, as if
performed
by first casting it to a signed integer of 32 or 64 bits and then back to
the
same floating point type/mode.
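
In source terms, the idiom being described looks roughly like the sketch below. The function names are made up for illustration; the only thing taken from the description is the shape "cast to a signed 32- or 64-bit integer, then back to the same floating-point type".

```c
#include <stdint.h>

/* Truncate toward zero by round-tripping through a signed integer.
   Only meaningful when the value fits in the intermediate integer type;
   otherwise the float-to-integer conversion is undefined in C.  */
double
trunc_via_int32 (double x)
{
  return (double) (int32_t) x;
}

double
trunc_via_int64 (double x)
{
  return (double) (int64_t) x;
}
```

For in-range inputs, both functions drop the fractional part, so 2.75 becomes 2.0 and -2.75 becomes -2.0.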

The match.pd pattern chooses to use these, when supported, regardless of
trapping math, since these new patterns mimic the original behavior of
truncating through an integer.

I didn't think any of the existing IFN's represented these. I know it's a
bit
late in stage 1, but I thought this might be OK given it's only used by a
single target and should have very little impact on anything else.

Bootstrapped on aarch64-none-linux.

OK for trunk?

On the RTL side ftrunc32/ftrunc64 would probably be better a conversion
optab (with two modes), so not

+OPTAB_D (ftrunc32_optab, "ftrunc$asi2")
+OPTAB_D (ftrunc64_optab, "ftrunc$adi2")

but

OPTAB_CD (ftrunc_shrt_optab, "ftrunc$a$I$b2")

or so?  I know that gets somewhat awkward for the internal function,
but IMHO we shouldn't tie our hands because of that?

I tried doing this originally, but indeed I couldn't find a way to correctly
tie the internal function to it.

direct_optab_supported_p with multiple types expect those to be of the same
mode. I see convert_optab_supported_p does but I don't know how that is
used...

Any ideas?

No "nice" ones.  The "usual" way is to provide fake arguments that
specify the type/mode.  We could use an integer argument directly
specifying the mode (then the IL would look host dependent - ugh),
or specify a constant zero in the intended mode (less visibly
obvious - but at least with -gimple dumping you'd see the type...).

Hi,

So I reworked this to have a single optab and IFN. This required a bit 
of fiddling with custom expander and supported_p functions for the IFN. 
I decided to pass a MAX_INT for the 'int' type to the IFN to be able to 
pass on the size of the int we use as an intermediate cast.  I tried 0 
first, but gcc was being too smart and just demoted it to an 'int' for 
the long long test-cases.


Bootstrapped on aarch64-none-linux.

OK for trunk?

gcc/ChangeLog:

    * config/aarch64/aarch64.md (ftrunc2): New 
pattern.

    * config/aarch64/iterators.md (FRINTZ): New iterator.
    * doc/md.texi: New entry for ftrunc pattern name.
    * internal-fn.def (FTRUNC_INT): New IFN.
    * match.pd: Add to the existing TRUNC pattern match.
    * optabs.def (ftrunc_int): New entry.

gcc/testsuite/ChangeLog:

    * gcc.target/aarch64/merge_trunc1.c: Adapted to skip if frintNz 
instruction available.

    * lib/target-supports.exp: Added arm_v8_5a_frintnzx_ok target.
    * gcc.target/aarch64/frintnz.c: New test.

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 
4035e061706793849c68ae09bcb2e4b9580ab7b6..62adbc4cb6bbbe0c856f9fbe451aee08f2dea3b5
 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -7345,6 +7345,14 @@ (define_insn "despeculate_simpleti"
(set_attr "speculation_barrier" "true")]
 )
 
+(define_expand "ftrunc2"
+  [(set (match_operand:VSFDF 0 "register_operand" "=w")
+(unspec:VSFDF [(match_operand:VSFDF 1 "register_operand" "w")]
+ FRINTNZ))]
+  "TARGET_FRINT && TARGET_FLOAT
+   && !(VECTOR_MODE_P (mode) && !TARGET_SIMD)"
+)
+
 (define_insn "aarch64_"
   [(set (match_operand:VSFDF 0 "register_operand" "=w")
(unspec:VSFDF [(match_operand:VSFDF 1 "register_operand" "w")]
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 
bdc8ba3576cf2c9b4ae96b45a382234e4e25b13f..49510488a2a800689e95c399f2e6c967b566516d
 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -3067,6 +3067,8 @@ (define_int_iterator FCMLA [UNSPEC_FCMLA
 (define_int_iterator FRINTNZX [UNSPEC_FRINT32Z UNSPEC_FRINT32X
   UNSPEC_FRINT64Z UNSPEC_FRINT64X])
 
+(define_int_iterator FRINTNZ [UNSPEC_FRINT32Z UNSPEC_FRINT64Z])
+
 (define_int_iterator SVE_BRK_UNARY [UNSPEC_BRKA UNSPEC_BRKB])
 
 (define_int_iterator SVE_BRK_BINARY [UNSPEC_BRKN UNSPEC_BRKPA UNSPEC_BRKPB])
@@ -3482,6 +3484,8 @@ (define_int_attr f16mac1 [(UNSPEC_FMLAL "a") 
(UNSPEC_FMLSL "s")
 (define_int_attr frintnzs_op [(UNSPEC_FRINT32Z "frint32z") (UNSPEC_FRINT32X 
"frint32x")
  (UNSPEC_FRINT64Z "frint64z") (UNSPEC_FRINT64X 
"frint64x")])
 
+(define_int_attr frintnz_mode [(UNSPEC_FRINT32Z "si") (UNSPEC_FRINT64Z "di")])
+
 ;; The condition associated with an UNSPEC_COND_.
 (define_int_attr cmp_op [(UNSPEC_COND_CMPEQ_WIDE "eq")
 (UNSPEC_COND_CMPG

Re: [PATCH 3/3] elf: Add _dl_find_eh_frame function

2021-11-17 Thread Florian Weimer via Gcc-patches
* Adhemerval Zanella via Libc-alpha:

> However the code is somewhat complex and I would like to have some feedback
> if gcc will be willing to accept this change (I assume it would require
> this code merge on glibc beforehand).

There's a long review queue on the GCC side due to the stage1 close.
It may still be considered for GCC 12.  Jakub has also requested that
we hold off committing the glibc side until the GCC side is reviewed.

I'll flesh out the commit message and NEWS entry once we have agreed
upon the interface.

>> new file mode 100644
>> index 00..c7313c122d
>> --- /dev/null
>> +++ b/elf/dl-find_eh_frame.c

>> +/* Data for the main executable.  There is usually a large gap between
>> +   the main executable and initially loaded shared objects.  Record
>> +   the main executable separately, to increase the chance that the
>> +   range for the non-closeable mappings below covers only the shared
>> +   objects (and not also the gap between main executable and shared
>> +   objects).  */
>> +static uintptr_t _dl_eh_main_map_start attribute_relro;
>> +static struct dl_eh_frame_info _dl_eh_main_info attribute_relro;
>> +
>> +/* Data for initally loaded shared objects that cannot be unlaoded.
>
> s/initally/initially and s/unlaoded/unloaded.

Fixed.

>
>> +   The mapping base addresses are stored in address order in the
>> +   _dl_eh_nodelete_mappings_bases array (containing
>> +   _dl_eh_nodelete_mappings_size elements).  The EH data for a base
>> +   address is stored in the parallel _dl_eh_nodelete_mappings_infos.
>> +   These arrays are not modified after initialization.  */
>> +static uintptr_t _dl_eh_nodelete_mappings_end attribute_relro;
>> +static size_t _dl_eh_nodelete_mappings_size attribute_relro;
>> +static uintptr_t *_dl_eh_nodelete_mappings_bases attribute_relro;
>> +static struct dl_eh_frame_info *_dl_eh_nodelete_mappings_infos
>> +  attribute_relro;
>> +
>> +/* Mappings created by dlopen can go away with dlclose, so a data
>> +   dynamic data structure with some synchronization is needed.
>
> This sounds strange ("a data dynamic data").

I dropped the first data.

>
>> +   Individual segments are similar to the _dl_eh_nodelete_mappings
>
> Maybe use _dl_eh_nodelete_mappings_*, because '_dl_eh_nodelete_mappings'
> itself is not defined anywhere.

Right.

>> +   Adding new elements to this data structure is another source of
>> +   quadratic behavior for dlopen.  If the other causes of quadratic
>> +   behavior are eliminated, a more complicated data structure will be
>> +   needed.  */
>
> This worries me, especially since we have reports that Python and other dynamic
> environments use a lot of plugins and generate a lot of dlopen() calls.
> What kind of performance implication do you foresee here?

The additional overhead is not disproportionate to the other sources of
quadratic behavior.  With 1,000 dlopen'ed objects, overall run-time
seems to be comparable to the strcmp time required for soname matching, for
example, and is quite difficult to measure.  So we could fix the
performance regression if we used a hash table for that …

It's just an undesirable complexity class.  The implementation is not
actually slow because it's a mostly-linear copy (although a backwards
one).  Other parts of dlopen involve pointer chasing and are much
slower.

>> +/* Allocate an empty segment that is at least SIZE large.  PREVIOUS */
>
> What does this PREVIOUS refer to?

Oops, it's now:

/* Allocate an empty segment that is at least SIZE large.  PREVIOUS
   points to the chain of previously allocated segments and can be
   NULL.  */

>> +/* Update the version to reflect that an update is happening.  This
>> +   does not change the bit that controls the active segment chain.
>> +   Returns the index of the currently active segment chain.  */
>> +static inline unsigned int
>> +_dl_eh_mappings_begin_update (void)
>> +{
>> +  unsigned int v
>> += __atomic_wide_counter_fetch_add_relaxed 
>> (&_dl_eh_loaded_mappings_version,
>> +   2);
>
> Why use an 'unsigned int' for the wide counter here?

Because …

>> +  /* Subsequent stores to the TM data must not be reordered before the
>> + store above with the version update.  */
>> +  atomic_thread_fence_release ();
>> +  return v & 1;
>> +}

… we only need the lower bit.
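
A rough sketch of the same idea in portable C11 atomics, as an illustration only: the names are hypothetical, and <stdatomic.h> stands in for the glibc-internal __atomic_wide_counter_fetch_add_relaxed and atomic_thread_fence_release used in the patch. Adding 2 never changes bit 0, so the low bit of the old value is all that is needed to identify the active chain.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the wide version counter.  Bit 0 selects the
   active segment chain; the remaining bits count updates.  */
static _Atomic uint64_t mappings_version;

/* Mark the start of an update and return the index (0 or 1) of the
   currently active chain.  */
static unsigned int
mappings_begin_update (void)
{
  uint64_t v = atomic_fetch_add_explicit (&mappings_version, 2,
                                          memory_order_relaxed);
  /* Later stores must not be reordered before the version update.  */
  atomic_thread_fence (memory_order_release);
  return (unsigned int) (v & 1);
}
```

Since only v & 1 is returned, truncating the 64-bit value to unsigned int loses nothing that the caller uses.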

>> +  /* Other initially loaded objects.  */
>> +  if (pc >= *_dl_eh_nodelete_mappings_bases
>> +  && pc < _dl_eh_nodelete_mappings_end)
>> +{
>> +  size_t idx = _dl_eh_find_lower_bound (pc,
>> +_dl_eh_nodelete_mappings_bases,
>> +_dl_eh_nodelete_mappings_size);
>> +  const struct dl_eh_frame_info *info
>> += _dl_eh_nodelete_mappings_infos + idx;
>
> Isn't it UB if idx is not a valid one?

idx is always valid here.

>> +  bool match;
>> +  if (idx < _dl_eh_nodelete_mappings_size
>> +  && pc == _dl_eh_nodelete_mappings_base

Re: [PATCH] x86: Add -mharden-sls=[none|all|return|indirect-branch]

2021-11-17 Thread H.J. Lu via Gcc-patches
On Wed, Nov 17, 2021 at 1:05 AM Uros Bizjak  wrote:
>
> On Tue, Nov 16, 2021 at 7:20 PM H.J. Lu via Gcc-patches
>  wrote:
> >
> > Add -mharden-sls= to mitigate against straight line speculation (SLS)
> > for function return and indirect branch by adding an INT3 instruction
> > after function return and indirect branch.
> >
> > gcc/
> >
> > PR target/102952
> > * config/i386/i386-opts.h (harden_sls): New enum.
> > * config/i386/i386.c (output_indirect_thunk): Mitigate against
> > SLS for function return.
> > (ix86_output_function_return): Likewise.
> > (ix86_output_jmp_thunk_or_indirect): Mitigate against indirect
> > branch.
> > (ix86_output_indirect_jmp): Likewise.
> > (ix86_output_call_insn): Likewise.
> > * config/i386/i386.opt: Add -mharden-sls=.
> > * doc/invoke.texi: Document -mharden-sls=.
> >
> > gcc/testsuite/
> >
> > PR target/102952
> > * gcc.target/i386/harden-sls-1.c: New test.
> > * gcc.target/i386/harden-sls-2.c: Likewise.
> > * gcc.target/i386/harden-sls-3.c: Likewise.
> > * gcc.target/i386/harden-sls-4.c: Likewise.
> > ---
> >  gcc/config/i386/i386-opts.h  |  7 +
> >  gcc/config/i386/i386.c   | 30 
> >  gcc/config/i386/i386.opt | 20 +
> >  gcc/doc/invoke.texi  | 10 ++-
> >  gcc/testsuite/gcc.target/i386/harden-sls-1.c | 14 +
> >  gcc/testsuite/gcc.target/i386/harden-sls-2.c | 14 +
> >  gcc/testsuite/gcc.target/i386/harden-sls-3.c | 14 +
> >  gcc/testsuite/gcc.target/i386/harden-sls-4.c | 14 +
> >  8 files changed, 116 insertions(+), 7 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/harden-sls-1.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/harden-sls-2.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/harden-sls-3.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/harden-sls-4.c
> >
> > diff --git a/gcc/config/i386/i386-opts.h b/gcc/config/i386/i386-opts.h
> > index 04e4ad608fb..171d3106d0a 100644
> > --- a/gcc/config/i386/i386-opts.h
> > +++ b/gcc/config/i386/i386-opts.h
> > @@ -121,4 +121,11 @@ enum instrument_return {
> >instrument_return_nop5
> >  };
> >
> > +enum harden_sls {
> > +  harden_sls_none = 0,
> > +  harden_sls_return = 1 << 0,
> > +  harden_sls_indirect_branch = 1 << 1,
> > +  harden_sls_all = harden_sls_return | harden_sls_indirect_branch
> > +};
> > +
> >  #endif
> > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> > index cc9f9322fad..0a902d66321 100644
> > --- a/gcc/config/i386/i386.c
> > +++ b/gcc/config/i386/i386.c
> > @@ -5914,6 +5914,8 @@ output_indirect_thunk (unsigned int regno)
> >  }
> >
> >fputs ("\tret\n", asm_out_file);
> > +  if ((ix86_harden_sls & harden_sls_return))
> > +fputs ("\tint3\n", asm_out_file);
> >  }
> >
> >  /* Output a funtion with a call and return thunk for indirect branch.
> > @@ -15987,6 +15989,8 @@ ix86_output_jmp_thunk_or_indirect (const char 
> > *thunk_name, const int regno)
> >fprintf (asm_out_file, "\tjmp\t");
> >assemble_name (asm_out_file, thunk_name);
> >putc ('\n', asm_out_file);
> > +  if ((ix86_harden_sls & harden_sls_indirect_branch))
> > +   fputs ("\tint3\n", asm_out_file);
> >  }
> >else
> >  output_indirect_thunk (regno);
> > @@ -16212,10 +16216,14 @@ ix86_output_indirect_jmp (rtx call_op)
> > gcc_unreachable ();
> >
> >ix86_output_indirect_branch (call_op, "%0", true);
> > -  return "";
> > +  if ((ix86_harden_sls & harden_sls_indirect_branch))
> > +   return "int3";
> > +  else
> > +   return "";
> >  }
> >else
> > -return "%!jmp\t%A0";
> > +return ((ix86_harden_sls & harden_sls_indirect_branch)
> > +   ? "%!jmp\t%A0\n\tint3" : "%!jmp\t%A0");
> >  }
>
> Just change existing returns to fputs and end function with:
>
> return (ix86_harden_sls & harden_sls_indirect_branch) ? "int3" : "";

But fputs doesn't support %A0.

> >  /* Output return instrumentation for current function if needed.  */
> > @@ -16283,10 +16291,15 @@ ix86_output_function_return (bool long_p)
> >return "";
> >  }
> >
> > -  if (!long_p)
> > -return "%!ret";
> > +  if ((ix86_harden_sls & harden_sls_return))
> > +return "%!ret\n\tint3";
> > +  else
> > +{
> > +  if (!long_p)
> > +   return "%!ret";
> >
> > -  return "rep%; ret";
> > +  return "rep%; ret";
> > +}
> >  }
>
> Also here.

But fputs doesn't know "%!".
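
For reference, the selection logic under discussion can be sketched in isolation as below. The enum and the returned strings are copied from the patch; the function name and the flags parameter are hypothetical. The sketch assumes, as the replies above imply, that markers such as %! in the returned template are expanded later by the compiler's output machinery, which is why writing the same string with plain fputs would not be equivalent.

```c
#include <stdbool.h>

enum harden_sls {
  harden_sls_none = 0,
  harden_sls_return = 1 << 0,
  harden_sls_indirect_branch = 1 << 1,
  harden_sls_all = harden_sls_return | harden_sls_indirect_branch
};

/* Hypothetical helper mirroring the patched ix86_output_function_return
   tail: pick the output template for a function return.  */
static const char *
function_return_template (int harden_flags, bool long_p)
{
  if (harden_flags & harden_sls_return)
    return "%!ret\n\tint3";
  return long_p ? "rep%; ret" : "%!ret";
}
```

Note that, as in the patch, the hardened case returns the same template regardless of long_p.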

>
> >
> >  /* Output indirect function return.  RET_OP is the function return
> > @@ -16381,7 +16394,12 @@ ix86_output_call_insn (rtx_insn *insn, rtx call_op)
> >if (output_indirect_p && !direct_p)
> > ix86_output_indirect_branch (call_op, xasm, true);
> >else
> > -   output_asm_insn (xasm, &call_op);
> > +   {
> > + outp

[PATCH v2] x86: Add -mindirect-branch-cs-prefix

2021-11-17 Thread H.J. Lu via Gcc-patches
Add -mindirect-branch-cs-prefix to add CS prefix to call and jmp to thunk
via r8-r15 registers when converting indirect call and jump to increase
the instruction length to 6, allowing the non-thunk form to be inlined.
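
A minimal sketch of the text this is meant to emit for a jump to a thunk, written against a plain buffer instead of asm_out_file. The register numbering (r8-r15 as 8..15) and both function names are assumptions made for the sketch; GCC's REX_INT_REGNO_P uses its own internal numbering.

```c
#include <stdbool.h>
#include <stdio.h>

/* Assumption for this sketch only: registers are numbered 0-15 and
   r8-r15 are 8..15.  */
static bool
is_r8_to_r15 (int regno)
{
  return regno >= 8 && regno <= 15;
}

/* Hypothetical helper: format "jmp <thunk>" into BUF, preceded by a
   separate "cs" line when the option is on and the thunk register is one
   of r8-r15.  Returns the snprintf result.  */
static int
format_thunk_jmp (char *buf, size_t size, const char *thunk_name,
                  int regno, bool cs_prefix)
{
  return snprintf (buf, size, "%s\tjmp\t%s\n",
                   (cs_prefix && is_r8_to_r15 (regno)) ? "\tcs\n" : "",
                   thunk_name);
}
```

For example, regno 11 with the option enabled produces a "cs" line followed by "jmp __x86_indirect_thunk_r11"; a register outside r8-r15, or the option disabled, produces only the jmp line.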

gcc/

PR target/102952
* config/i386/i386.c (ix86_output_jmp_thunk_or_indirect): Emit
CS prefix for -mindirect-branch-cs-prefix.
(ix86_output_indirect_branch_via_reg): Likewise.
* config/i386/i386.opt: Add -mindirect-branch-cs-prefix.
* doc/invoke.texi: Document -mindirect-branch-cs-prefix.

gcc/testsuite/

PR target/102952
* gcc.target/i386/indirect-thunk-cs-prefix-1.c: New test.
* gcc.target/i386/indirect-thunk-cs-prefix-2.c: Likewise.
---
 gcc/config/i386/i386.c|  6 ++
 gcc/config/i386/i386.opt  |  4 
 gcc/doc/invoke.texi   |  8 +++-
 .../gcc.target/i386/indirect-thunk-cs-prefix-1.c  | 14 ++
 .../gcc.target/i386/indirect-thunk-cs-prefix-2.c  | 15 +++
 5 files changed, 46 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-cs-prefix-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-cs-prefix-2.c

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 7e9b7bc347f..ae92df0be2f 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -15983,6 +15983,9 @@ ix86_output_jmp_thunk_or_indirect (const char 
*thunk_name, const int regno)
 {
   if (thunk_name != NULL)
 {
+  if (REX_INT_REGNO_P (regno)
+ && ix86_indirect_branch_cs_prefix)
+   fprintf (asm_out_file, "\tcs\n");
   fprintf (asm_out_file, "\tjmp\t");
   assemble_name (asm_out_file, thunk_name);
   putc ('\n', asm_out_file);
@@ -16036,6 +16039,9 @@ ix86_output_indirect_branch_via_reg (rtx call_op, bool 
sibcall_p)
 {
   if (thunk_name != NULL)
{
+ if (REX_INT_REGNO_P (regno)
+ && ix86_indirect_branch_cs_prefix)
+   fprintf (asm_out_file, "\tcs\n");
  fprintf (asm_out_file, "\tcall\t");
  assemble_name (asm_out_file, thunk_name);
  putc ('\n', asm_out_file);
diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index 8d499a5a4df..806ffd7b0ac 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -1076,6 +1076,10 @@ Enum(indirect_branch) String(thunk-inline) 
Value(indirect_branch_thunk_inline)
 EnumValue
 Enum(indirect_branch) String(thunk-extern) Value(indirect_branch_thunk_extern)
 
+mindirect-branch-cs-prefix
+Target Var(ix86_indirect_branch_cs_prefix) Init(0)
+Add CS prefix to call and jmp to thunk via r8-r15 registers when converting 
indirect call and jump.
+
 mindirect-branch-register
 Target Var(ix86_indirect_branch_register) Init(0)
 Force indirect call and jump via register.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 0265c160e02..233f3b579d9 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1427,7 +1427,8 @@ See RS/6000 and PowerPC Options.
 -mstack-protector-guard-symbol=@var{symbol} @gol
 -mgeneral-regs-only  -mcall-ms2sysv-xlogues -mrelax-cmpxchg-loop @gol
 -mindirect-branch=@var{choice}  -mfunction-return=@var{choice} @gol
--mindirect-branch-register -mharden-sls=@var{choice} -mneeded}
+-mindirect-branch-register -mharden-sls=@var{choice} @gol
+-mindirect-branch-cs-prefix -mneeded}
 
 @emph{x86 Windows Options}
 @gccoptlist{-mconsole  -mcygwin  -mno-cygwin  -mdll @gol
@@ -32409,6 +32410,11 @@ hardening.  @samp{return} enables SLS hardening for 
function return.
 @samp{indirect-branch} enables SLS hardening for indirect branch.
 @samp{all} enables all SLS hardening.
 
+@item -mindirect-branch-cs-prefix
+@opindex mindirect-branch-cs-prefix
+Add CS prefix to call and jmp to thunk via r8-r15 registers when
+converting indirect call and jump.
+
 @end table
 
 These @samp{-m} switches are supported in addition to the above
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-cs-prefix-1.c 
b/gcc/testsuite/gcc.target/i386/indirect-thunk-cs-prefix-1.c
new file mode 100644
index 000..db2f3416823
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-cs-prefix-1.c
@@ -0,0 +1,14 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -ffixed-rax -ffixed-rbx -ffixed-rcx -ffixed-rdx 
-ffixed-rdi -ffixed-rsi -mindirect-branch-cs-prefix 
-mindirect-branch=thunk-extern" } */
+/* { dg-additional-options "-fno-pic" { target { ! *-*-darwin* } } } */
+
+extern void (*fptr) (void);
+
+void
+foo (void)
+{
+  fptr ();
+}
+
+/* { dg-final { scan-assembler-times "jmp\[ 
\t\]+_?__x86_indirect_thunk_r\[0-9\]+" 1 } } */
+/* { dg-final { scan-assembler-times "\tcs" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-cs-prefix-2.c 
b/gcc/testsuite/gcc.target/i386/indirect-thunk-cs-prefix-2.c
new file mode 100644
index 000..adfc39a49d4
--- /dev/null
+++ b/gcc/testsuite/gcc.t

Re: [PATCH v2] rs6000: Test case adjustments for new builtins

2021-11-17 Thread Bill Schmidt via Gcc-patches


On 11/17/21 6:44 AM, Segher Boessenkool wrote:
> Hi!
>
> On Tue, Nov 16, 2021 at 02:26:22PM -0600, Bill Schmidt wrote:
>> Hi!  I recently submitted [1] to make adjustments to test cases for the new 
>> builtins
>> support, mostly due to error messages changing for consistency.  Thanks for 
>> the
>> previous review.  I've reviewed the reasons for the changes and removed 
>> unrelated
>> changes as requested.
> And the results are?  This is much easier to write up, and to review, if
> you split the patch into pieces with one theme each.  If you do that
> right then most reviews will be rubber-stamping, and some might require
> some thought (and some may even get objections).  The way things are it
> is a puzzle hunt to review this.

Sorry!  I thought I was addressing the issues that came up last time.  I didn't
intend for this to be difficult.  I will break the patch up going forward.

>
>>  - For fold-vect-splat-floatdouble.c and fold-vec-splat-longlong.c, the 
>> existing
>>test cases have some bad tests in them (checking two bits when only one 
>> bit
>>is meaningful).  The new builtin support catches this but the old support 
>> did
>>not.  Removing those bad cases changes some of the scan-assembler-times 
>> expected
>>values.
> Do this in a separate patch then, independent of the series?  With this
> explanation in the commit message.  This is pre-approved.
OK, will do.
>
>>  - For int_128bit-runnable.c, I chose not to do gimple folding on the 128-bit
>>comparison operations in the new implementation, because doing so results 
>> in
>>bad code that splits things into two 64-bit values.  That needs separate
>>attention; but the point here is, when I did that, I started generating
>>more of the vcmpequq, vcmpgtsq, and vcmpgtuq instructions.
> And you now get worse code (albeit in some cases no longer invalid)?

No, sorry that this wasn't more clear.  The "old" builtins code performs
gimple folding on 128-bit compares.  This results in correct but very
inefficient code.  The "new" builtins code has removed the gimple folding
for 128-bit compares.  This results in directly generating vcmpequq and
friends, which is the efficient code we're looking for.  This test case
then needs modification to show we're doing better.  I'll submit this
separately.

>
>
>> --- a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-exp-2.c
>> +++ b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-exp-2.c
>> @@ -14,7 +14,7 @@ get_exponent (double *p)
>>  {
>>double source = *p;
>>  
>> -  return scalar_extract_exp (source);   /* { dg-error 
>> "'__builtin_vec_scalar_extract_exp' is not supported in this compiler 
>> configuration" } */
>> +  return scalar_extract_exp (source);   /* { dg-error 
>> "'__builtin_vsx_scalar_extract_exp' requires the" } */
>>  }
> The testcase uses __builtin_vec_scalar_extract_exp, so this is not okay.

Sorry, this is a case of my bad eyesight not identifying this had changed.
As with the test case (cmpb-3.c) in the 32-bit patch, this error message
isn't all that the user sees.  There is also a "note" diagnostic that ties
the generic overload name to the specific underlying builtin name so that
confusion is avoided.  I'll just submit these separately with a full
explanation.

Same applies to the similar cases below.

>
>> --- a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-sig-2.c
>> +++ b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-sig-2.c
>> @@ -12,5 +12,5 @@ get_significand (double *p)
>>  {
>>double source = *p;
>>  
>> -  return __builtin_vec_scalar_extract_sig (source); /* { dg-error 
>> "'__builtin_vec_scalar_extract_sig' is not supported in this compiler 
>> configuration" } */
>> +  return __builtin_vec_scalar_extract_sig (source); /* { dg-error 
>> "'__builtin_vsx_scalar_extract_sig' requires the" } */
>>  }
> This not either.
>
>> --- a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-insert-exp-2.c
>> +++ b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-insert-exp-2.c
>> @@ -16,5 +16,5 @@ insert_exponent (unsigned long long int *significand_p,
>>unsigned long long int significand = *significand_p;
>>unsigned long long int exponent = *exponent_p;
>>  
>> -  return scalar_insert_exp (significand, exponent); /* { dg-error 
>> "'__builtin_vec_scalar_insert_exp' is not supported in this compiler 
>> configuration" } */
>> +  return scalar_insert_exp (significand, exponent); /* { dg-error 
>> "'__builtin_vsx_scalar_insert_exp' requires the" } */
> Or this.
>
>> --- a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-insert-exp-5.c
>> +++ b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-insert-exp-5.c
>> @@ -16,5 +16,5 @@ insert_exponent (double *significand_p,
>>double significand = *significand_p;
>>unsigned long long int exponent = *exponent_p;
>>  
>> -  return scalar_insert_exp (significand, exponent); /* { dg-error 
>> "'__builtin_vec_scalar_insert_exp' is not supported in this compiler 
>> configuration" } */
>> +  return

Re: [PATCH] x86: Add -mindirect-branch-cs-prefix

2021-11-17 Thread H.J. Lu via Gcc-patches
On Wed, Nov 17, 2021 at 1:10 AM Uros Bizjak  wrote:
>
> On Tue, Nov 16, 2021 at 7:51 PM H.J. Lu via Gcc-patches
>  wrote:
> >
> > Add -mindirect-branch-cs-prefix to add CS prefix to call and jmp to thunk
> > via r8-r15 registers when converting indirect call and jump to increase
> > the instruction length to 6, allowing the non-thunk form to be inlined.
> >
> > gcc/
> >
> > PR target/102952
> > * config/i386/i386.c (ix86_output_jmp_thunk_or_indirect): Emit
> > CS prefix for -mindirect-branch-cs-prefix.
> > (ix86_output_indirect_branch_via_reg): Likewise.
> > * config/i386/i386.opt: Add -mindirect-branch-cs-prefix.
> > * doc/invoke.texi: Document -mindirect-branch-cs-prefix.
> >
> > gcc/testsuite/
> >
> > PR target/102952
> > * gcc.target/i386/indirect-thunk-cs-prefix-1.c: New test.
> > * gcc.target/i386/indirect-thunk-cs-prefix-2.c: Likewise.
> > ---
> >  gcc/config/i386/i386.c|  6 ++
> >  gcc/config/i386/i386.opt  |  4 
> >  gcc/doc/invoke.texi   |  8 +++-
> >  .../gcc.target/i386/indirect-thunk-cs-prefix-1.c  | 14 ++
> >  .../gcc.target/i386/indirect-thunk-cs-prefix-2.c  | 15 +++
> >  5 files changed, 46 insertions(+), 1 deletion(-)
> >  create mode 100644 
> > gcc/testsuite/gcc.target/i386/indirect-thunk-cs-prefix-1.c
> >  create mode 100644 
> > gcc/testsuite/gcc.target/i386/indirect-thunk-cs-prefix-2.c
> >
> > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> > index 7e9b7bc347f..0a902d66321 100644
> > --- a/gcc/config/i386/i386.c
> > +++ b/gcc/config/i386/i386.c
> > @@ -15983,6 +15983,9 @@ ix86_output_jmp_thunk_or_indirect (const char 
> > *thunk_name, const int regno)
> >  {
> >if (thunk_name != NULL)
> >  {
> > +  if (regno >= FIRST_REX_INT_REG
>
>  REX_INT_REGNO_P

Fixed in the v2 patch.

> > + && ix86_indirect_branch_cs_prefix)
> > +   fprintf (asm_out_file, "\tcs\n");
> >fprintf (asm_out_file, "\tjmp\t");
> >assemble_name (asm_out_file, thunk_name);
> >putc ('\n', asm_out_file);
> > @@ -16036,6 +16039,9 @@ ix86_output_indirect_branch_via_reg (rtx call_op, 
> > bool sibcall_p)
> >  {
> >if (thunk_name != NULL)
> > {
> > + if (regno >= FIRST_REX_INT_REG
>
>  REX_INT_REGNO_P

Fixed in the v2 patch.

> > + && ix86_indirect_branch_cs_prefix)
> > +   fprintf (asm_out_file, "\tcs\n");
> >   fprintf (asm_out_file, "\tcall\t");
> >   assemble_name (asm_out_file, thunk_name);
> >   putc ('\n', asm_out_file);
> > diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
> > index 8d499a5a4df..c5452c49597 100644
> > --- a/gcc/config/i386/i386.opt
> > +++ b/gcc/config/i386/i386.opt
> > @@ -1076,6 +1076,10 @@ Enum(indirect_branch) String(thunk-inline) 
> > Value(indirect_branch_thunk_inline)
> >  EnumValue
> >  Enum(indirect_branch) String(thunk-extern) 
> > Value(indirect_branch_thunk_extern)
> >
> > +mindirect-branch-cs-prefix
> > +Target Var(ix86_indirect_branch_cs_prefix) Init(0)
> > +Add CS prefix to call and jmp to thunk when converting indirect call and 
> > jump.
>
> This is not what the function really does. It adds cs to REX prefixed regs.

Fixed in the v2 patch.

Thanks.

> > +
> >  mindirect-branch-register
> >  Target Var(ix86_indirect_branch_register) Init(0)
> >  Force indirect call and jump via register.
> > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > index f3b4b467765..c992a7152f5 100644
> > --- a/gcc/doc/invoke.texi
> > +++ b/gcc/doc/invoke.texi
> > @@ -1425,7 +1425,8 @@ See RS/6000 and PowerPC Options.
> >  -mstack-protector-guard-symbol=@var{symbol} @gol
> >  -mgeneral-regs-only  -mcall-ms2sysv-xlogues -mrelax-cmpxchg-loop @gol
> >  -mindirect-branch=@var{choice}  -mfunction-return=@var{choice} @gol
> > --mindirect-branch-register -mharden-sls=@var{choice} -mneeded}
> > +-mindirect-branch-register -mharden-sls=@var{choice} @gol
> > +-mindirect-branch-cs-prefix -mneeded}
> >
> >  @emph{x86 Windows Options}
> >  @gccoptlist{-mconsole  -mcygwin  -mno-cygwin  -mdll @gol
> > @@ -32390,6 +32391,11 @@ hardening.  @samp{return} enables SLS hardening 
> > for function return.
> >  @samp{indirect-branch} enables SLS hardening for indirect branch.
> >  @samp{all} enables all SLS hardening.
> >
> > +@item -mindirect-branch-cs-prefix
> > +@opindex mindirect-branch-cs-prefix
> > +Add CS prefix to call and jmp to thunk via r8-r15 registers when
> > +converting indirect call and jump.
> > +
> >  @end table
> >
> >  These @samp{-m} switches are supported in addition to the above
> > diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-cs-prefix-1.c 
> > b/gcc/testsuite/gcc.target/i386/indirect-thunk-cs-prefix-1.c
> > new file mode 100644
> > index 000..db2f3416823
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-cs-prefix-1.c
> > @@ 

RE: [PATCH][GCC] aarch64: Add new vector mode V8DI

2021-11-17 Thread Przemyslaw Wirkus via Gcc-patches



> -Original Message-
> From: Richard Sandiford 
> Sent: 17 November 2021 10:08
> To: Przemyslaw Wirkus 
> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw
> ; Kyrylo Tkachov ;
> Marcus Shawcroft 
> Subject: Re: [PATCH][GCC] aarch64: Add new vector mode V8DI
> 
> Oops, only just realised that I hadn't reviewed this.
> 
> Przemyslaw Wirkus  writes:
> > Hi,
> > This patch is adding new V8DI mode which will be used with new
> > Armv8.7-A
> > LS64 extension intrinsics.
> >
> > Regtested on aarch64-elf and no issues.
> >
> > OK for master?
> >
> > gcc/ChangeLog:
> >
> > 2021-11-10  Przemyslaw Wirkus  
> >
> > * config/aarch64/aarch64-modes.def (VECTOR_MODE): New V8DI
> mode.
> > * config/aarch64/aarch64.c (aarch64_hard_regno_mode_ok): Handle
> > V8DImode.
> > * config/aarch64/iterators.md (define_mode_attr nunits): Add entry
> > for V8DI.
> >
> > Kind regards,
> > Przemyslaw Wirkus
> >
> > ---
> >
> > diff --git a/gcc/config/aarch64/aarch64-modes.def
> > b/gcc/config/aarch64/aarch64-modes.def
> > index
> >
> ac97d222789c6701d858c014736f8c211512a4d9..62595b8af6e1eea8fc769885
> bba9
> > fe54f0a9ec05 100644
> > --- a/gcc/config/aarch64/aarch64-modes.def
> > +++ b/gcc/config/aarch64/aarch64-modes.def
> > @@ -81,6 +81,11 @@ INT_MODE (OI, 32);
> >  INT_MODE (CI, 48);
> >  INT_MODE (XI, 64);
> >
> > +/* V8DI mode.  */
> > +VECTOR_MODE_WITH_PREFIX (V, INT, DI, 8, 5); \
> > +  \
> > +  ADJUST_ALIGNMENT (V8DI, 8);
> 
> The backslashes aren't needed here, can just be:
> 
> VECTOR_MODE_WITH_PREFIX (V, INT, DI, 8, 5);
> 
> ADJUST_ALIGNMENT (V8DI, 8);
> 
> > +
> >  /* Define Advanced SIMD modes for structures of 2, 3 and 4
> > d-registers.  */  #define ADV_SIMD_D_REG_STRUCT_MODES(NVECS, VB,
> VH, VS, VD) \
> >VECTOR_MODES_WITH_PREFIX (V##NVECS##x, INT, 8, 3); \ diff --git
> > a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index
> >
> 69f08052ce808c140ed2933ab6b2e2617ca6f669..0e102a83a8dc34e715fafb58
> 1698
> > 97b12c9b3a20 100644
> > --- a/gcc/config/aarch64/aarch64.c
> > +++ b/gcc/config/aarch64/aarch64.c
> > @@ -3376,6 +3376,9 @@ aarch64_hard_regno_nregs (unsigned regno,
> > machine_mode mode)  static bool  aarch64_hard_regno_mode_ok
> (unsigned
> > regno, machine_mode mode)  {
> > +  if (mode == V8DImode)
> > +return IN_RANGE (regno, R0_REGNUM, R23_REGNUM);
> 
> As you pointed out off-list, this should also check for even registers:
> 
> return (IN_RANGE (regno, R0_REGNUM, R23_REGNUM)
>   && multiple_p (regno - R0_REGNUM, 2));
> 
> OK with those changes, thanks.
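For illustration only, the even-register check suggested above can be sketched in standalone C. The names SKETCH_R0, SKETCH_R23 and sketch_v8di_regno_ok are hypothetical stand-ins for R0_REGNUM, R23_REGNUM and the V8DImode branch of aarch64_hard_regno_mode_ok; the installed commit may differ in detail.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for GCC's R0_REGNUM and R23_REGNUM.  */
enum { SKETCH_R0 = 0, SKETCH_R23 = 23 };

/* Return true if REGNO lies in the x0..x23 range and is an even
   offset from x0, mirroring the two conditions from the review.  */
static bool
sketch_v8di_regno_ok (int regno)
{
  bool in_range = regno >= SKETCH_R0 && regno <= SKETCH_R23;
  return in_range && (regno - SKETCH_R0) % 2 == 0;
}
```

With this sketch, register numbers 0 and 22 are accepted, while 1, 23 and 24 are rejected.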

Thank you.

Installed with changes:
commit dd159a4167ca19b5ff26e7156333c88e854943bf

/Przemek

> Richard
> 
> > +
> >if (GET_MODE_CLASS (mode) == MODE_CC)
> >  return regno == CC_REGNUM;
> >
> > diff --git a/gcc/config/aarch64/iterators.md
> > b/gcc/config/aarch64/iterators.md index
> >
> bdc8ba3576cf2c9b4ae96b45a382234e4e25b13f..cea277f3a03cfd20178e51e6
> abd7
> > e256e206299f 100644
> > --- a/gcc/config/aarch64/iterators.md
> > +++ b/gcc/config/aarch64/iterators.md
> > @@ -1053,7 +1053,7 @@ (define_mode_attr vas [(DI "") (SI ".2s")])
> > (define_mode_attr nunits [(V8QI "8") (V16QI "16")
> >   (V4HI "4") (V8HI "8")
> >   (V2SI "2") (V4SI "4")
> > -(V2DI "2")
> > + (V2DI "2") (V8DI "8")
> >   (V4HF "4") (V8HF "8")
> >   (V4BF "4") (V8BF "8")
> >   (V2SF "2") (V4SF "4")


Re: [PATCH v1 2/2] RISC-V: Add instruction fusion (for ventana-vt1)

2021-11-17 Thread Kito Cheng via Gcc-patches
Hi Philipp:

Thanks for the patch. I like this approach: it makes it easy to configure
different capabilities for each core :)

I have only a few minor comments on this patch.

On Mon, Nov 15, 2021 at 5:49 AM Philipp Tomsich
 wrote:
>
> From: Philipp Tomsich 
>
> The Ventana VT1 core supports quad-issue and instruction fusion.
> This implements TARGET_SCHED_MACRO_FUSION_P to keep fusible sequences
> together and adds idiom matching for the supported fusion cases.
>
> gcc/ChangeLog:
>
> * config/riscv/riscv.c (enum riscv_fusion_pairs): Add symbolic
> constants to identify supported fusion patterns.
> (struct riscv_tune_param): Add fusible_op field.
> (riscv_macro_fusion_p): Implement.
> (riscv_fusion_enabled_p): Implement.
> (riscv_macro_fusion_pair_p): Implement and recognize fusible
> idioms for Ventana VT1.
> (TARGET_SCHED_MACRO_FUSION_P): Point to riscv_macro_fusion_p.
> (TARGET_SCHED_MACRO_FUSION_PAIR_P): Point to 
> riscv_macro_fusion_pair_p.
>
> Signed-off-by: Philipp Tomsich 
> ---
>
>  gcc/config/riscv/riscv.c | 196 +++
>  1 file changed, 196 insertions(+)
>
> diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
> index 6b918db65e9..8eac52101a3 100644
> --- a/gcc/config/riscv/riscv.c
> +++ b/gcc/config/riscv/riscv.c
> @@ -211,6 +211,19 @@ struct riscv_integer_op {
> The worst case is LUI, ADDI, SLLI, ADDI, SLLI, ADDI, SLLI, ADDI.  */
>  #define RISCV_MAX_INTEGER_OPS 8
>
> +enum riscv_fusion_pairs
> +{
> +  RISCV_FUSE_NOTHING = 0,
> +  RISCV_FUSE_ZEXTW = (1 << 0),
> +  RISCV_FUSE_ZEXTH = (1 << 1),
> +  RISCV_FUSE_ZEXTWS = (1 << 2),
> +  RISCV_FUSE_LDINDEXED = (1 << 3),

RISCV_FUSE_LDINDEXED -> RISCV_FUSE_LD_INDEXED

Could you add some comments for the enums above, like this:
/* slli rx, rx, 32 + srli rx, rx, 32 */
RISCV_FUSE_ZEXTW

That way we can tell which instruction sequence is fused for each enum.
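As a rough sketch of what such a commented bit-flag enum and its lookup could look like in standalone C: all SKETCH_ and sketch_ names below are hypothetical, only the ZEXTW comment is taken from this review, and the test in sketch_fusion_enabled_p is an assumption since the body of riscv_fusion_enabled_p is not shown in this thread.

```c
#include <stdbool.h>

/* Bit flags naming fusible instruction pairs.  Only the ZEXTW comment
   comes from the review; the other flags are left undocumented here
   because the thread does not spell out their sequences.  */
enum sketch_fusion_pairs
{
  SKETCH_FUSE_NOTHING = 0,
  /* slli rx, rx, 32 + srli rx, rx, 32 */
  SKETCH_FUSE_ZEXTW = (1 << 0),
  SKETCH_FUSE_ZEXTH = (1 << 1),
  SKETCH_FUSE_LD_INDEXED = (1 << 3),
};

/* Return true iff the fusion described by OP is set in FUSIBLE_OPS
   (assumed to be a bitwise OR of the flags above).  */
static bool
sketch_fusion_enabled_p (unsigned int fusible_ops,
			 enum sketch_fusion_pairs op)
{
  return (fusible_ops & op) != 0;
}
```

A tuning entry would then OR together the flags it supports, and a value of SKETCH_FUSE_NOTHING disables every check.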

> +  RISCV_FUSE_LUI_ADDI = (1 << 4),
> +  RISCV_FUSE_AUIPC_ADDI = (1 << 5),
> +  RISCV_FUSE_LUI_LD = (1 << 6),
> +  RISCV_FUSE_AUIPC_LD = (1 << 7),
> +};
> +
>  /* Costs of various operations on the different architectures.  */
>
>  struct riscv_tune_param
> @@ -224,6 +237,7 @@ struct riscv_tune_param
>unsigned short branch_cost;
>unsigned short memory_cost;
>bool slow_unaligned_access;
> +  unsigned int fusible_ops;
>  };
>
>  /* Information about one micro-arch we know about.  */
> @@ -289,6 +303,7 @@ static const struct riscv_tune_param rocket_tune_info = {
>3,   /* branch_cost */
>5,   /* memory_cost */
>true,/* 
> slow_unaligned_access */
> +  RISCV_FUSE_NOTHING,   /* fusible_ops */
>  };
>
>  /* Costs to use when optimizing for Sifive 7 Series.  */
> @@ -302,6 +317,7 @@ static const struct riscv_tune_param sifive_7_tune_info = 
> {
>4,   /* branch_cost */
>3,   /* memory_cost */
>true,/* 
> slow_unaligned_access */
> +  RISCV_FUSE_NOTHING,   /* fusible_ops */
>  };
>
>  /* Costs to use when optimizing for T-HEAD c906.  */
> @@ -328,6 +344,7 @@ static const struct riscv_tune_param 
> optimize_size_tune_info = {
>1,   /* branch_cost */
>2,   /* memory_cost */
>false,   /* slow_unaligned_access */
> +  RISCV_FUSE_NOTHING,   /* fusible_ops */
>  };
>
>  /* Costs to use when optimizing for Ventana Micro VT1.  */
> @@ -341,6 +358,10 @@ static const struct riscv_tune_param 
> ventana_vt1_tune_info = {
>4,   /* branch_cost */
>5,   /* memory_cost */
>false,   /* slow_unaligned_access */
> +  ( RISCV_FUSE_ZEXTW | RISCV_FUSE_ZEXTH |   /* fusible_ops */
> +RISCV_FUSE_ZEXTWS | RISCV_FUSE_LDINDEXED |
> +RISCV_FUSE_LUI_ADDI | RISCV_FUSE_AUIPC_ADDI |
> +RISCV_FUSE_LUI_LD | RISCV_FUSE_AUIPC_LD )
>  };
>
>  static tree riscv_handle_fndecl_attribute (tree *, tree, tree, int, bool *);
> @@ -4909,6 +4930,177 @@ riscv_issue_rate (void)
>return tune_param->issue_rate;
>  }
>
> +/* Implement TARGET_SCHED_MACRO_FUSION_P.  Return true if target supports
> +   instruction fusion of some sort.  */
> +
> +static bool
> +riscv_macro_fusion_p (void)
> +{
> +  return tune_param->fusible_ops != RISCV_FUSE_NOTHING;
> +}
> +
> +/* Return true iff the instruction fusion described by OP is enabled.  */
> +
> +static bool
> +riscv_fusion_enabled_p(enum riscv_fusion_pairs op)

Add a space between the function name and the opening parenthesis:

riscv_fusion_enabled_p (enum riscv_fusion_pairs op)

> +

Re: [PATCH v1 0/2] Basic support for the Ventana VT1 w/ instruction fusion

2021-11-17 Thread Kito Cheng via Gcc-patches
Hi Philipp:

This patch set LGTM; feel free to commit once those issues are addressed.

On Mon, Nov 15, 2021 at 5:48 AM Philipp Tomsich
 wrote:
>
>
> This series provides support for the Ventana VT1 (a 4-way superscalar
> rv64gc_zba_zbb_zbc_zbs core) including support for the supported
> instruction fusion patterns.
>
> This includes the addition of the fusion-aware scheduling
> infrastructure for RISC-V and implements idiom recognition for the
> fusion patterns supported by VT1.
>
>
> Philipp Tomsich (2):
>   RISC-V: Add basic support for the Ventana-VT1 core
>   RISC-V: Add instruction fusion (for ventana-vt1)
>
>  gcc/config/riscv/riscv-cores.def |   2 +
>  gcc/config/riscv/riscv-opts.h|   3 +-
>  gcc/config/riscv/riscv.c | 210 +++
>  gcc/config/riscv/riscv.md|   2 +-
>  gcc/doc/invoke.texi  |   4 +-
>  5 files changed, 217 insertions(+), 4 deletions(-)
>
> --
> 2.32.0
>


Re: [PATCH] x86: Add -mharden-sls=[none|all|return|indirect-branch]

2021-11-17 Thread Uros Bizjak via Gcc-patches
On Wed, Nov 17, 2021 at 2:46 PM H.J. Lu  wrote:
>
> On Wed, Nov 17, 2021 at 1:05 AM Uros Bizjak  wrote:
> >
> > On Tue, Nov 16, 2021 at 7:20 PM H.J. Lu via Gcc-patches
> >  wrote:
> > >
> > > Add -mharden-sls= to mitigate against straight line speculation (SLS)
> > > for function return and indirect branch by adding an INT3 instruction
> > > after function return and indirect branch.
> > >
> > > gcc/
> > >
> > > PR target/102952
> > > * config/i386/i386-opts.h (harden_sls): New enum.
> > > * config/i386/i386.c (output_indirect_thunk): Mitigate against
> > > SLS for function return.
> > > (ix86_output_function_return): Likewise.
> > > (ix86_output_jmp_thunk_or_indirect): Mitigate against indirect
> > > branch.
> > > (ix86_output_indirect_jmp): Likewise.
> > > (ix86_output_call_insn): Likewise.
> > > * config/i386/i386.opt: Add -mharden-sls=.
> > > * doc/invoke.texi: Document -mharden-sls=.
> > >
> > > gcc/testsuite/
> > >
> > > PR target/102952
> > > * gcc.target/i386/harden-sls-1.c: New test.
> > > * gcc.target/i386/harden-sls-2.c: Likewise.
> > > * gcc.target/i386/harden-sls-3.c: Likewise.
> > > * gcc.target/i386/harden-sls-4.c: Likewise.
> > > ---
> > >  gcc/config/i386/i386-opts.h  |  7 +
> > >  gcc/config/i386/i386.c   | 30 
> > >  gcc/config/i386/i386.opt | 20 +
> > >  gcc/doc/invoke.texi  | 10 ++-
> > >  gcc/testsuite/gcc.target/i386/harden-sls-1.c | 14 +
> > >  gcc/testsuite/gcc.target/i386/harden-sls-2.c | 14 +
> > >  gcc/testsuite/gcc.target/i386/harden-sls-3.c | 14 +
> > >  gcc/testsuite/gcc.target/i386/harden-sls-4.c | 14 +
> > >  8 files changed, 116 insertions(+), 7 deletions(-)
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/harden-sls-1.c
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/harden-sls-2.c
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/harden-sls-3.c
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/harden-sls-4.c
> > >
> > > diff --git a/gcc/config/i386/i386-opts.h b/gcc/config/i386/i386-opts.h
> > > index 04e4ad608fb..171d3106d0a 100644
> > > --- a/gcc/config/i386/i386-opts.h
> > > +++ b/gcc/config/i386/i386-opts.h
> > > @@ -121,4 +121,11 @@ enum instrument_return {
> > >instrument_return_nop5
> > >  };
> > >
> > > +enum harden_sls {
> > > +  harden_sls_none = 0,
> > > +  harden_sls_return = 1 << 0,
> > > +  harden_sls_indirect_branch = 1 << 1,
> > > +  harden_sls_all = harden_sls_return | harden_sls_indirect_branch
> > > +};
> > > +
> > >  #endif
> > > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> > > index cc9f9322fad..0a902d66321 100644
> > > --- a/gcc/config/i386/i386.c
> > > +++ b/gcc/config/i386/i386.c
> > > @@ -5914,6 +5914,8 @@ output_indirect_thunk (unsigned int regno)
> > >  }
> > >
> > >fputs ("\tret\n", asm_out_file);
> > > +  if ((ix86_harden_sls & harden_sls_return))
> > > +fputs ("\tint3\n", asm_out_file);
> > >  }
> > >
> > >  /* Output a funtion with a call and return thunk for indirect branch.
> > > @@ -15987,6 +15989,8 @@ ix86_output_jmp_thunk_or_indirect (const char 
> > > *thunk_name, const int regno)
> > >fprintf (asm_out_file, "\tjmp\t");
> > >assemble_name (asm_out_file, thunk_name);
> > >putc ('\n', asm_out_file);
> > > +  if ((ix86_harden_sls & harden_sls_indirect_branch))
> > > +   fputs ("\tint3\n", asm_out_file);
> > >  }
> > >else
> > >  output_indirect_thunk (regno);
> > > @@ -16212,10 +16216,14 @@ ix86_output_indirect_jmp (rtx call_op)
> > > gcc_unreachable ();
> > >
> > >ix86_output_indirect_branch (call_op, "%0", true);
> > > -  return "";
> > > +  if ((ix86_harden_sls & harden_sls_indirect_branch))
> > > +   return "int3";
> > > +  else
> > > +   return "";
> > >  }
> > >else
> > > -return "%!jmp\t%A0";
> > > +return ((ix86_harden_sls & harden_sls_indirect_branch)
> > > +   ? "%!jmp\t%A0\n\tint3" : "%!jmp\t%A0");
> > >  }
> >
> > Just change existing returns to fputs and end function with:
> >
> > return (ix86_harden_sls & harden_sls_indirect_branch) ? "int3" : "";
>
> But fputs doesn't support %A0.

Sorry for the thinko; using output_asm_insn instead of fputs will do the trick.

Uros.
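To make the suggested shape concrete, here is a minimal standalone sketch of the flag test that decides what follows the branch. The SKETCH_ and sketch_ names are hypothetical; in the real function the jump itself would be emitted first (via output_asm_insn, per the correction above) and only the trailing string returned.

```c
/* Hypothetical mirror of the harden_sls bit flags from the patch.  */
enum sketch_harden_sls
{
  SKETCH_SLS_NONE = 0,
  SKETCH_SLS_RETURN = 1 << 0,
  SKETCH_SLS_INDIRECT_BRANCH = 1 << 1,
  SKETCH_SLS_ALL = SKETCH_SLS_RETURN | SKETCH_SLS_INDIRECT_BRANCH
};

/* Text to emit after an indirect jump: "int3" when indirect-branch
   hardening is enabled in FLAGS, otherwise the empty string.  */
static const char *
sketch_after_indirect_jmp (unsigned int flags)
{
  return (flags & SKETCH_SLS_INDIRECT_BRANCH) ? "int3" : "";
}

/* Text to emit after a function return: "int3" when return hardening
   is enabled in FLAGS, otherwise the empty string.  */
static const char *
sketch_after_return (unsigned int flags)
{
  return (flags & SKETCH_SLS_RETURN) ? "int3" : "";
}
```

Because each flag is a separate bit, the "all" setting enables both helpers without any extra case analysis.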

>
> > >  /* Output return instrumentation for current function if needed.  */
> > > @@ -16283,10 +16291,15 @@ ix86_output_function_return (bool long_p)
> > >return "";
> > >  }
> > >
> > > -  if (!long_p)
> > > -return "%!ret";
> > > +  if ((ix86_harden_sls & harden_sls_return))
> > > +return "%!ret\n\tint3";
> > > +  else
> > > +{
> > > +  if (!long_p)
> > > +   return "%!ret";
> > >
> > > -  return "rep%; ret";
> > > +  return "rep%; ret";
> > > +}
> > >  }
> >
> > Also here.
>
> But fpu

Re: [PATCH v1 1/8] bswap: synthesize HImode bswap from SImode or DImode

2021-11-17 Thread Kito Cheng via Gcc-patches
Hi Philipp:

I would suggest adding a define_expand pattern for bswaphi2 rather than
changing expand_unop, for the following reasons:

- There is a comment above this change, and the code also tries widen_bswap
  after this if-block, so I think this patch goes against that comment.
 /* HImode is special because in this mode BSWAP is equivalent to ROTATE
or ROTATERT.  First try these directly; if this fails, then try the
obvious pair of shifts with allowed widening, as this will probably
be always more efficient than the other fallback methods.  */

- This change doesn't improve the code gen without bswapsi2 or bswapdi2
  (e.g. rv64gc generates the same code), and it might also affect other
  targets.  We have no evidence that it always gives better results, so I
  guess we should at least add a target hook for this.

- ...I don't have permission to approve this change since it's not
  part of the RISC-V back-end :p
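As an aside, the two HImode strategies discussed here (a rotate by 8 bits versus a wider bswap followed by a shift) can be compared in a small standalone C sketch. The sketch_ helpers are hypothetical and only model the arithmetic; they say nothing about which expansion a given target should prefer.

```c
#include <stdint.h>

/* Portable 32-bit byte swap, standing in for a native SImode bswap.  */
static uint32_t
sketch_bswap32 (uint32_t x)
{
  return (x >> 24) | ((x >> 8) & 0x0000ff00u)
	 | ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* HImode bswap expressed as a rotate by 8 bits.  */
static uint16_t
sketch_bswap16_rotate (uint16_t x)
{
  return (uint16_t) ((x << 8) | (x >> 8));
}

/* HImode bswap expressed as a wider bswap followed by a right shift
   that moves the swapped bytes back into the low half.  */
static uint16_t
sketch_bswap16_widened (uint16_t x)
{
  return (uint16_t) (sketch_bswap32 (x) >> 16);
}
```

Both helpers compute the same value for every 16-bit input, so the choice between them is purely a matter of which sequence is cheaper on the target.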

On Thu, Nov 11, 2021 at 10:10 PM Philipp Tomsich
 wrote:
>
> The RISC-V Zbb extension adds an XLEN (i.e. SImode for rv32, DImode
> for rv64) bswap instruction (rev8).  While, with the current master,
> SImode is synthesized correctly from DImode, HImode is not.
>
> This change adds an appropriate expansion for a HImode bswap, if a
> wider bswap is available.
>
> Without this change, the following rv64gc_zbb code is generated for
> __builtin_bswap16():
> slliw   a5,a0,8
> zext.h  a0,a0
> srliw   a0,a0,8
> or  a0,a5,a0
> sext.h  a0,a0  // this is a 16bit sign-extension following
>// the byteswap (e.g. on a 'short' function
>// return).
>
> After this change, a bswap (rev8) is used and any extensions are
> combined into the shift-right:
> rev8    a0,a0
> srai    a0,a0,48   // the sign-extension is combined into the
>// shift; a srli is emitted otherwise...
>
> gcc/ChangeLog:
>
> * optabs.c (expand_unop): support expanding a HImode bswap
>   using SImode or DImode, followed by a shift.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/zbb-bswap.c: New test.
>
> Signed-off-by: Philipp Tomsich 
> ---
>
>  gcc/optabs.c   |  6 ++
>  gcc/testsuite/gcc.target/riscv/zbb-bswap.c | 22 ++
>  2 files changed, 28 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-bswap.c
>
> diff --git a/gcc/optabs.c b/gcc/optabs.c
> index 019bbb62882..7a3ffbe4525 100644
> --- a/gcc/optabs.c
> +++ b/gcc/optabs.c
> @@ -3307,6 +3307,12 @@ expand_unop (machine_mode mode, optab unoptab, rtx 
> op0, rtx target,
> return temp;
> }
>
> + /* If we are missing a HImode BSWAP, but have one for SImode or
> +DImode, use a BSWAP followed by a SHIFT.  */
> + temp = widen_bswap (as_a  (mode), op0, target);
> + if (temp)
> +   return temp;
> +
>   last = get_last_insn ();
>
>   temp1 = expand_binop (mode, ashl_optab, op0,
> diff --git a/gcc/testsuite/gcc.target/riscv/zbb-bswap.c 
> b/gcc/testsuite/gcc.target/riscv/zbb-bswap.c
> new file mode 100644
> index 000..6ee27d9f47a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/zbb-bswap.c
> @@ -0,0 +1,22 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gc_zbb -mabi=lp64 -O2" } */
> +
> +unsigned long
> +func64 (unsigned long i)
> +{
> +  return __builtin_bswap64(i);
> +}
> +
> +unsigned int
> +func32 (unsigned int i)
> +{
> +  return __builtin_bswap32(i);
> +}
> +
> +unsigned short
> +func16 (unsigned short i)
> +{
> +  return __builtin_bswap16(i);
> +}
> +
> +/* { dg-final { scan-assembler-times "rev8" 3 } } */
> --
> 2.32.0
>


Re: [PATCH v1 2/8] RISC-V: costs: handle BSWAP

2021-11-17 Thread Kito Cheng via Gcc-patches
> diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
> index c77b0322869..8480cf09294 100644
> --- a/gcc/config/riscv/riscv.c
> +++ b/gcc/config/riscv/riscv.c
> @@ -2131,6 +2131,14 @@ riscv_rtx_costs (rtx x, machine_mode mode, int 
> outer_code, int opno ATTRIBUTE_UN
>*total = riscv_extend_cost (XEXP (x, 0), GET_CODE (x) == ZERO_EXTEND);
>return false;
>
> +case BSWAP:
> +  if (TARGET_ZBB)
> +   {
> + *total = COSTS_N_INSNS (1);

Should we add a cost model for HImode?  Maybe `*total = COSTS_N_INSNS (mode ==
HImode ? 2 : 1);`?


Re: [PATCH] handle folded nonconstant array bounds [PR101702]

2021-11-17 Thread Marek Polacek via Gcc-patches
On Tue, Nov 16, 2021 at 05:32:00PM -0700, Martin Sebor via Gcc-patches wrote:
> -Warray-parameter and -Wvla-parameter assume that array bounds
> in function parameters are either constant integers or variable,
> but not something in between like a cast of a constant that's
> not recognized as an INTEGER_CST until we strip the cast from
> it.  This leads to an ICE as the internal checks fail.
> 
> The attached patch fixes the problem by stripping the casts
> earlier than before, preventing the inconsistency.  In addition,
> it also folds the array bound, avoiding a class of false
> positives and negatives that not doing so would lead to otherwise.
> 
> Tested on x86_64-linux.
> 
> Martin

> Handle folded nonconstant array bounds [PR101702]
> 
> PR c/101702 - ICE: in handle_argspec_attribute, at c-family/c-attribs.c:3623
> 
> gcc/c/ChangeLog:
> 
>   PR c/101702
>   * c-decl.c (get_parm_array_spec): Strip casts earlier and fold array
>   bounds before deciding if they're constant.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR c/101702
>   * gcc.dg/Warray-parameter-11.c: New test.
> 
> diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
> index 186fa1692c1..63d806a84c9 100644
> --- a/gcc/c/c-decl.c
> +++ b/gcc/c/c-decl.c
> @@ -5866,6 +5866,12 @@ get_parm_array_spec (const struct c_parm *parm, tree 
> attrs)
>if (pd->u.array.static_p)
>   spec += 's';
>  
> +  if (!INTEGRAL_TYPE_P (TREE_TYPE (nelts)))
> + /* Avoid invalid NELTS.  */
> + return attrs;
> +
> +  STRIP_NOPS (nelts);
> +  nelts = c_fully_fold (nelts, false, nullptr);

STRIP_NOPS before a call to c_fully_fold looks sort of weird, but I see
it's needed to prevent bogus warnings in Wvla-parameter-12.c:

void f2ci_can (const int m, char a[m]);
void f2ci_can (int n,   char a[n])

OK for trunk then.

>if (TREE_CODE (nelts) == INTEGER_CST)
>   {
> /* Skip all constant bounds except the most significant one.
> @@ -5883,13 +5889,9 @@ get_parm_array_spec (const struct c_parm *parm, tree 
> attrs)
> spec += buf;
> break;
>   }
> -  else if (!INTEGRAL_TYPE_P (TREE_TYPE (nelts)))
> - /* Avoid invalid NELTS.  */
> - return attrs;
>  
>/* Each variable VLA bound is represented by a dollar sign.  */
>spec += "$";
> -  STRIP_NOPS (nelts);
>vbchain = tree_cons (NULL_TREE, nelts, vbchain);
>  }
>  
> diff --git a/gcc/testsuite/gcc.dg/Warray-parameter-11.c 
> b/gcc/testsuite/gcc.dg/Warray-parameter-11.c
> new file mode 100644
> index 000..8ca1b55bd28
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/Warray-parameter-11.c
> @@ -0,0 +1,24 @@
> +/* PR c/101702 - ICE on invalid function redeclaration
> +   { dg-do compile }
> +   { dg-options "-Wall" } */
> +
> +typedef __INTPTR_TYPE__ intptr_t;
> +
> +#define copysign(x, y) __builtin_copysign (x, y)
> +
> +void f0 (double[!copysign (~2, 3)]);
> +
> +void f1 (double[!copysign (~2, 3)]);
> +void f1 (double[1]);// { dg-warning "-Warray-parameter" }
> +
> +void f2 (int[(int)+1.0]);
> +void f2 (int[(int)+1.1]);
> +
> +/* Also verify that equivalent expressions don't needlessly cause false
> +   positives or negatives.  */
> +struct S { int a[1]; };
> +extern struct S *sp;
> +
> +void f3 (int[(intptr_t)((char*)sp->a - (char*)sp)]);
> +void f3 (int[(intptr_t)((char*)&sp->a[0] - (char*)sp)]);
> +void f3 (int[(intptr_t)((char*)&sp->a[1] - (char*)sp)]);   // { dg-warning 
> "-Warray-parameter" }


Marek



[PATCH v2] x86: Add -mharden-sls=[none|all|return|indirect-branch]

2021-11-17 Thread H.J. Lu via Gcc-patches
Add -mharden-sls= to mitigate against straight line speculation (SLS)
for function return and indirect branch by adding an INT3 instruction
after function return and indirect branch.

gcc/

PR target/102952
* config/i386/i386-opts.h (harden_sls): New enum.
* config/i386/i386.c (output_indirect_thunk): Mitigate against
SLS for function return.
(ix86_output_function_return): Likewise.
(ix86_output_jmp_thunk_or_indirect): Mitigate against indirect
branch.
(ix86_output_indirect_jmp): Likewise.
(ix86_output_call_insn): Likewise.
* config/i386/i386.opt: Add -mharden-sls=.
* doc/invoke.texi: Document -mharden-sls=.

gcc/testsuite/

PR target/102952
* gcc.target/i386/harden-sls-1.c: New test.
* gcc.target/i386/harden-sls-2.c: Likewise.
* gcc.target/i386/harden-sls-3.c: Likewise.
* gcc.target/i386/harden-sls-4.c: Likewise.
* gcc.target/i386/harden-sls-5.c: Likewise.
---
 gcc/config/i386/i386-opts.h  |  7 ++
 gcc/config/i386/i386.c   | 23 ++--
 gcc/config/i386/i386.opt | 20 +
 gcc/doc/invoke.texi  | 10 -
 gcc/testsuite/gcc.target/i386/harden-sls-1.c | 14 
 gcc/testsuite/gcc.target/i386/harden-sls-2.c | 14 
 gcc/testsuite/gcc.target/i386/harden-sls-3.c | 14 
 gcc/testsuite/gcc.target/i386/harden-sls-4.c | 16 ++
 gcc/testsuite/gcc.target/i386/harden-sls-5.c | 17 +++
 9 files changed, 127 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/harden-sls-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/harden-sls-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/harden-sls-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/harden-sls-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/harden-sls-5.c

diff --git a/gcc/config/i386/i386-opts.h b/gcc/config/i386/i386-opts.h
index 04e4ad608fb..171d3106d0a 100644
--- a/gcc/config/i386/i386-opts.h
+++ b/gcc/config/i386/i386-opts.h
@@ -121,4 +121,11 @@ enum instrument_return {
   instrument_return_nop5
 };
 
+enum harden_sls {
+  harden_sls_none = 0,
+  harden_sls_return = 1 << 0,
+  harden_sls_indirect_branch = 1 << 1,
+  harden_sls_all = harden_sls_return | harden_sls_indirect_branch
+};
+
 #endif
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 73c4d5115bb..8bbf6ae9875 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -5914,6 +5914,8 @@ output_indirect_thunk (unsigned int regno)
 }
 
   fputs ("\tret\n", asm_out_file);
+  if ((ix86_harden_sls & harden_sls_return))
+fputs ("\tint3\n", asm_out_file);
 }
 
 /* Output a funtion with a call and return thunk for indirect branch.
@@ -15984,6 +15986,8 @@ ix86_output_jmp_thunk_or_indirect (const char 
*thunk_name, const int regno)
   fprintf (asm_out_file, "\tjmp\t");
   assemble_name (asm_out_file, thunk_name);
   putc ('\n', asm_out_file);
+  if ((ix86_harden_sls & harden_sls_indirect_branch))
+   fputs ("\tint3\n", asm_out_file);
 }
   else
 output_indirect_thunk (regno);
@@ -16206,10 +16210,10 @@ ix86_output_indirect_jmp (rtx call_op)
gcc_unreachable ();
 
   ix86_output_indirect_branch (call_op, "%0", true);
-  return "";
 }
   else
-return "%!jmp\t%A0";
+output_asm_insn ("%!jmp\t%A0", &call_op);
+  return (ix86_harden_sls & harden_sls_indirect_branch) ? "int3" : "";
 }
 
 /* Output return instrumentation for current function if needed.  */
@@ -16277,10 +16281,10 @@ ix86_output_function_return (bool long_p)
   return "";
 }
 
-  if (!long_p)
-return "%!ret";
-
-  return "rep%; ret";
+  if ((ix86_harden_sls & harden_sls_return))
+long_p = false;
+  output_asm_insn (long_p ? "rep%; ret" : "%!ret", nullptr);
+  return (ix86_harden_sls & harden_sls_return) ? "int3" : "";
 }
 
 /* Output indirect function return.  RET_OP is the function return
@@ -16375,7 +16379,12 @@ ix86_output_call_insn (rtx_insn *insn, rtx call_op)
   if (output_indirect_p && !direct_p)
ix86_output_indirect_branch (call_op, xasm, true);
   else
-   output_asm_insn (xasm, &call_op);
+   {
+ output_asm_insn (xasm, &call_op);
+ if (!direct_p
+ && (ix86_harden_sls & harden_sls_indirect_branch))
+   return "int3";
+   }
   return "";
 }
 
diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index 46fad3cc038..8d499a5a4df 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -1117,6 +1117,26 @@ mrecord-return
 Target Var(ix86_flag_record_return) Init(0)
 Generate a __return_loc section pointing to all return instrumentation code.
 
+mharden-sls=
+Target RejectNegative Joined Enum(harden_sls) Var(ix86_harden_sls) 
Init(harden_sls_none)
+Generate code to mitigate against straight line speculation.
+
+Enum

Re: [AArch64] Enable generation of FRINTNZ instructions

2021-11-17 Thread Richard Sandiford via Gcc-patches
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 
> 4035e061706793849c68ae09bcb2e4b9580ab7b6..62adbc4cb6bbbe0c856f9fbe451aee08f2dea3b5
>  100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -7345,6 +7345,14 @@ (define_insn "despeculate_simpleti"
> (set_attr "speculation_barrier" "true")]
>  )
>  
> +(define_expand "ftrunc2"
> +  [(set (match_operand:VSFDF 0 "register_operand" "=w")
> +(unspec:VSFDF [(match_operand:VSFDF 1 "register_operand" "w")]
> +   FRINTNZ))]
> +  "TARGET_FRINT && TARGET_FLOAT
> +   && !(VECTOR_MODE_P (mode) && !TARGET_SIMD)"
> +)

Probably just me, but this condition seems quite hard to read.
I think it'd be better to add conditions to the VSFDF definition instead,
a bit like we do for the HF entries in VHSDF_HSDF and VHSDF_DF.  I.e.:

(define_mode_iterator VSFDF [(V2SF "TARGET_SIMD")
 (V4SF "TARGET_SIMD")
 (V2DF "TARGET_SIMD")
 (SF "TARGET_FLOAT")
 (DF "TARGET_FLOAT")])

Then the condition can be "TARGET_FRINT".

Same for the existing aarch64_.

> diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> index bb13c6cce1bf55633760bc14980402f1f0ac1689..fb97d37cecae17cdb6444e7f3391361b214f0712 100644
> --- a/gcc/internal-fn.def
> +++ b/gcc/internal-fn.def
> @@ -269,6 +269,7 @@ DEF_INTERNAL_FLT_FLOATN_FN (RINT, ECF_CONST, rint, unary)
>  DEF_INTERNAL_FLT_FLOATN_FN (ROUND, ECF_CONST, round, unary)
>  DEF_INTERNAL_FLT_FLOATN_FN (ROUNDEVEN, ECF_CONST, roundeven, unary)
>  DEF_INTERNAL_FLT_FLOATN_FN (TRUNC, ECF_CONST, btrunc, unary)
> +DEF_INTERNAL_OPTAB_FN (FTRUNC_INT, ECF_CONST, ftruncint, ftrunc_int)

ftrunc_int should be described in the comment at the top of the file.
E.g.:

  - ftrunc_int: a unary conversion optab that takes and returns values
of the same mode, but internally converts via another mode.  This
second mode is specified using a dummy final function argument.

> diff --git a/gcc/testsuite/gcc.target/aarch64/frintnz.c b/gcc/testsuite/gcc.target/aarch64/frintnz.c
> new file mode 100644
> index 
> ..2e1971f8aa11d8b95f454d03a03e050a3bf96747
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/frintnz.c
> @@ -0,0 +1,88 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -march=armv8.5-a" } */
> +/* { dg-require-effective-target arm_v8_5a_frintnzx_ok } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> +
> +/*
> +** f1:
> +**   ...
> +**   frint32z	s0, s0
> +**   ...

Are these functions ever more than just:

f1:
	frint32z	s0, s0
	ret

?  If not, I think we should match that sequence and “defend” the
good codegen.  The problem with ... on both sides is that it's
then not clear why we can rely on register 0 being used.

> +*/
> +float
> +f1 (float x)
> +{
> +  int y = x;
> +  return (float) y;
> +}
> +
> +/*
> +** f2:
> +**   ...
> +**   frint64z	s0, s0
> +**   ...
> +*/
> +float
> +f2 (float x)
> +{
> +  long long int y = x;
> +  return (float) y;
> +}
> +
> +/*
> +** f3:
> +**   ...
> +**   frint32z	d0, d0
> +**   ...
> +*/
> +double
> +f3 (double x)
> +{
> +  int y = x;
> +  return (double) y;
> +}
> +
> +/*
> +** f4:
> +**   ...
> +**   frint64z	d0, d0
> +**   ...
> +*/
> +double
> +f4 (double x)
> +{
> +  long long int y = x;
> +  return (double) y;
> +}
> +
> +float
> +f1_dont (float x)
> +{
> +  unsigned int y = x;
> +  return (float) y;
> +}
> +
> +float
> +f2_dont (float x)
> +{
> +  unsigned long long int y = x;
> +  return (float) y;
> +}
> +
> +double
> +f3_dont (double x)
> +{
> +  unsigned int y = x;
> +  return (double) y;
> +}
> +
> +double
> +f4_dont (double x)
> +{
> +  unsigned long long int y = x;
> +  return (double) y;
> +}
> +
> +/* Make sure the 'dont's don't generate any frintNz.  */
> +/* { dg-final { scan-assembler-times {frint32z} 2 } } */
> +/* { dg-final { scan-assembler-times {frint64z} 2 } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/merge_trunc1.c b/gcc/testsuite/gcc.target/aarch64/merge_trunc1.c
> index 07217064e2ba54fcf4f5edc440e6ec19ddae66e1..3b34dc3ad79f1406a41ec4c00db10347ba1ca2c4 100644
> --- a/gcc/testsuite/gcc.target/aarch64/merge_trunc1.c
> +++ b/gcc/testsuite/gcc.target/aarch64/merge_trunc1.c
> @@ -1,5 +1,6 @@
>  /* { dg-do compile } */
>  /* { dg-options "-O2 -ffast-math" } */
> +/* { dg-skip-if "" { arm_v8_5a_frintnzx_ok } } */
>  
>  float
>  f1 (float x)
> diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
> index 8cbda192fe0fae59ea208ee43696b4d22c43e61e..7fa1659ce734257f3cd96f1e2e50ace4d02dcf51 100644
> --- a/gcc/testsuite/lib/target-supports.exp
> +++ b/gcc/testsuite/lib/target-supports.exp
> @@ -11365,6 +11365,33 @@ proc check_effective_target_arm_v8_3a_bkey_directive { } {
>   }]
>  }
>  
> +# Return 1 if the target supports ARMv8.

Re: [PATCH] x86: Add -mharden-sls=[none|all|return|indirect-branch]

2021-11-17 Thread H.J. Lu via Gcc-patches
On Wed, Nov 17, 2021 at 6:08 AM Uros Bizjak  wrote:
>
> On Wed, Nov 17, 2021 at 2:46 PM H.J. Lu  wrote:
> >
> > On Wed, Nov 17, 2021 at 1:05 AM Uros Bizjak  wrote:
> > >
> > > On Tue, Nov 16, 2021 at 7:20 PM H.J. Lu via Gcc-patches
> > >  wrote:
> > > >
> > > > Add -mharden-sls= to mitigate against straight line speculation (SLS)
> > > > for function return and indirect branch by adding an INT3 instruction
> > > > after function return and indirect branch.
> > > >
> > > > gcc/
> > > >
> > > > PR target/102952
> > > > * config/i386/i386-opts.h (harden_sls): New enum.
> > > > * config/i386/i386.c (output_indirect_thunk): Mitigate against
> > > > SLS for function return.
> > > > (ix86_output_function_return): Likewise.
> > > > (ix86_output_jmp_thunk_or_indirect): Mitigate against indirect
> > > > branch.
> > > > (ix86_output_indirect_jmp): Likewise.
> > > > (ix86_output_call_insn): Likewise.
> > > > * config/i386/i386.opt: Add -mharden-sls=.
> > > > * doc/invoke.texi: Document -mharden-sls=.
> > > >
> > > > gcc/testsuite/
> > > >
> > > > PR target/102952
> > > > * gcc.target/i386/harden-sls-1.c: New test.
> > > > * gcc.target/i386/harden-sls-2.c: Likewise.
> > > > * gcc.target/i386/harden-sls-3.c: Likewise.
> > > > * gcc.target/i386/harden-sls-4.c: Likewise.
> > > > ---
> > > >  gcc/config/i386/i386-opts.h  |  7 +
> > > >  gcc/config/i386/i386.c   | 30 
> > > >  gcc/config/i386/i386.opt | 20 +
> > > >  gcc/doc/invoke.texi  | 10 ++-
> > > >  gcc/testsuite/gcc.target/i386/harden-sls-1.c | 14 +
> > > >  gcc/testsuite/gcc.target/i386/harden-sls-2.c | 14 +
> > > >  gcc/testsuite/gcc.target/i386/harden-sls-3.c | 14 +
> > > >  gcc/testsuite/gcc.target/i386/harden-sls-4.c | 14 +
> > > >  8 files changed, 116 insertions(+), 7 deletions(-)
> > > >  create mode 100644 gcc/testsuite/gcc.target/i386/harden-sls-1.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/i386/harden-sls-2.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/i386/harden-sls-3.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/i386/harden-sls-4.c
> > > >
> > > > diff --git a/gcc/config/i386/i386-opts.h b/gcc/config/i386/i386-opts.h
> > > > index 04e4ad608fb..171d3106d0a 100644
> > > > --- a/gcc/config/i386/i386-opts.h
> > > > +++ b/gcc/config/i386/i386-opts.h
> > > > @@ -121,4 +121,11 @@ enum instrument_return {
> > > >instrument_return_nop5
> > > >  };
> > > >
> > > > +enum harden_sls {
> > > > +  harden_sls_none = 0,
> > > > +  harden_sls_return = 1 << 0,
> > > > +  harden_sls_indirect_branch = 1 << 1,
> > > > +  harden_sls_all = harden_sls_return | harden_sls_indirect_branch
> > > > +};
> > > > +
> > > >  #endif
> > > > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> > > > index cc9f9322fad..0a902d66321 100644
> > > > --- a/gcc/config/i386/i386.c
> > > > +++ b/gcc/config/i386/i386.c
> > > > @@ -5914,6 +5914,8 @@ output_indirect_thunk (unsigned int regno)
> > > >  }
> > > >
> > > >fputs ("\tret\n", asm_out_file);
> > > > +  if ((ix86_harden_sls & harden_sls_return))
> > > > +fputs ("\tint3\n", asm_out_file);
> > > >  }
> > > >
> > > >  /* Output a funtion with a call and return thunk for indirect branch.
> > > > @@ -15987,6 +15989,8 @@ ix86_output_jmp_thunk_or_indirect (const char *thunk_name, const int regno)
> > > >fprintf (asm_out_file, "\tjmp\t");
> > > >assemble_name (asm_out_file, thunk_name);
> > > >putc ('\n', asm_out_file);
> > > > +  if ((ix86_harden_sls & harden_sls_indirect_branch))
> > > > +   fputs ("\tint3\n", asm_out_file);
> > > >  }
> > > >else
> > > >  output_indirect_thunk (regno);
> > > > @@ -16212,10 +16216,14 @@ ix86_output_indirect_jmp (rtx call_op)
> > > > gcc_unreachable ();
> > > >
> > > >ix86_output_indirect_branch (call_op, "%0", true);
> > > > -  return "";
> > > > +  if ((ix86_harden_sls & harden_sls_indirect_branch))
> > > > +   return "int3";
> > > > +  else
> > > > +   return "";
> > > >  }
> > > >else
> > > > -return "%!jmp\t%A0";
> > > > +return ((ix86_harden_sls & harden_sls_indirect_branch)
> > > > +   ? "%!jmp\t%A0\n\tint3" : "%!jmp\t%A0");
> > > >  }
> > >
> > > Just change existing returns to fputs and end function with:
> > >
> > > return (ix86_harden_sls & harden_sls_indirect_branch) ? "int3" : "";
> >
> > But fputs doesn't support %A0.
>
> Sorry for the thinko, output_asm_insn instead of fputs will do the trick.

Fixed in the v2 patch.

Thanks.

> Uros.
>
> >
> > > >  /* Output return instrumentation for current function if needed.  */
> > > > @@ -16283,10 +16291,15 @@ ix86_output_function_return (bool long_p)
> > > >return "";
> > > >  }
> > > >
> > > > -  if (!long

Re: [PATCH v2] x86: Add -mharden-sls=[none|all|return|indirect-branch]

2021-11-17 Thread Uros Bizjak via Gcc-patches
On Wed, Nov 17, 2021 at 4:35 PM H.J. Lu  wrote:
>
> Add -mharden-sls= to mitigate against straight line speculation (SLS)
> for function return and indirect branch by adding an INT3 instruction
> after function return and indirect branch.
>
> gcc/
>
> PR target/102952
> * config/i386/i386-opts.h (harden_sls): New enum.
> * config/i386/i386.c (output_indirect_thunk): Mitigate against
> SLS for function return.
> (ix86_output_function_return): Likewise.
> (ix86_output_jmp_thunk_or_indirect): Mitigate against indirect
> branch.
> (ix86_output_indirect_jmp): Likewise.
> (ix86_output_call_insn): Likewise.
> * config/i386/i386.opt: Add -mharden-sls=.
> * doc/invoke.texi: Document -mharden-sls=.
>
> gcc/testsuite/
>
> PR target/102952
> * gcc.target/i386/harden-sls-1.c: New test.
> * gcc.target/i386/harden-sls-2.c: Likewise.
> * gcc.target/i386/harden-sls-3.c: Likewise.
> * gcc.target/i386/harden-sls-4.c: Likewise.
> * gcc.target/i386/harden-sls-5.c: Likewise.
> ---
>  gcc/config/i386/i386-opts.h  |  7 ++
>  gcc/config/i386/i386.c   | 23 ++--
>  gcc/config/i386/i386.opt | 20 +
>  gcc/doc/invoke.texi  | 10 -
>  gcc/testsuite/gcc.target/i386/harden-sls-1.c | 14 
>  gcc/testsuite/gcc.target/i386/harden-sls-2.c | 14 
>  gcc/testsuite/gcc.target/i386/harden-sls-3.c | 14 
>  gcc/testsuite/gcc.target/i386/harden-sls-4.c | 16 ++
>  gcc/testsuite/gcc.target/i386/harden-sls-5.c | 17 +++
>  9 files changed, 127 insertions(+), 8 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/harden-sls-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/harden-sls-2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/harden-sls-3.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/harden-sls-4.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/harden-sls-5.c
>
> diff --git a/gcc/config/i386/i386-opts.h b/gcc/config/i386/i386-opts.h
> index 04e4ad608fb..171d3106d0a 100644
> --- a/gcc/config/i386/i386-opts.h
> +++ b/gcc/config/i386/i386-opts.h
> @@ -121,4 +121,11 @@ enum instrument_return {
>instrument_return_nop5
>  };
>
> +enum harden_sls {
> +  harden_sls_none = 0,
> +  harden_sls_return = 1 << 0,
> +  harden_sls_indirect_branch = 1 << 1,
> +  harden_sls_all = harden_sls_return | harden_sls_indirect_branch
> +};
> +
>  #endif
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 73c4d5115bb..8bbf6ae9875 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -5914,6 +5914,8 @@ output_indirect_thunk (unsigned int regno)
>  }
>
>fputs ("\tret\n", asm_out_file);
> +  if ((ix86_harden_sls & harden_sls_return))
> +fputs ("\tint3\n", asm_out_file);
>  }
>
>  /* Output a funtion with a call and return thunk for indirect branch.
> @@ -15984,6 +15986,8 @@ ix86_output_jmp_thunk_or_indirect (const char *thunk_name, const int regno)
>fprintf (asm_out_file, "\tjmp\t");
>assemble_name (asm_out_file, thunk_name);
>putc ('\n', asm_out_file);
> +  if ((ix86_harden_sls & harden_sls_indirect_branch))
> +   fputs ("\tint3\n", asm_out_file);
>  }
>else
>  output_indirect_thunk (regno);
> @@ -16206,10 +16210,10 @@ ix86_output_indirect_jmp (rtx call_op)
> gcc_unreachable ();
>
>ix86_output_indirect_branch (call_op, "%0", true);
> -  return "";
>  }
>else
> -return "%!jmp\t%A0";
> +output_asm_insn ("%!jmp\t%A0", &call_op);
> +  return (ix86_harden_sls & harden_sls_indirect_branch) ? "int3" : "";
>  }
>
>  /* Output return instrumentation for current function if needed.  */
> @@ -16277,10 +16281,10 @@ ix86_output_function_return (bool long_p)
>return "";
>  }
>
> -  if (!long_p)
> -return "%!ret";
> -
> -  return "rep%; ret";
> +  if ((ix86_harden_sls & harden_sls_return))
> +long_p = false;

Is the above really needed? This will change "rep ret" to a "[notrack]
ret" when SLS hardening is in effect, with a conditional [notrack]
prefix, even when long ret was requested.

On a related note, "notrack ret" does not assemble for me, the
assembler reports:

notrack.s:1: Error: expecting indirect branch instruction after `notrack'

Can you please clarify the above change?

Uros.
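
For readers following the question above, here is a minimal sketch of the control flow in the quoted v2 hunk of ix86_output_function_return.  It is a stand-alone model, not GCC code: sketch_function_return and its buffer interface are hypothetical, and the `%!` and `%;` output escapes are deliberately left out, so only the choice between the long and short return forms and the trailing int3 is shown.

```c
#include <stdbool.h>
#include <stdio.h>

/* Same values as the enum added to i386-opts.h by the patch.  */
enum harden_sls {
  harden_sls_none = 0,
  harden_sls_return = 1 << 0,
  harden_sls_indirect_branch = 1 << 1,
  harden_sls_all = harden_sls_return | harden_sls_indirect_branch
};

/* Hypothetical model of the v2 logic: write the return sequence into BUF.
   Assumption: the real function emits through output_asm_insn and the
   insn template; this sketch only formats a string.  */
static void
sketch_function_return (char *buf, size_t size, bool long_p, int harden_sls)
{
  bool harden = (harden_sls & harden_sls_return) != 0;

  /* The v2 hunk forces the short form whenever return hardening is on,
     which is the behaviour being questioned in the review.  */
  if (harden)
    long_p = false;

  snprintf (buf, size, "%s%s",
            long_p ? "rep ret" : "ret",
            harden ? "\n\tint3" : "");
}
```

With long_p set and -mharden-sls=return modelled, the sketch yields "ret" followed by "int3" rather than "rep ret", which matches the concern raised above.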

> +  output_asm_insn (long_p ? "rep%; ret" : "%!ret", nullptr);
> +  return (ix86_harden_sls & harden_sls_return) ? "int3" : "";
>  }
>
>  /* Output indirect function return.  RET_OP is the function return
> @@ -16375,7 +16379,12 @@ ix86_output_call_insn (rtx_insn *insn, rtx call_op)
>if (output_indirect_p && !direct_p)
> ix86_output_indirect_branch (call_op, xasm, true);
>else
> -   output_asm_insn (xasm, &call_op);
> +   {
> + output_asm_insn (xasm, &call_op);
>

Re: [PATCH] libcpp: Fix up handling of block comments in -fdirectives-only mode [PR103130]

2021-11-17 Thread Marek Polacek via Gcc-patches
On Wed, Nov 17, 2021 at 10:22:32AM +0100, Jakub Jelinek wrote:
> Hi!
> 
> Normal preprocessing, -fdirectives-only preprocessing before Nathan's
> rewrite, and all other compilers I've tried on godbolt treat even \*/
> as end of a block comment, but the new -fdirectives-only handling doesn't.
> 
> Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
> trunk?

OK for trunk and 11, thanks.
 
> 2021-11-17  Jakub Jelinek  
> 
>   PR preprocessor/103130
>   * lex.c (cpp_directive_only_process): Treat even \*/ as end of block
>   comment.
> 
>   * c-c++-common/cpp/dir-only-9.c: New test.
> 
> --- libcpp/lex.c.jj   2021-11-01 14:37:06.706853026 +0100
> +++ libcpp/lex.c  2021-11-16 16:54:04.022644499 +0100
> @@ -4493,7 +4493,7 @@ cpp_directive_only_process (cpp_reader *
>   break;
>  
> case '*':
> - if (pos > peek && !esc)
> + if (pos > peek)
> star = is_block;
>   esc = false;
>   break;
> --- gcc/testsuite/c-c++-common/cpp/dir-only-9.c.jj	2021-11-16 16:56:57.121217975 +0100
> +++ gcc/testsuite/c-c++-common/cpp/dir-only-9.c	2021-11-16 16:56:14.524815094 +0100
> @@ -0,0 +1,13 @@
> +/* PR preprocessor/103130 */
> +/* { dg-do preprocess } */
> +/* { dg-options -fdirectives-only } */
> +
> +/*\
> + * this is a comment
> +\*/
> +
> +int
> +main ()
> +{
> +  return 0;
> +}
> 
>   Jakub
> 

Marek
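
The behaviour change discussed in this thread can be illustrated with a small stand-alone scanner.  This is only a sketch under simplifying assumptions, not the libcpp implementation: sketch_block_comment_end and its esc_hides_star flag are hypothetical names, and the real cpp_directive_only_process tracks more state (line continuations, positions, other token kinds).  The flag models the old `!esc` condition that the patch removes.

```c
#include <stdbool.h>
#include <stddef.h>

// Hypothetical helper, not libcpp code.  S points just past the opening
// delimiter of a block comment.  Returns the offset just past the closing
// delimiter, or -1 if the comment is unterminated.
// When ESC_HIDES_STAR is true, a star directly preceded by a backslash is
// not allowed to start the closing delimiter (the pre-patch behaviour).
static long
sketch_block_comment_end (const char *s, bool esc_hides_star)
{
  bool star = false;  // previous character was a usable '*'
  bool esc = false;   // previous character was a backslash

  for (size_t i = 0; s[i] != '\0'; i++)
    {
      char c = s[i];
      if (c == '/' && star)
        return (long) (i + 1);
      star = (c == '*') && !(esc_hides_star && esc);
      esc = (c == '\\');
    }
  return -1;
}
```

On the comment body from dir-only-9.c, the sketch finds the terminator when esc_hides_star is false and runs off the end of the input when it is true.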



[OG11][committed][PATCH 00/22] OpenACC "kernels" Improvements

2021-11-17 Thread Frederik Harwath
Hi,

this patch series implements the re-work of the OpenACC "kernels"
implementation that has been announced at the GNU Tools Track of this
year's Linux Plumbers Conference; see
https://linuxplumbersconf.org/event/11/contributions/998/.  The
central step is contained in the commit titled "openacc: Use Graphite
for dependence analysis in \"kernels\" regions" whose commit message
also contains further explanations.

Best regards,
Frederik

PS: The commit series also includes a backport from master
"00b98b6cac25 Add dg-final option-based target selectors" and two
trivial unrelated commits "fa558c2a6664 Fix gimple_debug_cfg
declaration" and "35cdc94463fe Fix branch prediction dump message"



Andrew Stubbs (2):
  openacc: Add data optimization pass
  openacc: Add runtime alias checking for OpenACC kernels

Frederik Harwath (19):
  openacc: Move pass_oacc_device_lower after pass_graphite
  graphite: Extend SCoP detection dump output
  graphite: Rename isl_id_for_ssa_name
  graphite: Fix minor mistakes in comments
  Fix branch prediction dump message
  Move compute_alias_check_pairs to tree-data-ref.c
  graphite: Add runtime alias checking
  openacc: Use Graphite for dependence analysis in "kernels" regions
  openacc: Add "can_be_parallel" flag info to "graph" dumps
  openacc: Add further kernels tests
  openacc: Remove unused partitioning in "kernels" regions
  Add function for printing a single OMP_CLAUSE
  openacc: Warn about "independent" "kernels" loops with
data-dependences
  openacc: Handle internal function calls in pass_lim
  openacc: Disable pass_pre on outlined functions analyzed by Graphite
  graphite: Tune parameters for OpenACC use
  graphite: Adjust scop loop-nest choice
  graphite: Accept loops without data references
  openacc: Adjust test expectations to new "kernels" handling

Sandra Loosemore (1):
  Fortran: delinearize multi-dimensional array accesses

 gcc/Makefile.in   |2 +
 gcc/cfgloop.c |1 +
 gcc/cfgloop.h |6 +
 gcc/cfgloopmanip.c|1 +
 gcc/common.opt|9 +
 gcc/config/nvptx/nvptx.c  |7 +
 gcc/doc/gimple.texi   |2 +
 gcc/doc/invoke.texi   |   20 +-
 gcc/doc/passes.texi   |6 +-
 gcc/expr.c|1 +
 gcc/flag-types.h  |1 +
 gcc/fortran/lang.opt  |4 +
 gcc/fortran/trans-array.c |  321 --
 gcc/gimple-loop-interchange.cc|2 +-
 gcc/gimple-pretty-print.c |3 +
 gcc/gimple-walk.c |   15 +-
 gcc/gimple-walk.h |6 +
 gcc/gimple.h  |7 +-
 gcc/gimplify.c|   13 +-
 gcc/graph.c   |   35 +-
 gcc/graphite-dependences.c|  220 +++-
 gcc/graphite-isl-ast-to-gimple.c  |  271 -
 gcc/graphite-oacc.c   |  689 
 gcc/graphite-oacc.h   |   55 +
 gcc/graphite-optimize-isl.c   |   42 +-
 gcc/graphite-poly.c   |   41 +-
 gcc/graphite-scop-detection.c |  654 +--
 gcc/graphite-sese-to-poly.c   |   90 +-
 gcc/graphite.c|  120 +-
 gcc/graphite.h|   40 +-
 gcc/internal-fn.c |2 +
 gcc/internal-fn.h |4 +-
 gcc/omp-data-optimize.cc  |  951 
 gcc/omp-expand.c  |  110 +-
 gcc/omp-general.c |   23 +-
 gcc/omp-general.h |1 +
 gcc/omp-low.c |  321 +-
 gcc/omp-oacc-kernels-decompose.cc |  145 ++-
 gcc/omp-offload.c | 1001 +
 gcc/omp-offload.h |2 +
 gcc/params.opt|5 +-
 gcc/passes.c  |   42 +
 gcc/passes.def|   47 +-
 gcc/predict.c |2 +-
 gcc/sese.c|   25 +-
 gcc/sese.h|   19 +
 gcc/testsuite/c-c++-common/goacc/acc-icf.c|4 +-
 gcc/testsuite/c-c++-common/goacc/cache-3-1.c  |2 +-
 ...classify-kernels-unparallelized-graphite.c |   41 +
 ...lassify-kernels-unparallelized-parloops.c} |   12 +-
 .../c-c++-common/goacc/classify-kernels.c |   27 +-
 .../c-c++-common/goacc/classify-parallel.c|8 +-
 .../c-c++-common/goacc/classify-routine.c |8 +-
 .../c-c++-common/goacc/classi

[OG11][committed][PATCH 01/22] Fortran: delinearize multi-dimensional array accesses

2021-11-17 Thread Frederik Harwath
From: Sandra Loosemore 

The Fortran front end presently linearizes accesses to
multi-dimensional arrays by combining the indices for the various
dimensions into a series of explicit multiplies and adds with
refactoring to allow CSE of invariant parts of the computation.
Unfortunately this representation interferes with Graphite-based loop
optimizations.  It is difficult to recover the original
multi-dimensional form of the access by the time loop optimizations
run because parts of it have already been optimized away or into a
form that is not easily recognizable, so it seems better to have the
Fortran front end produce delinearized accesses to begin with, a set
of nested ARRAY_REFs similar to the existing behavior of the C and C++
front ends.  This is a long-standing problem that has previously been
discussed e.g. in PR 14741 and PR 61000.

This patch is an initial implementation for explicit array accesses
only; it doesn't handle the accesses generated during scalarization of
whole-array or array-section operations, which follow a different code
path.
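
As a rough illustration of the two access forms described above, the following C sketch contrasts a linearized access with a delinearized one.  It is not taken from the patch: the sketch_ names and the fixed extents are made up for the example, and C is row-major whereas Fortran arrays are column-major, so only the shape of the index computation carries over.

```c
#include <stddef.h>

enum { SKETCH_ROWS = 3, SKETCH_COLS = 4 };

/* Linearized form: both indices are folded into one flat offset with an
   explicit multiply and add, so the two-dimensional structure is no longer
   visible in the access itself.  */
static int
sketch_load_linearized (const int *flat, size_t i, size_t j)
{
  return flat[i * SKETCH_COLS + j];
}

/* Delinearized form: one array reference per dimension, comparable to the
   nested ARRAY_REFs the C and C++ front ends produce.  */
static int
sketch_load_delinearized (int (*grid)[SKETCH_COLS], size_t i, size_t j)
{
  return grid[i][j];
}
```

Both functions read the same logical element; a loop optimizer only has to recover the per-dimension subscripts in the first case.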

gcc/
* expr.c (get_inner_reference): Handle NOP_EXPR like
VIEW_CONVERT_EXPR.

gcc/fortran/
* lang.opt (-param=delinearize=): New.
* trans-array.c (get_class_array_vptr): New, split from...
(build_array_ref): ...here.
(get_array_lbound, get_array_ubound): New, split from...
(gfc_conv_array_ref): ...here.  Additional code refactoring
plus support for delinearization of the array access.

gcc/testsuite/
* gfortran.dg/assumed_type_2.f90: Adjust patterns.
* gfortran.dg/goacc/kernels-loop-inner.f95: Likewise.
* gfortran.dg/graphite/block-3.f90: Remove xfails.
* gfortran.dg/graphite/block-4.f90: Likewise.
* gfortran.dg/inline_matmul_24.f90: Adjust patterns.
* gfortran.dg/no_arg_check_2.f90: Likewise.
* gfortran.dg/pr32921.f: Likewise.
* gfortran.dg/reassoc_4.f: Disable delinearization for this test.

Co-Authored-By: Tobias Burnus  
---
 gcc/expr.c|   1 +
 gcc/fortran/lang.opt  |   4 +
 gcc/fortran/trans-array.c | 321 +-
 gcc/testsuite/gfortran.dg/assumed_type_2.f90  |   6 +-
 .../gfortran.dg/goacc/kernels-loop-inner.f95  |   2 +-
 gcc/testsuite/gfortran.dg/graphite/block-2.f  |   9 +-
 .../gfortran.dg/graphite/block-3.f90  |   1 -
 .../gfortran.dg/graphite/block-4.f90  |   1 -
 gcc/testsuite/gfortran.dg/graphite/id-9.f |   2 +-
 .../gfortran.dg/inline_matmul_24.f90  |   2 +-
 gcc/testsuite/gfortran.dg/no_arg_check_2.f90  |   6 +-
 gcc/testsuite/gfortran.dg/pr32921.f   |   2 +-
 gcc/testsuite/gfortran.dg/reassoc_4.f |   2 +-
 13 files changed, 264 insertions(+), 95 deletions(-)

diff --git a/gcc/expr.c b/gcc/expr.c
index 21b7e96ed62e..c7ee800c4d4f 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -7539,6 +7539,7 @@ get_inner_reference (tree exp, poly_int64_pod *pbitsize,
  break;

case VIEW_CONVERT_EXPR:
+   case NOP_EXPR:
  break;

case MEM_REF:
diff --git a/gcc/fortran/lang.opt b/gcc/fortran/lang.opt
index dba333448c11..1548d56278a4 100644
--- a/gcc/fortran/lang.opt
+++ b/gcc/fortran/lang.opt
@@ -521,6 +521,10 @@ fdefault-real-16
 Fortran Var(flag_default_real_16)
 Set the default real kind to an 16 byte wide type.

+-param=delinearize=
+Common Joined UInteger Var(flag_delinearize_aref) Init(1) IntegerRange(0,1) Param Optimization
+Delinearize array references.
+
 fdollar-ok
 Fortran Var(flag_dollar_ok)
 Allow dollar signs in entity names.
diff --git a/gcc/fortran/trans-array.c b/gcc/fortran/trans-array.c
index b7d949929722..3eb9a1778173 100644
--- a/gcc/fortran/trans-array.c
+++ b/gcc/fortran/trans-array.c
@@ -3747,11 +3747,9 @@ add_to_offset (tree *cst_offset, tree *offset, tree t)
 }
 }

-
 static tree
-build_array_ref (tree desc, tree offset, tree decl, tree vptr)
+get_class_array_vptr (tree desc, tree vptr)
 {
-  tree tmp;
   tree type;
   tree cdesc;

@@ -3775,19 +3773,74 @@ build_array_ref (tree desc, tree offset, tree decl, tree vptr)
  && GFC_CLASS_TYPE_P (TYPE_CANONICAL (type)))
vptr = gfc_class_vptr_get (TREE_OPERAND (cdesc, 0));
 }
+  return vptr;
+}

+static tree
+build_array_ref (tree desc, tree offset, tree decl, tree vptr)
+{
+  tree tmp;
+  vptr = get_class_array_vptr (desc, vptr);
   tmp = gfc_conv_array_data (desc);
   tmp = build_fold_indirect_ref_loc (input_location, tmp);
   tmp = gfc_build_array_ref (tmp, offset, decl, vptr);
   return tmp;
 }

+/* Get the declared lower bound for rank N of array DECL which might
+   be either a bare array or a descriptor.  This differs from
+   gfc_conv_array_lbound because it gets information for temporary array
+   objects from AR instead of the descriptor (they can differ).  */
+
+static tree
+get_array_lbound (tree decl, int n, gfc_symbol *sym,
+   

[OG11][committed][PATCH 03/22] graphite: Extend SCoP detection dump output

2021-11-17 Thread Frederik Harwath
Extend the dump output to make it easier (for GCC developers) to
understand why Graphite refuses to include a loop in a SCoP.
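
The general pattern, replacing one chained condition by a sequence of early returns that each report a reason, might be sketched as follows.  This is a simplified stand-alone model and not the Graphite code: struct sketch_loop, its fields, and the message tags are hypothetical, and the real checks use DEBUG_PRINT and GCC's loop data structures.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical summary of the properties being tested.  */
struct sketch_loop {
  bool supported_shape;
  bool natural;
  bool niter_known;
};

/* Return true if all checks pass.  On the first failing check, write the
   reason to DUMP (if non-NULL) and return false, instead of silently
   evaluating one large && expression.  */
static bool
sketch_can_represent_loop (const struct sketch_loop *loop, FILE *dump)
{
  if (!loop->supported_shape)
    {
      if (dump)
        fprintf (dump, "[sketch-fail] Loop shape unsupported.\n");
      return false;
    }
  if (!loop->natural)
    {
      if (dump)
        fprintf (dump, "[sketch-fail] Loop is not a natural loop.\n");
      return false;
    }
  if (!loop->niter_known)
    {
      if (dump)
        fprintf (dump, "[sketch-fail] Loop niter unknown.\n");
      return false;
    }
  return true;
}
```

The result is the same as for the chained form; the only difference is that the dump file names the first check that failed.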

ChangeLog:

* graphite-scop-detection.c (scop_detection::can_represent_loop):
Output reason for failure to dump file.
(scop_detection::harmful_loop_in_region): Likewise.
(scop_detection::graphite_can_represent_expr): Likewise.
(scop_detection::stmt_has_simple_data_refs_p): Likewise.
(scop_detection::stmt_simple_for_scop_p): Likewise.
(print_sese_loop_numbers): New function.
(scop_detection::add_scop): Use from here to print loops in
rejected SCoP.
---
 gcc/graphite-scop-detection.c | 188 +-
 1 file changed, 165 insertions(+), 23 deletions(-)

diff --git a/gcc/graphite-scop-detection.c b/gcc/graphite-scop-detection.c
index 3e729b159b09..46c470210d05 100644
--- a/gcc/graphite-scop-detection.c
+++ b/gcc/graphite-scop-detection.c
@@ -69,12 +69,27 @@ public:
 fprintf (output.dump_file, "%d", i);
 return output;
   }
+
   friend debug_printer &
   operator<< (debug_printer &output, const char *s)
   {
 fprintf (output.dump_file, "%s", s);
 return output;
   }
+
+  friend debug_printer &
+  operator<< (debug_printer &output, gimple* stmt)
+  {
+print_gimple_stmt (output.dump_file, stmt, 0, TDF_VOPS | TDF_MEMSYMS);
+return output;
+  }
+
+  friend debug_printer &
+  operator<< (debug_printer &output, tree t)
+  {
+print_generic_expr (output.dump_file, t, TDF_SLIM);
+return output;
+  }
 } dp;

 #define DEBUG_PRINT(args) do \
@@ -506,6 +521,24 @@ scop_detection::merge_sese (sese_l first, sese_l second) const
   return combined;
 }

+/* Print the loop numbers of the loops contained
+   in SESE to FILE. */
+
+static void
+print_sese_loop_numbers (FILE *file, sese_l sese)
+{
+  loop_p loop;
+  bool printed = false;
+  FOR_EACH_LOOP (loop, 0)
+  {
+if (loop_in_sese_p (loop, sese))
+  fprintf (file, "%d, ", loop->num);
+printed = true;
+  }
+  if (printed)
+fprintf (file, "\b\b");
+}
+
 /* Build scop outer->inner if possible.  */

 void
@@ -519,8 +552,13 @@ scop_detection::build_scop_depth (loop_p loop)
   if (! next
  || harmful_loop_in_region (next))
{
- if (s)
-   add_scop (s);
+  if (next)
+DEBUG_PRINT (
+dp << "[scop-detection] Discarding SCoP on loops ";
+print_sese_loop_numbers (dump_file, next);
+dp << " because of harmful loops\n";);
+  if (s)
+add_scop (s);
  build_scop_depth (loop);
  s = invalid_sese;
}
@@ -560,14 +598,62 @@ scop_detection::can_represent_loop (loop_p loop, sese_l scop)
   || !single_pred_p (loop->latch)
   || exit->src != single_pred (loop->latch)
   || !empty_block_p (loop->latch))
-return false;
+{
+  DEBUG_PRINT (dp << "[can_represent_loop-fail] Loop shape unsupported.\n");
+  return false;
+}
+
+  bool edge_irreducible
+  = loop_preheader_edge (loop)->flags & EDGE_IRREDUCIBLE_LOOP;
+  if (edge_irreducible)
+{
+  DEBUG_PRINT (
+  dp << "[can_represent_loop-fail] Loop is not a natural loop.\n");
+  return false;
+}
+
+  bool niter_is_unconditional = number_of_iterations_exit (loop,
+  single_exit (loop),
+  &niter_desc, false);

-  return !(loop_preheader_edge (loop)->flags & EDGE_IRREDUCIBLE_LOOP)
-&& number_of_iterations_exit (loop, single_exit (loop), &niter_desc, false)
-&& niter_desc.control.no_overflow
-&& (niter = number_of_latch_executions (loop))
-&& !chrec_contains_undetermined (niter)
-&& graphite_can_represent_expr (scop, loop, niter);
+  if (!niter_is_unconditional)
+{
+  DEBUG_PRINT (
+  dp << "[can_represent_loop-fail] Loop niter not unconditional.\n"
+ << "Condition: " << niter_desc.assumptions << "\n");
+  return false;
+}
+
+  niter = number_of_latch_executions (loop);
+  if (!niter)
+{
+  DEBUG_PRINT (dp << "[can_represent_loop-fail] Loop niter unknown.\n");
+  return false;
+}
+  if (!niter_desc.control.no_overflow)
+{
+  DEBUG_PRINT (dp << "[can_represent_loop-fail] Loop niter can overflow.\n");
+  return false;
+}
+
+  bool undetermined_coefficients = chrec_contains_undetermined (niter);
+  if (undetermined_coefficients)
+{
+  DEBUG_PRINT (dp << "[can_represent_loop-fail] "
+  << "Loop niter chrec contains undetermined coefficients.\n");
+  return false;
+}
+
+  bool can_represent_expr = graphite_can_represent_expr (scop, loop, niter);
+  if (!can_represent_expr)
+{
+  DEBUG_PRINT (dp << "[can_represent_loop-fail] "
+  << "Loop niter expression cannot be represented: "
+  << niter << "\n");
+  return false;
+}
+
+  ret

[OG11][committed][PATCH 02/22] openacc: Move pass_oacc_device_lower after pass_graphite

2021-11-17 Thread Frederik Harwath
The OpenACC device lowering pass must run after the Graphite pass to
allow for the use of Graphite for automatic parallelization of kernels
regions in the future. Experimentation has shown that it is best,
performance-wise, to run pass_oacc_device_lower together with the
related passes pass_oacc_loop_designation and pass_oacc_gimple_workers
early after pass_graphite in pass_tree_loop, at least if the other
tree loop passes are not adjusted. In particular, to enable
vectorization which is crucial for GCN offloading, device lowering
should happen before pass_vectorize. To bring the loops contained in
the offloading functions into the shape expected by the loop
vectorizer, we have to make sure that some passes that previously were
executed only once before pass_tree_loop are also executed on the
offloading functions.  To ensure the execution of
pass_oacc_device_lower if pass_tree_loop does not execute (no loops,
no optimizations), we introduce two further copies of the pass to the
pipeline that run if there are no loops or if no optimization is
performed.

gcc/ChangeLog:

* omp-general.c (oacc_get_fn_dim_size): Return 0 on
missing "dims".
* omp-offload.c (pass_oacc_loop_designation::clone): New
member function.
(pass_oacc_gimple_workers::clone): Likewise.
(pass_oacc_gimple_device_lower::clone): Likewise.
* passes.c (pass_data_no_loop_optimizations): New pass_data.
(class pass_no_loop_optimizations): New pass.
(make_pass_no_loop_optimizations): New function.
* passes.def: Move pass_oacc_{loop_designation,
gimple_workers, device_lower} into tree_loop, and add
copies to pass_tree_no_loop and to new
pass_no_loop_optimizations.  Add copies of passes pass_ccp,
pass_ipa_warn, pass_complete_unrolli, pass_backprop,
pass_phiprop, pass_fix_loops after the OpenACC passes
in pass_tree_loop.
* tree-ssa-loop-ivcanon.c (pass_complete_unroll::clone):
New member function.
(pass_complete_unrolli::clone): Likewise.
* tree-ssa-loop.c (pass_fix_loops::clone): Likewise.
(pass_tree_loop_init::clone): Likewise.
(pass_tree_loop_done::clone): Likewise.
* tree-ssa-phiprop.c (pass_phiprop::clone): Likewise.

libgomp/ChangeLog:

* testsuite/libgomp.oacc-c-c++-common/pr85486-2.c: Adjust
expected output to pass name changes due to the pass
reordering and cloning.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-2.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-3.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-4.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-5.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-6.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-7.c: Likewise.

gcc/testsuite/ChangeLog:

* gcc.dg/goacc/loop-processing-1.c: Adjust expected output
to pass name changes due to the pass reordering and cloning.
* c-c++-common/goacc/classify-kernels-unparallelized.c: Likewise.
* c-c++-common/goacc/classify-kernels.c: Likewise.
* c-c++-common/goacc/classify-parallel.c: Likewise.
* c-c++-common/goacc/classify-routine.c: Likewise.
* c-c++-common/goacc/routine-nohost-1.c: Likewise.
* c-c++-common/unroll-1.c: Likewise.
* c-c++-common/unroll-4.c: Likewise.
* gcc.dg/goacc/loop-processing-1.c: Likewise.
* gcc.dg/tree-ssa/backprop-1.c: Likewise.
* gcc.dg/tree-ssa/backprop-2.c: Likewise.
* gcc.dg/tree-ssa/backprop-3.c: Likewise.
* gcc.dg/tree-ssa/backprop-4.c: Likewise.
* gcc.dg/tree-ssa/backprop-5.c: Likewise.
* gcc.dg/tree-ssa/backprop-6.c: Likewise.
* gcc.dg/tree-ssa/cunroll-1.c: Likewise.
* gcc.dg/tree-ssa/cunroll-3.c: Likewise.
* gcc.dg/tree-ssa/cunroll-9.c: Likewise.
* gcc.dg/tree-ssa/ldist-17.c: Likewise.
* gcc.dg/tree-ssa/loop-38.c: Likewise.
* gcc.dg/tree-ssa/pr21463.c: Likewise.
* gcc.dg/tree-ssa/pr45427.c: Likewise.
* gcc.dg/tree-ssa/pr61743-1.c: Likewise.
* gcc.dg/unroll-2.c: Likewise.
* gcc.dg/unroll-3.c: Likewise.
* gcc.dg/unroll-4.c: Likewise.
* gcc.dg/unroll-5.c: Likewise.
* gcc.dg/vect/vect-profile-1.c: Likewise.
* c-c++-common/goacc/device-lowering-debug-optimization.c: New test.
* c-c++-common/goacc/device-lowering-no-loops.c: New test.
* c-c++-common/goacc/device-lowering-no-optimization.c: New test.

Co-Authored-By: Thomas Schwinge 
---
 gcc/omp-general.c |  8 +-
 gcc/omp-offload.c |  8 ++
 gcc/passes.c  | 42 
 gcc/passes.def  

[OG11][committed][PATCH 04/22] graphite: Rename isl_id_for_ssa_name

2021-11-17 Thread Frederik Harwath
The SSA names for which this function gets used are always SCoP
parameters and hence "isl_id_for_parameter" is a better name.  It also
explains the prefix "P_" for those names in the ISL representation.
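
As a small illustration of the naming scheme, here is a standalone sketch
that formats a parameter identifier the same way the patch does.  The helper
name format_parameter_name is hypothetical, and the version number is passed
in directly instead of being read with SSA_NAME_VERSION; the real function
also hands the resulting string to isl_id_alloc, which is left out here.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the naming step of isl_id_for_parameter.
   VERSION plays the role of SSA_NAME_VERSION (p).  Returns the number of
   characters that snprintf would have written.  */
static int
format_parameter_name (char *buf, size_t size, unsigned version)
{
  assert (buf != NULL && size > 0);
  /* The "P_" prefix marks the identifier as a SCoP parameter.  */
  return snprintf (buf, size, "P_%u", version);
}
```

A 14-byte buffer, as used in the patch, is large enough for "P_" followed by
any 32-bit version number and the terminating NUL.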

gcc/ChangeLog:

* graphite-sese-to-poly.c (isl_id_for_ssa_name): Rename to ...
  (isl_id_for_parameter): ... this new function name.
  (build_scop_context): Adjust function use.
---
 gcc/graphite-sese-to-poly.c | 21 +++--
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/gcc/graphite-sese-to-poly.c b/gcc/graphite-sese-to-poly.c
index eebf2e02cfca..195851cb540a 100644
--- a/gcc/graphite-sese-to-poly.c
+++ b/gcc/graphite-sese-to-poly.c
@@ -100,14 +100,15 @@ extract_affine_mul (scop_p s, tree e, __isl_take 
isl_space *space)
   return isl_pw_aff_mul (lhs, rhs);
 }

-/* Return an isl identifier from the name of the ssa_name E.  */
+/* Return an isl identifier for the parameter P.  */

 static isl_id *
-isl_id_for_ssa_name (scop_p s, tree e)
+isl_id_for_parameter (scop_p s, tree p)
 {
-  char name1[14];
-  snprintf (name1, sizeof (name1), "P_%d", SSA_NAME_VERSION (e));
-  return isl_id_alloc (s->isl_context, name1, e);
+  gcc_checking_assert (TREE_CODE (p) == SSA_NAME);
+  char name[14];
+  snprintf (name, sizeof (name), "P_%d", SSA_NAME_VERSION (p));
+  return isl_id_alloc (s->isl_context, name, p);
 }

 /* Return an isl identifier for the data reference DR.  Data references and
@@ -893,15 +894,15 @@ build_scop_context (scop_p scop)
   isl_space *space = isl_space_set_alloc (scop->isl_context, nbp, 0);

   unsigned i;
-  tree e;
-  FOR_EACH_VEC_ELT (region->params, i, e)
+  tree p;
+  FOR_EACH_VEC_ELT (region->params, i, p)
 space = isl_space_set_dim_id (space, isl_dim_param, i,
-  isl_id_for_ssa_name (scop, e));
+  isl_id_for_parameter (scop, p));

   scop->param_context = isl_set_universe (space);

-  FOR_EACH_VEC_ELT (region->params, i, e)
-add_param_constraints (scop, i, e);
+  FOR_EACH_VEC_ELT (region->params, i, p)
+add_param_constraints (scop, i, p);
 }

 /* Return true when loop A is nested in loop B.  */
--
2.33.0

-
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634 
München; Limited liability company; Managing directors: Thomas 
Heurung, Frank Thürauf; Registered office: München; Register court: 
München, HRB 106955


[OG11][committed][PATCH 05/22] graphite: Fix minor mistakes in comments

2021-11-17 Thread Frederik Harwath
gcc/ChangeLog:

* graphite-sese-to-poly.c (build_poly_sr_1): Fix a typo and
  a reference to a variable which does not exist.
* graphite-isl-ast-to-gimple.c (gsi_insert_earliest): Fix typo
  in comment.
---
 gcc/graphite-isl-ast-to-gimple.c | 2 +-
 gcc/graphite-sese-to-poly.c  | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/graphite-isl-ast-to-gimple.c b/gcc/graphite-isl-ast-to-gimple.c
index c202213f39b3..44c06016f1a2 100644
--- a/gcc/graphite-isl-ast-to-gimple.c
+++ b/gcc/graphite-isl-ast-to-gimple.c
@@ -1018,7 +1018,7 @@ gsi_insert_earliest (gimple_seq seq)
   basic_block begin_bb = get_entry_bb (codegen_region);

   /* Inserting the gimple statements in a vector because gimple_seq behave
- in strage ways when inserting the stmts from it into different basic
+ in strange ways when inserting the stmts from it into different basic
  blocks one at a time.  */
   auto_vec stmts;
   for (gimple_stmt_iterator gsi = gsi_start (seq); !gsi_end_p (gsi);
diff --git a/gcc/graphite-sese-to-poly.c b/gcc/graphite-sese-to-poly.c
index 195851cb540a..12fa2d669b3c 100644
--- a/gcc/graphite-sese-to-poly.c
+++ b/gcc/graphite-sese-to-poly.c
@@ -644,14 +644,14 @@ build_poly_sr_1 (poly_bb_p pbb, gimple *stmt, tree var, 
enum poly_dr_type kind,
 isl_map *acc, isl_set *subscript_sizes)
 {
   scop_p scop = PBB_SCOP (pbb);
-  /* Each scalar variables has a unique alias set number starting from
+  /* Each scalar variable has a unique alias set number starting from
  the maximum alias set assigned to a dr.  */
   int alias_set = scop->max_alias_set + SSA_NAME_VERSION (var);
   subscript_sizes = isl_set_fix_si (subscript_sizes, isl_dim_set, 0,
alias_set);

   /* Add a constrain to the ACCESSES polyhedron for the alias set of
- data reference DR.  */
+ the reference */
   isl_constraint *c
 = isl_equality_alloc (isl_local_space_from_space (isl_map_get_space (acc)));
   c = isl_constraint_set_constant_si (c, -alias_set);
--
2.33.0



[OG11][committed][PATCH 07/22] Move compute_alias_check_pairs to tree-data-ref.c

2021-11-17 Thread Frederik Harwath
Move this function from tree-loop-distribution.c to tree-data-ref.c
and make it non-static to enable its use from other parts of GCC.
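
For readers unfamiliar with the moved helpers, the following is a simplified
sketch of the segment-length arithmetic they perform, using plain integers
instead of GCC trees.  The names segment_size, segment_size_for_loop and
stmt_dominates_latch are hypothetical; the sketch also assumes at least one
iteration, which the real code does not check in this form.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Distance in bytes between the first and the last access of a reference
   with byte stride STEP that is executed NITERS times.  This mirrors
   data_ref_segment_size: (NITERS - 1) * STEP.  */
static size_t
segment_size (size_t step, size_t niters)
{
  assert (niters > 0);
  return step * (niters - 1);
}

/* Mirrors the choice made in compute_alias_check_pairs: a reference whose
   statement dominates the latch runs once more than the latch does.  */
static size_t
segment_size_for_loop (size_t step, size_t latch_executions,
                       bool stmt_dominates_latch)
{
  size_t niters = stmt_dominates_latch ? latch_executions + 1
                                       : latch_executions;
  return segment_size (step, niters);
}
```

For a 4-byte stride and 9 latch executions this gives 36 bytes for a
dominating reference and 32 bytes otherwise.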

gcc/ChangeLog:

* tree-loop-distribution.c (data_ref_segment_size): Remove function.
(latch_dominated_by_data_ref): Likewise.
(compute_alias_check_pairs): Likewise.

* tree-data-ref.c (data_ref_segment_size): New function,
copied from tree-loop-distribution.c
(compute_alias_check_pairs): Likewise.
(latch_dominated_by_data_ref): Likewise.

* tree-data-ref.h (compute_alias_check_pairs): New declaration.
---
 gcc/tree-data-ref.c  | 87 
 gcc/tree-data-ref.h  |  3 ++
 gcc/tree-loop-distribution.c | 87 
 3 files changed, 90 insertions(+), 87 deletions(-)

diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c
index d04e95f7c285..71f8d790e618 100644
--- a/gcc/tree-data-ref.c
+++ b/gcc/tree-data-ref.c
@@ -2645,6 +2645,93 @@ create_intersect_range_checks (class loop *loop, tree 
*cond_expr,
 dump_printf (MSG_NOTE, "using an address-based overlap test\n");
 }

+/* Compute and return an expression whose value is the segment length which
+   will be accessed by DR in NITERS iterations.  */
+
+static tree
+data_ref_segment_size (struct data_reference *dr, tree niters)
+{
+  niters = size_binop (MINUS_EXPR,
+  fold_convert (sizetype, niters),
+  size_one_node);
+  return size_binop (MULT_EXPR,
+fold_convert (sizetype, DR_STEP (dr)),
+fold_convert (sizetype, niters));
+}
+
+/* Return true if LOOP's latch is dominated by statement for data reference
+   DR.  */
+
+static inline bool
+latch_dominated_by_data_ref (class loop *loop, data_reference *dr)
+{
+  return dominated_by_p (CDI_DOMINATORS, single_exit (loop)->src,
+gimple_bb (DR_STMT (dr)));
+}
+
+/* Compute alias check pairs and store them in COMP_ALIAS_PAIRS for LOOP's
+   data dependence relations ALIAS_DDRS.  */
+
+void
+compute_alias_check_pairs (class loop *loop, vec<ddr_p> *alias_ddrs,
+  vec<dr_with_seg_len_pair_t> *comp_alias_pairs)
+{
+  unsigned int i;
+  unsigned HOST_WIDE_INT factor = 1;
+  tree niters_plus_one, niters = number_of_latch_executions (loop);
+
+  gcc_assert (niters != NULL_TREE && niters != chrec_dont_know);
+  niters = fold_convert (sizetype, niters);
+  niters_plus_one = size_binop (PLUS_EXPR, niters, size_one_node);
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+fprintf (dump_file, "Creating alias check pairs:\n");
+
+  /* Iterate all data dependence relations and compute alias check pairs.  */
+  for (i = 0; i < alias_ddrs->length (); i++)
+{
+  ddr_p ddr = (*alias_ddrs)[i];
+  struct data_reference *dr_a = DDR_A (ddr);
+  struct data_reference *dr_b = DDR_B (ddr);
+  tree seg_length_a, seg_length_b;
+
+  if (latch_dominated_by_data_ref (loop, dr_a))
+   seg_length_a = data_ref_segment_size (dr_a, niters_plus_one);
+  else
+   seg_length_a = data_ref_segment_size (dr_a, niters);
+
+  if (latch_dominated_by_data_ref (loop, dr_b))
+   seg_length_b = data_ref_segment_size (dr_b, niters_plus_one);
+  else
+   seg_length_b = data_ref_segment_size (dr_b, niters);
+
+  unsigned HOST_WIDE_INT access_size_a
+   = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr_a))));
+  unsigned HOST_WIDE_INT access_size_b
+   = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr_b))));
+  unsigned int align_a = TYPE_ALIGN_UNIT (TREE_TYPE (DR_REF (dr_a)));
+  unsigned int align_b = TYPE_ALIGN_UNIT (TREE_TYPE (DR_REF (dr_b)));
+
+  dr_with_seg_len_pair_t dr_with_seg_len_pair
+   (dr_with_seg_len (dr_a, seg_length_a, access_size_a, align_a),
+dr_with_seg_len (dr_b, seg_length_b, access_size_b, align_b),
+/* ??? Would WELL_ORDERED be safe?  */
+dr_with_seg_len_pair_t::REORDERED);
+
+  comp_alias_pairs->safe_push (dr_with_seg_len_pair);
+}
+
+  if (tree_fits_uhwi_p (niters))
+factor = tree_to_uhwi (niters);
+
+  /* Prune alias check pairs.  */
+  prune_runtime_alias_test_list (comp_alias_pairs, factor);
+  if (dump_file && (dump_flags & TDF_DETAILS))
+fprintf (dump_file,
+"Improved number of alias checks from %d to %d\n",
+alias_ddrs->length (), comp_alias_pairs->length ());
+}
+
 /* Create a conditional expression that represents the run-time checks for
overlapping of address ranges represented by a list of data references
pairs passed in ALIAS_PAIRS.  Data references are in LOOP.  The returned
diff --git a/gcc/tree-data-ref.h b/gcc/tree-data-ref.h
index 8001cc54f518..5016ec926b1d 100644
--- a/gcc/tree-data-ref.h
+++ b/gcc/tree-data-ref.h
@@ -577,6 +577,9 @@ extern opt_result runtime_alias_check_p (ddr_p, class loop 
*, bool);
 extern int data_ref_compare_tree (tree, tree);
 extern void prune_runtime_alias_test_list (vec 

[OG11][committed][PATCH 08/22] graphite: Add runtime alias checking

2021-11-17 Thread Frederik Harwath
Graphite rejects a SCoP if it contains a pair of data references for
which it cannot determine statically if they may alias. This happens
very often, for instance in C code which does not use explicit
"restrict".  This commit adds the possibility to analyze a SCoP
nevertheless and perform an alias check at runtime.  Then, if aliasing
is detected, the execution will fall back to the unoptimized SCoP.
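
The idea can be sketched at source level as loop versioning guarded by an
address-range test.  This is only an illustration under stated assumptions,
not the condition Graphite actually generates: the helper names
(ranges_disjoint, scale_noalias, scale_original, scale) and the doubling
kernel are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return nonzero if [a, a + a_len) and [b, b + b_len) do not overlap.  */
static int
ranges_disjoint (const void *a, size_t a_len, const void *b, size_t b_len)
{
  uintptr_t a0 = (uintptr_t) a, b0 = (uintptr_t) b;
  return a0 + a_len <= b0 || b0 + b_len <= a0;
}

/* Version that an optimizer may transform freely: no aliasing.  */
static void
scale_noalias (int *restrict dst, const int *restrict src, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = 2 * src[i];
}

/* Original version, kept as the fallback when aliasing is possible.  */
static void
scale_original (int *dst, const int *src, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = 2 * src[i];
}

/* Select a version at runtime; returns 1 if the no-alias version ran.  */
static int
scale (int *dst, const int *src, size_t n)
{
  assert ((dst != NULL && src != NULL) || n == 0);
  if (ranges_disjoint (dst, n * sizeof *dst, src, n * sizeof *src))
    {
      scale_noalias (dst, src, n);
      return 1;
    }
  scale_original (dst, src, n);
  return 0;
}
```

Calling scale with two separate arrays takes the first branch; calling it
with overlapping pointers into one array falls back to the original loop.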

TODO This needs more testing on non-OpenACC code.

gcc/ChangeLog:

* common.opt: Add fgraphite-runtime-alias-checks.
* graphite-isl-ast-to-gimple.c
(generate_alias_cond): New function.
(graphite_regenerate_ast_isl): Use from here.
* graphite-poly.c (new_scop): Create unhandled_alias_ddrs vec ...
(free_scop): and release here.
* graphite-scop-detection.c (dr_defs_outside_region): New function.
(dr_well_analyzed_for_runtime_alias_check_p): New function.
(graphite_runtime_alias_check_p): New function.
(build_alias_set): Record unhandled alias ddrs for later alias check
creation if flag_graphite_runtime_alias_checks is true instead
of failing.
* graphite.h (struct scop): Add field unhandled_alias_ddrs.
* sese.h (has_operands_from_region_p): New function.

gcc/testsuite/ChangeLog:

* gcc.dg/graphite/alias-1.c: New test.
---
 gcc/common.opt  |   4 +
 gcc/graphite-isl-ast-to-gimple.c|  60 ++
 gcc/graphite-poly.c |   2 +
 gcc/graphite-scop-detection.c   | 239 +---
 gcc/graphite.h  |   4 +
 gcc/sese.h  |  18 ++
 gcc/testsuite/gcc.dg/graphite/alias-1.c |  22 +++
 7 files changed, 326 insertions(+), 23 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/graphite/alias-1.c

diff --git a/gcc/common.opt b/gcc/common.opt
index 771398bc03de..aa695e56dc48 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1636,6 +1636,10 @@ fgraphite-identity
 Common Var(flag_graphite_identity) Optimization
 Enable Graphite Identity transformation.

+fgraphite-runtime-alias-checks
+Common Var(flag_graphite_runtime_alias_checks) Optimization Init(1)
+Allow Graphite to add runtime alias checks to loop-nests if aliasing cannot be resolved statically.
+
 fhoist-adjacent-loads
 Common Var(flag_hoist_adjacent_loads) Optimization
 Enable hoisting adjacent loads to encourage generating conditional move
diff --git a/gcc/graphite-isl-ast-to-gimple.c b/gcc/graphite-isl-ast-to-gimple.c
index 44c06016f1a2..caa0160b9bce 100644
--- a/gcc/graphite-isl-ast-to-gimple.c
+++ b/gcc/graphite-isl-ast-to-gimple.c
@@ -1456,6 +1456,34 @@ generate_entry_out_of_ssa_copies (edge false_entry,
 }
 }

+/* Create a condition that evaluates to TRUE if all ALIAS_DDRS are free of
+   aliasing. */
+
+static tree
+generate_alias_cond (vec<ddr_p> &alias_ddrs, loop_p context_loop)
+{
+  gcc_checking_assert (flag_graphite_runtime_alias_checks
+   && alias_ddrs.length () > 0);
+  gcc_checking_assert (context_loop);
+
+  auto_vec<dr_with_seg_len_pair_t> check_pairs;
+  compute_alias_check_pairs (context_loop, &alias_ddrs, &check_pairs);
+  gcc_checking_assert (check_pairs.length () > 0);
+
+  tree alias_cond = NULL_TREE;
+  create_runtime_alias_checks (context_loop, &check_pairs, &alias_cond);
+  gcc_checking_assert (alias_cond);
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+{
+  fprintf (dump_file, "Generated runtime alias check: ");
+  print_generic_expr (dump_file, alias_cond, dump_flags);
+  fprintf (dump_file, "\n");
+}
+
+  return alias_cond;
+}
+
 /* GIMPLE Loop Generator: generates loops in GIMPLE form for the given SCOP.
Return true if code generation succeeded.  */

@@ -1496,12 +1524,44 @@ graphite_regenerate_ast_isl (scop_p scop)
   region->if_region = if_region;

   loop_p context_loop = region->region.entry->src->loop_father;
+  gcc_checking_assert (context_loop);
   edge e = single_succ_edge (if_region->true_region->region.entry->dest);
   basic_block bb = split_edge (e);

   /* Update the true_region exit edge.  */
   region->if_region->true_region->region.exit = single_succ_edge (bb);

+  if (flag_graphite_runtime_alias_checks
+  && scop->unhandled_alias_ddrs.length () > 0)
+{
+  /* SCoP detection has failed to handle the aliasing between some data
+references of the SCoP statically. Generate an alias check that selects
+the newly generated version of the SCoP in the true-branch of the
+conditional if aliasing can be ruled out at runtime and the original
+version of the SCoP, otherwise. */
+
+  loop_p loop
+  = find_common_loop (scop->scop_info->region.entry->dest->loop_father,
+  scop->scop_info->region.exit->src->loop_father);
+  tree cond = generate_alias_cond (scop->unhandled_alias_ddrs, loop);
+  tree non_alias_cond = build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
+  set_ifsese_condition (re

[OG11][committed][PATCH 10/22] openacc: Add "can_be_parallel" flag info to "graph" dumps

2021-11-17 Thread Frederik Harwath
gcc/ChangeLog:

* graph.c (oacc_get_fn_attrib): New declaration.
(find_loop_location): New declaration.
(draw_cfg_nodes_for_loop): Print value of the
can_be_parallel flag at the top of loops in OpenACC
functions.
---
 gcc/graph.c | 35 ---
 1 file changed, 24 insertions(+), 11 deletions(-)

diff --git a/gcc/graph.c b/gcc/graph.c
index ce8de33ffe10..3ad07be3b309 100644
--- a/gcc/graph.c
+++ b/gcc/graph.c
@@ -191,6 +191,10 @@ draw_cfg_nodes_no_loops (pretty_printer *pp, struct 
function *fun)
 }
 }

+
+extern tree oacc_get_fn_attrib (tree);
+extern dump_user_location_t find_loop_location (class loop *);
+
 /* Draw all the basic blocks in LOOP.  Print the blocks in breath-first
order to get a good ranking of the nodes.  This function is recursive:
It first prints inner loops, then the body of LOOP itself.  */
@@ -205,17 +209,26 @@ draw_cfg_nodes_for_loop (pretty_printer *pp, int 
funcdef_no,

   if (loop->header != NULL
   && loop->latch != EXIT_BLOCK_PTR_FOR_FN (cfun))
-pp_printf (pp,
-  "\tsubgraph cluster_%d_%d {\n"
-  "\tstyle=\"filled\";\n"
-  "\tcolor=\"darkgreen\";\n"
-  "\tfillcolor=\"%s\";\n"
-  "\tlabel=\"loop %d\";\n"
-  "\tlabeljust=l;\n"
-  "\tpenwidth=2;\n",
-  funcdef_no, loop->num,
-  fillcolors[(loop_depth (loop) - 1) % 3],
-  loop->num);
+{
+  pp_printf (pp,
+ "\tsubgraph cluster_%d_%d {\n"
+ "\tstyle=\"filled\";\n"
+ "\tcolor=\"darkgreen\";\n"
+ "\tfillcolor=\"%s\";\n"
+ "\tlabel=\"loop %d %s\";\n"
+ "\tlabeljust=l;\n"
+ "\tpenwidth=2;\n",
+ funcdef_no, loop->num,
+ fillcolors[(loop_depth (loop) - 1) % 3], loop->num,
+ /* This is only meaningful for loops that have been processed
+by Graphite.
+
+TODO Use can_be_parallel_valid_p? */
+ !oacc_get_fn_attrib (cfun->decl)
+ ? ""
+ : loop->can_be_parallel ? "(can_be_parallel = true)"
+ : "(can_be_parallel = false)");
+}

   for (class loop *inner = loop->inner; inner; inner = inner->next)
 draw_cfg_nodes_for_loop (pp, funcdef_no, inner);
--
2.33.0



[OG11][committed][PATCH 12/22] openacc: Remove unused partitioning in "kernels" regions

2021-11-17 Thread Frederik Harwath
With the old "kernels" handling, unparallelized regions would
get executed with 1x1x1 partitioning even if the user provided
explicit num_gangs, num_workers clauses etc.

This commit restores this behavior by removing unused partitioning
after assigning the parallelism dimensions to loops.
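
The core of the removal step can be sketched outside of GCC as follows.  The
constants are local stand-ins that mirror the GOMP_DIM_* values used in the
patch, the function name is hypothetical, and the warning is replaced by a
returned bit mask of the removed dimensions.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins for GOMP_DIM_GANG, GOMP_DIM_WORKER, GOMP_DIM_VECTOR,
   GOMP_DIM_MAX and GOMP_DIM_MASK.  */
enum { DIM_GANG, DIM_WORKER, DIM_VECTOR, DIM_MAX };
#define DIM_MASK(X) (1u << (X))

/* Reset every explicitly set dimension, starting at LEVEL, that no loop
   uses according to USED.  A value of -1 marks the dimension as
   "defaulting".  Returns the mask of dimensions that were reset.  */
static unsigned
remove_unused_partitioning (int *dims, int level, unsigned used)
{
  assert (dims != NULL);
  unsigned removed = 0;
  for (int ix = level >= 0 ? level : 0; ix != DIM_MAX; ix++)
    if (!(used & DIM_MASK (ix)) && dims[ix] >= 0)
      {
        dims[ix] = -1;
        removed |= DIM_MASK (ix);
      }
  return removed;
}
```

With dims = { 32, 4, 128 } and only gang partitioning in use, the worker and
vector entries are reset to -1 and the gang entry is left alone.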

gcc/ChangeLog:

* omp-offload.c (oacc_remove_unused_partitioning): New function
for removing partitioning that is not used by any loop.
(oacc_validate_dims): Call oacc_remove_unused_partitioning and
enable warnings about unused partitioning.

libgomp/ChangeLog:

* testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c: Adjust
expectations.
---
 gcc/omp-offload.c | 51 +--
 .../acc_prof-kernels-1.c  | 19 ---
 2 files changed, 59 insertions(+), 11 deletions(-)

diff --git a/gcc/omp-offload.c b/gcc/omp-offload.c
index f5cb222efd8c..68cc5a9d9e5d 100644
--- a/gcc/omp-offload.c
+++ b/gcc/omp-offload.c
@@ -1215,6 +1215,39 @@ oacc_parse_default_dims (const char *dims)
   targetm.goacc.validate_dims (NULL_TREE, oacc_min_dims, -2, 0);
 }

+/* Remove parallelism dimensions below LEVEL which are not set in USED
+   from DIMS and emit a warning pointing to the location of FN. */
+
+static void
+oacc_remove_unused_partitioning (tree fn, int *dims, int level, unsigned used)
+{
+
+  bool host_compiler = true;
+#ifdef ACCEL_COMPILER
+  host_compiler = false;
+#endif
+
+  static char const *const axes[] =
+  /* Must be kept in sync with GOMP_DIM enumeration.  */
+  { "gang", "worker", "vector" };
+
+  char removed_partitions[20] = "\0";
+  for (int ix = level >= 0 ? level : 0; ix != GOMP_DIM_MAX; ix++)
+if (!(used & GOMP_DIM_MASK (ix)) && dims[ix] >= 0)
+  {
+if (host_compiler)
+  {
+strcat (removed_partitions, axes[ix]);
+strcat (removed_partitions, " ");
+  }
+dims[ix] = -1;
+  }
+  if (removed_partitions[0] != '\0')
+warning_at (DECL_SOURCE_LOCATION (fn), OPT_Wopenacc_parallelism,
+"removed %spartitioning from % region",
+removed_partitions);
+}
+
 /* Validate and update the dimensions for offloaded FN.  ATTRS is the
raw attribute.  DIMS is an array of dimensions, which is filled in.
LEVEL is the partitioning level of a routine, or -1 for an offload
@@ -1235,6 +1268,7 @@ oacc_validate_dims (tree fn, tree attrs, int *dims, int 
level, unsigned used)
   for (ix = 0; ix != GOMP_DIM_MAX; ix++)
 {
   purpose[ix] = TREE_PURPOSE (pos);
+
   tree val = TREE_VALUE (pos);
   dims[ix] = val ? TREE_INT_CST_LOW (val) : -1;
   pos = TREE_CHAIN (pos);
@@ -1244,14 +1278,15 @@ oacc_validate_dims (tree fn, tree attrs, int *dims, int 
level, unsigned used)
 #ifdef ACCEL_COMPILER
   check = false;
 #endif
+
+  static char const *const axes[] =
+  /* Must be kept in sync with GOMP_DIM enumeration.  */
+  { "gang", "worker", "vector" };
+
   if (check
   && warn_openacc_parallelism
-  && !lookup_attribute ("oacc kernels", DECL_ATTRIBUTES (fn))
-  && !lookup_attribute ("oacc parallel_kernels_graphite", DECL_ATTRIBUTES (fn)))
+  && !lookup_attribute ("oacc kernels", DECL_ATTRIBUTES (fn)))
 {
-  static char const *const axes[] =
-  /* Must be kept in sync with GOMP_DIM enumeration.  */
-   { "gang", "worker", "vector" };
   for (ix = level >= 0 ? level : 0; ix != GOMP_DIM_MAX; ix++)
if (dims[ix] < 0)
  ; /* Defaulting axis.  */
@@ -1262,14 +1297,20 @@ oacc_validate_dims (tree fn, tree attrs, int *dims, int 
level, unsigned used)
  "region contains %s partitioned code but"
  " is not %s partitioned", axes[ix], axes[ix]);
else if (!(used & GOMP_DIM_MASK (ix)) && dims[ix] != 1)
+ {
  /* The dimension is explicitly partitioned to non-unity, but
 no use is made within the region.  */
  warning_at (DECL_SOURCE_LOCATION (fn), OPT_Wopenacc_parallelism,
  "region is %s partitioned but"
  " does not contain %s partitioned code",
  axes[ix], axes[ix]);
+  }
 }

+  if (lookup_attribute ("oacc parallel_kernels_graphite",
+ DECL_ATTRIBUTES (fn)))
+oacc_remove_unused_partitioning  (fn, dims, level, used);
+
   bool changed = targetm.goacc.validate_dims (fn, dims, level, used);

   /* Default anything left to 1 or a partitioned default.  */
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c 
b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c
index 4a9b11a3d3fe..d398b3463617 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c
@@ -7,6 +7,8 @@

 #include 

+/* { dg-skip-if "'kernels' not analyzed by Graphite at -O0" { *-*-* } { "-O0" } { "" } } */

[OG11][committed][PATCH 11/22] openacc: Add further kernels tests

2021-11-17 Thread Frederik Harwath
Add some copies of tests to continue covering the old "parloops"-based
"kernels" implementation - until it gets removed from GCC - and
add further tests for the new Graphite-based implementation.

libgomp/ChangeLog:

* testsuite/libgomp.oacc-fortran/parallel-loop-auto-reduction-2.f90:
New test.

gcc/testsuite/ChangeLog:

* c-c++-common/goacc/classify-kernels-unparallelized-graphite.c:
New test.
* c-c++-common/goacc/classify-kernels-unparallelized-parloops.c:
New test.
* c-c++-common/goacc/kernels-decompose-1-parloops.c: New test.
* c-c++-common/goacc/kernels-reduction-parloops.c: New test.
* c-c++-common/goacc/loop-auto-reductions.c: New test.
* c-c++-common/goacc/note-parallelism-1-kernels-loop-auto-parloops.c:
New test.
* c-c++-common/goacc/note-parallelism-kernels-loops-1.c: New test.
* c-c++-common/goacc/note-parallelism-kernels-loops-parloops.c:
New test.
* gfortran.dg/goacc/classify-kernels-unparallelized-parloops.f95:
New test.
* gfortran.dg/goacc/kernels-conversion.f95: New test.
* gfortran.dg/goacc/kernels-decompose-1-parloops.f95: New test.
* gfortran.dg/goacc/kernels-decompose-parloops-2.f95: New test.
* gfortran.dg/goacc/kernels-loop-data-parloops-2.f95: New test.
* gfortran.dg/goacc/kernels-loop-parloops-2.f95: New test.
* gfortran.dg/goacc/kernels-loop-parloops.f95: New test.
* gfortran.dg/goacc/kernels-reductions.f90: New test.
---
 ...classify-kernels-unparallelized-graphite.c |  41 +
 ...classify-kernels-unparallelized-parloops.c |  47 ++
 .../goacc/kernels-decompose-1-parloops.c  | 125 ++
 .../goacc/kernels-reduction-parloops.c|  36 
 .../c-c++-common/goacc/loop-auto-reductions.c |  22 +++
 ...parallelism-1-kernels-loop-auto-parloops.c | 128 +++
 .../goacc/note-parallelism-kernels-loops-1.c  |  61 +++
 .../note-parallelism-kernels-loops-parloops.c |  53 ++
 ...assify-kernels-unparallelized-parloops.f95 |  44 +
 .../gfortran.dg/goacc/kernels-conversion.f95  |  52 ++
 .../goacc/kernels-decompose-1-parloops.f95| 121 ++
 .../goacc/kernels-decompose-parloops-2.f95| 154 ++
 .../goacc/kernels-loop-data-parloops-2.f95|  52 ++
 .../goacc/kernels-loop-parloops-2.f95 |  45 +
 .../goacc/kernels-loop-parloops.f95   |  39 +
 .../gfortran.dg/goacc/kernels-reductions.f90  |  37 +
 .../parallel-loop-auto-reduction-2.f90|  98 +++
 17 files changed, 1155 insertions(+)
 create mode 100644 
gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized-graphite.c
 create mode 100644 
gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized-parloops.c
 create mode 100644 
gcc/testsuite/c-c++-common/goacc/kernels-decompose-1-parloops.c
 create mode 100644 
gcc/testsuite/c-c++-common/goacc/kernels-reduction-parloops.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/loop-auto-reductions.c
 create mode 100644 
gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-loop-auto-parloops.c
 create mode 100644 
gcc/testsuite/c-c++-common/goacc/note-parallelism-kernels-loops-1.c
 create mode 100644 
gcc/testsuite/c-c++-common/goacc/note-parallelism-kernels-loops-parloops.c
 create mode 100644 
gcc/testsuite/gfortran.dg/goacc/classify-kernels-unparallelized-parloops.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-conversion.f95
 create mode 100644 
gcc/testsuite/gfortran.dg/goacc/kernels-decompose-1-parloops.f95
 create mode 100644 
gcc/testsuite/gfortran.dg/goacc/kernels-decompose-parloops-2.f95
 create mode 100644 
gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-parloops-2.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-parloops-2.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-parloops.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-reductions.f90
 create mode 100644 
libgomp/testsuite/libgomp.oacc-fortran/parallel-loop-auto-reduction-2.f90

diff --git 
a/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized-graphite.c 
b/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized-graphite.c
new file mode 100644
index ..77f4524907a9
--- /dev/null
+++ 
b/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized-graphite.c
@@ -0,0 +1,41 @@
+/* Check offloaded function's attributes and classification for unparallelized
+   OpenACC 'kernels' with Graphite kernels handling (default).  */
+
+/* { dg-additional-options "-O2" }
+   { dg-additional-options "-fno-openacc-kernels-annotate-loops" }
+   { dg-additional-options "-fopt-info-optimized-omp" }
+   { dg-additional-options "-fopt-info-note-omp" }
+   { dg-additional-options "-fdump-tree-ompexp" }
+   { dg-additional-options "-fdump-tree-graphite-details" }
+   { dg-additional-options "-fdump-tree-oaccloops1" }
+  

[OG11][committed][PATCH 13/22] Add function for printing a single OMP_CLAUSE

2021-11-17 Thread Frederik Harwath
Commit 89f4f339130c ("For 'OMP_CLAUSE' in 'dump_generic_node', dump
the whole OMP clause chain") changed the dumping behavior for
OMP_CLAUSEs.  The old behavior is required for a follow-up
commit ("openacc: Add data optimization pass") that optimizes single
OMP_CLAUSEs.
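
The difference between the two dumping behaviors can be pictured with a toy
clause chain.  The struct and both functions below are hypothetical and do
not use GCC's tree or pretty-printer types; they only illustrate "print the
head of the chain" versus "print the whole chain", with the caller freeing
the returned string in both cases.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy model of a clause chain: each clause has a text and a successor.  */
struct clause
{
  const char *text;
  struct clause *next;
};

/* Return a heap copy of the first clause only.  The caller frees it.  */
static char *
clause_to_str (const struct clause *c)
{
  assert (c != NULL && c->text != NULL);
  size_t len = strlen (c->text) + 1;
  char *copy = malloc (len);
  if (copy != NULL)
    memcpy (copy, c->text, len);
  return copy;
}

/* Return a heap string with all clauses of the chain, separated by
   single spaces.  The caller frees it.  */
static char *
chain_to_str (const struct clause *c)
{
  size_t len = 1;
  for (const struct clause *p = c; p != NULL; p = p->next)
    len += strlen (p->text) + 1;
  char *out = malloc (len);
  if (out == NULL)
    return NULL;
  out[0] = '\0';
  for (const struct clause *p = c; p != NULL; p = p->next)
    {
      strcat (out, p->text);
      if (p->next != NULL)
        strcat (out, " ");
    }
  return out;
}
```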

gcc/ChangeLog:

* tree-pretty-print.c (print_omp_clause_to_str): Add new function.
* tree-pretty-print.h (print_omp_clause_to_str): Add declaration.
---
 gcc/tree-pretty-print.c | 11 +++
 gcc/tree-pretty-print.h |  1 +
 2 files changed, 12 insertions(+)

diff --git a/gcc/tree-pretty-print.c b/gcc/tree-pretty-print.c
index d769cd8f07c5..2e0255176c76 100644
--- a/gcc/tree-pretty-print.c
+++ b/gcc/tree-pretty-print.c
@@ -1402,6 +1402,17 @@ dump_omp_clause (pretty_printer *pp, tree clause, int 
spc, dump_flags_t flags)
 }
 }

+/* Print the single clause at the top of the clause chain C to a string and
+   return it. Note that print_generic_expr_to_str prints the whole clause chain
+   instead. The caller must free the returned memory. */
+
+char *
+print_omp_clause_to_str (tree c)
+{
+  pretty_printer pp;
+  dump_omp_clause (&pp, c, 0, TDF_VOPS|TDF_MEMSYMS);
+  return xstrdup (pp_formatted_text (&pp));
+}

 /* Dump chain of OMP clauses.

diff --git a/gcc/tree-pretty-print.h b/gcc/tree-pretty-print.h
index cafe9aa95989..3368cb9f1544 100644
--- a/gcc/tree-pretty-print.h
+++ b/gcc/tree-pretty-print.h
@@ -41,6 +41,7 @@ extern void print_generic_expr (FILE *, tree, dump_flags_t = 
TDF_NONE);
 extern char *print_generic_expr_to_str (tree);
 extern void dump_omp_clauses (pretty_printer *, tree, int, dump_flags_t,
  bool = true);
+extern char *print_omp_clause_to_str (tree);
 extern void dump_omp_atomic_memory_order (pretty_printer *,
  enum omp_memory_order);
 extern void dump_omp_loop_non_rect_expr (pretty_printer *, tree, int,
--
2.33.0



[OG11][committed][PATCH 15/22] openacc: Add runtime alias checking for OpenACC kernels

2021-11-17 Thread Frederik Harwath
From: Andrew Stubbs 

This commit adds the code generation for the runtime alias checks for
OpenACC loops that have been analyzed by Graphite.  The runtime alias
check condition gets generated in Graphite. It is evaluated by the
code generated for the IFN_GOACC_LOOP internal function calls.  If
aliasing is detected at runtime, the execution dimensions get adjusted
to execute the affected loops sequentially.
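
One way to picture the effect of the extra "noalias" value is a work
partitioning function that collapses to a single executor when the check
fails.  This is a hedged sketch of the general idea only; the struct, the
function name and the chunking scheme are assumptions and do not reflect the
actual GOACC_LOOP expansion in omp-offload.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Half-open iteration range [begin, end).  */
struct range
{
  size_t begin, end;
};

/* Compute the iterations that executor ID out of NTHREADS handles for a
   loop of N iterations.  If NOALIAS is false, executor 0 runs the whole
   loop sequentially and all other executors get an empty range.  */
static struct range
partition (size_t n, size_t id, size_t nthreads, bool noalias)
{
  assert (nthreads > 0 && id < nthreads);
  struct range r = { 0, 0 };
  if (!noalias)
    {
      if (id == 0)
        r.end = n;
      return r;
    }
  size_t chunk = (n + nthreads - 1) / nthreads;
  r.begin = id * chunk < n ? id * chunk : n;
  r.end = r.begin + chunk < n ? r.begin + chunk : n;
  return r;
}
```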

gcc/ChangeLog:

* graphite-isl-ast-to-gimple.c: Include internal-fn.h.
(graphite_oacc_analyze_scop): Implement runtime alias checks.
* omp-expand.c (expand_oacc_for): Add an additional "noalias" parameter
to GOACC_LOOP internal calls, and initialise it to integer_one_node.
* omp-offload.c (oacc_xform_loop): Integrate the runtime alias check
into the GOACC_LOOP expansion.

libgomp/ChangeLog:

* testsuite/libgomp.oacc-c-c++-common/runtime-alias-check-1.c: New test.
* testsuite/libgomp.oacc-c-c++-common/runtime-alias-check-2.c: New test.
---
 gcc/graphite-isl-ast-to-gimple.c  | 122 ++
 gcc/graphite-scop-detection.c |  18 +-
 gcc/omp-expand.c  |  37 +-
 gcc/omp-offload.c | 413 ++
 .../runtime-alias-check-1.c   |  79 
 .../runtime-alias-check-2.c   |  90 
 6 files changed, 550 insertions(+), 209 deletions(-)
 create mode 100644 
libgomp/testsuite/libgomp.oacc-c-c++-common/runtime-alias-check-1.c
 create mode 100644 
libgomp/testsuite/libgomp.oacc-c-c++-common/runtime-alias-check-2.c

diff --git a/gcc/graphite-isl-ast-to-gimple.c b/gcc/graphite-isl-ast-to-gimple.c
index c516170d9493..bdabe588c3d8 100644
--- a/gcc/graphite-isl-ast-to-gimple.c
+++ b/gcc/graphite-isl-ast-to-gimple.c
@@ -58,6 +58,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "graphite.h"
 #include "graphite-oacc.h"
 #include "stdlib.h"
+#include "internal-fn.h"

 struct ast_build_info
 {
@@ -1698,6 +1699,127 @@ graphite_oacc_analyze_scop (scop_p scop)
   print_isl_schedule (dump_file, scop->original_schedule);
 }

+  if (flag_graphite_runtime_alias_checks
+  && scop->unhandled_alias_ddrs.length () > 0)
+{
+  sese_info_p region = scop->scop_info;
+
+  /* Usually there will be a chunking loop with the actual work loop
+inside it.  In some corner cases there may only be one loop.  */
+  loop_p top_loop = region->region.entry->dest->loop_father;
+  loop_p active_loop = top_loop->inner ? top_loop->inner : top_loop;
+  tree cond = generate_alias_cond (scop->unhandled_alias_ddrs, active_loop);
+
+  /* Walk back to GOACC_LOOP block.  */
+  basic_block goacc_loop_block = region->region.entry->src;
+
+  /* Find the GOACC_LOOP calls. If there aren't any then this is not an
+OpenACC kernels loop and will need different handling.  */
+  gimple_stmt_iterator gsitop = gsi_start_bb (goacc_loop_block);
+  while (!gsi_end_p (gsitop)
+&& (!is_gimple_call (gsi_stmt (gsitop))
+|| !gimple_call_internal_p (gsi_stmt (gsitop))
+|| (gimple_call_internal_fn (gsi_stmt (gsitop))
+!= IFN_GOACC_LOOP)))
+   gsi_next (&gsitop);
+
+  if (!gsi_end_p (gsitop))
+   {
+ /* Move the GOACC_LOOP CHUNK and STEP calls to after any hoisted
+statements.  There ought not be any problematic dependencies because
+the chunk size and step are only computed for very specific purposes.
+They may not be at the very top of the block, but they should be
+found together (the asserts test this assumption). */
+ gimple_stmt_iterator gsibottom = gsi_last_bb (goacc_loop_block);
+ gsi_move_after (&gsitop, &gsibottom);
+ gimple_stmt_iterator gsiinsert = gsibottom;
+ gcc_checking_assert (is_gimple_call (gsi_stmt (gsitop))
+  && gimple_call_internal_p (gsi_stmt (gsitop))
+  && (gimple_call_internal_fn (gsi_stmt (gsitop))
+  == IFN_GOACC_LOOP));
+ gsi_move_after (&gsitop, &gsibottom);
+
+ /* Insert "noalias_p = COND" before the GOACC_LOOP statements.
+Note that these likely depend on some of the hoisted statements.  */
+ tree cond_val = force_gimple_operand_gsi (&gsiinsert, cond, true, NULL,
+   true, GSI_NEW_STMT);
+
+ /* Insert the cond_val into each GOACC_LOOP call in the region.  */
+ for (int n = -1; n < (int)region->bbs.length (); n++)
+   {
+ /* Cover the region plus goacc_loop_block.  */
+ basic_block bb = n < 0 ? goacc_loop_block : region->bbs[n];
+
+ for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
+  !gsi_end_p (gsi);
+  gsi_next (&gsi))
+   {
+ gimpl

[OG11][committed][PATCH 14/22] openacc: Add data optimization pass

2021-11-17 Thread Frederik Harwath
From: Andrew Stubbs 

Address PR90591 "Avoid unnecessary data transfer out of OMP
construct", for simple (but common) cases.

This commit adds a pass that optimizes data mapping clauses.
Currently, it can optimize copy/map(tofrom) clauses involving scalars
to copyin/map(to) and further to "private".  The pass is restricted to
"kernels" regions but could be extended to other types of regions.
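
As an illustration only (not taken from the patch or its testsuite), the
following C sketch shows the kind of scalar clauses the description refers
to.  The function names are made up for this example, and which clauses the
pass actually rewrites is an assumption based on the commit message; without
-fopenacc the pragmas are simply ignored and the code runs sequentially.

```c
#include <assert.h>

/* Hypothetical example.  'scale' is only read inside the region, so
   copying it back to the host is unnecessary; a clause like
   copy(scale) is presumably a candidate for copyin(scale).  */
int
sum_scaled (const int *a, int n, int scale)
{
  assert (n >= 0);
  int sum = 0;
#pragma acc kernels copy(scale) copyin(a[0:n]) copy(sum)
  {
    for (int i = 0; i < n; i++)
      sum += a[i] * scale;
  }
  return sum;
}

/* Hypothetical example.  'tmp' is written before it is read in every
   iteration and is not used after the region, so copy(tmp) is
   presumably a candidate for "private".  */
void
square_all (int *a, int n)
{
  assert (n >= 0);
  int tmp = 0;
#pragma acc kernels copy(tmp) copy(a[0:n])
  {
    for (int i = 0; i < n; i++)
      {
        tmp = a[i];
        a[i] = tmp * tmp;
      }
  }
}
```

Calling sum_scaled on {1, 2, 3} with scale 2 gives 12 whether or not the
clauses are rewritten; only the amount of data transferred would change.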

gcc/ChangeLog:

* Makefile.in: Add pass.
* doc/gimple.texi: TODO.
* gimple-walk.c (walk_gimple_seq_mod): Adjust for backward walking.
* gimple-walk.h (struct walk_stmt_info): Add field.
* passes.def: Add new pass.
* tree-pass.h (make_pass_omp_data_optimize): New declaration.
* omp-data-optimize.cc: New file.

libgomp/ChangeLog:

* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c:
Expect optimization messages.
* testsuite/libgomp.oacc-fortran/pr94358-1.f90: Likewise.

gcc/testsuite/ChangeLog:

* c-c++-common/goacc/note-parallelism-1-kernels-loops.c: Likewise.
* c-c++-common/goacc/note-parallelism-1-kernels-straight-line.c:
Likewise.
* c-c++-common/goacc/note-parallelism-kernels-loops.c: Likewise.
* c-c++-common/goacc/uninit-copy-clause.c: Likewise.
* gfortran.dg/goacc/uninit-copy-clause.f95: Likewise.
* c-c++-common/goacc/omp_data_optimize-1.c: New test.
* g++.dg/goacc/omp_data_optimize-1.C: New test.
* gfortran.dg/goacc/omp_data_optimize-1.f90: New test.

Co-Authored-By: Thomas Schwinge 
---
 gcc/Makefile.in   |   1 +
 gcc/doc/gimple.texi   |   2 +
 gcc/gimple-walk.c |  15 +-
 gcc/gimple-walk.h |   6 +
 gcc/omp-data-optimize.cc  | 951 ++
 gcc/passes.def|   1 +
 .../goacc/note-parallelism-1-kernels-loops.c  |   7 +-
 ...note-parallelism-1-kernels-straight-line.c |   9 +-
 .../goacc/note-parallelism-kernels-loops.c|  10 +-
 .../c-c++-common/goacc/omp_data_optimize-1.c  | 677 +
 .../c-c++-common/goacc/uninit-copy-clause.c   |   6 +
 .../g++.dg/goacc/omp_data_optimize-1.C| 169 
 .../gfortran.dg/goacc/omp_data_optimize-1.f90 | 588 +++
 .../gfortran.dg/goacc/uninit-copy-clause.f95  |   2 +
 gcc/tree-pass.h   |   1 +
 .../kernels-decompose-1.c |   2 +
 .../libgomp.oacc-fortran/pr94358-1.f90|   4 +
 17 files changed, 2444 insertions(+), 7 deletions(-)
 create mode 100644 gcc/omp-data-optimize.cc
 create mode 100644 gcc/testsuite/c-c++-common/goacc/omp_data_optimize-1.c
 create mode 100644 gcc/testsuite/g++.dg/goacc/omp_data_optimize-1.C
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/omp_data_optimize-1.f90

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 4ebdcdbc5f8c..8c02b85d2a96 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1507,6 +1507,7 @@ OBJS = \
omp-low.o \
omp-oacc-kernels-decompose.o \
omp-simd-clone.o \
+   omp-data-optimize.o \
opt-problem.o \
optabs.o \
optabs-libfuncs.o \
diff --git a/gcc/doc/gimple.texi b/gcc/doc/gimple.texi
index 4b3d7d7452e3..a83e17f71a40 100644
--- a/gcc/doc/gimple.texi
+++ b/gcc/doc/gimple.texi
@@ -2778,4 +2778,6 @@ calling @code{walk_gimple_stmt} on each one.  @code{WI} is as in
 @code{walk_gimple_stmt}.  If @code{walk_gimple_stmt} returns non-@code{NULL}, the walk
 is stopped and the value returned.  Otherwise, all the statements
 are walked and @code{NULL_TREE} returned.
+
+TODO update for forward vs. backward.
 @end deftypefn
diff --git a/gcc/gimple-walk.c b/gcc/gimple-walk.c
index cd287860994e..66fd491844d7 100644
--- a/gcc/gimple-walk.c
+++ b/gcc/gimple-walk.c
@@ -32,6 +32,8 @@ along with GCC; see the file COPYING3.  If not see
 /* Walk all the statements in the sequence *PSEQ calling walk_gimple_stmt
on each one.  WI is as in walk_gimple_stmt.

+   TODO update for forward vs. backward.
+
If walk_gimple_stmt returns non-NULL, the walk is stopped, and the
value is stored in WI->CALLBACK_RESULT.  Also, the statement that
produced the value is returned if this statement has not been
@@ -44,9 +46,10 @@ gimple *
 walk_gimple_seq_mod (gimple_seq *pseq, walk_stmt_fn callback_stmt,
 walk_tree_fn callback_op, struct walk_stmt_info *wi)
 {
-  gimple_stmt_iterator gsi;
+  bool forward = !(wi && wi->backward);

-  for (gsi = gsi_start (*pseq); !gsi_end_p (gsi); )
+  gimple_stmt_iterator gsi = forward ? gsi_start (*pseq) : gsi_last (*pseq);
+  for (; !gsi_end_p (gsi); )
 {
   tree ret = walk_gimple_stmt (&gsi, callback_stmt, callback_op, wi);
   if (ret)
@@ -60,7 +63,13 @@ walk_gimple_seq_mod (gimple_seq *pseq, walk_stmt_fn callback_stmt,
}

   if (!wi->removed_stmt)
-   gsi_next (&gsi);
+   {
+ if (forward)
+   gsi_next (&gs

[OG11][committed][PATCH 16/22] openacc: Warn about "independent" "kernels" loops with data-dependences

2021-11-17 Thread Frederik Harwath
This commit concerns loops in OpenACC "kernels" regions that have been marked
up with an explicit "independent" clause by the user, but for which Graphite
found data dependences.  A discussion on the private internal OpenACC mailing
list suggested that warning the user about the dependences would be a more
acceptable solution than reverting the user's decision. This behavior is
implemented by the present commit.
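
For illustration (this is not one of the tests in the patch), a loop of the
following shape is the kind of case the description is about.  The function
name is made up, and whether Graphite actually reports the dependence for
this exact code is an assumption.

```c
#include <assert.h>

/* Hypothetical example.  Each iteration reads the element written by
   the previous iteration, so the loop carries a data dependence even
   though the user marked it "independent".  Without -fopenacc the
   pragmas are ignored and the loop runs sequentially.  */
void
prefix_sum (int *a, int n)
{
  assert (n >= 0);
#pragma acc kernels copy(a[0:n])
  {
#pragma acc loop independent
    for (int i = 1; i < n; i++)
      a[i] += a[i - 1];
  }
}
```

Run sequentially, {1, 1, 1, 1} becomes {1, 2, 3, 4}; a parallel execution
that trusts the clause may produce something else, which is why a warning
rather than a silent override seems useful here.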

gcc/ChangeLog:

* common.opt: Add flag Wopenacc-false-independent.
* omp-offload.c (oacc_loop_warn_if_false_independent): New function.
(oacc_loop_fixed_partitions): Call from here.
---
 gcc/common.opt|  5 +
 gcc/omp-offload.c | 49 +++
 2 files changed, 54 insertions(+)

diff --git a/gcc/common.opt b/gcc/common.opt
index aa695e56dc48..4c38ed5cf9ab 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -838,6 +838,11 @@ Wtsan
 Common Var(warn_tsan) Init(1) Warning
 Warn about unsupported features in ThreadSanitizer.

+Wopenacc-false-independent
+Common Var(warn_openacc_false_independent) Init(1) Warning
+Warn in case a loop in an OpenACC \"kernels\" region has an \"independent\"
+clause but analysis shows that it has loop-carried dependences.
+
 Xassembler
 Driver Separate

diff --git a/gcc/omp-offload.c b/gcc/omp-offload.c
index 94a975a88660..b806e36ef515 100644
--- a/gcc/omp-offload.c
+++ b/gcc/omp-offload.c
@@ -2043,6 +2043,51 @@ oacc_loop_transform_auto_into_independent (oacc_loop *loop)
   return true;
 }

+/* Emit a warning if LOOP has an "independent" clause but Graphite's
+   analysis shows that it has data dependences. Note that we respect
+   the user's explicit decision to parallelize the loop but we
+   nevertheless warn that this decision could be wrong. */
+
+static void
+oacc_loop_warn_if_false_independent (oacc_loop *loop)
+{
+  if (!optimize)
+return;
+
+  if (loop->routine)
+return;
+
+  /* TODO Warn about "auto" & "independent" in "parallel" regions? */
+  if (!oacc_parallel_kernels_graphite_fun_p ())
+return;
+
+  if (!(loop->flags & OLF_INDEPENDENT))
+return;
+
+  bool analyzed = false;
+  bool can_be_parallel = oacc_loop_can_be_parallel_p (loop, analyzed);
+  loop_p cfg_loop = oacc_loop_get_cfg_loop (loop);
+
+  if (cfg_loop && cfg_loop->inner && !analyzed)
+{
+  if (dump_enabled_p ())
+   {
+ const dump_user_location_t loc
+   = dump_user_location_t::from_location_t (loop->loc);
+ dump_printf_loc (MSG_MISSED_OPTIMIZATION, loc,
+  "'independent' loop in 'kernels' region has not been "
+  "analyzed (cf. 'graphite' "
+  "dumps for more information).\n");
+   }
+  return;
+}
+
+  if (!can_be_parallel)
+warning_at (loop->loc, 0,
+"loop has \"independent\" clause but data dependences were "
+"found.");
+}
+
 /* Walk the OpenACC loop hierarchy checking and assigning the
programmer-specified partitionings.  OUTER_MASK is the partitioning
this loop is contained within.  Return mask of partitioning
@@ -2094,6 +2139,10 @@ oacc_loop_fixed_partitions (oacc_loop *loop, unsigned outer_mask)
}
}

+  /* TODO Is this flag needed? Perhaps use -Wopenacc-parallelism? */
+  if (warn_openacc_false_independent)
+oacc_loop_warn_if_false_independent (loop);
+
   if (maybe_auto && (loop->flags & OLF_INDEPENDENT))
{
  loop->flags |= OLF_AUTO;
--
2.33.0

-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


[OG11][committed][PATCH 17/22] openacc: Handle internal function calls in pass_lim

2021-11-17 Thread Frederik Harwath
The loop invariant motion pass correctly refuses to move statements
out of a loop if any other statement in the loop is unanalyzable.  The
pass does not know how to handle the OpenACC internal function calls;
this was not a problem until recently, when the OpenACC device
lowering pass was moved to a later position in the pass pipeline.

This commit changes pass_lim to ignore the OpenACC internal function
calls which do not contain any memory references. The hoisting enabled
by this change can be useful for the data-dependence analysis in
Graphite; for instance, in the outlined functions for OpenACC regions,
all invariant accesses to the ".omp_data_i" struct should be hoisted
out of the OpenACC loop.  This is particularly important for variables
that were scalars in the original loop and which have been turned into
accesses to the struct by the outlining process.  Not hoisting those
can prevent scalar evolution analysis which is crucial for Graphite.
Since any hoisting that introduces intermediate names - and hence,
"fake" dependences - inside the analyzed nest can be harmful to
data-dependence analysis, a flag to restrict the hoisting in OpenACC
functions is added to the pass. The pass instance that executes before
Graphite now runs with this flag set to true and the pass instance
after Graphite runs unrestricted.

A more precise way of selecting the statements for which hoisting
should be enabled is left for a future improvement.
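
The effect described above can be pictured with a hand-written before/after
pair in plain C.  This is only a sketch: the struct and function names are
invented stand-ins for the ".omp_data_i" accesses, and the code is not what
pass_lim literally produces.

```c
#include <assert.h>

/* Hypothetical stand-in for the ".omp_data_i" struct of an outlined
   OpenACC region.  */
struct toy_omp_data
{
  int *a;
  int n;
  int scale;
};

/* Before hoisting: the loop bound and the scale factor are reloaded
   from the struct in every iteration.  */
void
scale_in_loop (struct toy_omp_data *d)
{
  assert (d != 0);
  for (int i = 0; i < d->n; i++)
    d->a[i] *= d->scale;
}

/* After hoisting: the invariant loads are done once before the loop,
   so the loop itself only sees scalars.  This is equivalent only if
   the array does not overlap the struct (an assumption here).  */
void
scale_hoisted (struct toy_omp_data *d)
{
  assert (d != 0);
  int *a = d->a;
  int n = d->n;
  int scale = d->scale;
  for (int i = 0; i < n; i++)
    a[i] *= scale;
}
```

In the second form the loop bound is a plain scalar, which is the shape that
scalar evolution analysis can work with more easily.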

gcc/ChangeLog:
* passes.def: Set restrict_oacc_hoisting to true for the early
pass_lim instance.
* tree-ssa-loop-im.c (movement_possibility): Add
restrict_oacc_hoisting flag to function; restrict movement if set.
(compute_invariantness): Add restrict_oacc_hoisting flag and pass it on.
(gather_mem_refs_stmt): Skip IFN_GOACC_LOOP and IFN_UNIQUE
calls.
(loop_invariant_motion_in_fun): Add restrict_oacc_hoisting flag and
pass it on.
(pass_lim::execute): Pass on new flags.
* tree-ssa-loop-manip.h (loop_invariant_motion_in_fun): Adjust
declaration.
* gimple-loop-interchange.cc (pass_linterchange::execute): Adjust call
to loop_invariant_motion_in_fun.
---
 gcc/gimple-loop-interchange.cc |  2 +-
 gcc/passes.def |  2 +-
 gcc/tree-ssa-loop-im.c | 58 --
 gcc/tree-ssa-loop-manip.h  |  2 +-
 4 files changed, 52 insertions(+), 12 deletions(-)

diff --git a/gcc/gimple-loop-interchange.cc b/gcc/gimple-loop-interchange.cc
index 7b799eca805c..d617438910fd 100644
--- a/gcc/gimple-loop-interchange.cc
+++ b/gcc/gimple-loop-interchange.cc
@@ -2096,7 +2096,7 @@ pass_linterchange::execute (function *fun)
   if (changed_p)
 {
   unsigned todo = TODO_update_ssa_only_virtuals;
-  todo |= loop_invariant_motion_in_fun (cfun, false);
+  todo |= loop_invariant_motion_in_fun (cfun, false, false);
   scev_reset ();
   return todo;
 }
diff --git a/gcc/passes.def b/gcc/passes.def
index 48c9821011f0..d1dedbc287e2 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -247,7 +247,7 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_cse_sincos);
   NEXT_PASS (pass_optimize_bswap);
   NEXT_PASS (pass_laddress);
-  NEXT_PASS (pass_lim);
+  NEXT_PASS (pass_lim, true /* restrict_oacc_hoisting */);
   NEXT_PASS (pass_walloca, false);
   NEXT_PASS (pass_pre);
   NEXT_PASS (pass_sink_code);
diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c
index 7de47edbcb30..b392ae609aaf 100644
--- a/gcc/tree-ssa-loop-im.c
+++ b/gcc/tree-ssa-loop-im.c
@@ -47,6 +47,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "builtins.h"
 #include "tree-dfa.h"
 #include "dbgcnt.h"
+#include "graphite-oacc.h"
+#include "internal-fn.h"

 /* TODO:  Support for predicated code motion.  I.e.

@@ -320,11 +322,23 @@ enum move_pos
Otherwise return MOVE_IMPOSSIBLE.  */

 enum move_pos
-movement_possibility (gimple *stmt)
+movement_possibility (gimple *stmt, bool restrict_oacc_hoisting)
 {
   tree lhs;
   enum move_pos ret = MOVE_POSSIBLE;

+  if (restrict_oacc_hoisting && oacc_get_fn_attrib (cfun->decl)
+  && gimple_code (stmt) == GIMPLE_ASSIGN)
+{
+  tree rhs = gimple_assign_rhs1 (stmt);
+
+  if (TREE_CODE (rhs) == VIEW_CONVERT_EXPR)
+   rhs = TREE_OPERAND (rhs, 0);
+
+  if (TREE_CODE (rhs) == ARRAY_REF)
+ return MOVE_IMPOSSIBLE;
+}
+
   if (flag_unswitch_loops
   && gimple_code (stmt) == GIMPLE_COND)
 {
@@ -974,7 +988,7 @@ rewrite_bittest (gimple_stmt_iterator *bsi)
statements.  */

 static void
-compute_invariantness (basic_block bb)
+compute_invariantness (basic_block bb, bool restrict_oacc_hoisting)
 {
   enum move_pos pos;
   gimple_stmt_iterator bsi;
@@ -1002,7 +1016,7 @@ compute_invariantness (basic_block bb)
   {
stmt = gsi_stmt (bsi);

-   pos = movement_possibility (stmt);
+   pos = movement_possibility (stmt, re

[OG11][committed][PATCH 18/22] openacc: Disable pass_pre on outlined functions analyzed by Graphite

2021-11-17 Thread Frederik Harwath
The additional dependences introduced by partial redundancy
elimination proper and by the code hoisting step of the pass very
often cause Graphite to fail on OpenACC functions. On the other hand,
the pass can also enable the analysis of OpenACC loops (cf. e.g. the
loop-auto-transfer-4.f90 testcase), for instance, because full
redundancy elimination removes definitions that would otherwise
prevent the creation of runtime alias checks outside of the SCoP.

This commit disables the actual partial redundancy elimination step as
well as the code hoisting step of pass_pre on OpenACC functions that
might be handled by Graphite.

gcc/ChangeLog:

* tree-ssa-pre.c (insert): Skip any insertions in OpenACC
functions that might be processed by Graphite.
---
 gcc/tree-ssa-pre.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/gcc/tree-ssa-pre.c b/gcc/tree-ssa-pre.c
index 2aedc31e1d73..b904354e4c78 100644
--- a/gcc/tree-ssa-pre.c
+++ b/gcc/tree-ssa-pre.c
@@ -51,6 +51,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa-dce.h"
 #include "tree-cfgcleanup.h"
 #include "alias.h"
+#include "graphite-oacc.h"

 /* Even though this file is called tree-ssa-pre.c, we actually
implement a bit more than just PRE here.  All of them piggy-back
@@ -3736,6 +3737,22 @@ do_hoist_insertion (basic_block block)
 static void
 insert (void)
 {
+
+/* The additional dependences introduced by the code insertions
+ can cause Graphite's dependence analysis to fail.  Without
+ special handling of those dependences in Graphite, it seems
+ better to skip this step if OpenACC loops that need to be handled
+ by Graphite are found.  Note that the full redundancy elimination
+ step of this pass is useful for the purpose of dependence
+ analysis, for instance, because it can remove definitions from
+ SCoPs that would otherwise prevent the creation of runtime alias
+ checks since those may only use definitions that are available
+ before the SCoP. */
+
+  if (oacc_function_p (cfun)
+  && ::graphite_analyze_oacc_function_p (cfun))
+return;
+
   basic_block bb;

   FOR_ALL_BB_FN (bb, cfun)
--
2.33.0



[OG11][committed][PATCH 19/22] graphite: Tune parameters for OpenACC use

2021-11-17 Thread Frederik Harwath
The default values of some parameters that restrict Graphite's
resource usage are too low for many OpenACC codes.  Furthermore,
exceeding the limits does not always lead to user-visible diagnostic
messages.

This commit increases the parameter values on OpenACC functions.  The
values were chosen to allow for the analysis of all "kernels" regions
in the SPEC ACCEL v1.3 benchmark suite.  Warnings about exceeded
Graphite-related limits are added to the -fopt-info-missed
output. Those warnings are phrased in a uniform way that intentionally
refers to the "data-dependence analysis" of "OpenACC loops" instead of
"a failure in Graphite" to make them easier to understand for users.

gcc/ChangeLog:

* graphite-optimize-isl.c (optimize_isl): Adjust
param_max_isl_operations value for OpenACC functions and add
special warnings if value gets exceeded.

* graphite-scop-detection.c (build_scops): Likewise for
param_graphite_max_arrays_per_scop.

gcc/testsuite/ChangeLog:

* gcc.dg/goacc/graphite-parameter-1.c: New test.
* gcc.dg/goacc/graphite-parameter-2.c: New test.
---
 gcc/graphite-optimize-isl.c   | 35 ---
 gcc/graphite-scop-detection.c | 28 ++-
 .../gcc.dg/goacc/graphite-parameter-1.c   | 21 +++
 .../gcc.dg/goacc/graphite-parameter-2.c   | 23 
 4 files changed, 101 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/goacc/graphite-parameter-1.c
 create mode 100644 gcc/testsuite/gcc.dg/goacc/graphite-parameter-2.c

diff --git a/gcc/graphite-optimize-isl.c b/gcc/graphite-optimize-isl.c
index 019452700a49..4eecbd20b740 100644
--- a/gcc/graphite-optimize-isl.c
+++ b/gcc/graphite-optimize-isl.c
@@ -38,6 +38,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "dumpfile.h"
 #include "tree-vectorizer.h"
 #include "graphite.h"
+#include "graphite-oacc.h"


 /* get_schedule_for_node_st - Improve schedule for the schedule node.
@@ -115,6 +116,14 @@ optimize_isl (scop_p scop, bool oacc_enabled_graphite)
   int old_err = isl_options_get_on_error (scop->isl_context);
   int old_max_operations = isl_ctx_get_max_operations (scop->isl_context);
   int max_operations = param_max_isl_operations;
+
+  /* The default value for param_max_isl_operations is easily exceeded
+ by "kernels" loops in existing OpenACC codes.  Raise the values
+ significantly since analyzing those loops is crucial. */
+  if (param_max_isl_operations == 35 /* default value */
+  && oacc_function_p (cfun))
+max_operations = 200;
+
   if (max_operations)
 isl_ctx_set_max_operations (scop->isl_context, max_operations);
   isl_options_set_on_error (scop->isl_context, ISL_ON_ERROR_CONTINUE);
@@ -164,11 +173,27 @@ optimize_isl (scop_p scop, bool oacc_enabled_graphite)
  dump_user_location_t loc = find_loop_location
(scop->scop_info->region.entry->dest->loop_father);
  if (isl_ctx_last_error (scop->isl_context) == isl_error_quota)
-   dump_printf_loc (MSG_MISSED_OPTIMIZATION, loc,
-"loop nest not optimized, optimization timed out "
-"after %d operations [--param max-isl-operations]\n",
-max_operations);
- else
+   {
+  if (oacc_function_p (cfun))
+   {
+ /* Special casing for OpenACC to unify diagnostic messages
+here and in graphite-scop-detection.c. */
+  dump_printf_loc (MSG_MISSED_OPTIMIZATION, loc,
+   "data-dependence analysis of OpenACC loop "
+   "nest "
+   "failed; try increasing the value of "
+   "--param="
+   "max-isl-operations=%d.\n",
+   max_operations);
+}
+  else
+dump_printf_loc (MSG_MISSED_OPTIMIZATION, loc,
+ "loop nest not optimized, optimization timed "
+ "out after %d operations [--param "
+ "max-isl-operations]\n",
+ max_operations);
+}
+  else
 dump_printf_loc (MSG_MISSED_OPTIMIZATION, loc,
  "loop nest not optimized, ISL signalled an error\n");
}
diff --git a/gcc/graphite-scop-detection.c b/gcc/graphite-scop-detection.c
index 8b41044bce5e..afc955cc97eb 100644
--- a/gcc/graphite-scop-detection.c
+++ b/gcc/graphite-scop-detection.c
@@ -2056,6 +2056,9 @@ determine_openacc_reductions (scop_p scop)
   }
 }

+
+extern dump_user_location_t find_loop_location (class loop *);
+
 /* Find Static Control Parts (SCoP) in the current function and pushes
them to SCOPS.  */

@@ -2109,6 +2112,11 @@ build_scops (vec *scops)

[OG11][committed][PATCH 20/22] graphite: Adjust scop loop-nest choice

2021-11-17 Thread Frederik Harwath
The find_common_loop function is used in Graphite to obtain a common
super-loop of all loops inside a SCoP.  The function is applied to the
loop of the destination block of the edge that leads into the SESE
region and the loop of the source block of the edge that exits the
region.  The exit block is usually introduced by the canonicalization
of the loop structure that Graphite does to support its code
generation. If it is empty, it may happen that it belongs to the outer
fake loop.  This way, build_alias_set may end up analysing
data-references with respect to this loop although there may exist a
proper super-loop of the SCoP loops.  This does not seem to be correct
in general and it leads to problems with runtime alias check creation
which fails if executed on a loop without niter information.
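
To make the problem concrete, here is a toy model of the loop tree in plain
C.  It is a sketch under stated assumptions: "toy_loop" and
"toy_find_common_loop" are invented for this illustration and only mimic the
idea of find_common_loop; they are not GCC's data structures.

```c
#include <assert.h>
#include <stddef.h>

/* Toy loop-tree node: OUTER is the enclosing loop (NULL for the fake
   root loop that represents the whole function), DEPTH is the nesting
   depth with the root at 0.  */
struct toy_loop
{
  struct toy_loop *outer;
  int depth;
};

/* Return the innermost loop that contains both A and B.  Both loops
   are assumed to belong to the same tree.  */
struct toy_loop *
toy_find_common_loop (struct toy_loop *a, struct toy_loop *b)
{
  assert (a != NULL && b != NULL);
  /* Bring both loops to the same depth, then walk outwards in step.  */
  while (a->depth > b->depth)
    a = a->outer;
  while (b->depth > a->depth)
    b = b->outer;
  while (a != b)
    {
      a = a->outer;
      b = b->outer;
    }
  return a;
}
```

With a fake root loop, an outer loop and an inner loop, pairing the inner
loop with a block that sits in the root yields the root, while pairing it
with a block inside the outer loop yields the outer loop.  Stepping back from
an empty exit block to its predecessor, as the new function does, corresponds
to moving from the first case to the second.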

gcc/ChangeLog:

* graphite-scop-detection.c (scop_context_loop): New function.
(build_alias_set): Use scop_context_loop instead of find_common_loop.
* graphite-isl-ast-to-gimple.c (graphite_regenerate_ast_isl): Likewise.
* graphite.h (scop_context_loop): New declaration.
---
 gcc/graphite-isl-ast-to-gimple.c |  4 +---
 gcc/graphite-scop-detection.c| 21 ++---
 gcc/graphite.h   |  1 +
 3 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/gcc/graphite-isl-ast-to-gimple.c b/gcc/graphite-isl-ast-to-gimple.c
index bdabe588c3d8..ec055a358f39 100644
--- a/gcc/graphite-isl-ast-to-gimple.c
+++ b/gcc/graphite-isl-ast-to-gimple.c
@@ -1543,9 +1543,7 @@ graphite_regenerate_ast_isl (scop_p scop)
 conditional if aliasing can be ruled out at runtime and the original
 version of the SCoP, otherwise. */

-  loop_p loop
-  = find_common_loop (scop->scop_info->region.entry->dest->loop_father,
-  scop->scop_info->region.exit->src->loop_father);
+  loop_p loop = scop_context_loop (scop);
   tree cond = generate_alias_cond (scop->unhandled_alias_ddrs, loop);
   tree non_alias_cond = build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
   set_ifsese_condition (region->if_region, non_alias_cond);
diff --git a/gcc/graphite-scop-detection.c b/gcc/graphite-scop-detection.c
index afc955cc97eb..99e906a5d120 100644
--- a/gcc/graphite-scop-detection.c
+++ b/gcc/graphite-scop-detection.c
@@ -297,6 +297,23 @@ single_pred_cond_non_loop_exit (basic_block bb)
   return NULL;
 }

+
+/* Return the innermost loop that encloses all loops in SCOP. */
+
+loop_p
+scop_context_loop (scop_p scop)
+{
+  edge scop_entry = scop->scop_info->region.entry;
+  edge scop_exit = scop->scop_info->region.exit;
+  basic_block exit_bb = scop_exit->src;
+
+  while (sese_trivially_empty_bb_p (exit_bb) && single_pred_p (exit_bb))
+exit_bb = single_pred (exit_bb);
+
+  loop_p entry_loop = scop_entry->dest->loop_father;
+  return find_common_loop (entry_loop, exit_bb->loop_father);
+}
+
 namespace
 {

@@ -1776,9 +1793,7 @@ build_alias_set (scop_p scop)
   int i, j;
   int *all_vertices;

-  struct loop *nest
-= find_common_loop (scop->scop_info->region.entry->dest->loop_father,
-   scop->scop_info->region.exit->src->loop_father);
+  struct loop *nest = scop_context_loop (scop);

   gcc_checking_assert (nest);

diff --git a/gcc/graphite.h b/gcc/graphite.h
index 9c508f31109f..dacb27a9073c 100644
--- a/gcc/graphite.h
+++ b/gcc/graphite.h
@@ -480,4 +480,5 @@ extern tree cached_scalar_evolution_in_region (const sese_l &, loop_p, tree);
 extern void dot_all_sese (FILE *, vec &);
 extern void dot_sese (sese_l &);
 extern void dot_cfg ();
+extern loop_p scop_context_loop (scop_p);
 #endif
--
2.33.0



[OG11][committed][PATCH 21/22] graphite: Accept loops without data references

2021-11-17 Thread Frederik Harwath
It seems that the check that rejects loops without data references is
only included to avoid handling non-profitable loops.  Including those
loops in Graphite's analysis enables more consistent diagnostic
messages in OpenACC "kernels" code and does not introduce any
testsuite regressions.  If executing Graphite on loops without
data references leads to noticeable compile time slow-downs for
non-OpenACC users of Graphite, the check can be re-introduced but
restricted to non-OpenACC functions.

gcc/ChangeLog:

* graphite-scop-detection.c (scop_detection::harmful_loop_in_region):
Remove check for loops without data references.
---
 gcc/graphite-scop-detection.c | 13 -
 1 file changed, 13 deletions(-)

diff --git a/gcc/graphite-scop-detection.c b/gcc/graphite-scop-detection.c
index 99e906a5d120..9311a0e42a57 100644
--- a/gcc/graphite-scop-detection.c
+++ b/gcc/graphite-scop-detection.c
@@ -851,19 +851,6 @@ scop_detection::harmful_loop_in_region (sese_l scop) const
  return true;
}

-  /* Check if all loop nests have at least one data reference.
-???  This check is expensive and loops premature at this point.
-If important to retain we can pre-compute this for all innermost
-loops and reject those when we build a SESE region for a loop
-during SESE discovery.  */
-  if (! loop->inner
- && ! loop_nest_has_data_refs (loop))
-   {
- DEBUG_PRINT (dp << "[scop-detection-fail] loop_" << loop->num
-  << " does not have any data reference.\n");
- return true;
-   }
-
   DEBUG_PRINT (dp << "[scop-detection] loop_" << loop->num << " is harmless.\n");
 }

--
2.33.0



Re: [committed] analyzer: fix missing -Wanalyzer-write-to-const [PR102695]

2021-11-17 Thread Martin Sebor via Gcc-patches

On 11/16/21 7:05 PM, David Malcolm via Gcc-patches wrote:

This patch fixes -Wanalyzer-write-to-const so that it will complain
about attempts to write to functions and to labels.
It also "teaches" the analyzer about strchr, in that strchr can either
return a pointer into the input area (and thus -Wanalyzer-write-to-const
can now complain about writes into a string literal seen this way),
or return NULL (and thus the analyzer can complain about NULL
dereferences if the result is used without a check).


For what it's worth, I used strchr in the test case as an example.
There are a few other built-ins like it, including index, rindex,
memchr, strrchr, and strstr (just going through the switch
statements in my code).

At least some of these built-ins have an attribute "fn spec" that
describes some of their properties (like what argument they read
from; see builtin_fnspec in builtins.c).  But it doesn't look
like attr_fnspec has a way of encoding a function that returns
a pointer argument plus some offset.  That seems like a useful
enhancement both for our work and also for optimizers.  It would
let us avoid having to hardcode these properties in duplicate
case and switch statements in multiple places.
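
A small plain-C sketch of the property in question, for illustration only:
the helper name "offset_of_char" is made up, but the behaviour of strchr it
relies on is the standard one (the result is either NULL or the first
argument plus some offset).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the offset of the first occurrence of C in S, or -1 if
   strchr returns NULL.  A non-NULL result always points into S.  */
ptrdiff_t
offset_of_char (const char *s, int c)
{
  assert (s != NULL);
  const char *p = strchr (s, c);
  if (p == NULL)
    return -1;
  return p - s;
}
```

For "hello" and 'l' the offset is 2; searching for the terminating null
character yields the string length, since strchr treats the terminator as
part of the string.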

Martin



Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r12-5330-g111fd515f2894d7cddf62f80c69765c43ae18577.

gcc/analyzer/ChangeLog:
PR analyzer/102695
* region-model-impl-calls.cc (region_model::impl_call_strchr): New.
* region-model-manager.cc
(region_model_manager::maybe_fold_unaryop): Simplify cast to
pointer type of an existing pointer to a region.
* region-model.cc (region_model::on_call_pre): Handle
BUILT_IN_STRCHR and "strchr".
(write_to_const_diagnostic::emit): Add auto_diagnostic_group.  Add
alternate wordings for functions and labels.
(write_to_const_diagnostic::describe_final_event): Add alternate
wordings for functions and labels.
(region_model::check_for_writable_region): Handle RK_FUNCTION and
RK_LABEL.
* region-model.h (region_model::impl_call_strchr): New decl.

gcc/testsuite/ChangeLog:
PR analyzer/102695
* gcc.dg/analyzer/pr102695.c: New test.
* gcc.dg/analyzer/strchr-1.c: New test.

Signed-off-by: David Malcolm 
---
  gcc/analyzer/region-model-impl-calls.cc  | 69 
  gcc/analyzer/region-model-manager.cc |  7 +++
  gcc/analyzer/region-model.cc | 52 --
  gcc/analyzer/region-model.h  |  1 +
  gcc/testsuite/gcc.dg/analyzer/pr102695.c | 44 +++
  gcc/testsuite/gcc.dg/analyzer/strchr-1.c | 26 +
  6 files changed, 196 insertions(+), 3 deletions(-)
  create mode 100644 gcc/testsuite/gcc.dg/analyzer/pr102695.c
  create mode 100644 gcc/testsuite/gcc.dg/analyzer/strchr-1.c

diff --git a/gcc/analyzer/region-model-impl-calls.cc b/gcc/analyzer/region-model-impl-calls.cc
index 90d4cf9c2db..ae50e69542e 100644
--- a/gcc/analyzer/region-model-impl-calls.cc
+++ b/gcc/analyzer/region-model-impl-calls.cc
@@ -678,6 +678,75 @@ region_model::impl_call_realloc (const call_details &cd)
  }
  }
  
+/* Handle the on_call_pre part of "strchr" and "__builtin_strchr".  */

+
+void
+region_model::impl_call_strchr (const call_details &cd)
+{
+  class strchr_call_info : public call_info
+  {
+  public:
+strchr_call_info (const call_details &cd, bool found)
+: call_info (cd), m_found (found)
+{
+}
+
+label_text get_desc (bool can_colorize) const FINAL OVERRIDE
+{
+  if (m_found)
+   return make_label_text (can_colorize,
+   "when %qE returns non-NULL",
+   get_fndecl ());
+  else
+   return make_label_text (can_colorize,
+   "when %qE returns NULL",
+   get_fndecl ());
+}
+
+bool update_model (region_model *model,
+  const exploded_edge *,
+  region_model_context *ctxt) const FINAL OVERRIDE
+{
+  const call_details cd (get_call_details (model, ctxt));
+  if (tree lhs_type = cd.get_lhs_type ())
+   {
+ region_model_manager *mgr = model->get_manager ();
+ const svalue *result;
+ if (m_found)
+   {
+ const svalue *str_sval = cd.get_arg_svalue (0);
+ const region *str_reg
+   = model->deref_rvalue (str_sval, cd.get_arg_tree (0),
+  cd.get_ctxt ());
+ /* We want str_sval + OFFSET for some unknown OFFSET.
+Use a conjured_svalue to represent the offset,
+using the str_reg as the id of the conjured_svalue.  */
+ const svalue *offset
+   = mgr->get_or_create_conjured_svalue (size_type_node,
+ cd.get_call_stmt (),
+  

Re: [PATCH] rs6000: Better error messages for power8/9-vector builtins

2021-11-17 Thread Paul A. Clarke via Gcc-patches
On Tue, Nov 16, 2021 at 11:12:35AM -0600, Bill Schmidt via Gcc-patches wrote:
> Hi!  During a previous patch review, Segher asked that I provide better
> messages when builtins are unavailable because they require both a minimum
> CPU and the enablement of VSX instructions.  This patch does just that.
...
> gcc/
>   * config/rs6000/rs6000-call.c (rs6000_invalid_new_builtin): Change
>   error messages for ENB_P8V and ENB_P9V.
> ---
>  gcc/config/rs6000/rs6000-call.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
> index 85fec80c6d7..035266eb001 100644
> --- a/gcc/config/rs6000/rs6000-call.c
> +++ b/gcc/config/rs6000/rs6000-call.c
> @@ -11943,7 +11943,8 @@ rs6000_invalid_new_builtin (enum rs6000_gen_builtins 
> fncode)
>error ("%qs requires the %qs option", name, "-mcpu=power8");
>break;
>  case ENB_P8V:
> -  error ("%qs requires the %qs option", name, "-mpower8-vector");
> +  error ("%qs requires the %qs and %qs options", name, "-mcpu=power8",
> +  "-mvsx");

"-mcpu=power8" itself enables "-mvsx", doesn't it?

>break;
>  case ENB_P9:
>error ("%qs requires the %qs option", name, "-mcpu=power9");
> @@ -11953,7 +11954,8 @@ rs6000_invalid_new_builtin (enum rs6000_gen_builtins 
> fncode)
>name, "-mcpu=power9", "-m64", "-mpowerpc64");
>break;
>  case ENB_P9V:
> -  error ("%qs requires the %qs option", name, "-mpower9-vector");
> +  error ("%qs requires the %qs and %qs options", name, "-mcpu=power9",
> +  "-mvsx");

Similarly, "-mcpu=power9" itself enables "-mvsx", doesn't it?

Are you trying to also say "don't use -mno-vsx"?  If so, maybe s/and/with/
would be slightly less confusing? This is going to be awkward unless it can
be more precise, like two messages depending on actual context:
- with "-mcpu=power8 -mno-vsx:  "...requires -mvsx".
- without "-mcpu=power8":  "...requires -mcpu=power8".
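
A minimal sketch of the two-message idea, written as standalone C++ rather than against rs6000-call.c. The helper name, its two boolean inputs, and the exact wording are illustrative assumptions, not GCC's actual code:

```cpp
#include <string>

// Hypothetical helper: choose the diagnostic from the actual option state.
// `cpu_ok` stands for "-mcpu=power8 (or later) was given" and `vsx_on` for
// "VSX was not turned off with -mno-vsx"; both are assumed inputs.
std::string p8v_builtin_message(const std::string &name, bool cpu_ok,
                                bool vsx_on)
{
  if (!cpu_ok)
    return name + " requires the -mcpu=power8 option";
  if (!vsx_on)
    return name + " requires the -mvsx option";
  return "";  // builtin is available, nothing to report
}
```

With this shape, a build using "-mcpu=power8 -mno-vsx" would only be told about -mvsx, and a build without "-mcpu=power8" would only be told about the CPU option.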

PC


[committed] Fix two mips target tests compromised by recent IPA work

2021-11-17 Thread Jeff Law via Gcc-patches
Jan's recent IPA work compromised two mips tests.   This restores the 
tests by disabling IPA analysis on the key function in both tests.


Committed to the trunk,

Jeff

commit c70546482388951b5c9c19cff002ee6ab920b7f5
Author: Jeff Law 
Date:   Wed Nov 17 11:55:50 2021 -0500

Fix two mips target tests compromised by recent IPA work

gcc/testsuite
* gcc.target/mips/frame-header-1.c (bar): Add noipa attribute.
* gcc.target/mips/frame-header-2.c (bar): Likewise.

diff --git a/gcc/testsuite/gcc.target/mips/frame-header-1.c 
b/gcc/testsuite/gcc.target/mips/frame-header-1.c
index 971656ddaa3..55efc0b02f8 100644
--- a/gcc/testsuite/gcc.target/mips/frame-header-1.c
+++ b/gcc/testsuite/gcc.target/mips/frame-header-1.c
@@ -8,7 +8,7 @@
 /* { dg-skip-if "code quality test" { *-*-* } { "-O0" } { "" } } */
 /* { dg-final { scan-assembler "\taddiu\t\\\$sp,\\\$sp,-24" } } */
 
-NOMIPS16 void __attribute__((noinline))
+NOMIPS16 void __attribute__((noinline)) __attribute__((noipa))
 bar (int* a)
 {
   *a = 1;
diff --git a/gcc/testsuite/gcc.target/mips/frame-header-2.c 
b/gcc/testsuite/gcc.target/mips/frame-header-2.c
index 0e86bc91994..31aa27e990f 100644
--- a/gcc/testsuite/gcc.target/mips/frame-header-2.c
+++ b/gcc/testsuite/gcc.target/mips/frame-header-2.c
@@ -8,7 +8,7 @@
 /* { dg-skip-if "code quality test" { *-*-* } { "-O0" } { "" } } */
 /* { dg-final { scan-assembler "\taddiu\t\\\$sp,\\\$sp,-8" } } */
 
-NOMIPS16 void __attribute__((noinline))
+NOMIPS16 void __attribute__((noinline)) __attribute__((noipa))
 bar (int* a)
 {
   *a = 1;


Re: [PATCH] rs6000: Better error messages for power8/9-vector builtins

2021-11-17 Thread Bill Schmidt via Gcc-patches
On 11/17/21 10:54 AM, Paul A. Clarke wrote:
> On Tue, Nov 16, 2021 at 11:12:35AM -0600, Bill Schmidt via Gcc-patches wrote:
>> Hi!  During a previous patch review, Segher asked that I provide better
>> messages when builtins are unavailable because they require both a minimum
>> CPU and the enablement of VSX instructions.  This patch does just that.
> ...
>> gcc/
>>  * config/rs6000/rs6000-call.c (rs6000_invalid_new_builtin): Change
>>  error messages for ENB_P8V and ENB_P9V.
>> ---
>>  gcc/config/rs6000/rs6000-call.c | 6 --
>>  1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/gcc/config/rs6000/rs6000-call.c 
>> b/gcc/config/rs6000/rs6000-call.c
>> index 85fec80c6d7..035266eb001 100644
>> --- a/gcc/config/rs6000/rs6000-call.c
>> +++ b/gcc/config/rs6000/rs6000-call.c
>> @@ -11943,7 +11943,8 @@ rs6000_invalid_new_builtin (enum rs6000_gen_builtins 
>> fncode)
>>error ("%qs requires the %qs option", name, "-mcpu=power8");
>>break;
>>  case ENB_P8V:
>> -  error ("%qs requires the %qs option", name, "-mpower8-vector");
>> +  error ("%qs requires the %qs and %qs options", name, "-mcpu=power8",
>> + "-mvsx");
> "-mcpu=power8" itself enables "-mvsx", doesn't it?

Of course, but it can be disabled with -mno-vsx.  Then you get this error.
You won't get it unless you deliberately did something strange with the
compile options.

>
>>break;
>>  case ENB_P9:
>>error ("%qs requires the %qs option", name, "-mcpu=power9");
>> @@ -11953,7 +11954,8 @@ rs6000_invalid_new_builtin (enum rs6000_gen_builtins 
>> fncode)
>>   name, "-mcpu=power9", "-m64", "-mpowerpc64");
>>break;
>>  case ENB_P9V:
>> -  error ("%qs requires the %qs option", name, "-mpower9-vector");
>> +  error ("%qs requires the %qs and %qs options", name, "-mcpu=power9",
>> + "-mvsx");
> Similarly, "-mcpu=power9" itself enables "-mvsx", doesn't it?
>
> Are you trying to also say "don't use -mno-vsx"?  If so, maybe s/and/with/
> would be slightly less confusing? This is going to be awkward unless it can
> be more precise, like two messages depending on actual context:
> - with "-mcpu=power8 -mno-vsx:  "...requires -mvsx".
> - without "-mcpu=power8":  "...requires -mcpu=power8".

This seems like a YMMV situation...I don't see the confusion myself.

Bill

>
> PC


Re: [PATCH v5 1/1] [ARM] Add support for TLS register based stack protector canary access

2021-11-17 Thread Ard Biesheuvel via Gcc-patches
(+ Ramana)

On Mon, 15 Nov 2021 at 19:04, Ard Biesheuvel  wrote:
>
> Add support for accessing the stack canary value via the TLS register,
> so that multiple threads running in the same address space can use
> distinct canary values. This is intended for the Linux kernel running in
> SMP mode, where processes entering the kernel are essentially threads
> running the same program concurrently: using a global variable for the
> canary in that context is problematic because it can never be rotated,
> and so the OS is forced to use the same value as long as it remains up.
>
> Using the TLS register to index the stack canary helps with this, as it
> allows each CPU to context switch the TLS register along with the rest
> of the process, permitting each process to use its own value for the
> stack canary.
>
> 2021-11-15 Ard Biesheuvel 
>
> * config/arm/arm-opts.h (enum stack_protector_guard): New
> * config/arm/arm-protos.h (arm_stack_protect_tls_canary_mem):
> New
> * config/arm/arm.c (TARGET_STACK_PROTECT_GUARD): Define
> (arm_option_override_internal): Handle and put in error checks
> for stack protector guard options.
> (arm_option_reconfigure_globals): Likewise
> (arm_stack_protect_tls_canary_mem): New
> (arm_stack_protect_guard): New
> * config/arm/arm.md (stack_protect_set): New
> (stack_protect_set_tls): Likewise
> (stack_protect_test): Likewise
> (stack_protect_test_tls): Likewise
> (reload_tp_hard): Likewise
> * config/arm/arm.opt (-mstack-protector-guard): New
> (-mstack-protector-guard-offset): New.
> * doc/invoke.texi: Document new options
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/arm/stack-protector-7.c: New test.
> * gcc.target/arm/stack-protector-8.c: New test.
>
> Signed-off-by: Ard Biesheuvel 
> ---
>  gcc/config/arm/arm-opts.h|  6 ++
>  gcc/config/arm/arm-protos.h  |  2 +
>  gcc/config/arm/arm.c | 55 +++
>  gcc/config/arm/arm.md| 71 +++-
>  gcc/config/arm/arm.opt   | 22 ++
>  gcc/doc/invoke.texi  | 11 +++
>  gcc/testsuite/gcc.target/arm/stack-protector-7.c | 10 +++
>  gcc/testsuite/gcc.target/arm/stack-protector-8.c |  5 ++
>  8 files changed, 180 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/config/arm/arm-opts.h b/gcc/config/arm/arm-opts.h
> index 5c4b62f404f7..581ba3c4fbbb 100644
> --- a/gcc/config/arm/arm-opts.h
> +++ b/gcc/config/arm/arm-opts.h
> @@ -69,4 +69,10 @@ enum arm_tls_type {
>TLS_GNU,
>TLS_GNU2
>  };
> +
> +/* Where to get the canary for the stack protector.  */
> +enum stack_protector_guard {
> +  SSP_TLSREG,  /* per-thread canary in TLS register */
> +  SSP_GLOBAL   /* global canary */
> +};
>  #endif
> diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
> index 9b1f61394ad7..d8d605920c97 100644
> --- a/gcc/config/arm/arm-protos.h
> +++ b/gcc/config/arm/arm-protos.h
> @@ -195,6 +195,8 @@ extern void arm_split_atomic_op (enum rtx_code, rtx, rtx, 
> rtx, rtx, rtx, rtx);
>  extern rtx arm_load_tp (rtx);
>  extern bool arm_coproc_builtin_available (enum unspecv);
>  extern bool arm_coproc_ldc_stc_legitimate_address (rtx);
> +extern rtx arm_stack_protect_tls_canary_mem (bool);
> +
>
>  #if defined TREE_CODE
>  extern void arm_init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, tree);
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index a5b403eb3e49..e5077348ce07 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -829,6 +829,9 @@ static const struct attribute_spec arm_attribute_table[] =
>
>  #undef TARGET_MD_ASM_ADJUST
>  #define TARGET_MD_ASM_ADJUST arm_md_asm_adjust
> +
> +#undef TARGET_STACK_PROTECT_GUARD
> +#define TARGET_STACK_PROTECT_GUARD arm_stack_protect_guard
>
>  /* Obstack for minipool constant handling.  */
>  static struct obstack minipool_obstack;
> @@ -3176,6 +3179,26 @@ arm_option_override_internal (struct gcc_options *opts,
>if (TARGET_THUMB2_P (opts->x_target_flags))
>  opts->x_inline_asm_unified = true;
>
> +  if (arm_stack_protector_guard == SSP_GLOBAL
> +  && opts->x_arm_stack_protector_guard_offset_str)
> +{
> +  error ("incompatible options %'-mstack-protector-guard=global%' and"
> +"%'-mstack-protector-guard-offset=%qs%'",
> +arm_stack_protector_guard_offset_str);
> +}
> +
> +  if (opts->x_arm_stack_protector_guard_offset_str)
> +{
> +  char *end;
> +  const char *str = arm_stack_protector_guard_offset_str;
> +  errno = 0;
> +  long offs = strtol (arm_stack_protector_guard_offset_str, &end, 0);
> +  if (!*str || *end || errno)
> +   error ("%qs is not a valid offset in %qs", str,
> +  "-mstack-protector-guard-offset=");
> +  arm_stack_pr

[PATCH] DWARF: Match behaviour of .cfi_xxx when doing manual frame output.

2021-11-17 Thread Iain Sandoe via Gcc-patches
At present, for several reasons, it is not possible to switch
Darwin to use .cfi instructions for frame output.

When GCC uses .cfi_ instructions, the behaviour w.r.t. frame
sections (for a target with unwind frames by default) is:

(no options                                           ) .eh_frame
(-g                                                   ) .eh_frame
(-g -fno-unwind-tables -fno-asynchronous-unwind-tables) .debug_frame
(   -fno-unwind-tables -fno-asynchronous-unwind-tables) ---

However, for a target which outputs the FDEs "manually" (using
output_call_frame_info()) we have:

(no options                                           ) __eh_frame
(-g                                                   ) __eh_frame *and* __debug_frame
(-g -fno-unwind-tables -fno-asynchronous-unwind-tables) __debug_frame
(   -fno-unwind-tables -fno-asynchronous-unwind-tables) ---

The first two cases are, of course, the most common and the extra
frame table is (a) a waste of space and (b) actually triggers a bug
when used with the LLVM assembler [with assertions enabled] for
Mach-O when we have hot/cold partitioning on, since that emits
Letext{.cold}0 labels *after* the __DWARF,__debug_frame and the
assembler is set up to reject switches to non-debug sections after the
first __DWARF debug one has been seen.

The following patch makes the manual output of frame data follow the
same pattern as the .cfi instructions.
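
The two tables above can be restated as a small model. This is only a sketch of the observable behaviour listed in those tables, not of the logic in dwarf2out.c; the struct and function names are hypothetical:

```cpp
// Which frame sections end up in the output.
struct FrameSections {
  bool eh_frame;
  bool debug_frame;
};

// Behaviour with .cfi_ instructions (and the intended behaviour for manual
// output after the patch): the debug frame appears only when there is
// debug info and no unwind table to carry the same data.
FrameSections cfi_style(bool debug_info, bool unwind_tables)
{
  return { unwind_tables, debug_info && !unwind_tables };
}

// Manual output before the patch: -g always adds a debug frame, even when
// an EH frame is already being emitted.
FrameSections old_manual_style(bool debug_info, bool unwind_tables)
{
  return { unwind_tables, debug_info };
}
```

The two functions differ only for the "-g with unwind tables" row, which is the duplicated table the patch is meant to remove.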

(a) From testing on Darwin which uses the 'manual frame output' I see
around 200Mb saving on gcc/ for master (5%).
(b) Since Darwin defaults to unwind frames for all languages, we see
only eh_frame sections before the "real debug" is emitted, so that
the LLVM constraint is avoided.

On testing on x86_64 and powerpc64le Linux, I see only a single test
that would need amendment (it counts the number of references to the
start/end local labels).

Since the majority of targets are using .cfi instructions, it is hard
to get wider testing.

It would be possible, of course, to wrap the change in a target hook
but it's not clear that we need to.

Is there some case that I've missed?
Otherwise, OK for master?  (The testcase amendments are not attached
here, but they are simple.)

thanks,
Iain

Signed-off-by: Iain Sandoe 

gcc/ChangeLog:

* dwarf2out.c (output_call_frame_info): Output the FDEs when
either EH or debug support is needed.
(dwarf2out_frame_finish): When either EH or debug support is
needed, call output_call_frame_info().
---
 gcc/dwarf2out.c | 15 ++-
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index e1d6a79ecd7..96307d6747a 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -283,7 +283,7 @@ static GTY(()) dw_die_ref decltype_auto_die;
 
 /* Forward declarations for functions defined in this file.  */
 
-static void output_call_frame_info (int);
+static void output_call_frame_info (bool, bool);
 
 /* Personality decl of current unit.  Used only when assembler does not support
personality CFI.  */
@@ -750,7 +750,7 @@ fde_needed_for_eh_p (dw_fde_ref fde)
location of saved registers.  */
 
 static void
-output_call_frame_info (int for_eh)
+output_call_frame_info (bool for_eh, bool for_debug)
 {
   unsigned int i;
   dw_fde_ref fde;
@@ -795,7 +795,7 @@ output_call_frame_info (int for_eh)
targetm.asm_out.emit_unwind_label (asm_out_file, fde->decl, 1, 1);
}
 
-  if (!any_eh_needed)
+  if (!any_eh_needed && !for_debug)
return;
 }
 
@@ -1271,12 +1271,9 @@ void
 dwarf2out_frame_finish (void)
 {
   /* Output call frame information.  */
-  if (targetm.debug_unwind_info () == UI_DWARF2)
-output_call_frame_info (0);
-
-  /* Output another copy for the unwinder.  */
-  if (do_eh_frame)
-output_call_frame_info (1);
+  if (targetm.debug_unwind_info () == UI_DWARF2 || do_eh_frame)
+output_call_frame_info (do_eh_frame,
+   targetm.debug_unwind_info () == UI_DWARF2);
 }
 
 static void var_location_switch_text_section (void);
-- 
2.24.3 (Apple Git-128)



[committed] libstdc++: Fix std::type_info::before for ARM [PR103240]

2021-11-17 Thread Jonathan Wakely via Gcc-patches
Tested powerpc64le-linux, and briefly checked on armv7hl-linux-gnueabi,
pushed to trunk.


The r179236 fix for std::type_info::operator== should also have been
applied to std::type_info::before. Otherwise two distinct types can
compare equivalent due to using a string comparison, when they should do
a pointer comparison.
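
To illustrate the failure mode, here is a simplified stand-in for the non-merged-names code path. It is a sketch, not the libsupc++ source: the struct, the member `raw` (playing the role of __name) and the sample names are assumptions made for the example.

```cpp
#include <cstring>

// Simplified model of std::type_info on targets without merged typeinfo
// names.  A leading '*' in the stored name marks a type whose name must be
// compared by address, not by string contents.
struct TypeInfoModel {
  const char *raw;  // plays the role of __name

  // Like type_info::name(): skip the '*' marker if present.
  const char *name() const { return raw[0] == '*' ? raw + 1 : raw; }

  // Pre-fix logic: tests name()[0], which can never be the marker.
  bool before_old(const TypeInfoModel &arg) const {
    return (name()[0] == '*') ? name() < arg.name()
                              : std::strcmp(name(), arg.name()) < 0;
  }

  // Post-fix logic: tests the unadjusted name.
  bool before_new(const TypeInfoModel &arg) const {
    return (raw[0] == '*') ? name() < arg.name()
                           : std::strcmp(name(), arg.name()) < 0;
  }
};

// Two distinct "types" with identical spelling, stored in one array so that
// comparing their addresses with '<' is well defined in this sketch.
static const char storage[] = "*1S\0*1S";
static const TypeInfoModel local_S = { storage };
static const TypeInfoModel private_S = { storage + 4 };
```

With the old logic neither object is ordered before the other, so they look equivalent; with the new logic exactly one ordering holds, which is what the new test checks for the real std::type_info.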

libstdc++-v3/ChangeLog:

PR libstdc++/103240
* libsupc++/tinfo2.cc (type_info::before): Use unadjusted name
to check for the '*' prefix.
* testsuite/util/testsuite_shared.cc: Add type_info object for
use in new test.
* testsuite/18_support/type_info/103240.cc: New test.
---
 libstdc++-v3/libsupc++/tinfo2.cc  |  5 ++-
 .../testsuite/18_support/type_info/103240.cc  | 36 +++
 .../testsuite/util/testsuite_shared.cc| 12 +++
 3 files changed, 52 insertions(+), 1 deletion(-)
 create mode 100644 libstdc++-v3/testsuite/18_support/type_info/103240.cc

diff --git a/libstdc++-v3/libsupc++/tinfo2.cc b/libstdc++-v3/libsupc++/tinfo2.cc
index b587cfd037b..d02021fe538 100644
--- a/libstdc++-v3/libsupc++/tinfo2.cc
+++ b/libstdc++-v3/libsupc++/tinfo2.cc
@@ -36,7 +36,10 @@ type_info::before (const type_info &arg) const 
_GLIBCXX_NOEXCEPT
 #if __GXX_MERGED_TYPEINFO_NAMES
   return name () < arg.name ();
 #else
-  return (name ()[0] == '*') ? name () < arg.name ()
+  /* The name() method will strip any leading '*' prefix. Therefore
+ take care to look at __name rather than name() when looking for
+ the "pointer" prefix.  */
+  return (__name[0] == '*') ? name () < arg.name ()
 :  __builtin_strcmp (name (), arg.name ()) < 0;
 #endif
 }
diff --git a/libstdc++-v3/testsuite/18_support/type_info/103240.cc 
b/libstdc++-v3/testsuite/18_support/type_info/103240.cc
new file mode 100644
index 000..3d5968ac25c
--- /dev/null
+++ b/libstdc++-v3/testsuite/18_support/type_info/103240.cc
@@ -0,0 +1,36 @@
+// { dg-do run }
+// { dg-require-sharedlib "" }
+// { dg-options "./testsuite_shared.so" }
+
+#include 
+#include 
+
+namespace __gnu_test
+{
+namespace
+{
+  struct S { };
+  struct T { };
+}
+
+// Defined in testsuite_shared.so, referring to private type in that library
+// with the same mangled name as __gnu_testS defined here.
+extern const std::type_info& pr103240_private_S;
+}
+
+const std::type_info& private_S = __gnu_test::pr103240_private_S;
+const std::type_info& local_S = typeid(__gnu_test::S);
+const std::type_info& local_T = typeid(__gnu_test::T);
+
+int main()
+{
+  VERIFY( local_S == local_S );
+  VERIFY( ! local_S.before(local_S) );
+
+  VERIFY( local_S != local_T );
+  VERIFY( local_S.before(local_T) || local_T.before(local_S) );
+
+  VERIFY( local_S != private_S );
+  // PR libstdc++/103240
+  VERIFY( local_S.before(private_S) || private_S.before(local_S) );
+}
diff --git a/libstdc++-v3/testsuite/util/testsuite_shared.cc 
b/libstdc++-v3/testsuite/util/testsuite_shared.cc
index c4a7ed4abe5..8c10534c511 100644
--- a/libstdc++-v3/testsuite/util/testsuite_shared.cc
+++ b/libstdc++-v3/testsuite/util/testsuite_shared.cc
@@ -23,6 +23,9 @@
 #include 
 #include 
 #include 
+#if __cpp_rtti
+# include 
+#endif
 
 namespace __gnu_test
 {
@@ -130,4 +133,13 @@ try_function_random_fail()
   }
 #endif
 
+#if __cpp_rtti
+// PR libstdc++/103240
+namespace
+{
+  struct S { };
+}
+const std::type_info& pr103240_private_S = typeid(S);
+#endif
+
 } // end namepace __gnu_test
-- 
2.31.1



[committed] libstdc++: Set active member of union in std::string [PR103295]

2021-11-17 Thread Jonathan Wakely via Gcc-patches
Tested powerpc64le-linux, pushed to trunk.


Clang diagnoses that the new constexpr std::string constructors are not
usable in constant expressions, because they start to write to members
of the union without setting an active member.

This adds a new helper function which returns the address of the local
buffer after making it the active member.

This doesn't fix all problems with Clang, because it still refuses to
write to memory returned by the allocator.

libstdc++-v3/ChangeLog:

PR libstdc++/103295
* include/bits/basic_string.h (_M_use_local_data()): New
member function to make local buffer the active member.
(assign(const basic_string&)): Use it.
* include/bits/basic_string.tcc (_M_construct, reserve()):
Likewise.
---
 libstdc++-v3/include/bits/basic_string.h   | 15 ++-
 libstdc++-v3/include/bits/basic_string.tcc | 10 --
 2 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/libstdc++-v3/include/bits/basic_string.h 
b/libstdc++-v3/include/bits/basic_string.h
index 0b7d6c0a981..9d281f5daf2 100644
--- a/libstdc++-v3/include/bits/basic_string.h
+++ b/libstdc++-v3/include/bits/basic_string.h
@@ -325,6 +325,19 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
   _M_get_allocator() const
   { return _M_dataplus; }
 
+  // Ensure that _M_local_buf is the active member of the union.
+  __attribute__((__always_inline__))
+  _GLIBCXX14_CONSTEXPR
+  pointer
+  _M_use_local_data() _GLIBCXX_NOEXCEPT
+  {
+#if __cpp_lib_is_constant_evaluated
+   if (__builtin_is_constant_evaluated())
+ _M_local_buf[0] = _CharT();
+#endif
+   return _M_local_data();
+  }
+
 private:
 
 #ifdef _GLIBCXX_DISAMBIGUATE_REPLACE_INST
@@ -1487,7 +1500,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
if (__str.size() <= _S_local_capacity)
  {
_M_destroy(_M_allocated_capacity);
-   _M_data(_M_local_data());
+   _M_data(_M_use_local_data());
_M_set_length(0);
  }
else
diff --git a/libstdc++-v3/include/bits/basic_string.tcc 
b/libstdc++-v3/include/bits/basic_string.tcc
index 5743770b42a..5a51f7e21b5 100644
--- a/libstdc++-v3/include/bits/basic_string.tcc
+++ b/libstdc++-v3/include/bits/basic_string.tcc
@@ -170,9 +170,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
size_type __len = 0;
size_type __capacity = size_type(_S_local_capacity);
 
+   pointer __p = _M_use_local_data();
+
while (__beg != __end && __len < __capacity)
  {
-   _M_data()[__len++] = *__beg;
+   __p[__len++] = *__beg;
++__beg;
  }
 
@@ -223,6 +225,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
_M_data(_M_create(__dnew, size_type(0)));
_M_capacity(__dnew);
  }
+   else
+ _M_use_local_data();
 
// Check for out_of_range and length_error exceptions.
__try
@@ -247,6 +251,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  _M_data(_M_create(__n, size_type(0)));
  _M_capacity(__n);
}
+  else
+   _M_use_local_data();
 
   if (__n)
this->_S_assign(_M_data(), __n, __c);
@@ -355,7 +361,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   if (__length <= size_type(_S_local_capacity))
{
- this->_S_copy(_M_local_data(), _M_data(), __length + 1);
+ this->_S_copy(_M_use_local_data(), _M_data(), __length + 1);
  _M_destroy(__capacity);
  _M_data(_M_local_data());
}
-- 
2.31.1



[committed] libstdc++: Simplify std::string constructors

2021-11-17 Thread Jonathan Wakely via Gcc-patches
Tested powerpc64le-linux, pushed to trunk.


Several std::basic_string constructors dispatch to one of the
two-argument overloads of _M_construct, which then dispatches again to
_M_construct_aux to detect whether the arguments are iterators or not.
That then dispatches to one of _M_construct(size_type, char_type) or
_M_construct(Iter, Iter, iterator_traits::iterator_category{}).

For most of those constructors this is a waste of time, because we know
the arguments are already iterators. For basic_string(const CharT*) and
basic_string(initializer_list) we know that we call _M_construct with
two pointers, and for basic_string(const basic_string&) we call it with
two const_iterators.  Those constructors can call the three-argument
overload of _M_construct with the iterator category tag right away,
without the intermediate dispatching.

The case where this doesn't apply is basic_string(InputIter, InputIter),
but for C++11 and later this is constrained so we know it's an iterator
here as well. We can restrict the dispatching in this constructor to
only be done for C++98 and to call _M_construct_aux directly, which
allows us to remove the two-argument _M_construct(InputIter, InputIter)
overload entirely.

N.B. When calling the three-arg _M_construct with pointers or string
iterators, we pass forward_iterator_tag not random_access_iterator_tag.
This is because it makes no difference which overload gets called, and
simplifies overload resolution to not have to do a base-to-derived
check. If we ever add a new overload of _M_construct for random access
iterators we would have to revisit this, but that seems unlikely.
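
As a rough illustration of the dispatching described above, here is a sketch using a hypothetical Builder type (not std::basic_string). It contrasts looking the iterator category up through iterator_traits with naming the tag directly when the caller already knows it holds pointers:

```cpp
#include <cstddef>
#include <iterator>

// Hypothetical container: counts elements and records which overload ran.
struct Builder {
  std::size_t length = 0;
  bool used_forward = false;

  // Single-pass iterators: count one element at a time.
  template <typename It>
  void construct(It first, It last, std::input_iterator_tag) {
    for (; first != last; ++first)
      ++length;
  }

  // Multi-pass iterators: the range can be measured up front.
  template <typename It>
  void construct(It first, It last, std::forward_iterator_tag) {
    length = static_cast<std::size_t>(std::distance(first, last));
    used_forward = true;
  }

  // Generic entry point: look the category up, then dispatch on it.
  template <typename It>
  void construct(It first, It last) {
    typedef typename std::iterator_traits<It>::iterator_category Tag;
    construct(first, last, Tag());
  }

  // The caller knows it has pointers, so it names the tag directly and
  // skips the traits lookup.
  void construct_from_cstr(const char *s, std::size_t n) {
    construct(s, s + n, std::forward_iterator_tag());
  }
};
```

Passing forward_iterator_tag for pointers selects the same overload that the traits-based path would pick for a random access iterator, since there is no more specific overload to prefer.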

This patch also moves the __is_null_pointer checks from the three-arg
_M_construct into the constructors where a null pointer argument is
actually possible. This avoids redundant checks where we know we have a
non-null pointer, or don't have a pointer at all.

Finally, this patch replaces some try-blocks with an RAII type, so that
memory is deallocated during unwinding. This avoids the overhead of
catching and rethrowing an exception.
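
The try-block replacement can be sketched as follows. The guard type, the counter and the helper function are hypothetical names for this example and are not the ones used in basic_string.tcc:

```cpp
#include <cstddef>
#include <stdexcept>

static int live_buffers = 0;  // instrumentation for the sketch only

// Hypothetical guard: frees the buffer in its destructor unless release()
// was called, so cleanup happens during unwinding without catch/rethrow.
struct BufferGuard {
  char *ptr;

  explicit BufferGuard(std::size_t n) : ptr(new char[n]) { ++live_buffers; }
  ~BufferGuard() {
    if (ptr) {
      delete[] ptr;
      --live_buffers;
    }
  }

  // Hand ownership to the caller and disarm the destructor.
  char *release() {
    char *p = ptr;
    ptr = 0;
    return p;
  }

  BufferGuard(const BufferGuard &) = delete;
  BufferGuard &operator=(const BufferGuard &) = delete;
};

// Fills a buffer with n copies of c.  When `fail` is set it throws midway
// (standing in for a throwing iterator) and the guard frees the buffer.
char *make_filled(std::size_t n, char c, bool fail) {
  BufferGuard guard(n + 1);
  for (std::size_t i = 0; i < n; ++i)
    guard.ptr[i] = c;
  if (fail)
    throw std::runtime_error("simulated failure");
  guard.ptr[n] = '\0';
  return guard.release();  // success: the caller now owns the buffer
}
```

On the failing path no handler is needed inside make_filled; the exception simply propagates and the destructor runs as the stack unwinds.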

libstdc++-v3/ChangeLog:

* include/bits/basic_string.h (_M_construct_aux): Only define
for C++98. Remove constexpr.
(_M_construct_aux_2): Likewise.
(_M_construct(InputIter, InputIter)): Remove.
(basic_string(const basic_string&)): Call _M_construct with
iterator category argument.
(basic_string(const basic_string&, size_type, const Alloc&)):
Likewise.
(basic_string(const basic_string&, size_type, size_type)):
Likewise.
(basic_string(const charT*, size_type, const Alloc&)): Likewise.
Check for null pointer.
(basic_string(const charT*, const Alloc&)): Likewise.
(basic_string(initializer_list, const Alloc&)): Call
_M_construct with iterator category argument.
(basic_string(const basic_string&, const Alloc&)): Likewise.
(basic_string(basic_string&&, const Alloc&)): Likewise.
(basic_string(_InputIter, _InputIter, const Alloc&)): Likewise
for C++11 and later, call _M_construct_aux for C++98.
* include/bits/basic_string.tcc
(_M_construct(I, I, input_iterator_tag)): Replace try-block with
RAII type.
(_M_construct(I, I, forward_iterator_tag)): Likewise. Remove
__is_null_pointer check.
---
 libstdc++-v3/include/bits/basic_string.h   | 61 +++
 libstdc++-v3/include/bits/basic_string.tcc | 69 --
 2 files changed, 74 insertions(+), 56 deletions(-)

diff --git a/libstdc++-v3/include/bits/basic_string.h 
b/libstdc++-v3/include/bits/basic_string.h
index 9d281f5daf2..d29c9cdc410 100644
--- a/libstdc++-v3/include/bits/basic_string.h
+++ b/libstdc++-v3/include/bits/basic_string.h
@@ -262,10 +262,10 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
   _M_destroy(size_type __size) throw()
   { _Alloc_traits::deallocate(_M_get_allocator(), _M_data(), __size + 1); }
 
+#if __cplusplus < 201103L || defined _GLIBCXX_DEFINING_STRING_INSTANTIATIONS
   // _M_construct_aux is used to implement the 21.3.1 para 15 which
   // requires special behaviour if _InIterator is an integral type
   template
-   _GLIBCXX20_CONSTEXPR
 void
 _M_construct_aux(_InIterator __beg, _InIterator __end,
 std::__false_type)
@@ -277,24 +277,14 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
   // _GLIBCXX_RESOLVE_LIB_DEFECTS
   // 438. Ambiguity in the "do the right thing" clause
   template
-   _GLIBCXX20_CONSTEXPR
 void
 _M_construct_aux(_Integer __beg, _Integer __end, std::__true_type)
{ _M_construct_aux_2(static_cast(__beg), __end); }
 
-  _GLIBCXX20_CONSTEXPR
   void
   _M_construct_aux_2(size_type __req, _CharT __c)
   { _M_construct(__req, __c); }
-
-  template
-   _GLIBCXX20_CONSTEXPR
-void
-_M_construct(_InIterator __beg, _InIterator __end)
-   {
- typedef typename std

[committed] libstdc++: Use std::construct_at in net::ip::address

2021-11-17 Thread Jonathan Wakely via Gcc-patches
Tested powerpc64le-linux, pushed to trunk.


Using placement-new isn't valid in constant expressions, so this
replaces it with std::construct_at (via the std::_Construct function
that is usable before C++20).

libstdc++-v3/ChangeLog:

* include/experimental/internet (address): Use std::_Construct
to initialize union members.
---
 libstdc++-v3/include/experimental/internet | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/libstdc++-v3/include/experimental/internet 
b/libstdc++-v3/include/experimental/internet
index 95b8cdc9963..5e2ef00c16f 100644
--- a/libstdc++-v3/include/experimental/internet
+++ b/libstdc++-v3/include/experimental/internet
@@ -466,9 +466,9 @@ namespace ip
 address(const address& __a) noexcept : _M_uninit(), _M_is_v4(__a._M_is_v4)
 {
   if (_M_is_v4)
-   ::new (std::addressof(_M_v4)) address_v4(__a.to_v4());
+   std::_Construct(std::addressof(_M_v4), __a.to_v4());
   else
-   ::new (std::addressof(_M_v6)) address_v6(__a.to_v6());
+   std::_Construct(std::addressof(_M_v6), __a.to_v6());
 }
 
 constexpr
@@ -491,7 +491,7 @@ namespace ip
 address&
 operator=(const address_v4& __a) noexcept
 {
-  ::new (std::addressof(_M_v4)) address_v4(__a);
+  std::_Construct(std::addressof(_M_v4), __a);
   _M_is_v4 = true;
   return *this;
 }
@@ -499,7 +499,7 @@ namespace ip
 address&
 operator=(const address_v6& __a) noexcept
 {
-  ::new (std::addressof(_M_v6)) address_v6(__a);
+  std::_Construct(std::addressof(_M_v6), __a);
   _M_is_v4 = false;
   return *this;
 }
-- 
2.31.1



Re: [PATCH] rs6000: Better error messages for power8/9-vector builtins

2021-11-17 Thread Paul A. Clarke via Gcc-patches
On Wed, Nov 17, 2021 at 11:00:07AM -0600, Bill Schmidt via Gcc-patches wrote:
> On 11/17/21 10:54 AM, Paul A. Clarke wrote:
> > On Tue, Nov 16, 2021 at 11:12:35AM -0600, Bill Schmidt via Gcc-patches 
> > wrote:
> >> Hi!  During a previous patch review, Segher asked that I provide better
> >> messages when builtins are unavailable because they require both a minimum
> >> CPU and the enablement of VSX instructions.  This patch does just that.
> > ...
> >> gcc/
> >>* config/rs6000/rs6000-call.c (rs6000_invalid_new_builtin): Change
> >>error messages for ENB_P8V and ENB_P9V.
> >> ---
> >>  gcc/config/rs6000/rs6000-call.c | 6 --
> >>  1 file changed, 4 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/gcc/config/rs6000/rs6000-call.c 
> >> b/gcc/config/rs6000/rs6000-call.c
> >> index 85fec80c6d7..035266eb001 100644
> >> --- a/gcc/config/rs6000/rs6000-call.c
> >> +++ b/gcc/config/rs6000/rs6000-call.c
> >> @@ -11943,7 +11943,8 @@ rs6000_invalid_new_builtin (enum 
> >> rs6000_gen_builtins fncode)
> >>error ("%qs requires the %qs option", name, "-mcpu=power8");
> >>break;
> >>  case ENB_P8V:
> >> -  error ("%qs requires the %qs option", name, "-mpower8-vector");
> >> +  error ("%qs requires the %qs and %qs options", name, "-mcpu=power8",
> >> +   "-mvsx");
> > "-mcpu=power8" itself enables "-mvsx", doesn't it?
> 
> Of course, but it can be disabled with -mno-vsx.  Then you get this error.
> You won't get it unless you deliberately did something strange with the
> compile options.
> 
> >
> >>break;
> >>  case ENB_P9:
> >>error ("%qs requires the %qs option", name, "-mcpu=power9");
> >> @@ -11953,7 +11954,8 @@ rs6000_invalid_new_builtin (enum 
> >> rs6000_gen_builtins fncode)
> >> name, "-mcpu=power9", "-m64", "-mpowerpc64");
> >>break;
> >>  case ENB_P9V:
> >> -  error ("%qs requires the %qs option", name, "-mpower9-vector");
> >> +  error ("%qs requires the %qs and %qs options", name, "-mcpu=power9",
> >> +   "-mvsx");
> > Similarly, "-mcpu=power9" itself enables "-mvsx", doesn't it?
> >
> > Are you trying to also say "don't use -mno-vsx"?  If so, maybe s/and/with/
> > would be slightly less confusing? This is going to be awkward unless it can
> > be more precise, like two messages depending on actual context:
> > - with "-mcpu=power8 -mno-vsx:  "...requires -mvsx".
> > - without "-mcpu=power8":  "...requires -mcpu=power8".
> 
> This seems like a YMMV situation...I don't see the confusion myself.

I guess I'm being pedantic.  "requires -mcpu=power8 and -mvsx" is not
accurate from a user's point of view, as "-mcpu=power8" is sufficient,
since "-mvsx" is enabled when "-mcpu=power8" is specified.

The real "requires" is "-mcpu=power8" and no "-mno-vsx".

(I'm just picturing myself fumbling around in a Makefile written by
somebody else. ;-)

It's not a strong objection, since specifying "-mno-vsx" should be
uncommon.  (Right?)  And, specifying "-mcpu=power8 -mvsx" is harmless.

PC


Re: [RFC] c++: Print function template parms when relevant (was: [PATCH v4] c++: Add gnu::diagnose_as attribute)

2021-11-17 Thread Jason Merrill via Gcc-patches

On 11/17/21 04:04, Matthias Kretz wrote:

On Wednesday, 17 November 2021 07:09:18 CET Jason Merrill wrote:

-  if (CHECKING_P)
-SET_NON_DEFAULT_TEMPLATE_ARGS_COUNT (a, TREE_VEC_LENGTH (a));
+  SET_NON_DEFAULT_TEMPLATE_ARGS_COUNT (a, nondefault);


should have been

if (CHECKING_P || nondefault != TREE_VEC_LENGTH (a))
SET_NON_DEFAULT_TEMPLATE_ARGS_COUNT (a, nondefault);


TBH, I don't understand the purpose of CHECKING_P here, or rather it makes me
nervous because AFAIU I'm only testing with CHECKING_P enabled. Why make
behavior dependent on CHECKING_P? I expected CHECKING_P to basically only add
more assertions.


The idea when NON_DEFAULT_TEMPLATE_ARGS_COUNT was added years back was 
to leave the TREE_CHAIN null when !CHECKING_P and treat that as 
equivalent to TREE_VEC_LENGTH (args).  But perhaps you're right that 
it's not a savings worth the complexity.
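
The "null means all arguments" convention can be sketched like this. ArgVec and its members are hypothetical stand-ins for a TREE_VEC and its TREE_CHAIN, not GCC's tree API:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a TREE_VEC of template arguments.
struct ArgVec {
  std::vector<int> args;
  const std::size_t *non_default_count;  // plays the role of TREE_CHAIN

  // A null count is read as "every argument is non-default", i.e. the
  // same as the vector length.
  std::size_t non_default() const {
    return non_default_count ? *non_default_count : args.size();
  }
};
```

Under this reading, storing the count only when it differs from the length gives the same answers as always storing it.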



(copy_template_args): Jason?


Only copy the non-default template args count on TREE_VECs that should
have it.


Why not simply set the count on all args? Is it a performance concern? The
INTEGER_CST the TREE_CHAIN has to point to exists anyway, so it's not wasting
any memory, right?


In this case the TREE_VEC we're excluding is the one wrapping multiple 
levels of template args; it doesn't contain args directly, so setting 
NON_DEFAULT_ARGS_COUNT on it doesn't make sense.



+  /* Pretty print only template instantiations. Don't pretty print
explicit
+ specializations like 'template <> void fun (int)'.


This seems like a significant change of behavior unrelated to printing
default template arguments.  What's the rationale for handling
specializations differently from instantiations?


Right, this is about "The general idea of this change is to print template
parms wherever they would appear in the source code as well".

Initially, the change to print function template arguments/parameters only if
the args were explicitly specified led to printing 'void fun (T) [with T =
...]' or 'template <> void fun (int)'. Both are not telling the full story,
even if the former is how the function would be called.


and the latter is how I expect the specialization to be declared, not 
with the deducible template argument made explicit.



But if the reader
should quickly recognize what code is getting called, it is helpful to see
right away that a function template specialization is called. (It might also
reveal an implementation detail of a library, so it's not 100% obvious how to
choose here.) Also, saying 'T = int' is kind of wrong. Yes, 'int' was deduced.
But there's no T in fun:

template <class T> void fun (T);
template <> void fun (int);


There's a T in the template, and as you said above, that's how it's 
called (and mangled).



__FUNCTION__ was 'fun' all the time, but __PRETTY_FUNCTION__ was 'void
fun(T) [with T = int]'.


Isn't that true for instantiations, as well?


It's more consistent that __PRETTY_FUNCTION__ contains __FUNCTION__, IMHO


I suppose, but I don't see that as a strong enough motivation to mix 
this up.



so it would have to be at least 'void fun(T) [with T
= int]'. But that's strange: How it uses T and int for the same type. So I
settled on 'void fun(int)'.


I also don't understand the purpose of TFF_AS_PRIMARY.


dump_function_decl generalizes the TEMPLATE_DECL (if flag_pretty_templates is
true) and, before this change, passes the generalized TEMPLATE_DECL to
dump_type (... DECL_CONTEXT (t) ...) and dump_function_name (... t ...).
That's why the whole template is printed as primary template (i.e. with
template parms instead of template args, as is needed for
flag_pretty_templates). But this drops the count of non-default template args.


Ah, you're trying to omit defaulted parms from the ?  I'm not sure 
that's necessary, leaving them out of the [with ...] list should be 
sufficient.



To retain the count, dump_type and dump_function_name need to be called with
the original TEMPLATE_DECL. But if I do this, pretty-templates is broken.
'template  struct A { template  void f(T, U); };' would
print as 'A::f(T, U) [with U = float, T = int]'. To get back to
'A::f(T, U) [with U = float, T = int]' I needed to tell
dump_template_parms that even though the template args are there, it should
print only the template parms. The most obvious way to do that was to carry it
through via flags.

Note that this creates another problem. Given

template  struct Outer {
   template  struct A;
   template  struct A {
 void f();
   };
};

we want to print e.g. 'void Outer::A::f() [with X = int, T0 =
int]', but certainly not 'void Outer::A::f() [with X = int, T0 =
int]'. However, specialized_t holds A which is printed as A
with TFF_AS_PRIMARY. Only most_general_template of the function's
TEMPLATE_DECL can give us A as DECL_CONTEXT.

I have a solution in the diagnose_as patch, where I had to solve a similar
problem for the diagnose_as attribute (dump_template_scope).


+/* Print function template parameters if:
+   1. t is template, and

Re: [PATCH] restore ancient -Waddress for weak symbols [PR33925]

2021-11-17 Thread Jason Merrill via Gcc-patches

On 11/16/21 20:11, Martin Sebor wrote:

On 11/16/21 1:23 PM, Jason Merrill wrote:

On 10/23/21 19:06, Martin Sebor wrote:

On 10/4/21 3:37 PM, Jason Merrill wrote:

On 10/4/21 14:42, Martin Sebor wrote:

While resolving the recent -Waddress enhancement request (PR 102103)
I came across a 2007 problem report about GCC 4 having
stopped warning for using the address of inline functions in
equality comparisons with null.  With inline functions being
commonplace in C++ this seems like an important use case for
the warning.

The change that resulted in suppressing the warning in these
cases was introduced inadvertently in a fix for PR 22252.

To restore the warning, the attached patch enhances
the decl_with_nonnull_addr_p() function to return true also for
weak symbols for which a definition has been provided.
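
To make the distinction concrete, here is a small sketch of the two kinds of address test involved. The names optional_hook and helper are invented for illustration, and the sketch assumes an ELF target and a compiler that accepts the GNU weak attribute. Whether a particular GCC release warns on the second test is exactly what this thread is about, so treat the comments as intent rather than a guarantee.

```cpp
// Declared weak and never defined in this file: on ELF targets the
// symbol may be left unresolved, so its address can be null at run time
// and testing it is meaningful.
extern "C" void optional_hook(void) __attribute__((weak));

// An inline function always comes with a definition, so its address
// cannot be null even though the symbol is typically emitted as weak.
inline int helper() { return 42; }

// Meaningful test: no warning expected.
bool hook_available() { return optional_hook != nullptr; }

// Always true: the kind of comparison -Waddress is meant to flag.
bool helper_available() { return helper != nullptr; }
```

Compiling with -Waddress (part of -Wall) is the way to see which of the two comparisons a given compiler diagnoses.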


I think you probably want to merge this function with 
fold-const.c:maybe_nonzero_address, which already handles more cases.


maybe_nonzero_address() doesn't behave quite like
decl_with_nonnull_addr_p() expects and I'm reluctant to muck
around with the former too much since it's used for codegen,
while the latter just for warnings.  (There is even a case
where the functions don't behave the same, and would result
in different warnings between C and C++ without some extra
help.)

So in the attached revision I just have maybe_nonzero_address()
call decl_with_nonnull_addr_p() and then refine the failing
(or uncertain) cases separately, with some overlap between
them.

Since I worked on this someone complained that some instances
of the warning newly enhanced under PR102103 aren't suppressed
in code resulting from macro expansion.  Since it's trivial,
I include the fix for that report in this patch as well.



+   allocated stroage might have a null address.  */


typo.

OK with that fixed.


After retesting the patch before committing I noticed it triggers
a regression in weak/weak-3.c that I missed the first time around.
Here's the test case:

extern void * ffoo1f (void);
void * foo1f (void)
{
   if (ffoo1f) /* { dg-warning "-Waddress" } */
     ffoo1f ();
   return 0;
}

void * ffoox1f (void) { return (void *)0; }
extern void * ffoo1f (void)  __attribute__((weak, alias ("ffoox1f")));

The unexpected error is:

a.c: At top level:
a.c:1:15: error: ‘ffoo1f’ declared weak after being used
    1 | extern void * ffoo1f (void);
      |               ^~~~~~

The error is caused by the new call to maybe_nonzero_address()
made from decl_with_nonnull_addr_p().  The call registers
the symbol as used.

So unless the error is desirable for this case I think it's
best to go back to the originally proposed solution.  I attach
it for reference and will plan to commit it tomorrow unless I
hear otherwise.


Hmm, the error seems correct to me: we tested whether the address is 
nonzero in the dg-warning line, and presumably evaluating that test 
could depend on the absence of weak.



PS I don't know enough about the logic behind issuing this error
in other situations to tell for sure that it's wrong in this one,
but I see no difference in the emitted code between this case and
another case in the same test that declares the alias first, before
taking its address, and that one is accepted.  I also checked that
both Clang and ICC accept the code either way, so I'm inclined to
think the error would be a bug.




Re: [PATCH v3] c-family: Add __builtin_assoc_barrier

2021-11-17 Thread Jason Merrill via Gcc-patches

On 11/11/21 03:49, Matthias Kretz wrote:

On Wednesday, 8 September 2021 15:49:27 CET Matthias Kretz wrote:

On Wednesday, 8 September 2021 15:44:28 CEST Jason Merrill wrote:

On 9/8/21 5:37 AM, Matthias Kretz wrote:

On Tuesday, 7 September 2021 19:36:22 CEST Jason Merrill wrote:

case PAREN_EXPR:
-  RETURN (finish_parenthesized_expr (RECUR (TREE_OPERAND (t, 0))));
+  if (REF_PARENTHESIZED_P (t))
+   RETURN (finish_parenthesized_expr (RECUR (TREE_OPERAND (t, 0))));
+  else
+   RETURN (RECUR (TREE_OPERAND (t, 0)));


I think you need to build a new PAREN_EXPR in the assoc barrier case as
well, for it to have any effect in templates.


My intent was to ignore __builtin_assoc_barrier in templates / constexpr
evaluation since it's not affected by -fassociative-math anyway. Or do
you mean something else?


I agree about constexpr, but why wouldn't template instantiations be
affected by -fassociative-math like any other function?


Oh, that seems like a major misunderstanding on my part. I assumed
tsubst_copy_and_build would evaluate the expressions in template arguments
🤦. I'll expand the test and will fix.


Sorry for the long delay. New patch is attached. OK for trunk?


OK.


New builtin to enable explicit use of PAREN_EXPR in C & C++ code.
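
As a usage illustration only, here is a minimal sketch of how the builtin might be called from user code. The ASSOC_BARRIER macro and the function add_left_first are invented for this example. The guard relies on __has_builtin so that compilers without the builtin fall back to plain parentheses, which give no protection against reassociation under -fassociative-math.

```cpp
// Use the builtin where the compiler advertises it; otherwise fall back
// to ordinary parentheses (hypothetical wrapper macro, not part of GCC).
#if defined(__has_builtin)
#  if __has_builtin(__builtin_assoc_barrier)
#    define ASSOC_BARRIER(x) __builtin_assoc_barrier(x)
#  endif
#endif
#ifndef ASSOC_BARRIER
#  define ASSOC_BARRIER(x) (x)
#endif

// Requests that a + b be computed as a unit before c is added, so the
// sum is not regrouped as a + (b + c) when reassociation is enabled.
double add_left_first(double a, double b, double c) {
  return ASSOC_BARRIER(a + b) + c;
}
```

The difference only becomes observable with options such as -ffast-math or -fassociative-math; without them the compiler already keeps the written evaluation order.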

Signed-off-by: Matthias Kretz 

gcc/testsuite/ChangeLog:

 * c-c++-common/builtin-assoc-barrier-1.c: New test.

gcc/cp/ChangeLog:

 * constexpr.c (cxx_eval_constant_expression): Handle PAREN_EXPR
 via cxx_eval_constant_expression.
 * cp-objcp-common.c (names_builtin_p): Handle
 RID_BUILTIN_ASSOC_BARRIER.
 * cp-tree.h: Adjust TREE_LANG_FLAG documentation to include
 PAREN_EXPR in REF_PARENTHESIZED_P.
 (REF_PARENTHESIZED_P): Add PAREN_EXPR.
 * parser.c (cp_parser_postfix_expression): Handle
 RID_BUILTIN_ASSOC_BARRIER.
 * pt.c (tsubst_copy_and_build): If the PAREN_EXPR is not a
 parenthesized initializer, build a new PAREN_EXPR.
 * semantics.c (force_paren_expr): Simplify conditionals. Set
 REF_PARENTHESIZED_P on PAREN_EXPR.
 (maybe_undo_parenthesized_ref): Test PAREN_EXPR for
 REF_PARENTHESIZED_P.

gcc/c-family/ChangeLog:

 * c-common.c (c_common_reswords): Add __builtin_assoc_barrier.
 * c-common.h (enum rid): Add RID_BUILTIN_ASSOC_BARRIER.

gcc/c/ChangeLog:

 * c-decl.c (names_builtin_p): Handle RID_BUILTIN_ASSOC_BARRIER.
 * c-parser.c (c_parser_postfix_expression): Likewise.

gcc/ChangeLog:

 * doc/extend.texi: Document __builtin_assoc_barrier.
---
  gcc/c-family/c-common.c   |  1 +
  gcc/c-family/c-common.h   |  2 +-
  gcc/c/c-decl.c|  1 +
  gcc/c/c-parser.c  | 20 ++
  gcc/cp/constexpr.c|  8 +++
  gcc/cp/cp-objcp-common.c  |  1 +
  gcc/cp/cp-tree.h  | 12 ++--
  gcc/cp/parser.c   | 14 
  gcc/cp/pt.c   | 10 ++-
  gcc/cp/semantics.c| 23 ++
  gcc/doc/extend.texi   | 18 +
  .../c-c++-common/builtin-assoc-barrier-1.c| 71 +++
  12 files changed, 158 insertions(+), 23 deletions(-)
  create mode 100644 gcc/testsuite/c-c++-common/builtin-assoc-barrier-1.c






Re: [PATCH] c++: implicit dummy object in requires clause [PR103198]

2021-11-17 Thread Jason Merrill via Gcc-patches

On 11/11/21 20:25, Patrick Palka wrote:

In the testcase below satisfaction misbehaves for f and g ultimately
because find_template_parameters fails to notice that the constraint
'val.x' depends on the template parameters of the class template.
In contrast, satisfaction works just fine for h.

The problem seems to come down to a difference in how any_template_parm_r
handles 'this' vs a dummy object: we walk TREE_TYPE of the former but
not the latter, and this causes us to miss the tparm dependencies in
f/g's constraints since in their case the implicit object parameter
through which we access 'val' is a dummy object.  (For h, since we know
it's a non-static member function when parsing its trailing constraints,
the implicit object parameter is 'this' instead of a dummy object.)

This patch fixes this inconsistency by making any_template_parm_r also
walk into the TREE_TYPE of a dummy object, as is already done for
'this'.

Bootstrapped and regtested on x86_64-pc-linux-gnu, also tested on
cmcstl2 and range-v3, does this look OK for trunk and 11?

PR c++/103198

gcc/cp/ChangeLog:

* pt.c (any_template_parm_r): Walk the TREE_TYPE of a dummy
object.


Should we handle CONVERT_EXPR with the various casts in cp_walk_subtrees?


gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-this1.C: New test.
---
  gcc/cp/pt.c |  5 
  gcc/testsuite/g++.dg/cpp2a/concepts-this1.C | 30 +
  2 files changed, 35 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-this1.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 82bf7dc26f6..fa55857d783 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -10766,6 +10766,11 @@ any_template_parm_r (tree t, void *data)
WALK_SUBTREE (TREE_TYPE (t));
break;
  
+case CONVERT_EXPR:

+  if (is_dummy_object (t))
+   WALK_SUBTREE (TREE_TYPE (t));
+  break;
+
  default:
break;
  }
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-this1.C 
b/gcc/testsuite/g++.dg/cpp2a/concepts-this1.C
new file mode 100644
index 000..d717028201a
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-this1.C
@@ -0,0 +1,30 @@
+// PR c++/103198
+// { dg-do compile { target c++20 } }
+
+template<class T>
+struct A {
+  T val;
+
+  template<class U>
+    requires requires { val.x; }
+  void f(U);
+
+  static void g(int)
+    requires requires { val.x; };
+
+  void h(int)
+    requires requires { val.x; };
+};
+
+struct B { int x; };
+struct C { };
+
+int main() {
+  A<B>().f(0);
+  A<B>().g(0);
+  A<B>().h(0);
+
+  A<C>().f(0); // { dg-error "no match" }
+  A<C>().g(0); // { dg-error "no match" }
+  A<C>().h(0); // { dg-error "no match" }
+}





Re: [PATCH v2] rs6000: Test case adjustments for new builtins

2021-11-17 Thread Segher Boessenkool
On Wed, Nov 17, 2021 at 07:52:38AM -0600, Bill Schmidt wrote:
> >>  - For int_128bit-runnable.c, I chose not to do gimple folding on the 128-bit
> >>    comparison operations in the new implementation, because doing so results in
> >>    bad code that splits things into two 64-bit values.  That needs separate
> >>    attention; but the point here is, when I did that, I started generating
> >>    more of the vcmpequq, vcmpgtsq, and vcmpgtuq instructions.
> > And you now get worse code (albeit in some cases no longer invalid)?
> 
> No, sorry that this wasn't more clear.  The "old" builtins code performs
> gimple folding on 128-bit compares.  This results in correct but very
> inefficient code.  The "new" builtins code has removed the gimple folding
> for 128-bit compares.  This results in directly generating vcmpequq and
> friends, which is the efficient code we're looking for.  This test case
> then needs modification to show we're doing better.  I'll submit this
> separately.

Hrm.  Folding should always be a good thing to do; and folding should
never split an operation on a 128-bit datum into two operations on
64-bit things.  That kind of optimisation cannot be sanely done on
Gimple level: the abstractions are not close enough to the hardware for
that, and the instruction stream is not close at all to what the
eventual machine insns will be.  We have an RTL pass that does this
("subreg"), it runs almost immediately after expand (and two more
times, even again after the split pass).

So there is a generic bug that you counteract with a target bug :-(

> >> --- a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-exp-2.c
> >> +++ b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-exp-2.c
> >> @@ -14,7 +14,7 @@ get_exponent (double *p)
> >>  {
> >>double source = *p;
> >>  
> >> -  return scalar_extract_exp (source); /* { dg-error "'__builtin_vec_scalar_extract_exp' is not supported in this compiler configuration" } */
> >> +  return scalar_extract_exp (source); /* { dg-error "'__builtin_vsx_scalar_extract_exp' requires the" } */
> >>  }
> > The testcase uses __builtin_vec_scalar_extract_exp, so this is not okay.
> 
> Sorry, this is a case of my bad eyesight not identifying this had changed.
> As with the test case (cmpb-3.c) in the 32-bit patch, this error message
> isn't all that the user sees.  There is also a "note" diagnostic that ties
> the generic overload name to the specific underlying builtin name so that
> confusion is avoided.  I'll just submit these separately with a full
> explanation.

Can't you go just two inches further and report the actual builtin used
by the user (which even is documented!), and not cause any confusion?

> > It is not okay to blindly adjust the testcases to accept what the new
> > code does.  This is a regression.  It is okay to have it regressed for a
> > while.  It is also okay to xfail things, if there is no expectation it
> > can be fixed before the next release (or some other suitably big time
> > frame, this isn't an exact science).
> 
> This isn't really a regression, as I'll describe with each patch.

Looking forward to it :-)

> >> --- a/gcc/testsuite/gcc.target/powerpc/byte-in-set-2.c
> >> +++ b/gcc/testsuite/gcc.target/powerpc/byte-in-set-2.c
> >> @@ -10,5 +10,5 @@
> >>  int
> >>  test_byte_in_set (unsigned char b, unsigned long long set_members)
> >>  {
> >> -  return __builtin_byte_in_set (b, set_members); /* { dg-warning "implicit declaration of function" } */
> >> +  return __builtin_byte_in_set (b, set_members); /* { dg-error "'__builtin_scalar_byte_in_set' requires the" } */
> >>  }
> > Huh.  How can the old warning ever have fired?  Was the builtin not
> > declared on 32-bit before?  Ouch.
> 
> I'll remind myself what changed here, but yes, that's what it looks like --
> an inadvertent problem with the old logic for 32-bit.

In general it is better to always have all builtins (and other
interfaces) declared internally, so that you can give much better error
messages (and so that you get errors if there are conflicts, etc.)

There can be exceptions, but this is not a case like that :-)  (So your
change is great :-) )

> >> --- a/gcc/testsuite/gcc.target/powerpc/pr80315-2.c
> >> +++ b/gcc/testsuite/gcc.target/powerpc/pr80315-2.c
> >> @@ -10,6 +10,6 @@ main ()
> >>int mask;
> >>  
> >>/* Argument 2 must be 0 or 1.  Argument 3 must be in range 0..15.  */
> >> -  res = __builtin_crypto_vshasigmad (test, 1, 0xff); /* { dg-error {argument 3 must be in the range \[0, 15\]} } */
> >> +  res = __builtin_crypto_vshasigmad (test, 1, 0xff); /* { dg-error {argument 3 must be a 4-bit unsigned literal} } */
> >>return 0;
> >>  }
> > Hrm, make this say "must be a literal between 0 and 15, inclusive" like
> > the other errors?
> 
> The "n-bit unsigned literal" is the usual case.  I'll provide more explanation
> in the separate patch.

We should use the same formulation always.  I like the mo

Re: [PATCH] restore ancient -Waddress for weak symbols [PR33925]

2021-11-17 Thread Martin Sebor via Gcc-patches

On 11/17/21 11:31 AM, Jason Merrill wrote:

On 11/16/21 20:11, Martin Sebor wrote:

On 11/16/21 1:23 PM, Jason Merrill wrote:

On 10/23/21 19:06, Martin Sebor wrote:

On 10/4/21 3:37 PM, Jason Merrill wrote:

On 10/4/21 14:42, Martin Sebor wrote:

While resolving the recent -Waddress enhancement request (PR 102103)
I came across a 2007 problem report about GCC 4 having
stopped warning for using the address of inline functions in
equality comparisons with null.  With inline functions being
commonplace in C++ this seems like an important use case for
the warning.

The change that resulted in suppressing the warning in these
cases was introduced inadvertently in a fix for PR 22252.

To restore the warning, the attached patch enhances
the decl_with_nonnull_addr_p() function to return true also for
weak symbols for which a definition has been provided.


I think you probably want to merge this function with 
fold-const.c:maybe_nonzero_address, which already handles more cases.


maybe_nonzero_address() doesn't behave quite like
decl_with_nonnull_addr_p() expects and I'm reluctant to muck
around with the former too much since it's used for codegen,
while the latter just for warnings.  (There is even a case
where the functions don't behave the same, and would result
in different warnings between C and C++ without some extra
help.)

So in the attached revision I just have maybe_nonzero_address()
call decl_with_nonnull_addr_p() and then refine the failing
(or uncertain) cases separately, with some overlap between
them.

Since I worked on this someone complained that some instances
of the warning newly enhanced under PR102103 aren't suppressed
in code resulting from macro expansion.  Since it's trivial,
I include the fix for that report in this patch as well.



+   allocated stroage might have a null address.  */


typo.

OK with that fixed.


After retesting the patch before committing I noticed it triggers
a regression in weak/weak-3.c that I missed the first time around.
Here's the test case:

extern void * ffoo1f (void);
void * foo1f (void)
{
   if (ffoo1f) /* { dg-warning "-Waddress" } */
 ffoo1f ();
   return 0;
}

void * ffoox1f (void) { return (void *)0; }
extern void * ffoo1f (void)  __attribute__((weak, alias ("ffoox1f")));

The unexpected error is:

a.c: At top level:
a.c:1:15: error: ‘ffoo1f’ declared weak after being used
    1 | extern void * ffoo1f (void);
      |               ^~~~~~

The error is caused by the new call to maybe_nonzero_address()
made from decl_with_nonnull_addr_p().  The call registers
the symbol as used.

So unless the error is desirable for this case I think it's
best to go back to the originally proposed solution.  I attach
it for reference and will plan to commit it tomorrow unless I
hear otherwise.


Hmm, the error seems correct to me: we tested whether the address is 
nonzero in the dg-warning line, and presumably evaluating that test 
could depend on the absence of weak.


Sorry, I don't know enough yet to judge this.

Since the error is unrelated to what I'm fixing I would prefer
not to introduce it in the same patch.  I'm happy to open
a separate bug for the missing error for the test case above,
look some more into why it isn't issued, and if it's decided
the error is intended either add the call back to trigger it
or do whatever else may be more appropriate).

Are you okay with me going ahead and committing the most recent
patch as is?

If not, do you want me to commit the previous version and change
the weak-3.c test to expect the error?

Martin




PS I don't know enough about the logic behind issuing this error
in other situations to tell for sure that it's wrong in this one,
but I see no difference in the emitted code between this case and
another case in the same test that declares the alias first, before
taking its address, and that one is accepted.  I also checked that
both Clang and ICC accept the code either way, so I'm inclined to
think the error would be a bug.






[PATCH] i386: Introduce LEGACY_SSE_REGNO_P predicate

2021-11-17 Thread Uros Bizjak via Gcc-patches
Introduce LEGACY_SSE_REGNO_P predicate to simplify a couple of places.

No functional changes.
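
For readers unfamiliar with these predicates, here is a self-contained sketch of the layering the patch introduces. It uses constexpr functions instead of the real macros, and the register numbers are placeholders chosen for illustration, not values to rely on. The in_range helper is an assumption modelled on a typical unsigned range check.

```cpp
// Placeholder register numbers, for illustration only.
constexpr unsigned FIRST_SSE_REG = 20, LAST_SSE_REG = 27;
constexpr unsigned FIRST_REX_SSE_REG = 44, LAST_REX_SSE_REG = 51;
constexpr unsigned FIRST_EXT_REX_SSE_REG = 52, LAST_EXT_REX_SSE_REG = 67;

// Single-comparison range check: relies on unsigned wraparound so that
// values below lo become very large and fail the test.
constexpr bool in_range(unsigned v, unsigned lo, unsigned hi) {
  return v - lo <= hi - lo;
}

// The new predicate: the original (non-REX) SSE registers.
constexpr bool legacy_sse_regno_p(unsigned n) {
  return in_range(n, FIRST_SSE_REG, LAST_SSE_REG);
}

constexpr bool rex_sse_regno_p(unsigned n) {
  return in_range(n, FIRST_REX_SSE_REG, LAST_REX_SSE_REG);
}

constexpr bool ext_rex_sse_regno_p(unsigned n) {
  return in_range(n, FIRST_EXT_REX_SSE_REG, LAST_EXT_REX_SSE_REG);
}

// The combined predicate is now expressed through the three named ranges.
constexpr bool sse_regno_p(unsigned n) {
  return legacy_sse_regno_p(n) || rex_sse_regno_p(n) || ext_rex_sse_regno_p(n);
}
```

Naming the first range keeps call sites readable and avoids repeating the FIRST_/LAST_ pair at each use.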

2021-11-17  Uroš Bizjak  

gcc/ChangeLog:

* config/i386/i386.h (LEGACY_SSE_REGNO_P): New predicate.
(SSE_REGNO_P): Use LEGACY_SSE_REGNO_P predicate.
* config/i386/i386.c (zero_all_vector_registers):
Use LEGACY_SSE_REGNO_P predicate.
(ix86_register_priority): Use REX_INT_REGNO_P, REX_SSE_REGNO_P
and EXT_REX_SSE_REGNO_P predicates.
(ix86_hard_regno_call_part_clobbered): Use REX_SSE_REGNO_P
and LEGACY_SSE_REGNO_P predicates.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Pushed to master.

Uros.
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 73c4d5115bb..0c5439dc7a7 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -3665,7 +3665,7 @@ zero_all_vector_registers (HARD_REG_SET need_zeroed_hardregs)
 return NULL;
 
   for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
-if ((IN_RANGE (regno, FIRST_SSE_REG, LAST_SSE_REG)
+if ((LEGACY_SSE_REGNO_P (regno)
 || (TARGET_64BIT
 && (REX_SSE_REGNO_P (regno)
 || (TARGET_AVX512F && EXT_REX_SSE_REGNO_P (regno)
@@ -19089,15 +19089,13 @@ ix86_register_priority (int hard_regno)
 return 0;
   if (hard_regno == BP_REG)
 return 1;
-  /* New x86-64 int registers result in bigger code size.  Discourage
- them.  */
-  if (IN_RANGE (hard_regno, FIRST_REX_INT_REG, LAST_REX_INT_REG))
+  /* New x86-64 int registers result in bigger code size.  Discourage them.  */
+  if (REX_INT_REGNO_P (hard_regno))
 return 2;
-  /* New x86-64 SSE registers result in bigger code size.  Discourage
- them.  */
-  if (IN_RANGE (hard_regno, FIRST_REX_SSE_REG, LAST_REX_SSE_REG))
+  /* New x86-64 SSE registers result in bigger code size.  Discourage them.  */
+  if (REX_SSE_REGNO_P (hard_regno))
 return 2;
-  if (IN_RANGE (hard_regno, FIRST_EXT_REX_SSE_REG, LAST_EXT_REX_SSE_REG))
+  if (EXT_REX_SSE_REGNO_P (hard_regno))
 return 1;
   /* Usage of AX register results in smaller code.  Prefer it.  */
   if (hard_regno == AX_REG)
@@ -19974,9 +19972,8 @@ ix86_hard_regno_call_part_clobbered (unsigned int abi_id, unsigned int regno,
   /* Special ABI for vzeroupper which only clobber higher part of sse regs.  */
   if (abi_id == ABI_VZEROUPPER)
   return (GET_MODE_SIZE (mode) > 16
- && ((TARGET_64BIT
-  && (IN_RANGE (regno, FIRST_REX_SSE_REG, LAST_REX_SSE_REG)))
- || (IN_RANGE (regno, FIRST_SSE_REG, LAST_SSE_REG))));
+ && ((TARGET_64BIT && REX_SSE_REGNO_P (regno))
+ || LEGACY_SSE_REGNO_P (regno)));
 
   return SSE_REGNO_P (regno) && GET_MODE_SIZE (mode) > 16;
 }
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index e35c79c192c..2fda1e0686e 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -1409,10 +1409,13 @@ enum reg_class
 
 #define SSE_REG_P(X) (REG_P (X) && SSE_REGNO_P (REGNO (X)))
 #define SSE_REGNO_P(N) \
-  (IN_RANGE ((N), FIRST_SSE_REG, LAST_SSE_REG) \
+  (LEGACY_SSE_REGNO_P (N)  \
|| REX_SSE_REGNO_P (N)  \
|| EXT_REX_SSE_REGNO_P (N))
 
+#define LEGACY_SSE_REGNO_P(N) \
+  IN_RANGE ((N), FIRST_SSE_REG, LAST_SSE_REG)
+
 #define REX_SSE_REGNO_P(N) \
   IN_RANGE ((N), FIRST_REX_SSE_REG, LAST_REX_SSE_REG)
 


Re: [PATCH v1 2/2] RISC-V: Add instruction fusion (for ventana-vt1)

2021-11-17 Thread Palmer Dabbelt
[This is my first time trying my Rivos address on the lists, so sorry if 
something goes off the rails.]


On Wed, 17 Nov 2021 06:05:04 PST (-0800), gcc-patches@gcc.gnu.org wrote:

Hi Philipp:

Thanks for the patch, I like this approach, that can easily configure
different capabilities for each core :)

So there are only a few minor comments for this patch.

On Mon, Nov 15, 2021 at 5:49 AM Philipp Tomsich
 wrote:


From: Philipp Tomsich 

The Ventana VT1 core supports quad-issue and instruction fusion.
This implemented TARGET_SCHED_MACRO_FUSION_P to keep fusible sequences
together and adds idiom matcheing for the supported fusion cases.


There's a typo at "matcheing".



gcc/ChangeLog:

* config/riscv/riscv.c (enum riscv_fusion_pairs): Add symbolic
constants to identify supported fusion patterns.
(struct riscv_tune_param): Add fusible_op field.
(riscv_macro_fusion_p): Implement.
(riscv_fusion_enabled_p): Implement.
(riscv_macro_fusion_pair_p): Implement and recognize fusible
idioms for Ventana VT1.
(TARGET_SCHED_MACRO_FUSION_P): Point to riscv_macro_fusion_p.
(TARGET_SCHED_MACRO_FUSION_PAIR_P): Point to riscv_macro_fusion_pair_p.

Signed-off-by: Philipp Tomsich 


This doesn't match the From (though admittedly I'm pretty new to the SoB 
stuff in GCC, so I'm not sure if that's even a rule here).



---

 gcc/config/riscv/riscv.c | 196 +++
 1 file changed, 196 insertions(+)

diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
index 6b918db65e9..8eac52101a3 100644
--- a/gcc/config/riscv/riscv.c
+++ b/gcc/config/riscv/riscv.c
@@ -211,6 +211,19 @@ struct riscv_integer_op {
The worst case is LUI, ADDI, SLLI, ADDI, SLLI, ADDI, SLLI, ADDI.  */
 #define RISCV_MAX_INTEGER_OPS 8

+enum riscv_fusion_pairs
+{
+  RISCV_FUSE_NOTHING = 0,
+  RISCV_FUSE_ZEXTW = (1 << 0),
+  RISCV_FUSE_ZEXTH = (1 << 1),
+  RISCV_FUSE_ZEXTWS = (1 << 2),
+  RISCV_FUSE_LDINDEXED = (1 << 3),


RISCV_FUSE_LDINDEXED -> RISCV_FUSE_LD_INDEXED

Could you add some comment for above enums, like that:
/* slli rx, rx, 32 + srli rx, rx, 32 */
RISCV_FUSE_ZEXTW

So that we could know what kind of instruction will be fused for this enum.
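
A sketch of what the requested comments and a flag test could look like, written as a standalone snippet. Only the ZEXTW sequence comes from the review comment above; the other enumerators are left uncommented because the thread does not state their sequences. The _sketch type names and the body of fusion_enabled_p are assumptions for illustration, not the code from the patch.

```cpp
// Bit flags, one per fusible instruction pair (names shortened from the
// patch; LD_INDEXED uses the spelling suggested in the review).
enum riscv_fusion_pairs_sketch : unsigned {
  FUSE_NOTHING = 0,
  FUSE_ZEXTW = 1u << 0,      /* slli rx, rx, 32 + srli rx, rx, 32 */
  FUSE_ZEXTH = 1u << 1,      /* sequence to be documented by the author */
  FUSE_ZEXTWS = 1u << 2,     /* sequence to be documented by the author */
  FUSE_LD_INDEXED = 1u << 3, /* sequence to be documented by the author */
};

// Stand-in for the tuning structure: only the new field is modelled.
struct tune_param_sketch {
  unsigned fusible_ops;
};

// Assumed shape of the enable check: true if the core's tuning entry
// has the bit for this fusion pair set.
constexpr bool fusion_enabled_p(const tune_param_sketch &tune, unsigned op) {
  return (tune.fusible_ops & op) != 0;
}
```

Storing the pairs as a bitmask lets each core's tuning entry list its capabilities in one field, and an entry with FUSE_NOTHING disables every check.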


+  RISCV_FUSE_LUI_ADDI = (1 << 4),
+  RISCV_FUSE_AUIPC_ADDI = (1 << 5),
+  RISCV_FUSE_LUI_LD = (1 << 6),
+  RISCV_FUSE_AUIPC_LD = (1 << 7),
+};
+
 /* Costs of various operations on the different architectures.  */

 struct riscv_tune_param
@@ -224,6 +237,7 @@ struct riscv_tune_param
   unsigned short branch_cost;
   unsigned short memory_cost;
   bool slow_unaligned_access;
+  unsigned int fusible_ops;
 };

 /* Information about one micro-arch we know about.  */
@@ -289,6 +303,7 @@ static const struct riscv_tune_param rocket_tune_info = {
   3,   /* branch_cost */
   5,   /* memory_cost */
   true,   /* slow_unaligned_access */
+  RISCV_FUSE_NOTHING,   /* fusible_ops */
 };


There's some tab/space issues here (and in the below ones).  They align 
when merged, but the new lines are spaces-only and the old ones have 
internal spaces mixed with tabs (IIRC that's to the GCC style, if not we 
should fix these to at least be consistent).




 /* Costs to use when optimizing for Sifive 7 Series.  */
@@ -302,6 +317,7 @@ static const struct riscv_tune_param sifive_7_tune_info = {
   4,   /* branch_cost */
   3,   /* memory_cost */
   true,   /* slow_unaligned_access */
+  RISCV_FUSE_NOTHING,   /* fusible_ops */
 };

 /* Costs to use when optimizing for T-HEAD c906.  */
@@ -328,6 +344,7 @@ static const struct riscv_tune_param optimize_size_tune_info = {
   1,   /* branch_cost */
   2,   /* memory_cost */
   false,   /* slow_unaligned_access */
+  RISCV_FUSE_NOTHING,   /* fusible_ops */
 };

 /* Costs to use when optimizing for Ventana Micro VT1.  */
@@ -341,6 +358,10 @@ static const struct riscv_tune_param ventana_vt1_tune_info = {
   4,   /* branch_cost */
   5,   /* memory_cost */
   false,   /* slow_unaligned_access */
+  ( RISCV_FUSE_ZEXTW | RISCV_FUSE_ZEXTH |   /* fusible_ops */
+RISCV_FUSE_ZEXTWS | RISCV_FUSE_LDINDEXED |
+RISCV_FUSE_LUI_ADDI | RISCV_FUSE_AUIPC_ADDI |
+RISCV_FUSE_LUI_LD | RISCV_FUSE_AUIPC_LD )
 };

 static tree riscv_handle_fndecl_attribute (tree *, tree, tree, int, bool *);
@@ -4909,6 +4930,177 @@ riscv_issue_rate (void)
   return tune_param->issue_rate;
 }

+/* Implement TARGET_SCHED_MACRO_FUSION_P.  Return true if targ

Re: [PATCH v1 2/2] RISC-V: Add instruction fusion (for ventana-vt1)

2021-11-17 Thread Philipp Tomsich
On Wed, 17 Nov 2021 at 20:40, Palmer Dabbelt  wrote:

> [This is my first time trying my Rivos address on the lists, so sorry if
> something goes off the rails.]
>
> On Wed, 17 Nov 2021 06:05:04 PST (-0800), gcc-patches@gcc.gnu.org wrote:
> > Hi Philipp:
> >
> > Thanks for the patch, I like this approach, that can easily configure
> > different capabilities for each core :)
> >
> > So there are only a few minor comments for this patch.
> >
> > On Mon, Nov 15, 2021 at 5:49 AM Philipp Tomsich
> >  wrote:
> >>
> >> From: Philipp Tomsich 
> >>
> >> The Ventana VT1 core supports quad-issue and instruction fusion.
> >> This implemented TARGET_SCHED_MACRO_FUSION_P to keep fusible sequences
> >> together and adds idiom matcheing for the supported fusion cases.
>
> There's a typo at "matcheing".
>
> >>
> >> gcc/ChangeLog:
> >>
> >> * config/riscv/riscv.c (enum riscv_fusion_pairs): Add symbolic
> >> constants to identify supported fusion patterns.
> >> (struct riscv_tune_param): Add fusible_op field.
> >> (riscv_macro_fusion_p): Implement.
> >> (riscv_fusion_enabled_p): Implement.
> >> (riscv_macro_fusion_pair_p): Implement and recognize fusible
> >> idioms for Ventana VT1.
> >> (TARGET_SCHED_MACRO_FUSION_P): Point to riscv_macro_fusion_p.
> >> (TARGET_SCHED_MACRO_FUSION_PAIR_P): Point to riscv_macro_fusion_pair_p.
> >>
> >> Signed-off-by: Philipp Tomsich 
>
> This doesn't match the From (though admittedly I'm pretty new to the SoB
> stuff in GCC, so I'm not sure if that's even a rule here).
>

I noticed that I hadn't reset the authors and that patman had inserted a
Signed-off-by: for that reason, right after I sent this out.
Given that it's all me and there's both individual assignment paperwork and
company disclaimers on file for all of the email-addresses, this should be
fine.

>> ---
> >>
> >>  gcc/config/riscv/riscv.c | 196 +++
> >>  1 file changed, 196 insertions(+)
> >>
> >> diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
> >> index 6b918db65e9..8eac52101a3 100644
> >> --- a/gcc/config/riscv/riscv.c
> >> +++ b/gcc/config/riscv/riscv.c
> >> @@ -211,6 +211,19 @@ struct riscv_integer_op {
> >> The worst case is LUI, ADDI, SLLI, ADDI, SLLI, ADDI, SLLI, ADDI.  */
> >>  #define RISCV_MAX_INTEGER_OPS 8
> >>
> >> +enum riscv_fusion_pairs
> >> +{
> >> +  RISCV_FUSE_NOTHING = 0,
> >> +  RISCV_FUSE_ZEXTW = (1 << 0),
> >> +  RISCV_FUSE_ZEXTH = (1 << 1),
> >> +  RISCV_FUSE_ZEXTWS = (1 << 2),
> >> +  RISCV_FUSE_LDINDEXED = (1 << 3),
> >
> > RISCV_FUSE_LDINDEXED -> RISCV_FUSE_LD_INDEXED
> >
> > Could you add some comment for above enums, like that:
> > /* slli rx, rx, 32 + srli rx, rx, 32 */
> > RISCV_FUSE_ZEXTW
> >
> > So that we could know what kind of instruction will be fused for this enum.
> >
> >> +  RISCV_FUSE_LUI_ADDI = (1 << 4),
> >> +  RISCV_FUSE_AUIPC_ADDI = (1 << 5),
> >> +  RISCV_FUSE_LUI_LD = (1 << 6),
> >> +  RISCV_FUSE_AUIPC_LD = (1 << 7),
> >> +};
> >> +
> >>  /* Costs of various operations on the different architectures.  */
> >>
> >>  struct riscv_tune_param
> >> @@ -224,6 +237,7 @@ struct riscv_tune_param
> >>unsigned short branch_cost;
> >>unsigned short memory_cost;
> >>bool slow_unaligned_access;
> >> +  unsigned int fusible_ops;
> >>  };
> >>
> >>  /* Information about one micro-arch we know about.  */
> >> @@ -289,6 +303,7 @@ static const struct riscv_tune_param rocket_tune_info = {
> >>3,   /* branch_cost */
> >>5,   /* memory_cost */
> >>true,   /* slow_unaligned_access */
> >> +  RISCV_FUSE_NOTHING,   /* fusible_ops */
> >>  };
>
> There's some tab/space issues here (and in the below ones).  They align
> when merged, but the new lines are spaces-only and the old ones have
> internal spaces mixed with tabs (IIRC that's to the GCC style, if not we
> should fix these to at least be consistent).
>
> >>
> >>  /* Costs to use when optimizing for Sifive 7 Series.  */
> >> @@ -302,6 +317,7 @@ static const struct riscv_tune_param sifive_7_tune_info = {
> >>4,   /* branch_cost */
> >>3,   /* memory_cost */
> >>true,   /* slow_unaligned_access */
> >> +  RISCV_FUSE_NOTHING,   /* fusible_ops */
> >>  };
> >>
> >>  /* Costs to use when optimizing for T-HEAD c906.  */
> >> @@ -328,6 +344,7 @@ static const struct riscv_tune_param
> optimize_size_tune_info = {
> >>1,   /* branch_cost */
> >>2,   /* memory_cost */
> >>false,   /*
> slow_unaligned_access */
> >> +  RISCV_FUSE_NOTHING,   /* fusible_

[PATCH] x86: Remove "%!" before ret

2021-11-17 Thread H.J. Lu via Gcc-patches
Before MPX was removed, "%!" was mapped to

case '!':
  if (ix86_bnd_prefixed_insn_p (current_output_insn))
fputs ("bnd ", file);
  return;

After CET was added and MPX was removed, "%!" was mapped to

   case '!':
  if (ix86_notrack_prefixed_insn_p (current_output_insn))
fputs ("notrack ", file);
  return;

ix86_notrack_prefixed_insn_p always returns false on ret since the
notrack prefix is only for indirect branches.  Remove the unused "%!"
before ret.

PR target/103307
* config/i386/i386.c (ix86_code_end): Remove "%!" before ret.
(ix86_output_function_return): Likewise.
* config/i386/i386.md (simple_return_pop_internal): Likewise.
---
 gcc/config/i386/i386.c  | 4 ++--
 gcc/config/i386/i386.md | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 73c4d5115bb..95d238e9efc 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -6116,7 +6116,7 @@ ix86_code_end (void)
   xops[0] = gen_rtx_REG (Pmode, regno);
   xops[1] = gen_rtx_MEM (Pmode, stack_pointer_rtx);
   output_asm_insn ("mov%z0\t{%1, %0|%0, %1}", xops);
-  output_asm_insn ("%!ret", NULL);
+  output_asm_insn ("ret", NULL);
   final_end_function ();
   init_insn_lengths ();
   free_after_compilation (cfun);
@@ -16278,7 +16278,7 @@ ix86_output_function_return (bool long_p)
 }
 
   if (!long_p)
-return "%!ret";
+return "ret";
 
   return "rep%; ret";
 }
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 73d15de88b2..7b2de60706d 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -14705,7 +14705,7 @@ (define_insn_and_split "simple_return_pop_internal"
   [(simple_return)
(use (match_operand:SI 0 "const_int_operand"))]
   "reload_completed"
-  "%!ret\t%0"
+  "ret\t%0"
   "&& cfun->machine->function_return_type != indirect_branch_keep"
   [(const_int 0)]
   "ix86_split_simple_return_pop_internal (operands[0]); DONE;"
-- 
2.33.1
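
For readers following along, here is a minimal sketch of why "%!ret" and
"ret" produce the same assembly.  It is a toy model and not GCC code: the
toy_* names are hypothetical, and the expander handles only the "%!"
escape.  The one assumption taken from the message above is that the
notrack prefix applies only to indirect branches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical instruction kinds for this sketch.  */
enum toy_insn_kind { TOY_RET, TOY_INDIRECT_JMP };

/* Stand-in for the predicate consulted by "%!": only an indirect
   branch can carry the notrack prefix, so a ret never qualifies.  */
static bool
toy_notrack_prefixed_p (enum toy_insn_kind kind, bool notrack_requested)
{
  return kind == TOY_INDIRECT_JMP && notrack_requested;
}

/* Expand TEMPL into BUF, handling only the "%!" escape.  Every other
   character is copied through unchanged.  */
static void
toy_expand (const char *templ, enum toy_insn_kind kind,
            bool notrack_requested, char *buf, size_t size)
{
  size_t n = 0;

  assert (size > 0);
  buf[0] = '\0';
  for (const char *p = templ; *p != '\0' && n + 1 < size; p++)
    {
      if (p[0] == '%' && p[1] == '!')
        {
          if (toy_notrack_prefixed_p (kind, notrack_requested))
            {
              int w = snprintf (buf + n, size - n, "notrack ");
              if (w > 0)
                n += (size_t) w;
              if (n >= size)
                n = size - 1;
            }
          p++;  /* Skip the '!' as well.  */
          continue;
        }
      buf[n++] = *p;
      buf[n] = '\0';
    }
}
```

Under this model, expanding "%!ret" for TOY_RET always yields plain "ret",
whatever the notrack setting, so the escape can be dropped there without
changing the output.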



Re: [PATCH] x86: Remove "%!" before ret

2021-11-17 Thread Uros Bizjak via Gcc-patches
On Wed, Nov 17, 2021 at 8:44 PM H.J. Lu  wrote:
>
> Before MPX was removed, "%!" was mapped to
>
> case '!':
>   if (ix86_bnd_prefixed_insn_p (current_output_insn))
> fputs ("bnd ", file);
>   return;
>
> After CET was added and MPX was removed, "%!" was mapped to
>
>case '!':
>   if (ix86_notrack_prefixed_insn_p (current_output_insn))
> fputs ("notrack ", file);
>   return;
>
> ix86_notrack_prefixed_insn_p always returns false on ret since the
> notrack prefix is only for indirect branches.  Remove the unused "%!"
> before ret.
>
> PR target/103307
> * config/i386/i386.c (ix86_code_end): Remove "%!" before ret.
> (ix86_output_function_return): Likewise.
> * config/i386/i386.md (simple_return_pop_internal): Likewise.
> ---
>  gcc/config/i386/i386.c  | 4 ++--
>  gcc/config/i386/i386.md | 2 +-
>  2 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 73c4d5115bb..95d238e9efc 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -6116,7 +6116,7 @@ ix86_code_end (void)
>xops[0] = gen_rtx_REG (Pmode, regno);
>xops[1] = gen_rtx_MEM (Pmode, stack_pointer_rtx);
>output_asm_insn ("mov%z0\t{%1, %0|%0, %1}", xops);
> -  output_asm_insn ("%!ret", NULL);
> +  output_asm_insn ("ret", NULL);

This can use fputs.

Uros.

>final_end_function ();
>init_insn_lengths ();
>free_after_compilation (cfun);
> @@ -16278,7 +16278,7 @@ ix86_output_function_return (bool long_p)
>  }
>
>if (!long_p)
> -return "%!ret";
> +return "ret";
>
>return "rep%; ret";
>  }
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index 73d15de88b2..7b2de60706d 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -14705,7 +14705,7 @@ (define_insn_and_split "simple_return_pop_internal"
>[(simple_return)
> (use (match_operand:SI 0 "const_int_operand"))]
>"reload_completed"
> -  "%!ret\t%0"
> +  "ret\t%0"
>"&& cfun->machine->function_return_type != indirect_branch_keep"
>[(const_int 0)]
>"ix86_split_simple_return_pop_internal (operands[0]); DONE;"
> --
> 2.33.1
>


[PATCH] i386: Redefine indirect_thunks_used as HARD_REG_SET.

2021-11-17 Thread Uros Bizjak via Gcc-patches
Change indirect_thunks_used to HARD_REG_SET to avoid recalculations
of correct register numbers and allow usage of SET/TEST_HARD_REG_BIT
accessors.

2021-11-17  Uroš Bizjak  

gcc/ChangeLog:

* config/i386/i386.c (indirect_thunks_used): Redefine as HARD_REG_SET.
(ix86_code_end): Use TEST_HARD_REG_BIT on indirect_thunks_used.
(ix86_output_indirect_branch_via_reg): Use SET_HARD_REG_BIT
on indirect_thunks_used.
(ix86_output_indirect_function_return): Ditto.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Pushed to master.

Uros.
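
A rough illustration of the difference, as a standalone sketch rather than
GCC code.  The register numbers and the toy_* names are assumptions made
for the example; the real HARD_REG_SET type and its accessors live in
GCC's hard-reg-set.h and are only imitated here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative numbering, assumed for this sketch: legacy integer
   registers end at 7 and the REX registers occupy 36..43.  */
enum
{
  TOY_LAST_INT_REG = 7,
  TOY_FIRST_REX_INT_REG = 36,
  TOY_NUM_REGS = 76
};

/* Old scheme: both ranges share one int, so a REX register number has
   to be folded down to bits 8..15 on every set and every test.  */
static unsigned int
toy_old_bit (unsigned int regno)
{
  if (regno >= TOY_FIRST_REX_INT_REG)
    regno -= TOY_FIRST_REX_INT_REG - TOY_LAST_INT_REG - 1;
  return 1u << regno;
}

/* New scheme: one bit per hard register, indexed by the register
   number itself, so no renumbering is needed.  */
typedef struct
{
  uint64_t words[(TOY_NUM_REGS + 63) / 64];
} toy_reg_set;

static void
toy_set_bit (toy_reg_set *set, unsigned int regno)
{
  assert (regno < TOY_NUM_REGS);
  set->words[regno / 64] |= (uint64_t) 1 << (regno % 64);
}

static bool
toy_test_bit (const toy_reg_set *set, unsigned int regno)
{
  assert (regno < TOY_NUM_REGS);
  return (set->words[regno / 64] >> (regno % 64)) & 1;
}
```

With the set representation the same regno is used when recording and when
testing, which is what lets the patch delete the index arithmetic in both
ix86_code_end and ix86_output_indirect_branch_via_reg.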
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 0c5439dc7a7..c9129ae25e4 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -5733,7 +5733,7 @@ static bool indirect_thunk_needed = false;
 
 /* Bit masks of integer registers, which contain branch target, used
by call thunk functions.  */
-static int indirect_thunks_used;
+static HARD_REG_SET indirect_thunks_used;
 
 /* True if return thunk function is needed.  */
 static bool indirect_return_needed = false;
@@ -6030,8 +6030,7 @@ ix86_code_end (void)
 
   for (regno = FIRST_REX_INT_REG; regno <= LAST_REX_INT_REG; regno++)
 {
-  unsigned int i = regno - FIRST_REX_INT_REG + LAST_INT_REG + 1;
-  if ((indirect_thunks_used & (1 << i)))
+  if (TEST_HARD_REG_BIT (indirect_thunks_used, regno))
output_indirect_thunk_function (indirect_thunk_prefix_none,
regno, false);
 }
@@ -6041,7 +6040,7 @@ ix86_code_end (void)
   char name[32];
   tree decl;
 
-  if ((indirect_thunks_used & (1 << regno)))
+  if (TEST_HARD_REG_BIT (indirect_thunks_used, regno))
output_indirect_thunk_function (indirect_thunk_prefix_none,
regno, false);
 
@@ -16014,12 +16013,8 @@ ix86_output_indirect_branch_via_reg (rtx call_op, bool 
sibcall_p)
   != indirect_branch_thunk_inline)
 {
   if (cfun->machine->indirect_branch_type == indirect_branch_thunk)
-   {
- int i = regno;
- if (i >= FIRST_REX_INT_REG)
-   i -= (FIRST_REX_INT_REG - LAST_INT_REG - 1);
- indirect_thunks_used |= 1 << i;
-   }
+   SET_HARD_REG_BIT (indirect_thunks_used, regno);
+
   indirect_thunk_name (thunk_name_buf, regno, need_prefix, false);
   thunk_name = thunk_name_buf;
 }
@@ -16307,7 +16302,7 @@ ix86_output_indirect_function_return (rtx ret_op)
  if (need_thunk)
{
  indirect_return_via_cx = true;
- indirect_thunks_used |= 1 << CX_REG;
+ SET_HARD_REG_BIT (indirect_thunks_used, CX_REG);
}
  fprintf (asm_out_file, "\tjmp\t");
  assemble_name (asm_out_file, thunk_name);


Re: [PATCH] c++: implicit dummy object in requires clause [PR103198]

2021-11-17 Thread Patrick Palka via Gcc-patches
On Wed, 17 Nov 2021, Jason Merrill wrote:

> On 11/11/21 20:25, Patrick Palka wrote:
> > In the testcase below satisfaction misbehaves for f and g ultimately
> > because find_template_parameters fails to notice that the constraint
> > 'val.x' depends on the template parameters of the class template.
> > In contrast, satisfaction works just fine for h.
> > 
> > The problem seems to come down to a difference in how any_template_parm_r
> > handles 'this' vs a dummy object: we walk TREE_TYPE of the former but
> > not the latter, and this causes us to miss the tparm dependencies in
> > f/g's constraints since in their case the implicit object parameter
> > through which we access 'val' is a dummy object.  (For h, since we know
> > it's a non-static member function when parsing its trailing constraints,
> > the implicit object parameter is 'this' instead of a dummy object.)
> > 
> > This patch fixes this inconsistency by making any_template_parm_r also
> > walk into the TREE_TYPE of a dummy object, as is already done for
> > 'this'.
> > 
> > Bootstrapped and regtested on x86_64-pc-linux-gnu, also tested on
> > cmcstl2 and range-v3, does this look OK for trunk and 11?
> > 
> > PR c++/103198
> > 
> > gcc/cp/ChangeLog:
> > 
> > * pt.c (any_template_parm_r): Walk the TREE_TYPE of a dummy
> > object.
> 
> Should we handle CONVERT_EXPR with the various casts in cp_walk_subtrees?

This seems to work well too.  But I'm not sure about doing this since
IIUC cp_walk_subtrees is generally supposed to walk subtrees that are
explicitly written in the source code, but when a CONVERT_EXPR
corresponds to an implicit conversion then the target type doesn't
explicitly appear anywhere.

> 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/cpp2a/concepts-this1.C: New test.
> > ---
> >   gcc/cp/pt.c |  5 
> >   gcc/testsuite/g++.dg/cpp2a/concepts-this1.C | 30 +
> >   2 files changed, 35 insertions(+)
> >   create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-this1.C
> > 
> > diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
> > index 82bf7dc26f6..fa55857d783 100644
> > --- a/gcc/cp/pt.c
> > +++ b/gcc/cp/pt.c
> > @@ -10766,6 +10766,11 @@ any_template_parm_r (tree t, void *data)
> > WALK_SUBTREE (TREE_TYPE (t));
> > break;
> >   +case CONVERT_EXPR:
> > +  if (is_dummy_object (t))
> > +   WALK_SUBTREE (TREE_TYPE (t));
> > +  break;
> > +
> >   default:
> > break;
> >   }
> > diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-this1.C
> > b/gcc/testsuite/g++.dg/cpp2a/concepts-this1.C
> > new file mode 100644
> > index 000..d717028201a
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.dg/cpp2a/concepts-this1.C
> > @@ -0,0 +1,30 @@
> > +// PR c++/103198
> > +// { dg-do compile { target c++20 } }
> > +
> > > +template<class T>
> > > +struct A {
> > > +  T val;
> > > +
> > > +  template<class U>
> > > +    requires requires { val.x; }
> > > +  void f(U);
> > > +
> > > +  static void g(int)
> > > +    requires requires { val.x; };
> > > +
> > > +  void h(int)
> > > +    requires requires { val.x; };
> > > +};
> > > +
> > > +struct B { int x; };
> > > +struct C { };
> > > +
> > > +int main() {
> > > +  A<B>().f(0);
> > > +  A<B>().g(0);
> > > +  A<B>().h(0);
> > > +
> > > +  A<C>().f(0); // { dg-error "no match" }
> > > +  A<C>().g(0); // { dg-error "no match" }
> > > +  A<C>().h(0); // { dg-error "no match" }
> > +}
> > 
> 
> 



Re: [PATCH] rs6000: Better error messages for power8/9-vector builtins

2021-11-17 Thread Segher Boessenkool
On Wed, Nov 17, 2021 at 11:45:02AM -0600, Paul A. Clarke wrote:
> I guess I'm being pedantic.  "requires -mcpu=power8 and -mvsx" is not
> accurate from a user's point of view, as "-mcpu=power8" is sufficient,
> since "-mvsx" is enabled when "-mcpu=power8" is specified.

To be really pedantic, -mcpu=power8 isn't required either: anything that
enables the subset of ISA 2.07 that is needed is enough already.  But we
don't want to encourage users to use those interfaces.

> The real "requires" is "-mcpu=power8" and no "-mno-vsx".

And no -mno-altivec.  And and and.  There is a huge web.

> It's not a strong objection, since specifying "-mno-vsx" should be
> uncommon.  (Right?)  And, specifying "-mcpu=power8 -mvsx" is harmless.

Maybe the warning could say "requires -mcpu=power8 (and -mvsx)"?  Is
that clearer, to your eye?


Segher


[PATCH v3] x86: Add -mharden-sls=[none|all|return|indirect-branch]

2021-11-17 Thread H.J. Lu via Gcc-patches
On Wed, Nov 17, 2021 at 7:53 AM Uros Bizjak  wrote:
>
> On Wed, Nov 17, 2021 at 4:35 PM H.J. Lu  wrote:
> >
> > Add -mharden-sls= to mitigate against straight line speculation (SLS)
> > for function return and indirect branch by adding an INT3 instruction
> > after function return and indirect branch.
> >
> > gcc/
> >
> > PR target/102952
> > * config/i386/i386-opts.h (harden_sls): New enum.
> > * config/i386/i386.c (output_indirect_thunk): Mitigate against
> > SLS for function return.
> > (ix86_output_function_return): Likewise.
> > (ix86_output_jmp_thunk_or_indirect): Mitigate against indirect
> > branch.
> > (ix86_output_indirect_jmp): Likewise.
> > (ix86_output_call_insn): Likewise.
> > * config/i386/i386.opt: Add -mharden-sls=.
> > * doc/invoke.texi: Document -mharden-sls=.
> >
> > gcc/testsuite/
> >
> > PR target/102952
> > * gcc.target/i386/harden-sls-1.c: New test.
> > * gcc.target/i386/harden-sls-2.c: Likewise.
> > * gcc.target/i386/harden-sls-3.c: Likewise.
> > * gcc.target/i386/harden-sls-4.c: Likewise.
> > * gcc.target/i386/harden-sls-5.c: Likewise.
> > ---
> >  gcc/config/i386/i386-opts.h  |  7 ++
> >  gcc/config/i386/i386.c   | 23 ++--
> >  gcc/config/i386/i386.opt | 20 +
> >  gcc/doc/invoke.texi  | 10 -
> >  gcc/testsuite/gcc.target/i386/harden-sls-1.c | 14 
> >  gcc/testsuite/gcc.target/i386/harden-sls-2.c | 14 
> >  gcc/testsuite/gcc.target/i386/harden-sls-3.c | 14 
> >  gcc/testsuite/gcc.target/i386/harden-sls-4.c | 16 ++
> >  gcc/testsuite/gcc.target/i386/harden-sls-5.c | 17 +++
> >  9 files changed, 127 insertions(+), 8 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/harden-sls-1.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/harden-sls-2.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/harden-sls-3.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/harden-sls-4.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/harden-sls-5.c
> >
> > diff --git a/gcc/config/i386/i386-opts.h b/gcc/config/i386/i386-opts.h
> > index 04e4ad608fb..171d3106d0a 100644
> > --- a/gcc/config/i386/i386-opts.h
> > +++ b/gcc/config/i386/i386-opts.h
> > @@ -121,4 +121,11 @@ enum instrument_return {
> >instrument_return_nop5
> >  };
> >
> > +enum harden_sls {
> > +  harden_sls_none = 0,
> > +  harden_sls_return = 1 << 0,
> > +  harden_sls_indirect_branch = 1 << 1,
> > +  harden_sls_all = harden_sls_return | harden_sls_indirect_branch
> > +};
> > +
> >  #endif
> > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> > index 73c4d5115bb..8bbf6ae9875 100644
> > --- a/gcc/config/i386/i386.c
> > +++ b/gcc/config/i386/i386.c
> > @@ -5914,6 +5914,8 @@ output_indirect_thunk (unsigned int regno)
> >  }
> >
> >fputs ("\tret\n", asm_out_file);
> > +  if ((ix86_harden_sls & harden_sls_return))
> > +fputs ("\tint3\n", asm_out_file);
> >  }
> >
> >  /* Output a funtion with a call and return thunk for indirect branch.
> > @@ -15984,6 +15986,8 @@ ix86_output_jmp_thunk_or_indirect (const char 
> > *thunk_name, const int regno)
> >fprintf (asm_out_file, "\tjmp\t");
> >assemble_name (asm_out_file, thunk_name);
> >putc ('\n', asm_out_file);
> > +  if ((ix86_harden_sls & harden_sls_indirect_branch))
> > +   fputs ("\tint3\n", asm_out_file);
> >  }
> >else
> >  output_indirect_thunk (regno);
> > @@ -16206,10 +16210,10 @@ ix86_output_indirect_jmp (rtx call_op)
> > gcc_unreachable ();
> >
> >ix86_output_indirect_branch (call_op, "%0", true);
> > -  return "";
> >  }
> >else
> > -return "%!jmp\t%A0";
> > +output_asm_insn ("%!jmp\t%A0", &call_op);
> > +  return (ix86_harden_sls & harden_sls_indirect_branch) ? "int3" : "";
> >  }
> >
> >  /* Output return instrumentation for current function if needed.  */
> > @@ -16277,10 +16281,10 @@ ix86_output_function_return (bool long_p)
> >return "";
> >  }
> >
> > -  if (!long_p)
> > -return "%!ret";
> > -
> > -  return "rep%; ret";
> > +  if ((ix86_harden_sls & harden_sls_return))
> > +long_p = false;
>
> Is the above really needed? This will change "rep ret" to a "[notrack]
> ret" when SLS hardening is in effect, with a conditional [notrack]
> prefix, even when long ret was requested.

Fixed in the v3 patch.

> On a related note, "notrack ret" does not assemble for me, the
> assembler reports:
>
> notrack.s:1: Error: expecting indirect branch instruction after `notrack'
>
> Can you please clarify the above change?

I opened:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103307

Here is the v3 patch.


-- 
H.J.
From ed5e4a06b0488bff1fcdf218d93b54e0abf7ff3b Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Wed, 27 Oct 202

Re: [PATCH] rs6000: Better error messages for power8/9-vector builtins

2021-11-17 Thread David Edelsohn via Gcc-patches
On Wed, Nov 17, 2021 at 3:02 PM Segher Boessenkool
 wrote:
>
> > It's not a strong objection, since specifying "-mno-vsx" should be
> > uncommon.  (Right?)  And, specifying "-mcpu=power8 -mvsx" is harmless.
>
> Maybe the warning could say "requires -mcpu=power8 (and -mvsx)"?  Is
> that clearer, to your eye?

Maybe "requires -mcpu=power8 with VSX" or "requires -mcpu=power8 with
VSX enabled"?

Thanks, David


Re: [PATCH v3] x86: Add -mharden-sls=[none|all|return|indirect-branch]

2021-11-17 Thread Uros Bizjak via Gcc-patches
On Wed, Nov 17, 2021 at 9:02 PM H.J. Lu  wrote:
>
> On Wed, Nov 17, 2021 at 7:53 AM Uros Bizjak  wrote:
> >
> > On Wed, Nov 17, 2021 at 4:35 PM H.J. Lu  wrote:
> > >
> > > Add -mharden-sls= to mitigate against straight line speculation (SLS)
> > > for function return and indirect branch by adding an INT3 instruction
> > > after function return and indirect branch.
> > >
> > > gcc/
> > >
> > > PR target/102952
> > > * config/i386/i386-opts.h (harden_sls): New enum.
> > > * config/i386/i386.c (output_indirect_thunk): Mitigate against
> > > SLS for function return.
> > > (ix86_output_function_return): Likewise.
> > > (ix86_output_jmp_thunk_or_indirect): Mitigate against indirect
> > > branch.
> > > (ix86_output_indirect_jmp): Likewise.
> > > (ix86_output_call_insn): Likewise.
> > > * config/i386/i386.opt: Add -mharden-sls=.
> > > * doc/invoke.texi: Document -mharden-sls=.
> > >
> > > gcc/testsuite/
> > >
> > > PR target/102952
> > > * gcc.target/i386/harden-sls-1.c: New test.
> > > * gcc.target/i386/harden-sls-2.c: Likewise.
> > > * gcc.target/i386/harden-sls-3.c: Likewise.
> > > * gcc.target/i386/harden-sls-4.c: Likewise.
> > > * gcc.target/i386/harden-sls-5.c: Likewise.

OK, with a small nit below.

Thanks,
Uros.

+mharden-sls=
+Target RejectNegative Joined Enum(harden_sls) Var(ix86_harden_sls)
Init(harden_sls_none)
+Generate code to mitigate against straight line speculation.
+
+Enum
+Name(harden_sls) Type(enum harden_sls)
+Known choices for mitigation against straight line speculation with
-mharden-sls=:
+
+EnumValue
+Enum(harden_sls) String(none) Value(harden_sls_none)
+
+EnumValue
+Enum(harden_sls) String(all) Value(harden_sls_all)

Please move the above EnumValue entry so that it is the last one.

+
+EnumValue
+Enum(harden_sls) String(return) Value(harden_sls_return)
+
+EnumValue
+Enum(harden_sls) String(indirect-branch) Value(harden_sls_indirect_branch)
+
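
For anyone skimming the thread, a small sketch of how the flag-style enum
is meant to be consumed.  The enum values are copied from the quoted
i386-opts.h hunk; the toy_emit_* helpers are hypothetical and write into a
buffer instead of asm_out_file, so this is an illustration of the bit
tests only, not the patch's actual output code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Values as in the quoted i386-opts.h hunk.  */
enum harden_sls
{
  harden_sls_none = 0,
  harden_sls_return = 1 << 0,
  harden_sls_indirect_branch = 1 << 1,
  harden_sls_all = harden_sls_return | harden_sls_indirect_branch
};

/* Hypothetical helper: emit a function return, followed by INT3 when
   return hardening is enabled.  */
static void
toy_emit_return (enum harden_sls mode, char *buf, size_t size)
{
  assert (size > 0);
  snprintf (buf, size, "\tret\n%s",
            (mode & harden_sls_return) ? "\tint3\n" : "");
}

/* Hypothetical helper: emit an indirect jump to TARGET, followed by
   INT3 when indirect-branch hardening is enabled.  */
static void
toy_emit_indirect_jmp (enum harden_sls mode, const char *target,
                       char *buf, size_t size)
{
  assert (size > 0);
  snprintf (buf, size, "\tjmp\t%s\n%s", target,
            (mode & harden_sls_indirect_branch) ? "\tint3\n" : "");
}
```

Because "all" is just the OR of the two single-purpose bits, each emission
site only needs to test its own bit, as the hunks in i386.c do.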


Re: [PATCH] rs6000: Better error messages for power8/9-vector builtins

2021-11-17 Thread Paul A. Clarke via Gcc-patches
On Wed, Nov 17, 2021 at 02:00:02PM -0600, Segher Boessenkool wrote:
> On Wed, Nov 17, 2021 at 11:45:02AM -0600, Paul A. Clarke wrote:
> > I guess I'm being pedantic.  "requires -mcpu=power8 and -mvsx" is not
> > accurate from a user's point of view, as "-mcpu=power8" is sufficient,
> > since "-mvsx" is enabled when "-mcpu=power8" is specified.
> 
> To be really pedantic, -mcpu=power8 isn't required either: anything that
> enables the subset of ISA 2.07 that is needed is enough already.  But we
> don't want to encourage users to use those interfaces.
> 
> > The real "requires" is "-mcpu=power8" and no "-mno-vsx".
> 
> And no -mno-altivec.  And and and.  There is a huge web.
> 
> > It's not a strong objection, since specifying "-mno-vsx" should be
> > uncommon.  (Right?)  And, specifying "-mcpu=power8 -mvsx" is harmless.
> 
> Maybe the warning could say "requires -mcpu=power8 (and -mvsx)"?  Is
> that clearer, to your eye?

Hrm. No, but let me withdraw my expression of concern. Both "power8" and
"vsx" are required, and those two options get that explicitly.
That "-mcpu=power8" also pulls in "-mvsx" is a subtlety that is
perhaps not terribly relevant.

Thanks for entertaining my concern, but we've spent too much time on it
already.  :-)

PC


Re: [PATCH] rs6000: Better error messages for power8/9-vector builtins

2021-11-17 Thread Segher Boessenkool
On Tue, Nov 16, 2021 at 11:12:35AM -0600, Bill Schmidt wrote:
> Hi!  During a previous patch review, Segher asked that I provide better
> messages when builtins are unavailable because they require both a minimum
> CPU and the enablement of VSX instructions.  This patch does just that.
> 
> Bootstrapped and tested on powerpc64le-linux-gnu with no regressions.
> Is this okay for trunk?

It is.  Thank you!


Segher


Re: [PATCH] Fortran: Mark internal symbols as artificial [PR88009,PR68800]

2021-11-17 Thread Harald Anlauf via Gcc-patches

Do you have testcases/reproducers demonstrating that the patch actually
fixes the issues you're describing?

Am 17.11.21 um 09:12 schrieb Bernhard Reutner-Fischer via Gcc-patches:

On Tue, 16 Nov 2021 21:46:32 +0100
Harald Anlauf via Fortran  wrote:


Hi Bernhard,

I'm trying to understand your patch.  What does it really try to solve?


Compiler generated symbols should be marked artificial.
The fix for PR88009 (f8add009ce300f24b75e9c2e2cc5dd944a020c28,
r9-5194) added artificial just to the _final component and left out all the
rest.
Note that the majority of compiler generated symbols in class.c
already had artificial set properly.
The proposed patch amends the other generated symbols to be marked
artificial, too.

The other parts fix memory leaks.



PR88009 is closed and seems to have nothing to do with this.


Well it marked only _final as artificial and forgot to adjust the
others as well.
We can remove the reference to PR88009 if you prefer?

thanks!


Harald

Am 14.11.21 um 23:17 schrieb Bernhard Reutner-Fischer via Fortran:

Hi!

Amend fix for PR88009 to mark all these class components as artificial.

gcc/fortran/ChangeLog:

  * class.c (gfc_build_class_symbol, generate_finalization_wrapper,
  gfc_find_derived_vtab, find_intrinsic_vtab): Use stringpool for
  names. Mark internal symbols as artificial.
  * decl.c (gfc_match_decl_type_spec, gfc_match_end): Fix
  indentation.
  (gfc_match_derived_decl): Fix indentation. Check extension level
  before incrementing refs counter.
  * parse.c (parse_derived): Fix style.
  * resolve.c (resolve_global_procedure): Likewise.
  * symbol.c (gfc_check_conflict): Do not ignore artificial symbols.
  (gfc_add_flavor): Reorder condition, cheapest first.
  (gfc_new_symbol, gfc_get_sym_tree,
  generate_isocbinding_symbol): Fix style.
  * trans-expr.c (gfc_trans_subcomponent_assign): Remove
  restriction on !artificial.
  * match.c (gfc_match_equivalence): Special-case CLASS_DATA for
  warnings.

---
gfc_match_equivalence(), too, should not bail-out early on the first
error but should diagnose all errors. I.e. not goto cleanup but set
err=true and continue in order to diagnose all constraints of a
statement. Maybe Sandra or somebody else will eventually find time to
tweak that.

I think it also plugs a very minor leak of name in gfc_find_derived_vtab,
so I also tagged it [PR68800]. At least that was the initial
motivation to look at that spot.
We were doing
-  name = xasprintf ("__vtab_%s", tname);
...
gfc_set_sym_referenced (vtab);
- name = xasprintf ("__vtype_%s", tname);

Bootstrapped and regtested without regressions on x86_64-unknown-linux.
Ok for trunk?
   











[PATCH v2] x86: Remove "%!" before ret

2021-11-17 Thread H.J. Lu via Gcc-patches
On Wed, Nov 17, 2021 at 11:46 AM Uros Bizjak  wrote:
>
> On Wed, Nov 17, 2021 at 8:44 PM H.J. Lu  wrote:
> >
> > Before MPX was removed, "%!" was mapped to
> >
> > case '!':
> >   if (ix86_bnd_prefixed_insn_p (current_output_insn))
> > fputs ("bnd ", file);
> >   return;
> >
> > After CET was added and MPX was removed, "%!" was mapped to
> >
> >case '!':
> >   if (ix86_notrack_prefixed_insn_p (current_output_insn))
> > fputs ("notrack ", file);
> >   return;
> >
> > ix86_notrack_prefixed_insn_p always returns false on ret since the
> > notrack prefix is only for indirect branches.  Remove the unused "%!"
> > before ret.
> >
> > PR target/103307
> > * config/i386/i386.c (ix86_code_end): Remove "%!" before ret.
> > (ix86_output_function_return): Likewise.
> > * config/i386/i386.md (simple_return_pop_internal): Likewise.
> > ---
> >  gcc/config/i386/i386.c  | 4 ++--
> >  gcc/config/i386/i386.md | 2 +-
> >  2 files changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> > index 73c4d5115bb..95d238e9efc 100644
> > --- a/gcc/config/i386/i386.c
> > +++ b/gcc/config/i386/i386.c
> > @@ -6116,7 +6116,7 @@ ix86_code_end (void)
> >xops[0] = gen_rtx_REG (Pmode, regno);
> >xops[1] = gen_rtx_MEM (Pmode, stack_pointer_rtx);
> >output_asm_insn ("mov%z0\t{%1, %0|%0, %1}", xops);
> > -  output_asm_insn ("%!ret", NULL);
> > +  output_asm_insn ("ret", NULL);
>
> This can use fputs.

Fixed.   Here is the v2 patch.

> Uros.
>
> >final_end_function ();
> >init_insn_lengths ();
> >free_after_compilation (cfun);
> > @@ -16278,7 +16278,7 @@ ix86_output_function_return (bool long_p)
> >  }
> >
> >if (!long_p)
> > -return "%!ret";
> > +return "ret";
> >
> >return "rep%; ret";
> >  }
> > diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> > index 73d15de88b2..7b2de60706d 100644
> > --- a/gcc/config/i386/i386.md
> > +++ b/gcc/config/i386/i386.md
> > @@ -14705,7 +14705,7 @@ (define_insn_and_split "simple_return_pop_internal"
> >[(simple_return)
> > (use (match_operand:SI 0 "const_int_operand"))]
> >"reload_completed"
> > -  "%!ret\t%0"
> > +  "ret\t%0"
> >"&& cfun->machine->function_return_type != indirect_branch_keep"
> >[(const_int 0)]
> >"ix86_split_simple_return_pop_internal (operands[0]); DONE;"
> > --
> > 2.33.1
> >



-- 
H.J.
From 594391d282f0066cb046dd06062e3efad8c74a08 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Wed, 17 Nov 2021 11:41:12 -0800
Subject: [PATCH v2] x86: Remove "%!" before ret

Before MPX was removed, "%!" was mapped to

case '!':
  if (ix86_bnd_prefixed_insn_p (current_output_insn))
fputs ("bnd ", file);
  return;

After CET was added and MPX was removed, "%!" was mapped to

   case '!':
  if (ix86_notrack_prefixed_insn_p (current_output_insn))
fputs ("notrack ", file);
  return;

ix86_notrack_prefixed_insn_p always returns false on ret since the
notrack prefix is only for indirect branches.  Remove the unused "%!"
before ret.

	PR target/103307
	* config/i386/i386.c (ix86_code_end): Remove "%!" before ret.
	(ix86_output_function_return): Likewise.
	* config/i386/i386.md (simple_return_pop_internal): Likewise.
---
 gcc/config/i386/i386.c  | 4 ++--
 gcc/config/i386/i386.md | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index c9129ae25e4..a5bfb9efca9 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -6115,7 +6115,7 @@ ix86_code_end (void)
   xops[0] = gen_rtx_REG (Pmode, regno);
   xops[1] = gen_rtx_MEM (Pmode, stack_pointer_rtx);
   output_asm_insn ("mov%z0\t{%1, %0|%0, %1}", xops);
-  output_asm_insn ("%!ret", NULL);
+  fputs ("\tret\n", asm_out_file);
   final_end_function ();
   init_insn_lengths ();
   free_after_compilation (cfun);
@@ -16273,7 +16273,7 @@ ix86_output_function_return (bool long_p)
 }
 
   if (!long_p)
-return "%!ret";
+return "ret";
 
   return "rep%; ret";
 }
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 73d15de88b2..7b2de60706d 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -14705,7 +14705,7 @@ (define_insn_and_split "simple_return_pop_internal"
   [(simple_return)
(use (match_operand:SI 0 "const_int_operand"))]
   "reload_completed"
-  "%!ret\t%0"
+  "ret\t%0"
   "&& cfun->machine->function_return_type != indirect_branch_keep"
   [(const_int 0)]
   "ix86_split_simple_return_pop_internal (operands[0]); DONE;"
-- 
2.33.1



Re: [PATCH v2] x86: Remove "%!" before ret

2021-11-17 Thread Uros Bizjak via Gcc-patches
On Wed, Nov 17, 2021 at 9:33 PM H.J. Lu  wrote:
>
> On Wed, Nov 17, 2021 at 11:46 AM Uros Bizjak  wrote:
> >
> > On Wed, Nov 17, 2021 at 8:44 PM H.J. Lu  wrote:
> > >
> > > Before MPX was removed, "%!" was mapped to
> > >
> > > case '!':
> > >   if (ix86_bnd_prefixed_insn_p (current_output_insn))
> > > fputs ("bnd ", file);
> > >   return;
> > >
> > > After CET was added and MPX was removed, "%!" was mapped to
> > >
> > >case '!':
> > >   if (ix86_notrack_prefixed_insn_p (current_output_insn))
> > > fputs ("notrack ", file);
> > >   return;
> > >
> > > ix86_notrack_prefixed_insn_p always returns false on ret since the
> > > notrack prefix is only for indirect branches.  Remove the unused "%!"
> > > before ret.
> > >
> > > PR target/103307
> > > * config/i386/i386.c (ix86_code_end): Remove "%!" before ret.
> > > (ix86_output_function_return): Likewise.
> > > * config/i386/i386.md (simple_return_pop_internal): Likewise.
> > > ---
> > >  gcc/config/i386/i386.c  | 4 ++--
> > >  gcc/config/i386/i386.md | 2 +-
> > >  2 files changed, 3 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> > > index 73c4d5115bb..95d238e9efc 100644
> > > --- a/gcc/config/i386/i386.c
> > > +++ b/gcc/config/i386/i386.c
> > > @@ -6116,7 +6116,7 @@ ix86_code_end (void)
> > >xops[0] = gen_rtx_REG (Pmode, regno);
> > >xops[1] = gen_rtx_MEM (Pmode, stack_pointer_rtx);
> > >output_asm_insn ("mov%z0\t{%1, %0|%0, %1}", xops);
> > > -  output_asm_insn ("%!ret", NULL);
> > > +  output_asm_insn ("ret", NULL);
> >
> > This can use fputs.
>
> Fixed.   Here is the v2 patch.

OK.

Thanks,
Uros.


[PATCH] rs6000: Builtins test changes for BFP scalar tests

2021-11-17 Thread Bill Schmidt via Gcc-patches
Hi!  This patch is broken out of the previous patch for all the builtins test
suite adjustments.  Here we have some slight changes in error messages due to
how the internals have changed between the old and new builtins methods.

For scalar-extract-exp-2.c we change:
  error: '__builtin_vec_scalar_extract_exp' is not supported in this compiler
configuration

to:
  error: '__builtin_vsx_scalar_extract_exp' requires the '-mcpu=power9' option 
and either the '-m64' or '-mpowerpc64' option
  note: builtin '__builtin_vec_scalar_extract_exp' requires builtin 
'__builtin_vsx_scalar_extract_exp'

The new message provides more information.  In both cases, it is less than
ideal that we don't refer to scalar_extract_exp, which is referenced in
the source line, but this is because scalar_extract_exp is #define'd to
__builtin_vec_scalar_extract_exp, so it's unavoidable.  Certainly this is no
worse than before, and arguably better.

The cases for:
scalar-insert-exp-2.c
scalar-insert-exp-5.c
scalar-insert-exp-8.c
are all similar.

For scalar-extract-sig-2.c we again change:
  error: '__builtin_vec_scalar_extract_sig' is not supported in this compiler
configuration

to:
  error: '__builtin_vsx_scalar_extract_sig' requires the '-mcpu=power9' option 
and either the '-m64' or '-mpowerpc64' option
  note: builtin '__builtin_vec_scalar_extract_sig' requires builtin 
'__builtin_vsx_scalar_extract_sig'

Here it is clearer because there is no #define to muddy things up, and
again the new message is arguably better than the old.

For scalar-test-neg-{2,3,5}.c, we actually change the test case.  This is
because we deliberately removed some undocumented and pointless overloads,
where each overload mapped to a single builtin.  These were:
__builtin_vec_scalar_test_neg_sp
__builtin_vec_scalar_test_neg_dp
__builtin_vec_scalar_test_neg_qp
which are redundant with the "real" overload:
__builtin_vec_scalar_test_neg
The latter maps to three builtins of the appropriate type.

The revised test case uses the "real" overload instead, and otherwise the
changes to the error messages are the same as for all the other cases.

2021-11-17  Bill Schmidt  

gcc/testsuite/
* gcc.target/powerpc/bfp/scalar-extract-exp-2.c: Adjust error
message.
* gcc.target/powerpc/bfp/scalar-extract-sig-2.c: Likewise.
* gcc.target/powerpc/bfp/scalar-insert-exp-2.c: Likewise.
* gcc.target/powerpc/bfp/scalar-insert-exp-5.c: Likewise.
* gcc.target/powerpc/bfp/scalar-insert-exp-8.c: Likewise.
* gcc.target/powerpc/bfp/scalar-test-neg-2.c: Likewise.
* gcc.target/powerpc/bfp/scalar-test-neg-3.c: Likewise.
* gcc.target/powerpc/bfp/scalar-test-neg-5.c: Likewise.
---
 gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-exp-2.c | 2 +-
 gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-sig-2.c | 2 +-
 gcc/testsuite/gcc.target/powerpc/bfp/scalar-insert-exp-2.c  | 2 +-
 gcc/testsuite/gcc.target/powerpc/bfp/scalar-insert-exp-5.c  | 2 +-
 gcc/testsuite/gcc.target/powerpc/bfp/scalar-insert-exp-8.c  | 2 +-
 gcc/testsuite/gcc.target/powerpc/bfp/scalar-test-neg-2.c| 2 +-
 gcc/testsuite/gcc.target/powerpc/bfp/scalar-test-neg-3.c| 2 +-
 gcc/testsuite/gcc.target/powerpc/bfp/scalar-test-neg-5.c| 2 +-
 8 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-exp-2.c 
b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-exp-2.c
index 922180675fc..53b67c95cf9 100644
--- a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-exp-2.c
+++ b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-exp-2.c
@@ -14,7 +14,7 @@ get_exponent (double *p)
 {
   double source = *p;
 
-  return scalar_extract_exp (source);  /* { dg-error 
"'__builtin_vec_scalar_extract_exp' is not supported in this compiler 
configuration" } */
+  return scalar_extract_exp (source);  /* { dg-error 
"'__builtin_vsx_scalar_extract_exp' requires the" } */
 }
 
 
diff --git a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-sig-2.c 
b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-sig-2.c
index e24d4bd23fe..39ee74c94dc 100644
--- a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-sig-2.c
+++ b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-sig-2.c
@@ -12,5 +12,5 @@ get_significand (double *p)
 {
   double source = *p;
 
-  return __builtin_vec_scalar_extract_sig (source); /* { dg-error 
"'__builtin_vec_scalar_extract_sig' is not supported in this compiler 
configuration" } */
+  return __builtin_vec_scalar_extract_sig (source); /* { dg-error 
"'__builtin_vsx_scalar_extract_sig' requires the" } */
 }
diff --git a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-insert-exp-2.c 
b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-insert-exp-2.c
index feb943104da..efd69725905 100644
--- a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-insert-exp-2.c
+++ b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-insert-exp-2.c
@@ -16,5 +16,5 @@ insert_

[PATCH 1/4] Driver : Provide a spec to insert rpaths for compiler lib dirs.

2021-11-17 Thread Iain Sandoe via Gcc-patches
This provides a spec to insert "-rpath DDD" for each DDD corresponding
to a compiler startfile directory.  This allows a target to use @rpath
as the install path for libraries, and have the compiler provide the
necessary rpath to handle this.
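
As a rough illustration of what the expanded spec amounts to (this is a
standalone sketch, not the driver code; append_rpath_options and the
directory names used with it are hypothetical), each startfile directory
simply contributes one "-rpath DIR" pair to the link line:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not GCC's for_each_path (): write "-rpath DIR"
   for each of the NDIRS directories in DIRS into OUT (capacity
   OUT_SIZE), separated by single spaces.  Returns 0 on success, -1 if
   the result would not fit.  */
static int
append_rpath_options (char *out, size_t out_size,
                      const char *const *dirs, size_t ndirs)
{
  size_t used = 0;

  assert (out != NULL && dirs != NULL);
  if (out_size == 0)
    return -1;
  out[0] = '\0';

  for (size_t i = 0; i < ndirs; i++)
    {
      /* A separating space is needed before every pair but the first.  */
      int n = snprintf (out + used, out_size - used, "%s-rpath %s",
                        used ? " " : "", dirs[i]);
      if (n < 0 || (size_t) n >= out_size - used)
        return -1;
      used += (size_t) n;
    }
  return 0;
}
```

In the patch itself the directories come from startfile_prefixes and the
option text from RUNPATH_OPTION, so a target could presumably substitute
a different spelling of the option.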

gcc/ChangeLog:

* gcc.c (RUNPATH_OPTION): New.
(do_spec_1): Provide '%P' as a spec to insert rpaths for
each compiler startfile path.
---
 gcc/gcc.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/gcc/gcc.c b/gcc/gcc.c
index 506c2acc282..7b52d0bcbfd 100644
--- a/gcc/gcc.c
+++ b/gcc/gcc.c
@@ -572,6 +572,7 @@ or with constant text in a single argument.
  %l process LINK_SPEC as a spec.
  %L process LIB_SPEC as a spec.
  %M Output multilib_os_dir.
+ %P Output a RUNPATH_OPTION for each directory in startfile_prefixes.
  %G process LIBGCC_SPEC as a spec.
  %R Output the concatenation of target_system_root and
 target_sysroot_suffix.
@@ -1191,6 +1192,10 @@ proper position among the other output files.  */
 # define SYSROOT_HEADERS_SUFFIX_SPEC ""
 #endif
 
+#ifndef RUNPATH_OPTION
+# define RUNPATH_OPTION "-rpath"
+#endif
+
 static const char *asm_debug = ASM_DEBUG_SPEC;
 static const char *asm_debug_option = ASM_DEBUG_OPTION_SPEC;
 static const char *cpp_spec = CPP_SPEC;
@@ -6130,6 +6135,19 @@ do_spec_1 (const char *spec, int inswitch, const char 
*soft_matched_part)
}
break;
 
+ case 'P':
+   {
+ struct spec_path_info info;
+
+ info.option = RUNPATH_OPTION;
+ info.append_len = 0;
+ info.omit_relative = false;
+ info.separate_options = true;
+
+ for_each_path (&startfile_prefixes, true, 0, spec_path, &info);
+   }
+   break;
+
  case 'e':
/* %efoo means report an error with `foo' as error message
   and don't execute any more commands for this file.  */
-- 
2.24.3 (Apple Git-128)



[PATCH 2/4] Darwin : Handle rpaths given on the command line.

2021-11-17 Thread Iain Sandoe via Gcc-patches
We want to produce a situation where a default rpath can be added
to each executable (or dylib), but that can be overridden by any
specific rpath provided by the user.
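
A minimal sketch of the detection rule, assuming the driver hands us the
argument text of each -Wl, or -Xlinker option (user_supplied_rpath_p is a
hypothetical name, not a function in the patch).  It uses the same
prefix test as the patch, so it only fires when "-rpath" is the first
thing in the argument:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Return true if any linker pass-through argument starts with
   "-rpath".  LINKER_ARGS holds the argument text of each -Wl, or
   -Xlinker option, e.g. "-rpath,/some/dir" for -Wl,-rpath,/some/dir
   (an assumption about how the driver presents them).  */
static bool
user_supplied_rpath_p (const char *const *linker_args, size_t nargs)
{
  assert (linker_args != NULL || nargs == 0);

  for (size_t i = 0; i < nargs; i++)
    if (strncmp (linker_args[i], "-rpath", 6) == 0)
      return true;
  return false;
}
```

When this returns true, the patch appends -nodefaultrpath to the decoded
options, which in turn disables the default rpath spec.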

gcc/ChangeLog:

* config.gcc: Include rpath.opt.
* config/darwin-driver.c (darwin_driver_init): Detect cases
where the user has added rpaths via a -Wl or -Xlinker command
and suppress default rpaths in that case.
* config/darwin.h (DRIVER_SELF_SPECS): Handle -rpath.
(DARWIN_RPATH_SPEC): New.
* config/darwin.opt: Add nodefaultrpath option.
---
 gcc/config/darwin-driver.c | 18 ++
 gcc/config/darwin.h| 11 ++-
 gcc/config/darwin.opt  |  4 
 3 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/gcc/config/darwin-driver.c b/gcc/config/darwin-driver.c
index 4f0c6bad61f..ccc288f20ce 100644
--- a/gcc/config/darwin-driver.c
+++ b/gcc/config/darwin-driver.c
@@ -281,6 +281,7 @@ darwin_driver_init (unsigned int *decoded_options_count,
   const char *vers_string = NULL;
   bool seen_version_min = false;
   bool seen_sysroot_p = false;
+  bool seen_rpath_p = false;
 
   for (i = 1; i < *decoded_options_count; i++)
 {
@@ -349,6 +350,13 @@ darwin_driver_init (unsigned int *decoded_options_count,
  seen_sysroot_p = true;
  break;
 
+   case OPT_Xlinker:
+   case OPT_Wl_:
+ gcc_checking_assert ((*decoded_options)[i].arg);
+ if (strncmp ((*decoded_options)[i].arg, "-rpath", 6) == 0)
+   seen_rpath_p = true;
+ break;
+
default:
  break;
}
@@ -474,4 +482,14 @@ darwin_driver_init (unsigned int *decoded_options_count,
  &(*decoded_options)[*decoded_options_count - 1]);
 }
 }
+
+  if (seen_rpath_p)
+{
+  ++*decoded_options_count;
+  *decoded_options = XRESIZEVEC (struct cl_decoded_option,
+*decoded_options,
+*decoded_options_count);
+  generate_option (OPT_nodefaultrpath, NULL, 1, CL_DRIVER,
+  &(*decoded_options)[*decoded_options_count - 1]);
+}
 }
diff --git a/gcc/config/darwin.h b/gcc/config/darwin.h
index 7ed01efa694..4423933890b 100644
--- a/gcc/config/darwin.h
+++ b/gcc/config/darwin.h
@@ -384,6 +384,7 @@ extern GTY(()) int darwin_ms_struct;
 DARWIN_NOPIE_SPEC \
 DARWIN_RDYNAMIC \
 DARWIN_NOCOMPACT_UNWIND \
+"%{!r:%{!nostdlib:%{!rpath:%{!nodefaultrpath:%(darwin_rpaths) " \
 "}}} % 10.5 mmacosx-version-min= -lcrt1.o)\
@@ -542,6 +544,13 @@ extern GTY(()) int darwin_ms_struct;
 "%{!static:%:version-compare(< 10.6 mmacosx-version-min= -lbundle1.o)  \
   %{fgnu-tm: -lcrttms.o}}"
 
+/* A default rpath, that picks up dependent libraries installed in the same 
+   directory as the one being loaded.  */
+#define DARWIN_RPATH_SPEC \
+  "%:version-compare(>= 10.5 mmacosx-version-min= -rpath) \
+   %:version-compare(>= 10.5 mmacosx-version-min= @loader_path) \
+   %P "
+
 #ifdef HAVE_AS_MMACOSX_VERSION_MIN_OPTION
 /* Emit macosx version (but only major).  */
 #define ASM_MMACOSX_VERSION_MIN_SPEC \
diff --git a/gcc/config/darwin.opt b/gcc/config/darwin.opt
index d1d1f816912..021d67b17c7 100644
--- a/gcc/config/darwin.opt
+++ b/gcc/config/darwin.opt
@@ -233,6 +233,10 @@ no_dead_strip_inits_and_terms
 Driver RejectNegative
 (Obsolete) Current linkers never dead-strip these items, so the option is not 
needed.
 
+nodefaultrpath
+Driver RejectNegative
+Do not add a default rpath to executables, modules or dynamic libraries.
+
 nofixprebinding
 Driver RejectNegative
 (Obsolete after 10.3.9) Set MH_NOPREFIXBINDING, in an executable.
-- 
2.24.3 (Apple Git-128)



[PATCH 4/4] Darwin, Ada : Add loader path as a default rpath.

2021-11-17 Thread Iain Sandoe via Gcc-patches
Allow the Ada runtimes to find GCC runtimes relative to their non-
standard install positions.

gcc/ada/
* gcc-interface/Makefile.in: Add @loader_path runpaths to the
libgnat and libgnarl shared library builds.

---
 gcc/ada/gcc-interface/Makefile.in | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/ada/gcc-interface/Makefile.in 
b/gcc/ada/gcc-interface/Makefile.in
index 53d0739470a..bffe9de4c89 100644
--- a/gcc/ada/gcc-interface/Makefile.in
+++ b/gcc/ada/gcc-interface/Makefile.in
@@ -788,6 +788,7 @@ gnatlib-shared-darwin:
$(GNATRTL_NONTASKING_OBJS) $(LIBGNAT_OBJS) \
$(SO_OPTS) \

-Wl,-install_name,@rpath/libgnat$(hyphen)$(LIBRARY_VERSION)$(soext) \
+   -Wl,-rpath,@loader_path \
$(MISCLIB)
cd $(RTSDIR); `echo "$(GCC_FOR_TARGET)" \
 | sed -e 's,\./xgcc,../../xgcc,' -e 's,-B\./,-B../../,'` 
-dynamiclib $(PICFLAG_FOR_TARGET) \
@@ -795,6 +796,7 @@ gnatlib-shared-darwin:
$(GNATRTL_TASKING_OBJS) \
$(SO_OPTS) \

-Wl,-install_name,@rpath/libgnarl$(hyphen)$(LIBRARY_VERSION)$(soext) \
+   -Wl,-rpath,@loader_path \
$(THREADSLIB) -Wl,libgnat$(hyphen)$(LIBRARY_VERSION)$(soext)
cd $(RTSDIR); $(LN_S) libgnat$(hyphen)$(LIBRARY_VERSION)$(soext) \
libgnat$(soext)
-- 
2.24.3 (Apple Git-128)



[PATCH 0/4] Darwin: Replace environment runpath with embedded [PR88590].

2021-11-17 Thread Iain Sandoe via Gcc-patches
This is a fairly long explanation of the problems being addressed by
the patch set.  Most of the changes are Darwin-specific - a change to
the libtool component allowing for this @rpath and some minor additions
to makefiles where libtool is not in use.  At present, this seems pretty
specific to the GCC build, since we depend on accessing newly-built
components during the bootstrap.

There are additional details relevant to each patch in its own commit
message.

=

Darwin builds shared libraries with information on the runpath as part
of the library name.  For example, /installation/path/for/libfoo.dylib.

That is regarded as two components; the 'runpath' /installation/path/for/
and the library name libfoo.dylib.

This means that (at runtime) two libraries with the same name can be
disambiguated by their runpaths, and potentially used by the same exe.

= Problem #1

That is fine and works well, until we disturb the assumptions by
overriding the library runpath with an environment one: DYLD_LIBRARY_PATH.

Now the library runpath(s) can be discarded and the libraries are first
searched on the basis of that provided by the environment; two libraries
with the same name are no longer distinct (if a library with that name is
found in the environment path).
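
To make the loss of distinction concrete, here is a toy model (not dyld's
implementation; env_candidate and the paths used with it are made up for
illustration) of the first candidate tried when an environment search
directory is set.  Only the leaf name of the install name survives:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy model: with an environment search directory ENV_DIR set, the
   first candidate tried is ENV_DIR plus the leaf name of INSTALL_NAME;
   the directory part of the install name is ignored.  Returns 0 on
   success, -1 if the result does not fit in OUT.  */
static int
env_candidate (char *out, size_t out_size,
               const char *install_name, const char *env_dir)
{
  assert (out != NULL && install_name != NULL && env_dir != NULL);

  /* Keep only what follows the last '/', if there is one.  */
  const char *slash = strrchr (install_name, '/');
  const char *leaf = slash ? slash + 1 : install_name;

  int n = snprintf (out, out_size, "%s/%s", env_dir, leaf);
  return (n < 0 || (size_t) n >= out_size) ? -1 : 0;
}
```

Two libraries installed as /usr/lib/libfoo.dylib and
/opt/new/lib/libfoo.dylib both map to the same candidate under this
model, which is the collision described above.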

This causes problems in configuring, building and testing GCC because we
set the runpath environment at a very high level so that it applies to
stage1+ target configures and stage2+ host configures.  This is needed so
that executables built during those configures get the newly-built libgcc_s
when the target defaults to using a shared libgcc.

However, it also means that every tool that is used during the configure
has its libgcc_s (or any of the newly-built bootstrapped libs) overridden
to use the new one(s) - which might be buggy.

In the testsuite it is more serious - since more target libs come into
play - especially libstdc++.  Several system tools on Darwin use(d) libc++
and that has caused wrong or crashed test output.  In principle,
LD_LIBRARY_PATH on Linux has the same issue - although perhaps there is
less tendency to default to use of shared dependent libs.

Ideally, one would have several environment paths, and some way to use
the appropriate one at the appropriate time.  I experimented with this
as a solution to both this and the following problem, but it proved
unrealistic - since the process would have to be applied to all relevant
OSS projects using auto-tools to be safe - and mostly the uninstalled
use of libraries is a GCC build-time issue.

= Problem #2

A change in security policy for Darwin means that DYLD_LIBRARY_PATH is
now removed from the environment for all system tools (e.g. /bin/sh, env
etc).  This means that all realistic build steps that use any system
utility (like sh) will no longer see the environment runpath and the
only ones available will be those in the libraries.

This breaks GCC's configuration since the steps mentioned above are now
not seeing the newly-built shared libraries, but actually much older ones
installed on the system.  It means that for all Darwin15+ we misconfigure
libstdc++.

/bin/sh is hardwired into autoconf, so one cannot use CONFIG_SHELL to work
around this - because /bin/sh is invoked first, and then passes control to
CONFIG_SHELL.

A second problem is that we cannot bump the SO name for libgcc_s (which
I need to do to solve an EH problem) - since the new SO name is not
available on the system, and therefore none of the stage1+ target configures
will succeed.  This is because the eventual install path is correctly
encoded into the built library, but it is not present at the install
position (and, in general, cannot be installed - since that might not even
be a suitable path on the build system).

This has also meant that we could not do in-tree testing without first
installing the target libraries (which is mostly inconvenient rather than
a show-stopper, but still).

= Tested solution.

Darwin has the ability to make the runpaths install-position independent.

One sets the library runpath to @rpath/ which essentially means "use the
runpath available at the time we want to load this".

One can then add (potentially multiple) runpaths to the executable, and the
built library can be put anywhere convenient - providing we can put that
path into the exe.
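
The substitution can be sketched as follows.  This is a toy model under
stated assumptions, not dyld's code: expand_rpath is a hypothetical
helper, and the real loader tries each embedded runpath in turn until a
file is found, whereas this only shows the expansion for one runpath:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy model of @rpath expansion: substitute RUNPATH for a leading
   "@rpath" in INSTALL_NAME and write the result to OUT.  Returns 0 on
   success, -1 if INSTALL_NAME does not start with "@rpath/" or the
   result does not fit.  */
static int
expand_rpath (char *out, size_t out_size,
              const char *install_name, const char *runpath)
{
  static const char prefix[] = "@rpath/";
  const size_t plen = sizeof prefix - 1;

  assert (out != NULL && install_name != NULL && runpath != NULL);
  if (strncmp (install_name, prefix, plen) != 0)
    return -1;

  int n = snprintf (out, out_size, "%s/%s", runpath, install_name + plen);
  return (n < 0 || (size_t) n >= out_size) ? -1 : 0;
}
```

With a runpath of /opt/gcc/lib (an invented path), the install name
@rpath/libgcc_s.1.dylib expands to /opt/gcc/lib/libgcc_s.1.dylib; an
uninstalled build would instead embed the build-tree directory.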

For GCC's build, test and install process this means that we need at each
stage to build exes with the runpaths that are relevant to the positions
of the dependent libraries.

To do this, we add an rpath for each of the startfile paths.  While we are
building/testing GCC these correspond to (for example gcc/  or 
/libstdc++/src/.libs etc) and then, after the compiler is installed
at its intended install path - these become /compiler/installation/path/lib
etc.

I have tested this widely on i686, powerpc, x86_64 and aarch64 Darwin over
more than a year.

So patch 1 : provides a spec that expands to -rpath xxx for each xxx in the
startfiles (

[PATCH 3/4] Darwin : Allow for configuring Darwin to use embedded runpath.

2021-11-17 Thread Iain Sandoe via Gcc-patches
Recent Darwin versions place constraints on the use of run paths
specified in environment variables.  This breaks some assumptions
in the GCC build.

This change allows the user to configure a Darwin build to use
'@rpath/libraryname.dylib' in library names and then to add an
embedded runpath to executables (and libraries with dependents).

The embedded runpath is added by default unless:

1. the user adds an explicit -rpath / -Wl,-rpath,
2. the user adds '-nodefaultrpath'.

For an installed compiler, it means that any executable built with
that compiler will reference the runtimes installed with the
compiler (equivalent to hard-coding the library path into the name
of the library).

During build-time configurations, any "-B" entries will be added to
the runpath, so the newly-built libraries will be found by exes.

Since the install name is set in libtool, that decision needs to be
available here (but might also cause dependent ones in Makefiles,
so we need to export a conditional).

This facility is not available for Darwin 8 or earlier, however the
existing environment variable runpath does work there.

We default this on for systems where the external DYLD_LIBRARY_PATH
does not work and off for Darwin 8 or earlier.  For systems that can
use either method, if the value is unset, we use the default (which
is currently DYLD_LIBRARY_PATH).

ChangeLog:

* configure: Regenerate.
* configure.ac: Do not add default runpaths to GCC exes
when we are building -static-libstdc++/-static-libgcc (the
default).
* libtool.m4: Add 'enable-darwin-at-runpath'.  Act on the
enable flag to alter Darwin libraries to use @rpath names.

gcc/ChangeLog:

* aclocal.m4: Regenerate.
* configure: Regenerate.

libatomic/ChangeLog:

* Makefile.am: Handle Darwin rpaths.
* Makefile.in: Regenerate.
* configure: Regenerate.
* testsuite/Makefile.in: Regenerate.

libcc1/ChangeLog:

* Makefile.am: Handle Darwin rpaths.
* Makefile.in: Regenerate.
* configure: Regenerate.

libffi/ChangeLog:

* Makefile.am: Handle Darwin rpaths.
* Makefile.in: Regenerate.
* configure: Regenerate.

libgcc/ChangeLog:

* config/t-slibgcc-darwin: Generate libgcc_s
with an @rpath name.

libgfortran/ChangeLog:

* Makefile.am: Handle Darwin rpaths.
* Makefile.in: Regenerate.
* configure: Regenerate.
* configure.ac: Handle Darwin rpaths.

libgomp/ChangeLog:

* Makefile.am: Handle Darwin rpaths.
* Makefile.in: Regenerate.
* configure: Regenerate.

libhsail-rt/ChangeLog:

* configure: Regenerate.

libitm/ChangeLog:

* Makefile.am: Handle Darwin rpaths.
* Makefile.in: Regenerate.
* configure: Regenerate.

libobjc/ChangeLog:

* configure: Regenerate.
* configure.ac: Handle Darwin rpaths.

liboffloadmic/ChangeLog:

* configure: Regenerate.
* plugin/Makefile.in: Regenerate.
* plugin/aclocal.m4: Regenerate.
* plugin/configure: Regenerate.

libphobos/ChangeLog:

* configure: Regenerate.
* libdruntime/Makefile.am: Handle Darwin rpaths.
* libdruntime/Makefile.in: Regenerate.
* src/Makefile.am: Handle Darwin rpaths.
* src/Makefile.in: Regenerate.

libquadmath/ChangeLog:

* Makefile.am: Handle Darwin rpaths.
* Makefile.in: Regenerate.
* configure: Regenerate.
* configure.ac: Handle Darwin rpaths.

libsanitizer/ChangeLog:

* asan/Makefile.am: Handle Darwin rpaths.
* asan/Makefile.in: Regenerate.
* configure: Regenerate.
* hwasan/Makefile.am: Handle Darwin rpaths.
* hwasan/Makefile.in: Regenerate.
* lsan/Makefile.am: Handle Darwin rpaths.
* lsan/Makefile.in: Regenerate.
* tsan/Makefile.am: Handle Darwin rpaths.
* tsan/Makefile.in: Regenerate.
* ubsan/Makefile.am: Handle Darwin rpaths.
* ubsan/Makefile.in: Regenerate.

libssp/ChangeLog:

* Makefile.am: Handle Darwin rpaths.
* Makefile.in: Regenerate.
* configure: Regenerate.

libstdc++-v3/ChangeLog:

* configure: Regenerate.
* src/Makefile.am: Handle Darwin rpaths.
* src/Makefile.in: Regenerate.

Darwin, libtool : Provide a mechanism to enable embedded rpaths.

We need to be able to build libraries with install names that begin
with @rpath so that we can use rpaths in DSOs that depend on
them.  Since the install name is set in libtool, that decision needs
to be available here (but might also cause dependent ones in
Makefiles, so we need to export a conditional).
---
 configure |   5 +
 configure.ac  |   5 +
 gcc/aclocal.m4|  50 +++
 gcc/configure | 157 +++--
 libatomic/Makefile.am |   6 +-
 libatomic/Makefile.in 

Fix gamess miscompare

2021-11-17 Thread Jan Hubicka via Gcc-patches
Hi,
this patch fixes a bug in streaming in the modref access tree that now causes
a failure of the gamess benchmark.  The bug is quite old (present in the GCC 11
release) but it needs quite an interesting series of events to manifest.  In particular:
 1) At lto time ISRA turns some parameters passed by reference to scalar
 2) At lto time modref computes summaries for old parameters and then updates
them but does so quite stupidly believing that the loads from parameters
are now unknown loads (rather than optimized out).
This renders the summary not very useful since it thinks every memory aliasing
int is now accessed (as opposed to a parameter dereference)
 3) At stream in we notice too early that the summary is useless, set the
every_access flag and drop the list.  However while reading the rest of the
summary we overwrite the flag back to 0 which makes us lose part of the summary.
 4) right selection of partitions needs to be done to avoid late modref from
recalculating and thus fixing the summary.
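
Step 3 can be modelled in a few lines.  This is a simplified stand-in,
not the ipa-modref data structure: struct ref_node, read_flag_buggy and
read_flag_fixed are hypothetical names, and the access list is reduced to
a count.  It contrasts assigning the streamed flag with only ever
collapsing on it:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for a modref ref node: either a list of
   accesses (represented here only by its length), or "every access"
   with the list dropped.  */
struct ref_node
{
  bool every_access;
  size_t naccesses;
};

/* Once collapsed, a node holds no accesses and is marked as covering
   every access.  */
static void
collapse (struct ref_node *n)
{
  assert (n != NULL);
  n->naccesses = 0;
  n->every_access = true;
}

/* Buggy merge: assigning the streamed flag can clear a collapse that
   happened earlier while reading the same summary, leaving an empty
   list that no longer says "every access".  */
static void
read_flag_buggy (struct ref_node *n, bool streamed_every_access)
{
  n->every_access = streamed_every_access;
}

/* Fixed merge: the streamed flag can only collapse the node, never
   un-collapse it.  */
static void
read_flag_fixed (struct ref_node *n, bool streamed_every_access)
{
  if (streamed_every_access)
    collapse (n);
}
```

After an early collapse, read_flag_buggy with a streamed value of 0
leaves the node with no accesses and every_access clear, i.e. part of
the summary is lost; read_flag_fixed leaves it collapsed.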

This patch fixes the stream in bug, however we also should fix updating of
summaries.  Martin, would it be possible to extend get_original_index by a "deref"
parameter that would be set to true when a reference was turned to scalar?

Bootstrapped/regtested x86_64-linux. Committed.

gcc/ChangeLog:

2021-11-17  Jan Hubicka  

PR ipa/103246
* ipa-modref.c (read_modref_records): Fix streaming in of every_access
flag.

diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c
index 9ceecdd479f..c94f0589d44 100644
--- a/gcc/ipa-modref.c
+++ b/gcc/ipa-modref.c
@@ -3460,10 +3460,10 @@ read_modref_records (lto_input_block *ib, struct 
data_in *data_in,
  size_t every_access = streamer_read_uhwi (ib);
  size_t naccesses = streamer_read_uhwi (ib);
 
- if (nolto_ref_node)
-   nolto_ref_node->every_access = every_access;
- if (lto_ref_node)
-   lto_ref_node->every_access = every_access;
+ if (nolto_ref_node && every_access)
+   nolto_ref_node->collapse ();
+ if (lto_ref_node && every_access)
+   lto_ref_node->collapse ();
 
  for (size_t k = 0; k < naccesses; k++)
{


[PATCH] PR fortran/101329 - ICE: Invalid expression in gfc_element_size

2021-11-17 Thread Harald Anlauf via Gcc-patches
Dear Fortranners,

as NULL() is not interoperable, we have to reject it.
Confirmed by NAG.  Other compilers show "interesting behavior".

Obvious patch by Steve.  Regtested on x86_64-pc-linux-gnu.

OK for mainline?

Thanks,
Harald

From 52a3ee53f0a12e897c4651fa8378e045653b9fd3 Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Wed, 17 Nov 2021 22:21:24 +0100
Subject: [PATCH] Fortran: NULL() is not interoperable

gcc/fortran/ChangeLog:

	PR fortran/101329
	* check.c (is_c_interoperable): Reject NULL() as it is not
	interoperable.

gcc/testsuite/ChangeLog:

	PR fortran/101329
	* gfortran.dg/pr101329.f90: New test.

Co-authored-by: Steven G. Kargl 
---
 gcc/fortran/check.c|  6 ++
 gcc/testsuite/gfortran.dg/pr101329.f90 | 13 +
 2 files changed, 19 insertions(+)
 create mode 100644 gcc/testsuite/gfortran.dg/pr101329.f90

diff --git a/gcc/fortran/check.c b/gcc/fortran/check.c
index ffa07b510cd..5a5aca10ebe 100644
--- a/gcc/fortran/check.c
+++ b/gcc/fortran/check.c
@@ -5223,6 +5223,12 @@ is_c_interoperable (gfc_expr *expr, const char **msg, bool c_loc, bool c_f_ptr)
 {
   *msg = NULL;

+  if (expr->expr_type == EXPR_NULL)
+{
+  *msg = "NULL() is not interoperable";
+  return false;
+}
+
   if (expr->ts.type == BT_CLASS)
 {
   *msg = "Expression is polymorphic";
diff --git a/gcc/testsuite/gfortran.dg/pr101329.f90 b/gcc/testsuite/gfortran.dg/pr101329.f90
new file mode 100644
index 000..b82210d4e28
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr101329.f90
@@ -0,0 +1,13 @@
+! { dg-do compile }
+! PR fortran/101329 - ICE: Invalid expression in gfc_element_size
+
+program p
+  use iso_c_binding
+  implicit none
+  integer(c_int), pointer :: ip4
+  integer(c_int64_t), pointer :: ip8
+  print *, c_sizeof (c_null_ptr) ! valid
+  print *, c_sizeof (null ())! { dg-error "is not interoperable" }
+  print *, c_sizeof (null (ip4)) ! { dg-error "is not interoperable" }
+  print *, c_sizeof (null (ip8)) ! { dg-error "is not interoperable" }
+end
--
2.26.2



Re: [PATCH] rs6000: Builtins test changes for BFP scalar tests

2021-11-17 Thread Segher Boessenkool
On Wed, Nov 17, 2021 at 02:58:54PM -0600, Bill Schmidt wrote:
> Hi!  This patch is broken out of the previous patch for all the builtins test
> suite adjustments.  Here we have some slight changes in error messages due to
> how the internals have changed between the old and new builtins methods.
> 
> For scalar-extract-exp-2.c we change:
>   error: '__builtin_vec_scalar_extract_exp is not supported in this compiler 
> configuration'
> 
> to:
>   error: '__builtin_vsx_scalar_extract_exp' requires the '-mcpu=power9' 
> option and either the '-m64' or '-mpowerpc64' option
>   note: builtin '__builtin_vec_scalar_extract_exp' requires builtin 
> '__builtin_vsx_scalar_extract_exp'

I don't like that at all.  The user didn't write the _vsx thing, and it
isn't documented either (neither is the _vec one, but that is a separate
issue, specific to this builtin).

> The new message provides more information.  In both cases, it is less than
> ideal that we don't refer to scalar_extract_exp, which is referenced in
> the source line, but this is because scalar_extract_exp is #define'd to
> __builtin_vec_scalar_extract_exp, so it's unavoidable.  Certainly this is no
> worse than before, and arguably better.

It is a macro, enough said there

The __builtin_ implementation should be documented (in the GCC manual,
if not elsewhere).  The warnings should talk about _vec, because the
_vsx thing only exists as implementation detail, and we should never
talk about those.  We don't have errors about adddi3 either!

>   error: '__builtin_vsx_scalar_extract_sig' requires the '-mcpu=power9' 
> option and either the '-m64' or '-mpowerpc64' option
>   note: builtin '__builtin_vec_scalar_extract_sig' requires builtin 
> '__builtin_vsx_scalar_extract_sig'

The rhs in the note does not *exist*, as far as the user is concerned.
One builtin requiring another is all gobbledygook.

> For scalar-test-neg-{2,3,5}.c, we actually change the test case.  This is
> because we deliberately removed some undocumented and pointless overloads,
> where each overload mapped to a single builtin.  These were:
>   __builtin_vec_scalar_test_neg_sp
>   __builtin_vec_scalar_test_neg_dp
>   __builtin_vec_scalar_test_neg_qp
> which are redundant with the "real" overload:
>   __builtin_vec_scalar_test_neg
> The latter maps to three builtins of the appropriate type.

Yes.  And the new ones are undocumented and useless just as well, they
just have better names.


Segher

