Re: [PATCH v2 1/3] x86: Update memcpy/memset inline strategies for Ice Lake

2021-03-31 Thread Hongyu Wang via Gcc-patches
> > So in neither of those scenarios testing maxsize=minsize alone makes too
> > much sense to me... What was the original motivation for differentiating
> > between precisely known size?

There is a case that could meet small maxsize. https://godbolt.org/z/489Tf7ssj

typedef unsigned char e_u8;
#define MAXBC 8
void MixColumn(e_u8 a[4][MAXBC], e_u8 BC)
{
  e_u8 b[4][MAXBC];
  int i, j;

  for(i = 0; i < 4; i++)
for(j = 0; j < BC; j++) a[i][j] = b[i][j];
}

Where BC is unsigned char so maxsize will be 256.

If we set stringop_alg to rep_1_byte the code could be like

 movzbl  %sil, %r8d
 movq%rdi, %rdx
 leaq-40(%rsp), %rax
 movq%r8, %r9
 leaq-8(%rsp), %r10

 testb   %r9b, %r9b
 je  .L5
 movq%rdx, %rdi
 movq%rax, %rsi
 movq%r8, %rcx
 rep movsb

 addq$8, %rax
 addq$8, %rdx
 cmpq%r10, %rax
 jne .L2
 ret

In our test we found this is much slower than current trunk because
rep movsb triggers machine clear events, while in the current trunk
such small size is handled in the loop mov epilogue and rep movsq is
never executed.

So here we disabled inline for unknown size to avoid potential issues like this.

H.J. Lu via Gcc-patches  于2021年4月1日周四 上午1:55写道:
>
> On Wed, Mar 31, 2021 at 10:43 AM Jan Hubicka  wrote:
> >
> > > > Reading through the optimization manual it seems that mosvb is fast for
> > > > small block no matter if the size is hard wired. In that case you
> > > > probably want to check whetehr max_size or expected_size is known to be
> > > > small rather than max_size == min_size and both being small.
> > > >
> > > > But it depends on what CPU really does.
> > > > Honza
> > >
> > > For small data size, rep movsb is faster only under certain conditions.   
> > > We
> > > can continue fine tuning rep movsb.
> >
> > OK, I however wonder why you need condtion maxsize=minsize.
> >  - If CPU is looking for movl $cst, %rcx than we probably want to be
> >sure that it is not moved away fro rep ;movsb by adding fused pattern
> >  - If rep movsb is slower than loop for very small blocks then you want
> >to set lower bound on minsize & expected size, but you do not need
> >to require maxsize=minsize
> >  - If rep movsb is slower than sequence of moves for small blocks then
> >one needs to tweak move by pieces
> >  - If rep movsb is slower for larger blocks than you want to test
> >maxsize and expected size
> > So in neither of those scenarios testing maxsize=minsize alone makes too
> > much sense to me... What was the original motivation for differentiating
> > between precisely known size?
> >
> > I am mostly curious because it is not that uncomon to have small maxsize
> > because we are able to track the object size and using short sequence
> > for those would be nice.
> >
> > Having minsize non-trivial may not be that uncommon these days either
> > given that we track value ranges (and under assumption that
> > memcpy/memset expanders was updated to take these into account).
> >
>
> Hongyu has done some analysis on this.  Hongyu, can you share what
> you got?
>
> Thanks.
>
> --
> H.J.

-- 
Regards,

Hongyu, Wang


Re: [PATCH] c++, abi: Fix abi_tag attribute handling [PR98481]

2021-03-31 Thread Jakub Jelinek via Gcc-patches
On Thu, Apr 01, 2021 at 12:52:20AM -0400, Jason Merrill wrote:
> On 1/8/21 2:29 PM, Jakub Jelinek wrote:
> > On Fri, Jan 08, 2021 at 02:22:59PM -0500, Jason Merrill wrote:
> > > I like the idea to use *walk_subtrees to distinguish between walking
> > > syntactic subtrees and walking type-identity subtrees.  But it should be
> > > more general; how does this look to you?
> > 
> > LGTM, thanks.
> 
> As discussed on IRC, we probably want to fix this in GCC 10 as well, but not
> by default.  The first patch adds ABI v15 and fixes the bug for !v14, so
> -fabi-version=0 has the fix, but the default behavior does not.  The second
> patch adds ABI v15 on trunk.
> 
> It would be nice to make -Wabi/-fabi-compat-version handle this case, but
> that would require a bunch of new code unsuitable for this point in the
> process.
> 
> Does this make sense to you?

LGTM, thanks.

> commit 123f254cce416a4d50445465b88af6d8e754dc5e
> Author: Jakub Jelinek 
> Date:   Thu Jan 7 17:47:18 2021 +0100
> 
> c++, abi: Fix abi_tag attribute handling [PR98481]
> 
> In GCC10 cp_walk_subtrees has been changed to walk template arguments.
> As the following testcase, that changed the mangling of some functions.
> I believe the previous behavior that find_abi_tags_r doesn't recurse into
> template args has been the correct one, but setting *walk_subtrees = 0
> for the types and handling the types subtree walking manually in
> find_abi_tags_r looks too hard, there are a lot of subtrees and details 
> what
> should and shouldn't be walked, both in tree.c (walk_type_fields there,
> which is static) and in cp_walk_subtrees itself.
> 
> The following patch abuses the fact that *walk_subtrees is an int to
> tell cp_walk_subtrees it shouldn't walk the template args.
> 
> But we don't want to introduce an ABI change in the middle of the GCC 10
> cycle, so the GCC 10 version of this patch introduces ABI v15 for the fix,
> which will be available but not default in GCC 10.3.
> 
> Co-authored-by: Jason Merrill 
> 
> gcc/cp/ChangeLog:
> 
> PR c++/98481
> * class.c (find_abi_tags_r): Set *walk_subtrees to 2 instead of 1
> for types.
> (mark_abi_tags_r): Likewise.
> * tree.c (cp_walk_subtrees): If *walk_subtrees_p is 2, look 
> through
> typedefs.
> 
> gcc/testsuite/ChangeLog:
> 
> PR c++/98481
> * g++.dg/abi/abi-tag24.C: New test.
> * g++.dg/abi/abi-tag24a.C: New test.
> * g++.dg/abi/macro0.C: Adjust expected value.
> 
> gcc/ChangeLog:
> 
> PR c++/98481
> * common.opt (fabi-version): Default to 14.
> 
> gcc/c-family/ChangeLog:
> 
> PR c++/98481
> * c-opts.c (c_common_post_options): Bump latest_abi_version.
> 
> diff --git a/gcc/common.opt b/gcc/common.opt
> index 9cc47b16cac..ec5235c3a41 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -956,10 +956,13 @@ Driver Undocumented
>  ; 14: Corrects the mangling of nullptr expression.
>  ; Default in G++ 10.
>  ;
> +; 15: Corrects G++ 10 ABI tag regression [PR98481].
> +; Available, but not default, in G++ 10.3.
> +;
>  ; Additional positive integers will be assigned as new versions of
>  ; the ABI become the default version of the ABI.
>  fabi-version=
> -Common Joined RejectNegative UInteger Var(flag_abi_version) Init(0)
> +Common Joined RejectNegative UInteger Var(flag_abi_version) Init(14)
>  The version of the C++ ABI in use.
>  
>  faggressive-loop-optimizations
> diff --git a/gcc/c-family/c-opts.c b/gcc/c-family/c-opts.c
> index 58ba0948e79..c51d6d34726 100644
> --- a/gcc/c-family/c-opts.c
> +++ b/gcc/c-family/c-opts.c
> @@ -943,7 +943,7 @@ c_common_post_options (const char **pfilename)
>  
>/* Change flag_abi_version to be the actual current ABI level, for the
>   benefit of c_cpp_builtins, and to make comparison simpler.  */
> -  const int latest_abi_version = 14;
> +  const int latest_abi_version = 15;
>/* Generate compatibility aliases for ABI v11 (7.1) by default.  */
>const int abi_compat_default = 11;
>  
> diff --git a/gcc/cp/class.c b/gcc/cp/class.c
> index ed8f9527929..c0101130ba3 100644
> --- a/gcc/cp/class.c
> +++ b/gcc/cp/class.c
> @@ -1450,6 +1450,10 @@ mark_or_check_tags (tree t, tree *tp, abi_tag_data *p, 
> bool val)
>  static tree
>  find_abi_tags_r (tree *tp, int *walk_subtrees, void *data)
>  {
> +  if (TYPE_P (*tp) && *walk_subtrees == 1 && flag_abi_version != 14)
> +/* Tell cp_walk_subtrees to look though typedefs. [PR98481] */
> +*walk_subtrees = 2;
> +
>if (!OVERLOAD_TYPE_P (*tp))
>  return NULL_TREE;
>  
> @@ -1470,6 +1474,10 @@ find_abi_tags_r (tree *tp, int *walk_subtrees, void 
> *data)
>  static tree
>  mark_abi_tags_r (tree *tp, int *walk_subtrees, void *data)
>  {
> +  if (TYPE_P (*tp) && *walk_subtrees == 1 && flag_abi_version != 14)
> + 

Re: [PATCH] c++, abi: Fix abi_tag attribute handling [PR98481]

2021-03-31 Thread Jason Merrill via Gcc-patches

On 1/8/21 2:29 PM, Jakub Jelinek wrote:

On Fri, Jan 08, 2021 at 02:22:59PM -0500, Jason Merrill wrote:

I like the idea to use *walk_subtrees to distinguish between walking
syntactic subtrees and walking type-identity subtrees.  But it should be
more general; how does this look to you?


LGTM, thanks.


As discussed on IRC, we probably want to fix this in GCC 10 as well, but 
not by default.  The first patch adds ABI v15 and fixes the bug for 
!v14, so -fabi-version=0 has the fix, but the default behavior does not. 
 The second patch adds ABI v15 on trunk.


It would be nice to make -Wabi/-fabi-compat-version handle this case, 
but that would require a bunch of new code unsuitable for this point in 
the process.


Does this make sense to you?
commit 123f254cce416a4d50445465b88af6d8e754dc5e
Author: Jakub Jelinek 
Date:   Thu Jan 7 17:47:18 2021 +0100

c++, abi: Fix abi_tag attribute handling [PR98481]

In GCC10 cp_walk_subtrees has been changed to walk template arguments.
As the following testcase, that changed the mangling of some functions.
I believe the previous behavior that find_abi_tags_r doesn't recurse into
template args has been the correct one, but setting *walk_subtrees = 0
for the types and handling the types subtree walking manually in
find_abi_tags_r looks too hard, there are a lot of subtrees and details what
should and shouldn't be walked, both in tree.c (walk_type_fields there,
which is static) and in cp_walk_subtrees itself.

The following patch abuses the fact that *walk_subtrees is an int to
tell cp_walk_subtrees it shouldn't walk the template args.

But we don't want to introduce an ABI change in the middle of the GCC 10
cycle, so the GCC 10 version of this patch introduces ABI v15 for the fix,
which will be available but not default in GCC 10.3.

Co-authored-by: Jason Merrill 

gcc/cp/ChangeLog:

PR c++/98481
* class.c (find_abi_tags_r): Set *walk_subtrees to 2 instead of 1
for types.
(mark_abi_tags_r): Likewise.
* tree.c (cp_walk_subtrees): If *walk_subtrees_p is 2, look through
typedefs.

gcc/testsuite/ChangeLog:

PR c++/98481
* g++.dg/abi/abi-tag24.C: New test.
* g++.dg/abi/abi-tag24a.C: New test.
* g++.dg/abi/macro0.C: Adjust expected value.

gcc/ChangeLog:

PR c++/98481
* common.opt (fabi-version): Default to 14.

gcc/c-family/ChangeLog:

PR c++/98481
* c-opts.c (c_common_post_options): Bump latest_abi_version.

diff --git a/gcc/common.opt b/gcc/common.opt
index 9cc47b16cac..ec5235c3a41 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -956,10 +956,13 @@ Driver Undocumented
 ; 14: Corrects the mangling of nullptr expression.
 ; Default in G++ 10.
 ;
+; 15: Corrects G++ 10 ABI tag regression [PR98481].
+; Available, but not default, in G++ 10.3.
+;
 ; Additional positive integers will be assigned as new versions of
 ; the ABI become the default version of the ABI.
 fabi-version=
-Common Joined RejectNegative UInteger Var(flag_abi_version) Init(0)
+Common Joined RejectNegative UInteger Var(flag_abi_version) Init(14)
 The version of the C++ ABI in use.
 
 faggressive-loop-optimizations
diff --git a/gcc/c-family/c-opts.c b/gcc/c-family/c-opts.c
index 58ba0948e79..c51d6d34726 100644
--- a/gcc/c-family/c-opts.c
+++ b/gcc/c-family/c-opts.c
@@ -943,7 +943,7 @@ c_common_post_options (const char **pfilename)
 
   /* Change flag_abi_version to be the actual current ABI level, for the
  benefit of c_cpp_builtins, and to make comparison simpler.  */
-  const int latest_abi_version = 14;
+  const int latest_abi_version = 15;
   /* Generate compatibility aliases for ABI v11 (7.1) by default.  */
   const int abi_compat_default = 11;
 
diff --git a/gcc/cp/class.c b/gcc/cp/class.c
index ed8f9527929..c0101130ba3 100644
--- a/gcc/cp/class.c
+++ b/gcc/cp/class.c
@@ -1450,6 +1450,10 @@ mark_or_check_tags (tree t, tree *tp, abi_tag_data *p, bool val)
 static tree
 find_abi_tags_r (tree *tp, int *walk_subtrees, void *data)
 {
+  if (TYPE_P (*tp) && *walk_subtrees == 1 && flag_abi_version != 14)
+/* Tell cp_walk_subtrees to look though typedefs. [PR98481] */
+*walk_subtrees = 2;
+
   if (!OVERLOAD_TYPE_P (*tp))
 return NULL_TREE;
 
@@ -1470,6 +1474,10 @@ find_abi_tags_r (tree *tp, int *walk_subtrees, void *data)
 static tree
 mark_abi_tags_r (tree *tp, int *walk_subtrees, void *data)
 {
+  if (TYPE_P (*tp) && *walk_subtrees == 1 && flag_abi_version != 14)
+/* Tell cp_walk_subtrees to look though typedefs.  */
+*walk_subtrees = 2;
+
   if (!OVERLOAD_TYPE_P (*tp))
 return NULL_TREE;
 
diff --git a/gcc/cp/tree.c b/gcc/cp/tree.c
index b36ca4eddc0..10b818d1370 100644
--- a/gcc/cp/tree.c
+++ b/gcc/cp/tree.c
@@ -5055,8 +5055,18 @@ cp_walk_subtrees (tree *tp, int *walk_subtrees_p, 

Re: Small refactoring of cgraph_node::release_body

2021-03-31 Thread Martin Liška
On 3/31/21 8:45 PM, David Edelsohn via Gcc-patches wrote:
> This patch is causing new crashes in the testsuite.
> 
> ICE in release_body, at graph.c:1863
> ranges offset out of range

Hello.

Should be fixed with 23ce9945d5efa77c96161443f68e03664705ada3.

Martin

> 
> Thanks, David
> 



Re: [PATCH] PR fortran/99840 - [8/9/10/11 Regression] ICE in gfc_simplify_matmul, at fortran/simplify.c:4777

2021-03-31 Thread Jerry DeLisle

Yes OK for trunk and affected branches.

Thanks,

Jerry

On 3/31/21 2:08 PM, Harald Anlauf via Fortran wrote:

Dear all,

the simplification of the TRANSPOSE of a zero-sized array would lead to
an ICE if the result was used in a subsequent simplification of a MATMUL.
The reason was the lack of the proper initialization of the shape, which
is mpz_t.  Use mpz_init_set instead of mpz_set.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

Since this is a regression, backport to all affected branches?

Thanks,
Harald


PR fortran/99840 - ICE in gfc_simplify_matmul, at fortran/simplify.c:4777

The simplification of the transposition of a constant array shall properly
initialize and set the shape of the result.

gcc/fortran/ChangeLog:

PR fortran/99840
* simplify.c (gfc_simplify_transpose): Properly initialize
resulting shape.

gcc/testsuite/ChangeLog:

PR fortran/99840
* gfortran.dg/transpose_5.f90: New test.





[committed] analyzer: avoid printing '' for SSA names [PR99771]

2021-03-31 Thread David Malcolm via Gcc-patches
We don't want to print '' in our diagnostics, but
PR analyzer/99771 lists various cases where -fanalyzer does, due to
using the SSA_NAME for a temporary when determining the best tree to
use.

This can happen in two ways:

(a) ...when a better expression than the SSA_NAME could be built, but
finding it requires traversing the relationships in the region_model
in a graph-like way, rather than by considering individual svalues and
regions.

(b) ...when the only remaining user of the underlying svalue is the
SSA_NAME, typically due to the diagnostic referring to a temporary.

I've been experimenting with fixing (a), but don't have a good fix yet.
In the meantime, this patch addresses (b) by detecting if we have
the SSA_NAME for a temporary, and, for the cases where it's possible,
reconstructing a tree by walking the def-stmts.  This fixes various
cases of (b) and ameliorates some cases of (a).

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r11-7941-ge4bb1bd60a9fd1bed36092a990aa5fed5d45bfa6.

gcc/analyzer/ChangeLog:
PR analyzer/99771
* analyzer.cc (maybe_reconstruct_from_def_stmt): New.
(fixup_tree_for_diagnostic_1): New.
(fixup_tree_for_diagnostic): New.
* analyzer.h (fixup_tree_for_diagnostic): New decl.
* checker-path.cc (call_event::get_desc): Call
fixup_tree_for_diagnostic and use it for the call_with_state call.
(warning_event::get_desc): Likewise for the final_event and
make_label_text calls.
* engine.cc (impl_region_model_context::on_state_leak): Likewise
for the on_leak and add_diagnostic calls.
* region-model.cc (region_model::get_representative_tree):
Likewise for the result.

gcc/testsuite/ChangeLog:
PR analyzer/99771
* gcc.dg/analyzer/data-model-10.c: Update expected output.
* gcc.dg/analyzer/malloc-ipa-13.c: Likewise.
* gcc.dg/analyzer/malloc-ipa-13a.c: New test.
* gcc.dg/analyzer/pr99771-1.c: New test.
---
 gcc/analyzer/analyzer.cc  | 128 ++
 gcc/analyzer/analyzer.h   |   1 +
 gcc/analyzer/checker-path.cc  |  10 +-
 gcc/analyzer/engine.cc|   5 +-
 gcc/analyzer/region-model.cc  |   4 +-
 gcc/testsuite/gcc.dg/analyzer/data-model-10.c |   3 +-
 gcc/testsuite/gcc.dg/analyzer/malloc-ipa-13.c |   3 +-
 .../gcc.dg/analyzer/malloc-ipa-13a.c  |  38 ++
 gcc/testsuite/gcc.dg/analyzer/pr99771-1.c |  63 +
 9 files changed, 244 insertions(+), 11 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-ipa-13a.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/pr99771-1.c

diff --git a/gcc/analyzer/analyzer.cc b/gcc/analyzer/analyzer.cc
index df8d881f3cd..2b4cffd08f5 100644
--- a/gcc/analyzer/analyzer.cc
+++ b/gcc/analyzer/analyzer.cc
@@ -60,6 +60,134 @@ get_stmt_location (const gimple *stmt, function *fun)
   return stmt->location;
 }
 
+static tree
+fixup_tree_for_diagnostic_1 (tree expr, hash_set *visited);
+
+/*  Subroutine of fixup_tree_for_diagnostic_1, called on SSA names.
+Attempt to reconstruct a a tree expression for SSA_NAME
+based on its def-stmt.
+SSA_NAME must be non-NULL.
+VISITED must be non-NULL; it is used to ensure termination.
+
+Return NULL_TREE if there is a problem.  */
+
+static tree
+maybe_reconstruct_from_def_stmt (tree ssa_name,
+hash_set *visited)
+{
+  /* Ensure termination.  */
+  if (visited->contains (ssa_name))
+return NULL_TREE;
+  visited->add (ssa_name);
+
+  gimple *def_stmt = SSA_NAME_DEF_STMT (ssa_name);
+
+  switch (gimple_code (def_stmt))
+{
+default:
+  gcc_unreachable ();
+case GIMPLE_NOP:
+case GIMPLE_PHI:
+  /* Can't handle these.  */
+  return NULL_TREE;
+case GIMPLE_ASSIGN:
+  {
+   enum tree_code code = gimple_assign_rhs_code (def_stmt);
+
+   /* Reverse the effect of extract_ops_from_tree during
+  gimplification.  */
+   switch (get_gimple_rhs_class (code))
+ {
+ default:
+ case GIMPLE_INVALID_RHS:
+   gcc_unreachable ();
+ case GIMPLE_TERNARY_RHS:
+ case GIMPLE_BINARY_RHS:
+ case GIMPLE_UNARY_RHS:
+   {
+ tree t = make_node (code);
+ TREE_TYPE (t) = TREE_TYPE (ssa_name);
+ unsigned num_rhs_args = gimple_num_ops (def_stmt) - 1;
+ for (unsigned i = 0; i < num_rhs_args; i++)
+   {
+ tree op = gimple_op (def_stmt, i + 1);
+ if (op)
+   {
+ op = fixup_tree_for_diagnostic_1 (op, visited);
+ if (op == NULL_TREE)
+   return NULL_TREE;
+   }
+ TREE_OPERAND (t, i) = op;
+   }
+ return t;
+   }
+ case GIMPLE_SINGLE_RHS:
+   {
+

[PATCH] tree-optimization: Optimize division followed by multiply [PR95176]

2021-03-31 Thread Victor Tong via Gcc-patches
Hello,

This patch fixes PR tree-optimization/95176. A new pattern in match.pd was 
added to transform "a * (b / a)" --> "b - (b % a)". A new test case was also 
added to cover this scenario.

The new pattern interfered with the existing pattern of "X - (X / Y) * Y". In 
some cases (such as in fn4() in gcc/testsuite/gcc.dg/fold-minus-6.c), the new 
pattern is applied causing the existing pattern to no longer apply. This 
results in worse code generation because the expression is left as "X - (X - 
Y)". An additional subtraction pattern of "X - (X - Y) --> Y" was added to this 
patch to avoid this regression.

I also didn't remove the existing pattern because it triggered in more cases 
than the new pattern because of a tree_invariant_p check that's inserted by 
genmatch for the new pattern.

I verified that all "make -k check" tests pass when targeting 
x86_64-pc-linux-gnu.

2021-03-31  Victor Tong   

gcc/ChangeLog:

* match.pd: Two new patterns: One to optimize division followed by 
multiply and the other to avoid a regression as explained above

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/20030807-10.c: Update existing test to look for a 
subtraction because a shift is no longer emitted
* gcc.dg/pr95176.c: New test to cover optimizing division followed by 
multiply

I don't have write access to the GCC repo but I've completed the FSF paperwork 
as I plan to make more contributions in the future. I'm looking for a 
sponsorship from an existing GCC maintainer before applying for write access.

Thanks,
Victor

pr95176.patch
Description: pr95176.patch


[PATCH] sra: Fix bug in grp_write propagation (PR 97009)

2021-03-31 Thread Martin Jambor
Hi,

SRA represents parts of aggregates which are arrays accessed with
unknown index as "unscalarizable regions."  When there are two such
regions one within another and the outer is only read whereas the
inner is written to, SRA fails to propagate that write information
across assignments.  This means that a second aggregate can contain
data while SRA thinks it does not and the pass can wrongly eliminate
big chunks of assignment from that second aggregate into a third
aggregate, which is what happens in PR 97009.

Fixed by checking all children of unscalariable accesses for the
grp_write flag.

Bootstrap and testing on top of trunk and the gcc-10 branch is underway
and I also plan to backport this to gcc-9.  OK if they pass?

Thanks,

Martin


gcc/ChangeLog:

2021-03-31  Martin Jambor  

PR tree-optimization/97009
* tree-sra.c (access_or_its_child_written): New function.
(propagate_subaccesses_from_rhs): Use it instead of a simple grp_write
test.

gcc/testsuite/ChangeLog:

2021-03-31  Martin Jambor  

PR tree-optimization/97009
* gcc.dg/tree-ssa/pr97009.c: New test.
---
 gcc/testsuite/gcc.dg/tree-ssa/pr97009.c | 66 +
 gcc/tree-sra.c  | 15 +-
 2 files changed, 80 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr97009.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr97009.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr97009.c
new file mode 100644
index 000..741dbc270c3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr97009.c
@@ -0,0 +1,66 @@
+/* { dg-do run } */
+/* { dg-options "-O1" } */
+
+static int __attribute__((noipa))
+get_5 (void)
+{
+  return 5;
+}
+
+static int __attribute__((noipa))
+verify_5 (int v)
+{
+  if (v != 5)
+__builtin_abort ();
+}
+
+struct T
+{
+  int w;
+  int a[4];
+};
+
+struct S
+{
+  int v;
+  int x;
+  struct T t[2];
+  char alotofstuff[128];
+};
+
+volatile int vol;
+
+void __attribute__((noipa))
+consume_t (struct T t)
+{
+  vol = t.a[0];
+}
+
+int __attribute__((noipa))
+foo (int l1, int l2)
+{
+  struct S s1, s2, s3;
+  int i, j;
+
+  s1.v = get_5 ();
+  for (i = 0; i < l1; i++)
+{
+  for (j = 0; j < l2; j++)
+   s1.t[i].a[j] = get_5 ();
+  consume_t(s1.t[i]);
+}
+
+  s2 = s1;
+
+  s3 = s2;
+  for (i = 0; i < l1; i++)
+for (j = 0; j < l2; j++)
+  verify_5 (s3.t[i].a[j]);
+}
+
+int
+main (int argc, char *argv[])
+{
+  foo (2, 4);
+  return 0;
+}
diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index d177f1ba11c..8dfc923ed7e 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -2723,6 +2723,19 @@ budget_for_propagation_access (tree decl)
   return true;
 }
 
+/* Return true if ACC or any of its subaccesses has grp_child set.  */
+
+static bool
+access_or_its_child_written (struct access *acc)
+{
+  if (acc->grp_write)
+return true;
+  for (struct access *sub = acc->first_child; sub; sub = sub->next_sibling)
+if (access_or_its_child_written (sub))
+  return true;
+  return false;
+}
+
 /* Propagate subaccesses and grp_write flags of RACC across an assignment link
to LACC.  Enqueue sub-accesses as necessary so that the write flag is
propagated transitively.  Return true if anything changed.  Additionally, if
@@ -2836,7 +2849,7 @@ propagate_subaccesses_from_rhs (struct access *lacc, 
struct access *racc)
   if (rchild->grp_unscalarizable_region
  || !budget_for_propagation_access (lacc->base))
{
- if (rchild->grp_write && !lacc->grp_write)
+ if (!lacc->grp_write && access_or_its_child_written (rchild))
{
  ret = true;
  subtree_mark_written_and_rhs_enqueue (lacc);
-- 
2.30.2



[PATCH] PR fortran/99840 - [8/9/10/11 Regression] ICE in gfc_simplify_matmul, at fortran/simplify.c:4777

2021-03-31 Thread Harald Anlauf via Gcc-patches
Dear all,

the simplification of the TRANSPOSE of a zero-sized array would lead to
an ICE if the result was used in a subsequent simplification of a MATMUL.
The reason was the lack of the proper initialization of the shape, which
is mpz_t.  Use mpz_init_set instead of mpz_set.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

Since this is a regression, backport to all affected branches?

Thanks,
Harald


PR fortran/99840 - ICE in gfc_simplify_matmul, at fortran/simplify.c:4777

The simplification of the transposition of a constant array shall properly
initialize and set the shape of the result.

gcc/fortran/ChangeLog:

PR fortran/99840
* simplify.c (gfc_simplify_transpose): Properly initialize
resulting shape.

gcc/testsuite/ChangeLog:

PR fortran/99840
* gfortran.dg/transpose_5.f90: New test.

diff --git a/gcc/fortran/simplify.c b/gcc/fortran/simplify.c
index 388aca7c38c..c27b47aa98f 100644
--- a/gcc/fortran/simplify.c
+++ b/gcc/fortran/simplify.c
@@ -8123,8 +8123,8 @@ gfc_simplify_transpose (gfc_expr *matrix)
 			   >where);
   result->rank = 2;
   result->shape = gfc_get_shape (result->rank);
-  mpz_set (result->shape[0], matrix->shape[1]);
-  mpz_set (result->shape[1], matrix->shape[0]);
+  mpz_init_set (result->shape[0], matrix->shape[1]);
+  mpz_init_set (result->shape[1], matrix->shape[0]);

   if (matrix->ts.type == BT_CHARACTER)
 result->ts.u.cl = matrix->ts.u.cl;
diff --git a/gcc/testsuite/gfortran.dg/transpose_5.f90 b/gcc/testsuite/gfortran.dg/transpose_5.f90
new file mode 100644
index 000..682b1c8552b
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/transpose_5.f90
@@ -0,0 +1,8 @@
+! { dg-do compiler }
+! { dg-options "-O2" }
+! PR fortran/99840 - ICE in gfc_simplify_matmul, at fortran/simplify.c:4777
+program p
+  integer, parameter :: x(0,0) = 0
+  integer :: y(0,0)
+  y = matmul (x, transpose(x))
+end


[committed] [PR99781] Update correctly reg notes in LRA for multi-registers and set up biggest mode safely

2021-03-31 Thread Vladimir Makarov via Gcc-patches

The following patch fixes

   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99781

The patch was successfully bootstrapped and tested on x86-64, ppc64le, 
and aarch64.


commit 1458059fc1faf6170f2fe45159065f91876307ac
Author: Vladimir N. Makarov 
Date:   Wed Mar 31 13:26:30 2021 -0400

[PR99781] Update correctly reg notes in LRA for multi-registers and set up biggest mode safely

The PR is about incorrect use of partial_subreg_p for unordered modes.
I found 2 places of dangerous comparing unordered modes in LRA.  The
patch removes dangerous use of paradoxical_subreg_p and
partial_subreg_p in split_reg and process_bb_lives.  The both places
used them to solve PR77761 long time ago.  But the problem was also
fixed by later patches too (if there is no hard reg explicitly, it
have VOIDmode and we use natural mode to split hard reg live,
otherwise we use the biggest explicitly used mode for hard reg
splitting).  The PR also says about inaccurate update of reg notes in
LRA.  It happens for reg notes which refer for multi-registers.  The
patch also fixes this issue.

gcc/ChangeLog:

PR target/99781
* lra-constraints.c (split_reg): Don't check paradoxical_subreg_p.
* lra-lives.c (clear_sparseset_regnos, regnos_in_sparseset_p): New
functions.
(process_bb_lives): Don't update biggest mode of hard reg for
implicit in multi-register group.  Use the new functions for
updating dead_set and unused_set by register notes.

gcc/testsuite/ChangeLog:

PR target/99781
* g++.target/aarch64/sve/pr99781.C: New.

diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
index 9993065f8d6..62bcfc31772 100644
--- a/gcc/lra-constraints.c
+++ b/gcc/lra-constraints.c
@@ -5796,12 +5796,11 @@ split_reg (bool before_p, int original_regno, rtx_insn *insn,
   nregs = 1;
   mode = lra_reg_info[hard_regno].biggest_mode;
   machine_mode reg_rtx_mode = GET_MODE (regno_reg_rtx[hard_regno]);
-  /* A reg can have a biggest_mode of VOIDmode if it was only ever seen
-	 as part of a multi-word register.  In that case, or if the biggest
-	 mode was larger than a register, just use the reg_rtx.  Otherwise,
-	 limit the size to that of the biggest access in the function.  */
-  if (mode == VOIDmode
-	  || paradoxical_subreg_p (mode, reg_rtx_mode))
+  /* A reg can have a biggest_mode of VOIDmode if it was only ever seen as
+	 part of a multi-word register.  In that case, just use the reg_rtx.
+	 Otherwise, limit the size to that of the biggest access in the
+	 function.  */
+  if (mode == VOIDmode)
 	{
 	  original_reg = regno_reg_rtx[hard_regno];
 	  mode = reg_rtx_mode;
diff --git a/gcc/lra-lives.c b/gcc/lra-lives.c
index 0bddca13fee..29531843c63 100644
--- a/gcc/lra-lives.c
+++ b/gcc/lra-lives.c
@@ -615,6 +615,32 @@ reg_early_clobber_p (const struct lra_insn_reg *reg, int n_alt)
 	 && TEST_BIT (reg->early_clobber_alts, n_alt)));
 }
 
+/* Clear pseudo REGNO in SET or all hard registers of REGNO in MODE in SET.  */
+static void
+clear_sparseset_regnos (sparseset set, int regno, enum machine_mode mode)
+{
+  if (regno >= FIRST_PSEUDO_REGISTER)
+{
+  sparseset_clear_bit (dead_set, regno);
+  return;
+}
+  for (int last = end_hard_regno (mode, regno); regno < last; regno++)
+sparseset_clear_bit (set, regno);
+}
+
+/* Return true if pseudo REGNO is in SET or all hard registers of REGNO in MODE
+   are in SET.  */
+static bool
+regnos_in_sparseset_p (sparseset set, int regno, enum machine_mode mode)
+{
+  if (regno >= FIRST_PSEUDO_REGISTER)
+return sparseset_bit_p (dead_set, regno);
+  for (int last = end_hard_regno (mode, regno); regno < last; regno++)
+if (!sparseset_bit_p (set, regno))
+  return false;
+  return true;
+}
+
 /* Process insns of the basic block BB to update pseudo live ranges,
pseudo hard register conflicts, and insn notes.  We do it on
backward scan of BB insns.  CURR_POINT is the program point where
@@ -739,24 +765,13 @@ process_bb_lives (basic_block bb, int _point, bool dead_insn_p)
   /* Update max ref width and hard reg usage.  */
   for (reg = curr_id->regs; reg != NULL; reg = reg->next)
 	{
-	  int i, regno = reg->regno;
+	  int regno = reg->regno;
 
 	  if (partial_subreg_p (lra_reg_info[regno].biggest_mode,
 reg->biggest_mode))
 	lra_reg_info[regno].biggest_mode = reg->biggest_mode;
 	  if (HARD_REGISTER_NUM_P (regno))
-	{
-	  lra_hard_reg_usage[regno] += freq;
-	  /* A hard register explicitly can be used in small mode,
-		 but implicitly it can be used in natural mode as a
-		 part of multi-register group.  Process this case
-		 here.  */
-	  for (i = 1; i < hard_regno_nregs (regno, reg->biggest_mode); i++)
-		if (partial_subreg_p (lra_reg_info[regno + i].biggest_mode,
-  GET_MODE (regno_reg_rtx[regno + i])))
-		  lra_reg_info[regno + 

Re: [r11-7926 Regression] FAIL: libgomp.c/declare-variant-1.c (test for excess errors) on Linux/x86_64

2021-03-31 Thread H.J. Lu via Gcc-patches
On Wed, Mar 31, 2021 at 11:21 AM Jan Hubicka  wrote:
>
> > On Linux/x86_64,
> >
> > d7145b4bb6c8729a1e782373cb6256c06ed60465 is the first bad commit
> > commit d7145b4bb6c8729a1e782373cb6256c06ed60465
> > Author: Jan Hubicka 
> > Date:   Wed Mar 31 11:35:29 2021 +0200
> >
> > Small refactoring of cgraph_node::release_body
> >
> > caused
> >
> > FAIL: g++.dg/ipa/devirt-7.C  -std=gnu++14 (internal compiler error)
>
> This should be fixed by the attached patch. Sorry for this - I added the
> check at last minute and apparently did not fully retest.
> We now have clones of thunks that does make sense even after body was
> released.  What does not make much sense to me is why we consider them
> clones at all, but that is something to revisit next stage1.
>
> Honza
>
> gcc/ChangeLog:
>
> PR lto/99447
> * cgraph.c (cgraph_node::release_body): Fix overactive check.
>
> diff --git a/gcc/cgraph.c b/gcc/cgraph.c
> index b77c676a58a..d7c78d518bc 100644
> --- a/gcc/cgraph.c
> +++ b/gcc/cgraph.c
> @@ -1860,7 +1860,15 @@ cgraph_node::release_body (bool keep_arguments)
>lto_free_function_in_decl_state_for_node (this);
>lto_file_data = NULL;
>  }
> -  gcc_assert (!clones);
> +  if (flag_checking && clones)
> +{
> +  /* It is invalid to release body before materializing clones except
> +for thunks that don't really need a body.  Verify also that we do
> +not leak pointers to the call statements.  */
> +  for (cgraph_node *node = clones; node;
> +  node = node->next_sibling_clone)
> +   gcc_assert (node->thunk && !node->callees->call_stmt);
> +}
>remove_callees ();
>remove_all_references ();
>  }
> > FAIL: g++.dg/ipa/devirt-7.C  -std=gnu++14 (test for excess errors)
> > FAIL: g++.dg/ipa/devirt-7.C  -std=gnu++17 (internal compiler error)
> > FAIL: g++.dg/ipa/devirt-7.C  -std=gnu++17 (test for excess errors)
> > FAIL: g++.dg/ipa/devirt-7.C  -std=gnu++2a (internal compiler error)
> > FAIL: g++.dg/ipa/devirt-7.C  -std=gnu++2a (test for excess errors)
> > FAIL: g++.dg/ipa/devirt-7.C  -std=gnu++98 (internal compiler error)
> > FAIL: g++.dg/ipa/devirt-7.C  -std=gnu++98 (test for excess errors)
> > FAIL: g++.dg/ipa/pr71146.C  -std=gnu++14 (internal compiler error)
> > FAIL: g++.dg/ipa/pr71146.C  -std=gnu++14 (test for excess errors)
> > FAIL: g++.dg/ipa/pr71146.C  -std=gnu++17 (internal compiler error)
> > FAIL: g++.dg/ipa/pr71146.C  -std=gnu++17 (test for excess errors)
> > FAIL: g++.dg/ipa/pr71146.C  -std=gnu++2a (internal compiler error)
> > FAIL: g++.dg/ipa/pr71146.C  -std=gnu++2a (test for excess errors)
> > FAIL: g++.dg/ipa/pr71146.C  -std=gnu++98 (internal compiler error)
> > FAIL: g++.dg/ipa/pr71146.C  -std=gnu++98 (test for excess errors)
> > FAIL: g++.dg/ipa/pr85421.C   (internal compiler error)
> > FAIL: g++.dg/ipa/pr85421.C   (test for excess errors)
> > FAIL: g++.dg/ipa/pr92528.C   (internal compiler error)
> > FAIL: g++.dg/ipa/pr92528.C   (test for excess errors)
> > FAIL: g++.dg/torture/covariant-1.C   -O2 -flto -fno-use-linker-plugin 
> > -flto-partition=none  (internal compiler error)
> > FAIL: g++.dg/torture/covariant-1.C   -O2 -flto -fno-use-linker-plugin 
> > -flto-partition=none  (test for excess errors)
> > FAIL: g++.dg/torture/covariant-1.C   -O2 -flto -fuse-linker-plugin 
> > -fno-fat-lto-objects  (internal compiler error)
> > FAIL: g++.dg/torture/covariant-1.C   -O2 -flto -fuse-linker-plugin 
> > -fno-fat-lto-objects  (test for excess errors)
> > FAIL: g++.dg/torture/pr46287.C   -O2 -flto -fno-use-linker-plugin 
> > -flto-partition=none  (internal compiler error)
> > FAIL: g++.dg/torture/pr46287.C   -O2 -flto -fno-use-linker-plugin 
> > -flto-partition=none  (test for excess errors)
> > FAIL: g++.dg/torture/pr46287.C   -O2 -flto -fuse-linker-plugin 
> > -fno-fat-lto-objects  (internal compiler error)
> > FAIL: g++.dg/torture/pr46287.C   -O2 -flto -fuse-linker-plugin 
> > -fno-fat-lto-objects  (test for excess errors)
> > FAIL: g++.dg/torture/pr46287.C   -O2  (internal compiler error)
> > FAIL: g++.dg/torture/pr46287.C   -O2  (test for excess errors)
> > FAIL: g++.dg/torture/pr46287.C   -O3 -fomit-frame-pointer -funroll-loops 
> > -fpeel-loops -ftracer -finline-functions  (internal compiler error)
> > FAIL: g++.dg/torture/pr46287.C   -O3 -fomit-frame-pointer -funroll-loops 
> > -fpeel-loops -ftracer -finline-functions  (test for excess errors)
> > FAIL: g++.dg/torture/pr46287.C   -O3 -g  (internal compiler error)
> > FAIL: g++.dg/torture/pr46287.C   -O3 -g  (test for excess errors)
> > FAIL: g++.dg/torture/pr78692.C   -O2 -flto -fno-use-linker-plugin 
> > -flto-partition=none  (internal compiler error)
> > FAIL: g++.dg/torture/pr78692.C   -O2 -flto -fno-use-linker-plugin 
> > -flto-partition=none  (test for excess errors)
> > FAIL: g++.dg/torture/pr78692.C   -O2  (internal compiler error)
> > FAIL: g++.dg/torture/pr78692.C   -O2  (test for excess errors)
> > FAIL: g++.dg/torture/pr78692.C   -O3 -g  (internal compiler 

[PATCH] modules : Make sure we include in system.h.

2021-03-31 Thread Iain Sandoe

Hi,

This fixes a stage 1 bootstrap fail on some Darwin versions when the
bootstrap compiler is clang / libc++ from Xcode.

bootstrapped on x86_64-darwin16, x86_64-linux-gnu
OK for master?
thanks
Iain




It appears that many targets include the map header transitively in
other std headers included from system.h.  However there are some
editions of clang/libc++ in Xcode that do not, which results in a
bootstrap fail - since when resolver.h is included  there is then a
conflict in declaring abort().

The fix is to ensure that map is pulled in by system.h and before
resolver.h is included.  As a precautionary measure and to alert
anyone perhaps adding another header to resolver.h this patch also
gates the direct includes there on !IN_GCC.

c++tools/ChangeLog:

* resolver.h: Do not include std headers directly when
building in GCC.

gcc/cp/ChangeLog:

* mapper-client.cc (INCLUDE_MAP): New; require map to be
included from system.h.
* mapper-resolver.cc (INCLUDE_MAP): Likewise.
---
 c++tools/resolver.h   | 2 ++
 gcc/cp/mapper-client.cc   | 1 +
 gcc/cp/mapper-resolver.cc | 1 +
 3 files changed, 4 insertions(+)

diff --git a/c++tools/resolver.h b/c++tools/resolver.h
index 19339125b26..a9547bf6994 100644
--- a/c++tools/resolver.h
+++ b/c++tools/resolver.h
@@ -24,8 +24,10 @@ along with GCC; see the file COPYING3.  If not see
 // Mapper interface for client and server bits
 #include "cody.hh"
 // C++
+#if !IN_GCC
 #include 
 #include 
+#endif

 // This is a GCC class, so GCC coding conventions on new bits.
 class module_resolver : public Cody::Resolver
diff --git a/gcc/cp/mapper-client.cc b/gcc/cp/mapper-client.cc
index 774e2b2b095..b9e02168d55 100644
--- a/gcc/cp/mapper-client.cc
+++ b/gcc/cp/mapper-client.cc
@@ -26,6 +26,7 @@ along with GCC; see the file COPYING3.  If not see
 #endif
 #define INCLUDE_STRING
 #define INCLUDE_VECTOR
+#define INCLUDE_MAP
 #include "system.h"

 #include "line-map.h"
diff --git a/gcc/cp/mapper-resolver.cc b/gcc/cp/mapper-resolver.cc
index bcf6c8871e5..db443fb4948 100644
--- a/gcc/cp/mapper-resolver.cc
+++ b/gcc/cp/mapper-resolver.cc
@@ -24,6 +24,7 @@ along with GCC; see the file COPYING3.  If not see
 #define INCLUDE_STRING
 #define INCLUDE_VECTOR
 #define INCLUDE_ALGORITHM
+#define INCLUDE_MAP
 #include "system.h"

 // We don't want or need to be aware of networking
--
2.24.1




[PATCH] bswap: Handle bswapping of pointers [PR96573]

2021-03-31 Thread Jakub Jelinek via Gcc-patches
Hi!

In GCC8/9 we used to optimize this into a bswap, but we no longer do.
Handling byteswapping of pointers is easy, all we need is to allow them,
for the __builtin_bswap* we already use TYPE_PRECISION to determine
the precision and we cast the operand and result to the correct type
if they aren't uselessly convertible to what the builtin expects.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2021-03-31  Jakub Jelinek  

PR tree-optimization/96573
* gimple-ssa-store-merging.c (init_symbolic_number): Handle
also pointer types.

* gcc.dg/pr96573.c: New test.

--- gcc/gimple-ssa-store-merging.c.jj   2021-02-22 17:54:05.701798074 +0100
+++ gcc/gimple-ssa-store-merging.c  2021-03-31 11:30:55.519845647 +0200
@@ -333,7 +333,7 @@ init_symbolic_number (struct symbolic_nu
 {
   int size;
 
-  if (! INTEGRAL_TYPE_P (TREE_TYPE (src)))
+  if (!INTEGRAL_TYPE_P (TREE_TYPE (src)) && !POINTER_TYPE_P (TREE_TYPE (src)))
 return false;
 
   n->base_addr = n->offset = n->alias_set = n->vuse = NULL_TREE;
--- gcc/testsuite/gcc.dg/pr96573.c.jj   2021-03-31 11:39:19.382122478 +0200
+++ gcc/testsuite/gcc.dg/pr96573.c  2021-03-31 11:39:54.186727148 +0200
@@ -0,0 +1,20 @@
+/* PR tree-optimization/96573 */
+/* { dg-do compile { target { lp64 || ilp32 } } } */
+/* { dg-require-effective-target bswap } */
+/* { dg-options "-O3 -fdump-tree-optimized" } */
+/* { dg-final { scan-tree-dump "__builtin_bswap" "optimized" } } */
+
+typedef __SIZE_TYPE__ size_t;
+
+void *
+foo (void * const p)
+{
+  const size_t m = sizeof (p) - 1;
+  const unsigned char * const o = (unsigned char*) 
+  void *n;
+  unsigned char * const q = (unsigned char *) 
+  unsigned char i;
+  for (i = 0; i <= m; ++i)
+q[m - i] = o[i];
+  return n;
+}

Jakub



Re: [PATCH] rs6000: MMA test case ICEs using -O3

2021-03-31 Thread Segher Boessenkool
Hi!

On Tue, Mar 30, 2021 at 06:49:29PM -0500, Peter Bergner wrote:
> The mma_assemble_input_operand predicate does not accept reg+reg indexed
> addresses which can lead to ICEs.  The problem is that the quad_address_p
> function only accepts reg+offset addresses that are valid for quad word
> accesses, but not reg+reg addresses which are also valid for quad word
> accesses when dealing with vector types.  The solution used here is to
> call memory_operand, which uses rs6000_legitimate_address_p to ensure
> the address is valid.  For reg+offset addresses, it uses quad_address_p like
> before, but for reg+reg addresses, it calls legitimate_indexed_address_p
> addresses which fixes this specific ICE.

quad_address_p should really only be used for lq/stq (and their prefixed
forms).  Those insns have very different semantics: they are atomic,
while vector loads and stores are not; and the prefixed form has some
special semantics (it swaps halves on LE).

> diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
> index 859af75dfbd..e48c6eee19e 100644
> --- a/gcc/config/rs6000/predicates.md
> +++ b/gcc/config/rs6000/predicates.md
> @@ -1171,8 +1171,7 @@
>  (define_special_predicate "mma_assemble_input_operand"
>(match_test "(mode == V16QImode
>   && (vsx_register_operand (op, mode)
> - || (MEM_P (op)
> - && quad_address_p (XEXP (op, 0), mode, false"))
> + || memory_operand (op, mode)))"))

This seems like it might need reloads very often, because it allows way
too much?  Or, can you just use reg_or_mem_operand?

> --- /dev/null
> +++ b/gcc/testsuite/g++.target/powerpc/pr99842.C
> @@ -0,0 +1,188 @@
> +/* PR target/99842 */
> +/* { dg-require-effective-target power10_ok } */
> +/* { dg-options "-O3 -mdejagnu-cpu=power10 -w" } */

(Please document what warning you want to shut up.  Just a few words is
plenty.)

That testcase will likely not show the error anymore just a year or so
from now, but it is good to have more complex testcases anyhow.


Segher


[committed] wwwdocs: Don't list RMS as member of the Steering Committee

2021-03-31 Thread Gerald Pfeifer
Per https://gcc.gnu.org/pipermail/gcc/2021-March/235245.html

"In 2012 RMS was added to the GCC Steering Committee web page based
on his role in the GNU Project, though his role as a member of the
Steering Committee has been ambiguous and he was not a member of
the Steering Committee when EGCS became GCC[1] wwwdocs:.  We no longer feel
that this listing serves the best interests of the GCC developer
and user community.  Therefore, we are removing him from the page.

GCC supports the principles of Free Software and has remained
aligned with the GNU Project since EGCS became GCC, but effectively
has continued to operate as an autonomous project.

The GCC Steering Committee is committed to providing a friendly, safe
and welcoming environment for all, regardless of gender identity and
expression, sexual orientation, disabilities, neurodiversity, physical
appearance, body size, ethnicity, nationality, race, age, religion, or
similar personal characteristics.

- The GCC Steering Committee

[1] wwwdocs: https://static.lwn.net/1999/0429/a/gcc.html;
---
 htdocs/steering.html | 6 --
 1 file changed, 6 deletions(-)

diff --git a/htdocs/steering.html b/htdocs/steering.html
index 53faba78..7192b424 100644
--- a/htdocs/steering.html
+++ b/htdocs/steering.html
@@ -44,12 +44,6 @@ place to reach them is the gcc mailing 
list.
 Jim Wilson (SiFive)
 
 
-and
-
-
-Richard Stallman (Free Software Foundation)
-
-
 Membership in the steering committee is a personal membership.
 Affiliations are listed for identification purposes only; steering
 committee members do not represent their employers or academic
-- 
2.30.2


Re: Small refactoring of cgraph_node::release_body

2021-03-31 Thread David Edelsohn via Gcc-patches
This patch is causing new crashes in the testsuite.

ICE in release_body, at graph.c:1863
ranges offset out of range

Thanks, David


Re: improve future::poll calibration loop

2021-03-31 Thread Alexandre Oliva
On Mar 27, 2021, Jonathan Wakely  wrote:

> OK, thanks.

Thanks, I'm finally checking this in.

>> for  libstdc++-v3/ChangeLog
>> 
>> * testsuite/30_threads/future/members/poll.cc: Use faster
>> after-ready call in the calibration loop.


Thanks for the patch and for having kept me posted on the random_device
changes too!

-- 
Alexandre Oliva, happy hacker  https://FSFLA.org/blogs/lxo/
   Free Software Activist GNU Toolchain Engineer
Vim, Vi, Voltei pro Emacs -- GNUlius Caesar


Re: [PATCH] data-ref: Tighten index-based alias checks [PR99726]

2021-03-31 Thread Richard Sandiford via Gcc-patches
Richard Biener via Gcc-patches  writes:
> On Wed, Mar 31, 2021 at 12:15 PM Richard Sandiford via Gcc-patches
>  wrote:
>>
>> create_intersect_range_checks_index tries to create a runtime
>> alias check based on index comparisons.  It looks through the
>> access functions for the two DRs to find a SCEV for the loop
>> that is being versioned and converts a DR_STEP-based check
>> into an index-based check.
>>
>> However, there isn't any reliable sign information in the types,
>> so the code expects the value of the IV step (when interpreted as
>> signed) to be negative iff the DR_STEP (when interpreted as signed)
>> is negative.
>>
>> r10-4762 added another assert related to this assumption and the
>> assert fired for the testcase in the PR.  The sign of the IV step
>> didn't match the sign of the DR_STEP.
>>
>> I think this is actually showing what was previously a wrong-code bug.
>> The signs didn't match because the DRs contained *two* access function
>> SCEVs for the loop being versioned.  It doesn't look like the code
>> is set up to deal with this, since it checks each access function
>> independently and treats it as the sole source of DR_STEP.
>
> I think it shows that matching DR_STEP with the access function step
> doesn't make much sense.  In practice it will work for the case where
> there's a single access function evolving in the respective loop since
> we don't have negative stride arrays via array_ref_element_size
> (do we?).  But of course for multiple access functions we can't
> simply choose one to do an index overlap test but instead we'd
> have to check all of them.
>
> That said, I wonder why the code needs the DR_STEP at all
> and cannot simply use the SCEVs step [direction].

Not 100% sure either, but I'm guessing it's because we don't have direct
access to the vectorisation factor here (or the parloops equivalent).
So I guess the code has no choice other than to work out the required
number of iterations from the step.

It would probably still be possible to handle multiple access
functions by accumulating the SCEV steps, even with the current
DR_STEP approach.  But I'm not sure the risk/reward is worth it.

Thanks,
Richard

> Of course the patch is correct in that it restricts handling to a single
> access fn evolving in the loop.  Thus - OK.  Removing the DR_STEP
> restriction might allow more (weird, see above) cases to be handled.
>
> Thanks,
> Richard.
>
>> The patch therefore moves the main condition out of the loop.
>> This also has the advantage of not building a tree for one access
>> function only to throw it away if we find an inner function that
>> makes the comparison invalid.
>>
>> Tested on aarch64-linux-gnu, armeb-eabi and x86_64-linux-gnu.
>> OK to install?
>>
>> Richard
>>
>>
>> gcc/
>> PR tree-optimization/99726
>> * tree-data-ref.c (create_intersect_range_checks_index): Bail
>> out if there is more than one access function SCEV for the loop
>> being versioned.
>>
>> gcc/testsuite/
>> PR tree-optimization/99726
>> * gcc.target/i386/pr99726.c: New test.
>> ---
>>  gcc/testsuite/gcc.target/i386/pr99726.c |  15 ++
>>  gcc/tree-data-ref.c | 245 +---
>>  2 files changed, 143 insertions(+), 117 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.target/i386/pr99726.c
>>
>> [ -b version ]--
>>
>> diff --git a/gcc/testsuite/gcc.target/i386/pr99726.c 
>> b/gcc/testsuite/gcc.target/i386/pr99726.c
>> new file mode 100644
>> index 000..ff19bcabe4f
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/i386/pr99726.c
>> @@ -0,0 +1,15 @@
>> +/* { dg-options "-flive-patching=inline-clone -mavx512f -O2 
>> -floop-nest-optimize -ftree-loop-vectorize -ftrapv -m32" } */
>> +
>> +extern int a[256][1024];
>> +int b;
>> +long c, d;
>> +unsigned int e;
>> +
>> +int
>> +main ()
>> +{
>> +  for (; e < d; e++)
>> +for (unsigned j = 1; j < c; j++)
>> +  a[e][j] = b * a[e - 1][j + 1];
>> +  return 0;
>> +}
>> diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c
>> index 124a7bea6a9..e6dd5f15bed 100644
>> --- a/gcc/tree-data-ref.c
>> +++ b/gcc/tree-data-ref.c
>> @@ -2147,8 +2147,8 @@ create_intersect_range_checks_index (class loop *loop, 
>> tree *cond_expr,
>>
>>bool waw_or_war_p = (alias_pair.flags & ~(DR_ALIAS_WAR | DR_ALIAS_WAW)) 
>> == 0;
>>
>> -  unsigned int i;
>> -  for (i = 0; i < DR_NUM_DIMENSIONS (dr_a.dr); i++)
>> +  int found = -1;
>> +  for (unsigned int i = 0; i < DR_NUM_DIMENSIONS (dr_a.dr); i++)
>>  {
>>tree access1 = DR_ACCESS_FN (dr_a.dr, i);
>>tree access2 = DR_ACCESS_FN (dr_b.dr, i);
>> @@ -2164,7 +2164,19 @@ create_intersect_range_checks_index (class loop 
>> *loop, tree *cond_expr,
>>
>>   return false;
>> }
>> +  if (found >= 0)
>> +   return false;
>> +  found = i;
>> +}
>> +
>> +  /* Ought not to happen in practice, since if all accesses are equal 

[PATCH] rs6000: Fix up libgcc ABI when built with --with-long-double-format=ieee [PR97653]

2021-03-31 Thread Jakub Jelinek via Gcc-patches
Hi!

__floatunditf and __fixtfdi and a couple of other libgcc{.a,_s.so}
entrypoints for backwards compatibility should mean IBM double double
handling (i.e. IFmode), gcc emits such calls for that format and
form IEEE long double emits *kf* instead.
When gcc is configured without --with-long-double-format=ieee ,
everything is fine, but when it is not, we need to compile those
libgcc sources with -mno-gnu-attribute -mabi=ibmlongdouble.
The following snippet in libgcc/config/rs6000/t-linux was attempting
to ensure that, and for some routines it works fine (e.g. for _powitf2).
But, due to 4 different types of bugs it doesn't work for most of those
functions, which means that in --with-long-double-format=ieee
configured gcc those *tf* entrypoints instead handle the long double
arguments as if they were KFmode.

The bugs are:
1) the first few objs properly use $(objext) as suffix, but
   several other contain a typo and use $(object) instead,
   which is a variable that isn't set to anything, so we don't
   add .o etc. extensions
2) while unsigned fix are properly called _fixuns*, unsigned float
   are called _floatun* (without s), but the var was using there
   the extra s and so didn't match
3) the variable didn't cover any of the TF <-> TI conversions,
   only TF <-> DI conversions
4) nothing in libgcc_s.so was handled, as those object files are
   called *_s.o rather than *.o and IBM128_SHARED_OBJS used wrong
   syntax of the GNU make substitution reference, which should be
   $(var:a=b) standing for $(patsubst a,b,$(var)) but it used
   $(var:a:b) instead

Bootstrapped/regtested on powerpc64le-linux --with-long-double-format=ieee,
ok for trunk?

2021-03-31  Jakub Jelinek  

PR target/97653
* config/rs6000/t-linux (IBM128_STATIC_OBJS): Fix spelling, use
$(objext) instead of $(object).  Use _floatunditf instead of
_floatunsditf.  Add tf <-> ti conversion objects.
(IBM128_SHARED_OBJS): Use proper substitution reference syntax.

--- libgcc/config/rs6000/t-linux.jj 2020-12-04 08:08:06.556434569 +0100
+++ libgcc/config/rs6000/t-linux2021-03-31 14:20:50.953852864 +0200
@@ -11,10 +11,12 @@ HOST_LIBGCC2_CFLAGS += -mno-minimal-toc
 # the IBM extended double format.  Also turn off gnu attributes on the static
 # modules.
 IBM128_STATIC_OBJS = ibm-ldouble$(objext) _powitf2$(objext) \
- ppc64-fp$(objext) _divtc3$(object) _multc3$(object) \
- _fixtfdi$(object) _fixunstfdi$(object) \
- _floatditf$(objext) _floatunsditf$(objext)
-IBM128_SHARED_OBJS = $(IBM128_STATIC_OBJS:$(objext):_s$(objext))
+ ppc64-fp$(objext) _divtc3$(objext) _multc3$(objext) \
+ _fixtfdi$(objext) _fixunstfdi$(objext) \
+ _floatditf$(objext) _floatunditf$(objext) \
+ _fixtfti$(objext) _fixunstfti$(objext) \
+ _floattitf$(objext) _floatuntitf$(objext)
+IBM128_SHARED_OBJS = $(IBM128_STATIC_OBJS:$(objext)=_s$(objext))
 IBM128_OBJS= $(IBM128_STATIC_OBJS) $(IBM128_SHARED_OBJS)
 
 IBM128_CFLAGS  = -Wno-psabi -mabi=ibmlongdouble -mno-gnu-attribute

Jakub



Re: [r11-7926 Regression] FAIL: libgomp.c/declare-variant-1.c (test for excess errors) on Linux/x86_64

2021-03-31 Thread Jan Hubicka
> On Linux/x86_64,
> 
> d7145b4bb6c8729a1e782373cb6256c06ed60465 is the first bad commit
> commit d7145b4bb6c8729a1e782373cb6256c06ed60465
> Author: Jan Hubicka 
> Date:   Wed Mar 31 11:35:29 2021 +0200
> 
> Small refactoring of cgraph_node::release_body
> 
> caused
> 
> FAIL: g++.dg/ipa/devirt-7.C  -std=gnu++14 (internal compiler error)

This should be fixed by the attached patch. Sorry for this - I added the
check at last minute and apparently did not fully retest.
We now have clones of thunks that does make sense even after body was
released.  What does not make much sense to me is why we consider them
clones at all, but that is something to revisit next stage1.

Honza

gcc/ChangeLog:

PR lto/99447
* cgraph.c (cgraph_node::release_body): Fix overactive check.

diff --git a/gcc/cgraph.c b/gcc/cgraph.c
index b77c676a58a..d7c78d518bc 100644
--- a/gcc/cgraph.c
+++ b/gcc/cgraph.c
@@ -1860,7 +1860,15 @@ cgraph_node::release_body (bool keep_arguments)
   lto_free_function_in_decl_state_for_node (this);
   lto_file_data = NULL;
 }
-  gcc_assert (!clones);
+  if (flag_checking && clones)
+{
+  /* It is invalid to release body before materializing clones except
+for thunks that don't really need a body.  Verify also that we do
+not leak pointers to the call statements.  */
+  for (cgraph_node *node = clones; node;
+  node = node->next_sibling_clone)
+   gcc_assert (node->thunk && !node->callees->call_stmt);
+}
   remove_callees ();
   remove_all_references ();
 }
> FAIL: g++.dg/ipa/devirt-7.C  -std=gnu++14 (test for excess errors)
> FAIL: g++.dg/ipa/devirt-7.C  -std=gnu++17 (internal compiler error)
> FAIL: g++.dg/ipa/devirt-7.C  -std=gnu++17 (test for excess errors)
> FAIL: g++.dg/ipa/devirt-7.C  -std=gnu++2a (internal compiler error)
> FAIL: g++.dg/ipa/devirt-7.C  -std=gnu++2a (test for excess errors)
> FAIL: g++.dg/ipa/devirt-7.C  -std=gnu++98 (internal compiler error)
> FAIL: g++.dg/ipa/devirt-7.C  -std=gnu++98 (test for excess errors)
> FAIL: g++.dg/ipa/pr71146.C  -std=gnu++14 (internal compiler error)
> FAIL: g++.dg/ipa/pr71146.C  -std=gnu++14 (test for excess errors)
> FAIL: g++.dg/ipa/pr71146.C  -std=gnu++17 (internal compiler error)
> FAIL: g++.dg/ipa/pr71146.C  -std=gnu++17 (test for excess errors)
> FAIL: g++.dg/ipa/pr71146.C  -std=gnu++2a (internal compiler error)
> FAIL: g++.dg/ipa/pr71146.C  -std=gnu++2a (test for excess errors)
> FAIL: g++.dg/ipa/pr71146.C  -std=gnu++98 (internal compiler error)
> FAIL: g++.dg/ipa/pr71146.C  -std=gnu++98 (test for excess errors)
> FAIL: g++.dg/ipa/pr85421.C   (internal compiler error)
> FAIL: g++.dg/ipa/pr85421.C   (test for excess errors)
> FAIL: g++.dg/ipa/pr92528.C   (internal compiler error)
> FAIL: g++.dg/ipa/pr92528.C   (test for excess errors)
> FAIL: g++.dg/torture/covariant-1.C   -O2 -flto -fno-use-linker-plugin 
> -flto-partition=none  (internal compiler error)
> FAIL: g++.dg/torture/covariant-1.C   -O2 -flto -fno-use-linker-plugin 
> -flto-partition=none  (test for excess errors)
> FAIL: g++.dg/torture/covariant-1.C   -O2 -flto -fuse-linker-plugin 
> -fno-fat-lto-objects  (internal compiler error)
> FAIL: g++.dg/torture/covariant-1.C   -O2 -flto -fuse-linker-plugin 
> -fno-fat-lto-objects  (test for excess errors)
> FAIL: g++.dg/torture/pr46287.C   -O2 -flto -fno-use-linker-plugin 
> -flto-partition=none  (internal compiler error)
> FAIL: g++.dg/torture/pr46287.C   -O2 -flto -fno-use-linker-plugin 
> -flto-partition=none  (test for excess errors)
> FAIL: g++.dg/torture/pr46287.C   -O2 -flto -fuse-linker-plugin 
> -fno-fat-lto-objects  (internal compiler error)
> FAIL: g++.dg/torture/pr46287.C   -O2 -flto -fuse-linker-plugin 
> -fno-fat-lto-objects  (test for excess errors)
> FAIL: g++.dg/torture/pr46287.C   -O2  (internal compiler error)
> FAIL: g++.dg/torture/pr46287.C   -O2  (test for excess errors)
> FAIL: g++.dg/torture/pr46287.C   -O3 -fomit-frame-pointer -funroll-loops 
> -fpeel-loops -ftracer -finline-functions  (internal compiler error)
> FAIL: g++.dg/torture/pr46287.C   -O3 -fomit-frame-pointer -funroll-loops 
> -fpeel-loops -ftracer -finline-functions  (test for excess errors)
> FAIL: g++.dg/torture/pr46287.C   -O3 -g  (internal compiler error)
> FAIL: g++.dg/torture/pr46287.C   -O3 -g  (test for excess errors)
> FAIL: g++.dg/torture/pr78692.C   -O2 -flto -fno-use-linker-plugin 
> -flto-partition=none  (internal compiler error)
> FAIL: g++.dg/torture/pr78692.C   -O2 -flto -fno-use-linker-plugin 
> -flto-partition=none  (test for excess errors)
> FAIL: g++.dg/torture/pr78692.C   -O2  (internal compiler error)
> FAIL: g++.dg/torture/pr78692.C   -O2  (test for excess errors)
> FAIL: g++.dg/torture/pr78692.C   -O3 -g  (internal compiler error)
> FAIL: g++.dg/torture/pr78692.C   -O3 -g  (test for excess errors)
> FAIL: g++.dg/torture/pr83619.C   -O2 -flto -fno-use-linker-plugin 
> -flto-partition=none  (internal compiler error)
> FAIL: g++.dg/torture/pr83619.C   -O2 -flto 

[r11-7926 Regression] FAIL: libgomp.c/declare-variant-1.c (test for excess errors) on Linux/x86_64

2021-03-31 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

d7145b4bb6c8729a1e782373cb6256c06ed60465 is the first bad commit
commit d7145b4bb6c8729a1e782373cb6256c06ed60465
Author: Jan Hubicka 
Date:   Wed Mar 31 11:35:29 2021 +0200

Small refactoring of cgraph_node::release_body

caused

FAIL: g++.dg/ipa/devirt-7.C  -std=gnu++14 (internal compiler error)
FAIL: g++.dg/ipa/devirt-7.C  -std=gnu++14 (test for excess errors)
FAIL: g++.dg/ipa/devirt-7.C  -std=gnu++17 (internal compiler error)
FAIL: g++.dg/ipa/devirt-7.C  -std=gnu++17 (test for excess errors)
FAIL: g++.dg/ipa/devirt-7.C  -std=gnu++2a (internal compiler error)
FAIL: g++.dg/ipa/devirt-7.C  -std=gnu++2a (test for excess errors)
FAIL: g++.dg/ipa/devirt-7.C  -std=gnu++98 (internal compiler error)
FAIL: g++.dg/ipa/devirt-7.C  -std=gnu++98 (test for excess errors)
FAIL: g++.dg/ipa/pr71146.C  -std=gnu++14 (internal compiler error)
FAIL: g++.dg/ipa/pr71146.C  -std=gnu++14 (test for excess errors)
FAIL: g++.dg/ipa/pr71146.C  -std=gnu++17 (internal compiler error)
FAIL: g++.dg/ipa/pr71146.C  -std=gnu++17 (test for excess errors)
FAIL: g++.dg/ipa/pr71146.C  -std=gnu++2a (internal compiler error)
FAIL: g++.dg/ipa/pr71146.C  -std=gnu++2a (test for excess errors)
FAIL: g++.dg/ipa/pr71146.C  -std=gnu++98 (internal compiler error)
FAIL: g++.dg/ipa/pr71146.C  -std=gnu++98 (test for excess errors)
FAIL: g++.dg/ipa/pr85421.C   (internal compiler error)
FAIL: g++.dg/ipa/pr85421.C   (test for excess errors)
FAIL: g++.dg/ipa/pr92528.C   (internal compiler error)
FAIL: g++.dg/ipa/pr92528.C   (test for excess errors)
FAIL: g++.dg/torture/covariant-1.C   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  (internal compiler error)
FAIL: g++.dg/torture/covariant-1.C   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  (test for excess errors)
FAIL: g++.dg/torture/covariant-1.C   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  (internal compiler error)
FAIL: g++.dg/torture/covariant-1.C   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  (test for excess errors)
FAIL: g++.dg/torture/pr46287.C   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  (internal compiler error)
FAIL: g++.dg/torture/pr46287.C   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  (test for excess errors)
FAIL: g++.dg/torture/pr46287.C   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  (internal compiler error)
FAIL: g++.dg/torture/pr46287.C   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  (test for excess errors)
FAIL: g++.dg/torture/pr46287.C   -O2  (internal compiler error)
FAIL: g++.dg/torture/pr46287.C   -O2  (test for excess errors)
FAIL: g++.dg/torture/pr46287.C   -O3 -fomit-frame-pointer -funroll-loops 
-fpeel-loops -ftracer -finline-functions  (internal compiler error)
FAIL: g++.dg/torture/pr46287.C   -O3 -fomit-frame-pointer -funroll-loops 
-fpeel-loops -ftracer -finline-functions  (test for excess errors)
FAIL: g++.dg/torture/pr46287.C   -O3 -g  (internal compiler error)
FAIL: g++.dg/torture/pr46287.C   -O3 -g  (test for excess errors)
FAIL: g++.dg/torture/pr78692.C   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  (internal compiler error)
FAIL: g++.dg/torture/pr78692.C   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  (test for excess errors)
FAIL: g++.dg/torture/pr78692.C   -O2  (internal compiler error)
FAIL: g++.dg/torture/pr78692.C   -O2  (test for excess errors)
FAIL: g++.dg/torture/pr78692.C   -O3 -g  (internal compiler error)
FAIL: g++.dg/torture/pr78692.C   -O3 -g  (test for excess errors)
FAIL: g++.dg/torture/pr83619.C   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  (internal compiler error)
FAIL: g++.dg/torture/pr83619.C   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  (test for excess errors)
FAIL: g++.dg/torture/pr83619.C   -O2  (internal compiler error)
FAIL: g++.dg/torture/pr83619.C   -O2  (test for excess errors)
FAIL: g++.dg/torture/pr83619.C   -O3 -g  (internal compiler error)
FAIL: g++.dg/torture/pr83619.C   -O3 -g  (test for excess errors)
FAIL: libgomp.c/declare-variant-1.c (test for excess errors)

with GCC configured with

../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r11-7926/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=g++.dg/ipa/devirt-7.C 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=g++.dg/ipa/devirt-7.C 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=g++.dg/ipa/devirt-7.C 
--target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=g++.dg/ipa/devirt-7.C 
--target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=g++.dg/ipa/pr71146.C 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 

Re: [PATCH v2 1/3] x86: Update memcpy/memset inline strategies for Ice Lake

2021-03-31 Thread H.J. Lu via Gcc-patches
On Wed, Mar 31, 2021 at 10:43 AM Jan Hubicka  wrote:
>
> > > Reading through the optimization manual it seems that mosvb is fast for
> > > small block no matter if the size is hard wired. In that case you
> > > probably want to check whetehr max_size or expected_size is known to be
> > > small rather than max_size == min_size and both being small.
> > >
> > > But it depends on what CPU really does.
> > > Honza
> >
> > For small data size, rep movsb is faster only under certain conditions.   We
> > can continue fine tuning rep movsb.
>
> OK, I however wonder why you need condtion maxsize=minsize.
>  - If CPU is looking for movl $cst, %rcx than we probably want to be
>sure that it is not moved away fro rep ;movsb by adding fused pattern
>  - If rep movsb is slower than loop for very small blocks then you want
>to set lower bound on minsize & expected size, but you do not need
>to require maxsize=minsize
>  - If rep movsb is slower than sequence of moves for small blocks then
>one needs to tweak move by pieces
>  - If rep movsb is slower for larger blocks than you want to test
>maxsize and expected size
> So in neither of those scenarios testing maxsize=minsize alone makes too
> much sense to me... What was the original motivation for differentiating
> between precisely known size?
>
> I am mostly curious because it is not that uncomon to have small maxsize
> because we are able to track the object size and using short sequence
> for those would be nice.
>
> Having minsize non-trivial may not be that uncommon these days either
> given that we track value ranges (and under assumption that
> memcpy/memset expanders was updated to take these into account).
>

Hongyu has done some analysis on this.  Hongyu, can you share what
you got?

Thanks.

-- 
H.J.


Re: [PATCH v2 1/3] x86: Update memcpy/memset inline strategies for Ice Lake

2021-03-31 Thread Jan Hubicka
> > Reading through the optimization manual it seems that mosvb is fast for
> > small block no matter if the size is hard wired. In that case you
> > probably want to check whetehr max_size or expected_size is known to be
> > small rather than max_size == min_size and both being small.
> >
> > But it depends on what CPU really does.
> > Honza
> 
> For small data size, rep movsb is faster only under certain conditions.   We
> can continue fine tuning rep movsb.

OK, I however wonder why you need condtion maxsize=minsize.
 - If CPU is looking for movl $cst, %rcx than we probably want to be
   sure that it is not moved away fro rep ;movsb by adding fused pattern
 - If rep movsb is slower than loop for very small blocks then you want
   to set lower bound on minsize & expected size, but you do not need 
   to require maxsize=minsize
 - If rep movsb is slower than sequence of moves for small blocks then
   one needs to tweak move by pieces
 - If rep movsb is slower for larger blocks than you want to test
   maxsize and expected size
So in neither of those scenarios testing maxsize=minsize alone makes too
much sense to me... What was the original motivation for differentiating
between precisely known size?

I am mostly curious because it is not that uncomon to have small maxsize
because we are able to track the object size and using short sequence
for those would be nice.

Having minsize non-trivial may not be that uncommon these days either
given that we track value ranges (and under assumption that
memcpy/memset expanders was updated to take these into account).

Honza
> 
> -- 
> H.J.


[committed] wwwdocs: cilkplus.org is gone

2021-03-31 Thread Gerald Pfeifer
I pushed this.

At first cilkplus.org was broken for weeks, it not months. Now it
redirects to a generic intel.com page. So remove it.
---
 htdocs/gcc-4.9/changes.html | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/htdocs/gcc-4.9/changes.html b/htdocs/gcc-4.9/changes.html
index 342a9962..53031b7d 100644
--- a/htdocs/gcc-4.9/changes.html
+++ b/htdocs/gcc-4.9/changes.html
@@ -186,8 +186,8 @@
 consecutive iterations using SIMD (single instruction multiple data)
 instructions.
 
-Support for https://www.cilkplus.org/;>Cilk Plus has been
-added and can be enabled with the -fcilkplus option.  Cilk 
Plus
+Support for Cilk Plus has been added and can be enabled
+with the -fcilkplus option.  Cilk Plus
 is an extension to the C and C++ languages to support data and task
 parallelism.  The present implementation follows ABI version 1.2; all
 features but _Cilk_for have been implemented.
-- 
2.30.2


Re: [PATCH v9] Practical improvement to libgcc complex divide

2021-03-31 Thread Bernhard Reutner-Fischer via Gcc-patches
On Fri, 26 Mar 2021 23:14:41 +
Patrick McGehearty via Gcc-patches  wrote:

> diff --git a/gcc/c-family/c-cppbuiltin.c b/gcc/c-family/c-cppbuiltin.c
> index 9f993c4..c0d9e57 100644
> --- a/gcc/c-family/c-cppbuiltin.c
> +++ b/gcc/c-family/c-cppbuiltin.c
> @@ -1277,8 +1277,10 @@ c_cpp_builtins (cpp_reader *pfile)
>   {
> scalar_float_mode mode = mode_iter.require ();
> const char *name = GET_MODE_NAME (mode);
> +   const int name_len = strlen (name);

strlen returns a size_t

Funny that we do not have a pre-seeded mode_name_len array but call
strlen on modes over and over again.

> +   char float_h_prefix[16] = "";
> char *macro_name
> - = (char *) alloca (strlen (name)
> + = (char *) alloca (name_len
>  + sizeof ("__LIBGCC__MANT_DIG__"));
> sprintf (macro_name, "__LIBGCC_%s_MANT_DIG__", name);
> builtin_define_with_int_value (macro_name,
> @@ -1286,20 +1288,29 @@ c_cpp_builtins (cpp_reader *pfile)
> if (!targetm.scalar_mode_supported_p (mode)
> || !targetm.libgcc_floating_mode_supported_p (mode))
>   continue;
> -   macro_name = (char *) alloca (strlen (name)
> +   macro_name = (char *) alloca (name_len
>   + sizeof ("__LIBGCC_HAS__MODE__"));
> sprintf (macro_name, "__LIBGCC_HAS_%s_MODE__", name);
> cpp_define (pfile, macro_name);
> -   macro_name = (char *) alloca (strlen (name)
> +   macro_name = (char *) alloca (name_len
>   + sizeof ("__LIBGCC__FUNC_EXT__"));
> sprintf (macro_name, "__LIBGCC_%s_FUNC_EXT__", name);

The above should have been split out as separate independent patchlet
to get it out of the way.

As noted already the use of alloca is a pre-existing (coding-style) bug
and we should use XALLOCAVEC or appropriately sized fixed buffers to
begin with. The amount of alloca in defining all these is amazing.

> diff --git a/gcc/testsuite/gcc.c-torture/execute/ieee/cdivchkd.c 
> b/gcc/testsuite/gcc.c-torture/execute/ieee/cdivchkd.c
> new file mode 100644
> index 000..409123f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/execute/ieee/cdivchkd.c
> @@ -0,0 +1,126 @@
> +/*
> +  Program to test complex divide for correct results on selected values.
> +  Checking known failure points.
> +*/
> +
> +#include 
> +
> +extern void abort ();
> +extern void exit ();

As Joseph pointed out in the v8 review already, please use prototyped
function declarations in all new tests.

> diff --git a/gcc/testsuite/gcc.c-torture/execute/ieee/cdivchkld.c 
> b/gcc/testsuite/gcc.c-torture/execute/ieee/cdivchkld.c
> new file mode 100644
> index 000..28d707d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/execute/ieee/cdivchkld.c
> @@ -0,0 +1,167 @@
> +/*
> +  Program to test complex divide for correct results on selected values.
> +  Checking known failure points.
> +*/
> +
> +#include 
> +
> +extern void abort ();
> +extern void exit ();

Likewise.

> +#if (LDBL_MAX_EXP < 2048)
> +  /*
> +Test values when mantissa is 11 or fewer bits.  Either LDBL is
> +using DBL on this platform or we are using IBM extended double
> +precision Test values will be automatically trucated the available
> +precision.

s/trucated/truncated/g; # with an 'n'
I have trouble parsing the sentence nevertheless. There might be a
missing "which" somewhere and maybe a "to"?

> +#else
> +  /*
> +Test values intended for either IEEE128 or Intel80 formats.  In
> +either case, 15 bits of exponent are available.  Test values will
> +be automatically trucated the available precision.
> +  */

again, truNcated and a couple of words missing?

> diff --git a/libgcc/config/rs6000/_divkc3.c b/libgcc/config/rs6000/_divkc3.c
> index d261f40..f57e14f 100644
> --- a/libgcc/config/rs6000/_divkc3.c
> +++ b/libgcc/config/rs6000/_divkc3.c
> @@ -37,31 +37,118 @@ see the files COPYING3 and COPYING.RUNTIME respectively. 
>  If not, see
>  #define __divkc3 __divkc3_sw
>  #endif
>  
> +#define RBIG   (__LIBGCC_TF_MAX__ / 2)
> +#define RMIN   (__LIBGCC_TF_MIN__)
> +#define RMIN2  (__LIBGCC_TF_EPSILON__)
> +#define RMINSCAL (1 / __LIBGCC_TF_EPSILON__)
> +#define RMAX2  (RBIG * RMIN2)
> +
> +
>  TCtype
>  __divkc3 (TFtype a, TFtype b, TFtype c, TFtype d)
>  {
>TFtype denom, ratio, x, y;
>TCtype res;
>  
> -  /* ??? We can get better behavior from logarithmic scaling instead of
> - the division.  But that would mean starting to link libgcc against
> - libm.  We could implement something akin to ldexp/frexp as gcc builtins
> - fairly easily...  */
> +  /* long double has significant potential underflow/overflow errors than
> + can be greatly reduced with a limited number of tests and adjustments.
> +  */

"than" doesn't parse. that?

>else
>  {
> +  /* Prevent underflow when denominator is near max representable.  */
> +  if (FABS (c) >= RBIG)
> + {
> +   

Re: [PATCH] testsuite: Disable zero-scratch-regs-{8, 9, 10, 11}.c on s390* [PR97680]

2021-03-31 Thread Qing Zhao via Gcc-patches
Yes, basically, I agreed with Eric. 

One of the major reason to intentionally put these testing cases under 
c-c++-common
 is to fail them by default on the platforms that do not support this feature 
yet. 

Then the platform maintainer could decide whether to complete this feature on 
the 
specific platform or skip them if they don’t want such feature on this 
platform. 

Qing

> On Mar 31, 2021, at 2:14 AM, Eric Botcazou  wrote:
> 
>> That is true, but nothing really happened during the 5 months that the tests
>> have been failing on many other architectures (except that powerpc and arm
>> had skipped those tests).  There has been a PR open for all those 5 months.
> 
> So what?  This is not the first example and I don't see anything special with 
> it.  You or maintainers can decide to XFAIL particular architectures at will, 
> but hiding the failures by default is IMO not appropriate.
> 
>> We can perhaps revert the skips after branching GCC 11 off, but I have
>> little hope other target maintainers will do what you did, so unsure if it
>> would help.  And the changes need people familiar with each of the backends
>> to decide what needs to be done and what is doable.
> 
> That's exactly the same situation as for -fstack-usage/-Wstack-usage, where I 
> intentionally made gcc.dg/stack-usage-1.c fail by default so that maintainers 
> could add the missing bits; this worked relatively well.
> 
> -- 
> Eric Botcazou
> 
> 



[committed] add test for PR 65182

2021-03-31 Thread Martin Sebor via Gcc-patches

r11-7932 adds a test case for another ancient -Wuninitialized bug
fixed eons ago:

  https://gcc.gnu.org/g:31199d95de1304e200554bbf98b2d8a6a7298bec

Martin



Re: [PATCH v2 1/3] x86: Update memcpy/memset inline strategies for Ice Lake

2021-03-31 Thread H.J. Lu via Gcc-patches
On Wed, Mar 31, 2021 at 6:47 AM Jan Hubicka  wrote:
>
> > > >
> > > > Patch is OK now.  I was wondering about using avx256 for moves of known
> > >
> > > Done.   X86_TUNE_PREFER_KNOWN_REP_MOVSB_STOSB is in now.   Can
> > > you take a look at the patch for Skylake:
> > >
> > > https://gcc.gnu.org/pipermail/gcc-patches/2021-March/567096.html
> >
> > I was wondering, if CPU preffers rep movsb when rcx is a compile time
> > constant, it probably does some logic at the decode time (i.e. expands
> > it into some sequence) and if so, then it may require the code setting
> > the register to be near rep (via fusing or simlar mechanism)
> >
> > Perhaps we want to have fusing pattern for this, so we do not move them
> > far apart?
>
> Reading through the optimization manual it seems that mosvb is fast for
> small block no matter if the size is hard wired. In that case you
> probably want to check whetehr max_size or expected_size is known to be
> small rather than max_size == min_size and both being small.
>
> But it depends on what CPU really does.
> Honza

For small data size, rep movsb is faster only under certain conditions.   We
can continue fine tuning rep movsb.

-- 
H.J.


Re: [PATCH] testsuite/aarch64: Skip SLP diagnostic under ILP32 (PR target/96974)

2021-03-31 Thread Christophe Lyon via Gcc-patches
On Wed, 31 Mar 2021 at 12:08, Richard Biener  wrote:
>
> On Mon, Mar 29, 2021 at 1:40 PM Christophe Lyon via Gcc-patches
>  wrote:
> >
> > The vectorizer has a very different effect with -mabi=ilp32, and
> > doesn't emit the expecte diagnostic, so this patch expects it only
> > under lp64.
>
> OK (please also backport as necessary)
>
Thanks, pushed to trunk and gcc-10

Christophe

> > 2021-03-29  Christophe Lyon  
> >
> > gcc/testsuite/
> > PR target/96974
> > * g++.target/aarch64/sve/pr96974.C: Expect SLP diagnostic only
> > under lp64.
> > ---
> >  gcc/testsuite/g++.target/aarch64/sve/pr96974.C | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/gcc/testsuite/g++.target/aarch64/sve/pr96974.C 
> > b/gcc/testsuite/g++.target/aarch64/sve/pr96974.C
> > index 363241d..54000f5 100644
> > --- a/gcc/testsuite/g++.target/aarch64/sve/pr96974.C
> > +++ b/gcc/testsuite/g++.target/aarch64/sve/pr96974.C
> > @@ -15,4 +15,4 @@ struct c {
> >  int coeffs[10];
> >  } f;
> >
> > -/* { dg-final { scan-tree-dump "Not vectorized: Incompatible number of 
> > vector subparts between" "slp1" } } */
> > +/* { dg-final { scan-tree-dump "Not vectorized: Incompatible number of 
> > vector subparts between" "slp1" { target lp64 } } } */
> > --
> > 2.7.4
> >


[pushed] c++: Alias template in pack expansion [PR99445]

2021-03-31 Thread Jason Merrill via Gcc-patches
In this testcase, iterative_hash_template_arg checks
alias_template_specialization_p to determine whether to treat a type as a
dependent alias, and structural_comptypes checks
dependent_alias_template_spec_p.  Normally that difference isn't a problem
because canonicalizing template arguments strips non-dependent aliases, but
that wasn't happening for the pack expansion.  Fixed thus.

Tested x86_64-pc-linux-gnu, applying to trunk.

gcc/cp/ChangeLog:

PR c++/99445
* tree.c (strip_typedefs): Handle TYPE_PACK_EXPANSION.

gcc/testsuite/ChangeLog:

PR c++/99445
* g++.dg/cpp0x/alias-decl-variadic1.C: New test.
---
 gcc/cp/tree.c |  9 +
 gcc/testsuite/g++.dg/cpp0x/alias-decl-variadic1.C | 14 ++
 2 files changed, 23 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/alias-decl-variadic1.C

diff --git a/gcc/cp/tree.c b/gcc/cp/tree.c
index 8c4bd156d3f..dca947bf52a 100644
--- a/gcc/cp/tree.c
+++ b/gcc/cp/tree.c
@@ -1722,6 +1722,15 @@ strip_typedefs (tree t, bool *remove_attributes, 
unsigned int flags)
 remove_attributes, flags);
   result = finish_underlying_type (type);
   break;
+case TYPE_PACK_EXPANSION:
+  type = strip_typedefs (PACK_EXPANSION_PATTERN (t),
+remove_attributes, flags);
+  if (type != PACK_EXPANSION_PATTERN (t))
+   {
+ result = copy_node (t);
+ PACK_EXPANSION_PATTERN (result) = type;
+   }
+  break;
 default:
   break;
 }
diff --git a/gcc/testsuite/g++.dg/cpp0x/alias-decl-variadic1.C 
b/gcc/testsuite/g++.dg/cpp0x/alias-decl-variadic1.C
new file mode 100644
index 000..68b3a7fd009
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/alias-decl-variadic1.C
@@ -0,0 +1,14 @@
+// PR c++/99445
+// { dg-do compile { target c++11 } }
+// { dg-additional-options "-fchecking=2 
--param=hash-table-verification-limit=1000" }
+
+template  struct implicit_conversions;
+template 
+using implicit_conversions_t = typename implicit_conversions::type;
+template  struct response_type;
+
+template 
+using type1 = response_type...>;
+
+template 
+using type2 = response_type::type...>;

base-commit: 7c1d6e89994109e1b6efb5f13890be5586edeb75
-- 
2.27.0



Re: [PATCH v2 1/3] x86: Update memcpy/memset inline strategies for Ice Lake

2021-03-31 Thread Jan Hubicka
> > >
> > > Patch is OK now.  I was wondering about using avx256 for moves of known
> > 
> > Done.   X86_TUNE_PREFER_KNOWN_REP_MOVSB_STOSB is in now.   Can
> > you take a look at the patch for Skylake:
> > 
> > https://gcc.gnu.org/pipermail/gcc-patches/2021-March/567096.html
> 
> I was wondering, if CPU preffers rep movsb when rcx is a compile time
> constant, it probably does some logic at the decode time (i.e. expands
> it into some sequence) and if so, then it may require the code setting
> the register to be near rep (via fusing or simlar mechanism)
> 
> Perhaps we want to have fusing pattern for this, so we do not move them
> far apart?

Reading through the optimization manual it seems that mosvb is fast for
small block no matter if the size is hard wired. In that case you
probably want to check whetehr max_size or expected_size is known to be
small rather than max_size == min_size and both being small.

But it depends on what CPU really does.
Honza


Re: [PATCH v2 1/3] x86: Update memcpy/memset inline strategies for Ice Lake

2021-03-31 Thread Jan Hubicka
> >
> > Patch is OK now.  I was wondering about using avx256 for moves of known
> 
> Done.   X86_TUNE_PREFER_KNOWN_REP_MOVSB_STOSB is in now.   Can
> you take a look at the patch for Skylake:
> 
> https://gcc.gnu.org/pipermail/gcc-patches/2021-March/567096.html

I was wondering, if CPU preffers rep movsb when rcx is a compile time
constant, it probably does some logic at the decode time (i.e. expands
it into some sequence) and if so, then it may require the code setting
the register to be near rep (via fusing or simlar mechanism)

Perhaps we want to have fusing pattern for this, so we do not move them
far apart?
> 
> > size (per comment on MOVE_MAX_PIECES there is issue with
> > MAX_FIXED_MODE_SIZE, but that seems not hard to fix). Did you look into
> > it?
> 
> It requires some changes in the middle-end.   See

yep, I know - tried that too for zen3 tuning :)
> users/hjl/pieces/master branch:
> 
> https://gitlab.com/x86-gcc/gcc/-/tree/users/hjl/pieces/master
> 
> I am rebasing it.

Thanks, it would also help to reduce the code size bloat by bumping up
the move by pieces. Clang is using those.

Honza
> 
> -- 
> H.J.


Re: [PATCH][GCC 10] tree-opt: Fix segfault in tree-if-conv.c with -march=armv8.2-a+sve [PR97849]

2021-03-31 Thread Richard Biener via Gcc-patches
On Wed, Mar 31, 2021 at 3:31 PM Alex Coplan via Gcc-patches
 wrote:
>
> Hi all,
>
> I'd like to backport the fix for PR97849 to GCC 10. The patch on trunk:
> https://gcc.gnu.org/g:5700973f4a30762b4fc21687bb5f7843e55da2e4
> applies cleanly to the 10 branch.
>
> Bootstrapped and regtested on aarch64-linux-gnu, no regressions.
>
> OK for GCC 10.3?

OK.

> Thanks,
> Alex
>
> ---
>
> gcc/ChangeLog
> 2020-11-24  Prathamesh Kulkarni  
>
> PR tree-optimization/97849
> * tree-if-conv.c (tree_if_conversion): Move ssa_name
> replacement code from ifcvt_local_dce to this function
> before calling do_rpo_vn.
>
> gcc/testsuite/ChangeLog
> 2020-11-24  Prathamesh Kulkarni  
>
> PR tree-optimization/97849
> * gcc.dg/tree-ssa/pr97849.c: New test.


[PATCH][GCC 10] tree-opt: Fix segfault in tree-if-conv.c with -march=armv8.2-a+sve [PR97849]

2021-03-31 Thread Alex Coplan via Gcc-patches
Hi all,

I'd like to backport the fix for PR97849 to GCC 10. The patch on trunk:
https://gcc.gnu.org/g:5700973f4a30762b4fc21687bb5f7843e55da2e4
applies cleanly to the 10 branch.

Bootstrapped and regtested on aarch64-linux-gnu, no regressions.

OK for GCC 10.3?

Thanks,
Alex

---

gcc/ChangeLog
2020-11-24  Prathamesh Kulkarni  

PR tree-optimization/97849
* tree-if-conv.c (tree_if_conversion): Move ssa_name
replacement code from ifcvt_local_dce to this function
before calling do_rpo_vn.

gcc/testsuite/ChangeLog
2020-11-24  Prathamesh Kulkarni  

PR tree-optimization/97849
* gcc.dg/tree-ssa/pr97849.c: New test.
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr97849.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr97849.c
new file mode 100644
index 000..57a31e316a2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr97849.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O1 -ftree-vectorize" } */
+/* { dg-additional-options "-march=armv8.2-a+sve" { target aarch64*-*-* } } */
+
+int a, b, c;
+
+int g() {
+  char i = 0;
+  for (c = 0; c <= 8; c++)
+--i;
+
+  while (b) {
+_Bool f = i <= 0;
+a = (a == 0) ? 0 : f / a;
+  }
+}
diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
index 2062758f40f..93effaa811b 100644
--- a/gcc/tree-if-conv.c
+++ b/gcc/tree-if-conv.c
@@ -2916,12 +2916,6 @@ ifcvt_local_dce (class loop *loop)
   enum gimple_code code;
   use_operand_p use_p;
   imm_use_iterator imm_iter;
-  std::pair  *name_pair;
-  unsigned int i;
-
-  FOR_EACH_VEC_ELT (redundant_ssa_names, i, name_pair)
-replace_uses_by (name_pair->first, name_pair->second);
-  redundant_ssa_names.release ();
 
   /* The loop has a single BB only.  */
   basic_block bb = loop->header;
@@ -3124,6 +3118,13 @@ tree_if_conversion (class loop *loop, vec 
*preds)
   exit_bbs = BITMAP_ALLOC (NULL);
   bitmap_set_bit (exit_bbs, single_exit (loop)->dest->index);
   bitmap_set_bit (exit_bbs, loop->latch->index);
+
+  std::pair  *name_pair;
+  unsigned ssa_names_idx;
+  FOR_EACH_VEC_ELT (redundant_ssa_names, ssa_names_idx, name_pair)
+replace_uses_by (name_pair->first, name_pair->second);
+  redundant_ssa_names.release ();
+
   todo |= do_rpo_vn (cfun, loop_preheader_edge (loop), exit_bbs);
 
   /* Delete dead predicate computations.  */


Re: [PATCH v2 1/3] x86: Update memcpy/memset inline strategies for Ice Lake

2021-03-31 Thread H.J. Lu via Gcc-patches
On Wed, Mar 31, 2021 at 1:05 AM Jan Hubicka  wrote:
>
> > > It looks like X86_TUNE_PREFER_KNOWN_REP_MOVSB_STOSB is quite obviously
> > > benefical and independent of the rest of changes.  I think we will need
> > > to discuss bit more the move ratio and the code size/uop cache polution
> > > issues - one option would be to use increased limits for -O3 only.
> >
> > My change only increases CLEAR_RATIO, not MOVE_RATIO.   We are
> > checking code size impacts on SPEC CPU 2017 and eembc.
> >
> > > Can you break this out to independent patch?  I also wonder if it owuld
> >
> > X86_TUNE_PREFER_KNOWN_REP_MOVSB_STOSB improves performance
> > only when memcpy/memset costs and MOVE_RATIO are updated the same time,
> > like:
> >
> > https://gcc.gnu.org/pipermail/gcc-patches/2021-March/567096.html
> >
> > Make it a standalone means moving from Ice Lake patch to Skylake patch.
> >
> > > not be more readable to special case this just on the beggining of
> > > decide_alg.
> > > > @@ -6890,6 +6891,7 @@ decide_alg (HOST_WIDE_INT count, HOST_WIDE_INT 
> > > > expected_size,
> > > >const struct processor_costs *cost;
> > > >int i;
> > > >bool any_alg_usable_p = false;
> > > > +  bool known_size_p = expected_size != -1;
> > >
> > > expected_size is not -1 if we have profile feedback and we detected from
> > > histogram average size of a block.  It seems to me that from description
> > > that you want the const to be actual compile time constant that would be
> > > min_size == max_size I guess.
> > >
> >
> > You are right.  Here is the v2 patch with min_size != max_size check for
> > unknown size.
>
> Patch is OK now.  I was wondering about using avx256 for moves of known

Done.   X86_TUNE_PREFER_KNOWN_REP_MOVSB_STOSB is in now.   Can
you take a look at the patch for Skylake:

https://gcc.gnu.org/pipermail/gcc-patches/2021-March/567096.html

> size (per comment on MOVE_MAX_PIECES there is issue with
> MAX_FIXED_MODE_SIZE, but that seems not hard to fix). Did you look into
> it?

It requires some changes in the middle-end.   See
users/hjl/pieces/master branch:

https://gitlab.com/x86-gcc/gcc/-/tree/users/hjl/pieces/master

I am rebasing it.

-- 
H.J.


Re: znver3 tuning part 1

2021-03-31 Thread Jan Hubicka
> On 3/31/21 1:08 PM, Jan Hubicka wrote:
> > > 
> > > 2021-03-15  Jan Hubicka  
> > > 
> > >   * config/i386/i386-options.c (processor_cost_table): Add znver3_cost.
> > >   * config/i386/x86-tune-costs.h (znver3_cost): New gobal variable; copy
> > >   of znver2_cost.
> > 
> > I have backported the patch to gcc10 branch as
> > g:aa99212489545c6c970a8f91b3d37ea6466cb988.
> > 
> > Honza
> > 
> 
> Looking at the backport, it has likely enabled PR99753.
> Please consider backporting 4f00c4d40a539360938607561460904663c64cda.

Not this patch but backport of the original Venkat's change indeed
triggers PR99763.  I am now testing backport if this revision.

Thanks,
Honza
> 
> Cheers,
> Martin


[PATCH] rs6000: Avoid -fpatchable-function-entry* regressions on powerpc64 be [PR98125]

2021-03-31 Thread Jakub Jelinek via Gcc-patches
Hi!

The SECTION_LINK_ORDER changes broke powerpc64-linux ELFv1.  Seems
that the assembler/linker relies on the symbol mentioned for the
"awo" section to be in the same section as the symbols mentioned in
the relocations in that section (i.e. labels for the patchable area
in this case).  That is the case for most targets, including powerpc-linux
32-bit or powerpc64 ELFv2 (that one has -fpatchable-function-entry*
support broken for other reasons and it doesn't seem to be a regression).
But it doesn't work on powerpc64-linux ELFv1.
We emit:
.section".opd","aw"
.align 3
_Z3foov:
.quad   .L._Z3foov,.TOC.@tocbase,0
.previous
.type   _Z3foov, @function
.L._Z3foov:
.section__patchable_function_entries,"awo",@progbits,_Z3foov
.align 3
.8byte  .LPFE1
.section.text._Z3foov,"axG",@progbits,_Z3foov,comdat
.LPFE1:
nop
.LFB0:
.cfi_startproc
and because _Z3foov is in the .opd section rather than the function text
section, it doesn't work.

I'm afraid I don't know what exactly should be done, whether e.g.
it could use
.section__patchable_function_entries,"awo",@progbits,.L._Z3foov
instead, or whether the linker should be changed to handle it as is, or
something else.

But because we have a P1 regression that didn't see useful progress over the
4 months since it has been filed and we don't really have much time, below
is an attempt to do a targetted reversion of H.J's patch, basically act as
if HAVE_GAS_SECTION_LINK_ORDER is never true for powerpc64-linux ELFv1,
but for 32-bit or 64-bit ELFv2 keep working as is.
This would give us time to resolve it for GCC 12 properly.

Bootstrapped/regtested on powerpc64-linux (regtest with both -m32/-m64).

2021-03-31  Jakub Jelinek  

PR testsuite/98125
* targhooks.h (default_print_patchable_function_entry_1): Declare.
* targhooks.c (default_print_patchable_function_entry_1): New function,
copied from default_print_patchable_function_entry with an added flags
argument.
(default_print_patchable_function_entry): Rewritten into a small
wrapper around default_print_patchable_function_entry_1.
* config/rs6000/rs6000.c (TARGET_ASM_PRINT_PATCHABLE_FUNCTION_ENTRY):
Redefine.
(rs6000_print_patchable_function_entry): New function.

* g++.dg/pr93195a.C: Skip on powerpc*-*-* 64-bit.

--- gcc/targhooks.h.jj  2021-01-04 10:25:39.665224403 +0100
+++ gcc/targhooks.h 2021-03-30 15:48:42.826706369 +0200
@@ -230,6 +230,9 @@ extern bool default_use_by_pieces_infras
bool);
 extern int default_compare_by_pieces_branch_ratio (machine_mode);
 
+extern void default_print_patchable_function_entry_1 (FILE *,
+ unsigned HOST_WIDE_INT,
+ bool, unsigned int);
 extern void default_print_patchable_function_entry (FILE *,
unsigned HOST_WIDE_INT,
bool);
--- gcc/targhooks.c.jj  2021-01-04 10:25:38.974232228 +0100
+++ gcc/targhooks.c 2021-03-30 15:51:22.924932795 +0200
@@ -1832,17 +1832,15 @@ default_compare_by_pieces_branch_ratio (
   return 1;
 }
 
-/* Write PATCH_AREA_SIZE NOPs into the asm outfile FILE around a function
-   entry.  If RECORD_P is true and the target supports named sections,
-   the location of the NOPs will be recorded in a special object section
-   called "__patchable_function_entries".  This routine may be called
-   twice per function to put NOPs before and after the function
-   entry.  */
+/* Helper for default_print_patchable_function_entry and other
+   print_patchable_function_entry hook implementations.  */
 
 void
-default_print_patchable_function_entry (FILE *file,
-   unsigned HOST_WIDE_INT patch_area_size,
-   bool record_p)
+default_print_patchable_function_entry_1 (FILE *file,
+ unsigned HOST_WIDE_INT
+ patch_area_size,
+ bool record_p,
+ unsigned int flags)
 {
   const char *nop_templ = 0;
   int code_num;
@@ -1864,9 +1862,6 @@ default_print_patchable_function_entry (
   patch_area_number++;
   ASM_GENERATE_INTERNAL_LABEL (buf, "LPFE", patch_area_number);
 
-  unsigned int flags = SECTION_WRITE | SECTION_RELRO;
-  if (HAVE_GAS_SECTION_LINK_ORDER)
-   flags |= SECTION_LINK_ORDER;
   switch_to_section (get_section ("__patchable_function_entries",
  flags, current_function_decl));
   assemble_align (POINTER_SIZE);
@@ -1883,6 +1878,25 @@ default_print_patchable_function_entry (
 output_asm_insn (nop_templ, NULL);
 }
 

Re: znver3 tuning part 1

2021-03-31 Thread Martin Liška

On 3/31/21 1:08 PM, Jan Hubicka wrote:


2021-03-15  Jan Hubicka  

* config/i386/i386-options.c (processor_cost_table): Add znver3_cost.
* config/i386/x86-tune-costs.h (znver3_cost): New gobal variable; copy
of znver2_cost.


I have backported the patch to gcc10 branch as
g:aa99212489545c6c970a8f91b3d37ea6466cb988.

Honza



Looking at the backport, it has likely enabled PR99753.
Please consider backporting 4f00c4d40a539360938607561460904663c64cda.

Cheers,
Martin


Re: [PATCH] Handle CONST_POLY_INTs in CONST_VECTORs [PR97141, PR98726]

2021-03-31 Thread Richard Biener via Gcc-patches
On Wed, Mar 31, 2021 at 12:24 PM Richard Sandiford via Gcc-patches
 wrote:
>
> This PR is caused by POLY_INT_CSTs being (necessarily) valid
> in tree-level VECTOR_CSTs but CONST_POLY_INTs not being valid
> in RTL CONST_VECTORs.  I can't tell/remember how deliberate
> that was, but I'm guessing not very.  In particular,
> valid_for_const_vector_p was added to guard against symbolic
> constants rather than CONST_POLY_INTs.
>
> I did briefly consider whether we should maintain the current
> status anyway.  However, that would then require a way of
> constructing variable-length vectors from individiual elements
> if, say, we have:
>
>{ [2, 2], [3, 2], [4, 2], … }
>
> So I'm chalking this up to an oversight.  I think the intention
> (and certainly the natural thing) is to have the same rules for
> both trees and RTL.
>
> The SVE CONST_VECTOR code should already be set up to handle
> CONST_POLY_INTs.  However, we need to add support for Advanced SIMD
> CONST_VECTORs that happen to contain SVE-based values.  The patch does
> that by expanding such CONST_VECTORs in the same way as variable vectors.
>
> Tested on aarch64-linux-gnu, armeb-eabi and x86_64-linux-gnu.
> OK for the target-independent bits?

OK.

Richard.

> Richard
>
>
> gcc/
> PR rtl-optimization/97141
> PR rtl-optimization/98726
> * emit-rtl.c (valid_for_const_vector_p): Return true for
> CONST_POLY_INT_P.
> * rtx-vector-builder.h (rtx_vector_builder::step): Return a
> poly_wide_int instead of a wide_int.
> (rtx_vector_builder::apply_set): Take a poly_wide_int instead
> of a wide_int.
> * rtx-vector-builder.c (rtx_vector_builder::apply_set): Likewise.
> * config/aarch64/aarch64.c (aarch64_legitimate_constant_p): Return
> false for CONST_VECTORs that cannot be forced to memory.
> * config/aarch64/aarch64-simd.md (mov): If a CONST_VECTOR
> is too complex to force to memory, build it up from individual
> elements instead.
>
> gcc/testsuite/
> PR rtl-optimization/97141
> PR rtl-optimization/98726
> * gcc.c-torture/compile/pr97141.c: New test.
> * gcc.c-torture/compile/pr98726.c: Likewise.
> * gcc.target/aarch64/sve/pr97141.c: Likewise.
> * gcc.target/aarch64/sve/pr98726.c: Likewise.
> ---
>  gcc/config/aarch64/aarch64-simd.md| 11 +
>  gcc/config/aarch64/aarch64.c  | 24 +++
>  gcc/emit-rtl.c|  1 +
>  gcc/rtx-vector-builder.c  |  6 ++---
>  gcc/rtx-vector-builder.h  | 10 
>  gcc/testsuite/gcc.c-torture/compile/pr97141.c |  8 +++
>  gcc/testsuite/gcc.c-torture/compile/pr98726.c |  7 ++
>  .../gcc.target/aarch64/sve/pr97141.c  | 10 
>  .../gcc.target/aarch64/sve/pr98726.c  |  9 +++
>  9 files changed, 68 insertions(+), 18 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.c-torture/compile/pr97141.c
>  create mode 100644 gcc/testsuite/gcc.c-torture/compile/pr98726.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pr97141.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pr98726.c
>
> diff --git a/gcc/config/aarch64/aarch64-simd.md 
> b/gcc/config/aarch64/aarch64-simd.md
> index d86e8e72f18..4edee99051c 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -35,6 +35,17 @@ (define_expand "mov"
> && aarch64_mem_pair_operand (operands[0], DImode))
>|| known_eq (GET_MODE_SIZE (mode), 8
>operands[1] = force_reg (mode, operands[1]);
> +
> +  /* If a constant is too complex to force to memory (e.g. because it
> + contains CONST_POLY_INTs), build it up from individual elements instead.
> + We should only need to do this before RA; aarch64_legitimate_constant_p
> + should ensure that we don't try to rematerialize the constant later.  */
> +  if (GET_CODE (operands[1]) == CONST_VECTOR
> +  && targetm.cannot_force_const_mem (mode, operands[1]))
> +{
> +  aarch64_expand_vector_init (operands[0], operands[1]);
> +  DONE;
> +}
>"
>  )
>
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 5eda9e80bd0..77573e96820 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -17925,10 +17925,22 @@ aarch64_legitimate_constant_p (machine_mode mode, 
> rtx x)
>  {
>/* Support CSE and rematerialization of common constants.  */
>if (CONST_INT_P (x)
> -  || (CONST_DOUBLE_P (x) && GET_MODE_CLASS (mode) == MODE_FLOAT)
> -  || GET_CODE (x) == CONST_VECTOR)
> +  || (CONST_DOUBLE_P (x) && GET_MODE_CLASS (mode) == MODE_FLOAT))
>  return true;
>
> +  /* Only accept variable-length vector constants if they can be
> + handled directly.
> +
> + ??? It would be possible (but complex) to handle rematerialization
> + of other 

Re: znver3 tuning part 1

2021-03-31 Thread Jan Hubicka
> 
> 2021-03-15  Jan Hubicka  
> 
>   * config/i386/i386-options.c (processor_cost_table): Add znver3_cost.
>   * config/i386/x86-tune-costs.h (znver3_cost): New gobal variable; copy
>   of znver2_cost.

I have backported the patch to gcc10 branch as
g:aa99212489545c6c970a8f91b3d37ea6466cb988.

Honza


Re: [PATCH] gimple-fold: Recompute ADDR_EXPR flags after folding a TMR [PR98268]

2021-03-31 Thread Richard Biener via Gcc-patches
On Wed, Mar 31, 2021 at 12:23 PM Richard Sandiford via Gcc-patches
 wrote:
>
> The gimple verifier picked up that an ADDR_EXPR of a MEM_REF was not
> marked TREE_CONSTANT even though the address was in fact invariant.
> This came from folding a _MEM_REF with constant operands to
> a _REF; _MEM_REF is never treated as TREE_CONSTANT
> but _REF can be.
>
> Tested on aarch64-linux-gnu, armeb-eabi and x86_64-linux-gnu.
> OK to install?

OK.

Richard.

> Richard
>
> gcc/
> PR tree-optimization/98268
> * gimple-fold.c (maybe_canonicalize_mem_ref_addr): Call
> recompute_tree_invariant_for_addr_expr after successfully
> folding a TARGET_MEM_REF that occurs inside an ADDR_EXPR.
>
> gcc/testsuite/
> PR tree-optimization/98268
> * gcc.target/aarch64/sve/pr98268-1.c: New test.
> * gcc.target/aarch64/sve/pr98268-2.c: Likewise.
> ---
>  gcc/gimple-fold.c|  2 ++
>  gcc/testsuite/gcc.target/aarch64/sve/pr98268-1.c | 11 +++
>  gcc/testsuite/gcc.target/aarch64/sve/pr98268-2.c | 10 ++
>  3 files changed, 23 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pr98268-1.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pr98268-2.c
>
> diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
> index de5a6c22395..9e6683dbac9 100644
> --- a/gcc/gimple-fold.c
> +++ b/gcc/gimple-fold.c
> @@ -5877,6 +5877,8 @@ maybe_canonicalize_mem_ref_addr (tree *t, bool is_debug 
> = false)
>if (tem)
> {
>   *t = tem;
> + if (TREE_CODE (*orig_t) == ADDR_EXPR)
> +   recompute_tree_invariant_for_addr_expr (*orig_t);
>   res = true;
> }
>  }
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr98268-1.c 
> b/gcc/testsuite/gcc.target/aarch64/sve/pr98268-1.c
> new file mode 100644
> index 000..fdbe55e0b4e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pr98268-1.c
> @@ -0,0 +1,11 @@
> +/* { dg-do link } */
> +/* { dg-options "-flto -O -ftree-vectorize 
> --param=aarch64-autovec-preference=3" } */
> +/* { dg-additional-sources "pr98268-2.c" } */
> +
> +short d, e;
> +void f(char, long*);
> +int main() {
> +  long x;
> +  f(-114, );
> +  return d == e;
> +}
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr98268-2.c 
> b/gcc/testsuite/gcc.target/aarch64/sve/pr98268-2.c
> new file mode 100644
> index 000..de3b05d5e15
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pr98268-2.c
> @@ -0,0 +1,10 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O -ftree-vectorize --param=aarch64-autovec-preference=3" } 
> */
> +
> +extern short d[], e[];
> +void f(char a, long *b) {
> +  for (int c = 0; c < a - 12; c++) {
> +d[c] = b[c];
> +e[c] = 0;
> +  }
> +}


Re: [PATCH] data-ref: Tighten index-based alias checks [PR99726]

2021-03-31 Thread Richard Biener via Gcc-patches
On Wed, Mar 31, 2021 at 12:15 PM Richard Sandiford via Gcc-patches
 wrote:
>
> create_intersect_range_checks_index tries to create a runtime
> alias check based on index comparisons.  It looks through the
> access functions for the two DRs to find a SCEV for the loop
> that is being versioned and converts a DR_STEP-based check
> into an index-based check.
>
> However, there isn't any reliable sign information in the types,
> so the code expects the value of the IV step (when interpreted as
> signed) to be negative iff the DR_STEP (when interpreted as signed)
> is negative.
>
> r10-4762 added another assert related to this assumption and the
> assert fired for the testcase in the PR.  The sign of the IV step
> didn't match the sign of the DR_STEP.
>
> I think this is actually showing what was previously a wrong-code bug.
> The signs didn't match because the DRs contained *two* access function
> SCEVs for the loop being versioned.  It doesn't look like the code
> is set up to deal with this, since it checks each access function
> independently and treats it as the sole source of DR_STEP.

I think it shows that matching DR_STEP with the access function step
doesn't make much sense.  In practice it will work for the case where
there's a single access function evolving in the respective loop since
we don't have negative stride arrays via array_ref_element_size
(do we?).  But of course for multiple access functions we can't
simply choose one to do an index overlap test but instead we'd
have to check all of them.

That said, I wonder why the code needs the DR_STEP at all
and cannot simply use the SCEVs step [direction].

Of course the patch is correct in that it restricts handling to a single
access fn evolving in the loop.  Thus - OK.  Removing the DR_STEP
restriction might allow more (weird, see above) cases to be handled.

Thanks,
Richard.

> The patch therefore moves the main condition out of the loop.
> This also has the advantage of not building a tree for one access
> function only to throw it away if we find an inner function that
> makes the comparison invalid.
>
> Tested on aarch64-linux-gnu, armeb-eabi and x86_64-linux-gnu.
> OK to install?
>
> Richard
>
>
> gcc/
> PR tree-optimization/99726
> * tree-data-ref.c (create_intersect_range_checks_index): Bail
> out if there is more than one access function SCEV for the loop
> being versioned.
>
> gcc/testsuite/
> PR tree-optimization/99726
> * gcc.target/i386/pr99726.c: New test.
> ---
>  gcc/testsuite/gcc.target/i386/pr99726.c |  15 ++
>  gcc/tree-data-ref.c | 245 +---
>  2 files changed, 143 insertions(+), 117 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr99726.c
>
> [ -b version ]--
>
> diff --git a/gcc/testsuite/gcc.target/i386/pr99726.c 
> b/gcc/testsuite/gcc.target/i386/pr99726.c
> new file mode 100644
> index 000..ff19bcabe4f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr99726.c
> @@ -0,0 +1,15 @@
> +/* { dg-options "-flive-patching=inline-clone -mavx512f -O2 
> -floop-nest-optimize -ftree-loop-vectorize -ftrapv -m32" } */
> +
> +extern int a[256][1024];
> +int b;
> +long c, d;
> +unsigned int e;
> +
> +int
> +main ()
> +{
> +  for (; e < d; e++)
> +for (unsigned j = 1; j < c; j++)
> +  a[e][j] = b * a[e - 1][j + 1];
> +  return 0;
> +}
> diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c
> index 124a7bea6a9..e6dd5f15bed 100644
> --- a/gcc/tree-data-ref.c
> +++ b/gcc/tree-data-ref.c
> @@ -2147,8 +2147,8 @@ create_intersect_range_checks_index (class loop *loop, 
> tree *cond_expr,
>
>bool waw_or_war_p = (alias_pair.flags & ~(DR_ALIAS_WAR | DR_ALIAS_WAW)) == 
> 0;
>
> -  unsigned int i;
> -  for (i = 0; i < DR_NUM_DIMENSIONS (dr_a.dr); i++)
> +  int found = -1;
> +  for (unsigned int i = 0; i < DR_NUM_DIMENSIONS (dr_a.dr); i++)
>  {
>tree access1 = DR_ACCESS_FN (dr_a.dr, i);
>tree access2 = DR_ACCESS_FN (dr_b.dr, i);
> @@ -2164,7 +2164,19 @@ create_intersect_range_checks_index (class loop *loop, 
> tree *cond_expr,
>
>   return false;
> }
> +  if (found >= 0)
> +   return false;
> +  found = i;
> +}
> +
> +  /* Ought not to happen in practice, since if all accesses are equal then 
> the
> + alias should be decidable at compile time.  */
> +  if (found < 0)
> +return false;
> +
>/* The two indices must have the same step.  */
> +  tree access1 = DR_ACCESS_FN (dr_a.dr, found);
> +  tree access2 = DR_ACCESS_FN (dr_b.dr, found);
>if (!operand_equal_p (CHREC_RIGHT (access1), CHREC_RIGHT (access2), 0))
>  return false;
>
> @@ -2221,7 +2233,7 @@ create_intersect_range_checks_index (class loop *loop, 
> tree *cond_expr,
>   When checking for general overlap, we need to test whether
>   the range:
>
> -  [min1 + low_offset1, min2 + high_offset1 + idx_access1 - 1]
> +   

[committed] aarch64: Fix target alignment for SVE [PR98119]

2021-03-31 Thread Richard Sandiford via Gcc-patches
The vectoriser supports peeling for alignment using predication:
we move back to the previous aligned boundary and make the skipped
elements inactive in the first loop iteration.  As it happens,
the costs for existing CPUs give an equal cost to aligned and
unaligned accesses, so this feature is rarely used.

However, the PR shows that when the feature was forced on, we were
still trying to align to a full-vector boundary even when using
partial vectors.

Tested on aarch64-linux-gnu, pushed to trunk so far.

Richard


gcc/
PR target/98119
* config/aarch64/aarch64.c
(aarch64_vectorize_preferred_vector_alignment): Query the size
of the provided SVE vector; do not assume that all SVE vectors
have the same size.

gcc/testsuite/
PR target/98119
* gcc.target/aarch64/sve/pr98119.c: New test.
---
 gcc/config/aarch64/aarch64.c   |  7 ---
 gcc/testsuite/gcc.target/aarch64/sve/pr98119.c | 13 +
 2 files changed, 17 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pr98119.c

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 5eda9e80bd0..f878721f13c 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -20275,10 +20275,11 @@ aarch64_vectorize_preferred_vector_alignment 
(const_tree type)
 {
   if (aarch64_sve_data_mode_p (TYPE_MODE (type)))
 {
-  /* If the length of the vector is fixed, try to align to that length,
-otherwise don't try to align at all.  */
+  /* If the length of the vector is a fixed power of 2, try to align
+to that length, otherwise don't try to align at all.  */
   HOST_WIDE_INT result;
-  if (!BITS_PER_SVE_VECTOR.is_constant ())
+  if (!GET_MODE_BITSIZE (TYPE_MODE (type)).is_constant ()
+ || !pow2p_hwi (result))
result = TYPE_ALIGN (TREE_TYPE (type));
   return result;
 }
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr98119.c 
b/gcc/testsuite/gcc.target/aarch64/sve/pr98119.c
new file mode 100644
index 000..da6208c2ce3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pr98119.c
@@ -0,0 +1,13 @@
+/* { dg-options "-O3 -msve-vector-bits=512 -mtune=thunderx" } */
+
+void
+f (unsigned short *x)
+{
+  for (int i = 0; i < 1000; ++i)
+x[i] += x[i - 16];
+}
+
+/* { dg-final { scan-assembler-not {\tubfx\t[wx][0-9]+, [wx][0-9]+, #?1, 
#?5\n} } } */
+/* { dg-final { scan-assembler-not {\tand\tx[0-9]+, x[0-9]+, #?-63\n} } } */
+/* { dg-final { scan-assembler {\tubfx\t[wx][0-9]+, [wx][0-9]+, #?1, #?4\n} } 
} */
+/* { dg-final { scan-assembler {\tand\tx[0-9]+, x[0-9]+, #?-31\n} } } */


[PATCH] Handle CONST_POLY_INTs in CONST_VECTORs [PR97141, PR98726]

2021-03-31 Thread Richard Sandiford via Gcc-patches
This PR is caused by POLY_INT_CSTs being (necessarily) valid
in tree-level VECTOR_CSTs but CONST_POLY_INTs not being valid
in RTL CONST_VECTORs.  I can't tell/remember how deliberate
that was, but I'm guessing not very.  In particular,
valid_for_const_vector_p was added to guard against symbolic
constants rather than CONST_POLY_INTs.

I did briefly consider whether we should maintain the current
status anyway.  However, that would then require a way of
constructing variable-length vectors from individiual elements
if, say, we have:

   { [2, 2], [3, 2], [4, 2], … }

So I'm chalking this up to an oversight.  I think the intention
(and certainly the natural thing) is to have the same rules for
both trees and RTL.

The SVE CONST_VECTOR code should already be set up to handle
CONST_POLY_INTs.  However, we need to add support for Advanced SIMD
CONST_VECTORs that happen to contain SVE-based values.  The patch does
that by expanding such CONST_VECTORs in the same way as variable vectors.

Tested on aarch64-linux-gnu, armeb-eabi and x86_64-linux-gnu.
OK for the target-independent bits?

Richard


gcc/
PR rtl-optimization/97141
PR rtl-optimization/98726
* emit-rtl.c (valid_for_const_vector_p): Return true for
CONST_POLY_INT_P.
* rtx-vector-builder.h (rtx_vector_builder::step): Return a
poly_wide_int instead of a wide_int.
(rtx_vector_builder::apply_set): Take a poly_wide_int instead
of a wide_int.
* rtx-vector-builder.c (rtx_vector_builder::apply_set): Likewise.
* config/aarch64/aarch64.c (aarch64_legitimate_constant_p): Return
false for CONST_VECTORs that cannot be forced to memory.
* config/aarch64/aarch64-simd.md (mov): If a CONST_VECTOR
is too complex to force to memory, build it up from individual
elements instead.

gcc/testsuite/
PR rtl-optimization/97141
PR rtl-optimization/98726
* gcc.c-torture/compile/pr97141.c: New test.
* gcc.c-torture/compile/pr98726.c: Likewise.
* gcc.target/aarch64/sve/pr97141.c: Likewise.
* gcc.target/aarch64/sve/pr98726.c: Likewise.
---
 gcc/config/aarch64/aarch64-simd.md| 11 +
 gcc/config/aarch64/aarch64.c  | 24 +++
 gcc/emit-rtl.c|  1 +
 gcc/rtx-vector-builder.c  |  6 ++---
 gcc/rtx-vector-builder.h  | 10 
 gcc/testsuite/gcc.c-torture/compile/pr97141.c |  8 +++
 gcc/testsuite/gcc.c-torture/compile/pr98726.c |  7 ++
 .../gcc.target/aarch64/sve/pr97141.c  | 10 
 .../gcc.target/aarch64/sve/pr98726.c  |  9 +++
 9 files changed, 68 insertions(+), 18 deletions(-)
 create mode 100644 gcc/testsuite/gcc.c-torture/compile/pr97141.c
 create mode 100644 gcc/testsuite/gcc.c-torture/compile/pr98726.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pr97141.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pr98726.c

diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index d86e8e72f18..4edee99051c 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -35,6 +35,17 @@ (define_expand "mov"
&& aarch64_mem_pair_operand (operands[0], DImode))
   || known_eq (GET_MODE_SIZE (mode), 8
   operands[1] = force_reg (mode, operands[1]);
+
+  /* If a constant is too complex to force to memory (e.g. because it
+ contains CONST_POLY_INTs), build it up from individual elements instead.
+ We should only need to do this before RA; aarch64_legitimate_constant_p
+ should ensure that we don't try to rematerialize the constant later.  */
+  if (GET_CODE (operands[1]) == CONST_VECTOR
+  && targetm.cannot_force_const_mem (mode, operands[1]))
+{
+  aarch64_expand_vector_init (operands[0], operands[1]);
+  DONE;
+}
   "
 )
 
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 5eda9e80bd0..77573e96820 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -17925,10 +17925,22 @@ aarch64_legitimate_constant_p (machine_mode mode, rtx 
x)
 {
   /* Support CSE and rematerialization of common constants.  */
   if (CONST_INT_P (x)
-  || (CONST_DOUBLE_P (x) && GET_MODE_CLASS (mode) == MODE_FLOAT)
-  || GET_CODE (x) == CONST_VECTOR)
+  || (CONST_DOUBLE_P (x) && GET_MODE_CLASS (mode) == MODE_FLOAT))
 return true;
 
+  /* Only accept variable-length vector constants if they can be
+ handled directly.
+
+ ??? It would be possible (but complex) to handle rematerialization
+ of other constants via secondary reloads.  */
+  if (!GET_MODE_SIZE (mode).is_constant ())
+return aarch64_simd_valid_immediate (x, NULL);
+
+  /* Otherwise, accept any CONST_VECTOR that, if all else fails, can at
+ least be forced to memory and loaded from there.  */
+  if (GET_CODE (x) == 

[PATCH] gimple-fold: Recompute ADDR_EXPR flags after folding a TMR [PR98268]

2021-03-31 Thread Richard Sandiford via Gcc-patches
The gimple verifier picked up that an ADDR_EXPR of a MEM_REF was not
marked TREE_CONSTANT even though the address was in fact invariant.
This came from folding a _MEM_REF with constant operands to
a _REF; _MEM_REF is never treated as TREE_CONSTANT
but _REF can be.

Tested on aarch64-linux-gnu, armeb-eabi and x86_64-linux-gnu.
OK to install?

Richard

gcc/
PR tree-optimization/98268
* gimple-fold.c (maybe_canonicalize_mem_ref_addr): Call
recompute_tree_invariant_for_addr_expr after successfully
folding a TARGET_MEM_REF that occurs inside an ADDR_EXPR.

gcc/testsuite/
PR tree-optimization/98268
* gcc.target/aarch64/sve/pr98268-1.c: New test.
* gcc.target/aarch64/sve/pr98268-2.c: Likewise.
---
 gcc/gimple-fold.c|  2 ++
 gcc/testsuite/gcc.target/aarch64/sve/pr98268-1.c | 11 +++
 gcc/testsuite/gcc.target/aarch64/sve/pr98268-2.c | 10 ++
 3 files changed, 23 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pr98268-1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pr98268-2.c

diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
index de5a6c22395..9e6683dbac9 100644
--- a/gcc/gimple-fold.c
+++ b/gcc/gimple-fold.c
@@ -5877,6 +5877,8 @@ maybe_canonicalize_mem_ref_addr (tree *t, bool is_debug = 
false)
   if (tem)
{
  *t = tem;
+ if (TREE_CODE (*orig_t) == ADDR_EXPR)
+   recompute_tree_invariant_for_addr_expr (*orig_t);
  res = true;
}
 }
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr98268-1.c 
b/gcc/testsuite/gcc.target/aarch64/sve/pr98268-1.c
new file mode 100644
index 000..fdbe55e0b4e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pr98268-1.c
@@ -0,0 +1,11 @@
+/* { dg-do link } */
+/* { dg-options "-flto -O -ftree-vectorize 
--param=aarch64-autovec-preference=3" } */
+/* { dg-additional-sources "pr98268-2.c" } */
+
+short d, e;
+void f(char, long*);
+int main() {
+  long x;
+  f(-114, );
+  return d == e;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr98268-2.c 
b/gcc/testsuite/gcc.target/aarch64/sve/pr98268-2.c
new file mode 100644
index 000..de3b05d5e15
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pr98268-2.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O -ftree-vectorize --param=aarch64-autovec-preference=3" } */
+
+extern short d[], e[];
+void f(char a, long *b) {
+  for (int c = 0; c < a - 12; c++) {
+d[c] = b[c];
+e[c] = 0;
+  }
+}


[PATCH] data-ref: Tighten index-based alias checks [PR99726]

2021-03-31 Thread Richard Sandiford via Gcc-patches
create_intersect_range_checks_index tries to create a runtime
alias check based on index comparisons.  It looks through the
access functions for the two DRs to find a SCEV for the loop
that is being versioned and converts a DR_STEP-based check
into an index-based check.

However, there isn't any reliable sign information in the types,
so the code expects the value of the IV step (when interpreted as
signed) to be negative iff the DR_STEP (when interpreted as signed)
is negative.

r10-4762 added another assert related to this assumption and the
assert fired for the testcase in the PR.  The sign of the IV step
didn't match the sign of the DR_STEP.

I think this is actually showing what was previously a wrong-code bug.
The signs didn't match because the DRs contained *two* access function
SCEVs for the loop being versioned.  It doesn't look like the code
is set up to deal with this, since it checks each access function
independently and treats it as the sole source of DR_STEP.

The patch therefore moves the main condition out of the loop.
This also has the advantage of not building a tree for one access
function only to throw it away if we find an inner function that
makes the comparison invalid.

Tested on aarch64-linux-gnu, armeb-eabi and x86_64-linux-gnu.
OK to install?

Richard


gcc/
PR tree-optimization/99726
* tree-data-ref.c (create_intersect_range_checks_index): Bail
out if there is more than one access function SCEV for the loop
being versioned.

gcc/testsuite/
PR tree-optimization/99726
* gcc.target/i386/pr99726.c: New test.
---
 gcc/testsuite/gcc.target/i386/pr99726.c |  15 ++
 gcc/tree-data-ref.c | 245 +---
 2 files changed, 143 insertions(+), 117 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr99726.c

[ -b version ]--

diff --git a/gcc/testsuite/gcc.target/i386/pr99726.c 
b/gcc/testsuite/gcc.target/i386/pr99726.c
new file mode 100644
index 000..ff19bcabe4f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr99726.c
@@ -0,0 +1,15 @@
+/* { dg-options "-flive-patching=inline-clone -mavx512f -O2 
-floop-nest-optimize -ftree-loop-vectorize -ftrapv -m32" } */
+
+extern int a[256][1024];
+int b;
+long c, d;
+unsigned int e;
+
+int
+main ()
+{
+  for (; e < d; e++)
+for (unsigned j = 1; j < c; j++)
+  a[e][j] = b * a[e - 1][j + 1];
+  return 0;
+}
diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c
index 124a7bea6a9..e6dd5f15bed 100644
--- a/gcc/tree-data-ref.c
+++ b/gcc/tree-data-ref.c
@@ -2147,8 +2147,8 @@ create_intersect_range_checks_index (class loop *loop, 
tree *cond_expr,
 
   bool waw_or_war_p = (alias_pair.flags & ~(DR_ALIAS_WAR | DR_ALIAS_WAW)) == 0;
 
-  unsigned int i;
-  for (i = 0; i < DR_NUM_DIMENSIONS (dr_a.dr); i++)
+  int found = -1;
+  for (unsigned int i = 0; i < DR_NUM_DIMENSIONS (dr_a.dr); i++)
 {
   tree access1 = DR_ACCESS_FN (dr_a.dr, i);
   tree access2 = DR_ACCESS_FN (dr_b.dr, i);
@@ -2164,7 +2164,19 @@ create_intersect_range_checks_index (class loop *loop, 
tree *cond_expr,
 
  return false;
}
+  if (found >= 0)
+   return false;
+  found = i;
+}
+
+  /* Ought not to happen in practice, since if all accesses are equal then the
+ alias should be decidable at compile time.  */
+  if (found < 0)
+return false;
+
   /* The two indices must have the same step.  */
+  tree access1 = DR_ACCESS_FN (dr_a.dr, found);
+  tree access2 = DR_ACCESS_FN (dr_b.dr, found);
   if (!operand_equal_p (CHREC_RIGHT (access1), CHREC_RIGHT (access2), 0))
 return false;
 
@@ -2221,7 +2233,7 @@ create_intersect_range_checks_index (class loop *loop, 
tree *cond_expr,
  When checking for general overlap, we need to test whether
  the range:
 
-  [min1 + low_offset1, min2 + high_offset1 + idx_access1 - 1]
+   [min1 + low_offset1, min1 + high_offset1 + idx_access1 - 1]
 
  overlaps:
 
@@ -2312,7 +2324,6 @@ create_intersect_range_checks_index (class loop *loop, 
tree *cond_expr,
  *cond_expr, part_cond_expr);
   else
 *cond_expr = part_cond_expr;
-}
   if (dump_enabled_p ())
 {
   if (waw_or_war_p)

[ Real patch ]--

diff --git a/gcc/testsuite/gcc.target/i386/pr99726.c 
b/gcc/testsuite/gcc.target/i386/pr99726.c
new file mode 100644
index 000..ff19bcabe4f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr99726.c
@@ -0,0 +1,15 @@
+/* { dg-options "-flive-patching=inline-clone -mavx512f -O2 
-floop-nest-optimize -ftree-loop-vectorize -ftrapv -m32" } */
+
+extern int a[256][1024];
+int b;
+long c, d;
+unsigned int e;
+
+int
+main ()
+{
+  for (; e < d; e++)
+for (unsigned j = 1; j < c; j++)
+  a[e][j] = b * a[e - 1][j + 1];
+  return 0;
+}
diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c
index 124a7bea6a9..e6dd5f15bed 100644

Re: [PATCH] slp tree vectorizer: Re-calculate vectorization factor in the case of invalid choices [PR96974]

2021-03-31 Thread Richard Biener
On Wed, 31 Mar 2021, Stam Markianos-Wright wrote:

> On 29/03/2021 10:20, Richard Biener wrote:
> > On Fri, 26 Mar 2021, Richard Sandiford wrote:
> > 
> >> Richard Biener  writes:
> >>> On Wed, 24 Mar 2021, Stam Markianos-Wright wrote:
> >>>
>  Hi all,
> 
>  This patch resolves bug:
> 
>  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96974
> 
>  This is achieved by forcing a re-calculation of *stmt_vectype_out if an
>  incompatible combination of TYPE_VECTOR_SUBPARTS is detected, but with an
>  extra introduced max_nunits ceiling.
> 
>  I am not 100% sure if this is the best way to go about fixing this,
>  because
>  this is my first look at the vectorizer and I lack knowledge of the wider
>  context, so do let me know if you see a better way to do this!
> 
>  I have added the previously ICE-ing reproducer as a new test.
> 
>  This is compiled as "g++ -Ofast -march=armv8.2-a+sve -fdisable-tree-fre4"
>  for
>  GCC11 and "g++ -Ofast -march=armv8.2-a+sve" for GCC10.
> 
>  (the non-fdisable-tree-fre4 version has gone latent on GCC11)
> 
>  Bootstrapped and reg-tested on aarch64-linux-gnu.
>  Also reg-tested on aarch64-none-elf.
> >>>
> >>> I don't think this is going to work well given uses will expect
> >>> a vector type that's consistent here.
> >>>
> >>> I think giving up is for the moment the best choice, thus replacing
> >>> the assert with vectorization failure.
> >>>
> >>> In the end we shouldn't require those nunits vectypes to be
> >>> separately computed - we compute the vector type of the defs
> >>> anyway and in case they're invariant the vectorizable_* function
> >>> either can deal with the type mix or not anyway.
> >>
> >> I agree this area needs simplification, but I think the direction of
> >> travel should be to make the assert valid.  I agree this is probably
> >> the pragmatic fix for GCC 11 and earlier though.
> > 
> > The issue is that we compute a vector type for a use that may differ
> > from what we'd compute for it in the context of its definition (or
> > in the context of another use).  Any such "local" decision is likely
> > flawed and I'd rather simplify further doing the only decision on
> > the definition side - if there's a disconnect between the number
> > of lanes (and thus altering the VF won't help) then we have to give
> > up anyway.
> > 
> > Richard.
> > 
> 
> Thank you both for the further info! Would it be fair to close the initial PR
> regarding the ICE (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96974) and
> then open a second one at a lower priority level to address these further
> improvements?

I have closed the original report.  If there's a testcase where
vectorization is possible and useful but is now rejected because of
the fix then yes, please open a new missed-optimization bugreport.

Richard.


Re: [PATCH] testsuite/aarch64: Skip SLP diagnostic under ILP32 (PR target/96974)

2021-03-31 Thread Richard Biener via Gcc-patches
On Mon, Mar 29, 2021 at 1:40 PM Christophe Lyon via Gcc-patches
 wrote:
>
> The vectorizer has a very different effect with -mabi=ilp32, and
> doesn't emit the expecte diagnostic, so this patch expects it only
> under lp64.

OK (please also backport as necessary)

> 2021-03-29  Christophe Lyon  
>
> gcc/testsuite/
> PR target/96974
> * g++.target/aarch64/sve/pr96974.C: Expect SLP diagnostic only
> under lp64.
> ---
>  gcc/testsuite/g++.target/aarch64/sve/pr96974.C | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/testsuite/g++.target/aarch64/sve/pr96974.C 
> b/gcc/testsuite/g++.target/aarch64/sve/pr96974.C
> index 363241d..54000f5 100644
> --- a/gcc/testsuite/g++.target/aarch64/sve/pr96974.C
> +++ b/gcc/testsuite/g++.target/aarch64/sve/pr96974.C
> @@ -15,4 +15,4 @@ struct c {
>  int coeffs[10];
>  } f;
>
> -/* { dg-final { scan-tree-dump "Not vectorized: Incompatible number of 
> vector subparts between" "slp1" } } */
> +/* { dg-final { scan-tree-dump "Not vectorized: Incompatible number of 
> vector subparts between" "slp1" { target lp64 } } } */
> --
> 2.7.4
>


[PATCH] slp tree vectorizer: Re-calculate vectorization factor in the case of invalid choices [PR96974]

2021-03-31 Thread Stam Markianos-Wright via Gcc-patches

On 29/03/2021 10:20, Richard Biener wrote:

On Fri, 26 Mar 2021, Richard Sandiford wrote:


Richard Biener  writes:

On Wed, 24 Mar 2021, Stam Markianos-Wright wrote:


Hi all,

This patch resolves bug:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96974

This is achieved by forcing a re-calculation of *stmt_vectype_out if an
incompatible combination of TYPE_VECTOR_SUBPARTS is detected, but with an
extra introduced max_nunits ceiling.

I am not 100% sure if this is the best way to go about fixing this, because
this is my first look at the vectorizer and I lack knowledge of the wider
context, so do let me know if you see a better way to do this!

I have added the previously ICE-ing reproducer as a new test.

This is compiled as "g++ -Ofast -march=armv8.2-a+sve -fdisable-tree-fre4" for
GCC11 and "g++ -Ofast -march=armv8.2-a+sve" for GCC10.

(the non-fdisable-tree-fre4 version has gone latent on GCC11)

Bootstrapped and reg-tested on aarch64-linux-gnu.
Also reg-tested on aarch64-none-elf.


I don't think this is going to work well given uses will expect
a vector type that's consistent here.

I think giving up is for the moment the best choice, thus replacing
the assert with vectorization failure.

In the end we shouldn't require those nunits vectypes to be
separately computed - we compute the vector type of the defs
anyway and in case they're invariant the vectorizable_* function
either can deal with the type mix or not anyway.


I agree this area needs simplification, but I think the direction of
travel should be to make the assert valid.  I agree this is probably
the pragmatic fix for GCC 11 and earlier though.


The issue is that we compute a vector type for a use that may differ
from what we'd compute for it in the context of its definition (or
in the context of another use).  Any such "local" decision is likely
flawed and I'd rather simplify further doing the only decision on
the definition side - if there's a disconnect between the number
of lanes (and thus altering the VF won't help) then we have to give
up anyway.

Richard.



Thank you both for the further info! Would it be fair to close the 
initial PR regarding the ICE 
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96974) and then open a 
second one at a lower priority level to address these further improvements?


Also Christophe has kindly found out that the test FAILs in ILP32, so it 
would be great to get that one in asap, too! 
https://gcc.gnu.org/pipermail/gcc-patches/2021-March/567431.html


Cheers,
Stam



Small refactoring of cgraph_node::release_body

2021-03-31 Thread Jan Hubicka
Hi,
in the dicussion on PR 99447 there was some confusion about release_body
being used in context where call edges/references survive. This is not
a valid use because it would leave stale pointers to ggc_freed memory
location. By auditing code I did not find any however this patch moves
the callees/references removal into the function itself that makes it
bit more robust.

Some code paths calling release_body already free these earlier, but
checking poitners for being NULL is not that expensive.

Bootstrapped/regtested x86_64-linux, comitted.

PR lto/99447
* cgraph.c (cgraph_node::release_body): Remove all callers and
references.
* cgraphclones.c (cgraph_node::materialize_clone): Do not do it here.
* cgraphunit.c (cgraph_node::expand): And here.
diff --git a/gcc/cgraph.c b/gcc/cgraph.c
index 80140757d16..b77c676a58a 100644
--- a/gcc/cgraph.c
+++ b/gcc/cgraph.c
@@ -1860,6 +1860,9 @@ cgraph_node::release_body (bool keep_arguments)
   lto_free_function_in_decl_state_for_node (this);
   lto_file_data = NULL;
 }
+  gcc_assert (!clones);
+  remove_callees ();
+  remove_all_references ();
 }
 
 /* Remove function from symbol table.  */
diff --git a/gcc/cgraphclones.c b/gcc/cgraphclones.c
index 95103a423f7..9f86463b42d 100644
--- a/gcc/cgraphclones.c
+++ b/gcc/cgraphclones.c
@@ -1143,11 +1143,7 @@ cgraph_node::materialize_clone ()
   /* Function is no longer clone.  */
   remove_from_clone_tree ();
   if (!this_clone_of->analyzed && !this_clone_of->clones)
-{
-  this_clone_of->release_body ();
-  this_clone_of->remove_callees ();
-  this_clone_of->remove_all_references ();
-}
+this_clone_of->release_body ();
 }
 
 #include "gt-cgraphclones.h"
diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
index 1c74cee69ac..0b70e4d4fde 100644
--- a/gcc/cgraphunit.c
+++ b/gcc/cgraphunit.c
@@ -1892,10 +1892,6 @@ cgraph_node::expand (void)
  comdat groups.  */
   assemble_thunks_and_aliases ();
   release_body ();
-  /* Eliminate all call edges.  This is important so the GIMPLE_CALL no longer
- points to the dead function body.  */
-  remove_callees ();
-  remove_all_references ();
 }
 
 /* Node comparator that is responsible for the order that corresponds


[PATCH][pushed] Fix coding style in IPA modref.

2021-03-31 Thread Martin Liška

Approved by Honza and pushed to master.

Martin

gcc/ChangeLog:

* ipa-modref.c (analyze_ssa_name_flags): Fix coding style
and one negated condition.
---
 gcc/ipa-modref.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c
index 5f33bb5b410..ef5e62beb0e 100644
--- a/gcc/ipa-modref.c
+++ b/gcc/ipa-modref.c
@@ -1673,7 +1673,7 @@ analyze_ssa_name_flags (tree name, vec 
, int depth,
 
 			if (!record_ipa)

  lattice[index].merge (call_flags);
-   if (record_ipa)
+   else
  lattice[index].add_escape_point (call, i,
   call_flags, true);
  }
@@ -1691,8 +1691,8 @@ analyze_ssa_name_flags (tree name, vec 
, int depth,
int call_flags = deref_flags
   (gimple_call_arg_flags (call, i), ignore_stores);
if (!record_ipa)
-   lattice[index].merge (call_flags);
-   if (record_ipa)
+ lattice[index].merge (call_flags);
+   else
  lattice[index].add_escape_point (call, i,
   call_flags, false);
  }
--
2.30.2



Re: [PATCH] aarch64: Fix up *add3_poly_1 [PR99813]

2021-03-31 Thread Richard Sandiford via Gcc-patches
Jakub Jelinek  writes:
> Hi!
>
> As mentioned in the PR, Uai constraint stands for
> aarch64_sve_scalar_inc_dec_immediate
> while Uav for
> aarch64_sve_addvl_addpl_immediate.
> Both *add3_aarch64 and *add3_poly_1 patterns use
>   * return aarch64_output_sve_scalar_inc_dec (operands[2]);
>   * return aarch64_output_sve_addvl_addpl (operands[2]);
> in that order, but the former with Uai,Uav order, while the
> latter with Uav,Uai instead.  This patch swaps the constraints
> so that they match the output.
>
> Bootstrapped/regtested on aarch64-linux, ok for trunk?

OK, thanks.

Richard

> 2021-03-30  Jakub Jelinek  
>   Richard Sandiford  
>
>   PR target/99813
>   * config/aarch64/aarch64.md (*add3_poly_1): Swap Uai and Uav
>   constraints on operands[2] and similarly 0 and rk constraints
>   on operands[1] corresponding to that.
>
>   * g++.target/aarch64/sve/pr99813.C: New test.
>
> --- gcc/config/aarch64/aarch64.md.jj  2021-02-25 23:07:07.851319165 +0100
> +++ gcc/config/aarch64/aarch64.md 2021-03-30 11:13:35.994077470 +0200
> @@ -2050,8 +2050,8 @@
>[(set
>  (match_operand:GPI 0 "register_operand" "=r,r,r,r,r,r,")
>  (plus:GPI
> - (match_operand:GPI 1 "register_operand" "%rk,rk,rk,rk,rk,0,rk")
> - (match_operand:GPI 2 "aarch64_pluslong_or_poly_operand" 
> "I,r,J,Uaa,Uav,Uai,Uat")))]
> + (match_operand:GPI 1 "register_operand" "%rk,rk,rk,rk,0,rk,rk")
> + (match_operand:GPI 2 "aarch64_pluslong_or_poly_operand" 
> "I,r,J,Uaa,Uai,Uav,Uat")))]
>"TARGET_SVE && operands[0] != stack_pointer_rtx"
>"@
>add\\t%0, %1, %2
> --- gcc/testsuite/g++.target/aarch64/sve/pr99813.C.jj 2021-03-30 
> 11:22:13.430290522 +0200
> +++ gcc/testsuite/g++.target/aarch64/sve/pr99813.C2021-03-30 
> 11:22:55.526819721 +0200
> @@ -0,0 +1,27 @@
> +// PR target/99813
> +/* { dg-do assemble { target aarch64_asm_sve_ok } } */
> +/* { dg-options "-O3 -march=armv8.2-a+sve -fvect-cost-model=unlimited 
> -fno-tree-dominator-opts -mtune=cortex-a72" } */
> +
> +long a, b;
> +bool c[2][14][2][16], f[2][14][2][16];
> +bool d;
> +char e[2][4][2][6];
> +void g() {
> +  a = 0;
> +  for (int h = 0; h < 2; ++h)
> +for (int i = 0; i < 14; ++i)
> +  for (int j = 0; j < 2; ++j)
> +for (int k = 0; k < 16; ++k)
> +  c[h][i][j][k] = 0;
> +  d = 0;
> +  for (int h; h < 2; ++h)
> +for (int i = 0; i < 4; ++i)
> +  for (int j = 0; j < 2; ++j)
> +for (int k = 0; k < 6; ++k)
> +  e[h][i][j][k] = 6;
> +  for (int h = 0; h < 2; ++h)
> +for (int i = 0; i < 14; ++i)
> +  for (int j = 0; j < 2; ++j)
> +for (int k = 0; k < 16; ++k)
> +  f[h][i][j][k] = b = 9;
> +}
>
>   Jakub


RE: [PATCH] [X86_64]: Enable support for next generation AMD Zen3 CPU

2021-03-31 Thread Kumar, Venkataramanan via Gcc-patches
[AMD Public Use]

Hi Honza, 

> -Original Message-
> From: Jan Hubicka 
> Sent: Wednesday, March 31, 2021 1:27 PM
> To: Kumar, Venkataramanan 
> Cc: Uros Bizjak ; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] [X86_64]: Enable support for next generation AMD Zen3
> CPU
> 
> [CAUTION: External Email]
> 
> > [AMD Public Use]
> >
> > Hi Honza,
> >
> > > -Original Message-
> > > From: Jan Hubicka 
> > > Sent: Wednesday, March 31, 2021 1:15 AM
> > > To: Kumar, Venkataramanan 
> > > Cc: Uros Bizjak ; gcc-patches@gcc.gnu.org
> > > Subject: Re: [PATCH] [X86_64]: Enable support for next generation
> > > AMD Zen3 CPU
> > >
> > > [CAUTION: External Email]
> > >
> > > Hi,
> > > this patch backports the initial support to gcc10 branch.  Since the
> > > trunk and branch diverged there is non-trivial change to cpuinfo
> > > discovery.  I do;
> > >
> > > --- a/libgcc/config/i386/cpuinfo.c
> > > +++ b/libgcc/config/i386/cpuinfo.c
> > > @@ -111,6 +111,12 @@ get_amd_cpu (unsigned int family, unsigned int
> model)
> > >if (model >= 0x30)
> > >  __cpu_model.__cpu_subtype = AMDFAM17H_ZNVER2;
> > >break;
> > > +case 0x19:
> > > +  __cpu_model.__cpu_type = AMDFAM19H;
> > > +  /* AMD family 19h version 1.  */
> > > +  if (model <= 0x0f)
> > > +   __cpu_model.__cpu_subtype = AMDFAM19H_ZNVER3;
> > > +  break;
> > >  default:
> > >break;
> > >  }
> > >
> > > While your patch also sets ZNVER3 for case where VAES is supporte
> > > that would require backporting more of logic detecting VAES.  Is
> > > that necessary?
> >
> > I think you are referring to the below change.
> >
> > diff --git a/gcc/config/i386/driver-i386.c
> > b/gcc/config/i386/driver-i386.c index ecdad5765d5..2bfa037dd8b 100644
> > --- a/gcc/config/i386/driver-i386.c
> > +++ b/gcc/config/i386/driver-i386.c
> > @@ -455,6 +455,8 @@ const char *host_detect_local_cpu (int argc, const
> char **argv)
> > processor = PROCESSOR_GEODE;
> >else if (has_feature (FEATURE_MOVBE) && family == 22)
> > processor = PROCESSOR_BTVER2;
> > +  else if (has_feature (FEATURE_VAES))
> > +   processor = PROCESSOR_ZNVER3;
> >else if (has_feature (FEATURE_CLWB))
> > processor = PROCESSOR_ZNVER2;
> >
> > My understanding is that when we use -march=native on znver3 machine it
> would check for "vaes" to detect the machine.
> > Otherwise it would detect it as znver2 machine.  So we need that detection
> logic.
> 
> I was wondering about
> 
> +case 0x19:
> +  cpu_model->__cpu_type = AMDFAM19H;
> +  /* AMD family 19h version 1.  */
> +  if (model <= 0x0f)
> +   {
> + cpu = "znver3";
> + CHECK___builtin_cpu_is ("znver3");
> + cpu_model->__cpu_subtype = AMDFAM19H_ZNVER3;
> +   }
> +  else if (has_cpu_feature (cpu_model, cpu_features2,
> +   FEATURE_VAES))
> +   {
> + cpu = "znver3";
> + CHECK___builtin_cpu_is ("znver3");
> + cpu_model->__cpu_subtype = AMDFAM19H_ZNVER3;
> +   }
> +  break;
> 
> For znver3 we detect the model number and I wonder why we also test the VAES
> feature when we don't do that for other families.

Ah I see,  I will double check on the model numbers again.
Yes we can remove the code which checks for VAES here.


> 
> Honza

Regards,
Venkat.


Re: [PATCH v2 1/3] x86: Update memcpy/memset inline strategies for Ice Lake

2021-03-31 Thread Jan Hubicka
> > It looks like X86_TUNE_PREFER_KNOWN_REP_MOVSB_STOSB is quite obviously
> > benefical and independent of the rest of changes.  I think we will need
> > to discuss bit more the move ratio and the code size/uop cache polution
> > issues - one option would be to use increased limits for -O3 only.
> 
> My change only increases CLEAR_RATIO, not MOVE_RATIO.   We are
> checking code size impacts on SPEC CPU 2017 and eembc.
> 
> > Can you break this out to independent patch?  I also wonder if it owuld
> 
> X86_TUNE_PREFER_KNOWN_REP_MOVSB_STOSB improves performance
> only when memcpy/memset costs and MOVE_RATIO are updated the same time,
> like:
> 
> https://gcc.gnu.org/pipermail/gcc-patches/2021-March/567096.html
> 
> Make it a standalone means moving from Ice Lake patch to Skylake patch.
> 
> > not be more readable to special case this just on the beggining of
> > decide_alg.
> > > @@ -6890,6 +6891,7 @@ decide_alg (HOST_WIDE_INT count, HOST_WIDE_INT 
> > > expected_size,
> > >const struct processor_costs *cost;
> > >int i;
> > >bool any_alg_usable_p = false;
> > > +  bool known_size_p = expected_size != -1;
> >
> > expected_size is not -1 if we have profile feedback and we detected from
> > histogram average size of a block.  It seems to me that from description
> > that you want the const to be actual compile time constant that would be
> > min_size == max_size I guess.
> >
> 
> You are right.  Here is the v2 patch with min_size != max_size check for
> unknown size.

Patch is OK now.  I was wondering about using avx256 for moves of known
size (per comment on MOVE_MAX_PIECES there is issue with
MAX_FIXED_MODE_SIZE, but that seems not hard to fix). Did you look into
it?

Honza
> 
> Thanks.
> 
> --
> H.J.




Re: [PATCH] [X86_64]: Enable support for next generation AMD Zen3 CPU

2021-03-31 Thread Jan Hubicka
> [AMD Public Use]
> 
> Hi Honza,
> 
> > -Original Message-
> > From: Jan Hubicka 
> > Sent: Wednesday, March 31, 2021 1:15 AM
> > To: Kumar, Venkataramanan 
> > Cc: Uros Bizjak ; gcc-patches@gcc.gnu.org
> > Subject: Re: [PATCH] [X86_64]: Enable support for next generation AMD Zen3
> > CPU
> > 
> > [CAUTION: External Email]
> > 
> > Hi,
> > this patch backports the initial support to gcc10 branch.  Since the
> > trunk and branch diverged there is non-trivial change to cpuinfo
> > discovery.  I do;
> > 
> > --- a/libgcc/config/i386/cpuinfo.c
> > +++ b/libgcc/config/i386/cpuinfo.c
> > @@ -111,6 +111,12 @@ get_amd_cpu (unsigned int family, unsigned int model)
> >if (model >= 0x30)
> >  __cpu_model.__cpu_subtype = AMDFAM17H_ZNVER2;
> >break;
> > +case 0x19:
> > +  __cpu_model.__cpu_type = AMDFAM19H;
> > +  /* AMD family 19h version 1.  */
> > +  if (model <= 0x0f)
> > +   __cpu_model.__cpu_subtype = AMDFAM19H_ZNVER3;
> > +  break;
> >  default:
> >break;
> >  }
> > 
> > While your patch also sets ZNVER3 for case where VAES is supporte that
> > would require backporting more of logic detecting VAES.  Is that
> > necessary? 
> 
> I think you are referring to the below change. 
> 
> diff --git a/gcc/config/i386/driver-i386.c b/gcc/config/i386/driver-i386.c
> index ecdad5765d5..2bfa037dd8b 100644
> --- a/gcc/config/i386/driver-i386.c
> +++ b/gcc/config/i386/driver-i386.c
> @@ -455,6 +455,8 @@ const char *host_detect_local_cpu (int argc, const char 
> **argv)
> processor = PROCESSOR_GEODE;
>else if (has_feature (FEATURE_MOVBE) && family == 22)
> processor = PROCESSOR_BTVER2;
> +  else if (has_feature (FEATURE_VAES))
> +   processor = PROCESSOR_ZNVER3;
>else if (has_feature (FEATURE_CLWB))
> processor = PROCESSOR_ZNVER2;
> 
> My understanding is that when we use -march=native on znver3 machine it would 
> check for "vaes" to detect the machine.
> Otherwise it would detect it as znver2 machine.  So we need that detection 
> logic.

I was wondering about

+case 0x19:
+  cpu_model->__cpu_type = AMDFAM19H;
+  /* AMD family 19h version 1.  */
+  if (model <= 0x0f)
+   {
+ cpu = "znver3";
+ CHECK___builtin_cpu_is ("znver3");
+ cpu_model->__cpu_subtype = AMDFAM19H_ZNVER3;
+   }
+  else if (has_cpu_feature (cpu_model, cpu_features2,
+   FEATURE_VAES))
+   {
+ cpu = "znver3";
+ CHECK___builtin_cpu_is ("znver3");
+ cpu_model->__cpu_subtype = AMDFAM19H_ZNVER3;
+   }
+  break;

For znver3 we detect the model number and I wonder why we also test the
VAES feature when we don't do that for other families.

Honza


RE: [PATCH] arm: Fix mult autovectorization patterm for iwmmxt (PR target/99786)

2021-03-31 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Gcc-patches  On Behalf Of
> Christophe Lyon via Gcc-patches
> Sent: 29 March 2021 14:16
> To: gcc-patches@gcc.gnu.org
> Subject: [PATCH] arm: Fix mult autovectorization patterm for iwmmxt (PR
> target/99786)
> 
> Similarly to other recently-added autovectorization patterns, mult has
> been erroneously enabled for iwmmxt. However, V4HI and V2SI modes are
> supported, so we make an exception for them.

Ok.
Thanks,
Kyrill

> 
> The new testcase is derived from gcc.dg/ubsan/pr79904.c, with
> additional modes added.
> 
> I kept dg-do compile because 'assemble' results in error messages from
> the assembler, which are not related to this PR:
> 
> Error: selected processor does not support `tmcrr wr0,r4,r5' in ARM mode
> Error: selected processor does not support `wstrd wr0,[r0]' in ARM mode
> Error: selected processor does not support `wldrd wr0,[r0]' in ARM mode
> Error: selected processor does not support `wldrd wr2,.L5' in ARM mode
> Error: selected processor does not support `wmulul wr0,wr0,wr2' in ARM
> mode
> Error: selected processor does not support `wstrd wr0,[r0]' in ARM mode
> Error: selected processor does not support `wldrd wr0,[r0]' in ARM mode
> Error: selected processor does not support `wldrd wr2,.L8' in ARM mode
> Error: selected processor does not support `wmulwl wr0,wr0,wr2' in ARM
> mode
> Error: selected processor does not support `wstrd wr0,[r0]' in ARM mode
> 
> 2021-03-29  Christophe Lyon  
> 
>   PR target/99786
> 
>   gcc/
>   * config/arm/vec-common.md (mul3): Disable on iwMMXT,
> expect
>   for V4HI and V2SI.
> 
>   gcc/testsuite/
>   * gcc.target/arm/pr99786.c: New test.
> ---
>  gcc/config/arm/vec-common.md   |  5 -
>  gcc/testsuite/gcc.target/arm/pr99786.c | 30
> ++
>  2 files changed, 34 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/arm/pr99786.c
> 
> diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-
> common.md
> index 48ee659..0b2b3b1 100644
> --- a/gcc/config/arm/vec-common.md
> +++ b/gcc/config/arm/vec-common.md
> @@ -103,7 +103,10 @@ (define_expand "mul3"
>[(set (match_operand:VDQWH 0 "s_register_operand")
>   (mult:VDQWH (match_operand:VDQWH 1 "s_register_operand")
>   (match_operand:VDQWH 2 "s_register_operand")))]
> -  "ARM_HAVE__ARITH"
> +  "ARM_HAVE__ARITH
> +   && (!TARGET_REALLY_IWMMXT
> +   || mode == V4HImode
> +   || mode == V2SImode)"
>  )
> 
>  (define_expand "smin3"
> diff --git a/gcc/testsuite/gcc.target/arm/pr99786.c
> b/gcc/testsuite/gcc.target/arm/pr99786.c
> new file mode 100644
> index 000..11d86f0
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/pr99786.c
> @@ -0,0 +1,30 @@
> +/* { dg-do compile } */
> +/* { dg-skip-if "Test is specific to the iWMMXt" { arm*-*-* } { "-mcpu=*" }
> { "-mcpu=iwmmxt" } } */
> +/* { dg-skip-if "Test is specific to the iWMMXt" { arm*-*-* } { "-mabi=*" }
> { "-mabi=iwmmxt" } } */
> +/* { dg-skip-if "Test is specific to the iWMMXt" { arm*-*-* } { "-march=*" }
> { "-march=iwmmxt" } } */
> +/* { dg-skip-if "Test is specific to ARM mode" { arm*-*-* } { "-mthumb" }
> { "" } } */
> +/* { dg-require-effective-target arm32 } */
> +/* { dg-require-effective-target arm_iwmmxt_ok } */
> +/* { dg-options "-O3 -mcpu=iwmmxt" } */
> +
> +typedef signed char V __attribute__((vector_size (8)));
> +
> +void
> +foo (V *a)
> +{
> +  *a = *a * 3;
> +}
> +
> +typedef signed short Vshort __attribute__((vector_size (8)));
> +void
> +foo_short (Vshort *a)
> +{
> +  *a = *a * 3;
> +}
> +
> +typedef signed int Vint __attribute__((vector_size (8)));
> +void
> +foo_int (Vint *a)
> +{
> +  *a = *a * 3;
> +}
> --
> 2.7.4



Re: [PATCH] testsuite: Disable zero-scratch-regs-{8, 9, 10, 11}.c on s390* [PR97680]

2021-03-31 Thread Eric Botcazou
> That is true, but nothing really happened during the 5 months that the tests
> have been failing on many other architectures (except that powerpc and arm
> had skipped those tests).  There has been a PR open for all those 5 months.

So what?  This is not the first example and I don't see anything special with 
it.  You or maintainers can decide to XFAIL particular architectures at will, 
but hiding the failures by default is IMO not appropriate.

> We can perhaps revert the skips after branching GCC 11 off, but I have
> little hope other target maintainers will do what you did, so unsure if it
> would help.  And the changes need people familiar with each of the backends
> to decide what needs to be done and what is doable.

That's exactly the same situation as for -fstack-usage/-Wstack-usage, where I 
intentionally made gcc.dg/stack-usage-1.c fail by default so that maintainers 
could add the missing bits; this worked relatively well.

-- 
Eric Botcazou




RE: [PATCH] [X86_64]: Enable support for next generation AMD Zen3 CPU

2021-03-31 Thread Kumar, Venkataramanan via Gcc-patches
[AMD Public Use]

Hi Honza,

> -Original Message-
> From: Jan Hubicka 
> Sent: Wednesday, March 31, 2021 1:15 AM
> To: Kumar, Venkataramanan 
> Cc: Uros Bizjak ; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] [X86_64]: Enable support for next generation AMD Zen3
> CPU
> 
> [CAUTION: External Email]
> 
> Hi,
> this patch backports the initial support to gcc10 branch.  Since the
> trunk and branch diverged there is non-trivial change to cpuinfo
> discovery.  I do;
> 
> --- a/libgcc/config/i386/cpuinfo.c
> +++ b/libgcc/config/i386/cpuinfo.c
> @@ -111,6 +111,12 @@ get_amd_cpu (unsigned int family, unsigned int model)
>if (model >= 0x30)
>  __cpu_model.__cpu_subtype = AMDFAM17H_ZNVER2;
>break;
> +case 0x19:
> +  __cpu_model.__cpu_type = AMDFAM19H;
> +  /* AMD family 19h version 1.  */
> +  if (model <= 0x0f)
> +   __cpu_model.__cpu_subtype = AMDFAM19H_ZNVER3;
> +  break;
>  default:
>break;
>  }
> 
> While your patch also sets ZNVER3 for case where VAES is supporte that
> would require backporting more of logic detecting VAES.  Is that
> necessary? 

I think you are referring to the below change. 

diff --git a/gcc/config/i386/driver-i386.c b/gcc/config/i386/driver-i386.c
index ecdad5765d5..2bfa037dd8b 100644
--- a/gcc/config/i386/driver-i386.c
+++ b/gcc/config/i386/driver-i386.c
@@ -455,6 +455,8 @@ const char *host_detect_local_cpu (int argc, const char 
**argv)
processor = PROCESSOR_GEODE;
   else if (has_feature (FEATURE_MOVBE) && family == 22)
processor = PROCESSOR_BTVER2;
+  else if (has_feature (FEATURE_VAES))
+   processor = PROCESSOR_ZNVER3;
   else if (has_feature (FEATURE_CLWB))
processor = PROCESSOR_ZNVER2;

My understanding is that when we use -march=native on znver3 machine it would 
check for "vaes" to detect the machine.
Otherwise it would detect it as znver2 machine.  So we need that detection 
logic.


> I see it may make znver3 to be defaulted on future znver4 if
> it stays with amdfam19, but we did not do this before.
Usually we have at least one differentiating ISA or change in family name or 
model name going from one processor to another. 

> 
> Bootstrapped/regtested x86_64-linux.  With -march=native on znver3
> machine we get right flags, but trunk in addition passes:
> 
> -mno-amx-bf16
> -mno-amx-int8
> -mno-amx-tile
> -mno-avxvnni
> -mno-hreset
> -mno-kl
> -mno-serialize
> -mno-tsxldtrk
> -mno-uintr
> -mno-widekl
> 
> Which are options we did not backported.
> Atop of that I plan to backport the tuning patches with exception of
> gather which seems bit controversal and can wait for gcc11.

Ok that seems fine.

Regards,
Venkat.

> 
> Honza
> 
> 2021-03-30  Jan Hubicka  
> 
> Backport
> 
> Venkataramanan Kumar  
> Sharavan Kumar  
> * common/config/i386/cpuinfo.h (get_amd_cpu) recognize znver3.
> * common/config/i386/i386-common.c (processor_names): Add
> znver3.
> (processor_alias_table): Add znver3 and AMDFAM19H entry.
> * common/config/i386/i386-cpuinfo.h (processor_types): Add
> AMDFAM19H.
> (processor_subtypes): AMDFAM19H_ZNVER3.
> * config.gcc (i[34567]86-*-linux* | ...): Likewise.
> * config/i386/driver-i386.c: (host_detect_local_cpu): Let
> -march=native recognize znver3 processors.
> * config/i386/i386-c.c (ix86_target_macros_internal): Add
> znver3.
> * config/i386/i386-options.c (m_znver3): New definition.
> (m_ZNVER): Include m_znver3.
> (processor_cost_table): Add znver3.
> * config/i386/i386.c (ix86_reassociation_width): Likewise.
> * config/i386/i386.h (TARGET_znver3): New definition.
> (enum processor_type): Add PROCESSOR_ZNVER3.
> * config/i386/i386.md (define_attr "cpu"): Add znver3.
> * config/i386/x86-tune-sched.c: (ix86_issue_rate): Likewise.
> (ix86_adjust_cost): Likewise.
> * config/i386/x86-tune.def (X86_TUNE_AVOID_256FMA_CHAINS:
> Likewise.
> * config/i386/znver1.md: Add new reservations for znver3.
> * doc/extend.texi: Add details about znver3.
> * doc/invoke.texi: Likewise.
> 
> gcc/testsuite/ChangeLog:
> 
> 2021-03-30  Jan Hubicka  
> 
> * gcc.target/i386/funcspec-56.inc: Handle new march.
> 
> libgcc/ChangeLog:
> 
> 2021-03-30  Jan Hubicka  
> 
> * config/i386/cpuinfo.c (get_amd_cpu): Support amdfam19.
> * config/i386/cpuinfo.h (enum processor_types): Add AMDFAM19H.
> (enum processor_subtypes): Add AMDFAM19H_ZNVER3.
> 
> diff --git a/gcc/common/config/i386/i386-common.c
> b/gcc/common/config/i386/i386-common.c
> index 1e4d25f052a..97335d42af1 100644
> --- a/gcc/common/config/i386/i386-common.c
> +++ b/gcc/common/config/i386/i386-common.c
> @@ -1582,7 +1582,8 @@ const char *const processor_names[] =
>"btver1",
>"btver2",
>"znver1",
> -  "znver2"
> +  "znver2",
> +  "znver3"
>  

Re: Patch ping

2021-03-31 Thread Richard Biener
On Wed, 31 Mar 2021, Jakub Jelinek wrote:

> Hi!
> 
> I'd like to ping the PR98860 P1 fix - workaround for PECOFF linkers without
> DWARF5 support - to make -gdwarf-4 the default in such configurations.
> 
> https://gcc.gnu.org/pipermail/gcc-patches/2021-March/567245.html

OK.

Richard.

> Thanks
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)


Patch ping

2021-03-31 Thread Jakub Jelinek via Gcc-patches
Hi!

I'd like to ping the PR98860 P1 fix - workaround for PECOFF linkers without
DWARF5 support - to make -gdwarf-4 the default in such configurations.

https://gcc.gnu.org/pipermail/gcc-patches/2021-March/567245.html

Thanks

Jakub



Re: [PATCH] testsuite: Disable zero-scratch-regs-{8,9,10,11}.c on s390* [PR97680]

2021-03-31 Thread Jakub Jelinek via Gcc-patches
On Wed, Mar 31, 2021 at 08:01:29AM +0200, Eric Botcazou wrote:
> > It looks like the latter - I've seen no attempt by the original authors to
> > make the feature work on more targets than they cared for.
> 
> On the other hand, if you hide the failures, there is essentially zero chance 
> that architecture maintainers pick up the pieces (I personally implemented 
> the 
> SPARC support only because I had ran into the failures in the testsuite).  So 
> doing the inverse filtering sounds quite counterproductive to me and IMO it's 
> up to the architecture maintainers to decide on a case-by-case basis.

That is true, but nothing really happened during the 5 months that the tests
have been failing on many other architectures (except that powerpc and arm
had skipped those tests).  There has been a PR open for all those 5 months.

We can perhaps revert the skips after branching GCC 11 off, but I have
little hope other target maintainers will do what you did, so unsure if it
would help.  And the changes need people familiar with each of the backends
to decide what needs to be done and what is doable.

Jakub



Re: [PATCH v2 1/3] x86: Update memcpy/memset inline strategies for Ice Lake

2021-03-31 Thread Richard Biener via Gcc-patches
On Tue, Mar 23, 2021 at 12:59 AM H.J. Lu via Gcc-patches
 wrote:
>
> On Mon, Mar 22, 2021 at 7:10 AM Jan Hubicka  wrote:
> >
> > >
> > > gcc/
> > >
> > >   * config/i386/i386-expand.c (expand_set_or_cpymem_via_rep):
> > >   For TARGET_PREFER_KNOWN_REP_MOVSB_STOSB, don't convert QImode
> > >   to SImode.
> > >   (decide_alg): For TARGET_PREFER_KNOWN_REP_MOVSB_STOSB, use
> > >   "rep movsb/stosb" only for known sizes.
> > >   * config/i386/i386-options.c (processor_cost_table): Use Ice
> > >   Lake cost for Cannon Lake, Ice Lake, Tiger Lake, Sapphire
> > >   Rapids and Alder Lake.
> > >   * config/i386/i386.h (TARGET_PREFER_KNOWN_REP_MOVSB_STOSB): New.
> > >   * config/i386/x86-tune-costs.h (icelake_memcpy): New.
> > >   (icelake_memset): Likewise.
> > >   (icelake_cost): Likewise.
> > >   * config/i386/x86-tune.def (X86_TUNE_PREFER_KNOWN_REP_MOVSB_STOSB):
> > >   New.
> >
> > It looks like X86_TUNE_PREFER_KNOWN_REP_MOVSB_STOSB is quite obviously
> > benefical and independent of the rest of changes.  I think we will need
> > to discuss bit more the move ratio and the code size/uop cache polution
> > issues - one option would be to use increased limits for -O3 only.
>
> My change only increases CLEAR_RATIO, not MOVE_RATIO.   We are
> checking code size impacts on SPEC CPU 2017 and eembc.
>
> > Can you break this out to independent patch?  I also wonder if it owuld
>
> X86_TUNE_PREFER_KNOWN_REP_MOVSB_STOSB improves performance
> only when memcpy/memset costs and MOVE_RATIO are updated the same time,
> like:
>
> https://gcc.gnu.org/pipermail/gcc-patches/2021-March/567096.html
>
> Make it a standalone means moving from Ice Lake patch to Skylake patch.
>
> > not be more readable to special case this just on the beggining of
> > decide_alg.
> > > @@ -6890,6 +6891,7 @@ decide_alg (HOST_WIDE_INT count, HOST_WIDE_INT 
> > > expected_size,
> > >const struct processor_costs *cost;
> > >int i;
> > >bool any_alg_usable_p = false;
> > > +  bool known_size_p = expected_size != -1;
> >
> > expected_size is not -1 if we have profile feedback and we detected from
> > histogram average size of a block.  It seems to me that from description
> > that you want the const to be actual compile time constant that would be
> > min_size == max_size I guess.
> >
>
> You are right.  Here is the v2 patch with min_size != max_size check for
> unknown size.

OK.

Thanks,
Richard.

> Thanks.
>
> --
> H.J.


Re: [PATCH] testsuite, v2: Disable zero-scratch-regs-{8, 9, 10, 11}.c on all but ... [PR97680]

2021-03-31 Thread Richard Biener via Gcc-patches
On Tue, Mar 30, 2021 at 5:01 PM Jakub Jelinek  wrote:
>
> On Tue, Mar 30, 2021 at 02:33:17PM +0200, Richard Biener via Gcc-patches 
> wrote:
> > > I don't know, perhaps.
> > > Seems the target hook is only defined on
> > > config/i386/i386.c:#undef TARGET_ZERO_CALL_USED_REGS
> > > config/i386/i386.c:#define TARGET_ZERO_CALL_USED_REGS 
> > > ix86_zero_call_used_regs
> > > config/sparc/sparc.c:#undef TARGET_ZERO_CALL_USED_REGS
> > > config/sparc/sparc.c:#define TARGET_ZERO_CALL_USED_REGS 
> > > sparc_zero_call_used_regs
> > > but apparently many of the tests actually succeed on various targets that
> > > don't define those hooks.  E.g. I haven't seen them to fail on aarch64,
> > > on arm only the -10.c fails, on powerpc*/s390* all {8,9,10,11} fail (plus
> > > 5 is skipped on power*-aix*).
> > > On ia64 according to testresults {6,7,8,9,10,11} fail, some with ICEs.
> > > On mipsel according to testresults {9,10,11} fail, some with ICEs.
> > > On nvptx at least 1-9 succeed, 10-11 don't know, don't have assert.h 
> > > around.
> > >
> > > So, do we want to fill in negative dg-skip-if for the 6-11 tests or
> > > positive?  In any case, is there any hope any of the maintainers or the
> > > original submitter will change anything for GCC 12, or are we going to end
> > > up with a very narrowly supported feature?
> >
> > It looks like the latter - I've seen no attempt by the original authors to 
> > make
> > the feature work on more targets than they cared for.
>
> So, like this instead?
>
> I've kept {5,6,7} with aix,ia64,ia64 skipped because those seems like
> outliers, it works pretty much everywhere but on those.
> The rest have known good targets.
>
> Tested on x86_64-linux, verified all tests are run there.

OK.

> 2021-03-30  Jakub Jelinek  
>
> PR testsuite/97680
> * c-c++-common/zero-scratch-regs-6.c: Skip on ia64.
> * c-c++-common/zero-scratch-regs-7.c: Likewise.
> * c-c++-common/zero-scratch-regs-8.c: Change from dg-skip-if of
> selected unsupported triplets to all targets but selected triplets
> of supported targets.
> * c-c++-common/zero-scratch-regs-9.c: Likewise.
> * c-c++-common/zero-scratch-regs-10.c: Likewise.
> * c-c++-common/zero-scratch-regs-11.c: Likewise.
>
> --- gcc/testsuite/c-c++-common/zero-scratch-regs-6.c.jj 2020-10-31 
> 17:41:19.793739605 +0100
> +++ gcc/testsuite/c-c++-common/zero-scratch-regs-6.c2021-03-30 
> 16:19:00.509582587 +0200
> @@ -1,4 +1,5 @@
>  /* { dg-do run } */
> +/* { dg-skip-if "not implemented" { ia64*-*-* } } */
>  /* { dg-options "-O2 -fzero-call-used-regs=all-gpr-arg" } */
>
>  #include "zero-scratch-regs-1.c"
> --- gcc/testsuite/c-c++-common/zero-scratch-regs-7.c.jj 2020-10-31 
> 17:41:19.793739605 +0100
> +++ gcc/testsuite/c-c++-common/zero-scratch-regs-7.c2021-03-30 
> 16:19:12.059454807 +0200
> @@ -1,4 +1,5 @@
>  /* { dg-do run } */
> +/* { dg-skip-if "not implemented" { ia64*-*-* } } */
>  /* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
>
>  #include "zero-scratch-regs-1.c"
> --- gcc/testsuite/c-c++-common/zero-scratch-regs-8.c.jj 2020-11-11 
> 01:46:03.392696119 +0100
> +++ gcc/testsuite/c-c++-common/zero-scratch-regs-8.c2021-03-30 
> 16:21:28.453945834 +0200
> @@ -1,5 +1,5 @@
>  /* { dg-do run } */
> -/* { dg-skip-if "not implemented" { powerpc*-*-* } } */
> +/* { dg-skip-if "not implemented" { ! { i?86*-*-* x86_64*-*-* sparc*-*-* 
> aarch64*-*-* arm*-*-* nvptx*-*-* } } } */
>  /* { dg-options "-O2 -fzero-call-used-regs=all-arg" } */
>
>  #include "zero-scratch-regs-1.c"
> --- gcc/testsuite/c-c++-common/zero-scratch-regs-9.c.jj 2020-11-11 
> 01:46:03.392696119 +0100
> +++ gcc/testsuite/c-c++-common/zero-scratch-regs-9.c2021-03-30 
> 16:21:41.711799156 +0200
> @@ -1,5 +1,5 @@
>  /* { dg-do run } */
> -/* { dg-skip-if "not implemented" { powerpc*-*-* } } */
> +/* { dg-skip-if "not implemented" { ! { i?86*-*-* x86_64*-*-* sparc*-*-* 
> aarch64*-*-* arm*-*-* nvptx*-*-* } } } */
>  /* { dg-options "-O2 -fzero-call-used-regs=all" } */
>
>  #include "zero-scratch-regs-1.c"
> --- gcc/testsuite/c-c++-common/zero-scratch-regs-10.c.jj2021-03-18 
> 15:32:56.459617723 +0100
> +++ gcc/testsuite/c-c++-common/zero-scratch-regs-10.c   2021-03-30 
> 16:21:55.017651951 +0200
> @@ -1,6 +1,5 @@
>  /* { dg-do run } */
> -/* { dg-skip-if "not implemented" { powerpc*-*-* } } */
> -/* { dg-skip-if "not implemented" { arm*-*-* } } */
> +/* { dg-skip-if "not implemented" { ! { i?86*-*-* x86_64*-*-* sparc*-*-* 
> aarch64*-*-* nvptx*-*-* } } } */
>  /* { dg-options "-O2" } */
>
>  #include 
> --- gcc/testsuite/c-c++-common/zero-scratch-regs-11.c.jj2020-11-11 
> 01:46:03.392696119 +0100
> +++ gcc/testsuite/c-c++-common/zero-scratch-regs-11.c   2021-03-30 
> 16:22:04.439547999 +0200
> @@ -1,5 +1,5 @@
>  /* { dg-do run } */
> -/* { dg-skip-if "not implemented" { powerpc*-*-* } } */
> +/* { dg-skip-if "not implemented" { ! { i?86*-*-* x86_64*-*-* sparc*-*-* 
> aarch64*-*-* 

Re: [PATCH] testsuite: Disable zero-scratch-regs-{8, 9, 10, 11}.c on s390* [PR97680]

2021-03-31 Thread Eric Botcazou
> It looks like the latter - I've seen no attempt by the original authors to
> make the feature work on more targets than they cared for.

On the other hand, if you hide the failures, there is essentially zero chance 
that architecture maintainers pick up the pieces (I personally implemented the 
SPARC support only because I had ran into the failures in the testsuite).  So 
doing the inverse filtering sounds quite counterproductive to me and IMO it's 
up to the architecture maintainers to decide on a case-by-case basis.

-- 
Eric Botcazou