Re: [PATCH v2 1/3] x86: Update memcpy/memset inline strategies for Ice Lake
> > So in neither of those scenarios testing maxsize=minsize alone makes too > > much sense to me... What was the original motivation for differentiating > > between precisely known size? There is a case that could meet small maxsize. https://godbolt.org/z/489Tf7ssj typedef unsigned char e_u8; #define MAXBC 8 void MixColumn(e_u8 a[4][MAXBC], e_u8 BC) { e_u8 b[4][MAXBC]; int i, j; for(i = 0; i < 4; i++) for(j = 0; j < BC; j++) a[i][j] = b[i][j]; } Where BC is unsigned char so maxsize will be 256. If we set stringop_alg to rep_1_byte the code could be like movzbl %sil, %r8d movq%rdi, %rdx leaq-40(%rsp), %rax movq%r8, %r9 leaq-8(%rsp), %r10 testb %r9b, %r9b je .L5 movq%rdx, %rdi movq%rax, %rsi movq%r8, %rcx rep movsb addq$8, %rax addq$8, %rdx cmpq%r10, %rax jne .L2 ret In our test we found this is much slower than current trunk because rep movsb triggers machine clear events, while in the current trunk such small size is handled in the loop mov epilogue and rep movsq is never executed. So here we disabled inline for unknown size to avoid potential issues like this. H.J. Lu via Gcc-patches 于2021年4月1日周四 上午1:55写道: > > On Wed, Mar 31, 2021 at 10:43 AM Jan Hubicka wrote: > > > > > > Reading through the optimization manual it seems that mosvb is fast for > > > > small block no matter if the size is hard wired. In that case you > > > > probably want to check whetehr max_size or expected_size is known to be > > > > small rather than max_size == min_size and both being small. > > > > > > > > But it depends on what CPU really does. > > > > Honza > > > > > > For small data size, rep movsb is faster only under certain conditions. > > > We > > > can continue fine tuning rep movsb. > > > > OK, I however wonder why you need condtion maxsize=minsize. > > - If CPU is looking for movl $cst, %rcx than we probably want to be > >sure that it is not moved away fro rep ;movsb by adding fused pattern > > - If rep movsb is slower than loop for very small blocks then you want > >to set lower bound on minsize & expected size, but you do not need > >to require maxsize=minsize > > - If rep movsb is slower than sequence of moves for small blocks then > >one needs to tweak move by pieces > > - If rep movsb is slower for larger blocks than you want to test > >maxsize and expected size > > So in neither of those scenarios testing maxsize=minsize alone makes too > > much sense to me... What was the original motivation for differentiating > > between precisely known size? > > > > I am mostly curious because it is not that uncomon to have small maxsize > > because we are able to track the object size and using short sequence > > for those would be nice. > > > > Having minsize non-trivial may not be that uncommon these days either > > given that we track value ranges (and under assumption that > > memcpy/memset expanders was updated to take these into account). > > > > Hongyu has done some analysis on this. Hongyu, can you share what > you got? > > Thanks. > > -- > H.J. -- Regards, Hongyu, Wang
Re: [PATCH] c++, abi: Fix abi_tag attribute handling [PR98481]
On Thu, Apr 01, 2021 at 12:52:20AM -0400, Jason Merrill wrote: > On 1/8/21 2:29 PM, Jakub Jelinek wrote: > > On Fri, Jan 08, 2021 at 02:22:59PM -0500, Jason Merrill wrote: > > > I like the idea to use *walk_subtrees to distinguish between walking > > > syntactic subtrees and walking type-identity subtrees. But it should be > > > more general; how does this look to you? > > > > LGTM, thanks. > > As discussed on IRC, we probably want to fix this in GCC 10 as well, but not > by default. The first patch adds ABI v15 and fixes the bug for !v14, so > -fabi-version=0 has the fix, but the default behavior does not. The second > patch adds ABI v15 on trunk. > > It would be nice to make -Wabi/-fabi-compat-version handle this case, but > that would require a bunch of new code unsuitable for this point in the > process. > > Does this make sense to you? LGTM, thanks. > commit 123f254cce416a4d50445465b88af6d8e754dc5e > Author: Jakub Jelinek > Date: Thu Jan 7 17:47:18 2021 +0100 > > c++, abi: Fix abi_tag attribute handling [PR98481] > > In GCC10 cp_walk_subtrees has been changed to walk template arguments. > As the following testcase, that changed the mangling of some functions. > I believe the previous behavior that find_abi_tags_r doesn't recurse into > template args has been the correct one, but setting *walk_subtrees = 0 > for the types and handling the types subtree walking manually in > find_abi_tags_r looks too hard, there are a lot of subtrees and details > what > should and shouldn't be walked, both in tree.c (walk_type_fields there, > which is static) and in cp_walk_subtrees itself. > > The following patch abuses the fact that *walk_subtrees is an int to > tell cp_walk_subtrees it shouldn't walk the template args. > > But we don't want to introduce an ABI change in the middle of the GCC 10 > cycle, so the GCC 10 version of this patch introduces ABI v15 for the fix, > which will be available but not default in GCC 10.3. > > Co-authored-by: Jason Merrill > > gcc/cp/ChangeLog: > > PR c++/98481 > * class.c (find_abi_tags_r): Set *walk_subtrees to 2 instead of 1 > for types. > (mark_abi_tags_r): Likewise. > * tree.c (cp_walk_subtrees): If *walk_subtrees_p is 2, look > through > typedefs. > > gcc/testsuite/ChangeLog: > > PR c++/98481 > * g++.dg/abi/abi-tag24.C: New test. > * g++.dg/abi/abi-tag24a.C: New test. > * g++.dg/abi/macro0.C: Adjust expected value. > > gcc/ChangeLog: > > PR c++/98481 > * common.opt (fabi-version): Default to 14. > > gcc/c-family/ChangeLog: > > PR c++/98481 > * c-opts.c (c_common_post_options): Bump latest_abi_version. > > diff --git a/gcc/common.opt b/gcc/common.opt > index 9cc47b16cac..ec5235c3a41 100644 > --- a/gcc/common.opt > +++ b/gcc/common.opt > @@ -956,10 +956,13 @@ Driver Undocumented > ; 14: Corrects the mangling of nullptr expression. > ; Default in G++ 10. > ; > +; 15: Corrects G++ 10 ABI tag regression [PR98481]. > +; Available, but not default, in G++ 10.3. > +; > ; Additional positive integers will be assigned as new versions of > ; the ABI become the default version of the ABI. > fabi-version= > -Common Joined RejectNegative UInteger Var(flag_abi_version) Init(0) > +Common Joined RejectNegative UInteger Var(flag_abi_version) Init(14) > The version of the C++ ABI in use. > > faggressive-loop-optimizations > diff --git a/gcc/c-family/c-opts.c b/gcc/c-family/c-opts.c > index 58ba0948e79..c51d6d34726 100644 > --- a/gcc/c-family/c-opts.c > +++ b/gcc/c-family/c-opts.c > @@ -943,7 +943,7 @@ c_common_post_options (const char **pfilename) > >/* Change flag_abi_version to be the actual current ABI level, for the > benefit of c_cpp_builtins, and to make comparison simpler. */ > - const int latest_abi_version = 14; > + const int latest_abi_version = 15; >/* Generate compatibility aliases for ABI v11 (7.1) by default. */ >const int abi_compat_default = 11; > > diff --git a/gcc/cp/class.c b/gcc/cp/class.c > index ed8f9527929..c0101130ba3 100644 > --- a/gcc/cp/class.c > +++ b/gcc/cp/class.c > @@ -1450,6 +1450,10 @@ mark_or_check_tags (tree t, tree *tp, abi_tag_data *p, > bool val) > static tree > find_abi_tags_r (tree *tp, int *walk_subtrees, void *data) > { > + if (TYPE_P (*tp) && *walk_subtrees == 1 && flag_abi_version != 14) > +/* Tell cp_walk_subtrees to look though typedefs. [PR98481] */ > +*walk_subtrees = 2; > + >if (!OVERLOAD_TYPE_P (*tp)) > return NULL_TREE; > > @@ -1470,6 +1474,10 @@ find_abi_tags_r (tree *tp, int *walk_subtrees, void > *data) > static tree > mark_abi_tags_r (tree *tp, int *walk_subtrees, void *data) > { > + if (TYPE_P (*tp) && *walk_subtrees == 1 && flag_abi_version != 14) > +
Re: [PATCH] c++, abi: Fix abi_tag attribute handling [PR98481]
On 1/8/21 2:29 PM, Jakub Jelinek wrote: On Fri, Jan 08, 2021 at 02:22:59PM -0500, Jason Merrill wrote: I like the idea to use *walk_subtrees to distinguish between walking syntactic subtrees and walking type-identity subtrees. But it should be more general; how does this look to you? LGTM, thanks. As discussed on IRC, we probably want to fix this in GCC 10 as well, but not by default. The first patch adds ABI v15 and fixes the bug for !v14, so -fabi-version=0 has the fix, but the default behavior does not. The second patch adds ABI v15 on trunk. It would be nice to make -Wabi/-fabi-compat-version handle this case, but that would require a bunch of new code unsuitable for this point in the process. Does this make sense to you? commit 123f254cce416a4d50445465b88af6d8e754dc5e Author: Jakub Jelinek Date: Thu Jan 7 17:47:18 2021 +0100 c++, abi: Fix abi_tag attribute handling [PR98481] In GCC10 cp_walk_subtrees has been changed to walk template arguments. As the following testcase, that changed the mangling of some functions. I believe the previous behavior that find_abi_tags_r doesn't recurse into template args has been the correct one, but setting *walk_subtrees = 0 for the types and handling the types subtree walking manually in find_abi_tags_r looks too hard, there are a lot of subtrees and details what should and shouldn't be walked, both in tree.c (walk_type_fields there, which is static) and in cp_walk_subtrees itself. The following patch abuses the fact that *walk_subtrees is an int to tell cp_walk_subtrees it shouldn't walk the template args. But we don't want to introduce an ABI change in the middle of the GCC 10 cycle, so the GCC 10 version of this patch introduces ABI v15 for the fix, which will be available but not default in GCC 10.3. Co-authored-by: Jason Merrill gcc/cp/ChangeLog: PR c++/98481 * class.c (find_abi_tags_r): Set *walk_subtrees to 2 instead of 1 for types. (mark_abi_tags_r): Likewise. * tree.c (cp_walk_subtrees): If *walk_subtrees_p is 2, look through typedefs. gcc/testsuite/ChangeLog: PR c++/98481 * g++.dg/abi/abi-tag24.C: New test. * g++.dg/abi/abi-tag24a.C: New test. * g++.dg/abi/macro0.C: Adjust expected value. gcc/ChangeLog: PR c++/98481 * common.opt (fabi-version): Default to 14. gcc/c-family/ChangeLog: PR c++/98481 * c-opts.c (c_common_post_options): Bump latest_abi_version. diff --git a/gcc/common.opt b/gcc/common.opt index 9cc47b16cac..ec5235c3a41 100644 --- a/gcc/common.opt +++ b/gcc/common.opt @@ -956,10 +956,13 @@ Driver Undocumented ; 14: Corrects the mangling of nullptr expression. ; Default in G++ 10. ; +; 15: Corrects G++ 10 ABI tag regression [PR98481]. +; Available, but not default, in G++ 10.3. +; ; Additional positive integers will be assigned as new versions of ; the ABI become the default version of the ABI. fabi-version= -Common Joined RejectNegative UInteger Var(flag_abi_version) Init(0) +Common Joined RejectNegative UInteger Var(flag_abi_version) Init(14) The version of the C++ ABI in use. faggressive-loop-optimizations diff --git a/gcc/c-family/c-opts.c b/gcc/c-family/c-opts.c index 58ba0948e79..c51d6d34726 100644 --- a/gcc/c-family/c-opts.c +++ b/gcc/c-family/c-opts.c @@ -943,7 +943,7 @@ c_common_post_options (const char **pfilename) /* Change flag_abi_version to be the actual current ABI level, for the benefit of c_cpp_builtins, and to make comparison simpler. */ - const int latest_abi_version = 14; + const int latest_abi_version = 15; /* Generate compatibility aliases for ABI v11 (7.1) by default. */ const int abi_compat_default = 11; diff --git a/gcc/cp/class.c b/gcc/cp/class.c index ed8f9527929..c0101130ba3 100644 --- a/gcc/cp/class.c +++ b/gcc/cp/class.c @@ -1450,6 +1450,10 @@ mark_or_check_tags (tree t, tree *tp, abi_tag_data *p, bool val) static tree find_abi_tags_r (tree *tp, int *walk_subtrees, void *data) { + if (TYPE_P (*tp) && *walk_subtrees == 1 && flag_abi_version != 14) +/* Tell cp_walk_subtrees to look though typedefs. [PR98481] */ +*walk_subtrees = 2; + if (!OVERLOAD_TYPE_P (*tp)) return NULL_TREE; @@ -1470,6 +1474,10 @@ find_abi_tags_r (tree *tp, int *walk_subtrees, void *data) static tree mark_abi_tags_r (tree *tp, int *walk_subtrees, void *data) { + if (TYPE_P (*tp) && *walk_subtrees == 1 && flag_abi_version != 14) +/* Tell cp_walk_subtrees to look though typedefs. */ +*walk_subtrees = 2; + if (!OVERLOAD_TYPE_P (*tp)) return NULL_TREE; diff --git a/gcc/cp/tree.c b/gcc/cp/tree.c index b36ca4eddc0..10b818d1370 100644 --- a/gcc/cp/tree.c +++ b/gcc/cp/tree.c @@ -5055,8 +5055,18 @@ cp_walk_subtrees (tree *tp, int *walk_subtrees_p,
Re: Small refactoring of cgraph_node::release_body
On 3/31/21 8:45 PM, David Edelsohn via Gcc-patches wrote: > This patch is causing new crashes in the testsuite. > > ICE in release_body, at graph.c:1863 > ranges offset out of range Hello. Should be fixed with 23ce9945d5efa77c96161443f68e03664705ada3. Martin > > Thanks, David >
Re: [PATCH] PR fortran/99840 - [8/9/10/11 Regression] ICE in gfc_simplify_matmul, at fortran/simplify.c:4777
Yes OK for trunk and affected branches. Thanks, Jerry On 3/31/21 2:08 PM, Harald Anlauf via Fortran wrote: Dear all, the simplification of the TRANSPOSE of a zero-sized array would lead to an ICE if the result was used in a subsequent simplification of a MATMUL. The reason was the lack of the proper initialization of the shape, which is mpz_t. Use mpz_init_set instead of mpz_set. Regtested on x86_64-pc-linux-gnu. OK for mainline? Since this is a regression, backport to all affected branches? Thanks, Harald PR fortran/99840 - ICE in gfc_simplify_matmul, at fortran/simplify.c:4777 The simplification of the transposition of a constant array shall properly initialize and set the shape of the result. gcc/fortran/ChangeLog: PR fortran/99840 * simplify.c (gfc_simplify_transpose): Properly initialize resulting shape. gcc/testsuite/ChangeLog: PR fortran/99840 * gfortran.dg/transpose_5.f90: New test.
[committed] analyzer: avoid printing '' for SSA names [PR99771]
We don't want to print '' in our diagnostics, but PR analyzer/99771 lists various cases where -fanalyzer does, due to using the SSA_NAME for a temporary when determining the best tree to use. This can happen in two ways: (a) ...when a better expression than the SSA_NAME could be built, but finding it requires traversing the relationships in the region_model in a graph-like way, rather than by considering individual svalues and regions. (b) ...when the only remaining user of the underlying svalue is the SSA_NAME, typically due to the diagnostic referring to a temporary. I've been experimenting with fixing (a), but don't have a good fix yet. In the meantime, this patch addresses (b) by detecting if we have the SSA_NAME for a temporary, and, for the cases where it's possible, reconstructing a tree by walking the def-stmts. This fixes various cases of (b) and ameliorates some cases of (a). Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu. Pushed to trunk as r11-7941-ge4bb1bd60a9fd1bed36092a990aa5fed5d45bfa6. gcc/analyzer/ChangeLog: PR analyzer/99771 * analyzer.cc (maybe_reconstruct_from_def_stmt): New. (fixup_tree_for_diagnostic_1): New. (fixup_tree_for_diagnostic): New. * analyzer.h (fixup_tree_for_diagnostic): New decl. * checker-path.cc (call_event::get_desc): Call fixup_tree_for_diagnostic and use it for the call_with_state call. (warning_event::get_desc): Likewise for the final_event and make_label_text calls. * engine.cc (impl_region_model_context::on_state_leak): Likewise for the on_leak and add_diagnostic calls. * region-model.cc (region_model::get_representative_tree): Likewise for the result. gcc/testsuite/ChangeLog: PR analyzer/99771 * gcc.dg/analyzer/data-model-10.c: Update expected output. * gcc.dg/analyzer/malloc-ipa-13.c: Likewise. * gcc.dg/analyzer/malloc-ipa-13a.c: New test. * gcc.dg/analyzer/pr99771-1.c: New test. --- gcc/analyzer/analyzer.cc | 128 ++ gcc/analyzer/analyzer.h | 1 + gcc/analyzer/checker-path.cc | 10 +- gcc/analyzer/engine.cc| 5 +- gcc/analyzer/region-model.cc | 4 +- gcc/testsuite/gcc.dg/analyzer/data-model-10.c | 3 +- gcc/testsuite/gcc.dg/analyzer/malloc-ipa-13.c | 3 +- .../gcc.dg/analyzer/malloc-ipa-13a.c | 38 ++ gcc/testsuite/gcc.dg/analyzer/pr99771-1.c | 63 + 9 files changed, 244 insertions(+), 11 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/analyzer/malloc-ipa-13a.c create mode 100644 gcc/testsuite/gcc.dg/analyzer/pr99771-1.c diff --git a/gcc/analyzer/analyzer.cc b/gcc/analyzer/analyzer.cc index df8d881f3cd..2b4cffd08f5 100644 --- a/gcc/analyzer/analyzer.cc +++ b/gcc/analyzer/analyzer.cc @@ -60,6 +60,134 @@ get_stmt_location (const gimple *stmt, function *fun) return stmt->location; } +static tree +fixup_tree_for_diagnostic_1 (tree expr, hash_set *visited); + +/* Subroutine of fixup_tree_for_diagnostic_1, called on SSA names. +Attempt to reconstruct a a tree expression for SSA_NAME +based on its def-stmt. +SSA_NAME must be non-NULL. +VISITED must be non-NULL; it is used to ensure termination. + +Return NULL_TREE if there is a problem. */ + +static tree +maybe_reconstruct_from_def_stmt (tree ssa_name, +hash_set *visited) +{ + /* Ensure termination. */ + if (visited->contains (ssa_name)) +return NULL_TREE; + visited->add (ssa_name); + + gimple *def_stmt = SSA_NAME_DEF_STMT (ssa_name); + + switch (gimple_code (def_stmt)) +{ +default: + gcc_unreachable (); +case GIMPLE_NOP: +case GIMPLE_PHI: + /* Can't handle these. */ + return NULL_TREE; +case GIMPLE_ASSIGN: + { + enum tree_code code = gimple_assign_rhs_code (def_stmt); + + /* Reverse the effect of extract_ops_from_tree during + gimplification. */ + switch (get_gimple_rhs_class (code)) + { + default: + case GIMPLE_INVALID_RHS: + gcc_unreachable (); + case GIMPLE_TERNARY_RHS: + case GIMPLE_BINARY_RHS: + case GIMPLE_UNARY_RHS: + { + tree t = make_node (code); + TREE_TYPE (t) = TREE_TYPE (ssa_name); + unsigned num_rhs_args = gimple_num_ops (def_stmt) - 1; + for (unsigned i = 0; i < num_rhs_args; i++) + { + tree op = gimple_op (def_stmt, i + 1); + if (op) + { + op = fixup_tree_for_diagnostic_1 (op, visited); + if (op == NULL_TREE) + return NULL_TREE; + } + TREE_OPERAND (t, i) = op; + } + return t; + } + case GIMPLE_SINGLE_RHS: + { +
[PATCH] tree-optimization: Optimize division followed by multiply [PR95176]
Hello, This patch fixes PR tree-optimization/95176. A new pattern in match.pd was added to transform "a * (b / a)" --> "b - (b % a)". A new test case was also added to cover this scenario. The new pattern interfered with the existing pattern of "X - (X / Y) * Y". In some cases (such as in fn4() in gcc/testsuite/gcc.dg/fold-minus-6.c), the new pattern is applied causing the existing pattern to no longer apply. This results in worse code generation because the expression is left as "X - (X - Y)". An additional subtraction pattern of "X - (X - Y) --> Y" was added to this patch to avoid this regression. I also didn't remove the existing pattern because it triggered in more cases than the new pattern because of a tree_invariant_p check that's inserted by genmatch for the new pattern. I verified that all "make -k check" tests pass when targeting x86_64-pc-linux-gnu. 2021-03-31 Victor Tong gcc/ChangeLog: * match.pd: Two new patterns: One to optimize division followed by multiply and the other to avoid a regression as explained above gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/20030807-10.c: Update existing test to look for a subtraction because a shift is no longer emitted * gcc.dg/pr95176.c: New test to cover optimizing division followed by multiply I don't have write access to the GCC repo but I've completed the FSF paperwork as I plan to make more contributions in the future. I'm looking for a sponsorship from an existing GCC maintainer before applying for write access. Thanks, Victor pr95176.patch Description: pr95176.patch
[PATCH] sra: Fix bug in grp_write propagation (PR 97009)
Hi, SRA represents parts of aggregates which are arrays accessed with unknown index as "unscalarizable regions." When there are two such regions one within another and the outer is only read whereas the inner is written to, SRA fails to propagate that write information across assignments. This means that a second aggregate can contain data while SRA thinks it does not and the pass can wrongly eliminate big chunks of assignment from that second aggregate into a third aggregate, which is what happens in PR 97009. Fixed by checking all children of unscalariable accesses for the grp_write flag. Bootstrap and testing on top of trunk and the gcc-10 branch is underway and I also plan to backport this to gcc-9. OK if they pass? Thanks, Martin gcc/ChangeLog: 2021-03-31 Martin Jambor PR tree-optimization/97009 * tree-sra.c (access_or_its_child_written): New function. (propagate_subaccesses_from_rhs): Use it instead of a simple grp_write test. gcc/testsuite/ChangeLog: 2021-03-31 Martin Jambor PR tree-optimization/97009 * gcc.dg/tree-ssa/pr97009.c: New test. --- gcc/testsuite/gcc.dg/tree-ssa/pr97009.c | 66 + gcc/tree-sra.c | 15 +- 2 files changed, 80 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr97009.c diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr97009.c b/gcc/testsuite/gcc.dg/tree-ssa/pr97009.c new file mode 100644 index 000..741dbc270c3 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr97009.c @@ -0,0 +1,66 @@ +/* { dg-do run } */ +/* { dg-options "-O1" } */ + +static int __attribute__((noipa)) +get_5 (void) +{ + return 5; +} + +static int __attribute__((noipa)) +verify_5 (int v) +{ + if (v != 5) +__builtin_abort (); +} + +struct T +{ + int w; + int a[4]; +}; + +struct S +{ + int v; + int x; + struct T t[2]; + char alotofstuff[128]; +}; + +volatile int vol; + +void __attribute__((noipa)) +consume_t (struct T t) +{ + vol = t.a[0]; +} + +int __attribute__((noipa)) +foo (int l1, int l2) +{ + struct S s1, s2, s3; + int i, j; + + s1.v = get_5 (); + for (i = 0; i < l1; i++) +{ + for (j = 0; j < l2; j++) + s1.t[i].a[j] = get_5 (); + consume_t(s1.t[i]); +} + + s2 = s1; + + s3 = s2; + for (i = 0; i < l1; i++) +for (j = 0; j < l2; j++) + verify_5 (s3.t[i].a[j]); +} + +int +main (int argc, char *argv[]) +{ + foo (2, 4); + return 0; +} diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c index d177f1ba11c..8dfc923ed7e 100644 --- a/gcc/tree-sra.c +++ b/gcc/tree-sra.c @@ -2723,6 +2723,19 @@ budget_for_propagation_access (tree decl) return true; } +/* Return true if ACC or any of its subaccesses has grp_child set. */ + +static bool +access_or_its_child_written (struct access *acc) +{ + if (acc->grp_write) +return true; + for (struct access *sub = acc->first_child; sub; sub = sub->next_sibling) +if (access_or_its_child_written (sub)) + return true; + return false; +} + /* Propagate subaccesses and grp_write flags of RACC across an assignment link to LACC. Enqueue sub-accesses as necessary so that the write flag is propagated transitively. Return true if anything changed. Additionally, if @@ -2836,7 +2849,7 @@ propagate_subaccesses_from_rhs (struct access *lacc, struct access *racc) if (rchild->grp_unscalarizable_region || !budget_for_propagation_access (lacc->base)) { - if (rchild->grp_write && !lacc->grp_write) + if (!lacc->grp_write && access_or_its_child_written (rchild)) { ret = true; subtree_mark_written_and_rhs_enqueue (lacc); -- 2.30.2
[PATCH] PR fortran/99840 - [8/9/10/11 Regression] ICE in gfc_simplify_matmul, at fortran/simplify.c:4777
Dear all, the simplification of the TRANSPOSE of a zero-sized array would lead to an ICE if the result was used in a subsequent simplification of a MATMUL. The reason was the lack of the proper initialization of the shape, which is mpz_t. Use mpz_init_set instead of mpz_set. Regtested on x86_64-pc-linux-gnu. OK for mainline? Since this is a regression, backport to all affected branches? Thanks, Harald PR fortran/99840 - ICE in gfc_simplify_matmul, at fortran/simplify.c:4777 The simplification of the transposition of a constant array shall properly initialize and set the shape of the result. gcc/fortran/ChangeLog: PR fortran/99840 * simplify.c (gfc_simplify_transpose): Properly initialize resulting shape. gcc/testsuite/ChangeLog: PR fortran/99840 * gfortran.dg/transpose_5.f90: New test. diff --git a/gcc/fortran/simplify.c b/gcc/fortran/simplify.c index 388aca7c38c..c27b47aa98f 100644 --- a/gcc/fortran/simplify.c +++ b/gcc/fortran/simplify.c @@ -8123,8 +8123,8 @@ gfc_simplify_transpose (gfc_expr *matrix) >where); result->rank = 2; result->shape = gfc_get_shape (result->rank); - mpz_set (result->shape[0], matrix->shape[1]); - mpz_set (result->shape[1], matrix->shape[0]); + mpz_init_set (result->shape[0], matrix->shape[1]); + mpz_init_set (result->shape[1], matrix->shape[0]); if (matrix->ts.type == BT_CHARACTER) result->ts.u.cl = matrix->ts.u.cl; diff --git a/gcc/testsuite/gfortran.dg/transpose_5.f90 b/gcc/testsuite/gfortran.dg/transpose_5.f90 new file mode 100644 index 000..682b1c8552b --- /dev/null +++ b/gcc/testsuite/gfortran.dg/transpose_5.f90 @@ -0,0 +1,8 @@ +! { dg-do compiler } +! { dg-options "-O2" } +! PR fortran/99840 - ICE in gfc_simplify_matmul, at fortran/simplify.c:4777 +program p + integer, parameter :: x(0,0) = 0 + integer :: y(0,0) + y = matmul (x, transpose(x)) +end
[committed] [PR99781] Update correctly reg notes in LRA for multi-registers and set up biggest mode safely
The following patch fixes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99781 The patch was successfully bootstrapped and tested on x86-64, ppc64le, and aarch64. commit 1458059fc1faf6170f2fe45159065f91876307ac Author: Vladimir N. Makarov Date: Wed Mar 31 13:26:30 2021 -0400 [PR99781] Update correctly reg notes in LRA for multi-registers and set up biggest mode safely The PR is about incorrect use of partial_subreg_p for unordered modes. I found 2 places of dangerous comparing unordered modes in LRA. The patch removes dangerous use of paradoxical_subreg_p and partial_subreg_p in split_reg and process_bb_lives. The both places used them to solve PR77761 long time ago. But the problem was also fixed by later patches too (if there is no hard reg explicitly, it have VOIDmode and we use natural mode to split hard reg live, otherwise we use the biggest explicitly used mode for hard reg splitting). The PR also says about inaccurate update of reg notes in LRA. It happens for reg notes which refer for multi-registers. The patch also fixes this issue. gcc/ChangeLog: PR target/99781 * lra-constraints.c (split_reg): Don't check paradoxical_subreg_p. * lra-lives.c (clear_sparseset_regnos, regnos_in_sparseset_p): New functions. (process_bb_lives): Don't update biggest mode of hard reg for implicit in multi-register group. Use the new functions for updating dead_set and unused_set by register notes. gcc/testsuite/ChangeLog: PR target/99781 * g++.target/aarch64/sve/pr99781.C: New. diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c index 9993065f8d6..62bcfc31772 100644 --- a/gcc/lra-constraints.c +++ b/gcc/lra-constraints.c @@ -5796,12 +5796,11 @@ split_reg (bool before_p, int original_regno, rtx_insn *insn, nregs = 1; mode = lra_reg_info[hard_regno].biggest_mode; machine_mode reg_rtx_mode = GET_MODE (regno_reg_rtx[hard_regno]); - /* A reg can have a biggest_mode of VOIDmode if it was only ever seen - as part of a multi-word register. In that case, or if the biggest - mode was larger than a register, just use the reg_rtx. Otherwise, - limit the size to that of the biggest access in the function. */ - if (mode == VOIDmode - || paradoxical_subreg_p (mode, reg_rtx_mode)) + /* A reg can have a biggest_mode of VOIDmode if it was only ever seen as + part of a multi-word register. In that case, just use the reg_rtx. + Otherwise, limit the size to that of the biggest access in the + function. */ + if (mode == VOIDmode) { original_reg = regno_reg_rtx[hard_regno]; mode = reg_rtx_mode; diff --git a/gcc/lra-lives.c b/gcc/lra-lives.c index 0bddca13fee..29531843c63 100644 --- a/gcc/lra-lives.c +++ b/gcc/lra-lives.c @@ -615,6 +615,32 @@ reg_early_clobber_p (const struct lra_insn_reg *reg, int n_alt) && TEST_BIT (reg->early_clobber_alts, n_alt))); } +/* Clear pseudo REGNO in SET or all hard registers of REGNO in MODE in SET. */ +static void +clear_sparseset_regnos (sparseset set, int regno, enum machine_mode mode) +{ + if (regno >= FIRST_PSEUDO_REGISTER) +{ + sparseset_clear_bit (dead_set, regno); + return; +} + for (int last = end_hard_regno (mode, regno); regno < last; regno++) +sparseset_clear_bit (set, regno); +} + +/* Return true if pseudo REGNO is in SET or all hard registers of REGNO in MODE + are in SET. */ +static bool +regnos_in_sparseset_p (sparseset set, int regno, enum machine_mode mode) +{ + if (regno >= FIRST_PSEUDO_REGISTER) +return sparseset_bit_p (dead_set, regno); + for (int last = end_hard_regno (mode, regno); regno < last; regno++) +if (!sparseset_bit_p (set, regno)) + return false; + return true; +} + /* Process insns of the basic block BB to update pseudo live ranges, pseudo hard register conflicts, and insn notes. We do it on backward scan of BB insns. CURR_POINT is the program point where @@ -739,24 +765,13 @@ process_bb_lives (basic_block bb, int _point, bool dead_insn_p) /* Update max ref width and hard reg usage. */ for (reg = curr_id->regs; reg != NULL; reg = reg->next) { - int i, regno = reg->regno; + int regno = reg->regno; if (partial_subreg_p (lra_reg_info[regno].biggest_mode, reg->biggest_mode)) lra_reg_info[regno].biggest_mode = reg->biggest_mode; if (HARD_REGISTER_NUM_P (regno)) - { - lra_hard_reg_usage[regno] += freq; - /* A hard register explicitly can be used in small mode, - but implicitly it can be used in natural mode as a - part of multi-register group. Process this case - here. */ - for (i = 1; i < hard_regno_nregs (regno, reg->biggest_mode); i++) - if (partial_subreg_p (lra_reg_info[regno + i].biggest_mode, - GET_MODE (regno_reg_rtx[regno + i]))) - lra_reg_info[regno +
Re: [r11-7926 Regression] FAIL: libgomp.c/declare-variant-1.c (test for excess errors) on Linux/x86_64
On Wed, Mar 31, 2021 at 11:21 AM Jan Hubicka wrote: > > > On Linux/x86_64, > > > > d7145b4bb6c8729a1e782373cb6256c06ed60465 is the first bad commit > > commit d7145b4bb6c8729a1e782373cb6256c06ed60465 > > Author: Jan Hubicka > > Date: Wed Mar 31 11:35:29 2021 +0200 > > > > Small refactoring of cgraph_node::release_body > > > > caused > > > > FAIL: g++.dg/ipa/devirt-7.C -std=gnu++14 (internal compiler error) > > This should be fixed by the attached patch. Sorry for this - I added the > check at last minute and apparently did not fully retest. > We now have clones of thunks that does make sense even after body was > released. What does not make much sense to me is why we consider them > clones at all, but that is something to revisit next stage1. > > Honza > > gcc/ChangeLog: > > PR lto/99447 > * cgraph.c (cgraph_node::release_body): Fix overactive check. > > diff --git a/gcc/cgraph.c b/gcc/cgraph.c > index b77c676a58a..d7c78d518bc 100644 > --- a/gcc/cgraph.c > +++ b/gcc/cgraph.c > @@ -1860,7 +1860,15 @@ cgraph_node::release_body (bool keep_arguments) >lto_free_function_in_decl_state_for_node (this); >lto_file_data = NULL; > } > - gcc_assert (!clones); > + if (flag_checking && clones) > +{ > + /* It is invalid to release body before materializing clones except > +for thunks that don't really need a body. Verify also that we do > +not leak pointers to the call statements. */ > + for (cgraph_node *node = clones; node; > + node = node->next_sibling_clone) > + gcc_assert (node->thunk && !node->callees->call_stmt); > +} >remove_callees (); >remove_all_references (); > } > > FAIL: g++.dg/ipa/devirt-7.C -std=gnu++14 (test for excess errors) > > FAIL: g++.dg/ipa/devirt-7.C -std=gnu++17 (internal compiler error) > > FAIL: g++.dg/ipa/devirt-7.C -std=gnu++17 (test for excess errors) > > FAIL: g++.dg/ipa/devirt-7.C -std=gnu++2a (internal compiler error) > > FAIL: g++.dg/ipa/devirt-7.C -std=gnu++2a (test for excess errors) > > FAIL: g++.dg/ipa/devirt-7.C -std=gnu++98 (internal compiler error) > > FAIL: g++.dg/ipa/devirt-7.C -std=gnu++98 (test for excess errors) > > FAIL: g++.dg/ipa/pr71146.C -std=gnu++14 (internal compiler error) > > FAIL: g++.dg/ipa/pr71146.C -std=gnu++14 (test for excess errors) > > FAIL: g++.dg/ipa/pr71146.C -std=gnu++17 (internal compiler error) > > FAIL: g++.dg/ipa/pr71146.C -std=gnu++17 (test for excess errors) > > FAIL: g++.dg/ipa/pr71146.C -std=gnu++2a (internal compiler error) > > FAIL: g++.dg/ipa/pr71146.C -std=gnu++2a (test for excess errors) > > FAIL: g++.dg/ipa/pr71146.C -std=gnu++98 (internal compiler error) > > FAIL: g++.dg/ipa/pr71146.C -std=gnu++98 (test for excess errors) > > FAIL: g++.dg/ipa/pr85421.C (internal compiler error) > > FAIL: g++.dg/ipa/pr85421.C (test for excess errors) > > FAIL: g++.dg/ipa/pr92528.C (internal compiler error) > > FAIL: g++.dg/ipa/pr92528.C (test for excess errors) > > FAIL: g++.dg/torture/covariant-1.C -O2 -flto -fno-use-linker-plugin > > -flto-partition=none (internal compiler error) > > FAIL: g++.dg/torture/covariant-1.C -O2 -flto -fno-use-linker-plugin > > -flto-partition=none (test for excess errors) > > FAIL: g++.dg/torture/covariant-1.C -O2 -flto -fuse-linker-plugin > > -fno-fat-lto-objects (internal compiler error) > > FAIL: g++.dg/torture/covariant-1.C -O2 -flto -fuse-linker-plugin > > -fno-fat-lto-objects (test for excess errors) > > FAIL: g++.dg/torture/pr46287.C -O2 -flto -fno-use-linker-plugin > > -flto-partition=none (internal compiler error) > > FAIL: g++.dg/torture/pr46287.C -O2 -flto -fno-use-linker-plugin > > -flto-partition=none (test for excess errors) > > FAIL: g++.dg/torture/pr46287.C -O2 -flto -fuse-linker-plugin > > -fno-fat-lto-objects (internal compiler error) > > FAIL: g++.dg/torture/pr46287.C -O2 -flto -fuse-linker-plugin > > -fno-fat-lto-objects (test for excess errors) > > FAIL: g++.dg/torture/pr46287.C -O2 (internal compiler error) > > FAIL: g++.dg/torture/pr46287.C -O2 (test for excess errors) > > FAIL: g++.dg/torture/pr46287.C -O3 -fomit-frame-pointer -funroll-loops > > -fpeel-loops -ftracer -finline-functions (internal compiler error) > > FAIL: g++.dg/torture/pr46287.C -O3 -fomit-frame-pointer -funroll-loops > > -fpeel-loops -ftracer -finline-functions (test for excess errors) > > FAIL: g++.dg/torture/pr46287.C -O3 -g (internal compiler error) > > FAIL: g++.dg/torture/pr46287.C -O3 -g (test for excess errors) > > FAIL: g++.dg/torture/pr78692.C -O2 -flto -fno-use-linker-plugin > > -flto-partition=none (internal compiler error) > > FAIL: g++.dg/torture/pr78692.C -O2 -flto -fno-use-linker-plugin > > -flto-partition=none (test for excess errors) > > FAIL: g++.dg/torture/pr78692.C -O2 (internal compiler error) > > FAIL: g++.dg/torture/pr78692.C -O2 (test for excess errors) > > FAIL: g++.dg/torture/pr78692.C -O3 -g (internal compiler
[PATCH] modules : Make sure we include in system.h.
Hi, This fixes a stage 1 bootstrap fail on some Darwin versions when the bootstrap compiler is clang / libc++ from Xcode. bootstrapped on x86_64-darwin16, x86_64-linux-gnu OK for master? thanks Iain It appears that many targets include the map header transitively in other std headers included from system.h. However there are some editions of clang/libc++ in Xcode that do not, which results in a bootstrap fail - since when resolver.h is included there is then a conflict in declaring abort(). The fix is to ensure that map is pulled in by system.h and before resolver.h is included. As a precautionary measure and to alert anyone perhaps adding another header to resolver.h this patch also gates the direct includes there on !IN_GCC. c++tools/ChangeLog: * resolver.h: Do not include std headers directly when building in GCC. gcc/cp/ChangeLog: * mapper-client.cc (INCLUDE_MAP): New; require map to be included from system.h. * mapper-resolver.cc (INCLUDE_MAP): Likewise. --- c++tools/resolver.h | 2 ++ gcc/cp/mapper-client.cc | 1 + gcc/cp/mapper-resolver.cc | 1 + 3 files changed, 4 insertions(+) diff --git a/c++tools/resolver.h b/c++tools/resolver.h index 19339125b26..a9547bf6994 100644 --- a/c++tools/resolver.h +++ b/c++tools/resolver.h @@ -24,8 +24,10 @@ along with GCC; see the file COPYING3. If not see // Mapper interface for client and server bits #include "cody.hh" // C++ +#if !IN_GCC #include #include +#endif // This is a GCC class, so GCC coding conventions on new bits. class module_resolver : public Cody::Resolver diff --git a/gcc/cp/mapper-client.cc b/gcc/cp/mapper-client.cc index 774e2b2b095..b9e02168d55 100644 --- a/gcc/cp/mapper-client.cc +++ b/gcc/cp/mapper-client.cc @@ -26,6 +26,7 @@ along with GCC; see the file COPYING3. If not see #endif #define INCLUDE_STRING #define INCLUDE_VECTOR +#define INCLUDE_MAP #include "system.h" #include "line-map.h" diff --git a/gcc/cp/mapper-resolver.cc b/gcc/cp/mapper-resolver.cc index bcf6c8871e5..db443fb4948 100644 --- a/gcc/cp/mapper-resolver.cc +++ b/gcc/cp/mapper-resolver.cc @@ -24,6 +24,7 @@ along with GCC; see the file COPYING3. If not see #define INCLUDE_STRING #define INCLUDE_VECTOR #define INCLUDE_ALGORITHM +#define INCLUDE_MAP #include "system.h" // We don't want or need to be aware of networking -- 2.24.1
[PATCH] bswap: Handle bswapping of pointers [PR96573]
Hi! In GCC8/9 we used to optimize this into a bswap, but we no longer do. Handling byteswapping of pointers is easy, all we need is to allow them, for the __builtin_bswap* we already use TYPE_PRECISION to determine the precision and we cast the operand and result to the correct type if they aren't uselessly convertible to what the builtin expects. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2021-03-31 Jakub Jelinek PR tree-optimization/96573 * gimple-ssa-store-merging.c (init_symbolic_number): Handle also pointer types. * gcc.dg/pr96573.c: New test. --- gcc/gimple-ssa-store-merging.c.jj 2021-02-22 17:54:05.701798074 +0100 +++ gcc/gimple-ssa-store-merging.c 2021-03-31 11:30:55.519845647 +0200 @@ -333,7 +333,7 @@ init_symbolic_number (struct symbolic_nu { int size; - if (! INTEGRAL_TYPE_P (TREE_TYPE (src))) + if (!INTEGRAL_TYPE_P (TREE_TYPE (src)) && !POINTER_TYPE_P (TREE_TYPE (src))) return false; n->base_addr = n->offset = n->alias_set = n->vuse = NULL_TREE; --- gcc/testsuite/gcc.dg/pr96573.c.jj 2021-03-31 11:39:19.382122478 +0200 +++ gcc/testsuite/gcc.dg/pr96573.c 2021-03-31 11:39:54.186727148 +0200 @@ -0,0 +1,20 @@ +/* PR tree-optimization/96573 */ +/* { dg-do compile { target { lp64 || ilp32 } } } */ +/* { dg-require-effective-target bswap } */ +/* { dg-options "-O3 -fdump-tree-optimized" } */ +/* { dg-final { scan-tree-dump "__builtin_bswap" "optimized" } } */ + +typedef __SIZE_TYPE__ size_t; + +void * +foo (void * const p) +{ + const size_t m = sizeof (p) - 1; + const unsigned char * const o = (unsigned char*) + void *n; + unsigned char * const q = (unsigned char *) + unsigned char i; + for (i = 0; i <= m; ++i) +q[m - i] = o[i]; + return n; +} Jakub
Re: [PATCH] rs6000: MMA test case ICEs using -O3
Hi! On Tue, Mar 30, 2021 at 06:49:29PM -0500, Peter Bergner wrote: > The mma_assemble_input_operand predicate does not accept reg+reg indexed > addresses which can lead to ICEs. The problem is that the quad_address_p > function only accepts reg+offset addresses that are valid for quad word > accesses, but not reg+reg addresses which are also valid for quad word > accesses when dealing with vector types. The solution used here is to > call memory_operand, which uses rs6000_legitimate_address_p to ensure > the address is valid. For reg+offset addresses, it uses quad_address_p like > before, but for reg+reg addresses, it calls legitimate_indexed_address_p > addresses which fixes this specific ICE. quad_address_p should really only be used for lq/stq (and their prefixed forms). Those insns have very different semantics: they are atomic, while vector loads and stores are not; and the prefixed form has some special semantics (it swaps halves on LE). > diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md > index 859af75dfbd..e48c6eee19e 100644 > --- a/gcc/config/rs6000/predicates.md > +++ b/gcc/config/rs6000/predicates.md > @@ -1171,8 +1171,7 @@ > (define_special_predicate "mma_assemble_input_operand" >(match_test "(mode == V16QImode > && (vsx_register_operand (op, mode) > - || (MEM_P (op) > - && quad_address_p (XEXP (op, 0), mode, false")) > + || memory_operand (op, mode)))")) This seems like it might need reloads very often, because it allows way too much? Or, can you just use reg_or_mem_operand? > --- /dev/null > +++ b/gcc/testsuite/g++.target/powerpc/pr99842.C > @@ -0,0 +1,188 @@ > +/* PR target/99842 */ > +/* { dg-require-effective-target power10_ok } */ > +/* { dg-options "-O3 -mdejagnu-cpu=power10 -w" } */ (Please document what warning you want to shut up. Just a few words is plenty.) That testcase will likely not show the error anymore just a year or so from now, but it is good to have more complex testcases anyhow. Segher
[committed] wwwdocs: Don't list RMS as member of the Steering Committee
Per https://gcc.gnu.org/pipermail/gcc/2021-March/235245.html "In 2012 RMS was added to the GCC Steering Committee web page based on his role in the GNU Project, though his role as a member of the Steering Committee has been ambiguous and he was not a member of the Steering Committee when EGCS became GCC[1] wwwdocs:. We no longer feel that this listing serves the best interests of the GCC developer and user community. Therefore, we are removing him from the page. GCC supports the principles of Free Software and has remained aligned with the GNU Project since EGCS became GCC, but effectively has continued to operate as an autonomous project. The GCC Steering Committee is committed to providing a friendly, safe and welcoming environment for all, regardless of gender identity and expression, sexual orientation, disabilities, neurodiversity, physical appearance, body size, ethnicity, nationality, race, age, religion, or similar personal characteristics. - The GCC Steering Committee [1] wwwdocs: https://static.lwn.net/1999/0429/a/gcc.html; --- htdocs/steering.html | 6 -- 1 file changed, 6 deletions(-) diff --git a/htdocs/steering.html b/htdocs/steering.html index 53faba78..7192b424 100644 --- a/htdocs/steering.html +++ b/htdocs/steering.html @@ -44,12 +44,6 @@ place to reach them is the gcc mailing list. Jim Wilson (SiFive) -and - - -Richard Stallman (Free Software Foundation) - - Membership in the steering committee is a personal membership. Affiliations are listed for identification purposes only; steering committee members do not represent their employers or academic -- 2.30.2
Re: Small refactoring of cgraph_node::release_body
This patch is causing new crashes in the testsuite. ICE in release_body, at graph.c:1863 ranges offset out of range Thanks, David
Re: improve future::poll calibration loop
On Mar 27, 2021, Jonathan Wakely wrote: > OK, thanks. Thanks, I'm finally checking this in. >> for libstdc++-v3/ChangeLog >> >> * testsuite/30_threads/future/members/poll.cc: Use faster >> after-ready call in the calibration loop. Thanks for the patch and for having kept me posted on the random_device changes too! -- Alexandre Oliva, happy hacker https://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer Vim, Vi, Voltei pro Emacs -- GNUlius Caesar
Re: [PATCH] data-ref: Tighten index-based alias checks [PR99726]
Richard Biener via Gcc-patches writes: > On Wed, Mar 31, 2021 at 12:15 PM Richard Sandiford via Gcc-patches > wrote: >> >> create_intersect_range_checks_index tries to create a runtime >> alias check based on index comparisons. It looks through the >> access functions for the two DRs to find a SCEV for the loop >> that is being versioned and converts a DR_STEP-based check >> into an index-based check. >> >> However, there isn't any reliable sign information in the types, >> so the code expects the value of the IV step (when interpreted as >> signed) to be negative iff the DR_STEP (when interpreted as signed) >> is negative. >> >> r10-4762 added another assert related to this assumption and the >> assert fired for the testcase in the PR. The sign of the IV step >> didn't match the sign of the DR_STEP. >> >> I think this is actually showing what was previously a wrong-code bug. >> The signs didn't match because the DRs contained *two* access function >> SCEVs for the loop being versioned. It doesn't look like the code >> is set up to deal with this, since it checks each access function >> independently and treats it as the sole source of DR_STEP. > > I think it shows that matching DR_STEP with the access function step > doesn't make much sense. In practice it will work for the case where > there's a single access function evolving in the respective loop since > we don't have negative stride arrays via array_ref_element_size > (do we?). But of course for multiple access functions we can't > simply choose one to do an index overlap test but instead we'd > have to check all of them. > > That said, I wonder why the code needs the DR_STEP at all > and cannot simply use the SCEVs step [direction]. Not 100% sure either, but I'm guessing it's because we don't have direct access to the vectorisation factor here (or the parloops equivalent). So I guess the code has no choice other than to work out the required number of iterations from the step. It would probably still be possible to handle multiple access functions by accumulating the SCEV steps, even with the current DR_STEP approach. But I'm not sure the risk/reward is worth it. Thanks, Richard > Of course the patch is correct in that it restricts handling to a single > access fn evolving in the loop. Thus - OK. Removing the DR_STEP > restriction might allow more (weird, see above) cases to be handled. > > Thanks, > Richard. > >> The patch therefore moves the main condition out of the loop. >> This also has the advantage of not building a tree for one access >> function only to throw it away if we find an inner function that >> makes the comparison invalid. >> >> Tested on aarch64-linux-gnu, armeb-eabi and x86_64-linux-gnu. >> OK to install? >> >> Richard >> >> >> gcc/ >> PR tree-optimization/99726 >> * tree-data-ref.c (create_intersect_range_checks_index): Bail >> out if there is more than one access function SCEV for the loop >> being versioned. >> >> gcc/testsuite/ >> PR tree-optimization/99726 >> * gcc.target/i386/pr99726.c: New test. >> --- >> gcc/testsuite/gcc.target/i386/pr99726.c | 15 ++ >> gcc/tree-data-ref.c | 245 +--- >> 2 files changed, 143 insertions(+), 117 deletions(-) >> create mode 100644 gcc/testsuite/gcc.target/i386/pr99726.c >> >> [ -b version ]-- >> >> diff --git a/gcc/testsuite/gcc.target/i386/pr99726.c >> b/gcc/testsuite/gcc.target/i386/pr99726.c >> new file mode 100644 >> index 000..ff19bcabe4f >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/i386/pr99726.c >> @@ -0,0 +1,15 @@ >> +/* { dg-options "-flive-patching=inline-clone -mavx512f -O2 >> -floop-nest-optimize -ftree-loop-vectorize -ftrapv -m32" } */ >> + >> +extern int a[256][1024]; >> +int b; >> +long c, d; >> +unsigned int e; >> + >> +int >> +main () >> +{ >> + for (; e < d; e++) >> +for (unsigned j = 1; j < c; j++) >> + a[e][j] = b * a[e - 1][j + 1]; >> + return 0; >> +} >> diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c >> index 124a7bea6a9..e6dd5f15bed 100644 >> --- a/gcc/tree-data-ref.c >> +++ b/gcc/tree-data-ref.c >> @@ -2147,8 +2147,8 @@ create_intersect_range_checks_index (class loop *loop, >> tree *cond_expr, >> >>bool waw_or_war_p = (alias_pair.flags & ~(DR_ALIAS_WAR | DR_ALIAS_WAW)) >> == 0; >> >> - unsigned int i; >> - for (i = 0; i < DR_NUM_DIMENSIONS (dr_a.dr); i++) >> + int found = -1; >> + for (unsigned int i = 0; i < DR_NUM_DIMENSIONS (dr_a.dr); i++) >> { >>tree access1 = DR_ACCESS_FN (dr_a.dr, i); >>tree access2 = DR_ACCESS_FN (dr_b.dr, i); >> @@ -2164,7 +2164,19 @@ create_intersect_range_checks_index (class loop >> *loop, tree *cond_expr, >> >> return false; >> } >> + if (found >= 0) >> + return false; >> + found = i; >> +} >> + >> + /* Ought not to happen in practice, since if all accesses are equal
[PATCH] rs6000: Fix up libgcc ABI when built with --with-long-double-format=ieee [PR97653]
Hi! __floatunditf and __fixtfdi and a couple of other libgcc{.a,_s.so} entrypoints for backwards compatibility should mean IBM double double handling (i.e. IFmode), gcc emits such calls for that format and form IEEE long double emits *kf* instead. When gcc is configured without --with-long-double-format=ieee , everything is fine, but when it is not, we need to compile those libgcc sources with -mno-gnu-attribute -mabi=ibmlongdouble. The following snippet in libgcc/config/rs6000/t-linux was attempting to ensure that, and for some routines it works fine (e.g. for _powitf2). But, due to 4 different types of bugs it doesn't work for most of those functions, which means that in --with-long-double-format=ieee configured gcc those *tf* entrypoints instead handle the long double arguments as if they were KFmode. The bugs are: 1) the first few objs properly use $(objext) as suffix, but several other contain a typo and use $(object) instead, which is a variable that isn't set to anything, so we don't add .o etc. extensions 2) while unsigned fix are properly called _fixuns*, unsigned float are called _floatun* (without s), but the var was using there the extra s and so didn't match 3) the variable didn't cover any of the TF <-> TI conversions, only TF <-> DI conversions 4) nothing in libgcc_s.so was handled, as those object files are called *_s.o rather than *.o and IBM128_SHARED_OBJS used wrong syntax of the GNU make substitution reference, which should be $(var:a=b) standing for $(patsubst a,b,$(var)) but it used $(var:a:b) instead Bootstrapped/regtested on powerpc64le-linux --with-long-double-format=ieee, ok for trunk? 2021-03-31 Jakub Jelinek PR target/97653 * config/rs6000/t-linux (IBM128_STATIC_OBJS): Fix spelling, use $(objext) instead of $(object). Use _floatunditf instead of _floatunsditf. Add tf <-> ti conversion objects. (IBM128_SHARED_OBJS): Use proper substitution reference syntax. --- libgcc/config/rs6000/t-linux.jj 2020-12-04 08:08:06.556434569 +0100 +++ libgcc/config/rs6000/t-linux2021-03-31 14:20:50.953852864 +0200 @@ -11,10 +11,12 @@ HOST_LIBGCC2_CFLAGS += -mno-minimal-toc # the IBM extended double format. Also turn off gnu attributes on the static # modules. IBM128_STATIC_OBJS = ibm-ldouble$(objext) _powitf2$(objext) \ - ppc64-fp$(objext) _divtc3$(object) _multc3$(object) \ - _fixtfdi$(object) _fixunstfdi$(object) \ - _floatditf$(objext) _floatunsditf$(objext) -IBM128_SHARED_OBJS = $(IBM128_STATIC_OBJS:$(objext):_s$(objext)) + ppc64-fp$(objext) _divtc3$(objext) _multc3$(objext) \ + _fixtfdi$(objext) _fixunstfdi$(objext) \ + _floatditf$(objext) _floatunditf$(objext) \ + _fixtfti$(objext) _fixunstfti$(objext) \ + _floattitf$(objext) _floatuntitf$(objext) +IBM128_SHARED_OBJS = $(IBM128_STATIC_OBJS:$(objext)=_s$(objext)) IBM128_OBJS= $(IBM128_STATIC_OBJS) $(IBM128_SHARED_OBJS) IBM128_CFLAGS = -Wno-psabi -mabi=ibmlongdouble -mno-gnu-attribute Jakub
Re: [r11-7926 Regression] FAIL: libgomp.c/declare-variant-1.c (test for excess errors) on Linux/x86_64
> On Linux/x86_64, > > d7145b4bb6c8729a1e782373cb6256c06ed60465 is the first bad commit > commit d7145b4bb6c8729a1e782373cb6256c06ed60465 > Author: Jan Hubicka > Date: Wed Mar 31 11:35:29 2021 +0200 > > Small refactoring of cgraph_node::release_body > > caused > > FAIL: g++.dg/ipa/devirt-7.C -std=gnu++14 (internal compiler error) This should be fixed by the attached patch. Sorry for this - I added the check at last minute and apparently did not fully retest. We now have clones of thunks that does make sense even after body was released. What does not make much sense to me is why we consider them clones at all, but that is something to revisit next stage1. Honza gcc/ChangeLog: PR lto/99447 * cgraph.c (cgraph_node::release_body): Fix overactive check. diff --git a/gcc/cgraph.c b/gcc/cgraph.c index b77c676a58a..d7c78d518bc 100644 --- a/gcc/cgraph.c +++ b/gcc/cgraph.c @@ -1860,7 +1860,15 @@ cgraph_node::release_body (bool keep_arguments) lto_free_function_in_decl_state_for_node (this); lto_file_data = NULL; } - gcc_assert (!clones); + if (flag_checking && clones) +{ + /* It is invalid to release body before materializing clones except +for thunks that don't really need a body. Verify also that we do +not leak pointers to the call statements. */ + for (cgraph_node *node = clones; node; + node = node->next_sibling_clone) + gcc_assert (node->thunk && !node->callees->call_stmt); +} remove_callees (); remove_all_references (); } > FAIL: g++.dg/ipa/devirt-7.C -std=gnu++14 (test for excess errors) > FAIL: g++.dg/ipa/devirt-7.C -std=gnu++17 (internal compiler error) > FAIL: g++.dg/ipa/devirt-7.C -std=gnu++17 (test for excess errors) > FAIL: g++.dg/ipa/devirt-7.C -std=gnu++2a (internal compiler error) > FAIL: g++.dg/ipa/devirt-7.C -std=gnu++2a (test for excess errors) > FAIL: g++.dg/ipa/devirt-7.C -std=gnu++98 (internal compiler error) > FAIL: g++.dg/ipa/devirt-7.C -std=gnu++98 (test for excess errors) > FAIL: g++.dg/ipa/pr71146.C -std=gnu++14 (internal compiler error) > FAIL: g++.dg/ipa/pr71146.C -std=gnu++14 (test for excess errors) > FAIL: g++.dg/ipa/pr71146.C -std=gnu++17 (internal compiler error) > FAIL: g++.dg/ipa/pr71146.C -std=gnu++17 (test for excess errors) > FAIL: g++.dg/ipa/pr71146.C -std=gnu++2a (internal compiler error) > FAIL: g++.dg/ipa/pr71146.C -std=gnu++2a (test for excess errors) > FAIL: g++.dg/ipa/pr71146.C -std=gnu++98 (internal compiler error) > FAIL: g++.dg/ipa/pr71146.C -std=gnu++98 (test for excess errors) > FAIL: g++.dg/ipa/pr85421.C (internal compiler error) > FAIL: g++.dg/ipa/pr85421.C (test for excess errors) > FAIL: g++.dg/ipa/pr92528.C (internal compiler error) > FAIL: g++.dg/ipa/pr92528.C (test for excess errors) > FAIL: g++.dg/torture/covariant-1.C -O2 -flto -fno-use-linker-plugin > -flto-partition=none (internal compiler error) > FAIL: g++.dg/torture/covariant-1.C -O2 -flto -fno-use-linker-plugin > -flto-partition=none (test for excess errors) > FAIL: g++.dg/torture/covariant-1.C -O2 -flto -fuse-linker-plugin > -fno-fat-lto-objects (internal compiler error) > FAIL: g++.dg/torture/covariant-1.C -O2 -flto -fuse-linker-plugin > -fno-fat-lto-objects (test for excess errors) > FAIL: g++.dg/torture/pr46287.C -O2 -flto -fno-use-linker-plugin > -flto-partition=none (internal compiler error) > FAIL: g++.dg/torture/pr46287.C -O2 -flto -fno-use-linker-plugin > -flto-partition=none (test for excess errors) > FAIL: g++.dg/torture/pr46287.C -O2 -flto -fuse-linker-plugin > -fno-fat-lto-objects (internal compiler error) > FAIL: g++.dg/torture/pr46287.C -O2 -flto -fuse-linker-plugin > -fno-fat-lto-objects (test for excess errors) > FAIL: g++.dg/torture/pr46287.C -O2 (internal compiler error) > FAIL: g++.dg/torture/pr46287.C -O2 (test for excess errors) > FAIL: g++.dg/torture/pr46287.C -O3 -fomit-frame-pointer -funroll-loops > -fpeel-loops -ftracer -finline-functions (internal compiler error) > FAIL: g++.dg/torture/pr46287.C -O3 -fomit-frame-pointer -funroll-loops > -fpeel-loops -ftracer -finline-functions (test for excess errors) > FAIL: g++.dg/torture/pr46287.C -O3 -g (internal compiler error) > FAIL: g++.dg/torture/pr46287.C -O3 -g (test for excess errors) > FAIL: g++.dg/torture/pr78692.C -O2 -flto -fno-use-linker-plugin > -flto-partition=none (internal compiler error) > FAIL: g++.dg/torture/pr78692.C -O2 -flto -fno-use-linker-plugin > -flto-partition=none (test for excess errors) > FAIL: g++.dg/torture/pr78692.C -O2 (internal compiler error) > FAIL: g++.dg/torture/pr78692.C -O2 (test for excess errors) > FAIL: g++.dg/torture/pr78692.C -O3 -g (internal compiler error) > FAIL: g++.dg/torture/pr78692.C -O3 -g (test for excess errors) > FAIL: g++.dg/torture/pr83619.C -O2 -flto -fno-use-linker-plugin > -flto-partition=none (internal compiler error) > FAIL: g++.dg/torture/pr83619.C -O2 -flto
[r11-7926 Regression] FAIL: libgomp.c/declare-variant-1.c (test for excess errors) on Linux/x86_64
On Linux/x86_64, d7145b4bb6c8729a1e782373cb6256c06ed60465 is the first bad commit commit d7145b4bb6c8729a1e782373cb6256c06ed60465 Author: Jan Hubicka Date: Wed Mar 31 11:35:29 2021 +0200 Small refactoring of cgraph_node::release_body caused FAIL: g++.dg/ipa/devirt-7.C -std=gnu++14 (internal compiler error) FAIL: g++.dg/ipa/devirt-7.C -std=gnu++14 (test for excess errors) FAIL: g++.dg/ipa/devirt-7.C -std=gnu++17 (internal compiler error) FAIL: g++.dg/ipa/devirt-7.C -std=gnu++17 (test for excess errors) FAIL: g++.dg/ipa/devirt-7.C -std=gnu++2a (internal compiler error) FAIL: g++.dg/ipa/devirt-7.C -std=gnu++2a (test for excess errors) FAIL: g++.dg/ipa/devirt-7.C -std=gnu++98 (internal compiler error) FAIL: g++.dg/ipa/devirt-7.C -std=gnu++98 (test for excess errors) FAIL: g++.dg/ipa/pr71146.C -std=gnu++14 (internal compiler error) FAIL: g++.dg/ipa/pr71146.C -std=gnu++14 (test for excess errors) FAIL: g++.dg/ipa/pr71146.C -std=gnu++17 (internal compiler error) FAIL: g++.dg/ipa/pr71146.C -std=gnu++17 (test for excess errors) FAIL: g++.dg/ipa/pr71146.C -std=gnu++2a (internal compiler error) FAIL: g++.dg/ipa/pr71146.C -std=gnu++2a (test for excess errors) FAIL: g++.dg/ipa/pr71146.C -std=gnu++98 (internal compiler error) FAIL: g++.dg/ipa/pr71146.C -std=gnu++98 (test for excess errors) FAIL: g++.dg/ipa/pr85421.C (internal compiler error) FAIL: g++.dg/ipa/pr85421.C (test for excess errors) FAIL: g++.dg/ipa/pr92528.C (internal compiler error) FAIL: g++.dg/ipa/pr92528.C (test for excess errors) FAIL: g++.dg/torture/covariant-1.C -O2 -flto -fno-use-linker-plugin -flto-partition=none (internal compiler error) FAIL: g++.dg/torture/covariant-1.C -O2 -flto -fno-use-linker-plugin -flto-partition=none (test for excess errors) FAIL: g++.dg/torture/covariant-1.C -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects (internal compiler error) FAIL: g++.dg/torture/covariant-1.C -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects (test for excess errors) FAIL: g++.dg/torture/pr46287.C -O2 -flto -fno-use-linker-plugin -flto-partition=none (internal compiler error) FAIL: g++.dg/torture/pr46287.C -O2 -flto -fno-use-linker-plugin -flto-partition=none (test for excess errors) FAIL: g++.dg/torture/pr46287.C -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects (internal compiler error) FAIL: g++.dg/torture/pr46287.C -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects (test for excess errors) FAIL: g++.dg/torture/pr46287.C -O2 (internal compiler error) FAIL: g++.dg/torture/pr46287.C -O2 (test for excess errors) FAIL: g++.dg/torture/pr46287.C -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (internal compiler error) FAIL: g++.dg/torture/pr46287.C -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (test for excess errors) FAIL: g++.dg/torture/pr46287.C -O3 -g (internal compiler error) FAIL: g++.dg/torture/pr46287.C -O3 -g (test for excess errors) FAIL: g++.dg/torture/pr78692.C -O2 -flto -fno-use-linker-plugin -flto-partition=none (internal compiler error) FAIL: g++.dg/torture/pr78692.C -O2 -flto -fno-use-linker-plugin -flto-partition=none (test for excess errors) FAIL: g++.dg/torture/pr78692.C -O2 (internal compiler error) FAIL: g++.dg/torture/pr78692.C -O2 (test for excess errors) FAIL: g++.dg/torture/pr78692.C -O3 -g (internal compiler error) FAIL: g++.dg/torture/pr78692.C -O3 -g (test for excess errors) FAIL: g++.dg/torture/pr83619.C -O2 -flto -fno-use-linker-plugin -flto-partition=none (internal compiler error) FAIL: g++.dg/torture/pr83619.C -O2 -flto -fno-use-linker-plugin -flto-partition=none (test for excess errors) FAIL: g++.dg/torture/pr83619.C -O2 (internal compiler error) FAIL: g++.dg/torture/pr83619.C -O2 (test for excess errors) FAIL: g++.dg/torture/pr83619.C -O3 -g (internal compiler error) FAIL: g++.dg/torture/pr83619.C -O3 -g (test for excess errors) FAIL: libgomp.c/declare-variant-1.c (test for excess errors) with GCC configured with ../../gcc/configure --prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r11-7926/usr --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl --enable-libmpx x86_64-linux --disable-bootstrap To reproduce: $ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=g++.dg/ipa/devirt-7.C --target_board='unix{-m32}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=g++.dg/ipa/devirt-7.C --target_board='unix{-m32\ -march=cascadelake}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=g++.dg/ipa/devirt-7.C --target_board='unix{-m64}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=g++.dg/ipa/devirt-7.C --target_board='unix{-m64\ -march=cascadelake}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=g++.dg/ipa/pr71146.C --target_board='unix{-m32}'" $ cd {build_dir}/gcc && make check
Re: [PATCH v2 1/3] x86: Update memcpy/memset inline strategies for Ice Lake
On Wed, Mar 31, 2021 at 10:43 AM Jan Hubicka wrote: > > > > Reading through the optimization manual it seems that mosvb is fast for > > > small block no matter if the size is hard wired. In that case you > > > probably want to check whetehr max_size or expected_size is known to be > > > small rather than max_size == min_size and both being small. > > > > > > But it depends on what CPU really does. > > > Honza > > > > For small data size, rep movsb is faster only under certain conditions. We > > can continue fine tuning rep movsb. > > OK, I however wonder why you need condtion maxsize=minsize. > - If CPU is looking for movl $cst, %rcx than we probably want to be >sure that it is not moved away fro rep ;movsb by adding fused pattern > - If rep movsb is slower than loop for very small blocks then you want >to set lower bound on minsize & expected size, but you do not need >to require maxsize=minsize > - If rep movsb is slower than sequence of moves for small blocks then >one needs to tweak move by pieces > - If rep movsb is slower for larger blocks than you want to test >maxsize and expected size > So in neither of those scenarios testing maxsize=minsize alone makes too > much sense to me... What was the original motivation for differentiating > between precisely known size? > > I am mostly curious because it is not that uncomon to have small maxsize > because we are able to track the object size and using short sequence > for those would be nice. > > Having minsize non-trivial may not be that uncommon these days either > given that we track value ranges (and under assumption that > memcpy/memset expanders was updated to take these into account). > Hongyu has done some analysis on this. Hongyu, can you share what you got? Thanks. -- H.J.
Re: [PATCH v2 1/3] x86: Update memcpy/memset inline strategies for Ice Lake
> > Reading through the optimization manual it seems that mosvb is fast for > > small block no matter if the size is hard wired. In that case you > > probably want to check whetehr max_size or expected_size is known to be > > small rather than max_size == min_size and both being small. > > > > But it depends on what CPU really does. > > Honza > > For small data size, rep movsb is faster only under certain conditions. We > can continue fine tuning rep movsb. OK, I however wonder why you need condtion maxsize=minsize. - If CPU is looking for movl $cst, %rcx than we probably want to be sure that it is not moved away fro rep ;movsb by adding fused pattern - If rep movsb is slower than loop for very small blocks then you want to set lower bound on minsize & expected size, but you do not need to require maxsize=minsize - If rep movsb is slower than sequence of moves for small blocks then one needs to tweak move by pieces - If rep movsb is slower for larger blocks than you want to test maxsize and expected size So in neither of those scenarios testing maxsize=minsize alone makes too much sense to me... What was the original motivation for differentiating between precisely known size? I am mostly curious because it is not that uncomon to have small maxsize because we are able to track the object size and using short sequence for those would be nice. Having minsize non-trivial may not be that uncommon these days either given that we track value ranges (and under assumption that memcpy/memset expanders was updated to take these into account). Honza > > -- > H.J.
[committed] wwwdocs: cilkplus.org is gone
I pushed this. At first cilkplus.org was broken for weeks, it not months. Now it redirects to a generic intel.com page. So remove it. --- htdocs/gcc-4.9/changes.html | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/htdocs/gcc-4.9/changes.html b/htdocs/gcc-4.9/changes.html index 342a9962..53031b7d 100644 --- a/htdocs/gcc-4.9/changes.html +++ b/htdocs/gcc-4.9/changes.html @@ -186,8 +186,8 @@ consecutive iterations using SIMD (single instruction multiple data) instructions. -Support for https://www.cilkplus.org/;>Cilk Plus has been -added and can be enabled with the -fcilkplus option. Cilk Plus +Support for Cilk Plus has been added and can be enabled +with the -fcilkplus option. Cilk Plus is an extension to the C and C++ languages to support data and task parallelism. The present implementation follows ABI version 1.2; all features but _Cilk_for have been implemented. -- 2.30.2
Re: [PATCH v9] Practical improvement to libgcc complex divide
On Fri, 26 Mar 2021 23:14:41 + Patrick McGehearty via Gcc-patches wrote: > diff --git a/gcc/c-family/c-cppbuiltin.c b/gcc/c-family/c-cppbuiltin.c > index 9f993c4..c0d9e57 100644 > --- a/gcc/c-family/c-cppbuiltin.c > +++ b/gcc/c-family/c-cppbuiltin.c > @@ -1277,8 +1277,10 @@ c_cpp_builtins (cpp_reader *pfile) > { > scalar_float_mode mode = mode_iter.require (); > const char *name = GET_MODE_NAME (mode); > + const int name_len = strlen (name); strlen returns a size_t Funny that we do not have a pre-seeded mode_name_len array but call strlen on modes over and over again. > + char float_h_prefix[16] = ""; > char *macro_name > - = (char *) alloca (strlen (name) > + = (char *) alloca (name_len > + sizeof ("__LIBGCC__MANT_DIG__")); > sprintf (macro_name, "__LIBGCC_%s_MANT_DIG__", name); > builtin_define_with_int_value (macro_name, > @@ -1286,20 +1288,29 @@ c_cpp_builtins (cpp_reader *pfile) > if (!targetm.scalar_mode_supported_p (mode) > || !targetm.libgcc_floating_mode_supported_p (mode)) > continue; > - macro_name = (char *) alloca (strlen (name) > + macro_name = (char *) alloca (name_len > + sizeof ("__LIBGCC_HAS__MODE__")); > sprintf (macro_name, "__LIBGCC_HAS_%s_MODE__", name); > cpp_define (pfile, macro_name); > - macro_name = (char *) alloca (strlen (name) > + macro_name = (char *) alloca (name_len > + sizeof ("__LIBGCC__FUNC_EXT__")); > sprintf (macro_name, "__LIBGCC_%s_FUNC_EXT__", name); The above should have been split out as separate independent patchlet to get it out of the way. As noted already the use of alloca is a pre-existing (coding-style) bug and we should use XALLOCAVEC or appropriately sized fixed buffers to begin with. The amount of alloca in defining all these is amazing. > diff --git a/gcc/testsuite/gcc.c-torture/execute/ieee/cdivchkd.c > b/gcc/testsuite/gcc.c-torture/execute/ieee/cdivchkd.c > new file mode 100644 > index 000..409123f > --- /dev/null > +++ b/gcc/testsuite/gcc.c-torture/execute/ieee/cdivchkd.c > @@ -0,0 +1,126 @@ > +/* > + Program to test complex divide for correct results on selected values. > + Checking known failure points. > +*/ > + > +#include > + > +extern void abort (); > +extern void exit (); As Joseph pointed out in the v8 review already, please use prototyped function declarations in all new tests. > diff --git a/gcc/testsuite/gcc.c-torture/execute/ieee/cdivchkld.c > b/gcc/testsuite/gcc.c-torture/execute/ieee/cdivchkld.c > new file mode 100644 > index 000..28d707d > --- /dev/null > +++ b/gcc/testsuite/gcc.c-torture/execute/ieee/cdivchkld.c > @@ -0,0 +1,167 @@ > +/* > + Program to test complex divide for correct results on selected values. > + Checking known failure points. > +*/ > + > +#include > + > +extern void abort (); > +extern void exit (); Likewise. > +#if (LDBL_MAX_EXP < 2048) > + /* > +Test values when mantissa is 11 or fewer bits. Either LDBL is > +using DBL on this platform or we are using IBM extended double > +precision Test values will be automatically trucated the available > +precision. s/trucated/truncated/g; # with an 'n' I have trouble parsing the sentence nevertheless. There might be a missing "which" somewhere and maybe a "to"? > +#else > + /* > +Test values intended for either IEEE128 or Intel80 formats. In > +either case, 15 bits of exponent are available. Test values will > +be automatically trucated the available precision. > + */ again, truNcated and a couple of words missing? > diff --git a/libgcc/config/rs6000/_divkc3.c b/libgcc/config/rs6000/_divkc3.c > index d261f40..f57e14f 100644 > --- a/libgcc/config/rs6000/_divkc3.c > +++ b/libgcc/config/rs6000/_divkc3.c > @@ -37,31 +37,118 @@ see the files COPYING3 and COPYING.RUNTIME respectively. > If not, see > #define __divkc3 __divkc3_sw > #endif > > +#define RBIG (__LIBGCC_TF_MAX__ / 2) > +#define RMIN (__LIBGCC_TF_MIN__) > +#define RMIN2 (__LIBGCC_TF_EPSILON__) > +#define RMINSCAL (1 / __LIBGCC_TF_EPSILON__) > +#define RMAX2 (RBIG * RMIN2) > + > + > TCtype > __divkc3 (TFtype a, TFtype b, TFtype c, TFtype d) > { >TFtype denom, ratio, x, y; >TCtype res; > > - /* ??? We can get better behavior from logarithmic scaling instead of > - the division. But that would mean starting to link libgcc against > - libm. We could implement something akin to ldexp/frexp as gcc builtins > - fairly easily... */ > + /* long double has significant potential underflow/overflow errors than > + can be greatly reduced with a limited number of tests and adjustments. > + */ "than" doesn't parse. that? >else > { > + /* Prevent underflow when denominator is near max representable. */ > + if (FABS (c) >= RBIG) > + { > +
Re: [PATCH] testsuite: Disable zero-scratch-regs-{8, 9, 10, 11}.c on s390* [PR97680]
Yes, basically, I agreed with Eric. One of the major reason to intentionally put these testing cases under c-c++-common is to fail them by default on the platforms that do not support this feature yet. Then the platform maintainer could decide whether to complete this feature on the specific platform or skip them if they don’t want such feature on this platform. Qing > On Mar 31, 2021, at 2:14 AM, Eric Botcazou wrote: > >> That is true, but nothing really happened during the 5 months that the tests >> have been failing on many other architectures (except that powerpc and arm >> had skipped those tests). There has been a PR open for all those 5 months. > > So what? This is not the first example and I don't see anything special with > it. You or maintainers can decide to XFAIL particular architectures at will, > but hiding the failures by default is IMO not appropriate. > >> We can perhaps revert the skips after branching GCC 11 off, but I have >> little hope other target maintainers will do what you did, so unsure if it >> would help. And the changes need people familiar with each of the backends >> to decide what needs to be done and what is doable. > > That's exactly the same situation as for -fstack-usage/-Wstack-usage, where I > intentionally made gcc.dg/stack-usage-1.c fail by default so that maintainers > could add the missing bits; this worked relatively well. > > -- > Eric Botcazou > >
[committed] add test for PR 65182
r11-7932 adds a test case for another ancient -Wuninitialized bug fixed eons ago: https://gcc.gnu.org/g:31199d95de1304e200554bbf98b2d8a6a7298bec Martin
Re: [PATCH v2 1/3] x86: Update memcpy/memset inline strategies for Ice Lake
On Wed, Mar 31, 2021 at 6:47 AM Jan Hubicka wrote: > > > > > > > > > Patch is OK now. I was wondering about using avx256 for moves of known > > > > > > Done. X86_TUNE_PREFER_KNOWN_REP_MOVSB_STOSB is in now. Can > > > you take a look at the patch for Skylake: > > > > > > https://gcc.gnu.org/pipermail/gcc-patches/2021-March/567096.html > > > > I was wondering, if CPU preffers rep movsb when rcx is a compile time > > constant, it probably does some logic at the decode time (i.e. expands > > it into some sequence) and if so, then it may require the code setting > > the register to be near rep (via fusing or simlar mechanism) > > > > Perhaps we want to have fusing pattern for this, so we do not move them > > far apart? > > Reading through the optimization manual it seems that mosvb is fast for > small block no matter if the size is hard wired. In that case you > probably want to check whetehr max_size or expected_size is known to be > small rather than max_size == min_size and both being small. > > But it depends on what CPU really does. > Honza For small data size, rep movsb is faster only under certain conditions. We can continue fine tuning rep movsb. -- H.J.
Re: [PATCH] testsuite/aarch64: Skip SLP diagnostic under ILP32 (PR target/96974)
On Wed, 31 Mar 2021 at 12:08, Richard Biener wrote: > > On Mon, Mar 29, 2021 at 1:40 PM Christophe Lyon via Gcc-patches > wrote: > > > > The vectorizer has a very different effect with -mabi=ilp32, and > > doesn't emit the expecte diagnostic, so this patch expects it only > > under lp64. > > OK (please also backport as necessary) > Thanks, pushed to trunk and gcc-10 Christophe > > 2021-03-29 Christophe Lyon > > > > gcc/testsuite/ > > PR target/96974 > > * g++.target/aarch64/sve/pr96974.C: Expect SLP diagnostic only > > under lp64. > > --- > > gcc/testsuite/g++.target/aarch64/sve/pr96974.C | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/gcc/testsuite/g++.target/aarch64/sve/pr96974.C > > b/gcc/testsuite/g++.target/aarch64/sve/pr96974.C > > index 363241d..54000f5 100644 > > --- a/gcc/testsuite/g++.target/aarch64/sve/pr96974.C > > +++ b/gcc/testsuite/g++.target/aarch64/sve/pr96974.C > > @@ -15,4 +15,4 @@ struct c { > > int coeffs[10]; > > } f; > > > > -/* { dg-final { scan-tree-dump "Not vectorized: Incompatible number of > > vector subparts between" "slp1" } } */ > > +/* { dg-final { scan-tree-dump "Not vectorized: Incompatible number of > > vector subparts between" "slp1" { target lp64 } } } */ > > -- > > 2.7.4 > >
[pushed] c++: Alias template in pack expansion [PR99445]
In this testcase, iterative_hash_template_arg checks alias_template_specialization_p to determine whether to treat a type as a dependent alias, and structural_comptypes checks dependent_alias_template_spec_p. Normally that difference isn't a problem because canonicalizing template arguments strips non-dependent aliases, but that wasn't happening for the pack expansion. Fixed thus. Tested x86_64-pc-linux-gnu, applying to trunk. gcc/cp/ChangeLog: PR c++/99445 * tree.c (strip_typedefs): Handle TYPE_PACK_EXPANSION. gcc/testsuite/ChangeLog: PR c++/99445 * g++.dg/cpp0x/alias-decl-variadic1.C: New test. --- gcc/cp/tree.c | 9 + gcc/testsuite/g++.dg/cpp0x/alias-decl-variadic1.C | 14 ++ 2 files changed, 23 insertions(+) create mode 100644 gcc/testsuite/g++.dg/cpp0x/alias-decl-variadic1.C diff --git a/gcc/cp/tree.c b/gcc/cp/tree.c index 8c4bd156d3f..dca947bf52a 100644 --- a/gcc/cp/tree.c +++ b/gcc/cp/tree.c @@ -1722,6 +1722,15 @@ strip_typedefs (tree t, bool *remove_attributes, unsigned int flags) remove_attributes, flags); result = finish_underlying_type (type); break; +case TYPE_PACK_EXPANSION: + type = strip_typedefs (PACK_EXPANSION_PATTERN (t), +remove_attributes, flags); + if (type != PACK_EXPANSION_PATTERN (t)) + { + result = copy_node (t); + PACK_EXPANSION_PATTERN (result) = type; + } + break; default: break; } diff --git a/gcc/testsuite/g++.dg/cpp0x/alias-decl-variadic1.C b/gcc/testsuite/g++.dg/cpp0x/alias-decl-variadic1.C new file mode 100644 index 000..68b3a7fd009 --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp0x/alias-decl-variadic1.C @@ -0,0 +1,14 @@ +// PR c++/99445 +// { dg-do compile { target c++11 } } +// { dg-additional-options "-fchecking=2 --param=hash-table-verification-limit=1000" } + +template struct implicit_conversions; +template +using implicit_conversions_t = typename implicit_conversions::type; +template struct response_type; + +template +using type1 = response_type...>; + +template +using type2 = response_type::type...>; base-commit: 7c1d6e89994109e1b6efb5f13890be5586edeb75 -- 2.27.0
Re: [PATCH v2 1/3] x86: Update memcpy/memset inline strategies for Ice Lake
> > > > > > Patch is OK now. I was wondering about using avx256 for moves of known > > > > Done. X86_TUNE_PREFER_KNOWN_REP_MOVSB_STOSB is in now. Can > > you take a look at the patch for Skylake: > > > > https://gcc.gnu.org/pipermail/gcc-patches/2021-March/567096.html > > I was wondering, if CPU preffers rep movsb when rcx is a compile time > constant, it probably does some logic at the decode time (i.e. expands > it into some sequence) and if so, then it may require the code setting > the register to be near rep (via fusing or simlar mechanism) > > Perhaps we want to have fusing pattern for this, so we do not move them > far apart? Reading through the optimization manual it seems that mosvb is fast for small block no matter if the size is hard wired. In that case you probably want to check whetehr max_size or expected_size is known to be small rather than max_size == min_size and both being small. But it depends on what CPU really does. Honza
Re: [PATCH v2 1/3] x86: Update memcpy/memset inline strategies for Ice Lake
> > > > Patch is OK now. I was wondering about using avx256 for moves of known > > Done. X86_TUNE_PREFER_KNOWN_REP_MOVSB_STOSB is in now. Can > you take a look at the patch for Skylake: > > https://gcc.gnu.org/pipermail/gcc-patches/2021-March/567096.html I was wondering, if CPU preffers rep movsb when rcx is a compile time constant, it probably does some logic at the decode time (i.e. expands it into some sequence) and if so, then it may require the code setting the register to be near rep (via fusing or simlar mechanism) Perhaps we want to have fusing pattern for this, so we do not move them far apart? > > > size (per comment on MOVE_MAX_PIECES there is issue with > > MAX_FIXED_MODE_SIZE, but that seems not hard to fix). Did you look into > > it? > > It requires some changes in the middle-end. See yep, I know - tried that too for zen3 tuning :) > users/hjl/pieces/master branch: > > https://gitlab.com/x86-gcc/gcc/-/tree/users/hjl/pieces/master > > I am rebasing it. Thanks, it would also help to reduce the code size bloat by bumping up the move by pieces. Clang is using those. Honza > > -- > H.J.
Re: [PATCH][GCC 10] tree-opt: Fix segfault in tree-if-conv.c with -march=armv8.2-a+sve [PR97849]
On Wed, Mar 31, 2021 at 3:31 PM Alex Coplan via Gcc-patches wrote: > > Hi all, > > I'd like to backport the fix for PR97849 to GCC 10. The patch on trunk: > https://gcc.gnu.org/g:5700973f4a30762b4fc21687bb5f7843e55da2e4 > applies cleanly to the 10 branch. > > Bootstrapped and regtested on aarch64-linux-gnu, no regressions. > > OK for GCC 10.3? OK. > Thanks, > Alex > > --- > > gcc/ChangeLog > 2020-11-24 Prathamesh Kulkarni > > PR tree-optimization/97849 > * tree-if-conv.c (tree_if_conversion): Move ssa_name > replacement code from ifcvt_local_dce to this function > before calling do_rpo_vn. > > gcc/testsuite/ChangeLog > 2020-11-24 Prathamesh Kulkarni > > PR tree-optimization/97849 > * gcc.dg/tree-ssa/pr97849.c: New test.
[PATCH][GCC 10] tree-opt: Fix segfault in tree-if-conv.c with -march=armv8.2-a+sve [PR97849]
Hi all, I'd like to backport the fix for PR97849 to GCC 10. The patch on trunk: https://gcc.gnu.org/g:5700973f4a30762b4fc21687bb5f7843e55da2e4 applies cleanly to the 10 branch. Bootstrapped and regtested on aarch64-linux-gnu, no regressions. OK for GCC 10.3? Thanks, Alex --- gcc/ChangeLog 2020-11-24 Prathamesh Kulkarni PR tree-optimization/97849 * tree-if-conv.c (tree_if_conversion): Move ssa_name replacement code from ifcvt_local_dce to this function before calling do_rpo_vn. gcc/testsuite/ChangeLog 2020-11-24 Prathamesh Kulkarni PR tree-optimization/97849 * gcc.dg/tree-ssa/pr97849.c: New test. diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr97849.c b/gcc/testsuite/gcc.dg/tree-ssa/pr97849.c new file mode 100644 index 000..57a31e316a2 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr97849.c @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options "-O1 -ftree-vectorize" } */ +/* { dg-additional-options "-march=armv8.2-a+sve" { target aarch64*-*-* } } */ + +int a, b, c; + +int g() { + char i = 0; + for (c = 0; c <= 8; c++) +--i; + + while (b) { +_Bool f = i <= 0; +a = (a == 0) ? 0 : f / a; + } +} diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c index 2062758f40f..93effaa811b 100644 --- a/gcc/tree-if-conv.c +++ b/gcc/tree-if-conv.c @@ -2916,12 +2916,6 @@ ifcvt_local_dce (class loop *loop) enum gimple_code code; use_operand_p use_p; imm_use_iterator imm_iter; - std::pair *name_pair; - unsigned int i; - - FOR_EACH_VEC_ELT (redundant_ssa_names, i, name_pair) -replace_uses_by (name_pair->first, name_pair->second); - redundant_ssa_names.release (); /* The loop has a single BB only. */ basic_block bb = loop->header; @@ -3124,6 +3118,13 @@ tree_if_conversion (class loop *loop, vec *preds) exit_bbs = BITMAP_ALLOC (NULL); bitmap_set_bit (exit_bbs, single_exit (loop)->dest->index); bitmap_set_bit (exit_bbs, loop->latch->index); + + std::pair *name_pair; + unsigned ssa_names_idx; + FOR_EACH_VEC_ELT (redundant_ssa_names, ssa_names_idx, name_pair) +replace_uses_by (name_pair->first, name_pair->second); + redundant_ssa_names.release (); + todo |= do_rpo_vn (cfun, loop_preheader_edge (loop), exit_bbs); /* Delete dead predicate computations. */
Re: [PATCH v2 1/3] x86: Update memcpy/memset inline strategies for Ice Lake
On Wed, Mar 31, 2021 at 1:05 AM Jan Hubicka wrote: > > > > It looks like X86_TUNE_PREFER_KNOWN_REP_MOVSB_STOSB is quite obviously > > > benefical and independent of the rest of changes. I think we will need > > > to discuss bit more the move ratio and the code size/uop cache polution > > > issues - one option would be to use increased limits for -O3 only. > > > > My change only increases CLEAR_RATIO, not MOVE_RATIO. We are > > checking code size impacts on SPEC CPU 2017 and eembc. > > > > > Can you break this out to independent patch? I also wonder if it owuld > > > > X86_TUNE_PREFER_KNOWN_REP_MOVSB_STOSB improves performance > > only when memcpy/memset costs and MOVE_RATIO are updated the same time, > > like: > > > > https://gcc.gnu.org/pipermail/gcc-patches/2021-March/567096.html > > > > Make it a standalone means moving from Ice Lake patch to Skylake patch. > > > > > not be more readable to special case this just on the beggining of > > > decide_alg. > > > > @@ -6890,6 +6891,7 @@ decide_alg (HOST_WIDE_INT count, HOST_WIDE_INT > > > > expected_size, > > > >const struct processor_costs *cost; > > > >int i; > > > >bool any_alg_usable_p = false; > > > > + bool known_size_p = expected_size != -1; > > > > > > expected_size is not -1 if we have profile feedback and we detected from > > > histogram average size of a block. It seems to me that from description > > > that you want the const to be actual compile time constant that would be > > > min_size == max_size I guess. > > > > > > > You are right. Here is the v2 patch with min_size != max_size check for > > unknown size. > > Patch is OK now. I was wondering about using avx256 for moves of known Done. X86_TUNE_PREFER_KNOWN_REP_MOVSB_STOSB is in now. Can you take a look at the patch for Skylake: https://gcc.gnu.org/pipermail/gcc-patches/2021-March/567096.html > size (per comment on MOVE_MAX_PIECES there is issue with > MAX_FIXED_MODE_SIZE, but that seems not hard to fix). Did you look into > it? It requires some changes in the middle-end. See users/hjl/pieces/master branch: https://gitlab.com/x86-gcc/gcc/-/tree/users/hjl/pieces/master I am rebasing it. -- H.J.
Re: znver3 tuning part 1
> On 3/31/21 1:08 PM, Jan Hubicka wrote: > > > > > > 2021-03-15 Jan Hubicka > > > > > > * config/i386/i386-options.c (processor_cost_table): Add znver3_cost. > > > * config/i386/x86-tune-costs.h (znver3_cost): New gobal variable; copy > > > of znver2_cost. > > > > I have backported the patch to gcc10 branch as > > g:aa99212489545c6c970a8f91b3d37ea6466cb988. > > > > Honza > > > > Looking at the backport, it has likely enabled PR99753. > Please consider backporting 4f00c4d40a539360938607561460904663c64cda. Not this patch but backport of the original Venkat's change indeed triggers PR99763. I am now testing backport if this revision. Thanks, Honza > > Cheers, > Martin
[PATCH] rs6000: Avoid -fpatchable-function-entry* regressions on powerpc64 be [PR98125]
Hi! The SECTION_LINK_ORDER changes broke powerpc64-linux ELFv1. Seems that the assembler/linker relies on the symbol mentioned for the "awo" section to be in the same section as the symbols mentioned in the relocations in that section (i.e. labels for the patchable area in this case). That is the case for most targets, including powerpc-linux 32-bit or powerpc64 ELFv2 (that one has -fpatchable-function-entry* support broken for other reasons and it doesn't seem to be a regression). But it doesn't work on powerpc64-linux ELFv1. We emit: .section".opd","aw" .align 3 _Z3foov: .quad .L._Z3foov,.TOC.@tocbase,0 .previous .type _Z3foov, @function .L._Z3foov: .section__patchable_function_entries,"awo",@progbits,_Z3foov .align 3 .8byte .LPFE1 .section.text._Z3foov,"axG",@progbits,_Z3foov,comdat .LPFE1: nop .LFB0: .cfi_startproc and because _Z3foov is in the .opd section rather than the function text section, it doesn't work. I'm afraid I don't know what exactly should be done, whether e.g. it could use .section__patchable_function_entries,"awo",@progbits,.L._Z3foov instead, or whether the linker should be changed to handle it as is, or something else. But because we have a P1 regression that didn't see useful progress over the 4 months since it has been filed and we don't really have much time, below is an attempt to do a targetted reversion of H.J's patch, basically act as if HAVE_GAS_SECTION_LINK_ORDER is never true for powerpc64-linux ELFv1, but for 32-bit or 64-bit ELFv2 keep working as is. This would give us time to resolve it for GCC 12 properly. Bootstrapped/regtested on powerpc64-linux (regtest with both -m32/-m64). 2021-03-31 Jakub Jelinek PR testsuite/98125 * targhooks.h (default_print_patchable_function_entry_1): Declare. * targhooks.c (default_print_patchable_function_entry_1): New function, copied from default_print_patchable_function_entry with an added flags argument. (default_print_patchable_function_entry): Rewritten into a small wrapper around default_print_patchable_function_entry_1. * config/rs6000/rs6000.c (TARGET_ASM_PRINT_PATCHABLE_FUNCTION_ENTRY): Redefine. (rs6000_print_patchable_function_entry): New function. * g++.dg/pr93195a.C: Skip on powerpc*-*-* 64-bit. --- gcc/targhooks.h.jj 2021-01-04 10:25:39.665224403 +0100 +++ gcc/targhooks.h 2021-03-30 15:48:42.826706369 +0200 @@ -230,6 +230,9 @@ extern bool default_use_by_pieces_infras bool); extern int default_compare_by_pieces_branch_ratio (machine_mode); +extern void default_print_patchable_function_entry_1 (FILE *, + unsigned HOST_WIDE_INT, + bool, unsigned int); extern void default_print_patchable_function_entry (FILE *, unsigned HOST_WIDE_INT, bool); --- gcc/targhooks.c.jj 2021-01-04 10:25:38.974232228 +0100 +++ gcc/targhooks.c 2021-03-30 15:51:22.924932795 +0200 @@ -1832,17 +1832,15 @@ default_compare_by_pieces_branch_ratio ( return 1; } -/* Write PATCH_AREA_SIZE NOPs into the asm outfile FILE around a function - entry. If RECORD_P is true and the target supports named sections, - the location of the NOPs will be recorded in a special object section - called "__patchable_function_entries". This routine may be called - twice per function to put NOPs before and after the function - entry. */ +/* Helper for default_print_patchable_function_entry and other + print_patchable_function_entry hook implementations. */ void -default_print_patchable_function_entry (FILE *file, - unsigned HOST_WIDE_INT patch_area_size, - bool record_p) +default_print_patchable_function_entry_1 (FILE *file, + unsigned HOST_WIDE_INT + patch_area_size, + bool record_p, + unsigned int flags) { const char *nop_templ = 0; int code_num; @@ -1864,9 +1862,6 @@ default_print_patchable_function_entry ( patch_area_number++; ASM_GENERATE_INTERNAL_LABEL (buf, "LPFE", patch_area_number); - unsigned int flags = SECTION_WRITE | SECTION_RELRO; - if (HAVE_GAS_SECTION_LINK_ORDER) - flags |= SECTION_LINK_ORDER; switch_to_section (get_section ("__patchable_function_entries", flags, current_function_decl)); assemble_align (POINTER_SIZE); @@ -1883,6 +1878,25 @@ default_print_patchable_function_entry ( output_asm_insn (nop_templ, NULL); }
Re: znver3 tuning part 1
On 3/31/21 1:08 PM, Jan Hubicka wrote: 2021-03-15 Jan Hubicka * config/i386/i386-options.c (processor_cost_table): Add znver3_cost. * config/i386/x86-tune-costs.h (znver3_cost): New gobal variable; copy of znver2_cost. I have backported the patch to gcc10 branch as g:aa99212489545c6c970a8f91b3d37ea6466cb988. Honza Looking at the backport, it has likely enabled PR99753. Please consider backporting 4f00c4d40a539360938607561460904663c64cda. Cheers, Martin
Re: [PATCH] Handle CONST_POLY_INTs in CONST_VECTORs [PR97141, PR98726]
On Wed, Mar 31, 2021 at 12:24 PM Richard Sandiford via Gcc-patches wrote: > > This PR is caused by POLY_INT_CSTs being (necessarily) valid > in tree-level VECTOR_CSTs but CONST_POLY_INTs not being valid > in RTL CONST_VECTORs. I can't tell/remember how deliberate > that was, but I'm guessing not very. In particular, > valid_for_const_vector_p was added to guard against symbolic > constants rather than CONST_POLY_INTs. > > I did briefly consider whether we should maintain the current > status anyway. However, that would then require a way of > constructing variable-length vectors from individiual elements > if, say, we have: > >{ [2, 2], [3, 2], [4, 2], … } > > So I'm chalking this up to an oversight. I think the intention > (and certainly the natural thing) is to have the same rules for > both trees and RTL. > > The SVE CONST_VECTOR code should already be set up to handle > CONST_POLY_INTs. However, we need to add support for Advanced SIMD > CONST_VECTORs that happen to contain SVE-based values. The patch does > that by expanding such CONST_VECTORs in the same way as variable vectors. > > Tested on aarch64-linux-gnu, armeb-eabi and x86_64-linux-gnu. > OK for the target-independent bits? OK. Richard. > Richard > > > gcc/ > PR rtl-optimization/97141 > PR rtl-optimization/98726 > * emit-rtl.c (valid_for_const_vector_p): Return true for > CONST_POLY_INT_P. > * rtx-vector-builder.h (rtx_vector_builder::step): Return a > poly_wide_int instead of a wide_int. > (rtx_vector_builder::apply_set): Take a poly_wide_int instead > of a wide_int. > * rtx-vector-builder.c (rtx_vector_builder::apply_set): Likewise. > * config/aarch64/aarch64.c (aarch64_legitimate_constant_p): Return > false for CONST_VECTORs that cannot be forced to memory. > * config/aarch64/aarch64-simd.md (mov): If a CONST_VECTOR > is too complex to force to memory, build it up from individual > elements instead. > > gcc/testsuite/ > PR rtl-optimization/97141 > PR rtl-optimization/98726 > * gcc.c-torture/compile/pr97141.c: New test. > * gcc.c-torture/compile/pr98726.c: Likewise. > * gcc.target/aarch64/sve/pr97141.c: Likewise. > * gcc.target/aarch64/sve/pr98726.c: Likewise. > --- > gcc/config/aarch64/aarch64-simd.md| 11 + > gcc/config/aarch64/aarch64.c | 24 +++ > gcc/emit-rtl.c| 1 + > gcc/rtx-vector-builder.c | 6 ++--- > gcc/rtx-vector-builder.h | 10 > gcc/testsuite/gcc.c-torture/compile/pr97141.c | 8 +++ > gcc/testsuite/gcc.c-torture/compile/pr98726.c | 7 ++ > .../gcc.target/aarch64/sve/pr97141.c | 10 > .../gcc.target/aarch64/sve/pr98726.c | 9 +++ > 9 files changed, 68 insertions(+), 18 deletions(-) > create mode 100644 gcc/testsuite/gcc.c-torture/compile/pr97141.c > create mode 100644 gcc/testsuite/gcc.c-torture/compile/pr98726.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pr97141.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pr98726.c > > diff --git a/gcc/config/aarch64/aarch64-simd.md > b/gcc/config/aarch64/aarch64-simd.md > index d86e8e72f18..4edee99051c 100644 > --- a/gcc/config/aarch64/aarch64-simd.md > +++ b/gcc/config/aarch64/aarch64-simd.md > @@ -35,6 +35,17 @@ (define_expand "mov" > && aarch64_mem_pair_operand (operands[0], DImode)) >|| known_eq (GET_MODE_SIZE (mode), 8 >operands[1] = force_reg (mode, operands[1]); > + > + /* If a constant is too complex to force to memory (e.g. because it > + contains CONST_POLY_INTs), build it up from individual elements instead. > + We should only need to do this before RA; aarch64_legitimate_constant_p > + should ensure that we don't try to rematerialize the constant later. */ > + if (GET_CODE (operands[1]) == CONST_VECTOR > + && targetm.cannot_force_const_mem (mode, operands[1])) > +{ > + aarch64_expand_vector_init (operands[0], operands[1]); > + DONE; > +} >" > ) > > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c > index 5eda9e80bd0..77573e96820 100644 > --- a/gcc/config/aarch64/aarch64.c > +++ b/gcc/config/aarch64/aarch64.c > @@ -17925,10 +17925,22 @@ aarch64_legitimate_constant_p (machine_mode mode, > rtx x) > { >/* Support CSE and rematerialization of common constants. */ >if (CONST_INT_P (x) > - || (CONST_DOUBLE_P (x) && GET_MODE_CLASS (mode) == MODE_FLOAT) > - || GET_CODE (x) == CONST_VECTOR) > + || (CONST_DOUBLE_P (x) && GET_MODE_CLASS (mode) == MODE_FLOAT)) > return true; > > + /* Only accept variable-length vector constants if they can be > + handled directly. > + > + ??? It would be possible (but complex) to handle rematerialization > + of other
Re: znver3 tuning part 1
> > 2021-03-15 Jan Hubicka > > * config/i386/i386-options.c (processor_cost_table): Add znver3_cost. > * config/i386/x86-tune-costs.h (znver3_cost): New gobal variable; copy > of znver2_cost. I have backported the patch to gcc10 branch as g:aa99212489545c6c970a8f91b3d37ea6466cb988. Honza
Re: [PATCH] gimple-fold: Recompute ADDR_EXPR flags after folding a TMR [PR98268]
On Wed, Mar 31, 2021 at 12:23 PM Richard Sandiford via Gcc-patches wrote: > > The gimple verifier picked up that an ADDR_EXPR of a MEM_REF was not > marked TREE_CONSTANT even though the address was in fact invariant. > This came from folding a _MEM_REF with constant operands to > a _REF; _MEM_REF is never treated as TREE_CONSTANT > but _REF can be. > > Tested on aarch64-linux-gnu, armeb-eabi and x86_64-linux-gnu. > OK to install? OK. Richard. > Richard > > gcc/ > PR tree-optimization/98268 > * gimple-fold.c (maybe_canonicalize_mem_ref_addr): Call > recompute_tree_invariant_for_addr_expr after successfully > folding a TARGET_MEM_REF that occurs inside an ADDR_EXPR. > > gcc/testsuite/ > PR tree-optimization/98268 > * gcc.target/aarch64/sve/pr98268-1.c: New test. > * gcc.target/aarch64/sve/pr98268-2.c: Likewise. > --- > gcc/gimple-fold.c| 2 ++ > gcc/testsuite/gcc.target/aarch64/sve/pr98268-1.c | 11 +++ > gcc/testsuite/gcc.target/aarch64/sve/pr98268-2.c | 10 ++ > 3 files changed, 23 insertions(+) > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pr98268-1.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pr98268-2.c > > diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c > index de5a6c22395..9e6683dbac9 100644 > --- a/gcc/gimple-fold.c > +++ b/gcc/gimple-fold.c > @@ -5877,6 +5877,8 @@ maybe_canonicalize_mem_ref_addr (tree *t, bool is_debug > = false) >if (tem) > { > *t = tem; > + if (TREE_CODE (*orig_t) == ADDR_EXPR) > + recompute_tree_invariant_for_addr_expr (*orig_t); > res = true; > } > } > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr98268-1.c > b/gcc/testsuite/gcc.target/aarch64/sve/pr98268-1.c > new file mode 100644 > index 000..fdbe55e0b4e > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve/pr98268-1.c > @@ -0,0 +1,11 @@ > +/* { dg-do link } */ > +/* { dg-options "-flto -O -ftree-vectorize > --param=aarch64-autovec-preference=3" } */ > +/* { dg-additional-sources "pr98268-2.c" } */ > + > +short d, e; > +void f(char, long*); > +int main() { > + long x; > + f(-114, ); > + return d == e; > +} > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr98268-2.c > b/gcc/testsuite/gcc.target/aarch64/sve/pr98268-2.c > new file mode 100644 > index 000..de3b05d5e15 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve/pr98268-2.c > @@ -0,0 +1,10 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O -ftree-vectorize --param=aarch64-autovec-preference=3" } > */ > + > +extern short d[], e[]; > +void f(char a, long *b) { > + for (int c = 0; c < a - 12; c++) { > +d[c] = b[c]; > +e[c] = 0; > + } > +}
Re: [PATCH] data-ref: Tighten index-based alias checks [PR99726]
On Wed, Mar 31, 2021 at 12:15 PM Richard Sandiford via Gcc-patches wrote: > > create_intersect_range_checks_index tries to create a runtime > alias check based on index comparisons. It looks through the > access functions for the two DRs to find a SCEV for the loop > that is being versioned and converts a DR_STEP-based check > into an index-based check. > > However, there isn't any reliable sign information in the types, > so the code expects the value of the IV step (when interpreted as > signed) to be negative iff the DR_STEP (when interpreted as signed) > is negative. > > r10-4762 added another assert related to this assumption and the > assert fired for the testcase in the PR. The sign of the IV step > didn't match the sign of the DR_STEP. > > I think this is actually showing what was previously a wrong-code bug. > The signs didn't match because the DRs contained *two* access function > SCEVs for the loop being versioned. It doesn't look like the code > is set up to deal with this, since it checks each access function > independently and treats it as the sole source of DR_STEP. I think it shows that matching DR_STEP with the access function step doesn't make much sense. In practice it will work for the case where there's a single access function evolving in the respective loop since we don't have negative stride arrays via array_ref_element_size (do we?). But of course for multiple access functions we can't simply choose one to do an index overlap test but instead we'd have to check all of them. That said, I wonder why the code needs the DR_STEP at all and cannot simply use the SCEVs step [direction]. Of course the patch is correct in that it restricts handling to a single access fn evolving in the loop. Thus - OK. Removing the DR_STEP restriction might allow more (weird, see above) cases to be handled. Thanks, Richard. > The patch therefore moves the main condition out of the loop. > This also has the advantage of not building a tree for one access > function only to throw it away if we find an inner function that > makes the comparison invalid. > > Tested on aarch64-linux-gnu, armeb-eabi and x86_64-linux-gnu. > OK to install? > > Richard > > > gcc/ > PR tree-optimization/99726 > * tree-data-ref.c (create_intersect_range_checks_index): Bail > out if there is more than one access function SCEV for the loop > being versioned. > > gcc/testsuite/ > PR tree-optimization/99726 > * gcc.target/i386/pr99726.c: New test. > --- > gcc/testsuite/gcc.target/i386/pr99726.c | 15 ++ > gcc/tree-data-ref.c | 245 +--- > 2 files changed, 143 insertions(+), 117 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/i386/pr99726.c > > [ -b version ]-- > > diff --git a/gcc/testsuite/gcc.target/i386/pr99726.c > b/gcc/testsuite/gcc.target/i386/pr99726.c > new file mode 100644 > index 000..ff19bcabe4f > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/pr99726.c > @@ -0,0 +1,15 @@ > +/* { dg-options "-flive-patching=inline-clone -mavx512f -O2 > -floop-nest-optimize -ftree-loop-vectorize -ftrapv -m32" } */ > + > +extern int a[256][1024]; > +int b; > +long c, d; > +unsigned int e; > + > +int > +main () > +{ > + for (; e < d; e++) > +for (unsigned j = 1; j < c; j++) > + a[e][j] = b * a[e - 1][j + 1]; > + return 0; > +} > diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c > index 124a7bea6a9..e6dd5f15bed 100644 > --- a/gcc/tree-data-ref.c > +++ b/gcc/tree-data-ref.c > @@ -2147,8 +2147,8 @@ create_intersect_range_checks_index (class loop *loop, > tree *cond_expr, > >bool waw_or_war_p = (alias_pair.flags & ~(DR_ALIAS_WAR | DR_ALIAS_WAW)) == > 0; > > - unsigned int i; > - for (i = 0; i < DR_NUM_DIMENSIONS (dr_a.dr); i++) > + int found = -1; > + for (unsigned int i = 0; i < DR_NUM_DIMENSIONS (dr_a.dr); i++) > { >tree access1 = DR_ACCESS_FN (dr_a.dr, i); >tree access2 = DR_ACCESS_FN (dr_b.dr, i); > @@ -2164,7 +2164,19 @@ create_intersect_range_checks_index (class loop *loop, > tree *cond_expr, > > return false; > } > + if (found >= 0) > + return false; > + found = i; > +} > + > + /* Ought not to happen in practice, since if all accesses are equal then > the > + alias should be decidable at compile time. */ > + if (found < 0) > +return false; > + >/* The two indices must have the same step. */ > + tree access1 = DR_ACCESS_FN (dr_a.dr, found); > + tree access2 = DR_ACCESS_FN (dr_b.dr, found); >if (!operand_equal_p (CHREC_RIGHT (access1), CHREC_RIGHT (access2), 0)) > return false; > > @@ -2221,7 +2233,7 @@ create_intersect_range_checks_index (class loop *loop, > tree *cond_expr, > When checking for general overlap, we need to test whether > the range: > > - [min1 + low_offset1, min2 + high_offset1 + idx_access1 - 1] > +
[committed] aarch64: Fix target alignment for SVE [PR98119]
The vectoriser supports peeling for alignment using predication: we move back to the previous aligned boundary and make the skipped elements inactive in the first loop iteration. As it happens, the costs for existing CPUs give an equal cost to aligned and unaligned accesses, so this feature is rarely used. However, the PR shows that when the feature was forced on, we were still trying to align to a full-vector boundary even when using partial vectors. Tested on aarch64-linux-gnu, pushed to trunk so far. Richard gcc/ PR target/98119 * config/aarch64/aarch64.c (aarch64_vectorize_preferred_vector_alignment): Query the size of the provided SVE vector; do not assume that all SVE vectors have the same size. gcc/testsuite/ PR target/98119 * gcc.target/aarch64/sve/pr98119.c: New test. --- gcc/config/aarch64/aarch64.c | 7 --- gcc/testsuite/gcc.target/aarch64/sve/pr98119.c | 13 + 2 files changed, 17 insertions(+), 3 deletions(-) create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pr98119.c diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 5eda9e80bd0..f878721f13c 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -20275,10 +20275,11 @@ aarch64_vectorize_preferred_vector_alignment (const_tree type) { if (aarch64_sve_data_mode_p (TYPE_MODE (type))) { - /* If the length of the vector is fixed, try to align to that length, -otherwise don't try to align at all. */ + /* If the length of the vector is a fixed power of 2, try to align +to that length, otherwise don't try to align at all. */ HOST_WIDE_INT result; - if (!BITS_PER_SVE_VECTOR.is_constant ()) + if (!GET_MODE_BITSIZE (TYPE_MODE (type)).is_constant () + || !pow2p_hwi (result)) result = TYPE_ALIGN (TREE_TYPE (type)); return result; } diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr98119.c b/gcc/testsuite/gcc.target/aarch64/sve/pr98119.c new file mode 100644 index 000..da6208c2ce3 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pr98119.c @@ -0,0 +1,13 @@ +/* { dg-options "-O3 -msve-vector-bits=512 -mtune=thunderx" } */ + +void +f (unsigned short *x) +{ + for (int i = 0; i < 1000; ++i) +x[i] += x[i - 16]; +} + +/* { dg-final { scan-assembler-not {\tubfx\t[wx][0-9]+, [wx][0-9]+, #?1, #?5\n} } } */ +/* { dg-final { scan-assembler-not {\tand\tx[0-9]+, x[0-9]+, #?-63\n} } } */ +/* { dg-final { scan-assembler {\tubfx\t[wx][0-9]+, [wx][0-9]+, #?1, #?4\n} } } */ +/* { dg-final { scan-assembler {\tand\tx[0-9]+, x[0-9]+, #?-31\n} } } */
[PATCH] Handle CONST_POLY_INTs in CONST_VECTORs [PR97141, PR98726]
This PR is caused by POLY_INT_CSTs being (necessarily) valid in tree-level VECTOR_CSTs but CONST_POLY_INTs not being valid in RTL CONST_VECTORs. I can't tell/remember how deliberate that was, but I'm guessing not very. In particular, valid_for_const_vector_p was added to guard against symbolic constants rather than CONST_POLY_INTs. I did briefly consider whether we should maintain the current status anyway. However, that would then require a way of constructing variable-length vectors from individiual elements if, say, we have: { [2, 2], [3, 2], [4, 2], … } So I'm chalking this up to an oversight. I think the intention (and certainly the natural thing) is to have the same rules for both trees and RTL. The SVE CONST_VECTOR code should already be set up to handle CONST_POLY_INTs. However, we need to add support for Advanced SIMD CONST_VECTORs that happen to contain SVE-based values. The patch does that by expanding such CONST_VECTORs in the same way as variable vectors. Tested on aarch64-linux-gnu, armeb-eabi and x86_64-linux-gnu. OK for the target-independent bits? Richard gcc/ PR rtl-optimization/97141 PR rtl-optimization/98726 * emit-rtl.c (valid_for_const_vector_p): Return true for CONST_POLY_INT_P. * rtx-vector-builder.h (rtx_vector_builder::step): Return a poly_wide_int instead of a wide_int. (rtx_vector_builder::apply_set): Take a poly_wide_int instead of a wide_int. * rtx-vector-builder.c (rtx_vector_builder::apply_set): Likewise. * config/aarch64/aarch64.c (aarch64_legitimate_constant_p): Return false for CONST_VECTORs that cannot be forced to memory. * config/aarch64/aarch64-simd.md (mov): If a CONST_VECTOR is too complex to force to memory, build it up from individual elements instead. gcc/testsuite/ PR rtl-optimization/97141 PR rtl-optimization/98726 * gcc.c-torture/compile/pr97141.c: New test. * gcc.c-torture/compile/pr98726.c: Likewise. * gcc.target/aarch64/sve/pr97141.c: Likewise. * gcc.target/aarch64/sve/pr98726.c: Likewise. --- gcc/config/aarch64/aarch64-simd.md| 11 + gcc/config/aarch64/aarch64.c | 24 +++ gcc/emit-rtl.c| 1 + gcc/rtx-vector-builder.c | 6 ++--- gcc/rtx-vector-builder.h | 10 gcc/testsuite/gcc.c-torture/compile/pr97141.c | 8 +++ gcc/testsuite/gcc.c-torture/compile/pr98726.c | 7 ++ .../gcc.target/aarch64/sve/pr97141.c | 10 .../gcc.target/aarch64/sve/pr98726.c | 9 +++ 9 files changed, 68 insertions(+), 18 deletions(-) create mode 100644 gcc/testsuite/gcc.c-torture/compile/pr97141.c create mode 100644 gcc/testsuite/gcc.c-torture/compile/pr98726.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pr97141.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pr98726.c diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index d86e8e72f18..4edee99051c 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -35,6 +35,17 @@ (define_expand "mov" && aarch64_mem_pair_operand (operands[0], DImode)) || known_eq (GET_MODE_SIZE (mode), 8 operands[1] = force_reg (mode, operands[1]); + + /* If a constant is too complex to force to memory (e.g. because it + contains CONST_POLY_INTs), build it up from individual elements instead. + We should only need to do this before RA; aarch64_legitimate_constant_p + should ensure that we don't try to rematerialize the constant later. */ + if (GET_CODE (operands[1]) == CONST_VECTOR + && targetm.cannot_force_const_mem (mode, operands[1])) +{ + aarch64_expand_vector_init (operands[0], operands[1]); + DONE; +} " ) diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 5eda9e80bd0..77573e96820 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -17925,10 +17925,22 @@ aarch64_legitimate_constant_p (machine_mode mode, rtx x) { /* Support CSE and rematerialization of common constants. */ if (CONST_INT_P (x) - || (CONST_DOUBLE_P (x) && GET_MODE_CLASS (mode) == MODE_FLOAT) - || GET_CODE (x) == CONST_VECTOR) + || (CONST_DOUBLE_P (x) && GET_MODE_CLASS (mode) == MODE_FLOAT)) return true; + /* Only accept variable-length vector constants if they can be + handled directly. + + ??? It would be possible (but complex) to handle rematerialization + of other constants via secondary reloads. */ + if (!GET_MODE_SIZE (mode).is_constant ()) +return aarch64_simd_valid_immediate (x, NULL); + + /* Otherwise, accept any CONST_VECTOR that, if all else fails, can at + least be forced to memory and loaded from there. */ + if (GET_CODE (x) ==
[PATCH] gimple-fold: Recompute ADDR_EXPR flags after folding a TMR [PR98268]
The gimple verifier picked up that an ADDR_EXPR of a MEM_REF was not marked TREE_CONSTANT even though the address was in fact invariant. This came from folding a _MEM_REF with constant operands to a _REF; _MEM_REF is never treated as TREE_CONSTANT but _REF can be. Tested on aarch64-linux-gnu, armeb-eabi and x86_64-linux-gnu. OK to install? Richard gcc/ PR tree-optimization/98268 * gimple-fold.c (maybe_canonicalize_mem_ref_addr): Call recompute_tree_invariant_for_addr_expr after successfully folding a TARGET_MEM_REF that occurs inside an ADDR_EXPR. gcc/testsuite/ PR tree-optimization/98268 * gcc.target/aarch64/sve/pr98268-1.c: New test. * gcc.target/aarch64/sve/pr98268-2.c: Likewise. --- gcc/gimple-fold.c| 2 ++ gcc/testsuite/gcc.target/aarch64/sve/pr98268-1.c | 11 +++ gcc/testsuite/gcc.target/aarch64/sve/pr98268-2.c | 10 ++ 3 files changed, 23 insertions(+) create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pr98268-1.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pr98268-2.c diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c index de5a6c22395..9e6683dbac9 100644 --- a/gcc/gimple-fold.c +++ b/gcc/gimple-fold.c @@ -5877,6 +5877,8 @@ maybe_canonicalize_mem_ref_addr (tree *t, bool is_debug = false) if (tem) { *t = tem; + if (TREE_CODE (*orig_t) == ADDR_EXPR) + recompute_tree_invariant_for_addr_expr (*orig_t); res = true; } } diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr98268-1.c b/gcc/testsuite/gcc.target/aarch64/sve/pr98268-1.c new file mode 100644 index 000..fdbe55e0b4e --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pr98268-1.c @@ -0,0 +1,11 @@ +/* { dg-do link } */ +/* { dg-options "-flto -O -ftree-vectorize --param=aarch64-autovec-preference=3" } */ +/* { dg-additional-sources "pr98268-2.c" } */ + +short d, e; +void f(char, long*); +int main() { + long x; + f(-114, ); + return d == e; +} diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr98268-2.c b/gcc/testsuite/gcc.target/aarch64/sve/pr98268-2.c new file mode 100644 index 000..de3b05d5e15 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pr98268-2.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options "-O -ftree-vectorize --param=aarch64-autovec-preference=3" } */ + +extern short d[], e[]; +void f(char a, long *b) { + for (int c = 0; c < a - 12; c++) { +d[c] = b[c]; +e[c] = 0; + } +}
[PATCH] data-ref: Tighten index-based alias checks [PR99726]
create_intersect_range_checks_index tries to create a runtime alias check based on index comparisons. It looks through the access functions for the two DRs to find a SCEV for the loop that is being versioned and converts a DR_STEP-based check into an index-based check. However, there isn't any reliable sign information in the types, so the code expects the value of the IV step (when interpreted as signed) to be negative iff the DR_STEP (when interpreted as signed) is negative. r10-4762 added another assert related to this assumption and the assert fired for the testcase in the PR. The sign of the IV step didn't match the sign of the DR_STEP. I think this is actually showing what was previously a wrong-code bug. The signs didn't match because the DRs contained *two* access function SCEVs for the loop being versioned. It doesn't look like the code is set up to deal with this, since it checks each access function independently and treats it as the sole source of DR_STEP. The patch therefore moves the main condition out of the loop. This also has the advantage of not building a tree for one access function only to throw it away if we find an inner function that makes the comparison invalid. Tested on aarch64-linux-gnu, armeb-eabi and x86_64-linux-gnu. OK to install? Richard gcc/ PR tree-optimization/99726 * tree-data-ref.c (create_intersect_range_checks_index): Bail out if there is more than one access function SCEV for the loop being versioned. gcc/testsuite/ PR tree-optimization/99726 * gcc.target/i386/pr99726.c: New test. --- gcc/testsuite/gcc.target/i386/pr99726.c | 15 ++ gcc/tree-data-ref.c | 245 +--- 2 files changed, 143 insertions(+), 117 deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/pr99726.c [ -b version ]-- diff --git a/gcc/testsuite/gcc.target/i386/pr99726.c b/gcc/testsuite/gcc.target/i386/pr99726.c new file mode 100644 index 000..ff19bcabe4f --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr99726.c @@ -0,0 +1,15 @@ +/* { dg-options "-flive-patching=inline-clone -mavx512f -O2 -floop-nest-optimize -ftree-loop-vectorize -ftrapv -m32" } */ + +extern int a[256][1024]; +int b; +long c, d; +unsigned int e; + +int +main () +{ + for (; e < d; e++) +for (unsigned j = 1; j < c; j++) + a[e][j] = b * a[e - 1][j + 1]; + return 0; +} diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c index 124a7bea6a9..e6dd5f15bed 100644 --- a/gcc/tree-data-ref.c +++ b/gcc/tree-data-ref.c @@ -2147,8 +2147,8 @@ create_intersect_range_checks_index (class loop *loop, tree *cond_expr, bool waw_or_war_p = (alias_pair.flags & ~(DR_ALIAS_WAR | DR_ALIAS_WAW)) == 0; - unsigned int i; - for (i = 0; i < DR_NUM_DIMENSIONS (dr_a.dr); i++) + int found = -1; + for (unsigned int i = 0; i < DR_NUM_DIMENSIONS (dr_a.dr); i++) { tree access1 = DR_ACCESS_FN (dr_a.dr, i); tree access2 = DR_ACCESS_FN (dr_b.dr, i); @@ -2164,7 +2164,19 @@ create_intersect_range_checks_index (class loop *loop, tree *cond_expr, return false; } + if (found >= 0) + return false; + found = i; +} + + /* Ought not to happen in practice, since if all accesses are equal then the + alias should be decidable at compile time. */ + if (found < 0) +return false; + /* The two indices must have the same step. */ + tree access1 = DR_ACCESS_FN (dr_a.dr, found); + tree access2 = DR_ACCESS_FN (dr_b.dr, found); if (!operand_equal_p (CHREC_RIGHT (access1), CHREC_RIGHT (access2), 0)) return false; @@ -2221,7 +2233,7 @@ create_intersect_range_checks_index (class loop *loop, tree *cond_expr, When checking for general overlap, we need to test whether the range: - [min1 + low_offset1, min2 + high_offset1 + idx_access1 - 1] + [min1 + low_offset1, min1 + high_offset1 + idx_access1 - 1] overlaps: @@ -2312,7 +2324,6 @@ create_intersect_range_checks_index (class loop *loop, tree *cond_expr, *cond_expr, part_cond_expr); else *cond_expr = part_cond_expr; -} if (dump_enabled_p ()) { if (waw_or_war_p) [ Real patch ]-- diff --git a/gcc/testsuite/gcc.target/i386/pr99726.c b/gcc/testsuite/gcc.target/i386/pr99726.c new file mode 100644 index 000..ff19bcabe4f --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr99726.c @@ -0,0 +1,15 @@ +/* { dg-options "-flive-patching=inline-clone -mavx512f -O2 -floop-nest-optimize -ftree-loop-vectorize -ftrapv -m32" } */ + +extern int a[256][1024]; +int b; +long c, d; +unsigned int e; + +int +main () +{ + for (; e < d; e++) +for (unsigned j = 1; j < c; j++) + a[e][j] = b * a[e - 1][j + 1]; + return 0; +} diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c index 124a7bea6a9..e6dd5f15bed 100644
Re: [PATCH] slp tree vectorizer: Re-calculate vectorization factor in the case of invalid choices [PR96974]
On Wed, 31 Mar 2021, Stam Markianos-Wright wrote: > On 29/03/2021 10:20, Richard Biener wrote: > > On Fri, 26 Mar 2021, Richard Sandiford wrote: > > > >> Richard Biener writes: > >>> On Wed, 24 Mar 2021, Stam Markianos-Wright wrote: > >>> > Hi all, > > This patch resolves bug: > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96974 > > This is achieved by forcing a re-calculation of *stmt_vectype_out if an > incompatible combination of TYPE_VECTOR_SUBPARTS is detected, but with an > extra introduced max_nunits ceiling. > > I am not 100% sure if this is the best way to go about fixing this, > because > this is my first look at the vectorizer and I lack knowledge of the wider > context, so do let me know if you see a better way to do this! > > I have added the previously ICE-ing reproducer as a new test. > > This is compiled as "g++ -Ofast -march=armv8.2-a+sve -fdisable-tree-fre4" > for > GCC11 and "g++ -Ofast -march=armv8.2-a+sve" for GCC10. > > (the non-fdisable-tree-fre4 version has gone latent on GCC11) > > Bootstrapped and reg-tested on aarch64-linux-gnu. > Also reg-tested on aarch64-none-elf. > >>> > >>> I don't think this is going to work well given uses will expect > >>> a vector type that's consistent here. > >>> > >>> I think giving up is for the moment the best choice, thus replacing > >>> the assert with vectorization failure. > >>> > >>> In the end we shouldn't require those nunits vectypes to be > >>> separately computed - we compute the vector type of the defs > >>> anyway and in case they're invariant the vectorizable_* function > >>> either can deal with the type mix or not anyway. > >> > >> I agree this area needs simplification, but I think the direction of > >> travel should be to make the assert valid. I agree this is probably > >> the pragmatic fix for GCC 11 and earlier though. > > > > The issue is that we compute a vector type for a use that may differ > > from what we'd compute for it in the context of its definition (or > > in the context of another use). Any such "local" decision is likely > > flawed and I'd rather simplify further doing the only decision on > > the definition side - if there's a disconnect between the number > > of lanes (and thus altering the VF won't help) then we have to give > > up anyway. > > > > Richard. > > > > Thank you both for the further info! Would it be fair to close the initial PR > regarding the ICE (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96974) and > then open a second one at a lower priority level to address these further > improvements? I have closed the original report. If there's a testcase where vectorization is possible and useful but is now rejected because of the fix then yes, please open a new missed-optimization bugreport. Richard.
Re: [PATCH] testsuite/aarch64: Skip SLP diagnostic under ILP32 (PR target/96974)
On Mon, Mar 29, 2021 at 1:40 PM Christophe Lyon via Gcc-patches wrote: > > The vectorizer has a very different effect with -mabi=ilp32, and > doesn't emit the expecte diagnostic, so this patch expects it only > under lp64. OK (please also backport as necessary) > 2021-03-29 Christophe Lyon > > gcc/testsuite/ > PR target/96974 > * g++.target/aarch64/sve/pr96974.C: Expect SLP diagnostic only > under lp64. > --- > gcc/testsuite/g++.target/aarch64/sve/pr96974.C | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/gcc/testsuite/g++.target/aarch64/sve/pr96974.C > b/gcc/testsuite/g++.target/aarch64/sve/pr96974.C > index 363241d..54000f5 100644 > --- a/gcc/testsuite/g++.target/aarch64/sve/pr96974.C > +++ b/gcc/testsuite/g++.target/aarch64/sve/pr96974.C > @@ -15,4 +15,4 @@ struct c { > int coeffs[10]; > } f; > > -/* { dg-final { scan-tree-dump "Not vectorized: Incompatible number of > vector subparts between" "slp1" } } */ > +/* { dg-final { scan-tree-dump "Not vectorized: Incompatible number of > vector subparts between" "slp1" { target lp64 } } } */ > -- > 2.7.4 >
[PATCH] slp tree vectorizer: Re-calculate vectorization factor in the case of invalid choices [PR96974]
On 29/03/2021 10:20, Richard Biener wrote: On Fri, 26 Mar 2021, Richard Sandiford wrote: Richard Biener writes: On Wed, 24 Mar 2021, Stam Markianos-Wright wrote: Hi all, This patch resolves bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96974 This is achieved by forcing a re-calculation of *stmt_vectype_out if an incompatible combination of TYPE_VECTOR_SUBPARTS is detected, but with an extra introduced max_nunits ceiling. I am not 100% sure if this is the best way to go about fixing this, because this is my first look at the vectorizer and I lack knowledge of the wider context, so do let me know if you see a better way to do this! I have added the previously ICE-ing reproducer as a new test. This is compiled as "g++ -Ofast -march=armv8.2-a+sve -fdisable-tree-fre4" for GCC11 and "g++ -Ofast -march=armv8.2-a+sve" for GCC10. (the non-fdisable-tree-fre4 version has gone latent on GCC11) Bootstrapped and reg-tested on aarch64-linux-gnu. Also reg-tested on aarch64-none-elf. I don't think this is going to work well given uses will expect a vector type that's consistent here. I think giving up is for the moment the best choice, thus replacing the assert with vectorization failure. In the end we shouldn't require those nunits vectypes to be separately computed - we compute the vector type of the defs anyway and in case they're invariant the vectorizable_* function either can deal with the type mix or not anyway. I agree this area needs simplification, but I think the direction of travel should be to make the assert valid. I agree this is probably the pragmatic fix for GCC 11 and earlier though. The issue is that we compute a vector type for a use that may differ from what we'd compute for it in the context of its definition (or in the context of another use). Any such "local" decision is likely flawed and I'd rather simplify further doing the only decision on the definition side - if there's a disconnect between the number of lanes (and thus altering the VF won't help) then we have to give up anyway. Richard. Thank you both for the further info! Would it be fair to close the initial PR regarding the ICE (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96974) and then open a second one at a lower priority level to address these further improvements? Also Christophe has kindly found out that the test FAILs in ILP32, so it would be great to get that one in asap, too! https://gcc.gnu.org/pipermail/gcc-patches/2021-March/567431.html Cheers, Stam
Small refactoring of cgraph_node::release_body
Hi, in the dicussion on PR 99447 there was some confusion about release_body being used in context where call edges/references survive. This is not a valid use because it would leave stale pointers to ggc_freed memory location. By auditing code I did not find any however this patch moves the callees/references removal into the function itself that makes it bit more robust. Some code paths calling release_body already free these earlier, but checking poitners for being NULL is not that expensive. Bootstrapped/regtested x86_64-linux, comitted. PR lto/99447 * cgraph.c (cgraph_node::release_body): Remove all callers and references. * cgraphclones.c (cgraph_node::materialize_clone): Do not do it here. * cgraphunit.c (cgraph_node::expand): And here. diff --git a/gcc/cgraph.c b/gcc/cgraph.c index 80140757d16..b77c676a58a 100644 --- a/gcc/cgraph.c +++ b/gcc/cgraph.c @@ -1860,6 +1860,9 @@ cgraph_node::release_body (bool keep_arguments) lto_free_function_in_decl_state_for_node (this); lto_file_data = NULL; } + gcc_assert (!clones); + remove_callees (); + remove_all_references (); } /* Remove function from symbol table. */ diff --git a/gcc/cgraphclones.c b/gcc/cgraphclones.c index 95103a423f7..9f86463b42d 100644 --- a/gcc/cgraphclones.c +++ b/gcc/cgraphclones.c @@ -1143,11 +1143,7 @@ cgraph_node::materialize_clone () /* Function is no longer clone. */ remove_from_clone_tree (); if (!this_clone_of->analyzed && !this_clone_of->clones) -{ - this_clone_of->release_body (); - this_clone_of->remove_callees (); - this_clone_of->remove_all_references (); -} +this_clone_of->release_body (); } #include "gt-cgraphclones.h" diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c index 1c74cee69ac..0b70e4d4fde 100644 --- a/gcc/cgraphunit.c +++ b/gcc/cgraphunit.c @@ -1892,10 +1892,6 @@ cgraph_node::expand (void) comdat groups. */ assemble_thunks_and_aliases (); release_body (); - /* Eliminate all call edges. This is important so the GIMPLE_CALL no longer - points to the dead function body. */ - remove_callees (); - remove_all_references (); } /* Node comparator that is responsible for the order that corresponds
[PATCH][pushed] Fix coding style in IPA modref.
Approved by Honza and pushed to master. Martin gcc/ChangeLog: * ipa-modref.c (analyze_ssa_name_flags): Fix coding style and one negated condition. --- gcc/ipa-modref.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c index 5f33bb5b410..ef5e62beb0e 100644 --- a/gcc/ipa-modref.c +++ b/gcc/ipa-modref.c @@ -1673,7 +1673,7 @@ analyze_ssa_name_flags (tree name, vec , int depth, if (!record_ipa) lattice[index].merge (call_flags); - if (record_ipa) + else lattice[index].add_escape_point (call, i, call_flags, true); } @@ -1691,8 +1691,8 @@ analyze_ssa_name_flags (tree name, vec , int depth, int call_flags = deref_flags (gimple_call_arg_flags (call, i), ignore_stores); if (!record_ipa) - lattice[index].merge (call_flags); - if (record_ipa) + lattice[index].merge (call_flags); + else lattice[index].add_escape_point (call, i, call_flags, false); } -- 2.30.2
Re: [PATCH] aarch64: Fix up *add3_poly_1 [PR99813]
Jakub Jelinek writes: > Hi! > > As mentioned in the PR, Uai constraint stands for > aarch64_sve_scalar_inc_dec_immediate > while Uav for > aarch64_sve_addvl_addpl_immediate. > Both *add3_aarch64 and *add3_poly_1 patterns use > * return aarch64_output_sve_scalar_inc_dec (operands[2]); > * return aarch64_output_sve_addvl_addpl (operands[2]); > in that order, but the former with Uai,Uav order, while the > latter with Uav,Uai instead. This patch swaps the constraints > so that they match the output. > > Bootstrapped/regtested on aarch64-linux, ok for trunk? OK, thanks. Richard > 2021-03-30 Jakub Jelinek > Richard Sandiford > > PR target/99813 > * config/aarch64/aarch64.md (*add3_poly_1): Swap Uai and Uav > constraints on operands[2] and similarly 0 and rk constraints > on operands[1] corresponding to that. > > * g++.target/aarch64/sve/pr99813.C: New test. > > --- gcc/config/aarch64/aarch64.md.jj 2021-02-25 23:07:07.851319165 +0100 > +++ gcc/config/aarch64/aarch64.md 2021-03-30 11:13:35.994077470 +0200 > @@ -2050,8 +2050,8 @@ >[(set > (match_operand:GPI 0 "register_operand" "=r,r,r,r,r,r,") > (plus:GPI > - (match_operand:GPI 1 "register_operand" "%rk,rk,rk,rk,rk,0,rk") > - (match_operand:GPI 2 "aarch64_pluslong_or_poly_operand" > "I,r,J,Uaa,Uav,Uai,Uat")))] > + (match_operand:GPI 1 "register_operand" "%rk,rk,rk,rk,0,rk,rk") > + (match_operand:GPI 2 "aarch64_pluslong_or_poly_operand" > "I,r,J,Uaa,Uai,Uav,Uat")))] >"TARGET_SVE && operands[0] != stack_pointer_rtx" >"@ >add\\t%0, %1, %2 > --- gcc/testsuite/g++.target/aarch64/sve/pr99813.C.jj 2021-03-30 > 11:22:13.430290522 +0200 > +++ gcc/testsuite/g++.target/aarch64/sve/pr99813.C2021-03-30 > 11:22:55.526819721 +0200 > @@ -0,0 +1,27 @@ > +// PR target/99813 > +/* { dg-do assemble { target aarch64_asm_sve_ok } } */ > +/* { dg-options "-O3 -march=armv8.2-a+sve -fvect-cost-model=unlimited > -fno-tree-dominator-opts -mtune=cortex-a72" } */ > + > +long a, b; > +bool c[2][14][2][16], f[2][14][2][16]; > +bool d; > +char e[2][4][2][6]; > +void g() { > + a = 0; > + for (int h = 0; h < 2; ++h) > +for (int i = 0; i < 14; ++i) > + for (int j = 0; j < 2; ++j) > +for (int k = 0; k < 16; ++k) > + c[h][i][j][k] = 0; > + d = 0; > + for (int h; h < 2; ++h) > +for (int i = 0; i < 4; ++i) > + for (int j = 0; j < 2; ++j) > +for (int k = 0; k < 6; ++k) > + e[h][i][j][k] = 6; > + for (int h = 0; h < 2; ++h) > +for (int i = 0; i < 14; ++i) > + for (int j = 0; j < 2; ++j) > +for (int k = 0; k < 16; ++k) > + f[h][i][j][k] = b = 9; > +} > > Jakub
RE: [PATCH] [X86_64]: Enable support for next generation AMD Zen3 CPU
[AMD Public Use] Hi Honza, > -Original Message- > From: Jan Hubicka > Sent: Wednesday, March 31, 2021 1:27 PM > To: Kumar, Venkataramanan > Cc: Uros Bizjak ; gcc-patches@gcc.gnu.org > Subject: Re: [PATCH] [X86_64]: Enable support for next generation AMD Zen3 > CPU > > [CAUTION: External Email] > > > [AMD Public Use] > > > > Hi Honza, > > > > > -Original Message- > > > From: Jan Hubicka > > > Sent: Wednesday, March 31, 2021 1:15 AM > > > To: Kumar, Venkataramanan > > > Cc: Uros Bizjak ; gcc-patches@gcc.gnu.org > > > Subject: Re: [PATCH] [X86_64]: Enable support for next generation > > > AMD Zen3 CPU > > > > > > [CAUTION: External Email] > > > > > > Hi, > > > this patch backports the initial support to gcc10 branch. Since the > > > trunk and branch diverged there is non-trivial change to cpuinfo > > > discovery. I do; > > > > > > --- a/libgcc/config/i386/cpuinfo.c > > > +++ b/libgcc/config/i386/cpuinfo.c > > > @@ -111,6 +111,12 @@ get_amd_cpu (unsigned int family, unsigned int > model) > > >if (model >= 0x30) > > > __cpu_model.__cpu_subtype = AMDFAM17H_ZNVER2; > > >break; > > > +case 0x19: > > > + __cpu_model.__cpu_type = AMDFAM19H; > > > + /* AMD family 19h version 1. */ > > > + if (model <= 0x0f) > > > + __cpu_model.__cpu_subtype = AMDFAM19H_ZNVER3; > > > + break; > > > default: > > >break; > > > } > > > > > > While your patch also sets ZNVER3 for case where VAES is supporte > > > that would require backporting more of logic detecting VAES. Is > > > that necessary? > > > > I think you are referring to the below change. > > > > diff --git a/gcc/config/i386/driver-i386.c > > b/gcc/config/i386/driver-i386.c index ecdad5765d5..2bfa037dd8b 100644 > > --- a/gcc/config/i386/driver-i386.c > > +++ b/gcc/config/i386/driver-i386.c > > @@ -455,6 +455,8 @@ const char *host_detect_local_cpu (int argc, const > char **argv) > > processor = PROCESSOR_GEODE; > >else if (has_feature (FEATURE_MOVBE) && family == 22) > > processor = PROCESSOR_BTVER2; > > + else if (has_feature (FEATURE_VAES)) > > + processor = PROCESSOR_ZNVER3; > >else if (has_feature (FEATURE_CLWB)) > > processor = PROCESSOR_ZNVER2; > > > > My understanding is that when we use -march=native on znver3 machine it > would check for "vaes" to detect the machine. > > Otherwise it would detect it as znver2 machine. So we need that detection > logic. > > I was wondering about > > +case 0x19: > + cpu_model->__cpu_type = AMDFAM19H; > + /* AMD family 19h version 1. */ > + if (model <= 0x0f) > + { > + cpu = "znver3"; > + CHECK___builtin_cpu_is ("znver3"); > + cpu_model->__cpu_subtype = AMDFAM19H_ZNVER3; > + } > + else if (has_cpu_feature (cpu_model, cpu_features2, > + FEATURE_VAES)) > + { > + cpu = "znver3"; > + CHECK___builtin_cpu_is ("znver3"); > + cpu_model->__cpu_subtype = AMDFAM19H_ZNVER3; > + } > + break; > > For znver3 we detect the model number and I wonder why we also test the VAES > feature when we don't do that for other families. Ah I see, I will double check on the model numbers again. Yes we can remove the code which checks for VAES here. > > Honza Regards, Venkat.
Re: [PATCH v2 1/3] x86: Update memcpy/memset inline strategies for Ice Lake
> > It looks like X86_TUNE_PREFER_KNOWN_REP_MOVSB_STOSB is quite obviously > > benefical and independent of the rest of changes. I think we will need > > to discuss bit more the move ratio and the code size/uop cache polution > > issues - one option would be to use increased limits for -O3 only. > > My change only increases CLEAR_RATIO, not MOVE_RATIO. We are > checking code size impacts on SPEC CPU 2017 and eembc. > > > Can you break this out to independent patch? I also wonder if it owuld > > X86_TUNE_PREFER_KNOWN_REP_MOVSB_STOSB improves performance > only when memcpy/memset costs and MOVE_RATIO are updated the same time, > like: > > https://gcc.gnu.org/pipermail/gcc-patches/2021-March/567096.html > > Make it a standalone means moving from Ice Lake patch to Skylake patch. > > > not be more readable to special case this just on the beggining of > > decide_alg. > > > @@ -6890,6 +6891,7 @@ decide_alg (HOST_WIDE_INT count, HOST_WIDE_INT > > > expected_size, > > >const struct processor_costs *cost; > > >int i; > > >bool any_alg_usable_p = false; > > > + bool known_size_p = expected_size != -1; > > > > expected_size is not -1 if we have profile feedback and we detected from > > histogram average size of a block. It seems to me that from description > > that you want the const to be actual compile time constant that would be > > min_size == max_size I guess. > > > > You are right. Here is the v2 patch with min_size != max_size check for > unknown size. Patch is OK now. I was wondering about using avx256 for moves of known size (per comment on MOVE_MAX_PIECES there is issue with MAX_FIXED_MODE_SIZE, but that seems not hard to fix). Did you look into it? Honza > > Thanks. > > -- > H.J.
Re: [PATCH] [X86_64]: Enable support for next generation AMD Zen3 CPU
> [AMD Public Use] > > Hi Honza, > > > -Original Message- > > From: Jan Hubicka > > Sent: Wednesday, March 31, 2021 1:15 AM > > To: Kumar, Venkataramanan > > Cc: Uros Bizjak ; gcc-patches@gcc.gnu.org > > Subject: Re: [PATCH] [X86_64]: Enable support for next generation AMD Zen3 > > CPU > > > > [CAUTION: External Email] > > > > Hi, > > this patch backports the initial support to gcc10 branch. Since the > > trunk and branch diverged there is non-trivial change to cpuinfo > > discovery. I do; > > > > --- a/libgcc/config/i386/cpuinfo.c > > +++ b/libgcc/config/i386/cpuinfo.c > > @@ -111,6 +111,12 @@ get_amd_cpu (unsigned int family, unsigned int model) > >if (model >= 0x30) > > __cpu_model.__cpu_subtype = AMDFAM17H_ZNVER2; > >break; > > +case 0x19: > > + __cpu_model.__cpu_type = AMDFAM19H; > > + /* AMD family 19h version 1. */ > > + if (model <= 0x0f) > > + __cpu_model.__cpu_subtype = AMDFAM19H_ZNVER3; > > + break; > > default: > >break; > > } > > > > While your patch also sets ZNVER3 for case where VAES is supporte that > > would require backporting more of logic detecting VAES. Is that > > necessary? > > I think you are referring to the below change. > > diff --git a/gcc/config/i386/driver-i386.c b/gcc/config/i386/driver-i386.c > index ecdad5765d5..2bfa037dd8b 100644 > --- a/gcc/config/i386/driver-i386.c > +++ b/gcc/config/i386/driver-i386.c > @@ -455,6 +455,8 @@ const char *host_detect_local_cpu (int argc, const char > **argv) > processor = PROCESSOR_GEODE; >else if (has_feature (FEATURE_MOVBE) && family == 22) > processor = PROCESSOR_BTVER2; > + else if (has_feature (FEATURE_VAES)) > + processor = PROCESSOR_ZNVER3; >else if (has_feature (FEATURE_CLWB)) > processor = PROCESSOR_ZNVER2; > > My understanding is that when we use -march=native on znver3 machine it would > check for "vaes" to detect the machine. > Otherwise it would detect it as znver2 machine. So we need that detection > logic. I was wondering about +case 0x19: + cpu_model->__cpu_type = AMDFAM19H; + /* AMD family 19h version 1. */ + if (model <= 0x0f) + { + cpu = "znver3"; + CHECK___builtin_cpu_is ("znver3"); + cpu_model->__cpu_subtype = AMDFAM19H_ZNVER3; + } + else if (has_cpu_feature (cpu_model, cpu_features2, + FEATURE_VAES)) + { + cpu = "znver3"; + CHECK___builtin_cpu_is ("znver3"); + cpu_model->__cpu_subtype = AMDFAM19H_ZNVER3; + } + break; For znver3 we detect the model number and I wonder why we also test the VAES feature when we don't do that for other families. Honza
RE: [PATCH] arm: Fix mult autovectorization patterm for iwmmxt (PR target/99786)
> -Original Message- > From: Gcc-patches On Behalf Of > Christophe Lyon via Gcc-patches > Sent: 29 March 2021 14:16 > To: gcc-patches@gcc.gnu.org > Subject: [PATCH] arm: Fix mult autovectorization patterm for iwmmxt (PR > target/99786) > > Similarly to other recently-added autovectorization patterns, mult has > been erroneously enabled for iwmmxt. However, V4HI and V2SI modes are > supported, so we make an exception for them. Ok. Thanks, Kyrill > > The new testcase is derived from gcc.dg/ubsan/pr79904.c, with > additional modes added. > > I kept dg-do compile because 'assemble' results in error messages from > the assembler, which are not related to this PR: > > Error: selected processor does not support `tmcrr wr0,r4,r5' in ARM mode > Error: selected processor does not support `wstrd wr0,[r0]' in ARM mode > Error: selected processor does not support `wldrd wr0,[r0]' in ARM mode > Error: selected processor does not support `wldrd wr2,.L5' in ARM mode > Error: selected processor does not support `wmulul wr0,wr0,wr2' in ARM > mode > Error: selected processor does not support `wstrd wr0,[r0]' in ARM mode > Error: selected processor does not support `wldrd wr0,[r0]' in ARM mode > Error: selected processor does not support `wldrd wr2,.L8' in ARM mode > Error: selected processor does not support `wmulwl wr0,wr0,wr2' in ARM > mode > Error: selected processor does not support `wstrd wr0,[r0]' in ARM mode > > 2021-03-29 Christophe Lyon > > PR target/99786 > > gcc/ > * config/arm/vec-common.md (mul3): Disable on iwMMXT, > expect > for V4HI and V2SI. > > gcc/testsuite/ > * gcc.target/arm/pr99786.c: New test. > --- > gcc/config/arm/vec-common.md | 5 - > gcc/testsuite/gcc.target/arm/pr99786.c | 30 > ++ > 2 files changed, 34 insertions(+), 1 deletion(-) > create mode 100644 gcc/testsuite/gcc.target/arm/pr99786.c > > diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec- > common.md > index 48ee659..0b2b3b1 100644 > --- a/gcc/config/arm/vec-common.md > +++ b/gcc/config/arm/vec-common.md > @@ -103,7 +103,10 @@ (define_expand "mul3" >[(set (match_operand:VDQWH 0 "s_register_operand") > (mult:VDQWH (match_operand:VDQWH 1 "s_register_operand") > (match_operand:VDQWH 2 "s_register_operand")))] > - "ARM_HAVE__ARITH" > + "ARM_HAVE__ARITH > + && (!TARGET_REALLY_IWMMXT > + || mode == V4HImode > + || mode == V2SImode)" > ) > > (define_expand "smin3" > diff --git a/gcc/testsuite/gcc.target/arm/pr99786.c > b/gcc/testsuite/gcc.target/arm/pr99786.c > new file mode 100644 > index 000..11d86f0 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/arm/pr99786.c > @@ -0,0 +1,30 @@ > +/* { dg-do compile } */ > +/* { dg-skip-if "Test is specific to the iWMMXt" { arm*-*-* } { "-mcpu=*" } > { "-mcpu=iwmmxt" } } */ > +/* { dg-skip-if "Test is specific to the iWMMXt" { arm*-*-* } { "-mabi=*" } > { "-mabi=iwmmxt" } } */ > +/* { dg-skip-if "Test is specific to the iWMMXt" { arm*-*-* } { "-march=*" } > { "-march=iwmmxt" } } */ > +/* { dg-skip-if "Test is specific to ARM mode" { arm*-*-* } { "-mthumb" } > { "" } } */ > +/* { dg-require-effective-target arm32 } */ > +/* { dg-require-effective-target arm_iwmmxt_ok } */ > +/* { dg-options "-O3 -mcpu=iwmmxt" } */ > + > +typedef signed char V __attribute__((vector_size (8))); > + > +void > +foo (V *a) > +{ > + *a = *a * 3; > +} > + > +typedef signed short Vshort __attribute__((vector_size (8))); > +void > +foo_short (Vshort *a) > +{ > + *a = *a * 3; > +} > + > +typedef signed int Vint __attribute__((vector_size (8))); > +void > +foo_int (Vint *a) > +{ > + *a = *a * 3; > +} > -- > 2.7.4
Re: [PATCH] testsuite: Disable zero-scratch-regs-{8, 9, 10, 11}.c on s390* [PR97680]
> That is true, but nothing really happened during the 5 months that the tests > have been failing on many other architectures (except that powerpc and arm > had skipped those tests). There has been a PR open for all those 5 months. So what? This is not the first example and I don't see anything special with it. You or maintainers can decide to XFAIL particular architectures at will, but hiding the failures by default is IMO not appropriate. > We can perhaps revert the skips after branching GCC 11 off, but I have > little hope other target maintainers will do what you did, so unsure if it > would help. And the changes need people familiar with each of the backends > to decide what needs to be done and what is doable. That's exactly the same situation as for -fstack-usage/-Wstack-usage, where I intentionally made gcc.dg/stack-usage-1.c fail by default so that maintainers could add the missing bits; this worked relatively well. -- Eric Botcazou
RE: [PATCH] [X86_64]: Enable support for next generation AMD Zen3 CPU
[AMD Public Use] Hi Honza, > -Original Message- > From: Jan Hubicka > Sent: Wednesday, March 31, 2021 1:15 AM > To: Kumar, Venkataramanan > Cc: Uros Bizjak ; gcc-patches@gcc.gnu.org > Subject: Re: [PATCH] [X86_64]: Enable support for next generation AMD Zen3 > CPU > > [CAUTION: External Email] > > Hi, > this patch backports the initial support to gcc10 branch. Since the > trunk and branch diverged there is non-trivial change to cpuinfo > discovery. I do; > > --- a/libgcc/config/i386/cpuinfo.c > +++ b/libgcc/config/i386/cpuinfo.c > @@ -111,6 +111,12 @@ get_amd_cpu (unsigned int family, unsigned int model) >if (model >= 0x30) > __cpu_model.__cpu_subtype = AMDFAM17H_ZNVER2; >break; > +case 0x19: > + __cpu_model.__cpu_type = AMDFAM19H; > + /* AMD family 19h version 1. */ > + if (model <= 0x0f) > + __cpu_model.__cpu_subtype = AMDFAM19H_ZNVER3; > + break; > default: >break; > } > > While your patch also sets ZNVER3 for case where VAES is supporte that > would require backporting more of logic detecting VAES. Is that > necessary? I think you are referring to the below change. diff --git a/gcc/config/i386/driver-i386.c b/gcc/config/i386/driver-i386.c index ecdad5765d5..2bfa037dd8b 100644 --- a/gcc/config/i386/driver-i386.c +++ b/gcc/config/i386/driver-i386.c @@ -455,6 +455,8 @@ const char *host_detect_local_cpu (int argc, const char **argv) processor = PROCESSOR_GEODE; else if (has_feature (FEATURE_MOVBE) && family == 22) processor = PROCESSOR_BTVER2; + else if (has_feature (FEATURE_VAES)) + processor = PROCESSOR_ZNVER3; else if (has_feature (FEATURE_CLWB)) processor = PROCESSOR_ZNVER2; My understanding is that when we use -march=native on znver3 machine it would check for "vaes" to detect the machine. Otherwise it would detect it as znver2 machine. So we need that detection logic. > I see it may make znver3 to be defaulted on future znver4 if > it stays with amdfam19, but we did not do this before. Usually we have at least one differentiating ISA or change in family name or model name going from one processor to another. > > Bootstrapped/regtested x86_64-linux. With -march=native on znver3 > machine we get right flags, but trunk in addition passes: > > -mno-amx-bf16 > -mno-amx-int8 > -mno-amx-tile > -mno-avxvnni > -mno-hreset > -mno-kl > -mno-serialize > -mno-tsxldtrk > -mno-uintr > -mno-widekl > > Which are options we did not backported. > Atop of that I plan to backport the tuning patches with exception of > gather which seems bit controversal and can wait for gcc11. Ok that seems fine. Regards, Venkat. > > Honza > > 2021-03-30 Jan Hubicka > > Backport > > Venkataramanan Kumar > Sharavan Kumar > * common/config/i386/cpuinfo.h (get_amd_cpu) recognize znver3. > * common/config/i386/i386-common.c (processor_names): Add > znver3. > (processor_alias_table): Add znver3 and AMDFAM19H entry. > * common/config/i386/i386-cpuinfo.h (processor_types): Add > AMDFAM19H. > (processor_subtypes): AMDFAM19H_ZNVER3. > * config.gcc (i[34567]86-*-linux* | ...): Likewise. > * config/i386/driver-i386.c: (host_detect_local_cpu): Let > -march=native recognize znver3 processors. > * config/i386/i386-c.c (ix86_target_macros_internal): Add > znver3. > * config/i386/i386-options.c (m_znver3): New definition. > (m_ZNVER): Include m_znver3. > (processor_cost_table): Add znver3. > * config/i386/i386.c (ix86_reassociation_width): Likewise. > * config/i386/i386.h (TARGET_znver3): New definition. > (enum processor_type): Add PROCESSOR_ZNVER3. > * config/i386/i386.md (define_attr "cpu"): Add znver3. > * config/i386/x86-tune-sched.c: (ix86_issue_rate): Likewise. > (ix86_adjust_cost): Likewise. > * config/i386/x86-tune.def (X86_TUNE_AVOID_256FMA_CHAINS: > Likewise. > * config/i386/znver1.md: Add new reservations for znver3. > * doc/extend.texi: Add details about znver3. > * doc/invoke.texi: Likewise. > > gcc/testsuite/ChangeLog: > > 2021-03-30 Jan Hubicka > > * gcc.target/i386/funcspec-56.inc: Handle new march. > > libgcc/ChangeLog: > > 2021-03-30 Jan Hubicka > > * config/i386/cpuinfo.c (get_amd_cpu): Support amdfam19. > * config/i386/cpuinfo.h (enum processor_types): Add AMDFAM19H. > (enum processor_subtypes): Add AMDFAM19H_ZNVER3. > > diff --git a/gcc/common/config/i386/i386-common.c > b/gcc/common/config/i386/i386-common.c > index 1e4d25f052a..97335d42af1 100644 > --- a/gcc/common/config/i386/i386-common.c > +++ b/gcc/common/config/i386/i386-common.c > @@ -1582,7 +1582,8 @@ const char *const processor_names[] = >"btver1", >"btver2", >"znver1", > - "znver2" > + "znver2", > + "znver3" >
Re: Patch ping
On Wed, 31 Mar 2021, Jakub Jelinek wrote: > Hi! > > I'd like to ping the PR98860 P1 fix - workaround for PECOFF linkers without > DWARF5 support - to make -gdwarf-4 the default in such configurations. > > https://gcc.gnu.org/pipermail/gcc-patches/2021-March/567245.html OK. Richard. > Thanks > > Jakub > > -- Richard Biener SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
Patch ping
Hi! I'd like to ping the PR98860 P1 fix - workaround for PECOFF linkers without DWARF5 support - to make -gdwarf-4 the default in such configurations. https://gcc.gnu.org/pipermail/gcc-patches/2021-March/567245.html Thanks Jakub
Re: [PATCH] testsuite: Disable zero-scratch-regs-{8,9,10,11}.c on s390* [PR97680]
On Wed, Mar 31, 2021 at 08:01:29AM +0200, Eric Botcazou wrote: > > It looks like the latter - I've seen no attempt by the original authors to > > make the feature work on more targets than they cared for. > > On the other hand, if you hide the failures, there is essentially zero chance > that architecture maintainers pick up the pieces (I personally implemented > the > SPARC support only because I had ran into the failures in the testsuite). So > doing the inverse filtering sounds quite counterproductive to me and IMO it's > up to the architecture maintainers to decide on a case-by-case basis. That is true, but nothing really happened during the 5 months that the tests have been failing on many other architectures (except that powerpc and arm had skipped those tests). There has been a PR open for all those 5 months. We can perhaps revert the skips after branching GCC 11 off, but I have little hope other target maintainers will do what you did, so unsure if it would help. And the changes need people familiar with each of the backends to decide what needs to be done and what is doable. Jakub
Re: [PATCH v2 1/3] x86: Update memcpy/memset inline strategies for Ice Lake
On Tue, Mar 23, 2021 at 12:59 AM H.J. Lu via Gcc-patches wrote: > > On Mon, Mar 22, 2021 at 7:10 AM Jan Hubicka wrote: > > > > > > > > gcc/ > > > > > > * config/i386/i386-expand.c (expand_set_or_cpymem_via_rep): > > > For TARGET_PREFER_KNOWN_REP_MOVSB_STOSB, don't convert QImode > > > to SImode. > > > (decide_alg): For TARGET_PREFER_KNOWN_REP_MOVSB_STOSB, use > > > "rep movsb/stosb" only for known sizes. > > > * config/i386/i386-options.c (processor_cost_table): Use Ice > > > Lake cost for Cannon Lake, Ice Lake, Tiger Lake, Sapphire > > > Rapids and Alder Lake. > > > * config/i386/i386.h (TARGET_PREFER_KNOWN_REP_MOVSB_STOSB): New. > > > * config/i386/x86-tune-costs.h (icelake_memcpy): New. > > > (icelake_memset): Likewise. > > > (icelake_cost): Likewise. > > > * config/i386/x86-tune.def (X86_TUNE_PREFER_KNOWN_REP_MOVSB_STOSB): > > > New. > > > > It looks like X86_TUNE_PREFER_KNOWN_REP_MOVSB_STOSB is quite obviously > > benefical and independent of the rest of changes. I think we will need > > to discuss bit more the move ratio and the code size/uop cache polution > > issues - one option would be to use increased limits for -O3 only. > > My change only increases CLEAR_RATIO, not MOVE_RATIO. We are > checking code size impacts on SPEC CPU 2017 and eembc. > > > Can you break this out to independent patch? I also wonder if it owuld > > X86_TUNE_PREFER_KNOWN_REP_MOVSB_STOSB improves performance > only when memcpy/memset costs and MOVE_RATIO are updated the same time, > like: > > https://gcc.gnu.org/pipermail/gcc-patches/2021-March/567096.html > > Make it a standalone means moving from Ice Lake patch to Skylake patch. > > > not be more readable to special case this just on the beggining of > > decide_alg. > > > @@ -6890,6 +6891,7 @@ decide_alg (HOST_WIDE_INT count, HOST_WIDE_INT > > > expected_size, > > >const struct processor_costs *cost; > > >int i; > > >bool any_alg_usable_p = false; > > > + bool known_size_p = expected_size != -1; > > > > expected_size is not -1 if we have profile feedback and we detected from > > histogram average size of a block. It seems to me that from description > > that you want the const to be actual compile time constant that would be > > min_size == max_size I guess. > > > > You are right. Here is the v2 patch with min_size != max_size check for > unknown size. OK. Thanks, Richard. > Thanks. > > -- > H.J.
Re: [PATCH] testsuite, v2: Disable zero-scratch-regs-{8, 9, 10, 11}.c on all but ... [PR97680]
On Tue, Mar 30, 2021 at 5:01 PM Jakub Jelinek wrote: > > On Tue, Mar 30, 2021 at 02:33:17PM +0200, Richard Biener via Gcc-patches > wrote: > > > I don't know, perhaps. > > > Seems the target hook is only defined on > > > config/i386/i386.c:#undef TARGET_ZERO_CALL_USED_REGS > > > config/i386/i386.c:#define TARGET_ZERO_CALL_USED_REGS > > > ix86_zero_call_used_regs > > > config/sparc/sparc.c:#undef TARGET_ZERO_CALL_USED_REGS > > > config/sparc/sparc.c:#define TARGET_ZERO_CALL_USED_REGS > > > sparc_zero_call_used_regs > > > but apparently many of the tests actually succeed on various targets that > > > don't define those hooks. E.g. I haven't seen them to fail on aarch64, > > > on arm only the -10.c fails, on powerpc*/s390* all {8,9,10,11} fail (plus > > > 5 is skipped on power*-aix*). > > > On ia64 according to testresults {6,7,8,9,10,11} fail, some with ICEs. > > > On mipsel according to testresults {9,10,11} fail, some with ICEs. > > > On nvptx at least 1-9 succeed, 10-11 don't know, don't have assert.h > > > around. > > > > > > So, do we want to fill in negative dg-skip-if for the 6-11 tests or > > > positive? In any case, is there any hope any of the maintainers or the > > > original submitter will change anything for GCC 12, or are we going to end > > > up with a very narrowly supported feature? > > > > It looks like the latter - I've seen no attempt by the original authors to > > make > > the feature work on more targets than they cared for. > > So, like this instead? > > I've kept {5,6,7} with aix,ia64,ia64 skipped because those seems like > outliers, it works pretty much everywhere but on those. > The rest have known good targets. > > Tested on x86_64-linux, verified all tests are run there. OK. > 2021-03-30 Jakub Jelinek > > PR testsuite/97680 > * c-c++-common/zero-scratch-regs-6.c: Skip on ia64. > * c-c++-common/zero-scratch-regs-7.c: Likewise. > * c-c++-common/zero-scratch-regs-8.c: Change from dg-skip-if of > selected unsupported triplets to all targets but selected triplets > of supported targets. > * c-c++-common/zero-scratch-regs-9.c: Likewise. > * c-c++-common/zero-scratch-regs-10.c: Likewise. > * c-c++-common/zero-scratch-regs-11.c: Likewise. > > --- gcc/testsuite/c-c++-common/zero-scratch-regs-6.c.jj 2020-10-31 > 17:41:19.793739605 +0100 > +++ gcc/testsuite/c-c++-common/zero-scratch-regs-6.c2021-03-30 > 16:19:00.509582587 +0200 > @@ -1,4 +1,5 @@ > /* { dg-do run } */ > +/* { dg-skip-if "not implemented" { ia64*-*-* } } */ > /* { dg-options "-O2 -fzero-call-used-regs=all-gpr-arg" } */ > > #include "zero-scratch-regs-1.c" > --- gcc/testsuite/c-c++-common/zero-scratch-regs-7.c.jj 2020-10-31 > 17:41:19.793739605 +0100 > +++ gcc/testsuite/c-c++-common/zero-scratch-regs-7.c2021-03-30 > 16:19:12.059454807 +0200 > @@ -1,4 +1,5 @@ > /* { dg-do run } */ > +/* { dg-skip-if "not implemented" { ia64*-*-* } } */ > /* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */ > > #include "zero-scratch-regs-1.c" > --- gcc/testsuite/c-c++-common/zero-scratch-regs-8.c.jj 2020-11-11 > 01:46:03.392696119 +0100 > +++ gcc/testsuite/c-c++-common/zero-scratch-regs-8.c2021-03-30 > 16:21:28.453945834 +0200 > @@ -1,5 +1,5 @@ > /* { dg-do run } */ > -/* { dg-skip-if "not implemented" { powerpc*-*-* } } */ > +/* { dg-skip-if "not implemented" { ! { i?86*-*-* x86_64*-*-* sparc*-*-* > aarch64*-*-* arm*-*-* nvptx*-*-* } } } */ > /* { dg-options "-O2 -fzero-call-used-regs=all-arg" } */ > > #include "zero-scratch-regs-1.c" > --- gcc/testsuite/c-c++-common/zero-scratch-regs-9.c.jj 2020-11-11 > 01:46:03.392696119 +0100 > +++ gcc/testsuite/c-c++-common/zero-scratch-regs-9.c2021-03-30 > 16:21:41.711799156 +0200 > @@ -1,5 +1,5 @@ > /* { dg-do run } */ > -/* { dg-skip-if "not implemented" { powerpc*-*-* } } */ > +/* { dg-skip-if "not implemented" { ! { i?86*-*-* x86_64*-*-* sparc*-*-* > aarch64*-*-* arm*-*-* nvptx*-*-* } } } */ > /* { dg-options "-O2 -fzero-call-used-regs=all" } */ > > #include "zero-scratch-regs-1.c" > --- gcc/testsuite/c-c++-common/zero-scratch-regs-10.c.jj2021-03-18 > 15:32:56.459617723 +0100 > +++ gcc/testsuite/c-c++-common/zero-scratch-regs-10.c 2021-03-30 > 16:21:55.017651951 +0200 > @@ -1,6 +1,5 @@ > /* { dg-do run } */ > -/* { dg-skip-if "not implemented" { powerpc*-*-* } } */ > -/* { dg-skip-if "not implemented" { arm*-*-* } } */ > +/* { dg-skip-if "not implemented" { ! { i?86*-*-* x86_64*-*-* sparc*-*-* > aarch64*-*-* nvptx*-*-* } } } */ > /* { dg-options "-O2" } */ > > #include > --- gcc/testsuite/c-c++-common/zero-scratch-regs-11.c.jj2020-11-11 > 01:46:03.392696119 +0100 > +++ gcc/testsuite/c-c++-common/zero-scratch-regs-11.c 2021-03-30 > 16:22:04.439547999 +0200 > @@ -1,5 +1,5 @@ > /* { dg-do run } */ > -/* { dg-skip-if "not implemented" { powerpc*-*-* } } */ > +/* { dg-skip-if "not implemented" { ! { i?86*-*-* x86_64*-*-* sparc*-*-* > aarch64*-*-*
Re: [PATCH] testsuite: Disable zero-scratch-regs-{8, 9, 10, 11}.c on s390* [PR97680]
> It looks like the latter - I've seen no attempt by the original authors to > make the feature work on more targets than they cared for. On the other hand, if you hide the failures, there is essentially zero chance that architecture maintainers pick up the pieces (I personally implemented the SPARC support only because I had ran into the failures in the testsuite). So doing the inverse filtering sounds quite counterproductive to me and IMO it's up to the architecture maintainers to decide on a case-by-case basis. -- Eric Botcazou