[PATCH v2 4/4] Rename duplicate_loop_to_header_edge to duplicate_loop_body_to_header_edge

2021-10-26 Thread Xionghu Luo via Gcc-patches
gcc/ChangeLog:

* cfghooks.c (cfg_hook_duplicate_loop_to_header_edge): Rename
duplicate_loop_to_header_edge to
duplicate_loop_body_to_header_edge.
(cfg_hook_duplicate_loop_body_to_header_edge): Likewise.
* cfghooks.h (struct cfg_hooks): Likewise.
(cfg_hook_duplicate_loop_body_to_header_edge): Likewise.
* cfgloopmanip.c (duplicate_loop_body_to_header_edge): Likewise.
(clone_loop_to_header_edge): Likewise.
* cfgloopmanip.h (duplicate_loop_body_to_header_edge): Likewise.
* cfgrtl.c (struct cfg_hooks): Likewise.
* doc/loop.texi: Likewise.
* loop-unroll.c (unroll_loop_constant_iterations): Likewise.
(unroll_loop_runtime_iterations): Likewise.
(unroll_loop_stupid): Likewise.
(apply_opt_in_copies): Likewise.
* tree-cfg.c (struct cfg_hooks): Likewise.
* tree-ssa-loop-ivcanon.c (try_unroll_loop_completely): Likewise.
(try_peel_loop): Likewise.
* tree-ssa-loop-manip.c (copy_phi_node_args): Likewise.
(gimple_duplicate_loop_body_to_header_edge): Likewise.
(tree_transform_and_unroll_loop): Likewise.
* tree-ssa-loop-manip.h (gimple_duplicate_loop_body_to_header_edge):
Likewise.
---
 gcc/cfghooks.c  | 27 ---
 gcc/cfghooks.h  | 13 ++---
 gcc/cfgloopmanip.c  |  9 -
 gcc/cfgloopmanip.h  |  6 +++---
 gcc/cfgrtl.c|  2 +-
 gcc/doc/loop.texi   |  4 ++--
 gcc/loop-unroll.c   | 27 ---
 gcc/tree-cfg.c  |  2 +-
 gcc/tree-ssa-loop-ivcanon.c |  4 ++--
 gcc/tree-ssa-loop-manip.c   | 22 --
 gcc/tree-ssa-loop-manip.h   |  7 +++
 11 files changed, 58 insertions(+), 65 deletions(-)

diff --git a/gcc/cfghooks.c b/gcc/cfghooks.c
index 50b9b177639..23eb364bee6 100644
--- a/gcc/cfghooks.c
+++ b/gcc/cfghooks.c
@@ -1226,25 +1226,22 @@ lv_flush_pending_stmts (edge e)
 cfg_hooks->flush_pending_stmts (e);
 }
 
-/* Loop versioning uses the duplicate_loop_to_header_edge to create
+/* Loop versioning uses the duplicate_loop_body_to_header_edge to create
a new version of the loop basic-blocks, the parameters here are
-   exactly the same as in duplicate_loop_to_header_edge or
-   tree_duplicate_loop_to_header_edge; while in tree-ssa there is
+   exactly the same as in duplicate_loop_body_to_header_edge or
+   tree_duplicate_loop_body_to_header_edge; while in tree-ssa there is
additional work to maintain ssa information that's why there is
-   a need to call the tree_duplicate_loop_to_header_edge rather
-   than duplicate_loop_to_header_edge when we are in tree mode.  */
+   a need to call the tree_duplicate_loop_body_to_header_edge rather
+   than duplicate_loop_body_to_header_edge when we are in tree mode.  */
 bool
-cfg_hook_duplicate_loop_to_header_edge (class loop *loop, edge e,
-   unsigned int ndupl,
-   sbitmap wont_exit, edge orig,
-   vec *to_remove,
-   int flags)
+cfg_hook_duplicate_loop_body_to_header_edge (class loop *loop, edge e,
+unsigned int ndupl,
+sbitmap wont_exit, edge orig,
+vec *to_remove, int flags)
 {
-  gcc_assert (cfg_hooks->cfg_hook_duplicate_loop_to_header_edge);
-  return cfg_hooks->cfg_hook_duplicate_loop_to_header_edge (loop, e,
-   ndupl, wont_exit,
-   orig, to_remove,
-   flags);
+  gcc_assert (cfg_hooks->cfg_hook_duplicate_loop_body_to_header_edge);
+  return cfg_hooks->cfg_hook_duplicate_loop_body_to_header_edge (
+loop, e, ndupl, wont_exit, orig, to_remove, flags);
 }
 
 /* Conditional jumps are represented differently in trees and RTL,
diff --git a/gcc/cfghooks.h b/gcc/cfghooks.h
index 8645fe5b9e7..29aa2bf0636 100644
--- a/gcc/cfghooks.h
+++ b/gcc/cfghooks.h
@@ -166,7 +166,7 @@ struct cfg_hooks
 
   /* A hook for duplicating loop in CFG, currently this is used
  in loop versioning.  */
-  bool (*cfg_hook_duplicate_loop_to_header_edge) (class loop *, edge,
+  bool (*cfg_hook_duplicate_loop_body_to_header_edge) (class loop *, edge,
  unsigned, sbitmap,
  edge, vec *,
  int);
@@ -250,12 +250,11 @@ extern bool block_ends_with_condjump_p (const_basic_block 
bb);
 extern int flow_call_edges_add (sbitmap);
 extern void execute_on_growing_pred (edge);
 extern void execute_on_shrinking_pred (edge);
-extern bool cfg_hook_duplicate_loop_to_header_edge (class loop *loop, edge,
- 

[PATCH v2 3/4] Rename loop_version to clone_loop_to_header_edge.

2021-10-26 Thread Xionghu Luo via Gcc-patches
Name loop_copy is used in gcc/cfg.c already.

gcc/ChangeLog:

* cfgloopmanip.c (force_single_succ_latches): Rename
loop_version to clone_loop_to_header_edge.
(lv_adjust_loop_entry_edge): Likewise.
(loop_version): Likewise.
(clone_loop_to_header_edge): Likewise.
* cfgloopmanip.h (class loop): Likewise.
(clone_loop_to_header_edge): Likewise.
* gimple-loop-versioning.cc (loop_versioning::version_loop): Likewise.
* modulo-sched.c (sms_schedule): Likewise.
* tree-if-conv.c (version_loop_for_if_conversion): Likewise.
* tree-loop-distribution.c (version_loop_by_alias_check): Likewise.
* tree-parloops.c (gen_parallel_loop): Likewise.
* tree-ssa-loop-manip.c (tree_transform_and_unroll_loop): Likewise.
* tree-ssa-loop-split.c (split_loop): Likewise.
(get_cond_branch_to_split_loop): Likewise.
(do_split_loop_on_cond): Likewise.
* tree-ssa-loop-unswitch.c (tree_unswitch_loop): Likewise.
* tree-vect-loop-manip.c (vect_loop_versioning): Likewise.
---
 gcc/cfgloopmanip.c| 22 +++---
 gcc/cfgloopmanip.h|  8 
 gcc/gimple-loop-versioning.cc | 11 ++-
 gcc/modulo-sched.c|  6 +++---
 gcc/tree-if-conv.c| 13 +++--
 gcc/tree-loop-distribution.c  |  4 ++--
 gcc/tree-parloops.c   | 10 +-
 gcc/tree-ssa-loop-manip.c |  9 +
 gcc/tree-ssa-loop-split.c | 30 +++---
 gcc/tree-ssa-loop-unswitch.c  |  8 +++-
 gcc/tree-vect-loop-manip.c|  5 +++--
 11 files changed, 64 insertions(+), 62 deletions(-)

diff --git a/gcc/cfgloopmanip.c b/gcc/cfgloopmanip.c
index a30ebe1cdb4..066fbddbcfe 100644
--- a/gcc/cfgloopmanip.c
+++ b/gcc/cfgloopmanip.c
@@ -1535,11 +1535,10 @@ force_single_succ_latches (void)
   loops_state_set (LOOPS_HAVE_SIMPLE_LATCHES);
 }
 
-/* This function is called from loop_version.  It splits the entry edge
-   of the loop we want to version, adds the versioning condition, and
-   adjust the edges to the two versions of the loop appropriately.
-   e is an incoming edge. Returns the basic block containing the
-   condition.
+/* This function is called from clone_loop_to_header_edge.  It splits the entry
+  edge of the loop we want to version, adds the versioning condition, and 
adjust
+  the edges to the two versions of the loop appropriately. e is an incoming
+  edge. Returns the basic block containing the condition.
 
--- edge e  > [second_head]
 
@@ -1588,7 +1587,7 @@ lv_adjust_loop_entry_edge (basic_block first_head, 
basic_block second_head,
   return new_head;
 }
 
-/* Main entry point for Loop Versioning transformation.
+/* Main entry point for Loop copy transformation.
 
This transformation given a condition and a loop, creates
-if (condition) { loop_copy1 } else { loop_copy2 },
@@ -1609,11 +1608,12 @@ lv_adjust_loop_entry_edge (basic_block first_head, 
basic_block second_head,
instruction stream, otherwise it is placed before LOOP.  */
 
 class loop *
-loop_version (class loop *loop,
- void *cond_expr, basic_block *condition_bb,
- profile_probability then_prob, profile_probability else_prob,
- profile_probability then_scale, profile_probability else_scale,
- bool place_after)
+clone_loop_to_header_edge (class loop *loop, void *cond_expr,
+  basic_block *condition_bb,
+  profile_probability then_prob,
+  profile_probability else_prob,
+  profile_probability then_scale,
+  profile_probability else_scale, bool place_after)
 {
   basic_block first_head, second_head;
   edge entry, latch_edge;
diff --git a/gcc/cfgloopmanip.h b/gcc/cfgloopmanip.h
index 312a3b48d05..eac09518702 100644
--- a/gcc/cfgloopmanip.h
+++ b/gcc/cfgloopmanip.h
@@ -55,9 +55,9 @@ extern bool mfb_keep_just (edge);
 basic_block create_preheader (class loop *, int);
 extern void create_preheaders (int);
 extern void force_single_succ_latches (void);
-class loop * loop_version (class loop *, void *,
-   basic_block *,
-   profile_probability, profile_probability,
-   profile_probability, profile_probability, bool);
+class loop *
+clone_loop_to_header_edge (class loop *, void *, basic_block *,
+  profile_probability, profile_probability,
+  profile_probability, profile_probability, bool);
 
 #endif /* GCC_CFGLOOPMANIP_H */
diff --git a/gcc/gimple-loop-versioning.cc b/gcc/gimple-loop-versioning.cc
index 15e0803dc29..c0fa60b91f8 100644
--- a/gcc/gimple-loop-versioning.cc
+++ b/gcc/gimple-loop-versioning.cc
@@ -1686,11 +1686,12 @@ loop_versioning::version_loop (class loop *loop)
   /* Version the loop.  */
   initialize_original_copy_tables ();
   basic_block cond_bb;
-  li.op

[PATCH v2 2/4] Refactor loop_version

2021-10-26 Thread Xionghu Luo via Gcc-patches
loop_version currently does lv_adjust_loop_entry_edge
before it loopifys the copy inserted on the header.  This patch moves
the condition generation later and thus we have four pieces to help
understanding of how the adjustment works:
 1) duplicating the loop on the entry edge.
 2) loopify the duplicated new loop.
 3) adjusting the CFG to insert a condition branching to either loop
 with lv_adjust_loop_entry_edge.
 4) From loopify extract the scale_loop_frequencies bits.

Also removed some piece of code seems obviously useless which is not
completely sure:
 - redirect_all_edges since it is false and loopify only called once.
 - extract_cond_bb_edges and lv_flush_pending_stmts (false_edge) as the
 edge is not redirected actually.

gcc/ChangeLog:

* cfgloopmanip.c (loop_version): Refactor loopify to
loop_version.  Move condition generation after loopify.
(loopify): Delete.
* cfgloopmanip.h (loopify): Delete.
---
 gcc/cfgloopmanip.c | 113 -
 gcc/cfgloopmanip.h |   3 --
 2 files changed, 29 insertions(+), 87 deletions(-)

diff --git a/gcc/cfgloopmanip.c b/gcc/cfgloopmanip.c
index 82c242dd720..a30ebe1cdb4 100644
--- a/gcc/cfgloopmanip.c
+++ b/gcc/cfgloopmanip.c
@@ -846,71 +846,6 @@ create_empty_loop_on_edge (edge entry_edge,
   return loop;
 }
 
-/* Make area between HEADER_EDGE and LATCH_EDGE a loop by connecting
-   latch to header and update loop tree and dominators
-   accordingly. Everything between them plus LATCH_EDGE destination must
-   be dominated by HEADER_EDGE destination, and back-reachable from
-   LATCH_EDGE source.  HEADER_EDGE is redirected to basic block SWITCH_BB,
-   FALSE_EDGE of SWITCH_BB to original destination of HEADER_EDGE and
-   TRUE_EDGE of SWITCH_BB to original destination of LATCH_EDGE.
-   Returns the newly created loop.  Frequencies and counts in the new loop
-   are scaled by FALSE_SCALE and in the old one by TRUE_SCALE.  */
-
-class loop *
-loopify (edge latch_edge, edge header_edge,
-basic_block switch_bb, edge true_edge, edge false_edge,
-bool redirect_all_edges, profile_probability true_scale,
-profile_probability false_scale)
-{
-  basic_block succ_bb = latch_edge->dest;
-  basic_block pred_bb = header_edge->src;
-  class loop *loop = alloc_loop ();
-  class loop *outer = loop_outer (succ_bb->loop_father);
-  profile_count cnt;
-
-  loop->header = header_edge->dest;
-  loop->latch = latch_edge->src;
-
-  cnt = header_edge->count ();
-
-  /* Redirect edges.  */
-  loop_redirect_edge (latch_edge, loop->header);
-  loop_redirect_edge (true_edge, succ_bb);
-
-  /* During loop versioning, one of the switch_bb edge is already properly
- set. Do not redirect it again unless redirect_all_edges is true.  */
-  if (redirect_all_edges)
-{
-  loop_redirect_edge (header_edge, switch_bb);
-  loop_redirect_edge (false_edge, loop->header);
-
-  /* Update dominators.  */
-  set_immediate_dominator (CDI_DOMINATORS, switch_bb, pred_bb);
-  set_immediate_dominator (CDI_DOMINATORS, loop->header, switch_bb);
-}
-
-  set_immediate_dominator (CDI_DOMINATORS, succ_bb, switch_bb);
-
-  /* Compute new loop.  */
-  add_loop (loop, outer);
-
-  /* Add switch_bb to appropriate loop.  */
-  if (switch_bb->loop_father)
-remove_bb_from_loops (switch_bb);
-  add_bb_to_loop (switch_bb, outer);
-
-  /* Fix counts.  */
-  if (redirect_all_edges)
-{
-  switch_bb->count = cnt;
-}
-  scale_loop_frequencies (loop, false_scale);
-  scale_loop_frequencies (succ_bb->loop_father, true_scale);
-  update_dominators_in_loop (loop);
-
-  return loop;
-}
-
 /* Remove the latch edge of a LOOP and update loops to indicate that
the LOOP was removed.  After this function, original loop latch will
have no successor, which caller is expected to fix somehow.
@@ -1681,7 +1616,7 @@ loop_version (class loop *loop,
  bool place_after)
 {
   basic_block first_head, second_head;
-  edge entry, latch_edge, true_edge, false_edge;
+  edge entry, latch_edge;
   int irred_flag;
   class loop *nloop;
   basic_block cond_bb;
@@ -1694,7 +1629,7 @@ loop_version (class loop *loop,
   /* Note down head of loop as first_head.  */
   first_head = entry->dest;
 
-  /* Duplicate loop.  */
+  /* 1) Duplicate loop on the entry edge.  */
   if (!cfg_hook_duplicate_loop_to_header_edge (loop, entry, 1,
   NULL, NULL, NULL, 0))
 {
@@ -1702,11 +1637,28 @@ loop_version (class loop *loop,
   return NULL;
 }
 
+  /* 2) loopify the duplicated new loop. */
+  latch_edge = single_succ_edge (get_bb_copy (loop->latch));
+  nloop = alloc_loop ();
+  class loop *outer = loop_outer (latch_edge->dest->loop_father);
+  edge new_header_edge = single_pred_edge (get_bb_copy (loop->header));
+  nloop->header = new_header_edge->dest;
+  nloop->latch = latch_edge->src;
+  loop_redirect_edge (latch_edge, nloop->header);
+
+  /* Compute new loop.  */
+  ad

[PATCH v2 1/4] Fix loop split incorrect count and probability

2021-10-26 Thread Xionghu Luo via Gcc-patches
loop split condition is moved between loop1 and loop2, the split bb's
count and probability should also be duplicated instead of (100% vs INV),
secondly, the original loop1 and loop2 count need be propotional from the
original loop.

Regression tested pass, OK for master?

diff base/loop-cond-split-1.c.151t.lsplit  
patched/loop-cond-split-1.c.151t.lsplit:
...
   int prephitmp_16;
   int prephitmp_25;

[local count: 118111600]:
   if (n_7(D) > 0)
 goto ; [89.00%]
   else
 goto ; [11.00%]

[local count: 118111600]:
   return;

[local count: 105119324]:
   pretmp_3 = ga;

-   [local count: 955630225]:
+   [local count: 315357973]:
   # i_13 = PHI 
   # prephitmp_12 = PHI 
   if (prephitmp_12 != 0)
 goto ; [33.00%]
   else
 goto ; [67.00%]

-   [local count: 315357972]:
+   [local count: 104068130]:
   _2 = do_something ();
   ga = _2;

-   [local count: 955630225]:
+   [local count: 315357973]:
   # prephitmp_5 = PHI 
   i_10 = inc (i_13);
   if (n_7(D) > i_10)
 goto ; [89.00%]
   else
 goto ; [11.00%]

[local count: 105119324]:
   goto ; [100.00%]

-   [local count: 850510901]:
+   [local count: 280668596]:
   if (prephitmp_12 != 0)
-goto ; [100.00%]
+goto ; [33.00%]
   else
-goto ; [INV]
+goto ; [67.00%]

-   [local count: 850510901]:
+   [local count: 280668596]:
   goto ; [100.00%]

-   [count: 0]:
+   [local count: 70429947]:
   # i_23 = PHI 
   # prephitmp_25 = PHI 

-   [local count: 955630225]:
+   [local count: 640272252]:
   # i_15 = PHI 
   # prephitmp_16 = PHI 
   i_22 = inc (i_15);
   if (n_7(D) > i_22)
 goto ; [89.00%]
   else
 goto ; [11.00%]

-   [local count: 850510901]:
+   [local count: 569842305]:
   goto ; [100.00%]

 }

gcc/ChangeLog:

* tree-ssa-loop-split.c (split_loop): Fix incorrect probability.
(do_split_loop_on_cond): Likewise.
---
 gcc/tree-ssa-loop-split.c | 25 -
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
index 3f6ad046623..d30782888f3 100644
--- a/gcc/tree-ssa-loop-split.c
+++ b/gcc/tree-ssa-loop-split.c
@@ -575,7 +575,11 @@ split_loop (class loop *loop1)
stmts2);
tree cond = build2 (guard_code, boolean_type_node, guard_init, border);
if (!initial_true)
- cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond); 
+ cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
+
+   edge true_edge = EDGE_SUCC (bbs[i], 0)->flags & EDGE_TRUE_VALUE
+  ? EDGE_SUCC (bbs[i], 0)
+  : EDGE_SUCC (bbs[i], 1);
 
/* Now version the loop, placing loop2 after loop1 connecting
   them, and fix up SSA form for that.  */
@@ -583,10 +587,10 @@ split_loop (class loop *loop1)
basic_block cond_bb;
 
class loop *loop2 = loop_version (loop1, cond, &cond_bb,
-  profile_probability::always (),
-  profile_probability::always (),
-  profile_probability::always (),
-  profile_probability::always (),
+  true_edge->probability,
+  true_edge->probability.invert (),
+  true_edge->probability,
+  true_edge->probability.invert (),
   true);
gcc_assert (loop2);
 
@@ -1486,10 +1490,10 @@ do_split_loop_on_cond (struct loop *loop1, edge 
invar_branch)
   initialize_original_copy_tables ();
 
   struct loop *loop2 = loop_version (loop1, boolean_true_node, NULL,
-profile_probability::always (),
-profile_probability::never (),
-profile_probability::always (),
-profile_probability::always (),
+invar_branch->probability.invert (),
+invar_branch->probability,
+invar_branch->probability.invert (),
+invar_branch->probability,
 true);
   if (!loop2)
 {
@@ -1530,6 +1534,9 @@ do_split_loop_on_cond (struct loop *loop1, edge 
invar_branch)
   to_loop1->flags |= true_invar ? EDGE_FALSE_VALUE : EDGE_TRUE_VALUE;
   to_loop2->flags |= true_invar ? EDGE_TRUE_VALUE : EDGE_FALSE_VALUE;
 
+  to_loop1->probability = invar_branch->probability.invert ();
+  to_loop2->probability = invar_branch->probability;
+
   /* Due to introduction of a control flow edge from loop1 latch to loop2
  pre-header, we should update PHIs in loop2 to reflect this connection
  between loop1 and loop2.  */
-- 
2.27.0.90.geebb51ba8c



[PATCH v2 0/4] loop split fix and functions renaming

2021-10-26 Thread Xionghu Luo via Gcc-patches
This patchset is an followed update from [1].
Patch 1 is expecting review comments from Honza[2];
Patch 2 refactors loop_version to remove loopify call and adjust
condition generation later than loopify;
Patch 3 and Patch 4 are function renamings to help better understanding.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2021-October/582600.html
[2] https://gcc.gnu.org/pipermail/gcc-patches/2021-October/582607.html

Xionghu Luo (4):
  Fix loop split incorrect count and probability
  Refactor loop_version
  Rename loop_version to clone_loop_to_header_edge.
  Rename duplicate_loop_to_header_edge to duplicate_loop_body_to_header_edge

 gcc/cfghooks.c|  27 +++
 gcc/cfghooks.h|  13 ++-
 gcc/cfgloopmanip.c| 144 +++---
 gcc/cfgloopmanip.h|  17 ++--
 gcc/cfgrtl.c  |   2 +-
 gcc/doc/loop.texi |   4 +-
 gcc/gimple-loop-versioning.cc |  11 +--
 gcc/loop-unroll.c |  27 +++
 gcc/modulo-sched.c|   6 +-
 gcc/tree-cfg.c|   2 +-
 gcc/tree-if-conv.c|  13 +--
 gcc/tree-loop-distribution.c  |   4 +-
 gcc/tree-parloops.c   |  10 +--
 gcc/tree-ssa-loop-ivcanon.c   |   4 +-
 gcc/tree-ssa-loop-manip.c |  31 
 gcc/tree-ssa-loop-manip.h |   7 +-
 gcc/tree-ssa-loop-split.c |  39 +
 gcc/tree-ssa-loop-unswitch.c  |   8 +-
 gcc/tree-vect-loop-manip.c|   5 +-
 19 files changed, 159 insertions(+), 215 deletions(-)

-- 
2.27.0.90.geebb51ba8c



Re: [PATCH] Fix negative integer range for UInteger.

2021-10-26 Thread Richard Biener via Gcc-patches
On Tue, Oct 26, 2021 at 5:11 PM Martin Liška  wrote:
>
> Hello.
>
> We should not allow a negative IntegerRange for UInteger options.
> The negative ranges for the 2 params are default and does not make sense.
>
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>
> Ready to be installed?

OK.  (btw, can we use opts_set.param_... now instead of -1?)

Thanks,
Richard.

> Thanks,
> Martin
>
> gcc/ChangeLog:
>
> * opt-functions.awk: Add new sanity checking.
> * optc-gen.awk: Add new argument to integer_range_info.
> * params.opt: Update 2 params which have negative IntegerRange.
> ---
>   gcc/opt-functions.awk | 4 +++-
>   gcc/optc-gen.awk  | 2 +-
>   gcc/params.opt| 4 ++--
>   3 files changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/opt-functions.awk b/gcc/opt-functions.awk
> index be9255064d7..9bc85604066 100644
> --- a/gcc/opt-functions.awk
> +++ b/gcc/opt-functions.awk
> @@ -356,7 +356,7 @@ function search_var_name(name, opt_numbers, opts, flags, 
> n_opts)
>   return ""
>   }
>
> -function integer_range_info(range_option, init, option)
> +function integer_range_info(range_option, init, option, uinteger_used)
>   {
>   if (range_option != "") {
> ival = init + 0;
> @@ -364,6 +364,8 @@ function integer_range_info(range_option, init, option)
> end = nth_arg(1, range_option) + 0;
> if (init != "" && init != "-1" && (ival < start || ival > end))
>   print "#error initial value " init " of '" option "' must be in 
> range [" start "," end "]"
> +   if (uinteger_used && start < 0)
> + print "#error '" option"': negative IntegerRange (" start ", " end 
> ") cannot be combined with UInteger"
> return start ", " end
>   }
>   else
> diff --git a/gcc/optc-gen.awk b/gcc/optc-gen.awk
> index 77e598efd60..ebc1a02fa36 100644
> --- a/gcc/optc-gen.awk
> +++ b/gcc/optc-gen.awk
> @@ -422,7 +422,7 @@ for (i = 0; i < n_opts; i++) {
>cl_flags, cl_bit_fields)
> printf("%s, %s, %s }%s\n", var_ref(opts[i], flags[i]),
>var_set(flags[i]), integer_range_info(opt_args("IntegerRange", 
> flags[i]),
> -   opt_args("Init", flags[i]), opts[i]), comma)
> +   opt_args("Init", flags[i]), opts[i], 
> flag_set_p("UInteger", flags[i])), comma)
>
> # Bump up the informational option index.
> ++optindex
> diff --git a/gcc/params.opt b/gcc/params.opt
> index 393d52bc660..1bad2b51df0 100644
> --- a/gcc/params.opt
> +++ b/gcc/params.opt
> @@ -398,7 +398,7 @@ Common Joined UInteger Var(param_lim_expensive) Init(20) 
> Param Optimization
>   The minimum cost of an expensive expression in the loop invariant motion.
>
>   -param=logical-op-non-short-circuit=
> -Common Joined UInteger Var(param_logical_op_non_short_circuit) Init(-1) 
> IntegerRange(-1, 1) Param
> +Common Joined UInteger Var(param_logical_op_non_short_circuit) Init(-1) 
> IntegerRange(0, 1) Param
>   True if a non-short-circuit operation is optimal.
>
>   -param=loop-block-tile-size=
> @@ -1128,7 +1128,7 @@ Common Joined UInteger Var(param_vect_epilogues_nomask) 
> Init(1) IntegerRange(0,
>   Enable loop epilogue vectorization using smaller vector size.
>
>   -param=vect-max-peeling-for-alignment=
> -Common Joined UInteger Var(param_vect_max_peeling_for_alignment) Init(-1) 
> IntegerRange(-1, 64) Param Optimization
> +Common Joined UInteger Var(param_vect_max_peeling_for_alignment) Init(-1) 
> IntegerRange(0, 64) Param Optimization
>   Maximum number of loop peels to enhance alignment of data references in a 
> loop.
>
>   -param=vect-max-version-for-alias-checks=
> --
> 2.33.1
>


[r12-4725 Regression] FAIL: libgomp.c/doacross-1.c (test for excess errors) on Linux/x86_64

2021-10-26 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

88b504b7a8c5affb0ffa97990d22af2b199e36ed is the first bad commit
commit 88b504b7a8c5affb0ffa97990d22af2b199e36ed
Author: Martin Sebor 
Date:   Tue Oct 26 14:34:16 2021 -0600

Detect overflow by atomic functions [PR102453].

caused

FAIL: gcc.dg/Warray-bounds-90.c (test for excess errors)
FAIL: gcc.dg/Warray-bounds-90.c  (test for warnings, line 100)
FAIL: gcc.dg/Warray-bounds-90.c  (test for warnings, line 103)
FAIL: gcc.dg/Warray-bounds-90.c  (test for warnings, line 105)
FAIL: gcc.dg/Warray-bounds-90.c  (test for warnings, line 107)
FAIL: gcc.dg/Warray-bounds-90.c  (test for warnings, line 110)
FAIL: gcc.dg/Warray-bounds-90.c  (test for warnings, line 112)
FAIL: gcc.dg/Warray-bounds-90.c  (test for warnings, line 114)
FAIL: gcc.dg/Warray-bounds-90.c  (test for warnings, line 121)
FAIL: gcc.dg/Warray-bounds-90.c  (test for warnings, line 124)
FAIL: gcc.dg/Warray-bounds-90.c  (test for warnings, line 127)
FAIL: gcc.dg/Warray-bounds-90.c  (test for warnings, line 129)
FAIL: gcc.dg/Warray-bounds-90.c  (test for warnings, line 132)
FAIL: gcc.dg/Warray-bounds-90.c  (test for warnings, line 134)
FAIL: gcc.dg/Warray-bounds-90.c  (test for warnings, line 136)
FAIL: gcc.dg/Warray-bounds-90.c  (test for warnings, line 139)
FAIL: gcc.dg/Warray-bounds-90.c  (test for warnings, line 141)
FAIL: gcc.dg/Warray-bounds-90.c  (test for warnings, line 144)
FAIL: gcc.dg/Warray-bounds-90.c  (test for warnings, line 146)
FAIL: gcc.dg/Warray-bounds-90.c  (test for warnings, line 66)
FAIL: gcc.dg/Warray-bounds-90.c  (test for warnings, line 69)
FAIL: gcc.dg/Warray-bounds-90.c  (test for warnings, line 73)
FAIL: gcc.dg/Warray-bounds-90.c  (test for warnings, line 77)
FAIL: gcc.dg/Warray-bounds-90.c  (test for warnings, line 85)
FAIL: gcc.dg/Warray-bounds-90.c  (test for warnings, line 88)
FAIL: gcc.dg/Warray-bounds-90.c  (test for warnings, line 91)
FAIL: gcc.dg/Warray-bounds-90.c  (test for warnings, line 93)
FAIL: gcc.dg/Warray-bounds-90.c  (test for warnings, line 96)
FAIL: gcc.dg/Warray-bounds-90.c  (test for warnings, line 98)
FAIL: libgomp.c/doacross-1.c (test for excess errors)

with GCC configured with

../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-4725/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="dg.exp=gcc.dg/Warray-bounds-90.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="dg.exp=gcc.dg/Warray-bounds-90.c --target_board='unix{-m32\ 
-march=cascadelake}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check 
RUNTESTFLAGS="c.exp=libgomp.c/doacross-1.c --target_board='unix{-m64}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check 
RUNTESTFLAGS="c.exp=libgomp.c/doacross-1.c --target_board='unix{-m64\ 
-march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at skpgkp2 at gmail dot com)


[r12-4726 Regression] FAIL: g++.dg/warn/Wstringop-overflow-4.C -std=gnu++98 (test for excess errors) on Linux/x86_64

2021-10-26 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

9a27acc30a34b7854db32eac562306cebac6fa1e is the first bad commit
commit 9a27acc30a34b7854db32eac562306cebac6fa1e
Author: Martin Sebor 
Date:   Tue Oct 26 14:38:11 2021 -0600

Make full use of context-sensitive ranges in access warnings.

caused

FAIL: g++.dg/warn/Wstringop-overflow-4.C  -std=gnu++98 (test for excess errors)

with GCC configured with

../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-4726/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="dg.exp=g++.dg/warn/Wstringop-overflow-4.C 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="dg.exp=g++.dg/warn/Wstringop-overflow-4.C 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="dg.exp=g++.dg/warn/Wstringop-overflow-4.C 
--target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="dg.exp=g++.dg/warn/Wstringop-overflow-4.C 
--target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at skpgkp2 at gmail dot com)


RISCV: Add zmmul extension

2021-10-26 Thread shihua
From: Liaoshihua 

---
 gcc/common/config/riscv/riscv-common.c |  3 +++
 gcc/config/riscv/riscv-c.c |  2 +-
 gcc/config/riscv/riscv-opts.h  |  3 +++
 gcc/config/riscv/riscv.c   |  5 -
 gcc/config/riscv/riscv.md  | 30 +-
 gcc/config/riscv/riscv.opt |  3 +++
 6 files changed, 29 insertions(+), 17 deletions(-)

diff --git a/gcc/common/config/riscv/riscv-common.c 
b/gcc/common/config/riscv/riscv-common.c
index 34b74e52a2d..ad3180677be 100644
--- a/gcc/common/config/riscv/riscv-common.c
+++ b/gcc/common/config/riscv/riscv-common.c
@@ -101,6 +101,7 @@ static const struct riscv_ext_version 
riscv_ext_version_table[] =
   {"zifencei", ISA_SPEC_CLASS_20191213, 2, 0},
   {"zifencei", ISA_SPEC_CLASS_20190608, 2, 0},
 
+  {"zmmul",  ISA_SPEC_CLASS_NONE, 0, 1},
   /* Terminate the list.  */
   {NULL, ISA_SPEC_CLASS_NONE, 0, 0}
 };
@@ -904,6 +905,8 @@ static const riscv_ext_flag_table_t riscv_ext_flag_table[] =
   {"zicsr",&gcc_options::x_riscv_zi_subext, MASK_ZICSR},
   {"zifencei", &gcc_options::x_riscv_zi_subext, MASK_ZIFENCEI},
 
+  {"zmmul", &gcc_options::x_riscv_zmmul_subext, MASK_ZMMUL},
+
   {NULL, NULL, 0}
 };
 
diff --git a/gcc/config/riscv/riscv-c.c b/gcc/config/riscv/riscv-c.c
index efd4a61ea29..72aa4e389c0 100644
--- a/gcc/config/riscv/riscv-c.c
+++ b/gcc/config/riscv/riscv-c.c
@@ -47,7 +47,7 @@ riscv_cpu_cpp_builtins (cpp_reader *pfile)
   if (TARGET_ATOMIC)
 builtin_define ("__riscv_atomic");
 
-  if (TARGET_MUL)
+  if (TARGET_MUL || TARGET_ZMMUL)
 builtin_define ("__riscv_mul");
   if (TARGET_DIV)
 builtin_define ("__riscv_div");
diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index f4cf6ca4b82..c52b18ebd80 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -73,4 +73,7 @@ enum stack_protector_guard {
 #define TARGET_ZICSR((riscv_zi_subext & MASK_ZICSR) != 0)
 #define TARGET_ZIFENCEI ((riscv_zi_subext & MASK_ZIFENCEI) != 0)
 
+#define MASK_ZMMUL (1 << 0)
+#define TARGET_ZMMUL ((riscv_zmmul_subext & MASK_ZMMUL) != 0)
+
 #endif /* ! GCC_RISCV_OPTS_H */
diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
index 17cdf705c32..4f5cb35e625 100644
--- a/gcc/config/riscv/riscv.c
+++ b/gcc/config/riscv/riscv.c
@@ -1872,7 +1872,7 @@ riscv_rtx_costs (rtx x, machine_mode mode, int 
outer_code, int opno ATTRIBUTE_UN
 case MULT:
   if (float_mode_p)
*total = tune_param->fp_mul[mode == DFmode];
-  else if (!TARGET_MUL)
+  else if (!TARGET_MUL && !TARGET_ZMMUL)
/* Estimate the cost of a library call.  */
*total = COSTS_N_INSNS (speed ? 32 : 6);
   else if (GET_MODE_SIZE (mode) > UNITS_PER_WORD)
@@ -4736,6 +4736,9 @@ riscv_option_override (void)
   if (flag_pic)
 g_switch_value = 0;
 
+  /* zmmul */
+  if (TARGET_ZMMUL && TARGET_MUL)
+error ("can not use both the % and the % extension");
   /* The presence of the M extension implies that division instructions
  are present, so include them unless explicitly disabled.  */
   if (TARGET_MUL && (target_flags_explicit & MASK_DIV) == 0)
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index c3687d57047..2ee7d801f1a 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -600,7 +600,7 @@
   [(set (match_operand:SI  0 "register_operand" "=r")
(mult:SI (match_operand:SI 1 "register_operand" " r")
 (match_operand:SI 2 "register_operand" " r")))]
-  "TARGET_MUL"
+  "(TARGET_MUL || TARGET_ZMMUL)"
   { return TARGET_64BIT ? "mulw\t%0,%1,%2" : "mul\t%0,%1,%2"; }
   [(set_attr "type" "imul")
(set_attr "mode" "SI")])
@@ -609,7 +609,7 @@
   [(set (match_operand:DI  0 "register_operand" "=r")
(mult:DI (match_operand:DI 1 "register_operand" " r")
 (match_operand:DI 2 "register_operand" " r")))]
-  "TARGET_MUL && TARGET_64BIT"
+  "(TARGET_MUL || TARGET_ZMMUL) && TARGET_64BIT"
   "mul\t%0,%1,%2"
   [(set_attr "type" "imul")
(set_attr "mode" "DI")])
@@ -619,7 +619,7 @@
(sign_extend:DI
(mult:SI (match_operand:SI 1 "register_operand" " r")
 (match_operand:SI 2 "register_operand" " r"]
-  "TARGET_MUL && TARGET_64BIT"
+  "(TARGET_MUL || TARGET_ZMMUL) && TARGET_64BIT"
   "mulw\t%0,%1,%2"
   [(set_attr "type" "imul")
(set_attr "mode" "SI")])
@@ -630,7 +630,7 @@
  (match_operator:SI 3 "subreg_lowpart_operator"
[(mult:DI (match_operand:DI 1 "register_operand" " r")
  (match_operand:DI 2 "register_operand" " r"))])))]
-  "TARGET_MUL && TARGET_64BIT"
+  "(TARGET_MUL || TARGET_ZMMUL) && TARGET_64BIT"
   "mulw\t%0,%1,%2"
   [(set_attr "type" "imul")
(set_attr "mode" "SI")])
@@ -648,7 +648,7 @@
   [(set (match_operand:TI 0 "register_operand")
(mult:TI (any_extend:TI (match_operand:DI 1 "register_operand"))
 (any_extend:TI (match_operand:

Re: [RFC] Don't move cold code out of loop by checking bb count

2021-10-26 Thread Xionghu Luo via Gcc-patches



On 2021/10/26 21:20, Richard Biener wrote:
> On Mon, Oct 18, 2021 at 6:29 AM Xionghu Luo  wrote:
>>
>>
>>
>> On 2021/10/15 16:11, Richard Biener wrote:
>>> On Sat, Oct 9, 2021 at 5:45 AM Xionghu Luo  wrote:

 Hi,

 On 2021/9/28 20:09, Richard Biener wrote:
> On Fri, Sep 24, 2021 at 8:29 AM Xionghu Luo  wrote:
>>
>> Update the patch to v3, not sure whether you prefer the paste style
>> and continue to link the previous thread as Segher dislikes this...
>>
>>
>> [PATCH v3] Don't move cold code out of loop by checking bb count
>>
>>
>> Changes:
>> 1. Handle max_loop in determine_max_movement instead of
>> outermost_invariant_loop.
>> 2. Remove unnecessary changes.
>> 3. Add for_all_locs_in_loop (loop, ref, ref_in_loop_hot_body) in 
>> can_sm_ref_p.
>> 4. "gsi_next (&bsi);" in move_computations_worker is kept since it caused
>> infinite loop when implementing v1 and the iteration is missed to be
>> updated actually.
>>
>> v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-August/576488.html
>> v2: https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579086.html
>>
>> There was a patch trying to avoid move cold block out of loop:
>>
>> https://gcc.gnu.org/pipermail/gcc/2014-November/215551.html
>>
>> Richard suggested to "never hoist anything from a bb with lower execution
>> frequency to a bb with higher one in LIM invariantness_dom_walker
>> before_dom_children".
>>
>> In gimple LIM analysis, add find_coldest_out_loop to move invariants to
>> expected target loop, if profile count of the loop bb is colder
>> than target loop preheader, it won't be hoisted out of loop.
>> Likely for store motion, if all locations of the REF in loop is cold,
>> don't do store motion of it.
>>
>> SPEC2017 performance evaluation shows 1% performance improvement for
>> intrate GEOMEAN and no obvious regression for others.  Especially,
>> 500.perlbench_r +7.52% (Perf shows function S_regtry of perlbench is
>> largely improved.), and 548.exchange2_r+1.98%, 526.blender_r +1.00%
>> on P8LE.
>>
>> gcc/ChangeLog:
>>
>> * loop-invariant.c (find_invariants_bb): Check profile count
>> before motion.
>> (find_invariants_body): Add argument.
>> * tree-ssa-loop-im.c (find_coldest_out_loop): New function.
>> (determine_max_movement): Use find_coldest_out_loop.
>> (move_computations_worker): Adjust and fix iteration udpate.
>> (execute_sm_exit): Check pointer validness.
>> (class ref_in_loop_hot_body): New functor.
>> (ref_in_loop_hot_body::operator): New.
>> (can_sm_ref_p): Use for_all_locs_in_loop.
>>
>> gcc/testsuite/ChangeLog:
>>
>> * gcc.dg/tree-ssa/recip-3.c: Adjust.
>> * gcc.dg/tree-ssa/ssa-lim-18.c: New test.
>> * gcc.dg/tree-ssa/ssa-lim-19.c: New test.
>> * gcc.dg/tree-ssa/ssa-lim-20.c: New test.
>> ---
>>  gcc/loop-invariant.c   | 10 ++--
>>  gcc/tree-ssa-loop-im.c | 61 --
>>  gcc/testsuite/gcc.dg/tree-ssa/recip-3.c|  2 +-
>>  gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-18.c | 20 +++
>>  gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c | 27 ++
>>  gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-20.c | 25 +
>>  gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-21.c | 28 ++
>>  7 files changed, 165 insertions(+), 8 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-18.c
>>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c
>>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-20.c
>>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-21.c
>>
>> diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c
>> index fca0c2b24be..5c3be7bf0eb 100644
>> --- a/gcc/loop-invariant.c
>> +++ b/gcc/loop-invariant.c
>> @@ -1183,9 +1183,14 @@ find_invariants_insn (rtx_insn *insn, bool 
>> always_reached, bool always_executed)
>> call.  */
>>
>>  static void
>> -find_invariants_bb (basic_block bb, bool always_reached, bool 
>> always_executed)
>> +find_invariants_bb (class loop *loop, basic_block bb, bool 
>> always_reached,
>> +   bool always_executed)
>>  {
>>rtx_insn *insn;
>> +  basic_block preheader = loop_preheader_edge (loop)->src;
>> +
>> +  if (preheader->count > bb->count)
>> +return;
>>
>>FOR_BB_INSNS (bb, insn)
>>  {
>> @@ -1214,8 +1219,7 @@ find_invariants_body (class loop *loop, 
>> basic_block *body,
>>unsigned i;
>>
>>for (i = 0; i < loop->num_nodes; i++)
>> -find_invariants_bb (body[i],
>> -   bitmap_bit_p (always_reached,

Re: [PATCH] Fix loop split incorrect count and probability

2021-10-26 Thread Xionghu Luo via Gcc-patches



On 2021/10/26 21:05, Jan Hubicka wrote:
>>>
> 
>> That said, likely the profile update cannot be done uniformly
>> for all blocks of a loop?
> 
> For the loop:
> 
> for (i = 0; i < n; i = inc (i))
> {
>   if (ga)
> ga = do_something ();
> }
> 
> to:
> 
>   for (i = 0; i < x; i = inc (i))
> {
> if (true)
>  ga = do_something ();
> if (!ga)
>   break;
> }
>   for (; i < n; i = inc (i))
> {
> if (false)
>  ga = do_something ();
> }
> 
> If probability of if (ga) being true is p, then you indeed can scale the
> first loop by p and second loop by 1-p.
> 
> Imagine that loop has n iterations and it takes m iterations for ga to
> become false, then probability of if(ga) is m/n and you get frequencies
> with m=n*(m/n) for first loop and n-m=n*(1-n/m) for second loop.
> 
> Because the conditional becomes constant true, one needs to scale up the
> basic block guarded by the if (true) up by n/m to compensate for the
> change.  With that the udpate should be right.
> Ideally one can bypass scaling of basic block(s) containing
>  ga = do_something () since the scaling first scales down to m/n
> and then scale sup to m/n. Which may not combine to noop.
> Perhaps one wants to have a parameter specifying basic blocks on which
> the scaling is performed while duplicating for this?

Yes.  This is the patch I tried to fix the issue for the case you
pasted for loop split.

https://gcc.gnu.org/pipermail/gcc-patches/2021-August/576566.html


 5< 
 | \  |
 6-- 7|
|  \  |
21  11->3 |
| \   |
19 20--
|
12<
/ |   |
3<-11 16-- 

with the patch, Loop 1's bb {5,6,7,21,20} is scaled to 33% * count, 
Loop 2's bb {12,16} is scaled to 66% * count,
especially, probability of edge 21->19 and 21->20 is fixed to 
(33%, 67%) instead of (100%, INV).


-   [local count: 955630225]:
+   [local count: 315357973]:
   # i_13 = PHI 
   # prephitmp_12 = PHI 
   if (prephitmp_12 != 0)
 goto ; [33.00%]
   else
 goto ; [67.00%]

-   [local count: 315357972]:
+   [local count: 104068130]:
   _2 = do_something ();
   ga = _2;

-   [local count: 955630225]:
+   [local count: 315357973]:
   # prephitmp_5 = PHI 
   i_10 = inc (i_13);
   if (n_7(D) > i_10)
 goto ; [89.00%]
   else
 goto ; [11.00%]

[local count: 105119324]:
   goto ; [100.00%]

-   [local count: 850510901]:
+   [local count: 280668596]:
   if (prephitmp_12 != 0)
-goto ; [100.00%]
+goto ; [33.00%]
   else
-goto ; [INV]
+goto ; [67.00%]

-   [local count: 850510901]:
+   [local count: 280668596]:
   goto ; [100.00%]

-   [count: 0]:
+   [local count: 70429947]:
   # i_23 = PHI 
   # prephitmp_25 = PHI 

-   [local count: 955630225]:
+   [local count: 640272252]:
   # i_15 = PHI 
   # prephitmp_16 = PHI 
   i_22 = inc (i_15);
   if (n_7(D) > i_22)
 goto ; [89.00%]
   else
 goto ; [11.00%]

-   [local count: 850510901]:
+   [local count: 569842305]:
   goto ; [100.00%]

> 
> Honza
>>
>> Richard.

-- 
Thanks,
Xionghu


Go patch committed: Permit compiler directives in parenthesized groups

2021-10-26 Thread Ian Lance Taylor via Gcc-patches
This patch to the Go frontend permits compiler directives in
parenthesized groups.  The original compiler directive support was
only for //line at the start of a line and for //go: comments before
function declarations.  When support was added for //go:notinheap for
types and //go:embed for variables the code did not adapt to permit
spaces before the comment or to permit the comments in var() or type()
groups.  This change corrects those omissions.  Bootstrapped and ran
Go testsuite on x86_64-pc-linux-gnu.  Committed to mainline.

Ian
4571c4689239eda7567a69b0decea52c5f6ce501
diff --git a/gcc/go/gofrontend/MERGE b/gcc/go/gofrontend/MERGE
index affba73e4bb..e7ff6705563 100644
--- a/gcc/go/gofrontend/MERGE
+++ b/gcc/go/gofrontend/MERGE
@@ -1,4 +1,4 @@
-925ace70ac7426c3f8b5c0bfb75aa9601f071de4
+128ea3dce9b8753167f33d0a96bd093a6cbd58b8
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
diff --git a/gcc/go/gofrontend/lex.cc b/gcc/go/gofrontend/lex.cc
index dd66c0209a4..d6909b840a8 100644
--- a/gcc/go/gofrontend/lex.cc
+++ b/gcc/go/gofrontend/lex.cc
@@ -1909,14 +1909,13 @@ Lex::skip_cpp_comment()
   if (saw_error)
 return;
 
-  // Recognize various magic comments at the start of a line.
+  // Recognize various magic comments at the start of a line, preceded
+  // only by spaces or tabs.
 
-  if (lineoff != 2)
-{
-  // Not at the start of the line.  (lineoff == 2 because of the
-  // two characters in "//").
+  // "- 2" for the "//" at the start of the comment.
+  for (const char* psp = this->linebuf_; psp < p - 2; psp++)
+if (*psp != ' ' && *psp != '\t')
   return;
-}
 
   while (pend > p
 && (pend[-1] == ' ' || pend[-1] == '\t'
diff --git a/gcc/go/gofrontend/lex.h b/gcc/go/gofrontend/lex.h
index 75c8429b68f..1c4cc5bd2a1 100644
--- a/gcc/go/gofrontend/lex.h
+++ b/gcc/go/gofrontend/lex.h
@@ -420,6 +420,12 @@ class Lex
 std::swap(*embeds, this->embeds_);
   }
 
+  // Clear any go:embed patterns seen so far.  This is used for
+  // erroneous cases.
+  void
+  clear_embeds()
+  { this->embeds_.clear(); }
+
   // Return whether the identifier NAME should be exported.  NAME is a
   // mangled name which includes only ASCII characters.
   static bool
diff --git a/gcc/go/gofrontend/parse.cc b/gcc/go/gofrontend/parse.cc
index e43b5f21448..cc197e5eb35 100644
--- a/gcc/go/gofrontend/parse.cc
+++ b/gcc/go/gofrontend/parse.cc
@@ -1307,46 +1307,14 @@ void
 Parse::declaration()
 {
   const Token* token = this->peek_token();
-
-  unsigned int pragmas = this->lex_->get_and_clear_pragmas();
-  if (pragmas != 0
-  && !token->is_keyword(KEYWORD_FUNC)
-  && !token->is_keyword(KEYWORD_TYPE))
-go_warning_at(token->location(), 0,
- "ignoring magic comment before non-function");
-
-  std::vector* embeds = NULL;
-  if (this->lex_->has_embeds())
-{
-  embeds = new(std::vector);
-  this->lex_->get_and_clear_embeds(embeds);
-
-  if (!this->gogo_->current_file_imported_embed())
-   {
- go_error_at(token->location(),
- "invalid go:embed: missing import %");
- delete embeds;
- embeds = NULL;
-   }
-  if (!token->is_keyword(KEYWORD_VAR))
-   {
- go_error_at(token->location(), "misplaced go:embed directive");
- if (embeds != NULL)
-   {
- delete embeds;
- embeds = NULL;
-   }
-   }
-}
-
   if (token->is_keyword(KEYWORD_CONST))
 this->const_decl();
   else if (token->is_keyword(KEYWORD_TYPE))
-this->type_decl(pragmas);
+this->type_decl();
   else if (token->is_keyword(KEYWORD_VAR))
-this->var_decl(embeds);
+this->var_decl();
   else if (token->is_keyword(KEYWORD_FUNC))
-this->function_decl(pragmas);
+this->function_decl();
   else
 {
   go_error_at(this->location(), "expected declaration");
@@ -1367,8 +1335,7 @@ Parse::declaration_may_start_here()
 // Decl = P | "(" [ List ] ")" .
 
 void
-Parse::decl(void (Parse::*pfn)(unsigned int, std::vector*),
-   unsigned int pragmas, std::vector* embeds)
+Parse::decl(void (Parse::*pfn)())
 {
   if (this->peek_token()->is_eof())
 {
@@ -1378,15 +1345,18 @@ Parse::decl(void (Parse::*pfn)(unsigned int, 
std::vector*),
 }
 
   if (!this->peek_token()->is_op(OPERATOR_LPAREN))
-(this->*pfn)(pragmas, embeds);
+(this->*pfn)();
   else
 {
-  if (pragmas != 0)
-   go_warning_at(this->location(), 0,
- "ignoring magic % comment before group");
-  if (embeds != NULL)
+  if (this->lex_->get_and_clear_pragmas() != 0)
go_error_at(this->location(),
-   "ignoring % comment before group");
+   "ignoring compiler directive before group");
+  if (this->lex_->has_embeds())
+   {
+ this->lex_->clear_embeds();
+ go_error_at(this->location(),
+ "ignoring % comment before group");
+  

[PATCH] assert_streq: add newlines to failure message

2021-10-26 Thread David Malcolm via Gcc-patches
Adding newlines so that the two strings line up makes string equality
failures considerably easier to read.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
OK for trunk?

gcc/ChangeLog:
* selftest.c (assert_streq): Add newlines when emitting non-equal
non-NULL strings.

Signed-off-by: David Malcolm 
---
 gcc/selftest.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/selftest.c b/gcc/selftest.c
index 8f1cde0cc19..0db56f3b240 100644
--- a/gcc/selftest.c
+++ b/gcc/selftest.c
@@ -89,7 +89,7 @@ assert_streq (const location &loc,
if (strcmp (val1, val2) == 0)
  pass (loc, "ASSERT_STREQ");
else
- fail_formatted (loc, "ASSERT_STREQ (%s, %s) val1=\"%s\" val2=\"%s\"",
+ fail_formatted (loc, "ASSERT_STREQ (%s, %s)\n val1=\"%s\"\n 
val2=\"%s\"\n",
  desc_val1, desc_val2, val1, val2);
   }
 }
-- 
2.26.3



[PATCH] gcc/Makefile.in: fix bug in gengtype link rule

2021-10-26 Thread David Malcolm via Gcc-patches
Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
OK for trunk?

gcc/ChangeLog:
* Makefile.in: Fix syntax for reference to LIBDEPS in
gengtype link rule.

Signed-off-by: David Malcolm 
---
 gcc/Makefile.in | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 658093c11c0..842e57dff87 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -2964,7 +2964,7 @@ build/gengtype$(build_exeext) : build/gengtype-lex.o 
build/gengtype-parse.o \
 gengtype$(exeext) : gengtype.o gengtype-lex.o gengtype-parse.o \
   gengtype-state.o errors.o $(LIBDEPS)
+$(LINKER) $(ALL_LINKERFLAGS) $(LDFLAGS) -o $@ \
-   $(filter-out ($LIBDEPS), $^) $(LIBS)
+   $(filter-out $(LIBDEPS), $^) $(LIBS)
 
 # Rule for the generator programs:
 $(genprog:%=build/gen%$(build_exeext)): build/gen%$(build_exeext): 
build/gen%.o $(BUILD_LIBDEPS)
-- 
2.26.3



Re: [PATCH] c++: quadratic constexpr behavior for left-assoc logical exprs [PR102780]

2021-10-26 Thread Jakub Jelinek via Gcc-patches
On Tue, Oct 26, 2021 at 05:07:43PM -0400, Patrick Palka wrote:
> The performance impact of the other calls to cxx_eval_outermost_const_expr
> from p_c_e_1 is probably already mostly mitigated by the constexpr call
> cache and the fact that we try to evaluate all calls to constexpr
> functions during cp_fold_function anyway (at least with -O).  So trial

constexpr function bodies don't go through cp_fold_function (intentionally,
so that we don't optimize away UB), the bodies are copied before the trees of 
the
normal copy are folded.

Jakub



Re: [PATCH] c++: quadratic constexpr behavior for left-assoc logical exprs [PR102780]

2021-10-26 Thread Patrick Palka via Gcc-patches
On Tue, 26 Oct 2021, Jason Merrill wrote:

> On Tue, Oct 26, 2021 at 2:46 PM Jakub Jelinek via Gcc-patches 
>  wrote:
>   On Tue, Oct 26, 2021 at 01:44:18PM -0400, Patrick Palka via Gcc-patches 
> wrote:
>   > In the testcase below the two left fold expressions each expand into a
>   > logical expression with 1024 terms, for which potential_const_expr_1
>   > takes more than a minute to return true.  This is because p_c_e_1
>   > performs trial evaluation of the first operand of a &&/|| in order to
>   > determine whether to consider its second operand.  And because the
>   > expanded expression is left-associated, this trial evaluation causes
>   > p_c_e_1 to be quadratic in the number of terms of the expression.
>   >
>   > This patch fixes this quadratic behavior by making p_c_e_1 recurse 
> only
>   > into the first operand of a &&/|| when checking for potentiality.  
> This
>   > means we'll consider more non-constant expressions as potentially
>   > constant compared to the trial evaluation approach, but that should be
>   > harmless as later constexpr evaluation of the expression will deem it
>   > non-constant anyway.
> 
> 
> Fair, especially now that it seems that C++23 is removing the diagnostic for 
> a constexpr function that can't produce a constant expression.  And as more 
> functions become constexpr, the trial evaluation gets
> more expensive to get to the point of failure.  I wonder if we want to stop 
> calling cxx_eval_outermost_constant_expression from pot_c_e_1 entirely, only 
> check whether the operand is already constant.

The performance impact of the other calls to cxx_eval_outermost_const_expr
from p_c_e_1 is probably already mostly mitigated by the constexpr call
cache and the fact that we try to evaluate all calls to constexpr
functions during cp_fold_function anyway (at least with -O).  So trial
evaluation of e.g. the condition of a for/while loop that contains an
expensive constexpr call shouldn't cause the compiler to perform more
work overall IIUC.

> 
>   For very large && or || expressions (but not mixed or with ! etc.) 
> another
>   possibility would be to check if the lhs or rhs have the same code and 
> if
>   so, linearize the trees and recurse only on the leafs (trees with other
>   TREE_CODE) and stop when first expression isn't equal to tmp.
> 
> 
> This would be good if we wanted to keep the evaluation.
>  
> Jason
> 
> 


Re: [PATCH, v2] c++: Diagnose taking address of an immediate member function [PR102753]

2021-10-26 Thread Jason Merrill via Gcc-patches

On 10/21/21 07:17, Jakub Jelinek wrote:

On Wed, Oct 20, 2021 at 07:16:44PM -0400, Jason Merrill wrote:

or an unevaluated operand, or a subexpression of an immediate invocation.

Hmm...that suggests that in consteval23.C, bar(foo) should also be OK,


Ouch.  That opens a big can of worms, see below.


because 'foo' is a subexpression of an immediate invocation.  We can handle
that by...


+
+bool
+in_immediate_context ()
+{
+  return (cp_unevaluated_operand != 0
+ || (current_function_decl != NULL_TREE
+ && DECL_IMMEDIATE_FUNCTION_P (current_function_decl))
+ || (current_binding_level->kind == sk_function_parms
+ && current_binding_level->immediate_fn_ctx_p)
+ || in_consteval_if_p);
+}
+
   /* Return true if a call to FN with number of arguments NARGS
  is an immediate invocation.  */
@@ -9451,6 +9459,12 @@ build_over_call (struct z_candidate *can
   }
 /* Default arguments */
+  bool save_in_consteval_if_p = in_consteval_if_p;
+  /* If the call is immediate function invocation, make sure
+ taking address of immediate functions is allowed in default
+ arguments.  */
+  if (immediate_invocation_p (STRIP_TEMPLATE (fn), nargs))
+in_consteval_if_p = true;


...moving this earlier in build_over_call, e.g. shortly after
not_really_used:

You can also use make_temp_override.


make_temp_override can't be used, because in_consteval_if_p is BOOL_BITFIELD
under the hood and binding a reference to a bit-field is not valid.

In the patch I've used a specialized in_consteval_if_p_temp_override class
instead.  And it either needs to be restored after all the argument
processing, because we call immediate_invocation_p again and because of
the in_consteval_if_p override it is then false (the patch calls icip.reset
()), or we'd need to remember the result of the first immediate_invocation_p
call in some bool temporary and reuse that later.
Updated patch below.

This fixes the static_assert (bar (foo) == 42); case newly added to
consteval23.C, but unfortunately that isn't enough.
Consider incremental patch to the testcase:
--- gcc/testsuite/g++.dg/cpp2a/consteval23.C.jj 2021-10-21 11:32:48.570586881 
+0200
+++ gcc/testsuite/g++.dg/cpp2a/consteval23.C2021-10-21 12:47:42.281769906 
+0200
@@ -2,6 +2,7 @@
  // { dg-do compile { target c++20 } }
  
  consteval int foo () { return 42; }

+constexpr auto baz (int (*fn) ()) { return fn; }
  
  consteval int

  bar (int (*fn) () = foo)
@@ -11,3 +12,5 @@ bar (int (*fn) () = foo)
  
  static_assert (bar () == 42);

  static_assert (bar (foo) == 42);
+static_assert (bar (&foo) == 42);
+static_assert (bar (baz (&foo)) == 42);

If even the last two lines are supposed to be valid, then I'm afraid we
need to reconsider the errors performed in cp_build_addr_expr_1, because
when that is called, we often don't know if we are in a subexpression
of immediate invocation or not, whether it is because we are in template,
something non-type dependent but other argumeents of the function call are
type dependent, or because it is an overloaded function and some overloads
are consteval and others are not and we are deep in a middle of some
argument parsing and other arguments haven't been parsed at all yet, etc.

I'm afraid I don't have a good idea where to move that diagnostic to though,
it would need to be done somewhere where we are certain we aren't in a
subexpression of immediate invocation.  Given statement expressions, even
diagnostics after parsing whole statements might not be good enough, e.g.
void
qux ()
{
   static_assert (bar (({ constexpr auto a = 1; foo; })) == 42);
}


I suppose (a wrapper for) fold_build_cleanup_point_expr would be a 
possible place to check, since that's called for full-expressions.



But if we e.g. diagnose it somewhere after parsing the whole function (and
for templates only after instantiating it) plus where we handle
non-automatic variable initializers etc., will we handle all spots where
we should diagnose it?  It better should be done before cp_fold...



In whatever spot is right doing something similar to what
cp_walk_tree find_immediate_fndecl, does, except probably error out on each
case we see (with the right locus) instead of just finding the first match.

Note, trying clang++ on godbolt on the consteval23.C testcase with the above
ammendments, it diagnoses the = foo default argument, that seems to be a
clang bug to me, but if I comment out the
static_assert (bar () == 42); test and comment out the default argument
bar (int (*fn) () /* = foo */)
then it succeeds.  It even accepts the above qux with statement expression.


This patch is OK.


2021-10-21  Jakub Jelinek  

PR c++/102753
* cp-tree.h (saved_scope): Document that consteval_if_p member
is also set while processing immediate invocation.
(in_immediate_context): Declare.
* call.c (in_immediate_context): New function.
(immediate_invocation_p): Use it.
(struct in_cons

Re: [committed] MAINTAINERS: Add myself as a VAX port maintainer

2021-10-26 Thread Jeff Law via Gcc-patches




On 10/26/2021 1:06 PM, Jakub Jelinek via Gcc-patches wrote:

On Tue, Oct 26, 2021 at 06:51:09PM +0100, Maciej W. Rozycki wrote:

On Tue, 26 Oct 2021, Martin Liška wrote:


Please remove your name from Write After Approval:

$ make check -k RUNTESTFLAGS="maintainers.exp"
...
Running /home/marxin/Programming/gcc/gcc/testsuite/gcc.src/maintainers.exp ...
Redundant in write approval: Maciej W. Rozycki
FAIL: maintainers-verify.sh

  Hmm, that seems like useless policy to me; the "Write After Approval"
section used to be an exhaustive, alphabetically sorted list of people
with commit rights, and it still is with our sister projects binutils and
GDB.  By removing entries for people listed elsewhere (which may be across
several places anyway) you require one to look for information scattered
around the file.  And contributors to remember adding themselves back as
they step down from maintainer posts.

  Besides, my e-mail addresses listed are different in the two sections,
and that's been deliberate.  Also what about people carrying identical
full names?  I insist on using my middle initial for a reason.

  It seems like there's been hardly any discussion about this matter around
the time this stuff was added with commit bddcac9d1c32 ("[contrib] Add
contrib/maintainers-verify.sh").  What was the actual motivation behind
that change?

That was only addition of a script and testcase to verify what has been done
in MAINTAINERS since forever.
Just look at all the commits to remove redundant entries from Write After
Approval, e.g.
https://gcc.gnu.org/legacy-ml/gcc-patches/2003-05/msg00366.html
All maintainers or reviewers (global or specific) have write after approval
rights for areas they don't maintain.

I went ahead and fixed Maciej's entries in the obvious way.

jeff


[PATCH, committed] PR fortran/86551 - ICE on invalid code with select type

2021-10-26 Thread Harald Anlauf via Gcc-patches
Dear Fortranners,

as has been validated by others before and checked again, the underlying
issue of this PR has been fixed before by an unknown commit.

To ensure that it doesn't pop up again, and as suggested in the PR,
I've packaged the testcase and committed as obvious.

Thanks,
Harald


commit 0ec53a3df536f83ec72ef25b045768c06c363f86
Author: Harald Anlauf 
Date:   Tue Oct 26 22:22:36 2021 +0200

Fortran: error recovery on invalid code with SELECT TYPE

gcc/testsuite/ChangeLog:

PR fortran/86551
* gfortran.dg/pr86551.f90: New test to verify that PR86551 remains
fixed.

diff --git a/gcc/testsuite/gfortran.dg/pr86551.f90 
b/gcc/testsuite/gfortran.dg/pr86551.f90
new file mode 100644
index 000..d96e17a1884
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr86551.f90
@@ -0,0 +1,12 @@
+! { dg-do compile }
+! PR fortran/86551 - ICE on invalid code with select type / end select type
+
+subroutine b
+  type :: t1
+  end type t1
+  class(t1) :: c2
+  select type (d => c2)
+  end select type   ! { dg-error "Syntax error" }
+end ! { dg-error "END SELECT statement expected" }
+
+! { dg-prune-output "Unexpected end of file" }



Re: [PATCH] c++: quadratic constexpr behavior for left-assoc logical exprs [PR102780]

2021-10-26 Thread Jakub Jelinek via Gcc-patches
On Tue, Oct 26, 2021 at 03:29:02PM -0400, Jason Merrill wrote:
> On Tue, Oct 26, 2021 at 2:46 PM Jakub Jelinek via Gcc-patches <
> gcc-patches@gcc.gnu.org> wrote:
> 
> > On Tue, Oct 26, 2021 at 01:44:18PM -0400, Patrick Palka via Gcc-patches
> > wrote:
> > > In the testcase below the two left fold expressions each expand into a
> > > logical expression with 1024 terms, for which potential_const_expr_1
> > > takes more than a minute to return true.  This is because p_c_e_1
> > > performs trial evaluation of the first operand of a &&/|| in order to
> > > determine whether to consider its second operand.  And because the
> > > expanded expression is left-associated, this trial evaluation causes
> > > p_c_e_1 to be quadratic in the number of terms of the expression.
> > >
> > > This patch fixes this quadratic behavior by making p_c_e_1 recurse only
> > > into the first operand of a &&/|| when checking for potentiality.  This
> > > means we'll consider more non-constant expressions as potentially
> > > constant compared to the trial evaluation approach, but that should be
> > > harmless as later constexpr evaluation of the expression will deem it
> > > non-constant anyway.
> >
> 
> Fair, especially now that it seems that C++23 is removing the diagnostic
> for a constexpr function that can't produce a constant expression.  And as
> more functions become constexpr, the trial evaluation gets more expensive
> to get to the point of failure.  I wonder if we want to stop calling
> cxx_eval_outermost_constant_expression from pot_c_e_1 entirely, only check
> whether the operand is already constant.

That would make p_c_e_1 surely much faster, but on the other side we would
stop diagnosing some functions that can't be valid constant expressions
no matter with what arguments they are called.

Another option would be to do these cxx_eval_outermost_constant_expression
calls from p_c_e_1 only in a cheap mode, i.e. with constexpr_ops_limit
temporarily lowered to something very small, 100-ish or 1000-ish or so, so it 
would
handle small expressions like before, but would quietly punt on large
expressions.

Jakub



Re: [PING PATCH v4 2/3] rs6000: Support SSE4.1 "round" intrinsics

2021-10-26 Thread Paul A. Clarke via Gcc-patches
Patches 1/3 and 3/3 have been committed.
This is only a ping for 2/3.

On Mon, Oct 18, 2021 at 08:15:11PM -0500, Paul A. Clarke via Gcc-patches wrote:
> Suppress exceptions (when specified), by saving, manipulating, and
> restoring the FPSCR.  Similarly, save, set, and restore the floating-point
> rounding mode when required.
> 
> No attempt is made to optimize writing the FPSCR (by checking if the new
> value would be the same), other than using lighter weight instructions
> when possible. Note that explicit instruction scheduling "barriers" are
> added to prevent floating-point computations from being moved before or
> after the explicit FPSCR manipulations.  (That these are required has
> been reported as an issue in GCC: PR102783.)
> 
> The scalar versions naively use the parallel versions to compute the
> single scalar result and then construct the remainder of the result.
> 
> Of minor note, the values of _MM_FROUND_TO_NEG_INF and _MM_FROUND_TO_ZERO
> are swapped from the corresponding values on x86 so as to match the
> corresponding rounding mode values in the Power ISA.
> 
> Move implementations of _mm_ceil* and _mm_floor* into _mm_round*, and
> convert _mm_ceil* and _mm_floor* into macros. This matches the current
> analogous implementations in config/i386/smmintrin.h.
> 
> Function signatures match the analogous functions in config/i386/smmintrin.h.
> 
> Add tests for _mm_round_pd, _mm_round_ps, _mm_round_sd, _mm_round_ss,
> modeled after the very similar "floor" and "ceil" tests.
> 
> Include basic tests, plus tests at the boundaries for floating-point
> representation, positive and negative, test all of the parameterized
> rounding modes as well as the C99 rounding modes and interactions
> between the two.
> 
> Exceptions are not explicitly tested.
> 
> 2021-10-18  Paul A. Clarke  
> 
> gcc
>   * config/rs6000/smmintrin.h (_mm_round_pd, _mm_round_ps,
>   _mm_round_sd, _mm_round_ss, _MM_FROUND_TO_NEAREST_INT,
>   _MM_FROUND_TO_ZERO, _MM_FROUND_TO_POS_INF, _MM_FROUND_TO_NEG_INF,
>   _MM_FROUND_CUR_DIRECTION, _MM_FROUND_RAISE_EXC, _MM_FROUND_NO_EXC,
>   _MM_FROUND_NINT, _MM_FROUND_FLOOR, _MM_FROUND_CEIL, _MM_FROUND_TRUNC,
>   _MM_FROUND_RINT, _MM_FROUND_NEARBYINT): New.
>   * config/rs6000/smmintrin.h (_mm_ceil_pd, _mm_ceil_ps, _mm_ceil_sd,
>   _mm_ceil_ss, _mm_floor_pd, _mm_floor_ps, _mm_floor_sd, _mm_floor_ss):
>   Convert from function to macro.
> 
> gcc/testsuite
>   * gcc.target/powerpc/sse4_1-round3.h: New.
>   * gcc.target/powerpc/sse4_1-roundpd.c: New.
>   * gcc.target/powerpc/sse4_1-roundps.c: New.
>   * gcc.target/powerpc/sse4_1-roundsd.c: New.
>   * gcc.target/powerpc/sse4_1-roundss.c: New.
> ---
>  gcc/config/rs6000/smmintrin.h | 292 ++
>  .../gcc.target/powerpc/sse4_1-round3.h|  81 +
>  .../gcc.target/powerpc/sse4_1-roundpd.c   | 143 +
>  .../gcc.target/powerpc/sse4_1-roundps.c   |  98 ++
>  .../gcc.target/powerpc/sse4_1-roundsd.c   | 256 +++
>  .../gcc.target/powerpc/sse4_1-roundss.c   | 208 +
>  6 files changed, 1014 insertions(+), 64 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-round3.h
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundps.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundsd.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundss.c
> 
> diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
> index 90ce03d22709..6bb03e6e20ac 100644
> --- a/gcc/config/rs6000/smmintrin.h
> +++ b/gcc/config/rs6000/smmintrin.h
> @@ -42,6 +42,234 @@
>  #include 
>  #include 
>  
> +/* Rounding mode macros. */
> +#define _MM_FROUND_TO_NEAREST_INT   0x00
> +#define _MM_FROUND_TO_ZERO  0x01
> +#define _MM_FROUND_TO_POS_INF   0x02
> +#define _MM_FROUND_TO_NEG_INF   0x03
> +#define _MM_FROUND_CUR_DIRECTION0x04
> +
> +#define _MM_FROUND_NINT  \
> +  (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_RAISE_EXC)
> +#define _MM_FROUND_FLOOR \
> +  (_MM_FROUND_TO_NEG_INF | _MM_FROUND_RAISE_EXC)
> +#define _MM_FROUND_CEIL  \
> +  (_MM_FROUND_TO_POS_INF | _MM_FROUND_RAISE_EXC)
> +#define _MM_FROUND_TRUNC \
> +  (_MM_FROUND_TO_ZERO | _MM_FROUND_RAISE_EXC)
> +#define _MM_FROUND_RINT  \
> +  (_MM_FROUND_CUR_DIRECTION | _MM_FROUND_RAISE_EXC)
> +#define _MM_FROUND_NEARBYINT \
> +  (_MM_FROUND_CUR_DIRECTION | _MM_FROUND_NO_EXC)
> +
> +#define _MM_FROUND_RAISE_EXC0x00
> +#define _MM_FROUND_NO_EXC   0x08
> +
> +extern __inline __m128d
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_round_pd (__m128d __A, int __rounding)
> +{
> +  __v2df __r;
> +  union {
> +double __fr;
> +long long __fpscr;
> +  } __enables_save, __fpscr_save;
> +
> +  if (__rounding & _

Re: [PATCH] PR fortran/102956 - PDT KIND and LEN type parameters are mutually exclusive

2021-10-26 Thread Tobias Burnus

On 26.10.21 21:40, Harald Anlauf via Fortran wrote:


we were missing a conflict check for PDT KIND and LEN type parameters:


F2018: 7.5.3.1  Type parameter definition statement

R732  type-param-def-stmt is
   integer-type-spec, type-param-attr-spec :: type-param-decl-list

R734  type-param-attr-spec  is KIND or LEN

(3) The type-param-attr-spec explicitly specifies whether a type parameter is a 
kind parameter or a length parameter.

Thus the KIND and LEN attributes are mutually exclusive.


The attached trivial patch remedies that.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?


LGTM, thanks.

Tobias

-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


Re: [PATCH] middle-end: fix de-optimizations with bitclear patterns on signed values

2021-10-26 Thread Jonathan Wakely via Gcc-patches
On Tue, 26 Oct 2021 at 20:40, Jakub Jelinek wrote:

> On Tue, Oct 26, 2021 at 08:35:40PM +0100, Jonathan Wakely wrote:
> > We can change __cmp_cat::type if that would result in better code. I
> picked
> > signed char because we only need two bits, and preferably have a signed
> > type as it simplifies some things. Would int make more sense? Or int:2 ?
>
> I think signed char is fine (and changing it would be an ABI change),


We haven't committed to a C++20 ABI yet, so changes are possible. If we
don't need to change, that's obviously preferable.


> int is
> unnecessarily large and int:2 would be certainly slower (and harder).
>

OK, signed char it is then.


[PATCH] PR fortran/102956 - PDT KIND and LEN type parameters are mutually exclusive

2021-10-26 Thread Harald Anlauf via Gcc-patches
Dear Fortranners,

we were missing a conflict check for PDT KIND and LEN type parameters:


F2018: 7.5.3.1  Type parameter definition statement

R732  type-param-def-stmt is
  integer-type-spec, type-param-attr-spec :: type-param-decl-list

R734  type-param-attr-spec  is KIND or LEN

(3) The type-param-attr-spec explicitly specifies whether a type parameter is a 
kind parameter or a length parameter.

Thus the KIND and LEN attributes are mutually exclusive.


The attached trivial patch remedies that.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

Thanks,
Harald

Fortran: [PDT] KIND and LEN type parameters are mutually exclusive

gcc/fortran/ChangeLog:

	PR fortran/102956
	* symbol.c (gfc_check_conflict): Add conflict check for PDT KIND
	and LEN type parameters.

gcc/testsuite/ChangeLog:

	PR fortran/102956
	* gfortran.dg/pdt_32.f03: New test.

diff --git a/gcc/fortran/symbol.c b/gcc/fortran/symbol.c
index 2c4acd5abe1..c77f3f84962 100644
--- a/gcc/fortran/symbol.c
+++ b/gcc/fortran/symbol.c
@@ -720,6 +720,7 @@ gfc_check_conflict (symbol_attribute *attr, const char *name, locus *where)
   conf (pdt_len, pointer)
   conf (pdt_len, dimension)
   conf (pdt_len, codimension)
+  conf (pdt_len, pdt_kind)

   if (attr->access == ACCESS_PRIVATE)
 {
diff --git a/gcc/testsuite/gfortran.dg/pdt_32.f03 b/gcc/testsuite/gfortran.dg/pdt_32.f03
new file mode 100644
index 000..f8d40410828
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pdt_32.f03
@@ -0,0 +1,17 @@
+! { dg-do compile }
+! PR fortran/102956
+! PDT KIND and LEN type parameters are mutually exclusive (F2018:R734)
+!
+module m
+  type :: good_pdt (k,l)
+ integer, kind   :: k = 1
+ integer, len:: l = 1
+ character(kind=k,len=l) :: c
+  end type good_pdt
+
+  type :: bad_pdt (k,l)   ! { dg-error "does not have a component" }
+ integer, kind, len  :: k = 1 ! { dg-error "attribute conflicts with" }
+ integer, len, kind  :: l = 1 ! { dg-error "attribute conflicts with" }
+ character(kind=k,len=l) :: c ! { dg-error "has not been declared" }
+  end type bad_pdt
+end


Re: [PATCH] middle-end: fix de-optimizations with bitclear patterns on signed values

2021-10-26 Thread Jakub Jelinek via Gcc-patches
On Tue, Oct 26, 2021 at 08:35:40PM +0100, Jonathan Wakely wrote:
> We can change __cmp_cat::type if that would result in better code. I picked
> signed char because we only need two bits, and preferably have a signed
> type as it simplifies some things. Would int make more sense? Or int:2 ?

I think signed char is fine (and changing it would be an ABI change), int is
unnecessarily large and int:2 would be certainly slower (and harder).

Jakub



Re: [PATCH] middle-end: fix de-optimizations with bitclear patterns on signed values

2021-10-26 Thread Jonathan Wakely via Gcc-patches
On Tue, 26 Oct 2021 at 14:36, Jakub Jelinek  wrote:

> On Tue, Oct 26, 2021 at 03:21:55PM +0200, Richard Biener wrote:
> > On Tue, 26 Oct 2021, Jakub Jelinek wrote:
> >
> > > On Tue, Oct 26, 2021 at 03:13:29PM +0200, Richard Biener wrote:
> > > > try
> > > >   auto c = ...;
> > > >   signed char c2 = c;
> > > >   return c2 >= ...
> > > > then
> > >
> > > That won't work, at least when using , which is what we with
> the
> > > optimization want to deal with primarily.
> > > Because std::partial_ordering etc. aren't implicitly nor explicitly
> > > convertible to int or signed char etc.
> > > Sure, one could in the testcase define its own std::strong_ordering
> etc.
> > > and define a conversion operator for it...
> >
> > So how do we end up with the signed char case in the first place?
> > Is the frontend using a type that's target dependent?
>
>  uses explicitly signed char:
> namespace std
> {
>   // [cmp.categories], comparison category types
>   namespace __cmp_cat
>   {
> using type = signed char;
> enum class _Ord : type { equivalent = 0, less = -1, greater = 1 };
> enum class _Ncmp : type { _Unordered = 2 };
> ...
> and __cmp_cat::type is what is used as type of _M_value of std::*_ordering
> -fsigned-char vs. -funsigned-char make no difference on the testcases on
> x86, but as mentioned in
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94589#c24
> some target decisions like load_extend_op uses in fold-const.c can affect
> it.  See https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570714.html
>
>

We can change __cmp_cat::type if that would result in better code. I picked
signed char because we only need two bits, and preferably have a signed
type as it simplifies some things. Would int make more sense? Or int:2 ?


Re: [PATCH] c++: quadratic constexpr behavior for left-assoc logical exprs [PR102780]

2021-10-26 Thread Jason Merrill via Gcc-patches
On Tue, Oct 26, 2021 at 2:46 PM Jakub Jelinek via Gcc-patches <
gcc-patches@gcc.gnu.org> wrote:

> On Tue, Oct 26, 2021 at 01:44:18PM -0400, Patrick Palka via Gcc-patches
> wrote:
> > In the testcase below the two left fold expressions each expand into a
> > logical expression with 1024 terms, for which potential_const_expr_1
> > takes more than a minute to return true.  This is because p_c_e_1
> > performs trial evaluation of the first operand of a &&/|| in order to
> > determine whether to consider its second operand.  And because the
> > expanded expression is left-associated, this trial evaluation causes
> > p_c_e_1 to be quadratic in the number of terms of the expression.
> >
> > This patch fixes this quadratic behavior by making p_c_e_1 recurse only
> > into the first operand of a &&/|| when checking for potentiality.  This
> > means we'll consider more non-constant expressions as potentially
> > constant compared to the trial evaluation approach, but that should be
> > harmless as later constexpr evaluation of the expression will deem it
> > non-constant anyway.
>

Fair, especially now that it seems that C++23 is removing the diagnostic
for a constexpr function that can't produce a constant expression.  And as
more functions become constexpr, the trial evaluation gets more expensive
to get to the point of failure.  I wonder if we want to stop calling
cxx_eval_outermost_constant_expression from pot_c_e_1 entirely, only check
whether the operand is already constant.

For very large && or || expressions (but not mixed or with ! etc.) another
> possibility would be to check if the lhs or rhs have the same code and if
> so, linearize the trees and recurse only on the leafs (trees with other
> TREE_CODE) and stop when first expression isn't equal to tmp.
>

This would be good if we wanted to keep the evaluation.

Jason


[committed] [PR102842] LRA: Consider all outputs in generation of matching reloads

2021-10-26 Thread Vladimir Makarov via Gcc-patches

The following patch fixes

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102842

As the patch touches a sensitive LRA code, the patch was bootstrapped 
tested on x86-64, aarch64, and ppc64.


I've committed the patch only in master branch.  Later (after some 
observation), I'll commit it into gcc-10 and gcc-11 branches.


commit 8c59f4118357789cfa8df2cf0d3ecb61be7e9041
Author: Vladimir N. Makarov 
Date:   Tue Oct 26 14:03:42 2021 -0400

[PR102842] Consider all outputs in generation of matching reloads

Without considering all output insn operands (not only processed
before), in rare cases LRA can use the same hard register for
different outputs of the insn on different assignment subpasses.  The
patch fixes the problem.

gcc/ChangeLog:

PR rtl-optimization/102842
* lra-constraints.c (match_reload): Ignore out in checking values
of outs.
(curr_insn_transform): Collect outputs before doing reloads of operands.

gcc/testsuite/ChangeLog:

PR rtl-optimization/102842
* g++.target/arm/pr102842.C: New test.

diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
index 8f75125fc2e..0195b4fb9c3 100644
--- a/gcc/lra-constraints.c
+++ b/gcc/lra-constraints.c
@@ -1102,7 +1102,7 @@ match_reload (signed char out, signed char *ins, signed char *outs,
 	  for (i = 0; outs[i] >= 0; i++)
 	{
 	  rtx other_out_rtx = *curr_id->operand_loc[outs[i]];
-	  if (REG_P (other_out_rtx)
+	  if (outs[i] != out && REG_P (other_out_rtx)
 		  && (regno_val_use_in (REGNO (in_rtx), other_out_rtx)
 		  != NULL_RTX))
 		{
@@ -4382,7 +4382,10 @@ curr_insn_transform (bool check_only_p)
   }
 
   n_outputs = 0;
-  outputs[0] = -1;
+  for (i = 0; i < n_operands; i++)
+if (curr_static_id->operand[i].type == OP_OUT)
+  outputs[n_outputs++] = i;
+  outputs[n_outputs] = -1;
   for (i = 0; i < n_operands; i++)
 {
   int regno;
@@ -4457,8 +4460,6 @@ curr_insn_transform (bool check_only_p)
 		 lra-lives.c.  */
 		  match_reload (i, goal_alt_matched[i], outputs, goal_alt[i], &before,
 &after, TRUE);
-		  outputs[n_outputs++] = i;
-		  outputs[n_outputs] = -1;
 		}
 	  continue;
 	}
@@ -4636,14 +4637,6 @@ curr_insn_transform (bool check_only_p)
 	   process_alt_operands decides that it is possible.  */
 	gcc_unreachable ();
 
-  /* Memorise processed outputs so that output remaining to be processed
-	 can avoid using the same register value (see match_reload).  */
-  if (curr_static_id->operand[i].type == OP_OUT)
-	{
-	  outputs[n_outputs++] = i;
-	  outputs[n_outputs] = -1;
-	}
-
   if (optional_p)
 	{
 	  rtx reg = op;
diff --git a/gcc/testsuite/g++.target/arm/pr102842.C b/gcc/testsuite/g++.target/arm/pr102842.C
new file mode 100644
index 000..a2bac66091a
--- /dev/null
+++ b/gcc/testsuite/g++.target/arm/pr102842.C
@@ -0,0 +1,30 @@
+/* PR rtl-optimization/102842 */
+/* { dg-do compile } */
+/* { dg-options "-fPIC  -O2 -fno-omit-frame-pointer -mthumb -march=armv7-a+fp" } */
+
+struct Plane {
+  using T = float;
+  T *Row();
+};
+using ImageF = Plane;
+long long Mirror_x;
+struct EnsurePaddingInPlaceRowByRow {
+  void Process() {
+switch (strategy_) {
+case kSlow:
+  float *row = img_.Row();
+  long long xsize = x1_;
+  while (Mirror_x >= xsize)
+if (Mirror_x)
+  Mirror_x = 2 * xsize - 1;
+  *row = Mirror_x;
+}
+  }
+  ImageF img_;
+  unsigned x1_;
+  enum { kSlow } strategy_;
+};
+void FinalizeImageRect() {
+  EnsurePaddingInPlaceRowByRow ensure_padding;
+  ensure_padding.Process();
+}


Re: [committed] MAINTAINERS: Add myself as a VAX port maintainer

2021-10-26 Thread Jakub Jelinek via Gcc-patches
On Tue, Oct 26, 2021 at 06:51:09PM +0100, Maciej W. Rozycki wrote:
> On Tue, 26 Oct 2021, Martin Liška wrote:
> 
> > Please remove your name from Write After Approval:
> > 
> > $ make check -k RUNTESTFLAGS="maintainers.exp"
> > ...
> > Running /home/marxin/Programming/gcc/gcc/testsuite/gcc.src/maintainers.exp 
> > ...
> > Redundant in write approval: Maciej W. Rozycki
> > FAIL: maintainers-verify.sh
> 
>  Hmm, that seems like useless policy to me; the "Write After Approval" 
> section used to be an exhaustive, alphabetically sorted list of people 
> with commit rights, and it still is with our sister projects binutils and 
> GDB.  By removing entries for people listed elsewhere (which may be across 
> several places anyway) you require one to look for information scattered 
> around the file.  And contributors to remember adding themselves back as 
> they step down from maintainer posts.
> 
>  Besides, my e-mail addresses listed are different in the two sections, 
> and that's been deliberate.  Also what about people carrying identical 
> full names?  I insist on using my middle initial for a reason.
> 
>  It seems like there's been hardly any discussion about this matter around 
> the time this stuff was added with commit bddcac9d1c32 ("[contrib] Add 
> contrib/maintainers-verify.sh").  What was the actual motivation behind 
> that change?

That was only addition of a script and testcase to verify what has been done
in MAINTAINERS since forever.
Just look at all the commits to remove redundant entries from Write After
Approval, e.g.
https://gcc.gnu.org/legacy-ml/gcc-patches/2003-05/msg00366.html
All maintainers or reviewers (global or specific) have write after approval
rights for areas they don't maintain.

Jakub



[PATCH] powerpc: Remove LINK_OS_EXTRA_SPEC{32, 64} from --with-advance-toolchain

2021-10-26 Thread Lucas A. M. Magalhaes via Gcc-patches
Historically this was added to fill gaps from ld.so.cache on early AT
releases. This now are just causing errors and rework. Since AT5.0 the
AT's ld.so is using a correctly configured ld.so.cache and sets the
DT_INTERP to AT's ld.so. This two factors are sufficient for an AT
builded program to get the correct libraries.

GCC congured with --with-advance-toolchain has issues building GlibC
releases because it adds DT_RUNPATH to ld.so and that's unsupported.

2021-03-11  Lucas A. M. Magalhães  

gcc/
* config.gcc (powerpc*-*-*): Remove -rpath from
  --with-advance-toochain
---
 gcc/config.gcc | 10 --
 1 file changed, 10 deletions(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index fb1f06f3da8..9eba3ece0a9 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -5088,16 +5088,6 @@ case "${target}" in
(at="/opt/$with_advance_toolchain"
 echo "/* Use Advance Toolchain $at */"
 echo
-echo "#undef  LINK_OS_EXTRA_SPEC32"
-echo "#define LINK_OS_EXTRA_SPEC32" \
- "\"%(link_os_new_dtags)" \
- "-rpath $prefix/lib -rpath $at/lib\""
-echo
-echo "#undef  LINK_OS_EXTRA_SPEC64"
-echo "#define LINK_OS_EXTRA_SPEC64" \
- "\"%(link_os_new_dtags)" \
- "-rpath $prefix/lib64 -rpath $at/lib64\""
-echo
 echo "#undef  LINK_OS_NEW_DTAGS_SPEC"
 echo "#define LINK_OS_NEW_DTAGS_SPEC" \
  "\"--enable-new-dtags\""
-- 
2.31.1



Re: [PATCH] c++: quadratic constexpr behavior for left-assoc logical exprs [PR102780]

2021-10-26 Thread Jakub Jelinek via Gcc-patches
On Tue, Oct 26, 2021 at 01:44:18PM -0400, Patrick Palka via Gcc-patches wrote:
> In the testcase below the two left fold expressions each expand into a
> logical expression with 1024 terms, for which potential_const_expr_1
> takes more than a minute to return true.  This is because p_c_e_1
> performs trial evaluation of the first operand of a &&/|| in order to
> determine whether to consider its second operand.  And because the
> expanded expression is left-associated, this trial evaluation causes
> p_c_e_1 to be quadratic in the number of terms of the expression.
> 
> This patch fixes this quadratic behavior by making p_c_e_1 recurse only
> into the first operand of a &&/|| when checking for potentiality.  This
> means we'll consider more non-constant expressions as potentially
> constant compared to the trial evaluation approach, but that should be
> harmless as later constexpr evaluation of the expression will deem it
> non-constant anyway.

For very large && or || expressions (but not mixed or with ! etc.) another
possibility would be to check if the lhs or rhs have the same code and if
so, linearize the trees and recurse only on the leafs (trees with other
TREE_CODE) and stop when first expression isn't equal to tmp.
For mixed &&/|| or with ! that could still be expensive though.

Jakub



Re: [committed] MAINTAINERS: Add myself as a VAX port maintainer

2021-10-26 Thread Maciej W. Rozycki
On Tue, 26 Oct 2021, Martin Liška wrote:

> Please remove your name from Write After Approval:
> 
> $ make check -k RUNTESTFLAGS="maintainers.exp"
> ...
> Running /home/marxin/Programming/gcc/gcc/testsuite/gcc.src/maintainers.exp ...
> Redundant in write approval: Maciej W. Rozycki
> FAIL: maintainers-verify.sh

 Hmm, that seems like useless policy to me; the "Write After Approval" 
section used to be an exhaustive, alphabetically sorted list of people 
with commit rights, and it still is with our sister projects binutils and 
GDB.  By removing entries for people listed elsewhere (which may be across 
several places anyway) you require one to look for information scattered 
around the file.  And contributors to remember adding themselves back as 
they step down from maintainer posts.

 Besides, my e-mail addresses listed are different in the two sections, 
and that's been deliberate.  Also what about people carrying identical 
full names?  I insist on using my middle initial for a reason.

 It seems like there's been hardly any discussion about this matter around 
the time this stuff was added with commit bddcac9d1c32 ("[contrib] Add 
contrib/maintainers-verify.sh").  What was the actual motivation behind 
that change?

  Maciej


[PATCH] c++: quadratic constexpr behavior for left-assoc logical exprs [PR102780]

2021-10-26 Thread Patrick Palka via Gcc-patches
In the testcase below the two left fold expressions each expand into a
logical expression with 1024 terms, for which potential_const_expr_1
takes more than a minute to return true.  This is because p_c_e_1
performs trial evaluation of the first operand of a &&/|| in order to
determine whether to consider its second operand.  And because the
expanded expression is left-associated, this trial evaluation causes
p_c_e_1 to be quadratic in the number of terms of the expression.

This patch fixes this quadratic behavior by making p_c_e_1 recurse only
into the first operand of a &&/|| when checking for potentiality.  This
means we'll consider more non-constant expressions as potentially
constant compared to the trial evaluation approach, but that should be
harmless as later constexpr evaluation of the expression will deem it
non-constant anyway.

As a nice bonus, this change also reduces compile time for the libstdc++
testcase 20_util/variant/87619.cc by about 15%, even though our 
uses right folds instead of left folds.  Likewise for the testcase in
the PR, for which compile time is reduced by 30%.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

PR c++/102780

gcc/cp/ChangeLog:

* constexpr.c (potential_constant_expression_1) :
Only recurse into the first operand, and ignore the second.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/fold13.C: New test.
---
 gcc/cp/constexpr.c  | 21 +++--
 gcc/testsuite/g++.dg/cpp1z/fold13.C | 29 +
 2 files changed, 32 insertions(+), 18 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1z/fold13.C

diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index 6f83d303cdd..9fd2c403afb 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -8880,28 +8880,13 @@ potential_constant_expression_1 (tree t, bool 
want_rval, bool strict, bool now,
  goto binary;
   }
 
-  /* If the first operand is the non-short-circuit constant, look at
-the second operand; otherwise we only care about the first one for
-potentiality.  */
 case TRUTH_AND_EXPR:
 case TRUTH_ANDIF_EXPR:
-  tmp = boolean_true_node;
-  goto truth;
 case TRUTH_OR_EXPR:
 case TRUTH_ORIF_EXPR:
-  tmp = boolean_false_node;
-truth:
-  {
-   tree op = TREE_OPERAND (t, 0);
-   if (!RECUR (op, rval))
- return false;
-   if (!processing_template_decl)
- op = cxx_eval_outermost_constant_expr (op, true);
-   if (tree_int_cst_equal (op, tmp))
- return RECUR (TREE_OPERAND (t, 1), rval);
-   else
- return true;
-  }
+  /* Only look at the first operand for potentiality since the second
+operand may be irrelevant if the first short circuits.  */
+  return RECUR (TREE_OPERAND (t, 0), rval);
 
 case PLUS_EXPR:
 case MULT_EXPR:
diff --git a/gcc/testsuite/g++.dg/cpp1z/fold13.C 
b/gcc/testsuite/g++.dg/cpp1z/fold13.C
new file mode 100644
index 000..b2b8c954be9
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/fold13.C
@@ -0,0 +1,29 @@
+// { dg-do compile { target c++17 } }
+// Verify constexpr evaluation of a large left fold logical expression
+// isn't quadratic in the size of the expanded expression.
+
+template struct S { static constexpr bool value = true; };
+
+template struct integer_sequence { };
+
+template
+  using make_integer_sequence
+#if __has_builtin(__make_integer_seq)
+= __make_integer_seq;
+#else
+= integer_sequence;
+#endif
+
+template
+constexpr bool f_impl(integer_sequence) {
+  return (... && S::value);
+}
+
+static_assert(f_impl(make_integer_sequence()));
+
+template
+constexpr bool g_impl(integer_sequence) {
+  return (... || !S::value);
+}
+
+static_assert(!g_impl(make_integer_sequence()));
-- 
2.33.1.711.g9d530dc002



[PATCH] c++: CTAD within template argument [PR102933]

2021-10-26 Thread Patrick Palka via Gcc-patches
Here when checking for erroneous occurrences of 'auto' inside a template
argument (which is allowed by the concepts TS for class templates),
extract_autos_r picks up the CTAD placeholder for X{T{0}} which causes
check_auto_in_tmpl_args to reject this valid template argument.  This
patch fixes this by making extract_autos_r ignore CTAD placeholders.

However, it seems we don't need to call check_auto_in_tmpl_args at all
outside of the concepts TS since using 'auto' as a type-id is otherwise
rejected more generally at parse time.  So this patch guards calls to
check_auto_in_tmpl_args with flag_concepts_ts instead of flag_concepts.

Relatedly, I think the concepts code paths in do_auto_deduction and
type_uses_auto are also necessary only for the concepts TS, so this
patch also restricts these code paths accordingly.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk and perhaps 11?

PR c++/102933

gcc/cp/ChangeLog:

* parser.c (cp_parser_template_id): Call check_auto_in_tmpl_args
only for the concepts TS not also for standard concepts.
(cp_parser_simple_type_specifier): Adjust diagnostic for using
auto in parameter declaration.
* pt.c (tsubst_qualified_id): Call check_auto_in_tmpl_args only
for the concepts TS not also for standard concepts.
(extract_autos_r): Ignore CTAD placeholders.
(extract_autos): Use range-based for.
(do_auto_deduction): Use extract_autos only for the concepts TS
and not also for standard concepts.
(type_uses_auto): Likewise with for_each_template_parm.
(check_auto_in_tmpl_args): Assert that this function is used only
for the concepts tS.  Simplify.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/nontype-class50.C: New test.
* g++.dg/cpp2a/nontype-class50a.C: New test.
---
 gcc/cp/parser.c   |  4 ++--
 gcc/cp/pt.c   | 24 +--
 gcc/testsuite/g++.dg/cpp2a/nontype-class50.C  | 13 ++
 gcc/testsuite/g++.dg/cpp2a/nontype-class50a.C |  5 
 4 files changed, 32 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/nontype-class50.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/nontype-class50a.C

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 49d951cfb19..5052f534d40 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -18168,7 +18168,7 @@ cp_parser_template_id (cp_parser *parser,
  types.  We reject them in functions, but if what we have is an
  identifier, even with none_type we can't conclude it's NOT a
  type, we have to wait for template substitution.  */
-  if (flag_concepts && check_auto_in_tmpl_args (templ, arguments))
+  if (flag_concepts_ts && check_auto_in_tmpl_args (templ, arguments))
 template_id = error_mark_node;
   /* Build a representation of the specialization.  */
   else if (identifier_p (templ))
@@ -19505,7 +19505,7 @@ cp_parser_simple_type_specifier (cp_parser* parser,
  else if (!flag_concepts)
pedwarn (token->location, 0,
 "use of % in parameter declaration "
-"only available with %<-fconcepts-ts%>");
+"only available with %<-std=c++20%> or %<-fconcepts%>");
}
   else
type = make_auto ();
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 287cf4ce9d0..3321601e6ff 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -16456,7 +16456,7 @@ tsubst_qualified_id (tree qualified_id, tree args,
 we want to catch is when we couldn't tell then, and can now,
 namely when templ prior to substitution was an
 identifier.  */
-  if (flag_concepts && check_auto_in_tmpl_args (expr, template_args))
+  if (flag_concepts_ts && check_auto_in_tmpl_args (expr, template_args))
return error_mark_node;
 
   if (variable_template_p (expr))
@@ -28560,7 +28560,7 @@ static int
 extract_autos_r (tree t, void *data)
 {
   hash_table &hash = *(hash_table*)data;
-  if (is_auto (t))
+  if (is_auto (t) && !template_placeholder_p (t))
 {
   /* All the autos were built with index 0; fix that up now.  */
   tree *p = hash.find_slot (t, INSERT);
@@ -28594,10 +28594,8 @@ extract_autos (tree type)
   for_each_template_parm (type, extract_autos_r, &hash, &visited, true);
 
   tree tree_vec = make_tree_vec (hash.elements());
-  for (hash_table::iterator iter = hash.begin();
-   iter != hash.end(); ++iter)
+  for (tree elt : hash)
 {
-  tree elt = *iter;
   unsigned i = TEMPLATE_PARM_IDX (TEMPLATE_TYPE_PARM_INDEX (elt));
   TREE_VEC_ELT (tree_vec, i)
= build_tree_list (NULL_TREE, TYPE_NAME (elt));
@@ -29837,7 +29835,7 @@ do_auto_deduction (tree type, tree init, tree auto_node,
   tree parms = build_tree_list (NULL_TREE, type);
   tree tparms;
 
-  if (flag_concepts)
+  if (flag_concepts_ts)
tparms = extract_autos (type);
   else
{
@@

Re: [PATCH v2 COMMITTED] rs6000: Fixes for tests including only

2021-10-26 Thread Paul A. Clarke via Gcc-patches
On Mon, Oct 25, 2021 at 05:32:51PM -0500, Segher Boessenkool wrote:
> On Mon, Oct 25, 2021 at 03:33:21PM -0500, Paul A. Clarke wrote:
> > * config/rs6000/x86intrin.h: Move some included headers to new
> > headers; include new immintrin.h instead.
> 
> s/; i/.  I/  (And instead of what?)
> 
> > * config/rs6000/immintrin.h: New.
> > * config/rs6000/x86gprintrin.h: New.
> 
> (That is a filename worse than our worst mnemonic :-) )

Not my choice. ;-)

> > * config/config.gcc (powerpc-*-*): Add new headers to extra_headers.
> 
> powerpc*-*-*
> 
> > --- a/gcc/testsuite/gcc.target/powerpc/pr78102.c
> > +++ b/gcc/testsuite/gcc.target/powerpc/pr78102.c
> > @@ -1,6 +1,6 @@
> >  /* { dg-do compile } */
> > -/* { dg-options "-O2 -mvsx" } */
> > -/* { dg-require-effective-target vsx_hw } */
> > +/* { dg-options "-O2 -mpower8-vector -DNO_WARN_X86_INTRINSICS" } */
> > +/* { dg-require-effective-target p8vector_hw } */
> 
> Please use -mcpu=power8 instead?  (And -mdejagnu-cpu=power8 in
> testcases).

So, -mdejagnu-cpu=power8 here.

> (The changelog should say you added the -D btw).

OK

> If you run you need *_hw.  If you only compile, like here, you want to
> use *_ok instead.

Yep, my mistake.  Fixed.

> Okay for trunk with those things tuned up.  Thanks!

Thanks for the review!  This has been committed:
--
Tests which only include  expect many other include files
to be brought in, but not enough are.

Try to increase compatibility with x86 headers by:
- Create new immintrin.h, including the analogous subset of intrinsics
  headers available for powerpc.
- Create new x86gprintrin.h, serving exclusively as the umbrella for
  bmiintrin.h and bmi2intrin.h.
- Modify x86intrin.h:
  - Include new immintrin.h.
  - Remove mmintrin.h, xmmintrin.h, emmintrin.h, now included indirectly
from immintrin.h.
  - Remove bmiintrin.h, bmi2intrin.h, now included indirectly from
x86gprintrin.h (which is now included from immintrin.h).

Add the new files to gcc/config.gcc.

Also, fix up the testcase that provoked PR102719, which requires
Power8 vector support.

Fixes commit 29fb1e831bf1c25e4574bf2f98a9f534e5c67665.

2021-10-25  Paul A. Clarke  

gcc
PR target/102719
* config/rs6000/x86intrin.h: Move some included headers to new
headers.  Include new immintrin.h instead of those headers.
* config/rs6000/immintrin.h: New.
* config/rs6000/x86gprintrin.h: New.
* config.gcc (powerpc*-*-*): Add new headers to extra_headers.

gcc/testsuite
* gcc.target/powerpc/pr78102.c: Fix dg directives to require Power8
vector support.  Also, add -DNO_WARN_X86_INTRINSICS.
---
 gcc/config.gcc |  2 +-
 gcc/config/rs6000/immintrin.h  | 41 ++
 gcc/config/rs6000/x86gprintrin.h   | 31 
 gcc/config/rs6000/x86intrin.h  | 10 +-
 gcc/testsuite/gcc.target/powerpc/pr78102.c |  4 +--
 5 files changed, 76 insertions(+), 12 deletions(-)
 create mode 100644 gcc/config/rs6000/immintrin.h
 create mode 100644 gcc/config/rs6000/x86gprintrin.h

diff --git a/gcc/config.gcc b/gcc/config.gcc
index fb1f06f3da89..efd1f42ac234 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -490,7 +490,7 @@ powerpc*-*-*)
extra_headers="${extra_headers} xmmintrin.h mm_malloc.h emmintrin.h"
extra_headers="${extra_headers} mmintrin.h x86intrin.h"
extra_headers="${extra_headers} pmmintrin.h tmmintrin.h smmintrin.h"
-   extra_headers="${extra_headers} nmmintrin.h"
+   extra_headers="${extra_headers} nmmintrin.h immintrin.h x86gprintrin.h"
extra_headers="${extra_headers} ppu_intrinsics.h spu2vmx.h vec_types.h 
si2vmx.h"
extra_headers="${extra_headers} amo.h"
case x$with_cpu in
diff --git a/gcc/config/rs6000/immintrin.h b/gcc/config/rs6000/immintrin.h
new file mode 100644
index ..647a5ae49b5a
--- /dev/null
+++ b/gcc/config/rs6000/immintrin.h
@@ -0,0 +1,41 @@
+/* Copyright (C) 2021 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   

Re: [PATCH v3 1/1] [ARM] Add support for TLS register based stack protector canary access

2021-10-26 Thread Kees Cook via Gcc-patches
On Tue, Oct 26, 2021 at 10:18:36AM +0200, Ard Biesheuvel wrote:
> Add support for accessing the stack canary value via the TLS register,
> so that multiple threads running in the same address space can use
> distinct canary values. This is intended for the Linux kernel running in
> SMP mode, where processes entering the kernel are essentially threads
> running the same program concurrently: using a global variable for the
> canary in that context is problematic because it can never be rotated,
> and so the OS is forced to use the same value as long as it remains up.
> 
> Using the TLS register to index the stack canary helps with this, as it
> allows each CPU to context switch the TLS register along with the rest
> of the process, permitting each process to use its own value for the
> stack canary.
> 
> 2021-10-21 Ard Biesheuvel 
> 
>   * config/arm/arm-opts.h (enum stack_protector_guard): New
>   * config/arm/arm-protos.h (arm_stack_protect_tls_canary_mem):
>   New
>   * config/arm/arm.c (TARGET_STACK_PROTECT_GUARD): Define
>   (arm_option_override_internal): Handle and put in error checks
>   for stack protector guard options.
>   (arm_option_reconfigure_globals): Likewise
>   (arm_stack_protect_tls_canary_mem): New
>   (arm_stack_protect_guard): New
>   * config/arm/arm.md (stack_protect_set): New
>   (stack_protect_set_tls): Likewise
>   (stack_protect_test): Likewise
>   (stack_protect_test_tls): Likewise
>   * config/arm/arm.opt (-mstack-protector-guard): New
>   (-mstack-protector-guard-offset): New.
> 
> Signed-off-by: Ard Biesheuvel 

I can't speak to the specific implementation details here, but this
builds for me, and behaves as expected. I get a working kernel[1],
and have verified[2] that we have per-task canaries for arm32. :) Yay!

Tested-by: Kees Cook 

Who's best to review and commit this? Qing, is something you're able to
review?

Thanks!

-Kees

[1] 
https://lore.kernel.org/linux-arm-kernel/20211021142516.1843042-1-a...@kernel.org/
[2] 
https://lore.kernel.org/linux-hardening/2021103826.330653-3-keesc...@chromium.org/

-- 
Kees Cook


Re: [PATCH] gcc/configure: check for powerpc64le-unknown-freebsd

2021-10-26 Thread Segher Boessenkool
On Sun, Oct 17, 2021 at 02:28:48AM +0200, Piotr Kubaj wrote:
> On 21-10-16 18:57:39, Segher Boessenkool wrote:
> > On Sat, Oct 16, 2021 at 04:09:05AM +0200, Piotr Kubaj wrote:
> > > Only powerpc64-unknown-freebsd was checked for.
> > > 
> > > Signed-off-by: Piotr Kubaj 
> > 
> > Thanks!
> > 
> > I pushed this now, with commit message (including changelog, which is
> > required):
> 
> Thank you!
> 
> Since powerpc64le-unknown-freebsd is supported also in 9, 10 and 11, can you 
> also backport it?

Done, for all of those.

Cheers,


Segher


Re: [committed] MAINTAINERS: Add myself as a VAX port maintainer

2021-10-26 Thread Martin Liška

On 10/26/21 17:26, Maciej W. Rozycki wrote:

* MAINTAINERS (CPU Port Maintainers): Add myself as a VAX port
maintainer.


Congratulations!

Please remove your name from Write After Approval:

$ make check -k RUNTESTFLAGS="maintainers.exp"
...
Running /home/marxin/Programming/gcc/gcc/testsuite/gcc.src/maintainers.exp ...
Redundant in write approval: Maciej W. Rozycki
FAIL: maintainers-verify.sh

Cheers,
Martin


---
  MAINTAINERS | 1 +
  1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index b22f930583a..901a6305184 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -117,6 +117,7 @@ tilegx port Walter Lee  

  tilepro port  Walter Lee  
  v850 port Nick Clifton
  vax port  Matt Thomas 
+vax port   Maciej W. Rozycki   
  visium port   Eric Botcazou   
  x86-64 port   Jan Hubicka 
  xstormy16 portNick Clifton





[committed] MAINTAINERS: Add myself as a VAX port maintainer

2021-10-26 Thread Maciej W. Rozycki
* MAINTAINERS (CPU Port Maintainers): Add myself as a VAX port
maintainer.
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index b22f930583a..901a6305184 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -117,6 +117,7 @@ tilegx port Walter Lee  

 tilepro port   Walter Lee  
 v850 port  Nick Clifton
 vax port   Matt Thomas 
+vax port   Maciej W. Rozycki   
 visium portEric Botcazou   
 x86-64 portJan Hubicka 
 xstormy16 port Nick Clifton
-- 
2.11.0


[PATCH] Fix negative integer range for UInteger.

2021-10-26 Thread Martin Liška

Hello.

We should not allow a negative IntegerRange for UInteger options.
The negative ranges for the 2 params are default and does not make sense.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

gcc/ChangeLog:

* opt-functions.awk: Add new sanity checking.
* optc-gen.awk: Add new argument to integer_range_info.
* params.opt: Update 2 params which have negative IntegerRange.
---
 gcc/opt-functions.awk | 4 +++-
 gcc/optc-gen.awk  | 2 +-
 gcc/params.opt| 4 ++--
 3 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/gcc/opt-functions.awk b/gcc/opt-functions.awk
index be9255064d7..9bc85604066 100644
--- a/gcc/opt-functions.awk
+++ b/gcc/opt-functions.awk
@@ -356,7 +356,7 @@ function search_var_name(name, opt_numbers, opts, flags, 
n_opts)
 return ""
 }
 
-function integer_range_info(range_option, init, option)

+function integer_range_info(range_option, init, option, uinteger_used)
 {
 if (range_option != "") {
ival = init + 0;
@@ -364,6 +364,8 @@ function integer_range_info(range_option, init, option)
end = nth_arg(1, range_option) + 0;
if (init != "" && init != "-1" && (ival < start || ival > end))
  print "#error initial value " init " of '" option "' must be in range [" start 
"," end "]"
+   if (uinteger_used && start < 0)
+ print "#error '" option"': negative IntegerRange (" start ", " end ") 
cannot be combined with UInteger"
return start ", " end
 }
 else
diff --git a/gcc/optc-gen.awk b/gcc/optc-gen.awk
index 77e598efd60..ebc1a02fa36 100644
--- a/gcc/optc-gen.awk
+++ b/gcc/optc-gen.awk
@@ -422,7 +422,7 @@ for (i = 0; i < n_opts; i++) {
   cl_flags, cl_bit_fields)
printf("%s, %s, %s }%s\n", var_ref(opts[i], flags[i]),
   var_set(flags[i]), integer_range_info(opt_args("IntegerRange", 
flags[i]),
-   opt_args("Init", flags[i]), opts[i]), comma)
+   opt_args("Init", flags[i]), opts[i], flag_set_p("UInteger", 
flags[i])), comma)
 
 	# Bump up the informational option index.

++optindex
diff --git a/gcc/params.opt b/gcc/params.opt
index 393d52bc660..1bad2b51df0 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -398,7 +398,7 @@ Common Joined UInteger Var(param_lim_expensive) Init(20) 
Param Optimization
 The minimum cost of an expensive expression in the loop invariant motion.
 
 -param=logical-op-non-short-circuit=

-Common Joined UInteger Var(param_logical_op_non_short_circuit) Init(-1) 
IntegerRange(-1, 1) Param
+Common Joined UInteger Var(param_logical_op_non_short_circuit) Init(-1) 
IntegerRange(0, 1) Param
 True if a non-short-circuit operation is optimal.
 
 -param=loop-block-tile-size=

@@ -1128,7 +1128,7 @@ Common Joined UInteger Var(param_vect_epilogues_nomask) 
Init(1) IntegerRange(0,
 Enable loop epilogue vectorization using smaller vector size.
 
 -param=vect-max-peeling-for-alignment=

-Common Joined UInteger Var(param_vect_max_peeling_for_alignment) Init(-1) 
IntegerRange(-1, 64) Param Optimization
+Common Joined UInteger Var(param_vect_max_peeling_for_alignment) Init(-1) 
IntegerRange(0, 64) Param Optimization
 Maximum number of loop peels to enhance alignment of data references in a loop.
 
 -param=vect-max-version-for-alias-checks=

--
2.33.1



Re: [PATCH] aix: handle 64bit inodes for include directories

2021-10-26 Thread CHIGOT, CLEMENT via Gcc-patches
Hi everyone,

Gentle ping on this patch.

Clément

From: CHIGOT, CLEMENT 
Sent: Tuesday, October 12, 2021 10:35 AM
To: Jeff Law ; David Malcolm 
Cc: gcc-patches@gcc.gnu.org ; David Edelsohn 

Subject: Re: [PATCH] aix: handle 64bit inodes for include directories

Hi Jeff,

Any update on this patch ?
As it's dealing with configure files, I would like to have it merged
asap before any conflicts appear.

Thanks,
Clément


Re: [PATCH 2/2]AArch64: Add better costing for vector constants and operations

2021-10-26 Thread Richard Sandiford via Gcc-patches
Tamar Christina  writes:
> Hi,
>
> Following the discussion below here's a revised patch.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?

Looks good functionally, just got some comments about the implementation.

> @@ -14006,8 +14007,52 @@ cost_plus:
>mode, MULT, 1, speed);
>return true;
>  }
> + break;
> +case CONST_VECTOR:
> + {
> +   rtx gen_insn = aarch64_simd_make_constant (x, true);
> +   /* Not a valid const vector.  */
> +   if (!gen_insn)
> + break;
>  
> -  /* Fall through.  */
> +   switch (GET_CODE (gen_insn))
> +   {
> +   case CONST_VECTOR:
> + /* Load using MOVI/MVNI.  */
> + if (aarch64_simd_valid_immediate (x, NULL))
> +   *cost += extra_cost->vect.movi;
> + else /* Load using constant pool.  */
> +   *cost += extra_cost->ldst.load;
> + break;
> +   /* Load using a DUP.  */
> +   case VEC_DUPLICATE:
> + gcc_unreachable ();
> + break;
> +   default:
> + *cost += extra_cost->ldst.load;
> + break;
> +   }
> +   return true;
> + }

This might be a problem (if it is a problem) with some of the existing
cases too, but: is using += rather than = the right behaviour here?
It maens that we add our cost on top of whatever the target-independent
rtx_costs thought was a good default choice, whereas it looks like
these table entries specify the correct full cost.

If it's not clear-cut, then I think using = would be better.

Also, going back to an earlier part of the thread, I think the “inner”
CONST_VECTOR case is now a correct replacement for the “outer”
CONST_VECTOR case, meaning we don't need the aarch64_simd_make_constant
bits.  I.e. I think we can make the top-level case:

case CONST_VECTOR:
  /* Load using MOVI/MVNI.  */
  if (aarch64_simd_valid_immediate (x, NULL))
*cost = extra_cost->vect.movi;
  else /* Load using constant pool.  */
*cost = extra_cost->ldst.load;
  break;

> +case VEC_CONCAT:
> + /* depending on the operation, either DUP or INS.
> +For now, keep default costing.  */
> + break;
> +case VEC_DUPLICATE:
> + *cost += extra_cost->vect.dup;
> + return true;

For this I think we should do:

  *cost = extra_cost->vect.dup;
  return false;

so that we cost the operand of the vec_duplicate as well.
This will have no effect if the operand is a REG, but would
affect more complex expressions.

> +case VEC_SELECT:
> + {

Here I think we should recurse on operand 0:

  rtx op0 = XEXP (x, 0);
  *cost = rtx_cost (op0, GET_MODE (op0), VEC_SELECT, 0, speed);

> +   /* cost subreg of 0 as free, otherwise as DUP */
> +   rtx op1 = XEXP (x, 1);
> +   if (vec_series_lowpart_p (mode, GET_MODE (op1), op1))
> + ;
> +   else if (vec_series_highpart_p (mode, GET_MODE (op1), op1))
> + *cost += extra_cost->vect.dup;
> +   else
> + *cost += extra_cost->vect.extract;
> +   return true;
> + }
>  default:
>break;
>  }
> @@ -20654,9 +20699,12 @@ aarch64_builtin_support_vector_misalignment 
> (machine_mode mode,
>  
>  /* If VALS is a vector constant that can be loaded into a register
> using DUP, generate instructions to do so and return an RTX to
> -   assign to the register.  Otherwise return NULL_RTX.  */
> +   assign to the register.  Otherwise return NULL_RTX.
> +
> +   If CHECK then the resulting instruction may not be used in
> +   codegen but can be used for costing.  */
>  static rtx
> -aarch64_simd_dup_constant (rtx vals)
> +aarch64_simd_dup_constant (rtx vals, bool check = false)
>  {
>machine_mode mode = GET_MODE (vals);
>machine_mode inner_mode = GET_MODE_INNER (mode);
> @@ -20668,7 +20716,8 @@ aarch64_simd_dup_constant (rtx vals)
>/* We can load this constant by using DUP and a constant in a
>   single ARM register.  This will be cheaper than a vector
>   load.  */
> -  x = copy_to_mode_reg (inner_mode, x);
> +  if (!check)
> +x = copy_to_mode_reg (inner_mode, x);
>return gen_vec_duplicate (mode, x);
>  }
>  
> @@ -20676,9 +20725,12 @@ aarch64_simd_dup_constant (rtx vals)
>  /* Generate code to load VALS, which is a PARALLEL containing only
> constants (for vec_init) or CONST_VECTOR, efficiently into a
> register.  Returns an RTX to copy into the register, or NULL_RTX
> -   for a PARALLEL that cannot be converted into a CONST_VECTOR.  */
> +   for a PARALLEL that cannot be converted into a CONST_VECTOR.
> +
> +   If CHECK then the resulting instruction may not be used in
> +   codegen but can be used for costing.  */
>  static rtx
> -aarch64_simd_make_constant (rtx vals)
> +aarch64_simd_make_constant (rtx vals, bool check = false)
>  {
>machine_mode mode = GET_MODE (vals);
>rtx const_dup;
> @@ -20710,7 +20762,7 @@ aarch64_simd_make_constant (rtx vals)
> 

Re: [PATCH v2] c++: P2360R0: Extend init-stmt to allow alias-decl [PR102617]

2021-10-26 Thread Jason Merrill via Gcc-patches

On 10/21/21 18:27, Marek Polacek wrote:

On Thu, Oct 21, 2021 at 04:56:57PM -0400, Jason Merrill wrote:

On 10/21/21 16:26, Marek Polacek wrote:

The following patch implements C++23 P2360R0.  This proposal merely
extends init-statement to contain alias-declaration.  init-statement
is used in if/for/switch.  The unsightly duplication of the new code
seems to be necessary to handle

for ( init-statement condition[opt] ; expression[opt] ) statement

as well as

for ( init-statement[opt] for-range-declaration : for-range-initializer ) 
statement


It seems like the duplication of the new code is a consequence of the
duplication of the old code.  I'd think we could remove the duplication by
remembering the result of cp_parser_range_based_for_with_init_p and then
recursing at the end if it was true.  Or check it in cp_parser_for and call
cp_parser_init_statement twice.


That works well, just had to move the pedwarn too.  dg.exp passes, full testing
running, OK if it passes?


OK.


-- >8 --
The following patch implements C++23 P2360R0.  This proposal merely
extends init-statement to contain alias-declaration.  init-statement
is used in if/for/switch.  It also removes the unsightly duplication
of code by calling cp_parser_init_statement twice.

PR c++/102617

gcc/cp/ChangeLog:

* parser.c (cp_parser_for): Maybe call cp_parser_init_statement
twice.  Warn about range-based for loops with initializer here.
(cp_parser_init_statement): Don't duplicate code.  Allow
alias-declaration in init-statement.

gcc/testsuite/ChangeLog:

* g++.dg/cpp23/init-stmt1.C: New test.
* g++.dg/cpp23/init-stmt2.C: New test.
---
  gcc/cp/parser.c | 70 ++---
  gcc/testsuite/g++.dg/cpp23/init-stmt1.C | 31 +++
  gcc/testsuite/g++.dg/cpp23/init-stmt2.C | 25 +
  3 files changed, 95 insertions(+), 31 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp23/init-stmt1.C
  create mode 100644 gcc/testsuite/g++.dg/cpp23/init-stmt2.C

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 49d951cfb19..93335c817d7 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -12040,6 +12040,7 @@ cp_parser_handle_directive_omp_attributes (cp_parser 
*parser, tree *pattrs,
init-statement:
  expression-statement
  simple-declaration
+alias-declaration
  
TM Extension:
  
@@ -13327,6 +13328,23 @@ cp_parser_for (cp_parser *parser, bool ivdep, unsigned short unroll)

/* Begin the for-statement.  */
scope = begin_for_scope (&init);
  
+  /* Maybe parse the optional init-statement in a range-based for loop.  */

+  if (cp_parser_range_based_for_with_init_p (parser)
+  /* Checked for diagnostic purposes only.  */
+  && cp_lexer_next_token_is_not (parser->lexer, CPP_SEMICOLON))
+{
+  tree dummy;
+  cp_parser_init_statement (parser, &dummy);
+  if (cxx_dialect < cxx20)
+   {
+ pedwarn (cp_lexer_peek_token (parser->lexer)->location,
+  OPT_Wc__20_extensions,
+  "range-based % loops with initializer only "
+  "available with %<-std=c++20%> or %<-std=gnu++20%>");
+ decl = error_mark_node;
+   }
+}
+
/* Parse the initialization.  */
is_range_for = cp_parser_init_statement (parser, &decl);
  
@@ -13987,12 +14005,13 @@ cp_parser_iteration_statement (cp_parser* parser, bool *if_p, bool ivdep,

return statement;
  }
  
-/* Parse a init-statement or the declarator of a range-based-for.

+/* Parse an init-statement or the declarator of a range-based-for.
 Returns true if a range-based-for declaration is seen.
  
 init-statement:

   expression-statement
- simple-declaration  */
+ simple-declaration
+ alias-declaration  */
  
  static bool

  cp_parser_init_statement (cp_parser *parser, tree *decl)
@@ -14008,40 +14027,29 @@ cp_parser_init_statement (cp_parser *parser, tree 
*decl)
bool is_range_for = false;
bool saved_colon_corrects_to_scope_p = 
parser->colon_corrects_to_scope_p;
  
-  /* Try to parse the init-statement.  */

-  if (cp_parser_range_based_for_with_init_p (parser))
-   {
- tree dummy;
- cp_parser_parse_tentatively (parser);
- /* Parse the declaration.  */
- cp_parser_simple_declaration (parser,
-   /*function_definition_allowed_p=*/false,
-   &dummy);
- cp_parser_require (parser, CPP_SEMICOLON, RT_SEMICOLON);
- if (!cp_parser_parse_definitely (parser))
-   /* That didn't work, try to parse it as an expression-statement.  */
-   cp_parser_expression_statement (parser, NULL_TREE);
-
- if (cxx_dialect < cxx20)
-   {
- pedwarn (cp_lexer_peek_token (parser->lexer)->location,
-  OPT_Wc__20_extensions,
-  "range-based % loops with initializer only "
-  

Re: [PATCH v2] c++tools: Fix memory leak

2021-10-26 Thread Jason Merrill via Gcc-patches

On 10/21/21 17:34, Jonathan Wakely wrote:

On Thu, 21 Oct 2021 at 20:38, Jason Merrill wrote:

On 10/21/21 09:28, Jonathan Wakely wrote:
 >   #else
 >     buffer = xmalloc (stat.st_size);
 >     if (!buffer)
 >       return -errno;
 > +  struct Deleter { void operator()(void* p) const { free(p); } };
 > +  std::unique_ptr guard;

Don't you need to initialize guard from buffer?


Oops, yes!  Updated patch attached.


OK, thanks.

Jason




Re: [RFC] Partial vectors for s390

2021-10-26 Thread Richard Sandiford via Gcc-patches
Robin Dapp  writes:
> Hi Richard,
>
>> We already have code to probe the predicates of the underlying
>> define_expands/insns to see whether they support certain constant
>> IFN arguments; see e.g. internal_gather_scatter_fn_supported_p.
>> We could do something similar here: add an extra operand to the optab,
>> and an extra argument to the IFN, that gives a bias amount.
>> The PowerPC version would require 0, the System Z version would
>> require -1.  The vectoriser would probe to see which value
>> it should use.
>> 
>> Doing it that way ensures that the gimple is still self-describing.
>> It avoids gimple semantics depending on target hooks.
>
> As I don't have much previous exposure to the vectoriser code, I cobbled 
> together something pretty ad-hoc (attached).  Does this come somehow 
> close to what you have in mind?

Yeah, looks good.

> internal_len_load_supported_p should rather be called 
> internal_len_load_bias_supported_p or so I guess and the part where we 
> exclude multiple loop_lens is still missing.

Since we only support one bias, it might be better to make the
internal-fn.c function return the bias as an int (with some marker
value for “not supported”), so that the caller doesn't need to probe
both values.

> Would we also check for a viable bias there and then either accept
> multiple lens or not?

Yeah, I think so.

Thanks,
Richard

>
> Regards
>   Robin
>
> commit 2320dbfdfe1477b15a2ac59847d2a52e68de49ab
> Author: Robin Dapp 
> Date:   Tue Oct 26 14:36:08 2021 +0200
>
> bias1
>
> diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
> index 8312d08aab2..bf97d3e471a 100644
> --- a/gcc/internal-fn.c
> +++ b/gcc/internal-fn.c
> @@ -2696,9 +2696,9 @@ expand_call_mem_ref (tree type, gcall *stmt, int index)
>  static void
>  expand_partial_load_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
>  {
> -  class expand_operand ops[3];
> -  tree type, lhs, rhs, maskt;
> -  rtx mem, target, mask;
> +  class expand_operand ops[4];
> +  tree type, lhs, rhs, maskt, biast;
> +  rtx mem, target, mask, bias;
>insn_code icode;
>  
>maskt = gimple_call_arg (stmt, 2);
> @@ -2727,7 +2727,18 @@ expand_partial_load_optab_fn (internal_fn, gcall 
> *stmt, convert_optab optab)
>TYPE_UNSIGNED (TREE_TYPE (maskt)));
>else
>  create_input_operand (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt)));
> -  expand_insn (icode, 3, ops);
> +  if (optab == len_load_optab)
> +{
> +  biast = gimple_call_arg (stmt, 3);
> +  bias = expand_normal (biast);
> +  create_input_operand (&ops[3], bias, SImode);
> +}
> +
> +  if (optab != len_load_optab)
> +expand_insn (icode, 3, ops);
> +  else
> +expand_insn (icode, 4, ops);
> +
>if (!rtx_equal_p (target, ops[0].value))
>  emit_move_insn (target, ops[0].value);
>  }
> @@ -2741,9 +2752,9 @@ expand_partial_load_optab_fn (internal_fn, gcall *stmt, 
> convert_optab optab)
>  static void
>  expand_partial_store_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
>  {
> -  class expand_operand ops[3];
> -  tree type, lhs, rhs, maskt;
> -  rtx mem, reg, mask;
> +  class expand_operand ops[4];
> +  tree type, lhs, rhs, maskt, biast;
> +  rtx mem, reg, mask, bias;
>insn_code icode;
>  
>maskt = gimple_call_arg (stmt, 2);
> @@ -2770,7 +2781,17 @@ expand_partial_store_optab_fn (internal_fn, gcall 
> *stmt, convert_optab optab)
>TYPE_UNSIGNED (TREE_TYPE (maskt)));
>else
>  create_input_operand (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt)));
> -  expand_insn (icode, 3, ops);
> +  if (optab == len_store_optab)
> +{
> +  biast = gimple_call_arg (stmt, 4);
> +  bias = expand_normal (biast);
> +  create_input_operand (&ops[3], bias, SImode);
> +}
> +
> +  if (optab != len_store_optab)
> +expand_insn (icode, 3, ops);
> +  else
> +expand_insn (icode, 4, ops);
>  }
>  
>  #define expand_mask_store_optab_fn expand_partial_store_optab_fn
> @@ -4154,6 +4175,25 @@ internal_gather_scatter_fn_supported_p (internal_fn 
> ifn, tree vector_type,
> && insn_operand_matches (icode, 3 + output_ops, GEN_INT (scale)));
>  }
>  
> +bool
> +internal_len_load_supported_p (internal_fn ifn, tree load_type, int bias)
> +{
> +  if (bias > 0 || bias < -1)
> +return false;
> +
> +  machine_mode mode = TYPE_MODE (load_type);
> +
> +  optab optab = direct_internal_fn_optab (ifn);
> +  insn_code icode = direct_optab_handler (optab, mode);
> +  int output_ops = internal_load_fn_p (ifn) ? 1 : 0;
> +
> +  if (icode != CODE_FOR_nothing
> +  && insn_operand_matches (icode, 2 + output_ops, GEN_INT (bias)))
> +return true;
> +
> +  return false;
> +}
> +
>  /* Return true if the target supports IFN_CHECK_{RAW,WAR}_PTRS function IFN
> for pointers of type TYPE when the accesses have LENGTH bytes and their
> common byte alignment is ALIGN.  */
> diff --git a/gcc/internal-fn.h b/gcc/internal-fn.h
> index 19d0f849a5a..d0bf9941bcc 100644
> --

Re: [PATCH] middle-end: fix de-optimizations with bitclear patterns on signed values

2021-10-26 Thread Richard Biener via Gcc-patches
On Tue, 26 Oct 2021, Jakub Jelinek wrote:

> On Tue, Oct 26, 2021 at 03:21:55PM +0200, Richard Biener wrote:
> > On Tue, 26 Oct 2021, Jakub Jelinek wrote:
> > 
> > > On Tue, Oct 26, 2021 at 03:13:29PM +0200, Richard Biener wrote:
> > > > try
> > > >   auto c = ...;
> > > >   signed char c2 = c;
> > > >   return c2 >= ...
> > > > then
> > > 
> > > That won't work, at least when using , which is what we with the
> > > optimization want to deal with primarily.
> > > Because std::partial_ordering etc. aren't implicitly nor explicitly
> > > convertible to int or signed char etc.
> > > Sure, one could in the testcase define its own std::strong_ordering etc.
> > > and define a conversion operator for it...
> > 
> > So how do we end up with the signed char case in the first place?
> > Is the frontend using a type that's target dependent?
> 
>  uses explicitly signed char:
> namespace std
> {
>   // [cmp.categories], comparison category types
>   namespace __cmp_cat
>   {
> using type = signed char;
> enum class _Ord : type { equivalent = 0, less = -1, greater = 1 };
> enum class _Ncmp : type { _Unordered = 2 };
> ...
> and __cmp_cat::type is what is used as type of _M_value of std::*_ordering
> -fsigned-char vs. -funsigned-char make no difference on the testcases on
> x86, but as mentioned in 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94589#c24
> some target decisions like load_extend_op uses in fold-const.c can affect
> it.  See https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570714.html

Eh, I see.  So there are alrady testcases covering the issues on
the affected targets.

So ignore my comment about adding additional testcases.

Richard.


Re: [PATCH] middle-end: fix de-optimizations with bitclear patterns on signed values

2021-10-26 Thread Jakub Jelinek via Gcc-patches
On Tue, Oct 26, 2021 at 03:21:55PM +0200, Richard Biener wrote:
> On Tue, 26 Oct 2021, Jakub Jelinek wrote:
> 
> > On Tue, Oct 26, 2021 at 03:13:29PM +0200, Richard Biener wrote:
> > > try
> > >   auto c = ...;
> > >   signed char c2 = c;
> > >   return c2 >= ...
> > > then
> > 
> > That won't work, at least when using , which is what we with the
> > optimization want to deal with primarily.
> > Because std::partial_ordering etc. aren't implicitly nor explicitly
> > convertible to int or signed char etc.
> > Sure, one could in the testcase define its own std::strong_ordering etc.
> > and define a conversion operator for it...
> 
> So how do we end up with the signed char case in the first place?
> Is the frontend using a type that's target dependent?

 uses explicitly signed char:
namespace std
{
  // [cmp.categories], comparison category types
  namespace __cmp_cat
  {
using type = signed char;
enum class _Ord : type { equivalent = 0, less = -1, greater = 1 };
enum class _Ncmp : type { _Unordered = 2 };
...
and __cmp_cat::type is what is used as type of _M_value of std::*_ordering
-fsigned-char vs. -funsigned-char make no difference on the testcases on
x86, but as mentioned in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94589#c24
some target decisions like load_extend_op uses in fold-const.c can affect
it.  See https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570714.html

Jakub



Re: [PATCH 1/4] Makefile.in: Ensure build CPP is used for build targets

2021-10-26 Thread Richard Biener via Gcc-patches
On Tue, Oct 26, 2021 at 2:57 PM Richard Purdie
 wrote:
>
> On Tue, 2021-10-26 at 14:55 +0200, Richard Biener wrote:
> > On Tue, Oct 26, 2021 at 2:45 PM Richard Purdie via Gcc-patches
> >  wrote:
> > >
> > > During cross compiling, CPP is being set to the target compiler even for
> > > build targets. As an example, when building a cross compiler targetting
> > > mingw, the config.log for libiberty in
> > > build.x86_64-pokysdk-mingw32.i586-poky-linux/build-x86_64-linux/libiberty/config.log
> > > shows:
> > >
> > > configure:3786: checking how to run the C preprocessor
> > > configure:3856: result: x86_64-pokysdk-mingw32-gcc -E 
> > > --sysroot=[sysroot]/x86_64-nativesdk-mingw32-pokysdk-mingw32
> > > configure:3876: x86_64-pokysdk-mingw32-gcc -E 
> > > --sysroot=[sysroot]/x86_64-nativesdk-mingw32-pokysdk-mingw32 conftest.c
> > > configure:3876: $? = 0
> > >
> > > This is libiberty being built for the build environment, not the target 
> > > one
> > > (i.e. in build-x86_64-linux). As such it should be using the build 
> > > environment's
> > > gcc and not the target one. In the mingw case the system headers are quite
> > > different leading to build failures related to not being able to include a
> > > process.h file for pem-unix.c.
> > >
> > > Fix this by using CC_FOR_BUILD instead of CC. Ultimately a CPP_FOR_BUILD
> > > could be defined but CC_FOR_BUILD seems at least more correct and is a 
> > > simpler
> > > fix.
> >
> > Since we're using AC_PROG_CPP to find the preprocessor simply assuming
> > that $(CC_FOR_BUILD) -E works looks dangerous?
>
> Potentially, yes. So the route of adding CPP_FOR_BUILD would be preferred?

I would say so, yes.

Richard.

>
> Cheers,
>
> Richard
>


Re: [PATCH]AArch64 Lower intrinsics shift to GIMPLE when possible.

2021-10-26 Thread Richard Sandiford via Gcc-patches
Tamar Christina  writes:
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64-builtins.c
>   (aarch64_general_gimple_fold_builtin): Add ashl, sshl, ushl, ashr,
>   ashr_simd, lshr, lshr_simd.
>   * config/aarch64/aarch64-simd-builtins.def (lshr): Use USHIFTIMM.
>   * config/aarch64/arm_neon.h (vshr_n_u8, vshr_n_u16, vshr_n_u32,
>   vshrq_n_u8, vshrq_n_u16, vshrq_n_u32, vshrq_n_u64): Fix type hack.
>
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/advsimd-intrinsics/vshl-opt-1.c: New test.
>   * gcc.target/aarch64/advsimd-intrinsics/vshl-opt-2.c: New test.
>   * gcc.target/aarch64/advsimd-intrinsics/vshl-opt-3.c: New test.
>   * gcc.target/aarch64/advsimd-intrinsics/vshl-opt-4.c: New test.
>   * gcc.target/aarch64/advsimd-intrinsics/vshl-opt-5.c: New test.
>   * gcc.target/aarch64/advsimd-intrinsics/vshl-opt-6.c: New test.
>   * gcc.target/aarch64/advsimd-intrinsics/vshl-opt-7.c: New test.
>   * gcc.target/aarch64/advsimd-intrinsics/vshl-opt-8.c: New test.
>   * gcc.target/aarch64/signbit-2.c: New test.

OK, thanks.  For the record, I guess vshl-opt-[1-3].c are now not really
testing for optimisations, since the new versions of the tests use the
intrinsics directly associated with the output (instead of using vshl,
like the original [1-3].c did).  I think they're still worthwhile tests
though.  Who knows what they might find in future. :-)

Richard

>
> --- inline copy of patch ---
>
> diff --git a/gcc/config/aarch64/aarch64-builtins.c 
> b/gcc/config/aarch64/aarch64-builtins.c
> index 
> f6b41d9c200d6300dee65ba60ae94488231a8a38..c362b29186cfc0bf0d39c08c314cfd6a99124cb2
>  100644
> --- a/gcc/config/aarch64/aarch64-builtins.c
> +++ b/gcc/config/aarch64/aarch64-builtins.c
> @@ -2394,6 +2394,54 @@ aarch64_general_gimple_fold_builtin (unsigned int 
> fcode, gcall *stmt)
>  1, args[0]);
>   gimple_call_set_lhs (new_stmt, gimple_call_lhs (stmt));
>   break;
> +  BUILTIN_VSDQ_I_DI (BINOP, ashl, 3, NONE)
> + if (TREE_CODE (args[1]) == INTEGER_CST
> + && wi::ltu_p (wi::to_wide (args[1]), element_precision (args[0])))
> +   new_stmt = gimple_build_assign (gimple_call_lhs (stmt),
> +   LSHIFT_EXPR, args[0], args[1]);
> + break;
> +  BUILTIN_VSDQ_I_DI (BINOP, sshl, 0, NONE)
> +  BUILTIN_VSDQ_I_DI (BINOP_UUS, ushl, 0, NONE)
> + {
> +   tree cst = args[1];
> +   tree ctype = TREE_TYPE (cst);
> +   /* Left shifts can be both scalar or vector, e.g. uint64x1_t is
> +  treated as a scalar type not a vector one.  */
> +   if ((cst = uniform_integer_cst_p (cst)) != NULL_TREE)
> + {
> +   wide_int wcst = wi::to_wide (cst);
> +   tree unit_ty = TREE_TYPE (cst);
> +
> +   wide_int abs_cst = wi::abs (wcst);
> +   if (wi::geu_p (abs_cst, element_precision (args[0])))
> + break;
> +
> +   if (wi::neg_p (wcst, TYPE_SIGN (ctype)))
> + {
> +   tree final_cst;
> +   final_cst = wide_int_to_tree (unit_ty, abs_cst);
> +   if (TREE_CODE (cst) != INTEGER_CST)
> + final_cst = build_uniform_cst (ctype, final_cst);
> +
> +   new_stmt = gimple_build_assign (gimple_call_lhs (stmt),
> +   RSHIFT_EXPR, args[0],
> +   final_cst);
> + }
> +   else
> + new_stmt = gimple_build_assign (gimple_call_lhs (stmt),
> + LSHIFT_EXPR, args[0], args[1]);
> + }
> + }
> + break;
> +  BUILTIN_VDQ_I (SHIFTIMM, ashr, 3, NONE)
> +  VAR1 (SHIFTIMM, ashr_simd, 0, NONE, di)
> +  BUILTIN_VDQ_I (USHIFTIMM, lshr, 3, NONE)
> +  VAR1 (USHIFTIMM, lshr_simd, 0, NONE, di)
> + if (TREE_CODE (args[1]) == INTEGER_CST
> + && wi::ltu_p (wi::to_wide (args[1]), element_precision (args[0])))
> +   new_stmt = gimple_build_assign (gimple_call_lhs (stmt),
> +   RSHIFT_EXPR, args[0], args[1]);
> + break;
>BUILTIN_GPF (BINOP, fmulx, 0, ALL)
>   {
> gcc_assert (nargs == 2);
> diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def 
> b/gcc/config/aarch64/aarch64-simd-builtins.def
> index 
> 402453aa9bba5949da43c984c4603196b1efd092..bbe0a4a3c4aea4187e7b1a9f10ab60e79df7b138
>  100644
> --- a/gcc/config/aarch64/aarch64-simd-builtins.def
> +++ b/gcc/config/aarch64/aarch64-simd-builtins.def
> @@ -409,7 +409,7 @@
>  
>BUILTIN_VDQ_I (SHIFTIMM, ashr, 3, NONE)
>VAR1 (SHIFTIMM, ashr_simd, 0, NONE, di)
> -  BUILTIN_VDQ_I (SHIFTIMM, lshr, 3, NONE)
> +  BUILTIN_VDQ_I (USHIFTIMM, lshr, 3, NONE)
>VAR1 (USHIFTIMM, lshr_simd, 0, NONE, di)
>/* Implemented by aarch64_shr_n.  */
>  

Re: [PATCH] middle-end: fix de-optimizations with bitclear patterns on signed values

2021-10-26 Thread Richard Biener via Gcc-patches
On Tue, 26 Oct 2021, Jakub Jelinek wrote:

> On Tue, Oct 26, 2021 at 03:13:29PM +0200, Richard Biener wrote:
> > try
> >   auto c = ...;
> >   signed char c2 = c;
> >   return c2 >= ...
> > then
> 
> That won't work, at least when using , which is what we with the
> optimization want to deal with primarily.
> Because std::partial_ordering etc. aren't implicitly nor explicitly
> convertible to int or signed char etc.
> Sure, one could in the testcase define its own std::strong_ordering etc.
> and define a conversion operator for it...

So how do we end up with the signed char case in the first place?
Is the frontend using a type that's target dependent?

Richard.


Re: [RFC] Don't move cold code out of loop by checking bb count

2021-10-26 Thread Richard Biener via Gcc-patches
On Mon, Oct 18, 2021 at 6:29 AM Xionghu Luo  wrote:
>
>
>
> On 2021/10/15 16:11, Richard Biener wrote:
> > On Sat, Oct 9, 2021 at 5:45 AM Xionghu Luo  wrote:
> >>
> >> Hi,
> >>
> >> On 2021/9/28 20:09, Richard Biener wrote:
> >>> On Fri, Sep 24, 2021 at 8:29 AM Xionghu Luo  wrote:
> 
>  Update the patch to v3, not sure whether you prefer the paste style
>  and continue to link the previous thread as Segher dislikes this...
> 
> 
>  [PATCH v3] Don't move cold code out of loop by checking bb count
> 
> 
>  Changes:
>  1. Handle max_loop in determine_max_movement instead of
>  outermost_invariant_loop.
>  2. Remove unnecessary changes.
>  3. Add for_all_locs_in_loop (loop, ref, ref_in_loop_hot_body) in 
>  can_sm_ref_p.
>  4. "gsi_next (&bsi);" in move_computations_worker is kept since it caused
>  infinite loop when implementing v1 and the iteration is missed to be
>  updated actually.
> 
>  v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-August/576488.html
>  v2: https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579086.html
> 
>  There was a patch trying to avoid move cold block out of loop:
> 
>  https://gcc.gnu.org/pipermail/gcc/2014-November/215551.html
> 
>  Richard suggested to "never hoist anything from a bb with lower execution
>  frequency to a bb with higher one in LIM invariantness_dom_walker
>  before_dom_children".
> 
>  In gimple LIM analysis, add find_coldest_out_loop to move invariants to
>  expected target loop, if profile count of the loop bb is colder
>  than target loop preheader, it won't be hoisted out of loop.
>  Likely for store motion, if all locations of the REF in loop is cold,
>  don't do store motion of it.
> 
>  SPEC2017 performance evaluation shows 1% performance improvement for
>  intrate GEOMEAN and no obvious regression for others.  Especially,
>  500.perlbench_r +7.52% (Perf shows function S_regtry of perlbench is
>  largely improved.), and 548.exchange2_r+1.98%, 526.blender_r +1.00%
>  on P8LE.
> 
>  gcc/ChangeLog:
> 
>  * loop-invariant.c (find_invariants_bb): Check profile count
>  before motion.
>  (find_invariants_body): Add argument.
>  * tree-ssa-loop-im.c (find_coldest_out_loop): New function.
>  (determine_max_movement): Use find_coldest_out_loop.
>  (move_computations_worker): Adjust and fix iteration udpate.
>  (execute_sm_exit): Check pointer validness.
>  (class ref_in_loop_hot_body): New functor.
>  (ref_in_loop_hot_body::operator): New.
>  (can_sm_ref_p): Use for_all_locs_in_loop.
> 
>  gcc/testsuite/ChangeLog:
> 
>  * gcc.dg/tree-ssa/recip-3.c: Adjust.
>  * gcc.dg/tree-ssa/ssa-lim-18.c: New test.
>  * gcc.dg/tree-ssa/ssa-lim-19.c: New test.
>  * gcc.dg/tree-ssa/ssa-lim-20.c: New test.
>  ---
>   gcc/loop-invariant.c   | 10 ++--
>   gcc/tree-ssa-loop-im.c | 61 --
>   gcc/testsuite/gcc.dg/tree-ssa/recip-3.c|  2 +-
>   gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-18.c | 20 +++
>   gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c | 27 ++
>   gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-20.c | 25 +
>   gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-21.c | 28 ++
>   7 files changed, 165 insertions(+), 8 deletions(-)
>   create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-18.c
>   create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c
>   create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-20.c
>   create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-21.c
> 
>  diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c
>  index fca0c2b24be..5c3be7bf0eb 100644
>  --- a/gcc/loop-invariant.c
>  +++ b/gcc/loop-invariant.c
>  @@ -1183,9 +1183,14 @@ find_invariants_insn (rtx_insn *insn, bool 
>  always_reached, bool always_executed)
>  call.  */
> 
>   static void
>  -find_invariants_bb (basic_block bb, bool always_reached, bool 
>  always_executed)
>  +find_invariants_bb (class loop *loop, basic_block bb, bool 
>  always_reached,
>  +   bool always_executed)
>   {
> rtx_insn *insn;
>  +  basic_block preheader = loop_preheader_edge (loop)->src;
>  +
>  +  if (preheader->count > bb->count)
>  +return;
> 
> FOR_BB_INSNS (bb, insn)
>   {
>  @@ -1214,8 +1219,7 @@ find_invariants_body (class loop *loop, 
>  basic_block *body,
> unsigned i;
> 
> for (i = 0; i < loop->num_nodes; i++)
>  -find_invariants_bb (body[i],
>  -   bitmap_bit_p (always_reached, i),
>  +find_invariants_bb (loop, body[i], 

Re: [PATCH] middle-end: fix de-optimizations with bitclear patterns on signed values

2021-10-26 Thread Jakub Jelinek via Gcc-patches
On Tue, Oct 26, 2021 at 03:13:29PM +0200, Richard Biener wrote:
> try
>   auto c = ...;
>   signed char c2 = c;
>   return c2 >= ...
> then

That won't work, at least when using , which is what we with the
optimization want to deal with primarily.
Because std::partial_ordering etc. aren't implicitly nor explicitly
convertible to int or signed char etc.
Sure, one could in the testcase define its own std::strong_ordering etc.
and define a conversion operator for it...

Jakub



RE: [PATCH] middle-end: fix de-optimizations with bitclear patterns on signed values

2021-10-26 Thread Richard Biener via Gcc-patches
On Tue, 26 Oct 2021, Tamar Christina wrote:

> 
> 
> > -Original Message-
> > From: Richard Biener 
> > Sent: Tuesday, October 26, 2021 9:46 AM
> > To: Tamar Christina 
> > Cc: gcc-patches@gcc.gnu.org; Jakub Jelinek ; nd
> > 
> > Subject: RE: [PATCH] middle-end: fix de-optimizations with bitclear patterns
> > on signed values
> > 
> > On Tue, 26 Oct 2021, Tamar Christina wrote:
> > 
> > > > -Original Message-
> > > > From: Richard Biener 
> > > > Sent: Tuesday, October 26, 2021 9:26 AM
> > > > To: Tamar Christina 
> > > > Cc: gcc-patches@gcc.gnu.org; Jakub Jelinek ; nd
> > > > 
> > > > Subject: RE: [PATCH] middle-end: fix de-optimizations with bitclear
> > > > patterns on signed values
> > > >
> > > > On Mon, 25 Oct 2021, Tamar Christina wrote:
> > > >
> > > > > > -Original Message-
> > > > > > From: Richard Biener 
> > > > > > Sent: Friday, October 15, 2021 12:31 PM
> > > > > > To: Tamar Christina 
> > > > > > Cc: gcc-patches@gcc.gnu.org; Jakub Jelinek ;
> > > > > > nd 
> > > > > > Subject: Re: [PATCH] middle-end: fix de-optimizations with
> > > > > > bitclear patterns on signed values
> > > > > >
> > > > > > On Fri, 15 Oct 2021, Tamar Christina wrote:
> > > > > >
> > > > > > > Hi All,
> > > > > > >
> > > > > > > During testing after rebasing to commit I noticed a failing
> > > > > > > testcase with the bitmask compare patch.
> > > > > > >
> > > > > > > Consider the following C++ testcase:
> > > > > > >
> > > > > > > #include 
> > > > > > >
> > > > > > > #define A __attribute__((noipa)) A bool f5 (double i, double
> > > > > > > j) { auto c = i <=> j; return c >= 0; }
> > > > > > >
> > > > > > > This turns into a comparison against chars, on systems where
> > > > > > > chars are signed the pattern inserts an unsigned convert such
> > > > > > > that it's able to do the transformation.
> > > > > > >
> > > > > > > i.e.:
> > > > > > >
> > > > > > >   # RANGE [-1, 2]
> > > > > > >   # c$_M_value_22 = PHI <-1(3), 0(2), 2(5), 1(4)>
> > > > > > >   # RANGE ~[3, 254]
> > > > > > >   _11 = (unsigned char) c$_M_value_22;
> > > > > > >   _19 = _11 <= 1;
> > > > > > >   # .MEM_24 = VDEF <.MEM_6(D)>
> > > > > > >   D.10434 ={v} {CLOBBER};
> > > > > > >   # .MEM_14 = VDEF <.MEM_24>
> > > > > > >   D.10407 ={v} {CLOBBER};
> > > > > > >   # VUSE <.MEM_14>
> > > > > > >   return _19;
> > > > > > >
> > > > > > > instead of:
> > > > > > >
> > > > > > >   # RANGE [-1, 2]
> > > > > > >   # c$_M_value_5 = PHI <-1(3), 0(2), 2(5), 1(4)>
> > > > > > >   # RANGE [-2, 2]
> > > > > > >   _3 = c$_M_value_5 & -2;
> > > > > > >   _19 = _3 == 0;
> > > > > > >   # .MEM_24 = VDEF <.MEM_6(D)>
> > > > > > >   D.10440 ={v} {CLOBBER};
> > > > > > >   # .MEM_14 = VDEF <.MEM_24>
> > > > > > >   D.10413 ={v} {CLOBBER};
> > > > > > >   # VUSE <.MEM_14>
> > > > > > >   return _19;
> > > > > > >
> > > > > > > This causes much worse codegen under -ffast-math due to phiops
> > > > > > > no longer recognizing the pattern.  It turns out that phiopts
> > > > > > > spaceship_replacement is looking for the exact form that was
> > > > > > > just
> > > > changed.
> > > > > > >
> > > > > > > Trying to get it to recognize the new form is not trivial as
> > > > > > > the transformation doesn't look to work when the thing it's
> > > > > > > pointing to is itself
> > > > > > a phi-node.
> > > > > >
> > > > > > What do you mean?  Where it handles the BIT_AND it could also
> > > > > > handle the conversion, no?  The later handling would probably
> > > > > > more explicitely need to distinguish between the BIT_AND and the
> > > > > > conversion
> > > > forms.
> > > > >
> > > > > Looks like I misunderstood the code, it was looking at the uses
> > > > > not the defs of the value.
> > > > >
> > > > > --- inline copy of patch ---
> > > > >
> > > > > The comments seems to suggest this code only checks for (res & ~1)
> > > > > ==
> > > > > 0 but the implementation seems to suggest it's broader.
> > > > >
> > > > > As such I added a case to check to see if the value comparison we
> > > > > found is a type cast.  and strips away the type cast and continues.
> > > > >
> > > > > In match.pd the typecasts are only added for signed comparisons to
> > > > > ==
> > > > > 0 and != 0 which are then rewritten into comparisons with 1.
> > > > >
> > > > > As such I only check for 1 and LE and GT, which is what match.pd
> > > > > would have rewritten it to.
> > > > >
> > > > > This fixes the regression but this is not code I 100% understand,
> > > > > since I don't really know the semantics of the spaceship operator
> > > > > so would appreciate an extra look.
> > > > >
> > > > > Bootstrapped Regtested on aarch64-none-linux-gnu,
> > > > > x86_64-pc-linux-gnu and no regressions.
> > > > >
> > > > > Ok for master?
> > > >
> > > > Please add a testcase.  I hope Jakub can review the
> > > > spaceship_replacement patch since he's the one familiar with the code.
> > >
> > > There's already a bunch of testcases that test the various variants:
> > > gcc/testsuite/g++.dg/

Re: [PATCH] Fix loop split incorrect count and probability

2021-10-26 Thread Jan Hubicka via Gcc-patches
> >

> That said, likely the profile update cannot be done uniformly
> for all blocks of a loop?

For the loop:

for (i = 0; i < n; i = inc (i))
{
  if (ga)
ga = do_something ();
}

to:

  for (i = 0; i < x; i = inc (i))
{
if (true)
 ga = do_something ();
if (!ga)
  break;
}
  for (; i < n; i = inc (i))
{
if (false)
 ga = do_something ();
}

If probability of if (ga) being true is p, then you indeed can scale the
first loop by p and second loop by 1-p.

Imagine that loop has n iterations and it takes m iterations for ga to
become false, then probability of if(ga) is m/n and you get frequencies
with m=n*(m/n) for first loop and n-m=n*(1-n/m) for second loop.

Because the conditional becomes constant true, one needs to scale up the
basic block guarded by the if (true) up by n/m to compensate for the
change.  With that the udpate should be right.
Ideally one can bypass scaling of basic block(s) containing
 ga = do_something () since the scaling first scales down to m/n
and then scale sup to m/n. Which may not combine to noop.
Perhaps one wants to have a parameter specifying basic blocks on which
the scaling is performed while duplicating for this?

Honza
> 
> Richard.


Re: [RFC] Partial vectors for s390

2021-10-26 Thread Robin Dapp via Gcc-patches

Hi Richard,


We already have code to probe the predicates of the underlying
define_expands/insns to see whether they support certain constant
IFN arguments; see e.g. internal_gather_scatter_fn_supported_p.
We could do something similar here: add an extra operand to the optab,
and an extra argument to the IFN, that gives a bias amount.
The PowerPC version would require 0, the System Z version would
require -1.  The vectoriser would probe to see which value
it should use.

Doing it that way ensures that the gimple is still self-describing.
It avoids gimple semantics depending on target hooks.


As I don't have much previous exposure to the vectoriser code, I cobbled 
together something pretty ad-hoc (attached).  Does this come somehow 
close to what you have in mind?


internal_len_load_supported_p should rather be called 
internal_len_load_bias_supported_p or so I guess and the part where we 
exclude multiple loop_lens is still missing.  Would we also check for a 
viable bias there and then either accept multiple lens or not?


Regards
 Robincommit 2320dbfdfe1477b15a2ac59847d2a52e68de49ab
Author: Robin Dapp 
Date:   Tue Oct 26 14:36:08 2021 +0200

bias1

diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index 8312d08aab2..bf97d3e471a 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -2696,9 +2696,9 @@ expand_call_mem_ref (tree type, gcall *stmt, int index)
 static void
 expand_partial_load_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
 {
-  class expand_operand ops[3];
-  tree type, lhs, rhs, maskt;
-  rtx mem, target, mask;
+  class expand_operand ops[4];
+  tree type, lhs, rhs, maskt, biast;
+  rtx mem, target, mask, bias;
   insn_code icode;
 
   maskt = gimple_call_arg (stmt, 2);
@@ -2727,7 +2727,18 @@ expand_partial_load_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
  TYPE_UNSIGNED (TREE_TYPE (maskt)));
   else
 create_input_operand (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt)));
-  expand_insn (icode, 3, ops);
+  if (optab == len_load_optab)
+{
+  biast = gimple_call_arg (stmt, 3);
+  bias = expand_normal (biast);
+  create_input_operand (&ops[3], bias, SImode);
+}
+
+  if (optab != len_load_optab)
+expand_insn (icode, 3, ops);
+  else
+expand_insn (icode, 4, ops);
+
   if (!rtx_equal_p (target, ops[0].value))
 emit_move_insn (target, ops[0].value);
 }
@@ -2741,9 +2752,9 @@ expand_partial_load_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
 static void
 expand_partial_store_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
 {
-  class expand_operand ops[3];
-  tree type, lhs, rhs, maskt;
-  rtx mem, reg, mask;
+  class expand_operand ops[4];
+  tree type, lhs, rhs, maskt, biast;
+  rtx mem, reg, mask, bias;
   insn_code icode;
 
   maskt = gimple_call_arg (stmt, 2);
@@ -2770,7 +2781,17 @@ expand_partial_store_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
  TYPE_UNSIGNED (TREE_TYPE (maskt)));
   else
 create_input_operand (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt)));
-  expand_insn (icode, 3, ops);
+  if (optab == len_store_optab)
+{
+  biast = gimple_call_arg (stmt, 4);
+  bias = expand_normal (biast);
+  create_input_operand (&ops[3], bias, SImode);
+}
+
+  if (optab != len_store_optab)
+expand_insn (icode, 3, ops);
+  else
+expand_insn (icode, 4, ops);
 }
 
 #define expand_mask_store_optab_fn expand_partial_store_optab_fn
@@ -4154,6 +4175,25 @@ internal_gather_scatter_fn_supported_p (internal_fn ifn, tree vector_type,
 	  && insn_operand_matches (icode, 3 + output_ops, GEN_INT (scale)));
 }
 
+bool
+internal_len_load_supported_p (internal_fn ifn, tree load_type, int bias)
+{
+  if (bias > 0 || bias < -1)
+return false;
+
+  machine_mode mode = TYPE_MODE (load_type);
+
+  optab optab = direct_internal_fn_optab (ifn);
+  insn_code icode = direct_optab_handler (optab, mode);
+  int output_ops = internal_load_fn_p (ifn) ? 1 : 0;
+
+  if (icode != CODE_FOR_nothing
+  && insn_operand_matches (icode, 2 + output_ops, GEN_INT (bias)))
+return true;
+
+  return false;
+}
+
 /* Return true if the target supports IFN_CHECK_{RAW,WAR}_PTRS function IFN
for pointers of type TYPE when the accesses have LENGTH bytes and their
common byte alignment is ALIGN.  */
diff --git a/gcc/internal-fn.h b/gcc/internal-fn.h
index 19d0f849a5a..d0bf9941bcc 100644
--- a/gcc/internal-fn.h
+++ b/gcc/internal-fn.h
@@ -225,6 +225,7 @@ extern int internal_fn_mask_index (internal_fn);
 extern int internal_fn_stored_value_index (internal_fn);
 extern bool internal_gather_scatter_fn_supported_p (internal_fn, tree,
 		tree, tree, int);
+extern bool internal_len_load_supported_p (internal_fn ifn, tree, int);
 extern bool internal_check_ptrs_fn_supported_p (internal_fn, tree,
 		poly_uint64, unsigned int);
 
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index d7723b1a92a..50537763ace 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -8272,12 

RE: [PATCH 2/2]AArch64: Add better costing for vector constants and operations

2021-10-26 Thread Tamar Christina via Gcc-patches
Hi,

Following the discussion below here's a revised patch.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/arm/aarch-common-protos.h (struct vector_cost_table): Add
movi, dup and extract costing fields.
* config/aarch64/aarch64-cost-tables.h (qdf24xx_extra_costs,
thunderx_extra_costs, thunderx2t99_extra_costs,
thunderx3t110_extra_costs, tsv110_extra_costs, a64fx_extra_costs): Use
them.
* config/arm/aarch-cost-tables.h (generic_extra_costs,
cortexa53_extra_costs, cortexa57_extra_costs, cortexa76_extra_costs,
exynosm1_extra_costs, xgene1_extra_costs): Likewise
* config/aarch64/aarch64-simd.md (aarch64_simd_dup): Add r->w dup.
* config/aarch64/aarch64.c (aarch64_simd_make_constant): Expose.
(aarch64_rtx_costs): Add extra costs.
(aarch64_simd_dup_constant): Support check only mode.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/vect-cse-codegen.c: New test.

--- inline copy of patch ---

diff --git a/gcc/config/aarch64/aarch64-cost-tables.h 
b/gcc/config/aarch64/aarch64-cost-tables.h
index 
dd2e7e7cbb13d24f0b51092270cd7e2d75fabf29..bb499a1eae62a145f1665d521f57c98b49ac5389
 100644
--- a/gcc/config/aarch64/aarch64-cost-tables.h
+++ b/gcc/config/aarch64/aarch64-cost-tables.h
@@ -124,7 +124,10 @@ const struct cpu_cost_table qdf24xx_extra_costs =
   /* Vector */
   {
 COSTS_N_INSNS (1),  /* alu.  */
-COSTS_N_INSNS (4)   /* mult.  */
+COSTS_N_INSNS (4),  /* mult.  */
+COSTS_N_INSNS (1),  /* movi.  */
+COSTS_N_INSNS (2),  /* dup.  */
+COSTS_N_INSNS (2)   /* extract.  */
   }
 };
 
@@ -229,7 +232,10 @@ const struct cpu_cost_table thunderx_extra_costs =
   /* Vector */
   {
 COSTS_N_INSNS (1), /* Alu.  */
-COSTS_N_INSNS (4)  /* mult.  */
+COSTS_N_INSNS (4), /* mult.  */
+COSTS_N_INSNS (1), /* movi.  */
+COSTS_N_INSNS (2), /* dup.  */
+COSTS_N_INSNS (2)  /* extract.  */
   }
 };
 
@@ -333,7 +339,10 @@ const struct cpu_cost_table thunderx2t99_extra_costs =
   /* Vector */
   {
 COSTS_N_INSNS (1), /* Alu.  */
-COSTS_N_INSNS (4)  /* Mult.  */
+COSTS_N_INSNS (4), /* Mult.  */
+COSTS_N_INSNS (1), /* movi.  */
+COSTS_N_INSNS (2), /* dup.  */
+COSTS_N_INSNS (2)  /* extract.  */
   }
 };
 
@@ -437,7 +446,10 @@ const struct cpu_cost_table thunderx3t110_extra_costs =
   /* Vector */
   {
 COSTS_N_INSNS (1), /* Alu.  */
-COSTS_N_INSNS (4)  /* Mult.  */
+COSTS_N_INSNS (4), /* Mult.  */
+COSTS_N_INSNS (1), /* movi.  */
+COSTS_N_INSNS (2), /* dup.  */
+COSTS_N_INSNS (2)  /* extract.  */
   }
 };
 
@@ -542,7 +554,10 @@ const struct cpu_cost_table tsv110_extra_costs =
   /* Vector */
   {
 COSTS_N_INSNS (1),  /* alu.  */
-COSTS_N_INSNS (4)   /* mult.  */
+COSTS_N_INSNS (4),  /* mult.  */
+COSTS_N_INSNS (1),  /* movi.  */
+COSTS_N_INSNS (2),  /* dup.  */
+COSTS_N_INSNS (2)   /* extract.  */
   }
 };
 
@@ -646,7 +661,10 @@ const struct cpu_cost_table a64fx_extra_costs =
   /* Vector */
   {
 COSTS_N_INSNS (1),  /* alu.  */
-COSTS_N_INSNS (4)   /* mult.  */
+COSTS_N_INSNS (4),  /* mult.  */
+COSTS_N_INSNS (1),  /* movi.  */
+COSTS_N_INSNS (2),  /* dup.  */
+COSTS_N_INSNS (2)   /* extract.  */
   }
 };
 
diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 
29f381728a3b3d28bcd6a1002ba398c8b87713d2..61c3d7e195c510da88aa513f99af5f76f4d696e7
 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -74,12 +74,14 @@ (define_insn "aarch64_simd_dup"
 )
 
 (define_insn "aarch64_simd_dup"
-  [(set (match_operand:VDQF_F16 0 "register_operand" "=w")
+  [(set (match_operand:VDQF_F16 0 "register_operand" "=w,w")
(vec_duplicate:VDQF_F16
- (match_operand: 1 "register_operand" "w")))]
+ (match_operand: 1 "register_operand" "w,r")))]
   "TARGET_SIMD"
-  "dup\\t%0., %1.[0]"
-  [(set_attr "type" "neon_dup")]
+  "@
+   dup\\t%0., %1.[0]
+   dup\\t%0., %1"
+  [(set_attr "type" "neon_dup, neon_from_gp")]
 )
 
 (define_insn "aarch64_dup_lane"
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 
699c105a42a613c06c462e2de686795279d85bc9..1fb4350916572c915e5af339102444daf324efc7
 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -303,6 +303,7 @@ static machine_mode aarch64_simd_container_mode 
(scalar_mode, poly_int64);
 static bool aarch64_print_address_internal (FILE*, machine_mode, rtx,
aarch64_addr_query_type);
 static HOST_WIDE_INT aarch64_clamp_to_uimm12_shift (HOST_WIDE_INT val);
+static rtx aarch64_simd_make_constant (rtx, bool);
 
 /* Major revision number of the ARM Architecture implemented by the target.  */
 unsigned aarch64_architecture_version;
@@ -12705,7 +12706,7 @@ aarch64_rtx_costs (rtx x, machine_mode mode, int outer 
ATTRIB

Re: [PATCH 1/4] Makefile.in: Ensure build CPP is used for build targets

2021-10-26 Thread Richard Purdie via Gcc-patches
On Tue, 2021-10-26 at 14:55 +0200, Richard Biener wrote:
> On Tue, Oct 26, 2021 at 2:45 PM Richard Purdie via Gcc-patches
>  wrote:
> > 
> > During cross compiling, CPP is being set to the target compiler even for
> > build targets. As an example, when building a cross compiler targetting
> > mingw, the config.log for libiberty in
> > build.x86_64-pokysdk-mingw32.i586-poky-linux/build-x86_64-linux/libiberty/config.log
> > shows:
> > 
> > configure:3786: checking how to run the C preprocessor
> > configure:3856: result: x86_64-pokysdk-mingw32-gcc -E 
> > --sysroot=[sysroot]/x86_64-nativesdk-mingw32-pokysdk-mingw32
> > configure:3876: x86_64-pokysdk-mingw32-gcc -E 
> > --sysroot=[sysroot]/x86_64-nativesdk-mingw32-pokysdk-mingw32 conftest.c
> > configure:3876: $? = 0
> > 
> > This is libiberty being built for the build environment, not the target one
> > (i.e. in build-x86_64-linux). As such it should be using the build 
> > environment's
> > gcc and not the target one. In the mingw case the system headers are quite
> > different leading to build failures related to not being able to include a
> > process.h file for pem-unix.c.
> > 
> > Fix this by using CC_FOR_BUILD instead of CC. Ultimately a CPP_FOR_BUILD
> > could be defined but CC_FOR_BUILD seems at least more correct and is a 
> > simpler
> > fix.
> 
> Since we're using AC_PROG_CPP to find the preprocessor simply assuming
> that $(CC_FOR_BUILD) -E works looks dangerous?

Potentially, yes. So the route of adding CPP_FOR_BUILD would be preferred?

Cheers,

Richard



Re: [PATCH 1/4] Makefile.in: Ensure build CPP is used for build targets

2021-10-26 Thread Richard Biener via Gcc-patches
On Tue, Oct 26, 2021 at 2:45 PM Richard Purdie via Gcc-patches
 wrote:
>
> During cross compiling, CPP is being set to the target compiler even for
> build targets. As an example, when building a cross compiler targetting
> mingw, the config.log for libiberty in
> build.x86_64-pokysdk-mingw32.i586-poky-linux/build-x86_64-linux/libiberty/config.log
> shows:
>
> configure:3786: checking how to run the C preprocessor
> configure:3856: result: x86_64-pokysdk-mingw32-gcc -E 
> --sysroot=[sysroot]/x86_64-nativesdk-mingw32-pokysdk-mingw32
> configure:3876: x86_64-pokysdk-mingw32-gcc -E 
> --sysroot=[sysroot]/x86_64-nativesdk-mingw32-pokysdk-mingw32 conftest.c
> configure:3876: $? = 0
>
> This is libiberty being built for the build environment, not the target one
> (i.e. in build-x86_64-linux). As such it should be using the build 
> environment's
> gcc and not the target one. In the mingw case the system headers are quite
> different leading to build failures related to not being able to include a
> process.h file for pem-unix.c.
>
> Fix this by using CC_FOR_BUILD instead of CC. Ultimately a CPP_FOR_BUILD
> could be defined but CC_FOR_BUILD seems at least more correct and is a simpler
> fix.

Since we're using AC_PROG_CPP to find the preprocessor simply assuming
that $(CC_FOR_BUILD) -E works looks dangerous?

> 2021-10-26 Richard Purdie 
>
> Changelog:
>
> * Makefile.in: Use CC_FOR_BUILD as CPP for build targets
> to avoid host/target contamination
>
> Signed-off-by: Richard Purdie 
> ---
>  Makefile.in | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/Makefile.in b/Makefile.in
> index 34b2d89660d..f4815d7e75f 100644
> --- a/Makefile.in
> +++ b/Makefile.in
> @@ -152,6 +152,7 @@ BUILD_EXPORTS = \
> AR="$(AR_FOR_BUILD)"; export AR; \
> AS="$(AS_FOR_BUILD)"; export AS; \
> CC="$(CC_FOR_BUILD)"; export CC; \
> +   CPP="$(CC_FOR_BUILD) -E"; export CPP; \
> CFLAGS="$(CFLAGS_FOR_BUILD)"; export CFLAGS; \
> CONFIG_SHELL="$(SHELL)"; export CONFIG_SHELL; \
> CXX="$(CXX_FOR_BUILD)"; export CXX; \
> --
> 2.32.0
>


[PATCH 3/4] gcc: Add --nostdlib++ option

2021-10-26 Thread Richard Purdie via Gcc-patches
OpenEmbedded/Yocto Project builds libgcc and the other gcc runtime libraries
separately from the compiler and slightly differently to the standard gcc build.

In general this works well but in trying to build them separately we run into
an issue since we're using our gcc, not xgcc and there is no way to tell 
configure
to use libgcc but not look for libstdc++.

This adds such an option allowing such configurations to work.

2021-10-26 Richard Purdie 

gcc/c-family/ChangeLog:

* c.opt: Add --nostdlib++ option

gcc/cp/ChangeLog:

* g++spec.c (lang_specific_driver): Add --nostdlib++ option

gcc/ChangeLog:

* doc/invoke.texi: Document --nostdlib++ option
* gcc.c: Add --nostdlib++ option

Signed-off-by: Richard Purdie 
---
 gcc/c-family/c.opt  | 4 
 gcc/cp/g++spec.c| 1 +
 gcc/doc/invoke.texi | 8 ++--
 gcc/gcc.c   | 1 +
 4 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 06457ac739e..ee742c831fd 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -2190,6 +2190,10 @@ nostdinc++
 C++ ObjC++
 Do not search standard system include directories for C++.
 
+nostdlib++
+Driver
+Do not link standard C++ runtime library
+
 o
 C ObjC C++ ObjC++ Joined Separate
 ; Documented in common.opt
diff --git a/gcc/cp/g++spec.c b/gcc/cp/g++spec.c
index 3c9bd1490b4..818beb61cee 100644
--- a/gcc/cp/g++spec.c
+++ b/gcc/cp/g++spec.c
@@ -159,6 +159,7 @@ lang_specific_driver (struct cl_decoded_option 
**in_decoded_options,
   switch (decoded_options[i].opt_index)
{
case OPT_nostdlib:
+   case OPT_nostdlib__:
case OPT_nodefaultlibs:
  library = -1;
  break;
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 71992b8c597..b30b79ebdc0 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -236,7 +236,7 @@ in the following sections.
 -ftemplate-backtrace-limit=@var{n} @gol
 -ftemplate-depth=@var{n} @gol
 -fno-threadsafe-statics  -fuse-cxa-atexit @gol
--fno-weak  -nostdinc++ @gol
+-fno-weak  -nostdinc++  -nostdlib++ @gol
 -fvisibility-inlines-hidden @gol
 -fvisibility-ms-compat @gol
 -fext-numeric-literals @gol
@@ -633,7 +633,7 @@ Objective-C and Objective-C++ Dialects}.
 @item Linker Options
 @xref{Link Options,,Options for Linking}.
 @gccoptlist{@var{object-file-name}  -fuse-ld=@var{linker}  -l@var{library} @gol
--nostartfiles  -nodefaultlibs  -nolibc  -nostdlib @gol
+-nostartfiles  -nodefaultlibs  -nolibc  -nostdlib  -nostdlib++ @gol
 -e @var{entry}  --entry=@var{entry} @gol
 -pie  -pthread  -r  -rdynamic @gol
 -s  -static  -static-pie  -static-libgcc  -static-libstdc++ @gol
@@ -16125,6 +16125,10 @@ library subroutines.
 constructors are called; @pxref{Collect2,,@code{collect2}, gccint,
 GNU Compiler Collection (GCC) Internals}.)
 
+@item -nostdlib++
+@opindex nostdlib++
+Do not use the standard system C++ runtime libraries when linking.
+
 @item -e @var{entry}
 @itemx --entry=@var{entry}
 @opindex e
diff --git a/gcc/gcc.c b/gcc/gcc.c
index 506c2acc282..abb900a4247 100644
--- a/gcc/gcc.c
+++ b/gcc/gcc.c
@@ -1167,6 +1167,7 @@ proper position among the other output files.  */
 %(mflib) " STACK_SPLIT_SPEC "\
 %{fprofile-arcs|fprofile-generate*|coverage:-lgcov} " SANITIZER_SPEC " \
 %{!nostdlib:%{!r:%{!nodefaultlibs:%(link_ssp) %(link_gcc_c_sequence)}}}\
+%{!nostdlib++:}\
 %{!nostdlib:%{!r:%{!nostartfiles:%E}}} %{T*}  \n%(post_link) }}"
 #endif
 
-- 
2.32.0



[PATCH 4/4] gcc/nios2: Define the musl linker

2021-10-26 Thread Richard Purdie via Gcc-patches
Add a definition of the musl linker used on the nios2 platform.

2021-10-26 Richard Purdie 

gcc/ChangeLog:

* config/nios2/linux.h (MUSL_DYNAMIC_LINKER): Add musl linker

Signed-off-by: Richard Purdie 
---
 gcc/config/nios2/linux.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/config/nios2/linux.h b/gcc/config/nios2/linux.h
index 08edf1521f6..15696d86241 100644
--- a/gcc/config/nios2/linux.h
+++ b/gcc/config/nios2/linux.h
@@ -30,6 +30,7 @@
 #define CPP_SPEC "%{posix:-D_POSIX_SOURCE} %{pthread:-D_REENTRANT}"
 
 #define GLIBC_DYNAMIC_LINKER "/lib/ld-linux-nios2.so.1"
+#define MUSL_DYNAMIC_LINKER  "/lib/ld-musl-nios2.so.1"
 
 #undef LINK_SPEC
 #define LINK_SPEC LINK_SPEC_ENDIAN \
-- 
2.32.0



[PATCH 2/4] gcc: Fix "argument list too long" from install-plugins

2021-10-26 Thread Richard Purdie via Gcc-patches
When building in longer build paths (200+ characters), the
"echo $(PLUGIN_HEADERS)" from the install-plugins target would cause an
"argument list too long error" on some systems.

Avoid this by calling make's sort function on the list which removes
duplicates and stops the overflow from reaching the echo command.
The original sort is left to handle the the .h and .def files.

2021-10-26 Richard Purdie 

gcc/ChangeLog:

* Makefile.in: Fix "argument list too long" from install-plugins

Signed-off-by: Richard Purdie 
---
 gcc/Makefile.in | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 658093c11c0..89482c6dd4e 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -3685,7 +3685,7 @@ install-plugin: installdirs lang.install-plugin 
s-header-vars install-gengtype
 # We keep the directory structure for files in config, common/config or
 # c-family and .def files. All other files are flattened to a single directory.
$(mkinstalldirs) $(DESTDIR)$(plugin_includedir)
-   headers=`echo $(PLUGIN_HEADERS) $$(cd $(srcdir); echo *.h *.def) | tr ' 
' '\012' | sort -u`; \
+   headers=`echo $(sort $(PLUGIN_HEADERS)) $$(cd $(srcdir); echo *.h 
*.def) | tr ' ' '\012' | sort -u`; \
srcdirstrip=`echo "$(srcdir)" | sed 's/[].[^$$\\*|]/&/g'`; \
for file in $$headers; do \
  if [ -f $$file ] ; then \
-- 
2.32.0



[PATCH 1/4] Makefile.in: Ensure build CPP is used for build targets

2021-10-26 Thread Richard Purdie via Gcc-patches
During cross compiling, CPP is being set to the target compiler even for
build targets. As an example, when building a cross compiler targetting
mingw, the config.log for libiberty in
build.x86_64-pokysdk-mingw32.i586-poky-linux/build-x86_64-linux/libiberty/config.log
shows:

configure:3786: checking how to run the C preprocessor
configure:3856: result: x86_64-pokysdk-mingw32-gcc -E 
--sysroot=[sysroot]/x86_64-nativesdk-mingw32-pokysdk-mingw32
configure:3876: x86_64-pokysdk-mingw32-gcc -E 
--sysroot=[sysroot]/x86_64-nativesdk-mingw32-pokysdk-mingw32 conftest.c
configure:3876: $? = 0

This is libiberty being built for the build environment, not the target one
(i.e. in build-x86_64-linux). As such it should be using the build environment's
gcc and not the target one. In the mingw case the system headers are quite
different leading to build failures related to not being able to include a
process.h file for pem-unix.c.

Fix this by using CC_FOR_BUILD instead of CC. Ultimately a CPP_FOR_BUILD
could be defined but CC_FOR_BUILD seems at least more correct and is a simpler
fix.

2021-10-26 Richard Purdie 

Changelog:

* Makefile.in: Use CC_FOR_BUILD as CPP for build targets
to avoid host/target contamination

Signed-off-by: Richard Purdie 
---
 Makefile.in | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Makefile.in b/Makefile.in
index 34b2d89660d..f4815d7e75f 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -152,6 +152,7 @@ BUILD_EXPORTS = \
AR="$(AR_FOR_BUILD)"; export AR; \
AS="$(AS_FOR_BUILD)"; export AS; \
CC="$(CC_FOR_BUILD)"; export CC; \
+   CPP="$(CC_FOR_BUILD) -E"; export CPP; \
CFLAGS="$(CFLAGS_FOR_BUILD)"; export CFLAGS; \
CONFIG_SHELL="$(SHELL)"; export CONFIG_SHELL; \
CXX="$(CXX_FOR_BUILD)"; export CXX; \
-- 
2.32.0



[PATCH v3] bpf: Add support to eBPF atomic instructions

2021-10-26 Thread Guillermo E. Martinez via Gcc-patches
Apologies, I forgot update the ChangeLog(s)

Kind Regards,
Guillermo


eBPF add support for atomic instructions, the following
gcc built-in functions are implemented for bpf target using
both: 32 and 64 bits data types:

 __atomic_fetch_add
 __atomic_fetch_sub
 __atomic_fetch_and
 __atomic_fetch_xor
 __atomic_fetch_or
 __atomic_exchange
 __atomic_compare_exchange_n

Also calls to __atomic__fetch are fully supported.

New define instructions were added to bpf.md along with its
tests for gcc atomic built-in functions.

In order to restrict/enable the use of `add + fetch' and the
rest of atomic instructions the -m[no-]atomics was added.

Those instructions are fully compliant with:
  https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html
  https://www.kernel.org/doc/Documentation/networking/filter.rst

This support depends of two previous submissions in CGEN and
binutils-gdb projects:
   https://sourceware.org/pipermail/cgen/2021q3/002774.html
   https://sourceware.org/pipermail/binutils/2021-August/117798.html

gcc/
  * config/bpf/atomic.md: New file for atomic instructions.
  * config/bpf/bpf.md: Move rtl atomic add definition to atomic.md
  * config/bpf/bpf.c: Enable atomics by default in ISA v3.
  * config/bpf/bpf.h: Pass option to gas to disabled use of
  atomics (-mno-atomics).
  * config/bpf/bpf.opt: Add -m[no-]atomics option.
  * doc/invoke.texi: Add documentation for -m[no-a]tomics.

gcc/testsuite/
   * gcc.target/bpf/atomic-compare-exchange.c: New test.
   * gcc.target/bpf/atomic-exchange.c: Likewise.
   * gcc.target/bpf/atomic-add.c: Likewise.
   * gcc.target/bpf/atomic-and.c: Likewise.
   * gcc.target/bpf/atomic-or.c: Likewise.
   * gcc.target/bpf/atomic-sub.c: Likewise.
   * gcc.target/bpf/atomic-xor.c: Likewise.
   * gcc.target/bpf/atomics-disabled.c: Likewise.
   * gcc.target/bpf/ftest-mcpuv3-atomics.c: Likewise.
   * gcc.target/bpf/ftest-no-atomics-add.c: Likewise.
---
 gcc/ChangeLog |   9 +
 gcc/config/bpf/atomic.md  | 169 ++
 gcc/config/bpf/bpf.c  |   2 +
 gcc/config/bpf/bpf.h  |  12 +-
 gcc/config/bpf/bpf.md |  20 +--
 gcc/config/bpf/bpf.opt|   4 +
 gcc/config/bpf/constraints.md |   3 +
 gcc/doc/invoke.texi   |  12 +-
 gcc/testsuite/ChangeLog   |  12 ++
 .../gcc.target/bpf/atomic-add-fetch.c |  29 +++
 gcc/testsuite/gcc.target/bpf/atomic-and.c |  25 +++
 .../gcc.target/bpf/atomic-compare-exchange.c  |  28 +++
 .../gcc.target/bpf/atomic-exchange.c  |  19 ++
 gcc/testsuite/gcc.target/bpf/atomic-or.c  |  25 +++
 gcc/testsuite/gcc.target/bpf/atomic-sub.c |  27 +++
 gcc/testsuite/gcc.target/bpf/atomic-xor.c |  25 +++
 .../gcc.target/bpf/atomics-disabled.c |  28 +++
 .../gcc.target/bpf/ftest-mcpuv3-atomics.c |  36 
 .../gcc.target/bpf/ftest-no-atomics-add.c |  23 +++
 19 files changed, 487 insertions(+), 21 deletions(-)
 create mode 100644 gcc/config/bpf/atomic.md
 create mode 100644 gcc/testsuite/gcc.target/bpf/atomic-add-fetch.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/atomic-and.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/atomic-compare-exchange.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/atomic-exchange.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/atomic-or.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/atomic-sub.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/atomic-xor.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/atomics-disabled.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/ftest-mcpuv3-atomics.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/ftest-no-atomics-add.c

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 115f32e5061..af52f9fcc03 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,12 @@
+2021-10-26  Guillermo E. Martinez  
+   * config/bpf/atomic.md: New file for atomic instructions.
+   * config/bpf/bpf.md: Move rtl atomic add definition to atomic.md
+   * config/bpf/bpf.c: Enable atomics by default in ISA v3.
+   * config/bpf/bpf.h: Pass option to gas to disable use of
+   atomics (-mno-atomics).
+   * config/bpf/bpf.opt: Add -m[no-]atomics option.
+   * doc/invoke.texi: Add documentation for -m[no-]atomics.
+
 2021-10-20  Alex Coplan  
 
* calls.c (initialize_argument_information): Remove some dead
diff --git a/gcc/config/bpf/atomic.md b/gcc/config/bpf/atomic.md
new file mode 100644
index 000..f8f9a6b96c1
--- /dev/null
+++ b/gcc/config/bpf/atomic.md
@@ -0,0 +1,169 @@
+;; Machine description for eBPF atomic Instructions.
+;; Copyright (C) 2021 Free Software Foundation, Inc.
+
+;; This file is part of GCC.
+
+;; GCC is free software; you ca

Re: [PATCH] Move negative stride bias out of dr_misalignment

2021-10-26 Thread Bernhard Reutner-Fischer via Gcc-patches
On Tue, 26 Oct 2021 14:10:05 +0200 (CEST)
Richard Biener  wrote:

> I agree in general, where I would diagnose this is when we build the
> CFG I would diagnose unreachable blocks - the above does not have
> any path to the second *poffset store.  Like with the prototype patch
> below we warn for
> 
> int *p;
> int foo()
> {
>   return 0;
>   *p = 0;
> }

mhm. I thought we have -Wunreachable-code but it seems that's gone
thanks,


Re: [PATCH] Fix loop split incorrect count and probability

2021-10-26 Thread Jan Hubicka via Gcc-patches
> On Tue, 26 Oct 2021, Xionghu Luo wrote:
> 
> > 
> > 
> > On 2021/10/21 18:55, Richard Biener wrote:
> > > On Thu, 21 Oct 2021, Xionghu Luo wrote:
> > > 
> > >>
> > >>
> > >> On 2021/10/15 13:51, Xionghu Luo via Gcc-patches wrote:
> > >>>
> > >>>
> > >>> On 2021/9/23 20:17, Richard Biener wrote:
> >  On Wed, 22 Sep 2021, Xionghu Luo wrote:
> > 
> > >
> > >
> > > On 2021/8/11 17:16, Richard Biener wrote:
> > >> On Wed, 11 Aug 2021, Xionghu Luo wrote:
> > >>
> > >>>
> > >>>
> > >>> On 2021/8/10 22:47, Richard Biener wrote:
> >  On Mon, 9 Aug 2021, Xionghu Luo wrote:
> > 
> > > Thanks,
> > >
> > > On 2021/8/6 19:46, Richard Biener wrote:
> > >> On Tue, 3 Aug 2021, Xionghu Luo wrote:
> > >>
> > >>> loop split condition is moved between loop1 and loop2, the 
> > >>> split bb's
> > >>> count and probability should also be duplicated instead of 
> > >>> (100% vs
> > >>> INV),
> > >>> secondly, the original loop1 and loop2 count need be 
> > >>> propotional from
> > >>> the
> > >>> original loop.
> > >>>
> > >>>
> > >>> diff base/loop-cond-split-1.c.151t.lsplit
> > >>> patched/loop-cond-split-1.c.151t.lsplit:
> > >>> ...
> > >>>   int prephitmp_16;
> > >>>   int prephitmp_25;
> > >>>
> > >>>[local count: 118111600]:
> > >>>   if (n_7(D) > 0)
> > >>> goto ; [89.00%]
> > >>>   else
> > >>> goto ; [11.00%]
> > >>>
> > >>>[local count: 118111600]:
> > >>>   return;
> > >>>
> > >>>[local count: 105119324]:
> > >>>   pretmp_3 = ga;
> > >>>
> > >>> -   [local count: 955630225]:
> > >>> +   [local count: 315357973]:
> > >>>   # i_13 = PHI 
> > >>>   # prephitmp_12 = PHI 
> > >>>   if (prephitmp_12 != 0)
> > >>> goto ; [33.00%]
> > >>>   else
> > >>> goto ; [67.00%]
> > >>>
> > >>> -   [local count: 315357972]:
> > >>> +   [local count: 104068130]:
> > >>>   _2 = do_something ();
> > >>>   ga = _2;
> > >>>
> > >>> -   [local count: 955630225]:
> > >>> +   [local count: 315357973]:
> > >>>   # prephitmp_5 = PHI 
> > >>>   i_10 = inc (i_13);
> > >>>   if (n_7(D) > i_10)
> > >>> goto ; [89.00%]
> > >>>   else
> > >>> goto ; [11.00%]
> > >>>
> > >>>[local count: 105119324]:
> > >>>   goto ; [100.00%]
> > >>>
> > >>> -   [local count: 850510901]:
> > >>> +   [local count: 280668596]:
> > >>>   if (prephitmp_12 != 0)
> > >>> -goto ; [100.00%]
> > >>> +goto ; [33.00%]
> > >>>   else
> > >>> -goto ; [INV]
> > >>> +goto ; [67.00%]
> > >>>
> > >>> -   [local count: 850510901]:
> > >>> +   [local count: 280668596]:
> > >>>   goto ; [100.00%]
> > >>>
> > >>> -   [count: 0]:
> > >>> +   [local count: 70429947]:
> > >>>   # i_23 = PHI 
> > >>>   # prephitmp_25 = PHI 
> > >>>
> > >>> -   [local count: 955630225]:
> > >>> +   [local count: 640272252]:
> > >>>   # i_15 = PHI 
> > >>>   # prephitmp_16 = PHI 
> > >>>   i_22 = inc (i_15);
> > >>>   if (n_7(D) > i_22)
> > >>> goto ; [89.00%]
> > >>>   else
> > >>> goto ; [11.00%]
> > >>>
> > >>> -   [local count: 850510901]:
> > >>> +   [local count: 569842305]:
> > >>>   goto ; [100.00%]
> > >>>
> > >>> }
> > >>>
> > >>> gcc/ChangeLog:
> > >>>
> > >>>  * tree-ssa-loop-split.c (split_loop): Fix incorrect 
> > >>> probability.
> > >>>  (do_split_loop_on_cond): Likewise.
> > >>> ---
> > >>> gcc/tree-ssa-loop-split.c | 16 
> > >>> 1 file changed, 8 insertions(+), 8 deletions(-)
> > >>>
> > >>> diff --git a/gcc/tree-ssa-loop-split.c 
> > >>> b/gcc/tree-ssa-loop-split.c
> > >>> index 3a09bbc39e5..8e5a7ded0f7 100644
> > >>> --- a/gcc/tree-ssa-loop-split.c
> > >>> +++ b/gcc/tree-ssa-loop-split.c
> > >>> @@ -583,10 +583,10 @@ split_loop (class loop *loop1)
> > >>>  basic_block cond_bb;
> > >
> > >   if (!initial_true)
> > > -   cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
> > > +   cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
> > > +
> > > + e

RE: [PATCH] middle-end: fix de-optimizations with bitclear patterns on signed values

2021-10-26 Thread Tamar Christina via Gcc-patches


> -Original Message-
> From: Richard Biener 
> Sent: Tuesday, October 26, 2021 9:46 AM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; Jakub Jelinek ; nd
> 
> Subject: RE: [PATCH] middle-end: fix de-optimizations with bitclear patterns
> on signed values
> 
> On Tue, 26 Oct 2021, Tamar Christina wrote:
> 
> > > -Original Message-
> > > From: Richard Biener 
> > > Sent: Tuesday, October 26, 2021 9:26 AM
> > > To: Tamar Christina 
> > > Cc: gcc-patches@gcc.gnu.org; Jakub Jelinek ; nd
> > > 
> > > Subject: RE: [PATCH] middle-end: fix de-optimizations with bitclear
> > > patterns on signed values
> > >
> > > On Mon, 25 Oct 2021, Tamar Christina wrote:
> > >
> > > > > -Original Message-
> > > > > From: Richard Biener 
> > > > > Sent: Friday, October 15, 2021 12:31 PM
> > > > > To: Tamar Christina 
> > > > > Cc: gcc-patches@gcc.gnu.org; Jakub Jelinek ;
> > > > > nd 
> > > > > Subject: Re: [PATCH] middle-end: fix de-optimizations with
> > > > > bitclear patterns on signed values
> > > > >
> > > > > On Fri, 15 Oct 2021, Tamar Christina wrote:
> > > > >
> > > > > > Hi All,
> > > > > >
> > > > > > During testing after rebasing to commit I noticed a failing
> > > > > > testcase with the bitmask compare patch.
> > > > > >
> > > > > > Consider the following C++ testcase:
> > > > > >
> > > > > > #include 
> > > > > >
> > > > > > #define A __attribute__((noipa)) A bool f5 (double i, double
> > > > > > j) { auto c = i <=> j; return c >= 0; }
> > > > > >
> > > > > > This turns into a comparison against chars, on systems where
> > > > > > chars are signed the pattern inserts an unsigned convert such
> > > > > > that it's able to do the transformation.
> > > > > >
> > > > > > i.e.:
> > > > > >
> > > > > >   # RANGE [-1, 2]
> > > > > >   # c$_M_value_22 = PHI <-1(3), 0(2), 2(5), 1(4)>
> > > > > >   # RANGE ~[3, 254]
> > > > > >   _11 = (unsigned char) c$_M_value_22;
> > > > > >   _19 = _11 <= 1;
> > > > > >   # .MEM_24 = VDEF <.MEM_6(D)>
> > > > > >   D.10434 ={v} {CLOBBER};
> > > > > >   # .MEM_14 = VDEF <.MEM_24>
> > > > > >   D.10407 ={v} {CLOBBER};
> > > > > >   # VUSE <.MEM_14>
> > > > > >   return _19;
> > > > > >
> > > > > > instead of:
> > > > > >
> > > > > >   # RANGE [-1, 2]
> > > > > >   # c$_M_value_5 = PHI <-1(3), 0(2), 2(5), 1(4)>
> > > > > >   # RANGE [-2, 2]
> > > > > >   _3 = c$_M_value_5 & -2;
> > > > > >   _19 = _3 == 0;
> > > > > >   # .MEM_24 = VDEF <.MEM_6(D)>
> > > > > >   D.10440 ={v} {CLOBBER};
> > > > > >   # .MEM_14 = VDEF <.MEM_24>
> > > > > >   D.10413 ={v} {CLOBBER};
> > > > > >   # VUSE <.MEM_14>
> > > > > >   return _19;
> > > > > >
> > > > > > This causes much worse codegen under -ffast-math due to phiops
> > > > > > no longer recognizing the pattern.  It turns out that phiopts
> > > > > > spaceship_replacement is looking for the exact form that was
> > > > > > just
> > > changed.
> > > > > >
> > > > > > Trying to get it to recognize the new form is not trivial as
> > > > > > the transformation doesn't look to work when the thing it's
> > > > > > pointing to is itself
> > > > > a phi-node.
> > > > >
> > > > > What do you mean?  Where it handles the BIT_AND it could also
> > > > > handle the conversion, no?  The later handling would probably
> > > > > more explicitely need to distinguish between the BIT_AND and the
> > > > > conversion
> > > forms.
> > > >
> > > > Looks like I misunderstood the code, it was looking at the uses
> > > > not the defs of the value.
> > > >
> > > > --- inline copy of patch ---
> > > >
> > > > The comments seems to suggest this code only checks for (res & ~1)
> > > > ==
> > > > 0 but the implementation seems to suggest it's broader.
> > > >
> > > > As such I added a case to check to see if the value comparison we
> > > > found is a type cast.  and strips away the type cast and continues.
> > > >
> > > > In match.pd the typecasts are only added for signed comparisons to
> > > > ==
> > > > 0 and != 0 which are then rewritten into comparisons with 1.
> > > >
> > > > As such I only check for 1 and LE and GT, which is what match.pd
> > > > would have rewritten it to.
> > > >
> > > > This fixes the regression but this is not code I 100% understand,
> > > > since I don't really know the semantics of the spaceship operator
> > > > so would appreciate an extra look.
> > > >
> > > > Bootstrapped Regtested on aarch64-none-linux-gnu,
> > > > x86_64-pc-linux-gnu and no regressions.
> > > >
> > > > Ok for master?
> > >
> > > Please add a testcase.  I hope Jakub can review the
> > > spaceship_replacement patch since he's the one familiar with the code.
> >
> > There's already a bunch of testcases that test the various variants:
> > gcc/testsuite/g++.dg/opt/pr94589-1.C
> > and gcc/testsuite/g++.dg/opt/pr94589-2.C which is how I noticed the
> > failure.  However they only trigger the failure on signed chars.  I
> > tried forcing `-fsigned-char` to see if I can make a general testcase but 
> > this
> seems to have not done it.
> >
> > Is there another f

Re: [PATCH] Move negative stride bias out of dr_misalignment

2021-10-26 Thread Richard Biener via Gcc-patches
On Tue, 26 Oct 2021, Bernhard Reutner-Fischer wrote:

> On 26 October 2021 11:19:44 CEST, Richard Biener via Gcc-patches 
>  wrote:
> 
> 
>  
> >@@ -2010,6 +2010,7 @@ get_negative_load_store_type (vec_info *vinfo,
> >   if (dump_enabled_p ())
> > dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> >  "negative step but alignment required.\n");
> >+  *poffset = 0;
> >   return VMAT_ELEMENTWISE;
> >   *poffset = 0;
> > }
> 
> I think you cannot really diagnose these, they would trigger a lot, in each 
> early return, no?
> Or would we see that there are unreachable, non artificial stmts in the same 
> block after a return?
> Somebody could experiment with diagnosing each and every DCEd stmt 
> (!artificial, nondebug) but I would expect that hell breaks loose..
> cheers,

[I fixed the issue before pushing this patch btw]

I agree in general, where I would diagnose this is when we build the
CFG I would diagnose unreachable blocks - the above does not have
any path to the second *poffset store.  Like with the prototype patch
below we warn for

int *p;
int foo()
{
  return 0;
  *p = 0;
}
> ./cc1 -quiet t.c
t.c: In function 'foo':
t.c:5:3: warning: statement is not reachable
5 |   *p = 0;
  |   ^~

diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index b3a27bcd17c..2de661ba35d 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -374,6 +374,22 @@ execute_build_cfg (void)
   fprintf (dump_file, "Scope blocks:\n");
   dump_scope_blocks (dump_file, dump_flags);
 }
+
+  find_unreachable_blocks ();
+  basic_block bb;
+  FOR_EACH_BB_FN (bb, cfun)
+if (!(bb->flags & BB_REACHABLE))
+  for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p 
(gsi);
+  gsi_next (&gsi))
+   {
+ if (gimple_location (gsi_stmt (gsi)) != UNKNOWN_LOCATION)
+   warning_at (gimple_location (gsi_stmt (gsi)), 0,
+   "statement is not reachable");
+ /* ???  Mark blocks reachable from here.  And even better make
+sure to process entries to unreachable regions first.  */
+ break;
+   }
+
   cleanup_tree_cfg ();
 
   bb_to_omp_idx.release ();




[PATCH v3] bpf: Add support to eBPF atomic instructions

2021-10-26 Thread Guillermo E. Martinez via Gcc-patches
Hello people,

This patch v3 to add support for atomics instructions in eBPF target,
the rtl for eBPF atomics was splitted in a new file: atomic.md,

Thanks Andrew for the comments,

Kind Regards,
Guillermo

eBPF add support for atomic instructions, the following
gcc built-in functions are implemented for bpf target using
both: 32 and 64 bits data types:

 __atomic_fetch_add
 __atomic_fetch_sub
 __atomic_fetch_and
 __atomic_fetch_xor
 __atomic_fetch_or
 __atomic_exchange
 __atomic_compare_exchange_n

Also calls to __atomic__fetch are fully supported.

New define instructions were added to bpf.md along with its
tests for gcc atomic built-in functions.

In order to restrict/enable the use of `add + fetch' and the
rest of atomic instructions the -m[no-]atomics was added.

Those instructions are fully compliant with:
  https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html
  https://www.kernel.org/doc/Documentation/networking/filter.rst

This support depends of two previous submissions in CGEN and
binutils-gdb projects:
   https://sourceware.org/pipermail/cgen/2021q3/002774.html
   https://sourceware.org/pipermail/binutils/2021-August/117798.html

gcc/
  * config/bpf/bpf.md: Add defines for atomic instructions.
  * config/bpf/bpf.c: Enable atomics by default in ISA v3.
  * config/bpf/bpf.h: Pass option to gas to disabled use of
  atomics (-mno-atomics).
  * config/bpf/bpf.opt: Add -m[no-]atomics option.
  * doc/invoke.texi: Add documentation for -m[no-a]tomics.

gcc/testsuite/
   * gcc.target/bpf/atomic-compare-exchange.c: New test.
   * gcc.target/bpf/atomic-exchange.c: Likewise.
   * gcc.target/bpf/atomic-add.c: Likewise.
   * gcc.target/bpf/atomic-and.c: Likewise.
   * gcc.target/bpf/atomic-or.c: Likewise.
   * gcc.target/bpf/atomic-sub.c: Likewise.
   * gcc.target/bpf/atomic-xor.c: Likewise.
   * gcc.target/bpf/atomics-disabled.c: Likewise.
   * gcc.target/bpf/ftest-mcpuv3-atomics.c: Likewise.
   * gcc.target/bpf/ftest-no-atomics-add.c: Likewise.
---
 gcc/ChangeLog |   8 +
 gcc/config/bpf/atomic.md  | 160 ++
 gcc/config/bpf/bpf.c  |   2 +
 gcc/config/bpf/bpf.h  |  12 +-
 gcc/config/bpf/bpf.md |  24 +--
 gcc/config/bpf/bpf.opt|   4 +
 gcc/config/bpf/constraints.md |   3 +
 gcc/doc/invoke.texi   |  12 +-
 gcc/testsuite/ChangeLog   |  12 ++
 .../gcc.target/bpf/atomic-add-fetch.c |  29 
 gcc/testsuite/gcc.target/bpf/atomic-and.c |  25 +++
 .../gcc.target/bpf/atomic-compare-exchange.c  |  28 +++
 .../gcc.target/bpf/atomic-exchange.c  |  19 +++
 gcc/testsuite/gcc.target/bpf/atomic-or.c  |  25 +++
 gcc/testsuite/gcc.target/bpf/atomic-sub.c |  27 +++
 gcc/testsuite/gcc.target/bpf/atomic-xor.c |  25 +++
 .../gcc.target/bpf/atomics-disabled.c |  28 +++
 .../gcc.target/bpf/ftest-mcpuv3-atomics.c |  36 
 .../gcc.target/bpf/ftest-no-atomics-add.c |  23 +++
 19 files changed, 482 insertions(+), 20 deletions(-)
 create mode 100644 gcc/config/bpf/atomic.md
 create mode 100644 gcc/testsuite/gcc.target/bpf/atomic-add-fetch.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/atomic-and.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/atomic-compare-exchange.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/atomic-exchange.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/atomic-or.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/atomic-sub.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/atomic-xor.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/atomics-disabled.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/ftest-mcpuv3-atomics.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/ftest-no-atomics-add.c

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 115f32e5061..782d33908ba 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,11 @@
+2021-10-25  Guillermo E. Martinez  
+   * config/bpf/bpf.md: Add defines for atomic instructions.
+   * config/bpf/bpf.c: Enable atomics by default in ISA v3.
+   * config/bpf/bpf.h: Pass option to gas to disable use of
+   atomics (-mno-atomics).
+   * config/bpf/bpf.opt: Add -m[no-]atomics option.
+   * doc/invoke.texi: Add documentation for -m[no-]atomics.
+
 2021-10-20  Alex Coplan  
 
* calls.c (initialize_argument_information): Remove some dead
diff --git a/gcc/config/bpf/atomic.md b/gcc/config/bpf/atomic.md
new file mode 100644
index 000..9afc38f01ae
--- /dev/null
+++ b/gcc/config/bpf/atomic.md
@@ -0,0 +1,160 @@
+;; Machine description for eBPF atomic Instructions.
+;; Copyright (C) 2021 Free Software Foundation, Inc.
+
+;; This file is part of GCC.
+
+;; GCC is free software; you can red

RE: [PATCH]AArch64 Lower intrinsics shift to GIMPLE when possible.

2021-10-26 Thread Tamar Christina via Gcc-patches
Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/aarch64-builtins.c
(aarch64_general_gimple_fold_builtin): Add ashl, sshl, ushl, ashr,
ashr_simd, lshr, lshr_simd.
* config/aarch64/aarch64-simd-builtins.def (lshr): Use USHIFTIMM.
* config/aarch64/arm_neon.h (vshr_n_u8, vshr_n_u16, vshr_n_u32,
vshrq_n_u8, vshrq_n_u16, vshrq_n_u32, vshrq_n_u64): Fix type hack.


gcc/testsuite/ChangeLog:

* gcc.target/aarch64/advsimd-intrinsics/vshl-opt-1.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vshl-opt-2.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vshl-opt-3.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vshl-opt-4.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vshl-opt-5.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vshl-opt-6.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vshl-opt-7.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vshl-opt-8.c: New test.
* gcc.target/aarch64/signbit-2.c: New test.

--- inline copy of patch ---

diff --git a/gcc/config/aarch64/aarch64-builtins.c 
b/gcc/config/aarch64/aarch64-builtins.c
index 
f6b41d9c200d6300dee65ba60ae94488231a8a38..c362b29186cfc0bf0d39c08c314cfd6a99124cb2
 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -2394,6 +2394,54 @@ aarch64_general_gimple_fold_builtin (unsigned int fcode, 
gcall *stmt)
   1, args[0]);
gimple_call_set_lhs (new_stmt, gimple_call_lhs (stmt));
break;
+  BUILTIN_VSDQ_I_DI (BINOP, ashl, 3, NONE)
+   if (TREE_CODE (args[1]) == INTEGER_CST
+   && wi::ltu_p (wi::to_wide (args[1]), element_precision (args[0])))
+ new_stmt = gimple_build_assign (gimple_call_lhs (stmt),
+ LSHIFT_EXPR, args[0], args[1]);
+   break;
+  BUILTIN_VSDQ_I_DI (BINOP, sshl, 0, NONE)
+  BUILTIN_VSDQ_I_DI (BINOP_UUS, ushl, 0, NONE)
+   {
+ tree cst = args[1];
+ tree ctype = TREE_TYPE (cst);
+ /* Left shifts can be both scalar or vector, e.g. uint64x1_t is
+treated as a scalar type not a vector one.  */
+ if ((cst = uniform_integer_cst_p (cst)) != NULL_TREE)
+   {
+ wide_int wcst = wi::to_wide (cst);
+ tree unit_ty = TREE_TYPE (cst);
+
+ wide_int abs_cst = wi::abs (wcst);
+ if (wi::geu_p (abs_cst, element_precision (args[0])))
+   break;
+
+ if (wi::neg_p (wcst, TYPE_SIGN (ctype)))
+   {
+ tree final_cst;
+ final_cst = wide_int_to_tree (unit_ty, abs_cst);
+ if (TREE_CODE (cst) != INTEGER_CST)
+   final_cst = build_uniform_cst (ctype, final_cst);
+
+ new_stmt = gimple_build_assign (gimple_call_lhs (stmt),
+ RSHIFT_EXPR, args[0],
+ final_cst);
+   }
+ else
+   new_stmt = gimple_build_assign (gimple_call_lhs (stmt),
+   LSHIFT_EXPR, args[0], args[1]);
+   }
+   }
+   break;
+  BUILTIN_VDQ_I (SHIFTIMM, ashr, 3, NONE)
+  VAR1 (SHIFTIMM, ashr_simd, 0, NONE, di)
+  BUILTIN_VDQ_I (USHIFTIMM, lshr, 3, NONE)
+  VAR1 (USHIFTIMM, lshr_simd, 0, NONE, di)
+   if (TREE_CODE (args[1]) == INTEGER_CST
+   && wi::ltu_p (wi::to_wide (args[1]), element_precision (args[0])))
+ new_stmt = gimple_build_assign (gimple_call_lhs (stmt),
+ RSHIFT_EXPR, args[0], args[1]);
+   break;
   BUILTIN_GPF (BINOP, fmulx, 0, ALL)
{
  gcc_assert (nargs == 2);
diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def 
b/gcc/config/aarch64/aarch64-simd-builtins.def
index 
402453aa9bba5949da43c984c4603196b1efd092..bbe0a4a3c4aea4187e7b1a9f10ab60e79df7b138
 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -409,7 +409,7 @@
 
   BUILTIN_VDQ_I (SHIFTIMM, ashr, 3, NONE)
   VAR1 (SHIFTIMM, ashr_simd, 0, NONE, di)
-  BUILTIN_VDQ_I (SHIFTIMM, lshr, 3, NONE)
+  BUILTIN_VDQ_I (USHIFTIMM, lshr, 3, NONE)
   VAR1 (USHIFTIMM, lshr_simd, 0, NONE, di)
   /* Implemented by aarch64_shr_n.  */
   BUILTIN_VSDQ_I_DI (SHIFTIMM, srshr_n, 0, NONE)
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 
635a223b59eb0f64304351939d11b697af81..c4ef5f7f7e3658c830893931ef5a874842410e10
 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -27400,21 +27400,21 @@ __extension__ extern __inline uint8x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vshr_n_u8 (uint8x8_t __a, const int __b)
 {

Re: [PATCH] PR fortran/102917 - PDT type parameters are not restricted to default integer

2021-10-26 Thread Harald Anlauf via Gcc-patches
Hi Tobias,

> OK. I wonder whether a valid len/kind example should be added which uses
> such a PDT with non-default-kind integer.

the testcase pdt_4.f03 did actually check for the error message that gets
removed and had to be adjusted.  Removing just the dg-error does that job :-)

Thanks,
Harald

> Tobias
> 
> >> Gesendet: Freitag, 22. Oktober 2021 um 22:25 Uhr
> >> Von: "Steve Kargl" 
> >> An: "Harald Anlauf" 
> >> Cc: fort...@gcc.gnu.org
> >> Betreff: Re: PDT type parameters are not restricted to default integer
> >>
> >> On Fri, Oct 22, 2021 at 10:16:05PM +0200, Harald Anlauf wrote:
> >>> Hi Steve,
> >>>
> >>> Am 22.10.21 um 21:35 schrieb Steve Kargl via Fortran:
>  Here's an obvious quick fix.  Please apply.
> 
> 
>  diff --git a/gcc/fortran/decl.c b/gcc/fortran/decl.c
>  index 6043e100fbb..e889bb44142 100644
>  --- a/gcc/fortran/decl.c
>  +++ b/gcc/fortran/decl.c
>  @@ -5619,14 +5619,6 @@ match_attr_spec (void)
>   m = MATCH_ERROR;
>   goto cleanup;
> }
>  -if (current_ts.kind != gfc_default_integer_kind)
>  -  {
>  -gfc_error ("Component with LEN attribute at %C must be "
>  -   "default integer kind (%d)",
>  -gfc_default_integer_kind);
>  -m = MATCH_ERROR;
>  -goto cleanup;
>  -  }
> }
>   else
> {
> >>> I think you are right.  We should always have allowed any integer kind.
> >>>
> >>> However, have you checked whether this change introduces regressions?
> >>> If you don't, somebody else will.  Please open a PR, then.
> >>>
> >> It seems that pdt_4.f03 will fail with the above patch because
> >> it explicitly tests for this error message.  That's the only
> >> failure in the testsuite.  For the record, F2003, page 48,
> >>
> >> R435 type-param-def-stmt  is INTEGER [ kind-selector ] , ...
> >>
> >> Each type parameter is itself of type integer.  If its kind selector
> >> is omitted, the kind type parameter is default integer.
> >>
> >> Now that I think about and look, there is a nearby similar gcc_error()
> >> for KIND.  This should be removed too.
> >>
> >> --
> >> Steve
> >>
> -
> Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
> München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
> Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
> München, HRB 106955
>


Re: [PATCH] PR fortran/102816 - [12 Regression] ICE in resolve_structure_cons, at fortran/resolve.c:1467

2021-10-26 Thread Harald Anlauf via Gcc-patches
Hi Tobias,

> In other cases, it requires some careful weighting whether error should
> have the error location "use m" or where the symbol is used. (Here, it
> cannot occur as the module won't get generated and an error is already
> printed at the proper location.)

I had though about this but couldn't come up with a way to create
a module file from an invalid type definition.

The closest is sth. which imports it, such as in

program p
  type t
 integer :: a([2]) ! { dg-error "must be scalar" }
  end type
  type(t) :: x = t([3, 4]) ! { dg-error "Bad array spec of component" }
  interface
 subroutine foo (x)
   import t
   type(t) :: x
   type(t), parameter :: y = t([5, 6]) ! { dg-error "Bad array spec of 
component" }
 end subroutine foo
  end interface
end

However (= fortunately) this works just fine.

> > Regtested on x86_64-pc-linux-gnu.  OK?
> 
> OK.

Will commit tonight.  Thanks for the review!

Harald

> Tobias
> 
> -
> Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
> München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
> Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
> München, HRB 106955
>


Re: [PATCH] Fix loop split incorrect count and probability

2021-10-26 Thread Richard Biener via Gcc-patches
On Tue, 26 Oct 2021, Xionghu Luo wrote:

> 
> 
> On 2021/10/21 18:55, Richard Biener wrote:
> > On Thu, 21 Oct 2021, Xionghu Luo wrote:
> > 
> >>
> >>
> >> On 2021/10/15 13:51, Xionghu Luo via Gcc-patches wrote:
> >>>
> >>>
> >>> On 2021/9/23 20:17, Richard Biener wrote:
>  On Wed, 22 Sep 2021, Xionghu Luo wrote:
> 
> >
> >
> > On 2021/8/11 17:16, Richard Biener wrote:
> >> On Wed, 11 Aug 2021, Xionghu Luo wrote:
> >>
> >>>
> >>>
> >>> On 2021/8/10 22:47, Richard Biener wrote:
>  On Mon, 9 Aug 2021, Xionghu Luo wrote:
> 
> > Thanks,
> >
> > On 2021/8/6 19:46, Richard Biener wrote:
> >> On Tue, 3 Aug 2021, Xionghu Luo wrote:
> >>
> >>> loop split condition is moved between loop1 and loop2, the split 
> >>> bb's
> >>> count and probability should also be duplicated instead of (100% 
> >>> vs
> >>> INV),
> >>> secondly, the original loop1 and loop2 count need be propotional 
> >>> from
> >>> the
> >>> original loop.
> >>>
> >>>
> >>> diff base/loop-cond-split-1.c.151t.lsplit
> >>> patched/loop-cond-split-1.c.151t.lsplit:
> >>> ...
> >>>   int prephitmp_16;
> >>>   int prephitmp_25;
> >>>
> >>>[local count: 118111600]:
> >>>   if (n_7(D) > 0)
> >>> goto ; [89.00%]
> >>>   else
> >>> goto ; [11.00%]
> >>>
> >>>[local count: 118111600]:
> >>>   return;
> >>>
> >>>[local count: 105119324]:
> >>>   pretmp_3 = ga;
> >>>
> >>> -   [local count: 955630225]:
> >>> +   [local count: 315357973]:
> >>>   # i_13 = PHI 
> >>>   # prephitmp_12 = PHI 
> >>>   if (prephitmp_12 != 0)
> >>> goto ; [33.00%]
> >>>   else
> >>> goto ; [67.00%]
> >>>
> >>> -   [local count: 315357972]:
> >>> +   [local count: 104068130]:
> >>>   _2 = do_something ();
> >>>   ga = _2;
> >>>
> >>> -   [local count: 955630225]:
> >>> +   [local count: 315357973]:
> >>>   # prephitmp_5 = PHI 
> >>>   i_10 = inc (i_13);
> >>>   if (n_7(D) > i_10)
> >>> goto ; [89.00%]
> >>>   else
> >>> goto ; [11.00%]
> >>>
> >>>[local count: 105119324]:
> >>>   goto ; [100.00%]
> >>>
> >>> -   [local count: 850510901]:
> >>> +   [local count: 280668596]:
> >>>   if (prephitmp_12 != 0)
> >>> -goto ; [100.00%]
> >>> +goto ; [33.00%]
> >>>   else
> >>> -goto ; [INV]
> >>> +goto ; [67.00%]
> >>>
> >>> -   [local count: 850510901]:
> >>> +   [local count: 280668596]:
> >>>   goto ; [100.00%]
> >>>
> >>> -   [count: 0]:
> >>> +   [local count: 70429947]:
> >>>   # i_23 = PHI 
> >>>   # prephitmp_25 = PHI 
> >>>
> >>> -   [local count: 955630225]:
> >>> +   [local count: 640272252]:
> >>>   # i_15 = PHI 
> >>>   # prephitmp_16 = PHI 
> >>>   i_22 = inc (i_15);
> >>>   if (n_7(D) > i_22)
> >>> goto ; [89.00%]
> >>>   else
> >>> goto ; [11.00%]
> >>>
> >>> -   [local count: 850510901]:
> >>> +   [local count: 569842305]:
> >>>   goto ; [100.00%]
> >>>
> >>> }
> >>>
> >>> gcc/ChangeLog:
> >>>
> >>>  * tree-ssa-loop-split.c (split_loop): Fix incorrect probability.
> >>>  (do_split_loop_on_cond): Likewise.
> >>> ---
> >>> gcc/tree-ssa-loop-split.c | 16 
> >>> 1 file changed, 8 insertions(+), 8 deletions(-)
> >>>
> >>> diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
> >>> index 3a09bbc39e5..8e5a7ded0f7 100644
> >>> --- a/gcc/tree-ssa-loop-split.c
> >>> +++ b/gcc/tree-ssa-loop-split.c
> >>> @@ -583,10 +583,10 @@ split_loop (class loop *loop1)
> >>>  basic_block cond_bb;
> >
> > if (!initial_true)
> > - cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
> > + cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
> > +
> > +   edge true_edge = EDGE_SUCC (bbs[i], 0)->flags & EDGE_TRUE_VALUE
> > +  ? EDGE_SUCC (bbs[i], 0)
> > +  : EDGE_SUCC (bbs[i], 1);
> >
> >>> 
> >>>   class loop *loop2 = loop_version (loop1, cond, &cond_bb,
> >>> -   

Re: [PATCH] Move negative stride bias out of dr_misalignment

2021-10-26 Thread Bernhard Reutner-Fischer via Gcc-patches
On 26 October 2021 11:19:44 CEST, Richard Biener via Gcc-patches 
 wrote:


 
>@@ -2010,6 +2010,7 @@ get_negative_load_store_type (vec_info *vinfo,
>   if (dump_enabled_p ())
>   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>"negative step but alignment required.\n");
>+  *poffset = 0;
>   return VMAT_ELEMENTWISE;
>   *poffset = 0;
> }

I think you cannot really diagnose these, they would trigger a lot, in each 
early return, no?
Or would we see that there are unreachable, non artificial stmts in the same 
block after a return?
Somebody could experiment with diagnosing each and every DCEd stmt 
(!artificial, nondebug) but I would expect that hell breaks loose..
cheers,


[PATCH] Unify offset and byte_offset for vect_create_addr_base_for_vector_ref

2021-10-26 Thread Richard Biener via Gcc-patches
Now that both are measured in bytes we can unify the two parameters.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-10-26  Richard Biener  

* tree-vectorizer.h (vect_create_addr_base_for_vector_ref):
Remove byte_offset parameter.
(vect_create_data_ref_ptr): Likewise.
* tree-vect-data-refs.c (vect_create_addr_base_for_vector_ref):
Likewise.
(vect_create_data_ref_ptr): Likewise.
* tree-vect-stmts.c (vectorizable_store): Adjust.
(vectorizable_load): Likewise.
---
 gcc/tree-vect-data-refs.c | 21 -
 gcc/tree-vect-stmts.c | 18 +-
 gcc/tree-vectorizer.h |  4 ++--
 3 files changed, 15 insertions(+), 28 deletions(-)

diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index cda89ef9725..2ea8e983fe6 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -4772,8 +4772,6 @@ vect_duplicate_ssa_name_ptr_info (tree name, dr_vec_info 
*dr_info)
is as follows:
if LOOP=i_loop: &in (relative to i_loop)
if LOOP=j_loop: &in+i*2B(relative to j_loop)
-   BYTE_OFFSET: Optional, defaulted to NULL.  If supplied, it is added to the
-   initial address.  Both OFFSET and BYTE_OFFSET are measured in bytes.
 
Output:
1. Return an SSA_NAME whose value is the address of the memory location of
@@ -4786,8 +4784,7 @@ vect_duplicate_ssa_name_ptr_info (tree name, dr_vec_info 
*dr_info)
 tree
 vect_create_addr_base_for_vector_ref (vec_info *vinfo, stmt_vec_info stmt_info,
  gimple_seq *new_stmt_list,
- tree offset,
- tree byte_offset)
+ tree offset)
 {
   dr_vec_info *dr_info = STMT_VINFO_DR_INFO (stmt_info);
   struct data_reference *dr = dr_info->dr;
@@ -4823,12 +4820,6 @@ vect_create_addr_base_for_vector_ref (vec_info *vinfo, 
stmt_vec_info stmt_info,
   base_offset = fold_build2 (PLUS_EXPR, sizetype,
 base_offset, offset);
 }
-  if (byte_offset)
-{
-  byte_offset = fold_convert (sizetype, byte_offset);
-  base_offset = fold_build2 (PLUS_EXPR, sizetype,
-base_offset, byte_offset);
-}
 
   /* base + base_offset */
   if (loop_vinfo)
@@ -4882,10 +4873,6 @@ vect_create_addr_base_for_vector_ref (vec_info *vinfo, 
stmt_vec_info stmt_info,
5. BSI: location where the new stmts are to be placed if there is no loop
6. ONLY_INIT: indicate if ap is to be updated in the loop, or remain
 pointing to the initial address.
-   7. BYTE_OFFSET (optional, defaults to NULL): a byte offset to be added
-   to the initial address accessed by the data-ref in STMT_INFO.  This is
-   similar to OFFSET, but OFFSET is counted in elements, while BYTE_OFFSET
-   in bytes.
8. IV_STEP (optional, defaults to NULL): the amount that should be added
to the IV during each iteration of the loop.  NULL says to move
by one copy of AGGR_TYPE up or down, depending on the step of the
@@ -4920,7 +4907,7 @@ vect_create_data_ref_ptr (vec_info *vinfo, stmt_vec_info 
stmt_info,
  tree aggr_type, class loop *at_loop, tree offset,
  tree *initial_address, gimple_stmt_iterator *gsi,
  gimple **ptr_incr, bool only_init,
- tree byte_offset, tree iv_step)
+ tree iv_step)
 {
   const char *base_name;
   loop_vec_info loop_vinfo = dyn_cast  (vinfo);
@@ -5048,11 +5035,11 @@ vect_create_data_ref_ptr (vec_info *vinfo, 
stmt_vec_info stmt_info,
   /* (2) Calculate the initial address of the aggregate-pointer, and set
  the aggregate-pointer to point to it before the loop.  */
 
-  /* Create: (&(base[init_val]+offset+byte_offset) in the loop preheader.  */
+  /* Create: (&(base[init_val]+offset) in the loop preheader.  */
 
   new_temp = vect_create_addr_base_for_vector_ref (vinfo,
   stmt_info, &new_stmt_list,
-  offset, byte_offset);
+  offset);
   if (new_stmt_list)
 {
   if (pe)
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 3cd338a73eb..bf07e7a9495 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -8110,7 +8110,7 @@ vectorizable_store (vec_info *vinfo,
  = vect_create_data_ref_ptr (vinfo, first_stmt_info, aggr_type,
  simd_lane_access_p ? loop : NULL,
  offset, &dummy, gsi, &ptr_incr,
- simd_lane_access_p, NULL_TREE, bump);
+ simd_lane_access_p, bump);
}
   else
{
@@ -8518,7 +8518,6 @@ vect

Re: [PATCH] Turn vect_create_addr_base_for_vector_ref offset into a byte offset

2021-10-26 Thread Richard Biener via Gcc-patches
On Tue, 26 Oct 2021, Bernhard Reutner-Fischer wrote:

> On 26 October 2021 10:58:50 CEST, Richard Biener via Gcc-patches 
>  wrote:
> 
> >@@ -2006,6 +2011,7 @@ get_negative_load_store_type (vec_info *vinfo,
> > dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> >  "negative step but alignment required.\n");
> >   return VMAT_ELEMENTWISE;
> >+  *poffset = 0;
> 
> Would you want to move this to before the return?

Doh!  Why don't we diagnose this ... :/

Richard.


Re: [PATCH] Turn vect_create_addr_base_for_vector_ref offset into a byte offset

2021-10-26 Thread Bernhard Reutner-Fischer via Gcc-patches
On 26 October 2021 10:58:50 CEST, Richard Biener via Gcc-patches 
 wrote:

>@@ -2006,6 +2011,7 @@ get_negative_load_store_type (vec_info *vinfo,
>   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>"negative step but alignment required.\n");
>   return VMAT_ELEMENTWISE;
>+  *poffset = 0;

Would you want to move this to before the return?
thanks,
> }
> 
>   if (vls_type == VLS_STORE_INVARIANT)
>@@ -2014,7 +2020,6 @@ get_negative_load_store_type (vec_info *vinfo,
>   dump_printf_loc (MSG_NOTE, vect_location,
>"negative step with invariant source;"
>" no permute needed.\n");
>-  *poffset = -TYPE_VECTOR_SUBPARTS (vectype) + 1;
>   return VMAT_CONTIGUOUS_DOWN;
> }



Re: [PATCH] forwprop: Remove incorrect assertion [PR102897]

2021-10-26 Thread Kewen.Lin via Gcc-patches
Hi Richi,

on 2021/10/26 下午3:50, Richard Biener wrote:
> On Tue, Oct 26, 2021 at 5:40 AM Kewen.Lin  wrote:
>>
>> Hi,
>>
>> As PR102897 shows, there is one incorrect assertion in function
>> simplify_permutation, which is based on the wrong assumption that
>> all cases with op2_type == tgt_type are handled previously, the
>> proposed fix is to remove this wrong assertion.
>>
>> Bootstrapped and regtested on x86_64-redhat-linux,
>> aarch64-linux-gnu and powerpc64{,le}-linux-gnu.
> 
> I think you need to enable optimization in the new testcase, gcc.dg/ only
> runs -O0 by default which wouldn't trigger forwprop?  Please verify the
> testcase ICEs before the fix.
> 

Thanks for catching!  You are right, the optimization option is required,
I just verified it and committed with the additional "-O1" as r12-4705.

> Otherwise OK.
> 

Thanks!

BR,
Kewen

> Thanks,
> Richard,
> 
>> BR,
>> Kewen
>> -
>> gcc/ChangeLog:
>>
>> PR tree-optimization/102897
>> * tree-ssa-forwprop.c (simplify_permutation): Remove a wrong 
>> assertion.
>>
>> gcc/testsuite/ChangeLog:
>>
>> * gcc.dg/pr102897.c: New test.
>> ---
>>  gcc/testsuite/gcc.dg/pr102897.c | 16 
>>  gcc/tree-ssa-forwprop.c |  2 --
>>  2 files changed, 16 insertions(+), 2 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.dg/pr102897.c
>>
>> diff --git a/gcc/testsuite/gcc.dg/pr102897.c 
>> b/gcc/testsuite/gcc.dg/pr102897.c
>> new file mode 100644
>> index 000..d96b0e48ccc
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/pr102897.c
>> @@ -0,0 +1,16 @@
>> +/* { dg-do compile } */
>> +/* Specify C99 to avoid the warning/error on compound literals.  */
>> +/* { dg-options "-std=c99" } */
>> +
>> +/* Verify that there is no ICE.  */
>> +
>> +typedef __attribute__((vector_size(8))) signed char int8x8_t;
>> +typedef __attribute__((vector_size(8))) unsigned char uint8x8_t;
>> +
>> +int8x8_t fn1 (int8x8_t val20, char tmp)
>> +{
>> +  uint8x8_t __trans_tmp_3;
>> +  __trans_tmp_3 = (uint8x8_t){tmp};
>> +  int8x8_t __a = (int8x8_t) __trans_tmp_3;
>> +  return __builtin_shuffle (__a, val20, (uint8x8_t){0});
>> +}
>> diff --git a/gcc/tree-ssa-forwprop.c b/gcc/tree-ssa-forwprop.c
>> index 5b30d4c1a76..a830bab78ba 100644
>> --- a/gcc/tree-ssa-forwprop.c
>> +++ b/gcc/tree-ssa-forwprop.c
>> @@ -2267,8 +2267,6 @@ simplify_permutation (gimple_stmt_iterator *gsi)
>>   if (!VECTOR_TYPE_P (tgt_type))
>> return 0;
>>   tree op2_type = TREE_TYPE (op2);
>> - /* Should have folded this before.  */
>> - gcc_assert (op2_type != tgt_type);
>>
>>   /* Figure out the shrunk factor.  */
>>   poly_uint64 tgt_units = TYPE_VECTOR_SUBPARTS (tgt_type);
>> --
>> 2.27.0


Re: [PATCH] PR fortran/102816 - [12 Regression] ICE in resolve_structure_cons, at fortran/resolve.c:1467

2021-10-26 Thread Tobias Burnus

Dear Harald, hi all,

On 22.10.21 21:36, Harald Anlauf via Fortran wrote:

the recently introduced shape validation for array components
in DT constructors did not properly deal with invalid code
created by ingenious testers.

Obvious solution: replace the gcc_assert by a suitable error message.

Regarding the error message: before the shape validation, gfortran
would emit the same error message twice referring to the same line,
namely the bad declaration of the component.  With the attached patch
we get one error message for the bad declaration of the component,
and one for the structure constructor referring to that DT component.
One could easily change that and make the second message refer to the
same as the declaration, giving two errors for the same line.

Comments / opinions?


Does not really matter in this case as long as there is one error for
the invalid "integer :: a([2])".

In other cases, it requires some careful weighting whether error should
have the error location "use m" or where the symbol is used. (Here, it
cannot occur as the module won't get generated and an error is already
printed at the proper location.)


Regtested on x86_64-pc-linux-gnu.  OK?


OK.

Tobias

-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


Re: [PATCH] PR fortran/102917 - PDT type parameters are not restricted to default integer

2021-10-26 Thread Tobias Burnus

Dear Harald, dear all,

On 24.10.21 21:00, Harald Anlauf via Fortran wrote:

I've created PR 102917 for tracking this issue and packaged
the attached patch.

Regtested on x86_64-pc-linux-gnu.  OK mainline?


OK. I wonder whether a valid len/kind example should be added which uses
such a PDT with non-default-kind integer.

Tobias


Gesendet: Freitag, 22. Oktober 2021 um 22:25 Uhr
Von: "Steve Kargl" 
An: "Harald Anlauf" 
Cc: fort...@gcc.gnu.org
Betreff: Re: PDT type parameters are not restricted to default integer

On Fri, Oct 22, 2021 at 10:16:05PM +0200, Harald Anlauf wrote:

Hi Steve,

Am 22.10.21 um 21:35 schrieb Steve Kargl via Fortran:

Here's an obvious quick fix.  Please apply.


diff --git a/gcc/fortran/decl.c b/gcc/fortran/decl.c
index 6043e100fbb..e889bb44142 100644
--- a/gcc/fortran/decl.c
+++ b/gcc/fortran/decl.c
@@ -5619,14 +5619,6 @@ match_attr_spec (void)
 m = MATCH_ERROR;
 goto cleanup;
   }
-if (current_ts.kind != gfc_default_integer_kind)
-  {
-gfc_error ("Component with LEN attribute at %C must be "
-   "default integer kind (%d)",
-gfc_default_integer_kind);
-m = MATCH_ERROR;
-goto cleanup;
-  }
   }
 else
   {

I think you are right.  We should always have allowed any integer kind.

However, have you checked whether this change introduces regressions?
If you don't, somebody else will.  Please open a PR, then.


It seems that pdt_4.f03 will fail with the above patch because
it explicitly tests for this error message.  That's the only
failure in the testsuite.  For the record, F2003, page 48,

R435 type-param-def-stmt  is INTEGER [ kind-selector ] , ...

Each type parameter is itself of type integer.  If its kind selector
is omitted, the kind type parameter is default integer.

Now that I think about and look, there is a nearby similar gcc_error()
for KIND.  This should be removed too.

--
Steve


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


[PATCH] AVX512FP16: Optimize _Float16 reciprocal for div and sqrt

2021-10-26 Thread Hongyu Wang via Gcc-patches
Hi,

For _Float16 type, add insn and expanders to optimize x / y to
x * rcp (y), and x / sqrt (y) to x * rsqrt (y).
As Half float only have minor precision difference between div and
mul * rcp, there is no need for Newton-Rhapson approximation.

Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,} and sde.
Ok for master?

gcc/ChangeLog:

* config/i386/i386.c (use_rsqrt_p): Add mode parameter, enable
  HFmode rsqrt without TARGET_SSE_MATH.
(ix86_optab_supported_p): Refactor rint, adjust floor, ceil,
btrunc condition to be restricted by -ftrapping-math, adjust
use_rsqrt_p function call.
* config/i386/i386.md (rcphf2): New define_insn.
(rsqrthf2): Likewise.
* config/i386/sse.md (div3): Change VF2H to VF2.
(div3): New expander for HF mode.
(rsqrt2): Likewise.
(*avx512fp16_vmrcpv8hf2): New define_insn for rpad pass.
(*avx512fp16_vmrsqrtv8hf2): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512fp16-recip-1.c: New test.
* gcc.target/i386/avx512fp16-recip-2.c: Ditto.
* gcc.target/i386/pr102464.c: Add -fno-trapping-math.
---
 gcc/config/i386/i386.c| 29 +++---
 gcc/config/i386/i386.md   | 44 -
 gcc/config/i386/sse.md| 63 +++-
 .../gcc.target/i386/avx512fp16-recip-1.c  | 43 
 .../gcc.target/i386/avx512fp16-recip-2.c  | 97 +++
 gcc/testsuite/gcc.target/i386/pr102464.c  |  2 +-
 6 files changed, 258 insertions(+), 20 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-recip-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-recip-2.c

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 299e1ab2621..c5789365d3b 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -18905,9 +18905,10 @@ ix86_vectorize_builtin_scatter (const_tree vectype,
1.0/sqrt.  */
 
 static bool
-use_rsqrt_p ()
+use_rsqrt_p (machine_mode mode)
 {
-  return (TARGET_SSE && TARGET_SSE_MATH
+  return ((mode == HFmode
+  || (TARGET_SSE && TARGET_SSE_MATH))
  && flag_finite_math_only
  && !flag_trapping_math
  && flag_unsafe_math_optimizations);
@@ -23603,29 +23604,27 @@ ix86_optab_supported_p (int op, machine_mode mode1, 
machine_mode,
   return opt_type == OPTIMIZE_FOR_SPEED;
 
 case rint_optab:
-  if (mode1 == HFmode)
-   return true;
-  else if (SSE_FLOAT_MODE_P (mode1)
-  && TARGET_SSE_MATH
-  && !flag_trapping_math
-  && !TARGET_SSE4_1)
+  if (SSE_FLOAT_MODE_P (mode1)
+ && TARGET_SSE_MATH
+ && !flag_trapping_math
+ && !TARGET_SSE4_1
+ && mode1 != HFmode)
return opt_type == OPTIMIZE_FOR_SPEED;
   return true;
 
 case floor_optab:
 case ceil_optab:
 case btrunc_optab:
-  if (mode1 == HFmode)
-   return true;
-  else if (SSE_FLOAT_MODE_P (mode1)
-  && TARGET_SSE_MATH
-  && !flag_trapping_math
-  && TARGET_SSE4_1)
+  if (((SSE_FLOAT_MODE_P (mode1)
+   && TARGET_SSE_MATH
+   && TARGET_SSE4_1)
+  || mode1 == HFmode)
+ && !flag_trapping_math)
return true;
   return opt_type == OPTIMIZE_FOR_SPEED;
 
 case rsqrt_optab:
-  return opt_type == OPTIMIZE_FOR_SPEED && use_rsqrt_p ();
+  return opt_type == OPTIMIZE_FOR_SPEED && use_rsqrt_p (mode1);
 
 default:
   return true;
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index e733a40fc90..11535df5425 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -8417,11 +8417,27 @@
(match_operand:XF 2 "register_operand")))]
   "TARGET_80387")
 
+/* There is no more precision loss than Newton-Rhapson approximation
+  when using HFmode rcp/rsqrt, so do the transformation directly under
+  TARGET_RECIP_DIV and fast-math.  */
 (define_expand "divhf3"
   [(set (match_operand:HF 0 "register_operand")
(div:HF (match_operand:HF 1 "register_operand")
   (match_operand:HF 2 "nonimmediate_operand")))]
-  "TARGET_AVX512FP16")
+  "TARGET_AVX512FP16"
+{
+  if (TARGET_RECIP_DIV
+  && optimize_insn_for_speed_p ()
+  && flag_finite_math_only && !flag_trapping_math
+  && flag_unsafe_math_optimizations)
+{
+  rtx op = gen_reg_rtx (HFmode);
+  operands[2] = force_reg (HFmode, operands[2]);
+  emit_insn (gen_rcphf2 (op, operands[2]));
+  emit_insn (gen_mulhf3 (operands[0], operands[1], op));
+  DONE;
+}
+})
 
 (define_expand "div3"
   [(set (match_operand:MODEF 0 "register_operand")
@@ -16973,6 +16989,19 @@
]
(symbol_ref "true")))])
 
+(define_insn "rcphf2"
+  [(set (match_operand:HF 0 "register_operand" "=v,v")
+   (unspec:HF [(match_operand:HF 1 "nonimmediate_operand" "v,m")]
+  UNSPEC_RCP))]
+  "TARGET_AVX512FP16"

Re: [PATCH] libcody: add mostlyclean Makefile target

2021-10-26 Thread Martin Liška

On 10/25/21 18:10, Eric Gallager wrote:

On Mon, Oct 25, 2021 at 7:35 AM Martin Liška  wrote:


Hello.

The patch adds missing Makefile mostlyclean.

Ready to be installed?
Thanks,
Martin



Generally the way the various "*clean" targets are arranged, in order
of cleanliness, from least clean to most clean, is:
mostlyclean
clean
distclean
maintainer-clean
...with each target depending on the previous one in the order. So
thus, instead of mostlyclean depending on clean, it'd be the other way
around, with clean depending on mostlyclean. See how the gcc/
subdirectory does it, for example. See the "Standard Targets for
Users" section of the GNU Coding Standards:
https://www.gnu.org/prep/standards/html_node/Standard-Targets.html#Standard-Targets


Thank you for the explanation.

There's updated version of the patch.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin




 PR other/102657

libcody/ChangeLog:

 * Makefile.in: Add mostlyclean Makefile target.
---
   libcody/Makefile.in | 4 +++-
   1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/libcody/Makefile.in b/libcody/Makefile.in
index b8b45a2e310..d8f1e8216d4 100644
--- a/libcody/Makefile.in
+++ b/libcody/Makefile.in
@@ -111,7 +111,7 @@ maintainer-clean:: distclean
   clean::
 rm -f $(shell find $(srcdir) -name '*~')

-.PHONY: all check clean distclean maintainer-clean
+.PHONY: all check clean distclean maintainer-clean mostlyclean

   CXXFLAGS/ := -I$(srcdir)
   LIBCODY.O := buffer.o client.o fatal.o netclient.o netserver.o \
@@ -127,6 +127,8 @@ clean::
 rm -f $(LIBCODY.O) $(LIBCODY.O:.o=.d)
 rm -f libcody.a

+mostlyclean: clean
+
   CXXFLAGS/fatal.cc = -DSRCDIR='"$(srcdir)"'

   fatal.o: Makefile revision
--
2.33.1

From fcad6039f910b49dfc4022d3b1eeb993025dabca Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Mon, 25 Oct 2021 16:32:55 +0200
Subject: [PATCH] libcody: add mostlyclean Makefile target

	PR other/102657

libcody/ChangeLog:

	* Makefile.in: Add mostlyclean Makefile target.
---
 libcody/Makefile.in | 20 ++--
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/libcody/Makefile.in b/libcody/Makefile.in
index b8b45a2e310..7eaf8ace8ce 100644
--- a/libcody/Makefile.in
+++ b/libcody/Makefile.in
@@ -60,6 +60,8 @@ vpath %.cc $(srcdir)
 
 all:: Makefile
 
+mostlyclean::
+
 clean:: Makefile
 
 # FIXME: Delete
@@ -77,7 +79,9 @@ revision.stamp: $(srcdir)/.
 revision: revision.stamp
 	@cmp -s $< $@ || cp -f $< $@
 
-clean::
+mostlyclean::
+
+clean:: mostlyclean
 	rm -f revision.stamp revision
 
 distclean:: clean
@@ -102,16 +106,18 @@ endif
 config.status: $(srcdir)/configure $(srcdir)/config.h.in
 	if test -x $@; then ./$@ -recheck; else $< @configure_args@; fi
 
+mostlyclean::
+
+clean:: mostlyclean
+	rm -f $(shell find $(srcdir) -name '*~')
+
 distclean:: clean
 	rm -f config.h
 
 maintainer-clean:: distclean
 	rm -f $(srcdir)/config.h.in
 
-clean::
-	rm -f $(shell find $(srcdir) -name '*~')
-
-.PHONY: all check clean distclean maintainer-clean
+.PHONY: all check clean distclean maintainer-clean mostlyclean
 
 CXXFLAGS/ := -I$(srcdir)
 LIBCODY.O := buffer.o client.o fatal.o netclient.o netserver.o \
@@ -123,7 +129,9 @@ libcody.a: $(LIBCODY.O)
 	$(AR) -cr $@ $^
 	$(RANLIB) $@
 
-clean::
+mostlyclean::
+
+clean:: mostlyclean
 	rm -f $(LIBCODY.O) $(LIBCODY.O:.o=.d)
 	rm -f libcody.a
 
-- 
2.33.1



[PATCH] target: [PR102941] Fix inline-asm flags with non-REG_P output

2021-10-26 Thread apinski--- via Gcc-patches
From: Andrew Pinski 

So the problem here is that arm_md_asm_adjust would
just create a set directly to the output memory which is wrong.
It needs to output to a temp register first and then do a
move.

OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions.
I have no way to test on arm even though this touches common code.

PR target/102941

gcc/ChangeLog:

* config/arm/aarch-common.c (arm_md_asm_adjust):
Use a temp if !REG_P.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/asm-flag-7.c: New test.
* gcc.target/arm/asm-flag-7.c: New test.
---
 gcc/config/arm/aarch-common.c |  2 +-
 gcc/testsuite/gcc.target/aarch64/asm-flag-7.c | 22 ++
 gcc/testsuite/gcc.target/arm/asm-flag-7.c | 23 +++
 3 files changed, 46 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/asm-flag-7.c
 create mode 100644 gcc/testsuite/gcc.target/arm/asm-flag-7.c

diff --git a/gcc/config/arm/aarch-common.c b/gcc/config/arm/aarch-common.c
index 67343fe4025..60b3516c1df 100644
--- a/gcc/config/arm/aarch-common.c
+++ b/gcc/config/arm/aarch-common.c
@@ -641,7 +641,7 @@ arm_md_asm_adjust (vec &outputs, vec & /*inputs*/,
   rtx x = gen_rtx_REG (mode, CC_REGNUM);
   x = gen_rtx_fmt_ee (code, word_mode, x, const0_rtx);
 
-  if (dest_mode == word_mode)
+  if (dest_mode == word_mode && REG_P (dest))
emit_insn (gen_rtx_SET (dest, x));
   else
{
diff --git a/gcc/testsuite/gcc.target/aarch64/asm-flag-7.c 
b/gcc/testsuite/gcc.target/aarch64/asm-flag-7.c
new file mode 100644
index 000..6c31b854b0b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/asm-flag-7.c
@@ -0,0 +1,22 @@
+/* Test that "=@cc*" works with MEM_P RTX  */
+/* PR target/102941 */
+/* { dg-do compile } */
+/* { dg-options "-O" } */
+
+#ifndef __GCC_ASM_FLAG_OUTPUTS__
+#error "missing preprocessor define"
+#endif
+int test_cmpu_x;
+
+void f(long *);
+long
+test_cmpu_y() {
+  long le;
+  f(&le);
+  __asm__("cmp %"
+  "[x], %"
+  "[y]"
+  : "=@ccls"(le)
+  : [x] ""(test_cmpu_x), [y] ""(test_cmpu_y));
+return le;
+}
diff --git a/gcc/testsuite/gcc.target/arm/asm-flag-7.c 
b/gcc/testsuite/gcc.target/arm/asm-flag-7.c
new file mode 100644
index 000..ac11da0a3a8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/asm-flag-7.c
@@ -0,0 +1,23 @@
+/* Test that "=@cc*" works with MEM_P RTX  */
+/* PR target/102941 */
+/* { dg-do compile } */
+/* { dg-options "-O" } */
+/* { dg-skip-if "" { arm_thumb1 } } */
+
+#ifndef __GCC_ASM_FLAG_OUTPUTS__
+#error "missing preprocessor define"
+#endif
+int test_cmpu_x;
+
+void f(long *);
+long
+test_cmpu_y() {
+  long le;
+  f(&le);
+  __asm__("cmp %"
+  "[x], %"
+  "[y]"
+  : "=@ccls"(le)
+  : [x] ""(test_cmpu_x), [y] ""(test_cmpu_y));
+return le;
+}
-- 
2.17.1



Re: [PATCH,Fortran 6/7] Fortran: Delete unused decl in trans-types.h

2021-10-26 Thread Tobias Burnus

On 25.10.21 00:30, Bernhard Reutner-Fischer via Fortran wrote:

From: Bernhard Reutner-Fischer 
gcc/fortran/ChangeLog:

  * trans-types.h (gfc_convert_function_code): Delete.


OK.

Tobias


---
  gcc/fortran/trans-types.h | 3 ---
  1 file changed, 3 deletions(-)

diff --git a/gcc/fortran/trans-types.h b/gcc/fortran/trans-types.h
index 1b43503092b..3bc236cad0d 100644
--- a/gcc/fortran/trans-types.h
+++ b/gcc/fortran/trans-types.h
@@ -65,9 +65,6 @@ enum gfc_packed {
PACKED_STATIC
  };

-/* be-function.c */
-void gfc_convert_function_code (gfc_namespace *);
-
  /* trans-types.c */
  void gfc_init_kinds (void);
  void gfc_init_types (void);

-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


Re: [PATCH,Fortran 5/7] Fortran: Delete unused decl in trans-stmt.h

2021-10-26 Thread Tobias Burnus

On 25.10.21 00:30, Bernhard Reutner-Fischer via Gcc-patches wrote:

From: Bernhard Reutner-Fischer 

gcc/fortran/ChangeLog:

  * trans-stmt.h (gfc_trans_deallocate_array): Delete.


OK. Thanks!

Tobias


---
  gcc/fortran/trans-stmt.h | 1 -
  1 file changed, 1 deletion(-)

diff --git a/gcc/fortran/trans-stmt.h b/gcc/fortran/trans-stmt.h
index 1a24d9b4cdc..e824caf4d08 100644
--- a/gcc/fortran/trans-stmt.h
+++ b/gcc/fortran/trans-stmt.h
@@ -66,7 +66,6 @@ tree gfc_trans_sync_team (gfc_code *);
  tree gfc_trans_where (gfc_code *);
  tree gfc_trans_allocate (gfc_code *);
  tree gfc_trans_deallocate (gfc_code *);
-tree gfc_trans_deallocate_array (tree);

  /* trans-openmp.c */
  tree gfc_trans_omp_directive (gfc_code *);

-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


Re: [PATCH,Fortran 4/7] Fortran: make some trans-array functions static

2021-10-26 Thread Tobias Burnus

On 25.10.21 00:30, Bernhard Reutner-Fischer via Fortran wrote:

From: Bernhard Reutner-Fischer 

gcc/fortran/ChangeLog:

  * trans-array.c (gfc_trans_scalarized_loop_end): Make static.
  * trans-array.h (gfc_trans_scalarized_loop_end,
  gfc_conv_tmp_ref, gfc_conv_array_transpose): Delete declaration.


(the later are declared but no longer exist)

OK. Thanks for the cleanup!

Tobias


---
  gcc/fortran/trans-array.c | 2 +-
  gcc/fortran/trans-array.h | 6 --
  2 files changed, 1 insertion(+), 7 deletions(-)

diff --git a/gcc/fortran/trans-array.c b/gcc/fortran/trans-array.c
index bceb8b24ba4..5ceb261b698 100644
--- a/gcc/fortran/trans-array.c
+++ b/gcc/fortran/trans-array.c
@@ -4161,7 +4161,7 @@ gfc_start_scalarized_body (gfc_loopinfo * loop, 
stmtblock_t * pbody)

  /* Generates the actual loop code for a scalarization loop.  */

-void
+static void
  gfc_trans_scalarized_loop_end (gfc_loopinfo * loop, int n,
 stmtblock_t * pbody)
  {
diff --git a/gcc/fortran/trans-array.h b/gcc/fortran/trans-array.h
index 1d3dc4819eb..12068c742a5 100644
--- a/gcc/fortran/trans-array.h
+++ b/gcc/fortran/trans-array.h
@@ -118,8 +118,6 @@ void gfc_copy_loopinfo_to_se (gfc_se *, gfc_loopinfo *);

  /* Marks the start of a scalarized expression, and declares loop variables.  
*/
  void gfc_start_scalarized_body (gfc_loopinfo *, stmtblock_t *);
-/* Generates one actual loop for a scalarized expression.  */
-void  gfc_trans_scalarized_loop_end (gfc_loopinfo *, int, stmtblock_t *);
  /* Generates the actual loops for a scalarized expression.  */
  void gfc_trans_scalarizing_loops (gfc_loopinfo *, stmtblock_t *);
  /* Mark the end of the main loop body and the start of the copying loop.  */
@@ -137,8 +135,6 @@ tree gfc_build_null_descriptor (tree);
  void gfc_conv_array_ref (gfc_se *, gfc_array_ref *, gfc_expr *, locus *);
  /* Translate a reference to a temporary array.  */
  void gfc_conv_tmp_array_ref (gfc_se * se);
-/* Translate a reference to an array temporary.  */
-void gfc_conv_tmp_ref (gfc_se *);

  /* Calculate the overall offset, including subreferences.  */
  void gfc_get_dataptr_offset (stmtblock_t*, tree, tree, tree, bool, gfc_expr*);
@@ -149,8 +145,6 @@ void gfc_conv_expr_descriptor (gfc_se *, gfc_expr *);
  /* Convert an array for passing as an actual function parameter.  */
  void gfc_conv_array_parameter (gfc_se *, gfc_expr *, bool,
 const gfc_symbol *, const char *, tree *);
-/* Evaluate and transpose a matrix expression.  */
-void gfc_conv_array_transpose (gfc_se *, gfc_expr *);

  /* These work with both descriptors and descriptorless arrays.  */
  tree gfc_conv_array_data (tree);

-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


Re: [PATCH,Fortran 2/7] Fortran: make some match* functions static

2021-10-26 Thread Tobias Burnus

On 25.10.21 00:30, Bernhard Reutner-Fischer via Gcc-patches wrote:


From: Bernhard Reutner-Fischer 

gfc_match_small_int_expr was unused, delete it.
gfc_match_gcc_unroll should use gfc_match_small_literal_int and then

(but wasn't in this patch)

gfc_match_small_int can be deleted since it will be unused.

gcc/fortran/ChangeLog:

  * decl.c (gfc_match_old_kind_spec, set_com_block_bind_c,
  set_verify_bind_c_sym, set_verify_bind_c_com_block,
  get_bind_c_idents, gfc_match_suffix, gfc_get_type_attr_spec,
  (check_extended_derived_type): Make static.
  (gfc_match_gcc_unroll): Add comment.
  * match.c (gfc_match_small_int_expr): Delete definition.
  * match.h (gfc_match_small_int_expr): Delete declaration.
  (gfc_match_name_C, gfc_match_old_kind_spec, set_com_block_bind_c,
  set_verify_bind_c_sym, set_verify_bind_c_com_block,
  get_bind_c_idents, gfc_match_suffix,
  gfc_get_type_attr_spec): Delete declaration.


OK. Thanks for the cleanup.

Tobias


---
  gcc/fortran/decl.c  | 15 ---
  gcc/fortran/match.c | 26 --
  gcc/fortran/match.h |  9 -
  3 files changed, 8 insertions(+), 42 deletions(-)

diff --git a/gcc/fortran/decl.c b/gcc/fortran/decl.c
index 6043e100fbb..1e034d1b344 100644
--- a/gcc/fortran/decl.c
+++ b/gcc/fortran/decl.c
@@ -3128,7 +3128,7 @@ cleanup:
 This assumes that the byte size is equal to the kind number for
 non-COMPLEX types, and equal to twice the kind number for COMPLEX.  */

-match
+static match
  gfc_match_old_kind_spec (gfc_typespec *ts)
  {
match m;
@@ -5867,7 +5867,7 @@ set_binding_label (const char **dest_label, const char 
*sym_name,
  /* Set the status of the given common block as being BIND(C) or not,
 depending on the given parameter, is_bind_c.  */

-void
+static void
  set_com_block_bind_c (gfc_common_head *com_block, int is_bind_c)
  {
com_block->is_bind_c = is_bind_c;
@@ -6055,7 +6055,7 @@ verify_bind_c_sym (gfc_symbol *tmp_sym, gfc_typespec *ts,
 the type is C interoperable.  Errors are reported by the functions
 used to set/test these fields.  */

-bool
+static bool
  set_verify_bind_c_sym (gfc_symbol *tmp_sym, int num_idents)
  {
bool retval = true;
@@ -6075,7 +6075,7 @@ set_verify_bind_c_sym (gfc_symbol *tmp_sym, int 
num_idents)
  /* Set the fields marking the given common block as BIND(C), including
 a binding label, and report any errors encountered.  */

-bool
+static bool
  set_verify_bind_c_com_block (gfc_common_head *com_block, int num_idents)
  {
bool retval = true;
@@ -6095,7 +6095,7 @@ set_verify_bind_c_com_block (gfc_common_head *com_block, 
int num_idents)
  /* Retrieve the list of one or more identifiers that the given bind(c)
 attribute applies to.  */

-bool
+static bool
  get_bind_c_idents (void)
  {
char name[GFC_MAX_SYMBOL_LEN + 1];
@@ -6804,7 +6804,7 @@ match_result (gfc_symbol *function, gfc_symbol **result)
 clause and BIND(C), either one, or neither.  The draft does not
 require them to come in a specific order.  */

-match
+static match
  gfc_match_suffix (gfc_symbol *sym, gfc_symbol **result)
  {
match is_bind_c;   /* Found bind(c).  */
@@ -10116,7 +10116,7 @@ check_extended_derived_type (char *name)
 not a handled attribute, and MATCH_YES otherwise.  TODO: More error
 checking on attribute conflicts needs to be done.  */

-match
+static match
  gfc_get_type_attr_spec (symbol_attribute *attr, char *name)
  {
/* See if the derived type is marked as private.  */
@@ -11794,6 +11794,7 @@ gfc_match_gcc_unroll (void)
  {
int value;

+  /* FIXME: use gfc_match_small_literal_int instead, delete small_int  */
if (gfc_match_small_int (&value) == MATCH_YES)
  {
if (value < 0 || value > USHRT_MAX)
diff --git a/gcc/fortran/match.c b/gcc/fortran/match.c
index 91cde55d7a1..5d07f897e45 100644
--- a/gcc/fortran/match.c
+++ b/gcc/fortran/match.c
@@ -530,32 +530,6 @@ gfc_match_small_int (int *value)
  }


-/* This function is the same as the gfc_match_small_int, except that
-   we're keeping the pointer to the expr.  This function could just be
-   removed and the previously mentioned one modified, though all calls
-   to it would have to be modified then (and there were a number of
-   them).  Return MATCH_ERROR if fail to extract the int; otherwise,
-   return the result of gfc_match_expr().  The expr (if any) that was
-   matched is returned in the parameter expr.  */
-
-match
-gfc_match_small_int_expr (int *value, gfc_expr **expr)
-{
-  match m;
-  int i;
-
-  m = gfc_match_expr (expr);
-  if (m != MATCH_YES)
-return m;
-
-  if (gfc_extract_int (*expr, &i, 1))
-m = MATCH_ERROR;
-
-  *value = i;
-  return m;
-}
-
-
  /* Matches a statement label.  Uses gfc_match_small_literal_int() to
 do most of the work.  */

diff --git a/gcc/fortran/match.h b/gcc/fortran/match.h
index eb9459ea99c..e9368db281d 100644
--- a/gcc/fortran/match.h
+++ b/gcc/fortran/match.h
@@ -48,9 +48,7 @@ m

Re: [PATCH,Fortran 1/7] Fortran: make some trans* functions static

2021-10-26 Thread Tobias Burnus

On 25.10.21 00:30, Bernhard Reutner-Fischer via Fortran wrote:


From: Bernhard Reutner-Fischer 

This makes some trans* functions static and deletes declarations of
functions that either do not exist anymore like gfc_get_function_decl
or that are unused like gfc_check_any_c_kind.

gcc/fortran/ChangeLog:
  * expr.c (is_non_empty_structure_constructor): Make static.
  * gfortran.h (gfc_check_any_c_kind): Delete.
  * match.c (gfc_match_label): Make static.
  * match.h (gfc_match_label): Delete declaration.
  * scanner.c (file_changes_cur, file_changes_count,
  file_changes_allocated): Make static.
  * trans-expr.c (gfc_get_character_len): Make static.
  (gfc_class_len_or_zero_get): Make static.
  (VTAB_GET_FIELD_GEN): Undefine.
  (gfc_get_class_array_ref): Make static.
  (gfc_finish_interface_mapping): Make static.
  * trans-types.c (gfc_check_any_c_kind): Delete.
  (pfunc_type_node, dtype_type_node, gfc_get_ppc_type): Make static.
  * trans-types.h (gfc_get_ppc_type): Delete declaration.
  * trans.c (gfc_msg_wrong_return): Delete.
  * trans.h (gfc_class_len_or_zero_get, gfc_class_vtab_extends_get,
  gfc_vptr_extends_get, gfc_get_class_array_ref, gfc_get_character_len,
  gfc_finish_interface_mapping, gfc_msg_wrong_return,
  gfc_get_function_decl): Delete declaration.


OK.

Thanks for the cleanup!

Tobias


---
  gcc/fortran/expr.c|  2 +-
  gcc/fortran/gfortran.h|  1 -
  gcc/fortran/match.c   |  2 +-
  gcc/fortran/match.h   |  1 -
  gcc/fortran/scanner.c |  4 ++--
  gcc/fortran/trans-expr.c  | 10 +-
  gcc/fortran/trans-types.c | 25 +++--
  gcc/fortran/trans-types.h |  1 -
  gcc/fortran/trans.c   |  1 -
  gcc/fortran/trans.h   | 11 ---
  10 files changed, 12 insertions(+), 46 deletions(-)

diff --git a/gcc/fortran/expr.c b/gcc/fortran/expr.c
index b19d3a26c60..4dea840e348 100644
--- a/gcc/fortran/expr.c
+++ b/gcc/fortran/expr.c
@@ -4817,7 +4817,7 @@ gfc_apply_init (gfc_typespec *ts, symbol_attribute *attr, 
gfc_expr *init)
  /* Check whether an expression is a structure constructor and whether it has
 other values than NULL.  */

-bool
+static bool
  is_non_empty_structure_constructor (gfc_expr * e)
  {
if (e->expr_type != EXPR_STRUCTURE)
diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index 66192c07d8c..f7662c59a5d 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -3284,7 +3284,6 @@ bool gfc_check_character_range (gfc_char_t, int);
  extern bool gfc_seen_div0;

  /* trans-types.c */
-bool gfc_check_any_c_kind (gfc_typespec *);
  int gfc_validate_kind (bt, int, bool);
  int gfc_get_int_kind_from_width_isofortranenv (int size);
  int gfc_get_real_kind_from_width_isofortranenv (int size);
diff --git a/gcc/fortran/match.c b/gcc/fortran/match.c
index 53a575e616e..91cde55d7a1 100644
--- a/gcc/fortran/match.c
+++ b/gcc/fortran/match.c
@@ -599,7 +599,7 @@ cleanup:
 it.  We also make sure the symbol does not refer to another
 (active) block.  A matched label is pointed to by gfc_new_block.  */

-match
+static match
  gfc_match_label (void)
  {
char name[GFC_MAX_SYMBOL_LEN + 1];
diff --git a/gcc/fortran/match.h b/gcc/fortran/match.h
index 21e94f79d95..eb9459ea99c 100644
--- a/gcc/fortran/match.h
+++ b/gcc/fortran/match.h
@@ -47,7 +47,6 @@ match gfc_match_space (void);
  match gfc_match_eos (void);
  match gfc_match_small_literal_int (int *, int *);
  match gfc_match_st_label (gfc_st_label **);
-match gfc_match_label (void);
  match gfc_match_small_int (int *);
  match gfc_match_small_int_expr (int *, gfc_expr **);
  match gfc_match_name (char *);
diff --git a/gcc/fortran/scanner.c b/gcc/fortran/scanner.c
index 5a450692ba3..69b81ab97f8 100644
--- a/gcc/fortran/scanner.c
+++ b/gcc/fortran/scanner.c
@@ -78,8 +78,8 @@ static struct gfc_file_change
gfc_linebuf *lb;
int line;
  } *file_changes;
-size_t file_changes_cur, file_changes_count;
-size_t file_changes_allocated;
+static size_t file_changes_cur, file_changes_count;
+static size_t file_changes_allocated;

  static gfc_char_t *last_error_char;

diff --git a/gcc/fortran/trans-expr.c b/gcc/fortran/trans-expr.c
index 2d7f9e0fb91..e7aec3845d3 100644
--- a/gcc/fortran/trans-expr.c
+++ b/gcc/fortran/trans-expr.c
@@ -45,7 +45,7 @@ along with GCC; see the file COPYING3.  If not see

  /* Calculate the number of characters in a string.  */

-tree
+static tree
  gfc_get_character_len (tree type)
  {
tree len;
@@ -278,7 +278,7 @@ gfc_class_len_get (tree decl)
  /* Try to get the _len component of a class.  When the class is not unlimited
 poly, i.e. no _len field exists, then return a zero node.  */

-tree
+static tree
  gfc_class_len_or_zero_get (tree decl)
  {
tree len;
@@ -382,7 +382,7 @@ VTAB_GET_FIELD_GEN (def_init, VTABLE_DEF_INIT_FIELD)
  VTAB_GET_FIELD_GEN (copy, VTABLE_COPY_FIELD)
  VTAB_GET_FIELD_GEN (final, VTABLE_FINAL_FIELD)
  VTAB_GET_FIELD_GEN (dealloc

Re: [PATCH,Fortran 3/7] Fortran: make some constructor* functions static

2021-10-26 Thread Tobias Burnus

On 25.10.21 00:30, Bernhard Reutner-Fischer via Fortran wrote:

From: Bernhard Reutner-Fischer 
gfc_constructor_expr_foreach and gfc_constructor_swap were just stubs.
gcc/fortran/ChangeLog:

  * constructor.c (gfc_constructor_get_base): Make static.
  (gfc_constructor_expr_foreach, (gfc_constructor_swap): Delete.
  * constructor.h (gfc_constructor_get_base): Remove declaration.
  (gfc_constructor_expr_foreach, (gfc_constructor_swap): Delete.


OK. Thanks for the cleanup.

Tobias


---
  gcc/fortran/constructor.c | 20 ++--
  gcc/fortran/constructor.h | 10 --
  2 files changed, 2 insertions(+), 28 deletions(-)

diff --git a/gcc/fortran/constructor.c b/gcc/fortran/constructor.c
index 3e4377a5ad3..4b5a748271b 100644
--- a/gcc/fortran/constructor.c
+++ b/gcc/fortran/constructor.c
@@ -85,7 +85,8 @@ gfc_constructor_get (void)
return c;
  }

-gfc_constructor_base gfc_constructor_get_base (void)
+static gfc_constructor_base
+gfc_constructor_get_base (void)
  {
return splay_tree_new (splay_tree_compare_ints, NULL, node_free);
  }
@@ -209,23 +210,6 @@ gfc_constructor_lookup_expr (gfc_constructor_base base, 
int offset)
  }


-int
-gfc_constructor_expr_foreach (gfc_constructor *ctor ATTRIBUTE_UNUSED,
-   int(*f)(gfc_expr *) ATTRIBUTE_UNUSED)
-{
-  gcc_assert (0);
-  return 0;
-}
-
-void
-gfc_constructor_swap (gfc_constructor *ctor ATTRIBUTE_UNUSED,
-  int n ATTRIBUTE_UNUSED, int m ATTRIBUTE_UNUSED)
-{
-  gcc_assert (0);
-}
-
-
-
  gfc_constructor *
  gfc_constructor_first (gfc_constructor_base base)
  {
diff --git a/gcc/fortran/constructor.h b/gcc/fortran/constructor.h
index 85a72dcfc0e..25cd6a8f192 100644
--- a/gcc/fortran/constructor.h
+++ b/gcc/fortran/constructor.h
@@ -23,8 +23,6 @@ along with GCC; see the file COPYING3.  If not see
  /* Get a new constructor structure.  */
  gfc_constructor *gfc_constructor_get (void);

-gfc_constructor_base gfc_constructor_get_base (void);
-
  /* Copy a constructor structure.  */
  gfc_constructor_base gfc_constructor_copy (gfc_constructor_base base);

@@ -64,14 +62,6 @@ gfc_constructor *gfc_constructor_lookup 
(gfc_constructor_base base, int n);
  */
  gfc_expr *gfc_constructor_lookup_expr (gfc_constructor_base base, int n);

-
-int gfc_constructor_expr_foreach (gfc_constructor *ctor, int(*)(gfc_expr *));
-
-
-void gfc_constructor_swap (gfc_constructor *ctor, int n, int m);
-
-
-
  /* Get the first constructor node in the constructure structure.
 Returns NULL if there is no such expression.  */
  gfc_constructor *gfc_constructor_first (gfc_constructor_base base);

-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


[PATCH] Move negative stride bias out of dr_misalignment

2021-10-26 Thread Richard Biener via Gcc-patches
This moves applying of a bias for negative stride accesses out of
dr_misalignment in favor of a more general optional offset argument.
The negative bias is now computed by get_load_store_type and applied
accordingly to determine the alignment support scheme.  Likewise
the peeling/versioning code is adjusted albeit that still assumes
we'll end up with VMAT_CONTIGUOUS_DOWN or VMAT_CONTIGUOUS_REVERSE
but at least when not so (VMAT_STRIDED_SLP is one possibility) then
get_load_store_type will _not_ falsely report an aligned access but
instead an access with known misalignment.

This fixes PR96109.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2021-10-25  Richard Biener  

PR tree-optimization/96109
* tree-vectorizer.h (dr_misalignment): Add optional offset
parameter.
* tree-vect-data-refs.c (dr_misalignment): Likewise.  Remove
offset applied for negative stride accesses.
(vect_enhance_data_refs_alignment): Compute negative stride
access offset and pass it to dr_misalignment.
* tree-vect-stmts.c (get_negative_load_store_type): Pass
negative offset to dr_misalignment.
(get_group_load_store_type): Likewise.
(get_load_store_type): Likewise.
(vectorizable_store): Remove asserts about alignment.
(vectorizable_load): Likewise.
---
 gcc/tree-vect-data-refs.c | 58 ++-
 gcc/tree-vect-stmts.c | 19 +
 gcc/tree-vectorizer.h |  3 +-
 3 files changed, 48 insertions(+), 32 deletions(-)

diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index 46360c50bd4..cda89ef9725 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -887,10 +887,11 @@ vect_slp_analyze_instance_dependence (vec_info *vinfo, 
slp_instance instance)
   return res;
 }
 
-/* Return the misalignment of DR_INFO accessed in VECTYPE.  */
+/* Return the misalignment of DR_INFO accessed in VECTYPE with OFFSET
+   applied.  */
 
 int
-dr_misalignment (dr_vec_info *dr_info, tree vectype)
+dr_misalignment (dr_vec_info *dr_info, tree vectype, poly_int64 offset)
 {
   HOST_WIDE_INT diff = 0;
   /* Alignment is only analyzed for the first element of a DR group,
@@ -919,13 +920,9 @@ dr_misalignment (dr_vec_info *dr_info, tree vectype)
targetm.vectorize.preferred_vector_alignment (vectype)))
 return DR_MISALIGNMENT_UNKNOWN;
 
-  /* If this is a backward running DR then first access in the larger
- vectype actually is N-1 elements before the address in the DR.
- Adjust misalign accordingly.  */
-  poly_int64 misalignment = misalign + diff;
-  if (tree_int_cst_sgn (DR_STEP (dr_info->dr)) < 0)
-misalignment += ((TYPE_VECTOR_SUBPARTS (vectype) - 1)
-* -TREE_INT_CST_LOW (TYPE_SIZE_UNIT (TREE_TYPE 
(vectype;
+  /* Apply the offset from the DR group start and the externally supplied
+ offset which can for example result from a negative stride access.  */
+  poly_int64 misalignment = misalign + diff + offset;
 
   /* vect_compute_data_ref_alignment will have ensured that target_alignment
  is constant and otherwise set misalign to DR_MISALIGNMENT_UNKNOWN.  */
@@ -1549,8 +1546,15 @@ vect_get_peeling_costs_all_drs (loop_vec_info loop_vinfo,
   int misalignment;
   unsigned HOST_WIDE_INT alignment;
 
+  bool negative = tree_int_cst_compare (DR_STEP (dr_info->dr),
+   size_zero_node) < 0;
+  poly_int64 off = 0;
+  if (negative)
+   off = ((TYPE_VECTOR_SUBPARTS (vectype) - 1)
+  * -TREE_INT_CST_LOW (TYPE_SIZE_UNIT (TREE_TYPE (vectype;
+
   if (npeel == 0)
-   misalignment = dr_misalignment (dr_info, vectype);
+   misalignment = dr_misalignment (dr_info, vectype, off);
   else if (dr_info == dr0_info
   || vect_dr_aligned_if_peeled_dr_is (dr_info, dr0_info))
misalignment = 0;
@@ -1560,7 +1564,7 @@ vect_get_peeling_costs_all_drs (loop_vec_info loop_vinfo,
misalignment = DR_MISALIGNMENT_UNKNOWN;
   else
{
- misalignment = dr_misalignment (dr_info, vectype);
+ misalignment = dr_misalignment (dr_info, vectype, off);
  misalignment += npeel * TREE_INT_CST_LOW (DR_STEP (dr_info->dr));
  misalignment &= alignment - 1;
}
@@ -1960,8 +1964,11 @@ vect_enhance_data_refs_alignment (loop_vec_info 
loop_vinfo)
   */
  unsigned int target_align =
DR_TARGET_ALIGNMENT (dr_info).to_constant ();
- unsigned int dr_size = vect_get_scalar_dr_size (dr_info);
- unsigned int mis = dr_misalignment (dr_info, vectype);
+ unsigned HOST_WIDE_INT dr_size = vect_get_scalar_dr_size 
(dr_info);
+ poly_int64 off = 0;
+ if (negative)
+   off = (TYPE_VECTOR_SUBPARTS (vectype) - 1) * -dr_size;
+ unsigned int mis = dr_misalignment (dr_info, vectype, 

Re: [PATCH] libstdc++: Fix 28_regex/basic_regex/84110.cc on Solaris

2021-10-26 Thread Jonathan Wakely via Gcc-patches
On Mon, 25 Oct 2021 at 13:09, Rainer Orth wrote:
>
> 28_regex/basic_regex/84110.cc currently FAILs on Solaris:
>
> FAIL: 28_regex/basic_regex/84110.cc (test for excess errors)
> UNRESOLVED: 28_regex/basic_regex/84110.cc compilation failed to produce 
> executable
>
> Excess errors:
> /vol/gcc/src/hg/master/local/libstdc++-v3/testsuite/28_regex/basic_regex/84110.cc:14:
>  error: reference to 'extended' is ambiguous
>
> The issue is seen in the full output:
>
> /vol/gcc/src/hg/master/local/libstdc++-v3/testsuite/28_regex/basic_regex/84110.cc:
>  In function ‘void test01()’:
> /vol/gcc/src/hg/master/local/libstdc++-v3/testsuite/28_regex/basic_regex/84110.cc:14:
>  error: reference to ‘extended’ is ambiguous
> In file included from 
> /var/gcc/regression/master/11.4-gcc-gas/build/gcc/include-fixed/math.h:391,
>  from 
> /var/gcc/regression/master/11.4-gcc-gas/build/i386-pc-solaris2.11/libstdc++-v3/include/cmath:45,
>  from 
> /vol/gcc/src/hg/master/local/libstdc++-v3/include/precompiled/stdc++.h:41:
> /usr/include/floatingpoint.h:73: note: candidates are: ‘typedef unsigned int 
> extended [3]’
>
> Fixed by qualifying extended.  Tested on i386-pc-solaris2.11,
> sparc-sun-solaris2.11, and x86_64-pc-linux-gnu.
>
> Ok for master?
>
> I'm not certain if this is the best fix, though.

Yeah, it's a bit odd to have only one of them qualified. I'll probably
forget why it's like that and remove it again to make it consistent
;-)

Please do:

using namespace std::regex_constants;
// See https://gcc.gnu.org/pipermail/gcc-patches/2021-October/582486.html
using std::regex_constants::extended;

(There's no difference between std::regex::extended and
std::regex_constants::extended, but we might as well consistently
qualify it one way or the other in a given scope).

OK for master like that - thanks!


Re: [PATCH] libstdc++: Fix 17_intro/names.cc on Solaris

2021-10-26 Thread Jonathan Wakely via Gcc-patches
On Mon, 25 Oct 2021 at 12:55, Rainer Orth  wrote:
>
> 17_intro/names.cc and experimental/names.cc currently FAIL on Solaris
>
> FAIL: 17_intro/names.cc (test for excess errors)
> FAIL: experimental/names.cc (test for excess errors)
>
> Excess errors:
> /usr/include/sys/timespec_util.h:22: error: expected ')' before ';' token
> /usr/include/stdlib.h:157: error: expected unqualified-id before '[' token
> /usr/include/stdlib.h:157: error: expected ')' before '[' token
>
>  has
>
> extern int timespeccompare(const struct timespec *l, const struct timespec 
> *r);
>
> while  has
>
> typedef struct drand48_data {
> unsigned int _initialised;
> unsigned short int x[3];
> unsigned short int a[3];
> unsigned int c;
> unsigned short lastx[3];
> } drand48_data;
>
> both of which are broken by defining r resp. x to ( in the testcase.
>
> Fixed by undoing the defines.  Tested on i386-pc-solaris2.11,
> sparc-sun-solaris2.11, and x86_64-pc-linux-gnu.
>
> Ok for master?

OK, thanks


[committed] Fortran: Fix character(len=cst) dummies with bind(C) [PR102885] (erratum: should be: 'len=noncst')

2021-10-26 Thread Tobias Burnus

I forgot to handle len= correctly. This patch does the
same as for non-BIND(C). In the latter case, the call is:

  gfc_generate_function_code → gfc_trans_deferred_vars → 
gfc_trans_deferred_array

and that function has

  if (sym->ts.type == BT_CHARACTER
  && !INTEGER_CST_P (sym->ts.u.cl->backend_decl))
{
  gfc_conv_string_length (sym->ts.u.cl, NULL, &init);
  gfc_trans_vla_type_sizes (sym, &init);
}

Expect a déjà vu if you read the attached patch ...

Thanks to Dominique for testing with -flto and finding this issue
and to Richard for helping me to understand what goes wrong.

Committed as obvious as r12-4703.

Tobias
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
commit a31a3d0421f0cf1f7eefacfec8cbf37e7f91600d
Author: Tobias Burnus 
Date:   Tue Oct 26 10:53:53 2021 +0200

Fortran: Fix character(len=cst) dummies with bind(C) [PR102885]

PR fortran/102885

gcc/fortran/ChangeLog:

* trans-decl.c (gfc_conv_cfi_to_gfc): Properly handle nonconstant
character lenghts.

gcc/testsuite/ChangeLog:

* gfortran.dg/lto/bind-c-char_0.f90: New test.
---
 gcc/fortran/trans-decl.c|  7 
 gcc/testsuite/gfortran.dg/lto/bind-c-char_0.f90 | 49 +
 2 files changed, 56 insertions(+)

diff --git a/gcc/fortran/trans-decl.c b/gcc/fortran/trans-decl.c
index fe5511b5285..49ba9060eae 100644
--- a/gcc/fortran/trans-decl.c
+++ b/gcc/fortran/trans-decl.c
@@ -6837,6 +6837,13 @@ gfc_conv_cfi_to_gfc (stmtblock_t *init, stmtblock_t *finally,
   gfc_add_modify (&block, sym->ts.u.cl->backend_decl, tmp);
 }
 
+  if (sym->ts.type == BT_CHARACTER
+  && !INTEGER_CST_P (sym->ts.u.cl->backend_decl))
+{
+  gfc_conv_string_length (sym->ts.u.cl, NULL, init);
+  gfc_trans_vla_type_sizes (sym, init);
+}
+
   /* gfc->data = cfi->base_addr - or for scalars: gfc = cfi->base_addr.
  assumed-size/explicit-size arrays end up here for character(len=*)
  only. */
diff --git a/gcc/testsuite/gfortran.dg/lto/bind-c-char_0.f90 b/gcc/testsuite/gfortran.dg/lto/bind-c-char_0.f90
new file mode 100644
index 000..48b495b1d29
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/lto/bind-c-char_0.f90
@@ -0,0 +1,49 @@
+! { dg-lto-do link }
+! { dg-lto-options {{ -O0 -flto }} }
+!
+! PR fortran/102885
+
+module m
+  use iso_c_binding, only: c_char
+  implicit none (type, external)
+
+contains
+
+! Assumed-shape array, nonallocatable/nonpointer
+
+subroutine ar3 (xn, n) bind(C)
+  integer :: n
+  character(len=n) :: xn(..)
+  if (size(xn) /= 6) stop
+  if (len(xn) /= 5) stop  
+  select rank(xn)
+rank(1)
+  xn = ['FDGhf', &
+'hdrhg', &
+'fDgFl', &
+'DFHs3', &
+'4a54G', &
+'hSs6k']
+  rank default
+stop
+  end select
+end
+
+end
+
+program main
+  use m
+  implicit none (type, external)
+  character(kind=c_char, len=5) :: str5a6(6)
+
+  ! assumed rank - with array descriptor
+
+  str5a6 = ['DDGhf', &
+'hdrh$', &
+'fDGSl', &
+'DFHs3', &
+'43grG', &
+'hFG$k']
+  call ar3 (str5a6, 5)
+
+end


[PATCH] Turn vect_create_addr_base_for_vector_ref offset into a byte offset

2021-10-26 Thread Richard Biener via Gcc-patches
This changes the offset in elements for vect_create_addr_base_for_vector_ref
and vect_create_data_ref_ptr to an offset in bytes, easing a following
refactoring.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-10-26  Richard Biener  

* tree-vect-data-refs.c (vect_create_addr_base_for_vector_ref):
Take offset in bytes.
(vect_create_data_ref_ptr): Likewise.
* tree-vect-loop-manip.c (get_misalign_in_elems): Multiply
offset by element size.
(vect_create_cond_for_align_checks): Likewise.
* tree-vect-stmts.c (get_negative_load_store_type): Likewise.
(vectorizable_load): Remove duplicate leftover from merge
conflict.
---
 gcc/tree-vect-data-refs.c  | 15 ++-
 gcc/tree-vect-loop-manip.c |  8 ++--
 gcc/tree-vect-stmts.c  | 11 +++
 3 files changed, 19 insertions(+), 15 deletions(-)

diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index cbcd4b80246..46360c50bd4 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -4753,8 +4753,7 @@ vect_duplicate_ssa_name_ptr_info (tree name, dr_vec_info 
*dr_info)
if LOOP=i_loop: &in (relative to i_loop)
if LOOP=j_loop: &in+i*2B(relative to j_loop)
BYTE_OFFSET: Optional, defaulted to NULL.  If supplied, it is added to the
-   initial address.  Unlike OFFSET, which is number of elements to
-   be added, BYTE_OFFSET is measured in bytes.
+   initial address.  Both OFFSET and BYTE_OFFSET are measured in bytes.
 
Output:
1. Return an SSA_NAME whose value is the address of the memory location of
@@ -4777,7 +4776,6 @@ vect_create_addr_base_for_vector_ref (vec_info *vinfo, 
stmt_vec_info stmt_info,
   tree dest;
   gimple_seq seq = NULL;
   tree vect_ptr_type;
-  tree step = TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr)));
   loop_vec_info loop_vinfo = dyn_cast  (vinfo);
   innermost_loop_behavior *drb = vect_dr_behavior (vinfo, dr_info);
 
@@ -4801,8 +4799,7 @@ vect_create_addr_base_for_vector_ref (vec_info *vinfo, 
stmt_vec_info stmt_info,
 
   if (offset)
 {
-  offset = fold_build2 (MULT_EXPR, sizetype,
-   fold_convert (sizetype, offset), step);
+  offset = fold_convert (sizetype, offset);
   base_offset = fold_build2 (PLUS_EXPR, sizetype,
 base_offset, offset);
 }
@@ -4860,8 +4857,8 @@ vect_create_addr_base_for_vector_ref (vec_info *vinfo, 
stmt_vec_info stmt_info,
2. AGGR_TYPE: the type of the reference, which should be either a vector
 or an array.
3. AT_LOOP: the loop where the vector memref is to be created.
-   4. OFFSET (optional): an offset to be added to the initial address accessed
-   by the data-ref in STMT_INFO.
+   4. OFFSET (optional): a byte offset to be added to the initial address
+   accessed by the data-ref in STMT_INFO.
5. BSI: location where the new stmts are to be placed if there is no loop
6. ONLY_INIT: indicate if ap is to be updated in the loop, or remain
 pointing to the initial address.
@@ -4885,7 +4882,7 @@ vect_create_addr_base_for_vector_ref (vec_info *vinfo, 
stmt_vec_info stmt_info,
   if OFFSET is not supplied:
  initial_address = &a[init];
   if OFFSET is supplied:
- initial_address = &a[init + OFFSET];
+initial_address = &a[init] + OFFSET;
   if BYTE_OFFSET is supplied:
 initial_address = &a[init] + BYTE_OFFSET;
 
@@ -5031,7 +5028,7 @@ vect_create_data_ref_ptr (vec_info *vinfo, stmt_vec_info 
stmt_info,
   /* (2) Calculate the initial address of the aggregate-pointer, and set
  the aggregate-pointer to point to it before the loop.  */
 
-  /* Create: (&(base[init_val+offset]+byte_offset) in the loop preheader.  */
+  /* Create: (&(base[init_val]+offset+byte_offset) in the loop preheader.  */
 
   new_temp = vect_create_addr_base_for_vector_ref (vinfo,
   stmt_info, &new_stmt_list,
diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
index 378b1026baa..72d583189c4 100644
--- a/gcc/tree-vect-loop-manip.c
+++ b/gcc/tree-vect-loop-manip.c
@@ -1625,7 +1625,9 @@ get_misalign_in_elems (gimple **seq, loop_vec_info 
loop_vinfo)
   bool negative = tree_int_cst_compare (DR_STEP (dr_info->dr),
size_zero_node) < 0;
   tree offset = (negative
-? size_int (-TYPE_VECTOR_SUBPARTS (vectype) + 1)
+? size_int ((-TYPE_VECTOR_SUBPARTS (vectype) + 1)
+* TREE_INT_CST_LOW
+(TYPE_SIZE_UNIT (TREE_TYPE (vectype
 : size_zero_node);
   tree start_addr = vect_create_addr_base_for_vector_ref (loop_vinfo,
  stmt_info, seq,
@@ -3227,7 +3229,9 @@ vect_create_cond_for_align_checks (loop_vec_info 
loop_vinfo,
   bool ne

RE: [PATCH] middle-end: fix de-optimizations with bitclear patterns on signed values

2021-10-26 Thread Richard Biener via Gcc-patches
On Tue, 26 Oct 2021, Tamar Christina wrote:

> > -Original Message-
> > From: Richard Biener 
> > Sent: Tuesday, October 26, 2021 9:26 AM
> > To: Tamar Christina 
> > Cc: gcc-patches@gcc.gnu.org; Jakub Jelinek ; nd
> > 
> > Subject: RE: [PATCH] middle-end: fix de-optimizations with bitclear patterns
> > on signed values
> > 
> > On Mon, 25 Oct 2021, Tamar Christina wrote:
> > 
> > > > -Original Message-
> > > > From: Richard Biener 
> > > > Sent: Friday, October 15, 2021 12:31 PM
> > > > To: Tamar Christina 
> > > > Cc: gcc-patches@gcc.gnu.org; Jakub Jelinek ; nd
> > > > 
> > > > Subject: Re: [PATCH] middle-end: fix de-optimizations with bitclear
> > > > patterns on signed values
> > > >
> > > > On Fri, 15 Oct 2021, Tamar Christina wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > During testing after rebasing to commit I noticed a failing
> > > > > testcase with the bitmask compare patch.
> > > > >
> > > > > Consider the following C++ testcase:
> > > > >
> > > > > #include 
> > > > >
> > > > > #define A __attribute__((noipa))
> > > > > A bool f5 (double i, double j) { auto c = i <=> j; return c >= 0;
> > > > > }
> > > > >
> > > > > This turns into a comparison against chars, on systems where chars
> > > > > are signed the pattern inserts an unsigned convert such that it's
> > > > > able to do the transformation.
> > > > >
> > > > > i.e.:
> > > > >
> > > > >   # RANGE [-1, 2]
> > > > >   # c$_M_value_22 = PHI <-1(3), 0(2), 2(5), 1(4)>
> > > > >   # RANGE ~[3, 254]
> > > > >   _11 = (unsigned char) c$_M_value_22;
> > > > >   _19 = _11 <= 1;
> > > > >   # .MEM_24 = VDEF <.MEM_6(D)>
> > > > >   D.10434 ={v} {CLOBBER};
> > > > >   # .MEM_14 = VDEF <.MEM_24>
> > > > >   D.10407 ={v} {CLOBBER};
> > > > >   # VUSE <.MEM_14>
> > > > >   return _19;
> > > > >
> > > > > instead of:
> > > > >
> > > > >   # RANGE [-1, 2]
> > > > >   # c$_M_value_5 = PHI <-1(3), 0(2), 2(5), 1(4)>
> > > > >   # RANGE [-2, 2]
> > > > >   _3 = c$_M_value_5 & -2;
> > > > >   _19 = _3 == 0;
> > > > >   # .MEM_24 = VDEF <.MEM_6(D)>
> > > > >   D.10440 ={v} {CLOBBER};
> > > > >   # .MEM_14 = VDEF <.MEM_24>
> > > > >   D.10413 ={v} {CLOBBER};
> > > > >   # VUSE <.MEM_14>
> > > > >   return _19;
> > > > >
> > > > > This causes much worse codegen under -ffast-math due to phiops no
> > > > > longer recognizing the pattern.  It turns out that phiopts
> > > > > spaceship_replacement is looking for the exact form that was just
> > changed.
> > > > >
> > > > > Trying to get it to recognize the new form is not trivial as the
> > > > > transformation doesn't look to work when the thing it's pointing
> > > > > to is itself
> > > > a phi-node.
> > > >
> > > > What do you mean?  Where it handles the BIT_AND it could also handle
> > > > the conversion, no?  The later handling would probably more
> > > > explicitely need to distinguish between the BIT_AND and the conversion
> > forms.
> > >
> > > Looks like I misunderstood the code, it was looking at the uses not
> > > the defs of the value.
> > >
> > > --- inline copy of patch ---
> > >
> > > The comments seems to suggest this code only checks for (res & ~1) ==
> > > 0 but the implementation seems to suggest it's broader.
> > >
> > > As such I added a case to check to see if the value comparison we
> > > found is a type cast.  and strips away the type cast and continues.
> > >
> > > In match.pd the typecasts are only added for signed comparisons to ==
> > > 0 and != 0 which are then rewritten into comparisons with 1.
> > >
> > > As such I only check for 1 and LE and GT, which is what match.pd would
> > > have rewritten it to.
> > >
> > > This fixes the regression but this is not code I 100% understand,
> > > since I don't really know the semantics of the spaceship operator so
> > > would appreciate an extra look.
> > >
> > > Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
> > > and no regressions.
> > >
> > > Ok for master?
> > 
> > Please add a testcase.  I hope Jakub can review the spaceship_replacement
> > patch since he's the one familiar with the code.
> 
> There's already a bunch of testcases that test the various variants: 
> gcc/testsuite/g++.dg/opt/pr94589-1.C
> and gcc/testsuite/g++.dg/opt/pr94589-2.C which is how I noticed the failure.  
> However they only trigger
> the failure on signed chars.  I tried forcing `-fsigned-char` to see if I can 
> make a general testcase but this
> seems to have not done it.
> 
> Is there another flag I can use?

I suppose you can copy the testcase(s) and replace 'auto' with
'signed char'?

> 
> Regards,
> Tamar
> > 
> > Thanks,
> > Richard.
> > 
> > > Thanks,
> > > Tamar
> > >
> > > gcc/ChangeLog:
> > >
> > >   * tree-ssa-phiopt.c (spaceship_replacement): Handle new canonical
> > >   codegen.
> > >
> > > diff --git a/gcc/tree-ssa-phiopt.c b/gcc/tree-ssa-phiopt.c index
> > >
> > 0e339c46afa29fa97f90d9bc4394370cd9b4b396..65b25be3399b75d5e9cab0f78
> > aa2
> > > 340418571a33 100644
> > > --- a/gcc/tree-ssa-phiopt.c
> > > +++ 

RE: [PATCH] middle-end: fix de-optimizations with bitclear patterns on signed values

2021-10-26 Thread Tamar Christina via Gcc-patches
> -Original Message-
> From: Richard Biener 
> Sent: Tuesday, October 26, 2021 9:26 AM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; Jakub Jelinek ; nd
> 
> Subject: RE: [PATCH] middle-end: fix de-optimizations with bitclear patterns
> on signed values
> 
> On Mon, 25 Oct 2021, Tamar Christina wrote:
> 
> > > -Original Message-
> > > From: Richard Biener 
> > > Sent: Friday, October 15, 2021 12:31 PM
> > > To: Tamar Christina 
> > > Cc: gcc-patches@gcc.gnu.org; Jakub Jelinek ; nd
> > > 
> > > Subject: Re: [PATCH] middle-end: fix de-optimizations with bitclear
> > > patterns on signed values
> > >
> > > On Fri, 15 Oct 2021, Tamar Christina wrote:
> > >
> > > > Hi All,
> > > >
> > > > During testing after rebasing to commit I noticed a failing
> > > > testcase with the bitmask compare patch.
> > > >
> > > > Consider the following C++ testcase:
> > > >
> > > > #include 
> > > >
> > > > #define A __attribute__((noipa))
> > > > A bool f5 (double i, double j) { auto c = i <=> j; return c >= 0;
> > > > }
> > > >
> > > > This turns into a comparison against chars, on systems where chars
> > > > are signed the pattern inserts an unsigned convert such that it's
> > > > able to do the transformation.
> > > >
> > > > i.e.:
> > > >
> > > >   # RANGE [-1, 2]
> > > >   # c$_M_value_22 = PHI <-1(3), 0(2), 2(5), 1(4)>
> > > >   # RANGE ~[3, 254]
> > > >   _11 = (unsigned char) c$_M_value_22;
> > > >   _19 = _11 <= 1;
> > > >   # .MEM_24 = VDEF <.MEM_6(D)>
> > > >   D.10434 ={v} {CLOBBER};
> > > >   # .MEM_14 = VDEF <.MEM_24>
> > > >   D.10407 ={v} {CLOBBER};
> > > >   # VUSE <.MEM_14>
> > > >   return _19;
> > > >
> > > > instead of:
> > > >
> > > >   # RANGE [-1, 2]
> > > >   # c$_M_value_5 = PHI <-1(3), 0(2), 2(5), 1(4)>
> > > >   # RANGE [-2, 2]
> > > >   _3 = c$_M_value_5 & -2;
> > > >   _19 = _3 == 0;
> > > >   # .MEM_24 = VDEF <.MEM_6(D)>
> > > >   D.10440 ={v} {CLOBBER};
> > > >   # .MEM_14 = VDEF <.MEM_24>
> > > >   D.10413 ={v} {CLOBBER};
> > > >   # VUSE <.MEM_14>
> > > >   return _19;
> > > >
> > > > This causes much worse codegen under -ffast-math due to phiops no
> > > > longer recognizing the pattern.  It turns out that phiopts
> > > > spaceship_replacement is looking for the exact form that was just
> changed.
> > > >
> > > > Trying to get it to recognize the new form is not trivial as the
> > > > transformation doesn't look to work when the thing it's pointing
> > > > to is itself
> > > a phi-node.
> > >
> > > What do you mean?  Where it handles the BIT_AND it could also handle
> > > the conversion, no?  The later handling would probably more
> > > explicitely need to distinguish between the BIT_AND and the conversion
> forms.
> >
> > Looks like I misunderstood the code, it was looking at the uses not
> > the defs of the value.
> >
> > --- inline copy of patch ---
> >
> > The comments seems to suggest this code only checks for (res & ~1) ==
> > 0 but the implementation seems to suggest it's broader.
> >
> > As such I added a case to check to see if the value comparison we
> > found is a type cast.  and strips away the type cast and continues.
> >
> > In match.pd the typecasts are only added for signed comparisons to ==
> > 0 and != 0 which are then rewritten into comparisons with 1.
> >
> > As such I only check for 1 and LE and GT, which is what match.pd would
> > have rewritten it to.
> >
> > This fixes the regression but this is not code I 100% understand,
> > since I don't really know the semantics of the spaceship operator so
> > would appreciate an extra look.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
> > and no regressions.
> >
> > Ok for master?
> 
> Please add a testcase.  I hope Jakub can review the spaceship_replacement
> patch since he's the one familiar with the code.

There's already a bunch of testcases that test the various variants: 
gcc/testsuite/g++.dg/opt/pr94589-1.C
and gcc/testsuite/g++.dg/opt/pr94589-2.C which is how I noticed the failure.  
However they only trigger
the failure on signed chars.  I tried forcing `-fsigned-char` to see if I can 
make a general testcase but this
seems to have not done it.

Is there another flag I can use?

Regards,
Tamar
> 
> Thanks,
> Richard.
> 
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > * tree-ssa-phiopt.c (spaceship_replacement): Handle new canonical
> > codegen.
> >
> > diff --git a/gcc/tree-ssa-phiopt.c b/gcc/tree-ssa-phiopt.c index
> >
> 0e339c46afa29fa97f90d9bc4394370cd9b4b396..65b25be3399b75d5e9cab0f78
> aa2
> > 340418571a33 100644
> > --- a/gcc/tree-ssa-phiopt.c
> > +++ b/gcc/tree-ssa-phiopt.c
> > @@ -2037,6 +2037,7 @@ spaceship_replacement (basic_block cond_bb,
> basic_block middle_bb,
> >tree lhs, rhs;
> >gimple *orig_use_stmt = use_stmt;
> >tree orig_use_lhs = NULL_TREE;
> > +  bool is_canon = false;
> >int prec = TYPE_PRECISION (TREE_TYPE (phires));
> >if (is_gimple_assign (use_stmt)
> >&& gimple_assign_rhs_code (use_stmt) == BIT_AND_EX

RE: [PATCH] middle-end: fix de-optimizations with bitclear patterns on signed values

2021-10-26 Thread Richard Biener via Gcc-patches
On Mon, 25 Oct 2021, Tamar Christina wrote:

> > -Original Message-
> > From: Richard Biener 
> > Sent: Friday, October 15, 2021 12:31 PM
> > To: Tamar Christina 
> > Cc: gcc-patches@gcc.gnu.org; Jakub Jelinek ; nd
> > 
> > Subject: Re: [PATCH] middle-end: fix de-optimizations with bitclear patterns
> > on signed values
> > 
> > On Fri, 15 Oct 2021, Tamar Christina wrote:
> > 
> > > Hi All,
> > >
> > > During testing after rebasing to commit I noticed a failing testcase
> > > with the bitmask compare patch.
> > >
> > > Consider the following C++ testcase:
> > >
> > > #include 
> > >
> > > #define A __attribute__((noipa))
> > > A bool f5 (double i, double j) { auto c = i <=> j; return c >= 0; }
> > >
> > > This turns into a comparison against chars, on systems where chars are
> > > signed the pattern inserts an unsigned convert such that it's able to
> > > do the transformation.
> > >
> > > i.e.:
> > >
> > >   # RANGE [-1, 2]
> > >   # c$_M_value_22 = PHI <-1(3), 0(2), 2(5), 1(4)>
> > >   # RANGE ~[3, 254]
> > >   _11 = (unsigned char) c$_M_value_22;
> > >   _19 = _11 <= 1;
> > >   # .MEM_24 = VDEF <.MEM_6(D)>
> > >   D.10434 ={v} {CLOBBER};
> > >   # .MEM_14 = VDEF <.MEM_24>
> > >   D.10407 ={v} {CLOBBER};
> > >   # VUSE <.MEM_14>
> > >   return _19;
> > >
> > > instead of:
> > >
> > >   # RANGE [-1, 2]
> > >   # c$_M_value_5 = PHI <-1(3), 0(2), 2(5), 1(4)>
> > >   # RANGE [-2, 2]
> > >   _3 = c$_M_value_5 & -2;
> > >   _19 = _3 == 0;
> > >   # .MEM_24 = VDEF <.MEM_6(D)>
> > >   D.10440 ={v} {CLOBBER};
> > >   # .MEM_14 = VDEF <.MEM_24>
> > >   D.10413 ={v} {CLOBBER};
> > >   # VUSE <.MEM_14>
> > >   return _19;
> > >
> > > This causes much worse codegen under -ffast-math due to phiops no
> > > longer recognizing the pattern.  It turns out that phiopts
> > > spaceship_replacement is looking for the exact form that was just changed.
> > >
> > > Trying to get it to recognize the new form is not trivial as the
> > > transformation doesn't look to work when the thing it's pointing to is 
> > > itself
> > a phi-node.
> > 
> > What do you mean?  Where it handles the BIT_AND it could also handle the
> > conversion, no?  The later handling would probably more explicitely need to
> > distinguish between the BIT_AND and the conversion forms.
> 
> Looks like I misunderstood the code, it was looking at the uses not the defs 
> of
> the value.
> 
> --- inline copy of patch ---
> 
> The comments seems to suggest this code only checks for (res & ~1) == 0 but 
> the
> implementation seems to suggest it's broader.
> 
> As such I added a case to check to see if the value comparison we found is a
> type cast.  and strips away the type cast and continues.
> 
> In match.pd the typecasts are only added for signed comparisons to == 0 and 
> != 0
> which are then rewritten into comparisons with 1.
> 
> As such I only check for 1 and LE and GT, which is what match.pd would have
> rewritten it to.
> 
> This fixes the regression but this is not code I 100% understand, since I 
> don't
> really know the semantics of the spaceship operator so would appreciate an 
> extra
> look.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu,
> x86_64-pc-linux-gnu and no regressions.
> 
> Ok for master?

Please add a testcase.  I hope Jakub can review the spaceship_replacement
patch since he's the one familiar with the code.

Thanks,
Richard.

> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   * tree-ssa-phiopt.c (spaceship_replacement): Handle new canonical
>   codegen.
> 
> diff --git a/gcc/tree-ssa-phiopt.c b/gcc/tree-ssa-phiopt.c
> index 
> 0e339c46afa29fa97f90d9bc4394370cd9b4b396..65b25be3399b75d5e9cab0f78aa2340418571a33
>  100644
> --- a/gcc/tree-ssa-phiopt.c
> +++ b/gcc/tree-ssa-phiopt.c
> @@ -2037,6 +2037,7 @@ spaceship_replacement (basic_block cond_bb, basic_block 
> middle_bb,
>tree lhs, rhs;
>gimple *orig_use_stmt = use_stmt;
>tree orig_use_lhs = NULL_TREE;
> +  bool is_canon = false;
>int prec = TYPE_PRECISION (TREE_TYPE (phires));
>if (is_gimple_assign (use_stmt)
>&& gimple_assign_rhs_code (use_stmt) == BIT_AND_EXPR
> @@ -2063,6 +2064,26 @@ spaceship_replacement (basic_block cond_bb, 
> basic_block middle_bb,
>  }
>else if (is_gimple_assign (use_stmt))
>  {
> +  /* Deal with if match.pd has rewritten the (res & ~1) == 0
> +  into res <= 1 and has left a type-cast for signed types.  */
> +  if (gimple_assign_cast_p (use_stmt))
> + {
> +   orig_use_lhs = gimple_assign_lhs (use_stmt);
> +   if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (orig_use_lhs))
> + return false;
> +   if (EDGE_COUNT (phi_bb->preds) != 4)
> + return false;
> +   if (!TYPE_UNSIGNED (TREE_TYPE (orig_use_lhs)))
> + return false;
> +   if (!single_imm_use (orig_use_lhs, &use_p, &use_stmt))
> + return false;
> +   tree_code cmp;
> +   if (is_gimple_assign (use_stmt)
> +   && (cmp = gimple_assign_rhs_code (use_stmt))
> +   && (cmp == LE_E

Re: [PATCH v3] detect out-of-bounds stores by atomic functions [PR102453]

2021-10-26 Thread Richard Biener via Gcc-patches
On Mon, Oct 25, 2021 at 4:21 AM Jeff Law  wrote:
>
>
>
> On 10/24/2021 5:40 PM, Martin Sebor via Gcc-patches wrote:
> > Attached is a revised patch for just the access warning pass
> > to diagnose out-of-bounds stores by atomic functions, with
> > no attr-fnspec changes.
> >
> > Is this okay for trunk?
> >
> > Martin
> >
> > PS Just to clarify the effect of the original patch in case
> > it wasn't: it didn't enable optimizations of atomic built-ins.
> > It just made it possible, by first calling the new
> > atomic_builtin_fnspec() to get the fnspec, and then by actually
> > doing something with it.  The original patch did not modify
> > builtin_fnspec() to call the new atomic_builtin_fnspec().  But
> > since you seem to have reservations about exposing the attribute
> > in any form I have withdrawn the original patch and replaced it
> > with the more limited one.
> >
> >
> I'm letting Richi take the lead here.  But I did notice a tiny nit:
>
>
> > +
> > +  /* Tyhe size in bytes of the access by the function, and the number
> s/Tyhe/The/

The patch looks OK to me.

Thanks,
Richard.

>
>


Re: [PATCH] hardened conditionals

2021-10-26 Thread Richard Biener via Gcc-patches
On Fri, Oct 22, 2021 at 4:19 AM Alexandre Oliva  wrote:
>
> On Oct 20, 2021, Alexandre Oliva  wrote:
>
> > I suppose it's a latent issue exposed by the patch,
>
> I was mistaken.  Though I even had bisected the -fcompare-debug problem
> back to a patch from back in May, that added a new sink_code pass before
> store_merging, it was actually a bug in my patch, it was just a little
> hard to hit with bootstrap-debug, but it came up with -fcompare-debug in
> ways that misled me.
>
> split_block remains slightly risky to use unless you know you have or
> are going to insert nondebug stmts/insns in both blocks.  I've often
> pondered warning in case split_block completes with only debug
> stmts/insns in either block, but IIRC there are multiple passes that
> split first and insert code afterwards, which have to be rearranged to
> aovid the warning.
>
> Anyway, here's the fixed patch.  Regstrapped on x86_64-linux-gnu, and
> bootstrapped with an additional patch that enables both new passes.  Ok
> to install?

OK.

Thanks,
Richard.

>
> hardened conditionals
>
> From: Alexandre Oliva 
>
> This patch introduces optional passes to harden conditionals used in
> branches, and in computing boolean expressions, by adding redundant
> tests of the reversed conditions, and trapping in case of unexpected
> results.  Though in abstract machines the redundant tests should never
> fail, CPUs may be led to misbehave under certain kinds of attacks,
> such as of power deprivation, and these tests reduce the likelihood of
> going too far down an unexpected execution path.
>
>
> for  gcc/ChangeLog
>
> * common.opt (fharden-compares): New.
> (fharden-conditional-branches): New.
> * doc/invoke.texi: Document new options.
> * gimple-harden-conditionals.cc: New.
> * passes.def: Add new passes.
> * tree-pass.h (make_pass_harden_compares): Declare.
> (make_pass_harden_conditional_branches): Declare.
>
> for  gcc/ada/ChangeLog
>
> * doc/gnat_rm/security_hardening_features.rst
> (Hardened Conditionals): New.
>
> for  gcc/testsuite/ChangeLog
>
> * c-c++-common/torture/harden-comp.c: New.
> * c-c++-common/torture/harden-cond.c: New.
> ---
>  gcc/Makefile.in|1
>  .../doc/gnat_rm/security_hardening_features.rst|   40 ++
>  gcc/common.opt |8
>  gcc/doc/invoke.texi|   19 +
>  gcc/gimple-harden-conditionals.cc  |  439 
> 
>  gcc/passes.def |2
>  gcc/testsuite/c-c++-common/torture/harden-comp.c   |   14 +
>  gcc/testsuite/c-c++-common/torture/harden-cond.c   |   18 +
>  gcc/tree-pass.h|3
>  9 files changed, 544 insertions(+)
>  create mode 100644 gcc/gimple-harden-conditionals.cc
>  create mode 100644 gcc/testsuite/c-c++-common/torture/harden-comp.c
>  create mode 100644 gcc/testsuite/c-c++-common/torture/harden-cond.c
>
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index f36ffa4740b78..a79ff93dd5999 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -1389,6 +1389,7 @@ OBJS = \
> gimple-if-to-switch.o \
> gimple-iterator.o \
> gimple-fold.o \
> +   gimple-harden-conditionals.o \
> gimple-laddress.o \
> gimple-loop-interchange.o \
> gimple-loop-jam.o \
> diff --git a/gcc/ada/doc/gnat_rm/security_hardening_features.rst 
> b/gcc/ada/doc/gnat_rm/security_hardening_features.rst
> index 1c46e3a4c7b88..52240d7e3dd54 100644
> --- a/gcc/ada/doc/gnat_rm/security_hardening_features.rst
> +++ b/gcc/ada/doc/gnat_rm/security_hardening_features.rst
> @@ -87,3 +87,43 @@ types and subtypes, may be silently ignored.  
> Specifically, it is not
>  currently recommended to rely on any effects this pragma might be
>  expected to have when calling subprograms through access-to-subprogram
>  variables.
> +
> +
> +.. Hardened Conditionals:
> +
> +Hardened Conditionals
> +=
> +
> +GNAT can harden conditionals to protect against control flow attacks.
> +
> +This is accomplished by two complementary transformations, each
> +activated by a separate command-line option.
> +
> +The option *-fharden-compares* enables hardening of compares that
> +compute results stored in variables, adding verification that the
> +reversed compare yields the opposite result.
> +
> +The option *-fharden-conditional-branches* enables hardening of
> +compares that guard conditional branches, adding verification of the
> +reversed compare to both execution paths.
> +
> +These transformations are introduced late in the compilation pipeline,
> +long after boolean expressions are decomposed into separate compares,
> +each one turned into either a conditional branch or a compare whose
> +result is stored in a boolean variable or temporary.  Compiler
> +optimizations, if enabled, may also turn conditiona

[PATCH v3 1/1] [ARM] Add support for TLS register based stack protector canary access

2021-10-26 Thread Ard Biesheuvel via Gcc-patches
Add support for accessing the stack canary value via the TLS register,
so that multiple threads running in the same address space can use
distinct canary values. This is intended for the Linux kernel running in
SMP mode, where processes entering the kernel are essentially threads
running the same program concurrently: using a global variable for the
canary in that context is problematic because it can never be rotated,
and so the OS is forced to use the same value as long as it remains up.

Using the TLS register to index the stack canary helps with this, as it
allows each CPU to context switch the TLS register along with the rest
of the process, permitting each process to use its own value for the
stack canary.

2021-10-21 Ard Biesheuvel 

* config/arm/arm-opts.h (enum stack_protector_guard): New
* config/arm/arm-protos.h (arm_stack_protect_tls_canary_mem):
New
* config/arm/arm.c (TARGET_STACK_PROTECT_GUARD): Define
(arm_option_override_internal): Handle and put in error checks
for stack protector guard options.
(arm_option_reconfigure_globals): Likewise
(arm_stack_protect_tls_canary_mem): New
(arm_stack_protect_guard): New
* config/arm/arm.md (stack_protect_set): New
(stack_protect_set_tls): Likewise
(stack_protect_test): Likewise
(stack_protect_test_tls): Likewise
* config/arm/arm.opt (-mstack-protector-guard): New
(-mstack-protector-guard-offset): New.

Signed-off-by: Ard Biesheuvel 
---
 gcc/config/arm/arm-opts.h   |  6 ++
 gcc/config/arm/arm-protos.h |  2 +
 gcc/config/arm/arm.c| 52 
 gcc/config/arm/arm.md   | 62 +++-
 gcc/config/arm/arm.opt  | 22 +++
 gcc/doc/invoke.texi |  9 +++
 6 files changed, 151 insertions(+), 2 deletions(-)

diff --git a/gcc/config/arm/arm-opts.h b/gcc/config/arm/arm-opts.h
index 5c4b62f404f7..581ba3c4fbbb 100644
--- a/gcc/config/arm/arm-opts.h
+++ b/gcc/config/arm/arm-opts.h
@@ -69,4 +69,10 @@ enum arm_tls_type {
   TLS_GNU,
   TLS_GNU2
 };
+
+/* Where to get the canary for the stack protector.  */
+enum stack_protector_guard {
+  SSP_TLSREG,  /* per-thread canary in TLS register */
+  SSP_GLOBAL   /* global canary */
+};
 #endif
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 9b1f61394ad7..37e80256a78d 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -195,6 +195,8 @@ extern void arm_split_atomic_op (enum rtx_code, rtx, rtx, 
rtx, rtx, rtx, rtx);
 extern rtx arm_load_tp (rtx);
 extern bool arm_coproc_builtin_available (enum unspecv);
 extern bool arm_coproc_ldc_stc_legitimate_address (rtx);
+extern rtx arm_stack_protect_tls_canary_mem (void);
+
 
 #if defined TREE_CODE
 extern void arm_init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, tree);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index c4ff06b087eb..0bf06e764dbb 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -829,6 +829,9 @@ static const struct attribute_spec arm_attribute_table[] =
 
 #undef TARGET_MD_ASM_ADJUST
 #define TARGET_MD_ASM_ADJUST arm_md_asm_adjust
+
+#undef TARGET_STACK_PROTECT_GUARD
+#define TARGET_STACK_PROTECT_GUARD arm_stack_protect_guard
 
 /* Obstack for minipool constant handling.  */
 static struct obstack minipool_obstack;
@@ -3155,6 +3158,26 @@ arm_option_override_internal (struct gcc_options *opts,
   if (TARGET_THUMB2_P (opts->x_target_flags))
 opts->x_inline_asm_unified = true;
 
+  if (arm_stack_protector_guard == SSP_GLOBAL
+  && opts->x_arm_stack_protector_guard_offset_str)
+{
+  error ("incompatible options %'-mstack-protector-guard=global%' and"
+"%'-mstack-protector-guard-offset=%qs%'",
+arm_stack_protector_guard_offset_str);
+}
+
+  if (opts->x_arm_stack_protector_guard_offset_str)
+{
+  char *end;
+  const char *str = arm_stack_protector_guard_offset_str;
+  errno = 0;
+  long offs = strtol (arm_stack_protector_guard_offset_str, &end, 0);
+  if (!*str || *end || errno)
+   error ("%qs is not a valid offset in %qs", str,
+  "-mstack-protector-guard-offset=");
+  arm_stack_protector_guard_offset = offs;
+}
+
 #ifdef SUBTARGET_OVERRIDE_INTERNAL_OPTIONS
   SUBTARGET_OVERRIDE_INTERNAL_OPTIONS;
 #endif
@@ -3822,6 +3845,10 @@ arm_option_reconfigure_globals (void)
   else
target_thread_pointer = TP_SOFT;
 }
+
+  if (arm_stack_protector_guard == SSP_TLSREG
+  && target_thread_pointer != TP_CP15)
+error("%'-mstack-protector-guard=tls%' needs a hardware TLS register");
 }
 
 /* Perform some validation between the desired architecture and the rest of the
@@ -8087,6 +8114,19 @@ legitimize_pic_address (rtx orig, machine_mode mode, rtx 
reg, rtx pic_reg,
 }
 
 
+rtx
+arm_stack_protect_tls_canary_mem (void)
+{
+  rtx tp = gen_reg_rtx (SImode);
+  emit_insn (gen_load_tp_hard (tp));
+
+  rtx r

[PATCH v3 0/1] implement TLS register based stack canary for ARM

2021-10-26 Thread Ard Biesheuvel via Gcc-patches
Bugzilla: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102352

In the Linux kernel, user processes calling into the kernel are
essentially threads running in the same address space, of a program that
never terminates. This means that using a global variable for the stack
protector canary value is problematic on SMP systems, as we can never
change it unless we reboot the system. (Processes that sleep for any
reason will do so on a call into the kernel, which means that there will
always be live kernel stack frames carrying copies of the canary taken
when the function was entered)

AArch64 implements -mstack-protector-guard=sysreg for this purpose, as
this permits the kernel to use different memory addresses for the stack
canary for each CPU, and context switch the chosen system register with
the rest of the process, allowing each process to use its own unique
value for the stack canary.

This patch implements something similar, but for the 32-bit ARM kernel,
which will start using the user space TLS register TPIDRURO to index
per-process metadata while running in the kernel. This means we can just
add an offset to TPIDRURO to obtain the address from which to load the
canary value.

As for the spilling issues that have been fixed in this code in the
past: I suppose a register carrying the TLS register value will never
get spilled to begin with?

Changes since v2:
- fix the template for stack_protect_test_tls so it correctly conveys
  the fact that it sets the Z flag

Comments/suggestions welcome.

Cc: Keith Packard 
Cc: thomas.preudho...@celest.fr
Cc: adhemerval.zane...@linaro.org
Cc: Qing Zhao 
Cc: Richard Sandiford 
Cc: gcc-patches@gcc.gnu.org

Ard Biesheuvel (1):
  [ARM] Add support for TLS register based stack protector canary access

 gcc/config/arm/arm-opts.h   |  6 ++
 gcc/config/arm/arm-protos.h |  2 +
 gcc/config/arm/arm.c| 52 
 gcc/config/arm/arm.md   | 62 +++-
 gcc/config/arm/arm.opt  | 22 +++
 gcc/doc/invoke.texi |  9 +++
 6 files changed, 151 insertions(+), 2 deletions(-)

-- 
2.30.2



  1   2   >