Re: [Patch, rs6000, middle-end] v10: Add implementation for different targets for pair mem fusion

2024-11-01 Thread Alex Coplan
Hi Ajit,

Sorry for not reviewing this sooner.  For a while I thought this patch
was with Richard S to look at the rtl-ssa changes, but we've now agreed
that I should take a look at it.

I noticed that this breaks the build on aarch64.  I suspect you got an
email from the Linaro CI about this?  You can see on patchwork that the
Linaro CI build failed on aarch64:
https://patchwork.sourceware.org/project/gcc/patch/d878c99b-b8b7-4738-ae30-ce127fd98...@linux.ibm.com/
unfortunately I'm told the CI only keeps builds around for 30 days, so
you'll need to try your own build to reproduce the failure, but that
should be fairly straightforward (just building a cc1 cross is enough).

The main issue leading to the build failure is the introduction of new
pure virtual functions in pair-fusion.h without corresponding
implementations in aarch64-ldp-fusion.cc:aarch64_pair_fusion.

I see the following diagnostics:

$SRC/gcc/config/aarch64/aarch64-ldp-fusion.cc: In member function ‘virtual unsigned int {anonymous}::pass_ldp_fusion::execute(function*)’:
$SRC/gcc/config/aarch64/aarch64-ldp-fusion.cc:302:27: error: cannot declare variable ‘pass’ to be of abstract type ‘aarch64_pair_fusion’
  302 |   aarch64_pair_fusion pass;
  |   ^~~~
$SRC/gcc/config/aarch64/aarch64-ldp-fusion.cc:38:8: note:   because the following virtual functions are pure within ‘aarch64_pair_fusion’:
   38 | struct aarch64_pair_fusion : public pair_fusion
  |^~~
In file included from $SRC/gcc/config/aarch64/aarch64-ldp-fusion.cc:31:
$SRC/gcc/pair-fusion.h:151:16: note: ‘virtual void pair_fusion::change_existing_multword_mode(rtx_insn*)’
  151 |   virtual void change_existing_multword_mode (rtx_insn *insn) = 0;
  |^
$SRC/gcc/pair-fusion.h:156:16: note: ‘virtual void pair_fusion::modify_new_rtx_insn(rtl_ssa::insn_info*, obstack_watermark*, rtl_ssa::insn_change**, auto_vec&)’
  156 |   virtual void modify_new_rtx_insn (rtl_ssa::insn_info *first,
  |^~~
[... several more complaints about new pure virtuals ...]

so either these will need to be given default implementations (i.e. not pure
virtuals) or you will need to add a specific implementation in
aarch64-ldp-fusion.cc.  Most likely the former.
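
To make the two options concrete, here is a minimal sketch.  All class and
hook names below are made up for illustration and are not the actual
pair-fusion.h interface.  It shows that a hook with a default (no-op)
implementation leaves an existing derived class concrete, whereas a new pure
virtual would make it abstract until every target overrides it.

```cpp
#include <string>
#include <type_traits>

// Hypothetical stand-in for a generic pass base class.  The hook names are
// illustrative only; they are not taken from pair-fusion.h.
struct fusion_base
{
  virtual ~fusion_base () = default;

  // Pure virtual: every target has to implement this one.
  virtual std::string pair_mnemonic () const = 0;

  // Newly added hook with a default implementation: targets that predate
  // the hook keep building, and only targets that need it override it.
  virtual bool needs_multiword_fixup () const { return false; }
};

// An existing target that knows nothing about the new hook.
struct existing_target : fusion_base
{
  std::string pair_mnemonic () const override { return "ldp"; }
};

// A new target that opts in to the extra behaviour.
struct new_target : fusion_base
{
  std::string pair_mnemonic () const override { return "lxvp"; }
  bool needs_multiword_fixup () const override { return true; }
};

// Because the new hook is not pure, the existing target is still concrete,
// so "existing_target pass;" continues to compile.
static_assert (!std::is_abstract<existing_target>::value,
               "existing targets must remain instantiable");

// Generic code can call the hook unconditionally.
inline bool run_pass (const fusion_base &pass)
{
  return pass.needs_multiword_fixup ();
}
```

With this shape, run_pass returns false for existing_target and true for
new_target, without the existing target having to change at all.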

Please try to make sure any future versions of the patch build and test
cleanly on aarch64-linux-gnu.

More comments below.

The most important comment is the one below the call to set_multiword_subreg in
pair-fusion.cc, please pay particular attention to that.  Hopefully if you
re-work things along those lines, you should be able to drop most (if not all)
of your rtl-ssa changes.

On 03/09/2024 15:24, Ajit Agarwal wrote:
> Hello Richard:
> 
> This patch addresses all the review comments.
> It also fixes the arm build failure.
> 
> Common infrastructure using generic code for pair mem fusion of different
> targets.
> 
> rs6000 target specific code implements virtual functions defined by generic 
> code.
> 
> Target specific code is added in rs6000-mem-fusion.cc.
> 
> Bootstrapped and regtested on powerpc64-linux-gnu.
> 
> Thanks & Regards
> Ajit
> 
> 
> rs6000, middle-end: Add implementation for different targets for pair mem 
> fusion
> 
> Common infrastructure using generic code for pair mem fusion of different
> targets.
> 
> rs6000 target specific code implements virtual functions defined by generic 
> code.
> 
> Target specific code is added in rs6000-mem-fusion.cc.
> 
> 2024-09-03  Ajit Kumar Agarwal  
> 
> gcc/ChangeLog:
> 
>   * config/rs6000/rs6000-passes.def: New mem fusion pass
>   before pass_early_remat.
>   * pair-fusion.h: Add additional pure virtual function
>   required for rs6000 target implementation.
>   * pair-fusion.cc: Use of virtual functions for additional
>   virtual function added for rs6000 target.
>   * config/rs6000/rs6000-mem-fusion.cc: Add new pass.
>   Add target specific implementation for generic pure virtual
>   functions.
>   * config/rs6000/mma.md: Modify movoo machine description.
>   Add new machine description movoo1.
>   * config/rs6000/rs6000.cc: Modify rs6000_split_multireg_move
>   to expand movoo machine description for all constraints.
>   * config.gcc: Add new object file.
>   * config/rs6000/rs6000-protos.h: Add new prototype for mem
>   fusion pass.
>   * config/rs6000/t-rs6000: Add new rule.
>   * rtl-ssa/functions.h: Move out allocate function from private
>   to public and add get_m_temp_defs function.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.target/powerpc/mem-fusion.C: New test.
>   * g++.target/powerpc/mem-fusion-1.C: New test.
>   * gcc.target/powerpc/mma-builtin-1.c: Modify test.
> ---
>  gcc/config.gcc|   2 +
>  gcc/config/rs6000/mma.md  |  26 +-
>  gcc/config/rs6000/rs6000-mem-fusion.cc| 695 +

[committed] aarch64: Assume alias conflict if common address reg changes [PR116783]

2024-10-30 Thread Alex Coplan
Hi,

This is a backport of the PR116783 fix to GCC 14.  It was pre-approved here:
https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665097.html

The only intended non-contextual difference w.r.t. the patch on trunk is
that the test no longer needs -fno-late-combine-instructions on the 14
branch (I verified that it failed there without the change to
aarch64-ldp-fusion.cc).

Bootstrapped/regtested on aarch64-linux-gnu (all languages), no
regressions.  Pushed to the 14 branch.

Thanks,
Alex

---

As the PR shows, pair fusion was tricking memory_modified_in_insn_p into
returning false when a common base register (in this case, x1) was
modified between the mem and the store insn.  This led to wrong code as
the accesses really did alias.

To avoid this sort of problem, this patch avoids invoking RTL alias
analysis altogether (and assumes an alias conflict) if the two insns to
be compared share a common address register R, and the insns see different
definitions of R (i.e. it was modified in between).

This is a backport (but not a straight cherry pick) of
r15-4518-gc0e54ce1999ccf2241f74c5188b11b92e5aedc1f.

gcc/ChangeLog:

PR rtl-optimization/116783
* config/aarch64/aarch64-ldp-fusion.cc
(def_walker::cand_addr_uses): New.
(def_walker::def_walker): Add parameter for candidate address
uses.
(def_walker::alias_conflict_p): Declare.
(def_walker::addr_reg_conflict_p): New.
(def_walker::conflict_p): New.
(store_walker::store_walker): Add parameter for candidate
address uses and pass to base ctor.
(store_walker::conflict_p): Rename to ...
(store_walker::alias_conflict_p): ... this.
(load_walker::load_walker): Add parameter for candidate
address uses and pass to base ctor.
(load_walker::conflict_p): Rename to ...
(load_walker::alias_conflict_p): ... this.
(ldp_bb_info::try_fuse_pair): Collect address register
uses for candidate insns and pass down to alias walkers.

gcc/testsuite/ChangeLog:

PR rtl-optimization/116783
* g++.dg/torture/pr116783.C: New test.
diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc b/gcc/config/aarch64/aarch64-ldp-fusion.cc
index 1fc25e389cf..f32d30d54c5 100644
--- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
+++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
@@ -2173,11 +2173,80 @@ protected:
 
   def_iter_t def_iter;
   insn_info *limit;
-  def_walker (def_info *def, insn_info *limit) :
-def_iter (def), limit (limit) {}
+
+  // Array of register uses from the candidate insn which occur in MEMs.
+  use_array cand_addr_uses;
+
+  def_walker (def_info *def, insn_info *limit, use_array addr_uses) :
+def_iter (def), limit (limit), cand_addr_uses (addr_uses) {}
 
   virtual bool iter_valid () const { return *def_iter; }
 
+  // Implemented in {load,store}_walker.
+  virtual bool alias_conflict_p (int &budget) const = 0;
+
+  // Return true if the current (walking) INSN () uses a register R inside a
+  // MEM, where R is also used inside a MEM by the (static) candidate insn, and
+  // those uses see different definitions of that register.  In this case we
+  // can't rely on RTL alias analysis, and for now we conservatively assume that
+  // there is an alias conflict.  See PR116783.
+  bool addr_reg_conflict_p () const
+  {
+use_array curr_insn_uses = insn ()->uses ();
+auto cand_use_iter = cand_addr_uses.begin ();
+auto insn_use_iter = curr_insn_uses.begin ();
+while (cand_use_iter != cand_addr_uses.end ()
+  && insn_use_iter != curr_insn_uses.end ())
+  {
+   auto insn_use = *insn_use_iter;
+   auto cand_use = *cand_use_iter;
+   if (insn_use->regno () > cand_use->regno ())
+ cand_use_iter++;
+   else if (insn_use->regno () < cand_use->regno ())
+ insn_use_iter++;
+   else
+ {
+   // As it stands I believe the alias code (memory_modified_in_insn_p)
+   // doesn't look at insn notes such as REG_EQU{IV,AL}, so it should
+   // be safe to skip over uses that only occur in notes.
+   if (insn_use->includes_address_uses ()
+   && !insn_use->only_occurs_in_notes ()
+   && insn_use->def () != cand_use->def ())
+ {
+   if (dump_file)
+ {
+   fprintf (dump_file,
+"assuming aliasing of cand i%d and i%d:\n"
+"-> insns see different defs of common addr reg r%u\n"
+"-> ",
+cand_use->insn ()->uid (), insn_use->insn ()->uid (),
+insn_use->regno ());
+
+   // Note that while the following sequence could be made more
+   // concise by eliding pp_string calls into the pp_printf
+   // calls, doing so triggers -Wformat-diag.
+   pretty_printer pp;
+

Re: [RFC PATCH 1/5] vect: Force alignment peeling to vectorize more early break loops

2024-10-29 Thread Alex Coplan
On 29/10/2024 15:53, Alex Coplan wrote:
> On 29/10/2024 13:39, Richard Biener wrote:
> > On Mon, 28 Oct 2024, Alex Coplan wrote:
> > 
> > > This allows us to vectorize more loops with early exits by forcing
> > > peeling for alignment to make sure that we're guaranteed to be able to
> > > safely read an entire vector iteration without crossing a page boundary.
> > > 
> > > To make this work for VLA architectures we have to allow compile-time
> > > non-constant target alignments.  We also have to override the result of
> > > the target's preferred_vector_alignment hook if it isn't a power-of-two
> > > multiple of the TYPE_SIZE of the chosen vector type.
> > > 
> > > There is currently an implicit assumption that the TYPE_SIZE of the
> > > vector type is itself a power of two.  For non-VLA types this
> > > could be checked directly in the vectorizer.  For VLA types I
> > > had discussed offline with Richard S about adding a target hook to allow
> > > the vectorizer to query the backend to confirm that a given VLA type
> > > is known to have a power-of-two size at runtime.
> > 
> > GCC assumes all vectors have power-of-two size, so I don't think we
> > need to check anything but we'd instead have to make sure the
> > target constrains the hardware when this assumption doesn't hold
> > in silicon.
> 
> Ah, I didn't realise this was already assumed to be the case (even for
> VLA in a target-independent way).  In that case it sounds like we might
> not need the check/hook.  Thanks.
> 
> > 
> > >  I thought we
> > > might be able to do this check in vector_alignment_reachable_p.  Any
> > > thoughts on that, richi?
> > 
> > For the purpose of alignment peeling yeah, I guess this would be
> > a possible place to check this.  The hook is currently used for
> > the case where the element has a lower alignment than its
> > size and thus vector alignment cannot be reached by peeling.
> > 
> > Btw, I thought we can already apply peeling for alignment for
> > VLA vectors ...
> 
> Yes.  However, without this patch we don't attempt alignment peeling by
> a compile-time unknown quantity (e.g. to get aligned to the size of a
> VLA vector).  As things stand we take whatever
> targetm.preferred_vector_alignment says we should align to, and for SVE
> that is typically a compile-time constant, I think (see
> aarch64_vectorize_preferred_vector_alignment).
> 
> > 
> > > gcc/ChangeLog:
> > > 
> > >   * tree-vect-data-refs.cc (vect_analyze_early_break_dependences):
> > >   Set need_peeling_for_alignment flag on read DRs instead of
> > >   failing vectorization.  Punt on gathers.
> > >   (dr_misalignment): Handle non-constant target alignments.
> > >   (vect_compute_data_ref_alignment): If need_peeling_for_alignment
> > >   flag is set on the DR, then override the target alignment chosen
> > >   by the preferred_vector_alignment hook to choose a safe
> > >   alignment.
> > >   (vect_supportable_dr_alignment): Override
> > >   support_vector_misalignment hook if need_peeling_for_alignment
> > >   is set on the DR: in this case we must return
> > >   dr_unaligned_unsupported in order to force peeling.
> > >   * tree-vect-loop-manip.cc (vect_do_peeling): Allow prolog
> > >   peeling by a compile-time non-constant amount.
> > >   * tree-vectorizer.h (dr_vec_info): Add new flag
> > >   need_peeling_for_alignment.
> > > ---
> > >  gcc/tree-vect-data-refs.cc  | 77 ++---
> > >  gcc/tree-vect-loop-manip.cc |  6 ---
> > >  gcc/tree-vectorizer.h   |  5 +++
> > >  3 files changed, 68 insertions(+), 20 deletions(-)
> > 
> > Eh, where's the inline copy ...
> 
> Hmm, I suppose I should be passing --inline to git format-patch rather
> than --attach.
> 
> > 
> > @@ -739,15 +739,22 @@ vect_analyze_early_break_dependences (loop_vec_info loop_vinfo)
> >   if (DR_IS_READ (dr_ref)
> >   && !ref_within_array_bound (stmt, DR_REF (dr_ref)))
> > {
> > + if (STMT_VINFO_GATHER_SCATTER_P (stmt_vinfo))
> > +   {
> > + const char *msg
> > 
> > you want to add STMT_VINFO_STRIDED_P as well.
> 
> Ack, thanks.
> 
> > 
> >   /* Vector size in bytes.  */
> > +  poly_uint64 safe_align
> > +   = exact_div (tree_to_poly_uint64 (TYPE_SIZE (vectype)), 
> > BITS_PER_U

Re: [RFC PATCH 1/5] vect: Force alignment peeling to vectorize more early break loops

2024-10-29 Thread Alex Coplan
On 29/10/2024 13:39, Richard Biener wrote:
> On Mon, 28 Oct 2024, Alex Coplan wrote:
> 
> > This allows us to vectorize more loops with early exits by forcing
> > peeling for alignment to make sure that we're guaranteed to be able to
> > safely read an entire vector iteration without crossing a page boundary.
> > 
> > To make this work for VLA architectures we have to allow compile-time
> > non-constant target alignments.  We also have to override the result of
> > the target's preferred_vector_alignment hook if it isn't a power-of-two
> > multiple of the TYPE_SIZE of the chosen vector type.
> > 
> > There is currently an implicit assumption that the TYPE_SIZE of the
> > vector type is itself a power of two.  For non-VLA types this
> > could be checked directly in the vectorizer.  For VLA types I
> > had discussed offline with Richard S about adding a target hook to allow
> > the vectorizer to query the backend to confirm that a given VLA type
> > is known to have a power-of-two size at runtime.
> 
> GCC assumes all vectors have power-of-two size, so I don't think we
> need to check anything but we'd instead have to make sure the
> target constrains the hardware when this assumption doesn't hold
> in silicon.

Ah, I didn't realise this was already assumed to be the case (even for
VLA in a target-independent way).  In that case it sounds like we might
not need the check/hook.  Thanks.

> 
> >  I thought we
> > might be able to do this check in vector_alignment_reachable_p.  Any
> > thoughts on that, richi?
> 
> For the purpose of alignment peeling yeah, I guess this would be
> a possible place to check this.  The hook is currently used for
> the case where the element has a lower alignment than its
> size and thus vector alignment cannot be reached by peeling.
> 
> Btw, I thought we can already apply peeling for alignment for
> VLA vectors ...

Yes.  However, without this patch we don't attempt alignment peeling by
a compile-time unknown quantity (e.g. to get aligned to the size of a
VLA vector).  As things stand we take whatever
targetm.preferred_vector_alignment says we should align to, and for SVE
that is typically a compile-time constant, I think (see
aarch64_vectorize_preferred_vector_alignment).

> 
> > gcc/ChangeLog:
> > 
> > * tree-vect-data-refs.cc (vect_analyze_early_break_dependences):
> > Set need_peeling_for_alignment flag on read DRs instead of
> > failing vectorization.  Punt on gathers.
> > (dr_misalignment): Handle non-constant target alignments.
> > (vect_compute_data_ref_alignment): If need_peeling_for_alignment
> > flag is set on the DR, then override the target alignment chosen
> > by the preferred_vector_alignment hook to choose a safe
> > alignment.
> > (vect_supportable_dr_alignment): Override
> > support_vector_misalignment hook if need_peeling_for_alignment
> > is set on the DR: in this case we must return
> > dr_unaligned_unsupported in order to force peeling.
> > * tree-vect-loop-manip.cc (vect_do_peeling): Allow prolog
> > peeling by a compile-time non-constant amount.
> > * tree-vectorizer.h (dr_vec_info): Add new flag
> > need_peeling_for_alignment.
> > ---
> >  gcc/tree-vect-data-refs.cc  | 77 ++---
> >  gcc/tree-vect-loop-manip.cc |  6 ---
> >  gcc/tree-vectorizer.h   |  5 +++
> >  3 files changed, 68 insertions(+), 20 deletions(-)
> 
> Eh, where's the inline copy ...

Hmm, I suppose I should be passing --inline to git format-patch rather
than --attach.

> 
> @@ -739,15 +739,22 @@ vect_analyze_early_break_dependences (loop_vec_info loop_vinfo)
>   if (DR_IS_READ (dr_ref)
>   && !ref_within_array_bound (stmt, DR_REF (dr_ref)))
> {
> + if (STMT_VINFO_GATHER_SCATTER_P (stmt_vinfo))
> +   {
> + const char *msg
> 
> you want to add STMT_VINFO_STRIDED_P as well.

Ack, thanks.

> 
>   /* Vector size in bytes.  */
> +  poly_uint64 safe_align
> +   = exact_div (tree_to_poly_uint64 (TYPE_SIZE (vectype)), BITS_PER_UNIT);
> 
> safe_align = TYPE_SIZE_UNIT (vectype);

Nice, thanks.

> 
> +  /* Multiply by the unroll factor to get the number of bytes read
> +per vector iteration.  */
> +  if (loop_vinfo)
> +   {
> + auto num_copies = vect_get_num_copies (loop_vinfo, vectype);
> + gcc_checking_assert (pow2p_hwi (num_copies));
> + safe_align *= num_copies;
> 
> the unroll factor is the vectorization factor

I'm probably bei

[RFC PATCH 1/5] vect: Force alignment peeling to vectorize more early break loops

2024-10-28 Thread Alex Coplan
This allows us to vectorize more loops with early exits by forcing
peeling for alignment to make sure that we're guaranteed to be able to
safely read an entire vector iteration without crossing a page boundary.

To make this work for VLA architectures we have to allow compile-time
non-constant target alignments.  We also have to override the result of
the target's preferred_vector_alignment hook if it isn't a power-of-two
multiple of the TYPE_SIZE of the chosen vector type.
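
To illustrate the arithmetic, here is a sketch on plain integers.  The real
code works on poly_ints and trees, the helper names are hypothetical, and
falling back to the number of bytes read per vector iteration is my
simplification of "choose a safe alignment".

```cpp
#include <cstdint>

// True if X is a non-zero power of two.
inline bool pow2_p (std::uint64_t x)
{
  return x != 0 && (x & (x - 1)) == 0;
}

// True if ALIGN == VEC_BYTES * 2^k for some k >= 0.
inline bool pow2_multiple_p (std::uint64_t align, std::uint64_t vec_bytes)
{
  return vec_bytes != 0
	 && align % vec_bytes == 0
	 && pow2_p (align / vec_bytes);
}

// Keep the target's preferred alignment when it is a power-of-two multiple
// of the bytes read per vector iteration; otherwise override it with that
// byte count (assumed here to be the safe choice).
inline std::uint64_t
choose_target_alignment (std::uint64_t preferred, std::uint64_t bytes_per_iter)
{
  return pow2_multiple_p (preferred, bytes_per_iter)
	 ? preferred : bytes_per_iter;
}

// Number of scalar iterations to peel so that ADDR becomes aligned to
// ALIGN.  Assumes ADDR is already aligned to ELEM_BYTES and that ELEM_BYTES
// divides ALIGN.
inline std::uint64_t
prolog_peel_iters (std::uint64_t addr, std::uint64_t align,
		   std::uint64_t elem_bytes)
{
  std::uint64_t misalign = addr % align;
  return misalign == 0 ? 0 : (align - misalign) / elem_bytes;
}
```

For example, with 4-byte elements, a 64-byte target alignment and an address
that is 4 bytes past a 64-byte boundary, prolog_peel_iters gives
(64 - 4) / 4 = 15 peeled iterations.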

There is currently an implicit assumption that the TYPE_SIZE of the
vector type is itself a power of two.  For non-VLA types this
could be checked directly in the vectorizer.  For VLA types I
had discussed offline with Richard S about adding a target hook to allow
the vectorizer to query the backend to confirm that a given VLA type
is known to have a power-of-two size at runtime.  I thought we
might be able to do this check in vector_alignment_reachable_p.  Any
thoughts on that, richi?

gcc/ChangeLog:

* tree-vect-data-refs.cc (vect_analyze_early_break_dependences):
Set need_peeling_for_alignment flag on read DRs instead of
failing vectorization.  Punt on gathers.
(dr_misalignment): Handle non-constant target alignments.
(vect_compute_data_ref_alignment): If need_peeling_for_alignment
flag is set on the DR, then override the target alignment chosen
by the preferred_vector_alignment hook to choose a safe
alignment.
(vect_supportable_dr_alignment): Override
support_vector_misalignment hook if need_peeling_for_alignment
is set on the DR: in this case we must return
dr_unaligned_unsupported in order to force peeling.
* tree-vect-loop-manip.cc (vect_do_peeling): Allow prolog
peeling by a compile-time non-constant amount.
* tree-vectorizer.h (dr_vec_info): Add new flag
need_peeling_for_alignment.
---
 gcc/tree-vect-data-refs.cc  | 77 ++---
 gcc/tree-vect-loop-manip.cc |  6 ---
 gcc/tree-vectorizer.h   |  5 +++
 3 files changed, 68 insertions(+), 20 deletions(-)

diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index 202af7a8952..4e49d8403df 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -739,15 +739,22 @@ vect_analyze_early_break_dependences (loop_vec_info loop_vinfo)
 	  if (DR_IS_READ (dr_ref)
 	  && !ref_within_array_bound (stmt, DR_REF (dr_ref)))
 	{
+	  if (STMT_VINFO_GATHER_SCATTER_P (stmt_vinfo))
+		{
+		  const char *msg
+		= "early break not supported: cannot peel gather "
+		  "for alignment, vectorization would read out of "
+		  "bounds at %G";
+		  return opt_result::failure_at (stmt, msg, stmt);
+		}
+
+	  dr_vec_info *dr_info = STMT_VINFO_DR_INFO (stmt_vinfo);
+	  dr_info->need_peeling_for_alignment = true;
+
 	  if (dump_enabled_p ())
-		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
- "early breaks not supported: vectorization "
- "would %s beyond size of obj.\n",
- DR_IS_READ (dr_ref) ? "read" : "write");
-	  return opt_result::failure_at (stmt,
- "can't safely apply code motion to "
- "dependencies of %G to vectorize "
- "the early exit.\n", stmt);
+		dump_printf_loc (MSG_NOTE, vect_location,
+ "marking DR (read) as needing peeling for "
+ "alignment\n");
 	}
 
 	  if (DR_IS_READ (dr_ref))
@@ -1230,11 +1237,15 @@ dr_misalignment (dr_vec_info *dr_info, tree vectype, poly_int64 offset)
  offset which can for example result from a negative stride access.  */
   poly_int64 misalignment = misalign + diff + offset;
 
-  /* vect_compute_data_ref_alignment will have ensured that target_alignment
- is constant and otherwise set misalign to DR_MISALIGNMENT_UNKNOWN.  */
-  unsigned HOST_WIDE_INT target_alignment_c
-= dr_info->target_alignment.to_constant ();
-  if (!known_misalignment (misalignment, target_alignment_c, &misalign))
+  /* Below we reject compile-time non-constant target alignments, but if
+ our misalignment is zero, then we are known to already be aligned
+ w.r.t. any such possible target alignment.  */
+  if (known_eq (misalignment, 0))
+return 0;
+
+  unsigned HOST_WIDE_INT target_alignment_c;
+  if (!dr_info->target_alignment.is_constant (&target_alignment_c)
+  || !known_misalignment (misalignment, target_alignment_c, &misalign))
 return DR_MISALIGNMENT_UNKNOWN;
   return misalign;
 }
@@ -1337,6 +1348,43 @@ vect_compute_data_ref_alignment (vec_info *vinfo, dr_vec_info *dr_info,
   poly_uint64 vector_alignment
 = exact_div (targetm.vectorize.preferred_vector_alignment (vectype),
 		 BITS_PER_UNIT);
+
+  /* If this DR needs peeling for alignment for correctness, we must
+ ensure the target alignment is a constant power-of-two multiple of the
+ amount read per vector iteration (overriding the above hook where
+ necessary).  */
+  if (dr_info->need_peeling_for_alignment)
+{

[RFC PATCH 5/5] vect: Also cost gconds for scalar

2024-10-28 Thread Alex Coplan
Currently we only cost gconds for the vector loop while we omit costing
them when analyzing the scalar loop; this unfairly penalizes the vector
loop in the case of loops with early exits.

This (together with the previous patches) enables us to vectorize
std::find with 64-bit element sizes.
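
For reference, the kind of loop in question looks roughly like the following
hand-written equivalent of std::find over 64-bit elements (find_u64 is just
an illustrative name).  The comparison against the key is the early-exit
condition, i.e. a gcond in addition to the loop bound check.

```cpp
#include <cstddef>
#include <cstdint>

// Return the index of the first element equal to KEY, or N if there is none.
inline std::size_t
find_u64 (const std::uint64_t *data, std::size_t n, std::uint64_t key)
{
  for (std::size_t i = 0; i < n; i++)
    if (data[i] == key)	// early exit
      return i;
  return n;
}
```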

gcc/ChangeLog:

* tree-vect-loop.cc (vect_compute_single_scalar_iteration_cost):
Don't skip over gconds.
---
 gcc/tree-vect-loop.cc | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index b8e155b90f8..5e5825c6593 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -1688,7 +1688,9 @@ vect_compute_single_scalar_iteration_cost (loop_vec_info loop_vinfo)
 	  gimple *stmt = gsi_stmt (si);
 	  stmt_vec_info stmt_info = loop_vinfo->lookup_stmt (stmt);
 
-  if (!is_gimple_assign (stmt) && !is_gimple_call (stmt))
+	  if (!is_gimple_assign (stmt)
+	  && !is_gimple_call (stmt)
+	  && !is_a <gcond *> (stmt))
 continue;
 
   /* Skip stmts that are not vectorized inside the loop.  */


[RFC PATCH 4/5] vect: Ensure we add vector skip guard even when versioning for aliasing

2024-10-28 Thread Alex Coplan
This fixes a latent wrong code issue whereby vect_do_peeling determined
the wrong condition for inserting the vector skip guard.  Specifically
in the case where the loop niters are unknown at compile time we used to
check:

  !LOOP_REQUIRES_VERSIONING (loop_vinfo)

but LOOP_REQUIRES_VERSIONING is true for loops which we have versioned
for aliasing, and that has nothing to do with prolog peeling.  I think
this condition should instead be checking specifically if we aren't
versioning for alignment.

As it stands, when we version for alignment, we don't peel, so the
vector skip guard is indeed redundant in that case.

With the testcase added (reduced from the Fortran frontend) we would
version for aliasing, omit the vector skip guard, and then at runtime we
would peel sufficient iterations for alignment that there wasn't a full
vector iteration left when we entered the vector body, thus overflowing
the output buffer.
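
The arithmetic behind the failure can be sketched as follows.  This is a
simplified model with hypothetical helper names, counting scalar iterations
only and looking at just the first vector iteration.

```cpp
#include <cstdint>

// True if, after peeling PEELED scalar iterations out of NITERS, at least
// one full vector iteration of VF scalar iterations remains.  This is the
// condition the vector skip guard has to test at runtime.
inline bool
can_enter_vector_body (std::uint64_t niters, std::uint64_t peeled,
		       std::uint64_t vf)
{
  return niters >= peeled && niters - peeled >= vf;
}

// Number of elements processed past the end of the data if the vector body
// is entered once without the guard.
inline std::uint64_t
overrun_without_guard (std::uint64_t niters, std::uint64_t peeled,
		       std::uint64_t vf)
{
  std::uint64_t remaining = niters >= peeled ? niters - peeled : 0;
  return remaining >= vf ? 0 : vf - remaining;
}
```

With the numbers from the testcase (28 iterations, 15 peeled, 16 per vector
iteration) the guard condition is false, and entering the vector body anyway
overruns by 16 - 13 = 3 elements.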

gcc/ChangeLog:

* tree-vect-loop-manip.cc (vect_do_peeling): Adjust skip_vector
condition to only omit the edge if we're versioning for
alignment.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-early-break_130.c: New test.
---
 .../gcc.dg/vect/vect-early-break_130.c| 91 +++
 gcc/tree-vect-loop-manip.cc   |  2 +-
 2 files changed, 92 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-early-break_130.c

diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_130.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_130.c
new file mode 100644
index 000..ce43fcd5681
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_130.c
@@ -0,0 +1,91 @@
+/* { dg-require-effective-target mmap } */
+/* { dg-add-options vect_early_break } */
+
+#include <stdint.h>
+#include <stdio.h>
+#include <string.h>
+#include <sys/mman.h>
+#include <unistd.h>
+
+/* This was reduced from gcc/fortran/scanner.cc:gfc_widechar_to_char.
+   The problem was that we omitted adding the vector skip guard when
+   versioning for aliasing.  When invoked on a string that is 28 bytes
+   long, that caused us to enter the vector body after having peeled 15
+   iterations, leaving only 13 iterations to be performed as vector, but
+   the vector body performs 16 (thus overflowing the res buffer by three
+   bytes).  */
+__attribute__((noipa))
+void f (const uint32_t *s, char *res, int length)
+{
+  unsigned long i;
+
+  for (i = 0; i < length; i++)
+{
+  if (s[i] > 255)
+__builtin_abort ();
+  res[i] = (char)s[i];
+}
+}
+
+int main(void)
+{
+  long pgsz = sysconf (_SC_PAGESIZE);
+  if (pgsz == -1) {
+fprintf (stderr, "sysconf failed: %m\n");
+return 0;
+  }
+
+  void *p = mmap (NULL,
+  pgsz * 2,
+  PROT_READ | PROT_WRITE,
+  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+  if (p == MAP_FAILED) {
+fprintf (stderr, "mmap failed: %m\n");
+return 0;
+  }
+
+  if (mprotect (p + pgsz, pgsz, PROT_NONE)) {
+fprintf (stderr, "mprotect failed: %m\n");
+return 0;
+  }
+
+  uint32_t in[128];
+  memset (in, 0, sizeof(in));
+
+  uintptr_t x = (uintptr_t)in;
+
+  /* We want to make our input pointer maximally misaligned (so we have
+ to peel the greatest possible number of iterations for alignment).
+ We need two bits of alignment for our uint32_t pointer to be
+ aligned.  Assuming we process 16 chars per vector iteration, we
+ will need to load 16 uint32_ts, thus we need a further 4 bits of
+ alignment.  */
+  const uintptr_t align_bits = 2 + 4;
+  const uintptr_t align_p2 = (1 << align_bits);
+  const uintptr_t align_p2m1 = align_p2 - 1;
+
+  if ((x & align_p2m1) <= 4)
+x &= -align_p2; /* Round down.  */
+  else
+x = (x + align_p2m1) & -align_p2; /* Round up.  */
+
+  /* Add one uint32_t to get maximally misaligned.  */
+  uint32_t *inp = (uint32_t *)x + 1;
+
+  const char *str = "dec-comparison-complex_1.f90";
+  long n;
+#pragma GCC novector
+  for (n = 0; str[n]; n++)
+inp[n] = str[n];
+
+  if (n > pgsz)
+__builtin_abort ();
+
+  char *buf = p + pgsz - n;
+  f (inp, buf, n);
+
+#pragma GCC novector
+  for (int i = 0; i < n; i++)
+if (buf[i] != str[i])
+  __builtin_abort ();
+}
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index cf3a90219b7..55761b61185 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -3278,7 +3278,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
   bool skip_vector = (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
 		  ? maybe_lt (LOOP_VINFO_INT_NITERS (loop_vinfo),
   bound_prolog + bound_epilog)
-		  : (!LOOP_REQUIRES_VERSIONING (loop_vinfo)
+		  : (!LOOP_REQUIRES_VERSIONING_FOR_ALIGNMENT (loop_vinfo)
 			 || vect_epilogues));
 
   /* Epilog loop must be executed if the number of iterations for epilog


[RFC PATCH 2/5] vect: Don't guard scalar epilogue for inverted loops

2024-10-28 Thread Alex Coplan
For loops with LOOP_VINFO_EARLY_BREAKS_VECT_PEELED we should always
enter the scalar epilogue, so avoid emitting a guard on entry to the
epilogue.

gcc/ChangeLog:

* tree-vect-loop-manip.cc (vect_do_peeling): Avoid emitting an
epilogue guard for inverted early-exit loops.
---
 gcc/tree-vect-loop-manip.cc | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index 718652f9bd8..8eb9970edbc 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -3537,7 +3537,9 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
 
   /* If we have a peeled vector iteration we will never skip the epilog loop
 	 and we can simplify the cfg a lot by not doing the edge split.  */
-  if (skip_epilog || LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+  if (skip_epilog
+	  || (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
+	  && !LOOP_VINFO_EARLY_BREAKS_VECT_PEELED (loop_vinfo)))
 	{
 	  guard_cond = fold_build2 (EQ_EXPR, boolean_type_node,
 niters, niters_vector_mult_vf);


[RFC PATCH 3/5] vect: Fix dominators when adding a guard to skip the vector loop

2024-10-28 Thread Alex Coplan
From: Tamar Christina 

The alignment peeling changes exposed a latent missing dominator update
with early break vectorization, specifically when inserting the vector
skip edge, since the new edge bypasses the prolog skip block and thus
has the potential to subvert its dominance.  This patch fixes that.

gcc/ChangeLog:

* tree-vect-loop-manip.cc (vect_do_peeling): Update immediate
dominators of nodes that were dominated by the prolog skip block
after inserting vector skip edge.

gcc/testsuite/ChangeLog:

* g++.dg/vect/vect-early-break_6.cc: New test.

Co-Authored-By: Alex Coplan 
---
 .../g++.dg/vect/vect-early-break_6.cc | 25 +++
 gcc/tree-vect-loop-manip.cc   | 24 ++
 2 files changed, 49 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/vect/vect-early-break_6.cc

diff --git a/gcc/testsuite/g++.dg/vect/vect-early-break_6.cc b/gcc/testsuite/g++.dg/vect/vect-early-break_6.cc
new file mode 100644
index 000..fdd9af832a7
--- /dev/null
+++ b/gcc/testsuite/g++.dg/vect/vect-early-break_6.cc
@@ -0,0 +1,25 @@
+// { dg-do compile }
+// ICE in verify_dominators, reduced from charset.cc (libstdc++).
+
+void convert_escape(int *);
+int cpp_interpret_string_1_to, cpp_interpret_string_1_tbuf;
+char *cpp_interpret_string_1_base;
+char cpp_interpret_string_1_limit;
+void cpp_interpret_string_1() {
+  char *p;
+  for (;;) {
+cpp_interpret_string_1_base = p;
+while (p < &cpp_interpret_string_1_limit && *p)
+  p++;
+if (p > cpp_interpret_string_1_base)
+  if (cpp_interpret_string_1_to)
+goto fail;
+if (p >= &cpp_interpret_string_1_limit)
+  break;
+int *tbuf_ptr =
+cpp_interpret_string_1_to ? &cpp_interpret_string_1_tbuf : __null;
+convert_escape(tbuf_ptr);
+  }
+fail:
+  ;
+}
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index 8eb9970edbc..cf3a90219b7 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -3471,6 +3471,30 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
 	  skip_e = guard_e;
 	  e = EDGE_PRED (guard_to, 0);
 	  e = (e != guard_e ? e : EDGE_PRED (guard_to, 1));
+
+	  /* Handle any remaining dominator updates needed after
+	 inserting the loop skip edge above.  */
+	  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
+	  && prolog_peeling)
+	{
+	  /* Adding a skip edge to skip a loop with multiple exits
+		 means the dominator of the join blocks for all exits shifts
+		 from the prolog skip guard to the loop skip guard.  */
+	  auto prolog_skip_bb
+		= single_pred (loop_preheader_edge (prolog)->src);
+	  auto needs_update
+		= get_dominated_by (CDI_DOMINATORS, prolog_skip_bb);
+
+	  /* Update everything except for the immediate children of
+		 the prolog skip block (the prolog and vector preheaders).
+		 Those should remain dominated by the prolog skip block itself,
+		 since the loop guard edge goes to the epilogue.  */
+	  for (auto bb : needs_update)
+		if (bb != EDGE_SUCC (prolog_skip_bb, 0)->dest
+		&& bb != EDGE_SUCC (prolog_skip_bb, 1)->dest)
+		  set_immediate_dominator (CDI_DOMINATORS, bb, guard_bb);
+	}
+
 	  slpeel_update_phi_nodes_for_guard1 (first_loop, epilog, guard_e, e);
 
 	  /* Simply propagate profile info from guard_bb to guard_to which is
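
As a rough illustration of the dominator change described in the commit message above, here is a minimal sketch on a toy CFG.  The block names, the toy_cfg type and the helper functions are all hypothetical; none of this is GCC's dominance API.  It only shows that adding an edge from the loop skip guard straight to a join block moves that block's immediate dominator from the prolog skip block to the guard, while the prolog skip block's direct successors are unaffected.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy CFG (not GCC's basic_block/edge types): blocks are numbered 0..n-1,
// block 0 is the entry, and every block is assumed reachable from it.
struct toy_cfg
{
  int n;
  std::vector<std::vector<int>> preds;

  explicit toy_cfg (int nblocks) : n (nblocks), preds (nblocks) {}
  void add_edge (int src, int dest) { preds[dest].push_back (src); }
};

// Dominator sets as bitmasks (bit D of dom[B] set means D dominates B),
// computed with the classic iterative data-flow algorithm.
std::vector<std::uint32_t>
dominator_sets (const toy_cfg &g)
{
  assert (g.n > 0 && g.n < 32);
  const std::uint32_t all = (1u << g.n) - 1;
  std::vector<std::uint32_t> dom (g.n, all);
  dom[0] = 1u;
  for (bool changed = true; changed;)
    {
      changed = false;
      for (int b = 1; b < g.n; b++)
        {
          std::uint32_t in = all;
          for (int p : g.preds[b])
            in &= dom[p];
          const std::uint32_t out = in | (1u << b);
          if (out != dom[b])
            {
              dom[b] = out;
              changed = true;
            }
        }
    }
  return dom;
}

// Immediate dominator of B (B != entry): the block whose dominator set is
// exactly B's set of strict dominators.  Returns -1 if none is found.
int
immediate_dominator (const toy_cfg &g, int b)
{
  const std::vector<std::uint32_t> dom = dominator_sets (g);
  const std::uint32_t strict = dom[b] & ~(1u << b);
  for (int d = 0; d < g.n; d++)
    if (d != b && dom[d] == strict)
      return d;
  return -1;
}

// Hypothetical block numbering for the illustration.
enum { GUARD = 0, PROLOG_SKIP = 1, PROLOG = 2, VECTOR = 3, JOIN = 4 };

// Build the toy CFG, optionally with the loop skip edge GUARD -> JOIN.
toy_cfg
build_cfg (bool with_skip_edge)
{
  toy_cfg g (5);
  g.add_edge (GUARD, PROLOG_SKIP);
  g.add_edge (PROLOG_SKIP, PROLOG);   // run the prolog
  g.add_edge (PROLOG_SKIP, VECTOR);   // skip the prolog
  g.add_edge (PROLOG, VECTOR);
  g.add_edge (PROLOG, JOIN);          // early break taken in the prolog
  g.add_edge (VECTOR, JOIN);          // exit from the vector loop
  if (with_skip_edge)
    g.add_edge (GUARD, JOIN);
  return g;
}
```

Calling immediate_dominator (build_cfg (false), JOIN) and then the same with build_cfg (true) shows the shift; PROLOG and VECTOR keep PROLOG_SKIP as their immediate dominator in both graphs.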


[RFC PATCH 0/5] vect: Force peeling for alignment to handle more early break loops

2024-10-28 Thread Alex Coplan
This patch series allows us to vectorize more loops with early exits by
forcing peeling for alignment to make sure that we're guaranteed to be
able to safely read an entire vector iteration without crossing a page
boundary.

The motivation is to vectorize search loops such as std::find.  This
shows up in (e.g.) xalancbmk from SPEC CPU 2017.
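
To make the page-boundary argument concrete, here is a small arithmetic sketch.  It is not taken from the patches: the function names are hypothetical, and it assumes the page size is a multiple of the vector size and that the start address is a multiple of the element size.  Under those assumptions, once enough scalar iterations have been peeled to align the pointer, a full-vector read cannot straddle a page.

```cpp
#include <cassert>
#include <cstdint>

// Number of scalar iterations to peel so that ADDR + iters * ELEM_SIZE is
// aligned to VEC_BYTES.  Assumes VEC_BYTES and ADDR are both multiples of
// ELEM_SIZE.
std::uint64_t
prolog_iterations (std::uint64_t addr, std::uint64_t elem_size,
                   std::uint64_t vec_bytes)
{
  assert (vec_bytes % elem_size == 0 && addr % elem_size == 0);
  const std::uint64_t misalign = addr % vec_bytes;
  return misalign == 0 ? 0 : (vec_bytes - misalign) / elem_size;
}

// True if reading VEC_BYTES bytes starting at ADDR touches two pages.
bool
crosses_page (std::uint64_t addr, std::uint64_t vec_bytes,
              std::uint64_t page_size)
{
  return addr / page_size != (addr + vec_bytes - 1) / page_size;
}
```

For example, with 2-byte elements, 16-byte vectors and 4096-byte pages, a read starting at address 4090 crosses a page; peeling prolog_iterations (4090, 2, 16) scalar iterations moves the start to 4096, where it no longer does.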

For a single pair of runs of SPEC CPU 2017 on Neoverse V1 with LTO, I
see notable improvements in xalancbmk (3.2%) and imagick (4.8%).  parest
shows a regression of 1.9%.  I see the following geomean improvements:

+-----------+---------+
| benchmark | geomean |
+-----------+---------+
| SPECint   | -0.17%  |
| SPECfp    | -0.13%  |
| overall   | -0.15%  |
+-----------+---------+

The series is structured as follows:
 - 1/5 adds the new functionality.
 - 2/5 fixes a latent wrong code (and missed optimization) bug.
 - 3/5 fixes a latent dominator ICE (exposed by 1/5).
 - 4/5 fixes another latent wrong code bug exposed by the changes.
 - 5/5 fixes a costing issue that caused us to miss vectorization with larger
   element sizes.

The patch series survives an O3-bootstrap on aarch64-linux-gnu.  There
are currently some testsuite regressions, in the following files:

gcc.dg/unroll-6.c
gcc.dg/tree-ssa/cunroll-13.c
gcc.dg/tree-ssa/cunroll-14.c
gcc.dg/tree-ssa/predcom-8.c
gcc.dg/vect/bb-slp-pr65935.c
gcc.dg/vect/vect-104.c
gcc.dg/vect/vect-early-break_108-pr113588.c
gcc.dg/vect/vect-early-break_109-pr113588.c
gcc.dg/vect/vect-early-break_110-pr113467.c
gcc.dg/vect/vect-early-break_3.c
gcc.dg/vect/vect-early-break_65.c
gcc.dg/vect/vect-early-break_8.c
gfortran.dg/vect/vect-5.f90
gfortran.dg/vect/vect-8.f90

Mostly, these seem to be testisms (e.g. we now vectorize 2 loops, but the test
expected to see only 1).  I think the main "real" (non-testism) issues
are latent early break profile update bugs exposed by the series.  These
are failures like:

+FAIL: gcc.dg/tree-ssa/cunroll-14.c scan-tree-dump-not cunroll "Invalid sum"
+FAIL: gcc.dg/tree-ssa/predcom-8.c scan-tree-dump-not pcom "Invalid sum"

I have some WIP patches to address these, but I didn't want to delay
getting review of this series by waiting until those patches are
finished.

E.g. one of the main profile issues I noticed is that multiplicative
scaling of BB frequencies (as in scale_loop_profile) doesn't work in the
case of multiple exits (provided we want to hold the counts along exit
edges fixed).  I have a patch that addresses this, but it probably makes
most sense to post it once all the profile issues are fixed.
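
The scaling problem can be seen in a toy model.  The sketch below is not GCC's profile code: the loop_profile struct and both helpers are hypothetical, and the loop is reduced to a header block with an early exit and a body block with the normal exit and the back edge.  Scaling the block counts by a common factor keeps the flow consistent when there is a single exit, but not when a second exit edge is held fixed.

```cpp
#include <cassert>

// Toy profile for a loop with two blocks (not GCC's profile_count API):
//   header H: has an early exit edge and an edge to body B
//   body B:   has the normal exit edge and the back edge to H
struct loop_profile
{
  long entry;       // count on the edge entering the loop
  long header;      // count of block H
  long body;        // count of block B
  long early_exit;  // count on the exit edge out of H
  long final_exit;  // count on the exit edge out of B
};

// Flow conservation: what leaves H must equal H's count, and what enters H
// (entry edge plus back edge) must equal it too.
bool
consistent_p (const loop_profile &p)
{
  const long back_edge = p.body - p.final_exit;
  return p.header == p.early_exit + p.body
         && p.header == p.entry + back_edge;
}

// Scale the block counts by NUM/DEN while holding the entry and exit edge
// counts fixed.
loop_profile
scale_blocks (loop_profile p, long num, long den)
{
  assert (den != 0);
  p.header = p.header * num / den;
  p.body = p.body * num / den;
  return p;
}
```

With entry 100, header 1000, body 960, early exit 40 and final exit 60, the profile is consistent, but halving the block counts gives header 500 against outgoing flow 40 + 480 = 520, which is the kind of mismatch an "Invalid sum" check would flag.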

I would appreciate any feedback on the patches at this stage.

Thanks,
Alex

Alex Coplan (4):
  vect: Force alignment peeling to vectorize more early break loops
  vect: Don't guard scalar epilogue for inverted loops
  vect: Ensure we add vector skip guard even when versioning for aliasing
  vect: Also cost gconds for scalar

Tamar Christina (1):
  vect: Fix dominators when adding a guard to skip the vector loop

 .../g++.dg/vect/vect-early-break_6.cc | 25 +
 .../gcc.dg/vect/vect-early-break_130.c| 91 +++
 gcc/tree-vect-data-refs.cc| 77 +---
 gcc/tree-vect-loop-manip.cc   | 36 ++--
 gcc/tree-vect-loop.cc |  4 +-
 gcc/tree-vectorizer.h |  5 +
 6 files changed, 215 insertions(+), 23 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/vect/vect-early-break_6.cc
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-early-break_130.c



Re: [PATCH] Assorted --disable-checking fixes [PR117249]

2024-10-25 Thread Alex Coplan
On 25/10/2024 10:19, Jakub Jelinek wrote:
> Hi!
> 
> We have currently 3 different definitions of gcc_assert macro, one used most
> of the time (unless --disable-checking) which evaluates the condition at
> runtime and also checks it at runtime, then one for --disable-checking GCC 4.5+
> which looks like
> ((void)(UNLIKELY (!(EXPR)) ? __builtin_unreachable (), 0 : 0))
> and a fallback one
> ((void)(0 && (EXPR)))
> Now, the last one actually doesn't evaluate any of the side-effects in the
> argument, just quiets up unused var/parameter warnings.
> I've tried to replace the middle definition with
> ({ [[assume (EXPR)]]; (void) 0; })
> for compilers which support assume attribute and statement expressions
> (surprisingly quite a few spots use gcc_assert inside of comma expressions),
> but ran into PR117287, so for now such a change isn't being proposed.
> 
> The following patch attempts to move important side-effects from gcc_assert
> arguments.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux with normal
> --enable-checking=yes,rtl,extra, plus additionally I've attempted to do
> x86_64-linux bootstrap with --disable-checking and gcc_assert changed to the
> ((void)(0 && (EXPR)))
> version when --disable-checking.  That version ran into spurious middle-end
> warnings
> ../../gcc/../include/libiberty.h:733:36: error: argument to ‘alloca’ is too large [-Werror=alloca-larger-than=]
> ../../gcc/tree-ssa-reassoc.cc:5659:20: note: in expansion of macro ‘XALLOCAVEC’
>   int op_num = ops.length ();
>   int op_normal_num = op_num;
>   gcc_assert (op_num > 0);
>   int stmt_num = op_num - 1;
>   gimple **stmts = XALLOCAVEC (gimple *, stmt_num);
> where we have gcc_assert exactly to work-around middle-end warnings.
> Guess I'd need to also disable -Werror for this experiment, which actually
> isn't a problem with unmodified system.h, because even for
> --disable-checking we use the __builtin_unreachable at least in
> stage2/stage3 and so the warnings aren't emitted, and even if it used
> [[assume ()]]; it would work too because in stage2/stage3 we could again
> rely on assume and statement expression support.
> 
> Ok for trunk?
> 
> 2024-10-25  Jakub Jelinek  
> 
>   PR middle-end/117249
>   * tree-ssa-structalias.cc (insert_vi_for_tree): Move put calls out of
>   gcc_assert.
>   * lto-cgraph.cc (lto_symtab_encoder_delete_node): Likewise.
>   * gimple-ssa-strength-reduction.cc (get_alternative_base,
>   add_cand_for_stmt): Likewise.
>   * tree-eh.cc (add_stmt_to_eh_lp_fn): Likewise.
>   * except.cc (duplicate_eh_regions_1): Likewise.
>   * tree-ssa-reassoc.cc (insert_operand_rank): Likewise.
>   * config/nvptx/nvptx.cc (nvptx_expand_call): Use == rather than = in
>   gcc_assert.
>   * opts-common.cc (jobserver_info::disconnect): Call close outside of
>   gcc_assert and only check result in it.
>   (jobserver_info::return_token): Call write outside of gcc_assert and
>   only check result in it.
>   * genautomata.cc (output_default_latencies): Move j++ side-effect
>   outside of gcc_assert.
>   * tree-ssa-loop-ivopts.cc (get_alias_ptr_type_for_ptr_address): Use
>   == rather than = in gcc_assert.
>   * cgraph.cc (symbol_table::create_edge): Move ++edges_max_uid
>   side-effect outside of gcc_assert.
>   * pair-fusion.cc (pair_fusion_bb_info::fuse_pair): Call
>   restrict_movement outside of gcc_assert and only check result in it.
>   (pair_fusion::try_promote_writeback): Likewise.
> 
> --- gcc/tree-ssa-structalias.cc.jj2024-10-02 13:30:14.982371644 +0200
> +++ gcc/tree-ssa-structalias.cc   2024-10-23 10:42:02.584056165 +0200
> @@ -2986,7 +2986,8 @@ static void
>  insert_vi_for_tree (tree t, varinfo_t vi)
>  {
>gcc_assert (vi);
> -  gcc_assert (!vi_for_tree->put (t, vi));
> +  bool existed = vi_for_tree->put (t, vi);
> +  gcc_assert (!existed);
>  }
>  
>  /* Find the variable info for tree T in VI_FOR_TREE.  If T does not
> --- gcc/lto-cgraph.cc.jj  2024-09-24 11:31:48.729621966 +0200
> +++ gcc/lto-cgraph.cc 2024-10-23 10:43:11.013085410 +0200
> @@ -155,7 +155,8 @@ lto_symtab_encoder_delete_node (lto_symt
>last_node = encoder->nodes.pop ();
>if (last_node.node != node)
>  {
> -  gcc_assert (encoder->map->put (last_node.node, index + 1));
> +  bool existed = encoder->map->put (last_node.node, index + 1);
> +  gcc_assert (existed);
>  
>/* Move the last element to the original spot of NODE.  */
>encoder->nodes[index] = last_node;
> --- gcc/gimple-ssa-strength-reduction.cc.jj   2024-09-02 09:43:28.806148076 +0200
> +++ gcc/gimple-ssa-strength-reduction.cc  2024-10-23 10:38:24.386151585 +0200
> @@ -474,7 +474,8 @@ get_alternative_base (tree base)
>aff.offset = 0;
>expr = aff_combination_to_tree (&aff);
>  
> -  gcc_assert (!alt_base_map->put (base, base == expr ? NULL : expr));
> +  bool existed = alt_base_map->put (base, base
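
The hazard that the quoted patch addresses can be reproduced in isolation.  The following is a minimal sketch, assuming a macro with the same shape as the fallback definition quoted above; UNEVALUATED_ASSERT and the two helper functions are hypothetical names, not GCC code.

```cpp
#include <cassert>
#include <set>

// Hypothetical stand-in with the same shape as the fallback gcc_assert
// definition: EXPR is type-checked but never evaluated.
#define UNEVALUATED_ASSERT(EXPR) ((void) (0 && (EXPR)))

// Fragile: the insertion only happens if the assert evaluates its argument.
void
insert_inside_assert (std::set<int> &s, int key)
{
  UNEVALUATED_ASSERT (s.insert (key).second);
}

// Robust: perform the side effect first, then only check its result.
void
insert_then_assert (std::set<int> &s, int key)
{
  bool inserted = s.insert (key).second;
  UNEVALUATED_ASSERT (inserted);
}
```

After insert_inside_assert the set is still empty, whereas insert_then_assert leaves the key in the set; this mirrors the put calls being moved out of gcc_assert in the patch.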

[committed v2] pair-fusion: Assume alias conflict if common address reg changes [PR116783]

2024-10-21 Thread Alex Coplan
This is a v2 of:
https://gcc.gnu.org/pipermail/gcc-patches/2024-September/663600.html

w.r.t. v1 this just implements Richard's suggestion of using safe_push as
discussed here:
https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665097.html

Pushed to trunk after bootstrap + regtest (all languages) on aarch64-linux-gnu.
I'll work on a backport to 14 if there's no fallout after a week or so.

Thanks,
Alex

-- >8 --

As the PR shows, pair-fusion was tricking memory_modified_in_insn_p into
returning false when a common base register (in this case, x1) was
modified between the mem and the store insn.  This led to wrong code as
the accesses really did alias.

To avoid this sort of problem, this patch avoids invoking RTL alias
analysis altogether (and assumes an alias conflict) if the two insns to
be compared share a common address register R, and the insns see different
definitions of R (i.e. it was modified in between).

gcc/ChangeLog:

PR rtl-optimization/116783
* pair-fusion.cc (def_walker::cand_addr_uses): New.
(def_walker::def_walker): Add parameter for candidate address
uses.
(def_walker::alias_conflict_p): Declare.
(def_walker::addr_reg_conflict_p): New.
(def_walker::conflict_p): New.
(store_walker::store_walker): Add parameter for candidate
address uses and pass to base ctor.
(store_walker::conflict_p): Rename to ...
(store_walker::alias_conflict_p): ... this.
(load_walker::load_walker): Add parameter for candidate
address uses and pass to base ctor.
(load_walker::conflict_p): Rename to ...
(load_walker::alias_conflict_p): ... this.
(pair_fusion_bb_info::try_fuse_pair): Collect address register
uses for candidate insns and pass down to alias walkers.

gcc/testsuite/ChangeLog:

PR rtl-optimization/116783
* g++.dg/torture/pr116783.C: New test.
diff --git a/gcc/pair-fusion.cc b/gcc/pair-fusion.cc
index 653055fdcf6..ccbb5511e9d 100644
--- a/gcc/pair-fusion.cc
+++ b/gcc/pair-fusion.cc
@@ -2089,11 +2089,80 @@ protected:
 
   def_iter_t def_iter;
   insn_info *limit;
-  def_walker (def_info *def, insn_info *limit) :
-def_iter (def), limit (limit) {}
+
+  // Array of register uses from the candidate insn which occur in MEMs.
+  use_array cand_addr_uses;
+
+  def_walker (def_info *def, insn_info *limit, use_array addr_uses) :
+def_iter (def), limit (limit), cand_addr_uses (addr_uses) {}
 
   virtual bool iter_valid () const { return *def_iter; }
 
+  // Implemented in {load,store}_walker.
+  virtual bool alias_conflict_p (int &budget) const = 0;
+
+  // Return true if the current (walking) INSN () uses a register R inside a
+  // MEM, where R is also used inside a MEM by the (static) candidate insn, and
+  // those uses see different definitions of that register.  In this case we
+  // can't rely on RTL alias analysis, and for now we conservatively assume that
+  // there is an alias conflict.  See PR116783.
+  bool addr_reg_conflict_p () const
+  {
+use_array curr_insn_uses = insn ()->uses ();
+auto cand_use_iter = cand_addr_uses.begin ();
+auto insn_use_iter = curr_insn_uses.begin ();
+while (cand_use_iter != cand_addr_uses.end ()
+  && insn_use_iter != curr_insn_uses.end ())
+  {
+   auto insn_use = *insn_use_iter;
+   auto cand_use = *cand_use_iter;
+   if (insn_use->regno () > cand_use->regno ())
+ cand_use_iter++;
+   else if (insn_use->regno () < cand_use->regno ())
+ insn_use_iter++;
+   else
+ {
+   // As it stands I believe the alias code (memory_modified_in_insn_p)
+   // doesn't look at insn notes such as REG_EQU{IV,AL}, so it should
+   // be safe to skip over uses that only occur in notes.
+   if (insn_use->includes_address_uses ()
+   && !insn_use->only_occurs_in_notes ()
+   && insn_use->def () != cand_use->def ())
+ {
+   if (dump_file)
+ {
+   fprintf (dump_file,
+"assuming aliasing of cand i%d and i%d:\n"
+"-> insns see different defs of common addr reg r%u\n"
+"-> ",
+cand_use->insn ()->uid (), insn_use->insn ()->uid (),
+insn_use->regno ());
+
+   // Note that while the following sequence could be made more
+   // concise by eliding pp_string calls into the pp_printf
+   // calls, doing so triggers -Wformat-diag.
+   pretty_printer pp;
+   pp_string (&pp, "[");
+   pp_access (&pp, cand_use, 0);
+   pp_string (&pp, "] in ");
+   pp_printf (&pp, "i%d", cand_use->insn ()->uid ());
+   pp_string (&pp, " vs [");
+   pp_access (&pp, insn_use, 0);
+
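
The walk in addr_reg_conflict_p above is a two-pointer merge over two arrays sorted by register number.  Here is a self-contained sketch of that idea.  It is only an approximation: toy_use and its fields are hypothetical stand-ins for the rtl-ssa use_info accessors, and the step taken after examining a common regno is an assumption, since the hunk above is cut off before that point.

```cpp
#include <cassert>
#include <vector>

// Simplified stand-in for an rtl-ssa register use (hypothetical fields).
struct toy_use
{
  unsigned regno;       // register number
  int def_id;           // identifies the definition this use sees
  bool in_address;      // use occurs inside a MEM address
  bool only_in_notes;   // use only occurs in insn notes
};

// Both arrays must be sorted by REGNO.  Return true if the walking insn uses,
// in an address, a register that the candidate also uses in an address, and
// the two uses see different definitions.
bool
addr_reg_conflict_p (const std::vector<toy_use> &cand_addr_uses,
                     const std::vector<toy_use> &insn_uses)
{
  auto cand_it = cand_addr_uses.begin ();
  auto insn_it = insn_uses.begin ();
  while (cand_it != cand_addr_uses.end () && insn_it != insn_uses.end ())
    {
      if (insn_it->regno > cand_it->regno)
        ++cand_it;
      else if (insn_it->regno < cand_it->regno)
        ++insn_it;
      else
        {
          if (insn_it->in_address
              && !insn_it->only_in_notes
              && insn_it->def_id != cand_it->def_id)
            return true;
          // Assumption: advance both iterators once a common regno has
          // been examined.
          ++cand_it;
          ++insn_it;
        }
    }
  return false;
}
```

Because both inputs are sorted, the walk is linear in the combined number of uses and needs no auxiliary set.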

Re: pair-fusion: Assume alias conflict if common address reg changes [PR116783]

2024-10-18 Thread Alex Coplan
On 18/10/2024 17:45, Richard Sandiford wrote:
> Alex Coplan  writes:
> > On 11/10/2024 14:30, Richard Biener wrote:
> >> On Fri, 11 Oct 2024, Richard Sandiford wrote:
> >> 
> >> > Alex Coplan  writes:
> >> > > Hi,
> >> > >
> >> > > As the PR shows, pair-fusion was tricking memory_modified_in_insn_p into
> >> > > returning false when a common base register (in this case, x1) was
> >> > > modified between the mem and the store insn.  This led to wrong code as
> >> > > the accesses really did alias.
> >> > >
> >> > > To avoid this sort of problem, this patch avoids invoking RTL alias
> >> > > analysis altogether (and assumes an alias conflict) if the two insns to
> >> > > be compared share a common address register R, and the insns see different
> >> > > definitions of R (i.e. it was modified in between).
> >> > >
> >> > > Bootstrapped/regtested on aarch64-linux-gnu (all languages, both regular
> >> > > bootstrap and LTO+PGO bootstrap).  OK for trunk?
> >> > 
> >> > Sorry for the slow review.  The patch looks good to me, but...
> >
> > Thanks for the review.  I'd missed that you'd sent this, sorry for not
> > responding sooner.
> >
> >> > 
> >> > > @@ -2544,11 +2624,37 @@ pair_fusion_bb_info::try_fuse_pair (bool load_p, unsigned access_size,
> >> > >   && bitmap_bit_p (&m_tombstone_bitmap, insn->uid ());
> >> > >};
> >> > >  
> >> > > +  // Maximum number of distinct regnos we expect to appear in a single
> >> > > +  // MEM (and thus in a candidate insn).
> >> > > +  static constexpr int max_mem_regs = 2;
> >> > > +  auto_vec addr_use_vec[2];
> >> > > +  use_array addr_uses[2];
> >> > > +
> >> > > +  // Collect the lists of register uses that occur in the candidate MEMs.
> >> > > +  for (int i = 0; i < 2; i++)
> >> > > +{
> >> > > +  // N.B. it's safe for us to ignore uses that only occur in notes
> >> > > +  // here (e.g. in a REG_EQUIV expression) since we only pass the
> >> > > +  // MEM down to the alias machinery, so it can't see any insn-level
> >> > > +  // notes.
> >> > > +  for (auto use : insns[i]->uses ())
> >> > > +  if (use->is_reg ()
> >> > > +  && use->includes_address_uses ()
> >> > > +  && !use->only_occurs_in_notes ())
> >> > > +{
> >> > > +  gcc_checking_assert (addr_use_vec[i].length () < max_mem_regs);
> >> > > +  addr_use_vec[i].quick_push (use);
> >> > 
> >> > ...if possible, I think it would be better to just use safe_push here,
> >> > without the assert.  There'd then be no need to split max_mem_regs out;
> >> > it could just be hard-coded in the addr_use_vec declaration.
> >
> > I hadn't realised at the time that quick_push () already does a
> > gcc_checking_assert to make sure that we don't overflow.  It does:
> >
> > >   template<typename T, typename A>
> > >   inline T *
> > >   vec<T, A, vl_embed>::quick_push (const T &obj)
> > >   {
> > >     gcc_checking_assert (space (1));
> > >     T *slot = &address ()[m_vecpfx.m_num++];
> > >     ::new (static_cast<void*>(slot)) T (obj);
> > >     return slot;
> > >   }
> >
> > (I checked the behaviour by writing a quick selftest in vec.cc, and it
> > indeed aborts as expected with quick_push on overflow for a
> > stack-allocated auto_vec with N = 2.)
> >
> > This means that the assert above is indeed redundant, so I agree that
> > we should be able to drop the assert and drop the max_mem_regs constant,
> > using a literal inside the auto_vec template instead (all while still
> > using quick_push).
> >
> > Does that sound OK to you, or did you have another reason to prefer
> > safe_push?  AIUI the behaviour of safe_push on overflow would be to
> > allocate a new (heap-allocated) vector instead of asserting.
> 
> I just thought it looked odd/unexpected.  Normally the intent of:
> 
>   auto_vec bar;
> 
> is to reserve a sensible amount of stack space for the common c

Re: pair-fusion: Assume alias conflict if common address reg changes [PR116783]

2024-10-18 Thread Alex Coplan
On 11/10/2024 14:30, Richard Biener wrote:
> On Fri, 11 Oct 2024, Richard Sandiford wrote:
> 
> > Alex Coplan  writes:
> > > Hi,
> > >
> > > As the PR shows, pair-fusion was tricking memory_modified_in_insn_p into
> > > returning false when a common base register (in this case, x1) was
> > > > modified between the mem and the store insn.  This led to wrong code as
> > > the accesses really did alias.
> > >
> > > To avoid this sort of problem, this patch avoids invoking RTL alias
> > > > analysis altogether (and assumes an alias conflict) if the two insns to
> > > be compared share a common address register R, and the insns see different
> > > definitions of R (i.e. it was modified in between).
> > >
> > > Bootstrapped/regtested on aarch64-linux-gnu (all languages, both regular
> > > bootstrap and LTO+PGO bootstrap).  OK for trunk?
> > 
> > Sorry for the slow review.  The patch looks good to me, but...

Thanks for the review.  I'd missed that you'd sent this, sorry for not
responding sooner.

> > 
> > > @@ -2544,11 +2624,37 @@ pair_fusion_bb_info::try_fuse_pair (bool load_p, 
> > > unsigned access_size,
> > >  && bitmap_bit_p (&m_tombstone_bitmap, insn->uid ());
> > >};
> > >  
> > > +  // Maximum number of distinct regnos we expect to appear in a single
> > > +  // MEM (and thus in a candidate insn).
> > > +  static constexpr int max_mem_regs = 2;
> > > +  auto_vec addr_use_vec[2];
> > > +  use_array addr_uses[2];
> > > +
> > > +  // Collect the lists of register uses that occur in the candidate MEMs.
> > > +  for (int i = 0; i < 2; i++)
> > > +{
> > > +  // N.B. it's safe for us to ignore uses that only occur in notes
> > > +  // here (e.g. in a REG_EQUIV expression) since we only pass the
> > > +  // MEM down to the alias machinery, so it can't see any insn-level
> > > +  // notes.
> > > +  for (auto use : insns[i]->uses ())
> > > + if (use->is_reg ()
> > > + && use->includes_address_uses ()
> > > + && !use->only_occurs_in_notes ())
> > > +   {
> > > + gcc_checking_assert (addr_use_vec[i].length () < max_mem_regs);
> > > + addr_use_vec[i].quick_push (use);
> > 
> > ...if possible, I think it would be better to just use safe_push here,
> > without the assert.  There'd then be no need to split max_mem_regs out;
> > it could just be hard-coded in the addr_use_vec declaration.

I hadn't realised at the time that quick_push () already does a
gcc_checking_assert to make sure that we don't overflow.  It does:

  template<typename T, typename A>
  inline T *
  vec<T, A, vl_embed>::quick_push (const T &obj)
  {
    gcc_checking_assert (space (1));
    T *slot = &address ()[m_vecpfx.m_num++];
    ::new (static_cast<void*>(slot)) T (obj);
    return slot;
  }

(I checked the behaviour by writing a quick selftest in vec.cc, and it
indeed aborts as expected with quick_push on overflow for a
stack-allocated auto_vec with N = 2.)

This means that the assert above is indeed redundant, so I agree that
we should be able to drop the assert and drop the max_mem_regs constant,
using a literal inside the auto_vec template instead (all while still
using quick_push).

Does that sound OK to you, or did you have another reason to prefer
safe_push?  AIUI the behaviour of safe_push on overflow would be to
allocate a new (heap-allocated) vector instead of asserting.
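
The difference between the two push flavours can be sketched with a toy container.  This is not GCC's vec or auto_vec: small_vec is a hypothetical class that assumes a default-constructible, copyable element type, and it simplifies matters by treating heap storage as always having space.  It only illustrates the contrast discussed above, where quick_push asserts that space is available and safe_push falls back to the heap once the inline capacity is exhausted.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy vector with N inline slots and an optional heap fallback.
template <typename T, std::size_t N>
class small_vec
{
  std::array<T, N> m_inline {};
  std::vector<T> m_heap;
  std::size_t m_len = 0;
  bool m_on_heap = false;

public:
  std::size_t length () const { return m_len; }
  bool on_heap () const { return m_on_heap; }

  // True if NELEMS more elements fit without reallocating.  Simplification:
  // heap storage is treated as always having room.
  bool space (std::size_t nelems) const
  {
    return m_on_heap || m_len + nelems <= N;
  }

  // Push without growing: the caller must guarantee there is space.
  void quick_push (const T &obj)
  {
    assert (space (1));
    if (m_on_heap)
      m_heap.push_back (obj);
    else
      m_inline[m_len] = obj;
    m_len++;
  }

  // Push, moving the contents to the heap first if the inline slots are full.
  void safe_push (const T &obj)
  {
    if (!space (1))
      {
        m_heap.assign (m_inline.begin (), m_inline.begin () + m_len);
        m_on_heap = true;
      }
    quick_push (obj);
  }

  const T &operator[] (std::size_t i) const
  {
    return m_on_heap ? m_heap[i] : m_inline[i];
  }
};
```

With small_vec<int, 2>, two quick_push calls fill the inline storage; a third quick_push would trip the assert, while a safe_push moves the elements to the heap and succeeds.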

> > 
> > Or does that not work for some reason?  I'm getting a sense of deja vu...
> 
> safe_push should work but as I understand the desire is to rely
> on fully on-stack pre-allocated vectors?

Yes, that was indeed the original intent.

Thanks,
Alex

> 
> > If it doesn't work, an alternative would be to use access_array_builder.
> > 
> > OK for trunk and backports if using safe_push works.
> > 
> > Thanks,
> > Richard
> > 
> > > +   }
> > > +  addr_uses[i] = use_array (addr_use_vec[i]);
> > > +}
> > > +
> > 
> > 
> > >store_walker
> > > -forward_store_walker (mem_defs[0], cand_mems[0], insns[1], 
> > > tombstone_p);
> > > +forward_store_walker (mem_defs[0], cand_mems[0], addr_uses[0], 
> > > insns[1],
> > > +   tombstone_p);
> > >  
> > >store_walker
> > > -backward_store_walker (mem_defs[1], cand_mems[1], insns[0], 
> > > tombstone_p);
> > > +backward_store_walker (mem_defs[1], cand_mems[1], addr_uses[1], 
>

Re: [committed] MAINTAINERS: Add myself as pair fusion and aarch64 ldp/stp maintainer

2024-10-18 Thread Alex Coplan
Of course I forgot to actually attach the patch, now attached :)

On 18/10/2024 11:09, Alex Coplan wrote:
> Pushed to trunk.
> 
> ChangeLog:
> 
>   * MAINTAINERS (CPU Port Maintainers): Add myself as aarch64 ldp/stp
>   maintainer.
>   (Various Maintainers): Add myself as pair fusion maintainer.
> 
diff --git a/MAINTAINERS b/MAINTAINERS
index 269ac2ea6b4..1074886f441 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -54,6 +54,7 @@ docs, and the testsuite related to that.
 
 CPU Port Maintainers    (CPU alphabetical order)
 
+aarch64 ldp/stp         Alex Coplan
 aarch64 port            Richard Earnshaw
 aarch64 port            Richard Sandiford
 aarch64 port            Marcus Shawcroft
@@ -251,6 +252,7 @@ AutoFDO                 Eugene Rozenfeld
 reload                  Ulrich Weigand
 RTL optimizers          Eric Botcazou
 instruction combiner    Segher Boessenkool
+pair fusion             Alex Coplan
 auto-vectorizer         Richard Biener
 auto-vectorizer         Zdenek Dvorak
 loop infrastructure     Zdenek Dvorak


[committed] MAINTAINERS: Add myself as pair fusion and aarch64 ldp/stp maintainer

2024-10-18 Thread Alex Coplan
Pushed to trunk.

ChangeLog:

* MAINTAINERS (CPU Port Maintainers): Add myself as aarch64 ldp/stp
maintainer.
(Various Maintainers): Add myself as pair fusion maintainer.



Re: pair-fusion: Assume alias conflict if common address reg changes [PR116783]

2024-10-07 Thread Alex Coplan
On 23/09/2024 11:31, Alex Coplan wrote:
> Hi,
> 
> As the PR shows, pair-fusion was tricking memory_modified_in_insn_p into
> returning false when a common base register (in this case, x1) was
> modified between the mem and the store insn.  This led to wrong code as
> the accesses really did alias.
> 
> To avoid this sort of problem, this patch avoids invoking RTL alias
> analysis altogether (and assumes an alias conflict) if the two insns to
> be compared share a common address register R, and the insns see different
> definitions of R (i.e. it was modified in between).
> 
> Bootstrapped/regtested on aarch64-linux-gnu (all languages, both regular
> bootstrap and LTO+PGO bootstrap).  OK for trunk?

Ping:
https://gcc.gnu.org/pipermail/gcc-patches/2024-September/663600.html

I realise it was bad timing on my part sending this just after
Richard (S) went away for a week, sorry about that!

Alex

> 
> Thanks,
> Alex
> 
> gcc/ChangeLog:
> 
>   PR rtl-optimization/116783
>   * pair-fusion.cc (def_walker::cand_addr_uses): New.
>   (def_walker::def_walker): Add parameter for candidate address
>   uses.
>   (def_walker::alias_conflict_p): Declare.
>   (def_walker::addr_reg_conflict_p): New.
>   (def_walker::conflict_p): New.
>   (store_walker::store_walker): Add parameter for candidate
>   address uses and pass to base ctor.
>   (store_walker::conflict_p): Rename to ...
>   (store_walker::alias_conflict_p): ... this.
>   (load_walker::load_walker): Add parameter for candidate
>   address uses and pass to base ctor.
>   (load_walker::conflict_p): Rename to ...
>   (load_walker::alias_conflict_p): ... this.
>   (pair_fusion_bb_info::try_fuse_pair): Collect address register
>   uses for candidate insns and pass down to alias walkers.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR rtl-optimization/116783
>   * g++.dg/torture/pr116783.C: New test.

> diff --git a/gcc/pair-fusion.cc b/gcc/pair-fusion.cc
> index cb0374f426b..b1ea611bacd 100644
> --- a/gcc/pair-fusion.cc
> +++ b/gcc/pair-fusion.cc
> @@ -2089,11 +2089,80 @@ protected:
>  
>def_iter_t def_iter;
>insn_info *limit;
> -  def_walker (def_info *def, insn_info *limit) :
> -def_iter (def), limit (limit) {}
> +
> +  // Array of register uses from the candidate insn which occur in MEMs.
> +  use_array cand_addr_uses;
> +
> +  def_walker (def_info *def, insn_info *limit, use_array addr_uses) :
> +def_iter (def), limit (limit), cand_addr_uses (addr_uses) {}
>  
>virtual bool iter_valid () const { return *def_iter; }
>  
> +  // Implemented in {load,store}_walker.
> +  virtual bool alias_conflict_p (int &budget) const = 0;
> +
> +  // Return true if the current (walking) INSN () uses a register R inside a
> +  // MEM, where R is also used inside a MEM by the (static) candidate insn, and
> +  // those uses see different definitions of that register.  In this case we
> +  // can't rely on RTL alias analysis, and for now we conservatively assume that
> +  // there is an alias conflict.  See PR116783.
> +  bool addr_reg_conflict_p () const
> +  {
> +use_array curr_insn_uses = insn ()->uses ();
> +auto cand_use_iter = cand_addr_uses.begin ();
> +auto insn_use_iter = curr_insn_uses.begin ();
> +while (cand_use_iter != cand_addr_uses.end ()
> +&& insn_use_iter != curr_insn_uses.end ())
> +  {
> + auto insn_use = *insn_use_iter;
> + auto cand_use = *cand_use_iter;
> + if (insn_use->regno () > cand_use->regno ())
> +   cand_use_iter++;
> + else if (insn_use->regno () < cand_use->regno ())
> +   insn_use_iter++;
> + else
> +   {
> + // As it stands I believe the alias code (memory_modified_in_insn_p)
> + // doesn't look at insn notes such as REG_EQU{IV,AL}, so it should
> + // be safe to skip over uses that only occur in notes.
> + if (insn_use->includes_address_uses ()
> + && !insn_use->only_occurs_in_notes ()
> + && insn_use->def () != cand_use->def ())
> +   {
> + if (dump_file)
> +   {
> + fprintf (dump_file,
> +  "assuming aliasing of cand i%d and i%d:\n"
> +  "-> insns see different defs of common addr reg r%u\n"
> +  "-> ",
> +  cand_use->insn ()->uid (), insn_use->insn ()->uid (),
> +  insn_use->regno ());
> +
> + // 

[PATCH] testsuite: Prevent unrolling of main in LTO test [PR116683]

2024-09-26 Thread Alex Coplan
Hi,

In r15-3585-g9759f6299d9633cabac540e5c893341c708093ac I added a test which
started failing on PowerPC.  The test checks that we unroll exactly one loop
three times with the following:

// { dg-final { scan-ltrans-rtl-dump-times "Unrolled loop 3 times" 1 "loop2_unroll" } }

which passes on most targets.  However, on PowerPC, the loop in main
gets unrolled too, causing the scan-ltrans-rtl-dump-times check to fail
as the statement now appears twice in the dump.  I think the extra
unrolling is due to different unrolling heuristics in the rs6000 port.

This patch therefore explicitly tries to block the unrolling in main with an
appropriate #pragma.

I've checked that the test now passes on power (on cfarm29) and aarch64.  I also
checked that reverting the lto-streamer-{in,out}.cc changes still causes the
test to fail (on aarch64).

OK for trunk?

Thanks,
Alex

gcc/testsuite/ChangeLog:

PR testsuite/116683
* g++.dg/ext/pragma-unroll-lambda-lto.C (main): Add #pragma to
prevent unrolling of the setup loop.
diff --git a/gcc/testsuite/g++.dg/ext/pragma-unroll-lambda-lto.C b/gcc/testsuite/g++.dg/ext/pragma-unroll-lambda-lto.C
index 64cdf90f34d..20cbd2d15cf 100644
--- a/gcc/testsuite/g++.dg/ext/pragma-unroll-lambda-lto.C
+++ b/gcc/testsuite/g++.dg/ext/pragma-unroll-lambda-lto.C
@@ -24,6 +24,7 @@ short *use_find(short *p)
 int main(void)
 {
   short a[1024];
+#pragma GCC unroll 0
   for (int i = 0; i < 1024; i++)
 a[i] = rand ();
 
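
For reference, here is a minimal standalone use of the pragma.  The function is hypothetical and is not the test from the patch.  Per the GCC documentation, the pragma applies to the loop that immediately follows it, and the values 0 and 1 block any unrolling of that loop; the second loop below is left to the compiler's usual heuristics.

```cpp
#include <cassert>

// Fill A[0..N-1] with a simple pattern and return the sum of the elements.
int
fill_and_sum (short *a, int n)
{
  assert (n >= 0);
  // Ask GCC not to unroll the setup loop that follows.
#pragma GCC unroll 0
  for (int i = 0; i < n; i++)
    a[i] = static_cast<short> (i % 7);

  int sum = 0;
  for (int i = 0; i < n; i++)
    sum += a[i];
  return sum;
}
```

The pragma has no effect on the computed result; it only constrains the optimizer, which is why the test in the patch checks the RTL dump rather than program output.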


Re: [PATCH] gdbhooks: Handle references to vec* in VecPrinter

2024-09-23 Thread Alex Coplan
On 30/08/2024 18:11, Alex Coplan wrote:
> Hi,
> 
> vec.h has this method:
> 
>   template<typename T, typename A>
>   inline T *
>   vec_safe_push (vec<T, A, vl_embed> *&v, const T &obj CXX_MEM_STAT_INFO)
> 
> where v is a reference to a pointer to vec.  This matches the regex for
> VecPrinter, so gdbhooks.py attempts to print it but chokes on the reference.
> I see the following:
> 
>   #1  0x02b84b7b in vec_safe_push (v=Traceback (most
>   recent call last):
> File "$SRC/gcc/gcc/gdbhooks.py", line 486, in to_string
>   return '0x%x' % intptr(self.gdbval)
> File "$SRC/gcc/gcc/gdbhooks.py", line 168, in intptr
>   return long(gdbval) if sys.version_info.major == 2 else int(gdbval)
>   gdb.error: Cannot convert value to long.
> 
> This patch makes VecPrinter handle such references by stripping them
> (dereferencing) at the top of the relevant functions.
> 
> I thought about trying to make VecPrinter.{to_string,children} robust
> against non-pointer values (i.e. actual vec structs) as the current
> calls to intptr will fail on those.  However, I then realised that the
> current regex only matches pointer types:
> 
>   pp.add_printer_for_regex(r'vec<(\S+), (\S+), (\S+)> \*',
>'vec',
>VecPrinter)
> 
> That is somewhat at odds with the (pre-existing) code in
> VecPrinter.children which appears to attempt to handle non-pointer
> types.  ISTM either we should drop the handling for non-pointer types
> (since the regex requires a pointer) or (perhaps more usefully) relax
> the regex to allow matching a plain vec<...> struct and fix the member
> functions to handle those properly.
> 
> Any thoughts on that, Dave?  Is the current patch OK as an intermediate
> step (manually tested by verifying both a vec*& and vec* print OK)?

Gentle ping on this.

> 
> Thanks,
> Alex
> 
> gcc/ChangeLog:
> 
>   * gdbhooks.py (strip_ref): New. Use it ...
>   (VecPrinter.to_string): ... here,
>   (VecPrinter.children): ... and here.

> diff --git a/gcc/gdbhooks.py b/gcc/gdbhooks.py
> index 904ee28423a..a91e5fd2a83 100644
> --- a/gcc/gdbhooks.py
> +++ b/gcc/gdbhooks.py
> @@ -472,6 +472,11 @@ def get_vec_kind(val):
>  else:
>  assert False, f"unexpected vec kind {kind}"
>  
> +def strip_ref(gdbval):
> +if gdbval.type.code == gdb.TYPE_CODE_REF:
> +return gdbval.referenced_value ()
> +return gdbval
> +
>  class VecPrinter:
>  #-ex "up" -ex "p bb->preds"
>  def __init__(self, gdbval):
> @@ -483,10 +488,10 @@ class VecPrinter:
>  def to_string (self):
>  # A trivial implementation; prettyprinting the contents is done
>  # by gdb calling the "children" method below.
> -return '0x%x' % intptr(self.gdbval)
> +return '0x%x' % intptr(strip_ref(self.gdbval))
>  
>  def children (self):
> -val = self.gdbval
> +val = strip_ref(self.gdbval)
>  if intptr(val) != 0 and get_vec_kind(val) == VEC_KIND_PTR:
>  val = val['m_vec']
>  



pair-fusion: Assume alias conflict if common address reg changes [PR116783]

2024-09-23 Thread Alex Coplan
Hi,

As the PR shows, pair-fusion was tricking memory_modified_in_insn_p into
returning false when a common base register (in this case, x1) was
modified between the mem and the store insn.  This led to wrong code as
the accesses really did alias.

To avoid this sort of problem, this patch avoids invoking RTL alias
analysis altogether (and assumes an alias conflict) if the two insns to
be compared share a common address register R, and the insns see different
definitions of R (i.e. it was modified in between).
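
The check described above can be sketched in Python as a merge-style walk
over two use lists sorted by register number.  This is only an
illustration under stated assumptions: the Use class, its field names and
addr_reg_conflict are hypothetical stand-ins for the rtl-ssa types, each
list is assumed to hold at most one use per register, and both iterators
are assumed to advance after a non-conflicting match.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Use:
    # Hypothetical stand-in for an rtl-ssa use_info.
    regno: int                   # register number
    def_id: int                  # identifies the definition this use sees
    in_address: bool = True      # register appears inside a MEM address
    only_in_notes: bool = False  # use occurs only in insn notes


def addr_reg_conflict(cand_addr_uses, insn_uses):
    """Return the first common address register whose uses see different
    definitions, or None.  Both lists must be sorted by regno."""
    i = j = 0
    while i < len(cand_addr_uses) and j < len(insn_uses):
        cand, use = cand_addr_uses[i], insn_uses[j]
        if use.regno > cand.regno:
            i += 1
        elif use.regno < cand.regno:
            j += 1
        else:
            # Same register: conflict only if the walking insn uses it in
            # an address (not just in notes) and sees a different def.
            if (use.in_address and not use.only_in_notes
                    and use.def_id != cand.def_id):
                return use.regno
            i += 1
            j += 1
    return None


cand = [Use(regno=1, def_id=10), Use(regno=2, def_id=20)]
same_defs = [Use(regno=1, def_id=10), Use(regno=3, def_id=30)]
redefined = [Use(regno=0, def_id=5), Use(regno=1, def_id=11)]
print(addr_reg_conflict(cand, same_defs))
print(addr_reg_conflict(cand, redefined))
```

When the function returns a register number, the caller would treat the
pair as aliasing without consulting RTL alias analysis at all.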

Bootstrapped/regtested on aarch64-linux-gnu (all languages, both regular
bootstrap and LTO+PGO bootstrap).  OK for trunk?

Thanks,
Alex

gcc/ChangeLog:

PR rtl-optimization/116783
* pair-fusion.cc (def_walker::cand_addr_uses): New.
(def_walker::def_walker): Add parameter for candidate address
uses.
(def_walker::alias_conflict_p): Declare.
(def_walker::addr_reg_conflict_p): New.
(def_walker::conflict_p): New.
(store_walker::store_walker): Add parameter for candidate
address uses and pass to base ctor.
(store_walker::conflict_p): Rename to ...
(store_walker::alias_conflict_p): ... this.
(load_walker::load_walker): Add parameter for candidate
address uses and pass to base ctor.
(load_walker::conflict_p): Rename to ...
(load_walker::alias_conflict_p): ... this.
(pair_fusion_bb_info::try_fuse_pair): Collect address register
uses for candidate insns and pass down to alias walkers.

gcc/testsuite/ChangeLog:

PR rtl-optimization/116783
* g++.dg/torture/pr116783.C: New test.
diff --git a/gcc/pair-fusion.cc b/gcc/pair-fusion.cc
index cb0374f426b..b1ea611bacd 100644
--- a/gcc/pair-fusion.cc
+++ b/gcc/pair-fusion.cc
@@ -2089,11 +2089,80 @@ protected:
 
   def_iter_t def_iter;
   insn_info *limit;
-  def_walker (def_info *def, insn_info *limit) :
-def_iter (def), limit (limit) {}
+
+  // Array of register uses from the candidate insn which occur in MEMs.
+  use_array cand_addr_uses;
+
+  def_walker (def_info *def, insn_info *limit, use_array addr_uses) :
+def_iter (def), limit (limit), cand_addr_uses (addr_uses) {}
 
   virtual bool iter_valid () const { return *def_iter; }
 
+  // Implemented in {load,store}_walker.
+  virtual bool alias_conflict_p (int &budget) const = 0;
+
+  // Return true if the current (walking) INSN () uses a register R inside a
+  // MEM, where R is also used inside a MEM by the (static) candidate insn, and
+  // those uses see different definitions of that register.  In this case we
+  // can't rely on RTL alias analysis, and for now we conservatively assume that
+  // there is an alias conflict.  See PR116783.
+  bool addr_reg_conflict_p () const
+  {
+use_array curr_insn_uses = insn ()->uses ();
+auto cand_use_iter = cand_addr_uses.begin ();
+auto insn_use_iter = curr_insn_uses.begin ();
+while (cand_use_iter != cand_addr_uses.end ()
+  && insn_use_iter != curr_insn_uses.end ())
+  {
+   auto insn_use = *insn_use_iter;
+   auto cand_use = *cand_use_iter;
+   if (insn_use->regno () > cand_use->regno ())
+ cand_use_iter++;
+   else if (insn_use->regno () < cand_use->regno ())
+ insn_use_iter++;
+   else
+ {
+   // As it stands I believe the alias code (memory_modified_in_insn_p)
+   // doesn't look at insn notes such as REG_EQU{IV,AL}, so it should
+   // be safe to skip over uses that only occur in notes.
+   if (insn_use->includes_address_uses ()
+   && !insn_use->only_occurs_in_notes ()
+   && insn_use->def () != cand_use->def ())
+ {
+   if (dump_file)
+ {
+   fprintf (dump_file,
+"assuming aliasing of cand i%d and i%d:\n"
+"-> insns see different defs of common addr reg r%u\n"
+"-> ",
+cand_use->insn ()->uid (), insn_use->insn ()->uid (),
+insn_use->regno ());
+
+   // Note that while the following sequence could be made more
+   // concise by eliding pp_string calls into the pp_printf
+   // calls, doing so triggers -Wformat-diag.
+   pretty_printer pp;
+   pp_string (&pp, "[");
+   pp_access (&pp, cand_use, 0);
+   pp_string (&pp, "] in ");
+   pp_printf (&pp, "i%d", cand_use->insn ()->uid ());
+   pp_string (&pp, " vs [");
+   pp_access (&pp, insn_use, 0);
+   pp_string (&pp, "] in ");
+   pp_printf (&pp, "i%d", insn_use->insn ()->uid ());
+   fprintf (dump_file, "%s\n", pp_formatted_text (&pp));
+ }
+   return true;
+ }
+
+   cand_use_iter++;
+ 

Re: [PATCH v3] c++: Ensure ANNOTATE_EXPRs remain outermost expressions in conditions [PR116140]

2024-09-11 Thread Alex Coplan
On 10/09/2024 10:29, Jason Merrill wrote:
> On 9/10/24 6:10 AM, Alex Coplan wrote:
> > On 27/08/2024 10:55, Alex Coplan wrote:
> > > Hi,
> > > 
> > > This is a v3 that hopefully addresses the feedback from both Jason and
> > > Jakub.  v2 was posted here:
> > > https://gcc.gnu.org/pipermail/gcc-patches/2024-August/660191.html
> > 
> > Gentle ping on this C++ patch:
> > https://gcc.gnu.org/pipermail/gcc-patches/2024-August/661559.html
> > 
> > Jason, are you OK with this approach, or would you prefer to not make the
> > INTEGER_CST assumption and do something along the lines of your last suggestion
> > instead:
> > 
> > > Perhaps we want a recompute_expr_flags like the existing
> > > recompute_constructor_flags, so we don't need to duplicate PROCESS_ARG
> > > logic elsewhere.
> > 
> > ?  Sorry, I'd missed that reply when I wrote the v3 patch.
> 
> I still think that function would be nice to have, but the patch is OK as
> is.

Thanks, I've pushed the patch and the rest of the series as:

3fd07d4f04f libstdc++: Restore unrolling in std::find using pragma [PR116140]
9759f6299d9 lto: Stream has_unroll flag during LTO [PR116140]
31ff173c708 testsuite: Ensure ltrans dump files get cleaned up properly [PR116140]
f97d86242b8 c++: Ensure ANNOTATE_EXPRs remain outermost expressions in conditions [PR116140]

Alex

> 
> > Thanks,
> > Alex
> > 
> > > 
> > > (Sorry for the delay in posting the re-spin, I was away last week.)
> > > 
> > > In this version we refactor to introduce a helper class (annotate_saver)
> > > which is much less invasive to the caller (maybe_convert_cond) and
> > > should (at least in theory) be reusable elsewhere.
> > > 
> > > This version also relies on the assumption that operands 1 and 2 of
> > > ANNOTATE_EXPRs are INTEGER_CSTs, which simplifies the flag updates
> > > without having to rely on assumptions about the specific changes made
> > > in maybe_convert_cond.
> > > 
> > > Bootstrapped/regtested on aarch64-linux-gnu, OK for trunk?
> > > 
> > > Thanks,
> > > Alex
> > > 
> > > -- >8 --
> > > 
> > > For the testcase added with this patch, we would end up losing the:
> > > 
> > >#pragma GCC unroll 4
> > > 
> > > and emitting "warning: ignoring loop annotation".  That warning comes
> > > from tree-cfg.cc:replace_loop_annotate, and means that we failed to
> > > process the ANNOTATE_EXPR in tree-cfg.cc:replace_loop_annotate_in_block.
> > > That function walks backwards over the GIMPLE in an exiting BB for a
> > > loop, skipping over the final gcond, and looks for any ANNOTATE_EXPRS
> > > immediately preceding the gcond.
> > > 
> > > The function documents the following pre-condition:
> > > 
> > > /* [...] We assume that the annotations come immediately before the
> > >condition in BB, if any.  */
> > > 
> > > now looking at the exiting BB of the loop, we have:
> > > 
> > > :
> > >D.4524 = .ANNOTATE (iftmp.1, 1, 4);
> > >retval.0 = D.4524;
> > >if (retval.0 != 0)
> > >  goto ; [INV]
> > >else
> > >  goto ; [INV]
> > > 
> > > and crucially there is an intervening assignment between the gcond and
> > > the preceding .ANNOTATE ifn call.  To see where this comes from, we can
> > > look to the IR given by -fdump-tree-original:
> > > 
> > >if (< int*)operator() (&pred, *first), unroll 4>>>)
> > >  goto ;
> > >else
> > >  goto ;
> > > 
> > > here the problem is that we've wrapped a CLEANUP_POINT_EXPR around the
> > > ANNOTATE_EXPR, meaning the ANNOTATE_EXPR is no longer the outermost
> > > expression in the condition.
> > > 
> > > The CLEANUP_POINT_EXPR gets added by the following call chain:
> > > 
> > > finish_while_stmt_cond
> > >   -> maybe_convert_cond
> > >   -> condition_conversion
> > >   -> fold_build_cleanup_point_expr
> > > 
> > > this patch chooses to fix the issue by first introducing a new helper
> > > class (annotate_saver) to save and restore outer chains of
> > > ANNOTATE_EXPRs and then using it in maybe_convert_cond.
> > > 
> > > With this patch, we don't get any such warning and the loop gets unrolled as
> > > expected at 

Re: [PATCH v3] c++: Ensure ANNOTATE_EXPRs remain outermost expressions in conditions [PR116140]

2024-09-10 Thread Alex Coplan
On 27/08/2024 10:55, Alex Coplan wrote:
> Hi,
> 
> This is a v3 that hopefully addresses the feedback from both Jason and
> Jakub.  v2 was posted here:
> https://gcc.gnu.org/pipermail/gcc-patches/2024-August/660191.html

Gentle ping on this C++ patch:
https://gcc.gnu.org/pipermail/gcc-patches/2024-August/661559.html

Jason, are you OK with this approach, or would you prefer to not make the
INTEGER_CST assumption and do something along the lines of your last suggestion
instead:

> Perhaps we want a recompute_expr_flags like the existing 
> recompute_constructor_flags, so we don't need to duplicate PROCESS_ARG 
> logic elsewhere.

?  Sorry, I'd missed that reply when I wrote the v3 patch.

Thanks,
Alex

> 
> (Sorry for the delay in posting the re-spin, I was away last week.)
> 
> In this version we refactor to introduce a helper class (annotate_saver)
> which is much less invasive to the caller (maybe_convert_cond) and
> should (at least in theory) be reusable elsewhere.
> 
> This version also relies on the assumption that operands 1 and 2 of
> ANNOTATE_EXPRs are INTEGER_CSTs, which simplifies the flag updates
> without having to rely on assumptions about the specific changes made
> in maybe_convert_cond.
> 
> Bootstrapped/regtested on aarch64-linux-gnu, OK for trunk?
> 
> Thanks,
> Alex
> 
> -- >8 --
> 
> For the testcase added with this patch, we would end up losing the:
> 
>   #pragma GCC unroll 4
> 
> and emitting "warning: ignoring loop annotation".  That warning comes
> from tree-cfg.cc:replace_loop_annotate, and means that we failed to
> process the ANNOTATE_EXPR in tree-cfg.cc:replace_loop_annotate_in_block.
> That function walks backwards over the GIMPLE in an exiting BB for a
> loop, skipping over the final gcond, and looks for any ANNOTATE_EXPRS
> immediately preceding the gcond.
> 
> The function documents the following pre-condition:
> 
>/* [...] We assume that the annotations come immediately before the
>   condition in BB, if any.  */
> 
> now looking at the exiting BB of the loop, we have:
> 
>:
>   D.4524 = .ANNOTATE (iftmp.1, 1, 4);
>   retval.0 = D.4524;
>   if (retval.0 != 0)
> goto ; [INV]
>   else
> goto ; [INV]
> 
> and crucially there is an intervening assignment between the gcond and
> the preceding .ANNOTATE ifn call.  To see where this comes from, we can
> look to the IR given by -fdump-tree-original:
> 
>   if (< int*)operator() (&pred, *first), unroll 4>>>)
> goto ;
>   else
> goto ;
> 
> here the problem is that we've wrapped a CLEANUP_POINT_EXPR around the
> ANNOTATE_EXPR, meaning the ANNOTATE_EXPR is no longer the outermost
> expression in the condition.
> 
> The CLEANUP_POINT_EXPR gets added by the following call chain:
> 
> finish_while_stmt_cond
>  -> maybe_convert_cond
>  -> condition_conversion
>  -> fold_build_cleanup_point_expr
> 
> this patch chooses to fix the issue by first introducing a new helper
> class (annotate_saver) to save and restore outer chains of
> ANNOTATE_EXPRs and then using it in maybe_convert_cond.
> 
> With this patch, we don't get any such warning and the loop gets unrolled as
> expected at -O2.
> 
> gcc/cp/ChangeLog:
> 
> PR libstdc++/116140
> * semantics.cc (annotate_saver): New. Use it ...
> (maybe_convert_cond): ... here, to ensure any ANNOTATE_EXPRs
> remain the outermost expression(s) of the condition.
> 
> gcc/testsuite/ChangeLog:
> 
> PR libstdc++/116140
> * g++.dg/ext/pragma-unroll-lambda.C: New test.

> diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
> index 5ab2076b673..b1a49b14238 100644
> --- a/gcc/cp/semantics.cc
> +++ b/gcc/cp/semantics.cc
> @@ -951,6 +951,86 @@ maybe_warn_unparenthesized_assignment (tree t, bool nested_p,
>  }
>  }
>  
> +/* Helper class for saving/restoring ANNOTATE_EXPRs.  For a tree node t, users
> +   can construct one of these like so:
> +
> + annotate_saver s (&t);
> +
> +   and t will be updated to have any annotations removed.  The user can then
> +   transform t, and later restore the ANNOTATE_EXPRs with:
> +
> + t = s.restore (t).
> +
> +   The intent is to ensure that any ANNOTATE_EXPRs remain the outermost
> +   expressions following any operations on t.  */
> +
> +class annotate_saver {
> +  /* The chain of saved annotations, if there were any.  Otherwise null.  */
> +  tree m_annotations;
> +
> +  /* If M_ANNOTATIONS is non-null, then M_INNER points to TREE_OPERAND (A, 0)
> + for the innermost annotation A.  */
> +  tree *m_inner;
> +
> +public:
> +  annotate_s

[PATCH] gdbhooks: Handle references to vec* in VecPrinter

2024-08-30 Thread Alex Coplan
Hi,

vec.h has this method:

  template<typename T, typename A>
  inline T *
  vec_safe_push (vec<T, A, vl_embed> *&v, const T &obj CXX_MEM_STAT_INFO)

where v is a reference to a pointer to vec.  This matches the regex for
VecPrinter, so gdbhooks.py attempts to print it but chokes on the reference.
I see the following:

  #1  0x02b84b7b in vec_safe_push (v=Traceback (most recent call last):
File "$SRC/gcc/gcc/gdbhooks.py", line 486, in to_string
  return '0x%x' % intptr(self.gdbval)
File "$SRC/gcc/gcc/gdbhooks.py", line 168, in intptr
  return long(gdbval) if sys.version_info.major == 2 else int(gdbval)
  gdb.error: Cannot convert value to long.

This patch makes VecPrinter handle such references by stripping them
(dereferencing) at the top of the relevant functions.
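
To see the effect outside of GDB, here is a minimal sketch that uses
hypothetical stand-ins for the gdb module: FakeValue and the two
TYPE_CODE_* constants below are assumptions made purely for
illustration, and only strip_ref mirrors the helper added by the patch.

```python
from types import SimpleNamespace

# Stand-ins for gdb.TYPE_CODE_PTR / gdb.TYPE_CODE_REF (hypothetical values).
TYPE_CODE_PTR = "ptr"
TYPE_CODE_REF = "ref"


class FakeValue:
    """Hypothetical stand-in for gdb.Value: a pointer or a reference."""

    def __init__(self, code, payload):
        self.type = SimpleNamespace(code=code)
        self.payload = payload  # an address for pointers, a FakeValue for refs

    def referenced_value(self):
        return self.payload

    def __int__(self):
        # Mimic the failure in the traceback for non-pointer values.
        if self.type.code != TYPE_CODE_PTR:
            raise TypeError("Cannot convert value to long.")
        return self.payload


def strip_ref(val):
    # Same shape as the helper in the patch: look through one reference.
    if val.type.code == TYPE_CODE_REF:
        return val.referenced_value()
    return val


ptr = FakeValue(TYPE_CODE_PTR, 0x1234)
ref = FakeValue(TYPE_CODE_REF, ptr)
print('0x%x' % int(strip_ref(ref)))
```

Calling int() on the reference directly raises in this model, which is
the analogue of the gdb.error shown above; going through strip_ref first
yields the underlying pointer.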

I thought about trying to make VecPrinter.{to_string,children} robust
against non-pointer values (i.e. actual vec structs) as the current
calls to intptr will fail on those.  However, I then realised that the
current regex only matches pointer types:

  pp.add_printer_for_regex(r'vec<(\S+), (\S+), (\S+)> \*',
   'vec',
   VecPrinter)

That is somewhat at odds with the (pre-existing) code in
VecPrinter.children which appears to attempt to handle non-pointer
types.  ISTM either we should drop the handling for non-pointer types
(since the regex requires a pointer) or (perhaps more usefully) relax
the regex to allow matching a plain vec<...> struct and fix the member
functions to handle those properly.
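
The regex behaviour can be checked in isolation.  The sketch below
assumes the pattern is applied with re.match to the printed type name
(I believe that is what gdbhooks.py does, but treat it as an
assumption); the type-name strings and the handled_by_vec_printer helper
are hypothetical examples.

```python
import re

# The pattern registered for VecPrinter, copied from the call above.
VEC_REGEX = re.compile(r'vec<(\S+), (\S+), (\S+)> \*')


def handled_by_vec_printer(type_name):
    # re.match anchors at the start only, so trailing text such as the
    # "&" of a reference-to-pointer type does not prevent a match.
    return VEC_REGEX.match(type_name) is not None


for name in ("vec<tree_node*, va_gc, vl_embed> *",
             "vec<tree_node*, va_gc, vl_embed> *&",
             "vec<tree_node*, va_gc, vl_embed>"):
    print(name, handled_by_vec_printer(name))
```

Under that assumption, the pointer and the reference-to-pointer names are
both routed to VecPrinter, while a plain vec<...> struct is not, which is
why the non-pointer handling in VecPrinter.children looks unreachable.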

Any thoughts on that, Dave?  Is the current patch OK as an intermediate
step (manually tested by verifying both a vec*& and vec* print OK)?

Thanks,
Alex

gcc/ChangeLog:

* gdbhooks.py (strip_ref): New. Use it ...
(VecPrinter.to_string): ... here,
(VecPrinter.children): ... and here.
diff --git a/gcc/gdbhooks.py b/gcc/gdbhooks.py
index 904ee28423a..a91e5fd2a83 100644
--- a/gcc/gdbhooks.py
+++ b/gcc/gdbhooks.py
@@ -472,6 +472,11 @@ def get_vec_kind(val):
 else:
 assert False, f"unexpected vec kind {kind}"
 
+def strip_ref(gdbval):
+if gdbval.type.code == gdb.TYPE_CODE_REF:
+return gdbval.referenced_value ()
+return gdbval
+
 class VecPrinter:
 #-ex "up" -ex "p bb->preds"
 def __init__(self, gdbval):
@@ -483,10 +488,10 @@ class VecPrinter:
 def to_string (self):
 # A trivial implementation; prettyprinting the contents is done
 # by gdb calling the "children" method below.
-return '0x%x' % intptr(self.gdbval)
+return '0x%x' % intptr(strip_ref(self.gdbval))
 
 def children (self):
-val = self.gdbval
+val = strip_ref(self.gdbval)
 if intptr(val) != 0 and get_vec_kind(val) == VEC_KIND_PTR:
 val = val['m_vec']
 


Re: [PATCH] gdbhooks: Fix printing of vec with vl_ptr layout

2024-08-30 Thread Alex Coplan
On 30/08/2024 10:12, David Malcolm wrote:
> On Fri, 2024-08-30 at 12:08 +0100, Alex Coplan wrote:
> > Hi,
> > 
> > As it stands, the pretty printing of GCC's vecs by gdbhooks.py only
> > handles vectors with vl_embed layout.  As such, when encountering a vec
> > with vl_ptr layout, GDB would print a diagnostic like:
> > 
> >   gdb.error: There is no member or method named m_vecpfx.
> > 
> > when (e.g.) any such vec occurred in a backtrace.  This patch extends
> > VecPrinter.children to also handle vl_ptr vectors.
> > 
> > Manually tested by verifying that vl_embed vectors still print correctly
> > and empty vl_ptr vectors no longer trigger errors.
> > 
> > OK for trunk? 
> 
> Thanks for fixing this.
> 
> +else:
> +assert False, f"unxpected vec kind {kind}"
> 
> Typo: "unxpected" -> "unexpected"
> 
> Otherwise, looks good for trunk (with a ChangeLog).

Thanks for the review, I've pushed this to trunk with the typo fixed and
a suitable ChangeLog (as g:5020f8ea80af90d8a08eff9fdef3276056df98f5).

Not entirely sure what happened to the ChangeLog as I remember writing
one, but it seems I must have lost it somewhere along the way.  Sorry
about that!

FYI there will likely be at least one follow-on patch as I found another
case where this code trips up on references to vec*.

Thanks,
Alex

> 
> Thanks again
> Dave
> 


[PATCH] gdbhooks: Fix printing of vec with vl_ptr layout

2024-08-30 Thread Alex Coplan
Hi,

As it stands, the pretty printing of GCC's vecs by gdbhooks.py only
handles vectors with vl_embed layout.  As such, when encountering a vec
with vl_ptr layout, GDB would print a diagnostic like:

  gdb.error: There is no member or method named m_vecpfx.

when (e.g.) any such vec occurred in a backtrace.  This patch extends
VecPrinter.children to also handle vl_ptr vectors.

Manually tested by verifying that vl_embed vectors still print correctly
and empty vl_ptr vectors no longer trigger errors.

OK for trunk?

Thanks,
Alex
diff --git a/gcc/gdbhooks.py b/gcc/gdbhooks.py
index 7a64c03b8ac..2c0a074d0b6 100644
--- a/gcc/gdbhooks.py
+++ b/gcc/gdbhooks.py
@@ -453,6 +453,25 @@ class PassPrinter:
 
 ##
 
+VEC_KIND_EMBED = 0
+VEC_KIND_PTR = 1
+
+"""
+Given a vec or pointer to vec, return its layout (embedded or space
+efficient).
+"""
+def get_vec_kind(val):
+typ = val.type
+if typ.code == gdb.TYPE_CODE_PTR:
+typ = typ.target()
+kind = typ.template_argument(2).name
+if kind == "vl_embed":
+return VEC_KIND_EMBED
+elif kind == "vl_ptr":
+return VEC_KIND_PTR
+else:
+assert False, f"unxpected vec kind {kind}"
+
 class VecPrinter:
 #-ex "up" -ex "p bb->preds"
 def __init__(self, gdbval):
@@ -467,11 +486,16 @@ class VecPrinter:
 return '0x%x' % intptr(self.gdbval)
 
 def children (self):
-if intptr(self.gdbval) == 0:
+val = self.gdbval
+if intptr(val) != 0 and get_vec_kind(val) == VEC_KIND_PTR:
+val = val['m_vec']
+
+if intptr(val) == 0:
 return
-m_vecpfx = self.gdbval['m_vecpfx']
+
+assert get_vec_kind(val) == VEC_KIND_EMBED
+m_vecpfx = val['m_vecpfx']
 m_num = m_vecpfx['m_num']
-val = self.gdbval
 typ = val.type
 if typ.code == gdb.TYPE_CODE_PTR:
 typ = typ.target()


[PATCH] testsuite: Rename scanltranstree.exp -> scanltrans.exp

2024-08-29 Thread Alex Coplan
Hi,

Since r15-3254-g3f51f0dc88ec21c1ec79df694200f10ef85915f4
added scan-ltrans-rtl* variants to scanltranstree.exp, it no longer
makes sense to have "tree" in the name.  This renames the file
accordingly and updates users.

Tested on aarch64-linux-gnu, OK for trunk?

Thanks,
Alex

libatomic/ChangeLog:

* testsuite/lib/libatomic.exp: Load scanltrans.exp instead of
scanltranstree.exp.

libgomp/ChangeLog:

* testsuite/lib/libgomp.exp: Load scanltrans.exp instead of
scanltranstree.exp.

libitm/ChangeLog:

* testsuite/lib/libitm.exp: Load scanltrans.exp instead of
scanltranstree.exp.

libphobos/ChangeLog:

* testsuite/lib/libphobos-dg.exp: Load scanltrans.exp instead of
scanltranstree.exp.

libvtv/ChangeLog:

* testsuite/lib/libvtv.exp: Load scanltrans.exp instead of
scanltranstree.exp.

gcc/testsuite/ChangeLog:

* gcc.dg-selftests/dg-final.exp: Load scanltrans.exp instead of
scanltranstree.exp.
* lib/gcc-dg.exp: Likewise.
* lib/scanltranstree.exp: Rename to ...
* lib/scanltrans.exp: ... this.
diff --git a/gcc/testsuite/gcc.dg-selftests/dg-final.exp b/gcc/testsuite/gcc.dg-selftests/dg-final.exp
index 6b6f32e0510..5503b0c0911 100644
--- a/gcc/testsuite/gcc.dg-selftests/dg-final.exp
+++ b/gcc/testsuite/gcc.dg-selftests/dg-final.exp
@@ -23,7 +23,7 @@ load_lib "scanlang.exp"
 load_lib "lto.exp"
 load_lib "scanasm.exp"
 load_lib "scanwpaipa.exp"
-load_lib "scanltranstree.exp"
+load_lib "scanltrans.exp"
 load_lib "scanoffloadtree.exp"
 load_lib "scanoffloadrtl.exp"
 load_lib "gcc-dg.exp"
diff --git a/gcc/testsuite/lib/gcc-dg.exp b/gcc/testsuite/lib/gcc-dg.exp
index 992062103c1..d9513e2859c 100644
--- a/gcc/testsuite/lib/gcc-dg.exp
+++ b/gcc/testsuite/lib/gcc-dg.exp
@@ -21,7 +21,7 @@ load_lib target-supports-dg.exp
 load_lib scanasm.exp
 load_lib scanrtl.exp
 load_lib scantree.exp
-load_lib scanltranstree.exp
+load_lib scanltrans.exp
 load_lib scanipa.exp
 load_lib scanwpaipa.exp
 load_lib scanlang.exp
diff --git a/gcc/testsuite/lib/scanltranstree.exp b/gcc/testsuite/lib/scanltrans.exp
similarity index 100%
rename from gcc/testsuite/lib/scanltranstree.exp
rename to gcc/testsuite/lib/scanltrans.exp
diff --git a/libatomic/testsuite/lib/libatomic.exp b/libatomic/testsuite/lib/libatomic.exp
index ed6ba806732..642530557f7 100644
--- a/libatomic/testsuite/lib/libatomic.exp
+++ b/libatomic/testsuite/lib/libatomic.exp
@@ -38,7 +38,7 @@ load_gcc_lib scanlang.exp
 load_gcc_lib scanrtl.exp
 load_gcc_lib scansarif.exp
 load_gcc_lib scantree.exp
-load_gcc_lib scanltranstree.exp
+load_gcc_lib scanltrans.exp
 load_gcc_lib scanipa.exp
 load_gcc_lib scanwpaipa.exp
 load_gcc_lib multiline.exp
diff --git a/libgomp/testsuite/lib/libgomp.exp b/libgomp/testsuite/lib/libgomp.exp
index 7c109262916..2d0339b5e56 100644
--- a/libgomp/testsuite/lib/libgomp.exp
+++ b/libgomp/testsuite/lib/libgomp.exp
@@ -31,7 +31,7 @@ load_gcc_lib scanlang.exp
 load_gcc_lib scanrtl.exp
 load_gcc_lib scansarif.exp
 load_gcc_lib scantree.exp
-load_gcc_lib scanltranstree.exp
+load_gcc_lib scanltrans.exp
 load_gcc_lib scanoffload.exp
 load_gcc_lib scanoffloadipa.exp
 load_gcc_lib scanoffloadtree.exp
diff --git a/libitm/testsuite/lib/libitm.exp b/libitm/testsuite/lib/libitm.exp
index 3e60797c3e3..0182234a24a 100644
--- a/libitm/testsuite/lib/libitm.exp
+++ b/libitm/testsuite/lib/libitm.exp
@@ -44,7 +44,7 @@ load_gcc_lib scanlang.exp
 load_gcc_lib scanrtl.exp
 load_gcc_lib scansarif.exp
 load_gcc_lib scantree.exp
-load_gcc_lib scanltranstree.exp
+load_gcc_lib scanltrans.exp
 load_gcc_lib scanipa.exp
 load_gcc_lib scanwpaipa.exp
 load_gcc_lib timeout-dg.exp
diff --git a/libphobos/testsuite/lib/libphobos-dg.exp b/libphobos/testsuite/lib/libphobos-dg.exp
index 965ff025a04..90bc02ef5e5 100644
--- a/libphobos/testsuite/lib/libphobos-dg.exp
+++ b/libphobos/testsuite/lib/libphobos-dg.exp
@@ -17,7 +17,7 @@
 load_gcc_lib multiline.exp
 load_gcc_lib prune.exp
 load_gcc_lib scandump.exp
-load_gcc_lib scanltranstree.exp
+load_gcc_lib scanltrans.exp
 load_gcc_lib scanwpaipa.exp
 load_gcc_lib file-format.exp
 load_gcc_lib scanasm.exp
diff --git a/libvtv/testsuite/lib/libvtv.exp b/libvtv/testsuite/lib/libvtv.exp
index bfd03d7d258..788a207e948 100644
--- a/libvtv/testsuite/lib/libvtv.exp
+++ b/libvtv/testsuite/lib/libvtv.exp
@@ -42,7 +42,7 @@ load_gcc_lib scanasm.exp
 load_gcc_lib scandump.exp
 load_gcc_lib scanrtl.exp
 load_gcc_lib scantree.exp
-load_gcc_lib scanltranstree.exp
+load_gcc_lib scanltrans.exp
 load_gcc_lib scanipa.exp
 load_gcc_lib scanwpaipa.exp
 load_gcc_lib timeout-dg.exp


Re: [PATCH v2 2/5] testsuite: Add scan-ltrans-rtl* for use in dg-final [PR116140]

2024-08-29 Thread Alex Coplan
On 28/08/2024 18:34, Andrew Pinski wrote:
> On Wed, Aug 28, 2024 at 4:05 AM Alex Coplan  wrote:
> >
> > On 28/08/2024 11:53, Richard Sandiford wrote:
> > > Alex Coplan  writes:
> > > > Hi,
> > > >
> > > > This is a v2 of:
> > > > https://gcc.gnu.org/pipermail/gcc-patches/2024-August/659966.html
> > > > which is rebased on top of Richard S's patch to reduce the cut-and-paste in
> > > > scanltranstree.exp (thanks again for doing that).
> > > >
> > > > Tested on aarch64-linux-gnu, OK for trunk?
> > > >
> > > > Thanks,
> > > > Alex
> > > >
> > > > -- >8 --
> > > >
> > > > This extends the scan-ltrans-tree* helpers to create RTL variants.  This
> > > > is needed to check the behaviour of an RTL pass under LTO.
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > PR libstdc++/116140
> > > > * doc/sourcebuild.texi: Document ltrans-rtl value of kind for
> > > > scan-<kind>-dump*.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > > PR libstdc++/116140
> > > > * lib/scanltranstree.exp (scan-ltrans-rtl-dump): New.
> > > > (scan-ltrans-rtl-dump-not): New.
> > > > (scan-ltrans-rtl-dump-dem): New.
> > > > (scan-ltrans-rtl-dump-dem-not): New.
> > > > (scan-ltrans-rtl-dump-times): New.
> > >
> > > The patch only contains the gcc/testsuite changes, but those are ok
> > > for trunk, thanks.
> >
> > Gah, sorry -- those got lost in the rebase.  Is it OK to commit this
> > together with the doc changes included as per the previous patch?
> 
> I am getting a new ERROR after this. Maybe you didn't notice this
> since you were looking for new FAIL.
> +ERROR: gcc.dg/ipa/ipa-icf-38.c: error executing dg-final: variable is
> not assigned by any conversion specifiers
> 
> The error corresponds to:
> /* { dg-final { scan-ltrans-tree-dump "Function foo" "optimized" } } */
> 
> That is the only testcase which uses `scan-ltrans-tree-dump*` even.

This should now be fixed as of
g:4b729d2ff3259e5b1d40f93d4f9e7edf5f0064f4.

Thanks,
Alex

> 
> Thanks,
> Andrew Pinski
> 
> 
> 
> >
> > Alex
> >
> > >
> > > Richard
> > >
> > > > ---
> > > >  gcc/testsuite/lib/scanltranstree.exp | 80 +---
> > > >  1 file changed, 37 insertions(+), 43 deletions(-)
> > > >
> > > > diff --git a/gcc/testsuite/lib/scanltranstree.exp b/gcc/testsuite/lib/scanltranstree.exp
> > > > index bc6e02dc369..a7d4de3765f 100644
> > > > --- a/gcc/testsuite/lib/scanltranstree.exp
> > > > +++ b/gcc/testsuite/lib/scanltranstree.exp
> > > > @@ -19,50 +19,44 @@
> > > >
> > > >  load_lib scandump.exp
> > > >
> > > > -# The first item in the list is an LTO equivalent of the second item
> > > > -# in the list; see the documentation of the second item for details.
> > > > -foreach { name scan type suffix } {
> > > > -scan-ltrans-tree-dump scan-dump ltrans-tree t
> > > > -scan-ltrans-tree-dump-not scan-dump-not ltrans-tree t
> > > > -scan-ltrans-tree-dump-dem scan-dump-dem ltrans-tree t
> > > > -scan-ltrans-tree-dump-dem-not scan-dump-dem-not ltrans-tree t
> > > > -} {
> > > > -eval [string map [list @NAME@ $name \
> > > > -  @SCAN@ $scan \
> > > > -  @TYPE@ $type \
> > > > -  @SUFFIX@ $suffix] {
> > > > -proc @NAME@ { args } {
> > > > -   if { [llength $args] < 2 } {
> > > > -   error "@NAME@: too few arguments"
> > > > -   return
> > > > -   }
> > > > -   if { [llength $args] > 3 } {
> > > > -   error "@NAME@: too many arguments"
> > > > -   return
> > > > +# Define scan-ltrans-{tree,rtl}-dump{,-not,-dem,-dem-not}.  These are LTO
> > > > +# variants of the corresponding functions without -ltrans in the name.
> > > > +foreach ir { tree rtl } {
> > > > +foreach modifier { {} -not -dem -dem-not } {
> > > > +   eval [string map [list @NAME@ scan-ltrans-$ir-dump$modifier \
> > > > +  

[committed] testsuite: Fix up refactored scanltranstree.exp functions [PR116522]

2024-08-29 Thread Alex Coplan
When adding RTL variants of the scan-ltrans-tree* functions in:
r15-3254-g3f51f0dc88ec21c1ec79df694200f10ef85915f4
I messed up the name of the underlying scan function to invoke.  The
code currently attempts to invoke functions named
scan{,-not,-dem,-dem-not} but should instead be invoking
scan-dump{,-not,-dem,-dem-not}.  This patch fixes that.
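
As a rough model of the name generation (not the Tcl itself), the
following sketch enumerates the procs defined by the two nested loops
and the underlying function each one forwards to, before and after the
fix.  The helper name underlying_scan_names and its fixed parameter are
hypothetical.

```python
def underlying_scan_names(fixed=True):
    """Map each generated scan-ltrans-* proc name to the scandump.exp
    function it forwards to.  fixed=False models the broken @SCAN@
    substitution, fixed=True the corrected one."""
    base = "scan-dump" if fixed else "scan"
    table = {}
    for ir in ("tree", "rtl"):
        for modifier in ("", "-not", "-dem", "-dem-not"):
            table[f"scan-ltrans-{ir}-dump{modifier}"] = base + modifier
    return table


broken = underlying_scan_names(fixed=False)
fixed = underlying_scan_names(fixed=True)
for name in fixed:
    print(f"{name}: {broken[name]} -> {fixed[name]}")
```

In the broken mapping every target (scan, scan-not, and so on) names a
function that scandump.exp does not provide, so any test using one of
these directives produces an ERROR rather than a FAIL.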

I missed this in testing because dg-cmp-results.sh (at least by default)
doesn't report new ERRORs.  I'll use a more robust way of comparing
test results in the future.

The problem didn't affect the scan-ltrans-{tree,rtl}-dump-times
functions, and I only spot-checked a test using one of those functions.
Apologies for the breakage.

Tested on aarch64.  No regressions, and verified that the ERROR in
gcc.dg/ipa/ipa-icf-38.c goes away.  Pushing to trunk as obvious.

Alex

gcc/testsuite/ChangeLog:

PR testsuite/116522
* lib/scanltranstree.exp: Fix name of underlying scan function
used for scan-ltrans-{tree,rtl}-dump{,-not,-dem,-dem-not}.
diff --git a/gcc/testsuite/lib/scanltranstree.exp b/gcc/testsuite/lib/scanltranstree.exp
index a7d4de3765f..3d85813ea2f 100644
--- a/gcc/testsuite/lib/scanltranstree.exp
+++ b/gcc/testsuite/lib/scanltranstree.exp
@@ -24,7 +24,7 @@ load_lib scandump.exp
 foreach ir { tree rtl } {
 foreach modifier { {} -not -dem -dem-not } {
eval [string map [list @NAME@ scan-ltrans-$ir-dump$modifier \
-  @SCAN@ scan$modifier \
+  @SCAN@ scan-dump$modifier \
   @TYPE@ ltrans-$ir \
   @SUFFIX@ [string index $ir 0]] {
proc @NAME@ { args } {


Re: [PATCH v2 2/5] testsuite: Add scan-ltrans-rtl* for use in dg-final [PR116140]

2024-08-28 Thread Alex Coplan
On 28/08/2024 11:53, Richard Sandiford wrote:
> Alex Coplan  writes:
> > Hi,
> >
> > This is a v2 of:
> > https://gcc.gnu.org/pipermail/gcc-patches/2024-August/659966.html
> > which is rebased on top of Richard S's patch to reduce the cut-and-paste in
> > scanltranstree.exp (thanks again for doing that).
> >
> > Tested on aarch64-linux-gnu, OK for trunk?
> >
> > Thanks,
> > Alex
> >
> > -- >8 --
> >
> > This extends the scan-ltrans-tree* helpers to create RTL variants.  This
> > is needed to check the behaviour of an RTL pass under LTO.
> >
> > gcc/ChangeLog:
> >
> > PR libstdc++/116140
> > * doc/sourcebuild.texi: Document ltrans-rtl value of kind for
> > scan-<kind>-dump*.
> >
> > gcc/testsuite/ChangeLog:
> >
> > PR libstdc++/116140
> > * lib/scanltranstree.exp (scan-ltrans-rtl-dump): New.
> > (scan-ltrans-rtl-dump-not): New.
> > (scan-ltrans-rtl-dump-dem): New.
> > (scan-ltrans-rtl-dump-dem-not): New.
> > (scan-ltrans-rtl-dump-times): New.
> 
> The patch only contains the gcc/testsuite changes, but those are ok
> for trunk, thanks.

Gah, sorry -- those got lost in the rebase.  Is it OK to commit this
together with the doc changes included as per the previous patch?

Alex

> 
> Richard
> 
> > ---
> >  gcc/testsuite/lib/scanltranstree.exp | 80 +---
> >  1 file changed, 37 insertions(+), 43 deletions(-)
> >
> > diff --git a/gcc/testsuite/lib/scanltranstree.exp b/gcc/testsuite/lib/scanltranstree.exp
> > index bc6e02dc369..a7d4de3765f 100644
> > --- a/gcc/testsuite/lib/scanltranstree.exp
> > +++ b/gcc/testsuite/lib/scanltranstree.exp
> > @@ -19,50 +19,44 @@
> >  
> >  load_lib scandump.exp
> >  
> > -# The first item in the list is an LTO equivalent of the second item
> > -# in the list; see the documentation of the second item for details.
> > -foreach { name scan type suffix } {
> > -scan-ltrans-tree-dump scan-dump ltrans-tree t
> > -scan-ltrans-tree-dump-not scan-dump-not ltrans-tree t
> > -scan-ltrans-tree-dump-dem scan-dump-dem ltrans-tree t
> > -scan-ltrans-tree-dump-dem-not scan-dump-dem-not ltrans-tree t
> > -} {
> > -eval [string map [list @NAME@ $name \
> > -  @SCAN@ $scan \
> > -  @TYPE@ $type \
> > -  @SUFFIX@ $suffix] {
> > -proc @NAME@ { args } {
> > -   if { [llength $args] < 2 } {
> > -   error "@NAME@: too few arguments"
> > -   return
> > -   }
> > -   if { [llength $args] > 3 } {
> > -   error "@NAME@: too many arguments"
> > -   return
> > +# Define scan-ltrans-{tree,rtl}-dump{,-not,-dem,-dem-not}.  These are LTO
> > +# variants of the corresponding functions without -ltrans in the name.
> > +foreach ir { tree rtl } {
> > +foreach modifier { {} -not -dem -dem-not } {
> > +   eval [string map [list @NAME@ scan-ltrans-$ir-dump$modifier \
> > +  @SCAN@ scan$modifier \
> > +  @TYPE@ ltrans-$ir \
> > +  @SUFFIX@ [string index $ir 0]] {
> > +   proc @NAME@ { args } {
> > +   if { [llength $args] < 2 } {
> > +   error "@NAME@: too few arguments"
> > +   return
> > +   }
> > +   if { [llength $args] > 3 } {
> > +   error "@NAME@: too many arguments"
> > +   return
> > +   }
> > +   if { [llength $args] >= 3 } {
> > +   @SCAN@ @TYPE@ [lindex $args 0] \
> > +   "\[0-9\]\[0-9\]\[0-9\]@SUFFIX@.[lindex $args 1]" \
> > +   ".ltrans0.ltrans" \
> > +   [lindex $args 2]
> > +   } else {
> > +   @SCAN@ @TYPE@ [lindex $args 0] \
> > +   "\[0-9\]\[0-9\]\[0-9\]@SUFFIX@.[lindex $args 1]" \
> > +   ".ltrans0.ltrans"
> > +   }
> > }
> > -   if { [llength $args] >= 3 } {
> > -   @SCAN@ @TYPE@ [lindex $args 0] \
> > -   "\[0-9\]\[0-9\]\[0-9\]@SUFFIX@.[lindex $args 1]" \
> > -   ".ltrans0.ltrans" \
> > -   [lindex $args 2]
> > -   } else {
> > -   @SCAN@ @TYPE@ [lindex $args 0] \
> > -   "\[0-9\]\[0-9\]\[0-9\]@SUFFIX@.[lindex $

[PATCH v2 2/5] testsuite: Add scan-ltrans-rtl* for use in dg-final [PR116140]

2024-08-28 Thread Alex Coplan
Hi,

This is a v2 of:
https://gcc.gnu.org/pipermail/gcc-patches/2024-August/659966.html
which is rebased on top of Richard S's patch to reduce the cut-and-paste in
scanltranstree.exp (thanks again for doing that).

Tested on aarch64-linux-gnu, OK for trunk?

Thanks,
Alex

-- >8 --

This extends the scan-ltrans-tree* helpers to create RTL variants.  This
is needed to check the behaviour of an RTL pass under LTO.
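
As a rough illustration (not part of the patch), the dg-final directives
that scanltranstree.exp provides after this change can be enumerated as
below.  The real procs are generated in Tcl; ltrans_scanner and
enumerate_ltrans_scanners are names made up for this sketch.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Illustrative only: the real definitions are generated in Tcl by
// gcc/testsuite/lib/scanltranstree.exp.  This just lists what exists.
struct ltrans_scanner {
  std::string name;   // dg-final directive name
  std::string kind;   // "ltrans-tree" or "ltrans-rtl"
  char suffix;        // letter after the pass number in the dump file name
};

std::vector<ltrans_scanner> enumerate_ltrans_scanners ()
{
  std::vector<ltrans_scanner> out;
  for (std::string ir : {"tree", "rtl"})
    for (std::string modifier : {"", "-not", "-dem", "-dem-not", "-times"})
      out.push_back ({"scan-ltrans-" + ir + "-dump" + modifier,
                      "ltrans-" + ir, ir[0]});
  return out;
}
```

The suffix letter mirrors the [string index $ir 0] expression in the
patch: 't' for tree dumps and 'r' for RTL dumps.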

gcc/ChangeLog:

PR libstdc++/116140
* doc/sourcebuild.texi: Document ltrans-rtl value of kind for
scan--dump*.

gcc/testsuite/ChangeLog:

PR libstdc++/116140
* lib/scanltranstree.exp (scan-ltrans-rtl-dump): New.
(scan-ltrans-rtl-dump-not): New.
(scan-ltrans-rtl-dump-dem): New.
(scan-ltrans-rtl-dump-dem-not): New.
(scan-ltrans-rtl-dump-times): New.
---
 gcc/testsuite/lib/scanltranstree.exp | 80 +---
 1 file changed, 37 insertions(+), 43 deletions(-)

diff --git a/gcc/testsuite/lib/scanltranstree.exp b/gcc/testsuite/lib/scanltranstree.exp
index bc6e02dc369..a7d4de3765f 100644
--- a/gcc/testsuite/lib/scanltranstree.exp
+++ b/gcc/testsuite/lib/scanltranstree.exp
@@ -19,50 +19,44 @@
 
 load_lib scandump.exp
 
-# The first item in the list is an LTO equivalent of the second item
-# in the list; see the documentation of the second item for details.
-foreach { name scan type suffix } {
-scan-ltrans-tree-dump scan-dump ltrans-tree t
-scan-ltrans-tree-dump-not scan-dump-not ltrans-tree t
-scan-ltrans-tree-dump-dem scan-dump-dem ltrans-tree t
-scan-ltrans-tree-dump-dem-not scan-dump-dem-not ltrans-tree t
-} {
-eval [string map [list @NAME@ $name \
-			   @SCAN@ $scan \
-			   @TYPE@ $type \
-			   @SUFFIX@ $suffix] {
-proc @NAME@ { args } {
-	if { [llength $args] < 2 } {
-		error "@NAME@: too few arguments"
-		return
-	}
-	if { [llength $args] > 3 } {
-		error "@NAME@: too many arguments"
-		return
+# Define scan-ltrans-{tree,rtl}-dump{,-not,-dem,-dem-not}.  These are LTO
+# variants of the corresponding functions without -ltrans in the name.
+foreach ir { tree rtl } {
+foreach modifier { {} -not -dem -dem-not } {
+	eval [string map [list @NAME@ scan-ltrans-$ir-dump$modifier \
+			   @SCAN@ scan$modifier \
+			   @TYPE@ ltrans-$ir \
+			   @SUFFIX@ [string index $ir 0]] {
+	proc @NAME@ { args } {
+		if { [llength $args] < 2 } {
+		error "@NAME@: too few arguments"
+		return
+		}
+		if { [llength $args] > 3 } {
+		error "@NAME@: too many arguments"
+		return
+		}
+		if { [llength $args] >= 3 } {
+		@SCAN@ @TYPE@ [lindex $args 0] \
+			"\[0-9\]\[0-9\]\[0-9\]@SUFFIX@.[lindex $args 1]" \
+			".ltrans0.ltrans" \
+			[lindex $args 2]
+		} else {
+		@SCAN@ @TYPE@ [lindex $args 0] \
+			"\[0-9\]\[0-9\]\[0-9\]@SUFFIX@.[lindex $args 1]" \
+			".ltrans0.ltrans"
+		}
 	}
-	if { [llength $args] >= 3 } {
-		@SCAN@ @TYPE@ [lindex $args 0] \
-		"\[0-9\]\[0-9\]\[0-9\]@SUFFIX@.[lindex $args 1]" \
-		".ltrans0.ltrans" \
-		[lindex $args 2]
-	} else {
-		@SCAN@ @TYPE@ [lindex $args 0] \
-		"\[0-9\]\[0-9\]\[0-9\]@SUFFIX@.[lindex $args 1]" \
-		".ltrans0.ltrans"
-	}
-}
-}]
+	}]
+}
 }
 
-# The first item in the list is an LTO equivalent of the second item
-# in the list; see the documentation of the second item for details.
-foreach { name scan type suffix } {
-scan-ltrans-tree-dump-times scan-dump-times ltrans-tree t
-} {
-eval [string map [list @NAME@ $name \
-			   @SCAN@ $scan \
-			   @TYPE@ $type \
-			   @SUFFIX@ $suffix] {
+# Define scan-ltrans-{tree,rtl}-dump-times.  These are LTO variants of the
+# corresponding functions without -ltrans in the name.
+foreach ir { tree rtl } {
+eval [string map [list @NAME@ scan-ltrans-$ir-dump-times \
+			   @TYPE@ ltrans-$ir \
+			   @SUFFIX@ [string index $ir 0]] {
 	proc @NAME@ { args } {
 	if { [llength $args] < 3 } {
 		error "@NAME@: too few arguments"
@@ -73,11 +67,11 @@ foreach { name scan type suffix } {
 		return
 	}
 	if { [llength $args] >= 4 } {
-		@SCAN@ "@TYPE@" [lindex $args 0] [lindex $args 1] \
+		scan-dump-times "@TYPE@" [lindex $args 0] [lindex $args 1] \
 		"\[0-9\]\[0-9\]\[0-9\]@SUFFIX@.[lindex $args 2]" \
 		".ltrans0.ltrans" [lindex $args 3]
 	} else {
-		@SCAN@ "@TYPE@" [lindex $args 0] [lindex $args 1] \
+		scan-dump-times "@TYPE@" [lindex $args 0] [lindex $args 1] \
 		"\[0-9\]\[0-9\]\[0-9\]@SUFFIX@.[lindex $args 2]" \
 		".ltrans0.ltrans"
 	}


[PATCH v2 4/5] lto: Stream has_unroll flag during LTO [PR116140]

2024-08-28 Thread Alex Coplan
This is a v2 of:
https://gcc.gnu.org/pipermail/gcc-patches/2024-August/659969.html
this version just streams the flag as requested.

Bootstrapped/tested as a series on aarch64-linux-gnu (both with and
without LTO), OK for trunk?

Thanks,
Alex

-- >8 --

When #pragma GCC unroll is processed in
tree-cfg.cc:replace_loop_annotate_in_block, we set both the loop->unroll
field (which is currently streamed during LTO) and the cfun->has_unroll
flag.

cfun->has_unroll, however, is not currently streamed during LTO.  This
patch fixes that.

Prior to this patch, loops marked with #pragma GCC unroll that would be
unrolled by RTL loop2_unroll in a non-LTO compilation didn't get
unrolled under LTO.
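
To illustrate why both streamer functions need a matching line, here is
a toy model of positional bit-packing.  toy_bitpack, toy_function,
stream_out and stream_in are hypothetical stand-ins, not the GCC bitpack
API; the point is only that the reader must unpack fields in exactly the
order the writer packed them.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a bitpack: fields are written and read strictly in
// sequence, with no names or tags in the stream.
struct toy_bitpack {
  std::vector<bool> bits;
  std::size_t pos = 0;
  void pack (bool v) { bits.push_back (v); }
  bool unpack () { assert (pos < bits.size ()); return bits[pos++]; }
};

// Hypothetical subset of the per-function flags.
struct toy_function {
  bool has_musttail = false;
  bool has_unroll = false;
  bool assume_function = false;
};

void stream_out (toy_bitpack &bp, const toy_function &fn)
{
  bp.pack (fn.has_musttail);
  bp.pack (fn.has_unroll);       // the newly streamed flag
  bp.pack (fn.assume_function);
}

toy_function stream_in (toy_bitpack &bp)
{
  toy_function fn;
  fn.has_musttail = bp.unpack ();
  fn.has_unroll = bp.unpack ();  // same position as in stream_out
  fn.assume_function = bp.unpack ();
  return fn;
}
```

In this model, a flag that is never packed simply keeps its default
value on the reading side.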

gcc/ChangeLog:

PR libstdc++/116140
* lto-streamer-in.cc (input_struct_function_base): Stream in
fn->has_unroll.
* lto-streamer-out.cc (output_struct_function_base): Stream out
fn->has_unroll.

gcc/testsuite/ChangeLog:

PR libstdc++/116140
* g++.dg/ext/pragma-unroll-lambda-lto.C: New test.
---
 gcc/lto-streamer-in.cc|  1 +
 gcc/lto-streamer-out.cc   |  1 +
 .../g++.dg/ext/pragma-unroll-lambda-lto.C | 32 +++
 3 files changed, 34 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/ext/pragma-unroll-lambda-lto.C

diff --git a/gcc/lto-streamer-in.cc b/gcc/lto-streamer-in.cc
index 64f75807328..9d0ec5d589c 100644
--- a/gcc/lto-streamer-in.cc
+++ b/gcc/lto-streamer-in.cc
@@ -1326,6 +1326,7 @@ input_struct_function_base (struct function *fn, class data_in *data_in,
   fn->has_force_vectorize_loops = bp_unpack_value (&bp, 1);
   fn->has_simduid_loops = bp_unpack_value (&bp, 1);
   fn->has_musttail = bp_unpack_value (&bp, 1);
+  fn->has_unroll = bp_unpack_value (&bp, 1);
   fn->assume_function = bp_unpack_value (&bp, 1);
   fn->va_list_fpr_size = bp_unpack_value (&bp, 8);
   fn->va_list_gpr_size = bp_unpack_value (&bp, 8);
diff --git a/gcc/lto-streamer-out.cc b/gcc/lto-streamer-out.cc
index 523d6dad221..52d0d737330 100644
--- a/gcc/lto-streamer-out.cc
+++ b/gcc/lto-streamer-out.cc
@@ -2283,6 +2283,7 @@ output_struct_function_base (struct output_block *ob, struct function *fn)
   bp_pack_value (&bp, fn->has_force_vectorize_loops, 1);
   bp_pack_value (&bp, fn->has_simduid_loops, 1);
   bp_pack_value (&bp, fn->has_musttail, 1);
+  bp_pack_value (&bp, fn->has_unroll, 1);
   bp_pack_value (&bp, fn->assume_function, 1);
   bp_pack_value (&bp, fn->va_list_fpr_size, 8);
   bp_pack_value (&bp, fn->va_list_gpr_size, 8);
diff --git a/gcc/testsuite/g++.dg/ext/pragma-unroll-lambda-lto.C b/gcc/testsuite/g++.dg/ext/pragma-unroll-lambda-lto.C
new file mode 100644
index 000..144c4c32692
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/pragma-unroll-lambda-lto.C
@@ -0,0 +1,32 @@
+// { dg-do link { target c++11 } }
+// { dg-options "-O2 -flto -fdump-rtl-loop2_unroll" }
+
+#include 
+
+template
+inline Iter
+my_find(Iter first, Iter last, Pred pred)
+{
+#pragma GCC unroll 4
+while (first != last && !pred(*first))
+++first;
+return first;
+}
+
+__attribute__((noipa))
+short *use_find(short *p)
+{
+auto pred = [](short x) { return x == 42; };
+return my_find(p, p + 1024, pred);
+}
+
+int main(void)
+{
+  short a[1024];
+  for (int i = 0; i < 1024; i++)
+a[i] = rand ();
+
+  return use_find (a) - a;
+}
+
+// { dg-final { scan-ltrans-rtl-dump-times "Unrolled loop 3 times" 1 "loop2_unroll" } }


Re: [PATCH] testsuite: Reduce cut-&-paste in scanltranstree.exp

2024-08-27 Thread Alex Coplan
On 15/08/2024 13:55, Richard Sandiford wrote:
> scanltranstree.exp defines some LTO wrappers around standard
> non-LTO scanners.  Four of them are cut-&-paste variants of
> one another, so this patch generates them from a single template.
> It also does the same for scan-ltrans-tree-dump-times, so that
> other *-times scanners can be added easily in future.
> 
> The scanners seem to be lightly used.  gcc.dg/ipa/ipa-icf-38.c uses
> scan-ltrans-tree-dump{,-not} and libgomp.c/declare-variant-1.c
> uses scan-ltrans-tree-dump-{not,times}.  Nothing currently seems
> to use scan-ltrans-tree-dump-dem*.
> 
> Tested on the files above so far.  Surprisingly, it worked first time,
> but I tested that deliberately introduced mistakes were flagged.
> (That's my story anyway.)  OK if it passes full testing on
> aarch64-linux-gnu & x86_64-linux-gnu?

Thanks for doing this.  I had the feeling when trying to add the RTL
variants of the scanners that the code needed refactoring, but my Tcl
skills really weren't up to the job.  I learned a lot about Tcl by
trying to make sense of this patch.

I'll try adding the RTL variants on top of this.

Thanks,
Alex

> 
> Richard
> 
> 
> gcc/testsuite/
>   * lib/scanltranstree.exp: Redefine the routines using two
>   templates.
> ---
>  gcc/testsuite/lib/scanltranstree.exp | 186 +--
>  1 file changed, 62 insertions(+), 124 deletions(-)
> 
> diff --git a/gcc/testsuite/lib/scanltranstree.exp b/gcc/testsuite/lib/scanltranstree.exp
> index 79f05f0ffed..bc6e02dc369 100644
> --- a/gcc/testsuite/lib/scanltranstree.exp
> +++ b/gcc/testsuite/lib/scanltranstree.exp
> @@ -19,130 +19,68 @@
>  
>  load_lib scandump.exp
>  
> -# Utility for scanning compiler result, invoked via dg-final.
> -# Call pass if pattern is present, otherwise fail.
> -#
> -# Argument 0 is the regexp to match
> -# Argument 1 is the name of the dumped tree pass
> -# Argument 2 handles expected failures and the like
> -proc scan-ltrans-tree-dump { args } {
> -
> -if { [llength $args] < 2 } {
> - error "scan-ltrans-tree-dump: too few arguments"
> - return
> -}
> -if { [llength $args] > 3 } {
> - error "scan-ltrans-tree-dump: too many arguments"
> - return
> -}
> -if { [llength $args] >= 3 } {
> - scan-dump "ltrans-tree" [lindex $args 0] \
> -   "\[0-9\]\[0-9\]\[0-9\]t.[lindex $args 1]" ".ltrans0.ltrans" \
> -   [lindex $args 2]
> -} else {
> - scan-dump "ltrans-tree" [lindex $args 0] \
> -   "\[0-9\]\[0-9\]\[0-9\]t.[lindex $args 1]" ".ltrans0.ltrans"
> -}
> -}
> -
> -# Call pass if pattern is present given number of times, otherwise fail.
> -# Argument 0 is the regexp to match
> -# Argument 1 is number of times the regexp must be found
> -# Argument 2 is the name of the dumped tree pass
> -# Argument 3 handles expected failures and the like
> -proc scan-ltrans-tree-dump-times { args } {
> -
> -if { [llength $args] < 3 } {
> - error "scan-ltrans-tree-dump-times: too few arguments"
> - return
> -}
> -if { [llength $args] > 4 } {
> - error "scan-ltrans-tree-dump-times: too many arguments"
> - return
> -}
> -if { [llength $args] >= 4 } {
> - scan-dump-times "ltrans-tree" [lindex $args 0] [lindex $args 1] \
> - "\[0-9\]\[0-9\]\[0-9\]t.[lindex $args 2]" \
> - ".ltrans0.ltrans" [lindex $args 3]
> -} else {
> - scan-dump-times "ltrans-tree" [lindex $args 0] [lindex $args 1] \
> - "\[0-9\]\[0-9\]\[0-9\]t.[lindex $args 2]" 
> ".ltrans0.ltrans"
> -}
> +# The first item in the list is an LTO equivalent of the second item
> +# in the list; see the documentation of the second item for details.
> +foreach { name scan type suffix } {
> +scan-ltrans-tree-dump scan-dump ltrans-tree t
> +scan-ltrans-tree-dump-not scan-dump-not ltrans-tree t
> +scan-ltrans-tree-dump-dem scan-dump-dem ltrans-tree t
> +scan-ltrans-tree-dump-dem-not scan-dump-dem-not ltrans-tree t
> +} {
> +eval [string map [list @NAME@ $name \
> +@SCAN@ $scan \
> +@TYPE@ $type \
> +@SUFFIX@ $suffix] {
> +proc @NAME@ { args } {
> + if { [llength $args] < 2 } {
> + error "@NAME@: too few arguments"
> + return
> + }
> + if { [llength $args] > 3 } {
> + error "@NAME@: too many arguments"
> + return
> + }
> + if { [llength $args] >= 3 } {
> + @SCAN@ @TYPE@ [lindex $args 0] \
> + "\[0-9\]\[0-9\]\[0-9\]@SUFFIX@.[lindex $args 1]" \
> + ".ltrans0.ltrans" \
> + [lindex $args 2]
> + } else {
> + @SCAN@ @TYPE@ [lindex $args 0] \
> + "\[0-9\]\[0-9\]\[0-9\]@SUFFIX@.[lindex $args 1]" \
> + ".ltrans0.ltrans"
> + }
> +}
> +}]
>  }
>  
> -# Call pass if patt

[PATCH v3] c++: Ensure ANNOTATE_EXPRs remain outermost expressions in conditions [PR116140]

2024-08-27 Thread Alex Coplan
Hi,

This is a v3 that hopefully addresses the feedback from both Jason and
Jakub.  v2 was posted here:
https://gcc.gnu.org/pipermail/gcc-patches/2024-August/660191.html

(Sorry for the delay in posting the re-spin, I was away last week.)

In this version we refactor to introduce a helper class (annotate_saver)
which is much less invasive to the caller (maybe_convert_cond) and
should (at least in theory) be reusable elsewhere.

This version also relies on the assumption that operands 1 and 2 of
ANNOTATE_EXPRs are INTEGER_CSTs, which simplifies the flag updates
without having to rely on assumptions about the specific changes made
in maybe_convert_cond.

Bootstrapped/regtested on aarch64-linux-gnu, OK for trunk?

Thanks,
Alex

-- >8 --

For the testcase added with this patch, we would end up losing the:

  #pragma GCC unroll 4

and emitting "warning: ignoring loop annotation".  That warning comes
from tree-cfg.cc:replace_loop_annotate, and means that we failed to
process the ANNOTATE_EXPR in tree-cfg.cc:replace_loop_annotate_in_block.
That function walks backwards over the GIMPLE in an exiting BB for a
loop, skipping over the final gcond, and looks for any ANNOTATE_EXPRS
immediately preceding the gcond.

The function documents the following pre-condition:

   /* [...] We assume that the annotations come immediately before the
  condition in BB, if any.  */

now looking at the exiting BB of the loop, we have:

   :
  D.4524 = .ANNOTATE (iftmp.1, 1, 4);
  retval.0 = D.4524;
  if (retval.0 != 0)
goto ; [INV]
  else
goto ; [INV]

and crucially there is an intervening assignment between the gcond and
the preceding .ANNOTATE ifn call.  To see where this comes from, we can
look to the IR given by -fdump-tree-original:

  if (<::operator() (&pred, *first), unroll 4>>>)
goto ;
  else
goto ;

here the problem is that we've wrapped a CLEANUP_POINT_EXPR around the
ANNOTATE_EXPR, meaning the ANNOTATE_EXPR is no longer the outermost
expression in the condition.

The CLEANUP_POINT_EXPR gets added by the following call chain:

finish_while_stmt_cond
 -> maybe_convert_cond
 -> condition_conversion
 -> fold_build_cleanup_point_expr

this patch chooses to fix the issue by first introducing a new helper
class (annotate_saver) to save and restore outer chains of
ANNOTATE_EXPRs and then using it in maybe_convert_cond.

With this patch, we don't get any such warning and the loop gets unrolled as
expected at -O2.
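
A self-contained toy model of the save/restore idea, for illustration
only.  It uses a made-up node type rather than GCC trees, and it leaves
out the type and flag updates that the real annotate_saver::restore
performs.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Toy expression node; not GCC's tree.  "annotate" and "cleanup" nodes
// have one operand, anything else is a leaf.
struct node;
using node_ptr = std::unique_ptr<node>;
struct node {
  std::string code;
  node_ptr op0;
};

node_ptr make (std::string code, node_ptr op0 = nullptr)
{
  return node_ptr (new node{std::move (code), std::move (op0)});
}

// Detach the outer chain of "annotate" nodes from COND on construction;
// restore () re-attaches that chain around a transformed inner expression.
class toy_annotate_saver {
  node_ptr m_annotations;       // outermost saved annotation, or null
  node *m_innermost = nullptr;  // innermost saved annotation, if any
public:
  explicit toy_annotate_saver (node_ptr &cond)
  {
    assert (cond);
    if (cond->code != "annotate")
      return;
    m_annotations = std::move (cond);
    m_innermost = m_annotations.get ();
    while (m_innermost->op0->code == "annotate")
      m_innermost = m_innermost->op0.get ();
    cond = std::move (m_innermost->op0);
  }

  node_ptr restore (node_ptr new_inner)
  {
    if (!m_annotations)
      return new_inner;
    m_innermost->op0 = std::move (new_inner);
    return std::move (m_annotations);
  }
};

// Render an expression as nested codes, e.g. "annotate(cleanup(call))".
std::string show (const node &n)
{
  return n.op0 ? n.code + "(" + show (*n.op0) + ")" : n.code;
}
```

Wrapping the stripped expression in a "cleanup" node and then calling
restore () leaves the annotations outermost, which is the property the
patch is after.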

gcc/cp/ChangeLog:

PR libstdc++/116140
* semantics.cc (annotate_saver): New. Use it ...
(maybe_convert_cond): ... here, to ensure any ANNOTATE_EXPRs
remain the outermost expression(s) of the condition.

gcc/testsuite/ChangeLog:

PR libstdc++/116140
* g++.dg/ext/pragma-unroll-lambda.C: New test.
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 5ab2076b673..b1a49b14238 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -951,6 +951,86 @@ maybe_warn_unparenthesized_assignment (tree t, bool nested_p,
 }
 }
 
+/* Helper class for saving/restoring ANNOTATE_EXPRs.  For a tree node t, users
+   can construct one of these like so:
+
+ annotate_saver s (&t);
+
+   and t will be updated to have any annotations removed.  The user can then
+   transform t, and later restore the ANNOTATE_EXPRs with:
+
+ t = s.restore (t).
+
+   The intent is to ensure that any ANNOTATE_EXPRs remain the outermost
+   expressions following any operations on t.  */
+
+class annotate_saver {
+  /* The chain of saved annotations, if there were any.  Otherwise null.  */
+  tree m_annotations;
+
+  /* If M_ANNOTATIONS is non-null, then M_INNER points to TREE_OPERAND (A, 0)
+ for the innermost annotation A.  */
+  tree *m_inner;
+
+public:
+  annotate_saver (tree *);
+  tree restore (tree);
+};
+
+/* If *COND is an ANNOTATE_EXPR, walk through the chain of annotations, and set
+   *COND equal to the first non-ANNOTATE_EXPR (saving a pointer to the
+   original chain of annotations for later use in restore).  */
+
+annotate_saver::annotate_saver (tree *cond) : m_annotations (nullptr)
+{
+  tree *t = cond;
+  while (TREE_CODE (*t) == ANNOTATE_EXPR)
+t = &TREE_OPERAND (*t, 0);
+
+  if (t != cond)
+{
+  m_annotations = *cond;
+  *cond = *t;
+  m_inner = t;
+}
+}
+
+/* If we didn't strip any annotations on construction, return NEW_INNER
+   unmodified.  Otherwise, wrap the saved annotations around NEW_INNER (updating
+   the types and flags of the annotations if needed) and return the resulting
+   expression.  */
+
+tree
+annotate_saver::restore (tree new_inner)
+{
+  if (!m_annotations)
+return new_inner;
+
+  /* If the type of the inner expression changed, we need to update the types
+ of all the ANNOTATE_EXPRs.  We may need to update the flags too, but we
+ assume they only change if the type of the inner expression changes.
+ The flag update logic assumes that the other operands to the
+ ANNOTATE_EXPRs are alway

Re: [PATCH v2] c++: Ensure ANNOTATE_EXPRs remain outermost expressions in conditions [PR116140]

2024-08-16 Thread Alex Coplan
On 16/08/2024 12:47, Jakub Jelinek wrote:
> On Fri, Aug 16, 2024 at 11:38:03AM +0100, Alex Coplan wrote:
> > Looking at the calls to build3 (ANNOTATE_EXPR, ...) in cp/semantics.cc,
> > it looks like the other two operands of ANNOTATE_EXPRs are only ever
> > INTEGER_CSTs (the code in tree-cfg.cc:replace_loop_annotate_in_block
> > corroborates this).
> 
> As long as we don't add new ANNOTATE_EXPR kinds with non-constant arguments,
> but we don't have them right now.
> 
> > I think an INTEGER_CST C will always have:
> > 
> >   TREE_SIDE_EFFECTS (C) = TREE_READONLY (C) = 0
> 
> That is true.
> > 
> > and since the TREE_READONLY flags are conjunctive and TREE_SIDE_EFFECTS
> > flags are disjunctive, for an ANNOTATE_EXPR A we will necessarily have:
> > 
> >  - TREE_READONLY (A) = 0
> 
> No.  The TREE_READONLY computation is:
> read_only = 1;
> ...
> if (!TREE_READONLY (arg##N) \
> && !CONSTANT_CLASS_P (arg##N))  \
>   (void) (read_only = 0);   \
> While INTEGER_CST isn't TREE_READONLY, it is CONSTANT_CLASS_P.
> 
> >  - TREE_SIDE_EFFECTS (A) = TREE_SIDE_EFFECTS (TREE_OPERAND (A, 0))
> 
> So, unless we add non-INTEGER_CST extra arguments to ANNOTATE_EXPR,
>   TREE_READONLY (A) = TREE_READONLY (TREE_OPERAND (A, 0))
> || CONSTANT_CLASS_P (TREE_OPERAND (A, 0));

Ah, right.  I was going off memory of what we discussed so far and didn't look
at what PROCESS_ARG actually does.  Thanks.
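
For reference, a small standalone model of the flag computation as Jakub
describes it.  toy_operand, toy_flags and combine are invented names;
this is not the build3 code itself.

```cpp
#include <cassert>
#include <initializer_list>

// Toy operand: just the three properties that matter here.
struct toy_operand {
  bool side_effects;
  bool readonly;
  bool constant_class;   // true for constants such as INTEGER_CST
};

struct toy_flags {
  bool side_effects;
  bool readonly;
};

// Side effects are OR-ed together, while read-only survives only if every
// operand is either read-only or a constant.
toy_flags combine (std::initializer_list<toy_operand> ops)
{
  toy_flags f = {false, true};
  for (const toy_operand &op : ops)
    {
      if (op.side_effects)
        f.side_effects = true;
      if (!op.readonly && !op.constant_class)
        f.readonly = false;
    }
  return f;
}
```

With two constant extra operands, both flags of the result are decided
by operand 0 alone, which is what makes the simplification possible.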

In any case, this avoids the need to check the direction in which the
flags change (although it perhaps pushes the problem elsewhere, in that
we now arguably need to check that operands 1 and 2 of each
ANNOTATE_EXPR are INTEGER_CSTs, unless we want to just rely on that
assumption unchecked).

> Not really sure if the first argument will ever be say INTEGER_CST,
> #pragma GCC unroll 8
> while (1)
> {
>   if (something)
> break;
> }
> ?
> 
>   Jakub
> 


Re: [PATCH v2] c++: Ensure ANNOTATE_EXPRs remain outermost expressions in conditions [PR116140]

2024-08-16 Thread Alex Coplan
On 15/08/2024 16:55, Jason Merrill wrote:
> On 8/12/24 1:55 PM, Alex Coplan wrote:
> > Hi!
> > 
> > This is a v2 patch of:
> > https://gcc.gnu.org/pipermail/gcc-patches/2024-August/659968.html
> > that addresses Jakub's feedback.
> > 
> > FWIW, I tried to contrive a testcase where convert_from_reference kicks
> > in and we get called with an ANNOTATE_EXPR in maybe_convert_cond, but to
> > no avail.
> 
> Yes, the convert_from_reference shouldn't have any effect here, that should
> have happened already when processing the condition expression.
> 
> > However, I did see cases (both in hand-written testcases and
> > in the testsuite, e.g. g++.dg/ext/pr114409-2.C) where the subsequent
> > call to condition_conversion would change the type (e.g. from int to
> > bool), which shows the need for updating the types in the ANNOTATE_EXPR
> > chain -- thanks for pointing that out, Jakub!
> > 
> > Personally, I feel the handling of the flags (in this patch, as per
> > Jakub's suggestion) is a bit of a premature optimization.  It seems
> > cleaner (and safer) to me just to re-build the annotations if needed
> > (i.e. in the case that the type changed).  You could even have a nice
> > abstraction that encapsulates the stripping and re-building of
> > ANNOTATE_EXPRs, so that it doesn't clutter the caller quite so much.
> 
> I'm sympathetic that the optimization is not very significant, but neither
> is updating the flags.  You could also factor it out for the same less
> clutter in the caller?

Good point, I'll see if I can't factor things out with the in-place update
approach.

> 
> > +  /* If the type of *CONDP changed (e.g. due to convert_from_reference) then
> 
> As discussed, this is much more likely to be from condition_conversion.
> 
> > +the flags may have changed too.  The logic in the loop below relies on
> > +the flags only being changed in the following directions (if at all):
> > +  TREE_SIDE_EFFECTS : 0 -> 1
> > +  TREE_READONLY : 1 -> 0
> > +thus avoiding re-computing the flags from scratch (e.g. via build3), so
> > +let's verify that this precondition holds.  */
> 
> Is there any case where an ANNOTATE_EXPR can have different
> READONLY/SIDE_EFFECTS flags from its operand?  It would be simpler to just
> copy the flags and not bother with the checking.

Looking at the calls to build3 (ANNOTATE_EXPR, ...) in cp/semantics.cc,
it looks like the other two operands of ANNOTATE_EXPRs are only ever
INTEGER_CSTs (the code in tree-cfg.cc:replace_loop_annotate_in_block
corroborates this).

I think an INTEGER_CST C will always have:

  TREE_SIDE_EFFECTS (C) = TREE_READONLY (C) = 0

and since the TREE_READONLY flags are conjunctive and TREE_SIDE_EFFECTS
flags are disjunctive, for an ANNOTATE_EXPR A we will necessarily have:

 - TREE_READONLY (A) = 0
 - TREE_SIDE_EFFECTS (A) = TREE_SIDE_EFFECTS (TREE_OPERAND (A, 0))

so indeed I think this can be simplified significantly, since the above
means we needn't update TREE_READONLY, and TREE_SIDE_EFFECTS can be set
to that of the updated operand (without the checking).

I'll adjust the patch to account for this and try to factor things out
as suggested above.

Thanks a lot for the review.

Alex

> 
> > +#define CHECK_FLAG_CHANGE(prop, value)\
> > +  gcc_checking_assert (prop (orig_inner) == prop (*condp) || prop (*condp) == value)
> > +  CHECK_FLAG_CHANGE (TREE_SIDE_EFFECTS, 1);
> > +  CHECK_FLAG_CHANGE (TREE_READONLY, 0);
> > +#undef CHECK_FLAG_CHANGE
> > +  for (tree c = cond; c != *condp; c = TREE_OPERAND (c, 0))
> > +   {
> > + gcc_checking_assert (TREE_CODE (c) == ANNOTATE_EXPR);
> > + TREE_TYPE (c) = TREE_TYPE (*condp);
> > + TREE_SIDE_EFFECTS (c) |= TREE_SIDE_EFFECTS (*condp);
> > + TREE_READONLY (c) &= TREE_READONLY (*condp);
> > +   }
> 
> 


[PATCH v2] c++: Ensure ANNOTATE_EXPRs remain outermost expressions in conditions [PR116140]

2024-08-12 Thread Alex Coplan
Hi!

This is a v2 patch of:
https://gcc.gnu.org/pipermail/gcc-patches/2024-August/659968.html
that addresses Jakub's feedback.

FWIW, I tried to contrive a testcase where convert_from_reference kicks
in and we get called with an ANNOTATE_EXPR in maybe_convert_cond, but to
no avail.  However, I did see cases (both in hand-written testcases and
in the testsuite, e.g. g++.dg/ext/pr114409-2.C) where the subsequent
call to condition_conversion would change the type (e.g. from int to
bool), which shows the need for updating the types in the ANNOTATE_EXPR
chain -- thanks for pointing that out, Jakub!

Personally, I feel the handling of the flags (in this patch, as per
Jakub's suggestion) is a bit of a premature optimization.  It seems
cleaner (and safer) to me just to re-build the annotations if needed
(i.e. in the case that the type changed).  You could even have a nice
abstraction that encapsulates the stripping and re-building of
ANNOTATE_EXPRs, so that it doesn't clutter the caller quite so much.

However, I understand the desire to avoid re-allocating here (even if
this should be a fairly uncommon case), hence this patch goes with the
approach suggested by Jakub.  I'm happy to be persuaded to do otherwise
by C++ maintainers :)

Bootstrapped/regtested on aarch64-linux-gnu, OK for trunk?

Thanks,
Alex

-- >8 --

For the testcase added with this patch, we would end up losing the:

  #pragma GCC unroll 4

and emitting "warning: ignoring loop annotation".  That warning comes
from tree-cfg.cc:replace_loop_annotate, and means that we failed to
process the ANNOTATE_EXPR in tree-cfg.cc:replace_loop_annotate_in_block.
That function walks backwards over the GIMPLE in an exiting BB for a
loop, skipping over the final gcond, and looks for any ANNOTATE_EXPRS
immediately preceding the gcond.

The function documents the following pre-condition:

   /* [...] We assume that the annotations come immediately before the
  condition in BB, if any.  */
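
A toy model of that backwards scan, assuming a block is just a list of
statement kinds (stmt_kind and count_annotations_before_cond are
invented for this sketch, not GCC names):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy statement kinds; real GIMPLE is far richer.
enum class stmt_kind { annotate, assign, cond };

// Count the annotation statements that immediately precede the final
// condition of a block, scanning backwards and stopping at the first
// statement that is not an annotation.
std::size_t count_annotations_before_cond (const std::vector<stmt_kind> &bb)
{
  if (bb.empty () || bb.back () != stmt_kind::cond)
    return 0;
  std::size_t found = 0;
  // Start just before the final condition and walk towards the front.
  for (std::size_t i = bb.size () - 1; i-- > 0; )
    {
      if (bb[i] != stmt_kind::annotate)
        break;
      ++found;
    }
  return found;
}
```

In this model a block of the form { annotate, assign, cond } yields
zero, because the assignment stops the scan before the annotation is
reached.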

now looking at the exiting BB of the loop, we have:

   :
  D.4524 = .ANNOTATE (iftmp.1, 1, 4);
  retval.0 = D.4524;
  if (retval.0 != 0)
goto ; [INV]
  else
goto ; [INV]

and crucially there is an intervening assignment between the gcond and
the preceding .ANNOTATE ifn call.  To see where this comes from, we can
look to the IR given by -fdump-tree-original:

  if (<::operator() (&pred, *first), unroll 4>>>)
goto ;
  else
goto ;

here the problem is that we've wrapped a CLEANUP_POINT_EXPR around the
ANNOTATE_EXPR, meaning the ANNOTATE_EXPR is no longer the outermost
expression in the condition.

The CLEANUP_POINT_EXPR gets added by the following call chain:

finish_while_stmt_cond
 -> maybe_convert_cond
 -> condition_conversion
 -> fold_build_cleanup_point_expr

this patch chooses to fix the issue in maybe_convert_cond by walking through
any ANNOTATE_EXPRs and doing any condition conversion on the inner expression,
leaving the ANNOTATE_EXPRs (if any) as the outermost expressions in the
condition.
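
Sketched on a toy node type (toy_node and convert_under_annotations are
hypothetical, and the conversion is passed in as a callback), the
in-place approach looks roughly like this:

```cpp
#include <cassert>
#include <string>

// Toy node: CODE is "annotate" for wrappers; TYPE stands in for TREE_TYPE.
struct toy_node {
  std::string code;
  std::string type;
  toy_node *op0;
};

// Convert the expression under any outer annotations in place, then copy
// the (possibly changed) type of the inner expression onto each wrapper.
// CONVERT is a caller-supplied stand-in for the condition conversion.
toy_node *convert_under_annotations (toy_node *cond,
                                     toy_node *(*convert) (toy_node *))
{
  // Walk down to the slot holding the first non-annotation expression.
  toy_node **condp = &cond;
  while ((*condp)->code == "annotate")
    condp = &(*condp)->op0;

  *condp = convert (*condp);

  // Propagate the new type up through the annotation chain.
  for (toy_node *c = cond; c != *condp; c = c->op0)
    {
      assert (c->code == "annotate");
      c->type = (*condp)->type;
    }
  return cond;
}
```

If there are no annotations, condp points at the local cond itself, so
the function just returns the converted expression.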

With this patch, we don't get any such warning and the loop gets unrolled as
expected at -O2.

gcc/cp/ChangeLog:

PR libstdc++/116140
* semantics.cc (maybe_convert_cond): Ensure any ANNOTATE_EXPRs
remain the outermost expression(s) of the condition.

gcc/testsuite/ChangeLog:

PR libstdc++/116140
* g++.dg/ext/pragma-unroll-lambda.C: New test.

Co-Authored-By: Jakub Jelinek 
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index e58612660c9..e128061e93d 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -966,25 +966,62 @@ maybe_convert_cond (tree cond)
   if (type_dependent_expression_p (cond))
 return cond;
 
+  /* If the condition has any ANNOTATE_EXPRs, those must remain the outermost
+ expressions of the condition.  Walk through these, performing the condition
+ conversion in place on the inner expression.  */
+  tree *condp = &cond;
+  while (TREE_CODE (*condp) == ANNOTATE_EXPR)
+condp = &TREE_OPERAND (*condp, 0);
+
+  tree orig_inner = *condp;
+
   /* For structured binding used in condition, the conversion needs to be
  evaluated before the individual variables are initialized in the
  std::tuple_{size,elemenet} case.  cp_finish_decomp saved the conversion
  result in a TARGET_EXPR, pick it up from there.  */
-  if (DECL_DECOMPOSITION_P (cond)
-  && DECL_DECOMP_IS_BASE (cond)
-  && DECL_DECOMP_BASE (cond)
-  && TREE_CODE (DECL_DECOMP_BASE (cond)) == TARGET_EXPR)
-cond = TARGET_EXPR_SLOT (DECL_DECOMP_BASE (cond));
+  if (DECL_DECOMPOSITION_P (*condp)
+  && DECL_DECOMP_IS_BASE (*condp)
+  && DECL_DECOMP_BASE (*condp)
+  && TREE_CODE (DECL_DECOMP_BASE (*condp)) == TARGET_EXPR)
+*condp = TARGET_EXPR_SLOT (DECL_DECOMP_BASE (*condp));
 
   if (warn_sequence_point && !processing_template_decl)
-verify_sequence_points (cond);
+verify_sequence_points (*condp);
 
-  maybe_warn_unparenthesized_assignment (cond, /*nested_p=*/false,
+  maybe_warn_unpare

Re: [PATCH 1/5] cp: Ensure ANNOTATE_EXPRs remain outermost expressions in conditions [PR116140]

2024-08-09 Thread Alex Coplan
On 09/08/2024 17:56, Jakub Jelinek wrote:
> On Fri, Aug 09, 2024 at 04:46:55PM +0100, Alex Coplan wrote:
> > On 09/08/2024 17:34, Jakub Jelinek wrote:
> > > On Fri, Aug 09, 2024 at 04:21:14PM +0100, Alex Coplan wrote:
> > > > Hmm, good spot, I didn't realise that convert_from_reference could
> > > > change the type of the condition like this.
> > > > 
> > > > In that case I suppose the only thing to do is to construct a stack of
> > > > ANNOTATE_EXPRs on the way down and re-build the expressions (taking the
> > > > type of the inner expression, starting with the cond) on the way back
> > > > up.
> > > 
> > > I think you don't need to rebuild them, just update their types.
> > > Something along the lines of
> > >   for (tree c = cond; c != *condp; c = TREE_OPERAND (c, 0))
> > > {
> > >   gcc_checking_assert (TREE_CODE (c) == ANNOTATE_EXPR);
> > >   TREE_TYPE (c) = TREE_TYPE (*condp);
> > >}
> > 
> > I suppose I was just concerned about any other properties of
> > ANNOTATE_EXPRs that might be inherited from the operand (that could be
> > affected by such a change).
> > 
> > It looks like TREE_{READONLY,THIS_VOLATILE,SIDE_EFFECTS} are all set
> > in convert_from_reference and propagated by build3, so if those change
> > underneath us then only updating the type seems insufficient.
> 
> I think TREE_THIS_VOLATILE isn't, that is only for references.
> The others could change, but only in the !TREE_SIDE_EFFECTS ->
> TREE_SIDE_EFFECTS or TREE_READONLY -> !TREE_READONLY direction
> (the former if it was volatile bool &).
> So you could also in the loop update it just in case,
>   TREE_SIDE_EFFECTS (c) |= TREE_SIDE_EFFECTS (*condp);
>   TREE_READONLY (c) &= TREE_READONLY (*condp);

Any reason not to just update those unconditionally?  I.e. just make
those normal assignments?  I'm assuming we'll only run the loop at all
in the case that TREE_TYPE (*condp) != TREE_TYPE (cond).

Alex

> 
>   Jakub
> 


Re: [PATCH 1/5] cp: Ensure ANNOTATE_EXPRs remain outermost expressions in conditions [PR116140]

2024-08-09 Thread Alex Coplan
On 09/08/2024 17:34, Jakub Jelinek wrote:
> On Fri, Aug 09, 2024 at 04:21:14PM +0100, Alex Coplan wrote:
> > Hmm, good spot, I didn't realise that convert_from_reference could
> > change the type of the condition like this.
> > 
> > In that case I suppose the only thing to do is to construct a stack of
> > ANNOTATE_EXPRs on the way down and re-build the expressions (taking the
> > type of the inner expression, starting with the cond) on the way back
> > up.
> 
> I think you don't need to rebuild them, just update their types.
> Something along the lines of
>   for (tree c = cond; c != *condp; c = TREE_OPERAND (c, 0))
> {
>   gcc_checking_assert (TREE_CODE (c) == ANNOTATE_EXPR);
>   TREE_TYPE (c) = TREE_TYPE (*condp);
>}

I suppose I was just concerned about any other properties of
ANNOTATE_EXPRs that might be inherited from the operand (that could be
affected by such a change).

It looks like TREE_{READONLY,THIS_VOLATILE,SIDE_EFFECTS} are all set
in convert_from_reference and propagated by build3, so if those change
underneath us then only updating the type seems insufficient.

> 
>   Jakub
> 


Re: [PATCH 1/5] cp: Ensure ANNOTATE_EXPRs remain outermost expressions in conditions [PR116140]

2024-08-09 Thread Alex Coplan
On 09/08/2024 17:08, Jakub Jelinek wrote:
> On Fri, Aug 09, 2024 at 11:01:22AM -0400, Jason Merrill wrote:
> > > The CLEANUP_POINT_EXPR gets added by the following call chain:
> > > 
> > > finish_while_stmt_cond
> > >   -> maybe_convert_cond
> > >   -> condition_conversion
> > >   -> fold_build_cleanup_point_expr
> > > 
> > > this patch chooses to fix the issue in maybe_convert_cond by walking 
> > > through
> > > any ANNOTATE_EXPRs and doing any condition conversion on the inner 
> > > expression,
> > > leaving the ANNOTATE_EXPRs (if any) as the outermost expressions in the
> > > condition.
> > 
> > I see that simplify_loop_decl_cond and finish_loop_cond use this same
> > pattern.  OK.
> 
> I'm a little bit worried about the convert_from_reference in there.
> Shouldn't TREE_TYPE of ANNOTATE_EXPR always match the TREE_TYPE of its
> operand?  If it could be say REFERENCE_TYPE to BOOLEAN_TYPE,
> convert_from_reference could change it but with the patch fail to adjust
> the type.

Hmm, good spot, I didn't realise that convert_from_reference could
change the type of the condition like this.

In that case I suppose the only thing to do is to construct a stack of
ANNOTATE_EXPRs on the way down and re-build the expressions (taking the
type of the inner expression, starting with the cond) on the way back
up.

I'll try something along those lines, then (unless there are further
comments suggesting otherwise!)

Thanks,
Alex

> 
>   Jakub
> 


Re: [PATCH 4/5] lto: Set has_unroll flag when streaming in for LTO [PR116140]

2024-08-09 Thread Alex Coplan
On 09/08/2024 13:12, Richard Biener wrote:
> 
> 
> > Am 09.08.2024 um 11:30 schrieb Alex Coplan :
> > 
> > When #pragma GCC unroll is processed in
> > tree-cfg.cc:replace_loop_annotate_in_block, we set both the loop->unroll
> > > field (which is currently streamed out and back in during LTO) and
> > the cfun->has_unroll flag.
> > 
> > cfun->has_unroll, however, is not currently streamed during LTO, so this
> > patch attempts to recover it by setting it on any function containing a
> > loop with loop->unroll > 1.
> > 
> > Prior to this patch, loops marked with #pragma GCC unroll that would be
> > unrolled by RTL loop2_unroll in a non-LTO compilation didn't get
> > unrolled under LTO.
> > 
> > As per the comment in the PR, a more conservative fix might explicitly
> > stream out cfun->has_unroll and stream it back in again, but this patch
> > > is simpler and I can't currently see a reason against inferring the
> > value of the flag like this (comments welcome).
> 
> If the flag is redundant please eliminate it entirely.  Otherwise please 
> stream it.

AFAICT, the flag effectively provides a way of caching the answer to
the question "are there any loops in this function for which unrolling
has been requested"?

Eliminating it entirely would mean replacing any uses with loops over
all loops in the function, which seems suboptimal.

The current users are (both in loop-init.cc):
 - pass_rtl_unroll_loops::gate, and
 - pass_loop2::gate

Would you prefer introducing loops over all loops in the function at
those use sites?  Or is it OK to keep the flag?

If the flag is purely a cache of the above test (I think it is), then
ISTM streaming it has no benefit since we necessarily have to iterate
over all loops when streaming in.
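
To make the "cache" argument concrete, here is a rough standalone sketch.  `loop_info`, `function_info`, `any_unroll_requested` and `stream_in_loops` are hypothetical simplifications, not GCC's actual structures or streamer API; the point is only that stream-in already visits each loop, so the flag can be recomputed on the way.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified stand-ins for GCC's loop and function structures.
struct loop_info
{
  int unroll;  // Requested unroll factor; values > 1 mean "unroll requested".
};

struct function_info
{
  std::vector<loop_info> loops;
  bool has_unroll = false;
};

// The uncached form of the query: scan every loop in the function.
bool
any_unroll_requested (const function_info &fn)
{
  for (const loop_info &l : fn.loops)
    if (l.unroll > 1)
      return true;
  return false;
}

// Model of stream-in: each loop is visited anyway, so the flag can be
// recomputed as a side effect instead of being streamed separately.
void
stream_in_loops (function_info &fn, const std::vector<int> &unroll_factors)
{
  for (int factor : unroll_factors)
    {
      fn.loops.push_back (loop_info { factor });
      if (factor > 1)
        fn.has_unroll = true;
    }
}
```

In this model the flag always agrees with the full scan after stream-in, which is the property the patch relies on.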

Thanks,
Alex

> 
> > gcc/ChangeLog:
> > 
> >PR libstdc++/116140
> >* lto-streamer-in.cc (input_cfg): Set fn->has_unroll if fn
> >contains a loop with requested unrolling.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> >PR libstdc++/116140
> >* g++.dg/ext/pragma-unroll-lambda-lto.C: New test.
> > ---
> > gcc/lto-streamer-in.cc|  2 ++
> > .../g++.dg/ext/pragma-unroll-lambda-lto.C | 32 +++
> > 2 files changed, 34 insertions(+)
> > create mode 100644 gcc/testsuite/g++.dg/ext/pragma-unroll-lambda-lto.C
> > 
> > <0004-lto-Set-has_unroll-flag-when-streaming-in-for-LTO-PR.patch>


[PATCH 4/5] lto: Set has_unroll flag when streaming in for LTO [PR116140]

2024-08-09 Thread Alex Coplan
When #pragma GCC unroll is processed in
tree-cfg.cc:replace_loop_annotate_in_block, we set both the loop->unroll
field (which is currently streamed out and back in during LTO) and
the cfun->has_unroll flag.

cfun->has_unroll, however, is not currently streamed during LTO, so this
patch attempts to recover it by setting it on any function containing a
loop with loop->unroll > 1.

Prior to this patch, loops marked with #pragma GCC unroll that would be
unrolled by RTL loop2_unroll in a non-LTO compilation didn't get
unrolled under LTO.

As per the comment in the PR, a more conservative fix might explicitly
stream out cfun->has_unroll and stream it back in again, but this patch
is simpler and I can't currently see a reason against inferring the
value of the flag like this (comments welcome).

gcc/ChangeLog:

PR libstdc++/116140
* lto-streamer-in.cc (input_cfg): Set fn->has_unroll if fn
contains a loop with requested unrolling.

gcc/testsuite/ChangeLog:

PR libstdc++/116140
* g++.dg/ext/pragma-unroll-lambda-lto.C: New test.
---
 gcc/lto-streamer-in.cc|  2 ++
 .../g++.dg/ext/pragma-unroll-lambda-lto.C | 32 +++
 2 files changed, 34 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/ext/pragma-unroll-lambda-lto.C

diff --git a/gcc/lto-streamer-in.cc b/gcc/lto-streamer-in.cc
index 2e592be8082..93877065d86 100644
--- a/gcc/lto-streamer-in.cc
+++ b/gcc/lto-streamer-in.cc
@@ -1136,6 +1136,8 @@ input_cfg (class lto_input_block *ib, class data_in *data_in,
   /* Read OMP SIMD related info.  */
   loop->safelen = streamer_read_hwi (ib);
   loop->unroll = streamer_read_hwi (ib);
+  if (loop->unroll > 1)
+	fn->has_unroll = true;
   loop->owned_clique = streamer_read_hwi (ib);
   loop->dont_vectorize = streamer_read_hwi (ib);
   loop->force_vectorize = streamer_read_hwi (ib);
diff --git a/gcc/testsuite/g++.dg/ext/pragma-unroll-lambda-lto.C b/gcc/testsuite/g++.dg/ext/pragma-unroll-lambda-lto.C
new file mode 100644
index 000..144c4c32692
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/pragma-unroll-lambda-lto.C
@@ -0,0 +1,32 @@
+// { dg-do link { target c++11 } }
+// { dg-options "-O2 -flto -fdump-rtl-loop2_unroll" }
+
+#include <cstdlib>
+
+template<typename Iter, typename Pred>
+inline Iter
+my_find(Iter first, Iter last, Pred pred)
+{
+#pragma GCC unroll 4
+while (first != last && !pred(*first))
+++first;
+return first;
+}
+
+__attribute__((noipa))
+short *use_find(short *p)
+{
+auto pred = [](short x) { return x == 42; };
+return my_find(p, p + 1024, pred);
+}
+
+int main(void)
+{
+  short a[1024];
+  for (int i = 0; i < 1024; i++)
+a[i] = rand ();
+
+  return use_find (a) - a;
+}
+
+// { dg-final { scan-ltrans-rtl-dump-times "Unrolled loop 3 times" 1 "loop2_unroll" } }


[PATCH 5/5] libstdc++: Restore unrolling in std::find using pragma [PR116140]

2024-08-09 Thread Alex Coplan
Together with the preparatory compiler patches, this patch restores
unrolling in std::__find_if, but this time relying on the compiler to do
it by using:

  #pragma GCC unroll 4

which should recover most of the performance lost relative to the
hand-unrolled version while still being vectorizable with WIP alignment
peeling enhancements.
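
As a rough standalone illustration of the approach (not the libstdc++ code itself), a find_if-style loop with the pragma could look like this.  `find_if_sketch` and `index_of` are hypothetical names used only for this example.

```cpp
#include <cassert>
#include <cstddef>

// A find_if-style loop, kept rolled in the source.  The pragma is a
// GCC extension; it asks GCC to unroll the loop that follows it.
template<typename Iter, typename Pred>
inline Iter
find_if_sketch (Iter first, Iter last, Pred pred)
{
#pragma GCC unroll 4
  while (first != last && !pred (*first))
    ++first;
  return first;
}

// Hypothetical helper: index of the first element equal to VALUE in
// the N-element array P, or N if there is none.
inline std::size_t
index_of (const short *p, std::size_t n, short value)
{
  const short *hit
    = find_if_sketch (p, p + n, [value] (short x) { return x == value; });
  return static_cast<std::size_t> (hit - p);
}
```

The source stays a single simple loop, which is what leaves it open to vectorization; the unrolling request is purely a hint to the compiler.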

On Neoverse V1 with LTO, this reduces the regression in xalancbmk (from
SPEC CPU 2017) from 5.8% to 1.7% (restoring ~71% of the lost
performance).

Bootstrapped/regtested on aarch64-linux-gnu, OK for trunk?

Thanks,
Alex

libstdc++-v3/ChangeLog:

PR libstdc++/116140
* include/bits/stl_algobase.h (std::__find_if): Add #pragma to
request GCC to unroll the loop.
---
 libstdc++-v3/include/bits/stl_algobase.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/libstdc++-v3/include/bits/stl_algobase.h b/libstdc++-v3/include/bits/stl_algobase.h
index 27f6c377ad6..f13662fc448 100644
--- a/libstdc++-v3/include/bits/stl_algobase.h
+++ b/libstdc++-v3/include/bits/stl_algobase.h
@@ -2104,6 +2104,7 @@ _GLIBCXX_END_NAMESPACE_ALGO
 inline _Iterator
 __find_if(_Iterator __first, _Iterator __last, _Predicate __pred)
 {
+#pragma GCC unroll 4
   while (__first != __last && !__pred(__first))
 	++__first;
   return __first;


[PATCH 1/5] cp: Ensure ANNOTATE_EXPRs remain outermost expressions in conditions [PR116140]

2024-08-09 Thread Alex Coplan
For the testcase added with this patch, we would end up losing the:

  #pragma GCC unroll 4

and emitting "warning: ignoring loop annotation".  That warning comes
from tree-cfg.cc:replace_loop_annotate, and means that we failed to
process the ANNOTATE_EXPR in tree-cfg.cc:replace_loop_annotate_in_block.
That function walks backwards over the GIMPLE in an exiting BB for a
loop, skipping over the final gcond, and looks for any ANNOTATE_EXPRs
immediately preceding the gcond.

The function documents the following pre-condition:

   /* [...] We assume that the annotations come immediately before the
  condition in BB, if any.  */

now looking at the exiting BB of the loop, we have:

   :
  D.4524 = .ANNOTATE (iftmp.1, 1, 4);
  retval.0 = D.4524;
  if (retval.0 != 0)
goto ; [INV]
  else
goto ; [INV]

and crucially there is an intervening assignment between the gcond and
the preceding .ANNOTATE ifn call.  To see where this comes from, we can
look to the IR given by -fdump-tree-original:

  if (<::operator() (&pred, *first), unroll 4>>>)
goto ;
  else
goto ;

here the problem is that we've wrapped a CLEANUP_POINT_EXPR around the
ANNOTATE_EXPR, meaning the ANNOTATE_EXPR is no longer the outermost
expression in the condition.

The CLEANUP_POINT_EXPR gets added by the following call chain:

finish_while_stmt_cond
 -> maybe_convert_cond
 -> condition_conversion
 -> fold_build_cleanup_point_expr

this patch chooses to fix the issue in maybe_convert_cond by walking through
any ANNOTATE_EXPRs and doing any condition conversion on the inner expression,
leaving the ANNOTATE_EXPRs (if any) as the outermost expressions in the
condition.
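
The shape of the fix can be sketched in a standalone form as follows.  Everything here (`node`, `make_node`, `pool`, `convert_condition`, `convert_keeping_annotations`) is a hypothetical model rather than GCC's tree API; `convert_condition` simply wraps its argument in a cleanup point to mimic what condition_conversion can do.

```cpp
#include <cassert>
#include <deque>

// Hypothetical stand-ins for GCC tree codes and nodes.
enum class code { annotate_expr, cleanup_point_expr, comparison };

struct node
{
  code kind;
  node *operand;  // Operand 0, or nullptr for a leaf.
};

// Owns the nodes created by this sketch.  A deque keeps element
// addresses stable across push_back.
static std::deque<node> pool;

node *
make_node (code kind, node *operand)
{
  pool.push_back (node { kind, operand });
  return &pool.back ();
}

// Stand-in for the conversion step: wrap EXPR in a cleanup point.
node *
convert_condition (node *expr)
{
  return make_node (code::cleanup_point_expr, expr);
}

// Walk through any annotate_expr wrappers and convert the inner
// expression in place, so the wrappers stay outermost.
node *
convert_keeping_annotations (node *cond)
{
  node **condp = &cond;
  while ((*condp)->kind == code::annotate_expr)
    condp = &(*condp)->operand;
  *condp = convert_condition (*condp);
  return cond;
}
```

Using a pointer to the operand slot means the same code handles both cases: with no wrappers, condp points at the local cond and the converted expression is returned directly.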

With this patch, we don't get any such warning and the loop gets unrolled as
expected at -O2.

Bootstrapped/regtested on aarch64-linux-gnu, OK for trunk?

gcc/cp/ChangeLog:

PR libstdc++/116140
* semantics.cc (maybe_convert_cond): Ensure any ANNOTATE_EXPRs
remain the outermost expression(s) of the condition.

gcc/testsuite/ChangeLog:

PR libstdc++/116140
* g++.dg/ext/pragma-unroll-lambda.C: New test.
---
 gcc/cp/semantics.cc   | 26 ---
 .../g++.dg/ext/pragma-unroll-lambda.C | 17 
 2 files changed, 34 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/ext/pragma-unroll-lambda.C

diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index e58612660c9..8634a188458 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -966,25 +966,33 @@ maybe_convert_cond (tree cond)
   if (type_dependent_expression_p (cond))
 return cond;
 
+  /* If the condition has any ANNOTATE_EXPRs, those must remain the outermost
+ expressions of the condition.  Walk through these, performing the condition
+ conversion in place on the inner expression.  */
+  tree *condp = &cond;
+  while (TREE_CODE (*condp) == ANNOTATE_EXPR)
+condp = &TREE_OPERAND (*condp, 0);
+
   /* For structured binding used in condition, the conversion needs to be
  evaluated before the individual variables are initialized in the
  std::tuple_{size,elemenet} case.  cp_finish_decomp saved the conversion
  result in a TARGET_EXPR, pick it up from there.  */
-  if (DECL_DECOMPOSITION_P (cond)
-  && DECL_DECOMP_IS_BASE (cond)
-  && DECL_DECOMP_BASE (cond)
-  && TREE_CODE (DECL_DECOMP_BASE (cond)) == TARGET_EXPR)
-cond = TARGET_EXPR_SLOT (DECL_DECOMP_BASE (cond));
+  if (DECL_DECOMPOSITION_P (*condp)
+  && DECL_DECOMP_IS_BASE (*condp)
+  && DECL_DECOMP_BASE (*condp)
+  && TREE_CODE (DECL_DECOMP_BASE (*condp)) == TARGET_EXPR)
+*condp = TARGET_EXPR_SLOT (DECL_DECOMP_BASE (*condp));
 
   if (warn_sequence_point && !processing_template_decl)
-verify_sequence_points (cond);
+verify_sequence_points (*condp);
 
-  maybe_warn_unparenthesized_assignment (cond, /*nested_p=*/false,
+  maybe_warn_unparenthesized_assignment (*condp, /*nested_p=*/false,
 	 tf_warning_or_error);
 
   /* Do the conversion.  */
-  cond = convert_from_reference (cond);
-  return condition_conversion (cond);
+  *condp = convert_from_reference (*condp);
+  *condp = condition_conversion (*condp);
+  return cond;
 }
 
 /* Finish an expression-statement, whose EXPRESSION is as indicated.  */
diff --git a/gcc/testsuite/g++.dg/ext/pragma-unroll-lambda.C b/gcc/testsuite/g++.dg/ext/pragma-unroll-lambda.C
new file mode 100644
index 000..f410f6d8d25
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/pragma-unroll-lambda.C
@@ -0,0 +1,17 @@
+// { dg-do compile { target c++11 } }
+
+template<typename Iter, typename Pred>
+inline Iter
+my_find(Iter first, Iter last, Pred pred)
+{
+#pragma GCC unroll 4
+while (first != last && !pred(*first))
+++first;
+return first;
+}
+
+short *use_find(short *p)
+{
+auto pred = [](short x) { return x == 42; };
+return my_find(p, p + 1024, pred);
+}


[PATCH 3/5] testsuite: Ensure ltrans dump files get cleaned up properly [PR116140]

2024-08-09 Thread Alex Coplan
I noticed while working on a test that uses LTO and requests a dump
file, that we are failing to clean up ltrans dump files in the testsuite.

E.g. the test I was working on compiles with -flto
-fdump-rtl-loop2_unroll, and we end up with the following file:

./gcc/testsuite/g++/pr116140.ltrans0.ltrans.287r.loop2_unroll

being left behind by the testsuite.  This is problematic not just from a
"missing cleanup" POV, but also because it can cause the test to pass
spuriously when the test is re-run with an unpatched compiler (without
the bug fix).  In the broken case, loop2_unroll isn't run at all, so we
end up scanning the old dumpfile (from the previous test run) and making
the dumpfile scan pass.

Running with `-v -v` in RUNTESTFLAGS we can see the following cleanup
attempt is made:

remove-build-file `pr116140.{C,exe}.{ltrans[0-9]*.,}[0-9][0-9][0-9]{l,i,r,t}.*'

looking again at the ltrans dump file above we can see this will fail for two
reasons:

 - The actual dump file has no {C,exe} extension between the basename and
   ltrans0.
 - The actual dump file has an additional `.ltrans` component after `.ltrans0`.

This patch therefore relaxes the pattern constructed for cleaning up such
dumpfiles to also match dumpfiles with the above form.
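
As a sanity check of the reasoning, the before/after patterns can be approximated with regular expressions.  This is only a sketch: it assumes the glob is assembled as in the remove-build-file line above, renders glob `*` as `.*` and `{a,b,}` as an optional group, and hard-codes the stem; `old_pattern_matches` and `new_pattern_matches` are hypothetical names.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Regex approximations of the Tcl glob patterns for the stem "pr116140".

// Before: pr116140.{C,exe}.{ltrans[0-9]*.,}[0-9][0-9][0-9]{l,i,r,t}.*
bool
old_pattern_matches (const std::string &file)
{
  static const std::regex re
    (R"(pr116140\.(?:C|exe)\.(?:ltrans[0-9].*\.)?[0-9]{3}[lirt]\..*)");
  return std::regex_match (file, re);
}

// After:
// pr116140{.C,.exe,}.{ltrans[0-9]*{.ltrans,}.,}[0-9][0-9][0-9]{l,i,r,t}.*
bool
new_pattern_matches (const std::string &file)
{
  static const std::regex re
    (R"(pr116140(?:\.C|\.exe)?\.(?:ltrans[0-9].*(?:\.ltrans)?\.)?[0-9]{3}[lirt]\..*)");
  return std::regex_match (file, re);
}
```

Under these assumptions the old pattern rejects pr116140.ltrans0.ltrans.287r.loop2_unroll while the new one accepts it, and both still accept an ordinary dump such as pr116140.C.287r.loop2_unroll.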

Running the testsuite before/after this patch shows the number of files in
gcc/testsuite (in the build dir) with "ltrans" in the name goes from 1416 to 62
on aarch64.

No regressions on aarch64-linux-gnu, OK for trunk?

Thanks,
Alex

gcc/testsuite/ChangeLog:

PR libstdc++/116140
* lib/gcc-dg.exp (schedule-cleanups): Relax ltrans dumpfile
cleanup pattern to handle missing cases.
---
 gcc/testsuite/lib/gcc-dg.exp | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/lib/gcc-dg.exp b/gcc/testsuite/lib/gcc-dg.exp
index 992062103c1..cdb677d7873 100644
--- a/gcc/testsuite/lib/gcc-dg.exp
+++ b/gcc/testsuite/lib/gcc-dg.exp
@@ -190,7 +190,7 @@ proc schedule-cleanups { opts } {
 # Handle ltrans files around -flto
 if [regexp -- {(^|\s+)-flto(\s+|$)} $opts] {
 	verbose "Cleanup -flto seen" 4
-	set ltrans "{ltrans\[0-9\]*.,}"
+	set ltrans "{ltrans\[0-9\]*{.ltrans,}.,}"
 } else {
 	set ltrans ""
 }
@@ -206,7 +206,7 @@ proc schedule-cleanups { opts } {
 	if {$basename_ext != ""} {
 		regsub -- {^.*\.} $basename_ext {} basename_ext
 	}
-	lappend tfiles "$stem.{$basename_ext,exe}"
+	lappend tfiles "$stem{.$basename_ext,.exe,}"
 	unset basename_ext
 	} else {
 	lappend tfiles $basename


[PATCH 2/5] testsuite: Add scan-ltrans-rtl for use in dg-final [PR116140]

2024-08-09 Thread Alex Coplan
This extends the scan-ltrans-tree* helpers to create RTL variants.  This
is needed to check the behaviour of an RTL pass under LTO.

In particular it's used by a later patch in the series to check that
RTL unrolling is applied under LTO.

Tested as a series on aarch64-linux-gnu, OK for trunk?

gcc/ChangeLog:

PR libstdc++/116140
* doc/sourcebuild.texi: Document ltrans-rtl value of kind for
scan-<kind>-dump*.

gcc/testsuite/ChangeLog:

PR libstdc++/116140
* lib/scanltranstree.exp (scan-ltrans-rtl-dump): New.
(scan-ltrans-rtl-dump-times): New.
(scan-ltrans-rtl-dump-not): New.
(scan-ltrans-rtl-dump-dem): New.
(scan-ltrans-rtl-dump-dem-not): New.
---
 gcc/doc/sourcebuild.texi |   4 +-
 gcc/testsuite/lib/scanltranstree.exp | 123 +++
 2 files changed, 125 insertions(+), 2 deletions(-)

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index d5c48e67b71..827fe9ce66c 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -3615,8 +3615,8 @@ stands for zero or more unmatched lines; the whitespace after
 @subsubsection Scan optimization dump files
 
 These commands are available for @var{kind} of @code{tree}, @code{ltrans-tree},
-@code{offload-tree}, @code{rtl}, @code{offload-rtl}, @code{ipa},
-@code{offload-ipa}, and @code{wpa-ipa}.
+@code{offload-tree}, @code{rtl}, @code{ltrans-rtl}, @code{offload-rtl},
+@code{ipa}, @code{offload-ipa}, and @code{wpa-ipa}.
 
 @table @code
 @item scan-@var{kind}-dump @var{regex} @var{suffix} [@{ target/xfail @var{selector} @}]
diff --git a/gcc/testsuite/lib/scanltranstree.exp b/gcc/testsuite/lib/scanltranstree.exp
index 79f05f0ffed..d7e27ad364a 100644
--- a/gcc/testsuite/lib/scanltranstree.exp
+++ b/gcc/testsuite/lib/scanltranstree.exp
@@ -146,3 +146,126 @@ proc scan-ltrans-tree-dump-dem-not { args } {
 			  ".ltrans0.ltrans"
 }
 }
+
+# Utility for scanning ltrans RTL dumps, invoked via dg-final.
+# Call pass if pattern is present, otherwise fail.
+#
+# Argument 0 is the regexp to match
+# Argument 1 is the name of the dumped rtl pass
+# Argument 2 handles expected failures and the like
+proc scan-ltrans-rtl-dump { args } {
+if { [llength $args] < 2 } {
+	error "scan-ltrans-rtl-dump: too few arguments"
+	return
+}
+if { [llength $args] > 3 } {
+	error "scan-ltrans-rtl-dump: too many arguments"
+	return
+}
+if { [llength $args] >= 3 } {
+	scan-dump "ltrans-rtl" [lindex $args 0] \
+		  "\[0-9\]\[0-9\]\[0-9\]r.[lindex $args 1]" ".ltrans0.ltrans" \
+		  [lindex $args 2]
+} else {
+	scan-dump "ltrans-rtl" [lindex $args 0] \
+		  "\[0-9\]\[0-9\]\[0-9\]r.[lindex $args 1]" ".ltrans0.ltrans"
+}
+}
+
+# Call pass if pattern is present given number of times, otherwise fail.
+# Argument 0 is the regexp to match
+# Argument 1 is number of times the regexp must be found
+# Argument 2 is the name of the dumped rtl pass
+# Argument 3 handles expected failures and the like
+proc scan-ltrans-rtl-dump-times { args } {
+if { [llength $args] < 3 } {
+	error "scan-ltrans-rtl-dump-times: too few arguments"
+	return
+}
+if { [llength $args] > 4 } {
+	error "scan-ltrans-rtl-dump-times: too many arguments"
+	return
+}
+if { [llength $args] >= 4 } {
+	scan-dump-times "ltrans-rtl" [lindex $args 0] [lindex $args 1] \
+			"\[0-9\]\[0-9\]\[0-9\]r.[lindex $args 2]" \
+			".ltrans0.ltrans" [lindex $args 3]
+} else {
+	scan-dump-times "ltrans-rtl" [lindex $args 0] [lindex $args 1] \
+			"\[0-9\]\[0-9\]\[0-9\]r.[lindex $args 2]" ".ltrans0.ltrans"
+}
+}
+
+# Call pass if pattern is not present, otherwise fail.
+#
+# Argument 0 is the regexp to match
+# Argument 1 is the name of the dumped rtl pass
+# Argument 2 handles expected failures and the like
+proc scan-ltrans-rtl-dump-not { args } {
+if { [llength $args] < 2 } {
+	error "scan-ltrans-rtl-dump-not: too few arguments"
+	return
+}
+if { [llength $args] > 3 } {
+	error "scan-ltrans-rtl-dump-not: too many arguments"
+	return
+}
+if { [llength $args] >= 3 } {
+	scan-dump-not "ltrans-rtl" [lindex $args 0] \
+		  "\[0-9\]\[0-9\]\[0-9\]r.[lindex $args 1]" ".ltrans0.ltrans" \
+		  [lindex $args 2]
+} else {
+	scan-dump-not "ltrans-rtl" [lindex $args 0] \
+		  "\[0-9\]\[0-9\]\[0-9\]r.[lindex $args 1]" ".ltrans0.ltrans"
+}
+}
+
+# Utility for scanning demangled compiler result, invoked via dg-final.
+# Call pass if pattern is present, otherwise fail.
+#
+# Argument 0 is the regexp to match
+# Argument 1 is the name of the dumped rtl pass
+# Argument 2 handles expected failures and the like
+proc scan-ltrans-rtl-dump-dem { args } {
+if { [llength $args] < 2 } {
+	error "scan-ltrans-rtl-dump-dem: too few arguments"
+	return
+}
+if { [llength $args] > 3 } {
+	error "scan-ltrans-rtl-dump-dem: too many arguments"
+	return
+}
+if { [llength $args] >= 3 } {
+	scan-dump-dem "ltrans-rtl" [lindex $args 0] \
+		  "\[0-9\]\[0-9\]\[0

[PATCH 0/5] Address std::find regression with RTL unrolling [PR116140]

2024-08-09 Thread Alex Coplan
This patch series aims to address PR116140.  The regression in xalancbmk
(both in SPEC 2006 and SPEC 2017) occurred when removing the
hand-unrolling in std::__find_if in libstdc++.

Keeping the loop re-rolled in the source is desirable as it allows the
function to be vectorized with WIP vectorizer enhancements (peeling read
DRs for alignment in early break loops).

In theory this should have just been a single patch adding:
  #pragma GCC unroll 4
to the std::__find_if loop in libstdc++.

However, there were a couple of snags (see the PR for details).  The
series is structured as follows:
 - 1/5 fixes a bug in the C++ frontend causing the #pragma to get
   dropped under certain conditions.
 - 2/5 and 3/5 are preparatory testsuite patches for 4/5 which adds
   an LTO test that needs to scan an ltrans RTL dumpfile.
 - 4/5 fixes a bug where the has_unroll flag on functions isn't
   streamed during LTO.
 - 5/5 then finally adds the #pragma to std::__find_if.

The following table shows the performance effect of the patch series on
xalancbmk (both from SPEC CPU 2017 and SPEC CPU 2006).  This is on
Neoverse V1 with LTO.

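+---------------------------+---------------+---------------+
| Benchmark Suite           | SPEC CPU 2017 | SPEC CPU 2006 |
+---------------------------+---------------+---------------+
| Regression in PR          | 5.83%         | 11.12%        |
| Regression after patches  | 1.68%         | 3.16%         |
| % of regression recovered | 71.24%        | 71.11%        |
+---------------------------+---------------+---------------+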

Bootstrapped/regtested as a series on aarch64-linux-gnu, no regressions.

Alex Coplan (5):
  cp: Ensure ANNOTATE_EXPRs remain outermost expressions in conditions 
[PR116140]
  testsuite: Add scan-ltrans-rtl for use in dg-final [PR116140]
  testsuite: Ensure ltrans dump files get cleaned up properly [PR116140]
  lto: Set has_unroll flag when streaming in for LTO [PR116140]
  libstdc++: Restore unrolling in std::find using pragma [PR116140]

 gcc/cp/semantics.cc   |  26 ++--
 gcc/doc/sourcebuild.texi  |   4 +-
 gcc/lto-streamer-in.cc|   2 +
 .../g++.dg/ext/pragma-unroll-lambda-lto.C |  32 +
 .../g++.dg/ext/pragma-unroll-lambda.C |  17 +++
 gcc/testsuite/lib/gcc-dg.exp  |   4 +-
 gcc/testsuite/lib/scanltranstree.exp  | 123 ++
 libstdc++-v3/include/bits/stl_algobase.h  |   1 +
 8 files changed, 196 insertions(+), 13 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/ext/pragma-unroll-lambda-lto.C
 create mode 100644 gcc/testsuite/g++.dg/ext/pragma-unroll-lambda.C



[PATCH 1/2] gdbhooks: Make dot viewer configurable

2024-08-01 Thread Alex Coplan
Hi,

This adds a new GDB parameter 'gcc-dot-cmd' which allows the user to
configure the command used to render the CFG within dot-fn.

E.g. with this patch the user can change their dot viewer like so:

(gdb) show gcc-dot-cmd
The current value of 'gcc-dot-cmd' is "dot -Tx11".
(gdb) set gcc-dot-cmd xdot
(gdb) dot-fn # opens in xdot

The second patch in this series adds a hook which users can define in
their .gdbinit in order to be called when the GCC extensions have
finished loading, thus allowing users to automatically configure
gcc-dot-cmd as desired in their .gdbinit.

Manually tested by debugging an x86 -> aarch64 cross, changing the
parameter, and invoking dot-fn.

OK to install?

Thanks,
Alex

gcc/ChangeLog:

* gdbhooks.py (GCCDotCmd): New.
(gcc_dot_cmd): New. Use it ...
(DotFn.invoke): ... here.
diff --git a/gcc/gdbhooks.py b/gcc/gdbhooks.py
index 92e38880a70..db8ce0d071b 100644
--- a/gcc/gdbhooks.py
+++ b/gcc/gdbhooks.py
@@ -783,6 +783,18 @@ class DumpFn(gdb.Command):
 
 DumpFn()
 
+class GCCDotCmd(gdb.Parameter):
+"""
+This parameter controls the command used to render dot files within
+GCC's dot-fn command.  It will be invoked as gcc-dot-cmd .
+"""
+def __init__(self):
+super(GCCDotCmd, self).__init__('gcc-dot-cmd',
+gdb.COMMAND_NONE, gdb.PARAM_STRING)
+self.value = "dot -Tx11"
+
+gcc_dot_cmd = GCCDotCmd()
+
 class DotFn(gdb.Command):
 """
 A custom command to show a gimple/rtl function control flow graph.
@@ -848,7 +860,8 @@ class DotFn(gdb.Command):
 return
 
 # Show graph in temp file
-os.system("( dot -Tx11 \"%s\"; rm \"%s\" ) &" % (filename, filename))
+dot_cmd = gcc_dot_cmd.value
+os.system("( %s \"%s\"; rm \"%s\" ) &" % (dot_cmd, filename, filename))
 
 DotFn()
 


[PATCH 2/2] gdbhooks: Add attempt to invoke on-gcc-hooks-load

2024-08-01 Thread Alex Coplan
This extends GCC's GDB hooks to attempt invoking the user-defined
command "on-gcc-hooks-load".  The idea is that users can define the
command in their .gdbinit to override the default values of parameters
defined by GCC's GDB extensions.

For example, together with the previous patch, I can add the following
fragment to my .gdbinit:

define on-gcc-hooks-load
  set gcc-dot-cmd xdot
end

which means, once the GCC extensions get loaded, whenever I invoke
dot-fn then the graph will be rendered using xdot.

The try/except should make this patch a no-op for users that don't
currently define this command.  I looked for a way to test explicitly
for whether a GDB command exists but didn't find one.

This is needed because the user's .gdbinit is sourced before GCC's GDB
extensions are loaded, and GCC-specific parameters can't be configured
before they are defined.

As an alternative (to avoid having the callback), I considered having
the user define a convenience variable with a well-known name and using
that (if defined) in gdbhooks.py to set the default value for
gcc-dot-cmd.  But that seemed like a hack.  I'd be interested to hear
from any GDB experts if there's a better way of managing configuration
like this.

Tested by invoking dot-fn with/without the above fragment in my .gdbinit
and observing the change in dot renderer.

OK to install?

Thanks,
Alex

gcc/ChangeLog:

* gdbhooks.py: Add attempted call to "on-gcc-hooks-load" once
we've finished loading the hooks.
diff --git a/gcc/gdbhooks.py b/gcc/gdbhooks.py
index db8ce0d071b..7a64c03b8ac 100644
--- a/gcc/gdbhooks.py
+++ b/gcc/gdbhooks.py
@@ -865,4 +865,12 @@ class DotFn(gdb.Command):
 
 DotFn()
 
+# Try and invoke the user-defined command "on-gcc-hooks-load".  Doing
+# this allows users to customize the GCC extensions once they've been
+# loaded by defining the hook in their .gdbinit.
+try:
+gdb.execute('on-gcc-hooks-load')
+except gdb.error:
+pass
+
 print('Successfully loaded GDB hooks for GCC')


Re: [PATCH 2/3] aarch64: Add support for moving fpm system register

2024-07-22 Thread Alex Coplan
Hi Claudio,

I've left a couple of small comments below.

On 22/07/2024 09:30, Claudio Bantaloukas wrote:
> 
> Unlike most system registers, fpmr can be heavily written to in code that
> exercises the fp8 functionality. That is because every fp8 intrinsic call
> can potentially change the value of fpmr.
> Rather than just use an unspec, we treat the fpmr system register like
> all other registers and use a move operation to read and write to it.
> 
> We introduce a new class of moveable system registers that, currently,
> only accepts fpmr and a new constraint, Umv, that allows us to
> selectively use mrs and msr instructions when expanding rtl for them.
> Given that there is code that depends on "real" registers coming before
> "fake" ones, we introduce a new constant FPM_REGNUM that uses an
> existing value and renumber registers below that.
> This requires us to update the bitmaps that describe which registers
> belong to each register class.
> 
> gcc/ChangeLog:
> 
>   * config/aarch64/aarch64.cc (aarch64_hard_regno_nregs): Add
>   support for MOVEABLE_SYSREGS class.
>   (aarch64_hard_regno_mode_ok): Only allow 64 bit reads and writes
>   to fpmr.
>   (aarch64_regno_regclass): Support MOVEABLE_SYSREGS class.
>   (aarch64_class_max_nregs): Likewise.
>   * config/aarch64/aarch64.h (FIXED_REGISTERS): add fpmr.
>   (CALL_REALLY_USED_REGISTERS): Likewise.
>   (REGISTER_NAMES): Likewise.
>   (enum reg_class): Add MOVEABLE_SYSREGS class.
>   (REG_CLASS_NAMES): Likewise.
>   (REG_CLASS_CONTENTS): Update class bitmaps to deal with fpmr,
>   the new MOVEABLE_SYSREGS class and renumbering of registers.
>   * config/aarch64/aarch64.md: (FPM_REGNUM): added new register
>   number, reusing old value.
>   (FFR_REGNUM): Renumber.
>   (FFRT_REGNUM): Likewise.
>   (LOWERING_REGNUM): Likewise.
>   (TPIDR2_BLOCK_REGNUM): Likewise.
>   (SME_STATE_REGNUM): Likewise.
>   (TPIDR2_SETUP_REGNUM): Likewise.
>   (ZA_FREE_REGNUM): Likewise.
>   (ZA_SAVED_REGNUM): Likewise.
>   (ZA_REGNUM): Likewise.
>   (ZT0_REGNUM): Likewise.
>   (*mov_aarch64): Add support for moveable sysregs.
>   (*movsi_aarch64): Likewise.
>   (*movdi_aarch64): Likewise.
>   * config/aarch64/constraints.md (MOVEABLE_SYSREGS): New constraint.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/aarch64/acle/fp8.c: New tests.
> ---
>  gcc/config/aarch64/aarch64.cc   |   9 ++
>  gcc/config/aarch64/aarch64.h|  14 ++-
>  gcc/config/aarch64/aarch64.md   |  30 --
>  gcc/config/aarch64/constraints.md   |   3 +
>  gcc/testsuite/gcc.target/aarch64/acle/fp8.c | 107 +++-
>  5 files changed, 146 insertions(+), 17 deletions(-)
> 

> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 0d9e80d85b2..fa526836c6a 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -2018,6 +2018,7 @@ aarch64_hard_regno_nregs (unsigned regno, machine_mode 
> mode)
>  case PR_HI_REGS:
>return mode == VNx32BImode ? 2 : 1;
>  
> +case MOVEABLE_SYSREGS:
>  case FFR_REGS:
>  case PR_AND_FFR_REGS:
>  case FAKE_REGS:
> @@ -2045,6 +2046,10 @@ aarch64_hard_regno_mode_ok (unsigned regno, 
> machine_mode mode)
>  /* This must have the same size as _Unwind_Word.  */
>  return mode == DImode;
>  
> +  if (regno == FPM_REGNUM)
> +/* This must have the same size as the FPMR register.  */
> +return mode == QImode || mode == HImode || mode == SImode || mode == 
> DImode;

I'm probably missing something here, but I can't seem to square the
comment with the logic.  These modes all have different sizes, so how
can they all be the same size as the FPMR register?

> +
>unsigned int vec_flags = aarch64_classify_vector_mode (mode);
>if (vec_flags == VEC_SVE_PRED)
>  return pr_or_ffr_regnum_p (regno);
> @@ -12682,6 +12687,9 @@ aarch64_regno_regclass (unsigned regno)
>if (PR_REGNUM_P (regno))
>  return PR_LO_REGNUM_P (regno) ? PR_LO_REGS : PR_HI_REGS;
>  
> +  if (regno == FPM_REGNUM)
> +return MOVEABLE_SYSREGS;
> +
>if (regno == FFR_REGNUM || regno == FFRT_REGNUM)
>  return FFR_REGS;
>  
> @@ -13070,6 +13078,7 @@ aarch64_class_max_nregs (reg_class_t regclass, 
> machine_mode mode)
>  case PR_HI_REGS:
>return mode == VNx32BImode ? 2 : 1;
>  
> +case MOVEABLE_SYSREGS:
>  case STACK_REG:
>  case FFR_REGS:
>  case PR_AND_FFR_REGS:
> diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
> index 40793aab814..7385bdaa456 100644
> --- a/gcc/config/aarch64/aarch64.h
> +++ b/gcc/config/aarch64/aarch64.h
> @@ -522,6 +522,7 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE 
> ATTRIBUTE_UNUSED
>  1, 1, 1, 1,  /* SFP, AP, CC, VG */   \
>  0, 0, 0, 0,   0, 0, 0, 0,   /* P0 - P7 */   \
>  0, 0, 0, 0,   0, 0, 0, 0,   /* P8 - P15 */

[PATCH v2] middle-end: Add debug functions to dump dominator tree in dot format

2024-07-05 Thread Alex Coplan
This is a v2 patch which implements richi's feedback.

OK if it survives bootstrap on aarch64?

Thanks,
Alex

-- >8 --

This adds debug functions to dump the dominator tree in dot format.
There are two overloads: one which takes a FILE * and another which
takes a const char *fname and wraps the first with fopen/fclose for
convenience.

gcc/ChangeLog:

* dominance.cc (dot_dominance_tree): New.
diff --git a/gcc/dominance.cc b/gcc/dominance.cc
index 0357210ed27..c14d997ded7 100644
--- a/gcc/dominance.cc
+++ b/gcc/dominance.cc
@@ -1658,6 +1658,36 @@ debug_dominance_info (enum cdi_direction dir)
   fprintf (stderr, "%i %i\n", bb->index, bb2->index);
 }
 
+/* Dump the dominance tree in direction DIR to the file F in dot form.
+   This allows easily visualizing the tree using graphviz.  */
+
+DEBUG_FUNCTION void
+dot_dominance_tree (FILE *f, enum cdi_direction dir)
+{
+  fprintf (f, "digraph {\n");
+  basic_block bb, idom;
+  FOR_EACH_BB_FN (bb, cfun)
+if ((idom = get_immediate_dominator (dir, bb)))
+  fprintf (f, "%i -> %i;\n", idom->index, bb->index);
+  fprintf (f, "}\n");
+}
+
+/* Convenience wrapper around the above that dumps the dominance tree in
+   direction DIR to the file at path FNAME in dot form.  */
+
+DEBUG_FUNCTION void
+dot_dominance_tree (const char *fname, enum cdi_direction dir)
+{
+  FILE *f = fopen (fname, "w");
+  if (f)
+{
+  dot_dominance_tree (f, dir);
+  fclose (f);
+}
+  else
+fprintf (stderr, "failed to open %s: %s\n", fname, xstrerror (errno));
+}
+
 /* Prints to stderr representation of the dominance tree (for direction DIR)
rooted in ROOT, indented by INDENT tabulators.  If INDENT_FIRST is false,
the first line of the output is not indented.  */


Re: [PATCH] middle-end: Add debug function to dump dominator tree in dot format

2024-07-05 Thread Alex Coplan
On 05/07/2024 09:59, Richard Biener wrote:
> On Fri, 5 Jul 2024, Alex Coplan wrote:
> 
> > Hi,
> > 
> > This adds a debug function to dump the dominator tree in dot/graphviz
> > format.  The idea is that the function can be called in GDB, the output
> > copy/pasted into a .dot file and then rendered using graphviz.
> > 
> > Bootstrapped/regtested on aarch64-linux-gnu, OK for trunk?
> 
> Can you follow other APIs here and rename and use a FILE * arg?
> 
> DEBUG_FUNCTION void
> dot_dominance_tree (FILE *f, enum cdi_direction dir)
> ...
> 
> that way in gdb you can do
> 
> (gdb) p fopen ("/tmp/x.dot", "w")
> (gdb) p dot_dominance_tree ($1, CDI_DOMINATORS);
> (gdb) p fclose ($1);
> 
> and then dot the file?  It's also easier to use this from a
> gdb python wrapper which can do the above as well.  In other
> places there's then an overload with a const char *fname argument
> doing the fopen/fclose itself.

Yes, that sounds much better (it would certainly be more useable in
bigger functions that way).  I'll give that a go (including the
convenience overload), thanks.

Alex

> 
> Thanks,
> Richard.
> 
> > Thanks,
> > Alex
> > 
> > gcc/ChangeLog:
> > 
> > * dominance.cc (debug_dominance_tree_dot): New.
> > 
> 
> -- 
> Richard Biener 
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


[PATCH] middle-end: Add debug function to dump dominator tree in dot format

2024-07-05 Thread Alex Coplan
Hi,

This adds a debug function to dump the dominator tree in dot/graphviz
format.  The idea is that the function can be called in GDB, the output
copy/pasted into a .dot file and then rendered using graphviz.

Bootstrapped/regtested on aarch64-linux-gnu, OK for trunk?

Thanks,
Alex

gcc/ChangeLog:

* dominance.cc (debug_dominance_tree_dot): New.
diff --git a/gcc/dominance.cc b/gcc/dominance.cc
index 0357210ed27..b536f193abc 100644
--- a/gcc/dominance.cc
+++ b/gcc/dominance.cc
@@ -1658,6 +1658,20 @@ debug_dominance_info (enum cdi_direction dir)
   fprintf (stderr, "%i %i\n", bb->index, bb2->index);
 }
 
+/* Print the dominance tree (in direction DIR) in dot form.  This allows easily
+   visualizing the tree using graphviz.  */
+
+DEBUG_FUNCTION void
+debug_dominance_tree_dot (enum cdi_direction dir)
+{
+  fprintf (stderr, "digraph {\n");
+  basic_block bb, idom;
+  FOR_EACH_BB_FN (bb, cfun)
+if ((idom = get_immediate_dominator (dir, bb)))
+  fprintf (stderr, "%i -> %i;\n", idom->index, bb->index);
+  fprintf (stderr, "}\n");
+}
+
 /* Prints to stderr representation of the dominance tree (for direction DIR)
rooted in ROOT, indented by INDENT tabulators.  If INDENT_FIRST is false,
the first line of the output is not indented.  */


Re: [PATCH 1/2]middle-end: fix wide_int_constant_multiple_p when VAL and DIV are 0. [PR114932]

2024-07-02 Thread Alex Coplan
On 02/07/2024 13:41, Richard Biener wrote:
> On Tue, 2 Jul 2024, Alex Coplan wrote:
> 
> > On 02/07/2024 10:46, Alex Coplan wrote:
> > > On 02/07/2024 10:01, Richard Biener wrote:
> > > > On Mon, 1 Jul 2024, Tamar Christina wrote:
> > > > 
> > > > > > -Original Message-
> > > > > > From: Tamar Christina 
> > > > > > Sent: Monday, July 1, 2024 9:14 PM
> > > > > > To: gcc-patches@gcc.gnu.org
> > > > > > Cc: nd ; rguent...@suse.de; j...@ventanamicro.com
> > > > > > Subject: [PATCH 1/2]middle-end: fix wide_int_constant_multiple_p 
> > > > > > when VAL and
> > > > > > DIV are 0. [PR114932]
> > > > > > 
> > > > > > Hi All,
> > > > > > 
> > > > > > wide_int_constant_multiple_p tries to check if for two tree 
> > > > > > expressions a and b
> > > > > > that there is a multiplier which makes a == b * c.
> > > > > > 
> > > > > > This code however seems to think that there's no c where a=0 and 
> > > > > > b=0 are equal
> > > > > > which is of course wrong.
> > > > > > 
> > > > > > This fixes it and also fixes the comment.
> > > > > > 
> > > > > > Bootstrapped Regtested on aarch64-none-linux-gnu,
> > > > > > x86_64-pc-linux-gnu -m32, -m64 and no issues.
> > > > > > 
> > > > > > Ok for master?
> > > > > > 
> > > > > > Thanks,
> > > > > > Tamar
> > > > > > 
> > > > > > gcc/ChangeLog:
> > > > > > 
> > > > > > PR tree-optimization/114932
> > > > > > * tree-affine.cc (wide_int_constant_multiple_p): Support 0 and 
> > > > > > 0 being
> > > > > > multiples.
> > > > > > 
> > > > > > ---
> > > > > > diff --git a/gcc/tree-affine.cc b/gcc/tree-affine.cc
> > > > > > index
> > > > > > d6309c4390362b680f0aa97a41fac3281ade66fd..bfea0fe826a6affa0ace154e3ca
> > > > > > 38c9ef632fcba 100644
> > > > > > --- a/gcc/tree-affine.cc
> > > > > > +++ b/gcc/tree-affine.cc
> > > > > > @@ -880,11 +880,10 @@ free_affine_expand_cache (hash_map > > > > > name_expansion *> **cache)
> > > > > >*cache = NULL;
> > > > > >  }
> > > > > > 
> > > > > > -/* If VAL != CST * DIV for any constant CST, returns false.
> > > > > > -   Otherwise, if *MULT_SET is true, additionally compares CST and 
> > > > > > MULT,
> > > > > > -   and if they are different, returns false.  Finally, if neither 
> > > > > > of these
> > > > > > -   two cases occur, true is returned, and CST is stored to MULT 
> > > > > > and MULT_SET
> > > > > > -   is set to true.  */
> > > > > > +/* If VAL == CST * DIV for any constant CST, returns true.
> > > > > > +   and if *MULT_SET is true, additionally compares CST and MULT
> > > > > > +   and if they are different, returns false.  If true is returned, 
> > > > > > CST is
> > > > > > +   stored to MULT and MULT_SET is set to true.  */
> > > > > > 
> > > > > >  static bool
> > > > > >  wide_int_constant_multiple_p (const poly_widest_int &val,
> > > > > > @@ -895,6 +894,12 @@ wide_int_constant_multiple_p (const 
> > > > > > poly_widest_int
> > > > > > &val,
> > > > > > 
> > > > > >if (known_eq (val, 0))
> > > > > >  {
> > > > > > +  if (maybe_eq (div, 0))
> > > > > > +   {
> > > > > > + *mult = 1;
> > > > > > + return true;
> > > > > > +   }
> > > > > > +
> > > > > 
> > > > > Note, I also tested known_eq here, and also no regression on what I 
> > > > > can test.
> > > > > I picked maybe_eq since that's what the lines after this one tests.
> > > > 
> > > > I think the maybe_eq (div, 0) is because otherwise multiple_p might
> > > > crash?  I'm not sure if there's a difference between
>

Re: [PATCH 1/2]middle-end: fix wide_int_constant_multiple_p when VAL and DIV are 0. [PR114932]

2024-07-02 Thread Alex Coplan
On 02/07/2024 10:46, Alex Coplan wrote:
> On 02/07/2024 10:01, Richard Biener wrote:
> > On Mon, 1 Jul 2024, Tamar Christina wrote:
> > 
> > > > -Original Message-
> > > > From: Tamar Christina 
> > > > Sent: Monday, July 1, 2024 9:14 PM
> > > > To: gcc-patches@gcc.gnu.org
> > > > Cc: nd ; rguent...@suse.de; j...@ventanamicro.com
> > > > Subject: [PATCH 1/2]middle-end: fix wide_int_constant_multiple_p when 
> > > > VAL and
> > > > DIV are 0. [PR114932]
> > > > 
> > > > Hi All,
> > > > 
> > > > wide_int_constant_multiple_p tries to check if for two tree expressions 
> > > > a and b
> > > > that there is a multiplier which makes a == b * c.
> > > > 
> > > > This code however seems to think that there's no c where a=0 and b=0 
> > > > are equal
> > > > which is of course wrong.
> > > > 
> > > > This fixes it and also fixes the comment.
> > > > 
> > > > Bootstrapped Regtested on aarch64-none-linux-gnu,
> > > > x86_64-pc-linux-gnu -m32, -m64 and no issues.
> > > > 
> > > > Ok for master?
> > > > 
> > > > Thanks,
> > > > Tamar
> > > > 
> > > > gcc/ChangeLog:
> > > > 
> > > > PR tree-optimization/114932
> > > > * tree-affine.cc (wide_int_constant_multiple_p): Support 0 and 
> > > > 0 being
> > > > multiples.
> > > > 
> > > > ---
> > > > diff --git a/gcc/tree-affine.cc b/gcc/tree-affine.cc
> > > > index
> > > > d6309c4390362b680f0aa97a41fac3281ade66fd..bfea0fe826a6affa0ace154e3ca
> > > > 38c9ef632fcba 100644
> > > > --- a/gcc/tree-affine.cc
> > > > +++ b/gcc/tree-affine.cc
> > > > @@ -880,11 +880,10 @@ free_affine_expand_cache (hash_map > > > name_expansion *> **cache)
> > > >*cache = NULL;
> > > >  }
> > > > 
> > > > -/* If VAL != CST * DIV for any constant CST, returns false.
> > > > -   Otherwise, if *MULT_SET is true, additionally compares CST and MULT,
> > > > -   and if they are different, returns false.  Finally, if neither of 
> > > > these
> > > > -   two cases occur, true is returned, and CST is stored to MULT and 
> > > > MULT_SET
> > > > -   is set to true.  */
> > > > +/* If VAL == CST * DIV for any constant CST, returns true.
> > > > +   and if *MULT_SET is true, additionally compares CST and MULT
> > > > +   and if they are different, returns false.  If true is returned, CST 
> > > > is
> > > > +   stored to MULT and MULT_SET is set to true.  */
> > > > 
> > > >  static bool
> > > >  wide_int_constant_multiple_p (const poly_widest_int &val,
> > > > @@ -895,6 +894,12 @@ wide_int_constant_multiple_p (const poly_widest_int
> > > > &val,
> > > > 
> > > >if (known_eq (val, 0))
> > > >  {
> > > > +  if (maybe_eq (div, 0))
> > > > +   {
> > > > + *mult = 1;
> > > > + return true;
> > > > +   }
> > > > +
> > > 
> > > Note, I also tested known_eq here, and also no regression on what I can 
> > > test.
> > > I picked maybe_eq since that's what the lines after this one tests.
> > 
> > I think the maybe_eq (div, 0) is because otherwise multiple_p might
> > crash?  I'm not sure if there's a difference between
> > maybe_eq (x, 0) and known_eq (x, 0) though - how does a maybe_eq
> > POLY_INT look like that's not known_eq?
> 
> Take:
> 
> A = POLY_INT_CST [16,0]
> B = POLY_INT_CST [8,8]
> 
> then these represent polynomials:
> 
> A = 16
> B = 8 + 8x
> 
> where x is only known at runtime.  We have maybe_eq (A,B) since there is
> a value of x (= 1) which makes these equal at runtime, but clearly
> !known_eq (A,B) (take x = 0, for example).

So specifically in the case of:

maybe_eq (x, 0) vs known_eq (x, 0)

I suppose x = POLY_INT_CST [-4,4] would satisfy the first (again with x
= 1) but not the second.

Thanks,
Alex

> 
> That is my understanding at least, hopefully that makes sense.
> 
> Thanks,
> Alex
> 
> > 
> > > I'm not sure I fully understand why one tests known and the other maybe.  
> > > It seems to me
> > > that both should test known.  But I tested both so which ever one is felt 
> > > to be more correct
> > > I can commit If ok.
> > > 
> > > Thanks,
> > > Tamar
> > > 
> > > >if (*mult_set && maybe_ne (*mult, 0))
> > > > return false;
> > > >*mult_set = true;
> > > > 
> > > > 
> > > > 
> > > > 
> > > > --
> > > 
> > 
> > -- 
> > Richard Biener 
> > SUSE Software Solutions Germany GmbH,
> > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH 1/2]middle-end: fix wide_int_constant_multiple_p when VAL and DIV are 0. [PR114932]

2024-07-02 Thread Alex Coplan
On 02/07/2024 10:01, Richard Biener wrote:
> On Mon, 1 Jul 2024, Tamar Christina wrote:
> 
> > > -Original Message-
> > > From: Tamar Christina 
> > > Sent: Monday, July 1, 2024 9:14 PM
> > > To: gcc-patches@gcc.gnu.org
> > > Cc: nd ; rguent...@suse.de; j...@ventanamicro.com
> > > Subject: [PATCH 1/2]middle-end: fix wide_int_constant_multiple_p when VAL 
> > > and
> > > DIV are 0. [PR114932]
> > > 
> > > Hi All,
> > > 
> > > wide_int_constant_multiple_p tries to check if for two tree expressions a 
> > > and b
> > > that there is a multiplier which makes a == b * c.
> > > 
> > > This code however seems to think that there's no c where a=0 and b=0 are 
> > > equal
> > > which is of course wrong.
> > > 
> > > This fixes it and also fixes the comment.
> > > 
> > > Bootstrapped Regtested on aarch64-none-linux-gnu,
> > > x86_64-pc-linux-gnu -m32, -m64 and no issues.
> > > 
> > > Ok for master?
> > > 
> > > Thanks,
> > > Tamar
> > > 
> > > gcc/ChangeLog:
> > > 
> > >   PR tree-optimization/114932
> > >   * tree-affine.cc (wide_int_constant_multiple_p): Support 0 and 0 being
> > >   multiples.
> > > 
> > > ---
> > > diff --git a/gcc/tree-affine.cc b/gcc/tree-affine.cc
> > > index
> > > d6309c4390362b680f0aa97a41fac3281ade66fd..bfea0fe826a6affa0ace154e3ca
> > > 38c9ef632fcba 100644
> > > --- a/gcc/tree-affine.cc
> > > +++ b/gcc/tree-affine.cc
> > > @@ -880,11 +880,10 @@ free_affine_expand_cache (hash_map > > name_expansion *> **cache)
> > >*cache = NULL;
> > >  }
> > > 
> > > -/* If VAL != CST * DIV for any constant CST, returns false.
> > > -   Otherwise, if *MULT_SET is true, additionally compares CST and MULT,
> > > -   and if they are different, returns false.  Finally, if neither of 
> > > these
> > > -   two cases occur, true is returned, and CST is stored to MULT and 
> > > MULT_SET
> > > -   is set to true.  */
> > > +/* If VAL == CST * DIV for any constant CST, returns true.
> > > +   and if *MULT_SET is true, additionally compares CST and MULT
> > > +   and if they are different, returns false.  If true is returned, CST is
> > > +   stored to MULT and MULT_SET is set to true.  */
> > > 
> > >  static bool
> > >  wide_int_constant_multiple_p (const poly_widest_int &val,
> > > @@ -895,6 +894,12 @@ wide_int_constant_multiple_p (const poly_widest_int
> > > &val,
> > > 
> > >if (known_eq (val, 0))
> > >  {
> > > +  if (maybe_eq (div, 0))
> > > + {
> > > +   *mult = 1;
> > > +   return true;
> > > + }
> > > +
> > 
> > Note, I also tested known_eq here, and also no regression on what I can 
> > test.
> > I picked maybe_eq since that's what the lines after this one tests.
> 
> I think the maybe_eq (div, 0) is because otherwise multiple_p might
> crash?  I'm not sure if there's a difference between
> maybe_eq (x, 0) and known_eq (x, 0) though - how does a maybe_eq
> POLY_INT look like that's not known_eq?

Take:

A = POLY_INT_CST [16,0]
B = POLY_INT_CST [8,8]

then these represent polynomials:

A = 16
B = 8 + 8x

where x is only known at runtime.  We have maybe_eq (A,B) since there is
a value of x (= 1) which makes these equal at runtime, but clearly
!known_eq (A,B) (take x = 0, for example).

That is my understanding at least, hopefully that makes sense.
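
To make the distinction concrete, here is a rough standalone model.  It is not GCC's poly_int implementation: poly2 is a hypothetical two-coefficient type, and I'm assuming the runtime indeterminate x is a non-negative integer.

```cpp
#include <cstdint>

// Toy model of a two-coefficient POLY_INT_CST [c0, c1], i.e. the value
// c0 + c1 * x, where x is a non-negative integer only known at runtime.
struct poly2
{
  int64_t c0, c1;
};

// True if A == B for every x >= 0: all coefficients must match.
bool
known_eq (poly2 a, poly2 b)
{
  return a.c0 == b.c0 && a.c1 == b.c1;
}

// True if A == B for at least one integer x >= 0.
bool
maybe_eq (poly2 a, poly2 b)
{
  // Solve (a.c1 - b.c1) * x == b.c0 - a.c0 for x.
  int64_t dc1 = a.c1 - b.c1;
  int64_t dc0 = b.c0 - a.c0;
  if (dc1 == 0)
    return dc0 == 0;
  return dc0 % dc1 == 0 && dc0 / dc1 >= 0;
}
```

With this model, maybe_eq ({16, 0}, {8, 8}) holds (x = 1) while known_eq ({16, 0}, {8, 8}) does not.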

Thanks,
Alex

> 
> > I'm not sure I fully understand why one tests known and the other maybe.  
> > It seems to me
> > that both should test known.  But I tested both so which ever one is felt 
> > to be more correct
> > I can commit If ok.
> > 
> > Thanks,
> > Tamar
> > 
> > >if (*mult_set && maybe_ne (*mult, 0))
> > >   return false;
> > >*mult_set = true;
> > > 
> > > 
> > > 
> > > 
> > > --
> > 
> 
> -- 
> Richard Biener 
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH 1/6] rtl-ssa: Rework _ignoring interfaces

2024-06-20 Thread Alex Coplan
Hi Richard,

I had a quick look through the patch and noticed a couple of minor typos.
Otherwise looks like a nice cleanup!

On 20/06/2024 14:34, Richard Sandiford wrote:
> rtl-ssa has routines for scanning forwards or backwards for something
> under the control of an exclusion set.  These searches are currently
> used for two main things:
> 
> - to work out where an instruction can be moved within its EBB
> - to work out whether recog can add a new hard register clobber
> 
> The exclusion set was originally a callback function that returned
> true for insns that should be ignored.  However, for the late-combine
> work, I'd also like to be able to skip an entire definition, along
> with all its uses.
> 
> This patch prepares for that by turning the exclusion set into an
> object that provides predicate member functions.  Currently the
> only two member functions are:
> 
> - should_ignore_insn: what the old callback did
> - should_ignore_def: the new functionality
> 
> but more could be added later.
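
As a rough illustration of the shape described above (my own toy sketch, not code from the patch): toy_insn, toy_def, ignore_reg_defs and toy_next_access are hypothetical names, and only ignore_nothing, should_ignore_insn and should_ignore_def are taken from the description and ChangeLog.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for rtl-ssa's definition and insn records.
struct toy_def { int regno; };
struct toy_insn { int uid; toy_def def; };

// Exclusion set that ignores nothing; other sets mirror this interface.
struct ignore_nothing
{
  bool should_ignore_insn (const toy_insn &) const { return false; }
  bool should_ignore_def (const toy_def &) const { return false; }
};

// Exclusion set that skips every definition of one register.
struct ignore_reg_defs
{
  int regno;
  bool should_ignore_insn (const toy_insn &) const { return false; }
  bool should_ignore_def (const toy_def &d) const { return d.regno == regno; }
};

// Scan forwards from index START and return the index of the first insn
// that IGNORE does not exclude, or -1 if there is none.
template<typename IgnorePredicates>
int
toy_next_access (const std::vector<toy_insn> &insns, std::size_t start,
                 IgnorePredicates ignore)
{
  for (std::size_t i = start; i < insns.size (); i++)
    if (!ignore.should_ignore_insn (insns[i])
        && !ignore.should_ignore_def (insns[i].def))
      return (int) i;
  return -1;
}
```

The point of passing an object rather than a single callback is that the scan can ask several independent questions of the same exclusion set, and further predicates can be added without changing every caller's signature.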
> 
> Doing this also makes it easy to remove some assymmetry that I think

s/assymmetry/asymmetry/

> in hindsight was a mistake: in forward scans, ignoring an insn meant
> ignoring all definitions in that insn (ok) and all uses of those
> definitions (non-obvious).  The new interface makes it possible
> to select the required behaviour, with that behaviour being applied
> consistently in both directions.
> 
> Now that the exclusion set is a dedicated object, rather than
> just a "random" function, I think it makes sense to remove the
> _ignoring suffix from the function names.  The suffix was originally
> there to describe the callback, and in particular to emphasise that
> a true return meant "ignore" rather than "heed".
> 
> gcc/
>   * rtl-ssa.h: Include predicates.h.
>   * rtl-ssa/predicates.h: New file.
>   * rtl-ssa/access-utils.h (prev_call_clobbers_ignoring): Rename to...
>   (prev_call_clobbers): ...this and treat the ignore parameter as an
>   object with the same interface as ignore_nothing.
>   (next_call_clobbers_ignoring): Rename to...
>   (next_call_clobbers): ...this and treat the ignore parameter as an
>   object with the same interface as ignore_nothing.
>   (first_nondebug_insn_use_ignoring): Rename to...
>   (first_nondebug_insn_use): ...this and treat the ignore parameter as
>   an object with the same interface as ignore_nothing.
>   (last_nondebug_insn_use_ignoring): Rename to...
>   (last_nondebug_insn_use): ...this and treat the ignore parameter as
>   an object with the same interface as ignore_nothing.
>   (last_access_ignoring): Rename to...
>   (last_access): ...this and treat the ignore parameter as an object
>   with the same interface as ignore_nothing.  Conditionally skip
>   definitions.
>   (prev_access_ignoring): Rename to...
>   (prev_access): ...this and treat the ignore parameter as an object
>   with the same interface as ignore_nothing.
>   (first_def_ignoring): Replace with...
>   (first_access): ...this new function.
>   (next_access_ignoring): Rename to...
>   (next_access): ...this and treat the ignore parameter as an object
>   with the same interface as ignore_nothing.  Conditionally skip
>   definitions.
>   * rtl-ssa/change-utils.h (insn_is_changing): Delete.
>   (restrict_movement_ignoring): Rename to...
>   (restrict_movement): ...this and treat the ignore parameter as an
>   object with the same interface as ignore_nothing.
>   (recog_ignoring): Rename to...
>   (recog): ...this and treat the ignore parameter as an object with
>   the same interface as ignore_nothing.
>   * rtl-ssa/changes.h (insn_is_changing_closure): Delete.
>   * rtl-ssa/functions.h (function_info::add_regno_clobber): Treat
>   the ignore parameter as an object with the same interface as
>   ignore_nothing.
>   * rtl-ssa/insn-utils.h (insn_is): Delete.
>   * rtl-ssa/insns.h (insn_is_closure): Delete.
>   * rtl-ssa/member-fns.inl
>   (insn_is_changing_closure::insn_is_changing_closure): Delete.
>   (insn_is_changing_closure::operator()): Likewise.
>   (function_info::add_regno_clobber): Treat the ignore parameter
>   as an object with the same interface as ignore_nothing.
>   (ignore_changing_insns::ignore_changing_insns): New function.
>   (ignore_changing_insns::should_ignore_insn): Likewise.
>   * rtl-ssa/movement.h (restrict_movement_for_dead_range): Treat
>   the ignore parameter as an object with the same interface as
>   ignore_nothing.
>   (restrict_movement_for_defs_ignoring): Rename to...
>   (restrict_movement_for_defs): ...this and treat the ignore parameter
>   as an object with the same interface as ignore_nothing.
>   (restrict_movement_for_uses_ignoring): Rename to...
>   (restrict_movement_for_uses): ...this and treat the ignore parameter
>   as an object 

Re: [Patch, aarch64, middle-end] v3: Move pair_fusion pass from aarch64 to middle-end

2024-05-22 Thread Alex Coplan
Hi Ajit,

You need to remove the header dependencies that are no longer required
for aarch64-ldp-fusion.o in t-aarch64 (not forgetting to update the
ChangeLog).  A few other minor nits below.

LGTM with those changes, but you'll need Richard S to approve.

Thanks a lot for doing this.

On 22/05/2024 00:16, Ajit Agarwal wrote:
> Hello Alex/Richard:
> 
> All comments are addressed.
> 
> Move pair fusion pass from aarch64-ldp-fusion.cc to middle-end
> to support multiple targets.
> 
> Common infrastructure of load store pair fusion is divided into
> target independent and target dependent code.
> 
> Target independent code is structured in the following files.
> gcc/pair-fusion.h
> gcc/pair-fusion.cc
> 
> Target independent code is the Generic code with pure virtual
> function to interface between target independent and dependent
> code.
> 
> Bootstrapped and regtested on aarch64-linux-gnu.
> 
> Thanks & Regards
> Ajit
> 
> 
> 
> aarch64, middle-end: Move pair_fusion pass from aarch64 to middle-end
> 
> Move pair fusion pass from aarch64-ldp-fusion.cc to middle-end
> to support multiple targets.
> 
> Common infrastructure of load store pair fusion is divided into
> target independent and target dependent code.
> 
> Target independent code is structured in the following files.
> gcc/pair-fusion.h
> gcc/pair-fusion.cc
> 
> Target independent code is the Generic code with pure virtual
> function to interface between target independent and dependent
> code.
> 
> 2024-05-22  Ajit Kumar Agarwal  
> 
> gcc/ChangeLog:
> 
>   * pair-fusion.h: Generic header code for load store pair fusion
>   that can be shared across different architectures.
>   * pair-fusion.cc: Generic source code implementation for
>   load store pair fusion that can be shared across different 
> architectures.
>   * Makefile.in: Add new object file pair-fusion.o.
>   * config/aarch64/aarch64-ldp-fusion.cc: Delete generic code and move it
>   to pair-fusion.cc in the middle-end.
>   * config/aarch64/t-aarch64: Add header file dependency on pair-fusion.h.
> ---
>  gcc/Makefile.in  |1 +
>  gcc/config/aarch64/aarch64-ldp-fusion.cc | 3298 +-
>  gcc/config/aarch64/t-aarch64 |2 +-
>  gcc/pair-fusion.cc   | 3013 
>  gcc/pair-fusion.h|  193 ++
>  5 files changed, 3286 insertions(+), 3221 deletions(-)
>  create mode 100644 gcc/pair-fusion.cc
>  create mode 100644 gcc/pair-fusion.h
> 
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index a7f15694c34..643342f623d 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -1563,6 +1563,7 @@ OBJS = \
>   ipa-strub.o \
>   ipa.o \
>   ira.o \
> + pair-fusion.o \
>   ira-build.o \
>   ira-costs.o \
>   ira-conflicts.o \
> diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc 
> b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> index 085366cdf68..0af927231d3 100644
> --- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
> +++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc

> diff --git a/gcc/config/aarch64/t-aarch64 b/gcc/config/aarch64/t-aarch64
> index 78713558e7d..bdada08be70 100644
> --- a/gcc/config/aarch64/t-aarch64
> +++ b/gcc/config/aarch64/t-aarch64
> @@ -203,7 +203,7 @@ aarch64-early-ra.o: 
> $(srcdir)/config/aarch64/aarch64-early-ra.cc \
>  aarch64-ldp-fusion.o: $(srcdir)/config/aarch64/aarch64-ldp-fusion.cc \
>  $(CONFIG_H) $(SYSTEM_H) $(CORETYPES_H) $(BACKEND_H) $(RTL_H) $(DF_H) \
>  $(RTL_SSA_H) cfgcleanup.h tree-pass.h ordered-hash-map.h tree-dfa.h \
> -fold-const.h tree-hash-traits.h print-tree.h
> +fold-const.h tree-hash-traits.h print-tree.h pair-fusion.h

So now you also need to remove the deps on the includes removed in the latest
version of the patch.

>   $(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
>   $(srcdir)/config/aarch64/aarch64-ldp-fusion.cc
>  
> diff --git a/gcc/pair-fusion.cc b/gcc/pair-fusion.cc
> new file mode 100644
> index 000..827b88cf2fc
> --- /dev/null
> +++ b/gcc/pair-fusion.cc
> @@ -0,0 +1,3013 @@
> +// Pass to fuse adjacent loads/stores into paired memory accesses.
> +// Copyright (C) 2024 Free Software Foundation, Inc.
> +//
> +// This file is part of GCC.
> +//
> +// GCC is free software; you can redistribute it and/or modify it
> +// under the terms of the GNU General Public License as published by
> +// the Free Software Foundation; either version 3, or (at your option)
> +// any later version.
> +//
> +// GCC is distributed in the hope that it will be useful, but
> +// WITHOUT ANY WARRANTY; without even the implied warranty of
> +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +// General Public License for more details.
> +//
> +// You should have received a copy of the GNU General Public License
> +// along with GCC; see the file COPYING3.  If not see
> +// <http://www.gnu.org/licenses/>.
> +
> +#define INCLU

Re: [Patch, aarch64, middle-end] v2: Move pair_fusion pass from aarch64 to middle-end

2024-05-21 Thread Alex Coplan
Hi Ajit,

I've left some more comments below.  It's getting there now, thanks for
your patience.

On 21/05/2024 20:32, Ajit Agarwal wrote:
> Hello Alex/Richard:
> 
> All comments are addressed.
> 
> Move pair fusion pass from aarch64-ldp-fusion.cc to middle-end
> to support multiple targets.
> 
> Common infrastructure of load store pair fusion is divided into
> target independent and target dependent code.
> 
> Target independent code is structured in the following files.
> gcc/pair-fusion.h
> gcc/pair-fusion.cc
> 
> Target independent code is the Generic code with pure virtual
> function to interface between target independent and dependent
> code.
> 
> Bootstrapped and regtested on aarch64-linux-gnu.
> 
> Thanks & Regards
> Ajit
> 
> 
> aarch64, middle-end: Move pair_fusion pass from aarch64 to middle-end
> 
> Move pair fusion pass from aarch64-ldp-fusion.cc to middle-end
> to support multiple targets.
> 
> Common infrastructure of load store pair fusion is divided into
> target independent and target dependent code.
> 
> Target independent code is structured in the following files.
> gcc/pair-fusion.h
> gcc/pair-fusion.cc
> 
> Target independent code is the Generic code with pure virtual
> function to interface between target independent and dependent
> code.
> 
> 2024-05-21  Ajit Kumar Agarwal  
> 
> gcc/ChangeLog:
> 
>   * pair-fusion.h: Generic header code for load store pair fusion
>   that can be shared across different architectures.
>   * pair-fusion.cc: Generic source code implementation for
>   load store pair fusion that can be shared across different 
> architectures.
>   * Makefile.in: Add new object file pair-fusion.o.
>   * config/aarch64/aarch64-ldp-fusion.cc: Delete generic code and move it
>   to pair-fusion.cc in the middle-end.
>   * config/aarch64/t-aarch64: Add header file dependency pair-fusion.h.

insert "on" after dependency.

> ---
>  gcc/Makefile.in  |1 +
>  gcc/config/aarch64/aarch64-ldp-fusion.cc | 3282 +-
>  gcc/config/aarch64/t-aarch64 |2 +-
>  gcc/pair-fusion.cc   | 3013 
>  gcc/pair-fusion.h|  189 ++
>  5 files changed, 3280 insertions(+), 3207 deletions(-)
>  create mode 100644 gcc/pair-fusion.cc
>  create mode 100644 gcc/pair-fusion.h
> 
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index a7f15694c34..643342f623d 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -1563,6 +1563,7 @@ OBJS = \
>   ipa-strub.o \
>   ipa.o \
>   ira.o \
> + pair-fusion.o \
>   ira-build.o \
>   ira-costs.o \
>   ira-conflicts.o \
> diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc 
> b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> index 085366cdf68..612f62060bc 100644
> --- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
> +++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> @@ -40,262 +40,13 @@
>  
>  using namespace rtl_ssa;

I think we should drop this, since the public interface and remaining
backend code in this file is independent of RTL-SSA.  I think you should
also drop the include of "rtl-ssa.h" from this file.  These two
changes will force you to get the header file (pair-fusion.h) right.

With these changes we can also significantly thin out the include list
in this file.  The current set of includes is:

#define INCLUDE_ALGORITHM
#define INCLUDE_FUNCTIONAL
#define INCLUDE_LIST
#define INCLUDE_TYPE_TRAITS
#include "config.h"
#include "system.h"
#include "coretypes.h"
#include "backend.h"
#include "rtl.h"
#include "df.h"
#include "rtl-iter.h"
#include "rtl-ssa.h"
#include "cfgcleanup.h"
#include "tree-pass.h"
#include "ordered-hash-map.h"
#include "tree-dfa.h"
#include "fold-const.h"
#include "tree-hash-traits.h"
#include "print-tree.h"
#include "insn-attr.h"

I think instead the following should be enough for this file:

#include "config.h"
#include "system.h"
#include "coretypes.h"
#include "backend.h"
#include "rtl.h"
#include "memmodel.h"
#include "emit-rtl.h"
#include "tm_p.h"
#include "rtl-iter.h"
#include "tree-pass.h"
#include "insn-attr.h"
#include "pair-fusion.h"

>  
> +#include "pair-fusion.h"
> +
>  static constexpr HOST_WIDE_INT LDP_IMM_BITS = 7;
>  static constexpr HOST_WIDE_INT LDP_IMM_SIGN_BIT = (1 << (LDP_IMM_BITS - 1));
>  static constexpr HOST_WIDE_INT LDP_MAX_IMM = LDP_IMM_SIGN_BIT - 1;
>  static constexpr HOST_WIDE_INT LDP_MIN_IMM = -LDP_MAX_IMM - 1;
>  
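
As a quick sanity check on the arithmetic in these constants, here is a standalone sketch.  The int64_t alias stands in for GCC's HOST_WIDE_INT, and ldp_imm_in_range_p is a hypothetical helper that is not part of the patch.

```cpp
#include <cstdint>

using HOST_WIDE_INT = int64_t; // stand-in for GCC's HOST_WIDE_INT

static constexpr HOST_WIDE_INT LDP_IMM_BITS = 7;
static constexpr HOST_WIDE_INT LDP_IMM_SIGN_BIT = (1 << (LDP_IMM_BITS - 1));
static constexpr HOST_WIDE_INT LDP_MAX_IMM = LDP_IMM_SIGN_BIT - 1;
static constexpr HOST_WIDE_INT LDP_MIN_IMM = -LDP_MAX_IMM - 1;

// A 7-bit two's-complement field covers [-64, 63].
static_assert (LDP_IMM_SIGN_BIT == 64, "sign bit is 1 << 6");
static_assert (LDP_MAX_IMM == 63, "largest 7-bit signed value");
static_assert (LDP_MIN_IMM == -64, "smallest 7-bit signed value");

// Hypothetical helper: does IMM fit in the signed 7-bit field?
constexpr bool
ldp_imm_in_range_p (HOST_WIDE_INT imm)
{
  return imm >= LDP_MIN_IMM && imm <= LDP_MAX_IMM;
}
```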

> diff --git a/gcc/config/aarch64/t-aarch64 b/gcc/config/aarch64/t-aarch64
> index 78713558e7d..bdada08be70 100644
> --- a/gcc/config/aarch64/t-aarch64
> +++ b/gcc/config/aarch64/t-aarch64
> @@ -203,7 +203,7 @@ aarch64-early-ra.o: 
> $(srcdir)/config/aarch64/aarch64-early-ra.cc \
>  aarch64-ldp-fusion.o: $(srcdir)/config/aarch64/aarch64-ldp-fusion.cc \
>  $(CONFIG_H) $(SYSTEM_H) $(CORETYPES_H) $(BACKEND_H) $(RTL_H) $(DF_H) \
>  $(RTL_SSA_H) cfgcleanup.h tree-pass.h ordered-hash-map.h tree-dfa.h \
> -fold-c

Re: [Patch, aarch64, middle-end] Move pair_fusion pass from aarch64 to middle-end

2024-05-21 Thread Alex Coplan
On 20/05/2024 21:50, Ajit Agarwal wrote:
> Hello Alex/Richard:
> 
> Move pair fusion pass from aarch64-ldp-fusion.cc to middle-end
> to support multiple targets.
> 
> Common infrastructure of load store pair fusion is divided into
> target independent and target dependent code.
> 
> Target independent code is structured in the following files.
> gcc/pair-fusion.h
> gcc/pair-fusion.cc
> 
> Target independent code is the Generic code with pure virtual
> function to interface between target independent and dependent
> code.
> 
> Bootstrapped and regtested on aarch64-linux-gnu.
> 
> Thanks & Regards
> Ajit
> 
> aarch64, middle-end: Move pair_fusion pass from aarch64 to middle-end
> 
> Move pair fusion pass from aarch64-ldp-fusion.cc to middle-end
> to support multiple targets.
> 
> Common infrastructure of load store pair fusion is divided into
> target independent and target dependent code.
> 
> Target independent code is structured in the following files.
> gcc/pair-fusion.h
> gcc/pair-fusion.cc
> 
> Target independent code is the Generic code with pure virtual
> function to interface between target independent and dependent
> code.
> 
> 2024-05-20  Ajit Kumar Agarwal  
> 
> gcc/ChangeLog:
> 
>   * pair-fusion.h: Generic header code for load store fusion
>   that can be shared across different architectures.
>   * pair-fusion.cc: Generic source code implementation for
>   load store fusion that can be shared across different architectures.
>   * Makefile.in: Add new executable pair-fusion.o
>   * config/aarch64/aarch64-ldp-fusion.cc: Target specific
>   code for load store fusion of aarch64.

Apologies for missing this in the last review but you'll also need to
update gcc/config/aarch64/t-aarch64 to add a dependency on pair-fusion.h
for aarch64-ldp-fusion.o.

Thanks,
Alex

> ---
>  gcc/Makefile.in  |1 +
>  gcc/config/aarch64/aarch64-ldp-fusion.cc | 3303 +-
>  gcc/pair-fusion.cc   | 2852 +++
>  gcc/pair-fusion.h|  340 +++
>  4 files changed, 3268 insertions(+), 3228 deletions(-)
>  create mode 100644 gcc/pair-fusion.cc
>  create mode 100644 gcc/pair-fusion.h



Re: [Patch, aarch64, middle-end] Move pair_fusion pass from aarch64 to middle-end

2024-05-21 Thread Alex Coplan
On 21/05/2024 16:02, Ajit Agarwal wrote:
> Hello Alex:
> 
> On 21/05/24 1:16 am, Alex Coplan wrote:
> > On 20/05/2024 18:44, Alex Coplan wrote:
> >> Hi Ajit,
> >>
> >> On 20/05/2024 21:50, Ajit Agarwal wrote:
> >>> Hello Alex/Richard:
> >>>
> >>> Move pair fusion pass from aarch64-ldp-fusion.cc to middle-end
> >>> to support multiple targets.
> >>>
> >>> Common infrastructure of load store pair fusion is divided into
> >>> target independent and target dependent code.
> >>>
> >>> Target independent code is structured in the following files.
> >>> gcc/pair-fusion.h
> >>> gcc/pair-fusion.cc
> >>>
> >>> Target independent code is the Generic code with pure virtual
> >>> function to interface betwwen target independent and dependent
> >>> code.
> >>>
> >>> Bootstrapped and regtested on aarch64-linux-gnu.
> >>>
> >>> Thanks & Regards
> >>> Ajit
> >>>
> >>> aarch64, middle-end: Move pair_fusion pass from aarch64 to middle-end
> >>>
> >>> Move pair fusion pass from aarch64-ldp-fusion.cc to middle-end
> >>> to support multiple targets.
> >>>
> >>> Common infrastructure of load store pair fusion is divided into
> >>> target independent and target dependent code.
> >>>
> >>> Target independent code is structured in the following files.
> >>> gcc/pair-fusion.h
> >>> gcc/pair-fusion.cc
> >>>
> >>> Target independent code is the Generic code with pure virtual
> >>> function to interface betwwen target independent and dependent
> >>> code.
> >>>
> >>> 2024-05-20  Ajit Kumar Agarwal  
> >>>
> >>> gcc/ChangeLog:
> >>>
> >>>   * pair-fusion.h: Generic header code for load store fusion
> >>
> >> Insert "pair" before fusion?
> 
> Addressed in v1 of the patch.
> >>
> >>>   that can be shared across different architectures.
> >>>   * pair-fusion.cc: Generic source code implementation for
> >>>   load store fusion that can be shared across different architectures.
> >>
> >> Likewise.
> Addressed in v1 of the patch.
> >>
> >>>   * Makefile.in: Add new executable pair-fusion.o
> >>
> >> It's not an executable but an object file.
> >>
> >>>   * config/aarch64/aarch64-ldp-fusion.cc: Target specific
> >>>   code for load store fusion of aarch64.
> >>
> >> I guess this should say something like: "Delete generic code and move it
> >> to pair-fusion.cc in the middle-end."
> >>
> >> I've left some comments below on the header file.  The rest of the patch
> >> looks pretty good to me.  I tried diffing the original contents of
> >> aarch64-ldp-fusion.cc with pair-fusion.cc, and that looks as expected.
> >>
> > 
> > 
> > 
> >>> diff --git a/gcc/pair-fusion.h b/gcc/pair-fusion.h
> >>> new file mode 100644
> >>> index 000..00f6d3e149a
> >>> --- /dev/null
> >>> +++ b/gcc/pair-fusion.h
> >>> @@ -0,0 +1,340 @@
> >>> +// Pair Mem fusion generic header file.
> >>> +// Copyright (C) 2024 Free Software Foundation, Inc.
> >>> +//
> >>> +// This file is part of GCC.
> >>> +//
> >>> +// GCC is free software; you can redistribute it and/or modify it
> >>> +// under the terms of the GNU General Public License as published by
> >>> +// the Free Software Foundation; either version 3, or (at your option)
> >>> +// any later version.
> >>> +//
> >>> +// GCC is distributed in the hope that it will be useful, but
> >>> +// WITHOUT ANY WARRANTY; without even the implied warranty of
> >>> +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> >>> +// General Public License for more details.
> >>> +//
> >>> +// You should have received a copy of the GNU General Public License
> >>> +// along with GCC; see the file COPYING3.  If not see
> >>> +// <http://www.gnu.org/licenses/>.
> >>> +
> >>> +#define INCLUDE_ALGORITHM
> >>> +#define INCLUDE_FUNCTIONAL
> >>> +#define INCLUDE_LIST
> >>> +#define INCLUDE_TYPE_TRAITS
> >>> +#include "config.h"

Re: [Patch, aarch64, middle-end] Move pair_fusion pass from aarch64 to middle-end

2024-05-20 Thread Alex Coplan
On 20/05/2024 18:44, Alex Coplan wrote:
> Hi Ajit,
> 
> On 20/05/2024 21:50, Ajit Agarwal wrote:
> > Hello Alex/Richard:
> > 
> > Move pair fusion pass from aarch64-ldp-fusion.cc to middle-end
> > to support multiple targets.
> > 
> > Common infrastructure of load store pair fusion is divided into
> > target independent and target dependent code.
> > 
> > Target independent code is structured in the following files.
> > gcc/pair-fusion.h
> > gcc/pair-fusion.cc
> > 
> > Target independent code is the Generic code with pure virtual
> > function to interface between target independent and dependent
> > code.
> > 
> > Bootstrapped and regtested on aarch64-linux-gnu.
> > 
> > Thanks & Regards
> > Ajit
> > 
> > aarch64, middle-end: Move pair_fusion pass from aarch64 to middle-end
> > 
> > Move pair fusion pass from aarch64-ldp-fusion.cc to middle-end
> > to support multiple targets.
> > 
> > Common infrastructure of load store pair fusion is divided into
> > target independent and target dependent code.
> > 
> > Target independent code is structured in the following files.
> > gcc/pair-fusion.h
> > gcc/pair-fusion.cc
> > 
> > Target independent code is the Generic code with pure virtual
> > function to interface between target independent and dependent
> > code.
> > 
> > 2024-05-20  Ajit Kumar Agarwal  
> > 
> > gcc/ChangeLog:
> > 
> > * pair-fusion.h: Generic header code for load store fusion
> 
> Insert "pair" before fusion?
> 
> > that can be shared across different architectures.
> > * pair-fusion.cc: Generic source code implementation for
> > load store fusion that can be shared across different architectures.
> 
> Likewise.
> 
> > * Makefile.in: Add new executable pair-fusion.o
> 
> It's not an executable but an object file.
> 
> > * config/aarch64/aarch64-ldp-fusion.cc: Target specific
> > code for load store fusion of aarch64.
> 
> I guess this should say something like: "Delete generic code and move it
> to pair-fusion.cc in the middle-end."
> 
> I've left some comments below on the header file.  The rest of the patch
> looks pretty good to me.  I tried diffing the original contents of
> aarch64-ldp-fusion.cc with pair-fusion.cc, and that looks as expected.
> 



> > diff --git a/gcc/pair-fusion.h b/gcc/pair-fusion.h
> > new file mode 100644
> > index 000..00f6d3e149a
> > --- /dev/null
> > +++ b/gcc/pair-fusion.h
> > @@ -0,0 +1,340 @@
> > +// Pair Mem fusion generic header file.
> > +// Copyright (C) 2024 Free Software Foundation, Inc.
> > +//
> > +// This file is part of GCC.
> > +//
> > +// GCC is free software; you can redistribute it and/or modify it
> > +// under the terms of the GNU General Public License as published by
> > +// the Free Software Foundation; either version 3, or (at your option)
> > +// any later version.
> > +//
> > +// GCC is distributed in the hope that it will be useful, but
> > +// WITHOUT ANY WARRANTY; without even the implied warranty of
> > +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > +// General Public License for more details.
> > +//
> > +// You should have received a copy of the GNU General Public License
> > +// along with GCC; see the file COPYING3.  If not see
> > +// <http://www.gnu.org/licenses/>.
> > +
> > +#define INCLUDE_ALGORITHM
> > +#define INCLUDE_FUNCTIONAL
> > +#define INCLUDE_LIST
> > +#define INCLUDE_TYPE_TRAITS
> > +#include "config.h"
> > +#include "system.h"
> > +#include "coretypes.h"
> > +#include "backend.h"
> > +#include "rtl.h"
> > +#include "df.h"
> > +#include "rtl-iter.h"
> > +#include "rtl-ssa.h"
> 
> I'm not sure how desirable this is, but you might be able to
> forward-declare RTL-SSA types like this:
> 
> class def_info;
> class insn_info;
> class insn_range_info;
> 
> thus removing the need to include the header here, since the interface
> only refers to these types by pointer or reference.
> 
> Richard: please say if you'd prefer keeping the include.
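
To illustrate the forward-declaration idea, here is a minimal standalone
sketch.  It is not GCC code: rtl_ssa_sketch, pair_fusion_sketch,
should_pair_p and toy_fusion are all made-up stand-ins.  In the real
header the forward declarations would presumably need to sit inside the
rtl_ssa namespace, which is an assumption on my part.  The point is only
that an interface which mentions a type by pointer or reference compiles
against an incomplete type, and only the .cc file needs the definition.

```cpp
#include <cassert>

// Header side (hypothetical stand-in for a header like pair-fusion.h).
// Forward declaration only: the type is incomplete at this point.
namespace rtl_ssa_sketch
{
  class insn_info;
}

// The interface mentions insn_info only by pointer, so the incomplete
// type suffices and no header defining it has to be included here.
struct pair_fusion_sketch
{
  virtual ~pair_fusion_sketch () = default;
  virtual bool should_pair_p (const rtl_ssa_sketch::insn_info *first,
                              const rtl_ssa_sketch::insn_info *second) = 0;
};

// Source side (stand-in for a .cc file): the full definition is visible
// from here on, so members can be used.
namespace rtl_ssa_sketch
{
  class insn_info
  {
  public:
    explicit insn_info (int uid) : m_uid (uid) {}
    int uid () const { return m_uid; }

  private:
    int m_uid;
  };
}

// Toy override: pair only insns with consecutive uids.
struct toy_fusion : pair_fusion_sketch
{
  bool should_pair_p (const rtl_ssa_sketch::insn_info *first,
                      const rtl_ssa_sketch::insn_info *second) override
  {
    return second->uid () == first->uid () + 1;
  }
};

// Small usage check for the sketch.
inline void
forward_decl_check ()
{
  rtl_ssa_sketch::insn_info a (1), b (2), c (5);
  toy_fusion f;
  assert (f.should_pair_p (&a, &b));
  assert (!f.should_pair_p (&a, &c));
}
```

The trade-off is that any inline member function in the header that
dereferences one of these pointers would still need the full definition,
so this only works while the header sticks to declarations.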
> 
> > +#include "cfgcleanup.h"
> > +#include "tree-pass.h"
> > +#include "ordered-hash-map.h"
> > +#include "tree-dfa.h"
> > +#include "fold-const.h"
> > +#include "tree-hash-traits.h"
> > +#inclu

Re: [Patch, aarch64] v6: Preparatory patch to place target independent and dependent changed code in one file

2024-05-17 Thread Alex Coplan
Hi Ajit,

On 17/05/2024 18:05, Ajit Agarwal wrote:
> Hello Alex:
> 
> On 16/05/24 10:21 pm, Alex Coplan wrote:
> > Hi Ajit,
> > 
> > Thanks a lot for working through the review feedback.
> > 
> 
> Thanks a lot for reviewing the code and approving the patch.

To be clear, I didn't approve the patch because I can't; I just said
that it looks good to me.  You need an AArch64 maintainer (probably
Richard S) to approve it.

> 
> > The patch LGTM with the two minor suggested changes below.  I can't
> > approve the patch, though, so you'll need an OK from Richard S.
> > 
> > Also, I'm not sure if it makes sense to apply the patch in isolation, it
> > might make more sense to only apply it in series with follow-up patches to:
> >  - Finish renaming any bits of the generic code that need renaming (I
> >guess we'll want to rename at least ldp_bb_info to something else,
> >probably there are other bits too).
> >  - Move the generic parts out of gcc/config/aarch64 to a .cc file in the
> >middle-end.
> >
> 
> Addressed in separate patch sent.

Hmm, that doesn't look right.  You sent a single patch here:
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652028.html
which looks to squash the work you've done in this patch together with
the move.

What I expect to see is a patch series, as follows:

[PATCH 1/3] aarch64: Split generic code from aarch64 code in ldp fusion
[PATCH 2/3] aarch64: Further renaming of generic code
[PATCH 3/3] aarch64, middle-end: Move pair_fusion pass from aarch64 to middle-end

where 1/3 is exactly the patch that I reviewed above with the two
(minor) requested changes (plus any changes requested by Richard), 2/3
(optionally) does further renaming to use generic terminology in the
generic code where needed/desired, and 3/3 does a straight cut/paste
move of code into pair-fusion.h and pair-fusion.cc, with no other
changes (save for perhaps a Makefile change and adding an include in
aarch64-ldp-fusion.cc).

Arguably you could split this even further and do the move of the
pair_fusion class to the new header in a separate patch prior to the
final move.

N.B. (IMO) the patches should be presented like this both for review and
(if approved) when committing.

Richard S may have further suggestions on how to split the patches /
make them more tractable to review, but I think this is the bare minimum
that is needed.

Hope that makes sense.

Thanks,
Alex

>  
> > I'll let Richard S make the final judgement on that.  I don't really
> > mind either way.
> 
> Sure.
> 
> Thanks & Regards
> Ajit
> > 
> > On 15/05/2024 15:06, Ajit Agarwal wrote:
> >> Hello Alex/Richard:
> >>
> >> All review comments are addressed.
> >>
> >> Common infrastructure of load store pair fusion is divided into target
> >> independent and target dependent changed code.
> >>
> >> Target independent code is the Generic code with pure virtual function
> >> to interface between target independent and dependent code.
> >>
> >> Target dependent code is the implementation of pure virtual function for
> >> aarch64 target and the call to target independent code.
> >>
> >> Bootstrapped and regtested on aarch64-linux-gnu.
> >>
> >> Thanks & Regards
> >> Ajit
> >>
> >> aarch64: Preparatory patch to place target independent and
> >> dependent changed code in one file
> >>
> >> Common infrastructure of load store pair fusion is divided into target
> >> independent and target dependent changed code.
> >>
> >> Target independent code is the Generic code with pure virtual function
> >> to interface between target independent and dependent code.
> >>
> >> Target dependent code is the implementation of pure virtual function for
> >> aarch64 target and the call to target independent code.
> >>
> >> 2024-05-15  Ajit Kumar Agarwal  
> >>
> >> gcc/ChangeLog:
> >>
> >>* config/aarch64/aarch64-ldp-fusion.cc: Place target
> >>independent and dependent changed code.
> >> ---
> >>  gcc/config/aarch64/aarch64-ldp-fusion.cc | 533 +++
> >>  1 file changed, 357 insertions(+), 176 deletions(-)
> >>
> >> diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> >> index 1d9caeab05d..429e532ea3b 100644
> >> --- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
> >> +++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> >> @@ -138,6 +138,225 @@ struct alt_base
> >

Re: [Patch, aarch64] v6: Preparatory patch to place target independent and dependent changed code in one file

2024-05-16 Thread Alex Coplan
Hi Ajit,

Thanks a lot for working through the review feedback.

The patch LGTM with the two minor suggested changes below.  I can't
approve the patch, though, so you'll need an OK from Richard S.

Also, I'm not sure if it makes sense to apply the patch in isolation; it
might make more sense to only apply it in series with follow-up patches to:
 - Finish renaming any bits of the generic code that need renaming (I
   guess we'll want to rename at least ldp_bb_info to something else,
   probably there are other bits too).
 - Move the generic parts out of gcc/config/aarch64 to a .cc file in the
   middle-end.

I'll let Richard S make the final judgement on that.  I don't really
mind either way.

On 15/05/2024 15:06, Ajit Agarwal wrote:
> Hello Alex/Richard:
> 
> All review comments are addressed.
> 
> Common infrastructure of load store pair fusion is divided into target
> independent and target dependent changed code.
> 
> Target independent code is the Generic code with pure virtual function
> to interface between target independent and dependent code.
> 
> Target dependent code is the implementation of pure virtual function for
> aarch64 target and the call to target independent code.
> 
> Bootstrapped and regtested on aarch64-linux-gnu.
> 
> Thanks & Regards
> Ajit
> 
> aarch64: Preparatory patch to place target independent and
> dependent changed code in one file
> 
> Common infrastructure of load store pair fusion is divided into target
> independent and target dependent changed code.
> 
> Target independent code is the Generic code with pure virtual function
> to interface between target independent and dependent code.
> 
> Target dependent code is the implementation of pure virtual function for
> aarch64 target and the call to target independent code.
> 
> 2024-05-15  Ajit Kumar Agarwal  
> 
> gcc/ChangeLog:
> 
>   * config/aarch64/aarch64-ldp-fusion.cc: Place target
>   independent and dependent changed code.
> ---
>  gcc/config/aarch64/aarch64-ldp-fusion.cc | 533 +++
>  1 file changed, 357 insertions(+), 176 deletions(-)
> 
> diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> index 1d9caeab05d..429e532ea3b 100644
> --- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
> +++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> @@ -138,6 +138,225 @@ struct alt_base
>poly_int64 offset;
>  };
>  
> +// Virtual base class for load/store walkers used in alias analysis.
> +struct alias_walker
> +{
> +  virtual bool conflict_p (int &budget) const = 0;
> +  virtual insn_info *insn () const = 0;
> +  virtual bool valid () const = 0;
> +  virtual void advance () = 0;
> +};
> +
> +// When querying handle_writeback_opportunities, this enum is used to
> +// qualify which opportunities we are asking about.
> +enum class writeback {
> +  // Only those writeback opportunities that arise from existing
> +  // auto-increment accesses.
> +  EXISTING,

Very minor nit: I think an extra blank line here would be nice for readability
now that the enumerators have comments above.

> +  // All writeback opportunities including those that involve folding
> +  // base register updates into a non-writeback pair.
> +  ALL
> +};
> +

Can we have a block comment here which describes the purpose of the
class and how it fits together with the target?  Something like the
following would do:

// This class can be overridden by targets to give a pass that fuses
// adjacent loads and stores into load/store pair instructions.
//
// The target can override the various virtual functions to customize
// the behaviour of the pass as appropriate for the target.
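
For what it's worth, the shape that comment describes can be sketched in
a few standalone lines.  This is only an illustration under assumptions:
mode_sketch stands in for machine_mode, and count_pairable and
toy_target_fusion are made-up names that don't exist in the patch.  The
generic driver is written once against the base class and the target
supplies the policy through the virtual function.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for GCC's machine_mode.
enum class mode_sketch { SI, DI, TI };

// Generic side: overridden by targets to customize the pass.
struct pair_fusion_sketch
{
  virtual ~pair_fusion_sketch () = default;

  // Return true if we should consider forming pairs from memory
  // accesses with operand mode MODE.  Every target must answer this.
  virtual bool pair_operand_mode_ok_p (mode_sketch mode) = 0;

  // Generic driver code (made up for this sketch): written once, it
  // only talks to the target through the virtual function above.
  int count_pairable (const std::vector<mode_sketch> &modes)
  {
    int n = 0;
    for (mode_sketch m : modes)
      if (pair_operand_mode_ok_p (m))
        n++;
    return n;
  }
};

// Target side: a toy target that refuses to pair TI accesses.
struct toy_target_fusion : pair_fusion_sketch
{
  bool pair_operand_mode_ok_p (mode_sketch mode) override
  {
    return mode != mode_sketch::TI;
  }
};

// Small usage check for the sketch.
inline void
pair_fusion_check ()
{
  toy_target_fusion pass;
  assert (pass.count_pairable ({mode_sketch::SI, mode_sketch::TI,
                                mode_sketch::DI}) == 2);
}
```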

> +struct pair_fusion {
> +  pair_fusion ()
> +  {
> +calculate_dominance_info (CDI_DOMINATORS);
> +df_analyze ();
> +crtl->ssa = new rtl_ssa::function_info (cfun);
> +  };
> +
> +  // Given:
> +  // - an rtx REG_OP, the non-memory operand in a load/store insn,
> +  // - a machine_mode MEM_MODE, the mode of the MEM in that insn, and
> +  // - a boolean LOAD_P (true iff the insn is a load), then:
> +  // return true if the access should be considered an FP/SIMD access.
> +  // Such accesses are segregated from GPR accesses, since we only want
> +  // to form pairs for accesses that use the same register file.
> +  virtual bool fpsimd_op_p (rtx, machine_mode, bool)
> +  {
> +return false;
> +  }
> +
> +  // Return true if we should consider forming pairs from memory
> +  // accesses with operand mode MODE at this stage in compilation.
> +  virtual bool pair_operand_mode_ok_p (machine_mode mode) = 0;
> +
> +  // Return true iff REG_OP is a suitable register operand for a paired
> +  // memory access, where LOAD_P is true if we're asking about loads and
> +  // false for stores.  MODE gives the mode of the operand.
> +  virtual bool pair_reg_operand_ok_p (bool load_p, rtx reg_op,
> +   machine_mode mode) = 0;
> +
> +  // Return alias check limit.
> +  // This is needed to avoi

Re: [Patch, aarch64] v4: Preparatory patch to place target independent and dependent changed code in one file

2024-05-14 Thread Alex Coplan
Hi Ajit,

Please can you pay careful attention to the review comments?

In particular, you have ignored my comment about changing the access of
member functions in ldp_bb_info several times now (on at least three
patch reviews).

Likewise on multiple occasions you've only partially implemented a piece
of review feedback (e.g. applying the "override" keyword to virtual
overrides).

That all makes it rather tiresome to review your patches.

Also, I realise I should have mentioned this on a previous revision of
this patch, but I thought we previously agreed (with Richard S) to split
out the renaming in existing code (e.g. ldp/stp -> "paired access" and
so on) to a separate patch?  That would make this easier to review.

On 14/05/2024 15:08, Ajit Agarwal wrote:
> Hello Alex/Richard:
> 
> All comments are addressed.
> 
> Common infrastructure of load store pair fusion is divided into target
> independent and target dependent changed code.
> 
> Target independent code is the Generic code with pure virtual function
> to interface between target independent and dependent code.
> 
> Target dependent code is the implementation of pure virtual function for
> aarch64 target and the call to target independent code.
> 
> Bootstrapped on aarch64-linux-gnu.
> 
> Thanks & Regards
> Ajit
> 
> 
> 
> aarch64: Preparatory patch to place target independent and
> dependent changed code in one file
> 
> Common infrastructure of load store pair fusion is divided into target
> independent and target dependent changed code.
> 
> Target independent code is the Generic code with pure virtual function
> to interface between target independent and dependent code.
> 
> Target dependent code is the implementation of pure virtual function for
> aarch64 target and the call to target independent code.
> 
> 2024-05-14  Ajit Kumar Agarwal  
> 
> gcc/ChangeLog:
> 
>   * config/aarch64/aarch64-ldp-fusion.cc: Place target
>   independent and dependent changed code.
> ---
>  gcc/config/aarch64/aarch64-ldp-fusion.cc | 526 +++
>  1 file changed, 345 insertions(+), 181 deletions(-)
> 
> diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> index 1d9caeab05d..e6af4b0570a 100644
> --- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
> +++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> @@ -138,6 +138,210 @@ struct alt_base
>poly_int64 offset;
>  };
>  
> +// Virtual base class for load/store walkers used in alias analysis.
> +struct alias_walker
> +{
> +  virtual bool conflict_p (int &budget) const = 0;
> +  virtual insn_info *insn () const = 0;
> +  virtual bool valid () const = 0;
> +  virtual void advance () = 0;
> +};
> +
> +// This is used in handle_writeback_opportunities describing
> +// ALL if aarch64_ldp_writeback > 1 otherwise check
> +// EXISTING if aarch64_ldp_writeback.

Since this enum belongs to the generic interface, it's best if it is
described in general terms, i.e. the comment shouldn't refer to the
aarch64 param.

How about:

// When querying handle_writeback_opportunities, this enum is used to
// qualify which opportunities we are asking about.

then above the EXISTING enumerator, you could say:

  // Only those writeback opportunities that arise from existing
  // auto-increment accesses.

and for ALL, you could say:

  // All writeback opportunities including those that involve folding
  // base register updates into a non-writeback pair.

> +enum class writeback {
> +  ALL,
> +  EXISTING
> +};

Also, sorry for the very minor nit, but I think it is more logical if we
flip the order of the enumerators here, i.e. EXISTING should come first.
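
Putting the suggestions above together, the enum could end up looking
roughly like the sketch below.  The helper handle_writeback_sketch and
its integer LEVEL argument are made up for illustration: they only mimic
the "> 1 means ALL, nonzero means EXISTING" reading of the original
comment, and a real target would consult its own tuning parameter.

```cpp
#include <cassert>

// When querying handle_writeback_opportunities, this enum is used to
// qualify which opportunities we are asking about.
enum class writeback
{
  // Only those writeback opportunities that arise from existing
  // auto-increment accesses.
  EXISTING,

  // All writeback opportunities including those that involve folding
  // base register updates into a non-writeback pair.
  ALL
};

// Hypothetical target-side answer driven by an integer tuning LEVEL:
// 0 handles nothing, 1 handles EXISTING only, anything above 1 handles
// ALL opportunities.
inline bool
handle_writeback_sketch (int level, writeback which)
{
  return which == writeback::ALL ? level > 1 : level > 0;
}

// With EXISTING first, the enumerators run from the narrower set of
// opportunities to the wider one.
static_assert (writeback::EXISTING < writeback::ALL,
               "EXISTING is expected to precede ALL");

// Small usage check for the sketch.
inline void
writeback_check ()
{
  assert (!handle_writeback_sketch (0, writeback::EXISTING));
  assert (handle_writeback_sketch (1, writeback::EXISTING));
  assert (!handle_writeback_sketch (1, writeback::ALL));
  assert (handle_writeback_sketch (2, writeback::ALL));
}
```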

> +
> +struct pair_fusion {
> +  pair_fusion ()
> +  {
> +calculate_dominance_info (CDI_DOMINATORS);
> +df_analyze ();
> +crtl->ssa = new rtl_ssa::function_info (cfun);
> +  };
> +
> +  // Given:
> +  // - an rtx REG_OP, the non-memory operand in a load/store insn,
> +  // - a machine_mode MEM_MODE, the mode of the MEM in that insn, and
> +  // - a boolean LOAD_P (true iff the insn is a load), then:
> +  // return true if the access should be considered an FP/SIMD access.
> +  // Such accesses are segregated from GPR accesses, since we only want
> +  // to form pairs for accesses that use the same register file.
> +  virtual bool fpsimd_op_p (rtx, machine_mode, bool)
> +  {
> +return false;
> +  }
> +
> +  // Return true if we should consider forming ldp/stp insns from memory

Replace "ldp/stp insns" with "pairs" here, since this is the generic
interface.

> +  // accesses with operand mode MODE at this stage in compilation.
> +  virtual bool pair_operand_mode_ok_p (machine_mode mode) = 0;
> +
> +  // Return true iff REG_OP is a suitable register operand for a paired
> +  // memory access, where LOAD_P is true if we're asking about loads and
> +  // false for stores.  MODE gives the mode of the operand.
> +  virtual bool pair_reg_operand_ok_p (bool load_p, rtx reg_op,
> +   mach

Re: [Patch, aarch64] v3: Preparatory patch to place target independent and dependent changed code in one file

2024-05-13 Thread Alex Coplan
Hi Ajit,

Why did you send three mails for this revision of the patch?  If you're
going to send a new revision of the patch you should increment the
version number and outline the changes / reasons for the new revision.

Mostly the comments below are just style nits and things you missed from
the last review(s) (please try not to miss so many in the future).

On 09/05/2024 17:06, Ajit Agarwal wrote:
> Hello Alex/Richard:
> 
> All review comments are addressed.
> 
> Common infrastructure of load store pair fusion is divided into target
> independent and target dependent changed code.
> 
> Target independent code is the Generic code with pure virtual function
> to interface between target independent and dependent code.
> 
> Target dependent code is the implementation of pure virtual function for
> aarch64 target and the call to target independent code.
> 
> Bootstrapped on aarch64-linux-gnu.
> 
> Thanks & Regards
> Ajit
> 
> 
> 
> aarch64: Preparatory patch to place target independent and
> dependent changed code in one file
> 
> Common infrastructure of load store pair fusion is divided into target
> independent and target dependent changed code.
> 
> Target independent code is the Generic code with pure virtual function
> to interface between target independent and dependent code.
> 
> Target dependent code is the implementation of pure virtual function for
> aarch64 target and the call to target independent code.
> 
> 2024-05-09  Ajit Kumar Agarwal  
> 
> gcc/ChangeLog:
> 
>   * config/aarch64/aarch64-ldp-fusion.cc: Place target
>   independent and dependent changed code.
> ---
>  gcc/config/aarch64/aarch64-ldp-fusion.cc | 542 +++
>  1 file changed, 363 insertions(+), 179 deletions(-)
> 
> diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> index 1d9caeab05d..217790e111a 100644
> --- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
> +++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> @@ -138,6 +138,224 @@ struct alt_base
>poly_int64 offset;
>  };
>  
> +// Virtual base class for load/store walkers used in alias analysis.
> +struct alias_walker
> +{
> +  virtual bool conflict_p (int &budget) const = 0;
> +  virtual insn_info *insn () const = 0;
> +  virtual bool valid () const = 0;
> +  virtual void advance () = 0;
> +};
> +
> +enum class writeback{

You missed a nit here.  Space before '{'.

> +  ALL,
> +  EXISTING
> +};

You also missed adding comments for the enum, please see the review for v2:
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651074.html

> +
> +struct pair_fusion {
> +  pair_fusion ()
> +  {
> +calculate_dominance_info (CDI_DOMINATORS);
> +df_analyze ();
> +crtl->ssa = new rtl_ssa::function_info (cfun);
> +  };
> +
> +  // Given:
> +  // - an rtx REG_OP, the non-memory operand in a load/store insn,
> +  // - a machine_mode MEM_MODE, the mode of the MEM in that insn, and
> +  // - a boolean LOAD_P (true iff the insn is a load), then:
> +  // return true if the access should be considered an FP/SIMD access.
> +  // Such accesses are segregated from GPR accesses, since we only want
> +  // to form pairs for accesses that use the same register file.
> +  virtual bool fpsimd_op_p (rtx, machine_mode, bool)
> +  {
> +return false;
> +  }
> +
> +  // Return true if we should consider forming ldp/stp insns from memory
> +  // accesses with operand mode MODE at this stage in compilation.
> +  virtual bool pair_operand_mode_ok_p (machine_mode mode) = 0;
> +
> +  // Return true iff REG_OP is a suitable register operand for a paired
> +  // memory access, where LOAD_P is true if we're asking about loads and
> +  // false for stores.  MEM_MODE gives the mode of the operand.
> +  virtual bool pair_reg_operand_ok_p (bool load_p, rtx reg_op,
> +   machine_mode mode) = 0;

The comment needs updating since we changed the name of the last param,
i.e. s/MEM_MODE/MODE/.

> +
> +  // Return alias check limit.
> +  // This is needed to avoid unbounded quadratic behaviour when
> +  // performing alias analysis.
> +  virtual int pair_mem_alias_check_limit () = 0;
> +
> +  // Returns true if we should try to handle writeback opportunities
> +  // (not whether there are any).
> +  virtual bool handle_writeback_opportunities (enum writeback which) = 0 ;

Heh, the bit in parens from the v2 review probably doesn't need to go
into the comment here.

Also you should describe WHICH in the comment.

> +
> +  // Given BASE_MEM, the mem from the lower candidate access for a pair,
> +  // and LOAD_P (true if the access is a load), check if we should proceed
> +  // to form the pair given the target's code generation policy on
> +  // paired accesses.
> +  virtual bool pair_mem_ok_with_policy (rtx first_mem, bool load_p,
> + machine_mode mode) = 0;

The name of the first param needs updating in the prototype, i.e.
s/first_mem/base_mem/.  I think you missed the bit a

Re: [PATCH, aarch64] v2: Preparatory patch to place target independent and dependent changed code in one file

2024-05-08 Thread Alex Coplan
Hi Ajit,

Sorry for the long delay in reviewing this.

This is really getting there now.  I've left a few more comments below.

Apart from minor style things, the main remaining issues are mostly
around comments.  It's important to have good clear comments for
functions with the parameters (and return value, if any) clearly
described.  See https://www.gnu.org/prep/standards/standards.html#Comments

Note that this now needs a little rebasing, too.

On 21/04/2024 13:22, Ajit Agarwal wrote:
> Hello Alex/Richard:
> 
> All review comments are addressed and changes are made to transform_for_base
> function as per consensus.
> 
> Common infrastructure of load store pair fusion is divided into target
> independent and target dependent changed code.
> 
> Target independent code is the Generic code with pure virtual function
> to interface between target independent and dependent code.
> 
> Target dependent code is the implementation of pure virtual function for
> aarch64 target and the call to target independent code.
> 
> Bootstrapped on aarch64-linux-gnu.
> 
> Thanks & Regards
> Ajit
> 
> 
> 
> aarch64: Preparatory patch to place target independent and
> dependent changed code in one file
> 
> Common infrastructure of load store pair fusion is divided into target
> independent and target dependent changed code.
> 
> Target independent code is the Generic code with pure virtual function
> to interface between target independent and dependent code.
> 
> Target dependent code is the implementation of pure virtual function for
> aarch64 target and the call to target independent code.
> 
> 2024-04-21  Ajit Kumar Agarwal  
> 
> gcc/ChangeLog:
> 
>   * config/aarch64/aarch64-ldp-fusion.cc: Place target
>   independent and dependent changed code
> ---
>  gcc/config/aarch64/aarch64-ldp-fusion.cc | 484 +++
>  1 file changed, 325 insertions(+), 159 deletions(-)
> 
> diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> index 365dcf48b22..83a917e1d20 100644
> --- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
> +++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> @@ -138,6 +138,189 @@ struct alt_base
>poly_int64 offset;
>  };
>  
> +// Virtual base class for load/store walkers used in alias analysis.
> +struct alias_walker
> +{
> +  virtual bool conflict_p (int &budget) const = 0;
> +  virtual insn_info *insn () const = 0;
> +  virtual bool valid () const = 0;
> +  virtual void advance () = 0;
> +};
> +
> +// Forward declaration to be used inside the aarch64_pair_fusion class.
> +bool ldp_operand_mode_ok_p (machine_mode mode);
> +rtx aarch64_destructure_load_pair (rtx regs[2], rtx pattern);
> +rtx aarch64_destructure_store_pair (rtx regs[2], rtx pattern);
> +rtx aarch64_gen_writeback_pair (rtx wb_effect, rtx pair_mem, rtx regs[2],
> + bool load_p);

I don't think we want to change the linkage of these, they should be kept
static.

> +enum class writeback{

Nit: space before '{'

> +  WRITEBACK_PAIR_P,
> +  WRITEBACK
> +};

We're going to want some more descriptive names here.  How about
EXISTING and ALL?  Note that the WRITEBACK_ prefix isn't necessary as
you're using an enum class, so uses of the enumerators need to be
prefixed with writeback:: anyway.  A comment describing the usage of the
enum as well as comments above the enumerators describing their
interpretation would be good.
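
To illustrate the naming suggestion above, here is a rough sketch.  It is
only an assumption of how the result might look: the comment wording
reflects my reading of the two cases, and handles_new_opportunities_p is
a made-up helper that exists purely to show the writeback:: qualification
at a use site.

```cpp
// Sketch only: the enumerator descriptions are assumed, not taken from
// the patch.
// Qualifies which writeback opportunities a query is asking about.
enum class writeback
{
  // Only opportunities arising from accesses that already use writeback.
  EXISTING,

  // All opportunities, including ones the pass would newly create.
  ALL
};

// Hypothetical helper: uses must be spelled writeback::ALL etc., so a
// WRITEBACK_ prefix on the enumerators would be redundant.
inline bool
handles_new_opportunities_p (writeback which)
{
  return which == writeback::ALL;
}
```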

> +
> +struct pair_fusion {
> +

Nit: excess blank line.

> +  pair_fusion ()
> +  {
> +calculate_dominance_info (CDI_DOMINATORS);
> +df_analyze ();
> +crtl->ssa = new rtl_ssa::function_info (cfun);
> +  };

Can we have one blank line between the virtual functions, please?  I
think that would be more readable now that there are comments above each
of them.

> +  // Return true if GPR is FP or SIMD accesses, passed
> +  // with GPR reg_op rtx, machine mode and load_p.

It's slightly awkward trying to document this without the parameter
names, but I can see that you're omitting them to avoid unused parameter
warnings.  One option would be to introduce names in the comment as you
go.  How about this instead:

// Given:
// - an rtx REG_OP, the non-memory operand in a load/store insn,
// - a machine_mode MEM_MODE, the mode of the MEM in that insn, and
// - a boolean LOAD_P (true iff the insn is a load), then:
// return true if the access should be considered an FP/SIMD access.
// Such accesses are segregated from GPR accesses, since we only want to
// form pairs for accesses that use the same register file.

> +  virtual bool fpsimd_op_p (rtx, machine_mode, bool)
> +  {
> +return false;
> +  }
> +  // Return true if pair operand mode is ok. Passed with
> +  // machine mode.

Could you use something closer to the comment that is already above
ldp_operand_mode_ok_p?  The purpose of this predicate is really to test
the following: "is it a good idea (for optimization) to form paired
accesses with this operand mode at this stage in compilation?"

Re: [PATCH v2] aarch64: Preserve mem info on change of base for ldp/stp [PR114674]

2024-05-07 Thread Alex Coplan
On 12/04/2024 12:13, Richard Sandiford wrote:
> Alex Coplan  writes:
> > This is a v2 because I accidentally sent a WIP version of the patch last
> > time round which used replace_equiv_address instead of
> > replace_equiv_address_nv; that caused some ICEs (pointed out by the
> > Linaro CI) since pair addressing modes aren't a subset of the addresses
> > that are accepted by memory_operand for a given mode.
> >
> > This patch should otherwise be identical to v1.  Bootstrapped/regtested
> > on aarch64-linux-gnu (indeed this is the patch I actually tested last
> > time), is this version also OK for GCC 15?
> 
> OK, thanks.  Sorry for missing this in the first review.

Now pushed to trunk, thanks.

Alex

> 
> Richard
> 
> > Thanks,
> > Alex
> >
> > --- >8 ---
> >
> > The ldp/stp fusion pass can change the base of an access so that the two
> > accesses end up using a common base register.  So far we have been using
> > adjust_address_nv to do this, but this means that we don't preserve
> > other properties of the mem we're replacing.  It seems better to use
> > replace_equiv_address_nv, as this will preserve e.g. the MEM_ALIGN of the
> > mem whose address we're changing.
> >
> > The PR shows that by adjusting the other mem we lose alignment
> > information about the original access and therefore end up rejecting an
> > otherwise viable pair when --param=aarch64-stp-policy=aligned is passed.
> > This patch fixes that by using replace_equiv_address_nv instead.
> >
> > Notably this is the same approach as taken by
> > aarch64_check_consecutive_mems when a change of base is required, so
> > this at least makes things more consistent between the ldp fusion pass
> > and the peepholes.
> >
> > gcc/ChangeLog:
> >
> > PR target/114674
> > * config/aarch64/aarch64-ldp-fusion.cc (ldp_bb_info::fuse_pair):
> > Use replace_equiv_address_nv on a change of base instead of
> > adjust_address_nv on the other access.
> >
> > gcc/testsuite/ChangeLog:
> >
> > PR target/114674
> > * gcc.target/aarch64/pr114674.c: New test.
> >
> > diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc 
> > b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> > index 365dcf48b22..d07d79df06c 100644
> > --- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
> > +++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> > @@ -1730,11 +1730,11 @@ ldp_bb_info::fuse_pair (bool load_p,
> > adjust_amt *= -1;
> >  
> >rtx change_reg = XEXP (change_pat, !load_p);
> > -  machine_mode mode_for_mem = GET_MODE (change_mem);
> >rtx effective_base = drop_writeback (base_mem);
> > -  rtx new_mem = adjust_address_nv (effective_base,
> > -  mode_for_mem,
> > -  adjust_amt);
> > +  rtx adjusted_addr = plus_constant (Pmode,
> > +XEXP (effective_base, 0),
> > +adjust_amt);
> > +  rtx new_mem = replace_equiv_address_nv (change_mem, adjusted_addr);
> >rtx new_set = load_p
> > ? gen_rtx_SET (change_reg, new_mem)
> > : gen_rtx_SET (new_mem, change_reg);
> > diff --git a/gcc/testsuite/gcc.target/aarch64/pr114674.c 
> > b/gcc/testsuite/gcc.target/aarch64/pr114674.c
> > new file mode 100644
> > index 000..944784fd008
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/pr114674.c
> > @@ -0,0 +1,17 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O3 --param=aarch64-stp-policy=aligned" } */
> > +typedef struct {
> > +   unsigned int f1;
> > +   unsigned int f2;
> > +} test_struct;
> > +
> > +static test_struct ts = {
> > +   123, 456
> > +};
> > +
> > +void foo(void)
> > +{
> > +   ts.f2 = 36969 * (ts.f2 & 65535) + (ts.f1 >> 16);
> > +   ts.f1 = 18000 * (ts.f2 & 65535) + (ts.f2 >> 16);
> > +}
> > +/* { dg-final { scan-assembler-times "stp" 1 } } */


[PATCH] aarch64: Fix typo in aarch64-ldp-fusion.cc:combine_reg_notes [PR114936]

2024-05-03 Thread Alex Coplan
This fixes a typo in combine_reg_notes in the load/store pair fusion
pass.  As it stands, the calls to filter_notes store any
REG_FRAME_RELATED_EXPR to fr_expr with the following association:

 - i2 -> fr_expr[0]
 - i1 -> fr_expr[1]

but then the checks inside the following if statement expect the
opposite (more natural) association, i.e.:

 - i2 -> fr_expr[1]
 - i1 -> fr_expr[0]

this patch fixes the oversight by swapping the fr_expr indices in the
calls to filter_notes.

In hindsight it would probably have been less confusing / error-prone to
have combine_reg_notes take an array of two insns, then we wouldn't have
to mix 1-based and 0-based indexing as well as remembering to call
filter_notes in reverse program order.  This however is a minimal fix
for backporting purposes.
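
As a rough illustration of the array-based alternative mentioned above,
here is a self-contained toy sketch.  It is not the GCC code: toy_insn
and collect_fr_exprs are hypothetical names, and a string stands in for
the REG_FRAME_RELATED_EXPR note.  The point is only that with a single
0-based index, insns[i] always pairs with fr_expr[i], even though the
insns are still visited in reverse program order.

```cpp
#include <array>
#include <string>

// Toy stand-in for an insn: just its frame-related note (empty if none).
struct toy_insn
{
  std::string frame_related_expr;
};

// Visit the insns in reverse program order, storing the note for
// insns[i] in fr_expr[i] so that the indices can never be swapped.
inline std::array<std::string, 2>
collect_fr_exprs (const std::array<toy_insn, 2> &insns)
{
  std::array<std::string, 2> fr_expr;
  for (int i = 1; i >= 0; i--)
    fr_expr[i] = insns[i].frame_related_expr;
  return fr_expr;
}
```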

Many thanks to Matthew for spotting this typo and pointing it out to me.

Bootstrapped/regtested on aarch64-linux-gnu, OK for trunk and the 14
branch after the 14.1 release?

Thanks,
Alex

gcc/ChangeLog:

PR target/114936
* config/aarch64/aarch64-ldp-fusion.cc (combine_reg_notes):
Ensure insn iN has its REG_FRAME_RELATED_EXPR (if any) stored in
FR_EXPR[N-1], thus matching the correspondence expected by the
copy_rtx calls.
diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc b/gcc/config/aarch64/aarch64-ldp-fusion.cc
index 0bc225dae7b..12ef305d8d3 100644
--- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
+++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
@@ -1085,9 +1085,9 @@ combine_reg_notes (insn_info *i1, insn_info *i2, bool load_p)
   bool found_eh_region = false;
   rtx result = NULL_RTX;
   result = filter_notes (REG_NOTES (i2->rtl ()), result,
-&found_eh_region, fr_expr);
-  result = filter_notes (REG_NOTES (i1->rtl ()), result,
 &found_eh_region, fr_expr + 1);
+  result = filter_notes (REG_NOTES (i1->rtl ()), result,
+&found_eh_region, fr_expr);
 
   if (!load_p)
 {


Re: [PATCH V4 1/3] aarch64: Place target independent and dependent changed code in one file

2024-05-03 Thread Alex Coplan
On 22/04/2024 13:01, Ajit Agarwal wrote:
> Hello Alex:
> 
> On 14/04/24 10:29 pm, Ajit Agarwal wrote:
> > Hello Alex:
> > 
> > On 12/04/24 11:02 pm, Ajit Agarwal wrote:
> >> Hello Alex:
> >>
> >> On 12/04/24 8:15 pm, Alex Coplan wrote:
> >>> On 12/04/2024 20:02, Ajit Agarwal wrote:
> >>>> Hello Alex:
> >>>>
> >>>> On 11/04/24 7:55 pm, Alex Coplan wrote:
> >>>>> On 10/04/2024 23:48, Ajit Agarwal wrote:
> >>>>>> Hello Alex:
> >>>>>>
> >>>>>> On 10/04/24 7:52 pm, Alex Coplan wrote:
> >>>>>>> Hi Ajit,
> >>>>>>>
> >>>>>>> On 10/04/2024 15:31, Ajit Agarwal wrote:
> >>>>>>>> Hello Alex:
> >>>>>>>>
> >>>>>>>> On 10/04/24 1:42 pm, Alex Coplan wrote:
> >>>>>>>>> Hi Ajit,
> >>>>>>>>>
> >>>>>>>>> On 09/04/2024 20:59, Ajit Agarwal wrote:
> >>>>>>>>>> Hello Alex:
> >>>>>>>>>>
> >>>>>>>>>> On 09/04/24 8:39 pm, Alex Coplan wrote:
> >>>>>>>>>>> On 09/04/2024 20:01, Ajit Agarwal wrote:
> >>>>>>>>>>>> Hello Alex:
> >>>>>>>>>>>>
> >>>>>>>>>>>> On 09/04/24 7:29 pm, Alex Coplan wrote:
> >>>>>>>>>>>>> On 09/04/2024 17:30, Ajit Agarwal wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On 05/04/24 10:03 pm, Alex Coplan wrote:
> >>>>>>>>>>>>>>> On 05/04/2024 13:53, Ajit Agarwal wrote:
> >>>>>>>>>>>>>>>> Hello Alex/Richard:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> All review comments are incorporated.
> >>>>>>> 
> >>>>>>>>>>>>>>>> @@ -2890,8 +3018,8 @@ ldp_bb_info::merge_pairs (insn_list_t 
> >>>>>>>>>>>>>>>> &left_list,
> >>>>>>>>>>>>>>>>  // of accesses.  If we find two sets of adjacent accesses, 
> >>>>>>>>>>>>>>>> call
> >>>>>>>>>>>>>>>>  // merge_pairs.
> >>>>>>>>>>>>>>>>  void
> >>>>>>>>>>>>>>>> -ldp_bb_info::transform_for_base (int encoded_lfs,
> >>>>>>>>>>>>>>>> - access_group &group)
> >>>>>>>>>>>>>>>> +pair_fusion_bb_info::transform_for_base (int encoded_lfs,
> >>>>>>>>>>>>>>>> + access_group &group)
> >>>>>>>>>>>>>>>>  {
> >>>>>>>>>>>>>>>>const auto lfs = decode_lfs (encoded_lfs);
> >>>>>>>>>>>>>>>>const unsigned access_size = lfs.size;
> >>>>>>>>>>>>>>>> @@ -2909,7 +3037,7 @@ ldp_bb_info::transform_for_base (int 
> >>>>>>>>>>>>>>>> encoded_lfs,
> >>>>>>>>>>>>>>>> access.cand_insns,
> >>>>>>>>>>>>>>>> lfs.load_p,
> >>>>>>>>>>>>>>>> access_size);
> >>>>>>>>>>>>>>>> -  skip_next = access.cand_insns.empty ();
> >>>>>>>>>>>>>>>> +  skip_next = bb_state->cand_insns_empty_p 
> >>>>>>>>>>>>>>>> (access.cand_insns);
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> As above, why is this needed?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> For rs6000 we want to return always tru

cfgrtl: Fix MEM_EXPR update in duplicate_insn_chain [PR114924]

2024-05-02 Thread Alex Coplan
Hi,

The PR shows that when cfgrtl.cc:duplicate_insn_chain attempts to
update the MR_DEPENDENCE_CLIQUE information for a MEM_EXPR we can end up
accidentally dropping (e.g.) an ARRAY_REF from the MEM_EXPR and end up
replacing it with the underlying MEM_REF.  This leads to an
inconsistency in the MEM_EXPR information, and could lead to wrong code.

While the walk down to the MEM_REF is necessary to update
MR_DEPENDENCE_CLIQUE, we should use the outer tree expression for the
MEM_EXPR.  This patch does that.
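
To make the shape of the fix concrete, here is a self-contained toy
sketch.  It is not the cfgrtl.cc code: toy_tree and update_clique are
hypothetical names, with component_p standing in for
handled_component_p and clique for MR_DEPENDENCE_CLIQUE.  It shows the
general pattern of walking down to the base to update it while handing
back the outermost expression.

```cpp
// Toy expression node: a component reference wraps operand0; the
// innermost node (component_p == false) is the base memory reference.
struct toy_tree
{
  bool component_p;
  int clique;
  toy_tree *operand0;
};

// Update the clique on the base reference, but return the outermost
// expression so that no enclosing component reference is dropped.
inline toy_tree *
update_clique (toy_tree *expr, int newc)
{
  toy_tree *outer = expr;
  while (expr->component_p)
    expr = expr->operand0;
  expr->clique = newc;
  return outer;
}
```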

Bootstrapped/regtested on aarch64-linux-gnu, no regressions.  OK for
trunk?  What about backports?

Thanks,
Alex

gcc/ChangeLog:

PR rtl-optimization/114924
* cfgrtl.cc (duplicate_insn_chain): When updating MEM_EXPRs,
don't strip (e.g.) ARRAY_REFs from the final MEM_EXPR.
diff --git a/gcc/cfgrtl.cc b/gcc/cfgrtl.cc
index 304c429c99b..a5dc3512159 100644
--- a/gcc/cfgrtl.cc
+++ b/gcc/cfgrtl.cc
@@ -4432,12 +4432,13 @@ duplicate_insn_chain (rtx_insn *from, rtx_insn *to,
   since MEM_EXPR is shared so make a copy and
   walk to the subtree again.  */
tree new_expr = unshare_expr (MEM_EXPR (*iter));
+   tree orig_new_expr = new_expr;
if (TREE_CODE (new_expr) == WITH_SIZE_EXPR)
  new_expr = TREE_OPERAND (new_expr, 0);
while (handled_component_p (new_expr))
  new_expr = TREE_OPERAND (new_expr, 0);
MR_DEPENDENCE_CLIQUE (new_expr) = newc;
-   set_mem_expr (const_cast <rtx> (*iter), new_expr);
+   set_mem_expr (const_cast <rtx> (*iter), orig_new_expr);
  }
  }
}


Re: [PATCH] wwwdocs: Add note to changes.html for __has_{feature,extension}

2024-04-26 Thread Alex Coplan
On 26/04/2024 09:14, Marek Polacek wrote:
> On Fri, Apr 26, 2024 at 11:12:54AM +0100, Alex Coplan wrote:
> > On 17/04/2024 11:41, Marek Polacek wrote:
> > > On Mon, Apr 15, 2024 at 11:13:27AM +0100, Alex Coplan wrote:
> > > > On 04/04/2024 11:00, Alex Coplan wrote:
> > > > > Hi,
> > > > > 
> > > > > This adds a note to the GCC 14 release notes mentioning support for
> > > > > __has_{feature,extension} (PR60512).
> > > > > 
> > > > > OK to commit?
> > > > 
> > > > Ping.  Is this changes.html patch OK?  I guess it needs a review from 
> > > > C++
> > > > maintainers since it adds to the C++ section.
> > > > 
> > > > > 
> > > > > Thanks,
> > > > > Alex
> > > > 
> > > > > diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
> > > > > index 9fd224c1..facead8d 100644
> > > > > --- a/htdocs/gcc-14/changes.html
> > > > > +++ b/htdocs/gcc-14/changes.html
> > > > > @@ -242,6 +242,12 @@ a work-in-progress.
> > > > >constinit and optimized dynamic 
> > > > > initialization
> > > > >  
> > > > >
> > > > > +  The Clang language extensions __has_feature and
> > > > > +__has_extension have been implemented in GCC.  These
> > > > > +are available from C, C++, and Objective-C(++).
> > > 
> > > Since the extension is for the whole c-family, not just C++, I think it
> > > belongs to a "C family" section.  See e.g. 
> > > <https://gcc.gnu.org/gcc-13/changes.html>.
> > 
> > Thanks, I agree that makes more sense.  How about this version instead then:
> 
> Thanks, I think you can go ahead with this.

Great, I've pushed that to wwwdocs.

>  
> > diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
> > index fce0fb44..42353955 100644
> > --- a/htdocs/gcc-14/changes.html
> > +++ b/htdocs/gcc-14/changes.html
> > @@ -303,7 +303,15 @@ a work-in-progress.
> >Further clean up and improvements to the GNAT code.
> >  
> >  
> > -
> > +C family
> > +
> > +  The Clang language extensions __has_feature and
> > +__has_extension have been implemented in GCC.  These
> > +are available from C, C++, and Objective-C(++).
> > +This is primarily intended to aid the portability of code written
> > +against Clang.
> > +  
> > +
> >  
> >  C
> 
> Marek
> 


Re: [PATCH] wwwdocs: Add note to changes.html for __has_{feature,extension}

2024-04-26 Thread Alex Coplan
On 17/04/2024 11:41, Marek Polacek wrote:
> On Mon, Apr 15, 2024 at 11:13:27AM +0100, Alex Coplan wrote:
> > On 04/04/2024 11:00, Alex Coplan wrote:
> > > Hi,
> > > 
> > > This adds a note to the GCC 14 release notes mentioning support for
> > > __has_{feature,extension} (PR60512).
> > > 
> > > OK to commit?
> > 
> > Ping.  Is this changes.html patch OK?  I guess it needs a review from C++
> > maintainers since it adds to the C++ section.
> > 
> > > 
> > > Thanks,
> > > Alex
> > 
> > > diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
> > > index 9fd224c1..facead8d 100644
> > > --- a/htdocs/gcc-14/changes.html
> > > +++ b/htdocs/gcc-14/changes.html
> > > @@ -242,6 +242,12 @@ a work-in-progress.
> > >constinit and optimized dynamic initialization
> > >  
> > >
> > > +  The Clang language extensions __has_feature and
> > > +__has_extension have been implemented in GCC.  These
> > > +are available from C, C++, and Objective-C(++).
> 
> Since the extension is for the whole c-family, not just C++, I think it
> belongs to a "C family" section.  See e.g. 
> <https://gcc.gnu.org/gcc-13/changes.html>.

Thanks, I agree that makes more sense.  How about this version instead then:

diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
index fce0fb44..42353955 100644
--- a/htdocs/gcc-14/changes.html
+++ b/htdocs/gcc-14/changes.html
@@ -303,7 +303,15 @@ a work-in-progress.
   Further clean up and improvements to the GNAT code.
 
 
-
+C family
+
+  The Clang language extensions __has_feature and
+__has_extension have been implemented in GCC.  These
+are available from C, C++, and Objective-C(++).
+This is primarily intended to aid the portability of code written
+against Clang.
+  
+
 
 C
 
Alex
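
For reference, a minimal sketch of the kind of portable usage the note
has in mind.  The fallback definitions are the usual idiom for compilers
that lack these macros; address_sanitizer is used here on the assumption
that it is among the supported feature names, and built_with_asan_p is
just an illustrative helper name.

```cpp
// Fallbacks so the code still compiles where the macros are unavailable.
#ifndef __has_feature
#define __has_feature(x) 0
#endif
#ifndef __has_extension
#define __has_extension(x) __has_feature (x)
#endif

// Report whether this translation unit was built with AddressSanitizer,
// as far as __has_feature can tell.
inline bool
built_with_asan_p ()
{
#if __has_feature (address_sanitizer)
  return true;
#else
  return false;
#endif
}
```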

> 
> Marek
> 


Re: [PATCH] wwwdocs: Add note to changes.html for __has_{feature,extension}

2024-04-15 Thread Alex Coplan
On 04/04/2024 11:00, Alex Coplan wrote:
> Hi,
> 
> This adds a note to the GCC 14 release notes mentioning support for
> __has_{feature,extension} (PR60512).
> 
> OK to commit?

Ping.  Is this changes.html patch OK?  I guess it needs a review from C++
maintainers since it adds to the C++ section.

> 
> Thanks,
> Alex

> diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
> index 9fd224c1..facead8d 100644
> --- a/htdocs/gcc-14/changes.html
> +++ b/htdocs/gcc-14/changes.html
> @@ -242,6 +242,12 @@ a work-in-progress.
>constinit and optimized dynamic initialization
>  
>
> +  The Clang language extensions __has_feature and
> +__has_extension have been implemented in GCC.  These
> +are available from C, C++, and Objective-C(++).
> +This is primarily intended to aid the portability of code written
> +against Clang.
> +  
>  
>  
>  Runtime Library (libstdc++)



Re: [PATCH V4 1/3] aarch64: Place target independent and dependent changed code in one file

2024-04-12 Thread Alex Coplan
On 12/04/2024 20:02, Ajit Agarwal wrote:
> Hello Alex:
> 
> On 11/04/24 7:55 pm, Alex Coplan wrote:
> > On 10/04/2024 23:48, Ajit Agarwal wrote:
> >> Hello Alex:
> >>
> >> On 10/04/24 7:52 pm, Alex Coplan wrote:
> >>> Hi Ajit,
> >>>
> >>> On 10/04/2024 15:31, Ajit Agarwal wrote:
> >>>> Hello Alex:
> >>>>
> >>>> On 10/04/24 1:42 pm, Alex Coplan wrote:
> >>>>> Hi Ajit,
> >>>>>
> >>>>> On 09/04/2024 20:59, Ajit Agarwal wrote:
> >>>>>> Hello Alex:
> >>>>>>
> >>>>>> On 09/04/24 8:39 pm, Alex Coplan wrote:
> >>>>>>> On 09/04/2024 20:01, Ajit Agarwal wrote:
> >>>>>>>> Hello Alex:
> >>>>>>>>
> >>>>>>>> On 09/04/24 7:29 pm, Alex Coplan wrote:
> >>>>>>>>> On 09/04/2024 17:30, Ajit Agarwal wrote:
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On 05/04/24 10:03 pm, Alex Coplan wrote:
> >>>>>>>>>>> On 05/04/2024 13:53, Ajit Agarwal wrote:
> >>>>>>>>>>>> Hello Alex/Richard:
> >>>>>>>>>>>>
> >>>>>>>>>>>> All review comments are incorporated.
> >>> 
> >>>>>>>>>>>> @@ -2890,8 +3018,8 @@ ldp_bb_info::merge_pairs (insn_list_t 
> >>>>>>>>>>>> &left_list,
> >>>>>>>>>>>>  // of accesses.  If we find two sets of adjacent accesses, call
> >>>>>>>>>>>>  // merge_pairs.
> >>>>>>>>>>>>  void
> >>>>>>>>>>>> -ldp_bb_info::transform_for_base (int encoded_lfs,
> >>>>>>>>>>>> - access_group &group)
> >>>>>>>>>>>> +pair_fusion_bb_info::transform_for_base (int encoded_lfs,
> >>>>>>>>>>>> + access_group &group)
> >>>>>>>>>>>>  {
> >>>>>>>>>>>>const auto lfs = decode_lfs (encoded_lfs);
> >>>>>>>>>>>>const unsigned access_size = lfs.size;
> >>>>>>>>>>>> @@ -2909,7 +3037,7 @@ ldp_bb_info::transform_for_base (int 
> >>>>>>>>>>>> encoded_lfs,
> >>>>>>>>>>>> access.cand_insns,
> >>>>>>>>>>>> lfs.load_p,
> >>>>>>>>>>>> access_size);
> >>>>>>>>>>>> -  skip_next = access.cand_insns.empty ();
> >>>>>>>>>>>> +  skip_next = bb_state->cand_insns_empty_p 
> >>>>>>>>>>>> (access.cand_insns);
> >>>>>>>>>>>
> >>>>>>>>>>> As above, why is this needed?
> >>>>>>>>>>
> >>>>>>>>>> For rs6000 we want to return always true. as load store pair
> >>>>>>>>>> that are to be merged with 8/16 16/32 32/64 is occurring for rs6000.
> >>>>>>>>>> And we want load store pair to 8/16 32/64. That's why we want
> >>>>>>>>>> to generate always true for rs6000 to skip pairs as above.
> >>>>>>>>>
> >>>>>>>>> Hmm, sorry, I'm not sure I follow.  Are you saying that for rs6000 
> >>>>>>>>> you have
> >>>>>>>>> load/store pair instructions where the two arms of the access are 
> >>>>>>>>> storing
> >>>>>>>>> operands of different sizes?  Or something else?
> >>>>>>>>>
> >>>>>>>>> As it stands the logic is to skip the next iteration only if we
> >>>>>>>>> exhausted all the candidate insns for the current access.  In the 
> >>>>>>>>> case
> >>>>>>>>> that we didn't exhaust all such candidates, then the idea is that 

[PATCH v2] aarch64: Preserve mem info on change of base for ldp/stp [PR114674]

2024-04-12 Thread Alex Coplan
This is a v2 because I accidentally sent a WIP version of the patch last
time round which used replace_equiv_address instead of
replace_equiv_address_nv; that caused some ICEs (pointed out by the
Linaro CI) since pair addressing modes aren't a subset of the addresses
that are accepted by memory_operand for a given mode.

This patch should otherwise be identical to v1.  Bootstrapped/regtested
on aarch64-linux-gnu (indeed this is the patch I actually tested last
time), is this version also OK for GCC 15?

Thanks,
Alex

--- >8 ---

The ldp/stp fusion pass can change the base of an access so that the two
accesses end up using a common base register.  So far we have been using
adjust_address_nv to do this, but this means that we don't preserve
other properties of the mem we're replacing.  It seems better to use
replace_equiv_address_nv, as this will preserve e.g. the MEM_ALIGN of the
mem whose address we're changing.

The PR shows that by adjusting the other mem we lose alignment
information about the original access and therefore end up rejecting an
otherwise viable pair when --param=aarch64-stp-policy=aligned is passed.
This patch fixes that by using replace_equiv_address_nv instead.

Notably this is the same approach as taken by
aarch64_check_consecutive_mems when a change of base is required, so
this at least makes things more consistent between the ldp fusion pass
and the peepholes.
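
As a toy illustration of the difference described above (not GCC code:
toy_mem, derive_from_base and replace_address are hypothetical names,
and copying the alignment unchanged is a simplification of what
adjust_address_nv really does), deriving the new mem from the other
access inherits that access's attributes, whereas replacing only the
address keeps the attributes of the mem being changed:

```cpp
// Toy model: a mem is an address plus its own attributes.
struct toy_mem
{
  long addr;       // address the mem refers to
  unsigned align;  // known alignment in bits (stand-in for MEM_ALIGN)
};

// Old approach, loosely modelled: build the new mem from the other
// access, so the result carries the other access's attributes.
inline toy_mem
derive_from_base (const toy_mem &base_mem, long adjust_amt)
{
  return toy_mem { base_mem.addr + adjust_amt, base_mem.align };
}

// New approach: keep the mem being changed and swap in only the address.
inline toy_mem
replace_address (const toy_mem &change_mem, long new_addr)
{
  return toy_mem { new_addr, change_mem.align };
}
```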

gcc/ChangeLog:

PR target/114674
* config/aarch64/aarch64-ldp-fusion.cc (ldp_bb_info::fuse_pair):
Use replace_equiv_address_nv on a change of base instead of
adjust_address_nv on the other access.

gcc/testsuite/ChangeLog:

PR target/114674
* gcc.target/aarch64/pr114674.c: New test.
diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc b/gcc/config/aarch64/aarch64-ldp-fusion.cc
index 365dcf48b22..d07d79df06c 100644
--- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
+++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
@@ -1730,11 +1730,11 @@ ldp_bb_info::fuse_pair (bool load_p,
adjust_amt *= -1;
 
   rtx change_reg = XEXP (change_pat, !load_p);
-  machine_mode mode_for_mem = GET_MODE (change_mem);
   rtx effective_base = drop_writeback (base_mem);
-  rtx new_mem = adjust_address_nv (effective_base,
-  mode_for_mem,
-  adjust_amt);
+  rtx adjusted_addr = plus_constant (Pmode,
+XEXP (effective_base, 0),
+adjust_amt);
+  rtx new_mem = replace_equiv_address_nv (change_mem, adjusted_addr);
   rtx new_set = load_p
? gen_rtx_SET (change_reg, new_mem)
: gen_rtx_SET (new_mem, change_reg);
diff --git a/gcc/testsuite/gcc.target/aarch64/pr114674.c b/gcc/testsuite/gcc.target/aarch64/pr114674.c
new file mode 100644
index 000..944784fd008
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr114674.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 --param=aarch64-stp-policy=aligned" } */
+typedef struct {
+   unsigned int f1;
+   unsigned int f2;
+} test_struct;
+
+static test_struct ts = {
+   123, 456
+};
+
+void foo(void)
+{
+   ts.f2 = 36969 * (ts.f2 & 65535) + (ts.f1 >> 16);
+   ts.f1 = 18000 * (ts.f2 & 65535) + (ts.f2 >> 16);
+}
+/* { dg-final { scan-assembler-times "stp" 1 } } */


Re: [PATCH V4 1/3] aarch64: Place target independent and dependent changed code in one file

2024-04-11 Thread Alex Coplan
On 10/04/2024 23:48, Ajit Agarwal wrote:
> Hello Alex:
> 
> On 10/04/24 7:52 pm, Alex Coplan wrote:
> > Hi Ajit,
> > 
> > On 10/04/2024 15:31, Ajit Agarwal wrote:
> >> Hello Alex:
> >>
> >> On 10/04/24 1:42 pm, Alex Coplan wrote:
> >>> Hi Ajit,
> >>>
> >>> On 09/04/2024 20:59, Ajit Agarwal wrote:
> >>>> Hello Alex:
> >>>>
> >>>> On 09/04/24 8:39 pm, Alex Coplan wrote:
> >>>>> On 09/04/2024 20:01, Ajit Agarwal wrote:
> >>>>>> Hello Alex:
> >>>>>>
> >>>>>> On 09/04/24 7:29 pm, Alex Coplan wrote:
> >>>>>>> On 09/04/2024 17:30, Ajit Agarwal wrote:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On 05/04/24 10:03 pm, Alex Coplan wrote:
> >>>>>>>>> On 05/04/2024 13:53, Ajit Agarwal wrote:
> >>>>>>>>>> Hello Alex/Richard:
> >>>>>>>>>>
> >>>>>>>>>> All review comments are incorporated.
> > 
> >>>>>>>>>> @@ -2890,8 +3018,8 @@ ldp_bb_info::merge_pairs (insn_list_t 
> >>>>>>>>>> &left_list,
> >>>>>>>>>>  // of accesses.  If we find two sets of adjacent accesses, call
> >>>>>>>>>>  // merge_pairs.
> >>>>>>>>>>  void
> >>>>>>>>>> -ldp_bb_info::transform_for_base (int encoded_lfs,
> >>>>>>>>>> -   access_group &group)
> >>>>>>>>>> +pair_fusion_bb_info::transform_for_base (int encoded_lfs,
> >>>>>>>>>> +   access_group &group)
> >>>>>>>>>>  {
> >>>>>>>>>>const auto lfs = decode_lfs (encoded_lfs);
> >>>>>>>>>>const unsigned access_size = lfs.size;
> >>>>>>>>>> @@ -2909,7 +3037,7 @@ ldp_bb_info::transform_for_base (int 
> >>>>>>>>>> encoded_lfs,
> >>>>>>>>>>   access.cand_insns,
> >>>>>>>>>>   lfs.load_p,
> >>>>>>>>>>   access_size);
> >>>>>>>>>> -skip_next = access.cand_insns.empty ();
> >>>>>>>>>> +skip_next = bb_state->cand_insns_empty_p (access.cand_insns);
> >>>>>>>>>
> >>>>>>>>> As above, why is this needed?
> >>>>>>>>
> >>>>>>>> For rs6000 we want to return always true. as load store pair
> >>>>>>>> that are to be merged with 8/16 16/32 32/64 is occurring for rs6000.
> >>>>>>>> And we want load store pair to 8/16 32/64. That's why we want
> >>>>>>>> to generate always true for rs6000 to skip pairs as above.
> >>>>>>>
> >>>>>>> Hmm, sorry, I'm not sure I follow.  Are you saying that for rs6000 
> >>>>>>> you have
> >>>>>>> load/store pair instructions where the two arms of the access are 
> >>>>>>> storing
> >>>>>>> operands of different sizes?  Or something else?
> >>>>>>>
> >>>>>>> As it stands the logic is to skip the next iteration only if we
> >>>>>>> exhausted all the candidate insns for the current access.  In the case
> >>>>>>> that we didn't exhaust all such candidates, then the idea is that when
> >>>>>>> access becomes prev_access, we can attempt to use those candidates as
> >>>>>>> the "left-hand side" of a pair in the next iteration since we failed 
> >>>>>>> to
> >>>>>>> use them as the "right-hand side" of a pair in the current iteration.
> >>>>>>> I don't see why you wouldn't want that behaviour.  Please can you
> >>>>>>> explain?
> >>>>>>>
> >>>>>>
> >>>>>> In merge_pair we get the 2 load candidates one load from 0 offset and
> >>>>>> other load is from 16th offset. Then in next iteration we get load
> >>>>

[PATCH] aarch64: Preserve mem info on change of base for ldp/stp [PR114674]

2024-04-11 Thread Alex Coplan
Hi,

The ldp/stp fusion pass can change the base of an access so that the two
accesses end up using a common base register.  So far we have been using
adjust_address_nv to do this, but this means that we don't preserve
other properties of the mem we're replacing.  It seems better to use
replace_equiv_address_nv, as this will preserve e.g. the MEM_ALIGN of the
mem whose address we're changing.

The PR shows that by adjusting the other mem we lose alignment
information about the original access and therefore end up rejecting an
otherwise viable pair when --param=aarch64-stp-policy=aligned is passed.
This patch fixes that by using replace_equiv_address_nv instead.

Notably this is the same approach as taken by
aarch64_check_consecutive_mems when a change of base is required, so
this at least makes things more consistent between the ldp fusion pass
and the peepholes.

Bootstrapped/regtested on aarch64-linux-gnu, OK for trunk when stage 1
opens for GCC 15?

Thanks,
Alex


gcc/ChangeLog:

PR target/114674
* config/aarch64/aarch64-ldp-fusion.cc (ldp_bb_info::fuse_pair):
Use replace_equiv_address_nv on a change of base instead of
adjust_address_nv on the other access.

gcc/testsuite/ChangeLog:

PR target/114674
* gcc.target/aarch64/pr114674.c: New test.
diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc b/gcc/config/aarch64/aarch64-ldp-fusion.cc
index 365dcf48b22..4258a560c48 100644
--- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
+++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
@@ -1730,11 +1730,11 @@ ldp_bb_info::fuse_pair (bool load_p,
adjust_amt *= -1;
 
   rtx change_reg = XEXP (change_pat, !load_p);
-  machine_mode mode_for_mem = GET_MODE (change_mem);
   rtx effective_base = drop_writeback (base_mem);
-  rtx new_mem = adjust_address_nv (effective_base,
-  mode_for_mem,
-  adjust_amt);
+  rtx adjusted_addr = plus_constant (Pmode,
+XEXP (effective_base, 0),
+adjust_amt);
+  rtx new_mem = replace_equiv_address (change_mem, adjusted_addr);
   rtx new_set = load_p
? gen_rtx_SET (change_reg, new_mem)
: gen_rtx_SET (new_mem, change_reg);
diff --git a/gcc/testsuite/gcc.target/aarch64/pr114674.c b/gcc/testsuite/gcc.target/aarch64/pr114674.c
new file mode 100644
index 000..944784fd008
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr114674.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 --param=aarch64-stp-policy=aligned" } */
+typedef struct {
+   unsigned int f1;
+   unsigned int f2;
+} test_struct;
+
+static test_struct ts = {
+   123, 456
+};
+
+void foo(void)
+{
+   ts.f2 = 36969 * (ts.f2 & 65535) + (ts.f1 >> 16);
+   ts.f1 = 18000 * (ts.f2 & 65535) + (ts.f2 >> 16);
+}
+/* { dg-final { scan-assembler-times "stp" 1 } } */


Re: [PATCH V4 1/3] aarch64: Place target independent and dependent changed code in one file

2024-04-10 Thread Alex Coplan
Hi Ajit,

On 10/04/2024 15:31, Ajit Agarwal wrote:
> Hello Alex:
> 
> On 10/04/24 1:42 pm, Alex Coplan wrote:
> > Hi Ajit,
> > 
> > On 09/04/2024 20:59, Ajit Agarwal wrote:
> >> Hello Alex:
> >>
> >> On 09/04/24 8:39 pm, Alex Coplan wrote:
> >>> On 09/04/2024 20:01, Ajit Agarwal wrote:
> >>>> Hello Alex:
> >>>>
> >>>> On 09/04/24 7:29 pm, Alex Coplan wrote:
> >>>>> On 09/04/2024 17:30, Ajit Agarwal wrote:
> >>>>>>
> >>>>>>
> >>>>>> On 05/04/24 10:03 pm, Alex Coplan wrote:
> >>>>>>> On 05/04/2024 13:53, Ajit Agarwal wrote:
> >>>>>>>> Hello Alex/Richard:
> >>>>>>>>
> >>>>>>>> All review comments are incorporated.

> >>>>>>>> @@ -2890,8 +3018,8 @@ ldp_bb_info::merge_pairs (insn_list_t 
> >>>>>>>> &left_list,
> >>>>>>>>  // of accesses.  If we find two sets of adjacent accesses, call
> >>>>>>>>  // merge_pairs.
> >>>>>>>>  void
> >>>>>>>> -ldp_bb_info::transform_for_base (int encoded_lfs,
> >>>>>>>> - access_group &group)
> >>>>>>>> +pair_fusion_bb_info::transform_for_base (int encoded_lfs,
> >>>>>>>> + access_group &group)
> >>>>>>>>  {
> >>>>>>>>const auto lfs = decode_lfs (encoded_lfs);
> >>>>>>>>const unsigned access_size = lfs.size;
> >>>>>>>> @@ -2909,7 +3037,7 @@ ldp_bb_info::transform_for_base (int 
> >>>>>>>> encoded_lfs,
> >>>>>>>> access.cand_insns,
> >>>>>>>> lfs.load_p,
> >>>>>>>> access_size);
> >>>>>>>> -  skip_next = access.cand_insns.empty ();
> >>>>>>>> +  skip_next = bb_state->cand_insns_empty_p (access.cand_insns);
> >>>>>>>
> >>>>>>> As above, why is this needed?
> >>>>>>
> >>>>>> For rs6000 we want to return always true. as load store pair
> >>>>>> that are to be merged with 8/16 16/32 32/64 is occuring for rs6000.
> >>>>>> And we want load store pair to 8/16 32/64. Thats why we want
> >>>>>> to generate always true for rs6000 to skip pairs as above.
> >>>>>
> >>>>> Hmm, sorry, I'm not sure I follow.  Are you saying that for rs6000 you 
> >>>>> have
> >>>>> load/store pair instructions where the two arms of the access are 
> >>>>> storing
> >>>>> operands of different sizes?  Or something else?
> >>>>>
> >>>>> As it stands the logic is to skip the next iteration only if we
> >>>>> exhausted all the candidate insns for the current access.  In the case
> >>>>> that we didn't exhaust all such candidates, then the idea is that when
> >>>>> access becomes prev_access, we can attempt to use those candidates as
> >>>>> the "left-hand side" of a pair in the next iteration since we failed to
> >>>>> use them as the "right-hand side" of a pair in the current iteration.
> >>>>> I don't see why you wouldn't want that behaviour.  Please can you
> >>>>> explain?
> >>>>>
> >>>>
> >>>> In merge_pairs we get two load candidates: one load from offset 0 and
> >>>> the other from offset 16.  In the next iteration we get a load
> >>>> from offset 16 and another from offset 32.  In the next iteration
> >>>> we get a load from offset 32 and another from offset 48.
> >>>>
> >>>> For example:
> >>>>
> >>>> Currently we get the load candidates as follows.
> >>>>
> >>>> pairs:
> >>>>
> >>>> load from 0th offset.
> >>>> load from 16th offset.
> >>>>
> >>>> next pairs:
> >>>>
> >>>> load from 16th offset.
> >>>> load from 32nd offset.
> >>>>
> >>>> next pairs:
> >>>>

Re: [PATCH V4 1/3] aarch64: Place target independent and dependent changed code in one file

2024-04-10 Thread Alex Coplan
Hi Ajit,

On 09/04/2024 20:59, Ajit Agarwal wrote:
> Hello Alex:
> 
> On 09/04/24 8:39 pm, Alex Coplan wrote:
> > On 09/04/2024 20:01, Ajit Agarwal wrote:
> >> Hello Alex:
> >>
> >> On 09/04/24 7:29 pm, Alex Coplan wrote:
> >>> On 09/04/2024 17:30, Ajit Agarwal wrote:
> >>>>
> >>>>
> >>>> On 05/04/24 10:03 pm, Alex Coplan wrote:
> >>>>> On 05/04/2024 13:53, Ajit Agarwal wrote:
> >>>>>> Hello Alex/Richard:
> >>>>>>
> >>>>>> All review comments are incorporated.
> >>>>>
> >>>>> Thanks, I was kind-of expecting you to also send the renaming patch as a
> >>>>> preparatory patch as we discussed.
> >>>>>
> >>>>> Sorry for another meta comment, but: I think the reason that the Linaro
> >>>>> CI isn't running tests on your patches is actually because you're
> >>>>> sending 1/3 of a series but not sending the rest of the series.
> >>>>>
> >>>>> So please can you either send this as an individual preparatory patch
> >>>>> (not marked as a series) or if you're going to send a series (e.g. with
> >>>>> a preparatory rename patch as 1/2 and this as 2/2) then send the entire
> >>>>> series when you make updates.  That way the CI should test your patches,
> >>>>> which would be helpful.
> >>>>>
> >>>>
> >>>> Addressed.
> >>>>  
> >>>>>>
> >>>>>> Common infrastructure of load store pair fusion is divided into target
> >>>>>> independent and target dependent changed code.
> >>>>>>
> >>>>>> Target independent code is the Generic code with pure virtual function
> >>>>>> to interface between target independent and dependent code.
> >>>>>>
> >>>>>> Target dependent code is the implementation of pure virtual function
> >>>>>> for aarch64 target and the call to target independent code.
> >>>>>>
> >>>>>> Thanks & Regards
> >>>>>> Ajit
> >>>>>>
> >>>>>>
> >>>>>> aarch64: Place target independent and dependent changed code in one 
> >>>>>> file
> >>>>>>
> >>>>>> Common infrastructure of load store pair fusion is divided into target
> >>>>>> independent and target dependent changed code.
> >>>>>>
> >>>>>> Target independent code is the Generic code with pure virtual function
> >>>>>> to interface between target independent and dependent code.
> >>>>>>
> >>>>>> Target dependent code is the implementation of pure virtual function
> >>>>>> for aarch64 target and the call to target independent code.
> >>>>>>
> >>>>>> 2024-04-06  Ajit Kumar Agarwal  
> >>>>>>
> >>>>>> gcc/ChangeLog:
> >>>>>>
> >>>>>>* config/aarch64/aarch64-ldp-fusion.cc: Place target
> >>>>>>independent and dependent changed code.
> >>>>>
> >>>>> You're going to need a proper ChangeLog eventually, but I guess there's
> >>>>> no need for that right now.
> >>>>>
> >>>>>> ---
> >>>>>>  gcc/config/aarch64/aarch64-ldp-fusion.cc | 371 +++
> >>>>>>  1 file changed, 249 insertions(+), 122 deletions(-)
> >>>>>>
> >>>>>> diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc 
> >>>>>> b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> >>>>>> index 22ed95eb743..cb21b514ef7 100644
> >>>>>> --- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
> >>>>>> +++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> >>>>>> @@ -138,8 +138,122 @@ struct alt_base
> >>>>>>poly_int64 offset;
> >>>>>>  };
> >>>>>>  
> >>>>>> +// Virtual base class for load/store walkers used in alias analysis.
> >>>>>> +struct alias_walker
> >>>>>> +{

Re: [PATCH V4 1/3] aarch64: Place target independent and dependent changed code in one file

2024-04-09 Thread Alex Coplan
On 09/04/2024 20:01, Ajit Agarwal wrote:
> Hello Alex:
> 
> On 09/04/24 7:29 pm, Alex Coplan wrote:
> > On 09/04/2024 17:30, Ajit Agarwal wrote:
> >>
> >>
> >> On 05/04/24 10:03 pm, Alex Coplan wrote:
> >>> On 05/04/2024 13:53, Ajit Agarwal wrote:
> >>>> Hello Alex/Richard:
> >>>>
> >>>> All review comments are incorporated.
> >>>
> >>> Thanks, I was kind-of expecting you to also send the renaming patch as a
> >>> preparatory patch as we discussed.
> >>>
> >>> Sorry for another meta comment, but: I think the reason that the Linaro
> >>> CI isn't running tests on your patches is actually because you're
> >>> sending 1/3 of a series but not sending the rest of the series.
> >>>
> >>> So please can you either send this as an individual preparatory patch
> >>> (not marked as a series) or if you're going to send a series (e.g. with
> >>> a preparatory rename patch as 1/2 and this as 2/2) then send the entire
> >>> series when you make updates.  That way the CI should test your patches,
> >>> which would be helpful.
> >>>
> >>
> >> Addressed.
> >>  
> >>>>
> >>>> Common infrastructure of load store pair fusion is divided into target
> >>>> independent and target dependent changed code.
> >>>>
> >>>> Target independent code is the Generic code with pure virtual function
> >>>> to interface between target independent and dependent code.
> >>>>
> >>>> Target dependent code is the implementation of pure virtual function for
> >>>> aarch64 target and the call to target independent code.
> >>>>
> >>>> Thanks & Regards
> >>>> Ajit
> >>>>
> >>>>
> >>>> aarch64: Place target independent and dependent changed code in one file
> >>>>
> >>>> Common infrastructure of load store pair fusion is divided into target
> >>>> independent and target dependent changed code.
> >>>>
> >>>> Target independent code is the Generic code with pure virtual function
> >>>> to interface between target independent and dependent code.
> >>>>
> >>>> Target dependent code is the implementation of pure virtual function for
> >>>> aarch64 target and the call to target independent code.
> >>>>
> >>>> 2024-04-06  Ajit Kumar Agarwal  
> >>>>
> >>>> gcc/ChangeLog:
> >>>>
> >>>>  * config/aarch64/aarch64-ldp-fusion.cc: Place target
> >>>>  independent and dependent changed code.
> >>>
> >>> You're going to need a proper ChangeLog eventually, but I guess there's
> >>> no need for that right now.
> >>>
> >>>> ---
> >>>>  gcc/config/aarch64/aarch64-ldp-fusion.cc | 371 +++
> >>>>  1 file changed, 249 insertions(+), 122 deletions(-)
> >>>>
> >>>> diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc 
> >>>> b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> >>>> index 22ed95eb743..cb21b514ef7 100644
> >>>> --- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
> >>>> +++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> >>>> @@ -138,8 +138,122 @@ struct alt_base
> >>>>poly_int64 offset;
> >>>>  };
> >>>>  
> >>>> +// Virtual base class for load/store walkers used in alias analysis.
> >>>> +struct alias_walker
> >>>> +{
> >>>> +  virtual bool conflict_p (int &budget) const = 0;
> >>>> +  virtual insn_info *insn () const = 0;
> >>>> +  virtual bool valid () const  = 0;
> >>>
> >>> Heh, looking at this made me realise there is a whitespace bug here in
> >>> the existing code (double space after const).  Sorry about that!  I'll
> >>> push an obvious fix for that.
> >>>
> >>>> +  virtual void advance () = 0;
> >>>> +};
> >>>> +
> >>>> +struct pair_fusion {
> >>>> +
> >>>> +  pair_fusion () {};
> >>>
> >>> This ctor looks pointless at the moment.  Perhaps instead we could put
> >>> the contents of ldp_fusion_init in here and then delete that function?

Re: [PATCH V4 1/3] aarch64: Place target independent and dependent changed code in one file

2024-04-09 Thread Alex Coplan
On 09/04/2024 17:30, Ajit Agarwal wrote:
> 
> 
> On 05/04/24 10:03 pm, Alex Coplan wrote:
> > On 05/04/2024 13:53, Ajit Agarwal wrote:
> >> Hello Alex/Richard:
> >>
> >> All review comments are incorporated.
> > 
> > Thanks, I was kind-of expecting you to also send the renaming patch as a
> > preparatory patch as we discussed.
> > 
> > Sorry for another meta comment, but: I think the reason that the Linaro
> > CI isn't running tests on your patches is actually because you're
> > sending 1/3 of a series but not sending the rest of the series.
> > 
> > So please can you either send this as an individual preparatory patch
> > (not marked as a series) or if you're going to send a series (e.g. with
> > a preparatory rename patch as 1/2 and this as 2/2) then send the entire
> > series when you make updates.  That way the CI should test your patches,
> > which would be helpful.
> >
> 
> Addressed.
>  
> >>
> >> Common infrastructure of load store pair fusion is divided into target
> >> independent and target dependent changed code.
> >>
> >> Target independent code is the Generic code with pure virtual function
> >> to interface between target independent and dependent code.
> >>
> >> Target dependent code is the implementation of pure virtual function for
> >> aarch64 target and the call to target independent code.
> >>
> >> Thanks & Regards
> >> Ajit
> >>
> >>
> >> aarch64: Place target independent and dependent changed code in one file
> >>
> >> Common infrastructure of load store pair fusion is divided into target
> >> independent and target dependent changed code.
> >>
> >> Target independent code is the Generic code with pure virtual function
> >> to interface between target independent and dependent code.
> >>
> >> Target dependent code is the implementation of pure virtual function for
> >> aarch64 target and the call to target independent code.
> >>
> >> 2024-04-06  Ajit Kumar Agarwal  
> >>
> >> gcc/ChangeLog:
> >>
> >>* config/aarch64/aarch64-ldp-fusion.cc: Place target
> >>independent and dependent changed code.
> > 
> > You're going to need a proper ChangeLog eventually, but I guess there's
> > no need for that right now.
> > 
> >> ---
> >>  gcc/config/aarch64/aarch64-ldp-fusion.cc | 371 +++
> >>  1 file changed, 249 insertions(+), 122 deletions(-)
> >>
> >> diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc 
> >> b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> >> index 22ed95eb743..cb21b514ef7 100644
> >> --- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
> >> +++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> >> @@ -138,8 +138,122 @@ struct alt_base
> >>poly_int64 offset;
> >>  };
> >>  
> >> +// Virtual base class for load/store walkers used in alias analysis.
> >> +struct alias_walker
> >> +{
> >> +  virtual bool conflict_p (int &budget) const = 0;
> >> +  virtual insn_info *insn () const = 0;
> >> +  virtual bool valid () const  = 0;
> > 
> > Heh, looking at this made me realise there is a whitespace bug here in
> > the existing code (double space after const).  Sorry about that!  I'll
> > push an obvious fix for that.
> > 
> >> +  virtual void advance () = 0;
> >> +};
> >> +
> >> +struct pair_fusion {
> >> +
> >> +  pair_fusion () {};
> > 
> > This ctor looks pointless at the moment.  Perhaps instead we could put
> > the contents of ldp_fusion_init in here and then delete that function?
> > 
> 
> Addressed.
> 
> >> +  virtual bool fpsimd_op_p (rtx reg_op, machine_mode mem_mode,
> >> + bool load_p) = 0;
> > 
> > Please can we have comments above each of these virtual functions
> > describing any parameters, what the purpose of the hook is, and the
> > interpretation of the return value?  This will serve as the
> > documentation for other targets that want to make use of the pass.
> > 
> > It might make sense to have a default-false implementation for
> > fpsimd_op_p, especially if you don't want to make use of this bit for
> > rs6000.
> >
> 
> Addressed.
>  
> >> +
> >> +  virtual bool pair_operand_mode_ok_p (machine_mode mode) = 0;
> >> +  virtual bool pair

[PATCH][committed] aarch64: Fix whitespace in aarch64-ldp-fusion.cc:alias_walker

2024-04-05 Thread Alex Coplan
I spotted this whitespace error during the review of
https://gcc.gnu.org/pipermail/gcc-patches/2024-April/648846.html.

Pushing as obvious after testing on aarch64-linux-gnu.

Thanks,
Alex

gcc/ChangeLog:

* config/aarch64/aarch64-ldp-fusion.cc (struct alias_walker):
Fix double space after const qualifier on valid ().
diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc 
b/gcc/config/aarch64/aarch64-ldp-fusion.cc
index 22ed95eb743..365dcf48b22 100644
--- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
+++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
@@ -2138,7 +2138,7 @@ struct alias_walker
 {
   virtual bool conflict_p (int &budget) const = 0;
   virtual insn_info *insn () const = 0;
-  virtual bool valid () const  = 0;
+  virtual bool valid () const = 0;
   virtual void advance () = 0;
 };
 


Re: [PATCH V4 1/3] aarch64: Place target independent and dependent changed code in one file

2024-04-05 Thread Alex Coplan
On 05/04/2024 13:53, Ajit Agarwal wrote:
> Hello Alex/Richard:
> 
> All review comments are incorporated.

Thanks, I was kind-of expecting you to also send the renaming patch as a
preparatory patch as we discussed.

Sorry for another meta comment, but: I think the reason that the Linaro
CI isn't running tests on your patches is actually because you're
sending 1/3 of a series but not sending the rest of the series.

So please can you either send this as an individual preparatory patch
(not marked as a series) or if you're going to send a series (e.g. with
a preparatory rename patch as 1/2 and this as 2/2) then send the entire
series when you make updates.  That way the CI should test your patches,
which would be helpful.

> 
> Common infrastructure of load store pair fusion is divided into target
> independent and target dependent changed code.
> 
> Target independent code is the Generic code with pure virtual function
> to interface between target independent and dependent code.
> 
> Target dependent code is the implementation of pure virtual function for
> aarch64 target and the call to target independent code.
> 
> Thanks & Regards
> Ajit
> 
> 
> aarch64: Place target independent and dependent changed code in one file
> 
> Common infrastructure of load store pair fusion is divided into target
> independent and target dependent changed code.
> 
> Target independent code is the Generic code with pure virtual function
> to interface between target independent and dependent code.
> 
> Target dependent code is the implementation of pure virtual function for
> aarch64 target and the call to target independent code.
> 
> 2024-04-06  Ajit Kumar Agarwal  
> 
> gcc/ChangeLog:
> 
>   * config/aarch64/aarch64-ldp-fusion.cc: Place target
>   independent and dependent changed code.

You're going to need a proper ChangeLog eventually, but I guess there's
no need for that right now.

> ---
>  gcc/config/aarch64/aarch64-ldp-fusion.cc | 371 +++
>  1 file changed, 249 insertions(+), 122 deletions(-)
> 
> diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc 
> b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> index 22ed95eb743..cb21b514ef7 100644
> --- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
> +++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> @@ -138,8 +138,122 @@ struct alt_base
>poly_int64 offset;
>  };
>  
> +// Virtual base class for load/store walkers used in alias analysis.
> +struct alias_walker
> +{
> +  virtual bool conflict_p (int &budget) const = 0;
> +  virtual insn_info *insn () const = 0;
> +  virtual bool valid () const  = 0;

Heh, looking at this made me realise there is a whitespace bug here in
the existing code (double space after const).  Sorry about that!  I'll
push an obvious fix for that.

> +  virtual void advance () = 0;
> +};
> +
> +struct pair_fusion {
> +
> +  pair_fusion () {};

This ctor looks pointless at the moment.  Perhaps instead we could put
the contents of ldp_fusion_init in here and then delete that function?

> +  virtual bool fpsimd_op_p (rtx reg_op, machine_mode mem_mode,
> +bool load_p) = 0;

Please can we have comments above each of these virtual functions
describing any parameters, what the purpose of the hook is, and the
interpretation of the return value?  This will serve as the
documentation for other targets that want to make use of the pass.

It might make sense to have a default-false implementation for
fpsimd_op_p, especially if you don't want to make use of this bit for
rs6000.
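
To make the request concrete, here is a minimal sketch of a documented
hook with a default-false implementation.  The rtx and machine_mode
stand-ins, the comment wording, and both derived class names are
assumptions made for illustration; they are not taken from the patch.

```cpp
// Stand-ins so the sketch compiles outside GCC; the real types come
// from GCC's own headers.
struct rtx_def;
using rtx = rtx_def *;
enum machine_mode { E_DImode, E_V16QImode };

struct pair_fusion
{
  virtual ~pair_fusion () = default;

  // Return true if the register operand REG_OP of a load (LOAD_P) or
  // store (!LOAD_P) with memory mode MEM_MODE should be classified as
  // an FP/SIMD access.  Targets that do not need this distinction can
  // rely on the default, which classifies nothing as FP/SIMD.
  virtual bool fpsimd_op_p (rtx reg_op, machine_mode mem_mode, bool load_p)
  {
    (void) reg_op;
    (void) mem_mode;
    (void) load_p;
    return false;
  }
};

// Hypothetical target that is happy with the default.
struct default_pair_fusion : public pair_fusion
{
};

// Hypothetical target that treats one vector mode as FP/SIMD.
struct vector_pair_fusion : public pair_fusion
{
  bool fpsimd_op_p (rtx, machine_mode mem_mode, bool) override
  {
    return mem_mode == E_V16QImode;
  }
};
```

With this shape a new target only overrides the hook if it cares about
the FP/SIMD classification at all.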

> +
> +  virtual bool pair_operand_mode_ok_p (machine_mode mode) = 0;
> +  virtual bool pair_trailing_writeback_p () = 0;

Sorry for the run-around, but: I think this and
handle_writeback_opportunities () should be the same function, either
returning an enum or taking an enum and returning a boolean.

At a minimum they should have more similar sounding names.
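
As a rough sketch of the "take an enum, return a boolean" variant: the
enum, the hook name should_handle_writeback, and the example target
with its integer level are all hypothetical.  The sketch also assumes
the two existing hooks distinguish between handling accesses that
already use writeback and additionally folding trailing adds into
pairs.

```cpp
// Hypothetical enum naming the two kinds of writeback handling.
enum class writeback_type
{
  EXISTING, // Handle accesses that already use writeback addressing.
  ALL       // Additionally try to form writeback pairs from trailing adds.
};

struct pair_fusion
{
  virtual ~pair_fusion () = default;

  // Return true if the target wants the pass to handle writeback
  // opportunities of kind WHICH.
  virtual bool should_handle_writeback (writeback_type which) = 0;
};

// Hypothetical target driven by a tuning level: 0 means no writeback
// handling, 1 means existing writeback only, 2 means everything.
struct example_pair_fusion : public pair_fusion
{
  explicit example_pair_fusion (int level) : m_level (level) {}

  bool should_handle_writeback (writeback_type which) override
  {
    switch (which)
      {
      case writeback_type::EXISTING:
	return m_level >= 1;
      case writeback_type::ALL:
	return m_level >= 2;
      }
    return false;
  }

private:
  int m_level;
};
```

A single hook keeps the two queries visibly related at every call site,
which is the main point of merging them.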

> +  virtual bool pair_reg_operand_ok_p (bool load_p, rtx reg_op,
> +   machine_mode mem_mode) = 0;
> +  virtual int pair_mem_alias_check_limit () = 0;
> +  virtual bool handle_writeback_opportunities () = 0 ;
> +  virtual bool pair_mem_ok_with_policy (rtx first_mem, bool load_p,
> + machine_mode mode) = 0;
> +  virtual rtx gen_mem_pair (rtx *pats,  rtx writeback,

Nit: excess whitespace after pats,

> + bool load_p) = 0;
> +  virtual bool pair_mem_promote_writeback_p (rtx pat) = 0;
> +  virtual bool track_load_p () = 0;
> +  virtual bool track_store_p () = 0;

I think it would probably make more sense for these two to have
default-true implementations rather than being pure virtual functions.

Also, sorry for the bikeshedding, but please can we keep the plural
names (so track_loads_p and track_stores_p).
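
Something along these lines is what I have in mind.  This is only a
sketch: the should_track helper and the loads-only target are
hypothetical and exist purely to show the defaults in use.

```cpp
struct pair_fusion
{
  virtual ~pair_fusion () = default;

  // Return true if the pass should track loads (respectively stores)
  // as candidates for pairing.  Both default to true, so a target only
  // needs an override to opt out.
  virtual bool track_loads_p () { return true; }
  virtual bool track_stores_p () { return true; }

  // Hypothetical generic helper: decide whether an access is tracked.
  bool should_track (bool load_p)
  {
    return load_p ? track_loads_p () : track_stores_p ();
  }
};

// Hypothetical target that only wants to form load pairs.
struct loads_only_pair_fusion : public pair_fusion
{
  bool track_stores_p () override { return false; }
};
```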

> +  virtual bool cand_insns_empty_p (std::list<insn_info *> &insns) = 0;

Why does this need to be virtualised?  I would expect it to
just be insns.empty () on all targets.

> +  virtual bool pair_mem_i

[PATCH] wwwdocs: Add note to changes.html for __has_{feature,extension}

2024-04-04 Thread Alex Coplan
Hi,

This adds a note to the GCC 14 release notes mentioning support for
__has_{feature,extension} (PR60512).
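
For context, a minimal sketch of how portable code typically uses the
extension.  The feature name address_sanitizer comes from Clang's
documentation; whether a given GCC release recognises it is an
assumption here, and the constant name is purely illustrative.

```cpp
// Fall back gracefully on compilers that lack the extension.
#ifndef __has_feature
# define __has_feature(x) 0
#endif

// Record at compile time whether the feature is reported as enabled.
#if __has_feature (address_sanitizer)
constexpr bool built_with_asan = true;
#else
constexpr bool built_with_asan = false;
#endif
```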

OK to commit?

Thanks,
Alex
diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
index 9fd224c1..facead8d 100644
--- a/htdocs/gcc-14/changes.html
+++ b/htdocs/gcc-14/changes.html
@@ -242,6 +242,12 @@ a work-in-progress.
   constinit and optimized dynamic initialization
 
   
+  The Clang language extensions __has_feature and
+__has_extension have been implemented in GCC.  These
+are available from C, C++, and Objective-C(++).
+This is primarily intended to aid the portability of code written
+against Clang.
+  
 
 
 Runtime Library (libstdc++)


Re: [PATCH V3 0/2] aarch64: Place target independent and dependent changed code in one file.

2024-04-03 Thread Alex Coplan
On 23/02/2024 16:41, Ajit Agarwal wrote:
> Hello Richard/Alex/Segher:

Hi Ajit,

Sorry for the delay and thanks for working on this.

Generally this looks like the right sort of approach (IMO) but I've left
some comments below.

I'll start with a meta comment: in the subject line you have marked this
as 0/2, but usually 0/n is reserved for the cover letter of a patch
series and wouldn't contain an actual patch.  I think this might have
confused the Linaro CI suitably such that it didn't run regression tests
on the patch.

> 
> This patch adds the changed code for target independent and
> dependent code for load store fusion.
> 
> Common infrastructure of load store pair fusion is
> divided into target independent and target dependent
> changed code.
> 
> Target independent code is the Generic code with
> pure virtual function to interface between target
> independent and dependent code.
> 
> Target dependent code is the implementation of pure
> virtual function for aarch64 target and the call
> to target independent code.
> 
> Bootstrapped for aarch64-linux-gnu.
> 
> Thanks & Regards
> Ajit
> 
> aarch64: Place target independent and dependent changed code in one file.
> 
> Common infrastructure of load store pair fusion is
> divided into target independent and target dependent
> changed code.
> 
> Target independent code is the Generic code with
> pure virtual function to interface between target
> independent and dependent code.
> 
> Target dependent code is the implementation of pure
> virtual function for aarch64 target and the call
> to target independent code.
> 
> 2024-02-23  Ajit Kumar Agarwal  
> 
> gcc/ChangeLog:
> 
>   * config/aarch64/aarch64-ldp-fusion.cc: Place target
>   independent and dependent changed code.
> ---
>  gcc/config/aarch64/aarch64-ldp-fusion.cc | 437 ---
>  1 file changed, 305 insertions(+), 132 deletions(-)
> 
> diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc 
> b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> index 22ed95eb743..2ef22ff1e96 100644
> --- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
> +++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> @@ -40,10 +40,10 @@
>  
>  using namespace rtl_ssa;
>  
> -static constexpr HOST_WIDE_INT LDP_IMM_BITS = 7;
> -static constexpr HOST_WIDE_INT LDP_IMM_SIGN_BIT = (1 << (LDP_IMM_BITS - 1));
> -static constexpr HOST_WIDE_INT LDP_MAX_IMM = LDP_IMM_SIGN_BIT - 1;
> -static constexpr HOST_WIDE_INT LDP_MIN_IMM = -LDP_MAX_IMM - 1;
> +static constexpr HOST_WIDE_INT PAIR_MEM_IMM_BITS = 7;
> +static constexpr HOST_WIDE_INT PAIR_MEM_IMM_SIGN_BIT = (1 << 
> (PAIR_MEM_IMM_BITS - 1));
> +static constexpr HOST_WIDE_INT PAIR_MEM_MAX_IMM = PAIR_MEM_IMM_SIGN_BIT - 1;
> +static constexpr HOST_WIDE_INT PAIR_MEM_MIN_IMM = -PAIR_MEM_MAX_IMM - 1;

These constants shouldn't be renamed: they are specific to aarch64 so the
original names should be preserved in this file.

I expect we want to introduce virtual functions to validate an offset
for a paired access.  The aarch64 code could then implement it by
comparing the offset against LDP_{MIN,MAX}_IMM, and the generic code
could use that hook to replace the code that queries those constants
directly (i.e. in find_trailing_add and get_viable_bases).
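
Roughly, I'm imagining something like the following.  This is a sketch
only: the hook name pair_mem_in_range_p and the hwi stand-in for
HOST_WIDE_INT are assumptions, while the LDP_* constants are the
existing aarch64 ones.

```cpp
#include <cstdint>

// Stand-in for HOST_WIDE_INT so the sketch compiles outside GCC.
using hwi = std::int64_t;

// aarch64-specific immediate range, keeping the original names.
static constexpr hwi LDP_IMM_BITS = 7;
static constexpr hwi LDP_IMM_SIGN_BIT = (1 << (LDP_IMM_BITS - 1));
static constexpr hwi LDP_MAX_IMM = LDP_IMM_SIGN_BIT - 1;
static constexpr hwi LDP_MIN_IMM = -LDP_MAX_IMM - 1;

// Generic pass: knows nothing about how the target encodes offsets.
struct pair_fusion
{
  virtual ~pair_fusion () = default;

  // Hypothetical hook: return true if OFFSET is a valid immediate
  // offset for a paired access on this target.
  virtual bool pair_mem_in_range_p (hwi offset) = 0;
};

// The aarch64 implementation compares against the LDP immediate range.
struct aarch64_pair_fusion : public pair_fusion
{
  bool pair_mem_in_range_p (hwi offset) override
  {
    return offset >= LDP_MIN_IMM && offset <= LDP_MAX_IMM;
  }
};
```

The generic code would then call the hook wherever it currently reads
the constants, so no aarch64 immediate range leaks into shared code.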

>  
>  // We pack these fields (load_p, fpsimd_p, and size) into an integer
>  // (LFS) which we use as part of the key into the main hash tables.
> @@ -138,8 +138,18 @@ struct alt_base
>poly_int64 offset;
>  };
>  
> +// Virtual base class for load/store walkers used in alias analysis.
> +struct alias_walker
> +{
> +  virtual bool conflict_p (int &budget) const = 0;
> +  virtual insn_info *insn () const = 0;
> +  virtual bool valid () const  = 0;
> +  virtual void advance () = 0;
> +};
> +
> +
>  // State used by the pass for a given basic block.
> -struct ldp_bb_info
> +struct pair_fusion

As a comment on the high-level design, I think we want a generic class
for the overall pass, not just for the BB-specific structure.

That is because naturally we want the ldp_fusion_bb function itself to
be a member of such a class, so that it can access virtual functions to
query the target e.g. about the load/store pair policy, and whether to
try and promote writeback pairs.

If we keep all of the virtual functions in such an outer class, then we
can keep the ldp_bb_info class generic (not needing an override for
each target) and that inner class can perhaps be given a pointer or
reference to the outer class when it is instantiated in ldp_fusion_bb.
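
A minimal sketch of that shape, to illustrate the idea rather than
prescribe an implementation.  The bb_info stand-in, the m_pass member,
the process_block name, and the limit value 8 are all placeholders
invented for the example.

```cpp
#include <cassert>

// Stand-in for rtl_ssa::bb_info.
struct bb_info
{
  int index;
};

// Outer class for the whole pass; all target hooks live here.
struct pair_fusion
{
  virtual ~pair_fusion () = default;

  // Target hook queried by the generic per-BB code.
  virtual int pair_mem_alias_check_limit () = 0;

  // Generic per-BB driver (hypothetical name); not overridden by targets.
  int process_block (bb_info *bb);
};

// Generic per-BB state.  It needs no per-target subclass because it
// reaches the target through the pointer to the outer pass.
struct pair_fusion_bb_info
{
  pair_fusion_bb_info (bb_info *bb, pair_fusion *pass)
    : m_bb (bb), m_pass (pass)
  {
    assert (bb && pass);
  }

  bb_info *bb () const { return m_bb; }
  int alias_budget () const { return m_pass->pair_mem_alias_check_limit (); }

private:
  bb_info *m_bb;
  pair_fusion *m_pass;
};

int
pair_fusion::process_block (bb_info *bb)
{
  pair_fusion_bb_info bb_state (bb, this);
  return bb_state.alias_budget ();
}

// A target only implements the hooks.
struct aarch64_pair_fusion : public pair_fusion
{
  int pair_mem_alias_check_limit () override { return 8; }
};
```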

>  {
> >using def_hash = nofree_ptr_hash<def_info>;
> >using expr_key_t = pair_hash<tree_operand_hash, int_hash<int, -1, -2>>;
> @@ -161,13 +171,13 @@ struct ldp_bb_info
>static const size_t obstack_alignment = sizeof (void *);
>bb_info *m_bb;
>  
> -  ldp_bb_info (bb_info *bb) : m_bb (bb), m_emitted_tombstone (false)
> +  pair_fusion (bb_info *bb) : m_bb (bb), m_emitted_tombstone (false)
>{
>  obstack_specify_allocation (&m_obstack, OBSTACK_CHUNK_SIZE,
>   obstack_alignment, obstack_chunk_alloc,
> 

Re: [PATCH 0/1 V2] Target independent code for common infrastructure of load,store fusion for rs6000 and aarch64 target.

2024-02-15 Thread Alex Coplan
On 15/02/2024 22:38, Ajit Agarwal wrote:
> Hello Alex:
> 
> On 15/02/24 10:12 pm, Alex Coplan wrote:
> > On 15/02/2024 21:24, Ajit Agarwal wrote:
> >> Hello Richard:
> >>
> >> As per your suggestion I have divided the patch into target independent
> >> and target dependent for aarch64 target. I kept aarch64-ldp-fusion same
> >> and did not change that.
> > 
> > I'm not sure this was what Richard suggested doing, though.
> > He said (from
> > https://gcc.gnu.org/pipermail/gcc-patches/2024-February/645545.html):
> > 
> >> Maybe one way of making the review easier would be to split the aarch64
> >> pass into the "target-dependent" and "target-independent" pieces
> >> in-place, i.e. keeping everything within aarch64-ldp-fusion.cc, and then
> >> (as separate patches) move the target-independent pieces outside
> >> config/aarch64.
> > 
> > but this adds the target-independent parts separately instead of
> > splitting it out within config/aarch64 (which I agree should make the
> > review easier).
> 
> I am sorry, I didn't follow. Can you kindly elaborate on this?

So IIUC Richard was suggesting splitting into target-independent and
target-dependent pieces within aarch64-ldp-fusion.cc as a first step,
i.e. you introduce the abstractions (virtual functions) needed within
that file.  That should hopefully be a relatively small diff.

Then in a separate patch you can move the target-independent parts out of
config/aarch64.

Does that make sense?

Thanks,
Alex

> 
> Thanks & Regards
> Ajit
> > 
> > Thanks,
> > Alex
> > 
> >>
> >> Common infrastructure of load store pair fusion is divided into
> >> target independent and target dependent code for rs6000 and aarch64
> >> target.
> >>
> >> Target independent code is structured in the following files.
> >> gcc/pair-fusion-base.h
> >> gcc/pair-fusion-common.cc
> >> gcc/pair-fusion.cc
> >>
> >> Target independent code is the Generic code with pure virtual
> >> function to interface between target independent and dependent
> >> code.
> >>
> >> Thanks & Regards
> >> Ajit
> >>
> >> Target independent code for common infrastructure of load
> >> store fusion for rs6000 and aarch64 target.
> >>
> >> Common infrastructure of load store pair fusion is divided into
> >> target independent and target dependent code for rs6000 and aarch64
> >> target.
> >>
> >> Target independent code is structured in the following files.
> >> gcc/pair-fusion-base.h
> >> gcc/pair-fusion-common.cc
> >> gcc/pair-fusion.cc
> >>
> >> Target independent code is the Generic code with pure virtual
> >> function to interface between target independent and dependent
> >> code.
> >>
> >> 2024-02-15  Ajit Kumar Agarwal  
> >>
> >> gcc/ChangeLog:
> >>
> >>* pair-fusion-base.h: Generic header code for load store fusion
> >>that can be shared across different architectures.
> >>* pair-fusion-common.cc: Generic source code for load store
> >>fusion that can be shared across different architectures.
> >>* pair-fusion.cc: Generic implementation of pair_fusion class
> >>defined in pair-fusion-base.h
> >>* Makefile.in: Add new executable pair-fusion.o and
> >>pair-fusion-common.o.
> >> ---
> >>  gcc/Makefile.in   |2 +
> >>  gcc/pair-fusion-base.h|  586 ++
> >>  gcc/pair-fusion-common.cc | 1202 
> >>  gcc/pair-fusion.cc| 1225 +
> >>  4 files changed, 3015 insertions(+)
> >>  create mode 100644 gcc/pair-fusion-base.h
> >>  create mode 100644 gcc/pair-fusion-common.cc
> >>  create mode 100644 gcc/pair-fusion.cc
> >>
> >> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> >> index a74761b7ab3..df5061ddfe7 100644
> >> --- a/gcc/Makefile.in
> >> +++ b/gcc/Makefile.in
> >> @@ -1563,6 +1563,8 @@ OBJS = \
> >>ipa-strub.o \
> >>ipa.o \
> >>ira.o \
> >> +  pair-fusion-common.o \
> >> +  pair-fusion.o \
> >>ira-build.o \
> >>ira-costs.o \
> >>ira-conflicts.o \
> >> diff --git a/gcc/pair-fusion-base.h b/gcc/pair-fusion-base.h
> >> new file mode 100644
> >> index 000..fdaf4fd743d
> >> --- /dev/null

Re: [PATCH 0/1 V2] Target independent code for common infrastructure of load,store fusion for rs6000 and aarch64 target.

2024-02-15 Thread Alex Coplan
On 15/02/2024 21:24, Ajit Agarwal wrote:
> Hello Richard:
> 
> As per your suggestion I have divided the patch into target independent
> and target dependent for aarch64 target. I kept aarch64-ldp-fusion same
> and did not change that.

I'm not sure this was what Richard suggested doing, though.
He said (from
https://gcc.gnu.org/pipermail/gcc-patches/2024-February/645545.html):

> Maybe one way of making the review easier would be to split the aarch64
> pass into the "target-dependent" and "target-independent" pieces
> in-place, i.e. keeping everything within aarch64-ldp-fusion.cc, and then
> (as separate patches) move the target-independent pieces outside
> config/aarch64.

but this adds the target-independent parts separately instead of
splitting it out within config/aarch64 (which I agree should make the
review easier).

Thanks,
Alex

> 
> Common infrastructure of load store pair fusion is divided into
> target independent and target dependent code for rs6000 and aarch64
> target.
> 
> Target independent code is structured in the following files.
> gcc/pair-fusion-base.h
> gcc/pair-fusion-common.cc
> gcc/pair-fusion.cc
> 
> Target independent code is the Generic code with pure virtual
> function to interface between target independent and dependent
> code.
> 
> Thanks & Regards
> Ajit
> 
> Target independent code for common infrastructure of load
> store fusion for rs6000 and aarch64 target.
> 
> Common infrastructure of load store pair fusion is divided into
> target independent and target dependent code for rs6000 and aarch64
> target.
> 
> Target independent code is structured in the following files.
> gcc/pair-fusion-base.h
> gcc/pair-fusion-common.cc
> gcc/pair-fusion.cc
> 
> Target independent code is the Generic code with pure virtual
> function to interface between target independent and dependent
> code.
> 
> 2024-02-15  Ajit Kumar Agarwal  
> 
> gcc/ChangeLog:
> 
>   * pair-fusion-base.h: Generic header code for load store fusion
>   that can be shared across different architectures.
>   * pair-fusion-common.cc: Generic source code for load store
>   fusion that can be shared across different architectures.
>   * pair-fusion.cc: Generic implementation of pair_fusion class
>   defined in pair-fusion-base.h
>   * Makefile.in: Add new executable pair-fusion.o and
>   pair-fusion-common.o.
> ---
>  gcc/Makefile.in   |2 +
>  gcc/pair-fusion-base.h|  586 ++
>  gcc/pair-fusion-common.cc | 1202 
>  gcc/pair-fusion.cc| 1225 +
>  4 files changed, 3015 insertions(+)
>  create mode 100644 gcc/pair-fusion-base.h
>  create mode 100644 gcc/pair-fusion-common.cc
>  create mode 100644 gcc/pair-fusion.cc
> 
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index a74761b7ab3..df5061ddfe7 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -1563,6 +1563,8 @@ OBJS = \
>   ipa-strub.o \
>   ipa.o \
>   ira.o \
> + pair-fusion-common.o \
> + pair-fusion.o \
>   ira-build.o \
>   ira-costs.o \
>   ira-conflicts.o \
> diff --git a/gcc/pair-fusion-base.h b/gcc/pair-fusion-base.h
> new file mode 100644
> index 000..fdaf4fd743d
> --- /dev/null
> +++ b/gcc/pair-fusion-base.h
> @@ -0,0 +1,586 @@
> +// Generic code for Pair MEM  fusion optimization pass.
> +// Copyright (C) 2023-2024 Free Software Foundation, Inc.
> +//
> +// This file is part of GCC.
> +//
> +// GCC is free software; you can redistribute it and/or modify it
> +// under the terms of the GNU General Public License as published by
> +// the Free Software Foundation; either version 3, or (at your option)
> +// any later version.
> +//
> +// GCC is distributed in the hope that it will be useful, but
> +// WITHOUT ANY WARRANTY; without even the implied warranty of
> +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +// General Public License for more details.
> +//
> +// You should have received a copy of the GNU General Public License
> +// along with GCC; see the file COPYING3.  If not see
> +// <http://www.gnu.org/licenses/>.
> +
> +#ifndef GCC_PAIR_FUSION_H
> +#define GCC_PAIR_FUSION_H
> +#define INCLUDE_ALGORITHM
> +#define INCLUDE_FUNCTIONAL
> +#define INCLUDE_LIST
> +#define INCLUDE_TYPE_TRAITS
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "backend.h"
> +#include "rtl.h"
> +#include "df.h"
> +#include "rtl-iter.h"
> +#include "rtl-ssa.h"
> +#include "cfgcleanup.h"
> +#include "tree-pass.h"
> +#include "ordered-hash-map.h"
> +#include "tree-dfa.h"
> +#include "fold-const.h"
> +#include "tree-hash-traits.h"
> +#include "print-tree.h"
> +#include "insn-attr.h"
> +using namespace rtl_ssa;
> +// We pack these fields (load_p, fpsimd_p, and size) into an integer
> +// (LFS) which we use as part of the key into the main hash tables.
> +//
> +// The idea is that we group candidates together only if they agree on
> +
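For illustration only, here is a minimal sketch of the kind of packing the header comment above describes.  The bit layout (load_p in bit 3, fpsimd_p in bit 2, log2 (size) - 2 in the low two bits), the restriction to access sizes of 4, 8 and 16 bytes, and the helper log2_exact are assumptions made for this example; they are not taken from the patch text quoted here.

```cpp
// Fields named in the header comment; the struct name simply follows the
// "LFS" abbreviation used there.
struct lfs_fields
{
  bool load_p;    // true for loads, false for stores
  bool fpsimd_p;  // true for FP/SIMD register accesses
  unsigned size;  // access size in bytes; assumed to be 4, 8 or 16
};

// Hypothetical helper: log2 of a power of two.
constexpr int
log2_exact (unsigned n)
{
  return n <= 1 ? 0 : 1 + log2_exact (n >> 1);
}

// Pack the three fields into a small integer usable as part of a hash key.
constexpr int
encode_lfs (lfs_fields f)
{
  return (int (f.load_p) << 3) | (int (f.fpsimd_p) << 2)
         | (log2_exact (f.size) - 2);
}

// Inverse of encode_lfs.
constexpr lfs_fields
decode_lfs (int lfs)
{
  return lfs_fields { (lfs & 8) != 0, (lfs & 4) != 0, 1u << ((lfs & 3) + 2) };
}

static_assert (encode_lfs ({ true, false, 8 }) == 9, "load, GPR, 8 bytes");
static_assert (decode_lfs (9).size == 8, "size survives a round trip");
```

Two candidates with equal encodings agree on all three properties, which is what makes the packed value convenient as a grouping key.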

Re: [PATCH][GCC 12] aarch64: Avoid out-of-range shrink-wrapped saves [PR111677]

2024-02-15 Thread Alex Coplan
On 14/02/2024 11:18, Richard Sandiford wrote:
> Alex Coplan writes:
> > This is a backport of the GCC 13 fix for PR111677 to the GCC 12 branch.
> > The only part of the patch that isn't a straight cherry-pick is due to
> > the TX iterator lacking TDmode for GCC 12, so this version adjusts
> > TX_V16QI accordingly.
> >
> > Bootstrapped/regtested on aarch64-linux-gnu.  The only changes in the
> > testsuite I saw were in
> > gcc/testsuite/c-c++-common/hwasan/large-aligned-1.c, where the dg-output
> > "READ of size 4 [...]" check appears to be flaky on the GCC 12 branch
> > since libhwasan gained the short granule tag feature.  I've requested a
> > backport of the following patch (committed as
> > r13-100-g3771486daa1e904ceae6f3e135b28e58af33849f), which should fix that
> > (independent) issue for GCC 12:
> > https://gcc.gnu.org/pipermail/gcc-patches/2024-February/645278.html
> >
> > OK for the GCC 12 branch?
> 
> OK, thanks.

Thanks.  The patch cherry-picks cleanly on the GCC 11 branch, and
bootstraps/regtests OK there.  Is it OK for GCC 11 too, even though the
issue is latent there (at least for the testcase in the patch)?

Alex

> 
> Richard
> 
> > Thanks,
> > Alex
> >
> > -- >8 --
> >
> > The PR shows us ICEing due to an unrecognizable TFmode save emitted by
> > aarch64_process_components.  The problem is that for T{I,F,D}mode we
> > conservatively require mems to be in range for x-register ldp/stp.  That
> > is because (at least for TImode) it can be allocated to both GPRs and
> > FPRs, and in the GPR case that is an x-reg ldp/stp, and the FPR case is
> > a q-register load/store.
> >
> > As Richard pointed out in the PR, aarch64_get_separate_components
> > already checks that the offsets are suitable for a single load, so we
> > just need to choose a mode in aarch64_reg_save_mode that gives the full
> > q-register range.  In this patch, we choose V16QImode as an alternative
> > 16-byte "bag-of-bits" mode that doesn't have the artificial range
> > restrictions imposed on T{I,F,D}mode.
> >
> > Unlike for GCC 14 we need additional handling in the load/store pair
> > code as various cases are not expecting to see V16QImode (particularly
> > the writeback patterns, but also aarch64_gen_load_pair).
> >
> > gcc/ChangeLog:
> >
> > PR target/111677
> > * config/aarch64/aarch64.cc (aarch64_reg_save_mode): Use
> > V16QImode for the full 16-byte FPR saves in the vector PCS case.
> > (aarch64_gen_storewb_pair): Handle V16QImode.
> > (aarch64_gen_loadwb_pair): Likewise.
> > (aarch64_gen_load_pair): Likewise.
> > * config/aarch64/aarch64.md (loadwb_pair_):
> > Rename to ...
> > (loadwb_pair_): ... this, extending to
> > V16QImode.
> > (storewb_pair_): Rename to ...
> > (storewb_pair_): ... this, extending to
> > V16QImode.
> > * config/aarch64/iterators.md (TX_V16QI): New.
> >
> > gcc/testsuite/ChangeLog:
> >
> > PR target/111677
> > * gcc.target/aarch64/torture/pr111677.c: New test.
> >
> > (cherry picked from commit 2bd8264a131ee1215d3bc6181722f9d30f5569c3)
> > ---
> >  gcc/config/aarch64/aarch64.cc | 13 ++-
> >  gcc/config/aarch64/aarch64.md | 35 ++-
> >  gcc/config/aarch64/iterators.md   |  3 ++
> >  .../gcc.target/aarch64/torture/pr111677.c | 28 +++
> >  4 files changed, 61 insertions(+), 18 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/torture/pr111677.c
> >
> > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> > index 3bccd96a23d..2bbba323770 100644
> > --- a/gcc/config/aarch64/aarch64.cc
> > +++ b/gcc/config/aarch64/aarch64.cc
> > @@ -4135,7 +4135,7 @@ aarch64_reg_save_mode (unsigned int regno)
> >case ARM_PCS_SIMD:
> > /* The vector PCS saves the low 128 bits (which is the full
> >register on non-SVE targets).  */
> > -   return TFmode;
> > +   return V16QImode;
> >  
> >case ARM_PCS_SVE:
> > /* Use vectors of DImode for registers that need frame
> > @@ -8602,6 +8602,10 @@ aarch64_gen_storewb_pair (machine_mode mode, rtx 
> > base, rtx reg, rtx reg2,
> >return gen_storewb_pairtf_di (base, base, reg, reg2,
> > GEN_INT (-adjustment),
> > GEN_INT (UNITS_PER_VREG - adjustment));
> > +case E_V16QImode:
> > +  return gen_storewb_pai

[PATCH][GCC 12] aarch64: Avoid out-of-range shrink-wrapped saves [PR111677]

2024-02-12 Thread Alex Coplan
This is a backport of the GCC 13 fix for PR111677 to the GCC 12 branch.
The only part of the patch that isn't a straight cherry-pick is due to
the TX iterator lacking TDmode for GCC 12, so this version adjusts
TX_V16QI accordingly.

Bootstrapped/regtested on aarch64-linux-gnu.  The only changes in the
testsuite I saw were in
gcc/testsuite/c-c++-common/hwasan/large-aligned-1.c, where the dg-output
"READ of size 4 [...]" check appears to be flaky on the GCC 12 branch
since libhwasan gained the short granule tag feature.  I've requested a
backport of the following patch (committed as
r13-100-g3771486daa1e904ceae6f3e135b28e58af33849f), which should fix that
(independent) issue for GCC 12:
https://gcc.gnu.org/pipermail/gcc-patches/2024-February/645278.html

OK for the GCC 12 branch?

Thanks,
Alex

-- >8 --

The PR shows us ICEing due to an unrecognizable TFmode save emitted by
aarch64_process_components.  The problem is that for T{I,F,D}mode we
conservatively require mems to be in range for x-register ldp/stp.  That
is because (at least for TImode) it can be allocated to both GPRs and
FPRs, and in the GPR case that is an x-reg ldp/stp, and the FPR case is
a q-register load/store.

As Richard pointed out in the PR, aarch64_get_separate_components
already checks that the offsets are suitable for a single load, so we
just need to choose a mode in aarch64_reg_save_mode that gives the full
q-register range.  In this patch, we choose V16QImode as an alternative
16-byte "bag-of-bits" mode that doesn't have the artificial range
restrictions imposed on T{I,F,D}mode.
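
To make the range mismatch concrete, here is a small sketch (not part of the patch, and much simpler than GCC's real address classification).  The function names are hypothetical; the immediate ranges are those of the A64 encodings for x-register LDP/STP and for a single q-register LDR/STR.

```cpp
#include <cstdint>

// True if OFF is a multiple of SCALE and OFF / SCALE lies in [LO, HI].
constexpr bool
scaled_offset_in_range (std::int64_t off, std::int64_t scale,
                        std::int64_t lo, std::int64_t hi)
{
  return off % scale == 0 && off / scale >= lo && off / scale <= hi;
}

// LDP/STP of two x-registers: signed 7-bit immediate scaled by 8 bytes,
// i.e. byte offsets in [-512, 504].
constexpr bool
x_reg_pair_offset_ok (std::int64_t off)
{
  return scaled_offset_in_range (off, 8, -64, 63);
}

// Single LDR/STR of a q-register: unsigned 12-bit immediate scaled by 16
// bytes ([0, 65520]), or the unscaled signed 9-bit form ([-256, 255]).
constexpr bool
q_reg_single_offset_ok (std::int64_t off)
{
  return scaled_offset_in_range (off, 16, 0, 4095)
         || (off >= -256 && off <= 255);
}

// Models the conservative T{I,F,D}mode rule described above: the offset
// must also be valid for an x-register ldp/stp.
constexpr bool
conservative_16byte_offset_ok (std::int64_t off)
{
  return x_reg_pair_offset_ok (off) && q_reg_single_offset_ok (off);
}

static_assert (q_reg_single_offset_ok (512), "fine for a q-register str");
static_assert (!conservative_16byte_offset_ok (512), "rejected: above 504");
```

An offset such as 512 is therefore acceptable for a single q-register save but is rejected under the conservative rule, which is the kind of save that ends up unrecognizable in TFmode.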

Unlike for GCC 14 we need additional handling in the load/store pair
code as various cases are not expecting to see V16QImode (particularly
the writeback patterns, but also aarch64_gen_load_pair).

gcc/ChangeLog:

PR target/111677
* config/aarch64/aarch64.cc (aarch64_reg_save_mode): Use
V16QImode for the full 16-byte FPR saves in the vector PCS case.
(aarch64_gen_storewb_pair): Handle V16QImode.
(aarch64_gen_loadwb_pair): Likewise.
(aarch64_gen_load_pair): Likewise.
* config/aarch64/aarch64.md (loadwb_pair_):
Rename to ...
(loadwb_pair_): ... this, extending to
V16QImode.
(storewb_pair_): Rename to ...
(storewb_pair_): ... this, extending to
V16QImode.
* config/aarch64/iterators.md (TX_V16QI): New.

gcc/testsuite/ChangeLog:

PR target/111677
* gcc.target/aarch64/torture/pr111677.c: New test.

(cherry picked from commit 2bd8264a131ee1215d3bc6181722f9d30f5569c3)
---
 gcc/config/aarch64/aarch64.cc | 13 ++-
 gcc/config/aarch64/aarch64.md | 35 ++-
 gcc/config/aarch64/iterators.md   |  3 ++
 .../gcc.target/aarch64/torture/pr111677.c | 28 +++
 4 files changed, 61 insertions(+), 18 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/torture/pr111677.c

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 3bccd96a23d..2bbba323770 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -4135,7 +4135,7 @@ aarch64_reg_save_mode (unsigned int regno)
   case ARM_PCS_SIMD:
 	/* The vector PCS saves the low 128 bits (which is the full
 	   register on non-SVE targets).  */
-	return TFmode;
+	return V16QImode;
 
   case ARM_PCS_SVE:
 	/* Use vectors of DImode for registers that need frame
@@ -8602,6 +8602,10 @@ aarch64_gen_storewb_pair (machine_mode mode, rtx base, rtx reg, rtx reg2,
   return gen_storewb_pairtf_di (base, base, reg, reg2,
 GEN_INT (-adjustment),
 GEN_INT (UNITS_PER_VREG - adjustment));
+case E_V16QImode:
+  return gen_storewb_pairv16qi_di (base, base, reg, reg2,
+   GEN_INT (-adjustment),
+   GEN_INT (UNITS_PER_VREG - adjustment));
 default:
   gcc_unreachable ();
 }
@@ -8647,6 +8651,10 @@ aarch64_gen_loadwb_pair (machine_mode mode, rtx base, rtx reg, rtx reg2,
 case E_TFmode:
   return gen_loadwb_pairtf_di (base, base, reg, reg2, GEN_INT (adjustment),
    GEN_INT (UNITS_PER_VREG));
+case E_V16QImode:
+  return gen_loadwb_pairv16qi_di (base, base, reg, reg2,
+  GEN_INT (adjustment),
+  GEN_INT (UNITS_PER_VREG));
 default:
   gcc_unreachable ();
 }
@@ -8730,6 +8738,9 @@ aarch64_gen_load_pair (machine_mode mode, rtx reg1, rtx mem1, rtx reg2,
 case E_V4SImode:
   return gen_load_pairv4siv4si (reg1, mem1, reg2, mem2);
 
+case E_V16QImode:
+  return gen_load_pairv16qiv16qi (reg1, mem1, reg2, mem2);
+
 default:
   gcc_unreachable ();
 }
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index fb100bdf6b3..99f185718c9 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1874,17 +1874,18 @@ (define_insn "loadwb_pair_"
   [(set_attr "type" "neon_load1_2reg")]
 )
 
-(define_insn "loadwb_pair_"
+(define_insn "l

Re: [PATCH][PUSHED] hwasan: support new dg-output format.

2024-02-09 Thread Alex Coplan
Hi,

On 04/05/2022 09:59, Martin Liška wrote:
> Support the change in libsanitizer, which now reports:
> READ of size 4 at 0xc3d4 tags: 02/01(00) (ptr/mem) in thread T0
> 
> So 'tags' now contains 3 entries rather than 2.
> 
> gcc/testsuite/ChangeLog:
> 
>   * c-c++-common/hwasan/alloca-outside-caught.c: Update dg-output.
>   * c-c++-common/hwasan/heap-overflow.c: Likewise.
>   * c-c++-common/hwasan/hwasan-thread-access-parent.c: Likewise.
>   * c-c++-common/hwasan/large-aligned-1.c: Likewise.

I noticed the above test (large-aligned-1.c) failing on the GCC 12
branch due to the change in output format mentioned above.  This patch
(committed as r13-100-g3771486daa1e904ceae6f3e135b28e58af33849f) seems
to apply cleanly on the GCC 12 branch too.  Is it OK to backport to GCC 12?
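
As a rough illustration of why the old expectation stops matching, the sketch below re-expresses the two dg-output patterns with std::regex.  This is only an approximation: the testsuite uses Tcl regular expressions, the function names are made up for the example, and the two-entry sample line in the usage note is an assumption reconstructed from the old pattern.

```cpp
#include <regex>
#include <string>

// Pre-change expectation: exactly "PP/MM" followed by " (ptr/mem)".
inline bool
matches_old_pattern (const std::string &line)
{
  static const std::regex re (
    R"re(tags: [[:xdigit:]]{2}/[[:xdigit:]]{2} \(ptr/mem\) in thread T[0-9]+)re");
  return std::regex_search (line, re);
}

// Post-change expectation: ".*" tolerates an extra entry such as "(00)".
inline bool
matches_new_pattern (const std::string &line)
{
  static const std::regex re (
    R"re(tags: [[:xdigit:]]{2}/[[:xdigit:]]{2}.* \(ptr/mem\) in thread T[0-9]+)re");
  return std::regex_search (line, re);
}
```

With these, a three-entry report such as "tags: 02/01(00) (ptr/mem) in thread T0" is accepted only by the new pattern, while a two-entry report like "tags: 02/01 (ptr/mem) in thread T0" is accepted by both.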

Thanks,
Alex

>   * c-c++-common/hwasan/stack-tagging-basic-1.c: Likewise.
> ---
>  gcc/testsuite/c-c++-common/hwasan/alloca-outside-caught.c   | 2 +-
>  gcc/testsuite/c-c++-common/hwasan/heap-overflow.c   | 2 +-
>  gcc/testsuite/c-c++-common/hwasan/hwasan-thread-access-parent.c | 2 +-
>  gcc/testsuite/c-c++-common/hwasan/large-aligned-1.c | 2 +-
>  gcc/testsuite/c-c++-common/hwasan/stack-tagging-basic-1.c   | 2 +-
>  5 files changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/gcc/testsuite/c-c++-common/hwasan/alloca-outside-caught.c 
> b/gcc/testsuite/c-c++-common/hwasan/alloca-outside-caught.c
> index 60d7a9a874f..6f3825bee7c 100644
> --- a/gcc/testsuite/c-c++-common/hwasan/alloca-outside-caught.c
> +++ b/gcc/testsuite/c-c++-common/hwasan/alloca-outside-caught.c
> @@ -20,6 +20,6 @@ main ()
>  }
>  
>  /* { dg-output "HWAddressSanitizer: tag-mismatch on address 0x\[0-9a-f\]*.*" 
> } */
> -/* { dg-output "READ of size 4 at 0x\[0-9a-f\]* tags: 
> \[\[:xdigit:\]\]\[\[:xdigit:\]\]/00 \\(ptr/mem\\) in thread T0.*" } */
> +/* { dg-output "READ of size 4 at 0x\[0-9a-f\]* tags: 
> \[\[:xdigit:\]\]\[\[:xdigit:\]\]/00.* \\(ptr/mem\\) in thread T0.*" } */
>  /* { dg-output "Address 0x\[0-9a-f\]* is located in stack of thread T0.*" } 
> */
>  /* { dg-output "SUMMARY: HWAddressSanitizer: tag-mismatch \[^\n\]*.*" } */
> diff --git a/gcc/testsuite/c-c++-common/hwasan/heap-overflow.c 
> b/gcc/testsuite/c-c++-common/hwasan/heap-overflow.c
> index 137466800de..bddb38c81f1 100644
> --- a/gcc/testsuite/c-c++-common/hwasan/heap-overflow.c
> +++ b/gcc/testsuite/c-c++-common/hwasan/heap-overflow.c
> @@ -23,7 +23,7 @@ int main(int argc, char **argv) {
>  }
>  
>  /* { dg-output "HWAddressSanitizer: tag-mismatch on address 0x\[0-9a-f\]*.*" 
> } */
> -/* { dg-output "READ of size 1 at 0x\[0-9a-f\]* tags: 
> \[\[:xdigit:\]\]\[\[:xdigit:\]\]/\[\[:xdigit:\]\]\[\[:xdigit:\]\] 
> \\(ptr/mem\\) in thread T0.*" } */
> +/* { dg-output "READ of size 1 at 0x\[0-9a-f\]* tags: 
> \[\[:xdigit:\]\]\[\[:xdigit:\]\]/\[\[:xdigit:\]\]\[\[:xdigit:\]\].* 
> \\(ptr/mem\\) in thread T0.*" } */
>  /* { dg-output "located 0 bytes to the right of 10-byte region.*" } */
>  /* { dg-output "allocated here:.*" } */
>  /* { dg-output "#1 0x\[0-9a-f\]+ +in _*main \[^\n\r]*heap-overflow.c:18" } */
> diff --git a/gcc/testsuite/c-c++-common/hwasan/hwasan-thread-access-parent.c 
> b/gcc/testsuite/c-c++-common/hwasan/hwasan-thread-access-parent.c
> index 828909d3b3b..eca27c8cd2c 100644
> --- a/gcc/testsuite/c-c++-common/hwasan/hwasan-thread-access-parent.c
> +++ b/gcc/testsuite/c-c++-common/hwasan/hwasan-thread-access-parent.c
> @@ -46,6 +46,6 @@ main (int argc, char **argv)
>  }
>  
>  /* { dg-output "HWAddressSanitizer: tag-mismatch on address 0x\[0-9a-f\]*.*" 
> } */
> -/* { dg-output "READ of size 4 at 0x\[0-9a-f\]* tags: 
> 00/\[\[:xdigit:\]\]\[\[:xdigit:\]\] \\(ptr/mem\\) in thread T1.*" } */
> +/* { dg-output "READ of size 4 at 0x\[0-9a-f\]* tags: 
> 00/\[\[:xdigit:\]\]\[\[:xdigit:\]\].* \\(ptr/mem\\) in thread T1.*" } */
>  /* { dg-output "Address 0x\[0-9a-f\]* is located in stack of thread T0.*" } 
> */
>  /* { dg-output "SUMMARY: HWAddressSanitizer: tag-mismatch \[^\n\]*.*" } */
> diff --git a/gcc/testsuite/c-c++-common/hwasan/large-aligned-1.c 
> b/gcc/testsuite/c-c++-common/hwasan/large-aligned-1.c
> index 1aa13032396..6158ba4bdbc 100644
> --- a/gcc/testsuite/c-c++-common/hwasan/large-aligned-1.c
> +++ b/gcc/testsuite/c-c++-common/hwasan/large-aligned-1.c
> @@ -9,6 +9,6 @@
>  /* { dg-output "HWAddressSanitizer: tag-mismatch on address 0x\[0-9a-f\]*.*" 
> } */
>  /* NOTE: This assumes the current tagging mechanism (one at a time from the
> base and large aligned variables being handled first).  */
> -/* { dg-output "READ of size 4 at 0x\[0-9a-f\]* tags: 
> \[\[:xdigit:\]\]\[\[:xdigit:\]\]/\[\[:xdigit:\]\]\[\[:xdigit:\]\] 
> \\(ptr/mem\\) in thread T0.*" } */
> +/* { dg-output "READ of size 4 at 0x\[0-9a-f\]* tags: 
> \[\[:xdigit:\]\]\[\[:xdigit:\]\]/\[\[:xdigit:\]\]\[\[:xdigit:\]\].* 
> \\(ptr/mem\\) in thread T0.*" } */
>  /* { dg-output "Address 0x\[0-9a-f\]* is located in stack of thread T0.*" } 
> *

Re: [PATCH] c++: Don't advertise cxx_constexpr_string_builtins [PR113658]

2024-02-02 Thread Alex Coplan
On 02/02/2024 09:34, Marek Polacek wrote:
> On Fri, Feb 02, 2024 at 10:27:23AM +0000, Alex Coplan wrote:
> > Bootstrapped/regtested on x86_64-apple-darwin, OK for trunk?
> > 
> > Thanks,
> > Alex
> > 
> > -- >8 --
> > 
> > When __has_feature was introduced for GCC 14, I included the feature
> > cxx_constexpr_string_builtins, since GCC does seem to support constexpr
> > evaluation of those relevant string builtins that it implements.
> > 
> > However, as the PR shows, GCC doesn't implement the full list of
> > builtins in the clang documentation.  After enumerating the builtins,
> > the clang docs [1] say:
> > 
> > > Support for constant expression evaluation for the above builtins can
> > > be detected with __has_feature(cxx_constexpr_string_builtins).
> > 
> > and a strict reading of this would suggest we can't really support
> > constexpr evaluation of a builtin if we don't implement the builtin in
> > the first place.
> > 
> > So the conservatively correct thing to do seems to be to stop
> > advertising the feature altogether to avoid failing to build code which
> > assumes the presence of this feature implies the presence of all the
> > builtins listed in the clang documentation.
> > 
> > [1] : https://clang.llvm.org/docs/LanguageExtensions.html#string-builtins
> > 
> > gcc/cp/ChangeLog:
> > 
> > PR c++/113658
> > * cp-objcp-common.cc (cp_feature_table): Remove entry for
> > cxx_constexpr_string_builtins.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > PR c++/113658
> > * g++.dg/ext/pr113658.C: New test.
> 
> > diff --git a/gcc/cp/cp-objcp-common.cc b/gcc/cp/cp-objcp-common.cc
> > index f06edf04ef0..85dde0459fa 100644
> > --- a/gcc/cp/cp-objcp-common.cc
> > +++ b/gcc/cp/cp-objcp-common.cc
> > @@ -110,7 +110,6 @@ static constexpr cp_feature_info cp_feature_table[] =
> >{ "cxx_alignof", cxx11 },
> >{ "cxx_attributes", cxx11 },
> >{ "cxx_constexpr", cxx11 },
> > -  { "cxx_constexpr_string_builtins", cxx11 },
> >{ "cxx_decltype", cxx11 },
> >{ "cxx_decltype_incomplete_return_types", cxx11 },
> >{ "cxx_default_function_template_args", cxx11 },
> > diff --git a/gcc/testsuite/g++.dg/ext/pr113658.C 
> > b/gcc/testsuite/g++.dg/ext/pr113658.C
> > new file mode 100644
> > index 000..f4a34888f28
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.dg/ext/pr113658.C
> 
> Might be better to name this has-feature2.C
> 
> > @@ -0,0 +1,13 @@
> 
> Please include
> // PR c++/113658

Can do.

> 
> > +// { dg-do compile }
> > +// { dg-options "" }
> 
> Why dg-options ""?  It doesn't seem to have any purpose here.

That is to disable -pedantic-errors which IIRC is added by default in
the testsuite options.

With -pedantic-errors we would have __has_extension behaving like
__has_feature, and I wanted to check in the test that this doesn't get
reported as a feature or extension.

Incidentally it also means we don't have to provide a dummy declaration:
with -pedantic-errors we would get a warning about an empty TU, which
would make the test fail.

Thanks,
Alex

> 
> > +// PR113658: we shouldn't declare support for 
> > cxx_constexpr_string_builtins as
> > +// GCC is missing some of the builtins that clang implements.
> > +
> > +#if __has_feature (cxx_constexpr_string_builtins)
> > +#error
> > +#endif
> > +
> > +#if __has_extension (cxx_constexpr_string_builtins)
> > +#error
> > +#endif
> 
> 
> Marek
> 


[PATCH] c++: Don't advertise cxx_constexpr_string_builtins [PR113658]

2024-02-02 Thread Alex Coplan
Bootstrapped/regtested on x86_64-apple-darwin, OK for trunk?

Thanks,
Alex

-- >8 --

When __has_feature was introduced for GCC 14, I included the feature
cxx_constexpr_string_builtins, since GCC does seem to support constexpr
evaluation of those relevant string builtins that it implements.

However, as the PR shows, GCC doesn't implement the full list of
builtins in the clang documentation.  After enumerating the builtins,
the clang docs [1] say:

> Support for constant expression evaluation for the above builtins can
> be detected with __has_feature(cxx_constexpr_string_builtins).

and a strict reading of this would suggest we can't really support
constexpr evaluation of a builtin if we don't implement the builtin in
the first place.

So the conservatively correct thing to do seems to be to stop
advertising the feature altogether to avoid failing to build code which
assumes the presence of this feature implies the presence of all the
builtins listed in the clang documentation.
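
For context, here is a sketch of the kind of user code that keys off this feature test.  The name ce_strlen and the recursive fallback are illustrative assumptions, not code from the PR or from any particular library.

```cpp
#include <cstddef>

// Compilers without __has_feature (e.g. GCC before 14) get a fallback that
// reports every feature as absent.
#ifndef __has_feature
#  define __has_feature(x) 0
#endif

#if __has_feature(cxx_constexpr_string_builtins)
// Per the clang documentation, the string builtins are usable in constant
// expressions when this feature is reported.
constexpr std::size_t
ce_strlen (const char *s)
{
  return __builtin_strlen (s);
}
#else
// Portable fallback: a C++11-style recursive constexpr strlen.
constexpr std::size_t
ce_strlen (const char *s)
{
  return *s ? 1 + ce_strlen (s + 1) : 0;
}
#endif

static_assert (ce_strlen ("abc") == 3, "evaluated at compile time");
```

Code written this way keeps building once the feature is no longer advertised, because it falls through to the portable branch.  Code that takes a positive result as a promise that every builtin in the clang list exists is the case this patch is meant to protect.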

[1] : https://clang.llvm.org/docs/LanguageExtensions.html#string-builtins

gcc/cp/ChangeLog:

PR c++/113658
* cp-objcp-common.cc (cp_feature_table): Remove entry for
cxx_constexpr_string_builtins.

gcc/testsuite/ChangeLog:

PR c++/113658
* g++.dg/ext/pr113658.C: New test.
diff --git a/gcc/cp/cp-objcp-common.cc b/gcc/cp/cp-objcp-common.cc
index f06edf04ef0..85dde0459fa 100644
--- a/gcc/cp/cp-objcp-common.cc
+++ b/gcc/cp/cp-objcp-common.cc
@@ -110,7 +110,6 @@ static constexpr cp_feature_info cp_feature_table[] =
   { "cxx_alignof", cxx11 },
   { "cxx_attributes", cxx11 },
   { "cxx_constexpr", cxx11 },
-  { "cxx_constexpr_string_builtins", cxx11 },
   { "cxx_decltype", cxx11 },
   { "cxx_decltype_incomplete_return_types", cxx11 },
   { "cxx_default_function_template_args", cxx11 },
diff --git a/gcc/testsuite/g++.dg/ext/pr113658.C b/gcc/testsuite/g++.dg/ext/pr113658.C
new file mode 100644
index 000..f4a34888f28
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/pr113658.C
@@ -0,0 +1,13 @@
+// { dg-do compile }
+// { dg-options "" }
+
+// PR113658: we shouldn't declare support for cxx_constexpr_string_builtins as
+// GCC is missing some of the builtins that clang implements.
+
+#if __has_feature (cxx_constexpr_string_builtins)
+#error
+#endif
+
+#if __has_extension (cxx_constexpr_string_builtins)
+#error
+#endif


Re: [PATCH v2] c++: avoid -Wdangling-reference for std::span-like classes [PR110358]

2024-02-01 Thread Alex Coplan
On 31/01/2024 15:53, Marek Polacek wrote:
> On Wed, Jan 31, 2024 at 07:44:41PM +0000, Alex Coplan wrote:
> > Hi Marek,
> > 
> > On 30/01/2024 13:15, Marek Polacek wrote:
> > > On Thu, Jan 25, 2024 at 10:13:10PM -0500, Jason Merrill wrote:
> > > > On 1/25/24 20:36, Marek Polacek wrote:
> > > > > Better version:
> > > > > 
> > > > > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> > > > > 
> > > > > -- >8 --
> > > > > Real-world experience shows that -Wdangling-reference triggers for
> > > > > user-defined std::span-like classes a lot.  We can easily avoid that
> > > > > by considering classes like
> > > > > 
> > > > >  template
> > > > >  struct Span {
> > > > >T* data_;
> > > > >std::size len_;
> > > > >  };
> > > > > 
> > > > > to be std::span-like, and not warning for them.  Unlike the previous
> > > > > patch, this one considers a non-union class template that has a 
> > > > > pointer
> > > > > data member and a trivial destructor as std::span-like.
> > > > > 
> > > > >   PR c++/110358
> > > > >   PR c++/109640
> > > > > 
> > > > > gcc/cp/ChangeLog:
> > > > > 
> > > > >   * call.cc (reference_like_class_p): Don't warn for 
> > > > > std::span-like
> > > > >   classes.
> > > > > 
> > > > > gcc/ChangeLog:
> > > > > 
> > > > >   * doc/invoke.texi: Update -Wdangling-reference description.
> > > > > 
> > > > > gcc/testsuite/ChangeLog:
> > > > > 
> > > > >   * g++.dg/warn/Wdangling-reference18.C: New test.
> > > > >   * g++.dg/warn/Wdangling-reference19.C: New test.
> > > > >   * g++.dg/warn/Wdangling-reference20.C: New test.
> > > > > ---
> > > > >   gcc/cp/call.cc| 18 
> > > > >   gcc/doc/invoke.texi   | 14 +++
> > > > >   .../g++.dg/warn/Wdangling-reference18.C   | 24 +++
> > > > >   .../g++.dg/warn/Wdangling-reference19.C   | 25 +++
> > > > >   .../g++.dg/warn/Wdangling-reference20.C   | 42 
> > > > > +++
> > > > >   5 files changed, 123 insertions(+)
> > > > >   create mode 100644 gcc/testsuite/g++.dg/warn/Wdangling-reference18.C
> > > > >   create mode 100644 gcc/testsuite/g++.dg/warn/Wdangling-reference19.C
> > > > >   create mode 100644 gcc/testsuite/g++.dg/warn/Wdangling-reference20.C
> > > > > 
> > > > > diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
> > > > > index 9de0d77c423..afd3e1ff024 100644
> > > > > --- a/gcc/cp/call.cc
> > > > > +++ b/gcc/cp/call.cc
> > > > > @@ -14082,6 +14082,24 @@ reference_like_class_p (tree ctype)
> > > > >   return true;
> > > > >   }
> > > > > +  /* Avoid warning if CTYPE looks like std::span: it's a class 
> > > > > template,
> > > > > + has a T* member, and a trivial destructor.  For example,
> > > > > +
> > > > > +  template
> > > > > +  struct Span {
> > > > > + T* data_;
> > > > > + std::size len_;
> > > > > +  };
> > > > > +
> > > > > + is considered std::span-like.  */
> > > > > +  if (NON_UNION_CLASS_TYPE_P (ctype)
> > > > > +  && CLASSTYPE_TEMPLATE_INSTANTIATION (ctype)
> > > > > +  && TYPE_HAS_TRIVIAL_DESTRUCTOR (ctype))
> > > > > +for (tree field = next_aggregate_field (TYPE_FIELDS (ctype));
> > > > > +  field; field = next_aggregate_field (DECL_CHAIN (field)))
> > > > > +  if (TYPE_PTR_P (TREE_TYPE (field)))
> > > > > + return true;
> > > > > +
> > > > > /* Some classes, such as std::tuple, have the reference member in 
> > > > > its
> > > > >(non-direct) base class.  */
> > > > > if (dfs_walk_once (TYPE_BINFO (ctype), 
> > > > > class_has_reference_member_p_r,
> > > > > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.te

Re: [PATCH v2] c++: avoid -Wdangling-reference for std::span-like classes [PR110358]

2024-01-31 Thread Alex Coplan
Hi Marek,

On 30/01/2024 13:15, Marek Polacek wrote:
> On Thu, Jan 25, 2024 at 10:13:10PM -0500, Jason Merrill wrote:
> > On 1/25/24 20:36, Marek Polacek wrote:
> > > Better version:
> > > 
> > > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> > > 
> > > -- >8 --
> > > Real-world experience shows that -Wdangling-reference triggers for
> > > user-defined std::span-like classes a lot.  We can easily avoid that
> > > by considering classes like
> > > 
> > >  template
> > >  struct Span {
> > >T* data_;
> > >std::size len_;
> > >  };
> > > 
> > > to be std::span-like, and not warning for them.  Unlike the previous
> > > patch, this one considers a non-union class template that has a pointer
> > > data member and a trivial destructor as std::span-like.
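
For illustration, a self-contained sketch of the pattern being discussed (not one of the patch's tests).  The names storage and first_element are made up for the example; Span, get and front mirror the names used in the quoted patch.

```cpp
#include <cstddef>
#include <type_traits>

// A user-defined view type of the shape described above: a class template
// with a pointer data member and a trivial destructor.
template <typename T>
struct Span
{
  T *data_;
  std::size_t len_;

  constexpr T &front () const noexcept { return data_[0]; }
};

static_assert (std::is_class<Span<int>>::value, "non-union class type");
static_assert (std::is_trivially_destructible<Span<int>>::value,
               "trivial destructor");

// Hypothetical backing storage that outlives every Span referring to it.
static int storage[3] = { 1, 2, 3 };

inline Span<int>
get ()
{
  return Span<int> { storage, 3 };
}

inline int
first_element ()
{
  // The temporary Span dies at the end of the full expression, but the
  // reference points into `storage`, so it does not dangle.
  const int &a = get ().front ();
  return a;
}
```

The reference bound in first_element is initialized from a member function called on a temporary, which is the shape -Wdangling-reference looks for, yet the referent lives in storage rather than in the temporary.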
> > > 
> > >   PR c++/110358
> > >   PR c++/109640
> > > 
> > > gcc/cp/ChangeLog:
> > > 
> > >   * call.cc (reference_like_class_p): Don't warn for std::span-like
> > >   classes.
> > > 
> > > gcc/ChangeLog:
> > > 
> > >   * doc/invoke.texi: Update -Wdangling-reference description.
> > > 
> > > gcc/testsuite/ChangeLog:
> > > 
> > >   * g++.dg/warn/Wdangling-reference18.C: New test.
> > >   * g++.dg/warn/Wdangling-reference19.C: New test.
> > >   * g++.dg/warn/Wdangling-reference20.C: New test.
> > > ---
> > >   gcc/cp/call.cc| 18 
> > >   gcc/doc/invoke.texi   | 14 +++
> > >   .../g++.dg/warn/Wdangling-reference18.C   | 24 +++
> > >   .../g++.dg/warn/Wdangling-reference19.C   | 25 +++
> > >   .../g++.dg/warn/Wdangling-reference20.C   | 42 +++
> > >   5 files changed, 123 insertions(+)
> > >   create mode 100644 gcc/testsuite/g++.dg/warn/Wdangling-reference18.C
> > >   create mode 100644 gcc/testsuite/g++.dg/warn/Wdangling-reference19.C
> > >   create mode 100644 gcc/testsuite/g++.dg/warn/Wdangling-reference20.C
> > > 
> > > diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
> > > index 9de0d77c423..afd3e1ff024 100644
> > > --- a/gcc/cp/call.cc
> > > +++ b/gcc/cp/call.cc
> > > @@ -14082,6 +14082,24 @@ reference_like_class_p (tree ctype)
> > >   return true;
> > >   }
> > > +  /* Avoid warning if CTYPE looks like std::span: it's a class template,
> > > + has a T* member, and a trivial destructor.  For example,
> > > +
> > > +  template
> > > +  struct Span {
> > > + T* data_;
> > > + std::size len_;
> > > +  };
> > > +
> > > + is considered std::span-like.  */
> > > +  if (NON_UNION_CLASS_TYPE_P (ctype)
> > > +  && CLASSTYPE_TEMPLATE_INSTANTIATION (ctype)
> > > +  && TYPE_HAS_TRIVIAL_DESTRUCTOR (ctype))
> > > +for (tree field = next_aggregate_field (TYPE_FIELDS (ctype));
> > > +  field; field = next_aggregate_field (DECL_CHAIN (field)))
> > > +  if (TYPE_PTR_P (TREE_TYPE (field)))
> > > + return true;
> > > +
> > > /* Some classes, such as std::tuple, have the reference member in its
> > >(non-direct) base class.  */
> > > if (dfs_walk_once (TYPE_BINFO (ctype), class_has_reference_member_p_r,
> > > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > > index 6ec56493e59..e0ff18a86f5 100644
> > > --- a/gcc/doc/invoke.texi
> > > +++ b/gcc/doc/invoke.texi
> > > @@ -3916,6 +3916,20 @@ where @code{std::minmax} returns 
> > > @code{std::pair}, and
> > >   both references dangle after the end of the full expression that 
> > > contains
> > >   the call to @code{std::minmax}.
> > > +The warning does not warn for @code{std::span}-like classes.  We consider
> > > +classes of the form:
> > > +
> > > +@smallexample
> > > +template
> > > +struct Span @{
> > > +  T* data_;
> > > +  std::size len_;
> > > +@};
> > > +@end smallexample
> > > +
> > > +as @code{std::span}-like; that is, the class is a non-union class 
> > > template
> > > +that has a pointer data member and a trivial destructor.
> > > +
> > >   This warning is enabled by @option{-Wall}.
> > >   @opindex Wdelete-non-virtual-dtor
> > > diff --git a/gcc/testsuite/g++.dg/warn/Wdangling-reference18.C 
> > > b/gcc/testsuite/g++.dg/warn/Wdangling-reference18.C
> > > new file mode 100644
> > > index 000..e088c177769
> > > --- /dev/null
> > > +++ b/gcc/testsuite/g++.dg/warn/Wdangling-reference18.C
> > > @@ -0,0 +1,24 @@
> > > +// PR c++/110358
> > > +// { dg-do compile { target c++11 } }
> > > +// { dg-options "-Wdangling-reference" }
> > > +// Don't warn for std::span-like classes.
> > > +
> > > +template 
> > > +struct Span {
> > > +T* data_;
> > > +int len_;
> > > +
> > > +[[nodiscard]] constexpr auto operator[](int n) const noexcept -> T& 
> > > { return data_[n]; }
> > > +[[nodiscard]] constexpr auto front() const noexcept -> T& { return 
> > > data_[0]; }
> > > +[[nodiscard]] constexpr auto back() const noexcept -> T& { return 
> > > data_[len_ - 1]; }
> > > +};
> > > +
> > > +auto get() -> Span;
> > > +
> > > +auto f() -> int {
> > > +int const& a = get().front(); // { dg-

[PATCH][GCC 13] aarch64: Avoid out-of-range shrink-wrapped saves [PR111677]

2024-01-30 Thread Alex Coplan
Bootstrapped/regtested on aarch64-linux-gnu, OK for the 13 branch after
a week of the trunk fix being in?  OK for the other active branches if
the same changes test cleanly there?

GCC 14 patch for reference:
https://gcc.gnu.org/pipermail/gcc-patches/2024-January/61.html

Thanks,
Alex

-- >8 --

The PR shows us ICEing due to an unrecognizable TFmode save emitted by
aarch64_process_components.  The problem is that for T{I,F,D}mode we
conservatively require mems to be in range for x-register ldp/stp.  That
is because (at least for TImode) it can be allocated to both GPRs and
FPRs, and in the GPR case that is an x-reg ldp/stp, and the FPR case is
a q-register load/store.

As Richard pointed out in the PR, aarch64_get_separate_components
already checks that the offsets are suitable for a single load, so we
just need to choose a mode in aarch64_reg_save_mode that gives the full
q-register range.  In this patch, we choose V16QImode as an alternative
16-byte "bag-of-bits" mode that doesn't have the artificial range
restrictions imposed on T{I,F,D}mode.

For T{F,D}mode in GCC 15 I think we could consider relaxing the
restriction imposed in aarch64_classify_address, as AFAIK T{F,D}mode can
only be allocated to FPRs (unlike TImode).  But such a change seems too
invasive to consider for GCC 14 at this stage (let alone backports).

Unlike for GCC 14 we need additional handling in the load/store pair
code as various cases are not expecting to see V16QImode (particularly
the writeback patterns, but also aarch64_gen_load_pair).

gcc/ChangeLog:

PR target/111677
* config/aarch64/aarch64.cc (aarch64_reg_save_mode): Use
V16QImode for the full 16-byte FPR saves in the vector PCS case.
(aarch64_gen_storewb_pair): Handle V16QImode.
(aarch64_gen_loadwb_pair): Likewise.
(aarch64_gen_load_pair): Likewise.
* config/aarch64/aarch64.md (loadwb_pair_):
Rename to ...
(loadwb_pair_): ... this, extending to
V16QImode.
(storewb_pair_): Rename to ...
(storewb_pair_): ... this, extending to
V16QImode.
* config/aarch64/iterators.md (TX_V16QI): New.

gcc/testsuite/ChangeLog:

PR target/111677
* gcc.target/aarch64/torture/pr111677.c: New test.
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 02515d4683a..f546c48ae2d 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -4074,7 +4074,7 @@ aarch64_reg_save_mode (unsigned int regno)
   case ARM_PCS_SIMD:
/* The vector PCS saves the low 128 bits (which is the full
   register on non-SVE targets).  */
-   return TFmode;
+   return V16QImode;
 
   case ARM_PCS_SVE:
/* Use vectors of DImode for registers that need frame
@@ -8863,6 +8863,10 @@ aarch64_gen_storewb_pair (machine_mode mode, rtx base, rtx reg, rtx reg2,
   return gen_storewb_pairtf_di (base, base, reg, reg2,
GEN_INT (-adjustment),
GEN_INT (UNITS_PER_VREG - adjustment));
+case E_V16QImode:
+  return gen_storewb_pairv16qi_di (base, base, reg, reg2,
+  GEN_INT (-adjustment),
+  GEN_INT (UNITS_PER_VREG - adjustment));
 default:
   gcc_unreachable ();
 }
@@ -8908,6 +8912,10 @@ aarch64_gen_loadwb_pair (machine_mode mode, rtx base, 
rtx reg, rtx reg2,
 case E_TFmode:
   return gen_loadwb_pairtf_di (base, base, reg, reg2, GEN_INT (adjustment),
   GEN_INT (UNITS_PER_VREG));
+case E_V16QImode:
+  return gen_loadwb_pairv16qi_di (base, base, reg, reg2,
+ GEN_INT (adjustment),
+ GEN_INT (UNITS_PER_VREG));
 default:
   gcc_unreachable ();
 }
@@ -8991,6 +8999,9 @@ aarch64_gen_load_pair (machine_mode mode, rtx reg1, rtx 
mem1, rtx reg2,
 case E_V4SImode:
   return gen_load_pairv4siv4si (reg1, mem1, reg2, mem2);
 
+case E_V16QImode:
+  return gen_load_pairv16qiv16qi (reg1, mem1, reg2, mem2);
+
 default:
   gcc_unreachable ();
 }
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 50239d72fc0..922cc987595 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1896,17 +1896,18 @@ (define_insn "loadwb_pair_"
   [(set_attr "type" "neon_load1_2reg")]
 )
 
-(define_insn "loadwb_pair_"
+(define_insn "loadwb_pair_"
   [(parallel
 [(set (match_operand:P 0 "register_operand" "=k")
-  (plus:P (match_operand:P 1 "register_operand" "0")
-  (match_operand:P 4 "aarch64_mem_pair_offset" "n")))
- (set (match_operand:TX 2 "register_operand" "=w")
-  (mem:TX (match_dup 1)))
- (set (match_operand:TX 3 "register_operand" "=w")
-  (mem:TX (plus:P (match_dup 1)
+ (plus:P (match_operand:P 1 "register_operand" "0"

[PATCH] aarch64: Avoid out-of-range shrink-wrapped saves [PR111677]

2024-01-30 Thread Alex Coplan
Hi,

The PR shows us ICEing due to an unrecognizable TFmode save emitted by
aarch64_process_components.  The problem is that for T{I,F,D}mode we
conservatively require mems to be in range for x-register ldp/stp.  That
is because (at least for TImode) the value can be allocated to either
GPRs or FPRs: in the GPR case the access is an x-register ldp/stp, and
in the FPR case it is a single q-register load/store.

As Richard pointed out in the PR, aarch64_get_separate_components
already checks that the offsets are suitable for a single load, so we
just need to choose a mode in aarch64_reg_save_mode that gives the full
q-register range.  In this patch, we choose V16QImode as an alternative
16-byte "bag-of-bits" mode that doesn't have the artificial range
restrictions imposed on T{I,F,D}mode.
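
To illustrate the range difference, here is a minimal sketch of the two
offset checks involved.  The helper names are made up for illustration
and are not the functions GCC uses; the ranges are the A64 immediate
encodings (signed 7-bit scaled by 8 for an x-register ldp/stp, unsigned
12-bit scaled by 16 for a single q-register ldr/str).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers, for illustration only (not GCC's own predicates).

// ldp/stp of two x-registers: signed 7-bit immediate scaled by 8,
// i.e. multiples of 8 in [-512, 504].
constexpr bool xreg_pair_offset_ok (int64_t off)
{
  return off % 8 == 0 && off >= -64 * 8 && off <= 63 * 8;
}

// ldr/str of one q-register (unsigned-offset form): 12-bit immediate
// scaled by 16, i.e. multiples of 16 in [0, 65520].
constexpr bool qreg_single_offset_ok (int64_t off)
{
  return off % 16 == 0 && off >= 0 && off <= 4095 * 16;
}

// An offset that a single q-register save can reach, but which is
// rejected if we also insist on the x-register pair range.
static_assert (qreg_single_offset_ok (1024), "in q-register range");
static_assert (!xreg_pair_offset_ok (1024), "outside x-register pair range");
```

A save at such an offset is the kind of access that passes the
single-load check but is then rejected under the conservative
T{I,F,D}mode restriction.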

For T{F,D}mode in GCC 15 I think we could consider relaxing the
restriction imposed in aarch64_classify_address, as AFAIK T{F,D}mode can
only be allocated to FPRs (unlike TImode).  But such a change seems too
invasive to consider for GCC 14 at this stage (let alone backports).

Fortunately the new flexible load/store pair patterns in GCC 14 allow
this mode change to work without further changes.  The backports are
more involved as we need to adjust the load/store pair handling to cater
for V16QImode in a few places.

Note that for the testcase we are relying on the torture options to add
-funroll-loops at -O3 which is necessary to trigger the ICE on trunk
(but not on the 13 branch).

Bootstrapped/regtested on aarch64-linux-gnu, OK for trunk?

Thanks,
Alex

gcc/ChangeLog:

PR target/111677
* config/aarch64/aarch64.cc (aarch64_reg_save_mode): Use
V16QImode for the full 16-byte FPR saves in the vector PCS case.

gcc/testsuite/ChangeLog:

PR target/111677
* gcc.target/aarch64/torture/pr111677.c: New test.
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index a37d47b243e..4556b8dd504 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -2361,7 +2361,7 @@ aarch64_reg_save_mode (unsigned int regno)
   case ARM_PCS_SIMD:
/* The vector PCS saves the low 128 bits (which is the full
   register on non-SVE targets).  */
-   return TFmode;
+   return V16QImode;
 
   case ARM_PCS_SVE:
/* Use vectors of DImode for registers that need frame
diff --git a/gcc/testsuite/gcc.target/aarch64/torture/pr111677.c 
b/gcc/testsuite/gcc.target/aarch64/torture/pr111677.c
new file mode 100644
index 000..6bb640c42c0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/torture/pr111677.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target fopenmp } */
+/* { dg-options "-ffast-math -fstack-protector-strong -fopenmp" } */
+typedef struct {
+  long size_z;
+  int width;
+} dt_bilateral_t;
+typedef float dt_aligned_pixel_t[4];
+#pragma omp declare simd
+void dt_bilateral_splat(dt_bilateral_t *b) {
+  float *buf;
+  long offsets[8];
+  for (; b;) {
+int firstrow;
+for (int j = firstrow; j; j++)
+  for (int i; i < b->width; i++) {
+dt_aligned_pixel_t contrib;
+for (int k = 0; k < 4; k++)
+  buf[offsets[k]] += contrib[k];
+  }
+float *dest;
+for (int j = (long)b; j; j++) {
+  float *src = (float *)b->size_z;
+  for (int i = 0; i < (long)b; i++)
+dest[i] += src[i];
+}
+  }
+}


[PATCH] aarch64: Ensure iterator validity when updating debug uses [PR113616]

2024-01-29 Thread Alex Coplan
Hi,

The fix for PR113089 introduced range-based for loops over the
debug_insn_uses of an RTL-SSA set_info, but in the case that we reset a
debug insn, the use would get removed from the use list, and thus we
would end up using an invalidated iterator in the next iteration of the
loop.  In practice this means we end up terminating the loop
prematurely, and hence ICE as in PR113089 since there are debug uses
that we failed to fix up.

This patch fixes that by introducing a general mechanism to avoid this
sort of problem.  We introduce a safe_iterator to iterator-utils.h which
wraps an iterator, and also holds the end iterator value.  It then
pre-computes the next iterator value at each iteration, so even if the
original iterator is invalidated during the loop body, we can still move
safely to the next iteration.

We introduce an iterate_safely helper which effectively adapts a
container such as iterator_range into a container of safe_iterators over
the original iterator type.
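
As a rough sketch of the idea (not the code added to iterator-utils.h;
the names safe_iter, safe_range and safely are hypothetical and the
details are simplified), a wrapper that pre-computes the next position
could look like this, using std::list as a stand-in for the use list:

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <utility>
#include <vector>

// Hypothetical sketch; not the safe_iterator from GCC's iterator-utils.h.
template<typename Iter>
class safe_iter
{
  Iter m_cur, m_next, m_end;

  // Compute the successor of m_cur now, before the loop body gets a
  // chance to invalidate m_cur.
  void precompute ()
  {
    m_next = m_cur;
    if (m_cur != m_end)
      ++m_next;
  }

public:
  safe_iter (Iter cur, Iter end) : m_cur (cur), m_next (cur), m_end (end)
  { precompute (); }

  typename std::iterator_traits<Iter>::reference operator* () const
  { return *m_cur; }

  // Advance using the saved successor; m_cur itself is never incremented.
  safe_iter &operator++ ()
  {
    m_cur = m_next;
    precompute ();
    return *this;
  }

  bool operator!= (const safe_iter &other) const
  { return m_cur != other.m_cur; }
};

// Adapts a container into a range of safe_iters over its iterator type.
template<typename Container>
struct safe_range
{
  using iter = decltype (std::begin (std::declval<Container &> ()));
  Container &m_c;
  safe_iter<iter> begin () { return { std::begin (m_c), std::end (m_c) }; }
  safe_iter<iter> end () { return { std::end (m_c), std::end (m_c) }; }
};

template<typename Container>
safe_range<Container> safely (Container &c) { return { c }; }

// Remove the current element while iterating.  Erasing a std::list node
// invalidates only iterators to that node, so the saved successor stays
// valid.
std::vector<int> drop_evens ()
{
  std::list<int> l = { 1, 2, 3, 4, 5, 6 };
  for (int &x : safely (l))
    if (x % 2 == 0)
      {
        int v = x;	// copy first: x dangles once its node is erased
        l.remove (v);
      }
  return std::vector<int> (l.begin (), l.end ());
}
```

With a plain range-based for loop over the list, the same body would
increment an iterator to an erased node, which is the failure mode
described above.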

We then use iterate_safely around all loops over debug_insn_uses () in
the aarch64 ldp/stp pass to fix PR113616.  While doing this, I
remembered that cleanup_tombstones () had the same problem.  I
previously worked around this locally by manually maintaining the next
nondebug insn, so this patch also refactors that loop to use the new
iterate_safely helper.

While doing that I noticed that a couple of cases in cleanup_tombstones
could be converted from using dyn_cast<set_info *> to as_a<set_info *>,
which should be safe because there are no clobbers of mem in RTL-SSA, so
all defs of memory should be set_infos.

Bootstrapped/regtested on aarch64-linux-gnu, OK for trunk?

Thanks,
Alex

gcc/ChangeLog:

PR target/113616
* config/aarch64/aarch64-ldp-fusion.cc (fixup_debug_uses_trailing_add):
Use iterate_safely when iterating over debug uses.
(fixup_debug_uses): Likewise.
(ldp_bb_info::cleanup_tombstones): Use iterate_safely to iterate
over nondebug insns instead of manually maintaining the next insn.
* iterator-utils.h (class safe_iterator): New.
(iterate_safely): New.

gcc/testsuite/ChangeLog:

PR target/113616
* gcc.c-torture/compile/pr113616.c: New test.
diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc 
b/gcc/config/aarch64/aarch64-ldp-fusion.cc
index 932a6398ae3..22ed95eb743 100644
--- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
+++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
@@ -1480,7 +1480,7 @@ fixup_debug_uses_trailing_add (obstack_watermark &attempt,
   def_info *def = defs[0];
 
   if (auto set = safe_dyn_cast (def->prev_def ()))
-for (auto use : set->debug_insn_uses ())
+for (auto use : iterate_safely (set->debug_insn_uses ()))
   if (*use->insn () > *pair_dst)
// DEF is getting re-ordered above USE, fix up USE accordingly.
fixup_debug_use (attempt, use, def, base, wb_offset);
@@ -1544,13 +1544,16 @@ fixup_debug_uses (obstack_watermark &attempt,
   auto def = memory_access (insns[0]->defs ());
   auto last_def = memory_access (insns[1]->defs ());
   for (; def != last_def; def = def->next_def ())
-   for (auto use : as_a (def)->debug_insn_uses ())
- {
-   if (dump_file)
- fprintf (dump_file, "  i%d: resetting debug use of mem\n",
-  use->insn ()->uid ());
-   reset_debug_use (use);
- }
+   {
+ auto set = as_a (def);
+ for (auto use : iterate_safely (set->debug_insn_uses ()))
+   {
+ if (dump_file)
+   fprintf (dump_file, "  i%d: resetting debug use of mem\n",
+use->insn ()->uid ());
+ reset_debug_use (use);
+   }
+   }
 }
 
   // Now let's take care of register uses, starting with debug uses
@@ -1577,7 +1580,7 @@ fixup_debug_uses (obstack_watermark &attempt,
 
   // Now that we've characterized the defs involved, go through the
   // debug uses and determine how to update them (if needed).
-  for (auto use : set->debug_insn_uses ())
+  for (auto use : iterate_safely (set->debug_insn_uses ()))
{
  if (*pair_dst < *use->insn () && defs[1])
// We're re-ordering defs[1] above a previous use of the
@@ -1609,7 +1612,7 @@ fixup_debug_uses (obstack_watermark &attempt,
 
   // We have a def in insns[1] which isn't def'd by the first insn.
   // Look to the previous def and see if it has any debug uses.
-  for (auto use : prev_set->debug_insn_uses ())
+  for (auto use : iterate_safely (prev_set->debug_insn_uses ()))
if (*pair_dst < *use->insn ())
  // We're ordering DEF above a previous use of the same register.
  update_debug_use (use, def, writeback_pat);
@@ -1622,7 +1625,8 @@ fixup_debug_uses (obstack_watermark &attempt,
   // second writeback def which need re-parenting: do that.
   auto def = find_access (insns[1]->defs (), base_regno);
   gcc_assert (def);
- 

Re: [PATCH] aarch64: Fix undefinedness while testing the J constraint [PR100204]

2024-01-26 Thread Alex Coplan
On 25/01/2024 11:57, Andrew Pinski wrote:
> The J constraint can invoke undefined behavior due to it taking the
> negative of the ival if ival was HWI_MIN. The fix is simple as casting
> to `unsigned HOST_WIDE_INT` before doing the negative of it. This
> does that.

Thanks for doing this.
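
For anyone following along, a minimal standalone sketch of the issue
(using int64_t/uint64_t as assumed stand-ins for HOST_WIDE_INT and
unsigned HOST_WIDE_INT; negate_wrapping is a made-up name):

```cpp
#include <cassert>
#include <cstdint>

// Assumed stand-ins for HOST_WIDE_INT / unsigned HOST_WIDE_INT.
using hwi = int64_t;
using uhwi = uint64_t;

// Negating the most negative hwi as a signed value overflows, which is
// undefined behaviour.  Converting to the unsigned type first makes the
// negation wrap modulo 2^64, which is well defined for every input.
constexpr uhwi negate_wrapping (hwi ival)
{
  return -(uhwi) ival;
}
```

For the minimum value the result is 2^63, the same bit pattern as the
input, so the range check simply sees a large out-of-range constant.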

> 
> Committed as obvious after build/test for aarch64-linux-gnu.
> 
> gcc/ChangeLog:
> 
>   PR target/100204
>   * config/aarch64/constraints.md (J): Cast to `unsigned HOST_WIDE_INT`
>   before taking the negative of it.
> 
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/config/aarch64/constraints.md | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/config/aarch64/constraints.md 
> b/gcc/config/aarch64/constraints.md
> index 8566befd727..a2569cea510 100644
> --- a/gcc/config/aarch64/constraints.md
> +++ b/gcc/config/aarch64/constraints.md
> @@ -118,7 +118,7 @@ (define_constraint "Uat"
>  (define_constraint "J"
>   "A constant that can be used with a SUB operation (once negated)."
>   (and (match_code "const_int")
> -  (match_test "aarch64_uimm12_shift (-ival)")))
> +  (match_test "aarch64_uimm12_shift (- (unsigned HOST_WIDE_INT) ival)")))

Sorry for the nitpick, but: I don't think we want a space after the unary -
here (at least according to https://gcc.gnu.org/codingconventions.html).

Alex

>  
>  ;; We can't use the mode of a CONST_INT to determine the context in
>  ;; which it is being used, so we must have a separate constraint for
> -- 
> 2.39.3
> 


Re: [PATCH V2] rs6000: New pass for replacement of adjacent loads fusion (lxv).

2024-01-24 Thread Alex Coplan
Hi Ajit,

On 21/01/2024 19:57, Ajit Agarwal wrote:
> 
> Hello All:
> 
> New pass to replace adjacent memory addresses lxv with lxvp.
> Added common infrastructure for load store fusion for
> different targets.

Thanks for this; it would be nice to see the load/store pair pass
generalized to multiple targets.

I assume you are targeting GCC 15 for this, as we are in stage 4 at
the moment?

> 
> Common routines are refactored in fusion-common.h.
> 
> AARCH64 load/store fusion pass is not changed with the 
> common infrastructure.

I think any patch to generalize the load/store pair fusion pass should
update the aarch64 code at the same time to use the generic
infrastructure, instead of duplicating the code.

As a general comment, I think we should move as much of the code as
possible to target-independent code, with only the bits that are truly
target-specific (e.g. deciding which modes to allow for a load/store
pair operand) in target code.

In terms of structuring the interface between generic code and target
code, I think it would be pragmatic to use a class with (in some cases,
pure) virtual functions that can be overridden by targets to implement
any target-specific behaviour.

IMO the generic class should be implemented in its own .cc instead of
using a header-only approach.  The target code would then define a
derived class which overrides the virtual functions (where necessary)
declared in the generic class, and then instantiate the derived class to
create a target-customized instance of the pass.

A more traditional GCC approach would be to use optabs and target hooks
to customize the behaviour of the pass to handle target-specific
aspects, but:
 - Target hooks are quite heavyweight, and we'd potentially have to add
   quite a few hooks just for one pass that (at least initially) will
   only be used by a couple of targets.
 - Using classes allows both sides to easily maintain their own state
   and share that state where appropriate.
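
To make the class-based suggestion a bit more concrete, here is a toy
sketch of the shape I have in mind.  All names below are hypothetical
and the "pass" only counts candidate pairs; it is meant to show the
split between generic and target code, not an actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Generic part: would live in its own target-independent .cc file.
class pair_fusion_base
{
public:
  virtual ~pair_fusion_base () = default;

  // Pure virtual: every target must say which operand sizes can be paired.
  virtual bool pair_operand_size_ok (unsigned bytes) const = 0;

  // Virtual with a default: targets override only where necessary.
  virtual bool handle_writeback () const { return true; }

  // Generic driver: walk the accesses and count adjacent equal-sized
  // ones that the target is willing to pair.
  unsigned run (const std::vector<unsigned> &access_sizes)
  {
    m_pairs_formed = 0;
    for (std::size_t i = 0; i + 1 < access_sizes.size (); )
      if (access_sizes[i] == access_sizes[i + 1]
	  && pair_operand_size_ok (access_sizes[i]))
	{
	  ++m_pairs_formed;
	  i += 2;
	}
      else
	++i;
    return m_pairs_formed;
  }

private:
  unsigned m_pairs_formed = 0;	// state owned by the generic code
};

// Target part: a derived class overriding only the target-specific bits.
class toy_target_fusion : public pair_fusion_base
{
public:
  bool pair_operand_size_ok (unsigned bytes) const override
  { return bytes == 4 || bytes == 8 || bytes == 16; }
};
```

The target's pass would then just instantiate its derived class and call
the generic driver, e.g. "toy_target_fusion pass; pass.run (...);".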

Nit on naming: I understand you want to move away from ldp_fusion, but
how about pair_fusion or mem_pair_fusion instead of just "fusion" as a
base name?  IMO just "fusion" isn't very clear as to what the pass is
trying to achieve.

In general the code could do with a lot more commentary to explain the
rationale for various things / explain the high-level intent of the
code.

Unfortunately I'm not familiar with the DF framework (I've only really
worked with RTL-SSA for the aarch64 pass), so I haven't commented on the
use of that framework, but it would be nice if what you're trying to do
could be done using RTL-SSA instead of using DF directly.

Hopefully Richard S can chime in on those aspects.

My main concerns with the patch at the moment (apart from the code
duplication) are that it looks like:

 - The patch removes alias analysis from try_fuse_pair, which is unsafe.
 - The patch tries to make its own RTL changes inside
   rs6000_gen_load_pair, but it should let fuse_pair make those changes
   using RTL-SSA instead.

I've left some more specific (but still mostly high-level) comments below.

> 
> For AARCH64 architectures just include "fusion-common.h"
> and target dependent code can be added to that.
> 
> 
> Alex/Richard:
> 
> If you would like me to add for AARCH64 I can do that for AARCH64.
> 
> If you would like to do that is fine with me.
> 
> Bootstrapped and regtested with powerpc64-linux-gnu.
> 
> Improvement in performance is seen with Spec 2017 spec FP benchmarks.
> 
> Thanks & Regards
> Ajit
> 
> rs6000: New  pass for replacement of adjacent lxv with lxvp.

Are you looking to handle stores eventually, out of interest?  Looking
at rs6000-vecload-opt.cc:fusion_bb it looks like you're just handling
loads at the moment.

> 
> New pass to replace adjacent memory addresses lxv with lxvp.
> Added common infrastructure for load store fusion for
> different targets.
> 
> Common routines are refactored in fusion-common.h.

I've just done a very quick scan through this file as it mostly just
looks to be identical to existing code in aarch64-ldp-fusion.cc.

> 
> 2024-01-21  Ajit Kumar Agarwal  
> 
> gcc/ChangeLog:
> 
>   * config/rs6000/rs6000-passes.def: New vecload pass
>   before pass_early_remat.
>   * config/rs6000/rs6000-vecload-opt.cc: Add new pass.
>   * config.gcc: Add new executable.
>   * config/rs6000/rs6000-protos.h: Add new prototype for vecload
>   pass.
>   * config/rs6000/rs6000.cc: Add new prototype for vecload pass.
>   * config/rs6000/t-rs6000: Add new rule.
>   * fusion-common.h: Add common infrastructure for load store
>   fusion that can be shared across different architectures.
>   * emit-rtl.cc: Modify assert code.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.target/powerpc/vecload.C: New test.
>   * g++.target/powerpc/vecload1.C: New test.
>   * gcc.target/powerpc/mma-builtin-1.c: Modify test.
> ---
>  gcc/config.gcc|4 +-
>  gcc/config/rs6000/rs6000-passe

Re: [PATCH] aarch64: Re-enable ldp/stp fusion pass

2024-01-24 Thread Alex Coplan
On 24/01/2024 09:15, Kyrylo Tkachov wrote:
> Hi Alex,
> 
> > -----Original Message-----
> > From: Alex Coplan 
> > Sent: Wednesday, January 24, 2024 8:34 AM
> > To: gcc-patches@gcc.gnu.org
> > Cc: Richard Earnshaw ; Richard Sandiford
> > ; Kyrylo Tkachov ;
> > Jakub Jelinek 
> > Subject: [PATCH] aarch64: Re-enable ldp/stp fusion pass
> > 
> > Hi,
> > 
> > Since, to the best of my knowledge, all reported regressions related to
> > the ldp/stp fusion pass have now been fixed, and PGO+LTO bootstrap with
> > --enable-languages=all is working again with the passes enabled, this
> > patch turns the passes back on by default, as agreed with Jakub here:
> > 
> > https://gcc.gnu.org/pipermail/gcc-patches/2024-January/642478.html
> > 
> > Bootstrapped/regtested on aarch64-linux-gnu, OK for trunk?
> > 
> 
> If we were super-pedantic about the GCC rules we could say that this is a 
> revert of 8ed77a2356c3562f96c64f968e7529065c128c6a and therefore:
> "Similarly, no outside approval is needed to revert a patch that you checked 
> in." 😊
> But that would go against the spirit of the rule.

Heh, definitely seems against the spirit of the rule.

> Anyway, this is ok. Thanks for working through the regressions so diligently.

Thanks! Pushed as g:da9647e98aa289ba3aba41cf5bbe14d0f5f27e77.

I'll keep an eye on gcc-bugs for any further fallout.

Alex

> Kyrill
> 
> > Thanks,
> > Alex
> > 
> > gcc/ChangeLog:
> > 
> > * config/aarch64/aarch64.opt (-mearly-ldp-fusion): Set default
> > to 1.
> > (-mlate-ldp-fusion): Likewise.

