Re: [PATCH 4/4] aarch64: Fix up uses of mem following stp insert [PR113070]

2024-01-15 Thread Alex Coplan
On 13/01/2024 15:46, Alex Coplan wrote:
> As the PR shows (specifically #c7) we are missing updating uses of mem
> when inserting an stp in the aarch64 load/store pair fusion pass.  This
> patch fixes that.
> 
> RTL-SSA has a simple view of memory and by default doesn't allow stores
> to be re-ordered w.r.t. other stores.  In the ldp fusion pass, we do our
> own alias analysis and so can re-order stores over other accesses when
> we deem this is safe.  If neither store can be re-purposed (moved into
> the required position to form the stp while respecting the RTL-SSA
> constraints), then we turn both the candidate stores into "tombstone"
> insns (logically delete them) and insert a new stp insn.
> 
> As it stands, we implement the insert case separately (after dealing
> with the candidate stores) in fuse_pair by inserting into the middle of
> the vector of changes.  This is OK when we only have to insert one
> change, but with this fix we would need to insert the change for the new
> stp plus multiple changes to fix up uses of mem (note the number of
> fix-ups is naturally bounded by the alias limit param to prevent
> quadratic behaviour).  If we kept the code structured as is and inserted
> into the middle of the vector, that would lead to repeated moving of
> elements in the vector which seems inefficient.  The structure of the
> code would also be a little unwieldy.
> 
> To improve on that situation, this patch introduces a helper class,
> stp_change_builder, which implements a state machine that helps to build
> the required changes directly in program order.  That state machine is
> reponsible for deciding what changes need to be made in what order, and
> the code in fuse_pair then simply follows those steps.
> 
> Together with the fix in the previous patch for installing new defs
> correctly in RTL-SSA, this fixes PR113070.
> 
> We take the opportunity to rename the function decide_stp_strategy to
> try_repurpose_store, as that seems more descriptive of what it actually
> does, since stp_change_builder is now responsible for the overall change
> strategy.
> 
> Bootstrapped/regtested as a series with/without the passes enabled on
> aarch64-linux-gnu, OK for trunk?
> 
> Thanks,
> Alex
> 
> gcc/ChangeLog:
> 
>   PR target/113070
>   * config/aarch64/aarch64-ldp-fusion.cc (struct stp_change_builder): New.
>   (decide_stp_strategy): Reanme to ...
>   (try_repurpose_store): ... this.
>   (ldp_bb_info::fuse_pair): Refactor to use stp_change_builder to
>   construct stp changes.  Fix up uses when inserting new stp insns.
> ---
>  gcc/config/aarch64/aarch64-ldp-fusion.cc | 248 ++-
>  1 file changed, 194 insertions(+), 54 deletions(-)
> 

> diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc 
> b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> index 689a8c884bd..703cfb1228c 100644
> --- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
> +++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> @@ -844,11 +844,138 @@ def_upwards_move_range (def_info *def)
>return range;
>  }
>  
> +// Class that implements a state machine for building the changes needed to 
> form
> +// a store pair instruction.  This allows us to easily build the changes in
> +// program order, as required by rtl-ssa.
> +struct stp_change_builder
> +{
> +  enum class state
> +  {
> +FIRST,
> +INSERT,
> +FIXUP_USE,
> +LAST,
> +DONE
> +  };
> +
> +  enum class action
> +  {
> +TOMBSTONE,
> +CHANGE,
> +INSERT,
> +FIXUP_USE
> +  };
> +
> +  struct change
> +  {
> +action type;
> +insn_info *insn;
> +  };
> +
> +  bool done () const { return m_state == state::DONE; }
> +
> +  stp_change_builder (insn_info *insns[2],
> +   insn_info *repurpose,
> +   insn_info *dest)
> +: m_state (state::FIRST), m_insns { insns[0], insns[1] },
> +  m_repurpose (repurpose), m_dest (dest), m_use (nullptr) {}
> +
> +  change get_change () const
> +  {
> +switch (m_state)
> +  {
> +  case state::FIRST:
> + return {
> +   m_insns[0] == m_repurpose ? action::CHANGE : action::TOMBSTONE,
> +   m_insns[0]
> + };
> +  case state::LAST:
> + return {
> +   m_insns[1] == m_repurpose ? action::CHANGE : action::TOMBSTONE,
> +   m_insns[1]
> + };
> +  case state::INSERT:
> + return { action::INSERT, m_dest };
> +  case state::FIXUP_USE:
> + return { action::FIXUP_USE, m_use->insn () };
> +  case state::DONE:
> + break;
> +  }
> +
> +gcc_unreachable ();
> +  }
> +
> +  // Transition to the next state.
> +  void advance ()
> +  {
> +switch (m_state)
> +  {
> +  case state::FIRST:
> + if (m_repurpose)
> +   m_state = state::LAST;
> + else
> +   m_state = state::INSERT;
> + break;
> +  case state::INSERT:
> +  {
> + def_info *def = memory_access (m_insns[0]->defs ());
> + while (*def->next_def ()->insn () <= *m_dest)
> +   def = def->next_def (

Re: [PATCH 4/4] aarch64: Fix up uses of mem following stp insert [PR113070]

2024-01-22 Thread Richard Sandiford
Alex Coplan  writes:
> As the PR shows (specifically #c7) we are missing updating uses of mem
> when inserting an stp in the aarch64 load/store pair fusion pass.  This
> patch fixes that.
>
> RTL-SSA has a simple view of memory and by default doesn't allow stores
> to be re-ordered w.r.t. other stores.  In the ldp fusion pass, we do our
> own alias analysis and so can re-order stores over other accesses when
> we deem this is safe.  If neither store can be re-purposed (moved into
> the required position to form the stp while respecting the RTL-SSA
> constraints), then we turn both the candidate stores into "tombstone"
> insns (logically delete them) and insert a new stp insn.
>
> As it stands, we implement the insert case separately (after dealing
> with the candidate stores) in fuse_pair by inserting into the middle of
> the vector of changes.  This is OK when we only have to insert one
> change, but with this fix we would need to insert the change for the new
> stp plus multiple changes to fix up uses of mem (note the number of
> fix-ups is naturally bounded by the alias limit param to prevent
> quadratic behaviour).  If we kept the code structured as is and inserted
> into the middle of the vector, that would lead to repeated moving of
> elements in the vector which seems inefficient.  The structure of the
> code would also be a little unwieldy.
>
> To improve on that situation, this patch introduces a helper class,
> stp_change_builder, which implements a state machine that helps to build
> the required changes directly in program order.  That state machine is
> reponsible for deciding what changes need to be made in what order, and
> the code in fuse_pair then simply follows those steps.
>
> Together with the fix in the previous patch for installing new defs
> correctly in RTL-SSA, this fixes PR113070.
>
> We take the opportunity to rename the function decide_stp_strategy to
> try_repurpose_store, as that seems more descriptive of what it actually
> does, since stp_change_builder is now responsible for the overall change
> strategy.
>
> Bootstrapped/regtested as a series with/without the passes enabled on
> aarch64-linux-gnu, OK for trunk?
>
> Thanks,
> Alex
>
> gcc/ChangeLog:
>
>   PR target/113070
>   * config/aarch64/aarch64-ldp-fusion.cc (struct stp_change_builder): New.
>   (decide_stp_strategy): Reanme to ...
>   (try_repurpose_store): ... this.
>   (ldp_bb_info::fuse_pair): Refactor to use stp_change_builder to
>   construct stp changes.  Fix up uses when inserting new stp insns.
> ---
>  gcc/config/aarch64/aarch64-ldp-fusion.cc | 248 ++-
>  1 file changed, 194 insertions(+), 54 deletions(-)
>
> diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc 
> b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> index 689a8c884bd..703cfb1228c 100644
> --- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
> +++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> @@ -844,11 +844,138 @@ def_upwards_move_range (def_info *def)
>return range;
>  }
>  
> +// Class that implements a state machine for building the changes needed to 
> form
> +// a store pair instruction.  This allows us to easily build the changes in
> +// program order, as required by rtl-ssa.
> +struct stp_change_builder
> +{
> +  enum class state
> +  {
> +FIRST,
> +INSERT,
> +FIXUP_USE,
> +LAST,
> +DONE
> +  };
> +
> +  enum class action
> +  {
> +TOMBSTONE,
> +CHANGE,
> +INSERT,
> +FIXUP_USE
> +  };
> +
> +  struct change
> +  {
> +action type;
> +insn_info *insn;
> +  };
> +
> +  bool done () const { return m_state == state::DONE; }
> +
> +  stp_change_builder (insn_info *insns[2],
> +   insn_info *repurpose,
> +   insn_info *dest)
> +: m_state (state::FIRST), m_insns { insns[0], insns[1] },
> +  m_repurpose (repurpose), m_dest (dest), m_use (nullptr) {}

Just to make sure I understand: is it the case that

  *insns[0] <= *dest <= *insns[1]

?

> +
> +  change get_change () const
> +  {
> +switch (m_state)
> +  {
> +  case state::FIRST:
> + return {
> +   m_insns[0] == m_repurpose ? action::CHANGE : action::TOMBSTONE,
> +   m_insns[0]
> + };
> +  case state::LAST:
> + return {
> +   m_insns[1] == m_repurpose ? action::CHANGE : action::TOMBSTONE,
> +   m_insns[1]
> + };
> +  case state::INSERT:
> + return { action::INSERT, m_dest };
> +  case state::FIXUP_USE:
> + return { action::FIXUP_USE, m_use->insn () };
> +  case state::DONE:
> + break;
> +  }
> +
> +gcc_unreachable ();
> +  }
> +
> +  // Transition to the next state.
> +  void advance ()
> +  {
> +switch (m_state)
> +  {
> +  case state::FIRST:
> + if (m_repurpose)
> +   m_state = state::LAST;
> + else
> +   m_state = state::INSERT;
> + break;
> +  case state::INSERT:
> +  {
> + def_info *def = memory_access (m_insns[0]->defs ());
> + while (*def->ne

Re: [PATCH 4/4] aarch64: Fix up uses of mem following stp insert [PR113070]

2024-01-22 Thread Alex Coplan
On 22/01/2024 15:59, Richard Sandiford wrote:
> Alex Coplan  writes:
> > As the PR shows (specifically #c7) we are missing updating uses of mem
> > when inserting an stp in the aarch64 load/store pair fusion pass.  This
> > patch fixes that.
> >
> > RTL-SSA has a simple view of memory and by default doesn't allow stores
> > to be re-ordered w.r.t. other stores.  In the ldp fusion pass, we do our
> > own alias analysis and so can re-order stores over other accesses when
> > we deem this is safe.  If neither store can be re-purposed (moved into
> > the required position to form the stp while respecting the RTL-SSA
> > constraints), then we turn both the candidate stores into "tombstone"
> > insns (logically delete them) and insert a new stp insn.
> >
> > As it stands, we implement the insert case separately (after dealing
> > with the candidate stores) in fuse_pair by inserting into the middle of
> > the vector of changes.  This is OK when we only have to insert one
> > change, but with this fix we would need to insert the change for the new
> > stp plus multiple changes to fix up uses of mem (note the number of
> > fix-ups is naturally bounded by the alias limit param to prevent
> > quadratic behaviour).  If we kept the code structured as is and inserted
> > into the middle of the vector, that would lead to repeated moving of
> > elements in the vector which seems inefficient.  The structure of the
> > code would also be a little unwieldy.
> >
> > To improve on that situation, this patch introduces a helper class,
> > stp_change_builder, which implements a state machine that helps to build
> > the required changes directly in program order.  That state machine is
> > reponsible for deciding what changes need to be made in what order, and
> > the code in fuse_pair then simply follows those steps.
> >
> > Together with the fix in the previous patch for installing new defs
> > correctly in RTL-SSA, this fixes PR113070.
> >
> > We take the opportunity to rename the function decide_stp_strategy to
> > try_repurpose_store, as that seems more descriptive of what it actually
> > does, since stp_change_builder is now responsible for the overall change
> > strategy.
> >
> > Bootstrapped/regtested as a series with/without the passes enabled on
> > aarch64-linux-gnu, OK for trunk?
> >
> > Thanks,
> > Alex
> >
> > gcc/ChangeLog:
> >
> > PR target/113070
> > * config/aarch64/aarch64-ldp-fusion.cc (struct stp_change_builder): New.
> > (decide_stp_strategy): Reanme to ...
> > (try_repurpose_store): ... this.
> > (ldp_bb_info::fuse_pair): Refactor to use stp_change_builder to
> > construct stp changes.  Fix up uses when inserting new stp insns.
> > ---
> >  gcc/config/aarch64/aarch64-ldp-fusion.cc | 248 ++-
> >  1 file changed, 194 insertions(+), 54 deletions(-)
> >
> > diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc 
> > b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> > index 689a8c884bd..703cfb1228c 100644
> > --- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
> > +++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> > @@ -844,11 +844,138 @@ def_upwards_move_range (def_info *def)
> >return range;
> >  }
> >  
> > +// Class that implements a state machine for building the changes needed 
> > to form
> > +// a store pair instruction.  This allows us to easily build the changes in
> > +// program order, as required by rtl-ssa.
> > +struct stp_change_builder
> > +{
> > +  enum class state
> > +  {
> > +FIRST,
> > +INSERT,
> > +FIXUP_USE,
> > +LAST,
> > +DONE
> > +  };
> > +
> > +  enum class action
> > +  {
> > +TOMBSTONE,
> > +CHANGE,
> > +INSERT,
> > +FIXUP_USE
> > +  };
> > +
> > +  struct change
> > +  {
> > +action type;
> > +insn_info *insn;
> > +  };
> > +
> > +  bool done () const { return m_state == state::DONE; }
> > +
> > +  stp_change_builder (insn_info *insns[2],
> > + insn_info *repurpose,
> > + insn_info *dest)
> > +: m_state (state::FIRST), m_insns { insns[0], insns[1] },
> > +  m_repurpose (repurpose), m_dest (dest), m_use (nullptr) {}
> 
> Just to make sure I understand: is it the case that
> 
>   *insns[0] <= *dest <= *insns[1]
> 
> ?

Yes, that is my understanding.  I thought about asserting it somewhere in
stp_change_builder, but it seemed a bit gratuitous.

> 
> > +
> > +  change get_change () const
> > +  {
> > +switch (m_state)
> > +  {
> > +  case state::FIRST:
> > +   return {
> > + m_insns[0] == m_repurpose ? action::CHANGE : action::TOMBSTONE,
> > + m_insns[0]
> > +   };
> > +  case state::LAST:
> > +   return {
> > + m_insns[1] == m_repurpose ? action::CHANGE : action::TOMBSTONE,
> > + m_insns[1]
> > +   };
> > +  case state::INSERT:
> > +   return { action::INSERT, m_dest };
> > +  case state::FIXUP_USE:
> > +   return { action::FIXUP_USE, m_use->insn () };
> > +  case state::DONE:
> > +   break;
> > +  }
> > +
> > +gcc_u

Re: [PATCH 4/4] aarch64: Fix up uses of mem following stp insert [PR113070]

2024-01-23 Thread Alex Coplan
On 22/01/2024 21:50, Alex Coplan wrote:
> On 22/01/2024 15:59, Richard Sandiford wrote:
> > Alex Coplan  writes:
> > > As the PR shows (specifically #c7) we are missing updating uses of mem
> > > when inserting an stp in the aarch64 load/store pair fusion pass.  This
> > > patch fixes that.
> > >
> > > RTL-SSA has a simple view of memory and by default doesn't allow stores
> > > to be re-ordered w.r.t. other stores.  In the ldp fusion pass, we do our
> > > own alias analysis and so can re-order stores over other accesses when
> > > we deem this is safe.  If neither store can be re-purposed (moved into
> > > the required position to form the stp while respecting the RTL-SSA
> > > constraints), then we turn both the candidate stores into "tombstone"
> > > insns (logically delete them) and insert a new stp insn.
> > >
> > > As it stands, we implement the insert case separately (after dealing
> > > with the candidate stores) in fuse_pair by inserting into the middle of
> > > the vector of changes.  This is OK when we only have to insert one
> > > change, but with this fix we would need to insert the change for the new
> > > stp plus multiple changes to fix up uses of mem (note the number of
> > > fix-ups is naturally bounded by the alias limit param to prevent
> > > quadratic behaviour).  If we kept the code structured as is and inserted
> > > into the middle of the vector, that would lead to repeated moving of
> > > elements in the vector which seems inefficient.  The structure of the
> > > code would also be a little unwieldy.
> > >
> > > To improve on that situation, this patch introduces a helper class,
> > > stp_change_builder, which implements a state machine that helps to build
> > > the required changes directly in program order.  That state machine is
> > > reponsible for deciding what changes need to be made in what order, and
> > > the code in fuse_pair then simply follows those steps.
> > >
> > > Together with the fix in the previous patch for installing new defs
> > > correctly in RTL-SSA, this fixes PR113070.
> > >
> > > We take the opportunity to rename the function decide_stp_strategy to
> > > try_repurpose_store, as that seems more descriptive of what it actually
> > > does, since stp_change_builder is now responsible for the overall change
> > > strategy.
> > >
> > > Bootstrapped/regtested as a series with/without the passes enabled on
> > > aarch64-linux-gnu, OK for trunk?
> > >
> > > Thanks,
> > > Alex
> > >
> > > gcc/ChangeLog:
> > >
> > >   PR target/113070
> > >   * config/aarch64/aarch64-ldp-fusion.cc (struct stp_change_builder): New.
> > >   (decide_stp_strategy): Reanme to ...
> > >   (try_repurpose_store): ... this.
> > >   (ldp_bb_info::fuse_pair): Refactor to use stp_change_builder to
> > >   construct stp changes.  Fix up uses when inserting new stp insns.
> > > ---
> > >  gcc/config/aarch64/aarch64-ldp-fusion.cc | 248 ++-
> > >  1 file changed, 194 insertions(+), 54 deletions(-)
> > >
> > > diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc 
> > > b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> > > index 689a8c884bd..703cfb1228c 100644
> > > --- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
> > > +++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> > > @@ -844,11 +844,138 @@ def_upwards_move_range (def_info *def)
> > >return range;
> > >  }
> > >  
> > > +// Class that implements a state machine for building the changes needed 
> > > to form
> > > +// a store pair instruction.  This allows us to easily build the changes 
> > > in
> > > +// program order, as required by rtl-ssa.
> > > +struct stp_change_builder
> > > +{
> > > +  enum class state
> > > +  {
> > > +FIRST,
> > > +INSERT,
> > > +FIXUP_USE,
> > > +LAST,
> > > +DONE
> > > +  };
> > > +
> > > +  enum class action
> > > +  {
> > > +TOMBSTONE,
> > > +CHANGE,
> > > +INSERT,
> > > +FIXUP_USE
> > > +  };
> > > +
> > > +  struct change
> > > +  {
> > > +action type;
> > > +insn_info *insn;
> > > +  };
> > > +
> > > +  bool done () const { return m_state == state::DONE; }
> > > +
> > > +  stp_change_builder (insn_info *insns[2],
> > > +   insn_info *repurpose,
> > > +   insn_info *dest)
> > > +: m_state (state::FIRST), m_insns { insns[0], insns[1] },
> > > +  m_repurpose (repurpose), m_dest (dest), m_use (nullptr) {}
> > 
> > Just to make sure I understand: is it the case that
> > 
> >   *insns[0] <= *dest <= *insns[1]
> > 
> > ?
> 
> Yes, that is my understanding.  I thought about asserting it somewhere in
> stp_change_builder, but it seemed a bit gratuitous.
> 
> > 
> > > +
> > > +  change get_change () const
> > > +  {
> > > +switch (m_state)
> > > +  {
> > > +  case state::FIRST:
> > > + return {
> > > +   m_insns[0] == m_repurpose ? action::CHANGE : action::TOMBSTONE,
> > > +   m_insns[0]
> > > + };
> > > +  case state::LAST:
> > > + return {
> > > +   m_insns[1] == m_repurpose ? action::CHANGE : action::TOMBSTONE,