RE: [PATCH 12/19]middle-end: implement loop peeling and IV updates for early break.

2023-10-23 Thread Tamar Christina
> -Original Message-
> From: Richard Biener 
> Sent: Friday, July 14, 2023 2:35 PM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; j...@ventanamicro.com
> Subject: RE: [PATCH 12/19]middle-end: implement loop peeling and IV
> updates for early break.
> 
> On Thu, 13 Jul 2023, Tamar Christina wrote:
> 
> > > -Original Message-
> > > From: Richard Biener 
> > > Sent: Thursday, July 13, 2023 6:31 PM
> > > To: Tamar Christina 
> > > Cc: gcc-patches@gcc.gnu.org; nd ;
> j...@ventanamicro.com
> > > Subject: Re: [PATCH 12/19]middle-end: implement loop peeling and IV
> > > updates for early break.
> > >
> > > On Wed, 28 Jun 2023, Tamar Christina wrote:
> > >
> > > > Hi All,
> > > >
> > > > This patch updates the peeling code to maintain LCSSA during peeling.
> > > > The rewrite also naturally takes into account multiple exits and so it 
> > > > didn't
> > > > make sense to split them off.
> > > >
> > > > For the purposes of peeling the only change for multiple exits is that 
> > > > the
> > > > secondary exits are all wired to the start of the new loop preheader 
> > > > when
> > > doing
> > > > epilogue peeling.
> > > >
> > > > When doing prologue peeling the CFG is kept intact.
> > > >
> > > > For both epilogue and prologue peeling we wire through between the
> two
> > > loops any
> > > > PHI nodes that escape the first loop into the second loop if flow_loops 
> > > > is
> > > > specified.  The reason for this conditionality is because
> > > > slpeel_tree_duplicate_loop_to_edge_cfg is used in the compiler in 3 
> > > > ways:
> > > >   - prologue peeling
> > > >   - epilogue peeling
> > > >   - loop distribution
> > > >
> > > > for the last case the loops should remain independent, and so not be
> > > connected.
> > > > Because of this propagation of only used phi nodes get_current_def can
> be
> > > used
> > > > to easily find the previous definitions.  However live statements that 
> > > > are
> > > > not used inside the loop itself are not propagated (since if unused, the
> > > moment
> > > > we add the guard in between the two loops the value across the bypass
> edge
> > > can
> > > > be wrong if the loop has been peeled.)
> > > >
> > > > This is dealt with easily enough in find_guard_arg.
> > > >
> > > > For multiple exits, while we are in LCSSA form, and have a correct DOM
> tree,
> > > the
> > > > moment we add the guard block we will change the dominators again.  To
> > > deal with
> > > > this slpeel_tree_duplicate_loop_to_edge_cfg can optionally return the
> blocks
> > > to
> > > > update without having to recompute the list of blocks to update again.
> > > >
> > > > When multiple exits and doing epilogue peeling we will also temporarily
> have
> > > an
> > > > incorrect VUSES chain for the secondary exits as it anticipates the 
> > > > final
> result
> > > > after the VDEFs have been moved.  This will thus be corrected once the
> code
> > > > motion is applied.
> > > >
> > > > Lastly by doing things this way we can remove the helper functions that
> > > > previously did lock step iterations to update things as it went along.
> > > >
> > > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > > >
> > > > Ok for master?
> > >
> > > Not sure if I get through all of this in one go - so be prepared that
> > > the rest of the review follows another day.
> >
> > No worries, I appreciate the reviews!
> > Just giving some quick replies for when you continue.
> 
> Continuing.
> 
> > >
> > > > Thanks,
> > > > Tamar
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > * tree-loop-distribution.cc (copy_loop_before): Pass flow_loops 
> > > > =
> > > false.
> > > > * tree-ssa-loop-niter.cc (loop_only_exit_p):  Fix bug when 
> > > > exit==null.
> > > > * tree-vect-loop-manip.cc (adjust_phi_and_debug_stmts): Add
> > > additional
> > > > assert.
> > > > (vect_set_loop_conditi

RE: [PATCH 12/19]middle-end: implement loop peeling and IV updates for early break.

2023-08-18 Thread Richard Biener via Gcc-patches
On Fri, 18 Aug 2023, Tamar Christina wrote:

> > -Original Message-
> > From: Richard Biener 
> > Sent: Friday, August 18, 2023 2:53 PM
> > To: Tamar Christina 
> > Cc: gcc-patches@gcc.gnu.org; nd ; j...@ventanamicro.com
> > Subject: RE: [PATCH 12/19]middle-end: implement loop peeling and IV
> > updates for early break.
> > 
> > On Fri, 18 Aug 2023, Tamar Christina wrote:
> > 
> > > > > Yeah if you comment it out one of the testcases should fail.
> > > >
> > > > using new_preheader instead of e->dest would make things clearer.
> > > >
> > > > You are now adding the same arg to every exit (you've just queried the
> > > > main exit redirect_edge_var_map_vector).
> > > >
> > > > OK, so I think I understand what you're doing.  If I understand
> > > > correctly we know that when we exit the main loop via one of the
> > > > early exits we are definitely going to enter the epilog but when
> > > > we take the main exit we might not.
> > > >
> > >
> > > Correct.. but..
> > >
> > > > Looking at the CFG we create currently this isn't reflected and
> > > > this complicates this PHI node updating.  What I'd try to do
> > > > is leave redirecting the alternate exits until after
> > >
> > > It is, in the case of the alternate exits this is reflected in copying
> > > the same values, as they are the values of the number of completed
> > > iterations since the scalar code restarts the last iteration.
> > >
> > > So all the PHI nodes of the alternate exits are correct.  The vector
> > > iteration doesn't handle the partial iteration.
> > >
> > > > slpeel_tree_duplicate_loop_to_edge_cfg finished which probably
> > > > means leaving it almost unchanged besides the LC SSA maintaining
> > > > changes.  After that for the multi-exit case split the
> > > > epilog preheader edge and redirect all the alternate exits to the
> > > > new preheader.  So the CFG becomes
> > > >
> > > >  
> > > > /  |
> > > >/
> > > >   /  if (epilog)
> > > >alt exits //  \
> > > > //loop around
> > > > |   /
> > > >preheader with "header" PHIs
> > > >   |
> > > >   
> > > >
> > > > note you need the header PHIs also on the main exit path but you
> > > > only need the loop end PHIs there.
> > > >
> > > > It seems that, at least currently, the order of things makes
> > > > them more complicated than necessary.
> > >
> > > I've been trying to, but this representation seems a lot harder to work 
> > > with.
> > > In particular, at the moment once we exit
> > slpeel_tree_duplicate_loop_to_edge_cfg
> > > the loop structure is exactly the same as one expects from any normal 
> > > epilog
> > vectorization.
> > >
> > > But this new representation requires me to place the guard much earlier 
> > > than
> > the epilogue
> > > preheader,  yet I still have to adjust the PHI nodes in the preheader.  
> > > So it
> > seems that this split
> > > is there to only indicate that we always enter the epilog when taking an 
> > > early
> > exit.
> > >
> > > Today this is reflected in the values of the PHI nodes rather than 
> > > structurally.
> > Once we place
> > > the guard we update the nodes and the alternate exits get their value for
> > ivtmp updated to VF.
> > >
> > > This representation also forces me to do the redirection in every call 
> > > site of
> > > slpeel_tree_duplicate_loop_to_edge_cfg making the code more complicated
> > in all use sites.
> > >
> > > But I think this doesn't address the main reason why the
> > slpeel_tree_duplicate_loop_to_edge_cfg
> > > code has a large block of code to deal with PHI node updates.
> > >
> > > The reason as you mentioned somewhere else is that after we redirect the
> > edges I have to reconstruct
> > > the phi nodes.  For most it's straightforward, but for live values or 
> > > vuse
> > chains it requires extra code.
> > >
> > > You're right in that before we redirect the edges they are all correct in 
> > 

RE: [PATCH 12/19]middle-end: implement loop peeling and IV updates for early break.

2023-08-18 Thread Tamar Christina via Gcc-patches
> -Original Message-
> From: Richard Biener 
> Sent: Friday, August 18, 2023 2:53 PM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; j...@ventanamicro.com
> Subject: RE: [PATCH 12/19]middle-end: implement loop peeling and IV
> updates for early break.
> 
> On Fri, 18 Aug 2023, Tamar Christina wrote:
> 
> > > > Yeah if you comment it out one of the testcases should fail.
> > >
> > > using new_preheader instead of e->dest would make things clearer.
> > >
> > > You are now adding the same arg to every exit (you've just queried the
> > > main exit redirect_edge_var_map_vector).
> > >
> > > OK, so I think I understand what you're doing.  If I understand
> > > correctly we know that when we exit the main loop via one of the
> > > early exits we are definitely going to enter the epilog but when
> > > we take the main exit we might not.
> > >
> >
> > Correct.. but..
> >
> > > Looking at the CFG we create currently this isn't reflected and
> > > this complicates this PHI node updating.  What I'd try to do
> > > is leave redirecting the alternate exits until after
> >
> > It is, in the case of the alternate exits this is reflected in copying
> > the same values, as they are the values of the number of completed
> > iterations since the scalar code restarts the last iteration.
> >
> > So all the PHI nodes of the alternate exits are correct.  The vector
> > iteration doesn't handle the partial iteration.
> >
> > > slpeel_tree_duplicate_loop_to_edge_cfg finished which probably
> > > means leaving it almost unchanged besides the LC SSA maintaining
> > > changes.  After that for the multi-exit case split the
> > > epilog preheader edge and redirect all the alternate exits to the
> > > new preheader.  So the CFG becomes
> > >
> > >  
> > > /  |
> > >/
> > >   /  if (epilog)
> > >alt exits //  \
> > > //loop around
> > > |   /
> > >preheader with "header" PHIs
> > >   |
> > >   
> > >
> > > note you need the header PHIs also on the main exit path but you
> > > only need the loop end PHIs there.
> > >
> > > It seems that, at least currently, the order of things makes
> > > them more complicated than necessary.
> >
> > I've been trying to, but this representation seems a lot harder to work 
> > with.
> > In particular, at the moment once we exit
> slpeel_tree_duplicate_loop_to_edge_cfg
> > the loop structure is exactly the same as one expects from any normal epilog
> vectorization.
> >
> > But this new representation requires me to place the guard much earlier than
> the epilogue
> > preheader,  yet I still have to adjust the PHI nodes in the preheader.  So 
> > it
> seems that this split
> > is there to only indicate that we always enter the epilog when taking an 
> > early
> exit.
> >
> > Today this is reflected in the values of the PHI nodes rather than 
> > structurally.
> Once we place
> > the guard we update the nodes and the alternate exits get their value for
> ivtmp updated to VF.
> >
> > This representation also forces me to do the redirection in every call site 
> > of
> > slpeel_tree_duplicate_loop_to_edge_cfg making the code more complicated
> in all use sites.
> >
> > But I think this doesn't address the main reason why the
> slpeel_tree_duplicate_loop_to_edge_cfg
> > code has a large block of code to deal with PHI node updates.
> >
> > The reason as you mentioned somewhere else is that after we redirect the
> edges I have to reconstruct
> > the phi nodes.  For most it's straightforward, but for live values or vuse
> chains it requires extra code.
> >
> > You're right in that before we redirect the edges they are all correct in 
> > the exit
> block, you mentioned that
> > the API for the edge redirection is supposed to copy the values over if I
> create the phi nodes before hand.
> >
> > However this doesn't seem to work:
> >
> >  for (auto gsi_from = gsi_start_phis (scalar_exit->dest);
> >       !gsi_end_p (gsi_from); gsi_next (&gsi_from))
> > {
> >   gimple *from_phi = gsi_stmt (gsi_from);
> >   tree new_res = copy_ssa_name (gimple_phi_result (from_phi));
> >   create_phi_node (new_res, new_preheader);
> >

RE: [PATCH 12/19]middle-end: implement loop peeling and IV updates for early break.

2023-08-18 Thread Richard Biener via Gcc-patches
On Fri, 18 Aug 2023, Tamar Christina wrote:

> > > Yeah if you comment it out one of the testcases should fail.
> > 
> > using new_preheader instead of e->dest would make things clearer.
> > 
> > You are now adding the same arg to every exit (you've just queried the
> > main exit redirect_edge_var_map_vector).
> > 
> > OK, so I think I understand what you're doing.  If I understand
> > correctly we know that when we exit the main loop via one of the
> > early exits we are definitely going to enter the epilog but when
> > we take the main exit we might not.
> > 
> 
> Correct.. but..
> 
> > Looking at the CFG we create currently this isn't reflected and
> > this complicates this PHI node updating.  What I'd try to do
> > is leave redirecting the alternate exits until after
> 
> It is, in the case of the alternate exits this is reflected in copying
> the same values, as they are the values of the number of completed 
> iterations since the scalar code restarts the last iteration.
> 
> So all the PHI nodes of the alternate exits are correct.  The vector
> iteration doesn't handle the partial iteration.
> 
> > slpeel_tree_duplicate_loop_to_edge_cfg finished which probably
> > means leaving it almost unchanged besides the LC SSA maintaining
> > changes.  After that for the multi-exit case split the
> > epilog preheader edge and redirect all the alternate exits to the
> > new preheader.  So the CFG becomes
> > 
> >  
> > /  |
> >/
> >   /  if (epilog)
> >alt exits //  \
> > //loop around
> > |   /
> >preheader with "header" PHIs
> >   |
> >   
> > 
> > note you need the header PHIs also on the main exit path but you
> > only need the loop end PHIs there.
> > 
> > It seems that, at least currently, the order of things makes
> > them more complicated than necessary.
> 
> I've been trying to, but this representation seems a lot harder to work with.
> In particular, at the moment once we exit 
> slpeel_tree_duplicate_loop_to_edge_cfg
> the loop structure is exactly the same as one expects from any normal epilog 
> vectorization.
> 
> But this new representation requires me to place the guard much earlier than 
> the epilogue
> preheader,  yet I still have to adjust the PHI nodes in the preheader.  So it 
> seems that this split
> is there to only indicate that we always enter the epilog when taking an 
> early exit.
> 
> Today this is reflected in the values of the PHI nodes rather than 
> structurally.  Once we place
> the guard we update the nodes and the alternate exits get their value for 
> ivtmp updated to VF.
> 
> This representation also forces me to do the redirection in every call site of
> slpeel_tree_duplicate_loop_to_edge_cfg making the code more complicated in 
> all use sites.
> 
> But I think this doesn't address the main reason why the 
> slpeel_tree_duplicate_loop_to_edge_cfg
> code has a large block of code to deal with PHI node updates.
> 
> The reason as you mentioned somewhere else is that after we redirect the 
> edges I have to reconstruct
> the phi nodes.  For most it's straightforward, but for live values or vuse 
> chains it requires extra code.
> 
> You're right in that before we redirect the edges they are all correct in the 
> exit block, you mentioned that
> the API for the edge redirection is supposed to copy the values over if I 
> create the phi nodes before hand.
> 
> However this doesn't seem to work:
> 
>  for (auto gsi_from = gsi_start_phis (scalar_exit->dest);
>       !gsi_end_p (gsi_from); gsi_next (&gsi_from))
>   {
> gimple *from_phi = gsi_stmt (gsi_from);
> tree new_res = copy_ssa_name (gimple_phi_result (from_phi));
> create_phi_node (new_res, new_preheader);
>   }
> 
>   for (edge exit : loop_exits)
>   redirect_edge_and_branch (exit, new_preheader);
> 
> Still leaves them empty.  Grepping around most code seems to pair 
> redirect_edge_and_branch with
> copy_phi_arg_into_existing_phi.  The problem is that in all these cases after 
> redirecting an edge they
> call copy_phi_arg_into_existing_phi from a predecessor edge to fill in the 
> phi nodes.

You need to call flush_pending_stmts on each edge you redirect.
copy_phi_arg_into_existing_phi isn't suitable for edge redirecting.
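For reference, that pairing would look roughly as below.  This is only a sketch
based on the snippet quoted in this thread (reusing its scalar_exit,
new_preheader and loop_exits names), not code from the patch:

```c
/* Sketch: create empty PHIs in the new preheader first, then redirect
   each exit and flush.  redirect_edge_and_branch records the removed
   PHI arguments in the edge var map, and flush_pending_stmts replays
   them as arguments of the PHIs in the new destination block.  */
for (auto gsi_from = gsi_start_phis (scalar_exit->dest);
     !gsi_end_p (gsi_from); gsi_next (&gsi_from))
  {
    gimple *from_phi = gsi_stmt (gsi_from);
    tree new_res = copy_ssa_name (gimple_phi_result (from_phi));
    create_phi_node (new_res, new_preheader);
  }

for (edge exit : loop_exits)
  {
    redirect_edge_and_branch (exit, new_preheader);
    flush_pending_stmts (exit);
  }
```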

> This is because redirect_edge_and_branch destroys the phi node entries, and
> copy_phi_arg_into_existing_phi simply reads gimple_phi_arg_def, which would
> be NULL.
> 
> You could point it to the src block of the exit, in which case it copies the 
> wrong values in for the vuses.  At the end
> of vectorization the cfgcleanup code does the same thing to maintain LCSSA if 
> you haven't.  This code always goes
> wrong for multiple exits because of the problem described above.  There's no 
> node for it to copy the right value
> from.
> 
> As an alternate approach I can split the exit 

RE: [PATCH 12/19]middle-end: implement loop peeling and IV updates for early break.

2023-08-18 Thread Tamar Christina via Gcc-patches
> > Yeah if you comment it out one of the testcases should fail.
> 
> using new_preheader instead of e->dest would make things clearer.
> 
> You are now adding the same arg to every exit (you've just queried the
> main exit redirect_edge_var_map_vector).
> 
> OK, so I think I understand what you're doing.  If I understand
> correctly we know that when we exit the main loop via one of the
> early exits we are definitely going to enter the epilog but when
> we take the main exit we might not.
> 

Correct.. but..

> Looking at the CFG we create currently this isn't reflected and
> this complicates this PHI node updating.  What I'd try to do
> is leave redirecting the alternate exits until after

It is, in the case of the alternate exits this is reflected in copying
the same values, as they are the values of the number of completed 
iterations since the scalar code restarts the last iteration.

So all the PHI nodes of the alternate exits are correct.  The vector
iteration doesn't handle the partial iteration.

> slpeel_tree_duplicate_loop_to_edge_cfg finished which probably
> means leaving it almost unchanged besides the LC SSA maintaining
> changes.  After that for the multi-exit case split the
> epilog preheader edge and redirect all the alternate exits to the
> new preheader.  So the CFG becomes
> 
>  
> /  |
>/
>   /  if (epilog)
>alt exits //  \
> //loop around
> |   /
>preheader with "header" PHIs
>   |
>   
> 
> note you need the header PHIs also on the main exit path but you
> only need the loop end PHIs there.
> 
> It seems that, at least currently, the order of things makes
> them more complicated than necessary.

I've been trying to, but this representation seems a lot harder to work with.
In particular, at the moment once we exit slpeel_tree_duplicate_loop_to_edge_cfg
the loop structure is exactly the same as one expects from any normal epilog 
vectorization.

But this new representation requires me to place the guard much earlier than 
the epilogue
preheader,  yet I still have to adjust the PHI nodes in the preheader.  So it 
seems that this split
is there to only indicate that we always enter the epilog when taking an early 
exit.

Today this is reflected in the values of the PHI nodes rather than 
structurally.  Once we place
the guard we update the nodes and the alternate exits get their value for ivtmp 
updated to VF.

This representation also forces me to do the redirection in every call site of
slpeel_tree_duplicate_loop_to_edge_cfg making the code more complicated in all 
use sites.

But I think this doesn't address the main reason why the 
slpeel_tree_duplicate_loop_to_edge_cfg
code has a large block of code to deal with PHI node updates.

The reason as you mentioned somewhere else is that after we redirect the edges 
I have to reconstruct
the phi nodes.  For most it's straightforward, but for live values or vuse 
chains it requires extra code.

You're right in that before we redirect the edges they are all correct in the 
exit block, you mentioned that
the API for the edge redirection is supposed to copy the values over if I 
create the phi nodes before hand.

However this doesn't seem to work:

 for (auto gsi_from = gsi_start_phis (scalar_exit->dest);
      !gsi_end_p (gsi_from); gsi_next (&gsi_from))
{
  gimple *from_phi = gsi_stmt (gsi_from);
  tree new_res = copy_ssa_name (gimple_phi_result (from_phi));
  create_phi_node (new_res, new_preheader);
}

  for (edge exit : loop_exits)
redirect_edge_and_branch (exit, new_preheader);

Still leaves them empty.  Grepping around most code seems to pair 
redirect_edge_and_branch with
copy_phi_arg_into_existing_phi.  The problem is that in all these cases after 
redirecting an edge they
call copy_phi_arg_into_existing_phi from a predecessor edge to fill in the phi 
nodes.

This is because redirect_edge_and_branch destroys the phi node entries, and
copy_phi_arg_into_existing_phi simply reads gimple_phi_arg_def, which would be
NULL.

You could point it to the src block of the exit, in which case it copies the 
wrong values in for the vuses.  At the end
of vectorization the cfgcleanup code does the same thing to maintain LCSSA if 
you haven't.  This code always goes
wrong for multiple exits because of the problem described above.  There's no 
node for it to copy the right value
from.

As an alternate approach I can split the exit edges, copy the phi nodes into 
the split and after that redirect them.
This however creates the awkwardness of having the exit edges no longer connect 
to the preheader.

All of this raises the question of whether it is easier than the current
approach, which is just to read the edge var map to figure out the nodes that
were removed during the redirect.
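That edge-var-map reading can be sketched roughly as below (using the edge var
map accessors from tree-ssa.h, with hypothetical names; illustrative only, not
the patch itself):

```c
/* Sketch: after redirect_edge_and_branch, the PHI arguments that were
   dropped from the old destination are still recorded in the edge var
   map of the redirected edge and can be matched back up by hand.  */
vec<edge_var_map> *vm = redirect_edge_var_map_vector (exit);
if (vm)
  for (edge_var_map &vmap : *vm)
    {
      tree def = redirect_edge_var_map_def (&vmap);     /* value on the old edge */
      tree res = redirect_edge_var_map_result (&vmap);  /* old PHI result */
      /* ... find the PHI corresponding to RES in new_preheader and add
         DEF as its argument for EXIT ... */
    }
```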

Maybe I'm still misunderstanding the API, 

RE: [PATCH 12/19]middle-end: implement loop peeling and IV updates for early break.

2023-07-17 Thread Richard Biener via Gcc-patches
On Mon, 17 Jul 2023, Tamar Christina wrote:

> 
> 
> > -Original Message-
> > From: Richard Biener 
> > Sent: Friday, July 14, 2023 2:35 PM
> > To: Tamar Christina 
> > Cc: gcc-patches@gcc.gnu.org; nd ; j...@ventanamicro.com
> > Subject: RE: [PATCH 12/19]middle-end: implement loop peeling and IV
> > updates for early break.
> > 
> > On Thu, 13 Jul 2023, Tamar Christina wrote:
> > 
> > > > -Original Message-
> > > > From: Richard Biener 
> > > > Sent: Thursday, July 13, 2023 6:31 PM
> > > > To: Tamar Christina 
> > > > Cc: gcc-patches@gcc.gnu.org; nd ;
> > j...@ventanamicro.com
> > > > Subject: Re: [PATCH 12/19]middle-end: implement loop peeling and IV
> > > > updates for early break.
> > > >
> > > > On Wed, 28 Jun 2023, Tamar Christina wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > This patch updates the peeling code to maintain LCSSA during peeling.
> > > > > The rewrite also naturally takes into account multiple exits and so 
> > > > > it didn't
> > > > > make sense to split them off.
> > > > >
> > > > > For the purposes of peeling the only change for multiple exits is 
> > > > > that the
> > > > > secondary exits are all wired to the start of the new loop preheader 
> > > > > when
> > > > doing
> > > > > epilogue peeling.
> > > > >
> > > > > When doing prologue peeling the CFG is kept intact.
> > > > >
> > > > > For both epilogue and prologue peeling we wire through between the
> > two
> > > > loops any
> > > > > PHI nodes that escape the first loop into the second loop if 
> > > > > flow_loops is
> > > > > specified.  The reason for this conditionality is because
> > > > > slpeel_tree_duplicate_loop_to_edge_cfg is used in the compiler in 3 
> > > > > ways:
> > > > >   - prologue peeling
> > > > >   - epilogue peeling
> > > > >   - loop distribution
> > > > >
> > > > > for the last case the loops should remain independent, and so not be
> > > > connected.
> > > > > Because of this propagation of only used phi nodes get_current_def can
> > be
> > > > used
> > > > > to easily find the previous definitions.  However live statements 
> > > > > that are
> > > > > not used inside the loop itself are not propagated (since if unused, 
> > > > > the
> > > > moment
> > > > > we add the guard in between the two loops the value across the bypass
> > edge
> > > > can
> > > > > be wrong if the loop has been peeled.)
> > > > >
> > > > > This is dealt with easily enough in find_guard_arg.
> > > > >
> > > > > For multiple exits, while we are in LCSSA form, and have a correct DOM
> > tree,
> > > > the
> > > > > moment we add the guard block we will change the dominators again.  To
> > > > deal with
> > > > > this slpeel_tree_duplicate_loop_to_edge_cfg can optionally return the
> > blocks
> > > > to
> > > > > update without having to recompute the list of blocks to update again.
> > > > >
> > > > > When multiple exits and doing epilogue peeling we will also 
> > > > > temporarily
> > have
> > > > an
> > > > > incorrect VUSES chain for the secondary exits as it anticipates the 
> > > > > final
> > result
> > > > > after the VDEFs have been moved.  This will thus be corrected once the
> > code
> > > > > motion is applied.
> > > > >
> > > > > Lastly by doing things this way we can remove the helper functions 
> > > > > that
> > > > > previously did lock step iterations to update things as it went along.
> > > > >
> > > > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > > > >
> > > > > Ok for master?
> > > >
> > > > Not sure if I get through all of this in one go - so be prepared that
> > > > the rest of the review follows another day.
> > >
> > > No worries, I appreciate the reviews!
> > > Just giving some quick replies for when you continue.
> > 
&

RE: [PATCH 12/19]middle-end: implement loop peeling and IV updates for early break.

2023-07-17 Thread Tamar Christina via Gcc-patches


> -Original Message-
> From: Richard Biener 
> Sent: Friday, July 14, 2023 2:35 PM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; j...@ventanamicro.com
> Subject: RE: [PATCH 12/19]middle-end: implement loop peeling and IV
> updates for early break.
> 
> On Thu, 13 Jul 2023, Tamar Christina wrote:
> 
> > > -Original Message-
> > > From: Richard Biener 
> > > Sent: Thursday, July 13, 2023 6:31 PM
> > > To: Tamar Christina 
> > > Cc: gcc-patches@gcc.gnu.org; nd ;
> j...@ventanamicro.com
> > > Subject: Re: [PATCH 12/19]middle-end: implement loop peeling and IV
> > > updates for early break.
> > >
> > > On Wed, 28 Jun 2023, Tamar Christina wrote:
> > >
> > > > Hi All,
> > > >
> > > > This patch updates the peeling code to maintain LCSSA during peeling.
> > > > The rewrite also naturally takes into account multiple exits and so it 
> > > > didn't
> > > > make sense to split them off.
> > > >
> > > > For the purposes of peeling the only change for multiple exits is that 
> > > > the
> > > > secondary exits are all wired to the start of the new loop preheader 
> > > > when
> > > doing
> > > > epilogue peeling.
> > > >
> > > > When doing prologue peeling the CFG is kept intact.
> > > >
> > > > For both epilogue and prologue peeling we wire through between the
> two
> > > loops any
> > > > PHI nodes that escape the first loop into the second loop if flow_loops 
> > > > is
> > > > specified.  The reason for this conditionality is because
> > > > slpeel_tree_duplicate_loop_to_edge_cfg is used in the compiler in 3 
> > > > ways:
> > > >   - prologue peeling
> > > >   - epilogue peeling
> > > >   - loop distribution
> > > >
> > > > for the last case the loops should remain independent, and so not be
> > > connected.
> > > > Because of this propagation of only used phi nodes get_current_def can
> be
> > > used
> > > > to easily find the previous definitions.  However live statements that 
> > > > are
> > > > not used inside the loop itself are not propagated (since if unused, the
> > > moment
> > > > we add the guard in between the two loops the value across the bypass
> edge
> > > can
> > > > be wrong if the loop has been peeled.)
> > > >
> > > > This is dealt with easily enough in find_guard_arg.
> > > >
> > > > For multiple exits, while we are in LCSSA form, and have a correct DOM
> tree,
> > > the
> > > > moment we add the guard block we will change the dominators again.  To
> > > deal with
> > > > this slpeel_tree_duplicate_loop_to_edge_cfg can optionally return the
> blocks
> > > to
> > > > update without having to recompute the list of blocks to update again.
> > > >
> > > > When multiple exits and doing epilogue peeling we will also temporarily
> have
> > > an
> > > > incorrect VUSES chain for the secondary exits as it anticipates the 
> > > > final
> result
> > > > after the VDEFs have been moved.  This will thus be corrected once the
> code
> > > > motion is applied.
> > > >
> > > > Lastly by doing things this way we can remove the helper functions that
> > > > previously did lock step iterations to update things as it went along.
> > > >
> > > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > > >
> > > > Ok for master?
> > >
> > > Not sure if I get through all of this in one go - so be prepared that
> > > the rest of the review follows another day.
> >
> > No worries, I appreciate the reviews!
> > Just giving some quick replies for when you continue.
> 
> Continuing.
> 
> > >
> > > > Thanks,
> > > > Tamar
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > * tree-loop-distribution.cc (copy_loop_before): Pass flow_loops 
> > > > =
> > > false.
> > > > * tree-ssa-loop-niter.cc (loop_only_exit_p):  Fix bug when 
> > > > exit==null.
> > > > * tree-vect-loop-manip.cc (adjust_phi_and_debug_stmts): Add
> > > additional
> > > > assert.
> > > > (vect_set_loop_conditi

RE: [PATCH 12/19]middle-end: implement loop peeling and IV updates for early break.

2023-07-14 Thread Richard Biener via Gcc-patches
On Thu, 13 Jul 2023, Tamar Christina wrote:

> > -Original Message-
> > From: Richard Biener 
> > Sent: Thursday, July 13, 2023 6:31 PM
> > To: Tamar Christina 
> > Cc: gcc-patches@gcc.gnu.org; nd ; j...@ventanamicro.com
> > Subject: Re: [PATCH 12/19]middle-end: implement loop peeling and IV
> > updates for early break.
> > 
> > On Wed, 28 Jun 2023, Tamar Christina wrote:
> > 
> > > Hi All,
> > >
> > > This patch updates the peeling code to maintain LCSSA during peeling.
> > > The rewrite also naturally takes into account multiple exits and so it 
> > > didn't
> > > make sense to split them off.
> > >
> > > For the purposes of peeling the only change for multiple exits is that the
> > > secondary exits are all wired to the start of the new loop preheader when
> > doing
> > > epilogue peeling.
> > >
> > > When doing prologue peeling the CFG is kept intact.
> > >
> > > For both epilogue and prologue peeling we wire through between the two
> > loops any
> > > PHI nodes that escape the first loop into the second loop if flow_loops is
> > > specified.  The reason for this conditionality is because
> > > slpeel_tree_duplicate_loop_to_edge_cfg is used in the compiler in 3 ways:
> > >   - prologue peeling
> > >   - epilogue peeling
> > >   - loop distribution
> > >
> > > for the last case the loops should remain independent, and so not be
> > connected.
> > > Because of this propagation of only used phi nodes get_current_def can be
> > used
> > > to easily find the previous definitions.  However live statements that are
> > > not used inside the loop itself are not propagated (since if unused, the
> > moment
> > > we add the guard in between the two loops the value across the bypass edge
> > can
> > > be wrong if the loop has been peeled.)
> > >
> > > This is dealt with easily enough in find_guard_arg.
> > >
> > > For multiple exits, while we are in LCSSA form and have a correct DOM
> > > tree, the moment we add the guard block we will change the dominators
> > > again.  To deal with this, slpeel_tree_duplicate_loop_to_edge_cfg can
> > > optionally return the blocks to update without having to recompute the
> > > list of blocks to update again.
> > >
> > > With multiple exits and epilogue peeling we will also temporarily have
> > > an incorrect VUSE chain for the secondary exits, as it anticipates the
> > > final result after the VDEFs have been moved.  This will be corrected
> > > once the code motion is applied.
> > >
> > > Lastly, by doing things this way we can remove the helper functions that
> > > previously did lock-step iterations to update things as they went along.
> > >
> > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > >
> > > Ok for master?
> > 
> > Not sure if I get through all of this in one go - so be prepared that
> > the rest of the review follows another day.
> 
> No worries, I appreciate the reviews!
> Just giving some quick replies for when you continue.

Continuing.

> > 
> > > Thanks,
> > > Tamar
> > >
> > > gcc/ChangeLog:
> > >
> > >   * tree-loop-distribution.cc (copy_loop_before): Pass flow_loops =
> > >   false.
> > >   * tree-ssa-loop-niter.cc (loop_only_exit_p): Fix bug when exit==null.
> > >   * tree-vect-loop-manip.cc (adjust_phi_and_debug_stmts): Add additional
> > >   assert.
> > >   (vect_set_loop_condition_normal): Skip modifying loop IV for multiple
> > >   exits.
> > >   (slpeel_tree_duplicate_loop_to_edge_cfg): Support multiple exit
> > >   peeling.
> > >   (slpeel_can_duplicate_loop_p): Likewise.
> > >   (vect_update_ivs_after_vectorizer): Don't enter this...
> > >   (vect_update_ivs_after_early_break): ...but instead enter here.
> > >   (find_guard_arg): Update for new peeling code.
> > >   (slpeel_update_phi_nodes_for_loops): Remove.
> > >   (slpeel_update_phi_nodes_for_guard2): Remove hardcoded edge 0 checks.
> > >   (slpeel_update_phi_nodes_for_lcssa): Remove.
> > >   (vect_do_peeling): Fix VF for multiple exits and force epilogue.
> > >   * tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Initialize
> > >   non_break_control_flow and early_breaks.
> >

RE: [PATCH 12/19]middle-end: implement loop peeling and IV updates for early break.

2023-07-13 Thread Tamar Christina via Gcc-patches
> -Original Message-
> From: Richard Biener 
> Sent: Thursday, July 13, 2023 6:31 PM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; j...@ventanamicro.com
> Subject: Re: [PATCH 12/19]middle-end: implement loop peeling and IV
> updates for early break.
> 
> On Wed, 28 Jun 2023, Tamar Christina wrote:
> 
> > Hi All,
> >
> > This patch updates the peeling code to maintain LCSSA during peeling.
> > The rewrite also naturally takes into account multiple exits and so it
> > didn't make sense to split them off.
> >
> > For the purposes of peeling the only change for multiple exits is that
> > the secondary exits are all wired to the start of the new loop preheader
> > when doing epilogue peeling.
> >
> > When doing prologue peeling the CFG is kept intact.
> >
> > For both epilogue and prologue peeling we wire through between the two
> > loops any PHI nodes that escape the first loop into the second loop if
> > flow_loops is specified.  The reason for this conditionality is that
> > slpeel_tree_duplicate_loop_to_edge_cfg is used in the compiler in 3 ways:
> >   - prologue peeling
> >   - epilogue peeling
> >   - loop distribution
> >
> > For the last case the loops should remain independent, and so not be
> > connected.
> > Because only the used PHI nodes are propagated this way, get_current_def
> > can be used to easily find the previous definitions.  However, live
> > statements that are not used inside the loop itself are not propagated
> > (since if unused, the moment we add the guard in between the two loops
> > the value across the bypass edge can be wrong if the loop has been
> > peeled).
> >
> > This is dealt with easily enough in find_guard_arg.
> >
> > For multiple exits, while we are in LCSSA form and have a correct DOM
> > tree, the moment we add the guard block we will change the dominators
> > again.  To deal with this, slpeel_tree_duplicate_loop_to_edge_cfg can
> > optionally return the blocks to update without having to recompute the
> > list of blocks to update again.
> >
> > With multiple exits and epilogue peeling we will also temporarily have
> > an incorrect VUSE chain for the secondary exits, as it anticipates the
> > final result after the VDEFs have been moved.  This will be corrected
> > once the code motion is applied.
> >
> > Lastly, by doing things this way we can remove the helper functions that
> > previously did lock-step iterations to update things as they went along.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> 
> Not sure if I get through all of this in one go - so be prepared that
> the rest of the review follows another day.

No worries, I appreciate the reviews!
Just giving some quick replies for when you continue.

> 
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > * tree-loop-distribution.cc (copy_loop_before): Pass flow_loops =
> > false.
> > * tree-ssa-loop-niter.cc (loop_only_exit_p): Fix bug when exit==null.
> > * tree-vect-loop-manip.cc (adjust_phi_and_debug_stmts): Add additional
> > assert.
> > (vect_set_loop_condition_normal): Skip modifying loop IV for multiple
> > exits.
> > (slpeel_tree_duplicate_loop_to_edge_cfg): Support multiple exit
> > peeling.
> > (slpeel_can_duplicate_loop_p): Likewise.
> > (vect_update_ivs_after_vectorizer): Don't enter this...
> > (vect_update_ivs_after_early_break): ...but instead enter here.
> > (find_guard_arg): Update for new peeling code.
> > (slpeel_update_phi_nodes_for_loops): Remove.
> > (slpeel_update_phi_nodes_for_guard2): Remove hardcoded edge 0 checks.
> > (slpeel_update_phi_nodes_for_lcssa): Remove.
> > (vect_do_peeling): Fix VF for multiple exits and force epilogue.
> > * tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Initialize
> > non_break_control_flow and early_breaks.
> > (vect_need_peeling_or_partial_vectors_p): Force partial vector if
> > multiple exits and VLA.
> > (vect_analyze_loop_form): Support inner loop multiple exits.
> > (vect_create_loop_vinfo): Set LOOP_VINFO_EARLY_BREAKS.
> > (vect_create_epilog_for_reduction): Update live phi nodes.
> > (vectorizable_live_operation): Ignore live operations in vector loop
> > when multiple exits.
> > (vect_transform_loop): Force unrolling for VF loops and multiple exits.
> > * tree-vect-stmts.cc (ve

Re: [PATCH 12/19]middle-end: implement loop peeling and IV updates for early break.

2023-07-13 Thread Richard Biener via Gcc-patches
On Wed, 28 Jun 2023, Tamar Christina wrote:

> Hi All,
> 
> This patch updates the peeling code to maintain LCSSA during peeling.
> The rewrite also naturally takes into account multiple exits and so it didn't
> make sense to split them off.
> 
> For the purposes of peeling the only change for multiple exits is that the
> secondary exits are all wired to the start of the new loop preheader when
> doing epilogue peeling.
> 
> When doing prologue peeling the CFG is kept intact.
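As an illustration of the loop shape epilogue peeling produces for a loop with an early break, here is a plain-C sketch (not GCC code; VF and find_first are made-up names for this example): a main loop that only runs while a full VF-sized chunk remains, a peeled scalar epilogue for the remainder, and a secondary exit that bypasses the epilogue entirely.

```c
#include <stddef.h>

/* Sketch only: the vectorizer operates on GIMPLE, not C source.
   VF stands in for the vectorization factor.  */
enum { VF = 4 };

ptrdiff_t
find_first (const int *a, ptrdiff_t n, int key)
{
  ptrdiff_t i = 0;

  /* Main ("vector") loop: entered only while at least VF iterations
     remain.  */
  for (; i + VF <= n; i += VF)
    for (ptrdiff_t j = 0; j < VF; j++)
      if (a[i + j] == key)
        return i + j;   /* early break: secondary exit, skips the epilogue */

  /* Peeled scalar epilogue: handles the final n % VF iterations.  */
  for (; i < n; i++)
    if (a[i] == key)
      return i;

  return -1;            /* normal (latch) exit */
}
```

The secondary exit returning from inside the main loop corresponds to the exit edges that, in the patch, get wired to (or past) the start of the new loop's preheader.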
> 
> For both epilogue and prologue peeling we wire through between the two
> loops any PHI nodes that escape the first loop into the second loop if
> flow_loops is specified.  The reason for this conditionality is that
> slpeel_tree_duplicate_loop_to_edge_cfg is used in the compiler in 3 ways:
>   - prologue peeling
>   - epilogue peeling
>   - loop distribution
> 
> For the last case the loops should remain independent, and so not be
> connected.
> Because only the used PHI nodes are propagated this way, get_current_def
> can be used to easily find the previous definitions.  However, live
> statements that are not used inside the loop itself are not propagated
> (since if unused, the moment we add the guard in between the two loops
> the value across the bypass edge can be wrong if the loop has been
> peeled).
> 
> This is dealt with easily enough in find_guard_arg.
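To make the bypass-edge concern concrete, here is a hedged C sketch (not GCC code; last_match and the fixed chunk size 4 are invented for illustration) of a value that is live after the loop but never read inside it. After peeling, the merge point after the epilogue must combine the main loop's value with the epilogue's value; carrying the pre-peeling definition across the guard's bypass edge would be stale.

```c
/* Sketch only: 'last' models a live-out value with no use inside the
   loop body's own computation.  */
int
last_match (const int *a, int n, int key)
{
  int last = -1;        /* live after the loop, never read inside it */
  int i = 0;

  /* Main loop: stops while fewer than 4 iterations remain, mimicking
     a VF=4 vector loop.  */
  for (; i + 4 <= n; i++)
    if (a[i] == key)
      last = i;

  /* A guard would branch here: if nothing remains, the epilogue is
     bypassed and 'last' must flow from the main loop along that edge.  */
  for (; i < n; i++)    /* peeled epilogue */
    if (a[i] == key)
      last = i;

  return last;          /* merge point: a PHI node in SSA form */
}
```

In LCSSA terms, the final `return last` corresponds to the guard PHI whose arguments find_guard_arg has to locate.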
> 
> For multiple exits, while we are in LCSSA form and have a correct DOM
> tree, the moment we add the guard block we will change the dominators
> again.  To deal with this, slpeel_tree_duplicate_loop_to_edge_cfg can
> optionally return the blocks to update without having to recompute the
> list of blocks to update again.
> 
> With multiple exits and epilogue peeling we will also temporarily have an
> incorrect VUSE chain for the secondary exits, as it anticipates the final
> result after the VDEFs have been moved.  This will be corrected once the
> code motion is applied.
> 
> Lastly, by doing things this way we can remove the helper functions that
> previously did lock-step iterations to update things as they went along.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?

Not sure if I get through all of this in one go - so be prepared that
the rest of the review follows another day.

> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   * tree-loop-distribution.cc (copy_loop_before): Pass flow_loops = false.
>   * tree-ssa-loop-niter.cc (loop_only_exit_p): Fix bug when exit==null.
>   * tree-vect-loop-manip.cc (adjust_phi_and_debug_stmts): Add additional
>   assert.
>   (vect_set_loop_condition_normal): Skip modifying loop IV for multiple
>   exits.
>   (slpeel_tree_duplicate_loop_to_edge_cfg): Support multiple exit peeling.
>   (slpeel_can_duplicate_loop_p): Likewise.
>   (vect_update_ivs_after_vectorizer): Don't enter this...
>   (vect_update_ivs_after_early_break): ...but instead enter here.
>   (find_guard_arg): Update for new peeling code.
>   (slpeel_update_phi_nodes_for_loops): Remove.
>   (slpeel_update_phi_nodes_for_guard2): Remove hardcoded edge 0 checks.
>   (slpeel_update_phi_nodes_for_lcssa): Remove.
>   (vect_do_peeling): Fix VF for multiple exits and force epilogue.
>   * tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Initialize
>   non_break_control_flow and early_breaks.
>   (vect_need_peeling_or_partial_vectors_p): Force partial vector if
>   multiple exits and VLA.
>   (vect_analyze_loop_form): Support inner loop multiple exits.
>   (vect_create_loop_vinfo): Set LOOP_VINFO_EARLY_BREAKS.
>   (vect_create_epilog_for_reduction): Update live phi nodes.
>   (vectorizable_live_operation): Ignore live operations in vector loop
>   when multiple exits.
>   (vect_transform_loop): Force unrolling for VF loops and multiple exits.
>   * tree-vect-stmts.cc (vect_stmt_relevant_p): Analyze ctrl statements.
>   (vect_mark_stmts_to_be_vectorized): Check for non-exit control flow and
>   analyze gcond params.
>   (vect_analyze_stmt): Support gcond.
>   * tree-vectorizer.cc (pass_vectorize::execute): Support multiple exits
>   in RPO pass.
>   * tree-vectorizer.h (enum vect_def_type): Add vect_early_exit_def.
>   (LOOP_VINFO_EARLY_BREAKS, LOOP_VINFO_GENERAL_CTR_FLOW): New.
>   (loop_vec_info_for_loop): Change to const and static.
>   (is_loop_header_bb_p): Drop assert.
>   (slpeel_can_duplicate_loop_p): Update prototype.
>   (class loop): Add early_breaks and non_break_control_flow.
> 
> --- inline copy of patch -- 
> diff --git a/gcc/tree-loop-distribution.cc b/gcc/tree-loop-distribution.cc
> index 97879498db46dd3c34181ae9aa6e5476004dd5b5..d790ce5fffab3aa3dfc40d833a968314a4442b9e 100644
> --- a/gcc/tree-loop-distribution.cc
> +++ b/gcc/tree-loop-distribution.cc
> @@ -948,7 +948,7 @@ copy_loop_before (class loop *loop, bool redirect_lc_phi_defs)
>