RE: [PATCH]AArch64 xorsign: Fix scalar xorsign lowering

2023-09-01 Thread Tamar Christina via Gcc-patches



> -----Original Message-----
> From: Richard Sandiford 
> Sent: Friday, September 1, 2023 2:36 PM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
> ; Marcus Shawcroft
> ; Kyrylo Tkachov 
> Subject: Re: [PATCH]AArch64 xorsign: Fix scalar xorsign lowering
> 
> Tamar Christina  writes:
> > Hi All,
> >
> > In GCC-9 our scalar xorsign pattern broke and we didn't notice it
> > because the testcase was not strong enough.  With this commit
> >
> > 8d2d39587d941a40f25ea0144cceb677df115040 is the first bad commit
> > commit 8d2d39587d941a40f25ea0144cceb677df115040
> > Author: Segher Boessenkool 
> > Date:   Mon Oct 22 22:23:39 2018 +0200
> >
> > combine: Do not combine moves from hard registers
> >
> > combine started introducing useless moves on hard registers.  When one
> > of the arguments to our scalar xorsign is a hardreg we get an additional
> > move inserted.
> >
> > This leads to combine forming an AND with the immediate inside and
> > using the superfluous move to do the r->w move, instead of what we
> > wanted before, which was for the `and` to be a vector AND and have
> > reload pick the right alternative.
> 
> IMO, the xorsign optab ought to go away.  IIRC it was just a stop-gap measure
> that (like most stop-gap measures) never got cleaned up later.
> 
> But that's not important now. :)
> 
> > To fix this the patch just forces the use of the vector version
> > directly and so combine has no chance to mess it up.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > * config/aarch64/aarch64-simd.md (xorsign<mode>3): Renamed to...
> > (@xorsign<mode>3): ...This.
> > * config/aarch64/aarch64.md (xorsign<mode>3): Renamed to...
> > (@xorsign<mode>3): ...This and emit vectors directly.
> > * config/aarch64/iterators.md (VCONQ): Add SF and DF.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/aarch64/xorsign.c:
> >
> > --- inline copy of patch --
> > diff --git a/gcc/config/aarch64/aarch64-simd.md
> > b/gcc/config/aarch64/aarch64-simd.md
> > index f67eb70577d0c2d9911d8c867d38a4d0b390337c..e955691f1be8830efacc237465119764ce2a4942 100644
> > --- a/gcc/config/aarch64/aarch64-simd.md
> > +++ b/gcc/config/aarch64/aarch64-simd.md
> > @@ -500,7 +500,7 @@ (define_expand "ctz<mode>2"
> >    }
> >  )
> >
> > -(define_expand "xorsign<mode>3"
> > +(define_expand "@xorsign<mode>3"
> >    [(match_operand:VHSDF 0 "register_operand")
> >     (match_operand:VHSDF 1 "register_operand")
> >     (match_operand:VHSDF 2 "register_operand")]
> > diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> > index 01cf989641fce8e6c3828f6cfef62e101c4142df..9db82347bf891f9bc40aedecdc8462c94bf1a769 100644
> > --- a/gcc/config/aarch64/aarch64.md
> > +++ b/gcc/config/aarch64/aarch64.md
> > @@ -6953,31 +6953,20 @@ (define_insn "copysign<mode>3_insn"
> >  ;; EOR   v0.8B, v0.8B, v3.8B
> >  ;;
> >
> > -(define_expand "xorsign<mode>3"
> > +(define_expand "@xorsign<mode>3"
> >    [(match_operand:GPF 0 "register_operand")
> >     (match_operand:GPF 1 "register_operand")
> >     (match_operand:GPF 2 "register_operand")]
> >    "TARGET_SIMD"
> >  {
> > -
> > -  machine_mode imode = <V_INT_EQUIV>mode;
> > -  rtx mask = gen_reg_rtx (imode);
> > -  rtx op1x = gen_reg_rtx (imode);
> > -  rtx op2x = gen_reg_rtx (imode);
> > -
> > -  int bits = GET_MODE_BITSIZE (<MODE>mode) - 1;
> > -  emit_move_insn (mask, GEN_INT (trunc_int_for_mode (HOST_WIDE_INT_M1U << bits,
> > -						     imode)));
> > -
> > -  emit_insn (gen_and<v_int_equiv>3 (op2x, mask,
> > -				    lowpart_subreg (imode, operands[2],
> > -						    <MODE>mode)));
> > -  emit_insn (gen_xor<v_int_equiv>3 (op1x,
> > -				    lowpart_subreg (imode, operands[1],
> > -						    <MODE>mode),
> > -				    op2x));
> > +  rtx tmp = gen_reg_rtx (<VCONQ>mode);
> > +  rtx op1 = gen_reg_rtx (<VCONQ>mode);
> > +  rtx op2 = gen_reg_rtx (<VCONQ>mode);
> > +  emit_move_insn (op1, lowpart_subreg (<VCONQ>mode, operands[1], <MODE>mode));
> > +  emit_move_insn (op2, lowpart_subreg (<VCONQ>mode, operands[2], <MODE>mode));
> > +  emit_insn (gen_xorsign3 (<VCONQ>mode, tmp, op1, op2));
> 
> Do we need the extra moves into op1 and op2?  I would have expected the
> subregs to be acceptable as direct operands of the xorsign3.  Making them
> direct operands should be better, since there's then less risk of having the
> same value live in different registers at the same time.
> 

That was the first thing I tried, but it doesn't work because validate_subreg
seems to have the invariant that you can either change the mode between modes
of the same size or make it paradoxical, but not both at the same time.

i.e. it rejects (subreg:V2DI (subreg:DI (reg:DF))), and lowpart_subreg folds it
to NULL_RTX, because the lowering when the input is a subreg takes the mode of
the original RTX.  i.e. the above is folded to (subreg:V2DI (reg:DF)) which is 

[PATCH]AArch64 xorsign: Fix scalar xorsign lowering

2023-09-01 Thread Tamar Christina via Gcc-patches
Hi All,

In GCC-9 our scalar xorsign pattern broke and we didn't notice it because the
testcase was not strong enough.  With this commit

8d2d39587d941a40f25ea0144cceb677df115040 is the first bad commit
commit 8d2d39587d941a40f25ea0144cceb677df115040
Author: Segher Boessenkool 
Date:   Mon Oct 22 22:23:39 2018 +0200

combine: Do not combine moves from hard registers

combine started introducing useless moves on hard registers.  When one of the
arguments to our scalar xorsign is a hardreg we get an additional move inserted.

This leads to combine forming an AND with the immediate inside and using the
superfluous move to do the r->w move, instead of what we wanted before, which
was for the `and` to be a vector AND and have reload pick the right alternative.

To fix this the patch just forces the use of the vector version directly and
so combine has no chance to mess it up.
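
For illustration, a minimal reproducer of the sort the testcase covers (a
hypothetical example, not taken from the patch itself):

  /* xorsign is formed from a multiply by a copysign'd constant.  With the
     fix this should emit the AND and EOR on the vector side, with no
     superfluous scalar r->w moves.  */
  double
  f (double x, double y)
  {
    return x * __builtin_copysign (1.0, y);
  }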

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (xorsign<mode>3): Renamed to...
(@xorsign<mode>3): ...This.
* config/aarch64/aarch64.md (xorsign<mode>3): Renamed to...
(@xorsign<mode>3): ...This and emit vectors directly.
* config/aarch64/iterators.md (VCONQ): Add SF and DF.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/xorsign.c:

--- inline copy of patch -- 
diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index f67eb70577d0c2d9911d8c867d38a4d0b390337c..e955691f1be8830efacc237465119764ce2a4942 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -500,7 +500,7 @@ (define_expand "ctz<mode>2"
   }
 )
 
-(define_expand "xorsign<mode>3"
+(define_expand "@xorsign<mode>3"
   [(match_operand:VHSDF 0 "register_operand")
    (match_operand:VHSDF 1 "register_operand")
    (match_operand:VHSDF 2 "register_operand")]
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 01cf989641fce8e6c3828f6cfef62e101c4142df..9db82347bf891f9bc40aedecdc8462c94bf1a769 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -6953,31 +6953,20 @@ (define_insn "copysign<mode>3_insn"
 ;; EOR   v0.8B, v0.8B, v3.8B
 ;;
 
-(define_expand "xorsign<mode>3"
+(define_expand "@xorsign<mode>3"
   [(match_operand:GPF 0 "register_operand")
    (match_operand:GPF 1 "register_operand")
    (match_operand:GPF 2 "register_operand")]
   "TARGET_SIMD"
 {
-
-  machine_mode imode = <V_INT_EQUIV>mode;
-  rtx mask = gen_reg_rtx (imode);
-  rtx op1x = gen_reg_rtx (imode);
-  rtx op2x = gen_reg_rtx (imode);
-
-  int bits = GET_MODE_BITSIZE (<MODE>mode) - 1;
-  emit_move_insn (mask, GEN_INT (trunc_int_for_mode (HOST_WIDE_INT_M1U << bits,
-						     imode)));
-
-  emit_insn (gen_and<v_int_equiv>3 (op2x, mask,
-				    lowpart_subreg (imode, operands[2],
-						    <MODE>mode)));
-  emit_insn (gen_xor<v_int_equiv>3 (op1x,
-				    lowpart_subreg (imode, operands[1],
-						    <MODE>mode),
-				    op2x));
+  rtx tmp = gen_reg_rtx (<VCONQ>mode);
+  rtx op1 = gen_reg_rtx (<VCONQ>mode);
+  rtx op2 = gen_reg_rtx (<VCONQ>mode);
+  emit_move_insn (op1, lowpart_subreg (<VCONQ>mode, operands[1], <MODE>mode));
+  emit_move_insn (op2, lowpart_subreg (<VCONQ>mode, operands[2], <MODE>mode));
+  emit_insn (gen_xorsign3 (<VCONQ>mode, tmp, op1, op2));
   emit_move_insn (operands[0],
-		  lowpart_subreg (<MODE>mode, op1x, imode));
+		  lowpart_subreg (<MODE>mode, tmp, <VCONQ>mode));
   DONE;
 }
 )
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 9398d713044433cd89b2a83db5ae7969feb1dcf7..2451d8c2cd8e2da6ac8339eed9bc975cf203fa4c 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -1428,7 +1428,8 @@ (define_mode_attr VCONQ [(V8QI "V16QI") (V16QI "V16QI")
 (V4HF "V8HF") (V8HF "V8HF")
 (V2SF "V4SF") (V4SF "V4SF")
 (V2DF "V2DF") (SI   "V4SI")
-(HI   "V8HI") (QI   "V16QI")])
+(HI   "V8HI") (QI   "V16QI")
+(SF   "V4SF") (DF   "V2DF")])
 
 ;; Half modes of all vector modes.
 (define_mode_attr VHALF [(V8QI "V4QI")  (V16QI "V8QI")
diff --git a/gcc/testsuite/gcc.target/aarch64/xorsign.c 
b/gcc/testsuite/gcc.target/aarch64/xorsign.c
index 22c5829449d932bed08de7e453c435ade3b787b2..dfb7ba7f140524507cb79cb06e12c72ad46eb753 100644
--- a/gcc/testsuite/gcc.target/aarch64/xorsign.c
+++ b/gcc/testsuite/gcc.target/aarch64/xorsign.c
@@ -79,8 +79,9 @@ check_l_neg_rev (long double x, long double y)
   return __builtin_copysignl (-1.0, y) * x;
 }
 
-/* { dg-final { scan-assembler "\[ \t\]?eor\[ \t\]?" } } */
-/* { dg-final { scan-assembler "\[ \t\]?and\[ \t\]?" } } */
+/* { dg-final { scan-assembler-times {eor\tv[0-9]+\.16b, v[0-9]+\.16b, v[0-9]+\.16b} 8 } } */
+/* { dg-final { scan-assembler-times 

RE: [PATCH 12/19]middle-end: implement loop peeling and IV updates for early break.

2023-08-18 Thread Tamar Christina via Gcc-patches
> -----Original Message-----
> From: Richard Biener 
> Sent: Friday, August 18, 2023 2:53 PM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; j...@ventanamicro.com
> Subject: RE: [PATCH 12/19]middle-end: implement loop peeling and IV
> updates for early break.
> 
> On Fri, 18 Aug 2023, Tamar Christina wrote:
> 
> > > > Yeah if you comment it out one of the testcases should fail.
> > >
> > > using new_preheader instead of e->dest would make things clearer.
> > >
> > > You are now adding the same arg to every exit (you've just queried the
> > > main exit redirect_edge_var_map_vector).
> > >
> > > OK, so I think I understand what you're doing.  If I understand
> > > correctly we know that when we exit the main loop via one of the
> > > early exits we are definitely going to enter the epilog but when
> > > we take the main exit we might not.
> > >
> >
> > Correct.. but..
> >
> > > Looking at the CFG we create currently this isn't reflected and
> > > this complicates this PHI node updating.  What I'd try to do
> > > is leave redirecting the alternate exits until after
> >
> > It is, in the case of the alternate exits this is reflected in copying
> > the same values, as they are the values of the number of completed
> > iterations since the scalar code restarts the last iteration.
> >
> > So all the PHI nodes of the alternate exits are correct.  The vector
> > iteration doesn't handle the partial iteration.
> >
> > > slpeel_tree_duplicate_loop_to_edge_cfg finished which probably
> > > means leaving it almost unchanged besides the LC SSA maintaining
> > > changes.  After that for the multi-exit case split the
> > > epilog preheader edge and redirect all the alternate exits to the
> > > new preheader.  So the CFG becomes
> > >
> > >  <exit>
> > > /  |
> > >/
> > >   /  if (epilog)
> > >alt exits //  \
> > > //loop around
> > > |   /
> > >preheader with "header" PHIs
> > >   |
> > >   <epilog>
> > >
> > > note you need the header PHIs also on the main exit path but you
> > > only need the loop end PHIs there.
> > >
> > > It seems that, at least currently, the order of things makes
> > > them more complicated than necessary.
> >
> > I've been trying to, but this representation seems a lot harder to work
> > with.  In particular, at the moment once we exit
> > slpeel_tree_duplicate_loop_to_edge_cfg the loop structure is exactly the
> > same as one expects from any normal epilog vectorization.
> >
> > But this new representation requires me to place the guard much earlier
> > than the epilogue preheader, yet I still have to adjust the PHI nodes in
> > the preheader.  So it seems that this split is there only to indicate
> > that we always enter the epilog when taking an early exit.
> >
> > Today this is reflected in the values of the PHI nodes rather than
> > structurally.  Once we place the guard we update the nodes and the
> > alternate exits get their value for ivtmp updated to VF.
> >
> > This representation also forces me to do the redirection in every call
> > site of slpeel_tree_duplicate_loop_to_edge_cfg, making the code more
> > complicated in all use sites.
> >
> > But I think this doesn't address the main reason why the
> > slpeel_tree_duplicate_loop_to_edge_cfg code has a large block of code to
> > deal with PHI node updates.
> >
> > The reason, as you mentioned somewhere else, is that after we redirect
> > the edges I have to reconstruct the phi nodes.  For most it's
> > straightforward, but for live values or vuse chains it requires extra
> > code.
> >
> > You're right in that before we redirect the edges they are all correct
> > in the exit block.  You mentioned that the API for the edge redirection
> > is supposed to copy the values over if I create the phi nodes beforehand.
> >
> > However this doesn't seem to work:
> >
> >   for (auto gsi_from = gsi_start_phis (scalar_exit->dest);
> >        !gsi_end_p (gsi_from); gsi_next (&gsi_from))
> >     {
> >       gimple *from_phi = gsi_stmt (gsi_from);
> >       tree new_res = copy_ssa_name (gimple_phi_result (from_phi));
> >       create_phi_node (new_res, new_preheader);
> >     }
> >
> >   for (edge exit : loop_exits)
> >     redirect_edge_and_branch (exit, new_preheader);
> >
> > Still leaves them empty.  Grepping around, most code seems to pair
> > redirect_edge_and_branch with copy_phi_arg_into_existing_phi.  The
> > problem is that in all these cases, after redirecting an edge, they call
> > copy_phi_arg_into_existing_phi from a predecessor edge to fill in the
> > phi nodes.
> 
> You need to call flush_pending_stmts on each edge you redirect.
> copy_phi_arg_into_existing_phi isn't suitable for edge redirecting.

Oh, I'll give that a try, that would make sense.  I didn't flush it in the
current approach because I needed the map, but since I want to get rid of the
map, 
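
For reference, a rough sketch of that flow (my reading of the advice above,
untested):

  /* Create the PHIs in the new preheader first...  */
  for (auto gsi_from = gsi_start_phis (scalar_exit->dest);
       !gsi_end_p (gsi_from); gsi_next (&gsi_from))
    {
      gimple *from_phi = gsi_stmt (gsi_from);
      tree new_res = copy_ssa_name (gimple_phi_result (from_phi));
      create_phi_node (new_res, new_preheader);
    }

  /* ...then redirect each exit and flush, so the queued PHI argument
     map is materialized as arguments of the pre-created PHIs.  */
  for (edge exit : loop_exits)
    {
      edge e = redirect_edge_and_branch (exit, new_preheader);
      flush_pending_stmts (e);
    }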

RE: [PATCH 12/19]middle-end: implement loop peeling and IV updates for early break.

2023-08-18 Thread Tamar Christina via Gcc-patches
> > Yeah if you comment it out one of the testcases should fail.
> 
> using new_preheader instead of e->dest would make things clearer.
> 
> You are now adding the same arg to every exit (you've just queried the
> main exit redirect_edge_var_map_vector).
> 
> OK, so I think I understand what you're doing.  If I understand
> correctly we know that when we exit the main loop via one of the
> early exits we are definitely going to enter the epilog but when
> we take the main exit we might not.
> 

Correct.. but..

> Looking at the CFG we create currently this isn't reflected and
> this complicates this PHI node updating.  What I'd try to do
> is leave redirecting the alternate exits until after

It is, in the case of the alternate exits this is reflected in copying
the same values, as they are the values of the number of completed 
iterations since the scalar code restarts the last iteration.

So all the PHI nodes of the alternate exits are correct.  The vector
iteration doesn't handle the partial iteration.

> slpeel_tree_duplicate_loop_to_edge_cfg finished which probably
> means leaving it almost unchanged besides the LC SSA maintaining
> changes.  After that for the multi-exit case split the
> epilog preheader edge and redirect all the alternate exits to the
> new preheader.  So the CFG becomes
> 
>  <exit>
> /  |
>/
>   /  if (epilog)
>alt exits //  \
> //loop around
> |   /
>preheader with "header" PHIs
>   |
>   <epilog>
> 
> note you need the header PHIs also on the main exit path but you
> only need the loop end PHIs there.
> 
> It seems that, at least currently, the order of things makes
> them more complicated than necessary.

I've been trying to, but this representation seems a lot harder to work with.
In particular, at the moment once we exit slpeel_tree_duplicate_loop_to_edge_cfg
the loop structure is exactly the same as one expects from any normal epilog
vectorization.

But this new representation requires me to place the guard much earlier than
the epilogue preheader, yet I still have to adjust the PHI nodes in the
preheader.  So it seems that this split is there only to indicate that we
always enter the epilog when taking an early exit.

Today this is reflected in the values of the PHI nodes rather than
structurally.  Once we place the guard we update the nodes and the alternate
exits get their value for ivtmp updated to VF.

This representation also forces me to do the redirection in every call site of
slpeel_tree_duplicate_loop_to_edge_cfg, making the code more complicated in
all use sites.

But I think this doesn't address the main reason why the
slpeel_tree_duplicate_loop_to_edge_cfg code has a large block of code to deal
with PHI node updates.

The reason, as you mentioned somewhere else, is that after we redirect the
edges I have to reconstruct the phi nodes.  For most it's straightforward, but
for live values or vuse chains it requires extra code.

You're right in that before we redirect the edges they are all correct in the
exit block.  You mentioned that the API for the edge redirection is supposed
to copy the values over if I create the phi nodes beforehand.

However this doesn't seem to work:

  for (auto gsi_from = gsi_start_phis (scalar_exit->dest);
       !gsi_end_p (gsi_from); gsi_next (&gsi_from))
    {
      gimple *from_phi = gsi_stmt (gsi_from);
      tree new_res = copy_ssa_name (gimple_phi_result (from_phi));
      create_phi_node (new_res, new_preheader);
    }

  for (edge exit : loop_exits)
    redirect_edge_and_branch (exit, new_preheader);

Still leaves them empty.  Grepping around, most code seems to pair
redirect_edge_and_branch with copy_phi_arg_into_existing_phi.  The problem is
that in all these cases, after redirecting an edge, they call
copy_phi_arg_into_existing_phi from a predecessor edge to fill in the phi
nodes.

This is because redirect_edge_and_branch destroys the phi node entries, and
copy_phi_arg_into_existing_phi simply reads gimple_phi_arg_def, which would be
NULL.

You could point it to the src block of the exit, in which case it copies the
wrong values in for the vuses.  At the end of vectorization the cfgcleanup
code does the same thing to maintain LCSSA if you haven't.  This code always
goes wrong for multiple exits because of the problem described above: there's
no node for it to copy the right value from.

As an alternate approach I can split the exit edges, copy the phi nodes into
the split and after that redirect them.  This however creates the awkwardness
of having the exit edges no longer connect to the preheader.

All of this then begs the question whether this is all easier than the current
approach, which is just to read the edge var map to figure out the nodes that
were removed during the redirect.

Maybe I'm still misunderstanding the API, 

RE: [PATCH]AArch64 update costing for MLA by invariant

2023-08-03 Thread Tamar Christina via Gcc-patches
> >> Do you see vect_constant_defs in practice, or is this just for 
> >> completeness?
> >> I would expect any constants to appear as direct operands.  I don't
> >> mind keeping it if it's just a belt-and-braces thing though.
> >
> > In the latency case where I had allow_constants the early rejection
> > based on the operand itself wouldn't trigger, so in that case I
> > still needed to reject them, but do so after the multiply check.  While
> > they do appear as direct operands as well, they also have their own
> > nodes, in particular for SLP, so the constants are handled as a group.
> 
> Ah, OK, thanks.
> 
> > But can also check CONSTANT_CLASS_P (rhs) if that's preferable.
> 
> No, what you did is more correct.  I just wasn't sure at first which case it 
> was
> handling.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_multiply_add_p): Update handling
of constants. 
(aarch64_adjust_stmt_cost): Use it.
(aarch64_vector_costs::count_ops): Likewise.
(aarch64_vector_costs::add_stmt_cost): Pass vinfo to
aarch64_adjust_stmt_cost.

--- inline copy of patch ---

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index d4d7602554592b9042b8eaf389eff1ec80c2090e..7cc5916ce06b2635346c807da9306738b939ebc6 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -16410,10 +16410,6 @@ aarch64_multiply_add_p (vec_info *vinfo, stmt_vec_info stmt_info,
   if (code != PLUS_EXPR && code != MINUS_EXPR)
 return false;
 
-  if (CONSTANT_CLASS_P (gimple_assign_rhs1 (assign))
-  || CONSTANT_CLASS_P (gimple_assign_rhs2 (assign)))
-return false;
-
   for (int i = 1; i < 3; ++i)
 {
   tree rhs = gimple_op (assign, i);
@@ -16441,7 +16437,8 @@ aarch64_multiply_add_p (vec_info *vinfo, stmt_vec_info stmt_info,
return false;
  def_stmt_info = vinfo->lookup_def (rhs);
  if (!def_stmt_info
- || STMT_VINFO_DEF_TYPE (def_stmt_info) == vect_external_def)
+ || STMT_VINFO_DEF_TYPE (def_stmt_info) == vect_external_def
+ || STMT_VINFO_DEF_TYPE (def_stmt_info) == vect_constant_def)
return false;
}
 
@@ -16721,8 +16718,9 @@ aarch64_sve_adjust_stmt_cost (class vec_info *vinfo, vect_cost_for_stmt kind,
and which when vectorized would operate on vector type VECTYPE.  Add the
cost of any embedded operations.  */
 static fractional_cost
-aarch64_adjust_stmt_cost (vect_cost_for_stmt kind, stmt_vec_info stmt_info,
- tree vectype, fractional_cost stmt_cost)
+aarch64_adjust_stmt_cost (vec_info *vinfo, vect_cost_for_stmt kind,
+ stmt_vec_info stmt_info, tree vectype,
+ unsigned vec_flags, fractional_cost stmt_cost)
 {
   if (vectype)
 {
@@ -16745,6 +16743,14 @@ aarch64_adjust_stmt_cost (vect_cost_for_stmt kind, stmt_vec_info stmt_info,
  break;
}
 
+  gassign *assign = dyn_cast <gassign *> (STMT_VINFO_STMT (stmt_info));
+  if (assign && !vect_is_reduction (stmt_info))
+   {
+ /* For MLA we need to reduce the cost since MLA is 1 instruction.  */
+ if (aarch64_multiply_add_p (vinfo, stmt_info, vec_flags))
+   return 0;
+   }
+
   if (kind == vector_stmt || kind == vec_to_scalar)
if (tree cmp_type = vect_embedded_comparison_type (stmt_info))
  {
@@ -16814,7 +16820,8 @@ aarch64_vector_costs::count_ops (unsigned int count, vect_cost_for_stmt kind,
 }
 
   /* Assume that multiply-adds will become a single operation.  */
-  if (stmt_info && aarch64_multiply_add_p (m_vinfo, stmt_info, m_vec_flags))
+  if (stmt_info
+  && aarch64_multiply_add_p (m_vinfo, stmt_info, m_vec_flags))
 return;
 
   /* Count the basic operation cost associated with KIND.  */
@@ -17060,8 +17067,8 @@ aarch64_vector_costs::add_stmt_cost (int count, vect_cost_for_stmt kind,
 {
   /* Account for any extra "embedded" costs that apply additively
 to the base cost calculated above.  */
-  stmt_cost = aarch64_adjust_stmt_cost (kind, stmt_info, vectype,
-   stmt_cost);
+  stmt_cost = aarch64_adjust_stmt_cost (m_vinfo, kind, stmt_info,
+   vectype, m_vec_flags, stmt_cost);
 
   /* If we're recording a nonzero vector loop body cost for the
 innermost loop, also estimate the operations that would need




RE: [PATCH]AArch64 Undo vec_widen_shiftl optabs [PR106346]

2023-08-03 Thread Tamar Christina via Gcc-patches
> > +
> > +(define_constraint "D3"
> > +  "@internal
> > + A constraint that matches vector of immediates that is with 0 to
> > +(bits(mode)/2)-1."
> > + (and (match_code "const,const_vector")
> > +  (match_test "aarch64_const_vec_all_same_in_range_p (op, 0,
> > +   (GET_MODE_UNIT_BITSIZE (mode) / 2) - 1)")))
> 
> Having this mapping for D2 and D3, with D2 corresponding to prec/2, kind-of
> makes D3 a false mnemonic.  How about DL instead?  (L for "left-shift long" or
> "low-part", take your pick)
> 
> Looks good otherwise.
> 

Wasn't sure if this was an OK-with-changes or not, so here's the final patch.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

PR target/106346
* config/aarch64/aarch64-simd.md (vec_widen_<sur>shiftl_lo_<mode>,
vec_widen_<sur>shiftl_hi_<mode>): Remove.
(aarch64_<sur>shll<mode>_internal): Renamed to...
(aarch64_<su>shll<mode>): ...This.
(aarch64_<sur>shll2<mode>_internal): Renamed to...
(aarch64_<su>shll2<mode>): ...This.
(aarch64_<sur>shll_n<mode>, aarch64_<sur>shll2_n<mode>): Re-use new
optabs.
* config/aarch64/constraints.md (D2, DL): New.
* config/aarch64/predicates.md (aarch64_simd_shll_imm_vec): New.

gcc/testsuite/ChangeLog:

PR target/106346
* gcc.target/aarch64/pr98772.c: Adjust assembly.
* gcc.target/aarch64/vect-widen-shift.c: New test.

--- inline copy of patch ---

diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index d95394101470446e55f25a2397dd112239b6a54d..f67eb70577d0c2d9911d8c867d38a4d0b390337c 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -6387,105 +6387,67 @@ (define_insn "aarch64_<sur>q<r>shl<mode>"
   [(set_attr "type" "neon_sat_shift_reg")]
 )
 
-(define_expand "vec_widen_<sur>shiftl_lo_<mode>"
-  [(set (match_operand:<VWIDE> 0 "register_operand" "=w")
-	(unspec:<VWIDE> [(match_operand:VQW 1 "register_operand" "w")
-			 (match_operand:SI 2
			   "aarch64_simd_shift_imm_bitsize_<ve_mode>" "i")]
-			 VSHLL))]
-  "TARGET_SIMD"
-  {
-    rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, <nunits>, false);
-    emit_insn (gen_aarch64_<sur>shll<mode>_internal (operands[0], operands[1],
-						     p, operands[2]));
-    DONE;
-  }
-)
-
-(define_expand "vec_widen_<sur>shiftl_hi_<mode>"
-   [(set (match_operand:<VWIDE> 0 "register_operand")
-	(unspec:<VWIDE> [(match_operand:VQW 1 "register_operand" "w")
-			 (match_operand:SI 2
-			   "immediate_operand" "i")]
-			  VSHLL))]
-   "TARGET_SIMD"
-   {
-    rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, <nunits>, true);
-    emit_insn (gen_aarch64_<sur>shll2<mode>_internal (operands[0], operands[1],
-						      p, operands[2]));
-    DONE;
-   }
-)
-
 ;; vshll_n
 
-(define_insn "aarch64_<sur>shll<mode>_internal"
-  [(set (match_operand:<VWIDE> 0 "register_operand" "=w")
-	(unspec:<VWIDE> [(vec_select:<VHALF>
-			    (match_operand:VQW 1 "register_operand" "w")
-			    (match_operand:VQW 2 "vect_par_cnst_lo_half" ""))
-			 (match_operand:SI 3
-			   "aarch64_simd_shift_imm_bitsize_<ve_mode>" "i")]
-			 VSHLL))]
+(define_insn "aarch64_<su>shll<mode>"
+  [(set (match_operand:<VWIDE> 0 "register_operand")
+	(ashift:<VWIDE> (ANY_EXTEND:<VWIDE>
+			    (match_operand:VD_BHSI 1 "register_operand"))
+			 (match_operand:<VWIDE> 2
+			   "aarch64_simd_shll_imm_vec")))]
   "TARGET_SIMD"
-  {
-    if (INTVAL (operands[3]) == GET_MODE_UNIT_BITSIZE (<MODE>mode))
-      return "shll\\t%0.<Vwtype>, %1.<Vhalftype>, %3";
-    else
-      return "<sur>shll\\t%0.<Vwtype>, %1.<Vhalftype>, %3";
+  {@ [cons: =0, 1, 2]
+     [w, w, D2] shll\t%0.<Vwtype>, %1.<Vtype>, %I2
+     [w, w, DL] <su>shll\t%0.<Vwtype>, %1.<Vtype>, %I2
   }
   [(set_attr "type" "neon_shift_imm_long")]
 )
 
-(define_insn "aarch64_<sur>shll2<mode>_internal"
-  [(set (match_operand:<VWIDE> 0 "register_operand" "=w")
-	(unspec:<VWIDE> [(vec_select:<VHALF>
-			    (match_operand:VQW 1 "register_operand" "w")
-			    (match_operand:VQW 2 "vect_par_cnst_hi_half" ""))
-			 (match_operand:SI 3
-			   "aarch64_simd_shift_imm_bitsize_<ve_mode>" "i")]
+(define_expand "aarch64_<sur>shll_n<mode>"
+  [(set (match_operand:<VWIDE> 0 "register_operand")
+	(unspec:<VWIDE> [(match_operand:VD_BHSI 1 "register_operand")
+			 (match_operand:SI 2
+			   "aarch64_simd_shift_imm_bitsize_<ve_mode>")]
 			 VSHLL))]
   "TARGET_SIMD"
   {
-    if (INTVAL (operands[3]) == GET_MODE_UNIT_BITSIZE (<MODE>mode))
-      return "shll2\\t%0.<Vwtype>, %1.<Vtype>, %3";
-    else
-      return "<sur>shll2\\t%0.<Vwtype>, %1.<Vtype>, %3";
+    rtx shft = gen_const_vec_duplicate (<VWIDE>mode, operands[2]);
+    emit_insn (gen_aarch64_<sur>shll<mode> (operands[0], operands[1], shft));
+    DONE;
   }
-  [(set_attr "type" "neon_shift_imm_long")]
 )
 
-(define_insn "aarch64_<sur>shll_n<mode>"
-  [(set 

RE: [PATCH]AArch64 update costing for MLA by invariant

2023-08-02 Thread Tamar Christina via Gcc-patches
> Tamar Christina  writes:
> > Hi All,
> >
> > When determining issue rates we currently discount non-constant MLA
> > accumulators for Advanced SIMD but don't do it for the latency.
> >
> > This means the costs for Advanced SIMD with a constant accumulator are
> > wrong and results in us costing SVE and Advanced SIMD the same.  This
> > can cause us to vectorize with Advanced SIMD instead of SVE in some cases.
> >
> > This patch adds the same discount for SVE and Scalar as we do for issue 
> > rate.
> >
> > My assumption was that on issue rate we reject all scalar constants
> > early because we take into account the extra instruction to create the
> constant?
> > Though I'd have expected this to be in prologue costs.  For this
> > reason I added an extra parameter to allow me to force the check to at
> > least look for the multiplication.
> 
> I'm not sure that was it.  I wish I'd added a comment to say what it was
> though :(  I suspect different parts of this function were written at 
> different
> times, hence the inconsistency.
> 
> > This gives a 5% improvement in fotonik3d_r in SPECCPU 2017 on large
> > Neoverse cores.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > * config/aarch64/aarch64.cc (aarch64_multiply_add_p): Add param
> > allow_constants.
> > (aarch64_adjust_stmt_cost): Use it.
> > (aarch64_vector_costs::count_ops): Likewise.
> > (aarch64_vector_costs::add_stmt_cost): Pass vinfo to
> > aarch64_adjust_stmt_cost.
> >
> > --- inline copy of patch --
> > diff --git a/gcc/config/aarch64/aarch64.cc
> > b/gcc/config/aarch64/aarch64.cc
> > index 560e5431636ef46c41d56faa0c4e95be78f64b50..76b74b77b3f122a3c972557e2f83b63ba365fea9 100644
> > --- a/gcc/config/aarch64/aarch64.cc
> > +++ b/gcc/config/aarch64/aarch64.cc
> > @@ -16398,10 +16398,11 @@ aarch64_advsimd_ldp_stp_p (enum vect_cost_for_stmt kind,
> > or multiply-subtract sequence that might be suitable for fusing into a
> > single instruction.  If VEC_FLAGS is zero, analyze the operation as
> > a scalar one, otherwise analyze it as an operation on vectors with those
> > -   VEC_* flags.  */
> > +   VEC_* flags.  When ALLOW_CONSTANTS we'll recognize all accumulators
> > +   including constant ones.  */
> >  static bool
> >  aarch64_multiply_add_p (vec_info *vinfo, stmt_vec_info stmt_info,
> > -   unsigned int vec_flags)
> > +   unsigned int vec_flags, bool allow_constants)
> >  {
> >   gassign *assign = dyn_cast <gassign *> (stmt_info->stmt);
> >if (!assign)
> > @@ -16410,8 +16411,9 @@ aarch64_multiply_add_p (vec_info *vinfo, stmt_vec_info stmt_info,
> >if (code != PLUS_EXPR && code != MINUS_EXPR)
> >  return false;
> >
> > -  if (CONSTANT_CLASS_P (gimple_assign_rhs1 (assign))
> > -  || CONSTANT_CLASS_P (gimple_assign_rhs2 (assign)))
> > +  if (!allow_constants
> > +  && (CONSTANT_CLASS_P (gimple_assign_rhs1 (assign))
> > +	  || CONSTANT_CLASS_P (gimple_assign_rhs2 (assign))))
> >  return false;
> >
> >for (int i = 1; i < 3; ++i)
> > @@ -16429,7 +16431,7 @@ aarch64_multiply_add_p (vec_info *vinfo, stmt_vec_info stmt_info,
> >if (!rhs_assign || gimple_assign_rhs_code (rhs_assign) != MULT_EXPR)
> > continue;
> >
> > -  if (vec_flags & VEC_ADVSIMD)
> > +  if (!allow_constants && (vec_flags & VEC_ADVSIMD))
> > {
> >   /* Scalar and SVE code can tie the result to any FMLA input (or none,
> >  although that requires a MOVPRFX for SVE).  However, Advanced
> > SIMD @@ -16441,7 +16443,8 @@ aarch64_multiply_add_p (vec_info
> *vinfo, stmt_vec_info stmt_info,
> > return false;
> >   def_stmt_info = vinfo->lookup_def (rhs);
> >   if (!def_stmt_info
> > - || STMT_VINFO_DEF_TYPE (def_stmt_info) == vect_external_def)
> > + || STMT_VINFO_DEF_TYPE (def_stmt_info) == vect_external_def
> > + || STMT_VINFO_DEF_TYPE (def_stmt_info) == vect_constant_def)
> 
> Do you see vect_constant_defs in practice, or is this just for completeness?
> I would expect any constants to appear as direct operands.  I don't mind
> keeping it if it's just a belt-and-braces thing though.

In the latency case where I had allow_constants the early rejection based on
the operand itself wouldn't trigger, so in that case I still needed to reject
them, but do so after the multiply check.  While they do appear as direct
operands as well, they also have their own nodes, in particular for SLP, so
the constants are handled as a group.

But can also check CONSTANT_CLASS_P (rhs) if that's preferable.

> 
> But rather than add the allow_constants parameter, I think we should just try
> removing:
> 
>   if (CONSTANT_CLASS_P (gimple_assign_rhs1 (assign))
>       || CONSTANT_CLASS_P (gimple_assign_rhs2 (assign)))
>     return false;
> 
> so that the detection is the same for throughput and latency.  I think:
> 
>   if 

[PATCH][gensupport]: Don't segfault on empty attrs list

2023-08-02 Thread Tamar Christina via Gcc-patches
Hi All,

Currently we segfault when len == 0 for an attribute list.

essentially [cons: =0, 1, 2, 3; attrs: ] segfaults but should be equivalent to
[cons: =0, 1, 2, 3] and [cons: =0, 1, 2, 3; attrs:].  This fixes it by just
returning early and leaving it to the validators whether this should error out
or not.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* gensupport.cc (conlist): Support length 0 attribute.

--- inline copy of patch -- 
diff --git a/gcc/gensupport.cc b/gcc/gensupport.cc
index 959d1d9c83cf397fcb344e8d3db0f339a967587f..5c5f1cf4781551d3db95103c19cd1b70d98f4f73 100644
--- a/gcc/gensupport.cc
+++ b/gcc/gensupport.cc
@@ -619,6 +619,9 @@ public:
  [ns..ns + len) should equal XSTR (rtx, 0).  */
   conlist (const char *ns, unsigned int len, bool numeric)
   {
+    if (len == 0)
+      return;
+
 /* Trim leading whitespaces.  */
 while (ISBLANK (*ns))
   {








[PATCH]AArch64 Undo vec_widen_shiftl optabs [PR106346]

2023-08-02 Thread Tamar Christina via Gcc-patches
Hi All,

In GCC 11 we implemented the vectorizer optab for widening left shifts,
however this optab is only supported for uniform shift constants.

At the moment GCC still has two loop vectorization strategies (classical loop and
SLP based loop vec) and the optab is implemented as a scalar pattern.

This means that when we apply it to a non-uniform constant inside a loop we only
find out during SLP build that the constants aren't uniform.  At this point it's
too late and we lose SLP entirely.
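
For illustration, the kind of loop involved (a hypothetical example, not
taken from the patch):

  /* The vectorizer used to match this widening left shift via the
     vec_widen_shiftl optab; with a uniform shift amount it can become a
     single [us]shll per vector.  */
  void
  f (unsigned short *restrict out, unsigned char *restrict in, int n)
  {
    for (int i = 0; i < n; i++)
      out[i] = in[i] << 4;
  }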

Over the years I've tried various options but none of it works well:

1. Dissolving patterns during SLP build (problematic, also dissolves them for
non-SLP).
2. Optionally ignoring patterns for SLP build (problematic, ends up interfering
with relevancy detection).
3. Relaxing the constraint on SLP build to allow non-constant values and
dissolving them after SLP build using an SLP pattern (problematic, ends up
breaking shift reassociation).

As a result we've concluded that for now this pattern should just be removed
and formed during RTL.

The plan is to move this to an SLP only pattern once we remove classical loop
vectorization support from GCC, at which time we can also properly support SVE's
Top and Bottom variants.

This removes the optab and reworks the RTL to recognize both the vector variant
and the intrinsics variant.  Also just simplifies all these patterns.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

PR target/106346
* config/aarch64/aarch64-simd.md (vec_widen_<sur>shiftl_lo_<mode>,
vec_widen_<sur>shiftl_hi_<mode>): Remove.
(aarch64_<sur>shll<mode>_internal): Renamed to...
(aarch64_<su>shll<mode>): ...This.
(aarch64_<sur>shll2<mode>_internal): Renamed to...
(aarch64_<su>shll2<mode>): ...This.
(aarch64_<sur>shll_n<mode>, aarch64_<sur>shll2_n<mode>): Re-use new
optabs.
* config/aarch64/constraints.md (D2, D3): New.
* config/aarch64/predicates.md (aarch64_simd_shift_imm_vec): New.

gcc/testsuite/ChangeLog:

PR target/106346
* gcc.target/aarch64/pr98772.c: Adjust assembly.
* gcc.target/aarch64/vect-widen-shift.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index d95394101470446e55f25a2397dd112239b6a54d..afd5b8632afbcddf8dad14495c3446c560eb085d 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -6387,105 +6387,66 @@ (define_insn "aarch64_<sur>q<r>shl<mode>"
   [(set_attr "type" "neon_sat_shift_reg")]
 )
 
-(define_expand "vec_widen_<sur>shiftl_lo_<mode>"
-  [(set (match_operand:<VWIDE> 0 "register_operand" "=w")
-	(unspec:<VWIDE> [(match_operand:VQW 1 "register_operand" "w")
-			 (match_operand:SI 2
-			   "aarch64_simd_shift_imm_bitsize_<ve_mode>" "i")]
-			 VSHLL))]
-  "TARGET_SIMD"
-  {
-    rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, <nunits>, false);
-    emit_insn (gen_aarch64_<sur>shll<mode>_internal (operands[0], operands[1],
-						     p, operands[2]));
-    DONE;
-  }
-)
-
-(define_expand "vec_widen_<sur>shiftl_hi_<mode>"
-   [(set (match_operand:<VWIDE> 0 "register_operand")
-	(unspec:<VWIDE> [(match_operand:VQW 1 "register_operand" "w")
-			 (match_operand:SI 2
-			   "immediate_operand" "i")]
-			  VSHLL))]
-   "TARGET_SIMD"
-   {
-    rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, <nunits>, true);
-    emit_insn (gen_aarch64_<sur>shll2<mode>_internal (operands[0], operands[1],
-						      p, operands[2]));
-    DONE;
-   }
-)
-
 ;; vshll_n
 
-(define_insn "aarch64_<sur>shll<mode>_internal"
-  [(set (match_operand:<VWIDE> 0 "register_operand" "=w")
-	(unspec:<VWIDE> [(vec_select:<VHALF>
-			    (match_operand:VQW 1 "register_operand" "w")
-			    (match_operand:VQW 2 "vect_par_cnst_lo_half" ""))
-			 (match_operand:SI 3
-			   "aarch64_simd_shift_imm_bitsize_<ve_mode>" "i")]
-			 VSHLL))]
+(define_insn "aarch64_<su>shll<mode>"
+  [(set (match_operand:<VWIDE> 0 "register_operand")
+	(ashift:<VWIDE> (ANY_EXTEND:<VWIDE>
+			    (match_operand:VD_BHSI 1 "register_operand"))
+			 (match_operand:<VWIDE> 2
+			   "aarch64_simd_shift_imm_vec")))]
   "TARGET_SIMD"
-  {
-    if (INTVAL (operands[3]) == GET_MODE_UNIT_BITSIZE (<MODE>mode))
-      return "shll\\t%0.<Vwtype>, %1.<Vhalftype>, %3";
-    else
-      return "<sur>shll\\t%0.<Vwtype>, %1.<Vhalftype>, %3";
+  {@ [cons: =0, 1, 2]
+     [w, w, D2] shll\t%0.<Vwtype>, %1.<Vtype>, %I2
+     [w, w, D3] <su>shll\t%0.<Vwtype>, %1.<Vtype>, %I2
   }
   [(set_attr "type" "neon_shift_imm_long")]
 )
 
-(define_insn "aarch64_<sur>shll2<mode>_internal"
-  [(set (match_operand:<VWIDE> 0 "register_operand" "=w")
-	(unspec:<VWIDE> [(vec_select:<VHALF>
-			    (match_operand:VQW 1 "register_operand" "w")
-			    (match_operand:VQW 2 "vect_par_cnst_hi_half" ""))
-			 (match_operand:SI 3
-			   

[PATCH]AArch64 update costing for combining vector conditionals

2023-08-02 Thread Tamar Christina via Gcc-patches
Hi All,

Boolean comparisons have different costs depending on the mode.  e.g.
a && b when predicated doesn't require an additional instruction: the AND is
free, by combining the predicate of the one operation into the second one.  At
the moment though we only fuse compares, so this update requires one of the
operands to be a comparison.

Scalar code also doesn't require this because the non-ifcvt variant is a
series of branches, where the branch sequences themselves act as natural ANDs.

Advanced SIMD however does require an actual AND to combine the boolean values.

As such this patch discounts Scalar and SVE boolean operation latency and
throughput.
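
As an illustration (hypothetical example, not taken from the patch):

  /* With SVE the second compare can execute under the predicate of the
     first, so no separate AND is needed; Advanced SIMD needs an explicit
     AND of the two mask vectors.  */
  void
  f (int *restrict out, int *restrict a, int *restrict b, int n)
  {
    for (int i = 0; i < n; i++)
      out[i] = a[i] > 0 && b[i] > 0;
  }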

With this patch comparison heavy code prefers SVE as it should, especially in
cases with SVE VL == Advanced SIMD VL where previously the SVE prologue costs
would tip it towards Advanced SIMD.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_bool_compound_p): New.
(aarch64_adjust_stmt_cost, aarch64_vector_costs::count_ops): Use it.

--- inline copy of patch -- 
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index b1bacc734b4630257b6ebf8ca7d9afeb34008c10..55963bb28be7ede08b05fb9fddb5a65f6818c63e 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -16453,6 +16453,49 @@ aarch64_multiply_add_p (vec_info *vinfo, stmt_vec_info stmt_info,
   return false;
 }
 
+/* Return true if STMT_INFO is the second part of a two-statement boolean AND
+   expression sequence that might be suitable for fusing into a
+   single instruction.  If VEC_FLAGS is zero, analyze the operation as
+   a scalar one, otherwise analyze it as an operation on vectors with those
+   VEC_* flags.  */
+
+static bool
+aarch64_bool_compound_p (vec_info *vinfo, stmt_vec_info stmt_info,
+unsigned int vec_flags)
+{
+  gassign *assign = dyn_cast <gassign *> (stmt_info->stmt);
+  if (!assign
+  || !STMT_VINFO_VECTYPE (stmt_info)
+  || !VECTOR_BOOLEAN_TYPE_P (STMT_VINFO_VECTYPE (stmt_info))
+  || gimple_assign_rhs_code (assign) != BIT_AND_EXPR)
+return false;
+
+  for (int i = 1; i < 3; ++i)
+{
+  tree rhs = gimple_op (assign, i);
+
+  if (TREE_CODE (rhs) != SSA_NAME)
+   continue;
+
+  stmt_vec_info def_stmt_info = vinfo->lookup_def (rhs);
+  if (!def_stmt_info
+ || STMT_VINFO_DEF_TYPE (def_stmt_info) != vect_internal_def)
+   continue;
+
+  gassign *rhs_assign = dyn_cast <gassign *> (def_stmt_info->stmt);
+  if (!rhs_assign
+ || TREE_CODE_CLASS (gimple_assign_rhs_code (rhs_assign))
+   != tcc_comparison)
+   continue;
+
+  if (vec_flags & VEC_ADVSIMD)
+   return false;
+
+  return true;
+}
+  return false;
+}
+
 /* We are considering implementing STMT_INFO using SVE.  If STMT_INFO is an
in-loop reduction that SVE supports directly, return its latency in cycles,
otherwise return zero.  SVE_COSTS specifies the latencies of the relevant
@@ -16750,11 +16793,17 @@ aarch64_adjust_stmt_cost (vec_info *vinfo, vect_cost_for_stmt kind,
}
 
   gassign *assign = dyn_cast <gassign *> (STMT_VINFO_STMT (stmt_info));
-  if (assign && !vect_is_reduction (stmt_info))
+  if (assign)
{
  bool simd_p = vec_flags & VEC_ADVSIMD;
  /* For MLA we need to reduce the cost since MLA is 1 instruction.  */
- if (aarch64_multiply_add_p (vinfo, stmt_info, vec_flags, !simd_p))
+ if (!vect_is_reduction (stmt_info)
+ && aarch64_multiply_add_p (vinfo, stmt_info, vec_flags, !simd_p))
+   return 0;
+
+	 /* For vector boolean ANDs with a compare operand we just need
+	    one insn.  */
+ if (aarch64_bool_compound_p (vinfo, stmt_info, vec_flags))
return 0;
}
 
@@ -16831,6 +16880,12 @@ aarch64_vector_costs::count_ops (unsigned int count, vect_cost_for_stmt kind,
   && aarch64_multiply_add_p (m_vinfo, stmt_info, m_vec_flags, false))
 return;
 
+  /* Assume that bool AND with compare operands will become a single
+ operation.  */
+  if (stmt_info
+  && aarch64_bool_compound_p (m_vinfo, stmt_info, m_vec_flags))
+return;
+
   /* Count the basic operation cost associated with KIND.  */
   switch (kind)
 {





[PATCH]AArch64 update costing for MLA by invariant

2023-08-02 Thread Tamar Christina via Gcc-patches
Hi All,

When determining issue rates we currently discount non-constant MLA accumulators
for Advanced SIMD but don't do it for the latency.

This means the costs for Advanced SIMD with a constant accumulator are wrong and
results in us costing SVE and Advanced SIMD the same.  This can cause us to
vectorize with Advanced SIMD instead of SVE in some cases.

This patch adds the same discount for SVE and Scalar as we do for issue rate.

My assumption was that on issue rate we reject all scalar constants early
because we take into account the extra instruction to create the constant?
Though I'd have expected this to be in prologue costs.  For this reason I added
an extra parameter to allow me to force the check to at least look for the
multiplication.

This gives a 5% improvement in fotonik3d_r in SPECCPU 2017 on large
Neoverse cores.
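
For illustration, the shape of code affected (a hypothetical example, not
taken from the patch):

  /* With an invariant multiplier the multiply-accumulate should be costed
     as a single fused MLA instruction rather than a MUL plus an ADD.  */
  void
  f (int *restrict acc, int *restrict a, int b, int n)
  {
    for (int i = 0; i < n; i++)
      acc[i] += a[i] * b;
  }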

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_multiply_add_p): Add param
allow_constants. 
(aarch64_adjust_stmt_cost): Use it.
(aarch64_vector_costs::count_ops): Likewise.
(aarch64_vector_costs::add_stmt_cost): Pass vinfo to
aarch64_adjust_stmt_cost.

--- inline copy of patch -- 
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 560e5431636ef46c41d56faa0c4e95be78f64b50..76b74b77b3f122a3c972557e2f83b63ba365fea9 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -16398,10 +16398,11 @@ aarch64_advsimd_ldp_stp_p (enum vect_cost_for_stmt kind,
or multiply-subtract sequence that might be suitable for fusing into a
single instruction.  If VEC_FLAGS is zero, analyze the operation as
a scalar one, otherwise analyze it as an operation on vectors with those
-   VEC_* flags.  */
+   VEC_* flags.  When ALLOW_CONSTANTS we'll recognize all accumulators
+   including constant ones.  */
 static bool
 aarch64_multiply_add_p (vec_info *vinfo, stmt_vec_info stmt_info,
-   unsigned int vec_flags)
+   unsigned int vec_flags, bool allow_constants)
 {
   gassign *assign = dyn_cast <gassign *> (stmt_info->stmt);
   if (!assign)
@@ -16410,8 +16411,9 @@ aarch64_multiply_add_p (vec_info *vinfo, stmt_vec_info stmt_info,
   if (code != PLUS_EXPR && code != MINUS_EXPR)
 return false;
 
-  if (CONSTANT_CLASS_P (gimple_assign_rhs1 (assign))
-  || CONSTANT_CLASS_P (gimple_assign_rhs2 (assign)))
+  if (!allow_constants
+  && (CONSTANT_CLASS_P (gimple_assign_rhs1 (assign))
+	  || CONSTANT_CLASS_P (gimple_assign_rhs2 (assign))))
 return false;
 
   for (int i = 1; i < 3; ++i)
@@ -16429,7 +16431,7 @@ aarch64_multiply_add_p (vec_info *vinfo, stmt_vec_info stmt_info,
   if (!rhs_assign || gimple_assign_rhs_code (rhs_assign) != MULT_EXPR)
continue;
 
-  if (vec_flags & VEC_ADVSIMD)
+  if (!allow_constants && (vec_flags & VEC_ADVSIMD))
{
  /* Scalar and SVE code can tie the result to any FMLA input (or none,
 although that requires a MOVPRFX for SVE).  However, Advanced SIMD
@@ -16441,7 +16443,8 @@ aarch64_multiply_add_p (vec_info *vinfo, stmt_vec_info stmt_info,
return false;
  def_stmt_info = vinfo->lookup_def (rhs);
  if (!def_stmt_info
- || STMT_VINFO_DEF_TYPE (def_stmt_info) == vect_external_def)
+ || STMT_VINFO_DEF_TYPE (def_stmt_info) == vect_external_def
+ || STMT_VINFO_DEF_TYPE (def_stmt_info) == vect_constant_def)
return false;
}
 
@@ -16721,8 +16724,9 @@ aarch64_sve_adjust_stmt_cost (class vec_info *vinfo, vect_cost_for_stmt kind,
and which when vectorized would operate on vector type VECTYPE.  Add the
cost of any embedded operations.  */
 static fractional_cost
-aarch64_adjust_stmt_cost (vect_cost_for_stmt kind, stmt_vec_info stmt_info,
- tree vectype, fractional_cost stmt_cost)
+aarch64_adjust_stmt_cost (vec_info *vinfo, vect_cost_for_stmt kind,
+ stmt_vec_info stmt_info, tree vectype,
+ unsigned vec_flags, fractional_cost stmt_cost)
 {
   if (vectype)
 {
@@ -16745,6 +16749,15 @@ aarch64_adjust_stmt_cost (vect_cost_for_stmt kind, stmt_vec_info stmt_info,
  break;
}
 
+  gassign *assign = dyn_cast <gassign *> (STMT_VINFO_STMT (stmt_info));
+  if (assign && !vect_is_reduction (stmt_info))
+   {
+ bool simd_p = vec_flags & VEC_ADVSIMD;
+ /* For MLA we need to reduce the cost since MLA is 1 instruction.  */
+ if (aarch64_multiply_add_p (vinfo, stmt_info, vec_flags, !simd_p))
+   return 0;
+   }
+
   if (kind == vector_stmt || kind == vec_to_scalar)
if (tree cmp_type = vect_embedded_comparison_type (stmt_info))
  {
@@ -16795,7 +16808,8 @@ aarch64_vector_costs::count_ops (unsigned int count, vect_cost_for_stmt kind,
 }
 
   /* Assume that 

RE: [PATCH 2/2][frontend]: Add novector C pragma

2023-08-02 Thread Tamar Christina via Gcc-patches
Ping.

> -----Original Message-----
> From: Tamar Christina 
> Sent: Wednesday, July 26, 2023 8:35 PM
> To: Tamar Christina ; gcc-patches@gcc.gnu.org
> Cc: nd ; jos...@codesourcery.com
> Subject: RE: [PATCH 2/2][frontend]: Add novector C pragma
> 
> Hi, This is a respin of the patch taking in the feedback received from the C++
> part.
> 
> Simultaneously it's also a ping 
> 
> 
> 
> Hi All,
> 
> FORTRAN currently has a pragma NOVECTOR for indicating that vectorization
> should not be applied to a particular loop.
> 
> ICC/ICX also has such a pragma for C and C++ called #pragma novector.
> 
> As part of this patch series I need a way to easily turn off vectorization of
> particular loops, particularly for testsuite reasons.
> 
> This patch proposes a #pragma GCC novector that does the same for C as
> gfortran does for FORTRAN and what ICC/ICX does for C.
> 
> I added only some basic tests here, but the next patch in the series uses 
> this in
> the testsuite in about ~800 tests.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/c-family/ChangeLog:
> 
>   * c-pragma.h (enum pragma_kind): Add PRAGMA_NOVECTOR.
>   * c-pragma.cc (init_pragma): Use it.
> 
> gcc/c/ChangeLog:
> 
>   * c-parser.cc (c_parser_while_statement, c_parser_do_statement,
>   c_parser_for_statement, c_parser_statement_after_labels,
>   c_parse_pragma_novector, c_parser_pragma): Wire through novector
> and
>   default to false.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/vect/vect-novector-pragma.c: New test.
> 
> --- inline copy of patch ---
> 
> diff --git a/gcc/c-family/c-pragma.h b/gcc/c-family/c-pragma.h
> index 9cc95ab3ee376628dbef2485b84e6008210fa8fc..99cf2e8bd1c05537c198470f1aaa0a5a9da4e576 100644
> --- a/gcc/c-family/c-pragma.h
> +++ b/gcc/c-family/c-pragma.h
> @@ -87,6 +87,7 @@ enum pragma_kind {
>PRAGMA_GCC_PCH_PREPROCESS,
>PRAGMA_IVDEP,
>PRAGMA_UNROLL,
> +  PRAGMA_NOVECTOR,
> 
>PRAGMA_FIRST_EXTERNAL
>  };
> diff --git a/gcc/c-family/c-pragma.cc b/gcc/c-family/c-pragma.cc
> index 0d2b333cebbed32423d5dc6fd2a3ac0ce0bf8b94..848a850b8e123ff1c6ae1ec4b7f8ccbd599b1a88 100644
> --- a/gcc/c-family/c-pragma.cc
> +++ b/gcc/c-family/c-pragma.cc
> @@ -1862,6 +1862,10 @@ init_pragma (void)
>  cpp_register_deferred_pragma (parse_in, "GCC", "unroll",
> PRAGMA_UNROLL,
> false, false);
> 
> +  if (!flag_preprocess_only)
> +cpp_register_deferred_pragma (parse_in, "GCC", "novector",
> PRAGMA_NOVECTOR,
> +   false, false);
> +
>  #ifdef HANDLE_PRAGMA_PACK_WITH_EXPANSION
>c_register_pragma_with_expansion (0, "pack", handle_pragma_pack);  #else
> diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
> index 24a6eb6e4596f32c477e3f1c3f98b9792f7bc92c..74f3cbb0d61b5f4c0eb300672f495dde3f1517f7 100644
> --- a/gcc/c/c-parser.cc
> +++ b/gcc/c/c-parser.cc
> @@ -1572,9 +1572,11 @@ static tree c_parser_c99_block_statement (c_parser *, bool *,
> 						  location_t * = NULL);
>  static void c_parser_if_statement (c_parser *, bool *, vec<tree> *);
>  static void c_parser_switch_statement (c_parser *, bool *);
> -static void c_parser_while_statement (c_parser *, bool, unsigned short, bool *);
> -static void c_parser_do_statement (c_parser *, bool, unsigned short);
> -static void c_parser_for_statement (c_parser *, bool, unsigned short, bool *);
> +static void c_parser_while_statement (c_parser *, bool, unsigned short, bool,
> +				      bool *);
> +static void c_parser_do_statement (c_parser *, bool, unsigned short, bool);
> +static void c_parser_for_statement (c_parser *, bool, unsigned short, bool,
> +				    bool *);
>  static tree c_parser_asm_statement (c_parser *);
>  static tree c_parser_asm_operands (c_parser *);
>  static tree c_parser_asm_goto_operands (c_parser *);
> @@ -6644,13 +6646,13 @@ c_parser_statement_after_labels (c_parser *parser, bool *if_p,
> 	  c_parser_switch_statement (parser, if_p);
> 	  break;
>   case RID_WHILE:
> -   c_parser_while_statement (parser, false, 0, if_p);
> +   c_parser_while_statement (parser, false, 0, false, if_p);
> break;
>   case RID_DO:
> -   c_parser_do_statement (parser, false, 0);
> +   c_parser_do_statement (parser, false, 0, false);
> break;
>   case RID_FOR:
> -   c_parser_for_statement (parser, false, 0, if_p);
> +   c_parser_for_statement (parser, false, 0, false, if_p);
> break;
>   case RID_GOTO:
> c_parser_consume_token (parser);
> @@ -7146,7 +7148,7 @@ c_parser_switch_statement (c_parser *parser, bool *if_p)
> 
>  static void
>  c_parser_while_statement (c_parser *parser, bool ivdep, unsigned short unroll,
> -			  bool *if_p)
> +			  bool novector, bool *if_p)
>  {
>    tree block, cond, body;
>    unsigned char 

RE: [PATCH 2/2][frontend]: Add novector C pragma

2023-07-26 Thread Tamar Christina via Gcc-patches
Hi, This is a respin of the patch taking in the feedback received from the C++ 
part.

Simultaneously it's also a ping 



Hi All,

FORTRAN currently has a pragma NOVECTOR for indicating that vectorization should
not be applied to a particular loop.

ICC/ICX also has such a pragma for C and C++ called #pragma novector.

As part of this patch series I need a way to easily turn off vectorization of
particular loops, particularly for testsuite reasons.

This patch proposes a #pragma GCC novector that does the same for C
as gfortran does for FORTRAN and what ICC/ICX does for C.

I added only some basic tests here, but the next patch in the series uses this
in the testsuite in about ~800 tests.
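
Usage looks like this (a minimal sketch of the syntax being added):

  void
  f (int *a, int n)
  {
  #pragma GCC novector
    for (int i = 0; i < n; i++)
      a[i] += 1;
  }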

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/c-family/ChangeLog:

* c-pragma.h (enum pragma_kind): Add PRAGMA_NOVECTOR.
* c-pragma.cc (init_pragma): Use it.

gcc/c/ChangeLog:

* c-parser.cc (c_parser_while_statement, c_parser_do_statement,
c_parser_for_statement, c_parser_statement_after_labels,
c_parse_pragma_novector, c_parser_pragma): Wire through novector and
default to false.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-novector-pragma.c: New test.

--- inline copy of patch ---

diff --git a/gcc/c-family/c-pragma.h b/gcc/c-family/c-pragma.h
index 9cc95ab3ee376628dbef2485b84e6008210fa8fc..99cf2e8bd1c05537c198470f1aaa0a5a9da4e576 100644
--- a/gcc/c-family/c-pragma.h
+++ b/gcc/c-family/c-pragma.h
@@ -87,6 +87,7 @@ enum pragma_kind {
   PRAGMA_GCC_PCH_PREPROCESS,
   PRAGMA_IVDEP,
   PRAGMA_UNROLL,
+  PRAGMA_NOVECTOR,
 
   PRAGMA_FIRST_EXTERNAL
 };
diff --git a/gcc/c-family/c-pragma.cc b/gcc/c-family/c-pragma.cc
index 0d2b333cebbed32423d5dc6fd2a3ac0ce0bf8b94..848a850b8e123ff1c6ae1ec4b7f8ccbd599b1a88 100644
--- a/gcc/c-family/c-pragma.cc
+++ b/gcc/c-family/c-pragma.cc
@@ -1862,6 +1862,10 @@ init_pragma (void)
 cpp_register_deferred_pragma (parse_in, "GCC", "unroll", PRAGMA_UNROLL,
  false, false);
 
+  if (!flag_preprocess_only)
+cpp_register_deferred_pragma (parse_in, "GCC", "novector", PRAGMA_NOVECTOR,
+ false, false);
+
 #ifdef HANDLE_PRAGMA_PACK_WITH_EXPANSION
   c_register_pragma_with_expansion (0, "pack", handle_pragma_pack);
 #else
diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index 24a6eb6e4596f32c477e3f1c3f98b9792f7bc92c..74f3cbb0d61b5f4c0eb300672f495dde3f1517f7 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -1572,9 +1572,11 @@ static tree c_parser_c99_block_statement (c_parser *, bool *,
 					  location_t * = NULL);
 static void c_parser_if_statement (c_parser *, bool *, vec<tree> *);
 static void c_parser_switch_statement (c_parser *, bool *);
-static void c_parser_while_statement (c_parser *, bool, unsigned short, bool *);
-static void c_parser_do_statement (c_parser *, bool, unsigned short);
-static void c_parser_for_statement (c_parser *, bool, unsigned short, bool *);
+static void c_parser_while_statement (c_parser *, bool, unsigned short, bool,
+ bool *);
+static void c_parser_do_statement (c_parser *, bool, unsigned short, bool);
+static void c_parser_for_statement (c_parser *, bool, unsigned short, bool,
+   bool *);
 static tree c_parser_asm_statement (c_parser *);
 static tree c_parser_asm_operands (c_parser *);
 static tree c_parser_asm_goto_operands (c_parser *);
@@ -6644,13 +6646,13 @@ c_parser_statement_after_labels (c_parser *parser, bool *if_p,
  c_parser_switch_statement (parser, if_p);
  break;
case RID_WHILE:
- c_parser_while_statement (parser, false, 0, if_p);
+ c_parser_while_statement (parser, false, 0, false, if_p);
  break;
case RID_DO:
- c_parser_do_statement (parser, false, 0);
+ c_parser_do_statement (parser, false, 0, false);
  break;
case RID_FOR:
- c_parser_for_statement (parser, false, 0, if_p);
+ c_parser_for_statement (parser, false, 0, false, if_p);
  break;
case RID_GOTO:
  c_parser_consume_token (parser);
@@ -7146,7 +7148,7 @@ c_parser_switch_statement (c_parser *parser, bool *if_p)
 
 static void
 c_parser_while_statement (c_parser *parser, bool ivdep, unsigned short unroll,
- bool *if_p)
+ bool novector, bool *if_p)
 {
   tree block, cond, body;
   unsigned char save_in_statement;
@@ -7168,6 +7170,11 @@ c_parser_while_statement (c_parser *parser, bool ivdep, 
unsigned short unroll,
   build_int_cst (integer_type_node,
  annot_expr_unroll_kind),
   build_int_cst (integer_type_node, unroll));
+  if (novector && cond != error_mark_node)
+cond = build3 (ANNOTATE_EXPR, TREE_TYPE (cond), cond,
+  build_int_cst (integer_type_node,
+ annot_expr_no_vector_kind),
+  integer_zero_node);

RE: [PATCH 1/2][frontend] Add novector C++ pragma

2023-07-26 Thread Tamar Christina via Gcc-patches
> > +
> > +   cp_token *tok = pragma_tok;
> > +
> > +   do
> >   {
> > -   tok = cp_lexer_consume_token (parser->lexer);
> > -   ivdep = cp_parser_pragma_ivdep (parser, tok);
> > -   tok = cp_lexer_peek_token (the_parser->lexer);
> > +   switch (cp_parser_pragma_kind (tok))
> > + {
> > +   case PRAGMA_IVDEP:
> > + {
> > +   if (tok != pragma_tok)
> > + tok = cp_lexer_consume_token (parser->lexer);
> > +   ivdep = cp_parser_pragma_ivdep (parser, tok);
> > +   tok = cp_lexer_peek_token (the_parser->lexer);
> > +   break;
> > + }
> > +   case PRAGMA_UNROLL:
> > + {
> > +   if (tok != pragma_tok)
> > + tok = cp_lexer_consume_token (parser->lexer);
> > +   unroll = cp_parser_pragma_unroll (parser, tok);
> > +   tok = cp_lexer_peek_token (the_parser->lexer);
> > +   break;
> > + }
> > +   case PRAGMA_NOVECTOR:
> > + {
> > +   if (tok != pragma_tok)
> > + tok = cp_lexer_consume_token (parser->lexer);
> > +   novector = cp_parser_pragma_novector (parser, tok);
> > +   tok = cp_lexer_peek_token (the_parser->lexer);
> > +   break;
> > + }
> > +   default:
> > + gcc_unreachable ();
> 
> This unreachable seems to assert that if a pragma follows one of these
> pragmas, it must be another one of these pragmas?  That seems wrong;
> instead of hitting gcc_unreachable() in that case we should fall through to 
> the
> diagnostic below.
> 

Ah, good shout.  Since it has to exit two levels I had to introduce a bool
for controlling the loop iterations.  New patch below.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/cp/ChangeLog:

* cp-tree.h (RANGE_FOR_NOVECTOR): New.
(cp_convert_range_for, finish_while_stmt_cond, finish_do_stmt,
finish_for_cond): Add novector param.
* init.cc (build_vec_init): Default novector to false.
* method.cc (build_comparison_op): Likewise.
* parser.cc (cp_parser_statement): Likewise.
(cp_parser_for, cp_parser_c_for, cp_parser_range_for,
cp_convert_range_for, cp_parser_iteration_statement,
cp_parser_omp_for_loop, cp_parser_pragma): Support novector.
(cp_parser_pragma_novector): New.
* pt.cc (tsubst_expr): Likewise.
* semantics.cc (finish_while_stmt_cond, finish_do_stmt,
finish_for_cond): Likewise.

gcc/ChangeLog:

* doc/extend.texi: Document it.

gcc/testsuite/ChangeLog:

* g++.dg/vect/vect.exp (support vect- prefix).
* g++.dg/vect/vect-novector-pragma.cc: New test.

--- inline copy of patch ---

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 
8398223311194837441107cb335d497ff5f5ec1c..bece7bff1f01a23cfc94386fd3295a0be8c462fe
 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -5377,6 +5377,7 @@ get_vec_init_expr (tree t)
 #define RANGE_FOR_UNROLL(NODE) TREE_OPERAND (RANGE_FOR_STMT_CHECK (NODE), 4)
 #define RANGE_FOR_INIT_STMT(NODE) TREE_OPERAND (RANGE_FOR_STMT_CHECK (NODE), 5)
 #define RANGE_FOR_IVDEP(NODE)  TREE_LANG_FLAG_6 (RANGE_FOR_STMT_CHECK (NODE))
+#define RANGE_FOR_NOVECTOR(NODE) TREE_LANG_FLAG_5 (RANGE_FOR_STMT_CHECK (NODE))
 
 /* STMT_EXPR accessor.  */
 #define STMT_EXPR_STMT(NODE)   TREE_OPERAND (STMT_EXPR_CHECK (NODE), 0)
@@ -7286,7 +7287,7 @@ extern bool maybe_clone_body  (tree);
 
 /* In parser.cc */
 extern tree cp_convert_range_for (tree, tree, tree, tree, unsigned int, bool,
- unsigned short);
+ unsigned short, bool);
 extern void cp_convert_omp_range_for (tree &, vec *, tree &,
  tree &, tree &, tree &, tree &, tree &);
 extern void cp_finish_omp_range_for (tree, tree);
@@ -7609,16 +7610,19 @@ extern void begin_else_clause   (tree);
 extern void finish_else_clause (tree);
 extern void finish_if_stmt (tree);
 extern tree begin_while_stmt   (void);
-extern void finish_while_stmt_cond (tree, tree, bool, unsigned short);
+extern void finish_while_stmt_cond (tree, tree, bool, unsigned short,
+bool);
 extern void finish_while_stmt  (tree);
 extern tree begin_do_stmt  (void);
 extern void finish_do_body (tree);
-extern void finish_do_stmt (tree, tree, bool, unsigned short);
+extern void finish_do_stmt (tree, tree, bool, unsigned short,
+bool);
 extern tree finish_return_stmt (tree);
 extern tree begin_for_scope(tree *);
 extern tree begin_for_stmt (tree, tree);
 

[PATCH 2/2][frontend]: Add novector C pragma

2023-07-19 Thread Tamar Christina via Gcc-patches
Hi All,

FORTRAN currently has a pragma NOVECTOR for indicating that vectorization should
not be applied to a particular loop.

ICC/ICX also has such a pragma for C and C++ called #pragma novector.

As part of this patch series I need a way to easily turn off vectorization of
particular loops, particularly for testsuite reasons.

This patch proposes a #pragma GCC novector that does the same for C
as gfortran does for FORTRAN and what ICC/ICX does for C.

I added only some basic tests here, but the next patch in the series uses this
in the testsuite in around 800 tests.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/c-family/ChangeLog:

* c-pragma.h (enum pragma_kind): Add PRAGMA_NOVECTOR.
* c-pragma.cc (init_pragma): Use it.

gcc/c/ChangeLog:

* c-parser.cc (c_parser_while_statement, c_parser_do_statement,
c_parser_for_statement, c_parser_statement_after_labels,
c_parse_pragma_novector, c_parser_pragma): Wire through novector and
default to false.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-novector-pragma.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/c-family/c-pragma.h b/gcc/c-family/c-pragma.h
index 
9cc95ab3ee376628dbef2485b84e6008210fa8fc..99cf2e8bd1c05537c198470f1aaa0a5a9da4e576
 100644
--- a/gcc/c-family/c-pragma.h
+++ b/gcc/c-family/c-pragma.h
@@ -87,6 +87,7 @@ enum pragma_kind {
   PRAGMA_GCC_PCH_PREPROCESS,
   PRAGMA_IVDEP,
   PRAGMA_UNROLL,
+  PRAGMA_NOVECTOR,
 
   PRAGMA_FIRST_EXTERNAL
 };
diff --git a/gcc/c-family/c-pragma.cc b/gcc/c-family/c-pragma.cc
index 
0d2b333cebbed32423d5dc6fd2a3ac0ce0bf8b94..848a850b8e123ff1c6ae1ec4b7f8ccbd599b1a88
 100644
--- a/gcc/c-family/c-pragma.cc
+++ b/gcc/c-family/c-pragma.cc
@@ -1862,6 +1862,10 @@ init_pragma (void)
 cpp_register_deferred_pragma (parse_in, "GCC", "unroll", PRAGMA_UNROLL,
  false, false);
 
+  if (!flag_preprocess_only)
+cpp_register_deferred_pragma (parse_in, "GCC", "novector", PRAGMA_NOVECTOR,
+ false, false);
+
 #ifdef HANDLE_PRAGMA_PACK_WITH_EXPANSION
   c_register_pragma_with_expansion (0, "pack", handle_pragma_pack);
 #else
diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index 
24a6eb6e4596f32c477e3f1c3f98b9792f7bc92c..4c64d898cddac437958ce20c5603b88a05a99093
 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -1572,9 +1572,11 @@ static tree c_parser_c99_block_statement (c_parser *, 
bool *,
  location_t * = NULL);
 static void c_parser_if_statement (c_parser *, bool *, vec *);
 static void c_parser_switch_statement (c_parser *, bool *);
-static void c_parser_while_statement (c_parser *, bool, unsigned short, bool 
*);
-static void c_parser_do_statement (c_parser *, bool, unsigned short);
-static void c_parser_for_statement (c_parser *, bool, unsigned short, bool *);
+static void c_parser_while_statement (c_parser *, bool, unsigned short, bool,
+ bool *);
+static void c_parser_do_statement (c_parser *, bool, unsigned short, bool);
+static void c_parser_for_statement (c_parser *, bool, unsigned short, bool,
+   bool *);
 static tree c_parser_asm_statement (c_parser *);
 static tree c_parser_asm_operands (c_parser *);
 static tree c_parser_asm_goto_operands (c_parser *);
@@ -6644,13 +6646,13 @@ c_parser_statement_after_labels (c_parser *parser, bool 
*if_p,
  c_parser_switch_statement (parser, if_p);
  break;
case RID_WHILE:
- c_parser_while_statement (parser, false, 0, if_p);
+ c_parser_while_statement (parser, false, 0, false, if_p);
  break;
case RID_DO:
- c_parser_do_statement (parser, false, 0);
+ c_parser_do_statement (parser, false, 0, false);
  break;
case RID_FOR:
- c_parser_for_statement (parser, false, 0, if_p);
+ c_parser_for_statement (parser, false, 0, false, if_p);
  break;
case RID_GOTO:
  c_parser_consume_token (parser);
@@ -7146,7 +7148,7 @@ c_parser_switch_statement (c_parser *parser, bool *if_p)
 
 static void
 c_parser_while_statement (c_parser *parser, bool ivdep, unsigned short unroll,
- bool *if_p)
+ bool novector, bool *if_p)
 {
   tree block, cond, body;
   unsigned char save_in_statement;
@@ -7168,6 +7170,11 @@ c_parser_while_statement (c_parser *parser, bool ivdep, 
unsigned short unroll,
   build_int_cst (integer_type_node,
  annot_expr_unroll_kind),
   build_int_cst (integer_type_node, unroll));
+  if (novector && cond != error_mark_node)
+cond = build3 (ANNOTATE_EXPR, TREE_TYPE (cond), cond,
+  build_int_cst (integer_type_node,
+ annot_expr_no_vector_kind),
+  integer_zero_node);
   

[PATCH 1/2][frontend] Add novector C++ pragma

2023-07-19 Thread Tamar Christina via Gcc-patches
Hi All,

FORTRAN currently has a pragma NOVECTOR for indicating that vectorization should
not be applied to a particular loop.

ICC/ICX also has such a pragma for C and C++ called #pragma novector.

As part of this patch series I need a way to easily turn off vectorization of
particular loops, particularly for testsuite reasons.

This patch proposes a #pragma GCC novector that does the same for C++
as gfortran does for FORTRAN and what ICC/ICX does for C++.

I added only some basic tests here, but the next patch in the series uses this
in the testsuite in around 800 tests.
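
As a C++ usage sketch (illustrative code only; per the RANGE_FOR_NOVECTOR
hunk below, the pragma is also accepted on range-based for loops):

#include <vector>

void
bump (std::vector<int> &v)
{
#pragma GCC novector
  for (int &x : v)   /* range-for is supported as well */
    x += 1;
}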

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/cp/ChangeLog:

* cp-tree.def (RANGE_FOR_STMT): Update comment.
* cp-tree.h (RANGE_FOR_NOVECTOR): New.
(cp_convert_range_for, finish_while_stmt_cond, finish_do_stmt,
finish_for_cond): Add novector param.
* init.cc (build_vec_init): Default novector to false.
* method.cc (build_comparison_op): Likewise.
* parser.cc (cp_parser_statement): Likewise.
(cp_parser_for, cp_parser_c_for, cp_parser_range_for,
cp_convert_range_for, cp_parser_iteration_statement,
cp_parser_omp_for_loop, cp_parser_pragma): Support novector.
(cp_parser_pragma_novector): New.
* pt.cc (tsubst_expr): Likewise.
* semantics.cc (finish_while_stmt_cond, finish_do_stmt,
finish_for_cond): Likewise.

gcc/ChangeLog:

* doc/extend.texi: Document it.

gcc/testsuite/ChangeLog:

* g++.dg/vect/vect.exp (support vect- prefix).
* g++.dg/vect/vect-novector-pragma.cc: New test.

--- inline copy of patch -- 
diff --git a/gcc/cp/cp-tree.def b/gcc/cp/cp-tree.def
index 
0e66ca70e00caa1dc4beada1024ace32954e2aaf..c13c8ea98a523c4ef1c55a11e02d5da9db7e367e
 100644
--- a/gcc/cp/cp-tree.def
+++ b/gcc/cp/cp-tree.def
@@ -305,8 +305,8 @@ DEFTREECODE (IF_STMT, "if_stmt", tcc_statement, 4)
 
 /* Used to represent a range-based `for' statement. The operands are
RANGE_FOR_DECL, RANGE_FOR_EXPR, RANGE_FOR_BODY, RANGE_FOR_SCOPE,
-   RANGE_FOR_UNROLL, and RANGE_FOR_INIT_STMT, respectively.  Only used in
-   templates.  */
+   RANGE_FOR_UNROLL, RANGE_FOR_NOVECTOR and RANGE_FOR_INIT_STMT,
+   respectively.  Only used in templates.  */
 DEFTREECODE (RANGE_FOR_STMT, "range_for_stmt", tcc_statement, 6)
 
 /* Used to represent an expression statement.  Use `EXPR_STMT_EXPR' to
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 
8398223311194837441107cb335d497ff5f5ec1c..bece7bff1f01a23cfc94386fd3295a0be8c462fe
 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -5377,6 +5377,7 @@ get_vec_init_expr (tree t)
 #define RANGE_FOR_UNROLL(NODE) TREE_OPERAND (RANGE_FOR_STMT_CHECK (NODE), 4)
 #define RANGE_FOR_INIT_STMT(NODE) TREE_OPERAND (RANGE_FOR_STMT_CHECK (NODE), 5)
 #define RANGE_FOR_IVDEP(NODE)  TREE_LANG_FLAG_6 (RANGE_FOR_STMT_CHECK (NODE))
+#define RANGE_FOR_NOVECTOR(NODE) TREE_LANG_FLAG_5 (RANGE_FOR_STMT_CHECK (NODE))
 
 /* STMT_EXPR accessor.  */
 #define STMT_EXPR_STMT(NODE)   TREE_OPERAND (STMT_EXPR_CHECK (NODE), 0)
@@ -7286,7 +7287,7 @@ extern bool maybe_clone_body  (tree);
 
 /* In parser.cc */
 extern tree cp_convert_range_for (tree, tree, tree, tree, unsigned int, bool,
- unsigned short);
+ unsigned short, bool);
 extern void cp_convert_omp_range_for (tree &, vec *, tree &,
  tree &, tree &, tree &, tree &, tree &);
 extern void cp_finish_omp_range_for (tree, tree);
@@ -7609,16 +7610,19 @@ extern void begin_else_clause   (tree);
 extern void finish_else_clause (tree);
 extern void finish_if_stmt (tree);
 extern tree begin_while_stmt   (void);
-extern void finish_while_stmt_cond (tree, tree, bool, unsigned short);
+extern void finish_while_stmt_cond (tree, tree, bool, unsigned short,
+bool);
 extern void finish_while_stmt  (tree);
 extern tree begin_do_stmt  (void);
 extern void finish_do_body (tree);
-extern void finish_do_stmt (tree, tree, bool, unsigned short);
+extern void finish_do_stmt (tree, tree, bool, unsigned short,
+bool);
 extern tree finish_return_stmt (tree);
 extern tree begin_for_scope(tree *);
 extern tree begin_for_stmt (tree, tree);
 extern void finish_init_stmt   (tree);
-extern void finish_for_cond(tree, tree, bool, unsigned short);
+extern void finish_for_cond(tree, tree, bool, unsigned short,
+bool);
 extern void finish_for_expr(tree, tree);
 extern void finish_for_stmt(tree);
 extern tree begin_range_for_stmt 

[PATCH]AArch64 fix regexp for live_1.c sve test

2023-07-18 Thread Tamar Christina via Gcc-patches
Hi All,

The resulting predicate register of a whilelo is not
restricted to the lower half of the predicate register file.

As such these tests started failing after recent changes
because the whilelo outside the loop is getting assigned p15.

This widens the regexp.
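
For illustration (the register number is made up), the generated code now
contains lines like

        whilelo p15.b, wzr, w2

which the old \twhilelo\tp[0-7] patterns fail to match.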

Tested on aarch64-none-linux-gnu and passes again.

Ok for master?

Thanks,
Tamar

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/live_1.c: Update assembly.

--- inline copy of patch -- 
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/live_1.c 
b/gcc/testsuite/gcc.target/aarch64/sve/live_1.c
index 
80ee176d1807bf628ad47551d69ff5d84deda79e..2db6c3c209a9514646e92628f3d2dd58d466539c
 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/live_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/live_1.c
@@ -27,10 +27,10 @@
 
 TEST_ALL (EXTRACT_LAST)
 
-/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7].b, } 2 } } */
-/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7].h, } 4 } } */
-/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7].s, } 4 } } */
-/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7].d, } 4 } } */
+/* { dg-final { scan-assembler-times {\twhilelo\tp[0-9]+.b, } 2 } } */
+/* { dg-final { scan-assembler-times {\twhilelo\tp[0-9]+.h, } 4 } } */
+/* { dg-final { scan-assembler-times {\twhilelo\tp[0-9]+.s, } 4 } } */
+/* { dg-final { scan-assembler-times {\twhilelo\tp[0-9]+.d, } 4 } } */
 
 /* { dg-final { scan-assembler-times {\tlastb\tb[0-9]+, p[0-7], z[0-9]+\.b\n} 
1 } } */
 /* { dg-final { scan-assembler-times {\tlastb\th[0-9]+, p[0-7], z[0-9]+\.h\n} 
2 } } */





RE: [PATCH v2] tree-optimization/110279- Check for nested FMA chains in reassoc

2023-07-17 Thread Tamar Christina via Gcc-patches
I think Andrew is listed as the maintainer for tree-ssa, or maybe it's on one of
the Richards' lists?

> -Original Message-
> From: Gcc-patches  bounces+tamar.christina=arm@gcc.gnu.org> On Behalf Of Philipp
> Tomsich
> Sent: Tuesday, July 11, 2023 7:51 AM
> To: Jakub Jelinek 
> Cc: gcc-patches@gcc.gnu.org; Di Zhao OS
> 
> Subject: Re: [PATCH v2] tree-optimization/110279- Check for nested FMA
> chains in reassoc
> 
> Jakub,
> 
> it looks like you did a lot of work on reassoc in the past — could you have a
> quick look and comment?
> 
> Thanks,
> Philipp.
> 
> 
> On Tue, 11 Jul 2023 at 04:59, Di Zhao OS
>  wrote:
> >
> > Attached is an updated version of the patch.
> >
> > Based on Philipp's review, some changes:
> >
> > 1. Defined new enum fma_state to describe the state of FMA candidates
> >    for a list of operands. (Since the tests seem simple after the
> >change, I didn't add predicates on it.) 2. Changed return type of
> > convert_mult_to_fma_1 and convert_mult_to_fma
> >to tree, to remove the in/out parameter.
> > 3. Added description of return value values of rank_ops_for_fma.
> >
> > ---
> > gcc/ChangeLog:
> >
> > * tree-ssa-math-opts.cc (convert_mult_to_fma_1): Added new
> parameter
> > check_only_p. Changed return type to tree.
> > (struct fma_transformation_info): Moved to header.
> > (class fma_deferring_state): Moved to header.
> > (convert_mult_to_fma): Added new parameter check_only_p. Changed
> > return type to tree.
> > * tree-ssa-math-opts.h (struct fma_transformation_info): Moved from
> .cc.
> > (class fma_deferring_state): Moved from .cc.
> > (convert_mult_to_fma): Add function decl.
> > * tree-ssa-reassoc.cc (enum fma_state): Defined new enum to describe
> > the state of FMA candidates for a list of operands.
> > (rewrite_expr_tree_parallel): Changed boolean parameter to enum 
> > type.
> > (rank_ops_for_fma): Return enum fma_state.
> > (reassociate_bb): Avoid rewriting to parallel if nested FMAs are 
> > found.
> >
> > Thanks,
> > Di Zhao
> >
> >


RE: [PATCH 12/19]middle-end: implement loop peeling and IV updates for early break.

2023-07-17 Thread Tamar Christina via Gcc-patches


> -Original Message-
> From: Richard Biener 
> Sent: Friday, July 14, 2023 2:35 PM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; j...@ventanamicro.com
> Subject: RE: [PATCH 12/19]middle-end: implement loop peeling and IV
> updates for early break.
> 
> On Thu, 13 Jul 2023, Tamar Christina wrote:
> 
> > > -Original Message-
> > > From: Richard Biener 
> > > Sent: Thursday, July 13, 2023 6:31 PM
> > > To: Tamar Christina 
> > > Cc: gcc-patches@gcc.gnu.org; nd ;
> j...@ventanamicro.com
> > > Subject: Re: [PATCH 12/19]middle-end: implement loop peeling and IV
> > > updates for early break.
> > >
> > > On Wed, 28 Jun 2023, Tamar Christina wrote:
> > >
> > > > Hi All,
> > > >
> > > > This patch updates the peeling code to maintain LCSSA during peeling.
> > > > The rewrite also naturally takes into account multiple exits and so it 
> > > > didn't
> > > > make sense to split them off.
> > > >
> > > > For the purposes of peeling the only change for multiple exits is that 
> > > > the
> > > > secondary exits are all wired to the start of the new loop preheader 
> > > > when
> > > doing
> > > > epilogue peeling.
> > > >
> > > > When doing prologue peeling the CFG is kept intact.
> > > >
> > > > For both epilogue and prologue peeling we wire through between the
> two
> > > loops any
> > > > PHI nodes that escape the first loop into the second loop if flow_loops 
> > > > is
> > > > specified.  The reason for this conditionality is because
> > > > slpeel_tree_duplicate_loop_to_edge_cfg is used in the compiler in 3 
> > > > ways:
> > > >   - prologue peeling
> > > >   - epilogue peeling
> > > >   - loop distribution
> > > >
> > > > for the last case the loops should remain independent, and so not be
> > > connected.
> > > > Because of this propagation of only used phi nodes get_current_def can
> be
> > > used
> > > > to easily find the previous definitions.  However live statements that 
> > > > are
> > > > not used inside the loop itself are not propagated (since if unused, the
> > > moment
> > > > we add the guard in between the two loops the value across the bypass
> edge
> > > can
> > > > be wrong if the loop has been peeled.)
> > > >
> > > > This is dealt with easily enough in find_guard_arg.
> > > >
> > > > For multiple exits, while we are in LCSSA form, and have a correct DOM
> tree,
> > > the
> > > > moment we add the guard block we will change the dominators again.  To
> > > deal with
> > > > this slpeel_tree_duplicate_loop_to_edge_cfg can optionally return the
> blocks
> > > to
> > > > update without having to recompute the list of blocks to update again.
> > > >
> > > > When multiple exits and doing epilogue peeling we will also temporarily
> have
> > > an
> > > > incorrect VUSES chain for the secondary exits as it anticipates the 
> > > > final
> result
> > > > after the VDEFs have been moved.  This will thus be corrected once the
> code
> > > > motion is applied.
> > > >
> > > > Lastly by doing things this way we can remove the helper functions that
> > > > previously did lock step iterations to update things as it went along.
> > > >
> > > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > > >
> > > > Ok for master?
> > >
> > > Not sure if I get through all of this in one go - so be prepared that
> > > the rest of the review follows another day.
> >
> > No worries, I appreciate the reviews!
> > Just giving some quick replies for when you continue.
> 
> Continueing.
> 
> > >
> > > > Thanks,
> > > > Tamar
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > * tree-loop-distribution.cc (copy_loop_before): Pass flow_loops 
> > > > =
> > > false.
> > > > * tree-ssa-loop-niter.cc (loop_only_exit_p):  Fix bug when 
> > > > exit==null.
> > > > * tree-vect-loop-manip.cc (adjust_phi_and_debug_stmts): Add
> > > additional
> > > > assert.
> > > > (vect_set_loop_condition_normal): Skip modifying loop IV for 
> > > > multiple
> > > > exits.
> > > > (slpeel_tree_duplicate_loop_to_edge_cfg): Support multiple exit
> > > peeling.
> > > > (slpeel_can_duplicate_loop_p): Likewise.
> > > > (vect_update_ivs_after_vectorizer): Don't enter this...
> > > > (vect_update_ivs_after_early_break): ...but instead enter here.
> > > > (find_guard_arg): Update for new peeling code.
> > > > (slpeel_update_phi_nodes_for_loops): Remove.
> > > > (slpeel_update_phi_nodes_for_guard2): Remove hardcoded edge 0
> > > checks.
> > > > (slpeel_update_phi_nodes_for_lcssa): Remove.
> > > > (vect_do_peeling): Fix VF for multiple exits and force epilogue.
> > > > * tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Initialize
> > > > non_break_control_flow and early_breaks.
> > > > (vect_need_peeling_or_partial_vectors_p): Force partial vector 
> > > > if
> > > > multiple exits and VLA.
> > > > (vect_analyze_loop_form): Support inner loop multiple exits.

RE: [PATCH] Fix bootstrap failure (with g++ 4.8.5) in tree-if-conv.cc.

2023-07-17 Thread Tamar Christina via Gcc-patches
> On Mon, Jul 17, 2023 at 12:21 AM Tamar Christina via Gcc-patches  patc...@gcc.gnu.org> wrote:
> >
> > > -Original Message-
> > > From: Richard Biener 
> > > Sent: Monday, July 17, 2023 7:19 AM
> > > To: Roger Sayle 
> > > Cc: gcc-patches@gcc.gnu.org; Tamar Christina
> > > 
> > > Subject: Re: [PATCH] Fix bootstrap failure (with g++ 4.8.5) in tree-if-
> conv.cc.
> > >
> > > On Fri, Jul 14, 2023 at 8:56 PM Roger Sayle
> > > 
> > > wrote:
> > > >
> > > >
> > > >
> > > > This patch fixes the bootstrap failure I'm seeing using gcc 4.8.5
> > > > as
> > > >
> > > > the host compiler.  Ok for mainline?  [I might be missing
> > > > something]
> > >
> > > OK.   Btw, while I didn't spot this during review I would appreciate
> > > if the code could use vec.[q]sort, this should work with a lambda as
> > > well I think.
> >
> > That was my first use, but that hits
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99469
> 
> That is not hitting PR 99469 but rather it means your comparison is not
> correct for an (unstable) sort.
> That is, a qsort comparator should have the relationship `f(a,b) == !f(b,a)`,
> and `f(a,a)` should also return false.

I'm using the standard std::pair comparator, which satisfies those requirements
(f(a,a) is false):
https://en.cppreference.com/w/cpp/utility/pair/operator_cmp

> If you are running into this for qsort here, you will most likely run into 
> issues
> with std::sort later on too.

I don't see why or how.  It needs to have a consistent relationship, which
std::pair maintains.  So why would using the standard tuple comparator with a
standard std::sort cause a problem?
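
For reference, a minimal sketch of the kind of use under discussion (function
and variable names are made up); std::pair's operator< is a lexicographic
strict weak ordering, so it can feed std::sort directly:

#include <algorithm>
#include <utility>
#include <vector>

void
sort_keys (std::vector<std::pair<unsigned, unsigned>> &keys)
{
  /* Illustrative only: the built-in pair operator< is irreflexive
     (cmp (a, a) is false), as std::sort requires.  */
  std::sort (keys.begin (), keys.end (),
	     [] (const std::pair<unsigned, unsigned> &a,
		 const std::pair<unsigned, unsigned> &b)
	     { return a < b; });
}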

Thanks,
Tamar

> 
> Thanks,
> Andrew
> 
> >
> > Regards,
> > Tamar


RE: [PATCH] Fix bootstrap failure (with g++ 4.8.5) in tree-if-conv.cc.

2023-07-17 Thread Tamar Christina via Gcc-patches
> -Original Message-
> From: Richard Biener 
> Sent: Monday, July 17, 2023 7:19 AM
> To: Roger Sayle 
> Cc: gcc-patches@gcc.gnu.org; Tamar Christina 
> Subject: Re: [PATCH] Fix bootstrap failure (with g++ 4.8.5) in 
> tree-if-conv.cc.
> 
> On Fri, Jul 14, 2023 at 8:56 PM Roger Sayle 
> wrote:
> >
> >
> >
> > This patch fixes the bootstrap failure I'm seeing using gcc 4.8.5 as
> >
> > the host compiler.  Ok for mainline?  [I might be missing something]
> 
> OK.   Btw, while I didn't spot this during review I would appreciate
> if the code could use vec.[q]sort, this should work with a lambda as well I
> think.

That was my first use, but that hits 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99469

Regards,
Tamar


RE: [PATCH 12/19]middle-end: implement loop peeling and IV updates for early break.

2023-07-13 Thread Tamar Christina via Gcc-patches
> -Original Message-
> From: Richard Biener 
> Sent: Thursday, July 13, 2023 6:31 PM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; j...@ventanamicro.com
> Subject: Re: [PATCH 12/19]middle-end: implement loop peeling and IV
> updates for early break.
> 
> On Wed, 28 Jun 2023, Tamar Christina wrote:
> 
> > Hi All,
> >
> > This patch updates the peeling code to maintain LCSSA during peeling.
> > The rewrite also naturally takes into account multiple exits and so it 
> > didn't
> > make sense to split them off.
> >
> > For the purposes of peeling the only change for multiple exits is that the
> > secondary exits are all wired to the start of the new loop preheader when
> doing
> > epilogue peeling.
> >
> > When doing prologue peeling the CFG is kept intact.
> >
> > For both epilogue and prologue peeling we wire through between the two
> loops any
> > PHI nodes that escape the first loop into the second loop if flow_loops is
> > specified.  The reason for this conditionality is because
> > slpeel_tree_duplicate_loop_to_edge_cfg is used in the compiler in 3 ways:
> >   - prologue peeling
> >   - epilogue peeling
> >   - loop distribution
> >
> > for the last case the loops should remain independent, and so not be
> connected.
> > Because of this propagation of only used phi nodes get_current_def can be
> used
> > to easily find the previous definitions.  However live statements that are
> > not used inside the loop itself are not propagated (since if unused, the
> moment
> > we add the guard in between the two loops the value across the bypass edge
> can
> > be wrong if the loop has been peeled.)
> >
> > This is dealt with easily enough in find_guard_arg.
> >
> > For multiple exits, while we are in LCSSA form, and have a correct DOM tree,
> the
> > moment we add the guard block we will change the dominators again.  To
> deal with
> > this slpeel_tree_duplicate_loop_to_edge_cfg can optionally return the blocks
> to
> > update without having to recompute the list of blocks to update again.
> >
> > When multiple exits and doing epilogue peeling we will also temporarily have
> an
> > incorrect VUSES chain for the secondary exits as it anticipates the final 
> > result
> > after the VDEFs have been moved.  This will thus be corrected once the code
> > motion is applied.
> >
> > Lastly by doing things this way we can remove the helper functions that
> > previously did lock step iterations to update things as it went along.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> 
> Not sure if I get through all of this in one go - so be prepared that
> the rest of the review follows another day.

No worries, I appreciate the reviews!
Just giving some quick replies for when you continue.

> 
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > * tree-loop-distribution.cc (copy_loop_before): Pass flow_loops =
> false.
> > * tree-ssa-loop-niter.cc (loop_only_exit_p):  Fix bug when exit==null.
> > * tree-vect-loop-manip.cc (adjust_phi_and_debug_stmts): Add
> additional
> > assert.
> > (vect_set_loop_condition_normal): Skip modifying loop IV for multiple
> > exits.
> > (slpeel_tree_duplicate_loop_to_edge_cfg): Support multiple exit
> peeling.
> > (slpeel_can_duplicate_loop_p): Likewise.
> > (vect_update_ivs_after_vectorizer): Don't enter this...
> > (vect_update_ivs_after_early_break): ...but instead enter here.
> > (find_guard_arg): Update for new peeling code.
> > (slpeel_update_phi_nodes_for_loops): Remove.
> > (slpeel_update_phi_nodes_for_guard2): Remove hardcoded edge 0
> checks.
> > (slpeel_update_phi_nodes_for_lcssa): Remove.
> > (vect_do_peeling): Fix VF for multiple exits and force epilogue.
> > * tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Initialize
> > non_break_control_flow and early_breaks.
> > (vect_need_peeling_or_partial_vectors_p): Force partial vector if
> > multiple exits and VLA.
> > (vect_analyze_loop_form): Support inner loop multiple exits.
> > (vect_create_loop_vinfo): Set LOOP_VINFO_EARLY_BREAKS.
> > (vect_create_epilog_for_reduction):  Update live phi nodes.
> > (vectorizable_live_operation): Ignore live operations in vector loop
> > when multiple exits.
> > (vect_transform_loop): Force unrolling for VF loops and multiple exits.
> > * tree-vect-stmts.cc (vect_stmt_relevant_p): Analyze ctrl statements.
> > (vect_mark_stmts_to_be_vectorized): Check for non-exit control flow
> and
> > analyze gcond params.
> > (vect_analyze_stmt): Support gcond.
> > * tree-vectorizer.cc (pass_vectorize::execute): Support multiple exits
> > in RPO pass.
> > * tree-vectorizer.h (enum vect_def_type): Add vect_early_exit_def.
> > (LOOP_VINFO_EARLY_BREAKS, LOOP_VINFO_GENERAL_CTR_FLOW):
> New.
> > (loop_vec_info_for_loop): Change to const and static.
> > (is_loop_header_bb_p): Drop assert.
> > 

RE: [PATCH 8/19]middle-end: updated niters analysis to handle multiple exits.

2023-07-13 Thread Tamar Christina via Gcc-patches
> -Original Message-
> From: Richard Biener 
> Sent: Thursday, July 13, 2023 12:49 PM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; j...@ventanamicro.com
> Subject: Re: [PATCH 8/19]middle-end: updated niters analysis to handle
> multiple exits.
> 
> On Wed, 28 Jun 2023, Tamar Christina wrote:
> 
> > Hi All,
> >
> > For early break vectorization we have to update niters analysis to
> > record and analyze all exits of the loop, and so all conds.
> >
> > The niters of the loop is still determined by the main/natural exit of
> > the loop as this is the O(n) bounds.  For now we don't do much with
> > the secondary conds, but their assumptions can be used to generate
> versioning checks later.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> 
> I probably confused vec_init_exit_info in the previous patch - that said, I'm
> missing a clear function that determines the natural exit of the original (if-
> converted) scalar loop.  As vec_init_exit_info seems to (re-)compute that I'll
> comment on it here.

Ah, was wondering if you'd seen it.

> 
> +  /* The main IV is to be determined by the block that's the first
> reachable
> + block from the latch.  We cannot rely on the order the loop analysis
> + returns and we don't have any SCEV analysis on the loop.  */
> + auto_vec<edge> workset;  workset.safe_push (loop_latch_edge (loop));
> + hash_set<edge> visited;
> +
> +  while (!workset.is_empty ())
> +{
> +  edge e = workset.pop ();
> +  if (visited.contains (e))
> +   continue;
> +
> +  bool found_p = false;
> +  for (edge ex : e->src->succs)
> +   {
> + if (exits.contains (ex))
> +   {
> + found_p = true;
> + e = ex;
> + break;
> +   }
> +   }
> +
> +  if (found_p)
> +   {
> + loop->vec_loop_iv = e;
> + for (edge ex : exits)
> +   if (e != ex)
> + loop->vec_loop_alt_exits.safe_push (ex);
> + return;
> +   }
> +  else
> +   {
> + for (edge ex : e->src->preds)
> +   workset.safe_insert (0, ex);
> +   }
> +  visited.add (e);
> +}
> 
> So this greedily follows edges from the latch and takes the first exit.  Why's
> that better than simply choosing the first?
> 
> I'd have done
> 
>  auto_vec<edge> exits = get_loop_exit_edges (loop);  for (e : exits)
>{
>  if (vect_get_loop_niters (...))
>{
>  if no assumptions use that edge, if assumptions continue
>  searching, maybe ther's an edge w/o assumptions
>}
>}
>  use (first) exit with assumptions
> 
> we probably want to know 'may_be_zero' as well and prefer an edge without
> that.  So eventually call number_of_iterations_exit_assumptions
> directly and look for the best niter_desc and pass that to 
> vect_get_loop_niters
> (or re-do the work).
> 
> As said for "copying" the exit to the loop copies use the block mapping.
> 

The issue is with the scalar loops, where we have no SCEV data and also no
SSA mapping data (from what I can tell, the map was cleared in ifcvt itself).

So for this to work with SCEV, we'd have to start analyzing the loop coming out 
of
LOOP_VINFO_SCALAR_LOOP as well unless I'm missing something?

Regards,
Tamar


RE: [PATCH 7/19]middle-end: Refactor vectorizer loop conditionals and separate out IV to new variables

2023-07-13 Thread Tamar Christina via Gcc-patches
> e7ac2b5f3db55de3dbbab7bd2bfe08388f4ec533..cab82d7960e5be517bba2
> 621f7f4
> > 888e7bf3c295 100644
> > --- a/gcc/cfgloop.h
> > +++ b/gcc/cfgloop.h
> > @@ -272,6 +272,14 @@ public:
> >   the basic-block from being collected but its index can still be
> >   reused.  */
> >basic_block former_header;
> > +
> > +  /* The controlling loop IV for the current loop when vectorizing.  This IV
> > + controls the natural exits of the loop.  */
> > +  edge GTY ((skip (""))) vec_loop_iv;
> > +
> > +  /* If the loop has multiple exits this structure contains the alternate
> > + exits of the loop which are relevant for vectorization.  */
> > +  vec<edge> GTY ((skip (""))) vec_loop_alt_exits;
> 
> That's a quite heavy representation and as you say it's vectorizer specific.  
> May
> I ask you to eliminate at _least_ vec_loop_alt_exits?
> Are there not all exits in that vector?  Note there's already the list of 
> exits and if
> you have the canonical counting IV exit you can match against that to get all
> the others?
> 

Sure, though that means some filtering whenever one iterates over the alt exits,
not a problem though.

> >  /* Given LOOP this function generates a new copy of it and puts it
> > on E which is either the entry or exit of LOOP.  If SCALAR_LOOP is
> > @@ -1458,13 +1523,15 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class
> loop *loop,
> >edge exit, new_exit;
> >bool duplicate_outer_loop = false;
> >
> > -  exit = single_exit (loop);
> > +  exit = loop->vec_loop_iv;
> >at_exit = (e == exit);
> >if (!at_exit && e != loop_preheader_edge (loop))
> >  return NULL;
> >
> >if (scalar_loop == NULL)
> >  scalar_loop = loop;
> > +  else
> > +vec_init_exit_info (scalar_loop);
> >
> >bbs = XNEWVEC (basic_block, scalar_loop->num_nodes + 1);
> >pbbs = bbs + 1;
> > @@ -1490,13 +1557,17 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class
> loop *loop,
> >bbs[0] = preheader;
> >new_bbs = XNEWVEC (basic_block, scalar_loop->num_nodes + 1);
> >
> > -  exit = single_exit (scalar_loop);
> > +  exit = scalar_loop->vec_loop_iv;
> >copy_bbs (bbs, scalar_loop->num_nodes + 1, new_bbs,
> > , 1, _exit, NULL,
> > at_exit ? loop->latch : e->src, true);
> > -  exit = single_exit (loop);
> > +  exit = loop->vec_loop_iv;
> >basic_block new_preheader = new_bbs[0];
> >
> > +  /* Record the new loop exit information.  new_loop doesn't have SCEV
> data and
> > + so we must initialize the exit information.  */
> > +  vec_init_exit_info (new_loop);
> > +
> 
> You have a mapping of old to new BB so you should be able to
> map old to new exit by mapping e->src/dest and looking up the new edge?
> 
> The vec_loop_iv exit is mapped directly (new_exit).
> 
> So I don't really understand what's missing there.

But I don't have the mapping when the loop was versioned, e.g. by ifcvt.  So in
the cases where scalar_loop != loop I still need them to match up.

vect_loop_form_info is destroyed after analysis though and is not available 
during
peeling. That's why we copy relevant information out in vect_create_loop_vinfo.

But in general we only have 1 per loop as well, so it would be the same as 
using loop_vinfo.

I could move it into loop_vinfo and then require you to pass the edges to the
peeling function as you mentioned.  This would solve the location we place them
in, but I'm still not sure what to do about versioned loops.  We would need to
get their main edge "somewhere"; would another field in loop_vinfo be ok?

Cheers,
Tamar

> > +  if (!loop->vec_loop_iv)
> > +return opt_result::failure_at (vect_location,
> > +  "not vectorized:"
> > +  " could not determine main exit from"
> > +  " loop with multiple exits.\n");
> > +
> >/* Different restrictions apply when we are considering an inner-most 
> > loop,
> >   vs. an outer (nested) loop.
> >   (FORNOW. May want to relax some of these restrictions in the future). 
> >  */
> > @@ -3025,9 +3032,8 @@ start_over:
> >if (dump_enabled_p ())
> >  dump_printf_loc (MSG_NOTE, vect_location, "epilog loop 
> > required\n");
> >if (!vect_can_advance_ivs_p (loop_vinfo)
> > - || !slpeel_can_duplicate_loop_p (LOOP_VINFO_LOOP (loop_vinfo),
> > -  single_exit (LOOP_VINFO_LOOP
> > -(loop_vinfo
> > + || !slpeel_can_duplicate_loop_p (loop_vinfo,
> > +  LOOP_VINFO_IV_EXIT (loop_vinfo)))
> >  {
> >   ok = opt_result::failure_at (vect_location,
> >"not vectorized: can't create required "
> > @@ -5964,7 +5970,7 @@ vect_create_epilog_for_reduction (loop_vec_info
> loop_vinfo,
> >   Store them in NEW_PHIS.  */
> >if (double_reduc)
> >  loop = outer_loop;
> > -  exit_bb = single_exit (loop)->dest;
> > +  

RE: [PATCH 5/19]middle-end: Enable bit-field vectorization to work correctly when we're vectoring inside conds

2023-07-10 Thread Tamar Christina via Gcc-patches
> > -  *type_out = STMT_VINFO_VECTYPE (stmt_info);
> > +  if (cond_cst)
> > +{
> > +  append_pattern_def_seq (vinfo, stmt_info, pattern_stmt, vectype);
> > +  pattern_stmt
> > +   = gimple_build_cond (gimple_cond_code (cond_stmt),
> > +gimple_get_lhs (pattern_stmt),
> > +fold_convert (ret_type, cond_cst),
> > +gimple_cond_true_label (cond_stmt),
> > +gimple_cond_false_label (cond_stmt));
> > +  *type_out = STMT_VINFO_VECTYPE (stmt_info);
> 
> is there any vectype set for a gcond?

No, because gconds can't be codegen'd yet; atm we must replace the original
gcond when generating code.

However, looking at the diff of this code, I don't think the else is needed
here.  Testing an updated patch.

> 
> I must say the flow of the function is a bit convoluted now.  Is it possible 
> to
> factor out a helper so we can fully separate the gassign vs. gcond handling in
> this function?

I am not sure; the only places that change are at the start (e.g. how we
determine bf_stmt), how we determine ret_type, and where we determine
shift_first for the single use case.

Now I can't move the ret_type anywhere as I need to decompose bf_stmt first.
And the shift_first can be simplified by moving it up into the part that
determined bf_stmt, but then we walk the immediate uses even in cases where we
early exit, which seems inefficient.

Then there's the final clause, which just generates an additional gcond if the
original statement was a gcond.  But I'm not sure splitting that out would
help, since it's just something done *in addition* to the normal assign.

So there doesn't seem to be a big enough divergence to justify a split.  I
have however made an attempt at cleaning it up a bit; is this one better?

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* tree-vect-patterns.cc (vect_init_pattern_stmt): Copy STMT_VINFO_TYPE
from original statement.
(vect_recog_bitfield_ref_pattern): Support bitfields in gcond.

Co-Authored-By:  Andre Vieira 

--- inline copy of patch ---

diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 
60bc9be6819af9bd28a81430869417965ba9d82d..b842f7d983405cd04f6760be7d91c1f55b30aac4
 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -128,6 +128,7 @@ vect_init_pattern_stmt (vec_info *vinfo, gimple 
*pattern_stmt,
   STMT_VINFO_RELATED_STMT (pattern_stmt_info) = orig_stmt_info;
   STMT_VINFO_DEF_TYPE (pattern_stmt_info)
 = STMT_VINFO_DEF_TYPE (orig_stmt_info);
+  STMT_VINFO_TYPE (pattern_stmt_info) = STMT_VINFO_TYPE (orig_stmt_info);
   if (!STMT_VINFO_VECTYPE (pattern_stmt_info))
 {
   gcc_assert (!vectype
@@ -2441,6 +2442,10 @@ vect_recog_widen_sum_pattern (vec_info *vinfo,
bf_value = BIT_FIELD_REF (container, bitsize, bitpos);
result = (type_out) bf_value;
 
+   or
+
+   if (BIT_FIELD_REF (container, bitsize, bitpos) `cmp` <constant>)
+
where type_out is a non-bitfield type, that is to say, it's precision 
matches
2^(TYPE_SIZE(type_out) - (TYPE_UNSIGNED (type_out) ? 1 : 2)).
 
@@ -2450,6 +2455,10 @@ vect_recog_widen_sum_pattern (vec_info *vinfo,
here it starts with:
result = (type_out) bf_value;
 
+   or
+
+   if (BIT_FIELD_REF (container, bitsize, bitpos) `cmp` <constant>)
+
Output:
 
* TYPE_OUT: The vector type of the output of this pattern.
@@ -2482,33 +2491,45 @@ vect_recog_widen_sum_pattern (vec_info *vinfo,
 
The shifting is always optional depending on whether bitpos != 0.
 
+   When the original bitfield was inside a gcond then a new gcond is also
+   generated with the new `result` as the operand to the comparison.
+
 */
 
 static gimple *
 vect_recog_bitfield_ref_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
 tree *type_out)
 {
-  gassign *first_stmt = dyn_cast <gassign *> (stmt_info->stmt);
-
-  if (!first_stmt)
-return NULL;
-
-  gassign *bf_stmt;
-  if (CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (first_stmt))
-  && TREE_CODE (gimple_assign_rhs1 (first_stmt)) == SSA_NAME)
+  gimple *bf_stmt = NULL;
+  tree lhs = NULL_TREE;
+  tree ret_type = NULL_TREE;
+  gimple *stmt = STMT_VINFO_STMT (stmt_info);
+  if (gcond *cond_stmt = dyn_cast <gcond *> (stmt))
+{
+  tree op = gimple_cond_lhs (cond_stmt);
+  if (TREE_CODE (op) != SSA_NAME)
+   return NULL;
+  bf_stmt = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (op));
+  if (TREE_CODE (gimple_cond_rhs (cond_stmt)) != INTEGER_CST)
+   return NULL;
+}
+  else if (is_gimple_assign (stmt)
+  && CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (stmt))
+  && TREE_CODE (gimple_assign_rhs1 (stmt)) == SSA_NAME)
 {
-  gimple *second_stmt
-   = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (first_stmt));
+  gimple *second_stmt = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (stmt));
   bf_stmt = dyn_cast <gassign *> (second_stmt);
-  if (!bf_stmt
-  

RE: [PATCH 4/19]middle-end: Fix scale_loop_frequencies segfault on multiple-exits

2023-07-07 Thread Tamar Christina via Gcc-patches
Hi both,

Thanks for all the reviews/patches so far!

> >
> > Looks good, but I wonder what we can do to at least make the multiple
> > exit case behave reasonably?  The vectorizer keeps track
> 
> > of a "canonical" exit, would it be possible to pass in the main exit
> > edge and use that instead of single_exit (), would other exits then
> > behave somewhat reasonable or would we totally screw things up here?
> > That is, the "canonical" exit would be the counting exit while the
> > other exits are on data driven conditions and thus wouldn't change
> > probability when we reduce the number of iterations(?)
> 
> I can add a canonical_exit parameter and make the function direct flow to it
> if possible.  However overall I think the fixup depends on what
> transformation led to the change.
> 
> Assuming that the vectorizer did no prologues and epilogues and we vectorized
> with factor N, then I think the update could be done more specifically as
> follows.
> 

If it helps: this patch series addresses multiple exits by forcing a scalar
epilogue, so all non-canonical exits will have been redirected to that scalar
epilogue, and the remaining scalar iteration count will be at most VF.

Regards,
Tamar

> We know that header block count dropped by 4. So we can start from that
> and each time we reach basic block with exit edge, we know the original count
> of the edge.  This count is unchanged, so one can rescale probabilities out of
> that BB accordingly.  If loop has no inner loops, we can just walk the body in
> RPO and propagate scales downwards and we should arrive at the right result.
> 
> I originally added the bound parameter to handle prologues/epilogues which
> gets new artificial bound.  In prologue I think you are right that the flow 
> will be
> probably directed to the conditional counting iterations.
> 
> In epilogue we add no artificial iteration cap, so maybe it is more realistic 
> to
> simply scale up probability of all exits?
> 
> To see what is going on I tried following testcase:
> 
> int a[99];
> test()
> {
>   for (int i = 0; i < 99; i++)
>   a[i]++;
> }
> 
> What surprises me is that vectorizer at -O2 does nothing and we end up
> unrolling the loop:
> 
> L2:
> addl$1, (%rax)
> addl$1, 4(%rax)
> addl$1, 8(%rax)
> addq$12, %rax
> cmpq$a+396, %rax
> 
> Which seems a silly thing to do.  A vectorized loop with an epilogue doing 2
> and then 1 addition would be better.
> 
> With -O3 we vectorize it:
> 
> 
> .L2:
> movdqa  (%rax), %xmm0
> addq$16, %rax
> paddd   %xmm1, %xmm0
> movaps  %xmm0, -16(%rax)
> cmpq%rax, %rdx
> jne .L2
> movqa+384(%rip), %xmm0
> addl$1, a+392(%rip)
> movq.LC1(%rip), %xmm1
> paddd   %xmm1, %xmm0
> movq%xmm0, a+384(%rip)
> 
> 
> and correctly drop vectorized loop body to 24 iterations. However the
> epilogue has loop for vector size 2 predicted to iterate once (it won't)
> 
> ;;   basic block 7, loop depth 0, count 10737416 (estimated locally), maybe
> hot
> ;;prev block 5, next block 8, flags: (NEW, VISITED)
> ;;pred:   3 [4.0% (adjusted)]  count:10737416 (estimated locally)
> (FALSE_VALUE,EXECUTABLE)
> ;;succ:   8 [always]  count:10737416 (estimated locally)
> (FALLTHRU,EXECUTABLE)
> 
> ;;   basic block 8, loop depth 1, count 21474835 (estimated locally), maybe
> hot
> ;;prev block 7, next block 9, flags: (NEW, REACHABLE, VISITED)
> ;;pred:   9 [always]  count:10737417 (estimated locally)
> (FALLTHRU,DFS_BACK,EXECUTABLE)
> ;;7 [always]  count:10737416 (estimated locally)
> (FALLTHRU,EXECUTABLE)
>   # i_9 = PHI 
>   # ivtmp_13 = PHI 
>   # vectp_a.14_40 = PHI  [(void *) +
> 384B](7)>
>   # vectp_a.18_46 = PHI  [(void *) +
> 384B](7)>
>   # ivtmp_49 = PHI 
>   vect__14.16_42 = MEM  [(int *)vectp_a.14_40];
>   _14 = a[i_9];
>   vect__15.17_44 = vect__14.16_42 + { 1, 1 };
>   _15 = _14 + 1;
>   MEM  [(int *)vectp_a.18_46] = vect__15.17_44;
>   i_17 = i_9 + 1;
>   ivtmp_18 = ivtmp_13 - 1;
>   vectp_a.14_41 = vectp_a.14_40 + 8;
>   vectp_a.18_47 = vectp_a.18_46 + 8;
>   ivtmp_50 = ivtmp_49 + 1;
>   if (ivtmp_50 < 1)
> goto ; [50.00%]
>   else
> goto ; [50.00%]
> 
> and finally the scalar copy
> 
> ;;   basic block 12, loop depth 0, count 10737416 (estimated locally), maybe
> hot
> ;;prev block 9, next block 13, flags: (NEW, VISITED)
> ;;pred:   8 [50.0% (adjusted)]  count:10737418 (estimated locally)
> (FALSE_VALUE,EXECUTABLE)
> ;;succ:   13 [always]  count:10737416 (estimated locally) (FALLTHRU)
> 
> ;;   basic block 13, loop depth 1, count 1063004409 (estimated locally),
> maybe hot
> ;;prev block 12, next block 14, flags: (NEW, REACHABLE, VISITED)
> ;;pred:   14 [always]  count:1052266996 (estimated locally)
> (FALLTHRU,DFS_BACK,EXECUTABLE)
> ;;12 [always]  count:10737416 (estimated locally) (FALLTHRU)
>   # i_30 = 

[PATCH 2/2]middle-end ifcvt: Sort PHI arguments not only occurrences but also complexity [PR109154]

2023-07-07 Thread Tamar Christina via Gcc-patches
Hi All,

This patch builds on the previous patch by fixing another issue with the
way ifcvt currently picks which branches to test.

The issue with the current implementation is that while it sorts on
occurrences of the argument, it doesn't check the complexity of the arguments.

As an example:

   [local count: 528603100]:
  ...
  if (distbb_75 >= 0.0)
goto ; [59.00%]
  else
goto ; [41.00%]

   [local count: 216727269]:
  ...
  goto ; [100.00%]

   [local count: 311875831]:
  ...
  if (distbb_75 < iftmp.0_98)
goto ; [20.00%]
  else
goto ; [80.00%]

   [local count: 62375167]:
  ...

   [local count: 528603100]:
  # prephitmp_175 = PHI <_173(18), 0.0(17), _174(16)>

All three arguments to the PHI have the same number of occurrences, namely 1;
however, it makes a big difference which comparison we test first.

Sorting only on occurrences we'll pick the compares coming from BB 18 and BB 17.
This means we end up generating 4 comparisons, while 2 would have been enough.

By keeping track of the "complexity" of the COND in each BB (i.e. the number
of comparisons needed to traverse from the start [BB 15] to the end [BB 19]) and
using a key tuple of <occurrences, complexity> we end up selecting the compares
from BB 16 and BB 18 first.  BB 16 only requires 1 compare, and BB 18, after we
test BB 16, also only requires one additional compare.  This change, paired with
the previous one above, results in the optimal 2 compares.
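
A minimal sketch of that sort-key idea (the struct and all names are made up
for illustration; the patch's actual types and ordering details differ):

#include <algorithm>
#include <utility>
#include <vector>

struct phi_arg_cand
{
  unsigned occurrences;  /* Times the argument occurs in the PHI.  */
  unsigned complexity;   /* Compares needed to reach its block.  */
};

static void
rank_candidates (std::vector<phi_arg_cand> &cands)
{
  /* Rank by the <occurrences, complexity> key, mirroring the patch's
     increasing-occurrences scheme with complexity as the tie-break.  */
  std::sort (cands.begin (), cands.end (),
	     [] (const phi_arg_cand &a, const phi_arg_cand &b)
	     {
	       return std::make_pair (a.occurrences, a.complexity)
		      < std::make_pair (b.occurrences, b.complexity);
	     });
}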

For deep nesting, i.e. for

...
  _79 = vr_15 > 20;
  _80 = _68 & _79;
  _82 = vr_15 <= 20;
  _83 = _68 & _82;
  _84 = vr_15 < -20;
  _85 = _73 & _84;
  _87 = vr_15 >= -20;
  _88 = _73 & _87;
  _ifc__111 = _55 ? 10 : 12;
  _ifc__112 = _70 ? 7 : _ifc__111;
  _ifc__113 = _85 ? 8 : _ifc__112;
  _ifc__114 = _88 ? 9 : _ifc__113;
  _ifc__115 = _45 ? 1 : _ifc__114;
  _ifc__116 = _63 ? 3 : _ifc__115;
  _ifc__117 = _65 ? 4 : _ifc__116;
  _ifc__118 = _83 ? 6 : _ifc__117;
  _ifc__119 = _60 ? 2 : _ifc__118;
  _ifc__120 = _43 ? 13 : _ifc__119;
  _ifc__121 = _75 ? 11 : _ifc__120;
  vw_1 = _80 ? 5 : _ifc__121;

Most of the comparisons are still needed because the chains of
occurrences do not negate each other, i.e. _88 is _73 & vr_15 >= -20 and
_85 is _73 & vr_15 < -20.  Clearly, given that _73 needs to be true in both
branches, the only additional test needed is on vr_15, where the one test is
the negation of the other.  So we don't need to do the comparison of _73 twice.

The changes in the patch reduce the overall number of compares by one, but have
a bigger effect on the dependency chain.

Previously we would generate 5 instructions chain:

cmple   p7.s, p4/z, z29.s, z30.s
cmpne   p7.s, p7/z, z29.s, #0
cmple   p6.s, p7/z, z31.s, z30.s
cmpge   p6.s, p6/z, z27.s, z25.s
cmplt   p15.s, p6/z, z28.s, z21.s

as the longest chain.  With this patch we generate 3:

cmple   p7.s, p3/z, z27.s, z30.s
cmpne   p7.s, p7/z, z27.s, #0
cmpgt   p7.s, p7/z, z31.s, z30.s

and I don't think (x <= y) && (x != 0) && (z > y) can be reduced further.

Bootstrapped and Regtested on aarch64-none-linux-gnu and no issues.

Not sure how to write a non-fragile testcase for this, as the
conditionals chosen depend on threading etc.  Any suggestions?

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

PR tree-optimization/109154
* tree-if-conv.cc (INCLUDE_ALGORITHM): Include.
(struct bb_predicate): Add no_predicate_stmts.
(set_bb_predicate): Increase predicate count.
(set_bb_predicate_gimplified_stmts): Conditionally initialize
no_predicate_stmts.
(get_bb_num_predicate_stmts): New.
(init_bb_predicate): Initialize no_predicate_stmts.
(release_bb_predicate): Cleanup no_predicate_stmts.
(insert_gimplified_predicates): Preserve no_predicate_stmts.

--- inline copy of patch -- 
diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
index 
16b36dd8b0226f796c1a3fc6d45a9059385e812b..0ed50d99c46f99a4d1ea0e827ee2b2a3f494b2da
 100644
--- a/gcc/tree-if-conv.cc
+++ b/gcc/tree-if-conv.cc
@@ -80,6 +80,7 @@ along with GCC; see the file COPYING3.  If not see
  :;
 */
 
+#define INCLUDE_ALGORITHM
 #include "config.h"
 #include "system.h"
 #include "coretypes.h"
@@ -231,6 +232,10 @@ struct bb_predicate {
  recorded here, in order to avoid the duplication of computations
  that occur in previous conditions.  See PR44483.  */
   gimple_seq predicate_gimplified_stmts;
+
+  /* Records the number of statements recorded into
+ PREDICATE_GIMPLIFIED_STMTS.   */
+  unsigned no_predicate_stmts;
 };
 
 /* Returns true when the basic block BB has a predicate.  */
@@ -254,10 +259,16 @@ bb_predicate (basic_block bb)
 static inline void
 set_bb_predicate (basic_block bb, tree cond)
 {
+  auto aux = (struct bb_predicate *) bb->aux;
   gcc_assert ((TREE_CODE (cond) == TRUTH_NOT_EXPR
   && is_gimple_val (TREE_OPERAND (cond, 0)))
  || is_gimple_val (cond));
-  ((struct bb_predicate *) bb->aux)->predicate = cond;
+  aux->predicate = cond;

[PATCH 1/2]middle-end ifcvt: Reduce comparisons on conditionals by tracking truths [PR109154]

2023-07-07 Thread Tamar Christina via Gcc-patches
Hi All,

Following on from Jakub's patch in g:de0ee9d14165eebb3d31c84e98260c05c3b33acb
these two patches finishes the work fixing the regression and improves codegen.

As explained in that commit, ifconvert sorts PHI args in increasing number of
occurrences in order to reduce the number of comparisons done while
traversing the tree.

The remaining task that this patch fixes is dealing with the long chain of
comparisons that can be created from phi nodes, particularly when they share
any common successor (classical example is a diamond node).

On a PHI node the true and else branches carry a condition: true will
carry `a` and false `~a`.  The issue is that at the moment GCC tests both `a`
and `~a` when the phi node has more than 2 arguments.  Clearly this isn't
needed.  The deeper the nesting of phi nodes, the larger the repetition.

As an example, for

foo (int *f, int d, int e)
{
  for (int i = 0; i < 1024; i++)
{
  int a = f[i];
  int t;
  if (a < 0)
t = 1;
  else if (a < e)
t = 1 - a * d;
  else
t = 0;
  f[i] = t;
}
}

after Jakub's patch we generate:

  _7 = a_10 < 0;
  _21 = a_10 >= 0;
  _22 = a_10 < e_11(D);
  _23 = _21 & _22;
  _ifc__42 = _23 ? t_13 : 0;
  t_6 = _7 ? 1 : _ifc__42

but while better than before it is still inefficient, since in the false
branch, where we know ~_7 is true, we still test _21.

This leads to superfluous tests for every diamond node.  After this patch we
generate

 _7 = a_10 < 0;
 _22 = a_10 < e_11(D);
 _ifc__42 = _22 ? t_13 : 0;
 t_6 = _7 ? 1 : _ifc__42;

Which correctly elides the test of _21.  This is done by borrowing the
vectorizer's helper functions to limit predicate mask usages.  Ifcvt will chain
conditionals on the false edge (unless specifically inverted), so this patch,
on creating cond `a ? b : c`, will register ~a when traversing c.  If c is a
conditional then c will be simplified to the smallest possible predicate given
the assumptions we already know to be true.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Not sure how to write a non-fragile testcase for this, as the
conditionals chosen depend on threading etc.  Any suggestions?

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

PR tree-optimization/109154
* tree-if-conv.cc (gen_simplified_condition,
gen_phi_nest_statement): New.
(gen_phi_arg_condition, predicate_scalar_phi): Use it.

--- inline copy of patch -- 
diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
index 
e342532a343a3c066142adeec5fdfaf736a653e5..16b36dd8b0226f796c1a3fc6d45a9059385e812b
 100644
--- a/gcc/tree-if-conv.cc
+++ b/gcc/tree-if-conv.cc
@@ -1870,12 +1870,44 @@ convert_scalar_cond_reduction (gimple *reduc, 
gimple_stmt_iterator *gsi,
   return rhs;
 }
 
+/* Generate a simplified conditional.  */
+
+static tree
+gen_simplified_condition (tree cond, scalar_cond_masked_set_type &cond_set)
+{
+  /* Check if the value is already live in a previous branch.  This resolves
+ nested conditionals from diamond PHI reductions.  */
+  if (TREE_CODE (cond) == SSA_NAME)
+{
+  gimple *stmt = SSA_NAME_DEF_STMT (cond);
+  gassign *assign = NULL;
+  if ((assign = dyn_cast <gassign *> (stmt))
+  && gimple_assign_rhs_code (assign) == BIT_AND_EXPR)
+   {
+ tree arg1 = gimple_assign_rhs1 (assign);
+ tree arg2 = gimple_assign_rhs2 (assign);
+ if (cond_set.contains ({ arg1, 1 }))
+   arg1 = boolean_true_node;
+ else
+   arg1 = gen_simplified_condition (arg1, cond_set);
+
+ if (cond_set.contains ({ arg2, 1 }))
+   arg2 = boolean_true_node;
+ else
+   arg2 = gen_simplified_condition (arg2, cond_set);
+
+ cond = fold_build2 (TRUTH_AND_EXPR, boolean_type_node, arg1, arg2);
+   }
+}
+  return cond;
+}
+
 /* Produce condition for all occurrences of ARG in PHI node.  Set *INVERT
as to whether the condition is inverted.  */
 
 static tree
-gen_phi_arg_condition (gphi *phi, vec<int> *occur,
-  gimple_stmt_iterator *gsi, bool *invert)
+gen_phi_arg_condition (gphi *phi, vec<int> *occur, gimple_stmt_iterator *gsi,
+		       scalar_cond_masked_set_type &cond_set, bool *invert)
 {
   int len;
   int i;
@@ -1902,6 +1934,8 @@ gen_phi_arg_condition (gphi *phi, vec<int> *occur,
  c = TREE_OPERAND (c, 0);
  *invert = true;
}
+
+  c = gen_simplified_condition (c, cond_set);
   c = force_gimple_operand_gsi (gsi, unshare_expr (c),
true, NULL_TREE, true, GSI_SAME_STMT);
   if (cond != NULL_TREE)
@@ -1913,11 +1947,79 @@ gen_phi_arg_condition (gphi *phi, vec<int> *occur,
}
   else
cond = c;
+
+  /* Register the new possibly simplified conditional.  When a phi node has
+     more than 2 entries we chain entries on the false branch, so the
+     inverted condition is active.  */
+  scalar_cond_masked_key pred_cond ({ cond, 1 });
+  if (!invert)
+   

RE: [PATCH 6/19]middle-end: Don't enter piecewise expansion if VF is not constant.

2023-07-06 Thread Tamar Christina via Gcc-patches
> On Wed, 28 Jun 2023, Tamar Christina wrote:
> 
> > Hi All,
> >
> > expand_vector_piecewise does not support VLA expansion as it has a
> > hard assert on the type not being VLA.
> >
> > Instead of just failing to expand, and so having the call marked
> > unsupported, we ICE.  This adjusts it so we don't, and can gracefully
> > handle the expansion in the support checks.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> 
> Hmm, do we support _any_ VLA "generic" vectors?  That is, why do we get
> here at all?  Doesn't that mean the vectorizer creates code that vector 
> lowering
> thinks is not supported by the target?
> 
> In any case I'd expect expand_vector_operations_1 at
> 
>   if (compute_type == NULL_TREE)
> compute_type = get_compute_type (code, op, type);
>   if (compute_type == type)
> return;
> 
>  <  here
> 
>   new_rhs = expand_vector_operation (gsi, type, compute_type, stmt, code,
>  dce_ssa_names);
> 
> to be able to assert that compute_type (or even type) isn't VLA?
> 
> So, why do we arrive here?
> 

I think we used to arrive here because the patch last year didn't properly
check the cmp.  I don't hit it with this new patch so I'll drop it.  I thought
it was an actual bug, hence why I submitted the patch.

Thanks,
Tamar
> Richard.
> 
> 
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > * tree-vect-generic.cc (expand_vector_comparison): Skip piecewise if
> not
> > constant.
> >
> > --- inline copy of patch --
> > diff --git a/gcc/tree-vect-generic.cc b/gcc/tree-vect-generic.cc index
> >
> df04a0db68da3222f43dd938f8e7adb186cd93c9..da1fd2f40d82a9fa301e6
> ed0b2f4
> > c3c222d58a8d 100644
> > --- a/gcc/tree-vect-generic.cc
> > +++ b/gcc/tree-vect-generic.cc
> > @@ -481,7 +481,7 @@ expand_vector_comparison (gimple_stmt_iterator
> *gsi, tree type, tree op0,
> > }
> >   t = gimplify_build1 (gsi, VIEW_CONVERT_EXPR, type, t);
> > }
> > -  else
> > +  else if (TYPE_VECTOR_SUBPARTS (type).is_constant ())
> > t = expand_vector_piecewise (gsi, do_compare, type,
> >  TREE_TYPE (TREE_TYPE (op0)), op0, op1,
> >  code, false);
> >
> >
> >
> >
> >
> 
> --
> Richard Biener 
> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461
> Nuernberg, Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald,
> Boudien Moerman; HRB 36809 (AG Nuernberg)


RE: [PATCH 2/19][front-end] C/C++ front-end: add pragma GCC novector

2023-06-30 Thread Tamar Christina via Gcc-patches
Hi Jason,

Thanks for the review. I only now realized I should have split them between C 
and C++.

Will do so on the respins.
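
For context, the intended usage is minimal; a sketch of the new pragma
applied to a scalar check loop (illustrative only):

/* The pragma applies to the loop that immediately follows it.  */
void check (int *a, int *b, int n)
{
#pragma GCC novector
  for (int i = 0; i < n; i++)
    if (a[i] != b[i])
      __builtin_abort ();
}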

> 
> On 6/28/23 09:41, Tamar Christina wrote:
> > Hi All,
> >
> > FORTRAN currently has a pragma NOVECTOR for indicating that
> > vectorization should not be applied to a particular loop.
> >
> > ICC/ICX also has such a pragma for C and C++ called #pragma novector.
> >
> > As part of this patch series I need a way to easily turn off
> > vectorization of particular loops, particularly for testsuite reasons.
> >
> > This patch proposes a #pragma GCC novector that does the same for C
> > and C++ as gfortran does for Fortran and what ICC/ICX does for C and C++.
> >
> > I added only some basic tests here, but the next patch in the series
> > uses this in the testsuite in about ~800 tests.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/c-family/ChangeLog:
> >
> > * c-pragma.h (enum pragma_kind): Add PRAGMA_NOVECTOR.
> > * c-pragma.cc (init_pragma): Use it.
> >
> > gcc/c/ChangeLog:
> >
> > * c-parser.cc (c_parser_while_statement, c_parser_do_statement,
> > c_parser_for_statement, c_parser_statement_after_labels,
> > c_parse_pragma_novector, c_parser_pragma): Wire through novector
> and
> > default to false.
> 
> I'll let the C maintainers review the C changes.
> 
> > gcc/cp/ChangeLog:
> >
> > * cp-tree.def (RANGE_FOR_STMT): Update comment.
> > * cp-tree.h (RANGE_FOR_NOVECTOR): New.
> > (cp_convert_range_for, finish_while_stmt_cond, finish_do_stmt,
> > finish_for_cond): Add novector param.
> > * init.cc (build_vec_init): Default novector to false.
> > * method.cc (build_comparison_op): Likewise.
> > * parser.cc (cp_parser_statement): Likewise.
> > (cp_parser_for, cp_parser_c_for, cp_parser_range_for,
> > cp_convert_range_for, cp_parser_iteration_statement,
> > cp_parser_omp_for_loop, cp_parser_pragma): Support novector.
> > (cp_parser_pragma_novector): New.
> > * pt.cc (tsubst_expr): Likewise.
> > * semantics.cc (finish_while_stmt_cond, finish_do_stmt,
> > finish_for_cond): Likewise.
> >
> > gcc/ChangeLog:
> >
> > * doc/extend.texi: Document it.
> > * tree-core.h (struct tree_base): Add lang_flag_7 and reduce spare0.
> > * tree.h (TREE_LANG_FLAG_7): New.
> 
> This doesn't seem necessary; I think only flags 1 and 6 are currently used in
> RANGE_FOR_STMT.

Ah fair, I thought every option needed to occupy a specific bit. I'll try to 
re-use one.

> 
> > gcc/testsuite/ChangeLog:
> >
> > * g++.dg/vect/vect-novector-pragma.cc: New test.
> > * gcc.dg/vect/vect-novector-pragma.c: New test.
> >
> > --- inline copy of patch --
> >...
> > @@ -13594,7 +13595,8 @@ cp_parser_condition (cp_parser* parser)
> >  not included. */
> >
> >   static tree
> > -cp_parser_for (cp_parser *parser, bool ivdep, unsigned short unroll)
> > +cp_parser_for (cp_parser *parser, bool ivdep, unsigned short unroll,
> > +  bool novector)
> 
> I wonder about combining the ivdep and novector parameters here and in
> other functions?  Up to you.

As in, combine them in e.g. a struct?
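
Something like this, perhaps (a sketch; the name and defaults are
hypothetical):

struct loop_pragma_opts
{
  bool ivdep = false;
  bool novector = false;
  unsigned short unroll = 0;
};

That would keep the parameter lists from growing with every new loop pragma.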

> 
> > @@ -49613,17 +49633,33 @@ cp_parser_pragma (cp_parser *parser,
> enum pragma_context context, bool *if_p)
> > break;
> >   }
> > const bool ivdep = cp_parser_pragma_ivdep (parser, pragma_tok);
> > -   unsigned short unroll;
> > +   unsigned short unroll = 0;
> > +   bool novector = false;
> > cp_token *tok = cp_lexer_peek_token (the_parser->lexer);
> > -   if (tok->type == CPP_PRAGMA
> > -   && cp_parser_pragma_kind (tok) == PRAGMA_UNROLL)
> > +
> > +   while (tok->type == CPP_PRAGMA)
> >   {
> > -   tok = cp_lexer_consume_token (parser->lexer);
> > -   unroll = cp_parser_pragma_unroll (parser, tok);
> > -   tok = cp_lexer_peek_token (the_parser->lexer);
> > +   switch (cp_parser_pragma_kind (tok))
> > + {
> > +   case PRAGMA_UNROLL:
> > + {
> > +   tok = cp_lexer_consume_token (parser->lexer);
> > +   unroll = cp_parser_pragma_unroll (parser, tok);
> > +   tok = cp_lexer_peek_token (the_parser->lexer);
> > +   break;
> > + }
> > +   case PRAGMA_NOVECTOR:
> > + {
> > +   tok = cp_lexer_consume_token (parser->lexer);
> > +   novector = cp_parser_pragma_novector (parser, tok);
> > +   tok = cp_lexer_peek_token (the_parser->lexer);
> > +   break;
> > + }
> > +   default:
> > + gcc_unreachable ();
> > + }
> >   }
> 
> Repeating this pattern three times for the three related pragmas is too much;
> please combine the three cases into one.

Sure, I had some trouble combining them before because of the initial token
being consumed, but I think I know a way.

Thanks for the review, will send updated split patch Monday.


RE: FW: [PATCH v5 0/19] Support early break/return auto-vectorization

2023-06-28 Thread Tamar Christina via Gcc-patches
Hi Juzhe,

> 
> Hi, Tamar.
> 
> This is an amazing auto-vectorization flow.
> 
> I am thinking about whether RVV can also get benefits from this optimization.
> IMHO, RVV should be also using this flow.
> 
> So, to allow RVV  (target uses len as loop_control and mask as flow control), 
> I
> am not sure whether we can do this (Feel free to correct me if I am wrong):
> 
> +  if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
> + vect_record_loop_mask (loop_vinfo, masks, ncopies, truth_type,
> NULL);
> 
> Maybe it can be ?
> 
> if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)) {
>   if (mask_loop_p)
>  vect_record_loop_mask
>else
>  vect_record_loop_len
> }
> 

Yeah, that should be the only change required.  I started this patch before
the loop_len change made it in and just rebased recently.
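
One possible shape (a sketch only; use_masks_p and lens are placeholders,
and how mask vs. len is chosen is target/operation dependent):

  if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
    {
      if (use_masks_p)  /* hypothetical: target controls flow with masks */
	vect_record_loop_mask (loop_vinfo, masks, ncopies, truth_type, NULL);
      else
	vect_record_loop_len (loop_vinfo, lens, ncopies, truth_type, 1);
    }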

> 
> +  tree cond = gimple_assign_lhs (new_stmt);
> +  if (masked_loop_p)
> +{
> +  tree mask = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies,
> truth_type, 0);
> +  cond = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask, cond,
> +			   &cond_gsi);
> +}
> +
> +  tree t = fold_build2 (NE_EXPR, boolean_type_node, cond,
> + build_zero_cst (truth_type));
> 
> From my understanding, you are using final_mask = loop_mask (WHILE_ULT)
> && control_mask (comparison).
> Then Test final_mask using NE_EXPR. Am I right?

Yeah, that's right.  It's creating the mask for partial iterations.  The only
other constraint is being able to reduce a boolean mask using inclusive OR,
but that's optional and is only needed if one side of the comparison produces
more than one copy (so it's only checked then).
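
In GIMPLE terms the combined test looks roughly like this (a sketch; SSA
names invented for illustration):

  loop_mask_5 = .WHILE_ULT (i, n, { 0, ... });   /* loop control mask */
  cmp_6 = vec_a_3 != vec_x_4;                    /* early-break condition */
  final_7 = cmp_6 & loop_mask_5;                 /* prepare_vec_mask */
  if (final_7 != { 0, ... })                     /* the NE_EXPR test */
    goto early_exit;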

> 
> For RVV, I thinking whether we can have a good way to do this testing.
> Not sure whether we can have something like LEN_TEST_MASK_NE (loop_len,
> control_mask...)
> 

Hmm, is just the vect_record_loop_len change not enough?  (I haven't followed
the masking implementation in RVV in detail.)  But I assume it follows the
general principle that AND-ing an operation with a mask creates a masked
operation?

That is to say, I thought LOOP_LEN was only for the loop control, which
doesn't change here.

> I am not saying that we should support "early break" auto-vectorization for
> RVV (loop_len && control_mask).
> I am just write some comments trying to figure out how I can adapt your
> working for RVV in the future.
> 

Yes, happy to help.  The more uses it gets the more bugs I can fix.

Cheers,
Tamar

> Thanks.
> 
> 
> juzhe.zh...@rivai.ai
> 
> From: Li, Pan2
> Date: 2023-06-28 22:21
> To: juzhe.zh...@rivai.ai
> Subject: FW: [PATCH v5 0/19] Support early break/return auto-vectorization
> FYI.
> 
> -Original Message-
> From: Gcc-patches 
> On Behalf Of Tamar Christina via Gcc-patches
> Sent: Wednesday, June 28, 2023 9:41 PM
> To: gcc-patches@gcc.gnu.org
> Cc: n...@arm.com; rguent...@suse.de; j...@ventanamicro.com
> Subject: [PATCH v5 0/19] Support early break/return auto-vectorization
> 
> Hi All,
> 
> This patch adds initial support for early break vectorization in GCC.
> The support is added for any target that implements a vector cbranch optab,
> this includes both fully masked and non-masked targets.
> 
> Depending on the operation, the vectorizer may also require support for
> boolean mask reductions using Inclusive OR.  This is however only checked
> then the comparison would produce multiple statements.
> 
> Concretely the kind of loops supported are of the forms:
> 
> for (int i = 0; i < N; i++)
> {
>
>if ()
>  {
>...
>;
>  }
>
> }
> 
> where  can be:
> - break
> - return
> - goto
> 
> Any number of statements can be used before the  occurs.
> 
> Since this is an initial version for GCC 14 it has the following limitations 
> and
> features:
> 
> - Only fixed sized iterations and buffers are supported.  That is to say any
>   vectors loaded or stored must be to statically allocated arrays with known
>   sizes. N must also be known.  This limitation is because our primary target
>   for this optimization is SVE.  For VLA SVE we can't easily do cross page
>   iteration checks. The result is likely to also not be beneficial. For that
>   reason we punt support for variable buffers till we have First-Faulting
>   support in GCC.
> - any stores in  should not be to the same objects as in
>   .  Loads are fine as long as they don't have the possibility to
>   alias.  More concretely, we block RAW dependencies when the intermediate
> value
>   can't be separated from the store, or the store itself can't be moved.
> - The numbe

RE: [PATCH 9/19] middle-end: refactor vectorizable_comparison to make the main body re-usable.

2023-06-28 Thread Tamar Christina via Gcc-patches
Adding proper maintainers.

> -Original Message-
> From: Tamar Christina 
> Sent: Wednesday, June 28, 2023 2:46 PM
> To: gcc-patches@gcc.gnu.org
> Cc: nd ; Richard Earnshaw ;
> Marcus Shawcroft ; Kyrylo Tkachov
> ; Richard Sandiford
> 
> Subject: [PATCH 9/19]AArch64 middle-end: refactor vectorizable_comparison
> to make the main body re-usable.
> 
> Hi All,
> 
> Vectorization of a gcond starts off essentially the same as vectorizing a
> comparison, with the only difference being how the operands are extracted.
> 
> This refactors vectorizable_comparison such that we now have a generic
> function that can be used from vectorizable_early_break.  The refactoring
> splits the gassign checks and actual validation/codegen off to a helper
> function.
> 
> No change in functionality expected.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   * tree-vect-stmts.cc (vectorizable_comparison): Refactor, splitting
> body
>   to ...
>   (vectorizable_comparison_1): ...This.
> 
> --- inline copy of patch --
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc index
> ae24f3e66e63d9bd9763284a47fb2c911335c4c1..f3e33cd4ed125b9564ca8
> 1acd197693fc3457c31 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -11332,21 +11332,22 @@ vectorizable_condition (vec_info *vinfo,
> 
>  /* vectorizable_comparison.
> 
> -   Check if STMT_INFO is comparison expression that can be vectorized.
> +/* Helper of vectorizable_comparison.
> +
> +   Check if STMT_INFO is comparison expression CODE that can be vectorized.
> If VEC_STMT is also passed, vectorize STMT_INFO: create a vectorized
> comparison, put it in VEC_STMT, and insert it at GSI.
> 
> Return true if STMT_INFO is vectorizable in this way.  */
> 
>  static bool
> -vectorizable_comparison (vec_info *vinfo,
> -  stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
> -  gimple **vec_stmt,
> -  slp_tree slp_node, stmt_vector_for_cost *cost_vec)
> +vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
> +stmt_vec_info stmt_info, tree_code code,
> +gimple_stmt_iterator *gsi, gimple **vec_stmt,
> +slp_tree slp_node, stmt_vector_for_cost *cost_vec)
>  {
>tree lhs, rhs1, rhs2;
>tree vectype1 = NULL_TREE, vectype2 = NULL_TREE;
> -  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
>tree vec_rhs1 = NULL_TREE, vec_rhs2 = NULL_TREE;
>tree new_temp;
>loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
> @@ -11354,7 +11355,7 @@ vectorizable_comparison (vec_info *vinfo,
>int ndts = 2;
>poly_uint64 nunits;
>int ncopies;
> -  enum tree_code code, bitop1 = NOP_EXPR, bitop2 = NOP_EXPR;
> +  enum tree_code bitop1 = NOP_EXPR, bitop2 = NOP_EXPR;
>int i;
>bb_vec_info bb_vinfo = dyn_cast <bb_vec_info> (vinfo);
>vec<tree> vec_oprnds0 = vNULL;
> @@ -11377,14 +11378,6 @@ vectorizable_comparison (vec_info *vinfo,
>  ncopies = vect_get_num_copies (loop_vinfo, vectype);
> 
>gcc_assert (ncopies >= 1);
> -  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def)
> -return false;
> -
> -  gassign *stmt = dyn_cast <gassign *> (stmt_info->stmt);
> -  if (!stmt)
> -return false;
> -
> -  code = gimple_assign_rhs_code (stmt);
> 
>if (TREE_CODE_CLASS (code) != tcc_comparison)
>  return false;
> @@ -11499,7 +11492,6 @@ vectorizable_comparison (vec_info *vinfo,
> return false;
>   }
> 
> -  STMT_VINFO_TYPE (stmt_info) = comparison_vec_info_type;
>vect_model_simple_cost (vinfo, stmt_info,
> ncopies * (1 + (bitop2 != NOP_EXPR)),
> dts, ndts, slp_node, cost_vec); @@ -11565,6
> +11557,44 @@ vectorizable_comparison (vec_info *vinfo,
>return true;
>  }
> 
> +/* vectorizable_comparison.
> +
> +   Check if STMT_INFO is comparison expression that can be vectorized.
> +   If VEC_STMT is also passed, vectorize STMT_INFO: create a vectorized
> +   comparison, put it in VEC_STMT, and insert it at GSI.
> +
> +   Return true if STMT_INFO is vectorizable in this way.  */
> +
> +static bool
> +vectorizable_comparison (vec_info *vinfo,
> +  stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
> +  gimple **vec_stmt,
> +			 slp_tree slp_node, stmt_vector_for_cost *cost_vec)
> +{
> +  bb_vec_info bb_vinfo = dyn_cast <bb_vec_info> (vinfo);
> +
> +  if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
> +return false;
> +
> +  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def)
> +return false;
> +
> +  gassign *stmt = dyn_cast <gassign *> (stmt_info->stmt);
> +  if (!stmt)
> +return false;
> +
> +  enum tree_code code = gimple_assign_rhs_code (stmt);
> +  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
> +  if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
> +   

RE: [PATCH 3/19]middle-end clean up vect testsuite using pragma novector

2023-06-28 Thread Tamar Christina via Gcc-patches
Resending attached only due to size limit

> -Original Message-
> From: Tamar Christina
> Sent: Wednesday, June 28, 2023 2:42 PM
> To: gcc-patches@gcc.gnu.org
> Cc: nd ; rguent...@suse.de; j...@ventanamicro.com
> Subject: [PATCH 3/19]middle-end clean up vect testsuite using pragma
> novector
> 
> Hi All,
> 
> The support for early break vectorization breaks lots of scan vect and slp
> testcases because they assume that loops with abort () in them cannot be
> vectorized.  Additionally it breaks the point of having a scalar loop to check
> the output of the vectorizer if that loop is also vectorized.
> 
> For that reason this adds
> 
> #pragma GCC novector to all tests which have a scalar loop that we would
> have
> vectorized using this patch series.
> 
> FWIW, none of these tests were failing to vectorize or run before the pragma.
> The tests that did point to some issues were copied to the early break test
> suite as well.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/vect/pr84556.cc: Add novector pragma.
>   * g++.dg/vect/simd-1.cc: Add novector pragma.
>   * g++.dg/vect/simd-2.cc: Add novector pragma.
>   * g++.dg/vect/simd-3.cc: Add novector pragma.
>   * g++.dg/vect/simd-4.cc: Add novector pragma.
>   * g++.dg/vect/simd-5.cc: Add novector pragma.
>   * g++.dg/vect/simd-6.cc: Add novector pragma.
>   * g++.dg/vect/simd-7.cc: Add novector pragma.
>   * g++.dg/vect/simd-8.cc: Add novector pragma.
>   * g++.dg/vect/simd-9.cc: Add novector pragma.
>   * g++.dg/vect/simd-clone-6.cc: Add novector pragma.
>   * gcc.dg/vect/O3-pr70130.c: Add novector pragma.
>   * gcc.dg/vect/Os-vect-95.c: Add novector pragma.
>   * gcc.dg/vect/bb-slp-1.c: Add novector pragma.
>   * gcc.dg/vect/bb-slp-16.c: Add novector pragma.
>   * gcc.dg/vect/bb-slp-2.c: Add novector pragma.
>   * gcc.dg/vect/bb-slp-24.c: Add novector pragma.
>   * gcc.dg/vect/bb-slp-25.c: Add novector pragma.
>   * gcc.dg/vect/bb-slp-26.c: Add novector pragma.
>   * gcc.dg/vect/bb-slp-27.c: Add novector pragma.
>   * gcc.dg/vect/bb-slp-28.c: Add novector pragma.
>   * gcc.dg/vect/bb-slp-29.c: Add novector pragma.
>   * gcc.dg/vect/bb-slp-42.c: Add novector pragma.
>   * gcc.dg/vect/bb-slp-cond-1.c: Add novector pragma.
>   * gcc.dg/vect/bb-slp-over-widen-1.c: Add novector pragma.
>   * gcc.dg/vect/bb-slp-over-widen-2.c: Add novector pragma.
>   * gcc.dg/vect/bb-slp-pattern-1.c: Add novector pragma.
>   * gcc.dg/vect/bb-slp-pattern-2.c: Add novector pragma.
>   * gcc.dg/vect/bb-slp-pow-1.c: Add novector pragma.
>   * gcc.dg/vect/bb-slp-pr101615-2.c: Add novector pragma.
>   * gcc.dg/vect/bb-slp-pr65935.c: Add novector pragma.
>   * gcc.dg/vect/bb-slp-subgroups-1.c: Add novector pragma.
>   * gcc.dg/vect/costmodel/i386/costmodel-vect-31.c: Add novector
> pragma.
>   * gcc.dg/vect/costmodel/i386/costmodel-vect-33.c: Add novector
> pragma.
>   * gcc.dg/vect/costmodel/i386/costmodel-vect-68.c: Add novector
> pragma.
>   * gcc.dg/vect/costmodel/ppc/costmodel-slp-12.c: Add novector
> pragma.
>   * gcc.dg/vect/costmodel/ppc/costmodel-slp-33.c: Add novector
> pragma.
>   * gcc.dg/vect/costmodel/ppc/costmodel-slp-34.c: Add novector
> pragma.
>   * gcc.dg/vect/costmodel/ppc/costmodel-vect-31a.c: Add novector
> pragma.
>   * gcc.dg/vect/costmodel/ppc/costmodel-vect-31b.c: Add novector
> pragma.
>   * gcc.dg/vect/costmodel/ppc/costmodel-vect-31c.c: Add novector
> pragma.
>   * gcc.dg/vect/costmodel/ppc/costmodel-vect-33.c: Add novector
> pragma.
>   * gcc.dg/vect/costmodel/ppc/costmodel-vect-68a.c: Add novector
> pragma.
>   * gcc.dg/vect/costmodel/ppc/costmodel-vect-68b.c: Add novector
> pragma.
>   * gcc.dg/vect/costmodel/ppc/costmodel-vect-68c.c: Add novector
> pragma.
>   * gcc.dg/vect/costmodel/ppc/costmodel-vect-76a.c: Add novector
> pragma.
>   * gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c: Add novector
> pragma.
>   * gcc.dg/vect/costmodel/ppc/costmodel-vect-76c.c: Add novector
> pragma.
>   * gcc.dg/vect/costmodel/ppc/costmodel-vect-outer-fir.c: Add
> novector pragma.
>   * gcc.dg/vect/costmodel/x86_64/costmodel-vect-31.c: Add novector
> pragma.
>   * gcc.dg/vect/costmodel/x86_64/costmodel-vect-33.c: Add novector
> pragma.
>   * gcc.dg/vect/costmodel/x86_64/costmodel-vect-68.c: Add novector
> pragma.
>   * gcc.dg/vect/fast-math-bb-slp-call-1.c: Add novector pragma.
>   * gcc.dg/vect/fast-math-bb-slp-call-2.c: Add novector pragma.
>   * gcc.dg/vect/fast-math-vect-call-1.c: Add novector pragma.
>   * gcc.dg/vect/fast-math-vect-call-2.c: Add novector pragma.
>   * gcc.dg/vect/fast-math-vect-complex-3.c: Add novector pragma.
>   * gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c: Add novector pragma.
>   * 

[PATCH 16/19]AArch64 Add optimization for vector != cbranch fed into compare with 0 for Advanced SIMD

2023-06-28 Thread Tamar Christina via Gcc-patches
Hi All,

Advanced SIMD lacks a compare non-equal (!=) for vectors, and unlike a
compare against 0 we can't rewrite it to a cmtst.

This operation is however fairly common, especially now that we support early
break vectorization.

As such this adds a pattern to recognize the negated any comparison and
transform it to an all, i.e. any(~x) => all(x), and invert the branches.

For e.g.

void f1 (int x)
{
  for (int i = 0; i < N; i++)
{
  b[i] += a[i];
  if (a[i] != x)
break;
}
}

We currently generate:

cmeq	v31.4s, v30.4s, v29.4s
not	v31.16b, v31.16b
umaxp	v31.4s, v31.4s, v31.4s
fmov	x5, d31
cbnz	x5, .L2

and after this patch:

cmeq	v31.4s, v30.4s, v29.4s
uminp	v31.4s, v31.4s, v31.4s
fmov	x5, d31
cbz	x5, .L2
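
To see why the inversion is sound, note that vector compare results have
lanes that are either 0 or all-ones.  A scalar model of the rewrite
(illustrative only, not from the patch):

/* "any lane of ~m set" is exactly "some lane of m is zero", so reducing
   with UMINP and branching with CBZ replaces NOT + UMAXP + CBNZ.  */
static int any_lane_cleared (const unsigned *m, int n)
{
  unsigned umin = ~0u;
  for (int i = 0; i < n; i++)
    umin = m[i] < umin ? m[i] : umin;	/* models the UMINP chain */
  return umin == 0;			/* CBZ: taken when some lane is 0 */
}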

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (*cbranchnev4si): New.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/vect-early-break-cbranch_2.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 
cd5ec35c3f53028f14828bd70a92924f62524c15..b1a2c617d7d4106ab725d53a5d0b5c2fb61a0c78
 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3870,6 +3870,37 @@ (define_expand "cbranch<mode>4"
   DONE;
 })
 
+;; Advanced SIMD lacks a vector != comparison, but this is a quite common
+;; operation.  To not pay the penalty for inverting == we can map our any
+;; comparisons to all, i.e. any(~x) => all(x), and invert the branches.
+(define_insn_and_split "*cbranchnev4si"
+  [(set (pc)
+(if_then_else
+  (ne (subreg:DI
+   (unspec:V4SI
+ [(not:V4SI (match_operand:V4SI 0 "register_operand" "w"))
+  (not:V4SI (match_dup 0))]
+   UNSPEC_UMAXV) 0)
+  (const_int 0))
+   (label_ref (match_operand 1 ""))
+   (pc)))
+(clobber (match_scratch:DI 2 "=w"))]
+  "TARGET_SIMD"
+  "#"
+  "&& true"
+  [(set (match_dup 2)
+   (unspec:V4SI [(match_dup 0) (match_dup 0)] UNSPEC_UMINV))
+   (set (pc)
+(if_then_else
+  (eq (subreg:DI (match_dup 2) 0)
+ (const_int 0))
+   (label_ref (match_dup 1))
+   (pc)))]
+{
+  if (can_create_pseudo_p ())
+operands[2] = gen_reg_rtx (V4SImode);
+})
+
 ;; Patterns comparing two vectors to produce a mask.
 
 (define_expand "vec_cmp<mode><mode>"
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch_2.c 
b/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch_2.c
new file mode 100644
index 
..e81027bb50138be627f4dfdffb1557893a5a7723
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch_2.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
+
+#pragma GCC target "+nosve"
+
+#define N 640
+int a[N] = {0};
+int b[N] = {0};
+
+
+/*
+** f1:
+** ...
+**	cmeq	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+**	uminp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+**	fmov	x[0-9]+, d[0-9]+
+**	cbz	x[0-9]+, \.L[0-9]+
+** ...
+*/
+void f1 (int x)
+{
+  for (int i = 0; i < N; i++)
+{
+  b[i] += a[i];
+  if (a[i] != x)
+   break;
+}
+}




[PATCH 15/19]AArch64: Add implementation for vector cbranch for Advanced SIMD

2023-06-28 Thread Tamar Christina via Gcc-patches
Hi All,

This adds an implementation for conditional branch optab for AArch64.

For e.g.

void f1 ()
{
  for (int i = 0; i < N; i++)
{
  b[i] += a[i];
  if (a[i] > 0)
break;
}
}

For 128-bit vectors we generate:

cmgt	v1.4s, v1.4s, #0
umaxp	v1.4s, v1.4s, v1.4s
fmov	x3, d1
cbnz	x3, .L8

and for 64-bit vectors we can omit the compression:

cmgt	v1.2s, v1.2s, #0
fmov	x2, d1
cbz	x2, .L13
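
The reduction strategy is element-size agnostic because comparison results
are lane masks.  A scalar model of the "always reduce using a V4SI" trick
(illustrative only, not from the patch):

/* Any 32-bit chunk of the 128-bit compare result is nonzero iff a lane
   under it is set, so one UMAXP over V4SI pairs the chunks and the low
   64 bits then cover all four at once.  */
static int any_chunk_set (const unsigned m[4])
{
  unsigned lo = m[0] > m[1] ? m[0] : m[1];	/* umaxp result lane 0 */
  unsigned hi = m[2] > m[3] ? m[2] : m[3];	/* umaxp result lane 1 */
  return (lo | hi) != 0;			/* fmov + cbnz on the low 64 bits */
}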

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (cbranch<mode>4): New.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/vect-early-break-cbranch.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 
90118c6348e9614bef580d1dc94c0c1841dd5204..cd5ec35c3f53028f14828bd70a92924f62524c15
 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3830,6 +3830,46 @@ (define_expand "vcond_mask_"
   DONE;
 })
 
+;; Patterns comparing two vectors and conditionally jump
+
+(define_expand "cbranch4"
+  [(set (pc)
+(if_then_else
+  (match_operator 0 "aarch64_equality_operator"
+[(match_operand:VDQ_I 1 "register_operand")
+ (match_operand:VDQ_I 2 "aarch64_simd_reg_or_zero")])
+  (label_ref (match_operand 3 ""))
+  (pc)))]
+  "TARGET_SIMD"
+{
+  auto code = GET_CODE (operands[0]);
+  rtx tmp = operands[1];
+
+  /* If comparing against a non-zero vector we have to do a comparison first
+ so we can have a != 0 comparison with the result.  */
+  if (operands[2] != CONST0_RTX (<MODE>mode))
+    emit_insn (gen_vec_cmp<mode><mode> (tmp, operands[0], operands[1],
+					operands[2]));
+
+  /* For 64-bit vectors we need no reductions.  */
+  if (known_eq (128, GET_MODE_BITSIZE (<MODE>mode)))
+{
+  /* Always reduce using a V4SI.  */
+  rtx reduc = gen_lowpart (V4SImode, tmp);
+  rtx res = gen_reg_rtx (V4SImode);
+  emit_insn (gen_aarch64_umaxpv4si (res, reduc, reduc));
+  emit_move_insn (tmp, gen_lowpart (<MODE>mode, res));
+}
+
+  rtx val = gen_reg_rtx (DImode);
+  emit_move_insn (val, gen_lowpart (DImode, tmp));
+
+  rtx cc_reg = aarch64_gen_compare_reg (code, val, const0_rtx);
+  rtx cmp_rtx = gen_rtx_fmt_ee (code, DImode, cc_reg, const0_rtx);
+  emit_jump_insn (gen_condjump (cmp_rtx, cc_reg, operands[3]));
+  DONE;
+})
+
 ;; Patterns comparing two vectors to produce a mask.
 
 (define_expand "vec_cmp<mode><mode>"
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch.c 
b/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch.c
new file mode 100644
index 
..c0363c3787270507d7902bb2ac0e39faef63a852
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch.c
@@ -0,0 +1,124 @@
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
+
+#pragma GCC target "+nosve"
+
+#define N 640
+int a[N] = {0};
+int b[N] = {0};
+
+
+/*
+** f1:
+** ...
+**	cmgt	v[0-9]+.4s, v[0-9]+.4s, #0
+**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+**	fmov	x[0-9]+, d[0-9]+
+**	cbnz	x[0-9]+, \.L[0-9]+
+** ...
+*/
+void f1 ()
+{
+  for (int i = 0; i < N; i++)
+{
+  b[i] += a[i];
+  if (a[i] > 0)
+   break;
+}
+}
+
+/*
+** f2:
+** ...
+**	cmge	v[0-9]+.4s, v[0-9]+.4s, #0
+**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+**	fmov	x[0-9]+, d[0-9]+
+**	cbnz	x[0-9]+, \.L[0-9]+
+** ...
+*/
+void f2 ()
+{
+  for (int i = 0; i < N; i++)
+{
+  b[i] += a[i];
+  if (a[i] >= 0)
+   break;
+}
+}
+
+/*
+** f3:
+** ...
+**	cmeq	v[0-9]+.4s, v[0-9]+.4s, #0
+**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+**	fmov	x[0-9]+, d[0-9]+
+**	cbnz	x[0-9]+, \.L[0-9]+
+** ...
+*/
+void f3 ()
+{
+  for (int i = 0; i < N; i++)
+{
+  b[i] += a[i];
+  if (a[i] == 0)
+   break;
+}
+}
+
+/*
+** f4:
+** ...
+** cmtst   v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+** umaxp   v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+**	fmov	x[0-9]+, d[0-9]+
+**	cbnz	x[0-9]+, \.L[0-9]+
+** ...
+*/
+void f4 ()
+{
+  for (int i = 0; i < N; i++)
+{
+  b[i] += a[i];
+  if (a[i] != 0)
+   break;
+}
+}
+
+/*
+** f5:
+** ...
+**	cmlt	v[0-9]+.4s, v[0-9]+.4s, #0
+**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+**	fmov	x[0-9]+, d[0-9]+
+**	cbnz	x[0-9]+, \.L[0-9]+
+** ...
+*/
+void f5 ()
+{
+  for (int i = 0; i < N; i++)
+{
+  b[i] += a[i];
+  if (a[i] < 0)
+   break;
+}
+}
+
+/*
+** f6:
+** ...
+**	cmle	v[0-9]+.4s, v[0-9]+.4s, #0
+**	umaxp	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+**	fmov	x[0-9]+, d[0-9]+
+**	cbnz	x[0-9]+, \.L[0-9]+
+** ...

[PATCH 18/19]Arm: Add Advanced SIMD cbranch implementation

2023-06-28 Thread Tamar Christina via Gcc-patches
Hi All,

This adds an implementation for conditional branch optab for AArch32.

For e.g.

void f1 ()
{
  for (int i = 0; i < N; i++)
{
  b[i] += a[i];
  if (a[i] > 0)
break;
}
}

For 128-bit vectors we generate:

vcgt.s32	q8, q9, #0
vpmax.u32	d7, d16, d17
vpmax.u32	d7, d7, d7
vmov	r3, s14	@ int
cmp	r3, #0

and for 64-bit vectors we can omit one vpmax as we still need to compress to
32 bits.

Bootstrapped Regtested on arm-none-linux-gnueabihf and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/arm/neon.md (cbranch<mode>4): New.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp (vect_early_break): Add AArch32.
* gcc.target/arm/vect-early-break-cbranch.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 
d213369ffc38fb88ad0357d848cc7da5af73bab7..130efbc37cfe3128533599dfadc344d2243dcb63
 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -408,6 +408,45 @@ (define_insn "vec_extract"
   [(set_attr "type" "neon_store1_one_lane,neon_to_gp")]
 )
 
+;; Patterns comparing two vectors and conditionally jump.
+;; Advanced SIMD lacks a vector != comparison, but this is a quite common
+;; operation.  To not pay the penalty for inverting == we can map our any
+;; comparisons to all, i.e. any(~x) => all(x).
+;;
+;; However unlike the AArch64 version, we can't optimize this further as the
+;; chain is too long for combine due to these being unspecs so it doesn't fold
+;; the operation to something simpler.
+(define_expand "cbranch4"
+  [(set (pc) (if_then_else
+ (match_operator 0 "expandable_comparison_operator"
+  [(match_operand:VDQI 1 "register_operand")
+   (match_operand:VDQI 2 "zero_operand")])
+ (label_ref (match_operand 3 "" ""))
+ (pc)))]
+  "TARGET_NEON"
+{
+  rtx mask = operands[1];
+
+  /* For 128-bit vectors we need an additional reduction.  */
+  if (known_eq (128, GET_MODE_BITSIZE (<MODE>mode)))
+{
+  /* Always reduce to a V2SI by folding the halves with vpmax.  */
+  mask = gen_reg_rtx (V2SImode);
+  rtx low = gen_reg_rtx (V2SImode);
+  rtx high = gen_reg_rtx (V2SImode);
+  emit_insn (gen_neon_vget_lowv4si (low, operands[1]));
+  emit_insn (gen_neon_vget_highv4si (high, operands[1]));
+  emit_insn (gen_neon_vpumaxv2si (mask, low, high));
+}
+
+  emit_insn (gen_neon_vpumaxv2si (mask, mask, mask));
+
+  rtx val = gen_reg_rtx (SImode);
+  emit_move_insn (val, gen_lowpart (SImode, mask));
+  emit_jump_insn (gen_cbranch_cc (operands[0], val, const0_rtx, operands[3]));
+  DONE;
+})
+
 ;; This pattern is renamed from "vec_extract" to
 ;; "neon_vec_extract" and this pattern is called
 ;; by define_expand in vec-common.md file.
diff --git a/gcc/testsuite/gcc.target/arm/vect-early-break-cbranch.c 
b/gcc/testsuite/gcc.target/arm/vect-early-break-cbranch.c
new file mode 100644
index 
..2c05aa10d26ed4ac9785672e6e3b4355cef046dc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/vect-early-break-cbranch.c
@@ -0,0 +1,136 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-require-effective-target arm32 } */
+/* { dg-options "-O3 -march=armv8-a+simd -mfpu=auto -mfloat-abi=hard" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+#define N 640
+int a[N] = {0};
+int b[N] = {0};
+
+/*
+** f1:
+**	...
+**	vcgt.s32	q[0-9]+, q[0-9]+, #0
+**	vpmax.u32	d[0-9]+, d[0-9]+, d[0-9]+
+**	vpmax.u32	d[0-9]+, d[0-9]+, d[0-9]+
+**	vmov	r[0-9]+, s[0-9]+	@ int
+**	cmp	r[0-9]+, #0
+**	bne	\.L[0-9]+
+** ...
+*/
+void f1 ()
+{
+  for (int i = 0; i < N; i++)
+{
+  b[i] += a[i];
+  if (a[i] > 0)
+   break;
+}
+}
+
+/*
+** f2:
+** ...
+**	vcge.s32	q[0-9]+, q[0-9]+, #0
+**	vpmax.u32	d[0-9]+, d[0-9]+, d[0-9]+
+**	vpmax.u32	d[0-9]+, d[0-9]+, d[0-9]+
+**	vmov	r[0-9]+, s[0-9]+	@ int
+**	cmp	r[0-9]+, #0
+**	bne	\.L[0-9]+
+** ...
+*/
+void f2 ()
+{
+  for (int i = 0; i < N; i++)
+{
+  b[i] += a[i];
+  if (a[i] >= 0)
+   break;
+}
+}
+
+/*
+** f3:
+** ...
+**	vceq.i32	q[0-9]+, q[0-9]+, #0
+**	vpmax.u32	d[0-9]+, d[0-9]+, d[0-9]+
+**	vpmax.u32	d[0-9]+, d[0-9]+, d[0-9]+
+**	vmov	r[0-9]+, s[0-9]+	@ int
+**	cmp	r[0-9]+, #0
+**	bne	\.L[0-9]+
+** ...
+*/
+void f3 ()
+{
+  for (int i = 0; i < N; i++)
+{
+  b[i] += a[i];
+  if (a[i] == 0)
+   break;
+}
+}
+
+/*
+** f4:
+** ...
+**	vceq.i32	q[0-9]+, q[0-9]+, #0
+**	vmvn	q[0-9]+, q[0-9]+
+**	vpmax.u32	d[0-9]+, d[0-9]+, d[0-9]+
+**	vpmax.u32	d[0-9]+, d[0-9]+, d[0-9]+
+**	vmov	r[0-9]+, s[0-9]+	@ int
+**	cmp	r[0-9]+, #0
+**	bne	\.L[0-9]+
+** 

[PATCH 19/19]Arm: Add MVE cbranch implementation

2023-06-28 Thread Tamar Christina via Gcc-patches
Hi All,

This adds an implementation for conditional branch optab for MVE.

Unfortunately MVE has rather limited operations on VPT.P0; we are missing the
ability to do P0 comparisons and logical OR on P0.

For that reason we can only support cbranch with 0, as for comparing to a 0
predicate we don't need to actually do a comparison, we only have to check that
any bit is set within P0.

Because we can only do P0 comparisons with 0, the costing of the comparison was
reduced in order for the compiler not to try to push 0 to a register thinking
it's too expensive.  For the cbranch implementation to be safe we must see the
constant 0 vector.

The lack of logical OR on P0 we can't really work around.  This means MVE
can't support cases where the sizes of operands in the comparison don't match,
i.e. when one operand has been unpacked.
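
Checking "any bit set in P0" stays cheap because P0 can be read as an
integer.  A scalar model (illustrative only, not from the patch):

/* VPR.P0 is a 16-bit lane mask; once moved to a GPR with vmrs, "any
   lane set" is just an integer compare against zero.  */
static int any_lane_set (unsigned short p0)
{
  return p0 != 0;	/* the cbnz after vmrs */
}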

For e.g.

void f1 ()
{
  for (int i = 0; i < N; i++)
{
  b[i] += a[i];
  if (a[i] > 0)
break;
}
}

For 128-bit vectors we generate:

vcmp.s32	gt, q3, q1
vmrs	r3, p0	@ movhi
cbnz	r3, .L2

MVE does not have 64-bit vector comparisons, as such that is also not supported.

Bootstrapped arm-none-linux-gnueabihf and regtested with
-march=armv8.1-m.main+mve -mfpu=auto and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/arm/arm.cc (arm_rtx_costs_internal): Update costs for pred 0
compares.
* config/arm/mve.md (cbranch<mode>4): New.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp (vect_early_break): Add MVE.
* gcc.target/arm/mve/vect-early-break-cbranch.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index 
38f0839de1c75547c259ac3d655fcfc14e7208a2..15e65c15cb3cb6f70161787e84b255a24eb51e32
 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -11883,6 +11883,15 @@ arm_rtx_costs_internal (rtx x, enum rtx_code code, 
enum rtx_code outer_code,
   || TARGET_HAVE_MVE)
  && simd_immediate_valid_for_move (x, mode, NULL, NULL))
*cost = COSTS_N_INSNS (1);
+  else if (TARGET_HAVE_MVE
+  && outer_code == COMPARE
+  && VALID_MVE_PRED_MODE (mode))
+   /* MVE allows very limited instructions on VPT.P0, however comparisons
+  to 0 do not require us to materialize this constant or require a
+  predicate comparison as we can go through SImode.  For that reason
+  allow P0 CMP 0 as a cheap operation such that the 0 isn't forced to
+  registers as we can't compare two predicates.  */
+   *cost = COSTS_N_INSNS (1);
   else
*cost = COSTS_N_INSNS (4);
   return true;
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 
74909ce47e132c22a94f7d9cd3a0921b38e33051..95d40770ecc25f9eb251eba38306dd43cbebfb3f
 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -6880,6 +6880,21 @@ (define_expand "vcond_mask_"
   DONE;
 })
 
+(define_expand "cbranch4"
+  [(set (pc) (if_then_else
+ (match_operator 0 "expandable_comparison_operator"
+  [(match_operand:MVE_7 1 "register_operand")
+   (match_operand:MVE_7 2 "zero_operand")])
+ (label_ref (match_operand 3 "" ""))
+ (pc)))]
+  "TARGET_HAVE_MVE"
+{
+  rtx val = gen_reg_rtx (SImode);
+  emit_move_insn (val, gen_lowpart (SImode, operands[1]));
+  emit_jump_insn (gen_cbranchsi4 (operands[0], val, const0_rtx, operands[3]));
+  DONE;
+})
+
 ;; Reinterpret operand 1 in operand 0's mode, without changing its contents.
 (define_expand "@arm_mve_reinterpret"
   [(set (match_operand:MVE_vecs 0 "register_operand")
diff --git a/gcc/testsuite/gcc.target/arm/mve/vect-early-break-cbranch.c 
b/gcc/testsuite/gcc.target/arm/mve/vect-early-break-cbranch.c
new file mode 100644
index 
..c3b8506dca0b2b044e6869a6c8259d663c1ff930
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/vect-early-break-cbranch.c
@@ -0,0 +1,117 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_1m_mve_ok } */
+/* { dg-add-options arm_v8_1m_mve } */
+/* { dg-options "-O3" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+#define N 640
+int a[N] = {0};
+int b[N] = {0};
+
+/*
+** f1:
+** ...
+**	vcmp.s32	gt, q[0-9]+, q[0-9]+
+**	vmrs	r[0-9]+, p0	@ movhi
+**	cbnz	r[0-9]+, \.L[0-9]+
+** ...
+*/
+void f1 ()
+{
+  for (int i = 0; i < N; i++)
+{
+  b[i] += a[i];
+  if (a[i] > 0)
+   break;
+}
+}
+
+/*
+** f2:
+** ...
+**	vcmp.s32	ge, q[0-9]+, q[0-9]+
+**	vmrs	r[0-9]+, p0	@ movhi
+**	cbnz	r[0-9]+, \.L[0-9]+
+** ...
+*/
+void f2 ()
+{
+  for (int i = 0; i < N; i++)
+{
+  b[i] += a[i];
+  if (a[i] >= 0)
+   break;
+}
+}
+
+/*
+** f3:
+** ...
+**	vcmp.i32	eq, q[0-9]+, q[0-9]+
+**	vmrs	r[0-9]+, p0	@ movhi
+**	cbnz	r[0-9]+, \.L[0-9]+
+** 

[PATCH 17/19]AArch64 Add optimization for vector cbranch combining SVE and Advanced SIMD

2023-06-28 Thread Tamar Christina via Gcc-patches
Hi All,

Advanced SIMD lacks flag setting vector comparisons which SVE adds.  Since 
machines
with SVE also support Advanced SIMD we can use the SVE comparisons to perform 
the
operation in cases where SVE codegen is allowed, but the vectorizer has decided
to generate Advanced SIMD because of loop costing.

e.g. for

void f1 (int x)
{
  for (int i = 0; i < N; i++)
{
  b[i] += a[i];
  if (a[i] != x)
break;
}
}

We currently generate:

cmeq	v31.4s, v31.4s, v28.4s
uminp	v31.4s, v31.4s, v31.4s
fmov	x5, d31
cbz	x5, .L2

and after this patch:

ptrue   p7.b, vl16
...
cmpne   p15.s, p7/z, z31.s, z28.s
b.any   .L2

Because we need to lift the predicate creation to outside of the loop we need
to expand the predicate early; however, in the cbranch expansion we don't see
the outer compare which we need to consume.

For this reason the expansion is two fold, when expanding the cbranch we emit an
SVE predicated comparison and later on during combine we match the SVE and NEON
comparison while also consuming the ptest.

Unfortunately *aarch64_pred_cmpne<mode>_neon_ptest is needed because for some
reason combine destroys the NOT and transforms it into a plus and -1.

For the straight SVE ones, we seem to fail to eliminate the ptest in these cases
but that's a separate optimization

Test show that I'm missing a few, but before I write the patterns for them, are
these OK?

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (cbranch<mode>4): Update with SVE.
* config/aarch64/aarch64-sve.md
(*aarch64_pred_cmp_neon_ptest,
*aarch64_pred_cmpeq_neon_ptest,
*aarch64_pred_cmpne_neon_ptest): New.
(aarch64_ptest): Rename to...
(@aarch64_ptest): ... This.
* genemit.cc: Include rtx-vector-builder.h.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/vect-early-break-cbranch_1.c: New test.
* gcc.target/aarch64/sve/vect-early-break-cbranch_2.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 
b1a2c617d7d4106ab725d53a5d0b5c2fb61a0c78..75cb5d6f7f92b70fed8762fe64e23f0c05a99c99
 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3843,31 +3843,59 @@ (define_expand "cbranch<mode>4"
   "TARGET_SIMD"
 {
   auto code = GET_CODE (operands[0]);
-  rtx tmp = operands[1];
 
-  /* If comparing against a non-zero vector we have to do a comparison first
- so we can have a != 0 comparison with the result.  */
-  if (operands[2] != CONST0_RTX (<MODE>mode))
-    emit_insn (gen_vec_cmp<mode><mode> (tmp, operands[0], operands[1],
-					operands[2]));
-
-  /* For 64-bit vectors we need no reductions.  */
-  if (known_eq (128, GET_MODE_BITSIZE (<MODE>mode)))
+  /* If SVE is available, lets borrow some instructions.  We will optimize
+ these further later in combine.  */
+  if (TARGET_SVE)
 {
-  /* Always reduce using a V4SI.  */
-  rtx reduc = gen_lowpart (V4SImode, tmp);
-  rtx res = gen_reg_rtx (V4SImode);
-  emit_insn (gen_aarch64_umaxpv4si (res, reduc, reduc));
-  emit_move_insn (tmp, gen_lowpart (<MODE>mode, res));
+  machine_mode full_mode = aarch64_full_sve_mode (<MODE>mode).require ();
+  rtx in1 = lowpart_subreg (full_mode, operands[1], <MODE>mode);
+  rtx in2 = lowpart_subreg (full_mode, operands[2], <MODE>mode);
+
+  machine_mode pred_mode = aarch64_sve_pred_mode (full_mode);
+  rtx_vector_builder builder (VNx16BImode, 16, 2);
+  for (unsigned int i = 0; i < 16; ++i)
+   builder.quick_push (CONST1_RTX (BImode));
+  for (unsigned int i = 0; i < 16; ++i)
+   builder.quick_push (CONST0_RTX (BImode));
+  rtx ptrue = force_reg (VNx16BImode, builder.build ());
+  rtx cast_ptrue = gen_lowpart (pred_mode, ptrue);
+  rtx ptrue_flag = gen_int_mode (SVE_KNOWN_PTRUE, SImode);
+
+  rtx tmp = gen_reg_rtx (pred_mode);
+  aarch64_expand_sve_vec_cmp_int (tmp, code, in1, in2);
+  emit_insn (gen_aarch64_ptest (pred_mode, ptrue, cast_ptrue, ptrue_flag,
+				tmp));
+  operands[1] = gen_rtx_REG (CC_NZCmode, CC_REGNUM);
+  operands[2] = const0_rtx;
 }
+  else
+{
+  rtx tmp = operands[1];
 
-  rtx val = gen_reg_rtx (DImode);
-  emit_move_insn (val, gen_lowpart (DImode, tmp));
+  /* If comparing against a non-zero vector we have to do a comparison first
+	 so we can have a != 0 comparison with the result.  */
+  if (operands[2] != CONST0_RTX (<MODE>mode))
+	emit_insn (gen_vec_cmp<mode><mode> (tmp, operands[0], operands[1],
+					    operands[2]));
 
-  rtx cc_reg = aarch64_gen_compare_reg (code, val, const0_rtx);
-  rtx cmp_rtx = gen_rtx_fmt_ee (code, DImode, cc_reg, const0_rtx);
-  emit_jump_insn (gen_condjump (cmp_rtx, cc_reg, operands[3]));
-  DONE;
+  /* For 

[PATCH 13/19]middle-end testsuite: un-xfail TSVC loops that check for exit control flow vectorization

2023-06-28 Thread Tamar Christina via Gcc-patches
Hi All,

I didn't want these to get lost in the noise of updates.

The following three tests now correctly work for targets that have an
implementation of cbranch for vectors so XFAILs are conditionally removed gated
on vect_early_break support.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/testsuite/ChangeLog:

* gcc.dg/vect/tsvc/vect-tsvc-s332.c: Remove xfail when early break
supported.
* gcc.dg/vect/tsvc/vect-tsvc-s481.c: Likewise.
* gcc.dg/vect/tsvc/vect-tsvc-s482.c: Likewise.

--- inline copy of patch -- 
diff --git a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s332.c 
b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s332.c
index 
3fd490b3797d9f033c8804b813ee6e222aa45a3b..f3227bf064856c800d3152e62d2c4921bbe0d062
 100644
--- a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s332.c
+++ b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s332.c
@@ -49,4 +49,4 @@ int main (int argc, char **argv)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail *-*-* } } } 
*/
+/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail { ! 
vect_early_break } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s481.c 
b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s481.c
index 
bf98e173d2e6315ffc45477642eab7f9441c4376..441fdb2a41969c7beaf90714474802a87c0e6d04
 100644
--- a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s481.c
+++ b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s481.c
@@ -39,4 +39,4 @@ int main (int argc, char **argv)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail *-*-* } } } 
*/
+/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail { ! 
vect_early_break } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s482.c 
b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s482.c
index 
c4e26806292af03d59d5b9dc13777ba36831c7fc..5f2d2bf96c5bfc77e7c788ceb3f6d6beb677a367
 100644
--- a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s482.c
+++ b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s482.c
@@ -37,4 +37,4 @@ int main (int argc, char **argv)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail *-*-* } } } 
*/
+/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail { ! 
vect_early_break } } } } */









[PATCH 12/19]middle-end: implement loop peeling and IV updates for early break.

2023-06-28 Thread Tamar Christina via Gcc-patches
Hi All,

This patch updates the peeling code to maintain LCSSA during peeling.
The rewrite also naturally takes into account multiple exits and so it didn't
make sense to split them off.

For the purposes of peeling the only change for multiple exits is that the
secondary exits are all wired to the start of the new loop preheader when doing
epilogue peeling.

When doing prologue peeling the CFG is kept intact.

For both epilogue and prologue peeling we wire any PHI nodes that escape the
first loop through to the second loop if flow_loops is specified.  The reason
for this conditionality is because
slpeel_tree_duplicate_loop_to_edge_cfg is used in the compiler in 3 ways:
  - prologue peeling
  - epilogue peeling
  - loop distribution

for the last case the loops should remain independent, and so not be connected.
Because of this propagation of only the used phi nodes, get_current_def can
be used to easily find the previous definitions.  However live statements that are
not used inside the loop itself are not propagated (since if unused, the moment
we add the guard in between the two loops the value across the bypass edge can
be wrong if the loop has been peeled.)

This is dealt with easily enough in find_guard_arg.

For multiple exits, while we are in LCSSA form, and have a correct DOM tree, the
moment we add the guard block we will change the dominators again.  To deal with
this slpeel_tree_duplicate_loop_to_edge_cfg can optionally return the blocks to
update without having to recompute the list of blocks to update again.

When multiple exits and doing epilogue peeling we will also temporarily have an
incorrect VUSES chain for the secondary exits as it anticipates the final result
after the VDEFs have been moved.  This will thus be corrected once the code
motion is applied.

Lastly by doing things this way we can remove the helper functions that
previously did lock step iterations to update things as it went along.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* tree-loop-distribution.cc (copy_loop_before): Pass flow_loops = false.
* tree-ssa-loop-niter.cc (loop_only_exit_p):  Fix bug when exit==null.
* tree-vect-loop-manip.cc (adjust_phi_and_debug_stmts): Add additional
assert.
(vect_set_loop_condition_normal): Skip modifying loop IV for multiple
exits.
(slpeel_tree_duplicate_loop_to_edge_cfg): Support multiple exit peeling.
(slpeel_can_duplicate_loop_p): Likewise.
(vect_update_ivs_after_vectorizer): Don't enter this...
(vect_update_ivs_after_early_break): ...but instead enter here.
(find_guard_arg): Update for new peeling code.
(slpeel_update_phi_nodes_for_loops): Remove.
(slpeel_update_phi_nodes_for_guard2): Remove hardcoded edge 0 checks.
(slpeel_update_phi_nodes_for_lcssa): Remove.
(vect_do_peeling): Fix VF for multiple exits and force epilogue.
* tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Initialize
non_break_control_flow and early_breaks.
(vect_need_peeling_or_partial_vectors_p): Force partial vector if
multiple exits and VLA.
(vect_analyze_loop_form): Support inner loop multiple exits.
(vect_create_loop_vinfo): Set LOOP_VINFO_EARLY_BREAKS.
(vect_create_epilog_for_reduction):  Update live phi nodes.
(vectorizable_live_operation): Ignore live operations in vector loop
when multiple exits.
(vect_transform_loop): Force unrolling for VF loops and multiple exits.
* tree-vect-stmts.cc (vect_stmt_relevant_p): Analyze ctrl statements.
(vect_mark_stmts_to_be_vectorized): Check for non-exit control flow and
analyze gcond params.
(vect_analyze_stmt): Support gcond.
* tree-vectorizer.cc (pass_vectorize::execute): Support multiple exits
in RPO pass.
* tree-vectorizer.h (enum vect_def_type): Add vect_early_exit_def.
(LOOP_VINFO_EARLY_BREAKS, LOOP_VINFO_GENERAL_CTR_FLOW): New.
(loop_vec_info_for_loop): Change to const and static.
(is_loop_header_bb_p): Drop assert.
(slpeel_can_duplicate_loop_p): Update prototype.
(class loop): Add early_breaks and non_break_control_flow.

--- inline copy of patch -- 
diff --git a/gcc/tree-loop-distribution.cc b/gcc/tree-loop-distribution.cc
index 
97879498db46dd3c34181ae9aa6e5476004dd5b5..d790ce5fffab3aa3dfc40d833a968314a4442b9e
 100644
--- a/gcc/tree-loop-distribution.cc
+++ b/gcc/tree-loop-distribution.cc
@@ -948,7 +948,7 @@ copy_loop_before (class loop *loop, bool 
redirect_lc_phi_defs)
   edge preheader = loop_preheader_edge (loop);
 
   initialize_original_copy_tables ();
-  res = slpeel_tree_duplicate_loop_to_edge_cfg (loop, NULL, preheader);
+  res = slpeel_tree_duplicate_loop_to_edge_cfg (loop, NULL, preheader, false);
   gcc_assert (res != NULL);
 
   /* When a not last partition is 

[PATCH 10/19]middle-end: implement vectorizable_early_break.

2023-06-28 Thread Tamar Christina via Gcc-patches
Hi All,

This implements vectorizable_early_exit which is used as the codegen part of
vectorizing a gcond.

For the most part it shares the majority of the code with
vectorizable_comparison with addition that it needs to be able to reduce
multiple resulting statements into a single one for use in the gcond, and also
needs to be able to perform masking on the comparisons.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* tree-vect-stmts.cc (vectorizable_comparison_1): Support stmts without
lhs.
(vectorizable_early_exit): New.
(vect_analyze_stmt, vect_transform_stmt): Use it.
(vect_is_simple_use, vect_get_vector_types_for_stmt): Support gcond.

--- inline copy of patch -- 
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 
f3e33cd4ed125b9564ca81acd197693fc3457c31..87c4353fa5180fcb7f60b192897456cf24f3fdbe
 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -11330,8 +11330,186 @@ vectorizable_condition (vec_info *vinfo,
   return true;
 }
 
-/* vectorizable_comparison.
+static bool
+vectorizable_comparison_1 (vec_info *, tree, stmt_vec_info, tree_code,
+  gimple_stmt_iterator *, gimple **, slp_tree,
+  stmt_vector_for_cost *);
+
+/* Check to see if the current early break given in STMT_INFO is valid for
+   vectorization.  */
+
+static bool
+vectorizable_early_exit (vec_info *vinfo, stmt_vec_info stmt_info,
+gimple_stmt_iterator *gsi, gimple **vec_stmt,
+slp_tree slp_node, stmt_vector_for_cost *cost_vec)
+{
+  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
+  if (!loop_vinfo
+      || !is_a <gcond *> (STMT_VINFO_STMT (stmt_info)))
+return false;
+
+  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_early_exit_def)
+return false;
+
+  if (!STMT_VINFO_RELEVANT_P (stmt_info))
+return false;
+
+  gimple_match_op op;
+  if (!gimple_extract_op (stmt_info->stmt, &op))
+gcc_unreachable ();
+  gcc_assert (op.code.is_tree_code ());
+  auto code = tree_code (op.code);
+
+  tree vectype_out = STMT_VINFO_VECTYPE (stmt_info);
+  gcc_assert (vectype_out);
+
+  stmt_vec_info operand0_info
+= loop_vinfo->lookup_stmt (SSA_NAME_DEF_STMT (op.ops[0]));
+  if (!operand0_info)
+return false;
+  /* If we're in a pattern get the type of the original statement.  */
+  if (STMT_VINFO_IN_PATTERN_P (operand0_info))
+operand0_info = STMT_VINFO_RELATED_STMT (operand0_info);
+  tree vectype_op = STMT_VINFO_VECTYPE (operand0_info);
+
+  tree truth_type = truth_type_for (vectype_op);
+  machine_mode mode = TYPE_MODE (truth_type);
+  int ncopies;
+
+  if (slp_node)
+ncopies = 1;
+  else
+ncopies = vect_get_num_copies (loop_vinfo, truth_type);
+
+  vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
+  bool masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
+
+  /* Analyze only.  */
+  if (!vec_stmt)
+{
+  if (direct_optab_handler (cbranch_optab, mode) == CODE_FOR_nothing)
+   {
+ if (dump_enabled_p ())
+ dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+  "can't vectorize early exit because the "
+  "target doesn't support flag setting vector "
+  "comparisons.\n");
+ return false;
+   }
+
+  if (!expand_vec_cmp_expr_p (vectype_op, truth_type, NE_EXPR))
+   {
+ if (dump_enabled_p ())
+ dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+  "can't vectorize early exit because the "
+  "target does not support boolean vector "
+  "comparisons for type %T.\n", truth_type);
+ return false;
+   }
+
+  if (ncopies > 1
+ && direct_optab_handler (ior_optab, mode) == CODE_FOR_nothing)
+   {
+ if (dump_enabled_p ())
+ dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+  "can't vectorize early exit because the "
+  "target does not support boolean vector OR for "
+  "type %T.\n", truth_type);
+ return false;
+   }
+
+  if (!vectorizable_comparison_1 (vinfo, truth_type, stmt_info, code, gsi,
+ vec_stmt, slp_node, cost_vec))
+   return false;
 
+  if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
+   vect_record_loop_mask (loop_vinfo, masks, ncopies, truth_type, NULL);
+
+  return true;
+}
+
+  /* Transform.  */
+
+  tree new_temp = NULL_TREE;
+  gimple *new_stmt = NULL;
+
+  if (dump_enabled_p ())
+dump_printf_loc (MSG_NOTE, vect_location, "transform early-exit.\n");
+
+  if (!vectorizable_comparison_1 (vinfo, truth_type, stmt_info, code, gsi,
+ vec_stmt, slp_node, cost_vec))
+gcc_unreachable ();
+
+  

[PATCH 11/19]middle-end: implement code motion for early break.

2023-06-28 Thread Tamar Christina via Gcc-patches
Hi All,

When performing early break vectorization we need to be sure that the vector
operations are safe to perform.  A simple example is e.g.

 for (int i = 0; i < N; i++)
 {
   vect_b[i] = x + i;
   if (vect_a[i]*2 != x)
 break;
   vect_a[i] = x;
 }

where the store to vect_b is not allowed to be executed unconditionally since
if we exit through the early break it wouldn't have been done for the full VF
iteration.

Effectively, the code motion determines:
  - is it safe/possible to vectorize the function
  - what updates to the VUSES should be performed if we do
  - Which statements need to be moved
  - Which statements can't be moved:
* values that are live must be reachable through all exits
* values that aren't single use and shared by the use/def chain of the cond
  - The final insertion point of the instructions.  In the case we have
multiple early exit statements this should be the one closest to the loop
latch itself.

After motion the loop above is:

 for (int i = 0; i < N; i++)
 {
   ... y = x + i;
   if (vect_a[i]*2 != x)
 break;
   vect_b[i] = y;
   vect_a[i] = x;

 }

The operation is split into two: during data-ref analysis we determine the
validity of the operation and generate a worklist of actions to perform if we
vectorize.

After peeling and just before statement transformation we replay this worklist,
which moves the statements and updates bookkeeping only in the main loop that's
to be vectorized.  This includes updating of USES in exit blocks.
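
A rough sketch of this analyze/replay split (every name below is invented for
exposition; none of it is the patch's actual API):

  /* Illustrative sketch only.  */
  #include <stddef.h>

  struct stmt_ref { int id; };          /* stand-in for a gimple stmt */

  struct early_break_plan
  {
    struct stmt_ref to_move[32];        /* statements to sink past the break */
    size_t n_to_move;
  };

  /* Phase 1, during data-ref analysis: validate and record what would
     move, but mutate nothing -- vectorization may still be abandoned.  */
  static void
  analyze (struct early_break_plan *plan, const struct stmt_ref *unsafe,
           size_t n)
  {
    plan->n_to_move = 0;
    for (size_t i = 0; i < n && i < 32; i++)
      plan->to_move[plan->n_to_move++] = unsafe[i];  /* e.g. stores before the gcond */
  }

  /* Phase 2, after peeling: replay the plan on the main loop only, sinking
     each recorded statement below the last early-break gcond and then
     updating the VUSEs in the exit blocks.  */
  static void
  replay (struct early_break_plan *plan)
  {
    for (size_t i = 0; i < plan->n_to_move; i++)
      (void) plan->to_move[i];          /* move stmt, fix bookkeeping */
  }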

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* tree-vect-data-refs.cc (validate_early_exit_stmts): New.
(vect_analyze_data_ref_dependences): Use it.
* tree-vect-loop.cc (move_early_exit_stmts): New.
(vect_transform_loop): Use it.
* tree-vectorizer.h (LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS,
LOOP_VINFO_EARLY_BRK_DEST_BB, LOOP_VINFO_EARLY_BRK_VUSES): New.

--- inline copy of patch -- 
diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index 
fcc950f528b2d1e044be12424c2df11f692ee8ba..240bd7a86233f6b907816f812681e4cd778ecaae
 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -568,6 +568,278 @@ vect_analyze_data_ref_dependence (struct 
data_dependence_relation *ddr,
   return opt_result::success ();
 }
 
+/* This function tries to validate whether an early break vectorization
+   is possible for the current instruction sequence.  Returns True if
+   possible, otherwise False.
+
+   Requirements:
+ - Any memory access must be to a fixed size buffer.
+ - There must not be any loads and stores to the same object.
+ - Multiple loads are allowed as long as they don't alias.
+
+   NOTE:
+ This implementation is very conservative.  Any overlapping loads/stores
+ that take place before the early break statement get rejected aside from
+ WAR dependencies.
+
+ i.e.:
+
+   a[i] = 8
+   c = a[i]
+   if (b[i])
+ ...
+
+   is not allowed, but
+
+   c = a[i]
+   a[i] = 8
+   if (b[i])
+ ...
+
+   is, which is the common case.
+
+   Arguments:
+ - LOOP_VINFO: loop information for the current loop.
+ - CHAIN: Currently detected sequence of instructions that need to be moved
+ if we are to vectorize this early break.
+ - FIXED: Sequences of SSA_NAMEs that must not be moved, they are 
reachable from
+ one or more cond conditions.  If this set overlaps with CHAIN 
then FIXED
+ takes precedence.  This deals with non-single use cases.
+ - LOADS: List of all loads found during traversal.
+ - BASES: List of all load data references found during traversal.
+ - GSTMT: Current position to inspect for validity.  The sequence
+ will be moved upwards from this point.
+ - REACHING_VUSE: The dominating VUSE found so far.
+ - CURRENT_VDEF: The last VDEF we've seen.  These are updated in
+ pre-order and updated in post-order after moving the
+ instruction.  */
+
+static bool
+validate_early_exit_stmts (loop_vec_info loop_vinfo, hash_set<tree> *chain,
+  hash_set<tree> *fixed, vec<tree> *loads,
+  vec<data_reference *> *bases, tree *reaching_vuse,
+  tree *current_vdef, gimple_stmt_iterator *gstmt,
+  hash_map<tree, tree> *renames)
+{
+  if (gsi_end_p (*gstmt))
+return true;
+
+  gimple *stmt = gsi_stmt (*gstmt);
+  if (gimple_has_ops (stmt))
+{
+  tree dest = NULL_TREE;
+  /* Try to find the SSA_NAME being defined.  For Statements with an LHS
+use the LHS, if not, assume that the first argument of a call is the
+value being defined.  e.g. MASKED_LOAD etc.  */
+  if (gimple_has_lhs (stmt))
+   {
+ if (is_gimple_assign (stmt))
+   dest = gimple_assign_lhs (stmt);
+ else if (const gcall *call = dyn_cast <const gcall *> (stmt))

[PATCH 9/19]AArch64 middle-end: refactor vectorizable_comparison to make the main body re-usable.

2023-06-28 Thread Tamar Christina via Gcc-patches
Hi All,

Vectorization of a gcond starts off essentially the same as vectorizing a
comparison, with the only difference being how the operands are extracted.

This refactors vectorizable_comparison such that we now have a generic function
that can be used from vectorizable_early_break.  The refactoring splits the
gassign checks and actual validation/codegen off to a helper function.

No change in functionality expected.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* tree-vect-stmts.cc (vectorizable_comparison): Refactor, splitting body
to ...
(vectorizable_comparison_1): ...This.

--- inline copy of patch -- 
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 
ae24f3e66e63d9bd9763284a47fb2c911335c4c1..f3e33cd4ed125b9564ca81acd197693fc3457c31
 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -11332,21 +11332,22 @@ vectorizable_condition (vec_info *vinfo,
 
 /* vectorizable_comparison.
 
-   Check if STMT_INFO is comparison expression that can be vectorized.
+/* Helper of vectorizable_comparison.
+
+   Check if STMT_INFO is comparison expression CODE that can be vectorized.
If VEC_STMT is also passed, vectorize STMT_INFO: create a vectorized
comparison, put it in VEC_STMT, and insert it at GSI.
 
Return true if STMT_INFO is vectorizable in this way.  */
 
 static bool
-vectorizable_comparison (vec_info *vinfo,
-stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
-gimple **vec_stmt,
-slp_tree slp_node, stmt_vector_for_cost *cost_vec)
+vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
+  stmt_vec_info stmt_info, tree_code code,
+  gimple_stmt_iterator *gsi, gimple **vec_stmt,
+  slp_tree slp_node, stmt_vector_for_cost *cost_vec)
 {
   tree lhs, rhs1, rhs2;
   tree vectype1 = NULL_TREE, vectype2 = NULL_TREE;
-  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
   tree vec_rhs1 = NULL_TREE, vec_rhs2 = NULL_TREE;
   tree new_temp;
   loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
@@ -11354,7 +11355,7 @@ vectorizable_comparison (vec_info *vinfo,
   int ndts = 2;
   poly_uint64 nunits;
   int ncopies;
-  enum tree_code code, bitop1 = NOP_EXPR, bitop2 = NOP_EXPR;
+  enum tree_code bitop1 = NOP_EXPR, bitop2 = NOP_EXPR;
   int i;
   bb_vec_info bb_vinfo = dyn_cast <bb_vec_info> (vinfo);
   vec<tree> vec_oprnds0 = vNULL;
@@ -11377,14 +11378,6 @@ vectorizable_comparison (vec_info *vinfo,
 ncopies = vect_get_num_copies (loop_vinfo, vectype);
 
   gcc_assert (ncopies >= 1);
-  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def)
-return false;
-
-  gassign *stmt = dyn_cast <gassign *> (stmt_info->stmt);
-  if (!stmt)
-return false;
-
-  code = gimple_assign_rhs_code (stmt);
 
   if (TREE_CODE_CLASS (code) != tcc_comparison)
 return false;
@@ -11499,7 +11492,6 @@ vectorizable_comparison (vec_info *vinfo,
  return false;
}
 
-  STMT_VINFO_TYPE (stmt_info) = comparison_vec_info_type;
   vect_model_simple_cost (vinfo, stmt_info,
  ncopies * (1 + (bitop2 != NOP_EXPR)),
  dts, ndts, slp_node, cost_vec);
@@ -11565,6 +11557,44 @@ vectorizable_comparison (vec_info *vinfo,
   return true;
 }
 
+/* vectorizable_comparison.
+
+   Check if STMT_INFO is comparison expression that can be vectorized.
+   If VEC_STMT is also passed, vectorize STMT_INFO: create a vectorized
+   comparison, put it in VEC_STMT, and insert it at GSI.
+
+   Return true if STMT_INFO is vectorizable in this way.  */
+
+static bool
+vectorizable_comparison (vec_info *vinfo,
+stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
+gimple **vec_stmt,
+slp_tree slp_node, stmt_vector_for_cost *cost_vec)
+{
+  bb_vec_info bb_vinfo = dyn_cast <bb_vec_info> (vinfo);
+
+  if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
+return false;
+
+  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def)
+return false;
+
+  gassign *stmt = dyn_cast <gassign *> (stmt_info->stmt);
+  if (!stmt)
+return false;
+
+  enum tree_code code = gimple_assign_rhs_code (stmt);
+  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
+  if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
+ vec_stmt, slp_node, cost_vec))
+return false;
+
+  if (!vec_stmt)
+STMT_VINFO_TYPE (stmt_info) = comparison_vec_info_type;
+
+  return true;
+}
+
 /* If SLP_NODE is nonnull, return true if vectorizable_live_operation
can handle all live statements in the node.  Otherwise return true
if STMT_INFO is not live or if vectorizable_live_operation can handle it.





[PATCH 8/19]middle-end: updated niters analysis to handle multiple exits.

2023-06-28 Thread Tamar Christina via Gcc-patches
Hi All,

For early break vectorization we have to update niters analysis to record and
analyze all exits of the loop, and so all conds.

The niters of the loop are still determined by the main/natural exit of the loop
as this is the O(n) bound.  For now we don't do much with the secondary conds,
but their assumptions can be used to generate versioning checks later.
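
For example (illustration mine, not from the patch), in

  for (int i = 0; i < 1024; i++)   /* main exit: niters = 1024, the O(n) bound */
    {
      if (a[i] == key)             /* secondary cond: analyzed and recorded,
                                      usable for versioning checks later */
        break;
      b[i] = a[i];
    }

only the i < 1024 exit feeds the niters computation; the a[i] == key cond is
recorded but not otherwise used yet.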

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* tree-vect-loop.cc (vect_get_loop_niters): Analyze all exits and return
all gconds.
(vect_analyze_loop_form): Update code checking for conds.
(vect_create_loop_vinfo): Handle having multiple conds.
(vect_analyze_loop): Release extra loop conds structures.
* tree-vectorizer.h (LOOP_VINFO_LOOP_CONDS,
LOOP_VINFO_LOOP_IV_COND): New.
(struct vect_loop_form_info): Add conds, loop_iv_cond.

--- inline copy of patch -- 
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 
55e69a7ca0b24e0872477141db6f74dbf90b7981..9065811b3b9c2a550baf44768603172b9e26b94b
 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -849,80 +849,106 @@ vect_fixup_scalar_cycles_with_patterns (loop_vec_info 
loop_vinfo)
in NUMBER_OF_ITERATIONSM1.  Place the condition under which the
niter information holds in ASSUMPTIONS.
 
-   Return the loop exit condition.  */
+   Return the loop exit conditions.  */
 
 
-static gcond *
+static vec<gcond *>
 vect_get_loop_niters (class loop *loop, tree *assumptions,
  tree *number_of_iterations, tree *number_of_iterationsm1)
 {
-  edge exit = single_exit (loop);
+  auto_vec<edge> exits = get_loop_exit_edges (loop);
+  vec<gcond *> conds;
+  conds.create (exits.length ());
   class tree_niter_desc niter_desc;
   tree niter_assumptions, niter, may_be_zero;
-  gcond *cond = get_loop_exit_condition (loop);
 
   *assumptions = boolean_true_node;
   *number_of_iterationsm1 = chrec_dont_know;
   *number_of_iterations = chrec_dont_know;
+
   DUMP_VECT_SCOPE ("get_loop_niters");
 
-  if (!exit)
-return cond;
+  if (exits.is_empty ())
+return conds;
 
-  may_be_zero = NULL_TREE;
-  if (!number_of_iterations_exit_assumptions (loop, exit, &niter_desc, NULL)
-  || chrec_contains_undetermined (niter_desc.niter))
-return cond;
+  if (dump_enabled_p ())
+dump_printf_loc (MSG_NOTE, vect_location, "Loop has %d exits.\n",
+exits.length ());
 
-  niter_assumptions = niter_desc.assumptions;
-  may_be_zero = niter_desc.may_be_zero;
-  niter = niter_desc.niter;
+  edge exit;
+  unsigned int i;
+  FOR_EACH_VEC_ELT (exits, i, exit)
+{
+  gcond *cond = get_edge_condition (exit);
+  if (cond)
+   conds.safe_push (cond);
 
-  if (may_be_zero && integer_zerop (may_be_zero))
-may_be_zero = NULL_TREE;
+  if (dump_enabled_p ())
+   dump_printf_loc (MSG_NOTE, vect_location, "Analyzing exit %d...\n", i);
 
-  if (may_be_zero)
-{
-  if (COMPARISON_CLASS_P (may_be_zero))
+  may_be_zero = NULL_TREE;
+  if (!number_of_iterations_exit_assumptions (loop, exit, &niter_desc, NULL)
+  || chrec_contains_undetermined (niter_desc.niter))
+   continue;
+
+  niter_assumptions = niter_desc.assumptions;
+  may_be_zero = niter_desc.may_be_zero;
+  niter = niter_desc.niter;
+
+  if (may_be_zero && integer_zerop (may_be_zero))
+   may_be_zero = NULL_TREE;
+
+  if (may_be_zero)
{
- /* Try to combine may_be_zero with assumptions, this can simplify
-computation of niter expression.  */
- if (niter_assumptions && !integer_nonzerop (niter_assumptions))
-   niter_assumptions = fold_build2 (TRUTH_AND_EXPR, boolean_type_node,
-niter_assumptions,
-fold_build1 (TRUTH_NOT_EXPR,
- boolean_type_node,
- may_be_zero));
+ if (COMPARISON_CLASS_P (may_be_zero))
+   {
+ /* Try to combine may_be_zero with assumptions, this can simplify
+computation of niter expression.  */
+ if (niter_assumptions && !integer_nonzerop (niter_assumptions))
+   niter_assumptions = fold_build2 (TRUTH_AND_EXPR, 
boolean_type_node,
+niter_assumptions,
+fold_build1 (TRUTH_NOT_EXPR,
+ boolean_type_node,
+ may_be_zero));
+ else
+   niter = fold_build3 (COND_EXPR, TREE_TYPE (niter), may_be_zero,
+build_int_cst (TREE_TYPE (niter), 0),
+rewrite_to_non_trapping_overflow (niter));
+
+ may_be_zero = NULL_TREE;
+  

[PATCH 7/19]middle-end: Refactor vectorizer loop conditionals and separate out IV to new variables

2023-06-28 Thread Tamar Christina via Gcc-patches
Hi All,

This patch splits the vectorizer's understanding of the main loop exit off
from the normal loop infrastructure.

Essentially we're relaxing the use of single_exit() in the vectorizer as we will
no longer have a single exit, and we need a well-defined split between the main
and secondary exits of loops for vectorization.

These new values were added to the loop class even though they're only used by
the vectorizer for a couple of reasons:
  - We need access to them in places where we have no loop_vinfo.
  - We only have a single loop_vinfo for each loop under consideration, however
that same loop can have different copies, e.g. peeled/versioned copies or
the scalar variant of the loop.  For each of these we still need to be able
to have a coherent exit definition.

For these reasons the placement in the loop class was the only way to keep the
bookkeeping together with the loops and avoid possibly expensive lookups.

For this version of the patch the `main` exit of a loop is defined as the exit
that is closest to the loop latch. This is stored in vec_loop_iv.  The remaining
exits which are relevant for the vectorizer are stored inside
vec_loop_alt_exits.
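
As an illustration (example mine), in

  for (int i = 0; i < n; i++)     /* i < n: natural exit, closest to the
                                     latch, recorded in vec_loop_iv */
    {
      if (a[i] == 42)             /* early break, recorded in
                                     vec_loop_alt_exits */
        break;
      b[i] = a[i];
    }

the edge leaving through i < n becomes vec_loop_iv and the break edge lands in
vec_loop_alt_exits.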

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* cfgloop.cc (alloc_loop): Initialize vec_loop_iv.
* cfgloop.h (class loop): Add vec_loop_iv and vec_loop_alt_exits.
* doc/loop.texi: Document get_edge_condition.
* tree-loop-distribution.cc (loop_distribution::distribute_loop):
Initialize vec_loop_iv since loop distributions calls loop peeling which
only understands vec_loop_iv now.
* tree-scalar-evolution.cc (get_edge_condition): New.
(get_loop_exit_condition): Refactor into get_edge_condition.
* tree-scalar-evolution.h (get_edge_condition): New.
* tree-vect-data-refs.cc (vect_enhance_data_refs_alignment): Update use
of single_exit.
* tree-vect-loop-manip.cc (vect_set_loop_condition_partial_vectors,
vect_set_loop_condition_normal, vect_set_loop_condition,
slpeel_tree_duplicate_loop_to_edge_cfg, slpeel_can_duplicate_loop_p,
find_loop_location, vect_update_ivs_after_vectorizer,
vect_gen_vector_loop_niters_mult_vf, find_guard_arg, vect_do_peeling):
Replace usages of single_exit.
(vec_init_exit_info): New.
* tree-vect-loop.cc (vect_analyze_loop_form,
vect_create_epilog_for_reduction, vectorizable_live_operation,
scale_profile_for_vect_loop, vect_transform_loop): New.
* tree-vectorizer.h (LOOP_VINFO_IV_EXIT, LOOP_VINFO_ALT_EXITS,
vec_init_exit_info): New.

--- inline copy of patch -- 
diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h
index 
e7ac2b5f3db55de3dbbab7bd2bfe08388f4ec533..cab82d7960e5be517bba2621f7f4888e7bf3c295
 100644
--- a/gcc/cfgloop.h
+++ b/gcc/cfgloop.h
@@ -272,6 +272,14 @@ public:
  the basic-block from being collected but its index can still be
  reused.  */
   basic_block former_header;
+
+  /* The controlling loop IV for the current loop when vectorizing.  This IV
+ controls the natural exits of the loop.  */
+  edge  GTY ((skip (""))) vec_loop_iv;
+
+  /* If the loop has multiple exits this structure contains the alternate
+ exits of the loop which are relevant for vectorization.  */
+  vec<edge> GTY ((skip (""))) vec_loop_alt_exits;
 };
 
 /* Set if the loop is known to be infinite.  */
diff --git a/gcc/cfgloop.cc b/gcc/cfgloop.cc
index 
ccda7415d7037e26048425b5d85f3633a39fd325..98123f7dce98227c8dffe4833e159fbb05596831
 100644
--- a/gcc/cfgloop.cc
+++ b/gcc/cfgloop.cc
@@ -355,6 +355,7 @@ alloc_loop (void)
   loop->nb_iterations_upper_bound = 0;
   loop->nb_iterations_likely_upper_bound = 0;
   loop->nb_iterations_estimate = 0;
+  loop->vec_loop_iv = NULL;
   return loop;
 }
 
diff --git a/gcc/doc/loop.texi b/gcc/doc/loop.texi
index 
b357e9de7bcb1898ab9dda25738b9f003ca6f9f5..4ba6bb2585c81f7af34943b0493b94d5c3a8bf60
 100644
--- a/gcc/doc/loop.texi
+++ b/gcc/doc/loop.texi
@@ -212,6 +212,7 @@ relation, and breadth-first search order, respectively.
 @code{NULL} if the loop has more than one exit.  You can only use this
 function if @code{LOOPS_HAVE_RECORDED_EXITS} is used.
 @item @code{get_loop_exit_edges}: Enumerates the exit edges of a loop.
+@item @code{get_edge_condition}: Get the condition belonging to an exit edge.
 @item @code{just_once_each_iteration_p}: Returns true if the basic block
 is executed exactly once during each iteration of a loop (that is, it
 does not belong to a sub-loop, and it dominates the latch of the loop).
diff --git a/gcc/tree-loop-distribution.cc b/gcc/tree-loop-distribution.cc
index 
cf7c197aaf7919a0ecd56a10db0a42f93707ca58..97879498db46dd3c34181ae9aa6e5476004dd5b5
 100644
--- a/gcc/tree-loop-distribution.cc
+++ b/gcc/tree-loop-distribution.cc
@@ -3042,6 +3042,24 @@ loop_distribution::distribute_loop (class loop *loop,
   return 0;
 }
 
+ 

[PATCH 6/19]middle-end: Don't enter piecewise expansion if VF is not constant.

2023-06-28 Thread Tamar Christina via Gcc-patches
Hi All,

expand_vector_piecewise does not support VLA expansion as it has a hard assert
on the type not being VLA.

Instead of just failing to expand, and so having the call marked unsupported, we
ICE.  This adjusts it so we don't and can gracefully handle the expansion in the
support checks.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* tree-vect-generic.cc (expand_vector_comparison): Skip piecewise if not
constant.

--- inline copy of patch -- 
diff --git a/gcc/tree-vect-generic.cc b/gcc/tree-vect-generic.cc
index 
df04a0db68da3222f43dd938f8e7adb186cd93c9..da1fd2f40d82a9fa301e6ed0b2f4c3c222d58a8d
 100644
--- a/gcc/tree-vect-generic.cc
+++ b/gcc/tree-vect-generic.cc
@@ -481,7 +481,7 @@ expand_vector_comparison (gimple_stmt_iterator *gsi, tree 
type, tree op0,
}
  t = gimplify_build1 (gsi, VIEW_CONVERT_EXPR, type, t);
}
-  else
+  else if (TYPE_VECTOR_SUBPARTS (type).is_constant ())
t = expand_vector_piecewise (gsi, do_compare, type,
 TREE_TYPE (TREE_TYPE (op0)), op0, op1,
 code, false);









[PATCH 5/19]middle-end: Enable bit-field vectorization to work correctly when we're vectoring inside conds

2023-06-28 Thread Tamar Christina via Gcc-patches
Hi All,

The bitfield vectorization support does not currently recognize bitfields inside
gconds.  This means they can't be used as conditions for early break
vectorization, which is functionality we require.

This adds support for them by explicitly matching and handling gcond as a
source.

Testcases are added in the testsuite update patch as the only way to get there
is with the early break vectorization.  See tests:

  - vect-early-break_20.c
  - vect-early-break_21.c
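
For reference, the shape of loop this enables is roughly (hand-written
illustration, not one of the committed tests):

  struct s { char a : 4; };

  int f (struct s *ptr, int n)
  {
    for (int i = 0; i < n; i++)
      if (ptr[i].a == 3)   /* bitfield read feeding the gcond directly */
        return i;          /* early exit */
    return -1;
  }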

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* tree-vect-patterns.cc (vect_init_pattern_stmt): Copy STMT_VINFO_TYPE
from original statement.
(vect_recog_bitfield_ref_pattern): Support bitfields in gcond.

Co-Authored-By:  Andre Vieira 

--- inline copy of patch -- 
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 
60bc9be6819af9bd28a81430869417965ba9d82d..c221b1d64449ce3b6c8864bbec4b17ddf938c2d6
 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -128,6 +128,7 @@ vect_init_pattern_stmt (vec_info *vinfo, gimple 
*pattern_stmt,
   STMT_VINFO_RELATED_STMT (pattern_stmt_info) = orig_stmt_info;
   STMT_VINFO_DEF_TYPE (pattern_stmt_info)
 = STMT_VINFO_DEF_TYPE (orig_stmt_info);
+  STMT_VINFO_TYPE (pattern_stmt_info) = STMT_VINFO_TYPE (orig_stmt_info);
   if (!STMT_VINFO_VECTYPE (pattern_stmt_info))
 {
   gcc_assert (!vectype
@@ -2488,27 +2489,37 @@ static gimple *
 vect_recog_bitfield_ref_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
 tree *type_out)
 {
-  gassign *first_stmt = dyn_cast <gassign *> (stmt_info->stmt);
+  gassign *conv_stmt = dyn_cast <gassign *> (stmt_info->stmt);
+  gcond *cond_stmt = dyn_cast <gcond *> (stmt_info->stmt);
 
-  if (!first_stmt)
-return NULL;
-
-  gassign *bf_stmt;
-  if (CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (first_stmt))
-  && TREE_CODE (gimple_assign_rhs1 (first_stmt)) == SSA_NAME)
+  gimple *bf_stmt = NULL;
+  tree cond_cst = NULL_TREE;
+  if (cond_stmt)
 {
-  gimple *second_stmt
-   = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (first_stmt));
-  bf_stmt = dyn_cast <gassign *> (second_stmt);
-  if (!bf_stmt
- || gimple_assign_rhs_code (bf_stmt) != BIT_FIELD_REF)
+  tree op = gimple_cond_lhs (cond_stmt);
+  if (TREE_CODE (op) != SSA_NAME)
+   return NULL;
+  bf_stmt = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (op));
+  cond_cst = gimple_cond_rhs (cond_stmt);
+  if (TREE_CODE (cond_cst) != INTEGER_CST)
return NULL;
 }
-  else
+  else if (conv_stmt
+  && CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (conv_stmt))
+  && TREE_CODE (gimple_assign_rhs1 (conv_stmt)) == SSA_NAME)
+{
+  gimple *second_stmt = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (conv_stmt));
+  bf_stmt = dyn_cast <gassign *> (second_stmt);
+}
+
+  if (!bf_stmt
+  || gimple_assign_rhs_code (bf_stmt) != BIT_FIELD_REF)
 return NULL;
 
   tree bf_ref = gimple_assign_rhs1 (bf_stmt);
   tree container = TREE_OPERAND (bf_ref, 0);
+  tree ret_type = cond_cst ? TREE_TYPE (container)
+  : TREE_TYPE (gimple_assign_lhs (conv_stmt));
 
   if (!bit_field_offset (bf_ref).is_constant ()
   || !bit_field_size (bf_ref).is_constant ()
@@ -2522,8 +2533,6 @@ vect_recog_bitfield_ref_pattern (vec_info *vinfo, 
stmt_vec_info stmt_info,
 
   gimple *use_stmt, *pattern_stmt;
   use_operand_p use_p;
-  tree ret = gimple_assign_lhs (first_stmt);
-  tree ret_type = TREE_TYPE (ret);
   bool shift_first = true;
   tree container_type = TREE_TYPE (container);
   tree vectype = get_vectype_for_scalar_type (vinfo, container_type);
@@ -2560,7 +2569,8 @@ vect_recog_bitfield_ref_pattern (vec_info *vinfo, 
stmt_vec_info stmt_info,
   /* If the only use of the result of this BIT_FIELD_REF + CONVERT is a
  PLUS_EXPR then do the shift last as some targets can combine the shift and
  add into a single instruction.  */
-  if (single_imm_use (gimple_assign_lhs (first_stmt), &use_p, &use_stmt))
+  if (conv_stmt
+  && single_imm_use (gimple_assign_lhs (conv_stmt), &use_p, &use_stmt))
 {
   if (gimple_code (use_stmt) == GIMPLE_ASSIGN
  && gimple_assign_rhs_code (use_stmt) == PLUS_EXPR)
@@ -2620,7 +2630,21 @@ vect_recog_bitfield_ref_pattern (vec_info *vinfo, 
stmt_vec_info stmt_info,
   NOP_EXPR, result);
 }
 
-  *type_out = STMT_VINFO_VECTYPE (stmt_info);
+  if (cond_cst)
+{
+  append_pattern_def_seq (vinfo, stmt_info, pattern_stmt, vectype);
+  pattern_stmt
+   = gimple_build_cond (gimple_cond_code (cond_stmt),
+gimple_get_lhs (pattern_stmt),
+fold_convert (ret_type, cond_cst),
+gimple_cond_true_label (cond_stmt),
+gimple_cond_false_label (cond_stmt));
+  *type_out = STMT_VINFO_VECTYPE (stmt_info);
+}
+  else
+*type_out
+  = get_vectype_for_scalar_type 

[PATCH 4/19]middle-end: Fix scale_loop_frequencies segfault on multiple-exits

2023-06-28 Thread Tamar Christina via Gcc-patches
Hi All,

There's an existing bug in loop frequency scaling where the if statement checks
to see if there's a single exit, and records a dump file note but then
continues.

It then tries to access the null pointer, which of course fails.

For multiple loop exits it's not really clear how to scale the exit
probabilities, as it's unknown which exit is most probable.

For that reason I ignore the exit edges during scaling but still adjust the
loop body.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* cfgloopmanip.cc (scale_loop_frequencies): Fix typo.
(scale_loop_profile): Don't access null pointer.

--- inline copy of patch -- 
diff --git a/gcc/cfgloopmanip.cc b/gcc/cfgloopmanip.cc
index 
6e09dcbb0b1864bc64ffd570a4b923f50c3819b5..b10ef3d2be82902ccd74e52a4318217b2db13bcb
 100644
--- a/gcc/cfgloopmanip.cc
+++ b/gcc/cfgloopmanip.cc
@@ -501,7 +501,7 @@ scale_loop_frequencies (class loop *loop, 
profile_probability p)
 /* Scale profile in LOOP by P.
If ITERATION_BOUND is non-zero, scale even further if loop is predicted
to iterate too many times.
-   Before caling this function, preheader block profile should be already
+   Before calling this function, preheader block profile should be already
scaled to final count.  This is necessary because loop iterations are
determined by comparing header edge count to latch ege count and thus
they need to be scaled synchronously.  */
@@ -597,14 +597,14 @@ scale_loop_profile (class loop *loop, profile_probability 
p,
   /* If latch exists, change its count, since we changed
 probability of exit.  Theoretically we should update everything from
 source of exit edge to latch, but for vectorizer this is enough.  */
-  if (loop->latch && loop->latch != e->src)
+  if (e && loop->latch && loop->latch != e->src)
loop->latch->count += count_delta;
 
   /* Scale the probabilities.  */
   scale_loop_frequencies (loop, p);
 
   /* Change latch's count back.  */
-  if (loop->latch && loop->latch != e->src)
+  if (e && loop->latch && loop->latch != e->src)
loop->latch->count -= count_delta;
 
   if (dump_file && (dump_flags & TDF_DETAILS))









[PATCH 2/19][front-end] C/C++ front-end: add pragma GCC novector

2023-06-28 Thread Tamar Christina via Gcc-patches
Hi All,

FORTRAN currently has a pragma NOVECTOR for indicating that vectorization should
not be applied to a particular loop.

ICC/ICX also has such a pragma for C and C++ called #pragma novector.

As part of this patch series I need a way to easily turn off vectorization of
particular loops, particularly for testsuite reasons.

This patch proposes a #pragma GCC novector that does the same for C and C++
as gfortran does for Fortran and what ICC/ICX does for C and C++.

I added only some basic tests here, but the next patch in the series uses this
in the testsuite in about ~800 tests.
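
For illustration, usage is (example mine, mirroring the placement of the
existing #pragma GCC unroll):

  void f (int *restrict a, int *restrict b, int n)
  {
  #pragma GCC novector
    for (int i = 0; i < n; i++)   /* this loop will not be auto-vectorized */
      a[i] += b[i];
  }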

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/c-family/ChangeLog:

* c-pragma.h (enum pragma_kind): Add PRAGMA_NOVECTOR.
* c-pragma.cc (init_pragma): Use it.

gcc/c/ChangeLog:

* c-parser.cc (c_parser_while_statement, c_parser_do_statement,
c_parser_for_statement, c_parser_statement_after_labels,
c_parse_pragma_novector, c_parser_pragma): Wire through novector and
default to false.

gcc/cp/ChangeLog:

* cp-tree.def (RANGE_FOR_STMT): Update comment.
* cp-tree.h (RANGE_FOR_NOVECTOR): New.
(cp_convert_range_for, finish_while_stmt_cond, finish_do_stmt,
finish_for_cond): Add novector param.
* init.cc (build_vec_init): Default novector to false.
* method.cc (build_comparison_op): Likewise.
* parser.cc (cp_parser_statement): Likewise.
(cp_parser_for, cp_parser_c_for, cp_parser_range_for,
cp_convert_range_for, cp_parser_iteration_statement,
cp_parser_omp_for_loop, cp_parser_pragma): Support novector.
(cp_parser_pragma_novector): New.
* pt.cc (tsubst_expr): Likewise.
* semantics.cc (finish_while_stmt_cond, finish_do_stmt,
finish_for_cond): Likewise.

gcc/ChangeLog:

* doc/extend.texi: Document it.
* tree-core.h (struct tree_base): Add lang_flag_7 and reduce spare0.
* tree.h (TREE_LANG_FLAG_7): New.

gcc/testsuite/ChangeLog:

* g++.dg/vect/vect-novector-pragma.cc: New test.
* gcc.dg/vect/vect-novector-pragma.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/c-family/c-pragma.h b/gcc/c-family/c-pragma.h
index 
9cc95ab3ee376628dbef2485b84e6008210fa8fc..99cf2e8bd1c05537c198470f1aaa0a5a9da4e576
 100644
--- a/gcc/c-family/c-pragma.h
+++ b/gcc/c-family/c-pragma.h
@@ -87,6 +87,7 @@ enum pragma_kind {
   PRAGMA_GCC_PCH_PREPROCESS,
   PRAGMA_IVDEP,
   PRAGMA_UNROLL,
+  PRAGMA_NOVECTOR,
 
   PRAGMA_FIRST_EXTERNAL
 };
diff --git a/gcc/c-family/c-pragma.cc b/gcc/c-family/c-pragma.cc
index 
0d2b333cebbed32423d5dc6fd2a3ac0ce0bf8b94..848a850b8e123ff1c6ae1ec4b7f8ccbd599b1a88
 100644
--- a/gcc/c-family/c-pragma.cc
+++ b/gcc/c-family/c-pragma.cc
@@ -1862,6 +1862,10 @@ init_pragma (void)
 cpp_register_deferred_pragma (parse_in, "GCC", "unroll", PRAGMA_UNROLL,
  false, false);
 
+  if (!flag_preprocess_only)
+cpp_register_deferred_pragma (parse_in, "GCC", "novector", PRAGMA_NOVECTOR,
+ false, false);
+
 #ifdef HANDLE_PRAGMA_PACK_WITH_EXPANSION
   c_register_pragma_with_expansion (0, "pack", handle_pragma_pack);
 #else
diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index 
24a6eb6e4596f32c477e3f1c3f98b9792f7bc92c..9d35fe68704c8aca197bcd4805a146c655959621
 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -1572,9 +1572,11 @@ static tree c_parser_c99_block_statement (c_parser *, 
bool *,
  location_t * = NULL);
 static void c_parser_if_statement (c_parser *, bool *, vec *);
 static void c_parser_switch_statement (c_parser *, bool *);
-static void c_parser_while_statement (c_parser *, bool, unsigned short, bool 
*);
-static void c_parser_do_statement (c_parser *, bool, unsigned short);
-static void c_parser_for_statement (c_parser *, bool, unsigned short, bool *);
+static void c_parser_while_statement (c_parser *, bool, unsigned short, bool,
+ bool *);
+static void c_parser_do_statement (c_parser *, bool, unsigned short, bool);
+static void c_parser_for_statement (c_parser *, bool, unsigned short, bool,
+   bool *);
 static tree c_parser_asm_statement (c_parser *);
 static tree c_parser_asm_operands (c_parser *);
 static tree c_parser_asm_goto_operands (c_parser *);
@@ -6644,13 +6646,13 @@ c_parser_statement_after_labels (c_parser *parser, bool 
*if_p,
  c_parser_switch_statement (parser, if_p);
  break;
case RID_WHILE:
- c_parser_while_statement (parser, false, 0, if_p);
+ c_parser_while_statement (parser, false, 0, false, if_p);
  break;
case RID_DO:
- c_parser_do_statement (parser, false, 0);
+ c_parser_do_statement (parser, false, 0, false);
  break;
case RID_FOR:
- c_parser_for_statement (parser, false, 0, if_p);

[PATCH 1/19]middle-end ifcvt: Support bitfield lowering of multiple-exit loops

2023-06-28 Thread Tamar Christina via Gcc-patches
Hi,

With the patch enabling the vectorization of early-breaks, we'd like to allow
bitfield lowering in such loops, which requires the relaxation of allowing
multiple exits when doing so.  In order to avoid a similar issue to PR107275,
the code that rejects loops with certain types of gimple_stmts was hoisted from
'if_convertible_loop_p_1' to 'get_loop_body_in_if_conv_order', to avoid trying
to lower bitfields in loops we are not going to vectorize anyway.

This also ensures 'ifcvt_local_dce' doesn't accidentally remove statements it
shouldn't as it will never come across them.  I made sure to add a comment to
make clear that there is a direct connection between the two and if we were to
enable vectorization of any other gimple statement we should make sure both
handle it.

NOTE: This patch was accepted before but never committed because it is a no-op
without the early break patch.   This is a respun version of Andre's patch and
rebased to changes in ifcvt and updated to handle multiple exits.

Bootstrapped and regression tested on aarch64-none-linux-gnu and
x86_64-pc-linux-gnu and no issues.

gcc/ChangeLog:

* tree-if-conv.cc (if_convertible_loop_p_1): Move check from here ...
(get_loop_body_in_if_conv_order): ... to here.
(if_convertible_loop_p): Remove single_exit check.
(tree_if_conversion): Move single_exit check to if-conversion part and
support multiple exits.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-bitfield-read-1-not.c: New test.
* gcc.dg/vect/vect-bitfield-read-2-not.c: New test.
* gcc.dg/vect/vect-bitfield-read-8.c: New test.
* gcc.dg/vect/vect-bitfield-read-9.c: New test.

Co-Authored-By:  Andre Vieira 

--- inline copy of patch -- 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1-not.c 
b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1-not.c
new file mode 100644
index 
..0d91067ebb27b1db2b2352975c43bce8b4171e3f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1-not.c
@@ -0,0 +1,60 @@
+/* { dg-require-effective-target vect_shift } */
+/* { dg-require-effective-target vect_long_long } */
+/* { dg-additional-options { "-fdump-tree-ifcvt-all" } } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+char a : 4;
+};
+
+#define N 32
+#define ELT0 {0}
+#define ELT1 {1}
+#define ELT2 {2}
+#define ELT3 {3}
+#define RES 56
+struct s A[N]
+  = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+  ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+  ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+  ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3};
+
+int __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+int res = 0;
+for (int i = 0; i < n; ++i)
+  {
+   switch (ptr[i].a)
+ {
+ case 0:
+   res += ptr[i].a + 1;
+   break;
+ case 1:
+ case 2:
+ case 3:
+   res += ptr[i].a;
+   break;
+ default:
+   return 0;
+ }
+  }
+return res;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  if (f(&A[0], N) != RES)
+abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-not "Bitfield OK to lower." "ifcvt" } } */
+
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-2-not.c 
b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-2-not.c
new file mode 100644
index 
..4ac7b3fc0dfd1c9d0b5e94a2ba6a745545577ec1
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-2-not.c
@@ -0,0 +1,49 @@
+/* { dg-require-effective-target vect_shift } */
+/* { dg-require-effective-target vect_long_long } */
+/* { dg-additional-options { "-fdump-tree-ifcvt-all" } } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+extern void abort(void);
+
+struct s {
+char a : 4;
+};
+
+#define N 32
+#define ELT0 {0}
+#define ELT1 {1}
+#define ELT2 {2}
+#define ELT3 {3}
+#define RES 48
+struct s A[N]
+  = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+  ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+  ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
+  ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3};
+
+int __attribute__ ((noipa))
+f(struct s *ptr, unsigned n) {
+int res = 0;
+for (int i = 0; i < n; ++i)
+  {
+   asm volatile ("" ::: "memory");
+   res += ptr[i].a;
+  }
+return res;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  if (f(&A[0], N) != RES)
+abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-not "Bitfield OK to lower." "ifcvt" } } */
+
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-8.c 
b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-8.c
new file mode 100644
index 
..52cfd33d937ae90f3fe9556716c90e098b768ac8
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-8.c
@@ -0,0 +1,49 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_shift 

[PATCH v5 0/19] Support early break/return auto-vectorization

2023-06-28 Thread Tamar Christina via Gcc-patches
Hi All,

This patch adds initial support for early break vectorization in GCC.
The support is added for any target that implements a vector cbranch optab,
this includes both fully masked and non-masked targets.

Depending on the operation, the vectorizer may also require support for boolean
mask reductions using Inclusive OR.  This is however only checked when the
comparison would produce multiple statements.

Concretely the kind of loops supported are of the forms:

 for (int i = 0; i < N; i++)
 {
   <statements1>
   if (<condition>)
     {
       ...
       <action>;
     }
   <statements2>
 }

where <action> can be:
 - break
 - return
 - goto

Any number of statements can be used before the <action> occurs.
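
Concretely (illustrative examples, mine), the break and return forms look
like:

  /* break */
  for (int i = 0; i < N; i++)
    {
      b[i] = a[i] + 1;
      if (a[i] == key)
        break;
    }

  /* return */
  for (int i = 0; i < N; i++)
    if (a[i] == key)
      return i;

with goto being analogous.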

Since this is an initial version for GCC 14 it has the following limitations and
features:

- Only fixed sized iterations and buffers are supported.  That is to say any
  vectors loaded or stored must be to statically allocated arrays with known
  sizes. N must also be known.  This limitation is because our primary target
  for this optimization is SVE.  For VLA SVE we can't easily do cross page
  iteraion checks. The result is likely to also not be beneficial. For that
  reason we punt support for variable buffers till we have First-Faulting
  support in GCC.
- any stores in <statements1> should not be to the same objects as in
  <condition>.  Loads are fine as long as they don't have the possibility to
  alias.  More concretely, we block RAW dependencies when the intermediate value
  can't be separated from the store, or the store itself can't be moved.
- The number of loop iterations must be known; this is just a temporary
  limitation that I intend to address in GCC 14 itself as follow-on patches.
- Prologue peeling, alignment peeling and loop versioning are supported.
- Fully masked loops, unmasked loops and partially masked loops are supported
- Any number of loop early exits are supported.
- The early exit must be before the natural loop exit/latch.  The vectorizer is
  designed in a way to propagate phi-nodes downwards.  As such supporting this
  inverted control flow is hard.
- No support for epilogue vectorization.  The only epilogue supported is the
  scalar final one.  Epilogue vectorization would also not be profitable.
- Early breaks are only supported for inner loop vectorization.

I have pushed a branch to refs/users/tnfchris/heads/gcc-14-early-break

With the help of IPA and LTO this still gets hit quite often.  During bootstrap
it hit rather frequently.  Additionally TSVC s332, s481 and s482 all pass now
since these are tests for support for early exit vectorization.

This implementation does not support completely handling the early break inside
the vector loop itself but instead supports adding checks such that if we know
that we have to exit in the current iteration then we branch to scalar code to
actually do the final VF iterations, which handles all the code in <action>.

The niters analysis and the majority of the vectorizer with hardcoded
single_exit have been updated to use the new vec_loop_iv value, which records
the exit the vectorizer wants to use as the main IV exit.

For niters this exit is what determines the overall iteration count, as
that is the O(iters) bound for the loop.

For the scalar loop we know that whatever exit you take you have to perform at
most VF iterations.  For vector code we only care about the state of fully
performed iterations and reset the scalar code to the (partially) remaining loop.
For example, with VF = 4 and the early exit flagged during the second vector
iteration, scalar iterations 0-3 are fully retired by the vector loop and the
scalar loop re-runs iterations 4-7 to find the exact exit point.

This new version of the patch does the majority of the work in a new rewritten
loop peeling.  This new function maintains LCSSA all the way through and no
longer requires the touch-up functions the vectorizer used to incrementally
adjust them later on.  This means that aside from IV updates and guard edge
updates the early exit code is identical to the single exit cases.

When the loop is peeled during the copying I have to go to great lengths to
keep the dominators up to date.  All exits from the first loop are rewired to
the
loop header of the second loop.  But this can change the immediate dominator.

The dominators can change again when we wire in the loop guard, as such peeling
now returns a list of dominators that need to be updated if a new guard edge is
added.

For the loop peeling we rewrite the loop form:


 Header
  ---
  |x|
   2
   |
   v
---3<--
 early exit |  |  |
v  v  | latch
7  4->6
|  |
|  v
|  8
|  |
|  v
-->5

into

 Header
  ---
  |x|
   2
   |
   v
---3<--
 early exit |  |  |
v  v  | latch
7  4->6
  

[PATCH][committed][docs]: replace backslashchar [PR 110329].

2023-06-21 Thread Tamar Christina via Gcc-patches
Hi All,

It seems like @backslashchar{} is a relatively new addition
to texinfo.  Other parts of the docs use @samp{\} so use it
here too so older distros work.

Bootstrapped on aarch64-none-linux-gnu and no issues.

committed under obvious rule.

Thanks,
Tamar

gcc/ChangeLog:

PR other/110329
* doc/md.texi: Replace backslashchar.

--- inline copy of patch -- 
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 
052375b1a31b829303e75417c400024f084aef44..9648fdc846abf1700effe3272d5523538ce9b50f
 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -837,7 +837,7 @@ blocks on the line.
 @item
 Within an @samp{@{@@} block, any iterators that do not get expanded will result
 in an error.  If for some reason it is required to have @code{<} or @code{>} in
-the output then these must be escaped using @backslashchar{}.
+the output then these must be escaped using @samp{\}.
 
 @item
 It is possible to use the @samp{attrs} list to specify some attributes and to









[PATCH][gensupport] drop support for define_cond_exec from compact syntax

2023-06-20 Thread Tamar Christina via Gcc-patches
Hi All,

define_cond_exec does not support the special @@ syntax
and so can't support {@.  As such just remove support
for it.

Bootstrapped and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

PR bootstrap/110324
* gensupport.cc (convert_syntax): Explicitly check for RTX code.

--- inline copy of patch -- 
diff --git a/gcc/gensupport.cc b/gcc/gensupport.cc
index 
980b49cd4814c9f92cae5876a1bae936338df071..e39e6dacce25009df1ef83a0ab9ed309704ca74b
 100644
--- a/gcc/gensupport.cc
+++ b/gcc/gensupport.cc
@@ -878,7 +878,8 @@ convert_syntax (rtx x, file_location loc)
   const char *templ;
   vec_conlist tconvec, convec, attrvec;
 
-  templ_index = GET_CODE (x) == DEFINE_INSN ? 3 : 2;
+  templ_index = 3;
+  gcc_assert (GET_CODE (x) == DEFINE_INSN);
 
   templ = XTMPL (x, templ_index);
 
@@ -1053,7 +1054,6 @@ process_rtx (rtx desc, file_location loc)
   break;
 
 case DEFINE_COND_EXEC:
-  convert_syntax (desc, loc);
   queue_pattern (desc, &define_cond_exec_tail, loc);
   break;
 




 





[PATCH][committed]AArch64 remove test comment from *mov<mode>_aarch64

2023-06-20 Thread Tamar Christina via Gcc-patches
Hi All,

I accidentally left a test comment in the final version of the patch.
This removes the comment.

Regtested on aarch64-none-linux-gnu and no issues.

Committed under the obvious rule.

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/aarch64.md (*mov<mode>_aarch64): Drop test comment.

--- inline copy of patch -- 
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 
97e5a59d30ca17c26c0951e9a7c62a7934af8d4f..25f7905c6a0b3b7dd860a1346b8e1d4e1b04164b
 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1224,7 +1224,7 @@ (define_insn "*mov<mode>_aarch64"
  [m, r Z  ; store_4, * ] str\\t%w1, %0
  [m, w; store_4, * ] str\t%1, %0
  [r, w; neon_to_gp  , simd  ] umov\t%w0, %1.[0]
- [r, w; neon_to_gp  , nosimd] fmov\t%w0, %s1 /*foo */
+ [r, w; neon_to_gp  , nosimd] fmov\t%w0, %s1
  [w, r Z  ; neon_from_gp, simd  ] dup\t%0., %w1
  [w, r Z  ; neon_from_gp, nosimd] fmov\t%s0, %w1
  [w, w; neon_dup   , simd  ] dup\t%0, %1.[0]









RE: [PATCH 2/2] cprop_hardreg: Enable propagation of the stack pointer if possible.

2023-06-19 Thread Tamar Christina via Gcc-patches
> -Original Message-
> From: Jeff Law 
> Sent: Tuesday, June 20, 2023 3:17 AM
> To: Andrew Pinski ; Thiago Jung Bauermann
> 
> Cc: Manolis Tsamis ; Philipp Tomsich
> ; Richard Biener ;
> Palmer Dabbelt ; Kito Cheng ;
> gcc-patches@gcc.gnu.org; Tamar Christina 
> Subject: Re: [PATCH 2/2] cprop_hardreg: Enable propagation of the stack
> pointer if possible.
> 
> 
> 
> On 6/19/23 17:48, Andrew Pinski wrote:
> > On Mon, Jun 19, 2023 at 4:40 PM Andrew Pinski 
> wrote:
> >>
> >> On Mon, Jun 19, 2023 at 9:58 AM Thiago Jung Bauermann via Gcc-patches
> >>  wrote:
> >>>
> >>>
> >>> Hello Manolis,
> >>>
> >>> Philipp Tomsich  writes:
> >>>
>  On Thu, 8 Jun 2023 at 00:18, Jeff Law  wrote:
> >
> > On 5/25/23 06:35, Manolis Tsamis wrote:
> >> Propagation of the stack pointer in cprop_hardreg is currenty
> >> forbidden in all cases, due to maybe_mode_change returning NULL.
> >> Relax this restriction and allow propagation when no mode change is
> requested.
> >>
> >> gcc/ChangeLog:
> >>
> >>   * regcprop.cc (maybe_mode_change): Enable stack pointer
> propagation.
> > Thanks for the clarification.  This is OK for the trunk.  It looks
> > generic enough to have value going forward now rather than waiting.
> 
>  Rebased, retested, and applied to trunk.  Thanks!
> >>>
> >>> Our CI found a couple of tests that started failing on aarch64-linux
> >>> after this commit. I was able to confirm manually that they don't
> >>> happen in the commit immediately before this one, and also that
> >>> these failures are still present in today's trunk.
> >>>
> >>> I have testsuite logs for last good commit, first bad commit and
> >>> current trunk here:
> >>>
> >>> https://people.linaro.org/~thiago.bauermann/gcc-regression-6a2e8dcbb
> >>> d4b/
> >>>
> >>> Could you please check?
> >>>
> >>> These are the new failures:
> >>>
> >>> Running gcc:gcc.target/aarch64/aarch64.exp ...
> >>> FAIL: gcc.target/aarch64/stack-check-cfa-3.c scan-assembler-times
> >>> mov\\tx11, sp 1
> >>
> >> So for the above before this change we had:
> >> ```
> >> (insn:TI 597 596 598 2 (set (reg:DI 11 x11)
> >>  (reg/f:DI 31 sp)) "stack-check-prologue-16.c":16:1 65
> {*movdi_aarch64}
> >>   (nil))
> >> (insn 598 597 599 2 (set (mem:BLK (scratch) [0  A8])
> >>  (unspec:BLK [
> >>  (reg:DI 11 x11)
> >>  (reg/f:DI 31 sp)
> >>  ] UNSPEC_PRLG_STK)) "stack-check-prologue-16.c":16:1
> >> 1169 {stack_tie}
> >>   (expr_list:REG_DEAD (reg:DI 11 x11)
> >>  (nil)))
> >> ```
> >>
> >> After we get:
> >> ```
> >> (insn 598 596 599 2 (set (mem:BLK (scratch) [0  A8])
> >>  (unspec:BLK [
> >>  (reg:DI 31 sp [11]) repeated x2
> >>  ] UNSPEC_PRLG_STK)) "stack-check-prologue-16.c":16:1
> >> 1169 {stack_tie}
> >>   (nil))
> >> ```
> >> Which seems to be ok, except we still have:
> >> .cfi_def_cfa_register 11
> >>
> >> That is because on:
> >> (insn/f 596 595 598 2 (set (reg:DI 12 x12)
> >>  (plus:DI (reg:DI 12 x12)
> >>  (const_int 272 [0x110])))
> >> "stack-check-prologue-16.c":16:1
> >> 153 {*adddi3_aarch64}
> >>   (expr_list:REG_CFA_DEF_CFA (reg:DI 11 x11)
> >>  (nil)))
> >>
> >> We record x11 but never update it though that came before the mov for
> >> x11 ... So it seems like cprop_hardreg had no idea it needed to
> >> update it.
> >>
> >> I suspect the other testcases are just propagation of sp into the
> >> stores and such and just needed update. But the above testcase seems
> >> getting broken cfi  though I don't know how to fix it.

Yeah, we noticed the failures internally but left them broken since we have an
upcoming AArch64 patch which requires them to be updated anyway and are
rolling up the updates into that patch. 

> >
> > The code from aarch64.cc:
> > ```
> >/* This is done to provide unwinding information for the stack
> >   adjustments we're about to do, however to prevent the 
> > optimizers
> >   from removing the R11 move and leaving the CFA note (which 
> > would
> be
> >   very wrong) we tie the old and new stack pointer together.
> >   The tie will expand to nothing but the optimizers will not 
> > touch
> >   the instruction.  */
> >rtx stack_ptr_copy = gen_rtx_REG (Pmode,
> STACK_CLASH_SVE_CFA_REGNUM);
> >emit_move_insn (stack_ptr_copy, stack_pointer_rtx);
> >emit_insn (gen_stack_tie (stack_ptr_copy,
> > stack_pointer_rtx));
> >
> >/* We want the CFA independent of the stack pointer for the
> >   duration of the loop.  */
> >add_reg_note (insn, REG_CFA_DEF_CFA, stack_ptr_copy);
> >RTX_FRAME_RELATED_P (insn) = 1; ```
> >
> > Well except now with this change, the optimizers touch this
> > instruction. Maybe the move instruction should not be a move but an
> > unspec so optimizers don't know what 

RE: [PATCH] Remove -save-temps from tests using -flto

2023-06-19 Thread Tamar Christina via Gcc-patches
> -Original Message-
> From: Richard Biener 
> Sent: Monday, June 19, 2023 11:19 AM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org
> Subject: RE: [PATCH] Remove -save-temps from tests using -flto
> 
> On Mon, 19 Jun 2023, Tamar Christina wrote:
> 
> > > -Original Message-
> > > From: Richard Biener 
> > > Sent: Monday, June 19, 2023 7:28 AM
> > > To: gcc-patches@gcc.gnu.org
> > > Cc: Tamar Christina 
> > > Subject: [PATCH] Remove -save-temps from tests using -flto
> > >
> > > The following removes -save-temps that doesn't seem to have any good
> > > reason from tests that also run with -flto added.  That can cause
> > > ltrans files to race with other multilibs tested and I'm frequently
> > > seeing linker complaints that the architecture doesn't match here.
> > >
> > > I'm not sure whether the .ltrans.o files end up in a non gccN/
> > > specific directory or if we end up sharing the same dir for
> > > different multilibs (not sure if it's easily possible to avoid that).
> > >
> > > Parallel testing on x86_64-unknown-linux-gnu in progress.
> > >
> > > Tamar, was there any reason to use -save-temps here?
> >
> > At the time I was getting unresolved errors from these without it.
> > But perhaps that's something to do with dejagnu versions?
> 
> I don't know.  Can you check if there's an issue on your side when removing -
> save-temps?

Nope no issues, all tests still pass.

Regards,
Tamar
> 
> Richard.
> 
> > Tamar
> >
> > >
> > >   * gcc.dg/vect/vect-bic-bitmask-2.c: Remove -save-temps.
> > >   * gcc.dg/vect/vect-bic-bitmask-3.c: Likewise.
> > >   * gcc.dg/vect/vect-bic-bitmask-4.c: Likewise.
> > >   * gcc.dg/vect/vect-bic-bitmask-5.c: Likewise.
> > >   * gcc.dg/vect/vect-bic-bitmask-6.c: Likewise.
> > >   * gcc.dg/vect/vect-bic-bitmask-8.c: Likewise.
> > >   * gcc.dg/vect/vect-bic-bitmask-9.c: Likewise.
> > >   * gcc.dg/vect/vect-bic-bitmask-10.c: Likewise.
> > >   * gcc.dg/vect/vect-bic-bitmask-11.c: Likewise.
> > > ---
> > >  gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-10.c | 2 +-
> > > gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-11.c | 2 +-
> > > gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-2.c  | 2 +-
> > > gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-3.c  | 2 +-
> > > gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-4.c  | 2 +-
> > > gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-5.c  | 2 +-
> > > gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-6.c  | 2 +-
> > > gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-8.c  | 2 +-
> > > gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-9.c  | 2 +-
> > >  9 files changed, 9 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-10.c
> > > b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-10.c
> > > index e9ec9603af6..e6810433d70 100644
> > > --- a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-10.c
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-10.c
> > > @@ -1,6 +1,6 @@
> > >  /* { dg-skip-if "missing optab for vectorization" { sparc*-*-* } } */
> > >  /* { dg-do run } */
> > > -/* { dg-additional-options "-O3 -save-temps -fdump-tree-dce -w" } */
> > > +/* { dg-additional-options "-O3 -fdump-tree-dce -w" } */
> > >
> > >  #include 
> > >
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-11.c
> > > b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-11.c
> > > index 06c103d3885..f83078b5d51 100644
> > > --- a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-11.c
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-11.c
> > > @@ -1,6 +1,6 @@
> > >  /* { dg-skip-if "missing optab for vectorization" { sparc*-*-* } } */
> > >  /* { dg-do run } */
> > > -/* { dg-additional-options "-O3 -save-temps -fdump-tree-dce -w" } */
> > > +/* { dg-additional-options "-O3 -fdump-tree-dce -w" } */
> > >
> > >  #include 
> > >
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-2.c
> > > b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-2.c
> > > index 059bfb3ae62..e33a824df07 100644
> > > --- a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-2.c
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-2.c
> > > @@ -1,6 +1,6 @@
> > >  /* { dg-skip-if "missing optab for vectorization" { sparc*-*-* } } */
> > >  /* { dg-do run } */
> > > -/* { dg-additional-options "-O3 -save-temps -fdump-tree-dce -w" } */
> > > +/* { dg-additional-options "-O3 -fdump-tree-dce -w" } */
> > >
> > >  #include 
> > >
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-3.c
> > > b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-3.c
> > > index 059bfb3ae62..e33a824df07 100644
> > > --- a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-3.c
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-3.c
> > > @@ -1,6 +1,6 @@
> > >  /* { dg-skip-if "missing optab for vectorization" { sparc*-*-* } } */
> > >  /* { dg-do run } */
> > > -/* { dg-additional-options "-O3 -save-temps -fdump-tree-dce -w" } */
> > > +/* { dg-additional-options "-O3 -fdump-tree-dce -w" } */
> > >
> > >  #include 
> > >
> > > diff --git 

RE: [PATCH] Remove -save-temps from tests using -flto

2023-06-19 Thread Tamar Christina via Gcc-patches
> -Original Message-
> From: Richard Biener 
> Sent: Monday, June 19, 2023 7:28 AM
> To: gcc-patches@gcc.gnu.org
> Cc: Tamar Christina 
> Subject: [PATCH] Remove -save-temps from tests using -flto
> 
> The following removes -save-temps that doesn't seem to have any good
> reason from tests that also run with -flto added.  That can cause ltrans 
> files to
> race with other multilibs tested and I'm frequently seeing linker complaints
> that the architecture doesn't match here.
> 
> I'm not sure whether the .ltrans.o files end up in a non gccN/ specific 
> directory
> or if we end up sharing the same dir for different multilibs (not sure if 
> it's easily
> possible to avoid that).
> 
> Parallel testing on x86_64-unknown-linux-gnu in progress.
> 
> Tamar, was there any reason to use -save-temps here?

At the time I was getting unresolved errors from these without it.
But perhaps that's something to do with dejagnu versions?

Tamar

> 
>   * gcc.dg/vect/vect-bic-bitmask-2.c: Remove -save-temps.
>   * gcc.dg/vect/vect-bic-bitmask-3.c: Likewise.
>   * gcc.dg/vect/vect-bic-bitmask-4.c: Likewise.
>   * gcc.dg/vect/vect-bic-bitmask-5.c: Likewise.
>   * gcc.dg/vect/vect-bic-bitmask-6.c: Likewise.
>   * gcc.dg/vect/vect-bic-bitmask-8.c: Likewise.
>   * gcc.dg/vect/vect-bic-bitmask-9.c: Likewise.
>   * gcc.dg/vect/vect-bic-bitmask-10.c: Likewise.
>   * gcc.dg/vect/vect-bic-bitmask-11.c: Likewise.
> ---
>  gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-10.c | 2 +-
> gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-11.c | 2 +-
> gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-2.c  | 2 +-
> gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-3.c  | 2 +-
> gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-4.c  | 2 +-
> gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-5.c  | 2 +-
> gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-6.c  | 2 +-
> gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-8.c  | 2 +-
> gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-9.c  | 2 +-
>  9 files changed, 9 insertions(+), 9 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-10.c
> b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-10.c
> index e9ec9603af6..e6810433d70 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-10.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-10.c
> @@ -1,6 +1,6 @@
>  /* { dg-skip-if "missing optab for vectorization" { sparc*-*-* } } */
>  /* { dg-do run } */
> -/* { dg-additional-options "-O3 -save-temps -fdump-tree-dce -w" } */
> +/* { dg-additional-options "-O3 -fdump-tree-dce -w" } */
> 
>  #include 
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-11.c
> b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-11.c
> index 06c103d3885..f83078b5d51 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-11.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-11.c
> @@ -1,6 +1,6 @@
>  /* { dg-skip-if "missing optab for vectorization" { sparc*-*-* } } */
>  /* { dg-do run } */
> -/* { dg-additional-options "-O3 -save-temps -fdump-tree-dce -w" } */
> +/* { dg-additional-options "-O3 -fdump-tree-dce -w" } */
> 
>  #include 
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-2.c
> b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-2.c
> index 059bfb3ae62..e33a824df07 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-2.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-2.c
> @@ -1,6 +1,6 @@
>  /* { dg-skip-if "missing optab for vectorization" { sparc*-*-* } } */
>  /* { dg-do run } */
> -/* { dg-additional-options "-O3 -save-temps -fdump-tree-dce -w" } */
> +/* { dg-additional-options "-O3 -fdump-tree-dce -w" } */
> 
>  #include 
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-3.c
> b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-3.c
> index 059bfb3ae62..e33a824df07 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-3.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-3.c
> @@ -1,6 +1,6 @@
>  /* { dg-skip-if "missing optab for vectorization" { sparc*-*-* } } */
>  /* { dg-do run } */
> -/* { dg-additional-options "-O3 -save-temps -fdump-tree-dce -w" } */
> +/* { dg-additional-options "-O3 -fdump-tree-dce -w" } */
> 
>  #include 
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-4.c
> b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-4.c
> index 91b82fb5988..8895d5c263c 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-4.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-4.c
> @@ -1,6 +1,6 @@
>  /* { dg-skip-if "missing optab for vectorization" { sparc*-*-* } } */
>  /* { dg-do run } */
> -/* { dg-additional-options "-O3 -save-temps -fdump-tree-dce -w" } */
> +/* { dg-additional-options "-O3 -fdump-tree-dce -w" } */
> 
>  #include 
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-5.c
> b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-5.c
> index 59f339fb8c5..77d4deb633c 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-5.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-5.c
> @@ -1,6 +1,6 @@
>  /* { dg-skip-if "missing optab for 

RE: [PATCH v2] machine descriptor: New compact syntax for insn and insn_split in Machine Descriptions.

2023-06-13 Thread Tamar Christina via Gcc-patches
Hi All,

Updated patch with feedback addressed.


Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Any feedback?

Thanks,
Tamar

gcc/ChangeLog:

* gensupport.cc (class conlist, add_constraints, add_attributes,
skip_spaces, expect_char, preprocess_compact_syntax,
parse_section_layout, parse_section, convert_syntax): New.
(process_rtx): Check for conversion.
* genoutput.cc (process_template): Check for unresolved iterators.
(class data): Add compact_syntax_p.
(gen_insn): Use it.
* gensupport.h (compact_syntax): New.
(hash-set.h): Include.

Co-Authored-By: Omar Tahir 

--- inline copy of patch ---

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 
6a435eb44610960513e9739ac9ac1e8a27182c10..3bd1bcbc8beda9bbaea71c65118ecfa2cdace335
 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -27,6 +27,7 @@ See the next chapter for information on the C header file.
 from such an insn.
 * Output Statement::For more generality, write C code to output
 the assembler code.
+* Compact Syntax::  Compact syntax for writing machine descriptors.
 * Predicates::  Controlling what kinds of operands can be used
 for an insn.
 * Constraints:: Fine-tuning operand selection.
@@ -713,6 +714,167 @@ you can use @samp{*} inside of a @samp{@@} 
multi-alternative template:
 @end group
 @end smallexample
 
+@node Compact Syntax
+@section Compact Syntax
+@cindex compact syntax
+
+When a @code{define_insn} or @code{define_insn_and_split} has multiple
+alternatives, it may be beneficial to use the compact syntax when specifying
+alternatives.
+
+This syntax puts the constraints and attributes on the same horizontal line as
+the instruction assembly template.
+
+As an example
+
+@smallexample
+@group
+(define_insn_and_split ""
+  [(set (match_operand:SI 0 "nonimmediate_operand" "=r,k,r,r,r,r")
+   (match_operand:SI 1 "aarch64_mov_operand"  " r,r,k,M,n,Usv"))]
+  ""
+  "@@
+   mov\\t%w0, %w1
+   mov\\t%w0, %w1
+   mov\\t%w0, %w1
+   mov\\t%w0, %1
+   #
+   * return aarch64_output_sve_cnt_immediate ('cnt', '%x0', operands[1]);"
+  "&& true"
+   [(const_int 0)]
+  @{
+ aarch64_expand_mov_immediate (operands[0], operands[1]);
+ DONE;
+  @}
+  [(set_attr "type" "mov_reg,mov_reg,mov_reg,mov_imm,mov_imm,mov_imm")
+   (set_attr "arch"   "*,*,*,*,*,sve")
+   (set_attr "length" "4,4,4,4,*,  4")
+]
+)
+@end group
+@end smallexample
+
+can be better expressed as:
+
+@smallexample
+@group
+(define_insn_and_split ""
+  [(set (match_operand:SI 0 "nonimmediate_operand")
+   (match_operand:SI 1 "aarch64_mov_operand"))]
+  ""
+  @{@@ [cons: =0, 1; attrs: type, arch, length]
+ [r , r  ; mov_reg  , *   , 4] mov\t%w0, %w1
+ [k , r  ; mov_reg  , *   , 4] ^
+ [r , k  ; mov_reg  , *   , 4] ^
+ [r , M  ; mov_imm  , *   , 4] mov\t%w0, %1
+ [r , n  ; mov_imm  , *   , *] #
+ [r , Usv; mov_imm  , sve , 4] << aarch64_output_sve_cnt_immediate ("cnt", 
"%x0", operands[1]);
+  @}
+  "&& true"
+  [(const_int 0)]
+  @{
+aarch64_expand_mov_immediate (operands[0], operands[1]);
+DONE;
+  @}
+)
+@end group
+@end smallexample
+
+The syntax rules are as follows:
+@itemize @bullet
+@item
+Templates must start with @samp{@{@@} to use the new syntax.
+
+@item
+@samp{@{@@} is followed by a layout in parentheses which is @samp{cons:}
+followed by a comma-separated list of @code{match_operand}/@code{match_scratch}
+operand numbers, then a semicolon, followed by the same for attributes
+(@samp{attrs:}).  Operand modifiers can be placed in this section group as 
well.
+Both sections are optional (so you can use only @samp{cons}, or only
+@samp{attrs}, or both), and @samp{cons} must come before @samp{attrs} if
+present.
+
+@item
+Each alternative begins with any amount of whitespace.
+
+@item
+Following the whitespace is a comma-separated list of "constraints" and/or
+"attributes" within brackets @code{[]}, with sections separated by a semicolon.
+
+@item
+Should you want to copy the previous asm line, the symbol @code{^} can be used.
+This allows less copy-pasting between alternatives and reduces the number of
+lines to update on changes.
+
+@item
+When using C functions for output, the idiom @samp{* return @var{function};}
+can be replaced with the shorthand @samp{<< @var{function};}.
+
+@item
+Following the closing @samp{]} is any amount of whitespace, and then the actual
+asm output.
+
+@item
+Spaces are allowed in the list (they will simply be removed).
+
+@item
+All constraint alternatives should be specified.  For example, a list of
+three blank alternatives should be written @samp{[,,]} rather than
+@samp{[]}.
+
+@item
+All attribute alternatives should be non-empty, with @samp{*}
+representing the default attribute value.  For example, a list of three
+default attribute values should be written @samp{[*,*,*]} rather than
+@samp{[]}.
+
+
+@item
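
To illustrate the "both sections are optional" rule above, here is a
cons-only template.  The pattern is a simplified sketch modelled on the
AArch64 sub pattern, not something taken from this patch:

```
(define_insn "*sub<mode>3_example"
  [(set (match_operand:GPI 0 "register_operand")
	(minus:GPI (match_operand:GPI 1 "register_operand")
		   (match_operand:GPI 2 "register_operand")))]
  ""
  {@ [cons: =0, 1, 2]
     [r, r, r] sub\t%<w>0, %<w>1, %<w>2
  }
)
```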

[PATCH] Remove DEFAULT_MATCHPD_PARTITIONS macro

2023-06-12 Thread Tamar Christina via Gcc-patches
Hi All,

As Jakub pointed out, DEFAULT_MATCHPD_PARTITIONS
is now unused and can be removed.

Bootstrapped aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config.in: Regenerate.
* configure: Regenerate.
* configure.ac: Remove DEFAULT_MATCHPD_PARTITIONS.

--- inline copy of patch -- 
diff --git a/gcc/config.in b/gcc/config.in
index 
cf2f284378447c8f8e2f838a786dba23d6086fe3..0e62b9fbfc93da8fb511bf581ef9457e55c8bc6c
 100644
--- a/gcc/config.in
+++ b/gcc/config.in
@@ -67,12 +67,6 @@
 #endif
 
 
-/* Define to larger than one set the number of match.pd partitions to make. */
-#ifndef USED_FOR_TARGET
-#undef DEFAULT_MATCHPD_PARTITIONS
-#endif
-
-
 /* Define to larger than zero set the default stack clash protector size. */
 #ifndef USED_FOR_TARGET
 #undef DEFAULT_STK_CLASH_GUARD_SIZE
diff --git a/gcc/configure b/gcc/configure
index 
5f67808b77441ba730183eef90367b70a51b08a0..3aa2534f4d4aa4136e9aaf5de51b8e6b67c48d5a
 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -7908,11 +7908,6 @@ if (test $DEFAULT_MATCHPD_PARTITIONS -lt 1); then
 fi
 
 
-cat >>confdefs.h <<_ACEOF
-#define DEFAULT_MATCHPD_PARTITIONS $DEFAULT_MATCHPD_PARTITIONS
-_ACEOF
-
-
 
 # Enable __cxa_atexit for C++.
 # Check whether --enable-__cxa_atexit was given.
@@ -19850,7 +19845,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 19853 "configure"
+#line 19848 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -19956,7 +19951,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 19959 "configure"
+#line 19954 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
diff --git a/gcc/configure.ac b/gcc/configure.ac
index 
cc8dd9e20bf4e3994af99a74ec2a0fe61b0fb1ae..524ef76ec7deb6357d616b6dc6e016d2a9804816
 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -932,8 +932,6 @@ if (test $DEFAULT_MATCHPD_PARTITIONS -lt 1); then
Cannot be negative.]))
 fi
 
-AC_DEFINE_UNQUOTED(DEFAULT_MATCHPD_PARTITIONS, $DEFAULT_MATCHPD_PARTITIONS,
-   [Define to larger than one set the number of match.pd partitions to 
make.])
 AC_SUBST(DEFAULT_MATCHPD_PARTITIONS)
 
 # Enable __cxa_atexit for C++.









RE: [PATCH][committed] Regenerate config.in

2023-06-12 Thread Tamar Christina via Gcc-patches
> 
> Do you use the DEFAULT_MATCHPD_PARTITIONS macro anywhere?
> If not, why the AC_DEFINE_UNQUOTED at all and not just the AC_SUBST?
> 

It used to be used to change the default of genmatch.cc, but the default is now 
not to split anymore.  So I guess I can remove it.

Will follow up...
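
For context, the two autoconf idioms being contrasted look roughly like
this (a minimal sketch; the variable name is real, the comments are mine):

```
# AC_SUBST alone: the value is substituted into Makefile.in through
# @DEFAULT_MATCHPD_PARTITIONS@; no config.h macro is generated.
AC_SUBST(DEFAULT_MATCHPD_PARTITIONS)

# AC_DEFINE_UNQUOTED additionally emits a #define into config.h, which
# is only worthwhile if some source file actually reads the macro.
AC_DEFINE_UNQUOTED(DEFAULT_MATCHPD_PARTITIONS, $DEFAULT_MATCHPD_PARTITIONS,
  [Number of match.pd partitions to make.])
```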


[PATCH][committed] Regenerate config.in

2023-06-12 Thread Tamar Christina via Gcc-patches
Hi All,

Looks like I forgot to regenerate config.in, which
causes updates when you enable maintainer mode.

Bootstrapped aarch64-none-linux-gnu.

Committed under obvious rule.

Thanks,
Tamar

gcc/ChangeLog:

* config.in: Regenerate.

--- inline copy of patch -- 
diff --git a/gcc/config.in b/gcc/config.in
index 
4cad077bfbed7fd73b3c04ce6405fd2f49178412..cf2f284378447c8f8e2f838a786dba23d6086fe3
 100644
--- a/gcc/config.in
+++ b/gcc/config.in
@@ -67,6 +67,12 @@
 #endif
 
 
+/* Define to larger than one set the number of match.pd partitions to make. */
+#ifndef USED_FOR_TARGET
+#undef DEFAULT_MATCHPD_PARTITIONS
+#endif
+
+
 /* Define to larger than zero set the default stack clash protector size. */
 #ifndef USED_FOR_TARGET
 #undef DEFAULT_STK_CLASH_GUARD_SIZE
@@ -2239,8 +2245,7 @@
 #endif
 
 
-/* Define to the sub-directory in which libtool stores uninstalled libraries.
-   */
+/* Define to the sub-directory where libtool stores uninstalled libraries. */
 #ifndef USED_FOR_TARGET
 #undef LT_OBJDIR
 #endif









RE: gcc/config.in was not regenerated

2023-06-12 Thread Tamar Christina via Gcc-patches
Hi Coudert,

Sorry, missed that one.

I'll fix that.

Tamar.

> -Original Message-
> From: FX Coudert 
> Sent: Saturday, June 10, 2023 9:21 PM
> To: Tamar Christina 
> Cc: g...@gcc.gnu.org; Jeff Law ; gcc-
> patc...@gcc.gnu.org
> Subject: gcc/config.in was not regenerated
> 
> Hi,
> 
> Building GCC in maintainer mode leads to changes in gcc/config.in
> :
> 
> > diff --git a/gcc/config.in b/gcc/config.in index
> > 4cad077bfbe..25442c59aec 100644
> > --- a/gcc/config.in
> > +++ b/gcc/config.in
> > @@ -67,6 +67,12 @@
> >  #endif
> > +/* Define to larger than one set the number of match.pd
> > partitions to make. */
> > +#ifndef USED_FOR_TARGET
> > +#undef DEFAULT_MATCHPD_PARTITIONS
> > +#endif
> > +
> > +
> >  /* Define to larger than zero set the default stack clash protector
> > size. */  #ifndef USED_FOR_TARGET  #undef
> DEFAULT_STK_CLASH_GUARD_SIZE
> 
> which I think are because this commit
> https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=0a85544e1aaeca41133ecfc4
> 38cda913dbc0f122
> should have regenerated and committed config.in 
> 
> Christina, can you please have a look?
> 
> FX


[PATCH][GCC][AArch64] convert some patterns to new MD syntax

2023-06-08 Thread Tamar Christina via Gcc-patches
Hi All,

This converts some patterns in the AArch64 backend to use the new
compact syntax.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

gcc/ChangeLog:

* config/aarch64/aarch64.md (arches): Add nosimd.
(*mov<mode>_aarch64, *movsi_aarch64, *movdi_aarch64): Rewrite to
compact syntax.

Thanks,
Tamar

--- inline copy of patch ---

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 
8b8951d7b14aa1a8858fdc24bf6f9dd3d927d5ea..601173338a9068f7694867c8e6e78f9b10f32a17
 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -366,7 +366,7 @@ (define_constants
 ;; As a convenience, "fp_q" means "fp" + the ability to move between
 ;; Q registers and is equivalent to "simd".
 
-(define_enum "arches" [ any rcpc8_4 fp fp_q simd sve fp16])
+(define_enum "arches" [ any rcpc8_4 fp fp_q simd nosimd sve fp16])
 
 (define_enum_attr "arch" "arches" (const_string "any"))
 
@@ -397,6 +397,9 @@ (define_attr "arch_enabled" "no,yes"
(and (eq_attr "arch" "fp_q, simd")
 (match_test "TARGET_SIMD"))
 
+   (and (eq_attr "arch" "nosimd")
+(match_test "!TARGET_SIMD"))
+
(and (eq_attr "arch" "fp16")
 (match_test "TARGET_FP_F16INST"))
 
@@ -1206,44 +1209,27 @@ (define_expand "mov"
 )
 
 (define_insn "*mov_aarch64"
-  [(set (match_operand:SHORT 0 "nonimmediate_operand" "=r,r,w,r  ,r,w, 
m,m,r,w,w")
-   (match_operand:SHORT 1 "aarch64_mov_operand"  " 
r,M,D,Usv,m,m,rZ,w,w,rZ,w"))]
+  [(set (match_operand:SHORT 0 "nonimmediate_operand")
+   (match_operand:SHORT 1 "aarch64_mov_operand"))]
   "(register_operand (operands[0], mode)
 || aarch64_reg_or_zero (operands[1], mode))"
-{
-   switch (which_alternative)
- {
- case 0:
-   return "mov\t%w0, %w1";
- case 1:
-   return "mov\t%w0, %1";
- case 2:
-   return aarch64_output_scalar_simd_mov_immediate (operands[1],
-   mode);
- case 3:
-   return aarch64_output_sve_cnt_immediate (\"cnt\", \"%x0\", operands[1]);
- case 4:
-   return "ldr\t%w0, %1";
- case 5:
-   return "ldr\t%0, %1";
- case 6:
-   return "str\t%w1, %0";
- case 7:
-   return "str\t%1, %0";
- case 8:
-   return TARGET_SIMD ? "umov\t%w0, %1.[0]" : "fmov\t%w0, %s1";
- case 9:
-   return TARGET_SIMD ? "dup\t%0., %w1" : "fmov\t%s0, %w1";
- case 10:
-   return TARGET_SIMD ? "dup\t%0, %1.[0]" : "fmov\t%s0, %s1";
- default:
-   gcc_unreachable ();
- }
-}
-  ;; The "mov_imm" type for CNT is just a placeholder.
-  [(set_attr "type" "mov_reg,mov_imm,neon_move,mov_imm,load_4,load_4,store_4,
-store_4,neon_to_gp,neon_from_gp,neon_dup")
-   (set_attr "arch" "*,*,simd,sve,*,*,*,*,*,*,*")]
+  {@ [cons: =0, 1; attrs: type, arch]
+ [r , r; mov_reg, * ] mov\t%w0, %w1
+ [r , M; mov_imm, * ] mov\t%w0, %1
+ [w , D; neon_move  , simd  ] << 
aarch64_output_scalar_simd_mov_immediate (operands[1], mode);
+ /* The "mov_imm" type for CNT is just a placeholder.  */
+ [r , Usv  ; mov_imm, sve   ] << aarch64_output_sve_cnt_immediate 
("cnt", "%x0", operands[1]);
+ [r , m; load_4 , * ] ldr\t%w0, %1
+ [w , m; load_4 , * ] ldr\t%0, %1
+ [m , rZ   ; store_4, * ] str\\t%w1, %0
+ [m , w; store_4, * ] str\t%1, %0
+ [r , w; neon_to_gp  , simd  ] umov\t%w0, %1.[0]
+ [r , w; neon_to_gp  , nosimd] fmov\t%w0, %s1 /*foo */
+ [w , rZ   ; neon_from_gp, simd  ] dup\t%0., %w1
+ [w , rZ   ; neon_from_gp, nosimd] fmov\t%s0, %w1
+ [w , w; neon_dup   , simd  ] dup\t%0, %1.[0]
+ [w , w; neon_dup   , nosimd] fmov\t%s0, %s1
+  }
 )
 
 (define_expand "mov"
@@ -1280,79 +1266,71 @@ (define_expand "mov"
 )
 
 (define_insn_and_split "*movsi_aarch64"
-  [(set (match_operand:SI 0 "nonimmediate_operand" "=r,k,r,r,r,r, r,w, m, m,  
r,  r,  r, w,r,w, w")
-   (match_operand:SI 1 "aarch64_mov_operand"  " 
r,r,k,M,n,Usv,m,m,rZ,w,Usw,Usa,Ush,rZ,w,w,Ds"))]
+  [(set (match_operand:SI 0 "nonimmediate_operand")
+   (match_operand:SI 1 "aarch64_mov_operand"))]
   "(register_operand (operands[0], SImode)
 || aarch64_reg_or_zero (operands[1], SImode))"
-  "@
-   mov\\t%w0, %w1
-   mov\\t%w0, %w1
-   mov\\t%w0, %w1
-   mov\\t%w0, %1
-   #
-   * return aarch64_output_sve_cnt_immediate (\"cnt\", \"%x0\", operands[1]);
-   ldr\\t%w0, %1
-   ldr\\t%s0, %1
-   str\\t%w1, %0
-   str\\t%s1, %0
-   adrp\\t%x0, %A1\;ldr\\t%w0, [%x0, %L1]
-   adr\\t%x0, %c1
-   adrp\\t%x0, %A1
-   fmov\\t%s0, %w1
-   fmov\\t%w0, %s1
-   fmov\\t%s0, %s1
-   * return aarch64_output_scalar_simd_mov_immediate (operands[1], SImode);"
+  {@ [cons: =0, 1; attrs: type, arch, length]
+ [r , r  ; mov_reg  , *   , 4] mov\t%w0, %w1
+ [k , r  ; mov_reg  , *   , 4] ^
+ [r , k  ; mov_reg  , * 
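
To make the new nosimd value concrete: the arch attribute feeds
arch_enabled, which in turn drives the generic "enabled" attribute,
roughly like this (a sketch of the existing machinery, not part of the
patch):

```
(define_attr "enabled" "no,yes"
  (if_then_else (eq_attr "arch_enabled" "yes")
		(const_string "yes")
		(const_string "no")))
```

So in a pair of rows like the umov/fmov alternatives above, exactly one
alternative is ever enabled, and reload picks it statically instead of
the old runtime TARGET_SIMD ? ... : ... selection in C code.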

RE: [PATCH v2] machine descriptor: New compact syntax for insn and insn_split in Machine Descriptions.

2023-06-08 Thread Tamar Christina via Gcc-patches
Hi,

New version of the patch, I've omitted the explanation again 

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Any feedback?

Thanks,
Tamar

gcc/ChangeLog:

* gensupport.cc (class conlist, add_constraints, add_attributes,
create_missing_attributes, skip_spaces, expect_char,
preprocess_compact_syntax, parse_section_layout, parse_section,
convert_syntax): New.
(process_rtx): Check for conversion.
* genoutput.cc (process_template): Check for unresolved iterators.
(class data): Add compact_syntax_p.
(gen_insn): Use it.
* gensupport.h (compact_syntax): New.
(hash-set.h): Include.

Co-Authored-By: Omar Tahir 

--- inline copy of patch ---

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 
6a435eb44610960513e9739ac9ac1e8a27182c10..eee3684cd0865dbb07c0da45e0aa4ac0ce4e9643
 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -27,6 +27,7 @@ See the next chapter for information on the C header file.
 from such an insn.
 * Output Statement::For more generality, write C code to output
 the assembler code.
+* Compact Syntax::  Compact syntax for writing machine descriptors.
 * Predicates::  Controlling what kinds of operands can be used
 for an insn.
 * Constraints:: Fine-tuning operand selection.
@@ -713,6 +714,183 @@ you can use @samp{*} inside of a @samp{@@} 
multi-alternative template:
 @end group
 @end smallexample
 
+@node Compact Syntax
+@section Compact Syntax
+@cindex compact syntax
+
+In cases where the number of alternatives in a @code{define_insn} or
+@code{define_insn_and_split} is large, it may be beneficial to use the
+compact syntax when specifying alternatives.
+
+This syntax puts the constraints and attributes on the same horizontal line as
+the instruction assembly template.
+
+As an example
+
+@smallexample
+@group
+(define_insn_and_split ""
+  [(set (match_operand:SI 0 "nonimmediate_operand" "=r,k,r,r,r,r")
+   (match_operand:SI 1 "aarch64_mov_operand"  " r,r,k,M,n,Usv"))]
+  ""
+  "@@
+   mov\\t%w0, %w1
+   mov\\t%w0, %w1
+   mov\\t%w0, %w1
+   mov\\t%w0, %1
+   #
+   * return aarch64_output_sve_cnt_immediate ('cnt', '%x0', operands[1]);"
+  "&& true"
+   [(const_int 0)]
+  @{
+ aarch64_expand_mov_immediate (operands[0], operands[1]);
+ DONE;
+  @}
+  [(set_attr "type" "mov_reg,mov_reg,mov_reg,mov_imm,mov_imm,mov_imm")
+   (set_attr "arch"   "*,*,*,*,*,sve")
+   (set_attr "length" "4,4,4,4,*,  4")
+]
+)
+@end group
+@end smallexample
+
+can be better expressed as:
+
+@smallexample
+@group
+(define_insn_and_split ""
+  [(set (match_operand:SI 0 "nonimmediate_operand")
+   (match_operand:SI 1 "aarch64_mov_operand"))]
+  ""
+  @{@@ [cons: =0, 1; attrs: type, arch, length]
+ [r , r  ; mov_reg  , *   , 4] mov\t%w0, %w1
+ [k , r  ; mov_reg  , *   , 4] ^
+ [r , k  ; mov_reg  , *   , 4] ^
+ [r , M  ; mov_imm  , *   , 4] mov\t%w0, %1
+ [r , n  ; mov_imm  , *   , *] #
+ [r , Usv; mov_imm  , sve , 4] << aarch64_output_sve_cnt_immediate ("cnt", 
"%x0", operands[1]);
+  @}
+  "&& true"
+  [(const_int 0)]
+  @{
+aarch64_expand_mov_immediate (operands[0], operands[1]);
+DONE;
+  @}
+)
+@end group
+@end smallexample
+
+The syntax rules are as follows:
+@itemize @bullet
+@item
+Templates must start with @samp{@{@@} to use the new syntax.
+
+@item
+@samp{@{@@} is followed by a layout in parentheses which is @samp{cons:}
+followed by a comma-separated list of
+@code{match_operand}/@code{match_scratch} operand numbers, then a
+semicolon, followed by the same for attributes (@samp{attrs:}).  Operand
+modifiers can be placed in this section group as well.  Both sections
+are optional (so you can use only @samp{cons}, or only @samp{attrs}, or both),
+and @samp{cons} must come before @samp{attrs} if present.
+
+@item
+Each alternative begins with any amount of whitespace.
+
+@item
+Following the whitespace is a comma-separated list of "constraints" and/or
+"attributes" within brackets @code{[]}, with sections separated by a semicolon.
+
+@item
+Should you want to copy the previous asm line, the symbol @code{^} can be used.
+This allows less copy-pasting between alternatives and reduces the number of
+lines to update on changes.
+
+@item
+When using C functions for output, the idiom @samp{* return @var{function};} can be
+replaced with the shorthand @samp{<< @var{function};}.
+
+@item
+Following the closing @samp{]} is any amount of whitespace, and then the actual
+asm output.
+
+@item
+Spaces are allowed in the list (they will simply be removed).
+
+@item
+All constraint alternatives should be specified: a blank list should be
+@samp{[,,]}, or generally use @samp{*} for the alternatives, e.g.
+@samp{[*,*,*]}.
+
+@item
+Within an @samp{@{@@} block both multiline and singleline C comments are
+allowed, but when used outside of a C block they must be the only
+non-whitespace blocks on the line.
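
A minimal attrs-only use of the rules above, based on converting the
existing AArch64 nop pattern (illustrative only, not part of this patch):

```
(define_insn "nop"
  [(unspec [(const_int 0)] UNSPEC_NOP)]
  ""
  {@ [attrs: type]
     /* A single alternative; only attributes are listed.  */
     [no_insn] nop
  }
)
```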

RE: [PATCH v2] machine descriptor: New compact syntax for insn and insn_split in Machine Descriptions.

2023-06-06 Thread Tamar Christina via Gcc-patches
Hi,

Thanks for the review, just some quick responses before I make the changes:

> >int operand_number;  /* Operand index in the big array.  */
> >int output_format;   /* INSN_OUTPUT_FORMAT_*.  */
> > +  bool compact_syntax_p;
> >struct operand_data operand[MAX_MAX_OPERANDS];  };
> >
> > @@ -700,12 +702,57 @@ process_template (class data *d, const char
> *template_code)
> >   if (sp != ep)
> > message_at (d->loc, "trailing whitespace in output template");
> >
> > - while (cp < sp)
> > + /* Check for any unexpanded iterators.  */
> > + if (bp[0] != '*' && d->compact_syntax_p)
> 
> I assume the bp[0] != '*' condition skips the check for C code blocks.
> Genuine question, but are you sure we want that?  C code often includes asm
> strings (in quotes), such as for the SVE CNT[BHWD] example.
> 
> Extending the check would mean that any use of <...> for C++ templates will
> need to be quoted, but explicit instantiation is pretty rare in .md files.  
> It would
> also look weird for conditions.
> 
> Either way is fine, just asking.

I excluded it entirely to avoid also running afoul of the binary operators.  So
e.g. * a < b && b > c ? foo : bar shouldn't trigger it.  It seemed more trouble
than it's worth to try to get right.

> > +  }
> > +
> > +  /* Adds a character to the end of the string.  */  void add (char
> > + c)  {
> > +con += c;
> > +  }
> > +
> > +  /* Output the string in the form of a brand-new char *, then effectively
> > + clear the internal string by resetting len to 0.  */  char * out
> > + ()
> 
> Formatting: no need for a space before "out".
> 
> > +  {
> > +/* Final character is always a trailing comma, so strip it out.
> > + */
> 
> trailing ',', ';' or ']', rather than just a comma?

Ah no, this is a bit of a lazy intercalate: when the alternatives are pushed in,
it's not easy to tell how many there will be (because we don't keep track of it
in this part), so we just always add a trailing "," and ignore the last char on
output.  Validation of the alternative counts themselves is done later by the
normal machinery.

> 
> > +char * q;
> 
> Similarly no space before "q" here.
> 
> > +if (modifier.empty ())
> > +  q = xstrndup (con.c_str (), con.size () - 1);
> 
> Could just be "xstrdup (con.c_str ())".
> 
> > +else
> > +  {
> > +   int len = con.size () + modifier.size ();
> > +   q = XNEWVEC (char, len);
> > +   strncpy (q, modifier.c_str (), modifier.size ());
> > +   strncpy (q + modifier.size (), con.c_str (), con.size ());
> > +   q[len -1] = '\0';
> > +  }
> 
> Do we need the separation between "modifier" and "cons"?  It looks like the
> code completes the initialisation of "modifier" before it writes to "cons", 
> and
> so we could just use a single string.

Fair point.

> > +   {
> > + if (XSTR (part, 1) && XSTR (part, 1)[0] != '\0')
> > +   {
> > + error_at (loc, "can't mix normal and compact attribute syntax");
> > + break;
> > +   }
> > + XSTR (part, 1) = attrs[index].out ();
> > +
> > + ++index;
> > + if (index == attrs.size ())
> > +   break;
> > +   }
> 
> It looks like you forgive mixing new-style and old-style syntax, since 
> there's no
> "else error" here.  But the documentation said that that wasn't allowed.
> 
> Either way seems OK to me, but see the next comment.
> 
> > +}
> > +
> > +  return index;
> > +}
> > +
> > +/* Modify the attributes list to make space for the implicitly declared
> > +   attributes in the attrs: list.  */
> > +
> > +static void
> > +create_missing_attributes (rtx x, file_location /* loc */,
> > +vec_conlist ) {
> > +  if (attrs.empty ())
> > +return;
> > +
> > +  unsigned int attr_index = GET_CODE (x) == DEFINE_INSN ? 4 : 3;
> > + vec_conlist missing;
> > +
> > +  /* This is an O(n*m) loop but it's fine, both n and m will always be very
> > + small.  */
> 
> Agreed that quadraticness isn't a problem.  But I wonder how many people
> would write an explicit placeholder set_attr.  Unlike match_operand and
> match_scratch, a placeholder set_attr doesn't carry any additional
> information.
> 
> It might be simpler to drop add_attributes and add all attributes
> unconditionally in this function instead.  If the user tries to specify the 
> same
> attribute using both syntaxes, the pattern would end up with two definitions
> of the same attribute, which ought to be flagged by existing code.
> 

This was done to support the pattern, common in the arm backend, of having
attributes which are either too complex to add inline in the new syntax or
which just repeat a value.

i.e. it's to allow cases like this:

  [(set_attr "length")
   (set_attr "predicable" "yes")
   (set_attr "predicable_short_it")
   (set_attr "arch")
   (set (attr "type") (if_then_else (match_operand 2 "const_int_operand" "")
  (const_string "alu_imm")
  (const_string "alu_sreg")))

Where your attrs 
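
For reference, a sketch of the pairing being described, with arm-style
names and made-up values: per-alternative entries in the attrs: section
land in the placeholder set_attrs, while attributes given a fixed value
are left untouched.

```
  {@ [cons: =0, 1, 2; attrs: length, predicable_short_it, arch]
     [r, r, I; 4, no , a ] add\t%0, %1, %2
     [r, r, r; 2, yes, t2] add\t%0, %1, %2
  }
  ...
  [(set_attr "length")               ;; filled from the attrs: section
   (set_attr "predicable" "yes")     ;; fixed value, kept as-is
   (set_attr "predicable_short_it")  ;; filled from the attrs: section
   (set_attr "arch")]                ;; filled from the attrs: section
```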

[PATCH v2] machine descriptor: New compact syntax for insn and insn_split in Machine Descriptions.

2023-06-05 Thread Tamar Christina via Gcc-patches
Hi All,

This patch adds support for a compact syntax for specifying constraints in
instruction patterns. Credit for the idea goes to Richard Earnshaw.

With this new syntax we want a clean break from the current limitations to make
something that is hopefully easier to use and maintain.

The idea behind this compact syntax is that it is often quite hard to correlate
the entries in the constraints list, the attributes and the instruction lists.

One has to count, and this is often tedious.  Additionally, when changing a
single line in the insn, multiple lines in a diff change, making it harder to
see what's going on.

This new syntax takes into account many of the common things that are done in MD
files.  It's also worth saying that this version is intended to deal with the
common case of string-based alternatives.  For C chunks we have some ideas,
but those are not intended to be addressed here.

It's easiest to explain with an example:

normal syntax:

(define_insn_and_split "*movsi_aarch64"
  [(set (match_operand:SI 0 "nonimmediate_operand" "=r,k,r,r,r,r, r,w, m, m,  
r,  r,  r, w,r,w, w")
(match_operand:SI 1 "aarch64_mov_operand"  " 
r,r,k,M,n,Usv,m,m,rZ,w,Usw,Usa,Ush,rZ,w,w,Ds"))]
  "(register_operand (operands[0], SImode)
|| aarch64_reg_or_zero (operands[1], SImode))"
  "@
   mov\\t%w0, %w1
   mov\\t%w0, %w1
   mov\\t%w0, %w1
   mov\\t%w0, %1
   #
   * return aarch64_output_sve_cnt_immediate (\"cnt\", \"%x0\", operands[1]);
   ldr\\t%w0, %1
   ldr\\t%s0, %1
   str\\t%w1, %0
   str\\t%s1, %0
   adrp\\t%x0, %A1\;ldr\\t%w0, [%x0, %L1]
   adr\\t%x0, %c1
   adrp\\t%x0, %A1
   fmov\\t%s0, %w1
   fmov\\t%w0, %s1
   fmov\\t%s0, %s1
   * return aarch64_output_scalar_simd_mov_immediate (operands[1], SImode);"
  "CONST_INT_P (operands[1]) && !aarch64_move_imm (INTVAL (operands[1]), SImode)
&& REG_P (operands[0]) && GP_REGNUM_P (REGNO (operands[0]))"
   [(const_int 0)]
   "{
   aarch64_expand_mov_immediate (operands[0], operands[1]);
   DONE;
}"
  ;; The "mov_imm" type for CNT is just a placeholder.
  [(set_attr "type" "mov_reg,mov_reg,mov_reg,mov_imm,mov_imm,mov_imm,load_4,

load_4,store_4,store_4,load_4,adr,adr,f_mcr,f_mrc,fmov,neon_move")
   (set_attr "arch"   "*,*,*,*,*,sve,*,fp,*,fp,*,*,*,fp,fp,fp,simd")
   (set_attr "length" "4,4,4,4,*,  4,4, 4,4, 4,8,4,4, 4, 4, 4,   4")
]
)

New syntax:

(define_insn_and_split "*movsi_aarch64"
  [(set (match_operand:SI 0 "nonimmediate_operand")
(match_operand:SI 1 "aarch64_mov_operand"))]
  "(register_operand (operands[0], SImode)
|| aarch64_reg_or_zero (operands[1], SImode))"
  {@ [cons: =0, 1; attrs: type, arch, length]
 [r , r  ; mov_reg  , *   , 4] mov\t%w0, %w1
 [k , r  ; mov_reg  , *   , 4] ^
 [r , k  ; mov_reg  , *   , 4] ^
 [r , M  ; mov_imm  , *   , 4] mov\t%w0, %1
 [r , n  ; mov_imm  , *   ,16] #
 /* The "mov_imm" type for CNT is just a placeholder.  */
 [r , Usv; mov_imm  , sve , 4] << aarch64_output_sve_cnt_immediate ("cnt", 
"%x0", operands[1]);
 [r , m  ; load_4   , *   , 4] ldr\t%w0, %1
 [w , m  ; load_4   , fp  , 4] ldr\t%s0, %1
 [m , rZ ; store_4  , *   , 4] str\t%w1, %0
 [m , w  ; store_4  , fp  , 4] str\t%s1, %0
 [r , Usw; load_4   , *   , 8] adrp\t%x0, %A1;ldr\t%w0, [%x0, %L1]
 [r , Usa; adr  , *   , 4] adr\t%x0, %c1
 [r , Ush; adr  , *   , 4] adrp\t%x0, %A1
 [w , rZ ; f_mcr, fp  , 4] fmov\t%s0, %w1
 [r , w  ; f_mrc, fp  , 4] fmov\t%w0, %s1
 [w , w  ; fmov , fp  , 4] fmov\t%s0, %s1
 [w , Ds ; neon_move, simd, 4] << aarch64_output_scalar_simd_mov_immediate 
(operands[1], SImode);
  }
  "CONST_INT_P (operands[1]) && !aarch64_move_imm (INTVAL (operands[1]), SImode)
&& REG_P (operands[0]) && GP_REGNUM_P (REGNO (operands[0]))"
  [(const_int 0)]
  {
aarch64_expand_mov_immediate (operands[0], operands[1]);
DONE;
  }
)

The patch contains some more rewritten examples for both Arm and AArch64.  I
have included them for examples in this patch but the final version posted in
will have these split out.

The main syntax rules are as follows (See docs for full rules):
  - Template must start with "{@" and end with "}" to use the new syntax.
  - "{@" is followed by a layout in parentheses which is "cons:" followed by
a list of match_operand/match_scratch IDs, then a semicolon, then the
same for attributes ("attrs:"). Both sections are optional (so you can
use only cons, or only attrs, or both), and cons must come before attrs
if present.
  - Each alternative begins with any amount of whitespace.
  - Following the whitespace is a comma-separated list of constraints and/or
attributes within brackets [], with sections separated by a semicolon.
  - Following the closing ']' is any amount of whitespace, and then the actual
asm output.
  - Spaces are allowed in the list (they will simply be removed).
  - All alternatives should be specified: a blank list should be
"[,,]", "[,,;,]" etc., 

RE: middle-end: Support early break/return auto-vectorization.

2023-05-15 Thread Tamar Christina via Gcc-patches
Hi,

Yes I hope to upstream it this year.  I'm busy cleaning up a new version of the
patch and hope to send it up for review again next week if all tests pass.

Cheers,
Tamar

From: juzhe.zh...@rivai.ai 
Sent: Monday, May 15, 2023 6:20 AM
To: gcc-patches 
Cc: rguenther ; Tamar Christina ; 
Richard Sandiford 
Subject: middle-end: Support early break/return auto-vectorization.

Hi, this patch is very interesting patch and I found it's very beneficial after 
applying to my downstream RVV GCC.
However, it has been a long time since this patch was last updated.
Is it possible that this patch will be refined and merged into trunk in the
future?

Thanks

juzhe.zh...@rivai.ai


RE: [PATCH] Makefile.in: clean up match.pd-related dependencies

2023-05-05 Thread Tamar Christina via Gcc-patches
> -Original Message-
> From: Alexander Monakov 
> Sent: Friday, May 5, 2023 7:22 PM
> To: Tamar Christina 
> Cc: Richard Biener ; gcc-patches@gcc.gnu.org
> Subject: RE: [PATCH] Makefile.in: clean up match.pd-related dependencies
> 
> 
> On Fri, 5 May 2023, Tamar Christina wrote:
> 
> > > -Original Message-
> > > From: Alexander Monakov 
> > > Sent: Friday, May 5, 2023 6:59 PM
> > > To: Tamar Christina 
> > > Cc: Richard Biener ;
> > > gcc-patches@gcc.gnu.org
> > > Subject: RE: [PATCH] Makefile.in: clean up match.pd-related
> > > dependencies
> > >
> > >
> > > On Fri, 5 May 2023, Tamar Christina wrote:
> > >
> > > > > > Am 05.05.2023 um 19:03 schrieb Alexander Monakov via
> > > > > > Gcc-patches
> > > > > >  > > > > patc...@gcc.gnu.org>:
> > > > > >
> > > > > > Clean up confusing changes from the recent refactoring for
> > > > > > parallel match.pd build.
> > > > > >
> > > > > > gimple-match-head.o is not built. Remove related flags adjustment.
> > > > > >
> > > > > > Autogenerated gimple-match-N.o files do not depend on
> > > > > > gimple-match-exports.cc.
> > > > > >
> > > > > > {gimple,generic)-match-auto.h only depend on the prerequisites
> > > > > > of the corresponding s-{gimple,generic}-match stamp file, not any 
> > > > > > .cc
> file.
> > > > >
> > > > > LGTM
> > > > >
> > > > > > gcc/ChangeLog:
> > > > > >
> > > > > >* Makefile.in: (gimple-match-head.o-warn): Remove.
> > > > > >(GIMPLE_MATCH_PD_SEQ_SRC): Do not depend on
> > > > > >gimple-match-exports.cc.
> > > > > >(gimple-match-auto.h): Only depend on s-gimple-match.
> > > > > >(generic-match-auto.h): Likewise.
> > > > > > ---
> > > > > >
> > > > > > Tamar, do I understand correctly that you do not have more
> > > > > > plans for match.pd and I won't collide with you if I attempt
> > > > > > more cleanups in this
> > > > > area? Thanks!
> > > >
> > > > No, but I'm also not sure why this change.
> > > > The idea here was that if gimple-head-export.cc changes you must
> > > > have changed genmatch.cc and so you need to regenerate the
> > > > gimple-match-*
> > > which could change the header.
> > >
> > > gimple-head-export.cc does not exist.
> > >
> > > gimple-match-exports.cc is not a generated file. It's under source
> > > control and edited independently from genmatch.cc. It is compiled
> > > separately, producing gimple-match-exports.o.
> > >
> > > gimple-match-head.cc is also not a generated file, also under source
> control.
> > > It is transitively included into gimple-match-N.o files. If it
> > > changes, they will be rebuilt. This is not changed by my patch.
> > >
> > > gimple-match-auto.h is a generated file. It depends on
> > > s-gimple-match stamp file, which in turn depends on genmatch and
> > > match.pd. If either changes, the rule for the stamp file triggers.
> > > gimple-match-N.o files also depend on the stamp file, so they will be
> rebuilt as well.
> >
> > s-gimple-match does not depend on gimple-match-head.cc.  If it changes,
> > the stamp is not invalidated.
> 
> Right, this is correct: there's no need to rerun the recipe for the stamp,
> because contents of gimple-match-head.cc do not affect it.
> 
> > This happens to work because gimple-match-N.cc does depend on
> > gimple-match-head.cc, but if the gimple-match-N.cc already exists then
> nothing changes.
> 
> No, if gimple-match-N.cc already exist, make notices they are out-of-date via
> 
> $(GIMPLE_MATCH_PD_SEQ_SRC): s-gimple-match gimple-match-head.cc;
> @true
> 
> and this triggers rebuilding gimple-match-N.o.
> 
> I tested this. After 'touch gimple-match-head.cc' all ten gimple-match-N.o 
> files
> are rebuilt.
> 
> > So I don't think this changes anything. If anything I would say the
> > stamp file needs to depend on gimple-match-head.cc.
> 
> Is my explanation above satisfactory?

Sure,

Thanks,
Tamar

> 
> Thanks.
> Alexander
> 
> >
> > Thanks,
> > Tamar
> >
> > >
> > > Is there some problem I'm not seeing?
> > >
> > > Thanks.
> > > Alexander
> > >
> > > > So not sure I agree with this.
> > > >
> > > > Thanks,
> > > > Tamar
> > > >
> > > > > >
> > > > > > gcc/Makefile.in | 9 +++--
> > > > > > 1 file changed, 3 insertions(+), 6 deletions(-)
> > > > > >
> > > > > > diff --git a/gcc/Makefile.in b/gcc/Makefile.in index
> > > > > > 7e7ac078c5..0cc13c37d0 100644
> > > > > > --- a/gcc/Makefile.in
> > > > > > +++ b/gcc/Makefile.in
> > > > > > @@ -230,7 +230,6 @@ gengtype-lex.o-warn = -Wno-error
> > > > > > libgcov-util.o-warn = -Wno-error libgcov-driver-tool.o-warn =
> > > > > > -Wno-error libgcov-merge-tool.o-warn = -Wno-error
> > > > > > -gimple-match-head.o-warn = -Wno-unused
> > > > > > gimple-match-exports.o-warn
> > > > > =
> > > > > > -Wno-unused dfp.o-warn = -Wno-strict-aliasing
> > > > > >
> > > > > > @@ -2674,12 +2673,10 @@ s-tm-texi:
> > > > > > build/genhooks$(build_exeext)
> > > > > $(srcdir)/doc/tm.texi.in
> > > > > >  false; \
> > > > > >fi
> > > > > >
> > > > > > -$(GIMPLE_MATCH_PD_SEQ_SRC): s-gimple-match 

RE: [PATCH] Makefile.in: clean up match.pd-related dependencies

2023-05-05 Thread Tamar Christina via Gcc-patches
> -Original Message-
> From: Alexander Monakov 
> Sent: Friday, May 5, 2023 6:59 PM
> To: Tamar Christina 
> Cc: Richard Biener ; gcc-patches@gcc.gnu.org
> Subject: RE: [PATCH] Makefile.in: clean up match.pd-related dependencies
> 
> 
> On Fri, 5 May 2023, Tamar Christina wrote:
> 
> > > > Am 05.05.2023 um 19:03 schrieb Alexander Monakov via Gcc-patches
> > > >  > > patc...@gcc.gnu.org>:
> > > >
> > > > Clean up confusing changes from the recent refactoring for
> > > > parallel match.pd build.
> > > >
> > > > gimple-match-head.o is not built. Remove related flags adjustment.
> > > >
> > > > Autogenerated gimple-match-N.o files do not depend on
> > > > gimple-match-exports.cc.
> > > >
> > > > {gimple,generic)-match-auto.h only depend on the prerequisites of
> > > > the corresponding s-{gimple,generic}-match stamp file, not any .cc file.
> > >
> > > LGTM
> > >
> > > > gcc/ChangeLog:
> > > >
> > > >* Makefile.in: (gimple-match-head.o-warn): Remove.
> > > >(GIMPLE_MATCH_PD_SEQ_SRC): Do not depend on
> > > >gimple-match-exports.cc.
> > > >(gimple-match-auto.h): Only depend on s-gimple-match.
> > > >(generic-match-auto.h): Likewise.
> > > > ---
> > > >
> > > > Tamar, do I understand correctly that you do not have more plans
> > > > for match.pd and I won't collide with you if I attempt more
> > > > cleanups in this
> > > area? Thanks!
> >
> > No, but I'm also not sure why this change.
> > The idea here was that if gimple-head-export.cc changes you must have
> > changed genmatch.cc and so you need to regenerate the gimple-match-*
> which could change the header.
> 
> gimple-head-export.cc does not exist.
> 
> gimple-match-exports.cc is not a generated file. It's under source control and
> edited independently from genmatch.cc. It is compiled separately, producing
> gimple-match-exports.o.
> 
> gimple-match-head.cc is also not a generated file, also under source control.
> It is transitively included into gimple-match-N.o files. If it changes, they 
> will be
> rebuilt. This is not changed by my patch.
> 
> gimple-match-auto.h is a generated file. It depends on s-gimple-match stamp
> file, which in turn depends on genmatch and match.pd. If either changes, the
> rule for the stamp file triggers. gimple-match-N.o files also depend on the
> stamp file, so they will be rebuilt as well.

s-gimple-match does not depend on gimple-match-head.cc.  If it changes, the
stamp is not invalidated.

This happens to work because gimple-match-N.cc does depend on 
gimple-match-head.cc,
but if gimple-match-N.cc already exists then nothing changes.

So I don't think this changes anything. If anything I would say the stamp file 
needs to
depend on gimple-match-head.cc. 
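
Condensed, the chain being discussed looks roughly like this (flags and
paths trimmed, one partition standing in for all of them):

```make
s-gimple-match: build/genmatch$(build_exeext) match.pd
	build/genmatch --gimple match.pd   # regenerates the partitions
	$(STAMP) s-gimple-match

# Empty recipe: the file is produced by the stamp rule above, but it is
# also treated as out of date when gimple-match-head.cc (which it
# transitively #includes) changes, so gimple-match-1.o is rebuilt then.
gimple-match-1.cc: s-gimple-match gimple-match-head.cc; @true
```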

Thanks,
Tamar

> 
> Is there some problem I'm not seeing?
> 
> Thanks.
> Alexander
> 
> > So not sure I agree with this.
> >
> > Thanks,
> > Tamar
> >
> > > >
> > > > gcc/Makefile.in | 9 +++--
> > > > 1 file changed, 3 insertions(+), 6 deletions(-)
> > > >
> > > > diff --git a/gcc/Makefile.in b/gcc/Makefile.in index
> > > > 7e7ac078c5..0cc13c37d0 100644
> > > > --- a/gcc/Makefile.in
> > > > +++ b/gcc/Makefile.in
> > > > @@ -230,7 +230,6 @@ gengtype-lex.o-warn = -Wno-error
> > > > libgcov-util.o-warn = -Wno-error libgcov-driver-tool.o-warn =
> > > > -Wno-error libgcov-merge-tool.o-warn = -Wno-error
> > > > -gimple-match-head.o-warn = -Wno-unused
> > > > gimple-match-exports.o-warn
> > > =
> > > > -Wno-unused dfp.o-warn = -Wno-strict-aliasing
> > > >
> > > > @@ -2674,12 +2673,10 @@ s-tm-texi: build/genhooks$(build_exeext)
> > > $(srcdir)/doc/tm.texi.in
> > > >  false; \
> > > >fi
> > > >
> > > > -$(GIMPLE_MATCH_PD_SEQ_SRC): s-gimple-match gimple-match-
> head.cc \
> > > > -gimple-match-exports.cc; @true
> > > > -gimple-match-auto.h: s-gimple-match gimple-match-head.cc \
> > > > -gimple-match-exports.cc; @true
> > > > +$(GIMPLE_MATCH_PD_SEQ_SRC): s-gimple-match gimple-match-
> head.cc;
> > > > +@true
> > > > +gimple-match-auto.h: s-gimple-match; @true
> > > > $(GENERIC_MATCH_PD_SEQ_SRC): s-generic-match
> > > > generic-match-head.cc; @true
> > > > -generic-match-auto.h: s-generic-match generic-match-head.cc;
> > > > @true
> > > > +generic-match-auto.h: s-generic-match; @true
> > > >
> > > > s-gimple-match: build/genmatch$(build_exeext) \
> > > >$(srcdir)/match.pd cfn-operators.pd
> > > > --
> > > > 2.39.2
> > > >
> >


RE: [PATCH] Makefile.in: clean up match.pd-related dependencies

2023-05-05 Thread Tamar Christina via Gcc-patches
> > Am 05.05.2023 um 19:03 schrieb Alexander Monakov via Gcc-patches  patc...@gcc.gnu.org>:
> >
> > Clean up confusing changes from the recent refactoring for parallel
> > match.pd build.
> >
> > gimple-match-head.o is not built. Remove related flags adjustment.
> >
> > Autogenerated gimple-match-N.o files do not depend on
> > gimple-match-exports.cc.
> >
> > {gimple,generic)-match-auto.h only depend on the prerequisites of the
> > corresponding s-{gimple,generic}-match stamp file, not any .cc file.
> 
> LGTM
> 
> > gcc/ChangeLog:
> >
> >* Makefile.in: (gimple-match-head.o-warn): Remove.
> >(GIMPLE_MATCH_PD_SEQ_SRC): Do not depend on
> >gimple-match-exports.cc.
> >(gimple-match-auto.h): Only depend on s-gimple-match.
> >(generic-match-auto.h): Likewise.
> > ---
> >
> > Tamar, do I understand correctly that you do not have more plans for
> > match.pd and I won't collide with you if I attempt more cleanups in this
> area? Thanks!

No, but I'm also not sure why this change.
The idea here was that if gimple-head-export.cc changes you must have changed
genmatch.cc and so you need to regenerate the gimple-match-* which could change 
the header.

So not sure I agree with this.

Thanks,
Tamar

> >
> > gcc/Makefile.in | 9 +++--
> > 1 file changed, 3 insertions(+), 6 deletions(-)
> >
> > diff --git a/gcc/Makefile.in b/gcc/Makefile.in index
> > 7e7ac078c5..0cc13c37d0 100644
> > --- a/gcc/Makefile.in
> > +++ b/gcc/Makefile.in
> > @@ -230,7 +230,6 @@ gengtype-lex.o-warn = -Wno-error
> >  libgcov-util.o-warn = -Wno-error
> >  libgcov-driver-tool.o-warn = -Wno-error
> >  libgcov-merge-tool.o-warn = -Wno-error
> > -gimple-match-head.o-warn = -Wno-unused
> >  gimple-match-exports.o-warn = -Wno-unused
> >  dfp.o-warn = -Wno-strict-aliasing
> >
> > @@ -2674,12 +2673,10 @@ s-tm-texi: build/genhooks$(build_exeext)
> $(srcdir)/doc/tm.texi.in
> >  false; \
> >fi
> >
> > -$(GIMPLE_MATCH_PD_SEQ_SRC): s-gimple-match gimple-match-head.cc \
> > -gimple-match-exports.cc; @true
> > -gimple-match-auto.h: s-gimple-match gimple-match-head.cc \
> > -gimple-match-exports.cc; @true
> > +$(GIMPLE_MATCH_PD_SEQ_SRC): s-gimple-match gimple-match-head.cc; @true
> > +gimple-match-auto.h: s-gimple-match; @true
> >  $(GENERIC_MATCH_PD_SEQ_SRC): s-generic-match generic-match-head.cc; @true
> > -generic-match-auto.h: s-generic-match generic-match-head.cc; @true
> > +generic-match-auto.h: s-generic-match; @true
> >
> > s-gimple-match: build/genmatch$(build_exeext) \
> >$(srcdir)/match.pd cfn-operators.pd
> > --
> > 2.39.2
> >


RE: [PATCH 5/5] match.pd: Use splits in makefile and make configurable.

2023-05-05 Thread Tamar Christina via Gcc-patches
> -Original Message-
> From: Jakub Jelinek 
> Sent: Friday, May 5, 2023 4:33 PM
> To: Tamar Christina 
> Cc: Jeff Law ; David Edelsohn ;
> GCC Patches 
> Subject: Re: [PATCH 5/5] match.pd: Use splits in makefile and make
> configurable.
> 
> On Fri, May 05, 2023 at 03:22:11PM +, Tamar Christina wrote:
> > > We require GNU make, so perhaps we could use something like
> > > $(wordlist
> > > 1,$(NUM_MATCH_SPLITS),$(check_p_numbers))
> > > instead of
> > > $(shell seq 1 $(NUM_MATCH_SPLITS))
> > > provided we move the check_p_numbers definition earlier (or perhaps
> > > better rename it to something more generic, so that it is clear
> > > that it is a variable holding numbers from 1 to 9999).
> >
> > I'm currently testing
> >
> >  NUM_MATCH_SPLITS = @DEFAULT_MATCHPD_PARTITIONS@
> > -MATCH_SPLITS_SEQ = $(shell seq 1 $(NUM_MATCH_SPLITS))
> > +MATCH_SPLITS_SEQ = $(shell echo {1..$(NUM_MATCH_SPLITS)})
> >
> > Which seems to work since it looks like we require an sh compatible shell.
> >
> > Question is this right? From the existing
> 
> AIX /bin/sh certainly doesn't handle that.

Wow, wonder what sh version it has..

> 
> But what do I know about AIX...

Same..

> 
> This seems to work and we use it already in the Makefile.
> If something else works portably, we could change both spots...
> 
> 2023-05-05  Jakub Jelinek  
> 
>   * Makefile.in (check_p_numbers): Rename to one_to_9999, move
>   earlier with helper variables also renamed.
>   (MATCH_SPLITS_SEQ): Use $(wordlist 1,$(NUM_MATCH_SPLITS),$(one_to_9999))
>   instead of $(shell seq 1 $(NUM_MATCH_SPLITS)).
>   (check_p_subdirs): Use $(one_to_9999) instead of $(check_p_numbers).
> 
> --- gcc/Makefile.in.jj2023-05-05 16:02:37.180575333 +0200
> +++ gcc/Makefile.in   2023-05-05 17:20:27.923251821 +0200
> @@ -214,9 +214,19 @@ rtl-ssa-warn = $(STRICT_WARN)
>  GCC_WARN_CFLAGS = $(LOOSE_WARN) $(C_LOOSE_WARN) $($(@D)-warn) $(if $(filter-out $(STRICT_WARN),$($(@D)-warn)),,$(C_STRICT_WARN)) $(NOCOMMON_FLAG) $($@-warn)
>  GCC_WARN_CXXFLAGS = $(LOOSE_WARN) $($(@D)-warn) $(NOCOMMON_FLAG) $($@-warn)
> 
> +# 1 2 3 ... 9999
> +one_to_9999_0:=1 2 3 4 5 6 7 8 9
> +one_to_9999_1:=0 $(one_to_9999_0)
> +one_to_9999_2:=$(foreach i,$(one_to_9999_0),$(addprefix $(i),$(one_to_9999_1)))
> +one_to_9999_3:=$(addprefix 0,$(one_to_9999_1)) $(one_to_9999_2)
> +one_to_9999_4:=$(foreach i,$(one_to_9999_0),$(addprefix $(i),$(one_to_9999_3)))
> +one_to_9999_5:=$(addprefix 0,$(one_to_9999_3)) $(one_to_9999_4)
> +one_to_9999_6:=$(foreach i,$(one_to_9999_0),$(addprefix $(i),$(one_to_9999_5)))
> +one_to_9999:=$(one_to_9999_0) $(one_to_9999_2) $(one_to_9999_4) $(one_to_9999_6)
> +
>  # The number of splits to be made for the match.pd files.
>  NUM_MATCH_SPLITS = @DEFAULT_MATCHPD_PARTITIONS@
> -MATCH_SPLITS_SEQ = $(shell seq 1 $(NUM_MATCH_SPLITS))
> +MATCH_SPLITS_SEQ = $(wordlist 1,$(NUM_MATCH_SPLITS),$(one_to_9999))
>  GIMPLE_MATCH_PD_SEQ_SRC = $(patsubst %, gimple-match-%.cc, $(MATCH_SPLITS_SEQ))
>  GIMPLE_MATCH_PD_SEQ_O = $(patsubst %, gimple-match-%.o, $(MATCH_SPLITS_SEQ))
>  GENERIC_MATCH_PD_SEQ_SRC = $(patsubst %, generic-match-%.cc, $(MATCH_SPLITS_SEQ))
> @@ -4234,18 +4244,10 @@ $(patsubst %,%-subtargets,$(lang_checks)
>  check_p_tool=$(firstword $(subst _, ,$*))
>  check_p_count=$(check_$(check_p_tool)_parallelize)
>  check_p_subno=$(word 2,$(subst _, ,$*))
> -check_p_numbers0:=1 2 3 4 5 6 7 8 9
> -check_p_numbers1:=0 $(check_p_numbers0) -
> check_p_numbers2:=$(foreach i,$(check_p_numbers0),$(addprefix
> $(i),$(check_p_numbers1))) -check_p_numbers3:=$(addprefix
> 0,$(check_p_numbers1)) $(check_p_numbers2) -
> check_p_numbers4:=$(foreach i,$(check_p_numbers0),$(addprefix
> $(i),$(check_p_numbers3))) -check_p_numbers5:=$(addprefix
> 0,$(check_p_numbers3)) $(check_p_numbers4) -
> check_p_numbers6:=$(foreach i,$(check_p_numbers0),$(addprefix
> $(i),$(check_p_numbers5)))
> -check_p_numbers:=$(check_p_numbers0) $(check_p_numbers2)
> $(check_p_numbers4) $(check_p_numbers6)  check_p_subdir=$(subst _,,$*)
> check_p_subdirs=$(wordlist 1,$(check_p_count),$(wordlist 1, \
>   $(if
> $(GCC_TEST_PARALLEL_SLOTS),$(GCC_TEST_PARALLEL_SLOTS),128), \
> - $(check_p_numbers)))
> + $(one_to_)))

Thanks, if it works I'm happy; I can rebase my other patches to use this.

Thank you!

Regards,
Tamar

> 
>  # For parallelized check-% targets, this decides whether parallelization
>  # is desirable (if -jN is used).  If desirable, recursive make is run with
> 
> 
>   Jakub
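A minimal standalone sketch of the same digit cross-product idea (the variable
names here are made up, not the ones from the patch; note the recipe line must
start with a tab):

# GNUmakefile demo: build the list 1..99 with no shell at all by
# cross-multiplying digit lists, then trim it with $(wordlist).
digits := 1 2 3 4 5 6 7 8 9
with0  := 0 $(digits)
tens   := $(foreach i,$(digits),$(addprefix $(i),$(with0)))
one_to_99 := $(digits) $(tens)

N := 12
seq := $(wordlist 1,$(N),$(one_to_99))

demo:
	@echo $(seq)   # prints: 1 2 3 4 5 6 7 8 9 10 11 12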



RE: [PATCH 5/5] match.pd: Use splits in makefile and make configurable.

2023-05-05 Thread Tamar Christina via Gcc-patches
> -Original Message-
> From: Jakub Jelinek 
> Sent: Friday, May 5, 2023 4:18 PM
> To: Jeff Law 
> Cc: David Edelsohn ; Tamar Christina
> ; GCC Patches 
> Subject: Re: [PATCH 5/5] match.pd: Use splits in makefile and make
> configurable.
> 
> On Fri, May 05, 2023 at 09:04:16AM -0600, Jeff Law via Gcc-patches wrote:
> > On 5/5/23 08:59, David Edelsohn via Gcc-patches wrote:
> > > This patch has broken GCC bootstrap on AIX.  It appears to rely
> > > upon, or complain about, the command "seq":
> > >
> > > /nasfarm/edelsohn/install/GCC12/bin/g++ -std=c++11   -g -DIN_GCC
> > > -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall
> > > -Wno-narrowing -Wwrite-strings -Wcast-qual -Wno-format
> > > -Wmissing-format-attribute -Wconditionally-supported
> > > -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros
> > > -Wno-overlength-strings -fno-common  -DHAVE_CONFIG_H
> > > -DGENERATOR_FILE -static-libstdc++ -static-libgcc -Wl,-bbigtoc -Wl,-
> bmaxdata:0x4000 -o build/genmatch \
> > >  build/genmatch.o
> > > ../build-powerpc-ibm-aix7.2.5.0/libcpp/libcpp.a
> > > build/errors.o build/vec.o build/hash-table.o build/sort.o
> > > ../build-powerpc-ibm-aix7.2.5.0/libiberty/libiberty.a
> > > /usr/bin/bash: seq: command not found
> > > /usr/bin/bash: seq: command not found build/genmatch --gimple \
> > >  --header=tmp-gimple-match-auto.h --include=gimple-match-auto.h \
> > >  /nasfarm/edelsohn/src/src/gcc/match.pd
> > >
> > > All of the match files are dumped to stdout.
> > Sigh.  So the question is do we make seq a requirement or do we
> > implement an alternate to get the sequence or implement a fallback.
> 
> We require GNU make, so perhaps we could use something like
> $(wordlist 1,$(NUM_MATCH_SPLITS),$(check_p_numbers))
> instead of
> $(shell seq 1 $(NUM_MATCH_SPLITS))
> provided we move the check_p_numbers definition earlier (or perhaps better
> rename it to something more generic, so that it is clear that it is a variable
> holding numbers from 1 to 9999).

I'm currently testing

NUM_MATCH_SPLITS = @DEFAULT_MATCHPD_PARTITIONS@
-MATCH_SPLITS_SEQ = $(shell seq 1 $(NUM_MATCH_SPLITS))
+MATCH_SPLITS_SEQ = $(shell echo {1..$(NUM_MATCH_SPLITS)})

Which seems to work since it looks like we require an sh compatible shell.

Question is, is this right?  Judging from the existing rule

$(foreach header_var,$(shell sed < Makefile -n -e 's/^\([A-Z0-9_]*_H\)[ ]*=.*/\1/p'),echo $(header_var)=$(shell echo $($(header_var):$(srcdir)/%=.../%) | sed -e 's~\.\.\./config/~config/~' -e 's~\.\.\./common/config/~common/config/~' -e 's~\.\.\.[^ ]*/~~g') >> tmp-header-vars;)

this seems to be correct.

Thanks,
Tamar

> 
>   Jakub
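A quick sketch of why this is not portable: {1..N} brace expansion is a
bash/zsh/ksh93 extension rather than POSIX sh behaviour, so a strictly POSIX
shell passes the braces through literally:

  $ bash -c 'echo {1..5}'
  1 2 3 4 5
  $ dash -c 'echo {1..5}'
  {1..5}

Note that make expands $(NUM_MATCH_SPLITS) itself before the shell ever runs,
so the shell only sees a literal {1..10}; that works under bash, but not under
AIX's ksh88-derived /bin/sh, which as far as I know predates range braces.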



RE: [PATCH 5/5] match.pd: Use splits in makefile and make configurable.

2023-05-05 Thread Tamar Christina via Gcc-patches
> -Original Message-
> From: Jeff Law 
> Sent: Friday, May 5, 2023 4:04 PM
> To: David Edelsohn ; Tamar Christina
> 
> Cc: GCC Patches 
> Subject: Re: [PATCH 5/5] match.pd: Use splits in makefile and make
> configurable.
> 
> 
> 
> On 5/5/23 08:59, David Edelsohn via Gcc-patches wrote:
> > This patch has broken GCC bootstrap on AIX.  It appears to rely upon,
> > or complain about, the command "seq":
> >
> > /nasfarm/edelsohn/install/GCC12/bin/g++ -std=c++11   -g -DIN_GCC
> > -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall
> > -Wno-narrowing -Wwrite-strings -Wcast-qual -Wno-format
> > -Wmissing-format-attribute -Wconditionally-supported
> > -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros
> > -Wno-overlength-strings -fno-common  -DHAVE_CONFIG_H  -
> DGENERATOR_FILE
> > -static-libstdc++ -static-libgcc -Wl,-bbigtoc -Wl,-bmaxdata:0x4000 -o
> build/genmatch \
> >  build/genmatch.o ../build-powerpc-ibm-aix7.2.5.0/libcpp/libcpp.a
> > build/errors.o build/vec.o build/hash-table.o build/sort.o
> > ../build-powerpc-ibm-aix7.2.5.0/libiberty/libiberty.a
> > /usr/bin/bash: seq: command not found
> > /usr/bin/bash: seq: command not found
> > build/genmatch --gimple \
> >  --header=tmp-gimple-match-auto.h --include=gimple-match-auto.h \
> >  /nasfarm/edelsohn/src/src/gcc/match.pd
> >
> > All of the match files are dumped to stdout.
> Sigh.  So the question is do we make seq a requirement or do we implement an
> alternate to get the sequence or implement a fallback.
> 
> jeff

I'm looking for an alternate sequence now.

If I don't find one in a bit, then since Monday is a bank holiday in the UK, I
can temporarily ignore the configure flag by defining

MATCH_SPLITS_SEQ = 1 2 3 4 5 6 7 8 9 10

Would that be ok as a temporary fix if I don't find anything else by EOD?  I'm
still hoping to find another way that doesn't rely on coreutils.

Cheers,
Tamar


RE: [PATCH 5/5] match.pd: Use splits in makefile and make configurable.

2023-05-05 Thread Tamar Christina via Gcc-patches
>  This looks pretty reasonable to me.  Are there any patches left in
>  this series that need review?  I'm very much looking forward to
>  build time improvements related to this patch, particularly for
>  targets that I bootstrap with qemu emulation -- we take multiple
>  hours to build gimple-match and the ability to parallelize those
>  component
> >> builds should be a significant win.
> >>>
> >>> Hi,
> >>>
> >>> No this is the last one, Richi already approved the rest but he
> >>> didn't feel he had enough knowledge about the build system to say if
> >>> this code was portable enough.
> >>
> >> I'm looking forward to this going as well for improved bootstrap
> >> times, thanks for working on this!
> >>
> >>>
> >>> So just waiting on this one and can commit the series.
> >>
> >> Can we treat Jeff's LGTM above as an ok given his global reviewer position?
> >
> > Ah I didn't treat it as such as it wasn't in reply to the "ok for
> > master" part. But perhaps I misunderstood.  In case it wasn't, this is
> > also a PING for the *.in files maintainers.
> My message was fairly ambiguous.   I just gave it another once over
> and I'll give an explicit OK for the trunk.
> 

Merci!

I'll go to the next bottleneck then.

Thanks!
Tamar

> Jeff


RE: [PATCH 5/5] match.pd: Use splits in makefile and make configurable.

2023-05-04 Thread Tamar Christina via Gcc-patches
> -Original Message-
> From: Kyrylo Tkachov 
> Sent: Wednesday, May 3, 2023 4:19 PM
> To: Tamar Christina ; Jeff Law
> ; gcc-patches@gcc.gnu.org
> Cc: nd ; bonz...@gnu.org; nero...@gcc.gnu.org;
> aol...@gcc.gnu.org; ralf.wildenh...@gmx.de
> Subject: RE: [PATCH 5/5] match.pd: Use splits in makefile and make
> configurable.
> 
> 
> 
> > -Original Message-
> > From: Gcc-patches  > bounces+kyrylo.tkachov=arm....@gcc.gnu.org> On Behalf Of Tamar
> > Christina via Gcc-patches
> > Sent: Tuesday, May 2, 2023 8:08 AM
> > To: Jeff Law ; gcc-patches@gcc.gnu.org
> > Cc: nd ; bonz...@gnu.org; nero...@gcc.gnu.org;
> > aol...@gcc.gnu.org; ralf.wildenh...@gmx.de
> > Subject: RE: [PATCH 5/5] match.pd: Use splits in makefile and make
> > configurable.
> >
> > > -Original Message-
> > > From: Jeff Law 
> > > Sent: Sunday, April 30, 2023 8:46 PM
> > > To: Tamar Christina ;
> > > gcc-patches@gcc.gnu.org
> > > Cc: nd ; bonz...@gnu.org; nero...@gcc.gnu.org;
> > > aol...@gcc.gnu.org; ralf.wildenh...@gmx.de
> > > Subject: Re: [PATCH 5/5] match.pd: Use splits in makefile and make
> > > configurable.
> > >
> > >
> > >
> > > On 4/28/23 04:44, Tamar Christina via Gcc-patches wrote:
> > > > Hi All,
> > > >
> > > > This updates the build system to split up match.pd files into chunks of
> 10.
> > > > This also introduces a new flag --with-matchpd-partitions which
> > > > can be used to change the number of partitions.
> > > >
> > > > For the analysis of why 10 please look at the previous patch in the 
> > > > series.
> > > >
> > > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > > >
> > > > Ok for master?
> > > >
> > > > Thanks,
> > > > Tamar
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > PR bootstrap/84402
> > > > * Makefile.in (NUM_MATCH_SPLITS, MATCH_SPLITS_SEQ,
> > > > GIMPLE_MATCH_PD_SEQ_SRC, GIMPLE_MATCH_PD_SEQ_O,
> > > > GENERIC_MATCH_PD_SEQ_SRC, GENERIC_MATCH_PD_SEQ_O): New.
> > > > (OBJS, MOSTLYCLEANFILES, .PRECIOUS): Use them.
> > > > (s-match): Split into s-generic-match and s-gimple-match.
> > > > * configure.ac (with-matchpd-partitions,
> > > > DEFAULT_MATCHPD_PARTITIONS): New.
> > > > * configure: Regenerate.
> > > This looks pretty reasonable to me.  Are there any patches left in
> > > this series that need review?  I'm very much looking forward to
> > > build time improvements related to this patch, particularly for
> > > targets that I bootstrap with qemu emulation -- we take multiple
> > > hours to build gimple-match and the ability to parallelize those component
> builds should be a significant win.
> >
> > Hi,
> >
> > No this is the last one, Richi already approved the rest but he didn't
> > feel he had enough knowledge about the build system to say if this
> > code was portable enough.
> 
> I'm looking forward to this going as well for improved bootstrap times, thanks
> for working on this!
> 
> >
> > So just waiting on this one and can commit the series.
> 
> Can we treat Jeff's LGTM above as an ok given his global reviewer position?

Ah I didn't treat it as such as it wasn't in reply to the "ok for master" part.
But perhaps I misunderstood.  In case it wasn't, this is also a PING for the
*.in files maintainers.

Regards,
Tamar

> Thanks,
> Kyrill
> 
> >
> > Cheers,
> > Tamar
> > >
> > > jeff


RE: [PATCH 5/5] match.pd: Use splits in makefile and make configurable.

2023-05-02 Thread Tamar Christina via Gcc-patches
> -Original Message-
> From: Jeff Law 
> Sent: Sunday, April 30, 2023 8:46 PM
> To: Tamar Christina ; gcc-patches@gcc.gnu.org
> Cc: nd ; bonz...@gnu.org; nero...@gcc.gnu.org;
> aol...@gcc.gnu.org; ralf.wildenh...@gmx.de
> Subject: Re: [PATCH 5/5] match.pd: Use splits in makefile and make
> configurable.
> 
> 
> 
> On 4/28/23 04:44, Tamar Christina via Gcc-patches wrote:
> > Hi All,
> >
> > This updates the build system to split up match.pd files into chunks of 10.
> > This also introduces a new flag --with-matchpd-partitions which can be
> > used to change the number of partitions.
> >
> > For the analysis of why 10 please look at the previous patch in the series.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > PR bootstrap/84402
> > * Makefile.in (NUM_MATCH_SPLITS, MATCH_SPLITS_SEQ,
> > GIMPLE_MATCH_PD_SEQ_SRC, GIMPLE_MATCH_PD_SEQ_O,
> > GENERIC_MATCH_PD_SEQ_SRC, GENERIC_MATCH_PD_SEQ_O): New.
> > (OBJS, MOSTLYCLEANFILES, .PRECIOUS): Use them.
> > (s-match): Split into s-generic-match and s-gimple-match.
> > * configure.ac (with-matchpd-partitions,
> > DEFAULT_MATCHPD_PARTITIONS): New.
> > * configure: Regenerate.
> This looks pretty reasonable to me.  Are there any patches left in this series
> that need review?  I'm very much looking forward to build time improvements
> related to this patch, particularly for targets that I bootstrap with qemu
> emulation -- we take multiple hours to build gimple-match and the ability to
> parallelize those component builds should be a significant win.

Hi,

No this is the last one, Richi already approved the rest but he didn't feel he
had enough knowledge about the build system to say if this code was portable
enough.

So just waiting on this one and can commit the series.

Cheers,
Tamar
> 
> jeff


RE: [PATCH 2/5] match.pd: Remove commented out line pragmas unless -vv is used.

2023-04-28 Thread Tamar Christina via Gcc-patches
> On the check for verbose==2, should that be verbose >= 2 ?
> 

That's fair enough. Made the change.

Thanks,
Tamar.

>   paul
> 
> > On Apr 28, 2023, at 6:38 AM, Tamar Christina via Gcc-patches  patc...@gcc.gnu.org> wrote:
> >
> > Hi All,
> >
> > genmatch currently outputs commented out line directives that have no
> > effect but the compiler still has to parse only to discard.
> >
> > They are however handy when debugging genmatch output.  As such this
> > moves them behind the -vv flag.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > PR bootstrap/84402
> > * genmatch.cc (output_line_directive): Only emit commented directive
> > when -vv.
> > (main): Initialize verbose.
> >
> > --- inline copy of patch --
> > diff --git a/gcc/genmatch.cc b/gcc/genmatch.cc
> > index 638606b2502f640e59527fc5a0b23fa3bedd0cee..6d62cdea2082d92e5ecc1102c80205115a4e3040 100644
> > --- a/gcc/genmatch.cc
> > +++ b/gcc/genmatch.cc
> > @@ -209,7 +209,7 @@ output_line_directive (FILE *f, location_t location,
> >   else
> > fprintf (f, "%s:%d", file, loc.line);
> > }
> > -  else
> > +  else if (verbose == 2)
> > /* Other gen programs really output line directives here, at least for
> >development it's right now more convenient to have line information
> >from the generated file.  Still keep the directives as comment
> > for now @@ -5221,6 +5221,7 @@ main (int argc, char **argv)
> > return 1;
> >
> >   bool gimple = true;
> > +  verbose = 0;
> >   char *input = argv[argc-1];
> >   for (int i = 1; i < argc - 1; ++i)
> > {
> >
> >
> >
> >
> > --
> > 



[PATCH 5/5] match.pd: Use splits in makefile and make configurable.

2023-04-28 Thread Tamar Christina via Gcc-patches
Hi All,

This updates the build system to split up match.pd files into chunks of 10.
This also introduces a new flag --with-matchpd-partitions which can be used to
change the number of partitions.

For the analysis of why 10 please look at the previous patch in the series.
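As a usage sketch (the flag as described above; the value 16 is only an
example, not a recommendation):

  $ /path/to/gcc/configure --with-matchpd-partitions=16

When the flag is not given, the number of partitions defaults to 10.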

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

PR bootstrap/84402
* Makefile.in (NUM_MATCH_SPLITS, MATCH_SPLITS_SEQ,
GIMPLE_MATCH_PD_SEQ_SRC, GIMPLE_MATCH_PD_SEQ_O,
GENERIC_MATCH_PD_SEQ_SRC, GENERIC_MATCH_PD_SEQ_O): New.
(OBJS, MOSTLYCLEANFILES, .PRECIOUS): Use them.
(s-match): Split into s-generic-match and s-gimple-match.
* configure.ac (with-matchpd-partitions,
DEFAULT_MATCHPD_PARTITIONS): New.
* configure: Regenerate.

--- inline copy of patch -- 
diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 70559a014c0e32d8d825766e0c1516fc2ee05421..f3343eea3339e9dc054e83cfb899799c7d784963 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -214,6 +214,14 @@ rtl-ssa-warn = $(STRICT_WARN)
 GCC_WARN_CFLAGS = $(LOOSE_WARN) $(C_LOOSE_WARN) $($(@D)-warn) $(if $(filter-out $(STRICT_WARN),$($(@D)-warn)),,$(C_STRICT_WARN)) $(NOCOMMON_FLAG) $($@-warn)
 GCC_WARN_CXXFLAGS = $(LOOSE_WARN) $($(@D)-warn) $(NOCOMMON_FLAG) $($@-warn)
 
+# The number of splits to be made for the match.pd files.
+NUM_MATCH_SPLITS = @DEFAULT_MATCHPD_PARTITIONS@
+MATCH_SPLITS_SEQ = $(shell seq 1 $(NUM_MATCH_SPLITS))
+GIMPLE_MATCH_PD_SEQ_SRC = $(patsubst %, gimple-match-%.cc, $(MATCH_SPLITS_SEQ))
+GIMPLE_MATCH_PD_SEQ_O = $(patsubst %, gimple-match-%.o, $(MATCH_SPLITS_SEQ))
+GENERIC_MATCH_PD_SEQ_SRC = $(patsubst %, generic-match-%.cc, $(MATCH_SPLITS_SEQ))
+GENERIC_MATCH_PD_SEQ_O = $(patsubst %, generic-match-%.o, $(MATCH_SPLITS_SEQ))
+
 # These files are to have specific diagnostics suppressed, or are not to
 # be subject to -Werror:
 # flex output may yield harmless "no previous prototype" warnings
@@ -222,9 +230,8 @@ gengtype-lex.o-warn = -Wno-error
 libgcov-util.o-warn = -Wno-error
 libgcov-driver-tool.o-warn = -Wno-error
 libgcov-merge-tool.o-warn = -Wno-error
-gimple-match.o-warn = -Wno-unused
+gimple-match-head.o-warn = -Wno-unused
 gimple-match-exports.o-warn = -Wno-unused
-generic-match.o-warn = -Wno-unused
 dfp.o-warn = -Wno-strict-aliasing
 
 # All warnings have to be shut off in stage1 if the compiler used then
@@ -1310,9 +1317,9 @@ ANALYZER_OBJS = \
 # will build them sooner, because they are large and otherwise tend to be
 # the last objects to finish building.
 OBJS = \
-   gimple-match.o \
+   $(GIMPLE_MATCH_PD_SEQ_O) \
gimple-match-exports.o \
-   generic-match.o \
+   $(GENERIC_MATCH_PD_SEQ_O) \
insn-attrtab.o \
insn-automata.o \
insn-dfatab.o \
@@ -1805,7 +1812,8 @@ MOSTLYCLEANFILES = insn-flags.h insn-config.h insn-codes.h \
  insn-output.cc insn-recog.cc insn-emit.cc insn-extract.cc insn-peep.cc \
  insn-attr.h insn-attr-common.h insn-attrtab.cc insn-dfatab.cc \
  insn-latencytab.cc insn-opinit.cc insn-opinit.h insn-preds.cc insn-constants.h \
- tm-preds.h tm-constrs.h checksum-options gimple-match.cc generic-match.cc \
+ tm-preds.h tm-constrs.h checksum-options $(GIMPLE_MATCH_PD_SEQ_SRC) \
+ $(GENERIC_MATCH_PD_SEQ_SRC) gimple-match-auto.h generic-match-auto.h \
  tree-check.h min-insn-modes.cc insn-modes.cc insn-modes.h insn-modes-inline.h \
  genrtl.h gt-*.h gtype-*.h gtype-desc.cc gtyp-input.list \
  case-cfn-macros.h cfn-operators.pd \
@@ -2420,7 +2428,8 @@ $(common_out_object_file): $(common_out_file)
 .PRECIOUS: insn-config.h insn-flags.h insn-codes.h insn-constants.h \
   insn-emit.cc insn-recog.cc insn-extract.cc insn-output.cc insn-peep.cc \
   insn-attr.h insn-attr-common.h insn-attrtab.cc insn-dfatab.cc \
-  insn-latencytab.cc insn-preds.cc gimple-match.cc generic-match.cc \
+  insn-latencytab.cc insn-preds.cc $(GIMPLE_MATCH_PD_SEQ_SRC) \
+  $(GENERIC_MATCH_PD_SEQ_SRC) gimple-match-auto.h generic-match-auto.h \
   insn-target-def.h
 
 # Dependencies for the md file.  The first time through, we just assume
@@ -2663,19 +2672,36 @@ s-tm-texi: build/genhooks$(build_exeext) $(srcdir)/doc/tm.texi.in
  false; \
fi
 
-gimple-match.cc: s-match gimple-match-head.cc gimple-match-exports.cc ; @true
-generic-match.cc: s-match generic-match-head.cc ; @true
-
-s-match: build/genmatch$(build_exeext) $(srcdir)/match.pd cfn-operators.pd
-   $(RUN_GEN) build/genmatch$(build_exeext) --gimple $(srcdir)/match.pd \
-   > tmp-gimple-match.cc
-   $(RUN_GEN) build/genmatch$(build_exeext) --generic $(srcdir)/match.pd \
-   > tmp-generic-match.cc
-   $(SHELL) $(srcdir)/../move-if-change tmp-gimple-match.cc \
-   gimple-match.cc
-   $(SHELL) $(srcdir)/../move-if-change tmp-generic-match.cc \
-   generic-match.cc
-   $(STAMP) s-match

RE: [PATCH 3/3]middle-end RFC - match.pd: automatically partition *-match.cc files.

2023-04-28 Thread Tamar Christina via Gcc-patches
> > [1] https://gcc.gnu.org/legacy-ml/gcc-patches/2018-04/msg01125.html
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> 
> Some comments - I have to leave the Makefile bits to somebody else to see
> whether they are portable as-is.
> 
> The private functions now in gimple-match-exports.cc are not supposed to be
> public API, so the additions to gimple-match.h should be avoided - can
> you add the declarations to gimple-match-head.cc instead?  At least I don't
> see how the refactoring needs to add anything to gimple-match.h?
> 
> -decision_tree::gen (FILE *f, bool gimple)
> +decision_tree::gen (FILE **files, int n_parts, bool gimple)
> 
> can you use a vec<> please to avoid passing n_parts separately?
> 
> +  /* Set a default value for the tool to 5, but GCC itself uses
> + whatever default is determined by the configure variable
> + DEFAULT_MATCHPD_PARTITIONS.  */
> +  int n_parts = 5;
> +  char *input = argv[argc-2];
> ...
>   fprintf (stderr, "Usage: genmatch "
> -  "[--gimple] [--generic] [-v[v]] input\n");
> +  "[--gimple] [--generic] [--splits=] [-v[v]] input outdir\n");
> 
> I don't like this - I'm using ./build/genmatch --gimple test.pd | less to 
> debug
> genmatch changes with a small test input and like to preserve that.  Can
> you instead change the usage to
> 
>   genmatch --gimple match.pd gimple-match-1.c gimple-match-2.c
> gimple-match-3.c ...
> 
> thus
> 
> -  "[--gimple] [--generic] [-v[v]] input\n");
> +  "[--gimple] [--generic] [-v[v]] input [output...]\n");
> 
> and when no output is specified continue to use stdout?  Possibly when
> more than one output is given require a --header outfile argument to
> specify the header file to use (and for one output make emit_func
> not ICE but instead not emit to the header, aka header_file == NULL?).
> Ideally without makefile changes that would produce the same
> gimple-match.cc as before (minus the -head.cc changes of course).
> 
> The gimple-match-head.cc/exports changes could be split out as
> far as I can see?  Likewise the Makefile changes if the argument
> control is changed as I sugggest?
>

All changes done.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

PR bootstrap/84402
* genmatch.cc (emit_func, SIZED_BASED_CHUNKS, get_out_file): New.
(decision_tree::gen): Accept list of files instead of single and update
to write function definition to header and main file.
(write_predicate): Likewise.
(write_header): Emit pragmas and new includes.
(main): Create file buffers and cleanup.
(showUsage): New.

--- inline copy of patch ---

diff --git a/gcc/genmatch.cc b/gcc/genmatch.cc
index 716fb97aac4c3c2baae82e068df3ce158b9afee9..f56b4bc992d87cb7d707e59be2d61c44a45b68e6 100644
--- a/gcc/genmatch.cc
+++ b/gcc/genmatch.cc
@@ -183,6 +183,33 @@ fprintf_indent (FILE *f, unsigned int indent, const char *format, ...)
   va_end (ap);
 }
 
+/* Like fprintf, but print to two files, one header one C implementation.  */
+FILE *header_file = NULL;
+
+static void
+#if GCC_VERSION >= 4001
+__attribute__((format (printf, 4, 5)))
+#endif
+emit_func (FILE *f, bool open, bool close, const char *format, ...)
+{
+  va_list ap1, ap2;
+  if (header_file != stdout)
+{
+  if (open)
+   fprintf (header_file, "extern ");
+  va_start (ap2, format);
+  vfprintf (header_file, format, ap2);
+  va_end (ap2);
+  if (close)
+   fprintf (header_file, ";\n");
+}
+
+  va_start (ap1, format);
+  vfprintf (f, format, ap1);
+  va_end (ap1);
+  fputc ('\n', f);
+}
+
 static void
 output_line_directive (FILE *f, location_t location,
   bool dumpfile = false, bool fnargs = false)
@@ -217,6 +244,34 @@ output_line_directive (FILE *f, location_t location,
 fprintf (f, "/* #line %d \"%s\" */\n", loc.line, loc.file);
 }
 
+/* Find the file to write into next.  We try to evenly distribute the contents
+   over the different files.  */
+
+#define SIZED_BASED_CHUNKS 1
+
+int current_file = 0;
+FILE *get_out_file (vec <FILE *> &parts)
+{
+#ifdef SIZED_BASED_CHUNKS
+   FILE *f = NULL;
+   long min = 0;
+   /* We've started writing all the files at pos 0, so ftell is equivalent
+  to the size and should be much faster.  */
+   for (unsigned i = 0; i < parts.length (); i++)
+ {
+   long res = ftell (parts[i]);
+   if (!f || res < min)
+ {
+   min = res;
+   f = parts[i];
+ }
+ }
+  return f;
+#else
+  return parts[current_file++ % parts.length ()];
+#endif
+}
+
 
 /* Pull in tree codes and builtin function codes from their
definition files.  */
@@ -1732,7 +1787,7 @@ public:
   dt_node *root;
 
   void insert (class simplify *, unsigned);
-  void gen (FILE *f, bool gimple);
+  void gen (vec <FILE *> &, bool gimple);
   void print (FILE *f = stderr);
 
   

[PATCH 3/5] match.pd: CSE the dump output check.

2023-04-28 Thread Tamar Christina via Gcc-patches
Hi All,

This is a small improvement in QoL codegen for match.pd to save time not
re-evaluating the condition for printing debug information in every function.

There is a small but consistent runtime and compile time win here.  The runtime
win comes from not having to do the condition over again, and on Arm platforms
we now use the new test-and-branch support for booleans to only have a single
instruction here.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

PR bootstrap/84402
* genmatch.cc (decision_tree::gen, write_predicate): Generate new
debug_dump var.
(dt_simplify::gen_1): Use it.

--- inline copy of patch -- 
diff --git a/gcc/genmatch.cc b/gcc/genmatch.cc
index 6d62cdea2082d92e5ecc1102c80205115a4e3040..1f52ca2eebc2794159747338babb56c610387f3b 100644
--- a/gcc/genmatch.cc
+++ b/gcc/genmatch.cc
@@ -3431,7 +3431,7 @@ dt_simplify::gen_1 (FILE *f, int indent, bool gimple, operand *result)
   needs_label = true;
 }
 
-  fprintf_indent (f, indent, "if (UNLIKELY (dump_file && (dump_flags & TDF_FOLDING))) "
+  fprintf_indent (f, indent, "if (UNLIKELY (debug_dump)) "
   "fprintf (dump_file, \"%s ",
   s->kind == simplify::SIMPLIFY
   ? "Applying pattern" : "Matching expression");
@@ -3892,6 +3892,8 @@ decision_tree::gen (FILE *f, bool gimple)
}
 
   fprintf (f, ")\n{\n");
+  fprintf_indent (f, 2, "const bool debug_dump = "
+   "dump_file && (dump_flags & TDF_FOLDING);\n");
   s->s->gen_1 (f, 2, gimple, s->s->s->result);
   if (gimple)
fprintf (f, "  return false;\n");
@@ -3937,6 +3939,8 @@ decision_tree::gen (FILE *f, bool gimple)
fprintf (f, ", tree _p%d", i);
  fprintf (f, ")\n");
  fprintf (f, "{\n");
+ fprintf_indent (f, 2, "const bool debug_dump = "
+   "dump_file && (dump_flags & TDF_FOLDING);\n");
  dop->gen_kids (f, 2, gimple, 0);
  if (gimple)
fprintf (f, "  return false;\n");
@@ -4046,6 +4050,8 @@ write_predicate (FILE *f, predicate_id *p, decision_tree &dt, bool gimple)
   gimple ? ", tree (*valueize)(tree) ATTRIBUTE_UNUSED" : "");
   /* Conveniently make 'type' available.  */
   fprintf_indent (f, 2, "const tree type = TREE_TYPE (t);\n");
+  fprintf_indent (f, 2, "const bool debug_dump = "
+   "dump_file && (dump_flags & TDF_FOLDING);\n");
 
   if (!gimple)
 fprintf_indent (f, 2, "if (TREE_SIDE_EFFECTS (t)) return false;\n");









[PATCH 2/5] match.pd: Remove commented out line pragmas unless -vv is used.

2023-04-28 Thread Tamar Christina via Gcc-patches
Hi All,

genmatch currently outputs commented-out line directives that have no effect,
but that the compiler still has to parse only to discard.

They are however handy when debugging genmatch output.  As such this moves them
behind the -vv flag.
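For example, assuming an in-tree build of genmatch, the directives can be
turned back on while eyeballing the generated code with something like:

  ./build/genmatch --gimple -vv match.pd | less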

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

PR bootstrap/84402
* genmatch.cc (output_line_directive): Only emit commented directive
when -vv.
(main): Initialize verbose.

--- inline copy of patch -- 
diff --git a/gcc/genmatch.cc b/gcc/genmatch.cc
index 638606b2502f640e59527fc5a0b23fa3bedd0cee..6d62cdea2082d92e5ecc1102c80205115a4e3040 100644
--- a/gcc/genmatch.cc
+++ b/gcc/genmatch.cc
@@ -209,7 +209,7 @@ output_line_directive (FILE *f, location_t location,
   else
fprintf (f, "%s:%d", file, loc.line);
 }
-  else
+  else if (verbose == 2)
 /* Other gen programs really output line directives here, at least for
development it's right now more convenient to have line information
from the generated file.  Still keep the directives as comment for now
@@ -5221,6 +5221,7 @@ main (int argc, char **argv)
 return 1;
 
   bool gimple = true;
+  verbose = 0;
   char *input = argv[argc-1];
   for (int i = 1; i < argc - 1; ++i)
 {









RE: [PATCH 2/3]middle-end match.pd: simplify debug dump checks

2023-04-25 Thread Tamar Christina via Gcc-patches
> -Original Message-
> From: Richard Biener 
> Sent: Tuesday, April 25, 2023 2:14 PM
> To: Tamar Christina 
> Cc: Richard Biener ; gcc-patches@gcc.gnu.org;
> nd ; j...@ventanamicro.com
> Subject: RE: [PATCH 2/3]middle-end match.pd: simplify debug dump checks
> 
> On Tue, 25 Apr 2023, Tamar Christina wrote:
> 
> > > -Original Message-
> > > From: Richard Biener 
> > > Sent: Tuesday, April 18, 2023 11:48 AM
> > > To: Tamar Christina 
> > > Cc: gcc-patches@gcc.gnu.org; nd ; rguent...@suse.de;
> > > j...@ventanamicro.com
> > > Subject: Re: [PATCH 2/3]middle-end match.pd: simplify debug dump
> > > checks
> > >
> > > On Tue, Apr 18, 2023 at 12:22 PM Tamar Christina via Gcc-patches
> > >  wrote:
> > > >
> > > > Hi All,
> > > >
> > > > This is a small improvement in QoL codegen for match.pd to save
> > > > time not re-evaluating the condition for printing debug
> > > > information in every
> > > function.
> > > >
> > > > There is a small but consistent runtime and compile time win here.
> > > > The runtime win comes from not having to do the condition over
> > > > again, and on Arm plaforms we now use the new test-and-branch
> > > > support for booleans to only have a single instruction here.
> > > >
> > > > Compile time win is gotten from not having to do all the string
> > > > parsing for the printf and having less string interning to do.
> > > >
> > > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > > >
> > > > Ok for master?
> > >
> > > Ugh, I don't like the new global very much.  Can't we compute it in
> > > the toplevel entry and pass it down as parameter?  Like passing down
> > > the actual dump FILE *?
> >
> > So the dumpfile itself is currently also a global, I did try wiring
> > this down, but the problem here is that eventually at the very top
> > level, I need to modify the gimple_simplify calls because of the overloads
> being created by the output, which also need the parameter.
> >
> > There things become interesting because this then conflicts with the
> > definitions in gimple-fold which would also need to take an additional
> argument and it breaks the public API.
> >
> > Wiring the dbg value through all the generated and public function
> > requires quite a lot of changes so I'm not sure this is worth it in that 
> > case.
> Should I just drop it?
> 
> Yeah, just drop it then.  Btw, I wasn't suggesting to pass it down from the
> actual API but compute it once (locally) at the toplevel entries in the 
> generated
> gimple-match.cc and pass it down from there.  So we'd retain one
> (unconditionally done ...)
> 
>   FILE *local_dump_file = (dump_flags & TDF_FOLDING) ? dump_file : NULL;
> 
> and pass that down to functions called.

Oo apologies, that makes much more sense! I could have indeed stopped at
gimple-match.cc and computed it there.  Fair enough. Let me make the change.

Thanks!,
Tamar

> 
> Richard.
> 
> 
> > Thanks,
> > Tamar
> >
> > >
> > > The file output in output_line_directive was because we originally
> > > had match.pd #includeing multiple match-*.pd files, we'd want to
> > > keep that supported I think.  But since the line directives are
> > > commented and there's the same info available below, like
> > >
> > > /* #line 798 "/home/rguenther/src/gcc-13-branch/gcc/match.pd" */
> > >   tree captures[2] ATTRIBUTE_UNUSED = { _p0, _p1 
> > > };
> > >   if (UNLIKELY (dump_file && (dump_flags &
> > > TDF_FOLDING))) fprintf (dump_file, "Matching expression %s:%d,
> > > %s:%d\n", "match.pd", 798, __FILE__, __LINE__);
> > >
> > > there's probably no point in emitting them anymore (originally I
> > > emitted them non-commented but that didn't improve debugging much).
> > > We might want to emit more "proper" line directives for the natively
> > > copied parts of match.pd when code-generating c_expr parts, but that
> would be something separate.
> > >
> > > Can you split the patch into two things?  A patch removing output of
> > > the commented line directives at the call sites is OK.
> > >
> > > Richard.
> > >
> > > > Thanks,
> > > > Tamar
> > > >
> > > > gcc/ChangeLog:
> > 
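A standalone sketch of the suggested shape (the helper names and the stand-in
globals below are made up for illustration; they are not GCC's real
declarations):

#include <cstdio>

/* Stand-ins for GCC's globals, just so the sketch compiles.  */
static FILE *dump_file = nullptr;
static unsigned dump_flags = 0;
static const unsigned TDF_FOLDING = 1u << 0;

/* Helpers stop consulting the globals; they just test one pointer.  */
static void
helper (FILE *local_dump_file)
{
  if (local_dump_file)
    fprintf (local_dump_file, "Applying pattern\n");
}

int
main ()
{
  /* Evaluated once at the toplevel entry...  */
  FILE *local_dump_file
    = (dump_flags & TDF_FOLDING) ? dump_file : nullptr;
  /* ...then threaded down as an argument.  */
  helper (local_dump_file);
  return 0;
}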

RE: [PATCH 2/3]middle-end match.pd: simplify debug dump checks

2023-04-25 Thread Tamar Christina via Gcc-patches
> -Original Message-
> From: Richard Biener 
> Sent: Tuesday, April 18, 2023 11:48 AM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; rguent...@suse.de;
> j...@ventanamicro.com
> Subject: Re: [PATCH 2/3]middle-end match.pd: simplify debug dump checks
> 
> On Tue, Apr 18, 2023 at 12:22 PM Tamar Christina via Gcc-patches  patc...@gcc.gnu.org> wrote:
> >
> > Hi All,
> >
> > This is a small improvement in QoL codegen for match.pd to save time
> > not re-evaluating the condition for printing debug information in every
> function.
> >
> > There is a small but consistent runtime and compile time win here.
> > The runtime win comes from not having to do the condition over again,
> > and on Arm platforms we now use the new test-and-branch support for
> > booleans to only have a single instruction here.
> >
> > Compile time win is gotten from not having to do all the string
> > parsing for the printf and having less string interning to do.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> 
> Ugh, I don't like the new global very much.  Can't we compute it in the 
> toplevel
> entry and pass it down as parameter?  Like passing down the actual dump FILE
> *?

So the dumpfile itself is currently also a global, I did try wiring this down,
but the problem here is that eventually, at the very top level, I need to
modify the gimple_simplify calls because of the overloads being created by the
output, which also need the parameter.

There things become interesting because this then conflicts with the
definitions in gimple-fold, which would also need to take an additional
argument, and that breaks the public API.

Wiring the dbg value through all the generated and public functions requires
quite a lot of changes, so I'm not sure it is worth it in that case.  Should I
just drop it?

Thanks,
Tamar

> 
> The file output in output_line_directive was because we originally had
> match.pd #includeing multiple match-*.pd files, we'd want to keep that
> supported I think.  But since the line directives are commented and there's 
> the
> same info available below, like
> 
> /* #line 798 "/home/rguenther/src/gcc-13-branch/gcc/match.pd" */
>   tree captures[2] ATTRIBUTE_UNUSED = { _p0, _p1 };
>   if (UNLIKELY (dump_file && (dump_flags &
> TDF_FOLDING))) fprintf (dump_file, "Matching expression %s:%d, %s:%d\n",
> "match.pd", 798, __FILE__, __LINE__);
> 
> there's probably no point in emitting them anymore (originally I emitted them
> non-commented but that didn't improve debugging much).  We might want
> to emit more "proper" line directives for the natively copied parts of 
> match.pd
> when code-generating c_expr parts, but that would be something separate.
> 
> Can you split the patch into two things?  A patch removing output of the
> commented line directives at the call sites is OK.
> 
> Richard.
> 
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > PR bootstrap/84402
> > * dumpfile.h (dump_folding_p): New.
> > * dumpfile.cc (set_dump_file): Use it.
> > * generic-match-head.cc (dump_debug): New.
> > * gimple-match-head.cc (dump_debug): New.
> > * genmatch.cc (output_line_directive):  Support outputting only line
> > because file is implied.
> > (dt_simplify::gen_1): Call debug_dump instead of printf.
> >
> > --- inline copy of patch --
> > diff --git a/gcc/dumpfile.h b/gcc/dumpfile.h
> > index 7d5eca899dcc98676a9ce7a7efff8e439854ff89..e7b595ddecdcca9983d9584b8b2417ae1941c7d4 100644
> > --- a/gcc/dumpfile.h
> > +++ b/gcc/dumpfile.h
> > @@ -522,6 +522,7 @@ parse_dump_option (const char *, const char **);
> > extern FILE *dump_file;  extern dump_flags_t dump_flags;  extern const
> > char *dump_file_name;
> > +extern bool dump_folding_p;
> >
> >  extern bool dumps_are_enabled;
> >
> > diff --git a/gcc/dumpfile.cc b/gcc/dumpfile.cc
> > index 51f68c8c6b40051ba3125c84298ee44ca52f5d17..f805aa73f3aa244d847149eec26505181ce4efe8 100644
> > --- a/gcc/dumpfile.cc
> > +++ b/gcc/dumpfile.cc
> > @@ -63,6 +63,7 @@ FILE *dump_file = NULL;  const char *dump_file_name;
> > dump_flags_t dump_flags;  bool dumps_are_enabled = false;
> > +bool dump_folding_p = false;
> >
> >
> >  /* Set global "dump_file" to NEW_DUMP_FILE, refreshing the
> "dumps_are_enabled"
> > @@ -73,6 +74,7 @@ set_dump_file (FILE *new_dump_file) 

RE: [PATCH] aarch64: Add the cost model for Neoverse N1

2023-04-25 Thread Tamar Christina via Gcc-patches
Thanks Evandro,

That one works.  I’ll run the new cost model and sched modules through a number 
of workloads and come back with the results.

Cheers,
Tamar

From: Evandro Menezes 
Sent: Monday, April 24, 2023 11:52 PM
To: Evandro Menezes 
Cc: Tamar Christina ; evandro+gcc-patc...@gcc.gnu.org; 
gcc-patches@gcc.gnu.org; Richard Sandiford ; Kyrylo 
Tkachov 
Subject: Re: [PATCH] aarch64: Add the cost model for Neoverse N1

Sorry, but it seems that, before sending, the email client is stripping leading 
spaces.  I’m attaching the file here.

--
Evandro Menezes ◊ evan...@yahoo.com ◊ Austin, TX
Άγιος ο Θεός ⁂ ܩܕܝܫܐ ܐܢ̱ܬ ܠܐ ܡܝܘܬܐ ⁂ Sanctus Deus


Em 24 de abr. de 2023, à(s) 17:48, Evandro Menezes 
mailto:ebah...@icloud.com>> escreveu:

Hi, Tamara.

Does this work?

Thank you,

--
Evandro Menezes ◊ evan...@yahoo.com ◊ Austin, TX
Άγιος ο Θεός ⁂ ܩܕܝܫܐ ܐܢ̱ܬ ܠܐ ܡܝܘܬܐ ⁂ Sanctus Deus


Em 24 de abr. de 2023, à(s) 12:37, Tamar Christina 
mailto:tamar.christ...@arm.com>> escreveu:

Hi Evandro,

I wanted to give this patch a try, but the diff seems corrupt, the whitespaces 
at the start of the context lines seem to have gone missing.

Could you try resending it?

Thanks,
Tamar



RE: [PATCH] aarch64: Add the cost model for Neoverse N1

2023-04-24 Thread Tamar Christina via Gcc-patches
Hi Evandro,

I wanted to give this patch a try, but the diff seems corrupt, the whitespaces 
at the start of the context lines seem to have gone missing.

Could you try resending it?

Thanks,
Tamar

> -Original Message-
> From: Gcc-patches  bounces+tamar.christina=arm@gcc.gnu.org> On Behalf Of Evandro
> Menezes via Gcc-patches
> Sent: Tuesday, April 18, 2023 10:42 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Evandro Menezes ; Richard Sandiford
> ; Kyrylo Tkachov 
> Subject: [PATCH] aarch64: Add the cost model for Neoverse N1
> 
> This patch adds the cost model for Neoverse N1, based on the information
> from the "Arm Neoverse N1 Software Optimization Guide”.
> 
> --
> Evandro Menezes
> 
> ===
> =
> 
> gcc/ChangeLog:
> 
>* config/aarch64/aarch64-cores.def: Use the Neoverse N1 cost model.
>* config/aarch64/aarch64.cc
>(cortexa76_tunings): Rename variable.
>(neoversen1_addrcost_table): New variable.
>(neoversen1_vector_cost): Likewise.
>(neoversen1_regmove_cost): Likewise.
>(neoversen1_advsimd_vector_cost): Likewise.
>(neoversen1_scalar_issue_info): Likewise.
>(neoversen1_advsimd_issue_info): Likewise.
>(neoversen1_vec_issue_info): Likewise.
>(neoversen1_vector_cost): Likewise.
>(neoversen1_tunings): Likewise.
>* config/arm/aarch-cost-tables.h
>(neoversen1_extra_costs): New variable.
> 
> Signed-off-by: Evandro Menezes 
> ---
> gcc/config/aarch64/aarch64-cores.def |  20 ++--
> gcc/config/aarch64/aarch64.cc| 155 ---
> gcc/config/arm/aarch-cost-tables.h   | 107 ++
> 3 files changed, 259 insertions(+), 23 deletions(-)
> 
> diff --git a/gcc/config/aarch64/aarch64-cores.def
> b/gcc/config/aarch64/aarch64-cores.def
> index 2ec88c98400..e352e4077b1 100644
> --- a/gcc/config/aarch64/aarch64-cores.def
> +++ b/gcc/config/aarch64/aarch64-cores.def
> @@ -105,17 +105,17 @@ AARCH64_CORE("thunderx2t99",  thunderx2t99,
> thunderx2t99, V8_1A,  (CRYPTO), thu
> /* ARM ('A') cores. */
> AARCH64_CORE("cortex-a55",  cortexa55, cortexa53, V8_2A,  (F16, RCPC,
> DOTPROD), cortexa53, 0x41, 0xd05, -1) AARCH64_CORE("cortex-a75",
> cortexa75, cortexa57, V8_2A,  (F16, RCPC, DOTPROD), cortexa73, 0x41,
> 0xd0a, -1) -AARCH64_CORE("cortex-a76",  cortexa76, cortexa57, V8_2A,
> (F16, RCPC, DOTPROD), neoversen1, 0x41, 0xd0b, -1) -
> AARCH64_CORE("cortex-a76ae",  cortexa76ae, cortexa57, V8_2A,  (F16,
> RCPC, DOTPROD, SSBS), neoversen1, 0x41, 0xd0e, -1) -
> AARCH64_CORE("cortex-a77",  cortexa77, cortexa57, V8_2A,  (F16, RCPC,
> DOTPROD, SSBS), neoversen1, 0x41, 0xd0d, -1) -AARCH64_CORE("cortex-
> a78",  cortexa78, cortexa57, V8_2A,  (F16, RCPC, DOTPROD, SSBS, PROFILE),
> neoversen1, 0x41, 0xd41, -1) -AARCH64_CORE("cortex-a78ae",  cortexa78ae,
> cortexa57, V8_2A,  (F16, RCPC, DOTPROD, SSBS, PROFILE), neoversen1, 0x41,
> 0xd42, -1) -AARCH64_CORE("cortex-a78c",  cortexa78c, cortexa57, V8_2A,
> (F16, RCPC, DOTPROD, SSBS, PROFILE, FLAGM, PAUTH), neoversen1, 0x41,
> 0xd4b, -1)
> +AARCH64_CORE("cortex-a76",  cortexa76, cortexa57, V8_2A,  (F16, RCPC,
> +DOTPROD), cortexa76, 0x41, 0xd0b, -1) AARCH64_CORE("cortex-a76ae",
> +cortexa76ae, cortexa57, V8_2A,  (F16, RCPC, DOTPROD, SSBS), cortexa76,
> +0x41, 0xd0e, -1) AARCH64_CORE("cortex-a77",  cortexa77, cortexa57,
> +V8_2A,  (F16, RCPC, DOTPROD, SSBS), cortexa76, 0x41, 0xd0d, -1)
> +AARCH64_CORE("cortex-a78",  cortexa78, cortexa57, V8_2A,  (F16, RCPC,
> +DOTPROD, SSBS, PROFILE), cortexa76, 0x41, 0xd41, -1)
> +AARCH64_CORE("cortex-a78ae",  cortexa78ae, cortexa57, V8_2A,  (F16,
> +RCPC, DOTPROD, SSBS, PROFILE), cortexa76, 0x41, 0xd42, -1)
> +AARCH64_CORE("cortex-a78c",  cortexa78c, cortexa57, V8_2A,  (F16, RCPC,
> +DOTPROD, SSBS, PROFILE, FLAGM, PAUTH), cortexa76, 0x41, 0xd4b, -1)
> AARCH64_CORE("cortex-a65",  cortexa65, cortexa53, V8_2A,  (F16, RCPC,
> DOTPROD, SSBS), cortexa73, 0x41, 0xd06, -1) AARCH64_CORE("cortex-
> a65ae",  cortexa65ae, cortexa53, V8_2A,  (F16, RCPC, DOTPROD, SSBS),
> cortexa73, 0x41, 0xd43, -1) -AARCH64_CORE("cortex-x1",  cortexx1,
> cortexa57, V8_2A,  (F16, RCPC, DOTPROD, SSBS, PROFILE), neoversen1, 0x41,
> 0xd44, -1) -AARCH64_CORE("cortex-x1c",  cortexx1c, cortexa57, V8_2A,
> (F16, RCPC, DOTPROD, SSBS, PROFILE, PAUTH), neoversen1, 0x41, 0xd4c, -1)
> -AARCH64_CORE("ares",  ares, cortexa57, V8_2A,  (F16, RCPC, DOTPROD,
> PROFILE), neoversen1, 0x41, 0xd0c, -1)
> +AARCH64_CORE("cortex-x1",  cortexx1, cortexa57, V8_2A,  (F16, RCPC,
> +DOTPROD, SSBS, PROFILE), cortexa76, 0x41, 0xd44, -1)
> +AARCH64_CORE("cortex-x1c",  cortexx1c, cortexa57, V8_2A,  (F16, RCPC,
> +DOTPROD, SSBS, PROFILE, PAUTH), cortexa76, 0x41, 0xd4c, -1)
> +AARCH64_CORE("ares",  ares, cortexa57, V8_2A,  (F16, RCPC, DOTPROD,
> +PROFILE), cortexa76, 0x41, 0xd0c, -1)
> AARCH64_CORE("neoverse-n1",  neoversen1, cortexa57, V8_2A,  (F16, RCPC,
> DOTPROD, PROFILE), neoversen1, 0x41, 0xd0c, -1)
> 

RE: [PATCH] RFC: New compact syntax for insn and insn_split in Machine Descriptions

2023-04-24 Thread Tamar Christina via Gcc-patches


> -Original Message-
> From: Richard Sandiford 
> Sent: Friday, April 21, 2023 6:19 PM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
> 
> Subject: Re: [PATCH] RFC: New compact syntax for insn and insn_split in
> Machine Descriptions
> 
> Tamar Christina  writes:
> > Hi All,
> >
> > This patch adds support for a compact syntax for specifying
> > constraints in instruction patterns. Credit for the idea goes to Richard
> Earnshaw.
> >
> > I am sending up this RFC to get feedback for its inclusion in GCC 14.
> > With this new syntax we want a clean break from the current
> > limitations to make something that is hopefully easier to use and maintain.
> >
> > The idea behind this compact syntax is that often times it's quite
> > hard to correlate the entries in the constraints list, attributes and
> > instruction
> lists.
> >
> > One has to count and this often is tedious.  Additionally when
> > changing a single line in the insn multiple lines in a diff change,
> > making it harder to see what's going on.
> >
> > This new syntax takes into account many of the common things that are
> done in MD
> > files.   It's also worth saying that this version is intended to deal with 
> > the
> > common case of a string based alternatives.   For C chunks we have some
> ideas
> > but those are not intended to be addressed here.
> >
> > It's easiest to explain with an example:
> >
> > normal syntax:
> >
> > (define_insn_and_split "*movsi_aarch64"
> >   [(set (match_operand:SI 0 "nonimmediate_operand" "=r,k,r,r,r,r, r,w, m, m,
> r,  r,  r, w,r,w, w")
> > (match_operand:SI 1 "aarch64_mov_operand"  "
> r,r,k,M,n,Usv,m,m,rZ,w,Usw,Usa,Ush,rZ,w,w,Ds"))]
> >   "(register_operand (operands[0], SImode)
> > || aarch64_reg_or_zero (operands[1], SImode))"
> >   "@
> >mov\\t%w0, %w1
> >mov\\t%w0, %w1
> >mov\\t%w0, %w1
> >mov\\t%w0, %1
> >#
> >* return aarch64_output_sve_cnt_immediate (\"cnt\", \"%x0\",
> operands[1]);
> >ldr\\t%w0, %1
> >ldr\\t%s0, %1
> >str\\t%w1, %0
> >str\\t%s1, %0
> >adrp\\t%x0, %A1\;ldr\\t%w0, [%x0, %L1]
> >adr\\t%x0, %c1
> >adrp\\t%x0, %A1
> >fmov\\t%s0, %w1
> >fmov\\t%w0, %s1
> >fmov\\t%s0, %s1
> >* return aarch64_output_scalar_simd_mov_immediate (operands[1],
> SImode);"
> >   "CONST_INT_P (operands[1]) && !aarch64_move_imm (INTVAL
> (operands[1]), SImode)
> > && REG_P (operands[0]) && GP_REGNUM_P (REGNO (operands[0]))"
> >[(const_int 0)]
> >"{
> >aarch64_expand_mov_immediate (operands[0], operands[1]);
> >DONE;
> > }"
> >   ;; The "mov_imm" type for CNT is just a placeholder.
> >   [(set_attr "type"
> "mov_reg,mov_reg,mov_reg,mov_imm,mov_imm,mov_imm,load_4,
> >
> load_4,store_4,store_4,load_4,adr,adr,f_mcr,f_mrc,fmov,neon_move")
> >(set_attr "arch"   "*,*,*,*,*,sve,*,fp,*,fp,*,*,*,fp,fp,fp,simd")
> >(set_attr "length" "4,4,4,4,*,  4,4, 4,4, 4,8,4,4, 4, 4, 4,   4")
> > ]
> > )
> >
> > New syntax:
> >
> > (define_insn_and_split "*movsi_aarch64"
> >   [(set (match_operand:SI 0 "nonimmediate_operand")
> > (match_operand:SI 1 "aarch64_mov_operand"))]
> >   "(register_operand (operands[0], SImode)
> > || aarch64_reg_or_zero (operands[1], SImode))"
> >   "@@ (cons: 0 1; attrs: type arch length)
> >[=r, r  ; mov_reg  , *   , 4] mov\t%w0, %w1
> >[k , r  ; mov_reg  , *   , 4] ^
> >[r , k  ; mov_reg  , *   , 4] ^
> >[r , M  ; mov_imm  , *   , 4] mov\t%w0, %1
> >[r , n  ; mov_imm  , *   , *] #
> >[r , Usv; mov_imm  , sve , 4] << aarch64_output_sve_cnt_immediate ('cnt',
> '%x0', operands[1]);
> >[r , m  ; load_4   , *   , 4] ldr\t%w0, %1
> >[w , m  ; load_4   , fp  , 4] ldr\t%s0, %1
> >[m , rZ ; store_4  , *   , 4] str\t%w1, %0
> >[m , w  ; store_4  , fp  , 4] str\t%s1, %0
> >[r , Usw; load_4   , *   , 8] adrp\t%x0, %A1;ldr\t%w0, [%x0, %L1]
> >[r , Usa; adr  , *   , 4] adr\t%x0, %c1
> >[r , Ush; adr  , *   , 4] adrp\t%x0, %A1
> >[w , rZ ; f_mcr, fp  , 4] fmov\t%s0, %w1
> >[r , w  ; f_mrc, fp  , 4] fmov\t%w0, %s1
> >[w , w  ; fmov , fp  , 4] fmov\t%s0, %s1
> >[w , Ds ; neon_move, simd, 4] <<
> aarch64_output_scalar_simd_mov_immediate (operands[1], SImode);"
> >   "CONST_INT_P (operands[1]) && !aarch64_move_imm (INTVAL
> (operands[1]), SImode)
> > && REG_P (operands[0]) && GP_REGNUM_P (REGNO (operands[0]))"
> >   [(const_int 0)]
> >   {
> > aarch64_expand_mov_immediate (operands[0], operands[1]);
> > DONE;
> >   }
> >   ;; The "mov_imm" type for CNT is just a placeholder.
> > )
> >
> > The patch contains some more rewritten examples for both Arm and
> > AArch64.  I have included them as examples in this RFC, but the final
> > version posted in GCC 14 will have these split out.
> >
> > The main syntax rules are as follows (See docs for full rules):
> >   - Template must start with "@@" to use the new syntax.
> >   - "@@" is followed by a layout in parentheses which 

RE: [PATCH 2/3]middle-end match.pd: simplify debug dump checks

2023-04-19 Thread Tamar Christina via Gcc-patches
> -Original Message-
> From: Richard Biener 
> Sent: Tuesday, April 18, 2023 11:48 AM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; rguent...@suse.de;
> j...@ventanamicro.com
> Subject: Re: [PATCH 2/3]middle-end match.pd: simplify debug dump checks
> 
> On Tue, Apr 18, 2023 at 12:22 PM Tamar Christina via Gcc-patches  patc...@gcc.gnu.org> wrote:
> >
> > Hi All,
> >
> > This is a small improvement in QoL codegen for match.pd to save time
> > not re-evaluating the condition for printing debug information in every
> function.
> >
> > There is a small but consistent runtime and compile time win here.
> > The runtime win comes from not having to do the condition over again,
> > and on Arm platforms we now use the new test-and-branch support for
> > booleans to only have a single instruction here.
> >
> > Compile time win is gotten from not having to do all the string
> > parsing for the printf and having less string interning to do.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> 
> Ugh, I don't like the new global very much.  Can't we compute it in the 
> toplevel
> entry and pass it down as parameter?  Like passing down the actual dump FILE
> *?

Yeah that would work too, will do.

> 
> The file output in output_line_directive was because we originally had
> match.pd #includeing multiple match-*.pd files, we'd want to keep that
> supported I think.  But since the line directives are commented and there's 
> the
> same info available below, like
> 
> /* #line 798 "/home/rguenther/src/gcc-13-branch/gcc/match.pd" */
>   tree captures[2] ATTRIBUTE_UNUSED = { _p0, _p1 };
>   if (UNLIKELY (dump_file && (dump_flags &
> TDF_FOLDING))) fprintf (dump_file, "Matching expression %s:%d, %s:%d\n",
> "match.pd", 798, __FILE__, __LINE__);
> 
> there's probably no point in emitting them anymore (originally I emitted them
> non-commented but that didn't improve debugging much).  We might want
> to emit more "proper" line directives for the natively copied parts of 
> match.pd
> when code-generating c_expr parts, but that would be something separate.
> 
> Can you split the patch into two things?  A patch removing output of the
> commented line directives at the call sites is OK.

Sure, I'll hold up respinning waiting on the 3rd patch review, since this one
will change that one as well; easier to handle all comments at once.

Thanks for the review,
Tamar
> 
> Richard.
> 
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > PR bootstrap/84402
> > * dumpfile.h (dump_folding_p): New.
> > * dumpfile.cc (set_dump_file): Use it.
> > * generic-match-head.cc (dump_debug): New.
> > * gimple-match-head.cc (dump_debug): New.
> > * genmatch.cc (output_line_directive):  Support outputting only line
> > because file is implied.
> > (dt_simplify::gen_1): Call debug_dump instead of printf.
> >
> > --- inline copy of patch --
> > diff --git a/gcc/dumpfile.h b/gcc/dumpfile.h
> > index 7d5eca899dcc98676a9ce7a7efff8e439854ff89..e7b595ddecdcca9983d9584b8b2417ae1941c7d4 100644
> > --- a/gcc/dumpfile.h
> > +++ b/gcc/dumpfile.h
> > @@ -522,6 +522,7 @@ parse_dump_option (const char *, const char **);
> > extern FILE *dump_file;  extern dump_flags_t dump_flags;  extern const
> > char *dump_file_name;
> > +extern bool dump_folding_p;
> >
> >  extern bool dumps_are_enabled;
> >
> > diff --git a/gcc/dumpfile.cc b/gcc/dumpfile.cc
> > index 51f68c8c6b40051ba3125c84298ee44ca52f5d17..f805aa73f3aa244d847149eec26505181ce4efe8 100644
> > --- a/gcc/dumpfile.cc
> > +++ b/gcc/dumpfile.cc
> > @@ -63,6 +63,7 @@ FILE *dump_file = NULL;  const char *dump_file_name;
> > dump_flags_t dump_flags;  bool dumps_are_enabled = false;
> > +bool dump_folding_p = false;
> >
> >
> >  /* Set global "dump_file" to NEW_DUMP_FILE, refreshing the
> "dumps_are_enabled"
> > @@ -73,6 +74,7 @@ set_dump_file (FILE *new_dump_file)  {
> >dumpfile_ensure_any_optinfo_are_flushed ();
> >dump_file = new_dump_file;
> > +  dump_folding_p = dump_file && (dump_flags & TDF_FOLDING);
> >dump_context::get ().refresh_dumps_are_enabled ();  }
> >
> > diff --git a/gcc/generic-match-head.cc b/gcc/generic-match-head.cc
> > index
> >
> f011204c5be450663231bdece0596317b37f9f9b..16b8f9f3b61d3d5651

RE: [PATCH 1/3]middle-end match.pd: don't emit label if not needed

2023-04-18 Thread Tamar Christina via Gcc-patches
> -Original Message-
> From: Richard Biener 
> Sent: Tuesday, April 18, 2023 11:38 AM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; rguent...@suse.de;
> j...@ventanamicro.com
> Subject: Re: [PATCH 1/3]middle-end match.pd: don't emit label if not needed
> 
> On Tue, Apr 18, 2023 at 12:21 PM Tamar Christina via Gcc-patches  patc...@gcc.gnu.org> wrote:
> >
> > Hi All,
> >
> > This is a small QoL codegen improvement for match.pd to not emit
> > labels when they are not needed.  The codegen is nice and there is a
> > small (but consistent) improvement in compile time.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> 
> OK.  Btw - how many labels does this remove? (just wc -l the generated files?)

Not terribly many anymore, it's about 160 lines.  Though when benchmarking it
shows a consistent 2-5% speedup in compile time (I take the geomean of about
100 compiles).

Regards,
Tamar

> 
> Richard.
> 
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > PR bootstrap/84402
> > * genmatch.cc (dt_simplify::gen_1): Only emit labels if used.
> >
> > --- inline copy of patch --
> > diff --git a/gcc/genmatch.cc b/gcc/genmatch.cc
> > index 4fab4135347c43d95546a7df0bb1c4d074937288..638606b2502f640e59527fc5a0b23fa3bedd0cee 100644
> > --- a/gcc/genmatch.cc
> > +++ b/gcc/genmatch.cc
> > @@ -3352,6 +3352,7 @@ dt_simplify::gen_1 (FILE *f, int indent, bool
> gimple, operand *result)
> >char local_fail_label[256];
> >snprintf (local_fail_label, 256, "next_after_fail%u", ++fail_label_cnt);
> >fail_label = local_fail_label;
> > +  bool needs_label = false;
> >
> >/* Analyze captures and perform early-outs on the incoming arguments
> >   that cover cases we cannot handle.  */ @@ -3366,6 +3367,7 @@
> > dt_simplify::gen_1 (FILE *f, int indent, bool gimple, operand *result)
> > fprintf_indent (f, indent,
> > "if (TREE_SIDE_EFFECTS (_p%d)) goto %s;\n",
> > i, fail_label);
> > +   needs_label = true;
> > if (verbose >= 1)
> >   warning_at (as_a  (s->match)->ops[i]->location,
> >   "forcing toplevel operand to have no "
> > @@ -3381,6 +3383,7 @@ dt_simplify::gen_1 (FILE *f, int indent, bool
> gimple, operand *result)
> > fprintf_indent (f, indent,
> > "if (TREE_SIDE_EFFECTS (captures[%d])) "
> > "goto %s;\n", i, fail_label);
> > +   needs_label = true;
> > if (verbose >= 1)
> >   warning_at (cinfo.info[i].c->location,
> >   "forcing captured operand to have no "
> > @@ -3423,7 +3426,10 @@ dt_simplify::gen_1 (FILE *f, int indent, bool
> gimple, operand *result)
> >  }
> >
> >if (s->kind == simplify::SIMPLIFY)
> > -fprintf_indent (f, indent, "if (UNLIKELY (!dbg_cnt (match))) goto 
> > %s;\n",
> fail_label);
> > +{
> > +  fprintf_indent (f, indent, "if (UNLIKELY (!dbg_cnt (match))) goto 
> > %s;\n",
> fail_label);
> > +  needs_label = true;
> > +}
> >
> >fprintf_indent (f, indent, "if (UNLIKELY (dump_file && (dump_flags &
> TDF_FOLDING))) "
> >"fprintf (dump_file, \"%s ", @@ -3496,9 +3502,12 @@
> > dt_simplify::gen_1 (FILE *f, int indent, bool gimple, operand *result)
> >   "res_op->resimplify (%s, valueize);\n",
> >   !e->force_leaf ? "lseq" : "NULL");
> >   if (e->force_leaf)
> > -   fprintf_indent (f, indent,
> > -   "if (!maybe_push_res_to_seq (res_op, NULL)) 
> > "
> > -   "goto %s;\n", fail_label);
> > +   {
> > + fprintf_indent (f, indent,
> > + "if (!maybe_push_res_to_seq (res_op, 
> > NULL)) "
> > + "goto %s;\n", fail_label);
> > + needs_label = true;
> > +   }
> > }
> > }
> >else if (result->type == operand::OP_CAPTURE @@ -3554,9
> > +356

[PATCH] RFC: New compact syntax for insn and insn_split in Machine Descriptions

2023-04-18 Thread Tamar Christina via Gcc-patches
Hi All,

This patch adds support for a compact syntax for specifying constraints in
instruction patterns. Credit for the idea goes to Richard Earnshaw.

I am sending up this RFC to get feedback for its inclusion in GCC 14.
With this new syntax we want a clean break from the current limitations to make
something that is hopefully easier to use and maintain.

The idea behind this compact syntax is that it's often quite hard to correlate
the entries in the constraints list, the attributes, and the instruction list.

One has to count, and this is often tedious.  Additionally, when changing a
single line in the insn, multiple lines in a diff change, making it harder to
see what's going on.

This new syntax takes into account many of the common things that are done in MD
files.  It's also worth saying that this version is intended to deal with the
common case of string-based alternatives.  For C chunks we have some ideas, but
those are not intended to be addressed here.

It's easiest to explain with an example:

normal syntax:

(define_insn_and_split "*movsi_aarch64"
  [(set (match_operand:SI 0 "nonimmediate_operand" "=r,k,r,r,r,r, r,w, m, m,  
r,  r,  r, w,r,w, w")
(match_operand:SI 1 "aarch64_mov_operand"  " 
r,r,k,M,n,Usv,m,m,rZ,w,Usw,Usa,Ush,rZ,w,w,Ds"))]
  "(register_operand (operands[0], SImode)
|| aarch64_reg_or_zero (operands[1], SImode))"
  "@
   mov\\t%w0, %w1
   mov\\t%w0, %w1
   mov\\t%w0, %w1
   mov\\t%w0, %1
   #
   * return aarch64_output_sve_cnt_immediate (\"cnt\", \"%x0\", operands[1]);
   ldr\\t%w0, %1
   ldr\\t%s0, %1
   str\\t%w1, %0
   str\\t%s1, %0
   adrp\\t%x0, %A1\;ldr\\t%w0, [%x0, %L1]
   adr\\t%x0, %c1
   adrp\\t%x0, %A1
   fmov\\t%s0, %w1
   fmov\\t%w0, %s1
   fmov\\t%s0, %s1
   * return aarch64_output_scalar_simd_mov_immediate (operands[1], SImode);"
  "CONST_INT_P (operands[1]) && !aarch64_move_imm (INTVAL (operands[1]), SImode)
&& REG_P (operands[0]) && GP_REGNUM_P (REGNO (operands[0]))"
   [(const_int 0)]
   "{
   aarch64_expand_mov_immediate (operands[0], operands[1]);
   DONE;
}"
  ;; The "mov_imm" type for CNT is just a placeholder.
  [(set_attr "type" "mov_reg,mov_reg,mov_reg,mov_imm,mov_imm,mov_imm,load_4,

load_4,store_4,store_4,load_4,adr,adr,f_mcr,f_mrc,fmov,neon_move")
   (set_attr "arch"   "*,*,*,*,*,sve,*,fp,*,fp,*,*,*,fp,fp,fp,simd")
   (set_attr "length" "4,4,4,4,*,  4,4, 4,4, 4,8,4,4, 4, 4, 4,   4")
]
)

New syntax:

(define_insn_and_split "*movsi_aarch64"
  [(set (match_operand:SI 0 "nonimmediate_operand")
(match_operand:SI 1 "aarch64_mov_operand"))]
  "(register_operand (operands[0], SImode)
|| aarch64_reg_or_zero (operands[1], SImode))"
  "@@ (cons: 0 1; attrs: type arch length)
   [=r, r  ; mov_reg  , *   , 4] mov\t%w0, %w1
   [k , r  ; mov_reg  , *   , 4] ^
   [r , k  ; mov_reg  , *   , 4] ^
   [r , M  ; mov_imm  , *   , 4] mov\t%w0, %1
   [r , n  ; mov_imm  , *   , *] #
   [r , Usv; mov_imm  , sve , 4] << aarch64_output_sve_cnt_immediate ('cnt', 
'%x0', operands[1]);
   [r , m  ; load_4   , *   , 4] ldr\t%w0, %1
   [w , m  ; load_4   , fp  , 4] ldr\t%s0, %1
   [m , rZ ; store_4  , *   , 4] str\t%w1, %0
   [m , w  ; store_4  , fp  , 4] str\t%s1, %0
   [r , Usw; load_4   , *   , 8] adrp\t%x0, %A1;ldr\t%w0, [%x0, %L1]
   [r , Usa; adr  , *   , 4] adr\t%x0, %c1
   [r , Ush; adr  , *   , 4] adrp\t%x0, %A1
   [w , rZ ; f_mcr, fp  , 4] fmov\t%s0, %w1
   [r , w  ; f_mrc, fp  , 4] fmov\t%w0, %s1
   [w , w  ; fmov , fp  , 4] fmov\t%s0, %s1
   [w , Ds ; neon_move, simd, 4] << aarch64_output_scalar_simd_mov_immediate 
(operands[1], SImode);"
  "CONST_INT_P (operands[1]) && !aarch64_move_imm (INTVAL (operands[1]), SImode)
&& REG_P (operands[0]) && GP_REGNUM_P (REGNO (operands[0]))"
  [(const_int 0)]
  {
aarch64_expand_mov_immediate (operands[0], operands[1]);
DONE;
  }
  ;; The "mov_imm" type for CNT is just a placeholder.
)

The patch contains some more rewritten examples for both Arm and AArch64.  I
have included them for examples in this RFC but the final version posted in
GCC 14 will have these split out.

The main syntax rules are as follows (See docs for full rules):
  - Template must start with "@@" to use the new syntax.
  - "@@" is followed by a layout in parentheses which is "cons:" followed by
a list of match_operand/match_scratch IDs, then a semicolon, then the
same for attributes ("attrs:"). Both sections are optional (so you can
use only cons, or only attrs, or both), and cons must come before attrs
if present.
  - Each alternative begins with any amount of whitespace.
  - Following the whitespace is a comma-separated list of constraints and/or
attributes within brackets [], with sections separated by a semicolon.
  - Following the closing ']' is any amount of whitespace, and then the actual
asm output.
  - Spaces are allowed in the list (they will simply be removed).
  - All alternatives should be specified: a blank list should be
"[,,]", 

[PATCH 2/3]middle-end match.pd: simplify debug dump checks

2023-04-18 Thread Tamar Christina via Gcc-patches
Hi All,

This is a small QoL codegen improvement for match.pd that saves the time of
re-evaluating the condition for printing debug information in every generated
function.

There is a small but consistent runtime and compile-time win here.  The runtime
win comes from not having to evaluate the condition over again, and on Arm
platforms we now use the new test-and-branch support for booleans, so only a
single instruction is needed here.

The compile-time win comes from not having to do all the string parsing for the
printf and having less string interning to do.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

PR bootstrap/84402
* dumpfile.h (dump_folding_p): New.
* dumpfile.cc (set_dump_file): Use it.
* generic-match-head.cc (dump_debug): New.
* gimple-match-head.cc (dump_debug): New.
* genmatch.cc (output_line_directive): Support outputting only the line
because the file is implied.
(dt_simplify::gen_1): Call dump_debug instead of printf.
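
To illustrate (a sketch of the generated code, not verbatim genmatch output):
for each pattern the generated matcher previously contained

  if (UNLIKELY (dump_file && (dump_flags & TDF_FOLDING)))
    fprintf (dump_file, "Applying pattern %s:%d, %s:%d\n",
             "match.pd", 1234, __FILE__, __LINE__);

and with this patch it instead contains

  if (UNLIKELY (dump_folding_p))
    dump_debug (true, 1234, __FILE__, __LINE__);

where 1234 stands in for the match.pd line of the pattern.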

--- inline copy of patch -- 
diff --git a/gcc/dumpfile.h b/gcc/dumpfile.h
index 
7d5eca899dcc98676a9ce7a7efff8e439854ff89..e7b595ddecdcca9983d9584b8b2417ae1941c7d4
 100644
--- a/gcc/dumpfile.h
+++ b/gcc/dumpfile.h
@@ -522,6 +522,7 @@ parse_dump_option (const char *, const char **);
 extern FILE *dump_file;
 extern dump_flags_t dump_flags;
 extern const char *dump_file_name;
+extern bool dump_folding_p;
 
 extern bool dumps_are_enabled;
 
diff --git a/gcc/dumpfile.cc b/gcc/dumpfile.cc
index 
51f68c8c6b40051ba3125c84298ee44ca52f5d17..f805aa73f3aa244d847149eec26505181ce4efe8
 100644
--- a/gcc/dumpfile.cc
+++ b/gcc/dumpfile.cc
@@ -63,6 +63,7 @@ FILE *dump_file = NULL;
 const char *dump_file_name;
 dump_flags_t dump_flags;
 bool dumps_are_enabled = false;
+bool dump_folding_p = false;
 
 
 /* Set global "dump_file" to NEW_DUMP_FILE, refreshing the "dumps_are_enabled"
@@ -73,6 +74,7 @@ set_dump_file (FILE *new_dump_file)
 {
   dumpfile_ensure_any_optinfo_are_flushed ();
   dump_file = new_dump_file;
+  dump_folding_p = dump_file && (dump_flags & TDF_FOLDING);
   dump_context::get ().refresh_dumps_are_enabled ();
 }
 
diff --git a/gcc/generic-match-head.cc b/gcc/generic-match-head.cc
index 
f011204c5be450663231bdece0596317b37f9f9b..16b8f9f3b61d3d5651a5a41a8c0552f50b55cc7c
 100644
--- a/gcc/generic-match-head.cc
+++ b/gcc/generic-match-head.cc
@@ -102,3 +102,17 @@ optimize_successive_divisions_p (tree, tree)
 {
   return false;
 }
+
+/* Helper method for debug printing to reduce string parsing overhead.  Keep
+   in sync with the version in gimple-match-head.cc.  */
+
+static void
+dump_debug (bool simplify, int loc, const char *file, int lineno)
+{
+  if (simplify)
+    fprintf (dump_file, "Applying pattern %s:%d, %s:%d\n", "match.pd", loc,
+             file, lineno);
+  else
+    fprintf (dump_file, "Matching expression %s:%d, %s:%d\n", "match.pd", loc,
+             file, lineno);
+}
\ No newline at end of file
diff --git a/gcc/genmatch.cc b/gcc/genmatch.cc
index 
638606b2502f640e59527fc5a0b23fa3bedd0cee..bd7c6ff4a3fb89d456b02242707fd823b737f20d
 100644
--- a/gcc/genmatch.cc
+++ b/gcc/genmatch.cc
@@ -185,7 +185,8 @@ fprintf_indent (FILE *f, unsigned int indent, const char 
*format, ...)
 
 static void
 output_line_directive (FILE *f, location_t location,
-  bool dumpfile = false, bool fnargs = false)
+  bool dumpfile = false, bool fnargs = false,
+  bool loc_only = false)
 {
   const line_map_ordinary *map;
   linemap_resolve_location (line_table, location, LRK_SPELLING_LOCATION, );
@@ -204,7 +205,9 @@ output_line_directive (FILE *f, location_t location,
   else
++file;
 
-  if (fnargs)
+  if (loc_only)
+   fprintf (f, "%d", loc.line);
+  else if (fnargs)
fprintf (f, "\"%s\", %d", file, loc.line);
   else
fprintf (f, "%s:%d", file, loc.line);
@@ -3431,14 +3434,11 @@ dt_simplify::gen_1 (FILE *f, int indent, bool gimple, 
operand *result)
   needs_label = true;
 }
 
-  fprintf_indent (f, indent, "if (UNLIKELY (dump_file && (dump_flags & 
TDF_FOLDING))) "
-  "fprintf (dump_file, \"%s ",
-  s->kind == simplify::SIMPLIFY
-  ? "Applying pattern" : "Matching expression");
-  fprintf (f, "%%s:%%d, %%s:%%d\\n\", ");
+  fprintf_indent (f, indent, "if (UNLIKELY (dump_folding_p)) "
+   "dump_debug (%s, ", s->kind == simplify::SIMPLIFY ? "true" : "false");
   output_line_directive (f,
 result ? result->location : s->match->location, true,
-true);
+true, true);
   fprintf (f, ", __FILE__, __LINE__);\n");
 
   fprintf_indent (f, indent, "{\n");
diff --git a/gcc/gimple-match-head.cc b/gcc/gimple-match-head.cc
index 
ec603f9d043c3924ea442bb49b5300a3573503cf..ae0c5c8a74fd9f1acdb616014941b11961e96c04
 100644
--- a/gcc/gimple-match-head.cc
+++ b/gcc/gimple-match-head.cc
@@ -1412,3 +1412,17 

[PATCH 2/3] RFC - match.pd: simplify debug dump checks

2023-04-06 Thread Tamar Christina via Gcc-patches
Hi All,

Just sending these so people can test the series

This is a small QoL codegen improvement for match.pd that saves the time of
re-evaluating the condition for printing debug information in every generated
function.

There is a small but consistent runtime and compile-time win here.  The runtime
win comes from not having to evaluate the condition over again, and on Arm
platforms we now use the new test-and-branch support for booleans, so only a
single instruction is needed here.

The compile-time win comes from not having to do all the string parsing for the
printf and having less string interning to do.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for GCC 14?

Thanks,
Tamar

gcc/ChangeLog:

PR bootstrap/84402
* dumpfile.h (dump_folding_p): New.
* dumpfile.cc (set_dump_file): Use it.
* generic-match-head.cc (dump_debug): New.
* gimple-match-head.cc (dump_debug): New.
* genmatch.cc (output_line_directive): Support outputting only the line
because the file is implied.
(dt_simplify::gen_1): Call dump_debug instead of printf.

--- inline copy of patch -- 
diff --git a/gcc/dumpfile.h b/gcc/dumpfile.h
index 
7d5eca899dcc98676a9ce7a7efff8e439854ff89..e7b595ddecdcca9983d9584b8b2417ae1941c7d4
 100644
--- a/gcc/dumpfile.h
+++ b/gcc/dumpfile.h
@@ -522,6 +522,7 @@ parse_dump_option (const char *, const char **);
 extern FILE *dump_file;
 extern dump_flags_t dump_flags;
 extern const char *dump_file_name;
+extern bool dump_folding_p;
 
 extern bool dumps_are_enabled;
 
diff --git a/gcc/dumpfile.cc b/gcc/dumpfile.cc
index 
51f68c8c6b40051ba3125c84298ee44ca52f5d17..f805aa73f3aa244d847149eec26505181ce4efe8
 100644
--- a/gcc/dumpfile.cc
+++ b/gcc/dumpfile.cc
@@ -63,6 +63,7 @@ FILE *dump_file = NULL;
 const char *dump_file_name;
 dump_flags_t dump_flags;
 bool dumps_are_enabled = false;
+bool dump_folding_p = false;
 
 
 /* Set global "dump_file" to NEW_DUMP_FILE, refreshing the "dumps_are_enabled"
@@ -73,6 +74,7 @@ set_dump_file (FILE *new_dump_file)
 {
   dumpfile_ensure_any_optinfo_are_flushed ();
   dump_file = new_dump_file;
+  dump_folding_p = dump_file && (dump_flags & TDF_FOLDING);
   dump_context::get ().refresh_dumps_are_enabled ();
 }
 
diff --git a/gcc/generic-match-head.cc b/gcc/generic-match-head.cc
index 
f011204c5be450663231bdece0596317b37f9f9b..16b8f9f3b61d3d5651a5a41a8c0552f50b55cc7c
 100644
--- a/gcc/generic-match-head.cc
+++ b/gcc/generic-match-head.cc
@@ -102,3 +102,17 @@ optimize_successive_divisions_p (tree, tree)
 {
   return false;
 }
+
+/* Helper method for debug printing to reduce string parsing overhead.  Keep
+   in sync with the version in gimple-match-head.cc.  */
+
+static void
+dump_debug (bool simplify, int loc, const char *file, int lineno)
+{
+  if (simplify)
+    fprintf (dump_file, "Applying pattern %s:%d, %s:%d\n", "match.pd", loc,
+             file, lineno);
+  else
+    fprintf (dump_file, "Matching expression %s:%d, %s:%d\n", "match.pd", loc,
+             file, lineno);
+}
\ No newline at end of file
diff --git a/gcc/genmatch.cc b/gcc/genmatch.cc
index 
638606b2502f640e59527fc5a0b23fa3bedd0cee..bd7c6ff4a3fb89d456b02242707fd823b737f20d
 100644
--- a/gcc/genmatch.cc
+++ b/gcc/genmatch.cc
@@ -185,7 +185,8 @@ fprintf_indent (FILE *f, unsigned int indent, const char 
*format, ...)
 
 static void
 output_line_directive (FILE *f, location_t location,
-  bool dumpfile = false, bool fnargs = false)
+  bool dumpfile = false, bool fnargs = false,
+  bool loc_only = false)
 {
   const line_map_ordinary *map;
   linemap_resolve_location (line_table, location, LRK_SPELLING_LOCATION, );
@@ -204,7 +205,9 @@ output_line_directive (FILE *f, location_t location,
   else
++file;
 
-  if (fnargs)
+  if (loc_only)
+   fprintf (f, "%d", loc.line);
+  else if (fnargs)
fprintf (f, "\"%s\", %d", file, loc.line);
   else
fprintf (f, "%s:%d", file, loc.line);
@@ -3431,14 +3434,11 @@ dt_simplify::gen_1 (FILE *f, int indent, bool gimple, 
operand *result)
   needs_label = true;
 }
 
-  fprintf_indent (f, indent, "if (UNLIKELY (dump_file && (dump_flags & 
TDF_FOLDING))) "
-  "fprintf (dump_file, \"%s ",
-  s->kind == simplify::SIMPLIFY
-  ? "Applying pattern" : "Matching expression");
-  fprintf (f, "%%s:%%d, %%s:%%d\\n\", ");
+  fprintf_indent (f, indent, "if (UNLIKELY (dump_folding_p)) "
+   "dump_debug (%s, ", s->kind == simplify::SIMPLIFY ? "true" : "false");
   output_line_directive (f,
 result ? result->location : s->match->location, true,
-true);
+true, true);
   fprintf (f, ", __FILE__, __LINE__);\n");
 
   fprintf_indent (f, indent, "{\n");
diff --git a/gcc/gimple-match-head.cc b/gcc/gimple-match-head.cc
index 
ec603f9d043c3924ea442bb49b5300a3573503cf..ae0c5c8a74fd9f1acdb616014941b11961e96c04
 100644
--- a/gcc/gimple-match-head.cc

[PATCH 1/3] RFC match.pd: don't emit label if not needed

2023-04-06 Thread Tamar Christina via Gcc-patches
Hi All,

Just sending these so people can test the series.

This is a small QoL codegen improvement for match.pd to not emit labels when
they are not needed.  The codegen is nice and there is a small (but consistent)
improvement in compile time.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for GCC 14?

Thanks,
Tamar

gcc/ChangeLog:

PR bootstrap/84402
* genmatch.cc (dt_simplify::gen_1): Only emit labels if used.
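
To illustrate the change (sketch only, not actual genmatch output): each
generated simplification used to end unconditionally with its fail label,

  next_after_fail42:;

even when nothing ever branched to it.  With needs_label tracking, the label
is only printed when at least one early-out emitted a
"goto next_after_fail42;" before it.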

--- inline copy of patch -- 
diff --git a/gcc/genmatch.cc b/gcc/genmatch.cc
index 
4fab4135347c43d95546a7df0bb1c4d074937288..638606b2502f640e59527fc5a0b23fa3bedd0cee
 100644
--- a/gcc/genmatch.cc
+++ b/gcc/genmatch.cc
@@ -3352,6 +3352,7 @@ dt_simplify::gen_1 (FILE *f, int indent, bool gimple, 
operand *result)
   char local_fail_label[256];
   snprintf (local_fail_label, 256, "next_after_fail%u", ++fail_label_cnt);
   fail_label = local_fail_label;
+  bool needs_label = false;
 
   /* Analyze captures and perform early-outs on the incoming arguments
  that cover cases we cannot handle.  */
@@ -3366,6 +3367,7 @@ dt_simplify::gen_1 (FILE *f, int indent, bool gimple, 
operand *result)
fprintf_indent (f, indent,
"if (TREE_SIDE_EFFECTS (_p%d)) goto %s;\n",
i, fail_label);
+   needs_label = true;
if (verbose >= 1)
  warning_at (as_a <simplify *> (s->match)->ops[i]->location,
  "forcing toplevel operand to have no "
@@ -3381,6 +3383,7 @@ dt_simplify::gen_1 (FILE *f, int indent, bool gimple, 
operand *result)
fprintf_indent (f, indent,
"if (TREE_SIDE_EFFECTS (captures[%d])) "
"goto %s;\n", i, fail_label);
+   needs_label = true;
if (verbose >= 1)
  warning_at (cinfo.info[i].c->location,
  "forcing captured operand to have no "
@@ -3423,7 +3426,10 @@ dt_simplify::gen_1 (FILE *f, int indent, bool gimple, 
operand *result)
 }
 
   if (s->kind == simplify::SIMPLIFY)
-fprintf_indent (f, indent, "if (UNLIKELY (!dbg_cnt (match))) goto %s;\n", 
fail_label);
+{
+  fprintf_indent (f, indent, "if (UNLIKELY (!dbg_cnt (match))) goto 
%s;\n", fail_label);
+  needs_label = true;
+}
 
   fprintf_indent (f, indent, "if (UNLIKELY (dump_file && (dump_flags & 
TDF_FOLDING))) "
   "fprintf (dump_file, \"%s ",
@@ -3496,9 +3502,12 @@ dt_simplify::gen_1 (FILE *f, int indent, bool gimple, 
operand *result)
  "res_op->resimplify (%s, valueize);\n",
  !e->force_leaf ? "lseq" : "NULL");
  if (e->force_leaf)
-   fprintf_indent (f, indent,
-   "if (!maybe_push_res_to_seq (res_op, NULL)) "
-   "goto %s;\n", fail_label);
+   {
+ fprintf_indent (f, indent,
+ "if (!maybe_push_res_to_seq (res_op, NULL)) "
+ "goto %s;\n", fail_label);
+ needs_label = true;
+   }
}
}
   else if (result->type == operand::OP_CAPTURE
@@ -3554,9 +3563,12 @@ dt_simplify::gen_1 (FILE *f, int indent, bool gimple, 
operand *result)
  continue;
if (cinfo.info[i].result_use_count
> cinfo.info[i].match_use_count)
- fprintf_indent (f, indent,
- "if (! tree_invariant_p (captures[%d])) "
- "goto %s;\n", i, fail_label);
+ {
+   fprintf_indent (f, indent,
+   "if (! tree_invariant_p (captures[%d])) "
+   "goto %s;\n", i, fail_label);
+   needs_label = true;
+ }
  }
  for (unsigned j = 0; j < e->ops.length (); ++j)
{
@@ -3607,6 +3619,7 @@ dt_simplify::gen_1 (FILE *f, int indent, bool gimple, 
operand *result)
{
  fprintf_indent (f, indent, "if (!_r)\n");
  fprintf_indent (f, indent, "  goto %s;\n", fail_label);
+ needs_label = true;
}
}
}
@@ -3647,7 +3660,8 @@ dt_simplify::gen_1 (FILE *f, int indent, bool gimple, 
operand *result)
 }
   indent -= 2;
   fprintf_indent (f, indent, "}\n");
-  fprintf (f, "%s:;\n", fail_label);
+  if (needs_label)
+fprintf (f, "%s:;\n", fail_label);
   fail_label = NULL;
 }
 





[PATCH][committed][testsuite]: move mla_1 test to aarch64 only [PR109118]

2023-03-14 Thread Tamar Christina via Gcc-patches
Hi All,

I previously made the test generic, but there's no list
of targets that support integer MLA, so it's not really
feasible to keep it generic.

As such I've moved it to be AArch64 only.

Committed under the obvious rule.

Thanks,
Tamar

gcc/testsuite/ChangeLog:

PR testsuite/109118
* gcc.dg/mla_1.c: Moved to...
* gcc.target/aarch64/sve/mla_3.c: ...here.

--- inline copy of patch -- 
diff --git a/gcc/testsuite/gcc.dg/mla_1.c 
b/gcc/testsuite/gcc.target/aarch64/sve/mla_3.c
similarity index 78%
rename from gcc/testsuite/gcc.dg/mla_1.c
rename to gcc/testsuite/gcc.target/aarch64/sve/mla_3.c
index 
98e5808ee7005e3e5c1b2d5688bfaf267a4d66ce..25e99f7d72a2fd5be0cdf9a8e9d9edddf22c40cb
 100644
--- a/gcc/testsuite/gcc.dg/mla_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/mla_3.c
@@ -1,6 +1,5 @@
 /* { dg-do compile } */
-/* { dg-require-effective-target vect_int } */
-/* { dg-options "-O2 -msve-vector-bits=256 -march=armv8.2-a+sve 
-fdump-tree-optimized" { target aarch64*-*-* } } */
+/* { dg-options "-O2 -msve-vector-bits=256 -march=armv8.2-a+sve 
-fdump-tree-optimized" } */
 
 unsigned int
 f1 (unsigned int a, unsigned int b, unsigned int c) {
@@ -37,4 +36,4 @@ g3 (vec a, vec b, vec c)
   return a * b + c;
 }
 
-/* { dg-final { scan-tree-dump-times {\.FMA } 1 "optimized" { target 
aarch64*-*-* } } } */
+/* { dg-final { scan-tree-dump-times {\.FMA } 1 "optimized" } } */





RE: [PATCH 2/4][ranger]: Add range-ops for widen addition and widen multiplication [PR108583]

2023-03-10 Thread Tamar Christina via Gcc-patches
> >> As Andrew has been advising on this one, I'd prefer for him to review it.
> >> However, he's on vacation this week.  FYI...
> >>
> >> Aldy
> >>
> >> On Mon, Mar 6, 2023 at 12:22 PM Tamar Christina
> >>  wrote:
> >>> Ping.
> >>>
> >>> And updated the patch to reject cases that we don't expect or can
> >>> handle
> >> cleanly for now.
> >>
> It's OK by me...  but I think a release manager has to sign off on it for this
> stage. Next stage 1 I will formalize the process a bit more for nonstandard
> range-ops.
> 

Thanks!

Richi is this change OK with you?

Thanks,
Tamar
> Andrew



RE: [PATCH 3/4]middle-end: Implement preferred_div_as_shifts_over_mult [PR108583]

2023-03-09 Thread Tamar Christina via Gcc-patches
Hi,

Here's the respun patch.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

PR target/108583
* target.def (preferred_div_as_shifts_over_mult): New.
* doc/tm.texi.in: Document it.
* doc/tm.texi: Regenerate.
* targhooks.cc (default_preferred_div_as_shifts_over_mult): New.
* targhooks.h (default_preferred_div_as_shifts_over_mult): New.
* tree-vect-patterns.cc (vect_recog_divmod_pattern): Use it.

gcc/testsuite/ChangeLog:

PR target/108583
* gcc.dg/vect/vect-div-bitmask-4.c: New test.
* gcc.dg/vect/vect-div-bitmask-5.c: New test.
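
For reference, the add/shift sequence the hook is about is the one already
described in the aarch64 backend comments: a 16-bit x divided by 255 can be
computed, in double the precision of x, as (x + ((x + 257) >> 8)) >> 8.
A scalar sketch of that identity (function name is mine, purely
illustrative):

unsigned short
div255 (unsigned short x)
{
  unsigned int t = x;  /* double the precision of x.  */
  return (t + ((t + 257) >> 8)) >> 8;
}

i.e. two addition-shift pairs, four instructions in total.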

--- inline copy of patch ---

diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 
50a8872a6695b18b9bed0d393bacf733833633db..bf7269e323de1a065d4d04376e5a2703cbb0f9fa
 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6137,6 +6137,12 @@ instruction pattern.  There is no need for the hook to 
handle these two
 implementation approaches itself.
 @end deftypefn
 
+@deftypefn {Target Hook} bool TARGET_VECTORIZE_PREFERRED_DIV_AS_SHIFTS_OVER_MULT (const_tree @var{type})
+Sometimes it is possible to implement a vector division using a sequence
+of two addition-shift pairs, giving four instructions in total.
+Return true if taking this approach for @var{type} is likely
+to be better than using a sequence involving highpart multiplication.
+Default is false if @code{can_mult_highpart_p}, otherwise true.
 @end deftypefn
 
 @deftypefn {Target Hook} tree TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION 
(unsigned @var{code}, tree @var{vec_type_out}, tree @var{vec_type_in})
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 
3e07978a02f4e6077adae6cadc93ea4273295f1f..0051017a7fd67691a343470f36ad4fc32c8e7e15
 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4173,6 +4173,7 @@ address;  but often a machine-dependent strategy can 
generate better code.
 
 @hook TARGET_VECTORIZE_VEC_PERM_CONST
 
+@hook TARGET_VECTORIZE_PREFERRED_DIV_AS_SHIFTS_OVER_MULT
 
 @hook TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION
 
diff --git a/gcc/target.def b/gcc/target.def
index 
e0a5c7adbd962f5d08ed08d1d81afa2c2baa64a5..e4474a3ed6bd2f5f5c010bf0d40c2a371370490c
 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -1868,6 +1868,18 @@ correct for most targets.",
  poly_uint64, (const_tree type),
  default_preferred_vector_alignment)
 
+/* Returns whether the target has a preference for decomposing divisions using
+   shifts rather than multiplies.  */
+DEFHOOK
+(preferred_div_as_shifts_over_mult,
+ "Sometimes it is possible to implement a vector division using a sequence\n\
+of two addition-shift pairs, giving four instructions in total.\n\
+Return true if taking this approach for @var{type} is likely\n\
+to be better than using a sequence involving highpart multiplication.\n\
+Default is false if @code{can_mult_highpart_p}, otherwise true.",
+ bool, (const_tree type),
+ default_preferred_div_as_shifts_over_mult)
+
 /* Return true if vector alignment is reachable (by peeling N
iterations) for the given scalar type.  */
 DEFHOOK
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index 
a6a4809ca91baa5d7fad2244549317a31390f0c2..a207963b9e6eb9300df0043e1b79aa6c941d0f7f
 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -53,6 +53,8 @@ extern scalar_int_mode default_unwind_word_mode (void);
 extern unsigned HOST_WIDE_INT default_shift_truncation_mask
   (machine_mode);
 extern unsigned int default_min_divisions_for_recip_mul (machine_mode);
+extern bool default_preferred_div_as_shifts_over_mult
+  (const_tree);
 extern int default_mode_rep_extended (scalar_int_mode, scalar_int_mode);
 
 extern tree default_stack_protect_guard (void);
diff --git a/gcc/targhooks.cc b/gcc/targhooks.cc
index 
211525720a620d6f533e2da91e03877337a931e7..7f39ff9b7ec2bf66625d48a47bb76e96c05a3233
 100644
--- a/gcc/targhooks.cc
+++ b/gcc/targhooks.cc
@@ -1483,6 +1483,15 @@ default_preferred_vector_alignment (const_tree type)
   return TYPE_ALIGN (type);
 }
 
+/* The default implementation of
+   TARGET_VECTORIZE_PREFERRED_DIV_AS_SHIFTS_OVER_MULT.  */
+
+bool
+default_preferred_div_as_shifts_over_mult (const_tree type)
+{
+  return !can_mult_highpart_p (TYPE_MODE (type), TYPE_UNSIGNED (type));
+}
+
 /* By default assume vectors of element TYPE require a multiple of the natural
alignment of TYPE.  TYPE is naturally aligned if IS_PACKED is false.  */
 bool
diff --git a/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-4.c 
b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-4.c
new file mode 100644
index 
..c81f8946922250234bf759e0a0a04ea8c1f73e3c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-4.c
@@ -0,0 +1,25 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdint.h>
+#include "tree-vect.h"
+
+typedef unsigned __attribute__((__vector_size__ (16))) V;
+
+static __attribute__((__noinline__)) __attribute__((__noclone__)) V
+foo (V v, 

RE: [PATCH 2/4][ranger]: Add range-ops for widen addition and widen multiplication [PR108583]

2023-03-09 Thread Tamar Christina via Gcc-patches
Cheers,

Thanks! I'll wait for him to come back then.

Thanks,
Tamar

> -Original Message-
> From: Aldy Hernandez 
> Sent: Wednesday, March 8, 2023 8:57 AM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; amacl...@redhat.com
> Subject: Re: [PATCH 2/4][ranger]: Add range-ops for widen addition and
> widen multiplication [PR108583]
> 
> As Andrew has been advising on this one, I'd prefer for him to review it.
> However, he's on vacation this week.  FYI...
> 
> Aldy
> 
> On Mon, Mar 6, 2023 at 12:22 PM Tamar Christina
>  wrote:
> >
> > Ping.
> >
> > And updated the patch to reject cases that we don't expect or can handle
> cleanly for now.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > PR target/108583
> > * gimple-range-op.h (gimple_range_op_handler): Add
> maybe_non_standard.
> > * gimple-range-op.cc
> (gimple_range_op_handler::gimple_range_op_handler):
> > Use it.
> > (gimple_range_op_handler::maybe_non_standard): New.
> > * range-op.cc (class operator_widen_plus_signed,
> > operator_widen_plus_signed::wi_fold, class
> operator_widen_plus_unsigned,
> > operator_widen_plus_unsigned::wi_fold, class
> operator_widen_mult_signed,
> > operator_widen_mult_signed::wi_fold, class
> operator_widen_mult_unsigned,
> > operator_widen_mult_unsigned::wi_fold,
> > ptr_op_widen_mult_signed, ptr_op_widen_mult_unsigned,
> > ptr_op_widen_plus_signed, ptr_op_widen_plus_unsigned): New.
> > * range-op.h (ptr_op_widen_mult_signed,
> ptr_op_widen_mult_unsigned,
> > ptr_op_widen_plus_signed, ptr_op_widen_plus_unsigned): New
> >
> > Co-Authored-By: Andrew MacLeod 
> >
> > --- Inline copy of patch ---
> >
> > diff --git a/gcc/gimple-range-op.h b/gcc/gimple-range-op.h index
> >
> 743b858126e333ea9590c0f175aacb476260c048..1bf63c5ce6f5db924a1f5
> 907ab45
> > 39e376281bd0 100644
> > --- a/gcc/gimple-range-op.h
> > +++ b/gcc/gimple-range-op.h
> > @@ -41,6 +41,7 @@ public:
> >  relation_trio = TRIO_VARYING);
> >  private:
> >void maybe_builtin_call ();
> > +  void maybe_non_standard ();
> >gimple *m_stmt;
> >tree m_op1, m_op2;
> >  };
> > diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc index
> >
> d9dfdc56939bb62ade72726b15c3d5e87e4ddcd1..a5d625387e712c170e1e
> 68f6a7d4
> > 94027f6ef0d0 100644
> > --- a/gcc/gimple-range-op.cc
> > +++ b/gcc/gimple-range-op.cc
> > @@ -179,6 +179,8 @@
> gimple_range_op_handler::gimple_range_op_handler (gimple *s)
> >// statements.
> >if (is_a  (m_stmt))
> >  maybe_builtin_call ();
> > +  else
> > +maybe_non_standard ();
> >  }
> >
> >  // Calculate what we can determine of the range of this unary @@
> > -764,6 +766,57 @@ public:
> >}
> >  } op_cfn_parity;
> >
> > +// Set up a gimple_range_op_handler for any nonstandard function
> > +which can be // supported via range-ops.
> > +
> > +void
> > +gimple_range_op_handler::maybe_non_standard () {
> > +  range_operator *signed_op = ptr_op_widen_mult_signed;
> > +  range_operator *unsigned_op = ptr_op_widen_mult_unsigned;
> > +  if (gimple_code (m_stmt) == GIMPLE_ASSIGN)
> > +switch (gimple_assign_rhs_code (m_stmt))
> > +  {
> > +   case WIDEN_PLUS_EXPR:
> > +   {
> > + signed_op = ptr_op_widen_plus_signed;
> > + unsigned_op = ptr_op_widen_plus_unsigned;
> > +   }
> > +   gcc_fallthrough ();
> > +   case WIDEN_MULT_EXPR:
> > +   {
> > + m_valid = false;
> > + m_op1 = gimple_assign_rhs1 (m_stmt);
> > + m_op2 = gimple_assign_rhs2 (m_stmt);
> > + tree ret = gimple_assign_lhs (m_stmt);
> > + bool signed1 = TYPE_SIGN (TREE_TYPE (m_op1)) == SIGNED;
> > + bool signed2 = TYPE_SIGN (TREE_TYPE (m_op2)) == SIGNED;
> > + bool signed_ret = TYPE_SIGN (TREE_TYPE (ret)) == SIGNED;
> > +
> > + /* Normally these operands should all have the same sign, but
> > +some passes do violate this by taking mismatched sign args.  At
> > At
> > +the moment the only one that's possible is mismatch inputs and
> > +unsigned output.  Once ranger supports signs for the operands 
> > we
> > +can properly fix it,  for now only accept the case we can do
> > +correctly.  */
> > + if ((signed1 ^ signed2) && signed_ret)
> > +   return;
> > +
> > + m_valid = true;
> > + if (signed2 && !signed1)
> > +   std::swap (m_op1, m_op2);
> > +
> > + if (signed1 || signed2)
> > +   m_int = signed_op;
> > + else
> > +   m_int = unsigned_op;
> > + break;
> > +   }
> > +   default:
> > + break;
> > +  }
> > +}
> > +
> >  // Set up a gimple_range_op_handler for any built in function which
> > can be  // supported via range-ops.
> >
> > diff --git a/gcc/range-op.h 

[PATCH]middle-end: don't form FMAs when multiplication is not single use. [PR108583]

2023-03-09 Thread Tamar Christina via Gcc-patches
Hi All,

The testcase

typedef unsigned int vec __attribute__((vector_size(32)));
vec
f3 (vec a, vec b, vec c)
{
  vec d = a * b;
  return d + ((c + d) >> 1);
}

shows a case where we don't want to form an FMA because the MUL is not single
use.  In this case, forming an FMA means redoing the MUL, as we no longer have
its result to share.

As such, making an FMA here would be a de-optimization.
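
Concretely (a GIMPLE-like sketch, not actual compiler output): d = a * b has
two uses, so turning the c + d use into an FMA

  d = a * b;             /* still needed by the outer addition */
  t = .FMA (a, b, c);    /* recomputes a * b */
  _1 = t >> 1;
  return d + _1;

duplicates the multiplication instead of sharing its result.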

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

PR target/108583
* tree-ssa-math-opts.cc (convert_mult_to_fma): Inhibit FMA in case not
single use.

gcc/testsuite/ChangeLog:

PR target/108583
* gcc.dg/mla_1.c: New test.

Co-Authored-By: Richard Sandiford 

--- inline copy of patch -- 
diff --git a/gcc/testsuite/gcc.dg/mla_1.c b/gcc/testsuite/gcc.dg/mla_1.c
new file mode 100644
index 
..a92ecf248116d89b1bc4207a907ea5ed95728a28
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/mla_1.c
@@ -0,0 +1,40 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-options "-O2 -msve-vector-bits=256 -march=armv8.2-a+sve 
-fdump-tree-optimized" } */
+
+unsigned int
+f1 (unsigned int a, unsigned int b, unsigned int c) {
+  unsigned int d = a * b;
+  return d + ((c + d) >> 1);
+}
+
+unsigned int
+g1 (unsigned int a, unsigned int b, unsigned int c) {
+  return a * b + c;
+}
+
+__Uint32x4_t
+f2 (__Uint32x4_t a, __Uint32x4_t b, __Uint32x4_t c) {
+  __Uint32x4_t d = a * b;
+  return d + ((c + d) >> 1);
+}
+
+__Uint32x4_t
+g2 (__Uint32x4_t a, __Uint32x4_t b, __Uint32x4_t c) {
+  return a * b + c;
+}
+
+typedef unsigned int vec __attribute__((vector_size(32))); vec
+f3 (vec a, vec b, vec c)
+{
+  vec d = a * b;
+  return d + ((c + d) >> 1);
+}
+
+vec
+g3 (vec a, vec b, vec c)
+{
+  return a * b + c;
+}
+
+/* { dg-final { scan-tree-dump-times {\.FMA } 1 "optimized" { target 
aarch64*-*-* } } } */
diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc
index 
5ab5b944a573ad24ce8427aff24fc5215bf05dac..26ed91d58fa4709a67c903ad446d267a3113c172
 100644
--- a/gcc/tree-ssa-math-opts.cc
+++ b/gcc/tree-ssa-math-opts.cc
@@ -3346,6 +3346,20 @@ convert_mult_to_fma (gimple *mul_stmt, tree op1, tree 
op2,
param_avoid_fma_max_bits));
   bool defer = check_defer;
   bool seen_negate_p = false;
+
+  /* There is no numerical difference between fused and unfused integer FMAs,
+ and the assumption below that FMA is as cheap as addition is unlikely
+ to be true, especially if the multiplication occurs multiple times on
+ the same chain.  E.g., for something like:
+
+(((a * b) + c) >> 1) + (a * b)
+
+ we do not want to duplicate the a * b into two additions, not least
+ because the result is not a natural FMA chain.  */
+  if (ANY_INTEGRAL_TYPE_P (type)
+  && !has_single_use (mul_result))
+return false;
+
   /* Make sure that the multiplication statement becomes dead after
  the transformation, thus that all uses are transformed to FMAs.
  This means we assume that an FMA operation has the same cost





RE: [PATCH 4/4]AArch64 Update div-bitmask to implement new optab instead of target hook [PR108583]

2023-03-08 Thread Tamar Christina via Gcc-patches
> -Original Message-
> From: Richard Sandiford 
> Sent: Wednesday, March 8, 2023 9:18 AM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
> ; Marcus Shawcroft
> ; Kyrylo Tkachov 
> Subject: Re: [PATCH 4/4]AArch64 Update div-bitmask to implement new
> optab instead of target hook [PR108583]
> 
> Tamar Christina  writes:
> > Ping,
> >
> > And updating the hook.
> >
> > There are no new test as new correctness tests were added to the
> > mid-end and the existing codegen tests for this already exist.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > PR target/108583
> > * config/aarch64/aarch64-simd.md
> (@aarch64_bitmask_udiv3): Remove.
> > (*bitmask_shift_plus): New.
> > * config/aarch64/aarch64-sve2.md (*bitmask_shift_plus): New.
> > (@aarch64_bitmask_udiv3): Remove.
> > * config/aarch64/aarch64.cc
> > (aarch64_vectorize_can_special_div_by_constant,
> > TARGET_VECTORIZE_CAN_SPECIAL_DIV_BY_CONST): Removed.
> > (TARGET_VECTORIZE_PREFERRED_DIV_AS_SHIFTS_OVER_MULT,
> > aarch64_vectorize_preferred_div_as_shifts_over_mult): New.
> >
> > --- inline copy of patch ---
> >
> > diff --git a/gcc/config/aarch64/aarch64-simd.md
> > b/gcc/config/aarch64/aarch64-simd.md
> > index
> >
> 7f212bf37cd2c120dceb7efa733c9fa76226f029..e1ecb88634f93d380ef534
> 093ea6
> > 599dc7278108 100644
> > --- a/gcc/config/aarch64/aarch64-simd.md
> > +++ b/gcc/config/aarch64/aarch64-simd.md
> > @@ -4867,60 +4867,27 @@ (define_expand
> "aarch64_hn2"
> >}
> >  )
> >
> > -;; div optimizations using narrowings
> > -;; we can do the division e.g. shorts by 255 faster by calculating it as
> > -;; (x + ((x + 257) >> 8)) >> 8 assuming the operation is done in
> > -;; double the precision of x.
> > -;;
> > -;; If we imagine a short as being composed of two blocks of bytes then
> > -;; adding 257 or 0b0000_0001_0000_0001 to the number is equivalent to
> > -;; adding 1 to each sub component:
> > -;;
> > -;;      short value of 16-bits
> > -;; ┌──────────────┬────────────────┐
> > -;; │              │                │
> > -;; └──────────────┴────────────────┘
> > -;;   8-bit part1 ▲  8-bit part2   ▲
> > -;;               │                │
> > -;;               │                │
> > -;;              +1               +1
> > -;;
> > -;; after the first addition, we have to shift right by 8, and narrow the
> > -;; results back to a byte.  Remember that the addition must be done in
> > -;; double the precision of the input.  Since 8 is half the size of a short
> > -;; we can use a narrowing halving instruction in AArch64, addhn, which also
> > -;; does the addition in a wider precision and narrows back to a byte.  The
> > -;; shift itself is implicit in the operation as it writes back only the top
> > -;; half of the result, i.e. bits 2*esize-1:esize.
> > -;;
> > -;; Since we have narrowed the result of the first part back to a byte, for
> > -;; the second addition we can use a widening addition, uaddw.
> > -;;
> > -;; For the final shift, since it's unsigned arithmetic we emit an ushr by 8.
> > -;;
> > -;; The shift is later optimized by combine to a uzp2 with movi #0.
> > -(define_expand "@aarch64_bitmask_udiv<mode>3"
> > -  [(match_operand:VQN 0 "register_operand")
> > -   (match_operand:VQN 1 "register_operand")
> > -   (match_operand:VQN 2 "immediate_operand")]
> > +;; Optimize ((a + b) >> n) + c where n is half the bitsize of the vector
> > +(define_insn_and_split "*bitmask_shift_plus<mode>"
> > +  [(set (match_operand:VQN 0 "register_operand" "=")
> > +   (plus:VQN
> > + (lshiftrt:VQN
> > +   (plus:VQN (match_operand:VQN 1 "register_operand" "w")
> > + (match_operand:VQN 2 "register_operand" "w"))
> > +   (match_operand:VQN 3
> > +"aarch64_simd_shift_imm_vec_exact_top" "Dr"))
> 
> I guess this is personal preference, sorry, but I think we should drop the
> constraint.  The predicate does the real check, and the operand is never
> reloaded, so "Dr" isn't any more helpful than an empty constraint, and IMO
> can be confusing.
> 
> > + (match_operand:VQN 4 "register_operand" "w")))]
> >"TARGET_SIMD"
> > +  "#"
> > +  "&& true"
> > +  [(const_int 0)]
> >  {
> > -  unsigned HOST_WIDE_INT size
> > -= (1ULL << GET_MODE_UNIT_BITSIZE (mode)) - 1;
> > -  rtx elt = unwrap_const_vec_duplicate (operands[2]);
> > -  if (!CONST_INT_P (elt) || UINTVAL (elt) != size)
> > -FAIL;
> > -
> > -  rtx addend = gen_reg_rtx (mode);
> > -  rtx val = aarch64_simd_gen_const_vector_dup (mode, 1);
> > -  emit_move_insn (addend, lowpart_subreg (mode, val,
> > mode));
> > -  rtx tmp1 = gen_reg_rtx (mode);
> > -  rtx tmp2 = gen_reg_rtx (mode);
> > -  emit_insn (gen_aarch64_addhn (tmp1, operands[1], addend));
> > -  unsigned bitsize = GET_MODE_UNIT_BITSIZE (mode);
> > -  rtx shift_vector = 

[PATCH]middle-end: On emergency dumps finish the graph generation.

2023-03-07 Thread Tamar Christina via Gcc-patches
Hi All,

When doing an emergency dump, the cfg graph dumps are corrupted because the
ending "}" is missing.

Normally, when the pass manager finishes, it would call finish_graph_dump_file
to produce this.  It is called there because each pass can dump multiple
digraphs.

However during an emergency dump we only dump the current function and so after
that is done we never go back to the pass manager.

As such, we need to manually call finish_graph_dump_file in order to properly
finish off graph generation.

With this, -fdump-tree-*-graph works properly during a crash dump.
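
For example (hypothetical session; the exact dump file name depends on the
pass that was running): with -fdump-tree-optimized-graph, an ICE used to
leave the emergency .dot dump looking roughly like

  digraph "foo.c.265t.optimized" {
    subgraph "cluster_foo" {
      ...
    }

with the digraph's closing "}" missing, so Graphviz would reject the file;
finish_graph_dump_file now writes that final brace.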

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* passes.cc (emergency_dump_function): Finish graph generation.

--- inline copy of patch -- 
diff --git a/gcc/passes.cc b/gcc/passes.cc
index 
347214e81d0cfac05d9ba782db0eda1cdd7e9c87..38642a4010941b414a1ed1fd70a348778addbf60
 100644
--- a/gcc/passes.cc
+++ b/gcc/passes.cc
@@ -1845,6 +1845,13 @@ emergency_dump_function ()
   fprintf (dump_file, "\n\n\nEMERGENCY DUMP:\n\n");
   execute_function_dump (cfun, current_pass);
 
+  /* Normally the pass manager will close the graphs, as a pass may want
+     to print multiple digraphs.  But during an emergency dump there can only
+     be one and we must finish the graph manually.  */
+  if ((cfun->curr_properties & PROP_cfg)
+  && (dump_flags & TDF_GRAPH))
+finish_graph_dump_file (dump_file_name);
+
   if (symtab && current_pass->type == IPA_PASS)
 symtab->dump (dump_file);
 }









RE: [PATCH] PR rtl-optimization/106594: Preserve zero_extend in combine when cheap.

2023-03-06 Thread Tamar Christina via Gcc-patches
> Hi!
> 
> On Sun, Mar 05, 2023 at 03:33:40PM -0600, Segher Boessenkool wrote:
> > On Sun, Mar 05, 2023 at 08:43:20PM +, Tamar Christina wrote:
> > Yes, *look* better: I have seen no proof or indication that this would
> 
> ("looks", I cannot type, sorry)
> 
> > actually generate better code, not even on just aarch, let alone on
> > the majority of targets.  As I said I have a test running, you may be
> > lucky even :-)  It has to run for about six hours more and after that
> > it needs analysis still (a few more hours if it isn't obviously always
> > better or worse), so expect results tomorrow night at the earliest.
> 
> The results are in:
> 
> $ perl sizes.pl --percent C[12]
>                     C1         C2
>        alpha    7082243   100.066%
>          arc    4207975   100.015%
>          arm   11518624   100.008%
>        arm64   24514565   100.067%
>        armhf   16661684   100.098%
>         csky    4031841   100.002%
>         i386          0          0
>         ia64   20354295   100.029%
>         m68k    4394084   100.023%
>   microblaze    6549965   100.014%
>         mips   10684680   100.024%
>       mips64    8171850   100.002%
>        nios2    4356713   100.012%
>     openrisc    5010570   100.003%
>       parisc    8406294   100.002%
>     parisc64          0          0
>      powerpc   11104901    99.992%
>    powerpc64   24532358   100.057%
>  powerpc64le   21293219   100.062%
>      riscv32    2028474   100.131%
>      riscv64    9515453   100.120%
>         s390   20519612   100.279%
>           sh          0          0
>      shnommu    1840960   100.012%
>        sparc    5314422   100.004%
>      sparc64    7964129    99.992%
>       x86_64          0          0
>       xtensa    2925723   100.070%
> 
> C1 is the original, C2 with your patch.  These numbers are the code sizes of a
> Linux kernel, some defconfig for every arch.  This is a good measure of how
> effective combine was.
> 
> The patch is a tiny win for sparc64 and classic powerpc32 only, but bad
> everywhere else.  Look at that s390 number!  Or riscv, or most of the arm
> variants (including aarch64).
> 
> Do you want me to look in detail what causes this regression on some
> particular target, i.e. why we really still need the expand_compound
> functionality there?
> 

Hi,

Thanks for having a look! I think the Richards are exploring a different 
solution on the PR
so I don't think it's worth looking at now (maybe in stage-1?).  Thanks for 
checking though!

I appreciate you all helping to get this fixed!

Kind Regards,
Tamar

> (Btw.  "0" means the target did not build.  For the x86 targets this is just 
> more
> -Werror madness that seeped in it seems.  For parisc64 and sh it is the choice
> of config.  Will fix.)
> 
> 
> Segher

