[PATCH 10/10] vect: Consider vec_perm costing for VMAT_CONTIGUOUS_REVERSE

2023-09-13 Thread Kewen Lin via Gcc-patches
For VMAT_CONTIGUOUS_REVERSE, the transform code in function
vectorizable_store generates a VEC_PERM_EXPR stmt before
storing, but it's never considered in costing.

This patch makes costing account for that vec_perm; it also
adjusts the order of the transform code a bit to make it easy
to return early for costing_p.

gcc/ChangeLog:

* tree-vect-stmts.cc (vectorizable_store): Consider generated
VEC_PERM_EXPR stmt for VMAT_CONTIGUOUS_REVERSE in costing as
vec_perm.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/costmodel/ppc/costmodel-vect-store-2.c: New test.
---
 .../costmodel/ppc/costmodel-vect-store-2.c| 29 +
 gcc/tree-vect-stmts.cc| 63 +++
 2 files changed, 65 insertions(+), 27 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-store-2.c

diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-store-2.c 
b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-store-2.c
new file mode 100644
index 000..72b67cf9040
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-store-2.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-additional-options "-mvsx" } */
+
+/* Verify we do cost the required vec_perm.  */
+
+int
+foo (int *a, int *b, int len)
+{
+  int i;
+  int *a1 = a;
+  int *a0 = a1 - 4;
+  for (i = 0; i < len; i++)
+{
+  *b = *a0 + *a1;
+  b--;
+  a0++;
+  a1++;
+}
+  return 0;
+}
+
+/* The reason why it doesn't check the exact count is that
+   we can get more than 1 vec_perm when it's compiled with
+   partial vector capability like Power10 (retrying for
+   the epilogue) or it's compiled without unaligned vector
+   memory access support (realign).  */
+/* { dg-final { scan-tree-dump {\mvec_perm\M} "vect" } } */
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 3d451c80bca..ce925cc1d53 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -9279,6 +9279,40 @@ vectorizable_store (vec_info *vinfo,
   stmt_vec_info next_stmt_info = first_stmt_info;
   for (i = 0; i < vec_num; i++)
{
+ if (!costing_p)
+   {
+ if (slp)
+   vec_oprnd = vec_oprnds[i];
+ else if (grouped_store)
+   /* For grouped stores vectorized defs are interleaved in
+  vect_permute_store_chain().  */
+   vec_oprnd = result_chain[i];
+   }
+
+ if (memory_access_type == VMAT_CONTIGUOUS_REVERSE)
+   {
+ if (costing_p)
+   inside_cost += record_stmt_cost (cost_vec, 1, vec_perm,
+stmt_info, 0, vect_body);
+ else
+   {
+ tree perm_mask = perm_mask_for_reverse (vectype);
+ tree perm_dest = vect_create_destination_var (
+   vect_get_store_rhs (stmt_info), vectype);
+ tree new_temp = make_ssa_name (perm_dest);
+
+ /* Generate the permute statement.  */
+ gimple *perm_stmt
+   = gimple_build_assign (new_temp, VEC_PERM_EXPR, vec_oprnd,
+  vec_oprnd, perm_mask);
+ vect_finish_stmt_generation (vinfo, stmt_info, perm_stmt,
+  gsi);
+
+ perm_stmt = SSA_NAME_DEF_STMT (new_temp);
+ vec_oprnd = new_temp;
+   }
+   }
+
  if (costing_p)
{
  vect_get_store_cost (vinfo, stmt_info, 1,
@@ -9294,8 +9328,6 @@ vectorizable_store (vec_info *vinfo,
 
  continue;
}
- unsigned misalign;
- unsigned HOST_WIDE_INT align;
 
  tree final_mask = NULL_TREE;
  tree final_len = NULL_TREE;
@@ -9315,13 +9347,8 @@ vectorizable_store (vec_info *vinfo,
dataref_ptr = bump_vector_ptr (vinfo, dataref_ptr, ptr_incr, gsi,
   stmt_info, bump);
 
- if (slp)
-   vec_oprnd = vec_oprnds[i];
- else if (grouped_store)
-   /* For grouped stores vectorized defs are interleaved in
-  vect_permute_store_chain().  */
-   vec_oprnd = result_chain[i];
-
+ unsigned misalign;
+ unsigned HOST_WIDE_INT align;
  align = known_alignment (DR_TARGET_ALIGNMENT (first_dr_info));
  if (alignment_support_scheme == dr_aligned)
misalign = 0;
@@ -9338,24 +9365,6 @@ vectorizable_store (vec_info *vinfo,
misalign);
  align = least_bit_hwi (misalign | align);
 
- if (memory_access_type == VMAT_CONTIGUOUS_REVERSE)
-   {
- tree perm_mask = perm_mask_for_reverse (vectype);
- tree perm_dest
-   = 

[PATCH 09/10] vect: Get rid of vect_model_store_cost

2023-09-13 Thread Kewen Lin via Gcc-patches
This patch finally gets rid of vect_model_store_cost; it
adjusts the costing for the remaining memory access types
VMAT_CONTIGUOUS{, _DOWN, _REVERSE} by moving the costing close
to the transform code.  Note that vect_model_store_cost has
one piece of special handling for vectorizing a store into
the function result; since it's an extra penalty and the
transform part has no counterpart for it, this patch keeps
that handling as is.
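
To illustrate the handling that is kept, below is a hypothetical
sketch (mine, not from this series): r is the function result and
is returned in more than one register, so if the basic-block SLP
vectorizer turns the two stores into one vector store, the extra
spill/reload penalty still applies.

  typedef struct { double re, im; } cplx;

  cplx
  cplx_add (cplx x, cplx y)
  {
    cplx r;
    r.re = x.re + y.re;  /* stores into the returned object ...  */
    r.im = x.im + y.im;  /* ... may become one vector store.  */
    return r;            /* returned in multiple registers.  */
  }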

gcc/ChangeLog:

* tree-vect-stmts.cc (vect_model_store_cost): Remove.
(vectorizable_store): Adjust the costing for the remaining memory
access types VMAT_CONTIGUOUS{, _DOWN, _REVERSE}.
---
 gcc/tree-vect-stmts.cc | 137 +
 1 file changed, 44 insertions(+), 93 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index e3ba8077091..3d451c80bca 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -951,81 +951,6 @@ cfun_returns (tree decl)
   return false;
 }
 
-/* Function vect_model_store_cost
-
-   Models cost for stores.  In the case of grouped accesses, one access
-   has the overhead of the grouped access attributed to it.  */
-
-static void
-vect_model_store_cost (vec_info *vinfo, stmt_vec_info stmt_info, int ncopies,
-  vect_memory_access_type memory_access_type,
-  dr_alignment_support alignment_support_scheme,
-  int misalignment,
-  vec_load_store_type vls_type, slp_tree slp_node,
-  stmt_vector_for_cost *cost_vec)
-{
-  gcc_assert (memory_access_type != VMAT_GATHER_SCATTER
- && memory_access_type != VMAT_ELEMENTWISE
- && memory_access_type != VMAT_STRIDED_SLP
- && memory_access_type != VMAT_LOAD_STORE_LANES
- && memory_access_type != VMAT_CONTIGUOUS_PERMUTE);
-
-  unsigned int inside_cost = 0, prologue_cost = 0;
-
-  /* ???  Somehow we need to fix this at the callers.  */
-  if (slp_node)
-ncopies = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node);
-
-  if (vls_type == VLS_STORE_INVARIANT)
-{
-  if (!slp_node)
-   prologue_cost += record_stmt_cost (cost_vec, 1, scalar_to_vec,
-  stmt_info, 0, vect_prologue);
-}
-
-
-  /* Costs of the stores.  */
-  vect_get_store_cost (vinfo, stmt_info, ncopies, alignment_support_scheme,
-  misalignment, &inside_cost, cost_vec);
-
-  /* When vectorizing a store into the function result assign
- a penalty if the function returns in a multi-register location.
- In this case we assume we'll end up with having to spill the
- vector result and do piecewise loads as a conservative estimate.  */
-  tree base = get_base_address (STMT_VINFO_DATA_REF (stmt_info)->ref);
-  if (base
-  && (TREE_CODE (base) == RESULT_DECL
- || (DECL_P (base) && cfun_returns (base)))
-  && !aggregate_value_p (base, cfun->decl))
-{
-  rtx reg = hard_function_value (TREE_TYPE (base), cfun->decl, 0, 1);
-  /* ???  Handle PARALLEL in some way.  */
-  if (REG_P (reg))
-   {
- int nregs = hard_regno_nregs (REGNO (reg), GET_MODE (reg));
- /* Assume that a single reg-reg move is possible and cheap,
-do not account for vector to gp register move cost.  */
- if (nregs > 1)
-   {
- /* Spill.  */
- prologue_cost += record_stmt_cost (cost_vec, ncopies,
-vector_store,
-stmt_info, 0, vect_epilogue);
- /* Loads.  */
- prologue_cost += record_stmt_cost (cost_vec, ncopies * nregs,
-scalar_load,
-stmt_info, 0, vect_epilogue);
-   }
-   }
-}
-
-  if (dump_enabled_p ())
-dump_printf_loc (MSG_NOTE, vect_location,
- "vect_model_store_cost: inside_cost = %d, "
- "prologue_cost = %d .\n", inside_cost, prologue_cost);
-}
-
-
 /* Calculate cost of DR's memory access.  */
 void
 vect_get_store_cost (vec_info *, stmt_vec_info stmt_info, int ncopies,
@@ -9223,6 +9148,11 @@ vectorizable_store (vec_info *vinfo,
   return true;
 }
 
+  gcc_assert (memory_access_type == VMAT_CONTIGUOUS
+ || memory_access_type == VMAT_CONTIGUOUS_DOWN
+ || memory_access_type == VMAT_CONTIGUOUS_PERMUTE
+ || memory_access_type == VMAT_CONTIGUOUS_REVERSE);
+
   unsigned inside_cost = 0, prologue_cost = 0;
   auto_vec<tree> result_chain (group_size);
   auto_vec<tree> vec_oprnds;
@@ -9257,10 +9187,9 @@ vectorizable_store (vec_info *vinfo,
 that there is no interleaving, DR_GROUP_SIZE is 1,
 and only one iteration of the loop will be executed.  */
  op = vect_get_store_rhs (next_stmt_info);
- if (costing_p
- && memory_access_type == 

[PATCH 03/10] vect: Adjust vectorizable_store costing on VMAT_GATHER_SCATTER

2023-09-13 Thread Kewen Lin via Gcc-patches
This patch adjusts the cost handling on VMAT_GATHER_SCATTER
in function vectorizable_store (all three cases), so that we
no longer depend on vect_model_store_cost for its costing.
This patch shouldn't have any functional changes.
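
For reference, a minimal scatter-store loop of the kind
VMAT_GATHER_SCATTER covers on the store side (an illustrative
sketch of mine, not a test case from this patch):

  void
  scatter (double *restrict a, int *restrict idx, double *restrict b, int n)
  {
    for (int i = 0; i < n; i++)
      a[idx[i]] = b[i];   /* store through a vector of indices */
  }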

gcc/ChangeLog:

* tree-vect-stmts.cc (vect_model_store_cost): Assert it won't get
VMAT_GATHER_SCATTER any more, remove VMAT_GATHER_SCATTER related
handlings and the related parameter gs_info.
(vect_build_scatter_store_calls): Add the handlings on costing with
one more argument cost_vec.
(vectorizable_store): Adjust the cost handling on VMAT_GATHER_SCATTER
without calling vect_model_store_cost any more.
---
 gcc/tree-vect-stmts.cc | 188 ++---
 1 file changed, 118 insertions(+), 70 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 36f7c5b9f4b..3f908242fee 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -959,12 +959,12 @@ cfun_returns (tree decl)
 static void
 vect_model_store_cost (vec_info *vinfo, stmt_vec_info stmt_info, int ncopies,
   vect_memory_access_type memory_access_type,
-  gather_scatter_info *gs_info,
   dr_alignment_support alignment_support_scheme,
   int misalignment,
   vec_load_store_type vls_type, slp_tree slp_node,
   stmt_vector_for_cost *cost_vec)
 {
+  gcc_assert (memory_access_type != VMAT_GATHER_SCATTER);
   unsigned int inside_cost = 0, prologue_cost = 0;
   stmt_vec_info first_stmt_info = stmt_info;
   bool grouped_access_p = STMT_VINFO_GROUPED_ACCESS (stmt_info);
@@ -1012,18 +1012,9 @@ vect_model_store_cost (vec_info *vinfo, stmt_vec_info 
stmt_info, int ncopies,
 
   tree vectype = STMT_VINFO_VECTYPE (stmt_info);
   /* Costs of the stores.  */
-  if (memory_access_type == VMAT_ELEMENTWISE
-  || memory_access_type == VMAT_GATHER_SCATTER)
+  if (memory_access_type == VMAT_ELEMENTWISE)
 {
   unsigned int assumed_nunits = vect_nunits_for_cost (vectype);
-  if (memory_access_type == VMAT_GATHER_SCATTER
- && gs_info->ifn == IFN_LAST && !gs_info->decl)
-   /* For emulated scatter N offset vector element extracts
-  (we assume the scalar scaling and ptr + offset add is consumed by
-  the load).  */
-   inside_cost += record_stmt_cost (cost_vec, ncopies * assumed_nunits,
-vec_to_scalar, stmt_info, 0,
-vect_body);
   /* N scalar stores plus extracting the elements.  */
   inside_cost += record_stmt_cost (cost_vec,
   ncopies * assumed_nunits,
@@ -1034,9 +1025,7 @@ vect_model_store_cost (vec_info *vinfo, stmt_vec_info 
stmt_info, int ncopies,
 misalignment, &inside_cost, cost_vec);
 
   if (memory_access_type == VMAT_ELEMENTWISE
-  || memory_access_type == VMAT_STRIDED_SLP
-  || (memory_access_type == VMAT_GATHER_SCATTER
- && gs_info->ifn == IFN_LAST && !gs_info->decl))
+  || memory_access_type == VMAT_STRIDED_SLP)
 {
   /* N scalar stores plus extracting the elements.  */
   unsigned int assumed_nunits = vect_nunits_for_cost (vectype);
@@ -2999,7 +2988,8 @@ vect_build_gather_load_calls (vec_info *vinfo, 
stmt_vec_info stmt_info,
 static void
 vect_build_scatter_store_calls (vec_info *vinfo, stmt_vec_info stmt_info,
gimple_stmt_iterator *gsi, gimple **vec_stmt,
-   gather_scatter_info *gs_info, tree mask)
+   gather_scatter_info *gs_info, tree mask,
+   stmt_vector_for_cost *cost_vec)
 {
   loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
   tree vectype = STMT_VINFO_VECTYPE (stmt_info);
@@ -3009,6 +2999,30 @@ vect_build_scatter_store_calls (vec_info *vinfo, 
stmt_vec_info stmt_info,
   poly_uint64 scatter_off_nunits
 = TYPE_VECTOR_SUBPARTS (gs_info->offset_vectype);
 
+  /* FIXME: Keep the previous costing way in vect_model_store_cost by
+ costing N scalar stores, but it should be tweaked to use target
+ specific costs on related scatter store calls.  */
+  if (cost_vec)
+{
+  tree op = vect_get_store_rhs (stmt_info);
+  enum vect_def_type dt;
+  gcc_assert (vect_is_simple_use (op, vinfo, &dt));
+  unsigned int inside_cost, prologue_cost = 0;
+  if (dt == vect_constant_def || dt == vect_external_def)
+   prologue_cost += record_stmt_cost (cost_vec, 1, scalar_to_vec,
+  stmt_info, 0, vect_prologue);
+  unsigned int assumed_nunits = vect_nunits_for_cost (vectype);
+  inside_cost = record_stmt_cost (cost_vec, ncopies * assumed_nunits,
+ scalar_store, stmt_info, 0, vect_body);
+
+  if (dump_enabled_p ())
+   dump_printf_loc (MSG_NOTE, vect_location,
+  

[PATCH 07/10] vect: Adjust vectorizable_store costing on VMAT_CONTIGUOUS_PERMUTE

2023-09-13 Thread Kewen Lin via Gcc-patches
This patch adjusts the cost handling on VMAT_CONTIGUOUS_PERMUTE
in function vectorizable_store.  We don't call function
vect_model_store_cost for it any more.  Since this is the
interleaving-store case, costing skips all stmts except
first_stmt_info and considers the whole group when costing
first_stmt_info.  This patch shouldn't have any functional
changes.
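
As an illustration (my own sketch, not part of the patch), a
typical store group of this kind looks like the loop below; on
targets without store-lanes support the two stores form one
interleaving group handled by permute-and-store, so the whole
group is accounted for when costing first_stmt_info:

  void
  interleave (int *restrict a, int *restrict b, int *restrict c, int n)
  {
    for (int i = 0; i < n; i++)
      {
        a[2 * i] = b[i];      /* first statement of the store group */
        a[2 * i + 1] = c[i];  /* last statement; whole chain emitted here */
      }
  }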

gcc/ChangeLog:

* tree-vect-stmts.cc (vect_model_store_cost): Assert it will never
get VMAT_CONTIGUOUS_PERMUTE and remove VMAT_CONTIGUOUS_PERMUTE related
handlings.
(vectorizable_store): Adjust the cost handling on
VMAT_CONTIGUOUS_PERMUTE without calling vect_model_store_cost.
---
 gcc/tree-vect-stmts.cc | 128 -
 1 file changed, 74 insertions(+), 54 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index fbd16b8a487..e3ba8077091 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -967,10 +967,10 @@ vect_model_store_cost (vec_info *vinfo, stmt_vec_info 
stmt_info, int ncopies,
   gcc_assert (memory_access_type != VMAT_GATHER_SCATTER
  && memory_access_type != VMAT_ELEMENTWISE
  && memory_access_type != VMAT_STRIDED_SLP
- && memory_access_type != VMAT_LOAD_STORE_LANES);
+ && memory_access_type != VMAT_LOAD_STORE_LANES
+ && memory_access_type != VMAT_CONTIGUOUS_PERMUTE);
+
   unsigned int inside_cost = 0, prologue_cost = 0;
-  stmt_vec_info first_stmt_info = stmt_info;
-  bool grouped_access_p = STMT_VINFO_GROUPED_ACCESS (stmt_info);
 
   /* ???  Somehow we need to fix this at the callers.  */
   if (slp_node)
@@ -983,35 +983,6 @@ vect_model_store_cost (vec_info *vinfo, stmt_vec_info 
stmt_info, int ncopies,
   stmt_info, 0, vect_prologue);
 }
 
-  /* Grouped stores update all elements in the group at once,
- so we want the DR for the first statement.  */
-  if (!slp_node && grouped_access_p)
-first_stmt_info = DR_GROUP_FIRST_ELEMENT (stmt_info);
-
-  /* True if we should include any once-per-group costs as well as
- the cost of the statement itself.  For SLP we only get called
- once per group anyhow.  */
-  bool first_stmt_p = (first_stmt_info == stmt_info);
-
-  /* We assume that the cost of a single store-lanes instruction is
- equivalent to the cost of DR_GROUP_SIZE separate stores.  If a grouped
- access is instead being provided by a permute-and-store operation,
- include the cost of the permutes.  */
-  if (first_stmt_p
-  && memory_access_type == VMAT_CONTIGUOUS_PERMUTE)
-{
-  /* Uses a high and low interleave or shuffle operations for each
-needed permute.  */
-  int group_size = DR_GROUP_SIZE (first_stmt_info);
-  int nstmts = ncopies * ceil_log2 (group_size) * group_size;
-  inside_cost = record_stmt_cost (cost_vec, nstmts, vec_perm,
- stmt_info, 0, vect_body);
-
-  if (dump_enabled_p ())
-dump_printf_loc (MSG_NOTE, vect_location,
- "vect_model_store_cost: strided group_size = %d .\n",
- group_size);
-}
 
   /* Costs of the stores.  */
   vect_get_store_cost (vinfo, stmt_info, ncopies, alignment_support_scheme,
@@ -8408,9 +8379,7 @@ vectorizable_store (vec_info *vinfo,
 costing, use the first one instead.  */
   if (grouped_store
  && !slp
- && first_stmt_info != stmt_info
- && (memory_access_type == VMAT_ELEMENTWISE
- || memory_access_type == VMAT_LOAD_STORE_LANES))
+ && first_stmt_info != stmt_info)
return true;
 }
   gcc_assert (memory_access_type == STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info));
@@ -9254,14 +9223,15 @@ vectorizable_store (vec_info *vinfo,
   return true;
 }
 
+  unsigned inside_cost = 0, prologue_cost = 0;
   auto_vec<tree> result_chain (group_size);
   auto_vec<tree> vec_oprnds;
   for (j = 0; j < ncopies; j++)
 {
   gimple *new_stmt;
-  if (j == 0 && !costing_p)
+  if (j == 0)
{
- if (slp)
+ if (slp && !costing_p)
{
  /* Get vectorized arguments for SLP_NODE.  */
  vect_get_vec_defs (vinfo, stmt_info, slp_node, 1, op,
@@ -9287,13 +9257,20 @@ vectorizable_store (vec_info *vinfo,
 that there is no interleaving, DR_GROUP_SIZE is 1,
 and only one iteration of the loop will be executed.  */
  op = vect_get_store_rhs (next_stmt_info);
- vect_get_vec_defs_for_operand (vinfo, next_stmt_info, ncopies,
-op, gvec_oprnds[i]);
- vec_oprnd = (*gvec_oprnds[i])[0];
- dr_chain.quick_push (vec_oprnd);
+ if (costing_p
+ && memory_access_type == VMAT_CONTIGUOUS_PERMUTE)
+   update_prologue_cost (&prologue_cost, op);
+ else 

[PATCH/RFC 08/10] aarch64: Don't use CEIL for vector_store in aarch64_stp_sequence_cost

2023-09-13 Thread Kewen Lin via Gcc-patches
This costing adjustment patch series exposes one issue in
aarch64 specific costing adjustment for STP sequence.  It
causes the below test cases to fail:

  - gcc/testsuite/gcc.target/aarch64/ldp_stp_15.c
  - gcc/testsuite/gcc.target/aarch64/ldp_stp_16.c
  - gcc/testsuite/gcc.target/aarch64/ldp_stp_17.c
  - gcc/testsuite/gcc.target/aarch64/ldp_stp_18.c

Take the below function extracted from ldp_stp_15.c as
example:

void
dup_8_int32_t (int32_t *x, int32_t val)
{
  for (int i = 0; i < 8; ++i)
    x[i] = val;
}

Without my patch series, during slp1 it gets:

  val_8(D) 2 times unaligned_store (misalign -1) costs 2 in body
  node 0x10008c85e38 1 times scalar_to_vec costs 1 in prologue

then the final vector cost is 3.

With my patch series, during slp1 it gets:

  val_8(D) 1 times unaligned_store (misalign -1) costs 1 in body
  val_8(D) 1 times unaligned_store (misalign -1) costs 1 in body
  node 0x10004cc5d88 1 times scalar_to_vec costs 1 in prologue

but the final vector cost is 17.  The unaligned_store count is
actually unchanged, but the final vector costs become different,
it's because the below aarch64 special handling makes the
different costs:

  /* Apply the heuristic described above m_stp_sequence_cost.  */
  if (m_stp_sequence_cost != ~0U)
{
  uint64_t cost = aarch64_stp_sequence_cost (count, kind,
 stmt_info, vectype);
  m_stp_sequence_cost = MIN (m_stp_sequence_cost + cost, ~0U);
}

For the former, since the count is 2, function
aarch64_stp_sequence_cost returns 2 as "CEIL (count, 2) * 2".
While for the latter, it's separated into twice calls with
count 1, aarch64_stp_sequence_cost returns 2 for each time,
so it returns 4 in total.

For this case, the stmt with scalar_to_vec also contributes
4 to m_stp_sequence_cost, then the final m_stp_sequence_cost
are 6 (2+4) vs. 8 (4+4).

Considering scalar_costs->m_stp_sequence_cost is 8 and below
checking and re-assigning:

  else if (m_stp_sequence_cost >= scalar_costs->m_stp_sequence_cost)
m_costs[vect_body] = 2 * scalar_costs->total_cost ();

For the former, the body cost of vector isn't changed; but
for the latter, the body cost of vector is double of scalar
cost which is 8 for this case, then it becomes 16 which is
bigger than what we expect.

I'm not sure why it adopts CEIL for the return value for the
unaligned_store case in function aarch64_stp_sequence_cost,
but I tried modifying it to "return count;" (which gets back
to the previous cost), and no failures were exposed in
regression testing.  I expect that if the previous
unaligned_store count is even, this adjustment doesn't change
anything; if it's odd, the adjustment may reduce it by one,
but I'd guess such cases are few.  Besides, as the comments
for m_stp_sequence_cost say, the current handling seems
temporary, so maybe a tweak like this can be accepted.  I
posted this RFC/PATCH to request comments on whether this
one-line change can be considered.

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_stp_sequence_cost): Return
count directly instead of the adjusted value computed with CEIL.
---
 gcc/config/aarch64/aarch64.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 37d414021ca..9fb4fbd883d 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -17051,7 +17051,7 @@ aarch64_stp_sequence_cost (unsigned int count, 
vect_cost_for_stmt kind,
  if (!aarch64_aligned_constant_offset_p (stmt_info, size))
return count * 2;
}
-  return CEIL (count, 2) * 2;
+  return count;
 
 case scalar_store:
   if (stmt_info && STMT_VINFO_DATA_REF (stmt_info))
-- 
2.31.1



[PATCH 01/10] vect: Ensure vect store is supported for some VMAT_ELEMENTWISE case

2023-09-13 Thread Kewen Lin via Gcc-patches
When making/testing patches to move costing next to the
transform code for vectorizable_store, some ICEs got
exposed when I further refined the costing handlings on
VMAT_ELEMENTWISE.  The apparent cause is triggering this
assertion in the rs6000-specific costing function
rs6000_builtin_vectorization_cost:

  if (TARGET_ALTIVEC)
    /* Misaligned stores are not supported.  */
    gcc_unreachable ();

I used vect_get_store_cost instead of the original costing via
record_stmt_cost with scalar_store, that is, to cost one
unaligned_store instead; this matches what we use in the
transform, which is a vector store as below:

  else if (group_size >= const_nunits
           && group_size % const_nunits == 0)
    {
      nstores = 1;
      lnel = const_nunits;
      ltype = vectype;
      lvectype = vectype;
    }

So IMHO it's more consistent to cost this as a vector store
rather than a scalar store.  With the given compilation option
-mno-allow-movmisalign, a misaligned vector store isn't
expected to be used by the vectorizer, so why is it still
adopted?  In the current implementation of function
get_group_load_store_type, we always set the alignment support
scheme to dr_unaligned_supported for VMAT_ELEMENTWISE; that is
fine if we always adopt scalar stores, but as the above code
shows, we can use vector stores for some cases, so we should
use the correct alignment support scheme for them.
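
A hypothetical loop of the affected shape (my own illustration,
not one of the test cases listed below): the store group has
size 4 with a step of 5 ints, so for a 4-element int vector the
group_size >= const_nunits path above is taken and a real vector
store is emitted per group, which must honor the target's
(mis)alignment support:

  void
  foo (int *restrict a, int *restrict b, int n)
  {
    for (int i = 0; i < n; i++)
      {
        a[5 * i + 0] = b[4 * i + 0];
        a[5 * i + 1] = b[4 * i + 1];
        a[5 * i + 2] = b[4 * i + 2];
        a[5 * i + 3] = b[4 * i + 3];
      }
  }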

This patch ensures the vector store is supported by further
checking with vect_supportable_dr_alignment.  The ICEs were
exposed by the patches moving costing next to the transform,
which haven't landed yet, so the test coverage will be there
once they land.  The affected test cases are:
  - gcc.dg/vect/slp-45.c
  - gcc.dg/vect/vect-alias-check-{10,11,12}.c

btw, I tried to make a correctness test case, but I realized
that -mno-allow-movmisalign mainly gates the movmisalign optab
and doesn't guard the actual hw vector memory access insns, so
I failed to construct one unless I also altered some of the
related conditions.

gcc/ChangeLog:

* tree-vect-stmts.cc (vectorizable_store): Ensure the generated
vector store for some case of VMAT_ELEMENTWISE is supported.
---
 gcc/tree-vect-stmts.cc | 16 
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index cd7c1090d88..a5caaf0bca2 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -8558,10 +8558,18 @@ vectorizable_store (vec_info *vinfo,
  else if (group_size >= const_nunits
   && group_size % const_nunits == 0)
{
- nstores = 1;
- lnel = const_nunits;
- ltype = vectype;
- lvectype = vectype;
+ int mis_align = dr_misalignment (first_dr_info, vectype);
+ dr_alignment_support dr_align
+   = vect_supportable_dr_alignment (vinfo, dr_info, vectype,
+mis_align);
+ if (dr_align == dr_aligned
+ || dr_align == dr_unaligned_supported)
+   {
+ nstores = 1;
+ lnel = const_nunits;
+ ltype = vectype;
+ lvectype = vectype;
+   }
}
  ltype = build_aligned_type (ltype, TYPE_ALIGN (elem_type));
  ncopies = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node);
-- 
2.31.1



[PATCH 05/10] vect: Adjust vectorizable_store costing on VMAT_ELEMENTWISE and VMAT_STRIDED_SLP

2023-09-13 Thread Kewen Lin via Gcc-patches
This patch adjusts the cost handling on VMAT_ELEMENTWISE
and VMAT_STRIDED_SLP in function vectorizable_store.  We
don't call function vect_model_store_cost for them any more.

Like what we improved for PR82255 on the load side, this change
helps us get rid of unnecessary vec_to_scalar costing for some
cases with VMAT_STRIDED_SLP.  One typical test case,
gcc.dg/vect/costmodel/ppc/costmodel-vect-store-1.c, is added.
It also helps some cases that previously had inconsistent
costing.

Besides, this also special-cases the interleaving stores for
these two affected memory access types: for interleaving
stores the whole chain is vectorized when the last store in
the chain is reached, and the other stores in the group are
skipped.  To stay consistent with this, and to follow the
transform handling of iterating the whole group, costing is
done only for the first store in the group.  Ideally we would
cost only the last one, but that's not trivial, and using the
first one is actually equivalent.
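
A hypothetical two-element store group with a gap (step of 3
ints), sketched by me for illustration: the transform emits
everything when a[3*i+1] (the last group member) is reached,
while costing now charges the whole group once at the first
member a[3*i].

  void
  bar (int *restrict a, int *restrict b, int n)
  {
    for (int i = 0; i < n; i++)
      {
        a[3 * i] = b[i];
        a[3 * i + 1] = b[i] + 7;
      }
  }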

gcc/ChangeLog:

* tree-vect-stmts.cc (vect_model_store_cost): Assert it won't get
VMAT_ELEMENTWISE and VMAT_STRIDED_SLP any more, and remove their
related handlings.
(vectorizable_store): Adjust the cost handling on VMAT_ELEMENTWISE
and VMAT_STRIDED_SLP without calling vect_model_store_cost.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/costmodel/ppc/costmodel-vect-store-1.c: New test.
---
 .../costmodel/ppc/costmodel-vect-store-1.c|  23 +++
 gcc/tree-vect-stmts.cc| 160 +++---
 2 files changed, 120 insertions(+), 63 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-store-1.c

diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-store-1.c 
b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-store-1.c
new file mode 100644
index 000..ab5f3301492
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-store-1.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-additional-options "-O3" }
+
+/* This test case is partially extracted from case
+   gcc.dg/vect/vect-avg-16.c, it's to verify we don't
+   cost a store with vec_to_scalar when we shouldn't.  */
+
+void
+test (signed char *restrict a, signed char *restrict b, signed char *restrict c,
+  int n)
+{
+  for (int j = 0; j < n; ++j)
+{
+  for (int i = 0; i < 16; ++i)
+   a[i] = (b[i] + c[i]) >> 1;
+  a += 20;
+  b += 20;
+  c += 20;
+}
+}
+
+/* { dg-final { scan-tree-dump-times "vec_to_scalar" 0 "vect" } } */
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 048c14d291c..3d01168080a 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -964,7 +964,9 @@ vect_model_store_cost (vec_info *vinfo, stmt_vec_info 
stmt_info, int ncopies,
   vec_load_store_type vls_type, slp_tree slp_node,
   stmt_vector_for_cost *cost_vec)
 {
-  gcc_assert (memory_access_type != VMAT_GATHER_SCATTER);
+  gcc_assert (memory_access_type != VMAT_GATHER_SCATTER
+ && memory_access_type != VMAT_ELEMENTWISE
+ && memory_access_type != VMAT_STRIDED_SLP);
   unsigned int inside_cost = 0, prologue_cost = 0;
   stmt_vec_info first_stmt_info = stmt_info;
   bool grouped_access_p = STMT_VINFO_GROUPED_ACCESS (stmt_info);
@@ -1010,29 +1012,9 @@ vect_model_store_cost (vec_info *vinfo, stmt_vec_info 
stmt_info, int ncopies,
  group_size);
 }
 
-  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
   /* Costs of the stores.  */
-  if (memory_access_type == VMAT_ELEMENTWISE)
-{
-  unsigned int assumed_nunits = vect_nunits_for_cost (vectype);
-  /* N scalar stores plus extracting the elements.  */
-  inside_cost += record_stmt_cost (cost_vec,
-  ncopies * assumed_nunits,
-  scalar_store, stmt_info, 0, vect_body);
-}
-  else
-vect_get_store_cost (vinfo, stmt_info, ncopies, alignment_support_scheme,
-misalignment, &inside_cost, cost_vec);
-
-  if (memory_access_type == VMAT_ELEMENTWISE
-  || memory_access_type == VMAT_STRIDED_SLP)
-{
-  /* N scalar stores plus extracting the elements.  */
-  unsigned int assumed_nunits = vect_nunits_for_cost (vectype);
-  inside_cost += record_stmt_cost (cost_vec,
-  ncopies * assumed_nunits,
-  vec_to_scalar, stmt_info, 0, vect_body);
-}
+  vect_get_store_cost (vinfo, stmt_info, ncopies, alignment_support_scheme,
+  misalignment, &inside_cost, cost_vec);
 
   /* When vectorizing a store into the function result assign
  a penalty if the function returns in a multi-register location.
@@ -8416,6 +8398,18 @@ vectorizable_store (vec_info *vinfo,
 "Vectorizing an unaligned 

[PATCH 06/10] vect: Adjust vectorizable_store costing on VMAT_LOAD_STORE_LANES

2023-09-13 Thread Kewen Lin via Gcc-patches
This patch adjusts the cost handling on VMAT_LOAD_STORE_LANES
in function vectorizable_store.  We don't call function
vect_model_store_cost for it any more.  Since this is the
interleaving-store case, costing skips all stmts except
first_stmt_info and considers the whole group when costing
first_stmt_info.  This patch shouldn't have any functional
changes.
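
A hypothetical three-way interleaved store group, sketched by me
for illustration: on targets with store-lanes support (e.g. st3
on aarch64) such a group is vectorized as VMAT_LOAD_STORE_LANES,
and the whole group is now costed at first_stmt_info.

  void
  rgb_pack (unsigned char *restrict out, unsigned char *restrict r,
            unsigned char *restrict g, unsigned char *restrict b, int n)
  {
    for (int i = 0; i < n; i++)
      {
        out[3 * i + 0] = r[i];
        out[3 * i + 1] = g[i];
        out[3 * i + 2] = b[i];
      }
  }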

gcc/ChangeLog:

* tree-vect-stmts.cc (vect_model_store_cost): Assert it will never
get VMAT_LOAD_STORE_LANES.
(vectorizable_store): Adjust the cost handling on VMAT_LOAD_STORE_LANES
without calling vect_model_store_cost.  Factor out new lambda function
update_prologue_cost.
---
 gcc/tree-vect-stmts.cc | 110 -
 1 file changed, 75 insertions(+), 35 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 3d01168080a..fbd16b8a487 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -966,7 +966,8 @@ vect_model_store_cost (vec_info *vinfo, stmt_vec_info 
stmt_info, int ncopies,
 {
   gcc_assert (memory_access_type != VMAT_GATHER_SCATTER
  && memory_access_type != VMAT_ELEMENTWISE
- && memory_access_type != VMAT_STRIDED_SLP);
+ && memory_access_type != VMAT_STRIDED_SLP
+ && memory_access_type != VMAT_LOAD_STORE_LANES);
   unsigned int inside_cost = 0, prologue_cost = 0;
   stmt_vec_info first_stmt_info = stmt_info;
   bool grouped_access_p = STMT_VINFO_GROUPED_ACCESS (stmt_info);
@@ -8408,7 +8409,8 @@ vectorizable_store (vec_info *vinfo,
   if (grouped_store
  && !slp
  && first_stmt_info != stmt_info
- && memory_access_type == VMAT_ELEMENTWISE)
+ && (memory_access_type == VMAT_ELEMENTWISE
+ || memory_access_type == VMAT_LOAD_STORE_LANES))
return true;
 }
   gcc_assert (memory_access_type == STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info));
@@ -8479,6 +8481,31 @@ vectorizable_store (vec_info *vinfo,
 dump_printf_loc (MSG_NOTE, vect_location, "transform store. ncopies = 
%d\n",
 ncopies);
 
+  /* Check if we need to update prologue cost for invariant,
+ and update it accordingly if so.  If it's not for
+ interleaving store, we can just check vls_type; but if
+ it's for interleaving store, need to check the def_type
+ of the stored value since the current vls_type is just
+ for first_stmt_info.  */
+  auto update_prologue_cost = [&](unsigned *prologue_cost, tree store_rhs)
+  {
+gcc_assert (costing_p);
+if (slp)
+  return;
+if (grouped_store)
+  {
+   gcc_assert (store_rhs);
+   enum vect_def_type cdt;
+   gcc_assert (vect_is_simple_use (store_rhs, vinfo, &cdt));
+   if (cdt != vect_constant_def && cdt != vect_external_def)
+ return;
+  }
+else if (vls_type != VLS_STORE_INVARIANT)
+  return;
+*prologue_cost += record_stmt_cost (cost_vec, 1, scalar_to_vec, stmt_info,
+   0, vect_prologue);
+  };
+
   if (memory_access_type == VMAT_ELEMENTWISE
   || memory_access_type == VMAT_STRIDED_SLP)
 {
@@ -8646,14 +8673,8 @@ vectorizable_store (vec_info *vinfo,
  if (!costing_p)
vect_get_vec_defs (vinfo, next_stmt_info, slp_node, ncopies, op,
   &vec_oprnds);
- else if (!slp)
-   {
- enum vect_def_type cdt;
- gcc_assert (vect_is_simple_use (op, vinfo, &cdt));
- if (cdt == vect_constant_def || cdt == vect_external_def)
-   prologue_cost += record_stmt_cost (cost_vec, 1, scalar_to_vec,
-  stmt_info, 0, vect_prologue);
-   }
+ else
+   update_prologue_cost (&prologue_cost, op);
  unsigned int group_el = 0;
  unsigned HOST_WIDE_INT
elsz = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (vectype)));
@@ -8857,13 +8878,7 @@ vectorizable_store (vec_info *vinfo,
   if (memory_access_type == VMAT_LOAD_STORE_LANES)
 {
   gcc_assert (!slp && grouped_store);
-  if (costing_p)
-   {
- vect_model_store_cost (vinfo, stmt_info, ncopies, memory_access_type,
-alignment_support_scheme, misalignment,
-vls_type, slp_node, cost_vec);
- return true;
-   }
+  unsigned inside_cost = 0, prologue_cost = 0;
   for (j = 0; j < ncopies; j++)
{
  gimple *new_stmt;
@@ -8879,29 +8894,39 @@ vectorizable_store (vec_info *vinfo,
 DR_GROUP_SIZE is the exact number of stmts in the
 chain. Therefore, NEXT_STMT_INFO can't be NULL_TREE.  */
  op = vect_get_store_rhs (next_stmt_info);
- vect_get_vec_defs_for_operand (vinfo, next_stmt_info, ncopies,
-op, gvec_oprnds[i]);
- vec_oprnd = (*gvec_oprnds[i])[0];
- 

[PATCH 04/10] vect: Simplify costing on vectorizable_scan_store

2023-09-13 Thread Kewen Lin via Gcc-patches
This patch simplifies the costing for the vectorizable_scan_store
case so that it no longer calls function vect_model_store_cost.

I considered whether moving the costing into function
vectorizable_scan_store would be a good idea; to do that, we
would have to pass several variables down which are only used
for costing.  For now we just want to keep the costing as
before and haven't tried to make it consistent with what the
transform does, so I think we can leave it for now.
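
For context, the kind of loop that reaches vectorizable_scan_store
is an OpenMP inclusive-scan simd loop like the sketch below (my
own example, assuming -fopenmp-simd); the store of the scan
result b[i] is the part whose costing is adjusted here.

  int r;

  void
  foo (int *a, int *b, int n)
  {
  #pragma omp simd reduction (inscan, +:r)
    for (int i = 0; i < n; i++)
      {
        r += a[i];
  #pragma omp scan inclusive (r)
        b[i] = r;
      }
  }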

gcc/ChangeLog:

* tree-vect-stmts.cc (vectorizable_store): Adjust costing on
vectorizable_scan_store without calling vect_model_store_cost
any more.
---
 gcc/tree-vect-stmts.cc | 18 +++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 3f908242fee..048c14d291c 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -8432,11 +8432,23 @@ vectorizable_store (vec_info *vinfo,
   else if (STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info) >= 3)
 {
   gcc_assert (memory_access_type == VMAT_CONTIGUOUS);
+  gcc_assert (!slp);
   if (costing_p)
{
- vect_model_store_cost (vinfo, stmt_info, ncopies, memory_access_type,
-alignment_support_scheme, misalignment,
-vls_type, slp_node, cost_vec);
+ unsigned int inside_cost = 0, prologue_cost = 0;
+ if (vls_type == VLS_STORE_INVARIANT)
+   prologue_cost += record_stmt_cost (cost_vec, 1, scalar_to_vec,
+  stmt_info, 0, vect_prologue);
+ vect_get_store_cost (vinfo, stmt_info, ncopies,
+  alignment_support_scheme, misalignment,
+  &inside_cost, cost_vec);
+
+ if (dump_enabled_p ())
+   dump_printf_loc (MSG_NOTE, vect_location,
+"vect_model_store_cost: inside_cost = %d, "
+"prologue_cost = %d .\n",
+inside_cost, prologue_cost);
+
  return true;
}
   return vectorizable_scan_store (vinfo, stmt_info, gsi, vec_stmt, 
ncopies);
-- 
2.31.1



[PATCH 00/10] vect: Move costing next to the transform for vect store

2023-09-13 Thread Kewen Lin via Gcc-patches
This patch series is a follow-up to the previous patch
series for vector loads [1].  Some of the associated test cases
show the benefits of this kind of structuring.  Like the
one on vector loads, this patch series makes costing no longer
call function vect_model_store_cost but instead happen next to
the transform.

Most of the patches are organized according to the memory
access types of vector stores, which I hope makes review and
bisection easy.  The changes first follow the handlings in
the function vect_model_store_cost to avoid missing anything,
then refine them a bit by referring to the transform code; I
also checked them with some typical test cases to verify.

The whole series can be bootstrapped and regtested
incrementally on:
  - x86_64-redhat-linux
  - aarch64-linux-gnu
  - powerpc64-linux-gnu P7, P8 and P9
  - powerpc64le-linux-gnu P8, P9 and P10

Considering that the current vector test buckets are mainly
tested without the cost model, I also verified the whole patch
series was neutral for SPEC2017 int/fp on Power9 at O2,
O3 and Ofast separately.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-June/621460.html

Kewen Lin (10):
  vect: Ensure vect store is supported for some VMAT_ELEMENTWISE case
  vect: Move vect_model_store_cost next to the transform in vectorizable_store
  vect: Adjust vectorizable_store costing on VMAT_GATHER_SCATTER
  vect: Simplify costing on vectorizable_scan_store
  vect: Adjust vectorizable_store costing on VMAT_ELEMENTWISE and 
VMAT_STRIDED_SLP
  vect: Adjust vectorizable_store costing on VMAT_LOAD_STORE_LANES
  vect: Adjust vectorizable_store costing on VMAT_CONTIGUOUS_PERMUTE
  aarch64: Don't use CEIL for vector_store in aarch64_stp_sequence_cost
  vect: Get rid of vect_model_store_cost
  vect: Consider vec_perm costing for VMAT_CONTIGUOUS_REVERSE

 gcc/config/aarch64/aarch64.cc |   2 +-
 .../costmodel/ppc/costmodel-vect-store-1.c|  23 +
 .../costmodel/ppc/costmodel-vect-store-2.c|  29 +
 gcc/tree-vect-stmts.cc| 717 +++---
 4 files changed, 493 insertions(+), 278 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-store-1.c
 create mode 100644 
gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-store-2.c

-- 
2.31.1



[PATCH 02/10] vect: Move vect_model_store_cost next to the transform in vectorizable_store

2023-09-13 Thread Kewen Lin via Gcc-patches
This is an initial patch to move costing next to the
transform; it still adopts vect_model_store_cost for costing
but moves and duplicates the call down according to the
handlings of the different vect_memory_access_types or other
special handling needs, which I hope makes the subsequent
patches easy to review.  This patch should not have any
functional changes.

gcc/ChangeLog:

* tree-vect-stmts.cc (vectorizable_store): Move and duplicate the call
to vect_model_store_cost down to some different transform paths
according to the handlings of different vect_memory_access_types
or some special handling need.
---
 gcc/tree-vect-stmts.cc | 79 --
 1 file changed, 60 insertions(+), 19 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index a5caaf0bca2..36f7c5b9f4b 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -8372,7 +8372,8 @@ vectorizable_store (vec_info *vinfo,
return false;
 }
 
-  if (!vec_stmt) /* transformation not required.  */
+  bool costing_p = !vec_stmt;
+  if (costing_p) /* transformation not required.  */
 {
   STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info) = memory_access_type;
 
@@ -8401,11 +8402,6 @@ vectorizable_store (vec_info *vinfo,
 "Vectorizing an unaligned access.\n");
 
   STMT_VINFO_TYPE (stmt_info) = store_vec_info_type;
-  vect_model_store_cost (vinfo, stmt_info, ncopies,
-memory_access_type, &gs_info,
-alignment_support_scheme,
-misalignment, vls_type, slp_node, cost_vec);
-  return true;
 }
   gcc_assert (memory_access_type == STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info));
 
@@ -8415,12 +8411,27 @@ vectorizable_store (vec_info *vinfo,
 
   if (memory_access_type == VMAT_GATHER_SCATTER && gs_info.decl)
 {
-  vect_build_scatter_store_calls (vinfo, stmt_info, gsi, vec_stmt,
- &gs_info, mask);
+  if (costing_p)
+   vect_model_store_cost (vinfo, stmt_info, ncopies, memory_access_type,
+  &gs_info, alignment_support_scheme, misalignment,
+  vls_type, slp_node, cost_vec);
+  else
+   vect_build_scatter_store_calls (vinfo, stmt_info, gsi, vec_stmt,
+   &gs_info, mask);
   return true;
 }
   else if (STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info) >= 3)
-return vectorizable_scan_store (vinfo, stmt_info, gsi, vec_stmt, ncopies);
+{
+  gcc_assert (memory_access_type == VMAT_CONTIGUOUS);
+  if (costing_p)
+   {
+ vect_model_store_cost (vinfo, stmt_info, ncopies, memory_access_type,
+&gs_info, alignment_support_scheme,
+misalignment, vls_type, slp_node, cost_vec);
+ return true;
+   }
+  return vectorizable_scan_store (vinfo, stmt_info, gsi, vec_stmt, 
ncopies);
+}
 
   if (grouped_store)
 {
@@ -8449,13 +8460,21 @@ vectorizable_store (vec_info *vinfo,
   else
 ref_type = reference_alias_ptr_type (DR_REF (first_dr_info->dr));
 
-  if (dump_enabled_p ())
-dump_printf_loc (MSG_NOTE, vect_location,
- "transform store. ncopies = %d\n", ncopies);
+  if (!costing_p && dump_enabled_p ())
+dump_printf_loc (MSG_NOTE, vect_location, "transform store. ncopies = 
%d\n",
+ncopies);
 
   if (memory_access_type == VMAT_ELEMENTWISE
   || memory_access_type == VMAT_STRIDED_SLP)
 {
+  if (costing_p)
+   {
+ vect_model_store_cost (vinfo, stmt_info, ncopies, memory_access_type,
+&gs_info, alignment_support_scheme,
+misalignment, vls_type, slp_node, cost_vec);
+ return true;
+   }
+
   gimple_stmt_iterator incr_gsi;
   bool insert_after;
   gimple *incr;
@@ -8718,8 +8737,9 @@ vectorizable_store (vec_info *vinfo,
   else if (memory_access_type == VMAT_GATHER_SCATTER)
 {
   aggr_type = elem_type;
-  vect_get_strided_load_store_ops (stmt_info, loop_vinfo, gsi, &gs_info,
-  &bump, &vec_offset, loop_lens);
+  if (!costing_p)
+   vect_get_strided_load_store_ops (stmt_info, loop_vinfo, gsi, &gs_info,
+&bump, &vec_offset, loop_lens);
 }
   else
 {
@@ -8731,7 +8751,7 @@ vectorizable_store (vec_info *vinfo,
  memory_access_type, loop_lens);
 }
 
-  if (mask)
+  if (mask && !costing_p)
 LOOP_VINFO_HAS_MASK_STORE (loop_vinfo) = true;
 
   /* In case the vectorization factor (VF) is bigger than the number
@@ -8782,6 +8802,13 @@ vectorizable_store (vec_info *vinfo,
   if (memory_access_type == VMAT_LOAD_STORE_LANES)
 {
   gcc_assert (!slp && grouped_store);
+  if (costing_p)
+   {
+ vect_model_store_cost (vinfo, stmt_info, ncopies, 

[PATCH 0/9] vect: Move costing next to the transform for vect load

2023-06-12 Thread Kewen Lin via Gcc-patches
This patch series follows Richi's suggestion at the link [1],
which suggest structuring vectorizable_load to make costing
next to the transform, in order to make it easier to keep
costing and the transform in sync.  For now, it's a known
issue that what we cost can be inconsistent with what we
transform, as the case in PR82255 and some other associated
test cases in the patches of this series show.

Basically this patch series makes costing no longer call
function vect_model_load_cost.  To make the review and
bisection easy, I organized the changes according to the
memory access types of vector loads.  For each memory access
type, the changes first follow the handlings in the function
vect_model_load_cost to avoid missing anything, then refine
further by referring to the transform code; I also checked
them with some typical test cases to verify.  Hope the
subjects of the patches are clear enough.

The whole series can be bootstrapped and regtested
incrementally on:
  - x86_64-redhat-linux
  - aarch64-linux-gnu
  - powerpc64-linux-gnu P7, P8 and P9
  - powerpc64le-linux-gnu P8, P9 and P10

Considering that the current vector test buckets are mainly
tested without the cost model, I also verified the whole patch
series was neutral for SPEC2017 int/fp on Power9 at O2,
O3 and Ofast separately.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563624.html

Kewen Lin (9):
  vect: Move vect_model_load_cost next to the transform in vectorizable_load
  vect: Adjust vectorizable_load costing on VMAT_GATHER_SCATTER && gs_info.decl
  vect: Adjust vectorizable_load costing on VMAT_INVARIANT
  vect: Adjust vectorizable_load costing on VMAT_ELEMENTWISE and 
VMAT_STRIDED_SLP
  vect: Adjust vectorizable_load costing on VMAT_GATHER_SCATTER
  vect: Adjust vectorizable_load costing on VMAT_LOAD_STORE_LANES
  vect: Adjust vectorizable_load costing on VMAT_CONTIGUOUS_REVERSE
  vect: Adjust vectorizable_load costing on VMAT_CONTIGUOUS_PERMUTE
  vect: Adjust vectorizable_load costing on VMAT_CONTIGUOUS

 .../vect/costmodel/ppc/costmodel-pr82255.c|  31 +
 .../costmodel/ppc/costmodel-vect-reversed.c   |  22 +
 gcc/testsuite/gcc.target/i386/pr70021.c   |   2 +-
 gcc/tree-vect-stmts.cc| 651 ++
 4 files changed, 432 insertions(+), 274 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-pr82255.c
 create mode 100644 
gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-reversed.c

-- 
2.31.1



[PATCH 7/9] vect: Adjust vectorizable_load costing on VMAT_CONTIGUOUS_REVERSE

2023-06-12 Thread Kewen Lin via Gcc-patches
This patch adjusts the cost handling on
VMAT_CONTIGUOUS_REVERSE in function vectorizable_load.  We
don't call function vect_model_load_cost for it any more.

This change stops us from miscounting some required vector
permutations, as the associated test case shows.

gcc/ChangeLog:

* tree-vect-stmts.cc (vect_model_load_cost): Assert it won't get
VMAT_CONTIGUOUS_REVERSE any more.
(vectorizable_load): Adjust the costing handling on
VMAT_CONTIGUOUS_REVERSE without calling vect_model_load_cost.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/costmodel/ppc/costmodel-vect-reversed.c: New test.
---
 .../costmodel/ppc/costmodel-vect-reversed.c   |  22 
 gcc/tree-vect-stmts.cc| 109 --
 2 files changed, 93 insertions(+), 38 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-reversed.c

diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-reversed.c 
b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-reversed.c
new file mode 100644
index 000..651274be038
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-reversed.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-additional-options "-mvsx" } */
+
+/* Verify we do cost the required vec_perm.  */
+
+int x[1024], y[1024];
+
+void
+foo ()
+{
+  for (int i = 0; i < 512; ++i)
+{
+  x[2 * i] = y[1023 - (2 * i)];
+  x[2 * i + 1] = y[1023 - (2 * i + 1)];
+}
+}
+/* The reason why it doesn't check the exact count is that
+   retrying for the epilogue with partial vector capability
+   like Power10 can result in more than 1 vec_perm.  */
+/* { dg-final { scan-tree-dump {\mvec_perm\M} "vect" } } */
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 4c5ce2ab278..7f8d9db5363 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -1134,11 +1134,8 @@ vect_model_load_cost (vec_info *vinfo,
  slp_tree slp_node,
  stmt_vector_for_cost *cost_vec)
 {
-  gcc_assert (memory_access_type != VMAT_GATHER_SCATTER
- && memory_access_type != VMAT_INVARIANT
- && memory_access_type != VMAT_ELEMENTWISE
- && memory_access_type != VMAT_STRIDED_SLP
- && memory_access_type != VMAT_LOAD_STORE_LANES);
+  gcc_assert (memory_access_type == VMAT_CONTIGUOUS
+ || memory_access_type == VMAT_CONTIGUOUS_PERMUTE);
 
   unsigned int inside_cost = 0, prologue_cost = 0;
   bool grouped_access_p = STMT_VINFO_GROUPED_ACCESS (stmt_info);
@@ -10292,7 +10289,7 @@ vectorizable_load (vec_info *vinfo,
  = record_stmt_cost (cost_vec, cnunits,
  scalar_load, stmt_info, 0,
  vect_body);
-   goto vec_num_loop_costing_end;
+   break;
  }
if (STMT_VINFO_GATHER_SCATTER_P (stmt_info))
  vec_offset = vec_offsets[vec_num * j + i];
@@ -10335,7 +10332,7 @@ vectorizable_load (vec_info *vinfo,
inside_cost
  = record_stmt_cost (cost_vec, 1, vec_construct,
  stmt_info, 0, vect_body);
-   goto vec_num_loop_costing_end;
+   break;
  }
unsigned HOST_WIDE_INT const_offset_nunits
  = TYPE_VECTOR_SUBPARTS (gs_info.offset_vectype)
@@ -10390,7 +10387,7 @@ vectorizable_load (vec_info *vinfo,
  }
 
if (costing_p)
- goto vec_num_loop_costing_end;
+ break;
 
align =
  known_alignment (DR_TARGET_ALIGNMENT (first_dr_info));
@@ -10563,7 +10560,7 @@ vectorizable_load (vec_info *vinfo,
case dr_explicit_realign:
  {
if (costing_p)
- goto vec_num_loop_costing_end;
+ break;
tree ptr, bump;
 
tree vs = size_int (TYPE_VECTOR_SUBPARTS (vectype));
@@ -10627,7 +10624,7 @@ vectorizable_load (vec_info *vinfo,
case dr_explicit_realign_optimized:
  {
if (costing_p)
- goto vec_num_loop_costing_end;
+ break;
if (TREE_CODE (dataref_ptr) == SSA_NAME)
  new_temp = copy_ssa_name (dataref_ptr);
else
@@ -10650,22 +10647,37 @@ vectorizable_load (vec_info *vinfo,
default:
  gcc_unreachable ();
}
- vec_dest = vect_create_destination_var (scalar_dest, vectype);
-  

[PATCH 5/9] vect: Adjust vectorizable_load costing on VMAT_GATHER_SCATTER

2023-06-12 Thread Kewen Lin via Gcc-patches
This patch adjusts the cost handling on VMAT_GATHER_SCATTER
in function vectorizable_load.  We don't call function
vect_model_load_cost for it any more.

It mainly handles gather loads with IFN or emulated gather
loads, and it follows the handlings in function
vect_model_load_cost.  This patch shouldn't have any
functional changes.
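
For reference, a minimal gather-load loop of the kind this
covers (an illustrative sketch of mine, not a test case from
this patch):

  void
  gather (double *restrict a, int *restrict idx, double *restrict b, int n)
  {
    for (int i = 0; i < n; i++)
      a[i] = b[idx[i]];   /* load through a vector of indices */
  }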

gcc/ChangeLog:

* tree-vect-stmts.cc (vectorizable_load): Adjust the cost handling on
VMAT_GATHER_SCATTER without calling vect_model_load_cost.
(vect_model_load_cost): Adjust the assertion on VMAT_GATHER_SCATTER,
remove VMAT_GATHER_SCATTER related handlings and the related parameter
gs_info.
---
 gcc/tree-vect-stmts.cc | 123 +
 1 file changed, 75 insertions(+), 48 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 651dc800380..a3fd0bf879e 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -1131,11 +1131,10 @@ vect_model_load_cost (vec_info *vinfo,
  vect_memory_access_type memory_access_type,
  dr_alignment_support alignment_support_scheme,
  int misalignment,
- gather_scatter_info *gs_info,
  slp_tree slp_node,
  stmt_vector_for_cost *cost_vec)
 {
-  gcc_assert ((memory_access_type != VMAT_GATHER_SCATTER || !gs_info->decl)
+  gcc_assert (memory_access_type != VMAT_GATHER_SCATTER
  && memory_access_type != VMAT_INVARIANT
  && memory_access_type != VMAT_ELEMENTWISE
  && memory_access_type != VMAT_STRIDED_SLP);
@@ -1222,35 +1221,9 @@ vect_model_load_cost (vec_info *vinfo,
  group_size);
 }
 
-  /* The loads themselves.  */
-  if (memory_access_type == VMAT_GATHER_SCATTER)
-{
-  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
-  unsigned int assumed_nunits = vect_nunits_for_cost (vectype);
-  if (memory_access_type == VMAT_GATHER_SCATTER
- && gs_info->ifn == IFN_LAST && !gs_info->decl)
-   /* For emulated gathers N offset vector element extracts
-  (we assume the scalar scaling and ptr + offset add is consumed by
-  the load).  */
-   inside_cost += record_stmt_cost (cost_vec, ncopies * assumed_nunits,
-vec_to_scalar, stmt_info, 0,
-vect_body);
-  /* N scalar loads plus gathering them into a vector.  */
-  inside_cost += record_stmt_cost (cost_vec,
-  ncopies * assumed_nunits,
-  scalar_load, stmt_info, 0, vect_body);
-}
-  else
-vect_get_load_cost (vinfo, stmt_info, ncopies,
-   alignment_support_scheme, misalignment, first_stmt_p,
-   &inside_cost, &prologue_cost,
-   cost_vec, cost_vec, true);
-
-  if (memory_access_type == VMAT_GATHER_SCATTER
-  && gs_info->ifn == IFN_LAST
-  && !gs_info->decl)
-inside_cost += record_stmt_cost (cost_vec, ncopies, vec_construct,
-stmt_info, 0, vect_body);
+  vect_get_load_cost (vinfo, stmt_info, ncopies, alignment_support_scheme,
+ misalignment, first_stmt_p, _cost, _cost,
+ cost_vec, cost_vec, true);
 
   if (dump_enabled_p ())
 dump_printf_loc (MSG_NOTE, vect_location,
@@ -10137,6 +10110,7 @@ vectorizable_load (vec_info *vinfo,
 }
   tree vec_mask = NULL_TREE;
   poly_uint64 group_elt = 0;
+  unsigned int inside_cost = 0;
   for (j = 0; j < ncopies; j++)
 {
   /* 1. Create the vector or array pointer update chain.  */
@@ -10268,23 +10242,25 @@ vectorizable_load (vec_info *vinfo,
  /* Record that VEC_ARRAY is now dead.  */
  vect_clobber_variable (vinfo, stmt_info, gsi, vec_array);
}
-  else if (!costing_p)
+  else
{
  for (i = 0; i < vec_num; i++)
{
  tree final_mask = NULL_TREE;
- if (loop_masks
- && memory_access_type != VMAT_INVARIANT)
-   final_mask = vect_get_loop_mask (gsi, loop_masks,
-vec_num * ncopies,
-vectype, vec_num * j + i);
- if (vec_mask)
-   final_mask = prepare_vec_mask (loop_vinfo, mask_vectype,
-  final_mask, vec_mask, gsi);
-
- if (i > 0 && !STMT_VINFO_GATHER_SCATTER_P (stmt_info))
-   dataref_ptr = bump_vector_ptr (vinfo, dataref_ptr, ptr_incr,
-  gsi, stmt_info, bump);
+ if (!costing_p)
+   {
+ if (loop_masks && memory_access_type != VMAT_INVARIANT)
+   final_mask
+ = vect_get_loop_mask (gsi, loop_masks, vec_num * ncopies,
+ 

[PATCH 4/9] vect: Adjust vectorizable_load costing on VMAT_ELEMENTWISE and VMAT_STRIDED_SLP

2023-06-12 Thread Kewen Lin via Gcc-patches
This patch adjusts the cost handling on VMAT_ELEMENTWISE
and VMAT_STRIDED_SLP in function vectorizable_load.  We
don't call function vect_model_load_cost for them any more.

As PR82255 shows, we don't always need a vector construction
there; moving costing next to the transform lets us cost the
vector construction only when it's actually needed.
Besides, it can count the number of loads consistently for
some cases.

 PR tree-optimization/82255

gcc/ChangeLog:

* tree-vect-stmts.cc (vectorizable_load): Adjust the cost handling
on VMAT_ELEMENTWISE and VMAT_STRIDED_SLP without calling
vect_model_load_cost.
(vect_model_load_cost): Assert it won't get VMAT_ELEMENTWISE and
VMAT_STRIDED_SLP any more, and remove their related handlings.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/costmodel/ppc/costmodel-pr82255.c: New test.

2023-06-13  Bill Schmidt  
Kewen Lin  
---
 .../vect/costmodel/ppc/costmodel-pr82255.c|  31 
 gcc/tree-vect-stmts.cc| 170 +++---
 2 files changed, 134 insertions(+), 67 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-pr82255.c

diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-pr82255.c 
b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-pr82255.c
new file mode 100644
index 000..9317ee2e15b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-pr82255.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_int } */
+
+/* PR82255: Ensure we don't require a vec_construct cost when we aren't
+   going to generate a strided load.  */
+
+extern int abs (int __x) __attribute__ ((__nothrow__, __leaf__))
+__attribute__ ((__const__));
+
+static int
+foo (unsigned char *w, int i, unsigned char *x, int j)
+{
+  int tot = 0;
+  for (int a = 0; a < 16; a++)
+{
+#pragma GCC unroll 16
+  for (int b = 0; b < 16; b++)
+   tot += abs (w[b] - x[b]);
+  w += i;
+  x += j;
+}
+  return tot;
+}
+
+void
+bar (unsigned char *w, unsigned char *x, int i, int *result)
+{
+  *result = foo (w, 16, x, i);
+}
+
+/* { dg-final { scan-tree-dump-times "vec_construct" 0 "vect" } } */
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 19c61d703c8..651dc800380 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -1136,7 +1136,9 @@ vect_model_load_cost (vec_info *vinfo,
  stmt_vector_for_cost *cost_vec)
 {
   gcc_assert ((memory_access_type != VMAT_GATHER_SCATTER || !gs_info->decl)
- && memory_access_type != VMAT_INVARIANT);
+ && memory_access_type != VMAT_INVARIANT
+ && memory_access_type != VMAT_ELEMENTWISE
+ && memory_access_type != VMAT_STRIDED_SLP);
 
   unsigned int inside_cost = 0, prologue_cost = 0;
   bool grouped_access_p = STMT_VINFO_GROUPED_ACCESS (stmt_info);
@@ -1221,8 +1223,7 @@ vect_model_load_cost (vec_info *vinfo,
 }
 
   /* The loads themselves.  */
-  if (memory_access_type == VMAT_ELEMENTWISE
-  || memory_access_type == VMAT_GATHER_SCATTER)
+  if (memory_access_type == VMAT_GATHER_SCATTER)
 {
   tree vectype = STMT_VINFO_VECTYPE (stmt_info);
   unsigned int assumed_nunits = vect_nunits_for_cost (vectype);
@@ -1244,10 +1245,10 @@ vect_model_load_cost (vec_info *vinfo,
alignment_support_scheme, misalignment, first_stmt_p,
 &inside_cost, &prologue_cost,
cost_vec, cost_vec, true);
-  if (memory_access_type == VMAT_ELEMENTWISE
-  || memory_access_type == VMAT_STRIDED_SLP
-  || (memory_access_type == VMAT_GATHER_SCATTER
- && gs_info->ifn == IFN_LAST && !gs_info->decl))
+
+  if (memory_access_type == VMAT_GATHER_SCATTER
+  && gs_info->ifn == IFN_LAST
+  && !gs_info->decl)
 inside_cost += record_stmt_cost (cost_vec, ncopies, vec_construct,
 stmt_info, 0, vect_body);
 
@@ -9591,14 +9592,6 @@ vectorizable_load (vec_info *vinfo,
   if (memory_access_type == VMAT_ELEMENTWISE
   || memory_access_type == VMAT_STRIDED_SLP)
 {
-  if (costing_p)
-   {
- vect_model_load_cost (vinfo, stmt_info, ncopies, vf,
-   memory_access_type, alignment_support_scheme,
-   misalignment, _info, slp_node, cost_vec);
- return true;
-   }
-
   gimple_stmt_iterator incr_gsi;
   bool insert_after;
   tree offvar;
@@ -9610,6 +9603,7 @@ vectorizable_load (vec_info *vinfo,
   unsigned int const_nunits = nunits.to_constant ();
   unsigned HOST_WIDE_INT cst_offset = 0;
   tree dr_offset;
+  unsigned int inside_cost = 0;
 
   gcc_assert (!LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo));
   gcc_assert (!nested_in_vect_loop);
@@ -9624,6 +9618,7 @@ vectorizable_load (vec_info *vinfo,
  first_stmt_info = stmt_info;
  first_dr_info = dr_info;
  

[PATCH 8/9] vect: Adjust vectorizable_load costing on VMAT_CONTIGUOUS_PERMUTE

2023-06-12 Thread Kewen Lin via Gcc-patches
This patch adjusts the cost handling on
VMAT_CONTIGUOUS_PERMUTE in function vectorizable_load.  We
don't call function vect_model_load_cost for it any more.

As the affected test case gcc.target/i386/pr70021.c shows,
the previous costing can under-cost the total number of
generated vector loads, because for VMAT_CONTIGUOUS_PERMUTE
function vect_model_load_cost doesn't consider the group size,
which is used as vec_num during the transformation.

This patch makes the count of vector loads in costing
consistent with what we generate during the transformation.
To be more specific, for the given test case, the memory
access b[i_20] was previously costed as 2 vector loads;
with this patch it costs 8 instead, which matches the final
count of vector loads generated from b.  This costing change
makes the cost model analysis conclude that it's not profitable
to vectorize the first loop, so this patch adjusts the test
case to not use the vect cost model any more.

But note that this test case also exposes something we can
improve further: although the numbers of vector permutations
we cost and generate are consistent, DCE can later optimize
some unused permutations out.  It would be good if we could
predict that and generate only the necessary permutations.
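
A rough sketch of the costing shape this moves to (simplified; the
actual code in vectorizable_load additionally restricts this to the
first statement of a grouped load and counts the realign prologue
cost only once per group):

  /* Cost one vector load per vector statement the transform will
     emit, i.e. vec_num loads per copy, instead of the single
     group-based estimate vect_model_load_cost used to make.  */
  unsigned int inside_cost = 0, prologue_cost = 0;
  for (int j = 0; j < ncopies; j++)
    for (int i = 0; i < vec_num; i++)
      vect_get_load_cost (vinfo, stmt_info, 1, alignment_support_scheme,
                          misalignment, false, &inside_cost,
                          &prologue_cost, cost_vec, cost_vec, true);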

gcc/ChangeLog:

* tree-vect-stmts.cc (vect_model_load_cost): Assert this function only
handles memory_access_type VMAT_CONTIGUOUS, remove some
VMAT_CONTIGUOUS_PERMUTE related handlings.
(vectorizable_load): Adjust the cost handling on VMAT_CONTIGUOUS_PERMUTE
without calling vect_model_load_cost.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr70021.c: Adjust with -fno-vect-cost-model.
---
 gcc/testsuite/gcc.target/i386/pr70021.c |  2 +-
 gcc/tree-vect-stmts.cc  | 88 ++---
 2 files changed, 51 insertions(+), 39 deletions(-)

diff --git a/gcc/testsuite/gcc.target/i386/pr70021.c 
b/gcc/testsuite/gcc.target/i386/pr70021.c
index 6562c0f2bd0..d509583601e 100644
--- a/gcc/testsuite/gcc.target/i386/pr70021.c
+++ b/gcc/testsuite/gcc.target/i386/pr70021.c
@@ -1,7 +1,7 @@
 /* PR target/70021 */
 /* { dg-do run } */
 /* { dg-require-effective-target avx2 } */
-/* { dg-options "-O2 -ftree-vectorize -mavx2 -fdump-tree-vect-details 
-mtune=skylake" } */
+/* { dg-options "-O2 -ftree-vectorize -mavx2 -fdump-tree-vect-details 
-mtune=skylake -fno-vect-cost-model" } */
 
 #include "avx2-check.h"
 
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 7f8d9db5363..e7a97dbe05d 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -1134,8 +1134,7 @@ vect_model_load_cost (vec_info *vinfo,
  slp_tree slp_node,
  stmt_vector_for_cost *cost_vec)
 {
-  gcc_assert (memory_access_type == VMAT_CONTIGUOUS
- || memory_access_type == VMAT_CONTIGUOUS_PERMUTE);
+  gcc_assert (memory_access_type == VMAT_CONTIGUOUS);
 
   unsigned int inside_cost = 0, prologue_cost = 0;
   bool grouped_access_p = STMT_VINFO_GROUPED_ACCESS (stmt_info);
@@ -1174,26 +1173,6 @@ vect_model_load_cost (vec_info *vinfo,
  once per group anyhow.  */
   bool first_stmt_p = (first_stmt_info == stmt_info);
 
-  /* We assume that the cost of a single load-lanes instruction is
- equivalent to the cost of DR_GROUP_SIZE separate loads.  If a grouped
- access is instead being provided by a load-and-permute operation,
- include the cost of the permutes.  */
-  if (first_stmt_p
-  && memory_access_type == VMAT_CONTIGUOUS_PERMUTE)
-{
-  /* Uses an even and odd extract operations or shuffle operations
-for each needed permute.  */
-  int group_size = DR_GROUP_SIZE (first_stmt_info);
-  int nstmts = ncopies * ceil_log2 (group_size) * group_size;
-  inside_cost += record_stmt_cost (cost_vec, nstmts, vec_perm,
-  stmt_info, 0, vect_body);
-
-  if (dump_enabled_p ())
-dump_printf_loc (MSG_NOTE, vect_location,
- "vect_model_load_cost: strided group_size = %d .\n",
- group_size);
-}
-
   vect_get_load_cost (vinfo, stmt_info, ncopies, alignment_support_scheme,
  misalignment, first_stmt_p, _cost, _cost,
  cost_vec, cost_vec, true);
@@ -10652,11 +10631,22 @@ vectorizable_load (vec_info *vinfo,
 alignment support schemes.  */
  if (costing_p)
{
- if (memory_access_type == VMAT_CONTIGUOUS_REVERSE)
+ /* For VMAT_CONTIGUOUS_PERMUTE if it's grouped load, we
+only need to take care of the first stmt, whose
+stmt_info is first_stmt_info, vec_num iterating on it
+will cover the cost for the remaining, it's consistent
+with transforming.  For the prologue cost for realign,
+we only need to count it once for the whole group.  */
+   

[PATCH 9/9] vect: Adjust vectorizable_load costing on VMAT_CONTIGUOUS

2023-06-12 Thread Kewen Lin via Gcc-patches
This patch adjusts the cost handling on VMAT_CONTIGUOUS in
function vectorizable_load.  We don't call function
vect_model_load_cost for it any more.  It also removes
function vect_model_load_cost, which has become useless and
unreachable now.

gcc/ChangeLog:

* tree-vect-stmts.cc (vect_model_load_cost): Remove.
(vectorizable_load): Adjust the cost handling on VMAT_CONTIGUOUS without
calling vect_model_load_cost.
---
 gcc/tree-vect-stmts.cc | 92 +-
 1 file changed, 9 insertions(+), 83 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index e7a97dbe05d..be3b277e8e1 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -1117,73 +1117,6 @@ vect_get_store_cost (vec_info *, stmt_vec_info 
stmt_info, int ncopies,
 }
 }
 
-
-/* Function vect_model_load_cost
-
-   Models cost for loads.  In the case of grouped accesses, one access has
-   the overhead of the grouped access attributed to it.  Since unaligned
-   accesses are supported for loads, we also account for the costs of the
-   access scheme chosen.  */
-
-static void
-vect_model_load_cost (vec_info *vinfo,
- stmt_vec_info stmt_info, unsigned ncopies, poly_uint64 vf,
- vect_memory_access_type memory_access_type,
- dr_alignment_support alignment_support_scheme,
- int misalignment,
- slp_tree slp_node,
- stmt_vector_for_cost *cost_vec)
-{
-  gcc_assert (memory_access_type == VMAT_CONTIGUOUS);
-
-  unsigned int inside_cost = 0, prologue_cost = 0;
-  bool grouped_access_p = STMT_VINFO_GROUPED_ACCESS (stmt_info);
-
-  gcc_assert (cost_vec);
-
-  /* ???  Somehow we need to fix this at the callers.  */
-  if (slp_node)
-ncopies = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node);
-
-  if (slp_node && SLP_TREE_LOAD_PERMUTATION (slp_node).exists ())
-{
-  /* If the load is permuted then the alignment is determined by
-the first group element not by the first scalar stmt DR.  */
-  stmt_vec_info first_stmt_info = DR_GROUP_FIRST_ELEMENT (stmt_info);
-  /* Record the cost for the permutation.  */
-  unsigned n_perms, n_loads;
-  vect_transform_slp_perm_load (vinfo, slp_node, vNULL, NULL,
-   vf, true, _perms, _loads);
-  inside_cost += record_stmt_cost (cost_vec, n_perms, vec_perm,
-  first_stmt_info, 0, vect_body);
-
-  /* And adjust the number of loads performed.  This handles
-redundancies as well as loads that are later dead.  */
-  ncopies = n_loads;
-}
-
-  /* Grouped loads read all elements in the group at once,
- so we want the DR for the first statement.  */
-  stmt_vec_info first_stmt_info = stmt_info;
-  if (!slp_node && grouped_access_p)
-first_stmt_info = DR_GROUP_FIRST_ELEMENT (stmt_info);
-
-  /* True if we should include any once-per-group costs as well as
- the cost of the statement itself.  For SLP we only get called
- once per group anyhow.  */
-  bool first_stmt_p = (first_stmt_info == stmt_info);
-
-  vect_get_load_cost (vinfo, stmt_info, ncopies, alignment_support_scheme,
- misalignment, first_stmt_p, &inside_cost, &prologue_cost,
- cost_vec, cost_vec, true);
-
-  if (dump_enabled_p ())
-dump_printf_loc (MSG_NOTE, vect_location,
- "vect_model_load_cost: inside_cost = %d, "
- "prologue_cost = %d .\n", inside_cost, prologue_cost);
-}
-
-
 /* Calculate cost of DR's memory access.  */
 void
 vect_get_load_cost (vec_info *, stmt_vec_info stmt_info, int ncopies,
@@ -10639,7 +10572,8 @@ vectorizable_load (vec_info *vinfo,
 we only need to count it once for the whole group.  */
  bool first_stmt_info_p = first_stmt_info == stmt_info;
  bool add_realign_cost = first_stmt_info_p && i == 0;
- if (memory_access_type == VMAT_CONTIGUOUS_REVERSE
+ if (memory_access_type == VMAT_CONTIGUOUS
+ || memory_access_type == VMAT_CONTIGUOUS_REVERSE
  || (memory_access_type == VMAT_CONTIGUOUS_PERMUTE
  && (!grouped_load || first_stmt_info_p)))
vect_get_load_cost (vinfo, stmt_info, 1,
@@ -10763,15 +10697,14 @@ vectorizable_load (vec_info *vinfo,
 direct vect_transform_slp_perm_load to DCE the unused parts.
 ???  This is a hack to prevent compile-time issues as seen
 in PR101120 and friends.  */
- if (costing_p
- && memory_access_type != VMAT_CONTIGUOUS)
+ if (costing_p)
{
  vect_transform_slp_perm_load (vinfo, slp_node, vNULL, nullptr, vf,
true, _perms, nullptr);
  inside_cost = record_stmt_cost (cost_vec, n_perms, vec_perm,

[PATCH 2/9] vect: Adjust vectorizable_load costing on VMAT_GATHER_SCATTER && gs_info.decl

2023-06-12 Thread Kewen Lin via Gcc-patches
This patch adds one extra argument cost_vec to function
vect_build_gather_load_calls, so that we can do the costing
next to the transform in vect_build_gather_load_calls.
For now, the implementation just follows the handling in
vect_model_load_cost, costing ncopies * assumed_nunits scalar
loads (for example, 8 scalar_load costs for a 4-element vector
type with ncopies = 2); it isn't ideal, so a FIXME is placed
there for further improvement.  This patch should not cause
any functional changes.

gcc/ChangeLog:

* tree-vect-stmts.cc (vect_build_gather_load_calls): Add the handlings
on costing with one extra argument cost_vec.
(vectorizable_load): Adjust the call to vect_build_gather_load_calls.
(vect_model_load_cost): Assert it won't get VMAT_GATHER_SCATTER with
gs_info.decl set any more.
---
 gcc/tree-vect-stmts.cc | 31 +++
 1 file changed, 23 insertions(+), 8 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 44514658be3..744cdf40e26 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -1135,6 +1135,8 @@ vect_model_load_cost (vec_info *vinfo,
  slp_tree slp_node,
  stmt_vector_for_cost *cost_vec)
 {
+  gcc_assert (memory_access_type != VMAT_GATHER_SCATTER || !gs_info->decl);
+
   unsigned int inside_cost = 0, prologue_cost = 0;
   bool grouped_access_p = STMT_VINFO_GROUPED_ACCESS (stmt_info);
 
@@ -2819,7 +2821,8 @@ vect_build_gather_load_calls (vec_info *vinfo, 
stmt_vec_info stmt_info,
  gimple_stmt_iterator *gsi,
  gimple **vec_stmt,
  gather_scatter_info *gs_info,
- tree mask)
+ tree mask,
+ stmt_vector_for_cost *cost_vec)
 {
   loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
   class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
@@ -2831,6 +2834,23 @@ vect_build_gather_load_calls (vec_info *vinfo, 
stmt_vec_info stmt_info,
   poly_uint64 gather_off_nunits
 = TYPE_VECTOR_SUBPARTS (gs_info->offset_vectype);
 
+  /* FIXME: Keep the previous costing way in vect_model_load_cost by costing
+ N scalar loads, but it should be tweaked to use target specific costs
+ on related gather load calls.  */
+  if (!vec_stmt)
+{
+  unsigned int assumed_nunits = vect_nunits_for_cost (vectype);
+  unsigned int inside_cost;
+  inside_cost = record_stmt_cost (cost_vec, ncopies * assumed_nunits,
+ scalar_load, stmt_info, 0, vect_body);
+  if (dump_enabled_p ())
+   dump_printf_loc (MSG_NOTE, vect_location,
+"vect_model_load_cost: inside_cost = %d, "
+"prologue_cost = 0 .\n",
+inside_cost);
+  return;
+}
+
   tree arglist = TYPE_ARG_TYPES (TREE_TYPE (gs_info->decl));
   tree rettype = TREE_TYPE (TREE_TYPE (gs_info->decl));
   tree srctype = TREE_VALUE (arglist); arglist = TREE_CHAIN (arglist);
@@ -9483,13 +9503,8 @@ vectorizable_load (vec_info *vinfo,
 
   if (memory_access_type == VMAT_GATHER_SCATTER && gs_info.decl)
 {
-  if (costing_p)
-   vect_model_load_cost (vinfo, stmt_info, ncopies, vf, memory_access_type,
- alignment_support_scheme, misalignment, &gs_info,
- slp_node, cost_vec);
-  else
-   vect_build_gather_load_calls (vinfo, stmt_info, gsi, vec_stmt, &gs_info,
- mask);
+  vect_build_gather_load_calls (vinfo, stmt_info, gsi, vec_stmt, &gs_info,
+   mask, cost_vec);
   return true;
 }
 
-- 
2.31.1



[PATCH 3/9] vect: Adjust vectorizable_load costing on VMAT_INVARIANT

2023-06-12 Thread Kewen Lin via Gcc-patches
This patch adjusts the cost handling on VMAT_INVARIANT in
function vectorizable_load.  We don't call function
vect_model_load_cost for it any more.

To make the costing on VMAT_INVARIANT better, this patch
queries hoist_defs_of_uses for the hoisting decision, and adds
the costs to a different "where" (prologue vs. body) based on
it.  Currently function hoist_defs_of_uses always hoists the
defs of all SSA uses; the new argument HOIST_P avoids doing
the actual hoisting during the costing phase.
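
As a sketch of the intended costing shape (simplified; the names
follow the surrounding code, and charging the non-hoisted case to
the loop body is an assumption based on the "where" wording above):

  bool hoist_p = (LOOP_VINFO_NO_DATA_DEPENDENCIES (loop_vinfo)
                  && !nested_in_vect_loop
                  && hoist_defs_of_uses (stmt_info, loop, !costing_p));
  if (costing_p)
    {
      /* Hoisted: one scalar load plus one splat in the prologue;
         otherwise (assumed) the same two costs go to the loop body.  */
      enum vect_cost_model_location where
        = hoist_p ? vect_prologue : vect_body;
      unsigned int cost
        = record_stmt_cost (cost_vec, 1, scalar_load, stmt_info, 0, where);
      cost += record_stmt_cost (cost_vec, 1, scalar_to_vec,
                                stmt_info, 0, where);
      return true;
    }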

gcc/ChangeLog:

* tree-vect-stmts.cc (hoist_defs_of_uses): Add one argument HOIST_P.
(vectorizable_load): Adjust the handling on VMAT_INVARIANT to respect
hoisting decision and without calling vect_model_load_cost.
(vect_model_load_cost): Assert it won't get VMAT_INVARIANT any more
and remove VMAT_INVARIANT related handlings.
---
 gcc/tree-vect-stmts.cc | 61 +++---
 1 file changed, 39 insertions(+), 22 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 744cdf40e26..19c61d703c8 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -1135,7 +1135,8 @@ vect_model_load_cost (vec_info *vinfo,
  slp_tree slp_node,
  stmt_vector_for_cost *cost_vec)
 {
-  gcc_assert (memory_access_type != VMAT_GATHER_SCATTER || !gs_info->decl);
+  gcc_assert ((memory_access_type != VMAT_GATHER_SCATTER || !gs_info->decl)
+ && memory_access_type != VMAT_INVARIANT);
 
   unsigned int inside_cost = 0, prologue_cost = 0;
   bool grouped_access_p = STMT_VINFO_GROUPED_ACCESS (stmt_info);
@@ -1238,16 +1239,6 @@ vect_model_load_cost (vec_info *vinfo,
   ncopies * assumed_nunits,
   scalar_load, stmt_info, 0, vect_body);
 }
-  else if (memory_access_type == VMAT_INVARIANT)
-{
-  /* Invariant loads will ideally be hoisted and splat to a vector.  */
-  prologue_cost += record_stmt_cost (cost_vec, 1,
-scalar_load, stmt_info, 0,
-vect_prologue);
-  prologue_cost += record_stmt_cost (cost_vec, 1,
-scalar_to_vec, stmt_info, 0,
-vect_prologue);
-}
   else
 vect_get_load_cost (vinfo, stmt_info, ncopies,
alignment_support_scheme, misalignment, first_stmt_p,
@@ -9121,10 +9112,11 @@ permute_vec_elements (vec_info *vinfo,
 /* Hoist the definitions of all SSA uses on STMT_INFO out of the loop LOOP,
inserting them on the loops preheader edge.  Returns true if we
were successful in doing so (and thus STMT_INFO can be moved then),
-   otherwise returns false.  */
+   otherwise returns false.  HOIST_P indicates if we want to hoist the
+   definitions of all SSA uses, it would be false when we are costing.  */
 
 static bool
-hoist_defs_of_uses (stmt_vec_info stmt_info, class loop *loop)
+hoist_defs_of_uses (stmt_vec_info stmt_info, class loop *loop, bool hoist_p)
 {
   ssa_op_iter i;
   tree op;
@@ -9158,6 +9150,9 @@ hoist_defs_of_uses (stmt_vec_info stmt_info, class loop 
*loop)
   if (!any)
 return true;
 
+  if (!hoist_p)
+return true;
+
   FOR_EACH_SSA_TREE_OPERAND (op, stmt_info->stmt, i, SSA_OP_USE)
 {
   gimple *def_stmt = SSA_NAME_DEF_STMT (op);
@@ -9510,14 +9505,6 @@ vectorizable_load (vec_info *vinfo,
 
   if (memory_access_type == VMAT_INVARIANT)
 {
-  if (costing_p)
-   {
- vect_model_load_cost (vinfo, stmt_info, ncopies, vf,
-   memory_access_type, alignment_support_scheme,
-   misalignment, &gs_info, slp_node, cost_vec);
- return true;
-   }
-
   gcc_assert (!grouped_load && !mask && !bb_vinfo);
   /* If we have versioned for aliasing or the loop doesn't
 have any data dependencies that would preclude this,
@@ -9525,7 +9512,37 @@ vectorizable_load (vec_info *vinfo,
 thus we can insert it on the preheader edge.  */
   bool hoist_p = (LOOP_VINFO_NO_DATA_DEPENDENCIES (loop_vinfo)
  && !nested_in_vect_loop
- && hoist_defs_of_uses (stmt_info, loop));
+ && hoist_defs_of_uses (stmt_info, loop, !costing_p));
+  if (costing_p)
+   {
+ if (hoist_p)
+   {
+ unsigned int prologue_cost;
+ prologue_cost = record_stmt_cost (cost_vec, 1, scalar_load,
+   stmt_info, 0, vect_prologue);
+ prologue_cost += record_stmt_cost (cost_vec, 1, scalar_to_vec,
+stmt_info, 0, vect_prologue);
+ if (dump_enabled_p ())
+   dump_printf_loc (MSG_NOTE, vect_location,
+"vect_model_load_cost: inside_cost = 0, "
+"prologue_cost = %d 

[PATCH 6/9] vect: Adjust vectorizable_load costing on VMAT_LOAD_STORE_LANES

2023-06-12 Thread Kewen Lin via Gcc-patches
This patch adjusts the cost handling on
VMAT_LOAD_STORE_LANES in function vectorizable_load.  We
don't call function vect_model_load_cost for it any more.

It follows what we do in function vect_model_load_cost,
and shouldn't cause any functional changes.
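
Roughly, the unused-results accounting this keeps is (names from
the surrounding code; the group size in the comment is only an
assumed example):

  /* E.g. DR_GROUP_SIZE == 4 with only 2 group members present:
     IFN_LOAD_LANES still produces 4 vectors, so 2 unused vector
     loads get costed on top of the load itself.  */
  unsigned int gaps = DR_GROUP_SIZE (first_stmt_info);
  for (stmt_vec_info s = first_stmt_info; s; s = DR_GROUP_NEXT_ELEMENT (s))
    gaps--;
  if (gaps)
    vect_get_load_cost (vinfo, stmt_info, gaps, alignment_support_scheme,
                        misalignment, false, &inside_cost, &prologue_cost,
                        cost_vec, cost_vec, true);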

gcc/ChangeLog:

* tree-vect-stmts.cc (vectorizable_load): Adjust the cost handling on
VMAT_LOAD_STORE_LANES without calling vect_model_load_cost.
(vect_model_load_cost): Remove VMAT_LOAD_STORE_LANES related handling and
assert it will never get VMAT_LOAD_STORE_LANES.
---
 gcc/tree-vect-stmts.cc | 73 --
 1 file changed, 42 insertions(+), 31 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index a3fd0bf879e..4c5ce2ab278 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -1137,7 +1137,8 @@ vect_model_load_cost (vec_info *vinfo,
   gcc_assert (memory_access_type != VMAT_GATHER_SCATTER
  && memory_access_type != VMAT_INVARIANT
  && memory_access_type != VMAT_ELEMENTWISE
- && memory_access_type != VMAT_STRIDED_SLP);
+ && memory_access_type != VMAT_STRIDED_SLP
+ && memory_access_type != VMAT_LOAD_STORE_LANES);
 
   unsigned int inside_cost = 0, prologue_cost = 0;
   bool grouped_access_p = STMT_VINFO_GROUPED_ACCESS (stmt_info);
@@ -1176,31 +1177,6 @@ vect_model_load_cost (vec_info *vinfo,
  once per group anyhow.  */
   bool first_stmt_p = (first_stmt_info == stmt_info);
 
-  /* An IFN_LOAD_LANES will load all its vector results, regardless of which
- ones we actually need.  Account for the cost of unused results.  */
-  if (first_stmt_p && !slp_node && memory_access_type == VMAT_LOAD_STORE_LANES)
-{
-  unsigned int gaps = DR_GROUP_SIZE (first_stmt_info);
-  stmt_vec_info next_stmt_info = first_stmt_info;
-  do
-   {
- gaps -= 1;
- next_stmt_info = DR_GROUP_NEXT_ELEMENT (next_stmt_info);
-   }
-  while (next_stmt_info);
-  if (gaps)
-   {
- if (dump_enabled_p ())
-   dump_printf_loc (MSG_NOTE, vect_location,
-"vect_model_load_cost: %d unused vectors.\n",
-gaps);
- vect_get_load_cost (vinfo, stmt_info, ncopies * gaps,
- alignment_support_scheme, misalignment, false,
- &inside_cost, &prologue_cost,
- cost_vec, cost_vec, true);
-   }
-}
-
   /* We assume that the cost of a single load-lanes instruction is
  equivalent to the cost of DR_GROUP_SIZE separate loads.  If a grouped
  access is instead being provided by a load-and-permute operation,
@@ -10110,7 +10086,7 @@ vectorizable_load (vec_info *vinfo,
 }
   tree vec_mask = NULL_TREE;
   poly_uint64 group_elt = 0;
-  unsigned int inside_cost = 0;
+  unsigned int inside_cost = 0, prologue_cost = 0;
   for (j = 0; j < ncopies; j++)
 {
   /* 1. Create the vector or array pointer update chain.  */
@@ -10190,8 +10166,42 @@ vectorizable_load (vec_info *vinfo,
dr_chain.create (vec_num);
 
   gimple *new_stmt = NULL;
-  if (memory_access_type == VMAT_LOAD_STORE_LANES && !costing_p)
+  if (memory_access_type == VMAT_LOAD_STORE_LANES)
{
+ if (costing_p)
+   {
+ /* An IFN_LOAD_LANES will load all its vector results,
+regardless of which ones we actually need.  Account
+for the cost of unused results.  */
+ if (grouped_load && first_stmt_info == stmt_info)
+   {
+ unsigned int gaps = DR_GROUP_SIZE (first_stmt_info);
+ stmt_vec_info next_stmt_info = first_stmt_info;
+ do
+   {
+ gaps -= 1;
+ next_stmt_info = DR_GROUP_NEXT_ELEMENT (next_stmt_info);
+   }
+ while (next_stmt_info);
+ if (gaps)
+   {
+ if (dump_enabled_p ())
+   dump_printf_loc (MSG_NOTE, vect_location,
+"vect_model_load_cost: %d "
+"unused vectors.\n",
+gaps);
+ vect_get_load_cost (vinfo, stmt_info, gaps,
+ alignment_support_scheme,
+ misalignment, false, &inside_cost,
+ &prologue_cost, cost_vec, cost_vec,
+ true);
+   }
+   }
+ vect_get_load_cost (vinfo, stmt_info, 1, alignment_support_scheme,
+ misalignment, false, &inside_cost,
+ &prologue_cost, cost_vec, cost_vec, true);
+ continue;
+   }
  tree vec_array;
 
  vec_array = create_vector_array 

[PATCH 1/9] vect: Move vect_model_load_cost next to the transform in vectorizable_load

2023-06-12 Thread Kewen Lin via Gcc-patches
This is an initial patch to move costing next to the
transform.  It still uses vect_model_load_cost for the
costing, but moves and duplicates the call down into the
handlings of the different vect_memory_access_types, hoping
this makes the subsequent patches easier to review.  This
patch should not have any functional changes.
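
A heavily condensed sketch of the resulting structure (the real
code keeps one such costing block per memory access type, as the
diff below shows for VMAT_GATHER_SCATTER, VMAT_INVARIANT and the
elementwise/strided cases):

  bool costing_p = !vec_stmt;
  ...
  if (memory_access_type == VMAT_INVARIANT)
    {
      if (costing_p)
        {
          vect_model_load_cost (vinfo, stmt_info, ncopies, vf,
                                memory_access_type,
                                alignment_support_scheme, misalignment,
                                &gs_info, slp_node, cost_vec);
          return true;
        }
      /* ... transform code for this access type ... */
    }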

gcc/ChangeLog:

* tree-vect-stmts.cc (vectorizable_load): Move and duplicate the call
to vect_model_load_cost down to some different transform paths
according to the handlings of different vect_memory_access_types.
---
 gcc/tree-vect-stmts.cc | 86 --
 1 file changed, 57 insertions(+), 29 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index a7acc032d47..44514658be3 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -9430,7 +9430,9 @@ vectorizable_load (vec_info *vinfo,
}
 }
 
-  if (!vec_stmt) /* transformation not required.  */
+  bool costing_p = !vec_stmt;
+
+  if (costing_p) /* transformation not required.  */
 {
   if (slp_node
  && mask
@@ -9464,17 +9466,13 @@ vectorizable_load (vec_info *vinfo,
vinfo->any_known_not_updated_vssa = true;
 
   STMT_VINFO_TYPE (stmt_info) = load_vec_info_type;
-  vect_model_load_cost (vinfo, stmt_info, ncopies, vf, memory_access_type,
-   alignment_support_scheme, misalignment,
-   &gs_info, slp_node, cost_vec);
-  return true;
 }
 
   if (!slp)
 gcc_assert (memory_access_type
== STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info));
 
-  if (dump_enabled_p ())
+  if (dump_enabled_p () && !costing_p)
 dump_printf_loc (MSG_NOTE, vect_location,
  "transform load. ncopies = %d\n", ncopies);
 
@@ -9485,13 +9483,26 @@ vectorizable_load (vec_info *vinfo,
 
   if (memory_access_type == VMAT_GATHER_SCATTER && gs_info.decl)
 {
-  vect_build_gather_load_calls (vinfo,
-   stmt_info, gsi, vec_stmt, &gs_info, mask);
+  if (costing_p)
+   vect_model_load_cost (vinfo, stmt_info, ncopies, vf, memory_access_type,
+ alignment_support_scheme, misalignment, &gs_info,
+ slp_node, cost_vec);
+  else
+   vect_build_gather_load_calls (vinfo, stmt_info, gsi, vec_stmt, &gs_info,
+ mask);
   return true;
 }
 
   if (memory_access_type == VMAT_INVARIANT)
 {
+  if (costing_p)
+   {
+ vect_model_load_cost (vinfo, stmt_info, ncopies, vf,
+   memory_access_type, alignment_support_scheme,
+   misalignment, &gs_info, slp_node, cost_vec);
+ return true;
+   }
+
   gcc_assert (!grouped_load && !mask && !bb_vinfo);
   /* If we have versioned for aliasing or the loop doesn't
 have any data dependencies that would preclude this,
@@ -9548,6 +9559,14 @@ vectorizable_load (vec_info *vinfo,
   if (memory_access_type == VMAT_ELEMENTWISE
   || memory_access_type == VMAT_STRIDED_SLP)
 {
+  if (costing_p)
+   {
+ vect_model_load_cost (vinfo, stmt_info, ncopies, vf,
+   memory_access_type, alignment_support_scheme,
+   misalignment, &gs_info, slp_node, cost_vec);
+ return true;
+   }
+
   gimple_stmt_iterator incr_gsi;
   bool insert_after;
   tree offvar;
@@ -9989,17 +10008,20 @@ vectorizable_load (vec_info *vinfo,
 here, since we can't guarantee first_stmt_info DR has been
 initialized yet, use first_stmt_info_for_drptr DR by bumping the
 distance from first_stmt_info DR instead as below.  */
-  if (!diff_first_stmt_info)
-   msq = vect_setup_realignment (vinfo,
- first_stmt_info, gsi, &realignment_token,
- alignment_support_scheme, NULL_TREE,
- &at_loop);
-  if (alignment_support_scheme == dr_explicit_realign_optimized)
-   {
- phi = as_a <gphi *> (SSA_NAME_DEF_STMT (msq));
- offset = size_binop (MINUS_EXPR, TYPE_SIZE_UNIT (vectype),
-  size_one_node);
- gcc_assert (!first_stmt_info_for_drptr);
+  if (!costing_p)
+   {
+ if (!diff_first_stmt_info)
+   msq = vect_setup_realignment (vinfo, first_stmt_info, gsi,
+ &realignment_token,
+ alignment_support_scheme, NULL_TREE,
+ &at_loop);
+ if (alignment_support_scheme == dr_explicit_realign_optimized)
+   {
+ phi = as_a <gphi *> (SSA_NAME_DEF_STMT (msq));
+ offset = size_binop (MINUS_EXPR, TYPE_SIZE_UNIT (vectype),
+  size_one_node);
+ gcc_assert (!first_stmt_info_for_drptr);
+   }
}

[PATCH 8/9] rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p4

2022-11-24 Thread Kewen Lin via Gcc-patches
The current handling in rs6000_emit_vector_compare is a bit
complicated to me, especially after we emit the vector float
comparison insn with the given code directly.  So it's better
to refactor the handling of vector integer comparison here.

This is part 4; it reworks the handling of GE/GEU/LE/LEU and
also makes the function non-recursive.  This patch doesn't
introduce any functional change.
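
For reference, the identities used here (they also appear as
comments in the new code below) are:

  ge{,u}(a,b) = ~gt{,u}(b,a)
  le{,u}(a,b) = ~gt{,u}(a,b)
  ne(a,b)     = ~eq(a,b)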

gcc/ChangeLog:

* config/rs6000/rs6000.cc (rs6000_emit_vector_compare): Refine the
handlings for operators GE/GEU/LE/LEU.
---
 gcc/config/rs6000/rs6000.cc | 87 -
 1 file changed, 17 insertions(+), 70 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index b4ca7b3d1b1..a3645e321a7 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -15640,7 +15640,7 @@ output_cbranch (rtx op, const char *label, int 
reversed, rtx_insn *insn)
 }
 
 /* Emit vector compare for operands OP0 and OP1 using code RCODE.
-   DMODE is expected destination mode. This is a recursive function.  */
+   DMODE is expected destination mode.  */
 
 static rtx
 rs6000_emit_vector_compare (enum rtx_code rcode,
@@ -15649,7 +15649,7 @@ rs6000_emit_vector_compare (enum rtx_code rcode,
 {
   gcc_assert (VECTOR_UNIT_ALTIVEC_OR_VSX_P (dmode));
   gcc_assert (GET_MODE (op0) == GET_MODE (op1));
-  rtx mask;
+  rtx mask = gen_reg_rtx (dmode);
 
   /* In vector.md, we support all kinds of vector float point
  comparison operators in a comparison rtl pattern, we can
@@ -15658,7 +15658,6 @@ rs6000_emit_vector_compare (enum rtx_code rcode,
  of raising invalid exception.  */
   if (GET_MODE_CLASS (dmode) == MODE_VECTOR_FLOAT)
 {
-  mask = gen_reg_rtx (dmode);
   emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (rcode, dmode, op0, op1)));
   return mask;
 }
@@ -15667,11 +15666,7 @@ rs6000_emit_vector_compare (enum rtx_code rcode,
  have direct hardware instructions, just emit it directly
  here.  */
   if (rcode == EQ || rcode == GT || rcode == GTU)
-{
-  mask = gen_reg_rtx (dmode);
-  emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (rcode, dmode, op0, op1)));
-  return mask;
-}
+emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (rcode, dmode, op0, op1)));
   else if (rcode == LT || rcode == LTU)
 {
   /* lt{,u}(a,b) = gt{,u}(b,a)  */
@@ -15679,76 +15674,28 @@ rs6000_emit_vector_compare (enum rtx_code rcode,
   std::swap (op0, op1);
   mask = gen_reg_rtx (dmode);
   emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (code, dmode, op0, op1)));
-  return mask;
 }
-  else if (rcode == NE)
+  else if (rcode == NE || rcode == LE || rcode == LEU)
 {
-  /* ne(a,b) = ~eq(a,b)  */
+  /* ne(a,b) = ~eq(a,b); le{,u}(a,b) = ~gt{,u}(a,b)  */
+  enum rtx_code code = reverse_condition (rcode);
   mask = gen_reg_rtx (dmode);
-  emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (EQ, dmode, op0, op1)));
+  emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (code, dmode, op0, op1)));
+  enum insn_code nor_code = optab_handler (one_cmpl_optab, dmode);
+  gcc_assert (nor_code != CODE_FOR_nothing);
+  emit_insn (GEN_FCN (nor_code) (mask, mask));
+} else {
+  /* ge{,u}(a,b) = ~gt{,u}(b,a)  */
+  gcc_assert (rcode == GE || rcode == GEU);
+  enum rtx_code code = rcode == GE ? GT : GTU;
+  mask = gen_reg_rtx (dmode);
+  emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (code, dmode, op0, op1)));
   enum insn_code nor_code = optab_handler (one_cmpl_optab, dmode);
   gcc_assert (nor_code != CODE_FOR_nothing);
   emit_insn (GEN_FCN (nor_code) (mask, mask));
-  return mask;
-}
-
-  switch (rcode)
-{
-case GE:
-case GEU:
-case LE:
-case LEU:
-  /* Try GT/GTU/LT/LTU OR EQ */
-  {
-   rtx c_rtx, eq_rtx;
-   enum insn_code ior_code;
-   enum rtx_code new_code;
-
-   switch (rcode)
- {
- case  GE:
-   new_code = GT;
-   break;
-
- case GEU:
-   new_code = GTU;
-   break;
-
- case LE:
-   new_code = LT;
-   break;
-
- case LEU:
-   new_code = LTU;
-   break;
-
- default:
-   gcc_unreachable ();
- }
-
-   ior_code = optab_handler (ior_optab, dmode);
-   if (ior_code == CODE_FOR_nothing)
- return NULL_RTX;
-
-   c_rtx = rs6000_emit_vector_compare (new_code, op0, op1, dmode);
-   if (!c_rtx)
- return NULL_RTX;
-
-   eq_rtx = rs6000_emit_vector_compare (EQ, op0, op1, dmode);
-   if (!eq_rtx)
- return NULL_RTX;
-
-   mask = gen_reg_rtx (dmode);
-   emit_insn (GEN_FCN (ior_code) (mask, c_rtx, eq_rtx));
-   return mask;
-  }
-  break;
-default:
-  return NULL_RTX;
 }
 
-  /* You only get two chances.  */
-  return NULL_RTX;
+  return mask;
 }
 
 /* Emit vector conditional expression.  DEST is destination. 

[PATCH 5/9] rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p1

2022-11-24 Thread Kewen Lin via Gcc-patches
The current handling in rs6000_emit_vector_compare is a bit
complicated to me, especially after we emit the vector float
comparison insn with the given code directly.  So it's better
to refactor the handling of vector integer comparison here.

This is part 1; it removes the helper function
rs6000_emit_vector_compare_inner and moves its logic into
rs6000_emit_vector_compare.  This patch doesn't introduce any
functional change.

gcc/ChangeLog:

* config/rs6000/rs6000.cc (rs6000_emit_vector_compare_inner): Remove.
(rs6000_emit_vector_compare): Emit rtx comparison for operators EQ/
GT/GTU directly.
---
 gcc/config/rs6000/rs6000.cc | 37 +
 1 file changed, 9 insertions(+), 28 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 94e039649f5..0a5826800c5 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -15639,30 +15639,6 @@ output_cbranch (rtx op, const char *label, int 
reversed, rtx_insn *insn)
   return string;
 }
 
-/* Return insn for VSX or Altivec comparisons.  */
-
-static rtx
-rs6000_emit_vector_compare_inner (enum rtx_code code, rtx op0, rtx op1)
-{
-  machine_mode mode = GET_MODE (op0);
-  gcc_assert (GET_MODE_CLASS (mode) != MODE_VECTOR_FLOAT);
-
-  switch (code)
-{
-default:
-  break;
-
-case EQ:
-case GT:
-case GTU:
-  rtx mask = gen_reg_rtx (mode);
-  emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (code, mode, op0, op1)));
-  return mask;
-}
-
-  return NULL_RTX;
-}
-
 /* Emit vector compare for operands OP0 and OP1 using code RCODE.
DMODE is expected destination mode. This is a recursive function.  */
 
@@ -15687,10 +15663,15 @@ rs6000_emit_vector_compare (enum rtx_code rcode,
   return mask;
 }
 
-  /* See if the comparison works as is.  */
-  mask = rs6000_emit_vector_compare_inner (rcode, op0, op1);
-  if (mask)
-return mask;
+  /* For any of vector integer comparison operators for which we
+ have direct hardware instructions, just emit it directly
+ here.  */
+  if (rcode == EQ || rcode == GT || rcode == GTU)
+{
+  mask = gen_reg_rtx (dmode);
+  emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (rcode, dmode, op0, op1)));
+  return mask;
+}
 
   bool swap_operands = false;
   bool try_again = false;
-- 
2.27.0



[PATCH 4/9] rs6000: Rework vector float comparison in rs6000_emit_vector_compare - p4

2022-11-24 Thread Kewen Lin via Gcc-patches
All kinds of vector float comparison operators have been
supported by an rtl comparison pattern in vector.md, so we can
just emit an rtx comparison insn with the given comparison
operator in function rs6000_emit_vector_compare instead of
checking and handling the reverse condition cases.

This is part 4; it further handles the comparison operators
LT/UNGE.  In rs6000_emit_vector_compare, LT is handled by
switching to code GT, swapping the operands and trying again,
which is exactly the same as what we have in vector.md:

; lt(a,b)   = gt(b,a)

As to UNGE, rs6000_emit_vector_compare uses the reversed
code LT and then inverts the result with one_cmpl, which is
also the same as what's in vector.md:

; unge(a,b) = ~lt(a,b)

This patch should not have any functional change either.

gcc/ChangeLog:

* config/rs6000/rs6000.cc (rs6000_emit_vector_compare_inner): Emit rtx
comparison for operators LT/UNGE of MODE_VECTOR_FLOAT directly.
(rs6000_emit_vector_compare): Move assertion of no MODE_VECTOR_FLOAT to
function beginning.
---
 gcc/config/rs6000/rs6000.cc | 24 
 1 file changed, 4 insertions(+), 20 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 98754805bd2..94e039649f5 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -15645,6 +15645,7 @@ static rtx
 rs6000_emit_vector_compare_inner (enum rtx_code code, rtx op0, rtx op1)
 {
   machine_mode mode = GET_MODE (op0);
+  gcc_assert (GET_MODE_CLASS (mode) != MODE_VECTOR_FLOAT);
 
   switch (code)
 {
@@ -15654,7 +15655,6 @@ rs6000_emit_vector_compare_inner (enum rtx_code code, 
rtx op0, rtx op1)
 case EQ:
 case GT:
 case GTU:
-  gcc_assert (GET_MODE_CLASS (mode) != MODE_VECTOR_FLOAT);
   rtx mask = gen_reg_rtx (mode);
   emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (code, mode, op0, op1)));
   return mask;
@@ -15679,18 +15679,8 @@ rs6000_emit_vector_compare (enum rtx_code rcode,
  comparison operators in a comparison rtl pattern, we can
  just emit the comparison rtx insn directly here.  Besides,
  we should have a centralized place to handle the possibility
- of raising invalid exception.  For EQ/GT/GE/UNORDERED/
- ORDERED/LTGT/UNEQ, they are handled equivalently as before;
- for NE/UNLE/UNLT, they are handled with reversed code
- and inverting, it's the same as before; for LE/UNGT, they
- are handled with LE ior EQ previously, emitting directly
- here will make use of GE later, it's slightly better;
-
- FIXME: Handle the remaining vector float comparison operators
- here.  */
-  if (GET_MODE_CLASS (dmode) == MODE_VECTOR_FLOAT
-  && rcode != LT
-  && rcode != UNGE)
+ of raising invalid exception.  */
+  if (GET_MODE_CLASS (dmode) == MODE_VECTOR_FLOAT)
 {
   mask = gen_reg_rtx (dmode);
   emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (rcode, dmode, op0, op1)));
@@ -15718,23 +15708,17 @@ rs6000_emit_vector_compare (enum rtx_code rcode,
   try_again = true;
   break;
 case NE:
-case UNGE:
   /* Invert condition and try again.
 e.g., A != B becomes ~(A==B).  */
   {
-   enum rtx_code rev_code;
enum insn_code nor_code;
rtx mask2;
 
-   rev_code = reverse_condition_maybe_unordered (rcode);
-   if (rev_code == UNKNOWN)
- return NULL_RTX;
-
nor_code = optab_handler (one_cmpl_optab, dmode);
if (nor_code == CODE_FOR_nothing)
  return NULL_RTX;
 
-   mask2 = rs6000_emit_vector_compare (rev_code, op0, op1, dmode);
+   mask2 = rs6000_emit_vector_compare (EQ, op0, op1, dmode);
if (!mask2)
  return NULL_RTX;
 
-- 
2.27.0



[PATCH 2/9] rs6000: Rework vector float comparison in rs6000_emit_vector_compare - p2

2022-11-24 Thread Kewen Lin via Gcc-patches
All kinds of vector float comparison operators have been
supported by an rtl comparison pattern in vector.md, so we can
just emit an rtx comparison insn with the given comparison
operator in function rs6000_emit_vector_compare instead of
checking and handling the reverse condition cases.

This is part 2; it further handles the comparison operators
NE/UNLE/UNLT.  In rs6000_emit_vector_compare, they are
handled with the reversed code, queried from function
reverse_condition_maybe_unordered, followed by inverting with
one_cmpl_optab.  It's the same as what we have in vector.md:

; ne(a,b)   = ~eq(a,b)
; unle(a,b) = ~gt(a,b)
; unlt(a,b) = ~ge(a,b)

The operators on the right side have been supported in part 1.
This patch should not have any functional change either.

gcc/ChangeLog:

* config/rs6000/rs6000.cc (rs6000_emit_vector_compare): Emit rtx
comparison for operators NE/UNLE/UNLT of MODE_VECTOR_FLOAT directly.
---
 gcc/config/rs6000/rs6000.cc | 20 
 1 file changed, 8 insertions(+), 12 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 5a8f7ff3bf8..09299bef6a2 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -15679,20 +15679,18 @@ rs6000_emit_vector_compare (enum rtx_code rcode,
  comparison operators in a comparison rtl pattern, we can
  just emit the comparison rtx insn directly here.  Besides,
  we should have a centralized place to handle the possibility
- of raising invalid exception.  As the first step, only check
- operators EQ/GT/GE/UNORDERED/ORDERED/LTGT/UNEQ for now, they
- are handled equivalently as before.
+ of raising invalid exception.  For EQ/GT/GE/UNORDERED/
+ ORDERED/LTGT/UNEQ, they are handled equivalently as before;
+ for NE/UNLE/UNLT, they are handled with reversed code
+ and inverting, it's the same as before.
 
  FIXME: Handle the remaining vector float comparison operators
  here.  */
   if (GET_MODE_CLASS (dmode) == MODE_VECTOR_FLOAT
-  && (rcode == EQ
- || rcode == GT
- || rcode == GE
- || rcode == UNORDERED
- || rcode == ORDERED
- || rcode == LTGT
- || rcode == UNEQ))
+  && rcode != LT
+  && rcode != LE
+  && rcode != UNGE
+  && rcode != UNGT)
 {
   mask = gen_reg_rtx (dmode);
   emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (rcode, dmode, op0, op1)));
@@ -15720,8 +15718,6 @@ rs6000_emit_vector_compare (enum rtx_code rcode,
   try_again = true;
   break;
 case NE:
-case UNLE:
-case UNLT:
 case UNGE:
 case UNGT:
   /* Invert condition and try again.
-- 
2.27.0



[PATCH 7/9] rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p3

2022-11-24 Thread Kewen Lin via Gcc-patches
The current handling in rs6000_emit_vector_compare is a bit
complicated to me, especially after we emit the vector float
comparison insn with the given code directly.  So it's better
to refactor the handling of vector integer comparison here.

This is part 3; it refactors the handling of NE.
This patch doesn't introduce any functional change.

gcc/ChangeLog:

* config/rs6000/rs6000.cc (rs6000_emit_vector_compare): Refactor the
handlings for operator NE.
---
 gcc/config/rs6000/rs6000.cc | 30 ++
 1 file changed, 10 insertions(+), 20 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index c1aebbb5c03..b4ca7b3d1b1 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -15681,29 +15681,19 @@ rs6000_emit_vector_compare (enum rtx_code rcode,
   emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (code, dmode, op0, op1)));
   return mask;
 }
+  else if (rcode == NE)
+{
+  /* ne(a,b) = ~eq(a,b)  */
+  mask = gen_reg_rtx (dmode);
+  emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (EQ, dmode, op0, op1)));
+  enum insn_code nor_code = optab_handler (one_cmpl_optab, dmode);
+  gcc_assert (nor_code != CODE_FOR_nothing);
+  emit_insn (GEN_FCN (nor_code) (mask, mask));
+  return mask;
+}
 
   switch (rcode)
 {
-case NE:
-  /* Invert condition and try again.
-e.g., A != B becomes ~(A==B).  */
-  {
-   enum insn_code nor_code;
-   rtx mask2;
-
-   nor_code = optab_handler (one_cmpl_optab, dmode);
-   if (nor_code == CODE_FOR_nothing)
- return NULL_RTX;
-
-   mask2 = rs6000_emit_vector_compare (EQ, op0, op1, dmode);
-   if (!mask2)
- return NULL_RTX;
-
-   mask = gen_reg_rtx (dmode);
-   emit_insn (GEN_FCN (nor_code) (mask, mask2));
-   return mask;
-  }
-  break;
 case GE:
 case GEU:
 case LE:
-- 
2.27.0



[PATCH 1/9] rs6000: Rework vector float comparison in rs6000_emit_vector_compare - p1

2022-11-24 Thread Kewen Lin via Gcc-patches
All kinds of vector float comparison operators have been
supported by an rtl comparison pattern in vector.md, so we can
just emit an rtx comparison insn with the given comparison
operator in function rs6000_emit_vector_compare instead of
checking and handling the reverse condition cases.

This is part 1; it only handles the operators which were
already emitted with an rtx comparison previously in function
rs6000_emit_vector_compare_inner, namely EQ/GT/GE/ORDERED/
UNORDERED/UNEQ/LTGT.  There is no functional change.

With this change, rs6000_emit_vector_compare_inner only
works for vector integer comparison handling; it will be
cleaned up later in the vector integer comparison rework.

gcc/ChangeLog:

* config/rs6000/rs6000.cc (rs6000_emit_vector_compare_inner): Move
MODE_VECTOR_FLOAT handlings out.
(rs6000_emit_vector_compare): Emit rtx comparison for operators EQ/GT/
GE/UNORDERED/ORDERED/UNEQ/LTGT of MODE_VECTOR_FLOAT directly, and
adjust one call site of rs6000_emit_vector_compare_inner to
rs6000_emit_vector_compare.
---
 gcc/config/rs6000/rs6000.cc | 47 -
 1 file changed, 31 insertions(+), 16 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index d2743f7bce6..5a8f7ff3bf8 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -15644,7 +15644,6 @@ output_cbranch (rtx op, const char *label, int 
reversed, rtx_insn *insn)
 static rtx
 rs6000_emit_vector_compare_inner (enum rtx_code code, rtx op0, rtx op1)
 {
-  rtx mask;
   machine_mode mode = GET_MODE (op0);
 
   switch (code)
@@ -15652,19 +15651,11 @@ rs6000_emit_vector_compare_inner (enum rtx_code code, 
rtx op0, rtx op1)
 default:
   break;
 
-case GE:
-  if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
-   return NULL_RTX;
-  /* FALLTHRU */
-
 case EQ:
 case GT:
 case GTU:
-case ORDERED:
-case UNORDERED:
-case UNEQ:
-case LTGT:
-  mask = gen_reg_rtx (mode);
+  gcc_assert (GET_MODE_CLASS (mode) != MODE_VECTOR_FLOAT);
+  rtx mask = gen_reg_rtx (mode);
   emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (code, mode, op0, op1)));
   return mask;
 }
@@ -15680,18 +15671,42 @@ rs6000_emit_vector_compare (enum rtx_code rcode,
rtx op0, rtx op1,
machine_mode dmode)
 {
-  rtx mask;
-  bool swap_operands = false;
-  bool try_again = false;
-
   gcc_assert (VECTOR_UNIT_ALTIVEC_OR_VSX_P (dmode));
   gcc_assert (GET_MODE (op0) == GET_MODE (op1));
+  rtx mask;
+
+  /* In vector.md, we support all kinds of vector float point
+ comparison operators in a comparison rtl pattern, we can
+ just emit the comparison rtx insn directly here.  Besides,
+ we should have a centralized place to handle the possibility
+ of raising invalid exception.  As the first step, only check
+ operators EQ/GT/GE/UNORDERED/ORDERED/LTGT/UNEQ for now, they
+ are handled equivalently as before.
+
+ FIXME: Handle the remaining vector float comparison operators
+ here.  */
+  if (GET_MODE_CLASS (dmode) == MODE_VECTOR_FLOAT
+  && (rcode == EQ
+ || rcode == GT
+ || rcode == GE
+ || rcode == UNORDERED
+ || rcode == ORDERED
+ || rcode == LTGT
+ || rcode == UNEQ))
+{
+  mask = gen_reg_rtx (dmode);
+  emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (rcode, dmode, op0, op1)));
+  return mask;
+}
 
   /* See if the comparison works as is.  */
   mask = rs6000_emit_vector_compare_inner (rcode, op0, op1);
   if (mask)
 return mask;
 
+  bool swap_operands = false;
+  bool try_again = false;
+
   switch (rcode)
 {
 case LT:
@@ -15791,7 +15806,7 @@ rs6000_emit_vector_compare (enum rtx_code rcode,
   if (swap_operands)
std::swap (op0, op1);
 
-  mask = rs6000_emit_vector_compare_inner (rcode, op0, op1);
+  mask = rs6000_emit_vector_compare (rcode, op0, op1, dmode);
   if (mask)
return mask;
 }
-- 
2.27.0



[PATCH 9/9] rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p5

2022-11-24 Thread Kewen Lin via Gcc-patches
The current handling in rs6000_emit_vector_compare is a bit
complicated to me, especially after we emit the vector float
comparison insn with the given code directly.  So it's better
to refactor the handling of vector integer comparison here.

This is part 5; it refactors all the handling of vector
integer comparison to make it neat.  This patch doesn't
introduce any functional change.
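
For reference, the resulting mapping in the refactored switch is:

  EQ, GT, GTU : emit the comparison directly;
  LT, LTU     : swap operands and emit GT/GTU;
  NE, LE, LEU : emit the reversed code (EQ/GT/GTU), then invert;
  GE, GEU     : swap operands, emit GT/GTU, then invert.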

gcc/ChangeLog:

* config/rs6000/rs6000.cc (rs6000_emit_vector_compare): Refactor the
handlings of vector integer comparison.
---
 gcc/config/rs6000/rs6000.cc | 68 -
 1 file changed, 44 insertions(+), 24 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index a3645e321a7..b694080840a 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -15662,34 +15662,54 @@ rs6000_emit_vector_compare (enum rtx_code rcode,
   return mask;
 }
 
-  /* For any of vector integer comparison operators for which we
- have direct hardware instructions, just emit it directly
- here.  */
-  if (rcode == EQ || rcode == GT || rcode == GTU)
-emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (rcode, dmode, op0, op1)));
-  else if (rcode == LT || rcode == LTU)
+  bool swap_operands = false;
+  bool need_invert = false;
+  enum rtx_code code = rcode;
+
+  switch (rcode)
 {
+case EQ:
+case GT:
+case GTU:
+  /* Emit directly with native hardware insn.  */
+  break;
+case LT:
+case LTU:
   /* lt{,u}(a,b) = gt{,u}(b,a)  */
-  enum rtx_code code = swap_condition (rcode);
-  std::swap (op0, op1);
-  mask = gen_reg_rtx (dmode);
-  emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (code, dmode, op0, op1)));
+  code = swap_condition (rcode);
+  swap_operands = true;
+  break;
+case NE:
+case LE:
+case LEU:
+  /* ne(a,b) = ~eq(a,b); le{,u}(a,b) = ~gt{,u}(a,b)  */
+  code = reverse_condition (rcode);
+  need_invert = true;
+  break;
+case GE:
+  /* ge(a,b) = ~gt(b,a)  */
+  code = GT;
+  swap_operands = true;
+  need_invert = true;
+  break;
+case GEU:
+  /* geu(a,b) = ~gtu(b,a)  */
+  code = GTU;
+  swap_operands = true;
+  need_invert = true;
+  break;
+default:
+  gcc_unreachable ();
+  break;
 }
-  else if (rcode == NE || rcode == LE || rcode == LEU)
+
+  if (swap_operands)
+std::swap (op0, op1);
+
+  emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (code, dmode, op0, op1)));
+
+  if (need_invert)
 {
-  /* ne(a,b) = ~eq(a,b); le{,u}(a,b) = ~gt{,u}(a,b)  */
-  enum rtx_code code = reverse_condition (rcode);
-  mask = gen_reg_rtx (dmode);
-  emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (code, dmode, op0, op1)));
-  enum insn_code nor_code = optab_handler (one_cmpl_optab, dmode);
-  gcc_assert (nor_code != CODE_FOR_nothing);
-  emit_insn (GEN_FCN (nor_code) (mask, mask));
-} else {
-  /* ge{,u}(a,b) = ~gt{,u}(b,a)  */
-  gcc_assert (rcode == GE || rcode == GEU);
-  enum rtx_code code = rcode == GE ? GT : GTU;
-  mask = gen_reg_rtx (dmode);
-  emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (code, dmode, op0, op1)));
   enum insn_code nor_code = optab_handler (one_cmpl_optab, dmode);
   gcc_assert (nor_code != CODE_FOR_nothing);
   emit_insn (GEN_FCN (nor_code) (mask, mask));
-- 
2.27.0



[PATCH 6/9] rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p2

2022-11-24 Thread Kewen Lin via Gcc-patches
The current handling in rs6000_emit_vector_compare is a bit
complicated to me, especially after we emit the vector float
comparison insn with the given code directly.  So it's better
to refactor the handling of vector integer comparison here.

This is part 2; it refactors the handling of LT and LTU.
This patch doesn't introduce any functional change.

gcc/ChangeLog:

* config/rs6000/rs6000.cc (rs6000_emit_vector_compare): Refine the
handlings for operators LT and LTU.
---
 gcc/config/rs6000/rs6000.cc | 32 +---
 1 file changed, 9 insertions(+), 23 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 0a5826800c5..c1aebbb5c03 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -15672,22 +15672,18 @@ rs6000_emit_vector_compare (enum rtx_code rcode,
   emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (rcode, dmode, op0, op1)));
   return mask;
 }
-
-  bool swap_operands = false;
-  bool try_again = false;
+  else if (rcode == LT || rcode == LTU)
+{
+  /* lt{,u}(a,b) = gt{,u}(b,a)  */
+  enum rtx_code code = swap_condition (rcode);
+  std::swap (op0, op1);
+  mask = gen_reg_rtx (dmode);
+  emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (code, dmode, op0, op1)));
+  return mask;
+}
 
   switch (rcode)
 {
-case LT:
-  rcode = GT;
-  swap_operands = true;
-  try_again = true;
-  break;
-case LTU:
-  rcode = GTU;
-  swap_operands = true;
-  try_again = true;
-  break;
 case NE:
   /* Invert condition and try again.
 e.g., A != B becomes ~(A==B).  */
@@ -15761,16 +15757,6 @@ rs6000_emit_vector_compare (enum rtx_code rcode,
   return NULL_RTX;
 }
 
-  if (try_again)
-{
-  if (swap_operands)
-   std::swap (op0, op1);
-
-  mask = rs6000_emit_vector_compare (rcode, op0, op1, dmode);
-  if (mask)
-   return mask;
-}
-
   /* You only get two chances.  */
   return NULL_RTX;
 }
-- 
2.27.0



[PATCH 3/9] rs6000: Rework vector float comparison in rs6000_emit_vector_compare - p3

2022-11-24 Thread Kewen Lin via Gcc-patches
All kinds of vector float comparison operators have been
supported by an rtl comparison pattern in vector.md, so we can
just emit an rtx comparison insn with the given comparison
operator in function rs6000_emit_vector_compare instead of
checking and handling the reverse condition cases.

This is part 3; it further handles the comparison operators
LE/UNGT.  In rs6000_emit_vector_compare, UNGT is handled
with the reversed code LE plus inverting with one_cmpl_optab,
and LE is handled as LT ior EQ, while in vector.md we have
the support:

; le(a,b)   = ge(b,a)
; ungt(a,b) = ~le(a,b)

The associated test case shows it's an improvement.

gcc/ChangeLog:

* config/rs6000/rs6000.cc (rs6000_emit_vector_compare): Emit rtx
comparison for operators LE/UNGT of MODE_VECTOR_FLOAT directly.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/vcond-fp.c: New test.
---
 gcc/config/rs6000/rs6000.cc |  9 
 gcc/testsuite/gcc.target/powerpc/vcond-fp.c | 25 +
 2 files changed, 29 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vcond-fp.c

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 09299bef6a2..98754805bd2 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -15682,15 +15682,15 @@ rs6000_emit_vector_compare (enum rtx_code rcode,
  of raising invalid exception.  For EQ/GT/GE/UNORDERED/
  ORDERED/LTGT/UNEQ, they are handled equivalently as before;
  for NE/UNLE/UNLT, they are handled with reversed code
- and inverting, it's the same as before.
+ and inverting, it's the same as before; for LE/UNGT, they
+ are handled with LE ior EQ previously, emitting directly
+ here will make use of GE later, it's slightly better;
 
  FIXME: Handle the remaining vector float comparison operators
  here.  */
   if (GET_MODE_CLASS (dmode) == MODE_VECTOR_FLOAT
   && rcode != LT
-  && rcode != LE
-  && rcode != UNGE
-  && rcode != UNGT)
+  && rcode != UNGE)
 {
   mask = gen_reg_rtx (dmode);
   emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (rcode, dmode, op0, op1)));
@@ -15719,7 +15719,6 @@ rs6000_emit_vector_compare (enum rtx_code rcode,
   break;
 case NE:
 case UNGE:
-case UNGT:
   /* Invert condition and try again.
 e.g., A != B becomes ~(A==B).  */
   {
diff --git a/gcc/testsuite/gcc.target/powerpc/vcond-fp.c 
b/gcc/testsuite/gcc.target/powerpc/vcond-fp.c
new file mode 100644
index 000..b71861d3588
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vcond-fp.c
@@ -0,0 +1,25 @@
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-O2 -ftree-vectorize -fno-vect-cost-model -mpower8-vector" } 
*/
+
+/* Test we use xvcmpge[sd]p rather than xvcmpeq[sd]p and xvcmpgt[sd]p
+   for UNGT and LE handlings.  */
+
+#define UNGT(a, b) (!__builtin_islessequal ((a), (b)))
+#define LE(a, b) (((a) <= (b)))
+
+#define TEST_VECT(NAME, TYPE)  
\
+  __attribute__ ((noipa)) void test_##NAME##_##TYPE (TYPE *x, TYPE *y, 
\
+int *res, int n)  \
+  {
\
+for (int i = 0; i < n; i++)
\
+  res[i] = NAME (x[i], y[i]);  
\
+  }
+
+#define TEST(TYPE) 
\
+  TEST_VECT (UNGT, TYPE)   
\
+  TEST_VECT (LE, TYPE)
+
+TEST (float)
+TEST (double)
+
+/* { dg-final { scan-assembler-not {\mxvcmp(gt|eq)[sd]p\M} } } */
-- 
2.27.0



[PATCH 0/9] rs6000: Rework rs6000_emit_vector_compare

2022-11-24 Thread Kewen Lin via Gcc-patches
Hi,

Following Segher's suggestion, this patch series reworks
function rs6000_emit_vector_compare for vector float and int
in multiple steps; it's based on the previous attempts [1][2].
As mentioned in [1], the need to rework this for float is to
have a centralized place for vector float comparison handling,
instead of supporting it dispersedly with swapped operands,
reversed codes etc.  It also prepares for a subsequent patch
to handle comparison operators with or without trapping math
(PR105480).  With the handling of vector float reworked, we
can further simplify the handling of vector int as shown.

For Segher's concern about whether this rework causes any
assembly change, I constructed two testcases for vector
float [3] and int [4] respectively before; they showed that
most cases are unchanged, except for a difference on LE and
UNGT, which is demonstrated as an improvement since it uses GE
instead of GT ior EQ (e.g., one xvcmpgedp rather than an
xvcmpgtdp, an xvcmpeqdp and an ior).  The associated test case
in patch 3/9 is a good example.

Besides, I built all of SPEC2017 with and without the whole
patch series, at -O3 and -Ofast separately, and checked the
differences in the object assembly.  The result showed that
most objects are unchanged, except for:

  * at -O3, 521.wrf_r has 9 object files and 526.blender_r has
9 object files with differences.

  * at -Ofast, 521.wrf_r has 12 object files, 526.blender_r has
one and 527.cam4_r has 4 object files with differences.

Looking into these differences, all significant ones are
caused by the known improvement mentioned above, transforming
GT ior EQ to GE, which can also affect unrolling decisions due
to the change in insn count.  The other differences are
trivial: branch target offset differences, nop differences for
alignment, vsx register number differences etc.

I also evaluated the runtime performance of these changed
benchmarks; the result is neutral.

These patches are bootstrapped and regress-tested
incrementally on powerpc64-linux-gnu P7 & P8, and
powerpc64le-linux-gnu P9 & P10.

Is it ok for trunk?

BR,
Kewen
-
[1] https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606375.html
[2] https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606376.html
[3] https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606504.html
[4] https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606506.html

Kewen Lin (9):
  rs6000: Rework vector float comparison in rs6000_emit_vector_compare - p1
  rs6000: Rework vector float comparison in rs6000_emit_vector_compare - p2
  rs6000: Rework vector float comparison in rs6000_emit_vector_compare - p3
  rs6000: Rework vector float comparison in rs6000_emit_vector_compare - p4
  rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p1
  rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p2
  rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p3
  rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p4
  rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p5

 gcc/config/rs6000/rs6000.cc | 180 ++--
 gcc/testsuite/gcc.target/powerpc/vcond-fp.c |  25 +++
 2 files changed, 74 insertions(+), 131 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vcond-fp.c

-- 
2.27.0



[PATCH 11/15] csky: Fix non-robust split condition in define_insn_and_split

2021-11-11 Thread Kewen Lin via Gcc-patches
This patch fixes some non-robust split conditions in some
define_insn_and_splits, making each of them start with "&&" so
that it applies on top of the corresponding define_insn
condition; otherwise the splitting could be performed
unexpectedly.

gcc/ChangeLog:

* config/csky/csky.md (*cskyv2_adddi3, *ck801_adddi3, *cskyv2_adddi1_1,
*cskyv2_subdi3, *ck801_subdi3, *cskyv2_subdi1_1, cskyv2_addcc,
cskyv2_addcc_invert, *cskyv2_anddi3, *ck801_anddi3, *cskyv2_iordi3,
*ck801_iordi3, *cskyv2_xordi3, *ck801_xordi3,): Fix split condition.
---
 gcc/config/csky/csky.md | 28 ++--
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/gcc/config/csky/csky.md b/gcc/config/csky/csky.md
index f91d851cb2c..54143a0efea 100644
--- a/gcc/config/csky/csky.md
+++ b/gcc/config/csky/csky.md
@@ -850,7 +850,7 @@ (define_insn_and_split "*cskyv2_adddi3"
(clobber (reg:CC CSKY_CC_REGNUM))]
   "CSKY_ISA_FEATURE (E2)"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(const_int 0)]
   {
 int hi = TARGET_BIG_ENDIAN ? 0 : UNITS_PER_WORD;
@@ -877,7 +877,7 @@ (define_insn_and_split "*ck801_adddi3"
(clobber (reg:CC CSKY_CC_REGNUM))]
   "CSKY_ISA_FEATURE (E1)"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(const_int 0)]
   {
 int hi = TARGET_BIG_ENDIAN ? 0 : UNITS_PER_WORD;
@@ -906,7 +906,7 @@ (define_insn_and_split "*cskyv2_adddi1_1"
(clobber (reg:CC CSKY_CC_REGNUM))]
   "CSKY_ISA_FEATURE (E2)"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(const_int 0)]
   {
 int hi = TARGET_BIG_ENDIAN ? 0 : UNITS_PER_WORD;
@@ -1048,7 +1048,7 @@ (define_insn_and_split "*cskyv2_subdi3"
(clobber (reg:CC CSKY_CC_REGNUM))]
   "CSKY_ISA_FEATURE (E2)"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(const_int 0)]
   {
 int hi = TARGET_BIG_ENDIAN ? 0 : UNITS_PER_WORD;
@@ -1075,7 +1075,7 @@ (define_insn_and_split "*ck801_subdi3"
(clobber (reg:CC CSKY_CC_REGNUM))]
   "CSKY_ISA_FEATURE (E1)"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(const_int 0)]
   {
 int hi = TARGET_BIG_ENDIAN ? 0 : UNITS_PER_WORD;
@@ -1104,7 +1104,7 @@ (define_insn_and_split "*cskyv2_subdi1_1"
(clobber (reg:CC CSKY_CC_REGNUM))]
   "CSKY_ISA_FEATURE (E2)"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(const_int 0)]
   {
 int hi = TARGET_BIG_ENDIAN ? 0 : UNITS_PER_WORD;
@@ -1276,7 +1276,7 @@ (define_insn_and_split "cskyv2_addcc"
dect\t%0, %1, %M2
#
#"
-  "reload_completed && !rtx_equal_p (operands[0], operands[1])"
+  "&& reload_completed && !rtx_equal_p (operands[0], operands[1])"
   [(set (match_dup 0)
(if_then_else:SI (ne (reg:CC CSKY_CC_REGNUM) (const_int 0))
 (plus:SI (match_dup 0) (match_dup 2]
@@ -1302,7 +1302,7 @@ (define_insn_and_split "cskyv2_addcc_invert"
decf\t%0, %1, %M2
#
#"
-  "reload_completed && !rtx_equal_p (operands[0], operands[1])"
+  "&& reload_completed && !rtx_equal_p (operands[0], operands[1])"
   [(set (match_dup 0)
(if_then_else:SI (eq (reg:CC CSKY_CC_REGNUM) (const_int 0))
 (plus:SI (match_dup 0) (match_dup 2]
@@ -1691,7 +1691,7 @@ (define_insn_and_split "*cskyv2_anddi3"
(match_operand:DI 2 "register_operand" "b,r")))]
   "CSKY_ISA_FEATURE (E2)"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(const_int 0)]
   {
 int hi = TARGET_BIG_ENDIAN ? 0 : UNITS_PER_WORD;
@@ -1716,7 +1716,7 @@ (define_insn_and_split "*ck801_anddi3"
   (match_operand:DI 2 "register_operand" "r")))]
   "CSKY_ISA_FEATURE (E1)"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(const_int 0)]
   {
 int hi = TARGET_BIG_ENDIAN ? 0 : UNITS_PER_WORD;
@@ -1801,7 +1801,7 @@ (define_insn_and_split "*cskyv2_iordi3"
(match_operand:DI 2 "register_operand" "b,  r")))]
   "CSKY_ISA_FEATURE (E2)"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(const_int 0)]
   {
 int hi = TARGET_BIG_ENDIAN ? 0 : UNITS_PER_WORD;
@@ -1826,7 +1826,7 @@ (define_insn_and_split "*ck801_iordi3"
(match_operand:DI 2 "register_operand" "r")))]
   "CSKY_ISA_FEATURE (E1)"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(const_int 0)]
   {
 int hi = TARGET_BIG_ENDIAN ? 0 : UNITS_PER_WORD;
@@ -1911,7 +1911,7 @@ (define_insn_and_split "*cskyv2_xordi3"
(match_operand:DI 2 "register_operand" "b,  r")))]
   "CSKY_ISA_FEATURE (E2)"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(const_int 0)]
   {
 int hi = TARGET_BIG_ENDIAN ? 0 : UNITS_PER_WORD;
@@ -1936,7 +1936,7 @@ (define_insn_and_split "*ck801_xordi3"
(match_operand:DI 2 "register_operand" "r")))]
   "CSKY_ISA_FEATURE (E1)"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(const_int 0)]
   {
 int hi = TARGET_BIG_ENDIAN ? 0 : UNITS_PER_WORD;
-- 
2.27.0



[PATCH 15/15] sh: Fix non-robust split condition in define_insn_and_split

2021-11-11 Thread Kewen Lin via Gcc-patches
This patch fixes some non-robust split conditions in some
define_insn_and_splits, making each of them apply on top of
the corresponding condition for the define_insn part; otherwise,
the splitting could be performed unexpectedly.

gcc/ChangeLog:

* config/sh/sh.md (call_pcrel, call_value_pcrel, sibcall_pcrel,
sibcall_value_pcrel): Fix split condition.
---
 gcc/config/sh/sh.md | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/config/sh/sh.md b/gcc/config/sh/sh.md
index 93ee7c9a7de..1bb325c7044 100644
--- a/gcc/config/sh/sh.md
+++ b/gcc/config/sh/sh.md
@@ -6566,7 +6566,7 @@ (define_insn_and_split "call_pcrel"
(clobber (match_scratch:SI 2 "="))]
   "TARGET_SH2"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(const_int 0)]
 {
   rtx lab = PATTERN (gen_call_site ());
@@ -6678,7 +6678,7 @@ (define_insn_and_split "call_value_pcrel"
(clobber (match_scratch:SI 3 "="))]
   "TARGET_SH2"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(const_int 0)]
 {
   rtx lab = PATTERN (gen_call_site ());
@@ -6877,7 +6877,7 @@ (define_insn_and_split "sibcall_pcrel"
(return)]
   "TARGET_SH2 && !TARGET_FDPIC"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(const_int 0)]
 {
   rtx lab = PATTERN (gen_call_site ());
@@ -7043,7 +7043,7 @@ (define_insn_and_split "sibcall_value_pcrel"
(return)]
   "TARGET_SH2 && !TARGET_FDPIC"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(const_int 0)]
 {
   rtx lab = PATTERN (gen_call_site ());
-- 
2.27.0



[PATCH 14/15] mips: Fix non-robust split condition in define_insn_and_split

2021-11-11 Thread Kewen Lin via Gcc-patches
This patch fixes some non-robust split conditions in some
define_insn_and_splits, making each of them apply on top of
the corresponding condition for the define_insn part; otherwise,
the splitting could be performed unexpectedly.

gcc/ChangeLog:

* config/mips/mips.md (*udivmod4, udivmod4_mips16): Fix
split condition.
---
 gcc/config/mips/mips.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mips.md
index 455b9b802f6..4efb7503df3 100644
--- a/gcc/config/mips/mips.md
+++ b/gcc/config/mips/mips.md
@@ -2961,7 +2961,7 @@ (define_insn_and_split "*udivmod4"
  (match_dup 2)))]
   "ISA_HAS_DIV && !TARGET_MIPS16"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(const_int 0)]
 {
   emit_insn (gen_udivmod4_split (operands[3], operands[1], operands[2]));
@@ -2982,7 +2982,7 @@ (define_insn_and_split "udivmod4_mips16"
(clobber (match_operand:GPR 4 "lo_operand" "=l"))]
   "ISA_HAS_DIV && TARGET_MIPS16"
   "#"
-  "cse_not_expected"
+  "&& cse_not_expected"
   [(const_int 0)]
 {
   emit_insn (gen_udivmod4_split (operands[3], operands[1], operands[2]));
-- 
2.27.0



[PATCH 12/15] i386: Fix non-robust split condition in define_insn_and_split

2021-11-11 Thread Kewen Lin via Gcc-patches
This patch fixes some non-robust split conditions in some
define_insn_and_splits, making each of them apply on top of
the corresponding condition for the define_insn part; otherwise,
the splitting could be performed unexpectedly.

gcc/ChangeLog:

* config/i386/i386.md (*add3_doubleword, *addv4_doubleword,
*addv4_doubleword_1, *sub3_doubleword,
*subv4_doubleword, *subv4_doubleword_1,
*add3_doubleword_cc_overflow_1, *divmodsi4_const,
*neg2_doubleword, *tls_dynamic_gnu2_combine_64_): Fix split
condition.
---
 gcc/config/i386/i386.md | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 6eb9de81921..2bd09e502ae 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -5491,7 +5491,7 @@ (define_insn_and_split "*add3_doubleword"
(clobber (reg:CC FLAGS_REG))]
   "ix86_binary_operator_ok (PLUS, mode, operands)"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(parallel [(set (reg:CCC FLAGS_REG)
   (compare:CCC
 (plus:DWIH (match_dup 1) (match_dup 2))
@@ -6300,7 +6300,7 @@ (define_insn_and_split "*addv4_doubleword"
(plus: (match_dup 1) (match_dup 2)))]
   "ix86_binary_operator_ok (PLUS, mode, operands)"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(parallel [(set (reg:CCC FLAGS_REG)
   (compare:CCC
 (plus:DWIH (match_dup 1) (match_dup 2))
@@ -6347,7 +6347,7 @@ (define_insn_and_split "*addv4_doubleword_1"
&& CONST_SCALAR_INT_P (operands[2])
&& rtx_equal_p (operands[2], operands[3])"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(parallel [(set (reg:CCC FLAGS_REG)
   (compare:CCC
 (plus:DWIH (match_dup 1) (match_dup 2))
@@ -6641,7 +6641,7 @@ (define_insn_and_split "*sub3_doubleword"
(clobber (reg:CC FLAGS_REG))]
   "ix86_binary_operator_ok (MINUS, mode, operands)"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(parallel [(set (reg:CC FLAGS_REG)
   (compare:CC (match_dup 1) (match_dup 2)))
  (set (match_dup 0)
@@ -6817,7 +6817,7 @@ (define_insn_and_split "*subv4_doubleword"
(minus: (match_dup 1) (match_dup 2)))]
   "ix86_binary_operator_ok (MINUS, mode, operands)"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(parallel [(set (reg:CC FLAGS_REG)
   (compare:CC (match_dup 1) (match_dup 2)))
  (set (match_dup 0)
@@ -6862,7 +6862,7 @@ (define_insn_and_split "*subv4_doubleword_1"
&& CONST_SCALAR_INT_P (operands[2])
&& rtx_equal_p (operands[2], operands[3])"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(parallel [(set (reg:CC FLAGS_REG)
   (compare:CC (match_dup 1) (match_dup 2)))
  (set (match_dup 0)
@@ -7542,7 +7542,7 @@ (define_insn_and_split 
"*add3_doubleword_cc_overflow_1"
(plus: (match_dup 1) (match_dup 2)))]
   "ix86_binary_operator_ok (PLUS, mode, operands)"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(parallel [(set (reg:CCC FLAGS_REG)
   (compare:CCC
 (plus:DWIH (match_dup 1) (match_dup 2))
@@ -9000,7 +9000,7 @@ (define_insn_and_split "*divmodsi4_const"
(clobber (reg:CC FLAGS_REG))]
   "!optimize_function_for_size_p (cfun)"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(set (match_dup 0) (match_dup 2))
(set (match_dup 1) (match_dup 4))
(parallel [(set (match_dup 0)
@@ -10515,7 +10515,7 @@ (define_insn_and_split "*neg2_doubleword"
(clobber (reg:CC FLAGS_REG))]
   "ix86_unary_operator_ok (NEG, mode, operands)"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(parallel
 [(set (reg:CCC FLAGS_REG)
  (ne:CCC (match_dup 1) (const_int 0)))
@@ -16898,7 +16898,7 @@ (define_insn_and_split 
"*tls_dynamic_gnu2_combine_64_"
(clobber (reg:CC FLAGS_REG))]
   "TARGET_64BIT && TARGET_GNU2_TLS"
   "#"
-  ""
+  "&& 1"
   [(set (match_dup 0) (match_dup 4))]
 {
   operands[4] = can_create_pseudo_p () ? gen_reg_rtx (ptr_mode) : operands[0];
-- 
2.27.0



[PATCH 13/15] ia64: Fix non-robust split condition in define_insn_and_split

2021-11-11 Thread Kewen Lin via Gcc-patches
This patch fixes some non-robust split conditions in some
define_insn_and_splits, making each of them apply on top of
the corresponding condition for the define_insn part; otherwise,
the splitting could be performed unexpectedly.

gcc/ChangeLog:

* config/ia64/vect.md (*vec_extractv2sf_0_le, *vec_extractv2sf_0_be):
Fix split condition.
---
 gcc/config/ia64/vect.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/ia64/vect.md b/gcc/config/ia64/vect.md
index 1a2452289b7..0f3a406d620 100644
--- a/gcc/config/ia64/vect.md
+++ b/gcc/config/ia64/vect.md
@@ -1422,7 +1422,7 @@ (define_insn_and_split "*vec_extractv2sf_0_le"
   UNSPEC_VECT_EXTR))]
   "!TARGET_BIG_ENDIAN"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(set (match_dup 0) (match_dup 1))]
 {
   if (REG_P (operands[1]) && FR_REGNO_P (REGNO (operands[1])))
@@ -1440,7 +1440,7 @@ (define_insn_and_split "*vec_extractv2sf_0_be"
   UNSPEC_VECT_EXTR))]
   "TARGET_BIG_ENDIAN"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(set (match_dup 0) (match_dup 1))]
 {
   if (MEM_P (operands[1]))
-- 
2.27.0



[PATCH 10/15] bfin: Fix non-robust split condition in define_insn_and_split

2021-11-11 Thread Kewen Lin via Gcc-patches
This patch fixes some non-robust split conditions in some
define_insn_and_splits, making each of them apply on top of
the corresponding condition for the define_insn part; otherwise,
the splitting could be performed unexpectedly.

gcc/ChangeLog:

* config/bfin/bfin.md (movdi_insn, movdf_insn): Fix split condition.
---
 gcc/config/bfin/bfin.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/bfin/bfin.md b/gcc/config/bfin/bfin.md
index fd65f4d9e63..41a50974136 100644
--- a/gcc/config/bfin/bfin.md
+++ b/gcc/config/bfin/bfin.md
@@ -506,7 +506,7 @@ (define_insn_and_split "movdi_insn"
(match_operand:DI 1 "general_operand" "iFx,r,mx"))]
   "GET_CODE (operands[0]) != MEM || GET_CODE (operands[1]) == REG"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(set (match_dup 2) (match_dup 3))
(set (match_dup 4) (match_dup 5))]
 {
@@ -718,7 +718,7 @@ (define_insn_and_split "movdf_insn"
(match_operand:DF 1 "general_operand" "iFx,r,mx"))]
   "GET_CODE (operands[0]) != MEM || GET_CODE (operands[1]) == REG"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(set (match_dup 2) (match_dup 3))
(set (match_dup 4) (match_dup 5))]
 {
-- 
2.27.0



[PATCH 09/15] arm: Fix non-robust split condition in define_insn_and_split

2021-11-11 Thread Kewen Lin via Gcc-patches
This patch fixes one non-robust split condition, making it
apply on top of the corresponding condition for the define_insn
part; otherwise, the splitting could be performed unexpectedly.

gcc/ChangeLog:

* config/arm/arm.md (*minmax_arithsi_non_canon): Fix split condition.
---
 gcc/config/arm/arm.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 4adc976b8b6..9a27d421484 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -4198,7 +4198,7 @@ (define_insn_and_split "*minmax_arithsi_non_canon"
   "TARGET_32BIT && !arm_eliminable_register (operands[1])
&& !(arm_restrict_it && CONST_INT_P (operands[3]))"
   "#"
-  "TARGET_32BIT && !arm_eliminable_register (operands[1]) && reload_completed"
+  "&& reload_completed"
   [(set (reg:CC CC_REGNUM)
 (compare:CC (match_dup 2) (match_dup 3)))
 
-- 
2.27.0



[PATCH 08/15] alpha: Fix non-robust split condition in define_insn_and_split

2021-11-11 Thread Kewen Lin via Gcc-patches
This patch fixes some non-robust split conditions in some
define_insn_and_splits, making each of them apply on top of
the corresponding condition for the define_insn part; otherwise,
the splitting could be performed unexpectedly.

gcc/ChangeLog:

* config/alpha/alpha.md (*movtf_internal, *movti_internal): Fix split
condition.
---
 gcc/config/alpha/alpha.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/alpha/alpha.md b/gcc/config/alpha/alpha.md
index 98d09d43721..87617afd0c6 100644
--- a/gcc/config/alpha/alpha.md
+++ b/gcc/config/alpha/alpha.md
@@ -3830,7 +3830,7 @@ (define_insn_and_split "*movtf_internal"
   "register_operand (operands[0], TFmode)
|| reg_or_0_operand (operands[1], TFmode)"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(set (match_dup 0) (match_dup 2))
(set (match_dup 1) (match_dup 3))]
   "alpha_split_tmode_pair (operands, TFmode, true);")
@@ -4091,7 +4091,7 @@ (define_insn_and_split "*movti_internal"
 && ! CONSTANT_P (operands[1]))
|| reg_or_0_operand (operands[1], TImode)"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(set (match_dup 0) (match_dup 2))
(set (match_dup 1) (match_dup 3))]
   "alpha_split_tmode_pair (operands, TImode, true);")
-- 
2.27.0



[PATCH 06/15] visium: Fix non-robust split condition in define_insn_and_split

2021-11-11 Thread Kewen Lin via Gcc-patches
This patch fixes some non-robust split conditions in some
define_insn_and_splits, making each of them apply on top of
the corresponding condition for the define_insn part; otherwise,
the splitting could be performed unexpectedly.

gcc/ChangeLog:

* config/visium/visium.md (*add3_insn, *addsi3_insn, *addi3_insn,
*sub3_insn, *subsi3_insn, *subdi3_insn, *neg2_insn,
*negdi2_insn, *and3_insn, *ior3_insn, *xor3_insn,
*one_cmpl2_insn, *ashl3_insn, *ashr3_insn,
*lshr3_insn, *trunchiqi2_insn, *truncsihi2_insn,
*truncdisi2_insn, *extendqihi2_insn, *extendqisi2_insn,
*extendhisi2_insn, *extendsidi2_insn, *zero_extendqihi2_insn,
*zero_extendqisi2_insn, *zero_extendsidi2_insn): Fix split condition.
---
 gcc/config/visium/visium.md | 50 ++---
 1 file changed, 25 insertions(+), 25 deletions(-)

diff --git a/gcc/config/visium/visium.md b/gcc/config/visium/visium.md
index 83ccf088124..ca2234bf253 100644
--- a/gcc/config/visium/visium.md
+++ b/gcc/config/visium/visium.md
@@ -792,7 +792,7 @@ (define_insn_and_split "*add3_insn"
  (match_operand:QHI 2 "register_operand" "r")))]
   "ok_for_simple_arith_logic_operands (operands, mode)"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(parallel [(set (match_dup 0)
   (plus:QHI (match_dup 1) (match_dup 2)))
  (clobber (reg:CC R_FLAGS))])]
@@ -850,7 +850,7 @@ (define_insn_and_split "*addsi3_insn"
 (match_operand:SI 2 "add_operand"  " L,r,J")))]
   "ok_for_simple_arith_logic_operands (operands, SImode)"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(parallel [(set (match_dup 0)
   (plus:SI (match_dup 1) (match_dup 2)))
  (clobber (reg:CC R_FLAGS))])]
@@ -912,7 +912,7 @@ (define_insn_and_split "*addi3_insn"
 (match_operand:DI 2 "add_operand"  " L,J, r")))]
   "ok_for_simple_arith_logic_operands (operands, DImode)"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(const_int 0)]
 {
   visium_split_double_add (PLUS, operands[0], operands[1], operands[2]);
@@ -1007,7 +1007,7 @@ (define_insn_and_split "*sub3_insn"
   (match_operand:QHI 2 "register_operand" "r")))]
   "ok_for_simple_arith_logic_operands (operands, mode)"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(parallel [(set (match_dup 0)
   (minus:QHI (match_dup 1) (match_dup 2)))
  (clobber (reg:CC R_FLAGS))])]
@@ -1064,7 +1064,7 @@ (define_insn_and_split "*subsi3_insn"
  (match_operand:SI 2 "add_operand"  " L,r, J")))]
   "ok_for_simple_arith_logic_operands (operands, SImode)"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(parallel [(set (match_dup 0)
   (minus:SI (match_dup 1) (match_dup 2)))
  (clobber (reg:CC R_FLAGS))])]
@@ -1125,7 +1125,7 @@ (define_insn_and_split "*subdi3_insn"
  (match_operand:DI 2 "add_operand"  " L,J, r")))]
   "ok_for_simple_arith_logic_operands (operands, DImode)"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(const_int 0)]
 {
   visium_split_double_add (MINUS, operands[0], operands[1], operands[2]);
@@ -1209,7 +1209,7 @@ (define_insn_and_split "*neg2_insn"
(neg:I (match_operand:I 1 "register_operand" "r")))]
   "ok_for_simple_arith_logic_operands (operands, mode)"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(parallel [(set (match_dup 0) (neg:I (match_dup 1)))
  (clobber (reg:CC R_FLAGS))])]
   ""
@@ -1253,7 +1253,7 @@ (define_insn_and_split "*negdi2_insn"
(neg:DI (match_operand:DI 1 "register_operand" "r")))]
   "ok_for_simple_arith_logic_operands (operands, DImode)"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(const_int 0)]
 {
   visium_split_double_add (MINUS, operands[0], const0_rtx, operands[1]);
@@ -1415,7 +1415,7 @@ (define_insn_and_split "*and3_insn"
   (match_operand:I 2 "register_operand" "r")))]
   "ok_for_simple_arith_logic_operands (operands, mode)"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(parallel [(set (match_dup 0)
   (and:I (match_dup 1) (match_dup 2)))
  (clobber (reg:CC R_FLAGS))])]
@@ -1453,7 +1453,7 @@ (define_insn_and_split "*ior3_insn"
   (match_operand:I 2 "register_operand" "r")))]
   "ok_for_simple_arith_logic_operands (operands, mode)"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(parallel [(set (match_dup 0)
   (ior:I (match_dup 1) (match_dup 2)))
  (clobber (reg:CC R_FLAGS))])]
@@ -1491,7 +1491,7 @@ (define_insn_and_split "*xor3_insn"
   (match_operand:I 2 "register_operand" "r")))]
   "ok_for_simple_arith_logic_operands (operands, mode)"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(parallel [(set (match_dup 0)
   (xor:I (match_dup 1) (match_dup 2)))
  (clobber (reg:CC 

[PATCH 07/15] xtensa: Fix non-robust split condition in define_insn_and_split

2021-11-11 Thread Kewen Lin via Gcc-patches
This patch fixes some non-robust split conditions in some
define_insn_and_splits, making each of them apply on top of
the corresponding condition for the define_insn part; otherwise,
the splitting could be performed unexpectedly.

gcc/ChangeLog:

* config/xtensa/xtensa.md (movdi_internal, movdf_internal): Fix split
condition.
---
 gcc/config/xtensa/xtensa.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/xtensa/xtensa.md b/gcc/config/xtensa/xtensa.md
index cdf22f14b94..e0bf720d6e0 100644
--- a/gcc/config/xtensa/xtensa.md
+++ b/gcc/config/xtensa/xtensa.md
@@ -779,7 +779,7 @@ (define_insn_and_split "movdi_internal"
   "register_operand (operands[0], DImode)
|| register_operand (operands[1], DImode)"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(set (match_dup 0) (match_dup 2))
(set (match_dup 1) (match_dup 3))]
 {
@@ -1053,7 +1053,7 @@ (define_insn_and_split "movdf_internal"
   "register_operand (operands[0], DFmode)
|| register_operand (operands[1], DFmode)"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(set (match_dup 0) (match_dup 2))
(set (match_dup 1) (match_dup 3))]
 {
-- 
2.27.0



[PATCH 05/15] v850: Fix non-robust split condition in define_insn_and_split

2021-11-11 Thread Kewen Lin via Gcc-patches
This patch fixes some non-robust split conditions in some
define_insn_and_splits, making each of them apply on top of
the corresponding condition for the define_insn part; otherwise,
the splitting could be performed unexpectedly.

gcc/ChangeLog:

* config/v850/v850.md (cbranchsf4, cbranchdf4, *movsicc_normal,
*movsicc_reversed): Fix split condition.
---
 gcc/config/v850/v850.md | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/config/v850/v850.md b/gcc/config/v850/v850.md
index 872f17913de..d4a953c6bdb 100644
--- a/gcc/config/v850/v850.md
+++ b/gcc/config/v850/v850.md
@@ -374,7 +374,7 @@ (define_insn_and_split "cbranchsf4"
  (pc)))]
   "TARGET_USE_FPU"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(set (match_dup 4) (match_dup 5))
(set (pc)
 (if_then_else (match_dup 6)
@@ -428,7 +428,7 @@ (define_insn_and_split "cbranchdf4"
  (pc)))]
   "TARGET_USE_FPU"
   "#"
-  "reload_completed"
+  "&& reload_completed"
 ;; How to get the mode here?
   [(set (match_dup 4) (match_dup 5))
(set (pc)
@@ -1210,7 +1210,7 @@ (define_insn_and_split "*movsicc_normal"
 (match_operand:SI 3 "reg_or_0_operand" "rI")))]
   "(TARGET_V850E_UP)"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(set (reg:CC CC_REGNUM)
(compare:CC (match_dup 4) (match_dup 5)))
(set (match_dup 0)
@@ -1229,7 +1229,7 @@ (define_insn_and_split "*movsicc_reversed"
 (match_operand:SI 3 "reg_or_0_operand" "rJ")))]
   "(TARGET_V850E_UP)"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(set (reg:CC CC_REGNUM)
(compare:CC (match_dup 4) (match_dup 5)))
(set (match_dup 0)
-- 
2.27.0



[PATCH 04/15] s390: Fix non-robust split condition in define_insn_and_split

2021-11-11 Thread Kewen Lin via Gcc-patches
This patch fixes some non-robust split conditions in some
define_insn_and_splits, making each of them apply on top of
the corresponding condition for the define_insn part; otherwise,
the splitting could be performed unexpectedly.

gcc/ChangeLog:

* config/s390/s390.md (*cstorecc_z13): Fix split condition.
* config/s390/vector.md (fprx2_to_tf, tf_to_fprx2): Likewise.
---
 gcc/config/s390/s390.md   | 2 +-
 gcc/config/s390/vector.md | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index 4debdcd1247..1d66c30b9d5 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -6941,7 +6941,7 @@ (define_insn_and_split "*cstorecc_z13"
 (match_operand 3 "const_int_operand"  "")]))]
   "TARGET_Z13"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(set (match_dup 0) (const_int 0))
(set (match_dup 0)
(if_then_else:GPR
diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
index 1ed1d0665d4..8aa4e82c28d 100644
--- a/gcc/config/s390/vector.md
+++ b/gcc/config/s390/vector.md
@@ -641,7 +641,7 @@ (define_insn_and_split "fprx2_to_tf"
   "@
vmrhg\t%v0,%1,%N1
#"
-  "!(MEM_P (operands[0]) && MEM_VOLATILE_P (operands[0]))"
+  "&& !(MEM_P (operands[0]) && MEM_VOLATILE_P (operands[0]))"
   [(set (match_dup 2) (match_dup 3))
(set (match_dup 4) (match_dup 5))]
 {
@@ -916,7 +916,7 @@ (define_insn_and_split "tf_to_fprx2"
(subreg:FPRX2 (match_operand:TF 1 "general_operand"   "v,AR") 0))]
   "TARGET_VXE"
   "#"
-  "!(MEM_P (operands[1]) && MEM_VOLATILE_P (operands[1]))"
+  "&& !(MEM_P (operands[1]) && MEM_VOLATILE_P (operands[1]))"
   [(set (match_dup 2) (match_dup 3))
(set (match_dup 4) (match_dup 5))]
 {
-- 
2.27.0



[PATCH 02/15] m32c: Fix non-robust split condition in define_insn_and_split

2021-11-11 Thread Kewen Lin via Gcc-patches
This patch fixes some non-robust split conditions in some
define_insn_and_splits, making each of them apply on top of
the corresponding condition for the define_insn part; otherwise,
the splitting could be performed unexpectedly.

gcc/ChangeLog:

* config/m32c/cond.md (stzx_reversed_, movhicc__):
Fix split condition.
---
 gcc/config/m32c/cond.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/m32c/cond.md b/gcc/config/m32c/cond.md
index b80b10320fb..ce6493fc9f6 100644
--- a/gcc/config/m32c/cond.md
+++ b/gcc/config/m32c/cond.md
@@ -106,7 +106,7 @@ (define_insn_and_split "stzx_reversed_"
 (match_operand:QHI 2 "const_int_operand" "")))]
   "(TARGET_A24 || GET_MODE (operands[0]) == QImode) && reload_completed"
   "#"
-  ""
+  "&& 1"
   [(set (match_dup 0)
(if_then_else:QHI (eq (reg:CC FLG_REGNO) (const_int 0))
  (match_dup 2)
@@ -230,7 +230,7 @@ (define_insn_and_split "movhicc__"
  (match_operand:HI 4 "const_int_operand" "")))]
   "TARGET_A24"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(set (reg:CC FLG_REGNO)
(compare (match_dup 1)
 (match_dup 2)))
-- 
2.27.0



[PATCH 03/15] rx: Fix non-robust split condition in define_insn_and_split

2021-11-11 Thread Kewen Lin via Gcc-patches
This patch fixes one non-robust split condition, making it
apply on top of the corresponding condition for the define_insn
part; otherwise, the splitting could be performed unexpectedly.

gcc/ChangeLog:

* config/rx/rx.md (cstoresf4): Fix split condition.
---
 gcc/config/rx/rx.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/rx/rx.md b/gcc/config/rx/rx.md
index b76fce97bdc..c5297685a38 100644
--- a/gcc/config/rx/rx.md
+++ b/gcc/config/rx/rx.md
@@ -714,7 +714,7 @@ (define_insn_and_split "cstoresf4"
  (match_operand:SF 3 "rx_source_operand" "rFQ")]))]
   "ALLOW_RX_FPU_INSNS"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(const_int 0)]
 {
   rtx flags, x;
-- 
2.27.0



[PATCH 00/15] Fix non-robust split condition in define_insn_and_split

2021-11-11 Thread Kewen Lin via Gcc-patches
Hi,

This trivial patch series is a secondary product of the earlier
investigation into how many define_insn_and_split cases have a
split condition that is neither applied on top of the condition
for the define_insn part nor contains it, done when there were
discussions on whether we should warn about empty split conditions
or join both conditions implicitly, etc.  (See the threads [1][2].)

For some of the investigated define_insn_and_splits, the
corresponding split condition is suspected not to be robust,
especially where the split condition consists only of
reload_completed.  Lacking a good understanding of the related port
and the context of the code, I could be wrong, but I think it is
worth raising them so that they get either fixed or clarified.  It
would also be good preparation for the possible joining of the
conditions in the future.  For some ports, with the proposed fixes
applied, the split conditions in all define_insn_and_splits will
either have the explicit leading "&&" or fully contain the
condition of the define_insn part.  In other words, the implicit
joining of conditions would be a no-op for such a port, and no
other checks or fixes would be needed for it.
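
As a minimal sketch of what is meant here (a hypothetical pattern:
TARGET_FOO, CC_REGNUM and the insn itself are made up and not taken
from any port), the definition below is robust because the leading
"&&" makes the generators prepend the insn condition, so the split
only applies under "TARGET_FOO && reload_completed"; writing just
"reload_completed" would allow the split even when TARGET_FOO does
not hold:

  ;; Hypothetical example only.
  (define_insn_and_split "*foo_negsi2"
    [(set (match_operand:SI 0 "register_operand" "=r")
          (neg:SI (match_operand:SI 1 "register_operand" "r")))]
    "TARGET_FOO"           ;; insn condition
    "#"
    "&& reload_completed"  ;; effectively TARGET_FOO && reload_completed
    [(parallel [(set (match_dup 0) (neg:SI (match_dup 1)))
                (clobber (reg:CC CC_REGNUM))])]
    "")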

[1] https://gcc.gnu.org/pipermail/gcc-patches/2021-June/571647.html
[2] https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572120.html

BR,
Kewen
-

Kewen Lin (15):
  frv: Fix non-robust split condition in define_insn_and_split
  m32c: Fix non-robust split condition in define_insn_and_split
  rx: Fix non-robust split condition in define_insn_and_split
  s390: Fix non-robust split condition in define_insn_and_split
  v850: Fix non-robust split condition in define_insn_and_split
  visium: Fix non-robust split condition in define_insn_and_split
  xtensa: Fix non-robust split condition in define_insn_and_split
  alpha: Fix non-robust split condition in define_insn_and_split
  arm: Fix non-robust split condition in define_insn_and_split
  bfin: Fix non-robust split condition in define_insn_and_split
  csky: Fix non-robust split condition in define_insn_and_split
  i386: Fix non-robust split condition in define_insn_and_split
  ia64: Fix non-robust split condition in define_insn_and_split
  mips: Fix non-robust split condition in define_insn_and_split
  sh: Fix non-robust split condition in define_insn_and_split

 gcc/config/alpha/alpha.md   |  4 +--
 gcc/config/arm/arm.md   |  2 +-
 gcc/config/bfin/bfin.md |  4 +--
 gcc/config/csky/csky.md | 28 ++---
 gcc/config/frv/frv.md   | 10 
 gcc/config/i386/i386.md | 20 +++
 gcc/config/ia64/vect.md |  4 +--
 gcc/config/m32c/cond.md |  4 +--
 gcc/config/mips/mips.md |  4 +--
 gcc/config/rx/rx.md |  2 +-
 gcc/config/s390/s390.md |  2 +-
 gcc/config/s390/vector.md   |  4 +--
 gcc/config/sh/sh.md |  8 +++---
 gcc/config/v850/v850.md |  8 +++---
 gcc/config/visium/visium.md | 50 ++---
 gcc/config/xtensa/xtensa.md |  4 +--
 16 files changed, 79 insertions(+), 79 deletions(-)

-- 
2.27.0



[PATCH 01/15] frv: Fix non-robust split condition in define_insn_and_split

2021-11-11 Thread Kewen Lin via Gcc-patches
This patch fixes some non-robust split conditions in some
define_insn_and_splits, making each of them apply on top of
the corresponding condition for the define_insn part; otherwise,
the splitting could be performed unexpectedly.

gcc/ChangeLog:

* config/frv/frv.md (*abssi2_internal, *minmax_si_signed,
*minmax_si_unsigned, *minmax_sf, *minmax_df): Fix split condition.
---
 gcc/config/frv/frv.md | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/gcc/config/frv/frv.md b/gcc/config/frv/frv.md
index a2aa1b2d2ac..fea6dedc53d 100644
--- a/gcc/config/frv/frv.md
+++ b/gcc/config/frv/frv.md
@@ -4676,7 +4676,7 @@ (define_insn_and_split "*abssi2_internal"
(clobber (match_operand:CC_CCR 3 "icr_operand" "=v,v"))]
   "TARGET_COND_MOVE"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(match_dup 4)]
   "operands[4] = frv_split_abs (operands);"
   [(set_attr "length" "12,16")
@@ -4717,7 +4717,7 @@ (define_insn_and_split "*minmax_si_signed"
(clobber (match_operand:CC_CCR 5 "icr_operand" "=v,v,v"))]
   "TARGET_COND_MOVE"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(match_dup 6)]
   "operands[6] = frv_split_minmax (operands);"
   [(set_attr "length" "12,12,16")
@@ -4758,7 +4758,7 @@ (define_insn_and_split "*minmax_si_unsigned"
(clobber (match_operand:CC_CCR 5 "icr_operand" "=v,v,v"))]
   "TARGET_COND_MOVE"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(match_dup 6)]
   "operands[6] = frv_split_minmax (operands);"
   [(set_attr "length" "12,12,16")
@@ -4799,7 +4799,7 @@ (define_insn_and_split "*minmax_sf"
(clobber (match_operand:CC_CCR 5 "fcr_operand" "=w,w,w"))]
   "TARGET_COND_MOVE && TARGET_HARD_FLOAT"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(match_dup 6)]
   "operands[6] = frv_split_minmax (operands);"
   [(set_attr "length" "12,12,16")
@@ -4840,7 +4840,7 @@ (define_insn_and_split "*minmax_df"
(clobber (match_operand:CC_CCR 5 "fcr_operand" "=w,w,w"))]
   "TARGET_COND_MOVE && TARGET_HARD_FLOAT && TARGET_DOUBLE"
   "#"
-  "reload_completed"
+  "&& reload_completed"
   [(match_dup 6)]
   "operands[6] = frv_split_minmax (operands);"
   [(set_attr "length" "12,12,16")
-- 
2.27.0



[PATCH 04/11] cris: Update unexpected empty split condition

2021-06-01 Thread Kewen Lin via Gcc-patches
gcc/ChangeLog:

* config/cris/cris.md (*addi_reload): Fix empty split condition.
---
 gcc/config/cris/cris.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/cris/cris.md b/gcc/config/cris/cris.md
index 7de0ec63fcf..d5a3c703a83 100644
--- a/gcc/config/cris/cris.md
+++ b/gcc/config/cris/cris.md
@@ -1311,7 +1311,7 @@ (define_insn_and_split "*addi_reload"
&& (INTVAL (operands[3]) == 2 || INTVAL (operands[3]) == 4)
&& (reload_in_progress || reload_completed)"
   "#"
-  ""
+  "&& 1"
   [(set (match_dup 0)
(plus:SI (ashift:SI (match_dup 2) (match_dup 3)) (match_dup 1)))]
   "operands[3] = operands[3] == const2_rtx ? const1_rtx : const2_rtx;")
-- 
2.17.1



[PATCH 11/11] sparc: Update unexpected empty split condition

2021-06-01 Thread Kewen Lin via Gcc-patches
gcc/ChangeLog:

* config/sparc/sparc.md (*snedi_zero_vis3,
*neg_snedi_zero_subxc, *plus_snedi_zero,
*plus_plus_snedi_zero, *minus_snedi_zero,
*minus_minus_snedi_zero): Fix empty split condition.
---
 gcc/config/sparc/sparc.md | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/gcc/config/sparc/sparc.md b/gcc/config/sparc/sparc.md
index c5d369626cc..0f85cb192c8 100644
--- a/gcc/config/sparc/sparc.md
+++ b/gcc/config/sparc/sparc.md
@@ -855,7 +855,7 @@ (define_insn_and_split "*snedi_zero_vis3"
(clobber (reg:CCX CC_REG))]
   "TARGET_ARCH64 && TARGET_VIS3"
   "#"
-  ""
+  "&& 1"
   [(set (reg:CCXC CC_REG) (compare:CCXC (not:DI (match_dup 1)) (const_int -1)))
(set (match_dup 0) (ltu:W (reg:CCXC CC_REG) (const_int 0)))]
   ""
@@ -882,7 +882,7 @@ (define_insn_and_split "*neg_snedi_zero_subxc"
(clobber (reg:CCX CC_REG))]
   "TARGET_ARCH64 && TARGET_SUBXC"
   "#"
-  ""
+  "&& 1"
   [(set (reg:CCXC CC_REG) (compare:CCXC (not:DI (match_dup 1)) (const_int -1)))
(set (match_dup 0) (neg:W (ltu:W (reg:CCXC CC_REG) (const_int 0]
   ""
@@ -984,7 +984,7 @@ (define_insn_and_split "*plus_snedi_zero"
(clobber (reg:CCX CC_REG))]
   "TARGET_ARCH64 && TARGET_VIS3"
   "#"
-  ""
+  "&& 1"
   [(set (reg:CCXC CC_REG) (compare:CCXC (not:DI (match_dup 1)) (const_int -1)))
(set (match_dup 0) (plus:W (ltu:W (reg:CCXC CC_REG) (const_int 0))
  (match_dup 2)))]
@@ -1000,7 +1000,7 @@ (define_insn_and_split "*plus_plus_snedi_zero"
(clobber (reg:CCX CC_REG))]
   "TARGET_ARCH64 && TARGET_VIS3"
   "#"
-  ""
+  "&& 1"
   [(set (reg:CCXC CC_REG) (compare:CCXC (not:DI (match_dup 1)) (const_int -1)))
(set (match_dup 0) (plus:W (plus:W (ltu:W (reg:CCXC CC_REG) (const_int 0))
  (match_dup 2))
@@ -1048,7 +1048,7 @@ (define_insn_and_split "*minus_snedi_zero"
(clobber (reg:CCX CC_REG))]
   "TARGET_ARCH64 && TARGET_SUBXC"
   "#"
-  ""
+  "&& 1"
   [(set (reg:CCXC CC_REG) (compare:CCXC (not:DI (match_dup 1)) (const_int -1)))
(set (match_dup 0) (minus:W (match_dup 2)
   (ltu:W (reg:CCXC CC_REG) (const_int 0]
@@ -1064,7 +1064,7 @@ (define_insn_and_split "*minus_minus_snedi_zero"
(clobber (reg:CCX CC_REG))]
   "TARGET_ARCH64 && TARGET_SUBXC"
   "#"
-  ""
+  "&& 1"
   [(set (reg:CCXC CC_REG) (compare:CCXC (not:DI (match_dup 1)) (const_int -1)))
(set (match_dup 0) (minus:W (minus:W (match_dup 2)
(ltu:W (reg:CCXC CC_REG) (const_int 0)))
-- 
2.17.1



[PATCH 10/11] sh: Update unexpected empty split condition

2021-06-01 Thread Kewen Lin via Gcc-patches
gcc/ChangeLog:

* config/sh/sh.md (doloop_end_split): Fix empty split condition.
---
 gcc/config/sh/sh.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/sh/sh.md b/gcc/config/sh/sh.md
index e3af9ae21c1..93ee7c9a7de 100644
--- a/gcc/config/sh/sh.md
+++ b/gcc/config/sh/sh.md
@@ -6424,7 +6424,7 @@ (define_insn_and_split "doloop_end_split"
(clobber (reg:SI T_REG))]
   "TARGET_SH2"
   "#"
-  ""
+  "&& 1"
   [(parallel [(set (reg:SI T_REG)
   (eq:SI (match_dup 2) (const_int 1)))
  (set (match_dup 0) (plus:SI (match_dup 2) (const_int -1)))])
-- 
2.17.1



[PATCH 09/11] or1k: Update unexpected empty split condition

2021-06-01 Thread Kewen Lin via Gcc-patches
gcc/ChangeLog:

* config/or1k/or1k.md (*movdi): Fix empty split condition.
---
 gcc/config/or1k/or1k.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/or1k/or1k.md b/gcc/config/or1k/or1k.md
index eb94efba0e4..495b3e277ba 100644
--- a/gcc/config/or1k/or1k.md
+++ b/gcc/config/or1k/or1k.md
@@ -351,7 +351,7 @@ (define_insn_and_split "*movdi"
   "register_operand (operands[0], DImode)
|| reg_or_0_operand (operands[1], DImode)"
   "#"
-  ""
+  "&& 1"
   [(const_int 0)]
 {
   rtx l0 = operand_subword (operands[0], 0, 0, DImode);
-- 
2.17.1



[PATCH 08/11] mips: Update unexpected empty split condition

2021-06-01 Thread Kewen Lin via Gcc-patches
gcc/ChangeLog:

* config/mips/mips.md (, bswapsi2, bswapdi2): Fix empty
split condition.
---
 gcc/config/mips/mips.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mips.md
index eef3cfd50a8..455b9b802f6 100644
--- a/gcc/config/mips/mips.md
+++ b/gcc/config/mips/mips.md
@@ -5835,7 +5835,7 @@ (define_insn_and_split ""
 (match_operand:SI 2 "immediate_operand" "I")))]
   "TARGET_MIPS16"
   "#"
-  ""
+  "&& 1"
   [(set (match_dup 0) (match_dup 1))
(set (match_dup 0) (lshiftrt:SI (match_dup 0) (match_dup 2)))]
   ""
@@ -5871,7 +5871,7 @@ (define_insn_and_split "bswapsi2"
(bswap:SI (match_operand:SI 1 "register_operand" "d")))]
   "ISA_HAS_WSBH && ISA_HAS_ROR"
   "#"
-  ""
+  "&& 1"
   [(set (match_dup 0) (unspec:SI [(match_dup 1)] UNSPEC_WSBH))
(set (match_dup 0) (rotatert:SI (match_dup 0) (const_int 16)))]
   ""
@@ -5882,7 +5882,7 @@ (define_insn_and_split "bswapdi2"
(bswap:DI (match_operand:DI 1 "register_operand" "d")))]
   "TARGET_64BIT && ISA_HAS_WSBH"
   "#"
-  ""
+  "&& 1"
   [(set (match_dup 0) (unspec:DI [(match_dup 1)] UNSPEC_DSBH))
(set (match_dup 0) (unspec:DI [(match_dup 0)] UNSPEC_DSHD))]
   ""
-- 
2.17.1



[PATCH 05/11] h8300: Update unexpected empty split condition

2021-06-01 Thread Kewen Lin via Gcc-patches
gcc/ChangeLog:

* config/h8300/combiner.md (*andsi3_lshiftrt_n_sb): Fix empty split
condition.
---
 gcc/config/h8300/combiner.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/h8300/combiner.md b/gcc/config/h8300/combiner.md
index 20e19da0419..e31bd507a6f 100644
--- a/gcc/config/h8300/combiner.md
+++ b/gcc/config/h8300/combiner.md
@@ -271,7 +271,7 @@ (define_insn_and_split "*andsi3_lshiftrt_n_sb"
   "exact_log2 (INTVAL (operands[3])) < 16
&& INTVAL (operands[2]) + exact_log2 (INTVAL (operands[3])) == 31"
   "#"
-  ""
+  "&& 1"
   [(parallel [(set (match_dup 0)
   (and:SI (lshiftrt:SI (match_dup 1) (match_dup 2))
   (match_dup 3)))
-- 
2.17.1



[PATCH 03/11] arm: Update unexpected empty split condition

2021-06-01 Thread Kewen Lin via Gcc-patches
gcc/ChangeLog:

* config/arm/vfp.md (no_literal_pool_df_immediate,
no_literal_pool_sf_immediate): Fix empty split condition.
---
 gcc/config/arm/vfp.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md
index f97af92716b..55b6c1ac585 100644
--- a/gcc/config/arm/vfp.md
+++ b/gcc/config/arm/vfp.md
@@ -2129,7 +2129,7 @@ (define_insn_and_split "no_literal_pool_df_immediate"
&& !arm_const_double_rtx (operands[1])
&& !(TARGET_VFP_DOUBLE && vfp3_const_double_rtx (operands[1]))"
   "#"
-  ""
+  "&& 1"
   [(const_int 0)]
 {
   long buf[2];
@@ -2154,7 +2154,7 @@ (define_insn_and_split "no_literal_pool_sf_immediate"
&& TARGET_VFP_BASE
&& !vfp3_const_double_rtx (operands[1])"
   "#"
-  ""
+  "&& 1"
   [(const_int 0)]
 {
   long buf;
-- 
2.17.1



[PATCH 06/11] i386: Update unexpected empty split condition

2021-06-01 Thread Kewen Lin via Gcc-patches
gcc/ChangeLog:

* config/i386/i386.md (*load_tp_x32_zext, *add_tp_x32_zext,
*tls_dynamic_gnu2_combine_32): Fix empty split condition.
* config/i386/sse.md (*_pmovmskb_lt,
*_pmovmskb_zext_lt, *sse2_pmovmskb_ext_lt,
*_pblendvb_lt): Likewise.
---
 gcc/config/i386/i386.md | 6 +++---
 gcc/config/i386/sse.md  | 8 
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 9ff35d9a607..545d048906d 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -15712,7 +15712,7 @@ (define_insn_and_split "*load_tp_x32_zext"
  (unspec:SI [(const_int 0)] UNSPEC_TP)))]
   "TARGET_X32"
   "#"
-  ""
+  "&& 1"
   [(set (match_dup 0)
(zero_extend:DI (match_dup 1)))]
 {
@@ -15750,7 +15750,7 @@ (define_insn_and_split "*add_tp_x32_zext"
(clobber (reg:CC FLAGS_REG))]
   "TARGET_X32"
   "#"
-  ""
+  "&& 1"
   [(parallel
  [(set (match_dup 0)
   (zero_extend:DI
@@ -15841,7 +15841,7 @@ (define_insn_and_split "*tls_dynamic_gnu2_combine_32"
(clobber (reg:CC FLAGS_REG))]
   "!TARGET_64BIT && TARGET_GNU2_TLS"
   "#"
-  ""
+  "&& 1"
   [(set (match_dup 0) (match_dup 5))]
 {
   operands[5] = can_create_pseudo_p () ? gen_reg_rtx (Pmode) : operands[0];
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 9d3728d1cb0..a9d78030119 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -16467,7 +16467,7 @@ (define_insn_and_split "*_pmovmskb_lt"
  UNSPEC_MOVMSK))]
   "TARGET_SSE2"
   "#"
-  ""
+  "&& 1"
   [(set (match_dup 0)
(unspec:SI [(match_dup 1)] UNSPEC_MOVMSK))]
   ""
@@ -16489,7 +16489,7 @@ (define_insn_and_split "*_pmovmskb_zext_lt"
UNSPEC_MOVMSK)))]
   "TARGET_64BIT && TARGET_SSE2"
   "#"
-  ""
+  "&& 1"
   [(set (match_dup 0)
(zero_extend:DI (unspec:SI [(match_dup 1)] UNSPEC_MOVMSK)))]
   ""
@@ -16511,7 +16511,7 @@ (define_insn_and_split "*sse2_pmovmskb_ext_lt"
UNSPEC_MOVMSK)))]
   "TARGET_64BIT && TARGET_SSE2"
   "#"
-  ""
+  "&& 1"
   [(set (match_dup 0)
(sign_extend:DI (unspec:SI [(match_dup 1)] UNSPEC_MOVMSK)))]
   ""
@@ -17769,7 +17769,7 @@ (define_insn_and_split "*_pblendvb_lt"
  UNSPEC_BLENDV))]
   "TARGET_SSE4_1"
   "#"
-  ""
+  "&& 1"
   [(set (match_dup 0)
(unspec:VI1_AVX2
 [(match_dup 1) (match_dup 2) (match_dup 3)] UNSPEC_BLENDV))]
-- 
2.17.1



[PATCH 07/11] m68k: Update unexpected empty split condition

2021-06-01 Thread Kewen Lin via Gcc-patches
gcc/ChangeLog:

* config/m68k/m68k.md (*zero_extend_inc, *zero_extend_dec,
*zero_extendsidi2): Fix empty split condition.
---
 gcc/config/m68k/m68k.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/config/m68k/m68k.md b/gcc/config/m68k/m68k.md
index 59a456cd496..82d075e8bf0 100644
--- a/gcc/config/m68k/m68k.md
+++ b/gcc/config/m68k/m68k.md
@@ -1693,7 +1693,7 @@ (define_insn_and_split "*zero_extend_inc"
GET_MODE_CLASS (GET_MODE (operands[1])) == MODE_INT &&
GET_MODE_SIZE (GET_MODE (operands[0])) == GET_MODE_SIZE (GET_MODE 
(operands[1])) * 2"
   "#"
-  ""
+  "&& 1"
   [(set (match_dup 0)
(const_int 0))
(set (match_dup 0)
@@ -1710,7 +1710,7 @@ (define_insn_and_split "*zero_extend_dec"
GET_MODE_CLASS (GET_MODE (operands[1])) == MODE_INT &&
GET_MODE_SIZE (GET_MODE (operands[0])) == GET_MODE_SIZE (GET_MODE 
(operands[1])) * 2"
   "#"
-  ""
+  "&& 1"
   [(set (match_dup 0)
(match_dup 1))
(set (match_dup 0)
@@ -1764,7 +1764,7 @@ (define_insn_and_split "*zero_extendsidi2"
(zero_extend:DI (match_operand:SI 1 "nonimmediate_src_operand" "")))]
   "GET_CODE (operands[0]) != MEM || GET_CODE (operands[1]) != MEM"
   "#"
-  ""
+  "&& 1"
   [(set (match_dup 2)
(match_dup 1))
(set (match_dup 3)
-- 
2.17.1



[PATCH 02/11] arc: Update unexpected empty split condition

2021-06-01 Thread Kewen Lin via Gcc-patches
gcc/ChangeLog:

* config/arc/arc.md (*bbit_di): Fix empty split condition.
---
 gcc/config/arc/arc.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md
index 7a52551eef5..a03840c4c36 100644
--- a/gcc/config/arc/arc.md
+++ b/gcc/config/arc/arc.md
@@ -5020,7 +5020,7 @@ (define_insn_and_split "*bbit_di"
(clobber (reg:CC_ZN CC_REG))]
   "!CROSSING_JUMP_P (insn)"
   "#"
-  ""
+  "&& 1"
   [(parallel
  [(set (pc) (if_then_else (match_dup 3) (label_ref (match_dup 0)) (pc)))
   (clobber (reg:CC_ZN CC_REG))])]
-- 
2.17.1



[PATCH 01/11] gen: Emit error msg for empty split condition

2021-06-01 Thread Kewen Lin via Gcc-patches
As Segher suggested, this patch emits an error message if the
split condition of a define_insn_and_split is empty while the
insn condition isn't.

gcc/ChangeLog:

* gensupport.c (process_rtx): Emit error message for empty
split condition in define_insn_and_split while the insn
condition isn't.
---
 gcc/gensupport.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/gensupport.c b/gcc/gensupport.c
index 0f19bd70664..52cee120215 100644
--- a/gcc/gensupport.c
+++ b/gcc/gensupport.c
@@ -620,6 +620,9 @@ process_rtx (rtx desc, file_location loc)
  }
else if (GET_CODE (desc) == DEFINE_INSN_AND_REWRITE)
  error_at (loc, "the rewrite condition must start with `&&'");
+   else if (split_cond[0] == '\0' && strlen (XSTR (desc, 2)) != 0)
+ error_at (loc, "the split condition mustn't be empty if the "
+"insn condition isn't empty");
XSTR (split, 1) = split_cond;
if (GET_CODE (desc) == DEFINE_INSN_AND_REWRITE)
  XVEC (split, 2) = gen_rewrite_sequence (XVEC (desc, 1));
-- 
2.17.1



[RFC/PATCH 00/11] Fix up some unexpected empty split conditions

2021-06-01 Thread Kewen Lin via Gcc-patches
Hi all,

A define_insn_and_split should avoid using an empty split condition
if the condition for the define_insn part isn't empty; otherwise it
can sometimes result in unexpected consequences, since the split
will always be done even if the insn condition doesn't hold.
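
As a minimal sketch of that pitfall (a hypothetical pattern: the
insn and TARGET_BAR are made up, not taken from any port), suppose
the definition below originally had "" as its split condition; the
splitter could then fire even when TARGET_BAR is false.  With
"&& 1" the effective split condition becomes just the insn
condition again:

  ;; Hypothetical example only.
  (define_insn_and_split "*bar_extendqisi2"
    [(set (match_operand:SI 0 "register_operand" "=r")
          (sign_extend:SI (match_operand:QI 1 "register_operand" "r")))]
    "TARGET_BAR"  ;; insn condition
    "#"
    "&& 1"        ;; was "": would split unconditionally, ignoring TARGET_BAR
    [(set (match_dup 0) (ashift:SI (match_dup 2) (const_int 24)))
     (set (match_dup 0) (ashiftrt:SI (match_dup 0) (const_int 24)))]
    "operands[2] = gen_lowpart (SImode, operands[1]);")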

To avoid forgetting to add "&& 1" to the split condition, as
Segher suggested in thread [1], this series adds a check that
raises an error when it catches such unexpected cases.  With
this new check, we have to fix up some existing
define_insn_and_splits that are detected as errors.  I hope none
of these places intentionally keep the split condition blank.

Any comments are highly appreciated.

BR,
Kewen

[1] https://gcc.gnu.org/pipermail/gcc-patches/2021-March/566970.html

Kewen Lin (11):
  gen: Emit error msg for empty split condition
  arc: Update unexpected empty split condition
  arm: Update unexpected empty split condition
  cris: Update unexpected empty split condition
  h8300: Update unexpected empty split condition
  i386: Update unexpected empty split condition
  m68k: Update unexpected empty split condition
  mips: Update unexpected empty split condition
  or1k: Update unexpected empty split condition
  sh: Update unexpected empty split condition
  sparc: Update unexpected empty split condition

 gcc/config/arc/arc.md|  2 +-
 gcc/config/arm/vfp.md|  4 ++--
 gcc/config/cris/cris.md  |  2 +-
 gcc/config/h8300/combiner.md |  2 +-
 gcc/config/i386/i386.md  |  6 +++---
 gcc/config/i386/sse.md   |  8 
 gcc/config/m68k/m68k.md  |  6 +++---
 gcc/config/mips/mips.md  |  6 +++---
 gcc/config/or1k/or1k.md  |  2 +-
 gcc/config/sh/sh.md  |  2 +-
 gcc/config/sparc/sparc.md| 12 ++--
 gcc/gensupport.c |  3 +++
 12 files changed, 29 insertions(+), 26 deletions(-)

-- 
2.17.1