[gcc r15-2374] vect: Fix single_imm_use in tree_vect_patterns

2024-07-29 Thread Feng Xue via Gcc-cvs
https://gcc.gnu.org/g:49339d8b7e03a7ba0d4a5e118af993f175485b41

commit r15-2374-g49339d8b7e03a7ba0d4a5e118af993f175485b41
Author: Feng Xue 
Date:   Fri Jun 14 15:49:23 2024 +0800

vect: Fix single_imm_use in tree_vect_patterns

Pattern statements coexist with normal statements but are not linked into the
function body, so utility procedures that depend on the def/use graph, such as
counting the uses of a pseudo value defined by a pattern statement, must not
be invoked on pattern statements. This patch fixes a bug of this kind in vect
pattern formation.

2024-06-14 Feng Xue 

gcc/
* tree-vect-patterns.cc (vect_recog_bitfield_ref_pattern): Only call
single_imm_use if statement is not generated from pattern 
recognition.

Diff:
---
 gcc/tree-vect-patterns.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 5fbd1a4fa6b4..4674a16d15f4 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -2702,7 +2702,8 @@ vect_recog_bitfield_ref_pattern (vec_info *vinfo, 
stmt_vec_info stmt_info,
   /* If the only use of the result of this BIT_FIELD_REF + CONVERT is a
  PLUS_EXPR then do the shift last as some targets can combine the shift and
  add into a single instruction.  */
-  if (lhs && single_imm_use (lhs, &use_p, &use_stmt))
+  if (lhs && !is_pattern_stmt_p (stmt_info)
+  && single_imm_use (lhs, &use_p, &use_stmt))
 {
   if (gimple_code (use_stmt) == GIMPLE_ASSIGN
  && gimple_assign_rhs_code (use_stmt) == PLUS_EXPR)


[gcc r15-2097] vect: Optimize order of lane-reducing operations in loop def-use cycles

2024-07-17 Thread Feng Xue via Gcc-cvs
https://gcc.gnu.org/g:db3c8c9726d0bafbb9f85b6d7027fe83602643e7

commit r15-2097-gdb3c8c9726d0bafbb9f85b6d7027fe83602643e7
Author: Feng Xue 
Date:   Wed May 29 17:28:14 2024 +0800

vect: Optimize order of lane-reducing operations in loop def-use cycles

When transforming multiple lane-reducing operations in a loop reduction chain,
the corresponding vectorized statements were originally generated into def-use
cycles starting from index 0. A def-use cycle with a smaller index would
therefore contain more statements, which means more instruction dependency.
For example:

   int sum = 1;
   for (i)
 {
   sum += d0[i] * d1[i];  // dot-prod 
   sum += w[i];   // widen-sum 
   sum += abs(s0[i] - s1[i]); // sad 
   sum += n[i];   // normal 
 }

Original transformation result:

   for (i / 16)
 {
   sum_v0 = DOT_PROD (d0_v0[i: 0 ~ 15], d1_v0[i: 0 ~ 15], sum_v0);
   sum_v1 = sum_v1;  // copy
   sum_v2 = sum_v2;  // copy
   sum_v3 = sum_v3;  // copy

   sum_v0 = WIDEN_SUM (w_v0[i: 0 ~ 15], sum_v0);
   sum_v1 = sum_v1;  // copy
   sum_v2 = sum_v2;  // copy
   sum_v3 = sum_v3;  // copy

   sum_v0 = SAD (s0_v0[i: 0 ~ 7 ], s1_v0[i: 0 ~ 7 ], sum_v0);
   sum_v1 = SAD (s0_v1[i: 8 ~ 15], s1_v1[i: 8 ~ 15], sum_v1);
   sum_v2 = sum_v2;  // copy
   sum_v3 = sum_v3;  // copy

   ...
 }

For higher instruction parallelism in the final vectorized loop, it is better
to distribute the effective vector lane-reducing ops evenly among all def-use
cycles. With the transformation below, DOT_PROD, WIDEN_SUM and the SADs are
generated into separate cycles, so the instruction dependencies among them are
eliminated.

   for (i / 16)
 {
   sum_v0 = DOT_PROD (d0_v0[i: 0 ~ 15], d1_v0[i: 0 ~ 15], sum_v0);
   sum_v1 = sum_v1;  // copy
   sum_v2 = sum_v2;  // copy
   sum_v3 = sum_v3;  // copy

   sum_v0 = sum_v0;  // copy
   sum_v1 = WIDEN_SUM (w_v1[i: 0 ~ 15], sum_v1);
   sum_v2 = sum_v2;  // copy
   sum_v3 = sum_v3;  // copy

   sum_v0 = sum_v0;  // copy
   sum_v1 = sum_v1;  // copy
   sum_v2 = SAD (s0_v2[i: 0 ~ 7 ], s1_v2[i: 0 ~ 7 ], sum_v2);
   sum_v3 = SAD (s0_v3[i: 8 ~ 15], s1_v3[i: 8 ~ 15], sum_v3);

   ...
 }
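
The placement can be sketched outside GCC as a simple round-robin: each
lane-reducing op starts at a running cycle index and advances it by the
number of vector statements it actually needs.  This only illustrates the
idea behind the new reduc_result_pos field; the op and statement counts are
taken from the example above.

  #include <cstdio>

  int main ()
  {
    const unsigned ncycles = 4;     /* def-use cycles sum_v0 .. sum_v3 */
    const struct { const char *name; unsigned nstmts; } ops[]
      = { { "DOT_PROD", 1 }, { "WIDEN_SUM", 1 }, { "SAD", 2 } };

    unsigned pos = 0;               /* running start cycle */
    for (const auto &op : ops)
      {
        for (unsigned k = 0; k < op.nstmts; k++)
          printf ("%-9s -> sum_v%u\n", op.name, (pos + k) % ncycles);
        pos = (pos + op.nstmts) % ncycles;  /* advance past the cycles used */
      }
    return 0;
  }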

2024-03-22 Feng Xue 

gcc/
PR tree-optimization/114440
* tree-vectorizer.h (struct _stmt_vec_info): Add a new field
reduc_result_pos.
* tree-vect-loop.cc (vect_transform_reduction): Generate 
lane-reducing
statements in an optimized order.

Diff:
---
 gcc/tree-vect-loop.cc | 64 +--
 gcc/tree-vectorizer.h |  6 +
 2 files changed, 63 insertions(+), 7 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 1c3dbf4bc71b..d7d628efa60f 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -8844,6 +8844,7 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
   sum += d0[i] * d1[i];  // dot-prod 
   sum += w[i];   // widen-sum 
   sum += abs(s0[i] - s1[i]); // sad 
+  sum += n[i];   // normal 
 }
 
 The vector size is 128-bit,vectorization factor is 16.  Reduction
@@ -8861,19 +8862,27 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
   sum_v2 = sum_v2;  // copy
   sum_v3 = sum_v3;  // copy
 
-  sum_v0 = WIDEN_SUM (w_v0[i: 0 ~ 15], sum_v0);
-  sum_v1 = sum_v1;  // copy
+  sum_v0 = sum_v0;  // copy
+  sum_v1 = WIDEN_SUM (w_v1[i: 0 ~ 15], sum_v1);
   sum_v2 = sum_v2;  // copy
   sum_v3 = sum_v3;  // copy
 
-  sum_v0 = SAD (s0_v0[i: 0 ~ 7 ], s1_v0[i: 0 ~ 7 ], sum_v0);
-  sum_v1 = SAD (s0_v1[i: 8 ~ 15], s1_v1[i: 8 ~ 15], sum_v1);
-  sum_v2 = sum_v2;  // copy
+  sum_v0 = sum_v0;  // copy
+  sum_v1 = SAD (s0_v1[i: 0 ~ 7 ], s1_v1[i: 0 ~ 7 ], sum_v1);
+  sum_v2 = SAD (s0_v2[i: 8 ~ 15], s1_v2[i: 8 ~ 15], sum_v2);
   sum_v3 = sum_v3;  // copy
+
+  sum_v0 += n_v0[i: 0  ~ 3 ];
+  sum_v1 += n_v1[i: 4  ~ 7 ];
+  sum_v2 += n_v2[i: 8  ~ 11];
+  sum_v3 += n_v3[i: 12 ~ 15];
 }
 
-  sum_v = sum_v0 + sum_v1 + sum_v2 + sum_v3;   // = sum_v0 + sum_v1
-   */
+Moreover, for a higher instruction parallelism in final vectorized
+loop, it is considered to make those effective vector lane-reducing
+ops be distributed evenly among all def-use cycles.  In the above
+example, DOT_PROD, WIDEN_SUM and SADs are generated into 

[gcc r15-2096] vect: Support multiple lane-reducing operations for loop reduction [PR114440]

2024-07-17 Thread Feng Xue via Gcc-cvs
https://gcc.gnu.org/g:178cc419512f7e358f88dfe2336625aa99cd7438

commit r15-2096-g178cc419512f7e358f88dfe2336625aa99cd7438
Author: Feng Xue 
Date:   Wed May 29 17:22:36 2024 +0800

vect: Support multiple lane-reducing operations for loop reduction 
[PR114440]

For a lane-reducing operation (dot-prod/widen-sum/sad) in a loop reduction,
the current vectorizer could only handle the pattern if the reduction chain
contains no other operation, whether normal or lane-reducing.

This patch removes some constraints in reduction analysis to allow multiple
arbitrary lane-reducing operations with mixed input vectypes in a loop
reduction chain. For example:

   int sum = 1;
   for (i)
 {
   sum += d0[i] * d1[i];  // dot-prod 
   sum += w[i];   // widen-sum 
   sum += abs(s0[i] - s1[i]); // sad 
 }

The vector size is 128-bit and the vectorization factor is 16. The reduction
statements would be transformed as:

   vector<4> int sum_v0 = { 0, 0, 0, 1 };
   vector<4> int sum_v1 = { 0, 0, 0, 0 };
   vector<4> int sum_v2 = { 0, 0, 0, 0 };
   vector<4> int sum_v3 = { 0, 0, 0, 0 };

   for (i / 16)
 {
   sum_v0 = DOT_PROD (d0_v0[i: 0 ~ 15], d1_v0[i: 0 ~ 15], sum_v0);
   sum_v1 = sum_v1;  // copy
   sum_v2 = sum_v2;  // copy
   sum_v3 = sum_v3;  // copy

   sum_v0 = WIDEN_SUM (w_v0[i: 0 ~ 15], sum_v0);
   sum_v1 = sum_v1;  // copy
   sum_v2 = sum_v2;  // copy
   sum_v3 = sum_v3;  // copy

   sum_v0 = SAD (s0_v0[i: 0 ~ 7 ], s1_v0[i: 0 ~ 7 ], sum_v0);
   sum_v1 = SAD (s0_v1[i: 8 ~ 15], s1_v1[i: 8 ~ 15], sum_v1);
   sum_v2 = sum_v2;  // copy
   sum_v3 = sum_v3;  // copy
 }

sum_v = sum_v0 + sum_v1 + sum_v2 + sum_v3;   // = sum_v0 + sum_v1
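
Written out as source, a reduction chain of this shape looks like the
function below; this is a hand-written illustration following the scalar
loop above, not one of the added testcases.

  #include <stdlib.h>

  int __attribute__ ((noipa))
  mixed_reduc (signed char *d0, signed char *d1, signed char *w,
               signed char *s0, signed char *s1, int n)
  {
    int sum = 1;
    for (int i = 0; i < n; i++)
      {
        sum += d0[i] * d1[i];       /* dot-prod */
        sum += w[i];                /* widen-sum */
        sum += abs (s0[i] - s1[i]); /* sad */
      }
    return sum;
  }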

2024-03-22 Feng Xue 

gcc/
PR tree-optimization/114440
* tree-vectorizer.h (vectorizable_lane_reducing): New function
declaration.
* tree-vect-stmts.cc (vect_analyze_stmt): Call new function
vectorizable_lane_reducing to analyze lane-reducing operation.
* tree-vect-loop.cc (vect_model_reduction_cost): Remove cost 
computation
code related to emulated_mixed_dot_prod.
(vectorizable_lane_reducing): New function.
(vectorizable_reduction): Allow multiple lane-reducing operations in
loop reduction. Move some original lane-reducing related code to
vectorizable_lane_reducing.
(vect_transform_reduction): Adjust comments with updated example.

gcc/testsuite/
PR tree-optimization/114440
* gcc.dg/vect/vect-reduc-chain-1.c
* gcc.dg/vect/vect-reduc-chain-2.c
* gcc.dg/vect/vect-reduc-chain-3.c
* gcc.dg/vect/vect-reduc-chain-dot-slp-1.c
* gcc.dg/vect/vect-reduc-chain-dot-slp-2.c
* gcc.dg/vect/vect-reduc-chain-dot-slp-3.c
* gcc.dg/vect/vect-reduc-chain-dot-slp-4.c
* gcc.dg/vect/vect-reduc-dot-slp-1.c

Diff:
---
 gcc/testsuite/gcc.dg/vect/vect-reduc-chain-1.c |  64 ++
 gcc/testsuite/gcc.dg/vect/vect-reduc-chain-2.c |  79 +++
 gcc/testsuite/gcc.dg/vect/vect-reduc-chain-3.c |  68 ++
 .../gcc.dg/vect/vect-reduc-chain-dot-slp-1.c   |  95 
 .../gcc.dg/vect/vect-reduc-chain-dot-slp-2.c   |  67 ++
 .../gcc.dg/vect/vect-reduc-chain-dot-slp-3.c   |  79 +++
 .../gcc.dg/vect/vect-reduc-chain-dot-slp-4.c   |  63 ++
 gcc/testsuite/gcc.dg/vect/vect-reduc-dot-slp-1.c   |  60 +
 gcc/tree-vect-loop.cc  | 241 +++--
 gcc/tree-vect-stmts.cc |   2 +
 gcc/tree-vectorizer.h  |   2 +
 11 files changed, 750 insertions(+), 70 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-chain-1.c 
b/gcc/testsuite/gcc.dg/vect/vect-reduc-chain-1.c
new file mode 100644
index ..80b0089ea0fa
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-chain-1.c
@@ -0,0 +1,64 @@
+/* Disabling epilogues until we find a better way to deal with scans.  */
+/* { dg-additional-options "--param vect-epilogues-nomask=0" } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target arm_v8_2a_dotprod_neon_hw { target { 
aarch64*-*-* || arm*-*-* } } } */
+/* { dg-add-options arm_v8_2a_dotprod_neon }  */
+
+#include "tree-vect.h"
+
+#define N 50
+
+#ifndef SIGNEDNESS_1
+#define SIGNEDNESS_1 signed
+#define SIGNEDNESS_2 signed
+#endif
+
+SIGNEDNESS_1 int __attribute__ ((noipa))
+f (SIGNEDNESS_1 int res,
+   SIGNEDNESS_2 char *restrict a,
+   SIGNEDNESS_2 char *restrict b,
+   SIGNEDNESS_2 char *restrict c,
+   SIGNEDNESS_2 char *restrict d,
+   SIGNEDNESS_1 int *restrict e)
+{
+  for (int i = 0; i < N; ++i)
+{
+ 

[gcc r15-2095] vect: Refit lane-reducing to be normal operation

2024-07-17 Thread Feng Xue via Gcc-cvs
https://gcc.gnu.org/g:8b59fa9d8ca25bdf0792390a8bdeae151532a530

commit r15-2095-g8b59fa9d8ca25bdf0792390a8bdeae151532a530
Author: Feng Xue 
Date:   Tue Jul 2 17:12:00 2024 +0800

vect: Refit lane-reducing to be normal operation

The number of vector stmts for an operation is calculated based on its output
vectype. This is over-estimated for a lane-reducing operation, which would
cause a vector def/use mismatch when we want to support loop reductions that
mix lane-reducing and normal operations. One solution is to refit lane-reducing
operations to behave like normal ones, by adding pass-through copies to fill
the possible def/use gap; the resulting superfluous statements can be
optimized away after vectorization.  For example:

  int sum = 1;
  for (i)
{
  sum += d0[i] * d1[i];  // dot-prod 
}

  The vector size is 128-bit and the vectorization factor is 16.  Reduction
  statements would be transformed as:

  vector<4> int sum_v0 = { 0, 0, 0, 1 };
  vector<4> int sum_v1 = { 0, 0, 0, 0 };
  vector<4> int sum_v2 = { 0, 0, 0, 0 };
  vector<4> int sum_v3 = { 0, 0, 0, 0 };

  for (i / 16)
{
  sum_v0 = DOT_PROD (d0_v0[i: 0 ~ 15], d1_v0[i: 0 ~ 15], sum_v0);
  sum_v1 = sum_v1;  // copy
  sum_v2 = sum_v2;  // copy
  sum_v3 = sum_v3;  // copy
}

  sum_v = sum_v0 + sum_v1 + sum_v2 + sum_v3;   // = sum_v0
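
The def/use gap being patched over can be sketched with plain arithmetic
(not GCC code; the lane counts are the ones from the example above): the
statement count derived from the output vectype exceeds the number of
statements the lane-reducing op really needs, and the difference is filled
with pass-through copies.

  #include <cstdio>

  int main ()
  {
    const unsigned vf = 16;         /* vectorization factor */
    const unsigned out_lanes = 4;   /* vector<4> int accumulators */
    const unsigned in_lanes = 16;   /* vector<16> char DOT_PROD inputs */

    unsigned stmts_by_output = vf / out_lanes;  /* 4 def-use cycles */
    unsigned effective_stmts = vf / in_lanes;   /* 1 real DOT_PROD */

    printf ("pass-through copies to insert: %u\n",
            stmts_by_output - effective_stmts); /* 3, as in the example */
    return 0;
  }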

2024-07-02 Feng Xue 

gcc/
* tree-vect-loop.cc (vect_reduction_update_partial_vector_usage):
Calculate effective vector stmts number with generic
vect_get_num_copies.
(vect_transform_reduction): Insert copies for lane-reducing so as to
fix over-estimated vector stmts number.
(vect_transform_cycle_phi): Calculate vector PHI number only based 
on
output vectype.
* tree-vect-slp.cc (vect_slp_analyze_node_operations_1): Remove
adjustment on vector stmts number specific to slp reduction.

Diff:
---
 gcc/tree-vect-loop.cc | 134 ++
 gcc/tree-vect-slp.cc  |  27 +++---
 2 files changed, 121 insertions(+), 40 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index a464bc8607c2..9c5c30535713 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -7472,12 +7472,8 @@ vect_reduction_update_partial_vector_usage 
(loop_vec_info loop_vinfo,
= get_masked_reduction_fn (reduc_fn, vectype_in);
   vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
   vec_loop_lens *lens = &LOOP_VINFO_LENS (loop_vinfo);
-  unsigned nvectors;
-
-  if (slp_node)
-   nvectors = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node);
-  else
-   nvectors = vect_get_num_copies (loop_vinfo, vectype_in);
+  unsigned nvectors = vect_get_num_copies (loop_vinfo, slp_node,
+  vectype_in);
 
   if (mask_reduc_fn == IFN_MASK_LEN_FOLD_LEFT_PLUS)
vect_record_loop_len (loop_vinfo, lens, nvectors, vectype_in, 1);
@@ -8599,12 +8595,15 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
   stmt_vec_info phi_info = STMT_VINFO_REDUC_DEF (vect_orig_stmt (stmt_info));
   gphi *reduc_def_phi = as_a <gphi *> (phi_info->stmt);
   int reduc_index = STMT_VINFO_REDUC_IDX (stmt_info);
-  tree vectype_in = STMT_VINFO_REDUC_VECTYPE_IN (reduc_info);
+  tree vectype_in = STMT_VINFO_REDUC_VECTYPE_IN (stmt_info);
+
+  if (!vectype_in)
+vectype_in = STMT_VINFO_VECTYPE (stmt_info);
 
   if (slp_node)
 {
   ncopies = 1;
-  vec_num = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node);
+  vec_num = vect_get_num_copies (loop_vinfo, slp_node, vectype_in);
 }
   else
 {
@@ -8662,13 +8661,40 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
   bool lane_reducing = lane_reducing_op_p (code);
   gcc_assert (single_defuse_cycle || lane_reducing);
 
+  if (lane_reducing)
+{
+  /* The last operand of lane-reducing op is for reduction.  */
+  gcc_assert (reduc_index == (int) op.num_ops - 1);
+}
+
   /* Create the destination vector  */
   tree scalar_dest = gimple_get_lhs (stmt_info->stmt);
   tree vec_dest = vect_create_destination_var (scalar_dest, vectype_out);
 
+  if (lane_reducing && !slp_node && !single_defuse_cycle)
+{
+  /* Note: there are still vectorizable cases that can not be handled by
+single-lane slp.  Probably it would take some time to evolve the
+feature to a mature state.  So we have to keep the below non-slp code
+path as failsafe for lane-reducing support.  */
+  gcc_assert (op.num_ops <= 3);
+  for (unsigned i = 0; i < op.num_ops; i++)
+   {
+ unsigned oprnd_ncopies = ncopies;
+
+ if ((int) i == reduc_index)
+   {
+ tree vectype = STMT_VINFO_VECTYPE (stmt_info);
+ oprnd_ncopies = vect_get_num_copies (loop_vinfo, 

[gcc r15-2094] vect: Add a unified vect_get_num_copies for slp and non-slp

2024-07-17 Thread Feng Xue via Gcc-cvs
https://gcc.gnu.org/g:e7fbae834f8db2508d3161d88efe7ddbb702e437

commit r15-2094-ge7fbae834f8db2508d3161d88efe7ddbb702e437
Author: Feng Xue 
Date:   Fri Jul 12 16:38:28 2024 +0800

vect: Add a unified vect_get_num_copies for slp and non-slp

Extend the original vect_get_num_copies (purely loop-based) so that it can
also calculate the number of vector stmts for an slp node with regard to a
generic vect region.
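
The computation done by the new overload can be sketched standalone as below;
exact_div and poly_uint64 are replaced by plain integers, so this only
illustrates the formula, not the GCC implementation.

  #include <cstdio>

  /* Number of vector stmts = (vectorization factor * SLP lanes) / lanes
     that one vector of the given type holds.  */
  static unsigned
  num_copies (unsigned vf, unsigned slp_lanes, unsigned nunits)
  {
    return (vf * slp_lanes) / nunits;   /* exact_div in GCC proper */
  }

  int main ()
  {
    /* VF 16 with a 4-lane SLP node of V4SI needs 16 vector stmts; a
       non-SLP statement (lanes implicitly 1) on V4SI needs 4.  */
    printf ("%u %u\n", num_copies (16, 4, 4), num_copies (16, 1, 4));
    return 0;
  }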

2024-07-12 Feng Xue 

gcc/
* tree-vectorizer.h (vect_get_num_copies): New overload function.
* tree-vect-slp.cc (vect_slp_analyze_node_operations_1): Calculate
number of vector stmts for slp node with vect_get_num_copies.
(vect_slp_analyze_node_operations): Calculate number of vector 
elements
for constant/external slp node with vect_get_num_copies.

Diff:
---
 gcc/tree-vect-slp.cc  | 19 +++
 gcc/tree-vectorizer.h | 28 +++-
 2 files changed, 30 insertions(+), 17 deletions(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index d0a8531fd3b3..4dadbc6854de 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -6573,17 +6573,7 @@ vect_slp_analyze_node_operations_1 (vec_info *vinfo, 
slp_tree node,
  }
 }
   else
-{
-  poly_uint64 vf;
-  if (loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo))
-   vf = loop_vinfo->vectorization_factor;
-  else
-   vf = 1;
-  unsigned int group_size = SLP_TREE_LANES (node);
-  tree vectype = SLP_TREE_VECTYPE (node);
-  SLP_TREE_NUMBER_OF_VEC_STMTS (node)
-   = vect_get_num_vectors (vf * group_size, vectype);
-}
+SLP_TREE_NUMBER_OF_VEC_STMTS (node) = vect_get_num_copies (vinfo, node);
 
   /* Handle purely internal nodes.  */
   if (SLP_TREE_CODE (node) == VEC_PERM_EXPR)
@@ -6851,12 +6841,9 @@ vect_slp_analyze_node_operations (vec_info *vinfo, 
slp_tree node,
  && j == 1);
  continue;
}
- unsigned group_size = SLP_TREE_LANES (child);
- poly_uint64 vf = 1;
- if (loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo))
-   vf = loop_vinfo->vectorization_factor;
+
  SLP_TREE_NUMBER_OF_VEC_STMTS (child)
-   = vect_get_num_vectors (vf * group_size, vector_type);
+   = vect_get_num_copies (vinfo, child);
  /* And cost them.  */
  vect_prologue_cost_for_slp (child, cost_vec);
}
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 8eb3ec4df869..1e2121abaffc 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -2080,6 +2080,32 @@ vect_get_num_vectors (poly_uint64 nunits, tree vectype)
   return exact_div (nunits, TYPE_VECTOR_SUBPARTS (vectype)).to_constant ();
 }
 
+/* Return the number of vectors in the context of vectorization region VINFO,
+   needed for a group of statements, whose size is specified by lanes of NODE,
+   if NULL, it is 1.  The statements are supposed to be interleaved together
+   with no gap, and all operate on vectors of type VECTYPE, if NULL, the
+   vectype of NODE is used.  */
+
+inline unsigned int
+vect_get_num_copies (vec_info *vinfo, slp_tree node, tree vectype = NULL)
+{
+  poly_uint64 vf;
+
+  if (loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo))
+vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
+  else
+vf = 1;
+
+  if (node)
+{
+  vf *= SLP_TREE_LANES (node);
+  if (!vectype)
+   vectype = SLP_TREE_VECTYPE (node);
+}
+
+  return vect_get_num_vectors (vf, vectype);
+}
+
 /* Return the number of copies needed for loop vectorization when
a statement operates on vectors of type VECTYPE.  This is the
vectorization factor divided by the number of elements in
@@ -2088,7 +2114,7 @@ vect_get_num_vectors (poly_uint64 nunits, tree vectype)
 inline unsigned int
 vect_get_num_copies (loop_vec_info loop_vinfo, tree vectype)
 {
-  return vect_get_num_vectors (LOOP_VINFO_VECT_FACTOR (loop_vinfo), vectype);
+  return vect_get_num_copies (loop_vinfo, NULL, vectype);
 }
 
 /* Update maximum unit count *MAX_NUNITS so that it accounts for


[gcc r15-1727] vect: Determine input vectype for multiple lane-reducing operations

2024-06-30 Thread Feng Xue via Gcc-cvs
https://gcc.gnu.org/g:3aa004f1db327d5728a8fd0afcfed24e767f0499

commit r15-1727-g3aa004f1db327d5728a8fd0afcfed24e767f0499
Author: Feng Xue 
Date:   Sun Jun 16 13:00:32 2024 +0800

vect: Determine input vectype for multiple lane-reducing operations

The input vectype of a reduction PHI statement must be determined before the
vect cost computation for the reduction. Since a lane-reducing operation has a
different input vectype from a normal one, we need to traverse all reduction
statements to find the input vectype with the least lanes, and set that on
the PHI statement.
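
The selection rule can be illustrated standalone: among the input vectypes of
the lane-reducing ops, keep the one with the widest scalar element, which for
a fixed vector size is the one with the least lanes.  The short-typed second
entry below is a hypothetical mixed-type case added only for illustration.

  #include <cstdio>

  struct vtype { const char *name; unsigned elem_bytes; };

  int main ()
  {
    const unsigned vec_bytes = 16;                       /* 128-bit vectors */
    vtype inputs[] = { { "vector(16) signed char", 1 },  /* e.g. dot-prod */
                       { "vector(8) short", 2 } };       /* hypothetical */

    vtype least_lanes = inputs[0];
    for (vtype v : inputs)
      if (v.elem_bytes > least_lanes.elem_bytes) /* wider element = fewer lanes */
        least_lanes = v;

    printf ("PHI input vectype: %s (%u lanes)\n",
            least_lanes.name, vec_bytes / least_lanes.elem_bytes);
    return 0;
  }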

2024-06-16 Feng Xue 

gcc/
* tree-vect-loop.cc (vectorizable_reduction): Determine input 
vectype
during traversal of reduction statements.

Diff:
---
 gcc/tree-vect-loop.cc | 79 ---
 1 file changed, 56 insertions(+), 23 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 6f32867f85a..3095ff5ab6b 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -7643,7 +7643,9 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
 {
   stmt_vec_info def = loop_vinfo->lookup_def (reduc_def);
   stmt_vec_info vdef = vect_stmt_to_vectorize (def);
-  if (STMT_VINFO_REDUC_IDX (vdef) == -1)
+  int reduc_idx = STMT_VINFO_REDUC_IDX (vdef);
+
+  if (reduc_idx == -1)
{
  if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -7686,10 +7688,57 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
  return false;
}
}
-  else if (!stmt_info)
-   /* First non-conversion stmt.  */
-   stmt_info = vdef;
-  reduc_def = op.ops[STMT_VINFO_REDUC_IDX (vdef)];
+  else
+   {
+ /* First non-conversion stmt.  */
+ if (!stmt_info)
+   stmt_info = vdef;
+
+ if (lane_reducing_op_p (op.code))
+   {
+ enum vect_def_type dt;
+ tree vectype_op;
+
+ /* The last operand of lane-reducing operation is for
+reduction.  */
+ gcc_assert (reduc_idx > 0 && reduc_idx == (int) op.num_ops - 1);
+
+ if (!vect_is_simple_use (op.ops[0], loop_vinfo, &dt, &vectype_op))
+   return false;
+ 
+ tree type_op = TREE_TYPE (op.ops[0]);
+
+ if (!vectype_op)
+   { 
+ vectype_op = get_vectype_for_scalar_type (loop_vinfo,
+   type_op);
+ if (!vectype_op)
+   return false;
+   }
+
+ /* For lane-reducing operation vectorizable analysis needs the
+reduction PHI information */
+ STMT_VINFO_REDUC_DEF (def) = phi_info;
+
+ /* Each lane-reducing operation has its own input vectype, while
+reduction PHI will record the input vectype with the least
+lanes.  */
+ STMT_VINFO_REDUC_VECTYPE_IN (vdef) = vectype_op;
+
+ /* To accommodate lane-reducing operations of mixed input
+vectypes, choose input vectype with the least lanes for the
+reduction PHI statement, which would result in the most
+ncopies for vectorized reduction results.  */
+ if (!vectype_in
+ || (GET_MODE_SIZE (SCALAR_TYPE_MODE (TREE_TYPE (vectype_in)))
+  < GET_MODE_SIZE (SCALAR_TYPE_MODE (type_op
+   vectype_in = vectype_op;
+   }
+ else
+   vectype_in = STMT_VINFO_VECTYPE (phi_info);
+   }
+
+  reduc_def = op.ops[reduc_idx];
   reduc_chain_length++;
   if (!stmt_info && slp_node)
slp_for_stmt_info = SLP_TREE_CHILDREN (slp_for_stmt_info)[0];
@@ -7747,6 +7796,8 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
 
   tree vectype_out = STMT_VINFO_VECTYPE (stmt_info);
   STMT_VINFO_REDUC_VECTYPE (reduc_info) = vectype_out;
+  STMT_VINFO_REDUC_VECTYPE_IN (reduc_info) = vectype_in;
+
   gimple_match_op op;
   if (!gimple_extract_op (stmt_info->stmt, &op))
 gcc_unreachable ();
@@ -7831,16 +7882,6 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
  = get_vectype_for_scalar_type (loop_vinfo,
 TREE_TYPE (op.ops[i]), slp_op[i]);
 
-  /* To properly compute ncopies we are interested in the widest
-non-reduction input type in case we're looking at a widening
-accumulation that we later handle in vect_transform_reduction.  */
-  if (lane_reducing
- && vectype_op[i]
- && (!vectype_in
- || (GET_MODE_SIZE (SCALAR_TYPE_MODE (TREE_TYPE (vectype_in)))
- < GET_MODE_SIZE (SCALAR_TYPE_MODE (TREE_TYPE 
(vectype_op[i]))
-   vectype_in = vectype_op[i];
-
   /* Record how the non-reduction-def value of COND_EXPR is 

[gcc r15-1726] vect: Fix shift-by-induction for single-lane slp

2024-06-30 Thread Feng Xue via Gcc-cvs
https://gcc.gnu.org/g:1ff5f8f8a05dd57620a1e2abbf87bd511b113cce

commit r15-1726-g1ff5f8f8a05dd57620a1e2abbf87bd511b113cce
Author: Feng Xue 
Date:   Wed Jun 26 22:02:53 2024 +0800

vect: Fix shift-by-induction for single-lane slp

Allow shift-by-induction for an slp node when it is single-lane, which aligns
with the original loop-based handling.
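
A minimal loop of the kind affected, where the shift amount is the induction
variable itself (a hand-written illustration; the added testcases in the diff
below are the authoritative ones):

  void
  shift_by_iv (int *a, int n)
  {
    for (int i = 0; i < n; i++)
      a[i] <<= i;   /* shift amount follows the induction variable */
  }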

2024-06-26 Feng Xue 

gcc/
* tree-vect-stmts.cc (vectorizable_shift): Allow shift-by-induction
for single-lane slp node.

gcc/testsuite/
* gcc.dg/vect/vect-shift-6.c
* gcc.dg/vect/vect-shift-7.c

Diff:
---
 gcc/testsuite/gcc.dg/vect/vect-shift-6.c | 52 
 gcc/testsuite/gcc.dg/vect/vect-shift-7.c | 69 
 gcc/tree-vect-stmts.cc   |  2 +-
 3 files changed, 122 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/vect/vect-shift-6.c 
b/gcc/testsuite/gcc.dg/vect/vect-shift-6.c
new file mode 100644
index 000..277093bc7bb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-shift-6.c
@@ -0,0 +1,52 @@
+/* { dg-require-effective-target vect_shift } */
+/* { dg-require-effective-target vect_int } */
+
+#include 
+#include 
+#include "tree-vect.h"
+
+#define N 32
+
+int32_t A[N]; 
+int32_t B[N];
+
+#define FN(name)   \
+__attribute__((noipa)) \
+void name(int32_t *a)  \
+{  \
+  for (int i = 0; i < N / 2; i++)  \
+{  \
+   a[2 * i + 0] <<= i; \
+   a[2 * i + 1] <<= i; \
+}  \
+}
+
+
+FN(foo_vec)
+
+#pragma GCC push_options
+#pragma GCC optimize ("O0")
+FN(foo_novec)
+#pragma GCC pop_options
+
+int main ()
+{
+  int i;
+
+  check_vect ();
+
+#pragma GCC novector
+  for (i = 0; i < N; i++)
+A[i] = B[i] = -(i + 1);
+
+  foo_vec(A);
+  foo_novec(B);
+
+  /* check results:  */
+#pragma GCC novector
+  for (i = 0; i < N; i++)
+if (A[i] != B[i])
+  abort ();
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-shift-7.c 
b/gcc/testsuite/gcc.dg/vect/vect-shift-7.c
new file mode 100644
index 000..6de3f39a87f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-shift-7.c
@@ -0,0 +1,69 @@
+/* { dg-require-effective-target vect_shift } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-additional-options "--param max-completely-peel-times=6" } */
+
+#include 
+#include 
+#include "tree-vect.h"
+
+#define N 16
+#define M 16
+
+int32_t A[N];
+int32_t B[N];
+
+#define FN(name)   \
+__attribute__((noipa)) \
+void name(int32_t *a, int m)   \
+{  \
+  for (int i = 0; i < N / 2; i++)  \
+{  \
+  int s1 = i;  \
+  int s2 = s1 + 1; \
+  int32_t r1 = 0;  \
+  int32_t r2 = 7;  \
+  int32_t t1 = m;  \
+  \
+  for (int j = 0; j < M; j++)  \
+ { \
+r1 += t1 << s1;\
+r2 += t1 << s2;\
+t1++;  \
+s1++;  \
+s2++;  \
+ } \
+   \
+   a[2 * i + 0] = r1;  \
+   a[2 * i + 1] = r2;  \
+}  \
+}
+
+
+FN(foo_vec)
+
+#pragma GCC push_options
+#pragma GCC optimize ("O0")
+FN(foo_novec)
+#pragma GCC pop_options
+
+int main ()
+{
+  int i;
+
+  check_vect ();
+
+#pragma GCC novector
+  for (i = 0; i < N; i++)
+A[i] = B[i] = 0;
+
+  foo_vec(A, 0);
+  foo_novec(B, 0);
+
+  /* check results:  */
+#pragma GCC novector
+  for (i = 0; i < N; i++)
+if (A[i] != B[i])
+  abort ();
+
+  return 0;
+}
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 7b889f31645..aab3aa59962 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -6175,7 +6175,7 @@ vectorizable_shift (vec_info *vinfo,
   if ((dt[1] == vect_internal_def
|| dt[1] == vect_induction_def
|| dt[1] == vect_nested_cycle)
-  && !slp_node)
+  && (!slp_node || SLP_TREE_LANES (slp_node) == 1))
 scalar_shift_arg = false;
   else if (dt[1] == vect_constant_def
   || dt[1] == vect_external_def


[gcc r15-1465] vect: Tighten an assertion for lane-reducing in transform

2024-06-19 Thread Feng Xue via Gcc-cvs
https://gcc.gnu.org/g:ecbc96bb2873e453b0bd33d602ce34ad0d9d9cfd

commit r15-1465-gecbc96bb2873e453b0bd33d602ce34ad0d9d9cfd
Author: Feng Xue 
Date:   Sun Jun 16 13:33:52 2024 +0800

vect: Tighten an assertion for lane-reducing in transform

According to the logic of the code near the assertion, no lane-reducing
operation should appear there, not just DOT_PROD_EXPR: "use_mask_by_cond_expr_p"
treats SAD_EXPR the same as DOT_PROD_EXPR, and WIDEN_SUM_EXPR would not be
allowed by the following assertion "gcc_assert (commutative_binary_op_p (...))".
So tighten the assertion to cover all lane-reducing ops.

2024-06-16 Feng Xue 

gcc/
* tree-vect-loop.cc (vect_transform_reduction): Change assertion to
cover all lane-reducing ops.

Diff:
---
 gcc/tree-vect-loop.cc | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 1d60ac47e553..347dac97e497 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -8618,7 +8618,8 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
 }
 
   bool single_defuse_cycle = STMT_VINFO_FORCE_SINGLE_CYCLE (reduc_info);
-  gcc_assert (single_defuse_cycle || lane_reducing_op_p (code));
+  bool lane_reducing = lane_reducing_op_p (code);
+  gcc_assert (single_defuse_cycle || lane_reducing);
 
   /* Create the destination vector  */
   tree scalar_dest = gimple_get_lhs (stmt_info->stmt);
@@ -8674,8 +8675,9 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
   tree vop[3] = { vec_oprnds[0][i], vec_oprnds[1][i], NULL_TREE };
   if (masked_loop_p && !mask_by_cond_expr)
{
- /* No conditional ifns have been defined for dot-product yet.  */
- gcc_assert (code != DOT_PROD_EXPR);
+ /* No conditional ifns have been defined for lane-reducing op
+yet.  */
+ gcc_assert (!lane_reducing);
 
  /* Make sure that the reduction accumulator is vop[0].  */
  if (reduc_index == 1)


[gcc r15-1464] vect: Use an array to replace 3 relevant variables

2024-06-19 Thread Feng Xue via Gcc-cvs
https://gcc.gnu.org/g:b9c369d900ccfbd2271028611af3f08b5cf6f998

commit r15-1464-gb9c369d900ccfbd2271028611af3f08b5cf6f998
Author: Feng Xue 
Date:   Sun Jun 16 13:21:13 2024 +0800

vect: Use an array to replace 3 relevant variables

It is better to place the 3 related independent variables into an array, since
a following patch needs to access them via an index. At the same time, this
change makes some duplicated code more compact.

2024-06-16 Feng Xue 

gcc/
* tree-vect-loop.cc (vect_transform_reduction): Replace 
vec_oprnds0/1/2
with one new array variable vec_oprnds[3].

Diff:
---
 gcc/tree-vect-loop.cc | 43 ++-
 1 file changed, 18 insertions(+), 25 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 27f77ed8b0b6..1d60ac47e553 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -8580,9 +8580,7 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
 
   /* Transform.  */
   tree new_temp = NULL_TREE;
-  auto_vec<tree> vec_oprnds0;
-  auto_vec<tree> vec_oprnds1;
-  auto_vec<tree> vec_oprnds2;
+  auto_vec<tree> vec_oprnds[3];
 
   if (dump_enabled_p ())
 dump_printf_loc (MSG_NOTE, vect_location, "transform reduction.\n");
@@ -8630,14 +8628,15 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
  definition.  */
   if (!cond_fn_p)
 {
+  gcc_assert (reduc_index >= 0 && reduc_index <= 2);
   vect_get_vec_defs (loop_vinfo, stmt_info, slp_node, ncopies,
 single_defuse_cycle && reduc_index == 0
-? NULL_TREE : op.ops[0], &vec_oprnds0,
+? NULL_TREE : op.ops[0], &vec_oprnds[0],
 single_defuse_cycle && reduc_index == 1
-? NULL_TREE : op.ops[1], &vec_oprnds1,
+? NULL_TREE : op.ops[1], &vec_oprnds[1],
 op.num_ops == 3
 && !(single_defuse_cycle && reduc_index == 2)
-? op.ops[2] : NULL_TREE, &vec_oprnds2);
+? op.ops[2] : NULL_TREE, &vec_oprnds[2]);
 }
   else
 {
@@ -8645,12 +8644,12 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
 vectype.  */
   gcc_assert (single_defuse_cycle
  && (reduc_index == 1 || reduc_index == 2));
-  vect_get_vec_defs (loop_vinfo, stmt_info, slp_node, ncopies,
-op.ops[0], truth_type_for (vectype_in), &vec_oprnds0,
+  vect_get_vec_defs (loop_vinfo, stmt_info, slp_node, ncopies, op.ops[0],
+truth_type_for (vectype_in), &vec_oprnds[0],
 reduc_index == 1 ? NULL_TREE : op.ops[1],
-NULL_TREE, &vec_oprnds1,
+NULL_TREE, &vec_oprnds[1],
 reduc_index == 2 ? NULL_TREE : op.ops[2],
-NULL_TREE, &vec_oprnds2);
+NULL_TREE, &vec_oprnds[2]);
 }
 
   /* For single def-use cycles get one copy of the vectorized reduction
@@ -8658,20 +8657,21 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
   if (single_defuse_cycle)
 {
   vect_get_vec_defs (loop_vinfo, stmt_info, slp_node, 1,
-reduc_index == 0 ? op.ops[0] : NULL_TREE, &vec_oprnds0,
-reduc_index == 1 ? op.ops[1] : NULL_TREE, &vec_oprnds1,
+reduc_index == 0 ? op.ops[0] : NULL_TREE,
+&vec_oprnds[0],
+reduc_index == 1 ? op.ops[1] : NULL_TREE,
+&vec_oprnds[1],
 reduc_index == 2 ? op.ops[2] : NULL_TREE,
-&vec_oprnds2);
+&vec_oprnds[2]);
 }
 
   bool emulated_mixed_dot_prod = vect_is_emulated_mixed_dot_prod (stmt_info);
+  unsigned num = vec_oprnds[reduc_index == 0 ? 1 : 0].length ();
 
-  unsigned num = (reduc_index == 0
- ? vec_oprnds1.length () : vec_oprnds0.length ());
   for (unsigned i = 0; i < num; ++i)
 {
   gimple *new_stmt;
-  tree vop[3] = { vec_oprnds0[i], vec_oprnds1[i], NULL_TREE };
+  tree vop[3] = { vec_oprnds[0][i], vec_oprnds[1][i], NULL_TREE };
   if (masked_loop_p && !mask_by_cond_expr)
{
  /* No conditional ifns have been defined for dot-product yet.  */
@@ -8696,7 +8696,7 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
   else
{
  if (op.num_ops >= 3)
-   vop[2] = vec_oprnds2[i];
+   vop[2] = vec_oprnds[2][i];
 
  if (masked_loop_p && mask_by_cond_expr)
{
@@ -8727,14 +8727,7 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
}
 
   if (single_defuse_cycle && i < num - 1)
-   {
- if (reduc_index == 0)
-   vec_oprnds0.safe_push (gimple_get_lhs (new_stmt));
- else if (reduc_index == 1)
-   vec_oprnds1.safe_push (gimple_get_lhs (new_stmt));
- else if (reduc_index == 2)
-   

[gcc r15-1463] vect: Use one reduction_type local variable

2024-06-19 Thread Feng Xue via Gcc-cvs
https://gcc.gnu.org/g:0726f1cde5459ccdbaa6af8c6904276a28d572ba

commit r15-1463-g0726f1cde5459ccdbaa6af8c6904276a28d572ba
Author: Feng Xue 
Date:   Sun Jun 16 12:17:26 2024 +0800

vect: Use one reduction_type local variable

Two local variables were defined to refer to the same STMT_VINFO_REDUC_TYPE;
it is better to keep only one.

2024-06-16 Feng Xue 

gcc/
* tree-vect-loop.cc (vectorizable_reduction): Remove v_reduc_type, 
and
replace it to another local variable reduction_type.

Diff:
---
 gcc/tree-vect-loop.cc | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index aab408d1019d..27f77ed8b0b6 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -7868,10 +7868,10 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
   if (lane_reducing)
 STMT_VINFO_REDUC_VECTYPE_IN (stmt_info) = vectype_in;
 
-  enum vect_reduction_type v_reduc_type = STMT_VINFO_REDUC_TYPE (phi_info);
-  STMT_VINFO_REDUC_TYPE (reduc_info) = v_reduc_type;
+  enum vect_reduction_type reduction_type = STMT_VINFO_REDUC_TYPE (phi_info);
+  STMT_VINFO_REDUC_TYPE (reduc_info) = reduction_type;
   /* If we have a condition reduction, see if we can simplify it further.  */
-  if (v_reduc_type == COND_REDUCTION)
+  if (reduction_type == COND_REDUCTION)
 {
   if (slp_node && SLP_TREE_LANES (slp_node) != 1)
return false;
@@ -8038,7 +8038,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
 
   STMT_VINFO_REDUC_CODE (reduc_info) = orig_code;
 
-  vect_reduction_type reduction_type = STMT_VINFO_REDUC_TYPE (reduc_info);
+  reduction_type = STMT_VINFO_REDUC_TYPE (reduc_info);
   if (reduction_type == TREE_CODE_REDUCTION)
 {
   /* Check whether it's ok to change the order of the computation.


[gcc r15-1462] vect: Remove duplicated check on reduction operand

2024-06-19 Thread Feng Xue via Gcc-cvs
https://gcc.gnu.org/g:a944e57506fc64b8eede79c2405ba0b498461f0b

commit r15-1462-ga944e57506fc64b8eede79c2405ba0b498461f0b
Author: Feng Xue 
Date:   Sun Jun 16 12:08:56 2024 +0800

vect: Remove duplicated check on reduction operand

In vectorizable_reduction, one check on a reduction operand via its index is
subsumed by another check via pointer, so remove the former.

2024-06-16 Feng Xue 

gcc/
* tree-vect-loop.cc (vectorizable_reduction): Remove the duplicated
check.

Diff:
---
 gcc/tree-vect-loop.cc | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index eeb75c09e91a..aab408d1019d 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -7815,11 +7815,9 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
 "use not simple.\n");
  return false;
}
-  if (i == STMT_VINFO_REDUC_IDX (stmt_info))
-   continue;
 
-  /* For an IFN_COND_OP we might hit the reduction definition operand
-twice (once as definition, once as else).  */
+  /* Skip reduction operands, and for an IFN_COND_OP we might hit the
+reduction operand twice (once as definition, once as else).  */
   if (op.ops[i] == op.ops[STMT_VINFO_REDUC_IDX (stmt_info)])
continue;


[gcc r15-1461] vect: Add a function to check lane-reducing stmt

2024-06-19 Thread Feng Xue via Gcc-cvs
https://gcc.gnu.org/g:70466e6f9d9fb87f78ffe2e397ca876b380cb493

commit r15-1461-g70466e6f9d9fb87f78ffe2e397ca876b380cb493
Author: Feng Xue 
Date:   Sat Jun 15 23:17:10 2024 +0800

vect: Add a function to check lane-reducing stmt

Add a utility function to check whether a statement is a lane-reducing
operation, which simplifies some existing code.

2024-06-16 Feng Xue 

gcc/
* tree-vectorizer.h (lane_reducing_stmt_p): New function.
* tree-vect-slp.cc (vect_analyze_slp): Use new function
lane_reducing_stmt_p to check statement.

Diff:
---
 gcc/tree-vect-slp.cc  |  4 +---
 gcc/tree-vectorizer.h | 12 
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 7d18b5bfee5d..a5665946a4eb 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -3919,7 +3919,6 @@ vect_analyze_slp (vec_info *vinfo, unsigned max_tree_size)
  scalar_stmts.create (loop_vinfo->reductions.length ());
  for (auto next_info : loop_vinfo->reductions)
{
- gassign *g;
  next_info = vect_stmt_to_vectorize (next_info);
  if ((STMT_VINFO_RELEVANT_P (next_info)
   || STMT_VINFO_LIVE_P (next_info))
@@ -3931,8 +3930,7 @@ vect_analyze_slp (vec_info *vinfo, unsigned max_tree_size)
{
  /* Do not discover SLP reductions combining lane-reducing
 ops, that will fail later.  */
- if (!(g = dyn_cast <gassign *> (STMT_VINFO_STMT (next_info)))
- || !lane_reducing_op_p (gimple_assign_rhs_code (g)))
+ if (!lane_reducing_stmt_p (STMT_VINFO_STMT (next_info)))
scalar_stmts.quick_push (next_info);
  else
{
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 6bb0f5c3a56f..60224f4e2847 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -2169,12 +2169,24 @@ vect_apply_runtime_profitability_check_p (loop_vec_info 
loop_vinfo)
  && th >= vect_vf_for_cost (loop_vinfo));
 }
 
+/* Return true if CODE is a lane-reducing opcode.  */
+
 inline bool
 lane_reducing_op_p (code_helper code)
 {
   return code == DOT_PROD_EXPR || code == WIDEN_SUM_EXPR || code == SAD_EXPR;
 }
 
+/* Return true if STMT is a lane-reducing statement.  */
+
+inline bool
+lane_reducing_stmt_p (gimple *stmt)
+{
+  if (auto *assign = dyn_cast <gassign *> (stmt))
+return lane_reducing_op_p (gimple_assign_rhs_code (assign));
+  return false;
+}
+
 /* Source location + hotness information. */
 extern dump_user_location_t vect_location;


[gcc r15-963] vect: Bind input vectype to lane-reducing operation

2024-05-31 Thread Feng Xue via Gcc-cvs
https://gcc.gnu.org/g:d53f555edb95248dbf81347ba5e4136e9a491eca

commit r15-963-gd53f555edb95248dbf81347ba5e4136e9a491eca
Author: Feng Xue 
Date:   Wed May 29 16:41:57 2024 +0800

vect: Bind input vectype to lane-reducing operation

The input vectype is an attribute of the lane-reducing operation rather than
of the reduction PHI it is associated with, since there might be more than one
lane-reducing operation with different types in a loop reduction chain. So
bind each lane-reducing operation to its own input vectype.

2024-05-29 Feng Xue 

gcc/
* tree-vect-loop.cc (vect_is_emulated_mixed_dot_prod): Remove 
parameter
loop_vinfo. Get input vectype from stmt_info instead of reduction 
PHI.
(vect_model_reduction_cost): Remove loop_vinfo argument of call to
vect_is_emulated_mixed_dot_prod.
(vect_transform_reduction): Likewise.
(vectorizable_reduction): Likewise, and bind input vectype to
lane-reducing operation.

Diff:
---
 gcc/tree-vect-loop.cc | 23 +--
 1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 7a6a6b6161d..5b85cffb37f 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -5270,8 +5270,7 @@ have_whole_vector_shift (machine_mode mode)
See vect_emulate_mixed_dot_prod for the actual sequence used.  */
 
 static bool
-vect_is_emulated_mixed_dot_prod (loop_vec_info loop_vinfo,
-stmt_vec_info stmt_info)
+vect_is_emulated_mixed_dot_prod (stmt_vec_info stmt_info)
 {
   gassign *assign = dyn_cast <gassign *> (stmt_info->stmt);
   if (!assign || gimple_assign_rhs_code (assign) != DOT_PROD_EXPR)
@@ -5282,10 +5281,9 @@ vect_is_emulated_mixed_dot_prod (loop_vec_info 
loop_vinfo,
   if (TYPE_SIGN (TREE_TYPE (rhs1)) == TYPE_SIGN (TREE_TYPE (rhs2)))
 return false;
 
-  stmt_vec_info reduc_info = info_for_reduction (loop_vinfo, stmt_info);
-  gcc_assert (reduc_info->is_reduc_info);
+  gcc_assert (STMT_VINFO_REDUC_VECTYPE_IN (stmt_info));
   return !directly_supported_p (DOT_PROD_EXPR,
-   STMT_VINFO_REDUC_VECTYPE_IN (reduc_info),
+   STMT_VINFO_REDUC_VECTYPE_IN (stmt_info),
optab_vector_mixed_sign);
 }
 
@@ -5324,8 +5322,8 @@ vect_model_reduction_cost (loop_vec_info loop_vinfo,
   if (!gimple_extract_op (orig_stmt_info->stmt, &op))
 gcc_unreachable ();
 
-  bool emulated_mixed_dot_prod
-= vect_is_emulated_mixed_dot_prod (loop_vinfo, stmt_info);
+  bool emulated_mixed_dot_prod = vect_is_emulated_mixed_dot_prod (stmt_info);
+
   if (reduction_type == EXTRACT_LAST_REDUCTION)
 /* No extra instructions are needed in the prologue.  The loop body
operations are costed in vectorizable_condition.  */
@@ -7837,6 +7835,11 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
 vectype_in = STMT_VINFO_VECTYPE (phi_info);
   STMT_VINFO_REDUC_VECTYPE_IN (reduc_info) = vectype_in;
 
+  /* Each lane-reducing operation has its own input vectype, while reduction
+ PHI records the input vectype with least lanes.  */
+  if (lane_reducing)
+STMT_VINFO_REDUC_VECTYPE_IN (stmt_info) = vectype_in;
+
   enum vect_reduction_type v_reduc_type = STMT_VINFO_REDUC_TYPE (phi_info);
   STMT_VINFO_REDUC_TYPE (reduc_info) = v_reduc_type;
   /* If we have a condition reduction, see if we can simplify it further.  */
@@ -8363,7 +8366,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
   if (single_defuse_cycle || lane_reducing)
 {
   int factor = 1;
-  if (vect_is_emulated_mixed_dot_prod (loop_vinfo, stmt_info))
+  if (vect_is_emulated_mixed_dot_prod (stmt_info))
/* Three dot-products and a subtraction.  */
factor = 4;
   record_stmt_cost (cost_vec, ncopies * factor, vector_stmt,
@@ -8615,8 +8618,8 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
: &vec_oprnds2));
 }
 
-  bool emulated_mixed_dot_prod
-= vect_is_emulated_mixed_dot_prod (loop_vinfo, stmt_info);
+  bool emulated_mixed_dot_prod = vect_is_emulated_mixed_dot_prod (stmt_info);
+
   FOR_EACH_VEC_ELT (vec_oprnds0, i, def0)
 {
   gimple *new_stmt;


[gcc r15-962] vect: Split out partial vect checking for reduction into a function

2024-05-31 Thread Feng Xue via Gcc-cvs
https://gcc.gnu.org/g:79c3547b8adfdfdb2a167c1b9c9428902510adab

commit r15-962-g79c3547b8adfdfdb2a167c1b9c9428902510adab
Author: Feng Xue 
Date:   Wed May 29 13:45:09 2024 +0800

vect: Split out partial vect checking for reduction into a function

Partial vectorization checking for vectorizable_reduction is a piece of
relatively isolated code that may be reused elsewhere. Move the code into a
new function for sharing.

2024-05-29 Feng Xue 

gcc/
* tree-vect-loop.cc (vect_reduction_update_partial_vector_usage): 
New
function.
(vectorizable_reduction): Move partial vectorization checking code 
to
vect_reduction_update_partial_vector_usage.

Diff:
---
 gcc/tree-vect-loop.cc | 137 --
 1 file changed, 77 insertions(+), 60 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index a42d79c7cbf..7a6a6b6161d 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -7391,6 +7391,79 @@ build_vect_cond_expr (code_helper code, tree vop[3], 
tree mask,
 }
 }
 
+/* Given an operation with CODE in loop reduction path whose reduction PHI is
+   specified by REDUC_INFO, the operation has TYPE of scalar result, and its
+   input vectype is represented by VECTYPE_IN. The vectype of vectorized result
+   may be different from VECTYPE_IN, either in base type or vectype lanes,
+   lane-reducing operation is the case.  This function check if it is possible,
+   and how to perform partial vectorization on the operation in the context
+   of LOOP_VINFO.  */
+
+static void
+vect_reduction_update_partial_vector_usage (loop_vec_info loop_vinfo,
+   stmt_vec_info reduc_info,
+   slp_tree slp_node,
+   code_helper code, tree type,
+   tree vectype_in)
+{
+  enum vect_reduction_type reduc_type = STMT_VINFO_REDUC_TYPE (reduc_info);
+  internal_fn reduc_fn = STMT_VINFO_REDUC_FN (reduc_info);
+  internal_fn cond_fn = get_conditional_internal_fn (code, type);
+
+  if (reduc_type != FOLD_LEFT_REDUCTION
+  && !use_mask_by_cond_expr_p (code, cond_fn, vectype_in)
+  && (cond_fn == IFN_LAST
+ || !direct_internal_fn_supported_p (cond_fn, vectype_in,
+ OPTIMIZE_FOR_SPEED)))
+{
+  if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"can't operate on partial vectors because"
+" no conditional operation is available.\n");
+  LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
+}
+  else if (reduc_type == FOLD_LEFT_REDUCTION
+  && reduc_fn == IFN_LAST
+  && !expand_vec_cond_expr_p (vectype_in, truth_type_for (vectype_in),
+  SSA_NAME))
+{
+  if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+   "can't operate on partial vectors because"
+   " no conditional operation is available.\n");
+  LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
+}
+  else if (reduc_type == FOLD_LEFT_REDUCTION
+  && internal_fn_mask_index (reduc_fn) == -1
+  && FLOAT_TYPE_P (vectype_in)
+  && HONOR_SIGN_DEPENDENT_ROUNDING (vectype_in))
+{
+  if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"can't operate on partial vectors because"
+" signed zeros cannot be preserved.\n");
+  LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
+}
+  else
+{
+  internal_fn mask_reduc_fn
+   = get_masked_reduction_fn (reduc_fn, vectype_in);
+  vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
+  vec_loop_lens *lens = &LOOP_VINFO_LENS (loop_vinfo);
+  unsigned nvectors;
+
+  if (slp_node)
+   nvectors = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node);
+  else
+   nvectors = vect_get_num_copies (loop_vinfo, vectype_in);
+
+  if (mask_reduc_fn == IFN_MASK_LEN_FOLD_LEFT_PLUS)
+   vect_record_loop_len (loop_vinfo, lens, nvectors, vectype_in, 1);
+  else
+   vect_record_loop_mask (loop_vinfo, masks, nvectors, vectype_in, NULL);
+}
+}
+
 /* Function vectorizable_reduction.
 
Check if STMT_INFO performs a reduction operation that can be vectorized.
@@ -7456,7 +7529,6 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
   bool single_defuse_cycle = false;
   bool nested_cycle = false;
   bool double_reduc = false;
-  int vec_num;
   tree cr_index_scalar_type = NULL_TREE, cr_index_vector_type = NULL_TREE;
   tree cond_reduc_val = NULL_TREE;
 
@@ -8283,11 +8355,6 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
  return false;
}
 
-  if 

[gcc r15-961] vect: Add a function to check lane-reducing code

2024-05-31 Thread Feng Xue via Gcc-cvs
https://gcc.gnu.org/g:c0f31701556c4162463f28bc0f03007f40a6176e

commit r15-961-gc0f31701556c4162463f28bc0f03007f40a6176e
Author: Feng Xue 
Date:   Wed May 29 13:12:12 2024 +0800

vect: Add a function to check lane-reducing code

Checking whether an operation is lane-reducing requires comparing its code
against three kinds (DOT_PROD_EXPR/WIDEN_SUM_EXPR/SAD_EXPR).  Add a utility
function to make the check handy and concise.

2024-05-29 Feng Xue 

gcc/
* tree-vectorizer.h (lane_reducing_op_p): New function.
* tree-vect-slp.cc (vect_analyze_slp): Use new function
lane_reducing_op_p to check statement code.
* tree-vect-loop.cc (vect_transform_reduction): Likewise.
(vectorizable_reduction): Likewise, and change name of a local
variable that holds the result flag.

Diff:
---
 gcc/tree-vect-loop.cc | 29 -
 gcc/tree-vect-slp.cc  |  4 +---
 gcc/tree-vectorizer.h |  6 ++
 3 files changed, 19 insertions(+), 20 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 04a9ac64df7..a42d79c7cbf 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -7650,9 +7650,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
   gimple_match_op op;
   if (!gimple_extract_op (stmt_info->stmt, ))
 gcc_unreachable ();
-  bool lane_reduc_code_p = (op.code == DOT_PROD_EXPR
-   || op.code == WIDEN_SUM_EXPR
-   || op.code == SAD_EXPR);
+  bool lane_reducing = lane_reducing_op_p (op.code);
 
   if (!POINTER_TYPE_P (op.type) && !INTEGRAL_TYPE_P (op.type)
   && !SCALAR_FLOAT_TYPE_P (op.type))
@@ -7664,7 +7662,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
 
   /* For lane-reducing ops we're reducing the number of reduction PHIs
  which means the only use of that may be in the lane-reducing operation.  
*/
-  if (lane_reduc_code_p
+  if (lane_reducing
   && reduc_chain_length != 1
   && !only_slp_reduc_chain)
 {
@@ -7678,7 +7676,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
  since we'll mix lanes belonging to different reductions.  But it's
  OK to use them in a reduction chain or when the reduction group
  has just one element.  */
-  if (lane_reduc_code_p
+  if (lane_reducing
   && slp_node
   && !REDUC_GROUP_FIRST_ELEMENT (stmt_info)
   && SLP_TREE_LANES (slp_node) > 1)
@@ -7738,7 +7736,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
   /* To properly compute ncopies we are interested in the widest
 non-reduction input type in case we're looking at a widening
 accumulation that we later handle in vect_transform_reduction.  */
-  if (lane_reduc_code_p
+  if (lane_reducing
  && vectype_op[i]
  && (!vectype_in
  || (GET_MODE_SIZE (SCALAR_TYPE_MODE (TREE_TYPE (vectype_in)))
@@ -8211,7 +8209,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
   && loop_vinfo->suggested_unroll_factor == 1)
 single_defuse_cycle = true;
 
-  if (single_defuse_cycle || lane_reduc_code_p)
+  if (single_defuse_cycle || lane_reducing)
 {
   gcc_assert (op.code != COND_EXPR);
 
@@ -8227,7 +8225,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
 mixed-sign dot-products can be implemented using signed
 dot-products.  */
   machine_mode vec_mode = TYPE_MODE (vectype_in);
-  if (!lane_reduc_code_p
+  if (!lane_reducing
  && !directly_supported_p (op.code, vectype_in, optab_vector))
 {
   if (dump_enabled_p ())
@@ -8252,7 +8250,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
  For the other cases try without the single cycle optimization.  */
   if (!ok)
{
- if (lane_reduc_code_p)
+ if (lane_reducing)
return false;
  else
single_defuse_cycle = false;
@@ -8263,7 +8261,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
   /* If the reduction stmt is one of the patterns that have lane
  reduction embedded we cannot handle the case of ! single_defuse_cycle.  */
   if ((ncopies > 1 && ! single_defuse_cycle)
-  && lane_reduc_code_p)
+  && lane_reducing)
 {
   if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -8274,7 +8272,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
 
   if (slp_node
   && !(!single_defuse_cycle
-  && !lane_reduc_code_p
+  && !lane_reducing
   && reduction_type != FOLD_LEFT_REDUCTION))
 for (i = 0; i < (int) op.num_ops; i++)
   if (!vect_maybe_update_slp_op_vectype (slp_op[i], vectype_op[i]))
@@ -8295,7 +8293,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
   /* Cost the reduction op inside the loop if transformed via
  vect_transform_reduction.  Otherwise this is costed by the
  separate vectorizable_* routines.  */
-  if 

[gcc r15-904] Delete a file due to push error

2024-05-29 Thread Feng Xue via Gcc-cvs
https://gcc.gnu.org/g:b24b081113c696f4e523c8ae53fc3ab89c3b4e4d

commit r15-904-gb24b081113c696f4e523c8ae53fc3ab89c3b4e4d
Author: Feng Xue 
Date:   Wed May 29 22:20:45 2024 +0800

Delete a file due to push error

gcc/
* tree-vect-loop.c : Removed.

Diff:
---
 gcc/tree-vect-loop.c | 0
 1 file changed, 0 insertions(+), 0 deletions(-)

diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
deleted file mode 100644
index e69de29bb2d..000


[gcc r15-903] vect: Unify bbs in loop_vec_info and bb_vec_info

2024-05-29 Thread Feng Xue via Gcc-cvs
https://gcc.gnu.org/g:9c747183efa555e45200523c162021e385511be5

commit r15-903-g9c747183efa555e45200523c162021e385511be5
Author: Feng Xue 
Date:   Thu May 16 11:08:38 2024 +0800

vect: Unify bbs in loop_vec_info and bb_vec_info

Both derived classes have their own "bbs" field, with exactly the same purpose
of recording all basic blocks inside the corresponding vect region, but the
fields use different data types: one is a plain array, the other an auto_vec.
This difference causes some duplicated code for handling the same thing, mostly
in tree-vect-patterns. One refinement is to lift this field into the base class
"vec_info" and set it, in each derived constructor, to the contiguous memory
area that the two old "bbs" fields pointed to.
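
A simplified sketch of the resulting layout, using stand-in types rather than
the real GCC declarations:

  struct basic_block_def;
  typedef basic_block_def *basic_block;

  /* Base region info: the shared bbs/nbbs pair lets generic code (e.g.
     vect_determine_precisions) walk the region's blocks without knowing
     which kind of region it is.  */
  struct vec_info
  {
    basic_block *bbs;    /* blocks of the vect region */
    unsigned int nbbs;   /* number of blocks */
  };

  struct loop_vec_info_sketch : vec_info
  {
    /* Constructor points bbs at a freshly allocated array filled by a DFS
       walk of the loop body (cf. _loop_vec_info::_loop_vec_info).  */
  };

  struct bb_vec_info_sketch : vec_info
  {
    /* Constructor points bbs at the storage of the auto_vec<basic_block>
       it receives (cf. _bb_vec_info::_bb_vec_info).  */
  };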

2024-05-16 Feng Xue 

gcc/
* tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Move
initialization of bbs to explicit construction code.  Adjust the
definition of nbbs.
(update_epilogue_loop_vinfo): Update nbbs for epilog vinfo.
* tree-vect-patterns.cc (vect_determine_precisions): Make
loop_vec_info and bb_vec_info share same code.
(vect_pattern_recog): Remove duplicated vect_pattern_recog_1 loop.
* tree-vect-slp.cc (vect_get_and_check_slp_defs): Access to bbs[0]
via base vec_info class.
(_bb_vec_info::_bb_vec_info): Initialize bbs and nbbs using data
fields of input auto_vec<> bbs.
(vect_slp_region): Use access to nbbs to replace original
bbs.length().
(vect_schedule_slp_node): Access to bbs[0] via base vec_info class.
* tree-vectorizer.cc (vec_info::vec_info): Add initialization of
bbs and nbbs.
(vec_info::insert_seq_on_entry): Access to bbs[0] via base vec_info
class.
* tree-vectorizer.h (vec_info): Add new fields bbs and nbbs.
(LOOP_VINFO_NBBS): New macro.
(BB_VINFO_BBS): Rename BB_VINFO_BB to BB_VINFO_BBS.
(BB_VINFO_NBBS): New macro.
(_loop_vec_info): Remove field bbs.
(_bb_vec_info): Rename field bbs.

Diff:
---
 gcc/tree-vect-loop.c  |   0
 gcc/tree-vect-loop.cc |   7 ++-
 gcc/tree-vect-patterns.cc | 142 +-
 gcc/tree-vect-slp.cc  |  23 +---
 gcc/tree-vectorizer.cc|   7 ++-
 gcc/tree-vectorizer.h |  23 
 6 files changed, 74 insertions(+), 128 deletions(-)

diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
new file mode 100644
index 000..e69de29bb2d
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 3b94bb13a8b..04a9ac64df7 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -1028,7 +1028,6 @@ bb_in_loop_p (const_basic_block bb, const void *data)
 _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared)
   : vec_info (vec_info::loop, shared),
 loop (loop_in),
-bbs (XCNEWVEC (basic_block, loop->num_nodes)),
 num_itersm1 (NULL_TREE),
 num_iters (NULL_TREE),
 num_iters_unchanged (NULL_TREE),
@@ -1079,8 +1078,9 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, 
vec_info_shared *shared)
  case of the loop forms we allow, a dfs order of the BBs would the same
  as reversed postorder traversal, so we are safe.  */
 
-  unsigned int nbbs = dfs_enumerate_from (loop->header, 0, bb_in_loop_p,
- bbs, loop->num_nodes, loop);
+  bbs = XCNEWVEC (basic_block, loop->num_nodes);
+  nbbs = dfs_enumerate_from (loop->header, 0, bb_in_loop_p, bbs,
+loop->num_nodes, loop);
   gcc_assert (nbbs == loop->num_nodes);
 
   for (unsigned int i = 0; i < nbbs; i++)
@@ -11667,6 +11667,7 @@ update_epilogue_loop_vinfo (class loop *epilogue, tree 
advance)
 
   free (LOOP_VINFO_BBS (epilogue_vinfo));
   LOOP_VINFO_BBS (epilogue_vinfo) = epilogue_bbs;
+  LOOP_VINFO_NBBS (epilogue_vinfo) = epilogue->num_nodes;
 
   /* Advance data_reference's with the number of iterations of the previous
  loop and its prologue.  */
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 8929e5aa7f3..88e7e34d78d 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -6925,81 +6925,41 @@ vect_determine_stmt_precisions (vec_info *vinfo, 
stmt_vec_info stmt_info)
 void
 vect_determine_precisions (vec_info *vinfo)
 {
+  basic_block *bbs = vinfo->bbs;
+  unsigned int nbbs = vinfo->nbbs;
+
   DUMP_VECT_SCOPE ("vect_determine_precisions");
 
-  if (loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo))
+  for (unsigned int i = 0; i < nbbs; i++)
 {
-  class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
-  basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo);
-  unsigned int nbbs = loop->num_nodes;
-
-  for (unsigned int i = 0; i < nbbs; i++)
+  basic_block bb = bbs[i];
+  for (auto gsi 

[gcc r15-863] vect: Use vect representative statement instead of original in patch recog [PR115060]

2024-05-28 Thread Feng Xue via Gcc-cvs
https://gcc.gnu.org/g:a3aeff4ce95bd616a2108dc2363d9cbaba53b170

commit r15-863-ga3aeff4ce95bd616a2108dc2363d9cbaba53b170
Author: Feng Xue 
Date:   Thu May 23 15:25:53 2024 +0800

vect: Use vect representative statement instead of original in patch recog 
[PR115060]

Some utility functions (such as vect_look_through_possible_promotion) that
find a certain kind of direct or indirect SSA definition for a value may
return the original SSA name, not its pattern representative, even when a
pattern is involved. For example,

   a = (T1) patt_b;
   patt_b = (T2) c;     // b = ...
   patt_c = not-a-cast; // c = ...

Given 'a', the mentioned function will return 'c' instead of 'patt_c'. This
subtlety could make pattern recog code that is unaware of it mis-use the
original instead of the new pattern statement, which is inconsistent with
the processing logic of the pattern formation pass. This patch corrects the
issue by making another utility function (vect_get_internal_def) return the
pattern statement information to the caller by default.
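
A toy model (not GCC code) of the behavior change: a definition record may
carry a pattern representative, and the lookup now follows it by default,
matching what the rest of the pattern formation pass expects.

  #include <cstdio>

  struct stmt_info
  {
    const char *text;
    stmt_info *pattern_repr;   /* set when a pattern replaced this stmt */
  };

  /* Models the patched vect_get_internal_def: prefer the pattern
     representative when one exists.  */
  static stmt_info *
  get_internal_def (stmt_info *def)
  {
    return def->pattern_repr ? def->pattern_repr : def;
  }

  int main ()
  {
    stmt_info patt_c = { "patt_c = not-a-cast", nullptr };
    stmt_info c      = { "c = ...",             &patt_c };

    /* Callers previously saw "c = ..."; now they get patt_c by default.  */
    printf ("%s\n", get_internal_def (&c)->text);
    return 0;
  }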

2024-05-23 Feng Xue 

gcc/
PR tree-optimization/115060
* tree-vect-patterns.cc (vect_get_internal_def): Return statement 
for
vectorization.
(vect_widened_op_tree): Call vect_get_internal_def instead of 
look_def
to get statement information.
(vect_recog_widen_abd_pattern): No need to call 
vect_stmt_to_vectorize.

Diff:
---
 gcc/tree-vect-patterns.cc | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index a313dc64643..8929e5aa7f3 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -266,7 +266,7 @@ vect_get_internal_def (vec_info *vinfo, tree op)
   stmt_vec_info def_stmt_info = vinfo->lookup_def (op);
   if (def_stmt_info
   && STMT_VINFO_DEF_TYPE (def_stmt_info) == vect_internal_def)
-return def_stmt_info;
+return vect_stmt_to_vectorize (def_stmt_info);
   return NULL;
 }
 
@@ -655,7 +655,8 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info 
stmt_info, tree_code code,
 
  /* Recursively process the definition of the operand.  */
  stmt_vec_info def_stmt_info
-   = vinfo->lookup_def (this_unprom->op);
+   = vect_get_internal_def (vinfo, this_unprom->op);
+
  nops = vect_widened_op_tree (vinfo, def_stmt_info, code,
   widened_code, shift_p, max_nops,
   this_unprom, common_type,
@@ -1739,7 +1740,6 @@ vect_recog_widen_abd_pattern (vec_info *vinfo, 
stmt_vec_info stmt_vinfo,
   if (!abd_pattern_vinfo)
 return NULL;
 
-  abd_pattern_vinfo = vect_stmt_to_vectorize (abd_pattern_vinfo);
   gcall *abd_stmt = dyn_cast <gcall *> (STMT_VINFO_STMT (abd_pattern_vinfo));
   if (!abd_stmt
   || !gimple_call_internal_p (abd_stmt)