When a lane-reducing op is present in a loop reduction, we may need to insert
extra trivial pass-through copies to align vectorized def/use, which causes a
mismatch between the lane-reducing vector copy index and the loop mask index.
Fix this by deriving the mask index from a new counter that advances only on
effective lane-reducing vector copies.
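The idea behind the counter can be sketched as follows (a standalone
illustration, not GCC code; the names `mask_indices` and `is_passthrough` are
hypothetical): when some of the vector stmt copies are trivial pass-throughs
that consume no loop mask, indexing masks by the raw copy index `i` leaves
gaps, while a dedicated counter keeps the mask indices dense.

```cpp
#include <cassert>
#include <vector>

// Hypothetical sketch: for each vector stmt copy, return the loop mask index
// it should use, or -1 for a trivial pass-through copy that consumes no mask.
// A separate counter (mirroring the patch's new mask_index) advances only on
// effective copies, so the produced indices are dense: 0, 1, 2, ...
std::vector<int>
mask_indices (const std::vector<bool> &is_passthrough)
{
  std::vector<int> out;
  int mask_index = 0;                 // counter over effective copies only
  for (bool pass : is_passthrough)
    out.push_back (pass ? -1          // pass-through copy: skip, no mask
			: mask_index++);
  return out;
}
```

With copies {effective, pass-through, effective}, indexing by `i` would
request masks 0 and 2, whereas the counter correctly requests masks 0 and 1.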
Thanks,
Feng
---
gcc/
PR tree-optimization/116985
* tree-vect-loop.cc (vect_transform_reduction): Compute loop mask
index based on effective vector copies for reduction op.
---
gcc/tree-vect-loop.cc | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index ade72a5124f..025442aabc3 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -8916,6 +8916,7 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
bool emulated_mixed_dot_prod = vect_is_emulated_mixed_dot_prod (stmt_info);
unsigned num = vec_oprnds[reduc_index == 0 ? 1 : 0].length ();
+ unsigned mask_index = 0;
for (unsigned i = 0; i < num; ++i)
{
@@ -8954,7 +8955,8 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
std::swap (vop[0], vop[1]);
}
tree mask = vect_get_loop_mask (loop_vinfo, gsi, masks,
- vec_num * ncopies, vectype_in, i);
+ vec_num * ncopies, vectype_in,
+ mask_index++);
gcall *call = gimple_build_call_internal (cond_fn, 4, mask,
vop[0], vop[1], vop[0]);
new_temp = make_ssa_name (vec_dest, call);
@@ -8971,7 +8973,8 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
if (masked_loop_p && mask_by_cond_expr)
{
tree mask = vect_get_loop_mask (loop_vinfo, gsi, masks,
- vec_num * ncopies, vectype_in, i);
+ vec_num * ncopies, vectype_in,
+ mask_index++);
build_vect_cond_expr (code, vop, mask, gsi);
}
--
2.17.1