https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122474
--- Comment #12 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #11)
> Looking at the PR123053 testcase the guard of the VEC_SHL_INSERT optab check
> is false.
>
>   if ((double_reduc || neutral_op)
>       && !nunits_out.is_constant ()
>       && (SLP_TREE_LANES (slp_node) != 1 && !reduc_chain)
>       && (!neutral_op
>           || !operand_equal_p (neutral_op,
>                                vect_phi_initial_value (reduc_def_phi)))
>       && !direct_internal_fn_supported_p (IFN_VEC_SHL_INSERT,
>                                           vectype_out, OPTIMIZE_FOR_SPEED))
>
> in particular SLP_TREE_LANES (slp_node) == 1.  r16-4558-g1b387bd8978577
> added this check, commenting
>
> "This is however not needed if the target can do the reduction using the new
> optabs, and the initial reduction value matches the neutral value and we
> have one SLP lane while not having a reduction chain."
>
> But the check does not match what is done - instead
>
>       && (!(SLP_TREE_LANES (slp_node) == 1
>             && !reduc_chain
>             && neutral_op
>             && operand_equal_p (neutral_op, ...))
>
> instead would match what the comment in the commit says.
>
> The following patch fixes that particular testcase for me.  I'll post it
> and see if the risc-v CI still works.
>
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index aa59cd1a39d..d3bb788d866 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -7653,12 +7653,16 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
>    /* For double reductions, and for SLP reductions with a neutral value,
>       we construct a variable-length initial vector by loading a vector
>       full of the neutral value and then shift-and-inserting the start
> -     values into the low-numbered elements.  */
> +     values into the low-numbered elements.  This is however not needed
> +     if the target can do the reduction using the new optabs, and the initial
> +     reduction value matches the neutral value and we have one SLP lane
> +     while not having a reduction chain.  */
>    if ((double_reduc || neutral_op)
>        && !nunits_out.is_constant ()
> -      && (SLP_TREE_LANES (slp_node) != 1 && !reduc_chain)
> -      && (!neutral_op
> -         || !operand_equal_p (neutral_op,
> +      && !(SLP_TREE_LANES (slp_node) == 1
> +          && !reduc_chain
> +          && neutral_op
> +          && operand_equal_p (neutral_op,
>                                vect_phi_initial_value (reduc_def_phi)))
>        && !direct_internal_fn_supported_p (IFN_VEC_SHL_INSERT,
>                                           vectype_out, OPTIMIZE_FOR_SPEED))

Bootstrapped & tested on aarch64-linux, but

+FAIL: gcc.target/aarch64/sve/vect-reduc-bool-1.c scan-tree-dump-times vect "optimized: loop vectorized" 2
+FAIL: gcc.target/aarch64/sve/vect-reduc-bool-19.c scan-tree-dump-times vect "optimized: loop vectorized" 1
+FAIL: gcc.target/aarch64/sve/vect-reduc-bool-2.c scan-tree-dump-times vect "optimized: loop vectorized" 2
+FAIL: gcc.target/aarch64/sve/vect-reduc-bool-20.c scan-tree-dump-times vect "optimized: loop vectorized" 2
+FAIL: gcc.target/aarch64/sve/vect-reduc-bool-3.c scan-tree-dump-times vect "optimized: loop vectorized" 2
+FAIL: gcc.target/aarch64/sve/vect-reduc-bool-4.c scan-tree-dump-times vect "optimized: loop vectorized" 2
+FAIL: gcc.target/aarch64/sve/vect-reduc-bool-5.c scan-tree-dump-times vect "optimized: loop vectorized" 2
+FAIL: gcc.target/aarch64/sve/vect-reduc-bool-6.c scan-tree-dump-times vect "optimized: loop vectorized" 2
+FAIL: gcc.target/aarch64/sve/vect-reduc-bool-7.c scan-tree-dump-times vect "optimized: loop vectorized" 2
+FAIL: gcc.target/aarch64/sve/vect-reduc-bool-8.c scan-tree-dump-times vect "optimized: loop vectorized" 2
+FAIL: gcc.target/aarch64/sve/vect-reduc-bool-9.c check-function-bodies fand
+FAIL: gcc.target/aarch64/sve/vect-reduc-bool-9.c scan-tree-dump-times vect "optimized: loop vectorized" 3

RISC-V testing still running.
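
For reference, the difference between the committed and the patched form of the guard can be checked with a small truth-table sketch. This is not GCC code: it models only the lane/neutral-value sub-condition with four plain booleans (all names are illustrative stand-ins) and ignores the surrounding `double_reduc`, `nunits_out` and optab checks. The single-lane case in the `static_assert`s is a hypothetical example, not a claim about the exact state in the PR123053 testcase.

```cpp
// Boolean model of the lane/neutral-value part of the guard in
// vectorizable_reduction.  The parameters are assumptions standing in for:
//   single_lane      ~ SLP_TREE_LANES (slp_node) == 1
//   reduc_chain      ~ reduc_chain
//   has_neutral      ~ neutral_op != NULL
//   init_is_neutral  ~ operand_equal_p (neutral_op, <initial value>)

// Sub-condition as committed: true means "VEC_SHL_INSERT must be checked".
constexpr bool old_guard(bool single_lane, bool reduc_chain,
                         bool has_neutral, bool init_is_neutral)
{
  return (!single_lane && !reduc_chain)
         && (!has_neutral || !init_is_neutral);
}

// Sub-condition from the patch: the optab check is skipped only when all
// four conditions named in the commit comment hold at once.
constexpr bool new_guard(bool single_lane, bool reduc_chain,
                         bool has_neutral, bool init_is_neutral)
{
  return !(single_lane && !reduc_chain && has_neutral && init_is_neutral);
}

// Count the input combinations on which the two forms disagree.  All 16
// combinations are enumerated, including has_neutral == false with
// init_is_neutral == true, which the real code never evaluates.
inline int count_disagreements()
{
  int n = 0;
  for (int bits = 0; bits < 16; ++bits)
    {
      const bool lane = bits & 1, chain = bits & 2;
      const bool neutral = bits & 4, equal = bits & 8;
      if (old_guard(lane, chain, neutral, equal)
          != new_guard(lane, chain, neutral, equal))
        ++n;
    }
  return n;
}

// Hypothetical single-lane case whose initial value is not the neutral
// value: the committed form skips the optab check, the patched form does not.
static_assert(!old_guard(true, false, true, false), "old form skips the check");
static_assert(new_guard(true, false, true, false), "new form requires the check");
```

In this model the committed form is never true for a single lane, while the patched form is false only when single lane, no reduction chain, a neutral value, and a matching initial value all hold together, so the patched guard is strictly wider.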
