https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122277
--- Comment #3 from rguenther at suse dot de <rguenther at suse dot de> ---
On Tue, 14 Oct 2025, rdapp at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122277
>
> --- Comment #2 from Robin Dapp <rdapp at gcc dot gnu.org> ---
> (In reply to Richard Biener from comment #1)
> > Hmm, I see V4QI being used:
> >
> > ~/obj-riscv-g/gcc> /home/rguenther/obj-riscv-g/gcc/xgcc
> > -B/home/rguenther/obj-riscv-g/gcc/
> > /home/rguenther/src/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr118019-
> > 2.c -march=rv64gcv_zvl512b -mno-vector-strict-align
> > -fdiagnostics-plain-output -O3 -ftree-vectorize -O3 -march=rv64gcv_zvl512b
> > -mabi=lp64d -mno-vector-strict-align -ffat-lto-objects -fno-ident -S -o
> > pr118019-2.s -fdump-tree-vect-details -fopt-info-vec
> > /home/rguenther/src/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr118019-
> > 2.c:42:21: optimized: loop vectorized using 4 byte vectors and unroll factor
> > 4
> > /home/rguenther/src/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr118019-
> > 2.c:34:21: optimized: loop vectorized using 16 byte vectors and unroll
> > factor 4
>
> Ugh, sorry.  I have been working on a local version that has
>
>   int tmp[4][4];
>
> instead of
>
>   uint32_t tmp[4][4];
>
> (that's what upstream x264 uses).

Ah, so one change is that previously we didn't attempt single-lane SLP
when a reduction chain failed vectorization in the end:

note:   ==> examining statement: _59 = tmp[0][i_147];
missed:   permutation not supported, using elementwise access
missed:   Not using elementwise accesses due to variable vectorization factor.

but now we do, and single-lane RVVM1SI looks better from your cost model.

I believe that before my patch we never tried this because there's
a conversion around the reduction and

          /* ??? When there's a conversion around the reduction
             chain 'last' isn't the entry of the reduction.  */
          if (STMT_VINFO_DEF_TYPE (last) != vect_reduction_def)
            return opt_result::failure_at (vect_location,
                                           "SLP build failed.\n");

          /* It can be still vectorized as part of an SLP reduction.  */
          loop_vinfo->reductions.safe_push (last);

we had previously - we had no way to "recover" the original non-chained
reduction.  That's one of the "improvements", we can now handle
every reduction chain as its regular original reduction ...
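
For readers following along, here is a minimal sketch of the kind of loop
being discussed: a reduction chain with a conversion around it.  This is
NOT the pr118019-2.c testcase; the names abs_u32 and sum_columns are
hypothetical and chosen only for illustration.  The accumulator is int
while the addends are uint32_t, so (as an assumption about how this
typically looks after gimplification) each update converts the accumulator
to unsigned, performs the chain of additions, and converts back.

```c
#include <stdint.h>

/* Hypothetical helper, not taken from the testcase: absolute value of a
   32-bit quantity, computed and returned as unsigned.  */
static uint32_t
abs_u32 (uint32_t x)
{
  return (int32_t) x < 0 ? -x : x;
}

/* Column-wise sum over a 4x4 array.  'sum' is int but every addend is
   uint32_t, so each 'sum +=' adds in unsigned arithmetic and converts
   the result back to int: a conversion around the reduction chain.  */
int
sum_columns (uint32_t tmp[4][4])
{
  int sum = 0;
  for (int i = 0; i < 4; i++)
    sum += abs_u32 (tmp[0][i]) + abs_u32 (tmp[1][i])
           + abs_u32 (tmp[2][i]) + abs_u32 (tmp[3][i]);
  return sum;
}
```

Declaring the array as int tmp[4][4] and using a signed helper would remove
the signed/unsigned mismatch at the source level, which is the difference
between the two local versions mentioned in the quoted text.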
