On Mon, Jan 12, 2026 at 11:58 AM Jeffrey Law <[email protected]> wrote:
>
> This fixes a P3 regression relative to gcc-13 on the RISC-V platform for this
> code:
>
> unsigned char a;
>
> int main() {
>   short b = a = 0;
>   for (; a != 19; a++)
>     if (a)
>       b = 32872 >> a;
>
>   if (b == 0)
>     return 0;
>   else
>     return 1;
> }
>
> -march=rv64gcv_zvl256b -mabi=lp64d -O3 -ftree-vectorize
>
>
> Doesn't need vector at all.  Good code generation here looks like:
>
>
>         lui     a5,%hi(a)
>         li      a4,19
>         sb      a4,%lo(a)(a5)
>         li      a0,0
>         ret
>
>
> gcc-14 and gcc-15 produce horrific code here, roughly 20 instructions, over
> half of which are vector.  It's not even worth posting, it's atrocious.
>
> The trunk improves things, but not quite to the quality of gcc-13:
>
>         vsetivli        zero,8,e16,mf2,ta,ma
>         vmv.v.i v1,0
>         lui     a5,%hi(a)
>         li      a4,19
>         vslidedown.vi   v1,v1,1
>         sb      a4,%lo(a)(a5)
>         vmv.x.s a0,v1
>         snez    a0,a0
>         ret
>
>
> If we look at the .optimized dump we have this nugget:
>
>   _26 = .VEC_EXTRACT ({ 0, 0, 0, 0, 0, 0, 0, 0 }, 1);
>
>
> If we're extracting an element out of a uniform vector, then any element will
> do and it's conveniently returned by uniform_vector_p.  So with a simple
> match.pd pattern that simplifies to _26 = 0.  That in turn allows elimination
> of all the vector code and simplify the return value to a constant as well,
> resulting in the desired code shown earlier.
>
> One could easily argue that this need not be restricted to a uniform vector
> and I would totally agree.  But given we're in stage4, the minimal fix for
> the regression seems more appropriate.  But I could certainly be convinced to
> handle the more general case here.
>
> Bootstrapped and regression tested on x86 & riscv64.  Tested across the cross
> configurations as well with no regressions.
>
>
> OK for the trunk?

I think this should be closer to what I did for VEC_SHL_INSERT in
r16-4742-gfcde4c81644aec (that change came out of the review of my own
attempt at something similar to your patch:
https://gcc.gnu.org/pipermail/gcc-patches/2024-July/658275.html ).

That is, in fold_const_call (in fold-const.cc) add a VEC_EXTRACT case and
do something similar to fold_const_vec_shl_insert.
Something like:
```
static tree
fold_const_vec_extract (tree, tree arg0, tree arg1)
{
  if (TREE_CODE (arg0) != VECTOR_CST)
    return NULL_TREE;

  /* vec_extract ( dup(CST), N) -> CST. */
  if (tree elem = uniform_vector_p (arg0))
    return elem;

  return NULL_TREE;
}
```

And also in match.pd add:
```
(simplify
 (IFN_VEC_EXTRACT (vec_duplicate @0) @1)
 @0)
```

I don't think we need a bounds check on @1 either, because an
out-of-range index would be undefined anyway.

Thanks,
Andrew

>
> Jeff
