On Mon, Jan 12, 2026 at 11:58 AM Jeffrey Law
<[email protected]> wrote:
>
> This fixes a P3 regression relative to gcc-13 on the RISC-V platform for this 
> code:
>
> unsigned char a;
>
> int main() {
>   short b = a = 0;
>   for (; a != 19; a++)
>     if (a)
>       b = 32872 >> a;
>
>   if (b == 0)
>     return 0;
>   else
>     return 1;
> }
>
> -march=rv64gcv_zvl256b -mabi=lp64d -O3 -ftree-vectorize
>
>
> Doesn't need vector at all.  Good code generation here looks like:
>
>
>         lui     a5,%hi(a)
>         li      a4,19
>         sb      a4,%lo(a)(a5)
>         li      a0,0
>         ret
>
>
> gcc-14 and gcc-15 produce horrific code here, roughly 20 instructions, over 
> half of which are vector.  It's not even worth posting, it's atrocious.
>
> The trunk improves things, but not quite to the quality of gcc-13:
>
>         vsetivli        zero,8,e16,mf2,ta,ma
>         vmv.v.i v1,0
>         lui     a5,%hi(a)
>         li      a4,19
>         vslidedown.vi   v1,v1,1
>         sb      a4,%lo(a)(a5)
>         vmv.x.s a0,v1
>         snez    a0,a0
>         ret
>
>
> If we look at the .optimized dump we have this nugget:
>
>   _26 = .VEC_EXTRACT ({ 0, 0, 0, 0, 0, 0, 0, 0 }, 1);
>
>
> If we're extracting an element out of a uniform vector, then any element will
> do, and it's conveniently returned by uniform_vector_p.  So with a simple
> match.pd pattern that simplifies to _26 = 0.  That in turn allows eliminating
> all the vector code and simplifying the return value to a constant as well,
> resulting in the desired code shown earlier.
>
> One could easily argue that this need not be restricted to a uniform vector 
> and I would totally agree.  But given we're in stage4, the minimal fix for 
> the regression seems more appropriate.  But I could certainly be convinced to 
> handle the more general case here.
>
> Bootstrapped and regression tested on x86 & riscv64.  Tested across the cross 
> configurations as well with no regressions.
>
>
> OK for the trunk?

I think this should be closer to what I did for VEC_SHL_INSERT in
r16-4742-gfcde4c81644aec (that was based on the review of my earlier
attempt at something similar to your patch:
https://gcc.gnu.org/pipermail/gcc-patches/2024-July/658275.html ).

That is, add a VEC_EXTRACT case to fold_const_call (in fold-const-call.cc)
and handle it along the lines of fold_const_vec_shl_insert.
Something like:
```
static tree
fold_const_vec_extract (tree, tree arg0, tree arg1)
{
  if (TREE_CODE (arg0) != VECTOR_CST)
    return NULL_TREE;

  /* vec_extract (dup (CST), N) -> CST.  */
  if (tree elem = uniform_vector_p (arg0))
    return elem;

  return NULL_TREE;
}
```
Also add this to match.pd:
```
(simplify
 (IFN_VEC_EXTRACT (vec_duplicate @0) @1)
  @0)
```

I don't think we need a bounds check on @1 either, because an
out-of-range index would be undefined anyway.
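
To illustrate why the index can be ignored, here is a toy model of the fold
in plain C++, outside of GCC.  The names toy_uniform_element and
toy_fold_vec_extract are hypothetical stand-ins, not GCC APIs, and a
std::vector<int> stands in for a VECTOR_CST; treat it as a sketch of the
idea rather than the actual implementation:
```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-in for uniform_vector_p: returns the common element
// when every lane of the constant vector is equal, otherwise nothing.
static std::optional<int>
toy_uniform_element (const std::vector<int> &lanes)
{
  if (lanes.empty ())
    return std::nullopt;
  for (int lane : lanes)
    if (lane != lanes.front ())
      return std::nullopt;
  return lanes.front ();
}

// Hypothetical stand-in for the proposed fold.  The index is deliberately
// unused: for a uniform vector every in-range lane holds the same value,
// and an out-of-range index is assumed to be undefined anyway.
static std::optional<int>
toy_fold_vec_extract (const std::vector<int> &lanes, std::size_t /* index */)
{
  return toy_uniform_element (lanes);
}
```
With this model, extracting lane 1 from { 0, 0, 0, 0, 0, 0, 0, 0 } folds to
0, while a non-uniform vector such as { 1, 2, 3, 4 } is left alone (no fold).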


Thanks,
Andrew


>
> Jeff
>
>
>
>
