Re: [PATCH][vect]: tighten IFN_VEC_SHL_INSERT requirements.

Richard Biener Thu, 30 Oct 2025 05:07:15 -0700

On Thu, 30 Oct 2025, Tamar Christina wrote:

> The 10/30/2025 10:34, Richard Biener wrote:
> > On Thu, 30 Oct 2025, Tamar Christina wrote:
> > 
> > > This fixes the two reported ICEs by accounting for that neutral_op can be 
> > > NULL
> > > and that we should require IFN_VEC_SHL_INSERT for non reduction defs.
> > > 
> > > The conditions were taken from what is required to delay the reduction 
> > > with the
> > > initial values later on in vect_transform_cycle_phi.
> > > 
> > > Bootstrapped Regtested on aarch64-none-linux-gnu,
> > > arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
> > > -m32, -m64 and no issues
> > > 
> > > Ok for master?
> > > 
> > > Thanks,
> > > Tamar
> > > 
> > > gcc/ChangeLog:
> > > 
> > >   PR tree-optimization/122474
> > >   PR tree-optimization/122475
> > >   * tree-vect-loop.cc (vectorizable_reduction): Tighten IFN_VEC_SHL_INSERT
> > >   requirement.
> > > 
> > > gcc/testsuite/ChangeLog:
> > > 
> > >   PR tree-optimization/122474
> > >   PR tree-optimization/122475
> > >   * gcc.dg/vect/pr122474.c: New test.
> > >   * gcc.dg/vect/pr122475.c: New test.
> > >   * gcc.target/aarch64/sve/vect-reduc-bool-19.c: New test.
> > >   * gcc.target/aarch64/sve/vect-reduc-bool-20.c: New test.
> > > 
> > > ---
> > > diff --git a/gcc/testsuite/gcc.dg/vect/pr122474.c 
> > > b/gcc/testsuite/gcc.dg/vect/pr122474.c
> > > new file mode 100644
> > > index 
> > > 0000000000000000000000000000000000000000..c5d70b518a0078a6ec887a37d1c898365ec6f01d
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/vect/pr122474.c
> > > @@ -0,0 +1,19 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-additional-options "-fwrapv -Waggressive-loop-optimizations -w" 
> > > } */
> > > +/* { dg-additional-options "-march=armv8-a+sve" { target aarch64*-*-* } 
> > > } */
> > > +/* Check that we don't ICE.  */
> > > +
> > > +_Bool a;
> > > +void b(unsigned c[][8][8][8][8][8]) {
> > > +  for (int d = 1; d; d += 3)
> > > +    for (int e = 0; e < 11; e += 2)
> > > +      for (short f = 0; f < 1; f = 30482)
> > > +        for (short g = 0; g < 014; g++)
> > > +          a = ({
> > > +            int h = a;
> > > +            int i = c[2][f][d][d][d][d] ^ c[g][g][1][2][g][g];
> > > +            i ? h : i;
> > > +          });
> > > +}
> > > +
> > > +/* { dg-final { scan-tree-dump-times "optimized: loop vectorized" 1 
> > > "vect" { target aarch64*-*-* } } } */
> > > diff --git a/gcc/testsuite/gcc.dg/vect/pr122475.c 
> > > b/gcc/testsuite/gcc.dg/vect/pr122475.c
> > > new file mode 100644
> > > index 
> > > 0000000000000000000000000000000000000000..ed229c5475cc96fc0ef3beea71b101db89e08916
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/vect/pr122475.c
> > > @@ -0,0 +1,13 @@
> > > +/* { dg-additional-options "-march=armv8-a+sve" { target aarch64*-*-* } 
> > > } */
> > > +/* Check that we don't ICE.  */
> > > +int a;
> > > +int b;
> > > +int main() {
> > > +  for (char t = 0; t < 14; t += 2)
> > > +    for (int u = 0; u < 242; u += 4) {
> > > +      a = a < 0 ? a : 0;
> > > +      b = b < 0 ? b : 0;
> > > +    }
> > > +}
> > > +
> > > +/* { dg-final { scan-tree-dump-times "optimized: loop vectorized" 1 
> > > "vect" { target aarch64*-*-* } } } */
> > > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/vect-reduc-bool-19.c 
> > > b/gcc/testsuite/gcc.target/aarch64/sve/vect-reduc-bool-19.c
> > > new file mode 100644
> > > index 
> > > 0000000000000000000000000000000000000000..6492c44c065159d06d7e69499dfc9834721d62ef
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/aarch64/sve/vect-reduc-bool-19.c
> > > @@ -0,0 +1,17 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-additional-options "-mautovec-preference=sve-only 
> > > -fdump-tree-vect-details -O3 --param vect-epilogues-nomask=0" } */
> > > +
> > > +int p[128];
> > > +
> > > +bool __attribute__((noipa))
> > > +fand (int n, bool r1, bool r2)
> > > +{
> > > +  bool r = true;
> > > +  for (int i = 0; i < (n/2); i+=2)
> > > +    {
> > > +      r &= (p[i] != 0) & r1;
> > > +      r &= (p[i+1] != 0) & r2;
> > > +    }
> > > +  return r;
> > > +}
> > > +/* { dg-final { scan-tree-dump-times "optimized: loop vectorized" 1 
> > > "vect" } } */
> > > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/vect-reduc-bool-20.c 
> > > b/gcc/testsuite/gcc.target/aarch64/sve/vect-reduc-bool-20.c
> > > new file mode 100644
> > > index 
> > > 0000000000000000000000000000000000000000..83c5c206dd2597c3776317005a3dd38ee5991f16
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/aarch64/sve/vect-reduc-bool-20.c
> > > @@ -0,0 +1,32 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-additional-options "-mautovec-preference=sve-only 
> > > -fdump-tree-vect-details -O3 --param vect-epilogues-nomask=0" } */
> > > +
> > > +#include <stdbool.h>
> > > +#include <stdint.h>
> > > +
> > > +void vec_slp_cmp (char* restrict a, char* restrict b, int n) {
> > > +    bool x0 = b[0] != 0;
> > > +    bool x1 = b[1] != 0;
> > > +    bool x2 = b[2] != 0;
> > > +    bool x3 = b[3] != 0;
> > > +    for (int i = 0; i < n; ++i) {
> > > +        x0 &= (a[i * 4] != 0);
> > > +        x1 &= (a[i * 4 + 1] != 0);
> > > +        x2 &= (a[i * 4 + 2] != 0);
> > > +        x3 &= (a[i * 4 + 3] != 0);
> > > +    }
> > > +    b[0] = x0;
> > > +    b[1] = x1;
> > > +    b[2] = x2;
> > > +    b[3] = x3;
> > > +}
> > > +
> > > +void vec_slp_cmp1 (char* restrict a, char* restrict b, int n) {
> > > +    bool x0 = b[0] != 0;
> > > +    for (int i = 0; i < n; ++i) {
> > > +        x0 &= (a[i] != 0);
> > > +    }
> > > +    b[0] = x0;
> > > +}
> > > +
> > > +/* { dg-final { scan-tree-dump-times "optimized: loop vectorized" 2 
> > > "vect" { target aarch64*-*-* } } } */
> > > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > > index 
> > > 50cdc2a90fa29c1e0d116c0589bc246e6d8fcc84..bfcea0d52337eead7ee80572963f986bee00591d
> > >  100644
> > > --- a/gcc/tree-vect-loop.cc
> > > +++ b/gcc/tree-vect-loop.cc
> > > @@ -7577,8 +7577,14 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
> > >       values into the low-numbered elements.  */
> > >    if ((double_reduc || neutral_op)
> > >        && !nunits_out.is_constant ()
> > > -      && (SLP_TREE_LANES (slp_node) != 1 && !reduc_chain)
> > > -      && !operand_equal_p (neutral_op, vect_phi_initial_value 
> > > (reduc_def_phi))
> > > +      && (!neutral_op
> > > +   || (!reduc_chain
> > > +        && (SLP_TREE_LANES (slp_node) != 1
> > > +            || (!VECT_REDUC_INFO_REUSED_ACCUMULATOR (reduc_info)
> > 
> > So this will be always true given we set 
> > VECT_REDUC_INFO_REUSED_ACCUMULATOR only during transform.  Meaning
> > this will be a source of mismatch which is why I earlier suggested
> > that we'd need to ensure we can later drop the requirement, like
> > forcing of no accumulator re-use.  Thus, after the checks we can
> > perform here, if ...
> 
> sure, and it's not the only thing that's wrong.  As I tried to ask on the 
> ticket and on IRC,
> !operand_equal_p (neutal_op, vect_phi_initial_value (reduc_def_phi)) is 
> always true for
> boolean reductions.
> 
> As I mentioned in both places, neutral_op_for_reduction returns a mask value 
> for bitwise
> operations. i.e. it returns a boolean value with -1.  This will never work 
> for a boolean
> reduction because initial value is boolean 0/1.  So trivially this condition 
> here and
> anywhere else not using the value as a mask always going to be false for 
> boolean reductions.


Ah, yes.  That's a missed optimization.

> So it's hard to see what is going to work and what isn't.  I did try to modify
> neutral_op_for_reduction with an overload to indicate that for 
> VECTOR_BOOLEAN_TYPE_P types
> it should return 0/1 and use that during this analysis, because for the most 
> part during
> analysis it only cares about if it exist or not.

I think it's better to alter the equality check for booleans to
check for matching non-zero-ness here.

> > 
> > > +                && STMT_VINFO_DEF_TYPE (stmt_info)
> > > +                         != vect_reduction_def))
> > > +        && !operand_equal_p (neutral_op,
> > > +                             vect_phi_initial_value (reduc_def_phi))))
> > >        && !direct_internal_fn_supported_p (IFN_VEC_SHL_INSERT,
> > >                                     vectype_out, OPTIMIZE_FOR_SPEED))
> > 
> > ... this fails, allow to continue but set some
> > VECT_REDUC_INFO_MAY_NOT_REUSE_ACCUMULATOR or so?
> > 
> > I'd suggest to push the !neutral_op change and rework the rest in
> > a way suggested?
> 
> But I don't see how you can drop it.. not reusing the accumulator requires
> the vector splat and inserting of the neutral values. so you end up with
> needing VEC_SHL_INSERT...

Hmm, indeed.  So we _must_ re-use accumulators even.  What I'm saying
is that we can't let the "let's decide during transform" stand and
not require IFN_VEC_SHL_INSERT.

So the only valid check to have here is that of the inital value being
also a neutral value.

> So VECT_REDUC_INFO_MAY_NOT_REUSE_ACCUMULATOR is a dead end if you don't have
> VEC_SHL_INSERT...
> 
> > 
> > That said, the accumulator re-use might be not an actual blocker
> > to delay the initial value adjustment, but just an implementation
> > detail missing.  Does the other bug run into exactly that?
> > 
> 
> The other case gets passed here because we only support accumulator re-use
> for reduction defs.  The check on STMT_VINFO_DEF_TYPE (stmt_info) == 
> vect_reduction_def
> is enough to fix it.

I'll note that a STMT_VINFO_DEF_TYPE check at least looks suspicious.
In the trasform-cycle case this is the loop PHI, so this means we're
dealing with a reduction, not a double reduction or a nested cycle.
So I think in vectorizable_reduction this should be covered by
!double_reduc.

Overall clearer would possibly be

 if (!nunits_out.is_constant ()
     && (double_reduc
      || !neutral_op
      || (neutral_op && !operand_equal_p (...)))
     && !direct_internal_fn_... (..))

thus refactor the conditions a bit, possibly even adding sub-comments.

I'd factor out a initial_value_neutral_for_reduction (neutral_op, initval)
function to handle the boolean case better, usable from both places.

> I copied the full condition just to be consistent expecting it to be set 
> during analysis
> and didn't realize it was only set by vect_find_reusable_accumulator.
> 
> Tamar.
> 
> > Richard.
> > 
> > >      {
> > > 
> > > 
> > > 
> > 
> > -- 
> > Richard Biener <[email protected]>
> > SUSE Software Solutions Germany GmbH,
> > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
> 
> 

-- 
Richard Biener <[email protected]>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

Re: [PATCH][vect]: tighten IFN_VEC_SHL_INSERT requirements.

Reply via email to