The 10/30/2025 10:34, Richard Biener wrote:
> On Thu, 30 Oct 2025, Tamar Christina wrote:
>
> > This fixes the two reported ICEs by accounting for that neutral_op can be
> > NULL
> > and that we should require IFN_VEC_SHL_INSERT for non reduction defs.
> >
> > The conditions were taken from what is required to delay the reduction with
> > the
> > initial values later on in vect_transform_cycle_phi.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu,
> > arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
> > -m32, -m64 and no issues
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > PR tree-optimization/122474
> > PR tree-optimization/122475
> > * tree-vect-loop.cc (vectorizable_reduction): Tighten IFN_VEC_SHL_INSERT
> > requirement.
> >
> > gcc/testsuite/ChangeLog:
> >
> > PR tree-optimization/122474
> > PR tree-optimization/122475
> > * gcc.dg/vect/pr122474.c: New test.
> > * gcc.dg/vect/pr122475.c: New test.
> > * gcc.target/aarch64/sve/vect-reduc-bool-19.c: New test.
> > * gcc.target/aarch64/sve/vect-reduc-bool-20.c: New test.
> >
> > ---
> > diff --git a/gcc/testsuite/gcc.dg/vect/pr122474.c
> > b/gcc/testsuite/gcc.dg/vect/pr122474.c
> > new file mode 100644
> > index
> > 0000000000000000000000000000000000000000..c5d70b518a0078a6ec887a37d1c898365ec6f01d
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/pr122474.c
> > @@ -0,0 +1,19 @@
> > +/* { dg-do compile } */
> > +/* { dg-additional-options "-fwrapv -Waggressive-loop-optimizations -w" }
> > */
> > +/* { dg-additional-options "-march=armv8-a+sve" { target aarch64*-*-* } }
> > */
> > +/* Check that we don't ICE. */
> > +
> > +_Bool a;
> > +void b(unsigned c[][8][8][8][8][8]) {
> > + for (int d = 1; d; d += 3)
> > + for (int e = 0; e < 11; e += 2)
> > + for (short f = 0; f < 1; f = 30482)
> > + for (short g = 0; g < 014; g++)
> > + a = ({
> > + int h = a;
> > + int i = c[2][f][d][d][d][d] ^ c[g][g][1][2][g][g];
> > + i ? h : i;
> > + });
> > +}
> > +
> > +/* { dg-final { scan-tree-dump-times "optimized: loop vectorized" 1 "vect"
> > { target aarch64*-*-* } } } */
> > diff --git a/gcc/testsuite/gcc.dg/vect/pr122475.c
> > b/gcc/testsuite/gcc.dg/vect/pr122475.c
> > new file mode 100644
> > index
> > 0000000000000000000000000000000000000000..ed229c5475cc96fc0ef3beea71b101db89e08916
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/pr122475.c
> > @@ -0,0 +1,13 @@
> > +/* { dg-additional-options "-march=armv8-a+sve" { target aarch64*-*-* } }
> > */
> > +/* Check that we don't ICE. */
> > +int a;
> > +int b;
> > +int main() {
> > + for (char t = 0; t < 14; t += 2)
> > + for (int u = 0; u < 242; u += 4) {
> > + a = a < 0 ? a : 0;
> > + b = b < 0 ? b : 0;
> > + }
> > +}
> > +
> > +/* { dg-final { scan-tree-dump-times "optimized: loop vectorized" 1 "vect"
> > { target aarch64*-*-* } } } */
> > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/vect-reduc-bool-19.c
> > b/gcc/testsuite/gcc.target/aarch64/sve/vect-reduc-bool-19.c
> > new file mode 100644
> > index
> > 0000000000000000000000000000000000000000..6492c44c065159d06d7e69499dfc9834721d62ef
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/sve/vect-reduc-bool-19.c
> > @@ -0,0 +1,17 @@
> > +/* { dg-do compile } */
> > +/* { dg-additional-options "-mautovec-preference=sve-only
> > -fdump-tree-vect-details -O3 --param vect-epilogues-nomask=0" } */
> > +
> > +int p[128];
> > +
> > +bool __attribute__((noipa))
> > +fand (int n, bool r1, bool r2)
> > +{
> > + bool r = true;
> > + for (int i = 0; i < (n/2); i+=2)
> > + {
> > + r &= (p[i] != 0) & r1;
> > + r &= (p[i+1] != 0) & r2;
> > + }
> > + return r;
> > +}
> > +/* { dg-final { scan-tree-dump-times "optimized: loop vectorized" 1 "vect"
> > } } */
> > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/vect-reduc-bool-20.c
> > b/gcc/testsuite/gcc.target/aarch64/sve/vect-reduc-bool-20.c
> > new file mode 100644
> > index
> > 0000000000000000000000000000000000000000..83c5c206dd2597c3776317005a3dd38ee5991f16
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/sve/vect-reduc-bool-20.c
> > @@ -0,0 +1,32 @@
> > +/* { dg-do compile } */
> > +/* { dg-additional-options "-mautovec-preference=sve-only
> > -fdump-tree-vect-details -O3 --param vect-epilogues-nomask=0" } */
> > +
> > +#include <stdbool.h>
> > +#include <stdint.h>
> > +
> > +void vec_slp_cmp (char* restrict a, char* restrict b, int n) {
> > + bool x0 = b[0] != 0;
> > + bool x1 = b[1] != 0;
> > + bool x2 = b[2] != 0;
> > + bool x3 = b[3] != 0;
> > + for (int i = 0; i < n; ++i) {
> > + x0 &= (a[i * 4] != 0);
> > + x1 &= (a[i * 4 + 1] != 0);
> > + x2 &= (a[i * 4 + 2] != 0);
> > + x3 &= (a[i * 4 + 3] != 0);
> > + }
> > + b[0] = x0;
> > + b[1] = x1;
> > + b[2] = x2;
> > + b[3] = x3;
> > +}
> > +
> > +void vec_slp_cmp1 (char* restrict a, char* restrict b, int n) {
> > + bool x0 = b[0] != 0;
> > + for (int i = 0; i < n; ++i) {
> > + x0 &= (a[i] != 0);
> > + }
> > + b[0] = x0;
> > +}
> > +
> > +/* { dg-final { scan-tree-dump-times "optimized: loop vectorized" 2 "vect"
> > { target aarch64*-*-* } } } */
> > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > index
> > 50cdc2a90fa29c1e0d116c0589bc246e6d8fcc84..bfcea0d52337eead7ee80572963f986bee00591d
> > 100644
> > --- a/gcc/tree-vect-loop.cc
> > +++ b/gcc/tree-vect-loop.cc
> > @@ -7577,8 +7577,14 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
> > values into the low-numbered elements. */
> > if ((double_reduc || neutral_op)
> > && !nunits_out.is_constant ()
> > - && (SLP_TREE_LANES (slp_node) != 1 && !reduc_chain)
> > - && !operand_equal_p (neutral_op, vect_phi_initial_value
> > (reduc_def_phi))
> > + && (!neutral_op
> > + || (!reduc_chain
> > + && (SLP_TREE_LANES (slp_node) != 1
> > + || (!VECT_REDUC_INFO_REUSED_ACCUMULATOR (reduc_info)
>
> So this will be always true given we set
> VECT_REDUC_INFO_REUSED_ACCUMULATOR only during transform. Meaning
> this will be a source of mismatch which is why I earlier suggested
> that we'd need to ensure we can later drop the requirement, like
> forcing of no accumulator re-use. Thus, after the checks we can
> perform here, if ...
sure, and it's not the only thing that's wrong. As I tried to ask on the
ticket and on IRC,
!operand_equal_p (neutal_op, vect_phi_initial_value (reduc_def_phi)) is always
true for
boolean reductions.
As I mentioned in both places, neutral_op_for_reduction returns a mask value
for bitwise
operations. i.e. it returns a boolean value with -1. This will never work for
a boolean
reduction because initial value is boolean 0/1. So trivially this condition
here and
anywhere else not using the value as a mask always going to be false for
boolean reductions.
So it's hard to see what is going to work and what isn't. I did try to modify
neutral_op_for_reduction with an overload to indicate that for
VECTOR_BOOLEAN_TYPE_P types
it should return 0/1 and use that during this analysis, because for the most
part during
analysis it only cares about if it exist or not.
>
> > + && STMT_VINFO_DEF_TYPE (stmt_info)
> > + != vect_reduction_def))
> > + && !operand_equal_p (neutral_op,
> > + vect_phi_initial_value (reduc_def_phi))))
> > && !direct_internal_fn_supported_p (IFN_VEC_SHL_INSERT,
> > vectype_out, OPTIMIZE_FOR_SPEED))
>
> ... this fails, allow to continue but set some
> VECT_REDUC_INFO_MAY_NOT_REUSE_ACCUMULATOR or so?
>
> I'd suggest to push the !neutral_op change and rework the rest in
> a way suggested?
But I don't see how you can drop it.. not reusing the accumulator requires
the vector splat and inserting of the neutral values. so you end up with
needing VEC_SHL_INSERT...
So VECT_REDUC_INFO_MAY_NOT_REUSE_ACCUMULATOR is a dead end if you don't have
VEC_SHL_INSERT...
>
> That said, the accumulator re-use might be not an actual blocker
> to delay the initial value adjustment, but just an implementation
> detail missing. Does the other bug run into exactly that?
>
The other case gets passed here because we only support accumulator re-use
for reduction defs. The check on STMT_VINFO_DEF_TYPE (stmt_info) ==
vect_reduction_def
is enough to fix it.
I copied the full condition just to be consistent expecting it to be set during
analysis
and didn't realize it was only set by vect_find_reusable_accumulator.
Tamar.
> Richard.
>
> > {
> >
> >
> >
>
> --
> Richard Biener <[email protected]>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
--