Re: [RFC 7/9] Update constant creation for BB SLP with predicated tails

Richard Biener Wed, 12 Nov 2025 06:12:51 -0800

On Wed, 12 Nov 2025, Richard Sandiford wrote:

> As a general comment, we do already support constructors of
> variable-length vectors, thanks to Tejas's earlier ACLE work.
> For example: https://godbolt.org/z/Eqd8sM4cv .  The quality
> of the output is awful, but it does look correct.
> 
> Sorry if that was already common ground -- just mentioning it
> because I didn't see it in this subthread.
> 
> Richard Biener writes:
> > On Fri, 7 Nov 2025, Christopher Bazley wrote:
> >
> >> 
> >> On 07/11/2025 13:35, Richard Biener wrote:
> >> > On Wed, 5 Nov 2025, Christopher Bazley wrote:
> >> >
> >> >> On 28/10/2025 13:51, Richard Biener wrote:
> >> >>> On Tue, 28 Oct 2025, Christopher Bazley wrote:
> >> >>>
> >> >>>> vect_create_constant_vectors is updated to pad with zeros
> >> >>>> between the end of a group and the end of a vector of the type
> >> >>>> chosen for the SLP node, when used for BB SLP. This function
> >> >>>> calls gimple_build_vector, which also has to be updated for
> >> >>>> SVE vector types (by using the lower bound as the number of
> >> >>>> elements, e.g., 16 for VNx16QI).
> >> >>>> ---
> >> >>>>    gcc/gimple-fold.cc   |  2 +-
> >> >>>>    gcc/tree-vect-slp.cc | 43 +++++++++++++++++++++++++++++++++++--------
> >> >>>>    2 files changed, 36 insertions(+), 9 deletions(-)
> >> >>>>
> >> >>>> diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
> >> >>>> index edcc04adc08..e5fe0ea12a7 100644
> >> >>>> --- a/gcc/gimple-fold.cc
> >> >>>> +++ b/gcc/gimple-fold.cc
> >> >>>> @@ -11275,7 +11275,7 @@ gimple_build_vector (gimple_stmt_iterator *gsi,
> >> >>>>           {
> >> >>>>     gimple_seq seq = NULL;
> >> >>>>     tree type = builder->type ();
> >> >>>> -     unsigned int nelts = TYPE_VECTOR_SUBPARTS (type).to_constant ();
> >> >>>> +     unsigned int nelts = constant_lower_bound (TYPE_VECTOR_SUBPARTS (type));
> >> >>> I don't think this is desirable in a generic helper?  How does
> >> >>> the 'builder' class deal with the all-constant case?  It seems
> >> >>> handling for constant vs. non-constant will now differ semantically
> >> >>> (instead of ICEing in one case previously).
> >> >> This was the most minimal change I could make to get the feature working
> >> >> (whilst debugging many other issues) and it seemed harmless to me, so I
> >> >> didn't spend much time thinking about it.
> >> >>
> >> >> I know very little about the builder, but my understanding is that it
> >> >> would behave as though elements beyond the lower bound do not exist.
> >> >> E.g., if the vector type is VNx16QI then TREE_CONSTANT would return true
> >> >> for the CONSTRUCTOR node created by build_constructor if elements 0..15
> >> >> are constant.
> >> >>
> >> >> This is presumably not safe general-purpose behaviour, because it would
> >> >> leave any other elements uninitialised (which does not matter for my
> >> >> use-case). I have no objection to trying to solve this elsewhere (probably
> >> >> in vect_create_constant_vectors) but I'll first need to revert this change
> >> >> and remind myself what breaks.
> >> > Fixing this upthread would be definitely better.  Not sure exactly how.
> >> > Alternatively the change could be done in a way to assert that
> >> > the tree_vector_builder has less than or exactly the same number
> >> > of elements as constant_lower_bound of nelts.  I don't exactly
> >> > remember what the builder tracks here and what constraints for
> >> > initialization of VLA vectors are.
> >> 
> >> I've done some further investigation.
> >> 
> >> One of the tests that failed without my change to gimple_build_vector was
> >> gcc.target/aarch64/sve/slp_6.c. I made that change to enable building of
> >> the following constant (among others):
> >> 
> >> _70 = {_85, _21, _55, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};
> >> 
> >> That constant has only 16 elements although the type of _70 is
> >> vector([16,16]) unsigned char:
> >> 
> >> void vec_slp_int8_t (int8_t * restrict a, int8_t * restrict b, int n)
> >> {
> >> ...
> >> 
> >>   vector([16,16]) signed char vect_x0_43.58;
> >>   vector([16,16]) signed char vect__90.57;
> >>   vector([16,16]) unsigned char vect__89.56;
> >>   vector([16,16]) unsigned char vect__87.55;
> >>   vector([16,16]) signed char vect_x0_26.54;
> >>   vector([16,16]) signed char vect_x0_34.47;
> >> 
> >> ...
> >> 
> >>   vector([16,16]) signed char vect_x1_35.41;
> >> 
> >> ...
> >> 
> >>   vector([16,16]) signed char vect_x2_36.35;
> >> 
> >> ...
> >> 
> >>   void * _8;
> >>   vector([16,16]) signed char[3] * _9;
> >> 
> >> ...
> >> 
> >>   unsigned char _21;
> >>   vector([16,16]) unsigned char _22;
> >>   unsigned char _55;
> >>   vector([16,16]) unsigned char _56;
> >> 
> >> ...
> >> 
> >>   vector([16,16]) unsigned char _70;
> >>   vector([16,16]) unsigned char _84;
> >>   unsigned char _85;
> >> 
> >> ...
> >> 
> >>   <bb 5> [local count: 105119324]:
> >>   _84 = (vector([16,16]) unsigned char) vect_x0_34.47_82;
> >>   _85 = .REDUC_PLUS (_84);
> >>   _22 = (vector([16,16]) unsigned char) vect_x1_35.41_38;
> >>   _21 = .REDUC_PLUS (_22);
> >>   _56 = (vector([16,16]) unsigned char) vect_x2_36.35_58;
> >>   _55 = .REDUC_PLUS (_56);
> >>   _70 = {_85, _21, _55, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};
> >>   vect__89.56_3 = vect__87.55_2 + _70;
> >>   vect__90.57_4 = VIEW_CONVERT_EXPR<vector([16,16]) signed char>(vect__89.56_3);
> >> 
> >>   <bb 6> [local count: 118111600]:
> >>   # vect_x0_43.58_5 = PHI <vect__90.57_4(5), vect_x0_26.54_1(8)>
> >>   .MASK_STORE (b_25(D), 8B, { -1, -1, -1, 0, 0, 0, 0, 0, ... }, vect_x0_43.58_5); [tail call]
> >>   return;
> >> 
> >> The compiled code looks correct, although movi d0,#0 only zeros the first
> >> 16 bytes of the variable-length vector constant:
> 
> Architecturally, it zeros the whole register, including the “upper” SVE-only
> bits.  But what mode is the move done in?  Does the RTL move insn use
> a 16-byte mode or a VL-byte mode?
> 
> >> 
> >>     addvl    x0, sp, #2
> >>     movi    d0, #0
> >>     st1b    z0.b, p6, [sp, #2, mul vl]
> >>     uaddv    d27, p6, z27.b
> >>     uaddv    d26, p6, z26.b
> >>     uaddv    d25, p6, z25.b
> >>     str    b27, [x0]
> >>     addvl    x0, sp, #1
> >>     add    x0, x0, 1
> >>     ptrue    p7.b, vl3
> >>     ld1b    z0.b, p6/z, [sp, #2, mul vl]
> >>     st1b    z0.b, p6, [sp, #1, mul vl]
> >>     str    b26, [x0]
> >>     ld1b    z0.b, p6/z, [sp, #1, mul vl]
> >>     st1b    z0.b, p6, [sp]
> >>     str    b25, [sp, 2]
> >>     ld1b    z0.b, p6/z, [sp]
> >>     add    z28.b, z0.b, z28.b
> >>     st1b    z28.b, p7, [x1]
> >>     addvl    sp, sp, #3
> >>     .cfi_def_cfa_offset 0
> >>     ret
> >> 
> >> (This code has already been noted to be inefficient, which I plan to
> >> address separately.)
> >> 
> >> The decision about how many bytes to zero is made in the calling function,
> >> vect_create_constant_vectors (which also uses constant_lower_bound), rather
> >> than in gimple_build_vector:
> >> 
> >>   unsigned int elt_count = group_size;
> >>   if (is_a<bb_vec_info> (vinfo))
> >>     {
> >>       /* We don't use duplicate_and_interleave for basic block
> >>          vectorization.  We know that the group size fits within a single
> >>          vector, so all we need to do for VLA is to pad the constant to the
> >>          minimum vector length.  */
> >>       nunits = constant_lower_bound (TYPE_VECTOR_SUBPARTS (vector_type));
> >>       elt_count = MAX (nunits, group_size);
> >>     }
> >> 
> >> My current understanding is that you don't object to this part of my
> >> change. Whatever happens in gimple_build_vector won’t alter the fact that
> >> only the minimum number of bytes are zeroed, and in most cases that’s the
> >> desirable outcome.
> >> 
> >> I therefore plan to keep my modification to gimple_build_vector, but add an
> >> assertion that builder->encoded_nelts () <= constant_lower_bound
> >> (TYPE_VECTOR_SUBPARTS (builder->type ())) so that the modified function
> >> never builds fewer elements than expected when one of them is non-constant.
> >> Would that be OK?
> >
> > I'm not sure builder->encoded_nelts () is the correct thing to check
> > here.  In particular any stepped encoding should be rejected as well,
> > so nelts_per_pattern () must be <= 2.  And even then the interpretation
> > is then to fill with the last value IIRC, and as you get zero-filling
> > with building a "short" CTOR that last element should be a zero.  I'm
> > not sure how to get at the "last" value, but I think that given
> > we create a "short" CTOR we need to check that all remaining
> > elements of the VLA vector are actually encoded as zeros?
> 
> Agreed.  The only valid situations seem to be:
> 
> (1) a duplicate of a single zero, where:
> 
>     npatterns == nelts_per_pattern == encoded_nelts == 1
> 
>     and the only encoded value is zero
> 
> (2) the combination of:
> 
>     - nelts_per_pattern == 2
>     - multiple_p (TYPE_VECTOR_SUBPARTS (type), npatterns)
>     - the second half of the encoded elements are all zeros
> 
> But these combinations would not come about by chance.  The caller
> would have to take steps to ensure that they're true.  So rather
> than check for these relatively complex conditions, it might
> be clearer to add a new gimple_build interface that explicitly
> fills with zeros, using a normal array (instead of a
> tree_vector_builder) for the explicitly-initialised elements.
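
To make the two cases above concrete, here is a rough sketch in plain
C++.  It is a toy model only, not GCC's tree_vector_builder: the names
toy_encoding, toy_element and toy_zero_tail_p are made up for
illustration, stepped encodings are deliberately not modelled, and
multiple_p is approximated by a remainder check on the minimum element
count, which is only adequate for types like [16,16].

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a vector encoding (hypothetical, not a GCC type).
// ENCODED holds npatterns * nelts_per_pattern values, laid out so that
// encoded[k * npatterns + p] is the k-th encoded element of pattern p.
struct toy_encoding
{
  unsigned npatterns;
  unsigned nelts_per_pattern;
  std::vector<int> encoded;
};

// Value of element I of the full vector.  Only nelts_per_pattern <= 2
// is modelled: past the encoded elements, each pattern repeats its
// last encoded value.
int
toy_element (const toy_encoding &e, std::size_t i)
{
  std::size_t pattern = i % e.npatterns;
  std::size_t index = i / e.npatterns;
  if (index >= e.nelts_per_pattern)
    index = e.nelts_per_pattern - 1;
  return e.encoded[index * e.npatterns + pattern];
}

// True if every element beyond the explicitly encoded ones is zero,
// following cases (1) and (2) above.  MIN_NUNITS is the assumed
// constant lower bound of the element count.
bool
toy_zero_tail_p (const toy_encoding &e, unsigned min_nunits)
{
  if (e.npatterns == 0
      || e.encoded.size () != std::size_t (e.npatterns) * e.nelts_per_pattern)
    return false;

  // Case (1): a duplicate of a single zero.
  if (e.npatterns == 1 && e.nelts_per_pattern == 1)
    return e.encoded[0] == 0;

  // Case (2): two elements per pattern, npatterns divides the element
  // count, and the second half of the encoded elements is all zeros.
  if (e.nelts_per_pattern != 2 || min_nunits % e.npatterns != 0)
    return false;
  for (std::size_t i = e.npatterns; i < e.encoded.size (); ++i)
    if (e.encoded[i] != 0)
      return false;
  return true;
}
```

In this model the constant from slp_6.c would need npatterns == 16 and
32 encoded values (the 16 explicit ones followed by 16 zeros) to
qualify, which illustrates why these combinations would not come about
by chance.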


So I think what we need is to set the "lowpart" of a VLA vector register
(i.e. its first constant_lower_bound elements) from a fixed-size
constructor.  I'd imagine most VLA vector ISAs can achieve that at least
via a "zero-extending" load?  But how would we represent this in GIMPLE?

I fear the above restrictions are too tight for what we need for
BB SLP, which is an arbitrary "lowpart" CTOR.
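
For reference, the semantics I have in mind can be modelled in plain
C++ as below.  This is only a sketch of the intended result, not a
proposal for the GIMPLE representation: set_lowpart_zero_fill, min_lanes
and vq (the number of 16-element chunks in the run-time vector) are
hypothetical names, and 16 lanes are assumed to match the VNx16QI
example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed minimum element count, as for VNx16QI.
constexpr std::size_t min_lanes = 16;

// Model of "set the lowpart from a fixed-size constructor": the first
// min_lanes elements come from LOW, which may hold arbitrary values,
// and every remaining element of the run-time-sized vector is zero.
std::vector<std::uint8_t>
set_lowpart_zero_fill (const std::array<std::uint8_t, min_lanes> &low,
                       unsigned vq)
{
  std::vector<std::uint8_t> full (min_lanes * vq, 0);
  for (std::size_t i = 0; i < min_lanes && i < full.size (); ++i)
    full[i] = low[i];
  return full;
}
```

Unlike the encoded-constant cases, nothing here constrains the values
in the low part; only the elements above the minimum length are
required to be zero.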

Richard.

> Thanks,
> Richard
> 
> > I hope Richard can give us a hint at what's the correct thing to do
> > there.  In principle creating a fixed-size vector CTOR and then
> > using a VEC_PERM to fill that with zeros like VEC_PERM <fixed-length,
> > VLA zero, { 0, 1, 2, 3, 4, 5, 6 ... }> would be that, or simply
> > a V_C_E of the fixed-length vector to the VLA vector.
> >
> > Richard.
> 

-- 
Richard Biener
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Jochen Jaser, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
