On Wed, 12 Nov 2025, Richard Sandiford wrote:
> As a general comment, we do already support constructors of
> variable-length vectors, thanks to Tejas's earlier ACLE work.
> For example: https://godbolt.org/z/Eqd8sM4cv . The quality
> of the output is awful, but it does look correct.
>
> Sorry if that was already common ground -- just mentioning it
> because I didn't see it in this subthread.
>
> Richard Biener <[email protected]> writes:
> > On Fri, 7 Nov 2025, Christopher Bazley wrote:
> >
> >>
> >> On 07/11/2025 13:35, Richard Biener wrote:
> >> > On Wed, 5 Nov 2025, Christopher Bazley wrote:
> >> >
> >> >> On 28/10/2025 13:51, Richard Biener wrote:
> >> >>> On Tue, 28 Oct 2025, Christopher Bazley wrote:
> >> >>>
> >> >>>> vect_create_constant_vectors is updated to pad with zeros
> >> >>>> between the end of a group and the end of a vector of the type
> >> >>>> chosen for the SLP node, when used for BB SLP. This function
> >> >>>> calls gimple_build_vector, which also has to be updated for
> >> >>>> SVE vector types (by using the lower bound as the number of
> >> >>>> elements, e.g., 16 for VNx16QI).
> >> >>>> ---
> >> >>>> gcc/gimple-fold.cc | 2 +-
> >> >>>> gcc/tree-vect-slp.cc | 43
> >> >>>> +++++++++++++++++++++++++++++++++++--------
> >> >>>> 2 files changed, 36 insertions(+), 9 deletions(-)
> >> >>>>
> >> >>>> diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
> >> >>>> index edcc04adc08..e5fe0ea12a7 100644
> >> >>>> --- a/gcc/gimple-fold.cc
> >> >>>> +++ b/gcc/gimple-fold.cc
> >> >>>> @@ -11275,7 +11275,7 @@ gimple_build_vector (gimple_stmt_iterator
> >> >>>> *gsi,
> >> >>>> {
> >> >>>> gimple_seq seq = NULL;
> >> >>>> tree type = builder->type ();
> >> >>>> - unsigned int nelts = TYPE_VECTOR_SUBPARTS (type).to_constant ();
> >> >>>> + unsigned int nelts = constant_lower_bound (TYPE_VECTOR_SUBPARTS
> >> >>>> (type));
> >> >>> I don't think this is desirable in a generic helper? How does
> >> >>> the 'builder' class deal with the all-constant case? It seems
> >> >>> handling for constant vs. non-constant will now differ semantically
> >> >>> (instead of ICEing in one case previously).
> >> >> This was the most minimal change I could make to get the feature working
> >> >> (whilst debugging many other issues) and it seemed harmless to me, so I
> >> >> didn't
> >> >> spend much time thinking about it.
> >> >>
> >> >> I know very little about the builder, but my understanding is that
> >> >> it would
> >> >> behave as though elements beyond the lower bound do not
> >> >> exist. e.g., if the
> >> >> vector type is VNx16QI then TREE_CONSTANT would return true for the
> >> >> CONSTRUCTOR node created by build_constructor if elements 0..15 are
> >> >> constant.
> >> >>
> >> >> This is presumably not safe general-purpose behaviour, because it would
> >> >> leave
> >> >> any other elements uninitialised (which does not matter for my
> >> >> use-case). I
> >> >> have no objection to trying to solve this elsewhere (probably in
> >> >> vect_create_constant_vectors) but I'll first need to revert this
> >> >> change and
> >> >> remind myself what breaks.
> >> > Fixing this upthread would be definitely better. Not sure exactly how.
> >> > Alternatively the change could be done in a way to assert that
> >> > the tree_vector_builder has less than or exactly the same number
> >> > of elements as constant_lower_bound of nelts. I don't exactly
> >> > remember what the builder tracks here and what constraints for
> >> > initialization of VLA vectors are.
> >>
> >> I've done some further investigation.
> >>
> >> One of the tests that failed without my change to gimple_build_vector was
> >> gcc.target/aarch64/sve/slp_6.c. I made that change to enable building of
> >> the
> >> following constant (among others):
> >>
> >> _70 = {_85, _21, _55, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};
> >>
> >> That constant has only 16 elements although the type of _70 is
> >> vector([16,16])
> >> unsigned char:
> >>
> >> void vec_slp_int8_t (int8_t * restrict a, int8_t * restrict b, int n)
> >> {
> >> ...
> >>
> >> vector([16,16]) signed char vect_x0_43.58;
> >> vector([16,16]) signed char vect__90.57;
> >> vector([16,16]) unsigned char vect__89.56;
> >> vector([16,16]) unsigned char vect__87.55;
> >> vector([16,16]) signed char vect_x0_26.54;
> >> vector([16,16]) signed char vect_x0_34.47;
> >>
> >> ...
> >>
> >> vector([16,16]) signed char vect_x1_35.41;
> >>
> >> ...
> >>
> >> vector([16,16]) signed char vect_x2_36.35;
> >>
> >> ...
> >>
> >> void * _8;
> >> vector([16,16]) signed char[3] * _9;
> >>
> >> ...
> >>
> >> unsigned char _21;
> >> vector([16,16]) unsigned char _22;
> >> unsigned char _55;
> >> vector([16,16]) unsigned char _56;
> >>
> >> ...
> >>
> >> vector([16,16]) unsigned char _70;
> >> vector([16,16]) unsigned char _84;
> >> unsigned char _85;
> >>
> >> ...
> >>
> >> <bb 5> [local count: 105119324]:
> >> _84 = (vector([16,16]) unsigned char) vect_x0_34.47_82;
> >> _85 = .REDUC_PLUS (_84);
> >> _22 = (vector([16,16]) unsigned char) vect_x1_35.41_38;
> >> _21 = .REDUC_PLUS (_22);
> >> _56 = (vector([16,16]) unsigned char) vect_x2_36.35_58;
> >> _55 = .REDUC_PLUS (_56);
> >> _70 = {_85, _21, _55, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};
> >> vect__89.56_3 = vect__87.55_2 + _70;
> >> vect__90.57_4 = VIEW_CONVERT_EXPR<vector([16,16]) signed
> >> char>(vect__89.56_3);
> >>
> >> <bb 6> [local count: 118111600]:
> >> # vect_x0_43.58_5 = PHI <vect__90.57_4(5), vect_x0_26.54_1(8)>
> >> .MASK_STORE (b_25(D), 8B, { -1, -1, -1, 0, 0, 0, 0, 0, ... },
> >> vect_x0_43.58_5); [tail call]
> >> return;
> >>
> >> The compiled code looks correct, although movi d0,#0 only zeros the first
> >> 16
> >> bytes of the variable-length vector constant:
>
> Architecturally, it zeros the whole register, including the “upper” SVE-only
> bits. But what mode is the move done in? Does the RTL move insn use
> a 16-byte mode or a VL-byte mode?
>
> >>
> >> addvl x0, sp, #2
> >> movi d0, #0
> >> st1b z0.b, p6, [sp, #2, mul vl]
> >> uaddv d27, p6, z27.b
> >> uaddv d26, p6, z26.b
> >> uaddv d25, p6, z25.b
> >> str b27, [x0]
> >> addvl x0, sp, #1
> >> add x0, x0, 1
> >> ptrue p7.b, vl3
> >> ld1b z0.b, p6/z, [sp, #2, mul vl]
> >> st1b z0.b, p6, [sp, #1, mul vl]
> >> str b26, [x0]
> >> ld1b z0.b, p6/z, [sp, #1, mul vl]
> >> st1b z0.b, p6, [sp]
> >> str b25, [sp, 2]
> >> ld1b z0.b, p6/z, [sp]
> >> add z28.b, z0.b, z28.b
> >> st1b z28.b, p7, [x1]
> >> addvl sp, sp, #3
> >> .cfi_def_cfa_offset 0
> >> ret
> >>
> >> (This code has already been noted to be inefficient, which I plan to
> >> address
> >> separately.)
> >>
> >> The decision about how many bytes to zero is made in the calling function,
> >> vect_create_constant_vectors (which also uses constant_lower_bound), rather
> >> than in gimple_build_vector:
> >>
> >> unsigned int elt_count = group_size;
> >> if (is_a<bb_vec_info> (vinfo))
> >> {
> >> /* We don't use duplicate_and_interleave for basic block
> >> vectorization.
> >> We know that the group size fits within a single vector, so all we need
> >> to do for VLA is to pad the constant to the minimum vector length. */
> >> nunits = constant_lower_bound (TYPE_VECTOR_SUBPARTS (vector_type));
> >> elt_count = MAX (nunits, group_size);
> >> }
> >>
> >> My current understanding is that you don't object to this part of my
> >> change.
> >> Whatever happens in gimple_build_vector won’t alter the fact that only the
> >> minimum number of bytes are zeroed, and in most cases that’s the desirable
> >> outcome.
> >>
> >> I therefore plan to keep my modification to gimple_build_vector, but add an
> >> assertion that builder->encoded_nelts () <= constant_lower_bound
> >> (TYPE_VECTOR_SUBPARTS (builder->type ())) so that the modified function
> >> never
> >> builds fewer elements than expected when one of them is non-constant. Would
> >> that be OK?
> >
> > I'm not sure builder->encoded_nelts () is the correct thing to check
> > here. In particular any stepped encoding should be rejected as well,
> > so nelts_per_pattern () must be <= 2. And even then the interpretation
> > is then to fill with the last value IIRC, and as you get zero-filling
> > with building a "short" CTOR that last element should be a zero. I'm
> > not sure how to get at the "last" value, but I think that given
> > we create a "short" CTOR we need to check that all remaining
> > elements of the VLA vector are actually encoded as zeros?
>
> Agreed. The only valid situations seem to be:
>
> (1) a duplicate of a single zero, where:
>
> npatterns == nelts_per_pattern == encoded_nelts == 1
>
> and the only encoded value is zero
>
> (2) the combination of:
>
> - nelts_per_pattern == 2
> - multiple_p (TYPE_VECTOR_SUBPARTS (type), npatterns)
> - the second half of the encoded elements are all zeros
>
> But these combinations would not come about by chance. The caller
> would have to take steps to ensure that they're true. So rather
> than check for these relatively complex conditions, it might
> be clearer to add a new gimple_build interface that explicitly
> fills with zeros, using a normal array (instead of a
> tree_vector_builder) for the explicitly-initialised elements.
So I think what we need is setting the "lowpart" (aka constant lower_bound
of a VLA vector register) from a fixed-size constructor. I'd
imagine most VLA vector ISAs can achieve that at least via a
"zero-extending" load? But how would we represent this in GIMPLE?
I fear the above restrictions are too tight for what we need for
BB SLP which is having an arbitrary "lowpart" CTOR.
Richard.
> Thanks,
> Richard
>
> > I hope Richard can give us a hint at what's the correct thing to do
> > there. In principle creating a fixed-size vector CTOR and then
> > using a VEC_PERM to fill that with zeros like VEC_PERM <fixed-length,
> > VLA zero, { 0, 1, 2, 3, 4, 5, 6 ... }> would be that, or simply
> > a V_C_E of the fixed-length vector to the VLA vector.
> >
> > Richard.
>
--
Richard Biener <[email protected]>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Jochen Jaser, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)