On Tue, 29 Jul 2025, Alfie Richards wrote: > On 29/07/2025 10:13, Richard Biener wrote: > > On Tue, 29 Jul 2025, Alfie Richards wrote: > > > >> (Whoops, s/WIP/RFC) > >> > >> Hi All, > >> > >> This patch adds support for capping VF at runtime for VLA loops with a > >> data dependency. > >> On 28/07/2025 15:56, Alfie Richards wrote: > >> Hi All, > >> > >> This patch adds support for capping VF at runtime for VLA loops with a > >> data dependency. > >> > >> Previously, no loop with a data dependency could be vectorized with VLA as > >> we > >> made no assumption on the upper bound of vector length. This adds basic > >> support > >> for this case (only for partial vectors with a full mask though). > >> > >> The bump_vector_ptr logic is incorrect right now. I dont understand it > >> sufficiently yet. > >> > >> Thoughts on this initial direction? > >> > >> gcc/ChangeLog: > >> > >> * tree-vect-data-refs.cc (bump_vector_ptr): Add early return to stop > >> ICE (incorrect logic though). > >> * tree-vect-loop-manip.cc (vect_set_loop_invariant_group_controls): > >> New > >> function. > >> (vect_set_loop_condition_normal): Set rgroup controls in capped case. > >> (vect_set_loop_condition): Add logic to create capped VF ssa node. > >> (vect_gen_vector_loop_niters): Add logic to peel in case of capped VF. > >> (vect_do_peeling): Ditto. > >> * tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): > >> * tree-vect-stmts.cc (vect_get_data_ptr_increment): > >> * tree-vectorizer.h (GCC_TREE_VECTORIZER_H): > >> (LOOP_VINFO_RUNTIME_VF_CAP): > > > > There is already LOOP_VINFO_MAX_VECT_FACTOR which should be identical > > to your LOOP_VINFO_RUNTIME_VF_CAP. > Ah! Thank you that's really helpful.> > > I _think_ given both the IV increment and the loop mask is generated > > by vect_set_loop_condition_partial_vectors the only change should > > be there? Any other "bump" should be magically OK already? Since > > max_vf is loop invariant and even constant non-poly it should be > > possible to somehow adjust the .WHILE_ULT by a max() operation? > We were hoping to do this with a loop invariant number of elements per > iteration, creating the predicate and VF in the loop header, and using an > epilogue (possibly a VLS epilogue) afterwards. We expect this to be a better > trade off overall. > > Is this what you mean in this? > > From what I could see in vect_set_loop_condition_partial_vectors it would be > hard to adapt it to work for this situation. Hence instead adapting > vect_set_loop_condition_normal which has the logic for iteration independent > VF. > > That said, you are constraining the VF, so IMO that's what you should > > do - you keep vector types as selected but you should adjust the > > actual VF, not insert workarounds elsewhere? Or does that blow up > > in your face? Any code checking the VF either does it wrong or > > it does for a reason, and then using the non-altered VF would be > > wrong? > > Apologies, I don't follow what you mean by this. Do you mean we should update > the VF value in loop_vec_info to be a tree with the MIN_EXPR? This is what > Richard S suggested previously.
No, why would it be a MIN_EXPR? Ah, because the VF is poly. Hmm. But yes, I think this is required for correctness eventually. > I also am aware I need to do a full audit of places using VF to see which need > to be updated with knowledge of this, but am still in the business of getting > to grips with large bits of it. There's quite some code doing vf.is_constant (), so making VF a tree will be quite disruptive. Note I've talked about making VF fractional to support re-rolling, with similar consequences. If you don't adjust VF then you still have to audit all VF uses to see if they need adjustments (see your bumping case). Masking allows us to have a VF that's not consistent with the vector types used, something we also want for basic-block vectorization, where we want to mask out inactive lanes we have no data for. I'd see to keep those related uses in mind when changing what the VF really means and how to figure whether an operation needs additional masking (for basic-block vectorization VF is always one, but there the "group size" is less than the number of vector elements). Instead of making VF a tree and making it a MIN_EXPR, a simpler (temporary) approach might be to make it a new class which could provide the APIs currently used on VF with a sane fallback in case there's a max_vf we need to apply? That might tell you places to fix. I still assume that at the moment you want to restrict this to VLA vectorization, so the extra masking should be enforced when setting up loop masking. Richard. > > Thanks for the feedback! > Alfie> > > Thanks, > > Richard. > > > >> (use_capped_vf): > >> (vect_vf_for_cost): > >> > >> gcc/testsuite/ChangeLog: > >> > >> * gcc.target/aarch64/sve_cap_1.c: New test. > >> * gcc.target/aarch64/sve_cap_1_run.c: New test. > >> --- > >> gcc/testsuite/gcc.target/aarch64/sve_cap_1.c | 54 +++++++++++++ > >> .../gcc.target/aarch64/sve_cap_1_run.c | 38 ++++++++++ > >> gcc/tree-vect-data-refs.cc | 2 +- > >> gcc/tree-vect-loop-manip.cc | 76 ++++++++++++++++++- > >> gcc/tree-vect-loop.cc | 11 +++ > >> gcc/tree-vect-stmts.cc | 12 +++ > >> gcc/tree-vectorizer.h | 25 +++++- > >> 7 files changed, 213 insertions(+), 5 deletions(-) > >> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve_cap_1.c > >> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve_cap_1_run.c > >> > >> diff --git a/gcc/testsuite/gcc.target/aarch64/sve_cap_1.c > >> b/gcc/testsuite/gcc.target/aarch64/sve_cap_1.c > >> new file mode 100644 > >> index 00000000000..31d97cd7619 > >> --- /dev/null > >> +++ b/gcc/testsuite/gcc.target/aarch64/sve_cap_1.c > >> @@ -0,0 +1,54 @@ > >> +/* { dg-do compile } */ > >> +/* { dg-options "-O2 -fno-inline -march=armv8-a+sve -fno-vect-cost-model > >> -mautovec-preference=sve-only" } */ > >> + > >> +#define LOOP(TYPE) \ > >> + void \ > >> + f_##TYPE##_5 (TYPE *a, int n) \ > >> + { \ > >> + for (long i = 0; i < n; ++i) \ > >> + a[i] += a[i - 5]; \ > >> + } \ > >> + void \ > >> + f_##TYPE##_10 (TYPE *a, int n) \ > >> + { \ > >> + for (long i = 0; i < n; ++i) \ > >> + a[i] += a[i - 10]; \ > >> + } > >> + > >> +LOOP (char) > >> +LOOP (short) > >> +LOOP (int) > >> +LOOP (float) > >> +LOOP (long) > >> +LOOP (double) > > -- Richard Biener <rguent...@suse.de> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)