On 29/07/2025 10:13, Richard Biener wrote:
On Tue, 29 Jul 2025, Alfie Richards wrote:

(Whoops, s/WIP/RFC)

Hi All,

This patch adds support for capping VF at runtime for VLA loops with a
data dependency.
On 28/07/2025 15:56, Alfie Richards wrote:
Hi All,

This patch adds support for capping VF at runtime for VLA loops with a
data dependency.

Previously, no loop with a data dependency could be vectorized with VLA as we
made no assumption on the upper bound of vector length. This adds basic
support
for this case (only for partial vectors with a full mask though).

The bump_vector_ptr logic is incorrect right now. I dont understand it
sufficiently yet.

Thoughts on this initial direction?

gcc/ChangeLog:

        * tree-vect-data-refs.cc (bump_vector_ptr): Add early return to stop
          ICE (incorrect logic though).
        * tree-vect-loop-manip.cc (vect_set_loop_invariant_group_controls):
        New
          function.
        (vect_set_loop_condition_normal): Set rgroup controls in capped case.
        (vect_set_loop_condition): Add logic to create capped VF ssa node.
        (vect_gen_vector_loop_niters): Add logic to peel in case of capped VF.
        (vect_do_peeling): Ditto.
        * tree-vect-loop.cc (_loop_vec_info::_loop_vec_info):
        * tree-vect-stmts.cc (vect_get_data_ptr_increment):
        * tree-vectorizer.h (GCC_TREE_VECTORIZER_H):
        (LOOP_VINFO_RUNTIME_VF_CAP):

There is already LOOP_VINFO_MAX_VECT_FACTOR which should be identical
to your LOOP_VINFO_RUNTIME_VF_CAP.
Ah! Thank you that's really helpful.>
I _think_ given both the IV increment and the loop mask is generated
by vect_set_loop_condition_partial_vectors the only change should
be there?  Any other "bump" should be magically OK already?  Since
max_vf is loop invariant and even constant non-poly it should be
possible to somehow adjust the .WHILE_ULT by a max() operation?
We were hoping to do this with a loop invariant number of elements per iteration, creating the predicate and VF in the loop header, and using an epilogue (possibly a VLS epilogue) afterwards. We expect this to be a better trade off overall.

Is this what you mean in this?

From what I could see in vect_set_loop_condition_partial_vectors it would be hard to adapt it to work for this situation. Hence instead adapting vect_set_loop_condition_normal which has the logic for iteration independent VF.
That said, you are constraining the VF, so IMO that's what you should
do - you keep vector types as selected but you should adjust the
actual VF, not insert workarounds elsewhere?  Or does that blow up
in your face?  Any code checking the VF either does it wrong or
it does for a reason, and then using the non-altered VF would be
wrong?

Apologies, I don't follow what you mean by this. Do you mean we should update the VF value in loop_vec_info to be a tree with the MIN_EXPR? This is what Richard S suggested previously.

I also am aware I need to do a full audit of places using VF to see which need to be updated with knowledge of this, but am still in the business of getting to grips with large bits of it.

Thanks for the feedback!
Alfie>
Thanks,
Richard.

        (use_capped_vf):
        (vect_vf_for_cost):

gcc/testsuite/ChangeLog:

        * gcc.target/aarch64/sve_cap_1.c: New test.
        * gcc.target/aarch64/sve_cap_1_run.c: New test.
---
   gcc/testsuite/gcc.target/aarch64/sve_cap_1.c  | 54 +++++++++++++
   .../gcc.target/aarch64/sve_cap_1_run.c        | 38 ++++++++++
   gcc/tree-vect-data-refs.cc                    |  2 +-
   gcc/tree-vect-loop-manip.cc                   | 76 ++++++++++++++++++-
   gcc/tree-vect-loop.cc                         | 11 +++
   gcc/tree-vect-stmts.cc                        | 12 +++
   gcc/tree-vectorizer.h                         | 25 +++++-
   7 files changed, 213 insertions(+), 5 deletions(-)
   create mode 100644 gcc/testsuite/gcc.target/aarch64/sve_cap_1.c
   create mode 100644 gcc/testsuite/gcc.target/aarch64/sve_cap_1_run.c

diff --git a/gcc/testsuite/gcc.target/aarch64/sve_cap_1.c
b/gcc/testsuite/gcc.target/aarch64/sve_cap_1.c
new file mode 100644
index 00000000000..31d97cd7619
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve_cap_1.c
@@ -0,0 +1,54 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-inline -march=armv8-a+sve -fno-vect-cost-model
-mautovec-preference=sve-only" } */
+
+#define LOOP(TYPE)                             \
+  void                                         \
+  f_##TYPE##_5 (TYPE *a, int n)                        \
+  {                                            \
+    for (long i = 0; i < n; ++i)                    \
+      a[i] += a[i - 5];                                \
+  } \
+  void                                         \
+  f_##TYPE##_10 (TYPE *a, int n)                       \
+  {                                            \
+    for (long i = 0; i < n; ++i)                    \
+      a[i] += a[i - 10];                               \
+  }
+
+LOOP (char)
+LOOP (short)
+LOOP (int)
+LOOP (float)
+LOOP (long)
+LOOP (double)

Reply via email to