> Am 02.06.2025 um 14:16 schrieb Tamar Christina <tamar.christ...@arm.com>: > > >> >> -----Original Message----- >> From: Richard Biener <rguent...@suse.de> >> Sent: Monday, May 26, 2025 2:56 PM >> To: Tamar Christina <tamar.christ...@arm.com> >> Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com> >> Subject: RE: [PATCH 1/2]middle-end: Apply loop->unroll directly in vectorizer >> >> On Mon, 19 May 2025, Tamar Christina wrote: >> >>>>> /* Complete the target-specific cost calculations. */ >>>>> loop_vinfo->vector_costs->finish_cost (loop_vinfo->scalar_costs); >>>>> vec_prologue_cost = loop_vinfo->vector_costs->prologue_cost (); >>>>> @@ -12373,6 +12394,13 @@ vect_transform_loop (loop_vec_info >> loop_vinfo, >>>> gimple *loop_vectorized_call) >>>>> dump_printf_loc (MSG_NOTE, vect_location, "Disabling unrolling due to" >>>>> " variable-length vectorization factor\n"); >>>>> } >>>>> + >>>>> + /* When we have unrolled the loop due to a user requested value we >> should >>>>> + leave it up to the RTL unroll heuristics to determine if it's still >>>>> worth >>>>> + while to unroll more. */ >>>>> + if (LOOP_VINFO_USER_UNROLL (loop_vinfo)) >>>> >>>> What I meant with copying of LOOP_VINFO_USER_UNROLL is that I think >>>> you'll never get to this being true as you set the suggested unroll >>>> factor for the costing attempt of the not extra unrolled loop but >>>> the transform where you want to reset is is when the unrolling >>>> was actually applied? >>> >>> It was being set on every analysis of the main loop body. Since it wasn't >>> actually cleared until we've picked a mode and did codegen the condition >>> would >>> be true. >>> >>> However.. >>> >>>> >>>> That said, it would be clearer if LOOP_VINFO_USER_UNROLL would be >>>> set in vect_analyze_loop_1 where we have >>>> >>> >>> I agree this is much nicer. >>> >>> Bootstrapped Regtested on aarch64-none-linux-gnu, >>> arm-none-linux-gnueabihf, x86_64-pc-linux-gnu >>> -m32, -m64 and no issues. >>> >>> Ok for master? >>> >>> Thanks, >>> Tamar >>> >>> gcc/ChangeLog: >>> >>> * doc/extend.texi: Document pragma unroll interaction with vectorizer. >>> * tree-vectorizer.h (LOOP_VINFO_USER_UNROLL): New. >>> (class _loop_vec_info): Add user_unroll. >>> * tree-vect-loop.cc (vect_analyze_loop_1 ): Set >>> suggested_unroll_factor and retry. >>> (_loop_vec_info::_loop_vec_info): Initialize user_unroll. >>> (vect_transform_loop): Clear the loop->unroll value if the pragma was >>> used. >>> >>> gcc/testsuite/ChangeLog: >>> >>> * gcc.target/aarch64/unroll-vect.c: New test. >>> >>> -- inline copy of patch -- >>> >>> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi >>> index >> e87a3c271f8420d8fd175823b5bb655f76c89afe..f8261d13903afc90d3341c09a >> b3fdbd0ab96ea49 100644 >>> --- a/gcc/doc/extend.texi >>> +++ b/gcc/doc/extend.texi >>> @@ -10398,6 +10398,11 @@ unrolled @var{n} times regardless of any >> commandline arguments. >>> When the option is @var{preferred} then the user is allowed to override the >>> unroll amount through commandline options. >>> >>> +If the loop was vectorized the unroll factor specified will be used to >>> seed the >>> +vectorizer unroll factor. Whether the loop is unrolled or not will be >>> +determined by target costing. The resulting vectorized loop may still be >>> +unrolled more in later passes depending on the target costing. >>> + >>> @end table >>> >>> @node Thread-Local >>> diff --git a/gcc/testsuite/gcc.target/aarch64/unroll-vect.c >> b/gcc/testsuite/gcc.target/aarch64/unroll-vect.c >>> new file mode 100644 >>> index >> 0000000000000000000000000000000000000000..3cb774ba95787ebee488fb >> e7306299ef28e6bb35 >>> --- /dev/null >>> +++ b/gcc/testsuite/gcc.target/aarch64/unroll-vect.c >>> @@ -0,0 +1,20 @@ >>> +/* { dg-do compile } */ >>> +/* { dg-additional-options "-O3 -march=armv8-a --param aarch64-autovec- >> preference=asimd-only -std=gnu99" } */ >>> +/* { dg-final { check-function-bodies "**" "" "" } } */ >>> + >>> +/* >>> +** f1: >>> +** ... >>> +** add v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s >>> +** add v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s >>> +** add v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s >>> +** add v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s >>> +** ... >>> +*/ >>> +void f1 (int *restrict a, int n) >>> +{ >>> +#pragma GCC unroll 16 >>> + for (int i = 0; i < n; i++) >>> + a[i] *= 2; >>> +} >>> + >>> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc >>> index >> fe6f3cf188e40396b299ff9e814cc402bc2d4e2d..f215b6bc7881e7e659272cefbe >> 3d5c8892ef768c 100644 >>> --- a/gcc/tree-vect-loop.cc >>> +++ b/gcc/tree-vect-loop.cc >>> @@ -1073,6 +1073,7 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, >> vec_info_shared *shared) >>> peeling_for_gaps (false), >>> peeling_for_niter (false), >>> early_breaks (false), >>> + user_unroll (false), >>> no_data_dependencies (false), >>> has_mask_store (false), >>> scalar_loop_scaling (profile_probability::uninitialized ()), >>> @@ -3428,27 +3429,51 @@ vect_analyze_loop_1 (class loop *loop, >> vec_info_shared *shared, >>> res ? "succeeded" : "failed", >>> GET_MODE_NAME (loop_vinfo->vector_mode)); >>> >>> - if (res && !LOOP_VINFO_EPILOGUE_P (loop_vinfo) && >> suggested_unroll_factor > 1) >>> + auto user_unroll = LOOP_VINFO_LOOP (loop_vinfo)->unroll; >>> + if (res && !LOOP_VINFO_EPILOGUE_P (loop_vinfo) >>> + /* Check to see if the user wants to unroll or if the target wants >>> to. */ >>> + && (suggested_unroll_factor > 1 || user_unroll > 1)) >>> { >>> - if (dump_enabled_p ()) >>> - dump_printf_loc (MSG_NOTE, vect_location, >>> + if (suggested_unroll_factor == 1) >>> + { >>> + int assumed_vf = vect_vf_for_cost (loop_vinfo); >>> + int unroll_fact = user_unroll / assumed_vf; >>> + suggested_unroll_factor = 1 << ceil_log2 (unroll_fact); >> >> So with SLP we can have non-power-of-two vectorization factors, and >> certainly the extra unrolling we apply ontop of determining the VF >> has no additional restrictions. So I woder why you go through >> 1 << ceil_log2 (unroll_fact) here instead of just using >> user_unroll / assumed_vf? IMO a truncating division is OK, we could >> argue for rounding? If we apply costing across modes there might >> be a mode with the unrolling exactly matching - I'm unsure if we >> want to adhere to #pragma GCC unroll irrespective of costing or not. >> >> That said, as we simply do >> >> if (applying_suggested_uf) >> LOOP_VINFO_VECT_FACTOR (loop_vinfo) *= loop_vinfo- >>> suggested_unroll_factor; >> >> I'd say go with >> >> suggested_unroll_factor = user_unroll / assumed_vf; >> >> ? >> > > The reason for it was that there's an assumption in place that the final VF > is a power of two. Which is enforced by > > if (!is_gimple_val (niters_vector)) > { > var = create_tmp_var (type, "bnd"); > gimple_seq stmts = NULL; > niters_vector = force_gimple_operand (niters_vector, &stmts, true, var); > gsi_insert_seq_on_edge_immediate (pe, stmts); > /* Peeling algorithm guarantees that vector loop bound is at least ONE, > we set range information to make niters analyzer's life easier. > Note the number of latch iteration value can be TYPE_MAX_VALUE so > we have to represent the vector niter TYPE_MAX_VALUE + 1 >> log_vf. */ > if (stmts != NULL && log_vf) That’s just executed whe unrolling is enforced? The VF can easily be non-power of two, even odd, when SLP is in effect. > where log_vf is using exact_log2 which doesn't return null on non-power-of-2 > values, > it returns -1; > > So the check is somewhat incorrect. Wrong I’d say. > We'd need > > if (stmts != NULL && log_vf && ! integer_minus_onep (log_vf)) > { > > But I didn't do so because we would then not set range information for > niters_vector. But does the code rely on logVF for the range? Why would it only work for that? Peeling ensures niter/VF >= 1 even for VF == 3 > So instead I just made sure that the unfoll factor is a power of two, such > that the multiplication > returns a power of two. > > Would you like the extra !integer_minus_onep (log_vf)) instead? It does seem > that many things > would be less optimal. E.g. in > > /* Create: niters >> log2(vf) */ > /* If it's known that niters == number of latch executions + 1 doesn't > overflow, we can generate niters >> log2(vf); otherwise we generate > (niters - vf) >> log2(vf) + 1 by using the fact that we know ratio > will be at least one. */ > log_vf = build_int_cst (type, exact_log2 (const_vf)); > if (niters_no_overflow) > niters_vector = fold_build2 (RSHIFT_EXPR, type, ni_minus_gap, log_vf); > > even for a small non-power of two unroll factor like, 3 would then become -1, > so niters_vector suddenly > becomes unknown. No, but we need to use a division instead of a shift. Richard > > Thanks, > Tamar > >> The rest of the patch looks OK to me. >> >> Richard. >> >> >>> + if (suggested_unroll_factor > 1) >>> + { >>> + if (dump_enabled_p ()) >>> + dump_printf_loc (MSG_NOTE, vect_location, >>> + "setting unroll factor to %d based on user requested " >>> + "unroll factor %d and suggested vectorization " >>> + "factor: %d\n", >>> + suggested_unroll_factor, user_unroll, assumed_vf); >>> + } >>> + } >>> + >>> + if (suggested_unroll_factor > 1) >>> + { >>> + if (dump_enabled_p ()) >>> + dump_printf_loc (MSG_NOTE, vect_location, >>> "***** Re-trying analysis for unrolling" >>> " with unroll factor %d and slp %s.\n", >>> suggested_unroll_factor, >>> slp_done_for_suggested_uf ? "on" : "off"); >>> - loop_vec_info unroll_vinfo >>> - = vect_create_loop_vinfo (loop, shared, loop_form_info, NULL); >>> - unroll_vinfo->vector_mode = vector_mode; >>> - unroll_vinfo->suggested_unroll_factor = suggested_unroll_factor; >>> - opt_result new_res = vect_analyze_loop_2 (unroll_vinfo, fatal, NULL, >>> - slp_done_for_suggested_uf); >>> - if (new_res) >>> - { >>> - delete loop_vinfo; >>> - loop_vinfo = unroll_vinfo; >>> - } >>> - else >>> - delete unroll_vinfo; >>> + loop_vec_info unroll_vinfo >>> + = vect_create_loop_vinfo (loop, shared, loop_form_info, NULL); >>> + unroll_vinfo->vector_mode = vector_mode; >>> + unroll_vinfo->suggested_unroll_factor = suggested_unroll_factor; >>> + opt_result new_res >>> + = vect_analyze_loop_2 (unroll_vinfo, fatal, NULL, >>> + slp_done_for_suggested_uf); >>> + if (new_res) >>> + { >>> + delete loop_vinfo; >>> + loop_vinfo = unroll_vinfo; >>> + LOOP_VINFO_USER_UNROLL (loop_vinfo) = user_unroll > 1; >>> + } >>> + else >>> + delete unroll_vinfo; >>> + } >>> } >>> >>> /* Remember the autodetected vector mode. */ >>> @@ -12373,6 +12398,13 @@ vect_transform_loop (loop_vec_info loop_vinfo, >> gimple *loop_vectorized_call) >>> dump_printf_loc (MSG_NOTE, vect_location, "Disabling unrolling due to" >>> " variable-length vectorization factor\n"); >>> } >>> + >>> + /* When we have unrolled the loop due to a user requested value we should >>> + leave it up to the RTL unroll heuristics to determine if it's still >>> worth >>> + while to unroll more. */ >>> + if (LOOP_VINFO_USER_UNROLL (loop_vinfo)) >>> + loop->unroll = 0; >>> + >>> /* Free SLP instances here because otherwise stmt reference counting >>> won't work. */ >>> slp_instance instance; >>> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h >>> index >> a2f33a5ecd60288fe7f28ee639ff8b6a77667796..8fd8c10ec64f7241d6b097491f >> 84400164893911 100644 >>> --- a/gcc/tree-vectorizer.h >>> +++ b/gcc/tree-vectorizer.h >>> @@ -970,6 +970,10 @@ public: >>> /* Main loop IV cond. */ >>> gcond* loop_iv_cond; >>> >>> + /* True if we have an unroll factor requested by the user through pragma >>> GCC >>> + unroll. */ >>> + bool user_unroll; >>> + >>> /* True if there are no loop carried data dependencies in the loop. >>> If loop->safelen <= 1, then this is always true, either the loop >>> didn't have any loop carried data dependencies, or the loop is being >>> @@ -1094,6 +1098,7 @@ public: >>> #define LOOP_VINFO_CHECK_UNEQUAL_ADDRS(L) (L)->check_unequal_addrs >>> #define LOOP_VINFO_CHECK_NONZERO(L) (L)->check_nonzero >>> #define LOOP_VINFO_LOWER_BOUNDS(L) (L)->lower_bounds >>> +#define LOOP_VINFO_USER_UNROLL(L) (L)->user_unroll >>> #define LOOP_VINFO_GROUPED_STORES(L) (L)->grouped_stores >>> #define LOOP_VINFO_SLP_INSTANCES(L) (L)->slp_instances >>> #define LOOP_VINFO_SLP_UNROLLING_FACTOR(L) (L)->slp_unrolling_factor >>> >> >> -- >> Richard Biener <rguent...@suse.de> >> SUSE Software Solutions Germany GmbH, >> Frankenstrasse 146, 90461 Nuernberg, Germany; >> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)