On 21/11/2024 10:02, Richard Biener wrote:
> On Fri, 15 Nov 2024, Alex Coplan wrote:
>
> > Hi,
> >
> > This is a v2 which hopefully addresses the feedback for v1 of the 1/5
> > patch, originally posted here:
> > https://gcc.gnu.org/pipermail/gcc-patches/2024-October/666648.html
> >
> > As mentioned on IRC, it will need follow-up work to fix up latent
> > profile issues, but that can be done during stage 3. We will also need
> > a simple (hopefully obvious, even) follow-up patch to fix expectations
> > for various tests (since we now vectorize loops which we previously
> > couldn't).
> >
> > OK for trunk?
>
> I'm still looking at
>
> + if (dr_info->need_peeling_for_alignment)
> + {
> + /* Vector size in bytes. */
> + poly_uint64 safe_align = tree_to_poly_uint64 (TYPE_SIZE_UNIT
> (vectype));
> +
> + /* We can only peel for loops, of course. */
> + gcc_checking_assert (loop_vinfo);
> +
> + /* Calculate the number of vectors read per vector iteration. If
> + it is a power of two, multiply through to get the required
> + alignment in bytes. Otherwise, fail analysis since alignment
> + peeling wouldn't work in such a case. */
> + poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
> + if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
> + vf *= DR_GROUP_SIZE (stmt_info);
>
> so this is the total number of scalars we load, so maybe call it
> that way, num_scalars.
Will do.
>
> +
> + auto num_vectors = vect_get_num_vectors (vf, vectype);
> + if (!pow2p_hwi (num_vectors))
>
> side-note - with all these multiplies there's the possibility that
> we get a testcase that has safe_align > PAGE_SIZE, meaning it's
> no longer a good way to avoid trapping. This problem of course
> exists generally, we avoid it elsewhere by not having very large
> vectors or limiting the group-size. The actual problem is that
> we don't know the actual page size, but we maybe could configure
> a minimum as part of the platform configuration. Should we for
> now simply add
>
> || safe_align > 4096
>
> here? A testcase would load 512 contiguous uint64 to form an early exit
> condition, quite unlikely I guess.
Good point. I suppose this really depends whether there are
targets/platforms that GCC supports with a page size smaller than 4k.
Perhaps a min_page_size target hook (defaulting to 4k) would be
sensible. Then if there are any such targets they can override the
hook. WDYT?
>
> With DR_GROUP_SIZE != 1 there's also the question whether we can
> ever reach the desired alignment since each peel will skip
> DR_GROUP_SIZE scalar elements - either those are already
> DR_GROUP_SIZE aligned or the result will never be.
I had a look at this, I think this should already be handled by
vector_alignment_reachable_p which already includes a suitable check for
STMT_VINFO_GROUPED_ACCESS. E.g. for the following testcase:
double *a;
int f(int n) {
for (int i = 0; i < n; i += 2)
if (a[i])
__builtin_abort();
}
we have DR_GROUP_SIZE = 2 and in the dump (-O3, with the alignment peeling
patches) we see:
note: === vect_enhance_data_refs_alignment ===
missed: vector alignment may not be reachable
note: vect_can_advance_ivs_p:
note: Analyze phi: i_12 = PHI <i_9(8), 0(7)>
note: Alignment of access forced using versioning.
note: Versioning for alignment will be applied.
so we decide to version instead, since peeling isn't viable here
(vector_alignment_reachable_p correctly returns false).
Perhaps that says the DR flag should really be called
need_peeling_or_versioning_for_alignment (although better I guess just
to update the comment above its definition to mention versioning).
>
> @@ -7208,7 +7277,8 @@ vect_supportable_dr_alignment (vec_info *vinfo,
> dr_vec_info *dr_info,
> if (misalignment == DR_MISALIGNMENT_UNKNOWN)
> is_packed = not_size_aligned (DR_REF (dr));
> if (targetm.vectorize.support_vector_misalignment (mode, type,
> misalignment,
> - is_packed))
> + is_packed)
> + && !dr_info->need_peeling_for_alignment)
> return dr_unaligned_supported;
>
> I think you need to do this earlier, like with
>
> if (misalignment == 0)
> return dr_aligned;
> + else if (dr_info->need_peeling_for_alignment)
> + return dr_unaligned_unsupported;
That seems more obviously correct indeed. I'll make that change.
>
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> index c8dc7153298..be2c2a1bc75 100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -3129,12 +3129,6 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree
> niters, tree nitersm1,
> int estimated_vf;
> int prolog_peeling = 0;
> bool vect_epilogues = loop_vinfo->epilogue_vinfo != NULL;
> - /* We currently do not support prolog peeling if the target alignment
> is not
> - known at compile time. 'vect_gen_prolog_loop_niters' depends on the
> - target alignment being constant. */
> - dr_vec_info *dr_info = LOOP_VINFO_UNALIGNED_DR (loop_vinfo);
> - if (dr_info && !DR_TARGET_ALIGNMENT (dr_info).is_constant ())
> - return NULL;
>
> this indeed looks like an odd (wrong) place to enforce this, so I
> suppose you figured the check is not needed, independent of the patch?
Yeah, exactly, it seems that it isn't needed.
>
> OK with the above changes.
Thanks! I guess we just need to decide on the page size issue above
before going ahead.
Alex
>
> Thanks,
> Richard.
>
>
> > Thanks,
> > Alex
> >
> > -- >8 --
> >
> > This allows us to vectorize more loops with early exits by forcing
> > peeling for alignment to make sure that we're guaranteed to be able to
> > safely read an entire vector iteration without crossing a page boundary.
> >
> > To make this work for VLA architectures we have to allow compile-time
> > non-constant target alignments. We also have to override the result of
> > the target's preferred_vector_alignment hook if it isn't already a
> > power-of-two multiple of the amount read per vector iteration.
> >
> > gcc/ChangeLog:
> >
> > * tree-vect-data-refs.cc (vect_analyze_early_break_dependences):
> > Set need_peeling_for_alignment flag on read DRs instead of
> > failing vectorization. Punt on gathers and strided_p accesses.
> > (dr_misalignment): Handle non-constant target alignments.
> > (vect_compute_data_ref_alignment): If need_peeling_for_alignment
> > flag is set on the DR, then override the target alignment chosen
> > by the preferred_vector_alignment hook to choose a safe
> > alignment. Add new result parameter. Use it ...
> > (vect_analyze_data_refs_alignment): ... here, and fail
> > analysis if vect_compute_data_ref_alignment sets it to a
> > failure.
> > (vect_supportable_dr_alignment): Override
> > support_vector_misalignment hook if need_peeling_for_alignment
> > is set on the DR: in this case we must return
> > dr_unaligned_unsupported in order to force peeling.
> > * tree-vect-loop-manip.cc (vect_do_peeling): Allow prolog
> > peeling by a compile-time non-constant amount.
> > * tree-vectorizer.h (dr_vec_info): Add new flag
> > need_peeling_for_alignment.
> >
>
> --
> Richard Biener <[email protected]>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)