On Mon, 28 Oct 2024, Alex Coplan wrote:
> This allows us to vectorize more loops with early exits by forcing
> peeling for alignment to make sure that we're guaranteed to be able to
> safely read an entire vector iteration without crossing a page boundary.
>
> To make this work for VLA architectures we have to allow compile-time
> non-constant target alignments. We also have to override the result of
> the target's preferred_vector_alignment hook if it isn't a power-of-two
> multiple of the TYPE_SIZE of the chosen vector type.
>
> There is currently an implicit assumption that the TYPE_SIZE of the
> vector type is itself a power of two. For non-VLA types this
> could be checked directly in the vectorizer. For VLA types I
> had discussed offline with Richard S about adding a target hook to allow
> the vectorizer to query the backend to confirm that a given VLA type
> is known to have a power-of-two size at runtime.
GCC assumes all vectors have power-of-two size, so I don't think we
need to check anything; instead, we'd have to make sure the
target constrains the hardware when this assumption doesn't hold
in silicon.
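A minimal C sketch of the page-crossing argument. It assumes fixed-length vectors (plain uint64_t in place of poly_uint64) and a power-of-two page size. The helper names are hypothetical and are not GCC functions. The sketch shows that a load of SAFE_ALIGN bytes from an address aligned to SAFE_ALIGN cannot cross a page when SAFE_ALIGN is a power of two no larger than the page. It also models the override of the preferred alignment as falling back to SAFE_ALIGN, which is an assumption about what "a safe alignment" means in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers (not GCC internals).  Plain uint64_t stands in for
   poly_uint64, so only the fixed-length (non-VLA) case is modelled.  */

static bool
is_pow2 (uint64_t x)
{
  return x != 0 && (x & (x - 1)) == 0;
}

/* True if the LEN bytes starting at ADDR touch more than one page.
   Assumes LEN > 0.  */
static bool
access_crosses_page (uint64_t addr, uint64_t len, uint64_t page_size)
{
  return addr / page_size != (addr + len - 1) / page_size;
}

/* Keep the target's preferred alignment only if it is a power-of-two
   multiple of SAFE_ALIGN (the bytes read per vector iteration);
   otherwise fall back to SAFE_ALIGN itself (an assumed "safe" choice).  */
static uint64_t
choose_target_alignment (uint64_t preferred, uint64_t safe_align)
{
  if (preferred % safe_align == 0 && is_pow2 (preferred / safe_align))
    return preferred;
  return safe_align;
}
```

With 16-byte vectors and 4096-byte pages, every 16-byte-aligned 16-byte load stays within one page. A preferred alignment of 48 (3 x 16) would be replaced by 16.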
> I thought we
> might be able to do this check in vector_alignment_reachable_p. Any
> thoughts on that, richi?
For the purpose of alignment peeling, yeah, I guess this would be
a possible place to check this. The hook is currently used for
the case where the element has a lower alignment than its
size and thus vector alignment cannot be reached by peeling.
Btw, I thought we could already apply peeling for alignment for
VLA vectors ...
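A hypothetical model (not GCC code) of the prologue peel count. It shows why the amount is not a compile-time constant when the target alignment depends on the runtime vector length. It assumes TARGET_ALIGN is a power of two that is a multiple of ELEM_SIZE, and that ADDR is a multiple of ELEM_SIZE. If elements are less aligned than their size, no peel count reaches the target alignment, which is the case the hook discussed above is used for.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of scalar iterations to peel so that
   ADDR + iters * ELEM_SIZE is a multiple of TARGET_ALIGN.  For VLA
   vectors TARGET_ALIGN would only be known at run time.
   Assumes TARGET_ALIGN is a power of two, TARGET_ALIGN is a multiple
   of ELEM_SIZE, and ADDR is a multiple of ELEM_SIZE.  */
static uint64_t
prolog_peel_iters (uint64_t addr, uint64_t target_align, uint64_t elem_size)
{
  /* Offset of ADDR past the previous TARGET_ALIGN boundary.  */
  uint64_t misalign = addr & (target_align - 1);
  /* Bytes up to the next boundary (0 if already aligned).  */
  uint64_t bytes_to_boundary = (target_align - misalign) & (target_align - 1);
  return bytes_to_boundary / elem_size;
}
```

For 4-byte elements starting at 0x1004, this peels 3 iterations for a 16-byte target alignment and 7 for a 32-byte one.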
> gcc/ChangeLog:
>
> * tree-vect-data-refs.cc (vect_analyze_early_break_dependences):
> Set need_peeling_for_alignment flag on read DRs instead of
> failing vectorization. Punt on gathers.
> (dr_misalignment): Handle non-constant target alignments.
> (vect_compute_data_ref_alignment): If need_peeling_for_alignment
> flag is set on the DR, then override the target alignment chosen
> by the preferred_vector_alignment hook to choose a safe
> alignment.
> (vect_supportable_dr_alignment): Override
> support_vector_misalignment hook if need_peeling_for_alignment
> is set on the DR: in this case we must return
> dr_unaligned_unsupported in order to force peeling.
> * tree-vect-loop-manip.cc (vect_do_peeling): Allow prolog
> peeling by a compile-time non-constant amount.
> * tree-vectorizer.h (dr_vec_info): Add new flag
> need_peeling_for_alignment.
> ---
> gcc/tree-vect-data-refs.cc | 77 ++++++++++++++++++++++++++++++-------
> gcc/tree-vect-loop-manip.cc | 6 ---
> gcc/tree-vectorizer.h | 5 +++
> 3 files changed, 68 insertions(+), 20 deletions(-)
Eh, where's the inline copy ...
@@ -739,15 +739,22 @@ vect_analyze_early_break_dependences (loop_vec_info loop_vinfo)
if (DR_IS_READ (dr_ref)
&& !ref_within_array_bound (stmt, DR_REF (dr_ref)))
{
+ if (STMT_VINFO_GATHER_SCATTER_P (stmt_vinfo))
+ {
+ const char *msg
You want to check STMT_VINFO_STRIDED_P here as well.
/* Vector size in bytes. */
+ poly_uint64 safe_align
+ = exact_div (tree_to_poly_uint64 (TYPE_SIZE (vectype)),
+ BITS_PER_UNIT);
Simpler: safe_align = tree_to_poly_uint64 (TYPE_SIZE_UNIT (vectype));
+ /* Multiply by the unroll factor to get the number of bytes read
+ per vector iteration. */
+ if (loop_vinfo)
+ {
+ auto num_copies = vect_get_num_copies (loop_vinfo, vectype);
+ gcc_checking_assert (pow2p_hwi (num_copies));
+ safe_align *= num_copies;
The unroll factor is the vectorization factor. I think the above goes
wrong for grouped accesses, e.g. an early break condition like
  if (a[2*i] == a[2*i+1])
Thus, multiply by LOOP_VINFO_VECT_FACTOR (loop_vinfo).
Note that this number doesn't need to be a power of two (nor does
num_copies above).
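A small arithmetic sketch of the grouped-access concern. It assumes that vect_get_num_copies returns VF / nunits for a single vector type. It models the bytes read per vector iteration as VF * group_size * elem_size. Both helper names are hypothetical and this is not GCC code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model, not GCC code.
   vf:         vectorization factor (scalar iterations per vector iteration)
   group_size: elements accessed per scalar iteration (2 for a[2*i], a[2*i+1])
   elem_size:  bytes per element
   nunits:     elements per vector  */

/* Bytes actually read per vector iteration by the whole group.  */
static uint64_t
bytes_per_vector_iter (uint64_t vf, uint64_t group_size, uint64_t elem_size)
{
  return vf * group_size * elem_size;
}

/* The patch's value: vector size in bytes times num_copies, assuming
   num_copies == vf / nunits.  */
static uint64_t
patch_safe_align (uint64_t vf, uint64_t nunits, uint64_t elem_size)
{
  uint64_t num_copies = vf / nunits;
  return nunits * elem_size * num_copies;
}
```

With 4-byte elements, four lanes and VF = 4, the patch's value is 16 bytes, while the pair a[2*i], a[2*i+1] reads 32 bytes per vector iteration. A group of three elements reads 48 bytes, which is not a power of two.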
The rest of the patch looks good to me.
Richard.