Richard Biener <rguent...@suse.de> writes:
> On Tue, 14 Sep 2021, Richard Sandiford wrote:
>
>> Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>> > This changes us to maintain and compute (mis-)alignment info for
>> > the first element of a group only rather than for each DR when
>> > doing interleaving and for the earliest, first, or first in the SLP
>> > node (or any pair or all three of those) when SLP vectorizing.
>> >
>> > For this to work out the easiest way I have changed the accessors
>> > DR_MISALIGNMENT and DR_TARGET_ALIGNMENT to do the indirection to
>> > the first element rather than adjusting all callers.
>> > dr_misalignment is moved out-of-line and I'm not too fond of the
>> > poly-int dances there (any hints?), but basically we are now
>> > adjusting the first elements misalignment based on the DR_INIT
>> > difference.
>> >
>> > Bootstrap & regtest running on x86_64-unknown-linux-gnu.
>> >
>> > Richard.
>> >
>> > 2021-09-13  Richard Biener  <rguent...@suse.de>
>> >
>> >    * tree-vectorizer.h (dr_misalignment): Move out of line.
>> >    (dr_target_alignment): New.
>> >    (DR_TARGET_ALIGNMENT): Wrap dr_target_alignment.
>> >    (set_dr_target_alignment): New.
>> >    (SET_DR_TARGET_ALIGNMENT): Wrap set_dr_target_alignment.
>> >    * tree-vect-data-refs.c (dr_misalignment): Compute and
>> >    return the group members misalignment.
>> >    (vect_compute_data_ref_alignment): Use SET_DR_TARGET_ALIGNMENT.
>> >    (vect_analyze_data_refs_alignment): Compute alignment only
>> >    for the first element of a DR group.
>> >    (vect_slp_analyze_node_alignment): Likewise.
>> > ---
>> >  gcc/tree-vect-data-refs.c | 65 ++++++++++++++++++++++++---------------
>> >  gcc/tree-vectorizer.h     | 24 ++++++++++-----
>> >  2 files changed, 57 insertions(+), 32 deletions(-)
>> >
>> > diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
>> > index 66e76132d14..b53d6a0b3f1 100644
>> > --- a/gcc/tree-vect-data-refs.c
>> > +++ b/gcc/tree-vect-data-refs.c
>> > @@ -887,6 +887,36 @@ vect_slp_analyze_instance_dependence (vec_info 
>> > *vinfo, slp_instance instance)
>> >    return res;
>> >  }
>> >  
>> > +/* Return the misalignment of DR_INFO.  */
>> > +
>> > +int
>> > +dr_misalignment (dr_vec_info *dr_info)
>> > +{
>> > +  if (STMT_VINFO_GROUPED_ACCESS (dr_info->stmt))
>> > +    {
>> > +      dr_vec_info *first_dr
>> > +  = STMT_VINFO_DR_INFO (DR_GROUP_FIRST_ELEMENT (dr_info->stmt));
>> > +      int misalign = first_dr->misalignment;
>> > +      gcc_assert (misalign != DR_MISALIGNMENT_UNINITIALIZED);
>> > +      if (misalign == DR_MISALIGNMENT_UNKNOWN)
>> > +  return misalign;
>> > +      poly_offset_int diff = (wi::to_poly_offset (DR_INIT (dr_info->dr))
>> > +                        - wi::to_poly_offset (DR_INIT (first_dr->dr)));
>> > +      poly_int64 mispoly = misalign + diff.to_constant ().to_shwi ();
>> > +      bool res = known_misalignment (mispoly,
>> > +                               first_dr->target_alignment.to_constant (),
>> > +                               &misalign);
>> > +      gcc_assert (res);
>> > +      return misalign;
>> 
>> Yeah, not too keen on the to_constants here.  The one on diff looks
>> redundant -- you could just use diff.force_shwi () instead, and
>> keep everything poly_int.
>>
>> For the known_misalignment I think we should use:
>> 
>>    if (!can_div_trunc_p (mispoly, first_dr->target_alignment,
>>                       &quotient, &misalign))
>>      misalign = DR_MISALIGNMENT_UNKNOWN;
>>    return misalign;
>> 
>> There are then no to_constant assumptions.
>
> OK, note that group analysis does
>
>           /* Check that the DR_INITs are compile-time constants.  */
>           if (TREE_CODE (DR_INIT (dra)) != INTEGER_CST
>               || TREE_CODE (DR_INIT (drb)) != INTEGER_CST)
>             break;
>
>           /* Sorting has ensured that DR_INIT (dra) <= DR_INIT (drb).  */
>           HOST_WIDE_INT init_a = TREE_INT_CST_LOW (DR_INIT (dra));
>           HOST_WIDE_INT init_b = TREE_INT_CST_LOW (DR_INIT (drb));
>
> so I'm confident my variant was "correct", but it still was ugly.

Ah, OK.  In that case I don't mind the original version, but it would be
good to have a comment above the to_constant saying where the condition
is enforced.

I'm just trying to avoid to_constant calls with no comment to explain
them, and with no nearby is_constant call.  Otherwise it could end up
a bit like tree_to_uhwi, where sometimes tree_fits_uhwi_p really has
been checked earlier (not always obvious where) and sometimes
tree_to_uhwi is just used out of hope, to avoid having to think about
the alternative.

> There's also the issue that target_alignment is poly_uint64 but
> misalignment is signed int.
>
> Note that can_div_trunc_p seems to require a poly_uint64 remainder,
> I'm not sure what to do with that, so I used is_constant.

Ah, yeah, forgot about that sorry.  I guess in that case, using
is_constant on first_dr->target_alignment and sticking with
known_misalignment would make sense.

> Btw, to what value do we want to align with variable sized vectors?
> We are aligning globals according to target_alignment but only
> support that for constant alignment.  Most users only want to
> distinguish between aligned or not aligned and the actual misalignment
> is only used to seed the SSA alignment info, so I suppose being
> too conservative in dr_misalignment for variable size vectors isn't
> too bad if we correctly identify fully aligned accesses?

In practice we don't require more than element alignment for VLA SVE
as things stand.  However, there's support for using fully-predicated
loops in which the first loop iteration aligns by using leading
inactive lanes where possible (e.g. start with 1 inactive lane
for a misalignment of 1).  In principle that would work for VLA too,
we just haven't implemented it yet.

> I'm now testing a variant that looks like
>
> int
> dr_misalignment (dr_vec_info *dr_info)
> {
>   if (STMT_VINFO_GROUPED_ACCESS (dr_info->stmt))
>     {
>       dr_vec_info *first_dr
>         = STMT_VINFO_DR_INFO (DR_GROUP_FIRST_ELEMENT (dr_info->stmt));
>       int misalign = first_dr->misalignment;
>       gcc_assert (misalign != DR_MISALIGNMENT_UNINITIALIZED);
>       if (misalign == DR_MISALIGNMENT_UNKNOWN)
>         return misalign;
>       /* vect_analyze_data_ref_accesses guarantees that DR_INIT are
>          INTEGER_CSTs and the first element in the group has the lowest
>          address.  */
>       poly_offset_int diff = (wi::to_poly_offset (DR_INIT (dr_info->dr))
>                               - wi::to_poly_offset (DR_INIT (first_dr->dr)));
>       gcc_assert (known_ge (diff, 0));
>       poly_uint64 mispoly = misalign + diff.force_uhwi ();
>       unsigned HOST_WIDE_INT quotient;
>       if (can_div_trunc_p (mispoly, first_dr->target_alignment, &quotient,
>                            &mispoly)
>           && mispoly.is_constant ())
>         return mispoly.to_constant ();
>       return DR_MISALIGNMENT_UNKNOWN;
>
> I wonder why a non-poly works for the quotient here.  I suppose it's
> because the runtime factor is the same for all poly-ints and thus
> it cancels out?  But then the remainder likely will never be constant
> unless it is zero?

It's not so much that the quotient is guaranteed to be constant,
but that that particular overload is designed for the case in which
the caller only cares about constant quotients, and wants can_div_trunc_p
to fail otherwise.  In other words, it's really just a way of cutting down
on is_constant calls.

poly-int.h doesn't provide every possible overload of can_div_trunc_p.
In principle it could have a fully-general 4-poly_int version, but we
haven't needed that yet.  It could also have a new overload for the
case we care about here, avoiding the separate is_constant.

Thanks,
Richard

Reply via email to