On Fri, 24 Sep 2021, Richard Sandiford wrote:

> Richard Biener <rguent...@suse.de> writes:
> > This allows vectorization (in practice non-loop vectorization) to
> > have a stmt participate in different vector type vectorizations.
> > It allows us to remove vect_update_shared_vectype and replace it
> > by pushing/popping STMT_VINFO_VECTYPE from SLP_TREE_VECTYPE around
> > vect_analyze_stmt and vect_transform_stmt.
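> >
> > Illustratively, the push/pop is just a save/restore around the
> > analysis, along these lines (a hand-written sketch with made-up
> > local names, not the actual patch hunk):
> >
> >   tree saved_vectype = STMT_VINFO_VECTYPE (stmt_info);
> >   /* Make the stmt use the vector type of this SLP node ...  */
> >   STMT_VINFO_VECTYPE (stmt_info) = SLP_TREE_VECTYPE (node);
> >   opt_result res
> >     = vect_analyze_stmt (vinfo, stmt_info, &need_to_vectorize,
> >                          node, node_instance, cost_vec);
> >   /* ... and restore it so another SLP node can analyze the same
> >      stmt with a different vector type.  */
> >   STMT_VINFO_VECTYPE (stmt_info) = saved_vectype;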
> >
> > For data-refs the situation is a bit more complicated since we
> > analyze alignment info with a specific vector type in mind, which
> > doesn't play well when that changes.
> >
> > So the bulk of the change is passing down the actual vector type
> > used for a vectorized access to the various accessors of alignment
> > info, first and foremost dr_misalignment but also aligned_access_p,
> > known_alignment_for_access_p, vect_known_alignment_in_bytes and
> > vect_supportable_dr_alignment.  I took the liberty of replacing the
> > ALL_CAPS macro accessors with lower-case function invocations.
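> >
> > Concretely, the accessors now take the vector type of the access,
> > e.g. (signatures abbreviated; the aligned_access_p body is a sketch):
> >
> >   int dr_misalignment (dr_vec_info *dr_info, tree vectype);
> >
> >   static inline bool
> >   aligned_access_p (dr_vec_info *dr_info, tree vectype)
> >   {
> >     /* Aligned means zero misalignment w.r.t. this vector type.  */
> >     return dr_misalignment (dr_info, vectype) == 0;
> >   }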
> >
> > The actual behavioral changes are in dr_misalignment, which is now
> > the place that factors in the negative step adjustment as well as
> > handles alignment queries for a vector type with bigger alignment
> > requirements than what we can analyze (or have analyzed).
> >
> > vect_slp_analyze_node_alignment makes use of this and upon receiving
> > a vector type with a bigger alignment requirement re-analyzes the DR
> > with respect to it but keeps an older more precise result if possible.
> > In this context it might be possible to do the analysis just once,
> > but instead of analyzing with respect to a specific desired alignment
> > look for the biggest alignment for which we can compute a known
> > misalignment.
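> >
> > In sketch form the re-analysis amounts to something like this
> > (invented names, not the actual hunk):
> >
> >   if (known_lt (dr_info->target_alignment * BITS_PER_UNIT,
> >                 targetm.vectorize.preferred_vector_alignment (vectype)))
> >     {
> >       /* Re-analyze with the bigger alignment requirement ...  */
> >       poly_uint64 saved_alignment = dr_info->target_alignment;
> >       int saved_misalignment = dr_info->misalignment;
> >       vect_compute_data_ref_alignment (vinfo, dr_info, vectype);
> >       /* ... but keep the older, more precise result if re-analysis
> >          only yields DR_MISALIGNMENT_UNKNOWN.  */
> >       if (dr_info->misalignment == DR_MISALIGNMENT_UNKNOWN)
> >         {
> >           dr_info->target_alignment = saved_alignment;
> >           dr_info->misalignment = saved_misalignment;
> >         }
> >     }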
> >
> > The ChangeLog includes the functional changes but not the bulk
> > stemming from the alignment accessor API changes - I hope that's
> > acceptable.
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu, testing on SPEC
> > CPU 2017 in progress (for stats and correctness).
> >
> > Any comments?
> 
> Sorry for the super-slow response, some comments below.
> 
> > […]
> > diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
> > index a57700f2c1b..c42fc2fb272 100644
> > --- a/gcc/tree-vect-data-refs.c
> > +++ b/gcc/tree-vect-data-refs.c
> > @@ -887,37 +887,53 @@ vect_slp_analyze_instance_dependence (vec_info *vinfo, slp_instance instance)
> >    return res;
> >  }
> >  
> > -/* Return the misalignment of DR_INFO.  */
> > +/* Return the misalignment of DR_INFO accessed in VECTYPE.  */
> >  
> >  int
> > -dr_misalignment (dr_vec_info *dr_info)
> > +dr_misalignment (dr_vec_info *dr_info, tree vectype)
> >  {
> > +  HOST_WIDE_INT diff = 0;
> > +  /* Alignment is only analyzed for the first element of a DR group,
> > +     use that but adjust misalignment by the offset of the access.  */
> >    if (STMT_VINFO_GROUPED_ACCESS (dr_info->stmt))
> >      {
> >        dr_vec_info *first_dr
> >     = STMT_VINFO_DR_INFO (DR_GROUP_FIRST_ELEMENT (dr_info->stmt));
> > -      int misalign = first_dr->misalignment;
> > -      gcc_assert (misalign != DR_MISALIGNMENT_UNINITIALIZED);
> > -      if (misalign == DR_MISALIGNMENT_UNKNOWN)
> > -   return misalign;
> >        /* vect_analyze_data_ref_accesses guarantees that DR_INIT are
> >      INTEGER_CSTs and the first element in the group has the lowest
> >      address.  Likewise vect_compute_data_ref_alignment will
> >      have ensured that target_alignment is constant and otherwise
> >      set misalign to DR_MISALIGNMENT_UNKNOWN.  */
> 
> Can you move the second sentence down so that it stays with the to_constant?
> 
> > -      HOST_WIDE_INT diff = (TREE_INT_CST_LOW (DR_INIT (dr_info->dr))
> > -                       - TREE_INT_CST_LOW (DR_INIT (first_dr->dr)));
> > +      diff = (TREE_INT_CST_LOW (DR_INIT (dr_info->dr))
> > +         - TREE_INT_CST_LOW (DR_INIT (first_dr->dr)));
> >        gcc_assert (diff >= 0);
> > -      unsigned HOST_WIDE_INT target_alignment_c
> > -   = first_dr->target_alignment.to_constant ();
> > -      return (misalign + diff) % target_alignment_c;
> > +      dr_info = first_dr;
> >      }
> > -  else
> > +
> > +  int misalign = dr_info->misalignment;
> > +  gcc_assert (misalign != DR_MISALIGNMENT_UNINITIALIZED);
> > +  if (misalign == DR_MISALIGNMENT_UNKNOWN)
> > +    return misalign;
> > +
> > +  /* If the access is only aligned for a vector type with smaller alignment
> > +     requirement the access has unknown misalignment.  */
> > +  if (maybe_lt (dr_info->target_alignment * BITS_PER_UNIT,
> > +           targetm.vectorize.preferred_vector_alignment (vectype)))
> > +    return DR_MISALIGNMENT_UNKNOWN;
> > +
> > +  /* If this is a backward running DR then first access in the larger
> > +     vectype actually is N-1 elements before the address in the DR.
> > +     Adjust misalign accordingly.  */
> > +  if (tree_int_cst_sgn (DR_STEP (dr_info->dr)) < 0)
> >      {
> > -      int misalign = dr_info->misalignment;
> > -      gcc_assert (misalign != DR_MISALIGNMENT_UNINITIALIZED);
> > -      return misalign;
> > +      if (!TYPE_VECTOR_SUBPARTS (vectype).is_constant ())
> > +   return DR_MISALIGNMENT_UNKNOWN;
> > +      diff += ((TYPE_VECTOR_SUBPARTS (vectype).to_constant () - 1)
> > +          * -TREE_INT_CST_LOW (TYPE_SIZE_UNIT (TREE_TYPE (vectype))));
> 
> The target alignment might only be element alignment.  If so, I think we
> should either skip this block or return early above.  This would avoid
> returning DR_MISALIGNMENT_UNKNOWN unnecessarily for VLA.
> 
> Or, more generally, I think this is a case where known_misalignment
> does work.  I.e. diff can be a poly_int64 and:
> 
> >      }
> > +  unsigned HOST_WIDE_INT target_alignment_c
> > +    = dr_info->target_alignment.to_constant ();
> > +  return (misalign + diff) % target_alignment_c;
> 
> the last line can become:
> 
>   if (!known_misalignment (misalign + diff, target_alignment_c, &misalign))
>     return DR_MISALIGNMENT_UNKNOWN;
>   return misalign;
> 
> avoiding the need for the is_constant above.

OK, done - this then matches more closely what 
vect_compute_data_ref_alignment did.
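
With diff a poly_int64 the function tail then becomes roughly
(a sketch following your suggestion above):

  unsigned HOST_WIDE_INT target_alignment_c
    = dr_info->target_alignment.to_constant ();
  /* diff is now a poly_int64, so the remainder may not be computable
     at compile time; known_misalignment tells us whether it is.  */
  if (!known_misalignment (misalign + diff, target_alignment_c, &misalign))
    return DR_MISALIGNMENT_UNKNOWN;
  return misalign;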

> > […]
> > @@ -1896,10 +1902,9 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
> >           unsigned int target_align =
> >             DR_TARGET_ALIGNMENT (dr_info).to_constant ();
> >           unsigned int dr_size = vect_get_scalar_dr_size (dr_info);
> > -         mis = (negative
> > -                ? DR_MISALIGNMENT (dr_info)
> > -                : -DR_MISALIGNMENT (dr_info));
> > -         if (DR_MISALIGNMENT (dr_info) != 0)
> > +         unsigned int mis = dr_misalignment (dr_info, vectype);
> > +         mis = negative ? mis : -mis;
> 
> Just checking: is this still correct?  It probably is, but it looked a
> bit like we're handling negative steps twice.  Same further down.

Yeah, it's done the same number of times as before.  I didn't try to
understand this particular instance, but I hope to follow up with a
change that moves the negative step adjustment to where we actually
vectorize with the computed address offset ... (there's a PR which
shows we currently generate wrong code for negative stride SLP where
this adjustment is off and we thus get wrongly aligned/misaligned
accesses)

Richard.
