Richard Biener <richard.guent...@gmail.com> writes:
> On Tue, Mar 20, 2018 at 4:26 PM, Richard Sandiford
> <richard.sandif...@linaro.org> wrote:
>> Richard Biener <richard.guent...@gmail.com> writes:
>>> On Mon, Mar 19, 2018 at 10:27 PM, Richard Sandiford
>>> <richard.sandif...@linaro.org> wrote:
>>>> Richard Biener <richard.guent...@gmail.com> writes:
>>>>> On Sat, Mar 17, 2018 at 11:45 AM, Richard Sandiford
>>>>> <richard.sandif...@linaro.org> wrote:
>>>>>> Index: gcc/tree-data-ref.c
>>>>>> ===================================================================
>>>>>> --- gcc/tree-data-ref.c 2018-02-14 13:14:36.268006810 +0000
>>>>>> +++ gcc/tree-data-ref.c 2018-03-17 10:43:42.544669549 +0000
>>>>>> @@ -5200,6 +5200,75 @@ dr_alignment (innermost_loop_behavior *d
>>>>>>    return alignment;
>>>>>>  }
>>>>>>
>>>>>> +/* If BASE is a pointer-typed SSA name, try to find the object that it
>>>>>> +   is based on.  Return this object X on success and store the alignment
>>>>>> +   in bytes of BASE - &X in *ALIGNMENT_OUT.  */
>>>>>> +
>>>>>> +static tree
>>>>>> +get_base_for_alignment_1 (tree base, unsigned int *alignment_out)
>>>>>> +{
>>>>>> +  if (TREE_CODE (base) != SSA_NAME || !POINTER_TYPE_P (TREE_TYPE 
>>>>>> (base)))
>>>>>> +    return NULL_TREE;
>>>>>> +
>>>>>> +  gimple *def = SSA_NAME_DEF_STMT (base);
>>>>>> +  base = analyze_scalar_evolution (loop_containing_stmt (def), base);
>>>>>
>>>>> I think it is an error to use the def stmt of base here.  Instead you
>>>>> need to pass down the point/loop of the use.  For the same reason you
>>>>> also want to instantiate parameters after analyzing the evolution.
>>>>>
>>>>> In the end, if the BB to be vectorized is contained in a loop nest
>>>>> we want to
>>>>> instantiate up to the point where (eventually) a DECL we can re-align
>>>>> appears,
>>>>> but using the containing loop for now looks OK.
>>>>
>>>> Why's that necessary/better though?  We're not interested in the
>>>> evolution of the value at the point it*s used by the potential vector
>>>> code, but in how we get from the ultimate base (if there is one) to the
>>>> definition of the SSA name.
>>>
>>> Hmm, ok.
>>>
>>>> I don't think instantiating the SCEV would give any new information,
>>>> but it could lose some.  E.g. if we have:
>>>>
>>>>   x = &foo;
>>>>   do
>>>>     x += 8;
>>>>   while (*y++ < 10)
>>>>   ...potential vector use of *x...
>>>>
>>>> we wouldn't be able to express the address of x after the do-while
>>>> loop, because there's nothing that counts the number of iterations.
>>>> But the uninstantiated SCEV could tell us that x = &foo + N * 8 for
>>>> some (unknown) N.
>>>
>>> Not sure that it works that way.  I'm not 100% sure what kind of
>>> parameters are left symbolic, and if analysis loop and instantiation
>>> "loop" are the same I guess it doesn't make any difference...
>>>
>>>> (OK, so that doesn't actually work unless we take the effort
>>>> to look through loop-closed SSA form, but still...)
>>>>
>>>>>> +  /* Peel chrecs and record the minimum alignment preserved by
>>>>>> +     all steps.  */
>>>>>> +  unsigned int alignment = MAX_OFILE_ALIGNMENT / BITS_PER_UNIT;
>>>>>> +  while (TREE_CODE (base) == POLYNOMIAL_CHREC)
>>>>>> +    {
>>>>>> + unsigned int step_alignment = highest_pow2_factor (CHREC_RIGHT
>>>>>> (base));
>>>>>> +      alignment = MIN (alignment, step_alignment);
>>>>>> +      base = CHREC_LEFT (base);
>>>>>> +    }
>>>>>> +
>>>>>> +  /* Punt if the expression is too complicated to handle.  */
>>>>>> + if (tree_contains_chrecs (base, NULL) || !POINTER_TYPE_P (TREE_TYPE
>>>>>> (base)))
>>>>>> +    return NULL_TREE;
>>>>>> +
>>>>>> +  /* Analyze the base to which the steps we peeled were applied.  */
>>>>>> +  poly_int64 bitsize, bitpos, bytepos;
>>>>>> +  machine_mode mode;
>>>>>> +  int unsignedp, reversep, volatilep;
>>>>>> +  tree offset;
>>>>>> +  base = get_inner_reference (build_fold_indirect_ref (base),
>>>>>
>>>>> ick :/
>>>>>
>>>>> what happens if you simply use get_pointer_alignment here and
>>>>> strip down POINTER_PLUS_EXPRs to the ultimate LHS?  (basically
>>>>> what get_pointer_alignment_1 does)  After all get_base_for_alignment_1
>>>>> itself only deals with plain SSA_NAMEs, not, say, &a_1->b.c or
>>>>> &MEM[a_1 + 4].
>>>>
>>>> Yeah, but those have (hopefully) been handled by dr_analyze_innermost
>>>> already, which should have stripped off both the constant and variable
>>>> parts of the offset.  So I think SSA names are the only interesting
>>>> input.  The problem is that once we follow the definitions of an SSA
>>>> name through CHREC_LEFTs, we get a general address again.
>>>>
>>>> get_pointer_alignment and get_pointer_alignment_1 don't do what we want
>>>> because they take the alignment of the existing object into account,
>>>> whereas here we explicitly want to ignore that and only calculate the
>>>> alignment of the offset.
>>>
>>> Ah, indeed - I missed that fact.
>>>
>>>> I guess we could pass a flag down to ignore the current alignment though?
>>>
>>> But we need the base anyway...  so, given we can only ever re-align decls
>>> and never plain pointers instead of using build_fold_indirect_ref do
>>>
>>>  if (TREE_CODE (base) != ADDR_EXPR)
>>>    return NULL_TREE;
>>>
>>> else use TREE_OPERAND (base, 0)?
>>
>> The build_fold_indirect_ref also helps for POINTER_PLUS_EXPR,
>> which is what we get for things like the do-while above (e.g.
>> { &a + 64, +, 64 }_n if x points to a double *.)
>>
>> I guess everything we care about would be handled by fold_indirect_ref_1
>> though, so would that be OK instead?
>
> Sounds like a compromise ;)
>
> Patch is ok with that change.

Thanks, here's what I installed.  Tested as before.

Richard


2018-03-24  Richard Sandiford  <richard.sandif...@linaro.org>

gcc/
        PR tree-optimization/84005
        * tree-data-ref.h (get_base_for_alignment): Declare.
        * tree-data-ref.c (get_base_for_alignment_1): New function.
        (get_base_for_alignment): Likewise.
        * tree-vect-data-refs.c (vect_compute_data_ref_alignment): Use
        get_base_for_alignment to find a suitable base object, instead
        of always using drb->base_address.

gcc/testsuite/
        PR tree-optimization/84005
        * gcc.dg/vect/bb-slp-1.c: Make sure there is no message about
        failing to force the alignment.

Index: gcc/tree-data-ref.h
===================================================================
--- gcc/tree-data-ref.h 2018-03-18 10:24:43.602669700 +0000
+++ gcc/tree-data-ref.h 2018-03-24 10:47:51.281625630 +0000
@@ -463,6 +463,7 @@ extern bool compute_all_dependences (vec
 extern tree find_data_references_in_bb (struct loop *, basic_block,
                                         vec<data_reference_p> *);
 extern unsigned int dr_alignment (innermost_loop_behavior *);
+extern tree get_base_for_alignment (tree, unsigned int *);
 
 /* Return the alignment in bytes that DR is guaranteed to have at all
    times.  */
Index: gcc/tree-data-ref.c
===================================================================
--- gcc/tree-data-ref.c 2018-03-21 20:51:32.502280975 +0000
+++ gcc/tree-data-ref.c 2018-03-24 10:47:51.281625630 +0000
@@ -5202,6 +5202,81 @@ dr_alignment (innermost_loop_behavior *d
   return alignment;
 }
 
+/* If BASE is a pointer-typed SSA name, try to find the object that it
+   is based on.  Return this object X on success and store the alignment
+   in bytes of BASE - &X in *ALIGNMENT_OUT.  */
+
+static tree
+get_base_for_alignment_1 (tree base, unsigned int *alignment_out)
+{
+  if (TREE_CODE (base) != SSA_NAME || !POINTER_TYPE_P (TREE_TYPE (base)))
+    return NULL_TREE;
+
+  gimple *def = SSA_NAME_DEF_STMT (base);
+  base = analyze_scalar_evolution (loop_containing_stmt (def), base);
+
+  /* Peel chrecs and record the minimum alignment preserved by
+     all steps.  */
+  unsigned int alignment = MAX_OFILE_ALIGNMENT / BITS_PER_UNIT;
+  while (TREE_CODE (base) == POLYNOMIAL_CHREC)
+    {
+      unsigned int step_alignment = highest_pow2_factor (CHREC_RIGHT (base));
+      alignment = MIN (alignment, step_alignment);
+      base = CHREC_LEFT (base);
+    }
+
+  /* Punt if the expression is too complicated to handle.  */
+  if (tree_contains_chrecs (base, NULL) || !POINTER_TYPE_P (TREE_TYPE (base)))
+    return NULL_TREE;
+
+  /* The only useful cases are those for which a dereference folds to something
+     other than an INDIRECT_REF.  */
+  tree ref_type = TREE_TYPE (TREE_TYPE (base));
+  tree ref = fold_indirect_ref_1 (UNKNOWN_LOCATION, ref_type, base);
+  if (!ref)
+    return NULL_TREE;
+
+  /* Analyze the base to which the steps we peeled were applied.  */
+  poly_int64 bitsize, bitpos, bytepos;
+  machine_mode mode;
+  int unsignedp, reversep, volatilep;
+  tree offset;
+  base = get_inner_reference (ref, &bitsize, &bitpos, &offset, &mode,
+                             &unsignedp, &reversep, &volatilep);
+  if (!base || !multiple_p (bitpos, BITS_PER_UNIT, &bytepos))
+    return NULL_TREE;
+
+  /* Restrict the alignment to that guaranteed by the offsets.  */
+  unsigned int bytepos_alignment = known_alignment (bytepos);
+  if (bytepos_alignment != 0)
+    alignment = MIN (alignment, bytepos_alignment);
+  if (offset)
+    {
+      unsigned int offset_alignment = highest_pow2_factor (offset);
+      alignment = MIN (alignment, offset_alignment);
+    }
+
+  *alignment_out = alignment;
+  return base;
+}
+
+/* Return the object whose alignment would need to be changed in order
+   to increase the alignment of ADDR.  Store the maximum achievable
+   alignment in *MAX_ALIGNMENT.  */
+
+tree
+get_base_for_alignment (tree addr, unsigned int *max_alignment)
+{
+  tree base = get_base_for_alignment_1 (addr, max_alignment);
+  if (base)
+    return base;
+
+  if (TREE_CODE (addr) == ADDR_EXPR)
+    addr = TREE_OPERAND (addr, 0);
+  *max_alignment = MAX_OFILE_ALIGNMENT / BITS_PER_UNIT;
+  return addr;
+}
+
 /* Recursive helper function.  */
 
 static bool
Index: gcc/tree-vect-data-refs.c
===================================================================
--- gcc/tree-vect-data-refs.c   2018-03-18 10:24:43.602669700 +0000
+++ gcc/tree-vect-data-refs.c   2018-03-24 10:47:51.283625566 +0000
@@ -957,11 +957,11 @@ vect_compute_data_ref_alignment (struct
 
   if (base_alignment < vector_alignment)
     {
-      tree base = drb->base_address;
-      if (TREE_CODE (base) == ADDR_EXPR)
-       base = TREE_OPERAND (base, 0);
-      if (!vect_can_force_dr_alignment_p (base,
-                                         vector_alignment * BITS_PER_UNIT))
+      unsigned int max_alignment;
+      tree base = get_base_for_alignment (drb->base_address, &max_alignment);
+      if (max_alignment < vector_alignment
+         || !vect_can_force_dr_alignment_p (base,
+                                            vector_alignment * BITS_PER_UNIT))
        {
          if (dump_enabled_p ())
            {
Index: gcc/testsuite/gcc.dg/vect/bb-slp-1.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/bb-slp-1.c        2018-03-18 10:24:43.602669700 
+0000
+++ gcc/testsuite/gcc.dg/vect/bb-slp-1.c        2018-03-24 10:47:51.280625662 
+0000
@@ -54,5 +54,5 @@ int main (void)
   return 0;
 }
 
+/* { dg-final { scan-tree-dump-not "can't force alignment" "slp1" } } */
 /* { dg-final { scan-tree-dump-times "basic block vectorized" 1 "slp1" } } */
-  

Reply via email to