Richard Biener <richard.guent...@gmail.com> writes:
> On Thu, May 4, 2017 at 2:12 PM, Richard Biener
> <richard.guent...@gmail.com> wrote:
>> On Wed, May 3, 2017 at 10:00 AM, Richard Sandiford
>> <richard.sandif...@linaro.org> wrote:
>>> This patch tries to calculate conservatively-correct distance
>>> vectors for two references whose base addresses are not the same.
>>> It sets a new flag DDR_COULD_BE_INDEPENDENT_P if the dependence
>>> isn't guaranteed to occur.
>>>
>>> The motivating example is:
>>>
>>>   struct s { int x[8]; };
>>>   void
>>>   f (struct s *a, struct s *b)
>>>   {
>>>     for (int i = 0; i < 8; ++i)
>>>       a->x[i] += b->x[i];
>>>   }
>>>
>>> in which the "a" and "b" accesses are either independent or have a
>>> dependence distance of 0 (assuming -fstrict-aliasing).  Neither case
>>> prevents vectorisation, so we can vectorise without an alias check.
>>>
>>> I'd originally wanted to do the same thing for arrays as well, e.g.:
>>>
>>>   void
>>>   f (int a[][8], struct b[][8])
>>>   {
>>>     for (int i = 0; i < 8; ++i)
>>>       a[0][i] += b[0][i];
>>>   }
>>>
>>> I think this is valid because C11 6.7.6.2/6 says:
>>>
>>>   For two array types to be compatible, both shall have compatible
>>>   element types, and if both size specifiers are present, and are
>>>   integer constant expressions, then both size specifiers shall have
>>>   the same constant value.
>>>
>>> So if we access an array through an int (*)[8], it must have type X[8]
>>> or X[], where X is compatible with int.  It doesn't seem possible in
>>> either case for "a[0]" and "b[0]" to overlap when "a != b".
>>>
>>> However, Richard B said that (at least in gimple) we support arbitrary
>>> overlap of arrays and allow arrays to be accessed with different
>>> dimensionality.  There are examples of this in PR50067.  I've therefore
>>> only handled references that end in a structure field access.
>>>
>>> There are two ways of handling these dependences in the vectoriser:
>>> use them to limit VF, or check at runtime as before.  I've gone for
>>> the approach of checking at runtime if we can, to avoid limiting VF
>>> unnecessarily.  We still fall back to a VF cap when runtime checks
>>> aren't allowed.
>>>
>>> The patch tests whether we queued an alias check with a dependence
>>> distance of X and then picked a VF <= X, in which case it's safe to
>>> drop the alias check.  Since vect_prune_runtime_alias_check_list can
>>> be called twice with different VF for the same loop, it's no longer
>>> safe to clear may_alias_ddrs on exit.  Instead we should use
>>> comp_alias_ddrs to check whether versioning is necessary.
>>>
>>> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
>>
>> You seem to do your "fancy" thing but also later compute the old
>> base equality anyway (for same_base_p).  It looks to me for this
>> case the new fancy code can be simply skipped, keeping num_dimensions
>> as before?
>>
>> +      /* Try to approach equal type sizes.  */
>> +      if (!COMPLETE_TYPE_P (type_a)
>> +         || !COMPLETE_TYPE_P (type_b)
>> +         || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_a))
>> +         || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_b)))
>> +       break;
>>
>> ah, interesting idea to avoid a quadratic search.  Note that you should
>> conservatively handle both BIT_FIELD_REF and VIEW_CONVERT_EXPR
>> as they are used for type-punning.

All the component refs here should be REALPART_EXPRs, IMAGPART_EXPRs,
ARRAY_REFs or COMPONENT_REFs of structures, since that's all that
dr_analyze_indices allows, so I think we safe in terms of the tree codes.

>> I see nonoverlapping_component_refs_of_decl_p should simply skip
>> ARRAY_REFs - but I also see there:
>>
>>       /* ??? We cannot simply use the type of operand #0 of the refs here
>>          as the Fortran compiler smuggles type punning into COMPONENT_REFs
>>          for common blocks instead of using unions like everyone else.  */
>>       tree type1 = DECL_CONTEXT (field1);
>>       tree type2 = DECL_CONTEXT (field2);
>>
>> so you probably can't simply use TREE_TYPE (outer_ref) for type 
>> compatibility.
>> You also may not use types_compatible_p here as for LTO that is _way_ too
>> lax for aggregates.  The above uses
>>
>>       /* We cannot disambiguate fields in a union or qualified union.  */
>>       if (type1 != type2 || TREE_CODE (type1) != RECORD_TYPE)
>>          return false;
>>
>> so you should also bail out on unions here, rather than the check you do 
>> later.

The loop stops before we get to a union, so I think "only" the RECORD_TYPE
COMPONENT_REF handling is a potential problem.  Does this mean that
I should use the nonoverlapping_component_refs_of_decl_p code:

      tree field1 = TREE_OPERAND (ref1, 1);
      tree field2 = TREE_OPERAND (ref2, 1);

      /* ??? We cannot simply use the type of operand #0 of the refs here
         as the Fortran compiler smuggles type punning into COMPONENT_REFs
         for common blocks instead of using unions like everyone else.  */
      tree type1 = DECL_CONTEXT (field1);
      tree type2 = DECL_CONTEXT (field2);

      /* We cannot disambiguate fields in a union or qualified union.  */
      if (type1 != type2 || TREE_CODE (type1) != RECORD_TYPE)
         return false;

      if (field1 != field2)
        {
          /* A field and its representative need to be considered the
             same.  */
          if (DECL_BIT_FIELD_REPRESENTATIVE (field1) == field2
              || DECL_BIT_FIELD_REPRESENTATIVE (field2) == field1)
            return false;
          /* Different fields of the same record type cannot overlap.
             ??? Bitfields can overlap at RTL level so punt on them.  */
          if (DECL_BIT_FIELD (field1) && DECL_BIT_FIELD (field2))
            return false;
          return true;
        }

as the disambiguation test for COMPONENT_REFs, instead of types_compatible_p
during the new loop?  And test for this as well as unions in the outer
references?

>> You seem to rely on getting an access_fn entry for each handled_component_p.
>> It looks like this is the case -- we even seem to stop at unions (with the 
>> same
>> fortran "issue").  I'm not sure that's the best thing to do but you
>> rely on that.

Yeah, the loop is deliberately limited to the components associated with
an access_fn.  I did wonder at first whether dr_analyze_indices should
store the original component reference trees for each access function.
That would make things simpler and more explicit, but would also eat up
more memory.  Things like object_address_invariant_in_loop_p rely on the
access_fns in the same way that the loop in the patch does.

>> I don't understand the looping, it needs more comments.  You seem to be
>> looking for the innermost compatible RECORD_TYPE but then num_dimensions
>> is how many compatible refs you found on the way (with incompatible ones
>> not counting?!).  What about an inner varying array of structs?  This seems 
>> to
>> be disregarded in the analysis now?  Thus, a[i].s.b[i].j vs. __real
>> b[i].s.b[i].j?

I'll try to improve the comments.  But the idea is that both sequences are
as long as possible, while that still gives compatible types.  If there is
more than one such sequence, we pick the one nearest the base.

So in your example, the access functions would be:

               0   1   2   3   4
  a:          .j [i]  .b  .s [i]

           0   1   2   3   4   5
  b:  __real  .j [i]  .b  .s [i]

If a and b are pointers, the final access functions would be
unconstrained base accesses, so we'd end up with:

  a: [0, 3]
  b: [1, 4]

for both sequences.

>> nonoverlapping_component_refs_of_decl_p/nonoverlapping_component_refs_p
>> conveniently start from the other
>> end of the ref here.
>
> That said, for the motivational cases we either have one ref having
> more dimensions than the other (the __real vs. full complex access) or
> they have the same number of dimensions (and no access fn for the
> base).
>
> For the first case we should simply "drop" access_fns of the larger
> dimensional ref (from the start, plus outer component refs) up to the
> point the number of dimensions are equal.

Yeah, that's what happens for your example.  But if we had:

    a[i].s.c.d
    __real b[i].s.b[i].j

(where d is the same type as the real component) then the access
functions would be:

                   0   1   2   3
  a:              .d  .c  .s [i]

           0   1   2   3   4   5
  b:  __real  .j [i]  .b  .s [i]

Comparing the a0/b2 column doesn't make sense, because one's an array
and the other is a structure.  In this case the sequence we care about is:

  a: [1, 3]
  b: [3, 5]

which is what the loop gives.  The a1/b3 column is the one that proves
there's no dependence.

> Then we have the case of
>
>   ! types_compatible_p (TREE_TYPE (base_a), TREE_TYPE (base_b))
>
> where we have to punt.
>
> Then we have the case of
>
>   ! operand_equal_p (base_a, base_b, OEP_ADDRESS_OF)
>
> which is where the new code should kick in to see if we can drop access_fns
> from the other end (as unanalyzable but either having distance zero or not
> aliased because of TBAA).
>
> At least your testcases suggest you do not want to handle
>
>  struct s { int x[N]; };
>  struct r { struct s s; };
>  f (struct s *a, struct r *b)
>  {
>     for (i = 0; i < N; ++i)
>       a->s.x[i] = b->x[i];
>  }
>
> ?
>
> With this example your loop which seems to search for a "common"
> sequence in (different) midst of the reference trees makes more sense
> (still that loop is awkward to understand).

Yeah, I want to handle that too, just hadn't thought of it as a specific
testcase.  The code does give the expected dependence distance of 0.

Thanks,
Richard

Reply via email to