Hi,

On Tue, Dec 05 2017, Richard Biener wrote:
> On Tue, 5 Dec 2017, Martin Jambor wrote:
>
>> On Tue, Dec 05 2017, Martin Jambor wrote:
>> > Hi,
>> >
>> > On Tue, Dec 05 2017, Martin Jambor wrote:
>> >
>> >> Hi,
>> >>
>> >> this is a followup to Richi's
>> >> https://gcc.gnu.org/ml/gcc-patches/2017-11/msg02396.html to fix PR
>> >> 83141.  The basic idea is simple: be just as conservative about
>> >> type-changing MEM_REFs as we are about actual VCEs.
>> >>
>> >> I have checked how that would affect compilation of SPEC 2006 and
>> >> (non-LTO) Mozilla Firefox and am happy to report that the difference
>> >> was tiny.  However, I had to make the test less strict, otherwise the
>> >> testcase gcc.dg/guality/pr54970.c kept failing because it contains a
>> >> folded memcpy and expects us to track values across:
>> >>
>> >>   int a[] = { 1, 2, 3 };
>> >>   /* ... */
>> >>   __builtin_memcpy (&a, (int [3]) { 4, 5, 6 }, sizeof (a));
>> >>                           /* { dg-final { gdb-test 31 "a\[0\]" "4" } } */
>> >>                           /* { dg-final { gdb-test 31 "a\[1\]" "5" } } */
>> >>                           /* { dg-final { gdb-test 31 "a\[2\]" "6" } } */
>> >>   
>> >> SRA is able to load the replacement of a[0] directly from the
>> >> temporary array, which is apparently necessary to generate proper
>> >> debug info.  I have therefore allowed the current transformation to
>> >> go forward if the source does not contain any padding or if it is a
>> >> read-only declaration.
>> >
>> > Ah, the read-only test is of course bogus; it was a last-minute
>> > addition when I was apparently already too tired to think it through.
>> > Please disregard that line in the patch (it has passed bootstrap and
>> > testing without it).
>> >
>> > Sorry for the noise,
>> >
>> > Martin
>> >
>> 
>> And for the record, below is the actual patch, after a fresh round of
>> re-testing to double check I did not mess up anything else.  As before,
>> I'd like to ask for review, especially of the type_contains_padding_p
>> predicate and then would like to commit it to trunk.
>
> Comments below...
>
>> Thanks,
>> 
>> Martin
>> 
>> 
>> 2017-12-05  Martin Jambor  <mjam...@suse.cz>
>> 
>>      PR tree-optimization/83141
>>      * tree-sra.c (type_contains_padding_p): New function.
>>      (contains_vce_or_bfcref_p): Move up in the file, also test for
>>      MEM_REFs implicitly changing types with padding.  Remove inline
>>      keyword.
>>      (build_accesses_from_assign): Check contains_vce_or_bfcref_p
>>      before setting bit in should_scalarize_away_bitmap.
>> 
>> testsuite/
>>      * gcc.dg/tree-ssa/pr83141.c: New test.
>> ---
>>  gcc/testsuite/gcc.dg/tree-ssa/pr83141.c |  31 +++++++++
>>  gcc/tree-sra.c                          | 115 ++++++++++++++++++++++++++------
>>  2 files changed, 127 insertions(+), 19 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr83141.c
>> 
>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr83141.c b/gcc/testsuite/gcc.dg/tree-ssa/pr83141.c
>> new file mode 100644
>> index 00000000000..86caf66025b
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr83141.c
>> @@ -0,0 +1,31 @@
>> +/* { dg-do run } */
>> +/* { dg-options "-O -fdump-tree-esra-details" } */
>> +
>> +volatile short vs;
>> +volatile long vl;
>> +
>> +struct A { short s; long i; long j; };
>> +struct A a, b;
>> +void foo ()
>> +{
>> +  struct A c;
>> +  __builtin_memcpy (&c, &b, sizeof (struct A));
>> +  __builtin_memcpy (&a, &c, sizeof (struct A));
>> +
>> +  vs = c.s;
>> +  vl = c.i;
>> +  vl = c.j;
>> +}
>> +int main()
>> +{
>> +  __builtin_memset (&b, 0, sizeof (struct A));
>> +  b.s = 1;
>> +  __builtin_memcpy ((char *)&b+2, &b, 2);
>> +  foo ();
>> +  __builtin_memcpy (&a, (char *)&a+2, 2);
>> +  if (a.s != 1)
>> +    __builtin_abort ();
>> +  return 0;
>> +}
>> +
>> +/* { dg-final { scan-tree-dump-not "Will attempt to totally scalarize" "esra" } } */
>> diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
>> index 866cff0edb0..df9f10f83b6 100644
>> --- a/gcc/tree-sra.c
>> +++ b/gcc/tree-sra.c
>> @@ -1141,6 +1141,100 @@ contains_view_convert_expr_p (const_tree ref)
>>    return false;
>>  }
>>  
>> +/* Return false if we can determine that TYPE has no padding, otherwise return
>> +   true.  */
>> +
>> +static bool
>> +type_contains_padding_p (tree type)
>> +{
>> +  if (is_gimple_reg_type (type))
>> +    return false;
>> +
>> +  if (!tree_fits_uhwi_p (TYPE_SIZE (type)))
>> +    return true;
>> +
>> +  switch (TREE_CODE (type))
>> +    {
>> +    case RECORD_TYPE:
>> +      {
>> +    unsigned HOST_WIDE_INT pos = 0;
>> +    for (tree fld = TYPE_FIELDS (type); fld; fld = DECL_CHAIN (fld))
>> +      if (TREE_CODE (fld) == FIELD_DECL)
>> +        {
>> +          tree ft = TREE_TYPE (fld);
>> +
>> +          if (DECL_BIT_FIELD (fld)
>> +              || DECL_PADDING_P (fld)
>> +              || !tree_fits_uhwi_p (TYPE_SIZE (ft)))
>> +            return true;
>> +
>> +          tree t = bit_position (fld);
>> +          if (!tree_fits_uhwi_p (t))
>> +            return true;
>> +          unsigned HOST_WIDE_INT bp = tree_to_uhwi (t);
>> +          if (pos != bp)
>> +            return true;
>> +
>> +          pos += tree_to_uhwi (TYPE_SIZE (ft));
>> +          if (type_contains_padding_p (ft))
>> +            return true;
>> +        }
>> +
>> +    if (pos != tree_to_uhwi (TYPE_SIZE (type)))
>> +      return true;
>> +
>> +    return false;
>> +      }
>> +
>> +    case ARRAY_TYPE:
>> +      return type_contains_padding_p (TREE_TYPE (type));
>> +
>> +    default:
>> +      return true;
>> +    }
>> +  gcc_unreachable ();
>> +}
>
> The possibly repeated walk over all fields and subfields of an aggregate
> type looks expensive - some caching on the main variant would seem
> advisable to me.

It almost never happens but I guess you are right.  But see below.
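If the predicate survives this review, the caching could be a thin
wrapper along these lines (an untested sketch; padding_type_cache and
type_contains_padding_p_1 are hypothetical names, the latter standing
for the current function body renamed):

  /* Hypothetical cache over main variants; it would be allocated in
     sra_initialize and freed in sra_deinitialize.  */
  static hash_map<tree, bool> *padding_type_cache;

  /* Cached wrapper; the recursive calls in the renamed body would then
     also go through here.  */
  static bool
  type_contains_padding_p (tree type)
  {
    tree mv = TYPE_MAIN_VARIANT (type);
    if (bool *cached = padding_type_cache->get (mv))
      return *cached;
    bool result = type_contains_padding_p_1 (mv);
    padding_type_cache->put (mv, result);
    return result;
  }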

>
>> +/* Return true if REF contains a MEM_REF that performs type conversion from a
>> +   type with padding or any VIEW_CONVERT_EXPR or a COMPONENT_REF with a
>> +   bit-field field declaration.  */
>> +
>> +static bool
>> +contains_vce_or_bfcref_p (const_tree ref)
>> +{
>> +  while (true)
>> +    {
>> +      if (TREE_CODE (ref) == MEM_REF)
>> +    {
>> +      tree op0 = TREE_OPERAND (ref, 0);
>> +      if (TREE_CODE (op0) == ADDR_EXPR)
>> +        {
>> +          tree mem = TREE_OPERAND (op0, 0);
>> +          if ((TYPE_MAIN_VARIANT (TREE_TYPE (ref))
>> +               != TYPE_MAIN_VARIANT (TREE_TYPE (mem)))
>> +              && type_contains_padding_p (TREE_TYPE (mem)))
>> +            return true;
>> +
>> +          ref = mem;
>> +          continue;
>> +        }
>> +      else
>> +        return false;
>> +    }
>> +
>> +      if (!handled_component_p (ref))
>> +    return false;
>> +
>> +      if (TREE_CODE (ref) == VIEW_CONVERT_EXPR
>> +      || (TREE_CODE (ref) == COMPONENT_REF
>> +          && DECL_BIT_FIELD (TREE_OPERAND (ref, 1))))
>> +    return true;
>> +      ref = TREE_OPERAND (ref, 0);
>> +    }
>
> It is best to retain the old code and simply append
>
>   if (TREE_CODE (ref) == MEM_REF)
>
> after the while (handled_component_p ()) loop.  A MEM_REF can only
> appear in innermost position.

I hoped so but could not find any such check quickly in the tree-cfg.c
verifier.
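For the record, with your suggestion the predicate would look roughly
like this (a sketch only, not re-tested):

  static bool
  contains_vce_or_bfcref_p (const_tree ref)
  {
    while (handled_component_p (ref))
      {
        if (TREE_CODE (ref) == VIEW_CONVERT_EXPR
            || (TREE_CODE (ref) == COMPONENT_REF
                && DECL_BIT_FIELD (TREE_OPERAND (ref, 1))))
          return true;
        ref = TREE_OPERAND (ref, 0);
      }

    /* A MEM_REF can only appear in the innermost position.  */
    if (TREE_CODE (ref) != MEM_REF
        || TREE_CODE (TREE_OPERAND (ref, 0)) != ADDR_EXPR)
      return false;

    tree mem = TREE_OPERAND (TREE_OPERAND (ref, 0), 0);
    return ((TYPE_MAIN_VARIANT (TREE_TYPE (ref))
             != TYPE_MAIN_VARIANT (TREE_TYPE (mem)))
            && type_contains_padding_p (TREE_TYPE (mem)));
  }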

>
> I'm not sure that we really want to retain SRAing structures
> based on their fields when we see a VCEd MEM_REF accessing them.
> This might for example do floating-point accesses on data that
> isn't floating-point and on some architectures may raise exceptions
> (IA64 comes to my mind).  The memcpy folding code explicitely disallows
> float modes and also BOOLEAN_TYPE and ENUMERAL_TYPE becuase they
> might not match their precision:
>
>       /* Make sure we are not copying using a floating-point mode or
>          a type whose size possibly does not match its precision.  */
>       if (FLOAT_MODE_P (TYPE_MODE (desttype))
>           || TREE_CODE (desttype) == BOOLEAN_TYPE
>           || TREE_CODE (desttype) == ENUMERAL_TYPE)
>         desttype = bitwise_type_for_mode (TYPE_MODE (desttype));
>       if (FLOAT_MODE_P (TYPE_MODE (srctype))
>           || TREE_CODE (srctype) == BOOLEAN_TYPE
>           || TREE_CODE (srctype) == ENUMERAL_TYPE)
>         srctype = bitwise_type_for_mode (TYPE_MODE (srctype));
>
> so - what's the reason you think you need to retain SRAing memcpied
> structs?  Via total scalarization I mean - if we see the "real" accesses
> besides the aggregate copy then we of course win.

Two reasons.  First, Jakub explicitly asked me to do it in
https://gcc.gnu.org/ml/gcc-patches/2017-11/msg02139.html.  I will openly
admit that at the time I was planning to silently ignore it and hope to
get away with it because I shared your caution.  (Now I see that I forgot
about the second request, which I did not want to ignore, and will
adjust the testcase accordingly.)

The second reason is the testcase I described in my previous email.
When I saw

  FAIL: gcc.dg/guality/pr54970.c   -O1  line 31 a[0] == 4

at all optimization levels, I grumbled about Jakub being right again and
duly decided to bite the bullet and do what he asked me to because it
fixes the issue.  But if you allow me to XFAIL the guality test, I will
happily remove the whole padding detection; I don't really like it
either.

The debug information is apparently lost because a[0] is never used from
that point on, as opposed to a[1] and a[2], for which the guality test
still passes.

>
> Do we check anywhere when seeing an aggregate copy that lhs and rhs
> have matching TYPE_MAIN_VARIANT?

No.  I can do some due diligence measurements but I think it would not
hurt to add it.
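Something along these lines, folded into the fragility condition I
quote further below (just a sketch of a possible placement, untested):

  if (modify_this_stmt
      || gimple_has_volatile_ops (stmt)
      || contains_vce_or_bfcref_p (rhs)
      || contains_vce_or_bfcref_p (lhs)
      /* Hypothetical addition: also treat aggregate copies between
         types with different main variants as fragile.  */
      || (TYPE_MAIN_VARIANT (TREE_TYPE (lhs))
          != TYPE_MAIN_VARIANT (TREE_TYPE (rhs)))
      || stmt_ends_bb_p (stmt))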

> GIMPLE allows matching TYPE_CANONICAL
> which can have mismatched fields (LTO, cross-language) or even any
> type if TYPE_STRUCTURAL_EQUALITY.  That is, what happens if we mark
> two distinct enough decls for total scalarization that are copied
> to each other?  Do we end up re-materializing them if their
> "sturcture" doesn't match when transforming the assignment?

Basically yes, although we try to do the rematerialization only on the
RHS or LHS.  In more detail, if an assignment is not considered fragile
by the condition

  if (modify_this_stmt
      || gimple_has_volatile_ops (stmt)
      || contains_vce_or_bfcref_p (rhs)
      || contains_vce_or_bfcref_p (lhs)
      || stmt_ends_bb_p (stmt))

the following happens:

  1) If each replacement of the LHS has a match on the RHS in terms of
     offset and size, a scalar assignment between them is performed,
     possibly with a VIEW_CONVERT_EXPR if the types do not match.

  2) If for some LHS replacement that is not the case, then we flush all
     RHS replacements either to the LHS (hoping to make the original RHS
     redundant) if the RHS replacements contain all the data there is in
     the RHS, or to the RHS otherwise, and then load the LHS replacement
     from whichever of the two we picked (we do not rely on the original
     assignment to carry the data).

  3) The same flushing happens when some part of the LHS is not
     scalarized, unless we know it is never read, in which case we do
     not care.

  Generally, if we can prove that we do not lose any data, the original
  statement is deleted.
  
But let me emphasize again that whenever correctness is the issue, the
question whether an SRA-recorded access comes from total scalarization
or not is not important.  Users accessing the data in some other part of
the function can create such accesses too.  Users equipped with
placement new can create all sorts of weird type accesses, and because
tree-sra is flow-insensitive, they can then force scalarization to such
replacements even at places where the data have wildly different types.
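For illustration, consider a contrived example like the following (not
from any PR, the names are made up):

  #include <new>

  struct S { float f; };
  struct T { int i; };

  int
  weird (void)
  {
    S s;
    s.f = 1.0f;          /* a float access to the storage of s...  */
    T *t = new (&s) T;   /* ...which placement new then re-types...  */
    t->i = 42;           /* ...followed by an int access to the same
                            bytes, all visible to SRA at once.  */
    return t->i;
  }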

Martin
