Now also to the list...

---------- Forwarded message ----------
Date: Wed, 6 Dec 2017 14:48:00 +0100 (CET)
From: Richard Biener <rguent...@suse.de>
To: Martin Jambor <mjam...@suse.cz>
Subject: Re: [PR 83141] Prevent SRA from removing type changing assignment

On Wed, 6 Dec 2017, Martin Jambor wrote:

> Hi,
> 
> On Tue, Dec 05 2017, Richard Biener wrote:
> > On Tue, 5 Dec 2017, Martin Jambor wrote:
> >
> >> On Tue, Dec 05 2017, Martin Jambor wrote:
> >> > On Tue, Dec 05 2017, Martin Jambor wrote:
> >> > Hi,
> >> >
> >> >> Hi,
> >> >>
> >> >> this is a followup to Richi's
> >> >> https://gcc.gnu.org/ml/gcc-patches/2017-11/msg02396.html to fix PR
> >> >> 83141.  The basic idea is simple, be just as conservative about type
> >> >> changing MEM_REFs as we are about actual VCEs.
> >> >>
> >> >> I have checked how that would affect compilation of SPEC 2006 and
> >> >> (non-LTO) Mozilla Firefox and am happy to report that the difference was
> >> >> tiny.  However, I had to make the test less strict, otherwise testcase
> >> >> gcc.dg/guality/pr54970.c kept failing because it contains folded memcpy
> >> >> and expects us to track values across:
> >> >>
> >> >>   int a[] = { 1, 2, 3 };
> >> >>   /* ... */
> >> >>   __builtin_memcpy (&a, (int [3]) { 4, 5, 6 }, sizeof (a));
> >> >>                                 /* { dg-final { gdb-test 31 "a\[0\]" "4" } } */
> >> >>                                 /* { dg-final { gdb-test 31 "a\[1\]" "5" } } */
> >> >>                                 /* { dg-final { gdb-test 31 "a\[2\]" "6" } } */
> >> >>   
> >> >> SRA is able to load replacement of a[0] directly from the temporary
> >> >> array which is apparently necessary to generate proper debug info.  I
> >> >> have therefore allowed the current transformation to go forward if the
> >> >> source does not contain any padding or if it is a read-only declaration.
> >> >
> >> > Ah, the read-only test is of course bogus, it was a last minute addition
> >> > when I was apparently already too tired to think it through.  Please
> >> > disregard that line in the patch (it has passed bootstrap and testing
> >> > without it).
> >> >
> >> > Sorry for the noise,
> >> >
> >> > Martin
> >> >
> >> 
> >> And for the record, below is the actual patch, after a fresh round of
> >> re-testing to double check I did not mess up anything else.  As before,
> >> I'd like to ask for review, especially of the type_contains_padding_p
> >> predicate and then would like to commit it to trunk.
> >
> > Comments below...
> >
> >> Thanks,
> >> 
> >> Martin
> >> 
> >> 
> >> 2017-12-05  Martin Jambor  <mjam...@suse.cz>
> >> 
> >>    PR tree-optimization/83141
> >>    * tree-sra.c (type_contains_padding_p): New function.
> >>    (contains_vce_or_bfcref_p): Move up in the file, also test for
> >>    MEM_REFs implicitly changing types with padding.  Remove inline
> >>    keyword.
> >>    (build_accesses_from_assign): Check contains_vce_or_bfcref_p
> >>    before setting bit in should_scalarize_away_bitmap.
> >> 
> >> testsuite/
> >>    * gcc.dg/tree-ssa/pr83141.c: New test.
> >> ---
> >>  gcc/testsuite/gcc.dg/tree-ssa/pr83141.c |  31 +++++++++
> >>  gcc/tree-sra.c                          | 115 ++++++++++++++++++++++++++------
> >>  2 files changed, 127 insertions(+), 19 deletions(-)
> >>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr83141.c
> >> 
> >> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr83141.c b/gcc/testsuite/gcc.dg/tree-ssa/pr83141.c
> >> new file mode 100644
> >> index 00000000000..86caf66025b
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr83141.c
> >> @@ -0,0 +1,31 @@
> >> +/* { dg-do run } */
> >> +/* { dg-options "-O -fdump-tree-esra-details" } */
> >> +
> >> +volatile short vs;
> >> +volatile long vl;
> >> +
> >> +struct A { short s; long i; long j; };
> >> +struct A a, b;
> >> +void foo ()
> >> +{
> >> +  struct A c;
> >> +  __builtin_memcpy (&c, &b, sizeof (struct A));
> >> +  __builtin_memcpy (&a, &c, sizeof (struct A));
> >> +
> >> +  vs = c.s;
> >> +  vl = c.i;
> >> +  vl = c.j;
> >> +}
> >> +int main()
> >> +{
> >> +  __builtin_memset (&b, 0, sizeof (struct A));
> >> +  b.s = 1;
> >> +  __builtin_memcpy ((char *)&b+2, &b, 2);
> >> +  foo ();
> >> +  __builtin_memcpy (&a, (char *)&a+2, 2);
> >> +  if (a.s != 1)
> >> +    __builtin_abort ();
> >> +  return 0;
> >> +}
> >> +
> >> +/* { dg-final { scan-tree-dump-not "Will attempt to totally scalarize" "esra" } } */
> >> diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
> >> index 866cff0edb0..df9f10f83b6 100644
> >> --- a/gcc/tree-sra.c
> >> +++ b/gcc/tree-sra.c
> >> @@ -1141,6 +1141,100 @@ contains_view_convert_expr_p (const_tree ref)
> >>    return false;
> >>  }
> >>  
> >> +/* Return false if we can determine that TYPE has no padding, otherwise return
> >> +   true.  */
> >> +
> >> +static bool
> >> +type_contains_padding_p (tree type)
> >> +{
> >> +  if (is_gimple_reg_type (type))
> >> +    return false;
> >> +
> >> +  if (!tree_fits_uhwi_p (TYPE_SIZE (type)))
> >> +    return true;
> >> +
> >> +  switch (TREE_CODE (type))
> >> +    {
> >> +    case RECORD_TYPE:
> >> +      {
> >> +  unsigned HOST_WIDE_INT pos = 0;
> >> +  for (tree fld = TYPE_FIELDS (type); fld; fld = DECL_CHAIN (fld))
> >> +    if (TREE_CODE (fld) == FIELD_DECL)
> >> +      {
> >> +        tree ft = TREE_TYPE (fld);
> >> +
> >> +        if (DECL_BIT_FIELD (fld)
> >> +            || DECL_PADDING_P (fld)
> >> +            || !tree_fits_uhwi_p (TYPE_SIZE (ft)))
> >> +          return true;
> >> +
> >> +        tree t = bit_position (fld);
> >> +        if (!tree_fits_uhwi_p (t))
> >> +          return true;
> >> +        unsigned HOST_WIDE_INT bp = tree_to_uhwi (t);
> >> +        if (pos != bp)
> >> +          return true;
> >> +
> >> +        pos += tree_to_uhwi (TYPE_SIZE (ft));
> >> +        if (type_contains_padding_p (ft))
> >> +          return true;
> >> +      }
> >> +
> >> +  if (pos != tree_to_uhwi (TYPE_SIZE (type)))
> >> +    return true;
> >> +
> >> +  return false;
> >> +      }
> >> +
> >> +    case ARRAY_TYPE:
> >> +      return type_contains_padding_p (TREE_TYPE (type));
> >> +
> >> +    default:
> >> +      return true;
> >> +    }
> >> +  gcc_unreachable ();
> >> +}
> >
> > The possibly repeated walk over all fields and subfields of an aggregate
> > type looks expensive - some caching on the main variant might be
> > advisable to me.
> 
> It almost never happens but I guess you are right.  But see below.
> 
> >
> >> +/* Return true if REF contains a MEM_REF that performs type conversion from a
> >> +   type with padding or any VIEW_CONVERT_EXPR or a COMPONENT_REF with a
> >> +   bit-field field declaration.  */
> >> +
> >> +static bool
> >> +contains_vce_or_bfcref_p (const_tree ref)
> >> +{
> >> +  while (true)
> >> +    {
> >> +      if (TREE_CODE (ref) == MEM_REF)
> >> +  {
> >> +    tree op0 = TREE_OPERAND (ref, 0);
> >> +    if (TREE_CODE (op0) == ADDR_EXPR)
> >> +      {
> >> +        tree mem = TREE_OPERAND (op0, 0);
> >> +        if ((TYPE_MAIN_VARIANT (TREE_TYPE (ref))
> >> +             != TYPE_MAIN_VARIANT (TREE_TYPE (mem)))
> >> +            && type_contains_padding_p (TREE_TYPE (mem)))
> >> +          return true;
> >> +
> >> +        ref = mem;
> >> +        continue;
> >> +      }
> >> +    else
> >> +      return false;
> >> +  }
> >> +
> >> +      if (!handled_component_p (ref))
> >> +  return false;
> >> +
> >> +      if (TREE_CODE (ref) == VIEW_CONVERT_EXPR
> >> +    || (TREE_CODE (ref) == COMPONENT_REF
> >> +        && DECL_BIT_FIELD (TREE_OPERAND (ref, 1))))
> >> +  return true;
> >> +      ref = TREE_OPERAND (ref, 0);
> >> +    }
> >
> > It is best to retain the old code and simply append
> >
> >   if (TREE_CODE (ref) == MEM_REF)
> >
> > after the while (handled_component_p ()) loop.  A MEM_REF can only
> > appear in innermost position.
> 
> I hoped so but could not find any such check quickly in the tree-cfg.c
> verifier.
> 
> >
> > I'm not sure that we really want to retain SRAing structures
> > based on their fields when we see a VCEd MEM_REF accessing them.
> > This might for example do floating-point accesses on data that
> > isn't floating-point and on some architectures may raise exceptions
> > (IA64 comes to mind).  The memcpy folding code explicitly disallows
> > float modes and also BOOLEAN_TYPE and ENUMERAL_TYPE because they
> > might not match their precision:
> >
> >       /* Make sure we are not copying using a floating-point mode or
> >          a type whose size possibly does not match its precision.  */
> >       if (FLOAT_MODE_P (TYPE_MODE (desttype))
> >           || TREE_CODE (desttype) == BOOLEAN_TYPE
> >           || TREE_CODE (desttype) == ENUMERAL_TYPE)
> >         desttype = bitwise_type_for_mode (TYPE_MODE (desttype));
> >       if (FLOAT_MODE_P (TYPE_MODE (srctype))
> >           || TREE_CODE (srctype) == BOOLEAN_TYPE
> >           || TREE_CODE (srctype) == ENUMERAL_TYPE)
> >         srctype = bitwise_type_for_mode (TYPE_MODE (srctype));
> >
> > so - what's the reason you think you need to retain SRAing memcpied
> > structs?  Via total scalarization I mean - if we see the "real" accesses
> > besides the aggregate copy then we of course win.
> 
> Two reasons.  First, Jakub explicitly asked me to do it in
> https://gcc.gnu.org/ml/gcc-patches/2017-11/msg02139.html.  I will openly
> admit that at the time I was planning to silently ignore it and hope to
> get away with it because I shared your caution.  (Now I see I forgot
> about the second request which I did not want to ignore and will adjust
> the testcase accordingly.)
> 
> Second is the testcase I described in my previous email.  When I saw
> 
>   FAIL: gcc.dg/guality/pr54970.c   -O1  line 31 a[0] == 4
> 
> at all optimization levels, I grumbled about Jakub being right again and
> duly decided to bite the bullet and do what he asked me to because it
> fixes the issue.  But if you allow me to XFAIL the guality test, I will
> happily remove the whole padding detection, I don't really like it
> either.
> 
> The debug information is apparently lost because a[0] is never used from
> that point on, as opposed to a[1] and a[2] for which the guality test
> still passes.

XFAILing that is fine I think.

> >
> > Do we check anywhere when seeing an aggregate copy that lhs and rhs
> > have matching TYPE_MAIN_VARIANT?
> 
> No.  I can do some due diligence measurements but I think it would not
> hurt to add it.
> 
> > GIMPLE allows matching TYPE_CANONICAL
> > which can have mismatched fields (LTO, cross-language) or even any
> > type if TYPE_STRUCTURAL_EQUALITY.  That is, what happens if we mark
> > two distinct enough decls for total scalarization that are copied
> > to each other?  Do we end up re-materializing them if their
> > "sturcture" doesn't match when transforming the assignment?
> 
> Basically yes, although we try to do the rematerialization only on the
> RHS or LHS.  In more detail, if an assignment is not considered fragile
> by the condition
> 
>   if (modify_this_stmt
>       || gimple_has_volatile_ops (stmt)
>       || contains_vce_or_bfcref_p (rhs)
>       || contains_vce_or_bfcref_p (lhs)
>       || stmt_ends_bb_p (stmt))
> 
> the following happens:
> 
>   1) if each replacement of the LHS has a match on the RHS in terms of
>      offset and size, scalar assignment between them is performed,
>      possibly with a VIEW_CONVERT_EXPR if types do not match.  
> 
>   2) If for some LHS replacement that is not the case, then we flush all
>      RHS replacements either to the LHS (hoping to make the original RHS
>      redundant) if the RHS replacements contain all data there is in
>      RHS, or to RHS otherwise, and load the LHS replacement from the LHS
>      or RHS that we picked (really the same one, we do not rely on the
>      original assignment to carry the data).
> 
>   3) Same flushing happens when some part of LHS is not scalarized,
>      unless we know it is never read, then we don't care.
> 
>   Generally, if we can prove that we do not lose any data, the original
>   statement is deleted.
>   
> But let me emphasize again that whenever correctness is the issue, the
> question whether an SRA recorded access comes from total scalarization
> or not is not important.  Users accessing the data in some other part of
> the function can create them too.  Users equipped with placement new can
> create all sorts of weird type accesses and because tree-sra is flow
> insensitive, they can then force scalarization to such replacements even
> at places where the data have wildly different types.

Yes, but SRA will never create loads or stores for the non-total
scalarization case; it will only choose one scalar register for each
replacement (preferring non-FP if not all accesses are FP - I think
compare_access_positions guarantees that), right?

Basically it will replace _all_ accesses of a memory piece with
a register instead, making that memory piece unused?
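
In GIMPLE-like pseudocode (hypothetical replacement names), the effect I
mean is roughly:

```
  /* before SRA */        /* after SRA */
  s.f = x_1;              SR_f_5 = x_1;
  ...                     ...
  y_2 = s.f;              y_2 = SR_f_5;
  /* nothing refers to the memory backing s.f any more */
```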

So for an aggregate assignment chain like

 A.x = ...;
 B = A;
 C = B;
   = ... C.y;

that is, B is only aggregate-accessed but A and C are also accessed
otherwise.  Here we can choose to elide B based on the accesses of A and C.

Richard.
