Now also to the list...

---------- Forwarded message ----------
Date: Wed, 6 Dec 2017 14:48:00 +0100 (CET)
From: Richard Biener <rguent...@suse.de>
To: Martin Jambor <mjam...@suse.cz>
Subject: Re: [PR 83141] Prevent SRA from removing type changing assignment
On Wed, 6 Dec 2017, Martin Jambor wrote:

> Hi,
>
> On Tue, Dec 05 2017, Richard Biener wrote:
> > On Tue, 5 Dec 2017, Martin Jambor wrote:
> >
> >> On Tue, Dec 05 2017, Martin Jambor wrote:
> >> > On Tue, Dec 05 2017, Martin Jambor wrote:
> >> >
> >> >> Hi,
> >> >>
> >> >> this is a followup to Richi's
> >> >> https://gcc.gnu.org/ml/gcc-patches/2017-11/msg02396.html to fix PR
> >> >> 83141.  The basic idea is simple: be just as conservative about type
> >> >> changing MEM_REFs as we are about actual VCEs.
> >> >>
> >> >> I have checked how that would affect compilation of SPEC 2006 and (non
> >> >> LTO) Mozilla Firefox and am happy to report that the difference was
> >> >> tiny.  However, I had to make the test less strict, otherwise testcase
> >> >> gcc.dg/guality/pr54970.c kept failing because it contains a folded
> >> >> memcpy and expects us to track values across it:
> >> >>
> >> >>   int a[] = { 1, 2, 3 };
> >> >>   /* ... */
> >> >>   __builtin_memcpy (&a, (int [3]) { 4, 5, 6 }, sizeof (a));
> >> >>   /* { dg-final { gdb-test 31 "a\[0\]" "4" } } */
> >> >>   /* { dg-final { gdb-test 31 "a\[1\]" "5" } } */
> >> >>   /* { dg-final { gdb-test 31 "a\[2\]" "6" } } */
> >> >>
> >> >> SRA is able to load the replacement of a[0] directly from the temporary
> >> >> array, which is apparently necessary to generate proper debug info.  I
> >> >> have therefore allowed the current transformation to go forward if the
> >> >> source does not contain any padding or if it is a read-only declaration.
> >> >
> >> > Ah, the read-only test is of course bogus; it was a last-minute addition
> >> > when I was apparently already too tired to think it through.  Please
> >> > disregard that line in the patch (it has passed bootstrap and testing
> >> > without it).
> >> >
> >> > Sorry for the noise,
> >> >
> >> > Martin
> >> >
> >>
> >> And for the record, below is the actual patch, after a fresh round of
> >> re-testing to double check I did not mess up anything else.
> >> As before,
> >> I'd like to ask for review, especially of the type_contains_padding_p
> >> predicate, and then would like to commit it to trunk.
> >
> > Comments below...
> >
> >> Thanks,
> >>
> >> Martin
> >>
> >>
> >> 2017-12-05  Martin Jambor  <mjam...@suse.cz>
> >>
> >>     PR tree-optimization/83141
> >>     * tree-sra.c (type_contains_padding_p): New function.
> >>     (contains_vce_or_bfcref_p): Move up in the file, also test for
> >>     MEM_REFs implicitly changing types with padding.  Remove inline
> >>     keyword.
> >>     (build_accesses_from_assign): Check contains_vce_or_bfcref_p
> >>     before setting bit in should_scalarize_away_bitmap.
> >>
> >>     testsuite/
> >>     * gcc.dg/tree-ssa/pr83141.c: New test.
> >> ---
> >>  gcc/testsuite/gcc.dg/tree-ssa/pr83141.c |  31 +++++++++
> >>  gcc/tree-sra.c                          | 115 ++++++++++++++++++++++++++------
> >>  2 files changed, 127 insertions(+), 19 deletions(-)
> >>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr83141.c
> >>
> >> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr83141.c b/gcc/testsuite/gcc.dg/tree-ssa/pr83141.c
> >> new file mode 100644
> >> index 00000000000..86caf66025b
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr83141.c
> >> @@ -0,0 +1,31 @@
> >> +/* { dg-do run } */
> >> +/* { dg-options "-O -fdump-tree-esra-details" } */
> >> +
> >> +volatile short vs;
> >> +volatile long vl;
> >> +
> >> +struct A { short s; long i; long j; };
> >> +struct A a, b;
> >> +void foo ()
> >> +{
> >> +  struct A c;
> >> +  __builtin_memcpy (&c, &b, sizeof (struct A));
> >> +  __builtin_memcpy (&a, &c, sizeof (struct A));
> >> +
> >> +  vs = c.s;
> >> +  vl = c.i;
> >> +  vl = c.j;
> >> +}
> >> +int main()
> >> +{
> >> +  __builtin_memset (&b, 0, sizeof (struct A));
> >> +  b.s = 1;
> >> +  __builtin_memcpy ((char *)&b+2, &b, 2);
> >> +  foo ();
> >> +  __builtin_memcpy (&a, (char *)&a+2, 2);
> >> +  if (a.s != 1)
> >> +    __builtin_abort ();
> >> +  return 0;
> >> +}
> >> +
> >> +/* { dg-final { scan-tree-dump-not "Will attempt to totally scalarize" "esra" } } */
> >> diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
> >> index 866cff0edb0..df9f10f83b6 100644
> >> --- a/gcc/tree-sra.c
> >> +++ b/gcc/tree-sra.c
> >> @@ -1141,6 +1141,100 @@ contains_view_convert_expr_p (const_tree ref)
> >>    return false;
> >>  }
> >>
> >> +/* Return false if we can determine that TYPE has no padding, otherwise
> >> +   return true.  */
> >> +
> >> +static bool
> >> +type_contains_padding_p (tree type)
> >> +{
> >> +  if (is_gimple_reg_type (type))
> >> +    return false;
> >> +
> >> +  if (!tree_fits_uhwi_p (TYPE_SIZE (type)))
> >> +    return true;
> >> +
> >> +  switch (TREE_CODE (type))
> >> +    {
> >> +    case RECORD_TYPE:
> >> +      {
> >> +        unsigned HOST_WIDE_INT pos = 0;
> >> +        for (tree fld = TYPE_FIELDS (type); fld; fld = DECL_CHAIN (fld))
> >> +          if (TREE_CODE (fld) == FIELD_DECL)
> >> +            {
> >> +              tree ft = TREE_TYPE (fld);
> >> +
> >> +              if (DECL_BIT_FIELD (fld)
> >> +                  || DECL_PADDING_P (fld)
> >> +                  || !tree_fits_uhwi_p (TYPE_SIZE (ft)))
> >> +                return true;
> >> +
> >> +              tree t = bit_position (fld);
> >> +              if (!tree_fits_uhwi_p (t))
> >> +                return true;
> >> +              unsigned HOST_WIDE_INT bp = tree_to_uhwi (t);
> >> +              if (pos != bp)
> >> +                return true;
> >> +
> >> +              pos += tree_to_uhwi (TYPE_SIZE (ft));
> >> +              if (type_contains_padding_p (ft))
> >> +                return true;
> >> +            }
> >> +
> >> +        if (pos != tree_to_uhwi (TYPE_SIZE (type)))
> >> +          return true;
> >> +
> >> +        return false;
> >> +      }
> >> +
> >> +    case ARRAY_TYPE:
> >> +      return type_contains_padding_p (TYPE_SIZE (type));
> >> +
> >> +    default:
> >> +      return true;
> >> +    }
> >> +  gcc_unreachable ();
> >> +}
> >
> > The possibly repeated walk over all fields and subfields of an aggregate
> > type looks expensive - some caching on the main variant might be
> > advisable to me.
>
> It almost never happens but I guess you are right.  But see below.
> >
> >> +/* Return true if REF contains a MEM_REF that performs type conversion from
> >> +   a type with padding, or any VIEW_CONVERT_EXPR, or a COMPONENT_REF with a
> >> +   bit-field field declaration.  */
> >> +
> >> +static bool
> >> +contains_vce_or_bfcref_p (const_tree ref)
> >> +{
> >> +  while (true)
> >> +    {
> >> +      if (TREE_CODE (ref) == MEM_REF)
> >> +        {
> >> +          tree op0 = TREE_OPERAND (ref, 0);
> >> +          if (TREE_CODE (op0) == ADDR_EXPR)
> >> +            {
> >> +              tree mem = TREE_OPERAND (op0, 0);
> >> +              if ((TYPE_MAIN_VARIANT (TREE_TYPE (ref))
> >> +                   != TYPE_MAIN_VARIANT (TREE_TYPE (mem)))
> >> +                  && type_contains_padding_p (TREE_TYPE (mem)))
> >> +                return true;
> >> +
> >> +              ref = mem;
> >> +              continue;
> >> +            }
> >> +          else
> >> +            return false;
> >> +        }
> >> +
> >> +      if (!handled_component_p (ref))
> >> +        return false;
> >> +
> >> +      if (TREE_CODE (ref) == VIEW_CONVERT_EXPR
> >> +          || (TREE_CODE (ref) == COMPONENT_REF
> >> +              && DECL_BIT_FIELD (TREE_OPERAND (ref, 1))))
> >> +        return true;
> >> +      ref = TREE_OPERAND (ref, 0);
> >> +    }
> >
> > It is best to retain the old code and simply append
> >
> >   if (TREE_CODE (ref) == MEM_REF)
> >
> > after the while (handled_component_p ()) loop.  A MEM_REF can only
> > appear in innermost position.
>
> I hoped so but could not find any such check quickly in the tree-cfg.c
> verifier.
>
> > I'm not sure that we really want to retain SRAing structures
> > based on their fields when we see a VCEd MEM_REF accessing them.
> > This might for example do floating-point accesses on data that
> > isn't floating-point and on some architectures may raise exceptions
> > (IA64 comes to my mind).  The memcpy folding code explicitly disallows
> > float modes and also BOOLEAN_TYPE and ENUMERAL_TYPE because they
> > might not match their precision:
> >
> >       /* Make sure we are not copying using a floating-point mode or
> >          a type whose size possibly does not match its precision.  */
> >       if (FLOAT_MODE_P (TYPE_MODE (desttype))
> >           || TREE_CODE (desttype) == BOOLEAN_TYPE
> >           || TREE_CODE (desttype) == ENUMERAL_TYPE)
> >         desttype = bitwise_type_for_mode (TYPE_MODE (desttype));
> >       if (FLOAT_MODE_P (TYPE_MODE (srctype))
> >           || TREE_CODE (srctype) == BOOLEAN_TYPE
> >           || TREE_CODE (srctype) == ENUMERAL_TYPE)
> >         srctype = bitwise_type_for_mode (TYPE_MODE (srctype));
> >
> > so - what's the reason you think you need to retain SRAing memcpied
> > structs?  Via total scalarization I mean - if we see the "real" accesses
> > besides the aggregate copy then we of course win.
>
> Two reasons.  First, Jakub explicitly asked me to do it in
> https://gcc.gnu.org/ml/gcc-patches/2017-11/msg02139.html.  I will openly
> admit that at the time I was planning to silently ignore it and hope to
> get away with it because I shared your caution.  (Now I see I forgot
> about the second request, which I did not want to ignore, and will adjust
> the testcase accordingly.)
>
> Second is the testcase I described in my previous email.  When I saw
>
>   FAIL: gcc.dg/guality/pr54970.c   -O1  line 31 a[0] == 4
>
> at all optimization levels, I grumbled about Jakub being right again and
> duly decided to bite the bullet and do what he asked me to, because it
> fixes the issue.  But if you allow me to XFAIL the guality test, I will
> happily remove the whole padding detection; I don't really like it
> either.
>
> The debug information is apparently lost because a[0] is never used from
> that point on, as opposed to a[1] and a[2], for which the guality test
> still passes.

XFAILing that is fine I think.

> > Do we check anywhere when seeing an aggregate copy that lhs and rhs
> > have matching TYPE_MAIN_VARIANT?
>
> No.  I can do some due diligence measurements but I think it would not
> hurt to add it.
>
> > GIMPLE allows matching TYPE_CANONICAL
> > which can have mismatched fields (LTO, cross-language) or even any
> > type if TYPE_STRUCTURAL_EQUALITY.  That is, what happens if we mark
> > two distinct enough decls for total scalarization that are copied
> > to each other?  Do we end up re-materializing them if their
> > "structure" doesn't match when transforming the assignment?
>
> Basically yes, although we try to do the rematerialization only on the
> RHS or LHS.  In more detail, if an assignment is not considered fragile
> by the condition
>
>   if (modify_this_stmt
>       || gimple_has_volatile_ops (stmt)
>       || contains_vce_or_bfcref_p (rhs)
>       || contains_vce_or_bfcref_p (lhs)
>       || stmt_ends_bb_p (stmt))
>
> the following happens:
>
> 1) If each replacement of the LHS has a match on the RHS in terms of
>    offset and size, a scalar assignment between them is performed,
>    possibly with a VIEW_CONVERT_EXPR if types do not match.
>
> 2) If for some LHS replacement that is not the case, then we flush all
>    RHS replacements either to the LHS (hoping to make the original RHS
>    redundant) if the RHS replacements contain all the data there is in
>    the RHS, or to the RHS otherwise, and load the LHS replacement from
>    the LHS or RHS that we picked (really the same one, we do not rely
>    on the original assignment to carry the data).
>
> 3) The same flushing happens when some part of the LHS is not
>    scalarized, unless we know it is never read, in which case we don't
>    care.
>
> Generally, if we can prove that we do not lose any data, the original
> statement is deleted.
>
> But let me emphasize again that whenever correctness is the issue, the
> question whether an SRA-recorded access comes from total scalarization
> or not is not important.  Users accessing the data in some other part of
> the function can create them too.  Users equipped with placement new can
> create all sorts of weird type accesses, and because tree-sra is flow
> insensitive, they can then force scalarization to such replacements even
> at places where the data have wildly different types.
Yes, but SRA will never create loads or stores for the non-total
scalarization case; it will only choose one scalar register for each
replacement (preferably non-FP if not all accesses are FP - I think
compare_access_positions guarantees that), right?  Basically it will
replace _all_ accesses of a memory piece with a register instead,
making that memory piece unused?

So for an aggregate assignment chain like

  A.x = ...;
  B = A;
  C = B;
  ... = C.y;

that is, B is only aggregate-accessed but A and C are also otherwise
accessed.  Here we can choose to elide B based on the accesses of A
and C.

Richard.