The following tries to address the issue that PRE is quite happy to introduce new IVs in loops just because it can compute some constant value on the loop entry edge. In principle there's already code that should work against that but it simply searches for a optimize_edge_for_speed () edge. That still considers the loop entry edge to be worth optimizing because it ends up as maybe_hot_edge_p (e) for -O2 which compares the edge count against the entry block count. For PRE we want something more local (comparing to the destination block count).
Now for the simple testcases this shouldn't make a difference but hot/cold uses PARAM_VALUE (HOT_BB_FREQUENCY_FRACTION) which isn't the same as profile_probabilities likely or very likely... Still one key of the patch is that we compare the sum of the edge counts where the value is available (and thus the redundancy elimination happens) with the count we have to insert rather than looking for a single optimize_edge_for_speed_p edge. For that I've used if (avail_count < block->count.apply_probability (profile_probability::unlikely ())) so we block insertion if the redundancies would overall be "unlikely". I'm also not sure why maybe_hot_count_p uses HOT_BB_FREQUENCY_FRACTION while there exists HOT_BB_COUNT_FRACTION (with a ten-fold larger default value) that seems to match better for scaling a profile-count? Honza? Bootstrap & regtest running on x86-64-unknown-linux-gnu. Does the above predicate look sane or am I on a wrong track with using the destination block count here (I realize even the "locally cold" entries into the block might be quite hot globally). For a 1:1 translation of the existing code to sth using the original predicate but summing over edges I could use !maybe_hot_count_p (cfun, avail_count)? But then we're back to PRE doing the unwanted insertions. Changing maybe_hot_count_p to use HOT_BB_COUNT_FRACTION doesn't make any difference there (obviously). Thanks, Richard. 2019-10-06 Richard Biener <rguent...@suse.de> PR tree-optimization/91975 * tree-ssa-pre.c (do_pre_regular_insertion): Adjust profitability check to use the sum of all edge counts the value is available on and check against unlikely execution of the block. * gcc.dg/tree-ssa/ldist-39.c: New testcase. diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ldist-39.c b/gcc/testsuite/gcc.dg/tree-ssa/ldist-39.c new file mode 100644 index 00000000000..a63548979ea --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/ldist-39.c @@ -0,0 +1,43 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-ldist-details" } */ + +#define T int + +const T a[] = { 0, 1, 2, 3, 4, 5, 6, 7 }; +T b[sizeof a / sizeof *a]; + +void f0 (void) +{ + const T *s = a; + T *d = b; + for (unsigned i = 0; i != sizeof a / sizeof *a; ++i) + d[i] = s[i]; +} + +void g0 (void) +{ + const T *s = a; + T *d = b; + for (unsigned i = 0; i != sizeof a / sizeof *a; ++i) + *d++ = *s++; +} + +extern const T c[sizeof a / sizeof *a]; + +void f1 (void) +{ + const T *s = c; + T *d = b; + for (unsigned i = 0; i != sizeof a / sizeof *a; ++i) + d[i] = s[i]; +} + +void g1 (void) +{ + const T *s = c; + T *d = b; + for (unsigned i = 0; i != sizeof a / sizeof *a; ++i) + *d++ = *s++; +} + +/* { dg-final { scan-tree-dump-times "generated memcpy" 4 "ldist" } } */ diff --git a/gcc/tree-ssa-pre.c b/gcc/tree-ssa-pre.c index c618601a184..af49ba388c1 100644 --- a/gcc/tree-ssa-pre.c +++ b/gcc/tree-ssa-pre.c @@ -3195,7 +3195,7 @@ do_pre_regular_insertion (basic_block block, basic_block dom) pre_expr eprime = NULL; edge_iterator ei; pre_expr edoubleprime = NULL; - bool do_insertion = false; + profile_count avail_count = profile_count::zero (); val = get_expr_value_id (expr); if (bitmap_set_contains_value (PHI_GEN (block), val)) @@ -3250,10 +3250,7 @@ do_pre_regular_insertion (basic_block block, basic_block dom) { avail[pred->dest_idx] = edoubleprime; by_some = true; - /* We want to perform insertions to remove a redundancy on - a path in the CFG we want to optimize for speed. */ - if (optimize_edge_for_speed_p (pred)) - do_insertion = true; + avail_count += pred->count (); if (first_s == NULL) first_s = edoubleprime; else if (!pre_expr_d::equal (first_s, edoubleprime)) @@ -3266,7 +3263,11 @@ do_pre_regular_insertion (basic_block block, basic_block dom) partially redundant. */ if (!cant_insert && !all_same && by_some) { - if (!do_insertion) + /* We want to perform insertions to remove a redundancy on + a path in the CFG that is somewhat likely. Avoid inserting + when we'd only remove a redundancy on unlikely paths. */ + if (avail_count < block->count.apply_probability + (profile_probability::unlikely ())) { if (dump_file && (dump_flags & TDF_DETAILS)) {