The following tries to address the issue that PRE is quite happy
to introduce new IVs in loops just because it can compute some
constant value on the loop entry edge.  In principle there's
already code that should work against that, but it simply searches
for an optimize_edge_for_speed_p () edge.  That still considers
the loop entry edge worth optimizing because at -O2 it boils down
to maybe_hot_edge_p (e), which compares the edge count against the
entry block count.  For PRE we want something more local
(comparing against the destination block count).
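
To illustrate the kind of transform this is about, here is a reduced
sketch (hand-written C with hypothetical names; the real thing happens
on GIMPLE and mirrors the testcase below):

    static const int a[8] = { 0, 1, 2, 3, 4, 5, 6, 7 };
    static int b[8];

    /* Original copy loop.  */
    void copy (void)
    {
      for (unsigned i = 0; i != 8; ++i)
        b[i] = a[i];
    }

    /* Roughly what PRE makes of it: a[0] folds to a constant on the
       loop entry edge, so the load becomes partially redundant.  PRE
       then inserts the next load on the back edge and carries the
       value in a PHI, which is effectively a new IV.  */
    void copy_after_pre (void)
    {
      int t = a[0];          /* available (as a constant) on entry */
      for (unsigned i = 0; i != 8; ++i)
        {
          b[i] = t;
          if (i + 1 != 8)    /* the inserted load sits on the back edge */
            t = a[i + 1];
        }
    }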

Now for the simple testcases this shouldn't make a difference,
but the hot/cold check uses PARAM_VALUE (HOT_BB_FREQUENCY_FRACTION),
which isn't the same as profile_probability's likely or very_likely...

Still, one key point of the patch is that we compare the sum of the
edge counts where the value is available (and thus where the redundancy
elimination happens) against the count of the block we'd have to insert
into, rather than looking for a single optimize_edge_for_speed_p edge.

For that I've used

    if (avail_count < block->count.apply_probability
                                    (profile_probability::unlikely ()))

so we block insertion if the redundancies would overall be "unlikely".
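
In plain terms: sum the counts of the incoming edges on which the value
is already available and refuse to insert when that sum is only an
"unlikely" fraction of the destination block's count.  A minimal
standalone model of that decision, using plain integers instead of
profile_count and a placeholder 1/5 cut-off standing in for whatever
profile_probability::unlikely () works out to:

    #include <stdbool.h>
    #include <stdint.h>

    /* Model of the insertion profitability test (not GCC internals).
       AVAIL_COUNT is the summed count of the edges where the value is
       available, BLOCK_COUNT the count of the block we would have to
       insert into.  */
    static bool
    worth_inserting (uint64_t avail_count, uint64_t block_count)
    {
      /* Stand-in for block->count.apply_probability
         (profile_probability::unlikely ()).  */
      uint64_t threshold = block_count / 5;
      return avail_count >= threshold;
    }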

I'm also not sure why maybe_hot_count_p uses HOT_BB_FREQUENCY_FRACTION
when there is HOT_BB_COUNT_FRACTION (with a ten-fold larger default
value), which seems a better match for scaling a profile_count?

Honza?

Bootstrap & regtest running on x86_64-unknown-linux-gnu.

Does the above predicate look sane, or am I on the wrong track with
using the destination block count here?  (I realize even the "locally
cold" entries into the block might be quite hot globally.)

For a 1:1 translation of the existing code to something using the
original predicate but summing over edges I could use
!maybe_hot_count_p (cfun, avail_count)?  But then we're back to
PRE doing the unwanted insertions.  Changing maybe_hot_count_p
to use HOT_BB_COUNT_FRACTION doesn't make any difference there
(obviously).
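
For reference, that 1:1 variant would just keep the summing from the
patch but reuse the existing global predicate (not what the posted
patch does):

    if (!maybe_hot_count_p (cfun, avail_count))
      {
        /* ... skip the insertion, as before ...  */
      }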

Thanks,
Richard.

2019-10-06  Richard Biener  <rguent...@suse.de>

        PR tree-optimization/91975
        * tree-ssa-pre.c (do_pre_regular_insertion): Adjust
        profitability check to use the sum of all edge counts the
        value is available on and check against unlikely execution
        of the block.

        * gcc.dg/tree-ssa/ldist-39.c: New testcase.

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ldist-39.c b/gcc/testsuite/gcc.dg/tree-ssa/ldist-39.c
new file mode 100644
index 00000000000..a63548979ea
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ldist-39.c
@@ -0,0 +1,43 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-ldist-details" } */
+
+#define T int
+
+const T a[] = { 0, 1, 2, 3, 4, 5, 6, 7 };
+T b[sizeof a / sizeof *a];
+
+void f0 (void)
+{
+  const T *s = a;
+  T *d = b;
+  for (unsigned i = 0; i != sizeof a / sizeof *a; ++i)
+    d[i] = s[i];
+}
+
+void g0 (void)
+{
+  const T *s = a;
+  T *d = b;
+  for (unsigned i = 0; i != sizeof a / sizeof *a; ++i)
+    *d++ = *s++;
+}
+
+extern const T c[sizeof a / sizeof *a];
+
+void f1 (void)
+{
+  const T *s = c;
+  T *d = b;
+  for (unsigned i = 0; i != sizeof a / sizeof *a; ++i)
+    d[i] = s[i];
+}
+
+void g1 (void)
+{
+  const T *s = c;
+  T *d = b;
+  for (unsigned i = 0; i != sizeof a / sizeof *a; ++i)
+    *d++ = *s++;
+}
+
+/* { dg-final { scan-tree-dump-times "generated memcpy" 4 "ldist" } } */
diff --git a/gcc/tree-ssa-pre.c b/gcc/tree-ssa-pre.c
index c618601a184..af49ba388c1 100644
--- a/gcc/tree-ssa-pre.c
+++ b/gcc/tree-ssa-pre.c
@@ -3195,7 +3195,7 @@ do_pre_regular_insertion (basic_block block, basic_block dom)
          pre_expr eprime = NULL;
          edge_iterator ei;
          pre_expr edoubleprime = NULL;
-         bool do_insertion = false;
+         profile_count avail_count = profile_count::zero ();
 
          val = get_expr_value_id (expr);
          if (bitmap_set_contains_value (PHI_GEN (block), val))
@@ -3250,10 +3250,7 @@ do_pre_regular_insertion (basic_block block, basic_block dom)
                {
                  avail[pred->dest_idx] = edoubleprime;
                  by_some = true;
-                 /* We want to perform insertions to remove a redundancy on
-                    a path in the CFG we want to optimize for speed.  */
-                 if (optimize_edge_for_speed_p (pred))
-                   do_insertion = true;
+                 avail_count += pred->count ();
                  if (first_s == NULL)
                    first_s = edoubleprime;
                  else if (!pre_expr_d::equal (first_s, edoubleprime))
@@ -3266,7 +3263,11 @@ do_pre_regular_insertion (basic_block block, basic_block dom)
             partially redundant.  */
          if (!cant_insert && !all_same && by_some)
            {
-             if (!do_insertion)
+             /* We want to perform insertions to remove a redundancy on
+                a path in the CFG that is somewhat likely.  Avoid inserting
+                when we'd only remove a redundancy on unlikely paths.  */
+             if (avail_count < block->count.apply_probability
+                                           (profile_probability::unlikely ()))
                {
                  if (dump_file && (dump_flags & TDF_DETAILS))
                    {