https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114855

--- Comment #25 from Aldy Hernandez <aldyh at gcc dot gnu.org> ---
(In reply to Aldy Hernandez from comment #22)
> Created attachment 59001 [details]
> reduce recursion in forward threader (patch in testing)

Avoiding unnecessary recursion in simplify_control_stmt_condition_1() reduces
the time in DOM by 23%:

Before:
 dominator optimization             : 213.87 ( 51%)   0.41 (  7%) 215.03 ( 51%)
  113M (  6%)
 backwards jump threading           :  29.34 (  7%)   0.19 (  3%)  29.73 (  7%)
   68M (  3%)
 TOTAL                              : 416.61          5.97        423.92       
 2041M

After:
 dominator optimization             : 164.61 ( 44%)   0.40 (  7%) 165.65 ( 44%)
  113M (  6%)
 backwards jump threading           :  28.18 (  8%)   0.20 (  3%)  28.50 (  8%)
   68M (  3%)
 TOTAL                              : 370.40          5.88        377.37       
 2041M

As mentioned, in our benchmark suite I see a difference of 3 threads at -O1,
and nothing at -O2.  I still haven't been able to pin point the exact
difference in threads.  I will do so when I return next week (I'm having
logistical issues with daycare this week).  I wouldn't be surprised if it was
nothing, as the forward threader frequently tries to thread unreachable paths
which has made comparing the forward / backward threaders difficult.

If we look at dom_jt_simplifier::simplify(), we'll see that we first try with
DOM's internal tables, and then with ranger.  Avoiding the recursion in
simplify_control_stmt_condition_1() (the caller) with the proposed patch,
causes us to attempt to simplify far less paths, but it will cause less
attempts with DOM's internal tables.  In theory this is OK, because the
statements in question are either integer or pointers (no relations), so ranger
should be able to do this better than DOM:

  /* If the condition has the form (A & B) CMP 0 or (A | B) CMP 0 then
     recurse into the LHS to see if there is a simplification that
     makes this condition always true or always false along the edge
     E.  */
  if ((cond_code == EQ_EXPR || cond_code == NE_EXPR)
      && TREE_CODE (op0) == SSA_NAME
      && integer_zerop (op1))
    {
    ...
    }

In other words, the patch also reduces the recursion into DOM's tables, but I
think this is OK, because DOM's lookup_avail_expr() shouldn't be getting
anything ranger can't already get with pointers and integers.  I still want to
check those 3 threads (not in this testcase, but in our internal benchmarks) to
make sure.

Reply via email to