https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114855

--- Comment #24 from Aldy Hernandez <aldyh at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #13)
> Most of the -O1 dom time is spent in threading using path ranger to simplify
> the JT conditions.  That in turn does (for each threading from scratch?)
> GORI computes with most of the time spent in range_from_dom in the
> gori_compute::may_recompute_p cycle (and there doing bitmap operations).
> That compute has a 'depth' --param but it looks like range_from_dom
> doesn't and we have a very deep dominator tree for this testcase.

The path solver does look outside of the path, but only for incoming ranges to
the path (range_on_entry).  It uses a ranger, but it is shared among all paths
and queries to it should be O(1) as they should all be cached in the ranger. 
This is in DOM:

  /* Recursively walk the dominator tree optimizing statements.  */
  gimple_ranger *ranger = enable_ranger (fun);
  path_range_query path_query (*ranger, /*resolve=*/true);

We try really hard in path_range_query to not query anything outside of the
path, not even with the shared ranger, unless m_resolve == true.  But even so,
things should be cached.

Curiously, I tried disabling the full ranger resolver in DOM threading with:

diff --git a/gcc/tree-ssa-dom.cc b/gcc/tree-ssa-dom.cc
index 221fe6db321..7279d359606 100644
--- a/gcc/tree-ssa-dom.cc
+++ b/gcc/tree-ssa-dom.cc
@@ -926,7 +926,7 @@ pass_dominator::execute (function *fun)

   /* Recursively walk the dominator tree optimizing statements.  */
   gimple_ranger *ranger = enable_ranger (fun);
-  path_range_query path_query (*ranger);
+  path_range_query path_query (*ranger, /*resolve=*/false);
   dom_jt_simplifier simplifier (avail_exprs_stack, ranger, &path_query);
   dom_jt_state state (const_and_copies, avail_exprs_stack);
   jump_threader threader (&simplifier, &state);

But this only reduced the usage from 164 secs to 85 secs.  Intuitively it seems
logical that if we're doing full resolving of things outside of the path, PHI
edges, and doing a lot more work, that we could be 2x as slow.  But we're still
slow.

 dominator optimization             :  85.59 ( 30%)   0.33 (  6%)  86.15 ( 30%)
  113M (  6%)
 backwards jump threading           :  29.58 ( 10%)   0.23 (  4%)  29.97 ( 10%)
   68M (  3%)
 TOTAL                              : 284.35          5.84        291.01       
 2041M

Perhaps Andrew can comment on the cache and on the range_from_dom() impact.  We
shouldn't be looking outside of the path, at least not in a way that impacts
compile time drastically, if that's the case.

Reply via email to