On Wed, 18 Sep 2019 at 22:17, Prathamesh Kulkarni
<prathamesh.kulka...@linaro.org> wrote:
>
> On Wed, 18 Sep 2019 at 01:46, Richard Biener <richard.guent...@gmail.com> 
> wrote:
> >
> > On Tue, Sep 17, 2019 at 7:18 PM Wilco Dijkstra <wilco.dijks...@arm.com> 
> > wrote:
> > >
> > > Hi Richard,
> > >
> > > > The issue with the bugzilla is that it lacked appropriate testcase(s) 
> > > > and thus
> > > > it is now a mess.  There are clear testcases (maybe not in the 
> > > > benchmarks you
> > >
> > > Agreed - it's not clear whether any of the proposed changes would actually
> > > help the original issue. My patch absolutely does.
> > >
> > > > care about) that benefit from code hoisting as enabler, mainly when 
> > > > control
> > > > flow can be then converted to data flow.  Also note that "size 
> > > > optimizations"
> > > > are important for all cases where followup transforms have size limits 
> > > > on the IL
> > > > in place.
> > >
> > > The gain from -fcode-hoisting is about 0.2% overall on Thumb-2. Ie. it's 
> > > definitely
> > > useful, but there are much larger gains to be had from other tweaks [1]. 
> > > So we can
> > > live without it until a better solution is found.
> >
> > A "solution" for better eembc benchmark results?
> >
> > The issues are all latent even w/o code-hoisting since you can do the
> > same transform at the source level.  Which is usually why I argue
> > trying to fix this in code-hoisting is not a complete fix.  Nor is turning
> > off random GIMPLE passes for specific benchmark regressions.
> >
> > Anyway, it's arm maintainers call if you want to have such hacks in
> > place or not.
> >
> > As a release manager I say that GCC isn't a benchmark compiler.
> >
> > As the one "responsible" for the code-hoisting introduction I say that
> > as long as I don't have access to the actual benchmark I can't assess
> > wrongdoing of the pass nor suggest an appropriate place for optimization.
> Hi Richard,
> The actual benchmark function for PR80155 is almost identical to FMS()
> function defined in
> pr77445-2.c, and the test-case reproduces the same issue as in the benchmark.
Hi,
The attached patch is another workaround for hoisting. The rationale
behind the patch is
to avoid "long range" hoistings  for a "large enough" CFG.

The patch walks dom tree for the block and finds the "furthest" dom
block, for which
intersect_p (availout_in_some, AVAIL_OUT (dom_block)) is true. The
"distance" is measured
in terms of dom depth.

We can have two params say param_hoist_n_bbs to determine "large enough"
CFG, and param_hoist_expr_dist, that avoids hoisting if the "distance"
exceeds the param threshold.
For the values (hardcoded) in the patch, it "works" for avoiding the
spill and does not regress ssa-pre-*.c and ssa-hoist-*.c
tests (the param values could be made target-specific). Does the
approach look reasonable ?
My concern with current version is that walking dom tree per block in
do_hoist_insertion can end up being
pretty expensive. Any suggestions on how we could speed that up ?

Alternatively, we could consider hoisting from "bottom up", one block
at a time, and keep a map to count number of
times an expr is hoisted and refuse hoisting the expr further up if it
exceeds target-specific param threshold ?

Thanks,
Prathamesh
>
> Thanks,
> Prathamesh
> >
> > Richard.
> >
> > >
> > > [1] https://gcc.gnu.org/ml/gcc-patches/2019-07/msg01739.html
> > >
> > > Wilco
diff --git a/gcc/graph.h b/gcc/graph.h
index 5ec4f1c107f..c05cc6df25b 100644
--- a/gcc/graph.h
+++ b/gcc/graph.h
@@ -24,5 +24,6 @@ extern void print_graph_cfg (const char *, struct function *);
 extern void clean_graph_dump_file (const char *);
 extern void finish_graph_dump_file (const char *);
 extern void debug_dot_cfg (struct function *);
+extern void save_dot_cfg (struct function *);
 
 #endif /* ! GCC_GRAPH_H */
diff --git a/gcc/tree-ssa-pre.c b/gcc/tree-ssa-pre.c
index c618601a184..d13d1fdfd8a 100644
--- a/gcc/tree-ssa-pre.c
+++ b/gcc/tree-ssa-pre.c
@@ -3463,6 +3463,35 @@ do_pre_partial_partial_insertion (basic_block block, basic_block dom)
   return new_stuff;
 }
 
+/* Return longest "distance" of block, for which
+   intersect_p (availout_in_some, AVAIL_OUT (block)) is true.
+   Distance is measured in terms of dom depth.  */
+
+static int
+get_longest_dist (bitmap_head availout_in_some, basic_block block)
+{
+  basic_block son;
+  int dist;
+  int max_dist = 0;
+
+  for (son = first_dom_son (CDI_DOMINATORS, block);
+       son;
+       son = next_dom_son (CDI_DOMINATORS, son))
+    {
+      dist = get_longest_dist (availout_in_some, son);
+      if (dist > max_dist)
+	max_dist = dist;
+    }
+
+  if (max_dist != -1)
+    return max_dist + 1;
+
+  if (bitmap_intersect_p (&availout_in_some, &AVAIL_OUT (block)->values))
+    return 0;
+
+  return -1;
+}
+
 /* Insert expressions in BLOCK to compute hoistable values up.
    Return TRUE if something was inserted, otherwise return FALSE.
    The caller has to make sure that BLOCK has at least two successors.  */
@@ -3509,6 +3538,7 @@ do_hoist_insertion (basic_block block)
   if (bitmap_empty_p (&hoistable_set.values))
     return false;
 
+
   /* Compute which of the hoistable values is in AVAIL_OUT of
      at least one of the successors of BLOCK.  */
   bitmap_head availout_in_some;
@@ -3527,6 +3557,20 @@ do_hoist_insertion (basic_block block)
   if (bitmap_empty_p (&availout_in_some))
     return false;
 
+  int param_hoist_n_bbs = 50;
+  int param_expr_hoist_dist = 2;
+
+  /* ??? For "large" CFG, avoid "long range" hoisting of values
+     from AVAIL_OUT (dom blocks) into candidate block. The distance
+     is measured in terms of dominator depth.  */
+
+  if (n_basic_blocks_for_fn (cfun) > param_hoist_n_bbs)
+    {
+      int dist = get_longest_dist (availout_in_some, block);
+      if (dist > param_expr_hoist_dist)
+	return false;
+    }
+
   /* Hack hoitable_set in-place so we can use sorted_array_from_bitmap_set.  */
   bitmap_move (&hoistable_set.values, &availout_in_some);
   hoistable_set.expressions = ANTIC_IN (block)->expressions;

Reply via email to