https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83651

--- Comment #10 from rguenther at suse dot de <rguenther at suse dot de> ---
On Thu, 18 Jan 2018, aldyh at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83651
> 
> Aldy Hernandez <aldyh at gcc dot gnu.org> changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                  CC|                            |jakub at gcc dot gnu.org,
>                    |                            |rguenth at gcc dot gnu.org
> 
> --- Comment #9 from Aldy Hernandez <aldyh at gcc dot gnu.org> ---
> Original regression in 7.x started with the -fcode-hoisting pass in r238242. 
> Things started improving with r254948, though that is probably unrelated.
> 
> Perhaps Richard can comment.

code-hoisting does its job - it reduces the number of stmts in the
program.  Together with PRE, code-hoisting enables more PRE and
thus causes extra PHIs (not sure if those are the problem).  But
if you look at code-hoisting in isolation (-fcode-hoisting -fno-tree-pre)
then it should always be profitable - it's probably the extra PRE
that does the harm here.  Numbers on my machine:

> ./xgcc -B. t.c -O2
> /usr//bin/time ./a.out 
4.13user 0.00system 0:04.13elapsed 99%CPU (0avgtext+0avgdata 1040maxresident)k
0inputs+0outputs (0major+61minor)pagefaults 0swaps
> /usr//bin/time ./a.out 
4.06user 0.00system 0:04.06elapsed 100%CPU (0avgtext+0avgdata 1032maxresident)k
0inputs+0outputs (0major+60minor)pagefaults 0swaps
> ./xgcc -B. t.c -O2 -fno-tree-pre -fcode-hoisting
> /usr//bin/time ./a.out 
3.87user 0.00system 0:03.87elapsed 99%CPU (0avgtext+0avgdata 1052maxresident)k
0inputs+0outputs (0major+61minor)pagefaults 0swaps
> /usr//bin/time ./a.out 
3.90user 0.00system 0:03.90elapsed 99%CPU (0avgtext+0avgdata 1060maxresident)k
0inputs+0outputs (0major+62minor)pagefaults 0swaps
> ./xgcc -B. t.c -O2 -ftree-pre -fno-code-hoisting
> /usr//bin/time ./a.out 
3.85user 0.00system 0:03.85elapsed 100%CPU (0avgtext+0avgdata 1032maxresident)k
0inputs+0outputs (0major+60minor)pagefaults 0swaps
> /usr//bin/time ./a.out 
3.85user 0.01system 0:03.87elapsed 99%CPU (0avgtext+0avgdata 1060maxresident)k
0inputs+0outputs (0major+62minor)pagefaults 0swaps
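
To make the transformation concrete, here is a minimal source-level
sketch of what code hoisting does.  It is an illustration only, not
taken from t.c, and the function names are made up.  The multiply
appears once instead of twice, but its result is now live across the
branch:

```c
#include <assert.h>

/* Before: a * b is written on both arms of the branch. */
unsigned before_hoisting (unsigned a, unsigned b, int flag)
{
  unsigned r;
  if (flag)
    r = a * b + 1;
  else
    r = a * b - 1;
  return r;
}

/* After: a * b is evaluated once, above the branch.  One statement
   fewer in the program, but t is live across the branch.  */
unsigned after_hoisting (unsigned a, unsigned b, int flag)
{
  unsigned t = a * b;
  return flag ? t + 1 : t - 1;
}

/* Hypothetical helper: check that both forms agree on a few inputs. */
void check_hoisting (void)
{
  for (unsigned a = 0; a < 8; a++)
    for (unsigned b = 0; b < 8; b++)
      for (int flag = 0; flag < 2; flag++)
        assert (before_hoisting (a, b, flag) == after_hoisting (a, b, flag));
}
```

Calling check_hoisting () from a main function exercises both variants.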

note that both PRE and code-hoisting are sources of increased
register pressure.

> ./xgcc -B. t.c -O2 -ftree-pre -fcode-hoisting -S
> grep rsp t.s | wc -l
47
> ./xgcc -B. t.c -O2 -ftree-pre -fno-code-hoisting -S
> grep rsp t.s | wc -l
11
> ./xgcc -B. t.c -O2 -fno-tree-pre -fcode-hoisting -S
> grep rsp t.s | wc -l
11
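
For comparison, a minimal sketch of PRE on a partially redundant
expression, again with made-up names and not taken from t.c.  PRE
inserts the computation on the path where it was missing, so the
later use becomes fully redundant; the merged temporary corresponds
to a PHI in SSA form and stays live up to the join:

```c
#include <assert.h>

/* Before: a + b is computed on the flag path and again after the
   join, so the second computation is partially redundant.  */
unsigned before_pre (unsigned a, unsigned b, int flag)
{
  unsigned x = 0;
  if (flag)
    x = a + b;
  return x + (a + b);
}

/* After: a + b is inserted on the else path, and the use after the
   join reads the merged temporary t (a PHI in SSA form).  */
unsigned after_pre (unsigned a, unsigned b, int flag)
{
  unsigned x = 0;
  unsigned t;
  if (flag)
    {
      t = a + b;
      x = t;
    }
  else
    t = a + b;   /* inserted by PRE */
  return x + t;
}

/* Hypothetical helper: check that both forms agree on a few inputs. */
void check_pre (void)
{
  for (unsigned a = 0; a < 8; a++)
    for (unsigned b = 0; b < 8; b++)
      for (int flag = 0; flag < 2; flag++)
        assert (before_pre (a, b, flag) == after_pre (a, b, flag));
}
```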

taming PRE down by decoupling code hoisting and PRE (patch below) results in

> ./xgcc -B. t.c -O2 -ftree-pre -fcode-hoisting -S
> grep rsp t.s | wc -l
11
> ./xgcc -B. t.c -O2 -ftree-pre -fcode-hoisting 
> /usr//bin/time ./a.out 
3.90user 0.00system 0:03.90elapsed 100%CPU (0avgtext+0avgdata 1148maxresident)k
0inputs+0outputs (0major+63minor)pagefaults 0swaps
> /usr//bin/time ./a.out 
3.89user 0.00system 0:03.89elapsed 100%CPU (0avgtext+0avgdata 1128maxresident)k
0inputs+0outputs (0major+60minor)pagefaults 0swaps

Index: gcc/tree-ssa-pre.c
===================================================================
--- gcc/tree-ssa-pre.c  (revision 256837)
+++ gcc/tree-ssa-pre.c  (working copy)
@@ -3687,15 +3687,23 @@ insert (void)
       if (dump_file && dump_flags & TDF_DETAILS)
        fprintf (dump_file, "Starting insert iteration %d\n", num_iterations);
       new_stuff = insert_aux (ENTRY_BLOCK_PTR_FOR_FN (cfun), flag_tree_pre,
-                             flag_code_hoisting);
+                             false);

       /* Clear the NEW sets before the next iteration.  We have already
          fully propagated its contents.  */
-      if (new_stuff)
+      if (new_stuff || flag_code_hoisting)
        FOR_ALL_BB_FN (bb, cfun)
          bitmap_set_free (NEW_SETS (bb));
     }
   statistics_histogram_event (cfun, "insert iterations", num_iterations);
+
+  if (flag_code_hoisting)
+    {
+      if (dump_file && dump_flags & TDF_DETAILS)
+       fprintf (dump_file, "Starting insert for code hoisting\n");
+      new_stuff = insert_aux (ENTRY_BLOCK_PTR_FOR_FN (cfun), false,
+                             flag_code_hoisting);
+    }
 }

but AFAIU this patch shouldn't have any effect...  I guess I have
to think about this second-order effect again (it might be a missed
PRE in the first place, which of course wouldn't help us ;)).
The patch above also causes testsuite FAILs, for example

FAIL: gcc.dg/tree-ssa/ssa-hoist-3.c scan-tree-dump pre "Insertions: 1"
FAIL: gcc.dg/tree-ssa/ssa-pre-30.c scan-tree-dump-times pre "Replaced MEM" 2
