https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63155

--- Comment #19 from Richard Biener <rguenth at gcc dot gnu.org> ---
Oddly enough

Index: gcc/tree-ssa-coalesce.c
===================================================================
--- gcc/tree-ssa-coalesce.c     (revision 264259)
+++ gcc/tree-ssa-coalesce.c     (working copy)
@@ -620,7 +620,11 @@ ssa_conflicts_merge (ssa_conflicts *ptr,
     {
       bitmap bz = ptr->conflicts[z];
       if (bz)
-       bitmap_set_bit (bz, x);
+       {
+         bool was_there = bitmap_clear_bit (bz, y);
+         gcc_checking_assert (was_there);
+         bitmap_set_bit (bz, x);
+       }
     }

   if (bx)

makes at least the 2nd testcase run faster (although memory use stays about
the same).

w/o patch

> /usr/bin/time ./cc1 -quiet t2.c
108.14user 1.62system 1:49.78elapsed 99%CPU (0avgtext+0avgdata
5610876maxresident)k
0inputs+440outputs (0major+1442106minor)pagefaults 0swaps

w/ patch

> /usr/bin/time ./cc1 -quiet t2.c
86.53user 1.46system 1:27.99elapsed 99%CPU (0avgtext+0avgdata
5610888maxresident)k
0inputs+440outputs (0major+1440069minor)pagefaults 0swaps

note this is a -O0 "optimized" cc1 binary with checking enabled so ...

It's even ever so slightly faster when doing

          if (y < x)
            {
              bool was_there = bitmap_clear_bit (bz, y);
              gcc_checking_assert (was_there);
              bitmap_set_bit (bz, x);
            }
          else
            {
              bitmap_set_bit (bz, x);
              bool was_there = bitmap_clear_bit (bz, y);
              gcc_checking_assert (was_there);
            }

but that's probably luck (bitmap caching and path length from start vs.
x / y).  Possibly doing a forward walk also makes prefetchers
happy.

Probably with a lot of coalesces this keeps the conflict bitmaps small
(and thus the bitmap element walks fast).
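
To illustrate why clearing the stale bit might keep the bitmaps small, here
is a minimal toy sketch of the merge. It is not GCC's bitmap implementation:
it assumes at most 64 nodes held in plain 64-bit masks, and the names
toy_conflicts, toy_add, toy_clear_bit, toy_count and toy_merge are
hypothetical. Only the clear-then-set step mirrors the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NODES 64

/* Hypothetical toy conflict graph: bit j of conflicts[i] is set when
   nodes i and j conflict.  GCC uses sparse linked-list bitmaps instead.  */
typedef struct { uint64_t conflicts[TOY_NODES]; } toy_conflicts;

/* Record a symmetric conflict between A and B.  */
static void toy_add(toy_conflicts *g, unsigned a, unsigned b)
{
  g->conflicts[a] |= UINT64_C(1) << b;
  g->conflicts[b] |= UINT64_C(1) << a;
}

/* Clear bit N and report whether it was set before.  */
static bool toy_clear_bit(uint64_t *bits, unsigned n)
{
  uint64_t mask = UINT64_C(1) << n;
  bool was_set = (*bits & mask) != 0;
  *bits &= ~mask;
  return was_set;
}

/* Number of set bits, i.e. how many conflicts a node has.  */
static unsigned toy_count(uint64_t bits)
{
  unsigned n = 0;
  for (; bits; bits &= bits - 1)
    n++;
  return n;
}

/* Merge Y's conflicts into X.  Assumes X and Y do not conflict.  */
static void toy_merge(toy_conflicts *g, unsigned x, unsigned y)
{
  assert(x != y && x < TOY_NODES && y < TOY_NODES);
  uint64_t by = g->conflicts[y];
  assert(!(by & (UINT64_C(1) << x)));

  for (unsigned z = 0; z < TOY_NODES; z++)
    {
      if (!(by & (UINT64_C(1) << z)))
        continue;
      /* As in the patch: drop the stale reference to Y, then add X.  */
      bool was_there = toy_clear_bit(&g->conflicts[z], y);
      assert(was_there);
      (void) was_there;
      g->conflicts[z] |= UINT64_C(1) << x;
    }

  g->conflicts[x] |= by;
  g->conflicts[y] = 0;
}
```

Without the clear step every neighbour z of y would keep a dead bit for y
after the merge, so in this toy model each coalesce would leave the
neighbours' sets one entry larger than necessary.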

Timings with release checking and optimized build:

patched:

> /usr/bin/time ../../obj/gcc/cc1 -quiet t2.c
22.91user 1.45system 0:24.38elapsed 99%CPU (0avgtext+0avgdata
5515460maxresident)k
0inputs+440outputs (0major+1378102minor)pagefaults 0swaps

patched, fancy forward walk:

> /usr/bin/time ../../obj/gcc/cc1 -quiet t2.c
22.47user 1.39system 0:23.88elapsed 99%CPU (0avgtext+0avgdata
5515420maxresident)k
0inputs+440outputs (0major+1377586minor)pagefaults 0swaps

unpatched:

> /usr/bin/time ../../obj/gcc/cc1 -quiet t2.c
46.60user 1.43system 0:48.03elapsed 99%CPU (0avgtext+0avgdata
5515380maxresident)k
0inputs+440outputs (0major+1378102minor)pagefaults 0swaps
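
For reference, the user times above work out to roughly a 2.03x speedup for
the optimized build (46.60s vs. 22.91s) and roughly 1.25x for the -O0
checking build (108.14s vs. 86.53s). A small sketch of that arithmetic, with
hypothetical helper names speedup and percent_saved:

```c
#include <assert.h>

/* How many times faster the "after" run is than the "before" run,
   given user times in seconds.  */
static double speedup(double before_s, double after_s)
{
  assert(after_s > 0.0);
  return before_s / after_s;
}

/* Share of user time saved by the "after" run, as a percentage.  */
static double percent_saved(double before_s, double after_s)
{
  assert(before_s > 0.0);
  return 100.0 * (before_s - after_s) / before_s;
}
```

For example, speedup (46.60, 22.91) compares the unpatched and patched
optimized builds, and speedup (108.14, 86.53) does the same for the -O0
binary.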

I'm testing the simple patch now.
