https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63155
--- Comment #19 from Richard Biener <rguenth at gcc dot gnu.org> ---
Oddly enough

Index: gcc/tree-ssa-coalesce.c
===================================================================
--- gcc/tree-ssa-coalesce.c	(revision 264259)
+++ gcc/tree-ssa-coalesce.c	(working copy)
@@ -620,7 +620,11 @@ ssa_conflicts_merge (ssa_conflicts *ptr,
     {
       bitmap bz = ptr->conflicts[z];
       if (bz)
-	bitmap_set_bit (bz, x);
+	{
+	  bool was_there = bitmap_clear_bit (bz, y);
+	  gcc_checking_assert (was_there);
+	  bitmap_set_bit (bz, x);
+	}
     }
 
   if (bx)

changes at least the 2nd testcase to run faster (albeit memory use stays
around the same).

w/o patch

> /usr/bin/time ./cc1 -quiet t2.c
108.14user 1.62system 1:49.78elapsed 99%CPU (0avgtext+0avgdata 5610876maxresident)k
0inputs+440outputs (0major+1442106minor)pagefaults 0swaps

w/ patch

> /usr/bin/time ./cc1 -quiet t2.c
86.53user 1.46system 1:27.99elapsed 99%CPU (0avgtext+0avgdata 5610888maxresident)k
0inputs+440outputs (0major+1440069minor)pagefaults 0swaps

note this is a -O0 "optimized" cc1 binary with checking enabled so ...

It's ever so slightly faster when doing

          if (y < x)
            {
              bool was_there = bitmap_clear_bit (bz, y);
              gcc_checking_assert (was_there);
              bitmap_set_bit (bz, x);
            }
          else
            {
              bitmap_set_bit (bz, x);
              bool was_there = bitmap_clear_bit (bz, y);
              gcc_checking_assert (was_there);
            }

but that's probably luck (bitmap caching and path length from start
vs. x / y).  Possibly doing a forward walk also makes prefetchers happy.

Probably with a lot of coalesces this keeps the conflict bitmaps small
(and thus the bitmap element walks fast).
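
To illustrate why clearing the stale y bit might keep the conflict bitmaps
small, here is a toy model, not the GCC code.  It assumes at most 64
partitions, uses one uint64_t mask per partition in place of GCC's sparse
bitmaps (a zero mask stands in for a NULL bitmap), and assumes x and y do
not conflict with each other.  All toy_* names are made up for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NNODES 8

/* Hypothetical stand-in for ssa_conflicts: one mask per partition.  */
typedef struct { uint64_t conflicts[TOY_NNODES]; } toy_conflicts;

/* Record a symmetric conflict between A and B.  */
static void
toy_add (toy_conflicts *g, unsigned a, unsigned b)
{
  g->conflicts[a] |= UINT64_C (1) << b;
  g->conflicts[b] |= UINT64_C (1) << a;
}

/* Merge Y's conflicts into X.  With CLEAR_STALE, every neighbour Z of Y
   also drops its now-dead Y bit, mirroring the patched loop body.  */
static void
toy_merge (toy_conflicts *g, unsigned x, unsigned y, bool clear_stale)
{
  uint64_t by = g->conflicts[y];

  assert (x != y);
  for (unsigned z = 0; z < TOY_NNODES; z++)
    if (by & (UINT64_C (1) << z))
      {
	if (clear_stale)
	  {
	    bool was_there = (g->conflicts[z] >> y) & 1;
	    assert (was_there);
	    g->conflicts[z] &= ~(UINT64_C (1) << y);
	  }
	g->conflicts[z] |= UINT64_C (1) << x;
      }

  g->conflicts[x] |= by;
  g->conflicts[y] = 0;
}

/* Total number of set bits across all conflict masks.  */
static unsigned
toy_total_bits (const toy_conflicts *g)
{
  unsigned n = 0;
  for (unsigned i = 0; i < TOY_NNODES; i++)
    for (uint64_t m = g->conflicts[i]; m; m &= m - 1)
      n++;
  return n;
}

/* Partitions 0 and 1 both conflict with 2, 3 and 4; coalesce 1 into 0
   and report how many conflict bits remain afterwards.  */
static unsigned
toy_demo (bool clear_stale)
{
  toy_conflicts g = { { 0 } };
  for (unsigned z = 2; z <= 4; z++)
    {
      toy_add (&g, 0, z);
      toy_add (&g, 1, z);
    }
  toy_merge (&g, 0, 1, clear_stale);
  return toy_total_bits (&g);
}
```

In this toy graph toy_demo (false) leaves 9 bits set and toy_demo (true)
leaves 6: without the clear, each neighbour keeps a dead bit for the
partition that was merged away.  Whether this is what explains the measured
speedup in cc1 is not established here.
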
Timings with release checking and optimized build:

patched:

> /usr/bin/time ../../obj/gcc/cc1 -quiet t2.c
22.91user 1.45system 0:24.38elapsed 99%CPU (0avgtext+0avgdata 5515460maxresident)k
0inputs+440outputs (0major+1378102minor)pagefaults 0swaps

patched, fancy forward walk:

> /usr/bin/time ../../obj/gcc/cc1 -quiet t2.c
22.47user 1.39system 0:23.88elapsed 99%CPU (0avgtext+0avgdata 5515420maxresident)k
0inputs+440outputs (0major+1377586minor)pagefaults 0swaps

unpatched:

> /usr/bin/time ../../obj/gcc/cc1 -quiet t2.c
46.60user 1.43system 0:48.03elapsed 99%CPU (0avgtext+0avgdata 5515380maxresident)k
0inputs+440outputs (0major+1378102minor)pagefaults 0swaps

I'm testing the simple patch now.