https://gcc.gnu.org/bugzilla/show_bug.cgi?id=35545
--- Comment #17 from davidxl <xinliangli at gmail dot com> ---
(In reply to Jan Hubicka from comment #16)
> I have moved tracer before the late cleanups, which seems to be a rather obvious
> thing to do. This lets us optimize the testcase (with -O2):
> int main() ()
> {
>   struct A * ap;
>   int i;
>   int _6;
>
>   <bb 2>:
>
>   <bb 3>:
>   # i_29 = PHI <i_22(6), 0(2)>
>   _6 = i_29 % 7;
>   if (_6 == 0)
>     goto <bb 4>;
>   else
>     goto <bb 5>;
>
>   <bb 4>:
>   ap_8 = operator new (16);
>   ap_8->i = 0;
>   ap_8->_vptr.A = &MEM[(void *)&_ZTV1A + 16B];
>   goto <bb 6>;
>
>   <bb 5>:
>   ap_13 = operator new (16);
>   MEM[(struct B *)ap_13].D.2244.i = 0;
>   MEM[(struct B *)ap_13].b = 0;
>   MEM[(struct B *)ap_13].D.2244._vptr.A = &MEM[(void *)&_ZTV1B + 16B];
>
>   <bb 6>:
>   # ap_4 = PHI <ap_13(5), ap_8(4)>
>   operator delete (ap_4);
>   i_22 = i_29 + 1;
>   if (i_22 != 10000)
>     goto <bb 3>;
>   else
>     goto <bb 7>;
>
>   <bb 7>:
>   return 0;
>
> }
>
> Martin, I do not have SPEC set up; do you think you can benchmark the
> attached patch with SPEC and profile feedback and also non-FDO -O3 -ftracer
> compared to -O3, please?
> It would be nice to know code size impact, too.
> Index: passes.def
> ===================================================================
> --- passes.def (revision 215651)
> +++ passes.def (working copy)
> @@ -155,6 +155,7 @@ along with GCC; see the file COPYING3.
> NEXT_PASS (pass_dce);
> NEXT_PASS (pass_call_cdce);
> NEXT_PASS (pass_cselim);
> + NEXT_PASS (pass_tracer);
> NEXT_PASS (pass_copy_prop);
> NEXT_PASS (pass_tree_ifcombine);
> NEXT_PASS (pass_phiopt);
> @@ -252,7 +253,6 @@ along with GCC; see the file COPYING3.
> NEXT_PASS (pass_cse_reciprocals);
> NEXT_PASS (pass_reassoc);
> NEXT_PASS (pass_strength_reduction);
> - NEXT_PASS (pass_tracer);
> NEXT_PASS (pass_dominator);
> NEXT_PASS (pass_strlen);
> NEXT_PASS (pass_vrp);
>
> Doing it at approximately the same place as loop header copying seems
> to make the most sense to me. It definitely benefits from early cleanups and DCE,
> and it should enable more fun with the later scalar passes, which are almost
> all rerun then.
We can try some internal benchmarks with this change too.
David