https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116743
--- Comment #19 from Eugene Rozenfeld <erozen at microsoft dot com> ---
I investigated what happens in the compiler.
In afdo_annotate_cfg we have these lines:
  cgraph_node::get (current_function_decl)->count
    = profile_count::from_gcov_type (s->head_count ()).afdo ();
  ENTRY_BLOCK_PTR_FOR_FN (cfun)->count
    = profile_count::from_gcov_type (s->head_count ()).afdo ();
In the test case s->head_count () is 0, so both are set to 0.
Before g:3d9e6767939e they stayed at 0. After g:3d9e6767939e with better count
propagation ENTRY_BLOCK_PTR_FOR_FN (cfun)->count became non-zero; however,
cgraph_node::get (current_function_decl)->count wasn't updated and stayed at 0.
This caused a problem in execute_fixup_cfg:
  profile_count num = node->count;
  profile_count den = ENTRY_BLOCK_PTR_FOR_FN (cfun)->count;
  bool scale = num.initialized_p () && !(num == den);
Before g:3d9e6767939e num and den were both 0, so scale was false. After
g:3d9e6767939e num was still 0 but den was non-zero, so scale became true and
every bb count was scaled by num/den = 0, which is how we lost the bb counts:
  if (scale)
    bb->count = bb->count.apply_scale (num, den);
I think the fix is to update cgraph_node::get (current_function_decl)->count
once we've done count propagation.
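
To make the failure mode concrete, here is a small standalone model of the
scaling decision. It is only a sketch: Count, scale_value and fixup are
hypothetical stand-ins that I made up for illustration, not GCC's
profile_count or execute_fixup_cfg, and the handling of den == 0 is an
assumption of the model.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical, heavily simplified stand-in for profile_count.
struct Count
{
  int64_t value;
  bool initialized;

  bool operator== (const Count &other) const
  {
    return initialized == other.initialized && value == other.value;
  }
};

// Simplified model of scaling one bb count by num/den.
// A zero numerator zeroes the count; a zero denominator leaves the
// count unchanged (an assumption of this sketch).
int64_t
scale_value (int64_t bb, const Count &num, const Count &den)
{
  if (num.value == 0)
    return 0;
  if (den.value == 0)
    return bb;
  return bb * num.value / den.value;
}

// Models the decision quoted above: scale only when the node count is
// initialized and differs from the entry block count.
std::vector<int64_t>
fixup (const Count &node_count, const Count &entry_count,
       std::vector<int64_t> bb_counts)
{
  bool scale = node_count.initialized && !(node_count == entry_count);
  if (scale)
    for (int64_t &bb : bb_counts)
      bb = scale_value (bb, node_count, entry_count);
  return bb_counts;
}
```

In this model, a node count of 0 with an entry count of 100 zeroes every bb
count, while setting the node count equal to the entry count (what the
proposed fix does after propagation) leaves the bb counts untouched.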
I propose the patch below. Rama, can you please check if this resolves your
perf regression?
diff --git a/gcc/auto-profile.cc b/gcc/auto-profile.cc
index 2b34b80b82d..dcd70248c26 100644
--- a/gcc/auto-profile.cc
+++ b/gcc/auto-profile.cc
@@ -1537,8 +1537,6 @@ afdo_annotate_cfg (const stmt_set &promoted_stmts)
   if (s == NULL)
     return;
-  cgraph_node::get (current_function_decl)->count
-    = profile_count::from_gcov_type (s->head_count ()).afdo ();
   ENTRY_BLOCK_PTR_FOR_FN (cfun)->count
     = profile_count::from_gcov_type (s->head_count ()).afdo ();
   EXIT_BLOCK_PTR_FOR_FN (cfun)->count = profile_count::zero ().afdo ();
@@ -1577,6 +1575,8 @@ afdo_annotate_cfg (const stmt_set &promoted_stmts)
       /* Calculate, propagate count and probability information on CFG. */
       afdo_calculate_branch_prob (&annotated_bb);
     }
+  cgraph_node::get(current_function_decl)->count
+      = ENTRY_BLOCK_PTR_FOR_FN(cfun)->count;
   update_max_bb_count ();
   profile_status_for_fn (cfun) = PROFILE_READ;
   if (flag_value_profile_transformations)