Re: [PATCH] Ensure count_scale is no larger than REG_BR_PROB_BASE
> On Mon, May 19, 2014 at 1:40 PM, Jan Hubicka wrote: > >> I've updated the patch. Shall I move the check inside cgraph_clone_node? > > > > Thanks, > > I think it is OK as it is. I belive individual users should know what do to > > in such cases themselves. > > You may want to also check what ipa-cp is doing. > > I checked ipa-cp, but didn't see count propagation anywhere. Could you > point me to the function? I believe it is done in update_specialized_profile and also via cgraph_create_virtual_clone honza
Re: [PATCH] Ensure count_scale is no larger than REG_BR_PROB_BASE
On Mon, May 19, 2014 at 1:40 PM, Jan Hubicka wrote: >> I've updated the patch. Shall I move the check inside cgraph_clone_node? > > Thanks, > I think it is OK as it is. I belive individual users should know what do to > in such cases themselves. > You may want to also check what ipa-cp is doing. I checked ipa-cp, but didn't see count propagation anywhere. Could you point me to the function? Thanks, Dehao > > Patch is OK (with Changelog) > Honza >> >> Thanks, >> Dehao >> >> Index: gcc/ipa-inline-transform.c >> === >> --- gcc/ipa-inline-transform.c (revision 210535) >> +++ gcc/ipa-inline-transform.c (working copy) >> @@ -183,8 +183,9 @@ clone_inlined_nodes (struct cgraph_edge *e, bool d >>if (freq_scale == -1) >> freq_scale = e->frequency; >>n = cgraph_clone_node (e->callee, e->callee->decl, >> - e->count, freq_scale, update_original, >> - vNULL, true, inlining_into, NULL); >> + MIN (e->count, e->callee->count), freq_scale, >> + update_original, vNULL, true, inlining_into, >> + NULL); >>cgraph_redirect_edge_callee (e, n); >> } >> } >> Index: gcc/tree-inline.c >> === >> --- gcc/tree-inline.c (revision 210535) >> +++ gcc/tree-inline.c (working copy) >> @@ -4355,7 +4355,7 @@ expand_call_inline (basic_block bb, gimple stmt, c >> function in any way before this point, as this CALL_EXPR may be >> a self-referential call; if we're calling ourselves, we need to >> duplicate our body before altering anything. */ >> - copy_body (id, bb->count, >> + copy_body (id, cg_edge->callee->count, >> GCOV_COMPUTE_SCALE (cg_edge->frequency, CGRAPH_FREQ_BASE), >> bb, return_block, NULL); >> >> >> On Fri, May 16, 2014 at 6:41 PM, Jan Hubicka wrote: >> >> Do you mean adjusting bb->count? Because in >> >> expand_call_inline(tree-inline.c), it will use bb->count to pass into >> >> copy_body to calculate count_scale. >> > >> > What about taking here callee->count instead? For inline nodes without >> > any capping hack, bb->count == edge->count = callee->count. >> > >> > When profile ends up being obviously inconsistent, I would say that >> > inliner can cap callee->count during producing the clone... >> > >> > Honza >> >> >> >> Thanks, >> >> Dehao >> >> >> >> On Fri, May 16, 2014 at 5:22 PM, Jan Hubicka wrote: >> >> >> In AutoFDO, a basic block's count can be much larger than it's actual >> >> >> count because debug info might be incorrect. In this case, a call edge >> >> >> count (calculated from BB count) could be much larger than callee's >> >> >> header count, making the count_scale incorrectly large. >> >> > >> >> > In this case I still think we should handle this when producing the >> >> > clone: >> >> > we do not want to have clone's count much larger as well, so i think >> >> > inliner >> >> > and ipa-cp needs to deal with capping here instead >> >> > >> >> > Honza >> >> >> >> >> >> Dehao >> >> >> > >> >> >> > >> >> >> > Honza
Re: [PATCH] Ensure count_scale is no larger than REG_BR_PROB_BASE
> I've updated the patch. Shall I move the check inside cgraph_clone_node? Thanks, I think it is OK as it is. I belive individual users should know what do to in such cases themselves. You may want to also check what ipa-cp is doing. Patch is OK (with Changelog) Honza > > Thanks, > Dehao > > Index: gcc/ipa-inline-transform.c > === > --- gcc/ipa-inline-transform.c (revision 210535) > +++ gcc/ipa-inline-transform.c (working copy) > @@ -183,8 +183,9 @@ clone_inlined_nodes (struct cgraph_edge *e, bool d >if (freq_scale == -1) > freq_scale = e->frequency; >n = cgraph_clone_node (e->callee, e->callee->decl, > - e->count, freq_scale, update_original, > - vNULL, true, inlining_into, NULL); > + MIN (e->count, e->callee->count), freq_scale, > + update_original, vNULL, true, inlining_into, > + NULL); >cgraph_redirect_edge_callee (e, n); > } > } > Index: gcc/tree-inline.c > === > --- gcc/tree-inline.c (revision 210535) > +++ gcc/tree-inline.c (working copy) > @@ -4355,7 +4355,7 @@ expand_call_inline (basic_block bb, gimple stmt, c > function in any way before this point, as this CALL_EXPR may be > a self-referential call; if we're calling ourselves, we need to > duplicate our body before altering anything. */ > - copy_body (id, bb->count, > + copy_body (id, cg_edge->callee->count, > GCOV_COMPUTE_SCALE (cg_edge->frequency, CGRAPH_FREQ_BASE), > bb, return_block, NULL); > > > On Fri, May 16, 2014 at 6:41 PM, Jan Hubicka wrote: > >> Do you mean adjusting bb->count? Because in > >> expand_call_inline(tree-inline.c), it will use bb->count to pass into > >> copy_body to calculate count_scale. > > > > What about taking here callee->count instead? For inline nodes without > > any capping hack, bb->count == edge->count = callee->count. > > > > When profile ends up being obviously inconsistent, I would say that > > inliner can cap callee->count during producing the clone... > > > > Honza > >> > >> Thanks, > >> Dehao > >> > >> On Fri, May 16, 2014 at 5:22 PM, Jan Hubicka wrote: > >> >> In AutoFDO, a basic block's count can be much larger than it's actual > >> >> count because debug info might be incorrect. In this case, a call edge > >> >> count (calculated from BB count) could be much larger than callee's > >> >> header count, making the count_scale incorrectly large. > >> > > >> > In this case I still think we should handle this when producing the > >> > clone: > >> > we do not want to have clone's count much larger as well, so i think > >> > inliner > >> > and ipa-cp needs to deal with capping here instead > >> > > >> > Honza > >> >> > >> >> Dehao > >> >> > > >> >> > > >> >> > Honza
Re: [PATCH] Ensure count_scale is no larger than REG_BR_PROB_BASE
I've updated the patch. Shall I move the check inside cgraph_clone_node? Thanks, Dehao Index: gcc/ipa-inline-transform.c === --- gcc/ipa-inline-transform.c (revision 210535) +++ gcc/ipa-inline-transform.c (working copy) @@ -183,8 +183,9 @@ clone_inlined_nodes (struct cgraph_edge *e, bool d if (freq_scale == -1) freq_scale = e->frequency; n = cgraph_clone_node (e->callee, e->callee->decl, - e->count, freq_scale, update_original, - vNULL, true, inlining_into, NULL); + MIN (e->count, e->callee->count), freq_scale, + update_original, vNULL, true, inlining_into, + NULL); cgraph_redirect_edge_callee (e, n); } } Index: gcc/tree-inline.c === --- gcc/tree-inline.c (revision 210535) +++ gcc/tree-inline.c (working copy) @@ -4355,7 +4355,7 @@ expand_call_inline (basic_block bb, gimple stmt, c function in any way before this point, as this CALL_EXPR may be a self-referential call; if we're calling ourselves, we need to duplicate our body before altering anything. */ - copy_body (id, bb->count, + copy_body (id, cg_edge->callee->count, GCOV_COMPUTE_SCALE (cg_edge->frequency, CGRAPH_FREQ_BASE), bb, return_block, NULL); On Fri, May 16, 2014 at 6:41 PM, Jan Hubicka wrote: >> Do you mean adjusting bb->count? Because in >> expand_call_inline(tree-inline.c), it will use bb->count to pass into >> copy_body to calculate count_scale. > > What about taking here callee->count instead? For inline nodes without > any capping hack, bb->count == edge->count = callee->count. > > When profile ends up being obviously inconsistent, I would say that > inliner can cap callee->count during producing the clone... > > Honza >> >> Thanks, >> Dehao >> >> On Fri, May 16, 2014 at 5:22 PM, Jan Hubicka wrote: >> >> In AutoFDO, a basic block's count can be much larger than it's actual >> >> count because debug info might be incorrect. In this case, a call edge >> >> count (calculated from BB count) could be much larger than callee's >> >> header count, making the count_scale incorrectly large. >> > >> > In this case I still think we should handle this when producing the clone: >> > we do not want to have clone's count much larger as well, so i think >> > inliner >> > and ipa-cp needs to deal with capping here instead >> > >> > Honza >> >> >> >> Dehao >> >> > >> >> > >> >> > Honza
Re: [PATCH] Ensure count_scale is no larger than REG_BR_PROB_BASE
> Do you mean adjusting bb->count? Because in > expand_call_inline(tree-inline.c), it will use bb->count to pass into > copy_body to calculate count_scale. What about taking here callee->count instead? For inline nodes without any capping hack, bb->count == edge->count = callee->count. When profile ends up being obviously inconsistent, I would say that inliner can cap callee->count during producing the clone... Honza > > Thanks, > Dehao > > On Fri, May 16, 2014 at 5:22 PM, Jan Hubicka wrote: > >> In AutoFDO, a basic block's count can be much larger than it's actual > >> count because debug info might be incorrect. In this case, a call edge > >> count (calculated from BB count) could be much larger than callee's > >> header count, making the count_scale incorrectly large. > > > > In this case I still think we should handle this when producing the clone: > > we do not want to have clone's count much larger as well, so i think inliner > > and ipa-cp needs to deal with capping here instead > > > > Honza > >> > >> Dehao > >> > > >> > > >> > Honza
Re: [PATCH] Ensure count_scale is no larger than REG_BR_PROB_BASE
Do you mean adjusting bb->count? Because in expand_call_inline(tree-inline.c), it will use bb->count to pass into copy_body to calculate count_scale. Thanks, Dehao On Fri, May 16, 2014 at 5:22 PM, Jan Hubicka wrote: >> In AutoFDO, a basic block's count can be much larger than it's actual >> count because debug info might be incorrect. In this case, a call edge >> count (calculated from BB count) could be much larger than callee's >> header count, making the count_scale incorrectly large. > > In this case I still think we should handle this when producing the clone: > we do not want to have clone's count much larger as well, so i think inliner > and ipa-cp needs to deal with capping here instead > > Honza >> >> Dehao >> > >> > >> > Honza
Re: [PATCH] Ensure count_scale is no larger than REG_BR_PROB_BASE
> In AutoFDO, a basic block's count can be much larger than it's actual > count because debug info might be incorrect. In this case, a call edge > count (calculated from BB count) could be much larger than callee's > header count, making the count_scale incorrectly large. In this case I still think we should handle this when producing the clone: we do not want to have clone's count much larger as well, so i think inliner and ipa-cp needs to deal with capping here instead Honza > > Dehao > > > > > > Honza
Re: [PATCH] Ensure count_scale is no larger than REG_BR_PROB_BASE
On Fri, May 16, 2014 at 4:41 PM, Jan Hubicka wrote: > > > Is this patch ok for trunk? Bootstrapped and regression test on-going. > > > > Thanks, > > Dehao > > > > 2014-05-16 Dehao Chen > > > > * tree-inline.c (initialize_cfun): Ensure count_scale is no larger > > than REG_BR_PROB_BASE. > > (copy_cfg_body): Likewise. > > This seems like wrong place to paper around the problem - symmetric count > scaling is done during production of the inline clone. I think if we want to > be smart about broken profiles, we should do it at that place instead here > at inliner... > > What kind of problem does this patch solve? In AutoFDO, a basic block's count can be much larger than it's actual count because debug info might be incorrect. In this case, a call edge count (calculated from BB count) could be much larger than callee's header count, making the count_scale incorrectly large. Dehao > > > Honza
Re: [PATCH] Ensure count_scale is no larger than REG_BR_PROB_BASE
> Is this patch ok for trunk? Bootstrapped and regression test on-going. > > Thanks, > Dehao > > 2014-05-16 Dehao Chen > > * tree-inline.c (initialize_cfun): Ensure count_scale is no larger > than REG_BR_PROB_BASE. > (copy_cfg_body): Likewise. This seems like wrong place to paper around the problem - symmetric count scaling is done during production of the inline clone. I think if we want to be smart about broken profiles, we should do it at that place instead here at inliner... What kind of problem does this patch solve? Honza