On Fri, Sep 27, 2013 at 7:15 AM, Teresa Johnson <tejohn...@google.com> wrote: > On Thu, Sep 26, 2013 at 3:02 PM, Jan Hubicka <hubi...@ucw.cz> wrote: >>> >>> Why not just have probably_never_executed_bb_p return simply return >>> false bb->frequency is non-zero (right now it does the opposite - >> >> We want to have frequencies guessed for functions that was not trained >> in the profiling run (that was patch I posted earlier that I think did not >> go in, yet). > > Right, but for splitting and bb layout purposes, for these statically > guessed unprofiled functions we in fact don't want to do any splitting > or treat the bbs as never executed (which shouldn't be a change from > the status quo since all the bbs in these functions are currently 0 > weight, it's only when we inline in the case of comdats that they > appear colder than the surrounding code, but in fact we don't want > this). > > The only other caller to probably_never_executed_bb_p is > compute_function_frequency, but in the case of statically guessed > functions they will have profile_status != PROFILE_READ and won't > invoke probably_never_executed_bb_p. But re-reading our most recent > exchange on the comdat profile issue, it sounds like you were > suggesting guessing profiles for all 0-weight functions early, then > dropping them from PROFILE_READ to PROFILE_GUESSED only once we > determine in ipa-inline that there is a potentially non-zero call path > to them. In that case with the change I describe above to > probably_never_executed_bb_p, the 0-weight functions with 0 calls to > them will incorrectly be marked as NODE_FREQUENCY_NORMAL, which would > be bad as they would not be size optimized or moved into the cold > section. > > So it seems like we want different handling of these guessed > frequencies in compute_function_frequency and bb-reorder.c. Actually I > think we can handle this by checking if the function entry block has a > 0 count. If so, then we just look at the bb counts and not the > frequencies for determining bb hotness as the frequencies would > presumably have been statically-guessed. This will ensure that the > cgraph node continues to be marked unlikely and size-optimized. If the > function entry block has a non-zero count, then we look at both the bb > count and the bb frequency - if they are both zero then the bb is > probably never executed, but if either is non-zero then we should > treat the block as possibly executed (which will come into play for > splitting and bb layout).
Here is a patch to handle the profile insanities conservatively during splitting. It also simplifies the probably_never_executed* code to treat missing counts within a profiled function differently (conservatively, based on frequency) from the case where the whole function has a guessed profile. That way, once a patch to guess profiles for non-executed functions is added, they will continue to have their nodes marked as unlikely. I also pulled the guts of the probably_never_executed_bb_p code out to a helper that is then invoked by both the bb and edge versions of this function, so they stay in sync. This gets rid of a number of the failures with splitting + the linker script to make the unlikely section non-executable. I have a patch to fix some jump threading insanities that I will send separately. Bootstrapped and regression tested on x86_64. Also tested with an lto profiledbootstrap. Ok for trunk? Thanks, Teresa 2013-09-29 Teresa Johnson <tejohn...@google.com> * bb-reorder.c (find_rarely_executed_basic_blocks_and_crossing_edges): Treat profile insanities conservatively. * predict.c (probably_never_executed): New function. Treat profile insanities conservatively. (probably_never_executed_bb_p): Invoke probably_never_executed. (probably_never_executed_edge_p): Invoke probably_never_executed. Index: bb-reorder.c =================================================================== --- bb-reorder.c (revision 202947) +++ bb-reorder.c (working copy) @@ -1564,8 +1564,25 @@ find_rarely_executed_basic_blocks_and_crossing_edg /* Mark which partition (hot/cold) each basic block belongs in. */ FOR_EACH_BB (bb) { + bool cold_bb = false; if (probably_never_executed_bb_p (cfun, bb)) { + /* Handle profile insanities created by upstream optimizations + by also checking the incoming edge weights. If there is a non-cold + incoming edge, conservatively prevent this block from being split + into the cold section. */ + cold_bb = true; + FOR_EACH_EDGE (e, ei, bb->preds) + { + if (!probably_never_executed_edge_p (cfun, e)) + { + cold_bb = false; + break; + } + } + } + if (cold_bb) + { BB_SET_PARTITION (bb, BB_COLD_PARTITION); cold_bb_count++; } Index: predict.c =================================================================== --- predict.c (revision 202947) +++ predict.c (working copy) @@ -226,26 +226,26 @@ maybe_hot_edge_p (edge e) } -/* Return true in case BB is probably never executed. */ -bool -probably_never_executed_bb_p (struct function *fun, const_basic_block bb) +/* Return true if profile COUNT and FREQUENCY, or function FUN static + node frequency reflects never being executed. */ + +static bool +probably_never_executed (struct function *fun, + gcov_type count, int frequency) { gcc_checking_assert (fun); if (profile_status_for_function (fun) == PROFILE_READ) { - if ((bb->count * 4 + profile_info->runs / 2) / profile_info->runs > 0) + if ((count * 4 + profile_info->runs / 2) / profile_info->runs > 0) return false; - if (!bb->frequency) - return true; - if (!ENTRY_BLOCK_PTR->frequency) - return false; - if (ENTRY_BLOCK_PTR->count && ENTRY_BLOCK_PTR->count < REG_BR_PROB_BASE) - { - return (RDIV (bb->frequency * ENTRY_BLOCK_PTR->count, - ENTRY_BLOCK_PTR->frequency) - < REG_BR_PROB_BASE / 4); - } + // If this is a profiled function (entry bb non-zero count), then base + // the coldness decision on the frequency. This will handle cases where + // counts are not updated properly during optimizations or expansion. + if (ENTRY_BLOCK_PTR->count) + return frequency == 0; + // Unprofiled function, frequencies statically assigned. All bbs are + // treated as cold. return true; } if ((!profile_info || !flag_branch_probabilities) @@ -256,19 +256,21 @@ maybe_hot_edge_p (edge e) } +/* Return true in case BB is probably never executed. */ + +bool +probably_never_executed_bb_p (struct function *fun, const_basic_block bb) +{ + return probably_never_executed (fun, bb->count, bb->frequency); +} + + /* Return true in case edge E is probably never executed. */ bool probably_never_executed_edge_p (struct function *fun, edge e) { - gcc_checking_assert (fun); - if (profile_info && flag_branch_probabilities) - return ((e->count + profile_info->runs / 2) / profile_info->runs) == 0; - if ((!profile_info || !flag_branch_probabilities) - && (cgraph_get_node (fun->decl)->frequency - == NODE_FREQUENCY_UNLIKELY_EXECUTED)) - return true; - return false; + return probably_never_executed (fun, e->count, EDGE_FREQUENCY (e)); } /* Return true if NODE should be optimized for size. */ > > Teresa > >> >> Currently I return true when frequency indicate that BB is executed at least >> in >> 1/4th of all executions. With the cases discussed I see we may need to >> reduce >> this threshold. In general I do not like much hard tests for 0 because >> meaning >> of 0 depends on REG_BR_FREQ_BASE that is supposed to be changeable and we may >> want to make frequencies sreal, too. >> >> I suppose we may introduce --param for this. You are also right that I >> should >> update probably_never_executed_edge_p (I intended so, but obviously the code >> ended up in mainline accidentally). >> >> I however saw at least one case of jump threading where this trick did not >> help: the jump threading update confused itself by scaling via counts rather >> than frequencies and ended up with dropping everything to 0. This makes it >> more tempting to try to go with sreals for those.... >> >> Honza >> >>> returns true when bb->frequency is 0)? Making this change removed a >>> bunch of other failures. With this change as well, there are only 3 >>> cases that still fail with 1 train run that pass with 100. Need to >>> look at those. >>> >>> > >>> > Will you look into logic of do_jump or shall I try to dive in? >>> >>> I can take a look, but probably won't have a chance until late this >>> week. If you don't get to it before then I will see if I can figure >>> out why it is applying the branch probabilities this way. >>> >>> Teresa >>> >>> > >>> > Honza >>> >>> >>> >>> -- >>> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413 > > > > -- > Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413 -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413