On Fri, Sep 27, 2013 at 7:15 AM, Teresa Johnson <tejohn...@google.com> wrote:
> On Thu, Sep 26, 2013 at 3:02 PM, Jan Hubicka <hubi...@ucw.cz> wrote:
>>>
>>> Why not just have probably_never_executed_bb_p simply return
>>> false when bb->frequency is non-zero (right now it does the opposite -
>>
>> We want to have frequencies guessed for functions that were not trained
>> in the profiling run (that was the patch I posted earlier that I think
>> did not go in yet).
>
> Right, but for splitting and bb layout purposes, for these statically
> guessed unprofiled functions we in fact don't want to do any splitting,
> nor treat the bbs as never executed. That shouldn't be a change from
> the status quo, since all the bbs in these functions currently have 0
> weight; it's only when we inline in the comdat case that they appear
> colder than the surrounding code, but in fact we don't want this.
>
> The only other caller to probably_never_executed_bb_p is
> compute_function_frequency, but in the case of statically guessed
> functions they will have profile_status != PROFILE_READ and won't
> invoke probably_never_executed_bb_p. But re-reading our most recent
> exchange on the comdat profile issue, it sounds like you were
> suggesting guessing profiles for all 0-weight functions early, then
> dropping them from PROFILE_READ to PROFILE_GUESSED only once we
> determine in ipa-inline that there is a potentially non-zero call path
> to them. In that case with the change I describe above to
> probably_never_executed_bb_p, the 0-weight functions with 0 calls to
> them will incorrectly be marked as NODE_FREQUENCY_NORMAL, which would
> be bad as they would not be size optimized or moved into the cold
> section.
>
> So it seems like we want different handling of these guessed
> frequencies in compute_function_frequency and bb-reorder.c. Actually I
> think we can handle this by checking if the function entry block has a
> 0 count. If so, then we just look at the bb counts and not the
> frequencies for determining bb hotness as the frequencies would
> presumably have been statically-guessed. This will ensure that the
> cgraph node continues to be marked unlikely and size-optimized. If the
> function entry block has a non-zero count, then we look at both the bb
> count and the bb frequency - if they are both zero then the bb is
> probably never executed, but if either is non-zero then we should
> treat the block as possibly executed (which will come into play for
> splitting and bb layout).

Here is a patch to handle the profile insanities conservatively during
splitting. It also simplifies the probably_never_executed* code to
treat missing counts within a profiled function differently
(conservatively, based on frequency) from the case where the whole
function has a guessed profile. That way, once a patch to guess
profiles for non-executed functions is added, they will continue to
have their nodes marked as unlikely. I also pulled the guts of the
probably_never_executed_bb_p code out to a helper that is then invoked
by both the bb and edge versions of this function, so they stay in
sync.

This gets rid of a number of the failures seen with splitting plus the
linker script that makes the unlikely section non-executable. I have a
patch to fix some jump threading insanities that I will send separately.

Bootstrapped and regression tested on x86_64. Also tested with an lto
profiledbootstrap. Ok for trunk?

Thanks,
Teresa

2013-09-29  Teresa Johnson  <tejohn...@google.com>

        * bb-reorder.c (find_rarely_executed_basic_blocks_and_crossing_edges):
        Treat profile insanities conservatively.
        * predict.c (probably_never_executed): New function. Treat profile
        insanities conservatively.
        (probably_never_executed_bb_p): Invoke probably_never_executed.
        (probably_never_executed_edge_p): Invoke probably_never_executed.

Index: bb-reorder.c
===================================================================
--- bb-reorder.c        (revision 202947)
+++ bb-reorder.c        (working copy)
@@ -1564,8 +1564,25 @@ find_rarely_executed_basic_blocks_and_crossing_edg
   /* Mark which partition (hot/cold) each basic block belongs in.  */
   FOR_EACH_BB (bb)
     {
+      bool cold_bb = false;
       if (probably_never_executed_bb_p (cfun, bb))
         {
+          /* Handle profile insanities created by upstream optimizations
+             by also checking the incoming edge weights. If there is a non-cold
+             incoming edge, conservatively prevent this block from being split
+             into the cold section.  */
+          cold_bb = true;
+          FOR_EACH_EDGE (e, ei, bb->preds)
+            {
+              if (!probably_never_executed_edge_p (cfun, e))
+                {
+                  cold_bb = false;
+                  break;
+                }
+            }
+        }
+      if (cold_bb)
+        {
           BB_SET_PARTITION (bb, BB_COLD_PARTITION);
           cold_bb_count++;
         }
Index: predict.c
===================================================================
--- predict.c   (revision 202947)
+++ predict.c   (working copy)
@@ -226,26 +226,26 @@ maybe_hot_edge_p (edge e)
 }


-/* Return true in case BB is probably never executed.  */

-bool
-probably_never_executed_bb_p (struct function *fun, const_basic_block bb)
+/* Return true if profile COUNT and FREQUENCY, or function FUN static
+   node frequency reflects never being executed.  */
+
+static bool
+probably_never_executed (struct function *fun,
+                         gcov_type count, int frequency)
 {
   gcc_checking_assert (fun);
   if (profile_status_for_function (fun) == PROFILE_READ)
     {
-      if ((bb->count * 4 + profile_info->runs / 2) / profile_info->runs > 0)
+      if ((count * 4 + profile_info->runs / 2) / profile_info->runs > 0)
        return false;
-      if (!bb->frequency)
-       return true;
-      if (!ENTRY_BLOCK_PTR->frequency)
-       return false;
-      if (ENTRY_BLOCK_PTR->count && ENTRY_BLOCK_PTR->count < REG_BR_PROB_BASE)
-       {
-         return (RDIV (bb->frequency * ENTRY_BLOCK_PTR->count,
-                       ENTRY_BLOCK_PTR->frequency)
-                 < REG_BR_PROB_BASE / 4);
-       }
+      // If this is a profiled function (entry bb non-zero count), then base
+      // the coldness decision on the frequency. This will handle cases where
+      // counts are not updated properly during optimizations or expansion.
+      if (ENTRY_BLOCK_PTR->count)
+       return frequency == 0;
+      // Unprofiled function, frequencies statically assigned. All bbs are
+      // treated as cold.
       return true;
     }
   if ((!profile_info || !flag_branch_probabilities)
@@ -256,19 +256,21 @@ maybe_hot_edge_p (edge e)
 }


+/* Return true in case BB is probably never executed.  */
+
+bool
+probably_never_executed_bb_p (struct function *fun, const_basic_block bb)
+{
+  return probably_never_executed (fun, bb->count, bb->frequency);
+}
+
+
 /* Return true in case edge E is probably never executed.  */

 bool
 probably_never_executed_edge_p (struct function *fun, edge e)
 {
-  gcc_checking_assert (fun);
-  if (profile_info && flag_branch_probabilities)
-    return ((e->count + profile_info->runs / 2) / profile_info->runs) == 0;
-  if ((!profile_info || !flag_branch_probabilities)
-      && (cgraph_get_node (fun->decl)->frequency
-         == NODE_FREQUENCY_UNLIKELY_EXECUTED))
-    return true;
-  return false;
+  return probably_never_executed (fun, e->count, EDGE_FREQUENCY (e));
 }

 /* Return true if NODE should be optimized for size.  */


>
> Teresa
>
>>
>> Currently I return true when frequency indicates that BB is executed at
>> least in 1/4th of all executions.  With the cases discussed I see we may
>> need to reduce this threshold.  In general I do not much like hard tests
>> for 0, because the meaning of 0 depends on REG_BR_FREQ_BASE, which is
>> supposed to be changeable, and we may want to make frequencies sreal, too.
>>
>> I suppose we may introduce a --param for this.  You are also right that
>> I should update probably_never_executed_edge_p (I intended to, but
>> obviously the code ended up in mainline accidentally).
>>
>> I however saw at least one case of jump threading where this trick did
>> not help: the jump threading update confused itself by scaling via counts
>> rather than frequencies and ended up dropping everything to 0.  This
>> makes it more tempting to try going with sreals for those....
>>
>> Honza
>>
>>> returns true when bb->frequency is 0)? Making this change removed a
>>> bunch of other failures. With this change as well, there are only 3
>>> cases that still fail with 1 train run that pass with 100. Need to
>>> look at those.
>>>
>>> >
>>> > Will you look into logic of do_jump or shall I try to dive in?
>>>
>>> I can take a look, but probably won't have a chance until late this
>>> week. If you don't get to it before then I will see if I can figure
>>> out why it is applying the branch probabilities this way.
>>>
>>> Teresa
>>>
>>> >
>>> > Honza
>>>
>>>
>>>
>>> --
>>> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
>
>
>



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
