Hi Kewen,

Below are my comments.  I don't want to override Alexander's review, and if the 
patch looks good to him, it's fine to ignore my concerns.

My main concern is that this adds a new entity -- forceful skipping of 
DEBUG_INSN-only basic blocks -- to the scheduler for a somewhat minor change in 
behavior.  Unlike NOTEs and LABELs, DEBUG_INSNs are INSNs, and there is already
quite a bit of logic in the scheduler to skip them _as part of normal 
operation_.

Would you please consider the two ideas below?

#1:
After a brief look, I'm guessing this part is causing the problem:
haifa-sched.cc:schedule_block():
=== [1]
  /* Loop until all the insns in BB are scheduled.  */
  while ((*current_sched_info->schedule_more_p) ())
    {
      perform_replacements_new_cycle ();
      do
        {
          start_clock_var = clock_var;

          clock_var++;

          advance_one_cycle ();
===

and then in the nested loop we have
=== [2]
          /* We don't want md sched reorder to even see debug isns, so put
             them out right away.  */
          if (ready.n_ready && DEBUG_INSN_P (ready_element (&ready, 0))
              && (*current_sched_info->schedule_more_p) ())
            {
              while (ready.n_ready && DEBUG_INSN_P (ready_element (&ready, 0)))
                {
                  rtx_insn *insn = ready_remove_first (&ready);
                  gcc_assert (DEBUG_INSN_P (insn));
                  (*current_sched_info->begin_schedule_ready) (insn);
                  scheduled_insns.safe_push (insn);
                  last_scheduled_insn = insn;
                  advance = schedule_insn (insn);
                  gcc_assert (advance == 0);
                  if (ready.n_ready > 0)
                    ready_sort (&ready);
                }
            }
===
At point [1] the ready list is already sorted, and I don't see anything that
would prevent doing [2] before calling advance_one_cycle().
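
As a rough illustration of idea #1, here is a toy model in C.  It is not GCC
code: the toy_* names, the flat array standing in for the ready list, and the
cycle counter standing in for the DFA state are all assumptions made for the
sketch.  It only shows the ordering: debug insns are put out before the cycle
is advanced, so a block with nothing but debug insns never touches the DFA.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for insn kinds; not GCC's rtx_insn.  */
enum toy_insn_kind { TOY_DEBUG_INSN, TOY_REAL_INSN };

struct toy_sched
{
  int dfa_cycles;      /* How many times the toy DFA was advanced.  */
  size_t n_scheduled;  /* Number of insns put out so far.  */
};

/* Put out the leading debug insns of READY[*POS..N) without advancing
   the toy DFA.  Models snippet [2].  */
static void
toy_flush_debug_insns (struct toy_sched *s, const enum toy_insn_kind *ready,
                       size_t n, size_t *pos)
{
  while (*pos < n && ready[*pos] == TOY_DEBUG_INSN)
    {
      (*pos)++;
      s->n_scheduled++;
    }
}

/* Schedule all N insns of READY.  The debug-insn flush runs before the
   cycle advance, so the DFA only moves when a real insn is pending.  */
static void
toy_schedule_block (struct toy_sched *s, const enum toy_insn_kind *ready,
                    size_t n)
{
  size_t pos = 0;

  while (pos < n)
    {
      toy_flush_debug_insns (s, ready, n, &pos);
      if (pos == n)
        break;  /* Only debug insns were left: no cycle advance.  */

      s->dfa_cycles++;  /* Stands in for advance_one_cycle ().  */

      /* Issue one real insn per cycle in this toy model.  */
      assert (ready[pos] == TOY_REAL_INSN);
      pos++;
      s->n_scheduled++;
    }
}
```

In this model a block holding only TOY_DEBUG_INSN entries finishes with
dfa_cycles == 0, which is the same DFA state a skipped block would leave
behind in a non-debug build.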

#2:
Another approach, which might be even easier, is to save the DFA state
before the initial advance_one_cycle(), and then restore it if no real insns
have been scheduled.
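
A minimal sketch of idea #2 in C, again as a toy model rather than a patch.
The toy_* names are hypothetical, and the DFA state is modeled as a small
opaque byte buffer that is copied with memcpy; how the real scheduler should
snapshot and restore its state is left open here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for insn kinds; not GCC's rtx_insn.  */
enum toy_insn_kind { TOY_DEBUG_INSN, TOY_REAL_INSN };

/* Hypothetical opaque DFA state.  */
struct toy_dfa_state { unsigned char bytes[8]; };

/* Stands in for advance_one_cycle (): any change to the state will do.  */
static void
toy_advance_one_cycle (struct toy_dfa_state *st)
{
  st->bytes[0]++;
}

/* Schedule N insns against the state CURR.  Return true if at least one
   real insn was scheduled.  When none was, CURR is restored to its value
   on entry, undoing the initial cycle advance.  */
static bool
toy_schedule_block (struct toy_dfa_state *curr,
                    const enum toy_insn_kind *insns, size_t n)
{
  struct toy_dfa_state saved;
  bool any_real = false;

  /* Save the state before the initial advance.  */
  memcpy (&saved, curr, sizeof saved);
  toy_advance_one_cycle (curr);

  for (size_t i = 0; i < n; i++)
    if (insns[i] == TOY_REAL_INSN)
      {
        /* One real insn per cycle; the first uses the initial advance.  */
        if (any_real)
          toy_advance_one_cycle (curr);
        any_real = true;
      }

  /* Only debug insns were seen: put the state back.  */
  if (!any_real)
    memcpy (curr, &saved, sizeof saved);

  return any_real;
}
```

The appeal of this variant is that the main loop stays as it is; the cost is
one extra state copy per block.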

Kind regards,

--
Maxim Kuvyrkov
https://www.linaro.org


> On Nov 8, 2023, at 06:49, Kewen.Lin <li...@linux.ibm.com> wrote:
> 
> Hi,
> 
> Gentle ping this:
> 
> https://gcc.gnu.org/pipermail/gcc-patches/2023-October/634201.html
> 
> BR,
> Kewen
> 
> on 2023/10/25 10:45, Kewen.Lin wrote:
>> Hi,
>> 
>> This is almost a repost of v2, which was posted at [1] in March,
>> except for:
>>  1) rebased from r14-4810 which is relatively up-to-date,
>>     some conflicts on "int to bool" return type change have
>>     been resolved;
>>  2) adjust commit log a bit;
>>  3) fix misspelled "articial" with "artificial" somewhere;
>> 
>> --
>> *v2 comments*:
>> 
>> To address Alexander's comments, v2 of this patch makes
>> the following main changes against v1:
>> 
>>  - Rename no_real_insns_p to no_real_nondebug_insns_p;
>>  - Introduce enum rgn_bb_deps_free_action for three
>>    kinds of actions to free deps;
>>  - Change function free_deps_for_bb_no_real_insns_p to
>>    resolve_forw_deps which only focuses on forward deps;
>>  - Extend the handlings to cover dbg-cnt sched_block,
>>    add one test case for it;
>>  - Move free_trg_info call in schedule_region to an
>>    appropriate place.
>> 
>> One thing I'm not sure about is the change in function
>> sched_rgn_local_finish.  Currently the call to
>> sched_rgn_local_free is guarded with !sel_sched_p (),
>> so I just follow it, but the initialization of those
>> structures (in sched_rgn_local_init) isn't guarded
>> with !sel_sched_p (), which looks odd.
>> 
>> --
>> 
>> As PR108273 shows, when a block has only NOTE_P and LABEL_P
>> insns in non-debug mode but has some extra DEBUG_INSN_P insns
>> in debug mode, the DFA states after scheduling it differ
>> between debug mode and non-debug mode.  In non-debug mode the
>> block meets no_real_insns_p and gets skipped; in debug mode
>> it gets scheduled even though it only has NOTE_P, LABEL_P
>> and DEBUG_INSN_P insns, and the call to advance_one_cycle
>> changes the DFA state.  PR108519 also shows that this issue
>> can be exposed by some scheduler changes.
>> 
>> This patch changes function no_real_insns_p to
>> function no_real_nondebug_insns_p by taking debug insns into
>> account, so that we do not try to schedule a block
>> having only NOTE_P, LABEL_P and DEBUG_INSN_P insns,
>> resulting in consistent DFA states between non-debug and
>> debug mode.
>> 
>> Changing no_real_insns_p to no_real_nondebug_insns_p caused
>> an ICE in free_block_dependencies.  The root cause is
>> that we create dependencies for debug insns; those
>> dependencies are expected to be resolved while scheduling
>> insns, but they get skipped after this change.
>> From checking the code, it looks reasonable to skip
>> computing block dependences for no_real_nondebug_insns_p
>> blocks.  There is also another issue, exposed when building
>> the SPEC2017 bmks with -O2 -g: we could skip scheduling a
>> block which already had its dependency graph built, so it
>> has dependencies computed and rgn_n_insns accumulated; the
>> later verification that the graph becomes exhausted by
>> scheduling would then fail as follows:
>> 
>>  /* Sanity check: verify that all region insns were
>>     scheduled.  */
>>    gcc_assert (sched_rgn_n_insns == rgn_n_insns);
>> 
>> In addition, some forward deps aren't resolved.
>> 
>> As Alexander pointed out, the current debug count handling
>> also suffers from a similar issue, so this patch handles these
>> two cases together: one is a block that gets skipped by
>> !dbg_cnt (sched_block), the other is a block which
>> is not no_real_nondebug_insns_p initially but becomes
>> no_real_nondebug_insns_p due to speculative scheduling.
>> 
>> This patch has been bootstrapped and regression-tested on
>> x86_64-redhat-linux, aarch64-linux-gnu and
>> powerpc64{,le}-linux-gnu.
>> 
>> I also verified this patch can pass SPEC2017 both intrate
>> and fprate bmks building at -g -O2/-O3.
>> 
>> Any thoughts?  Is it ok for trunk?
>> 
>> [1] v2: https://gcc.gnu.org/pipermail/gcc-patches/2023-March/614818.html
>> [2] v1: https://gcc.gnu.org/pipermail/gcc-patches/2023-March/614224.html
>> 
>> BR,
>> Kewen
>> -----
>> PR rtl-optimization/108273
>> 
>> gcc/ChangeLog:
>> 
>> * haifa-sched.cc (no_real_insns_p): Rename to ...
>> (no_real_nondebug_insns_p): ... this, and consider DEBUG_INSN_P insn.
>> * sched-ebb.cc (schedule_ebb): Replace no_real_insns_p with
>> no_real_nondebug_insns_p.
>> * sched-int.h (no_real_insns_p): Rename to ...
>> (no_real_nondebug_insns_p): ... this.
>> * sched-rgn.cc (enum rgn_bb_deps_free_action): New enum.
>> (bb_deps_free_actions): New static variable.
>> (compute_block_dependences): Skip for no_real_nondebug_insns_p.
>> (resolve_forw_deps): New function.
>> (free_block_dependencies): Check bb_deps_free_actions and call
>> function resolve_forw_deps for RGN_BB_DEPS_FREE_ARTIFICIAL.
>> (compute_priorities): Replace no_real_insns_p with
>> no_real_nondebug_insns_p.
>> (schedule_region): Replace no_real_insns_p with
>> no_real_nondebug_insns_p, set RGN_BB_DEPS_FREE_ARTIFICIAL if the block
>> gets dependencies computed before but skipped now, fix up count
>> sched_rgn_n_insns for it too.  Call free_trg_info when the block
>> gets scheduled, and move sched_rgn_local_finish after the
>> free_block_dependencies loop.
>> (sched_rgn_local_init): Allocate and compute bb_deps_free_actions.
>> (sched_rgn_local_finish): Free bb_deps_free_actions.
>> * sel-sched.cc (sel_region_target_finish): Replace no_real_insns_p with
>> no_real_nondebug_insns_p.
>> 
>> gcc/testsuite/ChangeLog:
>> 
>> * gcc.target/powerpc/pr108273.c: New test.
>> ---
>> gcc/haifa-sched.cc                          |   9 +-
>> gcc/sched-ebb.cc                            |   2 +-
>> gcc/sched-int.h                             |   2 +-
>> gcc/sched-rgn.cc                            | 148 +++++++++++++++-----
>> gcc/sel-sched.cc                            |   3 +-
>> gcc/testsuite/gcc.target/powerpc/pr108273.c |  26 ++++
>> 6 files changed, 150 insertions(+), 40 deletions(-)
>> create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108273.c
>> 
>> diff --git a/gcc/haifa-sched.cc b/gcc/haifa-sched.cc
>> index 8e8add709b3..30cc90ec49f 100644
>> --- a/gcc/haifa-sched.cc
>> +++ b/gcc/haifa-sched.cc
>> @@ -5033,14 +5033,17 @@ get_ebb_head_tail (basic_block beg, basic_block end,
>>   *tailp = end_tail;
>> }
>> 
>> -/* Return true if there are no real insns in the range [ HEAD, TAIL ].  */
>> +/* Return true if there are no real nondebug insns in the range
>> +   [ HEAD, TAIL ].  */
>> 
>> bool
>> -no_real_insns_p (const rtx_insn *head, const rtx_insn *tail)
>> +no_real_nondebug_insns_p (const rtx_insn *head, const rtx_insn *tail)
>> {
>>   while (head != NEXT_INSN (tail))
>>     {
>> -      if (!NOTE_P (head) && !LABEL_P (head))
>> +      if (!NOTE_P (head)
>> +  && !LABEL_P (head)
>> +  && !DEBUG_INSN_P (head))
>> return false;
>>       head = NEXT_INSN (head);
>>     }
>> diff --git a/gcc/sched-ebb.cc b/gcc/sched-ebb.cc
>> index 110fcdbca4d..03d96290a7c 100644
>> --- a/gcc/sched-ebb.cc
>> +++ b/gcc/sched-ebb.cc
>> @@ -491,7 +491,7 @@ schedule_ebb (rtx_insn *head, rtx_insn *tail, bool modulo_scheduling)
>>   first_bb = BLOCK_FOR_INSN (head);
>>   last_bb = BLOCK_FOR_INSN (tail);
>> 
>> -  if (no_real_insns_p (head, tail))
>> +  if (no_real_nondebug_insns_p (head, tail))
>>     return BLOCK_FOR_INSN (tail);
>> 
>>   gcc_assert (INSN_P (head) && INSN_P (tail));
>> diff --git a/gcc/sched-int.h b/gcc/sched-int.h
>> index 64a2f0bcff9..adca494ade5 100644
>> --- a/gcc/sched-int.h
>> +++ b/gcc/sched-int.h
>> @@ -1397,7 +1397,7 @@ extern void free_global_sched_pressure_data (void);
>> extern int haifa_classify_insn (const_rtx);
>> extern void get_ebb_head_tail (basic_block, basic_block,
>>       rtx_insn **, rtx_insn **);
>> -extern bool no_real_insns_p (const rtx_insn *, const rtx_insn *);
>> +extern bool no_real_nondebug_insns_p (const rtx_insn *, const rtx_insn *);
>> 
>> extern int insn_sched_cost (rtx_insn *);
>> extern int dep_cost_1 (dep_t, dw_t);
>> diff --git a/gcc/sched-rgn.cc b/gcc/sched-rgn.cc
>> index e5964f54ead..2549e834aa8 100644
>> --- a/gcc/sched-rgn.cc
>> +++ b/gcc/sched-rgn.cc
>> @@ -213,6 +213,22 @@ static int rgn_nr_edges;
>> /* Array of size rgn_nr_edges.  */
>> static edge *rgn_edges;
>> 
>> +/* Possible actions for dependencies freeing.  */
>> +enum rgn_bb_deps_free_action
>> +{
>> +  /* This block doesn't get dependencies computed so don't need to free.  */
>> +  RGN_BB_DEPS_FREE_NO,
>> +  /* This block gets scheduled normally so free dependencies as usual.  */
>> +  RGN_BB_DEPS_FREE_NORMAL,
>> +  /* This block gets skipped in scheduling but has dependencies computed early,
>> +     need to free the forward list artificially.  */
>> +  RGN_BB_DEPS_FREE_ARTIFICIAL
>> +};
>> +
>> +/* For basic block i, bb_deps_free_actions[i] indicates which action needs
>> +   to be taken for freeing its dependencies.  */
>> +static enum rgn_bb_deps_free_action *bb_deps_free_actions;
>> +
>> /* Mapping from each edge in the graph to its number in the rgn.  */
>> #define EDGE_TO_BIT(edge) ((int)(size_t)(edge)->aux)
>> #define SET_EDGE_TO_BIT(edge,nr) ((edge)->aux = (void *)(size_t)(nr))
>> @@ -2735,6 +2751,15 @@ compute_block_dependences (int bb)
>>   gcc_assert (EBB_FIRST_BB (bb) == EBB_LAST_BB (bb));
>>   get_ebb_head_tail (EBB_FIRST_BB (bb), EBB_LAST_BB (bb), &head, &tail);
>> 
>> +  /* Don't compute block dependences if there are no real nondebug insns.  */
>> +  if (no_real_nondebug_insns_p (head, tail))
>> +    {
>> +      if (current_nr_blocks > 1)
>> + propagate_deps (bb, &tmp_deps);
>> +      free_deps (&tmp_deps);
>> +      return;
>> +    }
>> +
>>   sched_analyze (&tmp_deps, head, tail);
>> 
>>   add_branch_dependences (head, tail);
>> @@ -2749,6 +2774,24 @@ compute_block_dependences (int bb)
>>     targetm.sched.dependencies_evaluation_hook (head, tail);
>> }
>> 
>> +/* Artificially resolve forward dependencies for instructions HEAD to TAIL.  */
>> +
>> +static void
>> +resolve_forw_deps (rtx_insn *head, rtx_insn *tail)
>> +{
>> +  rtx_insn *insn;
>> +  rtx_insn *next_tail = NEXT_INSN (tail);
>> +  sd_iterator_def sd_it;
>> +  dep_t dep;
>> +
>> +  /* There could be some insns which get skipped in scheduling but we compute
>> +     dependencies for them previously, so make them resolved.  */
>> +  for (insn = head; insn != next_tail; insn = NEXT_INSN (insn))
>> +    for (sd_it = sd_iterator_start (insn, SD_LIST_FORW);
>> + sd_iterator_cond (&sd_it, &dep);)
>> +      sd_resolve_dep (sd_it);
>> +}
>> +
>> /* Free dependencies of instructions inside BB.  */
>> static void
>> free_block_dependencies (int bb)
>> @@ -2758,9 +2801,12 @@ free_block_dependencies (int bb)
>> 
>>   get_ebb_head_tail (EBB_FIRST_BB (bb), EBB_LAST_BB (bb), &head, &tail);
>> 
>> -  if (no_real_insns_p (head, tail))
>> +  if (bb_deps_free_actions[bb] == RGN_BB_DEPS_FREE_NO)
>>     return;
>> 
>> +  if (bb_deps_free_actions[bb] == RGN_BB_DEPS_FREE_ARTIFICIAL)
>> +    resolve_forw_deps (head, tail);
>> +
>>   sched_free_deps (head, tail, true);
>> }
>> 
>> @@ -3024,7 +3070,7 @@ compute_priorities (void)
>>       gcc_assert (EBB_FIRST_BB (bb) == EBB_LAST_BB (bb));
>>       get_ebb_head_tail (EBB_FIRST_BB (bb), EBB_LAST_BB (bb), &head, &tail);
>> 
>> -      if (no_real_insns_p (head, tail))
>> +      if (no_real_nondebug_insns_p (head, tail))
>> continue;
>> 
>>       rgn_n_insns += set_priorities (head, tail);
>> @@ -3158,7 +3204,7 @@ schedule_region (int rgn)
>> 
>>  get_ebb_head_tail (first_bb, last_bb, &head, &tail);
>> 
>> -  if (no_real_insns_p (head, tail))
>> +  if (no_real_nondebug_insns_p (head, tail))
>>    {
>>      gcc_assert (first_bb == last_bb);
>>      continue;
>> @@ -3178,44 +3224,62 @@ schedule_region (int rgn)
>> 
>>       get_ebb_head_tail (first_bb, last_bb, &head, &tail);
>> 
>> -      if (no_real_insns_p (head, tail))
>> +      if (no_real_nondebug_insns_p (head, tail))
>> {
>>  gcc_assert (first_bb == last_bb);
>>  save_state_for_fallthru_edge (last_bb, bb_state[first_bb->index]);
>> -  continue;
>> +
>> +  if (bb_deps_free_actions[bb] == RGN_BB_DEPS_FREE_NO)
>> +    continue;
>> +
>> +  /* As it's not no_real_nondebug_insns_p initially, then it has some
>> +     dependencies computed so free it artificially.  */
>> +  bb_deps_free_actions[bb] = RGN_BB_DEPS_FREE_ARTIFICIAL;
>> }
>> +      else
>> + {
>> +  current_sched_info->prev_head = PREV_INSN (head);
>> +  current_sched_info->next_tail = NEXT_INSN (tail);
>> 
>> -      current_sched_info->prev_head = PREV_INSN (head);
>> -      current_sched_info->next_tail = NEXT_INSN (tail);
>> +  remove_notes (head, tail);
>> 
>> -      remove_notes (head, tail);
>> +  unlink_bb_notes (first_bb, last_bb);
>> 
>> -      unlink_bb_notes (first_bb, last_bb);
>> +  target_bb = bb;
>> 
>> -      target_bb = bb;
>> +  gcc_assert (flag_schedule_interblock || current_nr_blocks == 1);
>> +  current_sched_info->queue_must_finish_empty = current_nr_blocks == 1;
>> 
>> -      gcc_assert (flag_schedule_interblock || current_nr_blocks == 1);
>> -      current_sched_info->queue_must_finish_empty = current_nr_blocks == 1;
>> +  curr_bb = first_bb;
>> +  if (dbg_cnt (sched_block))
>> +    {
>> +      int saved_last_basic_block = last_basic_block_for_fn (cfun);
>> 
>> -      curr_bb = first_bb;
>> -      if (dbg_cnt (sched_block))
>> -        {
>> -  int saved_last_basic_block = last_basic_block_for_fn (cfun);
>> +      schedule_block (&curr_bb, bb_state[first_bb->index]);
>> +      gcc_assert (EBB_FIRST_BB (bb) == first_bb);
>> +      sched_rgn_n_insns += sched_n_insns;
>> +      realloc_bb_state_array (saved_last_basic_block);
>> +      save_state_for_fallthru_edge (last_bb, curr_state);
>> 
>> -  schedule_block (&curr_bb, bb_state[first_bb->index]);
>> -  gcc_assert (EBB_FIRST_BB (bb) == first_bb);
>> -  sched_rgn_n_insns += sched_n_insns;
>> -  realloc_bb_state_array (saved_last_basic_block);
>> -  save_state_for_fallthru_edge (last_bb, curr_state);
>> -        }
>> -      else
>> -        {
>> -          sched_rgn_n_insns += rgn_n_insns;
>> -        }
>> +      /* Clean up.  */
>> +      if (current_nr_blocks > 1)
>> + free_trg_info ();
>> +    }
>> +  else
>> +    bb_deps_free_actions[bb] = RGN_BB_DEPS_FREE_ARTIFICIAL;
>> + }
>> 
>> -      /* Clean up.  */
>> -      if (current_nr_blocks > 1)
>> - free_trg_info ();
>> +      /* We have counted this block when computing rgn_n_insns
>> + previously, so need to fix up sched_rgn_n_insns now.  */
>> +      if (bb_deps_free_actions[bb] == RGN_BB_DEPS_FREE_ARTIFICIAL)
>> + {
>> +  while (head != NEXT_INSN (tail))
>> +    {
>> +      if (INSN_P (head))
>> + sched_rgn_n_insns++;
>> +      head = NEXT_INSN (head);
>> +    }
>> + }
>>     }
>> 
>>   /* Sanity check: verify that all region insns were scheduled.  */
>> @@ -3223,13 +3287,13 @@ schedule_region (int rgn)
>> 
>>   sched_finish_ready_list ();
>> 
>> -  /* Done with this region.  */
>> -  sched_rgn_local_finish ();
>> -
>>   /* Free dependencies.  */
>>   for (bb = 0; bb < current_nr_blocks; ++bb)
>>     free_block_dependencies (bb);
>> 
>> +  /* Done with this region.  */
>> +  sched_rgn_local_finish ();
>> +
>>   gcc_assert (haifa_recovery_bb_ever_added_p
>>      || deps_pools_are_empty_p ());
>> }
>> @@ -3450,6 +3514,19 @@ sched_rgn_local_init (int rgn)
>>    e->aux = NULL;
>>         }
>>     }
>> +
>> +  /* Initialize bb_deps_free_actions.  */
>> +  bb_deps_free_actions
>> +    = XNEWVEC (enum rgn_bb_deps_free_action, current_nr_blocks);
>> +  for (bb = 0; bb < current_nr_blocks; bb++)
>> +    {
>> +      rtx_insn *head, *tail;
>> +      get_ebb_head_tail (EBB_FIRST_BB (bb), EBB_LAST_BB (bb), &head, &tail);
>> +      if (no_real_nondebug_insns_p (head, tail))
>> + bb_deps_free_actions[bb] = RGN_BB_DEPS_FREE_NO;
>> +      else
>> + bb_deps_free_actions[bb] = RGN_BB_DEPS_FREE_NORMAL;
>> +    }
>> }
>> 
>> /* Free data computed for the finished region.  */
>> @@ -3467,9 +3544,12 @@ sched_rgn_local_free (void)
>> void
>> sched_rgn_local_finish (void)
>> {
>> -  if (current_nr_blocks > 1 && !sel_sched_p ())
>> +  if (!sel_sched_p ())
>>     {
>> -      sched_rgn_local_free ();
>> +      if (current_nr_blocks > 1)
>> + sched_rgn_local_free ();
>> +
>> +      free (bb_deps_free_actions);
>>     }
>> }
>> 
>> diff --git a/gcc/sel-sched.cc b/gcc/sel-sched.cc
>> index 1925f4a9461..8310c892e13 100644
>> --- a/gcc/sel-sched.cc
>> +++ b/gcc/sel-sched.cc
>> @@ -7213,7 +7213,8 @@ sel_region_target_finish (bool reset_sched_cycles_p)
>> 
>>       find_ebb_boundaries (EBB_FIRST_BB (i), scheduled_blocks);
>> 
>> -      if (no_real_insns_p (current_sched_info->head, current_sched_info->tail))
>> +      if (no_real_nondebug_insns_p (current_sched_info->head,
>> +    current_sched_info->tail))
>> continue;
>> 
>>       if (reset_sched_cycles_p)
>> diff --git a/gcc/testsuite/gcc.target/powerpc/pr108273.c b/gcc/testsuite/gcc.target/powerpc/pr108273.c
>> new file mode 100644
>> index 00000000000..937224eaa69
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/pr108273.c
>> @@ -0,0 +1,26 @@
>> +/* { dg-options "-O2 -fdbg-cnt=sched_block:1" } */
>> +/* { dg-prune-output {\*\*\*dbgcnt:.*limit.*reached} } */
>> +
>> +/* Verify there is no ICE.  */
>> +
>> +int a, b, c, e, f;
>> +float d;
>> +
>> +void
>> +g ()
>> +{
>> +  float h, i[1];
>> +  for (; f;)
>> +    if (c)
>> +      {
>> + d *e;
>> + if (b)
>> +  {
>> +    float *j = i;
>> +    j[0] += 0;
>> +  }
>> + h += d;
>> +      }
>> +  if (h)
>> +    a = i[0];
>> +}
>> --
>> 2.39.1
