Ayal Zaks <ayal.z...@gmail.com> writes:
>> OK.  For the follow-on iv patch, it seemed easier to keep both bounds
>> inclusive for the loop, then make the "end" exclusive when setting the
>> out parameters.  (Note that there shouldn't be any problem with overflow
>> when making the bound exclusive, because the size of the range has been
>> restricted to "ii" by that stage.  The follow-on patch does use
>> saturating maths for all ops though.)
>
> Sure, no problem having "end" inclusive, as it simplifies things. We
> can even keep it inclusive all the way through, iterating over: for (c
> = start; c != end+step; c += step)...

Yeah, I'd prefer that too.  I might do it once Revital's and
Roman's changes are in.

>> I don't have powerpc hardware that I can do meaningful performance
>> testing on, but I did run it through a Popular* Embedded Benchmark
>> on an ARM Cortex-A8 board with -O3 -fmodulo-sched
>> -fmodulo-sched-allow-regmoves.  There were no changes.  (And this is
>> a benchmark that does benefit from modulo scheduling, in some cases
>> by a significant amount.)
>>
> Ok, the patch should be neutral performance-wise. One could check btw
> if SMS produces exactly the same output in both cases, even w/o a
> powerpc (or any other) HW.

OK.  The patch does change the output a little because of the MEM_DEP
range thing.  To compensate for that, I tried compiling libav on ARM with:

Index: gcc/modulo-sched.c
===================================================================
--- gcc/modulo-sched.c  2011-08-05 11:23:16.000000000 +0100
+++ gcc/modulo-sched.c  2011-08-05 11:27:57.000000000 +0100
@@ -1680,7 +1680,7 @@ get_sched_window (partial_schedule_ptr p
                          p_st, early_start, e->latency);
 
              if (e->data_type == MEM_DEP)
-               end = MIN (end, SCHED_TIME (v_node) + ii - 1);
+               end = MIN (end, SCHED_TIME (v_node) + ii);
            }
          else if (dump_file)
             fprintf (dump_file, "the node is not scheduled\n");
@@ -1729,7 +1729,7 @@ get_sched_window (partial_schedule_ptr p
                          s_st, late_start, e->latency);
 
              if (e->data_type == MEM_DEP)
-               end = MAX (end, SCHED_TIME (v_node) - ii + 1);
+               end = MAX (end, SCHED_TIME (v_node) - ii);
              if (dump_file)
                  fprintf (dump_file, "end = %d\n", end);
 
@@ -1791,7 +1791,7 @@ get_sched_window (partial_schedule_ptr p
                 count_preds++;
 
              if (e->data_type == MEM_DEP)
-               end = MIN (end, SCHED_TIME (v_node) + ii - 1);
+               end = MIN (end, SCHED_TIME (v_node) + ii);
            }
           else if (dump_file)
             fprintf (dump_file, "the node is not scheduled\n");

applied, then tried the same thing with the revised patch below.
The output was the same.

(FWIW, libav did show up extra differences when using the patch
that I'd originally submitted.  They were due to the count_preds
and count_succs thing that you picked up in your review.)

>> Bootstrapped & regression-tested on powerpc-ibm-aix5.3 with the
>> additional patch:
>>
>> Index: gcc/opts.c
>> ===================================================================
>> --- gcc/opts.c  2011-08-03 10:56:50.000000000 +0100
>> +++ gcc/opts.c  2011-08-03 10:56:50.000000000 +0100
>> @@ -449,6 +449,8 @@ static const struct default_options defa
>>     { OPT_LEVELS_1_PLUS, OPT_ftree_ch, NULL, 1 },
>>     { OPT_LEVELS_1_PLUS, OPT_fcombine_stack_adjustments, NULL, 1 },
>>     { OPT_LEVELS_1_PLUS, OPT_fcompare_elim, NULL, 1 },
>> +    { OPT_LEVELS_1_PLUS, OPT_fmodulo_sched, NULL, 1 },
>> +    { OPT_LEVELS_1_PLUS, OPT_fmodulo_sched_allow_regmoves, NULL, 1 },
>>
>>     /* -O2 optimizations.  */
>>     { OPT_LEVELS_2_PLUS, OPT_finline_small_functions, NULL, 1 },
>>
>> applied for both the "before" and "after" runs.  OK to install?
>
>
> Yes, with just a couple of minor non-mandatory comments:
> 1. Setting "count_preds = psp_not_empty;" and "count_succs =
> pss_not_empty;" suggests that you want to decrement non-interesting
> neighbors instead of incrementing interesting ones. Otherwise can
> initialize to zero (doesn't really matter, just a bit confusing).

Yeah, good point.  I initialised them to zero, like you said,
and changed the reversal condition to:

  if (pss_not_empty && count_succs >= count_preds)

Which I have admit makes a lot more sense than what I submitted,
and avoids some unwanted changes in output.

> 2. I find it easier to read "late_start = MIN (late_start, early_start
> + (ii - 1));" instead of "late_start = MIN (early_start + (ii - 1),
> late_start);".

Oops, that was accidental.  Fixed.

> 3. Suggest to set step=1 at the beginning, explaining from the start
> that we first compute a forward range (start<end), and may later
> possibly revert it.

Yeah, that's better.

> 4. "late_start" would better be renamed "late_end".

Since you said this was optional, I decided to leave it as it is.
The name and value of "late start" come directly from the paper.

> 5. While we're at it, as you've unified the treatments, we can now
> actually do with "start" and "end" only, and do away with early_start
> and late_start altogether...

The IV patch needs the same distinction as we have now, so I'd prefer
to keep it.

Thanks for the review.  Here's what I applied after testing on
powerpc-ibm-aix5.3.

Richard


gcc/
        * modulo-sched.c (get_sched_window): Use just one loop for predecessors
        and one loop for successors.  Fix upper bound of memory range.

Index: gcc/modulo-sched.c
===================================================================
*** gcc/modulo-sched.c  2011-08-05 10:39:27.000000000 +0100
--- gcc/modulo-sched.c  2011-08-05 10:40:49.000000000 +0100
*************** #define MAX_SPLIT_NUM 10
*** 1630,1638 ****
  
  static int
  get_sched_window (partial_schedule_ptr ps, ddg_node_ptr u_node,
!                 sbitmap sched_nodes, int ii, int *start_p, int *step_p, int 
*end_p)
  {
    int start, step, end;
    ddg_edge_ptr e;
    sbitmap psp = sbitmap_alloc (ps->g->num_nodes);
    sbitmap pss = sbitmap_alloc (ps->g->num_nodes);
--- 1630,1640 ----
  
  static int
  get_sched_window (partial_schedule_ptr ps, ddg_node_ptr u_node,
!                 sbitmap sched_nodes, int ii, int *start_p, int *step_p,
!                 int *end_p)
  {
    int start, step, end;
+   int early_start, late_start;
    ddg_edge_ptr e;
    sbitmap psp = sbitmap_alloc (ps->g->num_nodes);
    sbitmap pss = sbitmap_alloc (ps->g->num_nodes);
*************** get_sched_window (partial_schedule_ptr p
*** 1640,1645 ****
--- 1642,1649 ----
    sbitmap u_node_succs = NODE_SUCCESSORS (u_node);
    int psp_not_empty;
    int pss_not_empty;
+   int count_preds;
+   int count_succs;
  
    /* 1. compute sched window for u (start, end, step).  */
    sbitmap_zero (psp);
*************** get_sched_window (partial_schedule_ptr p
*** 1647,1861 ****
    psp_not_empty = sbitmap_a_and_b_cg (psp, u_node_preds, sched_nodes);
    pss_not_empty = sbitmap_a_and_b_cg (pss, u_node_succs, sched_nodes);
  
!   if (psp_not_empty && !pss_not_empty)
!     {
!       int early_start = INT_MIN;
! 
!       end = INT_MAX;
!       for (e = u_node->in; e != 0; e = e->next_in)
!       {
!         ddg_node_ptr v_node = e->src;
! 
!           if (dump_file)
!             {
!             fprintf (dump_file, "\nProcessing edge: ");
!               print_ddg_edge (dump_file, e);
!             fprintf (dump_file,
!                      "\nScheduling %d (%d) in psp_not_empty,"
!                      " checking p %d (%d): ", u_node->cuid,
!                      INSN_UID (u_node->insn), v_node->cuid, INSN_UID
!                      (v_node->insn));
!             }
! 
!         if (TEST_BIT (sched_nodes, v_node->cuid))
!           {
!               int p_st = SCHED_TIME (v_node);
! 
!               early_start =
!                 MAX (early_start, p_st + e->latency - (e->distance * ii));
! 
!               if (dump_file)
!                 fprintf (dump_file,
!                          "pred st = %d; early_start = %d; latency: %d",
!                          p_st, early_start, e->latency);
! 
!             if (e->data_type == MEM_DEP)
!               end = MIN (end, SCHED_TIME (v_node) + ii - 1);
!           }
!          else if (dump_file)
!             fprintf (dump_file, "the node is not scheduled\n");
!       }
!       start = early_start;
!       end = MIN (end, early_start + ii);
!       /* Schedule the node close to it's predecessors.  */
!       step = 1;
! 
!       if (dump_file)
!         fprintf (dump_file,
!                "\nScheduling %d (%d) in a window (%d..%d) with step %d\n",
!                u_node->cuid, INSN_UID (u_node->insn), start, end, step);
!     }
! 
!   else if (!psp_not_empty && pss_not_empty)
!     {
!       int late_start = INT_MAX;
! 
!       end = INT_MIN;
!       for (e = u_node->out; e != 0; e = e->next_out)
!       {
!         ddg_node_ptr v_node = e->dest;
! 
!           if (dump_file)
!             {
!               fprintf (dump_file, "\nProcessing edge:");
!               print_ddg_edge (dump_file, e);
!               fprintf (dump_file,
!                        "\nScheduling %d (%d) in pss_not_empty,"
!                        " checking s %d (%d): ", u_node->cuid,
!                        INSN_UID (u_node->insn), v_node->cuid, INSN_UID
!                        (v_node->insn));
!             }
! 
!         if (TEST_BIT (sched_nodes, v_node->cuid))
!           {
!               int s_st = SCHED_TIME (v_node);
! 
!               late_start = MIN (late_start,
!                                 s_st - e->latency + (e->distance * ii));
! 
!               if (dump_file)
!                 fprintf (dump_file,
!                          "succ st = %d; late_start = %d; latency = %d",
!                          s_st, late_start, e->latency);
! 
!             if (e->data_type == MEM_DEP)
!               end = MAX (end, SCHED_TIME (v_node) - ii + 1);
!              if (dump_file)
!                  fprintf (dump_file, "end = %d\n", end);
! 
!           }
!           else if (dump_file)
!             fprintf (dump_file, "the node is not scheduled\n");
  
!       }
!       start = late_start;
!       end = MAX (end, late_start - ii);
!       /* Schedule the node close to it's successors.  */
!       step = -1;
  
!       if (dump_file)
!         fprintf (dump_file,
!                  "\nScheduling %d (%d) in a window (%d..%d) with step %d\n",
!                  u_node->cuid, INSN_UID (u_node->insn), start, end, step);
  
!     }
  
!   else if (psp_not_empty && pss_not_empty)
!     {
!       int early_start = INT_MIN;
!       int late_start = INT_MAX;
!       int count_preds = 0;
!       int count_succs = 0;
  
!       start = INT_MIN;
!       end = INT_MAX;
!       for (e = u_node->in; e != 0; e = e->next_in)
!       {
!         ddg_node_ptr v_node = e->src;
  
!         if (dump_file)
!           {
!               fprintf (dump_file, "\nProcessing edge:");
!               print_ddg_edge (dump_file, e);
              fprintf (dump_file,
!                      "\nScheduling %d (%d) in psp_pss_not_empty,"
!                      " checking p %d (%d): ", u_node->cuid, INSN_UID
!                      (u_node->insn), v_node->cuid, INSN_UID
!                      (v_node->insn));
!           }
! 
!         if (TEST_BIT (sched_nodes, v_node->cuid))
!           {
!               int p_st = SCHED_TIME (v_node);
! 
!             early_start = MAX (early_start,
!                                p_st + e->latency
!                                - (e->distance * ii));
! 
!               if (dump_file)
!                 fprintf (dump_file,
!                          "pred st = %d; early_start = %d; latency = %d",
!                          p_st, early_start, e->latency);
! 
!               if (e->type == TRUE_DEP && e->data_type == REG_DEP)
!                 count_preds++;
  
!             if (e->data_type == MEM_DEP)
!               end = MIN (end, SCHED_TIME (v_node) + ii - 1);
!           }
!           else if (dump_file)
!             fprintf (dump_file, "the node is not scheduled\n");
  
!       }
!       for (e = u_node->out; e != 0; e = e->next_out)
!       {
!         ddg_node_ptr v_node = e->dest;
  
!         if (dump_file)
!           {
!               fprintf (dump_file, "\nProcessing edge:");
!               print_ddg_edge (dump_file, e);
!             fprintf (dump_file,
!                      "\nScheduling %d (%d) in psp_pss_not_empty,"
!                      " checking s %d (%d): ", u_node->cuid, INSN_UID
!                      (u_node->insn), v_node->cuid, INSN_UID
!                      (v_node->insn));
!           }
  
!         if (TEST_BIT (sched_nodes, v_node->cuid))
!           {
!               int s_st = SCHED_TIME (v_node);
  
!             late_start = MIN (late_start,
!                               s_st - e->latency
!                               + (e->distance * ii));
  
!               if (dump_file)
!                 fprintf (dump_file,
!                          "succ st = %d; late_start = %d; latency = %d",
!                          s_st, late_start, e->latency);
  
!                if (e->type == TRUE_DEP && e->data_type == REG_DEP)
!                  count_succs++;
  
!             if (e->data_type == MEM_DEP)
!               start = MAX (start, SCHED_TIME (v_node) - ii + 1);
!           }
!           else if (dump_file)
!             fprintf (dump_file, "the node is not scheduled\n");
  
!       }
!       start = MAX (start, early_start);
!       end = MIN (end, MIN (early_start + ii, late_start + 1));
!       step = 1;
!       /* If there are more successors than predecessors schedule the
!          node close to it's successors.  */
!       if (count_succs >= count_preds)
!         {
!           int old_start = start;
  
!           start = end - 1;
!           end = old_start - 1;
!           step = -1;
!         }
!     }
!   else /* psp is empty && pss is empty.  */
!     {
!       start = SCHED_ASAP (u_node);
!       end = start + ii;
!       step = 1;
      }
  
    *start_p = start;
    *step_p = step;
    *end_p = end;
--- 1651,1772 ----
    psp_not_empty = sbitmap_a_and_b_cg (psp, u_node_preds, sched_nodes);
    pss_not_empty = sbitmap_a_and_b_cg (pss, u_node_succs, sched_nodes);
  
!   /* We first compute a forward range (start <= end), then decide whether
!      to reverse it.  */
!   early_start = INT_MIN;
!   late_start = INT_MAX;
!   start = INT_MIN;
!   end = INT_MAX;
!   step = 1;
! 
!   count_preds = 0;
!   count_succs = 0;
! 
!   /* Calculate early_start and limit end.  Both bounds are inclusive.  */
!   if (psp_not_empty)
!     for (e = u_node->in; e != 0; e = e->next_in)
!       {
!       ddg_node_ptr v_node = e->src;
  
!       if (dump_file)
!         {
!           fprintf (dump_file, "\nProcessing edge: ");
!           print_ddg_edge (dump_file, e);
!           fprintf (dump_file,
!                    "\nScheduling %d (%d) in psp_not_empty,"
!                    " checking p %d (%d): ", u_node->cuid,
!                    INSN_UID (u_node->insn), v_node->cuid, INSN_UID
!                    (v_node->insn));
!         }
  
!       if (TEST_BIT (sched_nodes, v_node->cuid))
!         {
!           int p_st = SCHED_TIME (v_node);
  
!           early_start = MAX (early_start,
!                              p_st + e->latency - (e->distance * ii));
  
!           if (e->data_type == MEM_DEP)
!             end = MIN (end, p_st + ii - 1);
  
!           if (e->type == TRUE_DEP && e->data_type == REG_DEP)
!             count_preds++;
  
!           if (dump_file)
              fprintf (dump_file,
!                      "pred st = %d; early_start = %d; latency: %d;"
!                      " end: %d\n", p_st, early_start, e->latency, end);
  
!         }
!       else if (dump_file)
!         fprintf (dump_file, "the node is not scheduled\n");
!       }
  
!   /* Calculate late_start and limit start.  Both bounds are inclusive.  */
!   if (pss_not_empty)
!     for (e = u_node->out; e != 0; e = e->next_out)
!       {
!       ddg_node_ptr v_node = e->dest;
  
!       if (dump_file)
!         {
!           fprintf (dump_file, "\nProcessing edge:");
!           print_ddg_edge (dump_file, e);
!           fprintf (dump_file,
!                    "\nScheduling %d (%d) in pss_not_empty,"
!                    " checking s %d (%d): ", u_node->cuid,
!                    INSN_UID (u_node->insn), v_node->cuid, INSN_UID
!                    (v_node->insn));
!         }
  
!       if (TEST_BIT (sched_nodes, v_node->cuid))
!         {
!           int s_st = SCHED_TIME (v_node);
  
!           late_start = MIN (late_start,
!                             s_st - e->latency + (e->distance * ii));
  
!           if (e->data_type == MEM_DEP)
!             start = MAX (start, s_st - ii + 1);
  
!           if (e->type == TRUE_DEP && e->data_type == REG_DEP)
!             count_succs++;
  
!           if (dump_file)
!             fprintf (dump_file,
!                      "succ st = %d; late_start = %d; latency = %d;"
!                      " start=%d", s_st, late_start, e->latency, start);
  
!         }
!       else if (dump_file)
!         fprintf (dump_file, "the node is not scheduled\n");
!       }
  
!   /* Get a target scheduling window no bigger than ii.  */
!   if (early_start == INT_MIN && late_start == INT_MAX)
!     early_start = SCHED_ASAP (u_node);
!   else if (early_start == INT_MIN)
!     early_start = late_start - (ii - 1);
!   late_start = MIN (late_start, early_start + (ii - 1));
! 
!   /* Apply memory dependence limits.  */
!   start = MAX (start, early_start);
!   end = MIN (end, late_start);
! 
!   /* If there are at least as many successors as predecessors, schedule the
!      node close to its successors.  */
!   if (pss_not_empty && count_succs >= count_preds)
!     {
!       int tmp = end;
!       end = start;
!       start = tmp;
!       step = -1;
      }
  
+   /* Now that we've finalized the window, make END an exclusive rather
+      than an inclusive bound.  */
+   end += step;
+ 
    *start_p = start;
    *step_p = step;
    *end_p = end;
*************** get_sched_window (partial_schedule_ptr p
*** 1867,1876 ****
        if (dump_file)
        fprintf (dump_file, "\nEmpty window: start=%d, end=%d, step=%d\n",
                 start, end, step);
!     return -1;
      }
  
!     return 0;
  }
  
  /* Calculate MUST_PRECEDE/MUST_FOLLOW bitmaps of U_NODE; which is the
--- 1778,1787 ----
        if (dump_file)
        fprintf (dump_file, "\nEmpty window: start=%d, end=%d, step=%d\n",
                 start, end, step);
!       return -1;
      }
  
!   return 0;
  }
  
  /* Calculate MUST_PRECEDE/MUST_FOLLOW bitmaps of U_NODE; which is the

Reply via email to