Hi Richard,

2011/8/4 Richard Sandiford <richard.sandif...@linaro.org>
>
> Hi Ayal,
>
> Thanks to you and Revital for the replies.  The reason I asked is that
> I wanted to rewrite gen_sched_window so that it has only one loop over
> the PSPs and one loop over the PSSs.


This rewrite makes perfect sense regardless of any follow-up patches,
as it clarifies and reuses code.

>
> I have a follow-up patch to use
> iv analysis to reduce the number of memory dependencies (or at least
> increase the distance between them), and it was easier to do that
> after this change.


Reducing memory dependencies is definitely important.

>
> Ayal Zaks <ayal.z...@gmail.com> writes:
> >>> I ask because in the final range:
> >>>
> >>>       start = early_start;
> >>>       end = MIN (end, early_start + ii);
> >>>       /* Schedule the node close to it's predecessors.  */
> >>>       step = 1;
> >>>
> >>> END is an exclusive bound.  It seems like we might be double-counting
> > here,
> >>> and effectively limiting the schedule to SCHED_TIME (v_node) + ii - 2.
> >>
> >>
> >>Yes, I think it indeed should be fixed. Thanks for reporting on this.
> >>
> >>Revital
> >
> > Agreed;
> >
> >                       if (e->data_type == MEM_DEP)
> >                                 end = MIN (end, SCHED_TIME (v_node) + ii - 
> > 1);
> >
> > should be replaced with
> >
> >                       if (e->data_type == MEM_DEP)
> >                                 end = MIN (end, p_st + ii);
> >
> > also for the (3rd) case when there are both previously-scheduled
> > predessors and previously-scheduled successors. The range is inclusive
> > of start and exclusive of end: for (c = start; c != end; c += step)...
>
> OK.  For the follow-on iv patch, it seemed easier to keep both bounds
> inclusive for the loop, then make the "end" exclusive when setting the
> out parameters.  (Note that there shouldn't be any problem with overflow
> when making the bound exclusive, because the size of the range has been
> restricted to "ii" by that stage.  The follow-on patch does use
> saturating maths for all ops though.)


Sure, no problem having "end" inclusive, as it simplifies things. We
can even keep it inclusive all the way through, iterating over: for (c
= start; c != end+step; c += step)...
>
> >>This doesn't seem to be in the paper, and the comment suggests
> >>"count_succs > count_preds" rather than "count_succs >= count_preds".
> >>Is the ">=" vs ">" important?
> >
> > I think not: in both cases you'll be trying to minimize the same
> > number of live ranges. But indeed it's good to be consistent with the
> > comment.
>
> OK.  For no particular reason other than cowardice, I ended up keeping
> this as:
>
> !   if (count_succs && count_succs >= count_preds)
>
> The reason for asking was that:
>
> !   if (count_succs > count_preds)
>
> seemed more natural, and would match the existing comment.  I'm happy
> to test that instead if you prefer.
>
I wouldn't worry about this tie breaker, unless there's a reason (in
which case the reason should hopefully provide a secondary criteria).

>
> I don't have powerpc hardware that I can do meaningful performance
> testing on, but I did run it through a Popular* Embedded Benchmark
> on an ARM Cortex-A8 board with -O3 -fmodulo-sched
> -fmodulo-sched-allow-regmoves.  There were no changes.  (And this is
> a benchmark that does benefit from modulo scheduling, in some cases
> by a significant amount.)
>
Ok, the patch should be neutral performance-wise. One could check btw
if SMS produces exactly the same output in both cases, even w/o a
powerpc (or any other) HW.

>
> Bootstrapped & regression-tested on powerpc-ibm-aix5.3 with the
> additional patch:
>
> Index: gcc/opts.c
> ===================================================================
> --- gcc/opts.c  2011-08-03 10:56:50.000000000 +0100
> +++ gcc/opts.c  2011-08-03 10:56:50.000000000 +0100
> @@ -449,6 +449,8 @@ static const struct default_options defa
>     { OPT_LEVELS_1_PLUS, OPT_ftree_ch, NULL, 1 },
>     { OPT_LEVELS_1_PLUS, OPT_fcombine_stack_adjustments, NULL, 1 },
>     { OPT_LEVELS_1_PLUS, OPT_fcompare_elim, NULL, 1 },
> +    { OPT_LEVELS_1_PLUS, OPT_fmodulo_sched, NULL, 1 },
> +    { OPT_LEVELS_1_PLUS, OPT_fmodulo_sched_allow_regmoves, NULL, 1 },
>
>     /* -O2 optimizations.  */
>     { OPT_LEVELS_2_PLUS, OPT_finline_small_functions, NULL, 1 },
>
> applied for both the "before" and "after" runs.  OK to install?


Yes, with just a couple of minor non-mandatory comments:
1. Setting "count_preds = psp_not_empty;" and "count_succs =
pss_not_empty;" suggests that you want to decrement non-interesting
neighbors instead of incrementing interesting ones. Otherwise can
initialize to zero (doesn't really matter, just a bit confusing).
2. I find it easier to read "late_start = MIN (late_start, early_start
+ (ii - 1));" instead of "late_start = MIN (early_start + (ii - 1),
late_start);".
3. Suggest to set step=1 at the beginning, explaining from the start
that we first compute a forward range (start<end), and may later
possibly revert it.
4. "late_start" would better be renamed "late_end".
5. While we're at it, as you've unified the treatments, we can now
actually do with "start" and "end" only, and do away with early_start
and late_start altogether...

Thanks,
Ayal.

>
> Thanks,
> Richard
>
>
> gcc/
>        * modulo-sched.c (get_sched_window): Use just one loop for predecessors
>        and one loop for successors.  Fix upper bound of memory range.
>
> Index: gcc/modulo-sched.c
> ===================================================================
> *** gcc/modulo-sched.c  2011-08-04 09:09:29.000000000 +0100
> --- gcc/modulo-sched.c  2011-08-04 09:49:16.000000000 +0100
> *************** #define MAX_SPLIT_NUM 10
> *** 1630,1638 ****
>
>  static int
>  get_sched_window (partial_schedule_ptr ps, ddg_node_ptr u_node,
> !                 sbitmap sched_nodes, int ii, int *start_p, int *step_p, int 
> *end_p)
>  {
>    int start, step, end;
>    ddg_edge_ptr e;
>    sbitmap psp = sbitmap_alloc (ps->g->num_nodes);
>    sbitmap pss = sbitmap_alloc (ps->g->num_nodes);
> --- 1630,1640 ----
>
>  static int
>  get_sched_window (partial_schedule_ptr ps, ddg_node_ptr u_node,
> !                 sbitmap sched_nodes, int ii, int *start_p, int *step_p,
> !                 int *end_p)
>  {
>    int start, step, end;
> +   int early_start, late_start;
>    ddg_edge_ptr e;
>    sbitmap psp = sbitmap_alloc (ps->g->num_nodes);
>    sbitmap pss = sbitmap_alloc (ps->g->num_nodes);
> *************** get_sched_window (partial_schedule_ptr p
> *** 1640,1645 ****
> --- 1642,1649 ----
>    sbitmap u_node_succs = NODE_SUCCESSORS (u_node);
>    int psp_not_empty;
>    int pss_not_empty;
> +   int count_preds;
> +   int count_succs;
>
>    /* 1. compute sched window for u (start, end, step).  */
>    sbitmap_zero (psp);
> *************** get_sched_window (partial_schedule_ptr p
> *** 1647,1861 ****
>    psp_not_empty = sbitmap_a_and_b_cg (psp, u_node_preds, sched_nodes);
>    pss_not_empty = sbitmap_a_and_b_cg (pss, u_node_succs, sched_nodes);
>
> !   if (psp_not_empty && !pss_not_empty)
> !     {
> !       int early_start = INT_MIN;
> !
> !       end = INT_MAX;
> !       for (e = u_node->in; e != 0; e = e->next_in)
> !       {
> !         ddg_node_ptr v_node = e->src;
> !
> !           if (dump_file)
> !             {
> !             fprintf (dump_file, "\nProcessing edge: ");
> !               print_ddg_edge (dump_file, e);
> !             fprintf (dump_file,
> !                      "\nScheduling %d (%d) in psp_not_empty,"
> !                      " checking p %d (%d): ", u_node->cuid,
> !                      INSN_UID (u_node->insn), v_node->cuid, INSN_UID
> !                      (v_node->insn));
> !             }
> !
> !         if (TEST_BIT (sched_nodes, v_node->cuid))
> !           {
> !               int p_st = SCHED_TIME (v_node);
> !
> !               early_start =
> !                 MAX (early_start, p_st + e->latency - (e->distance * ii));
> !
> !               if (dump_file)
> !                 fprintf (dump_file,
> !                          "pred st = %d; early_start = %d; latency: %d",
> !                          p_st, early_start, e->latency);
> !
> !             if (e->data_type == MEM_DEP)
> !               end = MIN (end, SCHED_TIME (v_node) + ii - 1);
> !           }
> !          else if (dump_file)
> !             fprintf (dump_file, "the node is not scheduled\n");
> !       }
> !       start = early_start;
> !       end = MIN (end, early_start + ii);
> !       /* Schedule the node close to it's predecessors.  */
> !       step = 1;
> !
> !       if (dump_file)
> !         fprintf (dump_file,
> !                "\nScheduling %d (%d) in a window (%d..%d) with step %d\n",
> !                u_node->cuid, INSN_UID (u_node->insn), start, end, step);
> !     }
> !
> !   else if (!psp_not_empty && pss_not_empty)
> !     {
> !       int late_start = INT_MAX;
> !
> !       end = INT_MIN;
> !       for (e = u_node->out; e != 0; e = e->next_out)
> !       {
> !         ddg_node_ptr v_node = e->dest;
> !
> !           if (dump_file)
> !             {
> !               fprintf (dump_file, "\nProcessing edge:");
> !               print_ddg_edge (dump_file, e);
> !               fprintf (dump_file,
> !                        "\nScheduling %d (%d) in pss_not_empty,"
> !                        " checking s %d (%d): ", u_node->cuid,
> !                        INSN_UID (u_node->insn), v_node->cuid, INSN_UID
> !                        (v_node->insn));
> !             }
> !
> !         if (TEST_BIT (sched_nodes, v_node->cuid))
> !           {
> !               int s_st = SCHED_TIME (v_node);
> !
> !               late_start = MIN (late_start,
> !                                 s_st - e->latency + (e->distance * ii));
> !
> !               if (dump_file)
> !                 fprintf (dump_file,
> !                          "succ st = %d; late_start = %d; latency = %d",
> !                          s_st, late_start, e->latency);
> !
> !             if (e->data_type == MEM_DEP)
> !               end = MAX (end, SCHED_TIME (v_node) - ii + 1);
> !              if (dump_file)
> !                  fprintf (dump_file, "end = %d\n", end);
> !
> !           }
> !           else if (dump_file)
> !             fprintf (dump_file, "the node is not scheduled\n");
>
> !       }
> !       start = late_start;
> !       end = MAX (end, late_start - ii);
> !       /* Schedule the node close to it's successors.  */
> !       step = -1;
>
> !       if (dump_file)
> !         fprintf (dump_file,
> !                  "\nScheduling %d (%d) in a window (%d..%d) with step %d\n",
> !                  u_node->cuid, INSN_UID (u_node->insn), start, end, step);
>
> !     }
>
> !   else if (psp_not_empty && pss_not_empty)
> !     {
> !       int early_start = INT_MIN;
> !       int late_start = INT_MAX;
> !       int count_preds = 0;
> !       int count_succs = 0;
>
> !       start = INT_MIN;
> !       end = INT_MAX;
> !       for (e = u_node->in; e != 0; e = e->next_in)
> !       {
> !         ddg_node_ptr v_node = e->src;
>
> !         if (dump_file)
> !           {
> !               fprintf (dump_file, "\nProcessing edge:");
> !               print_ddg_edge (dump_file, e);
>              fprintf (dump_file,
> !                      "\nScheduling %d (%d) in psp_pss_not_empty,"
> !                      " checking p %d (%d): ", u_node->cuid, INSN_UID
> !                      (u_node->insn), v_node->cuid, INSN_UID
> !                      (v_node->insn));
> !           }
> !
> !         if (TEST_BIT (sched_nodes, v_node->cuid))
> !           {
> !               int p_st = SCHED_TIME (v_node);
> !
> !             early_start = MAX (early_start,
> !                                p_st + e->latency
> !                                - (e->distance * ii));
> !
> !               if (dump_file)
> !                 fprintf (dump_file,
> !                          "pred st = %d; early_start = %d; latency = %d",
> !                          p_st, early_start, e->latency);
> !
> !               if (e->type == TRUE_DEP && e->data_type == REG_DEP)
> !                 count_preds++;
>
> !             if (e->data_type == MEM_DEP)
> !               end = MIN (end, SCHED_TIME (v_node) + ii - 1);
> !           }
> !           else if (dump_file)
> !             fprintf (dump_file, "the node is not scheduled\n");
>
> !       }
> !       for (e = u_node->out; e != 0; e = e->next_out)
> !       {
> !         ddg_node_ptr v_node = e->dest;
>
> !         if (dump_file)
> !           {
> !               fprintf (dump_file, "\nProcessing edge:");
> !               print_ddg_edge (dump_file, e);
> !             fprintf (dump_file,
> !                      "\nScheduling %d (%d) in psp_pss_not_empty,"
> !                      " checking s %d (%d): ", u_node->cuid, INSN_UID
> !                      (u_node->insn), v_node->cuid, INSN_UID
> !                      (v_node->insn));
> !           }
>
> !         if (TEST_BIT (sched_nodes, v_node->cuid))
> !           {
> !               int s_st = SCHED_TIME (v_node);
>
> !             late_start = MIN (late_start,
> !                               s_st - e->latency
> !                               + (e->distance * ii));
>
> !               if (dump_file)
> !                 fprintf (dump_file,
> !                          "succ st = %d; late_start = %d; latency = %d",
> !                          s_st, late_start, e->latency);
>
> !                if (e->type == TRUE_DEP && e->data_type == REG_DEP)
> !                  count_succs++;
>
> !             if (e->data_type == MEM_DEP)
> !               start = MAX (start, SCHED_TIME (v_node) - ii + 1);
> !           }
> !           else if (dump_file)
> !             fprintf (dump_file, "the node is not scheduled\n");
>
> !       }
> !       start = MAX (start, early_start);
> !       end = MIN (end, MIN (early_start + ii, late_start + 1));
> !       step = 1;
> !       /* If there are more successors than predecessors schedule the
> !          node close to it's successors.  */
> !       if (count_succs >= count_preds)
> !         {
> !           int old_start = start;
>
> !           start = end - 1;
> !           end = old_start - 1;
> !           step = -1;
> !         }
> !     }
> !   else /* psp is empty && pss is empty.  */
> !     {
> !       start = SCHED_ASAP (u_node);
> !       end = start + ii;
> !       step = 1;
>      }
>
>    *start_p = start;
>    *step_p = step;
>    *end_p = end;
> --- 1651,1769 ----
>    psp_not_empty = sbitmap_a_and_b_cg (psp, u_node_preds, sched_nodes);
>    pss_not_empty = sbitmap_a_and_b_cg (pss, u_node_succs, sched_nodes);
>
> !   early_start = INT_MIN;
> !   late_start = INT_MAX;
> !   start = INT_MIN;
> !   end = INT_MAX;
> !   count_preds = psp_not_empty;
> !   count_succs = pss_not_empty;
> !
> !   /* Calculate early_start and limit end.  Both bounds are inclusive.  */
> !   if (psp_not_empty)
> !     for (e = u_node->in; e != 0; e = e->next_in)
> !       {
> !       ddg_node_ptr v_node = e->src;
>
> !       if (dump_file)
> !         {
> !           fprintf (dump_file, "\nProcessing edge: ");
> !           print_ddg_edge (dump_file, e);
> !           fprintf (dump_file,
> !                    "\nScheduling %d (%d) in psp_not_empty,"
> !                    " checking p %d (%d): ", u_node->cuid,
> !                    INSN_UID (u_node->insn), v_node->cuid, INSN_UID
> !                    (v_node->insn));
> !         }
>
> !       if (TEST_BIT (sched_nodes, v_node->cuid))
> !         {
> !           int p_st = SCHED_TIME (v_node);
>
> !           early_start = MAX (early_start,
> !                              p_st + e->latency - (e->distance * ii));
>
> !           if (e->data_type == MEM_DEP)
> !             end = MIN (end, p_st + ii - 1);
>
> !           if (e->type == TRUE_DEP && e->data_type == REG_DEP)
> !             count_preds++;
>
> !           if (dump_file)
>              fprintf (dump_file,
> !                      "pred st = %d; early_start = %d; latency: %d;"
> !                      " end: %d\n", p_st, early_start, e->latency, end);
>
> !         }
> !       else if (dump_file)
> !         fprintf (dump_file, "the node is not scheduled\n");
> !       }
>
> !   /* Calculate late_start and limit start.  Both bounds are inclusive.  */
> !   if (pss_not_empty)
> !     for (e = u_node->out; e != 0; e = e->next_out)
> !       {
> !       ddg_node_ptr v_node = e->dest;
>
> !       if (dump_file)
> !         {
> !           fprintf (dump_file, "\nProcessing edge:");
> !           print_ddg_edge (dump_file, e);
> !           fprintf (dump_file,
> !                    "\nScheduling %d (%d) in pss_not_empty,"
> !                    " checking s %d (%d): ", u_node->cuid,
> !                    INSN_UID (u_node->insn), v_node->cuid, INSN_UID
> !                    (v_node->insn));
> !         }
>
> !       if (TEST_BIT (sched_nodes, v_node->cuid))
> !         {
> !           int s_st = SCHED_TIME (v_node);
>
> !           late_start = MIN (late_start,
> !                             s_st - e->latency + (e->distance * ii));
>
> !           if (e->data_type == MEM_DEP)
> !             start = MAX (start, s_st - ii + 1);
>
> !           if (e->type == TRUE_DEP && e->data_type == REG_DEP)
> !             count_succs++;
>
> !           if (dump_file)
> !             fprintf (dump_file,
> !                      "succ st = %d; late_start = %d; latency = %d;"
> !                      " start=%d", s_st, late_start, e->latency, start);
>
> !         }
> !       else if (dump_file)
> !         fprintf (dump_file, "the node is not scheduled\n");
> !       }
>
> !   /* Get a target scheduling window no bigger than ii.  */
> !   if (early_start == INT_MIN && late_start == INT_MAX)
> !     early_start = SCHED_ASAP (u_node);
> !   else if (early_start == INT_MIN)
> !     early_start = late_start - (ii - 1);
> !   late_start = MIN (early_start + (ii - 1), late_start);
> !
> !   /* Apply memory dependence limits.  */
> !   start = MAX (start, early_start);
> !   end = MIN (end, late_start);
> !   step = 1;
> !
> !   /* If there are at least as many successors as predecessors, schedule the
> !      node close to its successors.  */
> !   if (count_succs && count_succs >= count_preds)
> !     {
> !       int tmp = end;
> !       end = start;
> !       start = tmp;
> !       step = -1;
>      }
>
> +   /* Now that we've finalized the window, make END an exclusive rather
> +      than an inclusive bound.  */
> +   end += step;
> +
>    *start_p = start;
>    *step_p = step;
>    *end_p = end;
> *************** get_sched_window (partial_schedule_ptr p
> *** 1867,1876 ****
>        if (dump_file)
>        fprintf (dump_file, "\nEmpty window: start=%d, end=%d, step=%d\n",
>                 start, end, step);
> !     return -1;
>      }
>
> !     return 0;
>  }
>
>  /* Calculate MUST_PRECEDE/MUST_FOLLOW bitmaps of U_NODE; which is the
> --- 1775,1784 ----
>        if (dump_file)
>        fprintf (dump_file, "\nEmpty window: start=%d, end=%d, step=%d\n",
>                 start, end, step);
> !       return -1;
>      }
>
> !   return 0;
>  }
>
>  /* Calculate MUST_PRECEDE/MUST_FOLLOW bitmaps of U_NODE; which is the

Reply via email to