Re: [PATCH SMS 2/2, RFC] Register pressure estimation for the partial schedule (re-submission)

2012-02-04 Thread Ayal Zaks
SMS changes are ok.


* common.opt (fmodulo-sched-reg-pressure, -fmodulo-sched-verbose):
   New flags.

We should document what the different verbosity levels are, or
at-least their range.


Thanks,
Ayal.



On Tue, Jan 10, 2012 at 7:48 PM, Vladimir Makarov vmaka...@redhat.com wrote:
 On 01/03/2012 04:25 AM, Revital1 Eres wrote:


 Attached is an updated version with the two changes mentioned above taken
 from the previous patch.

 Tested and bootstrap with the other patch in the series on
 ppc64-redhat-linux, enabling SMS on loops with SC 1.

 Thanks again,
 Revital


 IRA changes are ok for me.
 Thanks, Revital.

 2012-01-03  Richard Sandifordrichard.sandif...@linaro.org
             Revital Eresrevital.e...@linaro.org

         * loop-invariant.c (get_regno_pressure_class): Move function to...
         * ira.c: Here.
         * common.opt (fmodulo-sched-reg-pressure, -fmodulo-sched-verbose):
         New flags.
         * doc/invoke.texi (fmodulo-sched-reg-pressure,
         -fmodulo-sched-verbose): Document the flags.
         * ira.h (get_regno_pressure_class,
         reset_pseudo_classes_defined_p): Declare.
         * ira-costs.c (reset_pseudo_classes_defined_p): New function.
         * Makefile.in (modulo-sched.o): Include ira.h and modulo-sched.h.
         (modulo-sched-pressure.o): New.
         * modulo-sched.c (ira.h, modulo-sched.h): New includes.
         (partial_schedule_ptr, ps_insn_ptr, struct ps_insn,
         struct ps_reg_move_info, struct partial_schedule): Move to
         modulo-sched.h.
         (ps_rtl_insn, ps_reg_move): Remove static.
         (apply_reg_moves): Remove static and call df_insn_rescan only
         if PS is final.
         (undo_reg_moves): New function.
         (sms_schedule): Call register pressure estimation.
         * modulo-sched.h: New file.
         * modulo-sched-pressure.c: New file.

 (See attached file: patch_pressure_3_1_12.txt)




Re: [PATCH SMS 2/2, RFC] Register pressure estimation for the partial schedule (re-submission)

2012-01-10 Thread Vladimir Makarov

On 01/03/2012 04:25 AM, Revital1 Eres wrote:


Attached is an updated version with the two changes mentioned above taken
from the previous patch.

Tested and bootstrap with the other patch in the series on
ppc64-redhat-linux, enabling SMS on loops with SC 1.

Thanks again,
Revital



IRA changes are ok for me.
Thanks, Revital.

2012-01-03  Richard Sandifordrichard.sandif...@linaro.org
 Revital Eresrevital.e...@linaro.org

 * loop-invariant.c (get_regno_pressure_class): Move function to...
 * ira.c: Here.
 * common.opt (fmodulo-sched-reg-pressure, -fmodulo-sched-verbose):
 New flags.
 * doc/invoke.texi (fmodulo-sched-reg-pressure,
 -fmodulo-sched-verbose): Document the flags.
 * ira.h (get_regno_pressure_class,
 reset_pseudo_classes_defined_p): Declare.
 * ira-costs.c (reset_pseudo_classes_defined_p): New function.
 * Makefile.in (modulo-sched.o): Include ira.h and modulo-sched.h.
 (modulo-sched-pressure.o): New.
 * modulo-sched.c (ira.h, modulo-sched.h): New includes.
 (partial_schedule_ptr, ps_insn_ptr, struct ps_insn,
 struct ps_reg_move_info, struct partial_schedule): Move to
 modulo-sched.h.
 (ps_rtl_insn, ps_reg_move): Remove static.
 (apply_reg_moves): Remove static and call df_insn_rescan only
 if PS is final.
 (undo_reg_moves): New function.
 (sms_schedule): Call register pressure estimation.
 * modulo-sched.h: New file.
 * modulo-sched-pressure.c: New file.

(See attached file: patch_pressure_3_1_12.txt)




Re: [PATCH SMS 2/2, RFC] Register pressure estimation for the partial schedule (re-submission)

2012-01-03 Thread Revital1 Eres

Hello,

 On Mon, Jan 2, 2012 at 3:30 PM, Richard Sandiford
 rdsandif...@googlemail.com wrote:
  Ayal Zaks ayal.z...@gmail.com writes:
  +  for (i = 0; i  ira_pressure_classes_num; i++)
  +    {
  +      enum reg_class pressure_class;
  +
  +      pressure_class = ira_pressure_classes[i];
  +
  +      if (max_reg_pressure[pressure_class] == 0)
  +     continue;
  +
  +      if (dump_file)
  +     fprintf (dump_file, %s=%d  %d , reg_class_names
[pressure_class],
  +              max_reg_pressure[pressure_class],
  +              ira_available_class_regs[pressure_class]);
  +
  +      if (max_reg_pressure[pressure_class]
  +        ira_class_hard_regs_num[pressure_class])
  +     {
  +       if (dump_file)
  +         fprintf (dump_file, (pressure)\n);
  +
  +       pressure_p = true;
 
  you can break; now.
 
  FWIW, I thought the same thing at first, but I think Revital's wayis
better.
  It's nice to know when looking at dumps whether there is excess
pressure
  in more than one class.  This isn't performance-critical code after
all.
 

 ok

  however, you have everything setup to compute the amount of spill, so
  suggest to do the following instead:
 
            pressure += max_reg_pressure[pressure_class]
                        - ira_class_hard_regs_num[pressure_class]);
 
  or better call the variable spillage instead of pressure, and
  return it. Current use will only check if it's zero or positive.
 
  I read this suggestion in the same way as Revital seems to have done:
  that we sum the pressure change over all classes.  But that isn't the
idea.
  Using N too many registers in one pressure class cannot be mitigated by
  leaving N registers in another pressure class unused.
 

 of-course (wasn't thinking of spilling from one register file to
 another); meant to suggest doing:

 if (max_reg_pressure[pressure_class]  ira_class_hard_regs_num
 [pressure_class])
   spillage += max_reg_pressure[pressure_class] -
 ira_class_hard_regs_num[pressure_class]);


  TBH, the original version of this function looked better to me.
 

 ok, it would surely suffice for now.


Attached is an updated version with the two changes mentioned above taken
from the previous patch.

Tested and bootstrap with the other patch in the series on
ppc64-redhat-linux, enabling SMS on loops with SC 1.

Thanks again,
Revital

2012-01-03  Richard Sandiford  richard.sandif...@linaro.org
Revital Eres  revital.e...@linaro.org

* loop-invariant.c (get_regno_pressure_class): Move function to...
* ira.c: Here.
* common.opt (fmodulo-sched-reg-pressure, -fmodulo-sched-verbose):
New flags.
* doc/invoke.texi (fmodulo-sched-reg-pressure,
-fmodulo-sched-verbose): Document the flags.
* ira.h (get_regno_pressure_class,
reset_pseudo_classes_defined_p): Declare.
* ira-costs.c (reset_pseudo_classes_defined_p): New function.
* Makefile.in (modulo-sched.o): Include ira.h and modulo-sched.h.
(modulo-sched-pressure.o): New.
* modulo-sched.c (ira.h, modulo-sched.h): New includes.
(partial_schedule_ptr, ps_insn_ptr, struct ps_insn,
struct ps_reg_move_info, struct partial_schedule): Move to
modulo-sched.h.
(ps_rtl_insn, ps_reg_move): Remove static.
(apply_reg_moves): Remove static and call df_insn_rescan only
if PS is final.
(undo_reg_moves): New function.
(sms_schedule): Call register pressure estimation.
* modulo-sched.h: New file.
* modulo-sched-pressure.c: New file.

(See attached file: patch_pressure_3_1_12.txt)Index: doc/invoke.texi
===
--- doc/invoke.texi (revision 182766)
+++ doc/invoke.texi (working copy)
@@ -374,6 +374,7 @@ Objective-C and Objective-C++ Dialects}.
 -floop-parallelize-all -flto -flto-compression-level @gol
 -flto-partition=@var{alg} -flto-report -fmerge-all-constants @gol
 -fmerge-constants -fmodulo-sched -fmodulo-sched-allow-regmoves @gol
+-fmodulo-sched-reg-pressure -fmodulo-sched-verbose=@var{n} @gol
 -fmove-loop-invariants fmudflap -fmudflapir -fmudflapth -fno-branch-count-reg 
@gol
 -fno-default-inline @gol
 -fno-defer-pop -fno-function-cse -fno-guess-branch-probability @gol
@@ -6476,6 +6477,16 @@ deleted which will trigger the generatio
 life-range analysis.  This option is effective only with
 @option{-fmodulo-sched} enabled.
 
+@item -fmodulo-sched-reg-pressure
+@opindex fmodulo-sched-reg-pressure
+Do not apply @option{-fmodulo-sched} to loops if the result would lead
+to register spilling within the loop.
+This option is effective only with @option{-fmodulo-sched} enabled.
+
+@item -fmodulo-sched-verbose=@var{n}
+@opindex fmodulo-sched-verbose
+Set up how verbose dump file for the SMS will be.  
+
 @item -fno-branch-count-reg
 @opindex fno-branch-count-reg
 Do not use ``decrement and branch'' instructions on a count register,
Index: modulo-sched.h

Re: [PATCH SMS 2/2, RFC] Register pressure estimation for the partial schedule (re-submission)

2012-01-02 Thread Richard Sandiford
Ayal Zaks ayal.z...@gmail.com writes:
 +  for (i = 0; i  ira_pressure_classes_num; i++)
 +{
 +  enum reg_class pressure_class;
 +
 +  pressure_class = ira_pressure_classes[i];
 +
 +  if (max_reg_pressure[pressure_class] == 0)
 + continue;
 +
 +  if (dump_file)
 + fprintf (dump_file, %s=%d  %d , reg_class_names[pressure_class],
 +  max_reg_pressure[pressure_class],
 +  ira_available_class_regs[pressure_class]);
 +
 +  if (max_reg_pressure[pressure_class]
 +ira_class_hard_regs_num[pressure_class])
 + {
 +   if (dump_file)
 + fprintf (dump_file, (pressure)\n);
 +
 +   pressure_p = true;

 you can break; now.

FWIW, I thought the same thing at first, but I think Revital's way is better.
It's nice to know when looking at dumps whether there is excess pressure
in more than one class.  This isn't performance-critical code after all.

 however, you have everything setup to compute the amount of spill, so
 suggest to do the following instead:

   pressure += max_reg_pressure[pressure_class]
   - ira_class_hard_regs_num[pressure_class]);

 or better call the variable spillage instead of pressure, and
 return it. Current use will only check if it's zero or positive.

I read this suggestion in the same way as Revital seems to have done:
that we sum the pressure change over all classes.  But that isn't the idea.
Using N too many registers in one pressure class cannot be mitigated by
leaving N registers in another pressure class unused.

TBH, the original version of this function looked better to me.

 Future use, as discussed offline, should compare this to the amount of
 spillage the original loop (w/o modulo-scheduling) is expected to
 have.

I wonder if that's really worth pursuing though.  For one thing,
we've no control over where the spill code is going to be inserted,
so the final ii is going to be a little arbitrary.  That's less of a
problem without SMS because we're able to reschedule the code after
register allocation in order to move spills around.

For another thing, the pressure of the original (unscheduled) code isn't
necessarily indicative of what can be achieved by the normal scheduler
with something like -fsched-pressure.  We can't assume that the original
block has minimum pressure.

If we're serious about wanting to use SMS in high-pressure loops,
perhaps we should consider trying to split live ranges in SMS itself.
I'm not sure it makes sense to leave an SMSed loop that we know
is going to need spill code.

Richard


Re: [PATCH SMS 2/2, RFC] Register pressure estimation for the partial schedule (re-submission)

2012-01-02 Thread Ayal Zaks
On Mon, Jan 2, 2012 at 3:30 PM, Richard Sandiford
rdsandif...@googlemail.com wrote:
 Ayal Zaks ayal.z...@gmail.com writes:
 +  for (i = 0; i  ira_pressure_classes_num; i++)
 +    {
 +      enum reg_class pressure_class;
 +
 +      pressure_class = ira_pressure_classes[i];
 +
 +      if (max_reg_pressure[pressure_class] == 0)
 +     continue;
 +
 +      if (dump_file)
 +     fprintf (dump_file, %s=%d  %d , reg_class_names[pressure_class],
 +              max_reg_pressure[pressure_class],
 +              ira_available_class_regs[pressure_class]);
 +
 +      if (max_reg_pressure[pressure_class]
 +        ira_class_hard_regs_num[pressure_class])
 +     {
 +       if (dump_file)
 +         fprintf (dump_file, (pressure)\n);
 +
 +       pressure_p = true;

 you can break; now.

 FWIW, I thought the same thing at first, but I think Revital's way is better.
 It's nice to know when looking at dumps whether there is excess pressure
 in more than one class.  This isn't performance-critical code after all.


ok

 however, you have everything setup to compute the amount of spill, so
 suggest to do the following instead:

           pressure += max_reg_pressure[pressure_class]
                       - ira_class_hard_regs_num[pressure_class]);

 or better call the variable spillage instead of pressure, and
 return it. Current use will only check if it's zero or positive.

 I read this suggestion in the same way as Revital seems to have done:
 that we sum the pressure change over all classes.  But that isn't the idea.
 Using N too many registers in one pressure class cannot be mitigated by
 leaving N registers in another pressure class unused.


of-course (wasn't thinking of spilling from one register file to
another); meant to suggest doing:

if (max_reg_pressure[pressure_class]  ira_class_hard_regs_num[pressure_class])
  spillage += max_reg_pressure[pressure_class] -
ira_class_hard_regs_num[pressure_class]);


 TBH, the original version of this function looked better to me.


ok, it would surely suffice for now.


 Future use, as discussed offline, should compare this to the amount of
 spillage the original loop (w/o modulo-scheduling) is expected to
 have.

 I wonder if that's really worth pursuing though.  For one thing,
 we've no control over where the spill code is going to be inserted,
 so the final ii is going to be a little arbitrary.  That's less of a
 problem without SMS because we're able to reschedule the code after
 register allocation in order to move spills around.

Indeed; that's the main motivation for the second sched pass.


 For another thing, the pressure of the original (unscheduled) code isn't
 necessarily indicative of what can be achieved by the normal scheduler
 with something like -fsched-pressure.  We can't assume that the original
 block has minimum pressure.

 If we're serious about wanting to use SMS in high-pressure loops,
 perhaps we should consider trying to split live ranges in SMS itself.
 I'm not sure it makes sense to leave an SMSed loop that we know
 is going to need spill code.

 Richard

Agreed. Suggest to have concrete testcases guide further development.
Thanks,
Ayal.


Re: [PATCH SMS 2/2, RFC] Register pressure estimation for the partial schedule (re-submission)

2012-01-01 Thread Revital Eres
Hello,

Thanks for the comments! I incorporated them in the attached patch.

Currently testing and bootstrap with the other patch in the series on
ppc64-redhat-linux, enabling SMS on loops with SC 1.

Thanks again,
Revital

2012-01-01  Richard Sandiford  richard.sandif...@linaro.org
Revital Eres  revital.e...@linaro.org

* loop-invariant.c (get_regno_pressure_class): Move function to...
* ira.c: Here.
* common.opt (fmodulo-sched-reg-pressure, -fmodulo-sched-verbose):
New flags.
* doc/invoke.texi (fmodulo-sched-reg-pressure,
-fmodulo-sched-verbose): Document the flags.
* ira.h (get_regno_pressure_class,
reset_pseudo_classes_defined_p): Declare.
* ira-costs.c (reset_pseudo_classes_defined_p): New function.
* Makefile.in (modulo-sched.o): Include ira.h and modulo-sched.h.
(modulo-sched-pressure.o): New.
* modulo-sched.c (ira.h, modulo-sched.h): New includes.
(partial_schedule_ptr, ps_insn_ptr, struct ps_insn,
struct ps_reg_move_info, struct partial_schedule): Move to
modulo-sched.h.
(ps_rtl_insn, ps_reg_move): Remove static.
(apply_reg_moves): Remove static and call df_insn_rescan only
if PS is final.
(undo_reg_moves): New function.
(sms_schedule): Call register pressure estimation.
* modulo-sched.h: New file.
* modulo-sched-pressure.c: New file.
Index: doc/invoke.texi
===
--- doc/invoke.texi (revision 182766)
+++ doc/invoke.texi (working copy)
@@ -374,6 +374,7 @@ Objective-C and Objective-C++ Dialects}.
 -floop-parallelize-all -flto -flto-compression-level @gol
 -flto-partition=@var{alg} -flto-report -fmerge-all-constants @gol
 -fmerge-constants -fmodulo-sched -fmodulo-sched-allow-regmoves @gol
+-fmodulo-sched-reg-pressure -fmodulo-sched-verbose=@var{n} @gol
 -fmove-loop-invariants fmudflap -fmudflapir -fmudflapth -fno-branch-count-reg 
@gol
 -fno-default-inline @gol
 -fno-defer-pop -fno-function-cse -fno-guess-branch-probability @gol
@@ -6476,6 +6477,16 @@ deleted which will trigger the generatio
 life-range analysis.  This option is effective only with
 @option{-fmodulo-sched} enabled.
 
+@item -fmodulo-sched-reg-pressure
+@opindex fmodulo-sched-reg-pressure
+Do not apply @option{-fmodulo-sched} to loops if the result would lead
+to register spilling within the loop.
+This option is effective only with @option{-fmodulo-sched} enabled.
+
+@item -fmodulo-sched-verbose=@var{n}
+@opindex fmodulo-sched-verbose
+Set up how verbose dump file for the SMS will be.  
+
 @item -fno-branch-count-reg
 @opindex fno-branch-count-reg
 Do not use ``decrement and branch'' instructions on a count register,
Index: modulo-sched.h
===
--- modulo-sched.h  (revision 0)
+++ modulo-sched.h  (revision 0)
@@ -0,0 +1,120 @@
+/* Swing Modulo Scheduling implementation.
+   Copyright (C) 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011 
+   Free Software Foundation, Inc.
+   Contributed by Revital Eres revital.e...@linaro.org 
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+http://www.gnu.org/licenses/.  */
+
+#ifndef GCC_SMS_H
+#define GCC_SMS_H
+
+#include ddg.h
+
+extern HARD_REG_SET eliminable_regset;
+
+typedef struct partial_schedule *partial_schedule_ptr;
+
+typedef struct ps_insn *ps_insn_ptr;
+
+/* A single instruction in the partial schedule.  */
+struct ps_insn
+{
+  /* Identifies the instruction to be scheduled.  Values smaller than
+ the ddg's num_nodes refer directly to ddg nodes.  A value of
+ X - num_nodes refers to register move X.  */
+  int id;
+
+  /* The (absolute) cycle in which the PS instruction is scheduled.
+ Same as SCHED_TIME (node).  */
+  int cycle;
+
+  /* The next/prev PS_INSN in the same row.  */
+  ps_insn_ptr next_in_row,
+ prev_in_row;
+
+};
+
+/* Information about a register move that has been added to a partial
+   schedule.  */
+struct ps_reg_move_info
+{
+  /* The source of the move is defined by the ps_insn with id DEF.
+ The destination is used by the ps_insns with the ids in USES.  */
+  int def;
+  sbitmap uses;
+
+  /* The original form of USES' instructions used OLD_REG, but they
+ should now use NEW_REG.  */
+  rtx old_reg;
+  rtx new_reg;
+
+  /* The number of consecutive stages that the 

Re: [PATCH SMS 2/2, RFC] Register pressure estimation for the partial schedule (re-submission)

2011-12-31 Thread Ayal Zaks
On Mon, Dec 19, 2011 at 4:28 PM, Richard Sandiford
richard.sandif...@linaro.org wrote:
 Hi Revital,

 Revital Eres revital.e...@linaro.org writes:
 The attached patch is a resubmission following comments made by Ayal
 and Richard.

 Tested and bootstrap with the other patches in the series on
 ppc64-redhat-linux, enabling SMS on loops with SC 1.


The modulo-sched parts are approved, with some comments provided below.
Moving get_regno_pressure_class() from loop-invariant.c to ira.c and
other ira.h/ira-costs.c changes are not for me to approve, but they
make sense to me.

 Looks really good to me.  I'll leave any approval to Ayal though.
 Just one suggestion:

 +       /* Handle register moves.  */

while we're at it, the comment above should be assigned to the then
part below:

 +       if (ps_i-id = ps-g-num_nodes)
 +         {
 +           int old_regno = REGNO (ps_reg_move (ps, ps_i-id)-old_reg);
 +           int new_regno = REGNO (ps_reg_move (ps, ps_i-id)-new_reg);
 +
 +           change_pressure (old_regno, true);
 +           change_pressure (new_regno, false);
 +           change_pressure (old_regno, true);
 +         }
 +       else
 +         {
 +           for (use_rec = DF_INSN_USES (insn); *use_rec; use_rec++)
 +             change_pressure (DF_REF_REGNO (*use_rec), true);
 +
 +           for (def_rec = DF_INSN_DEFS (insn); *def_rec; def_rec++)
 +             change_pressure (DF_REF_REGNO (*def_rec), false);
 +
 +           for (use_rec = DF_INSN_USES (insn); *use_rec; use_rec++)
 +             change_pressure (DF_REF_REGNO (*use_rec), true);
 +         }

 It might be worth adding a comment to say why the code is doing it this way.

Indeed, this does raise an eyebrow, and your explanation below is
great. Suggest to start by saying that The last two steps are the
natural way one would go about updating live registers in a bottom-up
scan, except that in some cases (iiuc) the same physical register
cannot be assigned to both use and def on same insn, so the first step
is added conservatively.

 E.g.:

      /* Process all uses, all defs, and then all uses again.  The first
         two steps give us an estimate of the pressure when dying inputs
         cannot be tied to outputs (which is the worst case).  The second
         two steps update the set of live registers ready for the next
         instruction.  */

 Also, as a general future direction comment, I think we should seriously
 consider turning both this and -fmodulo-sched-allow-regmoves on by default,
 so that -fmodulo-sched uses them unless explicitly told not to.  The results
 for ARM seemed to suggest that it is now the best SMS mode (although the
 results can't be shared, unfortunately).  It'd be unfortunate if users
 had to write:

    -fmodulo-sched -fmodulo-sched-allow-regmoves -fmodulo-sched-reg-pressure

 instead of plain:

    -fmodulo-sched

 It might make sense as a follow-on patch rather than part of the main commit.


I'd be in favor, provided this works well for other platforms as well
of-course. Ignoring the potential increase of register pressure inside
a loop is being incautiously optimistic...

It would be great to document testcases where spillage is detected;
the original motivation of SMS is to reduce size of register live
ranges so as to prevent spillage. There may be other means of
preventing or mitigating spills other than bumping the II (e.g.,
Minimum register requirements for a module schedule Micro'94), which
could be devised if concrete examples are analyzed.

 There's also the separate debate about whether SMS should now be enabled
 by default for ARM at -O3, but that's for another day...


Guess so... ;-)
Thanks.

 Richard


Here are some more comments:

+change_pressure (int regno, bool incr_p)
 ^^
I realize this was taken from elsewhere, but maybe update_pressure
or update_current_pressure would better fit here.

+{
+  int nregs;
+  enum reg_class pressure_class;
+
+  if (regno  FIRST_PSEUDO_REGISTER
+   (TEST_HARD_REG_BIT (ira_no_alloc_regs, regno)
+ || TEST_HARD_REG_BIT (eliminable_regset, regno)))
+return;
+
+  /* Update the current set of live registers.  Exit early if nothing
+ has changed.  */

please clarify: we want to increase curr_reg_pressure as we scan
upwards and encounter a last use; if regno is already live, this use
is not last.

likewise, we want to decrease curr_reg_pressure as we encounter a def;
regno might not be live when !incr_p, if the def feeds only uses in
next iteration.

+  if (incr_p
+  ? !bitmap_set_bit (curr_regs_live, regno)
+  : !bitmap_clear_bit (curr_regs_live, regno))
+return;
+
+  pressure_class = get_regno_pressure_class (regno, nregs);
+
+  if (!incr_p)
+curr_reg_pressure[pressure_class] -= nregs;
+  else
+curr_reg_pressure[pressure_class] += nregs;
+

sounds a bit more natural to ask if (incr_p)...; and only if so,
check also if the increased current pressure exceeds max:

+  if (max_reg_pressure[pressure_class]  

Re: [PATCH SMS 2/2, RFC] Register pressure estimation for the partial schedule (re-submission)

2011-12-19 Thread Richard Sandiford
Hi Revital,

Revital Eres revital.e...@linaro.org writes:
 The attached patch is a resubmission following comments made by Ayal
 and Richard.

 Tested and bootstrap with the other patches in the series on
 ppc64-redhat-linux, enabling SMS on loops with SC 1.

Looks really good to me.  I'll leave any approval to Ayal though.
Just one suggestion:

 +   /* Handle register moves.  */
 +   if (ps_i-id = ps-g-num_nodes)
 + {
 +   int old_regno = REGNO (ps_reg_move (ps, ps_i-id)-old_reg);
 +   int new_regno = REGNO (ps_reg_move (ps, ps_i-id)-new_reg);
 +
 +   change_pressure (old_regno, true);
 +   change_pressure (new_regno, false);
 +   change_pressure (old_regno, true);
 + }
 +   else
 + {
 +   for (use_rec = DF_INSN_USES (insn); *use_rec; use_rec++)
 + change_pressure (DF_REF_REGNO (*use_rec), true);
 +
 +   for (def_rec = DF_INSN_DEFS (insn); *def_rec; def_rec++)
 + change_pressure (DF_REF_REGNO (*def_rec), false);
 +
 +   for (use_rec = DF_INSN_USES (insn); *use_rec; use_rec++)
 + change_pressure (DF_REF_REGNO (*use_rec), true);
 + }

It might be worth adding a comment to say why the code is doing it this way.
E.g.:

  /* Process all uses, all defs, and then all uses again.  The first
 two steps give us an estimate of the pressure when dying inputs
 cannot be tied to outputs (which is the worst case).  The second
 two steps update the set of live registers ready for the next
 instruction.  */

Also, as a general future direction comment, I think we should seriously
consider turning both this and -fmodulo-sched-allow-regmoves on by default,
so that -fmodulo-sched uses them unless explicitly told not to.  The results
for ARM seemed to suggest that it is now the best SMS mode (although the
results can't be shared, unfortunately).  It'd be unfortunate if users
had to write:

-fmodulo-sched -fmodulo-sched-allow-regmoves -fmodulo-sched-reg-pressure

instead of plain:

-fmodulo-sched

It might make sense as a follow-on patch rather than part of the main commit.

There's also the separate debate about whether SMS should now be enabled
by default for ARM at -O3, but that's for another day...

Richard


[PATCH SMS 2/2, RFC] Register pressure estimation for the partial schedule (re-submission)

2011-12-17 Thread Revital Eres
Hello,

The attached patch is a resubmission following comments made by Ayal
and Richard.

Tested and bootstrap with the other patches in the series on
ppc64-redhat-linux, enabling SMS on loops with SC 1.

Comments are welcome.

Thanks,
Revital


   2011-12-18  Richard Sandiford  richard.sandif...@linaro.org
Revital Eres  revital.e...@linaro.org

* loop-invariant.c (get_regno_pressure_class): Move function to...
* ira.c: Here.
* common.opt (fmodulo-sched-reg-pressure, -fmodulo-sched-verbose):
New flags.
* doc/invoke.texi (fmodulo-sched-reg-pressure,
-fmodulo-sched-verbose): Document the flags.
* ira.h (get_regno_pressure_class,
reset_pseudo_classes_defined_p): Declare.
* ira-costs.c (reset_pseudo_classes_defined_p): New function.
* Makefile.in (modulo-sched.o): Include ira.h and modulo-sched.h.
(modulo-sched-pressure.o): New.
* modulo-sched.c (ira.h, modulo-sched.h): New includes.
(partial_schedule_ptr, ps_insn_ptr, struct ps_insn,
struct ps_reg_move_info, struct partial_schedule): Move to
modulo-sched.h.
(ps_rtl_insn, ps_reg_move): Remove static.
(apply_reg_moves): Remove static and call df_insn_rescan only
if PS is final.
(undo_reg_moves): New function.
(sms_schedule): Call register pressure estimation.
* modulo-sched.h: New file.
* modulo-sched-pressure.c: New file.
Index: doc/invoke.texi
===
--- doc/invoke.texi (revision 182357)
+++ doc/invoke.texi (working copy)
@@ -373,6 +373,7 @@ Objective-C and Objective-C++ Dialects}.
 -floop-parallelize-all -flto -flto-compression-level @gol
 -flto-partition=@var{alg} -flto-report -fmerge-all-constants @gol
 -fmerge-constants -fmodulo-sched -fmodulo-sched-allow-regmoves @gol
+-fmodulo-sched-reg-pressure -fmodulo-sched-verbose=@var{n} @gol
 -fmove-loop-invariants fmudflap -fmudflapir -fmudflapth -fno-branch-count-reg 
@gol
 -fno-default-inline @gol
 -fno-defer-pop -fno-function-cse -fno-guess-branch-probability @gol
@@ -6474,6 +6475,16 @@ deleted which will trigger the generatio
 life-range analysis.  This option is effective only with
 @option{-fmodulo-sched} enabled.
 
+@item -fmodulo-sched-reg-pressure
+@opindex fmodulo-sched-reg-pressure
+Do not apply @option{-fmodulo-sched} to loops if the result would lead
+to register spilling within the loop.
+This option is effective only with @option{-fmodulo-sched} enabled.
+
+@item -fmodulo-sched-verbose=@var{n}
+@opindex fmodulo-sched-verbose
+Set up how verbose dump file for the SMS will be.  
+
 @item -fno-branch-count-reg
 @opindex fno-branch-count-reg
 Do not use ``decrement and branch'' instructions on a count register,
Index: modulo-sched.h
===
--- modulo-sched.h  (revision 0)
+++ modulo-sched.h  (revision 0)
@@ -0,0 +1,120 @@
+/* Swing Modulo Scheduling implementation.
+   Copyright (C) 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011 
+   Free Software Foundation, Inc.
+   Contributed by Revital Eres revital.e...@linaro.org 
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+http://www.gnu.org/licenses/.  */
+
+#ifndef GCC_SMS_H
+#define GCC_SMS_H
+
+#include ddg.h
+
+extern HARD_REG_SET eliminable_regset;
+
+typedef struct partial_schedule *partial_schedule_ptr;
+
+typedef struct ps_insn *ps_insn_ptr;
+
+/* A single instruction in the partial schedule.  */
+struct ps_insn
+{
+  /* Identifies the instruction to be scheduled.  Values smaller than
+ the ddg's num_nodes refer directly to ddg nodes.  A value of
+ X - num_nodes refers to register move X.  */
+  int id;
+
+  /* The (absolute) cycle in which the PS instruction is scheduled.
+ Same as SCHED_TIME (node).  */
+  int cycle;
+
+  /* The next/prev PS_INSN in the same row.  */
+  ps_insn_ptr next_in_row,
+ prev_in_row;
+
+};
+
+/* Information about a register move that has been added to a partial
+   schedule.  */
+struct ps_reg_move_info
+{
+  /* The source of the move is defined by the ps_insn with id DEF.
+ The destination is used by the ps_insns with the ids in USES.  */
+  int def;
+  sbitmap uses;
+
+  /* The original form of USES' instructions used OLD_REG, but they
+ should now use NEW_REG.  */
+  rtx old_reg;
+  rtx new_reg;
+
+  /* The number of 

Re: [PATCH SMS 2/2, RFC] Register pressure estimation for the partial schedule

2011-11-25 Thread Richard Sandiford
Hi Revital,

Revital Eres revital.e...@linaro.org writes:
 The attached patch adds register pressure estimation of the partial schedule.

My main comment is that we shouldn't need to track separate liveness
sets for each loop here, since we're only looking at one basic block.
I.e., rather than operate on the per-loop LOOP_DATA (loop)-regs_{ref,live},
we should be able to use a single pair of bitmaps.

Also, the code goes to a lot of trouble over this case:

+  /* Add to the set of out live regs all the registers defined in bb
+ which have uses outside of it (those registers where eliminated in
+ the above calculation).  Eliminate from this set the definitions
+ that exist in the epilog and with no uses inside the basic-block
+ as these definitions will be eliminated from the bb and thus should
+ not be considered for estimating register pressure in the bb.  */

But how often does it occur in practice?  It's not necessarily the case
that the instruction will be eliminated, because things like volatility
might require us to keep it.  It's probably more accurate to say that we
can treat these as unused defs.

There's an argument to say that we should only consider registers
that are used in the loop.  If the pressure is high because of
registers that are live across the loop but not used within it,
then it's reasonable to force code outside the loop to spill some
of those.  That would suggest starting with the intersection of
DR_LR_OUT and DF_LR_BB_INFO (bb)-use.  Starting with that set
also has the advantage of handling the above case for free.

(This occurs often in our friend the popular embedded benchmark, which
often has a single function of the form:

  A: ...set up...
  B: for (i = 0; i  num_runs; i++)
  C:   ...benchmark...
  D: ...record time...

Some values are live from A-D, but those values shouldn't affect
an SMSable loop somewhere in C.)

We talked earlier about making the main pressure-estimation code
process the loop twice, but I see instead you've gone for two
separate passes, one to calculate LR out, then the main pass.
I think with the changes above, running the same loop twice is
going to be easier and no less efficient.  We could even add
code to skip the second iteration if it would start with the
same lr_out as the first iteration.

Richard


[PATCH SMS 2/2, RFC] Register pressure estimation for the partial schedule

2011-11-20 Thread Revital Eres
Hello,

The attached patch adds register pressure estimation of the partial schedule.

Tested and bootstrap with the other patches in the series on
ppc64-redhat-linux,
enabling SMS on loops with SC 1.

Comments are welcome.

Thanks,
Revital

Changelog:
* loop-invariant.c (get_regno_pressure_class): Move function to...
* ira.c (get_regno_pressure_class): Here.
* common.opt (fmodulo-sched-reg-pressure): New flag.
* doc/invoke.texi (fmodulo-sched-reg-pressure): Document it.
* ira.h (get_regno_pressure_class): Declare.
* rtl.h (set_reg_allocno_class): Declare.
* reginfo.c (set_reg_allocno_class): New function.
* Makefile.in (modulo-sched.o): Include ira.h.
* modulo-sched.c (ira.h): New include.
(rtl_insn_ps, undo_reg_moves, mark_def_regs, mark_reg_use,
mark_reg_use_1, insn_exists_in_epilog_p, calc_lr_out_regs,
change_pressure, update_reg_moves_pressure_info,
initiate_reg_pressure_info, mark_regno_live, mark_reg_birth_1,
mark_reg_birth, mark_regno_death, mark_ref_regs,
calc_insn_reg_pressure_info, calc_reg_pressure, free_loop_data,
free_reg_pressure_info, ps_reg_pressure_p): New functions.
(apply_reg_moves): Add parameter.
(curr_regs_live, curr_reg_pressure, curr_loop): New
data-structures.
(loop_data): New struct.
(LOOP_DATA): New definition.
(sms_schedule): Use register pressure estimation.
Index: doc/invoke.texi
===
--- doc/invoke.texi (revision 181149)
+++ doc/invoke.texi (working copy)
@@ -373,6 +373,7 @@ Objective-C and Objective-C++ Dialects}.
 -floop-parallelize-all -flto -flto-compression-level @gol
 -flto-partition=@var{alg} -flto-report -fmerge-all-constants @gol
 -fmerge-constants -fmodulo-sched -fmodulo-sched-allow-regmoves @gol
+-fmodulo-sched-reg-pressure @gol
 -fmove-loop-invariants fmudflap -fmudflapir -fmudflapth -fno-branch-count-reg 
@gol
 -fno-default-inline @gol
 -fno-defer-pop -fno-function-cse -fno-guess-branch-probability @gol
@@ -6457,6 +6458,11 @@ deleted which will trigger the generatio
 life-range analysis.  This option is effective only with
 @option{-fmodulo-sched} enabled.
 
+@item -fmodulo-sched-reg-pressure
+@opindex fmodulo-sched-reg-pressure
+Perform SMS based modulo scheduling with register pressure estimation.
+This option is effective only with @option{-fmodulo-sched} enabled.
+
 @item -fno-branch-count-reg
 @opindex fno-branch-count-reg
 Do not use ``decrement and branch'' instructions on a count register,
Index: loop-invariant.c
===
--- loop-invariant.c(revision 181149)
+++ loop-invariant.c(working copy)
@@ -1619,34 +1619,6 @@ static rtx regs_set[(FIRST_PSEUDO_REGIST
 /* Number of regs stored in the previous array.  */
 static int n_regs_set;
 
-/* Return pressure class and number of needed hard registers (through
-   *NREGS) of register REGNO.  */
-static enum reg_class
-get_regno_pressure_class (int regno, int *nregs)
-{
-  if (regno = FIRST_PSEUDO_REGISTER)
-{
-  enum reg_class pressure_class;
-
-  pressure_class = reg_allocno_class (regno);
-  pressure_class = ira_pressure_class_translate[pressure_class];
-  *nregs
-   = ira_reg_class_max_nregs[pressure_class][PSEUDO_REGNO_MODE (regno)];
-  return pressure_class;
-}
-  else if (! TEST_HARD_REG_BIT (ira_no_alloc_regs, regno)
-   ! TEST_HARD_REG_BIT (eliminable_regset, regno))
-{
-  *nregs = 1;
-  return ira_pressure_class_translate[REGNO_REG_CLASS (regno)];
-}
-  else
-{
-  *nregs = 0;
-  return NO_REGS;
-}
-}
-
 /* Increase (if INCR_P) or decrease current register pressure for
register REGNO.  */
 static void
Index: common.opt
===
--- common.opt  (revision 181149)
+++ common.opt  (working copy)
@@ -1457,6 +1457,10 @@ fmodulo-sched-allow-regmoves
 Common Report Var(flag_modulo_sched_allow_regmoves)
 Perform SMS based modulo scheduling with register moves allowed
 
+fmodulo-sched-reg-pressure
+Common Report Var(flag_modulo_sched_reg_pressure)
+Perform SMS based modulo scheduling with regsiter pressure estimation.
+
 fmove-loop-invariants
 Common Report Var(flag_move_loop_invariants) Init(1) Optimization
 Move loop invariant computations out of loops
Index: ira.c
===
--- ira.c   (revision 181149)
+++ ira.c   (working copy)
@@ -3784,6 +3784,34 @@ ira (FILE *f)
   timevar_pop (TV_IRA);
 }
 
+/* Return pressure class and number of needed hard registers (through
+   *NREGS) of register REGNO.  */
+enum reg_class
+get_regno_pressure_class (int regno, int *nregs)
+{
+  if (regno = FIRST_PSEUDO_REGISTER)
+{
+  enum reg_class pressure_class;
+
+  pressure_class = reg_allocno_class (regno);
+