Re: debug insns in SMS

2012-06-14 Thread Ayal Zaks
Thanks for the duplicate ping. This is fine.
So this indeed solves the discrepancy between running SMS w/ and w/o debugging?
Please include a comment next to the code stating why it's important
not to create such deps.
You may also want to store the result of DEP_PRO (dep) in
src_something and use it twice, for clarity.
Thanks,
Ayal.


On Thu, Jun 14, 2012 at 1:00 AM, Alexandre Oliva aol...@redhat.com wrote:

 Apologies for the duplicate ping, this one is now properly addressed to
 the pass maintainer.

 On Apr  9, 2012, Alexandre Oliva aol...@redhat.com wrote:

  On May  4, 2011, Revital1 Eres e...@il.ibm.com wrote:
  Hello Alexandre
  I think this will restore proper functioning to SMS in the presence of
  debug insns.  A while ago, we'd never generate deps of non-debug insns
  on debug insns.  I introduced them to enable sched to adjust (reset)
  debug insns when non-debug insns were moved before them.  I believe it
  is safe to leave them out of the SCCs.  Even though this will end up
  causing some loss of debug info, that's probably unavoidable, and the
  end result after this change is pobably the best we can hope for.
   Your
  thoughts?

  Thanks for the patch!

  I actually discussed this issue with Ayal yesterday.
  Ayal also suggested to reconsider the edges that are created in
  the DDG between real instructions and debug_insns. Currently, we create
  bidirectional anti deps edges between them. This leads to the problem
  you
  were trying to solve in the current patch (described below) where these
  extra edges influence the construction of the strongly connected
  component
  and the code generated with and w\o -g. Your patch seems to solve this
  problem.
  However I can not approve it as I'm not the maintainer (Ayal is).

  Ping?

  (Retested on x86_64-linux-gnu and i686-pc-linux-gnu)


  for  gcc/ChangeLog
  from  Alexandre Oliva  aol...@redhat.com

        * ddg.c (build_intra_loop_deps): Discard deps of nondebug on
  debug.

 Ping?  http://gcc.gnu.org/ml/gcc-patches/2012-04/msg00419.html

 --
 Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
 You must be the change you wish to see in the world. -- Gandhi
 Be Free! -- http://FSFLA.org/   FSF Latin America board member
 Free Software Evangelist      Red Hat Brazil Compiler Engineer


Re: [SMS] Support new loop pattern

2012-03-30 Thread Ayal Zaks
Roman, Andrey,

Sorry for the delayed response.

It would indeed be good to have SMS apply to more loop patterns, still
within the realm of *countable* loops. SMS was originally designed to
handle doloops, with a specific pattern controlling the loop, easily
identified and separable from the loop's body. The newly proposed
change to support new loop patterns is pretty invasive and sizable,
taking place entirely within modulo-sched.c. The main issue I've been
considering, is whether it would be possible instead to transform the
new loop patterns we want SMS to handle, into doloops (potentially
introducing additional induction variables to feed other uses), and
then feed the resulting loop into SMS as is? In other words, could you
fold it into doloop.c? And if so, will doing so introduce significant
overheads?

2012/3/29 Andrey Belevantsev a...@ispras.ru:
 Hello,

 I'd like to ping again those SMS patches once we're back to Stage 1.

 Ayal, maybe it would remove some burden for you if you'd review the general
 SMS functionality of those patches, and we'd ask RTL folks to look at the
 pieces related to RTL pattern matching and generation?


It definitely would ... especially in light of the above issue.
Thanks (for your patches, patience, pings..),
Ayal.



 Yours,
 Andrey


 On 10.02.2012 16:15, Roman Zhuykov wrote:

 Ping.
 Ayal, please review this patch and these three patches too:
 http://gcc.gnu.org/ml/gcc-patches/2011-12/msg00505.html
 http://gcc.gnu.org/ml/gcc-patches/2011-12/msg00506.html
 http://gcc.gnu.org/ml/gcc-patches/2011-12/msg01800.html

 --
 Roman Zhuykov
 zhr...@ispras.ru


Re: [PATCH] Fix -fdump-rtl-sms (PR rtl-optimization/52095)

2012-02-04 Thread Ayal Zaks
On Fri, Feb 3, 2012 at 8:07 PM, Jakub Jelinek ja...@redhat.com wrote:
 Hi!

 On some targets e.g. sms-7.c test fails, because fprintf is called
 with %s format and NULL argument, GLIBC prints for that e.g.
 SMS loop num: 1, file: (null), line: 0
 but it isn't portable.  print-rtl.c guards the locator printing with
            /*  Pretty-print insn locators.  Ignore scoping as it is mostly
                redundant with line number information and do not print 
 anything
                when there is no location information available.  */
            if (INSN_LOCATOR (in_rtx)  insn_file (in_rtx))
              fprintf(outfile,  %s:%i, insn_file (in_rtx), insn_line 
 (in_rtx));
 which fixes this, but there are 7 different spots that would need adjusting
 in modulo-sched.c, so I've added a helper function for that.

 Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?


Ok with me.

It would be better to try and locate some relevant file:line
information elsewhere(?) instead of silently giving up.

Thanks,
Ayal.


 2012-02-03  Jakub Jelinek  ja...@redhat.com

        PR rtl-optimization/52095
        * modulo-sched.c (dump_insn_locator): New function.
        (loop_canon_p, sms_schedule): Use it.

 --- gcc/modulo-sched.c.jj       2011-12-14 08:11:03.0 +0100
 +++ gcc/modulo-sched.c  2012-02-03 13:45:49.137997767 +0100
 @@ -1246,6 +1246,19 @@ loop_single_full_bb_p (struct loop *loop
   return true;
  }

 +/* Dump file:line from INSN's location info to dump_file.  */
 +
 +static void
 +dump_insn_locator (rtx insn)
 +{
 +  if (dump_file  INSN_LOCATOR (insn))
 +    {
 +      const char *file = insn_file (insn);
 +      if (file)
 +       fprintf (dump_file,  %s:%i, file, insn_line (insn));
 +    }
 +}
 +
  /* A simple loop from SMS point of view; it is a loop that is composed of
    either a single basic block or two BBs - a header and a latch.  */
  #define SIMPLE_SMS_LOOP_P(loop) ((loop-num_nodes  3 )                    \
 @@ -1271,9 +1284,9 @@ loop_canon_p (struct loop *loop)
        {
          rtx insn = BB_END (loop-header);

 -         fprintf (dump_file, SMS loop many exits );
 -                 fprintf (dump_file,  %s %d (file, line)\n,
 -                          insn_file (insn), insn_line (insn));
 +         fprintf (dump_file, SMS loop many exits);
 +         dump_insn_locator (insn);
 +         fprintf (dump_file, \n);
        }
       return false;
     }
 @@ -1284,9 +1297,9 @@ loop_canon_p (struct loop *loop)
        {
          rtx insn = BB_END (loop-header);

 -         fprintf (dump_file, SMS loop many BBs. );
 -         fprintf (dump_file,  %s %d (file, line)\n,
 -                  insn_file (insn), insn_line (insn));
 +         fprintf (dump_file, SMS loop many BBs.);
 +         dump_insn_locator (insn);
 +         fprintf (dump_file, \n);
        }
       return false;
     }
 @@ -1407,13 +1420,13 @@ sms_schedule (void)
         }

       if (dump_file)
 -      {
 -         rtx insn = BB_END (loop-header);
 -
 -         fprintf (dump_file, SMS loop num: %d, file: %s, line: %d\n,
 -                  loop-num, insn_file (insn), insn_line (insn));
 +       {
 +         rtx insn = BB_END (loop-header);

 -      }
 +         fprintf (dump_file, SMS loop num: %d, loop-num);
 +         dump_insn_locator (insn);
 +         fprintf (dump_file, \n);
 +       }

       if (! loop_canon_p (loop))
         continue;
 @@ -1440,9 +1453,8 @@ sms_schedule (void)
        {
          if (dump_file)
            {
 -             fprintf (dump_file,  %s %d (file, line)\n,
 -                      insn_file (tail), insn_line (tail));
 -             fprintf (dump_file, SMS single-bb-loop\n);
 +             dump_insn_locator (tail);
 +             fprintf (dump_file, \nSMS single-bb-loop\n);
              if (profile_info  flag_branch_probabilities)
                {
                  fprintf (dump_file, SMS loop-count );
 @@ -1543,14 +1555,15 @@ sms_schedule (void)
         continue;

       if (dump_file)
 -      {
 -         rtx insn = BB_END (loop-header);
 +       {
 +         rtx insn = BB_END (loop-header);

 -         fprintf (dump_file, SMS loop num: %d, file: %s, line: %d\n,
 -                  loop-num, insn_file (insn), insn_line (insn));
 +         fprintf (dump_file, SMS loop num: %d, loop-num);
 +         dump_insn_locator (insn);
 +         fprintf (dump_file, \n);

 -         print_ddg (dump_file, g);
 -      }
 +         print_ddg (dump_file, g);
 +       }

       get_ebb_head_tail (loop-header, loop-header, head, tail);

 @@ -1561,9 +1574,8 @@ sms_schedule (void)

       if (dump_file)
        {
 -         fprintf (dump_file,  %s %d (file, line)\n,
 -                  insn_file (tail), insn_line (tail));
 -         fprintf (dump_file, SMS single-bb-loop\n);
 +         dump_insn_locator (tail);
 +         fprintf (dump_file, \nSMS single-bb-loop\n);
          if (profile_info  flag_branch_probabilities)
            {
              fprintf (dump_file, SMS loop-count );
 @@ 

Re: [PATCH, SMS] Fix PR51794

2012-02-04 Thread Ayal Zaks
On Tue, Jan 10, 2012 at 12:31 PM, Revital1 Eres e...@il.ibm.com wrote:

 Hello,

 The patch below fixes ICE reported in PR51794.
 It avoids creating DDG edges  for register uses of class DF_REF_ARTIFICIAL
 as
 the latter does not have real instructions for them and thus calling
 BLOCK_FOR_INSN fails.

 Tested and bootstrap on ppc64-redhat-linux, enabling SMS on loops with SC
 1.

 OK for mainline?


Yes, this is ok.
Thanks,
Ayal.

 Thanks,
 Revital

 Chanelog:

        PR rtl-optimization/51794
        * ddg.c (add_cross_iteration_register_deps): Avoid
        creating edges for uses of class DF_REF_ARTIFICIAL.

 testsuite/
        PR rtl-optimization/51794
        * gcc.dg/sms-12.c: New test.

 (See attached file: patch_fix_pr51794.txt)


Re: [PATCH SMS 2/2, RFC] Register pressure estimation for the partial schedule (re-submission)

2012-02-04 Thread Ayal Zaks
SMS changes are ok.


* common.opt (fmodulo-sched-reg-pressure, -fmodulo-sched-verbose):
   New flags.

We should document what the different verbosity levels are, or
at-least their range.


Thanks,
Ayal.



On Tue, Jan 10, 2012 at 7:48 PM, Vladimir Makarov vmaka...@redhat.com wrote:
 On 01/03/2012 04:25 AM, Revital1 Eres wrote:


 Attached is an updated version with the two changes mentioned above taken
 from the previous patch.

 Tested and bootstrap with the other patch in the series on
 ppc64-redhat-linux, enabling SMS on loops with SC 1.

 Thanks again,
 Revital


 IRA changes are ok for me.
 Thanks, Revital.

 2012-01-03  Richard Sandifordrichard.sandif...@linaro.org
             Revital Eresrevital.e...@linaro.org

         * loop-invariant.c (get_regno_pressure_class): Move function to...
         * ira.c: Here.
         * common.opt (fmodulo-sched-reg-pressure, -fmodulo-sched-verbose):
         New flags.
         * doc/invoke.texi (fmodulo-sched-reg-pressure,
         -fmodulo-sched-verbose): Document the flags.
         * ira.h (get_regno_pressure_class,
         reset_pseudo_classes_defined_p): Declare.
         * ira-costs.c (reset_pseudo_classes_defined_p): New function.
         * Makefile.in (modulo-sched.o): Include ira.h and modulo-sched.h.
         (modulo-sched-pressure.o): New.
         * modulo-sched.c (ira.h, modulo-sched.h): New includes.
         (partial_schedule_ptr, ps_insn_ptr, struct ps_insn,
         struct ps_reg_move_info, struct partial_schedule): Move to
         modulo-sched.h.
         (ps_rtl_insn, ps_reg_move): Remove static.
         (apply_reg_moves): Remove static and call df_insn_rescan only
         if PS is final.
         (undo_reg_moves): New function.
         (sms_schedule): Call register pressure estimation.
         * modulo-sched.h: New file.
         * modulo-sched-pressure.c: New file.

 (See attached file: patch_pressure_3_1_12.txt)




Re: [PATCH SMS 2/2, RFC] Register pressure estimation for the partial schedule (re-submission)

2012-01-02 Thread Ayal Zaks
On Mon, Jan 2, 2012 at 3:30 PM, Richard Sandiford
rdsandif...@googlemail.com wrote:
 Ayal Zaks ayal.z...@gmail.com writes:
 +  for (i = 0; i  ira_pressure_classes_num; i++)
 +    {
 +      enum reg_class pressure_class;
 +
 +      pressure_class = ira_pressure_classes[i];
 +
 +      if (max_reg_pressure[pressure_class] == 0)
 +     continue;
 +
 +      if (dump_file)
 +     fprintf (dump_file, %s=%d  %d , reg_class_names[pressure_class],
 +              max_reg_pressure[pressure_class],
 +              ira_available_class_regs[pressure_class]);
 +
 +      if (max_reg_pressure[pressure_class]
 +        ira_class_hard_regs_num[pressure_class])
 +     {
 +       if (dump_file)
 +         fprintf (dump_file, (pressure)\n);
 +
 +       pressure_p = true;

 you can break; now.

 FWIW, I thought the same thing at first, but I think Revital's way is better.
 It's nice to know when looking at dumps whether there is excess pressure
 in more than one class.  This isn't performance-critical code after all.


ok

 however, you have everything setup to compute the amount of spill, so
 suggest to do the following instead:

           pressure += max_reg_pressure[pressure_class]
                       - ira_class_hard_regs_num[pressure_class]);

 or better call the variable spillage instead of pressure, and
 return it. Current use will only check if it's zero or positive.

 I read this suggestion in the same way as Revital seems to have done:
 that we sum the pressure change over all classes.  But that isn't the idea.
 Using N too many registers in one pressure class cannot be mitigated by
 leaving N registers in another pressure class unused.


of-course (wasn't thinking of spilling from one register file to
another); meant to suggest doing:

if (max_reg_pressure[pressure_class]  ira_class_hard_regs_num[pressure_class])
  spillage += max_reg_pressure[pressure_class] -
ira_class_hard_regs_num[pressure_class]);


 TBH, the original version of this function looked better to me.


ok, it would surely suffice for now.


 Future use, as discussed offline, should compare this to the amount of
 spillage the original loop (w/o modulo-scheduling) is expected to
 have.

 I wonder if that's really worth pursuing though.  For one thing,
 we've no control over where the spill code is going to be inserted,
 so the final ii is going to be a little arbitrary.  That's less of a
 problem without SMS because we're able to reschedule the code after
 register allocation in order to move spills around.

Indeed; that's the main motivation for the second sched pass.


 For another thing, the pressure of the original (unscheduled) code isn't
 necessarily indicative of what can be achieved by the normal scheduler
 with something like -fsched-pressure.  We can't assume that the original
 block has minimum pressure.

 If we're serious about wanting to use SMS in high-pressure loops,
 perhaps we should consider trying to split live ranges in SMS itself.
 I'm not sure it makes sense to leave an SMSed loop that we know
 is going to need spill code.

 Richard

Agreed. Suggest to have concrete testcases guide further development.
Thanks,
Ayal.


Re: [PATCH, SMS] Prevent the creation of reg-moves for definitions with MODE_CC

2012-01-01 Thread Ayal Zaks
The attached patch prevents the creation of reg-moves for definitions
with MODE_CC and thus solves this ICE.

Currently testing and bootstrap on ppc64-redhat-linux, enabling SMS on
loops with SC 1.

OK for 4.7 once testing completes?

Yes, thanks for catching this. Shouldn't we prevent creating such
regmoves for (the other case of) intra-loop anti-deps as well?


 While true, I doubt either PPC or MIPS really benefit from moving
 around registers of CCmode.

IIRC, PPC has eight 4-bit CR's and supports efficient copying of *one
bit* from one CR to another (via cror), which would require special
regmove handling. Hence I share your doubt.

Ayal.


On Thu, Dec 22, 2011 at 10:53 AM, Richard Sandiford
richard.sandif...@linaro.org wrote:
 Richard Henderson r...@redhat.com writes:
 On 12/20/2011 09:47 AM, Richard Sandiford wrote:
 Revital Eres revital.e...@linaro.org writes:
 +/* Return true if one of the definitions in INSN has MODE_CC.  Otherwise
 +   return false.  */
 +static bool
 +def_has_ccmode_p (rtx insn)
 +{
 +  df_ref *def;
 +
 +  for (def = DF_INSN_DEFS (insn); *def; def++)
 +    {
 +      enum machine_mode mode = GET_MODE (DF_REF_REG (*def));
 +
 +      if (GET_MODE_CLASS (mode) == MODE_CC)
 +       return true;
 +    }
 +
 +  return false;
 +}

 FWIW, an alternative might be to test have_regs_of_mode[(int) mode].
 That says whether there are any allocatable (non-fixed) registers
 of the given mode.

 While true, I doubt either PPC or MIPS really benefit from moving
 around registers of CCmode.  Certainly MIPS has no way of easily
 moving CCmode registers around.  It's a rather complicated reload,
 that.

 I'd be very tempted to simply go with the original patch.

 OK, sorry for the noise.

 Richard


Re: [PATCH SMS 2/2, RFC] Register pressure estimation for the partial schedule (re-submission)

2011-12-31 Thread Ayal Zaks
On Mon, Dec 19, 2011 at 4:28 PM, Richard Sandiford
richard.sandif...@linaro.org wrote:
 Hi Revital,

 Revital Eres revital.e...@linaro.org writes:
 The attached patch is a resubmission following comments made by Ayal
 and Richard.

 Tested and bootstrap with the other patches in the series on
 ppc64-redhat-linux, enabling SMS on loops with SC 1.


The modulo-sched parts are approved, with some comments provided below.
Moving get_regno_pressure_class() from loop-invariant.c to ira.c and
other ira.h/ira-costs.c changes are not for me to approve, but they
make sense to me.

 Looks really good to me.  I'll leave any approval to Ayal though.
 Just one suggestion:

 +       /* Handle register moves.  */

while we're at it, the comment above should be assigned to the then
part below:

 +       if (ps_i-id = ps-g-num_nodes)
 +         {
 +           int old_regno = REGNO (ps_reg_move (ps, ps_i-id)-old_reg);
 +           int new_regno = REGNO (ps_reg_move (ps, ps_i-id)-new_reg);
 +
 +           change_pressure (old_regno, true);
 +           change_pressure (new_regno, false);
 +           change_pressure (old_regno, true);
 +         }
 +       else
 +         {
 +           for (use_rec = DF_INSN_USES (insn); *use_rec; use_rec++)
 +             change_pressure (DF_REF_REGNO (*use_rec), true);
 +
 +           for (def_rec = DF_INSN_DEFS (insn); *def_rec; def_rec++)
 +             change_pressure (DF_REF_REGNO (*def_rec), false);
 +
 +           for (use_rec = DF_INSN_USES (insn); *use_rec; use_rec++)
 +             change_pressure (DF_REF_REGNO (*use_rec), true);
 +         }

 It might be worth adding a comment to say why the code is doing it this way.

Indeed, this does raise an eyebrow, and your explanation below is
great. Suggest to start by saying that The last two steps are the
natural way one would go about updating live registers in a bottom-up
scan, except that in some cases (iiuc) the same physical register
cannot be assigned to both use and def on same insn, so the first step
is added conservatively.

 E.g.:

      /* Process all uses, all defs, and then all uses again.  The first
         two steps give us an estimate of the pressure when dying inputs
         cannot be tied to outputs (which is the worst case).  The second
         two steps update the set of live registers ready for the next
         instruction.  */

 Also, as a general future direction comment, I think we should seriously
 consider turning both this and -fmodulo-sched-allow-regmoves on by default,
 so that -fmodulo-sched uses them unless explicitly told not to.  The results
 for ARM seemed to suggest that it is now the best SMS mode (although the
 results can't be shared, unfortunately).  It'd be unfortunate if users
 had to write:

    -fmodulo-sched -fmodulo-sched-allow-regmoves -fmodulo-sched-reg-pressure

 instead of plain:

    -fmodulo-sched

 It might make sense as a follow-on patch rather than part of the main commit.


I'd be in favor, provided this works well for other platforms as well
of-course. Ignoring the potential increase of register pressure inside
a loop is being incautiously optimistic...

It would be great to document testcases where spillage is detected;
the original motivation of SMS is to reduce size of register live
ranges so as to prevent spillage. There may be other means of
preventing or mitigating spills other than bumping the II (e.g.,
Minimum register requirements for a module schedule Micro'94), which
could be devised if concrete examples are analyzed.

 There's also the separate debate about whether SMS should now be enabled
 by default for ARM at -O3, but that's for another day...


Guess so... ;-)
Thanks.

 Richard


Here are some more comments:

+change_pressure (int regno, bool incr_p)
 ^^
I realize this was taken from elsewhere, but maybe update_pressure
or update_current_pressure would better fit here.

+{
+  int nregs;
+  enum reg_class pressure_class;
+
+  if (regno  FIRST_PSEUDO_REGISTER
+   (TEST_HARD_REG_BIT (ira_no_alloc_regs, regno)
+ || TEST_HARD_REG_BIT (eliminable_regset, regno)))
+return;
+
+  /* Update the current set of live registers.  Exit early if nothing
+ has changed.  */

please clarify: we want to increase curr_reg_pressure as we scan
upwards and encounter a last use; if regno is already live, this use
is not last.

likewise, we want to decrease curr_reg_pressure as we encounter a def;
regno might not be live when !incr_p, if the def feeds only uses in
next iteration.

+  if (incr_p
+  ? !bitmap_set_bit (curr_regs_live, regno)
+  : !bitmap_clear_bit (curr_regs_live, regno))
+return;
+
+  pressure_class = get_regno_pressure_class (regno, nregs);
+
+  if (!incr_p)
+curr_reg_pressure[pressure_class] -= nregs;
+  else
+curr_reg_pressure[pressure_class] += nregs;
+

sounds a bit more natural to ask if (incr_p)...; and only if so,
check also if the increased current pressure exceeds max:

+  if (max_reg_pressure[pressure_class]  

Re: [PATCH, SMS] Add missing free operation in mark_loop_unsched

2011-12-12 Thread Ayal Zaks
sure, OK, thanks for catching this leak.
Ayal.


On Mon, Dec 12, 2011 at 8:25 AM, Revital Eres revital.e...@linaro.org wrote:
 Hello,

 OK for 3.7?

 Sorry, I meant GCC 4.7.0...

 Thanks,
 Revital


Re: [PING][PR testsuite/47013] Fix SMS testsuite faliures

2011-12-09 Thread Ayal Zaks
These fixes to individual sms testcases are OK.
Thanks,
Ayal.


On Mon, Dec 5, 2011 at 3:07 PM, Revital Eres revital.e...@linaro.org wrote:
 Hello,

 Ping:  http://gcc.gnu.org/ml/gcc-patches/2011-11/msg02444.html

 Thanks,
 Revital


Re: [PATCH SMS 1/2, RFC] Support traversing PS in reverse order

2011-11-23 Thread Ayal Zaks
On Mon, Nov 21, 2011 at 7:07 AM, Revital Eres revital.e...@linaro.org wrote:
 Hello,

 This patch support the estimation of register pressure in SMS.
 Although GCC is in stage 3 I would appreciate comments on it.
 Thanks to Richard and Ayal for discussing the implementation and their 
 insights.

 This part of the patch enables iterating on the partial schedule in the
 reverse order (from the last instruction the the first).

 Tested and bootstrap with the other patches in the series on
 ppc64-redhat-linux,
 enabling SMS on loops with SC 1.

 Comments are welcome.


This looks fine. Please rename rows_reverse to rows_last as discussed,
and simplify the bit that tracks last_in_row in ps_insn_find_column().
Thanks,
Ayal.

 Thanks,
 Revital

 Changelog:
        * modulo-sched.c (rows_reverse): New field in struct partial_schedule.
        (create_partial_schedule, free_partial_schedule,
        ps_insert_empty_row, ps_insn_advance_column,
        ps_insn_find_column, remove_node_from_ps, reset_partial_schedule,
        rotate_partial_schedule, verify_partial_schedule): Update the
        new field.



Re: [PATCH 5/9] [SMS] Support new loop pattern

2011-10-11 Thread Ayal Zaks
On Fri, Sep 30, 2011 at 5:22 PM, Roman Zhuykov zhr...@ispras.ru wrote:
 2011/7/21  zhr...@ispras.ru:
 This patch should be applied only after pending patches by Revital.


 Ping. New version is attached, it suits current trunk without
 additional patches.

Thanks for the ping.

 Also this related patch needs approval:
 http://gcc.gnu.org/ml/gcc-patches/2011-07/msg01804.html

 The loop should meet the following requirements.
 First three are the same as for loop with doloop pattern:
 ...
 The next three describe the control part of new supported loops.
 - the last jump instruction should look like:  pc=(regF!=0)?label:pc, regF is

you'd probably want to bump to next instruction if falling through,
e.g., pc=(regF!=0)?label:pc+4

  flag register;
 - the last instruction which sets regF should be: regF=COMPARE(regC,X), 
 where X
  is a constant, or maybe a register, which is not changed inside a loop;
 - only one instruction modifies regC inside a loop (other can use regC, but 
 not
  write), and it should simply adjust it by a constant: regC=regC+step, where
  step is a constant.

 When doloop is succesfully scheduled by SMS, its number of
 iterations of loop kernel should be decreased by the number of stages in a
 schedule minus one, while other iterations expand to prologue and epilogue.
 In new supported loops such approach can't be used, because some
 instructions can use count register (regC).  Instead of this,
 the final register value X in compare instruction regF=COMPARE(regC,X)
 is changed to another value Y respective to the stage this instruction
 is scheduled (Y = X - stage * step).

making sure this does not underflow; i.e., that the number of
iterations is no less than stage (you've addressed this towards the
end below).


 The main difference from doloop case is that regC can be used by some
 instructions in loop body.
 That's why we are unable to simply adjust regC initial value, but have
 to keep it's value correct on each particular iteration.
 So, we change comparison instruction accordingly.

 An example:
 int a[100];
 int main()
 {
  int i;
  for (i = 85; i  12; i -= 5)
      a[i] = i * i;
  return a[15]-225;
 }
 ARM assembler with -O2 -fno-auto-inc-dec:
        ldr     r0, .L5
        mov     r3, #85
        mov     r2, r0
 .L2:
        mul     r1, r3, r3
        sub     r3, r3, #5
        cmp     r3, #10
        str     r1, [r2, #340]
        sub     r2, r2, #20
        bne     .L2
        ldr     r0, [r0, #60]
        sub     r0, r0, #225
        bx      lr
 .L5:
        .word   a

 Loop body is executed 15 times.
 When compiling with SMS, it finds a schedule with ii=7, stage_count=3
 and following times:
 Stage  Time       Insn
 0          5      mul     r1, r3, r3
 1         10     sub     r3, r3, #5
 1         11     cmp     r3, #10
 1         11     str     r1, [r2, #340]
 1         13     bne     .L2
 2         16     sub     r2, r2, #20


branch is not scheduled last?

 To make new schedule correct the loop body
 should be executed 14 times and we change compare instruction:

the loop itself should execute 13 times.

 regF=COMPARE(regC,X) to regF=COMPARE(regC,Y) where Y = X - stage * step.
 In our example regC is r3, X is 10, step = -5, compare instruction
 is scheduled on stage 1, so it should be Y = 10 - 1 * (-5) = 15.


right. In general, if the compare is on stage s (starting from 0), it
will be executed s times in the epilog, so it should exit the loop
upon reaching Y = X - s * step.

 So, after SMS it looks like:
        ldr     r0, .L5
        mov     r3, #85
        mov     r2, r0
 ;;prologue
        mul     r1, r3, r3      ;;from stage 0 first iteration
        sub     r3, r3, #5      ;;3 insns from stage 1 first iteration
        cmp     r3, #10
        str     r1, [r2, #340]
        mul     r1, r3, r3      ;;from stage 0 second iteration
 ;;body
 .L2:
        sub     r3, r3, #5
        sub     r2, r2, #20
        cmp     r3, #15         ;; new value to compare with is Y=15
        str     r1, [r2, #340]
        mul     r1, r3, r3
        bne     .L2
 ;;epilogue
        sub     r2, r2, #20     ;;from stage 2 pre-last iteration
        sub     r3, r3, #5      ;;3 insns from stage 1 last iteration
        cmp     r3, #10
        str     r1, [r2, #340]
        sub     r2, r2, #20     ;;from stage 2 last iteration

        ldr     r0, [r0, #60]
        sub     r0, r0, #225
        bx      lr
 .L5:
        .word   a

 Real ARM assembler with SMS (after some optimizations and without dead code):
        mov     r3, #85
        ldr     r0, .L8
        mul     r1, r3, r3
        sub     r3, r3, #5
        mov     r2, r0
        str     r1, [r0, #340]
        mul     r1, r3, r3
 .L2:
        sub     r3, r3, #5
        sub     r2, r2, #20
        cmp     r3, #15
        str     r1, [r2, #340]
        mul     r1, r3, r3
        bne     .L2
        str     r1, [r2, #320]
        ldr     r0, [r0, #60]
        sub     r0, r0, #225
        bx      lr
 .L8:
        .word   a


 

Re: [4/4] Make SMS schedule register moves

2011-10-10 Thread Ayal Zaks
On Mon, Oct 10, 2011 at 1:57 PM, Richard Sandiford
richard.sandif...@linaro.org wrote:
 Ayal Zaks ayal.z...@gmail.com writes:
 I agree it's natural to schedule moves for intra-iteration dependencies
 in the normal get_sched_window way.  But suppose we have a dependency:

   A --(T,N,1)-- B

 that requires two moves M1 and M2.  If we think in terms of cycles
 (in the SCHED_TIME sense), then this effectively becomes:

   A --(T,N1,1)-- M1 --(T,N2,0)-- M2 --(T,N3,0)-- B

 because it is now M1 that is fed by both the loop and the incoming edge.
 But if there is a second dependency:

   A --(T,M,0)-- C

 that also requires two moves, we end up with:

   A --(T,N1,1)-- M1 --(T,N2,0)-- M2 --(T,N3,0)-- B
                                        --(T,M3,-1)-- B

 and dependence distances of -1 feel a bit weird. :-)  Of course,
 what we really have are two parallel dependencies:

   A --(T,N1,1)-- M1 --(T,N2,0)-- M2 --(T,N3,0)-- B

   A --(T,M1,0)-- M1' --(T,M2,0)-- M2' --(T,N3,0)-- B

 where M1' and M2' occupy the same position as M1 and M2 in the schedule,
 but are one stage further along.  But we only schedule them once,
 so if we take the cycle/SCHED_TIME route, we have to introduce
 dependencies of distance -1.


 Interesting; had to digest this distance 1 business, a result of
 thinking in cycles instead of rows (or conversely), and mixing
 dependences with scheduling; here's my understanding, based on your
 explanations:

 Suppose a Use is truely dependent on a Def, where both have been
 scheduled at some absolute cycles; think of them as timing the first
 iteration of the loop.
 Assume first that Use appears originally after Def in the original
 instruction sequence of the loop (dependence distance 0). In this
 case, Use requires register moves if its distance D from Def according
 to the schedule is more than ii cycles long -- by the time Use is
 executed, the value it needs is no longer available in the def'd
 register due to intervening occurrences of Def. So in this case, the
 first reg-move (among D/ii) should appear after Def, recording its
 value before the next occurrence of Def overwrites it, and feeding
 subsequent moves as needed before each is overwritten. Thus the
 scheduling window of this first reg-move is within (Def, Def+ii).

 Now, suppose Use appears before Def, i.e., Use is upwards-exposed; if
 it remains scheduled before the Def, no reg-move is needed. If, otoh,
 Def is scheduled (D0 cycles) before Use, breaking the anti-dependence
 between them, (D/ii + 1) reg-moves are needed in order to feed Use
 with the live-in value before Def. So in this case, the first reg-move
 should appear before Def (again feeding subsequent moves as needed
 before each is overwritten). Thus the scheduling window of this first
 reg-move is within (Def-ii, Def).

 In any case, each use requires a specific number of reg-moves, which
 begin either before or after the def; it is always safe (though
 potentially redundant) to start reg-moving before the def -- uses that
 do not need the live-in value will ignore it by feeding from a later
 reg-move.

 Right.  The distance on the Def-Use dependency is effectively transferred
 to the dependency between the Def and first move.

 And we can potentially have both kinds of Use at the same time.
 We then end up scheduling two moves, one in each window, but require
 them to occupy the same row and column.  It feels more convenient to
 schedule the earlier of the two moves.

 Q: if we generate reg-moves starting from before the def, would
 redundant reg-moves be eliminated if no use needs access to live-in
 value? If so, would that simplify their generation? (If not, it may be
 interesting to understand how to improve DCE to catch it.)

 To be clear, the new version of the patch (unlike the old one)
 doesn't emit reg-moves before the def if all uses are distance 0.
 We only do it where some or all uses are distance 1.  The first move
 before the def is always needed in that case.


Understood. The question was whether it would simplify things if we
always emit reg-moves before the def. And rely on DCE to hopefully
eliminate redundancies.

 So redudant moves are confined to the case where (a) we have more
 than one move, (b) we have both distance 0 and distance 1 uses, and
 (c) one of the two distance sets requires more moves than the other.
 If the distance 0 dependencies require more moves than the distance
 1 dependencies, we will have a redudant move in the prologue.
 If the distance 1 dependencies require more moves than the distance
 0 dependencies, we will have a redudant move in the epilogue.
 But I believe that this is already handled by dce.

 (For avoidance of doubt, the new code is more accurate than
 current trunk in this respect.)

 The issue of assigning stages to reg-moves is mostly relevant for
 prolog and epilog generation, which requires and receives special
 attention -- handled very nicely by ps_num_consecutive_stages! Note
 that currently a simple

Re: [4/4] Make SMS schedule register moves

2011-10-09 Thread Ayal Zaks
On Wed, Sep 28, 2011 at 4:49 PM, Richard Sandiford
richard.sandif...@linaro.org wrote:
 Ayal Zaks ayal.z...@gmail.com writes:
  +  /* The cyclic lifetime of move-new_reg starts and ends at move-def
  +     (the instruction that defines move-old_reg).
 
  So instruction I_REG_MOVE (new_reg=reg) must be scheduled before the
  next I_MUST_FOLLOW move/original-def (due to anti-dependence: it
  overwrites reg), but after the previous instance of I_MUST_FOLLOW (due
  to true dependence; i.e. account for latency also). Why do moves,
  except for the one closest to move-def (which is directly dependent
  upon it, i.e. for whom move-def == I_MUST_FOLLOW), have to worry
  about move-def at all? (Or have their cyclic lifetimes start and end
  there?)

 Because the uses of new_reg belong to the same move-def based cycle.


 the cycle (overloaded term; rather iteration in this context) to
 which the uses belong, is inferred from the cycle (absolute schedule
 time) in which they are scheduled, regardless of move-def.

 Just to prove your point about cycle being an overloaded term: I wasn't
 actually meaning it in the sense of (loop) iteration.  I meant a circular
 window based on move-def.


Point proven ;-)

 So (I think this is the uncontroversial bit): [M1] must be scheduled
 cyclically before [B] and cyclically after [C], with the cycle
 based at [B]:

    row 3 after [B]:  empty
    row 4:            [C]
    row 5:            [D]
    row 0:            empty
    row 1:            empty
    row 2:            [A]
    row 3 before [B]: empty

 [M1] could therefore go in row 1.  This part is OK.


 Here's how I see it:
 [M1] feeds [C] which is scheduled at cycle 10, so it must be scheduled
 before cycle 10-M_latency and after cycle 10-ii. [M1] uses the result
 of [B] which is scheduled at cycle 3, so must be scheduled after cycle
 3+B_latency and before cycle 3+ii. Taking all latencies to be 1 and
 ii=6, this yields a scheduling window of cycles [4,9]\cap[4,9]=[4,9];
 if scheduled at cycle 4 it must_follow [C], if scheduled at cycle 9 it
 must_precede [B]. This is identical to the logic behind the
 sched_window of any instruction, based on its dependencies (as you've
 updated not too long ago..), if we do not allow reg_moves (and
 arguably, one should not allow reg_moves when scheduling
 reg_moves...).

 To address the potential erroneous scenario of Loop 2, suppose [A] is
 scheduled as in the beginning in cycle 20, and that [M1] is scheduled
 in cycle 7 (\in[4,9]). Then
 [M2] feeds [D] and [A] which are scheduled at cycles 17 and 20, so it
 must be scheduled before cycle 17-1 and after cycle 20-6. [M2] uses
 the result of [M1], so must be scheduled after cycle 7+1 and before
 cycle 7+6. This yields the desired [14,16]\cap[8,13]=\emptyset.

 I agree it's natural to schedule moves for intra-iteration dependencies
 in the normal get_sched_window way.  But suppose we have a dependency:

   A --(T,N,1)-- B

 that requires two moves M1 and M2.  If we think in terms of cycles
 (in the SCHED_TIME sense), then this effectively becomes:

   A --(T,N1,1)-- M1 --(T,N2,0)-- M2 --(T,N3,0)-- B

 because it is now M1 that is fed by both the loop and the incoming edge.
 But if there is a second dependency:

   A --(T,M,0)-- C

 that also requires two moves, we end up with:

   A --(T,N1,1)-- M1 --(T,N2,0)-- M2 --(T,N3,0)-- B
                                        --(T,M3,-1)-- B

 and dependence distances of -1 feel a bit weird. :-)  Of course,
 what we really have are two parallel dependencies:

   A --(T,N1,1)-- M1 --(T,N2,0)-- M2 --(T,N3,0)-- B

   A --(T,M1,0)-- M1' --(T,M2,0)-- M2' --(T,N3,0)-- B

 where M1' and M2' occupy the same position as M1 and M2 in the schedule,
 but are one stage further along.  But we only schedule them once,
 so if we take the cycle/SCHED_TIME route, we have to introduce
 dependencies of distance -1.


Interesting; had to digest this distance 1 business, a result of
thinking in cycles instead of rows (or conversely), and mixing
dependences with scheduling; here's my understanding, based on your
explanations:

Suppose a Use is truely dependent on a Def, where both have been
scheduled at some absolute cycles; think of them as timing the first
iteration of the loop.
Assume first that Use appears originally after Def in the original
instruction sequence of the loop (dependence distance 0). In this
case, Use requires register moves if its distance D from Def according
to the schedule is more than ii cycles long -- by the time Use is
executed, the value it needs is no longer available in the def'd
register due to intervening occurrences of Def. So in this case, the
first reg-move (among D/ii) should appear after Def, recording its
value before the next occurrence of Def overwrites it, and feeding
subsequent moves as needed before each is overwritten. Thus the
scheduling window of this first reg-move is within (Def, Def+ii).

Now, suppose Use appears before Def, i.e., Use is upwards-exposed; if
it remains

Re: [3/4] SMS: Record moves in the partial schedule

2011-10-03 Thread Ayal Zaks
On Wed, Sep 28, 2011 at 4:53 PM, Richard Sandiford
richard.sandif...@linaro.org wrote:
 Ayal Zaks ayal.z...@gmail.com writes:
 Only request is to document that the register moves are
 placed/assigned-id's in a specific order.

I suppose this is the downside of splitting the patches up, sorry,
but the ids are only ordered for the throw-away loop:

 FOR_EACH_VEC_ELT_REVERSE (ps_reg_move_info, ps-reg_moves, i, move)
   add_insn_before (move-insn, ps_first_note (ps, move-def), NULL);

and for the prologue/epilogue code.  Both are replaced in patch 4
with code that doesn't rely on the ordering of ids.
 Ok then, very well. I was mostly referring to the following in
 prologue/epiloque code, which merely relies on assigning all regmoves
 of a node consecutive id's:

 FWIW, the 4/4 that I just posted did finally get rid of the first_reg_move
  nreg_moves fields.

 Here's a slightly updated patch in line with your 4/4 comments.
 The move-def is now always the id of the predecessor, rather than
 the id of the original ddg node producer.  I've adapted the throw-away
 loop accordingly, but there are no other changes.


This is ok.

Just to make sure I follow, the changes were (for this patch):

1. setting up a different move-def for each move

 + move-def = i_reg_move  0 ? first_move + i_reg_move - 1 : i;

instead of the same original def for all

 +  move-def = i;

2. inserting each move right before its own def, bottom-up:

 +  FOR_EACH_VEC_ELT (ps_reg_move_info, ps-reg_moves, i, move)
 +add_insn_before (move-insn, ps_first_note (ps, move-def), NULL);

instead of inserting each move right before the original def, top-down:

 FOR_EACH_VEC_ELT_REVERSE (ps_reg_move_info, ps-reg_moves, i, move)
   add_insn_before (move-insn, ps_first_note (ps, move-def), NULL);

Thanks,
Ayal.


 Tested on powerpc64-linux-gnu and with ARM benchmarks.

 Richard

 gcc/
        * modulo-sched.c (ps_insn): Adjust comment.
        (ps_reg_move_info): New structure.
        (partial_schedule): Add reg_moves field.
        (SCHED_PARAMS): Use node_sched_param_vec instead of node_sched_params.
        (node_sched_params): Turn first_reg_move into an identifier.
        (ps_reg_move): New function.
        (ps_rtl_insn): Cope with register moves.
        (ps_first_note): Adjust comment and assert that the instruction
        isn't a register move.
        (node_sched_params): Replace with...
        (node_sched_param_vec): ...this vector.
        (set_node_sched_params): Adjust accordingly.
        (print_node_sched_params): Take a partial schedule instead of a ddg.
        Use ps_rtl_insn and ps_reg_move.
        (generate_reg_moves): Rename to...
        (schedule_reg_moves): ...this.  Remove rescan parameter.  Record each
        move in the partial schedule, but don't emit it here.  Don't perform
        register substitutions here either.
        (apply_reg_moves): New function.
        (duplicate_insns_of_cycles): Use register indices directly,
        rather than finding instructions using PREV_INSN.  Use ps_reg_move.
        (sms_schedule): Call schedule_reg_moves before committing to
        a partial schedule.   Try the next ii if the schedule fails.
        Use apply_reg_moves instead of generate_reg_moves.  Adjust
        call to print_node_sched_params.  Free node_sched_param_vec
        instead of node_sched_params.
        (create_partial_schedule): Initialize reg_moves.
        (free_partial_schedule): Free reg_moves.

 Index: gcc/modulo-sched.c
 ===
 --- gcc/modulo-sched.c  2011-09-28 09:03:10.334301485 +0100
 +++ gcc/modulo-sched.c  2011-09-28 11:24:26.530300781 +0100
 @@ -124,7 +124,9 @@ #define PS_STAGE_COUNT(ps) (((partial_sc
  /* A single instruction in the partial schedule.  */
  struct ps_insn
  {
 -  /* The number of the ddg node whose instruction is being scheduled.  */
 +  /* Identifies the instruction to be scheduled.  Values smaller than
 +     the ddg's num_nodes refer directly to ddg nodes.  A value of
 +     X - num_nodes refers to register move X.  */
   int id;

   /* The (absolute) cycle in which the PS instruction is scheduled.
 @@ -137,6 +139,30 @@ struct ps_insn

  };

 +/* Information about a register move that has been added to a partial
 +   schedule.  */
 +struct ps_reg_move_info
 +{
 +  /* The source of the move is defined by the ps_insn with id DEF.
 +     The destination is used by the ps_insns with the ids in USES.  */
 +  int def;
 +  sbitmap uses;
 +
 +  /* The original form of USES' instructions used OLD_REG, but they
 +     should now use NEW_REG.  */
 +  rtx old_reg;
 +  rtx new_reg;
 +
 +  /* An instruction that sets NEW_REG to the correct value.  The first
 +     move associated with DEF will have an rhs of OLD_REG; later moves
 +     use the result of the previous move.  */
 +  rtx insn;
 +};
 +
 +typedef struct ps_reg_move_info ps_reg_move_info;
 +DEF_VEC_O (ps_reg_move_info);
 +DEF_VEC_ALLOC_O (ps_reg_move_info

Re: [PATCH, SMS 1/2] Avoid generating redundant reg-moves

2011-09-30 Thread Ayal Zaks
On Fri, Sep 30, 2011 at 10:03 AM, Revital Eres revital.e...@linaro.org wrote:
 Hello,

 This
 +  /* Skip instructions that do not set a register.  */
 +  if (set  !REG_P (SET_DEST (set)))
 +    continue;
 is ok. Can you also prevent !set insns from having reg_moves? (To be updated
 once auto_inc insns will be supported, if they'll deserve reg_moves too.)

 I added a check to verify that no reg-moves are created for !set instructions.

 Currently re-testing on ppc64-redhat-linux (bootstrap and regtest) and
 arm-linux-gnueabi (bootstrap c).

 OK to commit once tesing completes?


OK. later case  latter case.

Ayal.



 Thanks,
 Revital

 gcc/
        * modulo-sched.c (generate_reg_moves): Skip instructions that
        do not set a register and verify no regmoves are created for
        !single_set instructions.


 testsuite/
         * gcc.dg/sms-10.c: New file.



Re: [PATCH, SMS 1/2] Avoid generating redundant reg-moves

2011-09-27 Thread Ayal Zaks
On Tue, Sep 27, 2011 at 10:47 AM, Revital Eres revital.e...@linaro.org wrote:
 Hello,

 This
 +  /* Skip instructions that do not set a register.  */
 +  if (set  !REG_P (SET_DEST (set)))
 +    continue;
 is ok. Can you also prevent !set insns from having reg_moves? (To be updated
 once auto_inc insns will be supported, if they'll deserve reg_moves too.)

 Do you mean leaving any anti-dep edges to !set instructions similar to
 what is done for auto_inc addresses in part 2 of this patch?


No, I mean have a if (!set) continue; right after the above in
generate_reg_moves(). For patch 2/2 this would need to be updated to
allow generating reg_moves for non-single_set's which are auto-inc's
(making sure we generate them for the right reg).

Ayal.


 Thanks,
 Revital



Re: [PATCH, SMS 2/2] Support instructions with REG_INC_NOTE (second try)

2011-09-27 Thread Ayal Zaks
On Tue, Sep 27, 2011 at 9:47 AM, Revital Eres revital.e...@linaro.org wrote:
 Hello,

 ok, so if we have an auto-inc'ing insn which defines (auto-inc's) an
 addr register and another (say, result) register, we want to allow the
 result register to have life ranges in excess of ii (by eliminating
 anti-dep edges of distance 1 from uses to def, and generating
 reg_moves if/where needed), but avoid having such life ranges of addr
 (by retaining such anti-dep edges). Right?

 Yes.

 Are these all the edges? We have only one True dependence edge from
 insn 1 to insn 2, but insn 1 is setting two registers both used by
 insn 2 (regardless of what we decide to do with Anti-deps). As for
 Anti-deps of distance 1, we have only one going back from insn 2 to
 insn 1, perhaps corresponding to addr, allowing reg_moves for def1(?).
 But, it won't help def1, because this other Anti-dep will force them
 to be scheduled w/o reg_moves.

 Please ignore the edges in the previous example. It indeed was a mistake,
 sorry about the confusion.  Here are two examples taken from bootstrap
 on PPC of how the address is used; with the current patch applied and
 running with -fmodulo-sched-allow-regmoves:


Ok, this does have two anti-dep edges. But still, only a single true
dependence(?) ... can you see why?

Thanks,
Ayal.


 Node num: 2
 (insn 3681 3678 3682 500 (set (reg:QI 2914 [ MEM[base: D.9586_4130,
 offset: 0B] ])
        (mem:QI (pre_dec:DI (reg:DI 1644 [ ivtmp.687 ])) [0 MEM[base:
 D.9586_4130, offset: 0B]+0 S1 A8])) ../../gcc/libiberty/regex.c:4259
 358 {*movqi_internal}
     (expr_list:REG_INC (reg:DI 1644 [ ivtmp.687 ])
        (nil)))
 OUT ARCS:  [3681 -(T,2,1)- 3681]  [3681 -(T,2,0)- 3682]
 IN ARCS:  [3682 -(A,0,1)- 3681]  [3681 -(T,2,1)- 3681]  [3682
 -(A,0,1)- 3681]  [3682 -(T,2,1)- 3681]
 Node num: 3
 (insn 3682 3681 3683 500 (set (mem:QI (plus:DI (reg:DI 1644 [ ivtmp.687 ])
                (const_int 3 [0x3])) [0 MEM[base: D.9586_4130, offset:
 3B]+0 S1 A8])
        (reg:QI 2914 [ MEM[base: D.9586_4130, offset: 0B] ]))
 ../../gcc/libiberty/regex.c:4259 358 {*movqi_internal}
     (expr_list:REG_DEAD (reg:QI 2914 [ MEM[base: D.9586_4130, offset: 0B] ])
        (nil)))
 OUT ARCS:  [3682 -(A,0,1)- 3681]  [3682 -(A,0,1)- 3681]  [3682
 -(O,0,0)- 7263]  [3682 -(A,0,0)- 3683]  [3682 -(T,2,1)- 3681]
 IN ARCS:  [3681 -(T,2,0)- 3682]

 Another example of usage is as follows (the address register is not
 used in MEM):

 Node num: 0
 (insn 1419 1415 1423 9 (set (mem/f:DI (pre_inc:DI (reg:DI 1882 [
 ivtmp.1636 ])) [3 MEM[base: D.10911_2945, offset: 0B]+0 S8 A64])
        (reg/f:DI 3923)) ../../gcc/libiberty/regex.c:5788 378
 {*movdi_internal64}
     (expr_list:REG_INC (reg:DI 1882 [ ivtmp.1636 ])
        (nil)))
 OUT ARCS:  [1419 -(T,2,1)- 1419]  [1419 -(O,0,0)- 5932]  [1419
 -(O,0,0)- 1449]  [1419 -(T,2,1)- 1434]  [1419 -(T,2,0)- 1434]
 [1419 -(T,2,0)- 1432]  [1419 -(O,0,0)- 1431]  [1419 -(O,0,0)- 1427]
  [1419 -(O,0,0)- 1423]
 IN ARCS:  [1419 -(T,2,1)- 1419]  [1432 -(A,0,1)- 1419]  [1449
 -(O,0,1)- 1419]  [1434 -(A,0,1)- 1419]  [1431 -(O,0,1)- 1419]
 [1427 -(O,0,1)- 1419]  [1423 -(O,0,1)- 1419]
 Node num: 4
 (insn 1432 1431 1433 9 (set (reg:DI 2632)
        (plus:DI (reg/v/f:DI 1058 [ reg_info ])
            (reg:DI 1882 [ ivtmp.1636 ])))
 ../../gcc/libiberty/regex.c:5543 79 {*adddi3_internal1}
     (nil))
 OUT ARCS:  [1432 -(A,0,1)- 1419]  [1432 -(T,1,0)- 1433]
 IN ARCS:  [1419 -(T,2,0)- 1432]

 OK for mainline?


 OK, with the following comments:

 Thanks, will address the comments and re-submit.

 In other words, one would expect to see two Anti-dep edges from insn 2
 to insn 1, right?

 Yes, that's indeed the case in the first example above.

 Thanks,
 Revital



Re: [PATCH, SMS 1/2] Avoid generating redundant reg-moves

2011-09-26 Thread Ayal Zaks
OK for mainline?

Doh, hard to believe we never checked that an insn defines a register
before spitting out reg_moves for it ... nice catch.

This
+  /* Skip instructions that do not set a register.  */
+  if (set  !REG_P (SET_DEST (set)))
+    continue;
is ok. Can you also prevent !set insns from having reg_moves? (To be
updated once auto_inc insns will be supported, if they'll deserve
reg_moves too.)

Ayal.


 On Mon, Sep 26, 2011 at 7:31 AM, Revital Eres revital.e...@linaro.org wrote:

 Hello,

 The attached patch contains a fix to generate_reg_moves
 function. Currently we can generate reg-moves for stores which are later
 eliminated.  This happens when we have mem dependency with distance 1
 and as a result the number of regmoves is at least 1 based on the
 following
 calculation taken from generate_reg_moves ():

 if (e-distance == 1)
   nreg_moves4e = (SCHED_TIME (e-dest) - SCHED_TIME (e-src) + ii) / ii;

 This is an example of register move generated in such cases:

  reg_move = (insn 152 119 75 4 (set (reg:SI 231)
        (mem:SI (pre_modify:DI (reg:DI 215)
                (plus:DI (reg:DI 215)
                    (reg:DI 171 [ ivtmp.42 ]))) [3 MEM[base:
 pretmp.27_65, index: ivtmp.42_9, offset: 0B]+0 S4 A32])) -1
     (nil))

 When not handling REG_INC instructions this was not a problem as these
 reg-moves were removes by dead code elimination.
 for example:

 insn 1) mem[x]  = ...
 insn 2) .. = mem[y]

 When reg-move reg1 = mem [x] was generated mem[x] is not been used in
 insn 2 and thus reg1 could be eliminated.
 But with REG_INC this is different because the reg-move instruction
 remains and leads to bad gen.
 The attached tescase capture this case.

 Tested and bootstrap with patch 2 on ppc64-redhat-linux
 enabling SMS on loops with SC 1. On arm-linux-gnueabi bootstrap c on
 top of the set of patches that support do-loop pattern
 (http://gcc.gnu.org/ml/gcc-patches/2011-07/msg01807.html) which solves
 the bootstrap failure on ARM with SMS flags.

 OK for mainline?

 Thanks,
 Revital

 gcc/
        * modulo-sched.c (generate_reg_moves): Skip instructions that
        do not set a register.


 testsuite/
         * gcc.dg/sms-10.c: New file.



Re: [PATCH, SMS 2/2] Support instructions with REG_INC_NOTE (second try)

2011-09-26 Thread Ayal Zaks
On Mon, Sep 26, 2011 at 7:31 AM, Revital Eres revital.e...@linaro.org wrote:
 Hello,

 This patch extends the implementation to support instructions with
 REG_INC notes.
 It addresses the comments from the previous submission:
 http://gcc.gnu.org/ml/gcc-patches/2011-08/msg01299.html.


ok, so if we have an auto-inc'ing insn which defines (auto-inc's) an
addr register and another (say, result) register, we want to allow the
result register to have life ranges in excess of ii (by eliminating
anti-dep edges of distance 1 from uses to def, and generating
reg_moves if/where needed), but avoid having such life ranges of addr
(by retaining such anti-dep edges). Right?

 btw, regarding your previous question about the usage of the address
 register been auto inc, apparently it can be used as follows:

 insn 1) def1 = MEM [ pre_dec (addr) ]
 out edges: [1 -(T,2,1)- 1]  [1 -(T,2,0)- 2]
 in edges: [1 -(T,2,1)- 1]  [2 -(T,2,1)- 1]  [2 -(A,0,1)-1]

 insn 2) MEM [ addr + 3 ] = def1
 out edges:  [2 -(T,2,1)- 1]  [2 -(A,0,1)-1]
 in edges: [1 -(T,2,0)- 2]


Are these all the edges? We have only one True dependence edge from
insn 1 to insn 2, but insn 1 is setting two registers both used by
insn 2 (regardless of what we decide to do with Anti-deps). As for
Anti-deps of distance 1, we have only one going back from insn 2 to
insn 1, perhaps corresponding to addr, allowing reg_moves for def1(?).
But, it won't help def1, because this other Anti-dep will force them
to be scheduled w/o reg_moves.

 Reg-moves were not created for the address when testing on ppc.

ok; note that the example above shows one could potentially have an
insn 2 scheduled in a later stage than insn 1, wanting to feed off an
earlier version of the pre_dec'd addr. (In this case, it would
probably be better to update the offset (3) than use a reg_move for
the addr. (Such folding might work for other cases as well(?))).


 Tested and bootstrap with patch 1 on ppc64-redhat-linux
 enabling SMS on loops with SC 1. On arm-linux-gnueabi bootstrap
 c on top of the set of patches that support do-loop pattern
 (http://gcc.gnu.org/ml/gcc-patches/2011-07/msg01807.html) which solves
 the bootstrap failure on ARM with SMS flags.

 OK for mainline?


OK, with the following comments:

Make sure reg_moves are generated for the correct (result, not addr)
register, in generate_reg_moves().

beenbeing (multiple appearances).

Add a note that autoinc_var_is_used_p (rtx def_insn, rtx use_insn)
doesn't need to consider the specific address register; no reg_moves
will be allowed for any life range defined by def_insn and used by
use_insn, if use_insn uses an address register auto-inc'ed by
def_insn.

In other words, one would expect to see two Anti-dep edges from insn 2
to insn 1, right?

Ayal.


 Thanks,
 Revital


        * ddg.c (autoinc_var_is_used_p): New function.
        (create_ddg_dep_from_intra_loop_link,
        add_cross_iteration_register_deps): Call it.
        * modulo-sched.c (sms_schedule): Handle instructions with REG_INC.



Re: [3/4] SMS: Record moves in the partial schedule

2011-09-22 Thread Ayal Zaks
 Only request is to document that the register moves are
 placed/assigned-id's in a specific order.

I suppose this is the downside of splitting the patches up, sorry,
but the ids are only ordered for the throw-away loop:

 FOR_EACH_VEC_ELT_REVERSE (ps_reg_move_info, ps-reg_moves, i, move)
   add_insn_before (move-insn, ps_first_note (ps, move-def), NULL);

and for the prologue/epilogue code.  Both are replaced in patch 4
with code that doesn't rely on the ordering of ids.
Ok then, very well. I was mostly referring to the following in
prologue/epiloque code, which merely relies on assigning all regmoves
of a node consecutive id's:

  i_reg_moves = MIN (i_reg_moves, SCHED_NREG_MOVES (u));

  /* The reg_moves start from the *first* reg_move backwards.  */
 !    i_reg_move = SCHED_FIRST_REG_MOVE (u) + (i_reg_moves - 1);

Ayal.

2011/9/22 Richard Sandiford richard.sandif...@linaro.org

 Ayal Zaks ayal.z...@gmail.com writes:
  Richard Sandiford richard.sandif...@linaro.org wrote on 30/08/2011
  03:10:50 PM:
 
  From: Richard Sandiford richard.sandif...@linaro.org
  To: gcc-patches@gcc.gnu.org
  Cc: Ayal Zaks/Haifa/IBM@IBMIL
  Date: 30/08/2011 03:10 PM
  Subject: [3/4] SMS: Record moves in the partial schedule
 
 
  This patch adds infrastructure that will be used by the final patch.
  Specifically:
 
    - it splits the generation of register moves into two:
  schedule_reg_moves
      records moves in the partial schedule, while apply_reg_moves makes
  the
      register substitutions.
 
      This patch doesn't actually schedule the moves.  Instead, there's
      some throw-away code in apply_reg_moves to emit the moves in the
      same as we do now.  That's throw-away code that will be removed
      in the final patch.
 
    - schedule_reg_moves is allowed to fail.  We then try again with the
      next ii (subject to the usual ii limits).
 
      In this patch, schedule_reg_moves always returns true.
 
    - The partial schedule uses ids to represent register moves.
      The first register move has id g-num_nodes.
 
  Richard
  This is fine.

 Thanks.

  Only request is to document that the register moves are
  placed/assigned-id's in a specific order.

 I suppose this is the downside of splitting the patches up, sorry,
 but the ids are only ordered for the throw-away loop:

  FOR_EACH_VEC_ELT_REVERSE (ps_reg_move_info, ps-reg_moves, i, move)
    add_insn_before (move-insn, ps_first_note (ps, move-def), NULL);

 and for the prologue/epilogue code.  Both are replaced in patch 4
 with code that doesn't rely on the ordering of ids.

 I'd rather the code didn't rely on any ordering here.  If any future
 code is added that needs to know more about the moves, I think it should
 use the move structure instead of the ids.  (There's already a fair bit
 of info in the structure, but we could add more later if we need it.)

  Functionality-wise it results in identical code as current version,
  right?

 Yeah, that's right.  Only patch 4 does anything useful to the output.

 Richard




Re: [4/4] Make SMS schedule register moves

2011-09-22 Thread Ayal Zaks
 Richard Sandiford richard.sandif...@linaro.org wrote on 30/08/2011
 03:29:26 PM:

  From: Richard Sandiford richard.sandif...@linaro.org
  To: gcc-patches@gcc.gnu.org
  Cc: Ayal Zaks/Haifa/IBM@IBMIL
  Date: 30/08/2011 03:29 PM
  Subject: [4/4] Make SMS schedule register moves
 
  This is the move-scheduling patch itself.  It should be fairly
  self-explanatory.  Let me know if it isn't, and I'll try to improve
  the commentary.

 
  One potentially controversial change is to the way we handle moves
  in the prologue and epilogue.  The current code uses a conservative


 - conservativeaccurate? (Your approach seems to be more conservative)


  check to decide which stages need which moves.  This check copes
  with values that are live before the loop, and emits some moves
  that aren't actually needed.  The code also emits chains of moves
  that can be trivially optimised through propagation.  We rely on
  later patches to clean this up for us (and they do).
 
  So, rather than keep a rather complicated check here, I've simply
  emitted the moves for all stages.  In principle, that will generate
  a little more dead code than now in cases where chains of two moves
  are needed, but that seems to be very rare (when moves are scheduled).


indeed, it's better to simplify code generation and rely on subsequent
cleanups. Not sure about generating more (dead?) code in principle;
could you elaborate, for the record?


 
  Richard
 
 
  gcc/
     * modulo-sched.c (extend_node_sched_params): New function.
     (print_node_sched_params): Print the stage.
     (set_columns_for_row, schedule_reg_move): New functions.
     (set_columns_for_ps): Move up file and use set_columns_for_now.


typo: nowrow (unless you intend this to be temporary ;-)


     (schedule_reg_moves): Call extend_node_sched_params and
     schedule_reg_move.  Extend size of uses bitmap.  Return false
     if a move could not be scheduled.
     (apply_reg_moves): Don't emit moves here.
     (permute_partial_schedule): Handle register moves.
     (duplicate_insns_of_cycles): Remove for_prolog.  Always emit moves.


Always emit all moves


     (generate_prolog_epilog): Update calls accordingly.
 
  Index: gcc/modulo-sched.c



As I'm going over the patch, let me make sure I understand what's going on:

..
 +/* Try to schedule the move with ps_insn identifier I_REG_MOVE in PS.
 +   The source of the move is provided by I_MUST_FOLLOW, which has
 +   already been scheduled.

The source y of an x=y move is defined by the previous iteration of
the next move y=z (or by the original y=... move-def). We schedule
moves one after the other bottom-up, starting from the original
move-def moving backwards in cycles. This way, the instruction
I_MUST_FOLLOW defining y is always already scheduled when we come to
schedule the dependent I_REG_MOVE x=y move.

..
 +  /* The cyclic lifetime of move-new_reg starts and ends at move-def
 +     (the instruction that defines move-old_reg).

So instruction I_REG_MOVE (new_reg=reg) must be scheduled before the
next I_MUST_FOLLOW move/original-def (due to anti-dependence: it
overwrites reg), but after the previous instance of I_MUST_FOLLOW (due
to true dependence; i.e. account for latency also). Why do moves,
except for the one closest to move-def (which is directly dependent
upon it, i.e. for whom move-def == I_MUST_FOLLOW), have to worry
about move-def at all? (Or have their cyclic lifetimes start and end
there?)

In general, there's a distinction between the cycle an instruction
is scheduled at (for the first iteration), which is an absolute
arbitrary integer, and the row it is placed in the PS, which is
between 0 and ii-1, and is simply cycle mod ii. When scheduling, it's
clearer to reason about cycles, especially if your window includes a
row twice.

In addition to the (true+anti) dependences involving I_REG_MOVE with
I_MUST_FOLLOW, it has (true+anti) dependences with each use it feeds.

We could alternatively set these dependencies of I_REG_MOVE on
I_MUST_FOLLOW, and on its uses, and call get_sched_window(). But it's
presumably simpler to handle it directly here.

Right?
Ayal.


Re: [3/4] SMS: Record moves in the partial schedule

2011-09-21 Thread Ayal Zaks
Richard Sandiford richard.sandif...@linaro.org wrote on 30/08/2011
03:10:50 PM:

 From: Richard Sandiford richard.sandif...@linaro.org
 To: gcc-patches@gcc.gnu.org
 Cc: Ayal Zaks/Haifa/IBM@IBMIL
 Date: 30/08/2011 03:10 PM
 Subject: [3/4] SMS: Record moves in the partial schedule


 This patch adds infrastructure that will be used by the final patch.
 Specifically:

   - it splits the generation of register moves into two:
schedule_reg_moves
     records moves in the partial schedule, while apply_reg_moves makes
the
     register substitutions.

     This patch doesn't actually schedule the moves.  Instead, there's
     some throw-away code in apply_reg_moves to emit the moves in the
     same as we do now.  That's throw-away code that will be removed
     in the final patch.

   - schedule_reg_moves is allowed to fail.  We then try again with the
     next ii (subject to the usual ii limits).

     In this patch, schedule_reg_moves always returns true.

   - The partial schedule uses ids to represent register moves.
     The first register move has id g-num_nodes.

 Richard
This is fine. Only request is to document that the register moves are
placed/assigned-id's in a specific order. Functionality-wise it
results in identical code as current version, right?

Thanks,
Ayal.


Re: [2/4] SMS: Use ids to represent ps_insns

2011-09-13 Thread Ayal Zaks
2011/9/13 Richard Sandiford richard.sandif...@linaro.org

 Ayal Zaks ayal.z...@gmail.com writes:
  So instead of navigating directly from
  ps_insn-ddg_node-node_sched_params, we now use indices and lookup
  pointees in ddg_node and node_sched_params arrays. A bit of a
  nuisance, but it's ok with me.

 Well, IMO, ps_insn-ddg_node-node_sched_params is the same amount
 of indirection as node_sched_params[ps_insn-id].


Agreed. Both involve one indirection. The difference is in lookup
versus using each single direct pointer access.


  o One could still achieve the goal of having ps_insn's with
  node_sched_params but free of ddg_node's, w/o penalizing the existing
  direct access from ddg_node-node_sched_params, and w/o replicating
  the interface. If you add an (insn field and an) aux union to ps_insn,
  whose info points to your node_sched_params, you could reuse the same
  set of macros. Admittedly, it's a space-time tradeoff, where the space
  is probably not a major concern, and there are other places to look
  for first in sms to save time.

 I'm not sure what you mean by replicating the interface.  There's
 still only one interface for getting schedule params: everything still
 goes through the SCHED_* macros.

 If anything, I think having a way of going directly from the ps_insn to
 the sched params _would_ replicate the interface, because some parts of
 SMS want the scheduling parameters for a ddg node rather than a ps_insn.
 So some parts would (presumably) still use SCHED_* macros for a node,
 while the rest would use some other interface for ps_insns.

If we add to ps_insn the same aux structure as ddg_node has, with info
pointing to node_sched_params, we could reuse the same set of macros.


 But I'll do that if you think it's better.


No need, your solution is fine. Just wanted to clarify the options.


   If we do, I think we
 should remove the current SCHED_* macros and just have one that
 goes from a ddg node to the parameters structure itself.
 So SCHED_TIME (node) becomes SCHED_PARAMS (node)-time, etc.
 Code that operates on ps_insns could just use pi-sched_params-time.

 But if we want to keep SCHED_TIME, etc., macros that can be used for
 everything, then I don't see how changes to ps_insn would help.

  o first_reg_move and nreg_moves will be redundant for regmoves, right?
  No refactoring is perfect...

 After the patches, they're only there for debugging.  I did wonder
 whether I should just remove them.


I'd prefer not to keep useless fields around. If left only for
debugging, better document and/or rename.


  o it could be useful to have a debug version which checks array
  bounds, in case one feeds a bad index.

 FWIW, VEC does this for us.


ok, right, in patch [3/4], for node_sched_params and ps_reg_move_info.


  o and a minor comment below:
 
  /* The scheduling parameters held for each node. */
  typedef struct node_sched_params
  {
  - int asap; /* A lower-bound on the absolute scheduling cycle. */
  int time; /* The absolute scheduling cycle (time = asap). */
 
  Please fix/remove the (time = asap) comment, as there's no asap
  there any more.

 OK.

Thanks,
Ayal.


 Richard


Re: [2/4] SMS: Use ids to represent ps_insns

2011-09-12 Thread Ayal Zaks
Richard Sandiford richard.sandif...@linaro.org wrote on 30/08/2011
03:03:59 PM:

 From: Richard Sandiford richard.sandif...@linaro.org

 To: gcc-patches@gcc.gnu.org

 Cc: Ayal Zaks/Haifa/IBM@IBMIL

 Date: 30/08/2011 03:05 PM

 Subject: [2/4] SMS: Use ids to represent ps_insns


 Instructions in a partial schedule are currently represented as a
 ddg node. This patch uses a more abstract id instead. At the moment,
 the ids map directly to ddg nodes, but the next patch will add register
 moves to the end.

 One slight advantage of using ids is that we can leave the ASAP value
 on the node; we don't need to copy it across to the scheduling info.

 (Later patches use the same scheduling info for moves, for which an ASAP
 value would be wasted space.)

 Richard


OK.

So instead of navigating directly from
ps_insn-ddg_node-node_sched_params, we now use indices and lookup
pointees in ddg_node and node_sched_params arrays. A bit of a
nuisance, but it's ok with me. A couple of thoughts:

o One could still achieve the goal of having ps_insn's with
node_sched_params but free of ddg_node's, w/o penalizing the existing
direct access from ddg_node-node_sched_params, and w/o replicating
the interface. If you add an (insn field and an) aux union to ps_insn,
whose info points to your node_sched_params, you could reuse the same
set of macros. Admittedly, it's a space-time tradeoff, where the space
is probably not a major concern, and there are other places to look
for first in sms to save time.

o first_reg_move and nreg_moves will be redundant for regmoves, right?
No refactoring is perfect...

o it could be useful to have a debug version which checks array
bounds, in case one feeds a bad index.

o and a minor comment below:

 /* The scheduling parameters held for each node. */
 typedef struct node_sched_params
 {
 - int asap; /* A lower-bound on the absolute scheduling cycle. */
 int time; /* The absolute scheduling cycle (time = asap). */

Please fix/remove the (time = asap) comment, as there's no asap
there any more.

Thanks,
Ayal.


Re: [1/4] SMS: remove register undo list

2011-09-11 Thread Ayal Zaks
Resending; didn't seem to go through.

-- Forwarded message --
From: Ayal Zaks ayal.z...@gmail.com
Date: 2011/9/11
Subject: gcc-patches@gcc.gnu.org
To: Richard Sandiford ‫richard.sandif...@linaro.org‬


 Richard Sandiford richard.sandif...@linaro.org wrote on 30/08/2011
02:58:22 PM:

 From: Richard Sandiford richard.sandif...@linaro.org

 To: gcc-patches@gcc.gnu.org

 Cc: Ayal Zaks/Haifa/IBM@IBMIL

 Date: 30/08/2011 02:58 PM

 Subject: [1/4] SMS: remove register undo list


 This patch removes the (unused) undo_replace_buff_elem list.
 Verges on the obvious, but still.

Sure, this is fine.

Thanks for the clean-up,

Ayal.


 Patch 3 splits the generation of moves into two: one function to record
 what needs to happen, and another function to actually do it. It's then
 easy to bail out if we decide that we don't want the moves.

 Richard


 gcc/
 * modulo-sched.c (undo_replace_buff_elem): Delete.
 (generate_reg_moves): Don't build and return an undo list.
 (free_undo_replace_buff): Delete.
 (sms_schedule): Adjust call to generate_reg_moves.
 Don't call free_undo_replace_buff.

 Index: gcc/modulo-sched.c
 ===
 --- gcc/modulo-sched.c 2011-08-24 12:38:37.171362916 +0100
 +++ gcc/modulo-sched.c 2011-08-24 12:56:17.754942951 +0100
 @@ -165,17 +165,6 @@ struct partial_schedule
 int stage_count; /* The stage count of the partial schedule. */
 };

 -/* We use this to record all the register replacements we do in
 - the kernel so we can undo SMS if it is not profitable. */
 -struct undo_replace_buff_elem
 -{
 - rtx insn;
 - rtx orig_reg;
 - rtx new_reg;
 - struct undo_replace_buff_elem *next;
 -};
 -
 -

 static partial_schedule_ptr create_partial_schedule (int ii, ddg_ptr, int 
 history);
 static void free_partial_schedule (partial_schedule_ptr);
 @@ -456,13 +445,12 @@ print_node_sched_params (FILE *file, int
 nreg_moves = --- + 1 - { dependence.
 ii { 1 if not.
 */
 -static struct undo_replace_buff_elem *
 +static void
 generate_reg_moves (partial_schedule_ptr ps, bool rescan)
 {
 ddg_ptr g = ps-g;
 int ii = ps-ii;
 int i;
 - struct undo_replace_buff_elem *reg_move_replaces = NULL;

 for (i = 0; i  g-num_nodes; i++)
 {
 @@ -543,22 +531,6 @@ generate_reg_moves (partial_schedule_ptr

 EXECUTE_IF_SET_IN_SBITMAP (uses_of_defs[i_reg_move], 0, i_use, sbi)
 {
 - struct undo_replace_buff_elem *rep;
 -
 - rep = (struct undo_replace_buff_elem *)
 - xcalloc (1, sizeof (struct undo_replace_buff_elem));
 - rep-insn = g-nodes[i_use].insn;
 - rep-orig_reg = old_reg;
 - rep-new_reg = new_reg;
 -
 - if (! reg_move_replaces)
 - reg_move_replaces = rep;
 - else
 - {
 - rep-next = reg_move_replaces;
 - reg_move_replaces = rep;
 - }
 -
 replace_rtx (g-nodes[i_use].insn, old_reg, new_reg);
 if (rescan)
 df_insn_rescan (g-nodes[i_use].insn);
 @@ -568,21 +540,6 @@ generate_reg_moves (partial_schedule_ptr
 }
 sbitmap_vector_free (uses_of_defs);
 }
 - return reg_move_replaces;
 -}
 -
 -/* Free memory allocated for the undo buffer. */
 -static void
 -free_undo_replace_buff (struct undo_replace_buff_elem *reg_move_replaces)
 -{
 -
 - while (reg_move_replaces)
 - {
 - struct undo_replace_buff_elem *rep = reg_move_replaces;
 -
 - reg_move_replaces = reg_move_replaces-next;
 - free (rep);
 - }
 }

 /* Update the sched_params (time, row and stage) for node U using the II,
 @@ -1472,8 +1429,6 @@ sms_schedule (void)
 }
 else
 {
 - struct undo_replace_buff_elem *reg_move_replaces;
 -
 if (!opt_sc_p)
 {
 /* Rotate the partial schedule to have the branch in row ii-1. */
 @@ -1523,13 +1478,11 @@ sms_schedule (void)
 /* The life-info is not valid any more. */
 df_set_bb_dirty (g-bb);

 - reg_move_replaces = generate_reg_moves (ps, true);
 + generate_reg_moves (ps, true);
 if (dump_file)
 print_node_sched_params (dump_file, g-num_nodes, g);
 /* Generate prolog and epilog. */
 generate_prolog_epilog (ps, loop, count_reg, count_init);
 -
 - free_undo_replace_buff (reg_move_replaces);
 }

 free_partial_schedule (ps);


Re: [PATCH, SMS] Minor misc. fixes

2011-09-11 Thread Ayal Zaks
Copying the lists..

‎-- Forwarded message --‎
From: Ayal Zaks ‎ayal.z...@gmail.com‎
Date: 2011/9/11
Subject: Re: [PATCH, SMS] Minor misc. fixes
To: Revital Eres ‫revital.e...@linaro.org‬


2011/9/8 Revital Eres revital.e...@linaro.org

 Hello,

 The attached patch contains minor fixes.

 Currently testing and bootstrap on ppc64-redhat-linux enabling SMS on
 loops with SC 1.

 OK for mainline once testing completes?


OK.
While we're at it, an alternative would be to have
remove_node_from_ps() assert its own (parameters and) return value.
That is, replace if (c) return false by assert (!c) and have it
return void if successful. There's not much you can do if it returns
false. That would check its other invocation too.

Ayal.



 Thanks,
 Revital


 Changelog

        * modulo-sched.c (optimize_sc): Call remove_node_from_ps outside
        of gcc_assert.
        (sms_schedule): Add print info.


Re: [PATCH, SMS] Support instructions with REG_INC_NOTE (re-submisson)

2011-08-15 Thread Ayal Zaks
Ok, so this extends the infrastructure to support insns which set an
arbitrary number of registers, but currently specifically handles only
REG_INC situations (which set two registers). I'm not against
{0,1,infinity}, but wonder if this case really deserves the
complexity: post/pre-inc/decrementing load insns may need regmoves for
the register loaded, due to the latency of the load and desire to
schedule associated uses farther than ii cycles away (as do regular
loads), but do they also need regmoves for the address register being
post/pre-inc/decremented? Its latency should not be long, and it's
often feeding only itself so regmoves are not needed/won't help. If
not, perhaps a simpler solution is to allow REG_INC insns but disallow
their address register from being regmove'd, dedicating the single
regmove info for the value loaded.

Are there actually cases where you need the address register to regmove?

Ayal.

2011/8/15 Revital Eres revital.e...@linaro.org

 Hello,

 This is a re-submission of the patch to support instructions with 
 REG_INC_NOTE.
 (http://gcc.gnu.org/ml/gcc-patches/2011-04/msg01309.html)

 It contains a minor change from the previous submission suggested by
 Richard Sandiford: to use reg_referenced_p instead of
 rtx_referenced_p.

 The patch supports instructions with auto inc/dec operations as to
 support such instructions we can not assume that there is only one
 definition in the instruction. This assumption is currently used to
 generate reg-moves information. Because the auto inc/dec instruction
 defines more then one register we need to generate different reg-moves
 for each of the definitions. At some point in the SMS procedure we
 need to find
 the first reg-move generated for a specific definition(*) and if there
 is more than one definition in the instruction (lets call it insn1)
 we can not simply assume it's first reg-move instruction is the first
 instruction right before insn1.

 for example, the following instruction defines two variables: x and i:
 x = mem [POST_INC (i)]

 lets say both of them need reg-move

 insn 1) reg_move_i = i
 insn 2) reg_move_x = x
 insn 3) x = mem [POST_INC (i)]

 then in order to reach the first reg-move of i we can not just go to
 insn 2 like in the current implementation, so we need to save for each
 definition in the instruction it's first reg-move instruction.

 (*) I'm referring to the generation of prologue and epilogue
 (duplicate_insns_of_cycles) which uses first_reg_move field in
 node_sched_params structure, and SCHED_FIRST_REG_MOVE definition in
 modulo-sched.c.

 Reg-tested and bootstrap on ppc64-redhat-linux enabling SMS on loops with SC 
 1.

 OK for mainline?

 Thanks,
 Revital


        * modulo-sched.c (record_inc_dec_insn_info,
        free_node_sched_params): New functions.
        (SCHED_FIRST_REG_MOVE, SCHED_NREG_MOVES): Remove.
        (struct regmove_info): New.
        (insn_regmove_info): New field in node_sched_params.
        (print_node_sched_params): Print information for all the
        definitions in the instructions.
        (generate_reg_moves, duplicate_insns_of_cycles,
        set_node_sched_params): Adjust the code to handle instructions
        that have multiple definitions.
        (sms_schedule): Handle loops that contain instructions with
        FIND_REG_INC_NOTE and call free_node_sched_params.


Re: Patches ping

2011-07-29 Thread Ayal Zaks
 [PATCH, SMS 3/4] Optimize stage count
 http://gcc.gnu.org/ml/gcc-patches/2011-05/msg01341.html

This patch for minimizing the stage count (which also refactors and
cleans up the code) is approved. Have some minor comments below,
followed by some thoughts for possible follow-up improvements.
Thanks,
Ayal.

Changelog:

       (sms_schedule_by_order): Update call to get_sched_window.
       all set_must_precede_follow.
    ^^^
    call


+/* Set bitmaps TMP_FOLLOW and TMP_PRECEDE to MUST_FOLLOW and MUST_PRECEDE
+   respectively only if cycle C falls in the scheduling window boundaries
    ^^
 on the border of


   sbitmap tmp_precede = NULL;
   sbitmap tmp_follow = NULL;
are redundantly reset in set_must_precede_follow().



+/* Update the sched_params for node U using the II,
    ^
(time, row and stage)
 +   the CYCLE of U and MIN_CYCLE.  */
Please also clarify that we're not simply taking
  SCHED_STAGE (u) = CALC_STAGE_COUNT (SCHED_TIME (u), min_cycle, ii);
because the stages are aligned on cycle 0.


+  /* First, normailize the partial schedualing.  */
   ^   ^

+   /* Try to achieve optimized SC by normalizing the partial
+  schedule (having the cycles start from cycle zero). The branch
   ^
+  location must be placed in row ii-1 in the final scheduling.

+  If that's not the case after the normalization then try to
    ^^
+  move the branch to that row if possible.  */
    ^
    If failed, shift all instructions to position the branch in row ii-1.


For consistency and clarity, may be instead of:

+   /* Bring the branch to cycle -1.  */
+   int amount = SCHED_TIME (g-closing_branch) + 1;
it would be better to have:

+   /* Bring the branch to cycle ii-1.  */
+   int amount = SCHED_TIME (g-closing_branch) - (ii - 1);


Some thoughts on possible improvements (not mandatory; especially
given the delay in approval, sorry..thanks for the ping):
o Have optimize_sc() take care of all possible rotations doing the
best it can, without returning an indication which leaves subsequent
(suboptimal) processing to the caller.
o Instead of removing the branch and then trying to get it back into
the same cycle if you can't place it in row ii-1, consider keeping it
in its place and removing it only if you succeed to schedule it
(also..) in row ii-1.
o May be worthwhile to apply more refactoring, so that the code to
reschedule the branch in row ii-1 reuses more of the general code for
scheduling an instruction (possibly setting end = start + end), but
avoid splitting a row.
o Would be interesting to learn of loops that still have suboptimal
SC's, to see if it's still an issue.
Ayal.


2011/7/20 Revital Eres revital.e...@linaro.org

 Hello,

 [PATCH, SMS 3/4] Optimize stage count
 http://gcc.gnu.org/ml/gcc-patches/2011-05/msg01341.html

 [PATCH, SMS 4/4] Misc. fixes
 http://gcc.gnu.org/ml/gcc-patches/2011-05/msg01342.html

 [PATCH, SMS] Fix calculation of issue_rate
 http://gcc.gnu.org/ml/gcc-patches/2011-05/msg01344.html

 Thanks,
 Revital


Re: Patches ping

2011-07-29 Thread Ayal Zaks
[PATCH, SMS 4/4] Misc. fixes
http://gcc.gnu.org/ml/gcc-patches/2011-05/msg01342.html
Sure, this is fine.
(Sorry for all the previous '?'s..).
Thanks,
Ayal.

2011/7/20 Revital Eres revital.e...@linaro.org

 Hello,

 [PATCH, SMS 3/4] Optimize stage count
 http://gcc.gnu.org/ml/gcc-patches/2011-05/msg01341.html

 [PATCH, SMS 4/4] Misc. fixes
 http://gcc.gnu.org/ml/gcc-patches/2011-05/msg01342.html

 [PATCH, SMS] Fix calculation of issue_rate
 http://gcc.gnu.org/ml/gcc-patches/2011-05/msg01344.html

 Thanks,
 Revital


Re: Patches ping

2011-07-29 Thread Ayal Zaks
 [PATCH, SMS] Fix calculation of issue_rate
 http://gcc.gnu.org/ml/gcc-patches/2011-05/msg01344.html

This is ok (with the updated Changelog). Alternatively, we can have a
local variable for holding the issue_rate.
Ayal.


2011/7/20 Revital Eres revital.e...@linaro.org:
 Hello,

 [PATCH, SMS 3/4] Optimize stage count
 http://gcc.gnu.org/ml/gcc-patches/2011-05/msg01341.html

 [PATCH, SMS 4/4] Misc. fixes
 http://gcc.gnu.org/ml/gcc-patches/2011-05/msg01342.html

 [PATCH, SMS] Fix calculation of issue_rate
 http://gcc.gnu.org/ml/gcc-patches/2011-05/msg01344.html

 Thanks,
 Revital



Re: [PATCH, SMS 1/4] Fix calculation of row_rest_count

2011-06-15 Thread Ayal Zaks
Revital Eres revital.e...@linaro.org wrote on 14/06/2011 09:27:32 AM:

 From: Revital Eres revital.e...@linaro.org
 To: Ayal Zaks/Haifa/IBM@IBMIL
 Cc: gcc-patches@gcc.gnu.org, Patch Tracking patc...@linaro.org
 Date: 14/06/2011 09:27 AM
 Subject: Re: [PATCH, SMS 1/4] Fix calculation of row_rest_count

 Hello,

  Please add the following:
  o A clarification that rows_length is used only (as an optimization) to
  back off quickly from trying to schedule a node in a full row; that is,
to
  avoid running through futile DFA state transitions.
  o An assert that ps-rows_length[i] equals the number of nodes in ps-
rows
  [i] (e.g., in verify_partial_schedule(); and then recheck...).

 Done. Besides the additions to address your comments I also added two
 more bits to update rows_length field into rotate_partial_schedule ()
 and ps_insert_empty_row () that were missing in the previous
 implementation of the patch.


Glad the additional assertion already proved useful ;-)

 The patch was re-tested on top of the patch to fix violation of memory
 dependence (http://gcc.gnu.org/ml/gcc-patches/2011-06/msg00960.html)
 as follows: On ppc64-redhat-linux regtest as well as bootstrap with
 SMS flags, enabling SMS also on loops with stage count 1.  Regtested
 on SPU.
 On arm-linux-gnueabi bootstrap c language with SMS
 flags enabling SMS also on loops with stage count 1 and currently
 regtseted on c,c++.

 OK for mainline with these changes once regtests on arm-linux-gnueabi
completes?


OK.
Thanks,
Ayal.

 Thanks,
 Revital


 * modulo-sched.c (struct ps_insn): Remove row_rest_count field.
 (struct partial_schedule): Add rows_length field.
 (verify_partial_schedule): Check rows_length.
 (ps_insert_empty_row): Handle rows_length.
 (create_partial_schedule): Likewise.
 (free_partial_schedule): Likewise.
 (reset_partial_schedule): Likewise.
 (create_ps_insn): Remove rest_count argument.
 (remove_node_from_ps): Update rows_length.
 (add_node_to_ps): Update rows_length and call create_ps_insn
 without passing row_rest_count.
 (rotate_partial_schedule): Update rows_length.
 [attachment patch_sms_5_6.txt deleted by Ayal Zaks/Haifa/IBM]



Re: [PATCH, SMS] Fix violation of memory dependence

2011-06-14 Thread Ayal Zaks
Revital Eres revital.e...@linaro.org wrote on 13/06/2011 10:29:06 AM:

 From: Revital Eres revital.e...@linaro.org
 To: Ayal Zaks/Haifa/IBM@IBMIL
 Cc: gcc-patches@gcc.gnu.org, Patch Tracking patc...@linaro.org
 Date: 13/06/2011 10:29 AM
 Subject: [PATCH, SMS] Fix violation of memory dependence

 Hello,

 The attached patch fixes violation of memory dependencies. The
 problematic scenario happens when -fmodulo-sched-allow-regmoves flag
 is set and certain anti-dep edges are not created.

 For example, consider the following three instructions and the edges
 between them.  When -fmodulo-sched-allow-regmoves is set the edge (63 -
 Anti, 0 - 64) is not created. (probably due to transitivity)

 Insn 63)  r168 = MEM[176]
 Out edges: (63 - Anti, 0 - 64)
 In edges: (64 - True, 1 - 63), (68 - True, 1 - 63)

 insn 64)  176 = 176 + 4
 Out edges: (64 - True, 1 - 63), (64 - True, 0- 68)
 In edges: (63 - Anti, 0 - 64)

 insn 68)  MEM[176 – 4] =  193
 Out edges: (68 - True, 1 - 63)
 In edges: (64 - True, 0- 68)

 This anti-dep edge is on the path from one memory instruction to another
 --- from 63 to 68; such that removing the edge caused a violation of
 the memory dependencies as insn 63 was scheduled after insn 68.

 This patch adds intra edges between every two memory instructions in
 this case.  It fixes recent bootstrap failure on ARM. (with SMS flags)

 The patch was tested as follows:
 On ppc64-redhat-linux regtest as well as bootstrap with SMS flags
 enabling SMS also on loops with stage count 1.  Regtested on SPU.
 On arm-linux-gnueabi bootstrap c language with SMS
 flags enabling SMS also on loops with stage count 1
 and currently regression testing on c,c++.

 OK for mainline once regtest on arm-linux-gnueabi completes?


Yes, this is a straightforward fix to a wrong-code bug, as discussed
offline. Other alternatives that might introduce less edges:
o connect predecessors of u with v, and u with successors of v, when
removing edge (u,v). Maybe there are other cases which rely on transitivity
(?).
o have a version of sched_analyze that avoids creating register anti-deps
to begin with, and thus will create memory-deps in the absence of
transitivity.


 * ddg.c (add_intra_loop_mem_dep): New function.

You could check first thing if (from-cuid == to-cuid), for code clarity.

Nice catch,
Ayal.


 Thanks,
 Revital

 Changelog:

 gcc/
 * ddg.c (add_intra_loop_mem_dep): New function.
 (build_intra_loop_deps): Call it.

 testsuite/
 * gcc.dg/sms-9.c: New file.
 [attachment patch_fix_regmoves_12_6.txt deleted by Ayal Zaks/Haifa/IBM]

Re: [PATCH, SMS 1/4] Fix calculation of row_rest_count

2011-05-29 Thread Ayal Zaks
 The calculation of the number of instructions in a row is currently
 done by updating row_rest_count field in struct ps_insn on the fly
 while creating a new instruction.  It is used to make sure we do not
 exceed
 the issue_rate.
 This calculation assumes the instruction is inserted in the beginning of
a
 row thus does not take into account the cases where it must follow other
 instructions.  Also, it's not been property updated when an instruction
 is removed.
 To avoid the overhead of maintaining this row_rest_count count in every
 instruction in each row as is currently done; this patch maintains one
 count per row which holds the number of instructions in the row.

 The patch was tested together with the rest of the patches in this
series.
 On ppc64-redhat-linux regtest as well as bootstrap with SMS flags
 enabling SMS also on loops with stage count 1.  Regtested on SPU.
 On arm-linux-gnueabi regtseted on c,c++. Bootstrap c language with SMS
 flags enabling SMS also on loops with stage count 1.

 OK for mainline?


OK. The current situation is indeed broken, and inefficient, as you noted:
o We are checking only the first node in a row for its row_rest_count
value, to figure out if the row is full; hence it's really a property of
the row, not of a node.
o When scheduling a node, we update only that node's row_rest_count; hence
if we schedule a node in position other than the first, the row_rest_count
value of the first node is not incremented.
o When removing a node, we do not decrement any row_rest_count value.

Please add the following:
o A clarification that rows_length is used only (as an optimization) to
back off quickly from trying to schedule a node in a full row; that is, to
avoid running through futile DFA state transitions.
o An assert that ps-rows_length[i] equals the number of nodes in ps-rows
[i] (e.g., in verify_partial_schedule(); and then recheck...).

Thanks for fixing this,
Ayal.

 Thanks,
 Revital

 * modulo-sched.c (struct ps_insn): Remove row_rest_count field.
 (struct partial_schedule): Add rows_length field.
 (ps_insert_empty_row): Handle rows_length.
 (create_partial_schedule): Likewise.
 (free_partial_schedule): Likewise.
 (reset_partial_schedule): Likewise.
 (create_ps_insn): Remove rest_count argument.
 (remove_node_from_ps): Update rows_length.
 (add_node_to_ps): Update rows_length and call create_ps_insn
 without passing row_rest_count.
 [attachment patch_row_rest_count_17_5.txt deleted by Ayal
Zaks/Haifa/IBM]



Re: [PATCH, SMS 2/4] Move the creation of anti-dep edge

2011-05-29 Thread Ayal Zaks
Revital Eres revital.e...@linaro.org wrote on 19/05/2011 07:44:23 AM:

 From: Revital Eres revital.e...@linaro.org
 To: Ayal Zaks/Haifa/IBM@IBMIL
 Cc: gcc-patches@gcc.gnu.org, Patch Tracking patc...@linaro.org
 Date: 19/05/2011 07:44 AM
 Subject: [PATCH, SMS 2/4] Move the creation of anti-dep edge

 Hello,

 The attached patch moves the creation of anti-dep edge from a
 branch to it's def from create_ddg_dep_from_intra_loop_link () to
 add_cross_iteration_register_deps  () due to the fact the edge is with
 distance 1 and thus should be in the later function.
 The edge was added to avoid creating reg-moves.

 The patch was tested together with the rest of the patches in this
series.
 On ppc64-redhat-linux regtest as well as bootstrap with SMS flags
 enabling SMS also on loops with stage count 1.  Regtested on SPU.
 On arm-linux-gnueabi regtseted on c,c++. Bootstrap c language with SMS
 flags enabling SMS also on loops with stage count 1.

 OK for mainline?


OK, this makes sense. Just to re-confirm, the exact same edges are created
in both cases, right?

+ /* Always create the edge if the use node is a branch in
+order to prevent the creation of reg-moves.  */
   if (DF_REF_ID (last_def) != DF_REF_ID (first_def)
-  || !flag_modulo_sched_allow_regmoves)
+  || !flag_modulo_sched_allow_regmoves
+ || (flag_modulo_sched_allow_regmoves  JUMP_P (use_node-
insn)))
  
  redundant; suffices to check

+   || JUMP_P (use_node-insn))

 create_ddg_dep_no_link (g, use_node, first_def_node,
ANTI_DEP,
 REG_DEP, 1);



 Thanks,
 Revital

 * ddg.c (create_ddg_dep_from_intra_loop_link): Remove the
creation
 of anti-dep edge from a branch.
 (add_cross_iteration_register_deps): Create anti-dep edge from
 a branch.


 Index: ddg.c
 ===
 --- ddg.c   (revision 173785)
 +++ ddg.c   (working copy)
 @@ -197,11 +197,6 @@ create_ddg_dep_from_intra_loop_link (ddg
  }
  }

 -  /* If a true dep edge enters the branch create an anti edge in the
 - opposite direction to prevent the creation of reg-moves.  */
 -  if ((DEP_TYPE (link) == REG_DEP_TRUE)  JUMP_P (dest_node-insn))
 -create_ddg_dep_no_link (g, dest_node, src_node, ANTI_DEP, REG_DEP,
1);
 -
 latency = dep_cost (link);
 e = create_ddg_edge (src_node, dest_node, t, dt, latency, distance);
 add_edge_to_ddg (g, e);
 @@ -306,8 +301,11 @@ add_cross_iteration_register_deps (ddg_p

   gcc_assert (first_def_node);

 + /* Always create the edge if the use node is a branch in
 +order to prevent the creation of reg-moves.  */
if (DF_REF_ID (last_def) != DF_REF_ID (first_def)
 -  || !flag_modulo_sched_allow_regmoves)
 +  || !flag_modulo_sched_allow_regmoves
 + || (flag_modulo_sched_allow_regmoves  JUMP_P (use_node-
insn)))
  create_ddg_dep_no_link (g, use_node, first_def_node,
ANTI_DEP,
  REG_DEP, 1);



Re: [PATCH, SMS 2/3] Skip DEBUG_INSNs while recognizing doloop

2011-05-10 Thread Ayal Zaks
 OK for mainline?

Yes, this is pretty obvious. (You don't have to change to
prev_nondebug_insn btw).

Ayal.




From:   Revital Eres revital.e...@linaro.org
To: Ayal Zaks/Haifa/IBM@IBMIL
Cc: gcc-patches@gcc.gnu.org, Patch Tracking patc...@linaro.org
Date:   08/05/2011 07:37 AM
Subject:[PATCH, SMS 2/3] Skip DEBUG_INSNs while recognizing doloop



Hello,

The attached patch adds code to skip DEBUG_INSNs while recognizing
doloop pattern.

The patch was tested together with the rest of the patches in this series
and on top of the patch to support do-loop for ARM (not yet in mainline,
but approved http://gcc.gnu.org/ml/gcc-patches/2011-01/msg01718.html).
On ppc64-redhat-linux regtest as well as bootstrap with SMS flags
enabling SMS also on loops with stage count 1.  Regtested on SPU.
On arm-linux-gnueabi regtseted on c,c++. Bootstrap c language with SMS
flags enabling SMS also on loops with stage count 1.

OK for mainline?

Thanks,
Revital

Changelog:

* modulo-sched.c (doloop_register_get): Ignore DEBUG_INSNs while
recognizing doloop.


Index: modulo-sched.c
===
--- modulo-sched.c  (revision 173368)
+++ modulo-sched.c  (working copy)
@@ -310,10 +313,10 @@ doloop_register_get (rtx head ATTRIBUTE_
  either a single (parallel) branch-on-count or a (non-parallel)
  branch immediately preceded by a single (decrement) insn.  */
   first_insn_not_to_check = (GET_CODE (PATTERN (tail)) == PARALLEL ? tail
- : PREV_INSN (tail));
+ : prev_nondebug_insn (tail));

   for (insn = head; insn != first_insn_not_to_check; insn = NEXT_INSN
(insn))
-if (reg_mentioned_p (reg, insn))
+if (reg_mentioned_p (reg, insn)  !DEBUG_INSN_P (insn))
   {
 if (dump_file)
 {