This week I investigated the modulo scheduler on IA64.  Enabling SMS by default
(-fmodulo-sched -fmodulo-sched-allow-regmoves) leads to a bootstrap failure
on IA64: gcc/build/genautomata.o differs when comparing stages 2 and 3.

I haven't studied this issue in detail, because the combination of these
two patches of mine fixes the problem:

[Additional edges to instructions with clobber]
http://gcc.gnu.org/ml/gcc-patches/2011-12/msg00505.html
[Correctly delete anti-dep edges for renaming]
http://gcc.gnu.org/ml/gcc-patches/2011-12/msg00506.html

Then I regtested two compilers: the first is clean trunk, and the second
is trunk with SMS enabled by default and the two patches mentioned above.
Comparing the results shows several new failures.

FAIL: gcc.dg/pr45259.c (internal compiler error)
FAIL: gcc.dg/pr45259.c (test for excess errors)
FAIL: gcc.dg/pr47881.c (test for excess errors)
FAIL: tr1/5_numerical_facilities/special_functions/08_cyl_bessel_i/check_value.cc execution test
FAIL: tr1/5_numerical_facilities/special_functions/09_cyl_bessel_j/check_value.cc execution test
FAIL: tr1/5_numerical_facilities/special_functions/11_cyl_neumann/check_value.cc execution test
FAIL: tr1/5_numerical_facilities/special_functions/21_sph_bessel/check_value.cc execution test
FAIL: tr1/5_numerical_facilities/special_functions/23_sph_neumann/check_value.cc execution test

The problem with gcc.dg/pr45259.c is an ICE, which I fixed earlier with this
patch:
[Correct extracting loop exit condition]
http://gcc.gnu.org/ml/gcc-patches/2011-09/msg02049.html

In gcc.dg/pr47881.c a -fcompare-debug failure happens.  The only difference
between the -fcompare-debug dumps is that some NOTE_INSN_DELETED entries are
placed differently.  I haven't studied this problem.

The last 5 new failures disappeared after fixing the issue described
below.

Imagine the following doloop (each use and set is an fmad in the real
example):

use1 reg
set reg
use2 reg
insn
cloop

After SMS it looks like this (the scheduling stage and cycle are written
before each instruction):

0  0 set reg
0  0 use1 reg_copy
0  4 use2 reg
0 -4 reg_copy = reg
0  8 insn
0 -1 cloop

So all instructions were wrongly assigned to stage zero.  When they are
copied to the prologue, the regmove is still placed after use1, and as a
result the register reg_copy is used uninitialized in the prologue.  This
leads to miscompilation.

I have found that the issue can be fixed by an additional schedule
normalization after scheduling the branch instruction in the optimize_sc
function.  The situation here is the same as in the patch by Richard Sandiford
http://gcc.gnu.org/ml/gcc-patches/2011-10/msg00748.html
which enables scheduling regmoves:
"Moves that handle incoming values might have been added
to a new first stage.  Bump the stage count if so."
The same bumping should be done after scheduling the branch.

In my model example the branch is scheduled on cycle -1 and remains in stage
zero.  When the regmove is later scheduled on cycle -4, Richard's check doesn't
trigger normalization, because the new PS_MIN_CYCLE is -4, but min_cycle is -12:

  min_cycle = PS_MIN_CYCLE (ps) - SMODULO (PS_MIN_CYCLE (ps), ps->ii);
  ...
  call schedule_reg_moves (ps)
  ...
  if (PS_MIN_CYCLE (ps) < min_cycle)
    {
      reset_sched_times (ps, 0);
      stage_count++;
    }
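
To make the arithmetic concrete, here is a small standalone sketch of the
check above.  It is not GCC code: stage_start, needs_normalization and
stage_of are hypothetical helper names, and ii = 12 is an assumption inferred
from min_cycle being -12 when PS_MIN_CYCLE is -1.  Only the SMODULO definition
is meant to match the macro in modulo-sched.c.

```c
/* Signed modulo whose result is always in [0, y), as in modulo-sched.c.  */
#define SMODULO(x, y) ((x) % (y) < 0 ? ((x) % (y) + (y)) : (x) % (y))

/* Hypothetical helper: round CYCLE down to the first cycle of the stage
   that contains it.  This is the min_cycle expression from the excerpt.  */
static int
stage_start (int cycle, int ii)
{
  return cycle - SMODULO (cycle, ii);
}

/* Hypothetical helper: nonzero if a schedule whose minimum cycle moved
   from OLD_MIN_CYCLE to NEW_MIN_CYCLE has gained a new first stage.  */
static int
needs_normalization (int new_min_cycle, int old_min_cycle, int ii)
{
  return new_min_cycle < stage_start (old_min_cycle, ii);
}

/* Hypothetical helper: stage of CYCLE once the schedule is normalized
   relative to the stage containing MIN_CYCLE.  Assumes CYCLE is not
   below that stage's first cycle.  */
static int
stage_of (int cycle, int min_cycle, int ii)
{
  return (cycle - stage_start (min_cycle, ii)) / ii;
}
```

With ii = 12, needs_normalization (-4, -1, 12) is 0, which is why the regmove
check stays silent.  If the minimum cycle before moving the branch was 0 (an
assumption about the model example), needs_normalization (-1, 0, 12) is 1, so
the same check placed in optimize_sc would fire.  After normalization,
stage_of gives stage 0 for the regmove (cycle -4) and the branch (cycle -1),
and stage 1 for the instructions on cycles 0, 4 and 8.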

I attach a patch which adds the same check to the optimize_sc function.
With -fmodulo-sched -fmodulo-sched-allow-regmoves enabled it passes
bootstrap and regtest on IA64.  It also passes bootstrap and regtest
on x86_64 with SMS patched to schedule non-doloop loops:
[Support new loop pattern]
http://gcc.gnu.org/ml/gcc-patches/2011-12/msg00495.html

OK for trunk or maybe 4.8?

Happy holidays!

--
Roman Zhuykov
2011-12-29  Roman Zhuykov  <zhr...@ispras.ru>
        * modulo-sched.c (optimize_sc): Allow branch-scheduling to add a new
        first stage.
---

diff --git a/gcc/modulo-sched.c b/gcc/modulo-sched.c
index 969b273..e5de595 100644
--- a/gcc/modulo-sched.c
+++ b/gcc/modulo-sched.c
@@ -998,7 +998,7 @@ optimize_sc (partial_schedule_ptr ps, ddg_ptr g)
       int row = SMODULO (branch_cycle, ps->ii);
       int num_splits = 0;
       sbitmap must_precede, must_follow, tmp_precede, tmp_follow;
-      int c;
+      int min_cycle, c;
 
       if (dump_file)
        fprintf (dump_file, "\nTrying to schedule node %d "
@@ -1053,6 +1053,7 @@ optimize_sc (partial_schedule_ptr ps, ddg_ptr g)
        if (next_ps_i->id == g->closing_branch->cuid)
          break;
 
+      min_cycle = PS_MIN_CYCLE (ps) - SMODULO (PS_MIN_CYCLE (ps), ps->ii);
       remove_node_from_ps (ps, next_ps_i);
       success =
        try_scheduling_node_in_cycle (ps, g->closing_branch->cuid, c,
@@ -1092,6 +1093,10 @@ optimize_sc (partial_schedule_ptr ps, ddg_ptr g)
          ok = true;
        }
 
+      /* This might have been added to a new first stage.  */
+      if (PS_MIN_CYCLE (ps) < min_cycle)
+       reset_sched_times (ps, 0);
+
       free (must_precede);
       free (must_follow);
     }
