This week I investigated the modulo scheduler on IA64. Enabling SMS by default (-fmodulo-sched -fmodulo-sched-allow-regmoves) leads to a bootstrap failure on IA64: gcc/build/genautomata.o differs when comparing stages 2 and 3.
I haven't studied this issue in detail, because the combination of these two patches of mine fixes the problem:
[Additional edges to instructions with clobber]
http://gcc.gnu.org/ml/gcc-patches/2011-12/msg00505.html
[Correctly delete anti-dep edges for renaming]
http://gcc.gnu.org/ml/gcc-patches/2011-12/msg00506.html

Then I regtested two compilers: the first is clean trunk, and the second is trunk with SMS enabled by default and the two patches mentioned above. Comparing the results shows several new failures.

FAIL: gcc.dg/pr45259.c (internal compiler error)
FAIL: gcc.dg/pr45259.c (test for excess errors)
FAIL: gcc.dg/pr47881.c (test for excess errors)
FAIL: tr1/5_numerical_facilities/special_functions/08_cyl_bessel_i/check_value.cc execution test
FAIL: tr1/5_numerical_facilities/special_functions/09_cyl_bessel_j/check_value.cc execution test
FAIL: tr1/5_numerical_facilities/special_functions/11_cyl_neumann/check_value.cc execution test
FAIL: tr1/5_numerical_facilities/special_functions/21_sph_bessel/check_value.cc execution test
FAIL: tr1/5_numerical_facilities/special_functions/23_sph_neumann/check_value.cc execution test

The problem with gcc.dg/pr45259.c is an ICE, which I fixed earlier with this patch:
[Correct extracting loop exit condition]
http://gcc.gnu.org/ml/gcc-patches/2011-09/msg02049.html

In gcc.dg/pr47881.c a -fcompare-debug failure happens. The only difference between the -fcompare-debug dumps is that some NOTE_INSN_DELETED entries are placed differently. I haven't studied this problem.

The last 5 new failures disappeared after fixing the issue described below.

Imagine the following doloop (each use and set is a fmad in the real example):

use1 reg
set reg
use2 reg
insn
cloop

After SMS it looks like this; I write the scheduling stage and cycle before each instruction.

0  0 set reg
0  0 use1 reg_copy
0  4 use2 regR
0 -4 reg_copy = reg
0  8 insn
0 -1 cloop

So all instructions were wrongly classified to stage zero.
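
To make the misclassification concrete, here is a minimal sketch of how a stage index follows from a cycle by floor division. It is a simplified model, not the code in modulo-sched.c: stage_of is a hypothetical helper, the SMODULO macro is written here to mirror the signed modulo used in that file, and ii = 12 is an assumption inferred from the min_cycle value of -12 quoted in this message.

```c
#include <assert.h>

/* Signed modulo with a non-negative result for positive y; written to
   mirror the SMODULO macro used in gcc/modulo-sched.c.  */
#define SMODULO(x, y) ((x) % (y) < 0 ? (x) % (y) + (y) : (x) % (y))

/* Hypothetical helper (not a GCC function): stage index of CYCLE for
   initiation interval II, i.e. floor (CYCLE / II).  Negative cycles
   land in negative stages, which means a new first stage is needed.  */
static int
stage_of (int cycle, int ii)
{
  assert (ii > 0);
  return (cycle - SMODULO (cycle, ii)) / ii;
}
```

With the assumed ii of 12, stage_of (0, 12), stage_of (4, 12) and stage_of (8, 12) give 0, while stage_of (-4, 12) and stage_of (-1, 12) give -1. So the regmove and the branch belong to a stage before the one holding "set reg", which is why treating everything as stage zero is wrong.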
While copying them to the prologue, the regmove remains placed after use1, and as a result the register reg_copy is used uninitialized in the prologue. This leads to miscompilation. I have found that the issue can be fixed by an additional schedule normalization after scheduling the branch instruction in the optimize_sc function. The situation here is the same as in the patch by Richard Sandiford
http://gcc.gnu.org/ml/gcc-patches/2011-10/msg00748.html
which enables scheduling regmoves: "Moves that handle incoming values might have been added to a new first stage. Bump the stage count if so."

The same bumping should be done after scheduling the branch. In my model example the branch is scheduled on cycle -1 and remains in stage zero. When the regmove is later scheduled on cycle -4, Richard's check doesn't cause normalization, because the new PS_MIN_CYCLE is -4, but min_cycle is -12:

  min_cycle = PS_MIN_CYCLE (ps) - SMODULO (PS_MIN_CYCLE (ps), ps->ii);
  ... call schedule_reg_moves (ps) ...
  if (PS_MIN_CYCLE (ps) < min_cycle)
    {
      reset_sched_times (ps, 0);
      stage_count++;
    }

I attach a patch which adds the same check to the optimize_sc function. With -fmodulo-sched -fmodulo-sched-allow-regmoves enabled it passes bootstrap and regtest on IA64. It also passes bootstrap and regtest on x86_64 with SMS patched to schedule non-doloop loops:
[Support new loop pattern]
http://gcc.gnu.org/ml/gcc-patches/2011-12/msg00495.html

OK for trunk or maybe 4.8?

Happy holidays!
--
Roman Zhuykov
2011-12-29  Roman Zhuykov  <zhr...@ispras.ru>

	* modulo-sched.c (optimize_sc): Allow branch-scheduling to add a new
	first stage.
---
diff --git a/gcc/modulo-sched.c b/gcc/modulo-sched.c
index 969b273..e5de595 100644
--- a/gcc/modulo-sched.c
+++ b/gcc/modulo-sched.c
@@ -998,7 +998,7 @@ optimize_sc (partial_schedule_ptr ps, ddg_ptr g)
       int row = SMODULO (branch_cycle, ps->ii);
       int num_splits = 0;
       sbitmap must_precede, must_follow, tmp_precede, tmp_follow;
-      int c;
+      int min_cycle, c;
 
       if (dump_file)
         fprintf (dump_file, "\nTrying to schedule node %d "
@@ -1053,6 +1053,7 @@ optimize_sc (partial_schedule_ptr ps, ddg_ptr g)
             if (next_ps_i->id == g->closing_branch->cuid)
               break;
 
+          min_cycle = PS_MIN_CYCLE (ps) - SMODULO (PS_MIN_CYCLE (ps), ps->ii);
           remove_node_from_ps (ps, next_ps_i);
           success =
             try_scheduling_node_in_cycle (ps, g->closing_branch->cuid, c,
@@ -1092,6 +1093,10 @@ optimize_sc (partial_schedule_ptr ps, ddg_ptr g)
               ok = true;
             }
 
+      /* This might have been added to a new first stage.  */
+      if (PS_MIN_CYCLE (ps) < min_cycle)
+        reset_sched_times (ps, 0);
+
       free (must_precede);
       free (must_follow);
     }
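
For readers who want to replay the arithmetic of the check outside GCC, here is a small standalone sketch. It is not part of the patch. stage_floor and new_first_stage_p are hypothetical helpers, ii = 12 is inferred from the min_cycle value of -12 mentioned in the message, and the assumption that PS_MIN_CYCLE is 0 before the branch is rescheduled comes from the model example, where the earliest remaining instruction sits on cycle 0.

```c
#include <assert.h>

/* Signed modulo with a non-negative result for positive y; written to
   mirror the SMODULO macro used in gcc/modulo-sched.c.  */
#define SMODULO(x, y) ((x) % (y) < 0 ? (x) % (y) + (y) : (x) % (y))

/* Hypothetical helper: first cycle of the stage that contains
   PS_MIN_CYCLE, the same expression the patch stores in min_cycle.  */
static int
stage_floor (int ps_min_cycle, int ii)
{
  assert (ii > 0);
  return ps_min_cycle - SMODULO (ps_min_cycle, ii);
}

/* Hypothetical helper: nonzero when the minimum cycle after scheduling
   falls below the recorded boundary, i.e. a new first stage appeared.  */
static int
new_first_stage_p (int min_cycle_before, int ps_min_cycle_after)
{
  return ps_min_cycle_after < min_cycle_before;
}
```

Under these assumptions, the existing check records stage_floor (-1, 12), which is -12, after the branch already sits on cycle -1, so new_first_stage_p (-12, -4) is 0 and nothing is renormalized. Recording the boundary before the branch is moved gives stage_floor (0, 12), which is 0, and new_first_stage_p (0, -1) is 1, so the schedule times are reset.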