Re: debug insns in SMS
Thanks for the duplicate ping. This is fine. So this indeed solves the discrepancy between running SMS w/ and w/o debugging? Please include a comment next to the code stating why it's important not to create such deps. You may also want to store the result of DEP_PRO (dep) in src_something and use it twice, for clarity. Thanks, Ayal. On Thu, Jun 14, 2012 at 1:00 AM, Alexandre Oliva aol...@redhat.com wrote: Apologies for the duplicate ping, this one is now properly addressed to the pass maintainer. On Apr 9, 2012, Alexandre Oliva aol...@redhat.com wrote: On May 4, 2011, Revital1 Eres e...@il.ibm.com wrote: Hello Alexandre I think this will restore proper functioning to SMS in the presence of debug insns. A while ago, we'd never generate deps of non-debug insns on debug insns. I introduced them to enable sched to adjust (reset) debug insns when non-debug insns were moved before them. I believe it is safe to leave them out of the SCCs. Even though this will end up causing some loss of debug info, that's probably unavoidable, and the end result after this change is pobably the best we can hope for. Your thoughts? Thanks for the patch! I actually discussed this issue with Ayal yesterday. Ayal also suggested to reconsider the edges that are created in the DDG between real instructions and debug_insns. Currently, we create bidirectional anti deps edges between them. This leads to the problem you were trying to solve in the current patch (described below) where these extra edges influence the construction of the strongly connected component and the code generated with and w\o -g. Your patch seems to solve this problem. However I can not approve it as I'm not the maintainer (Ayal is). Ping? (Retested on x86_64-linux-gnu and i686-pc-linux-gnu) for gcc/ChangeLog from Alexandre Oliva aol...@redhat.com * ddg.c (build_intra_loop_deps): Discard deps of nondebug on debug. Ping? http://gcc.gnu.org/ml/gcc-patches/2012-04/msg00419.html -- Alexandre Oliva, freedom fighter http://FSFLA.org/~lxoliva/ You must be the change you wish to see in the world. -- Gandhi Be Free! -- http://FSFLA.org/ FSF Latin America board member Free Software Evangelist Red Hat Brazil Compiler Engineer
Re: [SMS] Support new loop pattern
Roman, Andrey, Sorry for the delayed response. It would indeed be good to have SMS apply to more loop patterns, still within the realm of *countable* loops. SMS was originally designed to handle doloops, with a specific pattern controlling the loop, easily identified and separable from the loop's body. The newly proposed change to support new loop patterns is pretty invasive and sizable, taking place entirely within modulo-sched.c. The main issue I've been considering, is whether it would be possible instead to transform the new loop patterns we want SMS to handle, into doloops (potentially introducing additional induction variables to feed other uses), and then feed the resulting loop into SMS as is? In other words, could you fold it into doloop.c? And if so, will doing so introduce significant overheads? 2012/3/29 Andrey Belevantsev a...@ispras.ru: Hello, I'd like to ping again those SMS patches once we're back to Stage 1. Ayal, maybe it would remove some burden for you if you'd review the general SMS functionality of those patches, and we'd ask RTL folks to look at the pieces related to RTL pattern matching and generation? It definitely would ... especially in light of the above issue. Thanks (for your patches, patience, pings..), Ayal. Yours, Andrey On 10.02.2012 16:15, Roman Zhuykov wrote: Ping. Ayal, please review this patch and these three patches too: http://gcc.gnu.org/ml/gcc-patches/2011-12/msg00505.html http://gcc.gnu.org/ml/gcc-patches/2011-12/msg00506.html http://gcc.gnu.org/ml/gcc-patches/2011-12/msg01800.html -- Roman Zhuykov zhr...@ispras.ru
Re: [PATCH] Fix -fdump-rtl-sms (PR rtl-optimization/52095)
On Fri, Feb 3, 2012 at 8:07 PM, Jakub Jelinek ja...@redhat.com wrote: Hi! On some targets e.g. sms-7.c test fails, because fprintf is called with %s format and NULL argument, GLIBC prints for that e.g. SMS loop num: 1, file: (null), line: 0 but it isn't portable. print-rtl.c guards the locator printing with /* Pretty-print insn locators. Ignore scoping as it is mostly redundant with line number information and do not print anything when there is no location information available. */ if (INSN_LOCATOR (in_rtx) insn_file (in_rtx)) fprintf(outfile, %s:%i, insn_file (in_rtx), insn_line (in_rtx)); which fixes this, but there are 7 different spots that would need adjusting in modulo-sched.c, so I've added a helper function for that. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? Ok with me. It would be better to try and locate some relevant file:line information elsewhere(?) instead of silently giving up. Thanks, Ayal. 2012-02-03 Jakub Jelinek ja...@redhat.com PR rtl-optimization/52095 * modulo-sched.c (dump_insn_locator): New function. (loop_canon_p, sms_schedule): Use it. --- gcc/modulo-sched.c.jj 2011-12-14 08:11:03.0 +0100 +++ gcc/modulo-sched.c 2012-02-03 13:45:49.137997767 +0100 @@ -1246,6 +1246,19 @@ loop_single_full_bb_p (struct loop *loop return true; } +/* Dump file:line from INSN's location info to dump_file. */ + +static void +dump_insn_locator (rtx insn) +{ + if (dump_file INSN_LOCATOR (insn)) + { + const char *file = insn_file (insn); + if (file) + fprintf (dump_file, %s:%i, file, insn_line (insn)); + } +} + /* A simple loop from SMS point of view; it is a loop that is composed of either a single basic block or two BBs - a header and a latch. */ #define SIMPLE_SMS_LOOP_P(loop) ((loop-num_nodes 3 ) \ @@ -1271,9 +1284,9 @@ loop_canon_p (struct loop *loop) { rtx insn = BB_END (loop-header); - fprintf (dump_file, SMS loop many exits ); - fprintf (dump_file, %s %d (file, line)\n, - insn_file (insn), insn_line (insn)); + fprintf (dump_file, SMS loop many exits); + dump_insn_locator (insn); + fprintf (dump_file, \n); } return false; } @@ -1284,9 +1297,9 @@ loop_canon_p (struct loop *loop) { rtx insn = BB_END (loop-header); - fprintf (dump_file, SMS loop many BBs. ); - fprintf (dump_file, %s %d (file, line)\n, - insn_file (insn), insn_line (insn)); + fprintf (dump_file, SMS loop many BBs.); + dump_insn_locator (insn); + fprintf (dump_file, \n); } return false; } @@ -1407,13 +1420,13 @@ sms_schedule (void) } if (dump_file) - { - rtx insn = BB_END (loop-header); - - fprintf (dump_file, SMS loop num: %d, file: %s, line: %d\n, - loop-num, insn_file (insn), insn_line (insn)); + { + rtx insn = BB_END (loop-header); - } + fprintf (dump_file, SMS loop num: %d, loop-num); + dump_insn_locator (insn); + fprintf (dump_file, \n); + } if (! loop_canon_p (loop)) continue; @@ -1440,9 +1453,8 @@ sms_schedule (void) { if (dump_file) { - fprintf (dump_file, %s %d (file, line)\n, - insn_file (tail), insn_line (tail)); - fprintf (dump_file, SMS single-bb-loop\n); + dump_insn_locator (tail); + fprintf (dump_file, \nSMS single-bb-loop\n); if (profile_info flag_branch_probabilities) { fprintf (dump_file, SMS loop-count ); @@ -1543,14 +1555,15 @@ sms_schedule (void) continue; if (dump_file) - { - rtx insn = BB_END (loop-header); + { + rtx insn = BB_END (loop-header); - fprintf (dump_file, SMS loop num: %d, file: %s, line: %d\n, - loop-num, insn_file (insn), insn_line (insn)); + fprintf (dump_file, SMS loop num: %d, loop-num); + dump_insn_locator (insn); + fprintf (dump_file, \n); - print_ddg (dump_file, g); - } + print_ddg (dump_file, g); + } get_ebb_head_tail (loop-header, loop-header, head, tail); @@ -1561,9 +1574,8 @@ sms_schedule (void) if (dump_file) { - fprintf (dump_file, %s %d (file, line)\n, - insn_file (tail), insn_line (tail)); - fprintf (dump_file, SMS single-bb-loop\n); + dump_insn_locator (tail); + fprintf (dump_file, \nSMS single-bb-loop\n); if (profile_info flag_branch_probabilities) { fprintf (dump_file, SMS loop-count ); @@
Re: [PATCH, SMS] Fix PR51794
On Tue, Jan 10, 2012 at 12:31 PM, Revital1 Eres e...@il.ibm.com wrote: Hello, The patch below fixes ICE reported in PR51794. It avoids creating DDG edges for register uses of class DF_REF_ARTIFICIAL as the latter does not have real instructions for them and thus calling BLOCK_FOR_INSN fails. Tested and bootstrap on ppc64-redhat-linux, enabling SMS on loops with SC 1. OK for mainline? Yes, this is ok. Thanks, Ayal. Thanks, Revital Chanelog: PR rtl-optimization/51794 * ddg.c (add_cross_iteration_register_deps): Avoid creating edges for uses of class DF_REF_ARTIFICIAL. testsuite/ PR rtl-optimization/51794 * gcc.dg/sms-12.c: New test. (See attached file: patch_fix_pr51794.txt)
Re: [PATCH SMS 2/2, RFC] Register pressure estimation for the partial schedule (re-submission)
SMS changes are ok. * common.opt (fmodulo-sched-reg-pressure, -fmodulo-sched-verbose): New flags. We should document what the different verbosity levels are, or at-least their range. Thanks, Ayal. On Tue, Jan 10, 2012 at 7:48 PM, Vladimir Makarov vmaka...@redhat.com wrote: On 01/03/2012 04:25 AM, Revital1 Eres wrote: Attached is an updated version with the two changes mentioned above taken from the previous patch. Tested and bootstrap with the other patch in the series on ppc64-redhat-linux, enabling SMS on loops with SC 1. Thanks again, Revital IRA changes are ok for me. Thanks, Revital. 2012-01-03 Richard Sandifordrichard.sandif...@linaro.org Revital Eresrevital.e...@linaro.org * loop-invariant.c (get_regno_pressure_class): Move function to... * ira.c: Here. * common.opt (fmodulo-sched-reg-pressure, -fmodulo-sched-verbose): New flags. * doc/invoke.texi (fmodulo-sched-reg-pressure, -fmodulo-sched-verbose): Document the flags. * ira.h (get_regno_pressure_class, reset_pseudo_classes_defined_p): Declare. * ira-costs.c (reset_pseudo_classes_defined_p): New function. * Makefile.in (modulo-sched.o): Include ira.h and modulo-sched.h. (modulo-sched-pressure.o): New. * modulo-sched.c (ira.h, modulo-sched.h): New includes. (partial_schedule_ptr, ps_insn_ptr, struct ps_insn, struct ps_reg_move_info, struct partial_schedule): Move to modulo-sched.h. (ps_rtl_insn, ps_reg_move): Remove static. (apply_reg_moves): Remove static and call df_insn_rescan only if PS is final. (undo_reg_moves): New function. (sms_schedule): Call register pressure estimation. * modulo-sched.h: New file. * modulo-sched-pressure.c: New file. (See attached file: patch_pressure_3_1_12.txt)
Re: [PATCH SMS 2/2, RFC] Register pressure estimation for the partial schedule (re-submission)
On Mon, Jan 2, 2012 at 3:30 PM, Richard Sandiford rdsandif...@googlemail.com wrote: Ayal Zaks ayal.z...@gmail.com writes: + for (i = 0; i ira_pressure_classes_num; i++) + { + enum reg_class pressure_class; + + pressure_class = ira_pressure_classes[i]; + + if (max_reg_pressure[pressure_class] == 0) + continue; + + if (dump_file) + fprintf (dump_file, %s=%d %d , reg_class_names[pressure_class], + max_reg_pressure[pressure_class], + ira_available_class_regs[pressure_class]); + + if (max_reg_pressure[pressure_class] + ira_class_hard_regs_num[pressure_class]) + { + if (dump_file) + fprintf (dump_file, (pressure)\n); + + pressure_p = true; you can break; now. FWIW, I thought the same thing at first, but I think Revital's way is better. It's nice to know when looking at dumps whether there is excess pressure in more than one class. This isn't performance-critical code after all. ok however, you have everything setup to compute the amount of spill, so suggest to do the following instead: pressure += max_reg_pressure[pressure_class] - ira_class_hard_regs_num[pressure_class]); or better call the variable spillage instead of pressure, and return it. Current use will only check if it's zero or positive. I read this suggestion in the same way as Revital seems to have done: that we sum the pressure change over all classes. But that isn't the idea. Using N too many registers in one pressure class cannot be mitigated by leaving N registers in another pressure class unused. of-course (wasn't thinking of spilling from one register file to another); meant to suggest doing: if (max_reg_pressure[pressure_class] ira_class_hard_regs_num[pressure_class]) spillage += max_reg_pressure[pressure_class] - ira_class_hard_regs_num[pressure_class]); TBH, the original version of this function looked better to me. ok, it would surely suffice for now. Future use, as discussed offline, should compare this to the amount of spillage the original loop (w/o modulo-scheduling) is expected to have. I wonder if that's really worth pursuing though. For one thing, we've no control over where the spill code is going to be inserted, so the final ii is going to be a little arbitrary. That's less of a problem without SMS because we're able to reschedule the code after register allocation in order to move spills around. Indeed; that's the main motivation for the second sched pass. For another thing, the pressure of the original (unscheduled) code isn't necessarily indicative of what can be achieved by the normal scheduler with something like -fsched-pressure. We can't assume that the original block has minimum pressure. If we're serious about wanting to use SMS in high-pressure loops, perhaps we should consider trying to split live ranges in SMS itself. I'm not sure it makes sense to leave an SMSed loop that we know is going to need spill code. Richard Agreed. Suggest to have concrete testcases guide further development. Thanks, Ayal.
Re: [PATCH, SMS] Prevent the creation of reg-moves for definitions with MODE_CC
The attached patch prevents the creation of reg-moves for definitions with MODE_CC and thus solves this ICE. Currently testing and bootstrap on ppc64-redhat-linux, enabling SMS on loops with SC 1. OK for 4.7 once testing completes? Yes, thanks for catching this. Shouldn't we prevent creating such regmoves for (the other case of) intra-loop anti-deps as well? While true, I doubt either PPC or MIPS really benefit from moving around registers of CCmode. IIRC, PPC has eight 4-bit CR's and supports efficient copying of *one bit* from one CR to another (via cror), which would require special regmove handling. Hence I share your doubt. Ayal. On Thu, Dec 22, 2011 at 10:53 AM, Richard Sandiford richard.sandif...@linaro.org wrote: Richard Henderson r...@redhat.com writes: On 12/20/2011 09:47 AM, Richard Sandiford wrote: Revital Eres revital.e...@linaro.org writes: +/* Return true if one of the definitions in INSN has MODE_CC. Otherwise + return false. */ +static bool +def_has_ccmode_p (rtx insn) +{ + df_ref *def; + + for (def = DF_INSN_DEFS (insn); *def; def++) + { + enum machine_mode mode = GET_MODE (DF_REF_REG (*def)); + + if (GET_MODE_CLASS (mode) == MODE_CC) + return true; + } + + return false; +} FWIW, an alternative might be to test have_regs_of_mode[(int) mode]. That says whether there are any allocatable (non-fixed) registers of the given mode. While true, I doubt either PPC or MIPS really benefit from moving around registers of CCmode. Certainly MIPS has no way of easily moving CCmode registers around. It's a rather complicated reload, that. I'd be very tempted to simply go with the original patch. OK, sorry for the noise. Richard
Re: [PATCH SMS 2/2, RFC] Register pressure estimation for the partial schedule (re-submission)
On Mon, Dec 19, 2011 at 4:28 PM, Richard Sandiford richard.sandif...@linaro.org wrote: Hi Revital, Revital Eres revital.e...@linaro.org writes: The attached patch is a resubmission following comments made by Ayal and Richard. Tested and bootstrap with the other patches in the series on ppc64-redhat-linux, enabling SMS on loops with SC 1. The modulo-sched parts are approved, with some comments provided below. Moving get_regno_pressure_class() from loop-invariant.c to ira.c and other ira.h/ira-costs.c changes are not for me to approve, but they make sense to me. Looks really good to me. I'll leave any approval to Ayal though. Just one suggestion: + /* Handle register moves. */ while we're at it, the comment above should be assigned to the then part below: + if (ps_i-id = ps-g-num_nodes) + { + int old_regno = REGNO (ps_reg_move (ps, ps_i-id)-old_reg); + int new_regno = REGNO (ps_reg_move (ps, ps_i-id)-new_reg); + + change_pressure (old_regno, true); + change_pressure (new_regno, false); + change_pressure (old_regno, true); + } + else + { + for (use_rec = DF_INSN_USES (insn); *use_rec; use_rec++) + change_pressure (DF_REF_REGNO (*use_rec), true); + + for (def_rec = DF_INSN_DEFS (insn); *def_rec; def_rec++) + change_pressure (DF_REF_REGNO (*def_rec), false); + + for (use_rec = DF_INSN_USES (insn); *use_rec; use_rec++) + change_pressure (DF_REF_REGNO (*use_rec), true); + } It might be worth adding a comment to say why the code is doing it this way. Indeed, this does raise an eyebrow, and your explanation below is great. Suggest to start by saying that The last two steps are the natural way one would go about updating live registers in a bottom-up scan, except that in some cases (iiuc) the same physical register cannot be assigned to both use and def on same insn, so the first step is added conservatively. E.g.: /* Process all uses, all defs, and then all uses again. The first two steps give us an estimate of the pressure when dying inputs cannot be tied to outputs (which is the worst case). The second two steps update the set of live registers ready for the next instruction. */ Also, as a general future direction comment, I think we should seriously consider turning both this and -fmodulo-sched-allow-regmoves on by default, so that -fmodulo-sched uses them unless explicitly told not to. The results for ARM seemed to suggest that it is now the best SMS mode (although the results can't be shared, unfortunately). It'd be unfortunate if users had to write: -fmodulo-sched -fmodulo-sched-allow-regmoves -fmodulo-sched-reg-pressure instead of plain: -fmodulo-sched It might make sense as a follow-on patch rather than part of the main commit. I'd be in favor, provided this works well for other platforms as well of-course. Ignoring the potential increase of register pressure inside a loop is being incautiously optimistic... It would be great to document testcases where spillage is detected; the original motivation of SMS is to reduce size of register live ranges so as to prevent spillage. There may be other means of preventing or mitigating spills other than bumping the II (e.g., Minimum register requirements for a module schedule Micro'94), which could be devised if concrete examples are analyzed. There's also the separate debate about whether SMS should now be enabled by default for ARM at -O3, but that's for another day... Guess so... ;-) Thanks. Richard Here are some more comments: +change_pressure (int regno, bool incr_p) ^^ I realize this was taken from elsewhere, but maybe update_pressure or update_current_pressure would better fit here. +{ + int nregs; + enum reg_class pressure_class; + + if (regno FIRST_PSEUDO_REGISTER + (TEST_HARD_REG_BIT (ira_no_alloc_regs, regno) + || TEST_HARD_REG_BIT (eliminable_regset, regno))) +return; + + /* Update the current set of live registers. Exit early if nothing + has changed. */ please clarify: we want to increase curr_reg_pressure as we scan upwards and encounter a last use; if regno is already live, this use is not last. likewise, we want to decrease curr_reg_pressure as we encounter a def; regno might not be live when !incr_p, if the def feeds only uses in next iteration. + if (incr_p + ? !bitmap_set_bit (curr_regs_live, regno) + : !bitmap_clear_bit (curr_regs_live, regno)) +return; + + pressure_class = get_regno_pressure_class (regno, nregs); + + if (!incr_p) +curr_reg_pressure[pressure_class] -= nregs; + else +curr_reg_pressure[pressure_class] += nregs; + sounds a bit more natural to ask if (incr_p)...; and only if so, check also if the increased current pressure exceeds max: + if (max_reg_pressure[pressure_class]
Re: [PATCH, SMS] Add missing free operation in mark_loop_unsched
sure, OK, thanks for catching this leak. Ayal. On Mon, Dec 12, 2011 at 8:25 AM, Revital Eres revital.e...@linaro.org wrote: Hello, OK for 3.7? Sorry, I meant GCC 4.7.0... Thanks, Revital
Re: [PING][PR testsuite/47013] Fix SMS testsuite faliures
These fixes to individual sms testcases are OK. Thanks, Ayal. On Mon, Dec 5, 2011 at 3:07 PM, Revital Eres revital.e...@linaro.org wrote: Hello, Ping: http://gcc.gnu.org/ml/gcc-patches/2011-11/msg02444.html Thanks, Revital
Re: [PATCH SMS 1/2, RFC] Support traversing PS in reverse order
On Mon, Nov 21, 2011 at 7:07 AM, Revital Eres revital.e...@linaro.org wrote: Hello, This patch support the estimation of register pressure in SMS. Although GCC is in stage 3 I would appreciate comments on it. Thanks to Richard and Ayal for discussing the implementation and their insights. This part of the patch enables iterating on the partial schedule in the reverse order (from the last instruction the the first). Tested and bootstrap with the other patches in the series on ppc64-redhat-linux, enabling SMS on loops with SC 1. Comments are welcome. This looks fine. Please rename rows_reverse to rows_last as discussed, and simplify the bit that tracks last_in_row in ps_insn_find_column(). Thanks, Ayal. Thanks, Revital Changelog: * modulo-sched.c (rows_reverse): New field in struct partial_schedule. (create_partial_schedule, free_partial_schedule, ps_insert_empty_row, ps_insn_advance_column, ps_insn_find_column, remove_node_from_ps, reset_partial_schedule, rotate_partial_schedule, verify_partial_schedule): Update the new field.
Re: [PATCH 5/9] [SMS] Support new loop pattern
On Fri, Sep 30, 2011 at 5:22 PM, Roman Zhuykov zhr...@ispras.ru wrote: 2011/7/21 zhr...@ispras.ru: This patch should be applied only after pending patches by Revital. Ping. New version is attached, it suits current trunk without additional patches. Thanks for the ping. Also this related patch needs approval: http://gcc.gnu.org/ml/gcc-patches/2011-07/msg01804.html The loop should meet the following requirements. First three are the same as for loop with doloop pattern: ... The next three describe the control part of new supported loops. - the last jump instruction should look like: pc=(regF!=0)?label:pc, regF is you'd probably want to bump to next instruction if falling through, e.g., pc=(regF!=0)?label:pc+4 flag register; - the last instruction which sets regF should be: regF=COMPARE(regC,X), where X is a constant, or maybe a register, which is not changed inside a loop; - only one instruction modifies regC inside a loop (other can use regC, but not write), and it should simply adjust it by a constant: regC=regC+step, where step is a constant. When doloop is succesfully scheduled by SMS, its number of iterations of loop kernel should be decreased by the number of stages in a schedule minus one, while other iterations expand to prologue and epilogue. In new supported loops such approach can't be used, because some instructions can use count register (regC). Instead of this, the final register value X in compare instruction regF=COMPARE(regC,X) is changed to another value Y respective to the stage this instruction is scheduled (Y = X - stage * step). making sure this does not underflow; i.e., that the number of iterations is no less than stage (you've addressed this towards the end below). The main difference from doloop case is that regC can be used by some instructions in loop body. That's why we are unable to simply adjust regC initial value, but have to keep it's value correct on each particular iteration. So, we change comparison instruction accordingly. An example: int a[100]; int main() { int i; for (i = 85; i 12; i -= 5) a[i] = i * i; return a[15]-225; } ARM assembler with -O2 -fno-auto-inc-dec: ldr r0, .L5 mov r3, #85 mov r2, r0 .L2: mul r1, r3, r3 sub r3, r3, #5 cmp r3, #10 str r1, [r2, #340] sub r2, r2, #20 bne .L2 ldr r0, [r0, #60] sub r0, r0, #225 bx lr .L5: .word a Loop body is executed 15 times. When compiling with SMS, it finds a schedule with ii=7, stage_count=3 and following times: Stage Time Insn 0 5 mul r1, r3, r3 1 10 sub r3, r3, #5 1 11 cmp r3, #10 1 11 str r1, [r2, #340] 1 13 bne .L2 2 16 sub r2, r2, #20 branch is not scheduled last? To make new schedule correct the loop body should be executed 14 times and we change compare instruction: the loop itself should execute 13 times. regF=COMPARE(regC,X) to regF=COMPARE(regC,Y) where Y = X - stage * step. In our example regC is r3, X is 10, step = -5, compare instruction is scheduled on stage 1, so it should be Y = 10 - 1 * (-5) = 15. right. In general, if the compare is on stage s (starting from 0), it will be executed s times in the epilog, so it should exit the loop upon reaching Y = X - s * step. So, after SMS it looks like: ldr r0, .L5 mov r3, #85 mov r2, r0 ;;prologue mul r1, r3, r3 ;;from stage 0 first iteration sub r3, r3, #5 ;;3 insns from stage 1 first iteration cmp r3, #10 str r1, [r2, #340] mul r1, r3, r3 ;;from stage 0 second iteration ;;body .L2: sub r3, r3, #5 sub r2, r2, #20 cmp r3, #15 ;; new value to compare with is Y=15 str r1, [r2, #340] mul r1, r3, r3 bne .L2 ;;epilogue sub r2, r2, #20 ;;from stage 2 pre-last iteration sub r3, r3, #5 ;;3 insns from stage 1 last iteration cmp r3, #10 str r1, [r2, #340] sub r2, r2, #20 ;;from stage 2 last iteration ldr r0, [r0, #60] sub r0, r0, #225 bx lr .L5: .word a Real ARM assembler with SMS (after some optimizations and without dead code): mov r3, #85 ldr r0, .L8 mul r1, r3, r3 sub r3, r3, #5 mov r2, r0 str r1, [r0, #340] mul r1, r3, r3 .L2: sub r3, r3, #5 sub r2, r2, #20 cmp r3, #15 str r1, [r2, #340] mul r1, r3, r3 bne .L2 str r1, [r2, #320] ldr r0, [r0, #60] sub r0, r0, #225 bx lr .L8: .word a
Re: [4/4] Make SMS schedule register moves
On Mon, Oct 10, 2011 at 1:57 PM, Richard Sandiford richard.sandif...@linaro.org wrote: Ayal Zaks ayal.z...@gmail.com writes: I agree it's natural to schedule moves for intra-iteration dependencies in the normal get_sched_window way. But suppose we have a dependency: A --(T,N,1)-- B that requires two moves M1 and M2. If we think in terms of cycles (in the SCHED_TIME sense), then this effectively becomes: A --(T,N1,1)-- M1 --(T,N2,0)-- M2 --(T,N3,0)-- B because it is now M1 that is fed by both the loop and the incoming edge. But if there is a second dependency: A --(T,M,0)-- C that also requires two moves, we end up with: A --(T,N1,1)-- M1 --(T,N2,0)-- M2 --(T,N3,0)-- B --(T,M3,-1)-- B and dependence distances of -1 feel a bit weird. :-) Of course, what we really have are two parallel dependencies: A --(T,N1,1)-- M1 --(T,N2,0)-- M2 --(T,N3,0)-- B A --(T,M1,0)-- M1' --(T,M2,0)-- M2' --(T,N3,0)-- B where M1' and M2' occupy the same position as M1 and M2 in the schedule, but are one stage further along. But we only schedule them once, so if we take the cycle/SCHED_TIME route, we have to introduce dependencies of distance -1. Interesting; had to digest this distance 1 business, a result of thinking in cycles instead of rows (or conversely), and mixing dependences with scheduling; here's my understanding, based on your explanations: Suppose a Use is truely dependent on a Def, where both have been scheduled at some absolute cycles; think of them as timing the first iteration of the loop. Assume first that Use appears originally after Def in the original instruction sequence of the loop (dependence distance 0). In this case, Use requires register moves if its distance D from Def according to the schedule is more than ii cycles long -- by the time Use is executed, the value it needs is no longer available in the def'd register due to intervening occurrences of Def. So in this case, the first reg-move (among D/ii) should appear after Def, recording its value before the next occurrence of Def overwrites it, and feeding subsequent moves as needed before each is overwritten. Thus the scheduling window of this first reg-move is within (Def, Def+ii). Now, suppose Use appears before Def, i.e., Use is upwards-exposed; if it remains scheduled before the Def, no reg-move is needed. If, otoh, Def is scheduled (D0 cycles) before Use, breaking the anti-dependence between them, (D/ii + 1) reg-moves are needed in order to feed Use with the live-in value before Def. So in this case, the first reg-move should appear before Def (again feeding subsequent moves as needed before each is overwritten). Thus the scheduling window of this first reg-move is within (Def-ii, Def). In any case, each use requires a specific number of reg-moves, which begin either before or after the def; it is always safe (though potentially redundant) to start reg-moving before the def -- uses that do not need the live-in value will ignore it by feeding from a later reg-move. Right. The distance on the Def-Use dependency is effectively transferred to the dependency between the Def and first move. And we can potentially have both kinds of Use at the same time. We then end up scheduling two moves, one in each window, but require them to occupy the same row and column. It feels more convenient to schedule the earlier of the two moves. Q: if we generate reg-moves starting from before the def, would redundant reg-moves be eliminated if no use needs access to live-in value? If so, would that simplify their generation? (If not, it may be interesting to understand how to improve DCE to catch it.) To be clear, the new version of the patch (unlike the old one) doesn't emit reg-moves before the def if all uses are distance 0. We only do it where some or all uses are distance 1. The first move before the def is always needed in that case. Understood. The question was whether it would simplify things if we always emit reg-moves before the def. And rely on DCE to hopefully eliminate redundancies. So redudant moves are confined to the case where (a) we have more than one move, (b) we have both distance 0 and distance 1 uses, and (c) one of the two distance sets requires more moves than the other. If the distance 0 dependencies require more moves than the distance 1 dependencies, we will have a redudant move in the prologue. If the distance 1 dependencies require more moves than the distance 0 dependencies, we will have a redudant move in the epilogue. But I believe that this is already handled by dce. (For avoidance of doubt, the new code is more accurate than current trunk in this respect.) The issue of assigning stages to reg-moves is mostly relevant for prolog and epilog generation, which requires and receives special attention -- handled very nicely by ps_num_consecutive_stages! Note that currently a simple
Re: [4/4] Make SMS schedule register moves
On Wed, Sep 28, 2011 at 4:49 PM, Richard Sandiford richard.sandif...@linaro.org wrote: Ayal Zaks ayal.z...@gmail.com writes: + /* The cyclic lifetime of move-new_reg starts and ends at move-def + (the instruction that defines move-old_reg). So instruction I_REG_MOVE (new_reg=reg) must be scheduled before the next I_MUST_FOLLOW move/original-def (due to anti-dependence: it overwrites reg), but after the previous instance of I_MUST_FOLLOW (due to true dependence; i.e. account for latency also). Why do moves, except for the one closest to move-def (which is directly dependent upon it, i.e. for whom move-def == I_MUST_FOLLOW), have to worry about move-def at all? (Or have their cyclic lifetimes start and end there?) Because the uses of new_reg belong to the same move-def based cycle. the cycle (overloaded term; rather iteration in this context) to which the uses belong, is inferred from the cycle (absolute schedule time) in which they are scheduled, regardless of move-def. Just to prove your point about cycle being an overloaded term: I wasn't actually meaning it in the sense of (loop) iteration. I meant a circular window based on move-def. Point proven ;-) So (I think this is the uncontroversial bit): [M1] must be scheduled cyclically before [B] and cyclically after [C], with the cycle based at [B]: row 3 after [B]: empty row 4: [C] row 5: [D] row 0: empty row 1: empty row 2: [A] row 3 before [B]: empty [M1] could therefore go in row 1. This part is OK. Here's how I see it: [M1] feeds [C] which is scheduled at cycle 10, so it must be scheduled before cycle 10-M_latency and after cycle 10-ii. [M1] uses the result of [B] which is scheduled at cycle 3, so must be scheduled after cycle 3+B_latency and before cycle 3+ii. Taking all latencies to be 1 and ii=6, this yields a scheduling window of cycles [4,9]\cap[4,9]=[4,9]; if scheduled at cycle 4 it must_follow [C], if scheduled at cycle 9 it must_precede [B]. This is identical to the logic behind the sched_window of any instruction, based on its dependencies (as you've updated not too long ago..), if we do not allow reg_moves (and arguably, one should not allow reg_moves when scheduling reg_moves...). To address the potential erroneous scenario of Loop 2, suppose [A] is scheduled as in the beginning in cycle 20, and that [M1] is scheduled in cycle 7 (\in[4,9]). Then [M2] feeds [D] and [A] which are scheduled at cycles 17 and 20, so it must be scheduled before cycle 17-1 and after cycle 20-6. [M2] uses the result of [M1], so must be scheduled after cycle 7+1 and before cycle 7+6. This yields the desired [14,16]\cap[8,13]=\emptyset. I agree it's natural to schedule moves for intra-iteration dependencies in the normal get_sched_window way. But suppose we have a dependency: A --(T,N,1)-- B that requires two moves M1 and M2. If we think in terms of cycles (in the SCHED_TIME sense), then this effectively becomes: A --(T,N1,1)-- M1 --(T,N2,0)-- M2 --(T,N3,0)-- B because it is now M1 that is fed by both the loop and the incoming edge. But if there is a second dependency: A --(T,M,0)-- C that also requires two moves, we end up with: A --(T,N1,1)-- M1 --(T,N2,0)-- M2 --(T,N3,0)-- B --(T,M3,-1)-- B and dependence distances of -1 feel a bit weird. :-) Of course, what we really have are two parallel dependencies: A --(T,N1,1)-- M1 --(T,N2,0)-- M2 --(T,N3,0)-- B A --(T,M1,0)-- M1' --(T,M2,0)-- M2' --(T,N3,0)-- B where M1' and M2' occupy the same position as M1 and M2 in the schedule, but are one stage further along. But we only schedule them once, so if we take the cycle/SCHED_TIME route, we have to introduce dependencies of distance -1. Interesting; had to digest this distance 1 business, a result of thinking in cycles instead of rows (or conversely), and mixing dependences with scheduling; here's my understanding, based on your explanations: Suppose a Use is truely dependent on a Def, where both have been scheduled at some absolute cycles; think of them as timing the first iteration of the loop. Assume first that Use appears originally after Def in the original instruction sequence of the loop (dependence distance 0). In this case, Use requires register moves if its distance D from Def according to the schedule is more than ii cycles long -- by the time Use is executed, the value it needs is no longer available in the def'd register due to intervening occurrences of Def. So in this case, the first reg-move (among D/ii) should appear after Def, recording its value before the next occurrence of Def overwrites it, and feeding subsequent moves as needed before each is overwritten. Thus the scheduling window of this first reg-move is within (Def, Def+ii). Now, suppose Use appears before Def, i.e., Use is upwards-exposed; if it remains
Re: [3/4] SMS: Record moves in the partial schedule
On Wed, Sep 28, 2011 at 4:53 PM, Richard Sandiford richard.sandif...@linaro.org wrote: Ayal Zaks ayal.z...@gmail.com writes: Only request is to document that the register moves are placed/assigned-id's in a specific order. I suppose this is the downside of splitting the patches up, sorry, but the ids are only ordered for the throw-away loop: FOR_EACH_VEC_ELT_REVERSE (ps_reg_move_info, ps-reg_moves, i, move) add_insn_before (move-insn, ps_first_note (ps, move-def), NULL); and for the prologue/epilogue code. Both are replaced in patch 4 with code that doesn't rely on the ordering of ids. Ok then, very well. I was mostly referring to the following in prologue/epiloque code, which merely relies on assigning all regmoves of a node consecutive id's: FWIW, the 4/4 that I just posted did finally get rid of the first_reg_move nreg_moves fields. Here's a slightly updated patch in line with your 4/4 comments. The move-def is now always the id of the predecessor, rather than the id of the original ddg node producer. I've adapted the throw-away loop accordingly, but there are no other changes. This is ok. Just to make sure I follow, the changes were (for this patch): 1. setting up a different move-def for each move + move-def = i_reg_move 0 ? first_move + i_reg_move - 1 : i; instead of the same original def for all + move-def = i; 2. inserting each move right before its own def, bottom-up: + FOR_EACH_VEC_ELT (ps_reg_move_info, ps-reg_moves, i, move) +add_insn_before (move-insn, ps_first_note (ps, move-def), NULL); instead of inserting each move right before the original def, top-down: FOR_EACH_VEC_ELT_REVERSE (ps_reg_move_info, ps-reg_moves, i, move) add_insn_before (move-insn, ps_first_note (ps, move-def), NULL); Thanks, Ayal. Tested on powerpc64-linux-gnu and with ARM benchmarks. Richard gcc/ * modulo-sched.c (ps_insn): Adjust comment. (ps_reg_move_info): New structure. (partial_schedule): Add reg_moves field. (SCHED_PARAMS): Use node_sched_param_vec instead of node_sched_params. (node_sched_params): Turn first_reg_move into an identifier. (ps_reg_move): New function. (ps_rtl_insn): Cope with register moves. (ps_first_note): Adjust comment and assert that the instruction isn't a register move. (node_sched_params): Replace with... (node_sched_param_vec): ...this vector. (set_node_sched_params): Adjust accordingly. (print_node_sched_params): Take a partial schedule instead of a ddg. Use ps_rtl_insn and ps_reg_move. (generate_reg_moves): Rename to... (schedule_reg_moves): ...this. Remove rescan parameter. Record each move in the partial schedule, but don't emit it here. Don't perform register substitutions here either. (apply_reg_moves): New function. (duplicate_insns_of_cycles): Use register indices directly, rather than finding instructions using PREV_INSN. Use ps_reg_move. (sms_schedule): Call schedule_reg_moves before committing to a partial schedule. Try the next ii if the schedule fails. Use apply_reg_moves instead of generate_reg_moves. Adjust call to print_node_sched_params. Free node_sched_param_vec instead of node_sched_params. (create_partial_schedule): Initialize reg_moves. (free_partial_schedule): Free reg_moves. Index: gcc/modulo-sched.c === --- gcc/modulo-sched.c 2011-09-28 09:03:10.334301485 +0100 +++ gcc/modulo-sched.c 2011-09-28 11:24:26.530300781 +0100 @@ -124,7 +124,9 @@ #define PS_STAGE_COUNT(ps) (((partial_sc /* A single instruction in the partial schedule. */ struct ps_insn { - /* The number of the ddg node whose instruction is being scheduled. */ + /* Identifies the instruction to be scheduled. Values smaller than + the ddg's num_nodes refer directly to ddg nodes. A value of + X - num_nodes refers to register move X. */ int id; /* The (absolute) cycle in which the PS instruction is scheduled. @@ -137,6 +139,30 @@ struct ps_insn }; +/* Information about a register move that has been added to a partial + schedule. */ +struct ps_reg_move_info +{ + /* The source of the move is defined by the ps_insn with id DEF. + The destination is used by the ps_insns with the ids in USES. */ + int def; + sbitmap uses; + + /* The original form of USES' instructions used OLD_REG, but they + should now use NEW_REG. */ + rtx old_reg; + rtx new_reg; + + /* An instruction that sets NEW_REG to the correct value. The first + move associated with DEF will have an rhs of OLD_REG; later moves + use the result of the previous move. */ + rtx insn; +}; + +typedef struct ps_reg_move_info ps_reg_move_info; +DEF_VEC_O (ps_reg_move_info); +DEF_VEC_ALLOC_O (ps_reg_move_info
Re: [PATCH, SMS 1/2] Avoid generating redundant reg-moves
On Fri, Sep 30, 2011 at 10:03 AM, Revital Eres revital.e...@linaro.org wrote: Hello, This + /* Skip instructions that do not set a register. */ + if (set !REG_P (SET_DEST (set))) + continue; is ok. Can you also prevent !set insns from having reg_moves? (To be updated once auto_inc insns will be supported, if they'll deserve reg_moves too.) I added a check to verify that no reg-moves are created for !set instructions. Currently re-testing on ppc64-redhat-linux (bootstrap and regtest) and arm-linux-gnueabi (bootstrap c). OK to commit once tesing completes? OK. later case latter case. Ayal. Thanks, Revital gcc/ * modulo-sched.c (generate_reg_moves): Skip instructions that do not set a register and verify no regmoves are created for !single_set instructions. testsuite/ * gcc.dg/sms-10.c: New file.
Re: [PATCH, SMS 1/2] Avoid generating redundant reg-moves
On Tue, Sep 27, 2011 at 10:47 AM, Revital Eres revital.e...@linaro.org wrote: Hello, This + /* Skip instructions that do not set a register. */ + if (set !REG_P (SET_DEST (set))) + continue; is ok. Can you also prevent !set insns from having reg_moves? (To be updated once auto_inc insns will be supported, if they'll deserve reg_moves too.) Do you mean leaving any anti-dep edges to !set instructions similar to what is done for auto_inc addresses in part 2 of this patch? No, I mean have a if (!set) continue; right after the above in generate_reg_moves(). For patch 2/2 this would need to be updated to allow generating reg_moves for non-single_set's which are auto-inc's (making sure we generate them for the right reg). Ayal. Thanks, Revital
Re: [PATCH, SMS 2/2] Support instructions with REG_INC_NOTE (second try)
On Tue, Sep 27, 2011 at 9:47 AM, Revital Eres revital.e...@linaro.org wrote: Hello, ok, so if we have an auto-inc'ing insn which defines (auto-inc's) an addr register and another (say, result) register, we want to allow the result register to have life ranges in excess of ii (by eliminating anti-dep edges of distance 1 from uses to def, and generating reg_moves if/where needed), but avoid having such life ranges of addr (by retaining such anti-dep edges). Right? Yes. Are these all the edges? We have only one True dependence edge from insn 1 to insn 2, but insn 1 is setting two registers both used by insn 2 (regardless of what we decide to do with Anti-deps). As for Anti-deps of distance 1, we have only one going back from insn 2 to insn 1, perhaps corresponding to addr, allowing reg_moves for def1(?). But, it won't help def1, because this other Anti-dep will force them to be scheduled w/o reg_moves. Please ignore the edges in the previous example. It indeed was a mistake, sorry about the confusion. Here are two examples taken from bootstrap on PPC of how the address is used; with the current patch applied and running with -fmodulo-sched-allow-regmoves: Ok, this does have two anti-dep edges. But still, only a single true dependence(?) ... can you see why? Thanks, Ayal. Node num: 2 (insn 3681 3678 3682 500 (set (reg:QI 2914 [ MEM[base: D.9586_4130, offset: 0B] ]) (mem:QI (pre_dec:DI (reg:DI 1644 [ ivtmp.687 ])) [0 MEM[base: D.9586_4130, offset: 0B]+0 S1 A8])) ../../gcc/libiberty/regex.c:4259 358 {*movqi_internal} (expr_list:REG_INC (reg:DI 1644 [ ivtmp.687 ]) (nil))) OUT ARCS: [3681 -(T,2,1)- 3681] [3681 -(T,2,0)- 3682] IN ARCS: [3682 -(A,0,1)- 3681] [3681 -(T,2,1)- 3681] [3682 -(A,0,1)- 3681] [3682 -(T,2,1)- 3681] Node num: 3 (insn 3682 3681 3683 500 (set (mem:QI (plus:DI (reg:DI 1644 [ ivtmp.687 ]) (const_int 3 [0x3])) [0 MEM[base: D.9586_4130, offset: 3B]+0 S1 A8]) (reg:QI 2914 [ MEM[base: D.9586_4130, offset: 0B] ])) ../../gcc/libiberty/regex.c:4259 358 {*movqi_internal} (expr_list:REG_DEAD (reg:QI 2914 [ MEM[base: D.9586_4130, offset: 0B] ]) (nil))) OUT ARCS: [3682 -(A,0,1)- 3681] [3682 -(A,0,1)- 3681] [3682 -(O,0,0)- 7263] [3682 -(A,0,0)- 3683] [3682 -(T,2,1)- 3681] IN ARCS: [3681 -(T,2,0)- 3682] Another example of usage is as follows (the address register is not used in MEM): Node num: 0 (insn 1419 1415 1423 9 (set (mem/f:DI (pre_inc:DI (reg:DI 1882 [ ivtmp.1636 ])) [3 MEM[base: D.10911_2945, offset: 0B]+0 S8 A64]) (reg/f:DI 3923)) ../../gcc/libiberty/regex.c:5788 378 {*movdi_internal64} (expr_list:REG_INC (reg:DI 1882 [ ivtmp.1636 ]) (nil))) OUT ARCS: [1419 -(T,2,1)- 1419] [1419 -(O,0,0)- 5932] [1419 -(O,0,0)- 1449] [1419 -(T,2,1)- 1434] [1419 -(T,2,0)- 1434] [1419 -(T,2,0)- 1432] [1419 -(O,0,0)- 1431] [1419 -(O,0,0)- 1427] [1419 -(O,0,0)- 1423] IN ARCS: [1419 -(T,2,1)- 1419] [1432 -(A,0,1)- 1419] [1449 -(O,0,1)- 1419] [1434 -(A,0,1)- 1419] [1431 -(O,0,1)- 1419] [1427 -(O,0,1)- 1419] [1423 -(O,0,1)- 1419] Node num: 4 (insn 1432 1431 1433 9 (set (reg:DI 2632) (plus:DI (reg/v/f:DI 1058 [ reg_info ]) (reg:DI 1882 [ ivtmp.1636 ]))) ../../gcc/libiberty/regex.c:5543 79 {*adddi3_internal1} (nil)) OUT ARCS: [1432 -(A,0,1)- 1419] [1432 -(T,1,0)- 1433] IN ARCS: [1419 -(T,2,0)- 1432] OK for mainline? OK, with the following comments: Thanks, will address the comments and re-submit. In other words, one would expect to see two Anti-dep edges from insn 2 to insn 1, right? Yes, that's indeed the case in the first example above. Thanks, Revital
Re: [PATCH, SMS 1/2] Avoid generating redundant reg-moves
OK for mainline? Doh, hard to believe we never checked that an insn defines a register before spitting out reg_moves for it ... nice catch. This + /* Skip instructions that do not set a register. */ + if (set !REG_P (SET_DEST (set))) + continue; is ok. Can you also prevent !set insns from having reg_moves? (To be updated once auto_inc insns will be supported, if they'll deserve reg_moves too.) Ayal. On Mon, Sep 26, 2011 at 7:31 AM, Revital Eres revital.e...@linaro.org wrote: Hello, The attached patch contains a fix to generate_reg_moves function. Currently we can generate reg-moves for stores which are later eliminated. This happens when we have mem dependency with distance 1 and as a result the number of regmoves is at least 1 based on the following calculation taken from generate_reg_moves (): if (e-distance == 1) nreg_moves4e = (SCHED_TIME (e-dest) - SCHED_TIME (e-src) + ii) / ii; This is an example of register move generated in such cases: reg_move = (insn 152 119 75 4 (set (reg:SI 231) (mem:SI (pre_modify:DI (reg:DI 215) (plus:DI (reg:DI 215) (reg:DI 171 [ ivtmp.42 ]))) [3 MEM[base: pretmp.27_65, index: ivtmp.42_9, offset: 0B]+0 S4 A32])) -1 (nil)) When not handling REG_INC instructions this was not a problem as these reg-moves were removes by dead code elimination. for example: insn 1) mem[x] = ... insn 2) .. = mem[y] When reg-move reg1 = mem [x] was generated mem[x] is not been used in insn 2 and thus reg1 could be eliminated. But with REG_INC this is different because the reg-move instruction remains and leads to bad gen. The attached tescase capture this case. Tested and bootstrap with patch 2 on ppc64-redhat-linux enabling SMS on loops with SC 1. On arm-linux-gnueabi bootstrap c on top of the set of patches that support do-loop pattern (http://gcc.gnu.org/ml/gcc-patches/2011-07/msg01807.html) which solves the bootstrap failure on ARM with SMS flags. OK for mainline? Thanks, Revital gcc/ * modulo-sched.c (generate_reg_moves): Skip instructions that do not set a register. testsuite/ * gcc.dg/sms-10.c: New file.
Re: [PATCH, SMS 2/2] Support instructions with REG_INC_NOTE (second try)
On Mon, Sep 26, 2011 at 7:31 AM, Revital Eres revital.e...@linaro.org wrote: Hello, This patch extends the implementation to support instructions with REG_INC notes. It addresses the comments from the previous submission: http://gcc.gnu.org/ml/gcc-patches/2011-08/msg01299.html. ok, so if we have an auto-inc'ing insn which defines (auto-inc's) an addr register and another (say, result) register, we want to allow the result register to have life ranges in excess of ii (by eliminating anti-dep edges of distance 1 from uses to def, and generating reg_moves if/where needed), but avoid having such life ranges of addr (by retaining such anti-dep edges). Right? btw, regarding your previous question about the usage of the address register been auto inc, apparently it can be used as follows: insn 1) def1 = MEM [ pre_dec (addr) ] out edges: [1 -(T,2,1)- 1] [1 -(T,2,0)- 2] in edges: [1 -(T,2,1)- 1] [2 -(T,2,1)- 1] [2 -(A,0,1)-1] insn 2) MEM [ addr + 3 ] = def1 out edges: [2 -(T,2,1)- 1] [2 -(A,0,1)-1] in edges: [1 -(T,2,0)- 2] Are these all the edges? We have only one True dependence edge from insn 1 to insn 2, but insn 1 is setting two registers both used by insn 2 (regardless of what we decide to do with Anti-deps). As for Anti-deps of distance 1, we have only one going back from insn 2 to insn 1, perhaps corresponding to addr, allowing reg_moves for def1(?). But, it won't help def1, because this other Anti-dep will force them to be scheduled w/o reg_moves. Reg-moves were not created for the address when testing on ppc. ok; note that the example above shows one could potentially have an insn 2 scheduled in a later stage than insn 1, wanting to feed off an earlier version of the pre_dec'd addr. (In this case, it would probably be better to update the offset (3) than use a reg_move for the addr. (Such folding might work for other cases as well(?))). Tested and bootstrap with patch 1 on ppc64-redhat-linux enabling SMS on loops with SC 1. On arm-linux-gnueabi bootstrap c on top of the set of patches that support do-loop pattern (http://gcc.gnu.org/ml/gcc-patches/2011-07/msg01807.html) which solves the bootstrap failure on ARM with SMS flags. OK for mainline? OK, with the following comments: Make sure reg_moves are generated for the correct (result, not addr) register, in generate_reg_moves(). beenbeing (multiple appearances). Add a note that autoinc_var_is_used_p (rtx def_insn, rtx use_insn) doesn't need to consider the specific address register; no reg_moves will be allowed for any life range defined by def_insn and used by use_insn, if use_insn uses an address register auto-inc'ed by def_insn. In other words, one would expect to see two Anti-dep edges from insn 2 to insn 1, right? Ayal. Thanks, Revital * ddg.c (autoinc_var_is_used_p): New function. (create_ddg_dep_from_intra_loop_link, add_cross_iteration_register_deps): Call it. * modulo-sched.c (sms_schedule): Handle instructions with REG_INC.
Re: [3/4] SMS: Record moves in the partial schedule
Only request is to document that the register moves are placed/assigned-id's in a specific order. I suppose this is the downside of splitting the patches up, sorry, but the ids are only ordered for the throw-away loop: FOR_EACH_VEC_ELT_REVERSE (ps_reg_move_info, ps-reg_moves, i, move) add_insn_before (move-insn, ps_first_note (ps, move-def), NULL); and for the prologue/epilogue code. Both are replaced in patch 4 with code that doesn't rely on the ordering of ids. Ok then, very well. I was mostly referring to the following in prologue/epiloque code, which merely relies on assigning all regmoves of a node consecutive id's: i_reg_moves = MIN (i_reg_moves, SCHED_NREG_MOVES (u)); /* The reg_moves start from the *first* reg_move backwards. */ ! i_reg_move = SCHED_FIRST_REG_MOVE (u) + (i_reg_moves - 1); Ayal. 2011/9/22 Richard Sandiford richard.sandif...@linaro.org Ayal Zaks ayal.z...@gmail.com writes: Richard Sandiford richard.sandif...@linaro.org wrote on 30/08/2011 03:10:50 PM: From: Richard Sandiford richard.sandif...@linaro.org To: gcc-patches@gcc.gnu.org Cc: Ayal Zaks/Haifa/IBM@IBMIL Date: 30/08/2011 03:10 PM Subject: [3/4] SMS: Record moves in the partial schedule This patch adds infrastructure that will be used by the final patch. Specifically: - it splits the generation of register moves into two: schedule_reg_moves records moves in the partial schedule, while apply_reg_moves makes the register substitutions. This patch doesn't actually schedule the moves. Instead, there's some throw-away code in apply_reg_moves to emit the moves in the same as we do now. That's throw-away code that will be removed in the final patch. - schedule_reg_moves is allowed to fail. We then try again with the next ii (subject to the usual ii limits). In this patch, schedule_reg_moves always returns true. - The partial schedule uses ids to represent register moves. The first register move has id g-num_nodes. Richard This is fine. Thanks. Only request is to document that the register moves are placed/assigned-id's in a specific order. I suppose this is the downside of splitting the patches up, sorry, but the ids are only ordered for the throw-away loop: FOR_EACH_VEC_ELT_REVERSE (ps_reg_move_info, ps-reg_moves, i, move) add_insn_before (move-insn, ps_first_note (ps, move-def), NULL); and for the prologue/epilogue code. Both are replaced in patch 4 with code that doesn't rely on the ordering of ids. I'd rather the code didn't rely on any ordering here. If any future code is added that needs to know more about the moves, I think it should use the move structure instead of the ids. (There's already a fair bit of info in the structure, but we could add more later if we need it.) Functionality-wise it results in identical code as current version, right? Yeah, that's right. Only patch 4 does anything useful to the output. Richard
Re: [4/4] Make SMS schedule register moves
Richard Sandiford richard.sandif...@linaro.org wrote on 30/08/2011 03:29:26 PM: From: Richard Sandiford richard.sandif...@linaro.org To: gcc-patches@gcc.gnu.org Cc: Ayal Zaks/Haifa/IBM@IBMIL Date: 30/08/2011 03:29 PM Subject: [4/4] Make SMS schedule register moves This is the move-scheduling patch itself. It should be fairly self-explanatory. Let me know if it isn't, and I'll try to improve the commentary. One potentially controversial change is to the way we handle moves in the prologue and epilogue. The current code uses a conservative - conservativeaccurate? (Your approach seems to be more conservative) check to decide which stages need which moves. This check copes with values that are live before the loop, and emits some moves that aren't actually needed. The code also emits chains of moves that can be trivially optimised through propagation. We rely on later patches to clean this up for us (and they do). So, rather than keep a rather complicated check here, I've simply emitted the moves for all stages. In principle, that will generate a little more dead code than now in cases where chains of two moves are needed, but that seems to be very rare (when moves are scheduled). indeed, it's better to simplify code generation and rely on subsequent cleanups. Not sure about generating more (dead?) code in principle; could you elaborate, for the record? Richard gcc/ * modulo-sched.c (extend_node_sched_params): New function. (print_node_sched_params): Print the stage. (set_columns_for_row, schedule_reg_move): New functions. (set_columns_for_ps): Move up file and use set_columns_for_now. typo: nowrow (unless you intend this to be temporary ;-) (schedule_reg_moves): Call extend_node_sched_params and schedule_reg_move. Extend size of uses bitmap. Return false if a move could not be scheduled. (apply_reg_moves): Don't emit moves here. (permute_partial_schedule): Handle register moves. (duplicate_insns_of_cycles): Remove for_prolog. Always emit moves. Always emit all moves (generate_prolog_epilog): Update calls accordingly. Index: gcc/modulo-sched.c As I'm going over the patch, let me make sure I understand what's going on: .. +/* Try to schedule the move with ps_insn identifier I_REG_MOVE in PS. + The source of the move is provided by I_MUST_FOLLOW, which has + already been scheduled. The source y of an x=y move is defined by the previous iteration of the next move y=z (or by the original y=... move-def). We schedule moves one after the other bottom-up, starting from the original move-def moving backwards in cycles. This way, the instruction I_MUST_FOLLOW defining y is always already scheduled when we come to schedule the dependent I_REG_MOVE x=y move. .. + /* The cyclic lifetime of move-new_reg starts and ends at move-def + (the instruction that defines move-old_reg). So instruction I_REG_MOVE (new_reg=reg) must be scheduled before the next I_MUST_FOLLOW move/original-def (due to anti-dependence: it overwrites reg), but after the previous instance of I_MUST_FOLLOW (due to true dependence; i.e. account for latency also). Why do moves, except for the one closest to move-def (which is directly dependent upon it, i.e. for whom move-def == I_MUST_FOLLOW), have to worry about move-def at all? (Or have their cyclic lifetimes start and end there?) In general, there's a distinction between the cycle an instruction is scheduled at (for the first iteration), which is an absolute arbitrary integer, and the row it is placed in the PS, which is between 0 and ii-1, and is simply cycle mod ii. When scheduling, it's clearer to reason about cycles, especially if your window includes a row twice. In addition to the (true+anti) dependences involving I_REG_MOVE with I_MUST_FOLLOW, it has (true+anti) dependences with each use it feeds. We could alternatively set these dependencies of I_REG_MOVE on I_MUST_FOLLOW, and on its uses, and call get_sched_window(). But it's presumably simpler to handle it directly here. Right? Ayal.
Re: [3/4] SMS: Record moves in the partial schedule
Richard Sandiford richard.sandif...@linaro.org wrote on 30/08/2011 03:10:50 PM: From: Richard Sandiford richard.sandif...@linaro.org To: gcc-patches@gcc.gnu.org Cc: Ayal Zaks/Haifa/IBM@IBMIL Date: 30/08/2011 03:10 PM Subject: [3/4] SMS: Record moves in the partial schedule This patch adds infrastructure that will be used by the final patch. Specifically: - it splits the generation of register moves into two: schedule_reg_moves records moves in the partial schedule, while apply_reg_moves makes the register substitutions. This patch doesn't actually schedule the moves. Instead, there's some throw-away code in apply_reg_moves to emit the moves in the same as we do now. That's throw-away code that will be removed in the final patch. - schedule_reg_moves is allowed to fail. We then try again with the next ii (subject to the usual ii limits). In this patch, schedule_reg_moves always returns true. - The partial schedule uses ids to represent register moves. The first register move has id g-num_nodes. Richard This is fine. Only request is to document that the register moves are placed/assigned-id's in a specific order. Functionality-wise it results in identical code as current version, right? Thanks, Ayal.
Re: [2/4] SMS: Use ids to represent ps_insns
2011/9/13 Richard Sandiford richard.sandif...@linaro.org Ayal Zaks ayal.z...@gmail.com writes: So instead of navigating directly from ps_insn-ddg_node-node_sched_params, we now use indices and lookup pointees in ddg_node and node_sched_params arrays. A bit of a nuisance, but it's ok with me. Well, IMO, ps_insn-ddg_node-node_sched_params is the same amount of indirection as node_sched_params[ps_insn-id]. Agreed. Both involve one indirection. The difference is in lookup versus using each single direct pointer access. o One could still achieve the goal of having ps_insn's with node_sched_params but free of ddg_node's, w/o penalizing the existing direct access from ddg_node-node_sched_params, and w/o replicating the interface. If you add an (insn field and an) aux union to ps_insn, whose info points to your node_sched_params, you could reuse the same set of macros. Admittedly, it's a space-time tradeoff, where the space is probably not a major concern, and there are other places to look for first in sms to save time. I'm not sure what you mean by replicating the interface. There's still only one interface for getting schedule params: everything still goes through the SCHED_* macros. If anything, I think having a way of going directly from the ps_insn to the sched params _would_ replicate the interface, because some parts of SMS want the scheduling parameters for a ddg node rather than a ps_insn. So some parts would (presumably) still use SCHED_* macros for a node, while the rest would use some other interface for ps_insns. If we add to ps_insn the same aux structure as ddg_node has, with info pointing to node_sched_params, we could reuse the same set of macros. But I'll do that if you think it's better. No need, your solution is fine. Just wanted to clarify the options. If we do, I think we should remove the current SCHED_* macros and just have one that goes from a ddg node to the parameters structure itself. So SCHED_TIME (node) becomes SCHED_PARAMS (node)-time, etc. Code that operates on ps_insns could just use pi-sched_params-time. But if we want to keep SCHED_TIME, etc., macros that can be used for everything, then I don't see how changes to ps_insn would help. o first_reg_move and nreg_moves will be redundant for regmoves, right? No refactoring is perfect... After the patches, they're only there for debugging. I did wonder whether I should just remove them. I'd prefer not to keep useless fields around. If left only for debugging, better document and/or rename. o it could be useful to have a debug version which checks array bounds, in case one feeds a bad index. FWIW, VEC does this for us. ok, right, in patch [3/4], for node_sched_params and ps_reg_move_info. o and a minor comment below: /* The scheduling parameters held for each node. */ typedef struct node_sched_params { - int asap; /* A lower-bound on the absolute scheduling cycle. */ int time; /* The absolute scheduling cycle (time = asap). */ Please fix/remove the (time = asap) comment, as there's no asap there any more. OK. Thanks, Ayal. Richard
Re: [2/4] SMS: Use ids to represent ps_insns
Richard Sandiford richard.sandif...@linaro.org wrote on 30/08/2011 03:03:59 PM: From: Richard Sandiford richard.sandif...@linaro.org To: gcc-patches@gcc.gnu.org Cc: Ayal Zaks/Haifa/IBM@IBMIL Date: 30/08/2011 03:05 PM Subject: [2/4] SMS: Use ids to represent ps_insns Instructions in a partial schedule are currently represented as a ddg node. This patch uses a more abstract id instead. At the moment, the ids map directly to ddg nodes, but the next patch will add register moves to the end. One slight advantage of using ids is that we can leave the ASAP value on the node; we don't need to copy it across to the scheduling info. (Later patches use the same scheduling info for moves, for which an ASAP value would be wasted space.) Richard OK. So instead of navigating directly from ps_insn-ddg_node-node_sched_params, we now use indices and lookup pointees in ddg_node and node_sched_params arrays. A bit of a nuisance, but it's ok with me. A couple of thoughts: o One could still achieve the goal of having ps_insn's with node_sched_params but free of ddg_node's, w/o penalizing the existing direct access from ddg_node-node_sched_params, and w/o replicating the interface. If you add an (insn field and an) aux union to ps_insn, whose info points to your node_sched_params, you could reuse the same set of macros. Admittedly, it's a space-time tradeoff, where the space is probably not a major concern, and there are other places to look for first in sms to save time. o first_reg_move and nreg_moves will be redundant for regmoves, right? No refactoring is perfect... o it could be useful to have a debug version which checks array bounds, in case one feeds a bad index. o and a minor comment below: /* The scheduling parameters held for each node. */ typedef struct node_sched_params { - int asap; /* A lower-bound on the absolute scheduling cycle. */ int time; /* The absolute scheduling cycle (time = asap). */ Please fix/remove the (time = asap) comment, as there's no asap there any more. Thanks, Ayal.
Re: [1/4] SMS: remove register undo list
Resending; didn't seem to go through. -- Forwarded message -- From: Ayal Zaks ayal.z...@gmail.com Date: 2011/9/11 Subject: gcc-patches@gcc.gnu.org To: Richard Sandiford richard.sandif...@linaro.org Richard Sandiford richard.sandif...@linaro.org wrote on 30/08/2011 02:58:22 PM: From: Richard Sandiford richard.sandif...@linaro.org To: gcc-patches@gcc.gnu.org Cc: Ayal Zaks/Haifa/IBM@IBMIL Date: 30/08/2011 02:58 PM Subject: [1/4] SMS: remove register undo list This patch removes the (unused) undo_replace_buff_elem list. Verges on the obvious, but still. Sure, this is fine. Thanks for the clean-up, Ayal. Patch 3 splits the generation of moves into two: one function to record what needs to happen, and another function to actually do it. It's then easy to bail out if we decide that we don't want the moves. Richard gcc/ * modulo-sched.c (undo_replace_buff_elem): Delete. (generate_reg_moves): Don't build and return an undo list. (free_undo_replace_buff): Delete. (sms_schedule): Adjust call to generate_reg_moves. Don't call free_undo_replace_buff. Index: gcc/modulo-sched.c === --- gcc/modulo-sched.c 2011-08-24 12:38:37.171362916 +0100 +++ gcc/modulo-sched.c 2011-08-24 12:56:17.754942951 +0100 @@ -165,17 +165,6 @@ struct partial_schedule int stage_count; /* The stage count of the partial schedule. */ }; -/* We use this to record all the register replacements we do in - the kernel so we can undo SMS if it is not profitable. */ -struct undo_replace_buff_elem -{ - rtx insn; - rtx orig_reg; - rtx new_reg; - struct undo_replace_buff_elem *next; -}; - - static partial_schedule_ptr create_partial_schedule (int ii, ddg_ptr, int history); static void free_partial_schedule (partial_schedule_ptr); @@ -456,13 +445,12 @@ print_node_sched_params (FILE *file, int nreg_moves = --- + 1 - { dependence. ii { 1 if not. */ -static struct undo_replace_buff_elem * +static void generate_reg_moves (partial_schedule_ptr ps, bool rescan) { ddg_ptr g = ps-g; int ii = ps-ii; int i; - struct undo_replace_buff_elem *reg_move_replaces = NULL; for (i = 0; i g-num_nodes; i++) { @@ -543,22 +531,6 @@ generate_reg_moves (partial_schedule_ptr EXECUTE_IF_SET_IN_SBITMAP (uses_of_defs[i_reg_move], 0, i_use, sbi) { - struct undo_replace_buff_elem *rep; - - rep = (struct undo_replace_buff_elem *) - xcalloc (1, sizeof (struct undo_replace_buff_elem)); - rep-insn = g-nodes[i_use].insn; - rep-orig_reg = old_reg; - rep-new_reg = new_reg; - - if (! reg_move_replaces) - reg_move_replaces = rep; - else - { - rep-next = reg_move_replaces; - reg_move_replaces = rep; - } - replace_rtx (g-nodes[i_use].insn, old_reg, new_reg); if (rescan) df_insn_rescan (g-nodes[i_use].insn); @@ -568,21 +540,6 @@ generate_reg_moves (partial_schedule_ptr } sbitmap_vector_free (uses_of_defs); } - return reg_move_replaces; -} - -/* Free memory allocated for the undo buffer. */ -static void -free_undo_replace_buff (struct undo_replace_buff_elem *reg_move_replaces) -{ - - while (reg_move_replaces) - { - struct undo_replace_buff_elem *rep = reg_move_replaces; - - reg_move_replaces = reg_move_replaces-next; - free (rep); - } } /* Update the sched_params (time, row and stage) for node U using the II, @@ -1472,8 +1429,6 @@ sms_schedule (void) } else { - struct undo_replace_buff_elem *reg_move_replaces; - if (!opt_sc_p) { /* Rotate the partial schedule to have the branch in row ii-1. */ @@ -1523,13 +1478,11 @@ sms_schedule (void) /* The life-info is not valid any more. */ df_set_bb_dirty (g-bb); - reg_move_replaces = generate_reg_moves (ps, true); + generate_reg_moves (ps, true); if (dump_file) print_node_sched_params (dump_file, g-num_nodes, g); /* Generate prolog and epilog. */ generate_prolog_epilog (ps, loop, count_reg, count_init); - - free_undo_replace_buff (reg_move_replaces); } free_partial_schedule (ps);
Re: [PATCH, SMS] Minor misc. fixes
Copying the lists.. -- Forwarded message -- From: Ayal Zaks ayal.z...@gmail.com Date: 2011/9/11 Subject: Re: [PATCH, SMS] Minor misc. fixes To: Revital Eres revital.e...@linaro.org 2011/9/8 Revital Eres revital.e...@linaro.org Hello, The attached patch contains minor fixes. Currently testing and bootstrap on ppc64-redhat-linux enabling SMS on loops with SC 1. OK for mainline once testing completes? OK. While we're at it, an alternative would be to have remove_node_from_ps() assert its own (parameters and) return value. That is, replace if (c) return false by assert (!c) and have it return void if successful. There's not much you can do if it returns false. That would check its other invocation too. Ayal. Thanks, Revital Changelog * modulo-sched.c (optimize_sc): Call remove_node_from_ps outside of gcc_assert. (sms_schedule): Add print info.
Re: [PATCH, SMS] Support instructions with REG_INC_NOTE (re-submisson)
Ok, so this extends the infrastructure to support insns which set an arbitrary number of registers, but currently specifically handles only REG_INC situations (which set two registers). I'm not against {0,1,infinity}, but wonder if this case really deserves the complexity: post/pre-inc/decrementing load insns may need regmoves for the register loaded, due to the latency of the load and desire to schedule associated uses farther than ii cycles away (as do regular loads), but do they also need regmoves for the address register being post/pre-inc/decremented? Its latency should not be long, and it's often feeding only itself so regmoves are not needed/won't help. If not, perhaps a simpler solution is to allow REG_INC insns but disallow their address register from being regmove'd, dedicating the single regmove info for the value loaded. Are there actually cases where you need the address register to regmove? Ayal. 2011/8/15 Revital Eres revital.e...@linaro.org Hello, This is a re-submission of the patch to support instructions with REG_INC_NOTE. (http://gcc.gnu.org/ml/gcc-patches/2011-04/msg01309.html) It contains a minor change from the previous submission suggested by Richard Sandiford: to use reg_referenced_p instead of rtx_referenced_p. The patch supports instructions with auto inc/dec operations as to support such instructions we can not assume that there is only one definition in the instruction. This assumption is currently used to generate reg-moves information. Because the auto inc/dec instruction defines more then one register we need to generate different reg-moves for each of the definitions. At some point in the SMS procedure we need to find the first reg-move generated for a specific definition(*) and if there is more than one definition in the instruction (lets call it insn1) we can not simply assume it's first reg-move instruction is the first instruction right before insn1. for example, the following instruction defines two variables: x and i: x = mem [POST_INC (i)] lets say both of them need reg-move insn 1) reg_move_i = i insn 2) reg_move_x = x insn 3) x = mem [POST_INC (i)] then in order to reach the first reg-move of i we can not just go to insn 2 like in the current implementation, so we need to save for each definition in the instruction it's first reg-move instruction. (*) I'm referring to the generation of prologue and epilogue (duplicate_insns_of_cycles) which uses first_reg_move field in node_sched_params structure, and SCHED_FIRST_REG_MOVE definition in modulo-sched.c. Reg-tested and bootstrap on ppc64-redhat-linux enabling SMS on loops with SC 1. OK for mainline? Thanks, Revital * modulo-sched.c (record_inc_dec_insn_info, free_node_sched_params): New functions. (SCHED_FIRST_REG_MOVE, SCHED_NREG_MOVES): Remove. (struct regmove_info): New. (insn_regmove_info): New field in node_sched_params. (print_node_sched_params): Print information for all the definitions in the instructions. (generate_reg_moves, duplicate_insns_of_cycles, set_node_sched_params): Adjust the code to handle instructions that have multiple definitions. (sms_schedule): Handle loops that contain instructions with FIND_REG_INC_NOTE and call free_node_sched_params.
Re: Patches ping
[PATCH, SMS 3/4] Optimize stage count http://gcc.gnu.org/ml/gcc-patches/2011-05/msg01341.html This patch for minimizing the stage count (which also refactors and cleans up the code) is approved. Have some minor comments below, followed by some thoughts for possible follow-up improvements. Thanks, Ayal. Changelog: (sms_schedule_by_order): Update call to get_sched_window. all set_must_precede_follow. ^^^ call +/* Set bitmaps TMP_FOLLOW and TMP_PRECEDE to MUST_FOLLOW and MUST_PRECEDE + respectively only if cycle C falls in the scheduling window boundaries ^^ on the border of sbitmap tmp_precede = NULL; sbitmap tmp_follow = NULL; are redundantly reset in set_must_precede_follow(). +/* Update the sched_params for node U using the II, ^ (time, row and stage) + the CYCLE of U and MIN_CYCLE. */ Please also clarify that we're not simply taking SCHED_STAGE (u) = CALC_STAGE_COUNT (SCHED_TIME (u), min_cycle, ii); because the stages are aligned on cycle 0. + /* First, normailize the partial schedualing. */ ^ ^ + /* Try to achieve optimized SC by normalizing the partial + schedule (having the cycles start from cycle zero). The branch ^ + location must be placed in row ii-1 in the final scheduling. + If that's not the case after the normalization then try to ^^ + move the branch to that row if possible. */ ^ If failed, shift all instructions to position the branch in row ii-1. For consistency and clarity, may be instead of: + /* Bring the branch to cycle -1. */ + int amount = SCHED_TIME (g-closing_branch) + 1; it would be better to have: + /* Bring the branch to cycle ii-1. */ + int amount = SCHED_TIME (g-closing_branch) - (ii - 1); Some thoughts on possible improvements (not mandatory; especially given the delay in approval, sorry..thanks for the ping): o Have optimize_sc() take care of all possible rotations doing the best it can, without returning an indication which leaves subsequent (suboptimal) processing to the caller. o Instead of removing the branch and then trying to get it back into the same cycle if you can't place it in row ii-1, consider keeping it in its place and removing it only if you succeed to schedule it (also..) in row ii-1. o May be worthwhile to apply more refactoring, so that the code to reschedule the branch in row ii-1 reuses more of the general code for scheduling an instruction (possibly setting end = start + end), but avoid splitting a row. o Would be interesting to learn of loops that still have suboptimal SC's, to see if it's still an issue. Ayal. 2011/7/20 Revital Eres revital.e...@linaro.org Hello, [PATCH, SMS 3/4] Optimize stage count http://gcc.gnu.org/ml/gcc-patches/2011-05/msg01341.html [PATCH, SMS 4/4] Misc. fixes http://gcc.gnu.org/ml/gcc-patches/2011-05/msg01342.html [PATCH, SMS] Fix calculation of issue_rate http://gcc.gnu.org/ml/gcc-patches/2011-05/msg01344.html Thanks, Revital
Re: Patches ping
[PATCH, SMS 4/4] Misc. fixes http://gcc.gnu.org/ml/gcc-patches/2011-05/msg01342.html Sure, this is fine. (Sorry for all the previous '?'s..). Thanks, Ayal. 2011/7/20 Revital Eres revital.e...@linaro.org Hello, [PATCH, SMS 3/4] Optimize stage count http://gcc.gnu.org/ml/gcc-patches/2011-05/msg01341.html [PATCH, SMS 4/4] Misc. fixes http://gcc.gnu.org/ml/gcc-patches/2011-05/msg01342.html [PATCH, SMS] Fix calculation of issue_rate http://gcc.gnu.org/ml/gcc-patches/2011-05/msg01344.html Thanks, Revital
Re: Patches ping
[PATCH, SMS] Fix calculation of issue_rate http://gcc.gnu.org/ml/gcc-patches/2011-05/msg01344.html This is ok (with the updated Changelog). Alternatively, we can have a local variable for holding the issue_rate. Ayal. 2011/7/20 Revital Eres revital.e...@linaro.org: Hello, [PATCH, SMS 3/4] Optimize stage count http://gcc.gnu.org/ml/gcc-patches/2011-05/msg01341.html [PATCH, SMS 4/4] Misc. fixes http://gcc.gnu.org/ml/gcc-patches/2011-05/msg01342.html [PATCH, SMS] Fix calculation of issue_rate http://gcc.gnu.org/ml/gcc-patches/2011-05/msg01344.html Thanks, Revital
Re: [PATCH, SMS 1/4] Fix calculation of row_rest_count
Revital Eres revital.e...@linaro.org wrote on 14/06/2011 09:27:32 AM: From: Revital Eres revital.e...@linaro.org To: Ayal Zaks/Haifa/IBM@IBMIL Cc: gcc-patches@gcc.gnu.org, Patch Tracking patc...@linaro.org Date: 14/06/2011 09:27 AM Subject: Re: [PATCH, SMS 1/4] Fix calculation of row_rest_count Hello, Please add the following: o A clarification that rows_length is used only (as an optimization) to back off quickly from trying to schedule a node in a full row; that is, to avoid running through futile DFA state transitions. o An assert that ps-rows_length[i] equals the number of nodes in ps- rows [i] (e.g., in verify_partial_schedule(); and then recheck...). Done. Besides the additions to address your comments I also added two more bits to update rows_length field into rotate_partial_schedule () and ps_insert_empty_row () that were missing in the previous implementation of the patch. Glad the additional assertion already proved useful ;-) The patch was re-tested on top of the patch to fix violation of memory dependence (http://gcc.gnu.org/ml/gcc-patches/2011-06/msg00960.html) as follows: On ppc64-redhat-linux regtest as well as bootstrap with SMS flags, enabling SMS also on loops with stage count 1. Regtested on SPU. On arm-linux-gnueabi bootstrap c language with SMS flags enabling SMS also on loops with stage count 1 and currently regtseted on c,c++. OK for mainline with these changes once regtests on arm-linux-gnueabi completes? OK. Thanks, Ayal. Thanks, Revital * modulo-sched.c (struct ps_insn): Remove row_rest_count field. (struct partial_schedule): Add rows_length field. (verify_partial_schedule): Check rows_length. (ps_insert_empty_row): Handle rows_length. (create_partial_schedule): Likewise. (free_partial_schedule): Likewise. (reset_partial_schedule): Likewise. (create_ps_insn): Remove rest_count argument. (remove_node_from_ps): Update rows_length. (add_node_to_ps): Update rows_length and call create_ps_insn without passing row_rest_count. (rotate_partial_schedule): Update rows_length. [attachment patch_sms_5_6.txt deleted by Ayal Zaks/Haifa/IBM]
Re: [PATCH, SMS] Fix violation of memory dependence
Revital Eres revital.e...@linaro.org wrote on 13/06/2011 10:29:06 AM: From: Revital Eres revital.e...@linaro.org To: Ayal Zaks/Haifa/IBM@IBMIL Cc: gcc-patches@gcc.gnu.org, Patch Tracking patc...@linaro.org Date: 13/06/2011 10:29 AM Subject: [PATCH, SMS] Fix violation of memory dependence Hello, The attached patch fixes violation of memory dependencies. The problematic scenario happens when -fmodulo-sched-allow-regmoves flag is set and certain anti-dep edges are not created. For example, consider the following three instructions and the edges between them. When -fmodulo-sched-allow-regmoves is set the edge (63 - Anti, 0 - 64) is not created. (probably due to transitivity) Insn 63) r168 = MEM[176] Out edges: (63 - Anti, 0 - 64) In edges: (64 - True, 1 - 63), (68 - True, 1 - 63) insn 64) 176 = 176 + 4 Out edges: (64 - True, 1 - 63), (64 - True, 0- 68) In edges: (63 - Anti, 0 - 64) insn 68) MEM[176 – 4] = 193 Out edges: (68 - True, 1 - 63) In edges: (64 - True, 0- 68) This anti-dep edge is on the path from one memory instruction to another --- from 63 to 68; such that removing the edge caused a violation of the memory dependencies as insn 63 was scheduled after insn 68. This patch adds intra edges between every two memory instructions in this case. It fixes recent bootstrap failure on ARM. (with SMS flags) The patch was tested as follows: On ppc64-redhat-linux regtest as well as bootstrap with SMS flags enabling SMS also on loops with stage count 1. Regtested on SPU. On arm-linux-gnueabi bootstrap c language with SMS flags enabling SMS also on loops with stage count 1 and currently regression testing on c,c++. OK for mainline once regtest on arm-linux-gnueabi completes? Yes, this is a straightforward fix to a wrong-code bug, as discussed offline. Other alternatives that might introduce less edges: o connect predecessors of u with v, and u with successors of v, when removing edge (u,v). Maybe there are other cases which rely on transitivity (?). o have a version of sched_analyze that avoids creating register anti-deps to begin with, and thus will create memory-deps in the absence of transitivity. * ddg.c (add_intra_loop_mem_dep): New function. You could check first thing if (from-cuid == to-cuid), for code clarity. Nice catch, Ayal. Thanks, Revital Changelog: gcc/ * ddg.c (add_intra_loop_mem_dep): New function. (build_intra_loop_deps): Call it. testsuite/ * gcc.dg/sms-9.c: New file. [attachment patch_fix_regmoves_12_6.txt deleted by Ayal Zaks/Haifa/IBM]
Re: [PATCH, SMS 1/4] Fix calculation of row_rest_count
The calculation of the number of instructions in a row is currently done by updating row_rest_count field in struct ps_insn on the fly while creating a new instruction. It is used to make sure we do not exceed the issue_rate. This calculation assumes the instruction is inserted in the beginning of a row thus does not take into account the cases where it must follow other instructions. Also, it's not been property updated when an instruction is removed. To avoid the overhead of maintaining this row_rest_count count in every instruction in each row as is currently done; this patch maintains one count per row which holds the number of instructions in the row. The patch was tested together with the rest of the patches in this series. On ppc64-redhat-linux regtest as well as bootstrap with SMS flags enabling SMS also on loops with stage count 1. Regtested on SPU. On arm-linux-gnueabi regtseted on c,c++. Bootstrap c language with SMS flags enabling SMS also on loops with stage count 1. OK for mainline? OK. The current situation is indeed broken, and inefficient, as you noted: o We are checking only the first node in a row for its row_rest_count value, to figure out if the row is full; hence it's really a property of the row, not of a node. o When scheduling a node, we update only that node's row_rest_count; hence if we schedule a node in position other than the first, the row_rest_count value of the first node is not incremented. o When removing a node, we do not decrement any row_rest_count value. Please add the following: o A clarification that rows_length is used only (as an optimization) to back off quickly from trying to schedule a node in a full row; that is, to avoid running through futile DFA state transitions. o An assert that ps-rows_length[i] equals the number of nodes in ps-rows [i] (e.g., in verify_partial_schedule(); and then recheck...). Thanks for fixing this, Ayal. Thanks, Revital * modulo-sched.c (struct ps_insn): Remove row_rest_count field. (struct partial_schedule): Add rows_length field. (ps_insert_empty_row): Handle rows_length. (create_partial_schedule): Likewise. (free_partial_schedule): Likewise. (reset_partial_schedule): Likewise. (create_ps_insn): Remove rest_count argument. (remove_node_from_ps): Update rows_length. (add_node_to_ps): Update rows_length and call create_ps_insn without passing row_rest_count. [attachment patch_row_rest_count_17_5.txt deleted by Ayal Zaks/Haifa/IBM]
Re: [PATCH, SMS 2/4] Move the creation of anti-dep edge
Revital Eres revital.e...@linaro.org wrote on 19/05/2011 07:44:23 AM: From: Revital Eres revital.e...@linaro.org To: Ayal Zaks/Haifa/IBM@IBMIL Cc: gcc-patches@gcc.gnu.org, Patch Tracking patc...@linaro.org Date: 19/05/2011 07:44 AM Subject: [PATCH, SMS 2/4] Move the creation of anti-dep edge Hello, The attached patch moves the creation of anti-dep edge from a branch to it's def from create_ddg_dep_from_intra_loop_link () to add_cross_iteration_register_deps () due to the fact the edge is with distance 1 and thus should be in the later function. The edge was added to avoid creating reg-moves. The patch was tested together with the rest of the patches in this series. On ppc64-redhat-linux regtest as well as bootstrap with SMS flags enabling SMS also on loops with stage count 1. Regtested on SPU. On arm-linux-gnueabi regtseted on c,c++. Bootstrap c language with SMS flags enabling SMS also on loops with stage count 1. OK for mainline? OK, this makes sense. Just to re-confirm, the exact same edges are created in both cases, right? + /* Always create the edge if the use node is a branch in +order to prevent the creation of reg-moves. */ if (DF_REF_ID (last_def) != DF_REF_ID (first_def) - || !flag_modulo_sched_allow_regmoves) + || !flag_modulo_sched_allow_regmoves + || (flag_modulo_sched_allow_regmoves JUMP_P (use_node- insn))) redundant; suffices to check + || JUMP_P (use_node-insn)) create_ddg_dep_no_link (g, use_node, first_def_node, ANTI_DEP, REG_DEP, 1); Thanks, Revital * ddg.c (create_ddg_dep_from_intra_loop_link): Remove the creation of anti-dep edge from a branch. (add_cross_iteration_register_deps): Create anti-dep edge from a branch. Index: ddg.c === --- ddg.c (revision 173785) +++ ddg.c (working copy) @@ -197,11 +197,6 @@ create_ddg_dep_from_intra_loop_link (ddg } } - /* If a true dep edge enters the branch create an anti edge in the - opposite direction to prevent the creation of reg-moves. */ - if ((DEP_TYPE (link) == REG_DEP_TRUE) JUMP_P (dest_node-insn)) -create_ddg_dep_no_link (g, dest_node, src_node, ANTI_DEP, REG_DEP, 1); - latency = dep_cost (link); e = create_ddg_edge (src_node, dest_node, t, dt, latency, distance); add_edge_to_ddg (g, e); @@ -306,8 +301,11 @@ add_cross_iteration_register_deps (ddg_p gcc_assert (first_def_node); + /* Always create the edge if the use node is a branch in +order to prevent the creation of reg-moves. */ if (DF_REF_ID (last_def) != DF_REF_ID (first_def) - || !flag_modulo_sched_allow_regmoves) + || !flag_modulo_sched_allow_regmoves + || (flag_modulo_sched_allow_regmoves JUMP_P (use_node- insn))) create_ddg_dep_no_link (g, use_node, first_def_node, ANTI_DEP, REG_DEP, 1);
Re: [PATCH, SMS 2/3] Skip DEBUG_INSNs while recognizing doloop
OK for mainline? Yes, this is pretty obvious. (You don't have to change to prev_nondebug_insn btw). Ayal. From: Revital Eres revital.e...@linaro.org To: Ayal Zaks/Haifa/IBM@IBMIL Cc: gcc-patches@gcc.gnu.org, Patch Tracking patc...@linaro.org Date: 08/05/2011 07:37 AM Subject:[PATCH, SMS 2/3] Skip DEBUG_INSNs while recognizing doloop Hello, The attached patch adds code to skip DEBUG_INSNs while recognizing doloop pattern. The patch was tested together with the rest of the patches in this series and on top of the patch to support do-loop for ARM (not yet in mainline, but approved http://gcc.gnu.org/ml/gcc-patches/2011-01/msg01718.html). On ppc64-redhat-linux regtest as well as bootstrap with SMS flags enabling SMS also on loops with stage count 1. Regtested on SPU. On arm-linux-gnueabi regtseted on c,c++. Bootstrap c language with SMS flags enabling SMS also on loops with stage count 1. OK for mainline? Thanks, Revital Changelog: * modulo-sched.c (doloop_register_get): Ignore DEBUG_INSNs while recognizing doloop. Index: modulo-sched.c === --- modulo-sched.c (revision 173368) +++ modulo-sched.c (working copy) @@ -310,10 +313,10 @@ doloop_register_get (rtx head ATTRIBUTE_ either a single (parallel) branch-on-count or a (non-parallel) branch immediately preceded by a single (decrement) insn. */ first_insn_not_to_check = (GET_CODE (PATTERN (tail)) == PARALLEL ? tail - : PREV_INSN (tail)); + : prev_nondebug_insn (tail)); for (insn = head; insn != first_insn_not_to_check; insn = NEXT_INSN (insn)) -if (reg_mentioned_p (reg, insn)) +if (reg_mentioned_p (reg, insn) !DEBUG_INSN_P (insn)) { if (dump_file) {