The following patch fixes PR57293 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57293
It is actually an implementation of missed LRA functionality in reg elimination. Before the patch any explicit change of stack pointer in RTL resulted in necessity to use the frame pointer. The patch has practically no effect on generic tuning of x86/x86-64. But it has a dramatic effect on code performance for other tunings like corei7 which don't use incoming args accumulation. The maximum SPEC2000 improvement 2.5% is achieved on x86 SPECInt2000. But SPECFP2000 rate also has improvement about 1% on x86 and x86-64. Too bad that I did not implement it at the first place. The results would have been even much better ones reported on 2012 GNU Cauldron as I also used -mtune=corei7 that time. The patch was bootstrapped and tested on x86-64/x86 and ppc. Committed as rev. 205498. 2013-11-28 Vladimir Makarov<vmaka...@redhat.com> PR target/57293 * ira.h (ira_setup_eliminable_regset): Remove parameter. * ira.c (ira_setup_eliminable_regset): Ditto. Add SUPPORTS_STACK_ALIGNMENT for crtl->stack_realign_needed. Don't call lra_init_elimination. (ira): Call ira_setup_eliminable_regset without arguments. * loop-invariant.c (calculate_loop_reg_pressure): Remove argument from ira_setup_eliminable_regset call. * gcse.c (calculate_bb_reg_pressure): Ditto. * haifa-sched.c (sched_init): Ditto. * lra.h (lra_init_elimination): Remove the prototype. * lra-int.h (lra_insn_recog_data): New member sp_offset. Move used_insn_alternative upper. (lra_eliminate_regs_1): Add one more parameter. (lra-eliminate): Ditto. * lra.c (lra_invalidate_insn_data): Set sp_offset. (setup_sp_offset): New. (lra_process_new_insns): Call setup_sp_offset. (lra): Add argument to lra_eliminate calls. * lra-constraints.c (get_equiv_substitution): Rename to get_equiv. (get_equiv_with_elimination): New. (process_addr_reg): Call get_equiv_with_elimination instead of get_equiv_substitution. (equiv_address_substitution): Ditto. (loc_equivalence_change_p): Ditto. (loc_equivalence_callback, lra_constraints): Ditto. (curr_insn_transform): Ditto. Print the sp offset (process_alt_operands): Prevent stack pointer reloads. (lra_constraints): Remove one argument from lra_eliminate call. Move it up. Mark used hard regs bfore it. Use get_equiv_with_elimination instead of get_equiv_substitution. * lra-eliminations.c (lra_eliminate_regs_1): Add parameter and assert for param values combination. Use sp offset. Add argument to lra_eliminate_regs_1 calls. (lra_eliminate_regs): Add argument to lra_eliminate_regs_1 call. (curr_sp_change): New static var. (mark_not_eliminable): Add parameter. Update curr_sp_change. Don't prevent elimination to sp if we can calculate its change. Pass the argument to mark_not_eliminable calls. (eliminate_regs_in_insn): Add a parameter. Use sp offset. Add argument to lra_eliminate_regs_1 call. (update_reg_eliminate): Move calculation of hard regs for spill lower. Switch off lra_in_progress temporarily to generate regs involved into elimination. (lra_init_elimination): Rename to init_elimination. Make it static. Set up insn sp offset, check the offsets at the end of BBs. (process_insn_for_elimination): Add parameter. Pass its value to eliminate_regs_in_insn. (lra_eliminate): : Add parameter. Pass its value to process_insn_for_elimination. Add assert for param values combination. Call init_elimination. Don't update offsets in equivalence substitutions. * lra-spills.c (assign_mem_slot): Don't call lra_eliminate_regs_1 for created stack slot. (remove_pseudos): Call lra_eliminate_regs_1 before changing memory onto stack slot. 2013-11-28 Vladimir Makarov<vmaka...@redhat.com> PR target/57293 * gcc.target/i386/pr57293.c: New.
Index: gcse.c =================================================================== --- gcse.c (revision 205233) +++ gcse.c (working copy) @@ -3509,7 +3509,7 @@ calculate_bb_reg_pressure (void) bitmap_iterator bi; - ira_setup_eliminable_regset (false); + ira_setup_eliminable_regset (); curr_regs_live = BITMAP_ALLOC (®_obstack); FOR_EACH_BB (bb) { Index: haifa-sched.c =================================================================== --- haifa-sched.c (revision 205233) +++ haifa-sched.c (working copy) @@ -6624,7 +6624,7 @@ sched_init (void) sched_pressure = SCHED_PRESSURE_NONE; if (sched_pressure != SCHED_PRESSURE_NONE) - ira_setup_eliminable_regset (false); + ira_setup_eliminable_regset (); /* Initialize SPEC_INFO. */ if (targetm.sched.set_sched_flags) Index: ira.c =================================================================== --- ira.c (revision 205233) +++ ira.c (working copy) @@ -2380,11 +2380,10 @@ compute_regs_asm_clobbered (void) } -/* Set up ELIMINABLE_REGSET, IRA_NO_ALLOC_REGS, and REGS_EVER_LIVE. - If the function is called from IRA (not from the insn scheduler or - RTL loop invariant motion), FROM_IRA_P is true. */ +/* Set up ELIMINABLE_REGSET, IRA_NO_ALLOC_REGS, and + REGS_EVER_LIVE. */ void -ira_setup_eliminable_regset (bool from_ira_p) +ira_setup_eliminable_regset (void) { #ifdef ELIMINABLE_REGS int i; @@ -2401,16 +2400,16 @@ ira_setup_eliminable_regset (bool from_i if the stack pointer is moving. */ || (flag_stack_check && STACK_CHECK_MOVING_SP) || crtl->accesses_prior_frames - || crtl->stack_realign_needed + || (SUPPORTS_STACK_ALIGNMENT && crtl->stack_realign_needed) /* We need a frame pointer for all Cilk Plus functions that use Cilk keywords. */ || (flag_enable_cilkplus && cfun->is_cilk_function) || targetm.frame_pointer_required ()); - if (from_ira_p && ira_use_lra_p) - /* It can change FRAME_POINTER_NEEDED. We call it only from IRA - because it is expensive. */ - lra_init_elimination (); + /* The chance that FRAME_POINTER_NEEDED is changed from inspecting + RTL is very small. So if we use frame pointer for RA and RTL + actually prevents this, we will spill pseudos assigned to the + frame pointer in LRA. */ if (frame_pointer_needed) df_set_regs_ever_live (HARD_FRAME_POINTER_REGNUM, true); @@ -5260,7 +5259,7 @@ ira (FILE *f) find_moveable_pseudos (); max_regno_before_ira = max_reg_num (); - ira_setup_eliminable_regset (true); + ira_setup_eliminable_regset (); ira_overall_cost = ira_reg_cost = ira_mem_cost = 0; ira_load_cost = ira_store_cost = ira_shuffle_cost = 0; Index: ira.h =================================================================== --- ira.h (revision 205233) +++ ira.h (working copy) @@ -178,7 +178,7 @@ extern struct ira_reg_equiv *ira_reg_equ extern void ira_init_once (void); extern void ira_init (void); extern void ira_finish_once (void); -extern void ira_setup_eliminable_regset (bool); +extern void ira_setup_eliminable_regset (void); extern rtx ira_eliminate_regs (rtx, enum machine_mode); extern void ira_set_pseudo_classes (bool, FILE *); extern void ira_implicitly_set_insn_hard_regs (HARD_REG_SET *); Index: loop-invariant.c =================================================================== --- loop-invariant.c (revision 205233) +++ loop-invariant.c (working copy) @@ -1823,7 +1823,7 @@ calculate_loop_reg_pressure (void) bitmap_initialize (&LOOP_DATA (loop)->regs_ref, ®_obstack); bitmap_initialize (&LOOP_DATA (loop)->regs_live, ®_obstack); } - ira_setup_eliminable_regset (false); + ira_setup_eliminable_regset (); bitmap_initialize (&curr_regs_live, ®_obstack); FOR_EACH_BB (bb) { Index: lra-constraints.c =================================================================== --- lra-constraints.c (revision 205233) +++ lra-constraints.c (working copy) @@ -318,7 +318,7 @@ in_mem_p (int regno) /* If we have decided to substitute X with another value, return that value, otherwise return X. */ static rtx -get_equiv_substitution (rtx x) +get_equiv (rtx x) { int regno; rtx res; @@ -337,6 +337,19 @@ get_equiv_substitution (rtx x) gcc_unreachable (); } +/* If we have decided to substitute X with the equivalent value, + return that value after elimination for INSN, otherwise return + X. */ +static rtx +get_equiv_with_elimination (rtx x, rtx insn) +{ + rtx res = get_equiv (x); + + if (x == res || CONSTANT_P (res)) + return res; + return lra_eliminate_regs_1 (insn, res, GET_MODE (res), false, false, true); +} + /* Set up curr_operand_mode. */ static void init_curr_operand_mode (void) @@ -1101,7 +1114,7 @@ process_addr_reg (rtx *loc, rtx *before, { regno = REGNO (reg); rclass = get_reg_class (regno); - if ((*loc = get_equiv_substitution (reg)) != reg) + if ((*loc = get_equiv_with_elimination (reg, curr_insn)) != reg) { if (lra_dump_file != NULL) { @@ -2007,6 +2020,13 @@ process_alt_operands (int only_alternati int const_to_mem = 0; bool no_regs_p; + /* Never do output reload of stack pointer. It makes + impossible to do elimination when SP is changed in + RTL. */ + if (op == stack_pointer_rtx && ! frame_pointer_needed + && curr_static_id->operand[nop].type != OP_IN) + goto fail; + /* If this alternative asks for a specific reg class, see if there is at least one allocatable register in that class. */ no_regs_p @@ -2517,7 +2537,7 @@ equiv_address_substitution (struct addre else { base_reg = *base_term; - new_base_reg = get_equiv_substitution (base_reg); + new_base_reg = get_equiv_with_elimination (base_reg, curr_insn); } index_term = strip_subreg (ad->index_term); if (index_term == NULL) @@ -2525,7 +2545,7 @@ equiv_address_substitution (struct addre else { index_reg = *index_term; - new_index_reg = get_equiv_substitution (index_reg); + new_index_reg = get_equiv_with_elimination (index_reg, curr_insn); } if (base_reg == new_base_reg && index_reg == new_index_reg) return false; @@ -3055,7 +3075,7 @@ curr_insn_transform (void) if (GET_CODE (old) == SUBREG) old = SUBREG_REG (old); - subst = get_equiv_substitution (old); + subst = get_equiv_with_elimination (old, curr_insn); if (subst != old) { subst = copy_rtx (subst); @@ -3260,6 +3280,9 @@ curr_insn_transform (void) if (INSN_CODE (curr_insn) >= 0 && (p = get_insn_name (INSN_CODE (curr_insn))) != NULL) fprintf (lra_dump_file, " {%s}", p); + if (curr_id->sp_offset != 0) + fprintf (lra_dump_file, " (sp_off=%" HOST_WIDE_INT_PRINT "d)", + curr_id->sp_offset); fprintf (lra_dump_file, "\n"); } @@ -3638,7 +3661,7 @@ loc_equivalence_change_p (rtx *loc) if (code == SUBREG) { reg = SUBREG_REG (x); - if ((subst = get_equiv_substitution (reg)) != reg + if ((subst = get_equiv_with_elimination (reg, curr_insn)) != reg && GET_MODE (subst) == VOIDmode) { /* We cannot reload debug location. Simplify subreg here @@ -3648,7 +3671,7 @@ loc_equivalence_change_p (rtx *loc) return true; } } - if (code == REG && (subst = get_equiv_substitution (x)) != x) + if (code == REG && (subst = get_equiv_with_elimination (x, curr_insn)) != x) { *loc = subst; return true; @@ -3676,7 +3699,7 @@ loc_equivalence_callback (rtx loc, const if (!REG_P (loc)) return NULL_RTX; - rtx subst = get_equiv_substitution (loc); + rtx subst = get_equiv_with_elimination (loc, curr_insn); if (subst != loc) return subst; @@ -3848,21 +3871,27 @@ lra_constraints (bool first_p) lra_risky_transformations_p = false; new_insn_uid_start = get_max_uid (); new_regno_start = first_p ? lra_constraint_new_regno_start : max_reg_num (); + /* Mark used hard regs for target stack size calulations. */ + for (i = FIRST_PSEUDO_REGISTER; i < new_regno_start; i++) + if (lra_reg_info[i].nrefs != 0 + && (hard_regno = lra_get_regno_hard_regno (i)) >= 0) + { + int j, nregs; + + nregs = hard_regno_nregs[hard_regno][lra_reg_info[i].biggest_mode]; + for (j = 0; j < nregs; j++) + df_set_regs_ever_live (hard_regno + j, true); + } + /* Do elimination before the equivalence processing as we can spill + some pseudos during elimination. */ + lra_eliminate (false, first_p); bitmap_initialize (&equiv_insn_bitmap, ®_obstack); for (i = FIRST_PSEUDO_REGISTER; i < new_regno_start; i++) if (lra_reg_info[i].nrefs != 0) { ira_reg_equiv[i].profitable_p = true; reg = regno_reg_rtx[i]; - if ((hard_regno = lra_get_regno_hard_regno (i)) >= 0) - { - int j, nregs; - - nregs = hard_regno_nregs[hard_regno][lra_reg_info[i].biggest_mode]; - for (j = 0; j < nregs; j++) - df_set_regs_ever_live (hard_regno + j, true); - } - else if ((x = get_equiv_substitution (reg)) != reg) + if (lra_get_regno_hard_regno (i) < 0 && (x = get_equiv (reg)) != reg) { bool pseudo_p = contains_reg_p (x, false, false); @@ -3911,7 +3940,7 @@ lra_constraints (bool first_p) ira_reg_equiv[i].defined_p = false; if (contains_reg_p (x, false, true)) ira_reg_equiv[i].profitable_p = false; - if (get_equiv_substitution (reg) != reg) + if (get_equiv (reg) != reg) bitmap_ior_into (&equiv_insn_bitmap, &lra_reg_info[i].insn_bitmap); } } @@ -3919,7 +3948,6 @@ lra_constraints (bool first_p) substituted by their equivalences. */ EXECUTE_IF_SET_IN_BITMAP (&equiv_insn_bitmap, 0, uid, bi) lra_push_insn_by_uid (uid); - lra_eliminate (false); min_len = lra_insn_stack_length (); new_insns_num = 0; last_bb = NULL; @@ -3973,7 +4001,7 @@ lra_constraints (bool first_p) if (GET_CODE (dest_reg) == SUBREG) dest_reg = SUBREG_REG (dest_reg); if ((REG_P (dest_reg) - && (x = get_equiv_substitution (dest_reg)) != dest_reg + && (x = get_equiv (dest_reg)) != dest_reg /* Remove insns which set up a pseudo whose value can not be changed. Such insns might be not in init_insns because we don't update equiv data @@ -3993,8 +4021,7 @@ lra_constraints (bool first_p) || in_list_p (curr_insn, ira_reg_equiv [REGNO (dest_reg)].init_insns))) - || (((x = get_equiv_substitution (SET_SRC (set))) - != SET_SRC (set)) + || (((x = get_equiv (SET_SRC (set))) != SET_SRC (set)) && in_list_p (curr_insn, ira_reg_equiv [REGNO (SET_SRC (set))].init_insns))) Index: lra-eliminations.c =================================================================== --- lra-eliminations.c (revision 205233) +++ lra-eliminations.c (working copy) @@ -286,10 +286,11 @@ get_elimination (rtx reg) } /* Scan X and replace any eliminable registers (such as fp) with a - replacement (such as sp) if SUBST_P, plus an offset. The offset is + replacement (such as sp) if SUBST_P, plus an offset. The offset is a change in the offset between the eliminable register and its substitution if UPDATE_P, or the full offset if FULL_P, or - otherwise zero. + otherwise zero. If FULL_P, we also use the SP offsets for + elimination to SP. MEM_MODE is the mode of an enclosing MEM. We need this to know how much to adjust a register for, e.g., PRE_DEC. Also, if we are @@ -298,10 +299,10 @@ get_elimination (rtx reg) outside a MEM. In addition, we need to record the fact that a hard register is referenced outside a MEM. - Alternatively, INSN may be a note (an EXPR_LIST or INSN_LIST). - That's used when we eliminate in expressions stored in notes. */ + If we make full substitution to SP for non-null INSN, add the insn + sp offset. */ rtx -lra_eliminate_regs_1 (rtx x, enum machine_mode mem_mode, +lra_eliminate_regs_1 (rtx insn, rtx x, enum machine_mode mem_mode, bool subst_p, bool update_p, bool full_p) { enum rtx_code code = GET_CODE (x); @@ -311,6 +312,7 @@ lra_eliminate_regs_1 (rtx x, enum machin const char *fmt; int copied = 0; + gcc_assert (!update_p || !full_p); if (! current_function_decl) return x; @@ -338,7 +340,12 @@ lra_eliminate_regs_1 (rtx x, enum machin if (update_p) return plus_constant (Pmode, to, ep->offset - ep->previous_offset); else if (full_p) - return plus_constant (Pmode, to, ep->offset); + return plus_constant (Pmode, to, + ep->offset + - (insn != NULL_RTX + && ep->to_rtx == stack_pointer_rtx + ? lra_get_insn_recog_data (insn)->sp_offset + : 0)); else return to; } @@ -359,6 +366,8 @@ lra_eliminate_regs_1 (rtx x, enum machin offset = (update_p ? ep->offset - ep->previous_offset : ep->offset); + if (full_p && insn != NULL_RTX && ep->to_rtx == stack_pointer_rtx) + offset -= lra_get_insn_recog_data (insn)->sp_offset; if (CONST_INT_P (XEXP (x, 1)) && INTVAL (XEXP (x, 1)) == -offset) return to; @@ -384,9 +393,9 @@ lra_eliminate_regs_1 (rtx x, enum machin an address operand of a load-address insn. */ { - rtx new0 = lra_eliminate_regs_1 (XEXP (x, 0), mem_mode, + rtx new0 = lra_eliminate_regs_1 (insn, XEXP (x, 0), mem_mode, subst_p, update_p, full_p); - rtx new1 = lra_eliminate_regs_1 (XEXP (x, 1), mem_mode, + rtx new1 = lra_eliminate_regs_1 (insn, XEXP (x, 1), mem_mode, subst_p, update_p, full_p); if (new0 != XEXP (x, 0) || new1 != XEXP (x, 1)) @@ -412,10 +421,16 @@ lra_eliminate_regs_1 (rtx x, enum machin (ep->offset - ep->previous_offset) * INTVAL (XEXP (x, 1))); else if (full_p) - return - plus_constant (Pmode, - gen_rtx_MULT (Pmode, to, XEXP (x, 1)), - ep->offset * INTVAL (XEXP (x, 1))); + { + HOST_WIDE_INT offset = ep->offset; + + if (insn != NULL_RTX && ep->to_rtx == stack_pointer_rtx) + offset -= lra_get_insn_recog_data (insn)->sp_offset; + return + plus_constant (Pmode, + gen_rtx_MULT (Pmode, to, XEXP (x, 1)), + offset * INTVAL (XEXP (x, 1))); + } else return gen_rtx_MULT (Pmode, to, XEXP (x, 1)); } @@ -435,10 +450,10 @@ lra_eliminate_regs_1 (rtx x, enum machin case GE: case GT: case GEU: case GTU: case LE: case LT: case LEU: case LTU: { - rtx new0 = lra_eliminate_regs_1 (XEXP (x, 0), mem_mode, + rtx new0 = lra_eliminate_regs_1 (insn, XEXP (x, 0), mem_mode, subst_p, update_p, full_p); rtx new1 = XEXP (x, 1) - ? lra_eliminate_regs_1 (XEXP (x, 1), mem_mode, + ? lra_eliminate_regs_1 (insn, XEXP (x, 1), mem_mode, subst_p, update_p, full_p) : 0; if (new0 != XEXP (x, 0) || new1 != XEXP (x, 1)) @@ -451,7 +466,7 @@ lra_eliminate_regs_1 (rtx x, enum machin eliminate it. */ if (XEXP (x, 0)) { - new_rtx = lra_eliminate_regs_1 (XEXP (x, 0), mem_mode, + new_rtx = lra_eliminate_regs_1 (insn, XEXP (x, 0), mem_mode, subst_p, update_p, full_p); if (new_rtx != XEXP (x, 0)) { @@ -460,7 +475,7 @@ lra_eliminate_regs_1 (rtx x, enum machin REG_DEAD note for the stack or frame pointer. */ if (REG_NOTE_KIND (x) == REG_DEAD) return (XEXP (x, 1) - ? lra_eliminate_regs_1 (XEXP (x, 1), mem_mode, + ? lra_eliminate_regs_1 (insn, XEXP (x, 1), mem_mode, subst_p, update_p, full_p) : NULL_RTX); @@ -477,7 +492,7 @@ lra_eliminate_regs_1 (rtx x, enum machin strictly needed, but it simplifies the code. */ if (XEXP (x, 1)) { - new_rtx = lra_eliminate_regs_1 (XEXP (x, 1), mem_mode, + new_rtx = lra_eliminate_regs_1 (insn, XEXP (x, 1), mem_mode, subst_p, update_p, full_p); if (new_rtx != XEXP (x, 1)) return @@ -504,7 +519,8 @@ lra_eliminate_regs_1 (rtx x, enum machin if (GET_CODE (XEXP (x, 1)) == PLUS && XEXP (XEXP (x, 1), 0) == XEXP (x, 0)) { - rtx new_rtx = lra_eliminate_regs_1 (XEXP (XEXP (x, 1), 1), mem_mode, + rtx new_rtx = lra_eliminate_regs_1 (insn, XEXP (XEXP (x, 1), 1), + mem_mode, subst_p, update_p, full_p); if (new_rtx != XEXP (XEXP (x, 1), 1)) @@ -528,14 +544,14 @@ lra_eliminate_regs_1 (rtx x, enum machin case POPCOUNT: case PARITY: case BSWAP: - new_rtx = lra_eliminate_regs_1 (XEXP (x, 0), mem_mode, + new_rtx = lra_eliminate_regs_1 (insn, XEXP (x, 0), mem_mode, subst_p, update_p, full_p); if (new_rtx != XEXP (x, 0)) return gen_rtx_fmt_e (code, GET_MODE (x), new_rtx); return x; case SUBREG: - new_rtx = lra_eliminate_regs_1 (SUBREG_REG (x), mem_mode, + new_rtx = lra_eliminate_regs_1 (insn, SUBREG_REG (x), mem_mode, subst_p, update_p, full_p); if (new_rtx != SUBREG_REG (x)) @@ -563,12 +579,12 @@ lra_eliminate_regs_1 (rtx x, enum machin return replace_equiv_address_nv (x, - lra_eliminate_regs_1 (XEXP (x, 0), GET_MODE (x), + lra_eliminate_regs_1 (insn, XEXP (x, 0), GET_MODE (x), subst_p, update_p, full_p)); case USE: /* Handle insn_list USE that a call to a pure function may generate. */ - new_rtx = lra_eliminate_regs_1 (XEXP (x, 0), VOIDmode, + new_rtx = lra_eliminate_regs_1 (insn, XEXP (x, 0), VOIDmode, subst_p, update_p, full_p); if (new_rtx != XEXP (x, 0)) return gen_rtx_USE (GET_MODE (x), new_rtx); @@ -589,7 +605,7 @@ lra_eliminate_regs_1 (rtx x, enum machin { if (*fmt == 'e') { - new_rtx = lra_eliminate_regs_1 (XEXP (x, i), mem_mode, + new_rtx = lra_eliminate_regs_1 (insn, XEXP (x, i), mem_mode, subst_p, update_p, full_p); if (new_rtx != XEXP (x, i) && ! copied) { @@ -603,7 +619,7 @@ lra_eliminate_regs_1 (rtx x, enum machin int copied_vec = 0; for (j = 0; j < XVECLEN (x, i); j++) { - new_rtx = lra_eliminate_regs_1 (XVECEXP (x, i, j), mem_mode, + new_rtx = lra_eliminate_regs_1 (insn, XVECEXP (x, i, j), mem_mode, subst_p, update_p, full_p); if (new_rtx != XVECEXP (x, i, j) && ! copied_vec) { @@ -631,16 +647,21 @@ rtx lra_eliminate_regs (rtx x, enum machine_mode mem_mode, rtx insn ATTRIBUTE_UNUSED) { - return lra_eliminate_regs_1 (x, mem_mode, true, false, true); + return lra_eliminate_regs_1 (NULL_RTX, x, mem_mode, true, false, true); } +/* Stack pointer offset before the current insn relative to one at the + func start. RTL insns can change SP explicitly. We keep the + changes from one insn to another through this variable. */ +static HOST_WIDE_INT curr_sp_change; + /* Scan rtx X for references to elimination source or target registers in contexts that would prevent the elimination from happening. Update the table of eliminables to reflect the changed state. MEM_MODE is the mode of an enclosing MEM rtx, or VOIDmode if not within a MEM. */ static void -mark_not_eliminable (rtx x) +mark_not_eliminable (rtx x, enum machine_mode mem_mode) { enum rtx_code code = GET_CODE (x); struct elim_table *ep; @@ -655,17 +676,40 @@ mark_not_eliminable (rtx x) case POST_DEC: case POST_MODIFY: case PRE_MODIFY: - if (REG_P (XEXP (x, 0)) && REGNO (XEXP (x, 0)) < FIRST_PSEUDO_REGISTER) - /* If we modify the source of an elimination rule, disable - it. Do the same if it is the source and not the hard frame - register. */ - for (ep = reg_eliminate; - ep < ®_eliminate[NUM_ELIMINABLE_REGS]; + if (XEXP (x, 0) == stack_pointer_rtx + && ((code != PRE_MODIFY && code != POST_MODIFY) + || (GET_CODE (XEXP (x, 1)) == PLUS + && XEXP (x, 0) == XEXP (XEXP (x, 1), 0) + && CONST_INT_P (XEXP (XEXP (x, 1), 1))))) + { + int size = GET_MODE_SIZE (mem_mode); + +#ifdef PUSH_ROUNDING + /* If more bytes than MEM_MODE are pushed, account for + them. */ + size = PUSH_ROUNDING (size); +#endif + if (code == PRE_DEC || code == POST_DEC) + curr_sp_change -= size; + else if (code == PRE_INC || code == POST_INC) + curr_sp_change += size; + else if (code == PRE_MODIFY || code == POST_MODIFY) + curr_sp_change += INTVAL (XEXP (XEXP (x, 1), 1)); + } + else if (REG_P (XEXP (x, 0)) + && REGNO (XEXP (x, 0)) >= FIRST_PSEUDO_REGISTER) + { + /* If we modify the source of an elimination rule, disable + it. Do the same if it is the destination and not the + hard frame register. */ + for (ep = reg_eliminate; + ep < ®_eliminate[NUM_ELIMINABLE_REGS]; ep++) - if (ep->from_rtx == XEXP (x, 0) - || (ep->to_rtx == XEXP (x, 0) - && ep->to_rtx != hard_frame_pointer_rtx)) - setup_can_eliminate (ep, false); + if (ep->from_rtx == XEXP (x, 0) + || (ep->to_rtx == XEXP (x, 0) + && ep->to_rtx != hard_frame_pointer_rtx)) + setup_can_eliminate (ep, false); + } return; case USE: @@ -697,12 +741,22 @@ mark_not_eliminable (rtx x) return; case SET: - /* Check for setting a hard register that we know about. */ - if (REG_P (SET_DEST (x)) && REGNO (SET_DEST (x)) < FIRST_PSEUDO_REGISTER) + if (SET_DEST (x) == stack_pointer_rtx + && GET_CODE (SET_SRC (x)) == PLUS + && XEXP (SET_SRC (x), 0) == SET_DEST (x) + && CONST_INT_P (XEXP (SET_SRC (x), 1))) + { + curr_sp_change += INTVAL (XEXP (SET_SRC (x), 1)); + return; + } + if (! REG_P (SET_DEST (x)) + || REGNO (SET_DEST (x)) >= FIRST_PSEUDO_REGISTER) + mark_not_eliminable (SET_DEST (x), mem_mode); + else { /* See if this is setting the replacement hard register for an elimination. - + If DEST is the hard frame pointer, we do nothing because we assume that all assignments to the frame pointer are for non-local gotos and are being done at a time when @@ -711,22 +765,21 @@ mark_not_eliminable (rtx x) even a fake frame pointer) with either the real frame pointer or the stack pointer. Assignments to the hard frame pointer must not prevent this elimination. */ - for (ep = reg_eliminate; ep < ®_eliminate[NUM_ELIMINABLE_REGS]; ep++) if (ep->to_rtx == SET_DEST (x) - && SET_DEST (x) != hard_frame_pointer_rtx - && (! (SUPPORTS_STACK_ALIGNMENT && stack_realign_fp - && REGNO (ep->to_rtx) == STACK_POINTER_REGNUM) - || GET_CODE (SET_SRC (x)) != PLUS - || XEXP (SET_SRC (x), 0) != SET_DEST (x) - || ! CONST_INT_P (XEXP (SET_SRC (x), 1)))) + && SET_DEST (x) != hard_frame_pointer_rtx) setup_can_eliminate (ep, false); } + + mark_not_eliminable (SET_SRC (x), mem_mode); + return; - mark_not_eliminable (SET_DEST (x)); - mark_not_eliminable (SET_SRC (x)); + case MEM: + /* Our only special processing is to pass the mode of the MEM to + our recursive call. */ + mark_not_eliminable (XEXP (x, 0), GET_MODE (x)); return; default: @@ -737,10 +790,10 @@ mark_not_eliminable (rtx x) for (i = 0; i < GET_RTX_LENGTH (code); i++, fmt++) { if (*fmt == 'e') - mark_not_eliminable (XEXP (x, i)); + mark_not_eliminable (XEXP (x, i), mem_mode); else if (*fmt == 'E') for (j = 0; j < XVECLEN (x, i); j++) - mark_not_eliminable (XVECEXP (x, i, j)); + mark_not_eliminable (XVECEXP (x, i, j), mem_mode); } } @@ -778,13 +831,14 @@ remove_reg_equal_offset_note (rtx insn, delete the insn as dead it if it is setting an eliminable register. If REPLACE_P is false, just update the offsets while keeping the - base register the same. Attach the note about used elimination for + base register the same. If FIRST_P, use the sp offset for + elimination to sp. Attach the note about used elimination for insns setting frame pointer to update elimination easy (without parsing already generated elimination insns to find offset previously used) in future. */ static void -eliminate_regs_in_insn (rtx insn, bool replace_p) +eliminate_regs_in_insn (rtx insn, bool replace_p, bool first_p) { int icode = recog_memoized (insn); rtx old_set = single_set (insn); @@ -914,6 +968,8 @@ eliminate_regs_in_insn (rtx insn, bool r if (! replace_p) { offset += (ep->offset - ep->previous_offset); + if (first_p && ep->to_rtx == stack_pointer_rtx) + offset -= lra_get_insn_recog_data (insn)->sp_offset; offset = trunc_int_for_mode (offset, GET_MODE (plus_cst_src)); } @@ -985,8 +1041,9 @@ eliminate_regs_in_insn (rtx insn, bool r /* Companion to the above plus substitution, we can allow invariants as the source of a plain move. */ substed_operand[i] - = lra_eliminate_regs_1 (*id->operand_loc[i], VOIDmode, - replace_p, ! replace_p, false); + = lra_eliminate_regs_1 (insn, *id->operand_loc[i], VOIDmode, + replace_p, ! replace_p && ! first_p, + first_p); if (substed_operand[i] != orig_operand[i]) validate_p = true; } @@ -1054,7 +1111,7 @@ spill_pseudos (HARD_REG_SET set) } /* Update all offsets and possibility for elimination on eliminable - registers. Spill pseudos assigned to registers which became + registers. Spill pseudos assigned to registers which are uneliminable, update LRA_NO_ALLOC_REGS and ELIMINABLE_REG_SET. Add insns to INSNS_WITH_CHANGED_OFFSETS containing eliminable hard registers whose offsets should be changed. Return true if any @@ -1069,7 +1126,6 @@ update_reg_eliminate (bitmap insns_with_ /* Clear self elimination offsets. */ for (ep = reg_eliminate; ep < ®_eliminate[NUM_ELIMINABLE_REGS]; ep++) self_elim_offsets[ep->from] = 0; - CLEAR_HARD_REG_SET (temp_hard_reg_set); for (ep = reg_eliminate; ep < ®_eliminate[NUM_ELIMINABLE_REGS]; ep++) { /* If it is a currently used elimination: update the previous @@ -1096,6 +1152,9 @@ update_reg_eliminate (bitmap insns_with_ fprintf (lra_dump_file, " Elimination %d to %d is not possible anymore\n", ep->from, ep->to); + /* If after processing RTL we decides that SP can be used as + a result of elimination, it can not be changed. */ + gcc_assert (ep->to_rtx != stack_pointer_rtx); /* Mark that is not eliminable anymore. */ elimination_map[ep->from] = NULL; for (ep1 = ep + 1; ep1 < ®_eliminate[NUM_ELIMINABLE_REGS]; ep1++) @@ -1106,9 +1165,6 @@ update_reg_eliminate (bitmap insns_with_ if (lra_dump_file != NULL) fprintf (lra_dump_file, " Using elimination %d to %d now\n", ep1->from, ep1->to); - /* Prevent the hard register into which we eliminate now - from the usage for pseudos. */ - SET_HARD_REG_BIT (temp_hard_reg_set, ep1->to); lra_assert (ep1->previous_offset == 0); ep1->previous_offset = ep->offset; } @@ -1121,7 +1177,6 @@ update_reg_eliminate (bitmap insns_with_ fprintf (lra_dump_file, " %d is not eliminable at all\n", ep->from); self_elim_offsets[ep->from] = -ep->offset; - SET_HARD_REG_BIT (temp_hard_reg_set, ep->from); if (ep->offset != 0) bitmap_ior_into (insns_with_changed_offsets, &lra_reg_info[ep->from].insn_bitmap); @@ -1134,23 +1189,33 @@ update_reg_eliminate (bitmap insns_with_ INITIAL_FRAME_POINTER_OFFSET (ep->offset); #endif } - IOR_HARD_REG_SET (lra_no_alloc_regs, temp_hard_reg_set); - AND_COMPL_HARD_REG_SET (eliminable_regset, temp_hard_reg_set); - spill_pseudos (temp_hard_reg_set); setup_elimination_map (); result = false; + CLEAR_HARD_REG_SET (temp_hard_reg_set); for (ep = reg_eliminate; ep < ®_eliminate[NUM_ELIMINABLE_REGS]; ep++) - if (elimination_map[ep->from] == ep && ep->previous_offset != ep->offset) + if (elimination_map[ep->from] == NULL) + SET_HARD_REG_BIT (temp_hard_reg_set, ep->from); + else if (elimination_map[ep->from] == ep) { - bitmap_ior_into (insns_with_changed_offsets, - &lra_reg_info[ep->from].insn_bitmap); + /* Prevent the hard register into which we eliminate from + the usage for pseudos. */ + if (ep->from != ep->to) + SET_HARD_REG_BIT (temp_hard_reg_set, ep->to); + if (ep->previous_offset != ep->offset) + { + bitmap_ior_into (insns_with_changed_offsets, + &lra_reg_info[ep->from].insn_bitmap); - /* Update offset when the eliminate offset have been - changed. */ - lra_update_reg_val_offset (lra_reg_info[ep->from].val, - ep->offset - ep->previous_offset); - result = true; + /* Update offset when the eliminate offset have been + changed. */ + lra_update_reg_val_offset (lra_reg_info[ep->from].val, + ep->offset - ep->previous_offset); + result = true; + } } + IOR_HARD_REG_SET (lra_no_alloc_regs, temp_hard_reg_set); + AND_COMPL_HARD_REG_SET (eliminable_regset, temp_hard_reg_set); + spill_pseudos (temp_hard_reg_set); return result; } @@ -1194,31 +1259,54 @@ init_elim_table (void) setup_can_eliminate (®_eliminate[0], ! frame_pointer_needed); #endif - /* Count the number of eliminable registers and build the FROM and TO - REG rtx's. Note that code in gen_rtx_REG will cause, e.g., - gen_rtx_REG (Pmode, STACK_POINTER_REGNUM) to equal stack_pointer_rtx. - We depend on this. */ + /* Build the FROM and TO REG rtx's. Note that code in gen_rtx_REG + will cause, e.g., gen_rtx_REG (Pmode, STACK_POINTER_REGNUM) to + equal stack_pointer_rtx. We depend on this. Threfore we switch + off that we are in LRA temporarily. */ + lra_in_progress = 0; for (ep = reg_eliminate; ep < ®_eliminate[NUM_ELIMINABLE_REGS]; ep++) { ep->from_rtx = gen_rtx_REG (Pmode, ep->from); ep->to_rtx = gen_rtx_REG (Pmode, ep->to); eliminable_reg_rtx[ep->from] = ep->from_rtx; } + lra_in_progress = 1; } -/* Entry function for initialization of elimination once per - function. */ -void -lra_init_elimination (void) +/* Function for initialization of elimination once per function. It + sets up sp offset for each insn. */ +static void +init_elimination (void) { + bool stop_to_sp_elimination_p; basic_block bb; rtx insn; + struct elim_table *ep; init_elim_table (); FOR_EACH_BB (bb) - FOR_BB_INSNS (bb, insn) - if (NONDEBUG_INSN_P (insn)) - mark_not_eliminable (PATTERN (insn)); + { + curr_sp_change = 0; + stop_to_sp_elimination_p = false; + FOR_BB_INSNS (bb, insn) + if (INSN_P (insn)) + { + lra_get_insn_recog_data (insn)->sp_offset = curr_sp_change; + if (NONDEBUG_INSN_P (insn)) + { + mark_not_eliminable (PATTERN (insn), VOIDmode); + if (curr_sp_change != 0 + && find_reg_note (insn, REG_LABEL_OPERAND, NULL_RTX)) + stop_to_sp_elimination_p = true; + } + } + if (! frame_pointer_needed + && (curr_sp_change != 0 || stop_to_sp_elimination_p) + && bb->succs && bb->succs->length () != 0) + for (ep = reg_eliminate; ep < ®_eliminate[NUM_ELIMINABLE_REGS]; ep++) + if (ep->to == STACK_POINTER_REGNUM) + setup_can_eliminate (ep, false); + } setup_elimination_map (); } @@ -1237,12 +1325,13 @@ lra_eliminate_reg_if_possible (rtx *loc) *loc = ep->to_rtx; } -/* Do (final if FINAL_P) elimination in INSN. Add the insn for - subsequent processing in the constraint pass, update the insn info. */ +/* Do (final if FINAL_P or first if FIRST_P) elimination in INSN. Add + the insn for subsequent processing in the constraint pass, update + the insn info. */ static void -process_insn_for_elimination (rtx insn, bool final_p) +process_insn_for_elimination (rtx insn, bool final_p, bool first_p) { - eliminate_regs_in_insn (insn, final_p); + eliminate_regs_in_insn (insn, final_p, first_p); if (! final_p) { /* Check that insn changed its code. This is a case when a move @@ -1262,20 +1351,23 @@ process_insn_for_elimination (rtx insn, } /* Entry function to do final elimination if FINAL_P or to update - elimination register offsets. */ + elimination register offsets (FIRST_P if we are doing it the first + time). */ void -lra_eliminate (bool final_p) +lra_eliminate (bool final_p, bool first_p) { - int i; unsigned int uid; - rtx mem_loc, invariant; bitmap_head insns_with_changed_offsets; bitmap_iterator bi; struct elim_table *ep; - int regs_num = max_reg_num (); + + gcc_assert (! final_p || ! first_p); timevar_push (TV_LRA_ELIMINATE); + if (first_p) + init_elimination (); + bitmap_initialize (&insns_with_changed_offsets, ®_obstack); if (final_p) { @@ -1299,28 +1391,11 @@ lra_eliminate (bool final_p) fprintf (lra_dump_file, "New elimination table:\n"); print_elim_table (lra_dump_file); } - for (i = FIRST_PSEUDO_REGISTER; i < regs_num; i++) - if (lra_reg_info[i].nrefs != 0) - { - mem_loc = ira_reg_equiv[i].memory; - if (mem_loc != NULL_RTX) - mem_loc = lra_eliminate_regs_1 (mem_loc, VOIDmode, - final_p, ! final_p, false); - ira_reg_equiv[i].memory = mem_loc; - invariant = ira_reg_equiv[i].invariant; - if (invariant != NULL_RTX) - invariant = lra_eliminate_regs_1 (invariant, VOIDmode, - final_p, ! final_p, false); - ira_reg_equiv[i].invariant = invariant; - if (lra_dump_file != NULL - && (mem_loc != NULL_RTX || invariant != NULL)) - fprintf (lra_dump_file, - "Updating elimination of equiv for reg %d\n", i); - } EXECUTE_IF_SET_IN_BITMAP (&insns_with_changed_offsets, 0, uid, bi) /* A dead insn can be deleted in process_insn_for_elimination. */ if (lra_insn_recog_data[uid] != NULL) - process_insn_for_elimination (lra_insn_recog_data[uid]->insn, final_p); + process_insn_for_elimination (lra_insn_recog_data[uid]->insn, + final_p, first_p); bitmap_clear (&insns_with_changed_offsets); lra_eliminate_done: Index: lra-int.h =================================================================== --- lra-int.h (revision 205233) +++ lra-int.h (working copy) @@ -207,6 +207,12 @@ struct lra_insn_recog_data { /* The insn code. */ int icode; + /* The alternative should be used for the insn, -1 if invalid, or we + should try to use any alternative, or the insn is a debug + insn. */ + int used_insn_alternative; + /* SP offset before the insn relative to one at the func start. */ + HOST_WIDE_INT sp_offset; /* The insn itself. */ rtx insn; /* Common data for insns with the same ICODE. Asm insns (their @@ -222,10 +228,6 @@ struct lra_insn_recog_data int *arg_hard_regs; /* Alternative enabled for the insn. NULL for debug insns. */ bool *alternative_enabled_p; - /* The alternative should be used for the insn, -1 if invalid, or we - should try to use any alternative, or the insn is a debug - insn. */ - int used_insn_alternative; /* The following member value is always NULL for a debug insn. */ struct lra_insn_reg *regs; }; @@ -377,8 +379,8 @@ extern void lra_final_code_change (void) extern void lra_debug_elim_table (void); extern int lra_get_elimination_hard_regno (int); -extern rtx lra_eliminate_regs_1 (rtx, enum machine_mode, bool, bool, bool); -extern void lra_eliminate (bool); +extern rtx lra_eliminate_regs_1 (rtx, rtx, enum machine_mode, bool, bool, bool); +extern void lra_eliminate (bool, bool); extern void lra_eliminate_reg_if_possible (rtx *); Index: lra-spills.c =================================================================== --- lra-spills.c (revision 205233) +++ lra-spills.c (working copy) @@ -163,7 +163,6 @@ assign_mem_slot (int i) x = assign_stack_local (mode, total_size, min_align > inherent_align || total_size > inherent_size ? -1 : 0); - x = lra_eliminate_regs_1 (x, GET_MODE (x), false, false, true); stack_slot = x; /* Cancel the big-endian correction done in assign_stack_local. Get the address of the beginning of the slot. This is so we @@ -430,8 +429,15 @@ remove_pseudos (rtx *loc, rtx insn) into scratches back. */ && ! lra_former_scratch_p (i)) { - hard_reg = spill_hard_reg[i]; - *loc = copy_rtx (hard_reg != NULL_RTX ? hard_reg : pseudo_slots[i].mem); + if ((hard_reg = spill_hard_reg[i]) != NULL_RTX) + *loc = copy_rtx (hard_reg); + else + { + rtx x = lra_eliminate_regs_1 (insn, pseudo_slots[i].mem, + GET_MODE (pseudo_slots[i].mem), + false, false, true); + *loc = x != pseudo_slots[i].mem ? x : copy_rtx (x); + } return; } Index: lra.c =================================================================== --- lra.c (revision 205233) +++ lra.c (working copy) @@ -207,7 +207,8 @@ lra_set_regno_unique_value (int regno) lra_reg_info[regno].val = get_new_reg_value (); } -/* Invalidate INSN related info used by LRA. */ +/* Invalidate INSN related info used by LRA. The info should never be + used after that. */ void lra_invalidate_insn_data (rtx insn) { @@ -1273,17 +1274,24 @@ lra_update_insn_recog_data (rtx insn) int n; unsigned int uid = INSN_UID (insn); struct lra_static_insn_data *insn_static_data; + HOST_WIDE_INT sp_offset = 0; check_and_expand_insn_recog_data (uid); if ((data = lra_insn_recog_data[uid]) != NULL && data->icode != INSN_CODE (insn)) { + sp_offset = data->sp_offset; invalidate_insn_data_regno_info (data, insn, get_insn_freq (insn)); invalidate_insn_recog_data (uid); data = NULL; } if (data == NULL) - return lra_get_insn_recog_data (insn); + { + data = lra_get_insn_recog_data (insn); + /* Initiate or restore SP offset. */ + data->sp_offset = sp_offset; + return data; + } insn_static_data = data->insn_static_data; data->used_insn_alternative = -1; if (DEBUG_INSN_P (insn)) @@ -1837,6 +1845,20 @@ push_insns (rtx from, rtx to) lra_push_insn (insn); } +/* Set up sp offset for insn in range [FROM, LAST]. The offset is + taken from the next BB insn after LAST or zero if there in such + insn. */ +static void +setup_sp_offset (rtx from, rtx last) +{ + rtx before = next_nonnote_insn_bb (last); + HOST_WIDE_INT offset = (before == NULL_RTX || ! INSN_P (before) + ? 0 : lra_get_insn_recog_data (before)->sp_offset); + + for (rtx insn = from; insn != NEXT_INSN (last); insn = NEXT_INSN (insn)) + lra_get_insn_recog_data (insn)->sp_offset = offset; +} + /* Emit insns BEFORE before INSN and insns AFTER after INSN. Put the insns onto the stack. Print about emitting the insns with TITLE. */ @@ -1845,7 +1867,9 @@ lra_process_new_insns (rtx insn, rtx bef { rtx last; - if (lra_dump_file != NULL && (before != NULL_RTX || after != NULL_RTX)) + if (before == NULL_RTX && after == NULL_RTX) + return; + if (lra_dump_file != NULL) { dump_insn_slim (lra_dump_file, insn); if (before != NULL_RTX) @@ -1864,6 +1888,7 @@ lra_process_new_insns (rtx insn, rtx bef { emit_insn_before (before, insn); push_insns (PREV_INSN (insn), PREV_INSN (before)); + setup_sp_offset (before, PREV_INSN (insn)); } if (after != NULL_RTX) { @@ -1871,6 +1896,7 @@ lra_process_new_insns (rtx insn, rtx bef ; emit_insn_after (after, insn); push_insns (last, insn); + setup_sp_offset (after, last); } } @@ -2314,7 +2340,7 @@ lra (FILE *f) For example, rs6000 can make RS6000_PIC_OFFSET_TABLE_REGNUM uneliminable if we started to use a constant pool. */ - lra_eliminate (false); + lra_eliminate (false, false); /* Do inheritance only for regular algorithms. */ if (! lra_simple_p) lra_inheritance (); @@ -2368,13 +2394,13 @@ lra (FILE *f) lra_spill (); /* Assignment of stack slots changes elimination offsets for some eliminations. So update the offsets here. */ - lra_eliminate (false); + lra_eliminate (false, false); lra_constraint_new_regno_start = max_reg_num (); lra_constraint_new_insn_uid_start = get_max_uid (); lra_constraint_iter_after_spill = 0; } restore_scratches (); - lra_eliminate (true); + lra_eliminate (true, false); lra_final_code_change (); lra_in_progress = 0; if (live_p) Index: lra.h =================================================================== --- lra.h (revision 205233) +++ lra.h (working copy) @@ -33,7 +33,6 @@ lra_get_allocno_class (int regno) extern rtx lra_create_new_reg (enum machine_mode, rtx, enum reg_class, const char *); -extern void lra_init_elimination (void); extern rtx lra_eliminate_regs (rtx, enum machine_mode, rtx); extern void lra (FILE *); extern void lra_init_once (void); Index: testsuite/gcc.target/i386/pr57293.c =================================================================== --- testsuite/gcc.target/i386/pr57293.c (revision 0) +++ testsuite/gcc.target/i386/pr57293.c (working copy) @@ -0,0 +1,20 @@ +/* { dg-do compile { target { ia32 } } } */ +/* { dg-options "-O2 -fomit-frame-pointer" } */ +/* { dg-final { scan-assembler-not "%ebp" } } */ + +__attribute__((__noinline__, __noclone__, __stdcall__)) void g(int a) +{ + __builtin_printf("in g(): %d\n", a); +} + +__attribute__((__noinline__, __noclone__, __thiscall__)) void h(int a, int b) +{ + __builtin_printf("in h(): %d %d\n", a, b); +} + +void f() +{ + g(0); + h(0, 1); + __builtin_puts("in f()"); +}