On Sat, May 23, 2015 at 08:45:23AM -0500, Segher Boessenkool wrote: > Thanks. Did you see improvements around return as well, or mostly / > only related to the function start?
The rlwinm vs. rldicl change was in dwarf2out.c add_ranges_num for r3 one insn before a blr. I'm sure that was due to not combining return value copies. In fact, I'm sure all the improvements I saw were due to changing the exit.. See below. [snip] > I agree it looks like a bug waiting to happen. Please post it as a > separate patch though. OK. > > +/* Twiddle BLOCK_FOR_INSN to TO for instructions in the first block BB > > + that we don't want to combine with other instructions. */ > > + > > +static void > > +twiddle_first_block (basic_block bb, basic_block to) > > I don't like this much. Messing with global state makes things harder > to change later. If it is hard to come up with a good name for a > factor, it usually means it is not such a good factor. > > You can do these inside can_combine_{def,use}_p as far as I can see? > Probably need to give those an extra parameter then: for _def, a bool > that says "don't allow moves from hard regs", set when the scan has > encountered a NOTE_INSN_FUNCTION_BEG in this bb; and for _use, a regset > of those hard regs we don't want to allow moves to (those seen in USEs > later in that same block). Or do it in the main loop itself, _{def,use} > are each called in one place only; whatever works best. Huh, that's the way I started writing this patch.. The reason I went with modifying BLOCK_FOR_INSN is that the loop setting up LOG_LINKS needs to test that anyway. Changing code in the main loop means every insn in a function will need to be tested for additional conditions. I thought that might be slower. I can see your point though, especially if someone wanted to wean combine off LOG_LINKS. I'll rewrite the patch, which I realized was buggy anyway, and didn't prevent param copies from being combined. :-( > What makes return values different from CALL args here, btw? Why do > you not want to combine return values, but you leave alone call args? I don't think there is much difference between SETs for return values and SETs for call args. The reason I deliberately left them out is that in the original discussion we were talking about parameters and return values. So I thought I'd better restrict the change to just those SETs. It would be a much simpler patch to make combine ignore all SETs that are a reg,reg copy with one of them a hard reg. I was a little worried I'd regress some target if I tried that. (Results on powerpc64le-linux for such a change look good. A lot more cases where code is better, due to catching the parameter reg,reg copies. In looking at gcc/*.o I haven't yet seen any regressions in code quality.) > > +/* Fill in log links field for all insns that we wish to combine. */ > > Please don't change this comment; it was more correct before. But it wasn't correct! LOG_LINKS are not set up for *all* insns. Perhaps I should add "that we might wish to combine" rather than "that we wish to combine"? > > @@ -1103,7 +1192,7 @@ create_log_links (void) > > we might wind up changing the semantics of the insn, > > even if reload can make what appear to be valid > > assignments later. */ > > - if (regno < FIRST_PSEUDO_REGISTER > > + if (HARD_REGISTER_NUM_P (regno) > > && asm_noperands (PATTERN (use_insn)) >= 0) > > continue; > > An independent change. Yeah, I guess. I tidy these if I'm working on a piece of code. Here's the whole file done. Boostrapped and regression tested powerpc64le-linux and x86_64-linux. * combine.c: Use HARD_REGISTER_P and HARD_REGISTER_NUM_P as appropriate throughout file in place of comparing regno against FIRST_PSEUDO_REGISTER. Index: combine.c =================================================================== --- combine.c (revision 223463) +++ combine.c (working copy) @@ -1103,7 +1103,7 @@ create_log_links (void) we might wind up changing the semantics of the insn, even if reload can make what appear to be valid assignments later. */ - if (regno < FIRST_PSEUDO_REGISTER + if (HARD_REGISTER_NUM_P (regno) && asm_noperands (PATTERN (use_insn)) >= 0) continue; @@ -1730,7 +1730,7 @@ set_nonzero_bits_and_sign_copies (rtx x, const_rtx rtx_insn *insn = (rtx_insn *) data; if (REG_P (x) - && REGNO (x) >= FIRST_PSEUDO_REGISTER + && !HARD_REGISTER_P (x) /* If this register is undefined at the start of the file, we can't say what its contents were. */ && ! REGNO_REG_SET_P @@ -1897,7 +1897,7 @@ can_combine_p (rtx_insn *insn, rtx_insn *i3, rtx_i && (REGNO (XEXP (i3elt, 0)) == regno ? reg_set_between_p (XEXP (elt, 0), PREV_INSN (insn), i3) - : regno >= FIRST_PSEUDO_REGISTER)) + : !HARD_REGISTER_NUM_P (regno))) return 0; } while (--i >= 0); @@ -1971,7 +1971,7 @@ can_combine_p (rtx_insn *insn, rtx_insn *i3, rtx_i || (CALL_P (i3) && (find_reg_fusage (i3, USE, dest) || (REG_P (dest) - && REGNO (dest) < FIRST_PSEUDO_REGISTER + && HARD_REGISTER_P (dest) && global_regs[REGNO (dest)]))) /* Don't substitute into an incremented register. */ || FIND_REG_INC_NOTE (i3, dest) @@ -2021,7 +2021,7 @@ can_combine_p (rtx_insn *insn, rtx_insn *i3, rtx_i register. */ if (REG_P (src) - && ((REGNO (dest) < FIRST_PSEUDO_REGISTER + && ((HARD_REGISTER_P (dest) && ! HARD_REGNO_MODE_OK (REGNO (dest), GET_MODE (dest))) /* Don't extend the life of a hard register unless it is user variable (if we have few registers) or it can't @@ -2030,7 +2030,7 @@ can_combine_p (rtx_insn *insn, rtx_insn *i3, rtx_i Also avoid substituting a return register into I3, because reload can't handle a conflict with constraints of other inputs. */ - || (REGNO (src) < FIRST_PSEUDO_REGISTER + || (HARD_REGISTER_P (src) && ! HARD_REGNO_MODE_OK (REGNO (src), GET_MODE (src))))) return 0; } @@ -2052,7 +2052,7 @@ can_combine_p (rtx_insn *insn, rtx_insn *i3, rtx_i we leave it up to the machine description to either accept or reject use-and-clobber patterns. */ if (!REG_P (reg) - || REGNO (reg) >= FIRST_PSEUDO_REGISTER + || !HARD_REGISTER_P (reg) || !fixed_regs[REGNO (reg)]) if (reg_overlap_mentioned_p (reg, src)) return 0; @@ -2075,7 +2075,7 @@ can_combine_p (rtx_insn *insn, rtx_insn *i3, rtx_i to be an explicit register variable, and was chosen for a reason. */ if (GET_CODE (src) == ASM_OPERANDS - && REG_P (dest) && REGNO (dest) < FIRST_PSEUDO_REGISTER) + && REG_P (dest) && HARD_REGISTER_P (dest)) return 0; /* If INSN contains volatile references (specifically volatile MEMs), @@ -2221,7 +2221,7 @@ combinable_i3pat (rtx_insn *i3, rtx *loc, rtx i2de checks this; here, we do a more specific test for this case. */ || (REG_P (inner_dest) - && REGNO (inner_dest) < FIRST_PSEUDO_REGISTER + && HARD_REGISTER_P (inner_dest) && (! HARD_REGNO_MODE_OK (REGNO (inner_dest), GET_MODE (inner_dest)))) || (i1_not_in_src && reg_overlap_mentioned_p (i1dest, src)) @@ -2469,7 +2469,7 @@ can_change_dest_mode (rtx x, int added_sets, machi regno = REGNO (x); /* Allow hard registers if the new mode is legal, and occupies no more registers than the old mode. */ - if (regno < FIRST_PSEUDO_REGISTER) + if (HARD_REGISTER_NUM_P (regno)) return (HARD_REGNO_MODE_OK (regno, mode) && REG_NREGS (x) >= hard_regno_nregs[regno][mode]); @@ -2790,7 +2790,7 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn if (i1 == 0 && NONJUMP_INSN_P (i3) && GET_CODE (PATTERN (i3)) == SET && REG_P (SET_SRC (PATTERN (i3))) - && REGNO (SET_SRC (PATTERN (i3))) >= FIRST_PSEUDO_REGISTER + && !HARD_REGISTER_P (SET_SRC (PATTERN (i3))) && find_reg_note (i3, REG_DEAD, SET_SRC (PATTERN (i3))) && GET_CODE (PATTERN (i2)) == PARALLEL && ! side_effects_p (SET_DEST (PATTERN (i3))) @@ -3219,7 +3219,7 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn { unsigned int regno = REGNO (newpat_dest); compare_mode = new_mode; - if (regno < FIRST_PSEUDO_REGISTER) + if (HARD_REGISTER_NUM_P (regno)) newpat_dest = gen_rtx_REG (compare_mode, regno); else { @@ -3588,7 +3588,7 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn machine_mode old_mode = GET_MODE (i2dest); rtx ni2dest; - if (REGNO (i2dest) < FIRST_PSEUDO_REGISTER) + if (HARD_REGISTER_P (i2dest)) ni2dest = gen_rtx_REG (new_mode, REGNO (i2dest)); else { @@ -3604,7 +3604,7 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn m_split_insn = combine_split_insns (parallel, i3); if (m_split_insn == 0 - && REGNO (i2dest) >= FIRST_PSEUDO_REGISTER) + && !HARD_REGISTER_P (i2dest)) { struct undo *buf; @@ -3725,7 +3725,7 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn validated that we can do this. */ if (GET_MODE (i2dest) != split_mode && split_mode != VOIDmode) { - if (REGNO (i2dest) < FIRST_PSEUDO_REGISTER) + if (HARD_REGISTER_P (i2dest)) newdest = gen_rtx_REG (split_mode, REGNO (i2dest)); else { @@ -5390,7 +5390,7 @@ subst (rtx x, rtx from, rtx to, int in_dest, int i if (code == SUBREG && REG_P (to) - && REGNO (to) < FIRST_PSEUDO_REGISTER + && HARD_REGISTER_P (to) && simplify_subreg_regno (REGNO (to), GET_MODE (to), SUBREG_BYTE (x), GET_MODE (x)) < 0) @@ -6645,7 +6645,7 @@ simplify_set (rtx x) unsigned int regno = REGNO (dest); rtx new_dest; - if (regno < FIRST_PSEUDO_REGISTER) + if (HARD_REGISTER_NUM_P (regno)) new_dest = gen_rtx_REG (compare_mode, regno); else { @@ -6750,7 +6750,7 @@ simplify_set (rtx x) < GET_MODE_SIZE (GET_MODE (SUBREG_REG (src)))) #endif #ifdef CANNOT_CHANGE_MODE_CLASS - && ! (REG_P (dest) && REGNO (dest) < FIRST_PSEUDO_REGISTER + && ! (REG_P (dest) && HARD_REGISTER_P (dest) && REG_CANNOT_CHANGE_MODE_P (REGNO (dest), GET_MODE (SUBREG_REG (src)), GET_MODE (src))) @@ -9815,7 +9815,7 @@ reg_nonzero_bits_for_combine (const_rtx x, machine && rsp->last_set_label < label_tick) || (rsp->last_set_label == label_tick && DF_INSN_LUID (rsp->last_set) < subst_low_luid) - || (REGNO (x) >= FIRST_PSEUDO_REGISTER + || (!HARD_REGISTER_P (x) && REGNO (x) < reg_n_sets_max && REG_N_SETS (REGNO (x)) == 1 && !REGNO_REG_SET_P @@ -9879,7 +9879,7 @@ reg_num_sign_bit_copies_for_combine (const_rtx x, && rsp->last_set_label < label_tick) || (rsp->last_set_label == label_tick && DF_INSN_LUID (rsp->last_set) < subst_low_luid) - || (REGNO (x) >= FIRST_PSEUDO_REGISTER + || (!HARD_REGISTER_P (x) && REGNO (x) < reg_n_sets_max && REG_N_SETS (REGNO (x)) == 1 && !REGNO_REG_SET_P @@ -12921,7 +12921,7 @@ record_truncated_value (rtx x) } /* ??? For hard-regs we now record everything. We might be able to optimize this using last_set_mode. */ - else if (REG_P (x) && REGNO (x) < FIRST_PSEUDO_REGISTER) + else if (REG_P (x) && HARD_REGISTER_P (x)) truncated_mode = GET_MODE (x); else return false; @@ -13012,7 +13012,7 @@ get_last_value_validate (rtx *loc, rtx_insn *insn, if (rsp->last_set_invalid /* If this is a pseudo-register that was only set once and not live at the beginning of the function, it is always valid. */ - || (! (regno >= FIRST_PSEUDO_REGISTER + || (! (!HARD_REGISTER_NUM_P (regno) && regno < reg_n_sets_max && REG_N_SETS (regno) == 1 && (!REGNO_REG_SET_P @@ -13129,7 +13129,7 @@ get_last_value (const_rtx x) if (value == 0 || (rsp->last_set_label < label_tick_ebb_start - && (regno < FIRST_PSEUDO_REGISTER + && (HARD_REGISTER_NUM_P (regno) || regno >= reg_n_sets_max || REG_N_SETS (regno) != 1 || REGNO_REG_SET_P @@ -13257,7 +13257,7 @@ reg_dead_at_p (rtx reg, rtx_insn *insn) /* Check that reg isn't mentioned in NEWPAT_USED_REGS. For fixed registers we allow the machine description to decide whether use-and-clobber patterns are OK. */ - if (reg_dead_regno < FIRST_PSEUDO_REGISTER) + if (HARD_REGISTER_NUM_P (reg_dead_regno)) { for (i = reg_dead_regno; i < reg_dead_endregno; i++) if (!fixed_regs[i] && TEST_HARD_REG_BIT (newpat_used_regs, i)) @@ -13331,7 +13331,7 @@ mark_used_regs_combine (rtx x) regno = REGNO (x); /* A hard reg in a wide mode may really be multiple registers. If so, mark all of them just like the first. */ - if (regno < FIRST_PSEUDO_REGISTER) + if (HARD_REGISTER_NUM_P (regno)) { /* None of this applies to the stack, frame or arg pointers. */ if (regno == STACK_POINTER_REGNUM @@ -13449,7 +13449,7 @@ move_deaths (rtx x, rtx maybe_kill_insn, int from_ including X. In that case, we must put REG_DEAD notes for the remaining registers in place of NOTE. */ - if (note != 0 && regno < FIRST_PSEUDO_REGISTER + if (note != 0 && HARD_REGISTER_NUM_P (regno) && (GET_MODE_SIZE (GET_MODE (XEXP (note, 0))) > GET_MODE_SIZE (GET_MODE (x)))) { @@ -13472,7 +13472,7 @@ move_deaths (rtx x, rtx maybe_kill_insn, int from_ || (note != 0 && (GET_MODE_SIZE (GET_MODE (XEXP (note, 0))) < GET_MODE_SIZE (GET_MODE (x))))) - && regno < FIRST_PSEUDO_REGISTER + && HARD_REGISTER_NUM_P (regno) && REG_NREGS (x) > 1) { unsigned int ourend = END_REGNO (x); @@ -13588,7 +13588,7 @@ reg_bitfield_target_p (rtx x, rtx body) return 0; tregno = REGNO (target), regno = REGNO (x); - if (tregno >= FIRST_PSEUDO_REGISTER || regno >= FIRST_PSEUDO_REGISTER) + if (!HARD_REGISTER_NUM_P (tregno) || !HARD_REGISTER_NUM_P (regno)) return target == x; endtregno = end_hard_regno (GET_MODE (target), tregno); @@ -13922,7 +13922,7 @@ distribute_notes (rtx notes, rtx_insn *from_insn, TEM_INSN is doing. If so, delete TEM_INSN. Otherwise, make this into a REG_UNUSED note instead. Don't delete sets to global register vars. */ - if ((REGNO (XEXP (note, 0)) >= FIRST_PSEUDO_REGISTER + if ((!HARD_REGISTER_P (XEXP (note, 0)) || !global_regs[REGNO (XEXP (note, 0))]) && reg_set_p (XEXP (note, 0), PATTERN (tem_insn))) { -- Alan Modra Australia Development Lab, IBM