Hi! This patch adds a little optimization, if %st and %st(1) are results of memory loads and we need to exchange them, it is shorter (and on older machines probably also cheaper) to swap the two loads.
This triggers 26783 in x86_64-linux and i686-linux bootstraps+regtests. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? Initially I thought I should also adjust debug insns in between i1 and i2 if they refer to i387 stack registers, but further look at regstack shows that nothing does that and that such debug insns are wrong-debug for multiple reasons. In particular, e.g. emit_swap_insn alone emits the fxch right after i1 or at the start of bb, without trying to adjust debug insns in between that and insn (we should swap %st with the other reg in those). More importantly, at least if the debug regnos actually mean what they stand in the assembly (i.e. the first one %st, the second one %st(1) etc.), then we would need to ensure that every insn that pops or pushes anything onto the i387 stack should update all DWARF expressions that refer to those registers. If that is really the case, the easiest solution for this might be either to reset all debug insns that refer to %st to %st(7) registers (easy), or replace all those registers in debug insns with debug temporaries (8 debug temporaries for each pre-regstack register) and then on i387 pushes or pops emit debug insns for those temporaries, changing how they are mapped to the actual hw registers and let var-tracking.c handle the rest. I guess I should file a PR about this. 2017-01-25 Jakub Jelinek <ja...@redhat.com> PR target/70465 * reg-stack.c (emit_swap_insn): Instead of fld a; fld b; fxchg %st(1); emit fld b; fld a; if possible. * gcc.target/i386/pr70465.c: New test. --- gcc/reg-stack.c.jj 2017-01-25 11:38:54.924320927 +0100 +++ gcc/reg-stack.c 2017-01-25 12:12:08.459045376 +0100 @@ -887,6 +887,52 @@ emit_swap_insn (rtx_insn *insn, stack_pt && REG_P (i1src) && REGNO (i1src) == FIRST_STACK_REG && find_regno_note (i1, REG_DEAD, FIRST_STACK_REG) == NULL_RTX) return; + + if (REG_P (i1dest) + && REGNO (i1dest) == FIRST_STACK_REG + && MEM_P (SET_SRC (i1set)) + && !side_effects_p (SET_SRC (i1set)) + && hard_regno == FIRST_STACK_REG + 1 + && i1 != BB_HEAD (current_block)) + { + rtx_insn *i2 = NULL; + rtx i2set; + rtx_insn *tmp = PREV_INSN (i1); + rtx_insn *limit = PREV_INSN (BB_HEAD (current_block)); + while (tmp != limit) + { + if (LABEL_P (tmp) + || CALL_P (tmp) + || NOTE_INSN_BASIC_BLOCK_P (tmp) + || (NONJUMP_INSN_P (tmp) + && stack_regs_mentioned (tmp))) + { + i2 = tmp; + break; + } + tmp = PREV_INSN (tmp); + } + if (i2 != NULL_RTX + && (i2set = single_set (i2)) != NULL_RTX) + { + /* Instead of fld a; fld b; fxch %st(1); just + use fld b; fld a; if possible. */ + rtx i2dest = *get_true_reg (&SET_DEST (i2set)); + if (REG_P (i2dest) + && REGNO (i2dest) == FIRST_STACK_REG + && MEM_P (SET_SRC (i2set)) + && !side_effects_p (SET_SRC (i2set)) + && !modified_between_p (SET_SRC (i1set), i2, i1)) + { + remove_insn (i1); + SET_PREV_INSN (i1) = NULL_RTX; + SET_NEXT_INSN (i1) = NULL_RTX; + set_block_for_insn (i1, NULL); + emit_insn_before (i1, i2); + return; + } + } + } } /* Avoid emitting the swap if this is the first register stack insn --- gcc/testsuite/gcc.target/i386/pr70465.c.jj 2017-01-25 12:24:25.183041154 +0100 +++ gcc/testsuite/gcc.target/i386/pr70465.c 2017-01-25 12:23:59.000000000 +0100 @@ -0,0 +1,12 @@ +/* PR target/70465 */ +/* { dg-do compile { target ia32 } } */ +/* { dg-options "-O2 -mfpmath=387 -fomit-frame-pointer" } */ +/* { dg-final { scan-assembler-not "fxch\t%st.1" } } */ + +double +atan2 (double y, double x) +{ + double res = 0.0; + asm ("fpatan" : "=t" (res) : "u" (y), "0" (x) : "st(1)"); + return res; +} Jakub