On 8/17/20 4:42 AM, Roger Sayle wrote:
> This patch catches a missed optimization opportunity where GCC currently
> generates worse code than LLVM.  The issue, as nicely analyzed in bugzilla,
> boils down to the following three insns in combine:
>
> (insn 6 5 7 2 (parallel [
>             (set (reg:DI 85)
>                 (ashift:DI (reg:DI 85)
>                     (const_int 32 [0x20])))
>             (clobber (reg:CC 17 flags))
>         ]) "pr92180.c":4:10 564 {*ashldi3_1}
>      (expr_list:REG_UNUSED (reg:CC 17 flags)
>         (nil)))
> (insn 7 6 14 2 (parallel [
>             (set (reg:DI 84)
>                 (ior:DI (reg:DI 84)
>                     (reg:DI 85)))
>             (clobber (reg:CC 17 flags))
>         ]) "pr92180.c":4:10 454 {*iordi_1}
>      (expr_list:REG_DEAD (reg:DI 85)
>         (expr_list:REG_UNUSED (reg:CC 17 flags)
>             (nil))))
> (insn 14 7 15 2 (set (reg/i:SI 0 ax)
>         (subreg:SI (reg:DI 84) 0)) "pr92180.c":5:1 67 {*movsi_internal}
>      (expr_list:REG_DEAD (reg:DI 84)
>         (nil)))
>
> Normally, combine/simplify-rtx would notice that insns 6 and 7
> (which update highpart bits) are unnecessary as the final insn 14
> only requires to lowpart bits.  The complication is that insn 14
> sets a hard register in targetm.class_likely_spilled_p which
> prevents combine from performing its simplifications, and removing
> the redundant instructions.
>
> At first glance a fix would appear to require changes to combine,
> potentially affecting code generation on all small register class
> targets...  An alternate (and I think clever) solution is to spot
> that this problematic situation can be avoided by the backend.
>
> At RTL expansion time, the middle-end has a clear separation between
> pseudos and hard registers, so the RTL initially contains:
>
> (insn 9 8 10 2 (set (reg:SI 86)
>         (subreg:SI (reg:DI 82 [ _1 ]) 0)) "pr92180.c":6:10 -1
>      (nil))
> (insn 10 9 14 2 (set (reg:SI 83 [ <retval> ])
>         (reg:SI 86)) "pr92180.c":6:10 -1
>      (nil))
> (insn 14 10 15 2 (set (reg/i:SI 0 ax)
>         (reg:SI 83 [ <retval> ])) "pr92180.c":7:1 -1
>      (nil))
>
> which can be optimized without problems by combine; it is only the
> intervening passes (initially fwprop1) that propagate computations
> into sets of hard registers, and disable those opportunities.
>
> The solution proposed here is to have the x86 backend/recog prevent
> early RTL passes composing instructions (that set likely_spilled hard
> registers) that they (combine) can't simplify, until after reload.
> We allow sets from pseudo registers, immediate constants and memory
> accesses, but anything more complicated is performed via a temporary
> pseudo.  Not only does this simplify things for the register allocator,
> but any remaining register-to-register moves are easily cleaned up
> by the late optimization passes after reload, such as peephole2 and
> cprop_hardreg.
>
> This patch has been tested on x86_64-pc-linux-gnu with a
> "make bootstrap" and a "make -k check" with no new failures.
> Ok for mainline?
>
>
> 2020-08-17  Roger Sayle  <ro...@nextmovesoftware.com>
>
> gcc/ChangeLog
>       PR rtl-optimization/92180
>       * config/i386/i386.c (ix86_hardreg_mov_ok): New function to
>       determine whether (set DST SRC) should be allowed at this point.
>       * config/i386/i386-protos.h (ix86_hardreg_mov_ok): Prototype here.
>       * config/i386/i386-expand.c (ix86_expand_move): Check whether
>       this is a complex set of a likely spilled hard register, and if
>       so place the value in a pseudo, and load the hard reg from it.
>       * config/i386/i386.md (*movdi_internal, *movsi_internal,
>       *movhi_internal, *movqi_internal): Make these instructions
>       conditional on ix86_hardreg_mov_ok.
>       (*lea<mode>): Make this define_insn_and_split conditional on
>       ix86_hardreg_mov_ok.
>
> gcc/testsuite/ChangeLog
>       PR rtl-optimization/92180
>       * gcc.target/i386/pr92180.c: New test.

In many way this reminds me of -fforce-mem and -fforce-addr and one
might argue that it should be implemented in the generic parts of the
compiler.   However, I'd be very hesitant to try and do that, it would
seem to be just asking for pain across a wide variety of targets.


I'll ACK and commit it momentarily.


Thanks for your patience,

Jeff



Reply via email to