Worse code generation for FPU on versions after 6

law at redhat dot com Tue, 11 Dec 2018 21:49:51 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79593


--- Comment #21 from Jeffrey A. Law <law at redhat dot com> ---


We have this after IRA:

(insn 27 26 28 4 (set (reg:DI 101 [ pretmp_22 ])
        (zero_extend:DI (subreg:SI (reg:SF 91 [ pretmp_22 ]) 0))) "j.C":20:35 114 {*zero_extendsidi2}
     (expr_list:REG_DEAD (reg:SF 91 [ pretmp_22 ])
        (nil)))
(insn 28 27 29 4 (set (reg:XF 100)
        (float:XF (reg:DI 101 [ pretmp_22 ]))) "j.C":20:35 169 {floatdixf2}
     (expr_list:REG_DEAD (reg:DI 101 [ pretmp_22 ])
        (nil)))

Pseudos 91 and 101 will get assigned to memory locations because of the 'm'
constraint in floatdixf2, while r100 gets a hard register.  With both operands
of insn 27 in memory, we're going to need a reload for it.  So after LRA we
have:

(insn 100 26 27 4 (set (reg:SI 0 ax [110])
        (mem/c:SI (reg/f:SI 7 sp) [6 %sfp+-8 S4 A64])) "j.C":20:35 67 {*movsi_internal}
     (nil))
(insn 27 100 28 4 (set (mem/c:DI (reg/f:SI 7 sp) [6 %sfp+-8 S8 A64])
        (zero_extend:DI (reg:SI 0 ax [110]))) "j.C":20:35 114 {*zero_extendsidi2}
     (nil))

[Insn 28 doesn't really play a role here other than requiring the 'm'
operand.]

The x86 backend has a splitter to optimize insn 27, so post-LRA splitting
generates:

(insn 100 26 107 4 (set (reg:SI 0 ax [110])
        (mem/c:SI (reg/f:SI 7 sp) [6 %sfp+-8 S4 A64])) "j.C":20:35 67 {*movsi_internal}
     (nil))
(insn 107 100 108 4 (set (mem/c:SI (reg/f:SI 7 sp) [6 %sfp+-8 S4 A64])
        (reg:SI 0 ax [110])) "j.C":20:35 67 {*movsi_internal}
     (nil))
(insn 108 107 28 4 (set (mem/c:SI (plus:SI (reg/f:SI 7 sp)
                (const_int 4 [0x4])) [6 %sfp+-4 S4 A32])
        (const_int 0 [0])) "j.C":20:35 67 {*movsi_internal}
     (nil))
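To see why the split is valid, here is a small Python model of the stack slot's
bytes (an illustration only, not GCC code; the function names are hypothetical).
On a little-endian target such as x86, one 8-byte store of a zero-extended
32-bit value writes the same bytes as a 4-byte store of the value at %sfp+-8
followed by a 4-byte store of zero at %sfp+-4.

```python
import struct

def store_zext(x: int) -> bytes:
    """One 8-byte store of the zero-extended value, like insn 27 before splitting."""
    # '<' forces little-endian layout, as on x86.
    return struct.pack("<Q", x & 0xFFFFFFFF)

def store_split(x: int) -> bytes:
    """The split form: low word gets x (insn 107), high word gets 0 (insn 108)."""
    low = struct.pack("<I", x & 0xFFFFFFFF)  # %sfp+-8, 4 bytes
    high = struct.pack("<I", 0)              # %sfp+-4, 4 bytes
    return low + high

x = 0xDEADBEEF
print(store_zext(x) == store_split(x))  # True
```

The model says nothing about which form is cheaper; it only shows the two
instruction sequences leave identical bytes in the slot.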


Now we've finally exposed the redundancy: insn 107 stores ax back into the same
stack slot (%sfp+-8) that insn 100 just loaded it from.  This can be addressed
in DSE2, which runs after SPLIT2, but that isn't very effective in general.  I
figure we're getting ~8 hits per stage during a bootstrap, all in the runtime
system.
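The kind of check a pass would need can be sketched on a toy, straight-line
instruction list.  This is not how GCC's DSE is implemented; the mini-IR and
the `redundant_stores` helper are hypothetical, and the sketch assumes
word-sized slots that never partially overlap or alias.  It remembers which
register is known to hold each slot's contents and flags a store that writes
that same register back.

```python
from typing import Dict, List, Tuple

# Hypothetical mini-IR, one tuple per insn:
#   ("load",  reg, addr)  -- reg = mem[addr]
#   ("store", addr, src)  -- mem[addr] = src, src is a register name or an int
Insn = Tuple[str, object, object]

def redundant_stores(insns: List[Insn]) -> List[int]:
    """Return indices of stores that write back a value memory already holds."""
    holds: Dict[object, str] = {}  # addr -> register known to equal mem[addr]
    found: List[int] = []
    for i, (op, a, b) in enumerate(insns):
        if op == "load":
            reg, addr = a, b
            # reg gets a new value: forget every slot that was known to equal it.
            holds = {k: v for k, v in holds.items() if v != reg}
            holds[addr] = reg
        elif op == "store":
            addr, src = a, b
            if isinstance(src, str) and holds.get(addr) == src:
                found.append(i)          # mem[addr] already equals src
            elif isinstance(src, str):
                holds[addr] = src        # slot now mirrors src
            else:
                holds.pop(addr, None)    # constant store: no register known equal
    return found

# Insns 100, 107 and 108 after SPLIT2, with sp-relative offsets as addresses.
after_split = [
    ("load", "ax", "sp+0"),   # insn 100
    ("store", "sp+0", "ax"),  # insn 107: writes back what was just loaded
    ("store", "sp+4", 0),     # insn 108
]
print(redundant_stores(after_split))  # [1]
```

A real pass also has to handle partially overlapping accesses (the original
DImode store covers both SImode halves), which this sketch deliberately ignores.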


I looked at trying to detect the partial dead store in postreload-gcse.  There
are a lot of good memory-tracking bits in there, but it's still not a good fit.

It doesn't really feel like an IRA/LRA problem to me.  Their decisions are sane
AFAICT.

We could try to catch it with a new peephole pattern, but that seems even less
desirable than detecting this in a generic way during DSE.

[Bug rtl-optimization/79593] [7/8/9 Regression] Poor/Worse code generation for FPU on versions after 6
