https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63525

            Bug ID: 63525
           Summary: unnecessary reloads generated in loop
           Product: gcc
           Version: 5.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: wmi at google dot com
                CC: vmakarov at gcc dot gnu.org

Created attachment 33700
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33700&action=edit
testcase 1.cxx

For the attached testcase 1.cxx, trunk (r214579) generates an addpd with a
memory operand plus one extra reload insn in the kernel loop. g++ before
r204274 generates fewer insns in the kernel loop.
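
The attachment is not reproduced in this message. As a rough guess at the
shape of code that matches the dumps below (an SSE2 accumulator updated in a
loop, then both lanes stored to the globals x and y through a 16-byte
temporary), something like the following would fit. The function name
accumulate, the zero initial value and the loop bounds are assumptions made
for illustration, not taken from 1.cxx.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics (x86 only)

double x, y;  // the two globals written after the loop (insns 24 and 25)

// Hypothetical stand-in for the attached testcase, not the real 1.cxx.
void accumulate(int n) {
    __m128d v = _mm_setzero_pd();  // assumed initial value
    for (int i = 0; i < n; ++i) {
        // Kernel loop: int -> double, broadcast to both lanes, packed add
        // (cvtsi2sd, unpcklpd and addpd in the assembly above).
        v = _mm_add_pd(v, _mm_set1_pd(static_cast<double>(i)));
    }
    // 16-byte temporary, read back one double at a time; this is the
    // pattern that shows up as subreg:DF at offsets 0 and 8.
    alignas(16) double tmp[2];
    _mm_store_pd(tmp, v);
    x = tmp[0];
    y = tmp[1];
}
```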

~/workarea/gcc-r214579/build/install/bin/g++ -O2 -S 1.cxx -o 1.s
kernel loop:
.L3:
       pxor    %xmm0, %xmm0
       cvtsi2sd        %eax, %xmm0
       addl    $1, %eax
       cmpl    %edx, %eax
       unpcklpd        %xmm0, %xmm0
       addpd   -24(%rsp), %xmm0             ===> mem operand used
       movaps  %xmm0, -24(%rsp)           ===> reload
       jne     .L3

~/workarea/gcc-r199418/build/install/bin/g++ -O2 -S 1.cxx -o 2.s
kernel loop:
.L3:
       xorpd   %xmm1, %xmm1
       cvtsi2sd        %eax, %xmm1
       addl    $1, %eax
       unpcklpd        %xmm1, %xmm1
       addpd   %xmm1, %xmm0
       cmpl    %edx, %eax
       jne     .L3


The reload insns in trunk are generated through the following steps:

With r204274, the IR after expand looks like this:
Loop:
...
(insn 15 14 16 5 (set (reg/v:V2DF 83 [ v ])
       (plus:V2DF (reg/v:V2DF 83 [ v ])
           (reg:V2DF 92 [ D.5005 ]))) 1.cxx:14 -1
    (nil))
...
end Loop.
(insn 23 22 24 7 (set (reg/v:TI 90 [ tmp ])
       (subreg:TI (reg/v:V2DF 83 [ v ]) 0))
/usr/local/google/home/wmi/workarea/gcc-r212442/build/install/lib/gcc/x86_64-unknown-linux-gnu/4.10.0/include/emmintrin.h:157
-1
    (nil))
(insn 24 23 25 7 (set (mem/c:DF (symbol_ref:DI ("x") [flags 0x2]  <var_decl
0x7ffff5c6d5a0 x>) [2 x+0 S8 A64])
       (subreg:DF (reg/v:TI 90 [ tmp ]) 0)) 1.cxx:17 -1
    (nil))
(insn 25 24 0 7 (set (mem/c:DF (symbol_ref:DI ("y") [flags 0x2]  <var_decl
0x7ffff5c6d630 y>) [2 y+0 S8 A64])
       (subreg:DF (reg/v:TI 90 [ tmp ]) 8)) 1.cxx:18 -1
    (nil))

Forward propagation will propagate the definition of reg 90 in insn 23 into
insn 24 and insn 25, removing the subreg:TI, so the IR before IRA looks like
this:

Loop:
...
(insn 15 14 16 4 (set (reg/v:V2DF 83 [ v ])
       (plus:V2DF (reg/v:V2DF 83 [ v ])
           (reg:V2DF 92 [ D.5005 ]))) 1.cxx:14 1263 {*addv2df3}
    (expr_list:REG_DEAD (reg:V2DF 92 [ D.5005 ])
       (nil)))
...
end Loop.
(insn 24 22 25 5 (set (mem/c:DF (symbol_ref:DI ("x") [flags 0x2]  <var_decl
0x7ffff5c6d5a0 x>) [2 x+0 S8 A64])
       (subreg:DF (reg/v:V2DF 83 [ v ]) 0)) 1.cxx:17 128 {*movdf_internal}
    (nil))
(insn 25 24 0 5 (set (mem/c:DF (symbol_ref:DI ("y") [flags 0x2]  <var_decl
0x7ffff5c6d630 y>) [2 y+0 S8 A64])
       (subreg:DF (reg/v:V2DF 83 [ v ]) 8)) 1.cxx:18 128 {*movdf_internal}
    (expr_list:REG_DEAD (reg/v:V2DF 83 [ v ])
       (nil)))

ix86_cannot_change_mode_class doesn't allow a subreg such as
"(subreg:DF (reg/v:V2DF 83 [ v ]) 8)" in insn 25, so reg 83 is added to
invalid_mode_changes by record_subregs_of_mode and is allocated the NO_REGS
regclass.

reg 83 has the NO_REGS regclass, while plus:V2DF in insn 15 requires its
target operand to be an xmm register, so reload insns are needed. The kernel
loop has low register pressure and doesn't form a separate IRA region, so live
range splitting on the region border doesn't kick in here.

Without r204274, the IR after expand looks like this:
Loop:
...
(insn 15 14 16 5 (set (reg/v:V2DF 61 [ v ])
       (plus:V2DF (reg/v:V2DF 61 [ v ])
           (reg:V2DF 68 [ D.4966 ]))) 1.cxx:14 -1
    (nil))
...
End Loop.
(insn 25 24 26 7 (set (subreg:V2DF (reg/v:TI 66 [ tmp ]) 0)
       (reg/v:V2DF 61 [ v ]))
/usr/local/google/home/wmi/workarea/gcc-r199418/build/install/lib/gcc/x86_64-unknown-linux-gnu/4.9.0/include/emmintrin.h:147
-1
    (nil))
(insn 26 25 27 7 (set (mem/c:DF (symbol_ref:DI ("x") [flags 0x2]  <var_decl
0x7ffff5e80be0 x>) [2 x+0 S8 A64])
       (subreg:DF (reg/v:TI 66 [ tmp ]) 0)) 1.cxx:17 -1
    (nil))
(insn 27 26 0 7 (set (mem/c:DF (symbol_ref:DI ("y") [flags 0x2]  <var_decl
0x7ffff5e80c78 y>) [2 y+0 S8 A64])
       (subreg:DF (reg/v:TI 66 [ tmp ]) 8)) 1.cxx:18 -1
    (nil))

Because the subreg is on the left-hand side of insn 25, forward propagation
cannot merge insn 25 into insn 26 and insn 27. reg 61 therefore never gets a
reference like "(subreg:DF (reg/v:V2DF 61 [ v ]) 8)", so it gets the SSE
regclass and no extra reload insns are introduced in the kernel loop.
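
To make the difference between the two cases concrete, here is a toy model of
the decision described above. It is not GCC code: the types, the function
names and the rejection rule (a narrowing subreg at a nonzero byte offset) are
simplifying assumptions standing in for ix86_cannot_change_mode_class,
record_subregs_of_mode and invalid_mode_changes, and it only considers the SSE
class.

```cpp
#include <set>
#include <vector>

enum class RegClass { NO_REGS, SSE_REGS };

// One subreg reference to a pseudo: (subreg:<outer> (reg:<inner> N) offset).
struct SubregUse {
    int pseudo;       // pseudo register number
    int inner_bytes;  // size of the pseudo's own mode (V2DF and TI are 16)
    int outer_bytes;  // size of the subreg's mode (DF is 8)
    int offset;       // byte offset of the subreg
};

// Assumed simplification of the target hook: returns true when the mode
// change is NOT allowed for a value living in an SSE register.
bool toy_cannot_change_mode(const SubregUse& u) {
    return u.outer_bytes < u.inner_bytes && u.offset != 0;
}

// Toy counterpart of collecting pseudos with invalid mode changes.
std::set<int> collect_invalid_mode_changes(const std::vector<SubregUse>& uses) {
    std::set<int> invalid;
    for (const SubregUse& u : uses) {
        if (toy_cannot_change_mode(u)) {
            invalid.insert(u.pseudo);
        }
    }
    return invalid;
}

// A pseudo recorded as having an invalid mode change ends up with NO_REGS;
// any other pseudo is free to use the SSE class in this model.
RegClass choose_class(int pseudo, const std::set<int>& invalid) {
    return invalid.count(pseudo) ? RegClass::NO_REGS : RegClass::SSE_REGS;
}
```

Feeding it the references from the dumps: with r204274, reg 83 is used as
{83, 16, 8, 0} and {83, 16, 8, 8}, so it is recorded and gets NO_REGS. Without
r204274 the subregs are taken of reg 66 instead, reg 61 is never recorded, and
it keeps the SSE class.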

r204274 merely enables more forward propagation, which exposes the problem
here.
