On Wed, May 13, 2020 at 3:25 PM H.J. Lu <hjl.to...@gmail.com> wrote: > > On Wed, May 13, 2020 at 6:17 AM Uros Bizjak <ubiz...@gmail.com> wrote: > > > > On Wed, May 13, 2020 at 2:37 PM H.J. Lu <hjl.to...@gmail.com> wrote: > > > > > > On Wed, May 13, 2020 at 5:04 AM Uros Bizjak <ubiz...@gmail.com> wrote: > > > > > > > > On Wed, May 13, 2020 at 1:05 PM Uros Bizjak <ubiz...@gmail.com> wrote: > > > > > > > > > > On Tue, May 12, 2020 at 10:07 PM H.J. Lu <hjl.to...@gmail.com> wrote: > > > > > > > > > > > > Update STV pass to properly count cost of XMM register push. In > > > > > > 32-bit > > > > > > mode, to convert XMM register push in DImode, we do an XMM store in > > > > > > DImode, followed by 2 memory pushes in SImode, instead of 2 integer > > > > > > register pushes in SImode. To convert XM register push in SImode, > > > > > > we > > > > > > do an XMM register to integer register move in SImode, followed an > > > > > > integer register push in SImode, instead of an integer register > > > > > > push in > > > > > > SImode. In 64-bit mode, we do an XMM register to integer register > > > > > > move > > > > > > in SImode or DImode, followed an integer register push in SImode or > > > > > > DImode, instead of an integer register push SImode or DImode. > > > > > > > > > > > > Tested on Linux/x86 and Linux/x86-64. > > > > > > > > > > I think it is better to implement XMM register pushes, and split them > > > > > after reload to a sequence of: > > > > > > > > > > (set (reg:P SP_REG) (plus:P SP_REG) (const_int -8))) > > > > > (set (match_dup 0) (match_dup 1)) > > > > > > > > > > This is definitely better than trips through memory to stack. > > > > > > > > Attached (untested patch) allows fake pushes from XMM registers, so > > > > STV pass can allow pushes. > > > > > > The problem isn't STV pass. The IRA pass won't assign hard register for > > > > Please see the sequence before STV pass: > > > > (insn 24 23 25 3 (set (reg/v:DI 85 [ target ]) > > (mem:DI (reg/f:SI 86 [ target_p ]) [2 *target_p.0_1+0 S8 > > A32])) "pr95021.c":17:7 64 {*movdi_internal} > > (expr_list:REG_DEAD (reg/f:SI 86 [ target_p ]) > > (nil))) > > > > (insn 26 25 27 3 (set (mem:DI (reg/f:SI 87 [ c ]) [2 c.1_2->target+0 S8 > > A32]) > > (reg/v:DI 85 [ target ])) "pr95021.c":18:15 64 {*movdi_internal} > > (expr_list:REG_DEAD (reg/f:SI 87 [ c ]) > > (nil))) > > > > (insn 28 27 29 3 (set (mem:DI (pre_dec:SI (reg/f:SI 7 sp)) [2 S8 A64]) > > (reg/v:DI 85 [ target ])) "pr95021.c":19:5 40 {*pushdi2} > > (expr_list:REG_DEAD (reg/v:DI 85 [ target ]) > > (expr_list:REG_ARGS_SIZE (const_int 16 [0x10]) > > (nil)))) > > > > So, (reg 85) gets moved to a memory in (insn 26) and pushed to a stack > > in (insn 28). > > > > STV pass does this: > > > > (insn 24 41 37 3 (set (subreg:V2DI (reg:DI 89) 0) > > (subreg:V2DI (reg:DI 91) 0)) "pr95021.c":17:7 1342 > > {movv2di_internal} > > (expr_list:REG_DEAD (reg/f:SI 86 [ target_p ]) > > (nil))) > > > > (insn 37 24 38 3 (set (reg:V2DI 90) > > (subreg:V2DI (reg:DI 89) 0)) "pr95021.c":17:7 -1 > > (nil)) > > (insn 38 37 39 3 (set (subreg:SI (reg/v:DI 85 [ target ]) 0) > > (subreg:SI (reg:V2DI 90) 0)) "pr95021.c":17:7 -1 > > (nil)) > > (insn 39 38 40 3 (set (reg:V2DI 90) > > (lshiftrt:V2DI (reg:V2DI 90) > > (const_int 32 [0x20]))) "pr95021.c":17:7 -1 > > (nil)) > > (insn 40 39 25 3 (set (subreg:SI (reg/v:DI 85 [ target ]) 4) > > (subreg:SI (reg:V2DI 90) 0)) "pr95021.c":17:7 -1 > > (nil)) > > > > (insn 26 25 27 3 (set (mem:DI (reg/f:SI 87 [ c ]) [2 c.1_2->target+0 S8 > > A32]) > > (reg:DI 89)) "pr95021.c":18:15 64 {*movdi_internal} > > (expr_list:REG_DEAD (reg/f:SI 87 [ c ]) > > (nil))) > > > > (insn 28 27 29 3 (set (mem:DI (pre_dec:SI (reg/f:SI 7 sp)) [2 S8 A64]) > > (reg/v:DI 85 [ target ])) "pr95021.c":19:5 40 {*pushdi2} > > (expr_list:REG_DEAD (reg/v:DI 85 [ target ]) > > (expr_list:REG_ARGS_SIZE (const_int 16 [0x10]) > > (nil)))) > > > > For some reason (reg 89) moves to (reg 85) via sequence (insn 37) to > > (insn 40), splitting to SImode and reassembling back to (reg 85). > > > > > > > > (insn 28 27 29 3 (set (mem:DI (pre_dec:SI (reg/f:SI 7 sp)) [2 S8 A64]) > > > (reg/v:DI 85 [ target ])) "x.i":19:5 40 {*pushdi2} > > > (expr_list:REG_DEAD (reg/v:DI 85 [ target ]) > > > (expr_list:REG_ARGS_SIZE (const_int 16 [0x10]) > > > (nil)))) > > > > > > and the reload pass turns into > > > > > > .... > > > (insn 28 27 29 3 (set (mem:DI (pre_dec:SI (reg/f:SI 7 sp)) [2 S8 A64]) > > > (mem/c:DI (plus:SI (reg/f:SI 7 sp) > > > (const_int 16 [0x10])) [8 %sfp+-8 S8 A64])) "x.i":19:5 > > > 40 {*pushdi2} > > > (expr_list:REG_ARGS_SIZE (const_int 16 [0x10]) > > > (nil))) > > > > This is an optimization of the reload pass, it figures out where the > > values live and tries its best to assign hard regs and stack slots to > > the above convoluted sequence. Note, that DImode push from integer > > regs splits to SImode pushes after reload. This has nothing with STV > > pass. > > > > The question is, why STV pass creates its funny sequence? The original > > sequence should be easily solved by storing DImode from XMM register > > and (with patched gcc) pushing DImode value from the same XMM > > register. > > STV pass is mostly OK since it has to use XMM to move upper and lower > 32 bits of a 64-bit integer. The problem is that push XMM becomes very > expensive later.
As shown in the patch, XMM push should be just decrement of SP reg and move to the created stack slot. Uros.