On Wed, May 13, 2020 at 6:17 AM Uros Bizjak <ubiz...@gmail.com> wrote:
>
> On Wed, May 13, 2020 at 2:37 PM H.J. Lu <hjl.to...@gmail.com> wrote:
> >
> > On Wed, May 13, 2020 at 5:04 AM Uros Bizjak <ubiz...@gmail.com> wrote:
> > >
> > > On Wed, May 13, 2020 at 1:05 PM Uros Bizjak <ubiz...@gmail.com> wrote:
> > > >
> > > > On Tue, May 12, 2020 at 10:07 PM H.J. Lu <hjl.to...@gmail.com> wrote:
> > > > >
> > > > > Update STV pass to properly count cost of XMM register push.  In 
> > > > > 32-bit
> > > > > mode, to convert XMM register push in DImode, we do an XMM store in
> > > > > DImode, followed by 2 memory pushes in SImode, instead of 2 integer
> > > > > register pushes in SImode.  To convert XM register push in SImode, we
> > > > > do an XMM register to integer register move in SImode, followed an
> > > > > integer register push in SImode, instead of an integer register push 
> > > > > in
> > > > > SImode.  In 64-bit mode, we do an XMM register to integer register 
> > > > > move
> > > > > in SImode or DImode, followed an integer register push in SImode or
> > > > > DImode, instead of an integer register push SImode or DImode.
> > > > >
> > > > > Tested on Linux/x86 and Linux/x86-64.
> > > >
> > > > I think it is better to implement XMM register pushes, and split them
> > > > after reload to a sequence of:
> > > >
> > > > (set (reg:P SP_REG) (plus:P SP_REG) (const_int -8)))
> > > > (set (match_dup 0) (match_dup 1))
> > > >
> > > > This is definitely better than trips through memory to stack.
> > >
> > > Attached (untested patch) allows fake pushes from XMM registers, so
> > > STV pass can allow pushes.
> >
> > The problem isn't STV pass.  The IRA pass won't assign hard register for
>
> Please see the sequence before STV pass:
>
> (insn 24 23 25 3 (set (reg/v:DI 85 [ target ])
>         (mem:DI (reg/f:SI 86 [ target_p ]) [2 *target_p.0_1+0 S8
> A32])) "pr95021.c":17:7 64 {*movdi_internal}
>      (expr_list:REG_DEAD (reg/f:SI 86 [ target_p ])
>         (nil)))
>
> (insn 26 25 27 3 (set (mem:DI (reg/f:SI 87 [ c ]) [2 c.1_2->target+0 S8 A32])
>         (reg/v:DI 85 [ target ])) "pr95021.c":18:15 64 {*movdi_internal}
>      (expr_list:REG_DEAD (reg/f:SI 87 [ c ])
>         (nil)))
>
> (insn 28 27 29 3 (set (mem:DI (pre_dec:SI (reg/f:SI 7 sp)) [2  S8 A64])
>         (reg/v:DI 85 [ target ])) "pr95021.c":19:5 40 {*pushdi2}
>      (expr_list:REG_DEAD (reg/v:DI 85 [ target ])
>         (expr_list:REG_ARGS_SIZE (const_int 16 [0x10])
>             (nil))))
>
> So, (reg 85) gets moved to a memory in (insn 26) and pushed to a stack
> in (insn 28).
>
> STV pass does this:
>
> (insn 24 41 37 3 (set (subreg:V2DI (reg:DI 89) 0)
>         (subreg:V2DI (reg:DI 91) 0)) "pr95021.c":17:7 1342 {movv2di_internal}
>      (expr_list:REG_DEAD (reg/f:SI 86 [ target_p ])
>         (nil)))
>
> (insn 37 24 38 3 (set (reg:V2DI 90)
>         (subreg:V2DI (reg:DI 89) 0)) "pr95021.c":17:7 -1
>      (nil))
> (insn 38 37 39 3 (set (subreg:SI (reg/v:DI 85 [ target ]) 0)
>         (subreg:SI (reg:V2DI 90) 0)) "pr95021.c":17:7 -1
>      (nil))
> (insn 39 38 40 3 (set (reg:V2DI 90)
>         (lshiftrt:V2DI (reg:V2DI 90)
>             (const_int 32 [0x20]))) "pr95021.c":17:7 -1
>      (nil))
> (insn 40 39 25 3 (set (subreg:SI (reg/v:DI 85 [ target ]) 4)
>         (subreg:SI (reg:V2DI 90) 0)) "pr95021.c":17:7 -1
>      (nil))
>
> (insn 26 25 27 3 (set (mem:DI (reg/f:SI 87 [ c ]) [2 c.1_2->target+0 S8 A32])
>         (reg:DI 89)) "pr95021.c":18:15 64 {*movdi_internal}
>      (expr_list:REG_DEAD (reg/f:SI 87 [ c ])
>         (nil)))
>
> (insn 28 27 29 3 (set (mem:DI (pre_dec:SI (reg/f:SI 7 sp)) [2  S8 A64])
>         (reg/v:DI 85 [ target ])) "pr95021.c":19:5 40 {*pushdi2}
>      (expr_list:REG_DEAD (reg/v:DI 85 [ target ])
>         (expr_list:REG_ARGS_SIZE (const_int 16 [0x10])
>             (nil))))
>
> For some reason (reg 89) moves to (reg 85) via sequence (insn 37) to
> (insn 40), splitting to SImode and reassembling back to (reg 85).
>
> >
> > (insn 28 27 29 3 (set (mem:DI (pre_dec:SI (reg/f:SI 7 sp)) [2  S8 A64])
> >         (reg/v:DI 85 [ target ])) "x.i":19:5 40 {*pushdi2}
> >      (expr_list:REG_DEAD (reg/v:DI 85 [ target ])
> >         (expr_list:REG_ARGS_SIZE (const_int 16 [0x10])
> >             (nil))))
> >
> > and the reload pass turns into
> >
> > ....
> > (insn 28 27 29 3 (set (mem:DI (pre_dec:SI (reg/f:SI 7 sp)) [2  S8 A64])
> >         (mem/c:DI (plus:SI (reg/f:SI 7 sp)
> >                 (const_int 16 [0x10])) [8 %sfp+-8 S8 A64])) "x.i":19:5
> > 40 {*pushdi2}
> >      (expr_list:REG_ARGS_SIZE (const_int 16 [0x10])
> >         (nil)))
>
> This is an optimization of the reload pass, it figures out where the
> values live and tries its best to assign hard regs and stack slots to
> the above convoluted sequence. Note, that DImode push from integer
> regs splits to SImode pushes after reload. This has nothing with STV
> pass.
>
> The question is, why STV pass creates its funny sequence? The original
> sequence should be easily solved by storing DImode from XMM register
> and (with patched gcc) pushing DImode value from the same XMM
> register.

STV pass is mostly OK since it has to use XMM to move upper and lower
32 bits of a 64-bit integer.  The problem is that push XMM becomes very
expensive later.

-- 
H.J.

Reply via email to