On Mon, May 30, 2022 at 2:22 PM Alexander Monakov via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> > > In the PR, the spill happens in the initial basic block of the function, 
> > > i.e.
> > > the one with the highest frequency.
> > >
> > > Also as noted in the PR, swapping the 'unlikely' branch to 'likely' 
> > > avoids the spill,
> > > even though it does not affect the frequency of the initial basic block, 
> > > and
> > > makes the block with the use more rarely executed.
> >
> > The spill is mainly decided by 3 insns related to r92
> >
> > (insn 3 61 4 2 (set (reg/v:SF 92 [ x ])
> >         (reg:SF 102)) "test3.c":7:1 142 {*movsf_internal}
> >      (expr_list:REG_DEAD (reg:SF 102)
> >
> > (insn 9 4 12 2 (set (reg:SI 89 [ _11 ])
> >         (subreg:SI (reg/v:SF 92 [ x ]) 0)) "test3.c":3:36 81 {*movsi_internal}
> >      (nil))
> >
> > And
> > (insn 28 27 29 5 (set (reg:DF 98)
> >         (float_extend:DF (reg/v:SF 92 [ x ]))) "test3.c":11:13 163 {*extendsfdf2}
> >      (expr_list:REG_DEAD (reg/v:SF 92 [ x ])
> >         (nil)))
> > (insn 29 28 30 5 (s
> >
> > The frequency for insn 3 and insn 9 is not affected, but the frequency of
> > insn 28 drops from 805 to 89 after swapping "unlikely" and "likely".
> > Because of that, the GPR cost decreases a lot, which finally makes the RA
> > choose a GPR instead of MEM.
> >
> > GENERAL_REGS:2356,2356
> > SSE_REGS:6000,6000
> > MEM:4089,4089
>
> But why are SSE_REGS costed so high? r92 is used in SFmode, it doesn't make
> sense that selecting a GPR for it looks cheaper than xmm0.
For insn 3 and insn 28, the SSE_REGS cost is zero.
But insn 9 is an SImode move, and we have disparaged the non-GPR
alternatives in the movsi_internal pattern, which finally makes the
SSE_REGS cost 6 * 1000 (1000 is the block frequency; 6 is the move cost
between SSE_REGS and GPR, sse_to_integer/integer_to_sse).
The value of sse_to_integer/integer_to_sse was set to 6 (3 times the GPR
move cost) to avoid too many spills between GPRs and xmm registers; we
once set sse_to_integer/integer_to_sse to 2, which generated many movd
instructions but finally caused a frequency drop and regressed
performance.

Another choice is to change the movsi_internal alternative from *v to
?v (just like what we did for the GPR alternatives in *movsf_internal);
that separate change would fix the PR and also eliminate the extra
movd, but it may regress cases elsewhere.
Any thoughts on changing *v to ?v, @Uros Bizjak?
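For context, the difference between the two modifiers (a schematic
fragment for illustration only, not the real *movsi_internal pattern
from i386.md): `*` hides the following constraint character from
register-class preferencing entirely, so IRA never credits the SSE
alternative when computing class costs, while `?` keeps the alternative
visible but disparages it by one unit of extra cost:

```lisp
;; Schematic only -- not the actual i386.md pattern.
;; "*v": the 'v' (SSE regs) alternative is invisible to register
;;       preferencing, so IRA never favors SSE_REGS for the pseudo.
;; "?v": the alternative stays visible but counts as one unit more
;;       costly, a much milder disparagement.
(define_insn "*movsi_sketch"
  [(set (match_operand:SI 0 "nonimmediate_operand" "=r,?v")
        (match_operand:SI 1 "general_operand"       "r,?v"))]
  ""
  "...")
```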
>
> > Dump of 301.ira:
> > 67  a4(r92,l0) costs: AREG:2356,2356 DREG:2356,2356 CREG:2356,2356 
> > BREG:2356,2356 SIREG:2356,2356 DIREG:2356,2356 AD_REGS:2356,2356 
> > CLOBBERED_REGS:2356,2356 Q_REGS:2356,2356 NON_Q_REGS:2356,2356 
> > TLS_GOTBASE_REGS:2356,2356 GENERAL_REGS:2356,2356 SSE_FIRST_REG:6000,6000 
> > NO_REX_SSE_REGS:6000,6000 SSE_REGS:6000,6000 \
> >    MMX_REGS:19534,19534 INT_SSE_REGS:19534,19534 ALL_REGS:214534,214534 
> > MEM:4089,4089
> >
> > And although there's no spill, there's an extra VMOVD in the later BB,
> > which looks suboptimal (guess we can live with that since it's cold).
>
> I think that falls out of the wrong decision for SSE_REGS cost.
>
> Alexander
>
> >
> >         vmovd   %eax, %xmm2
> >         vcvtss2sd       %xmm2, %xmm2, %xmm1
> >         vmulsd  %xmm0, %xmm1, %xmm0
> >         vcvtsd2ss       %xmm0, %xmm0, %xmm0
> > >
> > > Do you have a root cause analysis that explains the above?
> > >
> > > Alexander



-- 
BR,
Hongtao
