> > In the PR, the spill happens in the initial basic block of the function,
> > i.e. the one with the highest frequency.
> > 
> > Also as noted in the PR, swapping the 'unlikely' branch to 'likely' avoids
> > the spill, even though it does not affect the frequency of the initial basic
> > block, and makes the block with the use more rarely executed.
> 
> The spill is mainly decided by three insns that involve r92:
> 
> (insn 3 61 4 2 (set (reg/v:SF 92 [ x ])
>        (reg:SF 102)) "test3.c":7:1 142 {*movsf_internal}
>     (expr_list:REG_DEAD (reg:SF 102)
> 
> (insn 9 4 12 2 (set (reg:SI 89 [ _11 ])
>        (subreg:SI (reg/v:SF 92 [ x ]) 0)) "test3.c":3:36 81 {*movsi_internal}
>     (nil))
> 
> And
>
> (insn 28 27 29 5 (set (reg:DF 98)
>        (float_extend:DF (reg/v:SF 92 [ x ]))) "test3.c":11:13 163 {*extendsfdf2}
>     (expr_list:REG_DEAD (reg/v:SF 92 [ x ])
>        (nil)))
> (insn 29 28 30 5 (s
> 
> The frequency of INSN 3 and INSN 9 is not affected, but the frequency of INSN
> 28 drops from 805 to 89 after swapping "unlikely" and "likely".  Because of
> that, the GPR cost decreases a lot, which finally makes the RA choose GPR
> instead of MEM.
> 
> GENERAL_REGS:2356,2356 
> SSE_REGS:6000,6000
> MEM:4089,4089

But why are SSE_REGS costed so high? r92 is used in SFmode, so it doesn't make
sense that selecting a GPR for it looks cheaper than xmm0.
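
To make the comparison concrete, here is a minimal sketch in C of how the quoted
costs lead to the choice. It is a simplified model and not IRA's actual code:
`struct class_cost`, `cheapest_class` and `weighted_use_cost` are hypothetical
names, and treating a use's contribution as block frequency times a
per-execution cost is an assumption made for illustration. Only the three cost
values and the frequencies 805 and 89 are taken from the quoted analysis.

```c
#include <stddef.h>

/* Hypothetical, simplified model; not IRA's real data structures. */
struct class_cost {
    const char *name;
    int cost;
};

/* Costs for r92 as quoted in the dump summary. */
static const struct class_cost r92_costs[] = {
    { "GENERAL_REGS", 2356 },
    { "SSE_REGS",     6000 },
    { "MEM",          4089 },
};

/* Return the index of the cheapest entry; ties go to the earlier entry. */
static size_t cheapest_class(const struct class_cost *c, size_t n)
{
    size_t best = 0;
    for (size_t i = 1; i < n; i++)
        if (c[i].cost < c[best].cost)
            best = i;
    return best;
}

/* Assumed contribution of one use: block frequency times a
   per-execution cost (unit_cost is a made-up parameter). */
static long weighted_use_cost(int freq, int unit_cost)
{
    return (long)freq * unit_cost;
}
```

With the quoted numbers, `cheapest_class(r92_costs, 3)` picks index 0
(GENERAL_REGS), ahead of MEM and well ahead of SSE_REGS. Under the assumed
model, dropping the frequency of insn 28 from 805 to 89 lowers that use's
contribution by 716 times its per-execution cost, which is the kind of shift
that could move a class below MEM.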

> Dump of 301.ira:
> a4(r92,l0) costs: AREG:2356,2356 DREG:2356,2356 CREG:2356,2356
> BREG:2356,2356 SIREG:2356,2356 DIREG:2356,2356 AD_REGS:2356,2356
> CLOBBERED_REGS:2356,2356 Q_REGS:2356,2356 NON_Q_REGS:2356,2356
> TLS_GOTBASE_REGS:2356,2356 GENERAL_REGS:2356,2356 SSE_FIRST_REG:6000,6000
> NO_REX_SSE_REGS:6000,6000 SSE_REGS:6000,6000 MMX_REGS:19534,19534
> INT_SSE_REGS:19534,19534 ALL_REGS:214534,214534 MEM:4089,4089
> 
> And although there's no spill, there's an extra VMOVD in the later BB, which
> looks suboptimal. (I guess we can live with that since it's cold.)

I think that falls out of the wrong decision for SSE_REGS cost.

Alexander

> 
>         vmovd   %eax, %xmm2
>         vcvtss2sd       %xmm2, %xmm2, %xmm1
>         vmulsd  %xmm0, %xmm1, %xmm0
>         vcvtsd2ss       %xmm0, %xmm0, %xmm0
> > 
> > Do you have a root cause analysis that explains the above?
> > 
> > Alexander
