On Mon, May 30, 2022 at 2:22 PM Alexander Monakov via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> > > In the PR, the spill happens in the initial basic block of the
> > > function, i.e. the one with the highest frequency.
> > >
> > > Also as noted in the PR, swapping the 'unlikely' branch to 'likely'
> > > avoids the spill, even though it does not affect the frequency of the
> > > initial basic block, and makes the block with the use more rarely
> > > executed.
> >
> > The spill is mainly decided by 3 insns related to r92:
> >
> > (insn 3 61 4 2 (set (reg/v:SF 92 [ x ])
> >         (reg:SF 102)) "test3.c":7:1 142 {*movsf_internal}
> >      (expr_list:REG_DEAD (reg:SF 102)
> >
> > (insn 9 4 12 2 (set (reg:SI 89 [ _11 ])
> >         (subreg:SI (reg/v:SF 92 [ x ]) 0)) "test3.c":3:36 81 {*movsi_internal}
> >      (nil))
> >
> > and
> >
> > (insn 28 27 29 5 (set (reg:DF 98)
> >         (float_extend:DF (reg/v:SF 92 [ x ]))) "test3.c":11:13 163 {*extendsfdf2}
> >      (expr_list:REG_DEAD (reg/v:SF 92 [ x ])
> >         (nil)))
> > (insn 29 28 30 5 (s
> >
> > The frequencies of insn 3 and insn 9 are not affected, but the frequency
> > of insn 28 drops from 805 to 89 after swapping "unlikely" and "likely".
> > Because of that, the GPR cost decreases a lot, which finally makes the
> > RA choose a GPR instead of MEM:
> >
> > GENERAL_REGS:2356,2356
> > SSE_REGS:6000,6000
> > MEM:4089,4089
>
> But why are SSE_REGS costed so high? r92 is used in SFmode, it doesn't make
> sense that selecting a GPR for it looks cheaper than xmm0.

For insn 3 and insn 28, SSE_REGS costs zero. But insn 9 is a SImode move, and
we have disparaged the non-GPR alternatives in the movsi_internal pattern,
which finally makes SSE_REGS cost 6 * 1000 (1000 is the frequency, 6 is the
move cost between SSE_REGS and GPR, i.e. sse_to_integer/integer_to_sse).
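To make the conflicting uses of r92 concrete, here is a hypothetical C
reduction in the spirit of the dumps above (the actual test3.c from the PR is
not reproduced here; the function name and constants are illustrative): x is
read both through its bit pattern (a SImode view, which favors a GPR) and as a
float (which favors an SSE register), pulling the allocator in two directions.

```c
#include <string.h>

/* Hypothetical reduction, not the PR's actual testcase.  */
float f (float x, double y)
{
  unsigned int bits;
  memcpy (&bits, &x, sizeof bits);       /* becomes subreg:SI (reg:SF x), like insn 9 */
  if (__builtin_expect (bits & 0x80000000u, 0))  /* the "unlikely" branch */
    return (float) (x * y);              /* float_extend:DF of x, like insn 28 */
  return x;
}
```

With the branch cold, the hot path only touches the SImode view, so the GPR
class looks attractive for x despite its SFmode uses elsewhere.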
The value of sse_to_integer/integer_to_sse is set to 6 (3 times the GPR move
cost) to avoid too many spills between GPRs and xmm registers. We once set
sse_to_integer/integer_to_sse to 2, which generated many movd instructions
and ultimately caused a frequency drop and regressed performance.
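For reference on the two constraint modifiers discussed here: in GCC
machine-description constraint strings, `*` tells register preferencing to
ignore that alternative entirely when computing class costs, while `?` only
slightly disparages it, so IRA can still pick the class when it is otherwise
cheaper. Schematically (a hypothetical fragment, not the actual i386.md
pattern, with predicates and most alternatives elided):

```lisp
;; Schematic only.  With "*v", IRA ignores the SSE alternative when
;; costing SImode moves; with "?v", the alternative is merely
;; disparaged but still contributes to the SSE_REGS cost estimate.
(define_insn "*movsi_internal"
  [(set (match_operand:SI 0 "nonimmediate_operand" "=r,*v")
        (match_operand:SI 1 "general_operand"       "r,*v"))]
  ...)
```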
Another choice is to change the movsi_internal alternatives from *v to ?v
(just like what we did for the GPR alternatives in *movsf_internal). That
separate change can fix the PR and also eliminate the extra movd, but it may
regress cases elsewhere. Any thoughts on changing *v to ?v, @Uros Bizjak?

> > Dump of 301.ira:
> > a4(r92,l0) costs: AREG:2356,2356 DREG:2356,2356 CREG:2356,2356
> >     BREG:2356,2356 SIREG:2356,2356 DIREG:2356,2356 AD_REGS:2356,2356
> >     CLOBBERED_REGS:2356,2356 Q_REGS:2356,2356 NON_Q_REGS:2356,2356
> >     TLS_GOTBASE_REGS:2356,2356 GENERAL_REGS:2356,2356
> >     SSE_FIRST_REG:6000,6000 NO_REX_SSE_REGS:6000,6000 SSE_REGS:6000,6000
> >     MMX_REGS:19534,19534 INT_SSE_REGS:19534,19534 ALL_REGS:214534,214534
> >     MEM:4089,4089
> >
> > And although there's no spill, there's an extra vmovd in a later BB,
> > which looks suboptimal. (Guess we can live with that since it's cold.)
>
> I think that falls out of the wrong decision for the SSE_REGS cost.
>
> Alexander
>
> >         vmovd     %eax, %xmm2
> >         vcvtss2sd %xmm2, %xmm2, %xmm1
> >         vmulsd    %xmm0, %xmm1, %xmm0
> >         vcvtsd2ss %xmm0, %xmm0, %xmm0
> >
> > > Do you have a root cause analysis that explains the above?
> > >
> > > Alexander

--
BR,
Hongtao