On Jul 26, 2005, at 12:51 AM, Paolo Bonzini wrote:
Dale Johannesen wrote:
With -march=pentium4 -mfpmath=sse -O2, we get an extra move for code like
double d = atof(foo);
int i = d;
call atof
fstpl -8(%ebp)
movsd -8(%ebp), %xmm0
cvttsd2si %xmm0, %eax
(This is Linux; Darwin is similar.) I think the difficulty is that for
(set (reg/v:DF 58 [ d ]) (reg:DF 8 st)) 64 {*movdf_nointeger}
Try the attached patch. It gave a 3% speedup with -mfpmath=sse on
tramp3d. Richard Henderson asked for SPEC testing; after that it may go in.
Thanks. That's progress; the cost computation in regclass now figures
out that memory is the fastest place to put R58:
Register 58 costs: AD_REGS:87000 Q_REGS:87000 NON_Q_REGS:87000
INDEX_REGS:87000 LEGACY_REGS:87000 GENERAL_REGS:87000
FP_TOP_REG:49000 FP_SECOND_REG:50000 FLOAT_REGS:50000 SSE_REGS:50000
FP_TOP_SSE_REGS:75000 FP_SECOND_SSE_REGS:75000 FLOAT_SSE_REGS:75000
FLOAT_INT_REGS:87000 INT_SSE_REGS:91000 FLOAT_INT_SSE_REGS:91000
ALL_REGS:91000 MEM:40000
Unfortunately, local-alloc insists on putting it in a register anyway
(ST(0) instead of an XMM register, but the resulting code is unchanged):
;; Register 58 in 8.
I think the RA may be missing the concept that memory might be faster
than any possible register... Will dig further.