On Jul 26, 2005, at 12:51 AM, Paolo Bonzini wrote:
Dale Johannesen wrote:
With -march=pentium4 -mfpmath=sse -O2, we get an extra move for code like
    double d = atof(foo);
    int i = d;
        call    atof
        fstpl   -8(%ebp)
        movsd   -8(%ebp), %xmm0
        cvttsd2si       %xmm0, %eax
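
For anyone who wants to reproduce this, the fragment above can be wrapped in a small self-contained function. This is only a sketch: the function name `convert` is hypothetical and not from the original message, and the comment about `st(0)` assumes the usual 32-bit x86 calling convention.

```c
#include <stdlib.h>

/* Hypothetical wrapper around the two lines quoted above.
   `convert` is an illustrative name, not part of the original report. */
int convert(const char *foo)
{
    double d = atof(foo);   /* under the 32-bit x86 ABI the result comes back in st(0) */
    int i = d;              /* truncating double -> int conversion (cvttsd2si with SSE math) */
    return i;
}
```

Compiling this to assembly with the options mentioned above (-march=pentium4 -mfpmath=sse -O2, plus -S, and presumably -m32 on a 64-bit host) should show whether the store/reload between `call atof` and `cvttsd2si` is still present.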
(This is Linux, Darwin is similar.) I think the difficulty is that for

(set (reg/v:DF 58 [ d ]) (reg:DF 8 st)) 64 {*movdf_nointeger}

Try the attached patch. It gave a 3% speedup on -mfpmath=sse for tramp3d. Richard Henderson asked for SPEC testing; after that it may go in.

Thanks. That's progress; the cost computation in regclass now figures out that memory
is the fastest place to put R58:

  Register 58 costs: AD_REGS:87000 Q_REGS:87000 NON_Q_REGS:87000
INDEX_REGS:87000 LEGACY_REGS:87000 GENERAL_REGS:87000 FP_TOP_REG:49000
FP_SECOND_REG:50000 FLOAT_REGS:50000 SSE_REGS:50000 FP_TOP_SSE_REGS:75000
FP_SECOND_SSE_REGS:75000 FLOAT_SSE_REGS:75000 FLOAT_INT_REGS:87000
INT_SSE_REGS:91000 FLOAT_INT_SSE_REGS:91000
ALL_REGS:91000 MEM:40000
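
To make the reading of that dump concrete, here is a minimal sketch that picks the cheapest entry from a subset of the costs listed above. The struct, array, and function names are hypothetical and are not GCC internals; this only illustrates the comparison, not how regclass or local-alloc actually work.

```c
#include <stddef.h>

/* Hypothetical representation of one entry in the cost dump. */
struct class_cost {
    const char *name;
    int cost;
};

/* A subset of the Register 58 costs quoted above. */
const struct class_cost r58_costs[] = {
    { "GENERAL_REGS", 87000 },
    { "FP_TOP_REG",   49000 },
    { "FLOAT_REGS",   50000 },
    { "SSE_REGS",     50000 },
    { "ALL_REGS",     91000 },
    { "MEM",          40000 },
};

const size_t r58_count = sizeof r58_costs / sizeof r58_costs[0];

/* Return the entry with the lowest cost; the first one wins on ties.
   Assumes n >= 1. */
const struct class_cost *cheapest(const struct class_cost *costs, size_t n)
{
    const struct class_cost *best = &costs[0];
    for (size_t i = 1; i < n; i++)
        if (costs[i].cost < best->cost)
            best = &costs[i];
    return best;
}
```

Called as `cheapest(r58_costs, r58_count)`, this selects the MEM entry at 40000, ahead of FP_TOP_REG at 49000, which is the cheapest of the register classes in this subset.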

Unfortunately local-alloc insists on putting it in a register anyway (ST(0) instead of an XMM,
but the end codegen is unchanged):

;; Register 58 in 8.

I think the RA may be missing the concept that memory might be faster than any possible register;
I will dig further.
