Hi Evandro,

> I have however encountered precision issues with DF, namely some benchmarks 
> in the SPECfp CPU2000 suite would fail to validate. 

Accuracy is not the issue; the computation is extremely accurate. The issue is 
that your patch doesn't support sqrt(0.0): frsqrte returns infinity for a zero 
input, and multiplying that by the zero input gives NaN rather than zero. That 
causes the miscompares you're seeing, so support for the zero case should be 
added.

The following expansion would be better: it supports zero and has lower 
latency than the current sequence:

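    fcmp    s0, #0.0        // input is zero?
    b.eq    zero            // yes: s0 already holds the result
    frsqrte s1, s0          // x0 ~= 1/sqrt(d)
    fmul    s2, s1, s1      // x0 * x0
    frsqrts s2, s0, s2      // (3 - d * x0^2) / 2
    fmul    s1, s1, s2      // x1 = x0 * step
    fmul    s2, s1, s1      // x1 * x1
    fmul    s1, s0, s1      // d * x1
    frsqrts s2, s0, s2      // (3 - d * x1^2) / 2
    fmul    s0, s1, s2      // sqrt(d) ~= d * x1 * step
zero: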

For the vector variant you can't branch per lane, so you can't avoid the extra 
latency of an AND to mask the zero lanes, but it should not be slower than it 
is today.

Cheers,
Wilco





