On Tuesday, 5 January 2021 at 03:20:16 UTC, Walter Bright wrote:
On 1/4/2021 4:11 AM, 9il wrote:
[...]
Those switches are provided because the write/read to memory is a performance hog.

D provides a couple of functions in druntime that guarantee rounding of intermediate values to float/double precision. They can be used where required. This is better than a compiler switch, because compiler switches that influence floating-point results are poor design.
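
To illustrate what an explicit rounding point looks like, here is a minimal C sketch. It is not the druntime functions themselves: the helper name round_to_float is hypothetical, and it assumes that a store to a volatile float forces the write/read that strips any excess precision.

```c
/* Hypothetical helper (not a library function): the volatile store forces the
   value out to a 32-bit memory slot, so it is rounded to float precision even
   when the expression was evaluated in a wider x87 register. */
float round_to_float(float x) {
    volatile float t = x;   /* the write */
    return t;               /* the read */
}

/* Same computation as f(a, b) in this thread, with an explicit rounding
   point after each operation. */
float f_rounded(float a, float b) {
    float sum = round_to_float(a + b);
    return round_to_float(sum - b);
}
```

With a = 1.0f and b = 1e8f the rounded version returns 0, because 100000001 is not representable as a float and the sum rounds to 1e8. An evaluation that keeps the sum in an x87 register could return 1 instead.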

> Since C99 the default x87 behavior is precise.

Not entirely:

 float f(float a, float b) {
    float d = (a + b) - b;
    return d;
 }

 f:
        sub     esp, 4
        fld     DWORD PTR [esp+12]
        fld     st(0)
        fadd    DWORD PTR [esp+8]
        [no write/read to memory here, so no round to float]
        fsubrp  st(1), st
        fstp    DWORD PTR [esp]
        fld     DWORD PTR [esp]
        add     esp, 4
        ret

In any case, let's try your example https://cpp.godbolt.org/z/7sa8dP with dmd for 32 bits:

                push    EAX
                push    EAX
                fld     float ptr 010h[ESP]
                fadd    float ptr 0Ch[ESP]
                fstp    float ptr [ESP]     // there's the write
                fld     float ptr [ESP]     // there's the read!
                fsub    float ptr 0Ch[ESP]
                fstp    float ptr 4[ESP]    // the write
                fld     float ptr 4[ESP]    // the read
                add     ESP,8
                ret     8

It's semantically equivalent to the godbolt asm you posted.

I can't reproduce the DMD output you posted.

DMD with flags -m32 -O generates

https://cpp.godbolt.org/z/9b4e9K
        assume  CS:.text._D7example1fFffZf
                push    EBP
                mov     EBP,ESP
                fld     float ptr 0Ch[ESP]
                fadd    float ptr 8[EBP]
                fsub    float ptr 8[EBP]
                pop     EBP
                ret     8
                add     [EAX],AL
                add     [EAX],AL

As you can see, there are no write/read opcodes.

DMD with flag -m32 generates

https://cpp.godbolt.org/z/GMGMra
        assume  CS:.text._D7example1fFffZf
                push    EBP
                mov     EBP,ESP
                sub     ESP,018h
                movss   XMM0,0Ch[EBP]
                movss   XMM1,8[EBP]
                addss   XMM0,XMM1
                movss   -8[EBP],XMM0
                subss   XMM0,XMM1
                movss   -4[EBP],XMM0
                movss   -018h[EBP],XMM0
                fld     float ptr -018h[EBP]
                leave
                ret     8
                add     [EAX],AL

It just uses SSE, which I think is a good way to go, haha. If no one has reported this as a bug, then probably all real-world DMD targets have at least SSE support.

The only D compiler that uses excess precision is DMD, and only if the -O flag is passed. The same example compiled with GDC uses write/read instructions. LDC uses SSE instructions.

As for C, unlike D, it offers an intuitive built-in way to get exact precision: an assignment acts as a directive to round the expression result to the declared type. It doesn't cover all cases, but it is an intuitive and very easy way to do things the right way.
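
A small C sketch of that idea, under the assumption of a compiler running in a conforming C99 mode (some compilers keep excess precision across assignments in their default mode). The function names are made up for illustration. FLT_EVAL_METHOD from <float.h> reports whether float expressions may be evaluated in a wider format.

```c
#include <float.h>

/* Returns 1 when the implementation reports that float expressions may be
   evaluated with more precision than float, or that the evaluation method
   is indeterminable (FLT_EVAL_METHOD != 0). */
int may_use_excess_precision(void) {
    return FLT_EVAL_METHOD != 0;
}

/* Each assignment to a float object is meant to be a rounding point:
   a conforming compiler must drop excess precision when storing into
   sum and d. */
float f_assign(float a, float b) {
    float sum = a + b;   /* rounded to float here */
    float d = sum - b;   /* and again here */
    return d;
}
```

On a target where FLT_EVAL_METHOD is 0 (for example SSE code generation), f_assign(1.0f, 1e8f) returns 0 regardless, since every operation is already carried out in float.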
