On Tuesday, 5 January 2021 at 03:20:16 UTC, Walter Bright wrote:
On 1/4/2021 4:11 AM, 9il wrote:
[...]
The reason those switches are provided is because the
write/read is a performance hog.
D provides a couple functions in druntime which guarantee
rounding intermediate values to float/double precision. Those
can be used as required. This is better than a compiler switch
because having compiler switches that influence floating point
results is poor design.
> Since C99 the default x87 behavior is precise.
Not entirely:
float f(float a, float b) {
float d = (a + b) - b;
return d;
}
f:
sub esp, 4
fld DWORD PTR [esp+12]
fld st(0)
fadd DWORD PTR [esp+8]
[no write/read to memory here, so no round to float]
fsubrp st(1), st
fstp DWORD PTR [esp]
fld DWORD PTR [esp]
add esp, 4
ret
In any case, let's try your example
https://cpp.godbolt.org/z/7sa8dP with dmd for 32 bits:
push EAX
push EAX
fld float ptr 010h[ESP]
fadd float ptr 0Ch[ESP]
fstp float ptr [ESP] // there's the write
fld float ptr [ESP] // there's the read!
fsub float ptr 0Ch[ESP]
fstp float ptr 4[ESP] // the write
fld float ptr 4[ESP] // the read
add ESP,8
ret 8
It's semantically equivalent to the godbolt asm you posted.
I can't reproduce the same DMD output as you.
DMD with flags -m32 -O generates
https://cpp.godbolt.org/z/9b4e9K
assume CS:.text._D7example1fFffZf
push EBP
mov EBP,ESP
fld float ptr 0Ch[ESP]
fadd float ptr 8[EBP]
fsub float ptr 8[EBP]
pop EBP
ret 8
add [EAX],AL
add [EAX],AL
As you can see, there are no write/read opcodes.
DMD with flag -m32 generates
https://cpp.godbolt.org/z/GMGMra
assume CS:.text._D7example1fFffZf
push EBP
mov EBP,ESP
sub ESP,018h
movss XMM0,0Ch[EBP]
movss XMM1,8[EBP]
addss XMM0,XMM1
movss -8[EBP],XMM0
subss XMM0,XMM1
movss -4[EBP],XMM0
movss -018h[EBP],XMM0
fld float ptr -018h[EBP]
leave
ret 8
add [EAX],AL
It just uses SSE, which I think is a good way to go, haha. If no
one has raised this as a bug, then probably all real-world DMD
targets have at least SSE support.
The only D compiler that uses excess precision is DMD, and only if
the -O flag is passed. The same example compiled with GDC uses
write/read instructions. LDC uses SSE instructions.
As for C, it offers an intuitive built-in way to work with exact
precision: an assignment acts as a directive to round the
expression result to the precision of the declared type, unlike
in D. It doesn't cover all cases, but it is an intuitive and very
easy way to do things the right way.
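
To make the point about assignment concrete, here is a minimal C sketch. The function names (sub_rounded, sub_single_expr, demo) are made up for illustration. It assumes a compiler that follows the C99 excess-precision rules (with GCC on x87 that means -std=c99 or -fexcess-precision=standard) and that -ffast-math is off. FLT_EVAL_METHOD from <float.h> reports how intermediates are evaluated: 0 means float expressions are evaluated in float (typical for SSE), 2 means long double (typical for x87).

```c
#include <assert.h>
#include <float.h>
#include <stdio.h>

/* Each intermediate is assigned to a float. Under the C99 rules an
   assignment discards excess precision, so a + b is rounded to float
   before the subtraction happens. */
float sub_rounded(float a, float b) {
    float sum = a + b;   /* rounded to float here */
    float d = sum - b;
    return d;
}

/* Same arithmetic as one expression. When FLT_EVAL_METHOD is not 0,
   the intermediate a + b may be kept in a wider format. */
float sub_single_expr(float a, float b) {
    float d = (a + b) - b;
    return d;
}

/* Hypothetical driver: 2^24 + 1 is not representable as a float, so
   the rounded sum equals b and the rounded result is 0. */
void demo(void) {
    float a = 1.0f;
    float b = 16777216.0f; /* 2^24 */

    printf("FLT_EVAL_METHOD = %d\n", (int)FLT_EVAL_METHOD);
    printf("assignment per step: %g\n", sub_rounded(a, b));
    printf("single expression:   %g\n", sub_single_expr(a, b));

    /* Holds only under the conforming-compiler assumption above. */
    assert(sub_rounded(a, b) == 0.0f);
}
```

With SSE code generation both functions should return 0. With x87 excess precision the single-expression version can return 1, while the version that assigns each step stays at 0 as long as the compiler honors the assignment rule.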