On Tuesday, 1 July 2014 at 17:00:30 UTC, Walter Bright wrote:
> On 7/1/2014 3:26 AM, Don wrote:
>> Yes, it's complicated. The interesting thing is that there are no 128 bit registers. The temporaries exist only while the FMA operation is in progress. You cannot even preserve them between consecutive FMA operations.
>>
>> An important consequence is that allowing intermediate calculations to be performed at higher precision than the operands is crucial, and applies outside of x86. This is something we've got right.
>>
>> But it's not possible to say that "the intermediate calculations are done at the precision of 'real'". This is the semantics which I think we currently have wrong. Our model is too simplistic.
>>
>> On modern x86, calculations on float operands may have intermediate calculations done at only 32 bits (if using straight SSE), 80 bits (if using x87), or 64 bits (if using float FMA). And for double operands, they may be 64 bits, 80 bits, or 128 bits. Yet, in the FMA case, non-FMA operations will be performed at lower precision. It's entirely possible for all three intermediate precisions to be active at the same time!
>>
>> I'm not sure that we need to change anything WRT code generation. But I think our style recommendations aren't quite right. And we have at least one missing primitive operation (discard all excess precision).
>
> What do you recommend?
It needs some thought. But some things are clear.
Definitely, discarding excess precision is a crucial operation. C and C++ tried to do it implicitly with "sequence points", but that kills optimisation possibilities so much that compilers don't respect it. I think it's actually quite similar to write barriers in multithreaded programming. C got it wrong, but we're currently in an even worse situation because it doesn't necessarily happen at all.
We need a builtin operation -- and not in std.math; this is as crucial as addition, and it's purely a signal to the optimiser. It's very similar to a casting operation. I wonder if we can do it as an attribute? .exact_float, .restrict_float, .force_float, .spill_float, or something similar?
With D's current floating-point semantics, it's actually impossible to write correct floating-point code. Everything that works right now is technically only working by accident.
But if we get this right, we can have very nice semantics for when things like FMA are allowed to happen -- essentially the optimiser would have free rein between these explicit discard_excess_precision sequence points.
After that, I'm a bit less sure. It does seem to me that we're trying to make 'real' do double duty: it means both "x87 80-bit floating-point number" and also something like a storage class specific to double -- "compiler, don't discard excess precision". Both are useful concepts, but they aren't identical. The two coincided on 32-bit x86, but they're different on x86-64. I think we need to distinguish them.
Ideally, I think we'd have a __real80 type. On 32-bit x86 this would be the same as 'real', while on x86-64 __real80 would be available but probably 'real' would alias to double. But I'm a lot less certain about this.