On Monday, 30 June 2014 at 16:54:17 UTC, Walter Bright wrote:
> On 6/30/2014 12:20 AM, Don wrote:
>> What I think is highly likely is that it will only have legacy support,
>> with such awful performance that it never makes sense to use them. For
>> example, the speed of 80-bit and 64-bit calculations in x87 used to be
>> identical. But on recent Intel CPUs, the 80-bit operations run at half
>> the speed of the 64-bit operations. They are already partially
>> microcoded.
>>
>> For me, a stronger argument is that you can get *higher* precision
>> using doubles, in many cases. The reason is that FMA gives you an
>> intermediate value with 128 bits of precision; it's available in SIMD
>> but not on x87.
>>
>> So, if we want to use the highest precision supported by the hardware,
>> that does *not* mean we should always use 80 bits.
>>
>> I've experienced this in CTFE, where the calculations are currently
>> done in 80 bits: I've seen cases where the 64-bit runtime results were
>> more accurate, because of those 128-bit FMA temporaries. 80 bits are
>> not enough!!
>
> I did not know this. It certainly adds another layer of nuance, as the
> higher level of precision will only apply as long as one can keep the
> value in a register.
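To make the x87-versus-FMA point in the quoted text concrete, here is a small Python sketch that emulates three ways of computing a*b - 1 with exact rational arithmetic (`fractions.Fraction`) instead of real hardware: the product rounded to 53 bits (double), the product rounded to 64 bits (the x87 extended significand), and a fused multiply-add that rounds only once. The helper names `round_to_bits` and `fma_emulated` are hypothetical, and exponent range and subnormals are ignored. The exact product of two doubles needs at most 106 significand bits, which is why an FMA can hold it without loss. With a = 1 + 2**-52 and b = 1 - 2**-52 the exact product is 1 - 2**-104.

```python
from fractions import Fraction


def round_to_bits(x: Fraction, p: int) -> Fraction:
    """Round x to the nearest value with a p-bit significand, ties to even.

    Exponent range and subnormals are ignored: only significand width is modelled.
    """
    if x == 0:
        return Fraction(0)
    sign = -1 if x < 0 else 1
    x = abs(x)
    # Initial guess for e so that x / 2**e has about p integer bits.
    e = x.numerator.bit_length() - x.denominator.bit_length() - p
    scaled = x / Fraction(2) ** e
    # Normalise so that 2**(p-1) <= scaled < 2**p.
    while scaled >= 2 ** p:
        e += 1
        scaled /= 2
    while scaled < 2 ** (p - 1):
        e -= 1
        scaled *= 2
    # round() on a Fraction returns an int and rounds halves to even.
    return sign * round(scaled) * Fraction(2) ** e


def fma_emulated(a: float, b: float, c: float) -> float:
    """a*b + c computed exactly, then rounded once to double (like a hardware FMA)."""
    # Converting a Fraction to float is correctly rounded in CPython.
    return float(Fraction(a) * Fraction(b) + Fraction(c))


a = 1.0 + 2.0 ** -52
b = 1.0 - 2.0 ** -52
product = Fraction(a) * Fraction(b)  # exactly 1 - 2**-104

exact = product - 1                          # -2**-104
as_double = round_to_bits(product, 53) - 1   # product rounded to 53 bits, then minus 1
as_x87 = round_to_bits(product, 64) - 1      # product rounded to 64 bits (x87 extended)
as_fma = fma_emulated(a, b, -1.0)            # single rounding of the exact a*b - 1

print(float(exact), float(as_double), float(as_x87), as_fma)
```

Both rounded products collapse to exactly 1, so the 53-bit and 64-bit paths return 0, while the emulated FMA returns the exact answer, -2**-104.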
Yes, it's complicated. The interesting thing is that there are no
128-bit registers. The temporaries exist only while the FMA
operation is in progress. You cannot even preserve them between
consecutive FMA operations.

An important consequence is that allowing intermediate
calculations to be performed at higher precision than the
operands is crucial, and applies outside of x86. This is
something we've got right.

But it's not possible to say that "the intermediate calculations
are done at the precision of 'real'". This is the semantics that
I think we currently have wrong. Our model is too simplistic.
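Because the wide temporary vanishes at the end of each FMA, an expression such as a*b + c*d evaluated as one FMA plus one ordinary multiply depends on which product gets fused. The Python sketch below emulates FMA exactly with `fractions.Fraction` (`fma_emulated` is a hypothetical helper, not a library call) and picks c*d = -(a*b), so the exact answer is 0.

```python
from fractions import Fraction


def fma_emulated(a: float, b: float, c: float) -> float:
    """a*b + c rounded once to double, modelling a hardware FMA."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


a = 1.0 + 2.0 ** -52
b = 1.0 - 2.0 ** -52
c, d = -a, b  # c*d is exactly -(a*b), so a*b + c*d == 0

# Fuse a*b; c*d is rounded to double first, so its wide temporary is lost.
left = fma_emulated(a, b, c * d)
# Fuse c*d instead; now it is a*b that gets rounded first.
right = fma_emulated(c, d, a * b)
# Reference: the exact value of a*b + c*d.
exact = Fraction(a) * Fraction(b) + Fraction(c) * Fraction(d)

print(left, right, float(exact))
```

The two orderings give -2**-104 and +2**-104 respectively, while the exact result is 0: neither answer is "computed at the precision of real", since each mixes one unrounded and one rounded product.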
On modern x86, float operands may have their intermediate
calculations done at only 32 bits (if using straight SSE),
80 bits (if using x87), or 64 bits (if using float FMA).
And for double operands, the intermediates may be 64 bits (SSE),
80 bits (x87), or 128 bits (FMA).

Yet in the FMA case, non-FMA operations will be performed at
lower precision. It's entirely possible for all three intermediate
precisions to be active at the same time!
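A minimal sketch of how the intermediate precision alone changes the answer for float operands: the Python code below evaluates a*b - c with every intermediate rounded to p significand bits (24 for straight SSE float, 53 for the significand of a 64-bit double, 64 for x87 extended), then stores the result back to a 24-bit float. It is an emulation with `fractions.Fraction`, not a model of any particular compiler; `round_to_bits` and `eval_with_intermediate` are hypothetical helpers, and exponent range and subnormals are ignored.

```python
from fractions import Fraction


def round_to_bits(x: Fraction, p: int) -> Fraction:
    """Round x to the nearest value with a p-bit significand, ties to even."""
    if x == 0:
        return Fraction(0)
    sign = -1 if x < 0 else 1
    x = abs(x)
    e = x.numerator.bit_length() - x.denominator.bit_length() - p
    scaled = x / Fraction(2) ** e
    # Normalise so that 2**(p-1) <= scaled < 2**p.
    while scaled >= 2 ** p:
        e += 1
        scaled /= 2
    while scaled < 2 ** (p - 1):
        e -= 1
        scaled *= 2
    return sign * round(scaled) * Fraction(2) ** e


FLOAT_SIGNIFICAND_BITS = 24  # IEEE single precision


def eval_with_intermediate(a: Fraction, b: Fraction, c: Fraction, p: int) -> Fraction:
    """Evaluate a*b - c with each operation rounded to p bits, then store as float."""
    prod = round_to_bits(a * b, p)
    diff = round_to_bits(prod - c, p)
    return round_to_bits(diff, FLOAT_SIGNIFICAND_BITS)


# Float operands, all exactly representable in 32 bits.
a = 1 + Fraction(1, 2 ** 23)
b = 1 - Fraction(1, 2 ** 23)
c = Fraction(1)

# 24: straight SSE float; 53: double-width intermediate; 64: x87 extended.
results = {p: eval_with_intermediate(a, b, c, p) for p in (24, 53, 64)}
for p, r in results.items():
    print(f"{p}-bit intermediates: {float(r)!r}")
```

With 24-bit intermediates the product rounds to exactly 1 and the stored result is 0; with 53- or 64-bit intermediates the product 1 - 2**-46 survives and the stored float is -2**-46. The same source expression therefore has two possible answers on one machine.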
I'm not sure that we need to change anything WRT code generation.
But I think our style recommendations aren't quite right. And we
have at least one missing primitive operation (discard all excess
precision).
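As an illustration of what a "discard all excess precision" primitive would do, here is a Python sketch in which a float product is held at double width (standing in for a wider register) and then rounded to its declared 24-bit precision. `discard_excess` is a hypothetical name, not an existing D or Phobos function; it relies on `struct` packing to IEEE single, which rounds to nearest. Because the exact product of two floats needs at most 48 significand bits, the double-width product is exact here, so discarding the excess gives the same value a straight float multiply would.

```python
import struct


def discard_excess(x: float) -> float:
    """Hypothetical primitive: round x to IEEE single precision (24-bit significand),
    dropping any extra bits a wider intermediate may have kept."""
    return struct.unpack("f", struct.pack("f", x))[0]


# Two float operands, both exactly representable in 32 bits.
a = 1.0 + 2.0 ** -23
b = 1.0 - 2.0 ** -23

wide = a * b                    # held at double width: exactly 1 - 2**-46
narrow = discard_excess(a * b)  # rounded to float, as straight SSE float would give

print(wide == 1.0)    # False: the excess precision is still present
print(narrow == 1.0)  # True: after discarding it, the result matches float semantics
```

The design point is that the primitive names an explicit rounding step, so code can say where excess precision must end instead of depending on whether the compiler happens to spill a register.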