On Monday, 30 June 2014 at 16:54:17 UTC, Walter Bright wrote:
On 6/30/2014 12:20 AM, Don wrote:
What I think is highly likely is that it will only have legacy support, with such awful performance that it never makes sense to use them. For example, the speed of 80-bit and 64-bit calculations in x87 used to be identical. But on recent Intel CPUs, the 80-bit operations run at half the speed of the 64 bit
operations. They are already partially microcoded.

For me, a stronger argument is that you can get *higher* precision using doubles, in many cases. The reason is that FMA gives you an intermediate value with 128 bits of precision; it's available in SIMD but not on x87.

So, if we want to use the highest precision supported by the hardware, that does
*not* mean we should always use 80 bits.

I've experienced this in CTFE, where the calculations are currently done in 80 bits, I've seen cases where the 64-bit runtime results were more accurate, because of those 128 bit FMA temporaries. 80 bits are not enough!!

I did not know this. It certainly adds another layer of nuance - as the higher level of precision will only apply as long as one can keep the value in a register.

Yes, it's complicated. The interesting thing is that there are no 128 bit registers. The temporaries exist only while the FMA operation is in progress. You cannot even preserve them between consecutive FMA operations.

An important consequence is that allowing intermediate calculations to be performed at higher precision than the operands, is crucial, and applies outside of x86. This is something we've got right.

But it's not possible to say that "the intermediate calculations are done at the precision of 'real'". This is the semantics which I think we currently have wrong. Our model is too simplistic.

On modern x86, calculations on float operands may have intermediate calculations done at only 32 bits (if using straight SSE), 80 bits (if using x87), or 64 bits (if using float FMA). And for double operands, they may be 64 bits, 80 bits, or 128 bits. Yet, in the FMA case, non-FMA operations will be performed at lower precision. It's entirely possible for all three intermediate precisions to be active at the same time!

I'm not sure that we need to change anything WRT code generation. But I think our style recommendations aren't quite right. And we have at least one missing primitive operation (discard all excess precision).


Reply via email to