Re: std.math performance (SSE vs. real)

Don via Digitalmars-d Tue, 01 Jul 2014 03:31:31 -0700

On Monday, 30 June 2014 at 16:54:17 UTC, Walter Bright wrote:

On 6/30/2014 12:20 AM, Don wrote:
What I think is highly likely is that it will only have legacy support, with such awful performance that it never makes sense to use them. For example, the speed of 80-bit and 64-bit calculations in x87 used to be identical. But on recent Intel CPUs, the 80-bit operations run at half the speed of the 64-bit operations. They are already partially microcoded.
For me, a stronger argument is that you can get *higher* precision using doubles, in many cases. The reason is that FMA gives you an intermediate value with 128 bits of precision; it's available in SIMD but not on x87.
So, if we want to use the highest precision supported by the hardware, that does *not* mean we should always use 80 bits.
I've experienced this in CTFE, where the calculations are currently done in 80 bits. I've seen cases where the 64-bit runtime results were more accurate, because of those 128-bit FMA temporaries. 80 bits are not enough!!
I did not know this. It certainly adds another layer of nuance - as the higher level of precision will only apply as long as one can keep the value in a register.

Yes, it's complicated. The interesting thing is that there are no 128-bit registers. The temporaries exist only while the FMA operation is in progress. You cannot even preserve them between consecutive FMA operations.

An important consequence is that allowing intermediate calculations to be performed at higher precision than the operands is crucial, and applies outside of x86. This is something we've got right.

But it's not possible to say that "the intermediate calculations are done at the precision of 'real'". This is the semantics which I think we currently have wrong. Our model is too simplistic.

On modern x86, calculations on float operands may have intermediate calculations done at only 32 bits (if using straight SSE), 80 bits (if using x87), or 64 bits (if using float FMA). And for double operands, they may be 64 bits, 80 bits, or 128 bits. Yet, in the FMA case, non-FMA operations will be performed at lower precision. It's entirely possible for all three intermediate precisions to be active at the same time!

I'm not sure that we need to change anything WRT code generation. But I think our style recommendations aren't quite right. And we have at least one missing primitive operation (discard all excess precision).
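One way such a "discard all excess precision" primitive could look, sketched in C99. This is an assumption for illustration only: `discard_excess` and `rounded_product_minus` are hypothetical names, not existing D or C library functions, and the sketch relies on the common technique of forcing a store through a `volatile` variable so the value must take the declared 64-bit format.

```c
/* Hypothetical helper: round x to the declared precision of double.
   The volatile store cannot be elided, so any extra bits carried in a
   wider register are dropped when the value is written out. */
double discard_excess(double x)
{
    volatile double stored = x;
    return stored;
}

/* Example use: make sure a*b is a true 64-bit double before it is
   combined with c, so the multiply and subtract cannot be fused. */
double rounded_product_minus(double a, double b, double c)
{
    return discard_excess(a * b) - c;
}
```

With a = 1 + 2^-27, b = 1 - 2^-27 and c = 1, `rounded_product_minus` returns 0.0 on an IEEE 754 target, regardless of whether the surrounding code would otherwise have used x87 or FMA temporaries.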
