On Tuesday, 1 July 2014 at 17:00:30 UTC, Walter Bright wrote:
> On 7/1/2014 3:26 AM, Don wrote:
>> Yes, it's complicated. The interesting thing is that there are no 128 bit registers. The temporaries exist only while the FMA operation is in progress. You cannot even preserve them between consecutive FMA operations.
>>
>> An important consequence is that allowing intermediate calculations to be performed at higher precision than the operands is crucial, and applies outside of x86. This is something we've got right.
>>
>> But it's not possible to say that "the intermediate calculations are done at the precision of 'real'". This is the semantics which I think we currently have wrong. Our model is too simplistic.
>>
>> On modern x86, calculations on float operands may have intermediate calculations done at only 32 bits (if using straight SSE), 80 bits (if using x87), or 64 bits (if using float FMA). And for double operands, they may be 64 bits, 80 bits, or 128 bits. Yet, in the FMA case, non-FMA operations will be performed at lower precision. It's entirely possible for all three intermediate precisions to be active at the same time!
>>
>> I'm not sure that we need to change anything WRT code generation. But I think our style recommendations aren't quite right. And we have at least one missing primitive operation (discard all excess precision).
>
> What do you recommend?
It needs some thought. But some things are clear.
Definitely, discarding excess precision is a crucial operation. C and C++ tried to do it implicitly with "sequence points", but that kills optimisation possibilities so much that compilers don't respect it. I think it's actually quite similar to write barriers in multithreaded programming. C got it wrong, but we're currently in an even worse situation because it doesn't necessarily happen at all.
We need a builtin operation -- and not in std.math; this is as crucial as addition, and it's purely a signal to the optimiser. It's very similar to a casting operation. I wonder if we can do it as an attribute? .exact_float, .restrict_float, .force_float, .spill_float, or something similar?
With D's current floating-point semantics, it's actually impossible to write correct floating-point code. Everything that works right now is technically only working by accident.
But if we get this right, we can have very nice semantics for when things like FMA are allowed to happen -- essentially the optimiser would have free rein between these explicit discard_excess_precision sequence points.
After that, I'm a bit less sure. It does seem to me that we're trying to make 'real' do double duty: it means both "x87 80-bit floating-point number" and also something like a storage class specific to double -- "compiler, don't discard excess precision". Both are useful concepts, but they aren't identical. The two coincided on 32-bit x86, but they're different on x86-64. I think we need to distinguish them.
Ideally, I think we'd have a __real80 type. On 32-bit x86 this would be the same as 'real', while on x86-64 __real80 would be available but probably 'real' would alias to double. But I'm a lot less certain about this.