On Tuesday, 1 July 2014 at 17:00:30 UTC, Walter Bright wrote:
On 7/1/2014 3:26 AM, Don wrote:
Yes, it's complicated. The interesting thing is that there are no 128-bit registers. The temporaries exist only while the FMA operation is in progress. You cannot even preserve them between consecutive FMA operations.

An important consequence: allowing intermediate calculations to be performed at higher precision than the operands is crucial, and it applies outside of x86 as well. This is something we've got right.

But it's not possible to say that "the intermediate calculations are done at the precision of 'real'". That's the semantics I think we currently have wrong. Our model is too simplistic.

On modern x86, calculations on float operands may have their intermediates computed at only 32 bits (if using straight SSE), 80 bits (if using x87), or 64 bits (if using float FMA). For double operands, the corresponding widths are 64, 80, or 128 bits. Yet even in the FMA case, the non-FMA operations are still performed at the lower precision. It's entirely possible for all three intermediate precisions to be active at the same time!

I'm not sure that we need to change anything WRT code generation. But I think our style recommendations aren't quite right. And we have at least one missing
primitive operation (discard all excess precision).

What do you recommend?

It needs some thought. But some things are clear.

Definitely, discarding excess precision is a crucial operation. C and C++ tried to do it implicitly via "sequence points", but that kills so many optimisation possibilities that compilers don't respect it. I think it's actually quite similar to write barriers in multithreaded programming. C got it wrong, but we're currently in an even worse situation, because in D the discarding doesn't necessarily happen at all.
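
To make the hazard concrete, here's a minimal sketch (whether it actually misbehaves depends entirely on the codegen and optimisation level -- which is exactly the problem):

import std.stdio;

// Under x87 codegen, f() may return its result with 80-bit excess
// precision. Whether a call site sees it rounded to 64 bits depends on
// whether the optimiser happens to spill the value to memory, so the
// comparison below can fail even though both sides are textually identical.
double f(double x) { return x * x; }

void main()
{
    double x = 1.0 + 0x1p-30;  // x*x needs more than 53 bits
    double stored = f(x);      // spilled to memory: rounded to 64 bits
    writeln(stored == f(x));   // rhs may still sit in an 80-bit register
                               // -- can print false under x87 codegen
}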

We need a builtin operation -- and not in std.math: this is as crucial as addition, and it's purely a signal to the optimiser. It's very similar to a casting operation. I wonder if we could do it as an attribute? .exact_float, .restrict_float, .force_float, .spill_float, or something similar?
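
To show the intent, here's a sketch of how it might be used, with forceFloat as a placeholder function standing in for the proposed primitive (today a plain function like this gives no guarantee at all -- which is precisely the gap):

// forceFloat is hypothetical: a stand-in for the proposed builtin. As a
// plain function, the optimiser is free to see through it; the real
// primitive would be an opaque "round to exactly this precision" barrier.
double forceFloat(double x) { return x; }

// Kahan compensated summation -- an algorithm whose correctness depends
// on each step rounding exactly once, at double precision.
double kahanSum(const double[] xs)
{
    double sum = 0.0, carry = 0.0;
    foreach (x; xs)
    {
        double y = forceFloat(x - carry);
        double t = forceFloat(sum + y);
        carry = forceFloat(t - sum) - y;  // recovers the rounding error of t
        sum = t;
    }
    return sum;
}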

With D's current floating-point semantics, it's actually impossible to write correct floating-point code. Everything that works right now is technically only working by accident.

But if we get this right, we can have very nice semantics for when things like FMA are allowed to happen -- essentially, the optimiser would have free rein between these explicit discard_excess_precision sequence points.
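
To see what's at stake with FMA specifically, here's a small demonstration (assuming std.math.fma actually lowers to a fused operation or a wide intermediate, which is implementation-dependent):

import std.stdio;
import std.math : fma;

void main()
{
    // a*a needs more than 53 bits, so the two-step version rounds the
    // product before subtracting; a fused multiply-add rounds only once.
    double a = 1.0 + 0x1p-27;
    double twoStep = a * a - 1.0;      // product rounded, then subtraction
    double fused = fma(a, a, -1.0);    // single rounding at the end
    writeln(fused - twoStep);          // 2^-54 if the intermediate is wide
                                       // enough, 0 if it isn't
}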



After that, I'm a bit less sure. It does seem to me that we're trying to make 'real' do double duty: it means both "x87 80-bit floating-point number" and something like a storage class, specific to double, that says "compiler, don't discard excess precision". Both are useful concepts, but they aren't identical. The two happened to coincide on 32-bit x86, but they're different on x86-64. I think we need to distinguish them.
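
Today, the only way to express that second concept is to route double computations through real, conflating the storage intent with the x87 format. For instance:

// 'real' used purely as a "keep excess precision" storage class for a
// double computation. On 32-bit x86 this buys an 80-bit accumulator; on
// a platform where real aliases double it buys nothing -- the two
// meanings of 'real' come apart.
double dot(const double[] a, const double[] b)
{
    real acc = 0.0;
    foreach (i; 0 .. a.length)
        acc += cast(real) a[i] * b[i];  // product and sum at real precision
    return cast(double) acc;
}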

Ideally, I think we'd have a __real80 type. On 32-bit x86 this would be the same as 'real', while on x86-64 __real80 would still be available but 'real' would probably alias to double. But I'm a lot less certain about this.
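
Concretely (a hypothetical sketch -- __real80 doesn't exist today, so this doesn't compile):

__real80 a = 1.0L;  // always the 80-bit x87 format, wherever x87 exists
real b = 1.0L;      // stays 80-bit on 32-bit x86, but would alias
                    // double on x86-64 under this proposal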
