On 2 July 2014 09:53, Don via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
> On Tuesday, 1 July 2014 at 17:00:30 UTC, Walter Bright wrote:
>>
>> On 7/1/2014 3:26 AM, Don wrote:
>>>
>>> Yes, it's complicated. The interesting thing is that there are no
>>> 128-bit registers. The temporaries exist only while the FMA operation
>>> is in progress. You cannot even preserve them between consecutive FMA
>>> operations.
>>>
>>> An important consequence is that allowing intermediate calculations to
>>> be performed at higher precision than the operands is crucial, and it
>>> applies outside of x86. This is something we've got right.
>>>
>>> But it's not possible to say that "the intermediate calculations are
>>> done at the precision of 'real'". This is the semantics which I think
>>> we currently have wrong. Our model is too simplistic.
>>>
>>> On modern x86, calculations on float operands may have intermediate
>>> calculations done at only 32 bits (if using straight SSE), 80 bits (if
>>> using x87), or 64 bits (if using float FMA). And for double operands,
>>> they may be 64 bits, 80 bits, or 128 bits. Yet, in the FMA case,
>>> non-FMA operations will be performed at lower precision. It's entirely
>>> possible for all three intermediate precisions to be active at the
>>> same time!
>>>
>>> I'm not sure that we need to change anything WRT code generation. But
>>> I think our style recommendations aren't quite right. And we have at
>>> least one missing primitive operation (discard all excess precision).
>>
>>
>> What do you recommend?
>
>
> It needs some thought. But some things are clear.
>
> Definitely, discarding excess precision is a crucial operation. C and C++
> tried to do it implicitly with "sequence points", but that kills
> optimisation possibilities so much that compilers don't respect it. I
> think it's actually quite similar to write barriers in multithreaded
> programming. C got it wrong, but we're currently in an even worse
> situation because it doesn't necessarily happen at all.
>
> We need a builtin operation -- and not in std.math; this is as crucial as
> addition, and it's purely a signal to the optimiser. It's very similar to
> a casting operation. I wonder if we can do it as an attribute?
> .exact_float, .restrict_float, .force_float, .spill_float or something
> similar?
>
> With D's current floating-point semantics, it's actually impossible to
> write correct floating-point code. Everything that works right now is
> technically only working by accident.
>
> But if we get this right, we can have very nice semantics for when things
> like FMA are allowed to happen -- essentially the optimiser would have
> free rein between these explicit discard_excess_precision sequence
> points.
>
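A rough approximation of that primitive exists today in the form of an
opaque call boundary. A minimal sketch, assuming a compiler that honours
pragma(inline, false) -- discardExcess is a hypothetical name here, not a
proposed spelling:

---
// Passing the value as a 64-bit double argument is what rounds away any
// x87 excess precision (via a stack slot on x86-32, an XMM register on
// x86-64); the non-inlined call keeps the optimiser from eliding it.
// LTO or cross-module inlining can still see through this, which is
// exactly why it needs to be a builtin the optimiser knows about.
pragma(inline, false)
double discardExcess(double x)
{
    return x;
}
---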
Fixing this is the goal, I assume. :)

---
import std.stdio;

void test(double x, double y)
{
    double y2 = x + 1.0;
    if (y != y2)
        writeln("error");  // Prints 'error' under -O2.
}

void main()
{
    // With 'immutable', x + 1.0 below can be constant-folded at real
    // (80-bit) precision; removing 'immutable' makes the test pass.
    immutable double x = .012;
    double y = x + 1.0;
    test(x, y);
}
---

>
> After that, I'm a bit less sure. It does seem to me that we're trying to
> make 'real' do double duty as meaning both "x87 80-bit floating-point
> number" and also as something like a storage class that is specific to
> double: "compiler, don't discard excess precision". Both are useful
> concepts, but they aren't identical. The two concepts did coincide on
> x86 32-bit, but they're different on x86-64. I think we need to
> distinguish the two.
>
> Ideally, I think we'd have a __real80 type. On x86 32-bit this would be
> the same as 'real', while on x86-64 __real80 would be available but
> probably 'real' would alias to double. But I'm a lot less certain about
> this.

There are flags for that in gdc: -mlong-double-64, -mlong-double-80 and
-mlong-double-128.
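A quick probe of what those flags actually change -- a minimal sketch;
the output depends on the target and on which -mlong-double-* flag was
given:

---
import std.stdio;

void main()
{
    // With -mlong-double-80 on x86, real carries a 64-bit mantissa (x87
    // extended precision); with -mlong-double-64 it matches double's 53.
    writefln("real:   %s bytes, %s mantissa bits", real.sizeof, real.mant_dig);
    writefln("double: %s bytes, %s mantissa bits", double.sizeof, double.mant_dig);
}
---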