On 7/1/2014 3:26 AM, Don wrote:
Yes, it's complicated. The interesting thing is that there are no 128 bit
registers. The temporaries exist only while the FMA operation is in progress.
You cannot even preserve them between consecutive FMA operations.

An important consequence is that allowing intermediate calculations to be
performed at higher precision than the operands is crucial, and that this
applies outside of x86. This is something we've got right.

But it's not possible to say that "the intermediate calculations are done at the
precision of 'real'". These are the semantics which I think we currently have
wrong. Our model is too simplistic.

On modern x86, calculations on float operands may have intermediate calculations
done at only 32 bits (if using straight SSE), 80 bits (if using x87), or 64 bits
(if using float FMA). And for double operands, they may be 64 bits, 80 bits, or
128 bits.
Yet, in the FMA case, non-FMA operations will be performed at lower precision.
It's entirely possible for all three intermediate precisions to be active at the
same time!
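
A rough illustration in C of how the intermediate precision changes a result computed on float operands. This is a sketch assuming IEEE 754 float and double. The function names are hypothetical, and the two precisions are forced by hand rather than chosen by the code generator.

```c
#include <assert.h>

/* Intermediate kept at 32 bits, as with straight SSE.
   The volatile store forces the partial sum to be rounded to float. */
float sum_float_intermediate(float a, float b, float c) {
    volatile float t = a + b;
    return t + c;
}

/* Intermediate kept at 64 bits, wider than the float operands. */
float sum_double_intermediate(float a, float b, float c) {
    double t = (double)a + (double)b;
    return (float)(t + (double)c);
}

void demo_intermediate_precision(void) {
    float big = 16777216.0f; /* 2^24: big + 1 is not representable as float */
    assert(sum_float_intermediate(big, 1.0f, -big) == 0.0f);
    assert(sum_double_intermediate(big, 1.0f, -big) == 1.0f);
}
```

Same operands, same expression, two different answers. That is the situation when different parts of one expression run at different intermediate precisions.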

I'm not sure that we need to change anything WRT code generation. But I think
our style recommendations aren't quite right. And we have at least one missing
primitive operation (discard all excess precision).
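
As a sketch of what such a primitive might look like, here is a C approximation. The name discard_excess is hypothetical and this is not an existing D or C library function. It relies on a store to a volatile double to force rounding to 64 bits, and it uses long double only as a stand-in for a wider intermediate.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical helper: round a possibly wider value to exactly double.
   The volatile store forces the value through a 64-bit slot. */
double discard_excess(long double x) {
    volatile double v = (double)x;
    return v;
}

void demo_discard_excess(void) {
    /* 1 + 2^-60 does not fit in a double, so it rounds to 1.0.
       Where long double is no wider than double, wide is already 1.0. */
    long double wide = 1.0L + ldexpl(1.0L, -60);
    assert(discard_excess(wide) == 1.0);
}
```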

What do you recommend?
