On Oct 15 2012, Iliev, Hristo wrote:

Numeric differences are to be expected with parallel applications. The basic reason for that is that on many architectures floating-point operations are performed using higher internal precision than that of the arguments and only the final result is rounded back to the lower output precision. When performing the same operation in parallel, intermediate results are communicated using the lower precision and thus the final result could differ. ...

Not quite.  That's ONE reason.

You could try to "cure" this (non-problem) by telling your compiler not to
use higher precision for intermediate results.

But it wouldn't help if the problem is the other reason, which is that
floating-point arithmetic is not associative.  That means that the actual
order of the operations makes a difference to the final result, and that
is (correctly) unspecified for MPI_Reduce.

I have had long arguments with people who believe in deterministic
floating-point (i.e. that consistency implies correctness), but the
actual fact is that such non-determinism is an unavoidable consequence
of parallel use of floating-point, or indeed of any serious numeric
optimisation.

So the summary is that anyone doing floating-point work has to learn
to live with it.  Any traditional book on numerical programming (i.e.
before 1980) will take that for granted.


Regards,
Nick Maclaren.
