On 5/6/24 23:35, Toon Moene wrote:

On 5/6/24 23:32, Andrew Pinski wrote:

Did you test x86_64 with -march=native (or with -mfma) or just -O3?
The reason why I am asking is aarch64 includes FMA by default while
x86_64 does not.
Most recent x86_64 includes an FMA instruction but since the base ISA
does not include it, it is not enabled by default.
I suspect the aarch64 "excessive exceeding the threshold for
errors" are all caused by the greater use of FMA rather than anything
else.

Aah, I forgot to include that tidbit, because it's readily apparent from the full logs - I compiled with *just* -O3.

Thanks,


OK, perhaps on the aarch64 I need the following option to make the comparison fair:

‘rdma’

Enable Round Double Multiply Accumulate instructions. This is on by default for -march=armv8.1-a.

I.e., -mno-rdma

(I hope that's correct - I'll try that when the Sun rises again and I have some power to run the AArch64 machine ...).

I must say I didn't expect this - the discussion on the "Intel" side was always that the fact that fused multiply-add instructions didn't express the "real computations" expressed by the program meant that they were evil and therefore had to be hidden behind some special compiler option that made it very clear that those instructions were evil.

Again, thanks for pointing me to the difference (in philosophy, if not math) between the two continents (i.e., the Americas and Europe's - before Brexit - England :-)

Kind regards,

--
Toon Moene - e-mail: t...@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
