https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116979
--- Comment #36 from Paul Caprioli <paul at hpkfft dot com> ---
I cannot comment on your concerns regarding GCC's optimization pipeline, as that is outside my knowledge. Would you agree it's OK to use mul_fma for complex multiplication if -ffp-contract=on? Since mul_fma has better maximum relative normwise error than mul_mul_add (and neither algorithm is good componentwise), some developers might want it for that reason. I would suggest they should be able to get it without resorting to -ffast-math, -funsafe-math-optimizations, or even -ffp-contract=fast.

It was interesting to learn of the case in which FMA broke image filtering. I'm not sure this should be used as evidence against FMA in complex multiplication, at least in cases where the programmer is using std::complex to model the mathematical ideal of complex numbers. Is it not also possible that switching from mul_fma to mul_mul_add could break the numerical stability of an overall numerical computation? Again, the latter algorithm is provably worse normwise.

Two more random thoughts:

Complex multiplication often occurs as part of a dot product or matrix multiplication. In that case, a desire for higher accuracy might motivate the use of a compensated dot product rather than the use of compensation within each individual complex multiply. In other words, Kahan's algorithm is applied to the entire summation rather than to each individual complex multiply. So a hypothetical cmulha() function, as we discussed earlier, may be less useful than I had initially imagined.

AVX512 (and AVX10.2) does not have VADDSUBPS or VADDSUBPD instructions. The ISAs do have VFMADDSUB*. So SIMD code generation for mul_mul_add in 512 bits is awkward compared to mul_fma. For whatever it's worth, x86 intends code to use mul_fma for complex multiply.
