https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116979

--- Comment #36 from Paul Caprioli <paul at hpkfft dot com> ---
I cannot comment on your concerns regarding GCC's optimization pipeline as that
is outside my knowledge.

Would you agree it's OK to use mul_fma for complex multiplication under
-ffp-contract=on?

Since mul_fma has a better maximum relative normwise error than mul_mul_add
(and neither algorithm is good componentwise), some developers might want it
for that reason.  I suggest they should be able to get it without resorting to
-ffast-math, -funsafe-math-optimizations, or even -ffp-contract=fast.

It was interesting to learn of the case in which FMA broke image filtering.
I'm not sure this should count as evidence against FMA in complex
multiplication, at least where the programmer is using std::complex to model
the mathematical ideal of complex numbers.  Is it not also possible that
switching from mul_fma to mul_mul_add could break the numerical stability of an
overall computation?  Again, the latter algorithm is provably worse normwise.

Two more random thoughts:

Complex multiplication often occurs as part of a dot product or matrix
multiplication.  In that case, a desire for higher accuracy might motivate the
use of a compensated dot product rather than the use of compensation within
each individual complex multiply.  In other words, Kahan's algorithm is applied
to the entire summation rather than to each individual complex multiply.  So, a
hypothetical cmulha() function as we discussed earlier may be less useful than
I had initially imagined.

AVX512 (and AVX10.2) does not have the VADDSUBPS or VADDSUBPD instructions,
though the ISAs do have VFMADDSUB*.  So, SIMD code generation for mul_mul_add
at 512 bits is awkward compared to mul_fma.  For what it's worth, x86 intends
code to use mul_fma for complex multiplication.
