[Bug target/63503] [AArch64] A57 executes fused multiply-add poorly in some situations

ramana at gcc dot gnu.org Fri, 10 Oct 2014 06:59:27 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63503


Ramana Radhakrishnan <ramana at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |WAITING
   Last reconfirmed|                            |2014-10-10
     Ever confirmed|0                           |1

--- Comment #7 from Ramana Radhakrishnan <ramana at gcc dot gnu.org> ---

(In reply to Wilco from comment #6)
> I ran the assembler examples on A57 hardware with identical input. The FMADD
> code is ~20% faster irrespectively of the size of the input. This is not a
> surprise given that the FMADD latency is lower than the FADD and FMUL
> latency.
> 
> The alignment of the loop or scheduling don't matter at all as the FMADD
> latency dominates by far - with serious optimization this code could run 4-5
> times as fast and would only be limited by memory bandwidth on datasets
> larger than L2.
> 
> So this particular example shows issues in LLVM, not in GCC.

The reason LLVM and GCC differ on whether an fma is emitted is probably the
default language standard. GCC defaults to GNU89 while LLVM defaults to C99.
If you used -std=c99 with GCC as well, you'd get the same sequence as LLVM.
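
To check the language-standard theory, a small multiply-accumulate kernel can
be compiled both ways and the assembly compared. The sketch below is not
Evandro's test case; the function name `dot` and the file name in the comments
are made up for illustration. It relies on GCC's documented behaviour that
-ffp-contract defaults to "fast" in GNU modes and to "off" in ISO modes such
as -std=c99, so the first build is expected to show fmadd on AArch64 and the
second a separate fmul and fadd.

```c
#include <stddef.h>

/*
 * Hypothetical kernel: each iteration is a multiply feeding an add,
 * which is the pattern a compiler may contract into one fused
 * multiply-add.
 *
 * Suggested comparison (dot.c is an assumed file name):
 *   gcc -O2 -std=gnu89 -S dot.c         GNU mode, contraction allowed
 *   gcc -O2 -std=c99   -S dot.c         ISO mode, contraction off
 *   gcc -O2 -ffp-contract=off -S dot.c  contraction off explicitly
 */
double dot(const double *a, const double *b, size_t n)
{
    double acc = 0.0;
    size_t i; /* declared here so the file also builds as GNU89 */

    for (i = 0; i < n; i++)
        acc += a[i] * b[i];

    return acc;
}
```

If both builds produce the same inner loop, the default standard is not the
explanation and the flags actually used matter all the more.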

As Evandro doesn't mention flags it's hard to say whether there really is a
problem here or not.

The only other gotcha I know of with fmadds is unfortunate, but it is not
relevant to this discussion.

http://comments.gmane.org/gmane.comp.compilers.llvm.cvs/200282

This probably needs more analysis than has been done so far.
