https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82074
Bug ID: 82074
Summary: [aarch64] vmlsq_f32 compiled into 2 instructions
Product: gcc
Version: 7.2.0
URL: https://godbolt.org/g/jWvmxS
Status: UNCONFIRMED
Keywords: TREE
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: gcc.account at lemaitre dot re
Target Milestone: ---

Created attachment 42100
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42100&action=edit
simplest example showing the bug

On aarch64, the Neon intrinsic "vmlsq_f32" is compiled into:

    fneg v1.4s, v1.4s
    fmla v0.4s, v1.4s, v2.4s

instead of:

    fmls v0.4s, v1.4s, v2.4s

The same output is produced by all of the following expressions:

    vmlsq_f32(a, b, c)
    a - b*c
    vsubq_f32(a, vmulq_f32(b, c))

The example was compiled with gcc -O3.
I tested GCC 4.8.5, GCC 6.3.0 and GCC 7.2.0. All of them have the bug.

The bug is also present at -O1, but with a slightly different output:

    fmul v1.4s, v1.4s, v2.4s
    fsub v0.4s, v0.4s, v1.4s

In case it helps, here is a godbolt link that shows the bug:
https://godbolt.org/g/jWvmxS

Sometimes, depending on the surrounding code, the expression is successfully
converted into the FMLS instruction, but never in the attached example.
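
For readers without access to the attachment, the three expressions listed
in the report can be written as separate functions, as in the sketch below.
This is not the reporter's attachment 42100. The function names
(mls_intrinsic, mls_operators, mls_sub_mul, all_forms_agree) and the sample
values are assumptions made for illustration. The #else branch is a
hypothetical stand-in built on GCC vector extensions so that the file also
builds on non-aarch64 hosts; it says nothing about how arm_neon.h implements
these intrinsics.

```c
#include <string.h>

#if defined(__aarch64__)
#include <arm_neon.h>
#else
/* Hypothetical stand-ins (NOT arm_neon.h): plain GCC vector-extension
   arithmetic, only so this sketch compiles on other hosts. */
typedef float float32x4_t __attribute__((vector_size(16)));
static inline float32x4_t vmulq_f32(float32x4_t b, float32x4_t c) { return b * c; }
static inline float32x4_t vsubq_f32(float32x4_t a, float32x4_t b) { return a - b; }
static inline float32x4_t vmlsq_f32(float32x4_t a, float32x4_t b, float32x4_t c)
{
    return a - b * c;
}
#endif

/* Form 1: the multiply-subtract intrinsic named in the report. */
float32x4_t mls_intrinsic(float32x4_t a, float32x4_t b, float32x4_t c)
{
    return vmlsq_f32(a, b, c);
}

/* Form 2: vector operators applied directly to the Neon type. */
float32x4_t mls_operators(float32x4_t a, float32x4_t b, float32x4_t c)
{
    return a - b * c;
}

/* Form 3: explicit multiply followed by subtract. */
float32x4_t mls_sub_mul(float32x4_t a, float32x4_t b, float32x4_t c)
{
    return vsubq_f32(a, vmulq_f32(b, c));
}

/* Returns 1 if all three forms yield a - b*c on sample inputs.
   The inputs are chosen so that every product and difference is exactly
   representable, so fused and unfused evaluation give identical lanes. */
int all_forms_agree(void)
{
    const float av[4] = {10.0f, 20.0f, 30.0f, 40.0f};
    const float bv[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    const float cv[4] = {2.0f, 2.0f, 2.0f, 2.0f};
    const float expected[4] = {8.0f, 16.0f, 24.0f, 32.0f};
    float32x4_t a, b, c, r[3];
    float out[4];

    memcpy(&a, av, sizeof a);
    memcpy(&b, bv, sizeof b);
    memcpy(&c, cv, sizeof c);

    r[0] = mls_intrinsic(a, b, c);
    r[1] = mls_operators(a, b, c);
    r[2] = mls_sub_mul(a, b, c);

    for (int i = 0; i < 3; i++) {
        memcpy(out, &r[i], sizeof out);
        for (int lane = 0; lane < 4; lane++)
            if (out[lane] != expected[lane])
                return 0;
    }
    return 1;
}
```

Compiling this file for an aarch64 target with gcc -O3 -S and comparing the
body of each function should show whether a given compiler version emits the
single fmls or the fneg/fmla pair described above. Keeping each form in its
own non-inlined function keeps the surrounding code minimal, which matters
because the report notes that the result depends on context.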