https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88705
Devin Hussey <husseydevin at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |UNCONFIRMED
         Resolution|INVALID                     |---

--- Comment #3 from Devin Hussey <husseydevin at gmail dot com> ---
Well, it is still not as efficient as it should be.

This would be the code that only uses VFP:

fmul:
        vadd.f32 s0, s0, s4
        vadd.f32 s1, s1, s5
        vadd.f32 s2, s2, s6
        vadd.f32 s3, s3, s7
        bx lr

dmul:
        vadd.f64 d0, d0, d2
        vadd.f64 d1, d1, d3
        bx lr

There is no need to keep swapping in and out of NEON registers.
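The C source for this test case is not shown in this comment. As a hypothetical reconstruction, assuming GCC's generic vector extensions, functions like the ones below are the kind of input the VFP-only listing corresponds to: one adds four floats, the other adds two doubles, with operands in s0-s3/s4-s7 (or d0-d1/d2-d3) as in the listing. The names add_float4 and add_double2 are illustrative stand-ins for fmul and dmul; they are renamed here only to avoid clashing with the C library's fmul/dmul.

```c
#include <assert.h>

/* Hypothetical test case (assumption): 16-byte generic vector types. */
typedef float  float4  __attribute__((vector_size(16)));  /* 4 x float  */
typedef double double2 __attribute__((vector_size(16)));  /* 2 x double */

/* Element-wise float add: ideally four vadd.f32 on s registers. */
float4 add_float4(float4 a, float4 b)
{
    return a + b;
}

/* Element-wise double add: ideally two vadd.f64 on d registers. */
double2 add_double2(double2 a, double2 b)
{
    return a + b;
}
```

Compiling this with -O2 -S for an ARMv7 hard-float target and comparing the output with the listing above is one way to check whether values are still being moved through NEON registers.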