https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88705

Devin Hussey <husseydevin at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |UNCONFIRMED
         Resolution|INVALID                     |---

--- Comment #3 from Devin Hussey <husseydevin at gmail dot com> ---
Well, the generated code is still not as efficient as it should be.

This is what the code would look like if it only used VFP:

fmul:
        vadd.f32        s0, s0, s4
        vadd.f32        s1, s1, s5
        vadd.f32        s2, s2, s6
        vadd.f32        s3, s3, s7
        bx      lr

dmul:
        vadd.f64        d0, d0, d2
        vadd.f64        d1, d1, d3
        bx      lr

There is no need to keep swapping in and out of NEON registers.
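
For context, here is a minimal sketch of the kind of source that could lower to lane-wise adds like the listings above. It assumes GCC's generic vector extensions. The type names (f32x4, f64x2) and function names (fadd_sketch, dadd_sketch) are hypothetical and are not taken from the original test case:

```c
/* Hypothetical 128-bit vector types built with GCC's vector_size attribute. */
typedef float  f32x4 __attribute__((vector_size(16)));
typedef double f64x2 __attribute__((vector_size(16)));

/* Lane-wise add of four floats; the compiler chooses how to lower it. */
f32x4 fadd_sketch(f32x4 a, f32x4 b)
{
    return a + b;
}

/* Lane-wise add of two doubles. */
f64x2 dadd_sketch(f64x2 a, f64x2 b)
{
    return a + b;
}
```

Going by the register usage in the listings, a and b would arrive in s0-s3/s4-s7 (or d0-d1/d2-d3), so in principle each lane can be added in place without leaving the VFP register file.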
