https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102522
Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |INVALID
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Denis Yaroshevskiy from comment #0)
> ARM-V7 Neon has intrinsics like vmulq_n_u32 that are supposed to generate one
> mul instruction.

Read the outputted code again. You need to move the argument x, which is
currently in r0, into a SIMD register. GCC zeros out the other parts of the
register just because. And then it does the multiply.

        vmov.i32        d7, #0  @ v2si     // d7 = {0,0}
        vmov.32         d7[0], r0          // d7 = {x, 0}
        vmul.i32        q0, q0, d7[0]      // q0 *= d7[0] (or rather q0 *= x)
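
To make the three instructions above easier to follow, here is a rough scalar model of them in portable C. It is only a sketch: the types d_reg and q_reg and the functions mul_by_scalar_model and model_self_check are hypothetical names made up for illustration, not part of arm_neon.h or of GCC, and the model assumes 32-bit unsigned lanes as in vmulq_n_u32.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for NEON registers (illustration only). */
typedef struct { uint32_t lane[2]; } d_reg;  /* 64-bit D register: 2 x u32  */
typedef struct { uint32_t lane[4]; } q_reg;  /* 128-bit Q register: 4 x u32 */

/* Scalar model of the sequence quoted in the comment:
 *   vmov.i32 d7, #0
 *   vmov.32  d7[0], r0
 *   vmul.i32 q0, q0, d7[0]
 */
q_reg mul_by_scalar_model(q_reg q0, uint32_t r0)
{
    d_reg d7 = { {0, 0} };         /* vmov.i32 d7, #0     -> d7 = {0, 0} */
    d7.lane[0] = r0;               /* vmov.32  d7[0], r0  -> d7 = {x, 0} */

    for (int i = 0; i < 4; ++i)    /* vmul.i32 q0, q0, d7[0]             */
        q0.lane[i] *= d7.lane[0];  /* every lane is multiplied by x      */

    return q0;
}

/* Small usage example: multiply {1, 2, 3, 4} by 10. */
void model_self_check(void)
{
    q_reg v = { {1, 2, 3, 4} };
    q_reg r = mul_by_scalar_model(v, 10);
    assert(r.lane[0] == 10 && r.lane[1] == 20);
    assert(r.lane[2] == 30 && r.lane[3] == 40);
}
```

In this model only the last step is a multiply. The first two steps exist solely to get x out of the core register r0 and into lane 0 of a SIMD register, which is the point made in the comment.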