https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88013
--- Comment #6 from krux <hoganmeier at gmail dot com> ---
-mfloat-abi=hard was missing indeed. It's a pity there's no warning like when
trying to use the intrinsics.

Still I see a lot more instructions, maybe that got fixed after v7.2?
https://godbolt.org/z/OWzgXi

        vld3.8  {d16, d18, d20}, [r3]
        add     ip, r3, #24
        add     lr, lr, #1
        add     r3, r3, #48
        cmp     lr, r5
        vld3.8  {d17, d19, d21}, [ip]
        vmovl.u8        q5, d16
        vmovl.u8        q15, d18
        vmovl.u8        q11, d17
        vmovl.u8        q4, d19
        vmovl.u8        q0, d20
        vmovl.u8        q1, d21
        vmull.s16       q6, d10, d28
        vmull.s16       q3, d22, d28
        vmull.s16       q2, d30, d26
        vmull.s16       q11, d23, d29
        vmull.s16       q15, d31, d27
        vmull.s16       q5, d11, d29
        vmull.s16       q9, d8, d26
        vmull.s16       q8, d9, d27
        vadd.i32        q2, q6, q2
        vadd.i32        q10, q5, q15
        vadd.i32        q9, q3, q9
        vmull.s16       q15, d0, d24
        vadd.i32        q8, q11, q8
        vmull.s16       q3, d2, d24
        vmull.s16       q0, d1, d25
        vmull.s16       q1, d3, d25
        vadd.i32        q11, q2, q15
        vadd.i32        q9, q9, q3
        vadd.i32        q10, q10, q0
        vadd.i32        q8, q8, q1
        vshr.s32        q11, q11, #8
        vshr.s32        q9, q9, #8
        vshr.s32        q10, q10, #8
        vshr.s32        q8, q8, #8
        vmovn.i32       d30, q11
        vmovn.i32       d31, q10
        vmovn.i32       d20, q9
        vmovn.i32       d21, q8
        vmovn.i16       d16, q15
        vmovn.i16       d17, q10
        vst1.8  {q8}, [r4]
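
For readers following along without the godbolt link, here is a minimal sketch of the kind of scalar loop the listing above is consistent with. This is a guess, not the reporter's test case: the function name rgb_to_gray and the weights W_R, W_G and W_B are hypothetical. The shape of the computation is read off the assembly: vld3.8 de-interleaves three byte channels, vmovl.u8 widens them, vmull.s16 and vadd.i32 form a weighted sum in 32 bits, vshr.s32 #8 scales it back, and the two vmovn steps narrow to bytes before vst1.8 stores 16 results.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-point weights (sum is 256); the values used in the
 * actual report are not visible in the assembly and may differ. */
enum { W_R = 77, W_G = 150, W_B = 29 };

/* Convert n interleaved 8-bit RGB pixels to 8-bit gray values.
 * Each output is the weighted channel sum shifted right by 8,
 * matching the vshr.s32 #8 seen in the listing. */
void rgb_to_gray(const uint8_t *restrict rgb, uint8_t *restrict gray, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        int r = rgb[3 * i + 0];
        int g = rgb[3 * i + 1];
        int b = rgb[3 * i + 2];
        gray[i] = (uint8_t)((W_R * r + W_G * g + W_B * b) >> 8);
    }
}
```

Building something like this for 32-bit ARM with optimization and NEON enabled (for example -O3 -mfpu=neon -mfloat-abi=hard) should give a vectorized loop of the same general shape, though the exact instruction count depends on the compiler version.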