https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88705
Bug ID: 88705
Summary: [ARM][Generic Vector Extensions] float32x4/float64x2 vector operator overloads scalarize on NEON
Product: gcc
Version: 9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: husseydevin at gmail dot com
Target Milestone: ---

For some reason, GCC scalarizes 4 x float and 2 x double vectors (the shapes of float32x4_t and float64x2_t) on ARM32 NEON when they are declared with the generic vector extensions:

    typedef float f32x4 __attribute__((vector_size(16)));
    typedef double f64x2 __attribute__((vector_size(16)));

    f32x4 fmul (f32x4 v1, f32x4 v2)
    {
        return v1 * v2;
    }

    f64x2 dmul (f64x2 v1, f64x2 v2)
    {
        return v1 * v2;
    }

Compiled with arm-none-eabi-gcc (git commit 640647d4, not the latest):

    -O3 -S -march=armv7-a -mfloat-abi=hard -mfpu=neon

Expected output:

    fmul:
        vmul.f32 q0, q0, q1
        bx lr
    dmul:
        vmul.f64 d1, d1, d3
        vmul.f64 d0, d0, d2
        bx lr

Actual output:

    fmul:
        vmov.32 r3, d0[0]
        sub sp, sp, #16
        vmov s12, r3
        vmov.32 r3, d2[0]
        vmov s9, r3
        vmov.32 r3, d0[1]
        vmul.f32 s12, s12, s9
        vstr.32 s12, [sp]
        vmov s13, r3
        vmov.32 r3, d2[1]
        vmov s10, r3
        vmov.32 r3, d1[0]
        vmul.f32 s13, s13, s10
        vstr.32 s13, [sp, #4]
        vmov s14, r3
        vmov.32 r3, d1[1]
        vmov s15, r3
        vmov.32 r3, d3[0]
        vmov s11, r3
        vmov.32 r3, d3[1]
        vmul.f32 s14, s14, s11
        vstr.32 s14, [sp, #8]
        vmov s0, r3
        vmul.f32 s0, s15, s0
        vstr.32 s0, [sp, #12]
        vld1.64 {d0-d1}, [sp:64]
        add sp, sp, #16
        bx lr
    dmul:
        push {r4, r5, r6, r7}
        sub sp, sp, #96
        vstr d0, [sp, #64]
        vstr d1, [sp, #72]
        vstr d2, [sp, #48]
        vstr d3, [sp, #56]
        vldr.64 d17, [sp, #64]
        vldr.64 d19, [sp, #48]
        vldr.64 d16, [sp, #72]
        vldr.64 d18, [sp, #56]
        vmul.f64 d17, d17, d19
        vmul.f64 d16, d16, d18
        vstr.64 d17, [sp, #32]
        ldrd r0, [sp, #32]
        mov r4, r0
        mov r5, r1
        strd r4, [sp]
        vstr.64 d16, [sp, #40]
        ldr r2, [sp, #40]
        ldr ip, [sp, #44]
        str r2, [sp, #8]
        str ip, [sp, #12]
        vld1.64 {d0-d1}, [sp:64]
        add sp, sp, #96
        pop {r4, r5, r6, r7}
        bx lr

The same thing happens for other operators.
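For anyone who wants to both inspect the generated assembly and actually run the code (natively, or for example under qemu-arm), here is a minimal self-contained sketch of the test case. The check_fmul/check_dmul helpers are hypothetical additions, not part of the original report. They compare each lane of the result against the scalar product of the matching input lanes, relying on the GNU vector extension's definition of `*` as an element-wise operation and on its support for subscripting vectors.

```c
/* Test case from the report, plus two lane-by-lane checks.
 * Compile with -S (and the flags above) to inspect fmul/dmul,
 * or call the check_* helpers from a test driver. */

typedef float  f32x4 __attribute__((vector_size(16)));
typedef double f64x2 __attribute__((vector_size(16)));

/* The two functions from the report, unchanged. */
f32x4 fmul(f32x4 v1, f32x4 v2) { return v1 * v2; }
f64x2 dmul(f64x2 v1, f64x2 v2) { return v1 * v2; }

/* Hypothetical helper: returns 1 if every lane of fmul(a, b)
 * equals the scalar product of the corresponding input lanes. */
int check_fmul(f32x4 a, f32x4 b)
{
    f32x4 r = fmul(a, b);
    for (int i = 0; i < 4; i++)
        if (r[i] != a[i] * b[i])
            return 0;
    return 1;
}

/* Hypothetical helper: the same check for the two double lanes of dmul. */
int check_dmul(f64x2 a, f64x2 b)
{
    f64x2 r = dmul(a, b);
    for (int i = 0; i < 2; i++)
        if (r[i] != a[i] * b[i])
            return 0;
    return 1;
}
```

For example, `check_fmul((f32x4){1.5f, -2.0f, 3.25f, 0.5f}, (f32x4){2.0f, 4.0f, -1.0f, 8.0f})` should return 1. Every product of those inputs is exactly representable, so the `!=` lane comparisons are not affected by rounding. Because fmul and dmul keep external linkage, their standalone bodies are still emitted with -S and can be compared against the listings above.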
Oddly, according to Godbolt (Compiler Explorer), GCC 4.5 compiled the 32-bit float vectors correctly, but the generated code has regressed further with each release starting with 4.6.